WinForms WebBrowser Control and Character Encoding

I am trying to use the WinForms WebBrowser control to display a web page that can be in many different languages. In one example, I have an HTML file that contains Farsi characters (the files are not limited to Farsi, however; they can be in Mongolian, Japanese, or any other language).
When I Navigate() to this file in the WebBrowser control, it displays as garbled characters (e.g. ÓÇÚÊ). If I open the same file in Firefox, however, it displays correctly with all the expected characters (e.g. ساعت). I can also load the raw HTML file in Notepad++, and it seems to detect the character set/encoding automatically.
I have read numerous threads that talk about setting the WebBrowser encoding like so:
webBrowser.Document.Encoding = "UTF-8";
However, given a set of HTML files, I have no way of telling which language each one is written in, and therefore no way of knowing its encoding. I can also confirm that there is no "meta" tag in the HTML source that specifies an encoding.
Is there some magic going on behind the scenes in Firefox and Notepad++ to detect the correct character set, and if so, how does it work? Can someone tell me how to get the WebBrowser control to behave similarly?

As @Bizan pointed out, in the absence of meta tags describing the encoding, modern browsers seem to use heuristics to determine the character encoding.
Since the WebBrowser control is based on dated Internet Explorer technology, it appears to have no built-in logic to apply these heuristics automatically.
The solutions are:
Implement your own heuristic, then set the encoding manually.
Use the WebView control (as mentioned by @PanagiotisKanavos), which appears to use the latest Edge browser. I have tested the pages in Edge and they display correctly. However, WebView requires Windows 10 or later, which rules out any use case where your software needs to run on earlier versions of Windows.
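A minimal sketch of the first option, assuming the page can be read from disk as raw bytes and that the caller supplies a sensible legacy fallback (here `windows-1256` for Farsi/Arabic, which is an assumption for this example). It checks for a byte order mark, then tries a strict UTF-8 decode, and otherwise returns the fallback. This is far cruder than the statistical detectors browsers use, and `EncodingSniffer`/`DetectEncodingName` are hypothetical names. The garbled text in the question appears to fit this pattern: ساعت encoded as Windows-1256 is the byte sequence D3 C7 DA CA, which reads as ÓÇÚÊ when interpreted as Windows-1252, so that particular file seems not to be UTF-8 at all.

```csharp
using System;
using System.Text;

// Hypothetical helper: a crude stand-in for browser charset detection.
// Order of checks: byte order mark, then strict UTF-8 validation, then a
// legacy code page supplied by the caller. UTF-32 BOMs are not handled.
public static class EncodingSniffer
{
    public static string DetectEncodingName(byte[] bytes, string legacyFallback)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        // 1. A byte order mark is unambiguous when present.
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return "utf-8";
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return "utf-16";   // UTF-16 little-endian
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return "utf-16BE"; // UTF-16 big-endian

        // 2. No BOM: attempt a strict UTF-8 decode. With throwOnInvalidBytes set,
        //    any malformed sequence raises DecoderFallbackException. Legacy 8-bit
        //    text containing non-ASCII bytes rarely forms valid UTF-8 by accident.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return "utf-8";
        }
        catch (DecoderFallbackException)
        {
            // 3. Not UTF-8: use the caller's guess (e.g. "windows-1256" for Farsi).
            //    Choosing between legacy code pages automatically would need
            //    statistical language detection, which this sketch does not do.
            return legacyFallback;
        }
    }
}
```

The returned name could then be assigned once the page has loaded, for example in a `DocumentCompleted` handler: `webBrowser.Document.Encoding = EncodingSniffer.DetectEncodingName(File.ReadAllBytes(path), "windows-1256");`. Pure ASCII files are reported as UTF-8, which is harmless because ASCII is a subset of it.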

Related

Display MSWord file content in any browser

I want to display the content of a Word file in the browser, the same way we display a PDF file in the browser. I don't want to use a plugin, because then it would have to be installed in every browser. I want one solution that works in all browsers.
I have searched on Google, but all the links I found download the Word file directly and open it.
Currently I am using an object tag to display PDF files, but it does not work for Word files. It shows the message: The plug-in is not supported.

Using a browser plug-in (such as the free Word Viewer) is by far the easiest method, and arguably the most correct; however, there are some alternatives if you really don't want to do this:
Convert the Word document to another format (e.g. HTML or PDF) on the fly before the response is sent. For Word 97-2003 documents, you can do this with VSTO/Automation. For Word 2007+ documents, you can use the OpenXML SDK (although you will have to write the conversion algorithm yourself).
Use an XSL stylesheet to transform the Word markup (docx) into HTML/CSS. You can do this server-side or, potentially, with client-side scripting (JavaScript).
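A rough server-side sketch of the conversion idea for .docx files. It swaps in `System.IO.Compression` and LINQ to XML instead of the OpenXML SDK or an XSL stylesheet, and relies on the fact that a .docx is a ZIP package whose main part is `word/document.xml`, with paragraphs in `w:p`, runs in `w:r`, text in `w:t` and bold/italic flags in `w:rPr`. It keeps only paragraph text with bold and italic, so tables, images, lists and styles are lost; `DocxToHtml` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

// Hypothetical minimal .docx to HTML converter (not the OpenXML SDK).
public static class DocxToHtml
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ToHtml(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            XDocument doc;
            using (Stream s = entry.Open()) doc = XDocument.Load(s);

            var html = new StringBuilder();
            // Descendants also picks up paragraphs inside table cells, flattened.
            foreach (XElement p in doc.Descendants(W + "p"))
            {
                html.Append("<p>");
                // Only direct runs; runs nested in hyperlinks etc. are skipped.
                foreach (XElement r in p.Elements(W + "r"))
                {
                    string text = string.Concat(r.Elements(W + "t").Select(t => t.Value));
                    if (text.Length == 0) continue;

                    // Presence of <w:b/> or <w:i/> is treated as "on";
                    // an explicit w:val="0" is not checked in this sketch.
                    XElement rPr = r.Element(W + "rPr");
                    bool bold = rPr?.Element(W + "b") != null;
                    bool italic = rPr?.Element(W + "i") != null;

                    string encoded = WebUtility.HtmlEncode(text);
                    if (italic) encoded = "<em>" + encoded + "</em>";
                    if (bold) encoded = "<strong>" + encoded + "</strong>";
                    html.Append(encoded);
                }
                html.Append("</p>\n");
            }
            return html.ToString();
        }
    }
}
```

The resulting fragment could be written into the response page instead of the `object` tag, so no plug-in is needed on the client.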

Great question. In principle, browsers only really support viewing web pages (e.g. HTML). Most, however, also support viewing PDFs, and, as you've correctly identified, you can use plugins to extend that behaviour. Crucially, though, some browsers provide document viewing through a JavaScript-based viewer.
I wasn't aware of it before you asked, but there are apparently JavaScript implementations of non-PDF document readers (for example, ViewerJS) that seem to support .odt directly. With a little digging, you might be able to find a JavaScript viewer or plugin that supports .docx. However, I can't recommend one from personal experience at the moment. I would recommend searching for JavaScript document viewers, though.

Printing HTML in A4 and A5 paper format

Apparently a question about printing HTML in A4 and A5 format is exactly the same as asking about printing RTF-formatted text, so here is another question with the details laid out a bit more.
I am developing a C# (WinForms) program that should print orders. The program targets the .NET 2.0 framework. It has to be able to print in both A4 and A5 paper formats without any user interference (no dialogs). The printed order should look exactly like it does in a web browser. You can check out this sample if you'd like to see what it should look like. As you can see, it's very basic stuff.
Here's what I have tried so far:
Asking a question on StackOverflow, got closed for being a duplicate. Never found the duplicate.
Print with the WebBrowser element. Cannot print A5, so that was no solution.
Put the HTML in a RichTextBox and print that. It worked, but it didn't show the HTML the way it looks in a web browser, which is a requirement. Code can be found here.
I've looked into converting to PDF before printing, but that is either expensive or just impossible to use (for me, as a programmer with little C# experience). These tools usually rely on software being installed on the users' computer (like Acrobat Reader for printing), which I'm trying to avoid.
Viewed about every relevant link in Google for at least 13 pages, no luck. I've been at this for about 2 and a half days now.
If someone has a better (free) way to print formatted HTML the way it looks in the browser, without user interference or external dependencies, please share. I really need this to work.
Please don't close this question; believe me when I say I actively searched for a solution or article that describes my problem. Some were relevant, but did not solve the problem I'm having. I also used the advanced search on this website, with no luck.
Thank you for taking the time to read this.
Note: When I say I never found the duplicate, I mean I never found the article that literally describes my problem.

You're going to have to bite the bullet and use a PDF library; there is no way you are going to be able to fully control the final printed result from HTML with so many different users, browsers and printers. I did printing to A4 for an internal business application with a very limited user base (maybe 5), all printing to one specific printer, and even then it was flaky at best. I don't believe there is a way to distinguish between A4 and A5 without user interference, i.e. users have to select the paper size from the print options.
I suggest you take a look at PDF Sharp, Sharp PDF and iTextSharp; they are all open source.
This can all be done on the server, i.e. nothing needs to be installed on the user's machine. It should be possible to select the paper size this way (I am not 100% sure), but what the end user prints it on is ultimately up to them.
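For the WinForms side of the question: if the order is drawn through a `System.Drawing.Printing.PrintDocument` rather than printed by the WebBrowser control, the paper size can be set in code without a dialog by assigning `printDocument.DefaultPageSettings.PaperSize`. `PaperSize` measures width and height in hundredths of an inch, so the ISO sizes A4 (210 × 297 mm) and A5 (148 × 210 mm) need converting. A small helper that should also compile for .NET 2.0 (the `IsoPaper` name is hypothetical):

```csharp
using System;

// Hypothetical helper for building System.Drawing.Printing.PaperSize values
// from ISO 216 dimensions given in millimetres.
public static class IsoPaper
{
    // 1 inch is exactly 25.4 mm.
    private const double MmPerInch = 25.4;

    // PaperSize expects hundredths of an inch; round to the nearest unit.
    public static int MmToHundredthsOfInch(double mm)
    {
        return (int)Math.Round(mm / MmPerInch * 100.0);
    }
}
```

For example, `new PaperSize("A5", IsoPaper.MmToHundredthsOfInch(148), IsoPaper.MmToHundredthsOfInch(210))`. Alternatively, picking the entry whose `Kind` is `PaperKind.A5` from `printDocument.PrinterSettings.PaperSizes` uses the driver's own definition of the size. Rendering the HTML onto the page is a separate problem that this does not solve.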

I have done this successfully using the PrintHelper method described here. I used it to enable users in multiple locations to print barcode labels from a CMS system. The labels had strict requirements regarding layout, font size and positioning and all this was managed using HTML and CSS.
The PrintHelper class works by passing it a web control populated with the print data (I used a repeater to allow multiple labels), and the helper class builds a page in memory and opens the print dialog. You register your CSS like so:
pg.ClientScript.RegisterStartupScript(pg.GetType(),"LabelCSS","<link href=\"Styles/labelPrint.css\" rel=\"stylesheet\" type=\"text/css\" />");
One caveat, though: it only worked with Firefox, and some settings had to be changed (e.g. margins set to zero), but as the CMS required the use of that browser, it wasn't a problem.

ActiveReports interface to RightFax

I'm trying to send ActiveReports-formatted reports to my company's RightFax server, and pre-set some of the fax fields, like fax number, sender, and recipient. The RightFax documentation says that the document must include Embedded Codes to set these values, e.g. <TOFAXNUM:12345556789><TONAME:Recipient><FROMNAME:Sender>. I create a TextBox or Label in ActiveReports that contains this text. But the values are not set when RightFax receives the document and brings up the RightFax client UI, and the Embedded Codes remain in the fax image. I have the RightFax printer driver set to HP LaserJet 4. I'm developing in C#, using Visual Studio 2010 Professional.
One suggestion on the web is to make sure the Embedded Codes are set in Courier or another "printer font". However, Visual Studio does not have "Courier" or "Times Roman", only MS true type versions of these standard fonts, "Courier New" and "Times New Roman".
This method of sending faxes is working with older software, that doesn't use ActiveReports, on another machine using the same RightFax server.
Any experience you can share interfacing ActiveReports to RightFax would be most appreciated.
Thanks,
Gregg Lobdell

You could accomplish your task by controlling the whole printing process and sending the escape sequence using the Windows API. I assume you are using ActiveReports 6 or the section-based reports of ActiveReports 7.
In more detail:
create your own PrintDocument and define PrintPage handler
in the PrintPage handler send escape sequence to printer using Escape winapi call (see the example on CodeProject)
render the page itself by calling the Page.Draw()
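A sketch of the payload for the escape step, assuming the embedded codes from the question are what should reach the driver and that they are sent with the GDI `PASSTHROUGH` escape (19 in wingdi.h), whose input buffer starts with a 16-bit byte count followed by the raw data. The P/Invoke declaration and the PrintPage wiring are omitted, `RightFaxCodes` is a hypothetical name, and whether RightFax honours codes delivered this way is an assumption that would need testing.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for the RightFax escape-sequence approach.
public static class RightFaxCodes
{
    // Builds embedded-code text in the form quoted from the RightFax docs,
    // e.g. <TOFAXNUM:12345556789><TONAME:Recipient>. Order is preserved.
    public static string Build(IEnumerable<KeyValuePair<string, string>> codes)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> kv in codes)
            sb.Append('<').Append(kv.Key).Append(':').Append(kv.Value).Append('>');
        return sb.ToString();
    }

    // Packs text into the PASSTHROUGH input layout: a 16-bit little-endian
    // byte count (native WORD order on x86/x64) followed by the bytes.
    public static byte[] ToPassthroughBuffer(string text)
    {
        byte[] data = Encoding.ASCII.GetBytes(text);
        if (data.Length > ushort.MaxValue)
            throw new ArgumentException("Text too long for a single PASSTHROUGH call.");

        var buffer = new byte[data.Length + 2];
        buffer[0] = (byte)(data.Length & 0xFF); // low byte of the count
        buffer[1] = (byte)(data.Length >> 8);   // high byte of the count
        Buffer.BlockCopy(data, 0, buffer, 2, data.Length);
        return buffer;
    }
}
```

The buffer would be passed to the `Escape` call in the PrintPage handler before `Page.Draw()` renders the report page itself.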

As long as you use a TrueType font, the printer should recognize that font and be able to "read" the text in it. Only old bitmap fonts might not be readable by the printer. Commonly used TrueType fonts on Windows, like "Courier New" or "Times New Roman", are usually already in the printer, so they won't even be downloaded.
However, RightFax does have some documentation on escape codes here, so you might want to try using escape codes with ActiveReports. Also, here is an example of using the SystemPrinter object in ActiveReports6 to send escape codes directly to the printer without using any special API. You might try using that code and replacing the escape code there with ones that RightFax understands.
Finally, ActiveReports essentially prints by getting a Graphics object from the printer and drawing on it. Textboxes are real text drawn with the appropriate text commands (i.e. text is not rendered as bitmaps). This is the normal way of printing in modern Windows, so any printer should see the text as normal text. You should be able to see exactly the same results by writing your own simple printing code in .NET and sending it to the RightFax driver. If it works there, it will work in ActiveReports.
If it isn't working, and the escape code trick above won't work, I think contacting RightFax and asking them for insight into how to print to their driver from a .NET application would be the next logical step.
Hope this helps!

Creating PDFs Online

We are using Report Definition Language (RDL) templates to define various reports in one of our SharePoint applications. These reports are then saved as PDFs into various SharePoint Document Libraries. One report in particular renders, but is considered to be "failing" due to the styling needs of the report. It appears RDL only understands "very simple" HTML.
For Example:
Trademark characters are not rendering as superscript (they render as normal text instead)
The ability to assign Line Height fails
The ability to assign Word Spacing fails (so the printer's "leading" requirements fail)
All of these point to documented Microsoft limitations in how RDL interprets HTML, which we are now aware of.
So...
I need a better tool...and we are scratching our heads on this one!
QUESTION:
What tools take in HTML, understand CSS (well!) and can generate PDFs from C-Sharp objects?
Please keep in mind that I need the PDF generator tools you recommend to understand CSS and HTML.
NOTE:
I looked at the various other Stack Exchange sites to see if there is a better forum for this particular question, but this one was the only one that seemed to fit the bill. If you are a moderator and feel this question is misplaced, please feel free to move it.

This HTML to PDF converter has the most accurate conversion of complex HTML/CSS pages. There is also a demo where you can try the conversion with your own HTML.

Maybe you can give Amyuni WebkitPDF a try. It is a Free component for converting HTML+CSS into PDF files. From the home page:
Directly convert HTML files into PDF without the use of a web browser or a printer driver
Convert HTML files into XAML/XPS for rendering within Silverlight
Integrate and deploy the HTML conversion feature within your applications
Generate either a single continuous PDF page or split the HTML into multiple PDF pages
Amyuni WebkitPDF is distributed as a library with a sample application, and sample code for C++ and C#.
Disclaimer: I currently work as software developer at Amyuni Technologies.

I only know a workaround for the "leading space" issue. This example "leads" the value with 10 spaces:
=space(10) & Fields!FieldName.Value
This should work for any renderer. I'll update this if I come across other tricks.

Have a look at Aspose.Pdf for .NET: http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx

Converting pdf to text

I need to create a C# or C++ (MFC) application that converts PDF files to TXT. I need not only to convert, but also to remove headers, footers, some garbage characters in the left margin, etc. Thus the application should allow the user to set page margins to cut off what is not needed. I have actually already created such an application using xpdf, but it gives me some problems when I try to insert custom tags into the extracted text to preserve italics and bold. Maybe somebody could suggest something useful?
Thanks.

There are shareware and freeware utilities out there. Try fetching their source code, or perhaps use them as they are.
A public version of the PDF specification can be found here: Adobe PDF Specification
Shareware PDF readers can be found here: PDF Reader source code @ SourceForge

Please look at Podofo. It's an LGPL-licensed library with many powerful editing features. One of its examples, txt2pdf IIRC, is a good start: it shows basic text extraction. From there you can check whether pre-filtering (in the PDF engine) or post-filtering (in the text) suffices for your goals. I didn't get to use Pdf Hummus, but it's supposed to have these capabilities too, although it's less straightforward.
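A sketch of the post-filtering option, assuming the extractor (xpdf, PoDoFo or another library) can report each text run with its position and whether its font is bold or italic. The `TextRun` and `PageFilter` types are hypothetical and would have to be filled from the library's own output. Runs inside user-defined top, bottom and left margins are dropped, and the rest are wrapped in custom tags. Coordinates follow PDF user space (points, origin at the bottom-left of the page).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical extracted text run; position and style flags must come from
// the extractor in use (bold/italic are often inferred from font names).
public sealed class TextRun
{
    public readonly double X;
    public readonly double Y;
    public readonly string Text;
    public readonly bool Bold;
    public readonly bool Italic;

    public TextRun(double x, double y, string text, bool bold, bool italic)
    {
        X = x; Y = y; Text = text; Bold = bold; Italic = italic;
    }
}

public static class PageFilter
{
    // Keeps runs inside the user-defined margins and marks up bold/italic
    // runs with custom <b>/<i> tags.
    public static string Extract(IEnumerable<TextRun> runs, double pageHeight,
                                 double topMargin, double bottomMargin, double leftMargin)
    {
        var sb = new StringBuilder();
        foreach (TextRun run in runs)
        {
            bool inHeader = run.Y > pageHeight - topMargin;
            bool inFooter = run.Y < bottomMargin;
            bool inLeftMargin = run.X < leftMargin;
            if (inHeader || inFooter || inLeftMargin) continue;

            // Escape literal '&' and '<' so they cannot be confused with the tags.
            string text = run.Text.Replace("&", "&amp;").Replace("<", "&lt;");
            if (run.Italic) text = "<i>" + text + "</i>";
            if (run.Bold) text = "<b>" + text + "</b>";
            sb.Append(text);
        }
        return sb.ToString();
    }
}
```

Filtering on coordinates rather than on the finished text keeps the margin setting independent of how the extractor orders or joins lines.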
