HTML Data Not Available On RDP/Terminal Services Clipboard - c#

I'm parsing HTML data that is currently available on the clipboard (Clipboard.GetDataObject). However, I was surprised to find that when copying data from a webpage in an RDP session, the HTML format is not available and is instead replaced with "OEMText".
Can anyone describe why this occurs or a way around it?

I think RDP just filters that out. It does pass RTF though, and some OLE formats.
However, "OEMText" is not a replacement, it's always available whenever plain text is present. You typically have Text, OEMText, and UnicodeText.
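Since the HTML format may be missing in an RDP session while RTF and the plain-text formats survive, one way around it is to fall back to the richest format that is actually present. The following is only a sketch under that assumption: the class and method names are hypothetical, and the string constants are the values of DataFormats.Html, DataFormats.Rtf, DataFormats.UnicodeText, DataFormats.Text and DataFormats.OemText.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class ClipboardFormats
{
    // Preference order, richest first. These strings match the
    // System.Windows.Forms.DataFormats constants of the same meaning.
    private static readonly string[] Preferred =
    {
        "HTML Format",      // DataFormats.Html
        "Rich Text Format", // DataFormats.Rtf
        "UnicodeText",      // DataFormats.UnicodeText
        "Text",             // DataFormats.Text
        "OEMText"           // DataFormats.OemText
    };

    // Returns the first preferred format found in 'available',
    // or null when none of them is present.
    public static string PickBest(IEnumerable<string> available)
    {
        var set = new HashSet<string>(available, StringComparer.OrdinalIgnoreCase);
        return Preferred.FirstOrDefault(f => set.Contains(f));
    }
}
```

In a Windows Forms application the input would come from Clipboard.GetDataObject().GetFormats(), and the chosen name can then be passed to GetData on the same data object.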

Related

Winforms Webbrowser Control and Character Encoding

I am trying to use the Winforms WebBrowser control to display a webpage that can be in many different languages. In one example, I have an HTML file that has characters in Farsi (examples are not limited to this, however - the file can have Mongolian, Japanese, or any other language).
When I try to Navigate() to this file in the WebBrowser control, it displays as a bunch of garbled characters (e.g. ÓÇÚÊ). If I display the same file in Firefox, however, it will correctly display with all the expected characters (e.g. ساعت). I can also load the raw HTML file in Notepad++ and it seems to automatically detect the character set/encoding that is being used.
I have read numerous threads that talk about setting the WebBrowser encoding like so:
webBrowser.Document.Encoding = "UTF-8";
however given a set of HTML files, I have no way of telling what language it is written in and therefore no way to know the encoding. I can also confirm there is no "meta" tag within the HTML source that specifies any encoding.
Is there some magic going on behind the scenes of Firefox and Notepad++ to detect the correct character set and if so, how? Can someone tell me how I can get the WebBrowser control to behave in a similar way?
As @Bizan pointed out, in the absence of meta tags to describe the encoding, modern browsers seem to use heuristics to determine the language encoding.
Since the WebBrowser control is based on dated Internet Explorer technology, it appears there is no in-built logic to do these heuristics automatically.
The solutions are:
Implement your own heuristics method and then set the encoding manually.
Use the WebView control (as mentioned by @PanagiotisKanavos), which appears to use the latest Edge browser. I have tested the pages in Edge and they work correctly. The minimum requirement for WebView is Windows 10, however, which rules out any use cases where your software needs to run on earlier versions of Windows.
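For the first option, a very small heuristic could look like the sketch below. It is not what Firefox or Notepad++ actually do; it only checks for a UTF-8 byte order mark, then tests whether the bytes decode as strict UTF-8, and otherwise returns a fallback charset name supplied by the caller. The class name, method name and the "windows-1256" fallback used in the usage note are assumptions for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper: guesses a charset name for raw HTML bytes.
public static class EncodingGuess
{
    // Strict decoder: throws on invalid byte sequences instead of
    // silently substituting replacement characters.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(false, true);

    public static string GuessCharset(byte[] bytes, string fallbackCharset)
    {
        // A UTF-8 byte order mark settles the question immediately.
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return "utf-8";

        try
        {
            // Legacy single-byte text containing non-ASCII characters
            // is very unlikely to form valid UTF-8 sequences.
            StrictUtf8.GetString(bytes);
            return "utf-8";
        }
        catch (DecoderFallbackException)
        {
            // Not UTF-8: defer to whatever the caller considers likely.
            return fallbackCharset;
        }
    }
}
```

The result could then be assigned to webBrowser.Document.Encoding once the document has loaded, for example GuessCharset(File.ReadAllBytes(path), "windows-1256") for files expected to be Arabic-script legacy text. Telling different legacy code pages apart needs real statistical detection, which this sketch does not attempt.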

Create PDF from HTML form results in C#

I have a project where I need to create an HTML form (no problem) and then create a PDF file from the results using C#.
I have done this before in PHP using FPDF but this one needs to be C#. Ideally I want to put the code into a user control and then stick it in an Umbraco website.
Can anyone recommend a good way to do this? The PDF doesn't need to be fancy; it will just display text. We aim to create a generic purchase order based on what the customer wants from the form, which can then be emailed to them to print off on headed paper.
Thanks
There are a couple of recent problems with iTextSharp. The most annoying is that in the latest version they've deprecated the HTML parser, so now everything has to go through the XMLWorkerHelper singleton and be parsed with ParseXHtml. I find this a real pain, since HTML pages which aren't well formed appear fine in a browser and parsed OK with the old method, but now crash out with an exception. So it necessitates an extra step to make sure your HTML is well formed (as XHTML) first. If you are generating your HTML from an ASPX page and using Server.Execute() to get the stream, then this might be useful to you for iTextSharp:
http://jwcooney.com/2012/12/30/generate-a-pdf-from-an-asp-net-web-page-using-the-itextsharp-xmlworker-namespace/
Be aware that iTextSharp has a distinct lack of decent documentation for the recent changes (and the Java iText documents don't translate perfectly to C#), which makes the learning curve far too long and steep for any practical use in a short space of time. I've basically given up on that platform, though I may still create a baseline system to get something lean working while I learn another framework.
As a result, I'm looking at PDFizer and PDFSharp libraries. If I have some success, I'll report back.
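The extra well-formedness step mentioned above can be done before the markup is handed to ParseXHtml. The sketch below simply runs the markup through an XmlReader and reports the first error; the class and method names are hypothetical, and it assumes the markup contains no named HTML entities such as &nbsp;, which a plain XML parser rejects unless they are declared.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical pre-check: is this markup well-formed XML?
public static class XhtmlCheck
{
    public static bool IsWellFormed(string markup, out string error)
    {
        // Skip any DOCTYPE instead of trying to load the DTD.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces a full parse.
                while (reader.Read()) { }
            }
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // The message includes the line and position of the problem.
            error = ex.Message;
            return false;
        }
    }
}
```

If the check fails, the error text can be logged so the offending tag in the generated page can be fixed instead of letting the PDF conversion throw.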
Here is a library for converting HTML to PDF:
http://pdfcrowd.com/web-html-to-pdf-net/
I like the PDFsharp library. Not sure how it would work for your needs, though.

Windows Forms WebBrowser control: DocumentText vs Document.Body.OuterHtml

I am trying to obtain html from the WebBrowser control, but it must include the value attributes of input elements on the page as well.
If I use webBrowser.DocumentText, I get the full HTML of the page as it was initially loaded. The input field values are not included.
If I use webBrowser.Document.Body.OuterHtml, I get the values, but not the rest of the document (the <head>), which I need so I can get the stylesheet links, etc.
Is there a clean dependable way to obtain the full HTML of the DOM in its current state from the WebBrowser? I am passing the HTML to a library for it to be rendered to PDF, so suggestions for programmatically saving from the WebBrowser control to PDF will also be appreciated.
Thanks
There are some undocumented ways (changing the registry, an undocumented DLL export) to print the document to XPS or PDF printers without parsing the page, that is, if you can afford to roll out the required printer drivers to your customer's network.
If you want to parse the web page, documentElement.outerHTML should give you the full canonicalized document, but not the linked image, script or stylesheet files. You need to parse the page, enumerate the elements, check element types and get the resource URLs before digging through the WinInet cache or downloading the additional resources. To get the documentElement property, you need to cast HtmlDocument.DomDocument to mshtml.IHTMLDocument2 if you use Windows Forms, or cast WebBrowser.Document to mshtml.IHTMLDocument2 if you use WPF. If you need to wait until the Ajax code finishes executing, start a timer when the DocumentComplete event is raised.
At this stage, I would parse the HTML DOM and get the necessary data in order to generate a report via a template, so you always have the option to generate other formats supported by the report engine, such as Microsoft Word. Very rarely do I need to render the HTML as parsed, for example, printing a long table without adding a customized header and footer on each page. That said, you can check Convert HTML to PDF in .NET and test which one of the suggested software/components works best with your target web site, if you do not have long tables.

How to create HTML text from C# application?

I have a C# application that must store some information in MS SQL that will later be sent by email with DB Mail.
Within the C# application I have a class with several properties and I need to use it to generate the email text. So what I would like is to set up a template with placeholders for variables. I need to create the text as both HTML and plain text.
What tools or libraries would you recommend for HTML?
Is String.Format() the best alternative for working with plain text?
I do this in other applications by having the e-mail body available somewhere (SharePoint list, data table) already in the right format, but with named placeholders, corresponding to the information you have in your application.
Then sending the e-mail means replacing the placeholder with the right information. StringBuilder.Replace works fine.
I would say the most important thing you need to decide is when to encode the text. If you are emailing text supplied by users, you will want to HtmlEncode it before including it in an email. It's probably OK to store it "as received" in the database as long as every consumer encodes it before using it. I typically do this in the data layer that "gets" data from the database.
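The two suggestions above (named placeholders replaced with StringBuilder.Replace, and HTML-encoding values at the point of use) can be combined roughly as follows. This is a sketch only: the class name, the {Name} placeholder syntax and the htmlEncode flag are assumptions for illustration, and WebUtility.HtmlEncode from System.Net stands in for whichever HtmlEncode helper your project already uses.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical template filler for HTML and plain-text e-mail bodies.
public static class EmailTemplate
{
    // Replaces each "{Key}" in the template with its value.
    // Pass htmlEncode = true for the HTML body so that stored
    // user text cannot inject markup; false for the plain-text body.
    public static string Fill(string template,
                              IDictionary<string, string> values,
                              bool htmlEncode)
    {
        var sb = new StringBuilder(template);
        foreach (var pair in values)
        {
            string value = htmlEncode
                ? WebUtility.HtmlEncode(pair.Value)
                : pair.Value;
            sb.Replace("{" + pair.Key + "}", value);
        }
        return sb.ToString();
    }
}
```

One limitation of this simple approach: a value that itself contains text like {Other} may be touched by a later replacement, so placeholder names should be chosen so they cannot occur in real data.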

include image in XML file

I want to be able to send an image from an HTML page to an XML file using C#.
The image should be sent along with some text. The problem is how do I store the image in the XML file efficiently, so it can be sent over the wire, and how do I store the position of the image on the HTML page, so it can be restored later in its original position?
I was originally going to keep a hyperlink to the image in my XML file and load it that way on the HTML page, using ASP.NET, but I wondered if there are better ways?
EDIT:
So how do I keep the coordinates of the picture in the page in relation to all other objects? What ways can I save it to the XML file and how do I get the coordinates? Using ASP.NET, HTML and/or JavaScript?
You can do it, but it's a really bad idea. If this is in an ASP.NET context then the hyperlink method sounds much more reasonable.
However, if you insist on encoding images in XML, then have a look at base64 encoding or ASCII 85.
There are binary xml specifications out there for sending of such data via xml, but these don't cater to your specific "position of the image" issue.
I would simply keep the image URL and use that.
I wouldn't recommend doing this (better to add a link on the server side that you can reference from an <img />, like http://yourhost/someGuid maybe, and leave the serialization to your web server and the browser), but base64 would be an easy option for your use case.
Follow the KISS principle.
Do what you were originally going to do, and keep the link in the file.
You could encode the image as a data URI, which most browsers seem to support now. It's still base 64, and so less compact than a separate binary file, but it is a standard way of inlining small images.
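If you do go the base64 route despite the advice above, a minimal sketch with LINQ to XML might look like this. The element name "image" and the attribute name "mimeType" are made up for illustration, and nothing here addresses the position-on-page part of the question.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helpers for embedding an image in XML as base64.
public static class ImageXml
{
    // Wraps the raw bytes in <image mimeType="...">base64</image>.
    public static XElement ToElement(byte[] imageBytes, string mimeType)
    {
        return new XElement("image",
            new XAttribute("mimeType", mimeType),
            Convert.ToBase64String(imageBytes));
    }

    // Recovers the original bytes from the element.
    public static byte[] ToBytes(XElement image)
    {
        return Convert.FromBase64String(image.Value);
    }

    // Builds a data URI that can be used as the src of an <img>.
    public static string ToDataUri(XElement image)
    {
        return "data:" + (string)image.Attribute("mimeType")
             + ";base64," + image.Value;
    }
}
```

Base64 grows the payload by roughly a third, which is why the link-based approach remains the simpler choice for anything but small images.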
