I've got a WebView with some HTML content which I want to convert into RTF. I've looked at the RTF conversion functions out there and they all look a little flaky to be honest. So my idea is to copy content from the WebView into a RichEditBox, and save to RTF from there.
I've seen this example numerous times.
WebBrowser1.Document.ExecCommand("SelectAll", false, null);
WebBrowser1.Document.ExecCommand("Copy", false, null);
Unfortunately, WinRT's WebView control doesn't have a Document property, so I can't do this.
Is there any way to pull the content from the control? To be clear, I don't want the HTML itself - I can get that already using
InvokeScript("eval", new string[] { "document.getElementById('editor').innerHTML;" });
What I want is the actual rendered HTML - the same as if I were to select everything in my WebView, press CTRL+C and then paste it into wordpad.
This is part of a series of questions I asked in trying to accomplish a bigger task - converting HTML to RTF in a Windows Store App.
I'm delighted to report that the above can be done. I finally figured out how to do it, using DataPackage - normally used for sharing content between apps.
First, this JavaScript function must exist in the HTML loaded in the WebView:
function select_body() {
var range = document.body.createTextRange();
range.select();
}
Next, you'll need to add using Windows.ApplicationModel.DataTransfer; to the top of your document. Not enough StackOverflow answers mention the namespaces used. I always have to go hunting for them.
Here's the code that does the magic:
// call the select_body function to select the body of our document
MyWebView.InvokeScript("select_body", null);
// capture a DataPackage object
DataPackage p = await MyWebView.CaptureSelectedContentToDataPackageAsync();
// extract the RTF content from the DataPackage
string RTF = await p.GetView().GetRtfAsync();
// SetText of the RichEditBox to our RTF string
MyRichEditBox.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, RTF);
I've spent about 2 weeks trying to get this to work. It's such a relief to finally discover I don't have to manually encode the file to RTF. Now if I can just get it to work the other way around, I'll be ecstatic. It's not essential to the app I'm building, but it would be a lovely feature.
UPDATE
In retrospect you probably don't need to have the function in the HTML, you could probably get away with this (though I haven't tested):
MyWebView.InvokeScript("execScript", new string[] {"document.body.createTextRange().select();"});
I am working on automating a process within my business, part of which is sending an email through Salesforce. We don't have access to the SF API, and the email has to be sent through Salesforce so that the communication stays searchable for coworkers.
I need to use a template that can be selected in Salesforce; however, this function does not work in IE (which our RPA solution uses), so I need to build the email from scratch.
I see two options for this:
Use the HTML to recreate the format with the right variables. This entails inserting/injecting/manipulating HTML.
Copy the format into memory/the clipboard, edit it programmatically and paste it into the SF interface
This question will be about option 2. I have posted an additional question with regards to the first option separately. It can be found here.
Now on to the question: I found that I can copy the template text from the Salesforce webpage to the clipboard (using CTRL+C) and paste it into Word, which preserves the formatting and layout of the text as well as the included images. I can then copy it from Word and paste it back into the SF webpage, again preserving the images, formatting and layout. This means that the formatting and image data are retained in the clipboard.
Now I need to edit the template text before I paste it into the webpage in order to use this as a solution for automation. I have been experimenting with this for the past week but I can't find a way to edit the text while preserving the formatting and layout.
I use C# and know how to GetText and SetText, as well as images (separately), from the clipboard. However, the text is then a plain string without any layout or formatting. I want to replace certain keywords in the template with the required variables.
At this point I have the following code to investigate the contents of the clipboard:
// Initialise a DataObject
DataObject t = null;
try
{
// Get Data from clipboard
t = (DataObject)Clipboard.GetDataObject(); // GetData(DataFormats.Html);
// Get formats in the DataObject
string[] tFormats = t.GetFormats();
// For each format that was found, create a separate object (I did this manually;
// I inspected the formats earlier by eye).
object objectDescriptor = t.GetData("Object Descriptor", true);
object rtf = t.GetData("Rich Text Format", true);
object HTMLFormat = t.GetData("HTML Format", true);
object sysString = t.GetData("System.String", true);
object unicodeText = t.GetData("UnicodeText", true);
object text = t.GetData("Text", true);
object enhancedMetafile = t.GetData("EnhancedMetafile", true);
object metaFilePict = t.GetData("MetaFilePict", true);
object embedSource = t.GetData("Embed Source", true);
object linkSource = t.GetData("Link Source", true);
object linkSourceDescriptor = t.GetData("Link Source Descriptor", true);
object objectLink = t.GetData("ObjectLink", true);
// Try replacing the text
HTMLFormat = (object)HTMLFormat.ToString().Replace("|KeyWord_A|", "Value_A");
t.SetData("HTML Format", HTMLFormat);
sysString = (object)sysString.ToString().Replace("|KeyWord_A|", "Value_A");
t.SetData("System.String", sysString);
unicodeText = (object)unicodeText.ToString().Replace("|KeyWord_A|", "Value_A");
t.SetData("UnicodeText", unicodeText);
text = (object)text.ToString().Replace("|KeyWord_A|", "Value_A");
t.SetData("Text", text);
Clipboard.SetDataObject(t);
}
catch (Exception ex)
{
// Do exception handling
}
Inspecting all these objects, I see that, even though they are apparently included in the DataObject, they are all null except for HTML Format, System.String, UnicodeText and Text. Of these, the HTML Format data is the only one that appears to contain formatting data (I don't know where the included image is stored, though). I am able to replace the keyword in that HTML Format successfully; however, if I set it back into the DataObject and then extract the HTML Format again, nothing has changed. The same goes for the other text items.
I also tried changing the text in the same way for all the text items. This, however, gives the same result. If I then try to paste the clipboard contents into Word, it freezes for a little while and then nothing happens.
How can I edit this DataObject? And will that actually translate into properly formatted and laid-out text, including the image and hyperlinks, or is this approach a dead end?
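One likely reason edits to "HTML Format" have no visible effect is that this clipboard format (CF_HTML) starts with a header of byte offsets (StartHTML, EndHTML, StartFragment, EndFragment) into the UTF-8 payload. Replacing "|KeyWord_A|" with a value of a different length leaves those offsets stale, and a consumer such as Word may then misread or reject the data. The sketch below is not from the original post: it replaces text only in the HTML part and recomputes the four offsets. It assumes the string is the raw CF_HTML payload with its header, and that the fragment is delimited by `<!--StartFragment-->`/`<!--EndFragment-->` comments; `CfHtml` and `ReplaceAndFixOffsets` are hypothetical names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: edits the HTML inside a CF_HTML ("HTML Format") payload
// and rewrites the byte offsets in its header so they match the new content.
public static class CfHtml
{
    private const string StartMarker = "<!--StartFragment-->";
    private const string EndMarker = "<!--EndFragment-->";
    private static readonly string[] OffsetKeys = { "StartHTML", "EndHTML", "StartFragment", "EndFragment" };

    public static string ReplaceAndFixOffsets(string cfHtml, string oldValue, string newValue)
    {
        // The header is plain "Key:Value" lines; the HTML starts at the first '<'.
        int headerEnd = cfHtml.IndexOf('<');
        if (headerEnd < 0) throw new FormatException("No HTML after the CF_HTML header.");
        string header = cfHtml.Substring(0, headerEnd);
        string html = cfHtml.Substring(headerEnd).Replace(oldValue, newValue);

        // Pad every offset to 10 digits first so the header length stops changing.
        foreach (string key in OffsetKeys)
            header = SetOffset(header, key, 0);

        int startIdx = html.IndexOf(StartMarker, StringComparison.Ordinal);
        int endIdx = html.IndexOf(EndMarker, StringComparison.Ordinal);
        if (startIdx < 0 || endIdx < startIdx) throw new FormatException("Fragment markers not found.");

        // Offsets are counted in UTF-8 bytes from the very start of the payload.
        Encoding utf8 = Encoding.UTF8;
        int headerBytes = utf8.GetByteCount(header);
        header = SetOffset(header, "StartHTML", headerBytes);
        header = SetOffset(header, "EndHTML", headerBytes + utf8.GetByteCount(html));
        header = SetOffset(header, "StartFragment", headerBytes + utf8.GetByteCount(html.Substring(0, startIdx + StartMarker.Length)));
        header = SetOffset(header, "EndFragment", headerBytes + utf8.GetByteCount(html.Substring(0, endIdx)));
        return header + html;
    }

    // Replaces the numeric value on the header line that starts with "key:".
    private static string SetOffset(string header, string key, int value) =>
        Regex.Replace(header, "^" + key + @":-?\d+", key + ":" + value.ToString("D10"), RegexOptions.Multiline);
}
```

A possible, untested way to put the result back is to fill a fresh object instead of reusing the one returned by GetDataObject, e.g. `var d = new DataObject(); d.SetData(DataFormats.Html, fixedHtml); Clipboard.SetDataObject(d, true);`. How the embedded image survives depends on how the template's HTML references it, which this sketch does not address.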
I am working with .flac audio files that use extended tags for a bit of magic. There is a tag called ReleaseGuid. I want to be able to list its contents, or create the tag if it doesn't exist. I have done the prerequisite beating of my head against the wall for three days now. I have found a way to add a UserTextInformationFrame, although I don't see the value, just the Owner. Please help me figure this out.
The following are lines of code that at least compile and seem to do something.
I need to get this to the point where I can add the needed tag.
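using TagLib.Id3v2; // for PrivateFrame
TagLib.File objFile = TagLib.File.Create(path);
TagLib.Id3v2.Tag id3v2tag = (TagLib.Id3v2.Tag)objFile.GetTag(TagLib.TagTypes.Id3v2, true);
if (id3v2tag != null)
{
// Get the private frame, create if necessary.
// With create == true, PrivateFrame.Get also adds the new frame to the tag,
// so a separate id3v2tag.AddFrame(frame) call would add it a second time.
PrivateFrame frame = PrivateFrame.Get(id3v2tag, "Mytag", true);
frame.PrivateData = System.Text.Encoding.Unicode.GetBytes("MyInfo");
}
// Write the changes back to the file.
objFile.Save();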
I have used mp3tag to see the tags I am needing by clicking on "extended tags".
Which type of tags would these be if I can add them using mp3tag? How do I read/write them using taglib?
To find the tag type, you can open the (.flac) audio file in a text editor like Notepad++ and search for your 'ReleaseGuid'. In front of this ID you will see the type, such as TXXX, PRIV or COMM.
Or you can look into the documentation (or source code) of the program that writes this 'ReleaseGuid' into your audio files.
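The text-editor approach can also be done in code. The sketch below is an illustration, not something TagLib provides: it reads the raw bytes and returns the bytes just before each case-insensitive ASCII match of the tag name, with non-printable bytes shown as dots the way an editor would display them. `TagScanner`, `ContextBefore` and the 16-byte window are hypothetical choices, and it assumes the file fits in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper mimicking the "open it in a text editor" approach.
public static class TagScanner
{
    // For every ASCII, case-insensitive occurrence of 'name' in 'data', returns the
    // preceding 'window' bytes plus the match, with non-printable bytes shown as '.'.
    public static List<string> ContextBefore(byte[] data, string name, int window = 16)
    {
        var hits = new List<string>();
        byte[] needle = Encoding.ASCII.GetBytes(name.ToUpperInvariant());
        for (int i = 0; i + needle.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < needle.Length && match; j++)
                match = char.ToUpperInvariant((char)data[i + j]) == (char)needle[j];
            if (!match) continue;

            var sb = new StringBuilder();
            for (int k = Math.Max(0, i - window); k < i + needle.Length; k++)
            {
                char c = (char)data[k];
                sb.Append(c >= ' ' && c <= '~' ? c : '.');
            }
            hits.Add(sb.ToString());
        }
        return hits;
    }
}
```

Usage might look like `foreach (string hit in TagScanner.ContextBefore(File.ReadAllBytes(path), "ReleaseGuid")) Console.WriteLine(hit);`, then reading the frame or field type from the printed prefix.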
I use a WebBrowser control to display generated XML. My XML string is loaded into the browser by calling NavigateToString:
var text = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
+ Environment.NewLine
+ "<whatever/>";
Browser.NavigateToString(text);
After the browser loads the string content, I try to search for any displayed text using the standard Ctrl+F search dialog, but it always shows the warning "No matches found".
If I save the XML string to a file and use Browser.Navigate(filename) it works.
Any ideas?
When you navigate to a file, the WebBrowser control performs MIME-type sniffing (often using the file extension as a hint). Then it creates an Active Document object of the corresponding type. Most often it's an instance of MSHTML Document, but can also be an XML, PDF or Word document, all of which support Active Document interfaces.
Now, when you navigate to a string with NavigateToString, the WebBrowser doesn't make any attempt to recognize the document type, and simply creates an instance of MSHTML Document (rather than XML Document), then tries to parse the content as HTML and fails.
I don't think you can get the desired behavior using NavigateToString, and I believe the same applies to NavigateToStream. To illustrate what's going on, take your XML content and save it as filename.html, filename.txt and filename.xml. Try opening each file with IE.
On a side note, when you navigate to a URL, the server actually has an option to suggest the MIME type, using HTTP headers. The browser may or may not tolerate such suggestion (it will still perform some validation checks).
The bottom line: you will not be able to render XML with NavigateToString or NavigateToStream. You're going to have to convert it to HTML first (e.g., with an XSLT transform).
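A minimal sketch of "convert it to HTML first" that swaps the XSLT transform for a simpler technique: HTML-encode the XML and wrap it in a `<pre>` element, so MSHTML receives a valid HTML page whose text (the markup, shown literally) should be searchable with Ctrl+F. `XmlPreview.ToHtml` is a hypothetical helper; unlike an XSLT stylesheet, it gives no tree view or styling.

```csharp
using System.Net;

// Hypothetical helper: turns an XML string into a small HTML page that shows the XML as text.
public static class XmlPreview
{
    public static string ToHtml(string xml) =>
        "<!DOCTYPE html><html><head><meta charset=\"utf-8\"></head><body><pre>"
        + WebUtility.HtmlEncode(xml) // escapes <, >, & and quotes so nothing is parsed as markup
        + "</pre></body></html>";
}
```

With the code from the question, this would be called as `Browser.NavigateToString(XmlPreview.ToHtml(text));`.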
I just had the same problem.
You can also open the XML file directly using this overload:
webbrowser.Navigate(string filepathToXML)
This way, the built-in search panel works like a charm.
I export data from my database to Word in HTML format from my web application, which works fine. I have inserted an image into a record, and the document displays the image as well. Everything works fine except when I save that file and send it to someone else: MS Word cannot find the linked image.
Is there any way to save the image in the document so that path issues don't arise?
Here is my code (strTitle contains all the HTML, including the image links):
string strBody = "<html>" +
"<body>" + strTitle +
"</body>" +
"</html>";
string fileName = "Policies.doc";
//object missing = System.Reflection.Missing.Value;
// You can add whatever you want to add as the HTML and it will be generated as Ms Word docs
Response.AppendHeader("Content-Type", "application/msword");
Response.AppendHeader("Content-disposition", "attachment; filename=" + fileName);
Response.Write(strBody);
You can create your HTML img tag with the image data encoded as base64. This way, the image data is contained in the HTML document itself.
<img src="..." />
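A minimal sketch of building such a tag in C#, assuming you have the image bytes (for example from the database or `File.ReadAllBytes`) and know their MIME type; `InlineImage.ImgTag` is a hypothetical helper. Whether a given Word version renders data URIs when opening HTML saved as .doc is worth checking before relying on it.

```csharp
using System;

// Hypothetical helper: embeds image bytes directly in an <img> tag as a base64 data URI.
public static class InlineImage
{
    public static string ImgTag(byte[] imageBytes, string mimeType) =>
        "<img src=\"data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes) + "\" />";
}
```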
Your images are probably only available via the file system (i.e. their src starts with file).
There are a few ways to fix this:
1. Make the images available via the internet: make sure their src starts with http and that they are hosted on a web server visible to the downloader (for example, the same server from which they are downloading the document).
2. Use a library (for example, search NuGet).
3. Inline the images, as @DevZer0 suggests.
Based on experience:
Option 1 is the simplest to implement but has some annoyances (the server needs to be available to the user).
Option 2 is probably the best way if you do a lot of Word or Office file manipulation.
Option 3 can be done and would solve the problem, although you wouldn't have a full library to support further use cases.
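For option 1, relative image paths stop working once the file leaves the server. A rough sketch, assuming the images are already served over HTTP(S), rewrites each `<img src>` to an absolute URL resolved against a base URL. `ImageLinks.MakeAbsolute` is a hypothetical helper; it uses a regular expression, which suits simple generated markup but is not a general HTML parser, and it leaves `file:` paths untouched because those must first be published somewhere.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: resolves every <img src="..."> value against baseUrl.
public static class ImageLinks
{
    public static string MakeAbsolute(string html, Uri baseUrl) =>
        Regex.Replace(
            html,
            "(<img\\b[^>]*?\\bsrc\\s*=\\s*\")([^\"]*)(\")",
            // Group 2 is the src value; Uri(Uri, string) keeps already-absolute URLs as they are.
            m => m.Groups[1].Value + new Uri(baseUrl, m.Groups[2].Value).AbsoluteUri + m.Groups[3].Value,
            RegexOptions.IgnoreCase);
}
```

In the ASP.NET code from the question, this could be applied as `strTitle = ImageLinks.MakeAbsolute(strTitle, Request.Url);` before building strBody.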
Use a Word document creation library if you really want flexibility in creating doc or docx files. Like all other popular document formats, the structure needs to be accurate enough for the program that opens the document. For example, you cannot create a PDF file just by setting the content type to "application/pdf" if your content is not structured the way a PDF reader expects. The content type only makes the browser identify the file type (incorrectly, in this case) and download it as a PDF, but it is actually plain text. The same goes for MS Word or any other format that requires a particular document structure to be parsed and displayed properly.
Since every picture or table is a shape in Word/Excel/PowerPoint, your program could add an AlternativeText to each picture that stores the URL from which the picture can be downloaded; later, when the document is opened by your program, it reads that URL back and replaces the picture.
foreach (NetOffice.WordApi.InlineShape s in docWord.InlineShapes)
{
if (s.Type==NetOffice.WordApi.Enums.WdInlineShapeType.wdInlineShapePicture && s.AlternativeText.Contains("|"))
{
s.AlternativeText = "<your website URL to download the picture>";
}
}
This is the C# approach, though it takes extra time per picture. If you write a small tool that replaces all pictures that have an AlternativeText, you can replace many pictures at the same time:
NetOffice.WordApi.InlineShape i=appWord.ActiveDocument.InlineShapes.AddPicture(s.AlternativeText, false, true);
It will look for the picture at that location.
You can do that for your whole document with the single loop above: if a shape is a picture and has some AlternativeText, call the AddPicture function inside the loop.
Edit: Another solution would be to set a hyperlink on your picture that points to an FTP server where the picture is located; when the recipient clicks the picture, it opens, so they can replace it themselves (bad if you have 200 pictures in your document).
Edit regarding the clipboard:
string html = Clipboard.GetText(TextDataFormat.Html);
File.WriteAllText(temporaryFilePath, html);
NetOffice.WordApi.InlineShape i=appWord.ActiveDocument.InlineShapes.AddPicture(temporaryFilePath, false, true);
Word's clipboard handling can transform given HTML: when you paste it, tables and pictures are converted into native Word content. This also works for Excel, but not for PowerPoint. You could do something similar for your pictures and drag and drop them from your database.
I'm sending emails that have invoices attached as PDFs. I'm already - elsewhere in the application - creating the invoices in an .aspx page. I'd like to use Server.Execute to return the output HTML and generate a PDF from that. Otherwise, I'd have to use a reporting tool to "draw" the invoice on a PDF. That blows for lots of reasons, not the least of which is that I'd have to update both the .aspx page and the report for every minor change. What to do...
There is no way to generate a PDF from an HTML string directly within .NET, but there are a number of third-party controls that work well.
I've had success with this one: http://www.html-to-pdf.net
and this: http://www.htmltopdfasp.net
The important questions to ask are:
Does it render correctly as compared to the 3 major browsers: IE, FF and Safari/Chrome?
Does it handle CSS fine?
Does the control have its own rendering engine? If so, bounce it. You don't want to trust a home-grown rendering engine; the browsers have a hard enough time getting everything pixel-perfect.
What dependencies does the third party control require? The fewer, the better.
There are a few others but they deal with ActiveX displays and such.
We use a product called ABCPDF for this and it works fantastically.
http://www.websupergoo.com/abcpdf-1.htm
This sounds like a job for Prince. It can take HTML and CSS and generate a PDF, which you can then present to your users. It supports CSS3 better than most web browsers (staff include Håkon Wium Lie, the inventor of CSS).
See the samples, especially the ones for Wikipedia pages, for the beautiful output it can generate. There's also an interesting Google Tech Talk with the authors.
Edit: There is a .NET wrapper available.
wkhtmltopdf is a free and handy executable for generating PDFs from HTML. It's written in C++, and NReco HtmlToPdf (the NReco.PdfGenerator package) is a .NET wrapper library for it. I implemented my solution with this library and it worked very well: it does everything on its own, and you just need to supply the HTML as the data source.
/// <summary>
/// Converts html into PDF using nReco dll and wkhtmltopdf.exe.
/// </summary>
private byte[] ConvertHtmlToPDF()
{
HtmlToPdfConverter nRecohtmltoPdfObj = new HtmlToPdfConverter();
nRecohtmltoPdfObj.Orientation = PageOrientation.Portrait;
nRecohtmltoPdfObj.PageFooterHtml = CreatePDFFooter();
nRecohtmltoPdfObj.CustomWkHtmlArgs = "--margin-top 35 --header-spacing 0 --margin-left 0 --margin-right 0";
return nRecohtmltoPdfObj.GeneratePdf(CreatePDFScript() + ShowHtml() + "</body></html>");
}
The function above is an excerpt from the post linked below, which explains it in detail.
HTML to PDF in ASP.Net
The initial question is about converting another aspx page containing an invoice to a PDF document. The invoice is probably using some session data and the user suggests to use Server.Execute() to obtain the invoice page HTML code and then to convert that code to PDF. Converting the invoice page URL directly is not possible because a new session would be created during conversion and the session data would be lost.
This is actually a good technique to preserve session data during conversion which is applied in Convert a HTML Page to PDF in Same Session ASP.NET Demo of the EvoPdf library. The complete C# code to get the HTML string rendered by the invoice page and to convert that string to PDF is:
// Execute the invoice page and get the HTML string rendered by this page
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
string htmlStringToConvert = outTextWriter.ToString();
// Create a HTML to PDF converter object with default settings
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
// Use the current page URL as base URL
string baseUrl = HttpContext.Current.Request.Url.AbsoluteUri;
// Convert the page HTML string to a PDF document in a memory buffer
byte[] outPdfBuffer = htmlToPdfConverter.ConvertHtml(htmlStringToConvert, baseUrl);
// Send the PDF as response to browser
// Set response content type
Response.AddHeader("Content-Type", "application/pdf");
// Instruct the browser to open the PDF file as an attachment or inline
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Convert_Page_in_Same_Session.pdf; size={0}", outPdfBuffer.Length.ToString()));
// Write the PDF document buffer to HTTP response
Response.BinaryWrite(outPdfBuffer);
// End the HTTP response and stop the current page processing
Response.End();
As long as you can make sure to use proper XHTML, you could also use a product like Alt-Soft's Xml2PDF to convert XML (XHTML) into PDF by means of XSLT/XSL-FO.
It takes a bit of a learning curve to master, but it works very well once you've "got" it!
Since you are producing the answer, you can use a tool like Report.NET:
http://sourceforge.net/projects/report/
I disagree with the answers that say you cannot convert the output directly to PDF, however: you can "re-call" the page, get the HTML as a stream, and convert it. I am not sure which tool you would use for that, though. In other words, it is possible, but I am not sure it is worth it. PDF creation libraries like Report.NET are easier, even though they force you to reuse some logic and offer no automagic conversion.
I have not tried this component, but I have heard good things about it from those who have. The model is more like HTML, but I am not sure you can simply send a rendered ASPX to it to create PDF:
http://www.websupergoo.com/abcpdf-8.htm
If you search Google for HTML-to-PDF software, you'll get a pile of results.
There are about 10 leaders, but most of them use IE DLLs in the background.
Just a couple of them use their own parsing engine.
Please try the PDF Duo .NET component in your ASP.NET project if you want to create PDFs programmatically.
It is a lightweight component for generating PDF invoices, reports, etc.
I'd go a different route. Assuming you are using SQL Server, use SSRS and generate the PDF that way.
A possible minimal solution that uses Server.Execute() to obtain the HTML of the invoice page and converts it to a PDF with the Winnovative HTML to PDF API for .NET is:
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
byte[] pdfBytes = htmlToPdfConverter.ConvertHtml(outTextWriter.ToString(),
HttpContext.Current.Request.Url.AbsoluteUri);
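You can use PDFSharp or iTextSharp to convert HTML to PDF. Note that PDFSharp itself is open source under the MIT license, whereas newer iTextSharp versions are licensed under the AGPL or commercially.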