SelectPDF missing content after conversion

So I am using SelectPDF to convert an MVC view's html to PDF, but for some reason, a great deal of the content is missing from the PDF.
I have tested the program on other sites I have made in MVC and they work...
As per the readme's recommendation, I use this code to produce a PDF:
SelectPdf.HtmlToPdf converter = new SelectPdf.HtmlToPdf();
SelectPdf.PdfDocument doc = converter.ConvertUrl(HTMLPath);
//SelectPdf.PdfDocument doc = converter.ConvertHtmlString(ViewHTML);
doc.Save(PDFSavePath);
doc.Close();
Here HTMLPath is the location of the HTML file created from the rendered MVC view, while ViewHTML contains the same HTML as a string.
So this is how the HTML looks (the file and the view look identical, I did double check in case the HTML was captured incorrectly):
But this is how the PDF looks:
Absolutely tragic...
Not sure if it will be helpful, but here's the HTML:
https://drive.google.com/open?id=0B8DiACLG11oYd3p5Tzc2ZlJQLVk
Unfortunately, all the HTML is on one line thanks to the MVC View to HTML

It seems that the HTML/CSS engine used to render the page does not support the vh and vw units for font sizes. This is a common issue with HTML-to-PDF converters, so you should probably change those units if you need the page converted to PDF.
Later update: it seems that vw is supported, but vh is not. That is why the "each" words appear: they use vw in the CSS.
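One possible workaround, given that vh seems to be the unsupported unit, is to rewrite vh lengths to px in the HTML string before handing it to the converter. The sketch below is only an illustration: the class name ViewportUnitRewriter, the method ReplaceVhWithPx and the viewport height you pass in are assumptions, not part of SelectPdf, and the regex is naive (it rewrites any number followed by "vh" anywhere in the string, including in visible text).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of SelectPdf.
public static class ViewportUnitRewriter
{
    // Matches a number immediately followed by "vh", e.g. "5vh", "2.5vh" or ".5vh".
    private static readonly Regex VhPattern =
        new Regex(@"(?<value>\d*\.?\d+)vh\b", RegexOptions.IgnoreCase);

    // Replaces every vh length with the equivalent px length for an
    // assumed viewport height (1vh = 1% of the viewport height).
    public static string ReplaceVhWithPx(string html, double viewportHeightPx)
    {
        return VhPattern.Replace(html, match =>
        {
            double vh = double.Parse(match.Groups["value"].Value, CultureInfo.InvariantCulture);
            double px = vh / 100.0 * viewportHeightPx;
            return px.ToString("0.##", CultureInfo.InvariantCulture) + "px";
        });
    }
}
```

With the code from the question, the rewritten string could then be passed to converter.ConvertHtmlString(ViewportUnitRewriter.ReplaceVhWithPx(ViewHTML, 1024)), where 1024 is just an assumed viewport height in pixels.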

Related

Including Large Static HTML File with Image in MVC View

I'm converting a desktop application that hosts HTML content into an online application. I have various large pieces of prebuilt static HTML that need to be included in an MVC page depending on user actions. Each of the static HTML pages includes at least one img tag that references a file that, in a Web Forms page, would be located in the same directory. Here's a very simplified example:
Static html:
<html>
<!-- Large chunk of html -->
<img src="logo.gif" >
<!-- More html -->
</html>
SampleController:
Dim html as String = GetFileContentAsString("~/Content/Sample/Static.html")
ViewBag.StaticHTML = PrepareHTMLContent(html)
View:
@Html.Raw(ViewBag.StaticHTML)
The result of the above is a page (e.g. http://localhost:12345/Sample) with a broken image link in the middle of the HTML. I'm already preprocessing the html where possible to strip out useless tags and insert Javascript and CSS links but preprocessing the image paths is unreliable because they could be anywhere in the static html and are quite likely to be inconsistent or otherwise quirky.
So how can I place (or create) the image file in the right location for the static html to pick it up? Is there any other option (bearing in mind that I also need to link CSS and JavaScript files and that the static html has a bunch of other files associated with it that need to be kept in a single location)?
Or is there a way to define or override the location the dynamic MVC page is built?

The easiest thing would be to have /Sample return an HTML page that simply loads ~/Content/Sample/Static.html into an iframe, so that the browser resolves relative paths in Static.html against /Content/Sample/.

How to write rich text to word document generated from htm file in C#

I am trying to generate a word doc from saved HTML file using an Open XML library.
If the HTML file does not contain an image I can simply use the code below and write text content to word doc.
HtmlDocument doc = new HtmlDocument();
doc.Load(fileName); //fileName is the Htm file
string Detail = string.Empty;
string webData = string.Empty;
HtmlNode hcollection = doc.DocumentNode.SelectSingleNode("//body");
Detail = hcollection.InnerText;
But if the HTML file contains an embedded image I am struggling to include that image in the word doc.
Using hcollection.InnerText only writes the text part and excludes the image.
When I use
HtmlNode hcollection = doc.DocumentNode.SelectSingleNode("//body");
Detail = hcollection.InnerHtml;
all the HTML tags get written to the Word doc, along with the path of the image in the tag:
<table border='0' width='100%' cellpadding='0' cellspacing='0' align='center'>
<tr><td valign='top' align="left">
<div style='width:100%'><div id="div_img">
<div>
<img src="http://www.myweb.com/web/img/2013/07/18/img_1.jpg">
<span>Sample Text</span></div></div><br><br>Sample Text Content here<br><br> </div></td></tr></table>
How can I remove the HTML tags so that, instead of the path being shown like
<img src="http://www.myweb.com/web/img/2013/07/18/img_1.jpg">
the corresponding picture is loaded?
Please help.

You'll need to look at the HTML and translate it to OpenXML somehow.
I've used the open-source HtmlToOpenXml library, and that works well enough. It should handle images (inline, local or remote) and correctly insert them into the OpenXML document. I recently submitted a patch which was accepted, so the project is still somewhat active.
There are some limitations with the library though:
JavaScript (<script>), CSS (<style>), <meta> and other unsupported tags do not generate an error but are ignored.
It does handle inline style information, but it entirely ignores other CSS, which was something I needed. I ended up integrating some simple parsing of a single <style> element from another open-source project (jsonfx, using MIT license).
Note: handling multiple <style> elements, downloading CSS files, sorting out which style rules have precedence -- these are all problems which I did not address.

Converting an HTML document to MS Word is actually a very complex task, and there are many cases besides IMG tags that need to be solved. The differences between the Open XML and HTML formats are substantial.
If I were you I would look for third-party tools for this. It would be cheaper to pay for them than to spend weeks investigating and learning all aspects of the task, writing the code, and then fixing multiple bugs.
Personally, I used the Aspose.Words library for this. It worked perfectly fine, but you may want to try another one.

how to convert part of html page into pdf using wkhtmltopdf or something else

I am using wkhtmltopdf to convert an HTML page to PDF from C# code, and it works perfectly, but I want to convert only a particular part of the page, for example by specifying the div id of that part or by some similar method. How can I do this?
Please help

You can just create a new HTML page with just your div as a body:
<html>
<head><title>...</title></head>
<body>
<!-- PLACE YOUR DIV HERE -->
</body>
</html>
Since you already know how to produce a PDF from an HTML file, you now know how to produce a PDF from a DIV ;)
BUT: Of course the CSS is going to be all wonky, since the selectors won't match anymore etc. This is a problem for the general case, but I'm guessing in your specific case, you can make that work.
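A rough sketch of that idea in C#, assuming the HtmlAgilityPack NuGet package is available (it is not something wkhtmltopdf ships with). The class DivExtractor and the method BuildPageFromDiv are hypothetical names. Copying the original <head> keeps the stylesheet links, but selectors that depend on the div's former ancestors will still stop matching.

```csharp
using System;
using HtmlAgilityPack; // NuGet package, assumed to be installed

// Hypothetical helper: builds a minimal page around one element of a larger page.
public static class DivExtractor
{
    public static string BuildPageFromDiv(string pageHtml, string divId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(pageHtml);

        // Look up the element by its id attribute.
        HtmlNode div = doc.GetElementbyId(divId);
        if (div == null)
        {
            throw new ArgumentException("No element with id '" + divId + "' was found.", nameof(divId));
        }

        // Reuse the original <head> (title, stylesheet links) when there is one.
        HtmlNode head = doc.DocumentNode.SelectSingleNode("//head");
        string headHtml = head != null ? head.OuterHtml : "<head><title></title></head>";

        return "<html>" + headHtml + "<body>" + div.OuterHtml + "</body></html>";
    }
}
```

The returned string can be saved to a temporary .html file and fed to wkhtmltopdf the same way you already convert the full page.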

ASP.NET Which HTML editor can do everything I want?

I have tried the standard AJAX HTML editor from here (http://www.asp.net/ajaxlibrary/act.ashx) and I have also tried to work with FCKeditor (from http://ckeditor.com/).
But neither does everything I need. I'll call the standard AJAX control A and FCKeditor F.
With editor A it is impossible to get your HTML text into the HTML content; you can only get it into the Design content. (This code doesn't do the job: string htmlContentStr = Editor1.Content.)
With F it is possible to get it into the HTML content (it does this by default), but getting your changes back as HTML is impossible. (This code doesn't do the job: string htmlContentStr = FCKeditor1.Value.)
So what I need is an HTML editor where I can put HTML text into the HTML content, a user can make changes in the Design content, and after the changes are made I can get the HTML content back and store it in a string or a database.
Is this possible or do I need a commercial one to get this feature?
If my question isn't clear, please let me know.
Thnx

I've used XStandard quite easily and it let me manipulate the HTML. I didn't bother using it as a control, but just read and wrote (escaped) the HTML where needed into the asp output.

Easiest way to display XML on an ASP.NET page

I have some XML in an XmlDocument, and I want to display it on an ASP.NET page. (The XML should be in a control; the page will have other content.) Right now, we're using the Xml control for that. Trouble is, the XML displays with no indentation. Ugly.
It appears that I'm supposed to create an XSLT for it, but that seems kind of boring. I'd rather just throw it into a control and have it automagically parse the XML and indent correctly. Is there an easy way to do that?

You could try to use XmlWriter/XmlTextWriter with indentation turned on (XmlWriterSettings.Indent for XmlWriter, or the Formatting and Indentation properties for XmlTextWriter), write to a StringBuilder or MemoryStream, and output the result inside a <pre> tag.
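A minimal sketch of the writer approach, using XmlWriter with XmlWriterSettings.Indent and a StringBuilder. The class and method names (XmlPrettyPrinter, ToIndentedString, ToPreBlock) are made up for illustration, and the two-space indent is an arbitrary choice.

```csharp
using System;
using System.Net;
using System.Text;
using System.Xml;

// Hypothetical helper for showing an XmlDocument with indentation.
public static class XmlPrettyPrinter
{
    // Serializes the document with line breaks and two-space indentation.
    public static string ToIndentedString(XmlDocument document)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",
            OmitXmlDeclaration = true // skip the <?xml ...?> line in the output
        };

        var builder = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(builder, settings))
        {
            document.Save(writer);
        }
        return builder.ToString();
    }

    // HTML-encodes the indented XML and wraps it in <pre> so the browser
    // shows the markup literally and preserves the whitespace.
    public static string ToPreBlock(XmlDocument document)
    {
        return "<pre>" + WebUtility.HtmlEncode(ToIndentedString(document)) + "</pre>";
    }
}
```

On the page, the result of ToPreBlock could be assigned to the Text of a Literal control, for example. The HTML encoding step matters: without it the browser would try to interpret the XML tags instead of displaying them.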

A quick (and dirty) way of doing this would be to use an IFrame.
In truth, an XSLT is the "ideal" way for formatting an XML for display. Another option would be to parse it manually for display.
To use an Iframe:
ASPX side:
<iframe runat="server" id="myXMLFrame" src="~/MyXmlFile.xml" />
Code Side:
myXMLFrame.src = Page.ResolveClientUrl("~/MyXmlFile.xml")

You can find a slightly modified version of the XSLT that IE uses to transform XML to HTML when viewing in IE at http://www.dpawson.co.uk/xsl/sect4/N10301.html#d15977e117.
I have used it in a WebBrowser control in a WinForms application, and it works like a charm. I have not tested it in Firefox/Chrome/Safari/Opera, though.
