Possibility to convert HTM file to PDF - c#

Is there a way to convert an HTM file to a PDF? As I understand it, the .html and .htm file extensions denote the same format. With that in mind, I tried the following code using Spire, but my output was a blank PDF.
if (filelist[f].EndsWith(".htm"))
{
    PdfDocument doc = new PdfDocument();
    string filext = System.IO.Path.GetExtension(filelist[f]);
    string outputDocName = filelist[f].Replace(filext, ".pdf");
    doc.SaveToFile(outputDocName);
    doc.Close();
}
I have searched on Google but couldn't find much on converting an HTML file to a PDF. I have even looked into Python with ImageMagick, but that involves multiple steps, so I will only try it once I run out of options. Is iTextSharp a possibility? Do I need to convert the HTM file to another file type before turning it into a PDF, or does what I am trying to do not exist?

HTML (with file extension of .htm or .html) is in a sense a plain text file which needs parsing and rendering to produce "visual" output. So ImageMagick or similar tools will not work if they have no concept of rendering HTML.
If this is a one-off requirement - get a PDF printer driver (for example CUPS PDF) and just "print" the pages from your browser.
If this needs to be an automated process, my personal suggestion would be PhantomJS. But search and Google are your friends; see "Converting HTML to PDF using PHP?"
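For what it's worth, the snippet in the question saves a newly created PdfDocument without ever loading the HTML into it, which would explain the blank output. Below is a minimal sketch of the automated route, assuming PhantomJS is installed and that you use the rasterize.js example script that ships with it. The class name HtmToPdf and the parameters phantomExe and scriptPath are hypothetical names for illustration, not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class HtmToPdf
{
    // Same folder and file name as the input, with a .pdf extension.
    // Path.ChangeExtension only touches the extension, whereas
    // string.Replace would replace every ".htm" occurring in the path.
    public static string PdfPathFor(string htmlPath)
    {
        return Path.ChangeExtension(htmlPath, ".pdf");
    }

    // Builds the arguments for: phantomjs rasterize.js <input> <output> A4
    // scriptPath is assumed to point at PhantomJS's rasterize.js example.
    public static string BuildArguments(string scriptPath, string htmlPath)
    {
        // PhantomJS loads pages by URL, so pass the local file as a file:// URI.
        string inputUri = new Uri(Path.GetFullPath(htmlPath)).AbsoluteUri;
        return string.Format("\"{0}\" \"{1}\" \"{2}\" A4",
            scriptPath, inputUri, PdfPathFor(htmlPath));
    }

    // Runs PhantomJS and returns its exit code.
    public static int Convert(string phantomExe, string scriptPath, string htmlPath)
    {
        var info = new ProcessStartInfo(phantomExe, BuildArguments(scriptPath, htmlPath))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Inside the loop from the question you would call something like HtmToPdf.Convert("phantomjs", "rasterize.js", filelist[f]) for every file ending in ".htm", and treat a non-zero exit code as a failed conversion.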

Related

C# Trying to get all fields from a PDF using ITextSharp returns 0 fields

I have a PDF document that was produced by converting an XSL-FO document to PDF with fonet, an XSL-FO to PDF rendering engine.
To convert the XSL-FO document to PDF, I followed the steps explained here.
Now I am trying to get all the fields in the PDF file using iTextSharp, following the approach explained here. However, af.Fields returns 0 fields. Why?
fonet does not support forms.
If you were to search through their documentation for the term 'form' you would find exactly 1 match.
In the sentence:
A variety of overloaded forms of Render are available.
If however you are already using iText, then why not simply use a tool like apache XSLT to convert your XSL-FO into HTML? Then you can use pdfHTML which converts HTML to PDF.
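As a rough sketch of the XSLT step, .NET's built-in XslCompiledTransform can apply a stylesheet that emits HTML instead of XSL-FO. This assumes you have (or can write) such a stylesheet for your source XML; the class name XmlToHtml and the tiny stylesheet in the usage note are made up for illustration. The resulting HTML string is what you would then hand to an HTML to PDF converter.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XmlToHtml
{
    // Applies an XSLT stylesheet (passed as a string) to an XML document
    // (also a string) and returns the transformed output as text.
    public static string Transform(string xml, string stylesheet)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // No XSLT parameters are needed here, hence the null argument list.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

For example, a hypothetical stylesheet with xsl:output method="html" and a single template that wraps /doc/title in an h1 element would turn <doc><title>Hi</title></doc> into a small HTML page.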

How HTML to PDF works (specially abcPDF)

My new project involves converting HTML into PDF on the fly, given a URL.
I searched a lot at the start and came up with a solution that converts the HTML to an image and then puts the image into a PDF.
But that is not an ideal solution, because the user cannot copy and paste text from the PDF file.
Recently I came across the abcPDF component; you can check their demo here: http://www.abcpdfeditor.com/
Now I am wondering how they are able to produce such a nice PDF with all these features. What is their logic? I don't think they parse each and every HTML tag to create the document. Do you have any idea?
Any help will be much appreciated.
In short, this is how most HTML to PDF conversion works.
HTML ----Converted To ----> EMF (Metafile/Vector Image) ----> PDF
Basically, IE's rendering engine (i.e., MSHTML) has some APIs through which you can export a loaded HTML page as EMF (Enhanced Metafile Format), which is nothing but a vector image.
You can make use of this open-source web browser control for this purpose:
http://groups.google.com/group/csexwb
Then you have to render the generated EMF file onto a PDF page. This is typically called EMF to PDF conversion. As far as I know, there is no free EMF to PDF conversion software available, but iTextSharp provides minimal support for the WMF format.

converting doc to html and pdf

How can we convert a doc file into an HTML file and a PDF file using C# in a web application?
I am making a web application, and I want to convert a doc file into HTML and PDF as soon as the user clicks the relevant button.
If you want to convert a Word doc to pure HTML, you will need to run a function that strips all of the garbage characters from your Word document. You would effectively be recreating your own basic CMS.
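To give an idea of what such a stripping function might look like, here is a minimal sketch. It assumes the document has already been saved as HTML by Word ("Save as Web Page") and only removes a few common kinds of Word-specific markup; the class name WordHtmlCleaner and the chosen patterns are illustrative, not a complete cleaner.

```csharp
using System.Text.RegularExpressions;

public static class WordHtmlCleaner
{
    // Removes some typical Word-specific markup from HTML that Word produced.
    // Regex-based cleaning is only a rough approach; real documents may need
    // a proper HTML parser and more rules than these four.
    public static string Clean(string html)
    {
        // Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
        html = Regex.Replace(html, @"<!--\[if[\s\S]*?<!\[endif\]-->", "",
            RegexOptions.IgnoreCase);

        // Office namespace tags such as <o:p>, </o:p>, <v:shape>, <w:WordDocument>
        html = Regex.Replace(html, @"</?[ovw]:[^>]*>", "", RegexOptions.IgnoreCase);

        // class attributes such as class=MsoNormal or class="MsoNormal"
        html = Regex.Replace(html, @"\s+class=""?Mso[^""\s>]*""?", "",
            RegexOptions.IgnoreCase);

        // Quoted inline style attributes that contain mso- properties
        html = Regex.Replace(html, @"\s+style=""[^""]*mso-[^""]*""", "",
            RegexOptions.IgnoreCase);

        return html;
    }
}
```

Note that the last rule drops the whole style attribute as soon as it contains one mso- property, which is a deliberate simplification: it also discards any ordinary CSS declared in the same attribute.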
If you want to convert a Word doc to PDF, you will need to run Microsoft Office on a Microsoft server, which gets quite complex, and buying licences for using Office on the server is expensive.
If you simply want to upload a Word doc to a server, you can display it online using Google Doc Viewer without any conversion. Here is a short article on how to embed a document on a website. All that is required is a small amount of code:
<iframe src="http://docs.google.com/gview?url=http://infolab.stanford.edu/pub/papers/google.pdf&embedded=true" style="width:600px; height:500px;" frameborder="0"></iframe>
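If the document URL comes from user uploads rather than being hard-coded, it is safer to URL-encode it before putting it into the url query parameter. A small sketch of a helper that generates the same iframe markup follows; the class name DocViewer and its parameters are hypothetical, and the viewer address is simply the one from the snippet above.

```csharp
using System;

public static class DocViewer
{
    // Builds the embed markup for a publicly reachable document URL.
    public static string IframeFor(string documentUrl, int width, int height)
    {
        // Encode the document URL so characters such as & or ? inside it
        // are not mistaken for part of the viewer's own query string.
        string encoded = Uri.EscapeDataString(documentUrl);

        // &amp; is the HTML-escaped form of & inside an attribute value.
        return string.Format(
            "<iframe src=\"http://docs.google.com/gview?url={0}&amp;embedded=true\" " +
            "style=\"width:{1}px; height:{2}px;\" frameborder=\"0\"></iframe>",
            encoded, width, height);
    }
}
```

In a page you would write the returned string into the markup, for example DocViewer.IframeFor(uploadedFileUrl, 600, 500), where uploadedFileUrl is whatever public URL your application assigns to the uploaded file.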

How can I convert PDF to doc without microsoft.office.interop?

I need to convert PDF files into .doc files using C#. The computer has no file system though it doesn't have Office installed. Any good ideas how I can approach this? I did some research, and most people use the interop services.
You need to understand that PDF is not really implemented as a single document format.
If your PDF docs are created by rendering text to a PDF file, then direct PDF conversion is not only possible, but can be very good (reliable).
If the source of your PDF is either a scanner or a fax (essentially a scanner...), then what you have is a document with a "picture" of text. This scenario is more difficult to deal with. If you open up the markup for this, there is no 'text' to be converted. In this situation you have to deal with some manner of OCR (optical character recognition), which is less reliable due to a variety of issues.
If you have the option of intercepting the data before it is rendered to PDF (say like in SSRS or Crystal) then it would be better for you to bypass the PDF stage and move your data to a Word document.
If you are constrained to receiving faxes and then needing to interpret their content, prepare for OCR hell. It has been a while since I was there, so I hope that it has gotten better.
Even without Office installed on your machine, you have access (with Visual Studio) to the Office developer toolkit, which will allow you to build documents to be distributed in the Word formats (.doc/.docx).
An option/idea may be to convert the PDF to HTML, which can be opened in Word?
Use the Aspose PDF kit to convert the PDF to text, and then write the text to a doc using a FileStream or Aspose doc.

Convert .doc and .txt format files into a PDF file for ASP.NET?

I have searched Google for a way to convert .doc and .txt files into PDF files, but I have not found a suitable answer or code.
I am looking for a command-line tool or converter code that converts files in these two formats into PDF.
When a user uploads a .txt or .doc file, my application should convert it into a .pdf file, so I need conversion code.
After the conversion, clicking the file should open the PDF in the browser.
I would appreciate help, code, and an explanation of how to do this.
This topic (actually both topics: PDF generation, and sending a PDF stream or file in the response) has been discussed quite a lot on SO. Just do a quick search for "asp net pdf generation".
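For the second half (getting the browser to open the PDF rather than download it), the key is the pair of response headers. A minimal sketch that only computes them, so it does not depend on any particular ASP.NET version; the class name PdfResponse is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PdfResponse
{
    // Returns the headers that ask the browser to display a PDF inline.
    public static Dictionary<string, string> InlineHeaders(string pdfPath)
    {
        // Use only the file name, and drop quotes so the header stays well-formed.
        string name = Path.GetFileName(pdfPath).Replace("\"", "");

        return new Dictionary<string, string>
        {
            { "Content-Type", "application/pdf" },
            // "inline" asks the browser to show the file; "attachment" would
            // prompt a download instead.
            { "Content-Disposition", "inline; filename=\"" + name + "\"" }
        };
    }
}
```

In a classic ASP.NET page you would apply these with Response.ContentType and Response.AddHeader, and then send the file with Response.TransmitFile(pdfPath). Whether the PDF is actually rendered in the browser still depends on the user's browser and its PDF settings.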
Take a look at this article on Code Project, http://www.codeproject.com/KB/cs/convertdocintootherformat.aspx, which shows how to convert a doc to RTF using C#. You should be able to adapt it to convert to text.
