HTML to Image .tiff File - c#

Is there a way to convert an HTML string into a .tiff image file?
I am using C# .NET 3.5. The requirement is to give the user an option to fax a confirmation. The confirmation is created with XML and an XSLT. Typically it is e-mailed.
Is there a way I can take the HTML string generated by the transformation and convert it to a .tiff, or to any image that can be faxed?
3rd party software is allowed; however, the cheaper the better.
We are using a 3rd party fax library that will only accept .tiff images, but if I can get the HTML into any image format I can convert it into a .tiff.

Here are some free-as-in-beer possibilities:
You can use the PDFCreator printer driver, which comes with Ghostscript, and print directly to a TIFF file or many other formats.
If you have MS Office installed, the Microsoft Office Document Image Writer will produce a file you can convert to other formats.
But in general, your best bet is to print to a driver that will produce an image file of some kind or a Windows Metafile (.wmf) file.
Is there some reason why you can't just print-to-fax? Does the third-party software not support a printer driver? That's unusual these days.

A starting point might be the software of WebSuperGoo, which provides rich image editing products, some cheap and some free.
I know for sure their PDF Writer can do basic HTML (http://www.websupergoo.com/helppdf6net/source/3-concepts/b-htmlstyles.htm). The resulting PDF should not be too hard to convert to TIFF.
It does not support full HTML or CSS, though. That might require using Microsoft's IE ActiveX component.
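The IE-based route mentioned above can be sketched in C# roughly as follows. This assumes a Windows machine with references to System.Windows.Forms and System.Drawing; the class name `HtmlToTiff`, its method names, and the fixed page size are illustrative only. `DrawToBitmap` is inherited from `Control` and is not officially supported on the `WebBrowser` control, so it can produce a blank image for some pages. Treat this as a starting point rather than a finished solution.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Threading;
using System.Windows.Forms;

public static class HtmlToTiff
{
    // Encodes any GDI+ image as TIFF into a seekable stream.
    public static void SaveAsTiff(Image image, Stream output)
    {
        image.Save(output, ImageFormat.Tiff);
    }

    // Renders an HTML string with the IE-based WebBrowser control and writes a TIFF file.
    // width and height are an assumed fixed page size in pixels.
    public static void Render(string html, string tiffPath, int width, int height)
    {
        Exception error = null;
        Thread worker = new Thread(delegate()
        {
            try
            {
                using (WebBrowser browser = new WebBrowser())
                {
                    browser.ScrollBarsEnabled = false;
                    browser.ScriptErrorsSuppressed = true;
                    browser.Size = new Size(width, height);
                    browser.DocumentText = html;

                    // Pump messages until the document has loaded (give up after 30 s).
                    DateTime deadline = DateTime.Now.AddSeconds(30);
                    while (browser.ReadyState != WebBrowserReadyState.Complete)
                    {
                        if (DateTime.Now > deadline)
                            throw new TimeoutException("HTML did not finish loading.");
                        Application.DoEvents();
                        Thread.Sleep(10);
                    }

                    using (Bitmap bitmap = new Bitmap(width, height))
                    using (FileStream file = File.Create(tiffPath))
                    {
                        browser.DrawToBitmap(bitmap, new Rectangle(0, 0, width, height));
                        SaveAsTiff(bitmap, file);
                    }
                }
            }
            catch (Exception ex)
            {
                error = ex; // rethrown on the calling thread below
            }
        });
        worker.SetApartmentState(ApartmentState.STA); // WebBrowser requires an STA thread
        worker.Start();
        worker.Join();
        if (error != null)
            throw new InvalidOperationException("Rendering failed.", error);
    }
}
```

A call might look like `HtmlToTiff.Render(html, @"C:\temp\confirmation.tiff", 800, 1100);` (the path and size are placeholders). Fax libraries may expect a specific TIFF flavour such as black-and-white with fax compression, so check what yours accepts before relying on the default encoder settings.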

Related

open or convert webarchive File in c#

I am trying to find a way to open or convert a webarchive file to any other format in C#. The goal is an automated import system with as few restrictions on file type as possible. I cannot seem to find any way of converting the file other than using Safari to open it.
Unfortunately, what you are looking for cannot really be done. A webarchive is a proprietary file type made by Apple for displaying offline web pages in Safari. It is a combination of XML, HTML, and binary data. There are, however, examples in Objective-C that convert a webarchive to a zip archive containing the HTML and the embedded images/media of the page that was saved into the webarchive file.
Here is an Objective-C example from GitHub - WebArchiveExtractor
As for converting to PDF, I'm not sure that can be done; you would be better off printing the web page to PDF in the first place and then uploading that to your document management system.
Apparently, though, the webarchive file type contains XML with binary-encoded images/media, similar to an MHTML file, so you may be able to figure out the format by viewing files in a text editor and then writing a conversion utility. There is very limited information on the web about the internal schema of the webarchive format, so this may be a daunting task. However, since WebKit is open source, you can look at its code for creating an archive and try to reverse it to build your converter. Here's the source code (in C++) for the archiving features in Safari, which actually looks like it is using MHTML, but I haven't explored deeply enough to tell whether it's exactly the same format: http://trac.webkit.org/browser/trunk/Source/WebCore/loader/archive
Good Luck!

.NET graphic libraries to display images (pdf, .docx and any other format of image) in the browser

I am developing an ASP.NET MVC application where users are able to upload files to a repository. Those files could be PDF, DOC, any type of image, and so on.
When the user selects a file to be imported, I would like to display the file in the browser so they can review its contents before the upload.
I know I could use some sort of IFrame to display a PDF, but I am looking for specific classes or .NET libraries to implement this feature.
I just need a pointer in the right direction.
This is an extremely difficult problem. There are some libraries that can help. For instance, PDF files can be rendered to images with Ghostscript. Word and Excel files can be converted to PDF or to an image with a number of libraries. None of them, AFAIK, are very good at it, so I cannot recommend one.
You could automate MS Office to perform the conversion to PDF, but that is decidedly not safe for server code. Another possibility is to convert the source documents to SWF files (as FlexPaper does) and display them in Flash. There are some great libraries out there, but this will limit your supported clients. SharePoint provides some of this capability as well. Others have used OpenOffice to convert MS Office documents, but at a loss of quality.
I can't really advise any specific direction as it is highly dependent on what you/your company is willing to spend and the desired results. Good luck.
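As an illustration of the Ghostscript option mentioned in this answer, here is a hedged sketch that shells out to the Ghostscript console program to render each PDF page as a PNG. The class and method names (`PdfPreview`, `BuildArguments`, `RenderPages`) are made up for this example, and the path to the Ghostscript executable is something you have to supply (for example `gswin32c.exe` on Windows). The switches used are standard Ghostscript options.

```csharp
using System;
using System.Diagnostics;

public static class PdfPreview
{
    // Builds the Ghostscript command line that renders each PDF page to a PNG file.
    // outputPattern should contain %03d, which Ghostscript replaces with the page number.
    public static string BuildArguments(string pdfPath, string outputPattern, int dpi)
    {
        return string.Format(
            "-dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r{0} \"-sOutputFile={1}\" \"{2}\"",
            dpi, outputPattern, pdfPath);
    }

    // Runs Ghostscript and throws if it exits with a non-zero code.
    // ghostscriptExe is the path to the console binary; this sketch does not locate it for you.
    public static void RenderPages(string ghostscriptExe, string pdfPath, string outputPattern, int dpi)
    {
        ProcessStartInfo info = new ProcessStartInfo(
            ghostscriptExe, BuildArguments(pdfPath, outputPattern, dpi));
        info.UseShellExecute = false;
        info.CreateNoWindow = true;

        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    "Ghostscript exited with code " + process.ExitCode);
        }
    }
}
```

The generated PNG files can then be served to the browser as ordinary images. Running an external process from a web application has its own permission and load implications, so this is only suitable if you control the server.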
You could try to rely on Windows and the Explorer thumbnails for it, like here, but then you'd have to make sure that:
You can abuse the server in the most elaborate way (install software, talk to the shell from ASP.NET).
You have a thumbnail provider installed on the server for every type that you want to preview. I guess that as soon as you can see the thumbnail in Explorer, you're set. So for PDF, you might need to install the PDF reader from Adobe.
Docx files are saved with the thumbnail option checked (see link). There seems to be no other easy, free way to convert a docx to a thumbnail. The "best" solution I came across was to re-save the file automatically somehow, making sure the thumbnail option is checked.
I don't want to say that's impossible, but it can't be done with finite effort.
What you are asking for is a browser-based solution, because you want the user to be able to "review" the document before uploading.
Therefore you cannot use a server side solution, which is essentially what is being asked by referring to a ".Net library".
.Net libraries are dependent on an installed version of .Net, which does not exist in all versions for all operating systems for which graphical browsers exist.
Next, recent changes in browser security no longer allow scripts to read the full client-side file name of the file selected in the input field.
You'd have to rely on HTML5 and its FileReader to access the file's byte stream, but even then you can only get an image out of image files (see sample).
Excluding browser-based solutions in Flash, ActiveX, Java, due to browser and platform support, this leaves JavaScript as the only "reasonable" solution: you'd need a library for each supported format to either convert a file into an image in an image format supported by browsers, or extract the text(+image) representation of a file.
Great answers... I just want to share the result of my research: I found a nice client-side solution supported by Mozilla Labs. It is a framework based on HTML5 and JavaScript, with no native code needed.
Here is the project website:
https://github.com/mozilla/pdf.js
This is what it is capable of:
http://mozilla.github.com/pdf.js/web/viewer.html
And finally, a great video explaining how everything works:
http://www.youtube.com/watch?v=Iv15UY-4Fg8&noredirect=1
Regarding my question, we are going to convert every possible file to PDF on the server and then render the PDF using this framework.
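The post does not say which tool performs the server-side conversion to PDF. One possible approach, sketched here purely as an assumption, is to call LibreOffice (a descendant of the OpenOffice mentioned in an earlier answer) in headless mode. `OfficeToPdf` and its methods are hypothetical names, and the path to `soffice` must be supplied by you.

```csharp
using System;
using System.Diagnostics;

public static class OfficeToPdf
{
    // Builds the LibreOffice command line for a headless conversion to PDF.
    // Do not end outputDir with a backslash: it would escape the closing quote.
    public static string BuildArguments(string inputPath, string outputDir)
    {
        return string.Format(
            "--headless --convert-to pdf --outdir \"{0}\" \"{1}\"",
            outputDir, inputPath);
    }

    // Runs soffice and throws if it exits with a non-zero code.
    // The PDF is written to outputDir under the input file's base name.
    public static void Convert(string sofficeExe, string inputPath, string outputDir)
    {
        ProcessStartInfo info = new ProcessStartInfo(
            sofficeExe, BuildArguments(inputPath, outputDir));
        info.UseShellExecute = false;
        info.CreateNoWindow = true;

        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    "soffice exited with code " + process.ExitCode);
        }
    }
}
```

As the earlier answer warns, conversions done this way can lose some fidelity compared with the original Office document, so it is worth checking the output for the document types you care about.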

How HTML to PDF works (specially abcPDF)

My new project is converting HTML into PDF on the fly, given a URL.
I searched a lot at the start and came up with a solution that converts the HTML to an image and then the image to PDF.
But it's not an ideal solution, as the user cannot copy and paste from the PDF file.
Recently I came across the abcPDF component; you can check their demo here: http://www.abcpdfeditor.com/
Now I am wondering how they are able to produce such a nice PDF with all these features. What is the logic behind it? I don't think they parse each and every HTML tag to create the document. Do you guys have any idea?
Any help will be much appreciated.
In short, this is how most HTML to PDF conversion works.
HTML ----Converted To ----> EMF (Metafile/Vector Image) ----> PDF
Basically, IE's rendering engine (i.e., MSHTML) has some APIs through which you can export a loaded HTML page as EMF (Enhanced Metafile format), which is nothing but a vector image.
You can make use of this open-source web browser control for this purpose.
http://groups.google.com/group/csexwb
Then you have to render the generated EMF file onto a PDF page. This is typically called EMF to PDF conversion. As far as I know, there is no free EMF to PDF conversion software available, but iTextSharp provides minimal support for the WMF format.

How can I convert PDF to doc without microsoft.office.interop?

I need to convert PDF files into .doc files using C#. The computer has no file system though it doesn't have Office installed. Any good ideas on how I can approach this? I did some research, and most people use the Office interop services.
You need to understand that PDF is not really implemented as a single document format.
If your PDF docs are created by rendering text to a PDF file, then direct PDF conversion is not only possible, but can be very good (reliable).
If the source of your PDF is either a scanner or a fax (essentially a scanner...), then what you have is a document with a "picture" of text. This scenario is more difficult to deal with. If you open up the markup for such a file, there is no 'text' to be converted. In this situation you have to use some manner of OCR (optical character recognition), which is less reliable due to a variety of issues.
If you have the option of intercepting the data before it is rendered to PDF (say like in SSRS or Crystal) then it would be better for you to bypass the PDF stage and move your data to a Word document.
If you are constrained to receiving faxes and then needing to interpret their content, prepare for OCR hell. It has been a while since I was there, so I hope that it has gotten better.
Even without Office installed on your machine, you have access (with Visual Studio) to the Office developer toolkit, which will allow you to build documents to be distributed in the Word formats (.doc/.docx).
Another option may be to convert the PDF to HTML, which can then be opened in Word.
Use Aspose PDF Kit to convert the PDF to text, and then write the text to a .doc file using a FileStream or Aspose doc.

How to convert PDF to Excel in C#?

I want to read tables which are in a PDF document and I want to store these values in a Database.
What I have found so far through searching the web:
Read text from the PDF using abcPDF .NET, which is available as freeware. But it's not the right solution because I want to read the tables.
Convert the PDF document into Excel/Word. Tables will come through in the target document as they are. Word conversion is possible using EasyPDF Converter, a third-party tool that is much cheaper than the other tools that convert PDF into Excel.
But I am looking for any other solution/API classes which can convert PDF into Excel.
There are 2 possible solutions
a) Cometdocs offers a free online conversion from PDF to XLS that is surprisingly good; it sends the result file to your e-mail.
b) Cognview is commercial shareware that converts PDF to XLS. There are OCR and text versions. I haven't used it personally, but it has good recommendations.
If you are looking to upload your data into a database, converting your PDFs to CSV is probably the safest option. The PDFTables API will allow you to do this with C#, converting as many PDFs at once as necessary. https://pdftables.com/pdf-to-excel-api#csharp
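Once the tables are in CSV form, getting them into a database is mostly a matter of parsing rows. The following is a minimal sketch, not tied to the PDFTables API: `CsvLoader` is a hypothetical helper that reads CSV text into a `DataTable`, assuming the first line holds column names and that no field contains a line break.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Text;

public static class CsvLoader
{
    // Splits one CSV line into fields; supports double-quoted fields with "" as an escaped quote.
    public static List<string> SplitLine(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"') inQuotes = false;
                else current.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Length = 0; // reset the buffer (works on .NET 3.5)
            }
            else current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Reads CSV text into a DataTable, using the first line as column names.
    public static DataTable Load(TextReader reader)
    {
        DataTable table = new DataTable();
        string line = reader.ReadLine();
        if (line == null) return table;
        foreach (string name in SplitLine(line)) table.Columns.Add(name);

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines
            List<string> fields = SplitLine(line);
            DataRow row = table.NewRow();
            for (int i = 0; i < table.Columns.Count && i < fields.Count; i++)
                row[i] = fields[i];
            table.Rows.Add(row);
        }
        return table;
    }
}
```

For SQL Server, the resulting `DataTable` can be passed to `SqlBulkCopy.WriteToServer`; other databases would need their own insert logic. All values are loaded as strings here, so type conversion is left to the caller.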
You can try Quablo, a PDF table extractor available at this web page.
