Converting doc to HTML and PDF - C#

How can we convert a .doc file into HTML and PDF files using C# in a web application?
I am building a web application and want to convert a .doc file into HTML and PDF as soon as the user clicks the corresponding button.

If you want to convert a Word doc to pure HTML, you will need to run a function that strips all of the garbage characters from the Word document. In effect, you would be recreating your own basic CMS.
If you want to convert a Word doc to PDF, you will need to run Microsoft Office on a Microsoft server, which gets quite complex, and licences for using Office on the server are expensive.
If you simply want to upload a Word doc to a server, you can display it online using Google Doc Viewer without any conversion. Here is a short article on how to embed a document on a website. All that is required is a short piece of code:
<iframe src="http://docs.google.com/gview?url=http://infolab.stanford.edu/pub/papers/google.pdf&embedded=true" style="width:600px; height:500px;" frameborder="0"></iframe>
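If the iframe is generated server-side for each uploaded document, the document URL has to be percent-encoded before it goes into the viewer's query string. A minimal sketch, assuming the same gview endpoint as in the snippet above (served over https here, which is an assumption); the class and method names are made up for illustration:

```csharp
using System;

static class DocViewer
{
    // Builds the Google Docs Viewer URL for a publicly reachable document.
    // The endpoint is taken from the snippet above; https is an assumption.
    public static string BuildViewerUrl(string documentUrl)
    {
        // The document URL is a query-string value, so it must be percent-encoded.
        return "https://docs.google.com/gview?url="
            + Uri.EscapeDataString(documentUrl)
            + "&embedded=true";
    }

    // Wraps the viewer URL in an iframe. Inside an HTML attribute the
    // separator "&" is written as "&amp;".
    public static string BuildIframe(string documentUrl, int width, int height)
    {
        string src = BuildViewerUrl(documentUrl).Replace("&", "&amp;");
        return string.Format(
            "<iframe src=\"{0}\" style=\"width:{1}px; height:{2}px;\" frameborder=\"0\"></iframe>",
            src, width, height);
    }

    static void Main()
    {
        Console.WriteLine(
            BuildIframe("http://infolab.stanford.edu/pub/papers/google.pdf", 600, 500));
    }
}
```

The viewer fetches the file itself, so the document URL has to be reachable from the public internet; a file behind a login will not render.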

Related

Possibility to convert HTM file to PDF

Is there a way to convert an HTM file to a PDF? Based on my understanding, the HTML and HTM file extensions are the same. With that in mind, I tried the following code using Spire, but my output was a blank PDF.
if (filelist[f].EndsWith(".htm"))
{
    PdfDocument doc = new PdfDocument();
    string filext = System.IO.Path.GetExtension(filelist[f]);
    string outputDocName = filelist[f].Replace(filext, ".pdf");
    doc.SaveToFile(outputDocName);
    doc.Close();
}
I have searched on Google but couldn't find much on converting an HTML file to a PDF. I have even looked into Python with ImageMagick, but that involves multiple steps, so I will try it only once I run out of options. Is iTextSharp a possibility? Do I need to convert the HTM file to another file type before turning it into a PDF, or does what I am trying to do not exist?
HTML (with file extension of .htm or .html) is in a sense a plain text file which needs parsing and rendering to produce "visual" output. So ImageMagick or similar tools will not work if they have no concept of rendering HTML.
If this is a one-off requirement, get a PDF printer driver (for example CUPS PDF) and just "print" the pages from your browser.
If this needs to be an automated process, my personal suggestion would be PhantomJS. Searching also turns up related questions, for example "Converting HTML to PDF using PHP?".
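For the automated route, one way to drive PhantomJS from C# is to start it as an external process. A rough sketch, assuming PhantomJS is installed and that the rasterize.js script from its examples folder is available; the default paths in Main are placeholders, not real locations:

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class HtmlToPdf
{
    // Runs: phantomjs rasterize.js <url> <output.pdf> A4
    // and returns the process exit code (0 normally means success).
    public static int Render(string phantomJsExe, string rasterizeScript,
                             string inputUrl, string outputPdf)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = phantomJsExe,
            // Quote each argument so paths containing spaces survive.
            Arguments = string.Format("\"{0}\" \"{1}\" \"{2}\" A4",
                                      rasterizeScript, inputUrl, outputPdf),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    static void Main(string[] args)
    {
        // Placeholder paths (assumptions); adjust to the local installation.
        string phantomJs = args.Length > 0 ? args[0] : "phantomjs";
        string script = args.Length > 1 ? args[1] : "rasterize.js";
        string htmFile = args.Length > 2 ? args[2] : "input.htm";

        // A local .htm file is passed to PhantomJS as a file:// URL.
        string url = new Uri(Path.GetFullPath(htmFile)).AbsoluteUri;
        string pdf = Path.ChangeExtension(Path.GetFullPath(htmFile), ".pdf");

        Console.WriteLine("Exit code: " + Render(phantomJs, script, url, pdf));
    }
}
```

Because the page is rendered by a real browser engine, CSS and images are applied before the PDF is written, which is the part that tools without an HTML renderer cannot do.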

editable word document attachment

A common scenario is letting the end user attach a file (MS .doc). The file is stored in the DB as binary, and when the user accesses the attachment later, we let them download it. Here I want to offer a feature where the user can open the .doc file with a click, edit it, and save it without downloading it.
.doc is a binary format and not easy to work with - a library such as Aspose, as mentioned by Christian, is definitely the way to go.
However, if .DOCX is acceptable (and that's Office 2007 and higher), then you can achieve what you want in three steps:
1. Convert .docx to HTML (see "Convert Word to HTML then render HTML on webpage").
2. Display the HTML using any rich text control of your choice (see "What is the best rich textarea editor for jQuery?").
3. Finally, convert the HTML back to .docx (see "Convert Html to Docx in c#").
You would have to "reinvent" Microsoft Office Online (look into your SkyDrive account). I am not sure whether there are any "out of the box" libraries for that, but you could build a basic editing app by leveraging Aspose Words (or some other library). Even so, that would be far from simple.
Link to aspose: http://www.aspose.com/.net/word-component.aspx
Word will only open files that are locally stored. What you are looking for is something similar to editing items that SharePoint provides using the WebDAV interface.
You may be able to use this approach to support your requirement. You should be cautious about the security aspects of the solution unless you have fully authenticated access to the shared folder on the server.
I am not sure if a standalone MS Word document editor exists. However, this can be done using a combination of rich text formatting and converting tools (for example, the DevExpress ASPxHtmlEditor + Document Server):
Load binary data from a DB;
Import loaded data (MS Word content) as HTML content into the ASPxHtmlEditor;
Edit imported data via the WYSIWYG ASPxHtmlEditor;
Convert the edited HTML back to MS Word content;
Save the converted / edited MS Word content back to the DB.
I believe it is possible to do something like this if you have such products (or free or commercial equivalents) in your project.

c# manipulate doc and save as pdf

I have read a lot of questions about converting doc files to PDF, but I haven't found an answer that solves my problem.
I tried ASPOSE, which is really good for what we want, but it is really expensive and my boss doesn't want to spend a lot of money.
I need to open a docx file, manipulate it, and save it as PDF. My boss doesn't want the system to save the file as docx and then convert it to PDF.
Does anyone have a simple solution for that?
Thank you in advance.
PS: We have the abcpdf and asppdf components, but I didn't find any documentation about opening a PDF file and saving it as doc.
If your boss wants to open a .DOC and save as .PDF then maybe Word or Word automation will help.
Newer versions of Microsoft Word are able to produce PDFs.
EDIT
Here are some links to sample code:
How do I convert Word files to PDF programmatically? (see accepted answer)
Word Doc to PDF Conversion. Command line using VBScript and automation
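For reference, the Word automation route looks roughly like this. It is only a sketch: it assumes Word 2007 or later is installed on the machine running the code and that the project references Microsoft.Office.Interop.Word; the class name and file paths are placeholders.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class WordToPdf
{
    // Opens a .doc/.docx in Word, lets the caller's changes happen in memory,
    // and exports the result straight to PDF without saving the Word file.
    public static void Export(string inputPath, string outputPath)
    {
        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(inputPath);

            // Manipulate the document here (bookmarks, find/replace, ...).

            doc.ExportAsFixedFormat(outputPath, Word.WdExportFormat.wdExportFormatPDF);
        }
        finally
        {
            // Close without saving, then shut Word down so no
            // WINWORD.EXE process is left behind.
            if (doc != null) doc.Close(false);
            app.Quit();
        }
    }

    static void Main()
    {
        // Placeholder paths (assumptions).
        Export(@"C:\temp\input.docx", @"C:\temp\output.pdf");
    }
}
```

Since the edits stay in memory and only the PDF is written, this avoids the intermediate "save as docx" step. Keep in mind that Microsoft does not recommend unattended, server-side automation of Office, so this fits a desktop tool or a low-volume job better than a busy web server.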
You can use iTextSharp to read and manipulate the content, then use the Open XML SDK to create a Word document from what was read.
Open XML SDK:
http://openxmldeveloper.org/
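The Word-writing half of that suggestion can be sketched as follows. This assumes the Open XML SDK (the DocumentFormat.OpenXml assembly) is referenced; the class name, output file name, and sample text are illustrative only.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class TextToDocx
{
    // Writes each string as its own paragraph into a new .docx file.
    public static void Write(string outputPath, string[] paragraphs)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Create(outputPath, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.AddMainDocumentPart();

            var body = new Body();
            foreach (string text in paragraphs)
            {
                body.AppendChild(new Paragraph(new Run(new Text(text))));
            }

            mainPart.Document = new Document(body);
            mainPart.Document.Save();
        }
    }

    static void Main()
    {
        // Sample content (assumption); in practice this would be the text read from the PDF.
        Write("output.docx", new[] { "First paragraph.", "Second paragraph." });
    }
}
```

This produces .docx (Office 2007 and later), not the older binary .doc format, and it does not need Word to be installed.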

How can I convert PDF to doc without microsoft.office.interop?

I need to convert PDF files into .doc files using C#. The computer has no file system though it doesn't have Office installed. Any good ideas on how I can approach this? I did some research, and most people use the interop services.
You need to understand that PDF is not really implemented as a single document format.
If your PDF docs are created by rendering text to a PDF file, then direct PDF conversion is not only possible, but can be very good (reliable).
If the source of your PDF is either a scanner or a fax (essentially a scanner...), then what you have is a document with a "picture" of text. This scenario is more difficult to deal with. If you open up the markup, there is no 'text' to be converted. In this situation you have to use some form of OCR (optical character recognition), which is less reliable due to a variety of issues.
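A quick way to tell which of the two cases you are in is to try extracting text and see whether anything comes back. A small sketch, assuming iTextSharp 5.x is referenced (any PDF library with text extraction would do); the class name and input path are placeholders:

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class PdfTextCheck
{
    // Returns true if at least one page yields extractable text.
    // False suggests a scanned/faxed PDF that will need OCR.
    public static bool HasTextLayer(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                if (!string.IsNullOrWhiteSpace(text))
                {
                    return true;
                }
            }
            return false;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main(string[] args)
    {
        // Placeholder input path (assumption).
        string path = args.Length > 0 ? args[0] : "input.pdf";
        Console.WriteLine(HasTextLayer(path)
            ? "Text layer found: direct conversion is feasible."
            : "No text found: this looks like a scan, so OCR is needed.");
    }
}
```

This is only a heuristic: some scanned PDFs carry a hidden OCR text layer, and some text PDFs use fonts that extract poorly.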
If you have the option of intercepting the data before it is rendered to PDF (say like in SSRS or Crystal) then it would be better for you to bypass the PDF stage and move your data to a Word document.
If you are constrained to receiving faxes and then needing to interpret their content, prepare for OCR hell. It has been a while since I was there, so I hope that it has gotten better.
Even without Office installed on your machine, you have access (with Visual Studio) to the Office developer toolkit, which will allow you to build documents to be distributed in the Word formats (.doc/.docx).
An option/idea may be to convert the PDF to HTML, which can then be opened in Word.
Use the Aspose PDF kit to convert the PDF to text, and then the text to doc using a FileStream or Aspose doc.

How do I output a webpage that contains MathML to PDF?

My web application displays MathML embedded in HTML using the MathPlayer plugin. I need to output to PDF. I have PDF components (Dynamic PDF, ABCpdf), but they don't know how to parse the MathML, of course.
Is there a library that can help me translate the MathML to an image or something that I can feed to the PDF components on the fly in the web application?
Design Science has a command line Windows executable (also available as a DLL) that will convert all of the MathML in a document to EPS for use in PDF. It's the Document Composer, which is part of the MathFlow SDK. Contact us if you're interested in more info or an evaluation.
FYI, I have also found another PDF component that supports MathML called AHFormatter. I have not tried it, but it apparently works very well.
