How to convert Word documents (XML based) to PDF, with C#? - c#

I need to automate converting Word documents to PDF. From some research, I found that since Microsoft Office 2007, Word documents are XML based. I also found a free tool, Apache FOP, that converts XML to PDF; however, I still haven't found a way to automate it from C#. There is nFOP (a version that runs on the .NET Framework), but I couldn't find a detailed explanation of how to use it.

You could use docx4j.NET.
That's a .NET version of docx4j, a Java library that converts docx to PDF using FOP.
See ConvertOutPDF.java
Before you go to the effort of downloading etc, you might want to use the online demo to see whether the PDF output is close to your needs.
Disclosure: I lead the docx4j project.

An ugly solution would be to do a "Save As" using Microsoft Office interop...
Read more here
And find the related Stack Overflow post here

I have found a library that can convert XML to PDF, and vice versa, in C#/.NET: Aspose.PDF for .NET. I hope it solves your problem.

Related

Read word documents from C# and Display it Inline in browser

Q 1.
How can I read MS Word documents (doc and docx) from C# without MS Office installed? I was able to read unformatted text using a StreamReader. I think I can use OpenXML for docx, but what about doc? Is there an open source solution to handle it? Is using ole32.dll an option in an unlicensed scenario?
Is IFilter a solution? I haven't seen any samples using it anywhere, and I'm also not sure about its support in Windows 7 and 8.
EDIT: I stumbled upon this solution and found it acceptable for my situation
Q 2.
I need to display the doc and docx files in my web page inline, in a partial page, or even in an iframe. How is that possible? Is COM interoperability the only solution to that too?
Maybe you can use the redistributable Interop Assemblies from Microsoft to read your ".doc":
http://www.microsoft.com/en-us/download/details.aspx?id=3508
It doesn't require Office according to the description.

Open PDF and print to PDF programmatically C#

I am developing an application that is able to open and display PDFs after I open them and print them to another PDF using CutePDF, but the originals are not viewable.
I am looking for a way to programmatically open a PDF file, and print to another PDF file (not necessarily using CutePDF, just printing to another PDF is the desired functionality).
This will be integrated into a C# .NET project. Are there any suggestions how to go about doing this?
Thanks.
You could use Office Interop to generate the PDF. When you say "print to another PDF", I imagine you mean just generate one? Or do you mean spooling them to a PDF print driver that essentially just creates a PDF to be saved?
Use iText, which is available in Java and C# versions. I have used the Java version successfully. I recommend the iText in Action book to help you get up to speed with iText faster. The book discusses only the Java API, but I imagine you will be able to learn the principles of iText from the book and then figure out the minor differences for the C# version.
To implement this you can use the PDFFlow library for generating PDF files from C#. It has an easy, fluent syntax and many features.
Here are many examples of real complex PDF documents: examples
Good luck :)

How to create an online word processor, also [HTML/XML to .doc conversion on the server using .NET]

I'd like to know how to create an online word processor similar to Google Docs and MS Office Web Apps. I want to do it using Microsoft technologies and tools only. I'm a beginner in ASP.NET and C#. I plan to build the front end with TinyMCE [ http://tinymce.moxiecode.com/tryit/full.php ]. But how do I convert the data in the browser to .doc on the server? How can I format a .doc file on the server using .NET? What tools are available in .NET for this kind of project? Thanks in advance.
Use OpenXML to generate Word docs. This is of course for Word 2007/2010, not 2003. There is plenty of documentation on how to do it. You can reverse-engineer a Word doc by changing the extension from .docx to .zip, then extracting the files and viewing them in Notepad.
Thinking about it more, you might want to create an XSLT to translate the HTML markup to OpenXML. But this is a lot of work (one might already be available somewhere on the net), so you might try a third-party tool as suggested below.
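The rename-to-.zip trick can also be done from code. Here is a rough sketch that lists the parts inside a .docx and returns one of them as raw XML text, using only System.IO.Compression. The class name DocxPeek and its methods are made up for this example, and the file name in the usage note is a placeholder.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxPeek
{
    // Lists the part names stored in the package,
    // i.e. what you would see after renaming to .zip and extracting.
    public static string[] ListParts(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries.Select(e => e.FullName).ToArray();
        }
    }

    // Returns the raw text of one part, e.g. "word/document.xml".
    public static string ReadPart(Stream docx, string partName)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var entry = zip.GetEntry(partName);
            if (entry == null)
                throw new FileNotFoundException("Part not found: " + partName);

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

For a file on disk you would pass File.OpenRead("report.docx") (placeholder name) as the stream; the main body of the document normally lives in the word/document.xml part.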
There are a variety of third-party libraries, such as Aspose, that can do this.
I don't think you'll find any good free ones.
You can generate .docx (OpenXML) files using the OpenXML SDK.
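The SDK is the usual route. Purely to illustrate what such a file contains, here is a rough sketch that builds a bare-bones .docx with nothing but System.IO.Compression. The class MinimalDocx and its Write method are invented for this example. It writes only the three parts that, as far as I know, Word needs in order to open the package, with no styles, headers, or other extras a real document would carry.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

public static class MinimalDocx
{
    // Declares the content type of each part in the package.
    const string ContentTypes =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
        "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
        "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
        "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
        "</Types>";

    // Points the package at its main document part.
    const string Rels =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
        "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>" +
        "</Relationships>";

    // Writes a .docx with one plain, unstyled paragraph per input string.
    public static void Write(Stream output, params string[] paragraphs)
    {
        var body = new StringBuilder();
        foreach (var text in paragraphs)
        {
            body.Append("<w:p><w:r><w:t xml:space=\"preserve\">")
                .Append(SecurityElement.Escape(text)) // escape &, <, > and quotes
                .Append("</w:t></w:r></w:p>");
        }

        string document =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
            "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:body>" + body + "</w:body></w:document>";

        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddPart(zip, "[Content_Types].xml", ContentTypes);
            AddPart(zip, "_rels/.rels", Rels);
            AddPart(zip, "word/document.xml", document);
        }
    }

    static void AddPart(ZipArchive zip, string name, string xml)
    {
        // UTF-8 without a byte order mark, matching the XML declaration.
        using (var writer = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false)))
        {
            writer.Write(xml);
        }
    }
}
```

On a web server you could write straight to the response stream or to a MemoryStream. For anything beyond plain paragraphs, the SDK or a third-party library will save a lot of hand-written XML.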

Reading a Word 2007 table using C#

I am relatively new to Word 2007 programming. Pardon me if this question has already been asked. I would like to read a Word table and its child cells and extract that text in C# (VSTO tools). I would like to build XML from the extracted data later.
Please guide me if anyone has done something of this sort. I would really appreciate it.
Thank you.
Anjan
Unless used in backward compatibility mode, Word 2007 produces documents in the "Office Open XML Format", for which Microsoft provides a .NET library.
This MSDN article provides various pointers and snippets, in C#, on how to do this kind of thing. This walkthrough of the Word 2007 format may also be useful.
If you need to access older MS Word formats, you may be able to use, or take inspiration from, the text-mining open source project (Java).
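For the .docx case, here is a rough sketch of pulling a table out of word/document.xml with LINQ to XML and turning it into a small XML tree. It does not use VSTO or Microsoft's library. The class DocxTables, its method names, and the output element names (table, row, cell) are invented for this example. It only looks at rows and cells that sit directly under the table element, so rows wrapped in content controls would be missed.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxTables
{
    // WordprocessingML namespace for w:tbl, w:tr, w:tc, w:p and w:t.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Loads word/document.xml from a .docx file on disk.
    // Sketch only: assumes the part exists in the package.
    public static XDocument LoadDocumentXml(string docxPath)
    {
        using (var zip = new ZipArchive(File.OpenRead(docxPath), ZipArchiveMode.Read))
        using (var part = zip.GetEntry("word/document.xml").Open())
        {
            return XDocument.Load(part);
        }
    }

    // Converts the first table in the document to <table><row><cell>text</cell></row></table>.
    // Returns null when the document has no table.
    public static XElement FirstTableToXml(XDocument documentXml)
    {
        var table = documentXml.Descendants(W + "tbl").FirstOrDefault();
        if (table == null) return null;

        return new XElement("table",
            table.Elements(W + "tr").Select(row =>
                new XElement("row",
                    row.Elements(W + "tc").Select(cell =>
                        new XElement("cell", CellText(cell))))));
    }

    // Joins the paragraphs of one cell with newlines; each paragraph is the
    // concatenation of its w:t text runs.
    static string CellText(XElement cell)
    {
        return string.Join("\n",
            cell.Elements(W + "p").Select(p =>
                string.Concat(p.Descendants(W + "t").Select(t => t.Value))));
    }
}
```

Usage would be along the lines of DocxTables.FirstTableToXml(DocxTables.LoadDocumentXml("input.docx")), where input.docx is a placeholder file name.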

Is there a way to generate word documents dynamically without having word on the machine

I am planning on generating a Word document on the web server dynamically. Is there a good way of doing this in C#? I know I could script Word to do this, but I would prefer another option.
I once worked at a company that really wanted generated Word documents. In the end they were perfectly satisfied with RTF docs that had a ".doc" extension; Word has no problem recognizing and opening them.
The RTF docs were generated with iText.net (a free .NET library). The API is pretty easy to use and performs extremely well, you don't need Word on the machine, and you could extend it to generate PDF, HTML, and text docs in the future with very little effort. After four years the solution I created is still in place, so that's a little testimony in iText.net's favor.
It looks like the official iText page suggests that iText Sharp is the best .NET choice right now, so that's another option.
You'd be better off generating an RTF file, which Word will know how to open.
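As a rough idea of how little is needed for simple text, here is a hand-rolled sketch that builds an RTF string without any library. The class RtfWriter and its methods are invented for this example, and the font and size are arbitrary choices. It handles plain paragraphs only; formatting, tables, and images are where a library starts to pay off.

```csharp
using System.Text;

public static class RtfWriter
{
    // Escapes text for use inside an RTF document.
    public static string Escape(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);       // RTF control characters
            }
            else if (c == '\n')
            {
                sb.Append("\\line ");            // line break inside a paragraph
            }
            else if (c == '\r')
            {
                // dropped; '\n' already produces the line break
            }
            else if (c < 128)
            {
                sb.Append(c);                    // plain ASCII passes through
            }
            else
            {
                // Non-ASCII: \uN with N as a signed 16-bit value, then a
                // fallback character for readers that ignore \u.
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            }
        }
        return sb.ToString();
    }

    // Builds a complete RTF document with one paragraph per input string.
    public static string Build(params string[] paragraphs)
    {
        // Header: RTF version 1, ANSI, one font (Arial), 12 pt (\fs24 is in half-points).
        var sb = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs24 ");
        foreach (var p in paragraphs)
        {
            sb.Append(Escape(p)).Append(@"\par ");
        }
        return sb.Append('}').ToString();
    }
}
```

The result contains only ASCII, so it can be saved with File.WriteAllText and Encoding.ASCII under a .rtf name, or a .doc name if that is what users expect.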
If you want to generate Office 2007 documents, check out the Open XML file formats; they're simply zipped XML files. See these links:
Open XML File Formats: What is it, and how can I get started?
Introducing the Office (2007) Open XML File Formats
Edit: Check out this project; it can serve as a good starting point:
DocumentMaker
It seems very simple and customizable; look at this code snippet:
// Assumes doc is an existing DocumentMaker document instance.
Paragraph p = new Paragraph();
p.Runs.Add(new Run("Text can have multiple format styles, they can be "));
p.Runs.Add(new Run("bold and italic",
    TextFormats.Format.Bold | TextFormats.Format.Italic));
doc.Paragraphs.Add(p);
Word will quite happily open an HTML file with a .doc extension. If you include an internal style sheet, you can have it fully formatted. There was a previous post on this subject:
Export to Word Document in C#
Creating the old .DOC files (pre-Word 2007) is nigh-impossible without Word itself. The format is just too complex. Microsoft has released the format description, but it's enough to reduce a grown programmer to tears. There is a reason for that too (historical), but that doesn't make things better.
The new .DOCX would be easier, although still quite a bit of hassle. However, depending on which Word versions you are targeting, there are some other options too.
For one, there is the classic .RTF. The format is pretty complex still, yet well documented and has strong support across many applications and platforms. And you might use some string-replacing into template files to make things easier (it's non-binary).
Then there are the "old" Word XML files. I think they worked starting with Word XP. Kinda the predecessors of .DOCX. I've used them, not bad. And the documentation is pretty OK.
Finally, the easy way that I would choose is to make a simple HTML file. Word can load HTML files just fine starting with version 2000. In the simplest case, just change the extension of an HTML file to .DOC and you have it. You can also add a few Word-specific tags and comments to make it look even better in Word. Use Word's Save As...HTML option to see what they are.
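A rough sketch of the HTML route, for what it's worth. The class HtmlDoc, the style rules, and the file name in the usage note are all made up for this example. It just wraps a body fragment in a page with an internal style sheet, and leaves out the Word-specific tags mentioned above.

```csharp
using System.Net;

public static class HtmlDoc
{
    // Wraps an HTML fragment in a complete page with an internal style sheet.
    // bodyHtml is inserted as-is, so it must already be valid, trusted HTML.
    public static string Build(string title, string bodyHtml)
    {
        return "<html><head>" +
               "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">" +
               "<title>" + WebUtility.HtmlEncode(title) + "</title>" +
               "<style>body { font-family: Arial; font-size: 11pt; } " +
               "h1 { font-size: 16pt; }</style>" +
               "</head><body>" + bodyHtml + "</body></html>";
    }
}
```

Saving the result with File.WriteAllText("report.doc", html, Encoding.UTF8) (placeholder file name) gives a file that Word should open as described above. Depending on the Word version, it may ask for confirmation because the content does not match the extension.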
There are third party libraries about that will do the job.
Doing a quick google came up with this one, for example.
I haven't tried any, so I can't give you specific advice, I'm afraid!
Let us know how you get on...
In Office 2007, Microsoft introduced a new file format called Office Open XML (.docx). This format is not compatible with older versions of Microsoft Word. Since it is XML, you can create or read it without having Word installed.
Here is a component that generates documents based on a custom template. The documents are generated from a SharePoint list, so the data is pulled from the list item into the document on the fly:
http://store.sharemuch.com/products/generate-word-documents-from-sharepoint-list
Hope that helps,
Yaroslav Pentsarskyy
Blog: www.sharemuch.com
