In my program I generate reports as FlowDocument objects and display them with the DocumentViewer control.
Now I need to add more export options. I use iTextSharp to export to PDF, and I can save to XPS natively. Can I save a document directly to an Office format such as DOC or XLS? Or does anyone know of a good library for converting from PDF or XPS to DOC or XLS?
I found a solution. Since I can't export to DOC from WPF automatically, I reproduced my page layout with the DocX library. It is a simple and really good library that doesn't require MS Office to be installed to create Word 2007/2010 files.
I'm not sure if you're still looking for an answer, so I'll be brief. You can use the Microsoft Interop assemblies to create Word documents. It's no trivial task, but in my opinion it's easier than using iTextSharp. They come with Visual Studio.
To create XPS documents, you'll need to generate FixedDocument objects from your FlowDocument, but from there, it's only a few lines of code. Eric Sink has a nice article that you can find here. This is also mentioned in this question posted here.
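As a rough illustration of the "few lines of code", here is a minimal sketch that writes a FlowDocument to an XPS file through its paginator. The class and method names (XpsExport, Save) are made up for this example, and it assumes a WPF project that references ReachFramework and System.Printing. It is not taken from the linked article, so treat it as one possible approach rather than the method described there.

```csharp
using System.IO;
using System.Windows.Documents;
using System.Windows.Xps;
using System.Windows.Xps.Packaging;

// Hypothetical helper, not part of WPF itself.
public static class XpsExport
{
    // Writes a FlowDocument to an XPS file by paginating it into fixed pages.
    public static void Save(FlowDocument document, string xpsPath)
    {
        using (var xps = new XpsDocument(xpsPath, FileAccess.Write))
        {
            XpsDocumentWriter writer = XpsDocument.CreateXpsDocumentWriter(xps);

            // A FlowDocument exposes its paginator through IDocumentPaginatorSource.
            DocumentPaginator paginator =
                ((IDocumentPaginatorSource)document).DocumentPaginator;

            // Each page produced by the paginator becomes a fixed page in the XPS file.
            writer.Write(paginator);
        }
    }
}
```

The page size and margins come from the FlowDocument itself (PageWidth, PageHeight, PagePadding), so you will probably want to set those before exporting.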
Related
My customer gave me some Word and PowerPoint documents which specify how certain 'reports' generated by our product are supposed to look.
That means I need to modify those documents (replace placeholders, etc.) and then export them as PDF.
How would you solve this problem in C#?
TL;DR: Editing the office document is no problem at all, but exporting that document to PDF (using Interop) allegedly causes issues when running it as a web server application. That's the whole problem here.
I agree that Interop is not suitable for document manipulation in a server environment. I would approach this problem by preparing MS Word template documents with placeholders for the data. Then I would use C# to load the data for the reports and merge it with the templates to get the final documents (DOCX, PDF, XPS or various image formats). There are third-party toolkits which make this quite easy. Here is the code one such toolkit needs to merge XML data with the template and produce a PDF document:
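using System.Xml.Linq; // XElement
// DocumentGenerator and DocumentGenerationResult come from the third-party toolkit, not from .NET.
XElement customers = XElement.Load("Customers.xml"); // load the report data
DocumentGenerator dg = new DocumentGenerator(customers);
DocumentGenerationResult result = dg.GenerateDocument("MyTemplate.docx", "MyReport.pdf"); // merge data into the template, write the PDF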
You can, of course, also use free libraries and SDKs based on OpenXML, but you should expect a steep learning curve, lots of debugging and a lot of time invested.
wkhtmltopdf might be an option.
A completely different "report approach" would be to save those Office documents with the placeholders as MHT (that's MHTML, a web archive format). This can be done directly in MS Office or even programmatically.
The placeholders can then easily be replaced by a string search and replace. The MHT files can be used directly to show the report instead of the PDF. A clear disadvantage of the MHT format is the HTML formatting: with PDF you get clear, fixed positioning.
We are using this kind of report creation. There are some flaws, but it works, and the customer can edit the MHT templates directly by right-clicking and choosing Open With for the preferred MS Office application.
You can use a report generator such as FastReport.Net to solve this problem. It can assign different data to placeholders and also allows export to PDF.
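To give an idea of what the OpenXML route involves, here is a minimal sketch of the placeholder-replacement step using the Open XML SDK. The class name DocxPlaceholders and the placeholder syntax are invented for this example. It only covers the merge into a DOCX file, since the SDK has no rendering, so producing the PDF still needs a separate tool.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper built on the Open XML SDK.
public static class DocxPlaceholders
{
    // Replaces each placeholder key with its value in the main document body.
    // Limitation: Word may split a placeholder across several runs, in which
    // case no single Text element contains it and it is left untouched.
    public static void Fill(string docxPath, IDictionary<string, string> values)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, true))
        {
            Document document = doc.MainDocumentPart.Document;

            foreach (Text text in document.Descendants<Text>())
            {
                foreach (KeyValuePair<string, string> pair in values)
                {
                    if (text.Text.Contains(pair.Key))
                    {
                        text.Text = text.Text.Replace(pair.Key, pair.Value);
                    }
                }
            }

            document.Save(); // write the modified XML back into the package
        }
    }
}
```

The run-splitting limitation mentioned in the comment is a good example of the debugging effort to expect: a placeholder that looks like one word in Word can be stored as several fragments in the XML. Headers, footers and text boxes live in other parts and would need the same treatment.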
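The search-and-replace step could look roughly like the sketch below. The class name MhtTemplate and the [[...]] placeholder syntax are assumptions made for this example, not something prescribed by the MHT format. Note that Office typically saves MHT with quoted-printable encoding, which wraps long lines, so short ASCII-only placeholders are the safest choice.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: plain string substitution on a template.
public static class MhtTemplate
{
    // Replaces every occurrence of each placeholder key with its value.
    // Values are inserted verbatim, so they should already be HTML-safe.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        string result = template;
        foreach (KeyValuePair<string, string> pair in values)
        {
            result = result.Replace(pair.Key, pair.Value);
        }
        return result;
    }

    // Reads the template file, fills it, and writes the finished report.
    public static void FillFile(string templatePath, string reportPath,
                                IDictionary<string, string> values)
    {
        File.WriteAllText(reportPath, Fill(File.ReadAllText(templatePath), values));
    }
}
```

For example, calling MhtTemplate.Fill("<p>Dear [[Customer]]</p>", values) with a dictionary that maps "[[Customer]]" to "Contoso" returns "<p>Dear Contoso</p>".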
Using the technique in this answer, I successfully embedded the contents of an RTF file into an existing Word DOCX file, using OpenXML 2.5. Or so I thought.
We've now discovered that while the created file works fine in MS Word and Word Online, the document displays without the RTF content in other viewers, such as:
Google Docs preview functionality
Windows Phone 8.1 (which has Office functionality built in)
Various iOS and Android viewers
In all cases, the document displays completely correctly except that the RTF content is just missing.
I did think it might be an issue in the viewers rather than the DOCX file, but for several tools to have the same issue makes me suspect it is a bug in our code.
It's a bit of an obscure case so trying to figure out the problem is proving difficult.
The technique you used (altChunk) relies on the viewer to convert the RTF content into WordML.
As you've discovered, many don't do this.
To avoid this issue, you really have to convert the RTF content in your own code.
My objective is to make an automated server-side process to turn a .ppt into a .pdf. Microsoft themselves suggested that I use OpenXML, and now I'm looking at that.
My question is:
Can I actually achieve my objective using OpenXML?
I'm having a hard time finding the methods that I'd expect, such as "save as" here
Or perhaps I'm just misunderstanding how it all works?
... to turn a .ppt into a .pdf. Microsoft themselves suggested that I use OpenXML ... Can I actually achieve my objective using OpenXML?
For the conversion of a .ppt into .pdf? I'm curious to see where you have read this ;-)
No, it's simply not possible with the OpenXML SDK:
The OpenXML SDK lets you create and modify OpenXML documents (.pptx in the case of PowerPoint), but here you are talking about .ppt (the legacy binary format).
There is NO method for converting to PDF. The OpenXML SDK lets you retrieve, create and modify the content of a document without an Office application, but it DOES NOT contain any methods to render it, nor any Office application methods such as SaveAs() ...
A common way to convert Office documents to PDF is to use Office Interop.
The thread "How do I convert Word files to PDF programmatically?" is about Word, but it can help you: it works the same way with PowerPoint.
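For PowerPoint, the Interop route might look like the sketch below. It assumes PowerPoint (a version with PDF export) is installed on the machine and that the project references the Microsoft.Office.Interop.PowerPoint and office assemblies. The class name PptToPdf is made up. Keep in mind that Microsoft does not recommend unattended, server-side automation of Office, so this is an illustration of the API rather than a recommendation for a server process.

```csharp
using Microsoft.Office.Core;
using PowerPoint = Microsoft.Office.Interop.PowerPoint;

// Hypothetical helper around PowerPoint automation.
public static class PptToPdf
{
    // Opens a presentation in PowerPoint and saves it as PDF.
    // Use absolute paths: PowerPoint resolves relative paths against
    // its own working directory, not the caller's.
    public static void Convert(string pptPath, string pdfPath)
    {
        var app = new PowerPoint.Application();
        try
        {
            PowerPoint.Presentation presentation = app.Presentations.Open(
                pptPath,
                MsoTriState.msoTrue,   // ReadOnly
                MsoTriState.msoFalse,  // Untitled
                MsoTriState.msoFalse); // WithWindow: do not show a window
            try
            {
                presentation.SaveAs(
                    pdfPath,
                    PowerPoint.PpSaveAsFileType.ppSaveAsPDF,
                    MsoTriState.msoTriStateMixed); // EmbedTrueTypeFonts: default
            }
            finally
            {
                presentation.Close();
            }
        }
        finally
        {
            app.Quit(); // otherwise POWERPNT.EXE keeps running in the background
        }
    }
}
```

Since PowerPoint itself opens the file, this works for both .ppt and .pptx inputs.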
I have been reading a lot of questions about converting DOC files to PDF, but I haven't found any answer that solves my problem.
I tried Aspose, which is really good for what we want, but it is really expensive and my boss doesn't want to spend a lot of money.
I need to open a DOCX file, manipulate it and save it as PDF. My boss doesn't want the system to save the file as DOCX and then convert it to PDF.
Does anyone have a simple solution for that?
Thank you in advance.
PS: We have the ABCpdf and AspPDF components, but I didn't find any documentation about opening a PDF file and saving it as DOC.
If your boss wants to open a .DOC and save as .PDF then maybe Word or Word automation will help.
Newer versions of Microsoft Word are able to produce PDFs.
EDIT
Here are some links to sample code:
How do I convert Word files to PDF programmatically? (see accepted answer)
Word Doc to PDF Conversion. Command line using VBScript and automation
You can use iTextSharp to read the content and manipulate it, then use the OpenXML SDK to create a Word document from what you have read.
OpenXML SDK:
http://openxmldeveloper.org/
I need to convert PDF files into .doc files using C#. The computer has no file system though it doesn't have Office installed. Any good ideas how I can approach this? I did some research and most people use the Interop services.
You need to understand that PDF is not really implemented as a single document format.
If your PDF docs are created by rendering text to a PDF file, then direct PDF conversion is not only possible, but can be very good (reliable).
If the source of your PDF is either a scanner or a fax (essentially a scanner...), then what you have is a document with a "picture" of text. This scenario is more difficult to deal with. If you open up the markup for this, there is no 'text' to be converted. In this situation you have to deal with some manner of OCR (optical character recognition), which is less reliable due to a variety of issues.
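One way to find out which kind of PDF you are dealing with is to check whether any text can be extracted at all. The sketch below uses iTextSharp 5.x for that, which is an assumption on my part (any PDF library with text extraction would do), and the class name PdfTextProbe is made up for this example.

```csharp
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper based on iTextSharp 5.x.
public static class PdfTextProbe
{
    // Returns true if at least one page yields non-whitespace text.
    // A false result suggests a scanned/faxed document that needs OCR.
    public static bool HasExtractableText(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // iTextSharp page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                if (!string.IsNullOrWhiteSpace(text))
                {
                    return true;
                }
            }
            return false;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

This is only a heuristic: some scanned PDFs carry a hidden OCR text layer, and some text PDFs use fonts whose encoding makes the extracted text unusable.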
If you have the option of intercepting the data before it is rendered to PDF (say like in SSRS or Crystal) then it would be better for you to bypass the PDF stage and move your data to a Word document.
If you are constrained to receiving faxes and then needing to interpret their content, prepare for OCR hell. It has been a while since I was there, so I hope that it has gotten better.
Even without Office installed on your machine, you have access (with Visual Studio) to the Office developer toolkit, which will allow you to build documents to be distributed in the Word formats (.doc/.docx).
An option/idea may be to convert the PDF to HTML, which can be opened in Word?
Use the Aspose PDF kit to convert the PDF to text, and then write the text to a .doc file using a FileStream or Aspose's DOC component.