Convert Open XML Excel files to HTML - c#

I'm developing a printing solution for MS Office 2007. Office automation is not right for me, because it requires Office to be installed. Open XML Document Viewer is a solution for converting Word files (.docx) to HTML format by XSLT transform, but it works only for .docx. Can the same technology be used for Excel spreadsheet files?

You could use the article "XSL transformation of SpreadsheetML to HTML" as a starting point to develop your own transform. You can also look at the open source XSLTs in the OpenXML/ODF Translator Add-ins for Office to get some ideas on things you may need to account for in any conversion outside of OOXML. The one thing to keep in mind is that SpreadsheetML is more similar to PresentationML than it is to WordprocessingML in file structure inside the package (i.e. for every sheet, there is a separate file).
If you're doing this from .NET, I'd do it with LINQ (to XML) instead of XSLT. I've done transforms from DrawingML into SVG, and LINQ makes it easy (in terms of functionality similar to XSLT, staying within .NET, etc.).
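Following the LINQ suggestion, here is a minimal sketch that opens an .xlsx with `System.IO.Compression.ZipArchive`, reads it with LINQ to XML and writes the first sheet as an HTML table. It assumes the sheet is stored at `xl/worksheets/sheet1.xml` (usual for files saved by Excel, but a robust converter would resolve sheets through `xl/workbook.xml` and its relationships) and it ignores styles, number formats, merged cells and empty cells that the file leaves out. `XlsxToHtml` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

public static class XlsxToHtml
{
    static readonly XNamespace S = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Renders the first worksheet of an .xlsx package as a bare HTML table.
    // Assumption: the sheet part is xl/worksheets/sheet1.xml (typical, not guaranteed).
    public static string Convert(Stream xlsx)
    {
        using var zip = new ZipArchive(xlsx, ZipArchiveMode.Read);

        // sharedStrings.xml is optional; cells with t="s" store an index into it.
        var shared = new List<string>();
        var sst = zip.GetEntry("xl/sharedStrings.xml");
        if (sst != null)
            shared = Load(sst).Root.Elements(S + "si").Select(si => TextOf(si)).ToList();

        var sheetEntry = zip.GetEntry("xl/worksheets/sheet1.xml")
            ?? throw new InvalidDataException("xl/worksheets/sheet1.xml not found");

        var html = new StringBuilder("<table>\n");
        foreach (var row in Load(sheetEntry).Descendants(S + "row"))
        {
            html.Append("  <tr>");
            // Cells are emitted in document order; omitted (empty) cells are not filled in.
            foreach (var c in row.Elements(S + "c"))
                html.Append("<td>").Append(WebUtility.HtmlEncode(CellText(c, shared))).Append("</td>");
            html.Append("</tr>\n");
        }
        return html.Append("</table>").ToString();
    }

    static string CellText(XElement c, List<string> shared)
    {
        string type = (string)c.Attribute("t");
        if (type == "inlineStr")
            return TextOf(c.Element(S + "is"));
        string raw = (string)c.Element(S + "v") ?? "";
        // No number-format handling: numbers and dates come out exactly as stored.
        return type == "s" && raw.Length > 0 ? shared[int.Parse(raw)] : raw;
    }

    // Text of an <si> or <is> element: either a plain <t> or rich-text runs <r><t>.
    static string TextOf(XElement container) =>
        container == null ? "" : string.Concat(
            container.Elements(S + "t").Concat(container.Elements(S + "r").Elements(S + "t"))
                     .Select(t => t.Value));

    static XDocument Load(ZipArchiveEntry entry)
    {
        using var stream = entry.Open();
        return XDocument.Load(stream);
    }
}
```

Usage would be along the lines of `File.WriteAllText("book.html", XlsxToHtml.Convert(File.OpenRead("book.xlsx")));`. Formatting (fonts, borders, number formats) would come from `xl/styles.xml` via each cell's `s` attribute, which this sketch does not read.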

If you're looking at Excel 97-03 (xls) or Excel 2007 (xlsx) files, then I'd recommend FlexCel. I've used it; it is very good and honestly quite cheap compared to its competition.
Note that I don't think it fully supports all the formatting available in Excel 2007 yet. But it does have built-in functionality to export to HTML.

You could write a SpreadsheetML parser. The schema is available online from Microsoft.
I wrote one a while back that covered data, structure and basic formatting, so that I could run the data through a library and re-save it as an XLS file. It wasn't too difficult.
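One detail any such parser needs: SpreadsheetML may leave empty cells out of a `<row>`, so each `<c>` has to be placed by its `r` attribute (e.g. `r="AB12"`) rather than by its position. A small sketch of converting these references, with the hypothetical helper name `CellRef`. Column letters behave like a base-26 number without a zero digit (A = 1 ... Z = 26, AA = 27), so "AB" = 1 × 26 + 2 = 28, i.e. zero-based index 27.

```csharp
using System;

public static class CellRef
{
    // Splits an A1-style reference such as "AB12" into a zero-based column index
    // and a one-based row number.
    public static (int Column, int Row) Parse(string reference)
    {
        int i = 0, col = 0;
        while (i < reference.Length && char.IsLetter(reference[i]))
        {
            // Bijective base 26: 'A' contributes 1, 'Z' contributes 26.
            col = col * 26 + (char.ToUpperInvariant(reference[i]) - 'A' + 1);
            i++;
        }
        if (i == 0 || i == reference.Length)
            throw new FormatException($"Not an A1 reference: {reference}");
        int row = int.Parse(reference.Substring(i));
        return (col - 1, row);
    }

    // Inverse: zero-based column index back to letters (27 -> "AB").
    public static string ColumnName(int index)
    {
        string name = "";
        for (int n = index + 1; n > 0; n = (n - 1) / 26)
            name = (char)('A' + (n - 1) % 26) + name;
        return name;
    }
}
```

With this, a row reader can pad the gaps: if the previous cell was column 0 and the next `<c>` parses to column 3, columns 1 and 2 were empty.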

Related

Office Open XML SDK - good introduction?

I am writing a program which modifies PowerPoint files via Office automation. This is painfully slow and error-prone, so I am attempting to move some functionality to the Office Open XML SDK.
I read the introductory texts from Microsoft, but I am lacking a good understanding of how this whole format works. I am especially interested in the boundary between Excel and PowerPoint: I am planning to update charts via Office Open XML.
Here is a link to a downloadable copy of Open XML Explained.
For updating charts, this docx4j code may be of interest; it shows you how to do it using docx4j; worst case, you can translate each step to C#/Open XML SDK.
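Before translating those steps, it helps to know where the data lives. In the PowerPoint files I have looked at, each chart is a part such as `ppt/charts/chart1.xml` that holds the series plus cached values, and its `.rels` file points (relationship type `.../relationships/package`) at an embedded workbook under `ppt/embeddings/`, which is what opens in Excel when you edit the data; updating a chart properly means changing both. A sketch that maps chart parts to their embedded workbooks using only the BCL zip and XML APIs; `ChartParts` is a hypothetical name, and linked (external) workbooks are reported as null rather than resolved.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ChartParts
{
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string PackageRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/package";

    // Maps each ppt/charts/chartN.xml part to the embedded workbook its .rels file
    // points at, or to null when there is none.
    public static Dictionary<string, string> FindChartWorkbooks(Stream pptx)
    {
        var result = new Dictionary<string, string>();
        using var zip = new ZipArchive(pptx, ZipArchiveMode.Read);
        foreach (var chart in zip.Entries.Where(e => Regex.IsMatch(e.FullName, @"^ppt/charts/chart\d+\.xml$")))
        {
            string fileName = chart.FullName.Substring("ppt/charts/".Length);
            var rels = zip.GetEntry("ppt/charts/_rels/" + fileName + ".rels");
            string workbook = null;
            if (rels != null)
            {
                using var s = rels.Open();
                workbook = XDocument.Load(s).Root.Elements(Rel + "Relationship")
                    .Where(r => (string)r.Attribute("Type") == PackageRel)
                    .Select(r => Resolve("ppt/charts", (string)r.Attribute("Target")))
                    .FirstOrDefault();
            }
            result[chart.FullName] = workbook;
        }
        return result;
    }

    // Resolves a relationship target relative to the source part's folder,
    // e.g. ("ppt/charts", "../embeddings/x.xlsx") -> "ppt/embeddings/x.xlsx".
    static string Resolve(string baseDir, string target)
    {
        if (target.StartsWith("/")) return target.Substring(1);
        var parts = new List<string>(baseDir.Split('/'));
        foreach (var seg in target.Split('/'))
        {
            if (seg == "..") parts.RemoveAt(parts.Count - 1);
            else if (seg != "." && seg.Length > 0) parts.Add(seg);
        }
        return string.Join("/", parts);
    }
}
```

The embedded workbook found this way is itself an .xlsx package, so it can be opened with the same zip and XML code (or the Open XML SDK) once extracted.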

Is there any way to write one piece of code that works with all possible office documents?

I'm writing a program that modifies Word documents. Currently I use Microsoft.Office.Interop.Word to work with Word documents, and it requires Microsoft Office to be installed on the user's computer, but some of my clients don't have MS Office; they have OpenOffice instead.
So, which library should I use instead of Interop?
And how can I make my code work with different document types, not only .doc and .docx but also files from other office programs?
Currently I'm writing different code for every document type.
My program translates documents from their original language into another, so it is very important for me to keep the document's original formatting; that's why I used Interop. But I also want my program to be useful for as many people as possible.
You don't mention it, but are you assuming all your clients use the same version of Office? To solve the issue of Office versions, you may want to look at this open source project: NetOffice http://netoffice.codeplex.com/ and do all your .doc and .docx development using that library.
For OpenOffice or LibreOffice, I believe the best you can do is to go to the project's website and download the SDK. For example, go here: http://api.libreoffice.org/examples/examples.html and you will find examples in Java, Python and C++ for editing text documents, including .odt files.
LibreOffice SDK download here: http://www.libreoffice.org/download/
And finally, there is also the Open XML format (mentioned in another answer), which is:
ECMA Office Open XML ("Open XML") is an international, open standard for word-processing documents, presentations, and spreadsheets that can be freely implemented by multiple applications on multiple platforms.
You can also download its SDK here: http://msdn.microsoft.com/en-us/office/bb265236.aspx
Hope that helps.
You will likely end up writing separate code to work with each file type. There may be some similarities within, say, Office products, but for the most part you're going to need an adapter for each type.
However, you could (and should) minimize the amount of duplicate code by placing the translation logic and other non-type-specific functions in a shared library that each adapter would then reference.
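A minimal sketch of that split: the format-specific code sits behind one interface and the translation step is shared. All names here (`IDocumentAdapter`, `PlainTextAdapter`, `DocumentTranslator`) are hypothetical, and the `translate` delegate stands in for your own translation logic; a real .docx or .odt adapter would implement the same interface with whichever library you choose.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical adapter contract: only implementations of this are format-specific.
public interface IDocumentAdapter
{
    bool CanHandle(string extension);
    IReadOnlyList<string> ReadSegments(Stream input);
    // Receives the original document again so formatting can be carried over.
    void WriteSegments(Stream original, Stream output, IReadOnlyList<string> translated);
}

// Trivial example adapter: each line of a .txt file is one segment.
public sealed class PlainTextAdapter : IDocumentAdapter
{
    public bool CanHandle(string extension) =>
        string.Equals(extension, ".txt", StringComparison.OrdinalIgnoreCase);

    public IReadOnlyList<string> ReadSegments(Stream input)
    {
        using var reader = new StreamReader(input, leaveOpen: true);
        return reader.ReadToEnd().Split('\n');
    }

    public void WriteSegments(Stream original, Stream output, IReadOnlyList<string> translated)
    {
        using var writer = new StreamWriter(output, leaveOpen: true);
        writer.Write(string.Join("\n", translated));
    }
}

// Shared, format-independent logic: pick an adapter, translate, write back.
public static class DocumentTranslator
{
    public static void Translate(string extension, Stream input, Stream output,
        IEnumerable<IDocumentAdapter> adapters, Func<string, string> translate)
    {
        var adapter = adapters.FirstOrDefault(a => a.CanHandle(extension))
            ?? throw new NotSupportedException($"No adapter for {extension}");
        var translated = adapter.ReadSegments(input).Select(translate).ToList();
        if (input.CanSeek) input.Position = 0;   // adapters may re-read the original
        adapter.WriteSegments(input, output, translated);
    }
}
```

Passing the original stream to `WriteSegments` is the design choice that keeps formatting intact: a document adapter copies everything except the text it extracted, then swaps in the translated segments.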
We are using Aspose.Words. It supports DOC, DOCX, RTF and OOXML.
But it's not free.

Transforming Excel 2010 documents?

I am interested in writing an application that will take in an Excel document of a specific format, massage the data and create a new Excel document that has different formatting.
I am curious if anyone can recommend a good place to start on this.
My first thought was to write something myself in C#. I came across this tool on CodePlex:
http://excelwrapperdotnet.codeplex.com/wikipage?title=Usage%20-%20Example&referringTitle=Documentation
But it appears to only be for Excel 2007.
Is there a best practice for doing this type of thing for Excel 2010 documents? Do I even need to program something custom to do this or does Excel offer something that might handle this?
Another nice library to modify Excel 2007/2010 documents (.xlsx) is EPPlus. It gives you a nice object model on your spreadsheets.
Excel files (.xlsx) are zip archives of XML files in the Open XML format; take a look here: Microsoft Open XML.
That should get you going on the right path.
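To show that an .xlsx really is just zipped XML, here is a sketch that writes a minimal one-sheet workbook with nothing but `ZipArchive` and LINQ to XML. It assumes the parts I understand Excel needs at minimum: `[Content_Types].xml`, the two relationship parts, `xl/workbook.xml` and one worksheet, with text stored as inline strings and no styles part. `MinimalXlsx` is a hypothetical name; for real work EPPlus or the Open XML SDK handle styles, shared strings and numeric cells for you.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class MinimalXlsx
{
    static readonly XNamespace S = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static readonly XNamespace Pr = "http://schemas.openxmlformats.org/package/2006/relationships";
    static readonly XNamespace Ct = "http://schemas.openxmlformats.org/package/2006/content-types";

    // Writes rows of text into a single-sheet workbook using inline strings.
    public static void Write(Stream output, IEnumerable<IEnumerable<string>> rows)
    {
        using var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true);

        Save(zip, "[Content_Types].xml", new XElement(Ct + "Types",
            new XElement(Ct + "Default", new XAttribute("Extension", "rels"),
                new XAttribute("ContentType", "application/vnd.openxmlformats-package.relationships+xml")),
            new XElement(Ct + "Default", new XAttribute("Extension", "xml"),
                new XAttribute("ContentType", "application/xml")),
            new XElement(Ct + "Override", new XAttribute("PartName", "/xl/workbook.xml"),
                new XAttribute("ContentType", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml")),
            new XElement(Ct + "Override", new XAttribute("PartName", "/xl/worksheets/sheet1.xml"),
                new XAttribute("ContentType", "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"))));

        Save(zip, "_rels/.rels", Rels(("rId1", "officeDocument", "xl/workbook.xml")));
        Save(zip, "xl/_rels/workbook.xml.rels", Rels(("rId1", "worksheet", "worksheets/sheet1.xml")));

        Save(zip, "xl/workbook.xml", new XElement(S + "workbook",
            new XAttribute(XNamespace.Xmlns + "r", R),
            new XElement(S + "sheets", new XElement(S + "sheet",
                new XAttribute("name", "Sheet1"), new XAttribute("sheetId", 1),
                new XAttribute(R + "id", "rId1")))));

        // Each cell gets an explicit A1-style reference and an inline string value.
        Save(zip, "xl/worksheets/sheet1.xml", new XElement(S + "worksheet",
            new XElement(S + "sheetData", rows.Select((row, r) => new XElement(S + "row",
                new XAttribute("r", r + 1),
                row.Select((text, c) => new XElement(S + "c",
                    new XAttribute("r", Col(c) + (r + 1)), new XAttribute("t", "inlineStr"),
                    new XElement(S + "is", new XElement(S + "t", text)))))))));
    }

    static XElement Rels(params (string Id, string Type, string Target)[] rels) =>
        new XElement(Pr + "Relationships", rels.Select(x => new XElement(Pr + "Relationship",
            new XAttribute("Id", x.Id),
            new XAttribute("Type", "http://schemas.openxmlformats.org/officeDocument/2006/relationships/" + x.Type),
            new XAttribute("Target", x.Target))));

    // Zero-based column index to letters: 0 -> "A", 27 -> "AB".
    static string Col(int index)
    {
        string name = "";
        for (int n = index + 1; n > 0; n = (n - 1) / 26)
            name = (char)('A' + (n - 1) % 26) + name;
        return name;
    }

    static void Save(ZipArchive zip, string name, XElement root)
    {
        using var s = zip.CreateEntry(name).Open();
        new XDocument(new XDeclaration("1.0", "UTF-8", "yes"), root).Save(s);
    }
}
```

Text with leading or trailing spaces would additionally need `xml:space="preserve"` on the `<t>` element.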

Is there a way to generate Word documents dynamically without having Word on the machine?

I am planning on generating a Word document on the web server dynamically. Is there a good way of doing this in C#? I know I could script Word to do this, but I would prefer another option.
I've worked at a company in the past that really wanted generated Word documents; in the end they were perfectly satisfied with RTF docs that had a ".doc" extension. Word has no problem recognizing and opening them.
The RTF docs were generated with iText.NET (a free .NET library). The API is pretty easy to use, it performs extremely well, you don't need Word on the machine, and you could extend it to generating PDF, HTML and text docs in the future with very little effort. After four years the solution I created is still in place, so that's a little testimony in iText.NET's favor.
It looks like the official iText page suggests that iTextSharp is the best .NET choice right now, so that's another option.
You'd be better off generating an RTF file, which Word will know how to open.
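For a sense of how little is needed, a sketch of writing RTF by hand instead of through iText.NET. The escaping rules are small: `\`, `{` and `}` must be prefixed with a backslash, and characters outside 7-bit ASCII are written as `\uN?`, where N is the signed 16-bit code unit and `?` is the fallback for readers that don't understand `\u`. `RtfWriter` is a hypothetical name; save the result with an .rtf (or, as described above, .doc) extension.

```csharp
using System.Text;

public static class RtfWriter
{
    // Escapes plain text for an RTF body.
    public static string Escape(string text)
    {
        var sb = new StringBuilder();
        foreach (char ch in text)
        {
            if (ch == '\\' || ch == '{' || ch == '}') sb.Append('\\').Append(ch);
            else if (ch == '\n') sb.Append(@"\par ");
            // \u takes a signed 16-bit value, hence the cast through short.
            else if (ch > 127) sb.Append(@"\u").Append((int)(short)ch).Append('?');
            else sb.Append(ch);
        }
        return sb.ToString();
    }

    // One paragraph, 12 pt Times New Roman (\fs counts half-points), with a bold run.
    public static string Document(string before, string bold, string after) =>
        @"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\f0\fs24 "
        + Escape(before) + @"{\b " + Escape(bold) + "}" + Escape(after) + @"\par}";
}
```

For example, `File.WriteAllText("letter.rtf", RtfWriter.Document("Hello ", "world", "!"))` produces a file Word opens with "world" in bold.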
If you want to generate Office 2007 documents, check the Open XML File Formats; they're simple zipped XML files. Check these links:
Open XML File Formats: What is it, and how can I get started?
Introducing the Office (2007) Open XML File Formats
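A sketch of how small a valid .docx can be, using only `ZipArchive` and LINQ to XML: `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`, which as far as I know are the only parts Word strictly needs (without a styles part, Word falls back to its default formatting). `MinimalDocx` is a hypothetical name.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class MinimalDocx
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Writes one Word paragraph per string into a new .docx package.
    public static void Write(Stream output, params string[] paragraphs)
    {
        using var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true);
        Put(zip, "[Content_Types].xml",
            "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
            "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
            "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
            "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
            "</Types>");
        Put(zip, "_rels/.rels",
            "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
            "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>" +
            "</Relationships>");

        // The body is built with LINQ to XML so that paragraph text is escaped correctly.
        var document = new XElement(W + "document", new XAttribute(XNamespace.Xmlns + "w", W),
            new XElement(W + "body", paragraphs.Select(p =>
                new XElement(W + "p", new XElement(W + "r",
                    new XElement(W + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), p))))));
        Put(zip, "word/document.xml", document.ToString(SaveOptions.DisableFormatting));
    }

    static void Put(ZipArchive zip, string name, string xml)
    {
        using var writer = new StreamWriter(zip.CreateEntry(name).Open());
        writer.Write("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" + xml);
    }
}
```

Usage: `using var f = File.Create("hello.docx"); MinimalDocx.Write(f, "Hello", "World");`.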
Edit: Check this project; it can serve as a good starting point:
DocumentMaker
It seems very simple and customizable; look at this code snippet:
```csharp
// doc is assumed to be a DocumentMaker document created earlier (not shown in the original snippet)
Paragraph p = new Paragraph();
p.Runs.Add(new Run("Text can have multiple format styles, they can be "));
p.Runs.Add(new Run("bold and italic",
    TextFormats.Format.Bold | TextFormats.Format.Italic));
doc.Paragraphs.Add(p);
```
Word will quite happily open an HTML file with a .doc extension. If you include an internal style sheet, you can have it fully formatted. There was a previous post on this subject:
Export to Word Document in C#
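A sketch of that approach: an ordinary HTML page with an internal style sheet, meant to be saved with a .doc extension. `HtmlAsDoc` is a hypothetical name, and the CSS is only an example; Word understands much, though not all, of CSS.

```csharp
using System.Net;
using System.Text;

public static class HtmlAsDoc
{
    // Builds an HTML page with an internal <style> sheet; all text is HTML-encoded.
    public static string Build(string title, params string[] paragraphs)
    {
        var sb = new StringBuilder();
        sb.Append("<html><head>")
          .Append("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">")
          .Append("<title>").Append(WebUtility.HtmlEncode(title)).Append("</title>")
          .Append("<style>body { font-family: Calibri, sans-serif; font-size: 11pt; } ")
          .Append("h1 { font-size: 16pt; color: #1F497D; }</style></head><body>")
          .Append("<h1>").Append(WebUtility.HtmlEncode(title)).Append("</h1>");
        foreach (var p in paragraphs)
            sb.Append("<p>").Append(WebUtility.HtmlEncode(p)).Append("</p>");
        return sb.Append("</body></html>").ToString();
    }
}
```

Writing it with `File.WriteAllText("report.doc", html, Encoding.UTF8)` adds a byte-order mark, which together with the meta tag helps Word pick the right encoding.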
Creating the old .DOC files (pre-Word 2007) is nigh-impossible without Word itself. The format is just too complex. Microsoft has released the format description, but it's enough to reduce a grown programmer to tears. There is a reason for that too (historical), but that doesn't make things better.
The new .DOCX would be easier, although quite a bit of hassle still. However depending on which Word versions you are targeting, there are some other options too.
For one, there is the classic .RTF. The format is pretty complex still, yet well documented and has strong support across many applications and platforms. And you might use some string-replacing into template files to make things easier (it's non-binary).
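A sketch of the template idea: design the document in Word, save it as RTF with placeholder markers, then replace the markers with escaped values. The `[[KEY]]` marker syntax and the `RtfTemplate` name are my own choices; check the saved template in a text editor, because Word sometimes inserts control words in the middle of text you typed in one go, and a plain string replace would then miss the marker.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RtfTemplate
{
    // Replaces [[KEY]] markers in an RTF template with RTF-escaped values.
    public static string Fill(string templateRtf, IDictionary<string, string> values)
    {
        var sb = new StringBuilder(templateRtf);
        foreach (var kv in values)
            sb.Replace("[[" + kv.Key + "]]", Escape(kv.Value));
        return sb.ToString();
    }

    // Backslash and braces are RTF control characters; non-ASCII becomes \uN?.
    static string Escape(string text)
    {
        var sb = new StringBuilder();
        foreach (char ch in text)
        {
            if (ch == '\\' || ch == '{' || ch == '}') sb.Append('\\').Append(ch);
            else if (ch > 127) sb.Append(@"\u").Append((int)(short)ch).Append('?');
            else sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

Escaping the values matters: an unescaped `}` in a customer name would close an RTF group early and corrupt the rest of the document.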
Then there are the "old" Word XML files (WordprocessingML 2003). As far as I know they work starting with Word 2003, and they are kind of the predecessors of .DOCX. I've used them, not bad. And the documentation is pretty OK.
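For reference, a sketch of a minimal document in that older format: a single XML file (no ZIP package) in the `http://schemas.microsoft.com/office/word/2003/wordml` namespace, plus the `mso-application` processing instruction that lets Windows hand a .xml file to Word. `Word2003Xml` is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class Word2003Xml
{
    static readonly XNamespace W = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Builds a WordprocessingML 2003 document with one paragraph per string.
    public static XDocument Build(params string[] paragraphs) =>
        new XDocument(
            new XDeclaration("1.0", "UTF-8", "yes"),
            // Tells Windows that this .xml file should open in Word.
            new XProcessingInstruction("mso-application", "progid=\"Word.Document\""),
            new XElement(W + "wordDocument", new XAttribute(XNamespace.Xmlns + "w", W),
                new XElement(W + "body", paragraphs.Select(p =>
                    new XElement(W + "p", new XElement(W + "r", new XElement(W + "t", p)))))));
}
```

`Word2003Xml.Build("Hello", "World").Save("letter.xml")` writes the file; unlike the RTF and HTML routes, it is plain XML, so it can be produced with any XML API.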
Finally, the easy way, which I would choose, is to make a simple HTML file. Word can load HTML files just fine starting with version 2000. In the simplest case, just change the extension of an HTML file to .DOC and you have it. You can also add a few Word-specific tags and comments to make it look even better in Word. Use Word's Save As > HTML option to see what they are.
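A sketch of those Word-specific extras: the Office XML namespaces on `<html>` and the conditional-comment block of the kind Word writes when saving as HTML; `<w:View>Print</w:View>` asks Word to open the file in Print Layout instead of Web Layout. Compare against what your own Word version saves, since this reflects my understanding of that output rather than a documented contract. `WordHtml` is a hypothetical name, and `bodyHtml` is assumed to be already encoded by the caller.

```csharp
using System.Net;
using System.Text;

public static class WordHtml
{
    // Wraps body HTML in the namespaces and the <w:WordDocument> settings block.
    public static string Wrap(string title, string bodyHtml)
    {
        var sb = new StringBuilder();
        sb.Append("<html xmlns:o=\"urn:schemas-microsoft-com:office:office\" ")
          .Append("xmlns:w=\"urn:schemas-microsoft-com:office:word\" ")
          .Append("xmlns=\"http://www.w3.org/TR/REC-html40\">")
          .Append("<head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">")
          .Append("<title>").Append(WebUtility.HtmlEncode(title)).Append("</title>")
          // Browsers treat this as a comment; Word reads the settings inside it.
          .Append("<!--[if gte mso 9]><xml><w:WordDocument><w:View>Print</w:View>")
          .Append("<w:Zoom>100</w:Zoom></w:WordDocument></xml><![endif]-->")
          .Append("</head><body>").Append(bodyHtml).Append("</body></html>");
        return sb.ToString();
    }
}
```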
There are third-party libraries around that will do the job.
Doing a quick google came up with this one, for example.
I haven't tried any, so I can't give you specific advice, I'm afraid!
Let us know how you get on...
In Office 2007, Microsoft introduced a new file format called Office Open XML (.docx for Word documents). Older versions of Microsoft Word can't open it natively; they need the Office Compatibility Pack. Since it is XML, you can create or read it without having Word installed.
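Reading works the same way. A sketch that pulls the plain text of each paragraph out of a .docx with `ZipArchive` and LINQ to XML. It assumes the main part is at `word/document.xml`, concatenates every `<w:t>` in a paragraph (Word frequently splits one sentence over several runs), and ignores headers, footers and footnotes; it does not special-case text boxes, whose nested paragraphs would be reported twice. `DocxText` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each <w:p> in the main document part (tables included).
    public static List<string> ReadParagraphs(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read);
        var entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");
        using var s = entry.Open();
        var doc = XDocument.Load(s);
        return doc.Descendants(W + "p")
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
            .ToList();
    }
}
```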
Here is a component that generates documents based on a custom template. The documents are generated from a SharePoint list, so the data is pulled from the list item into the document on the fly:
http://store.sharemuch.com/products/generate-word-documents-from-sharepoint-list
Hope that helps,
Yaroslav Pentsarskyy
Blog: www.sharemuch.com

Do I need to have MS Word installed to create Word documents from ASP.NET?

See title...
No.
You can use WordML (Word XML)
Word 2007 version
You can create Word 2007 documents using its XML format without the need of installing Word in your server.
This can be a starting point.
I've already +1'd Mitch's reply, but as an aside: Word isn't even supported for use in service applications; it is designed to be user-interactive. So installing Word, even if it worked, wouldn't leave you in a great place.
If you're just generating the documents from scratch the solutions so far proposed work well. My situation was that I had an existing template that I needed to use and substitute in my own text in a few places (mail merge, if you will). This was several years ago - prior to Office 2007 - but we ended up going with the Aspose library of components for this. I've used the Words and Cells (Excel) components to generate documents from templates and spreadsheets on the fly to download from web sites. The interfaces are a little clunky and can be inconsistent between the various products. The installer, frankly, is awful, but the products work pretty well and made it much easier to do what needed to be done.
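For the template case with a .docx, a rough sketch of text substitution using only the BCL: load `word/document.xml`, replace `{{key}}` markers paragraph by paragraph, and write the part back. The marker syntax and the `DocxMerge` name are hypothetical. Because Word often splits typed text across several runs, the sketch joins a paragraph's runs before replacing, at the cost of losing the later runs' formatting in any paragraph it changes; a library such as Aspose.Words deals with this properly.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxMerge
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces {{key}} markers in word/document.xml, rewriting the package in place.
    public static void Fill(Stream docx, IDictionary<string, string> values)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Update, leaveOpen: true);
        var entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");
        XDocument doc;
        using (var s = entry.Open()) doc = XDocument.Load(s);

        // ToList(): the loop edits the tree, so snapshot the paragraphs first.
        foreach (var p in doc.Descendants(W + "p").ToList())
        {
            var texts = p.Descendants(W + "t").ToList();
            string joined = string.Concat(texts.Select(t => t.Value));
            string replaced = values.Aggregate(joined,
                (acc, kv) => acc.Replace("{{" + kv.Key + "}}", kv.Value));
            if (replaced == joined) continue;

            // A marker may span several runs: put the merged text into the first run
            // and empty the others (their formatting in this paragraph is lost).
            texts[0].Value = replaced;
            texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
            foreach (var t in texts.Skip(1)) t.Value = "";
        }

        entry.Delete();   // recreate the part rather than overwrite it in place
        using var output = zip.CreateEntry("word/document.xml").Open();
        doc.Save(output);
    }
}
```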
Word treats RTF as a native format, and if your intended document can be constructed as whatever.rtf (which, for all its fancy formatting, is plain ASCII markup), then you should be able to write the document without Word installed.
To get the picture, create an example document and save it as an RTF file. Then view that file in a plain-text editor (like Notepad). You'll have to learn RTF syntax, but there's at least one handbook around on that.
AS
Just to add another potential solution for you, OfficeWriter is a Word/Excel API that lets you create documents and spreadsheets in ASP.NET without using Office:
http://www.officewriter.com
