Office Open XML SDK - good Introduction? - c#

I am writing a program which modifies PowerPoint files via Office automation. This is painfully slow and error-prone, so I am attempting to move some functionality to the Office Open XML SDK.
I read the introductory texts from Microsoft, but I lack a good understanding of how this whole format works. I am especially interested in the boundary between Excel and PowerPoint: I am planning to update charts via Office Open XML.

Here is a link to a downloadable copy of Open XML Explained.
For updating charts, this docx4j code may be of interest: it shows how to do it using docx4j, and in the worst case you can translate each step to C#/Open XML SDK.
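On the Excel/PowerPoint boundary: a chart in a presentation lives in its own chart part inside the .pptx ZIP package, and that part carries a cached copy of the numbers next to a reference into an embedded workbook. A minimal sketch of rewriting the cached values of one series follows. It is written in Python with only the standard library rather than C#, to keep it short and runnable. The function name `set_cached_values` and the sample XML are made up for illustration, and it assumes a series that stores its numbers under `c:val` (scatter charts, for example, use other elements). The embedded workbook would need the same update so that "Edit Data" in PowerPoint shows matching numbers.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
NS = {"c": C_NS}


def set_cached_values(chart_xml: str, series_index: int, values: list[float]) -> str:
    """Overwrite the cached <c:v> numbers of one series in a chart part (hypothetical helper)."""
    ET.register_namespace("c", C_NS)
    root = ET.fromstring(chart_xml)
    series = root.findall(".//c:ser", NS)[series_index]
    cache = series.find("c:val/c:numRef/c:numCache", NS)
    # Drop the old points, then write one <c:pt idx="i"><c:v>value</c:v></c:pt> per value.
    for pt in cache.findall("c:pt", NS):
        cache.remove(pt)
    cache.find("c:ptCount", NS).set("val", str(len(values)))
    for i, value in enumerate(values):
        pt = ET.SubElement(cache, f"{{{C_NS}}}pt", {"idx": str(i)})
        ET.SubElement(pt, f"{{{C_NS}}}v").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative, heavily trimmed chart part; a real one has many more elements.
SAMPLE = (
    f'<c:chartSpace xmlns:c="{C_NS}"><c:chart><c:plotArea><c:barChart>'
    '<c:ser><c:val><c:numRef><c:f>Sheet1!$B$2:$B$3</c:f><c:numCache>'
    '<c:ptCount val="2"/>'
    '<c:pt idx="0"><c:v>1</c:v></c:pt><c:pt idx="1"><c:v>2</c:v></c:pt>'
    '</c:numCache></c:numRef></c:val></c:ser>'
    '</c:barChart></c:plotArea></c:chart></c:chartSpace>'
)
updated = set_cached_values(SAMPLE, 0, [10.5, 20.0, 30.0])
```

The same element names (`c:ser`, `c:numCache`, `c:pt`, `c:v`) are what you would navigate when translating this to the Open XML SDK.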

Related

How to get the same shape or picture in the whole powerpoint file and replace them?

I am using Visual Studio and developing an Office add-in. I need to identify same shapes and pictures and replace them.
I tried to use OpenXml for this, but it doesn't seem able to modify files that are in use. So it doesn't seem to work from an Office add-in, because the add-in works on files that are already open.
And I have searched for many hours on the internet but not found a way to do that.
Help me please.
Thank you
TL;DR: OpenXML and Office add-ins (including VSTO) are competing technologies for different use cases, and combining them may lead to runtime issues. Best to stick to just one.
I am ... developing a office add-in. I need to identify same shapes and pictures and replace them. I try to use OpenXml to do that but it doesn't seem to be able to be modified in files in use.
OpenXML is an API for creating/reading/modifying Office documents that adhere to the XML-based document formats introduced with Office 2007, such as .docx. It does so by manipulating the document directly at the file level rather than using COM. In fact, it does not require Word to be installed on the machine at all! This makes OpenXML a rather lightweight approach to interacting with Office documents.
Unfortunately, OpenXML (or any other file-based approach) is unsuitable for Office add-ins (VSTO or otherwise) if both are targeting the same document. This is because the document is already loaded into, say, Word, and Word is hosting your add-in. Any attempt to modify the underlying file of the loaded document by anything other than Word or the Word APIs (and that includes OpenXML) will encounter a:
sharing violation
To put this another way:
In order to run your Office add-in, the Office app needs to be running first. This is known as hosting.
Operations on Office documents orchestrated by your add-in require the document to be loaded into the Office app first.
No external Windows process or in-process (add-in) operation can directly change the underlying Office document file whilst it is open in the Office app. (Add-ins can indirectly write to the file by using Office APIs and asking the Office app to save, though generally such APIs don't expose any raw file interfaces to the caller, so the latter is probably not the same thing.)
What to do?
My recommendation is to either:
a) use a pure OpenXML approach and discard the Office add-in or...
b) use a pure Office add-in (VSTO) approach and discard the OpenXML code
Considering that it seems you already have code for shapes and pictures via the OpenXML approach, perhaps option a) is the best.
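To illustrate option a) for the picture case only, here is a minimal file-level sketch. It is in Python with the standard library rather than C#/Open XML SDK, and it must run against a closed file. It assumes pictures sit under `ppt/media/` inside the package, which is the usual layout but not guaranteed. The function name, part names and byte strings are invented for the demo. It finds every media part whose bytes match a target image and writes a new package with the replacement in its place.

```python
import hashlib
import io
import zipfile


def replace_matching_media(package: bytes, target: bytes, replacement: bytes) -> tuple[bytes, list[str]]:
    """Return a new package in which every media part equal to `target` holds `replacement`."""
    wanted = hashlib.sha256(target).hexdigest()
    replaced = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            # Compare by content hash, so renamed copies of the same image still match.
            if info.filename.startswith("ppt/media/") and hashlib.sha256(data).hexdigest() == wanted:
                data = replacement
                replaced.append(info.filename)
            dst.writestr(info.filename, data)
    return out.getvalue(), replaced


def _demo_package() -> bytes:
    # Stand-in for a real .pptx: just a ZIP with a few illustrative part names.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("ppt/media/image1.png", b"old-logo")
        z.writestr("ppt/media/image2.png", b"other")
        z.writestr("ppt/media/image3.png", b"old-logo")
        z.writestr("ppt/slides/slide1.xml", b"<slide/>")
    return buf.getvalue()


new_package, replaced = replace_matching_media(_demo_package(), b"old-logo", b"new-logo")
```

The replacement should be the same image format as the part it overwrites, since the part's file extension and content type are left unchanged. Matching shapes (as opposed to picture bytes) would mean comparing the shape XML in each slide part, which this sketch does not attempt.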
See also
Open XML SDK
Visual Studio Tools for Office (VSTO)

How to edit docx file in C# using Microsoft.Office.Interop.Word

I need to replace user meta tags like #likethis# inside a docx file with values from a database. Replacing simple strings by editing the file's byte array directly worked fine, but it became more complex when I needed to load a table of data. So I had to try to use this lib, but its documentation is pretty poor.
I found in this reference how to replace bookmarks with values:
https://social.msdn.microsoft.com/Forums/Lync/en-US/ed7278b1-1fc7-44d5-9e87-4c3e41a110cf/how-to-modify-bookmarked-fields-in-word-docx-file-from-code?forum=worddev
But is there a way to track down a string inside the text and replace it with any content (like other text, a table, or an image such as a logo)?
The Considerations for server-side Automation of Office article states the following:
Microsoft strongly recommends that developers find alternatives to Automation of Office if they need to develop server-side solutions. Because of the limitations to Office's design, changes to Office configuration are not enough to resolve all issues. Microsoft strongly recommends a number of alternatives that do not require Office to be installed server-side, and that can perform most common tasks more efficiently and more quickly than Automation. Before you involve Office as a server-side component in your project, consider alternatives.
Most server-side Automation tasks involve document creation or editing. Office 2007 supports new Open XML file formats that let developers create, edit, read, and transform file content on the server side. These file formats use the System.IO.Package.IO namespace in the Microsoft .NET 3.x Framework to edit Office files without using the Office client applications themselves. This is the recommended and supported method for handling changes to Office files from a service.
As a workaround, you may consider using the Open XML SDK for Open XML documents, or any third-party component designed for server-side execution (for example, Aspose).
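For the text-only case, the file-level approach can be sketched as follows. The main body of a .docx is the `word/document.xml` part, and Word may split a tag like #name# across several runs, so searching each run separately can miss it. This sketch is in Python with the standard library rather than C#, and `fill_placeholders` and the sample XML are invented for illustration. It joins the text of each paragraph, substitutes the tags, and writes the result back into the first run. It assumes the paragraph's formatting can collapse onto that first run, and it does not cover inserting tables or images, which means building the corresponding XML elements.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
NS = {"w": W_NS}
TAG = re.compile(r"#(\w+)#")


def fill_placeholders(document_xml: str, values: dict[str, str]) -> str:
    """Replace #tag# markers paragraph by paragraph (hypothetical helper)."""
    ET.register_namespace("w", W_NS)
    root = ET.fromstring(document_xml)
    for para in root.iter(f"{{{W_NS}}}p"):
        texts = para.findall(".//w:t", NS)
        joined = "".join(t.text or "" for t in texts)
        if not TAG.search(joined):
            continue
        # Tags without a value are left as they are.
        filled = TAG.sub(lambda m: values.get(m.group(1), m.group(0)), joined)
        texts[0].text = filled
        texts[0].set(XML_SPACE, "preserve")  # keep leading/trailing spaces
        for t in texts[1:]:
            t.text = ""
    return ET.tostring(root, encoding="unicode")


# Illustrative fragment in which #name# is split across two runs.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:t>Dear #na</w:t></w:r>'
    '<w:r><w:t>me#, order </w:t></w:r>'
    '<w:r><w:t>#id#</w:t></w:r>'
    '</w:p></w:body></w:document>'
)
filled_xml = fill_placeholders(SAMPLE, {"name": "Alice", "id": "42"})
```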

Is there any way to write one piece of code that works with all possible office documents?

I'm writing a program that modifies Word documents. Currently I use Microsoft.Office.Interop.Word to work with Word documents, and it requires Microsoft Office to be installed on the user's computer, but some of my clients don't have MS Office; they have OpenOffice.
So, which library should I use instead of Interop?
Also, how can I make my code work with different document files, not only .doc and .docx, but also files from other office programs?
Currently I'm writing different code for every type of document.
My program translates documents from their original language to another, so it is very important for me to keep the original formatting of the document. That's why I used Interop, but I also want my program to be useful for as many people as possible.
You don't mention it, but are you assuming all your clients use the same version of Office? To solve the issue of Office versions, you may want to look at the open source project NetOffice (http://netoffice.codeplex.com/) and do all your .doc and .docx development using that library.
For OpenOffice or LibreOffice, I believe the best you can do is go to the project's website and download the SDK. For example, go to http://api.libreoffice.org/examples/examples.html and you will find some examples in Java, Python, and C++ for editing text documents, including .odt files.
LibreOffice SDK download here: http://www.libreoffice.org/download/
And finally, there is also the Open XML format (mentioned in another answer), which is:
ECMA Office Open XML ("Open XML") is an international, open standard for word-processing documents, presentations, and spreadsheets that can be freely implemented by multiple applications on multiple platforms.
You can also download its SDK here: http://msdn.microsoft.com/en-us/office/bb265236.aspx
Hope that helps.
You will likely end up writing separate code to work with each file type. There may be some similarities within, say, Office products, but for the most part you're going to need an adapter for each type.
However, you could (and should) minimize the amount of duplicate code by placing the translation logic and other non-type-specific functions in a shared library that each adapter would then reference.
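A rough sketch of that split follows. It is in Python rather than C#, but the shape carries over. All names here (`DocumentAdapter`, `PlainTextAdapter`, `translate_document`) are made up for illustration, and the plain-text adapter is only a stand-in for real DOCX or ODT adapters that would wrap a format-specific library.

```python
from typing import Callable, Protocol


class DocumentAdapter(Protocol):
    """What every format-specific adapter has to provide."""

    def read_segments(self, data: bytes) -> list[str]: ...

    def write_segments(self, data: bytes, segments: list[str]) -> bytes: ...


class PlainTextAdapter:
    """Stand-in adapter: one segment per line, no formatting to preserve."""

    def read_segments(self, data: bytes) -> list[str]:
        return data.decode("utf-8").split("\n")

    def write_segments(self, data: bytes, segments: list[str]) -> bytes:
        return "\n".join(segments).encode("utf-8")


# One entry per supported file type; .docx, .odt etc. would be registered here.
ADAPTERS: dict[str, DocumentAdapter] = {".txt": PlainTextAdapter()}


def translate_document(extension: str, data: bytes, translate: Callable[[str], str]) -> bytes:
    """Shared logic: format-independent, delegates all file handling to the adapter."""
    adapter = ADAPTERS[extension.lower()]
    segments = adapter.read_segments(data)
    return adapter.write_segments(data, [translate(s) for s in segments])


result = translate_document(".TXT", b"hello\nworld", str.upper)
```

Passing the original bytes back into `write_segments` is deliberate: a real adapter would reuse the original document as a template, so the formatting survives and only the text changes.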
We are using Aspose.Words. This supports DOC, DOCX, RTF and OOXML.
But it's not free.

Open Word 2003 (doc) file using Open XML file format API

I would like to know whether it's possible to open a Word 2003 (.doc) file using the Open XML file format API, as with Office 2007 files.
I have a Windows service through which I am trying to open and edit .doc files, but I am running into a lot of problems. I have posted a question regarding that problem here but got no answer.
After a lot of googling, I came across this page, which describes Microsoft's recommendations on Office automation in server-side code. Microsoft suggests that Office automation should not be implemented in server-side code, as Office applications are made for interactive client workstations. This page does not say whether it's possible to open .doc files using the Open XML format API, or how.
Basically, I want a Windows service which will take a .doc file as input, open it, edit it and save it. How can I achieve this?
My development environment: C#, .NET 2.0 framework, Windows Vista, Office 2003
I think you might be missing the point of Office interop. Using Office interop basically means you communicate with a running Microsoft Office Word/Excel process and manipulate a document/spreadsheet in a defined manner. There is no need to directly modify a word document itself if you use Office interop.
If you wish to modify an Office .doc document directly without the presence of Microsoft Office, then your best bet would be to Google for a library that will directly manipulate .doc for you, although these libraries tend to be fairly buggy, and where they aren't buggy, they're expensive.
EDIT: If you're asking whether or not you can use Office 2007 interop to manipulate a .doc file, then the answer is yes.
You can't. Microsoft introduced the Office Open XML format with Word 2007. Word 2003 uses a binary format.
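The difference is visible in the first bytes of the file: a binary .doc is an OLE2 compound file, while a .docx is a ZIP package. A small sketch for telling them apart before choosing a code path follows, in Python rather than C# and with an invented function name. A ZIP signature only shows that the file might be Open XML, not that it is a valid .docx.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound file header (Word 97-2003 .doc and others)
ZIP_MAGIC = b"PK\x03\x04"                       # local file header of a ZIP archive


def sniff_container(header: bytes) -> str:
    """Classify a file by its first bytes: 'ole2', 'zip' or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


kinds = [
    sniff_container(OLE2_MAGIC + b"\x00" * 8),
    sniff_container(b"PK\x03\x04rest-of-archive"),
    sniff_container(b"plain text"),
]
```

In practice you would read the first eight bytes of the uploaded file and pass them in; only the "zip" case is a candidate for the Open XML API.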

Convert Open XML Excel files to HTML

I'm developing a printing solution for MS Office 2007. Office automation is not right for me, because it requires Office to be installed. Open XML Document Viewer is a solution for converting Word files (.docx) to HTML via an XSLT transform, but it works only for .docx. Can the same technology be used for Excel spreadsheet files?
You could use this article, XSL transformation of SpreadsheetML to HTML, as a starting point to develop your own transform. You can also look at the open source XSLTs in OpenXML/ODF Translator Add-ins for Office to get some ideas on things you may need to account for in any conversion outside of OOXML. The one thing to keep in mind is that SpreadsheetML is more similar to PresentationML than it is to WordprocessingML in file structure inside the package (i.e. for every sheet, there is a separate file).
If you're doing this from .NET, I'd use LINQ instead of XSLT. I've done transforms from DrawingML into SVG, and LINQ makes it easy (in terms of similar functionality to XSLT, staying within .NET, etc.).
If you're looking at Excel 97-2003 (xls) or Excel 2007 (xlsx) files then I'd recommend FlexCel. I've used it; it is very good and honestly quite cheap compared to its competition.
Note that I don't think it fully supports all formatting present in Excel 2007 yet. But it does have built-in functionality to export to HTML.
You could write a SpreadsheetML parser. The schema is available online from Microsoft.
I wrote one a while back that covered data, structure and basic formatting, to pass it through a library and re-save it as an XLS file. It wasn't too difficult.
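To give an idea of the size of the task, here is a minimal sketch of the data-only part, in Python with the standard library rather than .NET. It is not my original parser. The function name and sample XML are invented, and it assumes the usual layout where a sheet part (typically something like `xl/worksheets/sheet1.xml`) holds the cells and string cells store an index into the shared strings part (typically `xl/sharedStrings.xml`). It ignores styles, number formats, merged cells and gaps between cells.

```python
import html
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S = {"s": S_NS}


def sheet_to_html(sheet_xml: str, shared_strings_xml: str) -> str:
    """Render the cell values of one worksheet part as a bare HTML table."""
    strings = [
        "".join(t.text or "" for t in si.findall(".//s:t", S))
        for si in ET.fromstring(shared_strings_xml).findall("s:si", S)
    ]
    rows = []
    for row in ET.fromstring(sheet_xml).findall("s:sheetData/s:row", S):
        cells = []
        for c in row.findall("s:c", S):
            v = c.find("s:v", S)
            text = "" if v is None else (v.text or "")
            if c.get("t") == "s" and text:  # value is an index into the shared strings
                text = strings[int(text)]
            cells.append(f"<td>{html.escape(text)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


# Illustrative, trimmed-down parts.
SHARED = f'<sst xmlns="{S_NS}"><si><t>Name</t></si><si><t>A &amp; B</t></si></sst>'
SHEET = (
    f'<worksheet xmlns="{S_NS}"><sheetData>'
    '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>'
    '<row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>3.5</v></c></row>'
    '</sheetData></worksheet>'
)
table_html = sheet_to_html(SHEET, SHARED)
```

Formatting is where most of the remaining work lies, since cell styles are stored by index in a separate styles part.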
