I am looking for a solution for reading (and possibly writing) custom properties of Office documents (both old and new formats) without resorting to Office automation.
I have found Dsofile.dll, which seems to work well for the old formats but chokes on the new ones with a "class not registered" error. The KB remarks say that a certain "Office Compatibility Pack" needs to be installed for this to work, but I am really looking for an out-of-the-box solution.
I am not searching for a solution that reads (and writes) custom properties without Office installed. Actually, I am considering Office to be a prerequisite. It is just that I want a solution that does not require Office automation for simple custom property handling.
There is a "Microsoft Office Metadata Handler" Windows Explorer Shell Extension that shows/manages custom properties for Office documents pretty much the way I want to do it. With Dsofile.dll I seem to have one half of the solution by covering old Office formats.
DSOFile is what to use for the binary formats.
For the newer formats, you can just use XML (Open XML SDK is a fine choice, but you can also just access the DOCX/XLSX/PPTX file formats with System.IO.Packaging in .NET if you don't want to be all that heavy handed with yet-another-dll). See this article for accessing and setting properties: Manipulating Word 2007 Files with the Open XML Format API (Part 2 of 3)
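To make the "just use XML" point concrete, here is a minimal sketch of reading custom properties straight out of the package. It is written in Python with only the standard library to show the file structure; the same part is reachable from System.IO.Packaging or the Open XML SDK. It assumes the custom properties live at the conventional part name docProps/custom.xml (strictly, the location comes from the package relationships), and the function name read_custom_properties is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Conventional location of the custom properties part (an assumption; the
# authoritative location is given by the package-level relationships).
CUSTOM_PART = "docProps/custom.xml"
NS = {"cp": "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"}


def read_custom_properties(source):
    """Return {name: text value} for a DOCX/XLSX/PPTX file path or file object."""
    with zipfile.ZipFile(source) as package:
        if CUSTOM_PART not in package.namelist():
            return {}  # the document simply has no custom properties
        root = ET.fromstring(package.read(CUSTOM_PART))

    props = {}
    for prop in root.findall("cp:property", NS):
        # Each <property> wraps one typed value element such as <vt:lpwstr>.
        value = prop[0] if len(prop) else None
        props[prop.get("name")] = value.text if value is not None else None
    return props
```

Values come back as raw text; converting them according to the typed element (vt:i4, vt:bool, vt:filetime and so on) is left out of this sketch.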
Related
The project in which I'm involved has a requirement of generating Word documents.
Initially, I used Interop to achieve this. But since Interop requires the clients to have Office installed on their machines, I switched over to OpenXML instead.
Interop has a nice method for getting content controls by title, "SelectContentControlsByTitle". Is there an alternative to this in OpenXML?
Cheers!
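For what it's worth, a content control's title is stored in the document XML as a w:alias element inside the control's w:sdtPr, so a lookup by title is a filter over the w:sdt elements. Below is a rough sketch of that idea in Python using only the standard library. It only looks at the main document part (word/document.xml, assumed to be at its conventional location), not headers or footers, and the function name content_controls_by_title is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(tag):
    """Qualify a WordprocessingML tag or attribute name with its namespace."""
    return "{%s}%s" % (W, tag)


def content_controls_by_title(source, title):
    """Return the text of every content control whose title equals `title`."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    matches = []
    for sdt in root.iter(q("sdt")):
        # The title shown in Word's UI is the w:val of <w:alias> in <w:sdtPr>.
        alias = sdt.find(q("sdtPr") + "/" + q("alias"))
        if alias is None or alias.get(q("val")) != title:
            continue
        content = sdt.find(q("sdtContent"))
        runs = content.iter(q("t")) if content is not None else []
        matches.append("".join(t.text or "" for t in runs))
    return matches
```

In the Open XML SDK the same filter would presumably be a LINQ query over the SdtElement descendants, comparing the SdtAlias value against the title.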
I'm currently developing an application using WinRT/C#. It is the second version; the first was developed in WPF. In the application I need to generate some reports and export them to an MS Word document.
In the first version of the application I used MS Office Interop to export reports to MS Word, but WinRT has no support for MS Office Interop. Is there a simple way to create an MS Word document in WinRT? (I know there are third-party libraries like Syncfusion for WinRT, but I would prefer not to use them.)
You won't be able to use MS Office interop from a Windows Store app. You could use Open XML SDK, though. It is also available on NuGet and seems to be WinRT compatible.
Using it won't be as easy as working with interop classes and you'll only be able to create XML based docx files, not binary doc files. On the other hand your users won't need to have Word installed and they'll be able to open the files in other Office suites like OpenOffice or LibreOffice. There's a set of tutorials available on MSDN to get you started.
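To give an idea of what "XML based docx" means in practice, here is a minimal sketch that writes a DOCX by hand: a zip containing a content-types part, a package relationship, and the main document part. It is Python with the standard library only, meant to show the structure rather than to replace the SDK. The function name build_docx is invented for this example, and while this three-part layout is the usual minimal package, treat it as an assumption to verify against your target Office suite.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def build_docx(paragraphs):
    """Return the bytes of a DOCX with one plain-text paragraph per item."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(p)
        for p in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W, body)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Writing the result of build_docx(["Report title", "First line"]) to a file ending in .docx gives something to experiment with; styles, tables, and images need additional parts that the SDK handles for you.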
There is a free solution: format your document as RTF.
For this I created a new library to help developers create Word-compatible documents.
You can find it here: https://github.com/crogun/WinRTF-For-WinRT
The code is open source, and you can extend it if you want.
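For a sense of how little is needed for a basic RTF file, here is a small sketch in Python (standard library only). It does not use or mirror the library linked above; the function names rtf_escape and build_rtf, the Arial font, and the 12 pt size are all assumptions made for the example.

```python
import struct


def rtf_escape(text):
    """Escape plain text for inclusion in an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF control characters
        elif ch == "\n":
            out.append("\\par ")           # paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN? per UTF-16 code unit, where N is a signed
            # 16-bit decimal and '?' is the fallback for old readers.
            raw = ch.encode("utf-16-le")
            for unit in struct.unpack("<%dh" % (len(raw) // 2), raw):
                out.append("\\u%d?" % unit)
    return "".join(out)


def build_rtf(text):
    """Wrap escaped text in a minimal RTF header (one font, 12 pt)."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 "
    return header + rtf_escape(text) + "}"
```

The returned string contains only ASCII characters, so it can be saved with any ASCII-compatible encoding and opened in Word.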
I want to access Office 2003 files (.doc, .xls and .ppt) in order to extract text and some metadata (number of words, number of sheets, pictures, template, etc.). I'm able to do it with the Open XML SDK for Office 2007 documents. However, this extraction will take place on a server, which can't have apps like Microsoft Office installed (that's the reason why I can't use Office Interop).
I have tried NPOI, but it currently only supports .xls files. The other libraries that I found are not open source, so I can't use them at work. I downloaded NPOI Scratchpad, but the code is very "raw", so I can't use that at work either.
Do you have any other ideas for getting the text and metadata from Office 2003 documents? I'm not a very experienced programmer, and I'm using C# (however, if there is a solution to this problem in C++, I could consider using it). Thanks.
There are many libraries like:
ClosedXML (Office 2007)
EPPlus (Office 2007)
Aspose.Cells (Office 97-2010; I use this one)
I don't know of any free libraries that support the Office 2003 formats.
Good luck.
I'm getting the "Microsoft.ACE.OLEDB.12.0 provider is not registered" error in my ASP.NET application when I try to read an Excel file, and after exhaustive research on the web I have hit a dead end. The only solution I have found is to install the MS component. But there is a catch (as always): because of our customer's policy, we cannot install anything but the application itself, and that's the real problem here. So I'm wondering whether there is an alternative that avoids installing the component. If not, we will have a small issue with the client, though nothing we cannot solve. But let's try to avoid that uncomfortable part.
You do not need to install Excel; you should be able to just install the drivers.
2007 Office System Driver: Data Connectivity Components
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=23734
Take a look at EPPlus.
I've used it for writing Excel files, but it is also capable of reading them. It uses Open XML and it's easy to work with. Also, you do not need to install an OLEDB engine on the client machine.
You haven't specified which version of the Excel format you want to read. You can use the OpenXML SDK to read (and write) the newer formats. This SDK does not require that Excel is installed. Actually, you don't need to install anything besides the assemblies you reference in your project.
The SDK is much more powerful than the OLEDB provider but it probably also requires more effort to use.
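To illustrate why nothing has to be installed for the newer format, here is a sketch that reads cell values from an XLSX using nothing but zip and XML handling (Python standard library). It assumes the first sheet sits at the conventional part name xl/worksheets/sheet1.xml, whereas a robust reader would resolve it through the workbook relationships. The function name read_sheet is hypothetical, and numbers and dates are returned as the raw strings stored in the file.

```python
import zipfile
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
SHARED_STRINGS = "xl/sharedStrings.xml"


def q(tag):
    """Qualify a SpreadsheetML tag name with its namespace."""
    return "{%s}%s" % (M, tag)


def read_sheet(source, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: value as text} for one worksheet part."""
    with zipfile.ZipFile(source) as package:
        shared = []
        if SHARED_STRINGS in package.namelist():
            table = ET.fromstring(package.read(SHARED_STRINGS))
            for item in table.findall(q("si")):
                # An item is either a single <t> or several rich-text runs.
                shared.append("".join(t.text or "" for t in item.iter(q("t"))))
        sheet = ET.fromstring(package.read(sheet_part))

    cells = {}
    for cell in sheet.iter(q("c")):
        value = cell.find(q("v"))
        if value is None or value.text is None:
            continue  # empty or inline-string cells are skipped in this sketch
        if cell.get("t") == "s":
            cells[cell.get("r")] = shared[int(value.text)]  # shared-string index
        else:
            cells[cell.get("r")] = value.text
    return cells
```

This only covers .xlsx; the older binary .xls format is not a zip package and needs a different reader.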
Is there a way to remove the metadata information of MS Word files or Image files programmatically using C# or a Windows batch command?
The manual way to remove that information is to right-click a file in Windows Explorer and select 'Properties' > 'Details' > 'Remove Properties and Personal Information'.
It ain't easy, at least not to get it all.
You might look at the metadata removal package called Metadact by Litera (formerly Softwise).
There are several others out on the market too.
If you want to do it yourself, first, you'll need to decide on what you consider "metadata".
Some is pretty easy to get to using the Word Object Model (Interop from C# or VB).
Some can't be accessed via Word, so you'll need to use the Structured Storage API to get at it (Like last 10 authors).
If you're talking about DOCX files, you can use the OpenXML SDK to get at all the parts inside the package, then use XML to navigate and edit out the bits you don't want.
Going that way, though, it's MUCH harder to remove "metadata" in the content of the document, because you'll have to deal with internal Word structures like RUNs, and change tracking stuff.
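For the easy half (the document property parts, not the content), a rough sketch of the "edit out the bits" approach could look like this. It is Python with the standard library, copying the package and emptying the property parts at their conventional names. The function name strip_document_properties and the list of parts are assumptions, it relies on Python 3.8+ for the xml_declaration argument, and it does not touch tracked changes, comments, or anything else inside the document body.

```python
import zipfile
import xml.etree.ElementTree as ET

# Conventional part names for core, extended, and custom properties
# (an assumption; the package relationships are the authoritative source).
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml", "docProps/custom.xml")


def strip_document_properties(source, target, parts=PROPERTY_PARTS):
    """Copy an OOXML package, emptying the listed property parts on the way."""
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename in parts:
                root = ET.fromstring(data)
                # Keep the root element so relationships and content types
                # stay valid, but drop every property underneath it.
                for child in list(root):
                    root.remove(child)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, data)
```

Keeping the emptied parts in place avoids having to rewrite [Content_Types].xml and the relationship files, which would be necessary if the parts were deleted outright.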
Thanks!
I think I found a way to remove (or add) meta information in Office documents. There is a Microsoft article here: The Dsofile.dll files lets you edit Office document properties when you do not have Office installed (KB 224351)
The Dsofile.dll sample file is an in-process ActiveX component for
programmers that use Microsoft Visual Basic .NET or the Microsoft .NET
Framework. You can use this in your custom applications to read and to
edit the OLE document properties that are associated with Microsoft
Office files, such as the following:
Microsoft Excel workbooks
Microsoft PowerPoint presentations
Microsoft Word documents
Microsoft Project projects
Microsoft Visio drawings
Other files that are saved in the OLE Structured Storage format
The Dsofile.dll sample file is
written in Microsoft Visual C++. The Dsofile.dll sample file
demonstrates how to use the OLE32 IPropertyStorage interface to access
the extended properties of OLE structured storage files. The component
converts the data to Automation friendly data types for easier use by
high level programming languages such as Visual Basic 6.0, Visual
Basic .NET, and C#. The Dsofile.dll sample file is given with full
source code and includes sample clients written in Visual Basic 6.0
and Visual Basic .NET 2003 (7.1).