I need to open a DOT (Word document template) file, replace the placeholders, and save it as a document file.
On opening the DOT file I get "Document File is Corrupted".
Is it possible to work with DOT files using OpenXML?
UPDATE: I saved the DOT file as XML (manually, using "Save As...") and renamed it back to .dot, so it is now an XML file built on WordML. Opening it through OpenXML still gives me the same error.
OpenXML is used in the new formats of Office 2007 (.dotx and .docx).
WordML is a bit older and cannot be used with the OpenXML libraries.
You can load a WordML file as an XmlDocument or load it into a string if you know exactly what data you want to replace.
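As a minimal sketch of the XmlDocument approach, assuming the placeholders are plain text such as {{Name}} (a hypothetical marker, not something from the question) and that each placeholder sits inside a single w:t element; Word sometimes splits text across several runs, in which case this will not find it. The class and method names are made up for illustration. The namespace URI is the one used by Word 2003 XML files.

```csharp
using System.Xml;

public static class WordMlFiller
{
    // Namespace of Word 2003 XML (WordML) documents.
    private const string WordMlNs =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    // Replaces the placeholder in every w:t text element that contains it.
    // Returns the number of elements that were changed.
    public static int ReplacePlaceholder(XmlDocument doc, string placeholder, string value)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", WordMlNs);

        int changed = 0;
        foreach (XmlNode textNode in doc.SelectNodes("//w:t", ns))
        {
            if (textNode.InnerText.Contains(placeholder))
            {
                textNode.InnerText = textNode.InnerText.Replace(placeholder, value);
                changed++;
            }
        }
        return changed;
    }

    // Loads a WordML file, fills one placeholder, and writes the result.
    // The output is still WordML, not a binary .doc or an OpenXML .docx.
    public static void FillFile(string inputPath, string outputPath,
                                string placeholder, string value)
    {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.Load(inputPath);
        ReplacePlaceholder(doc, placeholder, value);
        doc.Save(outputPath);
    }
}
```

Something like WordMlFiller.FillFile("template.xml", "letter.xml", "{{Name}}", "Jane Doe") would then produce a filled copy that Word can open as WordML.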
Related
I have some OLE files in OLE2 format in a legacy system.
These are Office Word or Excel files with embedded objects (e.g. pictures), I think.
If I rename a file with a docx or xlsx extension, Office says the file is corrupted.
Could I extract the OLE file with an existing C# library and save it as a Word or Excel document?
[Image: OLE file structure]
NOTE:
The OlePres\d\d\d streams are embedded OLE objects, I think.
The Ole stream says it is an embedded file, not a link.
The CompObj stream indicates the file type, e.g. Microsoft Word Document.
For package-type OLE files, I followed the blog post below and successfully extracted the file from the Ole10Native stream: https://eigenein.wordpress.com/2011/08/03/how-to-extract-ole-attachment-body-from-ole10native-stream/
Updates: (Possible solution)
For the old-style formats, e.g. xls and doc, I can just rename the OLE file to that extension and it works.
However, some of the files cannot be opened in MS Office although they open successfully in LibreOffice. So I use the LibreOffice command line tool to convert them to the same format, e.g. soffice --convert-to docx *.docx --outdir ../Converted
After that they can be opened in MS Office.
For the new-style formats, e.g. xlsx and docx, I can extract the Package stream and save it as an xlsx or docx file.
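The Package-stream step could look roughly like the sketch below. It assumes the OpenMcdf NuGet package with its 2.x API (CompoundFile, RootStorage.GetStream, CFStream.GetData); that library is my choice for illustration and is not named in the question, and newer major versions of it may expose a different API. The class and method names are hypothetical.

```csharp
using System.IO;
using OpenMcdf; // NuGet package OpenMcdf, 2.x API assumed

public static class OlePackageExtractor
{
    // Copies the "Package" stream of an OLE2 compound file to outputPath.
    // GetStream throws if the root storage has no stream with that name,
    // so files without an embedded OpenXML package need separate handling.
    public static void ExtractPackage(string olePath, string outputPath)
    {
        var compoundFile = new CompoundFile(olePath);
        try
        {
            CFStream package = compoundFile.RootStorage.GetStream("Package");
            File.WriteAllBytes(outputPath, package.GetData());
        }
        finally
        {
            // Release the file handle even if the stream is missing.
            compoundFile.Close();
        }
    }
}
```

The extension for outputPath (docx or xlsx) still has to come from elsewhere, for example from the file type named in the CompObj stream.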
We are in the process of migrating documents from AppXtender (an EMC Documentum tool) to another system.
I took an XLS file from the AppXtender physical store (the tool had renamed the .XLS file to .BIN). I knew it was an Excel file, so I tried renaming it to .XLS, but it does not open in Excel.
I learnt that AppXtender modifies the file by adding some content, such as "FFL 1.0" followed by the original file name ending in .XLS.
When I open the file in Notepad I can see this in the first line, and there are a couple more lines with text like "Embedded" and some numbers.
If I manually remove those lines and save the file, it still does not open in Excel.
What are these custom texts? How do I programmatically (in C#) remove them and restore a proper Excel file?
Thanks!
Karthik
I am saving arbitrary files with the .xlsx extension. If the original file did not have an .xlsx extension, Excel throws an exception when I try to open the result. The exception message is:
Excel cannot open the file 'abc.xlsx' because the file format or file
extension is not valid. Verify that the file has not been corrupted
and that the file extension matches the format of the file.
Whereas if I save with the .xls extension, I can open the converted file after a warning message:
The file you are trying to open, abc.xls, is in a different format
than specified by the file extension. Verify that the file is not
corrupted and is from a trusted source before opening the file. Do you
want to open the file now?
I need to convert a file to .xlsx format in C# code regardless of its extension and open it in Excel 2010.
That's not how file types work!
You cannot simply rename a file and have it converted to a different type.
Renaming to *.xls appears to work only because Excel inspects the content, recognizes a format it can already read (e.g. *.csv), and opens it after the warning. A file named *.xlsx has to be a real OpenXML package, so for everything else you need to read the file and convert it "manually".
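To see what a file really is before deciding how to convert it, you can look at its first bytes instead of its extension. The sketch below checks for the two well-known signatures: the OLE2 compound file header (old .xls/.doc) and the ZIP local file header (.xlsx/.docx are ZIP packages). The class and method names are made up for illustration, and a "zip" result only tells you the container, not that the content is a valid workbook.

```csharp
using System.IO;
using System.Linq;

public static class OfficeFormatSniffer
{
    // Signature of OLE2 compound files (binary .xls, .doc, .dot).
    private static readonly byte[] Ole2Magic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Signature of a ZIP local file header (.xlsx, .docx, .dotx).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a header as "ole2", "zip", or "unknown".
    public static string Sniff(byte[] header)
    {
        if (StartsWith(header, Ole2Magic)) return "ole2";
        if (StartsWith(header, ZipMagic)) return "zip";
        return "unknown";
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    public static string SniffFile(string path)
    {
        var buffer = new byte[8];
        using (FileStream stream = File.OpenRead(path))
        {
            int read = stream.Read(buffer, 0, buffer.Length);
            return Sniff(buffer.Take(read).ToArray());
        }
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        return data.Length >= prefix.Length
            && data.Take(prefix.Length).SequenceEqual(prefix);
    }
}
```

Anything that comes back as "unknown" (CSV, HTML, plain text) has to be parsed and written out again with a library rather than renamed.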
To write *.xlsx using C# you can use e.g. EPPlus (NuGet).
You need a library that converts the files for you.
I see that you need to open files from Office 2003, so you need to use something like NPOI.
Unfortunately, even though EPPlus is a great library for Office files, it only supports OpenXML documents such as .xlsx and .docx, not .xls.
NPOI is a free, open-source library for working with Office 2003 through 2010/2013 files.
Here is the link
I have an application that downloads a docx file from a service.
The problem is that the service creates the docx in an old format (I think it uses Open XML SDK version 2.0).
I don't own the service, so I can't change how the Word file is created, but I thought about building a converter that opens the downloaded files and recreates them in the newest format using Open XML SDK version 2.5.
I was optimistic that this code would work (a simple open and save):
using (WordprocessingDocument wordprocessingDocument =
    WordprocessingDocument.Open(filePath, true))
{
    // Opens the file for editing and writes the main part back unchanged.
    wordprocessingDocument.MainDocumentPart.Document.Save();
}
I'm not familiar with the Open XML SDK, so any help will be appreciated.
You're just opening the doc and saving it with no changes. You need to change the Word version, then save it. This article should provide the info you need: http://blogs.msmvps.com/wordmeister/2013/01/18/openxmlsdk-word-compatibility-mode/
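A rough sketch of what "change the Word version" can mean in code, assuming the setting in question is the compatibilityMode entry in the document's settings part (w:compat / w:compatSetting). It uses the DocumentFormat.OpenXml NuGet package; the class and method names are hypothetical. The mode value is passed in as a string; 14 and 15 are the values commonly documented for Word 2010 and Word 2013. If the document has no w:compat element the sketch leaves it untouched instead of guessing where to insert one.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class CompatibilityModeUpdater
{
    private const string WordCompatUri = "http://schemas.microsoft.com/office/word";

    // Sets the compatibilityMode setting to the given value.
    // Returns false if the document has no settings part or no w:compat element.
    public static bool SetCompatibilityMode(string filePath, string mode)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
        {
            DocumentSettingsPart settingsPart = doc.MainDocumentPart.DocumentSettingsPart;
            Compatibility compat = settingsPart?.Settings?.GetFirstChild<Compatibility>();
            if (compat == null)
            {
                return false;
            }

            // Reuse the existing compatibilityMode entry if there is one.
            CompatibilitySetting setting = compat.Elements<CompatibilitySetting>()
                .FirstOrDefault(s => s.Name != null
                    && s.Name.Value == CompatSettingNameValues.CompatibilityMode);
            if (setting == null)
            {
                setting = new CompatibilitySetting
                {
                    Name = CompatSettingNameValues.CompatibilityMode,
                    Uri = WordCompatUri
                };
                compat.Append(setting);
            }

            setting.Val = mode;
            settingsPart.Settings.Save();
            return true;
        }
    }
}
```

Calling CompatibilityModeUpdater.SetCompatibilityMode(filePath, "15") only changes how Word treats the file; it does not rewrite the rest of the markup.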
Has anyone tried to insert a PDF document into a Word document using Open XML SDK 2.0?
Thank You!
With the SDK there is a tool called DocumentReflector (in folder C:\Program Files\Open XML Format SDK\V2.0\tools). This tool opens an existing OpenXML document and generates the code that will produce this document.
Now you can create a simple document in Word with an embedded PDF and open this document using DocumentReflector. The code generated can then be a base for your document creation process.