I have an .odt word-processing file, to be processed with LibreOffice or Word, and I need to replace a bunch (20+) of strings in the text with other text.
I know an .odt file is really a .zip file containing .xml files, and that I need to access content.xml.
Do I unzip the content.xml to a stream, deserialize that and use LINQ or something?
Or is there an easier way, using some ready-made library?
If you're using .NET 4.5 you can make use of the new System.IO.Compression namespace. There are a couple of articles out there on how to do it. Here's one which I've found useful: http://www.codeguru.com/csharp/.net/zip-and-unzip-files-programmatically-in-c.htm
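To make the unzip, edit, re-zip round trip concrete, here is a minimal sketch in Python rather than .NET (the standard zipfile module stands in for System.IO.Compression). It assumes content.xml is UTF-8 and that each string to replace sits inside a single text run, since a phrase split across formatting spans will not match. The function name replace_in_odt and the file paths are made up for illustration.

```python
import zipfile
from xml.sax.saxutils import escape


def replace_in_odt(src_path, dst_path, replacements):
    """Copy src_path to dst_path, applying string replacements to content.xml only.

    replacements is a dict mapping old text to new text (hypothetical helper).
    """
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        # Iterate in the original order so "mimetype" stays the first entry.
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")  # assumption: UTF-8 encoded XML
                for old, new in replacements.items():
                    # Text inside the XML is escaped, so escape both sides.
                    text = text.replace(escape(old), escape(new))
                data = text.encode("utf-8")
            # Passing the ZipInfo keeps each entry's name and compression type.
            zout.writestr(item, data)
```

Calling replace_in_odt("letter.odt", "letter_out.odt", {"{{name}}": "Ann"}) would write a new file and leave the original untouched. Plain string replacement avoids deserializing the XML at all, which is usually enough for placeholder-style substitutions.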
I have an application which allows uploading a ppt/pptx file. I want to convert the presentation file to equivalent XML format.
A pptx file is essentially a zip file, renamed to pptx. If you rename and extract the content you can find the xml document.
With ppt you have a bigger problem, as it is a proprietary binary Microsoft format rather than a zip of XML files. Office automation would most probably work, but it is rather complicated.
Is there any way for me to read the contents of an xml file in a cab file in C#? I know how to use XDocument to load an xml file and read its contents, but I'm not sure if it is possible to read an xml file that is zipped up in a cab file.
Any ideas?
What you are looking to do is to extract the contents of the CAB file first. You can either write the code to do that yourself or use a 3rd party library.
I have not used this personally, but I have seen it mentioned several times on this site and others: http://www.codeproject.com/KB/files/CABCompressExtract.aspx
To take a stab at writing it yourself, refer to the documentation here: http://msdn.microsoft.com/en-us/library/cc483132%28EXCHG.80%29.aspx
I have a single file, Setup1.cab, which I archived with 7zip and split into two volumes, Setup1.zip.001 and Setup1.zip.002. Once those volumes reach their destination, I'd like to be able to use C# to extract that file from both archives into the same directory where they will reside. Is this something that SharpZipLib is capable of, or should I be using another tool?
Otherwise, is there a way to combine the two using C# (or another tool - I'm open!) into one zip file, THEN extract it using SharpZipLib?
Thanks!
EDIT: 7zip will not be installed on the destination machines. Also, I'm open to using a different method of archiving the original file; I just need it to be in chunks of under 500MB, and the original file is 570MB.
I would take a look at the SevenZipSharp library and actually use 7zip via C# to handle the decompression.
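On the "combine the two, then extract" idea from the question, here is a minimal sketch in Python using only the standard library. It assumes the .001/.002 volumes were produced by 7-Zip's "split to volumes" option, which cuts the finished archive into raw byte chunks, so concatenating them in order restores the original .zip. It would not apply to a true multi-disk spanned zip. The function name join_and_extract is hypothetical.

```python
import shutil
import zipfile


def join_and_extract(part_paths, joined_path, dest_dir):
    """Concatenate split volumes in the given order, then extract the zip.

    part_paths must be ordered (.001, .002, ...); hypothetical helper.
    """
    with open(joined_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as chunk:
                # Stream each volume into the output without loading it fully.
                shutil.copyfileobj(chunk, out)
    with zipfile.ZipFile(joined_path) as archive:
        archive.extractall(dest_dir)
```

For the files in the question this would be called as join_and_extract(["Setup1.zip.001", "Setup1.zip.002"], "Setup1.zip", "."). The same byte-level concatenation can be done in any language, after which an ordinary zip library can open the joined file.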
I need to create a script that extracts some data from a complex Excel 2003 file (with multiple sheets and different tables inside a single sheet) and produces different XML files that need to be validated against a given XSD file.
My preferred language is Python; to create and validate XML files I would go with lxml.
What do you suggest for parsing XLS files?
Is xlrd the right tool to use for complex Excel files?
Or do I need to convert all the sheets to CSV manually, and read the files line by line, splitting and getting data?
I accept C#, VB6, VBA suggestions too.
[disclaimer: I'm the author of xlrd]
xlrd is quite suited for this kind of job. Get the latest version from PyPI. Get the flavour from the tutorial found here. XLSX support is in alpha test; e-mail me if you need it. The awkwardness and lossiness of the save-as-CSV approach was one of the things that prompted me to write xlrd.
Xlrd is OK. We use it extensively to import XLS files full of references and formulas with multiple sheets and data presented in custom (not Latin-1) encoding.
I am convinced the simplest solution for this task is using Excel VBA together with the MSXML parser. Look here for some links on how to use the MSXML parser in VBA for reading XML files; you can adapt this easily for writing XML files, I think.
I can't answer whether xlrd/Python is the right tool for the job, as I don't know Python well enough.
But there are many ways to access the Excel data. In the main, you have VBA built directly into Excel.
Then you have ADO.NET (see David Hayden's article here), which allows you to access the data via any .NET language, even IronPython.
As we all know, an .epub is a collection of files. Does anyone have an idea how we can read all the files embedded in an .epub at runtime using C#?
The ePub specification supports two formats, a collection of files or a package of files. Most ePubs use the packaging. The package is simply a ZIP file with a renamed extension.
The specification can be found here. The OEBPS Container wraps around an ePub version of the Open Packaging Format.
The simplest way to read the content is to unzip the files and look at the xhtml files that were embedded within it.
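As a rough illustration of reading the package without unpacking it to disk, here is a minimal sketch in Python rather than C#. It relies on the container layout, in which META-INF/container.xml names the package document through a rootfile element with a full-path attribute. The function names are hypothetical, and treating every .xhtml/.html/.htm entry as book content is a simplifying assumption (a stricter reader would follow the manifest in the package document).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def epub_package_path(epub_path):
    """Return the path of the package (.opf) document named in container.xml."""
    with zipfile.ZipFile(epub_path) as book:
        root = ET.fromstring(book.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


def epub_xhtml_files(epub_path):
    """List archive entries that look like XHTML/HTML content documents."""
    with zipfile.ZipFile(epub_path) as book:
        return [
            name
            for name in book.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        ]
```

Each name returned by epub_xhtml_files can then be read straight from the archive with ZipFile.read, so nothing needs to be extracted to a temporary folder.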
It is a zip file, so how about using the Compression namespace to read the contents? I haven't used it, but I'm sure this namespace exposes classes to read zip files as a stream.
I found EPUB Sharp. Unfortunately, not released yet.
http://epubsharp.sourceforge.net/
You have to use the Gitden Reader, or you can use iBooks if you are using iOS.
Free online ePub reader focusing on the social aspects of reading. Now closed, but the concept has moved to: http://www.readups.com/ per: http://www.bookglutton.com/
Source: Wikipedia
Supports EPUB 2 and EPUB 3. Books not readable directly on computers other than Macs.