Compressing XML File - c#

All,
I have a requirement to compress an XML file. At the moment I am using C# and the gzip algorithm in the .NET classes. It does compress the file, but not at the rate I would like.
For example a 12MB file was compressed to a little less than 4MB.
Is there any other way to compress it more than that? Speed of compression / decompression is not very important.
Thanks,
M

ZIP compression is well suited for compressing XML data. In .NET, your best option is to rely on third-party libraries:
DotNetZip
SharpZipLib

You may try 7zip.

7-zip has an SDK.
Use the client version of 7-zip to try different compression settings to find the one with best compression for your particular data set.
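The same try-and-measure approach can be sketched in code with the built-in classes the question already uses. This is only a rough illustration, not the 7-zip SDK: it assumes .NET 4.5 or later (where GZipStream accepts a CompressionLevel), the class name CompressionLevelProbe is made up for this example, and gzip results will not match what 7-zip's own formats achieve.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for comparing settings on your own data.
public static class CompressionLevelProbe
{
    // Returns the gzip-compressed size in bytes of `data` at the given level.
    public static long CompressedSize(byte[] data, CompressionLevel level)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the buffer usable after the GZipStream
            // is disposed; disposing is what flushes the final block.
            using (var gzip = new GZipStream(buffer, level, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return buffer.Length;
        }
    }
}
```

Calling CompressedSize once with CompressionLevel.Fastest and once with CompressionLevel.Optimal on the bytes of a representative XML file shows how much the setting matters for that particular data set.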

This website compared different compression libraries against a large amount of text data. 7-zip is also included. I hope this helps you choose the library that matches your requirements.

Take a look at System.IO.Packaging.ZipPackage in WindowsBase. It's the .NET framework code behind the DOCX & XLSX file formats and these are more or less zipped XML files. You can zip multiple files of any format together, not just XML.

Related

C#/Android Compatible Compression Algorithm

I have a lot of plain-text content (English). I have a C# tool for creating the content, and it will be consumed in an Android app.
I need, therefore, to know my options for compression algorithms. What library can I use to compress/decompress, where I can compress in C# and decompress in Java?
I'm looking at probably 1-2MB of uncompressed text (at least), so it's definitely worth it to compress it.
You should be able to zip in C# using something like this and unzip with this. GZIP format should do the trick.
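A minimal sketch of the C# side, assuming the text is encoded as UTF-8 (the class name GzipText is made up for this example). The bytes it produces are in standard gzip format, which java.util.zip.GZIPInputStream can read on the Android side; Decompress is included only to show the round trip in C#.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: gzip a string to bytes and back.
public static class GzipText
{
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream writes the gzip trailer.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Whatever encoding is chosen on the C# side has to be used again when the Java code turns the decompressed bytes back into a string.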

Compress multiple jpeg files together in C#

I was trying to compress jpeg files (say 16 files) together using C#. I successfully created a tar file and finally a tar.gz (using the C# GZipStream class). But the problem with my solution is that the gzip pass increased the size of the tar file by 37% (so a compression ratio of 137%). I tried to manually compress the files together using winrar and it gave me a reduction of 10% in the size (compression ratio of 90%).
I believe that my problem is with GZipStream. I think I should go for another kind of compression (or compressor?!). Do you have any idea or suggestion of which compression to use?
The framework's compression routines don't always do a great job.
I would recommend trying DotNetZip to compress this. My experience is that the compression (even Gzip) there is much closer to other software, and the output is far smaller than with the framework classes. This is also nice in that it requires nearly no code changes from the framework's GZipStream class if you want to use their GZipStream implementation.

Unzip file while reading it

I have hundreds of CSV files zipped. This is great because they take very little space but when it is time to use them, I have to make some space on my HD and unzip them before I can process. I was wondering if it is possible with .NET to unzip a file while reading it. In other words, I would like to open a zip file, start to decompress the file and as we go, process the file.
So there would be no need for extra space on my drive. Any ideas or suggestions?
Yes. Zip is a streamed format which means that you can use the data as you decompress it rather than having to decompress everything first.
With .net's System.IO.Compression classes you can apply similar compression as used in zip files (Deflate & GZip) to any stream you like, but if you want to work with actual zip format files you'll need a third party library like this one (sharpziplib).
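As a rough sketch of processing data while it is being decompressed, the following reads a gzip-compressed CSV file one line at a time without ever writing the uncompressed file to disk. It assumes the files are in .gz format rather than .zip archives, and the class name GzipCsvReader is made up for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: lazily yields the lines of a .gz text file.
public static class GzipCsvReader
{
    public static IEnumerable<string> ReadLines(string gzipPath)
    {
        using (FileStream file = File.OpenRead(gzipPath))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            // Each ReadLine pulls only as much compressed data as it needs.
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

A caller can loop over ReadLines with foreach and split each line into fields as it arrives; only a small buffer is held in memory at any time.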
A better solution might be to keep the files decompressed on the drive, but turn on compression on the file system level. This way you'll just be reading CSV files, and the OS will take care of making sure it doesn't take too much space.
Anyhoo, to answer your question, maybe the GZipStream class can help you.
sharpziplib allows for stream-based decompression - see this related question - the item provides similar stream-based Read methods, so you can process each item like you would with any stream.
I'm not sure about zip files, but you could use GZ format with GZipStream (works like any other input stream). Unfortunately, the entire System.IO.Compression namespace is only 2 classes (the other does DEFLATE).
EDIT: There's a class called ZipPackage. I'm not sure if it will let you do streaming decompression, but it might be worth looking into.
Also, take a look at #ziplib.

C# Primer or example on working with files on a granular level

Can someone provide an example or primer on working with files on a granular level with C#. Let's assume that I want to build a new program to compress and zip files. Can I write a program like this, with C#, that gets down to the bits & bytes level?
You can certainly read files byte by byte using the FileStream class; StreamReader works on top of a stream when you want text. BinaryReader is even more granular. Once you have the bytes, you can work at the bit level with the bitwise operators (|, &, <<, >>).
Examples can be found at the posted links.
P.S: You could use SharpZipLib or the Compression classes of .Net to compress files.
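To make the byte-and-bit idea concrete, here is a small sketch (the class and method names are made up for this example). It reads a file one byte at a time with FileStream and uses the shift and mask operators mentioned above.

```csharp
using System.IO;

// Hypothetical helper illustrating byte- and bit-level access.
public static class BitLevel
{
    // Counts the 1-bits in a file by reading it one byte at a time.
    public static long CountSetBits(string path)
    {
        long count = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            int value;
            // ReadByte returns -1 at end of stream, otherwise a value 0..255.
            while ((value = stream.ReadByte()) != -1)
            {
                for (int bit = 0; bit < 8; bit++)
                {
                    // Shift the wanted bit to the lowest position and mask it.
                    count += (value >> bit) & 1;
                }
            }
        }
        return count;
    }

    // Packs two 4-bit values (0..15) into one byte: high nibble, low nibble.
    public static byte PackNibbles(int high, int low)
    {
        return (byte)(((high & 0x0F) << 4) | (low & 0x0F));
    }
}
```

Reading single bytes like this is fine for learning; a real compressor would normally read into a larger buffer and then apply the same bit operations to the buffered bytes.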
I do not know of a good tutorial, but BinaryReader is a good place to start for reading at the "bits & bytes" level.
You can download the free #ziplib library including C# source code here:
http://www.icsharpcode.net/opensource/sharpziplib/
That should show you how to zip files at least.

Compress a file with GZipStream while maintaining its meta-data

How can I get the extension of compressed file after being compressed with System.IO.Compression.GZipStream?
For example, if the original file is named test.doc and compresses to test.gz, how do I know what file extension to use when decompressing?
There is no way to get the file name - in fact there may never be a filename at all, if for example a piece of data is created in memory and then sent over a network connection.
Instead of replacing the file extension, why not append it, for example: test.doc.gz
Then you can simply strip it off when decompressing.
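A minimal sketch of that naming convention, assuming .NET 4.0 or later for Stream.CopyTo (the class and method names are made up for this example):

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: keeps the original extension by appending ".gz".
public static class GzipNaming
{
    // test.doc -> test.doc.gz; returns the path of the compressed file.
    public static string CompressFile(string sourcePath)
    {
        string targetPath = sourcePath + ".gz";
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream target = File.Create(targetPath))
        using (var gzip = new GZipStream(target, CompressionMode.Compress))
        {
            source.CopyTo(gzip);
        }
        return targetPath;
    }

    // test.doc.gz -> test.doc; returns the path of the restored file.
    public static string DecompressFile(string gzipPath)
    {
        if (!gzipPath.EndsWith(".gz", StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("Expected a path ending in .gz", "gzipPath");
        }
        // Strip the three characters of ".gz" to recover the original name.
        string targetPath = gzipPath.Substring(0, gzipPath.Length - 3);
        using (FileStream source = File.OpenRead(gzipPath))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (FileStream target = File.Create(targetPath))
        {
            gzip.CopyTo(target);
        }
        return targetPath;
    }
}
```

The original extension survives only because it is part of the compressed file's name; if someone renames the .gz file, that information is lost.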
I had to do this some time ago. The solution is to use the J# libraries to do it. You still write it in C# however.
http://msdn.microsoft.com/en-us/magazine/cc164129.aspx
That's Microsoft's answer on the topic.
I'm not sure what your question is. I assume you want a mechanism to "remember" what the extension was before the compression took place?
If that is the question then the convention of test.doc compressing into test.doc.gz will work.
The test.gz is just a raw byte stream with no meta-data about what has been compressed (for example, original file name, extension etc). What you'd need to do is create an archive that contains the gzip stream and meta-data about each file contained in the archive.
The article linked to in Mech Software's answer provides a pretty reasonable way to implement this.
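For comparison, here is a sketch of an archive that records each file's name alongside its compressed data. Note that this is not the J# approach from the linked article: it swaps in the ZipArchive class, which assumes .NET 4.5 or later, and the class name NamedArchive is made up for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: a zip archive keeps each entry's original file name.
public static class NamedArchive
{
    // Writes every file into one archive, stored under its own file name.
    public static void Pack(string archivePath, IEnumerable<string> filePaths)
    {
        using (FileStream output = File.Create(archivePath))
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create))
        {
            foreach (string filePath in filePaths)
            {
                ZipArchiveEntry entry = archive.CreateEntry(Path.GetFileName(filePath));
                using (FileStream source = File.OpenRead(filePath))
                using (Stream target = entry.Open())
                {
                    source.CopyTo(target);
                }
            }
        }
    }

    // Reads back the stored names, extensions included.
    public static List<string> ListNames(string archivePath)
    {
        var names = new List<string>();
        using (FileStream input = File.OpenRead(archivePath))
        using (var archive = new ZipArchive(input, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }
}
```

Because the name is stored inside the archive, the extension is still known after the archive itself has been renamed.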
There was also this question (vaguely related) asked some time back which may help out:
How to compress a directory with the built in .net compression classes?
