I have a text file that I want to compress once it reaches a specified size. I've already seen GZipStream, which works great, but RAR compression is much better.
I've been looking for a library that can compress a file with RAR (I really don't care about extracting or uncompressing), but I couldn't find one yet.
As the RAR compression algorithm isn't free (only the decompression algorithm is), you won't find a free library for it; you would have to purchase a license.
A good alternative is the LZMA SDK that delivers the compression algorithms used in 7-Zip.
For a compression ratio/speed comparison, you can have a look e.g. at the Maximum Compression summary page, ranks 50 and 52, comparing WinRAR 4.01 in "Best Solid" mode and 7-Zip 9.22 in "Ultra" mode. WinRAR compresses only slightly better (<1%) and faster, 7-Zip decompresses faster.
As peachykeen pointed out, if you look at the efficiency ratings instead of size, WinRAR in normal mode is much faster than 7-Zip.
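The idea of compressing a file with LZMA once it passes a size limit can be sketched as follows. This is a minimal illustration in Python, not the LZMA SDK itself: it uses Python's standard `lzma` module, which writes the `.xz` container from the same algorithm family. The function name `compress_if_larger` and the `.xz` suffix are assumptions made for the example.

```python
import lzma
import os
import shutil


def compress_if_larger(path, limit_bytes):
    """Compress `path` to `path + '.xz'` once it has reached `limit_bytes`.

    Returns the archive path, or None if the file is still below the limit.
    The original file is left in place; the caller decides whether to delete it.
    """
    if os.path.getsize(path) < limit_bytes:
        return None
    archive = path + ".xz"  # hypothetical naming convention for this sketch
    # preset=9 asks for the strongest standard preset (slower, smaller output).
    with open(path, "rb") as src, lzma.open(archive, "wb", preset=9) as dst:
        shutil.copyfileobj(src, dst)
    return archive
```

A caller would typically invoke this after each write to the text file and truncate or delete the original once an archive path is returned.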
I'm currently using the Docotic PDF library to write a compression program for a PDF file server hosting large scanned documents. (The intention is to get the smallest size in black and white that maintains a readable document, mostly legal briefs.)
In testing I notice that certain files respond better to JPEG compression while others respond better to Group3Fax or Flate. Is it possible to analyze the file and make an intelligent decision on which algorithm will produce the smallest PDF, or would I actually have to compress each file with all three algorithms and choose the smallest, which incurs a ton of additional CPU overhead?
Any guidance is greatly appreciated. Thanks
I have a lot of plain-text content (English). I have a C# tool for creating the content, and it will be consumed in an Android app.
I need, therefore, to know my options for compression algorithms. What library can I use to compress/decompress, where I can compress in C# and decompress in Java?
I'm looking at probably 1-2MB of uncompressed text (at least), so it's definitely worth it to compress it.
You should be able to zip in C# using something like this and unzip with this. GZIP format should do the trick.
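GZIP is a standardized format (RFC 1952), which is why a stream written by .NET's GZipStream can be read by Java's java.util.zip.GZIPInputStream. The round trip below is a minimal sketch in Python standing in for both ends; the helper names `pack_text` and `unpack_text` and the choice of UTF-8 are assumptions for the example.

```python
import gzip


def pack_text(text):
    """Encode text as UTF-8 and wrap it in a gzip stream (producer side)."""
    return gzip.compress(text.encode("utf-8"))


def unpack_text(blob):
    """Reverse of pack_text (consumer side)."""
    return gzip.decompress(blob).decode("utf-8")


sample = "Plain English text compresses well because it is repetitive. " * 200
blob = pack_text(sample)
# Every gzip stream starts with the magic bytes 1f 8b, whatever tool wrote it.
print(blob[:2].hex(), len(sample), len(blob))
```

Whichever libraries are used, both sides have to agree on the text encoding as well as the container format.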
I was trying to compress JPEG files (say 16 files) together using C#. I successfully created a tar file and finally a tar.gz (using the C# GZipStream class). But the problem with my solution is that the gzip pass increased the size of the tar file by 37% (so a compression ratio of 137%). I tried to manually compress the files together using WinRAR and it gave me a 10% reduction in size (a compression ratio of 90%).
I believe that my problem is with GZipStream. I think I should go for another kind of compression (or compressor?!). Do you have any suggestion for which compression to use?
The framework's compression routines don't always do a great job.
I would recommend trying DotNetZip to compress this. My experience is that the compression (even Gzip) there is much closer to other software, and far smaller than the framework classes. This is also nice in that it requires nearly no code changes from the framework's GzipStream class if you want to use their GzipStream implementation.
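It also helps to keep expectations realistic: JPEG data is already entropy-coded, so a second general-purpose pass has little redundancy left to remove, and a good compressor should leave such input at roughly its original size rather than inflate it. The sketch below illustrates this in Python with zlib (deflate, the algorithm inside gzip). Random bytes stand in for JPEG payload, which is an assumption; real JPEG files carry a little compressible metadata.

```python
import os
import zlib


def deflate_ratio(data, level=9):
    """Return compressed size divided by original size for a deflate pass."""
    return len(zlib.compress(data, level)) / len(data)


# Highly redundant input, comparable to plain text or logs.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 20000
# Incompressible input of the same length, standing in for JPEG payload.
jpeg_like = os.urandom(len(text_like))

print(f"text-like ratio: {deflate_ratio(text_like):.4f}")
print(f"jpeg-like ratio: {deflate_ratio(jpeg_like):.4f}")
```

A ratio far above 1.0 on already-compressed input, such as the 137% reported in the question, points at the compressor implementation rather than at the gzip format.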
I have a zip file that contains folder hierarchies and files.
\images\
\images\1.jpg
\images\2.jpg
\something\something\a.exe
\something\something\b.exe
1.txt
I need to decompress the contents of this zip file to a location. I also need to preserve the structure of the zip file.
I've read about .NET's GZipStream and DeflateStream but I am of the opinion that it is too "complicated" for my purpose.
I've also used DotNetZip and SharpZipLib in the past for personal projects but since this is work related and I'm working at a huge company, I would have a hard time convincing legal to use these libraries.
Question:
Is it possible to decompress a zip file while maintaining hierarchy using just .NET or some other built-in Windows API?
PS: I've also read this but I think it's hacky because you'll need to produce another executable just to hide the progress dialog.
Thanks!
Check out if Ionic Zip helps?
DotNetZip would do what you want, but I understand your concerns about legal approval.
On a side note, it might be good for you to navigate the legal jungle associated with getting an open-source library approved for use in the company, just to understand what's involved. But I'll leave that up to you.
Getting back to rolling your own...
DotNetZip is pretty full-featured, and it handles a number of scenarios you probably don't care about: Unicode filenames and comments, setting Windows timestamps and permissions of extracted files, getting timestamps of zip files created on old Unix systems, split archives, encrypted archives, files over 2 GB, self-extracting archives, and so on. Many zip files use none of those things.
Also DotNetZip does eventing and zip updates and zip creation - all the code associated with these things is probably not of interest to you, if you confine yourself just to the requirements you described in your question.
You could, though, grab the DotNetZip code and use it to help you roll your own solution. If you constrain yourself to JUST reading zip files and not dealing with all the possible special cases, the zip format is not difficult to parse.
Here's how to do it:
1. Open the zip file using new FileStream() or File.Open. You want a FileStream object.
2. Read 4 bytes. Verify that it is the zip-entry-header descriptor (0x04034b50). In the file, the order you will find these bytes is 50 4b 03 04.
3. If you find a match, you're in business. Read the rest of the entry header (offsets are relative to the start of the header):
   - At offset 14 is a 4-byte CRC. Get it. (Same byte ordering as above.)
   - At offset 18: the 4-byte length of the compressed blob. Get it. (N)
   - At offset 22: the 4-byte length of the UNcompressed blob. Get it. (U)
   - At offset 26: the 2-byte length of the filename. Get it. (L)
   - At offset 28: the 2-byte length of the "extra field". Get it. (E)
   - At offset 30, right after the extra-field length, is the actual filename. Read L bytes for the filename, and call System.Text.Encoding.ASCII.GetString(). The result will include a directory path, with the backslashes replaced with slashes (Unix style). Use String.Replace() to turn the slashes back into backslashes.
   - After the filename comes the extra field. Seek E bytes to get beyond it. You can mostly ignore it. This is where the compressed data starts.
4. Open a System.IO.Compression.DeflateStream on the zip FileStream, using CompressionMode.Decompress, with the current offset of the FileStream as input. Open a new FileStream, for output, with the file path you read in step 3. In a loop, call inflater.Read() and output.Write() to write the decompressed output of the DeflateStream to a filesystem file with the correct name. You will need to stop reading from the DeflateStream when you have read exactly U (uncompressed) bytes.
5. Check the uncompressed size (U) against the data you actually wrote out from the DeflateStream (after decompression). They should match.
6. If you are fancy, you can check the CRC of the output against what was in the header.
7. Go to step 2, to look for the next entry in the file.
The most complicated part is step 3. Working code for that is easily found in this source module, look for the ReadHeader method.
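The steps above can be sketched as follows. This is an illustration in Python rather than C#, with `struct` and `zlib` standing in for manual byte reads and DeflateStream; the function names `iter_zip_entries` and `extract_all` are made up for the example. Two things go beyond the steps as written and are assumptions of this sketch: it also reads the compression method at offset 8 (0 means stored, 8 means deflate), and it refuses entries whose sizes live in a trailing data descriptor (flag bit 3). Zip64 and encrypted entries are not handled.

```python
import os
import struct
import zlib

LOCAL_HEADER_SIG = 0x04034B50  # appears on disk as 50 4b 03 04 (little-endian)


def iter_zip_entries(stream):
    """Yield (name, data) per entry by walking the local headers in order."""
    while True:
        header = stream.read(30)  # fixed part; the filename starts at offset 30
        if len(header) < 30:
            return
        (sig, _ver, flags, method, _time, _date,
         crc, n_comp, n_uncomp, name_len, extra_len) = struct.unpack(
            "<IHHHHHIIIHH", header)
        if sig != LOCAL_HEADER_SIG:
            return  # not an entry header: the central directory starts here
        if flags & 0x08:
            raise ValueError("sizes are in a trailing data descriptor")
        name = stream.read(name_len).decode("ascii")
        stream.seek(extra_len, os.SEEK_CUR)  # skip the extra field
        blob = stream.read(n_comp)
        if method == 8:  # raw deflate stream, no zlib wrapper
            inflater = zlib.decompressobj(-zlib.MAX_WBITS)
            data = inflater.decompress(blob) + inflater.flush()
        elif method == 0:  # stored without compression
            data = blob
        else:
            raise ValueError(f"unsupported compression method {method}")
        if len(data) != n_uncomp:
            raise ValueError(f"size mismatch for {name}")
        if zlib.crc32(data) != crc:
            raise ValueError(f"CRC mismatch for {name}")
        yield name, data


def extract_all(zip_path, dest_dir):
    """Write every entry below dest_dir, recreating the folder hierarchy."""
    with open(zip_path, "rb") as stream:
        for name, data in iter_zip_entries(stream):
            parts = [p for p in name.split("/") if p]
            if not parts or ".." in parts:
                raise ValueError(f"suspicious entry name {name!r}")
            target = os.path.join(dest_dir, *parts)
            if name.endswith("/"):  # directory entry, nothing to write
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            with open(target, "wb") as out:
                out.write(data)
```

Walking local headers front to back is the simplest approach and matches the steps, but it trusts that the archive was written sequentially; full-featured libraries read the central directory at the end of the file instead.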
Maybe the full feature set of GZipStream is a bit complicated, but note that the sample on the MSDN page is exactly what you need. I mean this MSDN page (the 4.0 version), not the one you linked in the question.
http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx#Y2750
All,
I have a requirement to compress an XML file. At the moment I am using C# and the gzip algorithm in the .NET classes. It does compress the file, but not at the rate I would like.
For example a 12MB file was compressed to a little less than 4MB.
Is there any other way to compress it more than that? Speed of compression / decompression is not very important.
Thanks,
M
ZIP compression is well suited for compressing XML data. In .NET you are best off relying on third-party libraries:
DotNetZip
SharpZipLib
You may try 7zip.
7-zip has an SDK.
Use the client version of 7-zip to try different compression settings to find the one with best compression for your particular data set.
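Trying several settings against a representative sample can also be scripted. The sketch below, in Python, compares three codecs from the standard library at their strongest settings: deflate (the algorithm behind gzip and ZIP), bzip2, and LZMA (the algorithm family used by 7-Zip). The `sample_xml` generator is a made-up stand-in; real measurements should use your own XML files, since the ranking depends on the data.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data):
    """Return the compressed size in bytes for each codec at its top setting."""
    return {
        "deflate (zlib, level 9)": len(zlib.compress(data, 9)),
        "bzip2 (level 9)": len(bz2.compress(data, 9)),
        "lzma (preset 9, extreme)": len(
            lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
    }


def sample_xml(records=2000):
    """Build a synthetic XML document (hypothetical schema, for illustration)."""
    rows = [
        f'<order id="{i}"><customer>C{i % 97}</customer>'
        f"<total>{(i * 37) % 1000}.{i % 100:02d}</total></order>"
        for i in range(records)
    ]
    return ("<orders>" + "".join(rows) + "</orders>").encode("utf-8")


if __name__ == "__main__":
    data = sample_xml()
    print(f"original: {len(data)} bytes")
    for name, size in compressed_sizes(data).items():
        print(f"{name}: {size} bytes ({size / len(data):.1%})")
```

Since speed is not a concern here, the strongest presets are a reasonable starting point; the comparison only needs to be run once per kind of document.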
This website compared different compression libraries against large amount of text data. 7-zip is also included. I hope that this helps you to choose correct library that matches your requirements.
Take a look at System.IO.Packaging.ZipPackage in WindowsBase. It's the .NET framework code behind the DOCX & XLSX file formats and these are more or less zipped XML files. You can zip multiple files of any format together, not just XML.