Compression issue with large archive of files in DotNetZip

Compression issue with large archive of files in DotNetZip - c#

Greetings....
I am writing a backup program in c# 3.5, using hte latest DotNetZip. The basics of the program is to be given a location on a server and the max size of a spanned zip file and go. From there it should traverse all the folder/files from the given location and add them to the archive, keeping the exact structure. It should also compress everything down to a reasonable amount. A given uncompressed collection of folders/files could easily be 10-25gb, with the created spanned files being limited to about 1gb each.
I have everything working (using DotNetZip). My only challenge is there is little to no compession actually happening. I chose to use the "AddDirectory" method for simplicity of code and just generally how well it seemed to fit my project. After reading around I am second guessing that decision.
Given the below code and the large amount of files in an archive, should I compress each file as it is added to the zip? or should the Adddirectory method provide about the same compression?
I have tried every level of compression offered by Ionic.Zlib.CompressionLevel and none seem to help. Should I think about using an outside compression algorithm and stream it into my DotNetZip file?
using (ZipFile zip = new ZipFile())
{
zip.AddDirectory(root.FullName);
if (zipPassword.Length > 0)
zip.Password = zipPassword;
float size = zipGbSize * 1024 * 1024 * 1024;
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zip.AddProgress += new EventHandler<AddProgressEventArgs>(Zip_AddProgress);
zip.ZipError += new EventHandler<ZipErrorEventArgs>(Zip_ZipError);
zip.Comment = "This zip was created at " + System.DateTime.Now.ToString("G");
zip.MaxOutputSegmentSize = (int)size; //in gig
zip.Name = archiveDir.FullName + #"\Task_" + taskId.ToString() + ".zip";
zip.Save();
}
Thank you for any help!

1.Given the below code and the large amount of files in an archive, should I compress each file as it is added to the zip?
The way DotNetZip works is to compress each file as it is added to the archive. Your app does not need to do compression. DotNetZip does this for you.
or should the Adddirectory method provide about the same compression?
Entries added to a zip file via the AddDirectory() method go through the same code path when the zip archive is written, as entries added via AddFile(). The file data is compressed, then optionally encrypted, then written to the zip file.
an unsolicited tip: you don't need to do:
zip.AddProgress += new EventHandler<AddProgressEventArgs>(Zip_AddProgress);
you can just do:
zip.AddProgress += Zip_AddProgress;
how are you determining that no compression is occurring?
If you are curious about the compression on each entry, you can register a SaveProgress event handler. The SaveProgress event is fired at various times during the writing of an archive, including when saving begins, when DotNetZip begins writing the data for one entry, at various intervals during the writing of one entry, after finishing writing the data for each entry, and after finishing writing all data. These stages and described in the ZipProgressEventType enumeration. When the EventType is Saving_AfterWriteEntry, you can calculate the compression ratio for THAT particular entry.
To verify that compression is not occurring, I'd suggest that you register such a SaveProgress event and look at that compression ratio.
Also, as described above, some file types cannot be compressed. JPG, MPG, MP3, ZIP files, and others are not very compressible.
Finally, doing a backup may be lots easier to do if you just use the DotNetZip command-line tool. If all you want to do is backup a particular directory, you could use the command line tool (zipit.exe) and avoid writing a program. With the zipit.exe tool, if you use the -v option, the tool prints progress reports, and will display the compression for each entry, via the mechanism I described above. Even if you prefer to write your own program, you might consider using zipit.exe to verify that compression is, or is not, occuring when you use DotNetZip.

Im not sure to have understated your question, but the maximum size for any zip file its 4Gb. Maybe you have to create a new ZipFile every time you reach that limit.
Sorry if that doesnt help you.

What sort of data are you compressing? Some sorts of data just doesn't compress very well, for example JPEGs, or ZIP files which are already compressed.

Related

Create Uncompressed Zip in C#

I have a question according to the ZipArchive Library in System.IO.Compression.
I want to create an uncompressed .zip file. My code so far looks like this:
//Creates a "Deflate"-Mode file in the created zip.
using (FileStream fs = new FileStream(zippath, FileMode.OpenOrCreate))
using (ZipArchive zip = new ZipArchive(fs, ZipArchiveMode.Update))
{
var demoFile = zip.CreateEntry("foo0.txt", CompressionLevel.NoCompression); //NoCompression does not seem to have an impact
using (var streamWriter = new StreamWriter(demoFile.Open()))
{
streamWriter.Write("Bar!");
}
}
Thats creating me a zip file, where the file in it was written in "DEFLATE" Mode not in STORE. How can I fix this. My thought was, my problem would be solved by using the CompressionLevel.NoCompression.
Also writing the file to the filesystem and zipping the directory is not an option, because i want to create a zipfile with potentially hundred of thousands small files. Furthermore just using GZipStream is not an option, because I want to create a directory structure in the .zip file.
I checked the mode with 7-zip:
(screenshot from 7-zip)

If for whatever reason you are required to add contents to a ZIP file with its compression method explicitly set to STORE (no compression), you will need to use some third party library.
The .NET classes in System.IO.Compression use DEFLATE by default. There is no apparent way to change this and use another compression method or algorithm.
Providing CompressionLevel.NoCompression just tells the DEFLATE algorithm to work with the lowest compression rate1. In terms of file size, this will probably give you roughly the same end result, anyway.
Third party libraries supporting the method STORE include:
SharpCompress
(see supported formats)
SharpZipLib
(see compression methods)
DotNetZip
1 which should be... no compression. See DEFLATE's non-compressed blocks

For anyone who happens to see this topic later on, I would highly recommend the ZipStorer class by Jaime Olivares:
https://github.com/jaime-olivares/zipstorer
It's easy to add this code to a C# project (not a DLL), and it's easy to add files using 'store' instead of 'deflate'.

packaging files to be read without need for extraction

In my current project i'm dealing with a huge number of files (over tens of milliard files with low volume-between 1 and 30 KB) as resources which copying them for my customer is time consuming job. i'm searching for a packaging mechanism that can help me to package each 1000 or 10000 one of them into a single file,resulting more copy speed because in that case i'm dealing with much less count of files; and also reading them from my application should not need any extraction and also no compression while i'm writing or changing them (because of the performance and nature of application which is distributed and resources are being shared between clients),I have searched and i know about following ZIP libraries:
SharpZipLib
DotNetZip
System.IO.Packaging
But seems above libraries need to be -at least- iterated through files to access a file in the zip or package without extraction. i need to access the files via their address (folder structure hierarchy) in the zip or package file! following links are similar questions which are answered via Iterating through the zip file:
how-to-read-data-from-a-zip-file-without-having-to-unzip-the-entire-file
content-inside-zip-file
Has anyone any idea or solution about this issue?
By the way,i'm coding in C# and the project is windows form-based.

I would do my own Package Format. With GZipStream or something else. For each files, you compress them with GZipStream, after you get the bytes values and you need to create a header in your Package Format which contains for each files (name, starting position and length). With this data in your header, that will probably by at the beginning of your package. You can get the information for your wanted file and after you just seek to the position of the compressed data, you get the byte array with the specified length.
But if you modify one files, you will need to recalculate all index after the modified files.

Is it possible decompress a zip file while maintaining hierarchy using just .NET or some other built-in Windows API?

I have a zip file that contains folder hierarchies and files.
\images\
\images\1.jpg
\images\2.jpg
\something\something\a.exe
\something\something\b.exe
1.txt
I need to decompress the contents of this zip file to a location. I also need to preserve the structure of the zip file.
I've read about .NET's GZipStream and DeflateStream but I am of the opinion that it is too "complicated" for my purpose.
I've also used DotNetZip and SharpZipLib in the past for personal projects but since this is work related and I'm working at a huge company, I would have a hard time convincing legal to use these libraries.
Question:
Is it possible decompress a zip file while maintaining hierarchy using just .NET or some other built-in Windows API?
PS: I've also read this but I think it's hacky because you'll need to produce another executable just to hide the progress dialog.
Thanks!

Check out if Ionic Zip helps?

DotNetZip would do what you want, but I understand your concerns about legal approval.
On a side note, It might be good for you to navigate the legal jungle associated with getting an open-source library approved for use in the company, just to understand what's involved. But I'll leave that up to you.
Getting back to rolling your own...
DotNetZip is pretty full featured, and it handles a number of scenarios you probably don't care about. Like Unicode filenames and comments, setting windows timestamps and permissions of extracted files, getting timestamps of zip files created on old unix systems, split archives, Encrypted archives, files over 2gb, or self-extracting archives, etc etc etc. Many zip files use none of those things.
Also DotNetZip does eventing and zip updates and zip creation - all the code associated with these things is probably not of interest to you, if you confine yourself just to the requirements you described in your question.
You could, though, grab the DotNetZip code and use it to help you roll your own solution. If you constrain yourself to JUST reading zip files and not dealing with all the possible special cases, the zip format is not difficult to parse.
here's how to do it:
open the zip file using new FileStream() or File.Open. You want a FileStream object.
Read 4 bytes. Verify that it is the zip-entry-header descriptor. (0x04034b50)
In the file, the order you will find these bytes is 50 4b 03 04.
if you find a match, you're in business.
at offset 14 is a 4-byte CRC. Get it. (Same byte ordering as above)
at offset 18 - the 4-byte length of the compressed blob. get it. (N)
at offset 22 - the 4-byte length of the UNcompressed blob. get it. (U)
at 26 - the 2-byte length of the filename. get it (L)
at 28 - the 2-byte length of the "extra field". get it (E)
Beyond the extra field, at offset 30, is the actual filename. read L bytes for the filename, and call System.Text.Encoding.ASCII.GetString(). The result will include a directory path, with the backslashes replaced with slashes (unix style). String.Replace() the slashes.
after the filename comes the extra field - seek E bytes to get beyond it. You can mostly ifgnore it. This is where the compressed data starts.
Open a System.IO.DeflateStream() on the zip FileStream, using CompressionMode.Decompress, and using the current offset of the FileStream as input. open a new FileStream, for output, with the file path you read in step 3. in a loop, call inflater.Read(). and output.Write(), to write the decompressed output of the DeflateStream to a filesystem file with the correct name. You will need to stop reading from the DeflateStream when you read exactly U (uncompressed) bytes.
Check the uncompressed size (U) against the data you actually wrote out from the DeflateStream (after compression). They should match.
If you are fancy, you can check the CRC of the output against what was in the header.
go to step 2, to look for the next entry in the file.
The most complicated part is step 3. Working code for that is easily found in this source module, look for the ReadHeader method.

Maybe the full features set of GZipStream it's a bit complicated, but note that the sample in the msdn page it's exactly what you need. I mean this msdn web (the 4.0 version) not the one you supply in the question.
http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx#Y2750

Unzip file while reading it

I have hundreds of CSV files zipped. This is great because they take very little space but when it is time to use them, I have to make some space on my HD and unzip them before I can process. I was wondering if it is possible with .NET to unzip a file while reading it. In other words, I would like to open a zip file, start to decompress the file and as we go, process the file.
So there would be no need for extra space on my drive. Any ideas or suggestions?

Yes. Zip is a streamed format which means that you can use the data as you decompress it rather than having to decompress everything first.
With .net's System.IO.Compression classes you can apply similar compression as used in zip files (Deflate & GZip) to any stream you like, but if you want to work with actual zip format files you'll need a third party library like this one (sharpziplib).

A better solution might be to keep the files decompressed on the drive, but turn on compression on the file system level. This way you'll just be reading CSV files, and the OS will take care of making sure it doesn't take too much space.
Anyhoo, to answer your question, maybe the GZipStream class can help you.

sharpziplib allows for stream-based decompression - see this related question - the item provides similar stream-based Read methods, so you can process each item like you would with any stream.

I'm not sure about zip files, but you could use GZ format with GZipSteam (works like any other input stream). Unfortunately, the entire System.IO.Compression namespace is only 2 classes (the other does DEFLATE).
EDIT: There's a class called ZipPackage. I'm not sure how if it will let you do decompression streaming, but it might be worth looking into.
Also, take a look at #ziplib.

Best process for auto-zipping of multiple MP3s

I've got a project which requires a fairly complicated process and I want to make sure I know the best way to do this. I'm using ASP.net C# with Adobe Flex 3. The app server is Mosso (cloud server) and the file storage server is Amazon S3. The existing site can be viewed at NoiseTrade.com
I need to do this:
Allow users to upload MP3 files to
an album "widget"
After the user has uploaded their
album/widget, I need to
automatically zip the mp3 (for other
users to download) and upload the
zip along with the mp3 tracks to
Amazon S3
I actually have this working already (using client side processing in Flex) but this no longer works because of Adobe's flash 10 "security" update. So now I need to implement this server-side.
The way I am thinking of doing this is:
Store the mp3 in a temporary folder
on the app server
When the artist "publishes" create a
zip of the files in that folder
using a c# library
Start the amazon S3 upload process (zip and mp3s)
and email the user when it is
finished (as well as deleting the
temporary folder)
The major problem I see with this approach is that if a user deletes or adds a track later on I'll have to update the zip file but the temporary files will not longer exist.
I'm at a loss at the best way to do this and would appreciate any advice you might have.
Thanks!

The bit about updating the zip but not having the temporary files if the user adds or removes a track leads me to suspect that you want to build zips containing multiple tracks, possibly complete albums. If this is incorrect and you're just putting a single mp3 into each zip, then StingyJack is right and you'll probably end up making the file (slightly) larger rather than smaller by zipping it.
If my interpretation is correct, then you're in luck. Command-line zip tools frequently have flags which can be used to add files to or delete files from an existing zip archive. You have not stated which library or other method you're using to do the zipping, but I expect that it probably has this capability as well.

MP3's are compressed. Why bother zipping them?

I would say it is not necessary to zip a compressed file format, you are only gong to get a five percent reduction in filesize, give or take a little. Mp3's dont really zip up by their nature the have compressed most of the possible data already.

DotNetZip can zip up files from C#/ASP.NET. I concur with the prior posters regarding compressibility of MP3s. DotNetZip will automatically skip compression on MP3, and just store the file, just for this reason. It still may be interesting to use a zip as a packaging/archive container, aside from the compression.
If you change the zip file later (user adds a track), you could grab the .zip file from S3, and just update it. DotNetZip can update zip files, too. But in this case you would have to pay for the transfer cost into and out of S3.
DotNetZip can do all of this with in-memory handling of the zips - though that may not be feasible for large archives with lots of MP3s and lots of concurrent users.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.