Compressing XML before writing file


I'm trying to save a large amount of data to an XML file, and the file ends up very large. I've searched for compression examples, but all the ones I found first write the file and then read it back to compress it into another file, leaving both the large and the compressed files on disk. The closest I got to removing the intermediate write-then-read step produced a zip containing an extension-less file (which I can open in Notepad as XML).
This is what I have now:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
using (FileStream outFile = File.Create(@"File.zip"))
{
    using (GZipStream Compress = new GZipStream(outFile, CompressionMode.Compress))
    {
        using (XmlWriter writer = XmlWriter.Create(Compress, settings))
        {
            //write the XML
        }
    }
}
How do I make the file inside the zip have the XML extension?

I think this might be a little misunderstanding of fundamentals. From what I know, GZip is a compression system, but not an archiving system. When working with UNIX systems, they tend to be treated as two separate things (whereas ZIP or RAR compression does both). Archiving puts a number of files in one file, and compression makes that file smaller.
Have you ever seen Unix packages that are downloaded as "filename.tar.gz"? That's generally the naming format - they took an archive file (filename.tar) and applied GZip compression to it (filename.tar.gz)
Actually, you're causing a bit of confusion by naming your file ".zip", which is a completely different, more commonly used format. If you want to follow UNIX traditions, just name your file "file.xml.gz". If you want to archive it, use a Tar archiving library. Other libraries, such as 7-zip's, may have simpler systems that do both archiving and compression for you, for instance if you want this file to be a .zip that is easily read by people on Windows computers.
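If a real .zip is wanted, one option on .NET 4.5 and later is the ZipArchive class in System.IO.Compression, which lets the XML be written straight into a named entry with no intermediate file. The following is only a minimal sketch: the class name, method name, and the placeholder elements are made up for illustration, and the entry name is whatever you pass in (for example "File.xml").

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

static class XmlZipSketch
{
    // Writes a small XML document directly into a named entry of a new zip file.
    // zipPath and entryName are supplied by the caller (hypothetical helper).
    public static void WriteZippedXml(string zipPath, string entryName)
    {
        var settings = new XmlWriterSettings { Indent = true };

        using (FileStream outFile = File.Create(zipPath))
        using (var archive = new ZipArchive(outFile, ZipArchiveMode.Create))
        {
            // The entry name is what shows up inside the zip, extension included.
            ZipArchiveEntry entry = archive.CreateEntry(entryName, CompressionLevel.Optimal);

            using (Stream entryStream = entry.Open())
            using (XmlWriter writer = XmlWriter.Create(entryStream, settings))
            {
                // Placeholder content; replace with the real XML output.
                writer.WriteStartElement("root");
                writer.WriteElementString("item", "value");
                writer.WriteEndElement();
            }
        }
    }
}
```

Calling XmlZipSketch.WriteZippedXml("File.zip", "File.xml") should produce a zip whose single entry is named File.xml. If plain gzip is enough, keeping the original GZipStream code and naming the output "File.xml.gz" avoids the archive layer entirely.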

I think you have to write to a temp file first. Take a look at DotNetPerls.

Related

Create Uncompressed Zip in C#

I have a question about the ZipArchive class in System.IO.Compression.
I want to create an uncompressed .zip file. My code so far looks like this:
//Creates a "Deflate"-Mode file in the created zip.
using (FileStream fs = new FileStream(zippath, FileMode.OpenOrCreate))
using (ZipArchive zip = new ZipArchive(fs, ZipArchiveMode.Update))
{
    var demoFile = zip.CreateEntry("foo0.txt", CompressionLevel.NoCompression); //NoCompression does not seem to have an impact
    using (var streamWriter = new StreamWriter(demoFile.Open()))
    {
        streamWriter.Write("Bar!");
    }
}
That creates a zip file in which the entry was written in DEFLATE mode, not STORE. How can I fix this? I thought using CompressionLevel.NoCompression would solve the problem.
Also, writing the files to the file system and zipping the directory is not an option, because I want to create a zip file with potentially hundreds of thousands of small files. Furthermore, just using GZipStream is not an option, because I want to create a directory structure in the .zip file.
I checked the mode with 7-zip:
(screenshot from 7-zip)

If for whatever reason you are required to add contents to a ZIP file with its compression method explicitly set to STORE (no compression), you will need to use some third party library.
The .NET classes in System.IO.Compression use DEFLATE by default. There is no apparent way to change this and use another compression method or algorithm.
Providing CompressionLevel.NoCompression just tells the DEFLATE algorithm to work with the lowest compression rate [1]. In terms of file size, this will probably give you roughly the same end result anyway.
Third-party libraries supporting the STORE method include:
SharpCompress (see supported formats)
SharpZipLib (see compression methods)
DotNetZip
[1] Which should be... no compression. See DEFLATE's non-compressed blocks.
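To check the size claim on a given runtime, the entry can be written with CompressionLevel.NoCompression and then read back to compare its two size fields. This is only a sketch: the class and method names are invented for illustration, and the exact compressed byte count (and the method recorded in the zip header) may differ between .NET versions.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class NoCompressionCheck
{
    // Writes one entry with CompressionLevel.NoCompression into an in-memory zip,
    // then reopens the archive and returns (uncompressed size, compressed size).
    public static Tuple<long, long> Measure(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                ZipArchiveEntry entry = zip.CreateEntry("foo0.txt", CompressionLevel.NoCompression);
                using (var writer = new StreamWriter(entry.Open()))
                {
                    writer.Write(text);
                }
            }

            // Sizes are only readable once the archive has been finished and reopened.
            buffer.Position = 0;
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Read))
            {
                ZipArchiveEntry entry = zip.GetEntry("foo0.txt");
                return Tuple.Create(entry.Length, entry.CompressedLength);
            }
        }
    }
}
```

If the second number is equal to or slightly larger than the first, the data was not reduced, which matches the "roughly the same end result" observation above even when the method is reported as DEFLATE.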

For anyone who happens to see this topic later on, I would highly recommend the ZipStorer class by Jaime Olivares:
https://github.com/jaime-olivares/zipstorer
It's easy to add this code to a C# project (not a DLL), and it's easy to add files using 'store' instead of 'deflate'.

"Split or spanned archives are not supported" error while extracting a zip file in C#

When extracting a zip file with the ZipArchive class in C#, it works fine for some files, but for files larger than 1 GB it throws "Split or spanned archives are not supported." I don't understand what that means. How do I resolve it?

I've seen this error when you attempt to extract a Zip file that has become corrupt during a copy process.

DotNetZip allows you to do this. From their documentation:
The library supports zip passwords, Unicode, ZIP64, stream input and
output, AES encryption, multiple compression levels, self-extracting
archives, spanned archives, and more.

With the DotNetZip library you can zip even more than 4 GB. It is also a lot easier to use than working with file streams and byte arrays.
using (ZipFile zip = new ZipFile()) {
    zip.CompressionLevel = CompressionLevel.BestCompression;
    zip.UseZip64WhenSaving = Zip64Option.Always;
    zip.BufferSize = 65536*8; //buffer size
    foreach (var file in filenames) {
        zip.AddFile(file);
    }
    zip.Save(outpath);
}

gzip format streaming

I'm wondering if there's someone here with some experience with the gzip format. I have a very large gzip file that I need to parse. However, I may only need a small portion of the decompressed text file. Is it possible to stream this gzip file without decompressing the entire file?
Does anyone have experience with gzip?

You do realise that you can stream using the standard Java library classes, right? It's quite trivial, something like:
GZIPInputStream stream = new GZIPInputStream(new FileInputStream("some_file.gz"));
// BufferedReader needs a Reader, so wrap the byte stream in an InputStreamReader first.
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
// Now read line by line... till you hit the content you want.
The entire file is not decompressed on the disk, just chunks as you need it in memory. And you can optionally re-compress and write back out again using the corresponding output streams.

Compression issue with large archive of files in DotNetZip

Greetings....
I am writing a backup program in C# 3.5, using the latest DotNetZip. The program is given a location on a server and the maximum size of a spanned zip file. From there it should traverse all the folders and files under that location and add them to the archive, keeping the exact structure. It should also compress everything down to a reasonable size. A given uncompressed collection of folders and files could easily be 10-25 GB, with the created spanned files being limited to about 1 GB each.
I have everything working (using DotNetZip). My only challenge is that little to no compression is actually happening. I chose the "AddDirectory" method for simplicity of code and because it seemed to fit my project well. After reading around, I am second-guessing that decision.
Given the code below and the large number of files in an archive, should I compress each file as it is added to the zip? Or should the AddDirectory method provide about the same compression?
I have tried every level of compression offered by Ionic.Zlib.CompressionLevel and none seems to help. Should I think about using an outside compression algorithm and streaming its output into my DotNetZip file?
using (ZipFile zip = new ZipFile())
{
    zip.AddDirectory(root.FullName);
    if (zipPassword.Length > 0)
        zip.Password = zipPassword;
    float size = zipGbSize * 1024 * 1024 * 1024;
    zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
    zip.AddProgress += new EventHandler<AddProgressEventArgs>(Zip_AddProgress);
    zip.ZipError += new EventHandler<ZipErrorEventArgs>(Zip_ZipError);
    zip.Comment = "This zip was created at " + System.DateTime.Now.ToString("G");
    zip.MaxOutputSegmentSize = (int)size; //in gig
    zip.Name = archiveDir.FullName + @"\Task_" + taskId.ToString() + ".zip";
    zip.Save();
}
Thank you for any help!

1. Given the below code and the large amount of files in an archive, should I compress each file as it is added to the zip?
The way DotNetZip works is to compress each file as it is added to the archive. Your app does not need to do compression. DotNetZip does this for you.
Or should the AddDirectory method provide about the same compression?
Entries added to a zip file via the AddDirectory() method go through the same code path when the zip archive is written as entries added via AddFile(). The file data is compressed, then optionally encrypted, then written to the zip file.
An unsolicited tip: you don't need to do:
zip.AddProgress += new EventHandler<AddProgressEventArgs>(Zip_AddProgress);
You can just do:
zip.AddProgress += Zip_AddProgress;
How are you determining that no compression is occurring?
If you are curious about the compression on each entry, you can register a SaveProgress event handler. The SaveProgress event is fired at various times during the writing of an archive, including when saving begins, when DotNetZip begins writing the data for one entry, at various intervals during the writing of one entry, after finishing writing the data for each entry, and after finishing writing all data. These stages are described in the ZipProgressEventType enumeration. When the EventType is Saving_AfterWriteEntry, you can calculate the compression ratio for THAT particular entry.
To verify that compression is not occurring, I'd suggest that you register such a SaveProgress event and look at that compression ratio.
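As a cross-check that does not depend on DotNetZip at all, the per-entry sizes of a finished archive can be read back with the framework's own ZipArchive class (.NET 4.5 and later). This is a swapped-in technique rather than the SaveProgress approach described above, and only a sketch: the class and method names are hypothetical, and it assumes a single, non-spanned zip file, since ZipArchive does not read split archives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class ZipRatioReport
{
    // Returns, for each non-empty entry, the fraction of space saved:
    // 0.0 means no reduction, values close to 1.0 mean strong compression.
    public static Dictionary<string, double> SavedFractions(Stream zipStream)
    {
        var result = new Dictionary<string, double>();
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                if (entry.Length == 0) continue; // skip directories and empty files
                result[entry.FullName] = 1.0 - (double)entry.CompressedLength / entry.Length;
            }
        }
        return result;
    }
}
```

Opening the archive with File.OpenRead and printing each key/value pair gives a quick picture of which files are actually shrinking and which (already-compressed media, nested zips) are not.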
Also, as described above, some file types cannot be compressed. JPG, MPG, MP3, ZIP files, and others are not very compressible.
Finally, doing a backup may be a lot easier if you just use the DotNetZip command-line tool. If all you want to do is back up a particular directory, you could use the command-line tool (zipit.exe) and avoid writing a program. With the zipit.exe tool, if you use the -v option, the tool prints progress reports and will display the compression for each entry, via the mechanism I described above. Even if you prefer to write your own program, you might consider using zipit.exe to verify that compression is, or is not, occurring when you use DotNetZip.

I'm not sure I have understood your question, but the maximum size for a zip file without the ZIP64 extensions is 4 GB. Maybe you have to create a new ZipFile every time you reach that limit.
Sorry if that doesnt help you.

What sort of data are you compressing? Some sorts of data just doesn't compress very well, for example JPEGs, or ZIP files which are already compressed.

Unzip file while reading it

I have hundreds of CSV files zipped. This is great because they take very little space but when it is time to use them, I have to make some space on my HD and unzip them before I can process. I was wondering if it is possible with .NET to unzip a file while reading it. In other words, I would like to open a zip file, start to decompress the file and as we go, process the file.
So there would be no need for extra space on my drive. Any ideas or suggestions?

Yes. Zip is a streamed format which means that you can use the data as you decompress it rather than having to decompress everything first.
With .NET's System.IO.Compression classes you can apply compression similar to that used in zip files (Deflate and GZip) to any stream you like, but if you want to work with actual zip format files you'll need a third-party library such as SharpZipLib.
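On .NET 4.5 and later, System.IO.Compression also includes a ZipArchive class, so a third-party library is no longer strictly required for reading zip entries as streams. The sketch below shows the idea for zipped CSV files; the class and method names are made up for illustration, and it assumes the caller supplies a seekable stream such as a FileStream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class ZippedCsvReader
{
    // Lazily yields the lines of every .csv entry in the archive.
    // Entry data is decompressed as it is read; nothing is extracted to disk.
    public static IEnumerable<string> ReadCsvLines(Stream zipStream)
    {
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                if (!entry.FullName.EndsWith(".csv", StringComparison.OrdinalIgnoreCase))
                    continue;

                using (var reader = new StreamReader(entry.Open()))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                        yield return line;
                }
            }
        }
    }
}
```

Because the method is an iterator, a caller can stop enumerating as soon as it has found the rows it needs, and the rest of the archive is never decompressed.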

A better solution might be to keep the files decompressed on the drive, but turn on compression on the file system level. This way you'll just be reading CSV files, and the OS will take care of making sure it doesn't take too much space.
Anyhoo, to answer your question, maybe the GZipStream class can help you.

SharpZipLib allows for stream-based decompression (see this related question). Each item provides similar stream-based Read methods, so you can process it like you would any other stream.

I'm not sure about zip files, but you could use the GZ format with GZipStream (it works like any other input stream). Unfortunately, the entire System.IO.Compression namespace is only 2 classes (the other does DEFLATE).
EDIT: There's a class called ZipPackage. I'm not sure whether it will let you do streaming decompression, but it might be worth looking into.
Also, take a look at #ziplib.
