Are System.IO.Compression.GZipStream or System.IO.Compression.DeflateStream compatible with zlib compression?
I ran into this issue with Git objects. In that particular case, they store the objects as deflated blobs with a Zlib header, which is documented in RFC 1950. You can make a compatible blob by making a file that contains:
Two header bytes (CMF and FLG from RFC 1950) with the values 0x78 0x01
CM = 8 = deflate
CINFO = 7 = 32 KB window
FCHECK = 1 = checksum bits for this header
The output of the C# DeflateStream
An Adler32 checksum of the input data to the DeflateStream, big-endian format (MSB first)
I made my own Adler implementation:
public class Adler32Computer
{
    // Largest prime below 65536, as specified by RFC 1950.
    private const uint Modulus = 65521;

    private uint a = 1;
    private uint b = 0;

    // Unsigned, so the high half cannot overflow into the sign bit.
    public uint Checksum
    {
        get { return (b << 16) | a; }
    }

    public void Update(byte[] data, int offset, int length)
    {
        for (int counter = 0; counter < length; ++counter)
        {
            a = (a + data[offset + counter]) % Modulus;
            b = (b + a) % Modulus;
        }
    }
}
And that was pretty much it.
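The recipe above can be checked end to end with a short sketch. It is written in Python only because its zlib module can produce a raw DEFLATE stream and then read the hand-built container back. The raw stream here stands in for the C# DeflateStream output, and the function name and sample payload are made up for illustration.

```python
import struct
import zlib


def make_zlib_blob(data: bytes) -> bytes:
    """Wrap a raw DEFLATE stream in an RFC 1950 (zlib) container by hand."""
    # Negative wbits asks zlib for raw DEFLATE with no header or trailer,
    # standing in for what the C# DeflateStream would emit.
    compressor = zlib.compressobj(level=1, wbits=-15)
    raw_deflate = compressor.compress(data) + compressor.flush()

    header = bytes([0x78, 0x01])                      # CMF, FLG
    trailer = struct.pack(">I", zlib.adler32(data))   # Adler-32, MSB first
    return header + raw_deflate + trailer


payload = b"hello world\n"          # illustrative input
blob = make_zlib_blob(payload)
restored = zlib.decompress(blob)    # a stock zlib decoder reads it back
```

If the header or the checksum were wrong, zlib.decompress would raise zlib.error instead of returning the payload.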
DotNetZip includes a DeflateStream, a ZlibStream, and a GZipStream, to handle RFC 1951, 1950, and 1952 respectively. They all use the DEFLATE algorithm, but the framing and header bytes are different for each one.
As an advantage, the streams in DotNetZip do not exhibit the anomaly of expanding data size under compression, reported against the built-in streams. Also, there is no built-in ZlibStream, whereas DotNetZip gives you that, for good interop with zlib.
From MSDN about System.IO.Compression.GZipStream:
This class represents the gzip data format, which uses an industry standard algorithm for lossless file compression and decompression.
From the zlib FAQ:
The gz* functions in zlib on the other hand use the gzip format.
So zlib and GZipStream should be interoperable, but only if you use the zlib functions for handling the gzip-format.
System.IO.Compression.DeflateStream and zlib are reportedly not interoperable.
If you need to handle zip files (you probably don't, but someone else might need this) you need to use SharpZipLib or another third-party library.
I've used GZipStream to compress the output from the .NET XmlSerializer and it has worked perfectly fine to decompress the result with gunzip (in cygwin), winzip and another GZipStream.
For reference, here's what I did in code:
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write);
using (GZipStream gzStream = new GZipStream(fs, CompressionMode.Compress))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
    serializer.Serialize(gzStream, myData);
}
Then, to decompress in C#:
FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read);
using (Stream input = new GZipStream(fs, CompressionMode.Decompress))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
    myData = (MyDataType) serializer.Deserialize(input);
}
Using the 'file' utility in cygwin reveals that there is indeed a difference between the same file compressed with GZipStream and with GNU gzip (probably header information, as others have stated in this thread). This difference, however, does not seem to matter in practice.
gzip is deflate + some header/footer data, like a checksum and length, etc. So they're not compatible in the sense that one method can use a stream from the other, but they employ the same compression algorithm.
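A small sketch can make the framing difference concrete. It uses Python's zlib module, where the wbits argument selects the container, to compress the same bytes three ways and peel the wrappers off again. The helper name and sample data are illustrative, and the 10-byte gzip header assumes the minimal header zlib writes when no file name is stored.

```python
import zlib


def compress_as(data: bytes, wbits: int) -> bytes:
    """Compress with zlib; wbits picks the container format."""
    c = zlib.compressobj(level=6, wbits=wbits)
    return c.compress(data) + c.flush()


data = b"the same bytes, three containers " * 20
raw = compress_as(data, -15)   # raw DEFLATE (RFC 1951)
zl = compress_as(data, 15)     # zlib (RFC 1950)
gz = compress_as(data, 31)     # gzip (RFC 1952)

# gzip as written by zlib: 10-byte header + DEFLATE body + 8-byte trailer
gz_body = gz[10:-8]
# zlib: 2-byte header + DEFLATE body + 4-byte Adler-32
zl_body = zl[2:-4]
```

With identical settings the two extracted bodies match the raw stream byte for byte, which is why a decoder for one container cannot read another without first stripping or adding the wrapper.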
They just compress the data using the zlib or deflate algorithms, but do not produce output in any specific file format. This means that if you store the stream as-is on disk, you will most probably not be able to open it with an application such as gzip or WinRAR, because the file headers (magic number, etc.) are not included in the stream and you have to write them yourself.
Starting from .NET Framework 4.5 the System.IO.Compression.DeflateStream class uses the zlib library.
From the class's MSDN article:
This class represents the Deflate algorithm, which is an industry-standard algorithm for lossless file compression and decompression. Starting with the .NET Framework 4.5, the DeflateStream class uses the zlib library. As a result, it provides a better compression algorithm and, in most cases, a smaller compressed file than it provides in earlier versions of the .NET Framework.
I agree with andreas. You probably won't be able to open the file in an external tool, but if that tool expects a stream you might be able to use it. You would also be able to deflate the file back using the same compression class.
I have a file that has been compressed with zlib in Python, and I would like to compress the same uncompressed file with Ionic.Zlib in C# and get exactly the same compressed output as Python's.
I was using System.IO.Compression and DeflateStream at first, but the result was nowhere near identical. I am now using the Ionic.Zlib library for C# and getting closer to my goal, but some bytes are still different (or a lot of them, depending on the file).
This is the Python code:
import zlib

def compress():
    with open("compressedFile.dat", "wb") as compressedFile:
        with open("fileToCompress.txt", "rb") as fileToCompress:
            data = fileToCompress.read()
            compressedData = zlib.compress(data, 9)
            compressedFile.write(compressedData)
and this is what I wrote in C# to try to get the same compressed output file:
using System;
using System.IO;
using Ionic.Zlib;

static class myClass
{
    static void compress()
    {
        var compressedData = Ionic.Zlib.ZlibStream.CompressBuffer(File.ReadAllBytes("fileToCompress.txt"));
        // The using block makes sure the writer is flushed and the file is closed.
        using (BinaryWriter compressedFile = new BinaryWriter(new FileStream("compressedFile.dat", FileMode.Create)))
        {
            compressedFile.Write(compressedData);
        }
    }

    static void Main()
    {
        compress();
    }
}
Compression level is the same (9) and the compression header (first 2 bytes) are identical in both compressed files (78 DA). Next 3 bytes seem to be identical as well (EC 7D 0B) and then the rest really depends on the input uncompressed file... The first one I am trying to compress only has 2 bytes that are different among the 4 last bytes: **6E A5** 55 53 (Python) vs **6C 02** 55 53 (c#).
Thank you!
EDIT: SOLVED
For anyone who would like to know how to get exactly the same compression as Python's zlib.compress in C#, use zlibnet.
Get zlibnet.dll from one of zlibnet's releases and use ZLibNet.ZLibStream(<output stream>, CompressionMode.Compress, CompressionLevel.Level9); change CompressionLevel.Level9 to the level used in Python.
Example:
// fileStream is an already-open stream over the uncompressed input file.
MemoryStream memoryStream = new MemoryStream();
using (var compressor = new ZLibNet.ZLibStream(memoryStream, CompressionMode.Compress, CompressionLevel.Level9))
{
    fileStream.CopyTo(compressor);
    compressor.Close();
}
You need only write memoryStream to a file now.
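When two zlib implementations disagree, it can help to split the output into its three parts before comparing. The header depends only on the window size and compression level, and the trailer is the Adler-32 of the uncompressed input. Only the DEFLATE body in between is allowed to vary between encoders. A rough Python sketch of that split follows, with the sample input as a stand-in for the real file.

```python
import struct
import zlib

data = b"example input " * 50   # stand-in for the file being compressed

for level in (1, 6, 9):
    out = zlib.compress(data, level)
    header, body, trailer = out[:2], out[2:-4], out[-4:]
    # Header changes with the level; trailer is the same for every level.
    print(level, header.hex(), len(body), trailer.hex())

best = zlib.compress(data, 9)
expected_trailer = struct.pack(">I", zlib.adler32(data))
```

A different trailer means the two programs did not compress the same input bytes. A different body with matching header and trailer points at the encoder itself.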
I'm currently reading from a SQLite DB.
I am having trouble with one column, though.
The data in that column is compressed.
In Java we can use zlib and read that data easily:
data = zlib.decompress(row[3])
I see that Xamarin does not expose zlib in its IDE and has no standard built-in alternative.
I've seen some Zip components available, but they concentrate on files rather than on feeding data in directly.
How would you do this in Xamarin C#?
EDIT : Code So Far
var myCompressedData = cursor.GetBlob(cursor.GetColumnIndexOrThrow("TextCompressed"));
byte[] myCompressedByte = myCompressedData;
MemoryStream stream = new MemoryStream(myCompressedByte);
using (DeflateStream decompressionStream = new DeflateStream(stream, CompressionMode.Decompress))
{
    decompressionStream.Read(myCompressedByte, 0, myCompressedByte.Length);
}
string UnCompressedString = System.Text.Encoding.UTF8.GetString(myCompressedByte);
Somehow I'm getting a "{System.IO.IOException: Corrupted data ReadInternal at System.IO.Compression.DeflateStreamNative.Ch…} System.IO.IOException"
This Exception hits on
decompressionStream.Read(myCompressedByte , 0,myCompressedByte.Length );
You can achieve this by using the DeflateStream class.
It wraps zlib, or in older versions provides a standard built-in alternative.
This class represents the Deflate algorithm, which is an industry-standard algorithm for lossless file compression and decompression. Starting with the .NET Framework 4.5, the DeflateStream class uses the zlib library.
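One caveat worth illustrating: a decoder for raw DEFLATE does not understand the two-byte header that zlib.compress writes in front of the data, which is a plausible cause of a "Corrupted data" error on a zlib-compressed column. The Python sketch below reproduces that with the zlib module, where wbits=-15 means raw DEFLATE. The sample text is made up, and the assumption that the column was written with zlib.compress comes from the question rather than from anything verified here.

```python
import zlib

text = "compressed column value".encode("utf-8")   # illustrative content
stored = zlib.compress(text)   # assumed to match what was written to the DB

# A raw-DEFLATE decoder (wbits=-15) trips over the zlib header bytes...
try:
    zlib.decompress(stored, -15)
    raw_ok = True
except zlib.error:
    raw_ok = False

# ...but decodes the same data once the 2-byte header is skipped.
# The 4-byte Adler-32 trailer is left over as unused data.
inflater = zlib.decompressobj(-15)
recovered = inflater.decompress(stored[2:])
leftover = inflater.unused_data
```

The commonly reported equivalent in C# is to advance the input stream past the first two bytes before wrapping it in a DeflateStream, and to read into a separate output buffer rather than the input array.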
I'm trying to decompress data compressed with the zlib algorithm in C#, using the two most established zlib-compatible libraries, and both throw a similar exception.
Using DotNetZip:
Ionic.Zlib.ZlibException: Bad state (invalid stored block lengths)
Using Zlib.Net:
inflate: invalid stored block lenghts
But feeding the same data to the zlib-flate command on Linux, with default parameters only, works fine and decompresses without any warnings (the output is correct):
zlib-flate -uncompress < ./dbgZlib
Any suggestions on what I can do to decompress this data in C#, or on why decompression actually fails in this case?
Compressed data as hex:
root#localhost:~# od -t x1 -An ./dbgZlib |tr -d '\n '
789c626063520b666060606262d26160d05307329999e70a6400e93c2066644080cf8c938c0c0c4d0d0d0d2d839c437c02dcfd0c0c0c11d28ea121013e7e41860ce18e210640e06810141669c080051840012eb970d790800090f99eee409ea189025e806c8e8b5354a89b13d81c136ca60f3a000e5fd6af0fb14a3221873e96400506374cd6c7d52dc8d98980657e7e06460ace0a4ce86e80da9f0249030edf816c16481ab06b60404f03931169c0cdc728c0db0fd928681a3042a481480347336c6e21320d78fb8155195a9090067ca3420387771a400a546aa70100000000ffff
Compressed data as base64:
root#localhost:~# base64 ./dbgZlib
eJxiYGNSC2ZgYGBiYtJhYNBTBzKZmecKZADpPCBmZECAz4yTjAwMTQ0NDS2DnEN8Atz9DAwMEdKO
oSEBPn5BhgzhjiEGQOBoEBQWacCABRhAAS65cNeQgACQ+Z7uQJ6hiQJegGyOi1NUqJsT2BwTbKYP
OgAOX9avD7FKMiGHPpZABQY3TNbH1S3I2YmAZX5+BkYKzgpM6G6A2p8CSQMO34FsFkgasGtgQE8D
kxFpwM3HKMDbD9koaBowQqSBSANHM2xuITINePuBVRlakJAGfKNCA4d3GkAKVGqnAQAAAAD//w==
The data after decompression, encoded with base64, looks like this:
root#localhost:~# zlib-flate -uncompress < ./dbgZlib | base64
AAYCJlMAAAACAgIsAAAuJwAAAAMDnRBoAAAAbgAAAAEAAAAAAAAAAAAAAPMBkjIwMTUxMTE5UkNU
TFBHTjAwMQAAAAAAAAAAAABBVVRQTE5SMQBXQVQwMDAwQTBSVlkwAAAAAAAAAAAAAAAAAAAAAAAA
AAAwMDAwMDAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAMDAwMFdFVFBQTFBHTklHMDAwMTQgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAwMAAAAAAAAAAAAABEQlpVRkIAAAAAMDQAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAX14QAAAAAAAAAA
AAAAAAAAAAAAAAAAAAIBAAAAAAAAAAAAAABBVDAwMDBBMFJWWTAAAAAAAAAAUExOAAAAAAAAAABM
RUZSQ0IAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATk4wMiBDIAIAAAAAAAAAAAAAAAAA
AAABBfXhAAAAAGQCAgIsAABA9wAAAAQDnRBoAAA+gAAAAAEAAAAAAAAAAAAAAPMBkzIwMTUxMTE5
UkNGTDJQS04AAAAAAAAAAAAAAABBVVRQTE5SMgBXQVQwMDAwQTBZMEE2AAAAAAAAAAAAAAAAAAAA
AAAAAAAwMDAwMDAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAMDAwMFdFVFBQTFBLTjAwMDAwMTggICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAwMAAAAAAAAAAAAABETVpVUUIAAAAAMDQAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAX14QAAAAAA
AAAAAAAAAAAAAAAAAAAAAAIBAAAAAAAAAAAAAABBVDAwMDBBMFkwQTYAAAAAAAAAUExOAAAAAAAA
AABMRUZSQ0IAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATk4wMiBDIAIAAAAAAAAAAAAA
AAAAAAABBfXhAAAAAGQ=
The problem is that you are using zlib-flate as a general-purpose compression algorithm which, according to the manpage for it, you should not do:
This program should not be used as a general purpose compression tool.
Use something like gzip(1) instead.
So perhaps you should follow the instructions given by your tools and not use them for things that they are not intended for. Use gzip and the System.IO.Compression.GZipStream instead, it's much simpler, especially when you're looking for cross-platform compatible compression algorithms.
That said...
The reason that you can't inflate the data is that it lacks a correct GZIP header. If you add the right header to it you will get something that can be decompressed.
For instance:
public static byte[] DecompressZLibRaw(byte[] bCompressed)
{
    byte[] bHdr = new byte[] { 0x1F, 0x8b, 0x08, 0, 0, 0, 0, 0 };
    using (var sOutput = new MemoryStream())
    using (var sCompressed = new MemoryStream())
    {
        sCompressed.Write(bHdr, 0, bHdr.Length);
        sCompressed.Write(bCompressed, 0, bCompressed.Length);
        sCompressed.Position = 0;
        using (var decomp = new GZipStream(sCompressed, CompressionMode.Decompress))
        {
            decomp.CopyTo(sOutput);
        }
        return sOutput.ToArray();
    }
}
Adding the header makes all the difference.
NB: There are two bytes in the 10-byte GZIP header that are not stripped from your source. These are normally used to store the compression flags and the source file system. In the compressed data you present they are invalid values. Additionally the file footer is abbreviated to 5 bytes instead of 8 bytes... all of which is not actually required for decompression. Which probably has a lot to do with why the manpage says not to use this for general compression.
The stream you provided is not complete. It appears that you ended it with a Z_SYNC_FLUSH or Z_FULL_FLUSH in your C# code, instead of a Z_FINISH like you're supposed to. That is causing the error. If you terminate the stream properly, you won't have a problem.
zlib-flate is simply ignoring that error.
If you are not in control of the generation of the stream, you can still use zlib to decompress what's there. You just need to use it at a lower level where you operate on blocks of data and get the decompressed data available given the provided input.
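As a sketch of that lower-level approach, Python's zlib module offers both a one-shot call that insists on a finished stream and a streaming object that returns whatever the available input decodes to. The stream below is deliberately ended with a sync flush and never finished, so it ends in the same 00 00 ff ff marker as the hex dump in the question. The sample records are invented for illustration.

```python
import zlib

original = b"record-0001;record-0002;record-0003;" * 10

# Produce a zlib stream that ends with a sync flush and is never finished.
producer = zlib.compressobj()
unfinished = producer.compress(original) + producer.flush(zlib.Z_SYNC_FLUSH)

# The one-shot API requires a complete stream and rejects this input.
try:
    zlib.decompress(unfinished)
    one_shot_ok = True
except zlib.error:
    one_shot_ok = False

# The streaming API returns everything decodable from the input so far.
consumer = zlib.decompressobj()
partial = consumer.decompress(unfinished)
stream_finished = consumer.eof   # stays False: no final block was seen
```

Because a sync flush pushes out all pending data, the streaming decoder recovers the full original here even though the stream is formally incomplete.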
I'm working on the EBICS protocol and I want to read data from an XML file to compare it with another file.
I have successfully decoded the data from base64 using
Convert.FromBase64String(OrderData); but now I have a byte array.
To read the content I have to unzip it. I tried to unzip it using GZip, as in this example:
static byte[] Decompress(byte[] data)
{
    using (var compressedStream = new MemoryStream(data))
    using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (var resultStream = new MemoryStream())
    {
        zipStream.CopyTo(resultStream);
        return resultStream.ToArray();
    }
}
But it does not work; I get this error message:
the magic number in gzip header is not correct. make sure you are passing in a gzip stream
Now I have no idea how to unzip it, please help me!
Thanks !
The first four bytes provided by the OP in a comment to another answer: 0x78 0xda 0xe5 0x98 is the start of a zlib stream. It is neither gzip, nor zip, but zlib. You need a ZlibStream, which for some reason Microsoft does not provide. That's fine though, since what Microsoft does provide is buggy.
You should use DotNetZip, which provides ZlibStream, and it works.
Try using SharpZipLib. It copes with various compression formats and is free under the GPL license.
As others have pointed out, I suspect you have a zip stream and not gzip. If you check the first 4 bytes in a hex view, ZIP files always start with 0x04034b50 (see the ZIP file format wiki), whereas GZIP files start with 0x8b1f (see the GZIP file format wiki).
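Checking the leading bytes is easy to automate. The following is a minimal Python sketch, with a made-up function name, that separates the three candidates discussed in this thread. The gzip and ZIP checks are plain magic numbers. The zlib check follows RFC 1950: the low nibble of the first byte is 8, and the first two bytes read as a big-endian number are a multiple of 31.

```python
def sniff_container(data: bytes) -> str:
    """Guess the compression container from the leading bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:4] == b"PK\x03\x04":      # 0x04034b50 stored little-endian
        return "zip"
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # RFC 1950: CM == 8 (deflate) and (CMF * 256 + FLG) % 31 == 0
        if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    return "unknown"


# The four bytes quoted from the question's data.
kind = sniff_container(bytes([0x78, 0xDA, 0xE5, 0x98]))
```

The zlib test is only a heuristic, since two arbitrary bytes can pass it by chance, but it is enough to tell these formats apart before picking a decoder.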
I think I finally got it - as usual the problem is not what is in the title. Luckily I've noticed the word EBICS in your post. So, according to EBICS spec the data is first compressed, then encrypted and finally base64 encoded. As you see, after decoding base64 you need first to decrypt the data and then try to unzip it.
UPDATE: If that's not the case, it turns out from the EBICS spec Chapter 16 Appendix: Standards and references that ZIP refers to zlib/deflate format, so all you need to do is to replace GZipStream with the DeflateStream
The MSDN documentation tells me the following:
The GZipStream class uses the gzip data format, which includes a cyclic redundancy check value for detecting data corruption. The gzip data format uses the same compression algorithm as the DeflateStream class.
It seems GZipStream adds some extra data to the output (relative to DeflateStream). I'm wondering, in what type of a scenario would it be essential to use GZipStream and not DeflateStream?
Deflate is just the compression algorithm. GZip is actually a format.
If you use the GZipStream to compress a file (and save it with the extension .gz), the result can actually be opened by archivers such as WinZip or the gzip tool. If you compress with a DeflateStream, those tools won't recognize the file.
If the compressed file is designed to be opened by these tools, then it is essential to use GZipStream instead of DeflateStream.
I would also consider it essential if you're transferring a large amount of data over an unreliable medium (e.g. an internet connection) and not using an error-correcting protocol such as TCP/IP. For example, you might be transmitting over a serial port, raw socket, or UDP. In this case, you would definitely want the CRC information that is embedded in the GZip format in order to ensure that the data is correct.
GZipStream is the same as DeflateStream, but it adds a CRC checksum so that errors in the data can be detected.
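What the CRC buys you can be shown with a contrived sketch in Python's zlib module, where wbits=31 selects gzip and wbits=-15 selects raw DEFLATE. To simulate data that was altered in transit but still happens to be valid DEFLATE, the body of a gzip stream is swapped for the compressed form of slightly different data while the original header and trailer are kept. The helper name and messages are invented, and the 10-byte header assumes the minimal gzip header zlib writes.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data; wbits=31 gives gzip, wbits=-15 gives raw DEFLATE."""
    c = zlib.compressobj(wbits=wbits)
    return c.compress(data) + c.flush()


sent = b"amount=100;" * 20
altered = b"amount=900;" + sent[11:]     # same length, one byte differs

gz = deflate(sent, 31)
# Keep the original gzip header (10 bytes) and trailer (8 bytes: CRC-32
# plus length), but splice in a valid DEFLATE body for the altered data.
tampered = gz[:10] + deflate(altered, -15) + gz[-8:]

try:
    zlib.decompress(tampered, 31)
    crc_caught_it = False
except zlib.error:
    crc_caught_it = True

# A raw DEFLATE decoder has no checksum, so it returns the altered bytes.
silently_wrong = zlib.decompress(tampered[10:-8], -15)
```

The gzip decoder rejects the stream because the decoded bytes no longer match the stored CRC-32, while the raw decoder has nothing to compare against.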
Well, I was completely wrong in my first answer. I looked at the Mono source code and found that the GZipStream class actually redirects its read/write (and almost all other) calls to the corresponding methods of an internal DeflateStream object:
public override int Read (byte[] dest, int dest_offset, int count)
{
    return deflateStream.Read(dest, dest_offset, count);
}

public override void Write (byte[] src, int src_offset, int count)
{
    deflateStream.Write (src, src_offset, count);
}
The only difference is that it always creates the DeflateStream object with the gzip flag set to true.
This is certainly not an answer to your question, but maybe it'll help a bit.
While GZipStream seems to be using DeflateStream to do decompression, the two formats don't seem to be interchangeable. The following test code will throw an exception:
MemoryStream wtt = new MemoryStream();
using (var gs = new GZipStream(wtt, CompressionMode.Compress, true))
{
    using (var sw = new StreamWriter(gs, Encoding.ASCII, 1024, true))
    {
        sw.WriteLine("Hello");
    }
}
wtt.Position = 0;
using (var ds = new DeflateStream(wtt, CompressionMode.Decompress, true))
{
    using (var sr = new StreamReader(ds, Encoding.ASCII, true, 1024, true))
    {
        var txt = sr.ReadLine();
    }
}
Ditto, as per Aaronaught.
Note one other important difference as per
http://www.webpronews.com/gzip-vs-deflate-compression-and-performance-2006-12:
I measured the DeflateStream to 41% faster than GZip.
I didn't measure the speed, but I measured the file size to be appx. the same.