Unzip Byte Array in C#

I'm working with the EBICS protocol and I want to read data from an XML file to compare it with another file.
I have successfully decoded the data from Base64 using
Convert.FromBase64String(OrderData); but now I have a byte array.
To read the content I have to unzip it. I tried to unzip it with GZipStream, as in this example:
static byte[] Decompress(byte[] data)
{
    using (var compressedStream = new MemoryStream(data))
    using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (var resultStream = new MemoryStream())
    {
        zipStream.CopyTo(resultStream);
        return resultStream.ToArray();
    }
}
But it does not work; I get this error message:
the magic number in gzip header is not correct. make sure you are passing in a gzip stream
Now I have no idea how to unzip it. Please help me!
Thanks!

The first four bytes provided by the OP in a comment to another answer: 0x78 0xda 0xe5 0x98 is the start of a zlib stream. It is neither gzip, nor zip, but zlib. You need a ZlibStream, which for some reason Microsoft does not provide. That's fine though, since what Microsoft does provide is buggy.
You should use DotNetZip, which provides ZlibStream, and it works.
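If pulling in a third-party library is not an option, a commonly used workaround is to skip the two-byte zlib header and hand the rest of the buffer to the built-in DeflateStream. The following is only a minimal sketch under that assumption: the names ZlibHelper and DecompressZlib are made up for illustration, the trailing Adler-32 checksum is not verified, and a preset dictionary (FDICT flag) is not handled.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibHelper
{
    // Decompresses an RFC 1950 (zlib) buffer, e.g. one starting with 0x78 0xDA.
    public static byte[] DecompressZlib(byte[] data)
    {
        if (data == null || data.Length < 2)
            throw new ArgumentException("Too short to be a zlib stream.", nameof(data));

        // RFC 1950 header check: compression method 8 (deflate) in the low nibble
        // of CMF, and CMF * 256 + FLG must be a multiple of 31.
        bool looksLikeZlib = (data[0] & 0x0F) == 8 && ((data[0] << 8) | data[1]) % 31 == 0;
        if (!looksLikeZlib)
            throw new InvalidDataException("Missing zlib header.");

        // Skip the 2 header bytes; what follows is raw DEFLATE data plus a
        // 4-byte Adler-32 trailer that this sketch does not check.
        using (var compressed = new MemoryStream(data, 2, data.Length - 2))
        using (var inflater = new DeflateStream(compressed, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            inflater.CopyTo(result);
            return result.ToArray();
        }
    }
}
```

Usage would be along the lines of ZlibHelper.DecompressZlib(Convert.FromBase64String(OrderData)). On .NET 6 and later there is also a built-in System.IO.Compression.ZLibStream class that reads this format directly.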

Try using SharpZipLib. It copes with various compression formats and is free under the GPL license.
As others have pointed out, I suspect you have a ZIP stream and not gzip. If you check the first 4 bytes in a hex view, ZIP files always start with 0x04034b50, stored little-endian as the bytes 50 4B 03 04 (see the ZIP File Format wiki), whereas GZIP files start with 0x8b1f, i.e. the bytes 1F 8B (see the GZIP File Format wiki).
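Checking the leading bytes can also be done in code rather than in a hex viewer. Here is a small sketch of such a check; the class name CompressionSniffer and the returned labels are my own invention, and the zlib test relies on the RFC 1950 header rule (deflate method plus a header value divisible by 31), so it is a heuristic rather than a guarantee.

```csharp
using System;

public static class CompressionSniffer
{
    // Returns "zip", "gzip", "zlib" or "unknown" based on the leading bytes.
    public static string Detect(byte[] data)
    {
        if (data == null || data.Length < 2)
            return "unknown";

        // ZIP local file header signature: 50 4B 03 04 ("PK\x03\x04").
        if (data.Length >= 4 && data[0] == 0x50 && data[1] == 0x4B && data[2] == 0x03 && data[3] == 0x04)
            return "zip";

        // GZIP magic number: 1F 8B.
        if (data[0] == 0x1F && data[1] == 0x8B)
            return "gzip";

        // zlib (RFC 1950): low nibble of the first byte is 8 (deflate) and the
        // two header bytes, read as a big-endian number, are a multiple of 31.
        if ((data[0] & 0x0F) == 8 && ((data[0] << 8) | data[1]) % 31 == 0)
            return "zlib";

        return "unknown";
    }
}
```

For the bytes 0x78 0xda 0xe5 0x98 quoted in the other answer, Detect reports zlib, which matches that answer's diagnosis.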

I think I finally got it. As usual, the problem is not what the title says. Luckily I noticed the word EBICS in your post. According to the EBICS spec, the data is first compressed, then encrypted, and finally Base64 encoded. So after decoding the Base64 you first need to decrypt the data and only then try to unzip it.
UPDATE: If that's not the case, the EBICS spec (Chapter 16, Appendix: Standards and references) indicates that ZIP refers to the zlib/deflate format, so all you need to do is replace GZipStream with DeflateStream.

Related

System.Text.Encoding.Default.GetBytes fails

Here is my sample code:
CodeSnippet 1: This code executes in my file repository server and returns the file as encoded string using the WCF Service:
byte[] fileBytes = new byte[0];
using (FileStream stream = System.IO.File.OpenRead(@"D:\PDFFiles\Sample1.pdf"))
{
    fileBytes = new byte[stream.Length];
    stream.Read(fileBytes, 0, fileBytes.Length);
    stream.Close();
}
string retVal = System.Text.Encoding.Default.GetString(fileBytes); // fileBytes size is 209050
Code Snippet 2:
Client box, which demanded the PDF file, receives the encoded string and converts to PDF and save to local.
byte[] encodedBytes = System.Text.Encoding.Default.GetBytes(retVal); // GETTING corrupted here
string pdfPath = @"C:\DemoPDF\Sample2.pdf";
using (FileStream fileStream = new FileStream(pdfPath, FileMode.Create)) // encodedBytes is 327279
{
    fileStream.Write(encodedBytes, 0, encodedBytes.Length);
    fileStream.Close();
}
The code above works absolutely fine on .NET Framework 4.5 and 4.6.1.
When I use the same code in ASP.NET Core 2.0, it fails to convert to a byte array properly. I am not getting any runtime error, but the final PDF cannot be opened after it is created; the viewer reports that the PDF file is corrupted.
I also tried Encoding.Unicode and Encoding.UTF8, but I get the same error for the final PDF.
I have also noticed that when I use Encoding.Unicode, at least the original and resulting byte arrays are the same size. With the other encodings even the sizes do not match.
So the question is: is System.Text.Encoding.Default.GetBytes broken in .NET Core 2.0?
I have edited my question for better understanding.
Sample1.pdf exists on a different server, which uses WCF to transmit the data to the client; the client stores the encoded string it receives and converts it to Sample2.pdf.
Hopefully my question makes some sense now.
1: the number of times you should ever use Encoding.Default is essentially zero; there may be a hypothetical case, but if there is one: it is elusive
2: PDF files are not text, so trying to use an Encoding on them is just... wrong; you aren't "GETTING corrupted here" - it just isn't text.
You may wish to see Extracting text from PDFs in C# or Reading text from PDF in .NET
If you simply wish to copy the content without parsing it: File.Copy or Stream.CopyTo are good options.
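If the bytes really have to travel as a string, one option (my suggestion, not something stated in the answer above) is Base64, which is designed to carry arbitrary binary data as text. The sketch below contrasts it with a text-encoding round trip; the class and method names are hypothetical. As far as I know, WCF can also transmit a byte[] directly, which avoids the string step altogether.

```csharp
using System;
using System.Text;

public static class BinaryTransport
{
    // Lossless: every byte sequence maps to a Base64 string and back.
    public static string ToTransportString(byte[] fileBytes)
    {
        return Convert.ToBase64String(fileBytes);
    }

    public static byte[] FromTransportString(string transportString)
    {
        return Convert.FromBase64String(transportString);
    }

    // Lossy, shown only for comparison: bytes that are not valid UTF-8 are
    // replaced with U+FFFD when decoded, so they cannot be recovered.
    public static byte[] RoundTripThroughUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(bytes));
    }
}
```

On the server side this would be retVal = BinaryTransport.ToTransportString(fileBytes), and on the client encodedBytes = BinaryTransport.FromTransportString(retVal). The round trip through UTF-8 changes both the content and the length of typical binary data, which is consistent with the size mismatch described in the question.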

Zlib compression incompatible C vs C# implementations

I'm trying to decompress zlib-compressed data in C# using the two most established zlib-compatible libraries, and both throw a similar exception.
Using DotNetZip:
Ionic.Zlib.ZlibException: Bad state (invalid stored block lengths)
Using Zlib.Net:
inflate: invalid stored block lengths
But feeding the same data to the zlib-flate command on Linux, with default parameters only, works fine: it decompresses without any warnings and the output is correct:
zlib-flate -uncompress < ./dbgZlib
Any suggestions on how I can decompress this data in C#, or why decompression is failing in this case?
Compressed data as hex:
root@localhost:~# od -t x1 -An ./dbgZlib |tr -d '\n '
789c626063520b666060606262d26160d05307329999e70a6400e93c2066644080cf8c938c0c0c4d0d0d0d2d839c437c02dcfd0c0c0c11d28ea121013e7e41860ce18e210640e06810141669c080051840012eb970d790800090f99eee409ea189025e806c8e8b5354a89b13d81c136ca60f3a000e5fd6af0fb14a3221873e96400506374cd6c7d52dc8d98980657e7e06460ace0a4ce86e80da9f0249030edf816c16481ab06b60404f03931169c0cdc728c0db0fd928681a3042a481480347336c6e21320d78fb8155195a9090067ca3420387771a400a546aa70100000000ffff
Compressed data as base64:
root@localhost:~# base64 ./dbgZlib
eJxiYGNSC2ZgYGBiYtJhYNBTBzKZmecKZADpPCBmZECAz4yTjAwMTQ0NDS2DnEN8Atz9DAwMEdKO
oSEBPn5BhgzhjiEGQOBoEBQWacCABRhAAS65cNeQgACQ+Z7uQJ6hiQJegGyOi1NUqJsT2BwTbKYP
OgAOX9avD7FKMiGHPpZABQY3TNbH1S3I2YmAZX5+BkYKzgpM6G6A2p8CSQMO34FsFkgasGtgQE8D
kxFpwM3HKMDbD9koaBowQqSBSANHM2xuITINePuBVRlakJAGfKNCA4d3GkAKVGqnAQAAAAD//w==
The data after decompression, encoded as Base64, looks like this:
root@localhost:~# zlib-flate -uncompress < ./dbgZlib | base64
AAYCJlMAAAACAgIsAAAuJwAAAAMDnRBoAAAAbgAAAAEAAAAAAAAAAAAAAPMBkjIwMTUxMTE5UkNU
TFBHTjAwMQAAAAAAAAAAAABBVVRQTE5SMQBXQVQwMDAwQTBSVlkwAAAAAAAAAAAAAAAAAAAAAAAA
AAAwMDAwMDAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAMDAwMFdFVFBQTFBHTklHMDAwMTQgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAwMAAAAAAAAAAAAABEQlpVRkIAAAAAMDQAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAX14QAAAAAAAAAA
AAAAAAAAAAAAAAAAAAIBAAAAAAAAAAAAAABBVDAwMDBBMFJWWTAAAAAAAAAAUExOAAAAAAAAAABM
RUZSQ0IAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATk4wMiBDIAIAAAAAAAAAAAAAAAAA
AAABBfXhAAAAAGQCAgIsAABA9wAAAAQDnRBoAAA+gAAAAAEAAAAAAAAAAAAAAPMBkzIwMTUxMTE5
UkNGTDJQS04AAAAAAAAAAAAAAABBVVRQTE5SMgBXQVQwMDAwQTBZMEE2AAAAAAAAAAAAAAAAAAAA
AAAAAAAwMDAwMDAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAMDAwMFdFVFBQTFBLTjAwMDAwMTggICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAwMAAAAAAAAAAAAABETVpVUUIAAAAAMDQAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAX14QAAAAAA
AAAAAAAAAAAAAAAAAAAAAAIBAAAAAAAAAAAAAABBVDAwMDBBMFkwQTYAAAAAAAAAUExOAAAAAAAA
AABMRUZSQ0IAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATk4wMiBDIAIAAAAAAAAAAAAA
AAAAAAABBfXhAAAAAGQ=
The problem is that you are using zlib-flate as a general-purpose compression tool, which, according to its manpage, you should not do:
This program should not be used as a general purpose compression tool.
Use something like gzip(1) instead.
So perhaps you should follow the instructions given by your tools and not use them for things they are not intended for. Use gzip and System.IO.Compression.GZipStream instead; it's much simpler, especially when you're looking for cross-platform compatible compression algorithms.
That said...
The reason that you can't inflate the data is that it lacks a correct GZIP header. If you add the right header to it you will get something that can be decompressed.
For instance:
public static byte[] DecompressZLibRaw(byte[] bCompressed)
{
    byte[] bHdr = new byte[] { 0x1F, 0x8b, 0x08, 0, 0, 0, 0, 0 };
    using (var sOutput = new MemoryStream())
    using (var sCompressed = new MemoryStream())
    {
        sCompressed.Write(bHdr, 0, bHdr.Length);
        sCompressed.Write(bCompressed, 0, bCompressed.Length);
        sCompressed.Position = 0;
        using (var decomp = new GZipStream(sCompressed, CompressionMode.Decompress))
        {
            decomp.CopyTo(sOutput);
        }
        return sOutput.ToArray();
    }
}
Adding the header makes all the difference.
NB: There are two bytes in the 10-byte GZIP header that are not stripped from your source. These are normally used to store the compression flags and the source file system. In the compressed data you present they are invalid values. Additionally the file footer is abbreviated to 5 bytes instead of 8 bytes... all of which is not actually required for decompression. Which probably has a lot to do with why the manpage says not to use this for general compression.
The stream you provided is not complete. It appears that you ended it with a Z_SYNC_FLUSH or Z_FULL_FLUSH in your C# code, instead of a Z_FINISH like you're supposed to. That is causing the error. If you terminate the stream properly, you won't have a problem.
zlib-flate is simply ignoring that error.
If you are not in control of the generation of the stream, you can still use zlib to decompress what's there. You just need to use it at a lower level where you operate on blocks of data and get the decompressed data available given the provided input.

Decompress a stream of bytes using deflatestream [duplicate]

Are System.IO.Compression.GZipStream or System.IO.Compression.DeflateStream compatible with zlib compression?
I ran into this issue with Git objects. In that particular case, they store the objects as deflated blobs with a Zlib header, which is documented in RFC 1950. You can make a compatible blob by making a file that contains:
1. Two header bytes (CMF and FLG from RFC 1950) with the values 0x78 0x01:
   CM = 8 = deflate
   CINFO = 7 = 32 KB window
   FCHECK = 1 = checksum bits for this header
2. The output of the C# DeflateStream
3. An Adler-32 checksum of the input data to the DeflateStream, in big-endian format (MSB first)
I made my own Adler-32 implementation:
public class Adler32Computer
{
    private int a = 1;
    private int b = 0;

    public int Checksum
    {
        get
        {
            return ((b * 65536) + a);
        }
    }

    private static readonly int Modulus = 65521;

    public void Update(byte[] data, int offset, int length)
    {
        for (int counter = 0; counter < length; ++counter)
        {
            a = (a + (data[offset + counter])) % Modulus;
            b = (b + a) % Modulus;
        }
    }
}
And that was pretty much it.
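Putting the three pieces together might look roughly like the sketch below. It is my own assembly of the recipe above, not the original poster's code: the ZlibWriter name is invented, the checksum is recomputed with unsigned arithmetic instead of reusing Adler32Computer, and the header is fixed at 0x78 0x01 regardless of the compression level DeflateStream actually uses.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibWriter
{
    // Adler-32 as defined in RFC 1950, computed over the uncompressed input.
    public static uint Adler32(byte[] data)
    {
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }

    // Wraps DeflateStream output in a zlib (RFC 1950) container.
    public static byte[] Compress(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // CMF = 0x78 (deflate, 32 KB window), FLG = 0x01 (header check bits).
            output.WriteByte(0x78);
            output.WriteByte(0x01);

            // leaveOpen: true so the trailer can be appended after the deflater flushes.
            using (var deflater = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflater.Write(input, 0, input.Length);
            }

            // Adler-32 trailer, most significant byte first.
            uint checksum = Adler32(input);
            output.WriteByte((byte)(checksum >> 24));
            output.WriteByte((byte)(checksum >> 16));
            output.WriteByte((byte)(checksum >> 8));
            output.WriteByte((byte)checksum);

            return output.ToArray();
        }
    }
}
```

A quick sanity check for the checksum is the commonly cited test vector: the ASCII string Wikipedia has the Adler-32 value 0x11E60398.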
DotNetZip includes a DeflateStream, a ZlibStream, and a GZipStream, which handle RFC 1951, RFC 1950, and RFC 1952 respectively. They all use the DEFLATE algorithm, but the framing and header bytes are different for each one.
As an advantage, the streams in DotNetZip do not exhibit the anomaly of expanding data size under compression, reported against the built-in streams. Also, there is no built-in ZlibStream, whereas DotNetZip gives you that, for good interop with zlib.
From MSDN about System.IO.Compression.GZipStream:
This class represents the gzip data format, which uses an industry standard algorithm for lossless file compression and decompression.
From the zlib FAQ:
The gz* functions in zlib on the other hand use the gzip format.
So zlib and GZipStream should be interoperable, but only if you use the zlib functions for handling the gzip-format.
System.IO.Compression.Deflate and zlib are reportedly not interoperable.
If you need to handle zip files (you probably don't, but someone else might need this) you need to use SharpZipLib or another third-party library.
I've used GZipStream to compress the output from the .NET XmlSerializer and it has worked perfectly fine to decompress the result with gunzip (in cygwin), winzip and another GZipStream.
For reference, here's what I did in code:
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write);
using (GZipStream gzStream = new GZipStream(fs, CompressionMode.Compress))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
    serializer.Serialize(gzStream, myData);
}
Then, to decompress in c#
FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read);
using (Stream input = new GZipStream(fs, CompressionMode.Decompress))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
    myData = (MyDataType) serializer.Deserialize(input);
}
Using the 'file' utility in cygwin reveals that there is indeed a difference between the same file compressed with GZipStream and with GNU gzip (probably header information, as others have stated in this thread). This difference, however, does not seem to matter in practice.
gzip is deflate + some header/footer data, like a checksum and length, etc. So they're not compatible in the sense that one method can use a stream from the other, but they employ the same compression algorithm.
These classes just compress the data using the zlib or deflate algorithms; they do not produce output in any specific file format. This means that if you store the stream as-is on disk, you most probably will not be able to open it with an application such as gzip or WinRAR, because the file headers (magic number, etc.) are not included in the stream and you have to write them yourself.
Starting from .NET Framework 4.5 the System.IO.Compression.DeflateStream class uses the zlib library.
From the class's MSDN article:
This class represents the Deflate algorithm, which is an industry-standard algorithm for lossless file compression and decompression. Starting with the .NET Framework 4.5, the DeflateStream class uses the zlib library. As a result, it provides a better compression algorithm and, in most cases, a smaller compressed file than it provides in earlier versions of the .NET Framework.
I agree with andreas. You probably won't be able to open the file in an external tool, but if that tool expects a stream you might be able to use it. You would also be able to deflate the file back using the same compression class.

Decompressing a Zip file from a string

I'm fetching an object from couchbase where one of the fields has a file. The file is zipped and then encoded in base64.
How would I be able to take this string and decompress it back to the original file?
Then, if I'm using ASP.NET MVC 4, how would I send it back to the browser as a downloadable file?
The original file is being created on a Linux system and decoded on a Windows system (C#).
You should use Convert.FromBase64String to get the bytes, then decompress, and then use Controller.File to have the client download the file. To decompress, you need to open the zip file using some sort of ZIP library. .NET 4.5's built-in ZipArchive class should work. Alternatively, you could use another library; both SharpZipLib and DotNetZip support reading from streams.
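public ActionResult MyAction()
{
    // GetBase64String is a hypothetical helper: fetch the field from the Linux system here.
    string base64String = GetBase64String();
    byte[] zipBytes = Convert.FromBase64String(base64String);
    using (var zipStream = new MemoryStream(zipBytes))
    using (var zipArchive = new ZipArchive(zipStream))
    {
        var entry = zipArchive.Entries.Single();
        string mimeType = MimeMapping.GetMimeMapping(entry.Name);
        using (var decompressedStream = entry.Open())
        using (var buffer = new MemoryStream())
        {
            // Copy the entry out before the archive is disposed: the result is
            // written to the response only after this method has returned.
            decompressedStream.CopyTo(buffer);
            return File(buffer.ToArray(), mimeType, entry.Name);
        }
    }
}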
You'll also need the MIME type of the file; MimeMapping.GetMimeMapping can help you get that for most common types.
I've used SharpZipLib successfully for this type of task in the past.
For an example that's very close to what you need to do have a look here.
Basically, the steps should be something like this:
1. Get the compressed input as a string from the database.
2. Decode the Base64 string to bytes, then create a MemoryStream and write the bytes to it.
3. Seek back to the beginning of the memory stream.
4. Use the MemoryStream as input to the SharpZipLib ZipFile class.
5. Follow the example provided above to unpack the contents of the ZipFile.
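As a rough illustration of these steps without an MVC controller around them, here is a sketch that uses the built-in ZipArchive class instead of SharpZipLib (a deliberate swap, so the ZipFile API is not shown). The Base64Zip class and ExtractFirstEntry method are hypothetical names, and the sketch assumes the string is a Base64-encoded ZIP archive with at least one entry.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class Base64Zip
{
    // Decodes a Base64 string holding a ZIP archive and returns the content
    // of its first entry; the entry's name is returned through entryName.
    public static byte[] ExtractFirstEntry(string base64Zip, out string entryName)
    {
        byte[] zipBytes = Convert.FromBase64String(base64Zip);

        // Constructing the MemoryStream from the array leaves its position at 0,
        // so no explicit seek is needed.
        using (var zipStream = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.Entries.First();
            entryName = entry.FullName;

            using (var entryStream = entry.Open())
            using (var result = new MemoryStream())
            {
                entryStream.CopyTo(result);
                return result.ToArray();
            }
        }
    }
}
```

The returned bytes can then be written to disk or passed to Controller.File together with the entry name.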
Update
If the string contains only the zipped contents of the file (not a full Zip archive) then you can simply use the GZipStream class in .NET to unzip the contents. You can find a sample here. But the initial steps are the same as above (get string from db, write to memory stream, feed memory stream as input to the GZipStream to decompress).

Can PHP decompress a file compressed with the .NET GZipStream class?

I have a C# application that communicates with a PHP-based SOAP web service for updates and licensing.
I am now working on a feedback system for users to submit errors and tracelogs automatically through the software. Based on a previous question I posted, I felt that a web service would be the best way to do it (most likely to work properly with least configuration).
My current thought is to use .NET built-in gzip compression to compress the text file, convert to base64, send to the web-service, and have the PHP script convert to binary and uncompress the data.
Can PHP decompress data compressed with GZipStream, and if so, how?
I actually tried this. GZipStream doesn't work. On the other hand, compressing with DeflateStream on .NET side and decompressing with gzinflate on PHP side do work. Your mileage may vary...
If the HTTP-level libraries implement it (on both client and server), HTTP has support for gzip compression, in which case there would be no reason to compress anything manually. You should check whether this is already happening before you venture any further.
Since the server is accepting web requests, you really should check the HTTP headers to determine whether the client accepts gzip encoding, rather than just guessing and gzipping each and every time.
If the PHP client can do gzip, it'll set the header, and your code can then react accordingly and do the right thing. Assuming or guessing is a poor choice when the facility is provided for your code to learn the capabilities of the client.
I wrote an article I recently posted that shows how to compress/decompress in C#. I used it for almost the same scenario. I wanted to transfer log files from the client to the server and they were often quite large. However in my case my webservice was running in .NET so I could use the decompress method. But looks like PHP does support a method called gzdecode that would work.
http://coding.infoconex.com/post/2009/05/Compress-and-Decompress-using-net-framework-and-built-in-GZipStream.aspx
Yes, PHP can decompress GZIP compressed strings, with or without headers:
gzdecode for the GZIP file format (i.e., compatible with gzip)
gzinflate for the "raw" DEFLATE format
gzuncompress for the ZLIB format (DEFLATE data with a 2-byte zlib header and an Adler-32 trailer, rather than the GZIP header)
I don't know for sure which one you'd want as I'm unfamiliar with .NET GZipStream. It sounds a little like gzuncompress, as the ZLIB format is kind of a "streaming" format, but try all three.
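For what it's worth, on the .NET side GZipStream writes the GZIP file format and DeflateStream writes raw DEFLATE, so the matching PHP functions would be gzdecode and gzinflate respectively. A small sketch of producing both as Base64 strings follows; the class and method names are made up for the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PhpFriendlyCompression
{
    // GZIP framing (RFC 1952); intended for PHP's gzdecode.
    public static string GzipToBase64(string text)
    {
        return ToBase64(text, s => new GZipStream(s, CompressionMode.Compress, true));
    }

    // Raw DEFLATE (RFC 1951); intended for PHP's gzinflate.
    public static string DeflateToBase64(string text)
    {
        return ToBase64(text, s => new DeflateStream(s, CompressionMode.Compress, true));
    }

    private static string ToBase64(string text, Func<Stream, Stream> wrap)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var buffer = new MemoryStream())
        {
            // Dispose the compressor first so all compressed bytes are flushed
            // into the buffer before it is read.
            using (Stream compressor = wrap(buffer))
            {
                compressor.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(buffer.ToArray());
        }
    }
}
```

On the PHP side the counterparts would be gzdecode(base64_decode($s)) for the first method and gzinflate(base64_decode($s)) for the second.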
I was able to demo this with Gzip on C# and PHP.
Gzip Compressing in C#:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public class Program {
    public static void Main() {
        string s = "Hi!!";
        byte[] byteArray = Encoding.UTF8.GetBytes(s);
        byte[] b2 = Compress(byteArray);
        Console.WriteLine(System.Convert.ToBase64String(b2));
    }

    public static byte[] Compress(byte[] bytes) {
        using (var memoryStream = new MemoryStream()) {
            using (var gzipStream = new GZipStream(memoryStream, CompressionLevel.Optimal)) {
                gzipStream.Write(bytes, 0, bytes.Length);
            }
            return memoryStream.ToArray();
        }
    }

    public static byte[] Decompress(byte[] bytes) {
        using (var memoryStream = new MemoryStream(bytes)) {
            using (var outputStream = new MemoryStream()) {
                using (var decompressStream = new GZipStream(memoryStream, CompressionMode.Decompress)) {
                    decompressStream.CopyTo(outputStream);
                }
                return outputStream.ToArray();
            }
        }
    }
}
The code above prints the Base64-encoded compressed string, which is H4sIAAAAAAAEAPPIVFQEANxaFPgEAAAA for the Hi!! input.
Here's the code to decompress in PHP:
echo gzdecode(base64_decode('H4sIAAAAAAAEAPPIVFQEANxaFPgEAAAA'));
