Gzipstream header and suffix - c#

How do I know the size of my compressed file used GzipStream? I know that it has a header and suffix. First 10 bytes - it's header, second 8 bytes - suffix. How do I know the size file in the suffix?

Something a bit better written:
public int GetUncompressedSize(string FileName)
{
using(BinaryReader br = new BinaryReader(File.OpenRead(pathToFile))
{
br.BaseStream.Seek(SeekOrigin.End, -4);
return br.ReadInt32();
}
}

I see that you down voted my previous answer most likely because it was an example using Java. The principle is still the same, so the answer to your question would be that the last 4 bytes contain the information you require. Hopefully this answer is more what you are after.
Here is a C# Decompress function example of decompressing the GZip inclusive of getting the size of the compressed file used by GZipStream:
static public byte[] Decompress(byte[] b)
{
MemoryStream ms = new MemoryStream(b.length);
ms.Write(b, 0, b.Length);
//last 4 bytes of GZipStream = length of decompressed data
ms.Seek(-4, SeekOrigin.Current);
byte[] lb = new byte[4];
ms.Read(lb, 0, 4);
int len = BitConverter.ToInt32(lb, 0);
ms.Seek(0, SeekOrigin.Begin);
byte[] ob = new byte[len];
GZipStream zs = new GZipStream(ms, CompressionMode.Decompress);
zs.Read(ob, 0, len);
returen ob;
}

I see that you down voted my previous answer most likely because it was an example using Java. The principle is still the same, so the answer to your question would be that the last 4 bytes contain the information you require. Hopefully this answer is more what you are after.
Here is a C# Decompress function example of decompressing the GZip inclusive of getting the size of the compressed file used by GZipStream:
static public byte[] Decompress(byte[] b)
{
MemoryStream ms = new MemoryStream(b.length);
ms.Write(b, 0, b.Length);
//last 4 bytes of GZipStream = length of decompressed data
ms.Seek(-4, SeekOrigin.Current);
byte[] lb = new byte[4];
ms.Read(lb, 0, 4);
int len = BitConverter.ToInt32(lb, 0);
ms.Seek(0, SeekOrigin.Begin);
byte[] ob = new byte[len];
GZipStream zs = new GZipStream(ms, CompressionMode.Decompress);
zs.Read(ob, 0, len);
returen ob;
}

Related

How to compress data in C# to be decompressed in zlib python

I have a python zlib decompressor that takes default parameters as follows, where data is string:
import zlib
data_decompressed = zlib.decompress(data)
But, I don't know how I can compress a string in c# to be decompressed in python. I've tray the next piece of code but when I trie to decompresse 'incorrect header check' exception is trown.
static byte[] ZipContent(string entryName)
{
// remove whitespace from xml and convert to byte array
byte[] normalBytes;
using (StringWriter writer = new StringWriter())
{
//xml.Save(writer, SaveOptions.DisableFormatting);
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
normalBytes = encoding.GetBytes(writer.ToString());
}
// zip into new, zipped, byte array
using (Stream memOutput = new MemoryStream())
using (ZipOutputStream zipOutput = new ZipOutputStream(memOutput))
{
zipOutput.SetLevel(6);
ZipEntry entry = new ZipEntry(entryName);
entry.CompressionMethod = CompressionMethod.Deflated;
entry.DateTime = DateTime.Now;
zipOutput.PutNextEntry(entry);
zipOutput.Write(normalBytes, 0, normalBytes.Length);
zipOutput.Finish();
byte[] newBytes = new byte[memOutput.Length];
memOutput.Seek(0, SeekOrigin.Begin);
memOutput.Read(newBytes, 0, newBytes.Length);
zipOutput.Close();
return newBytes;
}
}
Anyone could help me please?
Thank you.
UPDATE 1:
I've tried with defalte function as Shiraz Bhaiji has posted:
public static byte[] Deflate(byte[] data)
{
if (null == data || data.Length < 1) return null;
byte[] compressedBytes;
//write into a new memory stream wrapped by a deflate stream
using (MemoryStream ms = new MemoryStream())
{
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
//write byte buffer into memorystream
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
//rewind memory stream and write to base 64 string
compressedBytes = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Read(compressedBytes, 0, (int)ms.Length);
}
}
return compressedBytes;
}
The problem is that to work properly in python code I've to add the "-zlib.MAX_WBITS" argument to decompress as follows:
data_decompressed = zlib.decompress(data, -zlib.MAX_WBITS)
So, my new question is: is it possible to code a deflate method in C# which compression result could be decompressed with zlib.decompress(data) as defaults?
In C# the DeflateStream class supports zlib. See:
https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.deflatestream?view=netframework-4.8
As you described with your edit, zlib.decompress(data, -zlib.MAX_WBITS) is the correct way to decompress data from C#'s DeflateStream. There are two formats at play here:
deflate - as in specification RFC 1951 - this is what's C# is producing
zlib - as in specification RFC 1950 - this is what's Python is expecting by default
What is the difference between the two? It's small, really:
zlib = [compression flag byte] + [flags byte] + deflate + [adler checksum]
(there are also optional dictionary bytes but we don't have to worry about them)
Therefore, to get zlib format from deflate, we need to prepend two bytes of flags, and append Adler-32 checksum. Luckily we have an answer on stackoverflow for the flags, see What does a zlib header look like? and implementing Adler-32 is not that hard. So suppose you have your MemoryStream ms, we would first write the two flag bytes
ms.Write(new byte[] {0x78,0x9c});
...then we would do exactly what's in your answer
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
}
and, at last, compute the checksum and append it to the end of the stream:
uint a = 0;
uint b = 0;
for(int i = 0; i < data.Length; ++i)
{
a = (a + data[i]) % 65521;
b = (b + a) % 65521;
}
Sadly, I don't know a pretty way of writing uints into the stream. This is an ugly way:
ms.Write(new byte[] { (byte)(b>>8),
(byte)b,
(byte)(a>>8),
(byte)a
});

GZipping Javascript from .ashx returns decoding error in browser

Background
I'm setting up a generic handler to:
Combine & compress Javascript and CSS files
Cache a GZip version & a Non-GZip version
Serve the appropriate version based on the request
I'm working in MonoDevelop v2.8.2 on OSX 10.7.2
Problem
Since I want to Cache the GZipped version, I need to GZip without using a response filter
Using this code, I can compress and decompress a string on the server successfully, but when I serve it to the client I get:
Error 330 (net::ERR_CONTENT_DECODING_FAILED): Unknown error. (Chrome)
Cannot decode raw data (Safari)
The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression. (Firefox)
Relevant Code
string sCompiled =null;
if(bCanGZip)
{
context.Response.AddHeader("Content-Encoding", "gzip");
bHasValue = CurrentCache.CompiledScripts.TryGetValue(context.Request.Url.ToString() + "GZIP", out sCompiled);
}
//...
//Process files if bHasVale is false
//Compress result of file concatination/minification
//Compression method
public static string CompressString(string text)
{
UTF8Encoding encoding = new UTF8Encoding(false);
byte[] buffer = encoding.GetBytes(text);
using(MemoryStream memoryStream = new MemoryStream()){
using (GZipStream gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
{
gZipStream.Write(buffer, 0, buffer.Length);
}
memoryStream.Position = 0;
byte[] compressedData = new byte[memoryStream.Length];
memoryStream.Read(compressedData, 0, compressedData.Length);
byte[] gZipBuffer = new byte[compressedData.Length + 4];
Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
return Convert.ToBase64String(gZipBuffer);
}
}
//...
//Return value
switch(Type){
case FileType.CSS:
context.Response.ContentType = "text/css";
break;
case FileType.JS:
context.Response.ContentType = "application/javascript";
break;
}
context.Response.AddHeader("Content-Length", sCompiled.Length.ToString());
context.Response.Clear();
context.Response.Write(sCompiled);
Attempts to Resolve
Since I'm not sure what the lines:
byte[] gZipBuffer = new byte[compressedData.Length + 4];
Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
are accomplishing, I tried removing them.
I tried playing with different Encodings/options.
At this point I'm really not sure how to attack the problem since I don't know the source of the error (Encoding/Compression/other).
Any help would be very appreciated!
Other Resources I've found on the subject
http://beta.blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/24/how-to-compress-and-decompress-using-gzipstream-object.aspx
http://madskristensen.net/post/Compress-and-decompress-strings-in-C.aspx
http://www.codeproject.com/KB/files/GZipStream.aspx
http://www.codeproject.com/KB/aspnet/HttpCombine.aspx
http://webreflection.blogspot.com/2009/01/quick-tip-c-gzip-content.html
http://www.dominicpettifer.co.uk/Blog/17/gzip-compress-your-websites-html-css-script-in-code
This is one of those things where once you explain you problem, you quickly find the answer.
I need to write out the response as Binary. So modifying the compression algorithum to return a byte array:
public static byte[] CompressStringToArray(string text){
UTF8Encoding encoding = new UTF8Encoding(false);
byte[] buffer = encoding.GetBytes(text);
using(MemoryStream memoryStream = new MemoryStream()){
using (GZipStream gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
{
gZipStream.Write(buffer, 0, buffer.Length);
}
memoryStream.Position = 0;
byte[] compressedData = new byte[memoryStream.Length];
memoryStream.Read(compressedData, 0, compressedData.Length);
return compressedData;
}
}
and then calling:
//Writes a byte buffer without encoding the response stream
context.Response.BinaryWrite(GZipTools.CompressStringToArray(sCompiled));
Solves the issue. Hopefully this helps others who will face the same problem.

how to buffer an input stream until it is complete

i'm implementing a wcf service that accepts image streams. however i'm currently getting an exception when i run it. as its trying to get the length of the stream before the stream is complete. so what i'd like to do is buffer the stream until its complete. however i cant find any examples of how to do this...
can anyone help?
my code so far:
public String uploadUserImage(Stream stream)
{
Stream fs = stream;
BinaryReader br = new BinaryReader(fs);
Byte[] bytes = br.ReadBytes((Int32)fs.Length);// this causes exception
File.WriteAllBytes(filepath, bytes);
}
Rather than try to fetch the length, you should read from the stream until it returns that it's "done". In .NET 4, this is really easy:
// Assuming we *really* want to read it into memory first...
MemoryStream memoryStream = new MemoryStream();
stream.CopyTo(memoryStream);
memoryStream.Position = 0;
File.WriteAllBytes(filepath, memoryStream);
In .NET 3.5 there's no CopyTo method, but you can write something similar yourself:
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
However, now we've got something to copy a stream, why bother reading it all into memory first? Let's just write it straight to a file:
using (FileStream output = File.OpenWrite(filepath))
{
CopyStream(stream, output); // Or stream.CopyTo(output);
}
I'm not sure what you are returning (or not returning), but something like this might work for you:
public String uploadUserImage(Stream stream) {
const int KB = 1024;
Byte[] bytes = new Byte[KB];
StringBuilder sb = new StringBuilder();
using (BinaryReader br = new BinaryReader(stream)) {
int len;
do {
len = br.Read(bytes, 0, KB);
string readData = Encoding.UTF8.GetString(bytes);
sb.Append(readData);
} while (len == KB);
}
//File.WriteAllBytes(filepath, bytes);
return sb.ToString();
}
A string can hold up to 2 GB, I believe.
Try this :
using (StreamWriter sw = File.CreateText(filepath))
{
stream.CopyTo(sw);
sw.Close();
}
Jon Skeets answer for .Net 3.5 and below using a Buffer Read is actually done incorrectly.
The buffer isn't cleared between reads which can result in issues on any read that returns less than 8192, for example if the 2nd read, read 192 bytes, the 8000 last bytes from the first read would STILL be in the buffer which would then be returned to the stream.
My code below you supply it a Stream and it will return a IEnumerable array.
Using this you can for-each it and Write to a MemoryStream and then use .GetBuffer() to end up with a compiled merged byte[].
private IEnumerable<byte[]> ReadFullStream(Stream stream) {
while(true) {
byte[] buffer = new byte[8192];//since this is created every loop, its buffer is cleared
int bytesRead = stream.Read(buffer, 0, buffer.Length);//read up to 8192 bytes into buffer
if (bytesRead == 0) {//if we read nothing, stream is finished
break;
}
if(bytesRead < buffer.Length) {//if we read LESS than 8192 bytes, resize the buffer to essentially remove everything after what was read, otherwise you will have nullbytes/0x00bytes at the end of your buffer
Array.Resize(ref buffer, bytesRead);
}
yield return buffer;//yield return the buffer data
}//loop here until we reach a read == 0 (end of stream)
}

GZIP Java vs .NET

Using the following Java code to compress/decompress bytes[] to/from GZIP.
First text bytes to gzip bytes:
public static byte[] fromByteToGByte(byte[] bytes) {
ByteArrayOutputStream baos = null;
try {
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
baos = new ByteArrayOutputStream();
GZIPOutputStream gzos = new GZIPOutputStream(baos);
byte[] buffer = new byte[1024];
int len;
while((len = bais.read(buffer)) >= 0) {
gzos.write(buffer, 0, len);
}
gzos.close();
baos.close();
} catch (IOException e) {
e.printStackTrace();
}
return(baos.toByteArray());
}
Then the method that goes the other way compressed bytes to uncompressed bytes:
public static byte[] fromGByteToByte(byte[] gbytes) {
ByteArrayOutputStream baos = null;
ByteArrayInputStream bais = new ByteArrayInputStream(gbytes);
try {
baos = new ByteArrayOutputStream();
GZIPInputStream gzis = new GZIPInputStream(bais);
byte[] bytes = new byte[1024];
int len;
while((len = gzis.read(bytes)) > 0) {
baos.write(bytes, 0, len);
}
} catch (IOException e) {
e.printStackTrace();
}
return(baos.toByteArray());
}
Think there is any effect since I'm not writing out to a gzip file?
Also I noticed that in the standard C# function that BitConverter reads the first four bytes and then the MemoryStream Write function is called with a start point of 4 and a length of input buffer length - 4. So is that effect the validity of the header?
Jim
I tryed it out, and I cant reproduce your 'Invalid GZip Header' issue. Here is what I did:
Java side
I took your Java compression method together with this java snippet:
public static String ToHexString(byte[] bytes){
StringBuilder hexString = new StringBuilder();
for (int i = 0; i < bytes.length; i++)
hexString.append((i == 0 ? "" : "-") +
Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
return hexString.toString();
}
So that this minimalistic java application, taking the bytes of a test string, compressing it, and converting it to a hex string of the compressed data...:
public static void main(String[] args){
System.out.println(ToHexString(fromByteToGByte("asdf".getBytes())));
}
... outputs the following (I added annotations):
1f-8b-08-00-00-00-00-00-00-00-4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00
^------- GZip Header -------^ ^----------- Compressed data -----------^
C# side
I wrote two methods for compressing and uncompressing a byte array to another byte array (compression method is just for completeness, and my testings):
public static byte[] Compress(byte[] uncompressed)
{
using (MemoryStream ms = new MemoryStream())
using (GZipStream gzs = new GZipStream(ms, CompressionMode.Compress))
{
gzs.Write(uncompressed, 0, uncompressed.Length);
gzs.Close();
return ms.ToArray();
}
}
public static byte[] Decompress(byte[] compressed)
{
byte[] buffer = new byte[4096];
using (MemoryStream ms = new MemoryStream(compressed))
using (GZipStream gzs = new GZipStream(ms, CompressionMode.Decompress))
using (MemoryStream uncompressed = new MemoryStream())
{
for (int r = -1; r != 0; r = gzs.Read(buffer, 0, buffer.Length))
if (r > 0) uncompressed.Write(buffer, 0, r);
return uncompressed.ToArray();
}
}
Together with a small function that takes a hex string and turns it back to a byte array... (also just for testing purposes):
public static byte[] ToByteArray(string hexString)
{
hexString = hexString.Replace("-", "");
int NumberChars = hexString.Length;
byte[] bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)
bytes[i / 2] = Convert.ToByte(hexString.Substring(i, 2), 16);
return bytes;
}
... I did the following:
// Just hardcoded the output of the java program, convert it back to byte[]
byte[] fromjava = ToByteArray("1f-8b-08-00-00-00-00-00-00-00-" +
"4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00");
// Decompress it with my function above
byte[] uncompr = Decompress(fromjava);
// Get the string out of the byte[] and print it
Console.WriteLine(System.Text.ASCIIEncoding.ASCII
.GetString(uncompr, 0, uncompr.Length));
Et voila, the output is:
asdf
Works perfect for me. Maybe you should check your decompression method in your c# application.
You said in your previous question you are storing those byte arrays in a database, right? Maybe you want to check whether the bytes come back from the database the way you put them in.
Posting this as an answer so the code looks decent.
Note a couple things:
First, the round trip to the database did not appear to have any effect. Java on both sides produced exactly what I put in. Java in C# out worked fine with the Ionic API, as did C# in and Java out. Which brings me to the second point.
Second, my original decompress was on the order of:
public static string Decompress(byte[] gzBuffer)
{
using (MemoryStream ms = new MemoryStream())
{
int msgLength = BitConverter.ToInt32(gzBuffer, 0);
ms.Write(gzBuffer, 4, gzBuffer.Length – 4);
byte[] buffer = new byte[msgLength];
ms.Position = 0;
using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
{
zip.Read(buffer, 0, buffer.Length);
}
return Encoding.UTF8.GetString(buffer);
}
}
Which depended on the internal byte count, yours reads the whole file regardless of internal value. Don't know what the Ionic algorithm is. Yours works the same as the Java methods I've used. That's the only difference I see. Thanks very much for doing all that work. I will remember that way of doing it.
Thanks,
Jim

Problem with C# Decompression

Have some data in a sybase image type column that I want to use in a C# app. The data has been compressed by Java using the java.util.zip package. I wanted to test that I could decompress the data in C#. So I wrote a test app that pulls it out of the database:
byte[] bytes = (byte[])reader.GetValue(0);
This gives me a compressed byte[] of 2479 length.
Then I pass this to a seemingly standard C# decompression method:
public static byte[] Decompress(byte[] gzBuffer)
{
MemoryStream ms = new MemoryStream();
int msgLength = BitConverter.ToInt32(gzBuffer, 0);
ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
byte[] buffer = new byte[msgLength];
ms.Position = 0;
GZipStream zip = new GZipStream(ms, CompressionMode.Decompress);
zip.Read(buffer, 0, buffer.Length);
return buffer;
}
The value for msgLength is 1503501432 which seems way out of range. The original document should be in the range of 5K -50k. Anyway when I use that value to create "buffer" not surprisingly I get an OutOfMemoryException.
What is happening?
Jim
The Java compress method is as follows:
public byte[] compress(byte[] bytes) throws Exception {
byte[] results = new byte[bytes.length];
Deflater deflator = new Deflater();
deflater.setInput(bytes);
deflater.finish();
int len = deflater.deflate(results);
byte[] out = new byte[len];
for(int i=0; i<len; i++) {
out[i] = results[i];
}
return(out);
}
As I cant see your java code, I can only guess you are compressing your data to a zip file stream. Therefore it will obviously fail if you are trying to decompress that stream with a gzip decompression in c#. Either you change your java code to a gzip compression (Example here at the bottom of the page), or you decompress the zip file stream in c# with an appropriate library (e.g. SharpZipLib).
Update
Ok now, I see you are using deflate for the compression in java. So, obviously you have to use the same algorithm in c#: System.IO.Compression.DeflateStream
public static byte[] Decompress(byte[] buffer)
{
using (MemoryStream ms = new MemoryStream(buffer))
using (Stream zipStream = new DeflateStream(ms,
CompressionMode.Decompress, true))
{
int initialBufferLength = buffer.Length * 2;
byte[] buffer = new byte[initialBufferLength];
bool finishedExactly = false;
int read = 0;
int chunk;
while (!finishedExactly &&
(chunk = zipStream.Read(buffer, read, buffer.Length - read)) > 0)
{
read += chunk;
if (read == buffer.Length)
{
int nextByte = zipStream.ReadByte();
// End of Stream?
if (nextByte == -1)
{
finishedExactly = true;
}
else
{
byte[] newBuffer = new byte[buffer.Length * 2];
Array.Copy(buffer, newBuffer, buffer.Length);
newBuffer[read] = (byte)nextByte;
buffer = newBuffer;
read++;
}
}
}
if (!finishedExactly)
{
byte[] final = new byte[read];
Array.Copy(buffer, final, read);
buffer = final;
}
}
return buffer;
}

Categories