How to compress data in C# to be decompressed in zlib python

I have a Python zlib decompressor that uses the default parameters, as follows, where data is a string:
import zlib
data_decompressed = zlib.decompress(data)
But I don't know how to compress a string in C# so that it can be decompressed in Python. I've tried the following piece of code, but when I try to decompress its output, an 'incorrect header check' exception is thrown.
static byte[] ZipContent(string entryName)
{
    // remove whitespace from xml and convert to byte array
    byte[] normalBytes;
    using (StringWriter writer = new StringWriter())
    {
        //xml.Save(writer, SaveOptions.DisableFormatting);
        System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
        normalBytes = encoding.GetBytes(writer.ToString());
    }
    // zip into new, zipped, byte array
    using (Stream memOutput = new MemoryStream())
    using (ZipOutputStream zipOutput = new ZipOutputStream(memOutput))
    {
        zipOutput.SetLevel(6);
        ZipEntry entry = new ZipEntry(entryName);
        entry.CompressionMethod = CompressionMethod.Deflated;
        entry.DateTime = DateTime.Now;
        zipOutput.PutNextEntry(entry);
        zipOutput.Write(normalBytes, 0, normalBytes.Length);
        zipOutput.Finish();
        byte[] newBytes = new byte[memOutput.Length];
        memOutput.Seek(0, SeekOrigin.Begin);
        memOutput.Read(newBytes, 0, newBytes.Length);
        zipOutput.Close();
        return newBytes;
    }
}
Could anyone help me, please?
Thank you.
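For reference, the error itself is easy to reproduce. The sketch below assumes that ZipOutputStream (presumably SharpZipLib's) writes a standard ZIP archive, and uses Python's zipfile module as a stand-in to build one; the entry name and content are hypothetical. A ZIP archive starts with a "PK" local file header rather than a zlib header, so zlib.decompress with default arguments rejects it.

```python
import io
import zipfile
import zlib

# Build a one-entry ZIP archive in memory (a stand-in for ZipOutputStream's output)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("entry.xml", "<root/>")  # hypothetical entry name and content
archive = buf.getvalue()

print(archive[:4])  # b'PK\x03\x04': a ZIP local file header, not a zlib header

# zlib.decompress with default arguments expects a 2-byte zlib header first
try:
    zlib.decompress(archive)
    message = "no error"
except zlib.error as exc:
    message = str(exc)
print(message)
```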
UPDATE 1:
I've tried the Deflate function that Shiraz Bhaiji posted:
public static byte[] Deflate(byte[] data)
{
    if (null == data || data.Length < 1) return null;
    byte[] compressedBytes;
    // write into a new memory stream wrapped by a deflate stream
    using (MemoryStream ms = new MemoryStream())
    {
        // leaveOpen: true keeps ms usable after the DeflateStream is closed
        using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
        {
            // write the byte buffer into the memory stream
            deflateStream.Write(data, 0, data.Length);
            // closing flushes the final deflate block into ms
            deflateStream.Close();
            // rewind the memory stream and copy it into a byte array
            compressedBytes = new byte[ms.Length];
            ms.Seek(0, SeekOrigin.Begin);
            ms.Read(compressedBytes, 0, (int)ms.Length);
        }
    }
    return compressedBytes;
}
The problem is that, for this to work in the Python code, I have to pass the "-zlib.MAX_WBITS" argument to decompress, as follows:
data_decompressed = zlib.decompress(data, -zlib.MAX_WBITS)
So, my new question is: is it possible to write a Deflate method in C# whose output can be decompressed with zlib.decompress(data) using the default arguments?

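In C#, the DeflateStream class is built on the zlib library (starting with .NET Framework 4.5), but it writes raw deflate data, not the zlib format that zlib.decompress expects by default. On .NET 6 and later, the ZLibStream class writes the zlib format directly. See: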
https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.deflatestream?view=netframework-4.8

As you described with your edit, zlib.decompress(data, -zlib.MAX_WBITS) is the correct way to decompress data from C#'s DeflateStream. There are two formats at play here:
deflate - as in specification RFC 1951 - this is what C# produces
zlib - as in specification RFC 1950 - this is what Python expects by default
What is the difference between the two? It's small, really:
zlib = [CMF byte: compression method and window size] + [FLG byte: flags and header check] + deflate + [4-byte Adler-32 checksum of the uncompressed data, big-endian]
(there are also optional dictionary bytes but we don't have to worry about them)
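The layout above can be checked from Python alone. This is a sketch, assuming that zlib.compressobj with a negative wbits value produces the same kind of raw RFC 1951 stream that DeflateStream writes (the exact bytes may differ between implementations); the payload is arbitrary.

```python
import struct
import zlib

payload = b"hello hello hello zlib"

# Raw deflate (RFC 1951): negative wbits means no zlib header and no trailer
co = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = co.compress(payload) + co.flush()

# zlib format (RFC 1950): 2-byte header + deflate data + 4-byte big-endian Adler-32
wrapped = zlib.compress(payload)
header, body, trailer = wrapped[:2], wrapped[2:-4], wrapped[-4:]

print(header.hex())  # 789c (default compression level)
print(struct.unpack(">I", trailer)[0] == zlib.adler32(payload))  # True
print(zlib.decompress(body, -zlib.MAX_WBITS) == payload)  # True

# Raw deflate fed to the default decompress is rejected ...
try:
    zlib.decompress(raw)
    raw_rejected = False
except zlib.error:
    raw_rejected = True
print(raw_rejected)  # True

# ... while -MAX_WBITS tells zlib to expect raw deflate
print(zlib.decompress(raw, -zlib.MAX_WBITS) == payload)  # True
```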
Therefore, to get the zlib format from deflate, we need to prepend the two header bytes and append the Adler-32 checksum of the uncompressed data. Luckily, there is a Stack Overflow answer covering the header bytes (see What does a zlib header look like?), and implementing Adler-32 is not that hard. So, given your MemoryStream ms, we first write the two header bytes (0x78 0x9C is the usual header for a 32K window at the default compression level):
ms.Write(new byte[] { 0x78, 0x9C }, 0, 2);
...then we do exactly what your Deflate method does:
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
    deflateStream.Write(data, 0, data.Length);
    deflateStream.Close();
}
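Finally, compute the Adler-32 checksum of the original (uncompressed) data and append it to the end of the stream:
// Adler-32 starts with a = 1 and b = 0 (RFC 1950); 65521 is the largest prime below 2^16
uint a = 1;
uint b = 0;
for (int i = 0; i < data.Length; ++i)
{
    a = (a + data[i]) % 65521;
    b = (b + a) % 65521;
}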
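A quick way to cross-check the checksum loop is a pure-Python Adler-32 compared against zlib.adler32. This sketch mirrors the loop above and makes the starting value of a (1, not 0) explicit; the function name adler32 is just for illustration.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16

def adler32(data: bytes) -> int:
    # A starts at 1 and B at 0, as specified in RFC 1950
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    # B occupies the high 16 bits, A the low 16 bits
    return (b << 16) | a

print(adler32(b"asdf") == zlib.adler32(b"asdf"))  # True
print(hex(adler32(b"")))  # 0x1
```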
Sadly, I don't know a pretty way of writing uints into the stream. This is an ugly way (zlib stores the checksum big-endian, with b in the high 16 bits):
ms.Write(new byte[] {
    (byte)(b >> 8),
    (byte)b,
    (byte)(a >> 8),
    (byte)a
}, 0, 4);
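The three steps (header, deflate data, checksum) can be sketched end to end in Python. Here zlib.compressobj with negative wbits stands in for DeflateStream (an assumption: any valid raw deflate stream with a 32K window should work), and deflate_to_zlib is a hypothetical helper name.

```python
import struct
import zlib

def deflate_to_zlib(raw_deflate: bytes, original: bytes) -> bytes:
    """Wrap raw deflate data in a zlib container (assumes a 32K window)."""
    header = b"\x78\x9c"  # CMF=0x78 (deflate, 32K window), FLG=0x9C (default level)
    checksum = struct.pack(">I", zlib.adler32(original))  # big-endian: b, then a
    return header + raw_deflate + checksum

original = b"compressed in C#, decompressed in Python " * 20
co = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)  # stand-in for DeflateStream
raw = co.compress(original) + co.flush()

# The default decompress now accepts it and verifies the Adler-32 trailer
print(zlib.decompress(deflate_to_zlib(raw, original)) == original)  # True
```

Because zlib.decompress checks the trailer, a wrong byte order or a wrong starting value for a would show up immediately as a zlib.error.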

Related

Gzipstream header and suffix

How do I find the size of a file compressed with GZipStream? I know that it has a header and a suffix: the first 10 bytes are the header and the last 8 bytes are the suffix. How do I get the file size from the suffix?

Something a bit better written:
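public uint GetUncompressedSize(string fileName)
{
    // The last 4 bytes of a gzip file (ISIZE) hold the uncompressed size
    // modulo 2^32, stored little-endian (BinaryReader reads little-endian)
    using (BinaryReader br = new BinaryReader(File.OpenRead(fileName)))
    {
        br.BaseStream.Seek(-4, SeekOrigin.End);
        return br.ReadUInt32();
    }
}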
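The same trailer read can be sketched in Python, which makes the byte order explicit: the gzip trailer is a CRC-32 followed by ISIZE, both little-endian, and ISIZE is the uncompressed length modulo 2^32. The helper name gzip_uncompressed_size is hypothetical.

```python
import gzip
import struct

def gzip_uncompressed_size(blob: bytes) -> int:
    # Trailer = CRC-32 (4 bytes) + ISIZE (4 bytes), both little-endian
    (isize,) = struct.unpack("<I", blob[-4:])
    return isize

data = b"x" * 100_000
blob = gzip.compress(data)
print(gzip_uncompressed_size(blob))  # 100000
```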

I see that you downvoted my previous answer, most likely because it was an example using Java. The principle is still the same, so the answer to your question is that the last 4 bytes contain the information you require. Hopefully this answer is closer to what you are after.
Here is an example C# Decompress function that decompresses GZip data, including reading the uncompressed size that GZipStream stores in the last 4 bytes:
static public byte[] Decompress(byte[] b)
{
    MemoryStream ms = new MemoryStream(b.Length);
    ms.Write(b, 0, b.Length);
    // last 4 bytes of a GZipStream = length of decompressed data (mod 2^32, little-endian)
    ms.Seek(-4, SeekOrigin.End);
    byte[] lb = new byte[4];
    ms.Read(lb, 0, 4);
    int len = BitConverter.ToInt32(lb, 0); // assumes a little-endian machine
    ms.Seek(0, SeekOrigin.Begin);
    byte[] ob = new byte[len];
    using (GZipStream zs = new GZipStream(ms, CompressionMode.Decompress))
    {
        // Read may return fewer bytes than requested, so loop until ob is full
        int total = 0, n;
        while (total < len && (n = zs.Read(ob, total, len - total)) > 0)
            total += n;
    }
    return ob;
}

protobuf-net returns null when calling Deserialize

My end goal is to use protobuf-net and GZipStream in an attempt to compress a List<MyCustomType> object to store in a varbinary(max) field in SQL Server. I'm working on unit tests to understand how everything works and fits together.
Target .NET framework is 3.5.
My current process is:
1. Serialize the data with protobuf-net (good).
2. Compress the serialized data from #1 with GZipStream (good).
3. Convert the compressed data to a base64 string (good).
At this point, the value from step #3 will be stored in a varbinary(max) field. I have no control over this. The steps resume with needing to take a base64 string and deserialize it to a concrete type.
4. Convert a base64 string to a byte[] (good).
5. Decompress the data with GZipStream (good).
6. Deserialize the data with protobuf-net (bad).
Can someone help me understand why the call to Serializer.Deserialize<string> returns null? I'm stuck on this one and hopefully a fresh set of eyes will help.
FWIW, I tried another version of this using List<T>, where T is a custom class I created, and Deserialize<> still returns null.
FWIW 2, data.txt is a 4MB plaintext file residing on my C:.
[Test]
public void ForStackOverflow()
{
    string data = "hi, my name is...";
    //string data = File.ReadAllText(@"C:\Temp\data.txt");
    string serializedBase64;
    using (MemoryStream protobuf = new MemoryStream())
    {
        Serializer.Serialize(protobuf, data);
        using (MemoryStream compressed = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress))
            {
                byte[] s = protobuf.ToArray();
                gzip.Write(s, 0, s.Length);
                gzip.Close();
            }
            serializedBase64 = Convert.ToBase64String(compressed.ToArray());
        }
    }
    byte[] base64byteArray = Convert.FromBase64String(serializedBase64);
    using (MemoryStream base64Stream = new MemoryStream(base64byteArray))
    {
        using (GZipStream gzip = new GZipStream(base64Stream, CompressionMode.Decompress))
        {
            using (MemoryStream plainText = new MemoryStream())
            {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
                {
                    plainText.Write(buffer, 0, read);
                }
                // why does this call to Deserialize return null?
                string deserialized = Serializer.Deserialize<string>(plainText);
                Assert.IsNotNull(deserialized);
                Assert.AreEqual(data, deserialized);
            }
        }
    }
}

Because you didn't rewind plainText (plainText.Position = 0) after writing to it, Deserialize starts reading at the end of the stream and finds no data. Actually, that entire Stream is unnecessary - this works:
using (MemoryStream base64Stream = new MemoryStream(base64byteArray))
{
    using (GZipStream gzip = new GZipStream(
        base64Stream, CompressionMode.Decompress))
    {
        string deserialized = Serializer.Deserialize<string>(gzip);
        Assert.IsNotNull(deserialized);
        Assert.AreEqual(data, deserialized);
    }
}
Likewise, this should work for the serialize:
using (MemoryStream compressed = new MemoryStream())
{
    using (GZipStream gzip = new GZipStream(
        compressed, CompressionMode.Compress, true))
    {
        Serializer.Serialize(gzip, data);
    }
    serializedBase64 = Convert.ToBase64String(
        compressed.GetBuffer(), 0, (int)compressed.Length);
}
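The underlying pitfall, reading from a stream whose position is still at the end, is not specific to protobuf-net or C#. As a purely illustrative analogue, this Python sketch uses io.BytesIO in place of MemoryStream and gzip.GzipFile in place of GZipStream:

```python
import gzip
import io

payload = b"hi, my name is..."

# Write gzip data into an in-memory stream
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(payload)

# The position is now at the end, so a read returns nothing
leftover = buf.read()
print(leftover)  # b''

# Rewinding (the counterpart of plainText.Position = 0) makes the data readable
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    restored = gz.read()
print(restored == payload)  # True
```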

How to decompress a string in javascript, compressed in C#? [duplicate]

Possible Duplicate:
ZLIB Decompression - Client Side
I'll try to be clear, and I'm sorry for my bad English. Here is the question:
In my web application I receive a string that represents an image compressed with this algorithm, written in C#:
public static class Compression
{
    public static string Compress(string text)
    {
        byte[] buffer = Encoding.UTF8.GetBytes(text);
        MemoryStream ms = new MemoryStream();
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
        {
            zip.Write(buffer, 0, buffer.Length);
        }
        ms.Position = 0;
        byte[] compressed = new byte[ms.Length];
        ms.Read(compressed, 0, compressed.Length);
        // layout: [4-byte uncompressed length][gzip stream]
        byte[] gzBuffer = new byte[compressed.Length + 4];
        System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        return Convert.ToBase64String(gzBuffer);
    }
    public static string Decompress(string compressedText)
    {
        byte[] gzBuffer = Convert.FromBase64String(compressedText);
        using (MemoryStream ms = new MemoryStream())
        {
            int msgLength = BitConverter.ToInt32(gzBuffer, 0);
            ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
            byte[] buffer = new byte[msgLength];
            ms.Position = 0;
            using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
            {
                // Read can return fewer bytes than requested, so keep reading
                int total = 0, n;
                while (total < buffer.Length && (n = zip.Read(buffer, total, buffer.Length - total)) > 0)
                    total += n;
            }
            return Encoding.UTF8.GetString(buffer);
        }
    }
}
The Decompress method is used in the server-side application. I receive an XML file containing the string that represents the image compressed with the Compress method, and I want to be able to decompress that string in JavaScript within my web app. Is there a way to do that? Are there other solutions? Thanks to everyone!

The best solution might be to translate the decompression function from C# to JavaScript. You could use one that's already available in JavaScript, such as this one, but you would need to change the source of the image or decompress and recompress it at the server, unless it happens to be compatible with the compression you're using.
Another option would be to convert the image into .jpg or .png at the server before you use it. This would give you more flexibility in the long run, but might put a load on the server depending on traffic and image size.
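Whatever language the port is written in, it has to undo the C# Compress steps in reverse: base-64 decode, drop the 4-byte length prefix, gunzip, then decode UTF-8. The Python sketch below shows that format; it assumes the server is little-endian (BitConverter uses the host byte order), and the helper names are hypothetical. A JavaScript version would follow the same steps with a gzip-capable library.

```python
import base64
import gzip
import struct

def compress_like_csharp(text: str) -> str:
    raw = text.encode("utf-8")
    # [4-byte little-endian int32 length][gzip stream], then base-64
    return base64.b64encode(struct.pack("<i", len(raw)) + gzip.compress(raw)).decode("ascii")

def decompress_like_csharp(encoded: str) -> str:
    buf = base64.b64decode(encoded)
    (length,) = struct.unpack("<i", buf[:4])  # original UTF-8 byte count
    raw = gzip.decompress(buf[4:])            # skip the prefix, then gunzip
    if len(raw) != length:
        raise ValueError("length prefix does not match decompressed size")
    return raw.decode("utf-8")

s = compress_like_csharp("héllo image data")
print(decompress_like_csharp(s))  # héllo image data
```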

You can use the JSXCompressor library for decompression (deflate, unzip).
But if your web server supports compression at the HTTP level, I think you can skip your own compression and decompression entirely.

How to determine size of string, and compress it

I'm currently developing an application in C# that uses Amazon SQS.
The size limit for a message is 8 KB.
I have a method that is something like:
public void QueueMessage(string message)
Within this method, I'd like to first compress the message (most messages are passed in as JSON, so they are already fairly small).
If the compressed string is still larger than 8 KB, I'll store it in S3.
My question is:
How can I easily test the size of a string, and what's the best way to compress it?
I'm not looking for massive reductions in size, just something nice and easy - and easy to decompress the other end.

To know the "size" (in KB) of a string, we need to know the encoding. If we assume UTF-8, then it is as below (not including a BOM, etc.); swap the encoding if it isn't UTF-8:
int len = Encoding.UTF8.GetByteCount(longString);
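As for packing it, I would suggest GZIP over the UTF-8 bytes, optionally followed by base-64 if it has to be a string (base-64 makes the payload about a third larger, so check the final form against the limit):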
using (MemoryStream ms = new MemoryStream())
{
    using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress, true))
    {
        byte[] raw = Encoding.UTF8.GetBytes(longString);
        gzip.Write(raw, 0, raw.Length);
        gzip.Close();
    }
    byte[] zipped = ms.ToArray(); // as a BLOB
    string base64 = Convert.ToBase64String(zipped); // as a string
    // store zipped or base64
}
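The measure, compress, encode, compare flow can be sketched in Python as well. This assumes the question's "8kb" means 8192 bytes, and pack_message/unpack_message are hypothetical helper names; the S3 fallback is left out.

```python
import base64
import gzip

SIZE_LIMIT = 8 * 1024  # assumption: "8kb" taken as 8192 bytes

def pack_message(message):
    """Gzip the UTF-8 bytes, base64-encode them, and report whether the result fits."""
    raw = message.encode("utf-8")  # byte size depends on the encoding
    packed = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return packed, len(packed) <= SIZE_LIMIT  # ASCII: 1 char == 1 byte

def unpack_message(packed):
    return gzip.decompress(base64.b64decode(packed)).decode("utf-8")

# Character count and byte count differ for non-ASCII text
print(len("é" * 10), len(("é" * 10).encode("utf-8")))  # 10 20

msg = '{"k": "v"}' * 2000  # 20,000 bytes of repetitive JSON-like text
packed, fits = pack_message(msg)
print(fits)  # True
print(unpack_message(packed) == msg)  # True
```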

Pass the gzip-compressed bytes to this function. The best I could come up with was:
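public static byte[] ZipToUnzipBytes(byte[] bytesContext)
{
    byte[] arrUnZipFile = null;
    // Anything shorter than the 10-byte header plus 8-byte trailer cannot be gzip
    if (bytesContext.Length >= 18)
    {
        using (var inFile = new MemoryStream(bytesContext))
        {
            // Read ISIZE (uncompressed length mod 2^32) from the last 4 bytes
            byte[] bufferWrite = new byte[4];
            inFile.Position = inFile.Length - 4;
            inFile.Read(bufferWrite, 0, 4);
            inFile.Position = 0;
            arrUnZipFile = new byte[BitConverter.ToInt32(bufferWrite, 0)];
            using (var decompress = new GZipStream(inFile, CompressionMode.Decompress, false))
            {
                // Read may return fewer bytes than requested, so loop until full
                int total = 0, n;
                while (total < arrUnZipFile.Length &&
                       (n = decompress.Read(arrUnZipFile, total, arrUnZipFile.Length - total)) > 0)
                    total += n;
            }
        }
    }
    return arrUnZipFile;
}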
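Both trailer-based approaches above assume a single gzip member. A gzip stream may legally contain several members back to back, and ISIZE then describes only the last one. This Python sketch shows the mismatch, which is one reason reading the stream to its end can be safer than trusting the trailer.

```python
import gzip
import struct

# Two complete gzip members concatenated: still a valid gzip stream (RFC 1952)
first = gzip.compress(b"first member, " * 10)  # 140 bytes uncompressed
second = gzip.compress(b"second")               # 6 bytes uncompressed
blob = first + second

(isize,) = struct.unpack("<I", blob[-4:])
print(isize)                       # 6: only the last member's size
print(len(gzip.decompress(blob)))  # 146: the whole payload
```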

GZIP Java vs .NET

I'm using the following Java code to compress/decompress a byte[] to/from GZIP.
First, text bytes to gzip bytes:
public static byte[] fromByteToGByte(byte[] bytes) {
    ByteArrayOutputStream baos = null;
    try {
        ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
        baos = new ByteArrayOutputStream();
        GZIPOutputStream gzos = new GZIPOutputStream(baos);
        byte[] buffer = new byte[1024];
        int len;
        while((len = bais.read(buffer)) >= 0) {
            gzos.write(buffer, 0, len);
        }
        gzos.close();
        baos.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return(baos.toByteArray());
}
Then the method that goes the other way, from compressed bytes to uncompressed bytes:
public static byte[] fromGByteToByte(byte[] gbytes) {
    ByteArrayOutputStream baos = null;
    ByteArrayInputStream bais = new ByteArrayInputStream(gbytes);
    try {
        baos = new ByteArrayOutputStream();
        GZIPInputStream gzis = new GZIPInputStream(bais);
        byte[] bytes = new byte[1024];
        int len;
        while((len = gzis.read(bytes)) > 0) {
            baos.write(bytes, 0, len);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return(baos.toByteArray());
}
Do you think it has any effect that I'm not writing out to a gzip file?
Also, I noticed that in the standard C# function, BitConverter reads the first four bytes, and then MemoryStream.Write is called with an offset of 4 and a length of the input buffer length minus 4. Does that affect the validity of the header?
Jim

I tried it out, and I can't reproduce your 'Invalid GZip Header' issue. Here is what I did:
Java side
I took your Java compression method together with this Java snippet:
public static String ToHexString(byte[] bytes){
    StringBuilder hexString = new StringBuilder();
    for (int i = 0; i < bytes.length; i++)
        hexString.append((i == 0 ? "" : "-") +
            Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
    return hexString.toString();
}
so that this minimal Java application, which takes the bytes of a test string, compresses them, and converts the compressed data to a hex string...:
public static void main(String[] args){
    System.out.println(ToHexString(fromByteToGByte("asdf".getBytes())));
}
... outputs the following (I added annotations):
1f-8b-08-00-00-00-00-00-00-00-4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00
^------- GZip header -------^ ^--- deflate ---^ ^- CRC32 -^ ^- ISIZE -^
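To confirm the annotation, the same hex dump can be taken apart in Python. This sketch assumes the dump is a single gzip member without optional header fields (the flags byte is 00, so the header is exactly 10 bytes).

```python
import struct
import zlib

# The hex dump printed by the Java program above
hex_dump = "1f-8b-08-00-00-00-00-00-00-00-4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00"
blob = bytes.fromhex(hex_dump.replace("-", ""))

# 10-byte header, raw deflate data, 8-byte trailer
header, deflate_data, trailer = blob[:10], blob[10:-8], blob[-8:]
crc, isize = struct.unpack("<II", trailer)  # both trailer fields are little-endian

print(header[:2].hex(), header[2])  # 1f8b 8  (magic bytes, CM=8 means deflate)
print(zlib.decompress(deflate_data, -zlib.MAX_WBITS))  # b'asdf'
print(hex(crc), isize)  # 0x5129f3bd 4
```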
C# side
I wrote two methods for compressing and decompressing a byte array into another byte array (the compression method is just for completeness and my own testing):
public static byte[] Compress(byte[] uncompressed)
{
    using (MemoryStream ms = new MemoryStream())
    using (GZipStream gzs = new GZipStream(ms, CompressionMode.Compress))
    {
        gzs.Write(uncompressed, 0, uncompressed.Length);
        gzs.Close();
        return ms.ToArray();
    }
}
public static byte[] Decompress(byte[] compressed)
{
    byte[] buffer = new byte[4096];
    using (MemoryStream ms = new MemoryStream(compressed))
    using (GZipStream gzs = new GZipStream(ms, CompressionMode.Decompress))
    using (MemoryStream uncompressed = new MemoryStream())
    {
        for (int r = -1; r != 0; r = gzs.Read(buffer, 0, buffer.Length))
            if (r > 0) uncompressed.Write(buffer, 0, r);
        return uncompressed.ToArray();
    }
}
Together with a small function that takes a hex string and turns it back to a byte array... (also just for testing purposes):
public static byte[] ToByteArray(string hexString)
{
    hexString = hexString.Replace("-", "");
    int NumberChars = hexString.Length;
    byte[] bytes = new byte[NumberChars / 2];
    for (int i = 0; i < NumberChars; i += 2)
        bytes[i / 2] = Convert.ToByte(hexString.Substring(i, 2), 16);
    return bytes;
}
... I did the following:
// Just hardcoded the output of the java program, convert it back to byte[]
byte[] fromjava = ToByteArray("1f-8b-08-00-00-00-00-00-00-00-" +
    "4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00");
// Decompress it with my function above
byte[] uncompr = Decompress(fromjava);
// Get the string out of the byte[] and print it
Console.WriteLine(System.Text.ASCIIEncoding.ASCII
    .GetString(uncompr, 0, uncompr.Length));
Et voila, the output is:
asdf
It works perfectly for me. Maybe you should check the decompression method in your C# application.
You said in your previous question you are storing those byte arrays in a database, right? Maybe you want to check whether the bytes come back from the database the way you put them in.

Posting this as an answer so the code looks decent.
Note a couple things:
First, the round trip to the database did not appear to have any effect. Java on both sides produced exactly what I put in. Java in C# out worked fine with the Ionic API, as did C# in and Java out. Which brings me to the second point.
Second, my original decompress was on the order of:
public static string Decompress(byte[] gzBuffer)
{
    using (MemoryStream ms = new MemoryStream())
    {
        int msgLength = BitConverter.ToInt32(gzBuffer, 0);
        ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
        byte[] buffer = new byte[msgLength];
        ms.Position = 0;
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
        {
            zip.Read(buffer, 0, buffer.Length);
        }
        return Encoding.UTF8.GetString(buffer);
    }
}
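That version depended on a 4-byte length prefix: it reads the first 4 bytes as the uncompressed size and skips them, which, for plain gzip data from Java, throws away the gzip magic bytes (1f 8b 08 00). Yours reads the whole stream regardless of any stored length. I don't know what the Ionic algorithm is, but yours works the same as the Java methods I've used. That's the only difference I see. Thanks very much for doing all that work; I will remember that way of doing it.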
Thanks,
Jim
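The length-prefix difference is enough to explain the 'Invalid GZip Header' error: the old method treats the first 4 bytes as a prefix that Java never writes, so the data it hands to GZipStream starts in the middle of the gzip header. A Python sketch of the same mistake (mtime=0 only keeps the header deterministic; requires Python 3.8+):

```python
import gzip
import struct

# Plain gzip data, like the Java output: no length prefix in front
gz = gzip.compress(b"asdf", mtime=0)

# What the old C# Decompress effectively does: read bytes 0-3 as a length ...
bogus_length = struct.unpack("<i", gz[:4])[0]
print(bogus_length)  # 559903: the gzip magic bytes misread as a length

# ... and then decompress what follows, which no longer starts with 1f 8b
try:
    gzip.decompress(gz[4:])
    rejected = False
except OSError:  # gzip.BadGzipFile subclasses OSError
    rejected = True
print(rejected)  # True

# With a real 4-byte length prefix in front, skipping 4 bytes is correct
prefixed = struct.pack("<i", 4) + gz
print(gzip.decompress(prefixed[4:]))  # b'asdf'
```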
