GZIP Java vs .NET - c#

I'm using the following Java code to compress/decompress byte[] to/from GZIP.
First, text bytes to gzip bytes:
public static byte[] fromByteToGByte(byte[] bytes) {
ByteArrayOutputStream baos = null;
try {
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
baos = new ByteArrayOutputStream();
GZIPOutputStream gzos = new GZIPOutputStream(baos);
byte[] buffer = new byte[1024];
int len;
while((len = bais.read(buffer)) >= 0) {
gzos.write(buffer, 0, len);
}
gzos.close();
baos.close();
} catch (IOException e) {
e.printStackTrace();
}
return(baos.toByteArray());
}
Then the method that goes the other way, from compressed bytes to uncompressed bytes:
public static byte[] fromGByteToByte(byte[] gbytes) {
ByteArrayOutputStream baos = null;
ByteArrayInputStream bais = new ByteArrayInputStream(gbytes);
try {
baos = new ByteArrayOutputStream();
GZIPInputStream gzis = new GZIPInputStream(bais);
byte[] bytes = new byte[1024];
int len;
while((len = gzis.read(bytes)) > 0) {
baos.write(bytes, 0, len);
}
} catch (IOException e) {
e.printStackTrace();
}
return(baos.toByteArray());
}
Does it make any difference that I'm not writing out to a gzip file?
Also, I noticed that in the standard C# function, BitConverter reads the first four bytes, and then the MemoryStream Write function is called with a start offset of 4 and a length of the input buffer length - 4. Does that affect the validity of the header?
Jim

I tried it out, and I can't reproduce your 'Invalid GZip Header' issue. Here is what I did:
Java side
I took your Java compression method together with this java snippet:
public static String ToHexString(byte[] bytes){
StringBuilder hexString = new StringBuilder();
for (int i = 0; i < bytes.length; i++)
hexString.append((i == 0 ? "" : "-") +
Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
return hexString.toString();
}
With that, this minimal Java application, which takes the bytes of a test string, compresses them, and converts the compressed data to a hex string...:
public static void main(String[] args){
System.out.println(ToHexString(fromByteToGByte("asdf".getBytes())));
}
... outputs the following (I added annotations):
1f-8b-08-00-00-00-00-00-00-00-4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00
^------- GZip Header -------^ ^----------- Compressed data -----------^
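To make the annotation above concrete, here is a minimal Python sketch that splits the same dump into its parts. It assumes the flags byte is 0 (no optional header fields), which holds for this dump; the names HEX_DUMP and describe_gzip are made up for illustration. Note that the last eight bytes of the "compressed data" part are the gzip trailer: a CRC-32 followed by the uncompressed length.

```python
import gzip
import struct
import zlib

# Hex dump printed by the Java program above for the input "asdf".
HEX_DUMP = ("1f-8b-08-00-00-00-00-00-00-00-"
            "4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00")
JAVA_OUTPUT = bytes.fromhex(HEX_DUMP.replace("-", ""))


def describe_gzip(blob: bytes) -> dict:
    """Split a gzip member into header fields, payload, and trailer.

    Assumption: FLG == 0, so the header is exactly 10 bytes long.
    """
    magic, method, flags, mtime, xfl, os_id = struct.unpack(
        "<2sBBIBB", blob[:10])
    # Trailer: CRC-32 and uncompressed size, both little-endian.
    crc32, isize = struct.unpack("<II", blob[-8:])
    # Negative wbits tells zlib to expect raw deflate data (no wrapper).
    payload = zlib.decompress(blob[10:-8], -zlib.MAX_WBITS)
    return {
        "magic": magic,
        "method": method,
        "flags": flags,
        "crc32": crc32,
        "isize": isize,
        "payload": payload,
    }


info = describe_gzip(JAVA_OUTPUT)
print(info)
```

If the magic bytes are 1f 8b and the method is 8 (deflate), the header is valid gzip, regardless of whether the bytes ever touched a file.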
C# side
I wrote two methods for compressing and uncompressing a byte array to another byte array (the compression method is just for completeness and for my own testing):
public static byte[] Compress(byte[] uncompressed)
{
using (MemoryStream ms = new MemoryStream())
using (GZipStream gzs = new GZipStream(ms, CompressionMode.Compress))
{
gzs.Write(uncompressed, 0, uncompressed.Length);
gzs.Close();
return ms.ToArray();
}
}
public static byte[] Decompress(byte[] compressed)
{
byte[] buffer = new byte[4096];
using (MemoryStream ms = new MemoryStream(compressed))
using (GZipStream gzs = new GZipStream(ms, CompressionMode.Decompress))
using (MemoryStream uncompressed = new MemoryStream())
{
for (int r = -1; r != 0; r = gzs.Read(buffer, 0, buffer.Length))
if (r > 0) uncompressed.Write(buffer, 0, r);
return uncompressed.ToArray();
}
}
Together with a small function that takes a hex string and turns it back to a byte array... (also just for testing purposes):
public static byte[] ToByteArray(string hexString)
{
hexString = hexString.Replace("-", "");
int NumberChars = hexString.Length;
byte[] bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)
bytes[i / 2] = Convert.ToByte(hexString.Substring(i, 2), 16);
return bytes;
}
... I did the following:
// Just hardcoded the output of the java program, convert it back to byte[]
byte[] fromjava = ToByteArray("1f-8b-08-00-00-00-00-00-00-00-" +
"4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00");
// Decompress it with my function above
byte[] uncompr = Decompress(fromjava);
// Get the string out of the byte[] and print it
Console.WriteLine(System.Text.ASCIIEncoding.ASCII
.GetString(uncompr, 0, uncompr.Length));
Et voila, the output is:
asdf
Works perfectly for me. Maybe you should check the decompression method in your C# application.
You said in your previous question that you are storing those byte arrays in a database, right? Maybe you want to check whether the bytes come back from the database the way you put them in.

Posting this as an answer so the code looks decent.
Note a couple things:
First, the round trip to the database did not appear to have any effect. Java on both sides produced exactly what I put in. Java in C# out worked fine with the Ionic API, as did C# in and Java out. Which brings me to the second point.
Second, my original decompress was on the order of:
public static string Decompress(byte[] gzBuffer)
{
using (MemoryStream ms = new MemoryStream())
{
int msgLength = BitConverter.ToInt32(gzBuffer, 0);
ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
byte[] buffer = new byte[msgLength];
ms.Position = 0;
using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
{
zip.Read(buffer, 0, buffer.Length);
}
return Encoding.UTF8.GetString(buffer);
}
}
This depended on an internal byte count stored in the first four bytes, whereas yours reads the whole stream regardless of any internal value. I don't know what the Ionic algorithm is. Yours works the same way as the Java methods I've used. That's the only difference I see. Thanks very much for doing all that work. I will remember that way of doing it.
Thanks,
Jim
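For readers wondering why the length-prefixed version fails on Java output, here is a small Python sketch of the two framings. The frame/unframe helpers are hypothetical and only mirror the idea of the C# method above (a 4-byte little-endian length followed by gzip data); they are not taken from any library.

```python
import gzip
import struct


def frame(payload: bytes) -> bytes:
    """Hypothetical framing: 4-byte little-endian length + gzip data."""
    return struct.pack("<i", len(payload)) + gzip.compress(payload, mtime=0)


def unframe(blob: bytes) -> bytes:
    """Inverse of frame(): read the length, then gunzip the rest."""
    (length,) = struct.unpack_from("<i", blob, 0)
    data = gzip.decompress(blob[4:])
    if len(data) != length:
        raise ValueError("length prefix does not match the decompressed size")
    return data


# Plain gzip, as the Java method produces it: no length prefix at all.
plain_gzip = gzip.compress(b"asdf", mtime=0)

# Reading the first four gzip header bytes (1f 8b 08 00) as an int gives
# a meaningless "length", and the remaining bytes no longer start with
# the gzip magic number.
(bogus_length,) = struct.unpack_from("<i", bytes.fromhex("1f8b0800"), 0)
print(bogus_length)
```

Calling unframe(plain_gzip) raises an error because the data after the first four bytes is not a gzip stream, which is the same kind of failure as the invalid header message on the C# side.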

Related

How to write hex into the text section of a binary file c#

If I have a hex string such as "6C5A3003AF4668B42922879D02364878",
how do I put this into the ASCII section of the binary file? I can do it manually like this:
But I haven't found a way to do this with code. It always writes it in the hex section like this:
I have tried BinaryWriter and FileStream, but they write it into the hex section instead.
Any help would be appreciated.
I actually have this stored in a byte array called Data.
I have done this:
for (int i = 0; i <Data.Length; i++)
{
int offset = 32 - i;
stream.Position = allData.Length - stuff; //last 32 bytes of the file
stream.WriteByte(Data[i]); //writes it into the hex section not text section
}
For example, I pass the hex code for GatewayServer (47-61-74-65-77-61-79-53-65-72-76-65-72) to the program to show how it works:
byte[] data = FromHex("47-61-74-65-77-61-79-53-65-72-76-65-72");
string s = Encoding.ASCII.GetString(data);
// prints: GatewayServer
Console.WriteLine(s);
To convert the hex string to bytes:
public static byte[] FromHex(string hex)
{
hex = hex.Replace("-", "");
byte[] raw = new byte[hex.Length / 2];
for (int i = 0; i < raw.Length; i++)
{
raw[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return raw;
}
and to append bytes to an existing file:
public static void AppendAllBytes(string path, byte[] bytes)
{
//argument-checking here.
using (var stream = new FileStream(path, FileMode.Append))
{
stream.Write(bytes, 0, bytes.Length);
}
}
and to write bytes to a new file:
public static void CreateFileAndWriteAllBytes(string path, byte[] bytes)
{
//argument-checking here.
using (var stream = new FileStream(path, FileMode.Create))
{
stream.Write(bytes, 0, bytes.Length);
}
}
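The same idea as a short Python sketch: the "hex section" and the "text section" of a hex editor are two views of the same bytes, so writing the decoded bytes is all that is needed. The overwrite_tail helper is a hypothetical illustration of replacing the last bytes of an existing file, as the question seems to want; it assumes the file is at least as long as the payload.

```python
import os
import tempfile


def from_hex(hex_string: str) -> bytes:
    """Turn '47-61-...' (or '4761...') into the bytes it describes."""
    return bytes.fromhex(hex_string.replace("-", ""))


def overwrite_tail(path: str, payload: bytes) -> None:
    """Hypothetical helper: overwrite the last len(payload) bytes of a file."""
    with open(path, "r+b") as handle:
        handle.seek(-len(payload), os.SEEK_END)
        handle.write(payload)


data = from_hex("47-61-74-65-77-61-79-53-65-72-76-65-72")
text = data.decode("ascii")

# Demo on a scratch file of 64 zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(bytes(64))
    scratch_path = scratch.name

overwrite_tail(scratch_path, data)
with open(scratch_path, "rb") as handle:
    contents = handle.read()
os.remove(scratch_path)
```

Writing the hex digits themselves as characters (for example "47") would instead store the bytes 0x34 0x37, which is why the two panes of the editor appear swapped in that case.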

How to compress data in C# to be decompressed in zlib python

I have a python zlib decompressor that takes default parameters as follows, where data is string:
import zlib
data_decompressed = zlib.decompress(data)
But I don't know how to compress a string in C# so that it can be decompressed in Python. I've tried the following piece of code, but when I try to decompress the result, an 'incorrect header check' exception is thrown.
static byte[] ZipContent(string entryName)
{
// remove whitespace from xml and convert to byte array
byte[] normalBytes;
using (StringWriter writer = new StringWriter())
{
//xml.Save(writer, SaveOptions.DisableFormatting);
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
normalBytes = encoding.GetBytes(writer.ToString());
}
// zip into new, zipped, byte array
using (Stream memOutput = new MemoryStream())
using (ZipOutputStream zipOutput = new ZipOutputStream(memOutput))
{
zipOutput.SetLevel(6);
ZipEntry entry = new ZipEntry(entryName);
entry.CompressionMethod = CompressionMethod.Deflated;
entry.DateTime = DateTime.Now;
zipOutput.PutNextEntry(entry);
zipOutput.Write(normalBytes, 0, normalBytes.Length);
zipOutput.Finish();
byte[] newBytes = new byte[memOutput.Length];
memOutput.Seek(0, SeekOrigin.Begin);
memOutput.Read(newBytes, 0, newBytes.Length);
zipOutput.Close();
return newBytes;
}
}
Anyone could help me please?
Thank you.
UPDATE 1:
I've tried the deflate function that Shiraz Bhaiji posted:
public static byte[] Deflate(byte[] data)
{
if (null == data || data.Length < 1) return null;
byte[] compressedBytes;
//write into a new memory stream wrapped by a deflate stream
using (MemoryStream ms = new MemoryStream())
{
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
//write byte buffer into memorystream
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
//rewind memory stream and write to base 64 string
compressedBytes = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Read(compressedBytes, 0, (int)ms.Length);
}
}
return compressedBytes;
}
The problem is that, for this to work in the Python code, I have to add the "-zlib.MAX_WBITS" argument to decompress as follows:
data_decompressed = zlib.decompress(data, -zlib.MAX_WBITS)
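So, my new question is: is it possible to write a deflate method in C# whose output can be decompressed with zlib.decompress(data) using the defaults?
In C#, the DeflateStream class implements deflate (since .NET Framework 4.5 it uses the zlib library internally), although its output is raw deflate data without the zlib header and checksum. See: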
https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.deflatestream?view=netframework-4.8
As you described in your edit, zlib.decompress(data, -zlib.MAX_WBITS) is the correct way to decompress data from C#'s DeflateStream. There are two formats at play here:
deflate - as specified in RFC 1951 - this is what C# is producing
zlib - as specified in RFC 1950 - this is what Python expects by default
What is the difference between the two? It's small, really:
zlib = [compression method and flags byte] + [flags byte] + deflate + [Adler-32 checksum]
(there are also optional dictionary bytes, but we don't have to worry about them)
Therefore, to get the zlib format from deflate, we need to prepend the two header bytes and append the Adler-32 checksum of the uncompressed data. Luckily, there is an answer on Stack Overflow for the header bytes (see "What does a zlib header look like?"), and implementing Adler-32 is not that hard. So suppose you have your MemoryStream ms; we would first write the two header bytes:
ms.Write(new byte[] { 0x78, 0x9c }, 0, 2);
...then we would do exactly what's in your answer
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
}
and, at last, compute the checksum and append it to the end of the stream:
uint a = 1; // Adler-32 starts with a = 1, not 0 (RFC 1950)
uint b = 0;
for(int i = 0; i < data.Length; ++i)
{
a = (a + data[i]) % 65521;
b = (b + a) % 65521;
}
Sadly, I don't know a pretty way of writing uints into the stream. The checksum is (b << 16) | a, stored big-endian, so this is an ugly way:
ms.Write(new byte[] { (byte)(b>>8),
(byte)b,
(byte)(a>>8),
(byte)a
}, 0, 4);
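The whole recipe can be checked from the Python side. The following is a minimal sketch, not the C# implementation itself: raw_deflate stands in for what DeflateStream produces, and the helper names are made up for illustration. Note that Adler-32 as defined in RFC 1950 starts with a = 1 and b = 0.

```python
import struct
import zlib


def adler32(data: bytes) -> int:
    """Adler-32 as defined in RFC 1950 (a starts at 1, b at 0)."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % 65521
        b = (b + a) % 65521
    return (b << 16) | a


def raw_deflate(data: bytes) -> bytes:
    """Raw RFC 1951 data, standing in for the output of DeflateStream."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


def wrap_as_zlib(raw: bytes, original: bytes) -> bytes:
    """Prepend the 0x78 0x9c header and append the big-endian Adler-32."""
    return b"\x78\x9c" + raw + struct.pack(">I", adler32(original))


payload = b"hello zlib " * 10
wrapped = wrap_as_zlib(raw_deflate(payload), payload)

# Plain zlib.decompress with default arguments now accepts the data.
roundtrip = zlib.decompress(wrapped)
```

The checksum is computed over the uncompressed input, not over the deflate output, which is an easy detail to get backwards.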

zlib.net code example for C# that takes byte[] as input argument

I spent 3 hours searching for how to uncompress a string using Zlib.net.dll and did not find anything useful.
My string is compressed by an old VB6 program that uses zlib.dll, and I do not want to go through a file each time I want to uncompress a string.
The problem is that you need to know the original size of the byte[] before compression.
Alternatively, you can use a dynamic array for decoding the data.
The code is here:
private string ZlibNetDecompress(string iCompressData, uint OriginalSize)
{
byte[] todecode_byte = Convert.FromBase64String(iCompressData);
byte[] lDecodeData = new byte[OriginalSize];
string lTempoString = System.Text.Encoding.Unicode.GetString(todecode_byte);
todecode_byte = System.Text.Encoding.Default.GetBytes(lTempoString);
string lReVal = "";
MemoryStream outStream = new MemoryStream();
MemoryStream InStream = new MemoryStream(todecode_byte);
zlib.ZOutputStream outZStream = new zlib.ZOutputStream(outStream);
try
{
CopyStream(InStream, outZStream);
lDecodeData = outStream.GetBuffer();
lReVal = System.Text.Encoding.Default.GetString(lDecodeData);
}
finally
{
outZStream.Close();
InStream.Close();
}
return lReVal;
}
private void CopyStream(System.IO.Stream input, System.IO.Stream output)
{
byte[] buffer = new byte[2000];
int len;
while ((len = input.Read(buffer, 0, 2000)) > 0)
{
output.Write(buffer, 0, len);
}
output.Flush();
}
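You could use the GZipStream or DeflateStream classes from the framework.
var data = new byte[resultSizeMax];
int total = 0, n;
using (Stream ds = new DeflateStream(stream, CompressionMode.Decompress))
while (total < data.Length && (n = ds.Read(data, total, data.Length - total)) > 0)
total += n; // stop at end of stream or when the buffer is full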
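As for not knowing the original size up front: a streaming decompressor sidesteps that requirement, because output chunks are collected as they arrive. Here is a minimal Python sketch of the idea, assuming the input is in zlib format (as written by zlib.dll's compress function); the function name and chunk size are arbitrary.

```python
import zlib


def inflate_unknown_size(compressed: bytes, chunk_size: int = 2000) -> bytes:
    """Decompress zlib data without knowing the original size in advance."""
    decompressor = zlib.decompressobj()
    parts = []
    # Feed the input in fixed-size chunks, like CopyStream above does.
    for start in range(0, len(compressed), chunk_size):
        chunk = compressed[start:start + chunk_size]
        parts.append(decompressor.decompress(chunk))
    # Collect anything still buffered inside the decompressor.
    parts.append(decompressor.flush())
    return b"".join(parts)


original = b"VB6 payload " * 500
restored = inflate_unknown_size(zlib.compress(original), chunk_size=16)
```

The same pattern applies in C# by copying from the decompression stream into a growing MemoryStream instead of a fixed-size array.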

Gzip uncompress from string error, The magic number in GZip header is not correct

I am trying to replicate the PHP function gzuncompress in C#.
So far I have part of the following code working; see the comments and code below.
I think the tricky bit is happening during the byte[] and string conversion.
How can I fix this, and what did I miss?
I am using a .NET 3.5 environment.
var plaintext = Console.ReadLine();
Console.WriteLine("string to byte[] then to string");
byte[] buff = Encoding.UTF8.GetBytes(plaintext);
var compress = GZip.GZipCompress(buff);
//Uncompress working below
try
{
var unpressFromByte = GZip.GZipUncompress(compress);
Console.WriteLine("uncompress successful by uncompress byte[]");
}catch
{
Console.WriteLine("uncompress failed by uncompress byte[]");
}
var compressString = Encoding.UTF8.GetString(compress);
Console.WriteLine(compressString);
var compressBuff = Encoding.UTF8.GetBytes(compressString);
Console.WriteLine(Encoding.UTF8.GetString(compressBuff));
//Uncompress not working below by using string
//The magic number in GZip header is not correct
try
{
var uncompressFromString = GZip.GZipUncompress(compressBuff);
Console.WriteLine("uncompress successful by uncompress string");
}
catch
{
Console.WriteLine("uncompress failed by uncompress string");
}
code for class Gzip
public static class GZip
{
public static byte[] GZipUncompress(byte[] data)
{
using (var input = new MemoryStream(data))
using (var gzip = new GZipStream(input, CompressionMode.Decompress))
using (var output = new MemoryStream())
{
gzip.CopyTo(output);
return output.ToArray();
}
}
public static byte[] GZipCompress(byte[] data)
{
using (var input = new MemoryStream(data))
using (var output = new MemoryStream())
{
using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
{
input.CopyTo(gzip);
}
return output.ToArray();
}
}
public static long CopyTo(this Stream source, Stream destination)
{
var buffer = new byte[2048];
int bytesRead;
long totalBytes = 0;
while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
{
destination.Write(buffer, 0, bytesRead);
totalBytes += bytesRead;
}
return totalBytes;
}
}
This is inappropriate:
var compressString = Encoding.UTF8.GetString(compress);
compress isn't a UTF-8-encoded piece of text. You should treat it as arbitrary binary data - which isn't appropriate to pass into Encoding.GetString. If you really need to convert arbitrary binary data into text, use Convert.ToBase64String (and then reverse with Convert.FromBase64String):
var compressString = Convert.ToBase64String(compress);
Console.WriteLine(compressString);
var compressBuff = Convert.FromBase64String(compressString);
That may or may not match what PHP does, but it's a safe way of representing arbitrary binary data as text, unlike treating the binary data as if it were valid UTF-8-encoded text.
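The effect is easy to reproduce outside .NET. In this small Python sketch, decoding with errors="replace" plays the role of Encoding.UTF8.GetString, which also substitutes U+FFFD for invalid byte sequences; the variable names are only for illustration.

```python
import base64
import gzip

compressed = gzip.compress(b"plain text to compress")

# Treating the binary data as UTF-8 text: the second magic byte (0x8b)
# is not valid UTF-8 on its own, so it is replaced by U+FFFD.
as_text = compressed.decode("utf-8", errors="replace")
damaged = as_text.encode("utf-8")

# Base64 is a lossless text representation of arbitrary bytes.
b64_text = base64.b64encode(compressed).decode("ascii")
restored = base64.b64decode(b64_text)

print(damaged[:4].hex(), compressed[:4].hex())
```

After the UTF-8 round trip the data no longer starts with 1f 8b, which is exactly what the "magic number in GZip header is not correct" message is complaining about.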
I am trying to replicate the php function gzuncompress in C#
Then use GZipStream or DeflateStream classes which are built into the .NET framework for this purpose.

Problem with C# Decompression

I have some data in a Sybase image type column that I want to use in a C# app. The data has been compressed by Java using the java.util.zip package. I wanted to test that I could decompress the data in C#, so I wrote a test app that pulls it out of the database:
byte[] bytes = (byte[])reader.GetValue(0);
This gives me a compressed byte[] of length 2479.
Then I pass this to a seemingly standard C# decompression method:
public static byte[] Decompress(byte[] gzBuffer)
{
MemoryStream ms = new MemoryStream();
int msgLength = BitConverter.ToInt32(gzBuffer, 0);
ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
byte[] buffer = new byte[msgLength];
ms.Position = 0;
GZipStream zip = new GZipStream(ms, CompressionMode.Decompress);
zip.Read(buffer, 0, buffer.Length);
return buffer;
}
The value for msgLength is 1503501432, which seems way out of range. The original document should be in the range of 5 KB to 50 KB. Anyway, when I use that value to create "buffer", not surprisingly I get an OutOfMemoryException.
What is happening?
Jim
The Java compress method is as follows:
public byte[] compress(byte[] bytes) throws Exception {
byte[] results = new byte[bytes.length];
Deflater deflater = new Deflater();
deflater.setInput(bytes);
deflater.finish();
int len = deflater.deflate(results);
byte[] out = new byte[len];
for(int i=0; i<len; i++) {
out[i] = results[i];
}
return(out);
}
As I can't see your Java code, I can only guess that you are compressing your data to a zip file stream. In that case it will fail if you try to decompress that stream with gzip decompression in C#. Either change your Java code to use gzip compression (Example here at the bottom of the page), or decompress the zip file stream in C# with an appropriate library (e.g. SharpZipLib).
Update
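OK, now I see you are using deflate for the compression in Java. So you have to use the same algorithm in C#: System.IO.Compression.DeflateStream. Note that a Deflater created with the default constructor wraps its output in the zlib format (RFC 1950), so the two-byte zlib header may need to be skipped before the data is handed to DeflateStream.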
public static byte[] Decompress(byte[] compressed)
{
byte[] buffer; // result buffer, grown on demand below
using (MemoryStream ms = new MemoryStream(compressed))
using (Stream zipStream = new DeflateStream(ms,
CompressionMode.Decompress, true))
{
int initialBufferLength = compressed.Length * 2;
buffer = new byte[initialBufferLength];
bool finishedExactly = false;
int read = 0;
int chunk;
while (!finishedExactly &&
(chunk = zipStream.Read(buffer, read, buffer.Length - read)) > 0)
{
read += chunk;
if (read == buffer.Length)
{
int nextByte = zipStream.ReadByte();
// End of Stream?
if (nextByte == -1)
{
finishedExactly = true;
}
else
{
byte[] newBuffer = new byte[buffer.Length * 2];
Array.Copy(buffer, newBuffer, buffer.Length);
newBuffer[read] = (byte)nextByte;
buffer = newBuffer;
read++;
}
}
}
if (!finishedExactly)
{
byte[] final = new byte[read];
Array.Copy(buffer, final, read);
buffer = final;
}
}
return buffer;
}
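The odd msgLength value from the question is itself a clue about the format. A short Python sketch, assuming BitConverter read the bytes in little-endian order (as it does on typical x86 hardware): 1503501432 is 0x599D9C78, so the data starts with the bytes 78 9c, which is the usual zlib header. The helper names below are made up for illustration, and the header check is only a heuristic.

```python
import struct
import zlib

MSG_LENGTH = 1503501432

# The four bytes that BitConverter.ToInt32 must have read (little-endian).
first_four = struct.pack("<i", MSG_LENGTH)


def looks_like_zlib(blob: bytes) -> bool:
    """Heuristic RFC 1950 check: method 8 and header divisible by 31."""
    if len(blob) < 2:
        return False
    return (blob[0] & 0x0F) == 8 and ((blob[0] << 8) | blob[1]) % 31 == 0


def inflate(blob: bytes) -> bytes:
    """Inflate zlib-wrapped data, falling back to raw deflate."""
    if looks_like_zlib(blob):
        return zlib.decompress(blob)
    return zlib.decompress(blob, -zlib.MAX_WBITS)


# zlib.compress produces the same container format as a default Deflater.
sample = zlib.compress(b"x" * 100)

# Skipping the two header bytes leaves raw deflate data (plus the checksum
# trailer, which a raw decompressor simply leaves unread).
raw_view = zlib.decompressobj(-zlib.MAX_WBITS).decompress(sample[2:])
```

So the first four bytes were never a length prefix; they are the start of the compressed stream, which is why the buffer allocation blew up.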
