GZipStream creates an invalid charset

I have a simple function to create a gzip file. This function works fine and passes the unit test. Then I hosted the generated file on Amazon S3.
But it produces some invalid characters when the input value contains Unicode characters.
e.g. アームバンド & ケース > 9ÎvøS‰
public static void CompressStringToFile(string fileName, string value)
{
    // Use GZipStream to write compressed bytes to target file.
    using (FileStream f2 = new FileStream(fileName, FileMode.Create))
    using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
    {
        byte[] b = Encoding.Unicode.GetBytes(value);
        gz.Write(b, 0, b.Length);
        gz.Flush();
    }
}

The output of GZip compression isn't meant to be text. It's effectively arbitrary binary content, which you should only use to decompress it to the original binary content... which in your case is UTF-16-encoded text. You shouldn't expect to be able to read the gzip file as a text file.
GZip itself doesn't interpret the (binary) data that it's given - it just compresses it, so it can be faithfully decompressed later on. GZip couldn't care less whether it's text, an image, a sound file, whatever: it just does the best it can to compress it.
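To make the point above concrete, here is a minimal sketch in Python (used here only because it is compact; the same reasoning applies to GZipStream). It assumes the text was encoded the way Encoding.Unicode.GetBytes does it, that is, UTF-16 little-endian without a byte order mark. The sample string comes from the question; the variable names are made up for illustration.

```python
import gzip

text = "アームバンド & ケース"

# Assumption: .NET's Encoding.Unicode is UTF-16 little-endian, and GetBytes emits no BOM.
raw = text.encode("utf-16-le")

# The compressed result is binary: it starts with the gzip magic number 1f 8b,
# so opening it in a text editor shows arbitrary characters.
compressed = gzip.compress(raw)
print(compressed[:2].hex())

# The only meaningful way to read it is to decompress first,
# then decode with the same encoding that was used on the way in.
restored = gzip.decompress(compressed).decode("utf-16-le")
print(restored == text)
```

If the file is served from S3 and meant to be viewed as text by a browser, the reader would still have to decompress it and know the encoding; the sketch does not cover HTTP headers.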

Related

Is it possible to get byte identical compressed files with Zlib in Python and Ionic.Zlib in c#?

I have an uncompressed file that has been compressed with zlib in Python, and I would like to compress it with Ionic.Zlib in C# and get exactly the same compressed output file as Python's.
I was using System.IO.Compression and DeflateStream at first, but the result was nowhere near identical. I am now using the Ionic.Zlib library for C# and getting closer to my goal, but some bytes are still different (or a lot of them, depending on the file).
This is the Python code:
import zlib

def compress():
    with open("compressedFile.dat", "wb") as compressedFile:
        with open("fileToCompress.txt", "rb") as fileToCompress:
            data = fileToCompress.read()
            compressedData = zlib.compress(data, 9)
            compressedFile.write(compressedData)
and this is what I wrote in c# to try to get the same compressed output file:
using System;
using System.IO;
using Ionic.Zlib;

static class myClass
{
    static void compress()
    {
        BinaryWriter compressedFile = new BinaryWriter(new FileStream("compressedFile.dat", FileMode.Create));
        var compressedData = Ionic.Zlib.ZlibStream.CompressBuffer(File.ReadAllBytes("fileToCompress.txt"));
        compressedFile.Seek(0, SeekOrigin.Begin);
        compressedFile.Write(compressedData);
    }

    static void Main()
    {
        compress();
    }
}
Compression level is the same (9) and the compression header (first 2 bytes) are identical in both compressed files (78 DA). Next 3 bytes seem to be identical as well (EC 7D 0B) and then the rest really depends on the input uncompressed file... The first one I am trying to compress only has 2 bytes that are different among the 4 last bytes: **6E A5** 55 53 (Python) vs **6C 02** 55 53 (c#).
Thank you!
EDIT: SOLVED
For anyone who would like to know how to get exactly the same compression as Python's zlib.compress in C#, use zlibnet.
Get zlibnet.dll from one of zlibnet's releases and use ZLibNet.ZLibStream(<output stream>, CompressionMode.Compress, CompressionLevel.Level9), changing CompressionLevel.Level9 to the level used in Python.
Example:
MemoryStream memoryStream = new MemoryStream();
using (var compressor = new ZLibNet.ZLibStream(memoryStream, CompressionMode.Compress, CompressionLevel.Level9))
{
    // fileStream is the input stream to compress, opened elsewhere.
    fileStream.CopyTo(compressor);
    compressor.Close();
}
You need only write memoryStream to a file now.
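When comparing two zlib outputs byte by byte, it can help to know which bytes are framing and which are compressed payload. The following is a small sketch on the Python side, with a placeholder input: the first two bytes are the RFC 1950 header (78 DA at level 9, as observed in the question), and the last four bytes are the Adler-32 checksum of the uncompressed input in big-endian order, so the trailer depends only on the input data and not on compressor settings.

```python
import zlib

data = b"example payload " * 64  # placeholder input, not the asker's file

stream = zlib.compress(data, 9)

# Two-byte zlib header: 78 da for level 9, 78 01 for level 1.
header = stream[:2]
fast_header = zlib.compress(data, 1)[:2]

# Four-byte trailer: Adler-32 of the *uncompressed* data, most significant byte first.
trailer = stream[-4:]
expected_trailer = zlib.adler32(data).to_bytes(4, "big")

print(header.hex(), fast_header.hex(), trailer == expected_trailer)
```

Everything between the header and the trailer is the DEFLATE stream itself, which is the part that can legitimately differ between compressor implementations.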

Decompress a stream of bytes using deflatestream [duplicate]

Are System.IO.Compression.GZipStream or System.IO.Compression.Deflate compatible with zlib compression?

I ran into this issue with Git objects. In that particular case, they store the objects as deflated blobs with a zlib header, which is documented in RFC 1950. You can make a compatible blob by making a file that contains:
- Two header bytes (CMF and FLG from RFC 1950) with the values 0x78 0x01:
    - CM = 8 = deflate
    - CINFO = 7 = 32 KB window
    - FCHECK = 1 = checksum bits for this header
- The output of the C# DeflateStream
- An Adler-32 checksum of the input data to the DeflateStream, in big-endian format (MSB first)
I made my own Adler implementation:
public class Adler32Computer
{
    private static readonly int Modulus = 65521;

    private int a = 1;
    private int b = 0;

    // Returned as an unsigned value: b can reach 65520, so (b * 65536) + a
    // would overflow a signed 32-bit int.
    public uint Checksum
    {
        get
        {
            return ((uint)b << 16) | (uint)a;
        }
    }

    public void Update(byte[] data, int offset, int length)
    {
        for (int counter = 0; counter < length; ++counter)
        {
            a = (a + data[offset + counter]) % Modulus;
            b = (b + a) % Modulus;
        }
    }
}
And that was pretty much it.
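The recipe above can be sketched and checked in Python. This is only an illustration under stated assumptions: a raw DEFLATE stream produced by zlib with negative wbits stands in for the output of the C# DeflateStream, and the function names are hypothetical. The hand-written Adler-32 mirrors the class above and is compared against zlib's own implementation.

```python
import zlib

MODULUS = 65521  # largest prime below 2**16, as in the C# class above


def adler32(data: bytes) -> int:
    """Plain Adler-32, written out to mirror the Update/Checksum logic."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MODULUS
        b = (b + a) % MODULUS
    return (b << 16) | a


def make_zlib_blob(data: bytes) -> bytes:
    # Raw DEFLATE (negative wbits means no header and no trailer);
    # this stands in for what DeflateStream would write.
    compressor = zlib.compressobj(1, zlib.DEFLATED, -15)
    raw_deflate = compressor.compress(data) + compressor.flush()

    header = bytes([0x78, 0x01])                  # CMF and FLG
    checksum = adler32(data).to_bytes(4, "big")   # MSB first
    return header + raw_deflate + checksum


data = b"blob 11\x00hello world"
blob = make_zlib_blob(data)

# A standard zlib decoder accepts the hand-assembled blob.
print(zlib.decompress(blob) == data)
```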

DotNetZip includes a DeflateStream, a ZlibStream, and a GZipStream, to handle RFC 1951, RFC 1950, and RFC 1952 respectively. They all use the DEFLATE algorithm, but the framing and header bytes are different for each one.
As an advantage, the streams in DotNetZip do not exhibit the anomaly of expanding data size under compression, reported against the built-in streams. Also, there is no built-in ZlibStream, whereas DotNetZip gives you that, for good interop with zlib.
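The three framings can be compared side by side. The sketch below uses Python's zlib module rather than DotNetZip, so it only illustrates the formats, not that library's API; the helper name deflate is made up. The wbits argument selects the container: -15 for raw DEFLATE (RFC 1951), 15 for zlib (RFC 1950), and 31 for gzip (RFC 1952).

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data with the container format selected by wbits."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


data = b"the same payload in three containers " * 32

raw = deflate(data, -15)  # RFC 1951: no header, no trailer
zl = deflate(data, 15)    # RFC 1950: 2-byte header, Adler-32 trailer
gz = deflate(data, 31)    # RFC 1952: gzip header, CRC-32 and length trailer

print(zl[:2].hex(), gz[:2].hex())
print(len(raw), len(zl), len(gz))

# Each format decodes only with the matching setting.
print(zlib.decompress(raw, -15) == data)
print(zlib.decompress(zl, 15) == data)
print(zlib.decompress(gz, 31) == data)

# Feeding a gzip stream to a zlib-format decoder fails on the header check.
try:
    zlib.decompress(gz, 15)
    mismatch_rejected = False
except zlib.error:
    mismatch_rejected = True
print(mismatch_rejected)
```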

From MSDN about System.IO.Compression.GZipStream:
This class represents the gzip data format, which uses an industry standard algorithm for lossless file compression and decompression.
From the zlib FAQ:
The gz* functions in zlib on the other hand use the gzip format.
So zlib and GZipStream should be interoperable, but only if you use the zlib functions for handling the gzip-format.
System.IO.Compression.Deflate and zlib are reportedly not interoperable.
If you need to handle zip files (you probably don't, but someone else might need this) you need to use SharpZipLib or another third-party library.

I've used GZipStream to compress the output from the .NET XmlSerializer, and it has worked perfectly fine to decompress the result with gunzip (in Cygwin), WinZip, and another GZipStream.
For reference, here's what I did in code:
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write);
using (GZipStream gzStream = new GZipStream(fs, CompressionMode.Compress))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
    serializer.Serialize(gzStream, myData);
}
Then, to decompress in C#:
FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read);
using (Stream input = new GZipStream(fs, CompressionMode.Decompress))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
    myData = (MyDataType) serializer.Deserialize(input);
}
Using the 'file' utility in Cygwin reveals that there is indeed a difference between the same file compressed with GZipStream and with GNU gzip (probably header information, as others have stated in this thread). This difference, however, seems not to matter in practice.

gzip is deflate + some header/footer data, like a checksum and length, etc. So they're not compatible in the sense that one method can use a stream from the other, but they employ the same compression algorithm.
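The "header/footer" part can be inspected directly. The sketch below is a Python illustration, assuming a gzip stream written by zlib with no optional header fields, in which case the header is a fixed 10 bytes. The trailer is 8 bytes: the CRC-32 of the uncompressed data followed by its length modulo 2^32, both little-endian. Stripping both leaves a raw DEFLATE stream.

```python
import struct
import zlib

data = b"gzip wraps a deflate stream " * 40

# wbits=31 asks zlib for gzip framing with its default (minimal) header.
compressor = zlib.compressobj(6, zlib.DEFLATED, 31)
gz = compressor.compress(data) + compressor.flush()

# Trailer: CRC-32 and uncompressed size, each a little-endian 32-bit integer.
crc, size = struct.unpack("<II", gz[-8:])
print(crc == zlib.crc32(data), size == len(data))

# Assumption: no optional header fields, so the header is exactly 10 bytes.
deflate_body = gz[10:-8]

# What remains is plain DEFLATE, readable by a raw-deflate decoder.
print(zlib.decompress(deflate_body, -15) == data)
```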

They just compress the data using the zlib or deflate algorithms, but do not produce output in any specific file format. This means that if you store the stream as-is on the hard drive, you most probably will not be able to open it with an application such as gzip or WinRAR, because the file headers (magic number, etc.) are not included in the stream and you would have to write them yourself.

Starting from .NET Framework 4.5 the System.IO.Compression.DeflateStream class uses the zlib library.
From the class's MSDN article:
This class represents the Deflate algorithm, which is an industry-standard algorithm for lossless file compression and decompression. Starting with the .NET Framework 4.5, the DeflateStream class uses the zlib library. As a result, it provides a better compression algorithm and, in most cases, a smaller compressed file than it provides in earlier versions of the .NET Framework.

I agree with andreas. You probably won't be able to open the file in an external tool, but if that tool expects a stream you might be able to use it. You would also be able to deflate the file back using the same compression class.

byte array to pdf

I am trying to convert content of a file stored in a sql column to a pdf.
I use the following piece of code:
byte[] bytes;
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, fileContent);
bytes = ms.ToArray();
System.IO.File.WriteAllBytes("hello.pdf", bytes);
The pdf generated is corrupt in the sense that when I open the pdf in notepad++, I see some junk header (which is same irrespective of the fileContent). The junk header is NUL SOH NUL NUL NUL ....

You shouldn't be using the BinaryFormatter for this - that's for serializing .Net types to a binary file so they can be read back again as .Net types.
If it's stored in the database, hopefully, as a varbinary - then all you need to do is get the byte array from that (that will depend on your data access technology - EF and Linq to Sql, for example, will create a mapping that makes it trivial to get a byte array) and then write it to the file as you do in your last line of code.
With any luck - I'm hoping that fileContent here is the byte array? In which case you can just do
System.IO.File.WriteAllBytes("hello.pdf", fileContent);
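The same effect can be reproduced outside .NET. The sketch below uses Python's pickle as a stand-in for BinaryFormatter (an analogy only, since the two formats are unrelated): serializing a byte array wraps it in the serializer's own framing, so the output no longer starts with the PDF signature, while writing the bytes directly preserves them. The sample content is a placeholder, not a real PDF.

```python
import os
import pickle
import tempfile

# Placeholder bytes standing in for the database column's content.
file_content = b"%PDF-1.4\n% placeholder bytes, not a real PDF\n"

# Serializing the byte array adds the serializer's own header in front of it.
serialized = pickle.dumps(file_content)
print(serialized.startswith(b"%PDF"))

# Writing the bytes as they are keeps the file identical to the stored content.
path = os.path.join(tempfile.mkdtemp(), "hello.pdf")
with open(path, "wb") as f:
    f.write(file_content)

with open(path, "rb") as f:
    written = f.read()
print(written == file_content)
```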

Usually this happens if something is wrong with the byte array.
File.WriteAllBytes("filename.PDF", bytes); // bytes is a byte[]
This creates a new file, writes the specified byte array to the file, and then closes the file. If the target file already exists, it is overwritten.
An asynchronous implementation is also available:
public static System.Threading.Tasks.Task WriteAllBytesAsync(string path, byte[] bytes, System.Threading.CancellationToken cancellationToken = default);

zipping memory stream in silverlight

I'm using SLSharpZipLib to try to compress a byte[] before sending it over the network to a server. The byte[] contains JPEG data, which is already compressed by the JPEG encoder.
You may ask: if JPEG already compresses the image, why do I need to compress it more? Well, because I tried it and it worked.
Here is what happened:
I wrote the bytes in the byte[] to a txt file. The size of the txt file is ~5k. I compressed it with WinZip and the resulting file was ~2k, so that's more than a 50% reduction in file size. However, when I use SLSharpZipLib to compress the byte[] directly, the reduction in size is minimal.
here is the code i used:
MemoryStream msCompressed = new MemoryStream();
GZipOutputStream gzCompressed = new GZipOutputStream(msCompressed);
gzCompressed.SetLevel(9);
// allframes is a byte array.
gzCompressed.Write(allframes, 0, allframes.Length);
gzCompressed.Finish();
gzCompressed.IsStreamOwner = false;
gzCompressed.Close();
// i used byte[] compresseddata = msCompressed.ToArray() but i thought i'll try this too.
msCompressed.Seek(0, SeekOrigin.Begin);
byte[] compresseddata = new byte[msCompressed.Length];
msCompressed.Read(compresseddata, 0, compresseddata.Length);
==================================================================================
From debugging the code, I can see that the difference in size between allframes.Length and compresseddata.Length is minimal. But if that same data is written to a text file and zipped with WinZip, its size is reduced by 50%.
this is how i write the same data to a txt file:
TextWriter tw = new StreamWriter(MainPage.fs); // fs is a FileStream.
foreach (byte b in allframes)
{
    tw.Write(b);
}
===============================================================================
Am I doing something wrong? Am I misunderstanding something?
Thanks up front :)

You are not comparing like with like.
There is no point in compressing JPEG image data as it is compressed already. Writing it out to a text file won't give you the same file size as writing it to a binary file.
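A rough sketch of why the two measurements differ, in Python. It assumes that the question's tw.Write(b) writes each byte as decimal digits, and it uses pseudo-random bytes as a stand-in for already-compressed JPEG data; both are assumptions made for illustration. The text form is much larger than the binary form and uses only ten distinct characters, so it compresses well, but mostly back toward the size of the original bytes.

```python
import random
import zlib

# Stand-in for already-compressed data: pseudo-random bytes (fixed seed for repeatability).
rng = random.Random(0)
raw = bytes(rng.getrandbits(8) for _ in range(5000))

# Assumption: the text file holds each byte written as decimal digits, e.g. 255 -> "255".
text = "".join(str(b) for b in raw).encode("ascii")

compressed_raw = zlib.compress(raw, 9)
compressed_text = zlib.compress(text, 9)

print("binary:", len(raw), "->", len(compressed_raw))
print("text:  ", len(text), "->", len(compressed_text))
```

The fair comparison is between the compressed sizes and the size of the original byte array, not the size of the inflated text file.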

Probably not, I would imagine WinZip has a superior zip algorithm to SLSharpZipLib. You can try varying the compression ratio but other than that, I would try different Silverlight compatible zip libraries.
JPEG as you've correctly pointed out is already a highly compressed file type, so finding a compression algorithm that can find further redundancy is going to be difficult.
Best regards,

Why does gzip/deflate compressing a small file result in many trailing zeroes?

I'm using the following code to compress a small (~4kB) HTML file in C#.
byte[] fileBuffer = ReadFully(inFile, ResponsePacket.maxResponsePayloadLength); // Read the entire requested HTML file into a memory buffer
inFile.Close(); // Close the requested HTML file
byte[] payload;
using (MemoryStream compMS = new MemoryStream()) // Create a new memory stream to hold the compressed HTML data
{
    using (GZipStream gzip = new GZipStream(compMS, CompressionMode.Compress)) // Create a new GZip object pointing to the empty memory stream
    {
        gzip.Write(fileBuffer, 0, fileBuffer.Length); // Compress the file buffer and write it to the empty memory stream
        gzip.Close(); // Close the GZip object
    }
    payload = compMS.GetBuffer(); // Write the compressed file buffer data in the memory stream to a byte buffer
}
The resulting compressed data is about 2k, but about half of it is just zeroes. This is for a very bandwidth sensitive application (which is why I'm bothering to compress 4kB in the first place), so the extra 1kB of zeroes is wasted valuable space. My best guess would be that the compression algorithm is padding out the data to a block boundary. If so, is there any way to override this behavior or change the block size? I get the same results with vanilla .NET GZipStream and zlib's GZipStream, as well as DeflateStream.

Wrong MemoryStream method. GetBuffer() returns the underlying buffer, which is always at least as large as the data in the stream. It is very efficient because no copy needs to be made, but the bytes past Length are unused.
You need the ToArray() method here. Or use the Length property to take only the valid part of the buffer.
