GZIP Compression Java/C# difference in compression issue

GZIP Compression Java/C# difference in compression issue - c#

I'm adding in compression to my project with the aim of improving speed in the 3G Data communication from Android app to ASP.NET C# Server.
The methods I've researched/written/tested works. However, there's added white space after compression. And they are different as well. This really puzzles me.
Is it something to do with different implementation of the GZIP classes in both Java/ASP.NET C#? Is it something that I should be concerned with or do I just move on with .Trim() and .trim() after decompressing?
Java, compressing "Mary had a little lamb" gives:
Compressed data length: 42
Base64 Compressed String: H4sIAAAAAAAAAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA
protected static byte[] GZIPCompress(byte[] data) {
try {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
GZIPOutputStream gZIPOutputStream = new GZIPOutputStream(byteArrayOutputStream);
gZIPOutputStream.write(data);
gZIPOutputStream.close();
return byteArrayOutputStream.toByteArray();
} catch(IOException e) {
Log.i("output", "GZIPCompress Error: " + e.getMessage());
return null;
}
}
ASP.NET C#, compressing "Mary had a little lamb"
Compressed data length: 137
Base64 Compressed String: H4sIAAAAAAAEAO29B2AcSZYlJi9tynt/SvVK1+B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee++999577733ujudTif33/8/XGZkAWz2zkrayZ4hgKrIHz9+fB8/Ir7I6ut0ns3SLC2Lti3ztMwWk/8Hesi6MhYAAAA=
public static byte[] GZIPCompress(byte[] data)
{
using (MemoryStream memoryStream = new MemoryStream())
{
using (GZipStream gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
{
gZipStream.Write(data, 0, data.Length);
}
return memoryStream.ToArray();
}
}

I get 42 bytes on .NET as well. I suspect you're using an old version of .NET which had a flaw in its compression scheme.
Here's my test app using your code:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
class Program
{
static void Main(string[] args)
{
var uncompressed = Encoding.UTF8.GetBytes("Mary had a little lamb");
var compressed = GZIPCompress(uncompressed);
Console.WriteLine(compressed.Length);
Console.WriteLine(Convert.ToBase64String(compressed));
}
static byte[] GZIPCompress(byte[] data)
{
using (var memoryStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(memoryStream,
CompressionMode.Compress))
{
gZipStream.Write(data, 0, data.Length);
}
return memoryStream.ToArray();
}
}
}
Results:
42
H4sIAAAAAAAEAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA
This is exactly the same as the Java data.
I'm using .NET 4.5. I suggest you try running the above code on your machine, and compare the results.
I've just decompressed the base64 data you provided, and it is a valid "compressed" form of "Mary had a little lamb", with 22 bytes in the uncompressed data. That surprises me... and reinforces my theory that it's a framework version difference.
EDIT: Okay, this is definitely a framework version difference. If I compile with the .NET 3.5 compiler, then use an app.config which forces it to run with that version of the framework, I see 137 bytes as well. Given comments, it looks like this was only fixed in .NET 4.5.

Related

Convert a wav file to 8000Hz 16Bit Mono Wav

I need to convert a wav file to 8000Hz 16Bit Mono Wav. I already have a code, which works well with NAudio library, but I want to use MemoryStream instead of temporary file.
using System.IO;
using NAudio.Wave;
static void Main()
{
var input = File.ReadAllBytes("C:/input.wav");
var output = ConvertWavTo8000Hz16BitMonoWav(input);
File.WriteAllBytes("C:/output.wav", output);
}
public static byte[] ConvertWavTo8000Hz16BitMonoWav(byte[] inArray)
{
using (var mem = new MemoryStream(inArray))
using (var reader = new WaveFileReader(mem))
using (var converter = WaveFormatConversionStream.CreatePcmStream(reader))
using (var upsampler = new WaveFormatConversionStream(new WaveFormat(8000, 16, 1), converter))
{
// todo: without saving to file using MemoryStream or similar
WaveFileWriter.CreateWaveFile("C:/tmp_pcm_8000_16_mono.wav", upsampler);
return File.ReadAllBytes("C:/tmp_pcm_8000_16_mono.wav");
}
}

Not sure if this is the optimal way, but it works...
public static byte[] ConvertWavTo8000Hz16BitMonoWav(byte[] inArray)
{
using (var mem = new MemoryStream(inArray))
{
using (var reader = new WaveFileReader(mem))
{
using (var converter = WaveFormatConversionStream.CreatePcmStream(reader))
{
using (var upsampler = new WaveFormatConversionStream(new WaveFormat(8000, 16, 1), converter))
{
byte[] data;
using (var m = new MemoryStream())
{
upsampler.CopyTo(m);
data = m.ToArray();
}
using (var m = new MemoryStream())
{
// to create a propper WAV header (44 bytes), which begins with RIFF
var w = new WaveFileWriter(m, upsampler.WaveFormat);
// append WAV data body
w.Write(data,0,data.Length);
return m.ToArray();
}
}
}
}
}
}

It might be added and sorry I can't comment yet due to lack of points. That NAudio ALWAYS writes 46 byte headers which in certain situations can cause crashes. I want to add this in case someone encouters this while searching for a clue why naudio wav files only start crashing certain programs.
I encoutered this problem after figuring out how to convert and/or sample wav with NAudio and was stuck after for 2 days now and only figured it out with a hex editor.
(The 2 extra bytes are located at byte 37 and 38 right before the data subchunck [d,a,t,a,size<4bytes>].
Here is a comparison of two wave file headers left is saved by NAudio 46 bytes; right by Audacity 44 bytes
You can check this back by looking at the NAudio src in WaveFormat.cs at line 310 where instead of 16 bytes for the fmt chunck 18+extra are reserved (+extra because there are some wav files which even contain bigger headers than 46 bytes) but NAudio always seems to write 46 byte headers and never 44 (MS standard). It may also be noted that in fact NAudio is able to read 44 byte headers (line 210 in WaveFormat.cs)

Compress a string using GZip, the string is not shorter

I used the following code to compress a string, but the string is not shorter. Can you explain why?
private string Compress(string str)
{
try
{
String returnValue;
byte[] buffer = Encoding.ASCII.GetBytes(str);
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
{
zip.Write(buffer, 0, buffer.Length);
using (StreamReader sReader = new StreamReader(ms, Encoding.ASCII))
{
returnValue = sReader.ReadToEnd();
}
}
}
return returnValue;
}
catch
{
return str;
}
}

Ignoring issues in the code - there are multiple possible scenarios when this can happen.
Simplified explanation of compression algorithm - compression is based on the fact that data you are trying to compress contain redundant values - patterns which can be recognized by the compression algorithm and can be "shortened" by expressing the redundant values more concisely.
Some scenarios when the compressed result can be larger then the input:
1) Input is too short - compression algorithms have some data overhead and considering the short input, it is unable to compress it effectively. So you have some data overhead from the compression mechanism + original data.
2) Input is already compressed - again, compression algorithms have some data overhead and when is the input already compressed - it is unable to compress it effectively.
3) Input is too random - considering the input is generated by some random generator, the compression algorithm is unable to compress it effectively - no patterns can be recognized.

compressing a string in C# and uncompressing in python

I am trying to compress a large string on a client program in C# (.net 4) and send it to a server (django, python 2.7) using a PUT request.
Ideally I want to use the standard library at both ends, so I am trying to use gzip.
My C# code is:
public static string Compress(string s) {
var bytes = Encoding.Unicode.GetBytes(s);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
msi.CopyTo(gs);
}
return Convert.ToBase64String(mso.ToArray());
}
}
The python code is:
s = base64.standard_b64decode(request)
buff = cStringIO.StringIO(s)
with gzip.GzipFile(fileobj=buff) as gz:
decompressed_data = gz.read()
It's almost working, but the output is: {▯"▯c▯h▯a▯n▯g▯e▯d▯"▯} when it should be {"changed"}, i.e. every other letter is something weird.
If I take out every other character by doing decompressed_data[::2], then it works, but it's a bit of a hack, and clearly there is something else wrong.
I'm wondering if I need to base64 encode it at all for a PUT request? Is this only necessary for POST?

I think the main problem might be C# uses UTF-16 encoded strings. This may yield a problem similar to yours. As any other encoding problem, we might need a little luck here but I guess you can solve this by doing:
decompressed_data = gz.read().decode('utf-16')
There, decompressed_data should be Unicode and you can treat it as such for further work.
UPDATE: This worked for me:
C Sharp
static void Main(string[] args)
{
FileStream f = new FileStream("test", FileMode.CreateNew);
using (StreamWriter w = new StreamWriter(f))
{
w.Write(Compress("hello"));
}
}
public static string Compress(string s)
{
var bytes = Encoding.Unicode.GetBytes(s);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
{
using (var gs = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(gs);
}
return Convert.ToBase64String(mso.ToArray());
}
}
Python
import base64
import cStringIO
import gzip
f = open('test','rb')
s = base64.standard_b64decode(f.read())
buff = cStringIO.StringIO(s)
with gzip.GzipFile(fileobj=buff) as gz:
decompressed_data = gz.read()
print decompressed_data.decode('utf-16')
Without decode('utf-16) it printed in the console:
>>>h e l l o
with it it did well:
>>>hello
Good luck, hope this helps!

It's almost working, but the output is: {▯"▯c▯h▯a▯n▯g▯e▯d▯"▯} when it should be {"changed"}
That's because you're using Encoding.Unicode to convert the string to bytes to start with.
If you can tell Python which encoding to use, you could do that - otherwise you need to use an encoding on the C# side which matches what Python expects.
If you can specify it on both sides, I'd suggest using UTF-8 rather than UTF-16. Even though you're compressing, it wouldn't hurt to make the data half the size (in many cases) to start with :)
I'm also somewhat suspicious of this line:
buff = cStringIO.StringIO(s)
s really isn't text data - it's compressed binary data, and should be treated as such. It may be okay - it's just worth checking whether there's a better way.

Deflating a stream with SharpZipLib has different behaviors in C# vs. VB.NET

I'm having troubles in writing a static Deflate extension method, that i would use to deflate a string, using BZip2 alghorithm of the SharpZipLib library (runtime version: v2.0.50727).
I'm doing it using .NET framework 4.
This is my VB.NET code:
Public Function Deflate(ByVal text As String)
Try
Dim compressedData As Byte() = Convert.FromBase64String(text)
System.Diagnostics.Debug.WriteLine(String.Concat("Compressed text data size: ", text.Length.ToString()))
System.Diagnostics.Debug.WriteLine(String.Concat("Compressed byte data size: ", compressedData.Length.ToString()))
Using compressedStream As MemoryStream = New MemoryStream(compressedData)
Using decompressionStream As BZip2OutputStream = New BZip2OutputStream(compressedStream)
Dim cleanData() As Byte
Using decompressedStream As MemoryStream = New MemoryStream()
decompressionStream.CopyTo(decompressedStream) // HERE THE ERROR!
cleanData = decompressedStream.ToArray()
End Using
decompressionStream.Close()
compressedStream.Close()
Dim cleanText As String = Encoding.UTF8.GetString(cleanData, 0, cleanData.Length)
System.Diagnostics.Debug.WriteLine(String.Concat("After decompression text data size: ", cleanText.Length.ToString()))
System.Diagnostics.Debug.WriteLine(String.Concat("After decompression byte data size: ", cleanData.Length.ToString()))
Return cleanText
End Using
End Using
Catch
Return String.Empty
End Try
End Function
The strange thing is that I wrote a C# counterpart of the same method, and it works perfectly!!! This is the code:
public static string Deflate(this string text)
{
try
{
byte[] compressedData = Convert.FromBase64String(text);
System.Diagnostics.Debug.WriteLine(String.Concat("Compressed text data size: ", text.Length.ToString()));
System.Diagnostics.Debug.WriteLine(String.Concat("Compressed byte data size: ", compressedData.Length.ToString()));
using (MemoryStream compressedStream = new MemoryStream(compressedData))
using (BZip2InputStream decompressionStream = new BZip2InputStream(compressedStream))
{
byte[] cleanData;
using (MemoryStream decompressedStream = new MemoryStream())
{
decompressionStream.CopyTo(decompressedStream);
cleanData = decompressedStream.ToArray();
}
decompressionStream.Close();
compressedStream.Close();
string cleanText = Encoding.UTF8.GetString(cleanData, 0, cleanData.Length);
System.Diagnostics.Debug.WriteLine(String.Concat("After decompression text data size: ", cleanText.Length.ToString()));
System.Diagnostics.Debug.WriteLine(String.Concat("After decompression byte data size: ", cleanData.Length.ToString()));
return cleanText;
}
}
catch(Exception e)
{
return String.Empty;
}
}
In VB.NET version I get this error: "Stream does not support reading." (see the code to understand where it comes!)
Where is the mistake?!! I cannot understand what's the difference between the two methods...
Thank you very much!

A game of spot the difference shows that in the first you are using BZip2OutputStream whereas the second is BZip2InputStream.
It seems reasonable that the output stream is used to write to and so as it says is not readable.
For what its worth there are a lot of good comparison tools out there. They won't cope with syntax different but the way the matching works it shows up quite well when you are using totally different objects (in this case at least). The one I personally use and recommend is Beyond Compare

You switched BZip2OutputStream and BZip2InputStream

In one version you are using a BZip2InputStream and in the other a BZip2OutputStream.

Modifying XMP data with C#

I'm using C# in ASP.NET version 2. I'm trying to open an image file, read (and change) the XMP header, and close it back up again. I can't upgrade ASP, so WIC is out, and I just can't figure out how to get this working.
Here's what I have so far:
Bitmap bmp = new Bitmap(Server.MapPath(imageFile));
MemoryStream ms = new MemoryStream();
StreamReader sr = new StreamReader(Server.MapPath(imageFile));
*[stuff with find and replace here]*
byte[] data = ToByteArray(sr.ReadToEnd());
ms = new MemoryStream(data);
originalImage = System.Drawing.Image.FromStream(ms);
Any suggestions?

How about this kinda thing?
byte[] data = File.ReadAllBytes(path);
... find & replace bit here ...
File.WriteAllBytes(path, data);
Also, i really recommend against using System.Bitmap in an asp.net process, as it leaks memory and will crash/randomly fail every now and again (even MS admit this)
Here's the bit from MS about why System.Drawing.Bitmap isn't stable:
http://msdn.microsoft.com/en-us/library/system.drawing.aspx
"Caution:
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions."

Part 1 of the XMP spec 2012, page 10 specifically talks about how to edit a file in place without needing to understand the surrounding format (although they do suggest this as a last resort). The embedded XMP packet looks like this:
<?xpacket begin="■" id="W5M0MpCehiHzreSzNTczkc9d"?>
... the serialized XMP as described above: ...
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf= ...>
...
</rdf:RDF>
</x:xmpmeta>
... XML whitespace as padding ...
<?xpacket end="w"?>
In this example, ‘■’ represents the
Unicode “zero width non-breaking space
character” (U+FEFF) used as a
byte-order marker.
The (XMP Spec 2010, Part 3, Page 12) also gives specific byte patterns (UTF-8, UTF16, big/little endian) to look for when scanning the bytes. This would complement Chris' answer about reading the file in as a giant byte stream.

You can use the following functions to read/write the binary data:
public byte[] GetBinaryData(string path, int bufferSize)
{
MemoryStream ms = new MemoryStream();
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read))
{
int bytesRead;
byte[] buffer = new byte[bufferSize];
while((bytesRead = fs.Read(buffer,0,bufferSize))>0)
{
ms.Write(buffer,0,bytesRead);
}
}
return(ms.ToArray());
}
public void SaveBinaryData(string path, byte[] data, int bufferSize)
{
using (FileStream fs = File.Open(path, FileMode.Create, FileAccess.Write))
{
int totalBytesSaved = 0;
while (totalBytesSaved<data.Length)
{
int remainingBytes = Math.Min(bufferSize, data.Length - totalBytesSaved);
fs.Write(data, totalBytesSaved, remainingBytes);
totalBytesSaved += remainingBytes;
}
}
}
However, loading entire images to memory would use quite a bit of RAM. I don't know much about XMP headers, but if possible you should:
Load only the headers in memory
Manipulate the headers in memory
Write the headers to a new file
Copy the remaining data from the original file

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

GZIP Compression Java/C# difference in compression issue - c#

Related

Convert a wav file to 8000Hz 16Bit Mono Wav

Compress a string using GZip, the string is not shorter

compressing a string in C# and uncompressing in python

Deflating a stream with SharpZipLib has different behaviors in C# vs. VB.NET

Modifying XMP data with C#

Categories

Resources