How to determine size of string, and compress it - c#

I'm currently developing an application in C# that uses Amazon SQS
The size limit for a message is 8kb.
I have a method that is something like:
public void QueueMessage(string message)
Within this method, I'd like to first of all, compress the message (most messages are passed in as json, so are already fairly small)
If the compressed string is still larger than 8kb, I'll store it in S3.
My question is:
How can I easily test the size of a string, and what's the best way to compress it?
I'm not looking for massive reductions in size, just something nice and easy - and easy to decompress the other end.

To know the "size" (in kb) of a string we need to know the encoding. If we assume UTF8, then it is (not including BOM etc) like below (but swap the encoding if it isn't UTF8):
int len = Encoding.UTF8.GetByteCount(longString);
Re packing it; I would suggest GZIP via UTF8, optionally followed by base-64 if it has to be a string:
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress, true))
{
byte[] raw = Encoding.UTF8.GetBytes(longString);
gzip.Write(raw, 0, raw.Length);
gzip.Close();
}
byte[] zipped = ms.ToArray(); // as a BLOB
string base64 = Convert.ToBase64String(zipped); // as a string
// store zipped or base64
}

Give unzip bytes to this function.The best I could come up with was
public static byte[] ZipToUnzipBytes(byte[] bytesContext)
{
byte[] arrUnZipFile = null;
if (bytesContext.Length > 100)
{
using (var inFile = new MemoryStream(bytesContext))
{
using (var decompress = new GZipStream(inFile, CompressionMode.Decompress, false))
{
byte[] bufferWrite = new byte[4];
inFile.Position = (int)inFile.Length - 4;
inFile.Read(bufferWrite, 0, 4);
inFile.Position = 0;
arrUnZipFile = new byte[BitConverter.ToInt32(bufferWrite, 0) + 100];
decompress.Read(arrUnZipFile, 0, arrUnZipFile.Length);
}
}
}
return arrUnZipFile;
}

Related

How to compress data in C# to be decompressed in zlib python

I have a python zlib decompressor that takes default parameters as follows, where data is string:
import zlib
data_decompressed = zlib.decompress(data)
But, I don't know how I can compress a string in c# to be decompressed in python. I've tray the next piece of code but when I trie to decompresse 'incorrect header check' exception is trown.
static byte[] ZipContent(string entryName)
{
// remove whitespace from xml and convert to byte array
byte[] normalBytes;
using (StringWriter writer = new StringWriter())
{
//xml.Save(writer, SaveOptions.DisableFormatting);
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
normalBytes = encoding.GetBytes(writer.ToString());
}
// zip into new, zipped, byte array
using (Stream memOutput = new MemoryStream())
using (ZipOutputStream zipOutput = new ZipOutputStream(memOutput))
{
zipOutput.SetLevel(6);
ZipEntry entry = new ZipEntry(entryName);
entry.CompressionMethod = CompressionMethod.Deflated;
entry.DateTime = DateTime.Now;
zipOutput.PutNextEntry(entry);
zipOutput.Write(normalBytes, 0, normalBytes.Length);
zipOutput.Finish();
byte[] newBytes = new byte[memOutput.Length];
memOutput.Seek(0, SeekOrigin.Begin);
memOutput.Read(newBytes, 0, newBytes.Length);
zipOutput.Close();
return newBytes;
}
}
Anyone could help me please?
Thank you.
UPDATE 1:
I've tried with defalte function as Shiraz Bhaiji has posted:
public static byte[] Deflate(byte[] data)
{
if (null == data || data.Length < 1) return null;
byte[] compressedBytes;
//write into a new memory stream wrapped by a deflate stream
using (MemoryStream ms = new MemoryStream())
{
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
//write byte buffer into memorystream
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
//rewind memory stream and write to base 64 string
compressedBytes = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Read(compressedBytes, 0, (int)ms.Length);
}
}
return compressedBytes;
}
The problem is that to work properly in python code I've to add the "-zlib.MAX_WBITS" argument to decompress as follows:
data_decompressed = zlib.decompress(data, -zlib.MAX_WBITS)
So, my new question is: is it possible to code a deflate method in C# which compression result could be decompressed with zlib.decompress(data) as defaults?
In C# the DeflateStream class supports zlib. See:
https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.deflatestream?view=netframework-4.8
As you described with your edit, zlib.decompress(data, -zlib.MAX_WBITS) is the correct way to decompress data from C#'s DeflateStream. There are two formats at play here:
deflate - as in specification RFC 1951 - this is what's C# is producing
zlib - as in specification RFC 1950 - this is what's Python is expecting by default
What is the difference between the two? It's small, really:
zlib = [compression flag byte] + [flags byte] + deflate + [adler checksum]
(there are also optional dictionary bytes but we don't have to worry about them)
Therefore, to get zlib format from deflate, we need to prepend two bytes of flags, and append Adler-32 checksum. Luckily we have an answer on stackoverflow for the flags, see What does a zlib header look like? and implementing Adler-32 is not that hard. So suppose you have your MemoryStream ms, we would first write the two flag bytes
ms.Write(new byte[] {0x78,0x9c});
...then we would do exactly what's in your answer
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
}
and, at last, compute the checksum and append it to the end of the stream:
uint a = 0;
uint b = 0;
for(int i = 0; i < data.Length; ++i)
{
a = (a + data[i]) % 65521;
b = (b + a) % 65521;
}
Sadly, I don't know a pretty way of writing uints into the stream. This is an ugly way:
ms.Write(new byte[] { (byte)(b>>8),
(byte)b,
(byte)(a>>8),
(byte)a
});

Decode Base64 and Inflate Zlib compressed XML

Sorry for the long post, will try to make this as short as possible.
I'm consuming a json API (which has zero documentation of course) which returns something like this:
{
uncompressedlength: 743637,
compressedlength: 234532,
compresseddata: "lkhfdsbjhfgdsfgjhsgfjgsdkjhfgj"
}
The data (xml in this case) is compressed and then base64 encoded data which I am attempting to extract. All I have is their demo code written in perl to decode it:
use Compress::Zlib qw(uncompress);
use MIME::Base64 qw(decode_base64);
my $uncompresseddata = uncompress(decode_base64($compresseddata));
Seems simple enough.
I've tried a number of methods to decode the base64:
private string DecodeFromBase64(string encodedData)
{
byte[] encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
string returnValue = System.Text.Encoding.Unicode.GetString(encodedDataAsBytes);
return returnValue;
}
public string base64Decode(string data)
{
try
{
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
System.Text.Decoder utf8Decode = encoder.GetDecoder();
byte[] todecode_byte = Convert.FromBase64String(data);
int charCount = utf8Decode.GetCharCount(todecode_byte, 0, todecode_byte.Length);
char[] decoded_char = new char[charCount];
utf8Decode.GetChars(todecode_byte, 0, todecode_byte.Length, decoded_char, 0);
string result = new String(decoded_char);
return result;
}
catch (Exception e)
{
throw new Exception("Error in base64Decode" + e.Message);
}
}
And I have tried using Ionic.Zip.dll (DotNetZip?) and zlib.net to inflate the Zlib compression. But everything errors out. I am trying to track down where the problem is coming from. Is it the base64 decode or the Inflate?
I always get an error when inflating using zlib: I get a bad Magic Number error using zlib.net and I get "Bad state (invalid stored block lengths)" when using DotNetZip:
string decoded = DecodeFromBase64(compresseddata);
string decompressed = UnZipStr(GetBytes(decoded));
public static string UnZipStr(byte[] input)
{
using (MemoryStream inputStream = new MemoryStream(input))
{
using (Ionic.Zlib.DeflateStream zip =
new Ionic.Zlib.DeflateStream(inputStream, Ionic.Zlib.CompressionMode.Decompress))
{
using (StreamReader reader =
new StreamReader(zip, System.Text.Encoding.UTF8))
{
return reader.ReadToEnd();
}
}
}
}
After reading this:
http://george.chiramattel.com/blog/2007/09/deflatestream-block-length-does-not-match.html
And listening to one of the comments. I changed the code to this:
MemoryStream memStream = new MemoryStream(Convert.FromBase64String(compresseddata));
memStream.ReadByte();
memStream.ReadByte();
DeflateStream deflate = new DeflateStream(memStream, CompressionMode.Decompress);
string doc = new StreamReader(deflate, System.Text.Encoding.UTF8).ReadToEnd();
And it's working fine.
This was the culprit:
http://george.chiramattel.com/blog/2007/09/deflatestream-block-length-does-not-match.html
With skipping the first two bytes I was able to simplify it to:
MemoryStream memStream = new MemoryStream(Convert.FromBase64String(compresseddata));
memStream.ReadByte();
memStream.ReadByte();
DeflateStream deflate = new DeflateStream(memStream, CompressionMode.Decompress);
string doc = new StreamReader(deflate, System.Text.Encoding.UTF8).ReadToEnd();
First, use System.IO.Compression.DeflateStream to re-inflate the data. You should be able to use a MemoryStream as the input stream. You can create a MemoryStream using the byte[] result of Convert.FromBase64String.
You are likely causing all kinds of trouble trying to convert the base64 result to a given encoding; use the raw data directly to Deflate.

How to decompress a string in javascript, compressed in C#? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
ZLIB Decompression - Client Side
I'll try to be clear and I'm sorry for my bad english. This is the question:
In my web application i received a string that represent an image compressed with this algorithm, written in C#:
public static class Compression
{
public static string Compress(string text)
{
byte[] buffer = Encoding.UTF8.GetBytes(text);
MemoryStream ms = new MemoryStream();
using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
{
zip.Write(buffer, 0, buffer.Length);
}
ms.Position = 0;
MemoryStream outStream = new MemoryStream();
byte[] compressed = new byte[ms.Length];
ms.Read(compressed, 0, compressed.Length);
byte[] gzBuffer = new byte[compressed.Length + 4];
System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
return Convert.ToBase64String(gzBuffer);
}
public static string Decompress(string compressedText)
{
byte[] gzBuffer = Convert.FromBase64String(compressedText);
using (MemoryStream ms = new MemoryStream())
{
int msgLength = BitConverter.ToInt32(gzBuffer, 0);
ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
byte[] buffer = new byte[msgLength];
ms.Position = 0;
using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
{
zip.Read(buffer, 0, buffer.Length);
}
return Encoding.UTF8.GetString(buffer);
}
}
}
The Decompress method is used in the Server side application. I receive an xml file with the string that represent the image compressed with the Compress method and I want to be able to decompress the string I received in javascript within my web app. Is there a way to do that? Are there other solutions? Thank's to everyone!!
The best solution might be to translate the decompression function from C# to Javascript. You could use one that's already available in Javascript such as this one, but you would need to change the source of the image or uncompress-recompress at the server, unless it happens to be compatible with the compression you're using.
Another option would be to convert the image in to .jpg or .png before you use it, again at the server. This would give you more flexibility in the long run, but might put a load on the server depending on traffic and image size.
You can use JSXCompressor library to do decompression (deflate, unzip).
But if your web server support compression at http level I think you can skip compression and decompression.

Gzip uncompress from string error, The magic number in GZip header is not correct

I am trying to replicate the php function gzuncompress in C#
So far I got part of following code working. see comment and code below.
I thing the tricky bit is happening during byte[] and string convertion.
How can I fix this? and where did I missed??
I am using .Net 3.5 environment
var plaintext = Console.ReadLine();
Console.WriteLine("string to byte[] then to string");
byte[] buff = Encoding.UTF8.GetBytes(plaintext);
var compress = GZip.GZipCompress(buff);
//Uncompress working below
try
{
var unpressFromByte = GZip.GZipUncompress(compress);
Console.WriteLine("uncompress successful by uncompress byte[]");
}catch
{
Console.WriteLine("uncompress failed by uncompress byte[]");
}
var compressString = Encoding.UTF8.GetString(compress);
Console.WriteLine(compressString);
var compressBuff = Encoding.UTF8.GetBytes(compressString);
Console.WriteLine(Encoding.UTF8.GetString(compressBuff));
//Uncompress not working below by using string
//The magic number in GZip header is not correct
try
{
var uncompressFromString = GZip.GZipUncompress(compressBuff);
Console.WriteLine("uncompress successful by uncompress string");
}
catch
{
Console.WriteLine("uncompress failed by uncompress string");
}
code for class Gzip
public static class GZip
{
public static byte[] GZipUncompress(byte[] data)
{
using (var input = new MemoryStream(data))
using (var gzip = new GZipStream(input, CompressionMode.Decompress))
using (var output = new MemoryStream())
{
gzip.CopyTo(output);
return output.ToArray();
}
}
public static byte[] GZipCompress(byte[] data)
{
using (var input = new MemoryStream(data))
using (var output = new MemoryStream())
{
using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
{
input.CopyTo(gzip);
}
return output.ToArray();
}
}
public static long CopyTo(this Stream source, Stream destination)
{
var buffer = new byte[2048];
int bytesRead;
long totalBytes = 0;
while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
{
destination.Write(buffer, 0, bytesRead);
totalBytes += bytesRead;
}
return totalBytes;
}
}
This is inappropriate:
var compressString = Encoding.UTF8.GetString(compress);
compress isn't a UTF-8-encoded piece of text. You should treat it as arbitrary binary data - which isn't appropriate to pass into Encoding.GetString. If you really need to convert arbitrary binary data into text, use Convert.ToBase64String (and then reverse with Convert.FromBase64String):
var compressString = Convert.ToBase64String(compress);
Console.WriteLine(compressString);
var compressBuff = Convert.FromBase64String(compressString);
That may or may not match what PHP does, but it's a safe way of representing arbitrary binary data as text, unlike treating the binary data as if it were valid UTF-8-encoded text.
I am trying to replicate the php function gzuncompress in C#
Then use GZipStream or DeflateStream classes which are built into the .NET framework for this purpose.

Create a Stream without having a physical file to create from

I'm needing to create a zip file containing documents that exist on the server. I am using the .Net Package class to do so, and to create a new Package (which is the zip file) I have to have either a path to a physical file or a stream. I am trying to not create an actual file that would be the zip file, instead just create a stream that would exist in memory or something.
My question is how do you instantiate a new Stream (i.e. FileStream, MemoryStream, etc) without having a physical file to instantiate from.
MemoryStream has several constructor overloads, none of which require a file.
There is an example of how to do this on the MSDN page for MemoryStream:
using System;
using System.IO;
using System.Text;
class MemStream
{
static void Main()
{
int count;
byte[] byteArray;
char[] charArray;
UnicodeEncoding uniEncoding = new UnicodeEncoding();
// Create the data to write to the stream.
byte[] firstString = uniEncoding.GetBytes(
"Invalid file path characters are: ");
byte[] secondString = uniEncoding.GetBytes(
Path.GetInvalidPathChars());
using(MemoryStream memStream = new MemoryStream(100))
{
// Write the first string to the stream.
memStream.Write(firstString, 0 , firstString.Length);
// Write the second string to the stream, byte by byte.
count = 0;
while(count < secondString.Length)
{
memStream.WriteByte(secondString[count++]);
}
// Write the stream properties to the console.
Console.WriteLine(
"Capacity = {0}, Length = {1}, Position = {2}\n",
memStream.Capacity.ToString(),
memStream.Length.ToString(),
memStream.Position.ToString());
// Set the position to the beginning of the stream.
memStream.Seek(0, SeekOrigin.Begin);
// Read the first 20 bytes from the stream.
byteArray = new byte[memStream.Length];
count = memStream.Read(byteArray, 0, 20);
// Read the remaining bytes, byte by byte.
while(count < memStream.Length)
{
byteArray[count++] =
Convert.ToByte(memStream.ReadByte());
}
// Decode the byte array into a char array
// and write it to the console.
charArray = new char[uniEncoding.GetCharCount(
byteArray, 0, count)];
uniEncoding.GetDecoder().GetChars(
byteArray, 0, count, charArray, 0);
Console.WriteLine(charArray);
}
}
}
Is this what you are looking for?
You can create a new stream and write to it. You don't need a file to construct the object.
http://msdn.microsoft.com/en-us/library/system.io.memorystream.aspx
Write Method:
http://msdn.microsoft.com/en-us/library/system.io.memorystream.write.aspx
Constructors for Memory Stream:
http://msdn.microsoft.com/en-us/library/system.io.memorystream.memorystream.aspx

Categories