Unzipped data being padded with '\0' when using DotNetZip and MemoryStream - c#

I'm trying to zip and unzip data in memory (so I cannot use the file system), and in my sample below, when the data is unzipped it has padding ('\0' chars) after the end of my original data.
What am I doing wrong?
[Test]
public void Zip_and_Unzip_from_memory_buffer() {
byte[] originalData = Encoding.UTF8.GetBytes("My string");
byte[] zipped;
using (MemoryStream stream = new MemoryStream()) {
using (ZipFile zip = new ZipFile()) {
//zip.CompressionMethod = CompressionMethod.BZip2;
//zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestSpeed;
zip.AddEntry("data", originalData);
zip.Save(stream);
zipped = stream.GetBuffer();
}
}
Assert.AreEqual(256, zipped.Length); // Just to show that the zip has 256 bytes which match with the length unzipped below
byte[] unzippedData;
using (MemoryStream mem = new MemoryStream(zipped)) {
using (ZipFile unzip = ZipFile.Read(mem)) {
//ZipEntry zipEntry = unzip.Entries.FirstOrDefault();
ZipEntry zipEntry = unzip["data"];
using (MemoryStream readStream = new MemoryStream()) {
zipEntry.Extract(readStream);
unzippedData = readStream.GetBuffer();
}
}
}
Assert.AreEqual(256, unzippedData.Length); // WHY my data has trailing '\0' chars like a padding to 256 module ?
Assert.AreEqual(originalData.Length, unzippedData.Length); // FAIL ! The unzipped data has 256 bytes
//Assert.AreEqual(originalData, unzippedData); // FAIL at index 9
}

From MSDN:
"Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method."
So you want to change the line:
zipped = stream.GetBuffer();
to:
zipped = stream.ToArray();
The same applies on the extraction side: change unzippedData = readStream.GetBuffer(); to unzippedData = readStream.ToArray();
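The difference can be seen without DotNetZip at all. The sketch below (the class name GetBufferVsToArray is just for illustration) writes four bytes into a fresh MemoryStream and compares the two calls: ToArray() returns exactly Length bytes, while GetBuffer() returns the whole internal array, unused capacity included.

```csharp
using System;
using System.IO;
using System.Text;

static class GetBufferVsToArray
{
    // Writes "test" into a fresh MemoryStream and reports the length of
    // GetBuffer() (the whole internal array), ToArray() (only the written
    // bytes) and the stream's own Length.
    public static (int bufferLength, int arrayLength, long streamLength) Measure()
    {
        using (var stream = new MemoryStream())
        {
            byte[] data = Encoding.UTF8.GetBytes("test");
            stream.Write(data, 0, data.Length);

            byte[] whole = stream.GetBuffer(); // internal buffer, may contain unused capacity
            byte[] exact = stream.ToArray();   // copy of bytes [0, Length) only
            return (whole.Length, exact.Length, stream.Length);
        }
    }
}
```

Per the MSDN note quoted above, bufferLength is typically 256 here, while arrayLength and streamLength are both 4.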

I suspect the problem comes from MemoryStream.GetBuffer():
http://msdn.microsoft.com/en-us/library/system.io.memorystream.getbuffer.aspx
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
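If the copy made by ToArray matters, MemoryStream.TryGetBuffer (available since .NET Framework 4.6 and in .NET Core) returns an ArraySegment<byte> covering only the written bytes, without copying. It returns false for streams whose buffer is not exposable, such as those created with new MemoryStream(byte[]). The sketch below (WrittenBytes is a hypothetical helper name) falls back to ToArray in that case.

```csharp
using System;
using System.IO;

static class NoCopyView
{
    // Returns a view over only the written bytes of the stream. No copy is made
    // when the internal buffer is exposable; otherwise it falls back to ToArray().
    public static ArraySegment<byte> WrittenBytes(MemoryStream stream)
    {
        if (stream.TryGetBuffer(out ArraySegment<byte> segment))
        {
            // segment.Count equals the number of bytes written; unused capacity is excluded.
            return segment;
        }

        // e.g. new MemoryStream(byte[]) does not expose its buffer, so copy instead.
        return new ArraySegment<byte>(stream.ToArray());
    }
}
```

Callers must then respect segment.Offset and segment.Count rather than using segment.Array.Length.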

Related

How to write a sequence of bytes from a file to a byte array without padding the array with null bytes?

I have
[13,132,32,75,22,61,50,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
I want
[13,132,32,75,22,61,50]
I have a byte array of size 1048576 that I have written to using a file stream. From a particular index to the end of the array, all bytes are null. There might be 100000 bytes with values and 948576 null bytes at the end of the array. When I don't know the size of a file, how do I efficiently create a new array of size 100000 (i.e. the total number of bytes in the unknown file) and write all bytes from that file into it?
byte[] buffer = new byte[0x100000];
int numRead = await fileStream.ReadAsync(buffer, 0, buffer.Length); // byte array is padded with null bytes at the end
You're stating in the comments that you're just decoding the byte array into a string, so why not read the file contents as a string, such as:
var contents = File.ReadAllText(filePath, Encoding.UTF8);
// contents holds all the text in the file at filePath and no more
or if you want to use a stream:
using (var sr = new StreamReader(path))
{
// Read one character at a time:
var c = sr.Read();
// Read one line at a time:
var line = sr.ReadLine();
// Read the whole file
var contents = sr.ReadToEnd();
}
If, however, you insist on going through a buffer, you cannot avoid part of the buffer being empty (holding null bytes) when you reach the end of the file, but that's where the return value of ReadAsync saves the day:
byte[] buffer = new byte[0x100000];
int numRead = await fileStream.ReadAsync(buffer, 0, buffer.Length);
var sectionToDecode = new byte[numRead];
Array.Copy(buffer, 0, sectionToDecode, 0, numRead);
// Now sectionToDecode has all the bytes that were actually read from the file
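One caveat: a single ReadAsync call may return fewer bytes than requested even before the end of the file, so one call is not guaranteed to read the whole file. A sketch that keeps reading until ReadAsync returns 0, collecting only the valid numRead bytes of each chunk (ReadAllAsync is a hypothetical helper name):

```csharp
using System.IO;
using System.Threading.Tasks;

static class ReadWholeStream
{
    // Reads a stream to its end in fixed-size chunks. ReadAsync returns 0 only
    // at the end of the stream, so we loop until then.
    public static async Task<byte[]> ReadAllAsync(Stream source, int chunkSize = 0x100000)
    {
        byte[] buffer = new byte[chunkSize];
        using (var collected = new MemoryStream())
        {
            int numRead;
            while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first numRead bytes of buffer are valid on this iteration.
                collected.Write(buffer, 0, numRead);
            }
            return collected.ToArray(); // exact length, no trailing null bytes
        }
    }
}
```

For a plain file, File.ReadAllBytes(path) or stream.CopyToAsync(memoryStream) followed by ToArray() achieve the same with less code.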

How to compress data in C# to be decompressed in zlib python

I have a Python zlib decompressor that uses the default parameters, as follows, where data is a string:
import zlib
data_decompressed = zlib.decompress(data)
But I don't know how to compress a string in C# so that it can be decompressed in Python. I've tried the following code, but when I try to decompress the result, an 'incorrect header check' exception is thrown.
static byte[] ZipContent(string entryName)
{
// remove whitespace from xml and convert to byte array
byte[] normalBytes;
using (StringWriter writer = new StringWriter())
{
//xml.Save(writer, SaveOptions.DisableFormatting);
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
normalBytes = encoding.GetBytes(writer.ToString());
}
// zip into new, zipped, byte array
using (Stream memOutput = new MemoryStream())
using (ZipOutputStream zipOutput = new ZipOutputStream(memOutput))
{
zipOutput.SetLevel(6);
ZipEntry entry = new ZipEntry(entryName);
entry.CompressionMethod = CompressionMethod.Deflated;
entry.DateTime = DateTime.Now;
zipOutput.PutNextEntry(entry);
zipOutput.Write(normalBytes, 0, normalBytes.Length);
zipOutput.Finish();
byte[] newBytes = new byte[memOutput.Length];
memOutput.Seek(0, SeekOrigin.Begin);
memOutput.Read(newBytes, 0, newBytes.Length);
zipOutput.Close();
return newBytes;
}
}
Could anyone help me, please?
Thank you.
UPDATE 1:
I've tried the Deflate function that Shiraz Bhaiji posted:
public static byte[] Deflate(byte[] data)
{
if (null == data || data.Length < 1) return null;
byte[] compressedBytes;
//write into a new memory stream wrapped by a deflate stream
using (MemoryStream ms = new MemoryStream())
{
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
//write byte buffer into memorystream
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
//rewind the memory stream and copy its contents into a byte array
compressedBytes = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Read(compressedBytes, 0, (int)ms.Length);
}
}
return compressedBytes;
}
The problem is that, for this to work in Python, I have to pass the -zlib.MAX_WBITS argument to decompress, as follows:
data_decompressed = zlib.decompress(data, -zlib.MAX_WBITS)
So, my new question is: is it possible to code a deflate method in C# which compression result could be decompressed with zlib.decompress(data) as defaults?
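In C#, the DeflateStream class uses the deflate algorithm (the same one zlib uses), but it writes raw deflate data without the zlib header and trailer. See: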
https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.deflatestream?view=netframework-4.8
As you described with your edit, zlib.decompress(data, -zlib.MAX_WBITS) is the correct way to decompress data from C#'s DeflateStream. There are two formats at play here:
deflate - as in specification RFC 1951 - this is what C# produces
zlib - as in specification RFC 1950 - this is what Python expects by default
What is the difference between the two? It's small, really:
zlib = [CMF byte (compression method and info)] + [FLG byte (flags)] + deflate data + [Adler-32 checksum, 4 bytes]
(there are also optional dictionary bytes but we don't have to worry about them)
Therefore, to get the zlib format from deflate, we need to prepend the two header bytes and append the Adler-32 checksum of the uncompressed data. Luckily there is already a Stack Overflow answer covering the header bytes (see "What does a zlib header look like?"), and implementing Adler-32 is not that hard. So, given your MemoryStream ms, we first write the two header bytes:
ms.Write(new byte[] { 0x78, 0x9C }, 0, 2);
...then we do exactly what your update does:
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
}
and finally compute the Adler-32 checksum of the uncompressed data and append it to the end of the stream:
uint a = 1; // Adler-32 starts with a = 1 and b = 0 (RFC 1950)
uint b = 0;
for(int i = 0; i < data.Length; ++i)
{
a = (a + data[i]) % 65521;
b = (b + a) % 65521;
}
Sadly, I don't know a pretty way of writing uints into the stream. This is an ugly way:
ms.Write(new byte[] { (byte)(b>>8),
(byte)b,
(byte)(a>>8),
(byte)a
}, 0, 4);
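Putting the three steps together, here is a sketch of a complete helper (ZlibWrapper is a hypothetical name). It writes the same 0x78 0x9C header, the raw DeflateStream output, and the big-endian Adler-32 of the uncompressed input. It assumes .NET Core 2.1+ (or the System.Memory package) for BinaryPrimitives, which is one tidier way of writing the uint than shifting by hand. On .NET 6 and later, System.IO.Compression.ZLibStream writes the zlib header and trailer itself, so none of this is needed there.

```csharp
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;

static class ZlibWrapper
{
    // Adler-32 as defined in RFC 1950: a starts at 1, b at 0, both modulo 65521.
    public static uint Adler32(byte[] data)
    {
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }

    // Wraps raw DeflateStream output in a zlib container:
    // 2-byte header + deflate data + big-endian Adler-32 of the uncompressed input.
    public static byte[] Compress(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            ms.WriteByte(0x78); // CMF: deflate method, 32 KiB window
            ms.WriteByte(0x9C); // FLG: chosen so that (0x78 * 256 + 0x9C) % 31 == 0

            // leaveOpen: true so ms is still usable after the DeflateStream is disposed;
            // disposing it flushes the final deflate block.
            using (var deflate = new DeflateStream(ms, CompressionMode.Compress, leaveOpen: true))
            {
                deflate.Write(data, 0, data.Length);
            }

            byte[] trailer = new byte[4];
            BinaryPrimitives.WriteUInt32BigEndian(trailer, Adler32(data));
            ms.Write(trailer, 0, trailer.Length);
            return ms.ToArray();
        }
    }
}
```

The result of ZlibWrapper.Compress(bytes) should then be accepted by zlib.decompress(data) in Python with its default arguments.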

Append byte[] to MemoryStream

I am trying to read the byte[] for each file and append it to a MemoryStream. The code below throws an error. What am I missing when appending?
byte[] ba = null;
List<string> fileNames = new List<string>();
int startPosition = 0;
using (MemoryStream allFrameStream = new MemoryStream())
{
foreach (string jpegFileName in fileNames)
{
ba = GetFileAsPDF(jpegFileName);
allFrameStream.Write(ba, startPosition, ba.Length); //Error here
startPosition = ba.Length - 1;
}
allFrameStream.Position = 0;
ba = allFrameStream.GetBuffer();
Response.ClearContent();
Response.AppendHeader("content-length", ba.Length.ToString());
Response.ContentType = "application/pdf";
Response.BinaryWrite(ba);
Response.End();
Response.Close();
}
Error:
Offset and length were out of bounds for the array or count is greater
than the number of elements from index to the end of the source
collection
startPosition is not an offset into the MemoryStream; it is an offset into ba. Change the call to:
allFrameStream.Write(ba, 0, ba.Length);
All byte arrays will then be appended to allFrameStream.
BTW: don't use ba = allFrameStream.GetBuffer(); use ba = allFrameStream.ToArray(); instead (you don't want the MemoryStream's internal buffer, which may contain unused bytes).
The MSDN documentation on Stream.Write might help clarify the problem.
Streams are modelled as a continuous sequence of bytes. Reading or writing to a stream moves your position in the stream by the number of bytes read or written.
The second argument to Write is the index in the source array at which to start copying bytes from. In your case this is 0, since you want to read from the start of the array.
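A sketch of the corrected loop, with the question's GetFileAsPDF calls replaced by a plain sequence of byte arrays (Concatenate is a hypothetical helper name):

```csharp
using System.Collections.Generic;
using System.IO;

static class AppendBytes
{
    // Appends each array in order. The offset argument of Write refers to the
    // source array, so it is 0 to copy each array from its start; the stream's
    // own Position advances automatically after every Write.
    public static byte[] Concatenate(IEnumerable<byte[]> parts)
    {
        using (var all = new MemoryStream())
        {
            foreach (byte[] part in parts)
            {
                all.Write(part, 0, part.Length);
            }
            return all.ToArray(); // not GetBuffer(), which may include unused capacity
        }
    }
}
```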
This may not be the best solution, but it is simple:
List<byte> list = new List<byte>();
list.AddRange(Encoding.UTF8.GetBytes("aaaaaaaaaaaaa"));
list.AddRange(Encoding.UTF8.GetBytes("bbbbbbbbbbbbbbbbbb"));
list.AddRange(Encoding.UTF8.GetBytes("cccccccc"));
byte[] c = list.ToArray();

How to determine size of string, and compress it

I'm currently developing an application in C# that uses Amazon SQS.
The size limit for a message is 8kb.
I have a method that is something like:
public void QueueMessage(string message)
Within this method, I'd like to first compress the message (most messages are passed in as JSON, so they are already fairly small).
If the compressed string is still larger than 8kb, I'll store it in S3.
My question is:
How can I easily test the size of a string, and what's the best way to compress it?
I'm not looking for massive reductions in size, just something nice and easy - and easy to decompress the other end.
To know the "size" (in KB) of a string, we need to know the encoding. If we assume UTF-8, it is as follows (not including a BOM etc.; swap the encoding if it isn't UTF-8):
int len = Encoding.UTF8.GetByteCount(longString);
As for packing it, I would suggest GZip over the UTF-8 bytes, optionally followed by base-64 if it has to be a string:
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress, true))
{
byte[] raw = Encoding.UTF8.GetBytes(longString);
gzip.Write(raw, 0, raw.Length);
gzip.Close();
}
byte[] zipped = ms.ToArray(); // as a BLOB
string base64 = Convert.ToBase64String(zipped); // as a string
// store zipped or base64
}
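Combining the size check and the compression into the shape the question describes might look like the sketch below. MessagePacker and MaxMessageBytes are hypothetical names; the 8 KB value is taken from the question and is an assumption about the queue's limit, so adjust it to whatever your queue actually allows. The S3 fallback is left to the caller.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class MessagePacker
{
    // Hypothetical limit taken from the question (8 KB).
    public const int MaxMessageBytes = 8 * 1024;

    // UTF-8 byte count of the string (GetByteCount does not include a BOM).
    public static int Utf8Size(string message) => Encoding.UTF8.GetByteCount(message);

    // GZip-compresses the UTF-8 bytes and base-64 encodes them for a text-only channel.
    public static string CompressToBase64(string message)
    {
        byte[] raw = Encoding.UTF8.GetBytes(message);
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the final block and the gzip trailer
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // True when the encoded payload fits; otherwise the caller would store it elsewhere (e.g. S3).
    public static bool FitsInQueue(string encoded) =>
        Encoding.UTF8.GetByteCount(encoded) <= MaxMessageBytes;
}
```

Note that base-64 inflates the compressed bytes by roughly a third, which is why the check is made on the encoded string rather than on the raw compressed bytes.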
Pass the zipped bytes to this function. The best I could come up with was:
public static byte[] ZipToUnzipBytes(byte[] bytesContext)
{
byte[] arrUnZipFile = null;
if (bytesContext.Length > 100)
{
using (var inFile = new MemoryStream(bytesContext))
{
using (var decompress = new GZipStream(inFile, CompressionMode.Decompress, false))
{
byte[] bufferWrite = new byte[4];
inFile.Position = (int)inFile.Length - 4;
inFile.Read(bufferWrite, 0, 4);
inFile.Position = 0;
arrUnZipFile = new byte[BitConverter.ToInt32(bufferWrite, 0) + 100];
decompress.Read(arrUnZipFile, 0, arrUnZipFile.Length);
}
}
}
return arrUnZipFile;
}
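The function above sizes its output from the gzip trailer plus 100 extra bytes and issues a single Read, so the result ends with unused zero bytes (the same padding problem as in the first question), a single Read is not guaranteed to fill the array, and inputs of 100 bytes or fewer return null. A sketch of an alternative that avoids guessing the size (GZipUnpacker is a hypothetical name):

```csharp
using System.IO;
using System.IO.Compression;

static class GZipUnpacker
{
    // Decompresses a complete gzip payload. CopyTo keeps reading until the
    // GZipStream reports the end of the data, so the result has exactly the
    // decompressed length: no size taken from the trailer and no padding.
    public static byte[] Decompress(byte[] zipped)
    {
        using (var input = new MemoryStream(zipped))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

To get the original string back, decode the result with Encoding.UTF8.GetString, after Convert.FromBase64String first if the message was sent as base-64.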

Create a Stream without having a physical file to create from

I need to create a zip file containing documents that exist on the server. I am using the .NET Package class to do so, and to create a new Package (which is the zip file) I need either a path to a physical file or a stream. I am trying not to create an actual file for the zip; instead I'd like to create a stream that exists only in memory.
My question is how do you instantiate a new Stream (i.e. FileStream, MemoryStream, etc) without having a physical file to instantiate from.
MemoryStream has several constructor overloads, none of which require a file.
There is an example of how to do this on the MSDN page for MemoryStream:
using System;
using System.IO;
using System.Text;
class MemStream
{
static void Main()
{
int count;
byte[] byteArray;
char[] charArray;
UnicodeEncoding uniEncoding = new UnicodeEncoding();
// Create the data to write to the stream.
byte[] firstString = uniEncoding.GetBytes(
"Invalid file path characters are: ");
byte[] secondString = uniEncoding.GetBytes(
Path.GetInvalidPathChars());
using(MemoryStream memStream = new MemoryStream(100))
{
// Write the first string to the stream.
memStream.Write(firstString, 0 , firstString.Length);
// Write the second string to the stream, byte by byte.
count = 0;
while(count < secondString.Length)
{
memStream.WriteByte(secondString[count++]);
}
// Write the stream properties to the console.
Console.WriteLine(
"Capacity = {0}, Length = {1}, Position = {2}\n",
memStream.Capacity.ToString(),
memStream.Length.ToString(),
memStream.Position.ToString());
// Set the position to the beginning of the stream.
memStream.Seek(0, SeekOrigin.Begin);
// Read the first 20 bytes from the stream.
byteArray = new byte[memStream.Length];
count = memStream.Read(byteArray, 0, 20);
// Read the remaining bytes, byte by byte.
while(count < memStream.Length)
{
byteArray[count++] =
Convert.ToByte(memStream.ReadByte());
}
// Decode the byte array into a char array
// and write it to the console.
charArray = new char[uniEncoding.GetCharCount(
byteArray, 0, count)];
uniEncoding.GetDecoder().GetChars(
byteArray, 0, count, charArray, 0);
Console.WriteLine(charArray);
}
}
}
Is this what you are looking for?
You can create a new stream and write to it. You don't need a file to construct the object.
http://msdn.microsoft.com/en-us/library/system.io.memorystream.aspx
Write Method:
http://msdn.microsoft.com/en-us/library/system.io.memorystream.write.aspx
Constructors for Memory Stream:
http://msdn.microsoft.com/en-us/library/system.io.memorystream.memorystream.aspx
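As an illustration of writing a zip into a MemoryStream, here is a sketch that swaps in System.IO.Compression.ZipArchive (available since .NET Framework 4.5 and in .NET Core) instead of the System.IO.Packaging.Package class from the question; Package.Open also accepts a Stream, but it is not used here. InMemoryZip.Create is a hypothetical helper that takes entry names mapped to their contents.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class InMemoryZip
{
    // Builds a zip archive entirely in memory from (entryName, content) pairs.
    public static byte[] Create(IDictionary<string, byte[]> files)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps buffer usable after the archive is disposed;
            // disposing the archive is what writes the zip central directory.
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (KeyValuePair<string, byte[]> file in files)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(file.Key);
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.Write(file.Value, 0, file.Value.Length);
                    }
                }
            }
            return buffer.ToArray(); // ToArray, not GetBuffer, to avoid trailing unused bytes
        }
    }
}
```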
