Encode and Decode Byte Array using Lz4net - c#

Is this the proper way to Encode and Decode a byte array using Lz4net?
byte[] filedata = File.ReadAllBytes(@"C:\Test.txt");
byte[] encodedfileData = LZ4.LZ4Codec.Encode(filedata, 0, filedata.Length);
byte[] decodedfileData = LZ4.LZ4Codec.Decode(encodedfileData, 0, encodedfileData.Length, 0);
decodedfileData returns 0 bytes
I have looked through the LZ4 GitHub repository, but I cannot tell what is wrong. What is the proper way to encode and decode a byte array using LZ4?

You can try this:
byte[] filedata = File.ReadAllBytes(@"C:\Test.txt");
byte[] compressed = LZ4.LZ4Codec.Wrap(filedata);
byte[] uncompressed = LZ4.LZ4Codec.Unwrap(compressed);

You need to pass the unpacked size (filedata.Length) as the last parameter. With 0 there, as in your code, Decode returns 0 bytes:
byte[] decodedfileData = LZ4.LZ4Codec.Decode(
encodedfileData,
0,
encodedfileData.Length,
filedata.Length);

Related

How to write a sequence of bytes from a file to a byte array without padding the array with null bytes?

I have
[13,132,32,75,22,61,50,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
I want
[13,132,32,75,22,61,50]
I have an array of bytes size 1048576 that I have written to using a file stream. Starting at a particular index in this array until the end of the array are all null bytes. There might be 100000 bytes with values and 948576 null bytes at the end of the array. When I don't know the size of a file how do I efficiently create a new array of size 100000 (i.e. same as total bytes in unknown file) and write all bytes from that file to the byte array?
byte[] buffer = new byte[0x100000];
int numRead = await fileStream.ReadAsync(buffer, 0, buffer.Length); // byte array is padded with null bytes at the end
You're stating in the comments that you're just decoding the byte array into a string, so why not read the file contents as a string, such as:
var contents = File.ReadAllText(filePath, Encoding.UTF8);
// contents holds all the text in the file at filePath and no more
or if you want to use a stream:
using (var sr = new StreamReader(path))
{
// Read one character at a time:
var c = sr.Read();
// Read one line at a time:
var line = sr.ReadLine();
// Read the whole file
var contents = sr.ReadToEnd();
}
If, however, you insist on going through a buffer, you cannot avoid part of the buffer being unused (still holding null bytes) when you reach the end of the file. That is where the return value of ReadAsync saves the day:
byte[] buffer = new byte[0x100000];
int numRead = await fileStream.ReadAsync(buffer, 0, buffer.Length);
var sectionToDecode = new byte[numRead];
Array.Copy(buffer, 0, sectionToDecode, 0, numRead);
// Now sectionToDecode has all the bytes that were actually read from the file
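A single ReadAsync call is not guaranteed to return the whole file, so when the size is unknown it may be safer to loop until ReadAsync returns 0. Here is a minimal sketch of that idea; the class name StreamBytes, the method name ReadAllAsync and the default chunk size are my own choices for illustration, not something from the question:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamBytes
{
    // Hypothetical helper: reads every remaining byte of 'source' and returns
    // an array sized to exactly the number of bytes that were read.
    public static async Task<byte[]> ReadAllAsync(Stream source, int chunkSize = 81920)
    {
        byte[] chunk = new byte[chunkSize];
        using (var collected = new MemoryStream())
        {
            int numRead;
            // ReadAsync may return fewer bytes than requested; it returns 0 only at end of stream.
            while ((numRead = await source.ReadAsync(chunk, 0, chunk.Length)) > 0)
            {
                // Copy only the bytes that were actually read, never the unused tail of the chunk.
                collected.Write(chunk, 0, numRead);
            }
            // ToArray returns a copy trimmed to the data length, so there is no null padding.
            return collected.ToArray();
        }
    }
}
```

Usage would be something like byte[] data = await StreamBytes.ReadAllAsync(fileStream); the result has exactly as many elements as the file has bytes.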

Gzipstream header and suffix

How do I find the size of a file compressed with GZipStream? I know the output has a header and a suffix: the first 10 bytes are the header and the last 8 bytes are the suffix. How do I read the file size from the suffix?
Something a bit better written:
public int GetUncompressedSize(string fileName)
{
    using (BinaryReader br = new BinaryReader(File.OpenRead(fileName)))
    {
        // The last 4 bytes of a gzip file hold the uncompressed size modulo 2^32
        br.BaseStream.Seek(-4, SeekOrigin.End);
        return br.ReadInt32();
    }
}
I see that you down voted my previous answer most likely because it was an example using Java. The principle is still the same, so the answer to your question would be that the last 4 bytes contain the information you require. Hopefully this answer is more what you are after.
Here is a C# Decompress function that reads the uncompressed length from the last 4 bytes of the GZipStream data and then decompresses it:
static public byte[] Decompress(byte[] b)
{
    MemoryStream ms = new MemoryStream(b.Length);
    ms.Write(b, 0, b.Length);
    // Last 4 bytes of GZipStream output = length of decompressed data
    ms.Seek(-4, SeekOrigin.Current);
    byte[] lb = new byte[4];
    ms.Read(lb, 0, 4);
    int len = BitConverter.ToInt32(lb, 0);
    ms.Seek(0, SeekOrigin.Begin);
    byte[] ob = new byte[len];
    using (GZipStream zs = new GZipStream(ms, CompressionMode.Decompress))
    {
        // Read may return fewer bytes than requested, so loop until the buffer is full
        int total = 0, n;
        while (total < len && (n = zs.Read(ob, total, len - total)) > 0)
            total += n;
    }
    return ob;
}
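To check the "last 4 bytes" claim yourself, here is a small sketch that compresses a byte array in memory and reads the trailer back. The class and method names (GzipTrailer, Compress, ReadUncompressedSize) are made up for this example. The gzip format stores that field little-endian and modulo 2^32, so it is not reliable for data of 4 GB or more:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipTrailer
{
    // Compresses 'data' into an in-memory gzip stream and returns the compressed bytes.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the data and writes the trailer
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Reads the 4-byte size field at the end of gzip data (uncompressed size modulo 2^32).
    public static uint ReadUncompressedSize(byte[] gzipData)
    {
        // 10-byte header plus 8-byte trailer is the smallest possible gzip member.
        if (gzipData == null || gzipData.Length < 18)
            throw new ArgumentException("Too short to be gzip data.", nameof(gzipData));

        int o = gzipData.Length - 4;
        // Assemble the little-endian value by hand so the result does not depend on CPU byte order.
        return (uint)gzipData[o]
             | ((uint)gzipData[o + 1] << 8)
             | ((uint)gzipData[o + 2] << 16)
             | ((uint)gzipData[o + 3] << 24);
    }
}
```

Note that this value is the size of the original (uncompressed) data. The compressed size is simply the length of the file or array itself.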

Append byte[] to MemoryStream

I am trying to read the byte[] for each file and add it to a MemoryStream. The code below throws an error. What am I missing when appending?
byte[] ba = null;
List<string> fileNames = new List<string>();
int startPosition = 0;
using (MemoryStream allFrameStream = new MemoryStream())
{
foreach (string jpegFileName in fileNames)
{
ba = GetFileAsPDF(jpegFileName);
allFrameStream.Write(ba, startPosition, ba.Length); //Error here
startPosition = ba.Length - 1;
}
allFrameStream.Position = 0;
ba = allFrameStream.GetBuffer();
Response.ClearContent();
Response.AppendHeader("content-length", ba.Length.ToString());
Response.ContentType = "application/pdf";
Response.BinaryWrite(ba);
Response.End();
Response.Close();
}
Error:
Offset and length were out of bounds for the array or count is greater
than the number of elements from index to the end of the source
collection
startPosition is an offset into ba, not into the MemoryStream. Change the call to:
allFrameStream.Write(ba, 0, ba.Length);
All byte arrays will then be appended to allFrameStream.
BTW: Don't use ba = allFrameStream.GetBuffer(); use ba = allFrameStream.ToArray(); instead (you don't actually want the internal buffer of the MemoryStream).
The MSDN documentation on Stream.Write might help clarify the problem.
Streams are modelled as a continuous sequence of bytes. Reading or writing to a stream moves your position in the stream by the number of bytes read or written.
The second argument to Write is the index in the source array at which to start copying bytes from. In your case this is 0, since you want to read from the start of the array.
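Putting the two points together (source offset 0 for every Write, and ToArray instead of GetBuffer), a minimal sketch could look like this. ByteAppender and Concat are names I made up for the example:

```csharp
using System.Collections.Generic;
using System.IO;

public static class ByteAppender
{
    // Appends each array to a MemoryStream and returns the combined bytes.
    public static byte[] Concat(IEnumerable<byte[]> parts)
    {
        using (var all = new MemoryStream())
        {
            foreach (byte[] part in parts)
            {
                // The offset (0) refers to 'part', the source array.
                // The stream keeps track of its own write position.
                all.Write(part, 0, part.Length);
            }
            // ToArray copies only the bytes written, not the unused internal capacity.
            return all.ToArray();
        }
    }
}
```

There is no need to track a startPosition at all, because each Write advances the stream position by the number of bytes written.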
Maybe this is a simpler solution; it is not the best, but it is easy:
List<byte> list = new List<byte>();
list.AddRange(Encoding.UTF8.GetBytes("aaaaaaaaaaaaa"));
list.AddRange(Encoding.UTF8.GetBytes("bbbbbbbbbbbbbbbbbb"));
list.AddRange(Encoding.UTF8.GetBytes("cccccccc"));
byte[] c = list.ToArray();

Unzipped data being padded with '\0' when using DotNetZip and MemoryStream

I'm trying to zip and unzip data in memory (so, I cannot use FileSystem), and in my sample below when the data is unzipped it has a kind of padding ('\0' chars) at the end of my original data.
What am I doing wrong ?
[Test]
public void Zip_and_Unzip_from_memory_buffer() {
byte[] originalData = Encoding.UTF8.GetBytes("My string");
byte[] zipped;
using (MemoryStream stream = new MemoryStream()) {
using (ZipFile zip = new ZipFile()) {
//zip.CompressionMethod = CompressionMethod.BZip2;
//zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestSpeed;
zip.AddEntry("data", originalData);
zip.Save(stream);
zipped = stream.GetBuffer();
}
}
Assert.AreEqual(256, zipped.Length); // Just to show that the zip has 256 bytes which match with the length unzipped below
byte[] unzippedData;
using (MemoryStream mem = new MemoryStream(zipped)) {
using (ZipFile unzip = ZipFile.Read(mem)) {
//ZipEntry zipEntry = unzip.Entries.FirstOrDefault();
ZipEntry zipEntry = unzip["data"];
using (MemoryStream readStream = new MemoryStream()) {
zipEntry.Extract(readStream);
unzippedData = readStream.GetBuffer();
}
}
}
Assert.AreEqual(256, unzippedData.Length); // WHY my data has trailing '\0' chars like a padding to 256 module ?
Assert.AreEqual(originalData.Length, unzippedData.Length); // FAIL ! The unzipped data has 256 bytes
//Assert.AreEqual(originalData, unzippedData); // FAIL at index 9
}
From MSDN:
"Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory."
So you actually want to change the line:
zipped = stream.GetBuffer();
to:
zipped = stream.ToArray();
The same applies to unzippedData = readStream.GetBuffer(); further down in the test.
I suspect it is from 'MemoryStream.GetBuffer()'
http://msdn.microsoft.com/en-us/library/system.io.memorystream.getbuffer.aspx
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
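A quick way to see the difference is to write a few bytes to a MemoryStream and compare what each call returns. This is only a sketch with made-up names (BufferDemo and its methods). The third method shows an alternative if you want to keep using GetBuffer: copy just the first stream.Length bytes.

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Returns the internal buffer, which may be larger than the data written.
    public static byte[] WholeBuffer(byte[] data)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(data, 0, data.Length);
            return stream.GetBuffer();
        }
    }

    // Returns a copy containing only the bytes that were written.
    public static byte[] DataOnly(byte[] data)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(data, 0, data.Length);
            return stream.ToArray();
        }
    }

    // Uses GetBuffer but trims the result to the stream's logical length.
    public static byte[] TrimmedBuffer(byte[] data)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(data, 0, data.Length);
            byte[] trimmed = new byte[stream.Length];
            Array.Copy(stream.GetBuffer(), 0, trimmed, 0, trimmed.Length);
            return trimmed;
        }
    }
}
```

For the 9 bytes of "My string", DataOnly and TrimmedBuffer both return 9 bytes, while WholeBuffer returns the full allocated buffer with trailing zero bytes.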

Convert a string's character encoding from windows-1252 to utf-8

I had converted a Word Document(docx) to html, the converted html has windows-1252 as its character encoding. In .Net for this 1252 character encoding all the special characters are being displayed as '�'. This html is being displayed in a Rad Editor which displays correctly if the html is in Utf-8 format.
I tried the following code, but in vain:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string utf8String = new string(utf8Chars);
Any suggestions on how to convert the html into UTF-8?
This should do it:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
Actually the problem lies here
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
We should not get the bytes from the html String. I tried the below code and it worked.
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
public static byte[] ReadFile(string filePath)
{
byte[] buffer;
FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
try
{
int length = (int)fileStream.Length; // get file length
buffer = new byte[length]; // create buffer
int count; // actual number of bytes read
int sum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
sum += count; // sum is a buffer offset for next reading
}
finally
{
fileStream.Close();
}
return buffer;
}
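Since a .NET string is always UTF-16 internally, the key step is decoding the raw file bytes with the right source encoding; Encoding.Convert is then optional. Here is a minimal sketch of that idea. The names Win1252, Decode and ToUtf8 are mine, and I am assuming a modern .NET runtime where code page 1252 has to be enabled through CodePagesEncodingProvider (on the classic .NET Framework that registration line should not be needed):

```csharp
using System.Text;

public static class Win1252
{
    // Decodes raw windows-1252 bytes into a .NET string.
    public static string Decode(byte[] windows1252Bytes)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(windows1252Bytes);
    }

    // Re-encodes windows-1252 bytes as UTF-8 bytes, e.g. for writing back to a file.
    public static byte[] ToUtf8(byte[] windows1252Bytes)
    {
        return Encoding.UTF8.GetBytes(Decode(windows1252Bytes));
    }
}
```

For example, the windows-1252 bytes 0x93 and 0x94 (curly double quotes) decode to U+201C and U+201D, each of which takes 3 bytes in UTF-8.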
How are you planning to use the resulting HTML? In my opinion, the most appropriate way to solve your problem would be to add a meta tag that specifies the encoding. Something like:
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
Use Encoding.Convert method. Details are in the Encoding.Convert method MSDN article.
