ushort array compression in C#

I've got a ushort array (actually an image coming from a camera) that I'd like to compress losslessly before persistent storage. I'm using the GZipStream class provided in System.IO.Compression. This approach, to my knowledge, requires that I convert the ushort array to a byte array. My solution appears to function properly, but it just isn't as quick as I'd like. The images are about 2 MB in raw size; on my slow machine compression takes 200 - 400 ms and decompression 100 - 200 ms. I'm looking for suggestions for improving performance.
public static class Zip
{
    public static ushort[] Decompress_ByteToShort(byte[] zippedData)
    {
        byte[] decompressedData = null;
        using (MemoryStream outputStream = new MemoryStream())
        {
            using (MemoryStream inputStream = new MemoryStream(zippedData))
            {
                using (GZipStream zip = new GZipStream(inputStream, CompressionMode.Decompress))
                {
                    zip.CopyTo(outputStream);
                }
            }
            decompressedData = outputStream.ToArray();
        }

        ushort[] decompressedShort = new ushort[decompressedData.Length / sizeof(ushort)];
        Buffer.BlockCopy(decompressedData, 0, decompressedShort, 0, decompressedData.Length);
        return decompressedShort;
    }

    public static byte[] Compress_ShortToByte(ushort[] plainData)
    {
        byte[] compressedData = null;
        byte[] uncompressedData = new byte[plainData.Length * sizeof(ushort)];
        Buffer.BlockCopy(plainData, 0, uncompressedData, 0, plainData.Length * sizeof(ushort));

        using (MemoryStream outputStream = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(outputStream, CompressionMode.Compress))
            {
                zip.Write(uncompressedData, 0, uncompressedData.Length);
            }
            // Don't read the MemoryStream data before the GZipStream is closed:
            // it doesn't yet contain complete compressed data.
            // GZipStream writes additional data, including footer information, when it is disposed.
            compressedData = outputStream.ToArray();
        }
        return compressedData;
    }
}
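For reference, a minimal timing harness along these lines (not from the original post; the image size and pixel contents are made up) can reproduce the kind of measurements described above:

// Hypothetical timing sketch for the Zip class above.
using System;
using System.Diagnostics;

class ZipBenchmark
{
    static void Main()
    {
        // Fake 2 MB "image": 1024 x 1024 pixels of 16-bit data.
        var image = new ushort[1024 * 1024];
        var rng = new Random(42);
        for (int i = 0; i < image.Length; i++)
            image[i] = (ushort)rng.Next(0, 4096); // e.g. 12-bit camera data

        var sw = Stopwatch.StartNew();
        byte[] compressed = Zip.Compress_ShortToByte(image);
        sw.Stop();
        Console.WriteLine($"Compress:   {sw.ElapsedMilliseconds} ms, {compressed.Length} bytes");

        sw.Restart();
        ushort[] roundTrip = Zip.Decompress_ByteToShort(compressed);
        sw.Stop();
        Console.WriteLine($"Decompress: {sw.ElapsedMilliseconds} ms, {roundTrip.Length} pixels");
    }
}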

The first problem I see in your approach is that you are using byte arrays instead of reading from and writing to files directly.
Using a smaller temporary buffer and reading/writing to streams and files directly, in chunks, should be much faster.
Here I propose some functions and overloads you can use to decompress from byte arrays, to byte arrays, from streams, to streams, from files and to files.
The performance improvement should be from 10% to 20%.
Try to adjust the constants as needed.
I used DeflateStream instead of GZipStream; this increases performance a bit.
You can go back to GZipStream if you prefer.
I tried just the byte[]-to-ushort[] and ushort[]-to-byte[] versions of the code, and they are about 10% faster.
Accessing files directly instead of loading them into a big buffer should increase performance even more.
WARNING: Reading and writing images this way is not endianness-agnostic: a file saved on a little-endian (Intel/AMD) machine would not be readable on a big-endian machine. Just as a side note :)
/// <summary>The average file size, used to preallocate the right amount of memory for compression.</summary>
private const int AverageFileSize = 100000;

/// <summary>The default size of the buffer used to convert data. WARNING: Must be a multiple of 2!</summary>
private const int BufferSize = 32768;

/// <summary>Decompresses a byte array to unsigned shorts.</summary>
public static ushort[] Decompress_ByteToShort(byte[] zippedData)
{
    using (var inputStream = new MemoryStream(zippedData))
        return Decompress_File(inputStream);
}

/// <summary>Decompresses a file to unsigned shorts.</summary>
public static ushort[] Decompress_File(string inputFilePath)
{
    using (var stream = new FileStream(inputFilePath, FileMode.Open, FileAccess.Read))
        return Decompress_File(stream);
}

/// <summary>Decompresses a file stream to unsigned shorts.</summary>
public static ushort[] Decompress_File(Stream zippedData)
{
    using (var zip = new DeflateStream(zippedData, CompressionMode.Decompress, true))
    {
        // Our temporary buffer.
        var buffer = new byte[BufferSize];

        // Read the element count, written initially as a header in the file.
        zip.Read(buffer, 0, sizeof(int));
        var resultLength = BitConverter.ToInt32(buffer, 0);

        // Create the result array.
        var result = new ushort[resultLength];

        // Decompress the file chunk by chunk.
        var resultOffset = 0;
        for (; ; )
        {
            // Read a chunk of data.
            var count = zip.Read(buffer, 0, BufferSize);
            if (count <= 0)
                break;

            // Copy a piece of the decompressed buffer (BlockCopy offsets are in bytes).
            Buffer.BlockCopy(buffer, 0, result, resultOffset, count);

            // Advance counter.
            resultOffset += count;
        }
        return result;
    }
}

/// <summary>Compresses a ushort array to a byte array.</summary>
public static byte[] Compress_ShortToByte(ushort[] plainData)
{
    using (var outputStream = new MemoryStream(AverageFileSize))
    {
        Compress_File(plainData, outputStream);
        return outputStream.ToArray();
    }
}

/// <summary>Compresses a ushort array directly to a file.</summary>
public static void Compress_File(ushort[] plainData, string outputFilePath)
{
    using (var stream = new FileStream(outputFilePath, FileMode.OpenOrCreate, FileAccess.Write))
        Compress_File(plainData, stream);
}

/// <summary>Compresses a ushort array directly to a file stream.</summary>
public static void Compress_File(ushort[] plainData, Stream outputStream)
{
    using (var zip = new DeflateStream(outputStream, CompressionMode.Compress, true))
    {
        // Our temporary buffer.
        var buffer = new byte[BufferSize];

        // Write the length of the plain data as a header.
        zip.Write(BitConverter.GetBytes(plainData.Length), 0, sizeof(int));

        var inputOffset = 0;
        var availableBytes = plainData.Length * sizeof(ushort);
        while (availableBytes > 0)
        {
            // Compute the number of bytes to copy.
            var bytesCount = Math.Min(BufferSize, availableBytes);

            // Copy a chunk of plain data into the temporary buffer (BlockCopy offsets are in bytes).
            Buffer.BlockCopy(plainData, inputOffset, buffer, 0, bytesCount);

            // Write the buffer.
            zip.Write(buffer, 0, bytesCount);

            // Advance counters.
            inputOffset += bytesCount;
            availableBytes -= bytesCount;
        }
    }
}
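In case it helps, a quick round-trip usage sketch of the methods above (the wrapper class name FastZip, the file path, and GetImageFromCamera() are placeholders, not part of the proposal):

// Hypothetical usage of the methods above.
ushort[] image = GetImageFromCamera();            // your 2 MB camera frame

// Compress straight to disk, without an intermediate big byte[].
FastZip.Compress_File(image, @"C:\temp\image.bin");

// ...later: decompress straight from disk.
ushort[] restored = FastZip.Decompress_File(@"C:\temp\image.bin");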

There is a GZipStream constructor that takes (Stream, CompressionLevel); you can change the CompressionLevel to speed up compression. There is a level named Fastest in that enumeration. A short sketch follows the links below.
Links to relevant documentation:
http://msdn.microsoft.com/pl-pl/library/hh137341(v=vs.110).aspx
http://msdn.microsoft.com/pl-pl/library/system.io.compression.compressionlevel(v=vs.110).aspx
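For example, a minimal sketch of the compression side using that overload (reusing the uncompressedData / compressedData variables from the code in the question; only the constructor call changes):

// Same as the original Compress_ShortToByte body, but asking GZip for speed over ratio.
using (var outputStream = new MemoryStream())
{
    using (var zip = new GZipStream(outputStream, CompressionLevel.Fastest))
    {
        zip.Write(uncompressedData, 0, uncompressedData.Length);
    }
    compressedData = outputStream.ToArray();
}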

Related

Convert Stream to byte[] c# for large files of 2GB

I'm trying to convert a Stream object to byte[] and am using the method below:
public static byte[] ReadFully(System.IO.Stream input)
{
    byte[] buffer = new byte[16 * 1024];
    using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
    }
}
However, the input parameter "input" is a large file of about 2 GB; the code never enters the while loop and hence never converts it to a byte array. For smaller files it works fine.
That's what a Stream is for.
You don't load the whole content into a byte[]; you read a small buffer from the Stream into memory, handle it, then discard it and read the next buffer (see the sketch below).
If you still need to use a byte[]:
It seems your app can't address more than about 2 GB of memory, meaning it is running as a 32-bit process.
Try changing it to 64-bit (in Project Properties, go to Build and disable "Prefer 32-bit").
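A rough sketch of that chunked approach (ProcessChunk is a placeholder for whatever you actually need to do with each block of data):

// Stream the data through a small buffer instead of one giant byte[].
public static void ProcessInChunks(Stream input)
{
    byte[] buffer = new byte[16 * 1024];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        ProcessChunk(buffer, read);   // placeholder: hash, parse, copy to another stream, ...
    }
}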
A single object is limited to less than 2 GB (that is why all the indexes use int).
How about using a list of byte arrays to handle the entire data?
public List<byte[]> ReadBytesList(string fileName)
{
    List<byte[]> rawDataBytes = new List<byte[]>();
    byte[] buff;
    long numBytes = new FileInfo(fileName).Length;

    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(fs))
    {
        int arrayCount = (int)(numBytes / 2100000000); // max byte[] length is a bit under int.MaxValue (2,147,483,647)
        int arrayRest = (int)(numBytes % 2100000000);

        if (arrayCount > 0)
        {
            for (int i = 0; i < arrayCount; i++)
            {
                buff = br.ReadBytes(2100000000);
                rawDataBytes.Add(buff);
            }
            buff = br.ReadBytes(arrayRest);
            rawDataBytes.Add(buff);
        }
        else
        {
            buff = br.ReadBytes(arrayRest);
            rawDataBytes.Add(buff);
        }
    }
    return rawDataBytes;
}
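A hedged usage sketch for the method above (the path is a placeholder; the per-chunk work is up to you):

// Hypothetical usage: work through the list chunk by chunk instead of concatenating it.
List<byte[]> chunks = ReadBytesList(@"C:\data\huge.bin");
long total = 0;
foreach (byte[] chunk in chunks)
{
    total += chunk.Length;     // replace with real per-chunk processing
}
Console.WriteLine($"Read {total} bytes in {chunks.Count} chunks");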

uncompressed file is bigger than original file in GZIP

I'm using the following function to compress (thanks to http://www.dotnetperls.com/):
public static void CompressStringToFile(string fileName, string value)
{
    // A.
    // Write string to temporary file.
    string temp = Path.GetTempFileName();
    File.WriteAllText(temp, value);

    // B.
    // Read file into byte array buffer.
    byte[] b;
    using (FileStream f = new FileStream(temp, FileMode.Open))
    {
        b = new byte[f.Length];
        f.Read(b, 0, (int)f.Length);
    }

    // C.
    // Use GZipStream to write compressed bytes to target file.
    using (FileStream f2 = new FileStream(fileName, FileMode.Create))
    using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
    {
        gz.Write(b, 0, b.Length);
    }
}
And for decompression:
static byte[] Decompress(byte[] gzip)
{
    // Create a GZip stream with decompression mode.
    // Then create a buffer and write into it while reading from the GZip stream.
    using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (MemoryStream memory = new MemoryStream())
        {
            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
My goal is to compress log files and then decompress them in memory, comparing the decompressed data to the original file in order to verify that the compression succeeded and that I'm able to open the compressed file successfully.
The problem is that the decompressed data is usually bigger than the original file, so my comparison check fails even though the compression probably succeeded.
Any idea why?
By the way, here is how I compare the decompressed data to the original file:
static bool FileEquals(byte[] file1, byte[] file2)
{
    if (file1.Length == file2.Length)
    {
        for (int i = 0; i < file1.Length; i++)
        {
            if (file1[i] != file2[i])
            {
                return false;
            }
        }
        return true;
    }
    return false;
}
Try this method to compress a file:
public static byte[] Compress(byte[] raw)
{
    using (MemoryStream memory = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(memory, CompressionMode.Compress, true))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        return memory.ToArray();
    }
}
And this to decompress:
static byte[] Decompress(byte[] gzip)
{
    // Create a GZip stream with decompression mode.
    // Then create a buffer and write into it while reading from the GZip stream.
    using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (MemoryStream memory = new MemoryStream())
        {
            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
Tell me if it worked. Good luck.
I think you'd be better off with the simplest API call: try Stream.CopyTo() (see the sketch below). I can't find the error in your code. If I were working on it, I'd probably make sure everything is getting flushed properly; I can't recall whether GZipStream flushes its output to the FileStream when the using block closes. But then, you are also saying that the final file is larger, not smaller.
Anyhow, best policy in my experience: don't rewrite gotcha-prone code when you don't need to. At least you tested it ;)
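For what it's worth, a minimal CopyTo-based compression step could look like the sketch below (paths are placeholders; disposing the GZipStream at the end of the using block flushes the remaining bytes):

// Sketch: compress one file into another using Stream.CopyTo.
public static void CompressFile(string sourcePath, string destinationPath)
{
    using (FileStream source = File.OpenRead(sourcePath))
    using (FileStream destination = File.Create(destinationPath))
    using (GZipStream gzip = new GZipStream(destination, CompressionMode.Compress))
    {
        source.CopyTo(gzip);   // streams the whole file through GZip in chunks
    }
}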

System.OutOfMemory exception when trying to read large files

public static byte[] ReadMemoryMappedFile(string fileName)
{
    long length = new FileInfo(fileName).Length;
    using (var stream = File.Open(fileName, FileMode.OpenOrCreate, FileAccess.Read, FileShare.ReadWrite))
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(stream, null, length, MemoryMappedFileAccess.Read, null, HandleInheritability.Inheritable, false))
        {
            using (var viewStream = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
            {
                using (BinaryReader binReader = new BinaryReader(viewStream))
                {
                    var result = binReader.ReadBytes((int)length);
                    return result;
                }
            }
        }
    }
}
OpenFileDialog openfile = new OpenFileDialog();
openfile.Filter = "All Files (*.*)|*.*";
openfile.ShowDialog();

byte[] buff = ReadMemoryMappedFile(openfile.FileName);

texteditor.Text = BitConverter.ToString(buff).Replace("-", " "); // <-- A first chance exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
I get a System.OutOfMemoryException when trying to read large files.
I've been reading all over the web for four weeks and have tried a lot, but I still can't seem to find a good solution to my problem.
Please help me.
Update
public byte[] FileToByteArray(string fileName)
{
    byte[] buff = null;
    FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;
    buff = br.ReadBytes((int)numBytes);
    //return buff;
    return File.ReadAllBytes(fileName);
}
OR
public static byte[] FileToByteArray(FileStream stream, int initialLength)
{
    // If we've been passed an unhelpful initial length, just use 32K.
    if (initialLength < 1)
    {
        initialLength = 32768;
    }

    BinaryReader br = new BinaryReader(stream);
    byte[] buffer = new byte[initialLength];
    int read = 0;
    int chunk;
    while ((chunk = br.Read(buffer, read, buffer.Length - read)) > 0)
    {
        read += chunk;

        // If we've reached the end of our buffer, check to see if there's any more information.
        if (read == buffer.Length)
        {
            int nextByte = br.ReadByte();

            // End of stream? If so, we're done.
            if (nextByte == -1)
            {
                return buffer;
            }

            // Nope. Resize the buffer, put in the byte we've just read, and continue.
            byte[] newBuffer = new byte[buffer.Length * 2];
            Array.Copy(buffer, newBuffer, buffer.Length);
            newBuffer[read] = (byte)nextByte;
            buffer = newBuffer;
            read++;
        }
    }

    // Buffer is now too big. Shrink it.
    byte[] ret = new byte[read];
    Array.Copy(buffer, ret, read);
    return ret;
}
I still get a System.OutOfMemory exception when trying to read large files.
If your file is 4 GB, then BitConverter will turn each byte into an "XX-" string: 2 bytes per char * 3 chars per byte * 4,294,967,295 bytes = 25,769,803,770 bytes. You would need over 25 GB of free memory just to hold the string, and you already have the file in memory as a byte array.
Besides, no single object in a .NET program may be over 2 GB. The theoretical limit for a string length would be 1,073,741,823 chars, and you would also need to be running a 64-bit process.
So the solution in your case: open a FileStream. Read the first 16,384 bytes (or however much fits on your screen), convert them to hex, display them, and remember the file offset. When the user wants to navigate to the next or previous page, seek to that position in the file on disk, read and display again, and so on.
You need to read the file in chunks, keep track of where you are in the file, page the contents on screen, and use Seek and Position to move up and down in the file stream. You will not be able to display a 4 GB file by reading all of it into memory first, with any approach.
The approach is to virtualize the data, reading only the visible lines as the user scrolls (a rough sketch follows). If you need a read-only text viewer, you can use a WPF ItemsControl with a virtualizing stack panel and bind it to a custom IList collection that lazily fetches lines from the file, calculating the file offset for each line index.
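A rough sketch of that paging idea (page size and hex formatting are arbitrary choices; the offset would come from the viewer's scroll position):

// Read one "page" of a huge file and render it as hex, without loading the rest.
public static string ReadHexPage(string fileName, long offset, int pageSize = 16 * 1024)
{
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        byte[] page = new byte[pageSize];
        int read = fs.Read(page, 0, page.Length);
        return BitConverter.ToString(page, 0, read).Replace("-", " ");
    }
}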

compressing and decompressing source data gives result different than source data

In my app I need to decompress data written by DataContractSerializer to a Deflate-compressed stream in another app, edit the decompressed data, and compress it again.
Decompression works fine, but not for data compressed by me.
The problem is that when I do this:
byte[] result = Compressor.Compress(Compressor.Decompress(sourceData));
the length of the result byte array is different from the sourceData array.
For example:
string source = "test value";
byte[] oryg = Encoding.Default.GetBytes(source);
byte[] comp = Compressor.Compress(oryg);
byte[] result1 = Compressor.Decompress(comp);
string result2 = Encoding.Default.GetString(result1);
and here result1.Length is 0 and result2 is "", of course.
Here is the code of my Compressor class.
public static class Compressor
{
    public static byte[] Decompress(byte[] data)
    {
        byte[] result;
        using (MemoryStream baseStream = new MemoryStream(data))
        {
            using (DeflateStream stream = new DeflateStream(baseStream, CompressionMode.Decompress))
            {
                result = ReadFully(stream, -1);
            }
        }
        return result;
    }

    public static byte[] Compress(byte[] data)
    {
        byte[] result;
        using (MemoryStream baseStream = new MemoryStream())
        {
            using (DeflateStream stream = new DeflateStream(baseStream, CompressionMode.Compress, true))
            {
                stream.Write(data, 0, data.Length);
                result = baseStream.ToArray();
            }
        }
        return result;
    }

    /// <summary>
    /// Reads data from a stream until the end is reached. The
    /// data is returned as a byte array. An IOException is
    /// thrown if any of the underlying IO calls fail.
    /// </summary>
    /// <param name="stream">The stream to read data from</param>
    /// <param name="initialLength">The initial buffer length</param>
    private static byte[] ReadFully(Stream stream, int initialLength)
    {
        // If we've been passed an unhelpful initial length, just use 32K.
        if (initialLength < 1)
        {
            initialLength = 65768 / 2;
        }

        byte[] buffer = new byte[initialLength];
        int read = 0;
        int chunk;
        while ((chunk = stream.Read(buffer, read, buffer.Length - read)) > 0)
        {
            read += chunk;

            // If we've reached the end of our buffer, check to see if there's any more information.
            if (read == buffer.Length)
            {
                int nextByte = stream.ReadByte();

                // End of stream? If so, we're done.
                if (nextByte == -1)
                {
                    return buffer;
                }

                // Nope. Resize the buffer, put in the byte we've just read, and continue.
                byte[] newBuffer = new byte[buffer.Length * 2];
                Array.Copy(buffer, newBuffer, buffer.Length);
                newBuffer[read] = (byte)nextByte;
                buffer = newBuffer;
                read++;
            }
        }

        // Buffer is now too big. Shrink it.
        byte[] ret = new byte[read];
        Array.Copy(buffer, ret, read);
        return ret;
    }
}
Please help me with this case if you can.
Best regards,
Adam
(Edited: switched from using Flush, which still might not flush out all bytes, to ensuring the deflate stream is disposed first, as per Phil's answer here: zip and unzip string with Deflate.)
Before attempting to read from the backing store, you have to ensure the deflate stream has fully flushed itself when compressing, allowing deflate to finish compressing and write its final bytes. Closing the deflate stream, or disposing of it, achieves this.
public static byte[] Compress(byte[] data)
{
    byte[] result;
    using (MemoryStream baseStream = new MemoryStream())
    {
        using (DeflateStream stream = new DeflateStream(baseStream, CompressionMode.Compress, true))
        {
            stream.Write(data, 0, data.Length);
        }
        result = baseStream.ToArray(); // only safe to read after the deflate stream is closed
    }
    return result;
}
Also, your ReadFully routine looks incredibly complicated and likely to have bugs.
One being:
while ((chunk = stream.Read(buffer, read, buffer.Length - read)) > 0)
When reading the second chunk, read will be greater than the length of the buffer, meaning it will always pass a negative value to stream.Read for the number of bytes to read. My guess is that it will never read the second chunk, returning zero, and fall out of the while loop.
I recommend Jon's version of ReadFully for this purpose: Creating a byte array from a stream. A much simpler alternative is sketched below.
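As a point of comparison, a much simpler ReadFully can be built on Stream.CopyTo; this is just a sketch, not the code from the linked answer:

// Minimal ReadFully built on Stream.CopyTo.
private static byte[] ReadFully(Stream stream)
{
    using (var ms = new MemoryStream())
    {
        stream.CopyTo(ms);
        return ms.ToArray();
    }
}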

Problem with C# Decompression

I have some data in a Sybase image-type column that I want to use in a C# app. The data was compressed by Java using the java.util.zip package. I wanted to test that I could decompress the data in C#, so I wrote a test app that pulls it out of the database:
byte[] bytes = (byte[])reader.GetValue(0);
This gives me a compressed byte[] of length 2479.
Then I pass this to a seemingly standard C# decompression method:
public static byte[] Decompress(byte[] gzBuffer)
{
    MemoryStream ms = new MemoryStream();
    int msgLength = BitConverter.ToInt32(gzBuffer, 0);
    ms.Write(gzBuffer, 4, gzBuffer.Length - 4);

    byte[] buffer = new byte[msgLength];
    ms.Position = 0;

    GZipStream zip = new GZipStream(ms, CompressionMode.Decompress);
    zip.Read(buffer, 0, buffer.Length);
    return buffer;
}
The value of msgLength is 1503501432, which seems way out of range. The original document should be in the range of 5 KB to 50 KB. Anyway, when I use that value to create "buffer", not surprisingly I get an OutOfMemoryException.
What is happening?
Jim
The Java compress method is as follows:
public byte[] compress(byte[] bytes) throws Exception {
    byte[] results = new byte[bytes.length];
    Deflater deflater = new Deflater();
    deflater.setInput(bytes);
    deflater.finish();
    int len = deflater.deflate(results);
    byte[] out = new byte[len];
    for (int i = 0; i < len; i++) {
        out[i] = results[i];
    }
    return out;
}
As I can't see your Java code, I can only guess that you are compressing your data to a zip file stream. It will obviously fail if you try to decompress that stream with GZip decompression in C#. Either change your Java code to GZip compression (example here, at the bottom of the page), or decompress the zip stream in C# with an appropriate library (e.g. SharpZipLib).
Update
Ok, now I see you are using Deflater for the compression in Java. So obviously you have to use the same algorithm in C#: System.IO.Compression.DeflateStream.
public static byte[] Decompress(byte[] gzBuffer)
{
    using (MemoryStream ms = new MemoryStream(gzBuffer))
    using (Stream zipStream = new DeflateStream(ms, CompressionMode.Decompress, true))
    {
        int initialBufferLength = gzBuffer.Length * 2;
        byte[] buffer = new byte[initialBufferLength];
        bool finishedExactly = false;
        int read = 0;
        int chunk;
        while (!finishedExactly &&
               (chunk = zipStream.Read(buffer, read, buffer.Length - read)) > 0)
        {
            read += chunk;
            if (read == buffer.Length)
            {
                int nextByte = zipStream.ReadByte();
                // End of stream?
                if (nextByte == -1)
                {
                    finishedExactly = true;
                }
                else
                {
                    byte[] newBuffer = new byte[buffer.Length * 2];
                    Array.Copy(buffer, newBuffer, buffer.Length);
                    newBuffer[read] = (byte)nextByte;
                    buffer = newBuffer;
                    read++;
                }
            }
        }
        if (!finishedExactly)
        {
            byte[] final = new byte[read];
            Array.Copy(buffer, final, read);
            buffer = final;
        }
        return buffer;
    }
}
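A hedged usage sketch tying this back to the original reader code (the column index and the assumption that the stored document is UTF-8 text are mine, not from the question):

// Hypothetical: decompress the deflate-compressed image column straight from the reader.
byte[] compressed = (byte[])reader.GetValue(0);
byte[] plain = Decompress(compressed);
string text = Encoding.UTF8.GetString(plain);   // assuming the original document was UTF-8 text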