I hope someone here will be able to help me out with this.
What I'm trying to do is decompress a zlib compressed file in C# using ZlibNet. (I've also tried DotNetZip and SharpZipLib)
The problem that I'm having is that it'll decompress only the first 256kb, or rather the first 262144 bytes.
Here's my Decompress method, taken from here:
public static byte[] Decompress(byte[] gzip)
{
    using (var stream = new Ionic.Zlib.ZlibStream(new MemoryStream(gzip), Ionic.Zlib.CompressionMode.Decompress))
    {
        var outStream = new MemoryStream();
        const int size = 999999; // Playing around with various sizes didn't help
        byte[] buffer = new byte[size];
        int read;
        while ((read = stream.Read(buffer, 0, size)) > 0)
        {
            outStream.Write(buffer, 0, read);
        }
        return outStream.ToArray();
    }
}
Basically, read gets set to 262144 the first time the while loop executes, the data is written, and then on the next pass of the loop read gets set to 0, which makes the loop exit and the function return outStream as an array, even though there are still bytes left to be read!
Thanks in advance to anyone who could help with this!
Upon further inspection of the originally packed data, it turns out that the script responsible for (de)compressing the data in the original application would split the zlib stream of a file into chunks of 262144 bytes each.
This is why the various libraries I tested always stopped at 262144 bytes: it was the end of that zlib stream, but not the end of the file it was supposed to extract. (Each zlib stream was also separated by a 32-bit unsigned int that indicated the number of bytes the next zlib stream would contain.)
My only guess is that they did this so that if they had a very large file, they wouldn't need to load all of it into memory for decompression. (But that's just a guess.)
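For anyone hitting the same thing, here is a minimal sketch of how such a chunked layout can be unpacked. It is not the original application's code; it assumes each 32-bit prefix holds the compressed length of the zlib stream that follows, and it reuses the Decompress method above:
public static byte[] DecompressChunked(byte[] data)
{
    using (var input = new MemoryStream(data))
    using (var reader = new BinaryReader(input))
    using (var output = new MemoryStream())
    {
        while (input.Position < input.Length)
        {
            uint chunkLength = reader.ReadUInt32();            // size of the next zlib stream
            byte[] compressed = reader.ReadBytes((int)chunkLength);
            byte[] decompressed = Decompress(compressed);      // the Decompress method shown above
            output.Write(decompressed, 0, decompressed.Length);
        }
        return output.ToArray();
    }
}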
Here is what the following code does:
Read all bytes from an input file
Keep only part of the file in outbytes
Write the extracted bytes in outputfile
byte[] outbytes = File.ReadAllBytes(sourcefile).Skip(offset).Take(size).ToArray();
File.WriteAllBytes(outfile, outbytes);
But there is a limit of ~2 GB of data at each step.
Edit: the extracted section can also be larger than 2 GB.
How can I handle big files? What is the best way to proceed with good performance, regardless of size?
Thanks!
Example using FileStream to take the middle 3 GB out of a 5 GB file:
byte[] buffer = new byte[1024 * 1024];
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // seek to 1 GB in
    for (int i = 0; i < 3000; i++) // 3000 reads of one megabyte, roughly 3 GB
    {
        int bytesRead = readFS.Read(buffer, 0, buffer.Length);
        writeFS.Write(buffer, 0, bytesRead);
    }
}
It's not production-grade code: Read might not return a full megabyte, so you could end up with less than 3 GB. It's more to demonstrate the concept of using two FileStreams, reading repeatedly from one and writing repeatedly to the other. I'm sure you can modify it to copy an exact number of bytes by keeping track of the total of all the bytesRead values in the loop and stopping once you have read enough.
It is better to stream the data from one file to the other, only loading small parts of it into memory:
public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
    // Open the files as streams
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        // Seek to the start position
        inStream.Seek(startPosition, SeekOrigin.Begin);

        // Create a variable to track how much more to copy
        // and a buffer to temporarily store a section of the file
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // Read the smaller of 81920 or remaining, and break out of the loop
            // if we've already reached the end of the file
            int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            // Write the buffered bytes to the output file
            outStream.Write(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}
Usage:
CopyFileSection(sourcefile, outfile, offset, size);
This should be functionally equivalent to your current method, but without the overhead of reading the entire file into memory, regardless of its size.
Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
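For reference, a sketch of that async variant might look like this:
public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
{
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        inStream.Seek(startPosition, SeekOrigin.Begin);
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // Same logic as before, just awaiting the I/O calls
            int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }
            await outStream.WriteAsync(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}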
I am trying to simply download an object from my bucket using C# just like we can find in S3 examples, and I can't figure out why the stream won't be entirely copied to my byte array. Only the first 8192 bytes are copied instead of the whole stream.
I have tried with an Amazon.S3.AmazonS3Client and with an Amazon.S3.Transfer.TransferUtility, but in both cases only the first bytes are actually copied into the buffer.
var stream = await _transferUtility.OpenStreamAsync(BucketName, key);
using (stream)
{
    byte[] content = new byte[stream.Length];
    stream.Read(content, 0, content.Length);
    // Here content should contain all the data from the stream, but only the first 8192 bytes are actually populated.
}
When debugging, I see the stream type is Amazon.Runtime.Internal.Util.Md5Stream, and inside the stream, before calling Read() the property CurrentPosition = 0. After the call, CurrentPosition becomes 8192, which seems to indeed indicate only the first 8K of data was read. The total Length of the stream is 104042.
If I make more calls to stream.Read(), I see more data gets read and CurrentPosition increases in value. But CurrentPosition is not a public property, and I cannot access it in my code to make a while() loop (and having to code such loops to read all the data seems a bit clunky).
Why are only the first 8K read in my code? How should I proceed to read the entire stream?
I tried calling stream.Flush(), but it did not fix the problem.
EDIT 1
I have modified my code so it does the following:
var stream = await _transferUtility.OpenStreamAsync(BucketName, key);
using (stream)
{
    byte[] content = new byte[stream.Length];
    var bytesRead = 0;
    while (bytesRead < stream.Length)
        bytesRead += stream.Read(content, bytesRead, content.Length - bytesRead);
}
And it works. But still looks clunky. Is it normal I have to do this?
EDIT 2
Final solution is to create a MemoryStream of the correct size and then call CopyTo(). So no clunky loop anymore and no risk of infinite loop if Read() starts returning 0 before the whole stream has been read:
var stream = await _transferUtility.OpenStreamAsync(BucketName, key);
using (stream)
{
    using (var memoryStream = new MemoryStream((int)stream.Length))
    {
        stream.CopyTo(memoryStream);
        var myBuffer = memoryStream.GetBuffer();
    }
}
stream.Read() returns the number of bytes read. You can then keep track of the total number of bytes read until you have reached the end of the file (content.Length).
You could also just loop until the returned value is 0, which means there are no more bytes left to read.
You will need to keep track of the current offset for your content buffer so that you are not overwriting data for each call.
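For example, a read loop along those lines might look like this (essentially your EDIT 1, plus a guard in case Read returns 0 before the expected length):
byte[] content = new byte[stream.Length];
int totalRead = 0;
while (totalRead < content.Length)
{
    // Read into the buffer at the current offset so previously read data isn't overwritten
    int bytesRead = stream.Read(content, totalRead, content.Length - totalRead);
    if (bytesRead == 0)
        break; // end of stream reached before the expected length
    totalRead += bytesRead;
}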
We have a code snippet that converts a Stream to a byte[] and later displays it as an image in an aspx page.
The problem is that the image is displayed the first time the page loads, but not on later requests (reload etc.).
The only difference I observed is that the Stream position of 'input' (in ConvertStreamtoByteArray) is 0 the first time and > 0 on subsequent calls. How do I fix this?
context.Response.Clear();
context.Response.ContentType = "image/pjpeg";
context.Response.BinaryWrite(ConvertStreamtoByteArray(imgStream));
context.Response.End();
private static byte[] ConvertStreamtoByteArray(Stream input)
{
    var buffer = new byte[16 * 1024];
    using (var ms = new MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
    }
}
I think the source is: Creating a byte array from a stream
The code snippet appears to be from the above link; everything matches except for the method name.
You're (most likely) holding a reference to imgStream, so the same stream is being used every time ConvertStreamtoByteArray is called.
The problem is that streams track their Position. This starts at 0 when the stream is new, and ends up at the end when you read the entire stream.
Usually the solution in this case is to set the Position back to 0 prior to copying the content of the stream.
In your case, you should probably 1) convert imgStream to a byte array the first time it's needed, 2) cache this byte array and not the stream, 3) dispose of and throw away imgStream, and 4) pass the byte array to the Response from that point onwards.
See, this is what happens when you copypasta code from the internets. Weird stuff like this, repeatedly converting the same stream to a byte array (waste of time!), and you end up not using the framework to do your work for you. Manually copying streams is so 2000s.
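A rough sketch of that "cache the bytes, let the framework do the copying" approach. The field and helper names are purely illustrative, and it assumes imgStream is seekable:
private static byte[] cachedImageBytes; // cache the bytes, not the stream

private static byte[] GetImageBytes(Stream imgStream)
{
    if (cachedImageBytes == null)
    {
        using (var ms = new MemoryStream())
        {
            imgStream.Position = 0;   // rewind in case the stream was already read
            imgStream.CopyTo(ms);     // framework copy instead of a manual loop
            cachedImageBytes = ms.ToArray();
        }
    }
    return cachedImageBytes;
}

// Then in the handler:
// context.Response.BinaryWrite(GetImageBytes(imgStream));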
I have a BLOB from an Oracle database. In .NET it is of type OracleLob and has, among other methods, Read and ReadByte:
int OracleLob.Read(byte[] buffer, int offset, int count)
int OracleLob.ReadByte()
So the Read method reads a sequence of bytes and ReadByte reads a single byte at a time. Here is my code:
OracleLob ol = (OracleLob)cmd.Parameters[1].Value; // here is the file!!
BinaryWriter binWriter = new BinaryWriter(File.Open(@"D:\wordfile.DOCX", FileMode.Create));
int currentByte = ol.ReadByte();
while (currentByte != -1)
{
    binWriter.Write(currentByte);
    currentByte = ol.ReadByte();
}
binWriter.Close();
But when I open wordfile.DOCX in Word, it says that the file is corrupt and cannot be opened. What am I doing wrong?
BinaryWriter.Write(int) serializes the int as four bytes; it won't write just a single byte. Use a FileStream for this purpose, and use the byte[] overloads of the read/write methods, since byte-at-a-time streaming is very slow.
A docx file is an OpenXml format file: basically a set of XML files zipped up and renamed to .docx. You can't just take output from a database, write it to a file, and magically turn it into a docx file.
Are you sure it's a docx file you are trying to make here? The only way I can see this working is if a docx file was serialized into the database, but then you have to make sure it's deserialized exactly the same way on the way out, otherwise the underlying zip file will be corrupt and the file cannot be opened.
What's wrong with the code is that it uses an int value when writing the byte data to the BinaryWriter. It uses the overload that writes an int instead of the one that writes a byte, so each byte from the source is written as four bytes. If you check the file size, you'll see that it's four times as large as it should be.
Cast the value to byte so that the correct overload of the Write method is used:
binWriter.Write((byte)currentByte);
To do this more efficiently, you can use a buffer to read blocks of bytes instead of a single byte at a time:
using (FileStream stream = File.Open(@"D:\wordfile.DOCX", FileMode.Create)) {
    byte[] buffer = new byte[4096];
    int len = ol.Read(buffer, 0, buffer.Length);
    while (len > 0) {
        stream.Write(buffer, 0, len);
        len = ol.Read(buffer, 0, buffer.Length);
    }
}
currentByte is declared as an int, so the binary writer is writing 4 bytes for each write.
You need to cast currentByte as an actual byte:
binWriter.Write((byte) currentByte);
I'm working with a 3rd party component that returns an IStream object (System.Runtime.InteropServices.ComTypes.IStream). I need to take the data in that IStream and write it to a file. I've managed to get that done, but I'm not really happy with the code.
With "strm" being my IStream, here's my test code...
// access the structure containing statistical info about the stream
System.Runtime.InteropServices.ComTypes.STATSTG stat;
strm.Stat(out stat, 0);
System.IntPtr myPtr = (IntPtr)0;
// get the "cbSize" member from the stat structure
// this is the size (in bytes) of our stream.
int strmSize = (int)stat.cbSize; // *** DANGEROUS *** (long to int cast)
byte[] strmInfo = new byte[strmSize];
strm.Read(strmInfo, strmSize, myPtr);
string outFile = @"c:\test.db3";
File.WriteAllBytes(outFile, strmInfo);
At the very least, I don't like the long to int cast as commented above, but I wonder if there's not a better way to get the original stream length than the above? I'm somewhat new to C#, so thanks for any pointers.
You don't need to do that cast, as you can read data from IStream source in chunks.
// ...
// Unmanaged int where IStream.Read reports how many bytes it actually read
IntPtr bytesReadPtr = Marshal.AllocHGlobal(sizeof(int));
try
{
    using (FileStream fs = new FileStream(@"c:\test.db3", FileMode.OpenOrCreate))
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        do
        {
            strm.Read(buffer, buffer.Length, bytesReadPtr);
            bytesRead = Marshal.ReadInt32(bytesReadPtr);
            fs.Write(buffer, 0, bytesRead);
        } while (bytesRead > 0);
    }
}
finally
{
    Marshal.FreeHGlobal(bytesReadPtr);
}
This way is more memory efficient, as it only uses a small block of memory to transfer the data between the two streams.
System.Runtime.InteropServices.ComTypes.IStream is the managed definition of the COM IStream interface, which inherits from ISequentialStream.
From MSDN: http://msdn.microsoft.com/en-us/library/aa380011(VS.85).aspx
The actual number of bytes read can be less than the number of bytes requested if an error occurs or if the end of the stream is reached during the read operation. The number of bytes returned should always be compared to the number of bytes requested. If the number of bytes returned is less than the number of bytes requested, it usually means the Read method attempted to read past the end of the stream.
This documentation says you can keep looping and reading as long as pcbRead equals cb; once pcbRead is less than cb, you have reached the end of the stream.
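In other words, something like this loop, where bytesReadPtr is assumed to point to unmanaged memory (e.g. from Marshal.AllocHGlobal) that IStream.Read fills with the count, and fs is the output FileStream, as in the other answer:
byte[] buffer = new byte[8192];
int bytesRead;
do
{
    strm.Read(buffer, buffer.Length, bytesReadPtr);
    bytesRead = Marshal.ReadInt32(bytesReadPtr);
    fs.Write(buffer, 0, bytesRead);
} while (bytesRead == buffer.Length); // a short read means the end of the stream was reached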