Stream only partially read when downloading object from Amazon AWS S3 [duplicate] - c#

This question already has answers here:
How to get all data from NetworkStream
(8 answers)
Closed 4 years ago.
I am trying to simply download an object from my bucket using C# just like we can find in S3 examples, and I can't figure out why the stream won't be entirely copied to my byte array. Only the first 8192 bytes are copied instead of the whole stream.
I have tried with an Amazon.S3.AmazonS3Client and with an Amazon.S3.Transfer.TransferUtility, but in both cases only the first bytes are actually copied into the buffer.
var stream = await _transferUtility.OpenStreamAsync(BucketName, key);
using (stream)
{
byte[] content = new byte[stream.Length];
stream.Read(content, 0, content.Length);
// Here content should contain all the data from the stream, but only the first 8192 bytes are actually populated.
}
When debugging, I see the stream type is Amazon.Runtime.Internal.Util.Md5Stream, and inside the stream, before calling Read(), the property CurrentPosition is 0. After the call, CurrentPosition becomes 8192, which does seem to indicate that only the first 8K of data was read. The total Length of the stream is 104042.
If I make more calls to stream.Read(), I see more data gets read and CurrentPosition increases in value. But CurrentPosition is not a public property, and I cannot access it in my code to make a while() loop (and having to code such loops to read all the data seems a bit clunky).
Why are only the first 8K read in my code? How should I proceed to read the entire stream?
I tried calling stream.Flush(), but it did not fix the problem.
EDIT 1
I have modified my code so it does the following:
var stream = await _transferUtility.OpenStreamAsync(BucketName, key);
using (stream)
{
byte[] content = new byte[stream.Length];
var bytesRead = 0;
while (bytesRead < stream.Length)
bytesRead += stream.Read(content, bytesRead, content.Length - bytesRead);
}
And it works. But still looks clunky. Is it normal I have to do this?
EDIT 2
Final solution is to create a MemoryStream of the correct size and then call CopyTo(). So no clunky loop anymore and no risk of infinite loop if Read() starts returning 0 before the whole stream has been read:
var stream = await _transferUtility.OpenStreamAsync(BucketName, key);
using (stream)
{
using (var memoryStream = new MemoryStream((int)stream.Length))
{
stream.CopyTo(memoryStream);
var myBuffer = memoryStream.GetBuffer();
}
}
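Note that GetBuffer() returns the MemoryStream's internal buffer, which can be larger than the data actually written if the initial capacity is ever exceeded. Since the capacity here is preset to the exact stream length it works out, but ToArray() always returns a right-sized copy:
var myBuffer = memoryStream.ToArray(); // always sized to the data actually written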

stream.Read() returns the number of bytes actually read, which may be fewer than requested. You can keep track of the running total until you have read the whole file (content.Length).
You could also just loop until the returned value is 0, which means the end of the stream has been reached and no more bytes are left.
You will need to keep track of the current offset into your content buffer so that each call appends after the previous one instead of overwriting data.
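For example, a minimal sketch of such a loop wrapped in a helper (the name ReadFully is hypothetical); note that .NET 7 and later also ship Stream.ReadExactly, which does this loop for you:
static void ReadFully(Stream stream, byte[] buffer)
{
    int offset = 0;
    while (offset < buffer.Length)
    {
        int read = stream.Read(buffer, offset, buffer.Length - offset);
        if (read == 0)
            throw new EndOfStreamException("Stream ended before the buffer was filled.");
        offset += read;
    }
}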

Related

Why does StreamReader.ReadToEnd work but not Stream.Read?

I am trying to get the body of a request in an ASP.NET Core controller as a byte[] array. Here is what I initially wrote:
var declaredLength = (int)request.ContentLength;
byte[] fileBuffer = new byte[declaredLength];
request.Body.Read(fileBuffer, 0, declaredLength);
This code works, but only for small requests (around 20 KB). For larger requests it fills up the first 20,000 or so bytes of the array, and the rest of the array is empty.
I used some code in the top answer here, and was able to read the entire request body successfully after rewriting my code:
var declaredLength = (int)request.ContentLength;
byte[] fileBuffer = new byte[declaredLength];
// need to enable, otherwise Seek() fails
request.EnableRewind();
// using StreamReader apparently resolves the issue
using (var reader = new StreamReader(request.Body, Encoding.UTF8, true, 1024, true))
{
reader.ReadToEnd();
}
request.Body.Seek(0, SeekOrigin.Begin);
request.Body.Read(fileBuffer, 0, declaredLength);
Why is StreamReader.ReadToEnd() able to read the entire request body successfully, while Stream.Read() can't? Reading the request stream twice feels like a hack. Is there a better way to go about this? (I only need to read the stream into a byte array once)
Remember that you're trying to read request.Body before all of the request has been received.
Stream.Read behaves like this:
If the end of the stream has been reached, return 0
If there are no bytes available which haven't already been read, block until at least 1 byte is available
If 1 or more new bytes are available, return them straight away. Don't block.
As you can see, if the whole body hasn't been received yet, request.Body.Read(...) will just return the part of the body that has been received.
StreamReader.ReadToEnd() calls Stream.Read in a loop, until it finds the end of the stream.
You should probably call Stream.Read in a loop as well, until you've read all of the bytes:
byte[] fileBuffer = new byte[declaredLength];
int numBytesRead = 0;
while (numBytesRead < declaredLength)
{
int readBytes = request.Body.Read(fileBuffer, numBytesRead, declaredLength - numBytesRead);
if (readBytes == 0)
{
// We reached the end of the stream before we were expecting it
// Might want to throw an exception here?
}
numBytesRead += readBytes;
}
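As an alternative sketch, you can let CopyToAsync do the looping for you by draining the body into a MemoryStream first, at the cost of an extra copy:
using (var ms = new MemoryStream())
{
    await request.Body.CopyToAsync(ms);
    byte[] fileBuffer = ms.ToArray(); // the complete request body
}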

Decompress file using zlib - stuck at 256kb limit

I hope someone here will be able to help me out with this.
What I'm trying to do is decompress a zlib compressed file in C# using ZlibNet. (I've also tried DotNetZip and SharpZipLib)
The problem that I'm having is that it'll decompress only the first 256 KB, or rather the first 262144 bytes.
Here's my Decompress method, taken from here:
public static byte[] Decompress(byte[] gzip)
{
using (var stream = new Ionic.Zlib.ZlibStream(new MemoryStream(gzip), Ionic.Zlib.CompressionMode.Decompress))
{
var outStream = new MemoryStream();
const int size = 999999; //Playing around with various sizes didn't help
byte[] buffer = new byte[size];
int read;
while ((read = stream.Read(buffer, 0, size)) > 0)
{
outStream.Write(buffer, 0, read);
read = 0;
}
return outStream.ToArray();
}
}
Basically, the int (read) gets set to 262144 the first time the while loop executes, it writes, and then on the next pass of the while loop read gets set to 0, thus making the loop exit and the function return the outStream as an array. (Even though there are still bytes left to be read!)
Thanks in advance to anyone who could help with this!
Upon further inspection of the originally packed data, it turns out that the script responsible for (de)compressing the data in the original application would split the zlib stream of a file into chunks of 262144 bytes each.
This is why the various libraries I tested always stopped at 262144 bytes: it was the end of the zlib stream, but not the end of the file it was supposed to extract. (Each zlib stream was also separated by a 32-bit unsigned int that indicated the number of bytes the next zlib stream would contain.)
My only guess is that they did this so that if they had a very large file, they wouldn't need to load all of it into memory for decompression. (But that's just a guess.)
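Assuming that framing (a little-endian 32-bit length prefix before each chunk, and a seekable input stream), a sketch of walking the chunked format with the Decompress method above might look like this:
public static byte[] DecompressChunked(Stream input)
{
    using (var output = new MemoryStream())
    using (var reader = new BinaryReader(input))
    {
        while (input.Position < input.Length)
        {
            // Assumed framing: a 32-bit length prefix, then that many
            // bytes of zlib-compressed data.
            uint chunkLength = reader.ReadUInt32();
            byte[] compressedChunk = reader.ReadBytes((int)chunkLength);
            byte[] decompressed = Decompress(compressedChunk);
            output.Write(decompressed, 0, decompressed.Length);
        }
        return output.ToArray();
    }
}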

Image from Stream - only works on first page load

We have a code snippet that converts a Stream to a byte[] and later displays it as an image in an aspx page.
The problem is that the image is displayed the first time the page loads, but not on later requests (reload etc.).
The only difference I observed is that the Stream position of 'input' (in ConvertStreamtoByteArray) is 0 the first time and > 0 on subsequent calls. How do I fix this?
context.Response.Clear();
context.Response.ContentType = "image/pjpeg";
context.Response.BinaryWrite(ConvertStreamtoByteArray(imgStream));
context.Response.End();
private static byte[] ConvertStreamtoByteArray(Stream input)
{
var buffer = new byte[16 * 1024];
using (var ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
I think the source is: Creating a byte array from a stream
The code snippet appears to be from the above link; everything matches except the method name.
You're (most likely) holding a reference to imgStream, so the same stream is being used every time ConvertStreamtoByteArray is called.
The problem is that streams track their Position. This starts at 0 when the stream is new, and ends up at the end when you read the entire stream.
Usually the solution in this case is to set the Position back to 0 prior to copying the content of the stream.
In your case, you should probably 1) convert imgStream to a byte array the first time it's needed 2) cache this byte array and not the stream 3) dispose and throw away imgStream and 4) pass the byte array to the Response from this point onwards.
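A minimal sketch of that caching approach (the field name _cachedImage and the helper are hypothetical, and locking is left aside):
private static byte[] _cachedImage;

private static byte[] GetImageBytes(Stream imgStream)
{
    if (_cachedImage == null)
    {
        using (var ms = new MemoryStream())
        {
            imgStream.CopyTo(ms); // read the stream once
            _cachedImage = ms.ToArray();
        }
        imgStream.Dispose(); // the stream is no longer needed
    }
    return _cachedImage;
}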
See, this is what happens when you copypasta code from the internets. Weird stuff like this, repeatedly converting the same stream to a byte array (waste of time!), and you end up not using the framework to do your work for you. Manually copying streams is so 2000s.

How to copy a Stream from the beginning irrespective of its current position

I got a file stream which has content read from a disk.
Stream input = new FileStream("filename", FileMode.Open); // FileStream requires a FileMode
This stream is to be passed to a third-party library which, after reading the stream, leaves the Stream's position pointer at the end of the file (as usual).
My requirement is not to load the file from the disk every time; instead I want to maintain a MemoryStream, which will be used every time.
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, read);
}
}
I have tried the above code. It works the very first time to copy the input stream to the output stream, but subsequent calls to CopyStream will not work, as the source's Position will be at the end of the stream after the first call.
Are there other alternatives which copy the content of the source stream to another stream irrespective of the source stream's current Position?
And this code needs to run in a thread-safe manner in a multithreaded environment.
You can use .NET 4.0's Stream.CopyTo to copy your stream to a MemoryStream. The MemoryStream has a Position property you can use to move its position back to the beginning.
var ms = new MemoryStream();
using (Stream file = File.OpenRead(@"filename"))
{
file.CopyTo(ms);
}
ms.Position = 0;
To make a thread-safe solution, you can copy the content to a byte array, and make a new MemoryStream wrapping the byte array for each thread that needs access:
byte[] fileBytes = ms.ToArray();
var ms2 = new MemoryStream(fileBytes);
You should check the input stream's CanSeek property. If that returns false, you can only read it once anyway. If CanSeek returns true, you can set the position to zero and copy away.
if (input.CanSeek)
{
input.Position = 0;
}
You may also want to store the old position and restore it after copying.
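For instance, a minimal sketch that saves and restores the position around the copy, assuming the input stream is seekable:
public static void CopyFromStart(Stream input, Stream output)
{
    long oldPosition = input.Position;
    input.Position = 0;           // rewind to the beginning
    input.CopyTo(output);
    input.Position = oldPosition; // restore the caller's position
}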
ETA: Passing the same instance of a Stream around is not the safest thing to do. E.g. you can't be sure the Stream wasn't disposed when you get it back. I'd suggest to copy the FileStream to a MemoryStream in the beginning, but only store the byte content of the latter by calling ToArray(). When you need to pass a Stream somewhere, just create a new one each time with new MemoryStream(byte[]).

How to find the no. of bytes of a text file without reading it?

I have C# code that reads a text file and prints it out, which looks like this:
StreamReader sr = new StreamReader(File.OpenRead(ofd.FileName));
byte[] buffer = new byte[100]; //is there a way to simply specify the length of this to be the number of bytes in the file?
sr.BaseStream.Read(buffer, 0, buffer.Length);
foreach (byte b in buffer)
{
label1.Text += b.ToString("x") + " ";
}
Is there any way I can know how many bytes my file has?
I want to know the length of the byte[] buffer in advance so that in the Read function, I can simply pass in buffer.length as the third argument.
System.IO.FileInfo fi = new System.IO.FileInfo("myfile.exe");
long size = fi.Length;
In order to find the file size, the system still has to read from the disk, but only the file's metadata. The example above does not read the file's content.
It's not clear why you're using StreamReader at all if you're going to read binary data. Just use FileStream instead. You can use the Length property to find the length of the file.
Note, however, that this still doesn't mean you should just call Read and assume that a single call will read all the data. You should loop until you've read everything:
byte[] data;
using (var stream = File.OpenRead(...))
{
data = new byte[(int) stream.Length];
int offset = 0;
while (offset < data.Length)
{
int chunk = stream.Read(data, offset, data.Length - offset);
if (chunk == 0)
{
// Or handle this some other way
throw new IOException("File has shrunk while reading");
}
offset += chunk;
}
}
Note that this is assuming you do want to read the data. If you don't want to even open the stream, use FileInfo.Length as other answers have shown. Note that both FileStream.Length and FileInfo.Length have a type of long, whereas arrays are limited to 32-bit lengths. What do you want to happen with a file which is bigger than 2 gigs?
You can use the FileInfo.Length property.
Take a look at the example given in the link.
I would imagine something in here should help.
I doubt you can preemptively guess the size of a file without reading it...
How do I use File.ReadAllBytes In chunks
If it is a large file, then reading it in chunks might help.
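For example, a sketch of such a chunked read (the path and chunk size are arbitrary):
using (var stream = File.OpenRead(@"myfile.bin"))
{
    byte[] chunk = new byte[64 * 1024];
    int read;
    while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
    {
        // process the first 'read' bytes of chunk here
    }
}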
