I am attempting to download a large file from a public URL. It seemed to work fine at first, but roughly 1 in 10 computers times out. My initial attempt used WebClient.DownloadFileAsync, but because it would never complete I fell back to WebRequest.Create and read the response stream directly.
My first version using WebRequest.Create ran into the same problem as WebClient.DownloadFileAsync: the operation times out and the file does not complete.
My next version added retries when the download times out. Here is where it gets weird. The download does eventually finish, with one retry to fetch the last 7092 bytes, so the file ends up exactly the right size, BUT the file is corrupt and differs from the source file. Now, I would expect the corruption to be in the last 7092 bytes, but this is not the case.
Using BeyondCompare I have found that there are two chunks of bytes missing from the corrupt file, totalling exactly the missing 7092 bytes! These missing bytes are at offsets 1CA49FF0 and 1E31F380, well before the point where the download times out and is restarted.
What could possibly be going on here? Any hints on how to track this problem down further?
Here is the code in question.
public void DownloadFile(string sourceUri, string destinationPath)
{
    //roughly based on: http://stackoverflow.com/questions/2269607/how-to-programmatically-download-a-large-file-in-c-sharp
    //not using WebClient.DownloadFileAsync as it seems to stall out on large files rarely for unknown reasons.
    using (var fileStream = File.Open(destinationPath, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        long totalBytesToReceive = 0;
        long totalBytesReceived = 0;
        int attemptCount = 0;
        bool isFinished = false;
        while (!isFinished)
        {
            attemptCount += 1;
            if (attemptCount > 10)
            {
                throw new InvalidOperationException("Too many attempts to download. Aborting.");
            }
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(sourceUri);
                request.Proxy = null; //http://stackoverflow.com/questions/754333/why-is-this-webrequest-code-slow/935728#935728
                _log.AddInformation("Request #{0}.", attemptCount);
                //continue downloading from last attempt.
                if (totalBytesReceived != 0)
                {
                    _log.AddInformation("Request resuming with range: {0} , {1}", totalBytesReceived, totalBytesToReceive);
                    request.AddRange(totalBytesReceived, totalBytesToReceive);
                }
                using (var response = request.GetResponse())
                {
                    _log.AddInformation("Received response. ContentLength={0} , ContentType={1}", response.ContentLength, response.ContentType);
                    if (totalBytesToReceive == 0)
                    {
                        totalBytesToReceive = response.ContentLength;
                    }
                    using (var responseStream = response.GetResponseStream())
                    {
                        _log.AddInformation("Beginning read of response stream.");
                        var buffer = new byte[4096];
                        int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        while (bytesRead > 0)
                        {
                            fileStream.Write(buffer, 0, bytesRead);
                            totalBytesReceived += bytesRead;
                            bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        }
                        _log.AddInformation("Finished read of response stream.");
                    }
                }
                _log.AddInformation("Finished downloading file.");
                isFinished = true;
            }
            catch (Exception ex)
            {
                _log.AddInformation("Response raised exception ({0}). {1}", ex.GetType(), ex.Message);
            }
        }
    }
}
Here is the log output from the corrupt download:
Request #1.
Received response. ContentLength=939302925 , ContentType=application/zip
Beginning read of response stream.
Response raised exception (System.Net.WebException). The operation has timed out.
Request #2.
Request resuming with range: 939295833 , 939302925
Received response. ContentLength=7092 , ContentType=application/zip
Beginning read of response stream.
Finished read of response stream.
Finished downloading file.
This is the method I usually use; it hasn't failed me so far for the same kind of download you need. Try using my code to change yours up a bit and see if that helps.
if (!Directory.Exists(localFolder))
{
    Directory.CreateDirectory(localFolder);
}
try
{
    // Note: Path.Combine uses the platform directory separator ('\' on Windows),
    // so for building URLs, string concatenation or the Uri class is safer.
    HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(Path.Combine(uri, filename));
    httpRequest.Method = "GET";
    // if the URI doesn't exist, exception gets thrown here...
    using (HttpWebResponse httpResponse = (HttpWebResponse)httpRequest.GetResponse())
    {
        using (Stream responseStream = httpResponse.GetResponseStream())
        {
            using (FileStream localFileStream =
                new FileStream(Path.Combine(localFolder, filename), FileMode.Create))
            {
                var buffer = new byte[4096];
                long totalBytesRead = 0;
                int bytesRead;
                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    totalBytesRead += bytesRead;
                    localFileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}
catch (Exception)
{
    // rethrow without wrapping so the original stack trace is preserved
    throw;
}
You should change the timeout settings. There seem to be two possible timeout issues:
Client-side timeout - try changing the timeouts in WebClient. I find that for large file downloads I sometimes need to do that.
Server-side timeout - try changing the timeout on the server. You can validate that this is the problem using another client, e.g. Postman.
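For the client side, HttpWebRequest exposes Timeout and ReadWriteTimeout directly, but WebClient does not, so a common approach is to subclass it and set the timeout on the underlying request. A minimal sketch, assuming a 30-minute limit is acceptable (the value is purely illustrative):
// WebClient has no public Timeout property, so override GetWebRequest.
public class LongTimeoutWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        request.Timeout = 30 * 60 * 1000; // whole-request timeout, in milliseconds
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // timeout for each individual Read/Write on the stream
            httpRequest.ReadWriteTimeout = 30 * 60 * 1000;
        }
        return request;
    }
}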
To me, your method of reading the file with buffering looks very weird.
Maybe the problem is that you do
while (bytesRead > 0)
What if, for some reason, the stream doesn't return any bytes at some point but it is still not yet finished downloading? Then it would exit the loop and never come back. You should get the Content-Length and increment a variable totalBytesReceived by bytesRead. Finally, you change the loop to
while (totalBytesReceived < contentLength)
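A minimal sketch of that loop, assuming response, responseStream, and fileStream are the objects from the question's code:
long contentLength = response.ContentLength; // total bytes the server promised
long totalBytesReceived = 0;
var buffer = new byte[4096];
while (totalBytesReceived < contentLength)
{
    int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
    if (bytesRead == 0)
    {
        // the server closed the connection early; fail rather than spin forever
        throw new IOException("Connection closed before the full content was received.");
    }
    fileStream.Write(buffer, 0, bytesRead);
    totalBytesReceived += bytesRead;
}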
Another thought: don't allocate a buffer sized to the expected file. A modest fixed-size buffer is enough:
byte[] byteBuffer = new byte[65536];
If the file is 1 GiB in size and you allocate a 1 GiB buffer, you then try to fill the whole buffer in one call; that call may return fewer bytes, but you have still allocated the whole buffer. Note also that the maximum length of a single array in .NET is bounded by a 32-bit index, which means that even if you recompile your program for 64-bit and actually have enough memory available, you still cannot allocate a single array of 2 GiB or more.
Related
I am trying to get the body of a request in an ASP.NET Core controller as a byte[] array. Here is what I initially wrote:
var declaredLength = (int)request.ContentLength;
byte[] fileBuffer = new byte[declaredLength];
request.Body.Read(fileBuffer, 0, declaredLength);
This code works, but only for small requests (around ~20 KB). For larger requests it fills the first 20,000 or so bytes of the array, and the rest of the array is empty.
I used some code in the top answer here, and was able to read the entire request body successfully after rewriting my code:
var declaredLength = (int)request.ContentLength;
byte[] fileBuffer = new byte[declaredLength];
// need to enable, otherwise Seek() fails
request.EnableRewind();
// using StreamReader apparently resolves the issue
using (var reader = new StreamReader(request.Body, Encoding.UTF8, true, 1024, true))
{
    reader.ReadToEnd();
}
request.Body.Seek(0, SeekOrigin.Begin);
request.Body.Read(fileBuffer, 0, declaredLength);
Why is StreamReader.ReadToEnd() able to read the entire request body successfully, while Stream.Read() can't? Reading the request stream twice feels like a hack. Is there a better way to go about this? (I only need to read the stream into a byte array once)
Remember that you're trying to read request.Body before all of the request has been received.
Stream.Read behaves like this:
If the end of the stream has been reached, return 0
If there are no bytes available which haven't already been read, block until at least 1 byte is available
If 1 or more new bytes are available, return them straight away. Don't block.
As you can see, if the whole body hasn't been received yet, request.Body.Read(...) will just return the part of the body that has been received.
StreamReader.ReadToEnd() calls Stream.Read in a loop, until it finds the end of the stream.
You should probably call Stream.Read in a loop as well, until you've read all of the bytes:
byte[] fileBuffer = new byte[declaredLength];
int numBytesRead = 0;
while (numBytesRead < declaredLength)
{
    int readBytes = request.Body.Read(fileBuffer, numBytesRead, declaredLength - numBytesRead);
    if (readBytes == 0)
    {
        // We reached the end of the stream before we were expecting it.
        // Throw rather than loop forever on a truncated request.
        throw new EndOfStreamException("Request body ended before Content-Length bytes were read.");
    }
    numBytesRead += readBytes;
}
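If buffering the whole body in memory is acceptable, a sketch of an alternative that avoids the manual loop is to copy the body into a MemoryStream and take its array (in an async action you would use CopyToAsync instead):
// Copy the request body into a MemoryStream, then grab it as a byte array.
// Fine for modestly sized requests; for large uploads, stream to disk instead.
byte[] fileBuffer;
using (var ms = new MemoryStream())
{
    request.Body.CopyTo(ms);
    fileBuffer = ms.ToArray();
}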
I am experiencing some strange behaviour from my code which I am using to stream files to my clients.
I have an MSSQL server which acts as a file store, with files that are accessed via a UNC path.
On my web server I have some .NET code running that handles streaming the files (in this case pictures and thumbnails) to my clients.
My code works, but I am experiencing a constant delay of ~12 sec on the initial file request. Once I have made the initial request, it is as if the server wakes up and suddenly becomes responsive, only to fall back into the same behaviour some time later.
At first I thought it was my code, but from what I can see in the server activity log there is no resource-intensive code running. My theory is that at each call to the server the path must first be mounted, and that is what causes the delay. It then unmounts some time later and has to be remounted.
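One way to test that theory, as a diagnostic sketch separate from the streaming code, is to time the file open and the first read independently; if the open dominates, the delay is in reaching the UNC share rather than in the code below:
var sw = System.Diagnostics.Stopwatch.StartNew();
using (var probeStream = fileInfo.OpenRead())
{
    long openMs = sw.ElapsedMilliseconds;
    var probe = new byte[4096];
    probeStream.Read(probe, 0, probe.Length);
    // log these numbers however you prefer
    System.Diagnostics.Debug.WriteLine(
        "open: {0} ms, first read: {1} ms", openMs, sw.ElapsedMilliseconds - openMs);
}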
For reference I am posting my code (maybe I just cannot see the problem):
public static async Task StreamFileAsync(HttpContext context, FileInfo fileInfo)
{
    //This controls how many bytes to read at a time and send to the client
    int bytesToRead = 512 * 1024; // 512KB
    // Buffer to read bytes in chunk size specified above
    byte[] buffer = new Byte[bytesToRead];
    // Clear the current response content/headers
    context.Response.Clear();
    context.Response.ClearHeaders();
    //Indicate the type of data being sent
    context.Response.ContentType = FileTools.GetMimeType(fileInfo.Extension);
    //Name the file
    context.Response.AddHeader("Content-Disposition", "filename=\"" + fileInfo.Name + "\"");
    context.Response.AddHeader("Content-Length", fileInfo.Length.ToString());
    // Open the file
    using (var stream = fileInfo.OpenRead())
    {
        // The number of bytes read
        int length;
        do
        {
            // Verify that the client is connected
            if (context.Response.IsClientConnected)
            {
                // Read data into the buffer
                length = await stream.ReadAsync(buffer, 0, bytesToRead);
                // and write it out to the response's output stream
                await context.Response.OutputStream.WriteAsync(buffer, 0, length);
                try
                {
                    // Flush the data
                    context.Response.Flush();
                }
                catch (HttpException)
                {
                    // Cancel the download if a HttpException happens
                    // (ie. the client has disconnected but we tried to send some data)
                    length = -1;
                }
                //Clear the buffer
                buffer = new Byte[bytesToRead];
            }
            else
            {
                // Cancel the download if client has disconnected
                length = -1;
            }
        } while (length > 0); //Repeat until no data is read
    }
    // Tell the response not to send any more content to the client
    context.Response.SuppressContent = true;
    // Tell the application to skip to the EndRequest event in the HTTP pipeline
    context.ApplicationInstance.CompleteRequest();
}
If anyone could shed some light on this problem I would be very grateful!
My problem is the following:
I am currently testing my application extensively, and I have learned that it isn't able to handle uploads of large files. Of course I informed myself about this problem, and the AllowWriteStreamBuffering property is already set to false, but when I try to upload a file of ~700 MB my PC freezes (I have 4 GB RAM and I don't get an OutOfMemoryException). I can't use the HttpClient class because I have to support .NET Framework 4, and I can't chunk the upload because the target servers don't support that kind of upload.
I think the memory problem is caused by the data I have already sent (while uploading) still being held in RAM.
These lines of code are responsible for that:
while ((bytesRead = fileStream.Read(fileBuffer, 0, fileBuffer.Length)) != 0)
{
    requestStream.Write(fileBuffer, 0, (int)bytesRead);
}
How can I release the data which has already been sent but is still occupying my memory?
If this isn't the cause of the problem, how can I solve it?
I tried several methods, and a kind of internal chunked upload (with the SendChunked property of the HttpWebRequest still false) works:
long bytesRead = 0;
long splitBytes = 1000000; // ≈ 1 MB
int dataPacks = (int)Math.Ceiling((double)file.FileSize / splitBytes);

FileStream fileStream = new FileStream(file.Path, FileMode.Open, FileAccess.Read);
byte[] fileBuffer = new byte[splitBytes];
Stream requestStream = request.GetRequestStream();
requestStream.Write(postBuffer, 0, postBuffer.Length);

int read;
while ((read = fileStream.Read(fileBuffer, 0, fileBuffer.Length)) != 0)
{
    // write only the bytes actually read; the final chunk is usually shorter
    requestStream.Write(fileBuffer, 0, read);
    // flush the request stream (not the file stream) so each sent chunk
    // leaves memory instead of accumulating
    requestStream.Flush();
    bytesRead += read;
}
It works even on servers which don't accept a real chunked upload.
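One detail worth noting (this is an assumption about code not shown above): with AllowWriteStreamBuffering set to false, HttpWebRequest must know the request size up front, either via SendChunked = true or by setting ContentLength before GetRequestStream() is called. A sketch of the latter, where uploadUri is a hypothetical target and postBuffer/file come from the code above:
var request = (HttpWebRequest)WebRequest.Create(uploadUri); // uploadUri: hypothetical
request.Method = "POST";
request.AllowWriteStreamBuffering = false;
// with buffering disabled and SendChunked == false, the total length must be
// declared, or GetRequestStream() throws a ProtocolViolationException
request.ContentLength = postBuffer.Length + file.FileSize;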
Looking at the different ways to upload a file in .NET, e.g. HttpPostedFile, and using an HttpHandler, I'm trying to understand how the process works in a bit more detail.
Specifically, how it writes the information to a file.
Say I have the following:
HttpPostedFile file = context.Request.Files[0];
file.SaveAs(@"c:\temp\file.zip");
The actual file does not seem to get created until the full stream has been processed.
Similarly:
using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
using (Stream input = file.InputStream)
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
I would have thought that this would "progressively" write the file as it reads the stream. Looking at the file system, it does not seem to do this at all. If I put a breakpoint inside the while, though, it does.
What I'm trying to do is have the user upload a file (using a JavaScript uploader) and poll alongside the upload, whereby the polling AJAX request tries to get the FileInfo (file size) of the uploaded file every second. However, it always returns 0 until the upload is complete.
Vimeo seems to be able to do this type of functionality (for IE)?? Is this a .NET limitation, or is there a way to progressively write the file from the stream?
Two points:
First, in Windows, the displayed size of a file is not updated constantly. The file might indeed be growing continually, but the reported size only updates every so often.
Second (more likely in this case), the stream might not be flushing to the disk. You could force it to by adding output.Flush() after the call to output.Write(). You might not want to do that, though, since it will probably have a negative impact on performance.
Perhaps you could poll the Length property of the output stream directly, instead of going through the file system.
EDIT:
To make the Length property of the stream accessible to other threads, you could have a field in your class and update it with each read/write:
private long _uploadedByteCount;

void SomeMethod()
{
    using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
    using (Stream input = file.InputStream)
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
            Interlocked.Add(ref _uploadedByteCount, bytesRead);
        }
    }
}

public long GetUploadedByteCount()
{
    // in a 32-bit process, use Interlocked.Read(ref _uploadedByteCount)
    // to avoid torn reads of the 64-bit counter
    return _uploadedByteCount;
}
I download a file in parts (a picture), and then I want to save these parts into one file.
The problem is that the first part is downloaded and saved properly (I can see that part of the picture). But when the second part is saved (FileMode.Append), the picture appears broken.
Here's the code:
HttpWebRequest webRequest;
HttpWebResponse webResponse;
Stream responseStream;
long StartPosition, EndPosition;

if (File.Exists(LocalPath))
    fileStream = new FileStream(LocalPath, FileMode.Append);
else
    fileStream = new FileStream(LocalPath, FileMode.Create);

webRequest = (HttpWebRequest)WebRequest.Create(FileURL);
webResponse = (HttpWebResponse)webRequest.GetResponse();
responseStream = webResponse.GetResponseStream();

StartPosition = 0; //download first 52062 bytes of the file
EndPosition = 52061;
webRequest.AddRange(StartPosition, EndPosition);

int SeekPosition = (int)StartPosition;
while ((bytesSize = responseStream.Read(Buffer, 0, Buffer.Length)) > 0)
{
    lock (fileStream)
    {
        fileStream.Seek(SeekPosition, SeekOrigin.Begin);
        fileStream.Write(Buffer, 0, bytesSize);
    }
    //the Buffer.Length is 2048.
    //When the bytes count to download is < 2048 then I decrease the Buffer.Length
    //to prevent downloading more than 52062 bytes.
    DownloadedBytesCount += bytesSize;
    SeekPosition += bytesSize;
    long TotalToDownload = EndPosition - StartPosition;
    long bytesLeft = TotalToDownload - DownloadedBytesCount;
    if (bytesLeft < Buffer.Length)
        Buffer = new byte[bytesLeft];
}
When I want to download the second part of the file, I set
StartPosition = 52062;
EndPosition = 104122;
and then I run into the problem described above. Why is the file not appended properly?
You don't need StartPosition, fileStream.Seek(), or Buffer = new byte[bytesLeft];
Also, the lock() shouldn't be necessary (if it is, you've got a lot more trouble).
So remove all of that, because the chances are you got some of it wrong.
And if it still doesn't work then, edit the question and provide more information. Quite a lot is missing right now:
Could you verify with the debugger whether the download loop is executed at all?
How is the changeover to the 2nd range (52k - 104k) performed?
How long is the resulting file in the end?
Does the file contain the first 52k bytes or the 2nd download?
etc.
All of that matters and we shouldn't have to guess.
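For comparison, a minimal sketch of a sequential two-range download, with the Range header added before GetResponse() is called and the file simply opened in append mode (FileURL and LocalPath as in the question; DownloadRange is a hypothetical helper):
// Downloads the inclusive byte range [from, to] from url and appends it to path.
static void DownloadRange(string url, string path, long from, long to)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.AddRange(from, to); // must be set before GetResponse()
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var responseStream = response.GetResponseStream())
    using (var fileStream = new FileStream(path, FileMode.Append, FileAccess.Write))
    {
        var buffer = new byte[2048];
        int bytesRead;
        while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            fileStream.Write(buffer, 0, bytesRead);
        }
    }
}

// usage, mirroring the question's two parts:
// DownloadRange(FileURL, LocalPath, 0, 52061);
// DownloadRange(FileURL, LocalPath, 52062, 104122);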
What I would try is to download the image some way that you know works, and compare the byte result to check where the file gets broken and what is breaking it...
This code is wicked... sorry, but you should start by deleting all of it and looking at your problem from the beginning. There are many better ways to accomplish what you want. Just take a look at some good solutions:
http://www.codeproject.com/KB/IP/MyDownloader.aspx