Uploading large files with HttpWebRequest (AllowWriteStreamBuffering doesn't solve this) - C#

My problem is the following:
I am currently testing my application extensively, and I have found that it cannot handle uploads of large files. Of course I read up on this problem, and the AllowWriteStreamBuffering property is already set to false, but when I try to upload a file of about 700 MB my PC freezes (I have 4 GB of RAM and I do not get an OutOfMemoryException). I can neither use the HttpClient class, because I have to support .NET Framework 4, nor chunk the upload, because the target servers do not support that kind of upload.
I think the memory problem is caused by the data I have already sent (while uploading) still being kept in RAM.
These lines of code are responsible for that:
while ((bytesRead = fileStream.Read(fileBuffer, 0, fileBuffer.Length)) != 0)
{
    requestStream.Write(fileBuffer, 0, (int)bytesRead);
}
How can I release the data that has already been sent but is still occupying memory?
If this isn't the cause of the problem, how can I solve it?

I tried several methods, and a kind of internal chunked upload works (even though the SendChunked property of the HttpWebRequest is false):
long bytesRead = 0;
long splitBytes = 1000000; // ≈ 1 MB
int dataPacks = (int)Math.Ceiling((double)file.FileSize / splitBytes);

FileStream fileStream = new FileStream(file.Path, FileMode.Open, FileAccess.Read);
byte[] fileBuffer = new byte[splitBytes];

Stream requestStream = request.GetRequestStream();
requestStream.Write(postBuffer, 0, postBuffer.Length);

// Note: this assumes every Read except the last one fills the buffer completely.
while (fileStream.Read(fileBuffer, 0, fileBuffer.Length) != 0)
{
    if (bytesRead + splitBytes <= file.FileSize)
    {
        requestStream.Write(fileBuffer, 0, fileBuffer.Length);
        requestStream.Flush(); // push the buffered data out to the network
        bytesRead += splitBytes;
    }
    else
    {
        requestStream.Write(fileBuffer, 0, (int)(file.FileSize - bytesRead));
        requestStream.Flush();
        bytesRead += (file.FileSize - bytesRead);
    }
}
It works even on servers which don't accept a real-chunked upload.
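For reference, this only avoids buffering if the HttpWebRequest itself was set up for streaming. A minimal sketch of that setup, under the same assumptions as the snippet above (the URL is a placeholder, file and postBuffer are the question's own objects, and the multipart boundary/footer handling is omitted):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://example.com/upload"); // placeholder URL
request.Method = "POST";
request.AllowWriteStreamBuffering = false;                  // don't buffer the whole body in memory
request.SendChunked = false;                                // the target servers don't accept chunked uploads
request.ContentLength = postBuffer.Length + file.FileSize;  // required when both buffering and chunking are off
request.Timeout = System.Threading.Timeout.Infinite;        // the 100-second default is too short for ~700 MB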

Related

Handling big file stream (read+write bytes)

The following code does the following:
Read all bytes from an input file
Keep only part of the file in outbytes
Write the extracted bytes to the output file
byte[] outbytes = File.ReadAllBytes(sourcefile).Skip(offset).Take(size).ToArray();
File.WriteAllBytes(outfile, outbytes);
But there is a limitation of ~2GB data for each step.
Edit: The extracted bytes size can also be greater than 2GB.
How can I handle big files? What is the best way to proceed with good performance, regardless of size?
Thanks!
Example using FileStream to take the middle 3 GB out of a 5 GB file:
byte[] buffer = new byte[1024 * 1024];
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // seek to 1 GB in
    for (int i = 0; i < 3000; i++) // 3000 reads of one megabyte ≈ 3 GB
    {
        int bytesRead = readFS.Read(buffer, 0, buffer.Length);
        writeFS.Write(buffer, 0, bytesRead);
    }
}
This isn't production-grade code; Read might not return a full megabyte, so you could end up with less than 3 GB. It is more to demonstrate the concept of using two FileStreams, repeatedly reading from one and writing to the other. I'm sure you can modify it so that it copies an exact number of bytes by keeping track of the total of all the bytesRead values in the loop and stopping once you have read enough (see the sketch below).
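One rough way that modification could look (a sketch only, not the answerer's exact code; pathToBigFile and pathToNewFile are the placeholders from above):
long bytesToCopy = 3L * 1024 * 1024 * 1024;          // copy exactly 3 GB
byte[] buffer = new byte[1024 * 1024];
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // skip the first 1 GB
    long copied = 0;
    while (copied < bytesToCopy)
    {
        // never ask for more than is still needed, and stop if the file ends early
        int toRead = (int)Math.Min(buffer.Length, bytesToCopy - copied);
        int bytesRead = readFS.Read(buffer, 0, toRead);
        if (bytesRead == 0) break;
        writeFS.Write(buffer, 0, bytesRead);
        copied += bytesRead;
    }
}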
It is better to stream the data from one file to the other, only loading small parts of it into memory:
public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
    // Open the files as streams
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        // Seek to the start position
        inStream.Seek(startPosition, SeekOrigin.Begin);

        // Create a variable to track how much more to copy
        // and a buffer to temporarily store a section of the file
        long remaining = size;
        byte[] buffer = new byte[81920];

        do
        {
            // Read the smaller of 81920 or remaining, and break out of the loop
            // if we've already reached the end of the file
            int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            // Write the buffered bytes to the output file
            outStream.Write(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}
Usage:
CopyFileSection(sourcefile, outfile, offset, size);
This should be functionally equivalent to your current method, but without the overhead of reading the entire file into memory, regardless of its size.
Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
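A sketch of what that async variant could look like (assuming .NET 4.5 or later, where ReadAsync/WriteAsync exist):
public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
{
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        inStream.Seek(startPosition, SeekOrigin.Begin);
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // read at most the remaining byte count, and stop at end of file
            int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }
            await outStream.WriteAsync(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}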

WebRequest fails to download large files (~ 1 GB) properly

I am attempting to download a large file from a public URL. It seemed to work fine at first, but 1 in 10 computers seems to time out. My initial attempt was to use WebClient.DownloadFileAsync, but because it would never complete I fell back to using WebRequest.Create and reading the response streams directly.
My first version using WebRequest.Create hit the same problem as WebClient.DownloadFileAsync: the operation times out and the file does not complete.
My next version added retries if the download times out. Here is where it gets weird. The download does eventually finish, with one retry to fetch the last 7092 bytes. So the file is downloaded with exactly the same size, BUT the file is corrupt and differs from the source file. Now I would expect the corruption to be in the last 7092 bytes, but this is not the case.
Using BeyondCompare I have found that there are 2 chunks of bytes missing from the corrupt file, totalling the missing 7092 bytes! These missing bytes are at 1CA49FF0 and 1E31F380, way before the download times out and is restarted.
What could possibly be going on here? Any hints on how to track down this problem further?
Here is the code in question.
public void DownloadFile(string sourceUri, string destinationPath)
{
    //roughly based on: http://stackoverflow.com/questions/2269607/how-to-programmatically-download-a-large-file-in-c-sharp
    //not using WebClient.DownloadFileAsync as it seems to stall out on large files rarely for unknown reasons.
    using (var fileStream = File.Open(destinationPath, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        long totalBytesToReceive = 0;
        long totalBytesReceived = 0;
        int attemptCount = 0;
        bool isFinished = false;

        while (!isFinished)
        {
            attemptCount += 1;
            if (attemptCount > 10)
            {
                throw new InvalidOperationException("Too many attempts to download. Aborting.");
            }

            try
            {
                var request = (HttpWebRequest)WebRequest.Create(sourceUri);
                request.Proxy = null;//http://stackoverflow.com/questions/754333/why-is-this-webrequest-code-slow/935728#935728
                _log.AddInformation("Request #{0}.", attemptCount);

                //continue downloading from last attempt.
                if (totalBytesReceived != 0)
                {
                    _log.AddInformation("Request resuming with range: {0} , {1}", totalBytesReceived, totalBytesToReceive);
                    request.AddRange(totalBytesReceived, totalBytesToReceive);
                }

                using (var response = request.GetResponse())
                {
                    _log.AddInformation("Received response. ContentLength={0} , ContentType={1}", response.ContentLength, response.ContentType);
                    if (totalBytesToReceive == 0)
                    {
                        totalBytesToReceive = response.ContentLength;
                    }

                    using (var responseStream = response.GetResponseStream())
                    {
                        _log.AddInformation("Beginning read of response stream.");
                        var buffer = new byte[4096];
                        int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        while (bytesRead > 0)
                        {
                            fileStream.Write(buffer, 0, bytesRead);
                            totalBytesReceived += bytesRead;
                            bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        }
                        _log.AddInformation("Finished read of response stream.");
                    }
                }

                _log.AddInformation("Finished downloading file.");
                isFinished = true;
            }
            catch (Exception ex)
            {
                _log.AddInformation("Response raised exception ({0}). {1}", ex.GetType(), ex.Message);
            }
        }
    }
}
Here is the log output from the corrupt download:
Request #1.
Received response. ContentLength=939302925 , ContentType=application/zip
Beginning read of response stream.
Response raised exception (System.Net.WebException). The operation has timed out.
Request #2.
Request resuming with range: 939295833 , 939302925
Received response. ContentLength=7092 , ContentType=application/zip
Beginning read of response stream.
Finished read of response stream.
Finished downloading file.
This is the method I usually use; it hasn't failed me so far for the same kind of download you need. Try using my code to adjust yours a bit and see if that helps.
if (!Directory.Exists(localFolder))
{
    Directory.CreateDirectory(localFolder);
}

try
{
    HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(Path.Combine(uri, filename));
    httpRequest.Method = "GET";

    // if the URI doesn't exist, exception gets thrown here...
    using (HttpWebResponse httpResponse = (HttpWebResponse)httpRequest.GetResponse())
    {
        using (Stream responseStream = httpResponse.GetResponseStream())
        {
            using (FileStream localFileStream =
                new FileStream(Path.Combine(localFolder, filename), FileMode.Create))
            {
                var buffer = new byte[4096];
                long totalBytesRead = 0;
                int bytesRead;

                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    totalBytesRead += bytesRead;
                    localFileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}
catch (Exception ex)
{
    throw;
}
You should change the timeout settings. There seem to be two possible timeout issues:
Client-side timeout - try changing the timeouts in WebClient (see the sketch after this list); for large file downloads I sometimes find I need to do that.
Server-side timeout - try changing the timeout on the server. You can verify that this is the problem by using another client, e.g. Postman.
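For the client-side option: if you keep using HttpWebRequest directly you can simply raise request.Timeout and request.ReadWriteTimeout, but WebClient does not expose a timeout property, so the usual workaround is to derive from it. A sketch (the 30-minute values are arbitrary):
public class LongTimeoutWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.Timeout = 30 * 60 * 1000;          // time allowed to get a response
            httpRequest.ReadWriteTimeout = 30 * 60 * 1000; // time allowed per read/write on the stream
        }
        return request;
    }
}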
Your method of reading the file with buffering looks very odd to me.
Maybe the problem is that you do
while (bytesRead > 0)
What if, for some reason, the stream doesn't return any bytes at some point, even though it has not yet finished downloading? Then the loop would exit and never come back. You should get the Content-Length and increment a variable totalBytesReceived by bytesRead. Finally, change the loop to
while (totalBytesReceived < ContentLength)
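Roughly illustrated, using the responseStream/fileStream variables from the question (a sketch only; contentLength is assumed to be a local variable holding response.ContentLength):
long totalBytesReceived = 0;
var buffer = new byte[4096];
while (totalBytesReceived < contentLength)
{
    int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
    if (bytesRead == 0)
    {
        // the connection ended before the full content arrived; treat this
        // attempt as failed instead of silently reporting success
        throw new WebException("Connection closed before the full content was received.");
    }
    fileStream.Write(buffer, 0, bytesRead);
    totalBytesReceived += bytesRead;
}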
Don't allocate a buffer as big as the expected file size; use a small fixed-size buffer instead, e.g.
byte[] byteBuffer = new byte[65536];
Otherwise, if the file is 1 GiB in size, you allocate a 1 GiB buffer and then try to fill the whole buffer in one call. That read may return fewer bytes, but you have still allocated the whole buffer. Note also that the maximum length of a single array in .NET is a 32-bit number, so you cannot keep scaling this approach up even if you recompile your program for 64-bit and actually have enough memory available.

File upload with C# and streaming

Looking at the different ways to upload a file in .NET, e.g. HttpPostedFile and using an HttpHandler, I'm trying to understand how the process works in a bit more detail.
Specifically how it writes the information to a file.
Say I have the following:
HttpPostedFile file = context.Request.Files[0];
file.SaveAs(@"c:\temp\file.zip");
The actual file does not seem to get created until the full stream has been processed.
Similarly:
using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
using (Stream input = file.InputStream)
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
I would have thought that this would "progressively" write the file as it reads the stream. Looking at the file system, it does not seem to do this at all. If I set a breakpoint inside the while loop, it does, though.
What I'm trying to do is have it so you upload a file (using a JavaScript uploader) and poll alongside it, whereby the polling AJAX request tries to get the file info (file size) of the uploaded file every second. However, it always returns 0 until the upload is complete.
Vimeo seems to be able to do this type of functionality (for IE)? Is this a .NET limitation, or is there a way to progressively write the file from the stream?
Two points:
First, in Windows, the displayed size of a file is not updated constantly; the file might indeed be growing continually, but the size you see is only refreshed from time to time.
Second (more likely in this case), the stream might not be flushing to disk. You could force it to by adding output.Flush() after the call to output.Write(). You might not want to do that, though, since it will probably have a negative impact on performance.
Perhaps you could poll the Length property of the output stream directly, instead of going through the file system.
EDIT:
To make the Length property of the stream accessible to other threads, you could have a field in your class and update it with each read/write:
private long _uploadedByteCount;

void SomeMethod()
{
    using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
    using (Stream input = file.InputStream)
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
            Interlocked.Add(ref _uploadedByteCount, bytesRead);
        }
    }
}

public long GetUploadedByteCount()
{
    return _uploadedByteCount;
}

CPU usage goes up to 75% while streaming a 300 MB file using a WCF service

I have a WCF service that is used to download files. It's working alright (finally), but I can see that when it downloads, the CPU usage goes to around 75%.
Please advise.
Client Code
FileTransferServiceClient obj = new FileTransferServiceClient();
Byte[] buffer = new Byte[16 * 1024];
CoverScanZipRequest req = new CoverScanZipRequest(new string[] { "1", "2" });
CoverScanZipResponse res = new CoverScanZipResponse();
res = obj.CoverScanZip(req);
int byteRead = res.CoverScanZipResult.Read(buffer, 0, buffer.Length);

Response.Buffer = false;
Response.ContentType = "application/zip";
Response.AddHeader("Content-disposition", "attachment; filename=CoverScans.zip");
Stream outStream = Response.OutputStream;

while (byteRead > 0)
{
    outStream.Write(buffer, 0, byteRead);
    byteRead = res.CoverScanZipResult.Read(buffer, 0, buffer.Length);
}

res.CoverScanZipResult.Close();
outStream.Close();
In this line:
byteRead = res.CoverScanZipResult.Read(buffer, 0, buffer.Length);
Are you taking uncompressed data and zipping it on the fly? If so, that is likely your problem: compressing data can be quite CPU intensive. As a diagnostic test, try simply sending the raw data to the browser and see if the CPU usage goes down. If you are zipping on the fly, and sending the data uncompressed reduces the CPU load, you have two realistic options:
Make sure you have enough server infrastructure to do this.
Zip your files offline so the work can be queued; that way multiple people accessing the service at the same time will not kill the server. You can then save the zip file in a temp folder and email the user a link or similar when it has been processed (see the sketch below).
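A minimal sketch of the offline-zipping idea, assuming the scans for a request already sit in a folder on disk (the paths are hypothetical, ZipFile needs .NET 4.5+ and a reference to System.IO.Compression.FileSystem, and the queueing/notification parts are omitted):
string sourceFolder = @"C:\CoverScans\Request42";       // hypothetical input folder
string zipPath = @"C:\Temp\CoverScans_Request42.zip";   // hypothetical output path
if (File.Exists(zipPath))
{
    File.Delete(zipPath);
}
System.IO.Compression.ZipFile.CreateFromDirectory(sourceFolder, zipPath);
// later, hand the client a link to zipPath instead of compressing on the fly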

How do I use C# and ASP.NET to proxy a WebRequest?

Pretty much... I want to do something like this:
Stream Answer = WebResp.GetResponseStream();
Response.OutputStream = Answer;
Is this possible?
No, but you can of course copy the data, either synchronously or asynchronously.
Allocate a buffer (like 4kb in size or so)
Do a read, which will either return the number of bytes read or 0 if the end of the stream has been reached
If data was received, write the amount read and loop back to the read
Like so:
using (Stream answer = webResp.GetResponseStream()) {
    byte[] buffer = new byte[4096];
    for (int read = answer.Read(buffer, 0, buffer.Length); read > 0; read = answer.Read(buffer, 0, buffer.Length)) {
        Response.OutputStream.Write(buffer, 0, read);
    }
}
This answer has a method CopyStream to copy data between streams (and also indicates the built-in way to do it in .NET 4).
You could do something like:
using (Stream answer = WebResp.GetResponseStream())
{
    CopyStream(answer, Response.OutputStream);
    Response.Flush();
}
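For completeness, the built-in way mentioned above is Stream.CopyTo (available since .NET 4), which wraps the same read/write loop:
using (Stream answer = WebResp.GetResponseStream())
{
    answer.CopyTo(Response.OutputStream);
    Response.Flush();
}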
