I'm currently building an application that is, among other things, going to download large files from an FTP server. Everything works fine for small files (< 50 MB), but the files I'm downloading are much bigger, mostly over 2 GB.
I've been trying a WebClient with DownloadFileAsync() and a list system, as I'm downloading these files one after the other because of their size.
DownloadClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(DownloadProgress);
DownloadClient.DownloadFileCompleted += new AsyncCompletedEventHandler(DownloadCompleted);
private void FileDownload()
{
DownloadClient.DownloadFileAsync(new Uri(@"ftp://" + RemoteAddress + FilesToDownload[0]), LocalDirectory + FilesToDownload[0]);
}
private void DownloadProgress(object sender, DownloadProgressChangedEventArgs e)
{
// Handle progress
}
private void DownloadCompleted(object sender, AsyncCompletedEventArgs e)
{
FilesToDownload.RemoveAt(0);
if (FilesToDownload.Count > 0) // guard so we don't index an empty list after the last file
{
    FileDownload();
}
}
It works absolutely fine this way on small files: they are all downloaded one by one, the progress is reported, and DownloadCompleted fires after each file. The issue I'm facing with big files is that it launches the first download without any problem but does nothing after that. The DownloadCompleted event never fires, for some reason. It looks like the WebClient doesn't know that the file has finished downloading, which is a problem because I use this event to launch the next download in the FilesToDownload list.
I've also tried to do that synchronously using WebClient.DownloadFile and a for loop to cycle through my FilesToDownload list. It downloads the first file correctly and I get an exception when the second download should start: "The underlying connection was closed: An unexpected error occurred on a receive".
Finally, I've tried to do this over FTP using edtFTPnet, but I'm facing download speed issues (i.e., my download runs at full speed with the WebClient, and I only get about a third of that speed with the edtFTPnet library).
Any thoughts? I have to admit that I'm running out of ideas here.
public string GetRequest(Uri uri, int timeoutMilliseconds)
{
var request = System.Net.WebRequest.Create(uri);
request.Timeout = timeoutMilliseconds;
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
using (var reader = new System.IO.StreamReader(stream))
{
return reader.ReadToEnd();
}
}
Forgot to update this thread, but I figured out how to sort this out a while ago.
The issue was that the data connection opened for a file transfer randomly times out for some reason, or is closed by the server before the transfer ends. I haven't been able to figure out why, however, as there are a load of local and external network interfaces between my computer and the remote server. As it's totally random (i.e. the transfer works fine for five files in a row, times out for one file, works fine for the following files, etc.), the issue may be server- or network-related.
I'm now catching any FTP exception raised by the FTP client object during the download and issuing a REST command with an offset equal to the position in the data stream where the transfer stopped (i.e. the number of bytes already written to the local file). Doing so retrieves the remaining bytes that are missing from the local file.
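For reference, a minimal sketch of that resume step, assuming the retry is done with FtpWebRequest (the question used WebClient and edtFTPnet, so the exact client may differ); remoteUri, localPath and credentials are placeholder names, and ContentOffset is what causes the REST command to be sent before RETR:
// Sketch only: resume a partial FTP download by restarting from the bytes already on disk.
static void ResumeDownload(Uri remoteUri, string localPath, System.Net.NetworkCredential credentials)
{
    // Offset = how much of the file we already have locally.
    long alreadyDownloaded = new System.IO.FileInfo(localPath).Length;

    var request = (System.Net.FtpWebRequest)System.Net.WebRequest.Create(remoteUri);
    request.Method = System.Net.WebRequestMethods.Ftp.DownloadFile;
    request.Credentials = credentials;
    request.ContentOffset = alreadyDownloaded; // issues REST <offset> before RETR

    using (var response = request.GetResponse())
    using (var ftpStream = response.GetResponseStream())
    using (var fileStream = new System.IO.FileStream(localPath, System.IO.FileMode.Append, System.IO.FileAccess.Write))
    {
        ftpStream.CopyTo(fileStream); // .NET 4+; on older frameworks copy in a buffered loop
    }
}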
I have logic that downloads a group of files as a zip. The issue is that there is no progress indicator, so the user does not know how far along the download is.
The zip file doesn't exist beforehand; the user selects the files they want to download, and then I use the SharpZipLib NuGet package to create a zip and stream it to the response.
It seems I need to set the Content-Length header for the browser to show a total-size progress indicator. The problem is that this value has to be exact: if it's too low or too high by even 1 byte, the file does not get downloaded properly. I can get an approximate final size by adding all the file sizes together and setting the compression level to zero (store only), but I don't see a way to calculate the final zip size exactly.
I hoped I could have just overestimated the final size a bit and the browser would allow that, but it doesn't: the file isn't downloaded properly, so you can't access it.
Here are some possible solutions I've come up with, but they have their own issues.
1 - I can create the zip on the server first and then stream it, so I know the exact size and can set the Content-Length. The issue with this is that the user will have to wait for all the files to be streamed to the web server and the zip to be created before I can start streaming it to them. While this is going on, the user won't even see the download as having started. It also increases the web server's memory usage, as it has to hold the entire zip file in memory.
2 - I can build my own progress UI: use the combined file sizes to get a rough final size estimate, then push updates to the user via SignalR as the files are streamed, indicating the progress.
3 - I can show the user the total file size before the download begins, so they at least have a way to judge for themselves how far along it is. But the browser itself gives no indication of progress, so if they forget the total and look at the browser's download progress, there is nothing to tell them how far along it is.
These all have their own drawbacks. Is there a better way to do this, ideally so it's all handled by the browser?
Below is my ZipFilesToResponse method. It uses some objects that aren't shown here for simplicity's sake. It also streams the files from Azure Blob Storage.
public void ZipFilesToResponse(HttpResponseBase response, IEnumerable<Tuple<string,string>> filePathNames, string zipFileName)
{
using (var zipOutputStream = new ZipOutputStream(response.OutputStream))
{
zipOutputStream.SetLevel(0); // compression level: 0 = store only, 9 = best compression
response.BufferOutput = false;
response.AddHeader("Content-Disposition", "attachment; filename=" + zipFileName);
response.ContentType = "application/octet-stream";
Dictionary<string,long> sizeDictionary = new Dictionary<string, long>();
long totalSize = 0;
foreach (var file in filePathNames)
{
long size = GetBlobProperties(file.Item1).Length;
totalSize += size;
sizeDictionary.Add(file.Item1,size);
}
//The zip download breaks if we don't set an exact Content-Length,
//and it isn't necessarily the total length of the contents.
//There's no simple way to get it right without building the entire zip on the server first,
//so for now we won't include a Content-Length.
//response.AddHeader("Content-Length",totalSize.ToString());
foreach (var file in filePathNames)
{
long size = sizeDictionary[file.Item1];
var entry = new ZipEntry(file.Item2)
{
DateTime = DateTime.Now,
Size = size
};
zipOutputStream.PutNextEntry(entry);
Container.GetBlockBlobReference(file.Item1).DownloadToStream(zipOutputStream);
response.Flush();
if (!response.IsClientConnected)
{
break;
}
}
zipOutputStream.Finish();
zipOutputStream.Close();
}
response.End();
}
I have a ListBox that contains a list of DirectAdmin user backups. The list is populated using WebRequestMethods.Ftp.ListDirectory.
I can download an archive using the button at the bottom right. When I click on the button, another form appears and downloads the archive.
My download code is this:
public static void DownloadFile(string server, string username, ...)
{
Uri URI = new Uri($"ftp://{server}/{targetFilePath}");
using (WebClient client = new WebClient())
{
client.Credentials = new NetworkCredential(username, password);
if (progress != null)
{
client.DownloadProgressChanged += new DownloadProgressChangedEventHandler(progress);
}
if (complete != null)
{
client.DownloadFileCompleted += new AsyncCompletedEventHandler(complete);
}
before?.Invoke();
client.DownloadFileAsync(URI, localFilePath);
}
}
and this is what I pass to the DownloadFile() method for the DownloadProgressChanged event:
delegate (object s2, DownloadProgressChangedEventArgs e2)
{
TransferLabel.Invoke((MethodInvoker)delegate
{
TransferLabel.Text = $"{(e2.BytesReceived / 1024).ToString()} KB / {(e2.TotalBytesToReceive / 1024).ToString()} KB";
});
TransferProgressBar.Invoke((MethodInvoker)delegate
{
TransferProgressBar.Value = (int)(e2.BytesReceived / (float)e2.TotalBytesToReceive * 100);
});
}
I'm using this same approach to upload a file and it works fine, but with the download, e2.TotalBytesToReceive returns -1 throughout the process; only when it's done do I get the correct value.
Why is that?
I've found a workaround for the problem. I'll change the ListBox to a ListView and also store the file sizes of the archives using ListDirectoryDetails. That way I can compare e.BytesReceived to the stored total bytes instead of e.TotalBytesToReceive. This would solve my problem, but I'm still curious: why do I get -1? Am I doing something wrong, or is this a server-related problem? Also, is there anything I can do to fix it (get the correct value)?
With the FTP protocol, WebClient in general does not know the total download size, so you commonly get -1 with FTP.
See also Download file from FTP with Progress - TotalBytesToReceive is always -1?
Note that the behavior actually contradicts the .NET documentation, which says for FtpWebResponse.ContentLength (where the value of TotalBytesToReceive comes from):
For requests that use the DownloadFile method, the property is greater than zero if the downloaded file contained data and is zero if it was empty.
But you will easily find many questions about this (like the one linked above), effectively showing that the behavior is not always as documented. FtpWebResponse.ContentLength has a meaningful value for the GetFileSize method only.
FtpWebRequest/WebClient makes no explicit attempt to find out the size of the file it is downloading. All it does is look for an "(xxx bytes)" string in the 125/150 responses to the RETR command. No FTP RFC mandates that the server include such information. ProFTPD (see data_pasv_open in src/data.c) and vsftpd (see handle_retr in postlogin.c) seem to include this information; other common FTP servers (IIS, FileZilla) do not.
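If you need a real total for a progress bar, one option (a sketch, not the poster's code) is to ask for the size separately with WebRequestMethods.Ftp.GetFileSize, where ContentLength is meaningful, and report e.BytesReceived against that value instead of e.TotalBytesToReceive; the URI and credentials below are placeholders:
static long GetRemoteFileSize(Uri remoteUri, System.Net.NetworkCredential credentials)
{
    var request = (System.Net.FtpWebRequest)System.Net.WebRequest.Create(remoteUri);
    request.Method = System.Net.WebRequestMethods.Ftp.GetFileSize; // FTP SIZE command
    request.Credentials = credentials;

    using (var response = (System.Net.FtpWebResponse)request.GetResponse())
    {
        return response.ContentLength; // the size reported by SIZE
    }
}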
Certainly for HTTP downloads it's possible for the server not to supply size information when performing a file download and you're left with no sensible information until the server signals that it's done.
Not sure for FTP (I'd note that there's a separate SIZE command defined in the FTP command set and so including such information during a Retrieve may be considered redundant).
I'm slightly surprised that the documentation for TotalBytesToReceive isn't more explicit about the possibility that the information will not be available and what will be returned in such circumstances.
I created a service that moves certain file types in a directory to another one. This works fine locally and pretty fast over my network. On a different network, though, it works incredibly slowly (a 500 MB file takes 6.5 minutes), yet the same file copied and pasted into the folder via Explorer completes in about 30-40 seconds.
Snippet where the file move happens:
currentlyProcessing.Add(currentFile.FullName);
try
{
eventMsg("Attempting to move file","DEBUG");
File.Move(oldFilePath, newFilePath);
eventMsg("File Moved successfully","DEBUG");
}
catch (Exception ex)
{
eventMsg("Cannot Move File another resource is using it", "DEBUG");
eventMsg("Move File Exception : " + ex, "DEBUG");
}
finally
{
if(File.Exists(currentFile.FullName + ".RLK"))
{
try
{
File.Delete(currentFile.FullName + ".RLK");
}
catch (IOException e)
{
eventMsg("File Exception : " + e, "DEBUG");
}
}
currentlyProcessing.Remove(oldFilePath);
}
I suspect the code is fine (as it works as expected on the other network), so the problem is probably the network in some shape or form. Has anyone got any common things to check? The service runs as Local System (or Network Service) and there doesn't seem to be an access problem. What other factors would affect this (other than network/hardware)?
What I would like is for it to have a transfer speed similar to what I've witnessed in Explorer. Any pointers greatly appreciated.
First of all, ping each machine to check latency, so you can see whether the problem is in the network in general or something in your software:
ping 192.168.1.12 -n 10
If it's a problem on the network, check the following:
Restart your hub/router.
Is one of the PCs using WiFi with a low signal?
Is there any antivirus turned on and monitoring network activity?
If none of the above solves your problem, then try using Wireshark to investigate the issue further.
For files of such huge size, I would suggest zipping them up if possible, especially since network latency is always a big factor in uploading/downloading files over the internet.
You can zip your files with System.IO.Packaging. Have a look at Using System.IO.Packaging to generate a ZIP file, specifically:
using (Package zip = System.IO.Packaging.Package.Open(zipFilename, FileMode.OpenOrCreate))
{
string destFilename = ".\\" + Path.GetFileName(fileToAdd);
Uri uri = PackUriHelper.CreatePartUri(new Uri(destFilename, UriKind.Relative));
if (zip.PartExists(uri))
{
zip.DeletePart(uri);
}
PackagePart part = zip.CreatePart(uri, "", CompressionOption.Normal);
using (FileStream fileStream = new FileStream(fileToAdd, FileMode.Open, FileAccess.Read))
{
using (Stream dest = part.GetStream())
{
CopyStream(fileStream, dest);
}
}
}
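Note that the snippet relies on a CopyStream helper that isn't defined in the excerpt; a minimal sketch of what it could look like (on .NET 4 or later, fileStream.CopyTo(dest) does the same job):
private static void CopyStream(System.IO.Stream source, System.IO.Stream target)
{
    // Copy the source stream to the target in fixed-size chunks.
    byte[] buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        target.Write(buffer, 0, bytesRead);
    }
}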
Also, you could use SFTP, as another user mentioned, if possible. Look at the Renci SSH.NET library on CodePlex.
I have implemented something similar to this
only real difference is
string filename = context.Request.RawUrl.Replace("/", "\\").Remove(0,1);
string path = Uri.UnescapeDataString(Path.Combine(_baseFolder, filename));
so that I can traverse to subdirectories. This works great for web pages and other text file types, but when trying to serve up media content I get the exception:
HttpListenerException: The I/O operation has been aborted because of either a thread exit or an application request
Followed by
InvalidOperationException: Cannot close stream until all bytes are written.
In the using statement.
Any suggestions on how to handle this or stop these exceptions?
Thanks
I should mention that I am using Google Chrome as my browser (Chrome doesn't seem to care about the MIME types; when it sees audio it will try to use it as if it were in an HTML5 player), but this also applies if you are trying to host media content in a page.
Anyway, I was inspecting my headers with Fiddler and noticed that Chrome sends three requests to the server. I started playing with other browsers and noticed they did not do this, but depending on the browser and what I had hard-coded as the MIME type, I would either get a page of garbled text or a download of the file.
On further inspection I noticed that Chrome would first request the file, then request the file again with a few different headers, most notably the Range header: the first with bytes=0-, then the next with a different range depending on how large the file was (more than three requests can be made, depending on the file size).
So there was the problem. Chrome first asks for the file. Once it sees the type, it sends another request that appears to check how large the file is (bytes=0-), then another asking for the second half of the file, or something similar, to allow the sort of streaming experience you get with HTML5. I quickly coded something up to handle MIME types, threw together an HTML5 page with the audio element, and found that other browsers also do this (except IE).
So here is a quick solution, and I no longer get these errors:
string range = context.Request.Headers["Range"];
int rangeBegin = 0;
int rangeEnd = msg.Length;
if (range != null)
{
string[] byteRange = range.Replace("bytes=", "").Split('-');
Int32.TryParse(byteRange[0], out rangeBegin);
if (byteRange.Length > 1 && !string.IsNullOrEmpty(byteRange[1]))
{
Int32.TryParse(byteRange[1], out rangeEnd);
}
}
context.Response.ContentLength64 = rangeEnd - rangeBegin;
using (Stream s = context.Response.OutputStream)
{
s.Write(msg, rangeBegin, rangeEnd - rangeBegin);
}
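As a side note (not part of the original fix), a range-aware handler would normally also advertise the partial response before writing the body; a sketch using the same context, range, rangeBegin, rangeEnd and msg variables as above, and bearing in mind that the end value in an HTTP Range header is inclusive:
if (range != null)
{
    // Declare this a partial response covering the requested byte range.
    context.Response.StatusCode = 206; // Partial Content
    context.Response.AddHeader("Content-Range",
        string.Format("bytes {0}-{1}/{2}", rangeBegin, rangeEnd - 1, msg.Length));
}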
Try:
using (Stream s = context.Response.OutputStream)
{
s.Write(msg, 0, msg.Length);
s.Flush();
}
I am writing a simple web service using .NET. One method is used to send a chunk of a file from the client to the server; the server opens a temp file and appends the chunk. The files are quite large (80 MB). The network I/O seems fine, but the append write to the local file slows down progressively as the file gets larger.
The following is the code that slows down, running on the server, where aFile is a string and aData is a byte[]:
using (StreamWriter lStream = new StreamWriter(aFile, true))
{
BinaryWriter lWriter = new BinaryWriter(lStream.BaseStream);
lWriter.Write(aData);
}
Debugging this process, I can see that exiting the using statement gets slower and slower.
If I run this code in a simple standalone test application, the writes take the same time every run, about 3 ms; note the buffer (aData) is always the same size, about 0.5 MB.
I have tried all sorts of experiments with different writers and system copies to append scratch files; all slow down when running under the web service.
Why is this happening? I suspect the web service is trying to cache access to local file system objects, how can I turn this off for specific files?
More information:
If I hard-code the path, the speed is fine, like so:
using (StreamWriter lStream = new StreamWriter("c:\\test.dat", true))
{
BinaryWriter lWriter = new BinaryWriter(lStream.BaseStream);
lWriter.Write(aData);
}
But then it is slow when copying this scratch file to the final file destination later on:
File.Copy("c:\\test.dat", aFile);
If I use any variable in the path, it gets slow again, for example:
using (StreamWriter lStream = new StreamWriter("c:\\test" + someVariable, true))
{
BinaryWriter lWriter = new BinaryWriter(lStream.BaseStream);
lWriter.Write(aData);
}
It has been commented that I should not use StreamWriter; note that I tried many ways to open the file using FileStream, none of which made any difference when the code runs under the web service. I tried WriteThrough, etc.
It's the strangest thing; I even tried this:
Write the data to file a.dat
Spawn system "cmd" "copy /b b.dat + a.dat b.dat"
Delete a.dat
This slows down the same way????
This makes me think the web server is running in some protected file I/O environment, catching all file operations in this process and its child processes. I could understand that if I were generating a file that might later be served to a client, but I'm not; what I am doing is storing large binary blobs on disk, with an index/pointer to them stored in a database. If I comment out the write to the file, the whole process flies, with no performance issues at all.
I started reading about web server caching strategies, which makes me wonder: is there a web.config setting to mark a folder as uncached? Or am I completely barking up the wrong tree?
A long shot: is it possible that you need to close some resources when you have finished?
If the file is binary, then why are you using a StreamWriter, which is derived from TextWriter? Just use a FileStream.
Also, BinaryWriter implements IDisposable, so you need to put it in a using block.
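A minimal sketch of that suggestion, reusing the aFile and aData names from the question:
using (var stream = new System.IO.FileStream(aFile, System.IO.FileMode.Append, System.IO.FileAccess.Write))
using (var writer = new System.IO.BinaryWriter(stream))
{
    writer.Write(aData); // append the chunk; both objects are disposed afterwards
}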
Update....I replicated the basic code, no database, simple and it seems to work fine, so I suspect there is another reason, I will rest on it over the weekend....
Here is the replicated server code -
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Services;
using System.IO;
namespace TestWS
{
/// <summary>
/// Summary description for Service1
/// </summary>
[WebService(Namespace = "http://tempuri.org/")]
[WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
[System.ComponentModel.ToolboxItem(false)]
// To allow this Web Service to be called from script, using ASP.NET AJAX, uncomment the following line.
// [System.Web.Script.Services.ScriptService]
public class Service1 : System.Web.Services.WebService
{
private string GetFileName ()
{
if (File.Exists("index.dat"))
{
using (StreamReader lReader = new StreamReader("index.dat"))
{
return lReader.ReadLine();
}
}
else
{
using (StreamWriter lWriter = new StreamWriter("index.dat"))
{
string lFileName = Path.GetRandomFileName();
lWriter.Write(lFileName);
return lFileName;
}
}
}
[WebMethod]
public string WriteChunk(byte[] aData)
{
Directory.SetCurrentDirectory(Server.MapPath("Data"));
DateTime lStart = DateTime.Now;
using (FileStream lStream = new FileStream(GetFileName(), FileMode.Append))
{
BinaryWriter lWriter = new BinaryWriter(lStream);
lWriter.Write(aData);
}
DateTime lEnd = DateTime.Now;
return lEnd.Subtract(lStart).TotalMilliseconds.ToString();
}
}
}
And the replicated client code:
static void Main(string[] args)
{
Service1 s = new Service1();
byte[] b = new byte[1024 * 512];
for ( int i = 0 ; i < 160 ; i ++ )
{
Console.WriteLine(s.WriteChunk(b));
}
}
Based on your code, it appears you're using the default handling inside of StreamWriter for files, which means synchronous and exclusive locks on the file.
Based on your comments, it seems the issue you really want to solve is the return time from the web service -- not necessarily the write time for the file. While the write time is the current gating factor as you've discovered, you might be able to get around your issue by going to an asynchronous-write mode.
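For illustration only, an asynchronous append could look something like this; it reuses the aFile and aData names from the question but assumes a newer framework than the one in this thread, since it uses async/await:
static async System.Threading.Tasks.Task AppendChunkAsync(string aFile, byte[] aData)
{
    // useAsync: true opens the file for overlapped (asynchronous) I/O.
    using (var stream = new System.IO.FileStream(aFile, System.IO.FileMode.Append,
        System.IO.FileAccess.Write, System.IO.FileShare.None, 4096, useAsync: true))
    {
        await stream.WriteAsync(aData, 0, aData.Length);
    }
}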
Alternatively, I prefer completely decoupled asynchronous operations. In that scenario, the inbound byte[] of data would be saved to its own file (or some other structure), then appended to the master file by a secondary process. More complex operationally, but also less prone to failure.
I don't have enough points to vote up an answer, but jro has the right idea. We do something similar in our service; each chunk is saved to a single temp file, then as soon as all chunks are received they're reassembled into a single file.
I'm not certain on the underlying processes for appending data to a file using StreamWriter, but I would assume it would have to at least read to the end of the current file before attempting to write whatever is in the buffer to it. So as the file gets larger it would have to read more and more of the existing file before writing the next chunk.
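A sketch of that chunk-and-reassemble idea (the file paths and calling code are hypothetical): each chunk is written to its own temp file, and once all chunks have arrived they are concatenated into the final file.
static void ReassembleChunks(System.Collections.Generic.IEnumerable<string> chunkPaths, string finalPath)
{
    using (var output = new System.IO.FileStream(finalPath, System.IO.FileMode.Create, System.IO.FileAccess.Write))
    {
        foreach (string chunkPath in chunkPaths)
        {
            using (var input = new System.IO.FileStream(chunkPath, System.IO.FileMode.Open, System.IO.FileAccess.Read))
            {
                input.CopyTo(output); // append this chunk (.NET 4+)
            }
            System.IO.File.Delete(chunkPath); // remove the temp chunk once appended
        }
    }
}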
Well, I found the root cause: Microsoft Forefront Security. Group Policy has it running real-time scanning, and I could see that process jump to 30% CPU usage when I close the file. After killing the process, everything runs at the same speed, both outside and inside the web service!
Next task: find a way to add an exclusion to MFS!