FTP download speed issue: .NET socket programming vs using FtpWebRequest/Response objects - c#

I'm trying to write a simple c# application which downloads a large number of small files from an FTP server.
I've tried two approaches:
1 - generic socket programming
2 - using FtpWebRequest and FtpWebResponse objects
The download speed (for the same file) with the first approach varies from 1.5s to 7s; the 2nd gives more or less the same result every time - about 2.5s.
Considering that about 1.4s of those 2.5s is spent initiating the FtpWebRequest object (only 1.1s for actually receiving the data), the difference is quite significant.
The question is how to achieve, with the 1st approach, the same good, stable download speed as with the 2nd.
For the 1st approach the problem seems to lie in the loop below (as it takes about 90% of the download time):
Int32 intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
while (intResponseLength != 0)
{
    localFile.Write(buffer, 0, intResponseLength);
    intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
}
Equivalent part of code for the 2nd approach (always takes about 1.1s for particular file):
Int32 intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
while (intResponseLength != 0)
{
    localFile.Write(buffer, 0, intResponseLength);
    intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
}
I've tried buffer sizes from 56 B to 32 kB - no significant difference.
Also creating a stream on the open data socket:
Stream str = new NetworkStream(dataSocket);
and reading it (instead of using dataSocket.Receive)
str.Read(buffer, 0, intBufferSize);
doesn't help... in fact it's even slower.
Thanks in advance for any suggestion!

You need to use the Socket.Poll or Socket.Select methods to check the availability of data. What you do now not only slows down the operation, but also causes heavy CPU load. Poll or Select will yield processor time until data is available or the timeout elapses. You can keep the same loop but include a call to one of the above methods, and play with the timeouts (try values from 10 ms to 500 ms to find the timeout that is optimal for your task).
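A minimal sketch of what that could look like, reusing dataSocket, buffer, intBufferSize and localFile from the question (the 100 ms timeout is just a starting point):
int intResponseLength;
while (true)
{
    // Poll yields the CPU until data is readable or the timeout (in microseconds) elapses
    if (!dataSocket.Poll(100 * 1000, SelectMode.SelectRead))
        continue;                 // nothing available yet, poll again

    intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
    if (intResponseLength == 0)
        break;                    // server closed the data connection - transfer complete

    localFile.Write(buffer, 0, intResponseLength);
}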

Related

Bandwidth throttling while copying files between computers

I've been trying to make a program that transfers a file, with bandwidth throttling (after zipping it), to another computer on the same network.
I need the bandwidth throttled in order to avoid saturation (kind of the way Robocopy does).
Recently I found the ThrottledStream class, but it doesn't seem to be working: I can send a 9 MB file with a throttle limit of 1 byte and it still arrives almost instantly, so I need to know if there's some misapplication of the class.
Here's the code:
using (FileStream originStream = inFile.OpenRead())
using (MemoryStream compressedFile = new MemoryStream())
using (GZipStream zippingStream = new GZipStream(compressedFile, CompressionMode.Compress))
{
    originStream.CopyTo(zippingStream);
    using (FileStream finalDestination = File.Create(destination.FullName + "\\" + inFile.Name + ".gz"))
    {
        ThrottledStream destinationStream = new ThrottledStream(finalDestination, bpsLimit);
        byte[] buffer = new byte[bufferSize];
        int readCount = compressedFile.Read(buffer, 0, bufferSize);
        while (readCount > 0)
        {
            destinationStream.Write(buffer, 0, bufferSize);
            readCount = compressedFile.Read(buffer, 0, bufferSize);
        }
    }
}
Any help would be appreciated.
The ThrottledStream class you linked to uses a delay calculation to determine how long to wait before performing the current write. This delay is based on the amount of data sent before the current write and how much time has elapsed. Once the delay period has passed, it writes the entire buffer in a single chunk.
The problem with this is that it doesn't do any checks on the size of the buffer being written in a particular write operation. If you ask it to limit throughput to 1 byte per second, then call the Write method with a 20MB buffer, it will write the entire 20MB immediately. If you then try to write another block of data that is 2 bytes long, it will wait for a very long time (20*2^20 seconds) before writing those two bytes.
In order to get the ThrottledStream class to work more smoothly, you have to call Write with very small blocks of data. Each block will still be written immediately, but the delays between the write operations will be smaller and the throughput will be much more even.
In your code you use a variable named bufferSize to determine the number of bytes to process per read/write in the internal loop. Try setting bufferSize to 256, which will result in many more reads and writes, but will give the ThrottledStream a chance to actually introduce some delays.
If you set bufferSize to be the same as bpsLimit you should see a single write operation complete every second. The smaller you set bufferSize, the more write operations you'll get per second and the smoother the bandwidth throttling will be.
Normally we like to process as much of a buffer as possible in each operation to decrease the overhead, but in this case you're explicitly trying to add overhead to slow things down :)
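A minimal sketch of the small-chunk loop (finalDestination, bpsLimit and the ThrottledStream class as in the question; sourceStream stands in for whatever stream holds the compressed data):
var destinationStream = new ThrottledStream(finalDestination, bpsLimit);
byte[] buffer = new byte[256];          // small blocks give the throttle room to insert delays
int readCount;
while ((readCount = sourceStream.Read(buffer, 0, buffer.Length)) > 0)
{
    // write only the bytes actually read, not the full buffer size
    destinationStream.Write(buffer, 0, readCount);
}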

C# huge memory usage when sending file, System.OutOfMemoryException

I am working at a program that sends/receives files over the network using the TCP.
The program sends multiple files, so the stream is not closed until the user quits the program.
The problem I am facing is that when I am sending a 700 MB file, my server program's private memory grows to about 700,000 K and badly cripples my computer's performance. When trying to send another 700 MB file, the server throws a System.OutOfMemoryException.
Can someone tell me what I am doing wrong, or not doing?
Server-side code:
using (FileStream fs = new FileStream("dracula.avi", FileMode.Open, FileAccess.Read))
{
    byte[] data = new byte[fs.Length];
    int remaining = data.Length;
    int offset = 0;
    strWriter.WriteLine("Content-Length: " + data.Length);
    strWriter.Flush();
    Thread.Sleep(1000);
    while (remaining > 0)
    {
        Thread.Sleep(10);
        int read = fs.Read(data, offset, remaining);
        remaining -= read;
        offset += read;
    }
    fs.Flush();
    fs.Close();
}
strm.Write(data, 0, data.Length);
strm.Flush();
GC.Collect();
You're currently reading the whole file into memory, even though you only want to copy it to another stream. Don't do that. Just iterate a chunk at a time: read a chunk, write a chunk, read a chunk, write a chunk, etc. If you're using .NET 4, you can use Stream.CopyTo for that purpose.
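A sketch of what that looks like with CopyTo (strWriter and strm are the writer and network stream from the question):
using (FileStream fs = new FileStream("dracula.avi", FileMode.Open, FileAccess.Read))
{
    strWriter.WriteLine("Content-Length: " + fs.Length);
    strWriter.Flush();
    fs.CopyTo(strm);   // copies in small chunks; the whole file is never held in memory
}
strm.Flush();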
You're buffering your reads, but not your writes. The program is doing exactly what you're telling it to - allocating a gigantic chunk of memory and filling it all before ever sending a single byte.
A much better approach is to read a small chunk from the file (for the sake of argument, 4096 bytes) and then write the chunk to the output stream. By doing this, you'll only use 4096 bytes per connection which is much more scalable.
An OOM condition generally occurs when you are either running out of system memory or, in a 32-bit process, out of address space (around 2 GB).
You say it can successfully copy one file but not two? Is that two concurrently or consecutively? What is your threading model? Also, the example is a snippet - you seem to have a StreamWriter and a Stream for writing; are these objects going away?
Be careful with GC.Collect. Microsoft doesn't recommend explicit calls because if you don't use it correctly it can cause objects to stay alive longer than needed. This is because when you do a GC.Collect, you are promoting objects to a higher generation. In my experience it is best to make sure you are releasing objects and let the framework decide what/when to GC.Collect.
I would get familiar with WinDBG+SOS, this allows you to look at the objects on the heap.
Try this:
Startup WinDBG and attach to your process
Type ".loadby sos clr" if using 4.0, otherwise type ".loadby sos mscorwks"
Press F5 to continue
Copy one file, wait for it to complete
Press CTRL+BREAK
Type "!dumpheap -stat", look at the results, look for objects that should be gone
For each object that should be gone, grab the MT value
Type "!dumpheap -mt {0}" replacing {0} with the value from step above
This is a list of instances, grab one of the objects addresses
Type "!gcroot {0}" replacing {0} with the objects address
This should tell you what is rooting the objects, you then need to find out how to unroot, e.g. null objects that aren't needed.
Better to send the data chunks as soon as you read them. I didn't test the code, but it should be similar to something like this:
const int bufferLength = 1024;
byte[] buffer = new byte[bufferLength];
while (remaining > 0)
{
    // read into the start of the buffer each time; 'offset' only tracks overall progress
    int len = fs.Read(buffer, 0, bufferLength);
    remaining -= len;
    offset += len;
    strm.Write(buffer, 0, len);
}

HttpWebRequest Grinding to a halt, possibly just due to page size

I have a WPF app that processes a lot of URLs (thousands); each is sent off to its own thread, does some processing and stores a result in the database.
The URLs can be anything, but some turn out to be massively big pages; this shoots memory usage up a lot and makes performance really bad. I set a timeout on the web request, so that if it takes longer than, say, 20 seconds it doesn't bother with that URL, but it doesn't seem to make much difference.
Here's the code section:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(urlAddress.Address);
req.Timeout = 20000;
req.ReadWriteTimeout = 20000;
req.Method = "GET";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    pageSource = reader.ReadToEnd();
    req = null;
}
It also seems to stall and ramp up memory on reader.ReadToEnd().
I would have thought having a cut-off of 20 seconds would help; is there a better method? I assume there's not much advantage to using the async web methods, since each URL download is on its own thread anyway.
Thanks
In general, it's recommended that you use asynchronous HttpWebRequests instead of creating your own threads. The article I've linked above also includes some benchmarking results.
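A rough sketch of that pattern using BeginGetResponse/EndGetResponse (not the linked article's exact code; ProcessPage is a hypothetical placeholder for your existing per-page work):
var req = (HttpWebRequest)WebRequest.Create(urlAddress.Address);
req.BeginGetResponse(ar =>
{
    var request = (HttpWebRequest)ar.AsyncState;
    using (var response = (HttpWebResponse)request.EndGetResponse(ar))
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string pageSource = reader.ReadToEnd();
        ProcessPage(pageSource);   // hypothetical: whatever you do with the page
    }
}, req);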
I don't know what you're doing with the page source after you read the stream to end, but using string can be an issue:
The System.String type is used in any .NET application. We have strings as: names, addresses, descriptions, error messages, warnings or even application settings. Each application has to create, compare or format string data. Considering the immutability and the fact that any object can be converted to a string, all the available memory can be swallowed by a huge amount of unwanted string duplicates or unclaimed string objects.
Some other suggestions:
Do you have any firewall restrictions? I've seen a lot of issues at work where the firewall enables rate limiting and fetching pages grinds to a halt (happens to me all the time)!
I presume that you're going to use the string to parse HTML, so I would recommend that you initialize your parser with the Stream instead of passing in a string containing the page source (if that's an option).
If you're storing the page source in the database, then there isn't much you can do.
Try to eliminate the reading of the page source as a potential contributor to the memory/performance problem by commenting it out.
Use a streaming HTML parser such as Majestic-12, which avoids the need to load the entire page source into memory (again, if you need to parse)!
Limit the size of the pages you're going to download - say, only download 150 KB; the average page size is about 100-130 KB (see the sketch after this list).
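A sketch of the size cap, reusing req and pageSource from the question (150 * 1024 is just the suggested limit, and characters are counted as a rough stand-in for bytes):
const int maxChars = 150 * 1024;
var sb = new System.Text.StringBuilder();
char[] chunk = new char[8192];
int total = 0, n;
using (var reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    // stop reading once the cap is reached instead of calling ReadToEnd
    while (total < maxChars && (n = reader.Read(chunk, 0, chunk.Length)) > 0)
    {
        sb.Append(chunk, 0, n);
        total += n;
    }
}
pageSource = sb.ToString();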
Additionally, can you tell us what's your initial rate of fetching pages and what does it go down to? Are you seeing any errors/exceptions from the web request as you're fetching pages?
Update
In the comment section I noticed that you're creating thousands of threads, and I would say that you don't need to do that. Start with a small number of threads and keep increasing them until you hit peak performance on your system. Once you start adding threads and the performance looks like it has tapered off, stop adding threads. I can't imagine that you will need more than 128 threads (even that seems high). Create a fixed number of threads, e.g. 64, let each thread take a URL from your queue, fetch the page, process it and then go back to getting pages from the queue again.
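A rough sketch of that worker pattern (64 workers pulling from a shared queue; FetchAndProcess is a hypothetical placeholder for your existing download/parse/store code):
var urls = new System.Collections.Concurrent.BlockingCollection<string>();
// ... urls.Add(address) for each URL, then urls.CompleteAdding();

var workers = new List<Thread>();
for (int i = 0; i < 64; i++)
{
    var t = new Thread(() =>
    {
        // each worker keeps pulling URLs until the queue is drained
        foreach (string url in urls.GetConsumingEnumerable())
            FetchAndProcess(url);   // hypothetical: per-URL fetch/parse/store work
    });
    t.Start();
    workers.Add(t);
}
workers.ForEach(t => t.Join());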
You could enumerate with a buffer instead of calling ReadToEnd, and if it is taking too long, then you could log and abandon - something like:
static void Main(string[] args)
{
    Uri largeUri = new Uri("http://www.rfkbau.de/index.php?option=com_easybook&Itemid=22&startpage=7096");
    DateTime start = DateTime.Now;
    int timeoutSeconds = 10;
    foreach (var s in ReadLargePage(largeUri))
    {
        if ((DateTime.Now - start).TotalSeconds > timeoutSeconds)
        {
            Console.WriteLine("Stopping - this is taking too long.");
            break;
        }
    }
}

static IEnumerable<string> ReadLargePage(Uri uri)
{
    int bufferSize = 8192;
    int readCount;
    Char[] readBuffer = new Char[bufferSize];
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader stream = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        readCount = stream.Read(readBuffer, 0, bufferSize);
        while (readCount > 0)
        {
            // use readCount, not bufferSize, so the final partial chunk isn't padded with stale data
            yield return new string(readBuffer, 0, readCount);
            readCount = stream.Read(readBuffer, 0, bufferSize);
        }
    }
}
Lirik has a really good summary.
I would add that if I were implementing this, I would make a separate process that reads the pages. So it would be a pipeline: the first stage downloads the URL and writes it to a disk location, then queues that file for the next stage. The next stage reads from the disk and does the parsing and DB updates. That way you get maximum throughput on the download and on the parsing as well. You can also tune your thread pools so that you have more workers parsing, etc. This architecture also lends itself very well to distributed processing, where you can have one machine downloading and another host parsing, etc.
Another thing to note is that if you are hitting the same server from multiple threads (even if you are using Async) then you will hit yourself against the max outgoing connection limit. You can throttle yourself to stay below that, or increase the connection limit on the ServicePointManager class.
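For example (the numbers here are just illustrative; the default limit for a client application is typically 2 connections per host):
// raise the limit for all hosts
ServicePointManager.DefaultConnectionLimit = 16;
// or for one host only
ServicePointManager.FindServicePoint(new Uri("http://example.com")).ConnectionLimit = 16;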

How to make my application copy file faster

I have created a Windows application that routinely downloads files from a load-balanced server; currently the speed is about 30 MB/second. However, when I try FastCopy or TeraCopy, they can copy at about 100 MB/second. I want to know how to improve my copy speed so it can copy files faster than it currently does.
One common mistake when using streams is to copy a byte at a time, or to use a small buffer. Most of the time it takes to write data to disk is spent seeking, so using a larger buffer will reduce your average seek time per byte.
Operating systems write files to disk in clusters. This means that when you write a single byte to disk, Windows actually writes a block between 512 bytes and 64 KB in size. You can get much better disk performance by using a buffer size that is an integer multiple of 64 KB.
Additionally, you can get a boost from using a buffer that is a multiple of your CPU's underlying memory page size. For x86/x64 machines this is either 4 KB or 4 MB.
So you want to use an integer multiple of 4 MB.
Additionally, if you use asynchronous IO you can take full advantage of the large buffer size.
class Downloader
{
    const int size = 4096 * 1024;                // 4 MB buffer
    ManualResetEvent done = new ManualResetEvent(false);
    Socket socket;
    Stream stream;

    // Completion callback: write the block that just arrived. A full read kicks off
    // the next receive first; a short read is treated as the end of the transfer.
    void InternalWrite(IAsyncResult ar)
    {
        var read = socket.EndReceive(ar);
        if (read == size)
            InternalRead();
        stream.Write((byte[])ar.AsyncState, 0, read);
        if (read != size)
            done.Set();
    }

    // Start an asynchronous receive into a fresh buffer.
    void InternalRead()
    {
        var buffer = new byte[size];
        socket.BeginReceive(buffer, 0, size, System.Net.Sockets.SocketFlags.None, InternalWrite, buffer);
    }

    // Blocks until the whole transfer has been written to the stream.
    public bool Save(Socket socket, Stream stream)
    {
        this.socket = socket;
        this.stream = stream;
        InternalRead();
        return done.WaitOne();
    }
}

bool Save(System.Net.Sockets.Socket socket, string filename)
{
    using (var stream = File.OpenWrite(filename))
    {
        var downloader = new Downloader();
        return downloader.Save(socket, stream);
    }
}
Possibly your application could use multiple threads to fetch the file; however, the bandwidth is still limited by the speed of the devices that transfer the content.
The simplest way is to open the file in raw/binary mode (that's C-speak, I'm not sure what the C# equivalent is) and read and write very large blocks (several MB) at a time.
The trick TeraCopy uses is to make the reading and writing asynchronous. This means that a block of data can be written while another one is being read.
You have to fiddle around with the number of blocks and the size of those blocks to get the optimum for your situation. I used this method in C++, and for us the optimum was four blocks of 256 KB when copying from a network share to a local disk.
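A sketch of that double-buffered approach (source and destination are assumed to be already-opened streams; 256 KB matches the block size mentioned above):
byte[][] buffers = { new byte[256 * 1024], new byte[256 * 1024] };
int current = 0;
IAsyncResult pendingWrite = null;
int read;
while ((read = source.Read(buffers[current], 0, buffers[current].Length)) > 0)
{
    // wait for the previous block to finish before issuing the next write
    if (pendingWrite != null)
        destination.EndWrite(pendingWrite);
    // start writing this block asynchronously, then immediately read the next one
    pendingWrite = destination.BeginWrite(buffers[current], 0, read, null, null);
    current = 1 - current;
}
if (pendingWrite != null)
    destination.EndWrite(pendingWrite);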
Regards,
Sebastiaan
If you run Process Monitor you can see the block sizes that Windows Explorer or TeraCopy use.
In Vista the default block size for the local network is, as far as I remember, 2 MB, which makes copying files over a big pipe a lot faster.
Why reinvent the wheel?
If your situation permits, you are probably better off shelling out to one of the existing "fast" copy utilities than trying to write one yourself. There are numerous non-obvious edge cases which need to be handled, and getting consistently good performance requires lots of trial-and-error experimentation.

How to send files over tcp with TcpListener/Client? SocketException problem

I'm developing a simple application to send files over TCP using the TcpListener and TcpClient classes. Here's the code that sends the file.
stop is a volatile boolean that allows stopping the process at any time, and WRITE_BUFFER_SIZE may be changed at runtime (another volatile):
while (remaining > 0 && !stop)
{
    DateTime current = DateTime.Now;
    int bufferSize = WRITE_BUFFER_SIZE;
    buffer = new byte[bufferSize];
    int readed = fileStream.Read(buffer, 0, bufferSize);
    stream.Write(buffer, 0, readed);
    stream.Flush();
    remaining -= readed;
    // Wait in order to guarantee send speed
    TimeSpan difference = DateTime.Now.Subtract(current);
    double seconds = (bufferSize / Speed);
    int wait = (int)Math.Floor(seconds * 1000);
    wait -= difference.Milliseconds;
    if (wait > 10)
        Thread.Sleep(wait);
}
stream.Close();
and this is the code that handles the receiver side:
do
{
    readed = stream.Read(buffer, 0, READ_BUFFER_SIZE);
    // write to .part file and flush to disk
    outputStream.Write(buffer, 0, readed);
    outputStream.Flush();
    offset += readed;
} while (!stop && readed > 0);
Now, when the speed is low (about 5 KBps) everything works OK, but as I increase the speed the receiver side becomes more prone to raising a SocketException when reading from the stream. I'm guessing it has to do with the remote socket being closed before all the data can be read, but what's the correct way to do this? When should I close the sending client?
I haven't found any good examples of file transmission on google, and the ones that I've found have a similar implementation of what I'm doing so I guess I'm missing something.
Edit: I get the error "Unable to read data from the transport connection". This is an IOException whose inner exception is a SocketException.
I've added the code below in the sender function, but I still get the same error; the code never reaches stream.Close() and of course the TcpClient never really gets closed... so I'm completely lost now.
buffer = new byte[1];
client.Client.Receive(buffer);
stream.Close();
Typically you want to set the LINGER option on the socket. Under C++ this would be SO_LINGER, but under Windows this doesn't actually work as expected. You really want to do this:
Finish sending data.
Call shutdown() with the how parameter set to 1.
Loop on recv() until it returns 0.
Call closesocket().
Taken from: http://tangentsoft.net/wskfaq/newbie.html#howclose
C# may have corrected this in its libraries, but I doubt it since they are built on top of the Winsock API.
Edit:
Looking at your code in more detail, I see that you are sending no header across at all, so on the receiving side you have no idea how many bytes you are actually supposed to read. Knowing the number of bytes to read off the socket makes this a much easier problem to debug. Keep in mind that shutting down the socket can still snip off the last bit of data if you don't close it properly (a quick sketch of a length header follows below).
Additionally, having your buffer size be volatile is not thread safe and really doesn't buy you anything. Using stop as a volatile is safe, but don't expect it to be instant. In other words, the loop could run several more times before it gets the updated value of stop. This is especially true on multiprocessor machines.
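A minimal sketch of that kind of length-prefixed header (reusing fileStream and stream from the question; the 8-byte prefix is just one possible format):
// sender: send the total length first, then the file bytes as before
byte[] header = BitConverter.GetBytes(fileStream.Length);
stream.Write(header, 0, header.Length);

// receiver: read the 8-byte prefix, then loop on stream.Read until that many bytes
// have been written to the .part file
byte[] header = new byte[8];
int got = 0;
while (got < header.Length)
    got += stream.Read(header, got, header.Length - got);
long bytesExpected = BitConverter.ToInt64(header, 0);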
Edit_02:
For the TcpClient class you want to do the following (as far as I can tell without having a C# compiler at hand at the moment):
// write all the bytes, then do the following
client.Client.Shutdown(SocketShutdown.Send);   // Client exposes the underlying Socket
while (stream.Read(buffer, 0, READ_BUFFER_SIZE) != 0)
    ;                                          // drain until the peer closes its side
client.Close();
