Bandwidth throttling while copying files between computers - c#

I've been trying to write a program that zips a file and then transfers it to another computer on the same network, with bandwidth throttling.
I need the transfer's bandwidth throttled in order to avoid saturating the network (kind of the way Robocopy does).
Recently I found the ThrottledStream class, but it doesn't seem to be working: I can send a 9 MB file with the limit set to 1 byte and it still arrives almost instantly, so I need to know whether I'm misusing the class.
Here's the code:
using (FileStream originStream = inFile.OpenRead())
using (MemoryStream compressedFile = new MemoryStream())
using (GZipStream zippingStream = new GZipStream(compressedFile, CompressionMode.Compress))
{
    originStream.CopyTo(zippingStream);
    using (FileStream finalDestination = File.Create(destination.FullName + "\\" + inFile.Name + ".gz"))
    {
        ThrottledStream destinationStream = new ThrottledStream(finalDestination, bpsLimit);
        byte[] buffer = new byte[bufferSize];
        int readCount = compressedFile.Read(buffer, 0, bufferSize);
        while (readCount > 0)
        {
            destinationStream.Write(buffer, 0, bufferSize);
            readCount = compressedFile.Read(buffer, 0, bufferSize);
        }
    }
}
Any help would be appreciated.

The ThrottledStream class you linked to uses a delay calculation to determine how long to wait before performing the current write. The delay is based on the amount of data sent before the current write and on how much time has elapsed. Once the delay period has passed, it writes the entire buffer in a single chunk.
The problem with this is that it doesn't do any checks on the size of the buffer being written in a particular write operation. If you ask it to limit throughput to 1 byte per second, then call the Write method with a 20MB buffer, it will write the entire 20MB immediately. If you then try to write another block of data that is 2 bytes long, it will wait for a very long time (20*2^20 seconds) before writing those two bytes.
In order to get the ThrottledStream class to work more smoothly, you have to call Write with very small blocks of data. Each block will still be written immediately, but the delays between the write operations will be smaller and the throughput will be much more even.
In your code you use a variable named bufferSize to determine the number of bytes to process per read/write in the internal loop. Try setting bufferSize to 256, which will result in many more reads and writes, but will give the ThrottledStream a chance to actually introduce some delays.
If you set bufferSize to be the same as bpsLimit you should see a single write operation complete every second. The smaller you set bufferSize, the more write operations you'll get per second and the smoother the bandwidth throttling will be.
Normally we like to process as much of a buffer as possible in each operation to decrease the overheads, but in this case you're explicitly trying to add overheads to slow things down :)
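As a rough illustration, here is a minimal sketch of the inner write loop from the question reworked to use a 256-byte chunk (and to write only the bytes actually read). It assumes the same ThrottledStream constructor and the surrounding using blocks from the question:

// Sketch: push the compressed data through ThrottledStream in small chunks so the
// throttling delay is applied between many small writes instead of one huge one.
compressedFile.Seek(0, SeekOrigin.Begin);          // rewind the MemoryStream before reading it back
ThrottledStream destinationStream = new ThrottledStream(finalDestination, bpsLimit);

byte[] chunk = new byte[256];                       // small chunk => smoother throttling
int readCount;
while ((readCount = compressedFile.Read(chunk, 0, chunk.Length)) > 0)
{
    // Write only the bytes actually read, not the full buffer length.
    destinationStream.Write(chunk, 0, readCount);
}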

Related

CopyToAsync weird behaviour when used from multiple threads

I have the following function to write to a file asynchronously from multiple threads in parallel:
static int startOffset = 0; // This variable will store the offset at which the thread begins to write
static int blockSize = 10;  // size of block written by each thread
static async Task<long> WriteToFile(Stream dataToWrite)
{
    var startOffset = getStartOffset(); // Definition of this function is given later
    using (var fs = new FileStream(fileName,
                                   FileMode.OpenOrCreate,
                                   FileAccess.ReadWrite,
                                   FileShare.ReadWrite))
    {
        fs.Seek(startOffset, SeekOrigin.Begin);
        await dataToWrite.CopyToAsync(fs);
    }
    return startOffset;
}
/**
*I use reader writer lock here so that only one thread can access the value of the startOffset at
*a time
*/
static int getStartOffset()
{
    int result = 0;
    try
    {
        rwl.AcquireWriterLock(Timeout.Infinite);
        result = startOffset;
        startOffset += blockSize; // increment the startOffset for the next thread
    }
    finally
    {
        rwl.ReleaseWriterLock();
    }
    return result;
}
I then call the above function from multiple threads to write some strings:
var tasks = new List<Task>();
for (int i = 1; i <= 4; i++)
{
    tasks.Add(Task.Run(async () => {
        String s = "aaaaaaaaaa";
        byte[] buffer = new byte [10];
        buffer = Encoding.Default.GetBytes(s);
        Stream data = new MemoryStream(buffer);
        long offset = await WriteToFile(data);
        Console.WriteLine($"Data written at offset - {offset}");
    }));
}
Task.WaitAll(tasks.ToArray());
Now, this code executes well most of the time. But sometimes, randomly, it writes some Japanese characters or other symbols to the file. Is there something I am doing wrong with the multithreading?
Your calculation of startOffset assumes that each thread is writing exactly 10 bytes. There are several issues with this.
One, the data has unknown length:
byte[] buffer = new byte [10];
buffer = Encoding.Default.GetBytes(s);
The assignment doesn't put data into the newly allocated 10-byte array; it discards the new byte[10] array (which will be garbage collected) and stores a reference to the array returned by GetBytes(s), which could have any length at all. It could overflow into the next Task's area. Or it could leave some content that existed in the file beforehand (you use OpenOrCreate) lying in the area for the current Task, but past the end of the actual dataToWrite.
Two, you try to seek past the areas that other threads are expected to write to, but if those writes haven't completed yet, they haven't increased the file length. So you attempt to seek past the end of the file, which is allowed by the Windows API but might cause problems with the .NET wrappers. However, the FileStream.Seek documentation does indicate you are OK:
When you seek beyond the length of the file, the file size grows
although this might not be precisely correct, since the Windows API says
It is not an error to set a file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function. A write operation increases the size of the file to the file pointer position plus the size of the buffer written, which results in the intervening bytes uninitialized.
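One way to address the first issue, sketched below, is to encode the string first and reserve a region based on the actual byte count. getStartOffset is assumed here to have been changed to take the length to reserve (a hypothetical signature, not the asker's code):

// Sketch: reserve a region sized to the actual payload instead of assuming 10 bytes.
byte[] payload = Encoding.Default.GetBytes(s);   // whatever length the encoding produces
long offset = getStartOffset(payload.Length);    // reserve exactly payload.Length bytes (hypothetical overload)
using (var fs = new FileStream(fileName, FileMode.OpenOrCreate,
                               FileAccess.ReadWrite, FileShare.ReadWrite))
{
    fs.Seek(offset, SeekOrigin.Begin);
    await fs.WriteAsync(payload, 0, payload.Length); // write only what was actually encoded
}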
I think that asynchronous file I/O is not usually meant to be utilized with multithreading. Just because something is asynchronous does not mean that an operation should have multiple threads assigned to it.
To quote the documentation for async file I/O: "Asynchronous operations enable you to perform resource-intensive I/O operations without blocking the main thread." Basically, instead of using a bunch of threads on one operation, it dispatches a new thread to accomplish a less meaningful task. Eventually, with a big enough application, nearly everything can be abstracted into a not-so-meaningful task and computers can run massive apps pretty quickly utilizing multithreading.
What you are likely experiencing is undefined behavior due to multiple threads overwriting the same region of the file. The Japanese characters you are referring to are likely malformed ASCII/Unicode that your text editor is attempting to interpret.
If you would like to remedy the undefined behavior and keep using asynchronous operations, you should be able to await each individual task before the next one starts. This will prevent the offset variable from being in the incorrect position for the newest task, although logically it will run the same as a synchronous version.
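For illustration, a minimal sketch of that sequential variant, reusing WriteToFile from the question (each write is awaited before the next one starts):

// Sketch: run the writes one after another so no two tasks ever hold overlapping offsets.
for (int i = 1; i <= 4; i++)
{
    byte[] buffer = Encoding.Default.GetBytes("aaaaaaaaaa");
    using (Stream data = new MemoryStream(buffer))
    {
        long offset = await WriteToFile(data);   // no other write starts until this one completes
        Console.WriteLine($"Data written at offset - {offset}");
    }
}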

How to use ReadAsync() on a network stream in combination with processing?

I am trying to download a large file, about 500 MB, from a server, but instead of saving it to the filesystem I am trying to process it "on the fly": retrieving chunks of data, analysing them and, when there is enough information, saving them to the database. Here is what I am trying to do:
byte[] buffer = new byte[64 * 1024];
using (HttpResponseMessage response = await httpClient.GetAsync(Server + file, HttpCompletionOption.ResponseHeadersRead))
using (Stream streamToReadFrom = await response.Content.ReadAsStreamAsync())
{
    int wereRead;
    do
    {
        wereRead = await streamToReadFrom.ReadAsync(buffer, 0, buffer.Length);
        // Do the processing and saving
    } while (wereRead == buffer.Length);
}
I tried to use a buffer of 64 KB as the chunks of data I need to process are about that size. My reasoning was that since I am 'awaiting' on ReadAsync, the method call would not return until the buffer was full, but that is not the case. The method was returning with only 7 KB to 14 KB read. I tried to use a much smaller buffer, but the speed of my processing is much higher than the speed of the download anyway, so with a 4 KB buffer I might have a full buffer on the first iteration but only, say, 3 KB on the second.
Is there an approach that would be recommended in my situation? Basically, I want ReadAsync to only return once the buffer is full, or once the end of the stream is reached.
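For what it's worth, Read/ReadAsync only guarantees at least one byte per call (or zero at end of stream), so a common pattern is to loop until the buffer is full. A minimal sketch of such a helper (not from the question's code):

// Sketch: keep reading until the buffer is completely full, or stop early only at end of stream.
static async Task<int> ReadFullyAsync(Stream stream, byte[] buffer)
{
    int total = 0;
    while (total < buffer.Length)
    {
        int read = await stream.ReadAsync(buffer, total, buffer.Length - total);
        if (read == 0)
            break;                 // end of stream reached
        total += read;
    }
    return total;                  // equals buffer.Length unless the stream ended early
}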

C# huge memory usage when sending file, System.OutOfMemoryException

I am working on a program that sends/receives files over the network using TCP.
The program sends multiple files, so the stream is not closed until the user quits the program.
The problem I am facing is that when I send a 700 MB file, my server program's private memory grows to about 700,000 K and cripples my computer's performance badly. When I try to send another 700 MB file, the server throws a System.OutOfMemoryException.
Can someone tell me what I am doing wrong, or not doing?
Server-side code:
using (FileStream fs = new FileStream("dracula.avi", FileMode.Open, FileAccess.Read))
{
    byte[] data = new byte[fs.Length];
    int remaining = data.Length;
    int offset = 0;
    strWriter.WriteLine("Content-Length: " + data.Length);
    strWriter.Flush();
    Thread.Sleep(1000);
    while (remaining > 0)
    {
        Thread.Sleep(10);
        int read = fs.Read(data, offset, remaining);
        remaining -= read;
        offset += read;
    }
    fs.Flush();
    fs.Close();
}
strm.Write(data, 0, data.Length);
strm.Flush();
GC.Collect();
You're currently reading the whole file into memory, even though you only want to copy it to another stream. Don't do that. Just iterate a chunk at a time: read a chunk, write a chunk, read a chunk, write a chunk, etc. If you're using .NET 4, you can use Stream.CopyTo for that purpose.
You're buffering your reads, but not your writes. The program is doing exactly what you're telling it to -- allocating a gigantic chunk of memory and filling it all before ever sending a single byte.
A much better approach is to read a small chunk from the file (for the sake of argument, 4096 bytes) and then write the chunk to the output stream. By doing this, you'll only use 4096 bytes per connection which is much more scalable.
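A minimal sketch of that chunked approach, reusing the stream names from the snippet above (on .NET 4 and later, fs.CopyTo(strm) does essentially the same thing):

// Sketch: stream the file to the socket in small chunks instead of buffering the whole file.
using (FileStream fs = new FileStream("dracula.avi", FileMode.Open, FileAccess.Read))
{
    strWriter.WriteLine("Content-Length: " + fs.Length);
    strWriter.Flush();

    byte[] chunk = new byte[4096];              // only 4096 bytes held in memory at a time
    int read;
    while ((read = fs.Read(chunk, 0, chunk.Length)) > 0)
    {
        strm.Write(chunk, 0, read);             // forward each chunk as soon as it is read
    }
    strm.Flush();
}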
An OOM condition generally occurs when you are either running out of system memory or, in a 32-bit process, out of address space (about 2 GB).
You say it can successfully copy one but not two? Is that two concurrently or consecutively? What is your threading model? Also, the example is a snippet: you seem to have a StreamWriter and a Stream for writing; are these objects going away?
Be careful with GC.Collect. Microsoft doesn't recommend explicit calls because if you don't use it correctly it can cause objects to stay alive longer than needed. This is because when you do a GC.Collect, you are promoting objects to a higher generation. In my experience it is best to make sure you are releasing objects and let the framework decide what/when to GC.Collect.
I would get familiar with WinDBG+SOS, this allows you to look at the objects on the heap.
Try this:
Startup WinDBG and attach to your process
Type ".loadby sos clr" if using 4.0, otherwise type ".loadby sos mscorwks"
Press F5 to continue
Copy one file, wait for it to complete
Press CTRL+BREAK
Type "!dumpheap -stat", look at the results, look for objects that should be gone
For each object that should be gone, grab the MT value
Type "!dumpheap -mt {0}" replacing {0} with the value from step above
This is a list of instances, grab one of the objects addresses
Type "!gcroot {0}" replacing {0} with the objects address
This should tell you what is rooting the objects, you then need to find out how to unroot, e.g. null objects that aren't needed.
Better to send the data chunks as soon as you read them. I didn't test the code, but it should be similar to something like this:
const int bufferLength = 1024;
byte[] buffer = new byte[bufferLength];
while (remaining > 0)
{
    int len = fs.Read(buffer, 0, bufferLength);
    remaining -= len;
    offset += len;
    strm.Write(buffer, 0, len);
}

FTP download speed issue: .NET socket programming vs using FtpWebRequest/Response objects

I'm trying to write a simple C# application which downloads a large number of small files from an FTP server.
I've tried two approaches:
1 - generic socket programming
2 - using FtpWebRequest and FtpWebResponse objects
The download speed (for the same file) when using the first approach varies from 1.5 s to 7 s; the second gives more or less the same result each time - about 2.5 s.
Considering that about 1.4 s of those 2.5 s is taken by initiating the FtpWebRequest object (only 1.1 s for receiving data), the difference is quite significant.
The question is how to achieve for the 1st approach the same good stable download speed as for the 2nd one?
For the 1st approach the problem seems to lie in the loop below (as it takes about 90% of the download time):
Int32 intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
while (intResponseLength != 0)
{
    localFile.Write(buffer, 0, intResponseLength);
    intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
}
Equivalent part of code for the 2nd approach (always takes about 1.1s for particular file):
Int32 intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
while (intResponseLength != 0)
{
    localFile.Write(buffer, 0, intResponseLength);
    intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
}
I've tried buffers from 56 bytes to 32 KB - no significant difference.
Also creating a stream on the open data socket:
Stream str = new NetworkStream(dataSocket);
and reading it (instead of using dataSocket.Receive)
str.Read(buffer, 0, intBufferSize);
doesn't help... in fact it's even slower.
Thanks in advance for any suggestion!
You need to use the Socket.Poll or Socket.Select methods to check the availability of data. What you are doing not only slows down the operation, but also causes extensive CPU load. Poll or Select will yield processor time until data is available or the timeout elapses. You can keep the same loop but include a call to one of the above methods, and play with timeouts (try values from 10 ms to 500 ms to find the timeout optimal for your task).
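A minimal sketch of how that could look, keeping the original loop but polling before each Receive (the 100 ms timeout is just an example value to tune):

// Sketch: wait until data is readable (or the connection is closed) before calling Receive.
Int32 intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
while (intResponseLength != 0)
{
    localFile.Write(buffer, 0, intResponseLength);

    // Poll takes a timeout in microseconds; it yields the CPU until data is readable
    // (or the socket is closed) instead of sitting inside Receive.
    while (!dataSocket.Poll(100 * 1000, SelectMode.SelectRead))
    {
        // no data yet; keep waiting (a real implementation would add an overall timeout)
    }

    intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
}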

How to make my application copy file faster

I have created a Windows application that routinely downloads files from a load-balanced server; currently the speed is about 30 MB/second. However, when I try FastCopy or TeraCopy they can copy at about 100 MB/second. I want to know how to improve my copy speed so it can copy files faster than it currently does.
One common mistake when using streams is to copy a byte at a time, or to use a small buffer. Most of the time it takes to write data to disk is spent seeking, so using a larger buffer will reduce your average seek time per byte.
Operating systems write files to disk in clusters. This means that when you write a single byte to disk, Windows will actually write a block between 512 bytes and 64 KB in size. You can get much better disk performance by using a buffer that is an integer multiple of 64 KB.
Additionally, you can get a boost from using a buffer that is a multiple of your CPU's underlying memory page size. For x86/x64 machines this can be either 4 KB or 4 MB.
So you want to use an integer multiple of 4 MB.
Additionally, if you use asynchronous IO you can take full advantage of the large buffer size.
class Downloader
{
    const int size = 4096 * 1024;
    ManualResetEvent done = new ManualResetEvent(false);
    Socket socket;
    Stream stream;

    void InternalWrite(IAsyncResult ar)
    {
        var read = socket.EndReceive(ar);
        if (read == size)
            InternalRead();
        stream.Write((byte[])ar.AsyncState, 0, read);
        if (read != size)
            done.Set();
    }

    void InternalRead()
    {
        var buffer = new byte[size];
        socket.BeginReceive(buffer, 0, size, System.Net.Sockets.SocketFlags.None, InternalWrite, buffer);
    }

    public bool Save(Socket socket, Stream stream)
    {
        this.socket = socket;
        this.stream = stream;
        InternalRead();
        return done.WaitOne();
    }
}

bool Save(System.Net.Sockets.Socket socket, string filename)
{
    using (var stream = File.OpenWrite(filename))
    {
        var downloader = new Downloader();
        return downloader.Save(socket, stream);
    }
}
Possibly your application could use multiple threads to fetch the file in parallel; however, the bandwidth is still limited by the speed of the devices that transfer the content.
The simplest way is to open the file in raw/binary mode (that's C-speak; I'm not sure what the C# equivalent is) and read and write very large blocks (several MB) at a time.
The trick TeraCopy uses is to make the reading and writing asynchronous. This means that a block of data can be written while another one is being read.
You have to fiddle around with the number of blocks and the size of those blocks to get the optimum for your situation. I used this method in C++, and for us the optimum was four blocks of 256 KB when copying from a network share to a local disk.
Regards,
Sebastiaan
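For illustration, a minimal sketch of that overlapping read/write idea using the Task-based async stream APIs (.NET 4.5+), here with two 256 KB buffers rather than Sebastiaan's four; the source and destination stream names are placeholders:

// Sketch: read the next block into one buffer while the previous block is still being written.
const int blockSize = 256 * 1024;
var buffers = new[] { new byte[blockSize], new byte[blockSize] };
int current = 0;

int read = await source.ReadAsync(buffers[current], 0, blockSize);
while (read > 0)
{
    Task writing = destination.WriteAsync(buffers[current], 0, read);     // write this block...
    current = 1 - current;
    int next = await source.ReadAsync(buffers[current], 0, blockSize);    // ...while reading the next
    await writing;
    read = next;
}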
If you run Process Monitor you can see the block sizes that Windows Explorer or TeraCopy are using.
In Vista the default block size for the local network is, as far as I recall, 2 MB, which makes copying files over a huge pipe a lot faster.
Why reinvent the wheel?
If your situation permits, you are probably better off shelling out to one of the existing "fast" copy utilities than trying to write one yourself. There are numerous non-obvious edge cases which need to be handled, and getting consistently good performance requires lots of trial-and-error experimentation.
