FileStream.ReadAsync very slow compared to Read() - C#

I have the following code to loop through a file, reading 1024 bytes at a time. The first pass uses FileStream.Read() and the second pass uses FileStream.ReadAsync().
private async void Button_Click(object sender, RoutedEventArgs e)
{
    await Task.Run(() => Test()).ConfigureAwait(false);
}

private async Task Test()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    int readSize;
    int blockSize = 1024;
    byte[] data = new byte[blockSize];
    string theFile = @"C:\test.mp4";
    long totalRead = 0;
    using (FileStream fs = new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        readSize = fs.Read(data, 0, blockSize);
        while (readSize > 0)
        {
            totalRead += readSize;
            readSize = fs.Read(data, 0, blockSize);
        }
    }
    sw.Stop();
    Console.WriteLine($"Read() Took {sw.ElapsedMilliseconds}ms and totalRead: {totalRead}");

    sw.Reset();
    sw.Start();
    totalRead = 0;
    using (FileStream fs = new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, (blockSize * 2), FileOptions.Asynchronous | FileOptions.SequentialScan))
    {
        readSize = await fs.ReadAsync(data, 0, blockSize).ConfigureAwait(false);
        while (readSize > 0)
        {
            totalRead += readSize;
            readSize = await fs.ReadAsync(data, 0, blockSize).ConfigureAwait(false);
        }
    }
    sw.Stop();
    Console.WriteLine($"ReadAsync() Took {sw.ElapsedMilliseconds}ms and totalRead: {totalRead}");
}
And the result:
Read() Took 162ms and totalRead: 162835040
ReadAsync() Took 15597ms and totalRead: 162835040
The ReadAsync() version is about 100 times slower. Am I missing something? The only thing I can think of is the overhead of creating and destroying tasks with ReadAsync(), but can the overhead really be that much?
UPDATE:
I've changed the above code to reflect the suggestion by @Cory. There is a slight improvement:
Read() Took 142ms and totalRead: 162835040
ReadAsync() Took 12288ms and totalRead: 162835040
When I increase the read block size to 1 MB as suggested by @Alexandru, the results are much more acceptable:
Read() Took 32ms and totalRead: 162835040
ReadAsync() Took 76ms and totalRead: 162835040
So this hinted to me that it is indeed the overhead of the sheer number of tasks that causes the slowness. But if creating and destroying a task takes merely ~100µs, things still don't really add up for the slowness with a small block size.

Stick with big buffers if you're doing async, and make sure to turn on async mode in the FileStream constructor, and you should be okay. Async methods that you await like this trap in and out of the current thread (and mind you, the current thread here is the UI thread, which can be lagged by any other async method doing the same in-and-out thread trapping), so there is some overhead involved when the number of calls is large. With a 1024-byte buffer, your 162,835,040-byte file takes roughly 159,000 awaited reads, and in a UI app each completion has to wait for the UI thread to be free before it can trap back into it. So, to reduce these calls, we simply read in larger increments of data, focusing the application on reading more data at a time by increasing the buffer size. Make sure to test this code in Release mode so all of the compiler optimizations are available to us and the debugger does not slow us down:
class Program
{
    static void Main(string[] args)
    {
        DoStuff();
        Console.ReadLine();
    }

    public static async void DoStuff()
    {
        var filename = @"C:\Example.txt";
        var sw = new Stopwatch();
        sw.Start();
        ReadAllFile(filename);
        sw.Stop();
        Console.WriteLine("Sync: " + sw.Elapsed);
        sw.Restart();
        await ReadAllFileAsync(filename);
        sw.Stop();
        Console.WriteLine("Async: " + sw.Elapsed);
    }

    static void ReadAllFile(string filename)
    {
        byte[] buffer = new byte[131072];
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, buffer.Length, false))
            while (true)
                if (file.Read(buffer, 0, buffer.Length) <= 0)
                    break;
    }

    static async Task ReadAllFileAsync(string filename)
    {
        byte[] buffer = new byte[131072];
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, buffer.Length, true))
            while (true)
                if ((await file.ReadAsync(buffer, 0, buffer.Length)) <= 0)
                    break;
    }
}
Results:
Sync: 00:00:00.3092809
Async: 00:00:00.5541262
Pretty negligible...the file is about 1 GB.
Let's say I go even bigger, a 1 MB buffer, AKA new byte[1048576] (come on man, everyone has 1 MB of RAM these days):
Sync: 00:00:00.2925763
Async: 00:00:00.3402034
Then it's just a few hundredths of a second difference. If you blink, you'll miss it.

Your method signature suggests you're doing this from a WPF app. While the blocking code will take up the UI thread for the duration, the async code is forced to go through the UI message queue every time an asynchronous operation completes, slowing it down and competing with any UI messages. You should try moving it off the UI thread like so:
void Button_Click(object sender, RoutedEventArgs e)
{
    Task.Run(() => Button_Click_Impl());
}

async Task Button_Click_Impl()
{
    // put code here.
}
Next, open the file in async mode. If you don't do this, async is emulated and will go much slower:
new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 4096,
    FileOptions.Asynchronous | FileOptions.SequentialScan)
Finally, you may also be able to extract a small performance gain by using ConfigureAwait(false) to avoid moving between threads:
readSize = await fs.ReadAsync(data, 0, 1024).ConfigureAwait(false);
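Putting all three suggestions together (running off the UI thread via Task.Run as shown above, opening in async mode, and using ConfigureAwait(false)), here is a minimal sketch of the whole read loop; the method name and the 1024/4096 sizes are placeholders, not part of the original question:

// Minimal sketch combining the three suggestions above. Call it from
// Button_Click via Task.Run. Buffer and FileStream sizes are placeholders.
async Task ReadFileOffUiThread(string theFile)
{
    byte[] data = new byte[1024];
    using (var fs = new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite,
        4096, FileOptions.Asynchronous | FileOptions.SequentialScan))
    {
        int readSize;
        while ((readSize = await fs.ReadAsync(data, 0, data.Length).ConfigureAwait(false)) > 0)
        {
            // process the chunk here
        }
    }
}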

The overhead of a single ReadAsync operation is much higher than that of a single Read operation (especially if you do not use the right mode when opening the file; see the other answers). If you eventually end up with the whole file in memory anyway, just query the file's size, allocate a large enough buffer, and read all at once. Otherwise, you can still increase the buffer size to e.g. 32 MiB, or even larger if you expect larger files. That should speed things up considerably.
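For the read-everything-at-once case, a minimal sketch, assuming the file fits comfortably in memory (the method name is illustrative):

// Minimal sketch: size the buffer from the file length and loop until full,
// since ReadAsync may legally return fewer bytes than requested.
async Task<byte[]> ReadWholeFileAsync(string path)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
        1 << 20, FileOptions.Asynchronous | FileOptions.SequentialScan))
    {
        var data = new byte[fs.Length];
        int offset = 0;
        while (offset < data.Length)
        {
            int n = await fs.ReadAsync(data, offset, data.Length - offset).ConfigureAwait(false);
            if (n == 0) break; // unexpected end of file
            offset += n;
        }
        return data;
    }
}

On .NET Core 2.0 and later you could also just call File.ReadAllBytesAsync.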
Only bother with launching a new task if there is considerable CPU-bound work for each block. Otherwise the UI should be kept responsive by the ReadAsync operations (with a sufficiently large buffer) taking their time (if a read completes immediately, you may still be blocking the UI; see the remarks at Task.Yield()).

Related

Upload blocks in parallel in blob storage

I am trying to parallelize this to improve the upload time of a file, but nothing I have tried has changed the timing much.
I want to upload the blocks side by side and then commit them. How could I manage to do it in parallel?
public static async Task UploadInBlocks
    (BlobContainerClient blobContainerClient, string localFilePath, int blockSize)
{
    string fileName = Path.GetFileName(localFilePath);
    BlockBlobClient blobClient = blobContainerClient.GetBlockBlobClient(fileName);

    FileStream fileStream = File.OpenRead(localFilePath);
    ArrayList blockIDArrayList = new ArrayList();
    byte[] buffer;

    var bytesLeft = (fileStream.Length - fileStream.Position);

    while (bytesLeft > 0)
    {
        if (bytesLeft >= blockSize)
        {
            buffer = new byte[blockSize];
            await fileStream.ReadAsync(buffer, 0, blockSize);
        }
        else
        {
            buffer = new byte[bytesLeft];
            await fileStream.ReadAsync(buffer, 0, Convert.ToInt32(bytesLeft));
            bytesLeft = (fileStream.Length - fileStream.Position);
        }

        using (var stream = new MemoryStream(buffer))
        {
            string blockID = Convert.ToBase64String
                (Encoding.UTF8.GetBytes(Guid.NewGuid().ToString()));

            blockIDArrayList.Add(blockID);
            await blobClient.StageBlockAsync(blockID, stream);
        }
        bytesLeft = (fileStream.Length - fileStream.Position);
    }

    string[] blockIDArray = (string[])blockIDArrayList.ToArray(typeof(string));
    await blobClient.CommitBlockListAsync(blockIDArray);
}
Of course. You shouldn't expect any improvements - quite the opposite. Blob storage doesn't have any simplistic throughput throttling that would benefit from uploading in multiple streams, and you're already doing extremely light-weight I/O which is going to be entirely I/O bound.
Good I/O code has absolutely no benefits from parallelization. No matter how many workers you put on the job, the pipe is only this thick and will not allow you to pass more data through.
All your code just reimplements the already very efficient mechanisms that the blob storage library has... and you do it considerably worse, with pointless allocation, wrong arguments and new opportunities for bugs. Don't do that. The library can deal with streams just fine.
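For comparison, a minimal sketch of letting the library do the chunking itself (Azure.Storage.Blobs v12; the method name and the concurrency/block-size values are illustrative, not recommendations):

// Minimal sketch, assuming Azure.Storage.Blobs v12. The library splits the
// stream into blocks, stages and commits them for you; StorageTransferOptions
// controls how (values below are illustrative only).
public static async Task UploadWithLibrary(BlobContainerClient blobContainerClient, string localFilePath)
{
    BlobClient blobClient = blobContainerClient.GetBlobClient(Path.GetFileName(localFilePath));

    var options = new BlobUploadOptions
    {
        TransferOptions = new StorageTransferOptions
        {
            MaximumConcurrency = 4,               // concurrent block uploads
            MaximumTransferSize = 8 * 1024 * 1024 // block size in bytes
        }
    };

    using (FileStream fileStream = File.OpenRead(localFilePath))
    {
        await blobClient.UploadAsync(fileStream, options);
    }
}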

The C# app hangs up when reading from and writing to a NamedPipeServerStream *from the same thread*

For the past month or so I've been debugging an apparent deadlock caused by a thread reading from and writing to a NamedPipeServerStream, like this:
NamedPipeServerStream pipestream = new NamedPipeServerStream(@"\\.\foo.bar", PipeDirection.InOut, 1, PipeTransmissionMode.Message, PipeOptions.Asynchronous);
//Wait for client connection here

byte[] buffer = new byte[4096];
while (true)
{
    int n = await pipestream.ReadAsync(buffer, 0, buffer.Length);
    if (n > 0)
    {
        await pipestream.WriteAsync(buffer, 0, n);
        await pipestream.FlushAsync();
    }
    else
    {
        break;
    }
}
This code is part of a WPF app, and when the "deadlock" happens, the entire app including the GUI freezes as if the message loop itself is somehow stopped.
Before the deadlock happens, the app works for some arbitrary amount of time (including the echoback loop) so the implementation seems to be correct except for the deadlock.
(The app runs on x86-64 Windows)
I tried many workarounds including making the calls to Read() and Write() synchronous, calling WaitForPipeDrain(), changing the PipeTransmissionMode to PipeTransmissionMode.Byte, removing the Asynchronous flag, changing the buffer size, not flushing after writing, checking for CanWrite and CanRead results, waiting until IsMessageComplete etc. but they didn't work at all.
What did work in the end was running two threads, one for reading and the other for writing, and echoing the data through a System.IO.Pipelines.Pipe instance, like this (I don't believe the fact that I used System.IO.Pipelines specifically is relevant for the fix though):
Main thread:
System.IO.Pipelines.Pipe pipe = new Pipe();
NamedPipeServerStream pipestream = new NamedPipeServerStream(@"\\.\foo.bar", PipeDirection.InOut, 1, PipeTransmissionMode.Message, PipeOptions.Asynchronous);
//Wait for client connection here
//Start thread 1 and 2
Thread 1:
byte[] buffer = new byte[4096];
while (true)
{
    int n = await pipestream.ReadAsync(buffer, 0, 4096);
    if (n > 0)
    {
        await pipe.Writer.AsStream().WriteAsync(buffer, 0, n);
        await pipe.Writer.AsStream().FlushAsync();
    }
    else
    {
        break;
    }
}
Thread 2:
byte[] buffer = new byte[4096];
while (true)
{
    int n = await pipe.Reader.AsStream().ReadAsync(buffer, 0, 4096);
    if (n > 0)
    {
        await pipestream.WriteAsync(buffer, 0, n);
        await pipestream.FlushAsync();
    }
    else
    {
        break;
    }
}
I did some stress tests on this implementation and the deadlock seems to be gone for good, which is bizarre to me. What I did was basically write the exact same thing, just separated into two threads. The only possible explanation I could come up with is that the Named Pipe implementation on Windows uses some thread-local data structures, but I can't think of any plausible reason why it would.
Does anyone have an idea how this happens? Is it simply that I'm using Named Pipes wrong?
I would be grateful if anyone could point me in the right direction. Thank you in advance.

How can I pause multi-file copy when one is working?

I started a small project for fun, liked it, and started to expand it. I needed a simple file explorer (like the Windows one) to just see and open files, but now I have a problem with multiple copies: I want to copy some files from directory A and paste them in B, and while that is running, copy some files from directory C and paste them in D. If the copy A -> B is in progress, the copy C -> D should be paused, and when the copy A -> B finishes, the second copy can start. For now, I can copy files from A to B. Is there anything not too complex I can try?
I'm using a new form to display the progress bar, file name, file count, and size when starting a copy, and I'm using a BackgroundWorker.
I am assuming you are just calling File.Move or File.Copy. Those don't give you the ability to pause the actual operation, so you will have to write your own Move/Copy operations,
e.g. to copy the file you could do the following:
public void CopyFile(string sourceFileName, string destFileName, bool overwrite)
{
    var outputFileMode = overwrite ? FileMode.Create : FileMode.CreateNew;
    using (var inputStream = new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var outputStream = new FileStream(destFileName, outputFileMode, FileAccess.Write, FileShare.None))
    {
        const int bufferSize = 16384; // 16 KB
        var buffer = new byte[bufferSize];
        int bytesRead;
        do
        {
            bytesRead = inputStream.Read(buffer, 0, bufferSize);
            outputStream.Write(buffer, 0, bytesRead);
        } while (bytesRead > 0); // Read may return fewer bytes than requested, so loop until it returns 0
    }
}
Now that you have this code, you can simply add a while loop with a condition to "pause" the copy, e.g.
while (_pause)
{
    Thread.Sleep(100);
}
This while loop would go inside the do loop from the above code.
Here is the complete idea:
public void CopyFile(string sourceFileName, string destFileName, bool overwrite)
{
    var outputFileMode = overwrite ? FileMode.Create : FileMode.CreateNew;
    using (var inputStream = new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var outputStream = new FileStream(destFileName, outputFileMode, FileAccess.Write, FileShare.None))
    {
        const int bufferSize = 16384; // 16 KB
        var buffer = new byte[bufferSize];
        int bytesRead;
        do
        {
            // run this loop until _pause = false
            while (_pause)
            {
                Thread.Sleep(100);
            }
            bytesRead = inputStream.Read(buffer, 0, bufferSize);
            outputStream.Write(buffer, 0, bytesRead);
        } while (bytesRead > 0);
    }
}
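For this to work, _pause needs to be a flag the UI thread can toggle while the copy loop polls it. A minimal sketch of that missing plumbing (the Pause/Resume names are just illustrative):

// Hypothetical wiring for the _pause flag used above; volatile ensures the
// copy loop promptly sees changes made from the UI thread.
private volatile bool _pause;

public void Pause() => _pause = true;
public void Resume() => _pause = false;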
You could use objects to represent different operations, something like
public interface IFileOperation
{
    void Execute(Action<double> reportProgress, CancellationToken cancel);
}
You could then create a queue of multiple operations, and create a task on another thread to process each item
private CancellationTokenSource cts = new CancellationTokenSource();
public double CurrentProgress { get; private set; }
public void CancelCurrentOperation() => cts.Cancel();

public BlockingCollection<IFileOperation> Queue = new BlockingCollection<IFileOperation>(new ConcurrentQueue<IFileOperation>());

public void RunOnWorkerThread()
{
    foreach (var op in Queue.GetConsumingEnumerable())
    {
        cts = new CancellationTokenSource();
        CurrentProgress = 0;
        op.Execute(p => CurrentProgress = p, cts.Token);
    }
}
This will run file operations one at a time on a background thread, while allowing new operations to be added from the main thread. To report progress you would need a non-modal progress bar, i.e. instead of showing a dialog you should add a progress bar control somewhere in your UI; otherwise you would not be able to add new operations without cancelling the current one. You will also need some way to connect the progress bar to the currently running operation, for example by running a timer on the main thread that updates the property the progress bar is bound to. You can run the method either as a long-running task or on a dedicated thread.
You could, if you wish, add a pause/resume method to the FileOperation. The answer by Rand Random shows how to copy files manually, so I will skip this here. You could also create a UI that will show a list of all queued file operations, and allow removing queued tasks. You could even, with some more work, run multiple operations in parallel, and show separate progress for each one.
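To make that concrete, here is a minimal sketch of one IFileOperation implementation (the class name, buffer size, and progress scale are illustrative; the pause loop from the other answer would slot into Execute the same way):

// Hypothetical IFileOperation implementation: copies one file and reports
// progress as a fraction of bytes copied. Cancellation is checked per chunk.
public class CopyFileOperation : IFileOperation
{
    private readonly string source, dest;

    public CopyFileOperation(string source, string dest)
    {
        this.source = source;
        this.dest = dest;
    }

    public void Execute(Action<double> reportProgress, CancellationToken cancel)
    {
        var buffer = new byte[16384];
        using (var input = new FileStream(source, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var output = new FileStream(dest, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            long total = input.Length, copied = 0;
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                cancel.ThrowIfCancellationRequested();
                output.Write(buffer, 0, bytesRead);
                copied += bytesRead;
                reportProgress(total == 0 ? 1.0 : (double)copied / total);
            }
        }
    }
}

Queueing a copy then becomes Queue.Add(new CopyFileOperation(source, dest));.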

C# async/await progress reporting is not in expected order

I am experimenting with async/await and progress reporting and therefore have written an async file copy method that reports progress after every copied MB:
public async Task CopyFileAsync(string sourceFile, string destFile, CancellationToken ct, IProgress<int> progress)
{
    var bufferSize = 1024 * 1024;
    byte[] bytes = new byte[bufferSize];
    using (var source = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
    {
        using (var dest = new FileStream(destFile, FileMode.Create, FileAccess.Write))
        {
            var totalBytes = source.Length;
            var copiedBytes = 0;
            var bytesRead = -1;
            while ((bytesRead = await source.ReadAsync(bytes, 0, bufferSize, ct)) > 0)
            {
                await dest.WriteAsync(bytes, 0, bytesRead, ct);
                copiedBytes += bytesRead;
                progress?.Report((int)(copiedBytes * 100 / totalBytes));
            }
        }
    }
}
In a console application I create a file with 10 MB of random content and then copy it using the method above:
private void MainProgram(string[] args)
{
    Console.WriteLine("Create File...");
    var dir = Path.GetDirectoryName(typeof(MainClass).Assembly.Location);
    var file = Path.Combine(dir, "file.txt");
    var dest = Path.Combine(dir, "fileCopy.txt");
    var rnd = new Random();
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    var str = new string(Enumerable
        .Range(0, 1024 * 1024 * 10)
        .Select(i => chars[rnd.Next(chars.Length)])
        .ToArray());
    File.WriteAllText(file, str);

    var source = new CancellationTokenSource();
    var token = source.Token;
    var progress = new Progress<int>();
    progress.ProgressChanged += (sender, percent) => Console.WriteLine($"Progress: {percent}%");
    var task = CopyFileAsync(file, dest, token, progress);
    Console.WriteLine("Start Copy...");
    Console.ReadLine();
}
After the application has executed, both files are identical, so the copy process is carried out in the correct order. However, the Console output is something like:
Create File...
Start Copy...
Progress: 10%
Progress: 30%
Progress: 20%
Progress: 60%
Progress: 50%
Progress: 70%
Progress: 80%
Progress: 40%
Progress: 90%
Progress: 100%
The order differs every time I run the application. I don't understand this behaviour. If I put a breakpoint in the event handler and check each value, they are in the correct order. Can anyone explain this to me?
I want to use this later in a GUI application with a progress bar and don't want to have it jumping back and forth all the time.
Progress<T> captures the current SynchronizationContext when created. If there is no SynchronizationContext (as in a console app), progress callbacks are scheduled to thread pool threads. That means multiple callbacks can even run in parallel, and of course order is not guaranteed.
In UI applications, posting to synchronization context is roughly equivalent to:
In WPF: Dispatcher.BeginInvoke()
In WinForms: Control.BeginInvoke
I'm not working with WinForms, but in WPF, multiple BeginInvoke calls with the same priority (and in this case they do have the same priority) are guaranteed to execute in the order they were invoked:
If multiple BeginInvoke calls are made at the same DispatcherPriority, they will be executed in the order the calls were made.
I don't see why Control.BeginInvoke in WinForms might execute out of order either, but I'm not aware of a proof like the one I provided above for WPF. So I think in both WPF and WinForms you can safely rely on your progress callbacks being executed in order (provided that you created the Progress<T> instance itself on the UI thread, so the context could be captured).
Side note: don't forget to add ConfigureAwait(false) to your ReadAsync and WriteAsync calls to prevent returning to the UI thread after every await in UI applications.
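If you do want in-order reports in the console test itself, here is a minimal sketch of a synchronous IProgress<T> (the class name is made up; unlike Progress<T> it invokes the callback inline on the reporting thread, so keep the handler cheap and thread-safe):

// Hypothetical alternative to Progress<T> for the console test: Report runs
// the callback synchronously on the calling thread, so reports arrive in order.
public sealed class SynchronousProgress<T> : IProgress<T>
{
    private readonly Action<T> handler;
    public SynchronousProgress(Action<T> handler) => this.handler = handler;
    public void Report(T value) => handler(value);
}

// Usage in the example above:
// var progress = new SynchronousProgress<int>(percent => Console.WriteLine($"Progress: {percent}%"));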

async / await vs BeginRead, EndRead

I don't quite 'get' async and await yet, and I'm looking for some clarification around a particular problem I'm about to solve. Basically, I need to write some code that'll handle a TCP connection. It'll essentially just receive data and process it until the connection is closed.
I'd normally write this code using the NetworkStream BeginRead and EndRead pattern, but since the async / await pattern is much cleaner, I'm tempted to use that instead. However, since I admittedly don't fully understand exactly what is involved in these, I'm a little wary of the consequences. Will one use more resources than the other; will one use a thread where another would use IOCP, etc.
Convoluted example time. These two do the same thing - count the bytes in a stream:
class StreamCount
{
    private Stream str;
    private int total = 0;
    private byte[] buffer = new byte[1000];

    public Task<int> CountBytes(Stream str)
    {
        this.str = str;
        var tcs = new TaskCompletionSource<int>();
        Action onComplete = () => tcs.SetResult(total);
        str.BeginRead(this.buffer, 0, 1000, this.BeginReadCallback, onComplete);
        return tcs.Task;
    }

    private void BeginReadCallback(IAsyncResult ar)
    {
        var bytesRead = str.EndRead(ar);
        if (bytesRead == 0)
        {
            ((Action)ar.AsyncState)();
        }
        else
        {
            total += bytesRead;
            str.BeginRead(this.buffer, 0, 1000, this.BeginReadCallback, ar.AsyncState);
        }
    }
}
... And...
public static async Task<int> CountBytes(Stream str)
{
    var buffer = new byte[1000];
    var total = 0;
    while (true)
    {
        int bytesRead = await str.ReadAsync(buffer, 0, 1000);
        if (bytesRead == 0)
        {
            break;
        }
        total += bytesRead;
    }
    return total;
}
To my eyes, the async way looks cleaner, but there is that 'while (true)' loop that my uneducated brain tells me is going to use an extra thread, more resources, and therefore won't scale as well as the other one. But I'm fairly sure that is wrong. Are these doing the same thing in the same way?
To my eyes, the async way looks cleaner, but there is that 'while (true)' loop that my uneducated brain tells me is going to use an extra thread, more resources, and therefore won't scale as well as the other one.
Nope, it won't. The loop will only use a thread when it's actually running code... just as it would in your BeginRead callback. The await expression will return control to whatever the calling code is, having registered a continuation which jumps back to the right place in the method (in an appropriate thread, based on the synchronization context) and then continues running until it either gets to the end of the method or hits another await expression. It's exactly what you want :)
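For intuition, here is roughly the shape the compiler produces for the await loop (illustrative only; the real generated code is a state machine, not recursion, and the method name here is made up):

// Rough illustration only. Between reads, no thread is parked; the
// continuation fires on an I/O completion, does its bit of work, and
// schedules the next read.
public static Task<int> CountBytesCps(Stream str)
{
    var buffer = new byte[1000];
    var tcs = new TaskCompletionSource<int>();
    int total = 0;

    void ReadNext()
    {
        str.ReadAsync(buffer, 0, 1000).ContinueWith(t =>
        {
            if (t.IsFaulted) { tcs.SetException(t.Exception.InnerExceptions); return; }
            int bytesRead = t.Result;
            if (bytesRead == 0) { tcs.SetResult(total); return; }
            total += bytesRead;
            ReadNext();
        });
    }

    ReadNext();
    return tcs.Task;
}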
It's worth learning more about how async/await works behind the scenes; the MSDN page on it is a good jumping-off point.
