C# async/await progress reporting is not in the expected order

I am experimenting with async/await and progress reporting and therefore have written an async file copy method that reports progress after every copied MB:
public async Task CopyFileAsync(string sourceFile, string destFile, CancellationToken ct, IProgress<int> progress)
{
    var bufferSize = 1024 * 1024;
    byte[] bytes = new byte[bufferSize];
    using (var source = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
    using (var dest = new FileStream(destFile, FileMode.Create, FileAccess.Write))
    {
        var totalBytes = source.Length;
        var copiedBytes = 0L; // long, so copiedBytes * 100 cannot overflow for large files
        var bytesRead = -1;
        while ((bytesRead = await source.ReadAsync(bytes, 0, bufferSize, ct)) > 0)
        {
            await dest.WriteAsync(bytes, 0, bytesRead, ct);
            copiedBytes += bytesRead;
            progress?.Report((int)(copiedBytes * 100 / totalBytes));
        }
    }
}
In a console application I create a file with 10 MB of random content and then copy it using the method above:
private void MainProgram(string[] args)
{
    Console.WriteLine("Create File...");
    var dir = Path.GetDirectoryName(typeof(MainClass).Assembly.Location);
    var file = Path.Combine(dir, "file.txt");
    var dest = Path.Combine(dir, "fileCopy.txt");
    var rnd = new Random();
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    var str = new string(Enumerable
        .Range(0, 1024 * 1024 * 10)
        .Select(i => chars[rnd.Next(chars.Length)])
        .ToArray());
    File.WriteAllText(file, str);
    var source = new CancellationTokenSource();
    var token = source.Token;
    var progress = new Progress<int>();
    progress.ProgressChanged += (sender, percent) => Console.WriteLine($"Progress: {percent}%");
    var task = CopyFileAsync(file, dest, token, progress);
    Console.WriteLine("Start Copy...");
    Console.ReadLine();
}
After the application has executed, both files are identical, so the copy process is carried out in the correct order. However, the Console output is something like:
Create File...
Start Copy...
Progress: 10%
Progress: 30%
Progress: 20%
Progress: 60%
Progress: 50%
Progress: 70%
Progress: 80%
Progress: 40%
Progress: 90%
Progress: 100%
The order differs every time I run the application. I don't understand this behaviour. If I put a breakpoint in the event handler and check each value, they are in the correct order. Can anyone explain this to me?
I want to use this later in a GUI application with a progress bar and don't want to have it jumping back and forward all the time.

Progress<T> captures the current SynchronizationContext when it is created. If there is no SynchronizationContext (as in a console app), progress callbacks are scheduled to thread-pool threads. That means multiple callbacks can even run in parallel, and of course their order is not guaranteed.
In UI applications, posting to the synchronization context is roughly equivalent to:
In WPF: Dispatcher.BeginInvoke()
In WinForms: Control.BeginInvoke()
I'm not working with WinForms, but in WPF, multiple BeginInvoke calls with the same priority (and in this case they do have the same priority) are guaranteed to execute in the order they were invoked:
If multiple BeginInvoke calls are made at the same DispatcherPriority, they will be executed in the order the calls were made.
I don't see why Control.BeginInvoke in WinForms might execute out of order either, but I'm not aware of a proof like the one I provided above for WPF. So I think that in both WPF and WinForms you can safely rely on your progress callbacks executing in order (provided that you created the Progress<T> instance itself on the UI thread, so that the context could be captured).
Side note: don't forget to add ConfigureAwait(false) to your ReadAsync and WriteAsync calls, to avoid returning to the UI thread after every one of those awaits in UI applications.
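For the console scenario specifically, one way to get strictly ordered reports is to bypass the context capture altogether and invoke the callback synchronously on the copying thread. A minimal sketch (SynchronousProgress is an illustrative name, not a BCL type):

public sealed class SynchronousProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;
    public SynchronousProgress(Action<T> handler) => _handler = handler;
    // Unlike Progress<T>, this reports inline on the caller's thread,
    // so reports arrive in exactly the order they were made.
    public void Report(T value) => _handler(value);
}

Used in place of new Progress<int>():

var progress = new SynchronousProgress<int>(percent => Console.WriteLine($"Progress: {percent}%"));

The trade-off is that the handler now runs on the copying thread, which is fine for console output but not for direct UI updates.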

Related

Why does File.ReadAllLinesAsync() block the UI thread?

Here is my code, an event handler for a WPF button that reads the lines of a file:
private async void Button_OnClick(object sender, RoutedEventArgs e)
{
    Button.Content = "Loading...";
    var lines = await File.ReadAllLinesAsync(@"D:\temp.txt"); // Why is this blocking the UI thread???
    Button.Content = "Show"; // Reset button text
}
I used the asynchronous version of the File.ReadAllLines() method in a .NET Core 3.1 WPF app.
But it is blocking the UI Thread! Why?
Update: Following @Theodor Zoulias, I ran the same test:
private async void Button_OnClick(object sender, RoutedEventArgs e)
{
    Button.Content = "Loading...";
    TextBox.Text = "";
    var stopwatch = Stopwatch.StartNew();
    var task = File.ReadAllLinesAsync(@"D:\temp.txt"); // Problem
    var duration1 = stopwatch.ElapsedMilliseconds;
    var isCompleted = task.IsCompleted;
    stopwatch.Restart();
    var lines = await task;
    var duration2 = stopwatch.ElapsedMilliseconds;
    Debug.WriteLine($"Create: {duration1:#,0} msec, Task.IsCompleted: {isCompleted}");
    Debug.WriteLine($"Await: {duration2:#,0} msec, Lines: {lines.Length:#,0}");
    Button.Content = "Show";
}
The result is:
Create: 652 msec, Task.IsCompleted: False | Await: 15 msec, Lines: 480,001
.NET Core 3.1, C# 8, WPF, Debug build | 7.32 MB file (.txt) | 5400 RPM SATA HDD
Sadly currently (.NET 5) the built-in asynchronous APIs for accessing the filesystem are not implemented consistently according to Microsoft's own recommendations about how asynchronous methods are expected to behave.
An asynchronous method that is based on TAP can do a small amount of work synchronously, such as validating arguments and initiating the asynchronous operation, before it returns the resulting task. Synchronous work should be kept to the minimum so the asynchronous method can return quickly.
Methods like StreamReader.ReadToEndAsync do not behave this way, and instead block the current thread for a considerable amount of time before returning an incomplete Task. For example in an older experiment of mine with reading a 6MB file from my SSD, this method blocked the calling thread for 120 msec, returning a Task that was then completed after only 20 msec. My suggestion is to avoid using the asynchronous filesystem APIs from GUI applications, and use instead the synchronous APIs wrapped in Task.Run.
var lines = await Task.Run(() => File.ReadAllLines(@"D:\temp.txt"));
Update: Here are some experimental results with File.ReadAllLinesAsync:
Stopwatch stopwatch = Stopwatch.StartNew();
Task<string[]> task = File.ReadAllLinesAsync(@"C:\6MBfile.txt");
long duration1 = stopwatch.ElapsedMilliseconds;
bool isCompleted = task.IsCompleted;
stopwatch.Restart();
string[] lines = await task;
long duration2 = stopwatch.ElapsedMilliseconds;
Console.WriteLine($"Create: {duration1:#,0} msec, Task.IsCompleted: {isCompleted}");
Console.WriteLine($"Await: {duration2:#,0} msec, Lines: {lines.Length:#,0}");
Output:
Create: 450 msec, Task.IsCompleted: False
Await: 5 msec, Lines: 204,000
The method File.ReadAllLinesAsync blocked the current thread for 450 msec, and the returned task completed after 5 msec. These measurements are consistent after multiple runs.
.NET Core 3.1.3, C# 8, Console App, Release build (no debugger attached), Windows 10, SSD Toshiba OCZ Arc 100 240GB
.NET 6 update. The same test on the same hardware using .NET 6:
Create: 19 msec, Task.IsCompleted: False
Await: 366 msec, Lines: 204,000
The implementation of the asynchronous filesystem APIs has been improved in .NET 6, but they are still far behind the synchronous APIs (about 2 times slower, and not totally asynchronous). So my suggestion to use the synchronous APIs wrapped in Task.Run still holds.
Thanks to Theodor Zoulias for the answer; it's correct and working.
An async method runs synchronously on the calling thread until it actually yields, and here File.ReadAllLinesAsync does much of its work synchronously. The calling thread in this case is the main thread, so it is stuck doing the reading work and the UI freezes (the UI is handled by the main thread).
To share more information with other users, I created a Visual Studio solution to demonstrate the ideas practically.
Problem: Read a huge file async and process it without freezing the UI.
Case 1: If it happens rarely, my recommendation is to read and process the file content on a background thread that ends when it finishes. Use the lines of code below in the button's on-click event.
OpenFileDialog fileDialog = new OpenFileDialog()
{
    Multiselect = false,
    Filter = "All files (*.*)|*.*"
};
var b = fileDialog.ShowDialog();
if (string.IsNullOrEmpty(fileDialog.FileName))
    return;
Task.Run(async () =>
{
    var fileContent = await File.ReadAllLinesAsync(fileDialog.FileName, Encoding.UTF8);
    // Process the file content
    label1.Invoke((MethodInvoker)delegate
    {
        label1.Text = fileContent.Length.ToString();
    });
});
Case 2: If it happens continuously, my recommendation is to create a channel and subscribe to it on a background thread. Whenever a new file name is published, the consumer will read the file asynchronously and process it.
Call the method below (InitializeChannelReader) in your constructor to subscribe to the channel.
private async Task InitializeChannelReader(CancellationToken cancellationToken)
{
    do
    {
        var newFileName = await _newFilesChannel.Reader.ReadAsync(cancellationToken);
        var fileContent = await File.ReadAllLinesAsync(newFileName, Encoding.UTF8);
        // Process the file content
        label1.Invoke((MethodInvoker)delegate
        {
            label1.Text = fileContent.Length.ToString();
        });
    } while (!cancellationToken.IsCancellationRequested);
}
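The _newFilesChannel field itself is not shown in the answer; a minimal sketch of how it might be declared and wired up, assuming System.Threading.Channels and a WinForms form constructor (the MainForm name is illustrative):

private readonly Channel<string> _newFilesChannel = Channel.CreateUnbounded<string>();

public MainForm()
{
    InitializeComponent();
    // Fire-and-forget subscription; pass a real CancellationToken
    // if you want to stop the reader on shutdown.
    _ = InitializeChannelReader(CancellationToken.None);
}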
Call the code below to publish a file name to the channel, where it will be consumed by the consumer. Use these lines in the button's on-click event.
OpenFileDialog fileDialog = new OpenFileDialog()
{
    Multiselect = false,
    Filter = "All files (*.*)|*.*"
};
var b = fileDialog.ShowDialog();
if (string.IsNullOrEmpty(fileDialog.FileName))
    return;
await _newFilesChannel.Writer.WriteAsync(fileDialog.FileName);

Parallel requests to scrape multiple pages of a website

I want to scrape a website with plenty of pages of interesting data, but as the source is very large I want to multithread and limit the load.
I use a Parallel.ForEach to start each chunk of 10 tasks, and in the main for loop I wait until the number of active threads drops below a threshold. For that I use a counter of active threads that I increment when starting a new thread with a WebClient, and decrement when the WebClient's DownloadStringCompleted event is triggered.
Originally the question was how to use DownloadStringTaskAsync instead of DownloadString and wait until each of the threads started in the Parallel.ForEach had completed. This has been solved with a workaround: a counter (activeThreads) and a Thread.Sleep in the main for loop.
Is using await DownloadStringTaskAsync instead of DownloadString supposed to improve the speed at all, by freeing a thread while waiting for the DownloadString data to arrive?
And to get back to the original question, is there a way to do this more elegantly using TPL, without the workaround of involving a counter?
private static volatile int activeThreads = 0;

public static void RecordData()
{
    var groupSize = 10;
    var source = db.ListOfUrls; // Thousands of urls
    var iterations = source.Length / groupSize;
    for (int i = 0; i < iterations; i++)
    {
        var subList = source.Skip(groupSize * i).Take(groupSize);
        Parallel.ForEach(subList, (item) => RecordUri(item));
        // I want to wait here before processing further data, to avoid overload
        while (activeThreads > 30) Thread.Sleep(100);
    }
}

private static async Task RecordUri(Uri uri)
{
    using (WebClient wc = new WebClient())
    {
        Interlocked.Increment(ref activeThreads);
        wc.DownloadStringCompleted += (sender, e) => Interlocked.Decrement(ref activeThreads);
        var jsonData = await wc.DownloadStringTaskAsync(uri);
        var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
        RecordData(root);
    }
}
If you want an elegant solution you should use Microsoft's Reactive Framework. It's dead simple:
var source = db.ListOfUrls; // Thousands of urls

var query =
    from uri in source.ToObservable()
    from jsonData in Observable.Using(
        () => new WebClient(),
        wc => Observable.FromAsync(() => wc.DownloadStringTaskAsync(uri)))
    select new { uri, json = JsonConvert.DeserializeObject<RootObject>(jsonData) };

IDisposable subscription =
    query.Subscribe(x =>
    {
        /* Do something with x.uri && x.json */
    });
That's the entire code. It's nicely multi-threaded and it's kept under control.
Just NuGet "System.Reactive" to get the bits.
Parallel.ForEach
Will create up to ProcessorCount tasks to execute the function for each item in the source enumerable. It takes care that there are not too many tasks and waits for all items and tasks to be executed.
Task.WhenAll
Only awaits the given tasks; it does not execute them. It is in your hands to execute them in a proper way and not too many at once.
But there is a fault in your code: the function RecordUri returns a task that has to be awaited, otherwise the ForEach will just create more and more tasks, never knowing when the current one has completed. It is also problematic that you create a task inside a task, where the outer task does nothing but wait for the inner one.
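As a sketch of that idea, the counter and Thread.Sleep can be replaced by awaiting each batch with Task.WhenAll; this assumes RecordData is turned into an async method (renamed RecordDataAsync here purely for illustration):

public static async Task RecordDataAsync()
{
    var groupSize = 10;
    var source = db.ListOfUrls;
    for (int i = 0; i < source.Length; i += groupSize)
    {
        var subList = source.Skip(i).Take(groupSize);
        // Await the whole batch before starting the next one,
        // so no counter or Thread.Sleep is needed.
        await Task.WhenAll(subList.Select(uri => RecordUri(uri)));
    }
}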
You might also want to take a look at this overload of Parallel.ForEach
https://msdn.microsoft.com/en-us/library/dd782934(v=vs.110).aspx
Edit
Is using await DownloadStringTaskAsync instead of DownloadString supposed to improve the speed at all, by freeing a thread while waiting for the DownloadString data to arrive?
No. When a task is awaiting an external resource it enters a suspended state (the Windows API does not busy-wait), so there is not much difference.
What differs is the overhead the compiler generates when compiling your async code. DownloadStringTaskAsync will create a task that contains the long operation. If you await it, you attach yourself to that task (via ContinueWith), so you create one task just to await another. This is the overhead I was talking about above.
My approach would be: use the synchronous method inside your Parallel.ForEach. The threading will be done by the TPL and you are free to go on.
Remember "KISS"
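A minimal sketch of that approach, reusing db.ListOfUrls and RecordData from the question; the MaxDegreeOfParallelism value of 10 is an illustrative cap, not a recommendation:

var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };
Parallel.ForEach(db.ListOfUrls, options, uri =>
{
    using (var wc = new WebClient())
    {
        // Synchronous download; Parallel.ForEach supplies the worker threads.
        var jsonData = wc.DownloadString(uri);
        var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
        RecordData(root);
    }
});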

App that uses Task Scheduler quickly runs out of memory

The app parses files in a directory while new files are being added to the directory. It uses a ConcurrentQueue, and I tried to split the work across the number of cores. So if there are files to process, it should process up to 4 (cores) files concurrently.
Yet the app runs OOM within seconds, after processing 10-30 files. I see the memory consumption grow to ~1.5 GB quickly, then the OOM error appears.
I'm new to task schedulers, so I'm probably doing something wrong.
File parsing is done by running some .exe on the file, which uses <5 MB of RAM.
The task scheduler runs every time the timer elapses. But it runs OOM even before the timer has elapsed a second time.
private void OnTimedEvent(object source, ElapsedEventArgs e)
{
    DirectoryInfo info = new DirectoryInfo(AssemblyDirectory);
    FileInfo[] allSrcFiles = info.GetFiles("*.dat").OrderBy(p => p.CreationTime).ToArray();
    var validSrcFiles = allSrcFiles.Where(p => (DateTime.Now - p.CreationTime) > TimeSpan.FromSeconds(60));
    var newFilesToParse = validSrcFiles.Where(f => !ProcessedFiles.Contains(f.Name));
    if (newFilesToParse.Any()) Console.WriteLine("Adding " + newFilesToParse.Count() + " files to the Queue");

    foreach (var file in newFilesToParse)
    {
        FilesToParseQueue.Enqueue(file);
        ProcessedFiles.Add(file.Name);
    }

    if (!busy)
    {
        if (FilesToParseQueue.Any())
        {
            busy = true;
            Console.WriteLine("");
            Console.WriteLine("There are " + FilesToParseQueue.Count + " files in queue. Processing...");
        }

        var scheduler = new LimitedConcurrencyLevelTaskScheduler(coresCount); // 4
        TaskFactory factory = new TaskFactory(scheduler);
        while (FilesToParseQueue.Any())
        {
            factory.StartNew(() =>
            {
                FileInfo file;
                if (FilesToParseQueue.TryDequeue(out file))
                {
                    ParseFile(file);
                }
            });
        }

        if (!FilesToParseQueue.Any())
        {
            busy = false;
            Console.WriteLine("Finished processing Files in the Queue. Waiting for new files...");
        }
    }
}
Your code keeps creating new Tasks as long as there are files to process, and it does so much faster than the files can be processed. But it has no other limit (like the number of files in the directory), which is why it quickly runs out of memory.
A simple fix would be to move the dequeuing outside the loop:
while (true)
{
    FileInfo file;
    if (FilesToParseQueue.TryDequeue(out file))
    {
        factory.StartNew(() => ParseFile(file));
    }
    else
    {
        break;
    }
}
You would get even better performance if you created just one Task per core and processed the files using a loop inside those Tasks.
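A minimal sketch of that suggestion, reusing FilesToParseQueue, ParseFile, and coresCount from the question:

var workers = new Task[coresCount];
for (int i = 0; i < coresCount; i++)
{
    workers[i] = Task.Run(() =>
    {
        // Each worker loops until the queue is drained, so at most
        // coresCount tasks exist at any time, regardless of queue size.
        FileInfo file;
        while (FilesToParseQueue.TryDequeue(out file))
        {
            ParseFile(file);
        }
    });
}
Task.WaitAll(workers); // blocks the timer callback until the batch is done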
This kind of problem (where you queue multiple units of work and want them processed in parallel) is a perfect fit for TPL Dataflow:
private async void OnTimedEvent(object source, ElapsedEventArgs e)
{
    DirectoryInfo info = new DirectoryInfo(AssemblyDirectory);
    FileInfo[] allSrcFiles = info.GetFiles("*.dat").OrderBy(p => p.CreationTime).ToArray();
    var validSrcFiles = allSrcFiles.Where(p => (DateTime.Now - p.CreationTime) > TimeSpan.FromSeconds(60));
    var newFilesToParse = validSrcFiles.Where(f => !ProcessedFiles.Contains(f.Name));
    if (newFilesToParse.Any()) Console.WriteLine("Adding " + newFilesToParse.Count() + " files to the Queue");

    var blockOptions = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = coresCount,
    };
    var block = new ActionBlock<FileInfo>(ParseFile, blockOptions);

    var filesToParseCount = 0;
    foreach (var file in newFilesToParse)
    {
        block.Post(file);
        ProcessedFiles.Add(file.Name);
        ++filesToParseCount;
    }
    Console.WriteLine("There are " + filesToParseCount + " files in queue. Processing...");

    block.Complete();
    await block.Completion;
    Console.WriteLine("Finished processing Files in the Queue. Waiting for new files...");
}
Basic solution
You can actually fix your code by stripping it down to the bare essentials like so:
// This is technically a misnomer. It should be
// called "FileNamesQueuedForProcessing" or similar.
// Non-thread-safe. Assuming timer callback access only.
private readonly HashSet<string> ProcessedFiles = new HashSet<string>();

private readonly LimitedConcurrencyLevelTaskScheduler LimitedConcurrencyScheduler = new LimitedConcurrencyLevelTaskScheduler(Environment.ProcessorCount);

private void OnTimedEvent(object source, ElapsedEventArgs e)
{
    DirectoryInfo info = new DirectoryInfo(AssemblyDirectory);

    // Slightly rewritten to cut down on allocations.
    FileInfo[] newFilesToParse = info
        .GetFiles("*.dat")
        .Where(f =>
            (DateTime.Now - f.CreationTime) > TimeSpan.FromSeconds(60) && // I'd consider removing this filter.
            !ProcessedFiles.Contains(f.Name))
        .OrderBy(p => p.CreationTime)
        .ToArray();

    if (newFilesToParse.Length != 0) Console.WriteLine("Adding " + newFilesToParse.Length + " files to the Queue");

    foreach (FileInfo file in newFilesToParse)
    {
        // Fire and forget.
        // You can add the resulting task to a shared thread-safe collection
        // if you want to observe completion/exceptions/cancellations.
        Task.Factory.StartNew(
            () => ParseFile(file)
            , CancellationToken.None
            , TaskCreationOptions.DenyChildAttach
            , LimitedConcurrencyScheduler
        );

        ProcessedFiles.Add(file.Name);
    }
}
Note how I am not doing any kind of load balancing on my own, instead relying on LimitedConcurrencyLevelTaskScheduler to perform as advertised - that is, accept all work items immediately on Task.Factory.StartNew, queue them internally and process them at some point in the future on up to [N = max degree of parallelism] thread pool threads.
P.S. I'm assuming that OnTimedEvent will always fire on the same thread. If not, a small change will be necessary to ensure thread safety:
private void OnTimedEvent(object source, ElapsedEventArgs e)
{
    lock (ProcessedFiles)
    {
        // As above.
    }
}
Alternative solution
Now, here's a slightly more novel approach: how about we get rid of the timer and LimitedConcurrencyLevelTaskScheduler and encapsulate all of the processing in a single, modular pipeline? There will be a lot of blocking code (unless you break out TPL Dataflow - but I'll stick with Base Class Library types here), but the messaging between stages is so easy it makes for a really appealing design (in my opinion of course).
private async Task PipelineAsync()
{
    const int MAX_FILES_TO_BE_QUEUED = 16;

    using (BlockingCollection<FileInfo> queue = new BlockingCollection<FileInfo>(boundedCapacity: MAX_FILES_TO_BE_QUEUED))
    {
        Task producer = Task.Run(async () =>
        {
            try
            {
                while (true)
                {
                    DirectoryInfo info = new DirectoryInfo(AssemblyDirectory);
                    HashSet<string> namesOfFilesQueuedForProcessing = new HashSet<string>();

                    FileInfo[] newFilesToParse = info
                        .GetFiles("*.dat")
                        .Where(f =>
                            (DateTime.Now - f.CreationTime) > TimeSpan.FromSeconds(60) &&
                            !ProcessedFiles.Contains(f.Name))
                        .OrderBy(p => p.CreationTime) // Processing order is not guaranteed.
                        .ToArray();

                    foreach (FileInfo file in newFilesToParse)
                    {
                        // This will block if we reach bounded capacity thereby throttling
                        // the producer (meaning we'll never overflow the handover collection).
                        queue.Add(file);
                        namesOfFilesQueuedForProcessing.Add(file.Name);
                    }

                    await Task.Delay(TimeSpan.FromSeconds(60)).ConfigureAwait(false);
                }
            }
            finally
            {
                // Exception? Cancellation? We'll let the
                // consumer know that it can wind down.
                queue.CompleteAdding();
            }
        });

        Task consumer = Task.Run(() =>
        {
            ParallelOptions options = new ParallelOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };

            Parallel.ForEach(queue.GetConsumingEnumerable(), options, file => ParseFile(file));
        });

        await Task.WhenAll(producer, consumer).ConfigureAwait(false);
    }
}
This pattern in its general form is described in Stephen Toub's "Patterns of Parallel Programming", page 55. I highly recommend having a look.
The trade-off here is the amount of blocking that you'll be doing due to using BlockingCollection<T> and Parallel.ForEach. The benefits of the pipeline as a concept are numerous though: new stages (Task instances) are easy to add, completion and cancellation are easy to wire in, both producer and consumer exceptions are observed, and all the mutable state is delightfully local.

Advice on processing a giant text file and processing URLs

I'm currently trying to loop through a text file that is about 1.5 GB in size, and then use the URLs grabbed from it to pull down the HTML of each site.
For speed, I'm trying to process all the HTTP requests on new threads, but since C# is not my strongest language (just a requirement for what I'm doing), I'm a bit confused about good threading practice.
This is how I'm processing the list:
private static void Main()
{
    const Int32 BufferSize = 128;
    using (var fileStream = File.OpenRead("dump.txt"))
    using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
    {
        String line;
        var progress = 0;
        while ((line = streamReader.ReadLine()) != null)
        {
            var stuff = line.Split('|');
            getHTML(stuff[3]);
            progress += 1;
            Console.WriteLine(progress);
        }
    }
}
And I'm pulling down the HTML like so:
private static void getHTML(String url)
{
    new Thread(() =>
    {
        var client = new DecompressGzipResponse();
        var html = client.DownloadString(url);
    }).Start();
}
Though the speeds are fast initially, after about 20 thousand URLs they slow down, and eventually after 32 thousand the application hangs and crashes. I was under the impression that C# threads terminated when the function completed?
Can anyone give any examples/ suggestions on how to do this better?
One very reliable way to do this is with the producer-consumer pattern. You create a thread-safe queue of URLs (for example, a BlockingCollection<Uri>). Your main thread is the producer, which adds items to the queue. You then have multiple consumer threads, each of which reads URLs from the queue and performs the HTTP requests. See BlockingCollection.
Setting it up isn't terribly difficult:
BlockingCollection<Uri> UrlQueue = new BlockingCollection<Uri>();

// Main thread starts the consumer threads
Task t1 = Task.Factory.StartNew(ProcessUrls, TaskCreationOptions.LongRunning);
Task t2 = Task.Factory.StartNew(ProcessUrls, TaskCreationOptions.LongRunning);
// create more tasks if you think necessary

// Now read your file
foreach (var line in File.ReadLines(inputFileName))
{
    var theUri = ExtractUriFromLine(line);
    UrlQueue.Add(theUri);
}

// when done adding lines to the queue, mark the queue as complete
UrlQueue.CompleteAdding();

// now wait for the tasks to complete
t1.Wait();
t2.Wait();
// You could also use Task.WaitAll if you have an array of tasks
The individual threads process the urls with this method:
void ProcessUrls()
{
    foreach (var uri in UrlQueue.GetConsumingEnumerable())
    {
        // code here to do a web request on that url
    }
}
That's a simple and reliable way to do things, but it's not especially quick. You can do much better by using a second queue of WebClient objects that make asynchronous requests. For example, say you want to have 15 asynchronous requests in flight. You start the same way with a BlockingCollection, but you only have one persistent consumer thread.
const int MaxRequests = 15;
BlockingCollection<WebClient> Clients = new BlockingCollection<WebClient>();

// start a single consumer thread
var ProcessingThread = Task.Factory.StartNew(ProcessUrls, TaskCreationOptions.LongRunning);

// Create the WebClient objects and add them to the queue
for (var i = 0; i < MaxRequests; ++i)
{
    var client = new WebClient();

    // Add an event handler for the DownloadDataCompleted event
    client.DownloadDataCompleted += DownloadDataCompletedHandler;

    // And add this client to the queue
    Clients.Add(client);
}

// add the code from above that reads the file and populates the queue
Your processing function is somewhat different:
void ProcessUrls()
{
    foreach (var uri in UrlQueue.GetConsumingEnumerable())
    {
        // Wait for an available client
        var client = Clients.Take();

        // and make an asynchronous request
        client.DownloadDataAsync(uri, client);
    }

    // When the queue is empty, you need to wait for all of the
    // clients to complete their requests.
    // You know they're all done when you dequeue all of them.
    for (int i = 0; i < MaxRequests; ++i)
    {
        var client = Clients.Take();
        client.Dispose();
    }
}
Your DownloadDataCompleted event handler does something with the data that was downloaded, and then adds the WebClient instance back to the queue of clients.
void DownloadDataCompletedHandler(Object sender, DownloadDataCompletedEventArgs e)
{
    // The data downloaded is in e.Result
    // be sure to check the e.Error and e.Cancelled values to determine if an error occurred

    // do something with the data

    // And then add the client back to the queue
    WebClient client = (WebClient)e.UserState;
    Clients.Add(client);
}
This should keep you going with 15 concurrent requests, which is about all you can do without getting a bit more complicated. Your system can likely handle many more concurrent requests, but the way that WebClient starts asynchronous requests requires some synchronous work up front, and that overhead makes 15 about the maximum number you can handle.
You might be able to have multiple threads initiating the asynchronous requests. In that case, you could potentially have as many threads as you have processor cores. So on a quad core machine, you could have the main thread and three consumer threads. With three consumer threads this technique could give you 45 concurrent requests. I'm not certain that it scales that well, but it might be worth a try.
There are ways to have hundreds of concurrent requests, but they're quite a bit more complicated to implement.
You need thread management.
My advice is to use Tasks instead of creating your own Threads.
By using the Task Parallel Library, you let the runtime deal with the thread management. By default, it will allocate your tasks on threads from the ThreadPool, and will allow a level of concurrency which is contingent on the number of CPU cores you have. It will also reuse existing Threads when they become available instead of wasting time creating new ones.
If you want to get more advanced, you can create your own task scheduler to manage the scheduling aspect yourself.
See also: What is the difference between Task and Thread?
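A minimal sketch of a Task-based version of the original loop, assuming the same dump.txt format; the ProcessAllAsync name and the SemaphoreSlim cap of 15 are illustrative choices, not part of the original code:

private static async Task ProcessAllAsync()
{
    var throttler = new SemaphoreSlim(15); // cap concurrent downloads (illustrative value)
    var tasks = new List<Task>();
    foreach (var line in File.ReadLines("dump.txt"))
    {
        var url = line.Split('|')[3];
        await throttler.WaitAsync(); // wait for a free slot before starting another download
        tasks.Add(Task.Run(async () =>
        {
            try
            {
                using (var client = new WebClient())
                {
                    var html = await client.DownloadStringTaskAsync(url);
                    // process html here
                }
            }
            finally
            {
                throttler.Release();
            }
        }));
    }
    await Task.WhenAll(tasks);
}

Unlike raw threads, the thread-pool tasks here are reused and bounded, so the program never accumulates tens of thousands of OS threads.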

WPF async await Task Locks UI Thread Running Tasks in Parallel

I have a WPF app which, upon a button click, creates a List<Task<int>> and starts these tasks. My assumption was that the Add() call starts them in parallel, but asynchronously.
This is my function that does a bunch of WMI calls in serial on a remote machine:
AgentBootstrapper.cs
public async Task<int> BootstrapAsync(BootstrapContext context, IProgress<BootstrapAsyncProgress> progress)
{
    ...
    do a bunch of stuff in serial *without* await calls
    ...
    if (progress != null)
    {
        progress.Report(new BootstrapAsyncProgress
        {
            MachineName = context.MachineName,
            ProgressPercentage = 30,
            Text = "Copying install agent software to \\\\" + context.MachineName + "\\" + context.ShareName
        });
    }
    ...
    return pid; // ProcessId of the remote agent that was just started
}
And this is obviously my button handler in the UI:
Shell.xaml.cs
private async void InstallButton_Click(object sender, RoutedEventArgs e)
{
    var bootstrapTasks = new List<Task<int>>();

    var progress = new Progress<BootstrapAsyncProgress>();
    progress.ProgressChanged += (o, asyncProgress) =>
    {
        Debug.WriteLine("{0}: {1}% {2}", asyncProgress.MachineName, asyncProgress.ProgressPercentage,
            asyncProgress.Text);
        //TODO Update ViewModel property for ProgressPercentage
    };

    var vm = DataContext as ShellViewModel;
    Debug.Assert(vm != null);
    foreach (var targetMachine in vm.TargetMachines)
    {
        var bootstrapContext = new BootstrapContext(targetMachine.MachineName, true)
        {
            AdminUser = vm.AdminUser,
            AdminPassword = vm.AdminPassword
        };
        var bootstrapper = new AgentBootstrapper(bootstrapContext);
        bootstrapTasks.Add(bootstrapper.BootstrapAsync(bootstrapContext, progress)); // UI thread locks up here
    }
}
I know functions marked as async should contain await calls. In my case these are all calls to some synchronous WMI helper functions which all return void, so I don't think await is what I want here.
Simply put, I want all the bootstrapTasks items (the calls to bootstrapper.BootstrapAsync()) to fire at once, and have the UI thread receive progress events from all of them. When the whole lot is complete, I'll need to handle that too.
Update 1
Attempting to use Task.Run() fixes the UI locking issue, but only the first Task instance is executed. Updated foreach loop:
foreach (var targetMachine in vm.TargetMachines)
{
    var tm = targetMachine; // copy closure variable
    var bootstrapContext = new BootstrapContext(tm.MachineName, true)
    {
        AdminUser = vm.AdminUser,
        AdminPassword = vm.AdminPassword
    };
    var bootstrapper = new AgentBootstrapper(bootstrapContext);
    Debug.WriteLine("Starting Bootstrap task on default thread pool...");
    var task = Task.Run(() =>
    {
        var pid = bootstrapper.Bootstrap(bootstrapContext, progress);
        return pid;
    });
    Debug.WriteLine("Adding Task<int> " + task.Id + " to List<Task<int>>.");
    tasks.Add(task);
    await Task.WhenAll(tasks); // Don't proceed with the rest of this function until all tasks are complete
}
Update 2
Moving the await Task.WhenAll(tasks); call outside the foreach loop allows all tasks to run in parallel.
Nothing in the code generated for async/await involves the creation of threads. Using the async keyword does not cause another thread to be used. All async does is allow you to use the await keyword. If you want something to happen on another thread, try using Task.Run.
Run the tasks on the thread pool (using the default task scheduler, that is) and await Task.WhenAll(bootstrapTasks) on them in your UI thread?
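Combining both answers with Update 2, a minimal sketch of what the corrected handler could look like (the synchronous Bootstrap call and the tasks list follow the updates above; the details are illustrative, not a definitive implementation):

private async void InstallButton_Click(object sender, RoutedEventArgs e)
{
    var tasks = new List<Task<int>>();
    // Created on the UI thread, so progress reports marshal back to it.
    var progress = new Progress<BootstrapAsyncProgress>();
    progress.ProgressChanged += (o, p) => Debug.WriteLine("{0}: {1}%", p.MachineName, p.ProgressPercentage);

    var vm = (ShellViewModel)DataContext;
    foreach (var targetMachine in vm.TargetMachines)
    {
        var context = new BootstrapContext(targetMachine.MachineName, true)
        {
            AdminUser = vm.AdminUser,
            AdminPassword = vm.AdminPassword
        };
        var bootstrapper = new AgentBootstrapper(context);
        // Queue the synchronous work to the thread pool; do not await here.
        tasks.Add(Task.Run(() => bootstrapper.Bootstrap(context, progress)));
    }

    int[] pids = await Task.WhenAll(tasks); // await once, after all tasks have started
    // handle completion of the whole batch here
}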
