Do race conditions exist when using just async/await? - c#

.NET 5.0
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Threading.Tasks;
using System;
using System.Collections.Generic;
namespace AsyncTest
{
[TestClass]
public class AsyncTest
{
public async Task AppendNewIntVal(List<int> intVals)
{
await Task.Delay(new Random().Next(15, 45));
intVals.Add(new Random().Next());
}
public async Task AppendNewIntVal(int count, List<int> intVals)
{
var appendNewIntValTasks = new List<Task>();
for (var a = 0; a < count; a++)
{
appendNewIntValTasks.Add(AppendNewIntVal(intVals));
}
await Task.WhenAll(appendNewIntValTasks);
}
[TestMethod]
public async Task TestAsyncIntList()
{
var appendCount = 30;
var intVals = new List<int>();
await AppendNewIntVal(appendCount, intVals);
Assert.AreEqual(appendCount, intVals.Count);
}
}
}
The above code compiles and runs, but the test fails with output similar to:
Assert.AreEqual failed. Expected:<30>. Actual:<17>.
In the above example the "Actual" value is 17, but it varies between executions.
I know I am missing some understanding around how asynchronous programming works in .NET as I'm not getting the expected output.
From my understanding, the AppendNewIntVal method kicks off N number of tasks, then waits for them all to complete. If they have all completed, I'd expect they would have each appended a single value to the list but that's not the case. It looks like there's a race condition but I didn't think that was possible because the code is not multithreaded. What am I missing?

Yes, if you don't await each awaitable immediately, i.e. here:
appendNewIntValTasks.Add(AppendNewIntVal(intVals));
In async terms, this line is the equivalent of Thread.Start in thread-based code, and we now have no safety around the inner async code:
intVals.Add(new Random().Next());
which can now fail in the usual concurrency ways when two flows call Add at the same time. You should probably also avoid new Random() here: on many framework versions the default seed is time-based, so two flows created close together can end up with the same seed and produce the same "random" numbers.
So: the code as shown is indeed dangerous.
The obviously safe version is:
public async Task AppendNewIntVal(int count, List<int> intVals)
{
for (var a = 0; a < count; a++)
{
await AppendNewIntVal(intVals);
}
}
It is possible to defer the await, but you're explicitly opting into concurrency when you do that, and your code needs to handle it suitably defensively.
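One way to defer the awaits and still stay safe without locks is to stop sharing the list at all: let each operation return its value and collect the results from Task.WhenAll. A minimal sketch, assuming the values can be produced independently; ProduceValueAsync and its delay are hypothetical stand-ins for the question's AppendNewIntVal.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentCollectSketch
{
    // Hypothetical stand-in for AppendNewIntVal: returns its value instead of mutating a shared list.
    private static async Task<int> ProduceValueAsync(int index)
    {
        await Task.Delay(15 + index % 30); // simulated asynchronous work of varying length
        return index;
    }

    public static async Task<List<int>> CollectAsync(int count)
    {
        // All operations are started before any is awaited, so they still run concurrently.
        Task<int>[] tasks = Enumerable.Range(0, count).Select(i => ProduceValueAsync(i)).ToArray();

        // Task.WhenAll returns the results in the same order as the tasks array,
        // regardless of the order in which the operations finished.
        int[] values = await Task.WhenAll(tasks);
        return values.ToList();
    }
}
```

Only the code that awaits Task.WhenAll touches the final list, so there is nothing left for the continuations to race on.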

Yes, race conditions do exist.
Async methods are basically tasks whose continuations can potentially run in parallel, depending on where those continuations are scheduled. When there is no SynchronizationContext, they run on the default task scheduler (TaskScheduler.Default, internally ThreadPoolTaskScheduler), which is a wrapper around the ThreadPool. Thus, if your continuations end up on a scheduler (the thread pool) that can execute multiple tasks in parallel, you are likely going to run into race conditions.
You could make your code a bit safer:
lock (intVals) intVals.Add(new Random().Next());
But then this opens up another can of worms :)
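Applied to the question's code, the lock idea could look roughly like the sketch below. It is an illustration rather than the only fix: the shared Random is moved under the same lock because System.Random is not thread-safe either, and the class and method names are made up for the example. One of the "worms" shows up immediately: you cannot await while holding a lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedAppendSketch
{
    // One gate protects both the shared List<int> and the shared Random, neither of which is thread-safe.
    private static readonly object Gate = new object();
    private static readonly Random Rng = new Random();

    private static async Task AppendNewIntValAsync(List<int> intVals)
    {
        int delay;
        lock (Gate) delay = Rng.Next(15, 45);

        // You cannot await inside a lock statement (compiler error CS1996),
        // so only the synchronous parts are guarded.
        await Task.Delay(delay);

        lock (Gate) intVals.Add(Rng.Next());
    }

    public static async Task<int> RunAsync(int count)
    {
        var intVals = new List<int>();
        var tasks = new List<Task>();
        for (var a = 0; a < count; a++)
        {
            tasks.Add(AppendNewIntValAsync(intVals));
        }
        await Task.WhenAll(tasks);
        return intVals.Count; // safe to read: every Add has completed
    }
}
```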
If you are interested in more details about async programming, see this link. Also this article is quite useful and explains best practices in asynchronous programming.
Happy (asynchronous) coding!

Yes, race conditions are indeed possible when using async/await in a way that introduces concurrency. To introduce concurrency you must:
Launch multiple asynchronous operations concurrently, i.e. launch the next operation without awaiting the completion of the previous operation, and,
Have no ambient synchronization¹ mechanism in place, namely a SynchronizationContext, that would synchronize the execution of the continuations of the asynchronous operations.
In your case both conditions are met, so the continuations are running on multiple threads concurrently. And since the List<T> class is not thread-safe, you get undefined behavior.
To see what effect a SynchronizationContext has in a situation like this, you can install the Nito.AsyncEx.Context package and do this:
[TestMethod]
public void TestAsyncIntList()
{
AsyncContext.Run(async () =>
{
var appendCount = 30;
var intVals = new List<int>();
await AppendNewIntVal(appendCount, intVals);
Assert.AreEqual(appendCount, intVals.Count);
});
}
FYI, many types of applications (WPF and Windows Forms, to name a few) automatically install a SynchronizationContext when launched. Console applications do not, hence the need to be extra cautious when writing an async-enabled console application.
¹ It's worth noting that the similar-looking terms synchronized/unsynchronized and synchronous/asynchronous are mostly unrelated. This can be a source of confusion for someone who is not familiar with these terms. The first term is about preventing multiple threads from accessing a shared resource concurrently. The second term is about doing something without blocking a thread.
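To make the effect of a SynchronizationContext concrete, here is a deliberately simplified single-threaded context in the spirit of AsyncContext.Run. It is a sketch for illustration, not Nito.AsyncEx's implementation: SingleThreadContext and SingleThreadDemo are hypothetical names, and the context does not handle code that keeps posting continuations after the top-level task has completed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified, hypothetical stand-in for what AsyncContext.Run provides.
public sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback, object)> _queue =
        new BlockingCollection<(SendOrPostCallback, object)>();

    // Continuations captured by await are posted here instead of going to the thread pool.
    public override void Post(SendOrPostCallback d, object state) => _queue.Add((d, state));

    public static void Run(Func<Task> asyncAction)
    {
        SynchronizationContext previous = Current;
        var context = new SingleThreadContext();
        SetSynchronizationContext(context);
        try
        {
            Task task = asyncAction();
            // Stop the loop once the top-level task has finished.
            task.ContinueWith(_ => context._queue.CompleteAdding(), TaskScheduler.Default);

            // Run every posted continuation, one at a time, on this thread.
            foreach (var (callback, state) in context._queue.GetConsumingEnumerable())
            {
                callback(state);
            }
            task.GetAwaiter().GetResult(); // rethrow any exception from asyncAction
        }
        finally
        {
            SetSynchronizationContext(previous);
        }
    }
}

public static class SingleThreadDemo
{
    // Same shape as the question's code: unsynchronized collection access after an await.
    private static async Task AppendAsync(List<int> values, HashSet<int> threadIds, int delayMs)
    {
        await Task.Delay(delayMs);
        threadIds.Add(Environment.CurrentManagedThreadId);
        values.Add(delayMs);
    }

    public static (int Count, int DistinctThreads) Run(int count)
    {
        var values = new List<int>();
        var threadIds = new HashSet<int>();
        SingleThreadContext.Run(async () =>
        {
            var tasks = new List<Task>();
            for (int i = 0; i < count; i++) tasks.Add(AppendAsync(values, threadIds, 15 + i % 30));
            await Task.WhenAll(tasks);
        });
        return (values.Count, threadIds.Count);
    }
}
```

Because every continuation is funneled through one queue and executed on the thread that called Run, the unsynchronized List.Add calls can no longer overlap; the concurrency of the delays is kept, only the continuations are serialized.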

Related

Handling asynchronous threads

I (think that I) understand the differences between threads and tasks.
Threads allow us to do multiple things in parallel (they are CPU-bound).
Asynchronous tasks release the processor time while some I/O work is done (they are I/O-bound).
Now, let's say I want to do multiple asynchronous tasks in parallel. For example, I want to download several pages of a paged response at the same time. Or, I want to write new data into two different databases. What is the correct way to handle the threads? Should they be async and awaited? Or can the async operation be just inside the thread? What is the best practice for error handling?
I have tried creating my own utility method to start a new async thread, but I have a feeling that it can go horribly wrong.
public static Task<Thread> RunInThreadAsync<T>(T actionParam, Func<T, Task> asyncAction)
{
var thread = new Thread(async () => await asyncAction(actionParam));
thread.Start();
return thread;
}
Is this ok? Or should the method be public static async Task<Thread>? If yes, what should be awaited? There is no thread.StartAsync(). Or should I use Task.Run instead?
Note: Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.
I (think that I) understand the differences between threads and tasks.
There's one important concept missing here: concurrency. Concurrency is doing more than one thing at a time. This is different from "parallel", which is a term most developers use to mean "doing more than one thing at a time using threads". So, parallelism is one form of concurrency, and asynchrony is another form of concurrency.
Now, let's say I want to do multiple asynchronous tasks in parallel.
And here's the problem: mixing two forms of concurrency. What you really want to do is multiple asynchronous tasks concurrently. And the way to do this is via Task.WhenAll.
Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.
This argument doesn't make any sense. Asynchronous code won't block the main thread because it's asynchronous. There's no explicit thread necessary.
If, for some unknown reason, you really do need a background thread, then just wrap your code in Task.Run. Thread should only ever be used for COM interop; any other use of Thread is legacy code as soon as it is written.
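Following that advice, the question's RunInThreadAsync helper could return the Task from Task.Run instead of a Thread, and the caller can then await several of them with Task.WhenAll. A minimal sketch; RunInBackground and RunAllAsync are hypothetical names, and collecting the exceptions into a list is just one possible error-handling policy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BackgroundWorkSketch
{
    // Hypothetical replacement for RunInThreadAsync: hand back the Task, not a Thread,
    // so the caller can await completion and observe exceptions.
    public static Task RunInBackground<T>(T actionParam, Func<T, Task> asyncAction)
        => Task.Run(() => asyncAction(actionParam)); // Task.Run unwraps the inner Task for you

    // Runs several asynchronous operations concurrently and waits for all of them.
    public static async Task<IReadOnlyList<Exception>> RunAllAsync<T>(IEnumerable<T> items, Func<T, Task> asyncAction)
    {
        Task[] tasks = items.Select(item => RunInBackground(item, asyncAction)).ToArray();
        try
        {
            await Task.WhenAll(tasks);
        }
        catch
        {
            // await rethrows only the first exception; inspect each task to see all of them.
        }
        return tasks.Where(t => t.IsFaulted)
                    .Select(t => t.Exception.InnerException)
                    .ToList();
    }
}
```

For purely asynchronous work (downloads, database writes) the Task.Run wrapper can simply be dropped and the operations passed to Task.WhenAll directly.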
System.Threading.Thread has been in .NET since version 1.0. It gives you direct, low-level control over individual worker threads within your application; the operating system schedules those threads across the available CPU cores.
The Task Parallel Library (TPL), introduced in .NET 4.0, added a higher-level, task-based model on top of the thread pool (System.Threading.Tasks.Task and Task<T>, which async/await later built on) that makes it much easier to spread work across multiple cores.
My approach for your "multiple downloader" scenario would be to create a new CancellationTokenSource, which allows me to cancel my Tasks. Then I would create my Task<T> instances and start them. You can use Task.WaitAll() to sit and wait.
You should also be aware that you can chain your tasks together in a sequence by using the ContinueWith<T>() method.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp2
{
    class Program
    {
        static bool DownloadFile(string path)
        {
            // Do something here. Long-running work.
            // To support cancellation, pass the CancellationToken in and check token.IsCancellationRequested.
            return true;
        }
        static async Task Main(string[] args)
        {
            var paths = new[] { "Somepaths", "to the files you want", "to download" };
            List<Task<bool>> results = new List<Task<bool>>();
            var cts = new CancellationTokenSource();
            foreach (var path in paths)
            {
                var task = new Task<bool>(_path => DownloadFile((string)_path), path, cts.Token);
                task.Start();
                results.Add(task);
            }
            // use cts.Cancel(); to cancel all associated tasks that have not started yet.
            // Task.WaitAll(results.ToArray()); // to block and wait.
            bool[] outcomes = await Task.WhenAll(results); // wait asynchronously for all of them
            Console.WriteLine($"{outcomes.Length} downloads finished.");
            Console.WriteLine("Press <Enter> to quit.");
            var final = Console.ReadLine();
        }
    }
}
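Passing cts.Token to the Task constructor only prevents tasks that have not started yet from running; a DownloadFile call that is already running has to check the token itself. A sketch of that cooperative pattern with Task.Run and Task.WhenAll, where the token-taking DownloadFile overload and its Thread.Sleep "work" are placeholders rather than a real download:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CancellableDownloadsSketch
{
    // Placeholder for a real download: blocking work done in slices, checking the token between slices.
    private static bool DownloadFile(string path, CancellationToken token)
    {
        for (int slice = 0; slice < 10; slice++)
        {
            token.ThrowIfCancellationRequested();
            Thread.Sleep(10);
        }
        return true;
    }

    public static async Task<bool[]> DownloadAllAsync(IEnumerable<string> paths, CancellationToken token)
    {
        // Passing the token to Task.Run as well means queued-but-not-started work is skipped.
        List<Task<bool>> tasks = paths
            .Select(path => Task.Run(() => DownloadFile(path, token), token))
            .ToList();
        return await Task.WhenAll(tasks); // throws OperationCanceledException if any task was canceled
    }
}
```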

Throttling with SemaphoreSlim -- "Task.Run()" vs "new Func<Task>()"

This might not be specific to SemaphoreSlim exclusively, but basically my question is about whether there is a difference between the below two methods of throttling a collection of long running tasks, and if so, what that difference is (and when if ever to use either).
In the example below, let's say that each tracked task involves loading data from a Url (totally made up example, but is a common one that I've found for SemaphoreSlim examples).
The main difference comes down to how the individual tasks are added to the list of tracked tasks. In the first example, we call Task.Run() with a lambda, whereas in the second, we new up a Func<Task<Result>> with a lambda, immediately invoke that Func, and add the resulting task to the tracked task list.
Examples:
Using Task.Run():
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
await ss.WaitAsync().ConfigureAwait(false);
trackedTasks.Add(Task.Run(async () =>
{
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
}));
}
var results = await Task.WhenAll(trackedTasks);
Using a new Func:
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
trackedTasks.Add(new Func<Task<Result>>(async () =>
{
await ss.WaitAsync().ConfigureAwait(false);
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
})());
}
var results = await Task.WhenAll(trackedTasks);
There are two differences:
Task.Run does error handling
First of all, when you call the lambda yourself, it runs right there. Task.Run, on the other hand, calls it for you. This is relevant because Task.Run does a bit of work behind the scenes. The main work it does is handling a faulted task...
If you call a lambda, and the lambda throws, it would throw before you add the Task to the list...
However, in your case, because your lambda is async, the compiler would create the Task for it (you are not making it by hand), and it will correctly handle the exception and make it available via the returned Task. Therefore this point is moot.
Task.Run prevents task attachment
Task.Run sets DenyChildAttach. This means that tasks created inside the Task.Run delegate with TaskCreationOptions.AttachedToParent are not attached to the returned Task: they run independently of it (the returned Task does not wait for them).
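A small experiment shows what DenyChildAttach changes. It only matters for tasks created with TaskCreationOptions.AttachedToParent, and ChildAttachSketch is a made-up name. With Task.Factory.StartNew the outer task waits for its attached child; with Task.Run the result is expected to be false, although that case depends on timing.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ChildAttachSketch
{
    // Returns whether the child had finished by the time the outer task was observed as complete.
    public static bool OuterWaitedForChild(bool useTaskRun)
    {
        bool childDone = false;
        Action body = () =>
        {
            // A child that explicitly asks to be attached to whatever task is running it.
            Task.Factory.StartNew(() =>
            {
                Thread.Sleep(500);
                Volatile.Write(ref childDone, true);
            }, TaskCreationOptions.AttachedToParent);
        };

        Task outer = useTaskRun
            ? Task.Run(body)                // DenyChildAttach: the AttachedToParent request is ignored
            : Task.Factory.StartNew(body);  // default options: the outer task waits for its attached child

        outer.Wait();
        return Volatile.Read(ref childDone);
    }
}
```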
For example, this code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(Task.Run(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
}));
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in unknown order. However the following code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(new Func<Task<int>>(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
})());
}
var results = await Task.WhenAll(trackedTasks);
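Will output the numbers from 0 to 4, in order, every time. This is odd, right? What happens is that invoking an async lambda runs its body synchronously on the calling thread until it reaches an await that actually has to wait. Nothing here ever waits (the inner .Wait() finds the inner task already completed), so each outer lambda runs to completion on the loop's thread before the loop moves on. With Task.Run, each body is instead queued to the thread pool, so the bodies run concurrently and in no particular order.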
This remains true even if you use await, as long as the task you await does not go to an external system (and so is already complete by the time it is awaited)...
What happens with an external system? Well, for example, if your task is reading from a URL (as in your example), the system would create a TaskCompletionSource, get the Task from it, set a response handler that writes the result to the TaskCompletionSource, make the request, and return the Task. This Task is not scheduled on any thread, so the idea of it running on the same thread as a parent task makes no sense. And thus, it can break the order.
Since you are using await to wait on an external system, this point is moot too.
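The synchronous behaviour behind the ordered output is easy to check in isolation: calling an async lambda runs it on the caller's thread until the first await on an incomplete task, so a body that never truly waits has already finished when the call returns. A sketch (the names are made up for the example):

```csharp
using System;
using System.Threading.Tasks;

public static class SyncCompletionSketch
{
    // Invokes an async lambda that never actually has to wait and reports what happened.
    public static (bool CompletedOnReturn, bool RanOnCallerThread) Probe()
    {
        int callerThread = Environment.CurrentManagedThreadId;
        int bodyThread = -1;

        Func<Task<int>> noRealAwait = async () =>
        {
            bodyThread = Environment.CurrentManagedThreadId;
            await Task.CompletedTask; // already complete, so execution simply continues
            return 42;
        };

        Task<int> task = noRealAwait(); // the whole body runs inside this call
        return (task.IsCompleted, bodyThread == callerThread);
    }
}
```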
Conclusion
I must conclude that these are equivalent.
If you want to be safe, and make sure it works as expected even if (in a future version) some of the above points stop being moot, then keep Task.Run. On the other hand, if you really want to optimize, use the lambda and avoid the (very small) Task.Run overhead. However, that probably won't be a bottleneck.
Addendum
When I talk about a task that goes to an external system, I refer to something that runs outside of .NET. There is a bit of code that will run in .NET to interface with the external system, but the bulk of the work will not run in .NET, and thus will not be on a managed thread at all.
The consumer of the API specifies nothing for this to happen. The task would be a promise task, but that is not exposed; for the consumer there is nothing special about it.
In fact, a task that goes to an external system may barely use the CPU at all. Furthermore, it might just be waiting on something external to the computer (such as the network or user input).
The pattern is as follows:
The library creates a TaskCompletionSource.
The library sets up a means to receive a notification. It can be a callback, event, message loop, hook, listening on a socket, a pipe, waiting on a global mutex... whatever is necessary.
The library sets up code that reacts to the notification by calling SetResult or SetException on the TaskCompletionSource, as appropriate for the notification received.
The library does the actual call to the external system.
The library returns TaskCompletionSource.Task.
Note: take extra care that optimizations do not reorder things where they should not, and handle errors during the setup phase. Also, if a CancellationToken is involved, it has to be taken into account (call SetCanceled on the TaskCompletionSource when appropriate). There could also be teardown necessary in the reaction to the notification (or on cancellation). Ah, and do not forget to validate your parameters.
Then the external system goes and does whatever it does. When it finishes, or something goes wrong, it gives the library the notification, and your Task is suddenly completed or faulted (or, if cancellation happened, cancelled), and .NET will schedule the continuations of the task as needed.
Note: async/await uses continuations behind the scenes, that is how execution resumes.
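The five steps above, together with the validation, cancellation and teardown notes, can be sketched with a System.Threading.Timer standing in for the external system. This DelayAsync is a hypothetical illustration of the pattern, not how the real Task.Delay is implemented:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PromiseTaskSketch
{
    // A promise-style delay: no thread waits; a timer callback completes the task.
    public static Task DelayAsync(int milliseconds, CancellationToken cancellationToken = default)
    {
        // Validate parameters before starting anything.
        if (milliseconds < 0) throw new ArgumentOutOfRangeException(nameof(milliseconds));

        // 1. Create the TaskCompletionSource (continuations run asynchronously, not inside the timer callback).
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        // 2. + 3. Set up the notification (a timer callback) and the reaction to it (TrySetResult);
        // 4. creating the timer with a due time also starts the "external" operation.
        var timer = new Timer(
            state => ((TaskCompletionSource<bool>)state).TrySetResult(true),
            tcs, milliseconds, Timeout.Infinite);

        // Cancellation settles the same promise as canceled.
        CancellationTokenRegistration registration =
            cancellationToken.Register(() => tcs.TrySetCanceled(cancellationToken));

        // Teardown once the promise is settled, whichever way that happened.
        tcs.Task.ContinueWith(_ =>
        {
            timer.Dispose();
            registration.Dispose();
        }, TaskScheduler.Default);

        // 5. Return the Task.
        return tcs.Task;
    }
}
```

The Try* methods are used so that a timer firing and a cancellation racing each other cannot throw; whichever arrives first wins.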
Incidentally, if you wanted to implement SemaphoreSlim yourself, you would have to do something very similar to what I describe above. You can see it in my backport of SemaphoreSlim.
Let us see a couple of examples of promise tasks...
Task.Delay: when we are waiting with Task.Delay, the CPU is not spinning, and nothing is running on a thread. In this case the notification mechanism will be an OS timer. When the OS sees that the timer has elapsed, it will call into the CLR, and then the CLR will mark the task as completed. What thread was waiting? None.
FileStream.ReadAsync: when we are reading from storage with FileStream.ReadAsync (on a file opened for asynchronous I/O, e.g. with useAsync: true), the actual work is done by the device. The CLR has to declare a custom event, then pass the event, the file handle and the buffer to the OS... the OS calls the device driver, and the device driver interfaces with the device. As the storage device retrieves the information, it writes to memory (directly into the specified buffer) via DMA. When it is done, it raises an interrupt, which is handled by the driver, which notifies the OS, which signals the custom event, which marks the task as completed. What thread read the data from storage? None.
A similar pattern will be used to download from a web page, except, this time the device goes to the network. How to make an HTTP request and how the system waits for a response is beyond the scope of this answer.
It is also possible that the external system is another program, in which case it would run on a thread. But it won't be a managed thread on your process.
Your takeaway is that these tasks do not run on any of your threads, and their timing might depend on external factors. Thus, it makes no sense to think of them as running on the same thread, or to assume that we can predict their timing (except, of course, in the case of the timer).
Both are not very good because they create the tasks immediately. The func version is a little less overhead since it saves the Task.Run route over the thread pool just to immediately end the thread pool work and suspend on the semaphore. You don't need an async Func, you could simplify this by using an async method (possibly a local function).
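The local-function variant mentioned above might look like this. It is a sketch that still starts every wrapper up front (the "tasks are created immediately" drawback); ThrottleAsync is a hypothetical name and process stands in for ProcessUrl.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottleSketch
{
    public static async Task<TResult[]> ThrottleAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> process)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        // A plain async local function replaces both Task.Run(...) and new Func<Task<Result>>(...)().
        async Task<TResult> ProcessOneAsync(TItem item)
        {
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                return await process(item).ConfigureAwait(false);
            }
            finally
            {
                gate.Release();
            }
        }

        // Every wrapper starts here, but at most maxConcurrency are past WaitAsync at once.
        return await Task.WhenAll(items.Select(item => ProcessOneAsync(item))).ConfigureAwait(false);
    }
}
```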
But you should not do this at all. Instead, use a helper method that implements a parallel async foreach.
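using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
public static class ForEachAsyncExtensions
{
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        // One worker per partition; each worker pulls items on demand until the source is exhausted.
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }
}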
Then you just write await urls.ForEachAsync(myDop, async input => await ProcessAsync(input));
Here, the tasks are created on demand. You can even make the input stream lazy.
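On .NET 6 and later (so not on the .NET 5 target from the first question), the framework ships a built-in alternative, Parallel.ForEachAsync, which also pulls items on demand and caps the degree of parallelism. A sketch that measures the peak number of bodies running at once; the Task.Delay is a placeholder for real work such as ProcessUrl:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForEachAsyncSketch
{
    // Requires .NET 6+. Returns the highest number of bodies observed running concurrently.
    public static async Task<int> MeasurePeakConcurrencyAsync(int itemCount, int maxDegreeOfParallelism)
    {
        int running = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        await Parallel.ForEachAsync(Enumerable.Range(0, itemCount), options, async (item, cancellationToken) =>
        {
            int now = Interlocked.Increment(ref running);

            // Lock-free "max": retry until peak is at least 'now'.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
            }

            await Task.Delay(10, cancellationToken); // placeholder for asynchronous work
            Interlocked.Decrement(ref running);
        });

        return peak;
    }
}
```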

Async/await performance

I'm working on performance optimization of the program which widely uses async/await feature. Generally speaking it downloads thousands of json documents through HTTP in parallel, parses them and builds some response using this data. We experience some issues with performance, when we handle many requests simultaneously (e.g. download 1000 jsons), we can see that a simple HTTP request can take a few minutes.
I wrote a small console app to test it on a simplified example:
using System;
using System.Diagnostics;
using System.Threading.Tasks;
class Program
{
static void Main(string[] args)
{
for (int i = 0; i < 100000; i++)
{
Task.Run(IoBoundWork);
}
Console.ReadKey();
}
private static async Task IoBoundWork()
{
var sw = Stopwatch.StartNew();
await Task.Delay(1000);
Console.WriteLine(sw.Elapsed);
}
}
And I can see similar behavior here (the printed elapsed times keep growing well past one second).
The question is why "await Task.Delay(1000)" eventually takes 23 sec.
Task.Delay isn't broken, but you're performing 100,000 tasks which each take some time. It's the call to Console.WriteLine that is causing the problem in this particular case. Each call is cheap, but they're accessing a shared resource, so they aren't very highly parallelizable.
If you remove the call to Console.WriteLine, all the tasks complete very quickly. I changed your code to return the elapsed time that each task observes, and then print just a single line of output at the end - the maximum observed time. On my computer, without any Console.WriteLine call, I see output of about 1.16 seconds, showing very little inefficiency:
using System;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
class Program
{
static void Main(string[] args)
{
ThreadPool.SetMinThreads(50000, 50000);
var tasks = Enumerable.Repeat(0, 100000)
.Select(_ => Task.Run(IoBoundWork))
.ToArray();
Task.WaitAll(tasks);
var maxTime = tasks.Max(t => t.Result);
Console.WriteLine($"Max: {maxTime}");
}
private static async Task<double> IoBoundWork()
{
var sw = Stopwatch.StartNew();
await Task.Delay(1000);
return sw.Elapsed.TotalSeconds;
}
}
You can then modify IoBoundWork to do different tasks, and see the effect. Examples of work to try:
CPU work (do something actively "hard" for the CPU, but briefly)
Synchronous sleeping (so the thread is blocked, but the CPU isn't)
Synchronous IO which doesn't have any shared bottlenecks (although that's generally hard, given that the disk or network is likely to end up being a shared resource bottleneck even if you're writing to different files etc)
Synchronous IO with a shared bottleneck such as Console.WriteLine
Asynchronous IO (await foo.WriteAsync(...) etc)
You can also try removing the call to Task.Delay(1000) or changing it. I found that by removing it entirely, the result was very small - whereas replacing it with Task.Yield was very similar to Task.Delay. It's worth remembering that as soon as your async method has to actually "pause" you're effectively doubling the task scheduling problem - instead of scheduling 100,000 operations, you're scheduling 200,000.
You'll see a different pattern in each case. Fundamentally, you're starting 100,000 tasks, asking them all to wait for a second, then asking them all to do something. That causes issues in terms of continuation scheduling that's async/await specific, but also plain resource management of "Performing 100,000 tasks each of which needs to write to the console is going to take a while."
If your problem is performance, async-await is the wrong solution.
async-await is all about availability: availability to handle the screen and user input, availability to handle HTTP requests, etc.
The synchronization work behind async-await will use more resources and take more time than simply blocking until the operation completes.
Your HTTP server will handle more requests because fewer threads will be blocked waiting for operations to complete, but each request will take slightly longer.

Why an additional async operation is making my code faster than when the operation is not taking place at all?

I'm working on an SMS-based game (Value Added Service), in which a question must be sent to each subscriber on a daily basis. There are over 500,000 subscribers and therefore performance is a key factor. Since each subscriber can be in a different state of the competition with different variables, the database must be queried separately for each subscriber before sending a text message. To achieve the best performance I'm using the .NET Task Parallel Library (TPL) to spawn parallel thread pool threads and do as many async operations as possible in each thread, to finally send texts as soon as possible.
Before describing the actual problem there are some more information necessary to give about the code.
At first there was no async operation in the code. I just scheduled some 500,000 tasks with the default task scheduler into the Threadpool and each task would work through the routines, blocking on all EF (Entity Framework) queries and sequentially finishing its job. It was good, but not fast enough. Then I changed all EF queries to Async, the outcome was superb in speed but there has been so many deadlocks and timeouts in SQL server that about a third of the subscribers never received a text! After trying different solutions, I decided not to do too many Async Database operations while I have over 500,000 tasks running on a 24 core server (with at least 24 concurrent threadpool threads)!
I rolled back all the changes (the async ones) except for one web service call in each task, which remained async.
Now the weird case:
In my code, I have a boolean variable named "isCrossSellActive". When the variable is set, some more DB operations take place and an async web service call happens, which the thread awaits. When this variable is false, none of these operations happen, including the async web service call. Oddly, when the variable is set the code runs much faster than when it's not! It seems like for some reason the awaited async code (the cooperative thread) is making the code faster.
Here is the code:
public async Task AutoSendMessages(...)
{
//Get list of subscriptions plus some initialization
LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(numberOfThreads);
TaskFactory taskFactory = new TaskFactory(lcts);
List<Task> tasks = new List<Task>();
//....
foreach (var sub in subscriptions)
{
AutoSendData data = new AutoSendData
{
ServiceId = serviceId,
MSISDN = sub.subscriber,
IsCrossSellActive = bolCrossSellHeader
};
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
}
GC.Collect();
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException ae)
{
ae.Handle((ex) =>
{
_logRepo.LogException(1, "", ex);
return true;
});
}
await _autoSendRepo.SetAutoSendingStatusEnd(statusId);
}
public async Task SendQuestion(object data)
{
//extract variables from input parameter
try
{
if (isCrossSellActive)
{
int pieceCount = subscriptionRepo.GetSubscriberCarPieces(curSubscription.service, curSubscription.subscriber).Count(c => c.isConfirmed);
foreach (var rule in csRules)
{
if (rule.Applies)
{
if (await HttpClientHelper.GetJsonAsync<bool>(url, rule.TargetServiceBaseAddress))
{
int noOfAddedPieces = SomeCalculations();
if (noOfAddedPieces > 0)
{
crossSellRepo.SetPromissedPieces(curSubscription.subscriber, curSubscription.service,
rule.TargetShortCode, noOfAddedPieces, 0, rule.ExpirationLimitDays);
}
}
}
}
}
// The rest of the code. (Some db CRUD)
await SmsClient.SendSoapMessage(subscriber, smsBody);
}
catch (Exception ex) { /* ... */ }
}
OK, thanks to @usr and the clue he gave me, the problem is finally solved!
His comment drew my attention to the awaited taskFactory.StartNew(...) line which sequentially adds new tasks to the "tasks" list which is then awaited on by Task.WaitAll(tasks);
At first I removed the await keyword before taskFactory.StartNew(), and it led the code into a horrible state of malfunction! I then put the await keyword back before taskFactory.StartNew(), debugged the code using breakpoints, and was amazed to see that the threads run one after another, sequentially, until each reaches the first await inside the "SendQuestion" routine. When the "isCrossSellActive" flag was set, despite the extra work each thread has to do, the first await is reached earlier, which lets the next scheduled task start. But when it's not set, the only await is on the last line of the routine, so each task most likely runs sequentially to the end.
usr's suggestion to remove the await keyword in the for loop seemed to be correct, but the problem was that Task.WaitAll() would then wait on the outer Task<Task> objects returned by StartNew (which complete as soon as the async lambda reaches its first await) instead of the inner tasks that represent the actual work. I finally used Task.Run instead of TaskFactory.StartNew and everything changed. Now the service is working well. The final code inside the for loop is:
tasks.Add(Task.Run(async () =>
{
await SendQuestion(data);
}));
and the problem was solved.
Thank you all.
P.S. Read this article on Task.Run and why TaskFactory.StartNew is dangerous: http://blog.stephencleary.com/2013/08/startnew-is-dangerous.html
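The Task<Task> trap described above can be reproduced in a few lines: with an async lambda, TaskFactory.StartNew returns an outer task that completes at the lambda's first real await, while Unwrap (or Task.Run, which unwraps automatically) gives a task for the whole body. A sketch with a TaskCompletionSource as a hypothetical stand-in for SendQuestion's I/O:

```csharp
using System.Threading.Tasks;

public static class StartNewUnwrapSketch
{
    public static (bool OuterDoneWhileWorkPending, bool UnwrappedDoneAfterWork) Probe()
    {
        var work = new TaskCompletionSource<bool>(); // pretend I/O that has not finished yet

        Task<Task> outer = Task.Factory.StartNew(async () => { await work.Task; });
        Task unwrapped = outer.Unwrap(); // completes only when the async lambda completes

        outer.Wait(); // returns once the lambda has reached its first await
        bool outerDoneWhileWorkPending = outer.IsCompleted && !unwrapped.IsCompleted;

        work.SetResult(true); // now let the "I/O" finish
        unwrapped.Wait();
        return (outerDoneWhileWorkPending, unwrapped.IsCompleted);
    }
}
```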
It's extremely hard to tell unless you add some profiling that tells you which code is taking longer now.
Without seeing more numbers, my best guess would be that the SMS service doesn't like it when you send too many requests in a short time, and chokes. When you add the extra DB calls, the extra delay makes the SMS service work better.
A few other small details:
await Task.WhenAll is usually a bit better than Task.WaitAll. WaitAll means the thread will sit around blocked while waiting, making a deadlock slightly more likely.
Instead of:
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
You should be able to do
tasks.Add(SendQuestion(data));

Tracking progress of a multi-step Task

I am working on a simple server that exposes webservices to clients. Some of the requests may take a long time to complete, and are logically broken into multiple steps. For such requests, it is required to report progress during execution. In addition, a new request may be initiated before a previous one completes, and it is required that both execute concurrently (barring some system-specific limitations).
I was thinking of having the server return a TaskId to its clients, and having the clients track the progress of the requests using the TaskId. I think this is a good approach, and I am left with the issue of how tasks are managed.
Never having used the TPL, I was thinking it would be a good way to approach this problem. Indeed, it allows me to run multiple tasks concurrently without having to manually manage threads. I can even create multi-step tasks relatively easily using ContinueWith.
I can't come up with a good way of tracking a task's progress, though. I realize that when my requests consist of a single "step", then the step has to cooperatively report its state. This is something I would prefer to avoid at this point. However, when a request consists of multiple steps, I would like to know which step is currently executing and report progress accordingly. The only way I could come up with is extremely tiresome:
Task<double> firstTask = new Task<double>(() => { DoFirstStep(); return 3.14; });
firstTask.
ContinueWith(task => { UpdateProgress("50%"); return task.Result; }).
ContinueWith(task => { DoSecondStep(task.Result); return "blah"; }).
ContinueWith(task => { UpdateProgress("100%"); return task.Result; });
firstTask.Start();
And even this is not perfect since I would like the Task to store its own progress, instead of having UpdateProgress update some known location. Plus it has the obvious downside of having to change a lot of places when adding a new step (since now the progress is 33%, 66%, 100% instead of 50%, 100%).
Does anyone have a good solution?
Thanks!
This isn't really a scenario that the Task Parallel Library supports that fully.
You might consider an approach where you fed progress updates to a queue and read them on another Task:
static void Main(string[] args)
{
Example();
}
static BlockingCollection<Tuple<int, int, string>> _progressMessages =
new BlockingCollection<Tuple<int, int, string>>();
public static void Example()
{
List<Task<int>> tasks = new List<Task<int>>();
for (int i = 0; i < 10; i++)
tasks.Add(Task.Factory.StartNew((object state) =>
{
int id = (int)state;
DoFirstStep(id);
_progressMessages.Add(new Tuple<int, int, string>(
id, 1, "10.0%"));
DoSecondStep(id);
_progressMessages.Add(new Tuple<int, int, string>(
id, 2, "50.0%"));
// ...
return 1;
},
(object)i
));
Task logger = Task.Factory.StartNew(() =>
{
foreach (var m in _progressMessages.GetConsumingEnumerable())
Console.WriteLine("Task {0}: Step {1}, progress {2}.",
m.Item1, m.Item2, m.Item3);
});
Task.WaitAll(tasks.ToArray());
// All producers are done: mark the collection complete so the logger's foreach can end.
_progressMessages.CompleteAdding();
logger.Wait();
Console.ReadLine();
}
private static void DoSecondStep(int id)
{
Console.WriteLine("{0}: Second step", id);
}
private static void DoFirstStep(int id)
{
Console.WriteLine("{0}: First step", id);
}
This sample doesn't show cancellation, error handling or account for your requirement that your task may be long running. Long running tasks place special requirements on the scheduler. More discussion of this can be found at http://parallelpatterns.codeplex.com/, download the book draft and look at Chapter 3.
This is simply an approach for using the Task Parallel Library in a scenario like this. The TPL may well not be the best approach here.
If your web services are running inside ASP.NET (or a similar web application server) then you should also consider the likely impact of using threads from the thread pool to execute tasks, rather than service web requests:
How does Task Parallel Library scale on a terminal server or in a web application?
I don't think the solution you are looking for will involve the Task API, or at least not directly. It doesn't support the notion of percentage complete, and the Task/ContinueWith functions would need to participate in that logic, because it's data that is only available at that level. Only the final invocation of ContinueWith is in any position to know the percentage complete, and even then, computing it algorithmically will be a guess at best, because it certainly doesn't know whether one task is going to take a lot longer than another. I suggest you create your own API to do this, possibly leveraging the Task API to do the actual work.
This might help: http://blog.stephencleary.com/2010/06/reporting-progress-from-tasks.html. In addition to reporting progress, this solution also enables updating form controls without getting the Cross-thread operation not valid exception.
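For the original question (a multi-step request that keeps its own progress, without hard-coded percentages), the IProgress<T> idea can be combined with a list of steps so the percentage is computed from the step count. A minimal sketch, assuming a server would keep one LatestProgress instance per TaskId; StepProgress, LatestProgress and MultiStepRunner are hypothetical names, and only IProgress<T> is a framework type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed record StepProgress(int CompletedSteps, int TotalSteps, string LastStepName)
{
    public double Percent => 100.0 * CompletedSteps / TotalSteps;
}

// Keeps only the most recent report, e.g. for a status endpoint to read by TaskId.
public sealed class LatestProgress<T> : IProgress<T> where T : class
{
    private T _latest;
    public T Latest => Volatile.Read(ref _latest);
    public void Report(T value) => Volatile.Write(ref _latest, value);
}

public static class MultiStepRunner
{
    // Adding a step only means adding an entry to 'steps'; percentages follow automatically.
    public static async Task RunStepsAsync(
        IReadOnlyList<(string Name, Func<Task> Body)> steps, IProgress<StepProgress> progress)
    {
        for (int i = 0; i < steps.Count; i++)
        {
            await steps[i].Body();
            progress?.Report(new StepProgress(i + 1, steps.Count, steps[i].Name));
        }
    }
}
```

In a UI application you could pass the framework's Progress<T> instead, which raises its callback on the SynchronizationContext captured when it was created; the synchronous LatestProgress above suits a server that just stores the last value.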
