Task.WhenAll and task starting behaviour - c#

I've got a fairly simple application using Task.WhenAll. The issue I am facing so far is that I don't know if I should start the subtasks myself or let WhenAll start them as appropriate.
The examples online show using tasks from framework methods, where it's not clear to me if the tasks returned have already started or not. However I've created my own tasks with an Action, so it's a detail that I have to address.
When I'm using Task.WhenAll, should I start the constituent tasks directly, or should I let Task.WhenAll handle it for fun, profit, and improved execution speed?
For further fun, the subtasks contain lots of blocking I/O.

WhenAll won't start tasks for you. You have to start them yourself.
var unstartedTask = new Task(() => {});
await Task.WhenAll(unstartedTask); // this task won't complete until unstartedTask.Start()
However, generally, tasks created (e.g. using Task.Run, async methods, etc.) have already been started. So you generally don't have to take a separate action to start the task.
var task = Task.Run(() => {});
await Task.WhenAll(task); // no need for task.Start()

I've created my own tasks with an Action
When you're working with asynchronous tasks, the convention is to only deal with tasks already in progress. So using the Task constructor and Start is inappropriate; it would be better to use Task.Run.
As others have noted, Task.WhenAll only aggregates the tasks; it does not start them for you.

Task.WhenAll(IEnumerable) will handle the supplied tasks for you, but you can create them using the most common way - by executing Task.Run(Action) or TaskFactory.StartNew(Action) method.
Just for a note: if any of the tasks is completed in Faulted state, resulting task will complete in Faulted state as well, having AggregateException set to its Exception property.

Related

Running async functions in parallel from list of interfaces

I have a similair question to Running async methods in parallel in that I wish to run a number of functions from a list of functions in parallel.
I have noted in a number of comments online it is mentioned that if you have another await in your methods, Task.WhenAll() will not help as Async methods are not parallel.
I then went ahead and created a thread for each using function call with the below (the number of parallel functions will be small typically 1 to 5):
public interface IChannel
{
Task SendAsync(IMessage message);
}
public class SendingChannelCollection
{
protected List<IChannel> _channels = new List<IChannel>();
/* snip methods to add channels to list etc */
public async Task SendAsync(IMessage message)
{
var tasks = SendAll(message);
await Task.WhenAll(tasks.AsParallel().Select(async task => await task));
}
private IEnumerable<Task> SendAll(IMessage message)
{
foreach (var channel in _channels)
yield return channel.SendAsync(message, qos);
}
}
I would like to double check I am not doing anything horrendous with code smells or bugs as i get to grips with what I have patched together from what i have found online. Many thanks in advance.
Let's compare the behaviour of your line:
await Task.WhenAll(tasks.AsParallel().Select(async task => await task));
in contrast with:
await Task.WhenAll(tasks);
What are you delegating to PLINQ in the first case? Only the await operation, which does basically nothing - it invokes the async/await machinery to wait for one task. So you're setting up a PLINQ query that does all the heavy work of partitioning and merging the results of an operation that amounts to "do nothing until this task completes". I doubt that is what you want.
If you have another await in your methods, Task.WhenAll() will not help as Async methods are not parallel.
I couldn't find that in any of the answers to the linked questions, except for one comment under the question itself. I'd say that it's probably a misconception, stemming from the fact that async/await doesn't magically turn your code into concurrent code. But, assuming you're in an environment without a custom SynchronizationContext (so not an ASP or WPF app), continuations to async functions will be scheduled on the thread pool and possibly run in parallel. I'll delegate you to this answer to shed some light on that. That basically means that if your SendAsync looks something like this:
Task SendAsync(IMessage message)
{
// Synchronous initialization code.
await something;
// Continuation code.
}
Then:
The first part before await runs synchronously. If this part is heavyweight, you should introduce parallelism in SendAll so that the initialization code is run in parallel.
await works as usual, waiting for work to complete without using up any threads.
The continuation code will be scheduled on the thread pool, so if a few awaits finish up at the same time their continuations might be run in parallel if there's enough threads in the thread pool.
All of the above is assuming that await something actually awaits asynchronously. If there's a chance that await something completes synchronously, then the continuation code will also run synchronously.
Now there is a catch. In the question you linked one of the answers states:
Task.WhenAll() has a tendency to become unperformant with large scale/amount of tasks firing simultaneously - without moderation/throttling.
Now I don't know if that's true, since I weren't able to find any other source claiming that. I guess it's possible and in that case it might actually be beneficial to invoke PLINQ to deal with partitioning and throttling for you. However, you said you typically handle 1-5 functions, so you shouldn't worry about this.
So to summarize, parallelism is hard and the correct approach depends on how exactly your SendAsync method looks like. If it has heavyweight initialization code and that's what you want to parallelise, you should run all the calls to SendAsync in parallel. Otherwise, async/await will be implicitly using the thread pool anyway, so your call to PLINQ is redundant.

How is concurrency handled when not using Task.Run() to schedule the work?

If we fill a list of Tasks that need to do both CPU-bound and I/O bound work, by simply passing their method declaration to that list (Not by creating a new task and manually scheduling it by using Task.Start), how exactly are these tasks handled?
I know that they are not done in parallel, but concurrently.
Does that mean that a single thread will move along them, and that single thread might not be the same thread in the thread pool, or the same thread that initially started waiting for them all to complete/added them to the list?
EDIT: My question is about how exactly these items are handled in the list concurrently - is the calling thread moving through them, or something else is going on?
Code for those that need code:
public async Task SomeFancyMethod(int i)
{
doCPUBoundWork(i);
await doIOBoundWork(i);
}
//Main thread
List<Task> someFancyTaskList = new List<Task>();
for (int i = 0; i< 10; i++)
someFancyTaskList.Add(SomeFancyMethod(i));
// Do various other things here --
// how are the items handled in the meantime?
await Task.WhenAll(someFancyTaskList);
Thank you.
Asynchronous methods always start running synchronously. The magic happens at the first await. When the await keyword sees an incomplete Task, it returns its own incomplete Task. If it sees a complete Task, execution continues synchronously.
So at this line:
someFancyTaskList.Add(SomeFancyMethod(i));
You're calling SomeFancyMethod(i), which will:
Run doCPUBoundWork(i) synchronously.
Run doIOBoundWork(i).
If doIOBoundWork(i) returns an incomplete Task, then the await in SomeFancyMethod will return its own incomplete Task.
Only then will the returned Task be added to your list and your loop will continue. So the CPU-bound work is happening sequentially (one after the other).
There is some more reading about this here: Control flow in async programs (C#)
As each I/O operation completes, the continuations of those tasks are scheduled. How those are done depends on the type of application - particularly, if there is a context that it needs to return to (desktop and ASP.NET do unless you specify ConfigureAwait(false), ASP.NET Core doesn't). So they might run sequentially on the same thread, or in parallel on ThreadPool threads.
If you want to immediately move the CPU-bound work to another thread to run that in parallel, you can use Task.Run:
someFancyTaskList.Add(Task.Run(() => SomeFancyMethod(i)));
If this is in a desktop application, then this would be wise, since you want to keep CPU-heavy work off of the UI thread. However, then you've lost your context in SomeFancyMethod, which may or may not matter to you. In a desktop app, you can always marshall calls back to the UI thread fairly easily.
I assume you don't mean passing their method declaration, but just invoking the method, like so:
var tasks = new Task[] { MethodAsync("foo"),
MethodAsync("bar") };
And we'll compare that to using Task.Run:
var tasks = new Task[] { Task.Run(() => MethodAsync("foo")),
Task.Run(() => MethodAsync("bar")) };
First, let's get the quick answer out of the way. The first variant will have lower or equal parallelism to the second variant. Parts of MethodAsync will run the caller thread in the first case, but not in the second case. How much this actually affects the parallelism depends entirely on the implementation of MethodAsync.
To get a bit deeper, we need to understand how async methods work. We have a method like:
async Task MethodAsync(string argument)
{
DoSomePreparationWork();
await WaitForIO();
await DoSomeOtherWork();
}
What happens when you call such a method? There is no magic. The method is a method like any other, just rewritten as a state machine (similar to how yield return works). It will run as any other method until it encounters the first await. At that point, it may or may not return a Task object. You may or may not await that Task object in the caller code. Ideally, your code should not depend on the difference. Just like yield return, await on a (non-completed!) task returns control to the caller of the method. Essentially, the contract is:
If you have CPU work to do, use my thread.
If whatever you do would mean the thread isn't going to use the CPU, return a promise of the result (a Task object) to the caller.
It allows you to maximize the ratio of what CPU work each thread is doing. If the asynchronous operation doesn't need the CPU, it will let the caller do something else. It doesn't inherently allow for parallelism, but it gives you the tools to do any kind of asynchronous operation, including parallel operations. One of the operations you can do is Task.Run, which is just another asynchronous method that returns a task, but which returns to the caller immediately.
So, the difference between:
MethodAsync("foo");
MethodAsync("bar");
and
Task.Run(() => MethodAsync("foo"));
Task.Run(() => MethodAsync("bar"));
is that the former will return (and continue to execute the next MethodAsync) after it reaches the first await on a non-completed task, while the latter will always return immediately.
You should usually decide based on your actual requirements:
Do you need to use the CPU efficiently and minimize context switching etc., or do you expect the async method to have negligible CPU work to do? Invoke the method directly.
Do you want to encourage parallelism or do you expect the async method to do interesting amounts of CPU work? Use Task.Run.
Here is your code rewritten without async/await, with old-school continuations instead. Hopefully it will make it easier to understand what's going on.
public Task CompoundMethodAsync(int i)
{
doCPUBoundWork(i);
return doIOBoundWorkAsync(i).ContinueWith(_ =>
{
doMoreCPUBoundWork(i);
});
}
// Main thread
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
Task task = CompoundMethodAsync(i);
tasks.Add(task);
}
// The doCPUBoundWork has already ran synchronously 10 times at this point
// Do various things while the compound tasks are progressing concurrently
Task.WhenAll(tasks).ContinueWith(_ =>
{
// The doIOBoundWorkAsync/doMoreCPUBoundWork have completed 10 times at this point
// Do various things after all compound tasks have been completed
});
// No code should exist here. Move everything inside the continuation above.

Continue workflow while async operation is running

I've commented on Eric Lippert's answer to What's the difference between Task.Start/Wait and Async/Await?
I am asking this question since I am still unsure if I understand the concept correctly and how I'd achieve my goal. Adding tons of comments is not very helpful for anyone.
What I understand: await tells the compiler that the current thread has the capacity to perform some other computation and get back once the awaited operation is done. That means the workflow is interrupted until the awaited operation is done. This doesn't speed up the computation of the context which contains the await but increases the overall application performance due to better usage of workers.
No my issue: I'd like to continue the workflow and in the end make sure the operation is done. So basically allow the worker to continue the current workflow even if the awaitable operation is not complete and await completion at the end of the workflow. I'd like the worker to spend time on my workflow and not run away and help someone else.
What I think might work: Consider n async Add operations and a Flush operation which processes added items. Flush requires the items to be added. But adding items doesn't require the previous item to be added. So basically I'd like to collect all running Add operations and await all of them once all have been added. And after they have been awaited they should be Flushed.
Can I do this by adding the Add Tasks to a list and in the end await all those tasks?
Or is this pseudo-async and has no benefit in the end?
Is it the same as awaiting all the Add operations directly? (Without collecting them)
What I understand: await tells the compiler that the current thread has the capacity to perform some other computation and get back once the awaited operation is done.
That's pretty close. A better way to characterize it is: await means suspend this workflow until the awaited task is complete. If the workflow is suspended because the task isn't done, that frees up this thread to find more work to do, and the workflow will be scheduled to resume at some point in the future when the task is done. The choice of what to do while waiting is given to the code that most recently called this workflow; that is, an await is actually a fancy return. After all, return means "let my caller decide what to do next".
If the task is done at the point of the await then the workflow simply continues normally.
Await is an asynchronous wait. It waits for a task to be done, but it keeps busy while it is waiting.
I'd like to continue the workflow and in the end make sure the operation is done. So basically allow the worker to continue the current workflow even if the awaitable operation is not complete and await completion at the end of the workflow. I'd like the worker to spend time on my workflow and not run away and help someone else.
Sure, that's fine. Don't await a task until the last possible moment, when you need the task to be complete before the workflow can continue. That's a best practice.
However: if your workflow is doing operations that are taking more than let's say 30 milliseconds without awaiting something, and you're on the UI thread, then you are potentially freezing the UI and irritating the user.
Can I do this by adding the Add Tasks to a list and in the end await all those tasks?
Of course you can; that's a good idea. Use the WhenAll combinator to easily await all of a sequence of tasks.
Is it the same as awaiting all the Add operations directly? (Without collecting them)
No, it's different. As you correctly note, awaiting each Add operation will ensure that no Add starts until the previous one completes. If there's no requirement that they be serialized in this manner, you can make a more efficient workflow by starting the tasks first, and then awaiting them after they're all started.
If I understand your question correctly, what you want to do is parallelize the asynchronous work, which is very common.
Consider the following code:
async Task Add(Item item) { ... }
async Task YourMethod()
{
var tasks = new List<Task>();
foreach (var item in collection)
{
tasks.Add(Add(item));
}
// do any work you need
Console.WriteLine("Working...");
// then ensure the tasks are done
await Task.WhenAll(tasks);
// and flush them out
await Flush();
}

Multiple await operations or just one

I've seen how the await keyword is implemented and resulting structure it creates. I think I have a rudimentary understanding of it. However, is
public async Task DoWork()
{
await this.Operation1Async();
await this.Operation2Async();
await this.Operation3Async();
}
"better" (generally speaking) or
public async Task DoWork()
{
await this.Operation1Async();
this.Operation2();
this.Operation3();
}
The problem with the first approach is that it is creating a new Task for each await call? Which entails a new thread?
Whereas the first creates a new Task on the first await and then everything from there is processed in the new Task?
Edit
Ok maybe I wasn't too clear, but if for example we have
while (await reader.ReadAsync())
{
//...
}
await reader.NextResultAsync();
// ...
Is this not creating two tasks? One in the main thread with the first ReadAsync then another task in this newly created task with the NextResultAsync. My question is there really a need for the second task, isn't
the one task created in the main thread sufficient? So
while (await reader.ReadAsync())
{
//...
}
reader.NextResult();
// ...
it is creating a new Task for each await call? Which entails a new thread?
Yes and no. Yes, it is creating a Task for each asynchronous method; the async state machine will create one. However, these tasks are not threads, nor do they even run on threads. They do not "run" anywhere.
You may find some blog posts of mine useful:
async intro, which explains how async/await work.
There Is No Thread, which explains how tasks can work without threads.
Intro to the Task type, which explains how some tasks (Delegate Tasks) have code and run on threads, but the tasks used by async (Promise Tasks) do not.
Whereas the first creates a new Task on the first await and then everything from there is processed in the new Task?
Not at all. Tasks only complete once, and the method will not continue past the await until that task is complete. So, the task returned by Operation1Async has already completed before Operation2 is even called.
The 2 examples are not functionally equivalent so you would choose the one depending on your specific needs. In the first example the 3 tasks are executed sequentially, whereas in the second example the second and third tasks are running in parallel without waiting for their result to complete. Also in the second example the DoWork method could return before the second and third tasks have completed.
If you want to ensure that the tasks have completed before leaving the DoWork method body you might need to do this:
public async Task DoWork()
{
await this.Operation1Async();
this.Operation2().GetAwaiter().GetResult();
this.Operation3().GetAwaiter().GetResult();
}
which of course is absolutely terrible and you should never be doing it as it is blocking the main thread in which case you go with the first example. If those tasks use I/O completion ports then you should definitely take advantage of them instead of blocking the main thread.
If on the other hand you are asking whether you should make Operation2 and Operation3 asynchronous, then the answer is this: If they are doing I/O bound stuff where you can take advantage of I/O Completion Ports then you should absolutely make them async and go with the first approach. If they are CPU bound operations where you cannot use IOCP then it might be better to leave them synchronous because it wouldn't make any sense to execute this CPU bound operations in a separate task which you would block for anyway.
The problem with the first approach is that it is creating a new Task for each await call? Which entails a new thread?
This is your misunderstanding, which is leading to you to be suspicious of the code in the first example.
A Task does not entail a new thread. A Task certainly can be run on a new thread if you want to do so, but an important use of tasks is when a task directly or indirectly works through asynchronous i/o, which happens when the task, or one that it in turn awaits on, uses async i/o to access files or network streams (e.g. web or database access) allowing a pooled thread to be returned to the pool until that i/o has completed.
As such if the task does not complete immediately (which may happen if e.g. its purpose could be fulfilled entirely from currently-filled buffers) the thread currently running it can be returned to the pool and can be used to do something else in the meantime. When the i/o completes then another thread from the pool can take over and complete that waiting, which can then finish the waiting in a task waiting on that, and so on.
As such the first example in your question allows for fewer threads being used in total, especially when other work will also being using threads from the same pool.
In the second example once the first await has completed the thread that handled its completion will be blocking on the synchronous equivalent methods. If other operations also need to use threads from the pool then that thread not being returned to it, fresh threads will have to be spun up. As such the second example is the example that will need more threads.
One is not better than the other, they do different things.
In the first example, each operation is scheduled and performed on a thread, represented by a Task. Note: It's not guaranteed what thread they happen on.
The await keyword means (loosely) "wait until this asynchronous operation has finished and then continue". The continuation, is not necessarily done on the same thread either.
This means example one, is a synchronous processing of asynchronous operations. Now just because a Task is created, it doesn't infer a Thread is also created, there is a pool of threads the TaskScheduler uses, which have already been created, very minimal overhead is actually introduced.
In your second example, the await will call the first operation using the scheduler, then call the next two as normal. No Task or Thread is created for the second two calls, nor is it calling methods on a Task.
In the first example, you can also look into making your asynchronous calls simultaneous. Which will schedule all three operations to run "at the same time" (not guaranteed), and wait until they have all finished executing.
public async Task DoWork()
{
var o1 = this.Operation1Async();
var o2 = this.Operation2Async();
var o3 = this.Operation3Async();
await Task.WhenAll(o1, o2, o3);
}

The different use of Task Parallel library

I saw few people call function using syntax like:
Parallel.Invoke(() => Method1(yourString1),() => Method2(youString2));
And few people write code like:
Task myFirstTask = Task.Factory.StartNew(() => Method1(5));
Task mySecondTask = Task.Factory.StartNew(() => Method2("Hello"));
So my question is when one should use Parallel.Invoke() and when one should create instance of Task class and call StartNew() method.
Parallel.Invoke() looks very handy.so what is the significance of using Task class & StartNew() method.........put some light and tell me the importance of different approach for same kind of job means call two function parallel with two different syntax.
i never use before the Task Parallel library. so there could be some hidden reason for using two approach for calling function. please guide me in detail. thanks
Well, Parallel.Invoke will block until both of the new tasks have completed.
The second approach will start two new tasks, but not wait for them to be completed. You could await them manually, or in C# 5 the new async/await feature will help you "wait" asynchronously.
It really depends what you want to do. If you want your thread to block until all the tasks have finished, Parallel.Invoke is handy.

Categories