Is it safe not to wait for all tasks - c#

Say I have the following action method:
[HttpPost]
public async Task<IActionResult> PostCall()
{
var tasks = new List<Task<bool>>();
for (int i = 0; i < 10; i++)
tasks.Add(Manager.SomeMethodAsync(i));
// Is this line necessary to ensure that all tasks will finish successfully?
Task.WaitAll(tasks.ToArray());
if (tasks.Exists(x => x.Result))
return new ObjectResult("At least one task returned true");
else
return new ObjectResult("No tasks returned true");
}
Is Task.WaitAll(tasks.ToArray()) necessary to ensure that all tasks will finish successfully? Will the tasks whose Result happened not to get accessed by the Exists finish their execution in the background successfully? Or is there a chance that some of the tasks (that weren't waited for) get dropped since they would not be attached to the request? Is there a better implementation I'm missing?

Under your provided implementation, the Task.WaitAll call blocks the calling thread until all tasks have completed. It would only proceed to the next line and perform the Exists check after this has happened. If you remove the Task.WaitAll, then the Exists check would cause the calling thread to block on each task in order; i.e. it first blocks on tasks[0]; if this returns false, then it would block on tasks[1], then tasks[2], and so on. This is not desirable since it doesn't allow for your method to finish early if the tasks complete out of order.
If you only need to wait until whichever task returns true first, then you could use Task.WhenAny. This will make your asynchronous method resume as soon as any task completes. You can then check whether it evaluated to true and return success immediately; otherwise, you keep repeating the process for the remaining collection of tasks until there are none left.
If your code was running as an application (WPF, WinForms, Console), then the remaining tasks would continue running on the thread pool until completion, unless the application is shut down. Thread-pool threads are background threads, so they won't keep the process alive if all foreground threads have terminated (e.g. because all windows were closed).
Since you're running a web app, you incur the risk of having your app pool recycled before the tasks have completed. The unawaited tasks are fire-and-forget and therefore untracked by the runtime. To prevent this from happening, you can register them with the runtime through the HostingEnvironment.QueueBackgroundWorkItem method, as suggested in the comments.
[HttpPost]
public async Task<IActionResult> PostCall()
{
var tasks = Enumerable
.Range(0, 10)
.Select(Manager.SomeMethodAsync)
.ToList();
foreach (var task in tasks)
HostingEnvironment.QueueBackgroundWorkItem(_ => task);
while (tasks.Any())
{
var readyTask = await Task.WhenAny(tasks);
tasks.Remove(readyTask);
if (await readyTask)
return new ObjectResult("At least one task returned true");
}
return new ObjectResult("No tasks returned true");
}

Yes, the tasks are not guaranteed to complete unless something waits for them (for example, with an await).
In your case, the main change you should make is replacing the Task.WaitAll call with
await Task.WhenAll(tasks);
so that the wait is actually asynchronous. If you only want to wait for the first task to finish, use WhenAny instead.
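A minimal sketch of that change, outside of MVC so it can run on its own. SomeMethodAsync here is a hypothetical stand-in (the question does not show Manager), and PostCallAsync returns a plain string instead of an IActionResult:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for Manager.SomeMethodAsync; only i == 7 yields true.
    public static async Task<bool> SomeMethodAsync(int i)
    {
        await Task.Delay(10 * (10 - i)); // simulated I/O that completes out of order
        return i == 7;
    }

    // Starts all ten operations, then waits for them without blocking a thread.
    public static async Task<string> PostCallAsync()
    {
        var tasks = Enumerable.Range(0, 10).Select(i => SomeMethodAsync(i)).ToList();
        bool[] results = await Task.WhenAll(tasks); // every task has finished here
        return results.Any(r => r)
            ? "At least one task returned true"
            : "No tasks returned true";
    }

    public static async Task Main()
    {
        Console.WriteLine(await PostCallAsync());
    }
}
```

Since Task.WhenAll on Task<bool> instances returns the results as an array, there is no need to touch .Result on the individual tasks afterwards.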

Related

Execute many long-running tasks with the "TAP" design pattern

I'm currently developing a system where I'll need to connect a couple of clients to a server, which means that I will need to run a background task for each client. The last project I built was with APM, but I am now trying to build everything around the new and better TAP.
My question is, how do I run many long-running asynchronous functions within a synchronous function? I know that I could use Task.Run(), but it feels like there's a better way. If I just try to run the function as it is, the warning ...
"Because this call is not awaited, execution of the current method continues before the call is completed."
... appears, which means that I'm doing something wrong... or am I? What is the most efficient and correct way to make all of the clients run at the same time?
class AsyncClient
{
public AsyncClient()
{
...
}
public async Task RunAsync(IPAddress address, int port)
{
... waiting for data
}
}
static void Main(string[] args)
{
List<AsyncClient> clients = new List<AsyncClient>();
clients.Add(new AsyncClient());
clients.Add(new AsyncClient());
clients.Add(new AsyncClient());
foreach (var c in clients)
{
// What is the best way to start every async tasks?
c.RunAsync(IPAddress.Parse("127.0.0.1"), 8080);
// ^ This gives the warning "Because this call is not awaited,
// execution of the current method continues before the call is completed."
}
}
Thanks!
First you should change your Main method to be async:
static async Task Main(string[] args)
Then you can await the asynchronous operations.
To allow them to run in parallel, you can make use of LINQ Select:
IEnumerable<Task> tasks = clients.Select(c => c.RunAsync(IPAddress.Parse("127.0.0.1"), 8080));
await Task.WhenAll(tasks);
Task.WhenAll returns a new Task that completes when all the provided Tasks have completed.
Without awaiting the Tasks, there is a good chance that your Main method will complete, and hence the program will exit, before the Tasks have completed.
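Put together as a complete program, this could look like the sketch below. The body of RunAsync and the Finished flag are hypothetical (a Task.Delay stands in for "waiting for data"), and RunAllAsync is a helper name I made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

public class AsyncClient
{
    // Hypothetical flag, only here so the example can show that the client finished.
    public bool Finished { get; private set; }

    public async Task RunAsync(IPAddress address, int port)
    {
        await Task.Delay(100); // stands in for "... waiting for data"
        Finished = true;
    }
}

public static class ClientsSketch
{
    public static async Task<List<AsyncClient>> RunAllAsync(int count)
    {
        var clients = Enumerable.Range(0, count).Select(_ => new AsyncClient()).ToList();
        // ToList forces the Select to run, so every RunAsync call has been started here.
        List<Task> tasks = clients
            .Select(c => c.RunAsync(IPAddress.Loopback, 8080))
            .ToList();
        await Task.WhenAll(tasks); // completes once all clients have completed
        return clients;
    }

    public static async Task Main()
    {
        var clients = await RunAllAsync(3);
        Console.WriteLine(clients.All(c => c.Finished));
    }
}
```

Materializing the Select with ToList is a deliberate choice: a lazy IEnumerable<Task> would start the operations again each time it is enumerated.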
So you have a non async method, and in this non-async method you want to call async methods.
Usually a method is async, because somewhere deep inside your thread has to wait for another lengthy process to finish. Think of a file to be written, a database query to be executed, or some information to be fetched from the internet. Those are typically functions where you'll find async methods next to the non-async methods.
Instead of waiting idly for the other process to finish its task, control returns to the caller of the method, which can do other things until it reaches an await itself. Control then passes to its caller in turn, and so on up the call stack.
So if you want to do other things while the other process is executing its task, simply don't await yet. The problem, of course, is that you want to know the result of the other task before your function exits. If you don't, it will be hard to define the postcondition of your method.
void MyMethod()
{
Task<int> taskA = MethodAasync(...);
// you didn't await, you are free to do something else, like calling another async method
Task<double> taskB = MethodBasync(...);
DoSomethingUseful();
// if here, you need the result of taskA. Wait until it is ready
// this blocks your thread!
taskA.Wait();
int resultA = taskA.Result;
ProcessResult(resultA);
// if desired, you can wait for a collection of tasks:
Task[] tasksToWaitFor = new Task[] {taskA, taskB};
Task.WaitAll(tasksToWaitFor);
double resultB = taskB.Result;
ProcessResults(resultA, resultB);
}
Even if you are not interested in the result of the tasks, it is wise to wait for them to finish. This allows you to react to exceptions.
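To illustrate the point about exceptions, here is a small sketch with a hypothetical FailAsync method. It uses await rather than Wait(), which rethrows the original exception; a blocking task.Wait() would instead wrap it in an AggregateException.

```csharp
using System;
using System.Threading.Tasks;

public static class ObserveExceptionsSketch
{
    // Hypothetical operation that fails after its first await.
    public static async Task FailAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("save failed");
    }

    // Returns the message of the exception surfaced by awaiting, or null if there was none.
    public static async Task<string> RunAndReportAsync()
    {
        Task task = FailAsync(); // the exception is stored in the task, not thrown here
        try
        {
            await task; // awaiting rethrows the stored exception
            return null;
        }
        catch (InvalidOperationException ex)
        {
            return ex.Message;
        }
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAndReportAsync());
    }
}
```

If nothing ever awaited or waited on the task, the failure in FailAsync would go unnoticed by the calling code.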
By the way, did you see that I did not call Task.Run? What happens is that my thread enters MethodAasync until it sees an await. Then the procedure gets back control, so it enters MethodBasync until it sees an await. Your procedure then gets back control and calls DoSomethingUseful.
As soon as the other process (database query, write file, etc) is finished, one of the threads of the thread pool continues processing the statements after the await, until it meets a new await, or until there is nothing more to process.
Task.Wait and Task.WaitAll are the methods that stop this asynchronousness: the thread will really block until all async methods are completely finished.
There is seldom a reason to use Task.Run if you want to call an async method: simply call it, do not wait for it, so you can do other useful stuff. Make sure you Wait for the task to finish as soon as you need the result, or at the latest when you return from the method.
Another method would be to return the tasks without waiting for them to finish, to give your caller the opportunity to do something useful as long as the tasks are not completed. Of course this can only be done if your procedure doesn't need the result of the task. It also obliges your caller to wait for completion, or pass the tasks to his caller.
The only reason to use Task.Run that I can see is that you want to start a lengthy procedure within your own process that you don't want to wait for right now. Think of doing lengthy calculations. Don't use Task.Run if another process is involved. In that case the other process should have an async function, or you should create an async extension method that does the Task.Run.
int DoSomeLengthyCalculations(...) {...}
async Task<MyResult> CalculateIt(...)
{
Task<int> taskLengthyCalculations = Task.Run(() => DoSomeLengthyCalculations(...));
// if desired, DoSomethingUseful here; after that, await the task
// and process the result:
int resultLengthyCalculations = await taskLengthyCalculations;
MyResult result = ProcessResult(resultLengthyCalculations);
return result;
}
The nice thing is that you've hidden whether you are doing the lengthy calculations, or that someone else is doing it. For instance if you are unit testing methods that async access a database, you can mock this while accessing a Dictionary instead.

Running multiple Task<> in an enterprise application in a safe way

I'm designing the software architecture for a product that can instantiate a series of "agents" doing some useful things.
Let's say each agent implements an interface with a function:
Task AsyncRun(CancellationToken token)
Since these agents do a lot of I/O, it makes sense for this to be an async function. Moreover, AsyncRun is supposed to never complete unless an exception or explicit cancellation occurs.
Now the question is: the main program has to run this on multiple agents. I would like to know the correct way of running those multiple tasks and signalling each single completion (which is due to cancellation or errors).
For example, I'm thinking of something like an infinite loop:
//.... all tasks created are in the array tasks..
while(true)
{
await Task.WhenAny(tasks);
//.... check each single task to understand which one(s) exited
// re-run the task if requested replacing in the array tasks
}
but I'm not sure if it is the correct (or even the best) way.
Moreover, I would like to know if this is the correct pattern, especially because the implementer can mismatch the RunAsync and do a blocking call, in which case the entire application will hang.
// re-run the task if requested replacing in the array tasks
This is the first thing I'd consider changing. It's far better to not let an application handle its own "restarting". If an operation failed, then there's no guarantee that an application can recover. This is true for any kind of operation in any language/runtime.
A better solution is to let another application restart this one. Allow the exception to propagate (logging it if possible), and allow it to terminate the application. Then have your "manager" process (literally a separate executable process) restart as necessary. This is the way all modern high-availability systems work, from the Win32 services manager, to ASP.NET, to the Kubernetes container manager, to the Azure Functions runtime.
Note that if you do want to take this route, it may make sense to split up the tasks to different processes, so they can be restarted independently. That way a restart in one won't cause a restart in others.
However, if you want to keep all your tasks in the same process, then the solution you have is fine. If you have a known number of tasks at the beginning of the process, and that number won't change (unless they fail), then you can simplify the code a bit by factoring out the restarting and using Task.WhenAll instead of Task.WhenAny:
async Task RunAsync(Func<CancellationToken, Task> work, CancellationToken token)
{
while (true)
{
try { await work(token); }
catch
{
// log...
}
if (we-should-not-restart)
break;
}
}
List<Func<CancellationToken, Task>> workToDo = ...;
var tasks = workToDo.Select(work => RunAsync(work, token));
await Task.WhenAll(tasks);
// Only gets here if they all complete/fail and were not restarted.
the implementer can mismatch the RunAsync and do a blocking call, in which case the entire application will hang.
The best way to prevent this is to wrap the call in Task.Run, so this:
await work(token);
becomes this:
await Task.Run(() => work(token));
In order to know whether the task completes successfully, or is cancelled or faulted, you could use a continuation. The continuation will be invoked as soon as the task finishes, whether that's because of failure, cancellation or completion:
using (var tokenSource = new CancellationTokenSource())
{
IEnumerable<IAgent> agents; // TODO: initialize
var tasks = new List<Task>();
foreach (var agent in agents)
{
var task = agent.RunAsync(tokenSource.Token)
.ContinueWith(t =>
{
if (t.IsCanceled)
{
// Do something if cancelled.
}
else if (t.IsFaulted)
{
// Do something if faulted (with t.Exception)
}
else
{
// Do something if the task has completed.
}
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
}
In the end you will wait for the continued tasks. Also see this answer.
If you are afraid that the IAgent implementations will create blocking calls and want to prevent the application from hanging, you can wrap the call to the async method in Task.Run. This way the call to the agent is executed on the threadpool and is therefore non-blocking:
var task = Task.Run(async () =>
await agent.RunAsync(tokenSource.Token)
.ContinueWith(t =>
{
// Same as above
}));
You may want to use Task.Factory.StartNew instead, for example to mark the task as long-running.

How to make sure a task is started and safely start it if not?

I get an IEnumerable<Task> tasks from somewhere that I do not control. I don't know if the tasks are manually created using new Task, Task.Run, or if they are a result of an async method call async Task DoSomethingAsync().
If I do await Task.WhenAll(tasks), I risk hanging indefinitely because maybe one or more of the tasks are not started.
I can't do tasks.ForEach(t => t.Start()), because then I will get an InvalidOperationException "Start may not be called on a promise-style task" if it's from an async method call (already started).
I can't do await Task.WhenAll(tasks.Select(t => Task.Run(async () => await t))) because each t still does not start just by awaiting it.
I assume the solution has something to do with checking each task's Status and Start() based on that, but I also assume that it can be tricky because that status could change at any time, right? If this is still the way to go, which statuses would be correct to check and what threading issues should I worry about?
Non working case example:
//making an IEnumerable as an example, remember I don't control this part
Task t = new Task( () => Console.WriteLine("started"));
IEnumerable<Task> tasks = new[] {t};
//here I receive the tasks
await Task.WhenAll(tasks);//waits forever because t is not started
Working case example:
//calls the async function, starting it.
Task t = DoSomethingAsync();
IEnumerable<Task> tasks = new[] {t};
//here I receive the tasks and it will complete because the task is already started
await Task.WhenAll(tasks);
async Task DoSomethingAsync() => Console.WriteLine("started");
If for whatever reason you cannot change the code so that it does not return unstarted tasks, you can check Status and start the task if it has the Created status:
if (task.Status == TaskStatus.Created)
task.Start();
All other task statuses indicate that the task is either completed, running, or being scheduled, so you don't need to start tasks in those statuses.
Of course, in theory this introduces a race condition, because the task can be started right between your check and the Start call. But, as Servy correctly pointed out in the comments, if there ever is a race condition here, that means another party (the one that created the task) is also trying to start it. Even if you handle the exception (InvalidOperationException), the other party is unlikely to do so, and will therefore get an exception while trying to start its own task. So only one side (either you, or the code that created the task) should be trying to start it.
That said, much better than doing this is to ensure you never get an unstarted task in the first place, because it's simply bad design to return such tasks to external code, at least without explicitly indicating it (although using an unstarted task internally is fine for some use cases).
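As a sketch of the status check applied to a whole collection: EnsureStarted and RunAllAsync are hypothetical helper names, and swallowing the InvalidOperationException is only one possible policy given the race described above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TaskStartSketch
{
    // Starts the task only if it is still in the Created state.
    public static void EnsureStarted(Task task)
    {
        if (task.Status != TaskStatus.Created)
            return; // promise-style, scheduled, running or finished: nothing to do
        try
        {
            task.Start();
        }
        catch (InvalidOperationException)
        {
            // Another party started the task between the check and Start().
        }
    }

    // Mixes one unstarted and one already started task and awaits both.
    public static async Task<int> RunAllAsync()
    {
        int ran = 0;
        var cold = new Task(() => ran++); // created with new Task, never started
        var hot = Task.Run(() => { });    // already scheduled by Task.Run
        var tasks = new List<Task> { cold, hot };
        foreach (var t in tasks)
            EnsureStarted(t);
        await Task.WhenAll(tasks);        // no longer hangs on the cold task
        return ran;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAllAsync());
    }
}
```

Tasks returned by async methods are never in the Created state, so the check skips them and the "Start may not be called on a promise-style task" exception is avoided.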

How should I be using the TPL if I want to implement this threading concept?

Let's say I have a method like SaveAsync(Item item) and I need to call it on 10 Items and the calls are independent of one another. I imagine the ideal way in terms of threading is like
Thread A | Run `SaveAsync(item1)` until we hit the `await` | ---- ... ---- | Run `SaveAsync(item10)` until we hit the `await` | ---------------------------------------|
Thread B | --------------------------------------------------- | Run the stuff after the `await` in `SaveAsync(item1)` | ------------------ ... -----------------------|
Thread C | ------------------------------------------------------ | Run the stuff after the `await` in `SaveAsync(item2)` | ------------------ ... --------------------|
.
.
.
(with it being possible that some of the stuff after the await for multiple items is run in the same thread, perhaps even Thread A)
I'm wondering how to write that in C#? Is it a parallel foreach or a loop with await SaveAsync(item) or what?
By default, an awaited task resumes in the context it was started in. You can change this by adding
await task.ConfigureAwait(false)
This tells the runtime that you do not care in which context the method resumes, so the runtime can skip capturing the current context (which is quite costly).
By default, however, the continuation is always scheduled on the context that started the task.
There are a few default contexts, such as the UI thread context and the thread pool context. A task started on the UI thread context will be scheduled back to the UI thread context.
A task started on the thread pool context will be scheduled to the next free thread from the pool, not necessarily the same thread the task was started on.
However, you can provide your own context if you need more control over task scheduling.
How do you start multiple tasks in the fashion you described above? A loop that awaits each call will not help here. Let's take this example.
foreach(var item in items)
{
await SaveAsync(item);
}
The await here will wait until the SaveAsync finishes. So all saves are processed in sequence.
How to save truly asynchronously?
The trick is to start all tasks, but not await them, until all tasks are started. You then await all tasks with WhenAll(IEnumerable<Task>).
Here is an example.
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync(item)); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Because of the missing await, all the save operations are started one after another without waiting in between: as soon as the first call yields back at its first incomplete await, the second is started. This results in behavior somewhat similar to the one described in your question.
The only main difference here, is all tasks are executed in the same thread. Most of the time this is completely OK, because all Save methods usually need to access the same resource. Parallelizing them gives no real advantage, because that resource is the bottleneck.
How to use multiple threads
You can execute a task on a thread-pool thread by using
Task.Run(() => SaveAsync(item));
This will execute the method on a thread taken from the thread pool, but there is no way to start on a new thread and finish the method on the UI thread.
To execute all items on different thread, you can use nearly the same code as before:
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(Task.Run(() => SaveAsync(item))); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
The only difference here is that we take the task returned from Task.Run.
One remark: using Task.Run does not guarantee you a new thread. It will execute the task on the next free thread from the thread pool. How many threads are available depends on your settings and on the machine (e.g. a large server will have a lot more threads than a consumer laptop).
Whether you get a new thread or have to wait for an occupied thread to finish is completely up to the thread pool. (The thread pool usually does a really great job. For more info, here is a really great article on thread pool performance: CLR-Thread-Pool)
This is where people make most of their mistakes with async/await:
1) Either people think that everything after calling an async method, with or without awaiting, runs on a ThreadPool thread.
2) Or people think that async runs synchronously.
The truth is somewhere in between, and @Iqon's statement about the next block of code is actually incorrect: "The only main difference here, is all tasks are executed in the same thread."
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync(item)); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Making a statement like this would suggest that the async method SaveAsync(item) is actually able to execute fully and completely synchronously.
Here are examples:
async Task SaveAsync1(Item item)
{
//no awaiting at all
}
async Task SaveAsync2(Item item)
{
//awaiting already completed task
int i = await Task.FromResult(0);
}
Methods like these really would run synchronously on the thread that called them. But async methods of this kind are special snowflakes. No operation is actually awaited here, so everything is complete when the method returns, even though there is an await inside the second method: the first case does not await at all, and the second awaits an already completed task, so it continues synchronously after the await. These two calls would therefore behave the same:
var taskA = SaveAsync2(item); // returns a task that has already run to completion
// same here: the await won't suspend, as the returned task has already run to completion
await SaveAsync2(item);
So the statement that the async methods execute synchronously here is correct only in this special case:
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync2(item));
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
And there is no need to store the tasks and await Task.WhenAll(tasks); it is all already done, and this would be enough:
foreach(var item in items)
{
SaveAsync2(item);
//it will execute synchronously because there is
//nothing to await for in the method
}
Now let's explore the real case: an async method that actually awaits something inside or starts an awaitable operation:
async Task SaveAsync3(Item item)
{
//awaiting already completed task
int i = await Task.FromResult(0);
await Task.Delay(1000);
Console.WriteLine(i);
}
Now what would this do?
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync3(item));
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Would it run synchronously? No!
Would it run concurrently in parallel? NO!
As I said at the beginning, with async methods the truth is somewhere in between, unless they are special snowflakes like SaveAsync1 and SaveAsync2.
So what did the code do? It executed each SaveAsync3 synchronously up to await Task.Delay, where it found that the returned task was incomplete, returned to the caller, and handed back an incomplete task, which was stored in tasks. The next SaveAsync3 was then executed in the same way.
Now await Task.WhenAll(tasks); really has meaning, because it is awaiting incomplete operations which run outside this thread context and in parallel.
All the parts of the SaveAsync3 method after await Task.Delay will be scheduled to the ThreadPool and will run in parallel, except in special cases such as a UI thread context, where ConfigureAwait(false) after Task.Delay would be needed.
I hope you understand what I want to say. You cannot really say how an async method will run unless you have more information about it or its code.
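One way to see this for yourself is to record the managed thread id before and after the await. The sketch below is a variant of SaveAsync3 that returns both ids (the tuple return type and the RunAsync helper are my additions, not part of the code above); it assumes a console app with no synchronization context.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ContinuationThreadSketch
{
    // Variant of SaveAsync3 that reports where each half of the method ran.
    public static async Task<(int Before, int After)> SaveAsync3(int item)
    {
        int before = Environment.CurrentManagedThreadId; // synchronous part, caller's thread
        await Task.Delay(100);                           // incomplete task: control returns to the caller
        int after = Environment.CurrentManagedThreadId;  // continuation, wherever it was scheduled
        return (before, after);
    }

    // Starts all calls first, then returns a task that completes when all of them have.
    public static Task<(int Before, int After)[]> RunAsync(int count)
    {
        var tasks = Enumerable.Range(0, count).Select(i => SaveAsync3(i)).ToList();
        return Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var results = await RunAsync(5);
        for (int i = 0; i < results.Length; i++)
            Console.WriteLine($"item {i}: before={results[i].Before}, after={results[i].After}");
    }
}
```

The "before" ids are all the same, because the synchronous parts run one after another on the calling thread. The "after" ids depend on how the thread pool schedules the continuations and can differ from run to run.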
This exercise also raises the question of when to use Task.Run on an async method.
It is often misused, and I think there are really just two main cases:
1) When you want to break away from the current thread's context, like UI, ASP.NET, etc.
2) When the async method has a synchronous part (up to the first incomplete await) which is computationally intensive and you want to offload it as well, not just the part after the incomplete await. This would be the case if SaveAsync3 spent a long time computing the variable i, let's say with Fibonacci :).
For example, you do not have to use Task.Run on something like SaveAsync, which would open a file and save something into it asynchronously, unless the synchronous part before the first await inside SaveAsync is an issue because it takes time. Then Task.Run is in order, as this falls under case 2).
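A rough sketch of case 2), with a hypothetical ComputeThenSaveAsync whose synchronous part is a Fibonacci loop and whose "save" is simulated with Task.Delay:

```csharp
using System;
using System.Threading.Tasks;

public static class OffloadSketch
{
    // Hypothetical async method with a CPU-bound synchronous part before its first await.
    public static async Task<long> ComputeThenSaveAsync(int n)
    {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++)
            (a, b) = (b, a + b);  // synchronous Fibonacci loop
        await Task.Delay(10);     // stands in for the asynchronous save
        return a;
    }

    public static async Task Main()
    {
        // Called directly, the loop would run on the calling thread before a task is returned.
        // Wrapped in Task.Run, the whole method, loop included, runs on the thread pool.
        Task<long> task = Task.Run(() => ComputeThenSaveAsync(50));
        Console.WriteLine("caller is free while the work runs");
        Console.WriteLine(await task);
    }
}
```

With a loop this short the difference is not measurable; the wrapper only pays off when the synchronous part is long enough to stall the caller, for example a UI thread.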

Adding items to a list from asynchronous method

I need to return items from partitions in service fabric and add them to a list. The results come from async methods. I try to understand what is happening to get it to run faster. Does the loop wait until await is returned for each GetItems, or does the loop continue and start a new GetItems for the next partition?
List<string> mylist = new List<string>();
foreach(var partition in partitions)
{
var int64RangePartitionInformation = partition.PartitionInformation as Int64RangePartitionInformation;
if (int64RangePartitionInformation == null) continue;
var minKey = int64RangePartitionInformation.LowKey;
var assetclient = ServiceProxy.Create<IService>(serviceName, new ServicePartitionKey(minKey));
var items = await assetclient.GetItems(CancellationToken.None);
mylist.AddRange(items);
}
The await keyword tells the compiler to rewrite your method as a state machine. Once your async method is called, everything before the first await is executed. The rest of the method is registered as a continuation, and depending on the current configuration it can be executed immediately and synchronously or later on another thread.
If the awaited task has not yet completed, the method call returns, so the thread is free to do anything else, e.g. refresh the UI. When the awaited task finishes, the state machine is invoked again and its state advances, so the code after the awaited line is executed, and so on.
So logically, yes, the loop waits until the results are there, but in reality the thread is not blocked, because of the state-machine behavior described above.
See a detailed explanation here.
Update:
If the results can be obtained in parallel from all partitions, do not await them one by one. Instead, use Parallel.ForEach or just populate a Task collection in your foreach loop, and finally await them in one step:
await Task.WhenAll(myTasks);
await is an "asynchronous wait". It pauses the current method and (asynchronously) waits for that operation to complete before continuing.
Note that the method is paused, not the thread. That's what makes await an "asynchronous wait" instead of a regular wait ("synchronous wait").
For more info, see my async/await intro.
Does the loop wait until await is returned for each GetItems, or does the loop continue and start a new GetItems for the next partition?
At each await, the loop is (asynchronously) waiting. So with the current code, only one call to the service is being done at a time, and the list does not have to deal with concurrent AddRange calls.
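If the calls are independent and you want them in flight at the same time, one option is to start them all and merge the results after a single await. The sketch below does not use Service Fabric: GetItemsAsync is a hypothetical stand-in for assetclient.GetItems, and the partition keys are plain long values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PartitionSketch
{
    // Hypothetical stand-in for assetclient.GetItems on one partition.
    public static async Task<List<string>> GetItemsAsync(long lowKey, CancellationToken token)
    {
        await Task.Delay(50, token); // simulated remote call
        return new List<string> { $"item-{lowKey}-a", $"item-{lowKey}-b" };
    }

    public static async Task<List<string>> GetAllItemsAsync(IEnumerable<long> lowKeys)
    {
        // Start one call per partition without awaiting inside the loop.
        List<Task<List<string>>> calls = lowKeys
            .Select(key => GetItemsAsync(key, CancellationToken.None))
            .ToList();
        // Await them together; the results array is in the same order as the calls.
        List<string>[] results = await Task.WhenAll(calls);
        // Merge afterwards in one place, so the list never sees concurrent AddRange calls.
        var mylist = new List<string>();
        foreach (var items in results)
            mylist.AddRange(items);
        return mylist;
    }

    public static async Task Main()
    {
        var all = await GetAllItemsAsync(new long[] { 0, 100, 200 });
        Console.WriteLine(string.Join(", ", all));
    }
}
```

Merging after Task.WhenAll keeps the property mentioned above: the list is only ever modified from one place at a time, even though the calls themselves overlap.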
