How to run multiple threads on ASP.NET MVC efficiently? - c#

I've a scheduled task in my ASP.NET MVC site that nightly sends notifications for the users.
I've been sending and awaiting notifications one by one, but it's taking an hour to complete now, so I thought I should send them in batches.
int counter = 0;
List<Task> tasks = new List<Task>();
foreach (var user in users)
{
tasks.Add(Task.Run(async () =>
{
await user.SendNotificationAsync();
counter++;
}));
if (tasks.Count >= 20)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
if(tasks.Any())
{
await Task.WhenAll(tasks);
tasks.Clear();
}
But I've read that creating multiple threads is not efficient on the servers. How should I run multiple instances of the method on the server?

Because you are not following the best practices on TPL, here's a rewrite on how you should do it:
List<Task> tasks = new List<Task>();
int counter = 0; // not sure what this is for
foreach (var user in users)
{
tasks.Add(user.SendNotificationAsync()); // do not create a wrapping task
counter++; // not sure about this either
// it won't ever be greater than 20
if (tasks.Count == 20)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
if (tasks.Any())
{
await Task.WhenAll(tasks);
tasks.Clear();
}
This is perfectly fine, also, because threads will be spawned and destroyed as soon as they are done.

Just to shed light on what Camilo meant by his example was that, in your example, you were creating a new Task to monitor your awaitable task. So, essentially, you were not only creating twice the number of tasks needed, you were also chaining them up - a Task to monitor a Task, where the middle task is just a proxy which will be picked up from the Threadpool just to monitor another task from the Threadpool.
As the user.SendNotificationAsync() is an awaitable task anyway, you can directly add it to the List<Task> - tasks and await directly on it.
Hence his example.

Related

How to synchronize the recurrent execution of three tasks that depend on each other?

I would like to ask expert developers in C#. I have three recurrent tasks that my program needs to do. Task 2 depends on task 1 and task 3 depends on task 2, but task 1 doesn't need to wait for the other two tasks to finish in order to start again (the program is continuously running). Since each task takes some time, I would like to run each task in one thread or a C# Task. Once task 1 finishes task 2 starts and task 1 starts again ... etc.
I'm not sure what is the best way to implement this. I hope someone can guide me on this.
One way to achieve this is using something called the the Task Parallel Library. This provides a set of classes that allow you to arrange your tasks into "blocks". You create a method that does A, B and C sequentially, then TPL will take care of running multiple invocations of that method simultaneously. Here's a small example:
async Task Main()
{
var actionBlock = new ActionBlock<int>(DoTasksAsync, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2 // This is the number of simultaneous executions of DoTasksAsync that will be run
};
await actionBlock.SendAsync(1);
await actionBlock.SendAsync(2);
actionBlock.Complete();
await actionBlock.Completion;
}
async Task DoTasksAsync(int input)
{
await DoTaskAAsync();
await DoTaskBAsync();
await DoTaskCAsync();
}
I would probably use some kind of queue pattern.
I am not sure what the requirements for if task 1 is threadsafe or not, so I will keep it simple:
Task 1 is always executing. As soon as it finished, it posts a message on some queue and starts over.
Task 2 is listening to the queue. Whenever a message is available, it starts working on it.
Whenever task 2 finishes working, it calls task 3, so that it can do it's work.
As one of the comments mentioned, you should probably be able to use async/await successfully in your code. Especially between task 2 and 3. Note that task 1 can be run in parallel to task 2 and 3, since it is not dependent on any of the other task.
You could use the ParallelLoop method below. This method starts an asynchronous workflow, where the three tasks are invoked in parallel to each other, but sequentially to themselves. So you don't need to add synchronization inside each task, unless some task produces global side-effects that are visible from some other task.
The tasks are invoked on the ThreadPool, with the Task.Run method.
/// <summary>
/// Invokes three actions repeatedly in parallel on the ThreadPool, with the
/// action2 depending on the action1, and the action3 depending on the action2.
/// Each action is invoked sequentially to itself.
/// </summary>
public static async Task ParallelLoop<TResult1, TResult2>(
Func<TResult1> action1,
Func<TResult1, TResult2> action2,
Action<TResult2> action3,
CancellationToken cancellationToken = default)
{
// Arguments validation omitted
var task1 = Task.FromResult<TResult1>(default);
var task2 = Task.FromResult<TResult2>(default);
var task3 = Task.CompletedTask;
try
{
int counter = 0;
while (true)
{
counter++;
var result1 = await task1.ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();
task1 = Task.Run(action1); // Restart the task1
if (counter <= 1) continue; // In the first loop result1 is undefined
var result2 = await task2.ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();
task2 = Task.Run(() => action2(result1)); // Restart the task2
if (counter <= 2) continue; // In the second loop result2 is undefined
await task3.ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();
task3 = Task.Run(() => action3(result2)); // Restart the task3
}
}
finally
{
// Prevent fire-and-forget
Task allTasks = Task.WhenAll(task1, task2, task3);
try { await allTasks.ConfigureAwait(false); } catch { allTasks.Wait(); }
// Propagate all errors in an AggregateException
}
}
There is an obvious pattern in the implementation, that makes it trivial to add overloads having more than three actions. Each added action will require its own generic type parameter (TResult3, TResult4 etc).
Usage example:
var cts = new CancellationTokenSource();
Task loopTask = ParallelLoop(() =>
{
// First task
Thread.Sleep(1000); // Simulates synchronous work
return "OK"; // The result that is passed to the second task
}, result =>
{
// Second task
Thread.Sleep(1000); // Simulates synchronous work
return result + "!"; // The result that is passed to the third task
}, result =>
{
// Third task
Thread.Sleep(1000); // Simulates synchronous work
}, cts.Token);
In case any of the tasks fails, the whole loop will stop (with the loopTask.Exception containing the error). Since the tasks depend on each other, recovering from a single failed task is not possible¹. What you could do is to execute the whole loop through a Polly Retry policy, to make sure that the loop will be reincarnated in case of failure. If you are unfamiliar with the Polly library, you could use the simple and featureless RetryUntilCanceled method below:
public static async Task RetryUntilCanceled(Func<Task> action,
CancellationToken cancellationToken)
{
while (true)
{
cancellationToken.ThrowIfCancellationRequested();
try { await action().ConfigureAwait(false); }
catch { if (cancellationToken.IsCancellationRequested) throw; }
}
}
Usage:
Task loopTask = RetryUntilCanceled(() => ParallelLoop(() =>
{
//...
}, cts.Token), cts.Token);
Before exiting the process you are advised to Cancel() the CancellationTokenSource and Wait() (or await) the loopTask, in order for the loop to terminate gracefully. Otherwise some tasks may be aborted in the middle of their work.
¹ It is actually possible, and probably preferable, to execute each individual task through a Polly Retry policy. The parallel loop will be suspended until the failed task is retried successfully.

Task.delay inside Task.Run

Here I have the following piece of code:
var tasks = new List<Task>();
var stopwatch = new Stopwatch();
for (var i = 0; i < 100; i++)
{
var person = new Person { Id = i };
list.Add(person);
}
stopwatch.Start();
foreach (var item in list)
{
var task = Task.Run(async () =>
{
await Task.Delay(1000);
Console.WriteLine("Hi");
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
stopwatch.Stop();
I assume that I will have about 100 seconds in the result of the stopwatch.
But I have 1,1092223.
I think I missing something, can you help me to explain why?
I assume that your confusion might come from the await keyword in await Task.Delay(1000);
But this holds only for the innerworking of the taskmethod. Inside the loop the next iteration will be performed immidiately after Task.Run is executed. So all Tasks will be started in close succession and then run in parallel. (As far as the system has free threads at hand of course) The system takes care how, when and in which order they can be executed.
In the end in this line:
await Task.WhenAll(tasks);
you actually wait for the slowest of them (or the one started as last).
To fullfill your expectation your code should actually look like this:
public async Task RunAsPseudoParallel()
{
List<Person> list = new List<Person>();
var stopwatch = new Stopwatch();
for (var i = 0; i < 100; i++)
{
var person = new Person { Id = i };
list.Add(person);
}
stopwatch.Start();
foreach (var item in list)
{
await Task.Run(async () =>
{
await Task.Delay(1000);
Console.WriteLine("Hi");
});
}
stopwatch.Stop()
}
Disclaimer: But this code is quite nonsensical, because it uses async functionality to implement a synchronous process. In this scenario you can simply leave out the Task.Run call and use a simple Thread.Sleep(1000).
Delays are always approximate.
You are limited by when the task scheduler chooses to run the delegate you pass to Task.Run. It may be executing other tasks and be unwilling to start up more threads. Or, it may launch a new thread -- which while not slow is also not free and costs time too.
You are limited by when the task scheduler chooses to resume your code after the delay completes.
You're also limited by the OS scheduler, which may be allocating CPU time to other processes/threads and end up delaying the thread that would execute your code.
Because you are launching multiple tasks, you are seeing all of these per-task variables compound into an even larger delay.

How to limit number of async IO tasks to database?

I have a list of id's and I want to get data for each of those id in parallel from database. My below ExecuteAsync method is called at very high throughput and for each request we have around 500 ids for which I need to extract data.
So I have got below code where I am looping around list of ids and making async calls for each of those id in parallel and it works fine.
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
Func<CancellationToken, int, Task<T>> mapper) where T : class
{
var tasks = new List<Task<T>>(ids.Count);
// invoking multiple id in parallel to get data for each id from database
for (int i = 0; i < ids.Count; i++)
{
tasks.Add(Execute(policy, ct => mapper(ct, ids[i])));
}
// wait for all id response to come back
var responses = await Task.WhenAll(tasks);
var excludeNull = new List<T>(ids.Count);
for (int i = 0; i < responses.Length; i++)
{
var response = responses[i];
if (response != null)
{
excludeNull.Add(response);
}
}
return excludeNull;
}
private async Task<T> Execute<T>(IPollyPolicy policy,
Func<CancellationToken, Task<T>> requestExecuter) where T : class
{
var response = await policy.Policy.ExecuteAndCaptureAsync(
ct => requestExecuter(ct), CancellationToken.None);
if (response.Outcome == OutcomeType.Failure)
{
if (response.FinalException != null)
{
// log error
throw response.FinalException;
}
}
return response?.Result;
}
Question:
Now as you can see I am looping all ids and making bunch of async calls to database in parallel for each id which can put lot of load on database (depending on how many request is coming). So I want to limit the number of async calls we are making to database. I modified ExecuteAsync to use Semaphore as shown below but it doesn't look like it does what I want it to do:
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
Func<CancellationToken, int, Task<T>> mapper) where T : class
{
var throttler = new SemaphoreSlim(250);
var tasks = new List<Task<T>>(ids.Count);
// invoking multiple id in parallel to get data for each id from database
for (int i = 0; i < ids.Count; i++)
{
await throttler.WaitAsync().ConfigureAwait(false);
try
{
tasks.Add(Execute(policy, ct => mapper(ct, ids[i])));
}
finally
{
throttler.Release();
}
}
// wait for all id response to come back
var responses = await Task.WhenAll(tasks);
// same excludeNull code check here
return excludeNull;
}
Does Semaphore works on Threads or Tasks? Reading it here looks like Semaphore is for Threads and SemaphoreSlim is for tasks.
Is this correct? If yes then what's the best way to fix this and limit the number of async IO tasks we make to database here.
Task is an abstraction on threads, and doesn’t necessarily create a new thread. Semaphore limits the number of threads that can access that for loop. Execute returns a Task which aren’t threads. If there’s only 1 request, there will be only 1 thread inside that for loop, even if it is asking for 500 ids. The 1 thread sends off all the async IO tasks itself.
Sort of. I would not say that tasks are related to threads at all. There are actually two kinds of tasks: a delegate task (which is kind of an abstraction of a thread), and a promise task (which has nothing to do with threads).
Regarding the SemaphoreSlim, it does limit the concurrency of a block of code (not threads).
I recently started playing with C# so my understanding is not right looks like w.r.t Threads and Tasks.
I recommend reading my async intro and best practices. Follow up with There Is No Thread if you're interested more about how threads aren't really involved.
I modified ExecuteAsync to use Semaphore as shown below but it doesn't look like it does what I want it to do
The current code is only throttling the adding of the tasks to the list, which is only done one at a time anyway. What you want to do is throttle the execution itself:
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy, Func<CancellationToken, int, Task<T>> mapper) where T : class
{
var throttler = new SemaphoreSlim(250);
var tasks = new List<Task<T>>(ids.Count);
// invoking multiple id in parallel to get data for each id from database
for (int i = 0; i < ids.Count; i++)
tasks.Add(ThrottledExecute(ids[i]));
// wait for all id response to come back
var responses = await Task.WhenAll(tasks);
// same excludeNull code check here
return excludeNull;
async Task<T> ThrottledExecute(int id)
{
await throttler.WaitAsync().ConfigureAwait(false);
try {
return await Execute(policy, ct => mapper(ct, id)).ConfigureAwait(false);
} finally {
throttler.Release();
}
}
}
Your colleague has probably in mind the Semaphore class, which is indeed a thread-centric throttler, with no asynchronous capabilities.
Limits the number of threads that can access a resource or pool of resources concurrently.
The SemaphoreSlim class is a lightweight alternative to Semaphore, which includes the asynchronous method WaitAsync, that makes all the difference in the world. The WaitAsync doesn't block a thread, it blocks an asynchronous workflow. Asynchronous workflows are cheap (usually less than 1000 bytes each). You can have millions of them "running" concurrently at any given moment. This is not the case with threads, because of the 1 MB of memory that each thread reserves for its stack.
As for the ExecuteAsync method, here is how you could refactor it by using the LINQ methods Select, Where, ToArray and ToList:
Update: The Polly library supports capturing and continuing on the current synchronization context, so I added a bool executeOnCurrentContext
argument to the API. I also renamed the asynchronous Execute method to ExecuteAsync, to be in par with the guidelines.
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
Func<CancellationToken, int, Task<T>> mapper,
int concurrencyLevel = 1, bool executeOnCurrentContext = false) where T : class
{
var throttler = new SemaphoreSlim(concurrencyLevel);
Task<T>[] tasks = ids.Select(async id =>
{
await throttler.WaitAsync().ConfigureAwait(executeOnCurrentContext);
try
{
return await ExecuteAsync(policy, ct => mapper(ct, id),
executeOnCurrentContext).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
}).ToArray();
T[] results = await Task.WhenAll(tasks).ConfigureAwait(false);
return results.Where(r => r != null).ToList();
}
private async Task<T> ExecuteAsync<T>(IPollyPolicy policy,
Func<CancellationToken, Task<T>> function,
bool executeOnCurrentContext = false) where T : class
{
var response = await policy.Policy.ExecuteAndCaptureAsync(
ct => executeOnCurrentContext ? function(ct) : Task.Run(() => function(ct)),
CancellationToken.None, continueOnCapturedContext: executeOnCurrentContext)
.ConfigureAwait(executeOnCurrentContext);
if (response.Outcome == OutcomeType.Failure)
{
if (response.FinalException != null)
{
ExceptionDispatchInfo.Throw(response.FinalException);
}
}
return response?.Result;
}
You are throttling the rate at which you add tasks to the list. You are not throttling the rate at which tasks are executed. To do that, you'd probably have to implement your semaphore calls inside the Execute method itself.
If you can't modify Execute, another way to do it is to poll for completed tasks, sort of like this:
for (int i = 0; i < ids.Count; i++)
{
var pendingCount = tasks.Count( t => !t.IsCompleted );
while (pendingCount >= 500) await Task.Yield();
tasks.Add(Execute(policy, ct => mapper(ct, ids[i])));
}
await Task.WhenAll( tasks );
Actually the TPL is capable to control the task execution and limit the concurrency. You can test how many parallel tasks is suitable for your use-case. No need to think about threads, TPL will manage everything fine for you.
To use limited concurrency see this answer, credits to #panagiotis-kanavos
.Net TPL: Limited Concurrency Level Task scheduler with task priority?
The example code is (even using different priorities, you can strip that):
QueuedTaskScheduler qts = new QueuedTaskScheduler(TaskScheduler.Default,4);
TaskScheduler pri0 = qts.ActivateNewQueue(priority: 0);
TaskScheduler pri1 = qts.ActivateNewQueue(priority: 1);
Task.Factory.StartNew(()=>{ },
CancellationToken.None,
TaskCreationOptions.None,
pri0);
Just throw all your tasks to the queue and with Task.WhenAll you can wait till everything is done.

Multiple Async Calls with Pause Between Calls

I have an IEnumerable<Task>, where each Task will call the same endpoint. However, the endpoint can only handle so many calls per second. How can I put, say, a half second delay between each call?
I have tried adding Task.Delay(), but of course awaiting them simply means that the app waits a half second before sending all the calls at once.
Here is a code snippet:
var resultTasks = orders
.Select(async task =>
{
var result = new VendorTaskResult();
try
{
result.Response = await result.CallVendorAsync();
}
catch(Exception ex)
{
result.Exception = ex;
}
return result;
} );
var results = Task.WhenAll(resultTasks);
I feel like I should do something like
Task.WhenAll(resultTasks.EmitOverTime(500));
... but how exactly do I do that?
What you describe in your question is in other words rate limiting. You'd like to apply rate limiting policy to your client, because the API you use enforces such a policy on the server to protect itself from abuse.
While you could implement rate limiting yourself, I'd recommend you to go with some well established solution. Rate Limiter from Davis Desmaisons was the one that I picked at random and I instantly liked it. It had solid documentation, superior coverage and was easy to use. It is also available as NuGet package.
Check out the simple snippet below that demonstrates running semi-overlapping tasks in sequence while defering the task start by half a second after the immediately preceding task started. Each task lasts at least 750 ms.
using ComposableAsync;
using RateLimiter;
using System;
using System.Threading.Tasks;
namespace RateLimiterTest
{
class Program
{
static void Main(string[] args)
{
Log("Starting tasks ...");
var constraint = TimeLimiter.GetFromMaxCountByInterval(1, TimeSpan.FromSeconds(0.5));
var tasks = new[]
{
DoWorkAsync("Task1", constraint),
DoWorkAsync("Task2", constraint),
DoWorkAsync("Task3", constraint),
DoWorkAsync("Task4", constraint)
};
Task.WaitAll(tasks);
Log("All tasks finished.");
Console.ReadLine();
}
static void Log(string message)
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff ") + message);
}
static async Task DoWorkAsync(string name, IDispatcher constraint)
{
await constraint;
Log(name + " started");
await Task.Delay(750);
Log(name + " finished");
}
}
}
Sample output:
10:03:27.121 Starting tasks ...
10:03:27.154 Task1 started
10:03:27.658 Task2 started
10:03:27.911 Task1 finished
10:03:28.160 Task3 started
10:03:28.410 Task2 finished
10:03:28.680 Task4 started
10:03:28.913 Task3 finished
10:03:29.443 Task4 finished
10:03:29.443 All tasks finished.
If you change the constraint to allow maximum two tasks per second (var constraint = TimeLimiter.GetFromMaxCountByInterval(2, TimeSpan.FromSeconds(1));), which is not the same as one per half a second, then the output could be like:
10:06:03.237 Starting tasks ...
10:06:03.264 Task1 started
10:06:03.268 Task2 started
10:06:04.026 Task2 finished
10:06:04.031 Task1 finished
10:06:04.275 Task3 started
10:06:04.276 Task4 started
10:06:05.032 Task4 finished
10:06:05.032 Task3 finished
10:06:05.033 All tasks finished.
Note that the current version of Rate Limiter targets .NETFramework 4.7.2+ or .NETStandard 2.0+.
This is just a thought, but another approach could be to create a queue and add another thread that runs polling the queue for calls that need to go out to your endpoint.
Have you considered just turning that into a foreach-loop with a Task.Delay call? You seem to want to explicitly call them sequentially, it won't hurt if that is obvious from your code.
var results = new List<YourResultType>;
foreach(var order in orders){
var result = new VendorTaskResult();
try
{
result.Response = await result.CallVendorAsync();
results.Add(result.Response);
}
catch(Exception ex)
{
result.Exception = ex;
}
}
Instead of selecting from orders you could loop over them, and inside the loop put the result into a list and then call Task.WhenAll.
Would look something like:
var resultTasks = new List<VendorTaskResult>(orders.Count);
orders.ToList().ForEach( item => {
var result = new VendorTaskResult();
try
{
result.Response = await result.CallVendorAsync();
}
catch(Exception ex)
{
result.Exception = ex;
}
resultTasks.Add(result);
Thread.Sleep(x);
});
var results = Task.WhenAll(resultTasks);
If you want to control the number of requests executed simultaneously, you have to use a semaphore.
I have something very similar, and it works fine with me. Please note that I call ToArray() after the Linq query finishes, that triggers the tasks:
using (HttpClient client = new HttpClient()) {
IEnumerable<Task<string>> _downloads = _group
.Select(job => {
await Task.Delay(300);
return client.GetStringAsync(<url with variable job>);
});
Task<string>[] _downloadTasks = _downloads.ToArray();
_pages = await Task.WhenAll(_downloadTasks);
}
Now please note that this will create n nunmber of tasks, all in parallel, and the Task.Delay literally does nothing. If you want to call the pages synchronously (as it sounds by putting a delay between the calls), then this code may be better:
using (HttpClient client = new HttpClient()) {
foreach (string job in _group) {
await Task.Delay(300);
_pages.Add(await client.GetStringAsync(<url with variable job>));
}
}
The download of the pages is still asynchronous (while downloading other tasks are done), but each call to download the page is synchronous, ensuring that you can wait for one to finish in order to call the next one.
The code can be easily changed to call the pages asynchronously in chunks, like every 10 pages, wait 300ms, like in this sample:
IEnumerable<string[]> toParse = myData
.Select((v, i) => new { v.code, group = i / 20 })
.GroupBy(x => x.group)
.Select(g => g.Select(x => x.code).ToArray());
using (HttpClient client = new HttpClient()) {
foreach (string[] _group in toParse) {
string[] _pages = null;
IEnumerable<Task<string>> _downloads = _group
.Select(job => {
return client.GetStringAsync(<url with job>);
});
Task<string>[] _downloadTasks = _downloads.ToArray();
_pages = await Task.WhenAll(_downloadTasks);
await Task.Delay(5000);
}
}
All this does is group your pages in chunks of 20, iterate through the chunks, download all pages of the chunk asynchronously, wait 5 seconds, move on to the next chunk.
I hope that is what you were waiting for :)
The proposed method EmitOverTime is doable, but only by blocking the current thread:
public static IEnumerable<Task<TResult>> EmitOverTime<TResult>(
this IEnumerable<Task<TResult>> tasks, int delay)
{
foreach (var item in tasks)
{
Thread.Sleep(delay); // Delay by blocking
yield return item;
}
}
Usage:
var results = await Task.WhenAll(resultTasks.EmitOverTime(500));
Probably better is to create a variant of Task.WhenAll that accepts a delay argument, and delays asyncronously:
public static async Task<TResult[]> WhenAllWithDelay<TResult>(
IEnumerable<Task<TResult>> tasks, int delay)
{
var tasksList = new List<Task<TResult>>();
foreach (var task in tasks)
{
await Task.Delay(delay).ConfigureAwait(false);
tasksList.Add(task);
}
return await Task.WhenAll(tasksList).ConfigureAwait(false);
}
Usage:
var results = await WhenAllWithDelay(resultTasks, 500);
This design implies that the enumerable of tasks should be enumerated only once. It is easy to forget this during development, and start enumerating it again, spawning a new set of tasks. For this reason I propose to make it an OnlyOnce enumerable, as it is shown in this question.
Update: I should mention why the above methods work, and under what premise. The premise is that the supplied IEnumerable<Task<TResult>> is deferred, in other words non-materialized. At the method's start there are no tasks created yet. The tasks are created one after the other during the enumeration of the enumerable, and the trick is that the enumeration is slow and controlled. The delay inside the loop ensures that the tasks are not created all at once. They are created hot (in other words already started), so at the time the last task has been created some of the first tasks may have already been completed. The materialized list of half-running/half-completed tasks is then passed to Task.WhenAll, that waits for all to complete asynchronously.

Does async/await inside a loop create a bottleneck?

Lets say i have the following code for example:
private async Task ManageClients()
{
for (int i =0; i < listClients.Count; i++)
{
if (list[i] == 0)
await DoSomethingWithClientAsync();
else
await DoOtherThingAsync();
}
DoOtherWork();
}
My questions are:
1. Will the for() continue and process other clients on the list?, or it
will await untill it finishes one of the tasks.
2. Is even a good practice to use async/await inside a loop?
3. Can it be done in a better way?
I know it was a really simple example, but I'm trying to imagine what would happen if that code was a server with thousands of clients.
In your code example, the code will "block" when the loop reaches await, meaning the other clients will not be processed until the first one is complete. That is because, while the code uses asynchronous calls, it was written using a synchronous logic mindset.
An asynchronous approach should look more like this:
private async Task ManageClients()
{
var tasks = listClients.Select( client => DoSomethingWithClient() );
await Task.WhenAll(tasks);
DoOtherWork();
}
Notice there is only one await, which simultaneously awaits all of the clients, and allows them to complete in any order. This avoids the situation where the loop is blocked waiting for the first client.
If a thread that is executing an async function is calling another async function, this other function is executed as if it was not async until it sees a call to a third async function. This third async function is executed also as if it was not async.
This goes on, until the thread sees an await.
Instead of really doing nothing, the thread goes up the call stack, to see if the caller was not awaiting for the result of the called function. If not, the thread continues the statements in the caller function until it sees an await. The thread goes up the call stack again to see if it can continue there.
This can be seen in the following code:
var taskDoSomething = DoSomethingAsync(...);
// because we are not awaiting, the following is done as soon as DoSomethingAsync has to await:
DoSomethingElse();
// from here we need the result from DoSomethingAsync. await for it:
var someResult = await taskDoSomething;
You can even call several sub-procedures without awaiting:
var taskDoSomething = DoSomethingAsync(...);
var taskDoSomethingElse = DoSomethingElseAsync(...);
// we are here both tasks are awaiting
DoSomethingElse();
Once you need the results of the tasks, if depends what you want to do with them. Can you continue processing if one task is completed but the other is not?
var someResult = await taskDoSomething;
ProcessResult(someResult);
var someOtherResult = await taskDoSomethingelse;
ProcessBothResults(someResult, someOtherResult);
If you need the result of all tasks before you can continue, use Task.WhenAll:
Task[] allTasks = new Task[] {taskDoSomething, taskDoSomethingElse);
await Task.WhenAll(allTasks);
var someResult = taskDoSomething.Result;
var someOtherResult = taskDoSomethingElse.Result;
ProcessBothResults(someResult, someOtherResult);
Back to your question
If you have a sequence of items where you need to start awaitable tasks, it depends on whether the tasks need the result of other tasks or not. In other words can task[2] start if task[1] has not been completed yet? Do Task[1] and Task[2] interfere with each other if they run both at the same time?
If they are independent, then start all Tasks without awaiting. Then use Task.WhenAll to wait until all are finished. The Task scheduler will take care that not to many tasks will be started at the same time. Be aware though, that starting several tasks could lead to deadlocks. Check carefully if you need critical sections
var clientTasks = new List<Task>();
foreach(var client in clients)
{
if (list[i] == 0)
clientTasks.Add(DoSomethingWithClientAsync());
else
clientTasks.Add(DoOtherThingAsync());
}
// if here: all tasks started. If desired you can do other things:
AndNowForSomethingCompletelyDifferent();
// later we need the other tasks to be finished:
var taskWaitAll = Task.WhenAll(clientTasks);
// did you notice we still did not await yet, we are still in business:
MontyPython();
// okay, done with frolicking, we need the results:
await taskWaitAll;
DoOtherWork();
This was the scenario where all Tasks where independent: no task needed the other to be completed before it could start. However if you need Task[2] to be completed before you can start Task[3] you should await:
foreach(var client in clients)
{
if (list[i] == 0)
await DoSomethingWithClientAsync());
else
await DoOtherThingAsync();
}

Categories