Adding items to a list from asynchronous method - c#

I need to return items from partitions in Service Fabric and add them to a list. The results come from async methods. I'm trying to understand what is happening so I can get it to run faster. Does the loop wait until await is returned for each GetItems, or does the loop continue and start a new GetItems for the next partition?
List<string> mylist = new List<string>();
foreach (var partition in partitions)
{
    var int64RangePartitionInformation = partition.PartitionInformation as Int64RangePartitionInformation;
    if (int64RangePartitionInformation == null) continue;
    var minKey = int64RangePartitionInformation.LowKey;
    var assetclient = ServiceProxy.Create<IService>(serviceName, new ServicePartitionKey(minKey));
    var items = await assetclient.GetItems(CancellationToken.None);
    mylist.AddRange(items);
}

The await keyword tells the compiler to rewrite your method as a state machine. Once your async method is called, everything before the first await is executed synchronously. If the awaited task has already completed, execution simply continues; otherwise the rest of the method is registered as a continuation, which, depending on the current context, runs later on the original context or on another thread.
If the awaited task is still running, the method call returns to its caller, so the thread is free to do anything else, e.g. refresh the UI. The state machine does not poll the task: when the awaited task finishes, it invokes the registered continuation, the state advances, and the code after the awaited line is executed, and so on.
So logically yes, the loop waits until the results are there, but in reality the thread is not blocked because of the state machine-like behavior mentioned above.
See a detailed explanation here.
Update:
If the results can be obtained from all partitions in parallel, do not await them one by one. Instead, populate a Task collection in your foreach loop and await them all in one step (Parallel.ForEach is meant for synchronous, CPU-bound work and does not await async delegates):
await Task.WhenAll(myTasks);
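A minimal sketch of the "collect the tasks, then await once" approach applied to the partition loop. FetchItemsAsync is a hypothetical stand-in for the GetItems proxy call from the question, and the partition keys are plain numbers here; only Task.WhenAll and List<T>.AddRange are real API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionFanOut
{
    // Hypothetical stand-in for assetclient.GetItems(...): simulates a remote call.
    public static async Task<List<string>> FetchItemsAsync(long partitionKey)
    {
        await Task.Delay(100); // pretend network latency
        return new List<string> { $"item-{partitionKey}-a", $"item-{partitionKey}-b" };
    }

    public static async Task<List<string>> CollectAllAsync(IEnumerable<long> partitionKeys)
    {
        // Start every call first; nothing is awaited inside this projection.
        List<Task<List<string>>> tasks =
            partitionKeys.Select(key => FetchItemsAsync(key)).ToList();

        // Await all of them in one step. Results are returned in task order.
        List<string>[] results = await Task.WhenAll(tasks);

        // Merge afterwards, so the list never sees concurrent AddRange calls.
        var merged = new List<string>();
        foreach (var items in results)
            merged.AddRange(items);
        return merged;
    }

    public static async Task Main()
    {
        var all = await CollectAllAsync(new long[] { 0, 1, 2 });
        Console.WriteLine(string.Join(", ", all));
    }
}
```

Because the merge happens after Task.WhenAll, the plain List<string> from the question can be kept; a concurrent collection would only be needed if each task added to the shared list itself.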

await is an "asynchronous wait". It pauses the current method and (asynchronously) waits for that operation to complete before continuing.
Note that the method is paused, not the thread. That's what makes await an "asynchronous wait" instead of a regular wait ("synchronous wait").
For more info, see my async/await intro.
Does the loop wait until await is returned for each GetItems, or does the loop continue and start a new GetItems for the next partition?
At each await, the loop is (asynchronously) waiting. So with the current code, only one call to the service is being done at a time, and the list does not have to deal with concurrent AddRange calls.

Related

c#, multiple async task execute [duplicate]

In terms of performance, will these 2 methods run GetAllWidgets() and GetAllFoos() in parallel?
Is there any reason to use one over the other? There seems to be a lot happening behind the scenes with the compiler so I don't find it clear.
============= MethodA: Using multiple awaits ======================
public async Task<IHttpActionResult> MethodA()
{
var customer = new Customer();
customer.Widgets = await _widgetService.GetAllWidgets();
customer.Foos = await _fooService.GetAllFoos();
return Ok(customer);
}
=============== MethodB: Using Task.WaitAll =====================
public async Task<IHttpActionResult> MethodB()
{
var customer = new Customer();
var getAllWidgetsTask = _widgetService.GetAllWidgets();
var getAllFoosTask = _fooService.GetAllFoos();
Task.WaitAll(getAllWidgetsTask, getAllFoosTask);
customer.Widgets = getAllWidgetsTask.Result;
customer.Foos = getAllFoosTask.Result;
return Ok(customer);
}
=====================================
The first option will not execute the two operations concurrently. It will execute the first and await its completion, and only then the second.
The second option will execute both concurrently but will wait for them synchronously (i.e. while blocking a thread).
You shouldn't use either option, since the first completes more slowly than the second and the second blocks a thread without need.
You should wait for both operations asynchronously with Task.WhenAll:
public async Task<IHttpActionResult> MethodB()
{
var customer = new Customer();
var getAllWidgetsTask = _widgetService.GetAllWidgets();
var getAllFoosTask = _fooService.GetAllFoos();
await Task.WhenAll(getAllWidgetsTask, getAllFoosTask);
customer.Widgets = await getAllWidgetsTask;
customer.Foos = await getAllFoosTask;
return Ok(customer);
}
Note that once Task.WhenAll has completed, both tasks have already completed, so awaiting them returns immediately.
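To make the difference visible, here is a small sketch that logs when each operation starts and ends. FakeCallAsync is a hypothetical placeholder for GetAllWidgets and GetAllFoos, and the delays are arbitrary; the point is only the order of the log entries.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitOrderDemo
{
    // Hypothetical placeholder for a service call; appends start/end markers to the log.
    private static async Task<string> FakeCallAsync(string name, int delayMs, List<string> log)
    {
        lock (log) log.Add($"{name} start");
        await Task.Delay(delayMs);
        lock (log) log.Add($"{name} end");
        return name;
    }

    // Like MethodA: the second call is not even started until the first has finished.
    public static async Task<List<string>> SequentialAsync()
    {
        var log = new List<string>();
        await FakeCallAsync("widgets", 50, log);
        await FakeCallAsync("foos", 50, log);
        return log;
    }

    // Like the Task.WhenAll version: both calls are started before anything is awaited.
    public static async Task<List<string>> ConcurrentAsync()
    {
        var log = new List<string>();
        Task<string> widgets = FakeCallAsync("widgets", 50, log);
        Task<string> foos = FakeCallAsync("foos", 50, log);
        await Task.WhenAll(widgets, foos);
        return log;
    }

    public static async Task Main()
    {
        Console.WriteLine(string.Join(" | ", await SequentialAsync()));
        Console.WriteLine(string.Join(" | ", await ConcurrentAsync()));
    }
}
```

In the sequential log, "widgets end" always precedes "foos start". In the concurrent log, both "start" entries come first; the order of the two "end" entries is not guaranteed.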
Short answer: No.
Task.WaitAll is blocking, whereas await, when it encounters a task that has not completed yet, returns an incomplete task to the caller and registers the remaining part of the function as a continuation.
The "bulk" waiting method you were looking for is Task.WhenAll that actually creates a new Task that finishes when all tasks that were handed to the function are done.
Like so: await Task.WhenAll(getAllWidgetsTask, getAllFoosTask);
That is for the blocking matter.
Also, your first function does not execute both functions in parallel. To get this working with await you'd have to write something like this:
var widgetsTask = _widgetService.GetAllWidgets();
var foosTask = _fooService.GetAllFoos();
customer.Widgets = await widgetsTask;
customer.Foos = await foosTask;
This will make the first example to act very similar to the Task.WhenAll method.
As an addition to what @i3arnon said: you will see that when you use await you are forced to declare the enclosing method as async, but with WaitAll you are not. That should tell you that there is more to it than what the main answer says. Here it is:
WaitAll will block until the given tasks finish; it does not pass control back to the caller while those tasks are running. Also, as mentioned, the tasks run asynchronously with respect to each other, not with respect to the caller.
Await will not block the calling thread. It will, however, suspend the execution of the code below it, and while the task is running, control is returned to the caller. Because control is returned to the caller (the called method is running asynchronously), you have to mark the method as async.
Hopefully the difference is clear. Cheers
Only your second option will run them in parallel. Your first will wait on each call in sequence.
As soon as you invoke the async method it will start executing. Whether it will execute on the current thread (and thus run synchronously) or it will run async is not possible to determine.
Thus, in your first example the first method will start doing work, but then you artificially stop the flow of the code with the await. And thus the second method will not be invoked before the first is done executing.
The second example invokes both methods without stopping the flow with an await. Thus they will potentially run in parallel if the methods are asynchronous.

How to execute an arbitrary number of async tasks in sequential order?

I have this function:
async Task RefreshProfileInfo(List<string> listOfPlayers)
// For each player in the listOfPlayers, checks an in-memory cache if we have an entry.
// If we have a cached entry, do nothing.
// If we don't have a cached entry, fetch from backend via an API call.
This function is called very frequently, like:
await RefreshProfileInfo(playerA, playerB, playerC)
or
await RefreshProfileInfo(playerB, playerC, playerD)
or
await RefreshProfileInfo(playerE, playerF)
Ideally, if the players do not overlap each other, the calls should not affect each other (requesting PlayerE and PlayerF should not block the request for PlayerA, PlayerB, PlayerC). However, if the players DO overlap each other, the second call should wait for the first (requesting PlayerB, PlayerC, PlayerD, should wait for PlayerA, PlayerB, PlayerC to finish).
However, if that isn't possible, at the very least I'd like all calls to be sequential. (I think they should still be async, so they don't block other unrelated parts of the code).
Currently, what happens is each RefreshProfileInfo runs in parallel, which results in hitting backend every time (8 times in this example).
Instead, I want to execute them sequentially, so that only the first call hits the backend, and subsequent calls just hit cache.
What data structure/approach should I use? I'm having trouble figuring out how to "connect" the separate calls to each other. I've been playing around with Task.WhenAll() as well as SemaphoreSlim, but I can't figure out how to use them properly.
Failed attempt
The idea behind my failed attempt was to have a helper class where I could call a function, SequentialRequest(Task), and it would sequentially run all tasks invoked in this manner.
List<Task> waitingTasks = new List<Task>();
object _lock = new object();
public async Task SequentialRequest(Task func)
{
var waitingTasksCopy = new List<Task>();
lock (_lock)
{
waitingTasksCopy = new List<Task>(waitingTasks);
waitingTasks.Add(func); // Add this task to the waitingTasks (for future SequentialRequests)
}
// Wait for everything before this to finish
if (waitingTasksCopy.Count > 0)
{
await Task.WhenAll(waitingTasksCopy);
}
// Run this task
await func;
}
I thought this would work, but "func" is either run instantly (instead of waiting for earlier tasks to finish), or never run at all, depending on how I call it.
If I call it using this, it runs instantly:
async Task testTask()
{
await Task.Delay(4000);
}
If I call it using this, it never runs:
Task testTask = new Task(async () =>
{
await Task.Delay(4000);
});
Here's why your current attempt doesn't work:
// Run this task
await func;
The comment above is not describing what the code is doing. In the asynchronous world, a Task represents some operation that is already in progress. Tasks are not "run" by using await; await is a way for the current code to "asynchronously wait" for a task to complete. So no function signature taking a Task is going to work; the task is already in progress before it's even passed to that function.
Your question is actually about caching asynchronous operations. One way to do this is to cache the Task<T> itself. Currently, your cache holds the results (T); you can change your cache to hold the asynchronous operations that retrieve those results (Task<T>). For example, if your current cache type is ConcurrentDictionary<PlayerId, Player>, you could change it to ConcurrentDictionary<PlayerId, Task<Player>>.
With a cache of tasks, when your code checks for a cache entry, it will find an existing entry if the player data is loaded or has started loading. Because the Task<T> represents some asynchronous operation that is already in progress (or has already completed).
A couple of notes for this approach:
This only works for in-memory caches.
Think about how you want to handle errors. A naive cache of Task<T> will also cache error results, which is usually not desired.
The second point above is the trickier part. When an error happens, you'd probably want some additional logic to remove the errored task from the cache. Bonus points (and additional complexity) if the error handling code prevents an errored task from getting into the cache in the first place.
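A minimal sketch of a cache of tasks, assuming string player IDs and string profiles for brevity. ProfileCache, FetchFromBackendAsync and BackendCalls are hypothetical names; wrapping the task in Lazy<T> is one way (not the only one) to make sure the backend call is started at most once per key, and the catch block evicts a failed entry so that errors are not cached.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class ProfileCache
{
    // The cache holds the asynchronous operations, not their results.
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _cache =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();
    private int _backendCalls;

    public int BackendCalls => _backendCalls;

    // Hypothetical backend call; counts how often it is actually invoked.
    private async Task<string> FetchFromBackendAsync(string playerId)
    {
        Interlocked.Increment(ref _backendCalls);
        await Task.Delay(50); // pretend network latency
        return $"profile-of-{playerId}";
    }

    public async Task<string> GetProfileAsync(string playerId)
    {
        // Callers asking for the same player share one in-flight (or completed) task.
        var lazy = _cache.GetOrAdd(
            playerId,
            id => new Lazy<Task<string>>(() => FetchFromBackendAsync(id)));
        try
        {
            return await lazy.Value;
        }
        catch
        {
            // Naive eviction: drop the failed entry so that a later call retries.
            _cache.TryRemove(playerId, out _);
            throw;
        }
    }

    public static async Task Main()
    {
        var cache = new ProfileCache();
        await Task.WhenAll(
            cache.GetProfileAsync("B"),
            cache.GetProfileAsync("B"),
            cache.GetProfileAsync("C"));
        Console.WriteLine(cache.BackendCalls);
    }
}
```

The eviction here is deliberately simple: under contention it could remove an entry that a concurrent caller has just replaced, so a production cache would compare the entry before removing it.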
at the very least I'd like all calls to be sequential
Well, that's much easier. SemaphoreSlim is the asynchronous replacement for lock, so you can use a shared SemaphoreSlim. Call await mySemaphoreSlim.WaitAsync(); at the beginning of RefreshProfileInfo, put the body in a try, and in the finally block at the end of RefreshProfileInfo, call mySemaphoreSlim.Release();. That will limit all calls to RefreshProfileInfo to running sequentially.
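The SemaphoreSlim pattern described above could look roughly like this. The method body is a hypothetical placeholder (a short delay instead of the real cache check and backend fetch), and the concurrency counter exists only to show that the calls no longer overlap.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SequentialRefresh
{
    // One shared semaphore with a single slot: at most one caller is inside the body at a time.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
    private static int _running;
    public static int MaxObservedConcurrency;

    public static async Task RefreshProfileInfoAsync(string[] players)
    {
        await Gate.WaitAsync(); // asynchronous wait: no thread is blocked while queued
        try
        {
            int now = Interlocked.Increment(ref _running);
            MaxObservedConcurrency = Math.Max(MaxObservedConcurrency, now);
            await Task.Delay(20); // hypothetical stand-in for cache check + backend fetch
            Interlocked.Decrement(ref _running);
        }
        finally
        {
            Gate.Release(); // always release, even if the body throws
        }
    }

    public static async Task Main()
    {
        await Task.WhenAll(
            RefreshProfileInfoAsync(new[] { "A", "B", "C" }),
            RefreshProfileInfoAsync(new[] { "B", "C", "D" }),
            RefreshProfileInfoAsync(new[] { "E", "F" }));
        Console.WriteLine(MaxObservedConcurrency);
    }
}
```

All three calls are started together, yet the observed concurrency stays at 1, so the second and third calls would find whatever the first one put into the cache.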
I had the same issue in one of my projects. I had multiple threads call a single method and they all made IO calls when not found in cache. What you want to do is to add the Task to your cache and then await it. Subsequent calls will then just read the result once the task completes.
Example:
private Task RefreshProfile(Player player)
{
// cache is of type IMemoryCache
return _cache.GetOrCreate(player, entry =>
{
// expire in 30 seconds
entry.AbsoluteExpiration = DateTimeOffset.UtcNow.AddSeconds(30);
return ActualRefreshCodeThatReturnsTask(player);
});
}
Then just await in your calling code
await Task.WhenAll(RefreshProfile(a), RefreshProfile(b), RefreshProfile(c));

Is it safe not to wait for all tasks

Say I have the following action method:
[HttpPost]
public async Task<IActionResult> PostCall()
{
var tasks = new List<Task<bool>>();
for (int i = 0; i < 10; i++)
tasks.Add(Manager.SomeMethodAsync(i));
// Is this line necessary to ensure that all tasks will finish successfully?
Task.WaitAll(tasks.ToArray());
if (tasks.Exists(x => x.Result))
return new ObjectResult("At least one task returned true");
else
return new ObjectResult("No tasks returned true");
}
Is Task.WaitAll(tasks.ToArray()) necessary to ensure that all tasks will finish successfully? Will the tasks whose Result happened not to get accessed by the Exists finish their execution in the background successfully? Or is there a chance that some of the tasks (that weren't waited for) get dropped since they would not be attached to the request? Is there a better implementation I'm missing?
Under your provided implementation, the Task.WaitAll call blocks the calling thread until all tasks have completed. It would only proceed to the next line and perform the Exists check after this has happened. If you remove the Task.WaitAll, then the Exists check would cause the calling thread to block on each task in order; i.e. it first blocks on tasks[0]; if this returns false, then it would block on tasks[1], then tasks[2], and so on. This is not desirable since it doesn't allow for your method to finish early if the tasks complete out of order.
If you only need to wait until whichever task returns true first, then you could use Task.WhenAny. This will make your asynchronous method resume as soon as any task completes. You can then check whether it evaluated to true and return success immediately; otherwise, you keep repeating the process for the remaining collection of tasks until there are none left.
If your code was running as an application (WPF, WinForms, Console), then the remaining tasks would continue running on the thread pool until completion, unless the application is shut down. Thread-pool threads are background threads, so they won't keep the process alive if all foreground threads have terminated (e.g. because all windows were closed).
Since you're running a web app, you incur the risk of having your app pool recycled before the tasks have completed. The unawaited tasks are fire-and-forget and therefore untracked by the runtime. To prevent this from happening, you can register them with the runtime through the HostingEnvironment.QueueBackgroundWorkItem method, as suggested in the comments.
[HttpPost]
public async Task<IActionResult> PostCall()
{
var tasks = Enumerable
.Range(0, 10)
.Select(Manager.SomeMethodAsync)
.ToList();
foreach (var task in tasks)
HostingEnvironment.QueueBackgroundWorkItem(_ => task);
while (tasks.Any())
{
var readyTask = await Task.WhenAny(tasks);
tasks.Remove(readyTask);
if (await readyTask)
return new ObjectResult("At least one task returned true");
}
return new ObjectResult("No tasks returned true");
}
Yes, the tasks are not guaranteed to complete unless something waits for them (with something like an await).
In your case, the main change you should make is replacing the Task.WaitAll call with
await Task.WhenAll(tasks);
so that the wait is actually asynchronous. If you just want to wait for the first task to finish, use WhenAny instead.
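A sketch of the action rewritten around await Task.WhenAll, stripped of the ASP.NET types so that it runs on its own. SomeMethodAsync is a hypothetical replacement for Manager.SomeMethodAsync that returns true only for i == 7.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AnyTrueDemo
{
    // Hypothetical replacement for Manager.SomeMethodAsync.
    private static async Task<bool> SomeMethodAsync(int i)
    {
        await Task.Delay(10 * (10 - i)); // later items finish earlier
        return i == 7;
    }

    public static async Task<string> PostCallAsync()
    {
        // Start all ten operations, then wait for all of them without blocking a thread.
        var tasks = Enumerable.Range(0, 10).Select(i => SomeMethodAsync(i)).ToList();
        bool[] results = await Task.WhenAll(tasks);

        // Every task has finished here, so the results are read without any further waiting.
        return results.Any(r => r)
            ? "At least one task returned true"
            : "No tasks returned true";
    }

    public static async Task Main()
    {
        Console.WriteLine(await PostCallAsync());
    }
}
```

Task.WhenAll on a collection of Task<bool> yields a bool[] directly, so there is no need to touch Result on the individual tasks.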

How should I be using the TPL if I want to implement this threading concept?

Let's say I have a method like SaveAsync(Item item) and I need to call it on 10 Items and the calls are independent of one another. I imagine the ideal way in terms of threading is like
Thread A | Run `SaveAsync(item1)` until we hit the `await` | ---- ... ---- | Run `SaveAsync(item10)` until we hit the `await` | ---------------------------------------|
Thread B | --------------------------------------------------- | Run the stuff after the `await` in `SaveAsync(item1)` | ------------------ ... -----------------------|
Thread C | ------------------------------------------------------ | Run the stuff after the `await` in `SaveAsync(item2)` | ------------------ ... --------------------|
.
.
.
(with it being possible that some of the stuff after the await for multiple items is run in the same thread, perhaps even Thread A)
I'm wondering how to write that in C#. Is it a parallel foreach, or a loop with await SaveAsync(item), or what?
By default, an awaited task resumes on the context it was started on. You can change this by adding
await task.ConfigureAwait(false)
This tells the runtime that you do not care on which context the method resumes, so the runtime can skip capturing the current context (which can be quite costly).
Without it, however, you will always be scheduled back on the context that started the task.
There are a few default contexts, such as the UI thread context or the thread pool context. A task started on the UI thread context will be scheduled back to the UI thread context.
A task started on the thread pool context will be scheduled to the next free thread from the pool, not necessarily the same thread the task was started on.
However, you can provide your own context if you need more control over the task scheduling.
How do you start multiple tasks in the fashion you described above? A loop that awaits each call will not help here. Let's take this example.
foreach(var item in items)
{
await SaveAsync(item);
}
The await here will wait until the SaveAsync finishes. So all saves are processed in sequence.
How to save truly asynchronously?
The trick is to start all tasks, but not await them, until all tasks are started. You then wait for all tasks with WhenAll(IEnumerable<Task>).
Here is an example.
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync(item)); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Because of the missing await, all "Save-Actions" are placed in the Async/Await state machine. As soon as the first task yields back, the second will be executed. This will result in a behavior somewhat similar to the one described in your question.
The only main difference here is that all tasks are executed in the same thread. This is completely OK most of the time, because all Save methods usually need to access the same resources. Parallelizing them gives no real advantage, because the bottleneck is this resource.
How to use multiple threads
You can execute a task on a new thread by using
Task.Run(() => SaveAsync(item));
This will execute the task on a thread taken from the thread pool, but there is no way to start on a new thread and finish the method on the UI thread.
To execute all items on different threads, you can use nearly the same code as before:
var tasks = new List<Task>();
foreach(var item in items)
{
    tasks.Add(Task.Run(() => SaveAsync(item))); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
The only difference here is that we take the task returned from Task.Run.
One remark: Using Task.Run does not guarantee you a new thread. It will execute the task on the next free thread from the thread pool. This depends on your local settings as well as the local configuration (e.g. a heavy barebone server will have a lot more threads than any consumer laptop).
Whether you get a new thread or have to wait for an occupied thread to finish is completely up to the thread pool. (The thread pool usually does a really great job. For more info, here is a really great article on thread pool performance: CLR-Thread-Pool)
This is where people make most of their mistakes with async/await:
1) Either people think that everything after calling an async method, with or without awaiting, is moved to a ThreadPool thread.
2) Or people think that async runs synchronously.
The truth is somewhere in between, and @Iqon's statement about the next block of code is actually incorrect: "The only main difference here, is all tasks are executed in the same thread."
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync(item)); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Making a statement like this would suggest that the async method SaveAsync(item) is actually capable of executing fully and completely synchronously.
Here are examples:
async Task SaveAsync1(Item item)
{
//no awaiting at all
}
async Task SaveAsync2(Item item)
{
//awaiting already completed task
int i = await Task.FromResult(0);
}
Methods like these would really run synchronously on the thread the async method was called on. But async methods of this kind are special snowflakes. No operation is actually awaited here; everything is complete even though there is an await inside the method, because the first case does not await at all and the second awaits an already completed task, so it continues synchronously after the await. These two calls would therefore behave the same:
var taskA = SaveAsync2(item); // returns a task that has already run to completion
// same here, no suspension happens because the returned task has already run to completion
await SaveAsync2(item);
So the statement that the async methods execute synchronously is correct only in this special case:
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync2(item));
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
And there is no need to store the tasks and await Task.WhenAll(tasks); everything is already done, so this would be enough:
foreach(var item in items)
{
SaveAsync2(item);
//it will execute synchronously because there is
//nothing to await for in the method
}
Now let's explore a real case: an async method that actually awaits something inside or starts an awaitable operation:
async Task SaveAsync3(Item item)
{
//awaiting already completed task
int i = await Task.FromResult(0);
await Task.Delay(1000);
Console.WriteLine(i);
}
Now what would this do?
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync3(item));
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Would it run synchronously? No!
Would it run concurrently in parallel? NO!
As I said at the beginning, the truth is somewhere in between with async methods, unless they are special snowflakes like SaveAsync1 and SaveAsync2.
So what did the code do? It executed each SaveAsync3 synchronously up to await Task.Delay, where it found that the returned task was incomplete, returned to the caller, and handed back an incomplete task, which was stored in tasks; the next SaveAsync3 was then executed in the same way.
Now await Task.WhenAll(tasks); really has meaning, because it is awaiting incomplete operations which will run outside this thread context and in parallel.
All those parts of the SaveAsync3 method after await Task.Delay will be scheduled to the ThreadPool and will run in parallel, except in special cases like a UI thread context, in which case ConfigureAwait(false) after Task.Delay would be needed.
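The behaviour described above can be observed with a small log. This is a sketch assuming a console app (no UI synchronization context); the SaveAsync3 variant here takes an int and a shared log instead of an Item, purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FirstAwaitDemo
{
    private static async Task SaveAsync3(int item, List<string> log)
    {
        // Runs synchronously on the caller's thread, before control returns to the loop.
        lock (log) log.Add($"sync part {item}");
        // First incomplete await: an unfinished task is handed back to the caller here.
        await Task.Delay(200);
        // Runs later as a continuation, once the delay has elapsed.
        lock (log) log.Add($"continuation {item}");
    }

    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        var tasks = new List<Task>();
        for (int item = 1; item <= 3; item++)
            tasks.Add(SaveAsync3(item, log)); // no await here
        lock (log) log.Add("loop finished");
        await Task.WhenAll(tasks);
        return log;
    }

    public static async Task Main()
    {
        foreach (var line in await RunAsync())
            Console.WriteLine(line);
    }
}
```

The three "sync part" entries appear in order, followed by "loop finished" (the loop takes far less time than the 200 ms delay); the three "continuation" entries come last, in no guaranteed order.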
I hope you understand what I want to say. You cannot really say how an async method will run unless you have more information about it or its code.
This exercise also raises the question of when to use Task.Run on an async method.
It is often misused, and I think that there are really just 2 main cases:
1) When you want to break away from the current thread's context, like UI, ASP.NET etc.
2) When the async method has a synchronous part (up to the first incomplete await) which is computationally intensive and you want to offload it as well, not just the incomplete await part. This would be the case if SaveAsync3 spent a long time computing the variable i, let's say Fibonacci :).
For example, you do not have to use Task.Run on something like SaveAsync, which would open a file and save something into it asynchronously, unless the synchronous part before the first await inside SaveAsync is an issue because it takes time. Then Task.Run is in order, as in case 2).

Understanding async and await in C#

I'm learning async and await operations in C#. I couldn't understand the flow of execution when it handles multiple async operations. For example, I have the code below in my C# application.
await repository.GetAsync(values); // execute for 10 sec
var result = repository.setAsync(data); // 20 sec
dataresult = await repository.GetAsync(result); //execute for 10 sec
I have three async calls here.
As per my understanding each call will have a callback and this will not wait for one action to complete.
So how I can ensure the action is complete?
Will repository.setAsync execute before repository.GetAsync(values) completes its execution, or will it execute only after repository.GetAsync(values) has completed?
So what will be the order of execution?
1)
await repository.GetAsync(values); // started await method execution, since there is no callback it will not set and will start execute the next before complete this.
var result = repository.setAsync(data); // will execute for 20 sec. Once completed will go to previous thread and complete that.
2)
await repository.GetAsync(values); // started await method execution, complete it and move to the next line.
var result = repository.setAsync(data); // will execute for 20 sec.
When you execute something synchronously, you wait for it to finish before moving on to another task. When you execute something asynchronously, you can move on to another task before it finishes. But here, the asynchronous code is waiting for the operation to finish. Why this contradiction?
I want to return the dataresult only once the operation has been completed.
I feel this is contrary to fire and forget. Are these two the same or different concepts?
As per the below link reference
The await keyword does not block the thread until the task is
complete.
But from the answers posted here, I understood that this will pause the execution. Which is true? Have I missed something?
As per my understanding each call will have a callback and this will not wait for one action to complete.
When you use await, the code will wait for the action to complete before moving on. This is the way you deal with data dependencies -- situations when a task needs results from a previous task to be available before it can start processing. The only action that is not awaited is result, so GetAsync(result) must take Task<T> as its parameter, where T is the type of whatever SetAsync method returns.
Note
If code following the await does not need to be executed on the UI thread you should call ConfigureAwait(false) upon the Task you are awaiting. Why is this best practice? Stephen Cleary provides an excellent blog post on the topic of async/await deadlocks that explains it.
It is also very likely that you are missing await on the second line and an assignment of data on the first line:
var data = await repository.GetAsync(values).ConfigureAwait(false);
var result = await repository.SetAsync(data).ConfigureAwait(false);
dataresult = await repository.GetAsync(result).ConfigureAwait(false);
So what is the concept of callback and fire and forget here?
If callback happens before the call of await, which is possible when you fire up a task, do something else, and then await that task, you get a chance to do more work in between "firing and forgetting" and getting the results back. The key here is that there must be no data dependency in the middle:
var dataTask = repository.GetAsyncOne(values); // Fire and forget
// Do something else in the middle; otherData stands for any input that does not come from dataTask
var result = await repository.SetAsync(otherData).ConfigureAwait(false);
// If the task has completed, there will be no wait
var data = await dataTask.ConfigureAwait(false);
