Scatter gather with async await - c#

I have a situation where I am interested in the first successful response from an array of services that each support the method
Task<Try<SearchResponse>> PerformSearch(SearchRequest request);
The Try class is a container for a Good/Bad result (like error Monad)
The call to the list of services currently is this
var searchResponses = await Task.WhenAll(
_searchServices.Select(s => s.PerformSearch(request)));
return searchResponses.FirstOrBad(sr=>sr.IsGood);
Where FirstOrBad is an extension method that finds the first good result or returns a composite Bad Try with a concatenation of all the errors.
As far as I understand the problem with this is that due to the WhenAll the time to find the first good result is limited by the slowest response.
I want to continue execution as soon as I receive the first positive result but not the first (2nd ... etc) result if it is not successful, but also continue execution if all results return unsuccessfully, reporting the lack of success.
I would have thought this is a common problem but have found little when searching for examples. It maybe known by some other term than scatter gather.

Something like this should work for you
public static async Task<Try<T>> FirstOrBad<T>(this IEnumerable<Task<Try<T>>> tasks, Func<Try<T>, bool> predicate)
{
var taskList = tasks.ToList();
var completed = new List<Task<Try<T>>>();
Task<Try<T>> completedTask;
do
{
completedTask = await Task.WhenAny(taskList);
completed.Add(completedTask);
taskList.Remove(completedTask);
} while (!predicate(await completedTask) && taskList.Any());
return !predicate(await completedTask) ? new Try<T>(completed.ToString(",")) : await completedTask;
}
Adapter from this answer TPL wait for task to complete with a specific return value

Related

C# Add to a List Asynchronously in API

I have an API which needs to be run in a loop for Mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However can be run in bulk, using foreach statement. What is the best way to run this asychronously without locks?
Current Solution below is just providing a list, would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>();
foreach(AddressRequestDto request in requests)
{
var newCombinedAddress = (await GetCombinedAddress(request)).Value;
combinedAddressResponses.Add(newCombinedAddress);
}
return combinedAddressResponses;
}
Update:
In debugger, it has to go to combinedAddressResponse.Result.Value
combinedAddressResponse.Value = null
and Also strangely, writing combinedAddressResponse.Result.Value gives error below "Action Result does not contain a definition for for 'Value' and no accessible extension method
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
var tasks = new List<Task<ActionResult<CombinedAddressResponse>>(requests.Count);
foreach (var request in requests)
{
tasks.Add(Task.Run(async () => await GetCombinedAddress(request));
}
//This waits for all the tasks to complete
await tasks.WhenAll(tasks.ToArray());
combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
return combinedAddressResponses;
}
looking for a way to speed things up and run in parallel thanks
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
// Build the collection of tasks by doing an asynchronous operation for each request.
var tasks = requests.Select(async request =>
{
var combinedAddressResponse = await GetCombinedAdress(request);
return combinedAddressResponse.Value;
}).ToList();
// Wait for all the tasks to complete and get the results.
var results = await Task.WhenAll(tasks);
return results;
}

How does parallelization work on async/await?

I have the following code, that I intend to run asynchronously. My goal is that GetPictureForEmployeeAsync() is called in parallel as many times as needed. I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
public Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
}
private Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
return CreatePicture(tags, base64PictureTask, documentTask);
}
private static async Task<Picture> CreatePicture(IDictionary<string, string> tags, Task<string> base64PictureTask, Task<Employee> documentTask)
{
var document = await documentTask;
return new Picture
{
EmployeeID = document.ID,
Data = await base64PictureTask,
ID = document.ID.ToString(),
Tags = tags,
};
}
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this? If not, how should I restructure the code to achieve what I want?
I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
It doesn't.
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this?
Yes and no. The WhenAll isn't limited in any way by the awaited tasks in CreatePicture, but that has nothing to do with whether GetPictureForEmployeeAsync is awaited or not. These two lines of code are equivalent in terms of behavior:
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
return Task.WhenAll(employees.Select(async employee => await GetPictureForEmployeeAsync(employee, tags)));
I recommend reading my async intro to get a good understanding of how async and await work with tasks.
Also, since GetPictures has non-trivial logic (GetRepositoryQuery and evaluating tags["gender"]), I recommend using async and await for GetPictures, as such:
public async Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
var tasks = employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)).ToList();
return await Task.WhenAll(tasks);
}
As a final note, you may find your code cleaner if you don't pass around "tasks meant to be awaited" - instead, await them first and pass their result values:
async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
await Task.WhenAll(base64PictureTask, documentTask);
return CreatePicture(tags, await base64PictureTask, await documentTask);
}
static Picture CreatePicture(IDictionary<string, string> tags, string base64Picture, Employee document)
{
return new Picture
{
EmployeeID = document.ID,
Data = base64Picture,
ID = document.ID.ToString(),
Tags = tags,
};
}
The thing to keep in mind about calling an async method is that, as soon as an await statement is reached inside that method, control immediately goes back to the code that invoked the async method -- no matter where the await statement happens to be in the method. With a 'normal' method, control doesn't go back to the code that invokes that method until the end of that method is reached.
So in your case, you can do the following:
private async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
// As soon as we get here, control immediately goes back to the GetPictures
// method -- no need to store the task in a variable and await it within
// CreatePicture as you were doing
var picture = await blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var document = await documentRepository.GetItemAsync(employee.ID.ToString());
return CreatePicture(tags, picture, document);
}
Because the first line of code in GetPictureForEmployeeAsync has an await, control will immediately go right back to this line...
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
...as soon as it is invoked. This will have the effect of all of the employee items getting processed in parallel (well, sort of -- the number of threads that will be allotted to your application will be limited).
As an additional word of advice, if this application is hitting a database or web service to get the pictures or documents, this code will likely cause you issues with running out of available connections. If this is the case, consider using System.Threading.Tasks.Parallel and setting the maximum degree of parallelism, or use SemaphoreSlim to control the number of connections used simultaneously.

Handle result returned by async call within async method

Imagine this is a method performing a DB query and returning a result, which in case of null is replaced with a default value (Null object pattern).
public ResultObj Get()
{
var result = dbContext.GetSomeResult();
return result ?? ResultObj.NullValue;
}
Imagine this DB query is a long-running process, so I would use async/await to execute this process in a separate thread. Suppose that the dbContext.GetSomeResultAsync() method is available.
How can be this method converted in an asynchronous one so that I can write something like this?
var resultTask = GetAsync();
var otherResultTask = GetSomethingElseAsync();
Task.WaitAll(resultTask, otherResultTask);
var myResult = resultTask.Result;
var myOtherResult = otherResultTask.Result;
I tried this solution.
public async Task<ResultObj> GetAsync()
{
var result = await dbContext.GetSomeResultAsync();
return result ?? ResultObj.NullValue;
}
First, I'm wondering why this code compiles: why can I return ResultObj when Task<ResultObj> is expected?
Second, this code predictably results in a deadlock, as clearly explained by the great number of resources about async deadlocks anti-patterns. The deadlock can be prevented by using .ConfigureAwait(false) method after the async call. Is this the right way to go? Are there any hidden drawbacks in this case? Is it a general rule?
I also tried this.
public async Task<ResultObj> GetAsync()
{
return await Task.Run(() => {
var result = dbContext.GetSomeResult();
return result ?? ResultObj.NullValue;
});
}
This results in a deadlock, too. This time I cannot even figure out why.
Edit: possible solution
Finally, after having read this, I found a solution to my problem.
My generic query wrapper method is like this.
public async Task<ResultObj> GetAsync()
{
var result = await dbContext.GetSomeResultAsync();
return result ?? ResultObj.NullValue;
}
On calling method, I use this pattern.
public async Task<CollectedResults> CollectAsync()
{
var resultTask = GetAsync();
var otherResultTask = GetSomethingElseAsync();
//here both queries are being executed.
//...in the while, optionally, here some other synchronous actions
//then, await results
var result = await resultTask;
var otherResult = await otherResultTask;
//here process collected results and return
return new CollectedResults(...);
}
It is worth mentioning that the above code, wrapped in a domain class, is called by a Controller action. In order for this to work I had to make async the methods all the way up, until Controller action, which now appears as follows.
public async Task<CollectedResults> Get()
{
return await resultsCollector.CollectAsync();
}
This way, deadlock doesn't happen anymore and execution time greatly improves with respect to the synchronous version.
I don't know if this is the canonical way of executing parallel queries. But it works and I don't see particular pitfalls in the code.
First of all, regarding :
so I would use async/await to execute this process in a separate thread.
There is no new thread created when we use async and await
Secondly:
why can I return ResultObj when Task is expected?
the Task<TResult> as return type of method tells that it returns a Task of type TResult but we need to return object of type that TResult back from it so the method can be awaited and when using Task<TResult> as reutrn type we should be using async and await to do the work.
Lastly:
this code predictably results in a deadlock
You are using async keyword with method signatures and also await the next async method call being done from within the method. So apparently it looks like the code in first example you have posted shouldn't be deadlocked, if the method GetSomeResultAsync you are consuming is really a async method and is properly implemented.
I suggest to study more about the async await before getting in to it, following is a good article to start with:
https://blog.stephencleary.com/2012/02/async-and-await.html

Double await in one call

I was watching a video called Becoming a C# Time Lord and at 0:35:36 this code popped up:
async Task<TResult[]> PurelyWhenAll<TResult> (params Task<TResult>[] tasks)
{
var killJoy = new TaskCompletionSource<TResult[]>();
foreach ( var task in tasks )
task.ContinueWith(ant =>
{
if ( ant.IsCanceled )
killJoy.TrySetCanceled();
else if ( ant.IsFaulted )
killJoy.TrySetException(ant.Exception.InnerException);
});
return await await Task.WhenAny(killJoy.Task, Task.WhenAll(tasks));
}
Does this mean that a task returns a task and because of that we have double await? If that is the case what happens concerning performance if we have more than two awaits? Is this good practice, should this be avoided?
Task.WhenAny is going to return a Task<Task<TResult>>:
Awaiting the result of Task.WhenAny() will return the first task that completed
Awaiting that task will return the results of the task, i.e. a TResult[].
You might find it easy to understand with explanatory variables:
var firstCompletedTask = await Task.WhenAny(killJoy.Task, Task.WhenAll(tasks));
var firstResult = await firstCompletedTask;
return firstResult;
It's not clear why you're concerned around the performance of this - it's just two await expressions, not particularly different to any other method with two await expressions.
It's pretty natural to do this when using Task.WhenAny<TResult>(Task<TResult>[]), given that the return type is a Task<Task<TResult>>.
You don't have to worry about the performance of two awaits on the same task. If it has already completed, you just get the value returned without the task running again.

How to use async countdown event instead of collecting tasks and awaiting on them?

I have the following code:
var tasks = await taskSeedSource
.Select(taskSeed => GetPendingOrRunningTask(taskSeed, createTask, onFailed, onSuccess, sem))
.ToList()
.ToTask();
if (tasks.Count == 0)
{
return;
}
if (tasks.Contains(null))
{
tasks = tasks.Where(t => t != null).ToArray();
if (tasks.Count == 0)
{
return;
}
}
await Task.WhenAll(tasks);
Where taskSeedSource is a Reactive Observable. It could be that this code have many problems, but I see at least two:
I am collecting tasks whereas I could do without it.
Somehow, the returned tasks list may contain nulls, even though GetPendingOrRunningTask is an async method and hence never returns null. I failed to understand why it happens, so I had to defend against it without understanding the cause of the problem.
I would like to use the AsyncCountdownEvent from the AsyncEx framework instead of collecting the tasks and then awaiting on them.
So, I can pass the countdown event to GetPendingOrRunningTask which will increment it immediately and signal before returning after awaiting for the completion of its internal logic. However, I do not understand how to integrate the countdown event into the monad (that is the Reactive jargon, isn't it?).
What is the right way to do it?
EDIT
Guys, let us forget about the mysterious nulls in the returned list. Suppose everything is green and the code is
var tasks = await taskSeedSource
.Select(taskSeed => GetPendingOrRunningTask(taskSeed, ...))
.ToList()
.ToTask();
await Task.WhenAll(tasks);
Now the question is how do I do it with the countdown event? So, suppose I have:
var c = new AsyncCountdownEvent(1);
and
async Task GetPendingOrRunningTask<T>(AsyncCountdownEvent c, T taskSeed, ...)
{
c.AddCount();
try
{
await ....
}
catch (Exception exc)
{
// The exception is handled
}
c.Signal();
}
My problem is that I no longer need the returned task. These tasks where collected and awaited to get the moment when all the work items are over, but now the countdown event can be used to indicate when the work is over.
My problem is that I am not sure how to integrate it into the Reactive chain. Essentially, the GetPendingOrRunningTask can be async void. And here I am stuck.
EDIT 2
Strange appearance of a null entry in the list of tasks
#Servy is correct that you need to solve the null Task problem at the source. Nobody wants to answer a question about how to workaround a problem that violates the contracts of a method that you've defined yourself and yet haven't provided the source for examination.
As for the issue about collecting tasks, it's easy to avoid with Merge if your method returns a generic Task<T>:
await taskSeedSource
.Select(taskSeed => GetPendingOrRunningTask(taskSeed, createTask, onFailed, onSuccess, sem))
.Where(task => task != null) // According to you, this shouldn't be necessary.
.Merge();
However, unfortunately there's no official Merge overload for the non-generic Task but that's easy enough to define:
public static IObservable<Unit> Merge(this IObservable<Task> sources)
{
return sources.Select(async source =>
{
await source.ConfigureAwait(false);
return Unit.Default;
})
.Merge();
}

Categories