Better way to collect results from async tasks - c#

My MVC app occasionally results in a deadlock. I suspect the cause is the way I collect data from completed async tasks.
I have two independent async methods.
var task1 = GetNamesFromSource1Async(); // a database call, may throw an exception
var task2 = GetNamesFromSource2Async(); // a database call, may throw an exception
var total = new List<string>();
await Task.WhenAll(task1, task2).ConfigureAwait(false);
Question part 1: What is the safest recommended way and the best practice to collect results from these tasks:
// here I already know that both tasks are completed and
// I am using (or abusing?) await to get task results
List<string> names1 = await task1.ConfigureAwait(false);
List<string> names2 = await task2.ConfigureAwait(false);
if (names1 != null) total.AddRange(names1);
if (names2 != null) total.AddRange(names2);
or
total.AddRange(task1.IsFaulted ? new List<string>() : task1.Result);
total.AddRange(task2.IsFaulted ? new List<string>() : task2.Result);
?
Question part 2: In addition, if I want to transform the data from the first source, is it safe to use ContinueWith? (By safe I mean from the standpoint of deadlocks.)
var task1 = GetNamesFromSource1Async().ContinueWith(t =>
{
if (!t.IsFaulted && t.Result != null)
{
return t.Result.Take(1).ToList();
}
// every path must return a value, otherwise the lambda does not compile
return new List<string>();
});
Remark: here I am trying to control for exceptions within each of the tasks by checking the IsFaulted flag.
A recommendation on the best practice to solve this problem would be highly appreciated. I am using .NET 4.5

.Result is a blocking call and can lead to a deadlock when mixed with async/await. Await the tasks instead:
var task1 = GetNamesFromSource1Async(); // a database call, may throw an exception
var task2 = GetNamesFromSource2Async(); // a database call, may throw an exception
var total = new List<string>();
var results = await Task.WhenAll(task1, task2);
total.AddRange(results.Where(s => s != null && s.Count > 0).SelectMany(s => s));
Update
The above assumed the return types were all the same.
However from your comment...
How would you modify the last line if I still need to collect results
but task1 and task2 are based on different types?
and referencing this answer
Awaiting multiple Tasks with different results
Then it would be modified as
var task1 = GetNamesFromSource1Async(); // a database call, may throw an exception
var task2 = GetNamesFromSource2Async(); // a database call, may throw an exception
var total = new List<string>();
await Task.WhenAll(task1, task2);
List<String> names1 = await task1;
List<int> names2 = await task2;
//...process results
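Since both sources may throw, one way to keep whichever results did succeed is to await each task through a small wrapper that turns a failure into an empty list. This is only a sketch under assumptions: OrEmptyAsync, CollectAsync and the two stub sources below are hypothetical names made up for illustration, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class NameCollector
{
    // Hypothetical stand-ins for the two database calls in the question.
    static Task<List<string>> GetNamesFromSource1Async()
    {
        return Task.FromResult(new List<string> { "alice", "bob" });
    }

    static async Task<List<string>> GetNamesFromSource2Async()
    {
        await Task.Yield();
        throw new InvalidOperationException("source 2 is down");
    }

    // Awaits the task; a faulted or canceled task, or a null result, yields an empty list.
    public static async Task<List<string>> OrEmptyAsync(Task<List<string>> task)
    {
        try
        {
            return await task.ConfigureAwait(false) ?? new List<string>();
        }
        catch (Exception ex)
        {
            Console.WriteLine("Source failed: " + ex.Message);
            return new List<string>();
        }
    }

    public static async Task<List<string>> CollectAsync()
    {
        // Start both calls before awaiting anything so they run concurrently.
        var task1 = OrEmptyAsync(GetNamesFromSource1Async());
        var task2 = OrEmptyAsync(GetNamesFromSource2Async());

        var results = await Task.WhenAll(task1, task2).ConfigureAwait(false);

        var total = new List<string>();
        foreach (var names in results) total.AddRange(names);
        return total;
    }

    static void Main()
    {
        // A console app has no synchronization context, so blocking here is safe.
        // Inside an MVC action, await CollectAsync() instead.
        var total = CollectAsync().GetAwaiter().GetResult();
        Console.WriteLine(string.Join(", ", total));
    }
}
```

Because the wrapper never throws, Task.WhenAll cannot fault here, and no code path touches .Result.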

Related

Task.WhenAll on List<Task> behaving differently than Task.WhenAll on IEnumerable<Task>

I'm seeing some odd behavioral differences when calling Task.WhenAll(IEnumerable<Task<T>>) and calling Task.WhenAll(List<Task<T>>) while trying to catch exceptions
My code is as follows:
public async Task Run()
{
var en = GetResources(new []{"a","b","c","d"});
await foreach (var item in en)
{
var res = item.Select(x => x.Id).ToArray();
System.Console.WriteLine(string.Join("-> ", res));
}
}
private async IAsyncEnumerable<IEnumerable<ResponseObj>> GetResources(
IEnumerable<string> identifiers)
{
IEnumerable<IEnumerable<string>> groupedIds = identifiers.Batch(2);
// MoreLinq extension method -- batches IEnumerable<T>
// into IEnumerable<IEnumerable<T>>
foreach (var batch in groupedIds)
{
//GetHttpResource is simply a wrapper around HttpClient which
//makes an Http request to an API endpoint with the given parameter
var tasks = batch.Select(id => ac.GetHttpResourceAsync(id)).ToList();
// if I remove this ToList(), the behavior changes
var stats = tasks.Select(t => t.Status);
// at this point the status being WaitingForActivation is reasonable
// since I have not awaited yet
IEnumerable<ResponseObj> res = null;
var taskGroup = Task.WhenAll(tasks);
try
{
res = await taskGroup;
var awaitedStats = tasks.Select(t => t.Status);
//this is the part that changes
//if I have .ToList(), the statuses are RanToCompletion or Faulted
//if I don't have .ToList(), the statuses are always WaitingForActivation
}
catch (Exception ex)
{
var exceptions = taskGroup.Exception.InnerException;
DoSomethingWithExceptions(exceptions);
res = tasks.Where(g => !g.IsFaulted).Select(t => t.Result);
//throws an exception because all tasks are WaitingForActivation
}
yield return res;
}
}
Ultimately, I have an IEnumerable of identifiers, I'm batching that into batches of 2 (hard coded in this example), and then running Task.WhenAll to run each batch of 2 at the same time.
What I want is if 1 of the 2 GetResource tasks fails, to still return the successful result of the other, and handle the exception (say, write it to a log).
If I run Task.WhenAll on a list of tasks, this works exactly how I want. However, if I remove the .ToList(), when I attempt to find my faulted tasks in the catch block after the await taskGroup, I run into problems because the statuses of my tasks are still WaitingForActivation although I believe they have been awaited.
When there is no exception thrown, the List and IEnumerable act the same way. This only starts causing issues when I try to catch exceptions.
What is the reasoning behind this behavior? The Task.WhenAll must have completed since I get into the catch block, however why are the statuses still WaitingForActivation? Have I failed to grasp something fundamental here?
Unless you make the list concrete (by using ToList()), each time you enumerate the sequence you call GetHttpResourceAsync again and create new tasks. This is due to deferred execution: the tasks whose statuses you inspect are not the ones Task.WhenAll awaited, which is why they are still WaitingForActivation.
I would definitely keep the ToList() call when working with a list of tasks.
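The effect of deferred execution can be made visible by counting how often the task factory runs. The sketch below is illustrative only: GetAsync is a hypothetical stand-in for ac.GetHttpResourceAsync, and CountCalls is a made-up helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class DeferredDemo
{
    static int calls;

    // Hypothetical stand-in for ac.GetHttpResourceAsync(id); counts its invocations.
    static Task<string> GetAsync(string id)
    {
        calls++;
        return Task.FromResult(id.ToUpperInvariant());
    }

    // Returns how many times GetAsync ran for two ids when the query is enumerated twice.
    public static int CountCalls(bool materialize)
    {
        calls = 0;
        IEnumerable<Task<string>> tasks = new[] { "a", "b" }.Select(id => GetAsync(id));
        if (materialize) tasks = tasks.ToList();

        // First enumeration: Task.WhenAll walks the sequence once.
        Task.WhenAll(tasks).GetAwaiter().GetResult();

        // Second enumeration: without ToList() this re-runs the Select and creates new tasks.
        tasks.Select(t => t.Status).ToList();

        return calls;
    }

    static void Main()
    {
        Console.WriteLine(CountCalls(materialize: false)); // prints 4
        Console.WriteLine(CountCalls(materialize: true));  // prints 2
    }
}
```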

Is there a proper pattern for multiple ContinueWith methods

In the docs for TPL I found this line:
Invoke multiple continuations from the same antecedent
But this isn't explained any further. I naively assumed you could chain ContinueWiths in a pattern matching like manner until you hit the right TaskContinuationOptions.
TaskThatReturnsString()
.ContinueWith((s) => Console.Out.WriteLine(s.Result), TaskContinuationOptions.OnlyOnRanToCompletion)
.ContinueWith((f) => Console.Out.WriteLine(f.Exception.Message), TaskContinuationOptions.OnlyOnFaulted)
.ContinueWith((f) => Console.Out.WriteLine("Cancelled"), TaskContinuationOptions.OnlyOnCanceled)
.Wait();
But this doesn't work like I hoped for at least two reasons.
The continuations are chained sequentially, so the 2nd ContinueWith receives the task produced by the 1st, that is, the ContinueWith task itself rather than the original task. I realize that the String could be returned onwards, but won't that be a new task with the other info lost?
Since the first condition is not met, the Task is just cancelled, meaning that the later conditions will never be met and the exceptions are lost.
So what do they mean in the docs when they say multiple continuations from the same antecedent?
Is there a proper pattern for this, or do we just have to wrap the calls in try/catch blocks?
EDIT
So I guess this was what I was hoping I could do, note this is a simplified example.
public void ProccessAllTheThings()
{
var theThings = util.GetAllTheThings();
var tasks = new List<Task>();
foreach (var thing in theThings)
{
var task = util.Process(thing)
.ContinueWith((t) => Console.Out.WriteLine($"Finished processing {thing.ThingId} with result {t.Result}"), TaskContinuationOptions.OnlyOnRanToCompletion)
.ContinueWith((t) => Console.Out.WriteLine($"Error on processing {thing.ThingId} with error {t.Exception.Message}"), TaskContinuationOptions.OnlyOnFaulted);
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
Since this wasn't possible, I was thinking I would have to wrap each task call in a try/catch inside the loop, so that a failure wouldn't stop the process, but without waiting on the task there. I wasn't sure what the correct way was.
Sometimes a solution is just staring you in the face. This would work, wouldn't it?
public void ProccessAllTheThings()
{
var theThings = util.GetAllTheThings();
var tasks = new List<Task>();
foreach (var thing in theThings)
{
var task = util.Process(thing)
.ContinueWith((t) =>
{
if (t.Status == TaskStatus.RanToCompletion)
{
Console.Out.WriteLine($"Finished processing {thing.ThingId} with result {t.Result}");
}
else
{
Console.Out.WriteLine($"Error on processing {thing.ThingId} - {t.Exception.Message}");
}
});
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
What you did is to create a sequential chain of multiple tasks.
What you need to do is attach all your continuation tasks to the first one:
var firstTask = TaskThatReturnsString();
var t1 = firstTask.ContinueWith (…);
var t2 = firstTask.ContinueWith (…);
var t3 = firstTask.ContinueWith (…);
Then you need to wait for all the continuation tasks:
Task.WaitAll (t1, t2, t3);
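A fuller sketch of attaching several conditional continuations to the same antecedent is shown below. One detail worth knowing: a continuation whose condition is not met ends up in the Canceled state, so waiting on all of them throws an AggregateException that contains an OperationCanceledException. The class and method names (ContinuationDemo, Observe, Fail) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Threading.Tasks;

static class ContinuationDemo
{
    // Attaches three conditional continuations to the same antecedent and
    // returns the message produced by the one that actually ran.
    public static string Observe(Task<string> firstTask)
    {
        string message = null;

        var onSuccess = firstTask.ContinueWith(
            t => { message = "Result: " + t.Result; },
            TaskContinuationOptions.OnlyOnRanToCompletion);
        var onFaulted = firstTask.ContinueWith(
            t => { message = "Error: " + t.Exception.InnerException.Message; },
            TaskContinuationOptions.OnlyOnFaulted);
        var onCanceled = firstTask.ContinueWith(
            t => { message = "Cancelled"; },
            TaskContinuationOptions.OnlyOnCanceled);

        try
        {
            Task.WaitAll(onSuccess, onFaulted, onCanceled);
        }
        catch (AggregateException ex)
        {
            // Continuations that were skipped are canceled; swallow only those.
            ex.Handle(e => e is OperationCanceledException);
        }
        return message;
    }

    // Hypothetical failing work item.
    static string Fail()
    {
        throw new InvalidOperationException("boom");
    }

    static void Main()
    {
        Console.WriteLine(Observe(Task.Run(() => "hello"))); // prints "Result: hello"
        Console.WriteLine(Observe(Task.Run(() => Fail())));  // prints "Error: boom"
    }
}
```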

Task null using Task.Run and Parallel.For

I have two services that ultimately both update the same object, so we have a test to ensure that the writes to that object complete (Under the hood we have retry policies on each).
9 times out of 10, one or more of the theories will fail, with task.ShouldNotBeNull() always being the assertion that fails. What am I getting wrong with the async code in this sample? Why would the task be null?
[Theory]
[InlineData(1)]
[InlineData(5)]
[InlineData(10)]
[InlineData(20)]
public async Task ConcurrencyIssueTest(int iterations)
{
var orderResult = await _driver.PlaceOrder();
var tasksA = new List<Task<ApiResponse<string>>>();
var tasksB = new List<Task<ApiResponse<string>>>();
await Task.Run(() => Parallel.For(1, iterations,
x =>
{
tasksA.Add(_Api.TaskA(orderResult.OrderId));
tasksB.Add(_Api.TaskB(orderResult.OrderId));
}));
//Check all tasks return successful
foreach (var task in tasksA)
{
task.ShouldNotBeNull();
var result = task.GetAwaiter().GetResult();
result.ShouldNotBeNull();
result.StatusCode.ShouldBe(HttpStatusCode.OK);
}
foreach (var task in tasksB)
{
task.ShouldNotBeNull();
var result = task.GetAwaiter().GetResult();
result.ShouldNotBeNull();
result.StatusCode.ShouldBe(HttpStatusCode.OK);
}
}
}
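There's no need for Task.Run and Parallel looping here. The null entries most likely come from calling List<T>.Add concurrently inside Parallel.For: List<T> is not thread-safe, so concurrent adds can corrupt the list. I'm presuming that your _Api calls are IO bound? You want something more like this: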
var tasksA = new List<Task<ApiResponse<string>>>();
var tasksB = new List<Task<ApiResponse<string>>>();
//fire off all the async tasks
for (var i = 0; i < iterations; i++)
{
tasksA.Add(_Api.TaskA(orderResult.OrderId));
tasksB.Add(_Api.TaskB(orderResult.OrderId));
}
//await the results
await Task.WhenAll(tasksA).ConfigureAwait(false);
foreach (var task in tasksA)
{
//no need for GetAwaiter(), you've awaited above, so Result does not block
var result = task.Result;
}
//to get the most out of async, only await them just before you need them
await Task.WhenAll(tasksB).ConfigureAwait(false);
foreach (var task2 in tasksB)
{
var result2 = task2.Result;
}
This will fire all your API calls asynchronously and then wait while the results return. Your Parallel.For and Task.Run are just using additional thread pool threads to zero benefit.
If _Api were CPU bound you could benefit from Task.Run, but I'm guessing these are web API calls or something similar, so Task.Run does nothing but use an additional thread.
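The same fan-out can also be written without any shared list at all, which removes the thread-safety question entirely. This is a minimal sketch, assuming a hypothetical CallApiAsync that stands in for _Api.TaskA and returns an HTTP-like status code; FanOut and RunAsync are made-up names.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class FanOut
{
    // Hypothetical stand-in for _Api.TaskA(orderId); returns an HTTP-like status code.
    static async Task<int> CallApiAsync(int orderId)
    {
        await Task.Delay(10).ConfigureAwait(false);
        return 200;
    }

    // Starts 'iterations' calls and returns all results once every call has finished.
    public static Task<int[]> RunAsync(int orderId, int iterations)
    {
        // The Select runs on the calling thread only, so nothing is mutated concurrently.
        // Task.WhenAll enumerates the sequence once, which starts every call.
        var calls = Enumerable.Range(0, iterations).Select(_ => CallApiAsync(orderId));
        return Task.WhenAll(calls);
    }

    static void Main()
    {
        var codes = RunAsync(42, 5).GetAwaiter().GetResult();
        Console.WriteLine(codes.All(c => c == 200)); // prints True
    }
}
```

In a test method you would write var codes = await FanOut.RunAsync(...) rather than blocking as Main does here.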
As others have suggested, remove the Parallel.For and await all tasks before asserting on them.
I would also recommend removing .Result from each task and awaiting them instead.
public async Task ConcurrencyIssueTest(int iterations)
{
var orderResult = await _driver.PlaceOrder();
var taskA = _Api.TaskA(orderResult.OrderId);
var taskB = _Api.TaskB(orderResult.OrderId);
await Task.WhenAll(taskA, taskB);
var taskAResult = await taskA;
taskAResult.ShouldNotBeNull();
taskAResult.StatusCode.ShouldBe(HttpStatusCode.OK);
var taskBResult = await taskB;
taskBResult.ShouldNotBeNull();
taskBResult.StatusCode.ShouldBe(HttpStatusCode.OK);
}

How to do parallel.for async methods

What's the best way to do parallel processing in C# with some async methods?
Let me explain with some simple code.
Example scenario: We have a person and 1000 text files from them. We want to check that their text files do not contain sensitive keywords; if any one of the files contains a sensitive keyword, we mark the person as untrusted. The method that checks this is an async method, and as soon as we find one sensitive keyword, further processing is not required and the checking loop must be broken for that person.
For the best performance, we must use parallel processing.
Simple pseudocode:
bool sensitivedetected = false;
Parallel.ForEach(textfilecollection, async (textfile, parallelloopstate) =>
{
if (await hassensitiveasync(textfile))
{
sensitivedetected = true;
parallelloopstate.Break();
}
});
if (sensitivedetected)
markuntrusted(person);
The problem is that Parallel.ForEach doesn't wait for the async tasks to complete, so the statement if (sensitivedetected) runs as soon as the tasks have been created.
I read other questions, such as "write parallel.for with async" and "async/await and Parallel.For", and lots of other pages.
These topics are useful when you need to collect the results of async methods and use them later, but in my scenario the loop should end as soon as possible.
Update: Sample code:
Boolean detected=false;
Parallel.ForEach(UrlList, async (url, pls) =>
{
using (HttpClient hc = new HttpClient())
{
var result = await hc.GetAsync(url);
if ((await result.Content.ReadAsStringAsync()).Contains("sensitive"))
{
detected = true;
pls.Break();
}
}
});
if (detected)
Console.WriteLine("WARNING");
The simplest way to achieve what you need (and not what you want, because threading is evil) is to use Reactive Extensions.
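// needs the Rx namespaces System.Reactive.Linq and System.Reactive.Threading.Tasks
var firstSensitive = await UrlList
.ToObservable() // assumed fix: UrlList is an IEnumerable, so convert it before using the Rx operators
.Select(async url => {
using (var http = new HttpClient())
{
var result = await http.GetAsync(url);
return await result.Content.ReadAsStringAsync();
}
})
.SelectMany(downloadTask => downloadTask.ToObservable())
.Where(result => result.Contains("sensitive"))
.FirstOrDefaultAsync();
if (firstSensitive != null)
Console.WriteLine("WARNING");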
To limit the number of concurrent HTTP queries:
const int concurrentRequestLimit = 4;
var semaphore = new SemaphoreSlim(concurrentRequestLimit);
var firstSensitive = await UrlList
.ToObservable() // assumed fix: UrlList is an IEnumerable, so convert it before using the Rx operators
.Select(async url => {
await semaphore.WaitAsync();
try
{
using (var http = new HttpClient())
{
var result = await http.GetAsync(url);
return await result.Content.ReadAsStringAsync();
}
}
finally
{
semaphore.Release();
}
})
.SelectMany(downloadTask => downloadTask.ToObservable())
.Where(result => result.Contains("sensitive"))
.FirstOrDefaultAsync();
if (firstSensitive != null)
Console.WriteLine("WARNING");
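If adding Rx is not an option, a similar "stop at the first hit" behaviour can be sketched with Task.WhenAny and a CancellationTokenSource. Note that this is a different technique from the Rx approach in the answer, and all names here (SensitiveScan, AnyAsync, the check delegate) are hypothetical. Calling Task.WhenAny in a loop is quadratic in the number of tasks, which is usually acceptable for a few thousand items but is worth keeping in mind.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class SensitiveScan
{
    // Returns true as soon as any check reports a hit and cancels the remaining checks.
    public static async Task<bool> AnyAsync<T>(
        IEnumerable<T> items,
        Func<T, CancellationToken, Task<bool>> check)
    {
        using (var cts = new CancellationTokenSource())
        {
            // ToList() starts every check exactly once.
            var pending = items.Select(i => check(i, cts.Token)).ToList();
            while (pending.Count > 0)
            {
                var finished = await Task.WhenAny(pending).ConfigureAwait(false);
                pending.Remove(finished);

                // Awaiting the finished task rethrows if that check failed.
                if (await finished.ConfigureAwait(false))
                {
                    cts.Cancel(); // ask the checks still running to stop
                    return true;
                }
            }
            return false;
        }
    }

    static void Main()
    {
        var texts = new[] { "harmless", "contains sensitive data", "also harmless" };
        bool detected = AnyAsync(texts, async (text, ct) =>
        {
            await Task.Delay(10, ct).ConfigureAwait(false); // simulated I/O
            return text.Contains("sensitive");
        }).GetAwaiter().GetResult();

        Console.WriteLine(detected ? "WARNING" : "clean"); // prints WARNING
    }
}
```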

How to check that all tasks have been properly completed?

I have the following lines in my code:
var taskA = Task.Factory.StartNew(WorkA);
var taskB = Task.Factory.StartNew(WorkB);
var allTasks = new[] { taskA, taskB };
Task.Factory.ContinueWhenAll(allTasks, tasks => FinalWork(), TaskContinuationOptions.OnlyOnRanToCompletion);
But when I run this, I get the following error:
It is invalid to exclude specific continuation kinds for continuations off of multiple tasks.
Which is caused by the option TaskContinuationOptions.OnlyOnRanToCompletion.
My question is how to check that all tasks have done their work properly (all tasks statuses are RanToCompletion) and then do FinalWork()?
In the meantime, the application performs other tasks.
Based on @Peter Ritchie's and @Ben McDougall's answers, I found a solution. I modified my code by removing the redundant variable tasks and TaskContinuationOptions.OnlyOnRanToCompletion:
var taskA = Task.Factory.StartNew(WorkA);
var taskB = Task.Factory.StartNew(WorkB);
var allTasks = new[] { taskA, taskB };
Task.Factory.ContinueWhenAll(allTasks, FinalWork);
Where FinalWork is:
private static void FinalWork(Task[] tasks)
{
if (tasks.All(t => t.Status == TaskStatus.RanToCompletion))
{
// do "some work"
}
}
If all tasks have status RanToCompletion, "some work" will be done. It will be performed immediately after all tasks have completed and will not block the main task.
If I cancel at least one of the tasks, nothing will be done.
Alternatively, you can do this:
var taskA = Task.Factory.StartNew(WorkA);
var taskB = Task.Factory.StartNew(WorkB);
var allTasks = new[] { taskA, taskB };
var continuedTask = Task.WhenAll(allTasks).ContinueWith((antecedent) => { /*Do Work*/ }, TaskContinuationOptions.OnlyOnRanToCompletion);
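The same idea can be expressed with await instead of ContinueWith: awaiting Task.WhenAll throws if any task faulted or was canceled, so the final step only runs when every task ran to completion. The sketch below uses hypothetical names (FinalWorkDemo, RunThenFinalizeAsync) and is not taken from the question.

```csharp
using System;
using System.Threading.Tasks;

static class FinalWorkDemo
{
    // Runs finalWork only if every task ran to completion; returns whether it ran.
    public static async Task<bool> RunThenFinalizeAsync(Action finalWork, params Task[] tasks)
    {
        try
        {
            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
        catch (Exception)
        {
            // Task.WhenAll faults or cancels if any task did, so skip the final step.
            return false;
        }
        finalWork();
        return true;
    }

    static void Main()
    {
        var taskA = Task.Run(() => Console.WriteLine("WorkA"));
        var taskB = Task.Run(() => Console.WriteLine("WorkB"));

        bool ran = RunThenFinalizeAsync(
            () => Console.WriteLine("FinalWork"), taskA, taskB).GetAwaiter().GetResult();
        Console.WriteLine(ran); // prints True
    }
}
```

The caller is not blocked while the tasks run as long as it awaits (or simply keeps) the returned task instead of calling GetResult as Main does here.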
You haven't provided any code that does anything with any of the tasks that ran to completion (your tasks variable is ignored). You'd get the same result if you simply removed TaskContinuationOptions.OnlyOnRanToCompletion. i.e. If you could use ContinueWhenAll with TaskContinuationOptions.OnlyOnRanToCompletion your continuation isn't going to be called until all tasks have either completed or failed. If you don't do anything with just the completed tasks, that's the same as Task.Factory.ContinueWhenAll(allTasks, tasks => FinalWork());
If there's something more specific that you want to do, please provide the details so that someone might be able to help you.
To answer the actual question you posed:
My question is how to check that all tasks have done their work properly (all tasks statuses are RanToCompletion) and then do FinalWork()? In the meantime, the application performs other tasks.
At least, that is what I read as the question. Check the following code:
var taskA = Task.Factory.StartNew(WorkA);
var taskB = Task.Factory.StartNew(WorkB);
var allTasks = new[] { taskA, taskB };
taskA.Wait();
taskB.Wait();
if (taskA.Status == TaskStatus.RanToCompletion && taskB.Status == TaskStatus.RanToCompletion)
Task.Factory.ContinueWhenAll(allTasks, tasks => FinalWork());
else
//do something
If that is what you meant, you actually answered the question yourself in asking it.
