I was watching a video called Becoming a C# Time Lord and at 0:35:36 this code popped up:
async Task<TResult[]> PurelyWhenAll<TResult> (params Task<TResult>[] tasks)
{
var killJoy = new TaskCompletionSource<TResult[]>();
foreach ( var task in tasks )
task.ContinueWith(ant =>
{
if ( ant.IsCanceled )
killJoy.TrySetCanceled();
else if ( ant.IsFaulted )
killJoy.TrySetException(ant.Exception.InnerException);
});
return await await Task.WhenAny(killJoy.Task, Task.WhenAll(tasks));
}
Does this mean that a task returns a task and because of that we have double await? If that is the case what happens concerning performance if we have more than two awaits? Is this good practice, should this be avoided?
Task.WhenAny is going to return a Task<Task<TResult>>:
Awaiting the result of Task.WhenAny() will return the first task that completed
Awaiting that task will return the results of the task, i.e. a TResult[].
You might find it easy to understand with explanatory variables:
var firstCompletedTask = await Task.WhenAny(killJoy.Task, Task.WhenAll(tasks));
var firstResult = await firstCompletedTask;
return firstResult;
It's not clear why you're concerned around the performance of this - it's just two await expressions, not particularly different to any other method with two await expressions.
It's pretty natural to do this when using Task.WhenAny<TResult>(Task<TResult>[]), given that the return type is a Task<Task<TResult>>.
You don't have to worry about the performance of two awaits on the same task. If it has already completed, you just get the value returned without the task running again.
Related
After browsing different await vs Task.Run questions on SO my takeaway is that await is better for I/O operations and Task.Run for CPU-intensive operations. However the code with await seems to always be longer than Task.Run. Below is an example method of how I am creating a context object with await in my app:
public async AppContext CreateAppContext()
{
var context = new AppContext();
var customersTask = _dataAccess.GetCustomers();
var usersTask = _dataAccess.GetUsers();
var reportsTask = _dataAccess.GetReports();
context.Customers = await customersTask;
context.Users = await usersTask;
context.Reports = await reportsTask;
return context;
}
If I was to rewrite this with Task.Run I could do
public async AppContext CreateAppContext()
{
var context = new AppContext();
await Task.WhenAll(new[]
{
Task.Run(async () => { context.Customers = await _dataAccess.GetCustomers(); }),
Task.Run(async () => { context.Users = await _dataAccess.GetUsers(); }),
Task.Run(async () => { context.Reports = await _dataAccess.GetReports(); });
})
return context;
}
The difference is not major when I create an object with 3 properties but I have objects where I need to initialize 20+ properties in this manner which makes the await code a lot longer (nearly double) than Task.Run. Is there a way for me to initialize the object using await with code that is not a lot longer than what I can do with Task.Run?
Personally, I prefer an asynchronous factory pattern over that kind of code; but if you need to do concurrent asynchronous work like that multiple times, then you'll probably want to write your own helper method.
The BCL-provided WhenAll works best when either all tasks have no results, or when all tasks have the same type of result. One fairly common approach to help WhenAll work with different types of tasks is to return a tuple of results, which can then be deconstructed into different variables if desired. E.g.,
public static class TaskEx
{
public static async Task<(T1, T2, T3)> WhenAll<T1, T2, T3>(Task<T1> task1, Task<T2> task2, Task<T3> task3)
{
await Task.WhenAll(task1, task2, task3);
return (await task1, await task2, await task3);
}
}
// Usage
public async AppContext CreateAppContext()
{
var context = new AppContext();
(context.Customers, context.Users, conext.Reports) =
await TaskEx.WhenAll(
_dataAccess.GetCustomers(),
_dataAccess.GetUsers(),
_dataAccess.GetReports());
return context;
}
You can even define a tuple GetAwaiter extension method if you want, to make it more implicit:
// Usage
public async AppContext CreateAppContext()
{
var context = new AppContext();
(context.Customers, context.Users, conext.Reports) =
await (
_dataAccess.GetCustomers(),
_dataAccess.GetUsers(),
_dataAccess.GetReports());
return context;
}
There are a couple of disadvantages to these approaches, though. First, you have to define as many overloads as you need. Second, the multiple-assignment code is not very nice; it's fine for 2 or 3 properties, but would get ugly (IMO) if done for much more than that.
So I think what you really want is a custom delegate form of WhenAll. Something like this should work:
public static class TaskEx
{
public static async Task WhenAll(params Func<Task>[] tasks)
{
return Task.WhenAll(tasks.Select(action => action()));
}
}
// Usage
public async AppContext CreateAppContext()
{
var context = new AppContext();
await TaskEx.WhenAll(
async () => context.Customers = await _dataAccess.GetCustomers(),
async () => context.Users = await _dataAccess.GetUsers(),
async () => conext.Reports = await _dataAccess.GetReports());
return context;
}
Since this solution avoids dealing with the different result types entirely, multiple overloads aren't needed.
If you really want to keep your general pattern (which I'd avoid - it would be much better to do all the work and then assign all the results at the same time; look into the return value of Task.WhenAll), all you need is a simple helper method:
static async Task Assign<T>(Action<T> assignment, Func<Task<T>> getValue)
=> assignment(await getValue());
Then you can use it like this:
await Task.WhenAll
(
Assign(i => context.Customers = i, _dataAccess.GetCustomers),
Assign(i => context.Users = i, _dataAccess.GetUsers),
Assign(i => context.Reports = i, _dataAccess.GetReports)
);
There's many other ways to make this even simpler, but this is the most equivalent to your Task.Run code without having to involve another thread indirection just to do an assignment. It also avoids the very common mistake when you happen to use the wrong Task.Run overload and get a race condition (as Task.Run returns immediately instead of waiting for the result).
Also, you misunderstood the "await vs. Task.Run" thing. There's actually not that much difference between await and Task.Run in your code - mainly, it forces a thread switch (and a few other subtle things). The argument is against using Task.Run to run synchronous code; that wastes a thread waiting for a thing to complete, your code doesn't.
Do keep in mind that WhenAll comes with its own complications, though. While it does mean you don't have to worry about some of the tasks ending up unobserved (and not waited on!), it also means you have to completely rethink your exception handling, since you're going to get an AggregateException rather than anything more specific. If you're relying on error handling based on identifying exceptions, you need to be very careful. Usually, you don't want AggregateException to leak out of methods - it's very difficult to handle in a global manner; the only method that knows the possibilities of what can happen is the method that calls the WhenAll. Hopefully.
It's definitely a good idea to run parallel operations like this in a way that cannot produce dangerous and confusing side-effects. In your code, you either get a consistent object returned, or you get nothing - that's exactly the right approach. Be wary of this approach leaking into other contexts - it can get really hard to debug issues where randomly half of the operations succeed and other half fails :)
Is there a way for me to initialize the object using await with code that is not a lot longer than what I can do with Task.Run?
If you want to run all tasks in parallel - in short no, you cant shorten number of lines. Also note that those two snippets are not fully functionally equivalent - see Why should I prefer single await Task.WhenAll over multiple awaits.
You can simplify (and maybe even improve performance a bit) your Task.WhenAll approach by introducing a method which will await and assign. Something along these lines:
public async AppContext CreateAppContext()
{
var context = new AppContext();
await Task.WhenAll(
AwaitAndAssign(val => context.Customers = val, _dataAccess.GetCustomers()),
AwaitAndAssign(val => context.Users = val, _dataAccess.Users()),
AwaitAndAssign(val => context.Reports = val, _dataAccess.GetReports())
);
return context;
async Task AwaitAndAssign<T>(Action<T> assign, Task<T> valueTask) =>
assign(await valueTask);
}
I have a situation where I am interested in the first successful response from an array of services that each support the method
Task<Try<SearchResponse>> PerformSearch(SearchRequest request);
The Try class is a container for a Good/Bad result (like error Monad)
The call to the list of services currently is this
var searchResponses = await Task.WhenAll(
_searchServices.Select(s => s.PerformSearch(request)));
return searchResponses.FirstOrBad(sr=>sr.IsGood);
Where FirstOrBad is an extension method that finds the first good result or returns a composite Bad Try with a concatenation of all the errors.
As far as I understand the problem with this is that due to the WhenAll the time to find the first good result is limited by the slowest response.
I want to continue execution as soon as I receive the first positive result but not the first (2nd ... etc) result if it is not successful, but also continue execution if all results return unsuccessfully, reporting the lack of success.
I would have thought this is a common problem but have found little when searching for examples. It maybe known by some other term than scatter gather.
Something like this should work for you
public static async Task<Try<T>> FirstOrBad<T>(this IEnumerable<Task<Try<T>>> tasks, Func<Try<T>, bool> predicate)
{
var taskList = tasks.ToList();
var completed = new List<Task<Try<T>>>();
Task<Try<T>> completedTask;
do
{
completedTask = await Task.WhenAny(taskList);
completed.Add(completedTask);
taskList.Remove(completedTask);
} while (!predicate(await completedTask) && taskList.Any());
return !predicate(await completedTask) ? new Try<T>(completed.ToString(",")) : await completedTask;
}
Adapter from this answer TPL wait for task to complete with a specific return value
I understand that calling task.Result in an async method can lead to deadlocks. I have a different twist on the question, though...
I find myself doing this pattern a lot. I have several tasks that return the same types of results, so I can await on them all at once. I want to process the results separately, though:
Task<int> t1 = m1Async();
Task<int> t2 = m2Async();
await Task.WhenAll(t1, t2);
Is it ok to call Result here, since I know the tasks are now completed?
int result1 = t1.Result;
int result2 = t2.Result;
Or, should I use await still...it just seems redundant and can be a bit uglier depending on how I need to process the results:
int result1 = await t1;
int result2 = await t2;
Update: Someone marked my question as a duplicate of this one: Awaiting multiple Tasks with different results.
The question is different, which is why I didn't find it in my searches, though one of the detailed answers there does answer may question, also.
There's nothing inherently wrong or bad about using t1.Result after you've already done an await, but you may be opening yourself up to future issues. What if someone changes the code at the beginning of your method so you can no longer be positive the Tasks have completed successfully? And what if they don't see your code further down that makes this assumption?
Seems to me that it might be better to use the returned value from your first await.
Task<int> t1 = m1Async();
Task<int> t2 = m2Async();
var results = await Task.WhenAll(t1, t2);
int result1 = results[0];
int result2 = results[1];
That way, if someone messes with the first await, there's a natural connection for them to follow to know that your code later is dependent on its result.
You may also want to consider whether Task.WhenAll() is really giving you any value here. Unless you're hoping to tell the difference between one task failing and both failing, it might just be simple to await the tasks individually.
Task<int> t1 = m1Async();
Task<int> t2 = m2Async();
int result1 = await t1;
int result2 = await t2;
The doc says that Task.Result it is equivalent to calling the Wait method.
And when Wait is called
“What does Task.Wait do?”
... If the Task ran to completion, Wait will return successfully.
(from https://blogs.msdn.microsoft.com/pfxteam/2009/10/15/task-wait-and-inlining/)
So you can assume that when you call Task.Result, it will return successfully not leading to the deadlocks you mentioned.
Just want to know what is the order of the task result when using WhenAll and ContinueWith.
Are these results gurenteed to be in the same order with task Id?
I wrote below code
public static async Task<string> PrintNumber(int number)
{
return await Task.Delay(number*1000).ContinueWith(_=>
{
Console.WriteLine(number);return "TaskId:"+Task.CurrentId+" Result:"+number;
});
}
public static void Main()
{
Task.WhenAll(new[]
{
PrintNumber(3),
PrintNumber(2),
PrintNumber(1),
}).ContinueWith((antecedent) =>
{
foreach(var a in antecedent.Result)
{
Console.WriteLine(a);
}
});
}
and run that several times in linqpad getting the same result
1
2
3
TaskId:15 Result:3
TaskId:14 Result:2
TaskId:13 Result:1
or
1
2
3
TaskId:18 Result:3
TaskId:17 Result:2
TaskId:16 Result:1
With that specific invocation, the argument of a Task[] -- the order is not guaranteed.
In fact, according to the Task.WhenAll(Task[]) documentation there is no mention of order whatsoever. But if you use the Task.WhenAll(IEnumerable<Task<TResult>>) overload it reads as follows:
If none of the tasks faulted and none of the tasks were canceled, the resulting task will end in the RanToCompletion state. The Result of the returned task will be set to an array containing all of the results of the supplied tasks in the same order as they were provided (e.g. if the input tasks array contained t1, t2, t3, the output task's Result will return an TResult[] where arr[0] == t1.Result, arr1 == t2.Result, and arr[2] == t3.Result).
When you call Task.WhenAll (with either an enumerable or a params array), the order of the results match the order of the tasks passed to that method.
That is to say, this is true:
var task1 = PrintNumber(3);
var task2 = PrintNumber(2);
var task3 = PrintNumber(1);
var taskResults = await Task.WhenAll(task1, task2, task3);
// taskResults[0] is the same as task1.Result
// taskResults[1] is the same as task2.Result
// taskResults[2] is the same as task3.Result
However, ContinueWith is an entirely different story. ContinueWith attaches a continuation, and this continuation will run sometime after the task completes.
In your particular code, you're not attaching a continuation to the task passed to Task.WhenAll. But if you were, then that continuation could run anytime after that task completed.
On a side note, don't use ContinueWith (as I explain on my blog). Just use await instead; the resulting code is more correct, cleaner, and easier to maintain.
I asked a question yesterday and, unfortunately, even with the answers provided, I'm still hitting up on stumbling blocks about how to do things correctly... My issue is that my code actually works, but I'm a complete novice at concurrency programming and it feels like I'm not programming the correct way and, most importantly, I'm afraid of developing bad habits.
To make up a simplistic example to elaborate on yesterday's question, suppose I had the following methods:
static Task<IEnumerable<MyClass>> Task1(CancellationToken ct)
static Task<IEnumerable<int>> Task2(CancellationToken ct, List<string> StringList)
static Task<IEnumerable<String>> Task3(CancellationToken ct)
static Task<IEnumerable<Double>> Task4(CancellationToken ct)
static Task Task5(CancellationToken ct, IEnumerable<int> Task2Info, IEnumerable<string> Task3Info, IEnumerable<double> Task4Info)
static Task Task6(CancellationToken ct, IEnumerable<int> Task2Info, IEnumerable<MyClass> Task1Info)
And the code I've written that utilizes them looks as follows:
static Task execute(CancellationToken ct)
{
IEnumerable<MyClass> Task1Info = null;
List<string> StringList = null;
IEnumerable<int> Task2Info = null;
IEnumerable<string> Task3Info = null;
IEnumerable<double> Task4Info = null;
var TaskN = Task.Run(() =>
{
Task1Info = Task1(ct).Result;
}
, ct)
.ContinueWith(res =>
{
StringList = Task1Info.Select(k=> k.StringVal).ToList();
Task2Info = Task2(ct, StringList).Result;
}
, ct);
return Task.Run(() =>
{
return Task.WhenAll
(
TaskN,
Task.Run(() => { Task3Info = Task3(ct).Result; }, ct),
Task.Run(() => { Task4Info = Task4(ct).Result; }, ct)
)
.ContinueWith(res =>
{
Task5(ct, Task2Info, Task3Info, Task4Info).Wait();
}
, ct)
.ContinueWith(res =>
{
Task6(ct, Task2Info, Task1Info).Wait();
}
, ct);
});
}
In other words:
I need the results of Task1 to calculate StringList and to run Task2
Task2, Task3 and Task4 can all run concurrently
I need the return values from all of the above for later method calls
Once these are run, I use their results to run Task5
Once Task5 is run, I use all the results in running Task6
As a simple explanation, imagine the first portion is data gathering, the second is data cleansing and the third data reporting
Like I said, my challenge is that this actually runs, but I simply feel that it's more of a "hack" that the right way to program - Concurrency programming is very new to me and I definitely want to learn the best ways this should be done...
I would feel better about my answer if I could see how your TaskN methods were implemented. Specifically, I would want to validate the need for your TaskN method calls to be wrapped inside calls to Task.Run() if they are already returning a Task return value.
But personally, I would just use the async-await style of programming. I find it fairly easy to read/write. Documentation can be found here.
I would then envision your code looking something like this:
static async Task execute(CancellationToken ct)
{
// execute and wait for task1 to complete.
IEnumerable<MyClass> Task1Info = await Task1(ct);
List<string> StringList = Task1Info.Select(k=> k.StringVal).ToList();
// start tasks 2 through 4
Task<IEnumerable<int>> t2 = Task2(ct, StringList);
Task<IEnumerable<string>> t3 = Task3(ct);
Task<IEnmerable<Double>> t4 = Task4(ct);
// now that tasks 2 to 4 have been started,
// wait for all 3 of them to complete before continuing.
IEnumerable<int> Task2Info = await t2;
IEnumerable<string> Task3Info = await t3;
IEnumerable<Double> Task4Info = await t4;
// execute and wait for task 5 to complete
await Task5(ct, Task2Info, Task3Info, Task4Info);
// finally, execute and wait for task 6 to complete
await Task6(ct, Task2Info, Task1Info);
}
I don't fully understand the code in question but it is very easy to link tasks by dependency. Example:
var t1 = ...;
var t2 = ...;
var t3 = Task.WhenAll(t1, t2).ContinueWith(_ => RunT3(t1.Result, t2.Result));
You can express an arbitrary dependency DAG like that. This also makes it unnecessary to store into local variables, although you can do that if you need.
That way you also can get rid of the awkward Task.Run(() => { Task3Info = Task3(ct).Result; }, ct) pattern. This is equivalent to Task3(ct) except for the store to the local variable which probably should not exist in the first place.
.ContinueWith(res =>
{
Task5(ct, Task2Info, Task3Info, Task4Info).Wait();
}
This probably should be
.ContinueWith(_ => Task5(ct, Task2Info, Task3Info, Task4Info)).Unwrap()
Does that help? Leave a comment.
it feels like I'm not programming the correct way and, most importantly, I'm afraid of developing bad habits.
I definitely want to learn the best ways this should be done...
First, distinguish between asynchronous and parallel, which are two different forms of concurrency. An operation is a good match for "asynchronous" if it doesn't need a thread to do its work - e.g., I/O and other logical operations where you wait for something like timers. "Parallelism" is about splitting up CPU-bound work across multiple cores - you need parallelism if your code is maxing out one CPU at 100% and you want it to run faster by using other CPUs.
Next, follow a few guidelines for which APIs are used in which situations. Task.Run is for pushing CPU-bound work to other threads. If your work isn't CPU-bound, you shouldn't be using Task.Run. Task<T>.Result and Task.Wait are even more on the parallel side; with few exceptions, they should really only be used for dynamic task parallelism. Dynamic task parallelism is extremely powerful (as #usr pointed out, you can represent any DAG), but it's also low-level and awkward to work with. There are almost always better approaches. ContinueWith is another example of an API that is for dynamic task parallelism.
Since your method signatures return Task, I'm going to assume that they're naturally asynchronous (meaning: they don't use Task.Run, and are probably implemented with I/O). In that case, Task.Run is the incorrect tool to use. The proper tools for asynchronous work are the async and await keywords.
So, taking your desired behavior one at a time:
// I need the results of Task1 to calculate StringList and to run Task2
var task1Result = await Task1(ct);
var stringList = CalculateStringList(task1Result);
// Task2, Task3 and Task4 can all run concurrently
var task2 = Task2(ct, stringList);
var task3 = Task3(ct);
var task4 = Task4(ct);
await Task.WhenAll(task2, task3, task4);
// I need the return values from all of the above for later method calls
var task2Result = await task2;
var task3Result = await task3;
var task4Result = await task4;
// Once these are run, I use their results to run Task5
await Task5(ct, task2Result, task3Result, task4Result);
// Once Task5 is run, I use all the results in running Task6
await Task6(ct, task2Result, task1Result);
For more about what APIs are appropriate in which situations, I have a Tour of Task on my blog, and I cover a lot of concurrency best practices in my book.