One of the differences between Task.Run and TaskFactory.StartNew is the addition of the new DenyChildAttach option by default. But AttachedToParent was created for a reason. In what situation would you want to use attached child tasks?
The use case for AttachedToParent is when you have a nested dynamic task parallelism scenario. That is:
You have an algorithm to execute in parallel (that is, CPU code, not I/O operations), and
The tasks your algorithm must perform vary while the algorithm is running (that is, at the time your algorithm starts, it doesn't know how many tasks it needs), and
There is a hierarchical or parent/child relationship in the completion of these tasks (that is, the "parent" should not be considered complete until all its children are complete, and if any children are failed/canceled, then the parent should be failed/canceled as well, even if its code did not error).
Since the vast majority of concurrency problems are I/O-based (not CPU-based), and since the vast majority of parallelism scenarios are data-based parallelism (not dynamic task parallelism), and since dynamic task parallel problems may or may not have a hierarchical nature, this scenario almost never comes up.
Unfortunately, there is also a logical parent/child relationship between tasks (including asynchronous tasks), which has led a lot of developers to incorrectly attempt to use the AttachedToParent flag with async tasks. Hence the introduction of the DenyChildAttach flag (which prevents AttachedToParent from taking effect).
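A minimal sketch of the attach behavior described above, assuming a plain console program. The names AttachDemo and ChildFinishedBeforeParent are made up for illustration. The parent's body only launches a child that requests AttachedToParent, and the method reports whether the child had finished by the time the parent completed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AttachDemo
{
    // Returns true if the child had finished by the time the parent task completed.
    public static bool ChildFinishedBeforeParent(TaskCreationOptions parentOptions)
    {
        bool childFinished = false;

        Task parent = Task.Factory.StartNew(() =>
        {
            // The child asks to attach; whether that takes effect depends on parentOptions.
            Task.Factory.StartNew(() =>
            {
                Thread.Sleep(200);
                Volatile.Write(ref childFinished, true);
            }, TaskCreationOptions.AttachedToParent);
        }, parentOptions);

        parent.Wait();
        return Volatile.Read(ref childFinished);
    }

    public static void Main()
    {
        // Parent allows attachment: it does not complete until the child does.
        Console.WriteLine(ChildFinishedBeforeParent(TaskCreationOptions.None));

        // Parent denies attachment (what Task.Run does): it completes on its own.
        Console.WriteLine(ChildFinishedBeforeParent(TaskCreationOptions.DenyChildAttach));
    }
}
```

With TaskCreationOptions.None the parent waits for the attached child, so the first call returns true. With DenyChildAttach the child runs detached, so the second call is expected to return false because the parent finishes long before the 200 ms sleep ends.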
The only situation I can think of is made obsolete by async-await. E.g. if you want a task to wait on its child tasks you will await them.
var parentTask = Task.Run(async () =>
{
await Task.Run(() => Thread.Sleep(1000));
Console.WriteLine("parent task completed");
});
But in .NET 4.0 you would have to Wait() on them. E.g.
var parentTask = Task.Factory.StartNew(() =>
{
Task.Factory.StartNew(() => Thread.Sleep(1000)).Wait();
Console.WriteLine("parent task completed");
});
Unlike the first example this will block the thread until the child task is complete. With attached tasks we can get the same behaviour like this.
var parentTask = Task.Factory.StartNew(() =>
{
Task.Factory.StartNew(() => Thread.Sleep(1000), TaskCreationOptions.AttachedToParent)
.ContinueWith(antecedent => Console.WriteLine("parent task completed"), TaskContinuationOptions.AttachedToParent);
});
I love PLINQ. In all of my test cases on various parallelization patterns, it performs well and consistently. But I recently ran into a question that has me a little bothered: what is the functional difference between these two examples? What, hopefully if anything, qualifies the PLINQ example to not be similar or equivalent to the following anti-pattern?
PLINQ:
public int PLINQSum()
{
return Enumerable.Range(0, N)
.AsParallel()
.Select((x) => x + 1)
.Sum();
}
Sync over async:
public int AsyncSum()
{
var tasks = Enumerable.Range(0, N)
.Select((x) => Task.Run(() => x + 1));
return Task.WhenAll(tasks).Result.Sum();
}
The AsyncSum method is not an example of Sync over Async. It is an example of using the Task.Run method with the intention of parallelizing a calculation. You might think that Task = async, but it's not. The Task class was introduced with the .NET Framework 4.0 in 2010, as part of the Task Parallel Library, two years before the advent of the async/await technology with the .NET Framework 4.5 in 2012.
What is Sync over Async: We use this term to describe a situation where an asynchronous API is invoked and then waited on synchronously, causing a thread to be blocked until the asynchronous operation completes. It is implied that the asynchronous API has a truly asynchronous implementation, meaning that it uses no thread while the operation is in flight. Most, but not all, of the asynchronous APIs that are built into the .NET platform have truly asynchronous implementations.
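To make the definition concrete, here is a small sketch that uses Task.Delay as a stand-in for a truly asynchronous operation (it is timer-based and holds no thread while pending). The class and method names are illustrative only.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncOverAsyncDemo
{
    // Sync over async: the calling thread is blocked for the whole delay,
    // even though the delay itself needs no thread.
    public static void BlockingWait()
    {
        Task.Delay(100).Wait();
    }

    // Async all the way: the method returns an incomplete task at the await,
    // and the calling thread is free to do something else.
    public static async Task NonBlockingWaitAsync()
    {
        await Task.Delay(100);
    }

    public static async Task Main()
    {
        BlockingWait();

        Task pending = NonBlockingWaitAsync();
        Console.WriteLine(pending.IsCompleted); // the delay is still in flight here
        await pending;
    }
}
```

By contrast, the AsyncSum method in the question wraps synchronous CPU work in Task.Run, so there is no truly asynchronous operation for it to block on.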
The two examples in your question are technically different, but not because one of them is Sync over Async. Neither of them is. Both are parallelizing a synchronous operation (the mathematical addition x + 1) that cannot be performed without utilizing the CPU. And when we use the CPU, we use a thread.
Characterizing the AsyncSum method as anti-pattern might be fair, but not because it is Sync over Async. You might want to call it anti-pattern because:
It allocates and schedules a Task for each number in the sequence, incurring a gigantic overhead compared to the tiny computational work that has to be performed.
It saturates the ThreadPool for the whole duration of the parallel operation.
It forces the ThreadPool to create additional threads, resulting in oversubscription (more threads than CPUs). This results in the operating system having more work to do (switching between threads).
It has bad behavior in case of exceptions. Instead of stopping the operation as soon as possible after an error has occurred, it will invoke the lambda invariably for all elements in the sequence. As a result you'll have to wait for longer until you observe the error, and finally you might observe a huge number of errors.
It doesn't utilize the current thread. The current thread is blocked doing nothing, while all the work is done by ThreadPool threads. In comparison the PLINQ utilizes the current thread as one of its worker threads. This is something that you could also do manually, by creating some of the tasks with the Task constructor (instead of Task.Run), and then use the RunSynchronously method in order to run them on the current thread, while the rest of the tasks are scheduled on the ThreadPool.
var task1 = new Task<int>(() => 1 + 1); // Cold task
var task2 = Task.Run(() => 2 + 2); // Hot task scheduled on the ThreadPool
task1.RunSynchronously(); // Run the cold task on the current thread
int sum = Task.WhenAll(task1, task2).Result.Sum(); // Wait both tasks
The name AsyncSum itself is inappropriate, since there is nothing asynchronous happening inside this method. A better name could be WhenAll_TaskRun_Sum.
If we fill a list of Tasks that need to do both CPU-bound and I/O bound work, by simply passing their method declaration to that list (Not by creating a new task and manually scheduling it by using Task.Start), how exactly are these tasks handled?
I know that they are not done in parallel, but concurrently.
Does that mean that a single thread will move along them, and that single thread might not be the same thread in the thread pool, or the same thread that initially started waiting for them all to complete/added them to the list?
EDIT: My question is about how exactly these items are handled in the list concurrently - is the calling thread moving through them, or something else is going on?
Code for those that need code:
public async Task SomeFancyMethod(int i)
{
doCPUBoundWork(i);
await doIOBoundWork(i);
}
//Main thread
List<Task> someFancyTaskList = new List<Task>();
for (int i = 0; i< 10; i++)
someFancyTaskList.Add(SomeFancyMethod(i));
// Do various other things here --
// how are the items handled in the meantime?
await Task.WhenAll(someFancyTaskList);
Thank you.
Asynchronous methods always start running synchronously. The magic happens at the first await. When the await keyword sees an incomplete Task, it returns its own incomplete Task. If it sees a complete Task, execution continues synchronously.
So at this line:
someFancyTaskList.Add(SomeFancyMethod(i));
You're calling SomeFancyMethod(i), which will:
Run doCPUBoundWork(i) synchronously.
Run doIOBoundWork(i).
If doIOBoundWork(i) returns an incomplete Task, then the await in SomeFancyMethod will return its own incomplete Task.
Only then will the returned Task be added to your list and your loop will continue. So the CPU-bound work is happening sequentially (one after the other).
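The steps above can be checked with a small sketch. It assumes a counter increment as a stand-in for doCPUBoundWork and Task.Delay as a stand-in for doIOBoundWork. The class name and the counter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SyncStartDemo
{
    private static int syncPartsRun;

    private static async Task SomeFancyMethod(int i)
    {
        syncPartsRun++;        // stand-in for doCPUBoundWork(i): runs on the caller's thread
        await Task.Delay(50);  // stand-in for doIOBoundWork(i): incomplete, so control returns
    }

    // Returns how many synchronous parts had already run when the loop finished.
    public static async Task<int> CountSyncPartsBeforeWhenAll()
    {
        syncPartsRun = 0;
        var tasks = new List<Task>();
        for (int i = 0; i < 10; i++)
            tasks.Add(SomeFancyMethod(i));

        int seenBeforeAwait = syncPartsRun; // read before awaiting anything
        await Task.WhenAll(tasks);
        return seenBeforeAwait;
    }

    public static async Task Main()
    {
        Console.WriteLine(await CountSyncPartsBeforeWhenAll()); // prints 10
    }
}
```

All ten synchronous parts have run, one after the other on the calling thread, before the loop ends. Only the delays overlap.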
There is some more reading about this here: Control flow in async programs (C#)
As each I/O operation completes, the continuations of those tasks are scheduled. How those are done depends on the type of application - particularly, if there is a context that it needs to return to (desktop and ASP.NET do unless you specify ConfigureAwait(false), ASP.NET Core doesn't). So they might run sequentially on the same thread, or in parallel on ThreadPool threads.
If you want to immediately move the CPU-bound work to another thread to run that in parallel, you can use Task.Run:
someFancyTaskList.Add(Task.Run(() => SomeFancyMethod(i)));
If this is in a desktop application, then this would be wise, since you want to keep CPU-heavy work off of the UI thread. However, then you've lost your context in SomeFancyMethod, which may or may not matter to you. In a desktop app, you can always marshal calls back to the UI thread fairly easily.
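One caveat if the Task.Run line is dropped into the question's for loop as-is: a C# for loop has a single variable i shared by all iterations, and the lambda captures that variable rather than its current value. Because Task.Run defers execution, the lambda may read i after the loop has moved on. A hedged sketch of the usual fix follows, with SomeFancyMethod changed to return its argument so the result can be inspected (that return value is an assumption for the demo, not part of the question's code).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class CaptureDemo
{
    // Demo variant of SomeFancyMethod that echoes its argument.
    private static async Task<int> SomeFancyMethod(int i)
    {
        await Task.Delay(10);
        return i;
    }

    public static async Task<int[]> RunAll()
    {
        var tasks = new List<Task<int>>();
        for (int i = 0; i < 10; i++)
        {
            int copy = i; // fresh variable per iteration, so each lambda sees its own value
            tasks.Add(Task.Run(() => SomeFancyMethod(copy)));
        }
        return await Task.WhenAll(tasks); // results come back in task order
    }

    public static async Task Main()
    {
        Console.WriteLine(string.Join(",", await RunAll())); // prints 0,1,2,3,4,5,6,7,8,9
    }
}
```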
I assume you don't mean passing their method declaration, but just invoking the method, like so:
var tasks = new Task[] { MethodAsync("foo"),
MethodAsync("bar") };
And we'll compare that to using Task.Run:
var tasks = new Task[] { Task.Run(() => MethodAsync("foo")),
Task.Run(() => MethodAsync("bar")) };
First, let's get the quick answer out of the way. The first variant will have lower or equal parallelism compared to the second variant. Parts of MethodAsync will run on the caller's thread in the first case, but not in the second case. How much this actually affects the parallelism depends entirely on the implementation of MethodAsync.
To get a bit deeper, we need to understand how async methods work. We have a method like:
async Task MethodAsync(string argument)
{
DoSomePreparationWork();
await WaitForIO();
await DoSomeOtherWork();
}
What happens when you call such a method? There is no magic. The method is a method like any other, just rewritten as a state machine (similar to how yield return works). It will run as any other method until it encounters the first await. At that point, it may or may not return a Task object. You may or may not await that Task object in the caller code. Ideally, your code should not depend on the difference. Just like yield return, await on a (non-completed!) task returns control to the caller of the method. Essentially, the contract is:
If you have CPU work to do, use my thread.
If whatever you do would mean the thread isn't going to use the CPU, return a promise of the result (a Task object) to the caller.
It allows you to maximize the ratio of what CPU work each thread is doing. If the asynchronous operation doesn't need the CPU, it will let the caller do something else. It doesn't inherently allow for parallelism, but it gives you the tools to do any kind of asynchronous operation, including parallel operations. One of the operations you can do is Task.Run, which is just another asynchronous method that returns a task, but which returns to the caller immediately.
So, the difference between:
MethodAsync("foo");
MethodAsync("bar");
and
Task.Run(() => MethodAsync("foo"));
Task.Run(() => MethodAsync("bar"));
is that the former will return (and continue to execute the next MethodAsync) after it reaches the first await on a non-completed task, while the latter will always return immediately.
You should usually decide based on your actual requirements:
Do you need to use the CPU efficiently and minimize context switching etc., or do you expect the async method to have negligible CPU work to do? Invoke the method directly.
Do you want to encourage parallelism or do you expect the async method to do interesting amounts of CPU work? Use Task.Run.
Here is your code rewritten without async/await, with old-school continuations instead. Hopefully it will make it easier to understand what's going on.
public Task CompoundMethodAsync(int i)
{
doCPUBoundWork(i);
return doIOBoundWorkAsync(i).ContinueWith(_ =>
{
doMoreCPUBoundWork(i);
});
}
// Main thread
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
Task task = CompoundMethodAsync(i);
tasks.Add(task);
}
// The doCPUBoundWork has already run synchronously 10 times at this point
// Do various things while the compound tasks are progressing concurrently
Task.WhenAll(tasks).ContinueWith(_ =>
{
// The doIOBoundWorkAsync/doMoreCPUBoundWork have completed 10 times at this point
// Do various things after all compound tasks have been completed
});
// No code should exist here. Move everything inside the continuation above.
I've got a fairly simple application using Task.WhenAll. The issue I am facing so far is that I don't know if I should start the subtasks myself or let WhenAll start them as appropriate.
The examples online show using tasks from framework methods, where it's not clear to me if the tasks returned have already started or not. However I've created my own tasks with an Action, so it's a detail that I have to address.
When I'm using Task.WhenAll, should I start the constituent tasks directly, or should I let Task.WhenAll handle it for fun, profit, and improved execution speed?
For further fun, the subtasks contain lots of blocking I/O.
WhenAll won't start tasks for you. You have to start them yourself.
var unstartedTask = new Task(() => {});
await Task.WhenAll(unstartedTask); // this task won't complete until unstartedTask.Start()
However, generally, tasks created (e.g. using Task.Run, async methods, etc.) have already been started. So you generally don't have to take a separate action to start the task.
var task = Task.Run(() => {});
await Task.WhenAll(task); // no need for task.Start()
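A small sketch that makes the difference observable. The class and method names are illustrative. It builds a cold task with the Task constructor, shows that Task.WhenAll does not complete while the task is unstarted, and then starts it.

```csharp
using System;
using System.Threading.Tasks;

public static class ColdTaskDemo
{
    // Returns whether WhenAll completed within 200 ms while the task was still unstarted.
    public static bool WhenAllCompletesWithoutStart()
    {
        var cold = new Task(() => { });
        Task all = Task.WhenAll(cold);

        bool completedWhileCold = all.Wait(200); // times out: nobody started the task

        cold.Start();                            // now it is scheduled on the thread pool
        all.Wait();
        return completedWhileCold;
    }

    public static TaskStatus StatusOfNewTask()
    {
        return new Task(() => { }).Status;       // TaskStatus.Created
    }

    public static void Main()
    {
        Console.WriteLine(StatusOfNewTask());
        Console.WriteLine(WhenAllCompletesWithoutStart());
    }
}
```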
I've created my own tasks with an Action
When you're working with asynchronous tasks, the convention is to only deal with tasks already in progress. So using the Task constructor and Start is inappropriate; it would be better to use Task.Run.
As others have noted, Task.WhenAll only aggregates the tasks; it does not start them for you.
Task.WhenAll(IEnumerable<Task>) only waits for the tasks you supply. Create them in the usual way, by calling Task.Run(Action) or TaskFactory.StartNew(Action), both of which return tasks that have already been started.
One note: if any of the tasks completes in the Faulted state, the resulting task will complete in the Faulted state as well, with an AggregateException set on its Exception property.
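A short sketch of that fault behavior, with hypothetical names. Two tasks fail, the combined task ends up Faulted, and its Exception property carries both failures even though await rethrows only one of them.

```csharp
using System;
using System.Threading.Tasks;

public static class WhenAllFaultDemo
{
    private static void Fail(string message)
    {
        throw new InvalidOperationException(message);
    }

    // Returns the number of exceptions collected by the task that WhenAll returned.
    public static async Task<int> CountInnerExceptions()
    {
        Task first = Task.Run(() => Fail("first"));
        Task second = Task.Run(() => Fail("second"));
        Task all = Task.WhenAll(first, second);

        try
        {
            await all; // rethrows a single exception from the set
        }
        catch (InvalidOperationException)
        {
            // Swallowed for the demo; the full set is on all.Exception.
        }

        return all.Exception.InnerExceptions.Count;
    }

    public static async Task Main()
    {
        Console.WriteLine(await CountInnerExceptions()); // prints 2
    }
}
```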
This might be the worst StackOverflow title I've ever written. What I'm actually trying to do is execute an asynchronous method that uses the async/await convention (and itself contains additional await calls) from within a synchronous method multiple times in parallel while maintaining the same thread throughout the execution of each branch of the parallel execution, including for all await continuations. To put it another way, I want to execute some async code synchronously, but I want to do it multiple times in parallel. Now you can see why the title was so bad. Perhaps this is best illustrated with some code...
Assume I have the following:
public class MyAsyncCode
{
async Task MethodA()
{
// Do some stuff...
await MethodB();
// Some other stuff
}
async Task MethodB()
{
// Do some stuff...
await MethodC();
// Some other stuff
}
async Task MethodC()
{
// Do some stuff...
}
}
The caller is synchronous (from a console application). Let me try illustrating what I'm trying to do with an attempt to use Task.WaitAll(...) and wrapper tasks:
public void MyCallingMethod()
{
List<Task> tasks = new List<Task>();
for(int c = 0 ; c < 4 ; c++)
{
MyAsyncCode asyncCode = new MyAsyncCode();
tasks.Add(Task.Run(() => asyncCode.MethodA()));
}
Task.WaitAll(tasks.ToArray());
}
The desired behavior is for MethodA, MethodB, and MethodC to all be run on the same thread, both before and after the continuation, and for this to happen 4 times in parallel on 4 different threads. To put it yet another way, I want to remove the asynchronous behavior of my await calls since I'm making the calls parallel from the caller.
Now, before I go any further, I do understand that there's a difference between asynchronous code and parallel/multi-threaded code and that the former doesn't imply or suggest the latter. I'm also aware the easiest way to achieve this behavior is to remove the async/await declarations. Unfortunately, I don't have the option to do this (it's in a library) and there are reasons why I need the continuations to all be on the same thread (having to do with poor design of said library). But even more than that, this has piqued my interest and now I want to know from an academic perspective.
I've attempted to run this using PLINQ and immediate task execution with .AsParallel().Select(x => x.MethodA().Result). I've also attempted to use the AsyncHelper class found here and there, which really just uses .Unwrap().GetAwaiter().GetResult(). I've also tried some other stuff and I can't seem to get the desired behavior. I either end up with all the calls on the same thread (which obviously isn't parallel) or end up with the continuations executing on different threads.
Is what I'm trying to do even possible, or are async/await and the TPL just too different (despite both being based on Tasks)?
The methods that you are calling do not use ConfigureAwait(false). This means that we can force the continuations to resume in a context we like. Options:
Install a single-threaded synchronization context. I believe Nito.Async has that.
Use a custom TaskScheduler. await looks at TaskScheduler.Current and resumes at that scheduler if it is non-default.
I'm not sure if there are any pros and cons for either option. Option 2 has easier scoping I think. Option 2 would look like:
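Task.Factory.StartNew(
() => MethodA()
, CancellationToken.None
, TaskCreationOptions.None // StartNew only accepts a scheduler via this four-argument overload
, new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler).Unwrap();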
Call this once for each parallel invocation and use Task.WaitAll to join all those tasks. You should probably also call Complete() on the scheduler pair once you are done with it.
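I'm (ab)using ConcurrentExclusiveSchedulerPair here to get a scheduler that runs only one task at a time. Note that it still runs its tasks on thread-pool threads, so it serializes the continuations rather than pinning them to one specific thread.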
If those methods are not particularly CPU-intensive you can just use the same scheduler/thread for all of them.
You can create 4 independent threads, each one executes MethodA with a limited-concurrency (actually, no concurrency at all) TaskScheduler. That will ensure that every Task, and continuation Tasks, that the thread creates, will be executed by that thread.
public void MyCallingMethod()
{
CancellationToken csl = new CancellationToken();
var threads = Enumerable.Range(0, 4).Select(p =>
{
var t = new Thread(_ =>
{
Task.Factory.StartNew(() => MethodA(), csl, TaskCreationOptions.None,
new LimitedConcurrencyLevelTaskScheduler(1)).Unwrap().Wait(); // Unwrap so Wait covers MethodA's task, not just its start
});
t.Start();
return t;
}).ToArray();
//You can block the main thread and wait for the other threads here...
}
That won't guarantee you a degree of parallelism of 4, of course.
You can see an implementation of such TaskScheduler in MSDN - https://msdn.microsoft.com/en-us/library/ee789351(v=vs.110).aspx
I am currently trying to do some performance optimization by using Tasks to take advantage of parallel threading in .NET 4.0.
I have made three methods that each return a collection of objects or just a single object.
Let's call them MethodA, MethodB and MethodC.
Inside of MethodB I have a long-running delay - approximately 5-7 sec.
var person = new Person();
person.A = Task.Factory.StartNew(() => Mystatic.MethodA()).Result;
person.B = Task.Factory.StartNew(() => Mystatic.MethodB()).Result;
person.C = Task.Factory.StartNew(() => Mystatic.MethodC()).Result;
Now I am expecting person.A and person.C properties to be set / populated before person.B, but I have difficulties testing / debugging it to verify my assumptions.
I have added a parallel watch on all three properties, but debugging through is not clarifying things for me.
Also is this the proper way for me to optimize multiple calls to methods, if I am populating a main object?
In my case I have 5-7 different methods to gather data from, and I would like to make them parallel as some of them are relatively time-consuming.
Relying on timing is buggy in principle, because timings are never guaranteed, especially not under real-world conditions. You should apply proper waiting on tasks if you need to ensure that they have completed. Or use C# 5.0 async-await, or continuations.
In short, don't program by coincidence. Make your programs correct by construction.
Your assumption is wrong. The Result property of a Task blocks until the task completes, so A, B, and C will be assigned in that sequence, each one only after the previous task has finished. This also defeats the purpose of creating the tasks in the first place.
One way is to start all three tasks first, call Task.WaitAll on them, and then assign the result from each task to A, B, and C.
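A minimal sketch of the Task.WaitAll approach, using only APIs available in .NET 4.0. MethodA, MethodB and MethodC here are hypothetical stand-ins for the Mystatic methods in the question, with a sleep in place of the long-running call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitAllDemo
{
    // Hypothetical stand-ins for Mystatic.MethodA/B/C.
    private static string MethodA() { return "A"; }
    private static string MethodB() { Thread.Sleep(500); return "B"; } // the slow one
    private static string MethodC() { return "C"; }

    public static string[] LoadAll()
    {
        // Start all three before waiting on any of them, so they can overlap.
        Task<string> tA = Task.Factory.StartNew(() => MethodA());
        Task<string> tB = Task.Factory.StartNew(() => MethodB());
        Task<string> tC = Task.Factory.StartNew(() => MethodC());

        Task.WaitAll(tA, tB, tC); // blocks until all three have completed

        // Result no longer blocks here because every task is already done.
        return new[] { tA.Result, tB.Result, tC.Result };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", LoadAll())); // prints A,B,C
    }
}
```

The total wait is roughly the duration of the slowest method rather than the sum of all three, provided the thread pool has threads available for each task.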
You can also use async/await if you have VS2012. You can still target .NET 4.0 by using http://nuget.org/packages/Microsoft.Bcl.Async/
However, understand that async/await will not start 3 tasks in parallel if you do this:
person.A = await Task.Run(() => Mystatic.MethodA());
person.B = await Task.Run(() => Mystatic.MethodB());
person.C = await Task.Run(() => Mystatic.MethodC());
it will still be sequential - if you want parallel execution to some degree you can do this:
var tA = Task.Run(() => Mystatic.MethodA()); // var keeps the Task<T> type, so await yields the result
var tB = Task.Run(() => Mystatic.MethodB());
var tC = Task.Run(() => Mystatic.MethodC());
person.A = await tA;
person.B = await tB;
person.C = await tC;