I have created a list of Task, like this:
public void A()
{
}
public void B()
{
}
public void C()
{
}
public void Ex()
{
Task.WaitAll(Task.Factory.StartNew(A), Task.Factory.StartNew(B), Task.Factory.StartNew(C));
var p=true;
}
Now my question is: will all the tasks execute one by one, or will they execute in parallel?
Also, regarding p = true:
Is "p" set after all the tasks are done, or before they are done?
For the first question:
Will those tasks execute one by one or asynchronously?
(here, I imagine you meant concurrently, which is not exactly the same)
Using StartNew will run your task on the current TaskScheduler. By default that means it will use the ThreadPool and, if there are any available slots in the thread pool, the tasks will be run in parallel. If all slots in the thread pool are taken, it may throttle the execution of the tasks to avoid overwhelming the CPU, and the tasks may not all execute concurrently: there are no guarantees.
This is a simplified explanation; a more complete and detailed description of the scheduling strategy is given in the TaskScheduler documentation.
As a side note, the documentation for StartNew mentions a subtle difference between StartNew(Action) and Run(Action). They are not exactly equivalent, contrary to what other answers state.
Starting with the .NET Framework 4.5, you can use the Task.Run(Action) method as a quick way to call StartNew(Action) with default parameters. Note, however, that there is a difference in behavior between the two methods regarding child tasks: Task.Run(Action) by default does not allow child tasks started with the TaskCreationOptions.AttachedToParent option to attach to the current Task instance, whereas StartNew(Action) does.
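To see that difference in practice, here is a minimal sketch (the timings are illustrative only; it assumes using System.Threading and System.Threading.Tasks):
var viaStartNew = Task.Factory.StartNew(() =>
{
    // With StartNew's default options, this child attaches to its parent,
    // so the parent does not complete until the child has finished.
    Task.Factory.StartNew(() => Thread.Sleep(500), TaskCreationOptions.AttachedToParent);
});
viaStartNew.Wait(); // waits for the attached child, roughly 500 ms

var viaRun = Task.Run(() =>
{
    // Task.Run denies child attachment, so this child does not attach
    // and the parent completes without waiting for it.
    Task.Factory.StartNew(() => Thread.Sleep(500), TaskCreationOptions.AttachedToParent);
});
viaRun.Wait(); // returns almost immediately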
For the second question:
Is "p" set after all the tasks are done, or before they are done?
The short answer: "p" is set only after all the tasks are done, because Task.WaitAll blocks the calling thread until every task has completed.
However, you should consider using another approach, as this one blocks your thread while it waits idly. The alternative is to give control back to the caller if you can, so the thread is freed and can do other work. This is especially true if the thread running this code belongs to a ThreadPool.
Therefore, you should prefer using WhenAll(). It returns a Task, which can be awaited or on which ContinueWith can be called.
Example:
var tasks = new Task[] {Task.Factory.StartNew(A), Task.Factory.StartNew(B), Task.Factory.StartNew(C)};
await Task.WhenAll(tasks);
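If awaiting is not an option at that point, a continuation can be attached instead (a small sketch, reusing the same tasks array):
Task.WhenAll(tasks).ContinueWith(_ =>
{
    var p = true; // runs only after A, B and C have all completed
});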
First:
You are creating the tasks the wrong way: when you instantiate a task via its constructor you need to call its Start method, otherwise it does nothing.
new Task(() => /* Something */).Start();
If you create tasks the way you just did (by calling the constructor and then Start, or by using the TaskFactory, or even Task.Run), by default a ThreadPool thread is dedicated to each task, and so the tasks are executed in parallel.
The Task.WaitAll method will block the execution of the calling method until all of the tasks passed to it are done executing,
so the boolean variable is set only after all the tasks are done.
Related
If we fill a list of Tasks that need to do both CPU-bound and I/O-bound work, by simply passing their method declaration to that list (not by creating a new task and manually scheduling it using Task.Start), how exactly are these tasks handled?
I know that they are not done in parallel, but concurrently.
Does that mean that a single thread will move along them, and that single thread might not be the same thread in the thread pool, or the same thread that initially started waiting for them all to complete/added them to the list?
EDIT: My question is about how exactly these items in the list are handled concurrently - is the calling thread moving through them, or is something else going on?
Code for those that need code:
public async Task SomeFancyMethod(int i)
{
doCPUBoundWork(i);
await doIOBoundWork(i);
}
//Main thread
List<Task> someFancyTaskList = new List<Task>();
for (int i = 0; i< 10; i++)
someFancyTaskList.Add(SomeFancyMethod(i));
// Do various other things here --
// how are the items handled in the meantime?
await Task.WhenAll(someFancyTaskList);
Thank you.
Asynchronous methods always start running synchronously. The magic happens at the first await. When the await keyword sees an incomplete Task, it returns its own incomplete Task. If it sees a complete Task, execution continues synchronously.
So at this line:
someFancyTaskList.Add(SomeFancyMethod(i));
You're calling SomeFancyMethod(i), which will:
Run doCPUBoundWork(i) synchronously.
Run doIOBoundWork(i).
If doIOBoundWork(i) returns an incomplete Task, then the await in SomeFancyMethod will return its own incomplete Task.
Only then will the returned Task be added to your list and your loop will continue. So the CPU-bound work is happening sequentially (one after the other).
There is some more reading about this here: Control flow in async programs (C#)
As each I/O operation completes, the continuations of those tasks are scheduled. How those are done depends on the type of application - particularly, if there is a context that it needs to return to (desktop and ASP.NET do unless you specify ConfigureAwait(false), ASP.NET Core doesn't). So they might run sequentially on the same thread, or in parallel on ThreadPool threads.
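As an aside, here is a sketch (reusing the question's SomeFancyMethod and doIOBoundWork) of opting out of the captured context, so the continuation always runs on a ThreadPool thread regardless of the application type:
public async Task SomeFancyMethod(int i)
{
    doCPUBoundWork(i);
    // ConfigureAwait(false): do not capture the current context, so the code after
    // this await resumes on a ThreadPool thread instead of, e.g., the UI thread.
    await doIOBoundWork(i).ConfigureAwait(false);
}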
If you want to immediately move the CPU-bound work to another thread to run that in parallel, you can use Task.Run:
someFancyTaskList.Add(Task.Run(() => SomeFancyMethod(i)));
If this is in a desktop application, then this would be wise, since you want to keep CPU-heavy work off of the UI thread. However, you've then lost your context in SomeFancyMethod, which may or may not matter to you. In a desktop app, you can always marshal calls back to the UI thread fairly easily.
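For example, a sketch (a WinForms-style handler with a hypothetical UpdateUi method): because await captures the UI synchronization context, the code after an awaited Task.Run resumes back on the UI thread automatically:
private async void Button_Click(object sender, EventArgs e)
{
    await Task.Run(() => doCPUBoundWork(42)); // heavy work runs on a ThreadPool thread
    UpdateUi();                               // resumed back on the UI thread here
}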
I assume you don't mean passing their method declaration, but just invoking the method, like so:
var tasks = new Task[] { MethodAsync("foo"),
MethodAsync("bar") };
And we'll compare that to using Task.Run:
var tasks = new Task[] { Task.Run(() => MethodAsync("foo")),
Task.Run(() => MethodAsync("bar")) };
First, let's get the quick answer out of the way. The first variant will have lower or equal parallelism to the second variant. Parts of MethodAsync will run on the caller's thread in the first case, but not in the second case. How much this actually affects the parallelism depends entirely on the implementation of MethodAsync.
To get a bit deeper, we need to understand how async methods work. We have a method like:
async Task MethodAsync(string argument)
{
DoSomePreparationWork();
await WaitForIO();
await DoSomeOtherWork();
}
What happens when you call such a method? There is no magic. The method is a method like any other, just rewritten as a state machine (similar to how yield return works). It will run as any other method until it encounters the first await. At that point, it may or may not return a Task object. You may or may not await that Task object in the caller code. Ideally, your code should not depend on the difference. Just like yield return, await on a (non-completed!) task returns control to the caller of the method. Essentially, the contract is:
If you have CPU work to do, use my thread.
If whatever you do would mean the thread isn't going to use the CPU, return a promise of the result (a Task object) to the caller.
It allows you to maximize the amount of useful CPU work each thread is doing. If the asynchronous operation doesn't need the CPU, it lets the caller do something else. It doesn't inherently allow for parallelism, but it gives you the tools to do any kind of asynchronous operation, including parallel operations. One of the operations you can do is Task.Run, which is just another asynchronous method that returns a task, but which returns to the caller immediately.
So, the difference between:
MethodAsync("foo");
MethodAsync("bar");
and
Task.Run(() => MethodAsync("foo"));
Task.Run(() => MethodAsync("bar"));
is that the former will return (and continue to execute the next MethodAsync) after it reaches the first await on a non-completed task, while the latter will always return immediately.
You should usually decide based on your actual requirements:
Do you need to use the CPU efficiently and minimize context switching etc., or do you expect the async method to have negligible CPU work to do? Invoke the method directly.
Do you want to encourage parallelism or do you expect the async method to do interesting amounts of CPU work? Use Task.Run.
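To make the choice concrete, here is a small sketch reusing the MethodAsync example from above (inside an async method):
// Direct invocation: DoSomePreparationWork runs right here, on the calling thread,
// before the returned task is handed back.
Task direct = MethodAsync("foo");

// Task.Run: returns immediately; the synchronous prefix of MethodAsync
// (DoSomePreparationWork) runs on a ThreadPool thread instead.
Task offloaded = Task.Run(() => MethodAsync("bar"));

await Task.WhenAll(direct, offloaded);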
Here is your code rewritten without async/await, with old-school continuations instead. Hopefully it will make it easier to understand what's going on.
public Task CompoundMethodAsync(int i)
{
doCPUBoundWork(i);
return doIOBoundWorkAsync(i).ContinueWith(_ =>
{
doMoreCPUBoundWork(i);
});
}
// Main thread
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
Task task = CompoundMethodAsync(i);
tasks.Add(task);
}
// The doCPUBoundWork has already run synchronously 10 times at this point
// Do various things while the compound tasks are progressing concurrently
Task.WhenAll(tasks).ContinueWith(_ =>
{
// The doIOBoundWorkAsync/doMoreCPUBoundWork have completed 10 times at this point
// Do various things after all compound tasks have been completed
});
// No code should exist here. Move everything inside the continuation above.
When I need some parallel processing I usually do it like this:
static void Main(string[] args)
{
var tasks = new List<Task>();
var toProcess = new List<string>{"dog", "cat", "whale", "etc"};
toProcess.ForEach(s => tasks.Add(CanRunAsync(s)));
Task.WaitAll(tasks.ToArray());
}
private static async Task CanRunAsync(string item)
{
// simulate some work
await Task.Delay(10000);
}
I had cases when this did not process the items in parallel and had to use Task.Run to force it to run on different threads.
What am I missing?
Task means "a thing that needs doing, which may have already completed, may be executing on a parallel thread, or may be depending on out-of-process data (sockets, etc), or might just be ... connected to a switch somewhere that says 'done'" - it has very little to do with threading, other than: if you schedule a continuation (aka await), then somehow that will need to get back onto a thread to fire, but how that happens and what that means is up to whatever code created and owns the task.
Note: parallelism can be expressed in terms of multiple tasks (if you so choose), but multiple tasks doesn't imply parallelism.
In your case: it all depends on what CanRunAsync does or is - and we don't know what its real implementation looks like.
I had cases when this did not process the items in parallel and had to use Task.Run to force it to run on different threads.
Most likely these cases were associated with methods that have an asynchronous contract, but their implementation is synchronous. Like this method for example:
static async Task NotAsync(string item)
{
Thread.Sleep(10000); // Simulate a CPU-bound calculation, or a blocking I/O operation
await Task.CompletedTask;
}
Any thread that invokes this method will be blocked for 10 seconds, and then it will be handed back an already completed task. Although the contract of the NotAsync method is asynchronous (it has an awaitable return type), its actual implementation is synchronous because it does all the work during the invocation. So when you try to create multiple tasks by invoking this method:
toProcess.ForEach(s => tasks.Add(NotAsync(s)));
...the current thread will be blocked for 10 seconds * the number of tasks. By the time these tasks have been created they are all already completed, so waiting for their completion will cause zero waiting:
Task.WaitAll(tasks.ToArray()); // Waits for 0 seconds
By wrapping the NotAsync in a Task.Run you ensure that the current thread will not be blocked, because the NotAsync will be invoked on the ThreadPool.
toProcess.ForEach(s => tasks.Add(Task.Run(() => NotAsync(s))));
Task.Run immediately returns a Task, with guaranteed zero blocking of the calling thread.
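A better fix, though, is to make the implementation truly asynchronous. A minimal sketch (hypothetical ActuallyAsync, mirroring the NotAsync example above):
static async Task ActuallyAsync(string item)
{
    await Task.Delay(10000); // yields the thread instead of blocking it for 10 seconds
}
With this version all the delays run concurrently, so Task.WaitAll waits roughly 10 seconds in total rather than 10 seconds per item.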
It should be noted that writing asynchronous methods with synchronous implementations violates Microsoft's guidelines:
An asynchronous method that is based on TAP can do a small amount of work synchronously, such as validating arguments and initiating the asynchronous operation, before it returns the resulting task. Synchronous work should be kept to the minimum so the asynchronous method can return quickly.
But sometimes even Microsoft violates this guideline. That's because violating this one is better than violating the guideline about not exposing asynchronous wrappers for synchronous methods. In other words, exposing APIs that call Task.Run internally in order to give the impression of being asynchronous is an even greater sin than blocking the current thread.
I've seen how the await keyword is implemented and the resulting structure it creates. I think I have a rudimentary understanding of it. However, is
public async Task DoWork()
{
await this.Operation1Async();
await this.Operation2Async();
await this.Operation3Async();
}
"better" (generally speaking) or
public async Task DoWork()
{
await this.Operation1Async();
this.Operation2();
this.Operation3();
}
The problem with the first approach is that it is creating a new Task for each await call? Which entails a new thread?
Whereas the first creates a new Task on the first await and then everything from there is processed in the new Task?
Edit
Ok maybe I wasn't too clear, but if for example we have
while (await reader.ReadAsync())
{
//...
}
await reader.NextResultAsync();
// ...
Is this not creating two tasks? One on the main thread with the first ReadAsync, then another task, inside this newly created task, with the NextResultAsync. My question is: is there really a need for the second task; isn't the one task created on the main thread sufficient? So
while (await reader.ReadAsync())
{
//...
}
reader.NextResult();
// ...
it is creating a new Task for each await call? Which entails a new thread?
Yes and no. Yes, it is creating a Task for each asynchronous method; the async state machine will create one. However, these tasks are not threads, nor do they even run on threads. They do not "run" anywhere.
You may find some blog posts of mine useful:
async intro, which explains how async/await work.
There Is No Thread, which explains how tasks can work without threads.
Intro to the Task type, which explains how some tasks (Delegate Tasks) have code and run on threads, but the tasks used by async (Promise Tasks) do not.
Whereas the first creates a new Task on the first await and then everything from there is processed in the new Task?
Not at all. Tasks only complete once, and the method will not continue past the await until that task is complete. So, the task returned by Operation1Async has already completed before Operation2 is even called.
The two examples are not functionally equivalent, so you would choose one depending on your specific needs. In the first example the three operations execute sequentially, each awaited before the next starts, whereas in the second example the second and third operations are started without being awaited, so they may run in parallel and nothing waits for their results. Also, in the second example the DoWork method could return before the second and third operations have completed.
If you want to ensure that the tasks have completed before leaving the DoWork method body you might need to do this:
public async Task DoWork()
{
await this.Operation1Async();
this.Operation2().GetAwaiter().GetResult();
this.Operation3().GetAwaiter().GetResult();
}
which of course is absolutely terrible and you should never do it, as it blocks the calling thread; in that case go with the first example. If those tasks use I/O completion ports then you should definitely take advantage of them instead of blocking the thread.
If on the other hand you are asking whether you should make Operation2 and Operation3 asynchronous, then the answer is this: if they are doing I/O-bound work where you can take advantage of I/O completion ports (IOCP), then you should absolutely make them async and go with the first approach. If they are CPU-bound operations where you cannot use IOCP, then it might be better to leave them synchronous, because it wouldn't make any sense to execute these CPU-bound operations in a separate task which you would block on anyway.
The problem with the first approach is that it is creating a new Task for each await call? Which entails a new thread?
This is your misunderstanding, which is leading to you to be suspicious of the code in the first example.
A Task does not entail a new thread. A Task certainly can be run on a new thread if you want to do so, but an important use of tasks is when a task directly or indirectly works through asynchronous i/o, which happens when the task, or one that it in turn awaits on, uses async i/o to access files or network streams (e.g. web or database access) allowing a pooled thread to be returned to the pool until that i/o has completed.
As such, if the task does not complete immediately (it might complete immediately if, for example, its purpose could be fulfilled entirely from already-filled buffers), the thread currently running it can be returned to the pool and used to do something else in the meantime. When the I/O completes, another thread from the pool can take over and complete that task, which in turn completes any task awaiting it, and so on.
As such, the first example in your question allows for fewer threads to be used in total, especially when other work is also using threads from the same pool.
In the second example, once the first await has completed, the thread that handled its completion will block on the synchronous equivalents of the remaining methods. If other operations also need threads from the pool then, because that thread is not returned to it, fresh threads will have to be spun up. As such, the second example is the one that will need more threads.
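To illustrate the kind of task that releases its thread, here is a sketch (hypothetical ReadPageAsync) of a method doing asynchronous network I/O:
public async Task<string> ReadPageAsync(HttpClient client)
{
    // While the HTTP response is in flight, no thread is blocked waiting for it;
    // a ThreadPool thread only steps back in to run the code after the await.
    return await client.GetStringAsync("https://example.com");
}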
One is not better than the other, they do different things.
In the first example, each operation is scheduled and performed on a thread, represented by a Task. Note: It's not guaranteed what thread they happen on.
The await keyword means (loosely) "wait until this asynchronous operation has finished and then continue". The continuation is not necessarily run on the same thread either.
This means example one is sequential processing of asynchronous operations. Just because a Task is created, it doesn't mean a Thread is also created; there is a pool of threads the TaskScheduler uses, which have already been created, so very little overhead is actually introduced.
In your second example, the await will run the first operation via the scheduler, then call the next two as normal. No Task or Thread is created for the last two calls; they are ordinary synchronous method calls.
In the first example, you can also look into making your asynchronous calls concurrent, which will start all three operations "at the same time" (not guaranteed) and wait until they have all finished executing.
public async Task DoWork()
{
var o1 = this.Operation1Async();
var o2 = this.Operation2Async();
var o3 = this.Operation3Async();
await Task.WhenAll(o1, o2, o3);
}
I am trying to understand the code I wrote,
for (int i = 0; i< 5; i++)
{
ExecuteCMD("/c robocopy C:\\Source D:\\Source /MIR", true);
}
public async void ExecuteCMD(string cmdLine, bool waitForExit)
{
await Task.Factory.StartNew(() =>
{
ExecuteProcess proc = new ExecuteProcess(cmdLine, waitForExit);
proc.Execute();
} );
}
The async method ExecuteCMD will run in a loop 5 times. I know async doesn't create new threads. So, are there 5 objects created with the same name ('proc') on the same thread? Please explain.
Many thanks in advance!
Do you mean your ExecuteProcess proc object? It is a local variable of your lambda function, so there is no conflict in your code.
The Lambda
() =>
{
ExecuteProcess proc = new ExecuteProcess(cmdLine, waitForExit);
proc.Execute();
}
is called 5 times, but each call creates its own single instance of ExecuteProcess for its own local variable proc.
You are using Task.Factory.StartNew, so you will most likely (see Stephen Cleary's comment) end up on the default TaskScheduler, which happens to execute work on thread pool threads. Your ExecuteProcess allocation and Execute call will therefore occur 5 times, as expected, on thread pool threads (see above point re default scheduler) - and most likely in parallel to each other, and in parallel to your for loop (this last part might be difficult to wrap your head around, but that's the whole problem with async void - the execution order is non-deterministic; more on that later).
You are sort of right in that async/await does not necessarily create new threads. async/await is all about chaining tasks and their continuations so that they execute in correct order with respect to each other. Where the actual Task runs is determined by how that Task is created. Here you are explicitly requesting your work to be pushed out to the thread pool, because that is where Tasks created by Task.Factory.StartNew execute.
Others have pointed out that you might be using async void erroneously - possibly due to lack of understanding. async void is only good for scheduling work in a fire-and-forget manner, and this work needs to be self-contained, complete with its own exception handling and concurrency controls. Because your async void will run unobserved, in parallel to the rest of your code. It's like saying: "I want this piece of code to run at some point in the future. I don't need to know when it completes or whether it raised any exceptions - I'll just wait until it hits the first await, and then carry on executing my own work - and the rest of the async void will proceed on its own, in parallel, without supervision". Because of this, if you put some code after your for loop, it will most likely execute before your ExecuteProcess work, which may or may not be what you want.
Here's a step-by-step view of what actually happens in your application.
You hit the first iteration of the for loop on the main thread. The runtime calls ExecuteCMD, as if it were any other, synchronous method call. It enters the method and executes the code preceding the await, still as part of the first for loop iteration, on the main thread. Then it schedules some work on the thread pool via Task.Factory.StartNew. This work will execute at some point in the future on, let's say, thread pool thread #1. The Task returned by Task.Factory.StartNew is then awaited. This task cannot possibly complete synchronously, so the await schedules the rest of your async void to run in the future (on the main thread, after the task created by Task.Factory.StartNew has completed) and yields. At this point your ExecuteProcess work probably hasn't even started yet, but the main thread is already free to jump to the second iteration of the for loop. It does exactly that, which results in another task being scheduled to run on, say, thread pool thread #2, at some point in the future - followed by yet another continuation scheduled to run on the main thread. The for loop then jumps to the next item.
By the time your for loop ends, you will most likely have 5 Tasks waiting to execute on thread pool threads, followed by 5 continuation Tasks waiting to execute on the main thread. They will all complete at some point in the future. Your code won't know when because you told it that you don't care (by using async void).
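If the caller does need to know when the work has finished, one option (a sketch, reusing the question's ExecuteProcess type) is to return the Task instead of declaring the method async void:
public Task ExecuteCMDAsync(string cmdLine, bool waitForExit)
{
    return Task.Factory.StartNew(() =>
    {
        ExecuteProcess proc = new ExecuteProcess(cmdLine, waitForExit);
        proc.Execute();
    });
}

// Caller (inside an async method):
var tasks = new List<Task>();
for (int i = 0; i < 5; i++)
    tasks.Add(ExecuteCMDAsync("/c robocopy C:\\Source D:\\Source /MIR", true));
await Task.WhenAll(tasks); // now you know when all five have finished, and exceptions are observed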
I'm trying to implement coroutines using async/await, and for that I want to ensure my coroutines are only executing on one thread (the thread that resumes them).
I am currently using a custom awaiter which simply queues the continuation on the coroutine object. When a coroutine wants to yield, it awaits this custom awaiter. When a coroutine is resumed, it simply calls the continuation.
I can guarantee that only one continuation is queued per resume, i.e. that we don't create multiple tasks without awaiting them. I can also guarantee that we will only be awaiting tasks that ultimately await the custom awaiter, or other tasks that await the custom awaiter. That is, we won't be awaiting any "external" tasks.
An example would be something like this:
private static async Task Sleep(int ms)
{
Stopwatch timer = Stopwatch.StartNew();
do
{
await Coroutine.Yield();
}
while (timer.ElapsedMilliseconds < ms);
}
private static async Task Test()
{
// Second resume
await Sleep(1000);
// Unknown how many resumes
}
private static async Task Main()
{
// First resume
await Coroutine.Yield();
// Second resume
await Test();
}
It all seems to work and it seems like the continuations to the tasks are indeed executed inline on the same thread. I just want to be sure that this behavior is consistent and that I can rely on it.
I was able to check the reference source, and have found what I think is the place where continuations are executed. The path to this function seems pretty complex though, and I cannot determine exactly which call leads to this function being invoked (but I assume it is some compiler-generated call).
Now, from this function, it seems like the continuation is not inlined if:
The current thread is aborting
This should not be a problem, as the current thread is willingly executing the coroutine, and we shouldn't be executing a coroutine if we're aborting.
IsValidLocationForInlining is false
This property is false if the current synchronization context is non default, or the current task scheduler is non default. As a precaution I am doing SynchronizationContext.SetSynchronizationContext(null) for the duration of the continuation, when resuming a coroutine. I will also be ensuring that the task scheduler is the default.
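For illustration, a minimal sketch (hypothetical Resume method) of that precaution: clear the synchronization context while running the queued continuation, then restore it, so TaskAwaiter sees the default context and can inline:
public void Resume(Action continuation)
{
    var previous = SynchronizationContext.Current;
    SynchronizationContext.SetSynchronizationContext(null);
    try
    {
        continuation(); // the coroutine resumes inline on this thread
    }
    finally
    {
        SynchronizationContext.SetSynchronizationContext(previous);
    }
}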
Now, my actual question is if I can rely on this behavior. Is this something that is likely to change in .NET versions? Would it be better to implement a custom synchronization context which ensured that all continuations were run by the coroutine?
Furthermore, I know the task libraries have changed a lot from .NET 4 to .NET 4.5. The reference source is for .NET 4.5, as far as I know, so I want to know if someone knows if this behavior has changed. I will be using the coroutines library on .NET 4.0 with Microsoft.Bcl.Async mainly, and it also seems to work fine here.
I'm trying to implement coroutines using async/await, and for that I
want to ensure my coroutines are only executing on one thread (the
thread that resumes them).
I think you can safely rely upon this behavior. This should be true as long as you do not use any of the following features:
Custom TPL task schedulers;
Custom synchronization contexts;
ConfiguredTaskAwaitable or Task.ConfigureAwait();
YieldAwaitable or Task.Yield();
Task.ContinueWith();
Anything which may lead to a thread switch, like an async I/O API, Task.Delay(), Task.Run(), Task.Factory.StartNew(), ThreadPool.QueueUserWorkItem() etc.
The only thing that you use here is TaskAwaiter; more about it below.
First of all, you should not be worried about a thread switch inside the task where you do await Coroutine.Yield(). The code will be resumed exactly on the same thread where you explicitly call the continuation callback; you have complete control over this.
Secondly, the only Task object you have here is the one generated by the state machine logic (specifically, by AsyncTaskMethodBuilder). This is the task returned by Test(). As mentioned above, a thread switch inside this task will not take place unless you do it explicitly before calling the continuation callback via your custom awaiter.
So, your only remaining concern is a thread switch which may happen at the point where you're awaiting the result of the task returned by Test(). That's where TaskAwaiter comes into play.
The behavior of TaskAwaiter is undocumented. As far as I can tell from the Reference Sources, IsValidLocationForInlining is not observed for the TaskContinuation kind of continuation (created by TaskAwaiter). The present behavior is the following: the continuation will not be inlined if the current thread was aborted or if the current thread's synchronization context is different from the one captured by TaskAwaiter.
If you don't want to rely upon this, you can create another custom awaiter to replace TaskAwaiter for your coroutine tasks (a sketch follows below). You could implement it using Task.ContinueWith(..., TaskContinuationOptions.ExecuteSynchronously), whose behavior is unofficially documented by Stephen Toub in his "When ExecuteSynchronously doesn't execute synchronously" blog post. To sum up, the ExecuteSynchronously continuation won't be inlined under the following conditions:
the current thread was aborted;
the current thread is running out of stack space;
the target task scheduler rejects inlining (but you're not using custom task schedulers; the default one always promotes inlining where possible).
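Here is a minimal sketch (hypothetical InlineAwaitable/InlineAwaiter types; assumes using System, System.Runtime.CompilerServices, System.Threading and System.Threading.Tasks) of such a replacement awaiter. The continuation is scheduled with ExecuteSynchronously, so it runs inline on the thread that completes the awaited task, subject to the conditions listed above:
public struct InlineAwaitable
{
    private readonly Task _task;
    public InlineAwaitable(Task task) { _task = task; }
    public InlineAwaiter GetAwaiter() { return new InlineAwaiter(_task); }
}

public struct InlineAwaiter : ICriticalNotifyCompletion
{
    private readonly Task _task;
    public InlineAwaiter(Task task) { _task = task; }

    public bool IsCompleted { get { return _task.IsCompleted; } }
    public void GetResult() { _task.GetAwaiter().GetResult(); }

    public void OnCompleted(Action continuation)
    {
        // ExecuteSynchronously asks the scheduler to run the continuation inline
        // on the thread that completes _task, rather than queueing it.
        _task.ContinueWith(_ => continuation(),
            CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }

    public void UnsafeOnCompleted(Action continuation) { OnCompleted(continuation); }
}

// Usage: await new InlineAwaitable(Test());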