Run a bunch of I/O tasks in an asynchronous manner [duplicate] - c#

Ok, so basically I have a bunch of tasks (10) and I want to start them all at the same time and wait for them to complete. When completed I want to execute other tasks. I read a bunch of resources about this but I can't get it right for my particular case...
Here is what I currently have (code has been simplified):
public async Task RunTasks()
{
var tasks = new List<Task>
{
new Task(async () => await DoWork()),
//and so on with the other 9 similar tasks
};
Parallel.ForEach(tasks, task =>
{
task.Start();
});
Task.WhenAll(tasks).ContinueWith(done =>
{
//Run the other tasks
});
}
//This function perform some I/O operations
public async Task DoWork()
{
var results = await GetDataFromDatabaseAsync();
foreach (var result in results)
{
await ReadFromNetwork(result.Url);
}
}
So my problem is that when I'm waiting for tasks to complete with the WhenAll call, it tells me that all tasks are over even though none of them are completed. I tried adding Console.WriteLine in my foreach and when I have entered the continuation task, data keeps coming in from my previous Tasks that aren't really finished.
What am I doing wrong here?

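You should almost never use the Task constructor directly. In your case the constructor accepts an Action, so the async lambda becomes an async void delegate: the outer task completes as soon as the lambda reaches its first incomplete await, and it only fires the actual task (DoWork), which you can't wait for.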
You can simply call DoWork and get back a task, store it in a list and wait for all the tasks to complete. Meaning:
tasks.Add(DoWork());
// ...
await Task.WhenAll(tasks);
However, async methods run synchronously until the first await on an uncompleted task is reached. If you worry about that part taking too long then use Task.Run to offload it to another ThreadPool thread and then store that task in the list:
tasks.Add(Task.Run(() => DoWork()));
// ...
await Task.WhenAll(tasks);
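The difference between the two approaches can be seen in a small sketch. DoWorkAsync here is a hypothetical stand-in for the question's DoWork (it just delays and then sets a flag), and the class and method names are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskConstructorDemo
{
    // Hypothetical stand-in for DoWork: waits asynchronously, then sets a flag.
    private static async Task DoWorkAsync(bool[] finished)
    {
        await Task.Delay(500);
        finished[0] = true;
    }

    // Wraps the call in the Task constructor, as in the question.
    // Returns whether the work had finished when the outer task completed.
    public static async Task<bool> WrappedInConstructorAsync()
    {
        var finished = new bool[1];
        // The Task constructor accepts an Action, so this lambda is async void.
        var outer = new Task(async () => await DoWorkAsync(finished));
        outer.Start();
        await outer; // completes once the lambda reaches its first incomplete await
        return finished[0];
    }

    // Calls the async method directly and awaits the task it returns.
    public static async Task<bool> CalledDirectlyAsync()
    {
        var finished = new bool[1];
        await Task.WhenAll(DoWorkAsync(finished));
        return finished[0];
    }

    public static void Main()
    {
        Console.WriteLine(WrappedInConstructorAsync().GetAwaiter().GetResult());
        Console.WriteLine(CalledDirectlyAsync().GetAwaiter().GetResult());
    }
}
```

With the 500 ms delay assumed here, the first call should report that the work was not finished when the outer task completed, while the second reports that it was.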

If you want to run those tasks in parallel on different threads using the TPL, you may need something like this:
public async Task RunTasks()
{
var tasks = new List<Func<Task>>
{
DoWork,
//...
};
await Task.WhenAll(tasks.AsParallel().Select(async task => await task()));
//Run the other tasks
}
This approach parallelizes only a small amount of code: the queueing of the method to the thread pool and the return of an uncompleted Task. For such a small number of tasks, parallelizing can also take more time than just running them asynchronously. It only makes sense if your tasks do some longer (synchronous) work before their first await.
In most cases a better way is:
public async Task RunTasks()
{
await Task.WhenAll(new []
{
DoWork(),
//...
});
//Run the other tasks
}
In my opinion, in your code:
You should not wrap your code in Task before passing to Parallel.ForEach.
You can just await Task.WhenAll instead of using ContinueWith.

Essentially you're mixing two incompatible async paradigms; i.e. Parallel.ForEach() and async-await.
For what you want, do one or the other. E.g. you can just use Parallel.For[Each]() and drop the async-await altogether. Parallel.For[Each]() will only return when all the parallel tasks are complete, and you can then move onto the other tasks.
The code has some other issues too:
you mark the method async but don't await in it (the await you do have is in the delegate, not the method);
you almost certainly want .ConfigureAwait(false) on your awaits, especially if you aren't trying to use the results immediately in a UI thread.

The DoWork method is an asynchronous I/O method. It means that you don't need multiple threads to execute several of them, as most of the time the method will asynchronously wait for the I/O to complete. One thread is enough to do that.
public async Task RunTasks()
{
var tasks = new List<Task>
{
DoWork(),
//and so on with the other 9 similar tasks
};
await Task.WhenAll(tasks);
//Run the other tasks
}
You should almost never use the Task constructor to create a new task. To create an asynchronous I/O task, simply call the async method. To create a task that will be executed on a thread pool thread, use Task.Run. You can read this article for a detailed explanation of Task.Run and other options of creating tasks.

Also add a try-catch block around the wait.
NB: A blocking wait such as Task.WaitAll() throws a System.AggregateException, which acts as a wrapper around one or more exceptions that have occurred, so all the exceptions from the running tasks are available. An awaited Task.WhenAll, by contrast, rethrows only one of them; the rest remain available on the Exception property of the task returned by WhenAll.
try
{
Task.WaitAll(tasks.ToArray());
}
catch(AggregateException ex)
{
foreach (Exception inner in ex.InnerExceptions)
{
Console.WriteLine(String.Format("Exception type {0} from {1}", inner.GetType(), inner.Source));
}
}
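For the awaited variant, a minimal sketch of the same idea might look like this. FailAsync is a hypothetical failing operation invented for the example; the point is that the catch block sees a single exception while the combined task still records every failure.

```csharp
using System;
using System.Threading.Tasks;

public static class WhenAllExceptionsDemo
{
    // Hypothetical operation that fails asynchronously.
    private static async Task FailAsync(string name)
    {
        await Task.Yield();
        throw new InvalidOperationException(name);
    }

    // Returns how many exceptions the combined task recorded.
    public static async Task<int> CountFailuresAsync()
    {
        Task all = Task.WhenAll(FailAsync("A"), FailAsync("B"));
        try
        {
            await all;
        }
        catch (InvalidOperationException ex)
        {
            // await surfaces a single exception, not the AggregateException.
            Console.WriteLine("Caught: " + ex.Message);
        }
        // The task itself still carries every failure.
        return all.Exception.InnerExceptions.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountFailuresAsync().GetAwaiter().GetResult());
    }
}
```

Keeping a reference to the task returned by Task.WhenAll is what makes the full list of inner exceptions reachable after the await has thrown.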

Related

WhenAll Task.Run Tasks Hang when Exceptions Occur

I have a minimal example of some async code that is exhibiting strange behavior. This is sandbox code, more geared at trying to understand async better --
private async Task ExhibitStrangeBehaviorAsync()
{
async Task TaskA()
{
await Task.Run(async () =>
{
throw new Exception(nameof(TaskA));
await Task.Yield();
});
}
async Task TaskB()
{
await Task.Run(() =>
{
throw new Exception(nameof(TaskB));
});
}
var tasks = new List<Task>
{
TaskA(),
TaskB(),
};
var tasksTask = Task.WhenAll(tasks);
try
{
await tasksTask;
}
catch
{
Debug.WriteLine(tasksTask.Exception.Message);
}
}
Intermittently, this code will hang. I would like to better understand why. My guess currently is the intermittent nature is due to the out-of-order execution of the aggregated tasks, and/or this line from Asynchronous Programming:
Lambda expressions in LINQ use deferred execution, meaning code could end up executing at a time when you're not expecting it to.
TaskA would fall in this category.
The code does not seem to hang if TaskB also Task.Runs an async lambda, or if neither local Task function contains Task.Run, e.g.
async Task TaskA()
{
//await Task.Run(async () =>
//{
throw new Exception(nameof(TaskA));
await Task.Yield();
//});
}
async Task TaskB()
{
//await Task.Run(() =>
//{
throw new Exception(nameof(TaskB));
//});
}
Can anybody shed some light on what's going on here?
EDIT
This is executing in the context of a UI thread, specifically that of a Xamarin.Forms application.
EDIT 2
Here is another variant on it that runs straight out of the Xamarin.Forms OnAppearing lifecycle method. I had to modify TaskA/B slightly, though they break this way with the original setup above as well.
protected async override void OnAppearing()
{
base.OnAppearing();
async Task TaskA()
{
await Task.Run(async () =>
{
throw new InvalidOperationException();
await Task.Delay(1).ConfigureAwait(false);
}).ConfigureAwait(false);
}
async Task TaskB()
{
await Task.Run(() => throw new ArgumentException()).ConfigureAwait(false);
}
var tasks = new List<Task>
{
TaskA(),
TaskB(),
};
var tasksTask = Task.WhenAll(tasks);
try
{
await tasksTask;
}
catch
{
Debug.WriteLine(tasksTask.Exception.Message);
}
}
There is some chance this may be related to another issue I have had - I am using an older version of Xamarin.Forms, one which has problems with its OnAppearing handling async correctly. I am going to try with a newer version to see if it resolves the issue.
I'm going to guess you're calling ExhibitStrangeBehaviorAsync().Wait() in a GUI app (or similar), as that's the only situation I can think of which can cause a deadlock here. This answer is written on that assumption.
The deadlock is the classic one, caused by the fact that you're running an await on a thread which has a SynchronizationContext installed on it, and then blocking that same thread higher up the call stack with a call to .Wait().
When you run TaskA() and TaskB(), both of those methods post some work to the ThreadPool, which takes a variable amount of time. When the ThreadPool gets around to actually executing the throw statements, this causes the Task returned from TaskA / TaskB to complete with an exception.
tasksTask will complete when the two tasks returned from TaskA and TaskB complete.
The race comes from the fact that, at the point that this line is executed:
await tasksTask;
the task tasksTask may or may not have completed. tasksTask will have completed if the Tasks returned from TaskA and TaskB have completed, so it's a race against the main thread's progression towards the await tasksTask line, against how fast the ThreadPool can run both of those throw statements.
If tasksTask is completed, the await happens synchronously (it's smart enough to check whether the Task being awaited has already completed), and there's no chance of a deadlock. If tasksTask hasn't completed, then my guess is that you're hitting the deadlock described here.
This is also consistent with your observation that removing the calls to Task.Run removes the deadlock. In this case, the tasks returned from TaskA and TaskB complete synchronously as well, so there's no race.
The moral of the story is, as is oft-repeated, do not mix async and sync code. Do not call .Wait() or .Result on a Task which is influenced in any way by await. Also consider tactical use of .ConfigureAwait(false) to guard against other people calling .Wait() on tasks which you produce.

How to make two Tasks run with an even distribution of cpu time

They start out even, but eventually processTasks never gets hit.
Originally I had this as two threads when the tasks were simple. Someone suggested async/await tasks and being new to c# I had no reason to doubt them.
Task monitorTasks = new Task(monitor.start);
Task processTasks = new Task(() => processor.process(ref param, param2));
monitorTasks.Start();
processTasks.Start();
await processTasks;
Have I executed this wrong? Is my problem inevitable while running two tasks? Should they be threads? How can I avoid it?
edit
To clarify. The tasks are never intended to end. They will always be processing and monitoring while triggering events that notify watchers of monitor outputs or processor outputs.
If you await Task.WhenAll, it will wait until all tasks have been processed:
await Task.WhenAll(monitorTasks, processTasks);
https://msdn.microsoft.com/en-us/library/system.threading.tasks.task.whenall(v=vs.110).aspx
Task.WaitAll blocks the current thread until everything has completed.
Task.WhenAll returns a task which represents the action of waiting until everything has completed.
Task.WhenAll Method
Creates a task that will complete when all of the Task objects in an
enumerable collection have completed.
Task.WaitAll Method
Waits for all of the provided Task objects to complete execution.
If you want to block and wait on the started tasks (which is seemingly what you want):
Task monitorTasks = new Task(monitor.start);
Task processTasks = new Task(() => processor.process(ref param, param2));
monitorTasks.Start();
processTasks.Start();
Task.WaitAll(new Task[] { monitorTasks, processTasks });
If you are using async await, see Asynchronous programming with async and await
Then you could do something like this
var task1 = DoWorkAsync();
var task2 = DoMoreWorkAsync();
await Task.WhenAll(task1, task2);
I couldn't get tasks to run evenly.
The monitor task was getting constantly flooded whereas the processor task was getting tasks less frequently, which is when I suspect the monitor task took over.
Since no one could help me, my solution was to turn them back into threads and set the priority of the threads: lower than normal for the monitor task, and higher than normal for the processor task.
This seems to have solved my problem.
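A rough sketch of that thread-based setup, assuming two long-running loops, could look like the following. The loop bodies are placeholders for monitor.start and processor.process, the stop flag exists only so the example terminates, and how strongly priorities influence scheduling depends on the operating system.

```csharp
using System;
using System.Threading;

public static class PriorityThreadsDemo
{
    // Runs two loops on dedicated threads for the given duration and
    // returns how many iterations each completed: { monitor, processor }.
    public static long[] Run(TimeSpan duration)
    {
        long monitorCount = 0;
        long processCount = 0;
        bool stop = false;

        var monitor = new Thread(() =>
        {
            while (!Volatile.Read(ref stop))
            {
                Interlocked.Increment(ref monitorCount); // placeholder for monitor.start
                Thread.Sleep(1);
            }
        })
        { Priority = ThreadPriority.BelowNormal, IsBackground = true };

        var processor = new Thread(() =>
        {
            while (!Volatile.Read(ref stop))
            {
                Interlocked.Increment(ref processCount); // placeholder for processor.process
                Thread.Sleep(1);
            }
        })
        { Priority = ThreadPriority.AboveNormal, IsBackground = true };

        monitor.Start();
        processor.Start();
        Thread.Sleep(duration);
        Volatile.Write(ref stop, true);
        monitor.Join();
        processor.Join();
        return new[] { monitorCount, processCount };
    }

    public static void Main()
    {
        long[] counts = Run(TimeSpan.FromMilliseconds(200));
        Console.WriteLine("monitor: " + counts[0] + ", processor: " + counts[1]);
    }
}
```

Dedicated threads fit here because the two loops never end; thread pool tasks are meant for work that eventually completes.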

Wrapping a synchronous method to run asynchronously

I may get voted as a duplicate of
wrapping-synchronous-code-into-asynchronous-call
However, My question is what if I don't care about the results of the task?
More detail: I want the task to run and whether or not it finishes, I want the task to run again in another thread.
To explain specifically, I have a service that reads a table and gets all the records that have not been processed. The task does its business, but I don't know if there is one record or 1000 records to be processed in that instance. If it is 1000, it will take several minutes; one will take less than a second; and usually the task finds nothing to do. So in that case I don't need to know the results of the previous task, nor wait for it to finish.
I have read up on async, and I read that if I do not consume the await result, the task will not continue. Is that true?
The example is:
private async Task MakeRequest()
{
mainTimer.Stop();
var task = Task.Run(() => Exec());
await task;
mainTimer.Start();
}
Does this mean that the task is not ready to run again unless I have the "await task" command?
await task;
states that you want to wait for the task to complete before executing the following lines.
When you run this:
Task.Run(() => Exec());
It starts executing the new task immediately, whether or not you await it afterwards. You can run the same function in a task as many times as you want without using the await keyword, but you need to make sure that your function can handle concurrent execution.
If you need the result of your task without using await, you can attach a continuation, which normally runs on a thread pool thread once the task has finished:
task.ContinueWith(t => { });
If you don't care about the result, you can use an async void method, but be aware that the caller can neither await it nor catch exceptions thrown from it:
private async void MakeRequest()
{
mainTimer.Stop();
var task = Task.Run(() => Exec());
await task;
mainTimer.Start();
}
Also, not consuming a task does not block anything, it will finish even if you don't await it.
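If the task is started without being awaited, one option is to attach a continuation that at least reports failures. The sketch below assumes a hypothetical log callback and stands in for the question's Exec with an arbitrary Action; it is one possible shape, not the only one.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForgetDemo
{
    // Starts work on the thread pool without making the caller wait for it.
    // Failures are handed to the (hypothetical) log callback instead of being lost.
    public static Task StartAndLog(Action work, Action<Exception> log)
    {
        return Task.Run(work).ContinueWith(
            t =>
            {
                if (t.IsFaulted)
                {
                    log(t.Exception.GetBaseException());
                }
            },
            TaskScheduler.Default);
    }

    public static void Main()
    {
        Task pending = StartAndLog(
            () => { throw new InvalidOperationException("boom"); },
            ex => Console.WriteLine("Background work failed: " + ex.Message));

        // Waiting here only keeps the demo process alive long enough to log.
        pending.Wait();
    }
}
```

A caller that truly does not care about completion can ignore the returned task; it is returned so that tests or shutdown code can still wait for it.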

How should I be using the TPL if I want to implement this threading concept?

Let's say I have a method like SaveAsync(Item item) and I need to call it on 10 Items and the calls are independent of one another. I imagine the ideal way in terms of threading is like
Thread A | Run `SaveAsync(item1)` until we hit the `await` | ---- ... ---- | Run `SaveAsync(item10)` until we hit the `await` | ---------------------------------------|
Thread B | --------------------------------------------------- | Run the stuff after the `await` in `SaveAsync(item1)` | ------------------ ... -----------------------|
Thread C | ------------------------------------------------------ | Run the stuff after the `await` in `SaveAsync(item2)` | ------------------ ... --------------------|
.
.
.
(with it being possible that some of the stuff after the await for multiple items is run in the same thread, perhaps even Thread A)
I'm wondering how to write that in C#? Is it a parallel foreach or a loop with await SaveAsync(item) or what?
By default, an await resumes on the thread context it was started on. You can change this by adding
await task.ConfigureAwait(false)
This tells the runtime that you do not care which thread context the task resumes on, so the runtime can skip capturing the current thread context (which is quite costly).
However, by default you will always be scheduled on the thread context that started the task.
There are a few default contexts, such as the UI thread context or the thread pool context. A task started on the UI thread context will be scheduled back to the UI thread context.
A task started on the thread pool context will be scheduled to the next free thread from the pool, not necessarily the same thread the task was started on.
However, you can provide your own context if you need more control over the task scheduling.
How do you start multiple tasks in the fashion you described above? A simple loop with await will not help here. Let's take this example.
foreach(var item in items)
{
await SaveAsync(item);
}
The await here will wait until SaveAsync finishes, so all saves are processed in sequence.
How to save truly asynchronously?
The trick is to start all tasks, but not await them, until all tasks are started. You then await all tasks with WhenAll(IEnumerable<Task>).
Here is an example.
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync(item)); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Because of the missing await, all "Save-Actions" are started before any of them is awaited. As soon as the first task yields back, the second will be executed. This will result in a behavior somewhat similar to the one described in your question.
The only main difference here, is all tasks are executed in the same thread. Most of the time this is completely OK, because all Save methods usually need to access the same resources. Parallelizing them gives no real advantage, because the bottleneck is this resource.
How to use multiple threads
You can execute a task on a thread pool thread by using
Task.Run(() => SaveAsync(item));
This will execute the method on a thread taken from the thread pool, but there is no way to start on a new thread and finish the method on the UI thread.
To execute all items on different thread, you can use nearly the same code as before:
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(Task.Run(() => SaveAsync(item))); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
The only difference here is that we take the tasks returned from Task.Run.
One remark: Using Task.Run does not guarantee you a new thread. It will execute the task on the next free thread from the thread pool. How many threads are available depends on your local settings as well as the machine (e.g. a powerful server will have a lot more threads than any consumer laptop).
Whether you get a new thread or have to wait for an occupied thread to finish is completely up to the thread pool. (The thread pool usually does a really great job. For more info, here is a really great article on thread pool performance: CLR-Thread-Pool)
This is where people make most of their mistakes with async/await:
1) Either people think that everything after calling an async method, with or without awaiting it, moves to a ThreadPool thread.
2) Or people think that async runs synchronously.
The truth is somewhere in between, and @Iqon's statement about the next block of code is actually incorrect: "The only main difference here, is all tasks are executed in the same thread."
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync(item)); // No await here
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Making a statement like this would suggest that the async method SaveAsync(item) is actually able to execute fully and completely synchronously.
Here are examples:
async Task SaveAsync1(Item item)
{
//no awaiting at all
}
async Task SaveAsync2(Item item)
{
//awaiting already completed task
int i = await Task.FromResult(0);
}
Methods like these really would run synchronously on the thread the async method was called on. But async methods of this kind are special snowflakes. No operation is actually awaited here, so everything is complete even when there is an await inside the method: the first case does not await at all, and the second awaits an already completed task, so it continues synchronously after the await. These two calls would therefore behave the same:
var taskA = SaveAsync2(item); // returns a task that has already run to completion
// same here, no suspension happens because the returned task has already run to completion
await SaveAsync2(item);
So the statement that the async methods execute synchronously is correct only in this special case:
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync2(item));
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
And there is no need to store tasks and await Task.WhenAll(tasks), it is all already done and this would be enough:
foreach(var item in items)
{
SaveAsync2(item);
//it will execute synchronously because there is
//nothing to await for in the method
}
Now let's explore a real case, an async method that actually awaits something inside or starts an awaitable operation:
async Task SaveAsync3(Item item)
{
//awaiting already completed task
int i = await Task.FromResult(0);
await Task.Delay(1000);
Console.WriteLine(i);
}
Now what would this do?
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(SaveAsync3(item));
}
await Task.WhenAll(tasks); // will only continue when all tasks are finished (or cancelled or failed)
Would it run synchronously? No!
Would it run concurrently in parallel? NO!
As I said at the beginning, the truth is somewhere in between for async methods, unless they are special snowflakes like SaveAsync1 and SaveAsync2.
So what did the code do? It executed each SaveAsync3 synchronously up to await Task.Delay, where it found that the returned task was incomplete, returned to the caller with an incomplete task, which was stored in tasks, and then the next SaveAsync3 was executed in the same way.
Now await Task.WhenAll(tasks); really has a meaning, because it is awaiting incomplete operations which run outside this thread's context and in parallel.
All the parts of the SaveAsync3 method after await Task.Delay will be scheduled to the ThreadPool and will run in parallel, unless there is a special context such as the UI thread, in which case ConfigureAwait(false) after Task.Delay would be needed.
I hope you understand what I want to say. You cannot really say how an async method will run unless you have more information about it or its code.
This exercise also raises the question of when to use Task.Run on an async method.
It is often misused, and I think there are really just 2 main cases:
1) When you want to break away from the current thread's context, like UI, ASP.NET etc.
2) When the async method has a synchronous part (up to the first incomplete await) which is computationally intensive and you want to offload it as well, not just the incomplete await part. This would be the case if SaveAsync3 computed the variable i for a long time, let's say Fibonacci :).
For example, you do not have to use Task.Run on something like SaveAsync, which would open a file and save something into it asynchronously, unless the synchronous part before the first await inside SaveAsync is an issue and takes time. Then Task.Run is in order, as part of case 2).
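The behaviour discussed in this thread can be checked with a small experiment. This is only a sketch: SaveAsync is a hypothetical method whose asynchronous part is simulated with Task.Delay, and the names and the 200 ms figure are assumptions. It records which thread runs the synchronous part of each call and measures how long it takes to start ten saves without awaiting them one by one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentSaveDemo
{
    // Hypothetical save: a synchronous prefix, then an asynchronous wait.
    private static async Task SaveAsync(int item, ConcurrentBag<int> syncThreadIds)
    {
        // Runs on the caller's thread, before the first incomplete await.
        syncThreadIds.Add(Environment.CurrentManagedThreadId);
        await Task.Delay(200); // stands in for real I/O
    }

    public static async Task<(TimeSpan Elapsed, bool SyncPartsOnCaller)> SaveAllAsync(int count)
    {
        int callerThread = Environment.CurrentManagedThreadId;
        var syncThreadIds = new ConcurrentBag<int>();
        var stopwatch = Stopwatch.StartNew();

        var tasks = new List<Task>();
        for (int i = 0; i < count; i++)
        {
            tasks.Add(SaveAsync(i, syncThreadIds)); // no await here
        }

        // Every synchronous prefix has already run by this point.
        bool syncPartsOnCaller = syncThreadIds.All(id => id == callerThread);

        await Task.WhenAll(tasks);
        return (stopwatch.Elapsed, syncPartsOnCaller);
    }

    public static void Main()
    {
        var result = SaveAllAsync(10).GetAwaiter().GetResult();
        Console.WriteLine("Elapsed ms: " + result.Elapsed.TotalMilliseconds);
        Console.WriteLine("Sync parts ran on caller: " + result.SyncPartsOnCaller);
    }
}
```

Because the ten delays overlap, the total time should be much closer to one delay than to ten, while the synchronous prefixes all run on the calling thread.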

How to await Parallel Linq actions to complete

I'm not sure how I'm supposed to mix plinq and async-await. Suppose that I have the following interface
public interface IDoSomething {
Task Do();
}
I have a list of these which I would like to execute in parallel and be able to await the completion of all.
public async Task DoAll(IDoSomething[] doers) {
//Execute all doers in parallel ideally using plinq and
//continue when all are complete
}
How to implement this? I'm not sure how to go from parallel linq to Tasks and vice versa.
I'm not terribly worried about exception handling. Ideally the first one would fire and break the whole process as I plan to discard the entire thing on error.
Edit: A lot of people are saying Task.WaitAll. I'm aware of this but my understanding (unless someone can demonstrate otherwise) is that it won't actively parallelize things for you to multiple available processor cores. What I'm specifically asking is twofold -
if I await a Task within a Plinq Action does that get rid of a lot of the advantage since it schedules a new thread?
If I doers.AsParallel().ForAll(async d => await d.Do()) which takes about 5 second on average, how do I not spin the invoking thread in the meantime?
What you're looking for is this:
public Task DoAllAsync(IEnumerable<IDoSomething> doers)
{
return Task.WhenAll(doers.Select(doer => Task.Run(() => doer.Do())));
}
Using Task.Run will use a ThreadPool thread to execute each synchronous part of the async method Do in parallel while Task.WhenAll asynchronously waits for the asynchronous parts together that are executing concurrently.
This is a good idea only if you have substantial synchronous parts in these async methods (i.e. the parts before an await) for example:
async Task Do()
{
for (int i = 0; i < 10000; i++)
{
Math.Pow(i,i);
}
await Task.Delay(10000);
}
Otherwise, there's no need for parallelism and you can just fire the asynchronous operations concurrently and wait for all the returned tasks using Task.WhenAll:
public Task DoAllAsync(IEnumerable<IDoSomething> doers)
{
return Task.WhenAll(doers.Select(doer => doer.Do()));
}
public async Task DoAll(IDoSomething[] doers) {
//using ToArray to materialize the query right here
//so we don't accidentally run it twice later.
var tasks = doers.Select(d => Task.Run(()=>d.Do())).ToArray();
await Task.WhenAll(tasks);
}
