I am playing around the parallel execution of tasks in .Net. I have implemented function below which executes list of tasks in parallel by using Task.WhenAll. I also have found that there are two options I can use to add tasks in the list. The option 1 is to use Task.Run and pass Func delegate. The option 2 is to add the result of the invoked Func delegate.
So my questions are:
Task.Run (Option 1) takes additional threads from thread pool and execute tasks in them by passing them to Task.WhenAll. So the question is does Task.WhenAll run each task in the list asynchronously so the used threads are taken from and passed back to thread pool or all taken threads are blocked until execution is completed (or an exception raised)?
Does it make any difference if I call Task.Run passing synchronous (non-awaitable) or asynchronous (awaitable) delegates?
In the option 2 - theoretically no additional threads taken from thread pool to execute Tasks in the list. However, the tasks are executed concurrently. Does Task.WhenAll creates threads internally or all the tasks are executed in a single thread created by Task.WhenAll? And how SemaphoreSlim affects concurrent tasks?
What do you think is the best approach to deal with asynchronous parallel tasks?
public static async Task<IEnumerable<TResult>> ExecTasksInParallelAsync<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, Task<TResult>> task, int minDegreeOfParallelism = 1, int maxDegreeOfParallelism = 1)
{
var allTasks = new List<Task<TResult>>();
using (var throttler = new SemaphoreSlim(minDegreeOfParallelism, maxDegreeOfParallelism))
{
foreach (var element in source)
{
// do an async wait until we can schedule again
await throttler.WaitAsync();
Func<Task<TResult>> func = async () =>
{
try
{
return await task(element);
}
finally
{
throttler.Release();
}
};
//Option 1
allTasks.Add(Task.Run(func));
//Option 2
allTasks.Add(func.Invoke());
}
return await Task.WhenAll(allTasks);
}
}
The function above is executed as
[HttpGet()]
public async Task<IEnumerable<string>> Get()
{using (var client = new HttpClient())
{
var source = Enumerable.Range(1, 1000).Select(x => "https://dog.ceo/api/breeds/list/all");
var result = await Class1.ExecTasksInParallelAsync(
source, async (x) =>
{
var responseMessage = await client.GetAsync(x);
return await responseMessage.Content.ReadAsStringAsync();
}, 100, 200);
return result;
}
}
Option 2 tested better
I ran a few tests using your code and determined that option 2 is roughly 50 times faster than option 1, on my machine at least. However, using PLINQ was even 10 times faster than option 2.
Option 3, PLINQ, is even faster
You could replace that whole mess with a single line of PLINQ:
return source.AsParallel().WithDegreeOfParallelism(maxDegreeOfParallelism)
.Select( s => task(s).GetAwaiter().GetResult() );
Oops... option 4
Turns out my prior solution would reduce parallelism if the task was actually async (I had been testing with a dummy synchronous function). This solution fixes the problem:
var tasks = source.AsParallel()
.WithDegreeOfParallelism(maxDegreeOfParallelism)
.Select( s => task(s) );
await Task.WhenAll(tasks);
return tasks.Select( t => t.Result );
I ran this on my laptop with 10,000 iterations. I did three runs to ensure that there wasn't a priming effect. Results:
Run 1
Option 1: Duration: 13727ms
Option 2: Duration: 303ms
Option 3 :Duration: 39ms
Run 2
Option 1: Duration: 13586ms
Option 2: Duration: 287ms
Option 3 :Duration: 28ms
Run 3
Option 1: Duration: 13580ms
Option 2: Duration: 316ms
Option 3 :Duration: 32ms
You can try it on DotNetFiddle but you'll have to use much shorter runs to stay within quota.
In addition to allowing very short and powerful code, PLINQ totally kills it for parallel processing, as LINQ uses a functional programming approach, and the functional approach is way better for parallel tasks.
Related
I have created a class that allows me to run multiple operations concurrently with an option to set a max concurrency limit. I.e., if I have 100 operations to do, and I set maxCurrency to 10, at any given time, maximum 10 operations should be running concurrently. Eventually, all of the operations should be executed.
Here's the code:
public async Task<IReadOnlyCollection<T>> Run<T>(IEnumerable<Func<CancellationToken, Task<T>>> operations, int maxConcurrency, CancellationToken ct)
{
using var semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
var results = new ConcurrentBag<T>();
var tasks = new List<Task>();
foreach (var operation in operations)
{
await semaphore.WaitAsync(ct).ConfigureAwait(false);
var task = Task.Factory.StartNew(async () =>
{
try
{
Debug.WriteLine($"Adding new result");
var singleResult = await operation(ct).ConfigureAwait(false);
results.Add(singleResult);
Debug.WriteLine($"Added {singleResult}");
}
finally
{
semaphore.Release();
}
}, ct);
tasks.Add(task);
}
await Task.WhenAll(tasks).ConfigureAwait(false);
Debug.WriteLine($"Completed tasks: {tasks.Count(t => t.IsCompleted)}");
Debug.WriteLine($"Calculated results: {results.Count}");
return results.ToList().AsReadOnly();
}
Here's an example of how I use it:
var operations = Enumerable.Range(1, 10)
.Select<int, Func<CancellationToken, Task<int>>>(n => async ct =>
{
await Task.Delay(100, ct);
return n;
});
var data = await _sut.Run(operations, 2, CancellationToken.None);
Every time I execute this, the data collection has just 8 results. I'd expect to have 10 results.
Here's the Debug log:
Adding new
Adding new
Added 1
Added 2
Adding new
Adding new
Added 3
Added 4
Adding new
Adding new
Added 5
Adding new
Added 6
Adding new
Added 7
Adding new
Added 8
Adding new
Completed tasks: 10
Calculated results: 8
As you can see:
10 tasks are Completed
"Adding new" is logged 10 times
"Added x" is logged 8 times
I do not understand why the 2 last operations are not finished. All tasks have IsComplete set as true, which, as I understand, should mean that all of them got executed to an end.
The issue here is that Task.Factory.StartNew returns a task that when awaited returns the inner task.
It does not give you a task that will wait for this inner task, hence your problem.
The easiest way to fix this is to call Unwrap on the tasks you create, which will unwrap the inner task and allow you to wait for that.
This should work:
var task = ....
....
}, ct).Unwrap();
with this small change you get this output:
...
Added 9
Added 10
Completed tasks: 10
Calculated results: 10
Note that my comments on your question still stands:
You're still working with the illusion that WhenAll will wait for all tasks, when in reality all tasks except the last N have already completed because the loop itself doesn't continue until the previous tasks have completed. You should thus move the synchronization object acquisition into your inner task so that you can queue them all up before you start waiting for them.
I also believe (though I don't 100% know) that using SemaphoreSlim is not a good approach as I believe any thread-related synchronization objects might be unsafe to use in a task-related work. Threads in the threadpool are reused while live tasks are waiting for subtasks to complete which means such a thread might already own the synchronization object from a previous task that has yet to complete and thus allow more than those 2 you wanted to run to run at the "same time". SemaphoreSlim is OK to use, the other synchronization primitives might not be.
I'm attempting to use AsParallel() with async-await to have an application process a series of tasks in parallel, but with a restricted degree of concurrency due to the task starting an external Process that has significant memory usage (hence wanting to wait for the process to complete before proceeding to the next item in the series). Most literature I've seen on the function ParallelEnumerable.WithDegreeOfSeparation suggests that using it will set a max limit on concurrent tasks at any one time, but my own tests seem to suggest that it's skipping the limit altogether.
To provide an rough example (WithDegreeOrParallelism() set to 1 deliberately to demonstrate the issue):
public class Example
{
private async Task HeavyTask(int i)
{
await Task.Delay(10 * 1000);
}
public async Task Run()
{
int n = 0;
await Task.WhenAll(Enumerable.Range(0, 100)
.AsParallel()
.WithDegreeOfParallelism(1)
.Select(async i =>
{
Interlocked.Increment(ref n);
Console.WriteLine("[+] " + n);
await HeavyTask(i);
Interlocked.Decrement(ref n);
Console.WriteLine("[-] " + n);
}));
}
}
class Program
{
public static void Main(string[] args)
{
Task.Run(async () =>
{
await new Example().Run();
}).Wait();
}
}
From what I understand, the code above is meant to produce output along the lines of:
[+] 1
[-] 0
[+] 1
[-] 0
...
But instead returns:
[+] 1
[+] 2
[+] 3
[+] 4
...
Suggesting that it starting off all the tasks in the list and then waiting for the tasks to return.
Is there anything particularly obvious (or non-obvious) that I'm doing wrong which is making it seem like WithDegreeOfParallelism() is being ignored?
Update
Sorry, after testing your code i understand what you are seeing now
async i =>
Async lambda is just async void, basically unobserved task which will run regardless Thread.CurrentThread.ManagedThreadId); will show you clearly it is consuming as many threads as it likes
Also note, if your heavy task is IO bound, then skip the PLINQ and Parallel use async and await in an TPL Dataflow ActionBlock as it will give you the best of both worlds
E.g
public static async Task DoWorkLoads(List<Something> results)
{
var options = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2
};
var block = new ActionBlock<int>(MyMethodAsync, options);
foreach (var item in list)
block.Post(item );
block.Complete();
await block.Completion;
}
...
public async Task MyMethodAsync(int i)
{
await Task.Delay(10 * 1000);
}
Original
This is very subtle and a very common misunderstanding, however the documentation i think seems wrong
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
Though if we dig into this a bit more we get a better understanding, also there are github conversations on this as well.
ParallelOptions.MaxDegreeOfParallelism vs PLINQ’s WithDegreeOfParallelism
PLINQ is different. Some important Standard Query Operators in PLINQ
require communication between the threads involved in the processing
of the query, including some that rely on a Barrier to enable threads
to operate in lock-step. The PLINQ design requires that a specific
number of threads be actively involved for the query to make any
progress. Thus when you specify a DegreeOfParallelism for PLINQ,
you’re specifying the actual number of threads that will be involved,
rather than just a maximum.
I've got an ASP.NET site that is running a modest amount of requests (about 500rpm split across 3 servers), and usually the requests take about 15ms. However, I've found that there are frequently requests that take much longer (1s or more). I've narrowed the latency down to a call to Task.WhenAll. Here's an example of the offending code:
var taskA = dbA.GetA(id);
var taskB = dbB.GetB(id);
var taskC = dbC.GetC(id);
var taskD = dbD.GetD(id);
await Task.WhenAll(taskA, taskB, taskC, taskD);
Each individual task is measured and takes less than 10ms to complete. I've pinpointed the delay down to the Task.WhenAll call, and it seems to have something to do with how the task is scheduled. As far as I can tell, there's not a lot of pressure on the TPL task pool, so I'm at a loss for why the performance is so sporadic.
Async operation involve context switches, which are time consuming. Unfortunately, not always in a deterministic way. To speed things up in your case, try to prefix your Task.WhenAll call with ConfigureAwait(false), as follows:
await Task.WhenAll(taskA, taskB, taskC, taskD).ConfigureAwait(false);
This will eliminate an additional context switch, which is actually recommended approach for server-side applications.
Creating threads takes overhead. Depending on what you're doing, you can also try a Parallel.ForEach.
public static void yourMethod(int id){
var tasks = new List<IMyCustomType> { new dbA.GetA(id), new dbB.GetB(id), new dbC.GetC(id), new dbD.GetD(id)};
// Your simple stopwatch for timing
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
// For each 'tasks' list item, call 'executeTasks' (Max 10 occurrences)
// - Processing for all tasks will be complete before
// continuing processing on the main thread
Parallel.ForEach(tasks, new ParallelOptions { MaxDegreeOfParallelism = 10 }, executeTasks);
stopWatch.Stop();
Console.WriteLine("Completed execution in: " + stopWatch.Elapsed.TotalSeconds);
}
private static void executeTasks(string obj)
{
// Your task's work here.
}
I have a slightly complex requirement of performing some tasks in parallel, and having to wait for some of them to finish before continuing. Now, I am encountering unexpected behavior, when I have a number of tasks, that I want executed in parallel, but inside a ContinueWith handler. I have whipped up a small sample to illustrate the problem:
var task1 = Task.Factory.StartNew(() =>
{
Console.WriteLine("11");
Thread.Sleep(1000);
Console.WriteLine("12");
}).ContinueWith(async t =>
{
Console.WriteLine("13");
var innerTasks = new List<Task>();
for (var i = 0; i < 10; i++)
{
var j = i;
innerTasks.Add(Task.Factory.StartNew(() =>
{
Console.WriteLine("1_" + j + "_1");
Thread.Sleep(500);
Console.WriteLine("1_" + j + "_2");
}));
}
await Task.WhenAll(innerTasks.ToArray());
//Task.WaitAll(innerTasks.ToArray());
Thread.Sleep(1000);
Console.WriteLine("14");
});
var task2 = Task.Factory.StartNew(() =>
{
Console.WriteLine("21");
Thread.Sleep(1000);
Console.WriteLine("22");
}).ContinueWith(t =>
{
Console.WriteLine("23");
Thread.Sleep(1000);
Console.WriteLine("24");
});
Console.WriteLine("1");
await Task.WhenAll(task1, task2);
Console.WriteLine("2");
The basic pattern is:
- Task 1 should be executed in parallel with Task 2.
- Once the first part of part 1 is done, it should do some more things in parallel. I want to complete, once everything is done.
I expect the following result:
1 <- Start
11 / 21 <- The initial task start
12 / 22 <- The initial task end
13 / 23 <- The continuation task start
Some combinations of "1_[0..9]_[1..2]" and 24 <- the "inner" tasks of task 1 + the continuation of task 2 end
14 <- The end of the task 1 continuation
2 <- The end
Instead, what happens, is that the await Task.WhenAll(innerTasks.ToArray()); does not "block" the continuation task from completing. So, the inner tasks execute after the outer await Task.WhenAll(task1, task2); has completed. The result is something like:
1 <- Start
11 / 21 <- The initial task start
12 / 22 <- The initial task end
13 / 23 <- The continuation task start
Some combinations of "1_[0..9]_[1..2]" and 24 <- the "inner" tasks of task 1 + the continuation of task 2 end
2 <- The end
Some more combinations of "1_[0..9]_[1..2]" <- the "inner" tasks of task 1
14 <- The end of the task 1 continuation
If, instead, I use Task.WaitAll(innerTasks.ToArray()), everything seems to work as expected. Of course, I would not want to use WaitAll, so I won't block any threads.
My questions are:
Why is this unexpected behavior occuring?
How can I remedy the situation without blocking any threads?
Thanks a lot in advance for any pointers!
You're using the wrong tools. Instead of StartNew, use Task.Run. Instead of ContinueWith, use await:
var task1 = Task1();
var task2 = Task2();
Console.WriteLine("1");
await Task.WhenAll(task1, task2);
Console.WriteLine("2");
private async Task Task1()
{
await Task.Run(() =>
{
Console.WriteLine("11");
Thread.Sleep(1000);
Console.WriteLine("12");
});
Console.WriteLine("13");
var innerTasks = new List<Task>();
for (var i = 0; i < 10; i++)
{
innerTasks.Add(Task.Run(() =>
{
Console.WriteLine("1_" + i + "_1");
Thread.Sleep(500);
Console.WriteLine("1_" + i + "_2");
}));
await Task.WhenAll(innerTasks);
}
Thread.Sleep(1000);
Console.WriteLine("14");
}
private async Task Task2()
{
await Task.Run(() =>
{
Console.WriteLine("21");
Thread.Sleep(1000);
Console.WriteLine("22");
});
Console.WriteLine("23");
Thread.Sleep(1000);
Console.WriteLine("24");
}
Task.Run and await are superior here because they correct a lot of unexpected behavior in StartNew/ContinueWith. In particular, asynchronous delegates and (for Task.Run) always using the thread pool.
I have more detailed info on my blog regarding why you shouldn't use StartNew and why you shouldn't use ContinueWith.
As noted in the comments, what you're seeing is normal. The Task returned by ContinueWith() completes when the delegate passed to and invoked by ContinueWith() finishes executing. This happens the first time the anonymous method uses the await statement, and the delegate returns a Task object itself that represents the eventual completion of the entire anonymous method.
Since you are only waiting on the ContinueWith() task, and this task only represents the availability of the task that represents the anonymous method, not the completion of that task, your code doesn't wait.
From your example, it's not clear what the best fix is. But if you make this small change, it will do what you want:
await Task.WhenAll(await task1, task2);
I.e. in the WhenAll() call, don't wait on the ContinueWith() task itself, but rather on the task that task will eventually return. Use await here to avoid blocking the thread while you wait for that task to be available.
When using async methods/lambdas with StartNew, you either wait on the returned task and the contained task:
var task = Task.Factory.StartNew(async () => { /* ... */ });
task.Wait();
task.Result.Wait();
// consume task.Result.Result
Or you use the extension method Unwrap on the result of StartNew and wait on the task it returns.
var task = Task.Factory.StartNew(async () => { /* ... */ })
.Unwrap();
task.Wait();
// consume task.Result
The following discussion goes along the line that Task.Factory.StartNew and ContinueWith should be avoided in specific cases, such as when you don't provide creation or continuation options or when you don't provide a task scheduler.
I don't agree that Task.Factory.StartNew shouldn't be used, I agree that you should use (or consider using) Task.Run wherever you use a Task.Factory.StartNew method overload that doesn't take TaskCreationOptions or a TaskScheduler.
Note that this only applies to the default Task.Factory. I've used custom task factories where I chose to use the StartNew overloads without options and task scheduler, because I configured the factories specific defaults for my needs.
Likewise, I don't agree that ContinueWith shouldn't be used, I agree that you should use (or consider using) async/await wherever you use a ContinueWith method overload that doesn't take TaskContinuationOptions or a TaskScheduler.
For instance, up to C# 5, the most practical way to workaround the limitation of await not being supported in catch and finally blocks is to use ContinueWith.
C# 6:
try
{
return await something;
}
catch (SpecificException ex)
{
await somethingElse;
// throw;
}
finally
{
await cleanup;
}
Equivalent before C# 6:
return await something
.ContinueWith(async somethingTask =>
{
var ex = somethingTask.Exception.InnerException as SpecificException;
if (ex != null)
{
await somethingElse;
// await somethingTask;
}
},
CancellationToken.None,
TaskContinuationOptions.DenyChildAttach | TaskContinuationOptions.NotOnRanToCompletion,
TaskScheduler.Default)
.Unwrap()
.ContinueWith(async catchTask =>
{
await cleanup;
await catchTask;
},
CancellationToken.None,
TaskContinuationOptions.DenyChildAttach,
TaskScheduler.Default)
.Unwrap();
Since, as I told, in some cases I have a TaskFactory with specific defaults, I've defined a few extension methods that take a TaskFactory, reducing the error chance of not passing one of the arguments (I known I can always forget to pass the factory itself):
public static Task ContinueWhen(this TaskFactory taskFactory, Task task, Action<Task> continuationAction)
{
return task.ContinueWith(continuationAction, taskFactory.CancellationToken, taskFactory.ContinuationOptions, taskFactory.Scheduler);
}
public static Task<TResult> ContinueWhen<TResult>(this TaskFactory taskFactory, Task task, Func<Task, TResult> continuationFunction)
{
return task.ContinueWith(continuationFunction, taskFactory.CancellationToken, taskFactory.ContinuationOptions, taskFactory.Scheduler);
}
// Repeat with argument combinations:
// - Task<TResult> task (instead of non-generic Task task)
// - object state
// - bool notOnRanToCompletion (useful in C# before 6)
Usage:
// using namespace that contains static task extensions class
var task = taskFactory.ContinueWhen(existsingTask, t => Continue(a, b, c));
var asyncTask = taskFactory.ContinueWhen(existingTask, async t => await ContinueAsync(a, b, c))
.Unwrap();
I decided not to mimic Task.Run by not overloading the same method name to unwrapping task-returning delegates, it's really not always what you want. Actually, I didn't even implement ContinueWhenAsync extension methods so you need to use Unwrap or two awaits.
Often, these continuations are I/O asynchronous operations, and the pre- and post-processing overhead should be so small that you shouldn't care if it starts running synchronously up to the first yielding point, or even if it completes synchronously (e.g. using an underlying MemoryStream or a mocked DB access). Also, most of them don't depend on a synchronization context.
Whenever you apply the Unwrap extension method or two awaits, you should check if the task falls in this category. If so, async/await is most probably a better choice than starting a task.
For asynchronous operations with a non-negligible synchronous overhead, starting a new task may be preferable. Even so, a notable exception where async/await is still a better choice is if your code is async from the start, such as an async method invoked by a framework or host (ASP.NET, WCF, NServiceBus 6+, etc.), as the overhead is your actual business. For long processing, you may consider using Task.Yield with care. One of the tenets of asynchronous code is to not be too fine grained, however, too coarse grained is just as bad: a set of heavy-duty tasks may prevent the processing of queued lightweight tasks.
If the asynchronous operation depends on a synchronization context, you can still use async/await if you're within that context (in this case, think twice or more before using .ConfigureAwait(false)), otherwise, start a new task using a task scheduler from the respective synchronization context.
I have this example code:
static void Main(string[] args) {
var t1 = Task.Run(async () => {
Console.WriteLine("Putting in fake processing 1.");
await Task.Delay(300);
Console.WriteLine("Fake processing finished 1. ");
});
var t2 = t1.ContinueWith(async (c) => {
Console.WriteLine("Putting in fake processing 2.");
await Task.Delay(200);
Console.WriteLine("Fake processing finished 2.");
});
var t3 = t2.ContinueWith(async (c) => {
Console.WriteLine("Putting in fake processing 3.");
await Task.Delay(100);
Console.WriteLine("Fake processing finished 3.");
});
Console.ReadLine();
}
The console output baffles me:
Putting in fake processing 1.
Fake processing finished 1.
Putting in fake processing 2.
Putting in fake processing 3.
Fake processing finished 3.
Fake processing finished 2.
I am trying to chain the tasks so they execute one after another, what am I doing wrong? And I can't use await, this is just example code, in reality I am queueing incoming tasks (some asynchronous, some not) and want to execute them in the same order they came in but with no parallelism, ContinueWith seemed better than creating a ConcurrentQueue and handling everythning myself, but it just doesn't work...
Take a look at the type of t2. It's a Task<Task>. t2 will be completed when it finishes starting the task that does the actual work not when that work actually finishes.
The smallest change to your code to get it to work would be to add an unwrap after both your second and third calls to ContinueWith, so that you get out the task that represents the completion of your work.
The more idiomatic solution would be to simply remove the ContinueWith calls entirely and just use await to add continuations to tasks.
Interestingly enough, you would see the same behavior for t1 if you used Task.Factory.StartNew, but Task.Run is specifically designed to work with async lambdas and actually internally unwraps all Action<Task> delegates to return the result of the task returned, rather than a task that represents starting that task, which is why you don't need to unwrap that task.
in reality I am queueing incoming tasks (some asynchronous, some not) and want to execute them in the same order they came in but with no parallelism
You probably want to use TPL Dataflow for that. Specifically, ActionBlock.
var block = new ActionBlock<object>(async item =>
{
// Handle synchronous item
var action = item as Action;
if (action != null)
action();
// Handle asynchronous item
var func = item as Func<Task>;
if (func != null)
await func();
});
// To queue a synchronous item
Action synchronous = () => Thread.Sleep(1000);
block.Post(synchronous);
// To queue an asynchronous item
Func<Task> asynchronous = async () => { await Task.Delay(1000); };
blockPost(asynchronous);