ParallelEnumerable.WithDegreeOfParallelism() not restricting tasks? - c#

I'm attempting to use AsParallel() with async-await to have an application process a series of tasks in parallel, but with a restricted degree of concurrency, because each task starts an external Process that has significant memory usage (hence wanting to wait for the process to complete before proceeding to the next item in the series). Most literature I've seen on ParallelEnumerable.WithDegreeOfParallelism suggests that using it will set a maximum limit on concurrent tasks at any one time, but my own tests seem to suggest that it's skipping the limit altogether.
To provide a rough example (WithDegreeOfParallelism() set to 1 deliberately to demonstrate the issue):
public class Example
{
private async Task HeavyTask(int i)
{
await Task.Delay(10 * 1000);
}
public async Task Run()
{
int n = 0;
await Task.WhenAll(Enumerable.Range(0, 100)
.AsParallel()
.WithDegreeOfParallelism(1)
.Select(async i =>
{
Interlocked.Increment(ref n);
Console.WriteLine("[+] " + n);
await HeavyTask(i);
Interlocked.Decrement(ref n);
Console.WriteLine("[-] " + n);
}));
}
}
class Program
{
public static void Main(string[] args)
{
Task.Run(async () =>
{
await new Example().Run();
}).Wait();
}
}
From what I understand, the code above is meant to produce output along the lines of:
[+] 1
[-] 0
[+] 1
[-] 0
...
But instead returns:
[+] 1
[+] 2
[+] 3
[+] 4
...
This suggests that it is starting all the tasks in the list and then waiting for them to return.
Is there anything particularly obvious (or non-obvious) that I'm doing wrong which is making it seem like WithDegreeOfParallelism() is being ignored?

Update
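Sorry, after testing your code I understand what you are seeing now.
async i =>
The async lambda returns a Task as soon as it reaches its first await, and PLINQ does not await that Task; it treats the item as processed and moves on to the next one. The limit therefore applies only to the synchronous part of the lambda, and the rest of the work runs unobserved by PLINQ. Logging Thread.CurrentThread.ManagedThreadId inside the lambda will show you clearly that it is consuming as many threads as it likes.
Also note, if your heavy task is I/O bound, skip PLINQ and Parallel and use async and await in a TPL Dataflow ActionBlock, as it will give you the best of both worlds.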
E.g
public static async Task DoWorkLoads(List<int> list)
{
// At most two invocations of MyMethodAsync are in flight at any time
var options = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2
};
var block = new ActionBlock<int>(MyMethodAsync, options);
foreach (var item in list)
block.Post(item);
// Signal that no more items will be posted, then wait for the queue to drain
block.Complete();
await block.Completion;
}
...
public static async Task MyMethodAsync(int i)
{
await Task.Delay(10 * 1000);
}
Original
This is very subtle and a very common misunderstanding; however, I think the documentation is misleading:
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
If we dig into this a bit more, though, we get a better understanding; there are GitHub conversations on this as well.
ParallelOptions.MaxDegreeOfParallelism vs PLINQ’s WithDegreeOfParallelism
PLINQ is different. Some important Standard Query Operators in PLINQ
require communication between the threads involved in the processing
of the query, including some that rely on a Barrier to enable threads
to operate in lock-step. The PLINQ design requires that a specific
number of threads be actively involved for the query to make any
progress. Thus when you specify a DegreeOfParallelism for PLINQ,
you’re specifying the actual number of threads that will be involved,
rather than just a maximum.
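The behaviour from the question can be reproduced with a small sketch. It is not from the original answers, and the names PlinqAsyncDemo and MeasurePeakAsync are made up for illustration. It counts how many async lambdas are in flight at once under WithDegreeOfParallelism(1). Since Select only waits for each lambda to hand back its Task (which happens at the first await), the observed peak should normally be well above 1.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class PlinqAsyncDemo
{
    // Hypothetical helper: returns the highest number of lambdas in flight at once.
    public static async Task<int> MeasurePeakAsync(int items, int delayMs)
    {
        var gate = new object();
        int inFlight = 0, peak = 0;

        var tasks = Enumerable.Range(0, items)
            .AsParallel()
            .WithDegreeOfParallelism(1)
            .Select(async i =>
            {
                lock (gate) { inFlight++; if (inFlight > peak) peak = inFlight; }
                await Task.Delay(delayMs);   // the lambda returns its Task to PLINQ here
                lock (gate) { inFlight--; }
            })
            .ToList();                        // run the query once and keep the tasks

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

Calling `await PlinqAsyncDemo.MeasurePeakAsync(10, 200)` should report a peak greater than 1, which matches the [+] 1, [+] 2, [+] 3 output in the question.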

Related

Limiting the number of parallel task with SemaphoreSlim - why does it work?

In the Microsoft documentation you can read this about SemaphoreSlim:
"Represents a lightweight alternative to Semaphore that limits the number of threads that can access a resource or pool of resources concurrently."
https://learn.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim?view=net-5.0
In my understanding a Task is different from Thread. Task is higher level than Thread. Different tasks can run on the same thread. Or a task can be continued on another thread than it was started on.
(Compare: "server-side applications in .NET using asynchrony will use very few threads without limiting themselves to that. If everything really can be served by a single thread, it may well be - if you never have more than one thing to do in terms of physical processing, then that's fine."
from "in C# how to run method async in the same thread")
IMO if you put this information together, the conclusion is that you can’t limit the number of Tasks running in parallel with the use of a semaphore slim, but…
There are other texts that give this kind of advice (How to limit the amount of concurrent async I/O operations?, see “You can definitely do this…”).
When I execute this code on my machine, it seems that it IS possible. If I work with different values for _MaxDegreeOfParallelism and different ranges of numbers, _RunningTasksCount doesn’t exceed the limit given by _MaxDegreeOfParallelism.
Can somebody provide some information to clarify?
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
IRunner runner = new RunnerSemaphore();
runner.Run();
Console.WriteLine("Hit any key to close...");
Console.ReadLine();
}
}
public class RunnerSemaphore : IRunner
{
private readonly SemaphoreSlim _ConcurrencySemaphore;
private List<int> _Numbers;
private int _MaxDegreeOfParallelism = 3;
private object _RunningTasksLock = new object();
private int _RunningTasksCount = 0;
public RunnerSemaphore()
{
_ConcurrencySemaphore = new SemaphoreSlim(_MaxDegreeOfParallelism);
_Numbers = Enumerable.Range(1, 100).ToList();
}
public void Run()
{
RunAsync().Wait();
}
private async Task RunAsync()
{
List<Task> allTasks = new List<Task>();
foreach (int number in _Numbers)
{
var task = Task.Run
(async () =>
{
await _ConcurrencySemaphore.WaitAsync();
bool isFast = number != 1;
int delay = isFast ? 200 : 10000;
Console.WriteLine($"Start Work {number}\tManagedThreadId {Thread.CurrentThread.ManagedThreadId}\tRunning {IncreaseTaskCount()} tasks");
await Task.Delay(delay).ConfigureAwait(false);
Console.WriteLine($"End Work {number}\tManagedThreadId {Thread.CurrentThread.ManagedThreadId}\tRunning {DecreaseTaskCount()} tasks");
})
.ContinueWith((t) =>
{
_ConcurrencySemaphore.Release();
});
allTasks.Add(task);
}
await Task.WhenAll(allTasks.ToArray());
}
private int IncreaseTaskCount()
{
int taskCount;
lock (_RunningTasksLock)
{
taskCount = ++ _RunningTasksCount;
}
return taskCount;
}
private int DecreaseTaskCount()
{
int taskCount;
lock (_RunningTasksLock)
{
taskCount = -- _RunningTasksCount;
}
return taskCount;
}
}
Represents a lightweight alternative to Semaphore that limits the number of threads that can access a resource or pool of resources concurrently.
Well, that was a perfectly fine description when SemaphoreSlim was first introduced - it was just a lightweight Semaphore. Since that time, it has gotten new methods (i.e., WaitAsync) that enable it to act like an asynchronous synchronization primitive.
In my understanding a Task is different from Thread. Task is higher level than Thread. Different tasks can run on the same thread. Or a task can be continued on another thread than it was started on.
This is true for what I call "Delegate Tasks". There's also a completely different kind of Task that I call "Promise Tasks". Promise tasks are similar to promises (or "futures") in other languages (e.g., JavaScript), and they just represent the completion of some event. Promise tasks do not "run" anywhere; they just complete based on some future event (usually via a callback).
async methods always return promise tasks. The code in an asynchronous method is not actually run as part of the task; the task itself only represents the completion of the async method. I recommend my async intro for more information about async and how the code portions are scheduled.
if you put this information together, the conclusion is that you can’t limit the number of Tasks running in parallel with the use of a semaphore slim
This is personal preference, but I try to be very careful about terminology, precisely to avoid problems like this question. Delegate tasks may run in parallel, e.g., Parallel. Promise tasks do not "run", and they don't run in "parallel", but you can have multiple concurrent promise tasks that are all in progress. And SemaphoreSlim's WaitAsync is a perfect match for limiting that kind of concurrency.
You may wish to read about Stephen Toub's AsyncSemaphore (and other articles in that series). It's not the same implementation as SemaphoreSlim, but behaves essentially the same as far as promise tasks are concerned.
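To make the point about WaitAsync concrete, here is a minimal sketch, not taken from the question or the answer. The class and method names are hypothetical, and Task.Delay stands in for real I/O. It skips Task.Run and ContinueWith, releases the semaphore in a finally block, and reports the highest concurrency it observed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottleDemo
{
    // Hypothetical helper: runs `items` fake operations, at most `limit` at a time,
    // and returns the highest concurrency observed.
    public static async Task<int> RunAsync(int items, int limit, int delayMs)
    {
        using (var semaphore = new SemaphoreSlim(limit))
        {
            var gate = new object();
            int inFlight = 0, peak = 0;

            async Task WorkAsync()
            {
                await semaphore.WaitAsync();      // asynchronously wait for a free slot
                try
                {
                    lock (gate) { inFlight++; if (inFlight > peak) peak = inFlight; }
                    await Task.Delay(delayMs);    // stands in for the real operation
                    lock (gate) { inFlight--; }
                }
                finally
                {
                    semaphore.Release();          // always free the slot, even on failure
                }
            }

            List<Task> tasks = Enumerable.Range(0, items).Select(_ => WorkAsync()).ToList();
            await Task.WhenAll(tasks);
            return peak;
        }
    }
}
```

With `RunAsync(20, 3, 50)` the returned peak should never exceed 3, even though all 20 promise tasks exist from the start; the others are simply parked at WaitAsync without holding a thread.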

Asynchronous tasks parallel execution

I am playing around with the parallel execution of tasks in .NET. I have implemented the function below, which executes a list of tasks in parallel by using Task.WhenAll. I have also found that there are two options I can use to add tasks to the list. Option 1 is to use Task.Run and pass a Func delegate. Option 2 is to add the result of the invoked Func delegate.
So my questions are:
Task.Run (Option 1) takes additional threads from the thread pool and executes the tasks on them before they are passed to Task.WhenAll. Does Task.WhenAll run each task in the list asynchronously, so that the threads are taken from and returned to the thread pool, or are all the threads blocked until execution completes (or an exception is raised)?
Does it make any difference if I call Task.Run passing synchronous (non-awaitable) or asynchronous (awaitable) delegates?
In option 2, theoretically no additional threads are taken from the thread pool to execute the tasks in the list. However, the tasks are executed concurrently. Does Task.WhenAll create threads internally, or are all the tasks executed on a single thread created by Task.WhenAll? And how does SemaphoreSlim affect concurrent tasks?
What do you think is the best approach to deal with asynchronous parallel tasks?
public static async Task<IEnumerable<TResult>> ExecTasksInParallelAsync<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, Task<TResult>> task, int minDegreeOfParallelism = 1, int maxDegreeOfParallelism = 1)
{
var allTasks = new List<Task<TResult>>();
using (var throttler = new SemaphoreSlim(minDegreeOfParallelism, maxDegreeOfParallelism))
{
foreach (var element in source)
{
// do an async wait until we can schedule again
await throttler.WaitAsync();
Func<Task<TResult>> func = async () =>
{
try
{
return await task(element);
}
finally
{
throttler.Release();
}
};
// Option 1 (use one option or the other, not both)
allTasks.Add(Task.Run(func));
// Option 2
// allTasks.Add(func.Invoke());
}
return await Task.WhenAll(allTasks);
}
}
The function above is executed as
[HttpGet()]
public async Task<IEnumerable<string>> Get()
{
using (var client = new HttpClient())
{
var source = Enumerable.Range(1, 1000).Select(x => "https://dog.ceo/api/breeds/list/all");
var result = await Class1.ExecTasksInParallelAsync(
source, async (x) =>
{
var responseMessage = await client.GetAsync(x);
return await responseMessage.Content.ReadAsStringAsync();
}, 100, 200);
return result;
}
}
Option 2 tested better
I ran a few tests using your code and determined that option 2 is roughly 50 times faster than option 1, on my machine at least. However, using PLINQ was even 10 times faster than option 2.
Option 3, PLINQ, is even faster
You could replace that whole mess with a single line of PLINQ:
return source.AsParallel().WithDegreeOfParallelism(maxDegreeOfParallelism)
.Select( s => task(s).GetAwaiter().GetResult() );
Oops... option 4
Turns out my prior solution would reduce parallelism if the task was actually async (I had been testing with a dummy synchronous function). This solution fixes the problem:
var tasks = source.AsParallel()
.WithDegreeOfParallelism(maxDegreeOfParallelism)
.Select( s => task(s) )
.ToList(); // materialize once; enumerating the query again would call task(s) again
return await Task.WhenAll(tasks);
I ran this on my laptop with 10,000 iterations. I did three runs to ensure that there wasn't a priming effect. Results:
Run 1
Option 1: Duration: 13727ms
Option 2: Duration: 303ms
Option 3: Duration: 39ms
Run 2
Option 1: Duration: 13586ms
Option 2: Duration: 287ms
Option 3: Duration: 28ms
Run 3
Option 1: Duration: 13580ms
Option 2: Duration: 316ms
Option 3: Duration: 32ms
You can try it on DotNetFiddle but you'll have to use much shorter runs to stay within quota.
In addition to allowing very short and powerful code, PLINQ totally kills it for parallel processing, as LINQ uses a functional programming approach, and the functional approach is way better for parallel tasks.
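On the question of whether Task.WhenAll starts the tasks or creates threads for them: as far as I know it does neither. The tasks returned by async methods are already running when they are handed over, and WhenAll only produces a task that completes when all of them have. The sketch below illustrates this under simple assumptions; WhenAllDemo and FakeRequestAsync are invented names, and Task.Delay stands in for the HTTP call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    private static int _started;

    // Hypothetical stand-in for an HTTP request.
    private static async Task<int> FakeRequestAsync(int id)
    {
        Interlocked.Increment(ref _started);  // runs synchronously when the method is called
        await Task.Delay(100);                // no thread is blocked while waiting
        return id * 2;
    }

    // Returns how many operations had started before Task.WhenAll was called,
    // together with the results.
    public static async Task<(int startedBeforeWhenAll, int[] results)> RunAsync()
    {
        _started = 0;
        Task<int>[] tasks =
        {
            FakeRequestAsync(1), FakeRequestAsync(2), FakeRequestAsync(3)
        };
        int startedBeforeWhenAll = Volatile.Read(ref _started);
        int[] results = await Task.WhenAll(tasks);   // results keep the order of the tasks
        return (startedBeforeWhenAll, results);
    }
}
```

All three operations have started before WhenAll is reached, and the results come back in the order the tasks were supplied, regardless of which finished first.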

Why does Task.WhenAll occasionally take a long time

I've got an ASP.NET site that is running a modest amount of requests (about 500rpm split across 3 servers), and usually the requests take about 15ms. However, I've found that there are frequently requests that take much longer (1s or more). I've narrowed the latency down to a call to Task.WhenAll. Here's an example of the offending code:
var taskA = dbA.GetA(id);
var taskB = dbB.GetB(id);
var taskC = dbC.GetC(id);
var taskD = dbD.GetD(id);
await Task.WhenAll(taskA, taskB, taskC, taskD);
Each individual task is measured and takes less than 10ms to complete. I've pinpointed the delay down to the Task.WhenAll call, and it seems to have something to do with how the task is scheduled. As far as I can tell, there's not a lot of pressure on the TPL task pool, so I'm at a loss for why the performance is so sporadic.
Async operations involve context switches, which are time consuming and, unfortunately, not always deterministic. To speed things up in your case, try appending ConfigureAwait(false) to your Task.WhenAll call, as follows:
await Task.WhenAll(taskA, taskB, taskC, taskD).ConfigureAwait(false);
This avoids resuming on the captured context, which eliminates an additional context switch and is the recommended approach for server-side applications.
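A slightly fuller sketch of the same idea follows. GetAsync is a hypothetical stand-in for the dbA.GetA(id) style calls in the question, which are not shown there. Note that ConfigureAwait(false) only matters where there is a context to capture, such as the synchronization context in classic ASP.NET.

```csharp
using System.Threading.Tasks;

public static class ConfigureAwaitDemo
{
    // Hypothetical stand-in for the database calls in the question.
    private static async Task<int> GetAsync(int id, int delayMs)
    {
        await Task.Delay(delayMs).ConfigureAwait(false);
        return id;
    }

    public static async Task<int> LoadAllAsync(int id)
    {
        // Both operations are started before anything is awaited.
        Task<int> taskA = GetAsync(id, 10);
        Task<int> taskB = GetAsync(id + 1, 20);

        // The continuation does not need to resume on the captured context.
        await Task.WhenAll(taskA, taskB).ConfigureAwait(false);

        // Both tasks are complete here, so awaiting them again returns immediately.
        return await taskA + await taskB;
    }
}
```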
Creating threads takes overhead. Depending on what you're doing, you can also try a Parallel.ForEach.
public static void yourMethod(int id)
{
// Wrap each call in a delegate so that Parallel.ForEach decides when it starts
var tasks = new List<Func<Task>> { () => dbA.GetA(id), () => dbB.GetB(id), () => dbC.GetC(id), () => dbD.GetD(id) };
// Your simple stopwatch for timing
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
// For each 'tasks' list item, call 'executeTasks' (at most 10 at a time)
// - Processing for all tasks will be complete before
// continuing processing on the main thread
Parallel.ForEach(tasks, new ParallelOptions { MaxDegreeOfParallelism = 10 }, executeTasks);
stopWatch.Stop();
Console.WriteLine("Completed execution in: " + stopWatch.Elapsed.TotalSeconds);
}
private static void executeTasks(Func<Task> work)
{
// Start the call and block this worker thread until it completes.
work().GetAwaiter().GetResult();
}

Difference between .ForEach(async x=> {await}) and foreach{await} in an async method

What is the difference between these two simplified pieces of code, specifically the foreach parts.
public async Task UploadFilesAsync(IEnumerable<UploadFile> uploads)
{
uploads.ToList().ForEach(async upload =>
{
await UploadFileAsync(upload);
});
}
public async Task UploadFilesAsync(IEnumerable<UploadFile> uploads)
{
foreach (var upload in uploads)
{
await UploadFileAsync(upload);
}
}
I know the first essentially creates an async void action which is not the greatest solution for various reasons, but would the second do the same, or is that more of an accepted practice?
Since the first method includes a ForEach that is not part of the async execution, it will start the inner anonymous methods and quit. The anonymous methods would finish after the UploadFilesAsync method has finished.
The second version runs sequentially because the foreach awaits each iteration (before calling MoveNext()), so the UploadFilesAsync method would quit only after all the inner calls have finished.
You can test it with the following code:
class Program
{
static void Main(string[] args)
{
int[] uploads = new int[] { 600, 2000, 1000 };
UploadFilesAsync(uploads).ConfigureAwait(true);
UploadFilesAsync2(uploads).ConfigureAwait(true);
Console.ReadLine();
}
public static async Task UploadFilesAsync(IEnumerable<int> uploads)
{
Console.WriteLine("Start version 1");
uploads.ToList().ForEach(async upload =>
{
Console.WriteLine("Start version 1 waiting " + upload);
await Task.Delay(upload);
Console.WriteLine("End version 1 waiting " + upload);
});
Console.WriteLine("End version 1");
}
public static async Task UploadFilesAsync2(IEnumerable<int> uploads)
{
Console.WriteLine("Start version 2");
foreach (var upload in uploads)
{
Console.WriteLine("Start version 2 waiting " + upload);
await Task.Delay(upload);
Console.WriteLine("End version 2 waiting " + upload);
}
Console.WriteLine("End version 2");
}
}
So, this is not a matter of right or wrong. It is simply about what you would like to achieve. Using the first version you can easily face a closure problem, if you have resources created in the UploadFilesAsync method and want to use them in the inner anonymous methods. Since the method quits before the uploads finish, the resources you created would be freed before the inner methods could use them.
First things first. Your focus seems to be on the async/await part of the code, so I will answer that first.
I don't believe there is any difference at all from that perspective
But other than that there are some interesting differences
By calling uploads.ToList() in approach #1, you are iterating over the complete IEnumerable<UploadFile> uploads up front, whereas in approach #2 you are using the power of deferred execution
By using List<T>.ForEach, you necessitate the creation of a lambda expression whose body will get invoked for each element in the List<T>. This can be fairly expensive if it is invoked a lot of times and does fairly light weight work. You clearly avoid this unwanted overhead in approach #2
Exception handling will be a pain in approach #1 whereas approach #2 allows for the use for typical exception handling mechanisms
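The exception-handling point can be illustrated with a small sketch. This is only an illustration: UploadErrorDemo and its fake UploadFileAsync are made-up stand-ins, and the failure condition (a negative size) is an assumption. With foreach and await, a failing upload surfaces at the await and an ordinary try/catch sees it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class UploadErrorDemo
{
    // Hypothetical upload that fails for negative sizes.
    private static async Task UploadFileAsync(int size)
    {
        await Task.Delay(10);
        if (size < 0) throw new InvalidOperationException("upload failed: " + size);
    }

    // Returns the number of uploads that completed before the first failure,
    // or the total count when nothing fails.
    public static async Task<int> UploadAllAsync(IEnumerable<int> uploads)
    {
        int completed = 0;
        try
        {
            foreach (var upload in uploads)
            {
                await UploadFileAsync(upload);   // an exception surfaces right here
                completed++;
            }
        }
        catch (InvalidOperationException)
        {
            // Reached because awaiting the task rethrows the upload's exception.
        }
        return completed;
    }
}
```

With List<T>.ForEach and an async lambda there is no task for the caller to await, so a try/catch around the ForEach call would not observe the same exception.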
Now, understand that when your code hits the await call, it pretty much gets suspended for the Task to get completed. Unless you want to process the results of UploadFileAsync(upload) after that call, you can simply proceed without await'ing and let the execution continue, till the point where these results are needed
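public Task UploadFilesAsync(IEnumerable<UploadFile> uploads)
{
var tasks = new List<Task>();
foreach (var upload in uploads)
{
tasks.Add(UploadFileAsync(upload)); // start the upload without awaiting it
}
return Task.WhenAll(tasks); // hand the combined task to the caller
}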
This way, you can let the caller decide, when to wait for the results.

Use Parallel.ForEach on method returning task - avoid Task.WaitAll

I've got a method which takes IWorkItem, starts work on it and returns related task. The method has to look like this because of external library used.
public Task WorkOn(IWorkItem workItem)
{
//...start asynchronous operation, return task
}
I want to do this work on multiple work items. I don't know how many of them will be there - maybe 1, maybe 10 000.
The WorkOn method has internal pooling and may involve waiting if too many parallel executions are reached (as with SemaphoreSlim.Wait):
public Task WorkOn(IWorkItem workItem)
{
_semaphoreSlim.Wait();
//...start asynchronous operation, return task
}
My current solution is:
public void Do(params IWorkItem[] workItems)
{
var tasks = new Task[workItems.Length];
for (var i = 0; i < workItems.Length; i++)
{
tasks[i] = WorkOn(workItems[i]);
}
Task.WaitAll(tasks);
}
Question: can I somehow use Parallel.ForEach in this case, to avoid creating 10000 tasks and then waiting because of WorkOn's throttling?
That actually is not that easy. You can use Parallel.ForEach to throttle the amount of tasks that are spawned. But I am unsure how that will perform/behave in your condition.
As a general rule of thumb I usually try to avoid mixing Task and Parallel.
Surely you can do something like this:
public void Do(params IWorkItem[] workItems)
{
Parallel.ForEach(workItems, (workItem) => WorkOn(workItem).Wait());
}
Under "normal" conditions this should limit your concurrency nicely.
You could also go full async-await and add some limiting to your concurrency with some tricks. But you have to do the concurrency limiting yourself in that case.
const int ConcurrencyLimit = 8;
public async Task Do(params IWorkItem[] workItems)
{
var cursor = 0;
var currentlyProcessing = new List<Task>(ConcurrencyLimit);
while (cursor < workItems.Length)
{
while (currentlyProcessing.Count < ConcurrencyLimit && cursor < workItems.Length)
{
currentlyProcessing.Add(WorkOn(workItems[cursor]));
cursor++;
}
Task finished = await Task.WhenAny(currentlyProcessing);
currentlyProcessing.Remove(finished);
}
await Task.WhenAll(currentlyProcessing);
}
As I said... a lot more complicated. But it will limit the concurrency to any value you choose as well. In addition, it properly uses the async-await pattern. If you don't want non-blocking multithreading, you can easily wrap this function in another function and do a blocking .Wait on the task it returns.
The key to this implementation is the Task.WhenAny function. It returns one finished task from the supplied list of tasks (wrapped in another task for the await).
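On .NET 6 or later, Parallel.ForEachAsync may be a shorter alternative to the hand-written loop, since it awaits each body and honours MaxDegreeOfParallelism for async work. This is a sketch under that assumption and was not part of the original answer. WorkOn here is a hypothetical stand-in that takes an int instead of an IWorkItem, and it does not block inside the way the question's WorkOn does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncDemo
{
    // Hypothetical stand-in for WorkOn(IWorkItem); completes when the "work" is done.
    private static Task WorkOn(int workItem) => Task.Delay(20);

    // Processes all items with at most `limit` WorkOn calls in flight.
    // Returns the number of items processed.
    public static async Task<int> DoAsync(IEnumerable<int> workItems, int limit)
    {
        int processed = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = limit };

        await Parallel.ForEachAsync(workItems, options, async (item, cancellationToken) =>
        {
            await WorkOn(item);                    // the next item waits until a slot frees up
            Interlocked.Increment(ref processed);
        });

        return processed;
    }
}
```

Items are pulled from the source lazily, so 10 000 work items do not turn into 10 000 tasks up front.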
