Running multiple operations with max concurrency - 2 last tasks are not executed - c#

I have created a class that allows me to run multiple operations concurrently with an option to set a max concurrency limit. I.e., if I have 100 operations to do, and I set maxCurrency to 10, at any given time, maximum 10 operations should be running concurrently. Eventually, all of the operations should be executed.
Here's the code:
public async Task<IReadOnlyCollection<T>> Run<T>(IEnumerable<Func<CancellationToken, Task<T>>> operations, int maxConcurrency, CancellationToken ct)
{
using var semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
var results = new ConcurrentBag<T>();
var tasks = new List<Task>();
foreach (var operation in operations)
{
await semaphore.WaitAsync(ct).ConfigureAwait(false);
var task = Task.Factory.StartNew(async () =>
{
try
{
Debug.WriteLine($"Adding new result");
var singleResult = await operation(ct).ConfigureAwait(false);
results.Add(singleResult);
Debug.WriteLine($"Added {singleResult}");
}
finally
{
semaphore.Release();
}
}, ct);
tasks.Add(task);
}
await Task.WhenAll(tasks).ConfigureAwait(false);
Debug.WriteLine($"Completed tasks: {tasks.Count(t => t.IsCompleted)}");
Debug.WriteLine($"Calculated results: {results.Count}");
return results.ToList().AsReadOnly();
}
Here's an example of how I use it:
var operations = Enumerable.Range(1, 10)
.Select<int, Func<CancellationToken, Task<int>>>(n => async ct =>
{
await Task.Delay(100, ct);
return n;
});
var data = await _sut.Run(operations, 2, CancellationToken.None);
Every time I execute this, the data collection has just 8 results. I'd expect to have 10 results.
Here's the Debug log:
Adding new
Adding new
Added 1
Added 2
Adding new
Adding new
Added 3
Added 4
Adding new
Adding new
Added 5
Adding new
Added 6
Adding new
Added 7
Adding new
Added 8
Adding new
Completed tasks: 10
Calculated results: 8
As you can see:
10 tasks are Completed
"Adding new" is logged 10 times
"Added x" is logged 8 times
I do not understand why the 2 last operations are not finished. All tasks have IsComplete set as true, which, as I understand, should mean that all of them got executed to an end.

The issue here is that Task.Factory.StartNew returns a task that when awaited returns the inner task.
It does not give you a task that will wait for this inner task, hence your problem.
The easiest way to fix this is to call Unwrap on the tasks you create, which will unwrap the inner task and allow you to wait for that.
This should work:
var task = ....
....
}, ct).Unwrap();
with this small change you get this output:
...
Added 9
Added 10
Completed tasks: 10
Calculated results: 10
Note that my comments on your question still stands:
You're still working with the illusion that WhenAll will wait for all tasks, when in reality all tasks except the last N have already completed because the loop itself doesn't continue until the previous tasks have completed. You should thus move the synchronization object acquisition into your inner task so that you can queue them all up before you start waiting for them.
I also believe (though I don't 100% know) that using SemaphoreSlim is not a good approach as I believe any thread-related synchronization objects might be unsafe to use in a task-related work. Threads in the threadpool are reused while live tasks are waiting for subtasks to complete which means such a thread might already own the synchronization object from a previous task that has yet to complete and thus allow more than those 2 you wanted to run to run at the "same time". SemaphoreSlim is OK to use, the other synchronization primitives might not be.

Related

ParallelEnumerable.WithDegreeOfParallelism() not restricting tasks?

I'm attempting to use AsParallel() with async-await to have an application process a series of tasks in parallel, but with a restricted degree of concurrency due to the task starting an external Process that has significant memory usage (hence wanting to wait for the process to complete before proceeding to the next item in the series). Most literature I've seen on the function ParallelEnumerable.WithDegreeOfSeparation suggests that using it will set a max limit on concurrent tasks at any one time, but my own tests seem to suggest that it's skipping the limit altogether.
To provide an rough example (WithDegreeOrParallelism() set to 1 deliberately to demonstrate the issue):
public class Example
{
private async Task HeavyTask(int i)
{
await Task.Delay(10 * 1000);
}
public async Task Run()
{
int n = 0;
await Task.WhenAll(Enumerable.Range(0, 100)
.AsParallel()
.WithDegreeOfParallelism(1)
.Select(async i =>
{
Interlocked.Increment(ref n);
Console.WriteLine("[+] " + n);
await HeavyTask(i);
Interlocked.Decrement(ref n);
Console.WriteLine("[-] " + n);
}));
}
}
class Program
{
public static void Main(string[] args)
{
Task.Run(async () =>
{
await new Example().Run();
}).Wait();
}
}
From what I understand, the code above is meant to produce output along the lines of:
[+] 1
[-] 0
[+] 1
[-] 0
...
But instead returns:
[+] 1
[+] 2
[+] 3
[+] 4
...
Suggesting that it starting off all the tasks in the list and then waiting for the tasks to return.
Is there anything particularly obvious (or non-obvious) that I'm doing wrong which is making it seem like WithDegreeOfParallelism() is being ignored?
Update
Sorry, after testing your code i understand what you are seeing now
async i =>
Async lambda is just async void, basically unobserved task which will run regardless Thread.CurrentThread.ManagedThreadId); will show you clearly it is consuming as many threads as it likes
Also note, if your heavy task is IO bound, then skip the PLINQ and Parallel use async and await in an TPL Dataflow ActionBlock as it will give you the best of both worlds
E.g
public static async Task DoWorkLoads(List<Something> results)
{
var options = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2
};
var block = new ActionBlock<int>(MyMethodAsync, options);
foreach (var item in list)
block.Post(item );
block.Complete();
await block.Completion;
}
...
public async Task MyMethodAsync(int i)
{
await Task.Delay(10 * 1000);
}
Original
This is very subtle and a very common misunderstanding, however the documentation i think seems wrong
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
Though if we dig into this a bit more we get a better understanding, also there are github conversations on this as well.
ParallelOptions.MaxDegreeOfParallelism vs PLINQ’s WithDegreeOfParallelism
PLINQ is different. Some important Standard Query Operators in PLINQ
require communication between the threads involved in the processing
of the query, including some that rely on a Barrier to enable threads
to operate in lock-step. The PLINQ design requires that a specific
number of threads be actively involved for the query to make any
progress. Thus when you specify a DegreeOfParallelism for PLINQ,
you’re specifying the actual number of threads that will be involved,
rather than just a maximum.

Asynchronous tasks parallel execution

I am playing around the parallel execution of tasks in .Net. I have implemented function below which executes list of tasks in parallel by using Task.WhenAll. I also have found that there are two options I can use to add tasks in the list. The option 1 is to use Task.Run and pass Func delegate. The option 2 is to add the result of the invoked Func delegate.
So my questions are:
Task.Run (Option 1) takes additional threads from thread pool and execute tasks in them by passing them to Task.WhenAll. So the question is does Task.WhenAll run each task in the list asynchronously so the used threads are taken from and passed back to thread pool or all taken threads are blocked until execution is completed (or an exception raised)?
Does it make any difference if I call Task.Run passing synchronous (non-awaitable) or asynchronous (awaitable) delegates?
In the option 2 - theoretically no additional threads taken from thread pool to execute Tasks in the list. However, the tasks are executed concurrently. Does Task.WhenAll creates threads internally or all the tasks are executed in a single thread created by Task.WhenAll? And how SemaphoreSlim affects concurrent tasks?
What do you think is the best approach to deal with asynchronous parallel tasks?
public static async Task<IEnumerable<TResult>> ExecTasksInParallelAsync<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, Task<TResult>> task, int minDegreeOfParallelism = 1, int maxDegreeOfParallelism = 1)
{
var allTasks = new List<Task<TResult>>();
using (var throttler = new SemaphoreSlim(minDegreeOfParallelism, maxDegreeOfParallelism))
{
foreach (var element in source)
{
// do an async wait until we can schedule again
await throttler.WaitAsync();
Func<Task<TResult>> func = async () =>
{
try
{
return await task(element);
}
finally
{
throttler.Release();
}
};
//Option 1
allTasks.Add(Task.Run(func));
//Option 2
allTasks.Add(func.Invoke());
}
return await Task.WhenAll(allTasks);
}
}
The function above is executed as
[HttpGet()]
public async Task<IEnumerable<string>> Get()
{using (var client = new HttpClient())
{
var source = Enumerable.Range(1, 1000).Select(x => "https://dog.ceo/api/breeds/list/all");
var result = await Class1.ExecTasksInParallelAsync(
source, async (x) =>
{
var responseMessage = await client.GetAsync(x);
return await responseMessage.Content.ReadAsStringAsync();
}, 100, 200);
return result;
}
}
Option 2 tested better
I ran a few tests using your code and determined that option 2 is roughly 50 times faster than option 1, on my machine at least. However, using PLINQ was even 10 times faster than option 2.
Option 3, PLINQ, is even faster
You could replace that whole mess with a single line of PLINQ:
return source.AsParallel().WithDegreeOfParallelism(maxDegreeOfParallelism)
.Select( s => task(s).GetAwaiter().GetResult() );
Oops... option 4
Turns out my prior solution would reduce parallelism if the task was actually async (I had been testing with a dummy synchronous function). This solution fixes the problem:
var tasks = source.AsParallel()
.WithDegreeOfParallelism(maxDegreeOfParallelism)
.Select( s => task(s) );
await Task.WhenAll(tasks);
return tasks.Select( t => t.Result );
I ran this on my laptop with 10,000 iterations. I did three runs to ensure that there wasn't a priming effect. Results:
Run 1
Option 1: Duration: 13727ms
Option 2: Duration: 303ms
Option 3 :Duration: 39ms
Run 2
Option 1: Duration: 13586ms
Option 2: Duration: 287ms
Option 3 :Duration: 28ms
Run 3
Option 1: Duration: 13580ms
Option 2: Duration: 316ms
Option 3 :Duration: 32ms
You can try it on DotNetFiddle but you'll have to use much shorter runs to stay within quota.
In addition to allowing very short and powerful code, PLINQ totally kills it for parallel processing, as LINQ uses a functional programming approach, and the functional approach is way better for parallel tasks.

Why does Task.WhenAll occasionally take a long time

I've got an ASP.NET site that is running a modest amount of requests (about 500rpm split across 3 servers), and usually the requests take about 15ms. However, I've found that there are frequently requests that take much longer (1s or more). I've narrowed the latency down to a call to Task.WhenAll. Here's an example of the offending code:
var taskA = dbA.GetA(id);
var taskB = dbB.GetB(id);
var taskC = dbC.GetC(id);
var taskD = dbD.GetD(id);
await Task.WhenAll(taskA, taskB, taskC, taskD);
Each individual task is measured and takes less than 10ms to complete. I've pinpointed the delay down to the Task.WhenAll call, and it seems to have something to do with how the task is scheduled. As far as I can tell, there's not a lot of pressure on the TPL task pool, so I'm at a loss for why the performance is so sporadic.
Async operation involve context switches, which are time consuming. Unfortunately, not always in a deterministic way. To speed things up in your case, try to prefix your Task.WhenAll call with ConfigureAwait(false), as follows:
await Task.WhenAll(taskA, taskB, taskC, taskD).ConfigureAwait(false);
This will eliminate an additional context switch, which is actually recommended approach for server-side applications.
Creating threads takes overhead. Depending on what you're doing, you can also try a Parallel.ForEach.
public static void yourMethod(int id){
var tasks = new List<IMyCustomType> { new dbA.GetA(id), new dbB.GetB(id), new dbC.GetC(id), new dbD.GetD(id)};
// Your simple stopwatch for timing
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
// For each 'tasks' list item, call 'executeTasks' (Max 10 occurrences)
// - Processing for all tasks will be complete before
// continuing processing on the main thread
Parallel.ForEach(tasks, new ParallelOptions { MaxDegreeOfParallelism = 10 }, executeTasks);
stopWatch.Stop();
Console.WriteLine("Completed execution in: " + stopWatch.Elapsed.TotalSeconds);
}
private static void executeTasks(string obj)
{
// Your task's work here.
}

How to wait for a function execution for specific duration

My C# application stops responding for a long time, as I break the Debug it stops on a function.
foreach (var item in list)
{
xmldiff.Compare(item, secondary, output);
...
}
I guess the running time of this function is long or it hangs. Anyway, I want to wait for a certain time (e.g. 5 seconds) for the execution of this function, and if it exceeds this time, I skip it and go to the next item in the loop. How can I do it? I found some similar question but they are mostly for processes or asynchronous methods.
You can do it the brutal way: spin up a thread to do the work, join it with timeout, then abort it, if the join didn't work.
Example:
var worker = new Thread( () => { xmlDiff.Compare(item, secondary, output); } );
worker.Start();
if (!worker.Join( TimeSpan.FromSeconds( 1 ) ))
worker.Abort();
But be warned - aborting threads is not considered nice and can make your app unstable. If at all possible try to modify Compare to accept a CancellationToken to cancel the comparison.
I would avoid directly using threads and use Microsoft's Reactive Extensions (NuGet "Rx-Main") to abstract away the management of the threads.
I don't know the exact signature of xmldiff.Compare(item, secondary, output) but if I assume it produces an integer then I could do this with Rx:
var query =
from item in list.ToObservable()
from result in
Observable
.Start(() => xmldiff.Compare(item, secondary, output))
.Timeout(TimeSpan.FromSeconds(5.0), Observable.Return(-1))
select new { item, result };
var subscription =
query
.Subscribe(x =>
{
/* do something with `x.item` and/or `x.result` */
});
This automatically iterates through each item and starts a background computation of xmldiff.Compare, but only allows each computation to take as much as 5.0 seconds before returning a default value of -1.
The subscription variable is an IDisposable, so if you want to abort the entire query before it completes just call .Dispose().
I skip it and go to the next item in the loop
By "skip it", do you mean "leave it there" or "cancel it"? The two scenarios are quite different. But for both two I suggest you use Task.
//generate 10 example tasks
var tasks = Enumerable
.Range(0, 10)
.Select(n => new Task(() => DoSomething(n)))
.ToList();
var maxExecutionTime = TimeSpan.FromSeconds(5);
foreach (var task in tasks)
{
if (task.Wait(maxExecutionTime))
{
//the task is finished in time
}
else
{
// the task is over time
// just leave it there
// the loop continues
// if you want to cancel it, see
// http://stackoverflow.com/questions/4783865/how-do-i-abort-cancel-tpl-tasks
}
}
One thing to improve is "do you really need to run your tasks one by one?" If they are independent you can run them in parallel.

How to wait on all tasks (created task and subtask) without using TaskCreationOptions.AttachedToParent

I will have to create a concurrent software which create several Task, and every Task could generate another task(that could also generate another Task, ...).
I need that the call to the method which launch task is blocking: no return BEFORE all task and subtask are completed.
I know there is this TaskCreationOptions.AttachedToParent property, but I think it will not fit:
The server will have something like 8 cores at least, and each task will create 2-3 subtask, so if I set the AttachedToParent option, I've the impression that the second sub-task will not start before the three tasks of the first subtask ends. So I will have a limited multitasking here.
So with this process tree:
I've the impression that if I set AttachedToParent property everytime I launch a thread, B will not ends before E,F,G are finished, so C will start before B finish, and I will have only 3 actives thread instead of the 8 I can have.
If I don't put the AttachedToParent property, A will be finished very fast and return.
So how could I do to ensure that I've always my 8 cores fully used if I don't set this option?
The TaskCreationOptions.AttachedToParent does not prevent the other subtasks from starting, but rather prevents the parent task itself from closing. So when E,F and G are started with AttachedToParent, B is not flagged as finished until all three are finished. So it should do just as you want.
The source (in the accepted answer).
As Me.Name mentioned, AttachedToParent doesn't behave according to your impressions. I think it's a fine option in this case.
But if you don't want to use that for whatever reason, you can wait for all the child tasks to finish with Task.WaitAll(). Although it means you have to have all of them in a collection.
Task.WaitAll() blocks the current thread until all the Tasks are finished. If you don't want that and you are on .Net 4.5, you can use Task.WhenAll(), which will return a single Task that will finish when all of the given Tasks finish.
You could you TaskFactory create options like in this example:
Task parent = new Task(() => {
var cts = new CancellationTokenSource();
var tf = new TaskFactory<Int32>(cts.Token,
TaskCreationOptions.AttachedToParent,
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
// This tasks creates and starts 3 child tasks
var childTasks = new[] {
tf.StartNew(() => Sum(cts.Token, 10000)),
tf.StartNew(() => Sum(cts.Token, 20000)),
tf.StartNew(() => Sum(cts.Token, Int32.MaxValue)) // Too big, throws Overflow
};
// If any of the child tasks throw, cancel the rest of them
for (Int32 task = 0; task <childTasks.Length; task++)
childTasks[task].ContinueWith(
t => cts.Cancel(), TaskContinuationOptions.OnlyOnFaulted);
// When all children are done, get the maximum value returned from the
// non-faulting/canceled tasks. Then pass the maximum value to another
// task which displays the maximum result
tf.ContinueWhenAll(
childTasks,
completedTasks => completedTasks.Where(
t => !t.IsFaulted && !t.IsCanceled).Max(t => t.Result), CancellationToken.None)
.ContinueWith(t =>Console.WriteLine("The maximum is: " + t.Result),
TaskContinuationOptions.ExecuteSynchronously);
});
// When the children are done, show any unhandled exceptions too
parent.ContinueWith(p => {
// I put all this text in a StringBuilder and call Console.WriteLine just once
// because this task could execute concurrently with the task above & I don't
// want the tasks' output interspersed
StringBuildersb = new StringBuilder(
"The following exception(s) occurred:" + Environment.NewLine);
foreach (var e in p.Exception.Flatten().InnerExceptions)
sb.AppendLine(" "+ e.GetType().ToString());
Console.WriteLine(sb.ToString());
}, TaskContinuationOptions.OnlyOnFaulted);
// Start the parent Task so it can start its children
parent.Start();

Categories