Parallel foreach with asynchronous lambda - c#

I would like to handle a collection in parallel, but I'm having trouble implementing it and I'm therefore hoping for some help.
The trouble arises if I want to call a method marked async in C#, within the lambda of the parallel loop. For example:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
var count = bag.Count;
The problem occurs with the count being 0, because all the threads created are effectively just background threads and the Parallel.ForEach call doesn't wait for completion. If I remove the async keyword, the method looks like this:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
});
var count = bag.Count;
It works, but it completely disables the await cleverness, and I have to do some manual exception handling (removed for brevity).
How can I implement a Parallel.ForEach loop, that uses the await keyword within the lambda? Is it possible?
The prototype of the Parallel.ForEach method takes an Action<T> as parameter, but I want it to wait for my asynchronous lambda.

If you just want simple parallelism, you can do this:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
If you need something more complex, check out Stephen Toub's ForEachAsync post.
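For anyone who wants to try the snippet above as-is, here is a minimal runnable sketch of the same pattern as a top-level program. The GetData stub and the integer collection are assumptions made up for the demo; they stand in for whatever your real async call is.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's GetData: simulates an I/O-bound call.
static async Task<int> GetData(int item)
{
    await Task.Delay(50);
    return item * 2;
}

var myCollection = Enumerable.Range(1, 10);
var bag = new ConcurrentBag<int>();

// Select starts one task per item. The async lambda returns a Task (not async void),
// so Task.WhenAll can observe both completion and exceptions.
var tasks = myCollection.Select(async item =>
{
    var response = await GetData(item);
    bag.Add(response);
});

await Task.WhenAll(tasks);
Console.WriteLine(bag.Count); // prints 10
```

Note that this starts all items at once; there is no throttling.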

You can use the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.

One of the new .NET 6 APIs is Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
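// The raw URL is not a valid file name, so escape it first.
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", Uri.EscapeDataString(url));
// Pass the token along so cancellation reaches the HTTP call and the copy.
var response = await client.GetAsync(url, token);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target, token);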
}
});
Another example in Scott Hanselman's blog.
The source, for reference.
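If you want to convince yourself that MaxDegreeOfParallelism is respected without touching the network, a small sketch like the following may help. It assumes .NET 6 or later; the Task.Delay is a stand-in for real async I/O, and the counters are only there for the demo.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
var gate = new object();
int running = 0, peak = 0, processed = 0;

await Parallel.ForEachAsync(Enumerable.Range(1, 8), options, async (item, token) =>
{
    // Track how many bodies are in flight at the same time.
    lock (gate) { running++; peak = Math.Max(peak, running); }

    await Task.Delay(50, token); // stand-in for real async I/O

    lock (gate) { running--; processed++; }
});

Console.WriteLine($"processed={processed}, peak={peak}");
```

The reported peak should never exceed the configured limit of 2.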

With SemaphoreSlim you can achieve parallelism control.
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
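Here is a runnable sketch of the same throttling idea that also records the peak concurrency, so you can check the limit holds. The Task.Delay is an assumed stand-in for GetData, and the limit of 3 is arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maxParallel = 3;
var throttler = new SemaphoreSlim(maxParallel);
var gate = new object();
int running = 0, peak = 0;

var tasks = Enumerable.Range(1, 12).Select(async item =>
{
    await throttler.WaitAsync();
    try
    {
        lock (gate) { running++; peak = Math.Max(peak, running); }
        await Task.Delay(30); // stand-in for GetData(item)
        lock (gate) { running--; }
    }
    finally
    {
        throttler.Release();
    }
}).ToList(); // materialize so that every task is created exactly once

await Task.WhenAll(tasks);
Console.WriteLine($"tasks={tasks.Count}, peak={peak}");
```

Keep in mind that all 12 tasks exist from the start; the semaphore only limits how many are past the WaitAsync call at a time.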

Simplest possible extension method compiled from other answers and the article referenced by the accepted answer:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
UPDATE: here's a simple modification that also supports a cancellation token like requested in the comments (untested)
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
try
{
// Check inside the try block so the semaphore is released even on an early return.
if (cancellationToken.IsCancellationRequested) return;
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
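To see what cancellation looks like from the caller's side, here is a hedged sketch. The helper name ForEachThrottledAsync is made up for the demo (it is written as a local function rather than an extension method so the whole thing runs as a top-level program), and the timings are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, modeled on the cancellable variant above.
static async Task ForEachThrottledAsync<T>(IEnumerable<T> source,
    Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism,
    CancellationToken cancellationToken)
{
    using var throttler = new SemaphoreSlim(maxDegreeOfParallelism);
    var tasks = source.Select(async item =>
    {
        // Throws OperationCanceledException if cancelled while waiting for a slot.
        await throttler.WaitAsync(cancellationToken);
        try
        {
            await asyncAction(item, cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            throttler.Release();
        }
    }).ToList();
    await Task.WhenAll(tasks);
}

using var cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromMilliseconds(50));
int completed = 0;
bool cancelled = false;

try
{
    await ForEachThrottledAsync(Enumerable.Range(1, 20), async (item, token) =>
    {
        await Task.Delay(500, token); // stand-in for a slow async call
        Interlocked.Increment(ref completed);
    }, maxDegreeOfParallelism: 2, cancellationToken: cts.Token);
}
catch (OperationCanceledException)
{
    cancelled = true;
}

Console.WriteLine($"cancelled={cancelled}, completed={completed}");
```

In this sketch the caller sees an OperationCanceledException rather than a silent early return; whether you prefer that is a design choice.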

My lightweight implementation of ParallelForEach async.
Features:
Throttling (max degree of parallelism).
Exception handling (an AggregateException is thrown at completion).
Memory efficient (no need to store the list of tasks).
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>();
var exceptions = new ConcurrentBag<Exception>();
bool addingCompleted = false;
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
asyncAction(item).ContinueWith(t =>
{
semaphoreSlim.Release();
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
if (Volatile.Read(ref addingCompleted) && semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
});
}
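Volatile.Write(ref addingCompleted, true);
// Covers an empty source, and the case where every action finished before the flag was set.
if (semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
await tcs.Task;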
if (exceptions.Count > 0)
{
throw new AggregateException(exceptions);
}
}
}
Usage example:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);

I've created an extension method for this which makes use of SemaphoreSlim and also lets you set the maximum degree of parallelism:
/// <summary>
/// Concurrently executes an async action for each item of an <see cref="IEnumerable{T}"/>.
/// </summary>
/// <typeparam name="T">Type of the items in the sequence</typeparam>
/// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
/// <param name="action">An async delegate to execute for each item</param>
/// <param name="maxDegreeOfParallelism">Optional. An integer that represents the maximum degree of parallelism.
/// Must be greater than 0.</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If <paramref name="maxDegreeOfParallelism"/> is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
try
{
// Awaiting the action directly lets its exceptions reach Task.WhenAll.
await action(item);
}
finally
{
// The action has finished (or failed), so decrement the number of currently running tasks.
semaphoreSlim.Release();
}
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);

In the accepted answer the ConcurrentBag is not required.
Here's an implementation without it:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
Any of the "// some pre stuff" and "// some post stuff" can go into the GetData implementation (or another method that calls GetData)
Aside from being shorter, this avoids the "async void" lambda that Parallel.ForEach produces in the question, which is an anti-pattern.
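A related detail: the generic Task.WhenAll overload already hands back the results as an array, in the order of the input tasks, so you do not need to read t.Result yourself. A small sketch, with a made-up GetData in which later items finish sooner:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical GetData: later items finish sooner, to show that order is still kept.
static async Task<string> GetData(int item)
{
    await Task.Delay(100 - item * 20);
    return $"item-{item}";
}

var myCollection = new[] { 1, 2, 3, 4 };

// Task.WhenAll<TResult> returns the results in the order of the input tasks,
// regardless of the order in which they complete.
string[] results = await Task.WhenAll(myCollection.Select(item => GetData(item)));

Console.WriteLine(string.Join(", ", results));
// prints: item-1, item-2, item-3, item-4
```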

The following is set to work with IAsyncEnumerable but can be modified to use IEnumerable by just changing the type and removing the "await" on the foreach. It's far more appropriate for large sets of data than creating countless parallel tasks and then awaiting them all.
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
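For reference, the IEnumerable modification mentioned above could look roughly like this. It is a sketch, not a drop-in replacement: it is written as a local function so it runs as a top-level program, and it stops feeding the block once SendAsync reports that the block no longer accepts input, which the original does not do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static async Task ForEachAsyncConcurrent<T>(IEnumerable<T> source, Func<T, Task> action,
    int maxDegreeOfParallelism, int? boundedCapacity = null)
{
    var block = new ActionBlock<T>(action, new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = maxDegreeOfParallelism,
        BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
    });

    foreach (T item in source)
    {
        // SendAsync waits while the block is full. It returns false once the block
        // has stopped accepting input (for example after the action threw).
        if (!await block.SendAsync(item).ConfigureAwait(false)) break;
    }

    block.Complete();
    await block.Completion.ConfigureAwait(false); // rethrows a failure from the action
}

int sum = 0;
await ForEachAsyncConcurrent(Enumerable.Range(1, 100), async i =>
{
    await Task.Delay(1);
    Interlocked.Add(ref sum, i);
}, maxDegreeOfParallelism: 4);

Console.WriteLine(sum); // prints 5050
```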

For a simpler solution (not sure if it is the most optimal), you can simply nest Parallel.ForEach inside a Task, like this:
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
Task.Run(() =>
{
Parallel.ForEach(myCollection, options, item =>
{
DoWork(item);
});
});
The ParallelOptions will do the throttling for you, out of the box.
I am using it in a real-world scenario to run very long operations in the background. These operations are triggered via HTTP, and the design avoids blocking the HTTP call while the long operation is running:
An HTTP call requests the long background operation.
The operation starts in the background.
The caller gets a status ID, which can be used to check the status with another HTTP call.
The background operation updates its status.
That way, the CI/CD call does not time out because of a long HTTP operation; instead, it polls the status every x seconds without blocking the process.

Related

C# Wait for Async Loop to Finish Not Working [duplicate]

In a metro app, I need to execute a number of WCF calls. There are a significant number of calls to be made, so I need to do them in a parallel loop. The problem is that the parallel loop exits before the WCF calls are all complete.
How would you refactor this to work as expected?
var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customers = new System.Collections.Concurrent.BlockingCollection<Customer>();
Parallel.ForEach(ids, async i =>
{
ICustomerRepo repo = new CustomerRepo();
var cust = await repo.GetCustomer(i);
customers.Add(cust);
});
foreach ( var customer in customers )
{
Console.WriteLine(customer.ID);
}
Console.ReadKey();
The whole idea behind Parallel.ForEach() is that you have a set of threads and each thread processes part of the collection. As you noticed, this doesn't work with async-await, where you want to release the thread for the duration of the async call.
You could “fix” that by blocking the ForEach() threads, but that defeats the whole point of async-await.
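To make the early exit concrete, here is a small sketch you can run. The Task.Delay stands in for an async call such as GetCustomer, and the timings are arbitrary assumptions chosen to make the effect easy to see.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int completed = 0;

// The async lambda is converted to Action<int>, i.e. async void: Parallel.ForEach
// regards each iteration as done at the first await, not when the work finishes.
Parallel.ForEach(Enumerable.Range(1, 5), async i =>
{
    await Task.Delay(500); // stand-in for an async call such as GetCustomer
    Interlocked.Increment(ref completed);
});

int seenAfterLoop = Volatile.Read(ref completed);
Console.WriteLine($"completed right after the loop: {seenAfterLoop} of 5");

// Only for the demo: give the orphaned continuations time to finish.
await Task.Delay(1500);
Console.WriteLine($"completed after waiting: {Volatile.Read(ref completed)} of 5");
```

The first count is read while the delays are still pending, which is exactly the situation described in the question.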
What you could do is to use TPL Dataflow instead of Parallel.ForEach(), which supports asynchronous Tasks well.
Specifically, your code could be written using a TransformBlock that transforms each id into a Customer using the async lambda. This block can be configured to execute in parallel. You would link that block to an ActionBlock that writes each Customer to the console.
After you set up the block network, you can Post() each id to the TransformBlock.
In code:
var ids = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var getCustomerBlock = new TransformBlock<string, Customer>(
async i =>
{
ICustomerRepo repo = new CustomerRepo();
return await repo.GetCustomer(i);
}, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
});
var writeCustomerBlock = new ActionBlock<Customer>(c => Console.WriteLine(c.ID));
getCustomerBlock.LinkTo(
writeCustomerBlock, new DataflowLinkOptions
{
PropagateCompletion = true
});
foreach (var id in ids)
getCustomerBlock.Post(id);
getCustomerBlock.Complete();
writeCustomerBlock.Completion.Wait();
You will probably want to limit the parallelism of the TransformBlock to some small constant, though. Also, you could limit the capacity of the TransformBlock and add the items to it asynchronously using SendAsync(), for example if the collection is too big.
An added benefit compared to your code (if it worked) is that the writing will start as soon as a single item is finished, instead of waiting until all of the processing is finished.
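A sketch of the bounded variant described above, assuming a made-up GetCustomerName stub in place of repo.GetCustomer, with arbitrary limits of 4 and 8:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for repo.GetCustomer(id).
static async Task<string> GetCustomerName(string id)
{
    await Task.Delay(20);
    return "customer-" + id;
}

var ids = Enumerable.Range(1, 10).Select(n => n.ToString()).ToList();
var written = new List<string>();

var getCustomerBlock = new TransformBlock<string, string>(
    id => GetCustomerName(id),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 4, // a small constant instead of Unbounded
        BoundedCapacity = 8         // limits how many items the block holds at once
    });

// ActionBlock defaults to MaxDegreeOfParallelism = 1, so the list needs no lock.
var writeCustomerBlock = new ActionBlock<string>(name => written.Add(name));

getCustomerBlock.LinkTo(writeCustomerBlock,
    new DataflowLinkOptions { PropagateCompletion = true });

foreach (var id in ids)
{
    // Unlike Post, SendAsync waits asynchronously while the block is full.
    await getCustomerBlock.SendAsync(id);
}

getCustomerBlock.Complete();
await writeCustomerBlock.Completion;

Console.WriteLine(string.Join(", ", written));
```

With a bounded block, Post would simply return false when the block is full, which is why SendAsync is used here.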
svick's answer is (as usual) excellent.
However, I find Dataflow to be more useful when you actually have large amounts of data to transfer. Or when you need an async-compatible queue.
In your case, a simpler solution is to just use the async-style parallelism:
var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customerTasks = ids.Select(i =>
{
ICustomerRepo repo = new CustomerRepo();
return repo.GetCustomer(i);
});
var customers = await Task.WhenAll(customerTasks);
foreach (var customer in customers)
{
Console.WriteLine(customer.ID);
}
Console.ReadKey();
Using DataFlow as svick suggested may be overkill, and Stephen's answer does not provide the means to control the concurrency of the operation. However, that can be achieved rather simply:
public static async Task RunWithMaxDegreeOfConcurrency<T>(
int maxDegreeOfConcurrency, IEnumerable<T> collection, Func<T, Task> taskFactory)
{
var activeTasks = new List<Task>(maxDegreeOfConcurrency);
foreach (var task in collection.Select(taskFactory))
{
activeTasks.Add(task);
if (activeTasks.Count == maxDegreeOfConcurrency)
{
await Task.WhenAny(activeTasks.ToArray());
//observe exceptions here
activeTasks.RemoveAll(t => t.IsCompleted);
}
}
await Task.WhenAll(activeTasks.ToArray()).ContinueWith(t =>
{
//observe exceptions in a manner consistent with the above
});
}
The ToArray() calls can be optimized by using an array instead of a list and replacing completed tasks, but I doubt it would make much of a difference in most scenarios. Sample usage per the OP's question:
await RunWithMaxDegreeOfConcurrency(10, ids, async i =>
{
ICustomerRepo repo = new CustomerRepo();
var cust = await repo.GetCustomer(i);
customers.Add(cust);
});
EDIT Fellow SO user and TPL wiz Eli Arbel pointed me to a related article from Stephen Toub. As usual, his implementation is both elegant and efficient:
public static Task ForEachAsync<T>(
this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current).ContinueWith(t =>
{
//observe exceptions
});
}));
}
You can save effort with the new AsyncEnumerator NuGet Package, which didn't exist 4 years ago when the question was originally posted. It allows you to control the degree of parallelism:
using System.Collections.Async;
...
await ids.ParallelForEachAsync(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var cust = await repo.GetCustomer(i);
customers.Add(cust);
},
maxDegreeOfParallelism: 10);
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.
Wrap the Parallel.ForEach into a Task.Run() and instead of the await keyword use [yourasyncmethod].Result
(you need to do the Task.Run thing to not block the UI thread)
Something like this:
var yourForeachTask = Task.Run(() =>
{
Parallel.ForEach(ids, i =>
{
ICustomerRepo repo = new CustomerRepo();
var cust = repo.GetCustomer(i).Result;
customers.Add(cust);
});
});
await yourForeachTask;
This should be pretty efficient, and easier than getting the whole TPL Dataflow working:
var customers = await ids.SelectAsync(async i =>
{
ICustomerRepo repo = new CustomerRepo();
return await repo.GetCustomer(i);
});
...
public static async Task<IList<TResult>> SelectAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector, int maxDegreesOfParallelism = 4)
{
var results = new List<TResult>();
var activeTasks = new HashSet<Task<TResult>>();
foreach (var item in source)
{
activeTasks.Add(selector(item));
if (activeTasks.Count >= maxDegreesOfParallelism)
{
var completed = await Task.WhenAny(activeTasks);
activeTasks.Remove(completed);
results.Add(completed.Result);
}
}
results.AddRange(await Task.WhenAll(activeTasks));
return results;
}
An extension method for this which makes use of SemaphoreSlim and also lets you set the maximum degree of parallelism:
/// <summary>
/// Concurrently executes an async action for each item of an <see cref="IEnumerable{T}"/>.
/// </summary>
/// <typeparam name="T">Type of the items in the sequence</typeparam>
/// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
/// <param name="action">An async delegate to execute for each item</param>
/// <param name="maxDegreeOfParallelism">Optional. An integer that represents the maximum degree of parallelism.
/// Must be greater than 0.</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If <paramref name="maxDegreeOfParallelism"/> is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
try
{
// Awaiting the action directly lets its exceptions reach Task.WhenAll.
await action(item);
}
finally
{
// The action has finished (or failed), so decrement the number of currently running tasks.
semaphoreSlim.Release();
}
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
I am a little late to the party, but you may want to consider using GetAwaiter().GetResult() to run your async code in a synchronous context, still in parallel, as below:
Parallel.ForEach(ids, i =>
{
ICustomerRepo repo = new CustomerRepo();
// Run this in thread which Parallel library occupied.
var cust = repo.GetCustomer(i).GetAwaiter().GetResult();
customers.Add(cust);
});
After introducing a bunch of helper methods, you will be able to run parallel queries with this simple syntax:
const int DegreeOfParallelism = 10;
IEnumerable<double> result = await Enumerable.Range(0, 1000000)
.Split(DegreeOfParallelism)
.SelectManyAsync(async i => await CalculateAsync(i).ConfigureAwait(false))
.ConfigureAwait(false);
What happens here is: we split source collection into 10 chunks (.Split(DegreeOfParallelism)), then run 10 tasks each processing its items one by one (.SelectManyAsync(...)) and merge those back into a single list.
Worth mentioning there is a simpler approach:
double[] result2 = await Enumerable.Range(0, 1000000)
.Select(async i => await CalculateAsync(i).ConfigureAwait(false))
.WhenAll()
.ConfigureAwait(false);
But it needs a precaution: if you have a source collection that is too big, it will schedule a Task for every item right away, which may cause significant performance hits.
Extension methods used in examples above look as follows:
public static class CollectionExtensions
{
/// <summary>
/// Splits collection into number of collections of nearly equal size.
/// </summary>
public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, int slicesCount)
{
if (slicesCount <= 0) throw new ArgumentOutOfRangeException(nameof(slicesCount));
List<T> source = src.ToList();
var sourceIndex = 0;
for (var targetIndex = 0; targetIndex < slicesCount; targetIndex++)
{
var list = new List<T>();
int itemsLeft = source.Count - targetIndex;
while (slicesCount * list.Count < itemsLeft)
{
list.Add(source[sourceIndex++]);
}
yield return list;
}
}
/// <summary>
/// Takes collection of collections, projects those in parallel and merges results.
/// </summary>
public static async Task<IEnumerable<TResult>> SelectManyAsync<T, TResult>(
this IEnumerable<IEnumerable<T>> source,
Func<T, Task<TResult>> func)
{
List<TResult>[] slices = await source
.Select(async slice => await slice.SelectListAsync(func).ConfigureAwait(false))
.WhenAll()
.ConfigureAwait(false);
return slices.SelectMany(s => s);
}
/// <summary>Runs selector and awaits results.</summary>
public static async Task<List<TResult>> SelectListAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector)
{
List<TResult> result = new List<TResult>();
foreach (TSource source1 in source)
{
TResult result1 = await selector(source1).ConfigureAwait(false);
result.Add(result1);
}
return result;
}
/// <summary>Wraps tasks with Task.WhenAll.</summary>
public static Task<TResult[]> WhenAll<TResult>(this IEnumerable<Task<TResult>> source)
{
return Task.WhenAll<TResult>(source);
}
}
The problem of parallelizing asynchronous operations has been solved with the introduction of the Parallel.ForEachAsync API in .NET 6, but people who are using older .NET platforms might still need a decent substitute. An easy way to implement one is to use an ActionBlock<T> component from the TPL Dataflow library. This library is included in the standard .NET libraries (.NET Core and .NET 5+), and available as a NuGet package for the .NET Framework. Here is how it can be used:
public static Task Parallel_ForEachAsync<T>(ICollection<T> source,
int maxDegreeOfParallelism, Func<T, Task> action)
{
var options = new ExecutionDataflowBlockOptions();
options.MaxDegreeOfParallelism = maxDegreeOfParallelism;
var block = new ActionBlock<T>(action, options);
foreach (var item in source) block.Post(item);
block.Complete();
return block.Completion;
}
This solution is only suitable for materialized source sequences, hence the type of the parameter is ICollection<T> instead of the more common IEnumerable<T>. It also has the surprising behavior of ignoring any OperationCanceledExceptions thrown by the action. Addressing these nuances and attempting to replicate precisely the features and behavior of the Parallel.ForEachAsync is doable, but it requires almost as much code as if more primitive tools were used. I've posted such an attempt in the 9th revision of this answer.
Below is a different attempt to implement the Parallel.ForEachAsync method, offering exactly the same features as the .NET 6 API, and mimicking its behavior as much as possible. It uses only basic TPL tools. The idea is to create a number of worker tasks equal to the desired MaxDegreeOfParallelism, with each task enumerating the same enumerator in a synchronized fashion. This is similar to how the Parallel.ForEachAsync is implemented internally. The difference is that the .NET 6 API starts with a single worker and progressively adds more, while the implementation below creates all the workers from the start:
public static Task Parallel_ForEachAsync<T>(IEnumerable<T> source,
ParallelOptions parallelOptions,
Func<T, CancellationToken, Task> body)
{
if (source == null) throw new ArgumentNullException("source");
if (parallelOptions == null) throw new ArgumentNullException("parallelOptions");
if (body == null) throw new ArgumentNullException("body");
int dop = parallelOptions.MaxDegreeOfParallelism;
if (dop < 0) dop = Environment.ProcessorCount;
CancellationToken cancellationToken = parallelOptions.CancellationToken;
TaskScheduler scheduler = parallelOptions.TaskScheduler ?? TaskScheduler.Current;
IEnumerator<T> enumerator = source.GetEnumerator();
CancellationTokenSource cts = CancellationTokenSource
.CreateLinkedTokenSource(cancellationToken);
var semaphore = new SemaphoreSlim(1, 1); // Synchronizes the enumeration
var workerTasks = new Task[dop];
for (int i = 0; i < dop; i++)
{
workerTasks[i] = Task.Factory.StartNew(async () =>
{
try
{
while (true)
{
if (cts.IsCancellationRequested)
{
cancellationToken.ThrowIfCancellationRequested();
break;
}
T item;
await semaphore.WaitAsync(); // Continue on captured context.
try
{
if (!enumerator.MoveNext()) break;
item = enumerator.Current;
}
finally { semaphore.Release(); }
await body(item, cts.Token); // Continue on captured context.
}
}
catch { cts.Cancel(); throw; }
}, CancellationToken.None, TaskCreationOptions.DenyChildAttach, scheduler)
.Unwrap();
}
return Task.WhenAll(workerTasks).ContinueWith(t =>
{
// Clean up
try { semaphore.Dispose(); cts.Dispose(); } finally { enumerator.Dispose(); }
return t;
}, CancellationToken.None, TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default).Unwrap();
}
There is a difference in the signature. The body parameter is of type Func<TSource, CancellationToken, Task> instead of Func<TSource, CancellationToken, ValueTask>. This is because value-tasks are a relatively recent feature, and are not available in .NET Framework.
There is also a difference in the behavior. This implementation reacts to OperationCanceledExceptions thrown by the body, by completing as canceled. The correct behavior would be to propagate these exceptions as individual errors, and complete as faulted. Fixing this minor flaw is doable, but I preferred to not complicate further this relatively short and readable implementation.
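For comparison, here is a minimal sketch of calling the built-in Parallel.ForEachAsync API on .NET 6 or later, which the answer above treats as the reference behavior. The GetDataAsync helper and the ForEachAsyncDemo class are hypothetical stand-ins introduced only for illustration; they are not part of the original question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncDemo
{
    // Hypothetical stand-in for an I/O-bound call such as GetData(item).
    private static async Task<int> GetDataAsync(int item, CancellationToken ct)
    {
        await Task.Delay(50, ct);
        return item * 2;
    }

    public static async Task<int> RunAsync()
    {
        var bag = new ConcurrentBag<int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        // Requires .NET 6 or later. The async lambda is converted to a
        // delegate that returns a ValueTask.
        await Parallel.ForEachAsync(Enumerable.Range(1, 20), options, async (item, ct) =>
        {
            int response = await GetDataAsync(item, ct);
            bag.Add(response);
        });

        // The await above completes only after every item has been processed.
        return bag.Count;
    }
}
```

Calling await ForEachAsyncDemo.RunAsync() should yield the number of processed items, because the awaited call does not return until all bodies have finished.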
Easy native way without TPL:
int totalThreads = 0; int maxThreads = 3;
foreach (var item in YouList)
{
while (totalThreads >= maxThreads) await Task.Delay(500);
Interlocked.Increment(ref totalThreads);
MyAsyncTask(item).ContinueWith((res) => Interlocked.Decrement(ref totalThreads));
}
You can test this solution with the following task:
async static Task MyAsyncTask(string item)
{
await Task.Delay(2500);
Console.WriteLine(item);
}

Avoiding await in foreach loop

I am trying to optimize this code to decrease the time taken to complete the foreach loop. In this case, CreateNotification() takes a long time, and using async/await does not improve performance because each asynchronous call is awaited before the next one starts. I would like to use Task.WhenAll() to optimize the code. How can I do this?
foreach (var notification in notificationsInput.Notifications)
{
try
{
var result = await CreateNotification(notification);
notification.Result = result;
}
catch (Exception exception)
{
notification.Result = null;
}
notifications.Add(notification);
}
You can call Select on the collection whose elements you want to process in parallel, passing an asynchronous delegate to it. This asynchronous delegate would return a Task for each element that's processed, so you could then call Task.WhenAll on all these tasks. The pattern is like so:
var tasks = collection.Select(async (x) => await ProcessAsync(x));
await Task.WhenAll(tasks);
For your example:
var tasks = notificationsInput.Notifications.Select(async (notification) =>
{
try
{
var result = await CreateNotification(notification);
notification.Result = result;
}
catch (Exception exception)
{
notification.Result = null;
}
});
await Task.WhenAll(tasks);
This assumes that CreateNotification is thread-safe.
Update
You will need to install the TPL Dataflow package to use this solution:
https://www.nuget.org/packages/System.Threading.Tasks.Dataflow/
Depending on what CreateNotification does and whether you want to run it in parallel, you could use a Dataflow ActionBlock. It gives you the best of both worlds for I/O-bound or mixed I/O/CPU-bound operations, letting you run the work asynchronously and in parallel:
public static async Task DoWorkLoads(NotificationsInput notificationsInput)
{
var options = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 50
};
var block = new ActionBlock<Notification>(MyMethodAsync, options);
foreach (var notification in notificationsInput.Notifications)
block.Post(notification);
block.Complete();
await block.Completion;
}
...
public static async Task MyMethodAsync(Notification notification)
{
var result = await CreateNotification(notification);
notification.Result = result;
}
Add pepper and salt to taste.
I think this ought to be equivalent to your code:
var notifications = new ConcurrentBag<Notification>();
var tasks = new List<Task>();
foreach (var notification in notificationsInput.Notifications)
{
var task = CreateNotification(notification)
.ContinueWith(t =>
{
if (t.Exception != null)
{
notification.Result = null;
}
else
{
notification.Result = t.Result;
}
notifications.Add(notification);
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
ContinueWith receives the completed or failed task from CreateNotification, and is itself a task. We add each ContinueWith task to a list and pass that list to WhenAll.
I'm using a ConcurrentBag for notifications so that you can add from multiple threads safely. If you want to turn this into a regular list, you can call var regularListNotifications = notifications.ToList(); (assuming you have a using for LINQ).

Await multiple async Task while setting max running task at a time

So I just started to try and understand async, Task, lambda and so on, and I am unable to get it to work like I want. With the code below I want it to lock btnDoWebRequest, do an unknown number of WebRequests as Tasks, and once all the Tasks are done unlock btnDoWebRequest. However, I only want a max of 3 (or whatever number I set) Tasks running at one time, which I got partly from Have a set of Tasks with only X running at a time.
But after trying and modifying my code in multiple ways, it will always immediately jump back and re-enable btnDoWebRequest. Of course VS is warning me about needing awaits, currently at ".ContinueWith((task)" and at the async in "await Task.WhenAll(requestInfoList .Select(async i =>", but I can't seem to work out where or how to put in the needed awaits. Of course, as I'm still learning, there is a good chance I am going at this all wrong and the whole thing needs to be reworked. So any help would be greatly appreciated.
Thanks
private SemaphoreSlim maxThread = new SemaphoreSlim(3);
private void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
Task.Factory.StartNew(async () => await DoWebRequest()).Wait();
btnDoWebRequest.Enabled = true;
}
private async Task DoWebRequest()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
for (int i = 0; dataRequestInfo.RowCount - 1 > i; i++)
{
requestInfoList.Add((requestInfo)dataRequestInfo.Rows[i].Tag);
}
await Task.WhenAll(requestInfoList .Select(async i =>
{
maxThread.Wait();
Task.Factory.StartNew(() =>
{
var task = Global.webRequestWork(i);
}, TaskCreationOptions.LongRunning).ContinueWith((task) => maxThread.Release());
}));
}
First, don't use Task.Factory.StartNew by default. In fact, this should be avoided in async code. If you need to execute code on a background thread, then use Task.Run.
In your case, there's no need to use Task.Run (or Task.Factory.StartNew).
Start at the lowest level and work your way up. You already have an asynchronous web-requesting method, which I'll rename to WebRequestWorkAsync to follow the Task-based Asynchronous Programming naming guidelines.
Next, throttle it by using the asynchronous APIs on SemaphoreSlim:
await maxThread.WaitAsync();
try
{
await Global.WebRequestWorkAsync(i);
}
finally
{
maxThread.Release();
}
Do that for each request info (note that no background thread is required):
private async Task DoWebRequestsAsync()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
for (int i = 0; dataRequestInfo.RowCount - 1 > i; i++)
{
requestInfoList.Add((requestInfo)dataRequestInfo.Rows[i].Tag);
}
await Task.WhenAll(requestInfoList.Select(async i =>
{
await maxThread.WaitAsync();
try
{
await Global.WebRequestWorkAsync(i);
}
finally
{
maxThread.Release();
}
}));
}
Finally, call this from your UI (again, no background thread is required):
private async void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
await DoWebRequestsAsync();
btnDoWebRequest.Enabled = true;
}
In summary, only use Task.Run when you need to; do not use Task.Factory.StartNew, and do not use Wait (use await instead). I have an async intro on my blog with more information.
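To check that the SemaphoreSlim throttling pattern above really caps concurrency, here is a small self-contained sketch. It assumes a hypothetical 20 ms Task.Delay in place of the real web request, and the ThrottleDemo class and its counters are illustrative names only. It records the highest number of operations that were in flight at once.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Returns the highest number of operations observed running at once.
    public static async Task<int> RunAsync(int itemCount, int maxConcurrency)
    {
        using var throttle = new SemaphoreSlim(maxConcurrency);
        var gate = new object();
        int running = 0;
        int peak = 0;

        var tasks = Enumerable.Range(0, itemCount).Select(async i =>
        {
            await throttle.WaitAsync();
            try
            {
                lock (gate) { running++; peak = Math.Max(peak, running); }
                await Task.Delay(20); // hypothetical stand-in for the web request
                lock (gate) { running--; }
            }
            finally
            {
                // Always free the slot, even if the operation throws.
                throttle.Release();
            }
        });

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

With ThrottleDemo.RunAsync(10, 3), the returned peak should never exceed 3, since at most three lambdas can be past WaitAsync at any moment.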
There are a couple of things wrong with your code:
Using Wait() on a Task is like running things synchronously, hence you only notice the UI reacting when everything is done and the button is re-enabled. You need to await an async method in order for it to truly run asynchronously. Moreover, if a method is doing I/O-bound work like a web request, spinning up a new thread-pool thread (using Task.Factory.StartNew) is redundant and a waste of resources.
Your button click event handler needs to be marked with async so you can await inside your method.
I've cleaned up your code a bit for clarity, using the new SemaphoreSlim WaitAsync and replaced your for with a LINQ query. You may only take the first two points and apply them to your code.
private SemaphoreSlim maxThread = new SemaphoreSlim(3);
private async void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
await DoWebRequest();
btnDoWebRequest.Enabled = true;
}
private async Task DoWebRequest()
{
// Assumes dataRequestInfo is a DataGridView. Its Rows collection is non-generic,
// hence the Cast; the new-row placeholder is skipped, as in the original loop.
var requestInfoList = dataRequestInfo.Rows
.Cast<DataGridViewRow>()
.Where(x => !x.IsNewRow)
.Select(x => x.Tag)
.Cast<requestInfo>();
var tasks = requestInfoList.Select(async i =>
{
await maxThread.WaitAsync();
try
{
await Global.webRequestWork(i);
}
finally
{
maxThread.Release();
}
});
await Task.WhenAll(tasks);
}
I have created an extension method for this.
It can be used like this:
var tt = new List<Func<Task>>()
{
() => Task.Delay(300), // Task.Delay can be replaced by your own asynchronous functionality, like calling the website
() => Task.Delay(800),
() => Task.Delay(250),
() => Task.Delay(1000),
() => Task.Delay(100),
() => Task.Delay(200),
};
await tt.WhenAll(3); // this will let 3 actions run at a time; if one ends, the next will start, until all are finished.
The extension method:
public static class TaskExtension
{
public static async Task WhenAll(this List<Func<Task>> actions, int threadCount)
{
var _throttler = new SemaphoreSlim(threadCount);
var _running = new List<Task>();
foreach (Func<Task> action in actions)
{
await _throttler.WaitAsync();
_running.Add(Task.Run(async () =>
{
try
{
await action();
}
finally
{
_throttler.Release();
}
}));
}
// Await the started tasks instead of blocking a thread on CountdownEvent.Wait().
await Task.WhenAll(_running);
}
}
We can easily achieve this using SemaphoreSlim. Extension method I've created:
/// <summary>
/// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>.
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
/// <param name="action">An async delegate to execute for each item</param>
/// <param name="maxActionsToRunInParallel">Optional, max number of actions to run in parallel.
/// Must be greater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxActionsToRunInParallel = null)
{
if (maxActionsToRunInParallel.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxActionsToRunInParallel.Value, maxActionsToRunInParallel.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
try
{
await action(item);
}
finally
{
// The action has finished (or failed), so free a slot for the next task.
semaphoreSlim.Release();
}
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);

Asynchronous Task.WhenAll with timeout

Is there a way in the new async dotnet 4.5 library to set a timeout on the Task.WhenAll method? I want to fetch several sources, and stop after say 5 seconds, and skip the sources that weren't finished.
You could combine the resulting Task with a Task.Delay() using Task.WhenAny():
await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(timeout));
If you want to harvest completed tasks in case of a timeout:
var completedResults =
tasks
.Where(t => t.Status == TaskStatus.RanToCompletion)
.Select(t => t.Result)
.ToList();
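As an illustration of the two snippets above working together, here is a minimal self-contained sketch. FetchAsync and the TimeoutHarvestDemo class are hypothetical names, and the delays are arbitrary values chosen so that one source is much slower than the timeout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TimeoutHarvestDemo
{
    // Hypothetical source that takes delayMs milliseconds to produce its name.
    private static async Task<string> FetchAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs);
        return name;
    }

    public static async Task<List<string>> RunAsync(TimeSpan timeout)
    {
        Task<string>[] tasks =
        {
            FetchAsync("fast", 10),
            FetchAsync("medium", 50),
            FetchAsync("slow", 5000),
        };

        // Completes when all tasks are done or when the timeout elapses,
        // whichever comes first.
        await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(timeout));

        // Keep only the tasks that finished successfully in time.
        return tasks
            .Where(t => t.Status == TaskStatus.RanToCompletion)
            .Select(t => t.Result)
            .ToList();
    }
}
```

Note that the slow operation is not cancelled by this approach; it keeps running in the background after the timeout, and only its result is skipped.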
I think a clearer, more robust option that also does exception handling right would be to use Task.WhenAny on each task together with a timeout task, go through all the completed tasks and filter out the timeout ones, and use await Task.WhenAll() instead of Task.Result to gather all the results.
Here's a complete working solution:
static async Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks, TimeSpan timeout)
{
var timeoutTask = Task.Delay(timeout).ContinueWith(_ => default(TResult));
var completedTasks =
(await Task.WhenAll(tasks.Select(task => Task.WhenAny(task, timeoutTask)))).
Where(task => task != timeoutTask);
return await Task.WhenAll(completedTasks);
}
Check out the "Early Bailout" and "Task.Delay" sections from Microsoft's Consuming the Task-based Asynchronous Pattern.
Early bailout. An operation represented by t1 can be grouped in a
WhenAny with another task t2, and we can wait on the WhenAny task. t2
could represent a timeout, or cancellation, or some other signal that
will cause the WhenAny task to complete prior to t1 completing.
What you describe seems like a very common requirement, but I could not find an example of it anywhere, and I searched a lot. I finally created the following:
TimeSpan timeout = TimeSpan.FromSeconds(5.0);
Task<Task>[] tasksOfTasks =
{
Task.WhenAny(SomeTaskAsync("a"), Task.Delay(timeout)),
Task.WhenAny(SomeTaskAsync("b"), Task.Delay(timeout)),
Task.WhenAny(SomeTaskAsync("c"), Task.Delay(timeout))
};
Task[] completedTasks = await Task.WhenAll(tasksOfTasks);
List<MyResult> results = completedTasks.OfType<Task<MyResult>>().Select(task => task.Result).ToList();
I assume here a method SomeTaskAsync that returns Task<MyResult>.
From the members of completedTasks, only tasks of type Task<MyResult> are our own tasks that managed to beat the clock. Task.Delay returns a different type.
This requires some compromise on typing, but it still works beautifully and is quite simple.
(The array can of course be built dynamically using a query + ToArray).
Note that this implementation does not require SomeTaskAsync to receive a cancellation token.
In addition to timeout, I also check the cancellation which is useful if you are building a web app.
public static async Task WhenAll(
IEnumerable<Task> tasks,
int millisecondsTimeOut,
CancellationToken cancellationToken)
{
// The helper tasks are not wrapped in using blocks: Task.Dispose throws
// InvalidOperationException when called on a task that has not completed.
Task timeoutTask = Task.Delay(millisecondsTimeOut);
Task cancellationMonitorTask = Task.Delay(-1, cancellationToken);
Task completedTask = await Task.WhenAny(
Task.WhenAll(tasks),
timeoutTask,
cancellationMonitorTask
);
if (completedTask == timeoutTask)
{
throw new TimeoutException();
}
if (completedTask == cancellationMonitorTask)
{
throw new OperationCanceledException(cancellationToken);
}
await completedTask;
}
Check out a custom task combinator proposed in http://tutorials.csharp-online.net/Task_Combinators
async static Task<TResult> WithTimeout<TResult>
(this Task<TResult> task, TimeSpan timeout)
{
Task winner = await (Task.WhenAny
(task, Task.Delay (timeout)));
if (winner != task) throw new TimeoutException();
return await task; // Unwrap result/re-throw
}
I have not tried it yet.
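A possible refinement of that combinator, sketched under the assumption that you do not want the timer behind Task.Delay to stay alive after the task has already won the race: pass a cancellation token to Task.Delay and cancel it once the task completes first. The WithTimeoutAsync name and the TaskTimeoutExtensions class are my own illustrative choices, not taken from the linked tutorial.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskTimeoutExtensions
{
    public static async Task<TResult> WithTimeoutAsync<TResult>(
        this Task<TResult> task, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        Task delay = Task.Delay(timeout, cts.Token);

        Task winner = await Task.WhenAny(task, delay);
        if (winner != task) throw new TimeoutException();

        cts.Cancel();      // The task won, so stop the pending delay.
        return await task; // Unwrap the result or rethrow the task's exception.
    }
}
```

Usage would look like var data = await GetDataAsync().WithTimeoutAsync(TimeSpan.FromSeconds(5)); where GetDataAsync is a placeholder for any method returning Task<TResult>. As with the original combinator, a timeout does not stop the underlying operation; it only stops waiting for it.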
Here is a void-result version of @i3arnon's answer, with comments added and the first argument changed to an extension-method this parameter.
I've also got a forwarding method specifying timeout as an int using TimeSpan.FromMilliseconds(millisecondsTimeout) to match other Task methods.
public static async Task WhenAll(this IEnumerable<Task> tasks, TimeSpan timeout)
{
// Create a timeout task.
var timeoutTask = Task.Delay(timeout);
// Get the completed tasks made up of...
var completedTasks =
(
// ...all tasks specified
await Task.WhenAll(tasks
// Now finish when its task has finished or the timeout task finishes
.Select(task => Task.WhenAny(task, timeoutTask)))
)
// ...but not the timeout task
.Where(task => task != timeoutTask);
// And wait for the internal WhenAll to complete.
await Task.WhenAll(completedTasks);
}
Seems like the Task.WaitAll overload with the timeout parameter is all you need - if it returns true, then you know they all completed - otherwise, you can filter on IsCompleted.
if (Task.WaitAll(tasks, myTimeout) == false)
{
tasks = tasks.Where(t => t.IsCompleted).ToArray();
}
...
I came to the following piece of code that does what I needed:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Net.Http;
using System.Json;
using System.Threading;
namespace MyAsync
{
class Program
{
static void Main(string[] args)
{
var cts = new CancellationTokenSource();
Console.WriteLine("Start Main");
List<Task<List<MyObject>>> listoftasks = new List<Task<List<MyObject>>>();
listoftasks.Add(GetGoogle(cts));
listoftasks.Add(GetTwitter(cts));
listoftasks.Add(GetSleep(cts));
listoftasks.Add(GetxSleep(cts));
List<MyObject>[] arrayofanswers = Task.WhenAll(listoftasks).Result;
List<MyObject> answer = new List<MyObject>();
foreach (List<MyObject> answers in arrayofanswers)
{
answer.AddRange(answers);
}
foreach (MyObject o in answer)
{
Console.WriteLine("{0} - {1}", o.name, o.origin);
}
Console.WriteLine("Press <Enter>");
Console.ReadLine();
}
static async Task<List<MyObject>> GetGoogle(CancellationTokenSource cts)
{
try
{
Console.WriteLine("Start GetGoogle");
List<MyObject> l = new List<MyObject>();
var client = new HttpClient();
Task<HttpResponseMessage> awaitable = client.GetAsync("http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=broersa", cts.Token);
HttpResponseMessage res = await awaitable;
Console.WriteLine("After GetGoogle GetAsync");
dynamic data = JsonValue.Parse(res.Content.ReadAsStringAsync().Result);
Console.WriteLine("After GetGoogle ReadAsStringAsync");
foreach (var r in data.responseData.results)
{
l.Add(new MyObject() { name = r.titleNoFormatting, origin = "google" });
}
return l;
}
catch (TaskCanceledException)
{
return new List<MyObject>();
}
}
static async Task<List<MyObject>> GetTwitter(CancellationTokenSource cts)
{
try
{
Console.WriteLine("Start GetTwitter");
List<MyObject> l = new List<MyObject>();
var client = new HttpClient();
Task<HttpResponseMessage> awaitable = client.GetAsync("http://search.twitter.com/search.json?q=broersa&rpp=5&include_entities=true&result_type=mixed",cts.Token);
HttpResponseMessage res = await awaitable;
Console.WriteLine("After GetTwitter GetAsync");
dynamic data = JsonValue.Parse(res.Content.ReadAsStringAsync().Result);
Console.WriteLine("After GetTwitter ReadAsStringAsync");
foreach (var r in data.results)
{
l.Add(new MyObject() { name = r.text, origin = "twitter" });
}
return l;
}
catch (TaskCanceledException)
{
return new List<MyObject>();
}
}
static async Task<List<MyObject>> GetSleep(CancellationTokenSource cts)
{
try
{
Console.WriteLine("Start GetSleep");
List<MyObject> l = new List<MyObject>();
await Task.Delay(5000,cts.Token);
l.Add(new MyObject() { name = "Slept well", origin = "sleep" });
return l;
}
catch (TaskCanceledException)
{
return new List<MyObject>();
}
}
static async Task<List<MyObject>> GetxSleep(CancellationTokenSource cts)
{
Console.WriteLine("Start GetxSleep");
List<MyObject> l = new List<MyObject>();
await Task.Delay(2000);
cts.Cancel();
l.Add(new MyObject() { name = "Slept short", origin = "xsleep" });
return l;
}
}
}
My explanation is in my blogpost:
http://blog.bekijkhet.com/2012/03/c-async-examples-whenall-whenany.html
In addition to svick's answer, the following works for me when I have to wait for a couple of tasks to complete but have to process something else while I'm waiting:
Task[] TasksToWaitFor = //Your tasks
TimeSpan Timeout = TimeSpan.FromSeconds( 30 );
while( true )
{
await Task.WhenAny( Task.WhenAll( TasksToWaitFor ), Task.Delay( Timeout ) );
if( TasksToWaitFor.All( a => a.IsCompleted ) )
break;
//Do something else here
}
You can use the following code:
var timeoutTime = 10;
var tasksResult = await Task.WhenAll(
listOfTasks.Select(x => Task.WhenAny(
x, Task.Delay(TimeSpan.FromMinutes(timeoutTime)))
)
);
var succeededtasksResponses = tasksResult
.OfType<Task<MyResult>>()
.Select(task => task.Result);
if (succeededtasksResponses.Count() != listOfTasks.Count())
{
// Not all tasks were completed
// Throw error or do whatever you want
}
//You can use the succeededtasksResponses that contains the list of successful responses
How it works:
Set timeoutTime to the maximum time allowed for the tasks to complete (the example passes it to TimeSpan.FromMinutes, so the unit is minutes). Each task is then waited on for at most that long. If every task returns its result in time, the timeout does not occur and tasksResult contains all of them.
After that we are only getting the completed tasks. The tasks that were not completed will have no results.
I tried to improve on the excellent i3arnon's solution, in order to fix some minor issues, but I ended up with a completely different implementation. The two issues that I tried to solve are:
In case more than one of the tasks fail, propagate the errors of all failed tasks, and not just the error of the first failed task in the list.
Prevent memory leaks in case all tasks complete much faster than the timeout.
Leaking an active Task.Delay might result in a non-negligible amount of leaked memory, in case the WhenAll is called in a loop, and the timeout is large.
On top of that I added a cancellationToken argument, XML documentation that explains what this method is doing, and argument validation. Here it is:
/// <summary>
/// Returns a task that will complete when all of the tasks have completed,
/// or when the timeout has elapsed, or when the token is canceled, whatever
/// comes first. In case the tasks complete first, the task contains the
/// results/exceptions of all the tasks. In case the timeout elapsed first,
/// the task contains the results/exceptions of the completed tasks only.
/// In case the token is canceled first, the task is canceled. To determine
/// whether a timeout has occurred, compare the number of the results with
/// the number of the tasks.
/// </summary>
public static Task<TResult[]> WhenAll<TResult>(
Task<TResult>[] tasks,
TimeSpan timeout, CancellationToken cancellationToken = default)
{
if (tasks == null) throw new ArgumentNullException(nameof(tasks));
tasks = tasks.ToArray(); // Defensive copy
if (tasks.Any(t => t == null)) throw new ArgumentException(
$"The {nameof(tasks)} argument included a null value.", nameof(tasks));
if (timeout < TimeSpan.Zero && timeout != Timeout.InfiniteTimeSpan)
throw new ArgumentOutOfRangeException(nameof(timeout));
if (cancellationToken.IsCancellationRequested)
return Task.FromCanceled<TResult[]>(cancellationToken);
var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
cts.CancelAfter(timeout);
var continuationOptions = TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously;
var continuations = tasks.Select(task => task.ContinueWith(_ => { },
cts.Token, continuationOptions, TaskScheduler.Default));
return Task.WhenAll(continuations).ContinueWith(allContinuations =>
{
cts.Dispose();
if (allContinuations.IsCompletedSuccessfully)
return Task.WhenAll(tasks); // No timeout or cancellation occurred
Debug.Assert(allContinuations.IsCanceled);
if (cancellationToken.IsCancellationRequested)
return Task.FromCanceled<TResult[]>(cancellationToken);
// Now we know that timeout has occurred
return Task.WhenAll(tasks.Where(task => task.IsCompleted));
}, default, continuationOptions, TaskScheduler.Default).Unwrap();
}
This WhenAll implementation elides async and await, which is not advisable in general. In this case it is necessary, in order to propagate all the errors in a not nested AggregateException. The intention is to simulate the behavior of the built-in Task.WhenAll method as accurately as possible.
Usage example:
string[] results;
Task<string[]> whenAllTask = WhenAll(tasks, TimeSpan.FromSeconds(15));
try
{
results = await whenAllTask;
}
catch when (whenAllTask.IsFaulted) // It might also be canceled
{
// Log all errors
foreach (var innerEx in whenAllTask.Exception.InnerExceptions)
{
_logger.LogError(innerEx, innerEx.Message);
}
throw; // Propagate the error of the first failed task
}
if (results.Length < tasks.Length) throw new TimeoutException();
return results;
Note: the above API has a design flaw. In case at least one of the tasks has failed or has been canceled, there is no way to determine whether a timeout has occurred. The Exception.InnerExceptions property of the task returned by the WhenAll may contain the exceptions of all tasks, or part of the tasks, and there is no way to say which is which. Unfortunately I can't think of a solution to this problem.
