I was looking at someone's sample code for async and noticed a few issues with the way it was implemented. Whilst looking at the code I wondered if it would be more efficient to loop through a list using AsParallel, rather than just looping through the list normally.
As far as I can tell there is very little difference in performance: both use up every processor, and both take around the same amount of time to complete.
This is the first way of doing it
var tasks= Client.GetClients().Select(async p => await p.Initialize());
And this is the second
var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
Am I correct in assuming there is no difference between the two?
The full program can be found below
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            RunCode1();
            Console.WriteLine("Here");
            Console.ReadLine();
            RunCode2();
            Console.WriteLine("Here");
            Console.ReadLine();
        }

        private async static void RunCode1()
        {
            Stopwatch myStopWatch = new Stopwatch();
            myStopWatch.Start();
            var tasks = Client.GetClients().Select(async p => await p.Initialize());
            Task.WaitAll(tasks.ToArray());
            Console.WriteLine("Time elapsed(ms): " + myStopWatch.ElapsedMilliseconds);
            myStopWatch.Stop();
        }

        private async static void RunCode2()
        {
            Stopwatch myStopWatch = new Stopwatch();
            myStopWatch.Start();
            var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
            Task.WaitAll(tasks.ToArray());
            Console.WriteLine("Time elapsed(ms): " + myStopWatch.ElapsedMilliseconds);
            myStopWatch.Stop();
        }
    }

    class Client
    {
        public static IEnumerable<Client> GetClients()
        {
            for (int i = 0; i < 100; i++)
            {
                yield return new Client() { Id = Guid.NewGuid() };
            }
        }

        public Guid Id { get; set; }

        //This method has to be called before you use a client
        //For the sample, I don't put it on the constructor
        public async Task Initialize()
        {
            await Task.Factory.StartNew(() =>
            {
                Stopwatch timer = new Stopwatch();
                timer.Start();
                while (timer.ElapsedMilliseconds < 1000)
                { }
                timer.Stop();
            });
            Console.WriteLine("Completed: " + Id);
        }
    }
}
There should be very little discernible difference.
In your first case:
var tasks = Client.GetClients().Select(async p => await p.Initialize());
The executing thread will (one at a time) start executing Initialize for each element in the client list. Initialize immediately queues a method to the thread pool and returns an uncompleted Task.
In your second case:
var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
The executing thread will fork to the thread pool and (in parallel) start executing Initialize for each element in the client list. Initialize has the same behavior: it immediately queues a method to the thread pool and returns.
The two timings are nearly identical because you're only parallelizing a small amount of code: the queueing of the method to the thread pool and the return of an uncompleted Task.
If Initialize did some longer (synchronous) work before its first await, it may make sense to use AsParallel.
Remember, all async methods (and lambdas) start out being executed synchronously (see the official FAQ or my own intro post).
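For illustration, here is a hypothetical variant of Initialize (not from the question) that does some CPU-bound work before its first await. With a method shaped like this, the AsParallel() version would actually buy you something, because the pre-await part would no longer run one element at a time on the calling thread:
public async Task Initialize()
{
    // hypothetical synchronous, CPU-bound setup that runs before the first await
    ComputeExpensiveDefaults();
    // only here does the caller get the uncompleted Task back
    await Task.Delay(1000);
    Console.WriteLine("Completed: " + Id);
}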
There's one major difference.
In the following code, you are taking it upon yourself to perform the partitioning. In other words, you're creating one Task object per item from the IEnumerable<T> that is returned from the call to GetClients():
var tasks= Client.GetClients().Select(async p => await p.Initialize());
In the second, the call to AsParallel is internally going to use Task instances to execute partitions of the IEnumerable<T>, in addition to the Task that is returned from the lambda async p => await p.Initialize():
var tasks = Client.GetClients().AsParallel().
    Select(async p => await p.Initialize());
Finally, you're not really doing anything by using async/await here. Granted, the compiler might optimize this out, but you're just waiting on a method that returns a Task and then returning a continuation that does nothing back through the lambda. That said, since the call to Initialize is already returning a Task, it's best to keep it simple and just do:
var tasks = Client.GetClients().Select(p => p.Initialize());
Which will return the sequence of Task instances for you.
To improve on the two answers above, this is the simplest way to get an async/threaded execution that is awaitable:
await Task.WhenAll(Client.GetClients()
    .Select(p => p.Initialize()));
This will ensure that the work runs concurrently and that everything has completed before you continue. Hope that helps someone. It took me quite a while to figure this out properly, since it is not at all obvious and the AsParallel() function seems like what you want but doesn't play well with async/await.
Related
I have an IEnumerable<Task>, where each Task will call the same endpoint. However, the endpoint can only handle so many calls per second. How can I put, say, a half second delay between each call?
I have tried adding Task.Delay(), but of course awaiting them simply means that the app waits a half second before sending all the calls at once.
Here is a code snippet:
var resultTasks = orders
    .Select(async task =>
    {
        var result = new VendorTaskResult();
        try
        {
            result.Response = await result.CallVendorAsync();
        }
        catch (Exception ex)
        {
            result.Exception = ex;
        }
        return result;
    });
var results = Task.WhenAll(resultTasks);
I feel like I should do something like
Task.WhenAll(resultTasks.EmitOverTime(500));
... but how exactly do I do that?
What you describe in your question is in other words rate limiting. You'd like to apply rate limiting policy to your client, because the API you use enforces such a policy on the server to protect itself from abuse.
While you could implement rate limiting yourself, I'd recommend going with some well established solution. Rate Limiter from Davis Desmaisons was the one that I picked at random and I instantly liked it. It had solid documentation, superior coverage and was easy to use. It is also available as a NuGet package.
Check out the simple snippet below that demonstrates running semi-overlapping tasks in sequence while deferring each task's start until half a second after the immediately preceding task started. Each task lasts at least 750 ms.
using ComposableAsync;
using RateLimiter;
using System;
using System.Threading.Tasks;

namespace RateLimiterTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Log("Starting tasks ...");
            var constraint = TimeLimiter.GetFromMaxCountByInterval(1, TimeSpan.FromSeconds(0.5));
            var tasks = new[]
            {
                DoWorkAsync("Task1", constraint),
                DoWorkAsync("Task2", constraint),
                DoWorkAsync("Task3", constraint),
                DoWorkAsync("Task4", constraint)
            };
            Task.WaitAll(tasks);
            Log("All tasks finished.");
            Console.ReadLine();
        }

        static void Log(string message)
        {
            Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff ") + message);
        }

        static async Task DoWorkAsync(string name, IDispatcher constraint)
        {
            await constraint;
            Log(name + " started");
            await Task.Delay(750);
            Log(name + " finished");
        }
    }
}
Sample output:
10:03:27.121 Starting tasks ...
10:03:27.154 Task1 started
10:03:27.658 Task2 started
10:03:27.911 Task1 finished
10:03:28.160 Task3 started
10:03:28.410 Task2 finished
10:03:28.680 Task4 started
10:03:28.913 Task3 finished
10:03:29.443 Task4 finished
10:03:29.443 All tasks finished.
If you change the constraint to allow maximum two tasks per second (var constraint = TimeLimiter.GetFromMaxCountByInterval(2, TimeSpan.FromSeconds(1));), which is not the same as one per half a second, then the output could be like:
10:06:03.237 Starting tasks ...
10:06:03.264 Task1 started
10:06:03.268 Task2 started
10:06:04.026 Task2 finished
10:06:04.031 Task1 finished
10:06:04.275 Task3 started
10:06:04.276 Task4 started
10:06:05.032 Task4 finished
10:06:05.032 Task3 finished
10:06:05.033 All tasks finished.
Note that the current version of Rate Limiter targets .NETFramework 4.7.2+ or .NETStandard 2.0+.
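For completeness, here is a sketch (not part of the original answer) of how the same TimeLimiter could be applied to the orders/Select code from the question. It assumes the vendor call lives on each order object, and the using ComposableAsync directive from the sample above is what makes the limiter awaitable:
var limiter = TimeLimiter.GetFromMaxCountByInterval(1, TimeSpan.FromSeconds(0.5));

var resultTasks = orders.Select(async order =>
{
    await limiter; // each task waits its turn, so at most one call starts per half second
    var result = new VendorTaskResult();
    try
    {
        result.Response = await order.CallVendorAsync();
    }
    catch (Exception ex)
    {
        result.Exception = ex;
    }
    return result;
});

var results = await Task.WhenAll(resultTasks);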
This is just a thought, but another approach could be to create a queue and add another thread that polls the queue for calls that need to go out to your endpoint.
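A rough, untested sketch of that idea follows; the Order type and the CallVendorSafeAsync helper (which would wrap the question's try/catch logic) are hypothetical. A background loop drains a ConcurrentQueue, starting one call every half second, and the started tasks are collected so they can be awaited together at the end:
var queue = new ConcurrentQueue<Order>(orders);   // System.Collections.Concurrent
var started = new List<Task<VendorTaskResult>>();

var pump = Task.Run(async () =>
{
    while (queue.TryDequeue(out var order))
    {
        started.Add(CallVendorSafeAsync(order)); // start the call, but don't await it here
        await Task.Delay(500);                   // half-second gap before starting the next one
    }
});

await pump;                                      // all calls have been started
var results = await Task.WhenAll(started);       // now wait for them all to finish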
Have you considered just turning that into a foreach-loop with a Task.Delay call? You seem to want to call them sequentially, and it won't hurt if that is obvious from your code.
var results = new List<YourResultType>();
foreach (var order in orders)
{
    var result = new VendorTaskResult();
    try
    {
        result.Response = await result.CallVendorAsync();
        results.Add(result.Response);
    }
    catch (Exception ex)
    {
        result.Exception = ex;
    }
    await Task.Delay(500); // the half-second gap between calls
}
Instead of selecting from orders you could loop over them, awaiting each call inside the loop, adding the result to a list and pausing between calls (a List<T>.ForEach lambda can't contain an await, so a plain foreach is used here).
Would look something like:
var results = new List<VendorTaskResult>(orders.Count);
foreach (var order in orders)
{
    var result = new VendorTaskResult();
    try
    {
        result.Response = await result.CallVendorAsync();
    }
    catch (Exception ex)
    {
        result.Exception = ex;
    }
    results.Add(result);
    await Task.Delay(x); // pause between calls instead of blocking with Thread.Sleep
}
Since each call is awaited inline, everything has already completed by the end of the loop, so no Task.WhenAll is needed.
If you want to control the number of requests executed simultaneously, you have to use a semaphore.
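Sketched against the question's code, that could look like the snippet below; SemaphoreSlim is the standard primitive for this, and the limit of 4 concurrent calls is just an example value:
var throttler = new SemaphoreSlim(4); // at most 4 vendor calls in flight at once

var resultTasks = orders.Select(async order =>
{
    await throttler.WaitAsync(); // wait for a free slot
    try
    {
        var result = new VendorTaskResult();
        try
        {
            result.Response = await order.CallVendorAsync();
        }
        catch (Exception ex)
        {
            result.Exception = ex;
        }
        return result;
    }
    finally
    {
        throttler.Release(); // free the slot for the next pending call
    }
});

var results = await Task.WhenAll(resultTasks);
Note that a semaphore limits how many calls run at the same time, which is not quite the same thing as spacing out the starts.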
I have something very similar, and it works fine for me. Please note that I call ToArray() after the LINQ query; that is what triggers the tasks:
using (HttpClient client = new HttpClient()) {
    IEnumerable<Task<string>> _downloads = _group
        .Select(async job => {
            await Task.Delay(300);
            return await client.GetStringAsync(<url with variable job>);
        });
    Task<string>[] _downloadTasks = _downloads.ToArray();
    _pages = await Task.WhenAll(_downloadTasks);
}
Now please note that this will create n tasks, all in parallel, and the Task.Delay literally does nothing to space them out. If you want to call the pages sequentially (as it sounds, since you want a delay between the calls), then this code may be better:
using (HttpClient client = new HttpClient()) {
    foreach (string job in _group) {
        await Task.Delay(300);
        _pages.Add(await client.GetStringAsync(<url with variable job>));
    }
}
The download of the pages is still asynchronous (other work can happen while a page is downloading), but each page is requested sequentially, ensuring that one call finishes before the next one starts.
The code can easily be changed to download the pages asynchronously in chunks, for example 20 pages at a time with a pause between chunks, as in this sample:
IEnumerable<string[]> toParse = myData
    .Select((v, i) => new { v.code, group = i / 20 })
    .GroupBy(x => x.group)
    .Select(g => g.Select(x => x.code).ToArray());

using (HttpClient client = new HttpClient()) {
    foreach (string[] _group in toParse) {
        string[] _pages = null;
        IEnumerable<Task<string>> _downloads = _group
            .Select(job => {
                return client.GetStringAsync(<url with job>);
            });
        Task<string>[] _downloadTasks = _downloads.ToArray();
        _pages = await Task.WhenAll(_downloadTasks);
        await Task.Delay(5000);
    }
}
All this does is group your pages in chunks of 20, iterate through the chunks, download all pages of the chunk asynchronously, wait 5 seconds, move on to the next chunk.
I hope that is what you were waiting for :)
The proposed method EmitOverTime is doable, but only by blocking the current thread:
public static IEnumerable<Task<TResult>> EmitOverTime<TResult>(
    this IEnumerable<Task<TResult>> tasks, int delay)
{
    foreach (var item in tasks)
    {
        Thread.Sleep(delay); // Delay by blocking
        yield return item;
    }
}
Usage:
var results = await Task.WhenAll(resultTasks.EmitOverTime(500));
Probably better is to create a variant of Task.WhenAll that accepts a delay argument, and delays asynchronously:
public static async Task<TResult[]> WhenAllWithDelay<TResult>(
    IEnumerable<Task<TResult>> tasks, int delay)
{
    var tasksList = new List<Task<TResult>>();
    foreach (var task in tasks)
    {
        await Task.Delay(delay).ConfigureAwait(false);
        tasksList.Add(task);
    }
    return await Task.WhenAll(tasksList).ConfigureAwait(false);
}
Usage:
var results = await WhenAllWithDelay(resultTasks, 500);
This design implies that the enumerable of tasks should be enumerated only once. It is easy to forget this during development, and start enumerating it again, spawning a new set of tasks. For this reason I propose to make it an OnlyOnce enumerable, as it is shown in this question.
Update: I should mention why the above methods work, and under what premise. The premise is that the supplied IEnumerable<Task<TResult>> is deferred, in other words non-materialized. At the method's start there are no tasks created yet. The tasks are created one after the other during the enumeration of the enumerable, and the trick is that the enumeration is slow and controlled. The delay inside the loop ensures that the tasks are not created all at once. They are created hot (in other words already started), so at the time the last task has been created some of the first tasks may have already been completed. The materialized list of half-running/half-completed tasks is then passed to Task.WhenAll, that waits for all to complete asynchronously.
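To make that premise concrete, here is a small illustration (ProcessOrderAsync is a hypothetical stand-in for the question's async lambda):
// Deferred: nothing runs yet; tasks are created one by one while WhenAllWithDelay enumerates.
IEnumerable<Task<VendorTaskResult>> deferred = orders.Select(order => ProcessOrderAsync(order));

// Materialized: ToList() creates (and starts) every task immediately,
// so the delay inside WhenAllWithDelay no longer spaces them out.
List<Task<VendorTaskResult>> materialized = orders.Select(order => ProcessOrderAsync(order)).ToList();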
I'm new to tasks and have a question regarding their usage. Does the Task.Factory call fire for all items in the foreach loop, or does it block at the await, basically making the program single-threaded? If I am thinking about this correctly, the foreach loop starts all the tasks and the .GetAwaiter().GetResult(); is blocking the main thread until the last task is complete.
Also, I'm just wanting some anonymous tasks to load the data. Would this be a correct implementation? I'm not referring to exception handling as this is simply an example.
To provide clarity, I am loading data into a database from an outside API. This one is using the FRED database. (https://fred.stlouisfed.org/), but I have several I will hit to complete the entire transfer (maybe 200k data points). Once they are done I update the tables, refresh market calculations, etc. Some of it is real time and some of it is End-of-day. I would also like to say, I currently have everything working in docker, but have been working to update the code using tasks to improve execution.
class Program
{
    private async Task SQLBulkLoader()
    {
        foreach (var fileListObj in indicators.file_list)
        {
            await Task.Factory.StartNew(() =>
            {
                string json = this.GET(//API call);
                SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
                DataTable dataTableConversion = ConvertToDataTable(obj.observations);
                dataTableConversion.TableName = fileListObj.series_id;

                using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
                {
                    dbConnection.Open();
                    using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
                    {
                        s.DestinationTableName = dataTableConversion.TableName;
                        foreach (var column in dataTableConversion.Columns)
                            s.ColumnMappings.Add(column.ToString(), column.ToString());
                        s.WriteToServer(dataTableConversion);
                    }
                    Console.WriteLine("File: {0} Complete", fileListObj.series_id);
                }
            });
        }
    }

    static void Main(string[] args)
    {
        Program worker = new Program();
        worker.SQLBulkLoader().GetAwaiter().GetResult();
    }
}
Awaiting the task returned from Task.Factory.StartNew does make it effectively single-threaded. You can see a simple demonstration of this with this short LinqPad example:
for (var i = 0; i < 3; i++)
{
    var index = i;
    $"{index} inline".Dump();
    await Task.Run(() =>
    {
        Thread.Sleep((3 - index) * 1000);
        $"{index} in thread".Dump();
    });
}
Here we wait less as we progress through the loop. The output is:
0 inline
0 in thread
1 inline
1 in thread
2 inline
2 in thread
If you remove the await in front of StartNew you'll see it runs in parallel. As others have mentioned, you can certainly use Parallel.ForEach, but for a demonstration of doing it a bit more manually, you can consider a solution like this:
var tasks = new List<Task>();
for (var i = 0; i < 3; i++)
{
    var index = i;
    $"{index} inline".Dump();
    tasks.Add(Task.Factory.StartNew(() =>
    {
        Thread.Sleep((3 - index) * 1000);
        $"{index} in thread".Dump();
    }));
}
Task.WaitAll(tasks.ToArray());
Notice now how the result is:
0 inline
1 inline
2 inline
2 in thread
1 in thread
0 in thread
This is a typical problem that C# 8.0 Async Streams are going to solve very soon.
Until C# 8.0 is released, you can use the AsyncEnumerator library:
using System.Collections.Async;

class Program
{
    private async Task SQLBulkLoader()
    {
        await indicators.file_list.ParallelForEachAsync(async fileListObj =>
        {
            ...
            await s.WriteToServerAsync(dataTableConversion);
            ...
        },
        maxDegreeOfParalellism: 3,
        cancellationToken: default);
    }

    static void Main(string[] args)
    {
        Program worker = new Program();
        worker.SQLBulkLoader().GetAwaiter().GetResult();
    }
}
I do not recommend using Parallel.ForEach and Task.WhenAll as those functions are not designed for asynchronous streaming.
You'll want to add each task to a collection and then use Task.WhenAll to await all of the tasks in that collection:
private async Task SQLBulkLoader()
{
    var tasks = new List<Task>();
    foreach (var fileListObj in indicators.file_list)
    {
        tasks.Add(Task.Factory.StartNew(() => { /* Doing Stuff */ }));
    }
    await Task.WhenAll(tasks.ToArray());
}
My take on this: the most time consuming operations will be getting the data using a GET operation and the actual call to WriteToServer using SqlBulkCopy. If you take a look at that class you will see that there is a native async method, WriteToServerAsync (docs here). Always use those before creating Tasks yourself using Task.Run.
The same applies to the http GET call. You can use the native HttpClient.GetAsync (docs here) for that.
Doing that you can rewrite your code to this:
private async Task ProcessFileAsync(string series_id)
{
    string json = await GetAsync();
    SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
    DataTable dataTableConversion = ConvertToDataTable(obj.observations);
    dataTableConversion.TableName = series_id;

    using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
    {
        dbConnection.Open();
        using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
        {
            s.DestinationTableName = dataTableConversion.TableName;
            foreach (var column in dataTableConversion.Columns)
                s.ColumnMappings.Add(column.ToString(), column.ToString());
            await s.WriteToServerAsync(dataTableConversion);
        }
        Console.WriteLine("File: {0} Complete", series_id);
    }
}

private async Task SQLBulkLoaderAsync()
{
    var tasks = indicators.file_list.Select(f => ProcessFileAsync(f.series_id));
    await Task.WhenAll(tasks);
}
Both operations (the HTTP call and the SQL Server call) are I/O calls. Using the native async/await pattern, no thread is created or blocked for them; see this question for a more in-depth explanation. That is why for I/O bound operations you should never have to use Task.Run (or Task.Factory.StartNew; if you do need to offload work, Task.Run is the recommended of the two).
Sidenote: if you are using HttpClient in a loop, please read this about how to correctly use it.
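The usual guidance, sketched below (the field name and the GetAsync wrapper are illustrative, not part of the original code), is to create a single HttpClient and reuse it for every request instead of newing one up per call:
// one shared instance for the lifetime of the application
private static readonly HttpClient _httpClient = new HttpClient();

private async Task<string> GetAsync(string url)
{
    // reuse the shared client instead of wrapping each call in using (new HttpClient())
    return await _httpClient.GetStringAsync(url);
}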
If you need to limit the number of parallel actions you could also use TPL Dataflow, as it plays very nicely with Task-based I/O bound operations. The SQLBulkLoaderAsync should then be modified to this (leaving the ProcessFileAsync method from earlier in this answer intact):
private async Task SQLBulkLoaderAsync()
{
    var ab = new ActionBlock<string>(ProcessFileAsync, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
    foreach (var file in indicators.file_list)
    {
        ab.Post(file.series_id);
    }
    ab.Complete();
    await ab.Completion;
}
Use a Parallel.ForEach loop to enable data parallelism over any System.Collections.Generic.IEnumerable<T> source.
// Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)
Parallel.ForEach(fileList, (currentFile) =>
{
    // Doing Stuff
    Console.WriteLine("Processing {0} on thread {1}", currentFile, Thread.CurrentThread.ManagedThreadId);
});
Why didn't you try this? :) This program will not start parallel tasks in the foreach; it will block on each one, but the logic in the task will be done on a separate thread from the thread pool (only one at a time, while the main thread is blocked).
The proper approach in your situation is to use Parallel.ForEach.
How can I convert this foreach code to Parallel.ForEach?
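As a rough sketch of that conversion (assuming the indicators.file_list collection from the question and a hypothetical synchronous ProcessFile helper that holds the GET + SqlBulkCopy body of the original lambda):
Parallel.ForEach(indicators.file_list, fileListObj =>
{
    // ProcessFile is a hypothetical method containing the synchronous download + bulk copy logic
    ProcessFile(fileListObj.series_id);
});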
I create a task with a ContinueWith continuation as below. I want to find out the time taken by the first task to complete its work. I share the variable "task1StartedDateTime" between task1 and the child task. Will this work without any issues?
public static void MyMethod()
{
    var task1StartedDateTime = DateTime.Now;
    var task1 = doWorkAsync();
    task1.ContinueWith(t1 => {
        var task1TookTime = DateTime.Now - task1StartedDateTime;
        Console.WriteLine($"Task 1 took {task1TookTime}");
        //Some other work of this child task
    });
}
Yes, it will work. However, it would be better to make use of the Stopwatch class, since this is a more accurate and efficient way of measuring the elapsed time of a method or process running on a machine. For more info related to the latter argument, please have a look here:
var stopwatch = Stopwatch.StartNew();
var task1 = doWorkAsync();
task1.ContinueWith(t1 => {
    stopwatch.Stop();
    Console.WriteLine($"Task 1 took {stopwatch.ElapsedMilliseconds} ms.");
    //Some other work of this child task
});
Yes, you can use captured variables in a lambda - captured variables closed over in this way will be promoted to an anonymous class instance, to ensure they can outlive the method they are declared in, and to allow sharing between the outer method and the continuation.
However, you should use a Stopwatch for measuring time - it is more accurate.
In .NET 4.5 and later, you also have the option to replace the .ContinueWith continuation with an awaited continuation - this has additional guarantees, and is easier to read:
public static async Task MyMethod()
{
    var sw = new Stopwatch();
    sw.Start();
    await doWorkAsync();
    var task1TookTime = sw.Elapsed;
    Console.WriteLine($"Task 1 took {task1TookTime}");
    //Some other work of this child task
}
(Although note that if MyMethod is awaited, the Task will only complete once doWorkAsync and the timing log are complete, which differs from your original implementation.)
I have a list of objects that I need to run a long-running process on, and I would like to kick them off asynchronously, then when they are all finished return them as a list to the calling method. I've been trying different methods that I have found; however, it appears that the processes are still running synchronously in the order that they are in the list. So I am sure that I am missing something in the process of how to execute a list of tasks.
Here is my code:
public async Task<List<ShipmentOverview>> GetShipmentByStatus(ShipmentFilterModel filter)
{
    if (string.IsNullOrEmpty(filter.Status))
    {
        throw new InvalidShipmentStatusException(filter.Status);
    }

    var lookups = GetLookups(false, Brownells.ConsolidatedShipping.Constants.ShipmentStatusType);
    var lookup = lookups.SingleOrDefault(sd => sd.Name.ToLower() == filter.Status.ToLower());
    if (lookup != null)
    {
        filter.StatusId = lookup.Id;
        var shipments = Shipments.GetShipments(filter);
        var tasks = shipments.Select(async model => await GetOverview(model)).ToList();
        ShipmentOverview[] finishedTask = await Task.WhenAll(tasks);
        return finishedTask.ToList();
    }
    else
    {
        throw new InvalidShipmentStatusException(filter.Status);
    }
}

private async Task<ShipmentOverview> GetOverview(ShipmentModel model)
{
    String version;
    var user = AuthContext.GetUserSecurityModel(Identity.Token, out version) as UserSecurityModel;
    var profile = AuthContext.GetProfileSecurityModel(user.Profiles.First());

    var overview = new ShipmentOverview
    {
        Id = model.Id,
        CanView = true,
        CanClose = profile.HasFeatureAction("Shipments", "Close", "POST"),
        CanClear = profile.HasFeatureAction("Shipments", "Clear", "POST"),
        CanEdit = profile.HasFeatureAction("Shipments", "Get", "PUT"),
        ShipmentNumber = model.ShipmentNumber.ToString(),
        ShipmentName = model.Name,
    };

    var parcels = Shipments.GetParcelsInShipment(model.Id);
    overview.NumberParcels = parcels.Count;
    var orders = parcels.Select(s => WareHouseClient.GetOrderNumberFromParcelId(s.ParcelNumber)).ToList();
    overview.NumberOrders = orders.Distinct().Count();

    //check validations
    var vals = Shipments.GetShipmentValidations(model.Id);
    if (model.ValidationTypeId == Constants.OrderValidationType)
    {
        if (vals.Count > 0)
        {
            overview.NumberOrdersTotal = vals.Count();
            overview.NumberParcelsTotal = vals.Sum(s => WareHouseClient.GetParcelsPerOrder(s.ValidateReference));
        }
    }
    return overview;
}
It looks like you're using asynchronous methods while you really want threads.
Asynchronous methods yield control back to the calling method when an async method is called, then wait until the method has completed at the await. You can see how it works here.
Basically, the only usefulness of async/await methods is not to lock the UI, so that it stays responsive.
If you want to fire multiple processings in parallel, you will want to use threads, like such:
using System.Threading.Tasks;

public void MainMethod() {
    // Parallel.ForEach will automagically run the "right" number of threads in parallel
    Parallel.ForEach(shipments, shipment => ProcessShipment(shipment));

    // do something when all shipments have been processed
}

public void ProcessShipment(Shipment shipment) { ... }
Marking the method as async doesn't auto-magically make it execute in parallel. Since you're not using await at all, it will in fact execute completely synchronously as if it wasn't async. You might have read somewhere that async makes functions execute asynchronously, but this simply isn't true - forget it. The only thing it does is build a state machine to handle task continuations for you when you use await and actually build all the code to manage those tasks and their error handling.
If your code is mostly I/O bound, use the asynchronous APIs with await to make sure the methods actually execute in parallel. If they are CPU bound, a Task.Run (or Parallel.ForEach) will work best.
Also, there's no point in doing .Select(async model => await GetOverview(model)). It's almost equivalent to .Select(model => GetOverview(model)). In any case, since the method actually doesn't return an asynchronous task, it will be executed while doing the Select, long before you get to the Task.WhenAll.
Given this, even the GetShipmentByStatus's async is pretty much useless - you only use await to await the Task.WhenAll, but since all the tasks are already completed by that point, it will simply complete synchronously.
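In other words, if GetOverview really is CPU-bound or blocking inside (synchronous database and service calls), a sketch of what would actually fan the work out onto thread-pool threads is:
// wrap each call in Task.Run so the blocking work runs on the thread pool;
// Task.Run unwraps the Task<ShipmentOverview> returned by GetOverview
var tasks = shipments.Select(model => Task.Run(() => GetOverview(model)));
ShipmentOverview[] finished = await Task.WhenAll(tasks);
return finished.ToList();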
If your tasks are CPU bound and not I/O bound, then here is the pattern I believe you're looking for:
static void Main(string[] args) {
    Task firstStepTask = Task.Run(() => firstStep());
    Task secondStepTask = Task.Run(() => secondStep());
    //...
    Task finalStepTask = Task.Factory.ContinueWhenAll(
        new Task[] { firstStepTask, secondStepTask }, // more if more than two steps...
        (previousTasks) => finalStep());
    finalStepTask.Wait();
}
Given the following code, is it OK to do async/await inside a Parallel.ForEach ?
eg.
Parallel.ForEach(names, name =>
{
    // Do some stuff...
    var foo = await GetStuffFrom3rdPartyAsync(name);
    // Do some more stuff, with the foo.
});
or are there some gotchas that I need to be aware of?
EDIT: No idea if this compiles, btw. Just pseudo-code .. thinking out loud.
No, it doesn't make sense to combine async with Parallel.ForEach.
Consider the following example:
private void DoSomething()
{
    var names = Enumerable.Range(0, 10).Select(x => "Somename" + x);
    Parallel.ForEach(names, async (name) =>
    {
        await Task.Delay(1000);
        Console.WriteLine("Name {0} completed", name);
    });
    Console.WriteLine("Parallel ForEach completed");
}
What output would you expect?
Name Somename3 completed
Name Somename8 completed
Name Somename4 completed
...
Parallel ForEach completed
That's not what will happen. It will output:
Parallel ForEach completed
Name Somename3 completed
Name Somename8 completed
Name Somename4 completed
...
Why? Because when the lambda hits its first await, the method actually returns; Parallel.ForEach doesn't know it is asynchronous and considers it to have run to completion. Code after the await runs as a continuation on another thread, not on the "parallel processing thread".
Stephen Toub addressed this here.
From the name, I'm assuming that GetStuffFrom3rdPartyAsync is I/O-bound. The Parallel class is specifically for CPU-bound code.
In the asynchronous world, you can start multiple tasks and then (asynchronously) wait for them all to complete using Task.WhenAll. Since you're starting with a sequence, it's probably easiest to project each element to an asynchronous operation, and then await all of those operations:
await Task.WhenAll(names.Select(async name =>
{
    // Do some stuff...
    var foo = await GetStuffFrom3rdPartyAsync(name);
    // Do some more stuff, with the foo.
}));
A close alternative might be this:
static void ForEach<T>(IEnumerable<T> data, Func<T, Task> func)
{
    var tasks = data.Select(item =>
        Task.Run(() => func(item)));
    Task.WaitAll(tasks.ToArray());
}

// ...

ForEach(names, name => GetStuffFrom3rdPartyAsync(name));
Ideally, you shouldn't be using a blocking call like Task.WaitAll if you can make the whole chain of method calls async, "all the way down" on the current call stack:
var tasks = data.Select(item =>
    Task.Run(() => func(item)));
await Task.WhenAll(tasks.ToArray());
Furthermore, if you don't do any CPU-bound work inside GetStuffFrom3rdPartyAsync, Task.Run may be redundant:
var tasks = data.Select(item => func(item));
As pointed out by @Sriram Sakthivel, there are some problems with using Parallel.ForEach with asynchronous lambdas. Stephen Toub's ForEachAsync can do the equivalent. He talks about it here, but here is the code:
public static class Extensions
{
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate {
                using (partition) while (partition.MoveNext()) await body(partition.Current);
            }));
    }
}
It uses the Partitioner class to create a load-balancing partitioner (doco), and allows you to specify how many concurrent workers you want with the dop parameter. To see the difference between it and Parallel.ForEach, try the following code.
class Program
{
    public static async Task GetStuffParallelForEach()
    {
        var data = Enumerable.Range(1, 10);
        Parallel.ForEach(data, async i =>
        {
            await Task.Delay(1000 * i);
            Console.WriteLine(i);
        });
    }

    public static async Task GetStuffForEachAsync()
    {
        var data = Enumerable.Range(1, 10);
        await data.ForEachAsync(5, async i =>
        {
            await Task.Delay(1000 * i);
            Console.WriteLine(i);
        });
    }

    static void Main(string[] args)
    {
        //GetStuffParallelForEach().Wait(); // Finished printed before work is complete
        GetStuffForEachAsync().Wait(); // Finished printed after all work is done
        Console.WriteLine("Finished");
        Console.ReadLine();
    }
}
If you run GetStuffForEachAsync, the program waits for all work to finish. If you run GetStuffParallelForEach, the line Finished will be printed before the work is finished.