I'm writing a Windows service and am looking for a way to execute a number of foreach loops in parallel, where each loop makes a call to an asynchronous (TAP) method. I initially tried the following code, which doesn't work because Parallel.ForEach and async/await are not compatible. Does anyone know whether there is an alternate approach that can achieve this?
Parallel.ForEach(messagesByFromNumber, async messageGroup =>
{
foreach (var message in messageGroup)
{
await message.SendAsync();
}
});
For the sake of clarity, due to the way that SendAsync() operates, each copy of the foreach loop must execute serially; in other words, the foreach loop cannot become concurrent / parallel.
There's no need to use Parallel.ForEach if your goal is to run these concurrently. Simply go over all your groups, create a task for each group that runs a foreach over SendAsync, collect the tasks, and await them all at once with Task.WhenAll:
var tasks = messagesByFromNumber.Select(async messageGroup =>
{
foreach (var message in messageGroup)
{
await message.SendAsync();
}
});
await Task.WhenAll(tasks);
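For anyone who wants to try the pattern in isolation, here is a minimal sketch. The Message class, its From/Sequence properties, the shared log, and the GroupedSender helper are all hypothetical stand-ins (the question does not show its own types), and SendAsync is simulated with Task.Delay rather than real I/O.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's message type.
public sealed class Message
{
    private readonly ConcurrentQueue<string> _log;

    public Message(string from, int sequence, ConcurrentQueue<string> log)
    {
        From = from;
        Sequence = sequence;
        _log = log;
    }

    public string From { get; }
    public int Sequence { get; }

    // Simulates an I/O-bound send; a real implementation would hit the network.
    public async Task SendAsync()
    {
        await Task.Delay(10);
        _log.Enqueue($"{From}:{Sequence}");
    }
}

public static class GroupedSender
{
    // Starts one sequential loop per group. The loops overlap with each other,
    // but messages inside a single group are always sent one after another.
    public static Task SendAllAsync(IEnumerable<IGrouping<string, Message>> messagesByFromNumber)
    {
        var tasks = messagesByFromNumber.Select(async messageGroup =>
        {
            foreach (var message in messageGroup)
            {
                await message.SendAsync();
            }
        }).ToList(); // materialize so every loop has started before we await

        return Task.WhenAll(tasks);
    }
}
```

Calling `await GroupedSender.SendAllAsync(messages.GroupBy(m => m.From))` should take roughly as long as the slowest group, not the sum of all groups.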
You can make it clearer with the AsyncEnumerator NuGet package:
using System.Collections.Async;
await messagesByFromNumber.ParallelForEachAsync(async messageGroup =>
{
foreach (var message in messageGroup)
{
await message.SendAsync();
}
}, maxDegreeOfParallelism: 10);
I know that using await in foreach is not a good practice due to performance, as it would await sequentially for each task.
foreach (var task in result)
{
task.Stages = await GetStagesForTask(task.Id);
}
So how can I improve that code? I was trying to do something like this:
List<Task> listOfTasks = new List<Task>();
foreach (var task in result)
{
var stage = GetStagesForTask(task.Id);
listOfTasks.Add(stage);
task.Stages = stage;
}
await Task.WhenAll(listOfTasks);
but of course this won't work, because the types don't match in task.Stages = stage; (stage is a Task, not the awaited result).
You could use LINQ with an asynchronous delegate:
var tasks = result.Select(async task =>
{
var stage = await GetStagesForTask(task.Id);
task.Stages = stage;
});
await Task.WhenAll(tasks);
Or introduce a local function:
List<Task> listOfTasks = new List<Task>();
async Task SetStagesAsync(YourTask task)
{
task.Stages = await GetStagesForTask(task.Id);
}
foreach (var task in result)
{
listOfTasks.Add(SetStagesAsync(task));
}
await Task.WhenAll(listOfTasks);
Or even a combination of the two:
async Task SetStagesAsync(YourTask task)
{
task.Stages = await GetStagesForTask(task.Id);
}
await Task.WhenAll(result.Select(SetStagesAsync));
Johnathan Barclay's solution is perfect if you don't mind mutating concurrently the Stages property of your entities. But if you prefer to defer the mutations until all the asynchronous operations have been completed, then you could consider projecting your entities to a list of Task<Action>s, then awaiting these tasks using the Task.WhenAll, and finally invoking sequentially all the resulting Actions:
Task<Action>[] tasks = entities.Select(async entity =>
{
var stages = await GetStagesForEntityAsync(entity.Id);
return new Action(() => entity.Stages = stages);
}).ToArray();
Action[] actions = await Task.WhenAll(tasks);
foreach (var action in actions) action.Invoke();
In the above example I have renamed the task and result variables of your example to entity/entities, to prevent any confusion between your entities and the built-in Task class.
The LINQ Select operator makes it easy to project one enumerable to another, and it's especially handy when you want to create a list of custom tasks from a list of objects.
A neat way to solve this is to create a thread for each task, so that each await happens inside its own thread. You would create the threads in one foreach loop, and then call .Join() on each thread in a second foreach loop. That way the awaits are not sequential: the threads work in parallel, and you wait only as long as your longest task rather than the sum of the times of all tasks.
However, if you have many tasks, beware of eating up all your resources. In that case, break them into chunks of maybe 10 threads and apply the approach described above to each chunk.
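The chunking idea can be sketched as follows. Note that this sketch swaps raw threads for tasks awaited with Task.WhenAll, which is my substitution and not what the answer above describes. BatchRunner and ForEachInBatchesAsync are hypothetical names, and the batch size of 10 is just the figure suggested above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Runs `operation` for every item, with at most `batchSize` operations in flight.
    // Each batch is awaited as a whole before the next batch starts.
    public static async Task ForEachInBatchesAsync<T>(
        IReadOnlyList<T> items, int batchSize, Func<T, Task> operation)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < items.Count; start += batchSize)
        {
            // ToList() starts every operation in this batch immediately.
            var batch = items.Skip(start).Take(batchSize).Select(operation).ToList();
            await Task.WhenAll(batch);
        }
    }
}
```

One trade-off of fixed batches is that a single slow item holds up the whole batch; a semaphore-based throttle avoids that at the cost of a little more code.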
I have an API which needs to be run in a loop for Mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However, it can be run in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
Current Solution below is just providing a list, would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>();
foreach(AddressRequestDto request in requests)
{
var newCombinedAddress = (await GetCombinedAddress(request)).Value;
combinedAddressResponses.Add(newCombinedAddress);
}
return combinedAddressResponses;
}
Update:
In the debugger, the value is only reachable through combinedAddressResponse.Result.Value;
combinedAddressResponse.Value is null.
Also, strangely, writing combinedAddressResponse.Result.Value in code gives the error below: "ActionResult does not contain a definition for 'Value' and no accessible extension method"
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
foreach (var request in requests)
{
tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
}
// This waits for all the tasks to complete
await Task.WhenAll(tasks);
combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
return combinedAddressResponses;
}
I'm looking for a way to speed things up and run the requests in parallel. Thanks.
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
// Build the collection of tasks by doing an asynchronous operation for each request.
var tasks = requests.Select(async request =>
{
var combinedAddressResponse = await GetCombinedAddress(request);
return combinedAddressResponse.Value;
}).ToList();
// Wait for all the tasks to complete and get the results.
var results = await Task.WhenAll(tasks);
return results;
}
I want to upload potentially large batches (possibly 100s) of files to FTP, using the SSH.NET library and the Renci.SshNet.Async extensions. I need to limit the number of concurrent uploads to five, or whatever number I discover the FTP can handle.
This is my code before any limiting:
using (var sftp = new SftpClient(sftpHost, 22, sftpUser, sftpPass))
{
var tasks = new List<Task>();
try
{
sftp.Connect();
foreach (var file in Directory.EnumerateFiles(localPath, "*.xml"))
{
tasks.Add(
sftp.UploadAsync(
File.OpenRead(file), // Stream input
Path.GetFileName(file), // string path
true)); // bool canOverride
}
await Task.WhenAll(tasks);
sftp.Disconnect();
}
// trimmed catch
}
I've read about SemaphoreSlim, but I don't fully understand how it works and how it is used with TAP. This is, based on the MSDN documentation, how I would implement it.
I'm unsure if using Task.Run is the correct way to go about this, as it's I/O bound, and from what I know, Task.Run is for CPU-bound work and async/await for I/O-bound work. I also don't understand how these tasks enter (is that the correct terminology) the semaphore, as all they do is call .Release() on it.
using (var sftp = new SftpClient(sftpHost, 22, sftpUser, sftpPass))
{
var tasks = new List<Task>();
var semaphore = new SemaphoreSlim(5);
try
{
sftp.Connect();
foreach (var file in Directory.EnumerateFiles(localPath, "*.xml"))
{
tasks.Add(
Task.Run(() =>
{
sftp.UploadAsync(
File.OpenRead(file), // Stream input
Path.GetFileName(file), // string path
true); // bool canOverride
semaphore.Release();
}));
}
await Task.WhenAll(tasks);
sftp.Disconnect();
}
// trimmed catch
}
from what I know, Task.Run is for CPU-bound work
Correct.
and async/await for I/O-bound work.
No. await is a tool for adding continuations to an asynchronous operation. It doesn't care about the nature of what that asynchronous operation is. It simply makes it easier to compose asynchronous operations of any kind together.
If you want to compose several asynchronous operations together, you do that by making an async method that uses the various asynchronous operations, awaiting them when you need their results (or need them to be completed), and then using the Task from that method as its own new asynchronous operation.
In your case your new asynchronous operation simply needs to be awaiting the semaphore, uploading your file, then releasing the semaphore.
async Task UploadFile(string file)
{
await semaphore.WaitAsync();
try
{
await sftp.UploadAsync(
File.OpenRead(file),
Path.GetFileName(file),
true);
}
finally
{
semaphore.Release();
}
}
Now you can simply call that method for each file.
Additionally, because this is such a common operation to do, you may find it worth it to create a new class to handle this logic so that you can simply make a queue, and add items to the queue, and have it handle the throttling internally, rather than replicating that mechanic everywhere you use it.
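As a rough illustration of that suggestion, a small reusable wrapper around SemaphoreSlim might look like the sketch below. AsyncThrottler and RunAsync are hypothetical names, not part of SSH.NET or the base class library, and this is only one of several reasonable designs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: caps how many asynchronous operations run at the same time.
public sealed class AsyncThrottler : IDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public AsyncThrottler(int maxConcurrency)
    {
        // Initial count == max count: all slots start out free.
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    }

    // Waits for a free slot, runs the operation, and always gives the slot back.
    public async Task RunAsync(Func<Task> operation)
    {
        await _semaphore.WaitAsync().ConfigureAwait(false);
        try
        {
            await operation().ConfigureAwait(false);
        }
        finally
        {
            _semaphore.Release();
        }
    }

    public void Dispose() => _semaphore.Dispose();
}
```

With the question's variables, usage would presumably be along the lines of `tasks.Add(throttler.RunAsync(() => sftp.UploadAsync(File.OpenRead(file), Path.GetFileName(file), true)));` inside the foreach, followed by `await Task.WhenAll(tasks);`.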
I need to send a request to multiple servers and am trying to use tasks to run each connection asynchronously. I have a function that is configured to make the connections:
internal static Task<EventRecordEx> GetEventRecordFromServer(string server, string activityID)
I have tried the following but it runs synchronously...
var taskList = new List<Task<EventRecordEx>>();
foreach (string server in server_list)
{
taskList.Add(GetEventRecordFromServer(server, id));
}
await Task.Factory.ContinueWhenAll(taskList.ToArray(), completedTasks =>
{
foreach (var task in completedTasks)
{
// do something with the results
}
});
What am I doing wrong?
As I understand it, when you use .ContinueWhenAll, exceptions are hard to debug if one of the tasks fails, because they surface wrapped in an AggregateException. I'd suggest awaiting each task individually, using .ConfigureAwait(false) so that the continuation does not have to run on the UI thread, like so:
foreach(Task task in taskList.ToArray()){
await task.ConfigureAwait(false);
// Do something.
}
I would like to use a .NET iterator with parallel Tasks/await. Something like this:
IEnumerable<TDest> Foo<TSrc, TDest>(IEnumerable<TSrc> source)
{
Parallel.ForEach(
source,
s=>
{
// Ordering is NOT important
// items can be yielded as soon as they are done
yield return ExecuteOrDownloadSomething(s);
});
}
Unfortunately, .NET cannot natively handle this. Best answer so far, by @svick: use AsParallel().
BONUS: Any simple async/await code that implements multiple publishers and a single subscriber? The subscriber would yield, and the pubs would process. (core libraries only)
This seems like a job for PLINQ:
return source.AsParallel().Select(s => ExecuteOrDownloadSomething(s));
This will execute the delegate in parallel using a limited number of threads, returning each result as soon as it completes.
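A self-contained sketch of the PLINQ approach is shown below. The body of ExecuteOrDownloadSomething is a made-up placeholder (it just sleeps and squares its input), and PlinqDemo is a hypothetical class name. The WithMergeOptions call is an addition of mine: it asks PLINQ not to buffer results, which is what makes items available to the consumer as soon as they are produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class PlinqDemo
{
    // Hypothetical stand-in for the real work: blocks briefly, then returns a value.
    private static int ExecuteOrDownloadSomething(int s)
    {
        Thread.Sleep(10);
        return s * s;
    }

    // Results are NOT guaranteed to come back in source order.
    public static IEnumerable<int> Foo(IEnumerable<int> source) =>
        source.AsParallel()
              .WithMergeOptions(ParallelMergeOptions.NotBuffered)
              .Select(s => ExecuteOrDownloadSomething(s));
}
```

Because ordering is not preserved, a caller that needs the original order would have to add AsOrdered() or sort the results afterwards.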
If the ExecuteOrDownloadSomething() method is IO-bound (e.g. it actually downloads something) and you don't want to waste threads, then using async-await might make sense, but it would be more complicated.
If you want to fully take advantage of async, you shouldn't return IEnumerable, because it's synchronous (i.e. it blocks if no items are available). What you need is some sort of asynchronous collection, and you can use ISourceBlock (specifically, TransformBlock) from TPL Dataflow for that:
ISourceBlock<TDest> Foo<TSrc, TDest>(IEnumerable<TSrc> source)
{
var block = new TransformBlock<TSrc, TDest>(
async s => await ExecuteOrDownloadSomethingAsync(s),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
});
foreach (var item in source)
block.Post(item);
block.Complete();
return block;
}
If the source is “slow” (i.e. you want to start processing the results from Foo() before iterating source is completed), you might want to move the foreach and Complete() call to a separate Task. An even better solution would be to make source an ISourceBlock<TSrc> too.
So it appears what you really want to do is to order a sequence of tasks based on when they complete. This is not terribly complex:
public static IEnumerable<Task<T>> Order<T>(this IEnumerable<Task<T>> tasks)
{
var input = tasks.ToList();
var output = input.Select(task => new TaskCompletionSource<T>()).ToList(); // materialize once, so the queued sources are the same ones we return
var collection = new BlockingCollection<TaskCompletionSource<T>>();
foreach (var tcs in output)
collection.Add(tcs);
foreach (var task in input)
{
task.ContinueWith(t =>
{
var tcs = collection.Take();
switch (task.Status)
{
case TaskStatus.Canceled:
tcs.TrySetCanceled();
break;
case TaskStatus.Faulted:
tcs.TrySetException(task.Exception.InnerExceptions);
break;
case TaskStatus.RanToCompletion:
tcs.TrySetResult(task.Result);
break;
}
}
, CancellationToken.None
, TaskContinuationOptions.ExecuteSynchronously
, TaskScheduler.Default);
}
return output.Select(tcs => tcs.Task);
}
So here we create a TaskCompletionSource for each input task, then go through each of the tasks and set a continuation which grabs the next completion source from a BlockingCollection and sets its result. The first task completed grabs the first tcs that was returned, the second task completed gets the second tcs that was returned, and so on.
Now your code becomes quite simple:
var tasks = collection.Select(item => LongRunningOperationThatReturnsTask(item))
.Order();
foreach(var task in tasks)
{
var result = task.Result;//or you could `await` each result
//....
}
In the asynchronous library made by the MS robotics team, they had concurrency primitives which allowed for using an iterator to yield asynchronous code.
The library (CCR) is free (it didn't use to be). A nice introductory article can be found here: Concurrent affairs
Perhaps you can use this library alongside the .NET task library, or it will inspire you to 'roll your own'.