C# ActionBlock Equivalent that collects results

I'm currently using an ActionBlock to process serially started asynchronous jobs. It works very well for processing each item Posted to it, but there is no way to collect a list of the results from each job.
What can I use to collect the results of my jobs in a thread safe manner?
My code is currently something like this:
var actionBlock = new ActionBlock<int> (async i => await Process(i));
for(int i = 0; i < 100; i++)
{
actionBlock.Post(i);
}
actionBlock.Complete();
await actionBlock.Completion;
I've tried using a TransformBlock instead, but it hangs indefinitely when awaiting the Completion. The completion's status is "WaitingForActivation".
My code with the TransformBlock is something like this:
var transformBlock = new TransformBlock<int, string> (async i => await Process(i));
for(int i = 0; i < 100; i++)
{
transformBlock.Post(i);
}
transformBlock.Complete();
await transformBlock.Completion;
transformBlock.TryReceiveAll(out IList<string> strings);
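For reference, the hang happens because a TransformBlock's Completion task does not finish while produced items are still sitting in its output buffer; the output has to be consumed (or linked to another block). A minimal sketch, assuming the same Process method, that drains the block so Completion can finish:
var transformBlock = new TransformBlock<int, string>(async i => await Process(i));
var results = new List<string>();

for (int i = 0; i < 100; i++)
{
    transformBlock.Post(i);
}
transformBlock.Complete();

// Drain the output as it becomes available; OutputAvailableAsync returns false
// once the block has completed and its output buffer is empty.
while (await transformBlock.OutputAvailableAsync())
{
    while (transformBlock.TryReceive(null, out var s))
    {
        results.Add(s);
    }
}
await transformBlock.Completion;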

It turns out a ConcurrentBag is the answer
var bag = new ConcurrentBag<string>();
var actionBlock = new ActionBlock<int> (async i =>
bag.Add(await Process(i))
);
for(int i = 0; i < 100; i++)
{
actionBlock.Post(i);
}
actionBlock.Complete();
await actionBlock.Completion;
Now 'bag' has all the results in it, and can be accessed as an IEnumerable.
The code I've actually ended up using uses a Parallel.ForEach instead of the ActionBlock. Note that Parallel.ForEach has no overload that awaits an async delegate, so the task has to be waited on synchronously inside the body:
Parallel.ForEach
(
inputData,
i => bag.Add(Process(i).GetAwaiter().GetResult())
);
This is quite a lot simpler, seems about as good for performance, and still has options to limit the degree of parallelism etc.
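For that last point, a small sketch (reusing the inputData, bag, and Process names from above) of capping the degree of parallelism with ParallelOptions:
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(inputData, options, i =>
{
    // Still waiting synchronously, since Parallel.ForEach does not await async delegates.
    bag.Add(Process(i).GetAwaiter().GetResult());
});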

Related

Storing each async result in its own array element

Let's say I want to download 1000 recipes from a website. The website accepts at most 10 concurrent connections. Each recipe should be stored in an array, at its corresponding index. (I don't want to send the array to the DownloadRecipe method.)
Technically, I've already solved the problem, but I would like to know if there is an even cleaner way to use async/await or something else to achieve it?
static async Task MainAsync()
{
int recipeCount = 1000;
int connectionCount = 10;
string[] recipes = new string[recipeCount];
Task<string>[] tasks = new Task<string>[connectionCount];
int r = 0;
while (r < recipeCount)
{
for (int t = 0; t < tasks.Length; t++)
{
int index = r; // capture a copy; r keeps changing before the task runs
tasks[t] = Task.Run(async () => recipes[index] = await DownloadRecipe(index));
r++;
}
await Task.WhenAll(tasks);
}
}
static async Task<string> DownloadRecipe(int index)
{
// ... await calls to download recipe
}
Also, this solution isn't optimal, since it doesn't bother starting a new download until all 10 running downloads are finished. Is there something we can improve there without bloating the code too much? A thread pool limited to 10 threads?
There are many, many ways you could do this. One way is to use an ActionBlock, which gives you access to MaxDegreeOfParallelism fairly easily and works well with async methods:
static async Task MainAsync()
{
var recipeCount = 1000;
var connectionCount = 10;
var recipes = new string[recipeCount];
async Task Action(int i) => recipes[i] = await DownloadRecipe(i);
var processor = new ActionBlock<int>(Action, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = connectionCount,
SingleProducerConstrained = true
});
for (var i = 0; i < recipeCount; i++)
await processor.SendAsync(i);
processor.Complete();
await processor.Completion;
}
static async Task<string> DownloadRecipe(int index)
{
...
}
Another way might be to use a SemaphoreSlim
var slim = new SemaphoreSlim(connectionCount, connectionCount);
var tasks = Enumerable
.Range(0, recipeCount)
.Select(Selector);
async Task<string> Selector(int i)
{
await slim.WaitAsync();
try
{
return await DownloadRecipe(i);
}
finally
{
slim.Release();
}
}
var recipes = await Task.WhenAll(tasks);
Another set of approaches is to use Reactive Extensions (Rx). Once again there are many ways to do this; this is just an awaitable approach (and could likely be improved, all things considered):
var results = await Enumerable
.Range(0, recipeCount)
.ToObservable()
.Select(i => Observable.FromAsync(() => DownloadRecipe(i)))
.Merge(connectionCount)
.ToArray()
.ToTask();
An alternative approach is to have 10 "pools" which load data "simultaneously".
You don't need to wrap IO operations in a separate thread; using a separate thread for IO operations is just a waste of resources.
Notice that a thread which downloads data does nothing but wait for a response. This is where the async-await approach comes in very handy - we can send multiple requests without waiting for them to complete and without wasting threads.
static async Task MainAsync()
{
var requests = Enumerable.Range(0, 1000).ToArray();
var maxConnections = 10;
var pools = requests
.GroupBy(i => i % maxConnections)
.Select(group => DownloadRecipesFor(group.ToArray()))
.ToArray();
await Task.WhenAll(pools);
var recipes = pools.SelectMany(pool => pool.Result).ToArray();
}
static async Task<IEnumerable<string>> DownloadRecipesFor(params int[] requests)
{
var recipes = new List<string>();
foreach (var request in requests)
{
var recipe = await DownloadRecipe(request);
recipes.Add(recipe);
}
return recipes;
}
Because inside each pool (the DownloadRecipesFor method) we download results one by one, we make sure that we have no more than 10 active requests at any time.
This is a little more efficient than the original, because we don't wait for 10 tasks to complete before starting the next batch.
It is still not ideal, because if one pool finishes earlier than the others, it isn't able to pick up the next request to handle.
Note that flattening the pools with SelectMany concatenates them one after another, so the final array is ordered by pool rather than by original request index; to keep each recipe at its corresponding index, write each result into a shared array by its request index instead, as in the sketch below.
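A minimal sketch of that variation, assuming the same DownloadRecipe method (DownloadAllAsync is a name introduced here for illustration):
static async Task<string[]> DownloadAllAsync(int recipeCount, int maxConnections)
{
    var recipes = new string[recipeCount];
    var pools = Enumerable.Range(0, recipeCount)
        .GroupBy(i => i % maxConnections)
        .Select(async group =>
        {
            // Each pool still downloads its requests one by one,
            // but writes every result straight into its original slot.
            foreach (var index in group)
            {
                recipes[index] = await DownloadRecipe(index);
            }
        })
        .ToArray();
    await Task.WhenAll(pools);
    return recipes;
}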

How to implement asynchronous along with in-parallel (in a non-blocking fashion) for I/O operations (like API requests)?

There is an async API method (GetMatrix) which takes a row number of a square matrix as an input parameter and responds with the values in that row.
I need to construct the whole matrix by calling the API multiple times (row size of the matrix times).
Right now I have initialized a jagged array and I'm sending async calls to that API in a for loop like below.
var matrix = new int[size][];
for (int i = 0; i < size; i++)
{
matrix[i] = await GetMatrix(i);
}
It's taking a decent amount of time to construct the whole matrix if the matrix is of huge size (like 1000 rows).
Can we create parallel tasks in conjunction with the above async calls? I think it will be faster than just async calls. How can we do that in C#?
Can we create parallel tasks in conjunction with the above async calls?
What you're looking for is asynchronous concurrency, which is most commonly done with Task.WhenAll:
var sizes = Enumerable.Range(0, size);
var tasks = sizes.Select(GetMatrix).ToList();
var matrix = await Task.WhenAll(tasks);
Side note: ContinueWith is dangerous.
If your method didn't have to be async you could use Parallel.For; note that Parallel.For does not await async lambdas, so with an async GetMatrix the body has to wait for the result synchronously:
var matrix = new int[size][];
Parallel.For(0, size, i =>
{
matrix[i] = GetMatrix(i).Result;
});
But if your method should be async then you could do something like mjwills said:
var matrix = new int[size][];
var tasks = new Task[size];
for (int i = 0; i < size; i++)
{
var local = i;
tasks[i] = GetMatrix(i).ContinueWith(t => matrix[local] = t.Result);
}
await Task.WhenAll(tasks);
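Given the side note above about ContinueWith, here is a small await-based sketch of the same idea (assuming the same GetMatrix method); an async helper replaces the continuation:
var matrix = new int[size][];

async Task FillRowAsync(int row) => matrix[row] = await GetMatrix(row);

var tasks = new Task[size];
for (int i = 0; i < size; i++)
{
    tasks[i] = FillRowAsync(i);
}
await Task.WhenAll(tasks);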

Running Tasks in parallel

I am failing to understand why this doesn't seem to run the tasks in Parallel:
var tasks = new Task<MyReturnType>[mbis.Length];
for (int i = 0; i < tasks.Length; i++)
{
tasks[i] = CAS.Service.GetAllRouterInterfaces(mbis[i], 3);
}
Parallel.ForEach(tasks, task => task.Start());
By stepping through the execution, I see that as soon as this line is evaluated:
tasks[i] = CAS.Service.GetAllRouterInterfaces(mbis[i], 3);
The task starts. I want to add all the new tasks to the list, and then execute them in parallel.
If GetAllRouterInterfaces is an async method, the resulting Task will already be started (see this answer for further explanation).
This means that tasks will contain multiple tasks all of which are running in parallel without the subsequent call to Parallel.ForEach.
You may wish to wait for all the entries in tasks to complete; you can do this with await Task.WhenAll(tasks).
So you should end up with:
var tasks = new Task<MyReturnType>[mbis.Length];
for (int i = 0; i < tasks.Length; i++)
{
tasks[i] = CAS.Service.GetAllRouterInterfaces(mbis[i], 3);
}
await Task.WhenAll(tasks);
Update from comments
It seems that despite GetAllRouterInterfaces being async and returning a Task it is still making synchronous POST requests (presumably before any other await). This would explain why you are getting minimal concurrency, as each call to GetAllRouterInterfaces blocks while this request is made. The ideal solution would be to make an asynchronous POST request, e.g.:
await webclient.PostAsync(request).ConfigureAwait(false);
This will ensure your for loop is not blocked and the requests are made concurrently.
Further update after conversation
It seems you are unable to make the POST requests asynchronous and GetAllRouterInterfaces does not actually do any asynchronous work; due to this I have advised the following:
Remove async from GetAllRouterInterfaces and change the return type to MyReturnType
Call GetAllRouterInterfaces in parallel like so
var routerInterfaces = mbis.AsParallel()
.Select(mbi => CAS.Service.GetAllRouterInterfaces(mbi, 3));
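Note that PLINQ queries are deferred, so the parallel calls only run once the query is enumerated; a small usage sketch:
var allInterfaces = routerInterfaces.ToList();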
I don't know if I understand you the right way.
First of all, if GetAllRouterInterfaces returns a Task you have to await the result.
With Parallel.ForEach you can't await tasks as it stands, but you can do something similar like this:
public async Task RunInParallel(IEnumerable<TWhatEver> mbisItems)
{
//mbisItems == your parameter that you want to pass to GetAllRouterInterfaces
//degree of concurrency
var concurrentTasks = 3;
//Parallel.ForEach internally does something like this:
await Task.WhenAll(
from partition in Partitioner.Create(mbisItems).GetPartitions(concurrentTasks)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
{
var currentMbis = partition.Current;
var yourResult = await GetAllRouterInterfaces(currentMbis,3);
}
}
));
}

Advice on executing sequential operations in a non-blocking manner, Task.Run vs task with ContinueWith

We have a processor that will receive a queue of elements, and for every element, it will run some actions that need to be guaranteed to be executed in a sequential manner. Each action to execute on an element is a Promise Task (http://blog.stephencleary.com/2014/04/a-tour-of-task-part-0-overview.html). The processing of each element in the queue doesn't need to wait for completion of the previous one.
The signature of the actions can be assumed to be something like this:
Task MyAwaitableMethod(int delay)
The way I see it, the problem can be reduced to executing a loop and, inside the loop, executing sequential operations, where each iteration shouldn't block. I'm looking at 2 approaches:
1.
for (var i = 0; i < Iterations; i++)
{
Task.Run(async () => {
await MyAwaitableMethod(DelayInMilliseconds);
await MyAwaitableMethod(DelayInMilliseconds);
});
}
2.
for (var i = 0; i < Iterations; i++)
{
MyAwaitableMethod(DelayInMilliseconds).ContinueWith(
antecedent => MyAwaitableMethod(DelayInMilliseconds));
}
I was assuming, given the actions are Promises, that with approach #2 there would be fewer threads created, as opposed to Task.Run, which I'd assume would create more threads. But in tests I've run, the number of threads created for both tends to be the same when executing a high number of iterations, and is not dependent on the given number of iterations.
Are both methods entirely equivalent? Or do you have better suggestions?
EDIT (rephrasing the question)
Are both methods equivalent in terms of the number of threads both require?
Thanks
Why not use Task.WhenAll()?
var tasks = new List<Task>();
for (var i = 0; i < Iterations; i++)
{
Task t = MyAwaitableMethod(DelayInMilliseconds);
tasks.Add(t);
}
await Task.WhenAll(tasks);
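If each element also has to run its two actions one after the other (as in the question) while the elements themselves run concurrently, a sketch along the same lines (ProcessItemAsync is a helper name introduced here for illustration):
async Task ProcessItemAsync()
{
    // The two actions for one element still run sequentially.
    await MyAwaitableMethod(DelayInMilliseconds);
    await MyAwaitableMethod(DelayInMilliseconds);
}

var tasks = new List<Task>();
for (var i = 0; i < Iterations; i++)
{
    tasks.Add(ProcessItemAsync());
}
await Task.WhenAll(tasks);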
Part of the beauty of async-await is to write sequential asynchronous code.
If it did not have to be asynchronous, you would write:
for (var i = 0; i < Iterations; i++)
{
MyAwaitableMethod(DelayInMilliseconds);
MyAwaitableMethod(DelayInMilliseconds);
}
If you want it to be asynchronous, just write:
for (var i = 0; i < Iterations; i++)
{
await MyAwaitableMethod(DelayInMilliseconds);
await MyAwaitableMethod(DelayInMilliseconds);
}
The code you posted does not satisfy your requirement to process each item only after the previous one, because you are not awaiting Task.Run.

creating a .net async wrapper to a sync request

I have the following situation (or a basic misunderstanding with the async await mechanism).
Assume you have a set of 1-20 web request calls that take a long time: findItemsByProduct().
You want to wrap them in an async request that abstracts all these calls into one async call, but I can't seem to do it without using more threads.
If I'm doing:
int total = result.paginationOutput.totalPages;
for (int i = 2; i < total + 1; i++)
{
await Task.Factory.StartNew(() =>
{
result = client.findItemsByProduct(i);
});
newList.AddRange(result.searchResult.item);
}
return newList;
The problem here is that the calls don't run together; rather, they wait one by one.
I would like all the calls to run together and then harvest the results.
as pseudo code, I would like the code to run like this:
forEach item {
result = item.makeWebRequest();
}
foreach item {
List.addRange(item.harvestResults);
}
I have no idea how to make the code do that, though.
Ideally, you should add a findItemsByProductAsync that returns a Task<Item[]>. That way, you don't have to create unnecessary tasks using StartNew or Task.Run.
Then your code can look like this:
int total = result.paginationOutput.totalPages;
// Start all downloads; each download is represented by a task.
Task<Item[]>[] tasks = Enumerable.Range(2, total - 1)
.Select(i => client.findItemsByProductAsync(i)).ToArray();
// Wait for all downloads to complete.
Item[][] results = await Task.WhenAll(tasks);
// Flatten the results into a single collection.
return results.SelectMany(x => x).ToArray();
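As a purely hypothetical sketch of what such a findItemsByProductAsync could look like (the URL, JSON shape, and HttpClient usage are assumptions for illustration; the real implementation would use whatever async API the underlying client exposes):
public async Task<Item[]> findItemsByProductAsync(int page)
{
    // Illustrative only: in real code reuse a single HttpClient instance.
    using var http = new HttpClient();
    var json = await http.GetStringAsync($"https://example.com/items?page={page}");
    return JsonSerializer.Deserialize<Item[]>(json); // System.Text.Json
}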
Given your requirements which I see as:
Process n number of non-blocking tasks
Process results after all queries have returned
I would use a CountdownEvent for this, e.g.:
var results = new ConcurrentBag<ItemType>();
using (var e = new CountdownEvent(result.pagination.totalPages - 1)) // pages 2..totalPages
{
for (int i = 2; i <= result.pagination.totalPages; i++)
{
int page = i; // capture a copy so each task uses its own page number
Task.Factory.StartNew(() => client.findItemsByProduct(page))
.ContinueWith(t => {
foreach (var item in t.Result.searchResult.item)
results.Add(item); // ConcurrentBag has no AddRange
e.Signal(); // signal task is done
});
}
// Wait for all requests to complete
e.Wait();
}
// Process results
foreach (var item in results)
{
...
}
This particular problem is solved easily enough without even using await. Simply create each of the tasks, put all of the tasks into a list, and then use WhenAll on that list to get a task that represents the completion of all of those tasks:
public static Task<Item[]> Foo()
{
int total = result.paginationOutput.totalPages;
var tasks = new List<Task<Item>>();
for (int i = 2; i < total + 1; i++)
{
int page = i; // capture a copy so each task uses its own page number
tasks.Add(Task.Factory.StartNew(() => client.findItemsByProduct(page)));
}
return Task.WhenAll(tasks);
}
Also note that you have a major problem in how you use result in your code. Each of the different tasks uses the same variable, so there are race conditions as to whether or not it works properly. You could end up adding the same call's results twice and having another skipped entirely. Instead, you should have the call to findItemsByProduct be the result of the task, and use that task's Result.
If you want to use async-await properly you have to declare your functions async, and the functions that call yours also have to be async. This continues until you have one synchronous function that starts the async process.
Your function would look like this:
By the way, you didn't describe what's in the list. I assume the items are objects of type T; in that case result.searchResult.item returns an IEnumerable<T>.
private async Task<List<T>> FindItems(...)
{
int total = result.paginationOutput.totalPages;
var newList = new List<T>();
for (int i = 2; i < total + 1; i++)
{
var result = await Task.Factory.StartNew(() =>
{
return client.findItemsByProduct(i);
});
newList.AddRange(result.searchResult.item);
}
return newList;
}
If you do it this way, your function will be asynchronous, but the findItemsByProduct calls will be executed one after another. If you want to execute them simultaneously you should not await the result, but start the next task before the previous one is finished. Once all tasks are started, wait until all are finished, like this:
private async Task<List<T>> FindItems(...)
{
int total = result.paginationOutput.totalPages;
var tasks= new List<Task<IEnumerable<T>>>();
// start all tasks. don't wait for the result yet
for (int i = 2; i < total + 1; i++)
{
int page = i; // capture a copy so each task uses its own page number
Task<IEnumerable<T>> task = Task.Factory.StartNew(() =>
{
return client.findItemsByProduct(page).searchResult.item;
});
tasks.Add(task);
}
// now that all tasks are started, wait until all are finished
await Task.WhenAll(tasks);
// the result of each task is now in task.Result
// the type of each result is IEnumerable<T>
// put all into one big list using some linq:
return tasks.SelectMany(task => task.Result)
.ToList();
// alternatively, if you're not familiar with LINQ yet, use a foreach instead of the return above:
var newList = new List<T>();
foreach (var task in tasks)
{
newList.AddRange(task.Result);
}
return newList;
}
