Wait for all items to be transformed via TransformBlock


I'm looking for a way to await all items to be processed via TPL TransformBlock.
Sample code:
var transformBlock = new TransformBlock<int, int>(async number =>
{
await Task.Delay(TimeSpan.FromMilliseconds(300));
return number * 2;
}, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
});
foreach (var number in Enumerable.Range(1, 100))
{
transformBlock.Post(number);
}
transformBlock.Complete();
At this point I have a call to Complete, which to my understanding signals to TransformBlock to not receive any more items & to finish processing all available data (available in InputQueue).
But I'm not sure how to wait until all items have been processed and their results are available.
Awaiting the Completion task is not a solution because, as the answer states:
An instance of TransformBlock is not considered "complete" until the
following conditions are met:
TransformBlock.Complete() has been called
InputCount == 0 – the block has applied its transformation to every incoming element
OutputCount == 0 – all transformed elements have left the output buffer
One way I found is to await all tasks returned from ReceiveAsync, e.g.
var tasks = new List<Task<int>>();
foreach (var number in Enumerable.Range(1, 100))
{
transformBlock.Post(number);
tasks.Add(transformBlock.ReceiveAsync());
}
transformBlock.Complete();
await Task.WhenAll(tasks);
tasks.Select(t => t.GetAwaiter().GetResult())
.ToList()
.ForEach(Console.WriteLine);
However, I'm not really sure this is 100% correct.
Another option I see, suggested in this answer, is to add another TPL block and propagate completion, so that we can await the transform block's completion and then consume the results from the linked block. That seems like an overcomplication of the task, though, and I'm assuming there is a better (built-in or less verbose) way?
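For reference, one pattern often used for this is to call Complete() and then drain the block with OutputAvailableAsync() and TryReceive(). OutputAvailableAsync() returns false once the block has completed and no output remains, so the loop ends only after every item has been transformed and received. The sketch below is illustrative, not a definitive implementation: the names DrainSketch and TransformAllAsync are made up, the delay is shortened, and it assumes the results only need to be collected into a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DrainSketch
{
    // Hypothetical helper: doubles every input and returns all outputs.
    public static async Task<List<int>> TransformAllAsync(IEnumerable<int> numbers)
    {
        var transformBlock = new TransformBlock<int, int>(async number =>
        {
            await Task.Delay(TimeSpan.FromMilliseconds(10));
            return number * 2;
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        });

        foreach (var number in numbers)
        {
            transformBlock.Post(number);
        }
        // No more input; the block finishes what is already queued.
        transformBlock.Complete();

        var results = new List<int>();
        // Waits for output; yields false once the block is done and empty.
        while (await transformBlock.OutputAvailableAsync())
        {
            while (transformBlock.TryReceive(out var item))
            {
                results.Add(item);
            }
        }

        // The output buffer is empty now, so Completion can finish;
        // awaiting it also surfaces any exception thrown by the delegate.
        await transformBlock.Completion;
        return results;
    }

    public static async Task Main()
    {
        var results = await TransformAllAsync(Enumerable.Range(1, 100));
        Console.WriteLine(results.Count);
    }
}
```

Because the output buffer is emptied inside the loop, the OutputCount == 0 condition quoted above is eventually satisfied, which is why awaiting Completion afterwards does not hang.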

Related

Using tasks for repeated actions

I have 50 Machine Learning agents. Every frame, they get some inputs and compute the neural network. Because every agent is independent, I would like to make every agent compute the network as a separate task.
If I were to create a task for every agent, each frame, it will make my program slower. I tried to group my agents into 2 tasks (25 and 25), but it was still an overhead.
The way I see it, is to create n threads for n groups of agents at the beginning and query those threads each frame, somehow. A thread would compute the network for the group of agents, then wait until the next query.
I have read some articles on this topic, and I found out I can't reuse a task. So, what workaround could work?
Basically, I have a repeated action on 50 agents, that is run every frame, for about a minute, and it would be a waste not to parallelize them.
I am still new to multithreading and tasks, so I am relying on your help.
Side notes: I'm using Genetic Algorithms in Unity.
Here is the code in which I have tried to divide the agents in n groups, and compute their networks in n tasks.
public async Task EvaluateAsync(int groupSize = 10)
{
var groups = genomes.Select((g, i) => new { Value = g, Index = i })
.GroupBy(x => x.Index / groupSize)
.Select(x => x.Select(v => v.Value));
var tasks = groups.Select(g =>
{
return Task.Run(() =>
{
foreach (var element in g)
element.Fitness += ComputeFitness(element as NeuralGenome);
});
}).ToArray();
for (var i = 0; i < tasks.Length; i++)
await tasks[i];
}
And in the Update() function I call:
EvaluateAsync(25).Wait();
It is a bit faster when the network is very large, but much slower when there are only 10 neurons.
Making the groups smaller improves performance only if the networks are very large.
Here I create a task for each agent:
public async Task EvaluateAsyncEach()
{
var tasks = genomes.Select(x => Task.Run(() => x.Fitness += ComputeFitness(x as NeuralGenome)))
.ToArray();
foreach (var task in tasks)
await task;
}
The following measurements are made for 10 frames. Meaning, t/10 will be the time for one task.
Time for normal running:
00:00:00.3791190
00:00:00.3758430
00:00:00.3697020
00:00:00.3743900
00:00:00.3764850
One task for each agent each frame:
00:00:01.1288240
00:00:01.0761770
00:00:00.9311210
00:00:01.0122570
00:00:00.8938200
In groups of 25:
00:00:00.5401100
00:00:00.5629660
00:00:00.5640470
00:00:00.5932220
00:00:00.6053940
00:00:00.5828170

You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.
Here's the code:
var query =
from genome in genomes.ToObservable()
from fitness in Observable.Start(() => ComputeFitness(genome as NeuralGenome))
select new { genome, fitness };
IDisposable subscription =
query.Subscribe(x => x.genome.Fitness += x.fitness);
It does all of its own thread/task management under the hood. It also produces results as soon as possible as they get computed.
If you want to be able to await the results you can do it this way:
var query =
from genome in genomes.ToObservable()
from fitness in Observable.Start(() => ComputeFitness(genome as NeuralGenome))
select new { genome, fitness };
var results = await query.ToArray();
foreach (var x in results)
{
x.genome.Fitness += x.fitness;
}
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your code.
Based on the code in your comment, I think you should look at this instead:
private async Task ComputingNetworksAsync()
{
var query =
from a in agents.ToObservable()
let i = a.GenerateNetworkInputs()
from n in Observable.Start(() => a.ComputeNetwork(i))
select n;
await query.ToArray();
}
That's a direct equivalent to your code (except for the .ToArray()).
However, you can go one step further and do this:
private async Task ComputingNetworksAsync()
{
var query =
from a in agents.ToObservable()
from i in Observable.Start(() => a.GenerateNetworkInputs())
from n in Observable.Start(() => a.ComputeNetwork(i))
select n;
await query.ToArray();
}

This is a good article.
http://fintechexplained.blogspot.com/2018/05/top-ten-tips-for-implementing-multi.html?m=1
Your solution is PLINQ. Avoid creating new tasks.
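As a rough illustration of the PLINQ suggestion, here is a minimal sketch. The Genome class and the ComputeFitness body are hypothetical stand-ins for the question's types, and each genome is assumed to be touched by exactly one worker, so the += on Fitness needs no locking. Whether this beats the sequential loop depends on how expensive ComputeFitness is; for very small networks the parallel overhead may still dominate, as the measurements in the question suggest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's genome type.
public sealed class Genome
{
    public double[] Weights = Array.Empty<double>();
    public double Fitness;
}

public static class PlinqSketch
{
    // Placeholder for the question's ComputeFitness: any CPU-bound
    // function without shared mutable state would do here.
    public static double ComputeFitness(Genome genome) =>
        genome.Weights.Sum(w => w * w);

    public static void Evaluate(IReadOnlyList<Genome> genomes)
    {
        // PLINQ partitions the list across worker threads itself,
        // so no task is created by hand for each agent or frame.
        genomes.AsParallel().ForAll(g => g.Fitness += ComputeFitness(g));
    }

    public static void Main()
    {
        var genomes = Enumerable.Range(0, 50)
            .Select(_ => new Genome { Weights = new[] { 1.0, 2.0 } })
            .ToList();
        Evaluate(genomes);
        Console.WriteLine(genomes.Sum(g => g.Fitness));
    }
}
```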

When returning multiple async tasks how do I know which results came from which task?

I have the following code to run multiple async tasks and wait for all the results.
string[] personStoreNames = _faceStoreRepo.GetPersonStoreNames();
IEnumerable<Task<IdentifyResult[]>> identifyFaceTasks =
personStoreNames.Select(storename => _faceServiceClient.IdentifyAsync(storename, faceIds, 1));
var recognitionresults =
await Task.WhenAll(identifyFaceTasks);
When I get the results, how can I get the storename for each task result? Each array of IdentifyResult will be for a certain storename, but I'm not sure how to end up with my IdentifyResults and the storename they were found in.

As MSDN says, use the same indexes to read the results as you used for the inputs.
WhenAll
If none of the tasks faulted and none of the tasks were canceled, the resulting task will end in the TaskStatus.RanToCompletion state. The Result of the returned task will be set to an array containing all of the results of the supplied tasks in the same order as they were provided (e.g. if the input tasks array contained t1, t2, t3, the output task's Result will return an TResult[] where arr[0] == t1.Result, arr[1] == t2.Result, and arr[2] == t3.Result).
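The index-based pairing described above can be sketched as follows. IdentifyAsync here is a hypothetical stand-in for _faceServiceClient.IdentifyAsync that returns strings and deliberately finishes in reverse order, to show that the result array still follows the input order. The sketch also assumes store names are unique, since they are used as dictionary keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllOrderSketch
{
    // Hypothetical stand-in for the face service call.
    private static async Task<string[]> IdentifyAsync(string storeName, int delayMs)
    {
        await Task.Delay(delayMs);
        return new[] { "result-for-" + storeName };
    }

    public static async Task<Dictionary<string, string[]>> IdentifyAllAsync(string[] storeNames)
    {
        // Start every call once; earlier names get longer delays,
        // so the tasks complete in reverse order.
        Task<string[]>[] tasks = storeNames
            .Select((name, i) => IdentifyAsync(name, (storeNames.Length - i) * 20))
            .ToArray();

        // results[i] belongs to tasks[i], regardless of completion order.
        string[][] results = await Task.WhenAll(tasks);

        return Enumerable.Range(0, storeNames.Length)
            .ToDictionary(i => storeNames[i], i => results[i]);
    }

    public static async Task Main()
    {
        var byStore = await IdentifyAllAsync(new[] { "storeA", "storeB", "storeC" });
        foreach (var pair in byStore)
        {
            Console.WriteLine(pair.Key + ": " + string.Join(", ", pair.Value));
        }
    }
}
```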

This is not a direct answer to the question, but you can use Microsoft's Reactive Framework to make this code a bit neater.
You can write this:
var query =
from sn in _faceStoreRepo.GetPersonStoreNames().ToObservable()
from irs in Observable.FromAsync(() => _faceServiceClient.IdentifyAsync(sn, faceIds, 1))
select new { sn, irs };
var result = await query.ToArray();
result is an array of anonymous-type objects, each of the form new { sn, irs }.
One advantage is that you can process the values as they become available:
var result = await query
.Do(x => { /* process each `x.sn` & `x.irs` pair as they arrive */ })
.ToArray();

Counting Non-Faulted Tasks causes re-execution of each task

I am saving a bunch of items to my database using async saves
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
});
await Task.WhenAll(tasks);
I need to verify how many times SaveAsync was successful (it throws an exception if something goes wrong). I am using the IsFaulted flag to examine the tasks:
var successCount = tasks.Count(t => !t.IsFaulted);
The items collection consists of 3 elements, so SaveAsync should have been called three times, but it is executed 6 times. Upon closer examination I noticed that counting non-faulted tasks with tasks.Count(...) causes each of the tasks to re-run.
I suspect it has something to do with deferred LINQ execution but I am not sure why exactly and how to fix this.
Any suggestion why I observe this behavior and what would be the optimal pattern to avoid this artifact?

It happens because of multiple enumeration of your Select query.
In order to fix it, force enumeration by calling ToList() method. Then it will work correctly.
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
})
.ToList();
Also you may take a look at these more detailed answers:
https://stackoverflow.com/a/8240935/3872935
https://stackoverflow.com/a/20129161/3872935
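To make the double enumeration visible, here is a small self-contained sketch. SaveAsync is a hypothetical stand-in for dbAccess.SaveAsync that only counts how often it is invoked. With the lazy Select, both Task.WhenAll and Count enumerate the query, so the lambda runs twice per item; with ToList() it runs once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DeferredSelectSketch
{
    private static int saveCalls;

    // Hypothetical stand-in for dbAccess.SaveAsync; it only counts calls.
    private static Task SaveAsync(int item)
    {
        Interlocked.Increment(ref saveCalls);
        return Task.CompletedTask;
    }

    public static async Task<(int Calls, int Successes)> RunAsync(bool materialize)
    {
        saveCalls = 0;
        var items = new[] { 1, 2, 3 };

        // Lazy query: nothing runs until something enumerates it.
        IEnumerable<Task> tasks = items.Select(item => SaveAsync(item));
        if (materialize)
        {
            tasks = tasks.ToList(); // runs the lambda exactly once per item
        }

        await Task.WhenAll(tasks);                          // first enumeration
        var successes = tasks.Count(t => !t.IsFaulted);     // second enumeration
        return (saveCalls, successes);
    }

    public static async Task Main()
    {
        Console.WriteLine((await RunAsync(materialize: false)).Calls); // 6
        Console.WriteLine((await RunAsync(materialize: true)).Calls);  // 3
    }
}
```

In the lazy case the tasks that Count inspects are also brand-new tasks, not the ones that were awaited, so the success count would not even describe the awaited saves.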

Rx: Wait for several observables to complete

I have a list of operations to complete, and I want to return an observable which is notified when all the observables have completed (returning the status of the operations would be best):
foreach (var id in service.FetchItems().ToEnumerable().ToArray())
{
service.Delete(id); // <- returns IObservable<Unit>
}
// something.Wait();
service.FetchItems() returns IObservable<string>, service.Delete(...) returns IObservable<Unit>
Is the following approach correct?
service.FetchItems().ForEachAsync(id => service.Delete(id)).ToObservable().Wait();

I would avoid all awaiting and tasks and just stick with plain RX for this.
Try this approach:
var query =
from id in service.FetchItems()
from u in service.Delete(id)
select id;
query
.ToArray()
.Subscribe(ids =>
{
/* all fetches and deletes done now */
});
The .ToArray() operator in Rx takes an IObservable<T> that returns zero or more T's and returns an IObservable<T[]> that returns a single array that contains zero or more T's only when the source observable completes.

ToEnumerable blocks waiting for the next element in the sequence. You could do:
Task delAllTask = service.FetchItems()
.SelectMany(service.Delete)
.ToTask();
then you can block on the task or continue asynchronously e.g.
delAllTask.Wait();
delAllTask.ContinueWith(...);

Parallel Linq - return first result that comes back

I'm using PLINQ to run a function that tests serial ports to determine if they're a GPS device.
Some serial ports immediately are found to be a valid GPS. In this case, I want the first one to complete the test to be the one returned. I don't want to wait for the rest of the results.
Can I do this with PLINQ, or do I have to schedule a batch of tasks and wait for one to return?

PLINQ is probably not going to suffice here. While you can use .First, in .NET 4, this will cause it to run sequentially, which defeats the purpose. (Note that this will be improved in .NET 4.5.)
The TPL, however, is most likely the right answer here. You can create a Task<Location> for each serial port, and then use Task.WaitAny to wait on the first successful operation.
This provides a simple way to schedule a bunch of "tasks" and then just use the first result.
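A minimal sketch of that idea follows. It uses Task.WhenAny in a loop instead of a single Task.WaitAny call, because the first task to complete is not necessarily a successful one: a probe may finish quickly only to report that the port is not a GPS. ProbeAsync, the port names, and the delays are all hypothetical, and a plain string stands in for the Location type mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FirstGpsSketch
{
    // Hypothetical probe: yields the port name if it is a GPS, otherwise null.
    private static async Task<string> ProbeAsync(string port, int delayMs, bool isGps)
    {
        await Task.Delay(delayMs);
        return isGps ? port : null;
    }

    // Returns the first non-null result, or null if no probe succeeds.
    public static async Task<string> FirstMatchAsync(IEnumerable<Task<string>> probes)
    {
        var pending = probes.ToList();
        while (pending.Count > 0)
        {
            Task<string> finished = await Task.WhenAny(pending);
            pending.Remove(finished);

            // Skip faulted or canceled probes and probes that found nothing.
            if (finished.Status == TaskStatus.RanToCompletion && finished.Result != null)
            {
                return finished.Result;
            }
        }
        return null;
    }

    public static Task<string> FindGpsAsync() =>
        FirstMatchAsync(new[]
        {
            ProbeAsync("COM1", 300, isGps: false),
            ProbeAsync("COM2", 50, isGps: false),
            ProbeAsync("COM3", 100, isGps: true),
            ProbeAsync("COM4", 400, isGps: true),
        });

    public static async Task Main()
    {
        Console.WriteLine(await FindGpsAsync());
    }
}
```

The remaining probes keep running after the first match is returned; a real implementation would probably pass a CancellationToken to them and cancel it at that point.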

I have been thinking about this on and off for the past couple days and I can't find a built in PLINQ way to do this in C# 4.0. The accepted answer to this question of using FirstOrDefault does not return a value until the full PLINQ query is complete and still returns the (ordered) first result. The following extreme example shows the behavior:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
// waits until all results are in, then returns first
q.FirstOrDefault().Dump("result");
I don't see a built-in way to immediately get the first available result, but I was able to come up with two workarounds.
The first creates Tasks to do the work and returns the Task, resulting in a quickly completed PLINQ query. The resulting tasks can be passed to WaitAny to get the first result as soon as it is available:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
return Task.Factory.StartNew(() =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
});
cts.CancelAfter(5000);
// returns as soon as the tasks are created
var ts = q.ToArray();
// wait till the first task finishes
var idx = Task.WaitAny( ts );
ts[idx].Result.Dump("res");
This is probably a terrible way to do it. Since the actual work of the PLINQ query is just a very fast Task.Factory.StartNew, it's pointless to use PLINQ at all. A simple .Select( i => Task.Factory.StartNew( ... on the IEnumerable is cleaner and probably faster.
The second workaround uses a queue (BlockingCollection) and just inserts results into this queue once they are computed:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
var qu = new BlockingCollection<string>();
// ForAll blocks until PLINQ query is complete
Task.Factory.StartNew(() => q.ForAll( x => qu.Add(x) ));
// get first result asap
qu.Take().Dump("result");
With this method, the work is done using PLINQ, and the BlockingCollection's Take() will return the first result as soon as it is inserted by the PLINQ query.
While this produces the desired result, I am not sure it has any advantage over just using the simpler Tasks + WaitAny.

Upon further review, you can apparently just use FirstOrDefault to solve this. PLINQ will not preserve ordering by default, and with an unbuffered query, will return immediately.
http://msdn.microsoft.com/en-us/library/dd460677.aspx

To accomplish this entirely with PLINQ in .NET 4.0:
SerialPorts. // Your IEnumerable of serial ports
AsParallel().AsUnordered(). // Run as an unordered parallel query
Where(IsGps). // Matching the predicate IsGps (Func<SerialPort, bool>)
Take(1). // Taking the first match
FirstOrDefault(); // And unwrap it from the IEnumerable (or null if none are found)
The key is to not use an ordered evaluation like First or FirstOrDefault until you have specified that you only care to find one.
