Using tasks for repeated actions - c#

I have 50 Machine Learning agents. Every frame, they get some inputs and compute the neural network. Because every agent is independent, I would like to make every agent compute the network as a separate task.
If I were to create a task for every agent, each frame, it will make my program slower. I tried to group my agents into 2 tasks (25 and 25), but it was still an overhead.
The way I see it, is to create n threads for n groups of agents at the beginning and query those threads each frame, somehow. A thread would compute the network for the group of agents, then wait until the next query.
I have read some articles on this topic, and I found out I can't reuse a task. So, what workaround could work?
Basically, I have a repeated action on 50 agents, that is run every frame, for about a minute, and it would be a waste not to parallelize them.
I am still new to multithreading and tasks, so I am relying on your help.
Side notes: I'm using Genetic Algorithms in Unity.
Here is the code in which I have tried to divide the agents in n groups, and compute their networks in n tasks.
public async Task EvaluateAsync(int groupSize = 10)
{
var groups = genomes.Select((g, i) => new { Value = g, Index = i })
.GroupBy(x => x.Index / groupSize)
.Select(x => x.Select(v => v.Value));
var tasks = groups.Select(g =>
{
return Task.Run(() =>
{
foreach (var element in g)
element.Fitness += ComputeFitness(element as NeuralGenome);
});
}).ToArray();
for (var i = 0; i < tasks.Length; i++)
await tasks[i];
}
And in the Update() function I call:
EvaluateAsync(25).Wait();
It is a bit faster when the network is very very big, but it's much slower when there are only 10 neurons.
Making the groups smaller, would result in a better performance only if the networks are very huge.
Here I create a task for each agent:
public async Task EvaluateAsyncEach()
{
var tasks = genomes.Select(x => Task.Run(() => x.Fitness += ComputeFitness(x as NeuralGenome)))
.ToArray();
foreach (var task in tasks)
await task;
}
The following measurements are made for 10 frames. Meaning, t/10 will be the time for one task.
Time for normal running:
00:00:00.3791190
00:00:00.3758430
00:00:00.3697020
00:00:00.3743900
00:00:00.3764850
One task for each agent each frame:
00:00:01.1288240
00:00:01.0761770
00:00:00.9311210
00:00:01.0122570
00:00:00.8938200
In groups of 25:
00:00:00.5401100
00:00:00.5629660
00:00:00.5640470
00:00:00.5932220
00:00:00.6053940
00:00:00.5828170

You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.
Here's the code:
var query =
from genome in genomes.ToObservable()
from fitness in Observable.Start(() => ComputeFitness(genome as NeuralGenome))
select new { genome, fitness };
IDisposable subscription =
query.Subscribe(x => x.genome.Fitness += x.fitness);
It does all of its own thread/task management under the hood. It also produces results as soon as possible as they get computed.
If you want to be able to await the results you can do it this way:
var query =
from genome in genomes.ToObservable()
from fitness in Observable.Start(() => ComputeFitness(genome as NeuralGenome))
select new { genome, fitness };
var results = await query.ToArray();
foreach (var x in results)
{
x.genome.Fitness += x.fitness;
}
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your query.
Based on the code in your comment, I think you should look at this instead:
private async Task ComputingNetworksAsync()
{
var query =
from a in agents.ToObservable()
let i = a.GenerateNetworkInputs()
from n in Observable.Start(() => a.ComputeNetwork(i))
select n;
await query.ToArray();
}
That's a direct equivalent to your code (except for the .ToArray()).
However, you can go one step further and do this:
private async Task ComputingNetworksAsync()
{
var query =
from a in agents.ToObservable()
from i in Observable.Start(() => a.GenerateNetworkInputs())
from n in Observable.Start(() => a.ComputeNetwork(i))
select n;
await query.ToArray();
}

This is a good article.
http://fintechexplained.blogspot.com/2018/05/top-ten-tips-for-implementing-multi.html?m=1
Your solution is PLINQ. Avoid creating new tasks

Related

Wait for all items to be transformed via TransformBlock

I'm looking for a way to await all items to be processed via TPL TransformBlock.
Sample code:
var transformBlock = new TransformBlock<int, int>(async number =>
{
await Task.Delay(TimeSpan.FromMilliseconds(300));
return number * 2;
}, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
});
foreach (var number in Enumerable.Range(1, 100))
{
transformBlock.Post(number);
}
transformBlock.Complete();
At this point I have a call to Complete, which to my understanding signals to TransformBlock to not receive any more items & to finish processing all available data (available in InputQueue).
But I'm not sure how to await for all items to be available ?
Awaiting Completion task is not a solution becase as the answer states:
An instance of TransformBlock is not considered "complete" until the
following conditions are met:
TransformBlock.Complete() has been called
InputCount == 0 – the block has applied its transformation to every incoming element
OutputCount == 0 – all transformed elements have left the output buffer
One way I found is to await all tasks returned from ReceiveAsync, e.g.
var tasks = new List<Task<int>>();
foreach (var number in Enumerable.Range(1, 100))
{
transformBlock.Post(number);
tasks.Add(transformBlock.ReceiveAsync());
}
transformBlock.Complete();
await Task.WhenAll(tasks);
tasks.Select(t => t.GetAwaiter().GetResult())
.ToList()
.ForEach(Console.WriteLine);
However, I'm not really sure this is 100 % correct.
Another options I see in this answer suggestion made to add another TPL blocks and propogate completion, that way we can await transform block's completion and then consume results from linked TPL block, but it does seem like overcomplication of the task & I'm assuming there is better (built-in or less verbose) way ?

When returning multiple async tasks how do I know which results came from which task?

I am have the following code to run multiple async tasks and wait for all the results.
string[] personStoreNames = _faceStoreRepo.GetPersonStoreNames();
IEnumerable<Task<IdentifyResult[]>> identifyFaceTasks =
personStoreNames.Select(storename => _faceServiceClient.IdentifyAsync(storename, faceIds, 1));
var recognitionresults =
await Task.WhenAll(identifyFaceTasks);
When I get the results how can I get the storename for each task result. Each array of IdentifyResult will be for a certain storename, but I'm not sure how to end up with my IdentifyResults and the storename they were found in.
As MSDN says use same indexes to get results that you used for parameters.
WhenAll
If none of the tasks faulted and none of the tasks were canceled, the resulting task will end in the TaskStatus.RanToCompletion state. The Result of the returned task will be set to an array containing all of the results of the supplied tasks in the same order as they were provided (e.g. if the input tasks array contained t1, t2, t3, the output task's Result will return an TResult[] where arr[0] == t1.Result, arr[1] == t2.Result, and arr[2] == t3.Result).
This is not a direct answer to the question, but you can use Microsoft's Reactive Framework to make this code a bit neater.
You can write this:
var query =
from sn in _faceStoreRepo.GetPersonStoreNames().ToObservable()
from irs in Observable.FromAsync(() => _faceServiceClient.IdentifyAsync(sn, faceIds, 1))
select new { sn, irs };
var result = await query.ToArray();
result is an array of anonymous variables of new { sn, irs }.
One advantage is that you can process the values as they become available:
var result = await query
.Do(x => { /* process each `x.sn` & `x.irs` pair as they arrive */ })
.ToArray();

Reactive Extensions SelectMany with large objects

I have this little piece of code that simulates a flow that uses large objects (that huge byte[]). For each item in the sequence, an async method is invoked to get some result. The problem? As it is, it throws OutOfMemoryException.
Code compatible with LINQPad (C# Program):
void Main()
{
var selectMany = Enumerable.Range(1, 100)
.Select(i => new LargeObject(i))
.ToObservable()
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
selectMany
.Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
await Task.Delay(10000);
return lo.Id;
}
internal class LargeObject
{
public int Id { get; }
public LargeObject(int id)
{
this.Id = id;
}
public byte[] Data { get; } = new byte[10000000];
}
It seems that it creates all the objects at the same time. How can I do it the right way?
The underlying idea is to invoke DoSomethingAsync in order to get some result for each object, so that's why I use SelectMany. To simplify, I just have introduced a Task.Delay, but in real life it is a service that can process some items concurrently, so I want to introduce some concurrency mechanism to get advantage of it.
Please, notice that, theoretically, processing a little number of items at time shouldn't fill the memory. In fact, we only need each "large object" to get the results of the DoSomethingAsync method. After that point, the large object isn't used anymore.
I feel like i'm repeating myself. Similar to your last question and my last answer, what you need to do is limit the number of bigObjects™ to be created concurrent.
To do so, you need to combine object creation and processing and put it on the same thread pool. Now the problem is, we use async methods to allow threads to do other things while our async method run. Since your slow network call is async, your (fast) object creation code will keep creating large objects too fast.
Instead, we can use Rx to keep count of the number of concurrent Observables running by combine the object creation with the async call and use .Merge(maxConcurrent) to limit concurrency.
As a bonus, we can also set a minimal time for queries to execute. Just Zip with something that takes a minimal delay.
static void Main()
{
var selectMany = Enumerable.Range(1, 100)
.ToObservable()
.Select(i => Observable.Defer(() => Observable.Return(new LargeObject(i)))
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)))
.Zip(Observable.Timer(TimeSpan.FromMilliseconds(400)), (el, _) => el)
).Merge(4);
selectMany
.Subscribe(r => Console.WriteLine(r));
Console.ReadLine();
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
await Task.Delay(10000);
return lo.Id;
}
internal class LargeObject
{
public int Id { get; }
public LargeObject(int id)
{
this.Id = id;
Console.WriteLine(id + "!");
}
public byte[] Data { get; } = new byte[10000000];
}
It seems that it creates all the objects at the same time.
Yes, because you are creating them all at once.
If I simplify your code I can show you why:
void Main()
{
var selectMany =
Enumerable
.Range(1, 5)
.Do(x => Console.WriteLine($"{x}!"))
.ToObservable()
.SelectMany(i => Observable.FromAsync(() => DoSomethingAsync(i)));
selectMany
.Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(int i)
{
await Task.Delay(1);
return i;
}
Running this produces:
1!
2!
3!
4!
5!
4
3
5
2
1
Because of the Observable.FromAsync you are allowing the source to run to completion before any of the results return. In other words you are quickly building all of the large objects, but slowly processing them.
You should allow Rx to run synchronously, but on the default scheduler so that your main thread is not blocked. The code will then run without any memory issues and your program will remain responsive on the main thread.
Here's the code for this:
var selectMany =
Observable
.Range(1, 100, Scheduler.Default)
.Select(i => new LargeObject(i))
.Select(o => DoSomethingAsync(o))
.Select(t => t.Result);
(I've effectively replaced Enumerable.Range(1, 100).ToObservable() with Observable.Range(1, 100) as that will also help with some issues.)
I've tried testing other options, but so far anything that allows DoSomethingAsync to run asynchronously runs into the out of memory error.
ConcatMap supports this out of the box. I know this operator is not available in .net, but you can make the same using Concat operator which defers subscribing to each inner source until the previous one completes.
You can introduce a time interval delay this way:
var source = Enumerable.Range(1, 100)
.ToObservable()
.Zip(Observable.Interval(TimeSpan.FromSeconds(1)), (i, ts) => i)
.Select(i => new LargeObject(i))
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
So instead of pulling all 100 integers at once, immediately converting them to the LargeObject then calling DoSomethingAsync on all 100, it drips the integers out one-by-one spaced out one second each.
This is what a TPL+Rx solution would look like. Needless to say it is less elegant than Rx alone, or TPL alone. However, I don't think this problem is well suited for Rx:
void Main()
{
var source = Observable.Range(1, 100);
const int MaxParallelism = 5;
var transformBlock = new TransformBlock<int, int>(async i => await DoSomethingAsync(new LargeObject(i)),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = MaxParallelism });
source.Subscribe(transformBlock.AsObserver());
var selectMany = transformBlock.AsObservable();
selectMany
.Subscribe(r => Console.WriteLine(r));
}

Throttle Rx.Observable without skipping values

Throttle method skips values from an observable sequence if others follow too quickly. But I need a method to just delay them. That is, I need to set a minimum delay between items, without skipping any.
Practical example: there's a web service which can accept requests no faster than once a second; there's a user who can add requests, single or in batches. Without Rx, I'll create a list and a timer. When users adds requests, I'll add them to the list. In the timer event, I'll check wether the list is empty. If it is not, I'll send a request and remove the corresponding item. With locks and all that stuff. Now, with Rx, I can create Subject, add items when users adds requests. But I need a way to make sure the web service is not flooded by applying delays.
I'm new to Rx, so maybe I'm missing something obvious.
There's a fairly easy way to do what you want using an EventLoopScheduler.
I started out with an observable that will randomly produce values once every 0 to 3 seconds.
var rnd = new Random();
var xs =
Observable
.Generate(
0,
x => x < 20,
x => x + 1,
x => x,
x => TimeSpan.FromSeconds(rnd.NextDouble() * 3.0));
Now, to make this output values immediately unless the last value was within a second ago I did this:
var ys =
Observable.Create<int>(o =>
{
var els = new EventLoopScheduler();
return xs
.ObserveOn(els)
.Do(x => els.Schedule(() => Thread.Sleep(1000)))
.Subscribe(o);
});
This effectively observes the source on the EventLoopScheduler and then puts it to sleep for 1 second after each OnNext so that it can only begin the next OnNext after it wakes up.
I tested that it worked with this code:
ys
.Timestamp()
.Select(x => x.Timestamp.Second + (double)x.Timestamp.Millisecond/1000.0)
.Subscribe(x => Console.WriteLine(x));
I hope this helps.
How about a simple extension method:
public static IObservable<T> StepInterval<T>(this IObservable<T> source, TimeSpan minDelay)
{
return source.Select(x =>
Observable.Empty<T>()
.Delay(minDelay)
.StartWith(x)
).Concat();
}
Usage:
var bufferedSource = source.StepInterval(TimeSpan.FromSeconds(1));
I want to suggest an approach with using Observable.Zip:
// Incoming requests
var requests = new[] {1, 2, 3, 4, 5}.ToObservable();
// defines the frequency of the incoming requests
// This is the way to emulate flood of incoming requests.
// Which, by the way, uses the same approach that will be used in the solution
var requestsTimer = Observable.Interval(TimeSpan.FromSeconds(0.1));
var incomingRequests = Observable.Zip(requests, requestsTimer, (number, time) => {return number;});
incomingRequests.Subscribe((number) =>
{
Console.WriteLine($"Request received: {number}");
});
// This the minimum interval at which we want to process the incoming requests
var processingTimeInterval = Observable.Interval(TimeSpan.FromSeconds(1));
// Zipping incoming requests with the interval
var requestsToProcess = Observable.Zip(incomingRequests, processingTimeInterval, (data, time) => {return data;});
requestsToProcess.Subscribe((number) =>
{
Console.WriteLine($"Request processed: {number}");
});
I was playing around with this and found .Zip (as mentioned before) to be the most simple method:
var stream = "ThisFastObservable".ToObservable();
var slowStream =
stream.Zip(
Observable.Interval(TimeSpan.FromSeconds(1)), //Time delay
(x, y) => x); // We just care about the original stream value (x), not the interval ticks (y)
slowStream.TimeInterval().Subscribe(x => Console.WriteLine($"{x.Value} arrived after {x.Interval}"));
output:
T arrived after 00:00:01.0393840
h arrived after 00:00:00.9787150
i arrived after 00:00:01.0080400
s arrived after 00:00:00.9963000
F arrived after 00:00:01.0002530
a arrived after 00:00:01.0003770
s arrived after 00:00:00.9963710
t arrived after 00:00:01.0026450
O arrived after 00:00:00.9995360
b arrived after 00:00:01.0014620
s arrived after 00:00:00.9993100
e arrived after 00:00:00.9972710
r arrived after 00:00:01.0001240
v arrived after 00:00:01.0016600
a arrived after 00:00:00.9981140
b arrived after 00:00:01.0033980
l arrived after 00:00:00.9992570
e arrived after 00:00:01.0003520
How about using an observable timer to take from a blocking queue? Code below is untested, but should give you an idea of what I mean...
//assuming somewhere there is
BlockingCollection<MyWebServiceRequestData> workQueue = ...
Observable
.Timer(new TimeSpan(0,0,1), new EventLoopScheduler())
.Do(i => myWebService.Send(workQueue.Take()));
// Then just add items to the queue using workQueue.Add(...)
.Buffer(TimeSpan.FromSeconds(0.2)).Where(i => i.Any())
.Subscribe(buffer =>
{
foreach(var item in buffer) Console.WriteLine(item)
});

Parallel Linq - return first result that comes back

I'm using PLINQ to run a function that tests serial ports to determine if they're a GPS device.
Some serial ports immediately are found to be a valid GPS. In this case, I want the first one to complete the test to be the one returned. I don't want to wait for the rest of the results.
Can I do this with PLINQ, or do I have to schedule a batch of tasks and wait for one to return?
PLINQ is probably not going to suffice here. While you can use .First, in .NET 4, this will cause it to run sequentially, which defeats the purpose. (Note that this will be improved in .NET 4.5.)
The TPL, however, is most likely the right answer here. You can create a Task<Location> for each serial port, and then use Task.WaitAny to wait on the first successful operation.
This provides a simple way to schedule a bunch of "tasks" and then just use the first result.
I have been thinking about this on and off for the past couple days and I can't find a built in PLINQ way to do this in C# 4.0. The accepted answer to this question of using FirstOrDefault does not return a value until the full PLINQ query is complete and still returns the (ordered) first result. The following extreme example shows the behavior:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
// waits until all results are in, then returns first
q.FirstOrDefault().Dump("result");
I don't see a built-in way to immediately get the first available result, but I was able to come up with two workarounds.
The first creates Tasks to do the work and returns the Task, resulting in a quickly completed PLINQ query. The resulting tasks can be passed to WaitAny to get the first result as soon as it is available:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
return Task.Factory.StartNew(() =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
});
cts.CancelAfter(5000);
// returns as soon as the tasks are created
var ts = q.ToArray();
// wait till the first task finishes
var idx = Task.WaitAny( ts );
ts[idx].Result.Dump("res");
This is probably a terrible way to do it. Since the actual work of the PLINQ query is just a very fast Task.Factory.StartNew, it's pointless to use PLINQ at all. A simple .Select( i => Task.Factory.StartNew( ... on the IEnumerable is cleaner and probably faster.
The second workaround uses a queue (BlockingCollection) and just inserts results into this queue once they are computed:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
var qu = new BlockingCollection<string>();
// ForAll blocks until PLINQ query is complete
Task.Factory.StartNew(() => q.ForAll( x => qu.Add(x) ));
// get first result asap
qu.Take().Dump("result");
With this method, the work is done using PLINQ, and the BlockingCollecion's Take() will return the first result as soon as it is inserted by the PLINQ query.
While this produces the desired result, I am not sure it has any advantage over just using the simpler Tasks + WaitAny
Upon further review, you can apparently just use FirstOrDefault to solve this. PLINQ will not preserve ordering by default, and with an unbuffered query, will return immediately.
http://msdn.microsoft.com/en-us/library/dd460677.aspx
To accomplish this entirely with PLINQ in .NET 4.0:
SerialPorts. // Your IEnumerable of serial ports
AsParallel().AsUnordered(). // Run as an unordered parallel query
Where(IsGps). // Matching the predicate IsGps (Func<SerialPort, bool>)
Take(1). // Taking the first match
FirstOrDefault(); // And unwrap it from the IEnumerable (or null if none are found
The key is to not use an ordered evaluation like First or FirstOrDefault until you have specified that you only care to find one.

Categories