The Throttle method skips values from an observable sequence if others follow too quickly. But I need a method that just delays them. That is, I need to enforce a minimum delay between items without skipping any.
Practical example: there's a web service that can accept requests no faster than once a second, and there's a user who can add requests, singly or in batches. Without Rx, I'd create a list and a timer. When the user adds requests, I'd add them to the list. In the timer event, I'd check whether the list is empty. If it is not, I'd send a request and remove the corresponding item. With locks and all that stuff. Now, with Rx, I can create a Subject and add items when the user adds requests. But I need a way to make sure the web service is not flooded, by applying delays.
I'm new to Rx, so maybe I'm missing something obvious.
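For comparison, the non-Rx list-and-timer approach described above might look roughly like the following. This is only a sketch: `ThrottledSender`, `Enqueue` and `SendNext` are hypothetical names, a `ConcurrentQueue` stands in for the list plus locks, and the timer callback is assumed to finish well within one interval.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: sends at most one queued request per timer tick.
public sealed class ThrottledSender<T> : IDisposable
{
    private readonly ConcurrentQueue<T> _pending = new ConcurrentQueue<T>();
    private readonly Action<T> _send;
    private readonly Timer _timer;

    public ThrottledSender(Action<T> send, TimeSpan minInterval)
    {
        _send = send;
        // First tick after one interval, then one tick per interval.
        _timer = new Timer(_ => SendNext(), null, minInterval, minInterval);
    }

    // Called whenever the user adds a request (single or part of a batch).
    public void Enqueue(T request) => _pending.Enqueue(request);

    // Sends one queued request, if there is one; returns whether anything was sent.
    public bool SendNext()
    {
        if (!_pending.TryDequeue(out var request)) return false;
        _send(request);
        return true;
    }

    public void Dispose() => _timer.Dispose();
}
```

Nothing is ever dropped here: requests simply wait in the queue until a tick picks them up.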
There's a fairly easy way to do what you want using an EventLoopScheduler.
I started out with an observable that will randomly produce values once every 0 to 3 seconds.
var rnd = new Random();
var xs =
Observable
.Generate(
0,
x => x < 20,
x => x + 1,
x => x,
x => TimeSpan.FromSeconds(rnd.NextDouble() * 3.0));
Now, to make this output a value immediately unless the previous value was emitted less than a second ago, I did this:
var ys =
    Observable.Create<int>(o =>
    {
        var els = new EventLoopScheduler();
        return xs
            .ObserveOn(els)
            .Do(x => els.Schedule(() => Thread.Sleep(1000)))
            .Subscribe(o);
    });
This effectively observes the source on the EventLoopScheduler and then puts it to sleep for 1 second after each OnNext so that it can only begin the next OnNext after it wakes up.
I tested that it worked with this code:
ys
.Timestamp()
.Select(x => x.Timestamp.Second + (double)x.Timestamp.Millisecond/1000.0)
.Subscribe(x => Console.WriteLine(x));
I hope this helps.
How about a simple extension method:
public static IObservable<T> StepInterval<T>(this IObservable<T> source, TimeSpan minDelay)
{
    return source.Select(x =>
        Observable.Empty<T>()
            .Delay(minDelay)
            .StartWith(x)
    ).Concat();
}
Usage:
var bufferedSource = source.StepInterval(TimeSpan.FromSeconds(1));
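To see the operator against the scenario from the question, here is a small sketch that pushes a batch of three requests into a Subject at once and prints when each one comes out. It assumes the System.Reactive NuGet package; `StepIntervalDemo` and `Run` are made-up names, and the operator is repeated only so the snippet compiles on its own.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading;

public static class StepIntervalDemo
{
    // Same operator as above, repeated so this sketch is self-contained.
    public static IObservable<T> StepInterval<T>(this IObservable<T> source, TimeSpan minDelay) =>
        source.Select(x => Observable.Empty<T>().Delay(minDelay).StartWith(x)).Concat();

    public static void Run()
    {
        var requests = new Subject<int>();
        using (requests
            .StepInterval(TimeSpan.FromSeconds(1))
            .Timestamp()
            .Subscribe(x => Console.WriteLine($"{x.Timestamp:HH:mm:ss.fff} sending {x.Value}")))
        {
            // A batch arrives all at once; Concat queues the items instead of dropping them.
            requests.OnNext(1);
            requests.OnNext(2);
            requests.OnNext(3);
            Thread.Sleep(3500); // keep the demo alive long enough to see all three
        }
    }
}
```

Each item is turned into a tiny sequence that emits the item and then stays open for minDelay, so Concat cannot start the next one any sooner.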
I want to suggest an approach using Observable.Zip:
// Incoming requests
var requests = new[] {1, 2, 3, 4, 5}.ToObservable();
// defines the frequency of the incoming requests
// This is the way to emulate flood of incoming requests.
// Which, by the way, uses the same approach that will be used in the solution
var requestsTimer = Observable.Interval(TimeSpan.FromSeconds(0.1));
var incomingRequests = Observable.Zip(requests, requestsTimer, (number, time) => {return number;});
incomingRequests.Subscribe((number) =>
{
Console.WriteLine($"Request received: {number}");
});
// This is the minimum interval at which we want to process the incoming requests
var processingTimeInterval = Observable.Interval(TimeSpan.FromSeconds(1));
// Zipping incoming requests with the interval
var requestsToProcess = Observable.Zip(incomingRequests, processingTimeInterval, (data, time) => {return data;});
requestsToProcess.Subscribe((number) =>
{
Console.WriteLine($"Request processed: {number}");
});
I was playing around with this and found .Zip (as mentioned before) to be the simplest method:
var stream = "ThisFastObservable".ToObservable();
var slowStream =
stream.Zip(
Observable.Interval(TimeSpan.FromSeconds(1)), //Time delay
(x, y) => x); // We just care about the original stream value (x), not the interval ticks (y)
slowStream.TimeInterval().Subscribe(x => Console.WriteLine($"{x.Value} arrived after {x.Interval}"));
output:
T arrived after 00:00:01.0393840
h arrived after 00:00:00.9787150
i arrived after 00:00:01.0080400
s arrived after 00:00:00.9963000
F arrived after 00:00:01.0002530
a arrived after 00:00:01.0003770
s arrived after 00:00:00.9963710
t arrived after 00:00:01.0026450
O arrived after 00:00:00.9995360
b arrived after 00:00:01.0014620
s arrived after 00:00:00.9993100
e arrived after 00:00:00.9972710
r arrived after 00:00:01.0001240
v arrived after 00:00:01.0016600
a arrived after 00:00:00.9981140
b arrived after 00:00:01.0033980
l arrived after 00:00:00.9992570
e arrived after 00:00:01.0003520
How about using an observable timer to take from a blocking queue? Code below is untested, but should give you an idea of what I mean...
//assuming somewhere there is
BlockingCollection<MyWebServiceRequestData> workQueue = ...
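Observable
    .Interval(TimeSpan.FromSeconds(1), new EventLoopScheduler()) // tick once a second on a dedicated thread
    .Subscribe(_ => myWebService.Send(workQueue.Take())); // Take() blocks until an item is available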
// Then just add items to the queue using workQueue.Add(...)
.Buffer(TimeSpan.FromSeconds(0.2)).Where(i => i.Any())
.Subscribe(buffer =>
{
foreach(var item in buffer) Console.WriteLine(item);
});
I have a stream of events:
event.EventTime: 1s-----2s----3s----4s----5s----6s---
stream: A-B-C--D-----------------E-F---G-H--
An event looks like this:
public class Event
{
public DateTime EventTime { get; set; }
public int Value { get; set; }
}
EventTime should correspond to a time at which the event arrives, but there can be a small delay. The events are not supposed to arrive out-of-order, though.
Now, when I specify a grouping interval, say 1 second, I expect the stream to be grouped like this:
1s-------2s----3s----4s----5s-----6s---
[A-B-C]--[D]---[ ]---[ ]---[E-F]--[G-H]
(notice the empty intervals)
I have tried using Buffer, but sadly I need to partition by EventTime, not System.DateTime.Now. Even with boundaries, I'd need some kind of look-ahead: when I use Buffer(2,1) as the boundary and compare [0] and [1], even though [1] successfully breaks the buffer, it still gets inserted into the old one instead of the new one. I also tried GroupBy, but that yielded groups only after the input stream finished, which should never happen. Then I tried something like this:
var intervalStart = GetIntervalStartLocal(DateTime.Now) + intervalLength;
var intervals = Observable.Timer(intervalStart, intervalLength);
var eventsAsObservables = intervals.GroupJoin<long, Event, long, Event, (DateTime, IObservable<Event>)>(
data,
_ => Observable.Never<long>(),
_ => Observable.Never<Event>(),
(intervalNumber, events) => {
var currentIntervalStart = intervalStart + intervalNumber*intervalLength;
var eventsInInterval = events
.SkipWhile(e => GetIntervalStartLocal(e.EventTime) < currentIntervalStart)
.TakeWhile(e => GetIntervalStartLocal(e.EventTime) == currentIntervalStart);
return (currentIntervalStart, eventsInInterval);
});
var eventsForIntervalsAsObservables = eventsAsObservables.SelectMany(g => {
var lists = g.Item2.Aggregate(new List<Event>(), (es, e) => { es.Add(e); return es; });
return lists.Select(l => (intervalStart: g.Item1, events: l));
});
var task = eventsForIntervalsAsObservables.ForEachAsync(es => System.Console.WriteLine(
$"=[{es.intervalStart.TimeOfDay}]= " + string.Join("; ", es.events.Select(e => e.EventTime.TimeOfDay))));
await task;
I was thinking that I'd use GroupJoin, which joins based on values. So first, I emit interval timestamps. Then, inside GroupJoin's resultSelector, I compute a matching interval from each Event using the GetIntervalStartLocal function (which truncates the date to an interval length). After that, I skip all the potential leftovers from a previous interval (SkipWhile the expected interval is higher than the one computed from the Event). Finally, I TakeWhile the interval computed from the Event matches the expected one.
However, there must be a problem before I even get to SkipWhile and TakeWhile, because resultSelector actually does not operate on all data from data, but ignores some, e.g. like this:
event.EventTime: 1s-----2s----3s----4s----5s----6s---
stream: A---C--D-------------------F-----H--
and then constructs (from what it operates on, correctly):
1s-----2s----3s----4s----5s---6s---
[A-C]--[D]---[ ]---[ ]---[F]--[H]--
I think I must be doing something terribly wrong here, because it shouldn't be that hard to do partitioning on a stream based on a stream event value.
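The question does not show GetIntervalStartLocal, so the following is only a guess at the kind of truncation it performs. `IntervalMath.GetIntervalStart` is a hypothetical stand-in that rounds a timestamp down to the start of its interval and ignores any time-zone handling the real helper may do.

```csharp
using System;

public static class IntervalMath
{
    // Hypothetical stand-in for GetIntervalStartLocal: rounds a timestamp down
    // to the start of the interval that contains it.
    public static DateTime GetIntervalStart(DateTime time, TimeSpan intervalLength)
    {
        if (intervalLength <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(intervalLength));

        // Drop the ticks that lie past the last whole interval boundary.
        return new DateTime(time.Ticks - time.Ticks % intervalLength.Ticks, time.Kind);
    }
}
```

With a one-second interval, an event stamped 12:00:01.700 maps to 12:00:01.000, so all events in the same second share one key.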
You need to clarify what you want. Given this:
time : 1s-------2s----3s----4s----5s-----6s---
stream: A-B-C----D-----------------E-F----G-H-- (actual)
group : [A-B-C]--[D]---[ ]---[ ]---[E-F]--[G-H] (desired result)
It's not clear whether 'time' here is your event time-stamp, or actual time. If it's actual time, then that is of course impossible: You can't pass a list of ABC before C has arrived. If you're referring to your event time-stamp, then Buffer or perhaps Window will have to know when to stop, which isn't that easy to do.
GroupBy does work for me as follows:
var sampleSource = Observable.Interval(TimeSpan.FromMilliseconds(400))
.Timestamp()
.Select(t => new Event { EventTime = t.Timestamp.DateTime, Value = (int)t.Value });
sampleSource
.GroupBy(e => e.EventTime.Ticks / 10000000) //10M ticks per second
.Dump(); //LinqPad
The only problem with this is that each group has no closing criterion, so it's a giant memory leak. So you can add a timer to close the groups:
sampleSource
.GroupBy(e => e.EventTime.Ticks / 10000000) //10M ticks per second
.Select(g => g.TakeUntil(Observable.Timer(TimeSpan.FromSeconds(2)))) //group closes 2 seconds after opening
.Dump(); //LinqPad
This closing also allows us to return lists with .ToList(), rather than Observables:
sampleSource
.GroupBy(e => e.EventTime.Ticks / 10000000) //10M ticks per second
.SelectMany(g => g.TakeUntil(Observable.Timer(TimeSpan.FromSeconds(2))).ToList())
.Dump(); //LinqPad
I have 50 Machine Learning agents. Every frame, they get some inputs and compute the neural network. Because every agent is independent, I would like to make every agent compute the network as a separate task.
If I were to create a task for every agent, each frame, it will make my program slower. I tried to group my agents into 2 tasks (25 and 25), but it was still an overhead.
The way I see it, is to create n threads for n groups of agents at the beginning and query those threads each frame, somehow. A thread would compute the network for the group of agents, then wait until the next query.
I have read some articles on this topic, and I found out I can't reuse a task. So, what workaround could work?
Basically, I have a repeated action on 50 agents, that is run every frame, for about a minute, and it would be a waste not to parallelize them.
I am still new to multithreading and tasks, so I am relying on your help.
Side notes: I'm using Genetic Algorithms in Unity.
Here is the code in which I have tried to divide the agents in n groups, and compute their networks in n tasks.
public async Task EvaluateAsync(int groupSize = 10)
{
var groups = genomes.Select((g, i) => new { Value = g, Index = i })
.GroupBy(x => x.Index / groupSize)
.Select(x => x.Select(v => v.Value));
var tasks = groups.Select(g =>
{
return Task.Run(() =>
{
foreach (var element in g)
element.Fitness += ComputeFitness(element as NeuralGenome);
});
}).ToArray();
for (var i = 0; i < tasks.Length; i++)
await tasks[i];
}
And in the Update() function I call:
EvaluateAsync(25).Wait();
It is a bit faster when the network is very big, but it's much slower when there are only 10 neurons.
Making the groups smaller results in better performance only if the networks are huge.
Here I create a task for each agent:
public async Task EvaluateAsyncEach()
{
var tasks = genomes.Select(x => Task.Run(() => x.Fitness += ComputeFitness(x as NeuralGenome)))
.ToArray();
foreach (var task in tasks)
await task;
}
The following measurements are made for 10 frames. Meaning, t/10 will be the time for one task.
Time for normal running:
00:00:00.3791190
00:00:00.3758430
00:00:00.3697020
00:00:00.3743900
00:00:00.3764850
One task for each agent each frame:
00:00:01.1288240
00:00:01.0761770
00:00:00.9311210
00:00:01.0122570
00:00:00.8938200
In groups of 25:
00:00:00.5401100
00:00:00.5629660
00:00:00.5640470
00:00:00.5932220
00:00:00.6053940
00:00:00.5828170
You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.
Here's the code:
var query =
from genome in genomes.ToObservable()
from fitness in Observable.Start(() => ComputeFitness(genome as NeuralGenome))
select new { genome, fitness };
IDisposable subscription =
query.Subscribe(x => x.genome.Fitness += x.fitness);
It does all of its own thread/task management under the hood. It also produces results as soon as possible as they get computed.
If you want to be able to await the results you can do it this way:
var query =
from genome in genomes.ToObservable()
from fitness in Observable.Start(() => ComputeFitness(genome as NeuralGenome))
select new { genome, fitness };
var results = await query.ToArray();
foreach (var x in results)
{
x.genome.Fitness += x.fitness;
}
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your code.
Based on the code in your comment, I think you should look at this instead:
private async Task ComputingNetworksAsync()
{
var query =
from a in agents.ToObservable()
let i = a.GenerateNetworkInputs()
from n in Observable.Start(() => a.ComputeNetwork(i))
select n;
await query.ToArray();
}
That's a direct equivalent to your code (except for the .ToArray()).
However, you can go one step further and do this:
private async Task ComputingNetworksAsync()
{
var query =
from a in agents.ToObservable()
from i in Observable.Start(() => a.GenerateNetworkInputs())
from n in Observable.Start(() => a.ComputeNetwork(i))
select n;
await query.ToArray();
}
This is a good article.
http://fintechexplained.blogspot.com/2018/05/top-ten-tips-for-implementing-multi.html?m=1
Your solution is PLINQ. Avoid creating new tasks.
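A minimal sketch of what the PLINQ suggestion could look like. `Genome`, its `Weights` and the `ComputeFitness` body are invented placeholders, not the asker's types. Whether this beats the sequential loop for small networks would still have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder type standing in for the asker's genome class.
public class Genome
{
    public double[] Weights { get; set; } = new double[0];
    public double Fitness { get; set; }
}

public static class PlinqEvaluation
{
    // Placeholder fitness function; the real one would run the neural network.
    public static double ComputeFitness(Genome genome) =>
        genome.Weights.Sum(w => w * w);

    // PLINQ partitions the genomes across worker threads; no Task is created per agent.
    // Each genome is touched by exactly one worker, so the += needs no lock.
    public static void Evaluate(IReadOnlyList<Genome> genomes)
    {
        genomes.AsParallel().ForAll(g => g.Fitness += ComputeFitness(g));
    }
}
```

Evaluate blocks until every genome has been processed, so it can be called directly from the per-frame update without any awaiting.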
I have code with Parallel.ForEach that processes files, performing some operation on each file in parallel.
Parallel.ForEach(lstFiles, file=>
{
// Doing some operation on file
// Skip file and move to next if it is taking too long
});
I want to skip a file and move to the next one (without exiting the Parallel.ForEach) if a particular file is taking too long (say 2 minutes). Is there any way in Parallel.ForEach to check the time taken by a thread to process a single file?
Thanks
I'd suggest you don't use Parallel.ForEach and instead use Microsoft's much more powerful Reactive Framework. Then you can do this:
var query =
from file in lstFiles.ToObservable()
from result in Observable.Amb(
Observable.Start(() => SomeOperation(file)).Select(_ => true),
Observable.Timer(TimeSpan.FromMinutes(2.0)).Select(_ => false))
select new { file, result };
IDisposable subscription =
query
.Subscribe(x =>
{
/* do something with each `new { file, result }`
as they arrive. */
}, ex =>
{
/* do something if an error is encountered */
/* (stops processing on first error) */
}, () =>
{
/* do something if they have all finished successfully */
});
This is all done in parallel. The Observable.Amb operator starts the two observables defined in its argument list and takes the value from whichever of the two produces a value first: if it's the Start observable, it has processed your file, and if it's the Timer observable, then 2.0 minutes have elapsed without a result from the file.
If you want to stop the processing when it is half-way through then just call subscription.Dispose().
Use NuGet "System.Reactive" to get the bits.
The query in lambda form as per request in comments:
var query =
lstFiles
.ToObservable()
.SelectMany(
file =>
Observable.Amb(
Observable.Start(() => SomeOperation(file)).Select(_ => true),
Observable.Timer(TimeSpan.FromMinutes(2.0)).Select(_ => false)),
(file, result) => new { file, result });
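The same "work versus timer" race can also be sketched without Rx. The following swaps in Task.WhenAny and Task.Delay in place of Observable.Amb and Observable.Timer; `TimeoutRace` and `CompletesWithin` are hypothetical names. As written, a timed-out operation is not cancelled; it just keeps running in the background and its result is ignored.

```csharp
using System;
using System.Threading.Tasks;

public static class TimeoutRace
{
    // Returns true if the work finished within the timeout, false if the timer won.
    public static async Task<bool> CompletesWithin(Func<Task> work, TimeSpan timeout)
    {
        var workTask = work();
        var winner = await Task.WhenAny(workTask, Task.Delay(timeout));
        if (winner != workTask)
            return false;   // timer elapsed first; the work is left running

        await workTask;     // rethrows if the work itself failed
        return true;
    }
}
```

For a CPU-bound file operation the call would be something like CompletesWithin(() => Task.Run(() => SomeOperation(file)), TimeSpan.FromMinutes(2)).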
Given a class:
class Foo { public DateTime Timestamp { get; set; } }
...and an IObservable<Foo>, with guaranteed monotonically increasing Timestamps, how can I generate an IObservable<IList<Foo>> chunked into Lists based on those Timestamps?
I.e. each IList<Foo> should have five seconds of events, or whatever. I know I can use Buffer with a TimeSpan overload, but I need to take the time from the events themselves, not the wall clock. (Unless there is a clever way of providing an IScheduler here which uses the IObservable itself as the source of .Now?)
If I try to use the Observable.Buffer(this IObservable<Foo> source, IObservable<Foo> bufferBoundaries) overload like so:
IObservable<Foo> foos = //...;
var pub = foos.Publish();
var windows = pub.Select(x => new DateTime(
    x.Timestamp.Ticks - x.Timestamp.Ticks % TimeSpan.FromSeconds(5).Ticks)).DistinctUntilChanged();
pub.Buffer(windows).Subscribe(x => x.Dump()); // linqpad
pub.Connect();
...then the IList instances contain the item that causes the window to be closed, but I really want this item to go into the next window/buffer.
E.g. with timestamps [0, 1, 10, 11, 15] you will get blocks of [[0], [1, 10], [11, 15]] instead of [[0, 1], [10, 11], [15]]
Here's an idea. The group key condition is the "window number" and I use GroupByUntil. This gives you the desired output in your example (and I've used an int stream just like that example - but you can substitute whatever you need to number your windows).
public class Tests : ReactiveTest
{
public void Test()
{
var scheduler = new TestScheduler();
var xs = scheduler.CreateHotObservable<int>(
OnNext(0, 0),
OnNext(1, 1),
OnNext(10, 10),
OnNext(11, 11),
OnNext(15, 15),
OnCompleted(16, 0));
xs.Publish(ps => // (1)
ps.GroupByUntil(
p => p / 5, // (2)
grp => ps.Where(p => p / 5 != grp.Key)) // (3)
.SelectMany(x => x.ToList())) // (4)
.Subscribe(Console.WriteLine);
scheduler.Start();
}
}
Notes
(1) We publish the source stream because we will subscribe more than once.
(2) This is a function to create a group key - use this to generate a window number from your item type.
(3) This is the group termination condition - use this to inspect the source stream for an item in another window. Note that means a window won't close until an element outside of it arrives, or the source stream terminates. This is obvious if you think about it - your desired output requires consideration of next element after a window ends. Note if your source bears any relation to real time, you could merge this with an Observable.Timer+Select that outputs a null/default instance of your term to terminate the stream earlier.
(4) SelectMany puts the groups into lists and flattens the stream.
This example will run in LINQPad quite nicely if you include nuget package rx-testing. New up a Tests instance and just run the Test() method.
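As a cross-check of the grouping rule, here is the same "new list whenever the window number changes" logic over a plain in-memory sequence. It is only an illustration of the expected output, not a replacement for the Rx query; `WindowChunker` and `ChunkByWindow` are made-up names, and like the query above it never emits empty windows.

```csharp
using System;
using System.Collections.Generic;

public static class WindowChunker
{
    // Splits an ordered sequence into lists of consecutive items that share a window number.
    public static List<List<T>> ChunkByWindow<T>(IEnumerable<T> source, Func<T, long> windowOf)
    {
        var result = new List<List<T>>();
        List<T> current = null;
        long currentWindow = 0;

        foreach (var item in source)
        {
            var window = windowOf(item);
            if (current == null || window != currentWindow)
            {
                // The item that leaves the old window opens the new list.
                current = new List<T>();
                result.Add(current);
                currentWindow = window;
            }
            current.Add(item);
        }
        return result;
    }
}
```

Calling ChunkByWindow(new[] { 0, 1, 10, 11, 15 }, x => x / 5) yields [[0, 1], [10, 11], [15]], the blocks asked for in the question.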
I think James World's answer is neater/more readable, but for posterity, I've found another way to do this using Buffer():
IObservable<Foo> foos = //...;
var pub = foos.Publish();
var windows = pub.Select(x => new DateTime(
    x.Timestamp.Ticks - x.Timestamp.Ticks % TimeSpan.FromSeconds(5).Ticks))
    .DistinctUntilChanged().Publish().RefCount();
pub.Buffer(windows, x => windows).Subscribe(x => x.Dump());
pub.Connect();
With 10m events, James' approach is more than 2.5x as fast (20s vs. 56s on my machine).
Window is a generalization of Buffer, and GroupJoin is a generalization of Window (and Join). When you write a Window or Buffer query and you find that notifications are being incorrectly included or excluded from the edges of the windows/lists, then redefine your query in terms of GroupJoin to take control over where edge notifications arrive.
Note that in order to make the closing notifications available to newly opened windows you must define your boundaries as windows of those notifications (the windowed data, not the boundary data). In your case, you cannot use a sequence of DateTime values as your boundaries, you must use a sequence of Foo objects instead. To accomplish this, I've replaced your Select->DistinctUntilChanged query with a Scan->Where->Select query.
var batches = foos.Publish(publishedFoos => publishedFoos
.Scan(
new { foo = (Foo)null, last = DateTime.MinValue, take = true },
(acc, foo) =>
{
var boundary = foo.Timestamp - acc.last >= TimeSpan.FromSeconds(5);
return new
{
foo,
last = boundary ? foo.Timestamp : acc.last,
take = boundary
};
})
.Where(a => a.take)
.Select(a => a.foo)
.Publish(boundaries => boundaries
.Skip(1)
.StartWith((Foo)null)
.GroupJoin(
publishedFoos,
foo => foo == null ? boundaries.Skip(1) : boundaries,
_ => Observable.Empty<Unit>(),
(foo, window) => (foo == null ? window : window.StartWith(foo)).ToList())))
.Merge()
.Replay(lists => lists.SkipLast(1)
.Select(list => list.Take(list.Count - 1))
.Concat(lists),
bufferSize: 1);
The Replay query at the end is only required if you expect the sequence to eventually end and you care about not dropping the last notification; otherwise, you could simply modify window.StartWith(foo) to window.StartWith(foo).SkipLast(1) to achieve the same basic results, though the last notification of the last buffer will be lost.
I'm using PLINQ to run a function that tests serial ports to determine if they're a GPS device.
Some serial ports immediately are found to be a valid GPS. In this case, I want the first one to complete the test to be the one returned. I don't want to wait for the rest of the results.
Can I do this with PLINQ, or do I have to schedule a batch of tasks and wait for one to return?
PLINQ is probably not going to suffice here. While you can use .First, in .NET 4, this will cause it to run sequentially, which defeats the purpose. (Note that this will be improved in .NET 4.5.)
The TPL, however, is most likely the right answer here. You can create a Task<Location> for each serial port, and then use Task.WaitAny to wait for the first task to finish; if that task did not find a GPS, wait again on the remaining tasks.
This provides a simple way to schedule a bunch of "tasks" and then just use the first result.
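A rough sketch of that idea, assuming async/await is available (.NET 4.5 or later) and using Task.WhenAny; on .NET 4 the same loop can be written with the blocking Task.WaitAny. `FirstMatch`, `FirstSuccessfulAsync` and the `probe` delegate are hypothetical names, with `probe` standing in for the GPS test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FirstMatch
{
    // Probes all candidates concurrently and returns the first one whose probe
    // reports success, or default(T) if every probe reports failure.
    public static async Task<T> FirstSuccessfulAsync<T>(IEnumerable<T> candidates, Func<T, bool> probe)
    {
        var pending = candidates
            .Select(c => Task.Run(() => (Candidate: c, Ok: probe(c))))
            .ToList();

        while (pending.Count > 0)
        {
            var finished = await Task.WhenAny(pending);
            pending.Remove(finished);

            var (candidate, ok) = await finished; // rethrows if the probe threw
            if (ok)
                return candidate; // remaining probes keep running but are ignored
        }
        return default;
    }
}
```

The loop matters because the first task to finish may be a port that is not a GPS; only a finished task that reports success ends the search.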
I have been thinking about this on and off for the past couple days and I can't find a built in PLINQ way to do this in C# 4.0. The accepted answer to this question of using FirstOrDefault does not return a value until the full PLINQ query is complete and still returns the (ordered) first result. The following extreme example shows the behavior:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
// waits until all results are in, then returns first
q.FirstOrDefault().Dump("result");
I don't see a built-in way to immediately get the first available result, but I was able to come up with two workarounds.
The first creates Tasks to do the work and returns the Task, resulting in a quickly completed PLINQ query. The resulting tasks can be passed to WaitAny to get the first result as soon as it is available:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
return Task.Factory.StartNew(() =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
});
cts.CancelAfter(5000);
// returns as soon as the tasks are created
var ts = q.ToArray();
// wait till the first task finishes
var idx = Task.WaitAny( ts );
ts[idx].Result.Dump("res");
This is probably a terrible way to do it. Since the actual work of the PLINQ query is just a very fast Task.Factory.StartNew, it's pointless to use PLINQ at all. A simple .Select( i => Task.Factory.StartNew( ... on the IEnumerable is cleaner and probably faster.
The second workaround uses a queue (BlockingCollection) and just inserts results into this queue once they are computed:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
var qu = new BlockingCollection<string>();
// ForAll blocks until PLINQ query is complete
Task.Factory.StartNew(() => q.ForAll( x => qu.Add(x) ));
// get first result asap
qu.Take().Dump("result");
With this method, the work is done using PLINQ, and the BlockingCollection's Take() will return the first result as soon as it is inserted by the PLINQ query.
While this produces the desired result, I am not sure it has any advantage over just using the simpler Tasks + WaitAny.
Upon further review, you can apparently just use FirstOrDefault to solve this. PLINQ will not preserve ordering by default, and with an unbuffered query, will return immediately.
http://msdn.microsoft.com/en-us/library/dd460677.aspx
To accomplish this entirely with PLINQ in .NET 4.0:
SerialPorts. // Your IEnumerable of serial ports
AsParallel().AsUnordered(). // Run as an unordered parallel query
Where(IsGps). // Matching the predicate IsGps (Func<SerialPort, bool>)
Take(1). // Taking the first match
FirstOrDefault(); // And unwrap it from the IEnumerable (or null if none are found)
The key is to not use an ordered evaluation like First or FirstOrDefault until you have specified that you only care to find one.