Rx sequential groupBy (partition stream) - c#

I have a stream of events:
event.EventTime: 1s-----2s----3s----4s----5s----6s---
stream: A-B-C--D-----------------E-F---G-H--
An event looks like this:
public class Event
{
    public DateTime EventTime { get; set; }
    public int Value { get; set; }
}
EventTime should correspond to a time at which the event arrives, but there can be a small delay. The events are not supposed to arrive out-of-order, though.
Now, when I specify a grouping interval, say 1 second, I expect the stream to be grouped like this:
1s-------2s----3s----4s----5s-----6s---
[A-B-C]--[D]---[ ]---[ ]---[E-F]--[G-H]
(notice the empty intervals)
I have tried using Buffer, but sadly I need to partition by EventTime, not System.DateTime.Now. Even with boundaries, I'd need some kind of look-ahead: when I use Buffer(2,1) as the boundary and compare [0] and [1], even though [1] successfully breaks the buffer, it still gets inserted into the old buffer instead of the new one. I also tried GroupBy, but that yielded groups only after the input stream finished, which should never happen here. Then I tried this:
var intervalStart = GetIntervalStartLocal(DateTime.Now) + intervalLength;
var intervals = Observable.Timer(intervalStart, intervalLength);
var eventsAsObservables = intervals.GroupJoin<long, Event, long, Event, (DateTime, IObservable<Event>)>(
    data,
    _ => Observable.Never<long>(),
    _ => Observable.Never<Event>(),
    (intervalNumber, events) => {
        var currentIntervalStart = intervalStart + intervalNumber * intervalLength;
        var eventsInInterval = events
            .SkipWhile(e => GetIntervalStartLocal(e.EventTime) < currentIntervalStart)
            .TakeWhile(e => GetIntervalStartLocal(e.EventTime) == currentIntervalStart);
        return (currentIntervalStart, eventsInInterval);
    });
var eventsForIntervalsAsObservables = eventsAsObservables.SelectMany(g => {
    var lists = g.Item2.Aggregate(new List<Event>(), (es, e) => { es.Add(e); return es; });
    return lists.Select(l => (intervalStart: g.Item1, events: l));
});
var task = eventsForIntervalsAsObservables.ForEachAsync(es => System.Console.WriteLine(
    $"=[{es.intervalStart.TimeOfDay}]= " + string.Join("; ", es.events.Select(e => e.EventTime.TimeOfDay))));
await task;
await task;
I was thinking that I'd use GroupJoin, which joins based on values. So first, I'll emit interval timestamps. Then, inside GroupJoin's resultSelector, I'll compute a matching interval from each Event, using the GetIntervalStartLocal function (which truncates the date to an interval length). After that, I'll skip all the potential leftovers from a previous interval (SkipWhile the expected interval is higher than the one actually computed from the Event). Finally, I'll TakeWhile the interval computed from the event matches the expected one.
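For reference, GetIntervalStartLocal isn't shown above; a helper along these lines is assumed, i.e. one that truncates a local timestamp down to the start of the interval that contains it:
// Assumed shape of the helper used above (not part of the original post).
static DateTime GetIntervalStartLocal(DateTime t)
{
    var intervalLength = TimeSpan.FromSeconds(1); // the grouping interval
    return new DateTime(t.Ticks - t.Ticks % intervalLength.Ticks, t.Kind);
}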
However, there must be a problem before I even get to SkipWhile and TakeWhile, because resultSelector does not actually operate on all the events from data, but ignores some, e.g. like this:
event.EventTime: 1s-----2s----3s----4s----5s----6s---
stream: A---C--D-------------------F-----H--
and then constructs (from what it operates on, correctly):
1s-----2s----3s----4s----5s---6s---
[A-C]--[D]---[ ]---[ ]---[F]--[H]--
I think I must be doing something terribly wrong here, because it shouldn't be that hard to do partitioning on a stream based on a stream event value.

You need to clarify what you want. Given this:
time : 1s-------2s----3s----4s----5s-----6s---
stream: A-B-C----D-----------------E-F----G-H-- (actual)
group : [A-B-C]--[D]---[ ]---[ ]---[E-F]--[G-H] (desired result)
It's not clear whether 'time' here is your event time-stamp, or actual time. If it's actual time, then that is of course impossible: You can't pass a list of ABC before C has arrived. If you're referring to your event time-stamp, then Buffer or perhaps Window will have to know when to stop, which isn't that easy to do.
GroupBy does work for me as follows:
var sampleSource = Observable.Interval(TimeSpan.FromMilliseconds(400))
    .Timestamp()
    .Select(t => new Event { EventTime = t.Timestamp.DateTime, Value = (int)t.Value });

sampleSource
    .GroupBy(e => e.EventTime.Ticks / 10000000) // 10M ticks per second
    .Dump(); // LINQPad
The only problem with this is that each group doesn't have a closing criterion, so it's a giant memory leak. So you can add a timer to close the groups:
sampleSource
    .GroupBy(e => e.EventTime.Ticks / 10000000) // 10M ticks per second
    .Select(g => g.TakeUntil(Observable.Timer(TimeSpan.FromSeconds(2)))) // group closes 2 seconds after opening
    .Dump(); // LINQPad
This closing also allows us to return lists with .ToList(), rather than Observables:
sampleSource
    .GroupBy(e => e.EventTime.Ticks / 10000000) // 10M ticks per second
    .SelectMany(g => g.TakeUntil(Observable.Timer(TimeSpan.FromSeconds(2))).ToList())
    .Dump(); // LINQPad
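If the wall-clock timer is a poor fit (since the grouping key is based on EventTime), a variant is to close each group when an element from a later interval shows up; a sketch, reusing the same key function and publishing the source so it can be subscribed twice:
sampleSource
    .Publish(src => src
        .GroupByUntil(
            e => e.EventTime.Ticks / 10000000,                                // same 1-second key
            grp => src.Where(e => e.EventTime.Ticks / 10000000 != grp.Key))   // close on the first element of a later interval
        .SelectMany(g => g.ToList()))
    .Dump(); // LINQPad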

Related

How can I fix race condition in this "reset after time" observable?

Input is an observable that produces a value each time a problem occurs.
As output I want an observable that produces a value if problems exist for a longer time. In other words I want to "reset" the output observable (not produce values) if the last problem is outdated.
My solution:
// first get an observable producing statusOk values (true = ok, false = not ok)
var okStatusObservable = input.Select(_ => true).Throttle(longerTime)
.Merge(input.Select(_ => false));
// we only want event if statusOk=false for a longer time
var outputObservable = okStatusObservable
.DistinctUntilChanged() // only changes
.Throttle(evenLongerTime) // wait for stable status
.Where(_ => _ == false); // only interested in bad status
I think the okStatusObservable might contain a race condition: if input receives events at intervals of exactly longerTime, and the second part of the merge (Select / false) produces a boolean before the first part (Select + Throttle / true), then okStatus would be true 99.9% of the time when the opposite would be correct.
(PS: to have a status value from the beginning, we might add .StartWith(true), but that doesn't matter regarding the race condition.)
A cleaner way to do the first observable is as follows:
var okStatusObservable2 = input
    .Select(_ => Observable.Return(true).Delay(longerTime).StartWith(false))
    .Switch();
Explanation: for each input message, produce an observable that starts with a false and, after longerTime, produces a true. The Switch means that whenever a new inner observable arrives, we just switch to it, which discards the pending all-clear true from the previous one.
For your second observable, unless longerTime differs between the two observables, every first false in the first observable will result in a false in the second. Is that your intention?
Also, your Where is messed up (it should be .Where(b => !b) or .Where(b => b == false)); .Where(_ => false) will always evaluate to false, returning nothing.
Other than that, I think your solution is sound.
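Putting the cleaner first observable together with the rest of the pipeline from the question might look like this (just a sketch; input, longerTime and evenLongerTime are the variables from the question, and the StartWith(true) is the optional seed mentioned in the PS):
var outputObservable2 = input
    .Select(_ => Observable.Return(true).Delay(longerTime).StartWith(false))
    .Switch()
    .StartWith(true)          // optional: status is "ok" until the first problem arrives
    .DistinctUntilChanged()   // only changes
    .Throttle(evenLongerTime) // wait for a stable status
    .Where(ok => !ok);        // only interested in bad status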

System.Reactive Observable of string combine items into single item as new Observable

I have an observable whose items each carry only a portion of an entire message that I want to publicly offer as an observable.
If items come in like this:
"This is "
"only part of"
" the message."
I want to offer a public observable whose items are emitted like:
"This is only part of the message."
And I know when the message parts are a full message by the period at the end.
I have been trying to get the Buffer operator to work because that seems to be the right operator for my scenario, but I don't know how to tell the buffer what the closing condition is or if that's even possible.
Buffer is the best way to do this:
var source = new Subject<string>();
var result = source.Publish(_source => _source
        .Buffer(_source.Where(s => s.EndsWith("."))))
    .Select(l => l.Aggregate((x, y) => x + y));
result.Subscribe(s => Console.WriteLine(s));
source.OnNext("This is ");
source.OnNext("only part of");
source.OnNext(" the message.");
source.OnNext("Not. A. Full. Message ");
source.OnNext("but end of stream anyway");
source.OnCompleted();
Buffer takes a parameter that specifies where the group splits should happen, which we specify with the Where clause. Buffer aggregates the messages into a list, which we then aggregate with LINQ's Aggregate.
EDIT:
Publish avoids re-subscription. If you were to remove Publish, the solution would look like this, and would work:
var result2 = source
    .Buffer(source.Where(s => s.EndsWith(".")))
    .Select(l => l.Aggregate((x, y) => x + y));
However, result2 would be subscribed twice to source, which can be a source of bugs, particularly if source isn't a well-implemented or well-behaved observable. Therefore, when you would otherwise subscribe twice to an observable, it's best to use Publish, which essentially 'forwards' the messages from the one subscription on to multiple subscribers.
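To illustrate why the double subscription matters, here is a minimal sketch with a hypothetical cold source that has a side effect on subscribe (the names are made up for the example):
// Hypothetical cold observable: the side effect runs once per subscription.
var cold = Observable.Create<string>(obs =>
{
    Console.WriteLine("Subscribed (e.g. opened a connection)");
    obs.OnNext("This is ");
    obs.OnNext("the message.");
    obs.OnCompleted();
    return System.Reactive.Disposables.Disposable.Empty;
});

// Without Publish, Buffer subscribes to 'cold' twice (once for the data,
// once for the boundaries), so "Subscribed..." prints twice and the
// side effect is duplicated.
cold.Buffer(cold.Where(s => s.EndsWith(".")))
    .Select(l => string.Concat(l))
    .Subscribe(Console.WriteLine);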

How to window/buffer IObservable<T> into chunks based on a Func<T>

Given a class:
class Foo { DateTime Timestamp {get; set;} }
...and an IObservable<Foo>, with guaranteed monotonically increasing Timestamps, how can I generate an IObservable<IList<Foo>> chunked into Lists based on those Timestamps?
I.e. each IList<Foo> should have five seconds of events, or whatever. I know I can use Buffer with a TimeSpan overload, but I need to take the time from the events themselves, not the wall clock. (Unless there's a clever way of providing an IScheduler here which uses the IObservable itself as the source of .Now?)
If I try to use the Observable.Buffer(this IObservable<Foo> source, IObservable<Foo> bufferBoundaries) overload like so:
IObservable<Foo> foos = //...;
var pub = foos.Publish();
var windows = pub.Select(x => new DateTime(
    x.Timestamp.Ticks - x.Timestamp.Ticks % TimeSpan.FromSeconds(5).Ticks)).DistinctUntilChanged();
pub.Buffer(windows).Subscribe(x => x.Dump()); // LINQPad
pub.Connect();
...then the IList instances contain the item that causes the window to be closed, but I really want this item to go into the next window/buffer.
E.g. with timestamps [0, 1, 10, 11, 15] you will get blocks of [[0], [1, 10], [11, 15]] instead of [[0, 1], [10, 11], [15]]
Here's an idea. The group key condition is the "window number" and I use GroupByUntil. This gives you the desired output in your example (and I've used an int stream just like that example - but you can substitute whatever you need to number your windows).
public class Tests : ReactiveTest
{
    public void Test()
    {
        var scheduler = new TestScheduler();
        var xs = scheduler.CreateHotObservable<int>(
            OnNext(0, 0),
            OnNext(1, 1),
            OnNext(10, 10),
            OnNext(11, 11),
            OnNext(15, 15),
            OnCompleted(16, 0));

        xs.Publish(ps =>                                    // (1)
            ps.GroupByUntil(
                p => p / 5,                                 // (2)
                grp => ps.Where(p => p / 5 != grp.Key))     // (3)
            .SelectMany(x => x.ToList()))                   // (4)
            .Subscribe(Console.WriteLine);

        scheduler.Start();
    }
}
Notes:
1. We publish the source stream because we will subscribe more than once.
2. This is a function to create a group key - use this to generate a window number from your item type.
3. This is the group termination condition - use this to inspect the source stream for an item in another window. Note that this means a window won't close until an element outside of it arrives, or the source stream terminates. This is obvious if you think about it - your desired output requires looking at the next element after a window ends. If your source bears any relation to real time, you could merge this with an Observable.Timer + Select that outputs a null/default instance of your type to terminate the window earlier (see the sketch below).
4. SelectMany puts the groups into lists and flattens the stream.
This example will run in LINQPad quite nicely if you include nuget package rx-testing. New up a Tests instance and just run the Test() method.
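As a rough sketch of that timer-based early termination for a live source (this assumes a real-time IObservable<int> named liveSource; the 5-second timeout is arbitrary):
// Close each group either when an element from a later window arrives,
// or 5 seconds after the group opens, whichever comes first.
liveSource.Publish(ps =>
    ps.GroupByUntil(
        p => p / 5,
        grp => ps.Where(p => p / 5 != grp.Key).Select(_ => Unit.Default)
            .Merge(Observable.Timer(TimeSpan.FromSeconds(5)).Select(_ => Unit.Default)))
      .SelectMany(g => g.ToList()))
    .Subscribe(Console.WriteLine);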
I think James World's answer is neater/more readable, but for posterity, I've found another way to do this using Buffer():
IObservable<Foo> foos = //...;
var pub = foos.Publish();
var windows = pub.Select(x => new DateTime(
    x.Timestamp.Ticks - x.Timestamp.Ticks % TimeSpan.FromSeconds(5).Ticks))
    .DistinctUntilChanged().Publish().RefCount();
pub.Buffer(windows, x => windows).Subscribe(x => x.Dump()); // LINQPad
pub.Connect();
With 10m events, James' approach is more than 2.5x as fast (20s vs. 56s on my machine).
Window is a generalization of Buffer, and GroupJoin is a generalization of Window (and Join). When you write a Window or Buffer query and you find that notifications are being incorrectly included or excluded from the edges of the windows/lists, then redefine your query in terms of GroupJoin to take control over where edge notifications arrive.
Note that in order to make the closing notifications available to newly opened windows you must define your boundaries as windows of those notifications (the windowed data, not the boundary data). In your case, you cannot use a sequence of DateTime values as your boundaries, you must use a sequence of Foo objects instead. To accomplish this, I've replaced your Select->DistinctUntilChanged query with a Scan->Where->Select query.
var batches = foos.Publish(publishedFoos => publishedFoos
    .Scan(
        new { foo = (Foo)null, last = DateTime.MinValue, take = true },
        (acc, foo) =>
        {
            var boundary = foo.Timestamp - acc.last >= TimeSpan.FromSeconds(5);
            return new
            {
                foo,
                last = boundary ? foo.Timestamp : acc.last,
                take = boundary
            };
        })
    .Where(a => a.take)
    .Select(a => a.foo)
    .Publish(boundaries => boundaries
        .Skip(1)
        .StartWith((Foo)null)
        .GroupJoin(
            publishedFoos,
            foo => foo == null ? boundaries.Skip(1) : boundaries,
            _ => Observable.Empty<Unit>(),
            (foo, window) => (foo == null ? window : window.StartWith(foo)).ToList())))
    .Merge()
    .Replay(lists => lists.SkipLast(1)
        .Select(list => list.Take(list.Count - 1))
        .Concat(lists),
        bufferSize: 1);
The Replay query at the end is only required if you expect the sequence to eventually end and you care about not dropping the last notification; otherwise, you could simply modify window.StartWith(foo) to window.StartWith(foo).SkipLast(1) to achieve the same basic results, though the last notification of the last buffer will be lost.

What is the functional way to properly set a dependent predicate for Observable sequence without side effect?

I have three observables oGotFocusOrDocumentSaved, oGotFocus and oLostFocus. I would like oGotFocusOrDocumentSaved to push values only when _active is true. My implementation below works as needed, but it introduces a side effect on _active. Is there any way to remove the side effect but still get the same functionality?
class TestClass
{
    private bool _active = true;

    public TestClass(..)
    {
        ...
        var oLostFocus = Observable
            .FromEventPattern<EventArgs>(_view, "LostFocus")
            .Throttle(TimeSpan.FromMilliseconds(500));
        var oGotFocus = Observable
            .FromEventPattern<EventArgs>(_view, "GotFocus")
            .Throttle(TimeSpan.FromMilliseconds(500));
        var oGotFocusOrDocumentSaved = oDocumentSaved // some other observable
            .Merge<CustomEvtArgs>(oGotFocus)
            .Where(_ => _active)
            .Publish();

        var lostFocusDisposable = oLostFocus.Subscribe(_ => _active = false);
        var gotFocusDisposable = oGotFocus.Subscribe(_ => _active = true);

        // use case
        oGotFocusOrDocumentSaved.Subscribe(x => DoSomethingWith(x));
        ...
    }
    ...
}
It does sound like you really want an oDocumentSavedWhenHasFocus rather than an oGotFocusOrDocumentSaved observable.
So try using the .Switch() operator, like this:
var oDocumentSavedWhenHasFocus = oGotFocus
    .Select(x => oDocumentSaved.TakeUntil(oLostFocus))
    .Switch();
This should be fairly obvious as to how it works, once you know how .Switch() works.
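In words: each GotFocus opens a window of document saves that lasts until the next LostFocus, and Switch always listens only to the most recent window. For the use case in the question it can then be consumed directly:
// DoSomethingWith is the handler from the question.
oDocumentSavedWhenHasFocus.Subscribe(x => DoSomethingWith(x));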
A combination of SelectMany and TakeUntil should get you where you need to be.
from g in oGotFocus
from d in oDocumentSaved
    .Merge<CustomEvtArgs>(oGotFocus)
    .TakeUntil(oLostFocus)
select d
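In fluent syntax, that query comprehension translates to roughly the following (the variable name is just for illustration):
var oDocumentSavedWhileFocused = oGotFocus
    .SelectMany(_ => oDocumentSaved
        .Merge<CustomEvtArgs>(oGotFocus)
        .TakeUntil(oLostFocus));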
It seems that you want to be notified when the document is saved, but only if the document currently has focus. Correct? (And you also want to be notified when the document gets focus, but that can easily be merged in later.)
Think in terms of windows instead of point events; i.e., join by coincidence.
Your requirement can be represented as a Join query whereby document saves are joined to focus windows, thus yielding notifications only when both overlap; i.e., when both are "active".
var oGotFocusOrDocumentSaved =
    (from saved in oDocumentSaved
     join focused in oGotFocus
         on Observable.Empty<CustomEvtArgs>() // oDocumentSaved has no duration
         equals oLostFocus                    // oGotFocus duration lasts until oLostFocus
     select saved)
    .Merge(oGotFocus);

Throttle Rx.Observable without skipping values

The Throttle method skips values from an observable sequence if others follow too quickly. But I need a method to just delay them. That is, I need to set a minimum delay between items, without skipping any.
Practical example: there's a web service which can accept requests no faster than once a second, and there's a user who can add requests, single or in batches. Without Rx, I'd create a list and a timer. When the user adds requests, I'd add them to the list. In the timer event, I'd check whether the list is empty; if it's not, I'd send a request and remove the corresponding item. With locks and all that stuff (see the sketch below). Now, with Rx, I can create a Subject and add items when the user adds requests. But I need a way to make sure the web service is not flooded by applying delays.
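For reference, the list-and-timer approach described above might look roughly like this (only a sketch; Request, webService and the one-second tick are assumptions, and ConcurrentQueue stands in for the list plus locks):
// Hypothetical non-Rx version of the described approach.
var pending = new ConcurrentQueue<Request>();

var timer = new System.Timers.Timer(1000); // fires once per second
timer.Elapsed += (s, e) =>
{
    if (pending.TryDequeue(out var request))
        webService.Send(request); // at most one request per tick
};
timer.Start();

// The user adds requests, single or in batches:
pending.Enqueue(new Request());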
I'm new to Rx, so maybe I'm missing something obvious.
There's a fairly easy way to do what you want using an EventLoopScheduler.
I started out with an observable that will randomly produce values once every 0 to 3 seconds.
var rnd = new Random();
var xs = Observable
    .Generate(
        0,
        x => x < 20,
        x => x + 1,
        x => x,
        x => TimeSpan.FromSeconds(rnd.NextDouble() * 3.0));
Now, to make this output values immediately unless the last value was emitted less than a second ago, I did this:
var ys = Observable.Create<int>(o =>
{
    var els = new EventLoopScheduler();
    return xs
        .ObserveOn(els)
        .Do(x => els.Schedule(() => Thread.Sleep(1000)))
        .Subscribe(o);
});
This effectively observes the source on the EventLoopScheduler and then puts it to sleep for 1 second after each OnNext so that it can only begin the next OnNext after it wakes up.
I tested that it worked with this code:
ys
    .Timestamp()
    .Select(x => x.Timestamp.Second + (double)x.Timestamp.Millisecond / 1000.0)
    .Subscribe(x => Console.WriteLine(x));
I hope this helps.
How about a simple extension method:
public static IObservable<T> StepInterval<T>(this IObservable<T> source, TimeSpan minDelay)
{
    // Wrap each value in a small observable that emits the value immediately
    // (StartWith) and completes after minDelay; Concat then subscribes to these
    // one after another, so no value is dropped and consecutive values are at
    // least minDelay apart.
    return source.Select(x =>
        Observable.Empty<T>()
            .Delay(minDelay)
            .StartWith(x))
        .Concat();
}
Usage:
var bufferedSource = source.StepInterval(TimeSpan.FromSeconds(1));
I want to suggest an approach using Observable.Zip:
// Incoming requests
var requests = new[] { 1, 2, 3, 4, 5 }.ToObservable();

// Defines the frequency of the incoming requests.
// This is the way to emulate a flood of incoming requests
// (which, by the way, uses the same approach that will be used in the solution).
var requestsTimer = Observable.Interval(TimeSpan.FromSeconds(0.1));
var incomingRequests = Observable.Zip(requests, requestsTimer, (number, time) => number);
incomingRequests.Subscribe(number =>
{
    Console.WriteLine($"Request received: {number}");
});

// This is the minimum interval at which we want to process the incoming requests
var processingTimeInterval = Observable.Interval(TimeSpan.FromSeconds(1));

// Zipping incoming requests with the interval
var requestsToProcess = Observable.Zip(incomingRequests, processingTimeInterval, (data, time) => data);
requestsToProcess.Subscribe(number =>
{
    Console.WriteLine($"Request processed: {number}");
});
I was playing around with this and found .Zip (as mentioned before) to be the most simple method:
var stream = "ThisFastObservable".ToObservable();
var slowStream = stream.Zip(
    Observable.Interval(TimeSpan.FromSeconds(1)), // time delay
    (x, y) => x); // we just care about the original stream value (x), not the interval ticks (y)
slowStream.TimeInterval().Subscribe(x => Console.WriteLine($"{x.Value} arrived after {x.Interval}"));
output:
T arrived after 00:00:01.0393840
h arrived after 00:00:00.9787150
i arrived after 00:00:01.0080400
s arrived after 00:00:00.9963000
F arrived after 00:00:01.0002530
a arrived after 00:00:01.0003770
s arrived after 00:00:00.9963710
t arrived after 00:00:01.0026450
O arrived after 00:00:00.9995360
b arrived after 00:00:01.0014620
s arrived after 00:00:00.9993100
e arrived after 00:00:00.9972710
r arrived after 00:00:01.0001240
v arrived after 00:00:01.0016600
a arrived after 00:00:00.9981140
b arrived after 00:00:01.0033980
l arrived after 00:00:00.9992570
e arrived after 00:00:01.0003520
How about using an observable timer to take from a blocking queue? Code below is untested, but should give you an idea of what I mean...
//assuming somewhere there is
BlockingCollection<MyWebServiceRequestData> workQueue = ...
Observable
    .Timer(new TimeSpan(0, 0, 1), new TimeSpan(0, 0, 1), new EventLoopScheduler()) // fires every second
    .Subscribe(i => myWebService.Send(workQueue.Take()));
// Then just add items to the queue using workQueue.Add(...)
