Given a simple scenario:
A and B are in a room and A talks to B. The room is dark, so B cannot see A. How can B figure out whether A is merely pausing or has been kidnapped from the room?
When A talks, A exposes an IObservable<string> called Talk, which B subscribes to via Talk.Subscribe(s => /* process what A said */). At the same time, B can subscribe to an Observable.Interval called Heartbeat as a heartbeat check.
My question is which operator I should use to merge/combine the two IObservables so that, if Talk produces no item across two items of Heartbeat, B will assume A has been kidnapped.
Please note that I want to avoid a variable to store the state, because it may cause side effects if I don't synchronize it properly.
Thanks,
Imagine a state variable you want to act on, with the state representing the number of heartbeats since 'A' last spoke. That would look like this:
var stateObservable = Observable.Merge(                         // state = number of heartbeats since A last spoke
        aSource.Select(_ => new Func<int, int>(i => 0)),        // when A talks, reset the state to 0
        bHeartbeat.Select(_ => new Func<int, int>(i => i + 1))) // when B heartbeats, increment the state
    .Scan(0, (state, func) => func(state));
We represent incidents of A speaking as a function that resets the state to 0, and incidents of B heartbeating as a function that increments it. We then accumulate with the Scan operator.
The rest is now easy:
var isKidnapped = stateObservable
    .Where(state => state >= 2)
    .Take(1);
isKidnapped.Subscribe(_ => Console.WriteLine("A is kidnapped"));
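Since the question asks for a stateless pipeline, it helps to see the reduce-over-functions idea in isolation. Below is a synchronous, non-Rx sketch of the same pattern using plain collections (an illustration only; the actual solution is the Merge/Scan pipeline above):

```csharp
using System;
using System.Collections.Generic;

// Each event is a state-transition function, exactly as in the Merge above:
// a "talk" resets the heartbeat count to 0, a "heartbeat" increments it.
Func<int, int> talk = _ => 0;
Func<int, int> beat = i => i + 1;

// A talks, two heartbeats pass, A talks again, then three heartbeats pass.
var events = new[] { talk, beat, beat, talk, beat, beat, beat };

// Scan is an Aggregate that yields every intermediate state; emulate it with a loop.
var states = new List<int>();
var state = 0;
foreach (var f in events)
{
    state = f(state);
    states.Add(state);
}

// "Kidnapped" fires at the first state that reaches two heartbeats with no talk.
var kidnappedAt = states.FindIndex(s => s >= 2);
Console.WriteLine(string.Join(",", states)); // 0,1,2,0,1,2,3
Console.WriteLine(kidnappedAt);              // 2
```

The Where(state => state >= 2).Take(1) in the Rx version corresponds to FindIndex here: the first state to reach 2 triggers the alarm, and nothing after it matters.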
EDIT:
Here's an example with n A sources:
var aSources = new Subject<Tuple<string, Subject<string>>>();
var bHeartbeat = Observable.Interval(TimeSpan.FromSeconds(1)).Publish().RefCount();
var stateObservable = aSources.SelectMany(t =>
    Observable.Merge(
            t.Item2.Select(_ => new Func<int, int>(i => 0)),
            bHeartbeat.Select(_ => new Func<int, int>(i => i + 1)))
        .Scan(0, (state, func) => func(state))
        .Where(state => state >= 2)
        .Take(1)
        .Select(_ => t.Item1));
stateObservable.Subscribe(s => Console.WriteLine($"{s} is kidnapped"));
aSources
.SelectMany(t => t.Item2.Select(s => Tuple.Create(t.Item1, s)))
.Subscribe(t => Console.WriteLine($"{t.Item1} says '{t.Item2}'"));
bHeartbeat.Subscribe(_ => Console.WriteLine("**Heartbeat**"));
var a = new Subject<string>();
var c = new Subject<string>();
var d = new Subject<string>();
var e = new Subject<string>();
var f = new Subject<string>();
aSources.OnNext(Tuple.Create("A", a));
aSources.OnNext(Tuple.Create("C", c));
aSources.OnNext(Tuple.Create("D", d));
aSources.OnNext(Tuple.Create("E", e));
aSources.OnNext(Tuple.Create("F", f));
a.OnNext("Hello");
c.OnNext("My name is C");
d.OnNext("D is for Dog");
await Task.Delay(TimeSpan.FromMilliseconds(1200));
e.OnNext("Easy-E here");
a.OnNext("A is for Apple");
await Task.Delay(TimeSpan.FromMilliseconds(2200));
I have a class like
public class Foo
{
public string X;
public string Y;
public int Z;
}
and the query I want to achieve is, given an IEnumerable<Foo> called foos,
"Group by X, then by Y, and choose the the largest subgroup
from each supergroup; if there is a tie, choose the one with the
largest Z."
In other words, a not-so-compact solution would look like
var outer = foos.GroupBy(f => f.X);
foreach(var g1 in outer)
{
var inner = g1.GroupBy(g2 => g2.Y);
int maxCount = inner.Max(g3 => g3.Count());
var winners = inner.Where(g4 => g4.Count() == maxCount);
if(winners.Count() > 1)
{
yield return winners.MaxBy(w => w.Max(f => f.Z));
}
else
{
yield return winners.Single();
}
}
and a not-so-efficient solution would be like
from foo in foos
group foo by new { foo.X, foo.Y } into g
orderby g.Key.X, g.Count(), g.Max(f => f.Z)
. . . // can't figure the rest out
but ideally I'd like both compact and efficient.
You are reusing enumerables too much; that causes the whole enumerable to be executed again, which can lead to a significant performance decrease in some cases.
Your not-so-compact code can be simplified to this:
foreach (var byX in foos.GroupBy(f => f.X))
{
yield return byX.GroupBy(f => f.Y, f => f, (_, byY) => byY.ToList())
.MaxBy(l => l.Count)
.MaxBy(f => f.Z);
}
Here is how it goes:
The items are grouped by X, hence the variable name byX; the entire byX enumerable contains elements that share the same X.
Now you group these grouped items by Y. The variable name byY means that the entire byY enumerable contains elements with the same Y that also share the same X.
Finally, you select the largest list, i.e. the winners (MaxBy(l => l.Count)), and from the winners you select the item with the highest Z (MaxBy(f => f.Z)).
The reason I used byY.ToList() was to prevent the duplicate enumeration that would otherwise be caused by Count() and MaxBy().
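The duplicate-enumeration point is easy to demonstrate with a counting iterator (a toy illustration; the names are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var pulls = 0;

// An iterator that counts how many elements it has produced in total.
IEnumerable<int> Source()
{
    for (var i = 1; i <= 3; i++)
    {
        pulls++;
        yield return i;
    }
}

// Calling two operators on the lazy sequence runs the iterator twice.
var lazy = Source();
var lazyResult = lazy.Count() + lazy.Max();
var lazyPulls = pulls;      // 6: three elements times two passes

// Materializing with ToList() runs the iterator exactly once.
pulls = 0;
var list = Source().ToList();
var listResult = list.Count + list.Max();
var listPulls = pulls;      // 3

Console.WriteLine($"lazy: {lazyPulls} pulls, list: {listPulls} pulls");
```

Both paths compute the same answer; only the number of times the source is walked differs.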
Alternatively, you can collapse your entire iterator into a single return statement:
return foos.GroupBy(f => f.X, f => f, (_, byX) =>
byX.GroupBy(f => f.Y, f => f,(__, byY) => byY.ToList())
.MaxBy(l => l.Count)
.MaxBy(f => f.Z));
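A quick runnable check of the single-statement version, with value tuples standing in for the Foo class (the tuple data here is made up for illustration; MaxBy requires .NET 6 or later):

```csharp
using System;
using System.Linq;

// (X, Y, Z) tuples stand in for Foo instances.
var foos = new[]
{
    (X: "A", Y: "p", Z: 1), (X: "A", Y: "p", Z: 2), (X: "A", Y: "q", Z: 3),
    (X: "B", Y: "q", Z: 2), (X: "B", Y: "q", Z: 5), (X: "B", Y: "r", Z: 1),
};

// Per X: group by Y, keep the largest Y-subgroup, then take its highest-Z element.
var result = foos
    .GroupBy(f => f.X, f => f, (_, byX) =>
        byX.GroupBy(f => f.Y, f => f, (__, byY) => byY.ToList())
           .MaxBy(l => l.Count)!
           .MaxBy(f => f.Z))
    .ToList();

foreach (var f in result)
    Console.WriteLine(f); // (A, p, 2) then (B, q, 5)
```

Note that on a count tie MaxBy simply keeps the first group it saw, so ordering of the input matters in that edge case.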
Based on the wording of your question I assume that you want the result to be an IEnumerable<IEnumerable<Foo>>. Elements are grouped by both X and Y so all elements in a specific inner sequence will have the same value for X and Y. Furthermore, every inner sequence will have different (unique) values for X.
Given the following data
X Y Z
-----
A p 1
A p 2
A q 1
A r 3
B p 1
B q 2
the resulting sequence of sequences should consist of two sequences (for X = A and X = B)
X Y Z
-----
A p 1
A p 2
X Y Z
-----
B q 2
You can get this result using the following LINQ expression:
var result = foos
.GroupBy(
outerFoo => outerFoo.X,
(x, xFoos) => xFoos
.GroupBy(
innerFoo => innerFoo.Y,
(y, yFoos) => yFoos)
.OrderByDescending(yFoos => yFoos.Count())
.ThenByDescending(yFoos => yFoos.Select(foo => foo.Z).Max())
.First());
If you really care about performance you can most likely improve it at the cost of some complexity:
When picking the group with the most elements or the highest Z value, two passes are performed over the elements in each group: first the elements are counted using yFoos.Count(), and then the maximum Z value is computed using yFoos.Select(foo => foo.Z).Max(). However, you can do both in one pass by using Aggregate.
Also, it is not necessary to sort all the groups to find the "largest" group. Instead, a single pass over all the groups can find it, again using Aggregate.
result = foos
.GroupBy(
outerFoo => outerFoo.X,
(x, xFoos) => xFoos
.GroupBy(
innerFoo => innerFoo.Y,
(y, yFoos) => new
{
Foos = yFoos,
Aggregate = yFoos.Aggregate(
(Count: 0, MaxZ: int.MinValue),
(accumulator, foo) =>
(Count: accumulator.Count + 1,
MaxZ: Math.Max(accumulator.MaxZ, foo.Z)))
})
.Aggregate(
new
{
Foos = Enumerable.Empty<Foo>(),
Aggregate = (Count: 0, MaxZ: int.MinValue)
},
(accumulator, grouping) =>
grouping.Aggregate.Count > accumulator.Aggregate.Count
|| grouping.Aggregate.Count == accumulator.Aggregate.Count
&& grouping.Aggregate.MaxZ > accumulator.Aggregate.MaxZ
? grouping : accumulator)
.Foos);
I am using a ValueTuple as the accumulator in Aggregate, as I expect that to perform well. However, if you really want to know, you should measure.
You can pretty much ignore the outer grouping, and what is left is just a slightly advanced MaxBy, somewhat like a two-parameter sort. If you implement that, you end up with something like:
public IEnumerable<IGrouping<string, Foo>> GetFoo2(IEnumerable<Foo> foos)
{
return foos.GroupBy(f => f.X)
.Select(f => f.GroupBy(g => g.Y)
.MaxBy2(g => g.Count(), g => g.Max(m => m.Z)));
}
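MaxBy2 is not a built-in operator; the answer assumes you write it yourself. Here is a minimal sketch, written as a local generic function rather than the extension method the code above implies:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two-key max: the primary key decides; ties fall through to the secondary key.
T MaxBy2<T, TKey1, TKey2>(IEnumerable<T> source, Func<T, TKey1> key1, Func<T, TKey2> key2)
    where TKey1 : IComparable<TKey1>
    where TKey2 : IComparable<TKey2>
{
    return source.Aggregate((a, b) =>
    {
        var primary = key1(a).CompareTo(key1(b));
        if (primary != 0) return primary > 0 ? a : b;
        return key2(a).CompareTo(key2(b)) >= 0 ? a : b;
    });
}

// (Name, Count, MaxZ): q and r tie on Count, so the higher MaxZ wins.
var winner = MaxBy2(
    new[] { (Name: "p", Count: 2, MaxZ: 9), (Name: "q", Count: 3, MaxZ: 1), (Name: "r", Count: 3, MaxZ: 5) },
    g => g.Count,
    g => g.MaxZ);

Console.WriteLine(winner.Name); // r
```

As an extension method it would read groups.MaxBy2(g => g.Count(), g => g.Max(m => m.Z)), matching the call in GetFoo2.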
It is questionable how much you can call this a LINQ approach, as you have moved all the functionality into a quite ordinary function. You can also implement the functionality with Aggregate. There are two options: with a seed and without a seed. I like the latter:
public IEnumerable<IGrouping<string, Foo>> GetFoo3(IEnumerable<Foo> foos)
{
return foos.GroupBy(f => f.X)
.Select(f => f.GroupBy(g => g.Y)
.Aggregate((a, b) =>
a.Count() > b.Count() ? a :
a.Count() < b.Count() ? b :
a.Max(m => m.Z) >= b.Max(m => m.Z) ? a : b
));
}
The performance would suffer if Count() is not constant time, which is not guaranteed, but in my tests it worked fine. The variant with a seed would be more complicated, but might be faster if done right.
Thinking about this further, I realized your orderby could vastly simplify everything, though I am still not sure it is that understandable:
var ans = foos.GroupBy(f => f.X, (_, gXfs) => gXfs.GroupBy(gXf => gXf.Y).Select(gXgYfs => gXgYfs.ToList())
.OrderByDescending(gXgYfs => gXgYfs.Count).ThenByDescending(gXgYfs => gXgYfs.Max(gXgYf => gXgYf.Z)).First());
While it is possible to do this in LINQ, I don't find it any more compact or understandable when you make it into one statement using query comprehension syntax:
var ans = from foo in foos
group foo by foo.X into foogX
let foogYs = (from foo in foogX
group foo by foo.Y into rfoogY
select rfoogY)
let maxYCount = foogYs.Max(y => y.Count())
let foogYsmZ = from fooY in foogYs
where fooY.Count() == maxYCount
select new { maxZ = fooY.Max(f => f.Z), fooY = from f in fooY select f }
let maxMaxZ = foogYsmZ.Max(y => y.maxZ)
select (from foogY in foogYsmZ where foogY.maxZ == maxMaxZ select foogY.fooY).First();
If you are willing to use lambda syntax, some things become easier and shorter, though not necessarily more understandable:
var ans = from foogX in foos.GroupBy(f => f.X)
let foogYs = foogX.GroupBy(f => f.Y)
let maxYCount = foogYs.Max(foogY => foogY.Count())
let foogYmCmZs = foogYs.Where(fooY => fooY.Count() == maxYCount).Select(fooY => new { maxZ = fooY.Max(f => f.Z), fooY })
let maxMaxZ = foogYmCmZs.Max(foogYmZ => foogYmZ.maxZ)
select foogYmCmZs.Where(foogYmZ => foogYmZ.maxZ == maxMaxZ).First().fooY.Select(y => y);
With lots of lambda syntax, you can go completely incomprehensible:
var ans = foos.GroupBy(f => f.X, (_, gXfs) => gXfs.GroupBy(gXf => gXf.Y).Select(gXgYf => new { fCount = gXgYf.Count(), maxZ = gXgYf.Max(f => f.Z), gXgYfs = gXgYf.Select(f => f) }))
.Select(fC_mZ_gXgYfs_s => {
var maxfCount = fC_mZ_gXgYfs_s.Max(fC_mZ_gXgYfs => fC_mZ_gXgYfs.fCount);
var fC_mZ_gXgYfs_mCs = fC_mZ_gXgYfs_s.Where(fC_mZ_gXgYfs => fC_mZ_gXgYfs.fCount == maxfCount).ToList();
var maxMaxZ = fC_mZ_gXgYfs_mCs.Max(fC_mZ_gXgYfs => fC_mZ_gXgYfs.maxZ);
return fC_mZ_gXgYfs_mCs.Where(fC_mZ_gXgYfs => fC_mZ_gXgYfs.maxZ == maxMaxZ).First().gXgYfs;
});
(I modified this third possibility to reduce repetitive calculations and be more DRY, but that did make it a bit more verbose.)
I have a simple use case where I:
Receive a notification of events
Perform some action on each event
Print the content after an interval of x
How can I do the above steps in a single Rx pipeline?
Something like below:
void Main()
{
var observable = Observable.Interval(TimeSpan.FromSeconds(1));
// Receive event and call Foo()
observable.Subscribe(x=>Foo());
// After 1 minute, I want to print the result of count
// How do I do this using above observable?
}
int count = 0;
void Foo()
{
Console.Write(".");
count++;
}
I think this does what you want:
var observable =
Observable
.Interval(TimeSpan.FromSeconds(1))
.Do(x => Foo())
.Window(() => Observable.Timer(TimeSpan.FromMinutes(1.0)));
var subscription =
observable
.Subscribe(xs => Console.WriteLine(count));
However, it's a bad idea to mix state with observables. If you had two subscriptions, you'd increment count twice as fast. It's better to encapsulate the state within the observable so that each subscription gets a new instance of count.
Try this instead:
var observable =
Observable
.Defer(() =>
{
var count = 0;
return
Observable
.Interval(TimeSpan.FromSeconds(1))
.Select(x =>
{
Console.Write(".");
return ++count;
});
})
.Window(() => Observable.Timer(TimeSpan.FromMinutes(1.0)))
.SelectMany(xs => xs.LastAsync());
var subscription =
observable
.Subscribe(x => Console.WriteLine(x));
I get this kind of output:
...........................................................59
............................................................119
............................................................179
............................................................239
Remembering that it starts at 0, this is timing pretty well.
After seeing paulpdaniels' answer, I realized I could replace my Window/SelectMany/LastAsync with the simpler Sample operator.
Also, if we don't really need the side-effect of incrementing a counter then this whole observable shrinks down to this:
var observable =
Observable
.Interval(TimeSpan.FromSeconds(1.0))
.Do(x => Console.Write("."))
.Sample(TimeSpan.FromMinutes(1.0));
observable.Subscribe(x => Console.WriteLine(x));
Much simpler!
I would use Select + Sample:
var observable = Observable.Interval(TimeSpan.FromSeconds(1))
.Select((x, i) => {
Foo(x);
return i;
})
.Do(_ => Console.Write("."))
.Sample(TimeSpan.FromMinutes(1));
observable.Subscribe(x => Console.WriteLine(x));
Select has an overload that exposes the index of the current value; by returning that index and then sampling at 1-minute intervals, you get the last value emitted during each interval.
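The same index-carrying Select overload exists on IEnumerable, which makes its shape easy to check outside Rx:

```csharp
using System;
using System.Linq;

// Select((value, index) => ...) pairs each element with its zero-based position,
// the same overload shape the Rx pipeline above relies on.
var indexed = new[] { "a", "b", "c" }
    .Select((value, index) => $"{index}:{value}")
    .ToList();

Console.WriteLine(string.Join(" ", indexed)); // 0:a 1:b 2:c
```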
I have a single-window WPF application with the following constructor:
numbers = Observable.Generate(DateTime.Now,
time => true,
time => DateTime.Now,
time => { return new Random(DateTime.Now.Millisecond).NextDouble() * 99 + 2; },
time => TimeSpan.FromSeconds(1.0));
numbers.ObserveOnDispatcher()
.Subscribe(s => list1.Items.Add(s.ToString("##.00")));
numbers.Where(n => n < 10).ObserveOnDispatcher().
Subscribe(s => list2.Items.Add(s.ToString("##.00")));
Now here is the screenshot of the lists. Notice that 3.76 is missing from the left list... This behavior is intermittent.
The short answer is that you are doing it wrong. Rx is working perfectly.
When you create an observable, you are creating the definition of a sequence of values over time, not the actual sequence of values over time. This means that each subscriber to the observable gets its own new instance of the sequence.
So, in your case you have two instances of this sequence operating:
var numbers =
Observable
.Generate(
DateTime.Now,
time => true,
time => DateTime.Now,
time => new Random(DateTime.Now.Millisecond)
.NextDouble() * 99 + 2,
time => TimeSpan.FromSeconds(1.0));
Now, since you subscribe to this observable twice in immediate succession, the two instances of the sequence try to generate values at almost the same time. So the value of DateTime.Now.Millisecond is the same most of the time, but not always. The value returned by new Random(x).NextDouble() is the same for the same x, which is why you usually get the same value from the two instances of the observable. It's only when DateTime.Now.Millisecond differs between them that you get two different values, and it appears that the subscribers are missing values.
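The seeding behavior behind the bug is easy to verify in isolation: two Random instances constructed with the same seed produce identical sequences, which is exactly what happens when both pipeline instances read the same DateTime.Now.Millisecond:

```csharp
using System;

// Two generators with the same seed are deterministic copies of each other.
var seed = 123;
var first = new Random(seed).NextDouble();
var second = new Random(seed).NextDouble();
Console.WriteLine(first == second); // True

// Different seeds (almost always) diverge, which is what the lists showed
// on the rare ticks where the two subscriptions saw different milliseconds.
var third = new Random(seed + 1).NextDouble();
Console.WriteLine(first == third);
```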
Here's an alternative version that should work as you expected initially:
var rnd = new Random((int)DateTime.Now.Ticks);
var numbers =
Observable
.Generate(0, n => true, n => 0,
n => rnd.NextDouble() * 99 + 2,
n => TimeSpan.FromSeconds(1.0));
var publishedNumbers = numbers.Publish();
publishedNumbers
.ObserveOnDispatcher()
.Subscribe(s => list1.Items.Add(s.ToString("##.00")));
publishedNumbers
.Where(n => n < 10)
.ObserveOnDispatcher()
.Subscribe(s => list2.Items.Add(s.ToString("##.00")));
publishedNumbers.Connect();
I am in the process of creating a service to make it easy for a user to select a protocol from the IANA - Protocol Registry.
As you might imagine, searching the registry for the term http pulls up a lot of hits. Since amt-soap-http is going to be selected by a user much less frequently than straight http, I decided it would be a good idea to pull out everything that starts with http and then concatenate that with the remaining results.
The lambda expression below is the result of that thought process:
var records = this._ianaRegistryService.GetAllLike(term).ToList();
var results = records.Where(r => r.Name.StartsWith(term))
.OrderBy(r => r.Name)
.Concat(records.Where(r => !r.Name.StartsWith(term))
.OrderBy(r => r.Name))
.Take(MaxResultSize);
Unfortunately, I feel like I am iterating through my results more times than necessary. Premature-optimization considerations aside, is there a combination of lambda expressions that would be more efficient than the above?
It might be more efficient as a two-step ordering:
var results = records.OrderBy(r => r.Name.StartsWith(term) ? 1 : 2)
.ThenBy(r => r.Name)
.Take(MaxResultSize);
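A small runnable check of the two-step ordering, with a hypothetical term and made-up registry names:

```csharp
using System;
using System.Linq;

var term = "http";
var records = new[] { "amt-soap-http", "https", "ftp", "http" };

// Prefix matches sort into bucket 1, everything else into bucket 2;
// within each bucket, names sort alphabetically.
var results = records
    .OrderBy(r => r.StartsWith(term) ? 1 : 2)
    .ThenBy(r => r)
    .ToList();

Console.WriteLine(string.Join(", ", results)); // http, https, amt-soap-http, ftp
```

This walks the source once for the sort instead of filtering it twice with StartsWith as in the original Concat version.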
Using comments to explain what I am trying to do is getting hard, so I will post this as another answer.
Suppose I want to sort a list of random integers, first by whether each is even or odd, and then in numerical order (simulating StartsWith with mod 2).
Here is the test case; action2 is the same as the other answer's approach.
If you run this code you will see that my suggestion (action1) is about two times faster.
void Test()
{
Random rnd = new Random();
List<int> records = new List<int>();
for (int i = 0; i < 2000000; i++)
{
records.Add(rnd.Next());
}
Action action1 = () =>
{
var res1 = records.GroupBy(r => r % 2)
.OrderBy(x => x.Key)
.Select(x => x.OrderBy(y => y))
.SelectMany(x => x)
.ToList();
};
Action action2 = () =>
{
var res2 = records.OrderBy(x => x % 2).ThenBy(x => x).ToList();
};
//Avoid counting JIT
action1();
action2();
var sw = Stopwatch.StartNew();
action1();
long t1 = sw.ElapsedMilliseconds;
sw.Restart();
action2();
long t2 = sw.ElapsedMilliseconds;
Console.WriteLine(t1 + " " + t2);
}
I have a cold observable here and I subscribe to "grouped" several times, so why do I NOT need Publish here? I would have expected unwanted results when running it, but to my surprise it works both with and without Publish. Why is that?
var subject = new List<string>
{
"test",
"test",
"hallo",
"test",
"hallo"
}.ToObservable();
subject
.GroupBy(x => x)
.SelectMany(grouped => grouped.Scan(0, (count, _) => ++count)
.Zip(grouped, (count, chars) => new { Chars = chars, Count = count }))
.Subscribe(result => Console.WriteLine("You typed {0} {1} times",
result.Chars, result.Count));
// I Would have expect that I need to use Publish like that
//subject
// .GroupBy(x => x)
// .SelectMany(grouped => grouped.Publish(sharedGroup =>
// sharedGroup.Scan(0, (count, _) => ++count)
// .Zip(sharedGroup, (count, chars) =>
// new { Chars = chars, Count = count })))
// .Subscribe(result => Console.WriteLine("You typed {0} {1} times",
// result.Chars, result.Count));
Console.ReadLine();
EDIT
As Paul noticed, since we are subscribing to the underlying cold observable twice, we should be going over the sequence twice. However, I have had no luck making this effect visible. I tried inserting debug lines, but, for example, this prints "performing" just once:
var subject = new List<Func<string>>
{
() =>
{
Console.WriteLine("performing");
return "test";
},
() => "test",
() => "hallo",
() => "test",
() => "hallo"
}.ToObservable();
subject
.Select(x => x())
.GroupBy(x => x)
.SelectMany(grouped => grouped.Scan(0, (count, _) => ++count)
.Zip(grouped, (count, chars) => new { Chars = chars, Count = count }))
.Subscribe(result => Console.WriteLine("You typed {0} {1} times",
result.Chars, result.Count));
I wonder if we can make visible the effect that we are dealing with a cold observable and are not using Publish(). As a next step, I would like to see how Publish() (see above) makes the effect go away.
EDIT 2
As Paul suggested, I created a custom IObservable<string> for debugging purposes. However, if you set a breakpoint in its Subscribe() method, you will notice that it is only hit once.
class Program
{
static void Main(string[] args)
{
var subject = new MyObservable();
subject
.GroupBy(x => x)
.SelectMany(grouped => grouped.Scan(0, (count, _) => ++count)
.Zip(grouped, (count, chars) => new { Chars = chars, Count = count }))
.Subscribe(result => Console.WriteLine("You typed {0} {1} times",
result.Chars, result.Count));
Console.ReadLine();
}
}
class MyObservable : IObservable<string>
{
public IDisposable Subscribe(IObserver<string> observer)
{
observer.OnNext("test");
observer.OnNext("test");
observer.OnNext("hallo");
observer.OnNext("test");
observer.OnNext("hallo");
return Disposable.Empty;
}
}
So for me the question is still open: why do I not need Publish here on this cold observable?
You're only using your List-based source once, so you won't see duplicate subscription effects there. The key to answering your question is the following observation:
An IGroupedObservable<K, T> object flowing out of GroupBy by itself is a subject in disguise.
Internally, GroupBy keeps a Dictionary<K, ISubject<T>>. Whenever a message comes in, it is sent to the subject with the corresponding key. You're subscribing twice to the grouping object, which is safe, as the subject decouples the producer from the consumer.
Reusing 'grouped' in the Zip means you're effectively doing each grouping twice; however, since your source is cold, it still works. Does that make sense?
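The "dictionary of subjects" shape can be sketched without Rx, with plain callback lists standing in for subjects (a toy model only; the real GroupBy also handles completion, errors, and disposal):

```csharp
using System;
using System.Collections.Generic;

// One callback list ("subject") per key; multiple subscribers per key are fine.
var subjects = new Dictionary<string, List<Action<string>>>();
var log = new List<string>();

void Subscribe(string key, Action<string> observer)
{
    if (!subjects.TryGetValue(key, out var observers))
        subjects[key] = observers = new List<Action<string>>();
    observers.Add(observer);
}

// Route each value to every subscriber of its group, like GroupBy's dispatch.
void OnNext(string value)
{
    if (subjects.TryGetValue(value, out var observers))
        foreach (var observer in observers)
            observer(value);
}

// Two subscriptions to the same group (like Scan and Zip above) each see every value.
Subscribe("test", v => log.Add("scan:" + v));
Subscribe("test", v => log.Add("zip:" + v));

OnNext("test");
OnNext("hallo"); // no subscribers for this key yet, so it is dropped here
OnNext("test");

Console.WriteLine(string.Join(" ", log)); // scan:test zip:test scan:test zip:test
```

Because the per-key "subject" fans values out to all of its subscribers, subscribing to the same grouping twice never re-runs the upstream source.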