How to fix the inconsistency of the Publish().RefCount() behavior?

Recently I stumbled upon an interesting statement by Enigmativity about the Publish and RefCount operators:
You're using the dangerous .Publish().RefCount() operator pair which creates a sequence that can't be subscribed to after it completes.
This statement seems to oppose Lee Campbell's assessment about these operators. Quoting from his book Intro to Rx:
The Publish/RefCount pair is extremely useful for taking a cold observable and sharing it as a hot observable sequence for subsequent observers.
Initially I didn't believe that Enigmativity's statement was correct, so I tried to refute it. My experiments revealed that Publish().RefCount() can indeed be
inconsistent. Subscribing a second time to a published sequence may or may not cause a new subscription to the source sequence, depending on whether the source sequence completed while connected. If it completed, it won't be resubscribed. If it did not complete, it will be resubscribed. Here is a demonstration of this behavior:
var observable = Observable
    .Create<int>(o =>
    {
        o.OnNext(13);
        o.OnCompleted(); // Commenting this line alters the observed behavior
        return Disposable.Empty;
    })
    .Do(x => Console.WriteLine($"Producer generated: {x}"))
    .Finally(() => Console.WriteLine($"Producer finished"))
    .Publish()
    .RefCount()
    .Do(x => Console.WriteLine($"Consumer received #{x}"))
    .Finally(() => Console.WriteLine($"Consumer finished"));

observable.Subscribe().Dispose();
observable.Subscribe().Dispose();
In this example the observable is composed of three parts. First comes the producing part that generates a single value and then completes. Then follows the publishing mechanism (Publish+RefCount). And finally comes the consuming part that observes the values emitted by the producer. The observable is subscribed to twice. The expected behavior would be that each subscription receives one value. But this is not what happens! Here is the output:
Producer generated: 13
Consumer received #13
Producer finished
Consumer finished
Consumer finished
(Try it on fiddle)
And here is the output if we comment out the o.OnCompleted(); line. This subtle change results in behavior that is expected and desirable:
Producer generated: 13
Consumer received #13
Producer finished
Consumer finished
Producer generated: 13
Consumer received #13
Producer finished
Consumer finished
In the first case the cold producer (the part before the Publish().RefCount()) was subscribed to only once. The first consumer received the emitted value, but the second consumer received nothing (except for an OnCompleted notification). In the second case the producer was subscribed to twice. Each time it generated a value, and each consumer got one value.
My question is: how can we fix this? How can we modify either the Publish operator, or the RefCount, or both, in order to make them behave always consistently and desirably? Below are the specifications of the desirable behavior:
The published sequence should propagate to its subscribers all notifications coming directly from the source sequence, and nothing else.
The published sequence should subscribe to the source sequence when its current number of subscribers increases from zero to one.
The published sequence should stay connected to the source as long as it has at least one subscriber.
The published sequence should unsubscribe from the source when its current number of subscribers becomes zero.
I am asking for either a custom PublishRefCount operator that offers the functionality described above, or for a way to achieve the desirable functionality using the built-in operators.
By the way, a similar question exists that asks why this happens. My question is about how to fix it.
Update: In retrospect, the above specification results in unstable behavior that makes race conditions unavoidable. There is no guarantee that two subscriptions to the published sequence will result in a single subscription to the source sequence. The source sequence may complete between the two subscriptions, causing the unsubscription of the first subscriber, which causes the unsubscription of the RefCount operator, which in turn causes a new subscription to the source for the next subscriber. The behavior of the built-in .Publish().RefCount() prevents this from happening.
The moral lesson is that the .Publish().RefCount() sequence is not broken, but it's not reusable. It cannot be used reliably for multiple connect/disconnect sessions. If you want a second session, you should create a new .Publish().RefCount() sequence.
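As a minimal sketch of that advice (the CreateSession helper below is illustrative, not a built-in operator):

// Sketch: one fresh Publish().RefCount() chain per connect/disconnect session.
IObservable<int> source = Observable
    .Range(1, 3)
    .Do(x => Console.WriteLine($"Producer generated: {x}"));

IObservable<int> CreateSession() => source.Publish().RefCount();

CreateSession().Subscribe(x => Console.WriteLine($"Session 1 received: {x}"));
// The first session has completed; don't reuse it. Start a fresh one instead.
CreateSession().Subscribe(x => Console.WriteLine($"Session 2 received: {x}"));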

Lee does a good job explaining IConnectableObservable, but Publish isn't explained that well. It's a pretty simple animal, just hard to explain. I'll assume you understand IConnectableObservable:
If we were to re-implement the zero-parameter Publish function simply and lazily, it would look something like this:
// For illustrative purposes only: don't use this code
public class PublishObservable<T> : IConnectableObservable<T>
{
    private readonly IObservable<T> _source;
    private readonly Subject<T> _proxy = new Subject<T>();
    private IDisposable _connection;

    public PublishObservable(IObservable<T> source)
    {
        _source = source;
    }

    public IDisposable Connect()
    {
        // Subscribe the proxy Subject to the source, but only once per connection.
        if (_connection == null)
            _connection = _source.Subscribe(_proxy);

        // Disposing the returned handle tears down the connection.
        return Disposable.Create(() =>
        {
            _connection.Dispose();
            _connection = null;
        });
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        // Subscribers attach to the proxy, never directly to the source.
        return _proxy.Subscribe(observer);
    }
}

public static class X
{
    public static IConnectableObservable<T> Publish<T>(this IObservable<T> source)
    {
        return new PublishObservable<T>(source);
    }
}
Publish creates a single proxy Subject which subscribes to the source observable. The proxy can subscribe/unsubscribe to the source based on the connection: call Connect, and the proxy subscribes to the source; dispose the connection disposable, and the proxy unsubscribes from the source. The important thing to take away from this is that there is a single Subject that proxies any connection to the source. You're not guaranteed only one subscription to the source over time, but you are guaranteed one proxy and one concurrent connection. You can have multiple subscriptions via connecting/disconnecting.
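For illustration, here is a small sketch with the built-in operators showing one proxy and one concurrent connection (it assumes the usual Rx usings plus System.Threading):

// One proxy Subject, one concurrent connection, reconnectable at will.
var published = Observable
    .Interval(TimeSpan.FromMilliseconds(100))
    .Publish();

var subscriber = published.Subscribe(x => Console.WriteLine($"Received {x}"));

var connection = published.Connect();   // the proxy subscribes to the source
Thread.Sleep(350);                      // a few values flow through
connection.Dispose();                   // the proxy unsubscribes from the source

connection = published.Connect();       // same proxy, a brand new subscription to the source
Thread.Sleep(350);
connection.Dispose();
subscriber.Dispose();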
RefCount handles the Connect-calling part of things. Here's a simple re-implementation:
// For illustrative purposes only: don't use this code
public class RefCountObservable<T> : IObservable<T>
{
    private readonly IConnectableObservable<T> _source;
    private IDisposable _connection;
    private int _refCount = 0;

    public RefCountObservable(IConnectableObservable<T> source)
    {
        _source = source;
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        var subscription = _source.Subscribe(observer);
        var disposable = Disposable.Create(() =>
        {
            subscription.Dispose();
            DecrementCount();
        });

        // Connect to the source when the count goes from 0 to 1.
        if (++_refCount == 1)
            _connection = _source.Connect();

        return disposable;
    }

    private void DecrementCount()
    {
        // Disconnect from the source when the count drops back to 0.
        if (--_refCount == 0)
            _connection.Dispose();
    }
}

public static class X
{
    public static IObservable<T> RefCount<T>(this IConnectableObservable<T> source)
    {
        return new RefCountObservable<T>(source);
    }
}
A bit more code, but still pretty simple: call Connect on the IConnectableObservable when the ref count goes up to 1, and disconnect when it goes down to 0.
Put the two together, and you get a pair that guarantees there will only ever be one concurrent subscription to the source observable, proxied through one persistent Subject. The Subject is only subscribed to the source while there is at least one downstream subscription.
Given that introduction, there's a lot of misconceptions in your question, so I'll go over them one by one:
... Publish().RefCount() can be indeed inconsistent. Subscribing a second
time to a published sequence can cause a new subscription to the
source sequence, or not, depending on whether the source sequence was
completed while connected. If it was completed, then it won't be
resubscribed. If it was not completed, then it will be resubscribed.
.Publish().RefCount() will subscribe anew to source under one condition only: when it goes from zero subscribers to one. If the count of subscribers goes from 0 to 1 to 0 to 1 for any reason, then you will end up re-subscribing. The source observable completing will cause RefCount to issue an OnCompleted, and all of its observers to unsubscribe. So subsequent subscriptions to RefCount will trigger an attempt to resubscribe to source. Naturally, if source is observing the observable contract properly, it will issue an OnCompleted immediately and that will be that.
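A short sketch of that rule, using a source that never completes so the reconnection is easy to see:

// RefCount reconnects whenever the subscriber count goes from 0 to 1 again.
var source = Observable.Defer(() =>
{
    Console.WriteLine("Connected to source");
    return Observable.Never<int>()
        .Finally(() => Console.WriteLine("Disconnected from source"));
});

var shared = source.Publish().RefCount();

var s1 = shared.Subscribe();   // 0 -> 1: "Connected to source"
var s2 = shared.Subscribe();   // 1 -> 2: no new connection
s2.Dispose();                  // 2 -> 1: still connected
s1.Dispose();                  // 1 -> 0: "Disconnected from source"
var s3 = shared.Subscribe();   // 0 -> 1 again: "Connected to source"
s3.Dispose();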
[see sample observable with OnCompleted...] The observable is subscribed twice. The
expected behavior would be that each subscription will receive one
value.
No. The expected behavior is that the proxy Subject, after issuing an OnCompleted, will re-emit an OnCompleted to any subsequent subscription attempt. Since your source observable completes synchronously at the end of your first subscription, the second subscription is attempting to subscribe to a Subject that has already issued an OnCompleted. This should result in an OnCompleted, otherwise the Observable contract would be broken.
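That is ordinary Subject behaviour, which can be seen in isolation with this sketch:

// A completed Subject immediately re-issues OnCompleted to late subscribers.
var subject = new Subject<int>();
subject.OnNext(13);        // nobody is listening yet
subject.OnCompleted();

subject.Subscribe(
    x => Console.WriteLine($"Next: {x}"),               // never called
    () => Console.WriteLine("Completed immediately"));  // called right away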
[see sample observable without OnCompleted as second case...] In the
first case the cold producer (the part before the
Publish().RefCount()) was subscribed only once. The first consumer
received the emitted value, but the second consumer received nothing
(except from an OnCompleted notification). In the second case the
producer was subscribed twice. Each time it generated a value, and
each consumer got one value.
This is correct. Since the proxy Subject never completed, subsequent re-subscriptions to source will result in the cold observable re-running.
My question is: how can we fix this? [..]
The published sequence should propagate to its subscribers all notifications coming directly from the source sequence, and nothing
else.
The published sequence should subscribe to the source sequence when its current number of subscribers increases from zero to one.
The published sequence should stay connected to the source as long as it has at least one subscriber.
The published sequence should unsubscribe from the source when its current number of subscribers becomes zero.
All of the above already happens with .Publish and .RefCount, as long as you don't complete/error. I don't suggest implementing an operator that changes that, breaking the Observable contract.
EDIT:
I would argue the #1 source of confusion with Rx is Hot/Cold observables. Since Publish can 'warm-up' cold observables, it's no surprise that it should lead to confusing edge cases.
First, a word on the observable contract. The Observable contract stated more succinctly is that an OnNext can never follow an OnCompleted/OnError, and there should be only one OnCompleted or OnError notification. This does leave the edge case of attempts to subscribe to terminated observables:
Attempts to subscribe to terminated observables result in receiving the termination message immediately. Does this break the contract? Perhaps, but it's the only contract cheat, to my knowledge, in the library. The alternative is a subscription to dead air. That doesn't help anybody.
How does this tie into hot/cold observables? Unfortunately, confusingly. A subscription to an ice-cold observable triggers a re-construction of the entire observable pipeline. This means that the subscribe-to-already-terminated rule only applies to hot observables. Cold observables always start anew.
Consider this code, where o is a cold observable:
var o = Observable.Interval(TimeSpan.FromMilliseconds(100))
.Take(5);
var s1 = o.Subscribe(i => Console.WriteLine(i.ToString()));
await Task.Delay(TimeSpan.FromMilliseconds(600));
var s2 = o.Subscribe(i => Console.WriteLine(i.ToString()));
For the purposes of the contract, the observable behind s1 and the observable behind s2 are entirely different. So even though there's a delay between them, and you'll end up seeing an OnNext after an OnCompleted, that's not a problem, because they are entirely different observables.
Where it gets sticky is with a warmed-up Publish version. If you were to add .Publish().RefCount() to the end of o in the code above (a sketch follows the list below)...
Without changing anything else, s2 would terminate immediately printing nothing.
Change the delay to 400 or so, and s2 would print the last two numbers.
Change s1 to only .Take(2), and s2 would start over again printing 0 through 4.
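Here is roughly what that warmed-up version looks like (a sketch; timings as in the original snippet, and the exact output depends on when s2 subscribes):

// Warmed-up variant of the cold observable discussed in the list above.
var o = Observable.Interval(TimeSpan.FromMilliseconds(100))
    .Take(5)
    .Publish()
    .RefCount();

var s1 = o.Subscribe(i => Console.WriteLine(i.ToString()));
await Task.Delay(TimeSpan.FromMilliseconds(600));            // the shared sequence has completed by now
var s2 = o.Subscribe(i => Console.WriteLine(i.ToString()));  // terminates immediately, prints nothing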
Making this nastiness worse is the Schrödinger's cat effect: if you set up an observer on o to watch what would happen the whole time, that changes the ref count, affecting the functionality! Watching it changes the behavior. A debugging nightmare.
This is the hazard of attempting to 'warm-up' cold observables. It just doesn't work well, especially with Publish/RefCount.
My advice would be:
Don't try to warm up cold observables.
If you need to share a subscription, with either cold or hot observables, stick with Enigmativity's general rule of strictly using the selector overload of Publish
If you must, have a dummy subscription on a Publish/RefCount observable. This at least provides a consistent RefCount >= 1, reducing the quantum activity effect (see the sketch below).
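A minimal sketch of that dummy-subscription idea, where source stands in for whatever cold observable is being shared:

// Pin the ref count at >= 1 so observers can come and go
// without triggering disconnect/reconnect cycles.
var shared = source.Publish().RefCount();    // source: your cold observable
var keepAlive = shared.Subscribe(_ => { });  // the dummy subscription

// ... real subscribers attach and detach freely here ...

keepAlive.Dispose();                         // release the connection when the session is over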

As Shlomo pointed out, this problem is associated with the Publish operator. The RefCount works fine, so it's the Publish that needs fixing. Publish is nothing more than a call to the Multicast operator with a standard Subject<T> as argument. Here is its source code:
public static IConnectableObservable<TSource> Publish<TSource>(this IObservable<TSource> source)
{
    return source.Multicast(new Subject<TSource>());
}
So the Publish operator inherits the behavior of the Subject class. This class, for very good reasons, maintains the state of its completion. So if you signal its completion by calling subject.OnCompleted(), any future subscribers of the subject will instantly receive an OnCompleted notification. This feature serves well a standalone subject and its subscribers, but becomes a problematic artifact when a Subject is used as an intermediate propagator between a source sequence and the subscribers of that sequence. That's because the source sequence already maintains its own state, and duplicating this state inside the subject introduces the risk of the two states becoming out of sync. Which is exactly what happens when the Publish is combined with the RefCount operator. The subject remembers that the source has completed, while the source, being a cold sequence, has lost its memory about its previous life and is willing to start a new life afresh.
So the solution is to feed the Multicast operator with a stateless subject. Unfortunately I can't find a way to compose one based on the built-in Subject<T> (inheritance is not an option because the class is sealed). Fortunately, implementing it from scratch is not very difficult. The implementation below uses an ImmutableArray as storage for the subject's observers, and uses interlocked operations to ensure thread safety (much like the built-in Subject<T> implementation).
public class StatelessSubject<T> : ISubject<T>
{
    private IImmutableList<IObserver<T>> _observers
        = ImmutableArray<IObserver<T>>.Empty;

    public void OnNext(T value)
    {
        foreach (var observer in Volatile.Read(ref _observers))
            observer.OnNext(value);
    }

    public void OnError(Exception error)
    {
        foreach (var observer in Volatile.Read(ref _observers))
            observer.OnError(error);
    }

    public void OnCompleted()
    {
        // The notification is forwarded, but no completion state is stored.
        foreach (var observer in Volatile.Read(ref _observers))
            observer.OnCompleted();
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        ImmutableInterlocked.Update(ref _observers, x => x.Add(observer));
        return Disposable.Create(() =>
        {
            ImmutableInterlocked.Update(ref _observers, x => x.Remove(observer));
        });
    }
}
Now the Publish().RefCount() can be fixed by replacing it with this:
.Multicast(new StatelessSubject<SomeType>()).RefCount()
This change results in the desirable behavior. The published sequence is initially cold, becomes hot when it is subscribed to for the first time, and becomes cold again when its last subscriber unsubscribes. And the cycle continues with no memories of past events.
Regarding the other normal case, where the source sequence completes: the completion is propagated to all subscribers, causing all of them to unsubscribe automatically, which makes the published sequence cold again. The end result is that both sequences, the source and the published one, are always in sync. They are either both hot or both cold.
Here is a StatelessPublish operator, to make the consumption of the class a little easier.
/// <summary>
/// Returns a connectable observable sequence that shares a single subscription to
/// the underlying sequence, without maintaining its state.
/// </summary>
public static IConnectableObservable<TSource> StatelessPublish<TSource>(
    this IObservable<TSource> source)
{
    return source.Multicast(new StatelessSubject<TSource>());
}
Usage example:
.StatelessPublish().RefCount()
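As a fuller usage sketch, re-running the demo from the question with the stateless variant should, per the behavior described above, run the producer twice and deliver one value to each consumer:

// The question's demo, with StatelessPublish in place of Publish.
var observable = Observable
    .Create<int>(o => { o.OnNext(13); o.OnCompleted(); return Disposable.Empty; })
    .Do(x => Console.WriteLine($"Producer generated: {x}"))
    .StatelessPublish()
    .RefCount()
    .Do(x => Console.WriteLine($"Consumer received #{x}"));

observable.Subscribe().Dispose();
observable.Subscribe().Dispose();
// Expected: "Producer generated: 13" and "Consumer received #13" are printed twice.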

Related

Observable timers disposing

I'm using the Reactive .NET extensions and I wonder about its disposal. I know in some cases it's good to dispose of it like that: .TakeUntil(Observable.Timer(TimeSpan.FromMinutes(x))).
First case
In this case, I have a timer that triggers after x seconds and then it completes and should be disposed.
public void ScheduleOrderCancellationIfNotFilled(string pair, long orderId, int waitSecondsBeforeCancel)
{
    Observable.Timer(TimeSpan.FromSeconds(waitSecondsBeforeCancel))
        .Do(e =>
        {
            var result = _client.Spot.Order.GetOrder(pair, orderId);
            if (result.Success)
            {
                if (result.Data?.Status != OrderStatus.Filled)
                {
                    _client.Spot.Order.CancelOrder(pair, orderId);
                }
            }
        })
        .Subscribe();
}
Second case
In this case, the timer triggers after the first second and then repeats every 29 minutes. This should live until its defining class is disposed. I believe this one should be disposed of with an IDisposable implementation. How?
var keepAliveListenKey = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(29))
    .Do(async e =>
    {
        await KeepAliveListenKeyAsync().ConfigureAwait(false);
    })
    .Subscribe();
Edit
I also want it to use a Subject<T>, which makes it easier to dispose and to reset the subscription.
For ex. Reset and Dispose observable subscriber, Reactive Extensions (#Enigmativity)
public class UploadDicomSet : ImportBaseSet
{
    IDisposable subscription;
    Subject<IObservable<long>> subject = new Subject<IObservable<long>>();

    public UploadDicomSet()
    {
        subscription = subject.Switch().Subscribe(s => CheckUploadSetList(s));
        subject.OnNext(Observable.Interval(TimeSpan.FromMinutes(2)));
    }

    void CheckUploadSetList(long interval)
    {
        subject.OnNext(Observable.Never<long>());
        // Do other things
    }

    public void AddDicomFile(SharedLib.DicomFile dicomFile)
    {
        subject.OnNext(Observable.Interval(TimeSpan.FromMinutes(2)));
        // Reset the subscription to go off in 2 minutes from now
        // Do other things
    }
}
In the first case it is going to be disposed automatically. It is, actually, a common way to achieve automatic subscription management, and that's definitely a nice and elegant way to deal with Rx.
In the second case you have over-engineered it. Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)) is itself sufficient to generate a sequence of ascending longs over time. Since this stream is endless by its nature, you're right - explicit subscription management is required. So it is enough to have:
var sub = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)).Subscribe()
...and sub.Dispose() it later.
P.S. Note that in your code you .Do async/await. Most probably that is not what you want. You want SelectMany to ensure that the async operation is properly awaited and exceptions are handled.
Answering your questions in the comments section:
What about disposing using Subject instead?
Well, nothing so special about it. Both IObserver<> and IObservable<> are implemented by this class, so that it resembles classical .NET events (a list of callbacks to be called upon some event). It does not differ in any sense with respect to your question and use case.
May you give an example about the .Do with exception handling?
Sure. The idea is that you want to translate your async/await work, encapsulated in some Task<T>, to an IObservable<T> such that it preserves both cancellation and error signals. For that, the .SelectMany method must be used (like SelectMany from LINQ, the same idea). So just change your .Do to .SelectMany.
Observable
.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
.SelectMany(_ => Observable.FromAsync(() => /* that's the point where your Task<> becomes Observable */ myTask))
I'm confused again. Do I need IObservable<IObservable<T>> (Select) or IObservable<T> (SelectMany)?
Most probably, you don't need Switch. Why? Because it was created mainly to avoid IO race conditions, such that whenever a new event is emitted, the current one (which might be in progress due to natural parallelism or an asynchronous workflow) is guaranteed to be cancelled (i.e. unsubscribed). Otherwise race conditions can (and will) damage your state.
SelectMany, on the contrary, will make sure all of them happen sequentially, in the order they actually arrived. Nothing will be cancelled. You will finish (await, if you wish) the current callback and then trigger the next one. Of course, such behavior can be altered by means of an appropriate IScheduler, but that is another story.
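A short sketch of the difference, where SomeWorkAsync is a placeholder for your own async method:

// Switch: a newly emitted inner observable cancels the one still in flight.
var switched = Observable
    .Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
    .Select(_ => Observable.FromAsync(ct => SomeWorkAsync(ct)))
    .Switch();

// SelectMany: every inner observable runs; nothing is cancelled.
var merged = Observable
    .Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
    .SelectMany(_ => Observable.FromAsync(ct => SomeWorkAsync(ct)));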
Reactive Observable Subscription Disposal (#Enigmativity)
The disposable returned by the Subscribe extension methods is returned solely to allow you to manually unsubscribe from the observable before the observable naturally ends.
If the observable completes - with either OnCompleted or OnError - then the subscription is already disposed for you.
One important thing to note: the garbage collector never calls .Dispose() on observable subscriptions, so you must dispose of your subscriptions if they have not (or may not have) naturally ended before your subscription goes out of scope.
First case
Looks like I don't need to manually .Dispose() the subscription in the first case scenario because it ends naturally.
Dispose is being triggered at the end.
var xs = Observable.Create<long>(o =>
{
    var d = Observable.Timer(TimeSpan.FromSeconds(5))
        .Do(e =>
        {
            Console.WriteLine("5 seconds elapsed.");
        })
        .Subscribe(o);
    return Disposable.Create(() =>
    {
        Console.WriteLine("Disposed!");
        d.Dispose();
    });
});

var subscription = xs.Subscribe(x => Console.WriteLine(x));
Second case
but in the second case, where it doesn't end "naturally", I should dispose it.
Dispose is not triggered unless manually disposed.
var xs = Observable.Create<long>(o =>
{
    var d = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
        .Do(e =>
        {
            Console.WriteLine("Test.");
        })
        .Subscribe(o);
    return Disposable.Create(() =>
    {
        Console.WriteLine("Disposed!");
        d.Dispose();
    });
});

var subscription = xs.Subscribe(x => Console.WriteLine(x));
Conclusion
He gave such nice examples that it's worth seeing them if you are asking yourself the same question.

TestScheduler-created ColdObserver not unsubscribing when OnCompleted?

I wanted to check that an IObservable I had created was respecting the courtesy of "Once I've completed, I'll unsubscribe you". At first blush it looked like something was wrong with my code. But after eliminating my code and just using the Observable and Observer provided by TestScheduler, it looks like the 'unsubscription' never happens:
using Microsoft.Reactive.Testing;
using System.Reactive;
...
var ts = new TestScheduler();
var ob = ts.CreateObserver<int>();
var xs = ts.CreateColdObservable<int>(
new Recorded<Notification<int>>(1, Notification.CreateOnCompleted<int>())
);
xs.Subscribe(ob);
ts.AdvanceTo(2);
Assert.Equal(1, xs.Subscriptions.Single().Unsubscribe); //<-- Xunit no like
I originally suspected the observer, but I tried that on a variant of the code found here, and it works, so now I'm thinking that the implementation of Subscribe on the ColdObservable isn't behaving properly.
No such courtesy exists. The Rx Design Guidelines in section 4.3 suggest you can:
Assume resources are cleaned up after an OnError or OnCompleted message.
And section 4.4 says you can:
Assume a best effort to stop all outstanding work on Unsubscribe
These guidelines ("courtesies") talk about an operator releasing its own resources, plus those of any it has acquired, as soon as possible.
In your code, you aren't testing for either of these scenarios. The purpose of the Unsubscribe property on an ITestableObservable is to report when a subscription taken out by an observer was explicitly disposed, not when internal cleanup happened - but you are not storing this handle to be able to dispose it:
xs.Subscribe(ob); /* return of handle ignored here */
So you are trying to assert that you disposed the subscription you threw away, not that the observable you subscribed to cleaned up any subscription and resources it may have taken out.
If you want to see the effect of the timely resource clean up of 4.3/4.4, write an extension method like this:
public static IObservable<T> SpyResourceCleanUp<T>(
    this IObservable<T> source, IScheduler scheduler)
{
    return Observable.Create<T>(obs =>
    {
        var subscription = source.Subscribe(obs);
        return new CompositeDisposable(
            subscription,
            Disposable.Create(() => Console.WriteLine(
                "Clean up performed at " + scheduler.Now.Ticks)));
    });
}
And replace your line:
xs.Subscribe(ob);
with
xs.SpyResourceCleanUp(ts).Subscribe(ob);
(Editing in some of the comments)
On your test I see immediate resource clean-up, as I would expect. And with this change your test will now pass, because SpyResourceCleanUp is unsubscribing from its parent (xs) as soon as it OnCompletes() itself, in adherence to section 4.3 of the guidelines.
What might not be obvious here is that Observable.Create handles calling the Dispose() method of the returned IDisposable as soon as either the subscription is disposed or OnCompleted() or OnError() has been called on the observer. This is how Create helps you implement section 4.3, and why the test passes with the altered code.
Under the covers, subscriptions to the AnonymousObservable<T> : ObservableBase<T> returned by Create are wrapped by an AutoDetachObserver as you can see here.
i.e. The Disposable you return from Observable.Create isn't the one the caller gets - they get a wrapped version that will call your Dispose() either on stream termination or cancellation.
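A tiny sketch of that auto-detach behaviour:

// The disposable returned from Create is disposed automatically once the
// observer receives OnCompleted -- no explicit Dispose() call is needed.
var xs = Observable.Create<int>(o =>
{
    o.OnNext(1);
    o.OnCompleted();
    return Disposable.Create(() => Console.WriteLine("Cleaned up"));
});

xs.Subscribe(x => Console.WriteLine(x));
// Prints: 1, then "Cleaned up".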

Reactive Framework as Message queue using BlockingCollection

I've been doing some work lately with the Reactive Framework and have been absolutely loving it so far. I'm looking at replacing a traditional polling message queue with some filtered IObservables to clean up my server operations. In the old way, I dealt with messages coming into the server like so:
// Start spinning the process message loop
Task.Factory.StartNew(() =>
{
    while (true)
    {
        Command command = m_CommandQueue.Take();
        ProcessMessage(command);
    }
}, TaskCreationOptions.LongRunning);
This results in a continuously polling thread that delegates commands from clients out to the ProcessMessage method, where I have a series of if/else-if statements that determine the type of the command and delegate work based on its type.
I am replacing this with an event driven system using Reactive for which I've written the following code:
private BlockingCollection<BesiegedMessage> m_MessageQueue = new BlockingCollection<BesiegedMessage>();
private IObservable<BesiegedMessage> m_MessagePublisher;

m_MessagePublisher = m_MessageQueue
    .GetConsumingEnumerable()
    .ToObservable(TaskPoolScheduler.Default);

// All generic Server messages (containing no properties) will be processed here
IDisposable genericServerMessageSubscriber = m_MessagePublisher
    .Where(message => message is GenericServerMessage)
    .Subscribe(message =>
    {
        // do something with the generic server message here
    });
My question is: while this works, is it good practice to use a BlockingCollection as the backing for an IObservable like this? I don't see where Take() is ever called this way, which makes me think that the messages will pile up on the queue without being removed after they have been processed?
Would it be more efficient to look into Subjects as the backing collection to drive the filtered IObservables that will be picking up these messages? Is there anything else I'm missing here that might benefit the architecture of this system?
Here is a complete worked example, tested under Visual Studio 2012.
Create a new C# console app.
Right click on your project, select "Manage NuGet Packages", and add "Reactive Extensions - Main Library".
Add this C# code:
using System;
using System.Collections.Concurrent;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

namespace DemoRX
{
    class Program
    {
        static void Main(string[] args)
        {
            BlockingCollection<string> myQueue = new BlockingCollection<string>();
            {
                IObservable<string> ob = myQueue.
                    GetConsumingEnumerable().
                    ToObservable(TaskPoolScheduler.Default);
                ob.Subscribe(p =>
                {
                    // This handler will get called whenever
                    // anything appears on myQueue in the future.
                    Console.Write("Consuming: {0}\n", p);
                });
            }
            // Now, adding items to myQueue will trigger the item to be consumed
            // in the predefined handler.
            myQueue.Add("a");
            myQueue.Add("b");
            myQueue.Add("c");
            Console.Write("[any key to exit]\n");
            Console.ReadKey();
        }
    }
}
You will see this on the console:
[any key to exit]
Consuming: a
Consuming: b
Consuming: c
The really nice thing about using Rx is that you can use the full power of LINQ to filter out any unwanted messages. For example, add a .Where clause to filter by "a", and observe what happens:
ob.Where(o => (o == "a")).Subscribe(p =>
{
    // This will get called whenever something appears on myQueue.
    Console.Write("Consuming: {0}\n", p);
});
Philosophical notes
The advantage of this method over starting up a dedicated thread to poll the queue, is that you don't have to worry about disposing of the thread properly once the program has exited. This means you don't have to bother with IDisposable or CancellationToken (which is always required when dealing with a BlockingCollection or else your program might hang on exit with a thread that refuses to die).
Believe me, it's not as easy as you think to write completely robust code to consume events coming out of a BlockingCollection. I much prefer using the Rx method, as shown above, as it's cleaner, more robust, has less code, and you can filter using LINQ.
Latency
I was surprised at how fast this method is.
On my Xeon X5650 @ 2.67GHz, it takes 5 seconds to process 10 million events, which works out at approximately 0.5 microseconds per event. It took 4.5 seconds to put the items into the BlockingCollection, so Rx was taking them out and processing them almost as fast as they were going in.
Threading
In all of my tests, RX only spun up one thread to handle the tasks on the queue.
This means that we have a very nice pattern: we can use RX to collect incoming data from multiple threads, place them into a shared queue, then process the queue contents on a single thread (which is, by definition, thread safe).
This pattern eliminates a huge amount of headaches when dealing with multithreaded code, by decoupling the producer and consumer of data via a queue, where the producer could be multi-threaded and the consumer is single-threaded and thus thread-safe. This is the concept that makes Erlang so robust. For more information on this pattern, see Multi-threading made ridiculously simple.
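A minimal sketch of that shape (names are illustrative):

// Multiple producer threads feed a shared queue; Rx drains it on a single thread.
var queue = new BlockingCollection<string>();

for (int i = 0; i < 3; i++)
{
    int id = i;
    Task.Run(() => queue.Add($"message from producer {id}"));
}

queue.GetConsumingEnumerable()
    .ToObservable(TaskPoolScheduler.Default)
    .Subscribe(m => Console.WriteLine($"Consuming: {m}"));   // single-threaded consumer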
Here's something pulled directly from my posterior - any real solution would be very much dependent on your actual usage, but here's "The cheapest pseudo Message Queue system ever":
Thoughts/motivations:
Deliberate exposure of IObservable<T> such that subscribers can do any filtering/cross subscriptions they want to
The overall Queue is typeless, but Register and Publish are type-safe(ish)
YMMV with the Publish() where it is - try experimenting with moving it around
Generally Subject is a no-no, although in this case it does make for some SIMPLE code.
One could "internalize" the registration to actually do the subscription as well, but then the queue would need to manage the IDisposables created - bah, let your consumers deal with it!
The Code:
public class TheCheapestPubSubEver
{
    private Subject<object> _inner = new Subject<object>();

    public IObservable<T> Register<T>()
    {
        return _inner.OfType<T>().Publish().RefCount();
    }

    public void Publish<T>(T message)
    {
        _inner.OnNext(message);
    }
}
Usage:
void Main()
{
    var queue = new TheCheapestPubSubEver();

    var ofString = queue.Register<string>();
    var ofInt = queue.Register<int>();

    using (ofInt.Subscribe(i => Console.WriteLine("An int! {0}", i)))
    using (ofString.Subscribe(s => Console.WriteLine("A string! {0}", s)))
    {
        queue.Publish("Foo");
        queue.Publish(1);
        Console.ReadLine();
    }
}
Output:
A string! Foo
An int! 1
HOWEVER, this doesn't strictly enforce "consuming consumers" - multiple Registers of a specific type would result in multiple observer calls - that is:
var queue = new TheCheapestPubSubEver();

var ofString = queue.Register<string>();
var anotherOfString = queue.Register<string>();
var ofInt = queue.Register<int>();

using (ofInt.Subscribe(i => Console.WriteLine("An int! {0}", i)))
using (ofString.Subscribe(s => Console.WriteLine("A string! {0}", s)))
using (anotherOfString.Subscribe(s => Console.WriteLine("Another string! {0}", s)))
{
    queue.Publish("Foo");
    queue.Publish(1);
    Console.ReadLine();
}
Results in:
A string! Foo
Another string! Foo
An int! 1
I haven't used BlockingCollection in this context - so I'm 'conjecturing' - you should run it yourself to confirm or disprove.
BlockingCollection might only further complicate things here (or provide little help). Take a look at this post from Jon - simply to confirm. GetConsumingEnumerable will provide you with a 'per subscriber' enumerable, eventually exhausting the collection - something to keep in mind with Rx.
Also, the IEnumerable<>.ToObservable further flattens out the 'source'. As it works (you can look up the source - with Rx I'd recommend that more than anything) - each subscribe creates its own 'enumerator' - so all subscribers will be getting their own versions of the feed. I'm really not sure how that pans out in an Observable scenario like this.
Anyhow - if you want to provide app-wide messages - IMO you'd need to introduce Subject or state in some other form (e.g. Publish etc.). And in that sense, I don't think BlockingCollection will help any - but again, it's best that you try it out yourself.
Note (a philosophical one)
If you want to combine message types, or combine different sources - e.g. in a more 'real world' scenario - it gets more complex. And it gets quite interesting I must say.
Keep an eye on having them 'rooted' into a single-shared stream (and avoid what Jer suggested rightly).
I'd recommend that you don't try to evade using Subject. For what you need, that's your friend - no matter all the no-state discussions and how Subject is bad - you effectively have state (and you need 'state') - Rx kicks in 'after the fact', so you enjoy its benefits regardless.
I encourage you to go that way, as I love it how it turned out.
My issue here is that we have turned a Queue (which I normally associate with destructive reads by one consumer especially if you are using BlockingCollection) into a broadcast (send to anyone and everyone listening right now).
These seem two conflicting ideas.
I have seen this done, but it then was thrown away as it was the "right solution to the wrong question".

Do 'Intermediate IObservables' without final subscribers get kept in memory for the lifetime of the root IObservable

For example, consider this:
public IDisposable Subscribe<T>(IObserver<T> observer)
{
    return eventStream.Where(e => e is T).Cast<T>().Subscribe(observer);
}
The eventStream is a long lived source of events. A short lived client will use this method to subscribe for some period of time, and then unsubscribe by calling Dispose on the returned IDisposable.
However, while the eventStream still exists and should be kept in memory, there have been 2 new IObservables created by this method - the one returned by the Where() method, which is presumably held in memory by the eventStream, and the one returned by the Cast<T>() method, which is presumably held in memory by the one returned by the Where() method.
How will these 'intermediate IObservables' (is there a better name for them?) get cleaned up? Or will they now exist for the lifetime of the eventStream, even though they no longer have subscriptions and nothing else references them except their source IObservable, and therefore will never have subscriptions again?
If they are cleaned up by informing their parent they no longer have subscriptions, how do they know nothing else has taken a reference to them and may at some point later subscribe to them?
However, while the eventStream still exists and should be kept in memory, there have been 2 new IObservables created by this method - the one returned by the Where() method that is presumably held in memory by the eventStream, and the one returned by the Cast<T>() method that is presumably held in memory by the one returned by the Where() method.
You have this backward. Let's walk through the chain of what is going on.
IObservable<T> eventStream; //you have this defined and assigned somewhere

public IDisposable Subscribe<T>(IObserver<T> observer)
{
    //let's break this method into multiple lines
    IObservable<T> whereObs = eventStream.Where(e => e is T);
    //whereObs now has a reference to eventStream (and thus will keep it alive),
    //but eventStream knows nothing of whereObs (thus whereObs will not be kept alive by eventStream)

    IObservable<T> castObs = whereObs.Cast<T>();
    //as with whereObs, castObs has a reference to whereObs,
    //but no one has a reference to castObs

    IDisposable ret = castObs.Subscribe(observer);
    //here is where it gets tricky.

    return ret;
}
What ret does or does not have a reference to depends on the implementation of the various observables. From what I have seen in Reflector in the Rx library and the operators I have written myself, most operators do not return disposables that have a reference to the operator observable itself.
For example, a basic implementation of Where would be something like (typed directly in the editor, no error handling)
static IObservable<T> Where<T>(this IObservable<T> source, Func<T, bool> filter)
{
    return Observable.Create<T>(obs =>
    {
        return source.Subscribe(v => { if (filter(v)) obs.OnNext(v); },
                                obs.OnError, obs.OnCompleted);
    });
}
Notice that the disposable returned will have a reference to the filter function via the observer that is created, but will not have a reference to the Where observable. Cast can be easily implemented using the same pattern. In essence, the operators become observer wrapper factories.
The implication of all this to the question at hand is that the intermediate IObservables are eligible for garbage collection by the end of the method. The filter function passed to Where stays around as long as the subscription does, but once the subscription is disposed or completed, only eventStream remains (assuming it is still alive).
EDIT for supercat's comment, let's look at how the compiler might rewrite this or how you would implement this without closures.
class WhereObserver<T> : IObserver<T>
{
    public WhereObserver(IObserver<T> observer, Func<T, bool> filter)
    {
        _base = observer;
        _filter = filter;
    }

    private readonly IObserver<T> _base;
    private readonly Func<T, bool> _filter;

    public void OnNext(T value)
    {
        if (_filter(value)) _base.OnNext(value);
    }

    public void OnError(Exception ex) { _base.OnError(ex); }
    public void OnCompleted() { _base.OnCompleted(); }
}

class WhereObservable<T> : IObservable<T>
{
    public WhereObservable(IObservable<T> source, Func<T, bool> filter)
    {
        _source = source;
        _filter = filter;
    }

    private readonly IObservable<T> _source;
    private readonly Func<T, bool> _filter;

    public IDisposable Subscribe(IObserver<T> observer)
    {
        return _source.Subscribe(new WhereObserver<T>(observer, _filter));
    }
}

static IObservable<T> Where<T>(this IObservable<T> source, Func<T, bool> filter)
{
    return new WhereObservable<T>(source, filter);
}
You can see that the observer does not need any reference to the observable that generated it and the observable has no need to track the observers it creates. We didn't even make any new IDisposable to return from our subscribe.
In reality, Rx has some actual classes for anonymous observable/observer that take delegates and forward the interface calls to those delegates. It uses closures to create those delegates. The compiler does not need to emit classes that actually implement the interfaces, but the spirit of the translation remains the same.
I think I've come to a conclusion, with the help of Gideon's answer and by breaking down a sample Where method:
I assumed incorrectly that each downstream IObservable was referenced by the upstream at all times (in order to push events down when needed). But this would root downstreams in memory for the lifetime of the upstream.
In fact, each upstream IObservable is referenced by the downstream IObservable (waiting, ready to hook up an IObserver when required). This roots upstreams in memory as long as the downstream is referenced (which makes sense, as while a downstream is still referenced somewhere, a subscription may occur at any time).
However when a subscription does occur, this upstream to downstream reference chain does get formed, but only on the IDisposable implementation objects that manage the subscriptions at each observable stage, and only for the lifetime of that subscription. (which also makes sense - while a subscription exists, each upstream 'processing logic' must still be held in memory to handle the events being passed through to reach the final subscriber IObserver).
This gives a solution to both problems - while an IObservable is referenced, it will hold all source (upstream) IObservables in memory, ready for a subscription. And while a subscription exists, it will hold all downstream subscriptions in memory, allowing the final subscription to still receive events even though its source IObservable may no longer be referenced.
Applying this to the example in my question, the Where and Cast downstream observables are very short lived - referenced up until the Subscribe(observer) call completes. They are then free to be collected. The fact that the intermediate observables may now be collected does not cause a problem for the subscription just created, as it has formed its own subscription object chain (upstream -> downstream) that is rooted by the source eventStream observable. This chain will be released as soon as each downstream stage disposes its IDisposable subscription tracker.
You need to remember that IObservable<T> (like IEnumerable<T>) is a lazy list. It doesn't exist until someone tries to access the elements by subscribing or iterating.
When you write list.Where(x => x > 0) you are not creating a new list, you are merely defining what the new list will look like if someone tries to access the elements.
This is a very important distinction.
You can consider that there are two different kinds of IObservables: the definitions and the subscribed instances.
The IObservable definitions use next to no memory. References can be freely shared. They will be cleanly garbage collected.
The subscribed instances only exist if someone is subscribed. They may use considerable memory. Unless you use the .Publish extensions you can't share references. When the subscription ends or is terminated by calling .Dispose() the memory is cleaned up.
A new set of subscribed instances is created for every new subscription. When the final child subscription is disposed, the whole chain is disposed. They can't be shared. If there is a second subscription, a complete chain of subscribed instances is created, independent of the first.
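A small sketch of the distinction:

// The definition costs nothing; each subscription builds (and later tears down)
// its own independent chain.
var definition = Observable
    .Interval(TimeSpan.FromSeconds(1))
    .Where(x => x % 2 == 0);             // nothing runs yet -- just a definition

var first = definition.Subscribe(x => Console.WriteLine($"first: {x}"));    // one chain
var second = definition.Subscribe(x => Console.WriteLine($"second: {x}"));  // a separate chain

first.Dispose();    // tears down only the first chain; the second keeps running
second.Dispose();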
I hope this helps.
A class implementing IObservable is just a regular object. It will get cleaned up when the GC runs and does not see any references to it. It isn't anything other than "when does new object() get cleaned up". Except for memory use, whether they get cleaned up should not be visible to your program.
If an object subscribes to events, whether for its own use, or for the purpose of forwarding them to other objects, the publisher of those events will generally keep it alive even if nobody else will. If I'm understanding your situation correctly, you have objects which subscribe to events for the purpose of forwarding them to zero or more other subscribers. I would suggest that you should if possible design your intermediate IObservables so that they will not subscribe to an event from their parent until someone subscribes to an event from them, and they will unsubscribe from their parent's event any time their last subscriber unsubscribes. Whether or not this is practical will depend upon the threading contexts of the parent and child IObservables. Further note that (again depending upon threading context) locking may be required to deal with the case where a new subscriber joins at about the same time as (what would have been) the last subscriber quits. Even though most objects' subscription and unsubscription scenarios could be handled using CompareExchange rather than locking, that is often unworkable in scenarios involving interconnected subscription lists.
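As a sketch, that subscribe-on-first-subscriber / unsubscribe-on-last-subscriber design is essentially what the Publish/RefCount pair discussed at the top of this page provides (parent here is assumed to be the long-lived upstream observable):

var forwarding = parent
    .Publish()      // one proxy in front of the parent
    .RefCount();    // subscribes to the parent on the first subscriber, unsubscribes on the last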
If your object will receive subscriptions and unsubscriptions from its children in a threading context which is not compatible with the parent's subscription and unsubscription methods (IMHO, IObservable should have required that all legitimate implementations allow subscription and unsubscription from arbitrary threading context, but alas it does not) you may have no choice but to have the intermediate IObservable, immediately upon creation, create a proxy object to handle subscriptions on your behalf, and have that object subscribe to the parent's event. Then have your own object (to which the proxy would have only a weak reference) include a finalizer which will notify the proxy that it will need to unsubscribe when its parent's threading context permits. It would be nice to have your proxy object unsubscribe when its last subscriber quits, but if a new subscriber might join and expect its subscription to be valid immediately, one may have to keep the proxy subscribed as long as anyone holds a reference to the intermediate observer which could be used to request a new subscription.

Bounded Queue scenario

I need to implement a producer/consumer bounded queue, with multiple consumers against a single producer.
I have a Push function that adds an item to the queue and then checks for maxsize. If we have reached it, return false; in every other case return true.
In the following code _vector is a List<T>, onSignal basically consumes an item in an asynchronous way.
Do you see issues with this code?
public bool Push(T message)
{
    bool canEnqueue = true;
    lock (_vector)
    {
        _vector.Add(message);
        if (_vector.Count >= _maxSize)
        {
            canEnqueue = false;
        }
    }

    var onSignal = SignalEvent;
    if (onSignal != null)
    {
        onSignal();
    }

    return canEnqueue;
}
I know you said single-producer, multiple-consumer, but it's worth mentioning anyway: if your queue is almost full (say 24 out of 25 slots), then if two threads Push at the same time, you will end up exceeding the limit. If there's even a chance you might have multiple producers at some point in the future, you should consider making Push a blocking call, and have it wait for an "available" AutoResetEvent which is signaled after either an item is dequeued or after an item is enqueued while there are still slots available.
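As a hypothetical sketch of that blocking variant (using a SemaphoreSlim to count free slots rather than an AutoResetEvent; the names are illustrative, not the question's code):

public class BoundedBuffer<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly SemaphoreSlim _freeSlots;

    public BoundedBuffer(int maxSize)
    {
        _freeSlots = new SemaphoreSlim(maxSize);
    }

    public void Push(T message)
    {
        _freeSlots.Wait();                       // blocks while the buffer is full
        lock (_items) { _items.Add(message); }
    }

    // Consumer side, simplified: a real version would also block when empty.
    public bool TryTake(out T item)
    {
        lock (_items)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items[0];
            _items.RemoveAt(0);
        }
        _freeSlots.Release();                    // a slot became available again
        return true;
    }
}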
The only other potential issue I see is the SignalEvent. You don't show us the implementation of that. If it's declared as public event SignalEventDelegate SignalEvent, then you will be OK because the compiler automatically adds a SynchronizedAttribute. However, if SignalEvent uses a backing delegate with add/remove syntax, then you will need to provide your own locking for the event itself, otherwise it will be possible for a consumer to detach from the event just a little too late and still receive a couple of signals afterward.
Edit: Actually, that is possible regardless; more importantly, if you've used a property-style add/remove delegate without the appropriate locking, it is actually possible for the delegate to be in an invalid state when you try to execute it. Even with a synchronized event, consumers need to be prepared to receive (and discard) notifications after they've unsubscribed.
Other than that I see no issues - although that doesn't mean that there aren't any, it just means I haven't noticed any.
The biggest problem I see there is the use of List<T> to implement a queue; there are performance issues doing this, as removing the first item involves copying all the data.
Additional thoughts: you're raising the signal even if you didn't add data, and the use of events itself may have issues with threading (there are some edge cases, even when you capture the value before the null test - plus it is possibly more overhead than using the Monitor to do the signalling).
I would switch to a Queue<T> which won't have this problem - or better use a pre-rolled example; for example Creating a blocking Queue in .NET?, which does exactly what you discuss, and supports any number of both producers and consumers. It uses the blocking approach, but a "try" approach would be:
public bool TryEnqueue(T item)
{
    lock (queue)
    {
        if (queue.Count >= maxSize) { return false; }
        queue.Enqueue(item);
        if (queue.Count == 1)
        {
            // wake up any blocked dequeue
            Monitor.PulseAll(queue);
        }
        return true;
    }
}
Finally - don't you "push" to a stack, not a queue?
