I have a lot of "hit and mute" code in my project that uses Reactive Extensions, like this:
IDisposable dsp = null;
dsp = TargetObservable.Subscribe((incomingContent) =>
{
if (incomingContent == "something")
{
myList.Add(incomingContent);
dsp.Dispose();
}
});
First of all, I was concerned about thread safety, since my observable is quite busy and pushes a lot of content all the time. I was later told that I should combine this with ObserveOn(thread) to guarantee thread safety. I totally agree, so let's set thread safety aside.
Here I want to know:
How or when should I call Dispose for an observable?
What is the correct way to implement hit and mute? By combining it with a completing extension method such as Take(count) or TakeWhile(predicate)?
If OnCompleted() is called, Dispose() will be called internally, correct? Then the reference between the observer and the observable will be broken (my observable is a long-lived static instance, so a lingering reference would cause a memory leak).
I would avoid following the pattern you have here. It makes it difficult to understand the problem space if other devs have to mix global state with the inner function for the subscribe/OnNext handler.
You are much better off creating the TakeWhile/TakeUntilIncluding extension method which encapsulates the sequence termination. Then you can separate your 'adding to the list' concern.
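A minimal sketch of such an extension method, assuming the System.Reactive package is referenced. The names TakeUntilIncluding, TerminationExtensions and HitAndMuteDemo are hypothetical, and a Subject stands in for the long-lived source only to make the demo observable. Completing the sequence lets Rx dispose the upstream subscription, so the handler never has to touch its own IDisposable.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class TerminationExtensions
{
    // Hypothetical helper: passes items through up to and including the
    // first one that matches the predicate, then completes.
    public static IObservable<T> TakeUntilIncluding<T>(
        this IObservable<T> source, Func<T, bool> predicate)
    {
        return Observable.Create<T>(observer =>
            source.Subscribe(
                item =>
                {
                    observer.OnNext(item);
                    if (predicate(item))
                    {
                        // Completing ends the sequence; the observer created by
                        // Observable.Create then disposes the upstream subscription.
                        observer.OnCompleted();
                    }
                },
                observer.OnError,
                observer.OnCompleted));
    }
}

public static class HitAndMuteDemo
{
    public static (List<string> Hits, bool StillSubscribed) Run()
    {
        var myList = new List<string>();
        var source = new Subject<string>(); // stand-in for the long-lived observable

        source.TakeUntilIncluding(x => x == "something")
              .Where(x => x == "something")
              .Subscribe(x => myList.Add(x));

        source.OnNext("a");
        source.OnNext("something");
        source.OnNext("something"); // arrives after the hit, so it is not recorded

        return (myList, source.HasObservers);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(string.Join(",", result.Hits));
        Console.WriteLine(result.StillSubscribed);
    }
}
```

The termination logic lives in the operator and the "add to the list" concern lives in the subscriber, which is the separation described above.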
An alternative is the super-simple:
var subscription = source.Where(x => x=="something")
.Take(1)
.Subscribe(incomingContent=>myList.Add(incomingContent));
I'm trying to understand when is a good time to return a disposable in the function passed to Observable.Create vs just disposing any resources through scope by a using statement.
Is returning the disposable more for cases where the Observable is an infinite stream? Even so, I don't understand why the using block wouldn't still dispose the resource if the stream is closed prematurely.
I think the Disposable interface on the Observable paradigm is used solely for the purpose of getting rid of the subscription (i.e, stopping the callback on the observed events), as Theodor Zoulias pointed out. It doesn't manage any resources on the stream whatsoever. You might be confusing the use of the Disposable interface on other scenarios.
As regards disposing subscriptions:
One of the use cases I can see for returning a Disposable is when you have more than one subscription to cancel: if you keep a list of the IDisposables returned by Subscribe, you can iterate over it and call .Dispose() on each one to cancel multiple subscriptions at once.
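As a rough illustration of cancelling several subscriptions at once, the sketch below groups them in a CompositeDisposable from System.Reactive.Disposables. The class name CompositeDisposableDemo and the two Subject sources are assumptions made only for the demo.

```csharp
using System;
using System.Reactive.Disposables;
using System.Reactive.Subjects;

public static class CompositeDisposableDemo
{
    public static int Run()
    {
        var first = new Subject<int>();
        var second = new Subject<int>();
        var received = 0;

        // Collect every subscription in one container.
        var subscriptions = new CompositeDisposable
        {
            first.Subscribe(_ => received++),
            second.Subscribe(_ => received++)
        };

        first.OnNext(1);
        second.OnNext(2);

        // One call cancels all subscriptions held by the container.
        subscriptions.Dispose();

        first.OnNext(3);  // no longer observed
        second.OnNext(4); // no longer observed

        return received;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```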
You could also pass that subscription, as a disposable, to the observer of another Observable, so that it is disposed when some event occurs. Since the entire Rx paradigm is about not knowing when things will be executed, this is interesting. I worked on an application where I had to cancel a subscription if a certain event happened, and I passed the subscription (IDisposable) to the observer of that event/stream.
Something on these lines:
IDisposable subscription1 = observableOne.Subscribe(_ => { /* code omitted */ });
observableTwo.Subscribe(_ => {
subscription1?.Dispose();
subscription1 = null;
});
As Enigmaticy has pointed out, although this exemplifies my point, a better way to accomplish this would be:
observableOne.TakeUntil(observableTwo).Subscribe(_ => { /* code omitted */ });
I haven't worked with C# in a while, but these are the use cases I can see for using vs. returning a Disposable object. It gives you greater flexibility over when you cancel your subscriptions.
Thanks to everyone responding in this post; it's helped me understand a bit better. I've stumbled a lot in my understanding of Rx, and I think much of that comes down to limited documentation. It also seems that many people online don't quite understand it either, so there's a lot of misinformation to sort through.
This other answer does the trick for me
https://stackoverflow.com/a/7707768/7183974 .
What it really comes down to is for when we have non-blocking code in our Observable.Create method. So when our observable is subscribed to we instantly return a disposable that can clean up any asynchronous / concurrent processes in the event that we need to cancel a subscription early.
This is necessary for cases where maybe your observable is using other async (push-based) code.
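To make that concrete, here is a hedged sketch of an Observable.Create body that starts background work and immediately returns a disposable that cancels it. The names AsyncSourceDemo, Ticks and onCancelled are hypothetical, and the Task-based loop merely stands in for whatever asynchronous producer you actually have.

```csharp
using System;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncSourceDemo
{
    // Starts a background loop on subscription and returns at once.
    public static IObservable<int> Ticks(Action onCancelled)
    {
        return Observable.Create<int>(observer =>
        {
            var cts = new CancellationTokenSource();

            Task.Run(async () =>
            {
                var i = 0;
                try
                {
                    while (!cts.Token.IsCancellationRequested)
                    {
                        observer.OnNext(i++);
                        await Task.Delay(10, cts.Token);
                    }
                }
                catch (OperationCanceledException)
                {
                    // Expected when the subscription is disposed mid-delay.
                }
            });

            // Returned before the loop finishes: disposing stops the loop.
            return Disposable.Create(() =>
            {
                cts.Cancel();
                onCancelled();
            });
        });
    }

    public static bool Run()
    {
        var cancelled = false;
        var first = new TaskCompletionSource<int>();

        var subscription = Ticks(() => cancelled = true)
            .Subscribe(x => first.TrySetResult(x));

        first.Task.Wait();      // wait until the producer has pushed something
        subscription.Dispose(); // early cancellation
        return cancelled;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

A using block inside the Create body would not help here, because the body returns while the background work is still running.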
For iterative (pull-based) code that you simply want to be push based then you can use Observable.Create but TBH I think just using an Iterator is better and if you need it to be a push-based API then just use ToObservable.
I was trying to implement a push-based iterator so the disposable seemed redundant to me which is what confused me. I've since refactored my code to be pull-based and if I ever were to need it to be push-based again I would just use ToObservable.
I am currently getting to grips with the Reactive Extensions framework for .NET and I am working my way through the various introduction resources I've found (mainly http://www.introtorx.com).
Our application involves a number of hardware interfaces that detect network frames; these will be my IObservables. I then have a variety of components that will consume those frames, or perform some manner of transform on the data and produce a new type of frame. There will also be other components that need to display every n'th frame, for example.
I am convinced that Rx is going to be useful for our application, however I am struggling with the implementation details for the IObserver interface.
Most (if not all) of the resources I have been reading have said that I should not implement the IObservable interface myself but use one of the provided functions or classes.
From my research it appears that creating a Subject<IBaseFrame> would provide me what I need, I would have my single thread that reads data from the hardware interface and then calls the OnNext function of my Subject<IBaseFrame> instance. The different IObserver components would then receive their notifications from that Subject.
My confusion is coming from the advice given in the appendix of this tutorial, where it says:
Avoid the use of the subject types. Rx is effectively a functional programming paradigm. Using subjects means we are now managing state, which is potentially mutating. Dealing with both mutating state and asynchronous programming at the same time is very hard to get right. Furthermore, many of the operators (extension methods) have been carefully written to ensure that correct and consistent lifetime of subscriptions and sequences is maintained; when you introduce subjects, you can break this. Future releases may also see significant performance degradation if you explicitly use subjects.
My application is quite performance critical, and I am obviously going to test the performance of using the Rx patterns before it goes into production code. However, I am worried that I am doing something that is against the spirit of the Rx framework by using the Subject class, and that a future version of the framework is going to hurt performance.
Is there a better way of doing what I want? The hardware polling thread is going to be running continuously whether there are any observers or not (the HW buffer will back up otherwise), so this is a very hot sequence. I need to then pass the received frames out to multiple observers.
Any advice would be greatly appreciated.
Ok,
If we ignore my dogmatic ways and ignore "subjects are good/bad" altogether, let us look at the problem space.
I bet you have one of two styles of system you need to integrate with:
1. The system raises an event or a callback when a message arrives.
2. You need to poll the system to see if there are any messages to process.
For option 1, easy, we just wrap it with the appropriate FromEvent method and we are done. To the Pub!
For option 2, we now need to consider how we poll this and how to do this efficiently. Also, when we get the value, how do we publish it?
I would imagine that you would want a dedicated thread for polling. You wouldn't want some other coder hammering the ThreadPool/TaskPool and leaving you in a ThreadPool starvation situation. Alternatively you don't want the hassle of context switching (I guess). So assume we have our own thread; we will probably have some sort of While/Sleep loop that we sit in to poll. When the check finds some messages, we publish them. Well, all of this sounds perfect for Observable.Create. Now, we probably can't use a While loop, as that won't allow us to ever return a Disposable to allow cancellation. Luckily you have read the whole book, so you are savvy with recursive scheduling!
I imagine something like this could work. #NotTested
public class MessageListener
{
private readonly IObservable<IMessage> _messages;
private readonly IScheduler _scheduler;
public MessageListener()
{
_scheduler = new EventLoopScheduler();
var messages = ListenToMessages()
.SubscribeOn(_scheduler)
.Publish();
_messages = messages;
messages.Connect();
}
public IObservable<IMessage> Messages
{
get {return _messages;}
}
private IObservable<IMessage> ListenToMessages()
{
return Observable.Create<IMessage>(o=>
{
return _scheduler.Schedule(recurse=>
{
try
{
var messages = GetMessages();
foreach (var msg in messages)
{
o.OnNext(msg);
}
recurse();
}
catch (Exception ex)
{
o.OnError(ex);
}
});
});
}
private IEnumerable<IMessage> GetMessages()
{
//Do some work here that gets messages from a queue,
// file system, database or other system that cant push
// new data at us.
//
//This may return an empty result when no new data is found.
yield break; // placeholder so the sketch compiles; replace with the real polling code
}
}
The reason I really don't like Subjects is that they are usually a case of the developer not really having a clear design for the problem. Hack in a subject, poke it here, there and everywhere, and then let the poor support dev guess at WTF was going on. When you use the Create/Generate etc. methods you are localizing the effects on the sequence. You can see it all in one method and you know no one else is throwing in a nasty side effect. If I see a subject field, I now have to go looking for all the places in a class where it is being used. If some MFer exposes one publicly, then all bets are off; who knows how this sequence is being used!
Async/Concurrency/Rx is hard. You don't need to make it harder by allowing side effects and causality programming to spin your head even more.
In general you should avoid using Subject, however for the thing you are doing here I think they work quite well. I asked a similar question when I came across the "avoid subjects" message in Rx tutorials.
To quote Dave Sexton (of Rxx)
"Subjects are the stateful components of Rx. They are useful for when
you need to create an event-like observable as a field or a local
variable."
I tend to use them as the entry point into Rx. So if I have some code that needs to say 'something happened' (like you have), I would use a Subject and call OnNext. Then expose that as an IObservable for others to subscribe to (you can use AsObservable() on your subject to make sure nobody can cast to a Subject and mess things up).
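A small sketch of that entry-point pattern, assuming System.Reactive is referenced. FrameSource and the Frame class are hypothetical, and the single Id property on IBaseFrame is an assumed shape, since the question does not show the real interface.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

// Assumed shape; the real IBaseFrame is not shown in the question.
public interface IBaseFrame
{
    int Id { get; }
}

public sealed class Frame : IBaseFrame
{
    public Frame(int id) { Id = id; }
    public int Id { get; }
}

public sealed class FrameSource
{
    // The subject stays private: only this class can push frames.
    private readonly Subject<IBaseFrame> _frames = new Subject<IBaseFrame>();

    // AsObservable hides the subject, so callers cannot cast back and call OnNext.
    public IObservable<IBaseFrame> Frames => _frames.AsObservable();

    // Intended to be called only from the single hardware polling thread.
    public void Publish(IBaseFrame frame) => _frames.OnNext(frame);
}

public static class FrameSourceDemo
{
    public static int Run()
    {
        var source = new FrameSource();
        var lastId = -1;

        using (source.Frames.Subscribe(f => lastId = f.Id))
        {
            source.Publish(new Frame(7));
        }

        return lastId;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```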
You could also achieve this with a .NET event and use FromEventPattern, but if I'm only going to turn the event into an IObservable anyway, I don't see the benefit of having an event instead of a Subject (which might mean I'm missing something here).
However, what you should avoid quite strongly is subscribing to an IObservable with a Subject, i.e. don't pass a Subject into the IObservable.Subscribe method.
Often when you're managing a Subject, you're actually just reimplementing features already in Rx, and probably in not as robust, simple and extensible a way.
When you're trying to adapt some asynchronous data flow into Rx (or create an asynchronous data flow from one that's not currently asynchronous), the most common cases are usually:
The source of data is an event: As Lee says, this is the simplest case: use FromEvent and head to the pub.
The source of data is a synchronous operation and you want polled updates (e.g. a web service or database call): In this case you could use Lee's suggested approach, or for simple cases, you could use something like Observable.Interval(period).Select(_ => <db fetch>). You may want to use DistinctUntilChanged() to prevent publishing updates when nothing has changed in the source data.
The source of data is some kind of asynchronous API that calls your callback: In this case, use Observable.Create to hook up your callback to call OnNext/OnError/OnCompleted on the observer.
The source of data is a call that blocks until new data is available (e.g. some synchronous socket read operations): In this case, you can use Observable.Create to wrap the imperative code that reads from the socket and publishes to Observer.OnNext when data is read. This may be similar to what you're doing with the Subject.
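For the polled case, a minimal sketch might look like the following. The Poll helper and PollingDemo class are hypothetical, and the in-memory array is an assumption standing in for the web service or database call.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

public static class PollingDemo
{
    // Calls fetch on every tick and only publishes when the value changes.
    public static IObservable<T> Poll<T>(Func<T> fetch, TimeSpan period)
    {
        return Observable.Interval(period)
                         .Select(_ => fetch())
                         .DistinctUntilChanged();
    }

    public static IList<int> Run()
    {
        // Stand-in for a data source whose value changes only occasionally.
        var readings = new[] { 1, 1, 2, 2, 3 };
        var index = 0;
        Func<int> fetch = () => readings[Math.Min(index++, readings.Length - 1)];

        // Take(3) ends the polling after three distinct values;
        // Wait blocks until the sequence completes.
        return Poll(fetch, TimeSpan.FromMilliseconds(10))
            .Take(3)
            .ToList()
            .Wait();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run()));
    }
}
```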
Using Observable.Create vs creating a class that manages a Subject is fairly equivalent to using the yield keyword vs creating a whole class that implements IEnumerator. Of course, you can write an IEnumerator to be as clean and as good a citizen as the yield code, but which one is better encapsulated and feels a neater design? The same is true for Observable.Create vs managing Subjects.
Observable.Create gives you a clean pattern for lazy setup and clean teardown. How do you achieve this with a class wrapping a Subject? You need some kind of Start method... how do you know when to call it? Or do you just always start it, even when no one is listening? And when you're done, how do you get it to stop reading from the socket/polling the database, etc? You have to have some kind of Stop method, and you have to still have access not just to the IObservable you're subscribed to, but the class that created the Subject in the first place.
With Observable.Create, it's all wrapped up in one place. The body of Observable.Create is not run until someone subscribes, so if no one subscribes, you never use your resource. And the body of Observable.Create returns a Disposable that can cleanly shut down your resource/callbacks, etc. - this is called when the Observer unsubscribes. The lifetimes of the resources you're using to generate the Observable are neatly tied to the lifetime of the Observable itself.
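The lazy setup and teardown described here can be seen in a tiny sketch. LazyResourceDemo is a hypothetical name, and the log list is an assumption that stands in for a real resource such as a socket or database connection.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Disposables;
using System.Reactive.Linq;

public static class LazyResourceDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();

        var source = Observable.Create<int>(observer =>
        {
            log.Add("setup");        // runs only when someone subscribes
            observer.OnNext(42);
            return Disposable.Create(() => log.Add("teardown"));
        });

        log.Add("created");          // nothing has been set up yet

        var subscription = source.Subscribe(x => log.Add("value " + x));
        subscription.Dispose();      // triggers the teardown

        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

No Start or Stop method is needed: subscribing is the start and disposing the subscription is the stop.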
The quoted block text pretty much explains why you shouldn't be using Subject<T>, but to put it simpler, you are combining the functions of observer and observable, while injecting some sort of state in between (whether you're encapsulating or extending).
This is where you run into trouble; these responsibilities should be separate and distinct from each other.
That said, in your specific case, I'd recommend that you break your concerns into smaller parts.
First, you have your thread that is hot, and always monitoring the hardware for signals to raise notifications for. How would you do this normally? Events. So let's start with that.
Let's define the EventArgs that your event will fire.
// The event args that has the information.
public class BaseFrameEventArgs : EventArgs
{
public BaseFrameEventArgs(IBaseFrame baseFrame)
{
// Validate parameters.
if (baseFrame == null) throw new ArgumentNullException(nameof(baseFrame));
// Set values.
BaseFrame = baseFrame;
}
// Poor man's immutability.
public IBaseFrame BaseFrame { get; private set; }
}
Now, the class that will fire the event. Note, this could be a static class (since you always have a thread running monitoring the hardware buffer), or something you call on-demand which subscribes to that. You'll have to modify this as appropriate.
public class BaseFrameMonitor
{
// You want to make this access thread safe
public event EventHandler<BaseFrameEventArgs> HardwareEvent;
public BaseFrameMonitor()
{
// Create/subscribe to your thread that
// drains hardware signals.
}
}
So now you have a class that exposes an event. Observables work well with events. So much so that there's first-class support for converting streams of events (think of an event stream as multiple firings of an event) into IObservable<T> implementations if you follow the standard event pattern, through the static FromEventPattern method on the Observable class.
With the source of your events, and the FromEventPattern method, we can create an IObservable<EventPattern<BaseFrameEventArgs>> easily (the EventPattern<TEventArgs> class embodies what you'd see in a .NET event, notably, an instance derived from EventArgs and an object representing the sender), like so:
// The event source.
// Or you might not need this if your class is static and exposes
// the event as a static event.
var source = new BaseFrameMonitor();
// Create the observable. It's going to be hot
// as the events are hot.
IObservable<EventPattern<BaseFrameEventArgs>> observable = Observable.
FromEventPattern<BaseFrameEventArgs>(
h => source.HardwareEvent += h,
h => source.HardwareEvent -= h);
Of course, you want an IObservable<IBaseFrame>, but that's easy: use the Select extension method on the Observable class to create a projection (just like you would in LINQ). We can wrap all of this up in an easy-to-use method:
public IObservable<IBaseFrame> CreateHardwareObservable()
{
// The event source.
// Or you might not need this if your class is static and exposes
// the event as a static event.
var source = new BaseFrameMonitor();
// Create the observable. It's going to be hot
// as the events are hot.
IObservable<EventPattern<BaseFrameEventArgs>> observable = Observable.
FromEventPattern<BaseFrameEventArgs>(
h => source.HardwareEvent += h,
h => source.HardwareEvent -= h);
// Return the observable, but projected.
return observable.Select(i => i.EventArgs.BaseFrame);
}
It is wrong to generalize that Subjects should never be used in a public interface.
While it is certainly true that this is not how a reactive programming approach should look, it is definitely a good improvement/refactoring option for your classic code.
If you have a normal property with a public set accessor and you want to notify about changes, there is nothing wrong with replacing it with a BehaviorSubject.
INPC or additional events are just not that clean, and personally they wear me out.
For this purpose you can and should use BehaviorSubjects as public properties instead of normal properties, and ditch INPC or other events.
Additionally, the Subject interface makes the users of your class more aware of what your properties can do, so they are more likely to subscribe instead of just getting the value.
It is the best choice if you want others to listen/subscribe to changes of a property.
Recently I have come across an increasing number of people who have code similar to the following:
private AsynchronousReader r;
public SynchronousReader()
{
r = new AsynchronousReader();
// My practice is to put this here
// and then never remove it and never add it again
// thus cleaning up the code and preventing constant add/remove.
//r.ReadCompleted += this.ReadCompletedCallback;
}
private void ReadCompletedCallback()
{
// Remove the callback to "clean things up"...
r.ReadCompleted -= this.ReadCompletedCallback;
// Do other things
}
public void Read()
{
r.ReadCompleted += this.ReadCompletedCallback;
// This call completes asynchronously and later invokes the above event
r.ReadAsync();
r.WaitForCompletion();
}
Folks say that this practice is better than the one I indicated above and have given several reasons specific to Silverlight. They state it prevents memory leaks, threading issues, and even that it is the normal practice.
I have not done much Silverlight, but it seems silly to do this still.
Are there any specific reasons one would use this method instead of just rigging up the callback in the constructor once and for the lifetime of the object?
This is as simple as I could make my example. Ignore the fact that it's a sort of wrapper that turns an asynchronous object into a synchronous one. I'm only curious about the way events are added and removed.
In the case you mention it would make sense to hook it up once, but potentially the objects (parent and/or child) may not get garbage collected as the event handlers still reference them.
According to Marc Gavel here
i.e. if we have:
publisher.SomeEvent += target.SomeHandler;
then "publisher" will keep "target" alive, but "target" will not keep
"publisher" alive.
A more important point to bear in mind might be the lifespan of the child object. If it is the same as the parent, then one-off subscription in the constructor makes more sense. If it is dynamic you will likely want to remove the handlers as I have seen them leak (resulting in multiple callbacks).
Note: If the constructor-only method turns out to leak objects, you can always put an unsubscribe in the Dispose() I guess, but I can't say I have ever seen that.
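A sketch of that last option, subscribing once in the constructor and unsubscribing in Dispose(), using only the base class library. Publisher, Target and HandlerCount are hypothetical names introduced for the demo.

```csharp
using System;

public sealed class Publisher
{
    public event EventHandler SomeEvent;

    // Number of handlers currently attached (for demonstration only).
    public int HandlerCount =>
        SomeEvent == null ? 0 : SomeEvent.GetInvocationList().Length;

    public void Raise()
    {
        SomeEvent?.Invoke(this, EventArgs.Empty);
    }
}

public sealed class Target : IDisposable
{
    private readonly Publisher _publisher;

    public Target(Publisher publisher)
    {
        _publisher = publisher;
        // From here on the publisher holds a reference to this target.
        _publisher.SomeEvent += SomeHandler;
    }

    public int Calls { get; private set; }

    private void SomeHandler(object sender, EventArgs e)
    {
        Calls++;
    }

    // Unsubscribing removes the publisher's reference to this target.
    public void Dispose()
    {
        _publisher.SomeEvent -= SomeHandler;
    }
}

public static class EventLifetimeDemo
{
    public static void Main()
    {
        var publisher = new Publisher();
        var target = new Target(publisher);
        publisher.Raise();
        target.Dispose();
        publisher.Raise(); // no handler attached any more
        Console.WriteLine(target.Calls);
    }
}
```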
It sounds like you have two issues:
You're attempting to reuse an object that really should only be used once.
That object needs to get properly cleaned up.
You should really either use an instance of the SynchronousReader object only once (thus avoiding the two async calls racing with one failing to finish, like you mentioned elsewhere) or implement IDisposable in order to unsubscribe from the event and prevent the memory leak.
A third solution might be possible: keep the single instance of SynchronousReader, but each call to SynchronousReader.Read would create a new instance of AsynchronousReader (rather than storing it as a private field within the instance). Then you could keep most of the code above which you don't like, but which properly handles event subscriptions.
I have a C# app that needs to do a hot swap of a data input stream to a new handler class without breaking the data stream.
To do this, I have to perform multiple steps in a single thread without any other threads (most of all the data receiving thread) running in between them due to CPU switching.
This is a simplified version of the situation but it should illustrate the problem.
void SwapInputHandler(Foo oldHandler, Foo newHandler)
{
UnhookProtocol(oldHandler);
HookProtocol(newHandler);
}
These two lines (unhook and hook) must execute in the same CPU time slice to prevent any packets from getting through in case another thread executes in between them.
How can I make sure that these two commands run sequentially using C# threading methods?
edit
There seems to be some confusion so I will try to be more specific. I didn't mean concurrently as in executing at the same time, just in the same cpu time slice so that no thread executes before these two complete. A lock is not what I'm looking for because that will only prevent THIS CODE from being executed again before the two commands run. I need to prevent ANY THREAD from running before these commands are done. Also, again I say this is a simplified version of my problem so don't try to solve my example, please answer the question.
Performing the operation in a single time slice will not help at all - the operation could just execute on another core or processor in parallel and access the stream while you perform the swap. You will have to use locking to prevent everybody from accessing the stream while it is in an inconsistent state.
Your data receiving thread needs to lock around accessing the handler pointer and you need to lock around changing the handler pointer.
Alternatively if your handler is a single variable you could use Interlocked.Exchange() to swap the value atomically.
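A minimal sketch of the Interlocked.Exchange option, assuming the handler can be reduced to a single reference. IPacketHandler, HandlerSlot and RecordingHandler are hypothetical names and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical handler interface for the sketch.
public interface IPacketHandler
{
    void Handle(string packet);
}

public sealed class RecordingHandler : IPacketHandler
{
    public List<string> Seen { get; } = new List<string>();
    public void Handle(string packet) { Seen.Add(packet); }
}

public sealed class HandlerSlot
{
    private IPacketHandler _current;

    public HandlerSlot(IPacketHandler initial)
    {
        _current = initial;
    }

    // Called by the receiving thread: reads whichever handler is current.
    public void Dispatch(string packet)
    {
        Volatile.Read(ref _current).Handle(packet);
    }

    // Atomically installs the new handler and returns the previous one.
    public IPacketHandler Swap(IPacketHandler next)
    {
        return Interlocked.Exchange(ref _current, next);
    }
}

public static class SwapDemo
{
    public static void Main()
    {
        var oldHandler = new RecordingHandler();
        var newHandler = new RecordingHandler();
        var slot = new HandlerSlot(oldHandler);

        slot.Dispatch("p1");
        slot.Swap(newHandler);
        slot.Dispatch("p2");

        Console.WriteLine(oldHandler.Seen.Count + " " + newHandler.Seen.Count);
    }
}
```

Every packet goes to exactly one handler, but a call already in flight on the old handler may still finish after the swap; if that matters, use a lock instead.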
Why not go at this from another direction, and let the thread in question handle the swap. Presumably, something wakes up when there's data to be handled, and passes it off to the current Foo. Could you post a notification to that thread that it needs to swap in a new handler the next time it wakes up? That would be much less fraught, I'd think.
Okay - to answer your specific question.
You can enumerate through all the threads in your process and call Thread.Suspend() on each one (except the active one), make the change and then call Thread.Resume().
Assuming your handlers are thread safe, my recommendation is to write a public wrapper over your handlers that does all the locking it needs using a private lock so you can safely change the handlers behind the scenes.
If you do this you can also use a ReaderWriterLockSlim, for accessing the wrapped handlers which allows concurrent read access.
Or you could architect your wrapper class and handler classes in such a way that no locking is required and the handler swapping can be done using a simple interlocked write or compare exchange.
Here's an example:
public interface IHandler
{
void Foo();
void Bar();
}
public class ThreadSafeHandler : IHandler
{
ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
IHandler wrappedHandler;
public ThreadSafeHandler(IHandler handler)
{
wrappedHandler = handler;
}
public void Foo()
{
try
{
rwLock.EnterReadLock();
wrappedHandler.Foo();
}
finally
{
rwLock.ExitReadLock();
}
}
public void Bar()
{
try
{
rwLock.EnterReadLock();
wrappedHandler.Bar();
}
finally
{
rwLock.ExitReadLock();
}
}
public void SwapHandler(IHandler newHandler)
{
try
{
rwLock.EnterWriteLock();
UnhookProtocol(wrappedHandler);
HookProtocol(newHandler);
// Point the wrapper at the new handler so later calls reach it.
wrappedHandler = newHandler;
}
finally
{
rwLock.ExitWriteLock();
}
}
}
Take note that this is still not thread safe if atomic operations are required across the handler's methods. In that case you would need higher-order locking between threads, or methods on your wrapper class that support thread-safe atomic operations (something like BeginThreadSafeBlock() followed by EndThreadSafeBlock(), which lock the wrapped handler for writing for a series of operations).
You can't, and it's logical that you can't. The best you can do is prevent any other thread from disrupting the state between those two actions (as has already been said).
Here is why you can't:
Imagine there was a block that told the operating system never to switch threads while you're in that block. That would be technically possible, but it would lead to starvation everywhere.
You might think your threads are the only ones being used, but that's an unwise assumption. There's the garbage collector, there are the async operations that work with thread pool threads, and an external reference, such as a COM object, could spawn its own thread (in your memory space), so no one could progress while you're at it.
Imagine you perform a very long operation in your HookOperation method. It involves a lot of non-leaky operations but, as the garbage collector can't take over to free your resources, you end up without any memory left. Or imagine you call a COM object that uses multithreading to handle your request... but it can't start the new threads (well, it can start them, but they never get to run) and then joins them, waiting for them to finish before coming back... and therefore you join on yourself, never returning!
As other posters have already said, you can't enforce system-wide critical section from user-mode code. However, you don't need it to implement the hot swapping.
Here is how.
Implement a proxy with the same interface as your hot-swappable Foo object. The proxy shall call HookProtocol and never unhook (until your app is stopped). It shall contain a reference to the current Foo handler, which you can replace with a new instance when needed. The proxy shall direct the data it receives from hooked functions to the current handler. Also, it shall provide a method for atomic replacement of the current Foo handler instance (there are a number of ways to implement it, from a simple mutex to lock-free).
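The proxy could be sketched roughly as follows, here using the "simple mutex" variant. IFoo and its OnData method are an assumed shape for the Foo interface, which the question does not show, and FooProxy and RecordingFoo are hypothetical names. The HookProtocol call is left out because its signature is unknown.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the handler interface; the real Foo is not shown.
public interface IFoo
{
    void OnData(string data);
}

public sealed class RecordingFoo : IFoo
{
    public List<string> Seen { get; } = new List<string>();
    public void OnData(string data) { Seen.Add(data); }
}

// The proxy is hooked into the protocol once and stays hooked.
public sealed class FooProxy : IFoo
{
    private readonly object _gate = new object();
    private IFoo _current;

    public FooProxy(IFoo initial)
    {
        _current = initial;
    }

    // Forwards incoming data to whichever handler is current.
    public void OnData(string data)
    {
        lock (_gate)
        {
            _current.OnData(data);
        }
    }

    // Waits for any in-flight OnData call, then swaps the handler.
    public void Replace(IFoo next)
    {
        lock (_gate)
        {
            _current = next;
        }
    }
}

public static class ProxyDemo
{
    public static void Main()
    {
        var oldFoo = new RecordingFoo();
        var newFoo = new RecordingFoo();
        var proxy = new FooProxy(oldFoo);

        proxy.OnData("p1");
        proxy.Replace(newFoo);
        proxy.OnData("p2");

        Console.WriteLine(oldFoo.Seen.Count + " " + newFoo.Seen.Count);
    }
}
```

Because the protocol only ever sees the proxy, there is no window between unhook and hook in which a packet could be lost.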
Time and time again I find myself having to write thread-safe versions of BindingList and ObservableCollection because, when bound to UI, these controls cannot be changed from multiple threads. What I'm trying to understand is why this is the case - is it a design fault or is this behavior intentional?
The problem is designing a thread safe collection is not simple. Sure it's simple enough to design a collection which can be modified/read from multiple threads without corrupting state. But it's much more difficult to design a collection that is usable given that it's updated from multiple threads. Take the following code as an example.
if ( myCollection.Count > 0 ) {
var x = myCollection[0];
}
Assume that myCollection is a thread safe collection where adds and updates are guaranteed not to corrupt state. This code is not thread safe and is a race condition.
Why? Even though myCollection is safe, there is no guarantee that a change does not occur between the two calls to myCollection: namely Count and the indexer. Another thread can come in and remove all elements between these calls.
This type of problem makes using a collection of this type quite frankly a nightmare. You can't ever let the return value of one call influence a subsequent call on the collection.
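Two common ways around this check-then-act race are sketched below, using only the base class library. CheckThenActDemo and its method names are hypothetical. The first holds one lock across both calls (and only helps if every writer takes the same lock); the second uses an API that merges the check and the action into a single call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class CheckThenActDemo
{
    // Option 1: one lock covers both the Count check and the indexer read.
    // All code that modifies the list must lock on the same gate object.
    public static bool TryGetFirst(List<int> list, object gate, out int first)
    {
        lock (gate)
        {
            if (list.Count > 0)
            {
                first = list[0];
                return true;
            }

            first = 0;
            return false;
        }
    }

    // Option 2: the collection exposes check-and-take as one atomic call.
    public static bool TryTakeFirst(ConcurrentQueue<int> queue, out int first)
    {
        return queue.TryDequeue(out first);
    }

    public static void Main()
    {
        var gate = new object();
        var list = new List<int> { 5 };
        int value;

        Console.WriteLine(TryGetFirst(list, gate, out value) + " " + value);

        var queue = new ConcurrentQueue<int>();
        queue.Enqueue(9);
        Console.WriteLine(TryTakeFirst(queue, out value) + " " + value);
        Console.WriteLine(TryTakeFirst(queue, out value));
    }
}
```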
EDIT
I expanded this discussion on a recent blog post: http://blogs.msdn.com/jaredpar/archive/2009/02/11/why-are-thread-safe-collections-so-hard.aspx
To add a little to Jared's excellent answer: thread safety does not come for free. Many (most?) collections are only used within a single thread. Why should those collections have performance or functionality penalties to cope with the multi-threaded case?
Gathering ideas from all the other answers, I think this is the simplest way to resolve your issues:
Change your question from:
"Why isn't class X sane?"
to
"What is the sane way of doing this with class X?"
In your class's constructor, get the current dispatcher as you create
your observable collections. Because, as you pointed out, modification needs to
be done on the original thread, which may not be the main GUI thread.
So App.Current.Dispatcher isn't always right,
and not all classes have a this.Dispatcher.
_dispatcher = System.Windows.Threading.Dispatcher.CurrentDispatcher;
_data = new ObservableCollection<MyDataItemClass>();
Use the dispatcher to Invoke your code sections
that need the original thread.
_dispatcher.Invoke(new Action(() => { _data.Add(dataItem); }));
That should do the trick for you. Though there are situations you might prefer .BeginInvoke instead of .Invoke.
If you want to go crazy - here's a ThreadedBindingList<T> that does notifications back on the UI thread automatically. However, it would still only be safe for one thread to be making updates etc at a time.