So we've been looking into the Azure Service Bus recently and we're a bit confused as to whether we should use an infinite loop to poll the queue/subscription or whether we should use the OnMessage callback/message pump functionality. What is going to execute fewer operations and thus cost less?
Ideally we want an event-driven system so we aren't wasting operations and it's just generally a much nicer approach.
My question is: is using OnMessage, which is defined as "Processes a message in an event-driven message pump", really event-driven?
If you take a look at this page (QueueClient.OnMessage): https://msdn.microsoft.com/library/azure/microsoft.servicebus.messaging.queueclient.onmessage.aspx you'll notice the remark at the bottom which states that it is basically a wrapper around an infinite loop which is calling the Receive() method. That doesn't sound very event-driven to me.
Now if you look at this page (SubscriptionClient.OnMessage):
https://msdn.microsoft.com/en-us/library/azure/dn130336.aspx, that remark is not present. So is it the same for topics/subscriptions and queues or is it actually event-driven for subscriptions but not for queues?
Why are they even saying it's event-driven when it's clearly not? The fact that the remark on the QueueClient.OnMessage page has the words "infinite loop" and "every receive operation is a billable event" is somewhat scary.
Also, I'm not really concerned about how much or how little it will cost either way; I'm more interested in making it as efficient as possible.
I've not used OnMessage, but the question interested me so I did some digging.
My understanding is that the OnMessage approach simply encapsulates the usual plumbing involved in processing messages from a queue, giving you a cleaner, easier way to do it with a lot less to worry about. So instead of writing all the scaffolding around polling, you can focus on a more "push-like"/event-driven implementation (the message pump model).
So you are correct in that it is basically still just a loop calling Receive(); with the default timeouts, the number of polls would be the same, and therefore so would the cost.
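For illustration, here's roughly what the OnMessage style looks like (a minimal sketch against the Microsoft.ServiceBus.Messaging API; the connection string and queue name are placeholders):

var client = QueueClient.CreateFromConnectionString(connectionString, "myQueue");

var options = new OnMessageOptions
{
    AutoComplete = true,        // complete the message when the callback returns without throwing
    MaxConcurrentCalls = 1      // how many callbacks may run in parallel
};
options.ExceptionReceived += (s, e) => Console.WriteLine(e.Exception);

// Under the hood this still long-polls Receive() in a loop; you just don't write the loop yourself.
client.OnMessage(message =>
{
    Console.WriteLine(message.MessageId);
}, options);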
I came across these references:
http://fabriccontroller.net/introducing-the-event-driven-message-programming-model-for-the-windows-azure-service-bus/
http://www.flyersoft.net/?p=971 - check the comments too, as this covers the same question as yours.
"So is it the same for topics/subscriptions and queues or is it actually event-driven for subscriptions but not for queues?"
I am not 100% sure, but my assumption, based on my research, is that it is the same and it's just a case of the documentation not being clear.
I am very excited about using Rx in a production application, where I will be listening to incoming notification updates coming from different channels.
I will be writing an Rx query on top of this stream, throttling with the .Window() operator. The subscriber (in my case an ActionBlock) will process the data in a blocking fashion (i.e. it will not spawn Tasks from the ActionBlock). With the above in mind, if data arrives at a much faster rate than my subscriber can consume it, what will happen to the incoming data? Does the Rx query use any buffer internally, and will it overflow?
The phenomenon you're referring to is called Back Pressure, and the Rx team is currently exploring different ways to handle this situation. One solution might be communicating back-pressure back to the Observable so that it might "slow down".
To alleviate back-pressure, you could use lossy operators such as Throttle or Sample.
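For example, a minimal sketch (assuming the System.Reactive package, usings omitted; the fast producer is simulated with Observable.Interval):

var fast = Observable.Interval(TimeSpan.FromMilliseconds(10));   // fast producer: a value every 10 ms

// Sample is lossy: only the latest value from each 250 ms window reaches the subscriber,
// so a slow (blocking) consumer never builds up an unbounded backlog.
using (fast.Sample(TimeSpan.FromMilliseconds(250))
           .Subscribe(x =>
           {
               Thread.Sleep(200);       // simulate slow, blocking processing
               Console.WriteLine(x);
           }))
{
    Console.ReadLine();
}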
Timothy's answer is mostly right, but it is possible to have back-pressure occur on a single thread. This can happen if you use asynchronous code. In that sense, back-pressure is related to synchronization and scheduling, not threading (recall that by default Rx is single threaded).
If you run into a scenario where events are being produced faster than they can be consumed, and you're not using a lossy operator to alleviate the back-pressure, those items are usually being scheduled/queued/buffered, which can lead to a lot of memory allocation.
Personally, this has not been an issue for me, since usually events are processed faster than they are yielded, or loss of events is simply not an option, and therefore the extra memory consumption is inevitable.
This is actually down to the implementation of individual operators, but the built-in ones will buffer on a per-subscription basis - so a slow consumer won't block other subscribers, unless of course subscribers are sharing threads.
On a related note, Rx doesn't always protect the Rx grammar; for example, it is your responsibility to ensure you don't make concurrent calls to OnNext on a Subject. You can use Observable.Synchronize() to fix this.
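As a rough sketch of that last point (assuming System.Reactive; the two producer loops are only there to simulate concurrent callers):

var subject = new Subject<int>();

// Synchronize() serializes notifications, so the downstream observer never sees
// overlapping OnNext calls even if the producers race each other.
using (subject.Synchronize().Subscribe(Console.WriteLine))
{
    Parallel.Invoke(
        () => { for (int i = 0; i < 100; i++) subject.OnNext(i); },
        () => { for (int i = 100; i < 200; i++) subject.OnNext(i); });
}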
If the subscriber is processing on the same thread as the emitting observable, the data cannot come faster than the subscriber can consume.
IObservable<int> data = ...;
var subscription = data.Subscribe(n => Console.WriteLine(n));
In this example, each int that emits out of data will be written to the console before the next int is emitted.
If the subscription crosses threads, then the above doesn't hold.
I'm in need of some advice on proper coding:
I'm working on a program where multiple serial connections are used. Each communication line has a controller working as an abstraction layer. Between the controller and the serial port, a protocol is inserted to wrap the data in packages, ready for transfer. The protocol takes care of failed deliveries, resending etc.
To ensure that the GUI won't hang, each connection line (protocol and serial port) is created on a separate thread. The controller is handled by the main thread, since it has controls in the GUI.
Currently, when I create the threads, I have chosen to run a message loop on them (Application.Run()), so instead of polling buffers and yielding when there is no work, I simply invoke onto the thread (BeginInvoke) and use the message loop as a buffer. This currently works nicely, and there have been no serious problems so far.
My question is now: is this "good coding", or should I use a while loop on the thread and poll buffers instead, or some third option?
I would like to show code, but so far it is several thousand lines of code, so please be specific if you need to see any part of the code. :)
Thank you.
Using message loops in each thread is perfectly fine; Windows is optimized for this scenario. You are right to avoid polling, but you may want to look into other event-based designs that are more efficient still, for example preparing a package for transfer and calling SetEvent to notify a thread that it's ready, or a semaphore and a thread-safe queue, as Martin James suggests.
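A rough sketch of that signalling idea in .NET terms (AutoResetEvent standing in for the Win32 SetEvent call; the class and names here are made up, usings for System.Collections.Concurrent and System.Threading omitted):

class PackageWorker
{
    private readonly ConcurrentQueue<byte[]> _packages = new ConcurrentQueue<byte[]>();
    private readonly AutoResetEvent _ready = new AutoResetEvent(false);

    public void Post(byte[] package)        // called by the producer(s)
    {
        _packages.Enqueue(package);
        _ready.Set();                       // wake the worker; no polling loop required
    }

    public void Run()                       // runs on the dedicated connection thread
    {
        while (true)
        {
            _ready.WaitOne();               // blocks at 0% CPU until there is work
            byte[] package;
            while (_packages.TryDequeue(out package))
            {
                Send(package);              // hand the package to the protocol/serial port
            }
        }
    }

    private void Send(byte[] package) { /* write to the serial port here */ }
}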
I'm not 100% sure what you are doing here, but with a bit of 'filling in' it doesn't sound bad :)
When your app is idle, (no comms), is CPU use 0%?
Is your app free of sleep(0)/sleep(1) or similar polling loops?
Does it operate with a reasonably low latency?
If the answers are three 'YES', you should be fine :)
There are a few, (very few!), cases where polling for results etc. is a good idea, (eg. when the frequency of events in the threads is so high that signaling every progress event to the GUI would overwhelm it), but mostly, it's just poor design.
Is it possible to purge a ThreadPool?
Remove items from the ThreadPool?
Anything like that?
ThreadPool.QueueUserWorkItem(GetDataThread);
RegisteredWaitHandle Handle = ThreadPool.RegisterWaitForSingleObject(CompletedEvent, WaitProc, null, 10000, true);
Any thoughts?
I recommend using the Task class (added in .NET 4.0) if you need this kind of behaviour. It supports cancellation, and you can have any number of tasks listening to the same cancellation token, which enables you to cancel them all with a single method call.
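For example (a minimal sketch using the .NET 4.0 Task API; the loop body just simulates work):

var cts = new CancellationTokenSource();

// Any number of tasks can observe the same token...
var tasks = new Task[3];
for (int i = 0; i < tasks.Length; i++)
{
    tasks[i] = Task.Factory.StartNew(() =>
    {
        while (true)
        {
            cts.Token.ThrowIfCancellationRequested();   // cooperative cancellation point
            Thread.Sleep(100);                          // simulated work
        }
    }, cts.Token);
}

// ...and one call cancels them all.
cts.Cancel();

try { Task.WaitAll(tasks); }
catch (AggregateException) { /* each task ends up in the Canceled state */ }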
Updated (non-4.0 solution):
You really only have two choices. One: implement your own event demultiplexer (this is far more complex than it appears, due to the 64-handle wait limitation); I can't recommend this - I had to do it once (in unmanaged code), and it was hideous.
That leaves the second choice: have a signal to cancel the tasks. Naturally, RegisteredWaitHandle.Unregister can cancel the RegisterWaitForSingleObject (RWFSO) part. The QueueUserWorkItem (QUWI) part is more complex, but it can be done by making the action aware of a "token" value. When the action executes, it first checks the token value against its stored token value; if they are different, then it shouldn't do anything.
One major thing to consider is race conditions. Just keep in mind that there is a race condition between cancelling an action and the ThreadPool executing it, so it is possible to see actions running after cancellation.
I have a blog post on this concept, which I call "asynchronous callback contexts". The CallbackContext type mentioned in the blog post is available in the Nito.Async library.
There's no interface for removing a queued item. However, nothing stops you from "poisoning" the delegate so that it returns immediately.
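For instance, a sketch of that poisoning idea (the flag and the wrapper class are made up for illustration; GetDataThread is the work item from the question):

class PoisonableWork
{
    private volatile bool _cancelled;               // the "poison" flag

    public void Cancel() { _cancelled = true; }

    public void Queue()
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            if (_cancelled) return;                 // poisoned: bail out before doing anything
            GetDataThread();                        // the real work
        });
    }

    private void GetDataThread() { /* ... */ }
}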
Edit:
Based on what Paul said, I'm thinking you might also want to consider a pipelined architecture, where you have a fixed number of threads reading from a blocking queue (like .NET 4.0's BlockingCollection on a ConcurrentQueue). This way, if you want to cancel items, you can just access the queue yourself.
Having said that, Stephen's advice about Task is likely better, in that it gives you all the control you would realistically want, without all the hard work that rolling your own pipelines involves. I mention this only for completeness.
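A rough sketch of that pipelined approach (illustrative only; usings for System.Collections.Concurrent and System.Threading.Tasks omitted):

class Pipeline
{
    private readonly BlockingCollection<Action> _work =
        new BlockingCollection<Action>(new ConcurrentQueue<Action>());

    public Pipeline(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
        {
            Task.Factory.StartNew(() =>
            {
                foreach (var item in _work.GetConsumingEnumerable())
                    item();                             // blocks when the queue is empty
            }, TaskCreationOptions.LongRunning);
        }
    }

    public void Enqueue(Action item) { _work.Add(item); }

    // Because you own the queue, you can also inspect or drain it here to drop pending items.
    public void Shutdown() { _work.CompleteAdding(); }  // lets the workers drain and exit
}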
The ThreadPool exists to help you manage your threads. You should not have to worry about purging it at all since it will make the best performance decisions on your behalf.
If you think you need tighter control over your threads then you could consider creating your own thread management class (similar to ThreadPool) but it would take a lot of work to match and exceed the functionality that ThreadPool has built in.
Take a look here at some of the ThreadPool optimizations and the ideas behind it.
For my second point, I found an article on Code Project that implements a "Cancelable Threadpool", probably for some of your own similar reasons. It would be a good place to start looking if you're going to write your own.
I have a multithreaded application that spawns threads for several hardware instruments. Each thread is basically an infinite loop (for the lifetime of the application) that polls the hardware for new data, and activates an event (which passes the data) each time it collects something new. There is a single listener class that consolidates all these instruments, performs some calculations, and fires a new event with this calculation.
However, I'm wondering if, since there is a single listener, it would be better to expose an IEnumerable<> method off these instruments, and use a yield return to return the data, instead of firing events.
I'd like to see if anybody knows of differences in these two methods. In particular, I'm looking for the best reliability, best ability to pause/cancel operation, best for threading purposes, general safety, etc.
Also, with the second method is it possible to still run the IEnumerable loop on a separate thread? Many of these instruments are somewhat CPU-bound, so ensuring each one is on a different thread is vital.
This sounds like a very good use case for the Reactive Extensions. There's a little bit of a learning curve to it but in a nutshell, IObservable is the dual of IEnumerable. Where IEnumerable requires you to pull from it, IObservable pushes its values to the observer. Pretty much any time you need to block in your enumerator, it's a good sign you should reverse the pattern and use a push model. Events are one way to go but IObservable has much more flexibility since it's composable and thread-aware.
instrument.DataEvents
.Where(x => x.SomeProperty == something)
.BufferWithTime( TimeSpan.FromSeconds(1) )
.Subscribe( x => DoSomethingWith(x) );
In the above example, DoSomethingWith(x) will be called whenever the subject (instrument) produces a DataEvent that has a matching SomeProperty and it buffers the events into batches of 1 second duration.
There's plenty more you could do such as merging in the events produced by other subjects or directing the notifications onto the UI thread, etc. Unfortunately documentation is currently pretty weak but there's some good information on Matthew Podwysocki's blog. (Although his posts almost exclusively mention Reactive Extensions for JavaScript, it's pretty much all applicable to Reactive Extensions for .NET as well.)
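For example (sketch only; instrumentA/instrumentB and their DataEvents streams are assumed names in the spirit of the snippet above, and UpdateDisplay is your own UI code):

instrumentA.DataEvents
    .Merge(instrumentB.DataEvents)                   // combine several instruments into one stream
    .ObserveOn(SynchronizationContext.Current)       // marshal notifications onto the UI thread
    .Subscribe(x => UpdateDisplay(x));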
It's a close call, but I think I'd stick to the event model in this case, with the main deciding factor being that future maintenance programmers are less likely to understand the yield concept. Also, yield means the code processing each hardware request runs on the same thread as the code generating the requests for processing. That's bad, because it could mean your hardware has to wait on the consumer code.
And speaking of consumers, another option is a producer/consumer queue. Your instruments can all push into the same queue, and your single listener can then pop from it and take it from there.
There's a pretty fundamental difference: push vs. pull. The pull model (yield) is the harder one to implement from the instrument interface's point of view, because you'll have to store data until the client code is ready to pull it. When you push, the client may or may not store the data, as it deems necessary.
But most practical implementations in multi-threading scenarios need to deal with the overhead of the inevitable thread context switch that's required to present the data. And that's often done with pull, using a thread-safe bounded queue.
Stephen Toub blogs about a blocking queue which implements IEnumerable as an infinite loop using the yield keyword. Your worker threads could enqueue new data points as they appear and the calculation thread could dequeue them using a foreach loop with blocking semantics.
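A stripped-down sketch of the idea (not Toub's actual code; just enough to show the shape, with usings for System.Collections.Generic and System.Threading omitted):

class BlockingDataQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();

    public void Enqueue(T item)                 // called from the instrument threads
    {
        lock (_queue)
        {
            _queue.Enqueue(item);
            Monitor.Pulse(_queue);              // wake a waiting consumer
        }
    }

    public IEnumerable<T> GetConsumingStream()  // infinite, yield-based enumerable
    {
        while (true)
        {
            T item;
            lock (_queue)
            {
                while (_queue.Count == 0)
                    Monitor.Wait(_queue);       // the consumer's foreach blocks here
                item = _queue.Dequeue();
            }
            yield return item;
        }
    }
}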
I don't think there's much difference performance-wise between the event and yield approach. Yield is lazy evaluated, so it leaves an opportunity to signal the producing threads to stop. If your code is thoughtfully documented then maintenance ought to be a wash, too.
My preference is a third option, to use a callback method instead of an event (even though both involve delegates). Your producers invoke the callback each time they have data. Callbacks can return values, so your consumer can signal producers to stop or continue each time they check in with data.
This approach can give you places to optimize performance if you have a high volume of data. In your callback you lock on a neutral object and append incoming data to a collection. The runtime internally uses a ready queue on the lock object, so this can serve as your queuing point.
This lets you choose a collection, such as a List<T> with predefined capacity, that is O(1) for appending. You can also double-buffer your consumer, with your callback appending to the "left" buffer while you consolidate from the "right" one, and so forth. This minimizes the amount of producer blocking and associated missed data, which is handy for bursty data. You can also readily measure high-water marks and processing rates as you vary the number of threads.
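An illustrative sketch of that callback/double-buffer combination (the class and names are made up, and the payload is just a double here):

class Consolidator
{
    private readonly object _gate = new object();
    private List<double> _front = new List<double>(1024);   // producers append here
    private List<double> _back = new List<double>(1024);    // consumer works on this one

    // Producers call this each time they have data; the return value can signal them to stop.
    public bool OnData(double value)
    {
        lock (_gate)                 // the lock's internal ready queue is the de facto queuing point
        {
            _front.Add(value);       // O(1) append thanks to the preallocated capacity
        }
        return true;                 // true = keep producing
    }

    // Consumer thread: swap the buffers under the lock, then do the real work outside it.
    public void Consolidate()
    {
        List<double> batch;
        lock (_gate)
        {
            batch = _front;
            _front = _back;
            _back = batch;
        }
        foreach (var value in batch) { /* perform the calculation here */ }
        batch.Clear();               // ready for the next swap
    }
}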
Everything that I read about sockets in .NET says that the asynchronous pattern gives better performance (especially with the new SocketAsyncEventArgs which saves on the allocation).
I think this makes sense if we're talking about a server with many client connections where it's not possible to allocate one thread per connection. Then I can see the advantage of using the ThreadPool threads and getting async callbacks on them.
But in my app, I'm the client and I just need to listen to one server sending market tick data over one tcp connection. Right now, I create a single thread, set the priority to Highest, and call Socket.Receive() with it. My thread blocks on this call and wakes up once new data arrives.
If I were to switch this to an async pattern so that I get a callback when there's new data, I see two issues:
1. The threadpool threads will have default priority, so it seems they will be strictly worse than my own thread, which has Highest priority.
2. I'll still have to send everything through a single thread at some point. Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data. The N byte arrays that they deliver can't be processed on the threadpool threads, because there's no guarantee that they represent N unique market data messages, since TCP is stream-based. I'll have to lock and put the bytes into an array anyway and signal some other thread that can process what's in the array. So I'm not sure what having N threadpool threads is buying me.
Am I thinking about this wrong? Is there a reason to use the async pattern in my specific case of one client connected to one server?
UPDATE:
So I think that I was misunderstanding the async pattern in (2) above. I would get a callback on one worker thread when there was data available. Then I would begin another async receive and get another callback, etc. I wouldn't get N callbacks at the same time.
The question is still the same, though. Is there any reason that the callbacks would be better in my specific situation, where I'm the client and only connected to one server?
The slowest part of your application will be the network communication. It's highly likely that you will make almost no difference to performance for a one thread, one connection client by tweaking things like this. The network communication itself will dwarf all other contributions to processing or context switching time.
"Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data."
Why is that going to happen? If you have one socket, you Begin an operation on it to receive data, and you get exactly one callback when it's done. You then decide whether to do another operation. It sounds like you're overcomplicating it, though maybe I'm oversimplifying it with regard to what you're trying to do.
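For reference, a minimal sketch of that Begin/one-callback-at-a-time shape (error handling omitted; HandleBytes stands in for your own framing code):

class TickReceiver
{
    private readonly byte[] _buffer = new byte[8192];
    private readonly Socket _socket;

    public TickReceiver(Socket socket) { _socket = socket; }

    public void Start()
    {
        // Exactly one outstanding receive at a time.
        _socket.BeginReceive(_buffer, 0, _buffer.Length, SocketFlags.None, OnReceive, null);
    }

    private void OnReceive(IAsyncResult ar)
    {
        int read = _socket.EndReceive(ar);   // bytes actually received this time
        if (read == 0) return;               // the server closed the connection

        HandleBytes(_buffer, read);          // feed your message framer
        Start();                             // then decide to post the next receive
    }

    private void HandleBytes(byte[] buffer, int count) { /* reassemble ticks here */ }
}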
In summary, I'd say: pick the simplest programming model that gets you what you want; considering the choices available in your scenario, they are unlikely to make any noticeable difference to performance whichever one you go with. With the blocking model, you're "wasting" a thread that could be doing some real work, but hey... maybe you don't have any real work for it to do.
The number one rule of performance is only try to improve it when you have to.
I see you mention standards but never mention problems; if you are not having any, then you don't need to worry about what the standards say.
"This class was specifically designed for network server applications that require high performance."
As I understand, you are a client here, having only a single connection.
Data on this connection arrives in order, consumed by a single thread.
You will probably lose performance if you instead receive small amounts on separate threads, just so that you can reassemble them later in a serialized, and thus effectively single-threaded, manner.
Much Ado about Nothing.
You do not really need to speed this up, you probably cannot.
What you can do, however, is dispatch work units to other threads after you receive them.
You do not need SocketAsyncEventArgs for this. This might speed things up.
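One way to do that hand-off, sketched with TPL Dataflow (just one option, not something this answer prescribes; Process and completeMessage are placeholders for your own code):

// An ActionBlock processes posted items one at a time, in order, off the receive thread.
var processor = new ActionBlock<byte[]>(message => Process(message));

// On the receive thread, once a complete message has been framed:
processor.Post(completeMessage);    // returns immediately; processing happens on the pool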
As always, measure & measure.
Also, just because you can, it does not mean you should.
If the performance is enough for the foreseeable future, why complicate matters?