Event-driven background "process" - c#

How do I achieve a background task that waits for events to be raised by other processes, similar to how a Winforms Form does "nothing" until an event is raised? I would like to know how to do this in two cases:
a) as a process to be called by applications
b) as a task in one application
(But an answer to one would be appreciated as well, of course.)
The specific type of usage I'm facing now is to have this task process a queue (FIFO) and when the queue is empty - wait. But please don't restrict your answers to that as I'm interested in a general answer.
I've used the terms task and process, but perhaps the term thread should be used. Please feel free to correct me if I'm wrong.
EDIT
I'm looking for some built-in mechanism, rather than implementing a "message pump". And preferably, built-in to .net, not a library that has to be installed.

Within an application (not across apps) - a simple queue pattern will do it, you can use a thread signalling mechanism. Here is a simple example:
Declare a thread-safe collection to store your command, a task (thread pool thread, effectively) to process the queue, a signal trigger - and a flag to allow exiting later:
private ConcurrentQueue<CommandObject> _queueCommands;
private Task _queueProcessorTask;
private AutoResetEvent _trigger;
private volatile bool _isRunning; // volatile: written by one thread, read by the processor thread
The code to initialise and kick off the queue processor:
_queueCommands = new ConcurrentQueue<CommandObject>();
_queueProcessorTask = new Task(ProcessQueue);
_trigger = new AutoResetEvent(false);
_isRunning = true;
_queueProcessorTask.Start();
Your queue processor itself will look something like this:
while (_isRunning)
{
    CommandObject command;
    if (_isRunning && _queueCommands.Count != 0)
    {
        if (_queueCommands.TryDequeue(out command))
        {
            // do the job, this is FIFO
        }
    }
    // you want to wait here, but only if there's nothing new to do
    if (_isRunning && _queueCommands.Count == 0)
    {
        _trigger.WaitOne(10000, false);
    }
}
And some code to add requests to the queue:
_queueCommands.Enqueue(newCommand);
_trigger.Set(); // this is the bit which does your event / signal to spark queue processor into life
There is also a collection called BlockingCollection that can do the above with the signal part intrinsic, but I like to show this verbose version so you know what's going on.
ADDED:
With the code above we're basically notifying a dedicated thread that it has a message / command to process - so it is your "message pump", if you will. The blocking collection does this in fewer steps, but you still have to add something to the collection to "pump" that message!
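For comparison, the BlockingCollection variant mentioned above might look roughly like the sketch below. The QueueProcessor class, its method names and the string payload are made up for illustration (they are not from the answer); only BlockingCollection<T>, GetConsumingEnumerable and CompleteAdding are the built-in pieces doing the waiting and signalling.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration: a FIFO worker that sleeps while the queue is empty.
public sealed class QueueProcessor : IDisposable
{
    // BlockingCollection wraps a ConcurrentQueue by default, so ordering is FIFO.
    private readonly BlockingCollection<string> _commands = new BlockingCollection<string>();
    private readonly List<string> _processed = new List<string>();
    private readonly Task _worker;

    public QueueProcessor()
    {
        // LongRunning hints that this loop should get its own thread
        // instead of occupying a thread pool thread indefinitely.
        _worker = Task.Factory.StartNew(ProcessQueue, TaskCreationOptions.LongRunning);
    }

    // Adding an item wakes the worker if it is currently blocked.
    public void Enqueue(string command)
    {
        _commands.Add(command);
    }

    // Stops accepting work, lets the worker drain the queue, returns what was handled.
    public IReadOnlyList<string> Shutdown()
    {
        _commands.CompleteAdding();
        _worker.Wait();
        return _processed;
    }

    private void ProcessQueue()
    {
        // Blocks while the collection is empty; the loop ends once CompleteAdding
        // has been called and the remaining items have been consumed.
        foreach (string command in _commands.GetConsumingEnumerable())
        {
            _processed.Add(command); // placeholder for "do the job"
        }
    }

    public void Dispose()
    {
        _commands.Dispose();
    }
}
```

Here the explicit AutoResetEvent and the _isRunning flag disappear: Add takes the place of Enqueue plus Set, and CompleteAdding takes the place of clearing the flag.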

Your question is very broad and as such a lot of good answers are possible. It really depends on what you are trying to achieve. Many options exist and you just need to carefully look at your requirements and determine what fits best.
The basic pattern that would appear to apply in most cases is that of queued event or message handling.
I will list a few available "out of the box" solutions that address different types of needs, without trying to make this an exhaustive list.
Across processes
There are several "precooked" solutions available that for example could fit if you are doing eventing / messaging across multiple processes.
If you require some form of durable messaging, you could look at one of several message bus implementations, such as e.g. NServiceBus, Kafka, ...
If your messaging needs to be fast but does not require durability, then something like 0mq, which supports multiple messaging patterns, could be useful. If you want to do this across .NET web apps, you may want to have a look at SignalR.
If you are looking to do any type of complex event processing, or you want to have a permanent memory of the events and do "computations", such as aggregation or other types of projections over a series of events, you could have a look at the EventStore project.
If you want to do something simple, like get notifications when data in your database has changed, you could look into specific change notification services / trigger analogs for your specific type of database (e.g. RavenDB's changes API).
Within a process
If you need to perform (non-durable) messaging between threads in a single process, the two most complete "prebaked" building blocks available to you are TPL Dataflow and Reactive Extensions:
TPL Dataflow
Allows you to set up specific data processing pipelines, with several buffering and transformation options (think pipes and filters).
Reactive Extensions
Specifically intended for building non-durable asynchronous event processing, composing and transforming events using observable sequences and LINQ-style query operators.
If your needs are far simpler than what they offer, you could build your own, taking into account the basic type of tried and tested patterns that are used in the solutions already mentioned.

Related

Is it a good practice to use TaskFactory.StartNew for every layer in push model?

Let's assume that I have several layers:
1. Manager that reads data from a socket
2. Manager that subscribes to #1 and takes care of persisting the data
3. Manager that subscribes to #2 and takes care of deserializing the data and propagating it to typed managers that are interested in certain event types
4. WPF Controllers that display the data (are subscribed to #3)
As of right now I use
TaskFactory.StartNew(()=>subscriber.Publish(data));
on each layer. The reason for this is that I don't want to rely on every manager doing its work quickly, or on e.g. the Socket manager never getting stuck.
Is this a good approach?
Edit
Let's say that Socket manager receives a price update
There are 10 managers subscribed to Socket manager so when Socket manager propagates the message .StartNew is called 10 times.
Managers #2,#3 do nothing else but to propagate the message by .StartNew to a single subscriber
So ultimately per 1 message from socket 30x .StartNew() is called.
It seems a reasonable approach.
However, if one could meaningfully do:
subscriber.PublishAsync(data).LogExceptions(Log);
Where LogExceptions is something like:
// I'm thinking of Log4Net here, but of course something else could be used.
public static Task LogExceptions(this Task task, ILog log)
{
    return task.ContinueWith(ta => LogFailedTask(ta, log), TaskContinuationOptions.OnlyOnFaulted);
}
private static void LogFailedTask(Task ta, ILog log)
{
    var aggEx = ta.Exception;
    if (aggEx != null)
    {
        log.Error("Error in asynchronous event");
        int errCount = 0;
        foreach (var ex in aggEx.InnerExceptions)
            log.Error("Asynchronous error " + ++errCount, ex);
    }
}
So that fire-and-forget use of tasks still has errors logged, and PublishAsync in turn makes use of tasks where appropriate, then I'd be happier still. In particular, if the "publishing" involves anything that would block a thread but can be handled with async, like writing to or reading from a database or file system, then the thread use could scale better.
Regarding Task.Run vs. TaskFactory.StartNew, they are essentially identical under the hood. Please read the following link: http://blogs.msdn.com/b/pfxteam/archive/2014/12/12/10229468.aspx
Even though these methods use the ThreadPool for decent performance, there is overhead associated with constantly creating new Tasks on-the-fly. Task is generally used more for infrequent, fire-and-forget type workloads. Your statement of "30x .StartNew() per 1 message from the socket" is a bit concerning. How often do socket messages arrive? If you are really concerned with latency, I think the better way of doing this is for each manager to have its own dedicated thread. You can use a blocking queue (in .NET, BlockingCollection<T>) so that each manager's thread sleeps until its parent puts an item into that manager's input queue. This would be preferable to a simple spinlock, for example.
This is the sort of architecture used regularly in financial market messaging subscription and decoding that needs the fastest possible performance. Also keep in mind that more threads do not always equate to faster performance. If the threads have any shared data dependencies, they will all be contending for the same locks, causing context switching on one another, etc. This is why a preset number of dedicated threads can usually win out vs. a greater number of threads created on-the-fly. The only exception I can think of would be "embarrassingly parallel" tasks where there are no shared data dependencies at all. Note that dependencies can exist on both the input side and the output side (anywhere there is a lock the threads could run into).
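A minimal sketch of the "one dedicated thread per manager" idea, assuming string messages and a made-up Stage class (neither is from the question; the real managers would carry their own message types and logic):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical "manager": owns one dedicated thread and one input queue.
public sealed class Stage<TIn>
{
    private readonly BlockingCollection<TIn> _input = new BlockingCollection<TIn>();
    private readonly Thread _thread;

    public Stage(string name, Action<TIn> handler)
    {
        _thread = new Thread(() =>
        {
            // Blocks (no spinning) until an item arrives or the input is completed.
            foreach (TIn item in _input.GetConsumingEnumerable())
            {
                handler(item);
            }
        });
        _thread.Name = name;
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Called by the upstream stage; returns as soon as the item is queued.
    public void Publish(TIn item)
    {
        _input.Add(item);
    }

    // Signals end of input and waits for this stage's thread to drain its queue.
    public void Complete()
    {
        _input.CompleteAdding();
        _thread.Join();
    }
}

public static class PipelineDemo
{
    // Wires "deserializer" -> "display" with one thread per stage.
    public static string[] Run(string[] rawMessages)
    {
        var displayed = new ConcurrentQueue<string>();
        var display = new Stage<string>("display", text => displayed.Enqueue(text));
        var deserializer = new Stage<string>(
            "deserializer",
            raw => display.Publish(raw.Trim().ToUpperInvariant()));

        foreach (string raw in rawMessages)
        {
            deserializer.Publish(raw);
        }

        // Complete upstream first so that downstream receives everything.
        deserializer.Complete();
        display.Complete();
        return displayed.ToArray();
    }
}
```

Each message costs one queue insertion per layer rather than one new Task per subscriber, and a slow stage only backs up its own queue.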

.NET AMQP Messaging Pattern Issues

I have created a small class using RabbitMQ that implements a publish/subscribe messaging pattern on a topic exchange. On top of this pub/sub I have the methods and properties:
void Send(Message, Subject) - Publish message to destination topic for any subscribers to handle.
MessageReceivedEvent - Subscribe to message received events on this messaging instance (messaging instance is bound to the desired subscribe topic when created).
SendWaitReply(Message, Subject) - Send a message and block until a reply message is received with a correlation id matching the sent message id (or timeout). This is essentially a request/reply or RPC mechanism on top of the pub/sub pattern.
The messaging patterns I have chosen are somewhat set in stone due to the way the system is to be designed. I realize I could use reply-to queues to mitigate the potential issue with SendWaitReply, but that breaks some requirements.
Right now my issues are:
For the Listen event, the messages are processed synchronously through the event subscribers as the listener runs in a single thread. This causes some serious performance issues when handling large volumes of messages (i.e. in a back-end process consuming events from a web api). I am considering passing in a callback function as opposed to subscribing to an event and then dispatching the collection of callbacks in parallel using Task or Threadpool. Thread safety would obviously now be a concern of the caller. I am not sure if this is a correct approach.
For the SendWaitReply event, I have built what seems to be a hacky solution that takes all inbound messages from the message listener loop and places them in a ConcurrentDictionary if they contain a non-empty correlation guid. Then in the SendWaitReply method, I poll the ConcurrentDictionary for a message containing a key that matches the Id of the sent message (or timeout after a certain period). If there is a faster/better way to do this, I would really like to investigate it. Maybe a way to signal to all of the currently blocked SendWaitReply methods that a new message is available and they should all check their Ids instead of polling continuously?
Update 10/15/2014
After much exhaustive research, I have concluded that there is no "official" mechanism/helper/library to directly handle the particular use-case I have presented above for SendWaitReply in the scope of RabbitMQ or AMQP. I will stick with my current solution (and investigate more robust implementations) for the time being. There have been answers recommending I use the provided RPC functionality, but this unfortunately only works in the case that you want to use exclusive callback queues on a per-request basis. This breaks one of my major requirements of having all messages (request and reply) visible on the same topic exchange.
To further clarify, the typical message pair for a SendWaitReply request is in the format of:
Topic_Exchange.Service_A => some_command => Topic_Exchange.Service_B
Topic_Exchange.Service_B => some_command_reply => Topic_Exchange.Service_A
This affords me a powerful debugging and logging technique where I simply set up a listener on Topic_Exchange.# and can see all of the system traffic for tracing very deep 'call stacks' through various services.
TL;DR - Current Problem Below
Backing down from the architectural level - I still have an issue with the message listener loop. I have tried the EventingBasicConsumer and am still seeing a block. The way my class works is that the caller subscribes to the delegate provided by the instance of the class. The message loop fires the event on that delegate and those subscribers then handle the message. It seems as if I need a different way to pass the message event handlers into the instance such that they don't all sit behind one delegate which enforces synchronous processing.
It's difficult to say why your code is blocking without a sample, but to prevent blocking while consuming, you should use the EventingBasicConsumer.
var consumer = new EventingBasicConsumer(channel);
consumer.Received += (s, delivery) => { /* do stuff here */ };
channel.BasicConsume(queue, false, consumer);
One caveat: if you are using autoAck = false (as I do), then you need to ensure you lock the channel when you call channel.BasicAck, or you may hit concurrency issues in the .NET library.
For the SendWaitReply, you may have better luck if you just use the SimpleRpcClient included in the RabbitMQ client library:
var props = channel.CreateBasicProperties();
// Set your properties
var client = new RabbitMQ.Client.MessagePatterns.SimpleRpcClient(channel, exchange, ExchangeType.Direct, routingKey);
IBasicProperties replyProps;
byte[] response = client.Call(props, body, out replyProps);
The SimpleRpcClient will deal with creating a temporary queue, correlation IDs, and so on, so you don't have to build your own. If you find you want to do something more advanced, the source is also a good reference.
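If the exclusive reply queue is not an option, the polling described in the question could instead be replaced with per-request signalling. The sketch below is one way to do that with plain .NET types; ReplyCorrelator and its members are hypothetical names, the string body stands in for whatever message type is used, and nothing here is part of the RabbitMQ client.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: matches replies to pending requests by correlation id.
public sealed class ReplyCorrelator
{
    private readonly ConcurrentDictionary<Guid, TaskCompletionSource<string>> _pending =
        new ConcurrentDictionary<Guid, TaskCompletionSource<string>>();

    // Call before publishing the request; the returned task completes when the reply arrives.
    public Task<string> Register(Guid correlationId)
    {
        var tcs = new TaskCompletionSource<string>();
        _pending[correlationId] = tcs;
        return tcs.Task;
    }

    // Call from the listener loop for every inbound message that carries a correlation id.
    // Returns false when nobody is waiting for that id.
    public bool TryComplete(Guid correlationId, string replyBody)
    {
        TaskCompletionSource<string> tcs;
        if (_pending.TryRemove(correlationId, out tcs))
        {
            return tcs.TrySetResult(replyBody);
        }
        return false;
    }

    // Blocking wait with a timeout, mirroring SendWaitReply; returns null on timeout.
    public string WaitForReply(Guid correlationId, Task<string> pending, TimeSpan timeout)
    {
        if (pending.Wait(timeout))
        {
            return pending.Result;
        }
        TaskCompletionSource<string> abandoned;
        _pending.TryRemove(correlationId, out abandoned); // stop tracking the timed-out request
        return null;
    }
}
```

Only the caller waiting for a given id is woken, so there is no need to signal every blocked SendWaitReply call and have each re-check the dictionary.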

Design pattern for asynchronous calls in C#

I'm designing a desktop application with multiple layers: the GUI layer (WinForms MVP) holds references to interfaces of adapter classes, and these adapters call BL classes that do the actual work.
Apart from executing requests from the GUI, the BL also fires some events that the GUI can subscribe to through the interfaces. For example, there's a CurrentTime object in the BL that changes periodically and the GUI should reflect the changes.
There are two issues that involve multithreading:
I need to make some of the logic calls asynchronous so that they don't block the GUI.
Some of the events the GUI receives are fired from non-GUI threads.
At what level is it best to handle the multithreading? My intuition says that the Presenter is the most suitable for that, am I right? Can you give me some example application that does what I need? And does it make sense for the presenter to hold a reference to the form so it can invoke delegates on it?
EDIT: The bounty will probably go to Henrik, unless someone gives an even better answer.
I would look at using a Task-based BLL for those parts that can be described as "background operations" (that is, they're started by the UI and have a definite completion point). The Visual Studio Async CTP includes a document describing the Task-based Asynchronous Pattern (TAP); I recommend designing your BLL API in this way (even though the async/await language extensions haven't been released yet).
For parts of your BLL that are "subscriptions" (that is, they're started by the UI and continue indefinitely), there are a few options (in order of my personal preference):
1. Use a Task-based API but with a TaskCompletionSource that never completes (or only completes by being cancelled as part of application shutdown). In this case, I recommend writing your own IProgress<T> and EventProgress<T> (in the Async CTP): the IProgress<T> gives your BLL an interface for reporting progress (replacing progress events) and EventProgress<T> handles capturing the SynchronizationContext for marshalling the "report progress" delegate to the UI thread.
2. Use Rx's IObservable framework; this is a good match design-wise but has a fairly steep learning curve and is less stable than I personally like (it's a pre-release library).
3. Use the old-fashioned Event-based Asynchronous Pattern (EAP), where you capture the SynchronizationContext in your BLL and raise events by queuing them to that context.
EDIT 2011-05-17: Since writing the above, the Async CTP team has stated that approach (1) is not recommended (since it somewhat abuses the "progress reporting" system), and the Rx team has released documentation that clarifies their semantics. I now recommend Rx for subscriptions.
It depends on what type of application you are writing - for example - do you accept bugs? What are your data requirements - soft realtime? acid? eventually consistent and/or partially connected/sometimes disconnected clients?
Beware that there's a distinction between concurrency and asynchronicity. You can have asynchronicity, and hence interleaving of method calls, without actually having a concurrently executing program.
One idea could be to have a read and write side of your application, where the write-side publishes events when it's been changed. This could lead to an event driven system -- the read side would be built from the published events, and could be rebuilt. The UI could be task-driven - in that a task to perform would produce a command that your BL would take (or domain layer if you so wish).
A logical next step, if you have the above, is to also go event-sourced. Then you would recreate internal state of the write-model through what has been previously committed. There's a google group about CQRS/DDD that could help you with this.
With regards to updating the UI, I've found that the IObservable interfaces in System.Reactive, System.Interactive, System.CoreEx are well suited. It allows you to skip around different concurrent invocation contexts - dispatcher - thread pool, etc, and it interops well with the Task Parallel Library.
You'd also have to consider where you put your business logic -- if you go domain driven I'd say you could put it in your application as you'd have an updating procedure in place for the binaries you distribute anyway, when time comes to upgrade, but there's also the choice of putting it on the server. Commands could be a nice way to perform the updates to the write-side and a convenient unit of work when connection-oriented code fails (they are small and serializable and the UI can be designed around them).
To give you an example, have a look at this thread, with this code, that adds a priority to the IObservable.ObserveOnDispatcher(...)-call:
public static IObservable<T> ObserveOnDispatcher<T>(this IObservable<T> observable, DispatcherPriority priority)
{
    if (observable == null)
        throw new ArgumentNullException("observable");
    return observable.ObserveOn(Dispatcher.CurrentDispatcher, priority);
}
public static IObservable<T> ObserveOn<T>(this IObservable<T> observable, Dispatcher dispatcher, DispatcherPriority priority)
{
    if (observable == null)
        throw new ArgumentNullException("observable");
    if (dispatcher == null)
        throw new ArgumentNullException("dispatcher");
    return Observable.CreateWithDisposable<T>(o =>
    {
        return observable.Subscribe(
            obj => dispatcher.Invoke((Action)(() => o.OnNext(obj)), priority),
            ex => dispatcher.Invoke((Action)(() => o.OnError(ex)), priority),
            () => dispatcher.Invoke((Action)(() => o.OnCompleted()), priority));
    });
}
The example above could be used like this blog entry discusses:
public void LoadCustomers()
{
    _customerService.GetCustomers()
        .SubscribeOn(Scheduler.NewThread)
        .ObserveOn(Scheduler.Dispatcher, DispatcherPriority.SystemIdle)
        .Subscribe(Customers.Add);
}
... So for example with a virtual starbucks shop, you'd have a domain entity that has something like a 'Barista' class, which produces events 'CustomerBoughtCappuccino' : { cost : '$3', timestamp : '2011-01-03 12:00:03.334556 GMT+0100', ... etc }. Your read-side would subscribe to these events. The read side could be some data model -- for each of your screens that present data -- the view would have a unique ViewModel-class -- which would be synchronized with the view in an observable dictionary like this. The repository would be (:IObservable), and your presenters would subscribe to all of that, or just a part of it. That way your GUI could be:
Task driven -> command driven BL, with focus on user operations
Async
Read-write-segregated
Given that your BL only takes commands and doesn't on top of that display a 'good enough for all pages'-type of read-model, you can make most things in it internal, internal protected and private, meaning you can use Code Contracts (System.Diagnostics.Contracts) to prove that you don't have any bugs in it (!). It would produce events that your read-model would read. You could take the main principles from Caliburn Micro about the orchestration of workflows of yielded asynchronous tasks (IAsyncResults).
There are some Rx design guidelines you could read. And cqrsinfo.com about event sourcing and cqrs. If you are indeed interested in going beyond the async programming sphere into the concurrent programming sphere, Microsoft has released a well written book for free, on how to program such code.
Hope it helps.
I would consider the "Thread Proxy Mediator Pattern". Example here on CodeProject
Basically all method calls on your Adaptors run on a worker thread and all results are returned on the UI thread.
The recommended way is to run the work on threads started from the GUI, and then update your controls with Control.Invoke().
If you don't want to manage threads yourself in your GUI application, you can use the BackgroundWorker class.
The best practice is to have some logic in your Forms, normally a public method, that lets your controls be updated from outside. When this call is made from a thread that is not the main thread, you must guard against illegal cross-thread access using control.InvokeRequired/control.Invoke() (where control is the target control to update).
Take a look at this AsynCalculatePi example; maybe it's a good starting point.

How to Purge a ThreadPool? [Microsoft System.Threading.ThreadPool]

Is it possible to purge a ThreadPool?
Remove items from the ThreadPool?
Anything like that?
ThreadPool.QueueUserWorkItem(GetDataThread);
RegisteredWaitHandle Handle = ThreadPool.RegisterWaitForSingleObject(CompletedEvent, WaitProc, null, 10000, true);
Any thoughts?
I recommend using the Task class (added in .NET 4.0) if you need this kind of behaviour. It supports cancellation, and you can have any number of tasks listening to the same cancellation token, which enables you to cancel them all with a single method call.
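As a rough sketch of what that looks like (CancellationDemo and Worker are illustrative names, and the sleeping loop stands in for real work), several tasks can observe one token and be cancelled together:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Placeholder workload: polls the token between small units of "work".
    private static void Worker(CancellationToken token)
    {
        while (true)
        {
            token.ThrowIfCancellationRequested();
            Thread.Sleep(10);
        }
    }

    // Starts `count` tasks that all observe the same token, cancels them with
    // a single call, and returns how many ended in the Canceled state.
    public static int StartAndCancel(int count)
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            var tasks = new Task[count];
            for (int i = 0; i < count; i++)
            {
                // Passing the token to StartNew also cancels tasks that have not started yet.
                tasks[i] = Task.Factory.StartNew(() => Worker(token), token);
            }

            cts.Cancel(); // one call reaches every task

            try
            {
                Task.WaitAll(tasks);
            }
            catch (AggregateException)
            {
                // Expected: cancelled tasks surface as TaskCanceledException here.
            }

            int canceled = 0;
            foreach (Task task in tasks)
            {
                if (task.IsCanceled) canceled++;
            }
            return canceled;
        }
    }
}
```

Note that cancellation is cooperative: a task that never checks the token keeps running until it finishes by itself.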
Updated (non-4.0 solution):
You really only have two choices. One: implement your own event demultiplexer (this is far more complex than it appears, due to the 64-handle wait limitation); I can't recommend this - I had to do it once (in unmanaged code), and it was hideous.
That leaves the second choice: Have a signal to cancel the tasks. Naturally, RegisteredWaitHandle.Unregister can cancel the RWFSO part. The QUWI is more complex, but can be done by making the action aware of a "token" value. When the action executes, it first checks the token value against its stored token value; if they are different, then it shouldn't do anything.
One major thing to consider is race conditions. Just keep in mind that there is a race condition between cancelling an action and the ThreadPool executing it, so it is possible to see actions running after cancellation.
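The "token" idea for QueueUserWorkItem can be sketched as follows. PurgeableWorkQueue, Wrap and Purge are hypothetical names invented for this example; this is not the CallbackContext type from the blog post.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: work items queued before Purge() turn into no-ops.
public sealed class PurgeableWorkQueue
{
    private int _currentToken; // incremented on every purge

    private int ReadToken()
    {
        // CompareExchange with identical values is a thread-safe read.
        return Interlocked.CompareExchange(ref _currentToken, 0, 0);
    }

    // Captures the token at queue time and checks it again at execution time.
    public Action Wrap(Action action)
    {
        int tokenAtQueueTime = ReadToken();
        return () =>
        {
            // A stale token means the item was "purged" after it was queued.
            if (ReadToken() != tokenAtQueueTime) return;
            action();
        };
    }

    public void Queue(Action action)
    {
        Action guarded = Wrap(action);
        ThreadPool.QueueUserWorkItem(state => guarded());
    }

    // Invalidates everything queued so far that has not yet passed its token check.
    public void Purge()
    {
        Interlocked.Increment(ref _currentToken);
    }
}
```

The race mentioned above is still present: an item that has already passed its token check when Purge is called will run to completion.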
I have a blog post on this concept, which I call "asynchronous callback contexts". The CallbackContext type mentioned in the blog post is available in the Nito.Async library.
There's no interface for removing a queued item. However, nothing stops you from "poisoning" the delegate so that it returns immediately.
edit
Based on what Paul said, I'm thinking you might also want to consider a pipelined architecture, where you have a fixed number of threads reading from a blocking queue (like .NET 4.0's BlockingCollection on a ConcurrentQueue). This way, if you want to cancel items, you can just access the queue yourself.
Having said that, Stephen's advice about Task is likely better, in that it gives you all the control you would realistically want, without all the hard work that rolling your own pipelines involves. I mention this only for completeness.
The ThreadPool exists to help you manage your threads. You should not have to worry about purging it at all since it will make the best performance decisions on your behalf.
If you think you need tighter control over your threads then you could consider creating your own thread management class (similar to ThreadPool) but it would take a lot of work to match and exceed the functionality that ThreadPool has built in.
Take a look here at some of the ThreadPool optimizations and the ideas behind it.
For my second point, I found an article on Code Project that implements a "Cancelable Threadpool", probably for some of your own similar reasons. It would be a good place to start looking if you're going to write your own.

C# Delegates and Threads!

What exactly do I need delegates and threads for?
Delegates act as the logical (but safe) equivalent to function-pointers; they allow you to talk about an operation in an abstract way. The typical example of this is events, but I'm going to use a more "functional programming" example: searching in a list:
List<Person> people = ...
Person fred = people.Find( x => x.Name == "Fred");
Console.WriteLine(fred.Id);
The "lambda" here is essentially an instance of a delegate - a delegate of type Predicate<Person> - i.e. "given a person, is something true or false". Using delegates allows very flexible code - i.e. the List<T>.Find method can find all sorts of things based on the delegate that the caller passes in.
In this way, they act largely like a 1-method interface - but much more succinctly.
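To make the "1-method interface" comparison a little more concrete, here is a small sketch with a custom delegate type. DelegateDemo, IntTest, IsEven and Filter are names made up for the example; in real code the built-in Predicate<T> or Func<T, bool> would usually be used instead of declaring a new delegate type.

```csharp
using System;
using System.Collections.Generic;

public static class DelegateDemo
{
    // A delegate type: any method taking an int and returning a bool matches it.
    public delegate bool IntTest(int value);

    private static bool IsEven(int value)
    {
        return value % 2 == 0;
    }

    // The caller decides *what* to test; this method only knows *how* to filter.
    public static List<int> Filter(IEnumerable<int> source, IntTest test)
    {
        var result = new List<int>();
        foreach (int item in source)
        {
            if (test(item)) result.Add(item);
        }
        return result;
    }

    // A named method can be passed wherever the delegate type is expected.
    public static List<int> Evens(IEnumerable<int> source)
    {
        return Filter(source, IsEven);
    }
}
```

A lambda works just as well as a named method, for example DelegateDemo.Filter(numbers, x => x > 2).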
Delegates: Basically, a delegate is a way to reference a method. It's like a pointer to a method, which you can set to different methods that match its signature and use to pass the reference to that method around.
Threads: A thread is a sequential stream of instructions that execute one after another to complete a computation. You can have different threads running simultaneously to accomplish a specific task. At any given moment, a thread runs on a single logical processor.
Delegates are used to add methods to events dynamically.
Threads run inside of processes, and allow you to run 2 or more tasks at once that share resources.
I'd suggest searching on these terms; there is plenty of information out there. They are pretty fundamental concepts, and Wikipedia is a high-level place to start:
http://en.wikipedia.org/wiki/Thread_(computer_science)
http://en.wikipedia.org/wiki/C_Sharp_(programming_language)
Concrete examples always help me so here is one for threads. Consider your web server. As requests arrive at the server, they are sent to the Web Server process for handling. It could handle each as it arrives, fully processing the request and producing the page before turning to the next one. But consider just how much of the processing takes place at hard drive speeds (rather than CPU speeds) as the requested page is pulled from the disk (or data is pulled from the database) before the response can be fulfilled.
By pulling threads from a thread pool and giving each request its own thread, we can take care of the non-disk needs for hundreds of requests before the disk has returned data for the first one. This will permit a degree of virtual parallelism that can significantly enhance performance. Keep in mind that there is a lot more to Web Server performance but this should give you a concrete model for how threading can be useful.
They are useful for the same reason high-level languages are useful. You don't need them for anything, since really they are just abstractions over what is really happening. They do make things significantly easier and faster to program or understand.
Marc Gravell provided a nice answer for 'what is a delegate.'
Andrew Troelsen defines a thread as
...a path of execution within a process. "Pro C# 2008 and the .NET 3.5 Platform," APress.
All processes that are run on your system have at least one thread. Let's call it the main thread. You can create additional threads for any variety of reasons, but the clearest example for illustrating the purpose of threads is printing.
Let's say you open your favorite word processing application (WPA), type a few lines, and then want to print those lines. If your WPA uses the main thread to print the document, the WPA's user interface will be 'frozen' until the printing is finished. This is because the main thread has to print the lines before it can process any user interface events, i.e., button clicks, mouse movements, etc. It's as if the code were written like this:
do
{
    ProcessUserInterfaceEvents();
    PrintDocument();
} while (true);
Clearly, this is not what users want. Users want the user interface to be responsive while the document is being printed.
The answer, of course, is to print the lines in a second thread. In this way, the user interface can focus on processing user interface events while the secondary thread focuses on printing the lines.
The illusion is that both tasks happen simultaneously. On a single processor machine, this cannot be true since the processor can only execute one thread at a time. However, switching between the threads happens so fast that the illusion is usually maintained. On a multi-processor (or multi-core) machine, this can be literally true since the main thread can run on one processor while the secondary thread runs on another processor.
In .NET, threading is a breeze. You can utilize the System.Threading.ThreadPool class, use asynchronous delegates, or create your own System.Threading.Thread objects.
If you are new to threading, I would throw out two cautions.
First, you can actually hurt your application's performance if you choose the wrong threading model. Be careful to avoid using too many threads or trying to thread things that should really happen sequentially.
Second (and more importantly), be aware that if you share data between threads, you will likely need to synchronize access to that shared data, e.g., using the lock keyword in C#. There is a wealth of information on this topic available online, so I won't repeat it here. Just be aware that you can run into intermittent, not-always-repeatable bugs if you do not do this carefully.
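As a small illustration of the second caution, here is a counter shared between threads and protected with lock. SafeCounter and CountInParallel are names invented for this sketch.

```csharp
using System.Threading;

public sealed class SafeCounter
{
    private readonly object _sync = new object();
    private int _value;

    public void Increment()
    {
        // Only one thread at a time may run the read-modify-write below.
        lock (_sync)
        {
            _value++;
        }
    }

    public int Value
    {
        get { lock (_sync) { return _value; } }
    }

    // Runs several threads that all increment the same counter.
    public static int CountInParallel(int threads, int incrementsPerThread)
    {
        var counter = new SafeCounter();
        var workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int n = 0; n < incrementsPerThread; n++)
                {
                    counter.Increment();
                }
            });
            workers[i].Start();
        }
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
        return counter.Value;
    }
}
```

Without the lock, _value++ is a separate read, add and write, so two threads can overwrite each other's update and the final count can come out lower than expected on some runs but not others.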
Your question is too vague...
But you probably just want to know how to use them in order to have a window, a time consuming process running and a progress bar...
So create a thread to do the time consuming process and use the delegates to increase the progress bar! :)
