I am fairly new to C#. I like to think I have a good understanding of object-oriented programming in general, and of what running multiple threads means at a high level, but when it comes to actual implementation I am, as I said, a novice.
What I want to do is create a tool with many threads running and interacting with each other independently; each will serve its own task and may call the others.
My strategy for ensuring communication (without losing anything when multiple updates occur at the same time from different threads) is to give each class a spool-like queue of tasks that can be called externally to add work for a given thread, or a spool service for this. I am not sure whether I should place the spool on the class itself, or externally and have the class call the spool for new tasks and keep track of it. In particular I am considering how to signal the class when an empty spool receives a task: a listener approach, where tasks can subscribe to spools if they want to be woken when new work arrives, or a "check every X seconds whether we are out of tasks and the next task is not scheduled" approach.
What would be a good strategy here: should I build this into the actual class, or externally? And what are the critical regions in the implementation? The busy-wait check only requires adding and removing jobs on the spool to be critical, while signaling requires adding/removing jobs and also the go-to-sleep step to be critical. That suddenly places a high requirement on the spool for what to do if the critical region has already been entered, as this could result in blocks causing other blocks, and possibly unforeseen deadlocks.
I use such a model often, on various systems. I define a class for the agents, say 'AgentClass', and one for the requests, say 'RequestClass'. The agent has two abstract methods, 'submit(RequestClass *message)' and 'signal()'. Typically, a thread in the agent constructs a producer-consumer queue and waits on it for RequestClass instances, the submit() method queueing the passed RequestClass instances to the queue. The RequestClass usually contains a 'command' enumeration that tells the agent what needs doing, together with all data required to perform the request and the 'sender' agent instance. When an agent gets a request, it switches on the enumeration to call the correct function to do the request. The agent acts only on the data in the RequestClass - results, error messages etc. are placed in data members of the RequestClass. When the agent has performed the request (or failed and generated error data), it can either submit() the request back to the sender (i.e. the request has been performed asynchronously), or call the sender's signal() function, which signals an event upon which the sender was waiting (i.e. the request was performed synchronously).
I usually construct a fixed number of RequestClass instances at startup and store them in a global 'pool' P-C queue. Any agent/thread/whatever that needs to send a request can dequeue a RequestClass instance, fill in data, submit() it to the agent and then wait asynchronously or synchronously for the request to be performed. When done with, the RequestClass is returned to the pool. I do this to avoid continual malloc/free/new/dispose, to ease debugging (I dump the pool level to a status bar using a timer, so I always notice if a request leaks or gets double-freed), and to eliminate the need for explicit thread termination on app close (if multiple threads only ever read/write data areas that outlive the application forms etc., the app will close easily and the OS can deal with all the threads - there are hundreds of posts about 'cleanly shutting down threads upon app close' - I never bother!).
Such message-passing designs are quite resistant to deadlocks since the only locks, (if any), are in the P-C queues, though you can certainly achieve it if you try hard enough:)
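A minimal C# sketch of the pattern, assuming BlockingCollection<T> as the producer-consumer queue and an AutoResetEvent for the synchronous signal() path; the Command values and the string result are purely illustrative:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative command set; the real enumeration depends on the application.
public enum Command { DoWork, Quit }

public class RequestClass
{
    public Command Command;
    public AgentClass Sender;   // agent to notify when the request is done
    public object Data;         // input for the request
    public object Result;       // output, filled in by the servicing agent
}

public class AgentClass
{
    // The producer-consumer queue is the only synchronization point.
    private readonly BlockingCollection<RequestClass> _queue =
        new BlockingCollection<RequestClass>();
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);

    public AgentClass()
    {
        new Thread(Run) { IsBackground = true }.Start();
    }

    // Other agents call this to queue work.
    public void Submit(RequestClass message) { _queue.Add(message); }

    // Called by a servicing agent to wake a synchronously waiting sender.
    public void Signal() { _signal.Set(); }
    public void WaitForSignal() { _signal.WaitOne(); }

    private void Run()
    {
        foreach (var req in _queue.GetConsumingEnumerable())
        {
            switch (req.Command)
            {
                case Command.DoWork:
                    req.Result = "done: " + req.Data;  // real work goes here
                    if (req.Sender != null) req.Sender.Signal();
                    break;
                case Command.Quit:
                    return;
            }
        }
    }
}
```

A sender dequeues or constructs a RequestClass, calls worker.Submit(req), then either waits on its own WaitForSignal() (synchronous) or carries on and receives the request back via its own queue (asynchronous).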
Is this the sort of system you need, or have I got it wrong?
Rgds,
Martin
I am trying to implement a queue to handle tasks queued by calling Task.Run(() => function()).
Here are my requirements:
1) The thread pool should not stop after the application context has died. I need to be sure the calls will be made. (I believe this requires the thread to be a foreground thread.)
2) Additionally, I need to be able to log these errors. Each function has its own error handling implemented within. I would consider it fire-and-forget because I don't actually need to pass the data back to the caller, but the information needs to be logged.
3) The queue will remove tasks as they complete. I may need some way of managing the size of the queue to prevent overuse of resources - possibly by setting a time limit for each task and forcing it to cancel after the allotted time, to free space in the queue.
Specification:
- .Net 4.0 Framework
- IIS
I was able to achieve the desired functionality by referencing Stephen Cleary's AspNetBackgroundTasks.
Using a singleton pattern, I created a single instance of an object which acts as a wrapper for managing the tasks. The code prevents shutdown by using IRegisteredObject.
Upon a pending shutdown, ASP.NET notifies the object. Using a TaskCompletionSource whose state is only set to completed when the running-task count reaches zero, the application awaits all tasks before allowing shutdown to proceed.
There are risks in this design, very similar to those of notification systems that run entirely in main memory: if power is lost, the queued work is lost.
Additionally, remember to make atomic changes to shared variables, or use thread-safe locking techniques.
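A rough sketch of that wrapper, loosely following the AspNetBackgroundTasks idea; the class and method names are illustrative, and real code would add logging where the comments indicate:

```csharp
using System;
using System.Threading.Tasks;
using System.Web.Hosting; // ASP.NET / IIS hosting

// Singleton that keeps fire-and-forget work alive across request lifetimes.
public sealed class BackgroundTaskManager : IRegisteredObject
{
    public static readonly BackgroundTaskManager Instance = new BackgroundTaskManager();

    private readonly object _gate = new object();
    private readonly TaskCompletionSource<object> _allDone =
        new TaskCompletionSource<object>();
    private int _running;
    private bool _stopping;

    private BackgroundTaskManager()
    {
        // Registering delays app-domain shutdown until Stop() completes.
        HostingEnvironment.RegisterObject(this);
    }

    public void Run(Action work)
    {
        lock (_gate) { _running++; }
        Task.Factory.StartNew(() =>
        {
            try { work(); }
            catch (Exception) { /* log here; each task also handles its own errors */ }
            finally { OnTaskDone(); }
        });
    }

    private void OnTaskDone()
    {
        lock (_gate)
        {
            _running--;
            // Complete the TCS only when shutdown is pending and count hits zero.
            if (_stopping && _running == 0)
                _allDone.TrySetResult(null);
        }
    }

    // ASP.NET calls this when the app domain is about to shut down.
    public void Stop(bool immediate)
    {
        lock (_gate)
        {
            _stopping = true;
            if (_running == 0) _allDone.TrySetResult(null);
        }
        _allDone.Task.Wait(); // hold shutdown until all tasks finish
        HostingEnvironment.UnregisterObject(this);
    }
}
```

Callers just use BackgroundTaskManager.Instance.Run(() => DoWork()); the .NET 4.0 constraint rules out async/await here, which is why the counting is done with a plain lock.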
I have a bunch of threads blocked waiting for a message. Each message has an ID which points to a specific thread. I have the following implementations:
1) All threads wait on the same lock object using Monitor.Wait. When a message comes in, I call Monitor.PulseAll and each thread compares its own ID with the message ID. If there is a match, the thread continues; otherwise it waits again on the same object. With this approach, every message arrival causes N-1 threads to wake up, fail the ID check, and go back to sleep.
2) Each thread creates a ManualResetEvent and adds it to a dictionary that maps message ID to its event. When a message arrives, it calls map[message.Id].Set(), which wakes up the specific thread.
3) This last implementation is very similar to #2, except that it uses a lock object instead of a ManualResetEvent. The hypothesis is that ManualResetEvent is an expensive object, though this approach is more complex than the ManualResetEvent one.
What's the best approach here? Is there a better one?
The question description is fairly vague, so it's hard to know for sure what your best approach would be. That said…
I would not use #1 or #2 at all. #1 requires waking every thread up just so one thread can run, which is obviously inefficient, and #2 uses unmanaged Windows-based synchronization objects, which are not as efficient as a built-in .NET mechanism.
Your #3 option is on the face of it not unreasonable given the problem description. However, IMHO you should not be reimplementing this yourself. I.e. as near as I can tell, you (for some reason) have messages that need to be provided to specific threads, i.e. a given message must be processed only by one specific thread.
In this case, I think you should just create a separate message queue for each thread, and add the message to the appropriate queue. There are lots of ways to implement the queue, but the most obvious for this particular example seems to me to be to use BlockingCollection<T>. By default, this uses a queue as the underlying collection data structure. The other feature that's important here is the GetConsumingEnumerable() method, which allows you to write a foreach loop in the dependent thread to retrieve messages as they are queued. The loop will block when no message is available, waiting for one to be provided via some other thread.
You can use a dictionary to map message ID to the appropriate queue for each thread.
Note that this not really IMHO a performance issue. It's more about using an appropriate data structure for the given problem. I.e. you seem to have a queue of messages, where you want to dispatch each message to a different thread depending on its ID. Instead, I think you should implement multiple queues of messages, one for each thread, and then use the existing .NET features to implement your logic so that you don't have to reinvent the wheel.
Note also that if you still must maintain a single input queue for the messages (e.g. because that's the interface presented to some other component in your program), you can and should still do the above. You'll just have some adapter code that dequeues a message from the main, single message queue and routes to the appropriate thread-specific queue.
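A minimal sketch of the per-thread-queue idea, assuming messages carry an integer ID; the Dispatcher name and Message shape are illustrative:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public class Message
{
    public int Id;        // identifies the target thread
    public string Body;
}

public class Dispatcher
{
    // One queue per consumer thread, keyed by message ID.
    // Register all consumers before dispatching; the dictionary itself
    // is not locked here.
    private readonly Dictionary<int, BlockingCollection<Message>> _queues =
        new Dictionary<int, BlockingCollection<Message>>();

    public void StartConsumer(int id, Action<Message> handler)
    {
        var queue = new BlockingCollection<Message>();
        _queues[id] = queue;
        new Thread(() =>
        {
            // Blocks while empty; exits when CompleteAdding is called.
            foreach (var msg in queue.GetConsumingEnumerable())
                handler(msg);
        }) { IsBackground = true }.Start();
    }

    // Adapter: route an incoming message to the right thread's queue.
    public void Dispatch(Message msg) { _queues[msg.Id].Add(msg); }

    public void Shutdown()
    {
        foreach (var q in _queues.Values) q.CompleteAdding();
    }
}
```

The Dispatch method is exactly the adapter described above: if some other component hands you a single input stream, one loop can pull from it and call Dispatch to fan messages out to the thread-specific queues.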
To start a thread - Thread.Start(). (Note that Start() only starts a thread once; waking a thread that is blocked requires a synchronization primitive such as an event.)
To check whether your thread hasn't aborted yet - the Thread.IsAlive property.
To check what state a thread is in - the Thread.ThreadState property (a property, not a method).
You can use the above properties and methods to get the desired control over the threads and manage them at a fine granularity.
When you initialize the threads, put them all in a Dictionary<ID, Thread>. Now, whenever you get a message, simply look up the thread with the required ID and wake it up.
Lets assume that I have a several layers:
Manager that reads data from a socket
Manager that subscribes to #1 and takes care of persisting the data
Manager that subscribes to #2 and takes care of deserializing the data and propagating it to typed managers that are interested in certain event types
WPF Controllers that display the data (are subscribed to #3)
As of right now I use
TaskFactory.StartNew(()=>subscriber.Publish(data));
on each layer. The reason is that I don't want to rely on every manager doing its work quickly - e.g. the Socket manager must never get stuck.
Is this a good approach?
Edit
Let's say that Socket manager receives a price update
There are 10 managers subscribed to the Socket manager, so when the Socket manager propagates the message, .StartNew is called 10 times.
Managers #2 and #3 do nothing but propagate the message via .StartNew to a single subscriber.
So ultimately, per 1 message from the socket, .StartNew() is called 30 times.
It seems a reasonable approach.
However, if one could meaningfully do:
subscriber.PublishAsync(data).LogExceptions(Log);
Where LogExceptions is something like:
// I'm thinking of Log4Net here, but of course something else could be used.
public static Task LogExceptions(this Task task, ILog log)
{
    return task.ContinueWith(ta => LogFailedTask(ta, log),
                             TaskContinuationOptions.OnlyOnFaulted);
}

private static void LogFailedTask(Task ta, ILog log)
{
    var aggEx = ta.Exception;
    if (aggEx != null)
    {
        log.Error("Error in asynchronous event");
        int errCount = 0;
        foreach (var ex in aggEx.InnerExceptions)
            log.Error("Asynchronous error " + ++errCount, ex);
    }
}
That way, fire-and-forget use of tasks still has its errors logged, and if PublishAsync in turn makes use of tasks where appropriate, I'd be happier still. In particular, if the "publishing" includes anything that would block a thread but can be handled with async - such as writing to or reading from a database or file system - then the thread use could scale better.
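For completeness, a hypothetical PublishAsync could be as simple as wrapping the synchronous Publish call in a task, so that the returned Task can be chained with the LogExceptions extension above; the ISubscriber interface here is assumed, not part of the original code:

```csharp
using System.Threading.Tasks;

public interface ISubscriber
{
    void Publish(object data);
}

public static class SubscriberExtensions
{
    // Hypothetical PublishAsync: runs the synchronous Publish on a task so
    // that callers can observe (and log) any exception via the returned Task.
    public static Task PublishAsync(this ISubscriber subscriber, object data)
    {
        return Task.Factory.StartNew(() => subscriber.Publish(data));
    }
}
```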
Regarding Task.Run vs. TaskFactory.StartNew, they are essentially identical under the hood. Please read the following link: http://blogs.msdn.com/b/pfxteam/archive/2014/12/12/10229468.aspx
Even though these methods use the ThreadPool for decent performance, there is overhead associated with constantly creating new Tasks on the fly. Task is generally used for infrequent, fire-and-forget type workloads. Your statement of "30x .StartNew() per 1 message from the socket" is a bit concerning. How often do socket messages arrive? If you are really concerned with latency, I think the better way is for each manager to have its own dedicated thread. You can use a blocking-queue implementation so that each thread blocks waiting to consume input items from its parent's queue. This would be preferable to a simple spinlock, for example.
This is the sort of architecture used regularly in financial market messaging subscription and decoding that needs the fastest possible performance. Also keep in mind that more threads do not always equate to faster performance. If the threads have any shared data dependencies, they will all be contending for the same locks, causing context switching on one another, etc. This is why a preset number of dedicated threads can usually win out vs. a greater number of threads created on-the-fly. The only exception I can think of would be "embarrassingly parallel" tasks where there are no shared data dependencies at all. Note that dependencies can exist on both the input side and the output side (anywhere there is a lock the threads could run into).
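A minimal sketch of the dedicated-thread-per-manager idea, using BlockingCollection as the blocking queue; the Manager name and subscription wiring are illustrative:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// One dedicated thread per manager: the thread blocks on its own queue
// instead of a new Task being started for every message.
public class Manager<T>
{
    private readonly BlockingCollection<T> _inbox = new BlockingCollection<T>();
    private readonly List<Manager<T>> _subscribers = new List<Manager<T>>();
    private readonly Action<T> _process;

    public Manager(Action<T> process)
    {
        _process = process;
        new Thread(Run) { IsBackground = true }.Start();
    }

    // Wire up subscriptions before posting; the list is not locked here.
    public void Subscribe(Manager<T> subscriber) { _subscribers.Add(subscriber); }

    public void Post(T item) { _inbox.Add(item); }

    private void Run()
    {
        foreach (var item in _inbox.GetConsumingEnumerable())
        {
            _process(item);
            // Propagate by queueing to subscribers, not by StartNew.
            foreach (var s in _subscribers) s.Post(item);
        }
    }
}
```

Per socket message the only cost is one Add per queue along the chain; no tasks are created, and each layer's thread stays warm waiting on its own collection.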
Is there a way in C# to send a message to another thread based on the thread's thread id or name?
Basically, for a project in school, my professor wants us to do a producer/consumer deal, but passing objects serialized to a string (such as XML) from producer to consumer. Once a string is pulled from a buffer in the consumer thread, each of those strings is decoded (including the thread ID) and processed, and the original producer is notified via callback. So how do I send an event to the original producer thread given just the thread ID?
You can write a class which has a Dictionary<string, Thread> member containing all your threads. When you create a thread, add it to the dictionary so you can look it up by name (key) later from anywhere in the class. This way you can also share resources among your threads, but be sure to lock any shared resources to prevent concurrency issues.
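A minimal sketch of such a class; the ThreadRegistry name is illustrative, and the lock guards the shared dictionary as the answer advises:

```csharp
using System.Collections.Generic;
using System.Threading;

public class ThreadRegistry
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, Thread> _threads =
        new Dictionary<string, Thread>();

    public Thread Create(string name, ThreadStart work)
    {
        var t = new Thread(work) { Name = name, IsBackground = true };
        lock (_lock) { _threads[name] = t; }  // the dictionary is shared state
        t.Start();
        return t;
    }

    public Thread GetByName(string name)
    {
        lock (_lock)
        {
            Thread t;
            return _threads.TryGetValue(name, out t) ? t : null;
        }
    }
}
```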
Imagine you run a company, and you could hire as many employees as you liked, but each employee was really single-minded, so you could only give them one order ever. You couldn't get much done with them, right? So if you were a smart manager, what you'd do is say "Your order is 'wait by your inbox until you get a letter telling you what to do, do the work, and then repeat'". Then you could put work items into the worker's inboxes as you needed work done.
The problem then is what happens if you give an employee a long-running, low priority task (let's say, "drive to Topeka to pick up peanut butter for the company picnic"). The employee will happily go off and do that. But then the building catches fire, and you need to know that if you issue the order "grab the fire extinguisher and put the fire out!" someone is going to do that quickly. You can solve that problem by having multiple employees share a single inbox- that way, there is a higher probability that someone will be ready to execute the order to douse the flames, and not be off driving through Kansas.
Guess what? Threads are those difficult employees.
You don't "pass messages to a thread". What you can do is set up a thread or group of threads to observe a common, shared data structure such as a blocking queue (BlockingCollection in .NET, for example), and then put messages (like your strings) into that queue for processing by the consumer threads (which need to listen on the queue for work).
For bidirectional communication, you would need two queues (one for the message, and one for the response). The reason is that your "main" thread is also a bad employee- it only can process responses one at a time, and while it is processing a response from one worker, another worker might come back with another response. You'd want to build a request/response coordination protocol so that the original requestor knows which request a response is associated with- usually requests have an ID, and responses reference the request's ID so that the original requestor knows which request each response is for.
Finally you need proper thread synchronization (locking) on the queues if that isn't built in to the Producer/Consumer queue that you are working with. Imagine if you were putting a message into a worker's inbox, and that worker was so eager to read the message that he grabbed it from your hand and tore it in half. What you need is the ability to prevent more than one thread from accessing the queue at a time.
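A minimal sketch of the two-queue arrangement, with each response carrying its request's ID; all names here are illustrative, and BlockingCollection provides the built-in locking described above:

```csharp
using System.Collections.Concurrent;
using System.Threading;

public class Request  { public int Id; public string Payload; }
public class Response { public int RequestId; public string Result; }

public class RequestRouter
{
    // Two shared queues: requestors and workers only ever talk through these.
    public readonly BlockingCollection<Request> Requests =
        new BlockingCollection<Request>();
    public readonly BlockingCollection<Response> Responses =
        new BlockingCollection<Response>();

    public void StartWorker()
    {
        new Thread(() =>
        {
            foreach (var req in Requests.GetConsumingEnumerable())
            {
                // The response echoes the request's ID so the original
                // requestor can match each response to its request.
                Responses.Add(new Response
                {
                    RequestId = req.Id,
                    Result = req.Payload.ToUpper()
                });
            }
        }) { IsBackground = true }.Start();
    }
}
```

A requestor adds to Requests and blocks on Responses.Take(). With several concurrent requestors, a real design would route each response back to its own requestor (e.g. a per-requestor queue or a dictionary of pending requests) keyed by the request ID.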
When using threads you do not try to send messages between them directly. Threads can use shared memory to synchronize themselves - via so-called synchronization objects. To manage threads in a consumer/producer system you can use a queue (a data structure) rather than a message system (see this example: C# producer/consumer).
Another possible solution (which I would not recommend) is: you can use GetThreadId to return the ID of a given native thread - all you need to find is the thread handle to pass to that function - while GetCurrentThreadId returns the ID of the current thread. From there you can access its name property.
A message is simply a method call, and to make a method call you first need an instance object which exposes some methods to be called; thus sending a message to a thread means finding the active object that lives in that thread and calling its specific method.
Finding each thread's main worker object could be handled through a threads coordinator: if an object in one thread wants to send a message to an object in another thread, it first sends its request to the coordinator, which forwards the message/request to its destination.
How exactly does a handle relate to a thread? I am writing a service that accepts an HTTP request and calls a method before returning a response. I have written a test client that sends out 10,000 HTTP requests (using a semaphore to make sure that only 1,000 requests are in flight at a time).
If I call the method (the method processed before returning a response) through the ThreadPool, or through a generic Action<T>.BeginInvoke, the service's handle count goes way up and stays there until all the requests have finished, but the service's thread count stays pretty much flat.
However, if I synchronously call the method before returning the response, the thread count goes up, but the handle count goes through extreme peaks and valleys.
This is C# on a Windows machine (Server 2008).
Your description is too vague for a firm diagnosis. But the ThreadPool was designed to carefully throttle the number of active threads: it avoids running more threads than you have CPU cores, and only when a thread gets "stuck" does it schedule an extra one. That explains why you see the number of threads not increase wildly - and, indirectly, why the handle count stays stable, because the machine is doing less work at once.
You can think of a handle as an abstraction of a pointer. Lots of things in Windows use handles (when you open a file at the API level you get a handle to the file, when you create a window the window has a handle, a thread has a handle, etc.). So your handle count probably relates to the operations occurring on your threads: the more threads running, the more is going on at the same time, and the more handles you will see open.