I'm working on my first ThreadPool application in Visual Studio 2008 with C#.
I have a report that has to perform calculations on 2000 to 4000 parts using data on our SQL Server.
I queue all of the part numbers to the ThreadPool, where they go off and calculate their results. When each thread finishes, the RegisterWaitForSingleObject callback fires to unregister the queued item's wait handle.
After all of the Queued Items have finished, is there a way to remove them from the ThreadPool?
The way it looks, if someone runs another report using a new set of 2000 to 4000 parts, I have no way of removing the previous array of parts.
How would I remove the previously queued items? Would calling SetMaxThreads with workerThreads = 0 do it?
I realize I could experiment, but then I could waste most of the week experimenting.
Thanks for your time,
Joe
Once a ThreadPool item completes, it is automatically removed from the queue. What is indicating to you that they aren't?
Assuming you mean to interrupt (cancel) the work on the current queue...
Changing the max-threads won't affect the pending work; it'll just change the number of threads available to do it - and it is generally a bad idea to mess with this (your code isn't the only thing using the ThreadPool). I would use a custom queue - it is fairly easy to write a basic (thread-safe) producer/consumer queue, and .NET 4.0 includes some very good building blocks for this (ConcurrentQueue<T> and BlockingCollection<T>).
Then you can just abort the custom queue and start a new one.
I wrote a simple one here; at the moment it wants to exit cleanly (i.e. drain the queue until it is empty) before terminating, but it would be easy enough to add a flag that stops immediately after the current item (don't resort to interrupting/aborting threads at an arbitrary point in execution; that's never a good idea).
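A minimal sketch of such a stoppable producer/consumer queue (the class and member names are illustrative, not the code from the linked post):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Thread-safe producer/consumer queue; Stop() discards pending work,
// lets each worker finish the item in hand, then exits (no Thread.Abort).
class WorkQueue
{
    private readonly Queue<Action> items = new Queue<Action>();
    private readonly List<Thread> workers = new List<Thread>();
    private bool stopping;

    public WorkQueue(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
        {
            var t = new Thread(Consume) { IsBackground = true };
            t.Start();
            workers.Add(t);
        }
    }

    public void Enqueue(Action work)
    {
        lock (items)
        {
            items.Enqueue(work);
            Monitor.Pulse(items);   // wake one waiting worker
        }
    }

    public void Stop()
    {
        lock (items)
        {
            stopping = true;
            items.Clear();          // drop anything not yet started
            Monitor.PulseAll(items);
        }
        foreach (var t in workers) t.Join();
    }

    private void Consume()
    {
        while (true)
        {
            Action work;
            lock (items)
            {
                while (items.Count == 0 && !stopping) Monitor.Wait(items);
                if (stopping) return;
                work = items.Dequeue();
            }
            work();                 // run outside the lock
        }
    }
}
```

Starting a fresh report is then just a matter of calling `Stop()` on the old queue and constructing a new one.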
Related
I'm in process of writing a multi threaded application. Here's my case.
I grab a thousand records from the database, divide them into 5 chunks of list objects, and create 5 threads to process them. I do the same thing every minute, as long as there are records remaining in the database:
Task.Factory.StartNew(() => ProcessRecords(listRecords))
Inside the ProcessRecords method there is a small database update and some mail sending. (I'm using System.Net.Mail for email and no ORM for the db operations.)
Now I am worried that a thread might not complete because of some unknown issue. What will happen in that situation? Let's say one task (or even more) keeps waiting on a database deadlock or something similar: the application will keep adding new tasks with new sets of records while the never-ending tasks pile up. How can I implement something like a timeout in this situation?
I want to run this process, terminate it in 5 minutes if it is not able to complete it.
Check out CancellationToken (created via a CancellationTokenSource). You can use that to cancel the task if you decide (by whatever means you prefer) that it's been running too long.
Alternatively, you could build that into the ProcessRecords() method itself: just have it commit seppuku if it runs too long by having it track its own start time and checking the elapsed time now and then; could be simpler.
That said, if you haven't already given it a shot, you might check to see whether .AsParallel() will save you some headaches here. There are a lot of cases where you can leave your parallelization woes to the framework entirely.
Parallel.ForEach(db.Records, r => ProcessRecord(r));
Edit:
Parallel.ForEach(db.Records, ProcessRecord);
Yes. :)
Further edit:
For the OP: no, the TaskFactory doesn't offer anything like that out of the box. If you want to terminate the work from outside the task, you'll need to roll your own mechanism using some kind of watcher thread that keeps track of which tasks you have running, how long they've been running, and their respective cancellation tokens (or maybe just a bool you check at the top of a while loop...).
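A sketch of the cancellation approach, assuming .NET 4.5's CancellationTokenSource.CancelAfter is available (FetchRecords and ProcessRecord are placeholders for the OP's own methods). Note that cancellation is cooperative: a task blocked inside a database call will only stop at the next point where it checks the token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var cts = new CancellationTokenSource();
        cts.CancelAfter(TimeSpan.FromMinutes(5));   // give the batch 5 minutes

        var task = Task.Factory.StartNew(() =>
        {
            foreach (var record in FetchRecords())
            {
                // Cooperative check: throws OperationCanceledException once cancelled.
                cts.Token.ThrowIfCancellationRequested();
                ProcessRecord(record);
            }
        }, cts.Token);

        try
        {
            task.Wait();
        }
        catch (AggregateException)
        {
            Console.WriteLine("Batch was cancelled after the timeout.");
        }
    }

    static int[] FetchRecords() { return new int[0]; }  // placeholder
    static void ProcessRecord(int record) { }           // placeholder
}
```

On .NET 4.0 the same effect can be had with a System.Threading.Timer that calls cts.Cancel() after the deadline.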
What I have now is a real-time API that gets a bunch of messages from the network and feeds them into a pub/sub manager class. There might be up to 1000 msg/sec or more at times. There are 2 different threads, each connected to its own pub/sub. Subscribers are WPF windows; the manager keeps a list of windows and their DispatcherSynchronizationContext.
A thread calls the manager through interface method.
Manager publishes through Post:
foreach (var sub in Subscribers[subName])
{
sub.Context.Post(sub.WpfWindow.MyDelegate, data);
}
Can this be optimised?
P.S. Please don't ask why I think it is slow. I don't have hard limits; I simply have to make it as fast as possible. I am asking for help to assess: can it be done faster? Thank you.
EDIT: found this: http://msdn.microsoft.com/en-us/library/aa969767.aspx
The argument for a queue stands. What I do is put items into the queue; the queue triggers a task that invokes into the messaging thread and pulls X items of data (1000, or however many there are). The one thing that killed me was per-item invocation (which is slow); doing it in batches works nicely. I can keep up with nearly zero CPU load on a very busy ES data feed at the craziest times for time and sales.
I have a special set of components for this that I will open-source in one of the next weeks; it includes an ActionQueue (taking a delegate to call when items need processing). This is now a Task (it was a queued thread-pool work item before). I chose to process up to 1000 messages per invocation - but if you drive a price grid you may need more.
Note: use WPF hints to enable GPU caching of rendered bitmaps.
In addition:
Run every window on its own thread / message pump
HEAVILY use async queues. The publisher should never block; every window has its own target queue, which is async.
You want processing as decoupled as possible. Brutally decoupled.
Here is my suggestion for you:
I would use a ConcurrentQueue (from the System.Collections.Concurrent namespace). The background workers feed their messages into that queue. The UI thread runs a timer and, say every 500 ms, drains a bunch of messages from the queue and shows them to the user. Another possibility is for the UI thread to do this only on demand from the user. The ConcurrentQueue is designed to be used from different threads concurrently (as the name says ;-) ).
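A sketch of that timer-drained queue (MessageFeed, Publish, and the render callback are illustrative names, not from the question):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Windows.Threading;

// Producers (network threads) enqueue; the UI thread drains in batches,
// so there is one UI update per tick instead of one per message.
public class MessageFeed
{
    private readonly ConcurrentQueue<string> queue = new ConcurrentQueue<string>();
    private readonly DispatcherTimer timer;

    public MessageFeed(Action<string[]> render)
    {
        // DispatcherTimer ticks on the UI thread, so render needs no marshalling.
        timer = new DispatcherTimer { Interval = TimeSpan.FromMilliseconds(500) };
        timer.Tick += (s, e) =>
        {
            var batch = new List<string>();
            string msg;
            while (queue.TryDequeue(out msg)) batch.Add(msg);  // drain everything queued
            if (batch.Count > 0) render(batch.ToArray());
        };
        timer.Start();
    }

    // Safe to call from any background thread.
    public void Publish(string message)
    {
        queue.Enqueue(message);
    }
}
```

This trades a bounded amount of latency (up to one tick) for far fewer cross-thread invocations, which matches the batching observation above.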
I have a queue of tasks for the ThreadPool, and each task has a tendency to freeze, locking up all of the resources it is using. These can't be released unless the service is restarted.
Is there a way in the ThreadPool to know that one of its threads is frozen? I have an idea of using a timeout (though I still don't know how to write it), but I think it isn't safe, because the length of processing time is not uniform.
I don't want to be too presumptuous here, but a good dose of actually finding out what the problem is and fixing it is the best course with deadlocks.
Run a debug version of your service and wait until it deadlocks. It will stay deadlocked as this is a wonderful property of deadlocks.
Attach the Visual Studio debugger to the service.
"Break All".
Bring up your threads windows, and start spelunking...
Unless you have a sound architecture/design/reason to choose victims in the first place, don't do it - period. It's pretty much a recipe for disaster to arbitrarily bash threads over the head when they're in the middle of something.
(This is perhaps a bit lowlevel, but at least it is a simple solution. As I don't know C#'s API, this is a general solution for any language using thread-pools.)
Insert a watchdog task after each real task that updates a time value with the current time. If this value is older than your max task run time (say 10 seconds), you know that something is stuck.
Instead of setting a time and polling it, you could continuously set and reset some timers 10 secs into the future. When it triggers, a task has hung.
The best way is probably to wrap each task in a "Watchdog" Task class that does this automatically. That way, upon completion, you'd clear the timer, and you could also set a per-task timeout, which might be useful.
You obviously need one time/timer object for each thread in the threadpool, but that's solvable via thread-local variables.
Note that this solution does not require you to modify your tasks' code. It only modifies the code putting tasks into the pool.
One way is to use a watchdog timer (a solution usually done in hardware but applicable to software as well).
Have each thread set a thread-specific value to 1 at least once every five seconds (for example).
Then your watchdog timer wakes every ten seconds (again, this is an example figure only) and checks to ensure that all the values are 1. If they're not 1, then a thread has locked up.
The watchdog timer then sets them all to 0 and goes back to sleep for the next cycle.
Providing your worker threads are written in such a way so that they will be able to set the values in a timely manner under non-frozen conditions, this scheme will work okay.
The first thread that locks up will not set its value to 1, and this will be detected by the watchdog timer on the next cycle.
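A minimal sketch of that heartbeat scheme (the worker IDs and the reporting action are illustrative; a real service would probably log or raise an alert instead of writing to the console):

```csharp
using System;
using System.Threading;

// Heartbeat watchdog: each worker flips its flag at least every 5 s;
// a timer checks every 10 s and reports any worker that went silent.
class Watchdog
{
    private readonly int[] heartbeats;
    private readonly Timer timer;

    public Watchdog(int workerCount)
    {
        heartbeats = new int[workerCount];
        timer = new Timer(Check, null,
                          TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(10));
    }

    // Workers call this periodically from their processing loop.
    public void Beat(int workerId)
    {
        Interlocked.Exchange(ref heartbeats[workerId], 1);
    }

    private void Check(object state)
    {
        for (int i = 0; i < heartbeats.Length; i++)
        {
            // Read-and-reset in one step; 0 means no beat since the last check.
            if (Interlocked.Exchange(ref heartbeats[i], 0) == 0)
                Console.WriteLine("Worker {0} appears to be frozen.", i);
        }
    }
}
```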
However, a better solution is to find out why the threads are freezing in the first place and fix that.
I am about to implement the archetypal FileSystemWatcher solution. I have a directory to monitor for file creations, and the task of sucking up created files and inserting them into a DB. Roughly, this will involve reading and processing 6 or 7 80-char text files that appear about 150 ms apart, in bursts that occur every couple of seconds; rarely, a 2 MB binary file will also have to be processed. This will most likely be a 24/7 process.
From what I have read about the FileSystemWatcher object it is better to enqueue its events in one thread and then dequeue/process them in another thread. The quandary I have right now is what would be the better creation mechanism of the thread that does the processing. The choices I can see are:
Each time I get an FSW event I manually create a new thread (yeah, I know... stupid architecture, but I had to say it).
Throw the processing at the CLR thread pool whenever I get an FSW event
On start up, create a dedicated second thread for the processing and use a producer/consumer model to handle the work. The main thread enqueues the request and the second thread dequeues it and performs the work.
I am tending towards the third method as the preferred one, as I know the worker thread will always be required - and probably also because I have no feel for the thread pool.
If you know that the second thread will always be required, and you also know that you'll never need more than one worker thread, then option three is good enough.
The third option is the most logical.
In regards to FSW missing some file events, I implemented this:
1) an FSW object which fires on FileCreate
2) tmrFileCheck, interval = 5000 ms (5 seconds)
- calls tmrFileCheck_Tick
When the FileCreate event occurs, if (tmrFileCheck.Enabled == false) then tmrFileCheck.Start()
This way, after 5 seconds tmrFileCheck_Tick fires, which
a) tmrFileCheck.Stop()
b) CheckForStragglerFiles()
In the tests I've run, this works effectively where there are < 100 files created per minute.
A variant is to merely have a timer tick every NN seconds and sweep the directory(ies) for straggler files.
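That timer-sweep variant might look like the following (the path, file pattern, and ProcessFile are placeholders; the sweep must be idempotent because it will see the same file on multiple ticks):

```csharp
using System.IO;
using System.Timers;

// Periodic sweep for straggler files that FileSystemWatcher missed.
class StragglerSweeper
{
    private readonly Timer timer = new Timer(5000);  // tick every 5 seconds
    private readonly string path;

    public StragglerSweeper(string watchPath)
    {
        path = watchPath;
        timer.Elapsed += Sweep;
        timer.Start();
    }

    private void Sweep(object sender, ElapsedEventArgs e)
    {
        foreach (var file in Directory.GetFiles(path, "*.txt"))
        {
            ProcessFile(file);  // must tolerate seeing the same file twice
        }
    }

    private void ProcessFile(string file)
    {
        // placeholder: enqueue for the worker thread, then move/delete the file
    }
}
```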
Another variant is to hire me to press F5 to refresh the window and call you when there are straggler files; just a suggestion. :-P
Just be aware that FileSystemWatcher may miss events, there's no guarantee it will deliver all specific events that have transpired. Your design of keeping the work done by the thread receiving events to a minimum, should reduce the chances of that happening, but it is still a possibility, given the finite event buffer size (tops out at 64KB).
I would highly recommend developing a battery of torture tests if you decide to use FileSystemWatcher.
In our testing, we encountered issues with network locations that changing the InternalBufferSize did not fix; when we hit this scenario, we did not receive Error event notifications either.
Thus, we developed our own polling mechanism for doing so, using Directory.GetFiles, followed by comparing the state of the returned files with the previously polled state, ensuring we always had an accurate delta.
Of course, this comes at a substantial cost in performance, which may not be good enough for you.
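A rough sketch of that kind of delta polling, comparing file name plus last-write time between polls (the answer doesn't give its real implementation, so the details here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Polling with an explicit delta: compare the current directory listing
// against the previous poll and report anything new or modified.
class DirectoryPoller
{
    private Dictionary<string, DateTime> lastState = new Dictionary<string, DateTime>();
    private readonly string path;

    public DirectoryPoller(string watchPath)
    {
        path = watchPath;
    }

    // Returns files that are new or changed since the previous call.
    public List<string> Poll()
    {
        var current = Directory.GetFiles(path)
            .ToDictionary(f => f, f => File.GetLastWriteTimeUtc(f));

        var changed = current
            .Where(kv => !lastState.ContainsKey(kv.Key) || lastState[kv.Key] != kv.Value)
            .Select(kv => kv.Key)
            .ToList();

        lastState = current;
        return changed;
    }
}
```

Unlike FileSystemWatcher, this never drops a change between polls, at the cost of a full directory listing per poll.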
I've got a program I'm creating (in C#) and I see two approaches..
1) A job manager that waits for any number of X threads to finish; when they are finished, it gets the next chunk of work, creates a new thread, and gives it that chunk
or
2) We create X threads to start, give them each a chunk of work, and when a thread finishes a chunk it asks the job manager for more work. If there isn't any more work, it sleeps and then asks again, with the sleep becoming progressively longer.
This program will be run-and-done, though I could see it turning into a service that continually looks for more jobs.
Each chunk will consist of a number of data ids, a call to the database to get some info or perform an operation on the data id, and then writing info on the data id back to the database.
Assuming you are aware of the additional precautions that need to be taken when dealing with multithreaded database operations, it sounds like you're describing two different scenarios. In the first, you have several threads running, and once ALL of them finish, the manager looks for new work. In the second, you have several threads running and their operations are completely parallel. Your environment will determine the proper approach: if something ties together the work in the several threads, such that additional work cannot continue until all of them are finished, go with the former. If they don't have much effect on each other, go with the latter.
The second option isn't really right, as making the sleep time progressively longer means that you will unnecessarily keep those threads blocked.
Rather, you should have a pooled set of threads like the second option, but they use WaitHandles to wait for work and use a producer/consumer pattern. Basically, when the producer indicates that there is work, it sends a signal to a consumer (there will be a manager which will determine which thread will get the work, and then signal that thread) which will wake up and start working.
You might want to look into the Task Parallel Library. It's in beta now, but if you can use it and are comfortable with it, I would recommend it, as it will manage a great deal of this for you (and much better, taking into account the number of cores on a machine, the optimal number of threads, and so on).
The former solution (spawn a thread for each new piece of work) is easier to code, and not too bad if the units of work are large enough.
The latter solution (a thread pool with a queue of work) is more complicated to code, but supports smaller units of work.
Instead of rolling your own solution, you should look at the ThreadPool class in the .NET framework. You could use the QueueUserWorkItem method. It should do exactly what you want to accomplish.
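A minimal QueueUserWorkItem sketch along those lines (assuming .NET 4.0's CountdownEvent for the "wait until all chunks are done" part; on older frameworks an Interlocked counter plus a ManualResetEvent does the same job):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        using (var done = new CountdownEvent(3))
        {
            for (int i = 0; i < 3; i++)
            {
                int chunkId = i;  // capture a per-iteration copy for the closure
                ThreadPool.QueueUserWorkItem(state =>
                {
                    // placeholder: fetch the chunk's data ids, process, write back
                    Console.WriteLine("Processing chunk {0}", chunkId);
                    done.Signal();
                });
            }
            done.Wait();  // block until every chunk has signalled completion
        }
    }
}
```

The pool decides how many threads actually run the chunks, so there is no need for the manual sleep-and-retry loop from option 2.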