How to auto scale worker tasks?
I have an application in which I want to automatically scale the number of worker tasks to match the throughput of items to process (with a maximum limit on the number of workers, obviously).
All items to process are routed through a single point, from which they are distributed among the worker tasks (right now I find the worker with the shortest queue, and enqueue the item to it).
What would be a good pattern or technique to use to make intelligent decisions about how many workers I should spin up to handle the items? This logic would also need to shut workers down when fewer workers can process the items in a timely fashion.
I realize that adding more workers won't scale indefinitely, since other resources will ultimately become the bottleneck, and at some point adding more workers will hurt more than help. If I could account for this and automatically scale the number of workers DOWN to find the 'sweet spot', that would be fantastic. At this point, however, I'd be happy if my system could simply increase the number of workers as more items are added, and then decrease the number as fewer items come in.
One idea I've toyed with is measuring the average time an item sits in the queue. If this average is greater than a couple of seconds, I should spin up more workers (until the set maximum is reached). If the average is less than one second, I should spin workers down (until there's only one left, of course).
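Roughly, the kind of monitor I have in mind would look something like this (just a sketch; the thresholds, the 100-sample window, and the MaxWorkers cap are all placeholder values):

    using System;
    using System.Collections.Concurrent;

    class ScalingMonitor
    {
        const int MaxWorkers = 16;   // placeholder cap
        int _workerCount = 1;
        readonly ConcurrentQueue<TimeSpan> _recentWaits = new ConcurrentQueue<TimeSpan>();

        // Each worker calls this when it dequeues an item, passing how long the item waited.
        public void RecordWait(TimeSpan wait)
        {
            _recentWaits.Enqueue(wait);
            TimeSpan dropped;
            while (_recentWaits.Count > 100)        // keep a sliding window of recent samples
                _recentWaits.TryDequeue(out dropped);
        }

        // Called periodically (e.g. from a timer) to decide how many workers to run.
        public int DesiredWorkerCount()
        {
            double total = 0;
            int samples = 0;
            foreach (TimeSpan wait in _recentWaits) { total += wait.TotalSeconds; samples++; }
            double average = samples > 0 ? total / samples : 0;

            if (average > 2 && _workerCount < MaxWorkers)
                _workerCount++;                     // items are waiting too long: add a worker
            else if (average < 1 && _workerCount > 1)
                _workerCount--;                     // plenty of headroom: retire a worker
            return _workerCount;
        }
    }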
Does anyone have any suggestions on the best way to approach this?
You won't be able to scale unless you can break the work into chunks that can be distributed amongst workers. How you distribute the work really depends on how you break it down. If you break it into smaller chunks whose results get merged when done, then the map/reduce pattern may fit the bill.
One thing to keep in mind, regardless of what you do, is that as soon as you have more runnable threads than CPUs, performance will degrade dramatically. This is because you introduce extra context switching, which is very expensive. Using things like TPL and PLINQ allows you to schedule tasks rather than raw threads and avoid this problem.
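For example, PLINQ keeps the degree of parallelism in check for you (a sketch only; it needs System.Linq, and items and Process are placeholders for your own collection and work method):

    var results = items.AsParallel()
                       .WithDegreeOfParallelism(Environment.ProcessorCount)  // no more workers than cores
                       .Select(item => Process(item))
                       .ToList();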
You could also compute a global running average of the number of items queued per second and divide it by the maximum number of items per second one worker can process. This gives you the number of workers you need in order not to fall behind.
You also want to introduce a delay: if for x seconds there are fewer workers than needed, increase the count; if for x seconds there are more workers than needed, decrease it.
This assumes all tasks take the same amount of time to process and that all your workers have the same throughput.
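In other words, something along these lines (a sketch; the two rates are numbers you would measure yourself, and the x-second delay is applied by whoever calls this):

    using System;

    static int WorkersNeeded(double itemsQueuedPerSecond, double itemsPerSecondPerWorker, int maxWorkers)
    {
        // Workers required to keep up with the arrival rate, clamped to [1, maxWorkers].
        int needed = (int)Math.Ceiling(itemsQueuedPerSecond / itemsPerSecondPerWorker);
        return Math.Max(1, Math.Min(needed, maxWorkers));
    }

Only act on the result once it has pointed in the same direction for your chosen x seconds.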
Related
I have a process that looks at a database table, picks up records and sends emails. At different times of the day/month this process can get pretty backed up, and currently we have 30 instances of a Windows service running to keep up with demand.
We tried creating a single instance and spinning up 6 long-running TPL tasks per instance, but this was static and didn't scale well.
What I would like to be able to do is look at the table to be processed, count the number of requests, and add threads to a pool up to a specified cap, say NumProcessors * 10. When demand goes back down, pull these threads back out of the pool, because each thread hits the DB every 2 seconds, and I would much rather have 6 threads doing that per instance than 60.
Adding threads is pretty easy, but I'm having a hard time thinking of a way to gracefully pull threads out of the pool as demand goes down.
One way to do this would be to have a single thread that reads from the database and sends the requests to processing threads (possibly using something like ConcurrentQueue).
This way you only hit the database once every 2 seconds (or whatever), but you can also have many threads doing the work that actually takes a long time (sending the emails).
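A rough sketch of that shape, using BlockingCollection (which wraps a ConcurrentQueue by default) so the workers can block when the queue is empty; fetchPending and sendEmail are placeholders for your own data-access and SMTP code:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    class EmailDispatcher
    {
        readonly BlockingCollection<string> _queue = new BlockingCollection<string>();

        public void Start(Func<IEnumerable<string>> fetchPending, Action<string> sendEmail, int workerCount)
        {
            // Single poller task: the only code that touches the database, once every 2 seconds.
            Task.Factory.StartNew(() =>
            {
                while (true)
                {
                    foreach (string request in fetchPending())
                        _queue.Add(request);
                    Thread.Sleep(TimeSpan.FromSeconds(2));
                }
            }, TaskCreationOptions.LongRunning);

            // Worker tasks: block on the queue, never on the database.
            for (int i = 0; i < workerCount; i++)
            {
                Task.Factory.StartNew(() =>
                {
                    foreach (string request in _queue.GetConsumingEnumerable())
                        sendEmail(request);
                }, TaskCreationOptions.LongRunning);
            }
        }
    }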
Behind the scenes, Task uses the ThreadPool: http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
You can control the maximum and minimum number of threads in the pool using the SetMaxThreads and SetMinThreads methods.
So during peak time you can set the maximum to NumProcessors * 10, and then when demand goes back down, restore it to its previous value.
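For instance (just a sketch; the NumProcessors * 10 figure comes from your question):

    using System;
    using System.Threading;

    static class PoolLimits
    {
        static int _savedWorkerMax, _savedIoMax;

        public static void RaiseForPeak()
        {
            // Remember the current limits so they can be restored when demand drops.
            ThreadPool.GetMaxThreads(out _savedWorkerMax, out _savedIoMax);
            ThreadPool.SetMaxThreads(Environment.ProcessorCount * 10, _savedIoMax);
        }

        public static void Restore()
        {
            ThreadPool.SetMaxThreads(_savedWorkerMax, _savedIoMax);
        }
    }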
I've helped a client with an application which stopped doing its work after a while (it eventually started doing work again).
The problem was that when a Task failed, it called Thread.Sleep for 5 seconds (inside the task). As there could be up to 800 tasks queued every two seconds, you can imagine the problem if many of those jobs failed and invoked Thread.Sleep. None of those jobs were marked with TaskCreationOptions.LongRunning.
I rewrote the tasks so that Thread.Sleep (or Task.Delay for that matter) wasn't necessary.
However, I'm interested in what the TaskScheduler (the default) did in that scenario. How and when does it increase the number of threads?
According to MSDN:
"Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms (like hill-climbing) that determine and adjust to the number of threads that maximizes throughput."
The ThreadPool will have a maximum number of threads depending on the environment. As you create more Tasks the pool can run them concurrently until it reaches its maximum number of threads, at which point any further tasks will be queued.
If you want to find out the maximum number of ThreadPool threads you can use System.Threading.ThreadPool.GetMaxThreads (you need to pass in two out int parameters: one will be populated with the maximum number of worker threads and the other with the maximum number of asynchronous I/O threads).
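For example (the variable names here are just mine):

    int maxWorkerThreads, maxIoThreads;
    ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxIoThreads);
    Console.WriteLine("Max worker threads: {0}, max I/O completion threads: {1}",
                      maxWorkerThreads, maxIoThreads);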
If you want to get a better idea of what is happening in your application at runtime, you can use Visual Studio's Threads window by going to Debug -> Windows -> Threads (the window is only available while you are debugging, so you'll need to set a breakpoint in your application first).
This post is possibly of interest. It would seem the default task scheduler simply queues the task up in the ThreadPool's queue unless you use TaskCreationOptions.LongRunning. This means that it's up to the ThreadPool to decide when to create new threads.
I built an app that performs work on thousands of files and then writes modified copies of these files to disk. I am using a ThreadPool, but it was spawning so many threads (260 total) that the PC became unresponsive, so I changed the maximum from the default of 250 down to 50. That solved the issue (the app only spawns about 60 threads total); however, now that files become ready so quickly, the writing is tying up the UI to the point where the PC is unresponsive.
Is there a way to limit the amount of I/O? I mean, I like using 50 threads to perform the work on the files, but not 50 threads writing at the same time once they are processed. I would rather not re-architect the file-writing part if I can avoid it; I was hoping I could limit the amount of simultaneous I/O the threads from this pool can consume.
Use a semaphore to limit the number of threads writing to disk simultaneously.
http://msdn.microsoft.com/en-us/library/system.threading.semaphore.aspx
"Limits the number of threads that can access a resource or pool of resources concurrently."
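Something like this, using SemaphoreSlim as the lightweight equivalent of the Semaphore class above (a sketch; the limit of 4 concurrent writers is an arbitrary number to tune for your disk):

    using System.IO;
    using System.Threading;

    static class ThrottledWriter
    {
        // At most 4 threads may be inside the write section at once.
        static readonly SemaphoreSlim WriteGate = new SemaphoreSlim(4, 4);

        public static void WriteFile(string path, byte[] data)
        {
            WriteGate.Wait();       // blocks if 4 writes are already in flight
            try
            {
                File.WriteAllBytes(path, data);
            }
            finally
            {
                WriteGate.Release();
            }
        }
    }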
You really don't need so many threads. A disk can only sustain its maximum read and write throughput, which a single dedicated IO thread can easily max out. You also cannot effectively read from and write to a hard disk simultaneously (although this is complicated by OS caching layers, etc.), so having concurrent threads reading and writing can be very counter-productive. There is also little to be gained from having more threads than processors/cores for your non-IO work, as any additional threads will spend much of their time waiting for a core to become available; e.g. if you have 50 threads and 4 cores, a minimum of 46 of the threads will be idle at any given time. The wasted threads add to memory consumption and also incur performance overhead, as they all fight for time on a core and the OS has to arbitrate that fight.
A more straightforward approach would be to have a single thread whose job is to read in the files and add the data to a blocking queue (e.g. see ConcurrentQueue), and meanwhile have a number of worker threads waiting on file data in the queue (e.g. a number of threads equal to the number of processors/cores). These worker threads munch their way through the queue as items are added, and block when it is empty. When a worker thread finishes a piece of work, it adds the result to another blocking queue, which is monitored either by the reader thread or by a dedicated writer thread whose job is to write the files out.
This pattern seeks to balance IO and CPU amongst a much smaller set of co-operating threads, where the number of IO threads is limited to what the hard drive is physically capable of, and the number of CPU worker threads is sensible for the number of processors/cores you have. In essence it separates IO and CPU work so that things behave more predictably.
Further to this, if IO really is the problem (and not a huge amount of threads all fighting each other), then you can place some pauses (e.g. Thread.Sleep) in your file reading and writing threads to limit how much work they do.
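A sketch of that reader / workers / writer split (processFile stands in for whatever transformation your files need, and the bounded queue capacities are arbitrary):

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    static class FilePipeline
    {
        public static void Run(string[] inputPaths, string outputDir,
                               Func<byte[], byte[]> processFile)   // your CPU-bound work
        {
            // Bounded so the reader can't race too far ahead of the workers and writer.
            var toProcess = new BlockingCollection<Tuple<string, byte[]>>(16);
            var toWrite   = new BlockingCollection<Tuple<string, byte[]>>(16);

            // 1. One reader thread: the only thread reading from disk.
            var reader = Task.Factory.StartNew(() =>
            {
                foreach (string path in inputPaths)
                    toProcess.Add(Tuple.Create(path, File.ReadAllBytes(path)));
                toProcess.CompleteAdding();
            }, TaskCreationOptions.LongRunning);

            // 2. One worker per core: CPU-bound processing only.
            var workers = Enumerable.Range(0, Environment.ProcessorCount).Select(n =>
                Task.Factory.StartNew(() =>
                {
                    foreach (var item in toProcess.GetConsumingEnumerable())
                        toWrite.Add(Tuple.Create(item.Item1, processFile(item.Item2)));
                }, TaskCreationOptions.LongRunning)).ToArray();

            // 3. One writer thread: the only thread writing to disk.
            var writer = Task.Factory.StartNew(() =>
            {
                foreach (var item in toWrite.GetConsumingEnumerable())
                    File.WriteAllBytes(Path.Combine(outputDir, Path.GetFileName(item.Item1)), item.Item2);
            }, TaskCreationOptions.LongRunning);

            Task.WaitAll(workers);
            toWrite.CompleteAdding();
            Task.WaitAll(reader, writer);
        }
    }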
Update
Perhaps it is worth explaining why there are so many threads being generated in the first place. This is a degenerative case for threadpool use, and is centred around queueing workitems that have a component of IO in them.
The threadpool executes workitems from its queue and monitors how long executing workitems are taking. If the currently executing workitems are taking a long time to complete (I think half a second, from memory), it starts adding more threads to the pool, as it believes this will get the queue processed quicker/more fairly. However, if the additional concurrent workitems are also performing IO against a shared disk, then the performance of the disk will actually drop, meaning that workitems take even longer to execute. Because workitems are taking longer to execute, the threadpool adds more threads. This is the degenerative case, where performance gets worse and worse as more threads are added.
The use of a semaphore as suggested would have to be done carefully, as the semaphore could cause blocking of threadpool threads; the threadpool would see workitems taking a long time to execute and would still start adding more threads.
Is there a magic number or formula for setting the values of SetMaxThreads and SetMinThreads for the ThreadPool? I have thousands of long-running methods that need to be executed but just can't find the perfect match for setting these values. Any advice would be greatly appreciated.
The default minimum number of threads is the number of cores your machine has. That's a good number; it generally doesn't make sense to run more threads than you have cores.
The default maximum number of threads is 250 times the number of cores you have on .NET 2.0 SP1 and up. There is an enormous amount of breathing room here. On a four core machine, it would take 499 seconds to reach that maximum if none of the threads complete in a reasonable amount of time.
The threadpool scheduler tries to limit the number of active threads to the minimum, by default the number of cores you have. Twice a second it allows one more thread to start if the active threads do not complete. Threads that run for a very long time or do a lot of blocking that is not caused by I/O are not good candidates for the threadpool. You should use a regular Thread instead.
Getting to the maximum isn't healthy. On a four core machine, just the stacks of those threads will consume a gigabyte of virtual memory space. Getting OOM is very likely. Consider lowering the max number of threads if that's your problem. Or consider starting just a few regular Threads that receive packets of work from a thread-safe queue.
Typically, the magic number is to leave it alone. The ThreadPool does a good job of handling this.
That being said, if you're running a lot of long-running services, and those services will spend large periods waiting, you may want to increase the maximum thread count to handle more concurrent operations. (If the processes aren't blocking, you'll probably just slow things down if you increase the thread count.)
Profile your application to find the correct number.
If you want better control, you might want to consider NOT using the built-in ThreadPool. There is a nice replacement at http://www.codeproject.com/KB/threads/smartthreadpool.aspx.
Are there any benefits to limiting the number of concurrent threads doing a given task to the number of processors on the host system? Or is it better to simply trust libraries such as .NET's ThreadPool to do the right thing ... even if there are 25 different concurrent threads running at any one given moment?
Most threads are not CPU bound; they end up waiting on IO or other events. If you look at your system now, I imagine you have hundreds (if not thousands) of threads executing with no problems. By that measure, you're probably best just leaving the .NET thread pool to do the right thing!
However, if the threads were all CPU bound (e.g. something like ray tracing) then it would be a good idea to limit the number of threads to the number of cores, otherwise chances are that context switching will begin to hurt performance.
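For CPU-bound work, that limit is easy to express with the TPL (a sketch; the Math calls are just a stand-in for real CPU-heavy work such as tracing a pixel):

    using System;
    using System.Threading.Tasks;

    class CpuBoundExample
    {
        static void Main()
        {
            double[] results = new double[1000];
            var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

            // At most one worker per core runs this loop body at a time.
            Parallel.For(0, results.Length, options, i =>
            {
                results[i] = Math.Sqrt(i) * Math.Sin(i);   // stand-in for the real CPU-bound work
            });
        }
    }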
The threadpool already does a reasonably good job at this. It tries to limit the number of running threads to the number of CPU cores in your machine. When one thread ends, it immediately schedules another eligible thread for execution.
Every 0.5 seconds, it evaluates what is going on with the running threads. When the threads have been running too long, it assumes they are stalled and allows another thread to start executing. You'll now have more threads running than you have CPU cores. This can go up to the maximum number of allowed threads, as set by ThreadPool.SetMaxThreads().
Starting around .NET 2.0 SP1, the default maximum number of threads was increased considerably to 250 times the number of cores. You should never ever get there. If you do, you would have wasted about 2 minutes of time where a possibly non-optimal number of threads were running. Those threads however would all have to be blocking for that long, not exactly a typical execution pattern for a thread. On the other hand, if these threads are all waiting on the same kind of resource they are likely to just take turns, adding more threads cannot improve throughput.
Long story short, the thread pool will work well if you run threads that execute quickly (seconds at most) and don't block for a long time. You probably ought to consider creating your own Thread objects when your code doesn't match that pattern.
Well, if your bottleneck is ONLY the processors, then it might make sense, but that would ignore memory and other I/O bottlenecks, and chances are you're at least incurring cache misses and page faults that would slow the threads down.
I'd trust the library myself. Threads wait for all kinds of things, and you don't want your application to slow down because it can't spawn a new thread, even though most of the rest are just sleeping, waiting for some event or resource.
Measure your application under a variety of thread:processor ratios. Come to conclusions based on hard data about your application. Accept no arguments from first principles about what performance you should get; only what you do get matters.