I create about 5000 background workers that do intensive work in a console app. I'm also using an external library that instantiates an object, say ObjectX. At some point, say t0, ObjectX tries to obtain a thread from an OS thread pool and start it, but I have no control over how it obtains this thread. Things work fine for 100 background workers. For 1000 background workers it takes about 10 minutes after t0 for ObjectX to obtain and start a thread.
Is there a way to set, in advance, a high priority for any threads that will be started in the future by an object?
As I think the answer to 1 is "no": is there a way to limit the priority of the background workers so as to somehow favor everything else, even though I really only want to 'favor' ObjectX?
The goal would be to always have available resources to run the thread launched by ObjectX, no matter how overloaded the machine is.
I'm using C# and the .NET Framework 3.5, on a 64-bit Windows machine.
Threads are given processor time by the OS; switching the CPU from one thread to another is called a context switch. A context switch takes about 2000-8000 cycles (i.e., depending on the processor, 2000-8000 instructions). If the machine has many CPUs or cores, the OS may not need to take the CPU away from one thread and give it to another, avoiding a context switch. Only one thread can run per CPU at a time; when you have more threads needing CPU than you have CPUs, you're forcing context switches. Context switches are performed no faster than the system quantum (every 20ms on client versions of Windows and 120ms on server versions).
If you have 5000 background workers you effectively have 5000 threads, each potentially vying for CPU time. On a client version of Windows, with a 20ms quantum, that means up to 5000 * 50 = 250,000 context switches per second, i.e. 500,000,000 to 2,000,000,000 cycles per second devoted simply to switching between threads (over and above the work your threads are performing), assuming the machine could even process that many context switches per second.
The recommended practice is to only have one CPU-bound thread per processor. A CPU-bound thread is one that spends very little time "waiting". The UI thread is not a CPU-bound thread. If your background workers are spending a lot of time waiting for locks, then they may not be CPU-bound either--but, in general, background worker threads are CPU-bound. (otherwise, what would be the point of using a background worker?).
Also, the OS spends a lot of time figuring out what thread needs to get the CPU next. When you start changing thread priorities you interfere with that and most of the time end up making your entire system slower (not just your application) rather than faster.
Update:
On a related note, it takes about 200,000 cycles to create a new thread and about 100,000 cycles to destroy one.
Update 2:
If the impetus of the question isn't simply "can it be done" but the ability to scale the workload, then, as @JoshW/@Servy mention, using something like the producer/consumer pattern would give you scalability, and a queue or a service bus could even facilitate horizontal scaling to multiple computers/nodes. Simply starting an inordinate number of threads does not scale beyond the number of CPUs. What you describe ("available resources... no matter how overloaded the machine is") is simply impossible; if that is what you truly want, you need an architecture that can scale out.
Personally I think this is a bad idea, however... given the comments you have made on other answers and your request that "no matter how many background workers are created, ObjectX runs as soon as possible"... you could conceivably force your background workers to block using a ManualResetEvent.
For example, at the top of your worker code you could block on a ManualResetEvent with the WaitOne method. This ManualResetEvent could be static or passed in as a parameter, and wherever your ObjectX gets instantiated/called, you call the Reset method on it. This would block all your workers at the WaitOne line. Then, at the bottom of the code that runs ObjectX, call ManualResetEvent.Set() and that will unblock the workers.
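A minimal sketch of that gating approach, assuming the event lives in a static holder class; the names here (and the commented ObjectX call) are illustrative, not from your code:

using System.Threading;

static class WorkerGate
{
    // Signaled = workers may run; non-signaled (after Reset) = workers block.
    public static readonly ManualResetEvent Gate = new ManualResetEvent(true);
}

// At the top of each background worker's work routine:
void WorkerBody()
{
    WorkerGate.Gate.WaitOne();       // every worker parks here while the gate is closed
    // ... the worker's normal processing ...
}

// Around the code that instantiates/runs ObjectX:
void RunObjectX()
{
    WorkerGate.Gate.Reset();         // close the gate so the workers stop competing
    try
    {
        // var x = new ObjectX();    // let ObjectX obtain and start its thread
        // ... run ObjectX ...
    }
    finally
    {
        WorkerGate.Gate.Set();       // reopen the gate; the workers resume
    }
}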
Note this is NOT an efficient way to manage your threads, but if you "just have to make it work" and have time later to improve it... I suppose it's one possible solution.
The goal would be to always have available resources to run the thread launched by ObjectX, no matter how overloaded the machine is.
Then thread priorities might not be the right tool. Remember, thread priorities are evil.
In general, Windows is not a real-time OS; in particular, Win32 does not even attempt to be soft real-time (IIRC, the NT kernel at some point tried to have at least some support for soft real-time subsystems, but I may be wrong). So there is no guarantee about available resources or timing.
Also, are you worried about other threads in the system? Those threads are out of your control (what if the other threads are already at the system max priority?).
If you are worried about threads in your app, you can control and throttle them, using fewer threads/workers to do more work (batching work into bigger units and submitting it to a worker, for example, or by using the TPL or other tools that will handle and throttle thread usage for you).
That said, you could intercept when a thread is created (see, for example, this answer: https://stackoverflow.com/a/3802316/863564), check whether it was created for ObjectX (for example, by checking its name) and use SetThreadPriority to boost it.
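For completeness, a rough sketch of the boosting part, assuming you have somehow captured the OS thread id used by ObjectX (that identification is the hard part, and is what the linked answer is about). ProcessThread.PriorityLevel is roughly the managed counterpart of SetThreadPriority; the id value below is purely illustrative:

using System.Diagnostics;

int objectXThreadId = 1234;   // hypothetical: the native thread id captured earlier

foreach (ProcessThread pt in Process.GetCurrentProcess().Threads)
{
    if (pt.Id == objectXThreadId)
    {
        pt.PriorityLevel = ThreadPriorityLevel.AboveNormal;   // boost just that thread
        break;
    }
}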
I'm trying to fix memory spikes in a very large application. While I'm not sure how much of an effect this would have on memory, I noticed the following:
Application uses a custom thread pool to do all expensive tasks
Application will execute all incoming tasks
Tasks can be composed of thousands of sub tasks
While the thread pool will only execute {T} tasks at a time, and finishes a task completely before starting a new one, it does create a new system thread (Thread class) and start it for every sub task added to it
The sub task system threads are started with a thread-start delegate that instantly blocks on a ManualResetEvent (MRE) until a thread pool slot frees up
So, this thread pool can create thousands of threads, but all but 30 (or whatever you configure) will be blocked on an MRE while other tasks complete.
My Question:
What impact on the memory/processor will a thousand threads blocked on MREs have? I don't have a lot of time to fix this spike, so if it's minimal, I'd rather leave the issue and work to fix it in a later patch when I have more time.
Also, is this behavior typical in thread pools, or does this sound flawed (I'm leaning towards flawed, but I don't have a good enough background to be sure).
What impact on the memory/processor will a thousand threads blocked on MREs have? I don't have a lot of time to fix this spike, so if it's minimal, I'd rather leave the issue and work to fix it in a later patch when I have more time.
Each thread, when created manually, has its own stack allocated to it. By default this is 1MB per thread, though it is possible to create threads with a smaller stack via a constructor parameter.
You'd be much better off redesigning this, from the description of your problem, to use the standard ThreadPool, and a class like BlockingCollection<T> to handle your throttling. This is designed to directly allow bounding input, with blocking. Making a custom "thread pool" of an infinite number of threads is going to be far less efficient than using the highly tuned ThreadPool included with the framework.
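A rough sketch of that kind of throttling with BlockingCollection<T>; the bound of 30, the consumer count, and the Action payload are all illustrative (here the consumers are a handful of dedicated threads, but you could equally hand each drained item to the ThreadPool):

using System;
using System.Collections.Concurrent;
using System.Threading;

var workItems = new BlockingCollection<Action>(30);   // Add blocks once 30 items are pending

// A few consumers drain the queue; each sub task is now just a queued delegate,
// not a dedicated thread of its own.
for (int i = 0; i < 4; i++)
{
    new Thread(() =>
    {
        foreach (var work in workItems.GetConsumingEnumerable())
            work();
    }) { IsBackground = true }.Start();
}

// Producers (the code that used to spawn a thread per sub task) just add work;
// they block if the queue is full, which is the bounding behavior described above.
workItems.Add(() => Console.WriteLine("sub task executed"));

// When everything has been queued:
workItems.CompleteAdding();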
Also, is this behavior typical in thread pools, or does this sound flawed (I'm leaning towards flawed, but I don't have a good enough background to be sure).
This is definitely flawed. The entire point of a ThreadPool is to avoid making a thread per request, and "pool" the threads (reusing them) for multiple requests without having to recreate them.
I need to optimize a WCF service... it's quite a complex thing. My problem this time has to do with tasks (Task Parallel Library, .NET 4.0). What happens is that I launch several tasks when the service is invoked (using Task.Factory.StartNew) and then wait for them to finish:
Task.WaitAll(task1, task2, task3, task4, task5, task6);
Ok... what I see, and don't like, is that on the first call (sometimes the first 2-3 calls, if made quickly one after another), the final task starts much later than the others (I am looking at a case where it started 0.5 seconds after the others). I tried calling
ThreadPool.SetMinThreads(12*Environment.ProcessorCount, 20);
at the beginning of my service, but it doesn't seem to help.
The tasks are all database-related: I'm reading from multiple databases and it has to take as little time as possible.
Any idea why the last task is taking so long? Is there something I can do about it?
Alternatively, should I use the thread pool directly? As it happens, in one case I'm looking at, one task had already ended before the last one started - I would have saved 0.2 seconds if I had reused that thread instead of waiting for a new one to be created. However, I cannot be sure that that task will always end so quickly, so I can't put both requests in the same task.
[Edit] The OS is Windows Server 2003, so there should be no connection limit. Also, it is hosted in IIS - I don't know whether I should create regular threads or use the thread pool - which is preferred?
[Edit] I've also tried using Task.Factory.StartNew(action, TaskCreationOptions.LongRunning); - it doesn't help, the last task still starts much later (around half a second later) than the rest.
[Edit] MSDN says:
The thread pool has a built-in delay (half a second in the .NET Framework version 2.0) before starting new idle threads. If your application periodically starts many tasks in a short time, a small increase in the number of idle threads can produce a significant increase in throughput. Setting the number of idle threads too high consumes system resources needlessly.
However, as I said, I'm already calling SetMinThreads and it doesn't help.
I have had problems myself with delays in thread startup when using the (.NET 4.0) Task object. So for time-critical stuff I now use dedicated threads (again, that is what I was doing before .NET 4.0).
The purpose of a thread pool is to avoid the operating-system cost of starting and stopping threads; the threads are simply reused. This is a common model found in, for example, internet servers. The advantage is that they can respond quicker.
I've written many applications where I implement my own thread pool by having dedicated threads picking up tasks from a task queue. Note however that this usually requires locking, which can cause delays/bottlenecks. It depends on your design: if the tasks are small there will be a lot of locking, and it might be faster to trade some CPU for less locking: http://www.boyet.com/Articles/LockfreeStack.html
SmartThreadPool is a replacement/extension of the .Net thread pool. As you can see in this link it has a nice GUI to do some testing: http://www.codeproject.com/KB/threads/smartthreadpool.aspx
In the end it depends on what you need, but for high performance I recommend implementing your own thread pool. If you experience a lot of thread idling then it could be beneficial to increase the number of threads (beyond the recommended CPU count * 2). This is actually how Hyper-Threading works inside the CPU: using "idle" time during one operation to do other operations.
Note that .NET has a built-in limit of 25 threads per process (i.e. for all WCF calls you receive simultaneously). This limit is independent of, and overrides, the ThreadPool setting. It can be increased, but it requires some magic: http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=201
Following on from my prior question (yep, this should have been a question against the original message - apologies):
Why do you feel that creating 12 threads for each processor core in your machine will in some way speed up your server's ability to create worker threads? All you're doing is slowing your server down!
As per the MSDN docs: "You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance."
Issues like this are usually caused by bumping into limits or contention on a shared resource.
In your case, I am guessing that your last task(s) is/are blocking while they wait for a connection to the DB server to become available or for the DB to respond. Remember - if your invocation kicks off 5-6 other tasks then your machine is going to have to create and open numerous DB connections and is going to hit the DB with, potentially, a lot of work. If your WCF server and/or your DB server are cold, then your first few invocations are going to be slower until the machines' caches, etc., are populated.
Have you tried adding a little tracing/logging, using a Stopwatch to time how long it takes for your tasks to connect to the DB server and then execute their operations?
You may find that reducing the number of concurrent tasks you kick off actually speeds things up. Try spawning 3 tasks at a time, waiting for them to complete and then spawn the next 3.
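A sketch of that batching idea, where action1..action6 stand in for your six database-reading delegates:

// First wave of three:
Task.WaitAll(
    Task.Factory.StartNew(action1),
    Task.Factory.StartNew(action2),
    Task.Factory.StartNew(action3));

// Second wave only starts once the first has finished:
Task.WaitAll(
    Task.Factory.StartNew(action4),
    Task.Factory.StartNew(action5),
    Task.Factory.StartNew(action6));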
When you call Task.Factory.StartNew, it uses a TaskScheduler to map those tasks into actual work items.
In your case, it sounds like one of your Tasks is delaying occasionally while the OS spins up a new Thread for the work item. You could, potentially, build a custom TaskScheduler which already contained six threads in a wait state, and explicitly used them for these six tasks. This would allow you to have complete control over how those initial tasks were created and started.
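A rough sketch of such a scheduler, assuming .NET 4.0 (this is not production code; the fixed thread count and the lack of shutdown handling are deliberate simplifications):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class DedicatedThreadTaskScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

    public DedicatedThreadTaskScheduler(int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
        {
            // Threads are created (and parked) up front, so StartNew never has to
            // wait for the OS to spin up a new thread.
            new Thread(() =>
            {
                foreach (var task in _tasks.GetConsumingEnumerable())
                    TryExecuteTask(task);
            }) { IsBackground = true }.Start();
        }
    }

    protected override void QueueTask(Task task)
    {
        _tasks.Add(task);
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;   // keep it simple: never inline, always use the dedicated threads
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _tasks.ToArray();
    }
}

You would then start the six tasks against it with the StartNew overload that takes a scheduler, e.g. Task.Factory.StartNew(action, CancellationToken.None, TaskCreationOptions.None, scheduler).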
That being said, I suspect there is something else at play here... You mentioned that using TaskCreationOptions.LongRunning demonstrates the same behavior. This suggests that there is some other factor at play causing this half second delay. The reason I suspect this is due to the nature of TaskCreationOptions.LongRunning - when using the default TaskScheduler (LongRunning is a hint used by the TaskScheduler class), starting a task with TaskCreationOptions.LongRunning actually creates an entirely new (non-ThreadPool) thread for that Task. If creating 6 tasks, all with TaskCreationOptions.LongRunning, demonstrates the same behavior, you've pretty much guaranteed that the problem is NOT the default TaskScheduler, since this is going to always spin up 6 threads manually.
I'd recommend running your code through a performance profiler, and potentially the Concurrency Visualizer in VS 2010. This should help you determine exactly what is causing the half second delay.
What is the OS? If you are not running a server version of Windows, there is a connection limit. Your many threads are probably being serialized because of the connection limit.
Also, I have not used the task parallel library yet, but my limited experience is that new threads are cheap to make in the context of networking.
These articles might explain the problem you're having:
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-are-wcf-responses-slow-and-setminthreads-does-not-work.aspx
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-does-wcf-become-slow-after-being-idle-for-15-seconds.aspx
Seeing as you're using .NET 4, the first article probably doesn't apply, but as the second article points out, the ThreadPool terminates idle threads after 15 seconds, which might explain the problem you're having, and it offers a simple (though a little hacky) solution to get around it.
Whether or not you use the ThreadPool directly wouldn't make any difference, as I suspect the task library is using it underneath anyway.
One third-party library we have been using for a while might help you here - Smart Thread Pool. You still get the same benefits of using the task libraries, in that you can have the return values from the threads and get any exception information from them too.
Also, you can instantiate multiple thread pools, which is useful when you have several places that each need their own pool (so that a low-priority process doesn't start eating into the quota of some high-priority process). And you can set the priority of the threads in the pool too, which you can't do with the standard ThreadPool, where all the threads are background threads.
You can find plenty of info on the codeplex page, I've also got a post which highlights some of the key differences:
http://theburningmonk.com/2010/03/threading-introducing-smartthreadpool/
Just as a side note: for tasks like the one you've mentioned, which might take some time to return, you probably shouldn't be using the threadpool anyway. It's recommended that we avoid using the threadpool for blocking tasks like that, because they hog threadpool threads that the framework classes use for all sorts of things, like handling timer events, etc. (not to mention handling incoming WCF requests!). I feel like I'm spamming here, but here's some of the info I've gathered around the use of the threadpool, with some useful links at the bottom:
http://theburningmonk.com/2010/03/threading-using-the-threadpool-vs-creating-your-own-threads/
well, hope this helps!
I have a thread that I fire off every time the user scans a barcode.
Most of the time it is a fairly short-running thread. But sometimes it can take a very long time (waiting on an Invoke to the GUI thread).
I have read that it may be a good idea to use the ThreadPool for this rather than just creating my own thread for each scan.
But I have also read that if the ThreadPool runs out of threads then it will just wait until some other thread exits (not OK for what I am doing).
So, how likely is it that I am going to run out of threads? And is the benefit of the ThreadPool really worth it? (When I scan, it does not seem to take long for the thread logic to run.)
It depends on what you mean by "a very long time" and how common that scenario is.
The MSDN topic "The Managed Thread Pool" offers good guidelines for when not to use thread pool threads:
There are several scenarios in which it is appropriate to create and manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
Since the user will never scan more than one barcode at a time, the memory costs of the threadpool might not be worth it - I'd stick with a single thread just waiting in the background.
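A rough sketch of that single background thread, assuming .NET 4.0 for BlockingCollection<T> (on older versions a Queue<T> guarded by Monitor.Wait/Pulse does the same job); ScanData and ProcessScan are illustrative names, not from your code:

using System.Collections.Concurrent;
using System.Threading;

static readonly BlockingCollection<ScanData> pendingScans = new BlockingCollection<ScanData>();

static void StartScanWorker()
{
    var worker = new Thread(() =>
    {
        // One long-lived thread handles scans one after another.
        foreach (var scan in pendingScans.GetConsumingEnumerable())
            ProcessScan(scan);   // may Invoke to the GUI thread, etc.
    }) { IsBackground = true };
    worker.Start();
}

// Called from the barcode-scanned event handler:
static void OnBarcodeScanned(ScanData data)
{
    pendingScans.Add(data);      // hands the work off without blocking the scanner
}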
The point of the thread pool is to amortize the cost of creating threads, which are not inexpensive to spin up and tear down. If you have a short-running task, the cost of creating/destroying the thread can be a significant portion of the overall run-time. The maximum number of threads in the thread pool depends on the version of the .NET Framework, typically dozens to hundreds per processor. The number of threads is scaled depending on available work.
Will you run out of threads and have to wait for a thread to become available? It depends on your workload. You can get the maximum number of threads available via ThreadPool.GetMaxThreads(). Chances are (based on the description of your problem) that this number is sufficiently high.
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.getmaxthreads.aspx
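For reference, a quick way to see the limits on your machine:

int maxWorker, maxIo;
ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
Console.WriteLine("Max worker threads: {0}, max I/O threads: {1}", maxWorker, maxIo);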
Another option would be to manage your own pool of scan threads and assign them work rather than creating a new thread for every scan. Personally I would try the threadpool first and only manage your own threads if it proved necessary. Even better, I would look into async programming techniques in .NET. The methods will be run on the thread pool, but give you a much nicer programming experience than manual thread management.
If most of the time it is short running threads you could use the thread pool or a BackgroundWorker which draws threads from the pool.
An advantage I can see in your case is that the ThreadPool class puts an upper limit on the number of threads that may be active. Whether you will exhaust system resources depends on the context of your application, and exhausting a modern desktop system is really VERY hard to do.
If the software is used at a supermarket till, it is highly unlikely that you will have more than 5 barcodes being analysed at the same time. If it's run on a back-end server for a whole row of supermarket tills, then perhaps 30-100 concurrent requests might be active.
With this sort of theorycrafting it is highly unlikely that you will run out of threads, even on embedded hardware. If you have a dozen or so requests active at a time, and your code works, it's OK to just leave it as it is.
A thread pool is just an abstraction, though, and you could have a queue in the middle that queues requests onto the thread pool; in the row-of-tills example above, I'd feel comfortable queueing 100-1000 requests against a thread pool with 10 threads.
In .net (and on windows in general), the question should always be reversed: "Is creating a new thread worth it in this scenario?"
Creating a new thread is expensive, and doing it over and over again is almost certainly not worth it. The thread pool is cheap, and really should be the first thing you turn to when you need a new thread.
If you decide to spin up a new thread, soon you will start worrying about re-using the thread if it's already running. Then you will start worrying that sometimes the thread is running but it seems to be taking too long, and so you should make a new one. Then you're going to decide to have a thread not exit immediately upon finishing work, but to wait a little while in case new work comes in. And then... bam! You've created your own thread pool. At which point you should just back up and use the system-provided one.
The folks who mentioned that the thread pool might "run out of threads" were well-intentioned, but they did you a disservice. The limit on the number of threads in the thread pool is quite large. If you run into it, you have other problems.
(And, of course, since .net 2.0, you can set the maximum number of threads, so you can tweak the number if you absolutely have to.)
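If you ever did need to tweak it, it is only a couple of lines (the numbers below are purely illustrative; measure before changing them):

int worker, io;
ThreadPool.GetMaxThreads(out worker, out io);
ThreadPool.SetMaxThreads(worker * 2, io);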
Others have directed you to MSDN: "The Managed Thread Pool". I will repeat that direction, as the article is good, but in my mind does not sell the thread pool hard enough. :)
I've read in many places that the .NET ThreadPool is meant for short-lived tasks (maybe not more than 3 seconds). In all these mentions I've not found a concrete reason why it should not be used.
Some people even said that it leads to nasty results if we use it for long-running tasks, and that it can lead to deadlocks.
Can somebody explain, in plain English, the technical reason why we should not use the thread pool for long-running tasks?
To be specific, I would even like to give a scenario and want to know why the ThreadPool should not be used in this scenario, with proper reasons behind it.
Scenario: I need to process some thousands of users' data. Each user's processing data is retrieved from a local database; using that information I need to connect to an API hosted at some other location, and the response from the API will be stored in the local database after processing.
Can someone explain the pitfalls in this scenario if I use the ThreadPool with a thread limit of 20? The processing time for each user may range from 3 seconds to 1 minute (or more).
The point of the threadpool is to avoid the situation where the time spent creating the thread is longer than the time spent using it. By reusing existing threads, we get to avoid that overhead.
The downside is that the threadpool is a shared resource: if you're using a thread, something else can't. So if you have lots of long-running tasks, you could end up with thread-pool starvation, possibly even leading to deadlock.
Don't forget that your application's code may not be the only code using the thread pool... the system code uses it a lot too.
It sounds like you might want to have your own producer/consumer queue, with a small number of threads processing it. Alternatively, if you could talk to your other service using an asynchronous API, you may find that each bit of processing on your computer would be short-lived.
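For example, if the remote API is plain HTTP, the classic Begin/End asynchronous pattern keeps threads unblocked while the request is in flight; the URL and the processing step below are illustrative:

using System.IO;
using System.Net;

var request = (HttpWebRequest)WebRequest.Create("http://example.com/api/process-user");
request.BeginGetResponse(ar =>
{
    using (var response = (HttpWebResponse)request.EndGetResponse(ar))
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string body = reader.ReadToEnd();
        // store the result in the local database here
    }
}, null);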
It is related to the way the thread pool scheduler works. It tries hard to ensure that it won't release more waiting threads than you have CPU cores. Which is a good idea: running more threads than you have cores is wasteful, as Windows spends time context-switching between threads, making the overall time needed to complete the jobs longer.
As soon as a TP thread completes, another one is allowed to run. Twice per second, the TP scheduler steps in when the running threads do not complete. It cannot tell why those threads are taking so much time to get their job done. Half a second is a lot of CPU cycles, a cool billion or so. It therefore assumes that the threads are blocked, waiting for some kind of I/O to complete: a database query, a disk read, a socket connection attempt, stuff like that.
And it allows another thread to run. You've now got more threads than you have cores. Which isn't really a problem if those original threads are indeed blocking; they're not consuming any CPU cycles.
You can see where this leads: if your thread runs for 3 seconds then it's creating a bit of a logjam. It delays, but won't block, other TP threads that are waiting to run. If your thread needs to spend so much time because it is constantly blocking then you are better off creating a regular Thread. And if you really care that the thread does not get delayed by the TP scheduler then you should use a Thread as well.
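For comparison, giving a long-running job its own thread is only a couple of lines (ProcessUser is an illustrative method name):

var worker = new Thread(ProcessUser) { IsBackground = true };
worker.Start();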
The TP scheduler was tinkered with in .NET 4.0, by the way; what I wrote is really only true for earlier releases. The basics are still there, it just uses a smarter scheduling algorithm: it schedules dynamically based on feedback, measuring throughput. This really only matters if you have a lot of TP threads going.
Two reasons not really touched upon:
The threadpool is used as the normal means of handling I/O callback functions, which are usually supposed to happen very soon after associated I/O operation completes. In general, timeliness is more important with short tasks than long ones, but long-running tasks in the threadpool will delay the execution of notification tasks which could have (and should have) started up, run, and completed quickly.
If a threadpool task becomes blocked until such time as some other threadpool task runs, it may hog a threadpool thread, thus delaying or in some cases blocking altogether the start of that other task (or any others).
Generally, having a threadpool thread acquire a lock (waiting if necessary) isn't a problem. If it's necessary for one threadpool thread to wait for another threadpool thread to release a lock, the fact that the latter thread acquired the lock in the first place implies that it got started. On the other hand, waiting for, e.g., some data to arrive from a connection may cause deadlock if an I/O callback routine is used to flag the arrival of data. If too many threadpool threads are waiting for the I/O callback to signal that data has arrived, the system may decide to defer the callback until one of the threadpool threads completes.
So I have a program that serves as a sort of 'shell' for other programs. At its core, it gets passed a class, a method name, and some args, and handles the execution of the function, with the idea of allowing other programmers to basically schedule their processes to run on this shell service. Everything works fine except for one issue. Frequently, these processes that are scheduled for execution are very CPU heavy. At times, the processes called start using so much of the CPU, that the threads that I have which are responsible for checking schedules and firing off other jobs don't get a chance to run for quite some time, resulting in scheduling issues and a lack of sufficient responsiveness. Unfortunately, I can't insert Thread.Sleep() calls in the actual running code, as I don't really 'own' it.
So my question is: Is it possible to force an arbitrary thread (that I started) to sleep (yield) every so often, without modifying the actual code running in that thread? Barring that, is there some way to 'inject' Thread.Sleep() calls into code that I'm about to run dynamically, at run-time?
Not really. You could try changing the priority of the other thread with the Priority property, but you can't really inject a "sleep" call.
There's Thread.Suspend() but I strongly recommend you don't use it. You don't really want to get another thread to suspend at arbitrary times, as it might hold important resources. (That's why the method is marked as obsolete and has all kinds of warnings around it :)
Reducing the priority of the other tasks and potentially boosting the priority of your own tasks is probably the best approach.
No, there is not (to my knowledge); you can, however, change the scheduling priority of a thread - http://msdn.microsoft.com/en-us/library/system.threading.thread.priority.aspx.
Lower the priority of the thread that is running the problem code as you start it (or raise the priority of your scheduling threads).
You can suspend a thread with Thread.Suspend (deprecated and not recommended) or you can lower its priority by changing Thread.Priority.
Actually, what you are stating is very much a security risk and thus isn't going to be supported by most platforms. It's generally not a good policy to allow other programs to affect your status without your permission.
If possible, you could probably load the program in question into its own process that you create from your code, and then you can set that process's priority or make it sleep, or what have you. I am no expert in Windows programming, so your mileage may vary.
You should not be doing this. Threads should not Sleep() unless they themselves decide it's a good idea.
It's not relevant to your problem anyway. Your problem apparently is that the management threads do not run, due to CPU starvation. To fix that, it's sufficient to give those threads a higher priority. At the end of each timeslice, the OS determines which threads should run next. Thread priority is somewhat dynamic: threads that haven't run for a while move up relative to threads that just ran. But the higher base priority of your management threads will mean they don't need this dynamic boost to still get scheduled.
If this is still not sufficient, lower the priority of the worker threads. This will mean the worker thread has to be dormant for even more time before it gets a new timeslice.
Finally, make sure that the thread kicking off new worker threads runs at a very low priority. This will mean that no new threads are created if the system is hitting CPU limits. This implements a simple but robust throttling mechanism.
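A minimal sketch of that priority scheme (the thread variables are illustrative; the exact levels are a judgment call for your workload):

using System.Threading;

schedulerThread.Priority = ThreadPriority.AboveNormal;   // management/scheduling threads
workerThread.Priority    = ThreadPriority.BelowNormal;   // CPU-heavy guest code
spawnerThread.Priority   = ThreadPriority.Lowest;        // the thread that starts new workers

// Or, from inside a worker thread you started, before handing control to the guest code:
Thread.CurrentThread.Priority = ThreadPriority.BelowNormal;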