I helped a client with an application that stopped doing its work after a while (it eventually started working again).
The problem was that when a Task failed, it called Thread.Sleep for 5 seconds (inside the task). As there could be up to 800 tasks queued every two seconds, you can imagine the problem if many of those jobs failed and invoked Thread.Sleep. None of those jobs were marked with TaskCreationOptions.LongRunning.
I rewrote the tasks so that Thread.Sleep (or Task.Delay, for that matter) wasn't necessary.
However, I'm interested in what the default TaskScheduler did in that scenario. How and when does it increase the number of threads?
According to MSDN:
Behind the scenes, tasks are queued to the ThreadPool, which has been
enhanced with algorithms (like hill-climbing) that determine and
adjust to the number of threads that maximizes throughput.
The ThreadPool will have a maximum number of threads depending on the environment. As you create more Tasks the pool can run them concurrently until it reaches its maximum number of threads, at which point any further tasks will be queued.
If you want to find out the maximum number of ThreadPool threads you can use System.Threading.ThreadPool.GetMaxThreads (you need to pass in two out int parameters, one that will be populated with the number of maximum worker threads and another that will be populated with the maximum number of asynchronous I/O threads).
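A minimal sketch of reading those limits. The class name PoolLimits is made up for illustration; GetMaxThreads and GetMinThreads are the real ThreadPool methods, and the numbers they report depend on the machine and runtime.

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Reads the configured bounds of the thread pool (worker and I/O threads).
    public static (int MaxWorker, int MaxIo, int MinWorker, int MinIo) Read()
    {
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        return (maxWorker, maxIo, minWorker, minIo);
    }

    // Prints the bounds; the actual values vary by environment.
    public static void Print()
    {
        var limits = Read();
        Console.WriteLine($"max worker={limits.MaxWorker}, max I/O={limits.MaxIo}");
        Console.WriteLine($"min worker={limits.MinWorker}, min I/O={limits.MinIo}");
    }
}
```

Calling PoolLimits.Print() at startup is usually enough to see what the environment gives you.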
If you want to get a better idea of what is happening in your application at runtime you can use Visual Studio's threads window by going to Debug -> Windows -> Threads (The entry will only be there when you are debugging so you'll need to set a break point in your application first).
This post is possibly of interest. It would seem the default task scheduler simply queues the task up in the ThreadPool's queue unless you use TaskCreationOptions.LongRunning. This means that it's up to the ThreadPool to decide when to create new threads.
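A small sketch that makes the difference visible, assuming the default scheduler: it starts a task with the given options and reports whether the task body ran on a thread-pool thread. The class and method names are made up for illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningCheck
{
    // Starts a task on the default scheduler and reports whether its body
    // executed on a thread-pool thread.
    public static bool RunsOnPoolThread(TaskCreationOptions options)
    {
        Task<bool> task = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);
        return task.Result;
    }
}
```

With TaskCreationOptions.None this should report true, and with TaskCreationOptions.LongRunning it should report false, because the default scheduler gives long-running tasks a dedicated thread.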
I am trying to figure out exactly what impact ThreadPool.SetMinThreads has.
The official documentation says:
Sets the minimum number of threads the thread pool creates on demand, as new requests are made, before switching to an algorithm for managing thread creation and destruction.
As I understand it, as a developer I'm supposed to have control over how new threads are spun up on demand, so that they are created and waiting in an idle state, for example when I'm expecting a load of requests at a specific time.
And this is exactly what I initially thought the SetMinThreads method was designed for.
But when I actually started playing with it, I got really weird results.
I have an ASP.NET application on .NET 5, and in a controller action I have code like this:
ThreadPool.SetMinThreads(32000, 1000);
And of course I intuitively expect the runtime to create 32K worker threads and 1000 I/O threads for me.
When I do that and then call Process.GetCurrentProcess().Threads to get all of the process's threads and print statistics on them, I get something like this:
Standby - 17
Running - 4
I thought that maybe the app needed some time to spin up new threads, so I tried different delays: 1 minute, 5 minutes and 10 minutes.
But the result always stays the same: I get 15-20 Standby and 2-4 Running.
So the logical question is: what exactly does the SetMinThreads method do? The description provided by MSDN does not seem very helpful.
And another logical question: if I wanted to force .NET to spin up 32K new threads in an idle state, does it provide any mechanism for that at all?
The ThreadPool.SetMinThreads sets the minimum number of threads that the ThreadPool creates instantly on demand. That's the key phrase, and it is indeed quite unintuitive. The ThreadPool currently¹ (.NET 5) works in two modes:
When a new request for work arrives, and all the threads in the pool are busy, create instantly a new thread in order to satisfy the request.
When a new request for work arrives, and all the threads in the pool are busy, queue the request, and wait for 1 sec before creating a new thread, hoping that in the meantime one of the worker threads will complete its current work, and will become available for serving the queued request.
The ThreadPool.SetMinThreads sets the threshold between these two modes. It does not give you control over the number of threads that are alive right now. Which is not very satisfying, but it is what it is. If you want to force the ThreadPool to create 1,000 threads instantly, you must also send an equal number of requests for work, in addition to calling ThreadPool.SetMinThreads(1000, 1000). Something like this should do the trick:
ThreadPool.SetMinThreads(1000, 1000);
Task[] tasks = Enumerable.Range(0, 1000)
.Select(_ => Task.Run(() => Thread.Sleep(100)))
.ToArray();
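To check whether the pool really grew, one option is to read ThreadPool.ThreadCount (available from .NET Core 3.0 onward) while the work items are still parked. The sketch below does that. The class name and the 10-second timeout are arbitrary choices for illustration, and the count it returns depends on the machine.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PoolWarmUp
{
    // Raises the worker minimum to at least `count`, parks `count` work items
    // on the pool, and returns how many pool threads exist at that moment.
    public static int Run(int count)
    {
        ThreadPool.GetMinThreads(out int oldWorker, out int oldIo);
        ThreadPool.SetMinThreads(Math.Max(count, oldWorker), oldIo);

        var tasks = new Task[count];
        using (var started = new CountdownEvent(count))
        using (var release = new ManualResetEventSlim(false))
        {
            for (int i = 0; i < count; i++)
            {
                tasks[i] = Task.Run(() =>
                {
                    started.Signal(); // this work item has a thread
                    release.Wait();   // hold the thread until released
                });
            }

            // Wait (with a timeout) until every work item is running.
            started.Wait(TimeSpan.FromSeconds(10));
            int threads = ThreadPool.ThreadCount;

            release.Set();
            Task.WaitAll(tasks);
            ThreadPool.SetMinThreads(oldWorker, oldIo); // restore the old minimum
            return threads;
        }
    }
}
```

For example, PoolWarmUp.Run(1000) would be expected to return a number close to 1,000 if the pool created the threads on demand.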
Honestly I don't think that anyone does that. Creating a new Thread is quite fast in human time (it requires around 0.25 milliseconds per thread on my PC), so for a system that receives requests from humans, the overhead of creating a thread shouldn't have any measurable impact. On the other hand 0.25 msec is an eon in computer time, when you want a thread to do a tiny amount of work (in the range of nanoseconds), like adding something to a List<T>. That's why the ThreadPool was invented in the first place: to amortize the overhead of thread creation for tiny but numerous workloads.
Be aware that creating a new Thread also has a memory cost, which is generally more significant than the time cost: each thread requires at least 1 MB of RAM for its stack. So creating 32,000 threads will tie down 32 GB of memory just for stack space. This is not very efficient. That's why in recent years asynchronous programming has become so prominent in server-side web development: it allows you to do more work with fewer threads.
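As a rough illustration of "more work with fewer threads", the sketch below starts many concurrent waits with Task.Delay. No thread is blocked while a delay is pending, so the pool does not have to grow to match the number of waits. The class and method names are made up for illustration.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncWaits
{
    // Starts `count` concurrent asynchronous waits and returns how many completed.
    public static async Task<int> RunAsync(int count, int delayMs)
    {
        int completed = 0;
        Task[] waits = Enumerable.Range(0, count)
            .Select(async _ =>
            {
                await Task.Delay(delayMs);            // no thread is held here
                Interlocked.Increment(ref completed); // runs briefly on a pool thread
            })
            .ToArray();

        await Task.WhenAll(waits);
        return completed;
    }
}
```

Compare this with the Thread.Sleep version, where each of the 1,000 waits occupies a thread for the whole duration.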
¹ There is nothing preventing the Microsoft engineers from changing/tweaking the implementation of the ThreadPool in the future. AFAIK this has already happened at least once in the past.
We have a .ForEach loop (TPL) which starts many, many, many Tasks.
Since the TPL is consuming threads from the thread pool I am wondering what will happen when there are no more threads available?
Will the calling code block until threads are available again?
I know the threadpool has a global work queue where work items (Task) will be queued. Can that queue ever be full?
Our problem is that some of the tasks are long running (30 minutes) and some are short (a second), but we have thousands of such Tasks, if not more. Does the TPL start a new Thread for each Task I start? I think not. At what point will the thread pool be exhausted?
When there are no more free threads, several algorithms kick in. The main one is that the ThreadPool will slowly create extra threads (max 2/second).
This helps to address your situation with long-running tasks, but the system is not perfect. Beware of a situation where hundreds of threads are created; your app will probably crash.
A first approach would be to specify a MaxDegreeOfParallelism (via ParallelOptions) on the ForEach. You want to limit the number of threads to numberOfCores * someFactor, where someFactor depends on the I/O the Tasks perform.
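A minimal sketch of that cap, using ParallelOptions.MaxDegreeOfParallelism. The class name, the 10 ms sleep standing in for real work, and the peak counter are illustrative only. The method returns the highest concurrency it observed, which should not exceed the cap.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedLoop
{
    // Runs a body for every item with a concurrency cap and returns the
    // highest number of bodies that were running at the same time.
    public static int ProcessAll(IEnumerable<int> items, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        int current = 0, peak = 0;

        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref current);

            // Raise the recorded peak if this is a new maximum.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
                Interlocked.CompareExchange(ref peak, now, seen);

            Thread.Sleep(10); // placeholder for the real work
            Interlocked.Decrement(ref current);
        });

        return peak;
    }
}
```

A typical starting value for maxParallel would be Environment.ProcessorCount multiplied by the factor mentioned above.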
You could also investigate custom TPL schedulers, though I don't know much about them.
I have a C# Windows Service that starts up various objects (Class libraries). Each of these objects has its own "processing" logic that start up multiple long running processing threads by using the ThreadPool. I have one example, just like this:
System.Threading.ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(WorkerThread_Processing));
This works great. My app works with no issues, and my threads work well.
Now, for regression testing, I am starting those same objects up, but from a C# Console app rather than a Windows Service. It calls the same exact code (because it is invoking the same objects), however the WorkerThread_Processing method delays for up to 20 seconds before starting.
I have gone in and switched from the ThreadPool to a Thread, and the issue goes away. What could be happening here? I know that I am not over the MaxThreads count (I am starting 20 threads max).
The ThreadPool is specifically not intended for long-running items (more specifically, you aren't even necessarily starting up new threads when you use the ThreadPool, as its purpose is to spread the tasks over a limited number of threads).
If your task is long running, you should either break it up into logical sections that are put on the ThreadPool (or use the new Task framework), or spin up your own Thread object.
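A minimal sketch of the "spin up your own Thread" option. The helper name is hypothetical. The thread is marked as a background thread so it does not keep the process alive on shutdown, which may or may not be what a given service wants.

```csharp
using System;
using System.Threading;

public static class DedicatedWorker
{
    // Starts long-running work on its own thread instead of a pool thread.
    public static Thread Start(Action work, string name)
    {
        var thread = new Thread(() => work())
        {
            IsBackground = true, // does not block process exit
            Name = name          // makes it easy to spot in the Threads window
        };
        thread.Start();
        return thread;
    }
}
```

Because the thread is created and started immediately, the work does not wait for the pool to decide to inject another thread.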
As to why you're experiencing the delay, the MSDN Documentation for the ThreadPool class says the following:
As part of its thread management strategy, the thread pool delays before creating threads. Therefore, when a number of tasks are queued in a short period of time, there can be a significant delay before all the tasks are started.
You only know that the ThreadPool hasn't reached its maximum thread count, not how many threads (if any) it actually has sitting idle.
The thread pool's maximum number of threads value is the maximum number that it can create. It is not the maximum number that are already created. The thread pool has logic that prevents it from spinning up a whole bunch of threads instantly.
If you call ThreadPool.QueueUserWorkItem 10 times in quick succession, the thread pool will not create 10 threads immediately. It will start a thread, delay, start another, etc.
I seem to recall that the delay was 500 milliseconds, but I can't find the documentation to verify that.
Here it is: The Managed Thread Pool:
The thread pool has a built-in delay (half a second in the .NET
Framework version 2.0) before starting new idle threads. If your
application periodically starts many tasks in a short time, a small
increase in the number of idle threads can produce a significant
increase in throughput. Setting the number of idle threads too high
consumes system resources needlessly.
You can control the number of idle threads maintained by the thread
pool by using the GetMinThreads and SetMinThreads
Note that this quote is taken from the .NET 3.5 version of the documentation. The .NET 4.0 version does not mention a delay.
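One way to see the injection delay on a given runtime is to queue a batch of blocking work items and record when each one actually starts. The sketch below does that. The names are made up, and the numbers it produces depend on the runtime version, core count, and current minimum thread setting.

```csharp
using System.Diagnostics;
using System.Threading;

public static class StartDelayProbe
{
    // Queues `count` work items that each block for `holdMs`, and returns the
    // elapsed milliseconds at which each item began executing.
    public static long[] Measure(int count, int holdMs)
    {
        var delays = new long[count];
        var clock = Stopwatch.StartNew();

        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                int index = i; // capture a stable copy for the closure
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    delays[index] = clock.ElapsedMilliseconds;
                    Thread.Sleep(holdMs); // keep the thread busy
                    done.Signal();
                });
            }
            done.Wait();
        }
        return delays;
    }
}
```

If the pool is throttling thread creation, the later entries in the returned array are noticeably larger than the first few.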
I need to optimize a WCF service... it's quite a complex thing. My problem this time has to do with tasks (Task Parallel Library, .NET 4.0). What happens is that I launch several tasks when the service is invoked (using Task.Factory.StartNew) and then wait for them to finish:
Task.WaitAll(task1, task2, task3, task4, task5, task6);
Ok... what I see, and don't like, is that on the first call (sometimes the first 2-3 calls, if made quickly one after another), the final task starts much later than the others (I am looking at a case where it started 0.5 seconds after the others). I tried calling
ThreadPool.SetMinThreads(12*Environment.ProcessorCount, 20);
at the beginning of my service, but it doesn't seem to help.
The tasks are all database-related: I'm reading from multiple databases and it has to take as little time as possible.
Any idea why the last task is taking so long? Is there something I can do about it?
Alternatively, should I use the thread pool directly? As it happens, in one case I'm looking at, one task had already ended before the last one started, so I would have saved 0.2 seconds if I had reused that thread instead of waiting for a new one to be created. However, I cannot be sure that that task will always end so quickly, so I can't put both requests in the same task.
[Edit] The OS is Windows Server 2003, so there should be no connection limit. Also, it is hosted in IIS. I don't know if I should create regular threads or use the thread pool: which is the preferred approach?
[Edit] I've also tried using Task.Factory.StartNew(action, TaskCreationOptions.LongRunning); - it doesn't help, the last task still starts much later (around half a second later) than the rest.
[Edit] MSDN says:
The thread pool has a built-in delay (half a second in the .NET Framework version 2.0) before starting new idle threads. If your application periodically starts many tasks in a short time, a small increase in the number of idle threads can produce a significant increase in throughput. Setting the number of idle threads too high consumes system resources needlessly.
However, as I said, I'm already calling SetMinThreads and it doesn't help.
I have had problems myself with delays in thread startup when using the (.Net 4.0) Task-object. So for time-critical stuff I now use dedicated threads (... again, as that is what I was doing before .Net 4.0.)
The purpose of a thread pool is to avoid the operating system cost of starting and stopping threads. The threads are simply being reused. This is a common model found in, for example, internet servers. The advantage is that they can respond quicker.
I've written many applications where I implement my own thread pool by having dedicated threads picking up tasks from a task queue. Note however that this most often requires locking, which can cause delays/bottlenecks. This depends on your design; if the tasks are small then there will be a lot of locking, and it might be faster to trade some CPU for less locking: http://www.boyet.com/Articles/LockfreeStack.html
SmartThreadPool is a replacement/extension of the .Net thread pool. As you can see in this link it has a nice GUI to do some testing: http://www.codeproject.com/KB/threads/smartthreadpool.aspx
In the end it depends on what you need, but for high performance I recommend implementing your own thread pool. If you experience a lot of thread idling then it could be beneficial to increase the number of threads (beyond the recommended cpucount*2). This is actually how HyperThreading works inside the CPU - using "idle" time while doing operations to do other operations.
Note that .Net has a built-in limit of 25 threads per process (ie. for all WCF-calls you receive simultaneously). This limit is independent and overrides the ThreadPool setting. It can be increased, but it requires some magic: http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=201
Following from my prior question (yep, should have been a Q against original message - apologies):
Why do you feel that creating 12 threads for each processor core in your machine will in some way speed-up your server's ability to create worker threads? All you're doing is slowing your server down!
As per the MSDN docs: "You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance."
Issues like this are usually caused by bumping into limits or contention on a shared resource.
In your case, I am guessing that your last task(s) is/are blocking while they wait for a connection to the DB server to come available or for the DB to respond. Remember - if your invocation kicks off 5-6 other tasks then your machine is going to have to create and open numerous DB connections and is going to kick the DB with, potentially, a lot of work. If your WCF server and/or your DB server are cold, then your first few invocations are going to be slower until the machine's caches etc., are populated.
Have you tried adding a little tracing/logging using the stopwatch to time how long it takes for your tasks to connect to the DB server and then execute their operations?
You may find that reducing the number of concurrent tasks you kick off actually speeds things up. Try spawning 3 tasks at a time, waiting for them to complete and then spawn the next 3.
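The batching idea could be sketched like this. The helper name and its signature are assumptions for illustration. Each batch is started, awaited as a whole, and only then is the next batch started, with results kept in input order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Runs the work items `batchSize` at a time: a batch is started, awaited
    // as a whole, and only then is the next batch started.
    public static async Task<List<T>> RunInBatchesAsync<T>(
        IReadOnlyList<Func<T>> work, int batchSize)
    {
        var results = new List<T>(work.Count);
        for (int start = 0; start < work.Count; start += batchSize)
        {
            Task<T>[] batch = work
                .Skip(start)
                .Take(batchSize)
                .Select(f => Task.Run(f))
                .ToArray();

            // WhenAll returns the results in the same order as the tasks.
            results.AddRange(await Task.WhenAll(batch));
        }
        return results;
    }
}
```

With six database reads and a batch size of 3, this runs two rounds of three concurrent reads instead of six at once.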
When you call Task.Factory.StartNew, it uses a TaskScheduler to map those tasks into actual work items.
In your case, it sounds like one of your Tasks is delaying occasionally while the OS spins up a new Thread for the work item. You could, potentially, build a custom TaskScheduler which already contained six threads in a wait state, and explicitly used them for these six tasks. This would allow you to have complete control over how those initial tasks were created and started.
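A minimal sketch of such a scheduler, assuming a fixed number of threads that are created up front and block on a queue until work arrives. DedicatedThreadScheduler is a made-up name, not a framework type, and the sketch leaves out error handling. Dispose must not be called from one of its own threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: runs tasks on a fixed set of pre-created threads.
public sealed class DedicatedThreadScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread[] _threads;

    public DedicatedThreadScheduler(int threadCount)
    {
        _threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            _threads[i] = new Thread(() =>
            {
                // Blocks while the queue is empty; ends after CompleteAdding.
                foreach (Task task in _queue.GetConsumingEnumerable())
                    TryExecuteTask(task);
            })
            { IsBackground = true, Name = "dedicated-" + i };
            _threads[i].Start();
        }
    }

    public override int MaximumConcurrencyLevel => _threads.Length;

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Never run a task on the caller's thread; always use the dedicated ones.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread thread in _threads) thread.Join();
        _queue.Dispose();
    }
}
```

Tasks are directed to it with a factory, for example new TaskFactory(scheduler).StartNew(...), so the six tasks never wait for the thread pool to create a thread.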
That being said, I suspect there is something else at play here... You mentioned that using TaskCreationOptions.LongRunning demonstrates the same behavior. This suggests that there is some other factor at play causing this half second delay. The reason I suspect this is due to the nature of TaskCreationOptions.LongRunning - when using the default TaskScheduler (LongRunning is a hint used by the TaskScheduler class), starting a task with TaskCreationOptions.LongRunning actually creates an entirely new (non-ThreadPool) thread for that Task. If creating 6 tasks, all with TaskCreationOptions.LongRunning, demonstrates the same behavior, you've pretty much guaranteed that the problem is NOT the default TaskScheduler, since this is going to always spin up 6 threads manually.
I'd recommend running your code through a performance profiler, and potentially the Concurrency Visualizer in VS 2010. This should help you determine exactly what is causing the half second delay.
What is the OS? If you are not running the server versions of windows, there is a connection limit. Your many threads are probably being serialized because of the connection limit.
Also, I have not used the task parallel library yet, but my limited experience is that new threads are cheap to make in the context of networking.
These articles might explain the problem you're having:
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-are-wcf-responses-slow-and-setminthreads-does-not-work.aspx
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-does-wcf-become-slow-after-being-idle-for-15-seconds.aspx
Seeing as you're using .NET 4, the first article probably doesn't apply, but as the second article points out, the ThreadPool terminates idle threads after 15 seconds, which might explain the problem you're having. It also offers a simple (though a little hacky) solution to get around it.
Whether or not you should be using the ThreadPool directly wouldn't make any difference as I suspect the task library is using it for you underneath anyway.
One third-party library we have been using for a while might help you here - Smart Thread Pool. You still get the same benefits of using the task libraries, in that you can have the return values from the threads and get any exception information from them too.
Also, you can instantiate separate thread pools when you have multiple places that each need one (so that a low-priority process doesn't start eating into the quota of some high-priority process). And you can set the priority of the threads in the pool too, which you can't do with the standard ThreadPool, where all the threads are background threads.
You can find plenty of info on the codeplex page, I've also got a post which highlights some of the key differences:
http://theburningmonk.com/2010/03/threading-introducing-smartthreadpool/
Just on a side note, for tasks like the one you've mentioned, which might take some time to return, you probably shouldn't be using the threadpool anyway. It's recommended that we should avoid using the threadpool for any blocking tasks like that because it hogs up the threadpool which is used by all sorts of things by the framework classes, like handling timer events, etc. etc. (not to mention handling incoming WCF requests!). I feel like I'm spamming here but here's some of the info I've gathered around the use of the threadpool and some useful links at the bottom:
http://theburningmonk.com/2010/03/threading-using-the-threadpool-vs-creating-your-own-threads/
well, hope this helps!
I have several low-importance tasks to be performed when some CPU time is available. I don't want these tasks to run if other, more important tasks are running. That is, if a normal/high-priority task comes in, I want the low-importance task to pause until the important task is done.
There is a pretty big number of low-importance tasks to be performed (50 to 1000), so I don't want to create one thread per task. However, I believe that the thread pool does not allow a priority to be specified, does it?
How would you solve this?
You can new up a Thread and use a Dispatcher to send it tasks of various priorities.
The priorities are a bit UI-centric but that doesn't really matter.
You shouldn't mess with the priority of the regular ThreadPool, since you aren't the only consumer. I suppose the logical approach would be to write your own - perhaps as simple as a producer/consumer queue, using your own Thread(s) as the consumer(s) - setting the thread priority yourself.
.NET 4.0 includes new libraries (the TPL etc) to make all this easier - until then you need additional code to create a custom thread pool or work queue.
When you are using the built-in ThreadPool, all threads execute with the default priority. If you mess with this setting it will be ignored. This is a case where you should roll your own ThreadPool. A few years ago I extended the SmartThreadPool to meet my needs. This may satisfy yours as well.
I'd create a shared Queue of pending task objects, with each object specifying its priority. Then write a dispatcher thread that watches the Queue and launches a new thread for each task, up to some max thread limit, and specifying the thread priority as it creates it. It's only a small amount of work to do that, and you can have the dispatcher report activity and even dynamically adjust the number of running threads. That concept has worked very well for me, and can be wrapped in a Windows service to boot if you make your queue a database table.
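A simplified sketch of the producer/consumer variant suggested in the answers above: a fixed number of your own threads, created with a lowered priority, consuming from a shared queue. LowPriorityWorkQueue is a made-up name. It uses a single priority for all items rather than a per-item one, and how strongly thread priority affects scheduling is up to the operating system.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical work queue drained by a few below-normal-priority threads.
public sealed class LowPriorityWorkQueue : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread[] _workers;

    public LowPriorityWorkQueue(int workerCount)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(Consume)
            {
                IsBackground = true,
                Priority = ThreadPriority.BelowNormal // yields to normal-priority threads
            };
            _workers[i].Start();
        }
    }

    public void Enqueue(Action item) => _work.Add(item);

    private void Consume()
    {
        // Blocks while the queue is empty; ends after CompleteAdding.
        foreach (Action item in _work.GetConsumingEnumerable())
        {
            try { item(); }
            catch (Exception ex) { Console.Error.WriteLine(ex); } // keep the worker alive
        }
    }

    // Stops accepting work, lets the workers drain the queue, then waits for them.
    public void Dispose()
    {
        _work.CompleteAdding();
        foreach (Thread worker in _workers) worker.Join();
        _work.Dispose();
    }
}
```

The 50 to 1000 low-importance jobs are then enqueued as plain delegates, and only workerCount threads are ever created for them.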