ThreadPool SetMinThreads - the impact of setting it - c#

I am trying to understand the impact of setting ThreadPool.SetMinThreads. I have multiple virtual applications running in one Azure App Service. My understanding is that all these virtual applications will share the App Pool and will have only one worker process (assuming the App Pool's maximum worker process count is 1).
I have the two questions below.
1. In this setup, if I set ThreadPool.SetMinThreads to, let's say, 100 worker threads and 100 IO threads, can I safely assume that each app domain will have 100 worker threads and 100 IO threads when it is loaded? To be precise, does ThreadPool.SetMinThreads apply within the AppDomain, the worker process, or the App Pool? What is the scope of the ThreadPool?
2. I also assume there is no limit on the maximum number of threads the system can spawn, as that is determined by the underlying host's capacity. This means that if I do not explicitly set ThreadPool.SetMaxThreads, the system will keep spawning new threads under continuous load until CPU/memory usage spikes to the maximum. I am basing my assumption on the statement below:
Process and threads, for example, require physical memory, virtual
memory, and pool memory, so the number of processes or threads that
can be created on a given Windows system is ultimately determined by
one of these resources, depending on the way that the processes or
threads are created and which constraint is hit first.
https://blogs.technet.microsoft.com/markrussinovich/2009/07/05/pushing-the-limits-of-windows-processes-and-threads/

The MinThreads governs how many worker threads will be spawned without a delay.
Whenever you do something that requires a thread from the thread pool (whether worker or IOCP pool), the system will first see if there is a free thread.
If not, it looks to see how many threads are currently spawned. If that number is less than MinThreads, it immediately spawns a new thread. Otherwise it waits a short time, usually around 300-500ms, though that is system dependent. If there is still no free thread, it will then spawn a new thread.
Of course, this all is still limited by MaxThreads.
All that said, IIS is very good at figuring out a sensible number based on your machine and in most cases you are best to leave it alone; if you are just worried about serving requests then I wouldn't touch it personally. If, on the other hand, you are spawning a lot of background tasks yourself then it may be sensible. I'd strongly encourage you to measure it before you actually make changes.
That said, setting MinThreads to 100 is rarely harmful, especially as the system only starts the threads it actually needs anyway.
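As a rough illustration of the calls involved, the sketch below raises the pool minimums to 100 worker and 100 IO completion threads (the figure from the question) and prints the values before and after. The class and method names are made up for this example, and the numbers printed depend on the machine and runtime. The managed thread pool is documented as one pool per process, so the setting is not per AppDomain.

```csharp
using System;
using System.Threading;

public static class MinThreadsDemo
{
    // Raises the pool minimums to at least the requested values.
    // Returns false if the runtime rejected the request.
    public static bool Apply(int worker, int io)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIo);
        // Only ever raise the minimums; never lower what the runtime chose.
        return ThreadPool.SetMinThreads(
            Math.Max(worker, currentWorker),
            Math.Max(io, currentIo));
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int beforeWorker, out int beforeIo);
        Console.WriteLine("Before: " + beforeWorker + " worker, " + beforeIo + " IO");

        bool accepted = Apply(100, 100);

        ThreadPool.GetMinThreads(out int afterWorker, out int afterIo);
        Console.WriteLine("Accepted: " + accepted);
        Console.WriteLine("After: " + afterWorker + " worker, " + afterIo + " IO");
    }
}
```

Raising the minimum does not create 100 threads up front; it only removes the injection delay until that many threads exist.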

Related

threading higher priority for a not known in advance thread

I create about 5000 background workers that do intensive work in a console app. I'm also using an external library that instantiates an object, say ObjectX. At some point, say t0, ObjectX tries to obtain a thread from an OS thread pool and start it, but I have no control over how it obtains this thread. Things work fine for 100 background workers. For 1000 background workers it takes about 10 minutes after t0 for ObjectX to obtain and start a thread.
1. Is there a way to set, in advance, a high priority for any threads that will be started in the future by an object?
2. Since I think the answer to 1 is "no", is there a way to lower the priority of the background workers so as to somehow favor everything else, even though I only want to favor ObjectX?
The goal would be to always have available resources to run the thread launched by ObjectX, no matter how overloaded the machine is.
I'm using C# and the .NET Framework 3.5 on a 64-bit Windows machine.
Threads are given processor time by the OS. Taking the CPU away from one thread and giving it to another is called a context switch, which costs about 2,000-8,000 cycles depending on the processor. If the machine has many CPUs or cores, the OS may not need to take the CPU away from one thread to run another, which avoids a context switch. Only one thread can run per CPU at a time, so when more threads need CPU time than there are CPUs, you force context switches. Context switches are performed no faster than the system quantum (every 20 ms for client and 120 ms for server).
If you have 5000 background workers, you effectively have 5000 threads, each potentially vying for CPU time. On a client version of Windows, that means 250,000 context switches per second, i.e. 500,000,000 to 2,000,000,000 cycles per second devoted simply to switching between threads (over and above the work your threads are performing), if the system could even process that many context switches per second.
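The arithmetic behind those figures can be reproduced with a few lines. This sketch uses only the answer's own numbers (5000 threads, a 20 ms client quantum, 2,000-8,000 cycles per switch) and assumes, as the answer does, that every thread is switched in once per quantum. The class and method names are invented for the example.

```csharp
using System;

public static class ContextSwitchEstimate
{
    // One switch per thread per quantum: threads * (quanta per second).
    public static long SwitchesPerSecond(int threads, int quantumMs)
    {
        return (long)threads * (1000 / quantumMs);
    }

    public static void Main()
    {
        long switches = SwitchesPerSecond(5000, 20);
        Console.WriteLine(switches);          // 250000
        Console.WriteLine(switches * 2000);   // 500000000
        Console.WriteLine(switches * 8000);   // 2000000000
    }
}
```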
The recommended practice is to only have one CPU-bound thread per processor. A CPU-bound thread is one that spends very little time "waiting". The UI thread is not a CPU-bound thread. If your background workers are spending a lot of time waiting for locks, then they may not be CPU-bound either--but, in general, background worker threads are CPU-bound. (otherwise, what would be the point of using a background worker?).
Also, the OS spends a lot of time figuring out what thread needs to get the CPU next. When you start changing thread priorities you interfere with that and most of the time end up making your entire system slower (not just your application) rather than faster.
Update:
On a related note, it takes about 200,000 cycles to create a new thread and about 100,000 cycles to destroy a thread.
Update 2:
If the impetus of the question isn't simply "can it be done" but to be able to scale the workload, then, as @JoshW and @Servy mention, using something like the Producer/Consumer pattern would allow for scalability and could facilitate horizontal scaling to multiple computers/nodes via a queue or a service bus. Simply starting up an inordinate number of threads is not scalable beyond the number of CPUs. What you truly want is an architecture that can be scaled out, because "available resources...how overloaded the machine is" is simply impossible.
Personally I think this is a bad idea, however... given the comments you have made on other answers and your request that "No matter how many background workers are created that ObjectX runs as soon as possible"... You could conceivably force your background workers to block using a ManualResetEvent.
For example, at the top of your worker code you could block on a ManualResetEvent with the WaitOne method. This ManualResetEvent could be static or passed as an input parameter, and wherever your ObjectX gets instantiated or called, you call the Reset method on your ManualResetEvent. This would block all your workers at the WaitOne line. Next, at the bottom of the code that runs ObjectX, call the ManualResetEvent.Set() method, and that will unblock the workers.
Note this is NOT an efficient way to manage your threads, but if you "just have to make it work" and have time later to improve it... I suppose it's one possible solution.
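A minimal sketch of that gate, assuming a static ManualResetEvent and workers that check it once at the top of their code, as described above. The names WorkerGate, Gate and Run are hypothetical, and the Thread.Sleep call merely stands in for whatever code runs ObjectX.

```csharp
using System;
using System.Threading;

public static class WorkerGate
{
    // Signaled means workers may proceed. The gate starts open.
    private static readonly ManualResetEvent Gate = new ManualResetEvent(true);
    private static int _completed;

    private static void Worker()
    {
        Gate.WaitOne();                         // parks here while the gate is reset
        Interlocked.Increment(ref _completed);  // stand-in for the real work
    }

    // Returns { workers finished while the gate was closed,
    //           workers finished after it was reopened }.
    public static int[] Run(int workerCount)
    {
        _completed = 0;
        Gate.Reset();                           // close the gate before the ObjectX code

        var threads = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            threads[i] = new Thread(Worker);
            threads[i].Start();
        }

        Thread.Sleep(100);                      // stand-in for the code that runs ObjectX
        int whileClosed = Volatile.Read(ref _completed);

        Gate.Set();                             // reopen the gate
        foreach (Thread t in threads) t.Join();
        return new[] { whileClosed, _completed };
    }

    public static void Main()
    {
        int[] result = Run(8);
        Console.WriteLine(result[0] + " finished while closed, " + result[1] + " after reopening");
    }
}
```

Workers that are already past the WaitOne call when Reset is called keep running, so this only holds back work that has not started yet.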
The goal would be to always have available resources to run the thread launched by ObjectX, no matter how overloaded the machine is.
Then thread priorities might not be the right tool. Remember, thread priorities are evil.
In general, Windows is not a real-time OS; in particular, Win32 does not even attempt to be soft real-time (IIRC, the NT kernel tried at some point to support soft real-time subsystems, but I may be wrong). So there is no guarantee about available resources or timing.
Also, are you worried about other threads in the system? Those threads are out of your control (what if the other threads are already at the system max priority?).
If you are worried about threads in your app... you can control and throttle them by using fewer threads/workers to do more work (for example, batching work in bigger units and submitting it to a worker, or using the TPL or other tools that will handle and throttle thread usage for you).
That said, you could intercept thread creation (see, for example, this answer: https://stackoverflow.com/a/3802316/863564), check whether the thread was created for ObjectX (for example, by checking its name), and use SetThreadPriority to boost it.

multithread running on different processes or same process?

In my .NET multithreaded program, I am wondering whether all these threads run in the same process or in different processes.
If they run in the same process, and I assume one process runs on one core, then how can multithreading utilize all four cores of my quad-core CPU?
But if they run in different processes, and as far as I know different processes and the same process have different data-sharing mechanisms, then how come I don't need to write different code to handle this in my multithreaded program? Could anyone shed some light on this?
I want to ask two more similar questions.
When I open the Task Manager, I often see around 800 threads and 54 processes while my CPU usage is only 5%, and I was told that each core executes only one thread at a time.
Is my CPU running these 800 threads all the time, or does it only mean that 800 threads are queued, waiting for the CPU to process them?
If I want my multithreaded program to fully utilize my quad-core CPU, can I raise the CPU usage by creating more threads? (This seems to contradict the theory that a core runs only one thread at a time.)
Multithreading means multiple threads in the same process.
Each thread can be assigned to a different core.
But all the threads belong to the same process; for example, if one of the threads throws an unhandled exception, the process will crash along with all its threads.
You can read more about it by searching Google or Wikipedia for "Software Multithreading".
A single process may use a number of threads; even a basic .NET "hello world" console exe probably uses 4 or 5. So yes, a single process can potentially use all your available cores if you write it to do so.
Because it is the same process, data sharing is direct, but: care must be taken if you are changing the values, as otherwise very bad things can happen. Access must be carefully synchronized (lock etc) if you are changing the data within the threaded code.
You do, however, usually have to write different code to support multiple threads. The exception is when the framework does that for you: for example, ASP.NET or WCF may take incoming requests and hand them to different worker threads, allowing multiple concurrent operations even though you didn't explicitly code it that way. This means that in ASP.NET or WCF you need to be careful with shared state, for exactly the reasons already discussed.
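To make the point about synchronized access concrete, here is a small sketch in which several threads of one process increment the same counter under a lock. The names SharedCounterDemo and Sync are invented for the example; without the lock statement the final count would usually come out lower, because increments from different threads would overwrite each other.

```csharp
using System;
using System.Threading;

public static class SharedCounterDemo
{
    private static readonly object Sync = new object();
    private static int _counter;

    // Starts threadCount threads that each increment the shared counter
    // incrementsPerThread times, then returns the final value.
    public static int Run(int threadCount, int incrementsPerThread)
    {
        _counter = 0;
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < incrementsPerThread; n++)
                {
                    lock (Sync) { _counter++; }   // one thread at a time
                }
            });
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join();
        return _counter;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 100000));   // 400000
    }
}
```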
As a minor addition, note also that a process can support multiple AppDomains; in that scenario, the threads for the process are shared between all the AppDomains at whim by the scheduler.
Threads created by that process are part of that process. Different threads within the one process can and often do run on different processors or processor cores.
In my .NET multithreaded program, I am wondering whether all these threads
run in the same process or in different processes.
A thread always runs in a process, however, multiple threads can run in a single process and each thread can be handled by a different core.
If you have a single core, it doesn't mean that it can't run multiple threads; it just means that the core can't execute multiple threads at the same time. With two threads on one core, execution proceeds like this:
Thread #1 executes for some time.
Thread #1 "stops".
Thread #2 executes for some time.
Thread #2 "stops".
Thread #1 executes for some time, again.
This illustrates what happens when a core runs multiple threads: the core only executes one thread at a time, but in order for both threads to run, the core must perform context switching. In other words: the core runs a few commands from Thread 1, switches to Thread 2 and runs a few commands from it, then it switches back to Thread 1 to execute some more commands.
Juggling Oranges:
A good metaphor is juggling oranges: technically, you only have two hands and you can only hold one orange in each hand at a time, so the maximum you can hold is two oranges. In this case the taxing part is holding the oranges. However, if you throw an orange up in the air, then you can hold a 3rd orange while the 2nd one is in the air. The higher you throw the oranges, the more oranges you can juggle. To be more precise: the longer it takes for an orange to come back to your hand, the more oranges you can juggle. Of course, you probably can't juggle an enormous number of oranges, because throwing an orange requires more energy than simply holding it.
In essence, your CPU is juggling threads: the longer a thread stays away from executing code on the CPU, the more threads a CPU can "juggle." If a thread is waiting on I/O (e.g. a database request), then the CPU can execute the code of another thread in the meantime. This is the same reason why you see 54 processes and 800 threads in the task manager: many of those threads are doing things that are not CPU-bound.
Sleep:
Is my CPU running these 800 threads all the time, or does it only mean that 800
threads are queued, waiting for the CPU to process them?
Many of the threads you're noticing in your task manager are idle/sleeping, so they use very little (if any) CPU. However, the ones that are running are executed with context switching (if there are more threads than cores, which is the case most of the time). There are many things that can cause a thread to idle/sleep, see the orange juggling for an example.
CPU Utilization:
If I want my multithreaded program to fully utilize my quad-core CPU, can
I raise the CPU usage by creating more threads? (This seems to contradict the
theory that a core runs only one thread at a time.)
It gets tricky :). Imagine that instead of oranges, you have bowling balls: it's VERY taxing on your hands, so even if you tried, you probably won't be able to hold more than 2 bowling balls let alone juggle a 3rd one. At maximum load, you can only hold as many objects as you have hands. The same is true for the CPU: at maximum load, the CPU can only execute as many threads as there are cores.
The reason why you can run more threads than the number of cores is that the threads are not putting the maximum load on the cores. If your threads are CPU-bound, i.e. they do some heavy computational work and tax the core 100%, then you can only run as many threads as you have cores. However, the CPU is the fastest thing in your computer, and your threads may be accessing other parts of your computer that are significantly slower than your CPU (hard disk, network card, etc.), so you can run more threads.
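For CPU-bound work, one way to apply this is to start one thread per logical core and give each a slice of the work. The sketch below sums the integers 1..n that way; the class name and the striped partitioning are choices made for this example, not something from the answers above.

```csharp
using System;
using System.Threading;

public static class PerCoreWorkers
{
    // Sums 1..n using one thread per logical core.
    public static long SumTo(int n)
    {
        int workers = Environment.ProcessorCount;
        long[] partial = new long[workers];
        var threads = new Thread[workers];

        for (int w = 0; w < workers; w++)
        {
            int id = w;   // per-iteration copy captured by the lambda
            threads[w] = new Thread(() =>
            {
                long sum = 0;
                // Thread id handles id+1, id+1+workers, id+1+2*workers, ...
                for (long i = id + 1; i <= n; i += workers) sum += i;
                partial[id] = sum;   // each thread writes only its own slot
            });
            threads[w].Start();
        }

        long total = 0;
        for (int w = 0; w < workers; w++)
        {
            threads[w].Join();
            total += partial[w];
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumTo(1000000));   // 500000500000
    }
}
```

Adding more threads than cores to a loop like this would not raise throughput, since every thread is already keeping its core busy.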

ThreadPool not starting new Thread instantly

I have a C# Windows Service that starts up various objects (Class libraries). Each of these objects has its own "processing" logic that start up multiple long running processing threads by using the ThreadPool. I have one example, just like this:
System.Threading.ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(WorkerThread_Processing));
This works great. My app works with no issues, and my threads work well.
Now, for regression testing, I am starting those same objects up, but from a C# Console app rather than a Windows Service. It calls the same exact code (because it is invoking the same objects), however the WorkerThread_Processing method delays for up to 20 seconds before starting.
I have gone in and switched from the ThreadPool to a Thread, and the issue goes away. What could be happening here? I know that I am not over the MaxThreads count (I am starting 20 threads max).
The ThreadPool is specifically not intended for long-running items (more specifically, you aren't even necessarily starting up new threads when you use the ThreadPool, as its purpose is to spread the tasks over a limited number of threads).
If your task is long running, you should either break it up into logical sections that are put on the ThreadPool (or use the new Task framework), or spin up your own Thread object.
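The two alternatives mentioned here might look roughly like the following. This is a sketch: LongRunningDemo and its helper methods are hypothetical names, and the lambda stands in for the question's WorkerThread_Processing. TaskCreationOptions.LongRunning is only a hint to the scheduler, although the default scheduler typically responds by using a dedicated thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Starts the work on its own thread, so it never waits on the pool.
    public static Thread StartDedicated(ThreadStart work)
    {
        var thread = new Thread(work) { IsBackground = true };
        thread.Start();
        return thread;
    }

    // Hints to the task scheduler that the work is long-running.
    public static Task StartLongRunningTask(Action work)
    {
        return Task.Factory.StartNew(work, TaskCreationOptions.LongRunning);
    }

    public static void Main()
    {
        int finished = 0;
        // Stand-in for the question's WorkerThread_Processing.
        Action processing = () =>
        {
            Thread.Sleep(100);
            Interlocked.Increment(ref finished);
        };

        Thread thread = StartDedicated(() => processing());
        Task task = StartLongRunningTask(processing);

        thread.Join();
        task.Wait();
        Console.WriteLine(finished);   // 2
    }
}
```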
As to why you're experiencing the delay, the MSDN Documentation for the ThreadPool class says the following:
As part of its thread management strategy, the thread pool delays before creating threads. Therefore, when a number of tasks are queued in a short period of time, there can be a significant delay before all the tasks are started.
You only know that the ThreadPool hasn't reached its maximum thread count, not how many threads (if any) it actually has sitting idle.
The thread pool's maximum number of threads value is the maximum number that it can create. It is not the maximum number that are already created. The thread pool has logic that prevents it from spinning up a whole bunch of threads instantly.
If you call ThreadPool.QueueUserWorkItem 10 times in quick succession, the thread pool will not create 10 threads immediately. It will start a thread, delay, start another, etc.
I seem to recall that the delay was 500 milliseconds, but I can't find the documentation to verify that.
Here it is: The Managed Thread Pool:
The thread pool has a built-in delay (half a second in the .NET
Framework version 2.0) before starting new idle threads. If your
application periodically starts many tasks in a short time, a small
increase in the number of idle threads can produce a significant
increase in throughput. Setting the number of idle threads too high
consumes system resources needlessly.
You can control the number of idle threads maintained by the thread
pool by using the GetMinThreads and SetMinThreads
Note that this quote is taken from the .NET 3.5 version of the documentation. The .NET 4.0 version does not mention a delay.

Is ThreadPool worth it in this scenario?

I have a thread that I fire off every time the user scans a barcode.
Most of the time it is a fairly short-running thread. But sometimes it can take a very long time (waiting on an Invoke call to the GUI thread).
I have read that it may be a good idea to use the ThreadPool for this rather than just creating my own thread for each scan.
But I have also read that if the ThreadPool runs out of threads then it will just wait until some other thread exits (not OK for what I am doing).
So, how likely is it that I am going to run out of threads? And is the benefit of the ThreadPool really worth it? (When I scan it does not seem to take too long for the scan to "run" the thread logic.)
It depends on what you mean by "a very long time" and how common that scenario is.
The MSDN topic "The Managed Thread Pool" offers good guidelines for when not to use thread pool threads:
There are several scenarios in which it is appropriate to create and manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
Since the user will never scan more than one barcode at a time, the memory costs of the threadpool might not be worth it - I'd stick with a single thread just waiting in the background.
The point of the thread pool is to amortize the cost of creating threads, which are not inexpensive to spin up and tear down. If you have a short-running task, the cost of creating/destroying the thread can be a significant portion of the overall run-time. The maximum number of threads in the thread pool depends on the version of the .NET Framework, typically dozens to hundreds per processor. The number of threads is scaled depending on available work.
Will you run out of threads and have to wait for a thread to become available? It depends on your workload. You can get the maximum number of threads available via ThreadPool.GetMaxThreads(). Chances are (based on the description of your problem) that this number is sufficiently high.
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.getmaxthreads.aspx
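A quick way to check those limits on your own machine is sketched below. It pairs ThreadPool.GetMaxThreads with ThreadPool.GetAvailableThreads to show how many pool threads are busy at that moment; PoolLimits and WorkerThreadsInUse are names invented for the example, and the printed numbers vary by runtime and hardware.

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Number of worker threads currently busy: maximum minus available.
    public static int WorkerThreadsInUse()
    {
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);
        return maxWorker - freeWorker;
    }

    public static void Main()
    {
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        Console.WriteLine("Max worker threads: " + maxWorker);
        Console.WriteLine("Max IO threads: " + maxIo);
        Console.WriteLine("Worker threads in use: " + WorkerThreadsInUse());
    }
}
```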
Another option would be to manage your own pool of scan threads and assign them work rather than creating a new thread for every scan. Personally I would try the threadpool first and only manage your own threads if it proved necessary. Even better, I would look into async programming techniques in .NET. The methods will be run on the thread pool, but give you a much nicer programming experience than manual thread management.
If most of the time the threads are short-running, you could use the thread pool or a BackgroundWorker, which draws its threads from the pool.
An advantage I can see in your case is that the ThreadPool class puts an upper limit on the number of threads that may be active. Whether you will exhaust system resources depends on the context of your application. Exhausting a modern desktop system is really VERY hard to do.
If the software is used at a supermarket till, it is highly unlikely that you will have more than 5 barcodes being analysed at the same time. If it runs on a back-end server for a whole row of supermarket tills, then perhaps 30-100 concurrent requests might be active.
With this sort of theory crafting it is highly unlikely that you will run out of threads, even on embedded hardware. If you have a dozen or so requests active at a time, and your code works, it's ok to just leave it as it is.
A thread pool is just an abstraction, though, and you could have a queue in the middle that queues requests onto a thread pool. In this scenario, for the row-of-tills example above, I'd feel comfortable queueing 100-1000 requests against a thread pool with 10 threads.
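A queue in front of a fixed set of workers can be sketched with BlockingCollection, which is available from .NET 4 onward. This example uses ten dedicated threads as its own small pool rather than the system ThreadPool, and BoundedWorkers and ProcessAll are hypothetical names; the integer requests stand in for barcode scans.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class BoundedWorkers
{
    // Queues requestCount requests and drains them with workerCount threads.
    // Returns how many requests were handled.
    public static int ProcessAll(int requestCount, int workerCount)
    {
        using (var queue = new BlockingCollection<int>())
        {
            int processed = 0;
            var workers = new Thread[workerCount];

            for (int w = 0; w < workerCount; w++)
            {
                workers[w] = new Thread(() =>
                {
                    // Blocks until an item arrives; ends after CompleteAdding.
                    foreach (int request in queue.GetConsumingEnumerable())
                    {
                        Interlocked.Increment(ref processed);   // stand-in for real handling
                    }
                });
                workers[w].Start();
            }

            for (int r = 0; r < requestCount; r++) queue.Add(r);
            queue.CompleteAdding();   // no more requests will be queued

            foreach (Thread t in workers) t.Join();
            return processed;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ProcessAll(1000, 10));   // 1000
    }
}
```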
In .NET (and on Windows in general), the question should always be reversed: "Is creating a new thread worth it in this scenario?"
Creating a new thread is expensive, and doing it over and over again is almost certainly not worth it. The thread pool is cheap, and really should be the first thing you turn to when you need a new thread.
If you decide to spin up a new thread, soon you will start worrying about re-using the thread if it's already running. Then you will start worrying that sometimes the thread is running but it seems to be taking too long, and so you should make a new one. Then you're going to decide to have a thread not exit immediately upon finishing work, but to wait a little while in case new work comes in. And then... bam! You've created your own thread pool. At which point you should just back up and use the system-provided one.
The folks who mentioned that the thread pool might "run out of threads" were well-intentioned, but they did you a disservice. The limit on the number of threads in the thread pool is quite large. If you run into it, you have other problems.
(And, of course, since .NET 2.0, you can set the maximum number of threads, so you can tweak the number if you absolutely have to.)
Others have directed you to MSDN: "The Managed Thread Pool". I will repeat that direction, as the article is good, but in my mind does not sell the thread pool hard enough. :)

What's the thread context for events in .Net using existing APIs?

When using APIs handling asynchronous events in .Net I find myself unable to predict how the library will scale for large numbers of objects.
For example, using the Microsoft.Office.Interop.UccApi library, when I create an endpoint it gets events when phone events happen. Now let's say I want to create 1000 endpoints. The number of events per endpoint is small, but is what's happening behind the scenes in the API able to keep up with the event flow? I don't know because it never says how it's architected.
Let's say I want to create all 1000 objects in the main thread. Then I want to put the Login method into a large thread pool so all objects login in parallel. Then once all the objects have logged in the next phase will begin.
Are the event callbacks the API raises happening in the original creating thread? A separate threadpool? Or the same threadpool I'm accessing with ThreadPool.QueueUserWorkItem?
Would I be better off putting each object in its own thread? Grouping a few objects in each thread? Or is it fine just creating all 1000 objects in the main thread, and through .NET magic it will all be OK?
Thanks.
The events from interop assemblies are just wrappers around the COM connection points. The thread on which the call from the connection point arrives depends on the threading model of the object that advised on that connection point. COM will ensure the proper thread switching for this.
If your objects are created on the main thread, which in .NET is usually an STA, all events should arrive on that same thread. If you want your calls to arrive on a random thread from the COM thread pool (which I think is the same as the CLR thread pool), you need to create your objects on a thread that is configured as an MTA.
I would strongly advise against creating a thread for each object: 1) If you create these threads as STA, each of them will have a message queue, wasting system resources; 2) If you create them as MTA, nothing guarantees that the event call will arrive on your thread; 3) You'll have 1000 idle threads doing nothing and just waiting on an event to shut down; and 4) Starting up and shutting down all these threads will have a terrible perf cost for your application.
It really depends on a lot of things, primarily how powerful your hardware is. The threadpool does have a certain number of threads (which you can increase) that it will make available for your application. So if all of your events are firing at the same time some will most likely be waiting for a few moments while your threadpool waits for threads to become free again. The tradeoff is that you don't have the performance hit of creating new threads all the time either. Probably creating 1000 threads isn't the right answer either.
It may turn out that this is ideal, both because of the performance gains in reusing threads but also because having 1000 threads all running simultaneously might be more memory / CPU usage than it's worth.
I just wanted to note that in .NET 2.0 and greater it's possible to programmatically change the maximum number of threads in the thread pool using ThreadPool.SetMaxThreads(). Given this, you can put a hard cap on the number of threads and so ensure the scheduler won't be brought to its knees by the overhead.
Even more useful in this sort of case, you can set the minimum number of threads with ThreadPool.SetMinThreads(). With this you can ensure that you only pay the "horrible performance price" Franci is talking about once, at application startup. You could balance this against the expected peak number of users and so ensure you won't be creating tons of new threads.
A single new thread creation won't destroy you. What I would be worried about is the case where a lot of threads need to be created at the same time. If you can say that this will only happen at startup you would be golden.
