Are there any benefits to limiting the number of concurrent threads doing a given task to equal the number of processors on the host system? Or better to simply trust libraries such as .NET's ThreadPool to do the right thing ... even if there are 25 different concurrent threads happening at any one given moment?
Most threads are not CPU bound; they end up waiting on IO or other events. If you look at your system now, I imagine you have hundreds (if not thousands) of threads in existence with no problems. By that measure, you're probably best just leaving the .NET thread pool to do the right thing!
However, if the threads were all CPU bound (e.g. something like ray tracing) then it would be a good idea to limit the number of threads to the number of cores, otherwise chances are that context switching will begin to hurt performance.
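For instance, a minimal sketch of capping CPU-bound work at one worker per core, assuming .NET 4's Parallel.For is available (TraceRay is a hypothetical work item):

    using System;
    using System.Threading.Tasks;

    class RenderDemo
    {
        static void Main()
        {
            var options = new ParallelOptions
            {
                // One worker per core avoids context-switch overhead for CPU-bound work.
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };

            // Render 1000 rows with at most one thread per core doing the work.
            Parallel.For(0, 1000, options, row => TraceRay(row));
        }

        // Hypothetical CPU-heavy work item, e.g. tracing one row of an image.
        static void TraceRay(int row) { /* ... */ }
    }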
The threadpool already does a reasonably good job at this. It tries to limit the number of running threads to the number of CPU cores in your machine. When one thread ends, it immediately schedules another eligible thread for execution.
Every 0.5 seconds, it evaluates what is going on with the running threads. When threads have been running too long, it assumes they are stalled and allows another thread to start executing. You'll now have more threads running than you have CPU cores. This can go up to the maximum number of allowed threads, as set by ThreadPool.SetMaxThreads().
Starting around .NET 2.0 SP1, the default maximum number of threads was increased considerably to 250 times the number of cores. You should never ever get there. If you do, you would have wasted about 2 minutes of time where a possibly non-optimal number of threads were running. Those threads however would all have to be blocking for that long, not exactly a typical execution pattern for a thread. On the other hand, if these threads are all waiting on the same kind of resource they are likely to just take turns, adding more threads cannot improve throughput.
Long story short, the thread pool will work well if you run threads that execute quickly (seconds at most) and don't block for a long time. You probably ought to consider creating your own Thread objects when your code doesn't match that pattern.
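A minimal sketch of that rule of thumb (the work methods are placeholders): short, non-blocking items go to the pool, while long-running or blocking work gets its own Thread:

    using System.Threading;

    class SchedulingDemo
    {
        static void Main()
        {
            // Quick work item: completes in seconds at most, fine for the pool.
            ThreadPool.QueueUserWorkItem(state => DoQuickWork());

            // Long-running or heavily blocking work: a dedicated thread keeps it
            // from distorting the pool's scheduling heuristics.
            var worker = new Thread(DoLongRunningWork) { IsBackground = true };
            worker.Start();
            worker.Join();
        }

        static void DoQuickWork() { /* short, non-blocking task */ }
        static void DoLongRunningWork() { /* runs for minutes, may block */ }
    }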
Well, if your bottleneck is ONLY the processors, then it might make sense, but that would ignore memory and other I/O bottlenecks; chances are you're at least incurring cache misses, page faults, and other events that slow the threads down.
I'd trust the library myself. Threads wait for all kinds of things, and you don't want your application to slow down because it can't spawn a new thread, even though most of the rest are just sleeping, waiting for some event or resource.
Measure your application under a variety of thread:processor ratios. Come to conclusions based on hard data about your application. Accept no arguments from first principles about what performance you should get, only what you do get matters.
Related
I have a dual core processor, now let's say that I want to make a spam bot program, which will spam messages such as "Hey, how are you?".
My question is, what number of threads would pop up these messages the fastest: 5 threads or 100 threads sending the messages? (Of course, these numbers aren't special; they're just for the example.) All of the threads will run thread-safe code.
EDIT: As for the down votes before, I'm not really writing a spambot program, I just mentioned it as an example for my question, sorry for the misunderstanding
The ideal number of threads depends on your hardware (in this case a dual core processor), and on what those threads are doing. If they are CPU intensive, more than 1 thread per core will probably slow things down.
If the threads do some IO, you will see an overall increase in performance by adding threads. The point of diminishing returns depends entirely on the nature of the non-CPU tasks and on the specific hardware.
To find that point, you will have to test various thread totals.
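For example, a crude measurement harness along these lines (the workload is a stand-in for your real task) can show where returns diminish:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class ThreadCountBenchmark
    {
        const int TotalItems = 1024;

        static void Main()
        {
            foreach (int threadCount in new[] { 1, 2, 4, 8, 16, 32 })
            {
                var sw = Stopwatch.StartNew();
                var threads = new Thread[threadCount];
                int itemsPerThread = TotalItems / threadCount;
                for (int i = 0; i < threadCount; i++)
                {
                    threads[i] = new Thread(() => DoWorkBatch(itemsPerThread));
                    threads[i].Start();
                }
                foreach (var t in threads) t.Join();
                Console.WriteLine("{0,2} threads: {1} ms", threadCount, sw.ElapsedMilliseconds);
            }
        }

        // Stand-in for real work: a brief sleep per item simulates an IO wait.
        static void DoWorkBatch(int items)
        {
            for (int i = 0; i < items; i++) Thread.Sleep(1);
        }
    }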
You can design your system to self-tune the number of threads in use. I once designed a system that ran best (most total throughput) when the total CPU load was about 70%. To optimize for that value, I added threads (with a delay between threads) until the CPU was at 70%, +/- 5%. If it went above 80%, I signaled one or more threads to finish their current work and terminate. If it went below 60%, I gradually added threads. Worked like a charm.
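A much-simplified sketch of that control loop (the thresholds, the sampling interval, and the Worker body are illustrative; PerformanceCounter reads total CPU load on Windows):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class SelfTuningWorkers
    {
        static int _stopRequests; // each pending request tells one worker to exit

        static void Main()
        {
            var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
            cpu.NextValue(); // the first sample is always 0, so prime the counter

            while (true)
            {
                Thread.Sleep(2000); // delay between adjustments
                float load = cpu.NextValue();

                if (load < 60f)
                    new Thread(Worker) { IsBackground = true }.Start(); // gradually add
                else if (load > 80f)
                    Interlocked.Increment(ref _stopRequests); // signal one to finish
            }
        }

        static void Worker()
        {
            while (true)
            {
                // Claim one pending stop request, if any, and terminate.
                int pending = _stopRequests;
                if (pending > 0 &&
                    Interlocked.CompareExchange(ref _stopRequests, pending - 1, pending) == pending)
                    return;

                // ... pull one unit of work from a queue and process it ...
            }
        }
    }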
Deliberately creating more threads than processors is a standard technique for making use of "spare cycles" where a thread is blocked waiting for something (whether that's I/O, a mutex, or anything else), by providing the processor with other useful work to do.
If your threads are doing I/O then this is a strong contender for the speed-up: as each thread blocks waiting for the I/O, the processor can run the other threads until they too block for I/O, hopefully by which time the data for the first thread is ready, and so forth.
Source: Anthony Williams
In my .NET multithreaded program, I am wondering: are all these threads running in the same process or in different processes?
If they run in the same process, then I'd assume one process runs on one core, so how can multithreading utilize all four cores of my quad-core CPU?
But if they run in different processes, then since different processes have a different data-sharing mechanism than a single process, how come I don't need to write different code to handle this in my multithreading program? Would anyone shed some light on this?
I want to ask two more similar questions.
When I open the Task Manager, I can often see around 800 threads and 54 processes, yet my CPU usage is only 5%, and I was told that each core only executes one thread at a time.
Is my CPU running these 800 threads all the time, or does it just mean 800 threads are queued, waiting for the CPU to process them?
If I want my multithreading program to fully utilize my quad-core CPU, can I raise the CPU usage by creating more threads? (It seems to contradict the theory that each core runs only one thread at a time.)
Multithreading means multiple threads in the same process.
Each thread can be assigned to a different core.
But all the threads belong to the same process; for example, if one of the threads throws an unhandled exception, the process will crash along with all its threads.
You can read a bit about it yourself; just search Google or Wikipedia for "Software Multithreading".
A single process may use a number of threads; even a basic .NET "hello world" console exe probably uses 4 or 5. So yes, a single process can potentially use all your available cores if you write it to do so.
Because it is the same process, data sharing is direct, but care must be taken if you are changing the values, as otherwise very bad things can happen: access must be carefully synchronized (lock etc.) if you are changing the data within the threaded code.
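For example, a minimal sketch of guarding shared state with lock (the counter class is illustrative):

    class SharedCounter
    {
        private readonly object _sync = new object();
        private int _value;

        public void Increment()
        {
            lock (_sync)   // only one thread may mutate _value at a time
            {
                _value++;
            }
        }

        public int Read()
        {
            lock (_sync)   // reads take the lock too, to observe a consistent value
            {
                return _value;
            }
        }
    }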
You do, however, usually have to write different code to support multiple threads. An exception to this is when the framework is doing that for you; for example, ASP.NET or WCF may take incoming requests and hand them to different worker threads, allowing multiple concurrent operations even though you didn't explicitly code it that way. Which means that in ASP.NET or WCF you need to be careful with shared state, for exactly the reasons already discussed.
As a minor addition, note also that a process can support multiple AppDomains; in that scenario, the threads of the process are shared between all the AppDomains at the whim of the scheduler.
Threads created by that process are part of that process. Different threads within the one process can and often do run on different processors or processor cores.
In my .NET multithreaded program, I am wondering: are all these threads running in the same process or in different processes?
A thread always runs in a process, however, multiple threads can run in a single process and each thread can be handled by a different core.
If you have a single core, it doesn't mean that it can't run multiple threads; it just means that the core can't execute multiple threads at the same time. Consider this execution timeline:
Thread #1 executes for some time.
Thread #1 "stops".
Thread #2 executes for some time.
Thread #2 "stops".
Thread #1 executes for some time, again.
This illustrates what happens when a core runs multiple threads: the core only executes one thread at a time, but in order for both threads to run, the core must perform context switching. In other words: the core runs a few commands from Thread 1, switches to Thread 2 and runs a few commands from it, then it switches back to Thread 1 to execute some more commands.
Juggling Oranges:
A good metaphor is juggling oranges: technically, you only have two hands and you can only hold one orange in each hand at a time, so the maximum you can hold is two oranges. In this case the taxing part is holding the oranges. However, if you throw an orange up in the air, then you can hold a 3rd orange while the 2nd one is in the air. The higher you throw the oranges, the more oranges you can juggle. To be more precise: the longer it takes for an orange to come back into your hand, the more oranges you can juggle. Of course, you probably can't juggle an enormous number of oranges, because throwing an orange requires more energy than simply holding it.
In essence, your CPU is juggling threads: the longer a thread stays away from executing code on the CPU, the more threads a CPU can "juggle." If a thread is waiting on I/O (e.g. a database request), then the CPU can execute the code of another thread at the same time. This is the same reason why you see 54 processes and 800 threads in the task manager: many of those threads are doing things that are not CPU-bound.
Sleep:
Is my CPU running these 800 threads all the time, or does it just mean 800 threads are queued, waiting for the CPU to process them?
Many of the threads you're noticing in your task manager are idle/sleeping, so they use very little (if any) CPU. However, the ones that are running are executed with context switching (if there are more threads than cores, which is the case most of the time). There are many things that can cause a thread to idle/sleep, see the orange juggling for an example.
CPU Utilization:
If I want my multithreading program to fully utilize my quad-core CPU, can I raise the CPU usage by creating more threads? (It seems to contradict the theory that each core runs only one thread at a time.)
It gets tricky :). Imagine that instead of oranges, you have bowling balls: it's VERY taxing on your hands, so even if you tried, you probably won't be able to hold more than 2 bowling balls let alone juggle a 3rd one. At maximum load, you can only hold as many objects as you have hands. The same is true for the CPU: at maximum load, the CPU can only execute as many threads as there are cores.
The reason why you can run more threads than the number of cores is that the threads are not putting maximum load on the cores. If your threads are CPU bound, i.e. they do some heavy computational stuff and tax a core 100%, then you can only run as many threads as you have cores. However, the CPU is the fastest thing in your computer, and your threads may be accessing other parts of your computer that are significantly slower than your CPU (hard disk, network card, etc.), so you can run more threads.
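A small sketch that contrasts the two cases (both workloads are simulated): the IO-bound batch scales well past the core count, while the CPU-bound batch gains nothing beyond one thread per core:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class BoundDemo
    {
        static void Main()
        {
            Measure("IO-bound ", 16, () => Thread.Sleep(200)); // thread mostly waits
            Measure("CPU-bound", 16, SpinWork);                // thread taxes a core
        }

        // Pure computation: keeps one core at 100% for the duration.
        static void SpinWork()
        {
            double x = 0;
            for (int i = 0; i < 100000000; i++) x += i;
        }

        static void Measure(string label, int threadCount, ThreadStart work)
        {
            var sw = Stopwatch.StartNew();
            var threads = new Thread[threadCount];
            for (int i = 0; i < threadCount; i++) (threads[i] = new Thread(work)).Start();
            foreach (var t in threads) t.Join();
            Console.WriteLine("{0}: {1} threads took {2} ms", label, threadCount, sw.ElapsedMilliseconds);
        }
    }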
I built an app that performs work on thousands of files, then writes modified copies of these files to the disk. I am using a ThreadPool, but it was spawning so many threads that the PC was becoming unresponsive (260 total), so I changed the max from the default of 250 down to 50. This solved that issue (the app only spawns about 60 threads total); however, now that the files become ready so quickly, the writing is tying up the UI to the point where the PC is unresponsive.
Is there a way to limit the amount of simultaneous I/O? I mean, I like using 50 threads to perform the work on the files, but not 50 threads writing at the same time once they are processed. I would rather not re-architect the file-writing part if I can keep from it; I was hoping I could limit the simultaneous I/O the threads from this pool could consume.
Use a semaphore to limit the number of threads wanting to write to disk simultaneously.
http://msdn.microsoft.com/en-us/library/system.threading.semaphore.aspx
Limits the number of threads that can access a resource or pool of resources concurrently.
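A minimal sketch (the limit of 4 concurrent writers is an arbitrary value to tune against your disk):

    using System.IO;
    using System.Threading;

    static class ThrottledWriter
    {
        // Semaphore(initialCount, maximumCount): 4 write "slots" are available.
        private static readonly Semaphore DiskGate = new Semaphore(4, 4);

        public static void WriteFile(string path, byte[] data)
        {
            DiskGate.WaitOne();        // block until a write slot frees up
            try
            {
                File.WriteAllBytes(path, data);
            }
            finally
            {
                DiskGate.Release();    // always release, even if the write throws
            }
        }
    }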
You really don't need so many threads. A disk can only support its maximum read and write throughput, which a single thread can easily max out if it is dedicated to IO, i.e. reading or writing. You also cannot read and write to a hard disk simultaneously (although this is complicated by OS caching layers, etc.), so having concurrent threads reading and writing can be very counter-productive. There is also little to be gained from having more threads than processors\cores for your non-IO tasks, as any additional threads will spend much of their time waiting for a core to become available; e.g. if you have 50 threads and 4 cores, a minimum of 46 of the threads will be idle at any given time. The wasted threads contribute to memory consumption and also incur performance overhead, as they will all be fighting to get a crack at some time on a core, and the OS has to arbitrate this fight.
A more straightforward approach would be to have a single thread whose job is to read in the files and add the data to a blocking queue (e.g. see ConcurrentQueue); meanwhile have a number of worker threads waiting on file data in the queue (e.g. a number of threads equal to the number of processors\cores). These worker threads will munch their way through the queue as items are added, and block when it is empty. When a worker thread finishes a piece of work, it can add the result to another blocking queue which is monitored either by the reader thread or a dedicated writer thread, whose job is to write the files out.
This pattern seeks to balance IO and CPU amongst a much smaller bunch of co-operating threads, where the number of IO threads is limited to what is physically capable by a hard drive, and a number of CPU worker threads that is sensible for the number of processors\cores you have. In essence it separates IO and CPU work so that things behave more predictably.
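A condensed sketch of that pipeline (the folder name and the per-file processing are placeholders; BlockingCollection<T>, available from .NET 4, wraps ConcurrentQueue<T> and supplies the blocking behaviour):

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading;

    class FilePipeline
    {
        // Bounded queues apply back-pressure so the reader can't flood memory.
        static readonly BlockingCollection<string> ToProcess = new BlockingCollection<string>(100);
        static readonly BlockingCollection<string> ToWrite = new BlockingCollection<string>(100);

        static void Main()
        {
            // Single reader thread feeds file paths into the work queue.
            new Thread(() =>
            {
                foreach (var path in Directory.EnumerateFiles("input")) // hypothetical folder
                    ToProcess.Add(path);
                ToProcess.CompleteAdding();
            }).Start();

            // One CPU worker per core munches through the queue, blocking when empty.
            var workers = new Thread[Environment.ProcessorCount];
            for (int i = 0; i < workers.Length; i++)
            {
                workers[i] = new Thread(() =>
                {
                    foreach (var path in ToProcess.GetConsumingEnumerable())
                        ToWrite.Add(Process(path));
                });
                workers[i].Start();
            }

            // Single writer thread serializes all disk output.
            var writer = new Thread(() =>
            {
                foreach (var item in ToWrite.GetConsumingEnumerable())
                    File.WriteAllText(item + ".out", item); // placeholder write
            });
            writer.Start();

            foreach (var w in workers) w.Join();
            ToWrite.CompleteAdding();
            writer.Join();
        }

        // Placeholder for the real per-file transformation.
        static string Process(string path) { return path; }
    }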
Further to this, if IO really is the problem (and not a huge amount of threads all fighting each other), then you can place some pauses (e.g. Thread.Sleep) in your file reading and writing threads to limit how much work they do.
Update
Perhaps it is worth explaining why so many threads are being generated in the first place. This is a degenerate case for threadpool use, and is centred around queueing workitems that have a component of IO in them.
The threadpool executes work items from its queue and monitors how long executing work items are taking. If currently executing workitems are taking a long time to complete (I think half a second, from memory) then it will start adding more threads to the pool, as it believes this will get the queue processed quicker\more fairly. However, if the additional concurrent workitems are also performing IO against a shared disk, then performance of the disk will actually reduce, meaning that workitems will take even longer to execute. Because workitems are taking longer to execute, the threadpool adds more threads. This is the degenerate case, where performance gets worse and worse as more threads are added.
The use of a semaphore as suggested would have to be done carefully: the semaphore could cause blocking of threadpool threads, the threadpool would see workitems taking a long time to execute, and it would still start adding more threads.
In an effort to speed up the startup of my resource-hungry app, I've moved various startup tasks to background threads and marked those threads with `Thread.Priority = Lowest`.
However, those low priority threads still execute pretty much in parallel with the application (as it loads its UI), as evidenced by the timeline on the ANTS Profiler. My understanding was that Lowest meant that the CPU will handle all higher priority threads first, then get the lower priority threads.
Is my understanding flawed?
The threads may be scheduled with the lowest priority, but they don't wait at the back of the line. They will probably still get enough CPU time slices to gobble up certain resources that are the real bottlenecks, like hard drive access. It really all depends on exactly what you are doing.
Is the initialization computation-intensive, or network/hard-drive intensive? A multithreading approach is going to be most effective when different tasks use different resources, or when it allows computationally intensive operations to run without blocking other operations.
A single-threaded approach could feasibly order the tasks to make the application appear to load faster, whereas the multithreaded approach may mean that everyone gets their hands in at the same time, possibly even getting in each other's way.
Lowering the priority doesn't mean that the thread will always be the last one picked to get a time slot. If the lower priority thread hasn't got a time slot for a while, it will be more likely to get one. That way lower priority threads will run slower, but not completely stop.
Also, if the main thread is waiting for something, like for example waiting for the disk drive to return data, the other threads can run in that void. If the main thread does a lot of disk I/O, there will be a lot of holes to run other threads in.
If the CPU has more than a single core, the load will be more evenly distributed between threads. No matter how high a thread's priority is, it will still only run on one core.
Is it possible to re-engineer your app so that the threads you're trying to hold back don't run at all until after the UI is loaded? That would do what you want, forcing them to wait until the UI is loaded (because they're not even created/started yet), whereas the method you're employing merely causes them to execute less often, but still execute.
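For example, in WinForms one way to sketch this (Shown is the event that fires once the form is first displayed; InitializeCaches is a hypothetical startup task) is to not even create the threads until the UI is up:

    using System.Threading;
    using System.Windows.Forms;

    public class MainForm : Form
    {
        public MainForm()
        {
            // Shown fires once the form is first displayed, i.e. the UI is loaded.
            Shown += delegate
            {
                var startup = new Thread(InitializeCaches)
                {
                    IsBackground = true,
                    Priority = ThreadPriority.Lowest
                };
                startup.Start();
            };
        }

        // Hypothetical resource-hungry startup work.
        void InitializeCaches() { /* ... */ }
    }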
Is there a magic number or formula for setting the values of SetMaxThreads and SetMinThreads for ThreadPool? I have thousands of long-running methods that need execution but just can't find the perfect match for setting these values. Any advice would be greatly appreciated.
The default minimum number of threads is the number of cores your machine has. That's a good number; it doesn't generally make sense to run more threads than you have cores.
The default maximum number of threads is 250 times the number of cores you have on .NET 2.0 SP1 and up. There is an enormous amount of breathing room here. On a four core machine, it would take 499 seconds to reach that maximum if none of the threads complete in a reasonable amount of time.
The threadpool scheduler tries to limit the number of active threads to the minimum, by default the number of cores you have. Twice a second it allows one more thread to start if the active threads do not complete. Threads that run for a very long time or do a lot of blocking that is not caused by I/O are not good candidates for the threadpool. You should use a regular Thread instead.
Getting to the maximum isn't healthy. On a four core machine, just the stacks of those threads will consume a gigabyte of virtual memory space. Getting OOM is very likely. Consider lowering the max number of threads if that's your problem. Or consider starting just a few regular Threads that receive packets of work from a thread-safe queue.
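A small sketch of inspecting the pool's limits and lowering the maximum (the cap of 100 worker threads is arbitrary; measure before settling on a value):

    using System;
    using System.Threading;

    class PoolLimits
    {
        static void Main()
        {
            int minWorker, minIo, maxWorker, maxIo;
            ThreadPool.GetMinThreads(out minWorker, out minIo);
            ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
            Console.WriteLine("min: {0}/{1}  max: {2}/{3}", minWorker, minIo, maxWorker, maxIo);

            // Cap worker threads well below the 250-per-core default.
            ThreadPool.SetMaxThreads(100, maxIo);
        }
    }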
Typically, the magic number is to leave it alone. The ThreadPool does a good job of handling this.
That being said, if you're doing a lot of long-running services, and those services will have large periods where they're waiting, you may want to increase the maximum threads to handle more of them concurrently. (If the processes aren't blocking, you'll probably just slow things down if you increase the thread count...)
Profile your application to find the correct number.
If you want better control, you might want to consider NOT using the built-in ThreadPool. There is a nice replacement at http://www.codeproject.com/KB/threads/smartthreadpool.aspx.