in my .net multithread program, i am wondering all these threads running on the same process or different processes?
if it is on the same process, then i assume one process run on one core, then how multithreading can utilize all the four cores that i have in my quad-core cpu?
but if it is on the different processes, as i know different processes and same process have different data sharing mechanism, then how come i don't need to write different code to handle this in my multithreading program? Would anyone shed some light on
I want to ask two more similar questions
When i open the task manager, often times, i can see around 800 threads and 54 processes,and my cpu usage is only 5%,and i was told that each core only excute one thread at a time.
is my cpu running these 800 threads all the times, or only means 800 threads are queuing, waiting cpu to process?
if i want my multithreading program fully utilze my quad-core cpu, can i raise the cpu usage by creating more threads(it seems contradict the theroy that only one thread one core at a time)
Multithreading means multiple threads in the same process.
Each thread can be assigned to a different core.
But all the threads belong to the same process, for example if one of the threads will throw an unhandeled exception, the process will crash with all its threads.
You could have read a bit about it, just search google or Wikipedia - Software Multithreading
A single process may use a number of threads; even a basic .NET "hello world" console exe probably uses 4 or 5. So yes, a single process can potentially use all your available cores if you write it to do so.
Because it is the same process, data sharing is direct, but: care must be taken if you are changing the values, as otherwise very bad things can happen. Access must be carefully synchronized (lock etc) if you are changing the data within the threaded code.
You do, however, usually have to write different code to support multiple threads. Exceptions to this is when the framework is doing that for you, for example, ASP.NET or WCF may take incoming requests and hand them to different worker threads, allowing multiple concurrent operations even though you didn't explicitly code it that way. Which means that in ASP.NET or WCF you need to be careful with shared state, for exactly the reasons already discussed.
As a minor addition, note also that a process can support multiple AppDomains; in that scenario, the threads for the process are shared between all the AppDomains at whim by the scheduler.
Threads created by that process are part of that process. Different threads within the one process can and often do run on different processors or processor cores.
in my .net multithread program, i am wondering all these threads
running on the same process or different processes?
A thread always runs in a process, however, multiple threads can run in a single process and each thread can be handled by a different core.
If you have a single core, it doesn't mean that it can't run multiple threads, it just means that the core can't execute multiple threads at the same time. If you take a look at the picture above, you will note that:
Thread #1 executes for some time.
Thread #1 "stops".
Thread #2 executes for some time.
Thread #2 "stops".
Thread #1 executes for some time, again.
This illustrates what happens when a core runs multiple threads: the core only executes one thread at a time, but in order for both threads to run, the core must perform context switching. In other words: the core runs a few commands from Thread 1, switches to Thread 2 and runs a few commands from it, then it switches back to Thread 1 to execute some more commands.
Juggling Oranges:
A good metaphor is juggling oranges: technically, you only have two hands and you can only hold one orange in each hand at a time, so the maximum you can hold is two oranges. In this case the taxing part is holding the oranges. However, if you throw an orange up in the air, then you can hold a 3rd orange while the the 2nd one is in the air. The higher you throw the oranges, the more oranges you can juggle. To be more precise: the longer it takes for an orange to come back in your hand, the more oranges you can juggle. Of course, you probably can't juggle an enormous amount of oranges, because throwing an orange requires more energy than simply holding it.
In essence, your CPU is juggling threads: the longer a thread stays away from executing code on the CPU, the more threads a CPU can "juggle." If a thread is waiting on I/O (e.g. a database request), then the CPU can execute the code of another thread at the same time. This is the same reason why you see 54 processes and 800 threads in the task manager: many
of those threads are doing things that are not CPU-bound.
Sleep:
is my cpu running these 800 threads all the times, or only means 800
threads are queuing, waiting cpu to process?
Many of the threads you're noticing in your task manager are idle/sleeping, so they use very little (if any) CPU. However, the ones that are running are executed with context switching (if there are more threads than cores, which is the case most of the time). There are many things that can cause a thread to idle/sleep, see the orange juggling for an example.
CPU Utilization:
if i want my multithreading program fully utilze my quad-core cpu, can
i raise the cpu usage by creating more threads(it seems contradict the
theroy that only one thread one core at a time)
It gets tricky :). Imagine that instead of oranges, you have bowling balls: it's VERY taxing on your hands, so even if you tried, you probably won't be able to hold more than 2 bowling balls let alone juggle a 3rd one. At maximum load, you can only hold as many objects as you have hands. The same is true for the CPU: at maximum load, the CPU can only execute as many threads as there are cores.
The reason why you can run more threads than the number of cores is because the thread are not putting the maximum load on the cores. If your threads are CPU bound, i.e. they do some heavy computational stuff and they tax the core 100%, then you can only run as many threads as you have cores. However, the CPU is the fastest thing in your computer and your thread may be accessing other parts of your computer that are significantly slower than your CPU (hard disk, network card, etc), so you can run more threads.
Related
I have developed a windows service that, shortly, manages thousand of remote devices.
Currently it consists of two precesses, each with some hundreds of threads (we can discuss about of the opportunity to reduce the number of threads, but this is not the point), and all works quite fine.
Now I am trying to join all threads in a single process to semplify data exchange between threads, but what happens is that now all threads runs slower (it seems like in some conditions some threads run much less frequently).
So my question is: it is expected that windows scheduler works in different way on a single-process/multi-thread application compared to a multi-process/multi-thread application?
Little simple example:
- single core CPU
- 2 threads (A and B)
- thread A is doing a very long task, while thread B is sleeping
- now is time to wake up thread B, but thread A is still running
My conjecture:
- on single-process/multi-thread, scheduler force thread B to sleep and delay its wake up
- on multi-process/multi-thread, if A belongs to process 1 and B belongs to process 2, scheduler wake up thread B when expected
Could be?
Any suggestion to join all threads in a single process without throubles?
Sorry for my poor English.
EDIT
Following the advice of Luaan I am profiling application to check GC behavior. This is what I see on a 45 seconds time slot:
Some questions:
- why 13.000.000ms here?
- why reference to sleep here?
EDIT 2
Finally I solved my performance issue: as I said I was using some hundreds of threads in my service. I have rewritten some parts of the code in order to group old threads in few main threads and now I am using 8 main worker thread that do most of the job (around 50 threads total including secondary threads)... and magically now the service runs using something like half of the cpu.
Maybe the issue was related to GC activity also, but I think that most of the issues was due to the overhead of context switch for my threads.
EDIT 3
As some little performance issues continues I checked GC load with a PerformanceCounter object and you are right: my threads hangs when GC use about 99% of cpu time. How can I solve? Now I am trying to set GC in server mode.
Firstly, CPU cores and threading are unrelated. The Kernel decides how a thread is divided up amongst logical CPUs. The lower the logical CPUs the more context switching has to occur, slowing the process down overall, but it's still multi-threaded.
If you want to talk to other threads then the normal procedure is to do this with events which are subscribed to by the main application thread. You can then decide what to do with this information.
You shouldn't be concerned with how the kernel divides up the threads over logical CPUs as it does this automatically unless you are specifically using a processor affinity mask.
I create about 5000 background workers that do intensive work in a console app. I'm also using an external library that instantiates an object, say ObjectX. At some point, say t0, ObjectX tries to obtain a thread from an os thread pool and start it, but I have no control on how it obtains this thread. Things work fine for 100 background workers. For 1000 background workers it takes about 10 minutes after t0 for ObjectX to obtain and start a thread.
Is there a way to set, in advance, a high priority for any threads that will be started in the future by an object?
As I think the answer to 1 is "no", is there a way to limit the priority of the background workers so as to somehow favor everything else? Even though I only want to 'favor' ObjectX.
The goal would be to always have available resources to run the thread launched by ObjectX, no matter how overloaded the machine is.
I'm using C# and the .Net fr 3.5, on a Windows 64bit machine.
The way threads work is that they are given processor time by the OS. When this happens this is called a context switch. A context switch takes about 2000-8000 cycles (i.e. depending on processor 2000-8000 instructions). If the OS has many CPUs or cores, it may not need to take the CPU away from one thread and give it to another--avoiding a context switch. There can only be one thread per CPU running at a time, when you have more threads that need CPU than CPUs then you're forcing a context switch. Context switches are performed no faster than the system quantum (every 20ms for client and 120ms for server).
If you have 5000 background workers you effectively have 5000 threads. Each of those threads is potentially vying for CPU time. On a client version of windows, that means 250,000 context switches per second. i.e. 500,000,000 to 2,000,000,000 cycles per second are devoted simply to switching between threads. (i.e. over and above the work your threads are performing) if it could even process that many context switches per second.
The recommended practice is to only have one CPU-bound thread per processor. A CPU-bound thread is one that spends very little time "waiting". The UI thread is not a CPU-bound thread. If your background workers are spending a lot of time waiting for locks, then they may not be CPU-bound either--but, in general, background worker threads are CPU-bound. (otherwise, what would be the point of using a background worker?).
Also, the OS spends a lot of time figuring out what thread needs to get the CPU next. When you start changing thread priorities you interfere with that and most of the time end up making your entire system slower (not just your application) rather than faster.
Update:
On a related not, it takes about 200,000 cycles to create a new thread and about 100,000 cycles to destroy a thread.
Update 2:
If the impetus of the question isn't simply "If it can be done" but to be able to scale workload, then as #JoshW/#Servy mention, using something like the Producer/Consumer Pattern would allow for scalability that could facilitate horizontal scaling to multiple computers/nodes via a queue or a service bus. Simple starting up an in ordinate amount of threads is not scalable beyond the # of CPUs. If what you truly want is an architecture that can scaled out because "available resources...how overloaded the machine is" is simply impossible.
Personally I think this is a bad idea, however... given the comments you have made on other answers and your request that "No matter how many background workers are create that ObjectX runs as soon as possible"... You could conceivably force your background workers to block using a ManualResetEvent.
For example at the top of your worker code you could block on a Manual reset event with the WaitOne method. This manual reset could be static or passed as an input parameter and wherever your ObjectX gets instantiated/called or whatever, you call the .Reset method on your ManualResetEvent. This would block all your workers at the WaitOne line. Next at the bottom of the code that runs ObjectX, call the ManualResetEvent.Set() method and that will unblock the workers.
Note this is NOT an efficient way to manage your threads, but if you "just have to make it work" and have time later to improve it... I suppose it's one possible solution.
The goal would be to always have available resources to run the thread launched by ObjectX, no matter how overloaded the machine is.
Then thread priorities might not be the right tool.. Remember, thread priorities are evil
In general, windows is not a real-time OS; especially, win32 does not even attempt to be soft real-time (IIRC, the NT kernel tried, at some point, to have at least support for soft real time subsystems, but I may be wrong). So there is no guarantee about available resources, or timing.
Also, are you worried about other threads in the system? Those threads are out of your control (what if the other threads are already at the system max priority?).
If you are worried about threads in your app... you can control and throttle them, using less threads/workers to do more work (batching work in bigger units, and submitting it to a worker, for example, or by using TPL or other tools that will handle and throttle thread usage for you)
That said, you could intercept when a thread is created (look for example this question https://stackoverflow.com/a/3802316/863564) see if it was created for ObjectX (for example, checking its name) and use SetThreadPriority to boost it.
I have a dual core processor, now let's say that I want to make a spam bot program, which will spam messages such as "Hey, how are you?".
My question is, what number of threads would be able to pop up these messages the fastest, running 5 threads or 100 threads botting the messages? (Of course, these numbers aren't special, just for the example). All of the threads will run in thread-safe.
EDIT: As for the down votes before, I'm not really writing a spambot program, I just mentioned it as an example for my question, sorry for the misunderstanding
The ideal number of threads depends on your hardware (in this case a dual core processor), and on what those threads are doing. If they are CPU intensive, more than 1 thread per core will probably slow things down.
If the threads do some IO, you will see an overall increase in performance by adding threads. The point of diminishing returns depends entirely on the nature of the non-CPU tasks and on the specific hardware.
To find that point, you will have to test various thread totals.
You can design your system to self-tune the number of threads in use. I once designed a system that ran best (most total throughput) when the total CPU load was about 70%. To optimize for that value, I added threads (with a delay between threads) until the CPU was at 70%, +/- 5%. If it went above 80%, I signaled one or more threads to finish their current work and terminate. If it went below 60%, I gradually added threads. Worked like a charm.
Deliberately creating more threads than processors is a standard technique used to make use of "spare cycles" where a thread is blocked waiting for something, whether that's I/O, a mutex, or something else by providing some other useful work for the processor to do.
If your threads are doing I/O then this is a strong contender for the speed-up: as each thread blocks waiting for the I/O, the processor can run the other threads until they too block for I/O, hopefully by which time the data for the first thread is ready, and so forth.
Source: Anthony Williams
I am working on maintaining someone else's code that is using multithreading, via two methods:
1: ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem)
2: Dim aThread As New Thread(AddressOf LoadCache)
aThread.Start()
However, on a dual core machine, I am only getting 50% CPU utlilization, and on a dual core with hyperthreadin enabled machine, I am only getting 25% CPU utilization.
Obviously threading is extremely complicated, but this behaviour would seem to indicate that I am not understanding some simple fundamental fact?
UPDATE
The code is too horribly complex to post here unfortunately, but for reference purposes, here is roughly what happens....I have approx 500 Accounts whose data is loaded from the database into an in memory cache...each account is loaded individually, and that process first calls a long running stored procedure, followed by manipulation and caching of the returned data. So, the point of threading in this situation is that there is indeed a bottleneck hitting the database (ie: the thread will be idled for up to 30 seconds waiting for the query to return), so we thread to allow others to begin processing the data they have received from Oracle.
So, the main thread executes:
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem)
Then, the ReadData() then proceeds to execute (exactly once):
Dim aThread As New Thread(AddressOf LoadCache)
aThread.Start()
And this is occurring in a recursive function, so the QueueUserWorkItem can be executing multiple times, which in turn then executes exactly one new thread via the aThread.Start
Hopefully that gives a decent idea of how things are happening.
So, under this scenario, should this not theoretically pin both cores, rather than maxing out at 100% on one core, while the other core is essentially idle?
That code starts one thread that will go an do something. To get more than one core working you need to start more than one thread and get them both busy. Starting a thread to do some work, and then having your main thread wait for it won't get the task done any quicker. It is common to start a long running task on a background thread so that the UI remains responsive, which may be what this code was intended to do, but it won't make the task get done any quicker.
#Judah Himango - I had assumed that those two lines of code were samples of how multi-threading were being achieved in two different places in the program. Maybe the OP can clarify if this is the case or if these two lines really are in the one method. If they are part of one method then we will need to see what the two methods are actually doing.
Update:
That does sound like it should max out both cores. What do you mean by recursivly calling ReadData()? If each new thread is only calling ReadData at or near its end to start the next thread then that could explain the behaviour you are seeing.
I am not sure that this is actaully a good idea. If the stored proc takes 30 seconds to get the data then presumably it is placing a fair load on the database server. Running it 500 times in parallel is just going to make things worse. Obviously I don't know your database or data, but I would look at improving the performance of the stored proc.
If multi threading does look like the way forward, then I would have a loop on the main thread that calls ThreadPool.QueueUserWorkItem once for each account that needs loading. I would also remove the explicit thread creation and only use the thread pool. That way you are less likely to starve the local machine by creating too many threads.
How many threads are you spinning up? It may seem primitive (wait a few years, and you won't need to do this anymore), but your code has got to figure out an optimal number of threads to start, and spin up that many. Simply running a single thread won't make things any faster, and won't pin a physical processor, though it may be good for other reasons (a worker thread to keep your UI responsive, for instance).
In many cases, you'll want to be running a number of threads equal to the number of logical cores available to you (available from Environment.ProcessorCount, I believe), but it may have some other basis. I've spun up a few dozen threads, talking to different hosts, when I've been bound by remote process latency, for instance.
Multi-Threaded and Multi-Core are two different things. Doing things Multi-Threaded often won't offer you an enormous increase in performance, sometimes quite the opposite. The Operating System might do a few tricks to spread your cpu cycles over multiple cores, but that's where it ends.
What you are looking for is Parallelism. The .NET 4.0 framework will add a lot of new features to support Parallelism. Have a sneak-peak here:
http://www.danielmoth.com/Blog/2009/01/parallelising-loops-in-net-4.html
The CPU behavior would indicate that the application is only utilizing one logical processor. 50% would be one proc out of 2 (proc+proc). 25% would be one logical processor out of 4 (proc + HT + proc + HT)
How many threads to you have in total and do you have any locks in LoadCache. A SyncLock may a multi-thread system act as a single thread (by design). Also if your only spool one thread you will only get one worker thread.
CPU utilization is suggesting that you're only using one core; this may suggest that you've added threading to a portion where it is not beneficial (in this case, where CPU time is not a bottle neck).
If Loading the Cache or reading data happens very quickly, multi threading won't provide a massive improvement in speed performance. Similarly, if you're encountering a different bottleneck (slow bandwidth to a server, etc), it may not show up as CPU usage.
Are there any benefits to limiting the number of concurrent threads doing a given task to equal the number of processors on the host system? Or better to simply trust libraries such as .NET's ThreadPool to do the right thing ... even if there are 25 different concurrent threads happening at any one given moment?
Most threads are not CPU bound, they end up waiting on IO or other events. If you look at your system now, I imagine you have 100's (if not 1000's) of threads executing with no problems. By that measure, you're probably best just leaving the .NET thread pool to do the right thing!
However, if the threads were all CPU bound (e.g. something like ray tracing) then it would be a good idea to limit the number of threads to the number of cores, otherwise chances are that context switching will begin to hurt performance.
The threadpool already does a reasonably good job at this. It tries to limit the number of running threads to the number of CPU cores in your machine. When one thread ends, it immediately schedules another eligible thread for execution.
Every 0.5 seconds, it evaluates what is going on with the running threads. When the threads have been running too long, it assumes they are stalled and allows another thread to start executing. You'll now have more threads running than you have CPU cores. This can go up to the maximum number of allowed thread, as set by ThreadPool.SetMaxThreads().
Starting around .NET 2.0 SP1, the default maximum number of threads was increased considerably to 250 times the number of cores. You should never ever get there. If you do, you would have wasted about 2 minutes of time where a possibly non-optimal number of threads were running. Those threads however would all have to be blocking for that long, not exactly a typical execution pattern for a thread. On the other hand, if these threads are all waiting on the same kind of resource they are likely to just take turns, adding more threads cannot improve throughput.
Long story short, the thread pool will work well if you run threads that execute quickly (seconds at most) and don't block for a long time. You probably ought to consider creating your own Thread objects when your code doesn't match that pattern.
Well, if your bottleneck is ONLY processors, then it might make sense, but that would ignore all memory and other i/o bottlenecks, and chances are at least your cache memory is throwing page faults and other events that would slow the threads.
I'd trust the library myself. Threads wait for all kinds of things, and you don't want your application to slow down because it can't spawn a new thread, even though most of the rest are just sleeping, waiting for some event or resource.
Measure your application under a variety of thread:processor ratios. Come to conclusions based on hard data about your application. Accept no arguments from first principles about what performance you should get, only what you do get matters.