I have a console application(c#) where I have to call various third party API's and collect data. This I have to do simultaneously for different users. I am using threads for it. But as the number of users are increasing this service is eating into the CPU performance. It is affecting other processes. Is there a way we can use threads for parallel processing but do not affect the CPU performance in a huge way.
I assume from your question that you're creating threads manually, and so the quick way to answer this is to suggest that you use an API like the Task Parallel Library, because this will take an arbitrary number of tasks and try to use a sensible number of threads to process them - so given 500 API requests, it would limit itself to just a few threads.
However, to answer in more detail: the typical reason that you would see this problem is that code is creating too many threads. Threads are not free resources - they are expensive.
A made up example based on your question might be this:
you have 5 3rd party APIs that you need to call, and each is going to return ~1MB of data per user
you call each API on a separate background thread, for each user
you have 100 users
you therefore have created 500 threads in total, each of which is waiting on data from the network
The problem here is that there are 500 threads the program is trying to manage, and they are all waiting on the slowest piece of the system - the network.
More simply, we are trying to download 500 pieces of data at once (which in this example would mean everything finishes slowly), rather than downloading them one at a time so that individual items will finish earlier. Because each thread will be doing nothing (just waiting for the network), the CPU will switch between idle threads continually. As you increase your number of users, the number of threads increases - which increases the CPU usage just for switch between threads, even though each thread is actually downloading more slowly. This is (approximately) why you'll be seeing slower performance as your user count goes up.
A better example would be to take the same scenario and use just one background thread:
you have 5 3rd party APIs that you need to call, and each is going to return ~1MB of data per user
each API call is put into a queue and the queue is processed by a single thread
you have 100 users
you therefore have 1 thread running in the background which is using the full available bandwidth of the network for each request
In this example, your CPU usage will be pretty consistent - no matter how many users you have, there is only one background thread running, so context switching is minimised. Each individual API call runs at the maximum rate of the network card and so finishes as quickly as possible.
The reality is that one thread is probably not enough: a single request is unlikely to saturate the network, as there will be limiting factors elsewhere. But this is something you can tune later: maybe 2 or 3 threads would be more performant, but 4 threads would be slower again. The general rule when threading is to start small and work up, not to create a thread for each piece of work.
First, run a profiler and checkout some refactoring tools to see if you can perform code optimization to resolve the issue. If your application is still overloading the server then setup or purchase load balancing. In the meantime, if you are running the latest OS's you could try setting a hacky CPU rate limit...however, that may not work for the needs you described.
Related
The setup
We've written a Windows service, which fires off separate worker processes to perform various CPU-intensive tasks. The server and workers communicate via IPC named pipes.
At present, we create the workers via a simple Process.Start() call.
When we run up a number of workers on a fairly low-spec'd dual core server VM, Task Manager tells us that each worker uses approximately 2 - 3% CPU.
However (this is thing that's confusing us), when we perform the same test on a very powerful eight-core server, we still see each worker process using 2 - 3% CPU. Now, because there is more CPU 'power' available, I would expect to see each worker using a much smaller percentage of CPU.
This also means that on the low-powered server we hit 100% CPU after 30ish worker processes are created. But on the high-powered CPU, we hit the same limit after the same number of workers. We would expect to be able to run far more workers.
The issue
So, I have a couple of questions:
Are we misunderstanding the values that Task Manager is telling us? The CPU % value is the amount of CPU time consumed across all cores? If so, why is it the same on vastly different hardware?
Should we be doing something special / different to properly distribute our worker processes across multiple cores? At the moment, the processes have the default processor affinity (so they are able to run on any core).
Any help, advice, links, would be greatly appreciated.
Some extra info for people who have left comments:
I didn't mention it initially as I didn't necessarily want to add extra complexity to my question, but our worker processes are doing video transcoding of live video streams. As such, no worker ever 'completes' its task - it simply works for as long as a client is connected.
Essentially:
A client connects to the server
The server fires off a worker process which connects to a remote video stream
When video is received from said video stream, the worker transcodes it and sends the transcoded video back to the client.
Not sure if this helps with any other suggestions? Thanks for all the comments so far.
The fact that the processes are using 2-3% of the CPU suggests that your process is not CPU bound, but rather is likely to be IO bound or have some other limitation.
The IO on the 2 core server VM is probably quite a bit slower than your 8 core server, which, in turn, throttles the behavior there. This may be why the apparent CPU usage is the same, though I would suspect that the full server likely finishes the task sooner overall.
in my .net multithread program, i am wondering all these threads running on the same process or different processes?
if it is on the same process, then i assume one process run on one core, then how multithreading can utilize all the four cores that i have in my quad-core cpu?
but if it is on the different processes, as i know different processes and same process have different data sharing mechanism, then how come i don't need to write different code to handle this in my multithreading program? Would anyone shed some light on
I want to ask two more similar questions
When i open the task manager, often times, i can see around 800 threads and 54 processes,and my cpu usage is only 5%,and i was told that each core only excute one thread at a time.
is my cpu running these 800 threads all the times, or only means 800 threads are queuing, waiting cpu to process?
if i want my multithreading program fully utilze my quad-core cpu, can i raise the cpu usage by creating more threads(it seems contradict the theroy that only one thread one core at a time)
Multithreading means multiple threads in the same process.
Each thread can be assigned to a different core.
But all the threads belong to the same process, for example if one of the threads will throw an unhandeled exception, the process will crash with all its threads.
You could have read a bit about it, just search google or Wikipedia - Software Multithreading
A single process may use a number of threads; even a basic .NET "hello world" console exe probably uses 4 or 5. So yes, a single process can potentially use all your available cores if you write it to do so.
Because it is the same process, data sharing is direct, but: care must be taken if you are changing the values, as otherwise very bad things can happen. Access must be carefully synchronized (lock etc) if you are changing the data within the threaded code.
You do, however, usually have to write different code to support multiple threads. Exceptions to this is when the framework is doing that for you, for example, ASP.NET or WCF may take incoming requests and hand them to different worker threads, allowing multiple concurrent operations even though you didn't explicitly code it that way. Which means that in ASP.NET or WCF you need to be careful with shared state, for exactly the reasons already discussed.
As a minor addition, note also that a process can support multiple AppDomains; in that scenario, the threads for the process are shared between all the AppDomains at whim by the scheduler.
Threads created by that process are part of that process. Different threads within the one process can and often do run on different processors or processor cores.
in my .net multithread program, i am wondering all these threads
running on the same process or different processes?
A thread always runs in a process, however, multiple threads can run in a single process and each thread can be handled by a different core.
If you have a single core, it doesn't mean that it can't run multiple threads, it just means that the core can't execute multiple threads at the same time. If you take a look at the picture above, you will note that:
Thread #1 executes for some time.
Thread #1 "stops".
Thread #2 executes for some time.
Thread #2 "stops".
Thread #1 executes for some time, again.
This illustrates what happens when a core runs multiple threads: the core only executes one thread at a time, but in order for both threads to run, the core must perform context switching. In other words: the core runs a few commands from Thread 1, switches to Thread 2 and runs a few commands from it, then it switches back to Thread 1 to execute some more commands.
Juggling Oranges:
A good metaphor is juggling oranges: technically, you only have two hands and you can only hold one orange in each hand at a time, so the maximum you can hold is two oranges. In this case the taxing part is holding the oranges. However, if you throw an orange up in the air, then you can hold a 3rd orange while the the 2nd one is in the air. The higher you throw the oranges, the more oranges you can juggle. To be more precise: the longer it takes for an orange to come back in your hand, the more oranges you can juggle. Of course, you probably can't juggle an enormous amount of oranges, because throwing an orange requires more energy than simply holding it.
In essence, your CPU is juggling threads: the longer a thread stays away from executing code on the CPU, the more threads a CPU can "juggle." If a thread is waiting on I/O (e.g. a database request), then the CPU can execute the code of another thread at the same time. This is the same reason why you see 54 processes and 800 threads in the task manager: many
of those threads are doing things that are not CPU-bound.
Sleep:
is my cpu running these 800 threads all the times, or only means 800
threads are queuing, waiting cpu to process?
Many of the threads you're noticing in your task manager are idle/sleeping, so they use very little (if any) CPU. However, the ones that are running are executed with context switching (if there are more threads than cores, which is the case most of the time). There are many things that can cause a thread to idle/sleep, see the orange juggling for an example.
CPU Utilization:
if i want my multithreading program fully utilze my quad-core cpu, can
i raise the cpu usage by creating more threads(it seems contradict the
theroy that only one thread one core at a time)
It gets tricky :). Imagine that instead of oranges, you have bowling balls: it's VERY taxing on your hands, so even if you tried, you probably won't be able to hold more than 2 bowling balls let alone juggle a 3rd one. At maximum load, you can only hold as many objects as you have hands. The same is true for the CPU: at maximum load, the CPU can only execute as many threads as there are cores.
The reason why you can run more threads than the number of cores is because the thread are not putting the maximum load on the cores. If your threads are CPU bound, i.e. they do some heavy computational stuff and they tax the core 100%, then you can only run as many threads as you have cores. However, the CPU is the fastest thing in your computer and your thread may be accessing other parts of your computer that are significantly slower than your CPU (hard disk, network card, etc), so you can run more threads.
I need to optimize a WCF service... it's quite a complex thing. My problem this time has to do with tasks (Task Parallel Library, .NET 4.0). What happens is that I launch several tasks when the service is invoked (using Task.Factory.StartNew) and then wait for them to finish:
Task.WaitAll(task1, task2, task3, task4, task5, task6);
Ok... what I see, and don't like, is that on the first call (sometimes the first 2-3 calls, if made quickly one after another), the final task starts much later than the others (I am looking at a case where it started 0.5 seconds after the others). I tried calling
ThreadPool.SetMinThreads(12*Environment.ProcessorCount, 20);
at the beginning of my service, but it doesn't seem to help.
The tasks are all database-related: I'm reading from multiple databases and it has to take as little time as possible.
Any idea why the last task is taking so long? Is there something I can do about it?
Alternatively, should I use the thread pool directly? As it happens, in one case I'm looking at, one task had already ended before the last one started - I would had saved 0.2 seconds if I had reused that thread instead of waiting for a new one to be created. However, I can not be sure that that task will always end so quickly, so I can't put both requests in the same task.
[Edit] The OS is Windows Server 2003, so there should be no connection limit. Also, it is hosted in IIS - I don't know if I should create regular threads or using the thread pool - which is the preferred version?
[Edit] I've also tried using Task.Factory.StartNew(action, TaskCreationOptions.LongRunning); - it doesn't help, the last task still starts much later (around half a second later) than the rest.
[Edit] MSDN1 says:
The thread pool has a built-in delay
(half a second in the .NET Framework
version 2.0) before starting new idle
threads. If your application
periodically starts many tasks in a
short time, a small increase in the
number of idle threads can produce a
significant increase in throughput.
Setting the number of idle threads too
high consumes system resources
needlessly.
However, as I said, I'm already calling SetMinThreads and it doesn't help.
I have had problems myself with delays in thread startup when using the (.Net 4.0) Task-object. So for time-critical stuff I now use dedicated threads (... again, as that is what I was doing before .Net 4.0.)
The purpose of a thread pool is to avoid the operative system cost of starting and stopping threads. The threads are simply being reused. This is a common model found in for example internet servers. The advantage is that they can respond quicker.
I've written many applications where I implement my own threadpool by having dedicated threads picking up tasks from a task queue. Note however that this most often required locking that can cause delays/bottlenecks. This depends on your design; are the tasks small then there would be a lot of locking and it might be faster to trade some CPU in for less locking: http://www.boyet.com/Articles/LockfreeStack.html
SmartThreadPool is a replacement/extension of the .Net thread pool. As you can see in this link it has a nice GUI to do some testing: http://www.codeproject.com/KB/threads/smartthreadpool.aspx
In the end it depends on what you need, but for high performance I recommend implementing your own thread pool. If you experience a lot of thread idling then it could be beneficial to increase the number of threads (beyond the recommended cpucount*2). This is actually how HyperThreading works inside the CPU - using "idle" time while doing operations to do other operations.
Note that .Net has a built-in limit of 25 threads per process (ie. for all WCF-calls you receive simultaneously). This limit is independent and overrides the ThreadPool setting. It can be increased, but it requires some magic: http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=201
Following from my prior question (yep, should have been a Q against original message - apologies):
Why do you feel that creating 12 threads for each processor core in your machine will in some way speed-up your server's ability to create worker threads? All you're doing is slowing your server down!
As per MSDN do
As per the MSDN docs: "You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorith for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.".
Issues like this are usually caused by bumping into limits or contention on a shared resource.
In your case, I am guessing that your last task(s) is/are blocking while they wait for a connection to the DB server to come available or for the DB to respond. Remember - if your invocation kicks off 5-6 other tasks then your machine is going to have to create and open numerous DB connections and is going to kick the DB with, potentially, a lot of work. If your WCF server and/or your DB server are cold, then your first few invocations are going to be slower until the machine's caches etc., are populated.
Have you tried adding a little tracing/logging using the stopwatch to time how long it takes for your tasks to connect to the DB server and then execute their operations?
You may find that reducing the number of concurrent tasks you kick off actually speeds things up. Try spawning 3 tasks at a time, waiting for them to complete and then spawn the next 3.
When you call Task.Factory.StartNew, it uses a TaskScheduler to map those tasks into actual work items.
In your case, it sounds like one of your Tasks is delaying occasionally while the OS spins up a new Thread for the work item. You could, potentially, build a custom TaskScheduler which already contained six threads in a wait state, and explicitly used them for these six tasks. This would allow you to have complete control over how those initial tasks were created and started.
That being said, I suspect there is something else at play here... You mentioned that using TaskCreationOptions.LongRunning demonstrates the same behavior. This suggests that there is some other factor at play causing this half second delay. The reason I suspect this is due to the nature of TaskCreationOptions.LongRunning - when using the default TaskScheduler (LongRunning is a hint used by the TaskScheduler class), starting a task with TaskCreationOptions.LongRunning actually creates an entirely new (non-ThreadPool) thread for that Task. If creating 6 tasks, all with TaskCreationOptions.LongRunning, demonstrates the same behavior, you've pretty much guaranteed that the problem is NOT the default TaskScheduler, since this is going to always spin up 6 threads manually.
I'd recommend running your code through a performance profiler, and potentially the Concurrency Visualizer in VS 2010. This should help you determine exactly what is causing the half second delay.
What is the OS? If you are not running the server versions of windows, there is a connection limit. Your many threads are probably being serialized because of the connection limit.
Also, I have not used the task parallel library yet, but my limited experience is that new threads are cheap to make in the context of networking.
These articles might explain the problem you're having:
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-are-wcf-responses-slow-and-setminthreads-does-not-work.aspx
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-does-wcf-become-slow-after-being-idle-for-15-seconds.aspx
seeing as you're using .Net 4, the first article probably doesn't apply, but as the second article points out the ThreadPool terminates idle threads after 15 seconds which might explain the problem you're having and offers a simple (though a little hacky) solution to get around it.
Whether or not you should be using the ThreadPool directly wouldn't make any difference as I suspect the task library is using it for you underneath anyway.
One third-party library we have been using for a while might help you here - Smart Thread Pool. You still get the same benefits of using the task libraries, in that you can have the return values from the threads and get any exception information from them too.
Also, you can instantiate threadpools so that when you have multiple places each needing a threadpool (so that a low priority process doesn't start eating into the quota of some high priority process) and oh yeah you can set the priority of the threads in the pool too which you can't do with the standard ThreadPool where all the threads are background threads.
You can find plenty of info on the codeplex page, I've also got a post which highlights some of the key differences:
http://theburningmonk.com/2010/03/threading-introducing-smartthreadpool/
Just on a side note, for tasks like the one you've mentioned, which might take some time to return, you probably shouldn't be using the threadpool anyway. It's recommended that we should avoid using the threadpool for any blocking tasks like that because it hogs up the threadpool which is used by all sorts of things by the framework classes, like handling timer events, etc. etc. (not to mention handling incoming WCF requests!). I feel like I'm spamming here but here's some of the info I've gathered around the use of the threadpool and some useful links at the bottom:
http://theburningmonk.com/2010/03/threading-using-the-threadpool-vs-creating-your-own-threads/
well, hope this helps!
I know there are many cases which are good cases to use multi-thread in an application, but when is it the best to multi-thread a .net web application?
A web application is almost certainly already multi threaded by the hosting environment (IIS etc). If your page is CPU-bound (and want to use multiple cores), then arguably multiple threads is a bad idea, as when your system is under load you are already using them.
The time it might help is when you are IO bound; for example, you have a web-page that needs to talk to 3 external web-services, talk to a database, and write a file (all unrelated). You can do those in parallel on different threads (ideally using the inbuilt async operations, to maximise completion-port usage) to reduce the overall processing time - all without impacting local CPU overly much (here the real delay is on the network).
Of course, in such cases you might also do better by simply queuing the work in the web application, and having a separate service dequeue and process them - but then you can't provide an immediate response to the caller (they'd need to check back later to verify completion etc).
IMHO you should avoid the use of multithread in a web based application.
maybe a multithreaded application could increase the performance in a standard app (with the right design), but in a web application you may want to keep a high throughput instead of speed.
but if you have a few concurrent connection maybe you can use parallel thread without a global performance degradation
Multithreading is a technique to provide a single process with more processing time to allow it to run faster. It has more threads thus it eats more CPU cycles. (From multiple CPU's, if you have any.) For a desktop application, this makes a lot of sense. But granting more CPU cycles to a web user would take away the same cycles from the 99 other users who are doing requests at the same time! So technically, it's a bad thing.
However, a web application might use other services and processes that are using multiple threads. Databases, for example, won't create a separate thread for every user that connects to them. They limit the number of threads to just a few, adding connections to a connection pool for faster usage. As long as there are connections available or pooled, the user will have database access. When the database runs out of connections, the user will have to wait.
So, basically, the use of multiple threads could be used for web applications to reduce the number of active users at a specific moment! It allows the system to share resources with multiple users without overloading the resource. Instead, users will just have to stand in line before it's their turn.
This would not be multi-threading in the web application itself, but multi-threading in a service that is consumed by the web application. In this case, it's used as a limitation by only allowing a small amount of threads to be active.
In order to benefit from multithreading your application has to do a significant amount of work that can be run in parallel. If this is not the case, the overhead of multithreading may very well top the benefits.
In my experience most web applications consist of a number of short running methods, so apart from the parallelism already offered by the hosting environment, I would say that it is rare to benefit from multithreading within the individual parts of a web application. There are probably examples where it will offer a benefit, but my guess is that it isn't very common.
ASP.NET is already capable of spawning several threads for processing several requests in parallel, so for simple request processing there is rarely a case when you would need to manually spawn another thread. However, there are a few uncommon scenarios that I have come across which warranted the creation of another thread:
If there is some operation that might take a while and can run in parallel with the rest of the page processing, you might spawn a secondary thread there. For example, if there was a webservice that you had to poll as a result of the request, you might spawn another thread in Page_Init, and check for results in Page_PreRender (waiting if necessary). Though it's still a question if this would be a performance benefit or not - spawning a thread isn't cheap and the time between a typical Page_Init and Page_Prerender is measured in milliseconds anyway. Keeping a thread pool for this might be a little bit more efficient, and ASP.NET also has something called "asynchronous pages" that might be even better suited for this need.
If there is a pool of resources that you wish to clean up periodically. For example, imagine that you are using some weird DBMS that comes with limited .NET bindings, but there is no pooling support (this was my case). In that case you might want to implement the DB connection pool yourself, and this would necessitate a "cleaner thread" which would wake up, say, once a minute and check if there are connections that have not been used for a long while (and thus can be closed off).
Another thing to keep in mind when implementing your own threads in ASP.NET - ASP.NET likes to kill off its processes if they have been inactive for a while. Thus you should not rely on your thread staying alive forever. It might get terminated at any moment and you better be ready for it.
I am working on maintaining someone else's code that is using multithreading, via two methods:
1: ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem)
2: Dim aThread As New Thread(AddressOf LoadCache)
aThread.Start()
However, on a dual core machine, I am only getting 50% CPU utlilization, and on a dual core with hyperthreadin enabled machine, I am only getting 25% CPU utilization.
Obviously threading is extremely complicated, but this behaviour would seem to indicate that I am not understanding some simple fundamental fact?
UPDATE
The code is too horribly complex to post here unfortunately, but for reference purposes, here is roughly what happens....I have approx 500 Accounts whose data is loaded from the database into an in memory cache...each account is loaded individually, and that process first calls a long running stored procedure, followed by manipulation and caching of the returned data. So, the point of threading in this situation is that there is indeed a bottleneck hitting the database (ie: the thread will be idled for up to 30 seconds waiting for the query to return), so we thread to allow others to begin processing the data they have received from Oracle.
So, the main thread executes:
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem)
Then, the ReadData() then proceeds to execute (exactly once):
Dim aThread As New Thread(AddressOf LoadCache)
aThread.Start()
And this is occurring in a recursive function, so the QueueUserWorkItem can be executing multiple times, which in turn then executes exactly one new thread via the aThread.Start
Hopefully that gives a decent idea of how things are happening.
So, under this scenario, should this not theoretically pin both cores, rather than maxing out at 100% on one core, while the other core is essentially idle?
That code starts one thread that will go an do something. To get more than one core working you need to start more than one thread and get them both busy. Starting a thread to do some work, and then having your main thread wait for it won't get the task done any quicker. It is common to start a long running task on a background thread so that the UI remains responsive, which may be what this code was intended to do, but it won't make the task get done any quicker.
#Judah Himango - I had assumed that those two lines of code were samples of how multi-threading were being achieved in two different places in the program. Maybe the OP can clarify if this is the case or if these two lines really are in the one method. If they are part of one method then we will need to see what the two methods are actually doing.
Update:
That does sound like it should max out both cores. What do you mean by recursivly calling ReadData()? If each new thread is only calling ReadData at or near its end to start the next thread then that could explain the behaviour you are seeing.
I am not sure that this is actaully a good idea. If the stored proc takes 30 seconds to get the data then presumably it is placing a fair load on the database server. Running it 500 times in parallel is just going to make things worse. Obviously I don't know your database or data, but I would look at improving the performance of the stored proc.
If multi threading does look like the way forward, then I would have a loop on the main thread that calls ThreadPool.QueueUserWorkItem once for each account that needs loading. I would also remove the explicit thread creation and only use the thread pool. That way you are less likely to starve the local machine by creating too many threads.
How many threads are you spinning up? It may seem primitive (wait a few years, and you won't need to do this anymore), but your code has got to figure out an optimal number of threads to start, and spin up that many. Simply running a single thread won't make things any faster, and won't pin a physical processor, though it may be good for other reasons (a worker thread to keep your UI responsive, for instance).
In many cases, you'll want to be running a number of threads equal to the number of logical cores available to you (available from Environment.ProcessorCount, I believe), but it may have some other basis. I've spun up a few dozen threads, talking to different hosts, when I've been bound by remote process latency, for instance.
Multi-Threaded and Multi-Core are two different things. Doing things Multi-Threaded often won't offer you an enormous increase in performance, sometimes quite the opposite. The Operating System might do a few tricks to spread your cpu cycles over multiple cores, but that's where it ends.
What you are looking for is Parallelism. The .NET 4.0 framework will add a lot of new features to support Parallelism. Have a sneak-peak here:
http://www.danielmoth.com/Blog/2009/01/parallelising-loops-in-net-4.html
The CPU behavior would indicate that the application is only utilizing one logical processor. 50% would be one proc out of 2 (proc+proc). 25% would be one logical processor out of 4 (proc + HT + proc + HT)
How many threads to you have in total and do you have any locks in LoadCache. A SyncLock may a multi-thread system act as a single thread (by design). Also if your only spool one thread you will only get one worker thread.
CPU utilization is suggesting that you're only using one core; this may suggest that you've added threading to a portion where it is not beneficial (in this case, where CPU time is not a bottle neck).
If Loading the Cache or reading data happens very quickly, multi threading won't provide a massive improvement in speed performance. Similarly, if you're encountering a different bottleneck (slow bandwidth to a server, etc), it may not show up as CPU usage.