Why C# threads goes idle during the execution?

Why C# threads goes idle during the execution? - c#

I have a scheduler which runs as background thread on application start of an ASP.NET site. User can initiate various tasks (alert emails/file generation etc) which is inserted in a db table. The scheduler will pick the tasks from database and push the items into a stack. Also scheduler has a threadpool running 10 background threads, which will pop task items from the stack and execute it.
This is running fine in one web server, but behaving strange in other web server. The threads goes idle for 6-12 seconds with no reason and do nothing even though there are items in the stack.
Using lock() on stack object to make Push & Pop thread safe
Tried Thread.Yield() to give yield to cpu to execute other threads, but slowing down the execution and going idle still persists
Tried Thread.Sleep(0) to give yield to cpu to execute other threads, but slowing down the execution and going idle still persists
Logged entries and exit of all methods to check if something going wrong during the execution, but no luck
My questions:
Is execution of threads in .net in-deterministic?
Is it necessary to specify Thread.Yield() or Thread.Sleep(0) to give breathing time to cpu?
Why it is behaving differently on boxes with same configuration? Is there any machine/environment specific factors that affect the execution of thread?
UPDATE on May.08.2013
There are two boxes in the farm, both are identical in hardware configuration, setup with same software configuration as well Windows 2008 64bit / IIS7. Both webserver has only one site each with same build. Application pools of both site runs on Framework V4.0 on integrated mode. This is a legacy code and no chance since last two years.
We tried several iterations, in all cases webserver1 executes without any issues and completes the background work quickly as it was earlier. BUT webserver2 has significant delay and performing very poor.
We tried extensive logging, capturing entries/exit of all methods. The scenario is like this, all threads works fine for 2 seconds and then goes idle for 6-12 seconds, again become live and execute for next 2 seconds and then goes idle again. This behavioral is consistent till the completion of the task. There is no exception, no application termination, no error in application pool/iis log.
Any idea ?

Your threads are repeatedly trying to grab a lock which may be causing contention. But should not be 6-12 seconds - that answer only debugger can provide.
You can use AutoResetEvent and wait on it in worker threads - and Set the event when you push item to stack.

Okay guys finally we pinned the issue.
One of the cpu core of the webserver was hitting 100% and never comes back. Whereas other cores are at 0-5%.
We did a load testing for normal - moderate - heavy loads. While generating normal to moderate load the server is serving decently, properly sharing the process execution with all other cpu cores. But when we generate heavy load, things change, server struggles to distribute the load among the cores and the thread goes idle for 6-7 seconds. We assume that due to the failure of one cpu core its dealing with some fuzzy logic to distribute the process among the cores.
After further investigation we found that Windows NT Kernel is causing this problem, might be due to corruption or driver related issue.

Related

C# multiprocess-multithread service and windows scheduler

I have developed a windows service that, shortly, manages thousand of remote devices.
Currently it consists of two precesses, each with some hundreds of threads (we can discuss about of the opportunity to reduce the number of threads, but this is not the point), and all works quite fine.
Now I am trying to join all threads in a single process to semplify data exchange between threads, but what happens is that now all threads runs slower (it seems like in some conditions some threads run much less frequently).
So my question is: it is expected that windows scheduler works in different way on a single-process/multi-thread application compared to a multi-process/multi-thread application?
Little simple example:
- single core CPU
- 2 threads (A and B)
- thread A is doing a very long task, while thread B is sleeping
- now is time to wake up thread B, but thread A is still running
My conjecture:
- on single-process/multi-thread, scheduler force thread B to sleep and delay its wake up
- on multi-process/multi-thread, if A belongs to process 1 and B belongs to process 2, scheduler wake up thread B when expected
Could be?
Any suggestion to join all threads in a single process without throubles?
Sorry for my poor English.
EDIT
Following the advice of Luaan I am profiling application to check GC behavior. This is what I see on a 45 seconds time slot:
Some questions:
- why 13.000.000ms here?
- why reference to sleep here?
EDIT 2
Finally I solved my performance issue: as I said I was using some hundreds of threads in my service. I have rewritten some parts of the code in order to group old threads in few main threads and now I am using 8 main worker thread that do most of the job (around 50 threads total including secondary threads)... and magically now the service runs using something like half of the cpu.
Maybe the issue was related to GC activity also, but I think that most of the issues was due to the overhead of context switch for my threads.
EDIT 3
As some little performance issues continues I checked GC load with a PerformanceCounter object and you are right: my threads hangs when GC use about 99% of cpu time. How can I solve? Now I am trying to set GC in server mode.

Firstly, CPU cores and threading are unrelated. The Kernel decides how a thread is divided up amongst logical CPUs. The lower the logical CPUs the more context switching has to occur, slowing the process down overall, but it's still multi-threaded.
If you want to talk to other threads then the normal procedure is to do this with events which are subscribed to by the main application thread. You can then decide what to do with this information.
You shouldn't be concerned with how the kernel divides up the threads over logical CPUs as it does this automatically unless you are specifically using a processor affinity mask.

Azure Cloud-Service OnStop

I using Azure Cloud Worker Role for processing incoming task from queues. Processing of each task can take up to several hours and each worker-role can handle up to N tasks simultaneously. Basically, it's working.
Now, you can read in documentation that from time to time, the worker role can be shutdown (for software update, OS upgrade, ...). Basically, it's fine. But, this planned shutdown cannot forcedly stop the worker-role already running tasks.
Expected:
When calling the OnStop() method by the environment:
the worker role will stop getting new tasks for processing.
Wait for running tasks completion.
Continue with the planned shutdown.
Actual:
OnStop() method can be block for up to 5 minutes. I cannot guaranty that I'll finish processing the task in 5 minutes - so, this is problem... My task is being killed in the middle of processing and this became unstable situation for my software.
How I'm can avoid this 5 minutes limit? Any tip will be welcome.

How I'm can avoid this 5 minutes limit?
Unfortunately, you can't. This is a hard limit imposed from Azure side. You will need to work around that.
There are two possible solutions I can think of and both of them would require you to rethink about your current architecture:
Break your one big task into many smaller tasks and create some kind of work flow.
Make your task idempotent so that even if it gets terminated in between (because of worker role shutdown or error in task itself) and when it gets pick up by another instance, it starts again in such a way that your output of the task is not corrupted.

No, you cannot bypass this limit. In general you should not rely on any of your instances running continuously for any long period of time. Instances may be suddenly stopped or they may suddenly disappear (because of an underlying server failure). You software should be designed such that when an instance is restarted (possibly redeployed) or some other instance finds capacity to take a previously released work item that work item is reprocessed without any adverse effects.

Threads eating into CPU performance

I have a console application(c#) where I have to call various third party API's and collect data. This I have to do simultaneously for different users. I am using threads for it. But as the number of users are increasing this service is eating into the CPU performance. It is affecting other processes. Is there a way we can use threads for parallel processing but do not affect the CPU performance in a huge way.

I assume from your question that you're creating threads manually, and so the quick way to answer this is to suggest that you use an API like the Task Parallel Library, because this will take an arbitrary number of tasks and try to use a sensible number of threads to process them - so given 500 API requests, it would limit itself to just a few threads.
However, to answer in more detail: the typical reason that you would see this problem is that code is creating too many threads. Threads are not free resources - they are expensive.
A made up example based on your question might be this:
you have 5 3rd party APIs that you need to call, and each is going to return ~1MB of data per user
you call each API on a separate background thread, for each user
you have 100 users
you therefore have created 500 threads in total, each of which is waiting on data from the network
The problem here is that there are 500 threads the program is trying to manage, and they are all waiting on the slowest piece of the system - the network.
More simply, we are trying to download 500 pieces of data at once (which in this example would mean everything finishes slowly), rather than downloading them one at a time so that individual items will finish earlier. Because each thread will be doing nothing (just waiting for the network), the CPU will switch between idle threads continually. As you increase your number of users, the number of threads increases - which increases the CPU usage just for switch between threads, even though each thread is actually downloading more slowly. This is (approximately) why you'll be seeing slower performance as your user count goes up.
A better example would be to take the same scenario and use just one background thread:
you have 5 3rd party APIs that you need to call, and each is going to return ~1MB of data per user
each API call is put into a queue and the queue is processed by a single thread
you have 100 users
you therefore have 1 thread running in the background which is using the full available bandwidth of the network for each request
In this example, your CPU usage will be pretty consistent - no matter how many users you have, there is only one background thread running, so context switching is minimised. Each individual API call runs at the maximum rate of the network card and so finishes as quickly as possible.
The reality is that one thread is probably not enough: a single request is unlikely to saturate the network, as there will be limiting factors elsewhere. But this is something you can tune later: maybe 2 or 3 threads would be more performant, but 4 threads would be slower again. The general rule when threading is to start small and work up, not to create a thread for each piece of work.

First, run a profiler and checkout some refactoring tools to see if you can perform code optimization to resolve the issue. If your application is still overloading the server then setup or purchase load balancing. In the meantime, if you are running the latest OS's you could try setting a hacky CPU rate limit...however, that may not work for the needs you described.

multithread running on difference processes or same process?

in my .net multithread program, i am wondering all these threads running on the same process or different processes?
if it is on the same process, then i assume one process run on one core, then how multithreading can utilize all the four cores that i have in my quad-core cpu?
but if it is on the different processes, as i know different processes and same process have different data sharing mechanism, then how come i don't need to write different code to handle this in my multithreading program? Would anyone shed some light on
I want to ask two more similar questions
When i open the task manager, often times, i can see around 800 threads and 54 processes,and my cpu usage is only 5%,and i was told that each core only excute one thread at a time.
is my cpu running these 800 threads all the times, or only means 800 threads are queuing, waiting cpu to process?
if i want my multithreading program fully utilze my quad-core cpu, can i raise the cpu usage by creating more threads(it seems contradict the theroy that only one thread one core at a time)

Multithreading means multiple threads in the same process.
Each thread can be assigned to a different core.
But all the threads belong to the same process, for example if one of the threads will throw an unhandeled exception, the process will crash with all its threads.
You could have read a bit about it, just search google or Wikipedia - Software Multithreading

A single process may use a number of threads; even a basic .NET "hello world" console exe probably uses 4 or 5. So yes, a single process can potentially use all your available cores if you write it to do so.
Because it is the same process, data sharing is direct, but: care must be taken if you are changing the values, as otherwise very bad things can happen. Access must be carefully synchronized (lock etc) if you are changing the data within the threaded code.
You do, however, usually have to write different code to support multiple threads. Exceptions to this is when the framework is doing that for you, for example, ASP.NET or WCF may take incoming requests and hand them to different worker threads, allowing multiple concurrent operations even though you didn't explicitly code it that way. Which means that in ASP.NET or WCF you need to be careful with shared state, for exactly the reasons already discussed.
As a minor addition, note also that a process can support multiple AppDomains; in that scenario, the threads for the process are shared between all the AppDomains at whim by the scheduler.

Threads created by that process are part of that process. Different threads within the one process can and often do run on different processors or processor cores.

in my .net multithread program, i am wondering all these threads
running on the same process or different processes?
A thread always runs in a process, however, multiple threads can run in a single process and each thread can be handled by a different core.
If you have a single core, it doesn't mean that it can't run multiple threads, it just means that the core can't execute multiple threads at the same time. If you take a look at the picture above, you will note that:
Thread #1 executes for some time.
Thread #1 "stops".
Thread #2 executes for some time.
Thread #2 "stops".
Thread #1 executes for some time, again.
This illustrates what happens when a core runs multiple threads: the core only executes one thread at a time, but in order for both threads to run, the core must perform context switching. In other words: the core runs a few commands from Thread 1, switches to Thread 2 and runs a few commands from it, then it switches back to Thread 1 to execute some more commands.
Juggling Oranges:
A good metaphor is juggling oranges: technically, you only have two hands and you can only hold one orange in each hand at a time, so the maximum you can hold is two oranges. In this case the taxing part is holding the oranges. However, if you throw an orange up in the air, then you can hold a 3rd orange while the the 2nd one is in the air. The higher you throw the oranges, the more oranges you can juggle. To be more precise: the longer it takes for an orange to come back in your hand, the more oranges you can juggle. Of course, you probably can't juggle an enormous amount of oranges, because throwing an orange requires more energy than simply holding it.
In essence, your CPU is juggling threads: the longer a thread stays away from executing code on the CPU, the more threads a CPU can "juggle." If a thread is waiting on I/O (e.g. a database request), then the CPU can execute the code of another thread at the same time. This is the same reason why you see 54 processes and 800 threads in the task manager: many
of those threads are doing things that are not CPU-bound.
Sleep:
is my cpu running these 800 threads all the times, or only means 800
threads are queuing, waiting cpu to process?
Many of the threads you're noticing in your task manager are idle/sleeping, so they use very little (if any) CPU. However, the ones that are running are executed with context switching (if there are more threads than cores, which is the case most of the time). There are many things that can cause a thread to idle/sleep, see the orange juggling for an example.
CPU Utilization:
if i want my multithreading program fully utilze my quad-core cpu, can
i raise the cpu usage by creating more threads(it seems contradict the
theroy that only one thread one core at a time)
It gets tricky :). Imagine that instead of oranges, you have bowling balls: it's VERY taxing on your hands, so even if you tried, you probably won't be able to hold more than 2 bowling balls let alone juggle a 3rd one. At maximum load, you can only hold as many objects as you have hands. The same is true for the CPU: at maximum load, the CPU can only execute as many threads as there are cores.
The reason why you can run more threads than the number of cores is because the thread are not putting the maximum load on the cores. If your threads are CPU bound, i.e. they do some heavy computational stuff and they tax the core 100%, then you can only run as many threads as you have cores. However, the CPU is the fastest thing in your computer and your thread may be accessing other parts of your computer that are significantly slower than your CPU (hard disk, network card, etc), so you can run more threads.

Launching multiple tasks from a WCF service

I need to optimize a WCF service... it's quite a complex thing. My problem this time has to do with tasks (Task Parallel Library, .NET 4.0). What happens is that I launch several tasks when the service is invoked (using Task.Factory.StartNew) and then wait for them to finish:
Task.WaitAll(task1, task2, task3, task4, task5, task6);
Ok... what I see, and don't like, is that on the first call (sometimes the first 2-3 calls, if made quickly one after another), the final task starts much later than the others (I am looking at a case where it started 0.5 seconds after the others). I tried calling
ThreadPool.SetMinThreads(12*Environment.ProcessorCount, 20);
at the beginning of my service, but it doesn't seem to help.
The tasks are all database-related: I'm reading from multiple databases and it has to take as little time as possible.
Any idea why the last task is taking so long? Is there something I can do about it?
Alternatively, should I use the thread pool directly? As it happens, in one case I'm looking at, one task had already ended before the last one started - I would had saved 0.2 seconds if I had reused that thread instead of waiting for a new one to be created. However, I can not be sure that that task will always end so quickly, so I can't put both requests in the same task.
[Edit] The OS is Windows Server 2003, so there should be no connection limit. Also, it is hosted in IIS - I don't know if I should create regular threads or using the thread pool - which is the preferred version?
[Edit] I've also tried using Task.Factory.StartNew(action, TaskCreationOptions.LongRunning); - it doesn't help, the last task still starts much later (around half a second later) than the rest.
[Edit] MSDN1 says:
The thread pool has a built-in delay
(half a second in the .NET Framework
version 2.0) before starting new idle
threads. If your application
periodically starts many tasks in a
short time, a small increase in the
number of idle threads can produce a
significant increase in throughput.
Setting the number of idle threads too
high consumes system resources
needlessly.
However, as I said, I'm already calling SetMinThreads and it doesn't help.

I have had problems myself with delays in thread startup when using the (.Net 4.0) Task-object. So for time-critical stuff I now use dedicated threads (... again, as that is what I was doing before .Net 4.0.)
The purpose of a thread pool is to avoid the operative system cost of starting and stopping threads. The threads are simply being reused. This is a common model found in for example internet servers. The advantage is that they can respond quicker.
I've written many applications where I implement my own threadpool by having dedicated threads picking up tasks from a task queue. Note however that this most often required locking that can cause delays/bottlenecks. This depends on your design; are the tasks small then there would be a lot of locking and it might be faster to trade some CPU in for less locking: http://www.boyet.com/Articles/LockfreeStack.html
SmartThreadPool is a replacement/extension of the .Net thread pool. As you can see in this link it has a nice GUI to do some testing: http://www.codeproject.com/KB/threads/smartthreadpool.aspx
In the end it depends on what you need, but for high performance I recommend implementing your own thread pool. If you experience a lot of thread idling then it could be beneficial to increase the number of threads (beyond the recommended cpucount*2). This is actually how HyperThreading works inside the CPU - using "idle" time while doing operations to do other operations.
Note that .Net has a built-in limit of 25 threads per process (ie. for all WCF-calls you receive simultaneously). This limit is independent and overrides the ThreadPool setting. It can be increased, but it requires some magic: http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=201

Following from my prior question (yep, should have been a Q against original message - apologies):
Why do you feel that creating 12 threads for each processor core in your machine will in some way speed-up your server's ability to create worker threads? All you're doing is slowing your server down!
As per MSDN do
As per the MSDN docs: "You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorith for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.".
Issues like this are usually caused by bumping into limits or contention on a shared resource.
In your case, I am guessing that your last task(s) is/are blocking while they wait for a connection to the DB server to come available or for the DB to respond. Remember - if your invocation kicks off 5-6 other tasks then your machine is going to have to create and open numerous DB connections and is going to kick the DB with, potentially, a lot of work. If your WCF server and/or your DB server are cold, then your first few invocations are going to be slower until the machine's caches etc., are populated.
Have you tried adding a little tracing/logging using the stopwatch to time how long it takes for your tasks to connect to the DB server and then execute their operations?
You may find that reducing the number of concurrent tasks you kick off actually speeds things up. Try spawning 3 tasks at a time, waiting for them to complete and then spawn the next 3.

When you call Task.Factory.StartNew, it uses a TaskScheduler to map those tasks into actual work items.
In your case, it sounds like one of your Tasks is delaying occasionally while the OS spins up a new Thread for the work item. You could, potentially, build a custom TaskScheduler which already contained six threads in a wait state, and explicitly used them for these six tasks. This would allow you to have complete control over how those initial tasks were created and started.
That being said, I suspect there is something else at play here... You mentioned that using TaskCreationOptions.LongRunning demonstrates the same behavior. This suggests that there is some other factor at play causing this half second delay. The reason I suspect this is due to the nature of TaskCreationOptions.LongRunning - when using the default TaskScheduler (LongRunning is a hint used by the TaskScheduler class), starting a task with TaskCreationOptions.LongRunning actually creates an entirely new (non-ThreadPool) thread for that Task. If creating 6 tasks, all with TaskCreationOptions.LongRunning, demonstrates the same behavior, you've pretty much guaranteed that the problem is NOT the default TaskScheduler, since this is going to always spin up 6 threads manually.
I'd recommend running your code through a performance profiler, and potentially the Concurrency Visualizer in VS 2010. This should help you determine exactly what is causing the half second delay.

What is the OS? If you are not running the server versions of windows, there is a connection limit. Your many threads are probably being serialized because of the connection limit.
Also, I have not used the task parallel library yet, but my limited experience is that new threads are cheap to make in the context of networking.

These articles might explain the problem you're having:
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-are-wcf-responses-slow-and-setminthreads-does-not-work.aspx
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-does-wcf-become-slow-after-being-idle-for-15-seconds.aspx
seeing as you're using .Net 4, the first article probably doesn't apply, but as the second article points out the ThreadPool terminates idle threads after 15 seconds which might explain the problem you're having and offers a simple (though a little hacky) solution to get around it.
Whether or not you should be using the ThreadPool directly wouldn't make any difference as I suspect the task library is using it for you underneath anyway.
One third-party library we have been using for a while might help you here - Smart Thread Pool. You still get the same benefits of using the task libraries, in that you can have the return values from the threads and get any exception information from them too.
Also, you can instantiate threadpools so that when you have multiple places each needing a threadpool (so that a low priority process doesn't start eating into the quota of some high priority process) and oh yeah you can set the priority of the threads in the pool too which you can't do with the standard ThreadPool where all the threads are background threads.
You can find plenty of info on the codeplex page, I've also got a post which highlights some of the key differences:
http://theburningmonk.com/2010/03/threading-introducing-smartthreadpool/
Just on a side note, for tasks like the one you've mentioned, which might take some time to return, you probably shouldn't be using the threadpool anyway. It's recommended that we should avoid using the threadpool for any blocking tasks like that because it hogs up the threadpool which is used by all sorts of things by the framework classes, like handling timer events, etc. etc. (not to mention handling incoming WCF requests!). I feel like I'm spamming here but here's some of the info I've gathered around the use of the threadpool and some useful links at the bottom:
http://theburningmonk.com/2010/03/threading-using-the-threadpool-vs-creating-your-own-threads/
well, hope this helps!

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.