I'm working on an application that processes pipelines in separate threads. During my tests I have seen that if a process is "lightweight", or the CLR determines that it is going to end quickly, the CLR recycles the thread rapidly and several units of work can end up sharing the same thread.
On the contrary, if a process takes some time or carries more load, the CLR opens different threads.
To me, all of that makes thread-local storage (TLS) programming difficult.
In fact, my application's pipelines take some time to process, and it seems that the CLR always assigns one managed thread to each of them. BTW, if two pipelines ever shared one managed thread they would collide, because they use TLS variables.
After all that, here comes the real question... Can I assume that if a process takes some time/load it will always use its own thread, or am I crazy for doing that?
From what I have been reading, working with managed threads in .NET 3.5 is like dealing with a kind of black box, so perhaps this question can never really be answered.
EDIT:
With "process" I am referring to the dictionary definition, "a series of actions, changes, or functions bringing about a result", and not the computer process you identify in Task Manager.
Can I assume that if a process takes some time/load it will
always use its own thread, or am I crazy for doing that?
A process always uses its own threads. It's not possible to access another process's threads, as far as I'm aware.
Code run from a thread-pool thread should not place anything in thread-local storage that it is not going to remove via a finally block. If you need to ensure that any thread-local storage used by a piece of code will die after that code finishes executing, you need to either explicitly clean up the storage or run that code in its own thread.
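A minimal sketch of the cleanup pattern described above, assuming the thread-local storage in question is a [ThreadStatic] field. The names PipelineContext, Current, and Run are hypothetical and only serve the illustration.

```csharp
using System;

// Hypothetical helper: scopes a thread-local value to one unit of work.
public static class PipelineContext
{
    // Each thread sees its own copy of this field.
    [ThreadStatic]
    private static string _current;

    public static string Current
    {
        get { return _current; }
    }

    // Sets the thread-local value, runs the work, and always clears the value,
    // so the next work item that reuses this pool thread starts clean.
    public static void Run(string pipelineName, Action work)
    {
        _current = pipelineName;
        try
        {
            work();
        }
        finally
        {
            _current = null;
        }
    }
}
```

With this shape it no longer matters whether two pipelines happen to land on the same pool thread one after the other, because nothing survives past the end of Run.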
Related
We have an old project that we are supporting and there is an issue that occurs most probably due to multi-threading.
The original implementer 'fixed' it by doing a Thread.Sleep before executing the problematic section.
The workaround works, but as the section is inside a loop, the Thread.Sleep adds multiple minutes to the time it takes for the section to finish.
In the last month we have been experimenting with lower values for the sleep, but we wish to find the root cause. During our investigations we added locks on private objects wherever we felt that would help.
We looked for anything that might be spawning additional threads - found none.
No Thread.start and no ThreadPool usage.
What is confusing us is that during debugging we find our main thread in the middle of about 8 other threads that we don't know who spawned them.
These are background threads, so my first thought was the thread pool, but as I mentioned there is no mention of it in the code.
It is .net 2.0 so no Asyncs.
This is just a part of a bigger application. It is a Windows service, but we run it from CMD to be able to debug it easily. The main application itself is a Windows Forms desktop app.
It also uses COM+ components if that is any help.
I've tried [STA] instead of [MTA].
Also Locking as aforementioned.
MemoryBarriers as well.
We still get the issue.
The issue is basically corrupted datasets and nulls in objects where they shouldn't be.
It happens about once every 25-100 iterations, so reproduction is not straightforward, but we have devised a test specifically to try to reproduce this issue.
All that is pointing us into the direction of thread issues.
Back to the original question -
Who could possibly be spawning those additional threads, and how do we prevent these threads from being created?
Please note the threads marked with red - those are background threads and as far as we can see no mention of them in the code.
The suspected thread in the screenshot is actively modifying the cols in the dataset. Problem is - the methods calling the SetColValueOnRow function that the thread is executing are typical and don't use any kind of threading.
The CPU affinity for this application is set to 1 Core [part of the original work-around]
Thanks
Edit: The database is oracle 12c but the issues we face happen before writing to the database.
They usually happen in DataSets where a whole record or a few of its columns can be wiped once every few testing iterations
I think you need to investigate why Thread.Sleep works. It does not sound like the code is by itself spawning additional threads, but you would have to go through the entire code base to find that out, including the COM+ components.
So the first thing I would do is to start up the program in debug and just press the F10 key to step into the program. Then open up the threads debug window and see if you see about the same number of threads as given in your question. If you do, then those are simply threads from the thread pool and your issue is probably unrelated to the multiple threads.
If you don't see the same number of threads, then try setting a breakpoint at various stages of the program and see if you can find where those threads are getting created. When you find where they are getting created, you can try adding some locking at that point. But, your issue still might not be caused by multiple threads corrupting memory. You should investigate until you are convinced that the issue is due to multiple threads or something else.
I suspect that the issue might be related to one or more of the COM+ components, or maybe the code is calling some long-running database stored procedure. In any case, I suspect the reason why Thread.Sleep works is that it gives the suspect component enough time to complete its operation before the next operation starts.
If this theory is true, then it suggests that there is some interaction between operations and when Thread.Sleep is given a sufficiently large value to allow the operation to complete - there are no interaction issues. This also suggests that perhaps one of the COM+ components is doing some things asynchronously. The solution might be to use locks or critical sections inside the COM+ components code. Another idea is to redesign the section of code that is causing the problem to allow multiple operations simultaneously.
So, the problem you are experiencing may not be due to multiple threads in the C# code you are looking at - but might be due to a long-running operation that will sometimes fail if not given sufficient time to complete before starting the next operation. This may or may not be due to multiple threads in the C# code.
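One way to test the multiple-threads theory directly is to record which threads actually reach the suspect method. The following is only a sketch: ThreadAudit, Record, and Report are hypothetical names, and the idea is to call Record at the top of a method such as SetColValueOnRow and inspect the report after a failing run. The lambda syntax needs a C# 3 compiler; on an older one, pass a named method to the Thread constructor instead.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical diagnostic helper: records which threads reach a call site.
public static class ThreadAudit
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<int, string> Seen = new Dictionary<int, string>();

    // Call this at the top of the suspect method.
    public static void Record(string site)
    {
        Thread t = Thread.CurrentThread;
        string info = string.Format(
            "{0}: id={1} pool={2} background={3}",
            site, t.ManagedThreadId, t.IsThreadPoolThread, t.IsBackground);

        lock (Gate)
        {
            // Keyed by managed thread ID; IDs can be reused after a thread
            // exits, so the count is a lower bound on the threads seen.
            Seen[t.ManagedThreadId] = info;
        }
    }

    public static int DistinctThreadCount
    {
        get { lock (Gate) { return Seen.Count; } }
    }

    public static string[] Report()
    {
        lock (Gate)
        {
            string[] lines = new string[Seen.Count];
            Seen.Values.CopyTo(lines, 0);
            return lines;
        }
    }
}
```

If DistinctThreadCount stays at 1 while the corruption still occurs, the C# code is being driven by a single thread and the cause probably lies elsewhere, for example in a component that works asynchronously.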
I have a .NET application which I would expect to have 5 long-running threads operating including the main thread. I can see that indeed 4 threads are newed up across the codebase, and I believe there is no direct (e.g. work item queuing / tasks) or indirect (e.g. Timers) usage of the ThreadPool anywhere. At least none I can find.
Running the app under Performance Monitor shows that the number of recognized threads stays constant at 5 (as I would expect) but the number of physical threads fluctuates between 70 and 120 over the course of about an hour!
Does anyone know why there are so many unused (as far as I can tell) physical threads? And why this number fluctuates?
I can't find any documentation that would explain this behavior so my best guess is that the ThreadPool balances itself to accommodate changing environmental factors such as free memory and resource contention but the numbers here seem excessive.
Update
A senior support engineer at Microsoft confirmed that the physical thread counter in use definitely only reports threads for the current process, despite the odd wording in MSDN. If an answer suggests this is not the case it will need to point to a definitive source.
Both ThreadPools and the GC create threads. There is a normal (or "worker") thread pool and an IO thread pool. The worker pool allocates new threads as it feels it needs to in order to stay responsive. It creates threads on demand up to the configured minimum (by default, one per processor) and then throttles how quickly it adds more, up to the maximum. See ThreadPool.GetMinThreads for the minimum number of worker threads the pool will create on demand. See ThreadPool.GetAvailableThreads for how many more worker threads could still be started; subtracting that from the ThreadPool.GetMaxThreads value gives the number of "active" worker threads. If you have long-running work occupying worker thread-pool threads, the pool sees those threads as in use and allocates others to service future requests.
There is also a maximum # of threads in the pool, so as threads recycle back to the pool the pool may kill some off to get back down to a # it decides is best.
There is also a finalizer thread.
There are likely others that are undocumented or are a result of a library you're using.
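To see these numbers for your own process, something like the following sketch could be logged periodically. ThreadPoolSnapshot and Describe are hypothetical names; the ThreadPool and Process calls are the standard ones.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: summarizes thread-pool limits and OS thread count.
public static class ThreadPoolSnapshot
{
    public static string Describe()
    {
        int minWorker, minIo, maxWorker, maxIo, freeWorker, freeIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);

        // GetAvailableThreads reports headroom, so busy = max - available.
        int busyWorker = maxWorker - freeWorker;
        int busyIo = maxIo - freeIo;

        // Operating system threads in this process, managed or not.
        int osThreads;
        using (Process self = Process.GetCurrentProcess())
        {
            osThreads = self.Threads.Count;
        }

        return string.Format(
            "worker busy={0} min={1} max={2}; io busy={3} min={4} max={5}; os threads={6}",
            busyWorker, minWorker, maxWorker, busyIo, minIo, maxIo, osThreads);
    }
}
```

If the busy counts stay at zero while the OS thread count keeps moving, the extra threads are probably not thread-pool workers running your code.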
Update:
I think part of the problem is confusion over "recognized threads" and "physical threads" and "unused threads".
Recognized threads are documented as (emphasis mine)
These threads are associated with a corresponding managed thread object. The runtime does not create these threads, but they have run inside the runtime at least once.
Physical threads are documented as (emphasis mine)
native operating system threads created and owned by the common language runtime to act as underlying threads for managed thread objects
I'm guessing that the term "unused threads" by @JRoughan refers to "physical threads", those that aren't "recognized". Which doesn't really mean they're unused; they're just not in the recognized counter. As the documentation points out, "physical threads" are created by the runtime, and I don't believe you can tell from either of those counters whether a thread is "used" or "unused", depending on what @JRoughan means by "unused".
Things like this do not have a simple answer. You need to investigate either under a debugger or using ETW traces.
With ETW traces, you can get events for each thread creation/destruction, optionally with call stack.
CLR itself could create threads for itself (e.g. GC threads, background GC threads, multicore JIT thread), thread pool threads, IO threads, timer thread. There is another kind of thread: gate thread.
Normally you can tell usage from the symbolic name of thread proc once symbols are resolved.
For ETW analysis, use PerfView from Microsoft.
Is the application that you are testing in Performance Monitor a standalone .NET application or an application under IIS? If it is a standalone application, you probably added some extra lib/code for using Performance Monitor. That may create threads.
You can use Sysinternals' Process Explorer to watch threads in your process. You can see which method in which module started the threads.
We can only speculate of course. My own bet would be about in-process COM servers. Those, and their associated threads, may be created when you use classes that wrap COM interfaces, such as the ones for directory services or WMI for example. Since they're created by native code (even though it's wrapped within a dotnet code), they're not recognized as managed threads.
Recently I worked with an external dll library where I have no influence on it.
Under some special circumstances, a method of this third party dll is blocking and never returning.
I tried to work around this issue by executing this method in a new AppDomain. After a custom timeout, I wanted to Unload the AppDomain and kill all this crap ;)
Unfortunately, it does not work - as someone would expect.
After some time it throws CannotUnloadAppDomainException since the blocking method does not allow aborting the thread gracefully.
I depend on using this library and it does not seem that there will be an update soon.
So can I work around this issue, even if it's not best practice?
Any bad hack appreciated :)
An AppDomain cannot typically solve that problem; it's only good for throwing away the state of your program. The real issue is that your thread is stuck. In cases like these, calling Thread.Abort() is unlikely to work; it will just get stuck as well. A thread can only be aborted if it is in an "alertable wait state", blocking on a CLR synchronization object. Or executing managed code. In a state that the CLR knows how to safely clean up. Most 3rd party code falls over like this when executing unmanaged code, and there is no way to ever clean that up in a safe way. A decisive hint that this is the case is AppDomain.Unload failing to get the job done; it can only unload the AppDomain when it can abort the threads that are executing code in the domain.
The only good alternative is to run that code in a separate process. Which you can kill with Process.Kill(). Windows does the cleanup. You'd use a .NET interop mechanism to talk to that code. Like named pipes, sockets, remoting or WCF. Plus the considerable hassle of having to write the code that can detect the timeout, kill the process, start it back up and recover internal state, since you now restart with an uninitialized instance of that 3rd party code.
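A rough sketch of the timeout-and-kill part, assuming the third-party call has been moved into a small host executable of your own. HungCallGuard and RunWithTimeout are hypothetical names, and talking to the host (pipes, sockets, and so on) is left out.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs a child process and kills it if it hangs.
public static class HungCallGuard
{
    // Returns true if the child exited on its own within the timeout,
    // false if the timeout elapsed and the child was terminated.
    public static bool RunWithTimeout(string fileName, string arguments, int timeoutMs)
    {
        ProcessStartInfo info = new ProcessStartInfo(fileName, arguments);
        info.UseShellExecute = false;
        info.CreateNoWindow = true;

        using (Process child = Process.Start(info))
        {
            if (child.WaitForExit(timeoutMs))
            {
                return true;
            }

            try
            {
                child.Kill();
            }
            catch (InvalidOperationException)
            {
                // The process exited between the timeout and the Kill call.
            }

            // Wait until the operating system has actually torn it down.
            child.WaitForExit();
            return false;
        }
    }
}
```

A caller would typically retry or report an error when this returns false, keeping in mind that the restarted host begins with fresh state.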
Do not forget about the real fix. Create a small repro project that reproduces the problem. When it hangs, create a minidump of the process. Send both to the 3rd party support group.
After reading this (scroll down to the end, to Blocking Issues) I think your only solution is to run the method in a different process. This might involve quite a bit of refactoring and/or a 'host' project (e.g. a console application) that loads the method in question and makes it easy to call (e.g. reading args from the command line) when launching the new process using the Process class.
You can always use a BackgroundWorker; there is no need to create a new AppDomain. This will ensure that you have complete control over the execution of the thread.
However, there is no way to ensure that you can gracefully abort the thread. As the DLL is unmanaged, chances are that it may cause memory leaks. However, spawning a new thread will ensure that your application does not crash when the DLL does not respond.
When several threads are running the same piece of code, how does the CLR keep them from overstepping each other? Is it the AppDomain that manages these threads and defines boundaries between different threads even though they might be acting on the same code (and possibly data)? If so, how?
TIA
Simple; for method variables (excluding captured variables, iterator blocks, etc), the variables are on the stack. Each thread has a different stack. This is no different to a recursive method on a single thread - the method variables are separate and independent per call.
For objects on the heap... it doesn't! No boundaries; no protection. If you don't correctly synchronize etc, you will corrupt your data.
In short, this is your job.
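A small sketch of what doing that job can look like, using a hypothetical SharedCounter class. The loop variable lives on each thread's own stack and needs no protection, while the counter object lives on the heap and is guarded with lock.

```csharp
using System;
using System.Threading;

// Hypothetical example type: one heap object shared by several threads.
public class SharedCounter
{
    private readonly object _gate = new object();
    private int _value; // heap state, visible to every thread

    public int Value
    {
        get { lock (_gate) { return _value; } }
    }

    public void Increment()
    {
        // Without the lock, concurrent read-modify-write steps can be lost.
        lock (_gate) { _value++; }
    }

    public static int RunDemo(int threadCount, int incrementsPerThread)
    {
        SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                // 'n' is a method local: each thread has its own copy.
                for (int n = 0; n < incrementsPerThread; n++)
                {
                    counter.Increment();
                }
            });
            threads[i].Start();
        }

        foreach (Thread t in threads)
        {
            t.Join();
        }
        return counter.Value;
    }
}
```

With the lock in place, RunDemo(4, 10000) returns 40000 every time; removing the lock makes the total unpredictable.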
It is an operating system implementation detail. Windows maintains the processor context for each thread. That context contains a copy of the state of the processor registers. The really important ones for your question are EIP, the instruction pointer, and ESP, the stack pointer. The instruction pointer keeps track of the machine code instructions that are executed by the thread. The stack pointer keeps track of the activation frame of the currently executing method. Every thread has its own stack.
Since each thread has its own instruction pointer, they can each execute their own code, independent of other threads. Having their own stack ensures that threads cannot stomp on each other's local variables. Your machine has hundreds of threads running at the same time. They take turns executing code for a while on an available CPU core. It's the operating system's job to make that work: it saves the processor state in the thread context whenever the thread has been running for a while, or blocks, and it is time for another thread to get a turn. Resuming that thread simply involves copying the state back from the saved context to the processor. And it continues where it left off when it was interrupted.
Threading gets tricky once threads start to access memory that's shared by all threads. In a .NET program, that's anything that's stored on the garbage collected heap as well as any static variables. Having one thread that writes such memory and other threads reading the same memory needs to be orchestrated. The lock keyword is one of the primary ways to do this.
The relevance of an AppDomain is that each one has its own garbage collected heap and 'loader heap' (the place where static variable values are stored). Which prevents threads from stomping on each other completely. It is quite equivalent to a process, without the associated operating system cost of a process. Which is quite high on Windows. AppDomains are important on custom CLR hosts, like ASP.NET and SQL Server. They help isolating client requests so that, say, one web page request that bombs with an unhandled exception cannot also corrupt the state of all other requests.
So I'm trying to demonstrate to my uppers that the product contains a memory leak. However, it takes about 2 hours of running a script that touches a COM object to duplicate up to an OutOfMemoryException. In order to make this presentable, I'll need data for a baseline to show that it's not my script itself that's causing the memory problems, as well as the data to show that the behavior indeed duplicates a memory leak.
I plan to do this via a periodic report of total memory usage pooped out into a log file. For example, on this box my Windows Task Manager -> Performance tab shows that I'm currently using 1.67GB out of 2.00GB. That's the number I need to pull into my code and dump in a log file periodically.
Only one problem... how do I get that piece of information?
Thanks for any help you can provide, even if it's to tell me it's impossible :P.
UPDATE: Thanks for the info on COM's memory issues, but the "baseline" of which I spake also touches the COM object in effectively identical ways and doesn't cause memory issues on the order of magnitude that a specific behavior does. Only answers to the question I posed would be helpful to me here.
Update: In answer to the OP's question, the System.GC class has a method for getting an estimate of the amount of memory in use:
System.GC.GetTotalMemory(false)
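For a periodic log line, a sketch along these lines might do. MemoryLog and Snapshot are hypothetical names. Note that GC.GetTotalMemory only covers the managed heap, so the sketch also records the process-wide figures from System.Diagnostics.Process, which is where memory held by COM objects would show up. None of these is the machine-wide number Task Manager displays.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: builds one log line describing current memory use.
public static class MemoryLog
{
    public static string Snapshot()
    {
        // Estimate of managed heap usage; false = do not force a collection.
        long managedBytes = GC.GetTotalMemory(false);

        long privateBytes;
        long workingSetBytes;
        using (Process self = Process.GetCurrentProcess())
        {
            // Process-wide figures, including unmanaged allocations.
            privateBytes = self.PrivateMemorySize64;
            workingSetBytes = self.WorkingSet64;
        }

        return string.Format(
            "{0:o} managed={1} private={2} workingSet={3}",
            DateTime.UtcNow, managedBytes, privateBytes, workingSetBytes);
    }
}
```

Appending the result to a file every few seconds, for both the baseline script and the leaking one, gives two series that can be compared directly.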
If you are using COM on a long-running process (i.e. no idle time) then you will experience a memory leak unless you periodically call:
Thread.CurrentThread.Join(100);
The 100 can of course be changed, but will be how long your active thread "sleeps" before resuming. From the docs:
Blocks the calling thread until a thread terminates or the specified time elapses, while continuing to perform standard COM and SendMessage pumping.
It is that last clause that is key.
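As an illustration only, a long-running loop could yield to the pump every so many iterations. ComWorkLoop and its parameters are hypothetical, and whether this cures a given leak depends on the apartment model and the COM components involved.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs work in a loop and pauses periodically so that
// an STA thread gets a chance to pump COM messages.
public static class ComWorkLoop
{
    // Returns how many times the loop paused to pump.
    public static int Run(int iterations, int pumpEvery, int pumpMs, Action step)
    {
        int pumps = 0;
        for (int i = 1; i <= iterations; i++)
        {
            step();

            if (i % pumpEvery == 0)
            {
                // A thread cannot finish while waiting on itself, so this
                // simply waits up to pumpMs; on an STA thread the wait pumps.
                Thread.CurrentThread.Join(pumpMs);
                pumps++;
            }
        }
        return pumps;
    }
}
```

The step delegate would contain the calls into the COM object; pumpEvery and pumpMs trade throughput against how promptly releases and finalizers get to run.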
Reference: http://support.microsoft.com/kb/828988
If a console application that is based on a single-threaded apartment (STA) creates and then uses STA Component Object Model (COM) components and the console application does not perform sufficient operations to pump COM messages, such as calling the Monitor.Enter method, the Thread.Join method, and others, the following symptoms may occur. Also, if the console application performs operations that run for a long time and that do not pump messages, such as calling the Console.ReadLine method, the following symptoms may occur:
The release of COM components may be delayed.
The calls to the Finalize methods of the objects that the garbage collector collects may be delayed.
Calls to COM components may block the application thread for extended periods.
The memory amount that the STA application process uses may increase over time.
Calls to the GC.WaitForPendingFinalizers method may take a long time to return.