I would like to compile a list of all possible conditions making Monitor go to kernel-mode or use a kernel sync object.
The sync block has a field to reference the kernel object. Hence I deduced that lock will go to kernel-mode sometimes.
I found this: Lock (Monitor) internal implementation in .NET
But it asks too many questions at once, and the only useful information is that the OP answered his own question by simply stating that lock will go to kernel-mode sometimes. There also aren't any links to support that answer.
When exactly will lock go to kernel-mode (not if and not why - when)?
I am more interested to hear about .NET 4 and 4.5 if there is any difference with older versions.
From the Richter book: "A sync block contains fields for a kernel object, the owning thread’s ID, a recursion count, and a waiting threads count."
Most questions of this kind can be answered by looking at the CLR source code, available through the SSCLI20 distribution. It is getting pretty dated by now, being of .NET 2.0 vintage, but a lot of the core CLR features haven't changed much.
The source code file you want to look at is clr/src/vm/syncblk.cpp. Three classes play a role here. AwareLock is the low-level lock implementation that takes care of acquiring the lock, SyncBlock is the class that implements the queue of threads that are waiting to enter a lock, and CLREvent is the wrapper for the operating system synchronization object, the one you are asking about.
This is C++ code and the level of abstraction is quite high. This code heavily interacts with the garbage collector and there's a lot of testing code included. So I'll give a brief description of the process.
SyncBlock has the m_Monitor member that stores the AwareLock instance. SyncBlock::Enter() directly calls AwareLock::Enter(), which first tries to acquire the lock as cheaply as possible. It starts by checking whether the thread already owns the lock, and just increments the lock count if that's the case. Next it uses FastInterlockCompareExchange(), an internal function that's very similar to Interlocked.CompareExchange(). If the lock is not contended then this succeeds very quickly and Monitor.Enter() returns. If not, then another thread already owns the lock, and AwareLock::EnterEpilog is used. At that point the operating system's thread scheduler needs to get involved, so a CLREvent is used. It is dynamically created if necessary and its WaitOne() method is called, which involves a kernel transition.
So there is enough to answer your question: the Monitor class enters kernel mode when the lock is contended and the thread has to wait.
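To make the sequence above concrete, here is a minimal sketch of the same idea in C#. It is not the CLR's AwareLock: the class name HybridLockSketch, its members and the KernelWaits counter are hypothetical, and recursion (the owner check) is left out. It only shows the shape of the logic: one interlocked operation on the fast path, and a lazily created kernel event that is touched only under contention.

```csharp
using System;
using System.Threading;

// Hypothetical, non-recursive sketch. Not the CLR's actual implementation.
public sealed class HybridLockSketch
{
    private int _holdersAndWaiters;   // 0 = free, 1 = held, >1 = held with waiters
    private AutoResetEvent _event;    // kernel object, created only under contention
    private int _kernelWaits;         // how many times a thread chose to block

    public int KernelWaits { get { return Volatile.Read(ref _kernelWaits); } }

    public void Enter()
    {
        // Fast path: a single interlocked instruction, no kernel transition.
        if (Interlocked.Increment(ref _holdersAndWaiters) == 1) return;

        // Slow path: another thread owns the lock, so block on the kernel event.
        Interlocked.Increment(ref _kernelWaits);
        GetOrCreateEvent().WaitOne();
    }

    public void Leave()
    {
        // Nobody is waiting: stay in user mode.
        if (Interlocked.Decrement(ref _holdersAndWaiters) == 0) return;

        // Wake exactly one waiter; that thread now owns the lock.
        GetOrCreateEvent().Set();
    }

    private AutoResetEvent GetOrCreateEvent()
    {
        AutoResetEvent existing = Volatile.Read(ref _event);
        if (existing != null) return existing;

        var created = new AutoResetEvent(false);
        existing = Interlocked.CompareExchange(ref _event, created, null);
        if (existing == null) return created;

        created.Dispose();            // another thread won the creation race
        return existing;
    }
}
```

With this sketch, an uncontended Enter()/Leave() pair never creates the event at all; only a thread that finds the lock already held ends up in WaitOne().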
When the lock is heavily contended.
If the lock is lightly contended, there is a quick CPU spin-wait for the lock to become free again. If the lock is still not free when the spin ends, the thread blocks on a kernel synchronization object, which involves a kernel-mode call to suspend the thread and do the related bookkeeping.
Additional intelligence may exist around the spin-wait step, such as skipping it on single-core machines, since there the contended lock can only be released after the waiting thread gives up the processor.
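As a rough illustration of the spin phase (again a sketch, not what the CLR actually does), the helper below tries a bounded number of compare-and-swap attempts before giving up. The names SpinThenBlock and TrySpinAcquire, the int flag convention (0 = free, 1 = held) and the maxSpins parameter are all assumptions made for the example. A caller that gets false back would fall through to a blocking wait.

```csharp
using System;
using System.Threading;

// Hypothetical helper: spin briefly for a lock flag (0 = free, 1 = held).
public static class SpinThenBlock
{
    public static bool TrySpinAcquire(ref int flag, int maxSpins)
    {
        // On a single core the owner cannot run while we spin, so try only once.
        int attempts = Environment.ProcessorCount == 1 ? 1 : maxSpins;

        var spinner = new SpinWait();
        for (int i = 0; i < attempts; i++)
        {
            // User-mode attempt to flip the flag from free to held.
            if (Interlocked.CompareExchange(ref flag, 1, 0) == 0) return true;
            spinner.SpinOnce();
        }

        // Still held: the caller should now block on a kernel object instead.
        return false;
    }
}
```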
Related
I am wondering if the Mutex object busy waits or does it context switch out (i.e. does the thread owning the mutex go to sleep and get woken up later by an interrupt), or is it architecture dependent (i.e. number of cores your machine has)? I am hoping that it actually does a context switch out. Thank you in advance.
As per the documentation, Mutex.WaitOne "Blocks the current thread until the current WaitHandle receives a signal", which means it's put to sleep.
Internally, WaitHandle.WaitOne will call WaitForSingleObjectEx from the Windows API, which:
Waits until the specified object is in the signaled state, an I/O completion routine or asynchronous procedure call (APC) is queued to the thread, or the time-out interval elapses.
Also, according to another document on Context Switches
When a running thread needs to wait, it relinquishes the remainder of its time slice.
A good answer to draw from is here: When should one use a spinlock instead of a mutex?
The answer is that it depends. The Mutex class in .Net is typically backed by the operating system, since it is a lock that can be shared between multiple processes; it is not intended to be used only within a single process.
This means that we're at the mercy of our operating system's implementation. Most modern OSes, including Windows, implement adaptive mutexes for multi-core machines.
Drawing from the above answer, we learn that implementing locking by suspending the thread is often very expensive, since it requires at least 2 context switches. On a multi-core system, we can avoid some context switches by attempting to spin-wait initially to acquire the lock - if the lock is lightly contended, you'll likely acquire the lock in the spinwait, and thus never suffer the penalty of a context switch/thread suspension. In the case that the timeout expires while the spinwait is occurring, the lock will downgrade to full thread suspension to keep from wasting too much cpu.
None of this makes sense on a single-core machine, since you'd just be burning CPU while the holder of the lock is waiting to run to finish the work it needs to do in order to release the lock. Adaptive locks are not used on single-core machines.
So, to directly answer your question - it is likely that the Mutex class does both - it'll busy-wait (spin-wait) for a short while to see if it can acquire the mutex without performing a context switch, and if it can't do so in the short amount of time it allows itself, it'll suspend the thread. It's important to note that the amount of time it'll spinwait for is usually very short, and that overall, this strategy can significantly reduce total CPU usage or increase overall lock throughput. So, even though we're burning CPU spin-waiting, we'll probably save more CPU overall.
In the context of .Net, the Mutex class provides mutual exclusion, but is meant to be used between multiple processes, and thus tends to be quite slow. Specifically, in the Microsoft .Net Framework the Mutex class uses the Win32 Mutex object.
Do note that the details may change depending on which implementation of .Net you're using, and on which operating system. I've tried to provide a .Net/Microsoft/Windows-centric treatment of the topic since that is the most common circumstance.
As an aside, if you only need locking within a single process, the Monitor class, or its keyword lock, should be used instead. A similar dichotomy exists for semaphores - the Semaphore class is, in the end, implemented by the operating system - it can be used for inter-process communication and thus tends to be slow; the SemaphoreSlim class is implemented natively in .Net, can be used only within a single process, and tends to be faster. On this point, a good msdn article to read is Overview of Synchronization Primitives.
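To put the two options side by side, here is a small sketch. The class name LockChoices, the method names and the 5 second timeout are made up for the example, and any mutex name passed in is just an illustrative string. The first method uses lock for in-process exclusion; the second wraps a named Mutex, which other processes can open by the same name.

```csharp
using System;
using System.Threading;

// Hypothetical sketch comparing in-process and cross-process exclusion.
public static class LockChoices
{
    private static readonly object Gate = new object();
    private static int _counter;

    // In-process only: lock compiles down to Monitor.Enter/Monitor.Exit.
    public static int IncrementInProcess()
    {
        lock (Gate)
        {
            return ++_counter;
        }
    }

    // Cross-process: every process that opens the same name shares the mutex.
    public static bool RunExclusiveAcrossProcesses(string mutexName, Action work)
    {
        using (var mutex = new Mutex(false, mutexName))
        {
            bool acquired;
            try
            {
                acquired = mutex.WaitOne(TimeSpan.FromSeconds(5));
            }
            catch (AbandonedMutexException)
            {
                // A previous owner exited without releasing; we own the mutex now.
                acquired = true;
            }

            if (!acquired) return false;

            try
            {
                work();
                return true;
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```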
I have a thread reading from a specific PLC's memory and it works perfectly. Now what I want is to start another thread to test the behavior of the system (simulate the first thread) in case of a connectivity issue, and when everything is OK, continue the first thread. But I think I'll have problems with that because these two threads will need to use the same port.
My first idea was to abort the first thread, start the second one, and when everything's OK again, abort this thread and 'restart' the first one.
I've read some other forums and people say that aborting or suspending a thread is the worst solution, and I've read about synchronization of threads but I don't really know if this is useful in this case because I've never used it.
My question is, what is the correct way to solve this kind of situation?
You have a shared resource that you need to coordinate thread access to. There are a number of mechanisms in .NET available for that coordination.
There is a wonderful resource that provides both an introduction to thread concepts in .NET, and discusses advanced concepts in an approachable manner:
http://www.albahari.com/threading/
In your case, have a look at the section on locking
Exclusive locking is used to ensure that only one thread can enter particular sections of code at a time. The two main exclusive locking constructs are lock and Mutex. Of the two, the lock construct is faster and more convenient. Mutex, though, has a niche in that its lock can span applications in different processes on the computer.
http://www.albahari.com/threading/part2.aspx#_Locking
You can structure your two threads so that they must acquire a specific lock to work with the port. Have your first thread release that lock before you start the second thread, then have the first thread wait to acquire that lock again (which the second thread will hold until done).
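A minimal sketch of that arrangement, assuming a hypothetical SharedPort class that stands in for your port access (the PLC read itself is only a comment, and the 10 ms pause is arbitrary). The reader takes the lock for each read cycle, so the simulation can claim the port between cycles and hold it until it is done. Nothing gets aborted; the reader simply waits on the lock and carries on afterwards.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: both threads must hold _portLock to touch the port.
public sealed class SharedPort
{
    private readonly object _portLock = new object();
    private volatile bool _stop;
    private int _reads;

    // Only meaningful while holding the lock or after the reader has stopped.
    public int Reads { get { return _reads; } }

    // Body of the first thread: one lock acquisition per read cycle.
    public void ReaderLoop()
    {
        while (!_stop)
        {
            lock (_portLock)
            {
                // The real PLC read would go here.
                _reads++;
            }
            Thread.Sleep(10);
        }
    }

    // Called from the second thread: holds the port for the whole simulation.
    public void RunSimulation(Action simulate)
    {
        lock (_portLock)
        {
            simulate();
        }
    }

    // Cooperative shutdown instead of Thread.Abort.
    public void Stop()
    {
        _stop = true;
    }
}
```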
I have a .NET application which I would expect to have 5 long-running threads operating including the main thread. I can see that indeed 4 threads are newed up across the codebase, and I believe there is no direct (e.g. work item queuing / tasks) or indirect (e.g. Timers) usage of the ThreadPool anywhere. At least none I can find.
Running the app under Performance Monitor shows that the number of recognized threads stays constant at 5 (as I would expect) but the number of physical threads fluctuates between 70 and 120 over the course of about an hour!
Does anyone know why there are so many unused (as far as I can tell) physical threads? And why this number fluctuates?
I can't find any documentation that would explain this behavior so my best guess is that the ThreadPool balances itself to accommodate changing environmental factors such as free memory and resource contention but the numbers here seem excessive.
Update
A senior support engineer at Microsoft confirmed that the physical thread counter in use definitely only reports threads for the current process, despite the odd wording in MSDN. If an answer suggests this is not the case it will need to point to a definitive source.
Both ThreadPools and the GC create threads. There is a normal (or "worker") thread pool and an IO threadpool. The normal threadpool will allocate new threads as it sees fit to keep the threadpool responsive. It should create one thread per CPU right away, and probably one thread per second after that up to the minimum # of threads. See ThreadPool.GetMinThreads for the minimum number of worker threads the worker thread pool will create. See ThreadPool.GetAvailableThreads for the number of additional worker threads that could still be started, i.e. the maximum minus the currently active ones. If you have long-running work occupying worker thread-pool threads, the pool sees those threads as in use and allocates others to service future requests.
There is also a maximum # of threads in the pool, so as threads recycle back to the pool the pool may kill some off to get back down to a # it decides is best.
There is also a finalizer thread.
There are likely others that are undocumented or are a result of a library you're using.
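If you want to see the pool's limits for your own process, something like the following works. The PoolInfo class and its method name are just for illustration; the three ThreadPool calls are the real API. Note that GetAvailableThreads reports the maximum minus the threads currently active, so the active count has to be derived from it.

```csharp
using System;
using System.Threading;

// Hypothetical helper around the real ThreadPool query methods.
public static class PoolInfo
{
    public static void GetWorkerCounts(out int min, out int max, out int active)
    {
        int minIo, maxIo, availableWorker, availableIo;

        ThreadPool.GetMinThreads(out min, out minIo);
        ThreadPool.GetMaxThreads(out max, out maxIo);
        ThreadPool.GetAvailableThreads(out availableWorker, out availableIo);

        // Available = max - active, so active = max - available.
        active = max - availableWorker;

        Console.WriteLine("worker threads: min={0}, max={1}, active={2}", min, max, active);
        Console.WriteLine("IO threads:     min={0}, max={1}, active={2}",
            minIo, maxIo, maxIo - availableIo);
    }
}
```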
Update:
I think part of the problem is confusion over "recognized threads" and "physical threads" and "unused threads".
Recognized threads are documented as (emphasis mine)
These threads are associated with a corresponding managed thread object. The runtime does not create these threads, but they have run inside the runtime at least once.
Physical threads are documented as (emphasis mine)
native operating system threads created and owned by the common language runtime to act as underlying threads for managed thread objects
I'm guessing that the term "unused threads" by @JRoughan refers to "physical threads", meaning those that aren't "recognized". That doesn't really mean they're unused; they're just not in the recognized counter. As the documentation points out, "physical threads" are created by the runtime, and I don't believe you can tell from either of those counters whether a thread is "used" or "unused", depending on what @JRoughan means by "unused".
Things like this do not have a simple answer. You need to investigate either under a debugger or using ETW traces.
With ETW traces, you can get events for each thread creation/destruction, optionally with call stack.
The CLR can create threads for its own use (e.g. GC threads, background GC threads, the multicore JIT thread), as well as thread pool threads, IO threads and a timer thread. There is another kind of thread too: the gate thread.
Normally you can tell usage from the symbolic name of thread proc once symbols are resolved.
For ETW analysis, use PerfView from Microsoft.
Is the application that you are testing in Performance Monitor a standalone .NET application or an application running under IIS? If it is a standalone application, you have probably added some extra library or code in order to use Performance Monitor, and that may create threads.
You can use Sysinternals' Process Explorer to watch threads in your process. You can see which method in which module started the threads.
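If you'd rather get a quick count from inside the process, System.Diagnostics can enumerate the operating system threads as well. This is only a sketch (ThreadLister is a made-up name), and unlike Process Explorer it won't tell you which module started each thread, but it shows how many OS threads exist next to the handful of managed ones you created yourself.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: list the OS threads of the current process.
public static class ThreadLister
{
    public static int PrintOsThreads()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            // Snapshot of all native threads, managed or not.
            ProcessThreadCollection threads = current.Threads;
            foreach (ProcessThread thread in threads)
            {
                Console.WriteLine("OS thread id {0}", thread.Id);
            }
            return threads.Count;
        }
    }
}
```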
We can only speculate of course. My own bet would be on in-process COM servers. Those, and their associated threads, may be created when you use classes that wrap COM interfaces, such as the ones for directory services or WMI. Since they're created by native code (even though it's wrapped in .NET code), they're not recognized as managed threads.
To master a technology you have to know how it works one abstraction level lower. In the case of multithreaded programming, it is good to know about synchronization primitives.
So here is the question: how is lock (Monitor) implemented in .NET?
I'm interested in these points:
- does it utilize OS objects?;
- does it require user mode or kernel mode?;
- what is the overhead for threads that are waiting for the lock?;
- in what cases could the queue of threads awaiting the lock be violated?
Updated:
"If more than one thread contends the lock, they are queued on a “ready queue” and granted the lock on a first-come, first-served basis. Note: Nuances in the behavior of Windows and the CLR mean that the fairness of the queue can sometimes be violated." [C# 4.0 in a Nutshell, Joseph Albahari] So this is what I'm asking about in last question concerning 'violated queue'.
The Wikipedia article has a pretty good description of what a "Monitor" is, as well as its underlying technology, the Condition Variable.
Note that the .NET Monitor is a correct implementation of a condition variable; most published Win32 implementations of CVs are incorrect, even ones found in normally reputable sources such as Dr. Dobbs. This is because a CV cannot easily be built from the existing Win32 synchronization primitives.
Instead of just building a shallow (and incorrect) wrapper over the Win32 primitives, the .NET CV implementation takes advantage of the fact that it's on the .NET platform, implementing its own waiting queues, etc.
After some investigation I've found answers to my questions. In general CodeInChaos and Henk Holterman were right, but here are some details.
When a thread starts to contend for a lock with other threads, it first runs a spin-wait loop for a while, trying to obtain the lock. All of this happens in user mode. If that doesn't succeed, an OS kernel Event object is created, the thread switches to kernel mode and waits for a signal from this Event.
So the answers to my questions are:
1. In the best case no, but in the worst case yes (the Event object is created lazily if required);
2. In general it works in user mode, but if threads compete for a lock for too long, the thread can be switched to kernel mode (via an unmanaged Win API function call);
3. The overhead of a switch from user mode to kernel mode (~1000 CPU cycles);
4. Microsoft claims that it is a "fair" algorithm like FIFO, but it doesn't guarantee this. (E.g. if a thread in the waiting queue is suspended, it moves to the end of the queue when it is resumed.)
I have an application that uses a Mutex for cross-process synchronization of a block of code. This mechanism works great for the application's current needs. In the worst case I have noticed that about 6 threads can back up on the Mutex. It takes about 2-3 seconds to execute the synchronized code block.
I just received a new requirement that asks for a priority feature on the Mutex, such that occasionally some requests for the Mutex can be deemed more important than the rest. When one of these higher priority threads comes in, the desired behavior is for the Mutex to grant acquisition to the higher priority request instead of the lower.
So is there any way to control the blocked Mutex queue that Windows maintains? Should I consider using a different threading model?
Thanks,
Matt
Using just the Mutex, this will be a tough one to solve. I am sure someone out there is thinking about thread priorities etc., but I would probably not consider that route.
One option would be to maintain a shared memory structure and implement a simple priority queue. The shared memory can use a MemoryMappedFile. When a process wants to execute the section of code, it puts a token with a priority on the priority queue. Then, when it wakes up, each thread inspects the priority queue to check the first token in the queue; if the token belongs to the process, it can dequeue the token and execute the code.
Mutex isn't that great for a number of reasons, and as far as I know, there is no way to promote one thread over another while they are waiting, nor a nice way to accommodate your requirement.
I just read Jeffrey Richter's "CLR via C# 3", and there are a load of great thread sync constructs in there, and lots of good threading advice generally.
I wish I could remember enough of it to answer your question, but I doubt I would get it across as well as he can. Check out his website: http://www.wintellect.com/ or search for some of his Concurrent Affairs articles.
They will definitely help.
Give each thread an AutoResetEvent. Then instead of waiting on a mutex, each thread adds its ARE to a sorted list. If there is only one ARE on the list, fire the event, else wait for its ARE to fire. When a thread finishes processing, it removes its ARE from the list and fires the next one. Be sure to synchronize the list.
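Here is a rough in-process sketch of that idea. It is a variation rather than a literal transcription: a busy flag lets the first caller through without touching an event, and waiters are kept in a list sorted by priority (higher number first, FIFO within the same priority). PriorityGate, Enter, Exit and WaiterCount are hypothetical names. It only coordinates threads inside one process; doing the same across processes would need named events and shared state, which this sketch does not attempt.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: a lock that wakes the highest-priority waiter first.
public sealed class PriorityGate
{
    private sealed class Waiter
    {
        public int Priority;
        public readonly AutoResetEvent Signal = new AutoResetEvent(false);
    }

    private readonly object _sync = new object();           // guards the list
    private readonly List<Waiter> _waiters = new List<Waiter>();
    private bool _busy;

    public int WaiterCount
    {
        get { lock (_sync) { return _waiters.Count; } }
    }

    public void Enter(int priority)
    {
        Waiter me;
        lock (_sync)
        {
            if (!_busy) { _busy = true; return; }            // gate was free

            // Insert behind all waiters of equal or higher priority.
            me = new Waiter { Priority = priority };
            int index = 0;
            while (index < _waiters.Count && _waiters[index].Priority >= priority) index++;
            _waiters.Insert(index, me);
        }

        me.Signal.WaitOne();                                 // wait for our turn
        me.Signal.Dispose();
    }

    public void Exit()
    {
        lock (_sync)
        {
            if (_waiters.Count == 0) { _busy = false; return; }

            // Hand ownership directly to the head of the sorted list.
            Waiter next = _waiters[0];
            _waiters.RemoveAt(0);
            next.Signal.Set();
        }
    }
}
```

Each thread calls Enter(priority) before the protected block and Exit() afterwards, ideally in a try/finally.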