Internal working of Monitor.Enter() method [duplicate] - c#

To master a technology you have to know how it works one abstraction level lower. In the case of multithreaded programming, that means knowing about synchronization primitives.
So here is the question: how is the lock (Monitor) implemented in .NET?
I'm interested in the following points:
- does it use OS objects?
- does it require user mode or kernel mode?
- what is the overhead for threads that are waiting for the lock?
- in what cases can the queue of threads awaiting the lock be violated?
Updated:
"If more than one thread contends the lock, they are queued on a “ready queue” and granted the lock on a first-come, first-served basis. Note: Nuances in the behavior of Windows and the CLR mean that the fairness of the queue can sometimes be violated." [C# 4.0 in a Nutshell, Joseph Albahari] So this is what I'm asking about in last question concerning 'violated queue'.

The Wikipedia article has a pretty good description of what a "Monitor" is, as well as its underlying technology, the Condition Variable.
Note that the .NET Monitor is a correct implementation of a condition variable; most published Win32 implementations of CVs are incorrect, even ones found in normally reputable sources such as Dr. Dobb's. This is because a CV cannot easily be built from the existing Win32 synchronization primitives.
Instead of just building a shallow (and incorrect) wrapper over the Win32 primitives, the .NET CV implementation takes advantage of the fact that it's on the .NET platform, implementing its own waiting queues, etc.

After some investigation I've found the answers to my questions. In general CodeInChaos and Henk Holterman were right, but here are some details.
When a thread starts to contend for a lock with other threads, it first runs a spin-wait loop for a while, trying to obtain the lock. All of these actions are performed in user mode. If that brings no success, an OS kernel Event object is created, and the thread is switched to kernel mode, where it waits for a signal from this Event.
So the answers to my questions are:
1. In the best case no, but in the worst case yes (the Event object is created lazily, only when required);
2. In general it works in user mode, but if threads compete for a lock for too long, a thread can be switched to kernel mode (via an unmanaged Win API function call);
3. The overhead is the switch from user mode to kernel mode (~1000 CPU cycles);
4. Microsoft claims the algorithm is "fair", like FIFO, but doesn't guarantee this. (E.g. if a thread from the waiting queue is suspended, it moves to the end of the queue when it is resumed.)
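To make the described algorithm concrete, here is a minimal C# sketch of the spin-then-block pattern. This is not the CLR's actual Monitor code (which is native); the spin count and the AutoResetEvent fallback are illustrative assumptions only.

    using System;
    using System.Threading;

    // Simplified illustration of a spin-then-block lock, loosely modeled
    // on the behavior described above. NOT the real Monitor implementation.
    class SpinThenBlockLock
    {
        private int _state;            // 0 = free, 1 = held
        private AutoResetEvent _event; // kernel object, created lazily

        public void Enter()
        {
            // Phase 1: user-mode spin-wait, cheap under light contention.
            for (int i = 0; i < 1000; i++)
            {
                if (Interlocked.CompareExchange(ref _state, 1, 0) == 0)
                    return;            // acquired without touching the kernel
                Thread.SpinWait(20);
            }

            // Phase 2: lazily create a kernel event and block on it.
            LazyInitializer.EnsureInitialized(ref _event,
                () => new AutoResetEvent(false));
            while (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
                _event.WaitOne();      // kernel-mode wait until signaled
        }

        public void Exit()
        {
            Interlocked.Exchange(ref _state, 0);
            _event?.Set();             // wake one waiter, if any
        }
    }

Note how the kernel Event only comes into existence once spinning fails, which matches point 1 above.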

Related

What is a multithreading program and how does it work?

What is a multithreading program and how does it work exactly? I read some documents but I'm confused. I know that code is executed line by line, but I can't understand how the program manages this.
A simple answer would be appreciated; a C# example, please.
What is a multi-threading program and how does it work exactly?
The interesting part about this question is that complete books have been written on the topic, yet it still remains elusive to a lot of people. I will try to explain it in the order detailed below.
Please note this is just to provide the gist; an answer like this can never do justice to the depth and detail required. As for videos, the best I have come across are part of paid subscriptions (Wintellect and Pluralsight); check whether you can watch them on a trial basis if you don't already have a subscription:
Wintellect by Jeffrey Richter (from his book CLR via C#, which has a matching chapter on Thread Fundamentals)
CLR Threading by Mike Woodring
Explanation Order
What is a thread?
Why were threads introduced, and what is their main purpose?
Pitfalls, and how to avoid them using synchronization constructs
Thread vs ThreadPool
Evolution of multithreaded programming APIs, like the Parallel API and Task API
Concurrent collections and their usage
Async-await: thread but no thread, and why it's best for IO
What is a thread?
A thread is a software implementation, a core Windows OS concept (multi-threaded architecture), and the bare minimum unit of work. Every process on Windows has at least one thread, and every method call is executed on a thread. Each process can have multiple threads to do multiple things in parallel (provided there is hardware support).
Other Unix-based OSes historically used a multi-process architecture; on Windows, even the most complex pieces of software, like Oracle.exe, run as a single process with multiple threads for the various critical background operations.
Why were threads introduced, and what is their main purpose?
Contrary to the perception that concurrency was the main purpose, it was robustness that led to the introduction of threads. Imagine every process on Windows running on the same thread (as in the initial 16-bit versions): if one process crashed, that usually meant a system restart to recover. The use of threads for concurrent operations, since multiple threads can be created in each process, came into the picture later. In fact, threads are even necessary to utilize a processor with multiple cores to its full ability.
Pitfalls, and how to avoid them using synchronization constructs
More threads means more work completed concurrently, but issues arise when the same memory is accessed, especially for writes, as that's when it can lead to:
Memory corruption
Race conditions
Another issue is that a thread is a very costly resource: each thread has a thread environment block and kernel memory allocations. Also, scheduling each thread on a processor core costs time in context switching. It is quite possible that misuse causes a huge performance penalty instead of an improvement.
To avoid thread-related corruption issues, it's important to use synchronization constructs like lock, mutex, or semaphore, based on the requirement. Reads are thread safe (as long as nothing is concurrently writing), but writes need appropriate synchronization.
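As a minimal illustration of the write hazard just described, and of the lock construct that prevents it (the Counter class here is purely an example):

    using System.Threading.Tasks;

    class Counter
    {
        private readonly object _sync = new object();
        private int _count;

        public void Increment()
        {
            // _count++ is a read-modify-write; without the lock, two threads
            // can interleave and lose updates. The lock serializes the writes.
            lock (_sync)
            {
                _count++;
            }
        }

        public int Count => _count;
    }

Running, say, Parallel.For(0, 100000, _ => counter.Increment()) reliably yields 100000 with the lock in place, and typically less without it.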
Thread vs ThreadPool
The threads we use in C#/.NET are not raw threads; the Thread class is just a managed wrapper that invokes Win32 threads. The challenge remains the user's ability to grossly misuse them, like invoking many more threads than required or assigning processor affinity. So isn't it better to hand a work item to a standard pool, and let Windows decide when a new thread is required and when an already existing thread can take the work item? A thread is a costly resource whose usage needs to be optimized, else it can be a bane, not a boon. A sketch follows below.
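A small sketch of handing work items to the pool instead of creating threads by hand (the three work items are arbitrary examples):

    using System;
    using System.Threading;

    class Program
    {
        static void Main()
        {
            using var done = new CountdownEvent(3);

            for (int i = 0; i < 3; i++)
            {
                int id = i; // capture a fresh copy per iteration
                // The pool decides whether to reuse an existing thread
                // or spin up a new one for this work item.
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Console.WriteLine($"Work item {id} on pool thread " +
                                      $"{Thread.CurrentThread.ManagedThreadId}");
                    done.Signal();
                });
            }

            done.Wait(); // keep Main alive until all work items finish
        }
    }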
Evolution of multithreaded programming APIs, like the Parallel API and Task API
From .NET 4.0 onward, a variety of new APIs, such as Parallel.For and Parallel.ForEach for data parallelization and the Task API for task parallelization, have made it very simple to introduce concurrency into a system. These APIs again use a thread pool internally. A Task is more like scheduling a piece of work for some time in the future. Introducing concurrency is now a breeze, though synchronization constructs are still required to avoid memory corruption and race conditions, or thread-safe collections can be used.
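For instance, a sketch of data parallelism with Parallel.For and a one-off background computation with Task.Run (the numbers are arbitrary):

    using System;
    using System.Threading.Tasks;

    class Program
    {
        static void Main()
        {
            // Data parallelism: the loop body runs on pool threads across cores.
            long[] squares = new long[1000];
            Parallel.For(0, squares.Length, i => squares[i] = (long)i * i);

            // Task parallelism: schedule a unit of work for the future.
            Task<long> sum = Task.Run(() =>
            {
                long total = 0;
                foreach (long s in squares) total += s;
                return total;
            });

            Console.WriteLine(sum.Result); // blocks until the task completes
        }
    }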
Concurrent collections and their usage
Implementations like ConcurrentBag, ConcurrentQueue, and ConcurrentDictionary, part of System.Collections.Concurrent, are inherently thread safe, use spin-waiting internally, and are much easier and quicker to use than explicit synchronization; they are also much easier to manage and work with. There's another set of APIs, like ImmutableList in System.Collections.Immutable (available via NuGet), which are thread safe by virtue of creating another copy of the data structure internally.
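A short example of a concurrent collection absorbing parallel writes without any explicit lock (the key distribution is arbitrary):

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class Program
    {
        static void Main()
        {
            var counts = new ConcurrentDictionary<int, int>();

            // Many pool threads update the same dictionary concurrently;
            // AddOrUpdate performs the read-modify-write atomically per key.
            Parallel.For(0, 10000, i =>
                counts.AddOrUpdate(i % 10, 1, (key, old) => old + 1));

            Console.WriteLine(counts[3]); // 1000
        }
    }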
Async-await: thread but no thread, and why it's best for IO
This is an important aspect of concurrency meant for IO calls (disk, network). The other APIs discussed so far are meant for compute-based concurrency, where threads are important and make things faster; but for IO calls a thread has no use except waiting for the call to return, since IO calls are processed on hardware-based queues (IO completion ports).
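A minimal sketch of an IO-bound call that holds no thread while it waits (the URL is a placeholder):

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main()
        {
            using var client = new HttpClient();

            // While the request is in flight no thread is blocked waiting;
            // the continuation is scheduled when the IO completes.
            string body = await client.GetStringAsync("https://example.com");

            Console.WriteLine($"Received {body.Length} characters");
        }
    }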
A simple analogy might be found in the kitchen.
You've probably cooked using a recipe before -- start with the specified ingredients, follow the steps indicated in the recipe, and at the end you (hopefully) have a delicious dish ready to eat. If you do that, then you have executed a traditional (non-multithreaded) program.
But what if you have to cook a full meal, which includes a number of different dishes? The simple way to do it would be to start with the first recipe, do everything the recipe says, and when it's done, put the finished dish (and the first recipe) aside, then start on the second recipe, do everything it says, put the second dish (and second recipe) aside, and so on until you've gone through all of the recipes one after another. That will work, but you might end up spending 10 hours in the kitchen, and of course by the time the last dish is ready to eat, the first dish might be cold and unappetizing.
So instead you'd probably do what most chefs do, which is to start working on several recipes at the same time. For example, you might put the roast in the oven for 45 minutes, but instead of sitting in front of the oven waiting 45 minutes for the roast to cook, you'd spend the 45 minutes chopping the vegetables. When the oven timer rings, you put down your vegetable knife, pull the cooked roast out of the oven and let it cool, then go back to chopping vegetables, and so on. If you can do that, then you are successfully multitasking several recipes/programs. That is, you aren't literally working on multiple recipes at once (you still have only two hands!), but you are jumping back and forth from following one recipe to following another whenever necessary, and thereby making progress on several tasks rather than twiddling your thumbs a lot. Do this well and you can have the whole meal ready to eat in a much shorter amount of time, and everything will be hot and fresh at about the same time too. If you do this, you are executing a simple multithreaded program.
Then if you wanted to get really fancy, you might hire a few other chefs to work in the kitchen at the same time as you, so that you can get even more food prepared in a given amount of time. If you do this, your team is doing multiprocessing, with each chef taking one part of the total work and all of them working simultaneously. Note that each chef may well be working on multiple recipes (i.e. multitasking) as described in the previous paragraph.
As for how a computer does this sort of thing (no more analogies about chefs), it usually implements it using a list of ready-to-run threads and a timer. When the timer goes off (or when the currently executing thread has nothing to do for a while, because e.g. it is waiting to load data from a slow hard drive), the operating system performs a context switch, in which it pauses the current thread (putting it into a list somewhere and no longer executing instructions from that thread's code), then pulls another ready-to-run thread from the list and starts executing instructions from that thread's code instead. This repeats for as long as necessary, often with context switches happening every few milliseconds, giving the illusion that multiple programs are running "at the same time" even on a single-core CPU. (On a multi-core CPU it does the same thing on each core, and in that case it's no longer just an illusion; multiple programs really are running at the same time.)
Why don't you refer to Microsoft's own documentation of the .NET class System.Threading.Thread?
It has a handful of simple example programs written in C# (at the bottom of the page), just as you asked for:
Thread Examples
Actually, multithreading means doing multiple pieces of work at the same time, so work can complete in parallel. You can take a task off your main thread, execute it some other way, and be done.

Does Mutex in C# busy wait?

I am wondering whether the Mutex object busy-waits or context-switches out (i.e. does the waiting thread go to sleep and get woken up later by an interrupt), or whether it is architecture dependent (e.g. on the number of cores your machine has)? I am hoping that it actually context-switches out. Thank you in advance.
As per the documentation, Mutex.WaitOne "Blocks the current thread until the current WaitHandle receives a signal", which means it's put to sleep.
Internally, WaitHandle.WaitOne will call WaitForSingleObjectEx from the Windows API, which:
Waits until the specified object is in the signaled state, an I/O completion routine or asynchronous procedure call (APC) is queued to the thread, or the time-out interval elapses.
Also, according to another document on Context Switches
When a running thread needs to wait, it relinquishes the remainder of its time slice.
A good answer to draw from is here: When should one use a spinlock instead of a mutex?
The answer is that it depends. The Mutex class in .Net is typically backed by the operating system, since it is a lock that can be shared between multiple processes; it is not intended to be used only within a single process.
This means that we're at the mercy of our operating system's implementation. Most modern OSes, including Windows, implement adaptive mutexes for multi-core machines.
Drawing from the above answer, we learn that implementing locking by suspending the thread is often very expensive, since it requires at least 2 context switches. On a multi-core system, we can avoid some context switches by attempting to spin-wait initially to acquire the lock - if the lock is lightly contended, you'll likely acquire the lock in the spinwait, and thus never suffer the penalty of a context switch/thread suspension. In the case that the timeout expires while the spinwait is occurring, the lock will downgrade to full thread suspension to keep from wasting too much cpu.
None of this makes sense on a single-core machine, since you'd just be burning CPU while the holder of the lock is waiting to run to finish the work it needs to do in order to release the lock. Adaptive locks are not used on single-core machines.
So, to directly answer your question - it is likely that the Mutex class does both - it'll busy-wait (spin-wait) for a short while to see if it can acquire the mutex without performing a context switch, and if it can't do so in the short amount of time it allows itself, it'll suspend the thread. It's important to note that the amount of time it'll spinwait for is usually very short, and that overall, this strategy can significantly reduce total CPU usage or increase overall lock throughput. So, even though we're burning CPU spin-waiting, we'll probably save more CPU overall.
In the context of .NET, the Mutex class provides mutual exclusion, but it is meant to be usable between multiple processes and thus tends to be quite slow. Specifically, in the Microsoft .NET Framework the Mutex class is implemented on top of the Win32 mutex object.
Do note that the details may change depending on which implementation of .Net you're using, and on which operating system. I've tried to provide a .Net/Microsoft/Windows-centric treatment of the topic since that is the most common circumstance.
As an aside, if you only need locking within a single process, the Monitor class, or the lock keyword built on it, should be used instead. A similar dichotomy exists for semaphores: the Semaphore class is, in the end, implemented by the operating system; it can be used for inter-process synchronization and thus tends to be slow. The SemaphoreSlim class is implemented natively in .NET, can be used only within a single process, and tends to be faster. On this point, a good MSDN article to read is Overview of Synchronization Primitives.
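To make the dichotomy concrete, a brief, purely illustrative comparison of the in-process and cross-process primitives (the mutex name is a made-up example):

    using System.Threading;

    class Example
    {
        private readonly object _gate = new object();                // in-process
        private readonly SemaphoreSlim _slim = new SemaphoreSlim(1); // in-process
        private readonly Mutex _mutex =
            new Mutex(false, "Global\\MyAppMutex");                  // cross-process

        public void InProcess()
        {
            lock (_gate)
            {
                // Fast path: Monitor, no kernel object unless contended.
            }
        }

        public void InProcessSlim()
        {
            _slim.Wait(); // stays in user mode unless it must block
            try { /* critical section */ }
            finally { _slim.Release(); }
        }

        public void CrossProcess()
        {
            _mutex.WaitOne(); // kernel transition; visible to other processes
            try { /* critical section shared across processes */ }
            finally { _mutex.ReleaseMutex(); }
        }
    }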

Why does the number of unused physical threads fluctuate in a .NET application?

I have a .NET application which I would expect to have 5 long-running threads operating including the main thread. I can see that indeed 4 threads are newed up across the codebase, and I believe there is no direct (e.g. work item queuing / tasks) or indirect (e.g. Timers) usage of the ThreadPool anywhere. At least none I can find.
Running the app under Performance Monitor shows that the number of recognized threads stays constant at 5 (as I would expect) but the number of physical threads fluctuates between 70 and 120 over the course of about an hour!
Does anyone know why there are so many unused (as far as I can tell) physical threads? And why this number fluctuates?
I can't find any documentation that would explain this behavior so my best guess is that the ThreadPool balances itself to accommodate changing environmental factors such as free memory and resource contention but the numbers here seem excessive.
Update
A senior support engineer at Microsoft confirmed that the physical thread counter in use definitely only reports threads for the current process, despite the odd wording in MSDN. If an answer suggests this is not the case it will need to point to a definitive source.
Both the thread pools and the GC create threads. There is a normal (or "worker") thread pool and an IO thread pool. The normal thread pool will allocate new threads as it feels it needs to in order to keep the thread pool responsive. It should create one thread per CPU right away, and probably one thread per second after that up to the minimum number of threads. See ThreadPool.GetMinThreads for the minimum number of worker threads the worker thread pool will create. See ThreadPool.GetAvailableThreads for the number of "active" worker threads in the worker thread pool. If you have long-running work using worker thread-pool threads, this will make the pool think those threads are in use and allocate more to service future requests.
There is also a maximum number of threads in the pool, so as threads recycle back to the pool, the pool may kill some off to get back down to the number it decides is best.
There is also a finalizer thread.
There are likely others that are undocumented or are a result of a library you're using.
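A quick way to inspect the pool numbers this answer refers to:

    using System;
    using System.Threading;

    class Program
    {
        static void Main()
        {
            ThreadPool.GetMinThreads(out int minWorker, out int minIo);
            ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
            ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);

            Console.WriteLine($"Worker: min={minWorker}, max={maxWorker}, available={freeWorker}");
            Console.WriteLine($"IO:     min={minIo}, max={maxIo}, available={freeIo}");
        }
    }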
Update:
I think part of the problem is confusion over "recognized threads" and "physical threads" and "unused threads".
Recognized threads are documented as (emphasis mine)
These threads are associated with a corresponding managed thread object. The runtime does not create these threads, but they have run inside the runtime at least once.
Physical threads are documented as (emphasis mine)
native operating system threads created and owned by the common language runtime to act as underlying threads for managed thread objects
I'm guessing that the term "unused threads" by @JRoughan refers to "physical threads" that aren't "recognized". That doesn't really mean they're unused; they're just not in the recognized counter. As the documentation points out, "physical threads" are created by the runtime, and I don't believe you can tell from either of those counters whether a thread is "used" or "unused", depending on what @JRoughan means by "unused".
Things like this do not have a simple answer. You need to investigate either under a debugger or using ETW traces.
With ETW traces, you can get events for each thread creation/destruction, optionally with call stack.
The CLR itself can create threads for its own use (e.g. GC threads, background GC threads, the multicore JIT thread), as well as thread pool threads, IO threads, and the timer thread. There is another kind of thread: the gate thread.
Normally you can tell usage from the symbolic name of thread proc once symbols are resolved.
For ETW analysis, use PerfView from Microsoft.
Is the application you are testing in Performance Monitor a standalone .NET application or an application under IIS? If it is a standalone application, you have probably added some extra library/code for using Performance Monitor. That may create threads.
You can use Sysinternals' Process Explorer to watch threads in your process. You can see which method in which module started the threads.
We can only speculate of course. My own bet would be about in-process COM servers. Those, and their associated threads, may be created when you use classes that wrap COM interfaces, such as the ones for directory services or WMI for example. Since they're created by native code (even though it's wrapped within a dotnet code), they're not recognized as managed threads.

When exactly does .NET Monitor go to kernel-mode?

I would like to compile a list of all possible conditions making Monitor go to kernel-mode or use a kernel sync object.
The sync block has a field that references the kernel object. Hence I deduced that lock will go to kernel mode sometimes.
I found this: Lock (Monitor) internal implementation in .NET
But it has too many questions to be answered and the only useful information is that the OP answered his own question by simply stating that the lock will go to the kernel-mode sometimes. Also there aren’t any links to anything to support that answer.
When exactly will lock go to kernel-mode (not if and not why - when)?
I am more interested to hear about .NET 4 and 4.5 if there is any difference with older versions.
From the Richter book: "A sync block contains fields for a kernel object, the owning thread’s ID, a recursion count, and a waiting threads count."
Most questions of this kind can be answered by looking at the CLR source code, available through the SSCLI20 distribution. It is getting pretty dated by now (it is .NET 2.0 vintage), but a lot of the core CLR features haven't changed much.
The source code file you want to look at is clr/src/vm/syncblk.cpp. Three classes play a role here. AwareLock is the low-level lock implementation that takes care of acquiring the lock, SyncBlock is the class that implements the queue of threads that are waiting to enter a lock, and CLREvent is the wrapper for the operating system synchronization object, the one you are asking about.
This is C++ code and the level of abstraction is quite high. This code heavily interacts with the garbage collector and there's a lot of testing code included. So I'll give a brief description of the process.
SyncBlock has the m_Monitor member that stores the AwareLock instance. SyncBlock::Enter() directly calls AwareLock::Enter(). It first tries to acquire the lock as cheaply as possible. First checking if the thread already owns the lock and just incrementing the lock count if that's the case. Next using FastInterlockCompareExchange(), an internal function that's very similar to Interlocked.CompareExchange(). If the lock is not contended then this succeeds very quickly and Monitor.Enter() returns. If not then another thread already owns the lock, and AwareLock::EnterEpilog is used. There's a need to get the operating system's thread scheduler involved so a CLREvent is used. It is dynamically created if necessary and its WaitOne() method is called. Which will involve a kernel transition.
So there is enough to answer your question: the Monitor class enters kernel mode when the lock is contended and the thread has to wait.
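A rough managed-code analogy of that fast path, matching the sync-block fields Richter lists (owner thread ID, recursion count). This is illustrative only, not the actual native AwareLock code:

    using System.Threading;

    class FakeAwareLock
    {
        private int _ownerThreadId; // 0 = unowned
        private int _recursionCount;

        public bool TryEnterFastPath()
        {
            int me = Thread.CurrentThread.ManagedThreadId;

            // Re-entrancy: the owner just bumps the recursion count.
            if (_ownerThreadId == me)
            {
                _recursionCount++;
                return true;
            }

            // Uncontended case: one atomic compare-exchange, no kernel involved.
            if (Interlocked.CompareExchange(ref _ownerThreadId, me, 0) == 0)
            {
                _recursionCount = 1;
                return true;
            }

            return false; // contended: fall back to spinning, then a CLREvent wait
        }
    }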
When the lock is heavily contended.
If the lock is lightly contended, there is a quick CPU spin-wait loop waiting for the lock to become free again; but if that doesn't wait long enough for the lock to be released, the thread does a blocking wait on the underlying kernel object, which involves a kernel-mode call to suspend the thread and other such management.
Additional intelligence may exist around the spin-wait step, such as skipping the spin-wait on single-core machines, since a contested lock can only be released after the spinning thread yields the core anyway.

Are there any cases when it's preferable to use a plain old Thread object instead of one of the newer constructs?

I see a lot of people in blog posts and here on SO either avoiding or advising against the usage of the Thread class in recent versions of C# (and I mean of course 4.0+, with the addition of Task & friends). Even before, there were debates about the fact that a plain old thread's functionality can be replaced in many cases by the ThreadPool class.
Also, other specialized mechanisms are further rendering the Thread class less appealing, such as Timers replacing the ugly Thread + Sleep combo, while for GUIs we have BackgroundWorker, etc.
Still, the Thread seems to remain a very familiar concept for some people (myself included), people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class. I've been wondering lately if it's time to amend my ways.
So my question is, are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
The Thread class cannot be made obsolete because obviously it is an implementation detail of all those other patterns you mention.
But that's not really your question; your question is
are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
Sure. In precisely those cases where one of the higher-level constructs does not meet your needs.
My advice is that if you find yourself in a situation where existing higher-abstraction tools do not meet your needs, and you wish to implement a solution using threads, then you should identify the missing abstraction that you really need, and then implement that abstraction using threads, and then use the abstraction.
Threads are a basic building block for certain things (namely parallelism and asynchrony) and thus should not be taken away. However, for most people and most use cases there are more appropriate things to use, which you mentioned, such as thread pools (which provide a nice way of handling many small jobs in parallel without overloading the machine by spawning 2000 threads at once) and BackgroundWorker (which encapsulates useful events for a single short-lived piece of work).
But just because in many cases those are more appropriate as they shield the programmer from needlessly reinventing the wheel, doing stupid mistakes and the like, that does not mean that the Thread class is obsolete. It is still used by the abstractions named above and you would still need it if you need fine-grained control over threads that is not covered by the more special classes.
In a similar vein, .NET doesn't forbid the use of arrays, despite List<T> being a better fit for many cases where people use arrays. Simply because you may still want to build things that are not covered by the standard lib.
Task and Thread are different abstractions. If you want to model a thread, the Thread class is still the most appropriate choice. E.g. if you need to interact with the current thread, I don't see any better types for this.
However, as you point out .NET has added several dedicated abstractions which are preferable over Thread in many cases.
The Thread class is not obsolete, it is still useful in special circumstances.
Where I work we wrote a 'background processor' as part of a content management system: a Windows service that monitors directories, e-mail addresses and RSS feeds, and every time something new shows up executes a task on it, typically to import the data.
Attempts to use the thread pool for this did not work: it tried to execute too much stuff at the same time and thrashed the disks, so we implemented our own polling and execution system using the Thread class directly.
The new options make direct use and management of the (expensive) threads less frequent.
people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class.
Which is a very expensive and relatively complex way of doing stuff in parallel.
Note that the expense matters most: You cannot use a full thread to do a small job, it would be counterproductive. The ThreadPool combats the costs, the Task class the complexities (exceptions, waiting and canceling).
To answer the question of "are there any cases when it's necessary or useful to use a plain old Thread object", I'd say a plain old Thread is useful (but not necessary) when you have a long running process that you won't ever interact with from a different thread.
For example, if you're writing an application that subscribes to receive messages from some sort of message queue, and your application is going to do more than just process those messages, then it would be useful to use a Thread because the thread will be self-contained (i.e. you aren't waiting on it to get done), and it isn't short-lived. Using the ThreadPool class is more for queuing up a bunch of short-lived work items and letting the ThreadPool manage processing each one efficiently as a thread becomes available. Tasks can be used where you would use Thread directly, but in the above scenario I don't think they would buy you much. They help you interact with the thread more easily (which the above scenario doesn't need), and they help determine how many threads should actually be used for a given set of tasks based on the number of processors you have (which isn't what you want here, so you'd tell the Task your work is LongRunning, in which case the current 4.0 implementation simply creates a separate non-pooled Thread).
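For reference, a sketch of giving the Task API that LongRunning hint, which in the implementation described gets you a dedicated non-pooled thread (the message loop is a stand-in):

    using System;
    using System.Threading.Tasks;

    class Program
    {
        static void Main()
        {
            // LongRunning hints the scheduler to use a dedicated thread
            // instead of borrowing (and starving) a pool thread.
            Task pump = Task.Factory.StartNew(
                () =>
                {
                    while (true) { /* receive and process messages */ }
                },
                TaskCreationOptions.LongRunning);

            Console.ReadLine(); // keep the process alive for the demo
        }
    }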
Probably not the answer you were expecting, but I use Thread all the time when coding against the .NET Micro Framework. MF is quite cut down and doesn't include higher level abstractions and the Thread class is super flexible when you need to get the last bit of performance out of a low MHz CPU.
You could compare the Thread class to ADO.NET. It's not the recommended tool for getting the job done, but it's not obsolete. Other tools build on top of it to ease the job.
It's not wrong to use the Thread class over other things, especially if those things don't provide functionality that you need.
It's definitely not obsolete.
The problem with multithreaded apps is that they are very hard to get right (behavior is often nondeterministic, and input, output and internal state all matter), so a programmer should push as much work as possible to the framework/tools and abstract it away. But the mortal enemy of abstraction is performance.
So my question is, are there any cases when it's necessary or useful
to use a plain old Thread object instead of one of the above
constructs?
I'd go with threads and locks only when there are serious performance problems or high performance goals.
I've always used the Thread class when I need to keep count and control over the threads I've spun up. I realize I could use the threadpool to hold all of my outstanding work, but I've never found a good way to keep track of how much work is currently being done or what the status is.
Instead, I create a collection and place the threads in them after I spin them up - the very last thing a thread does is remove itself from the collection. That way, I can always tell how many threads are running, and I can use the collection to ask each what it's doing. If there's a case when I need to kill them all, normally you'd have to set some kind of "Abort" flag in your application, wait for every thread to notice that on its own and self-terminate - in my case, I can walk the collection and issue a Thread.Abort to each one in turn.
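A hedged sketch of that bookkeeping pattern (the names are illustrative; note that Thread.Abort is obsolete on modern .NET, so the collection shown here only does the counting/tracking part):

    using System;
    using System.Collections.Concurrent;
    using System.Threading;

    class ThreadTracker
    {
        private readonly ConcurrentDictionary<Thread, string> _threads =
            new ConcurrentDictionary<Thread, string>();

        public void StartWorker(string name, Action work)
        {
            var thread = new Thread(() =>
            {
                try { work(); }
                finally
                {
                    // The very last thing a worker does is remove itself.
                    _threads.TryRemove(Thread.CurrentThread, out _);
                }
            }) { Name = name, IsBackground = true };

            _threads[thread] = name;
            thread.Start();
        }

        public int RunningCount => _threads.Count; // how many are still running
    }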
In that case, I haven't found a better way than working directly with the Thread class. As Eric Lippert mentioned, the others are just higher-level abstractions, and it's appropriate to work with the lower-level classes when the available high-level implementations don't meet your needs. Just as you sometimes need to do Win32 API calls when .NET doesn't address your exact needs, there will always be cases where the Thread class is the best choice despite recent "advancements."
