Does Mutex in C# busy wait?

I am wondering whether the Mutex object busy-waits or context-switches out (i.e., does the thread waiting on the mutex go to sleep and get woken up later when the mutex becomes available), or whether it is architecture dependent (e.g., on the number of cores your machine has). I am hoping that it actually does a context switch. Thank you in advance.

As per the documentation, Mutex.WaitOne "Blocks the current thread until the current WaitHandle receives a signal", which means it's put to sleep.
Internally, WaitHandle.WaitOne will call WaitForSingleObjectEx from the Windows API, which:
Waits until the specified object is in the signaled state, an I/O completion routine or asynchronous procedure call (APC) is queued to the thread, or the time-out interval elapses.
Also, according to another document on Context Switches:
When a running thread needs to wait, it relinquishes the remainder of its time slice.
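To see the blocking behaviour the documentation describes, here is a minimal C# sketch, assuming a recent .NET runtime. `MutexBlockingDemo` and `TryAcquireWhileHeld` are hypothetical names for illustration. The calling thread owns an unnamed `Mutex` while a second thread calls `WaitOne` with a timeout; the waiter either acquires the mutex once the owner releases it or gives up when the timeout elapses. This only shows the blocking/timeout semantics of `WaitOne`; it does not by itself prove whether any spinning happens before the thread is put to sleep.

```csharp
using System.Threading;

public static class MutexBlockingDemo
{
    // The calling thread owns the mutex for 'holdMs' while a second thread waits
    // up to 'waitMs' for it. Returns true if the second thread got the mutex.
    public static bool TryAcquireWhileHeld(int holdMs, int waitMs)
    {
        using var mutex = new Mutex(initiallyOwned: false); // unnamed: visible only in this process
        mutex.WaitOne();                                   // calling thread becomes the owner

        bool acquired = false;
        var waiter = new Thread(() =>
        {
            // Blocks (no busy loop in our code) until the mutex is released or the timeout elapses.
            acquired = mutex.WaitOne(waitMs);
            if (acquired)
                mutex.ReleaseMutex(); // a Mutex must be released by the thread that owns it
        });
        waiter.Start();

        Thread.Sleep(holdMs);
        mutex.ReleaseMutex();
        waiter.Join();
        return acquired;
    }
}
```

For example, `TryAcquireWhileHeld(50, 5000)` should return true, while `TryAcquireWhileHeld(1000, 50)` should return false because the waiter times out first.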

A good answer to draw from is here: When should one use a spinlock instead of a mutex?
The answer is that it depends. The Mutex class in .Net is typically backed by the operating system, since it is a lock that can be shared between multiple processes; it is not intended to be used only within a single process.
This means that we're at the mercy of our operating system's implementation. Most modern OSes, including Windows, implement adaptive mutexes for multi-core machines.
Drawing from the above answer, we learn that implementing locking by suspending the thread is often very expensive, since it requires at least 2 context switches. On a multi-core system, we can avoid some context switches by initially spin-waiting to acquire the lock: if the lock is lightly contended, you'll likely acquire it during the spin-wait, and thus never pay the penalty of a context switch/thread suspension. If the spin-wait runs out before the lock becomes free, the lock falls back to full thread suspension so that it doesn't waste too much CPU.
None of this makes sense on a single-core machine, since you'd just be burning CPU while the holder of the lock is waiting to run to finish the work it needs to do in order to release the lock. Adaptive locks are not used on single-core machines.
So, to directly answer your question - it is likely that the Mutex class does both - it'll busy-wait (spin-wait) for a short while to see if it can acquire the mutex without performing a context switch, and if it can't do so in the short amount of time it allows itself, it'll suspend the thread. It's important to note that the amount of time it'll spinwait for is usually very short, and that overall, this strategy can significantly reduce total CPU usage or increase overall lock throughput. So, even though we're burning CPU spin-waiting, we'll probably save more CPU overall.
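The spin-then-block strategy described above can be sketched in C# like this. `SpinThenBlockLock` is a hypothetical teaching class, not a .NET type, and this is not how `Mutex`, `Monitor`, or Windows actually implement it. It spins with `Interlocked.CompareExchange` for a bounded number of iterations (skipping the spin entirely on a single-core machine) and then falls back to blocking with `Monitor.Wait`. Note that `SpinWait.SpinOnce` itself starts yielding the processor after a number of iterations.

```csharp
using System;
using System.Threading;

// Hypothetical adaptive lock: spin briefly in user mode, then block.
public sealed class SpinThenBlockLock
{
    private int _state;    // 0 = free, 1 = held
    private int _waiters;  // threads blocked in phase 2 (only modified under _gate)
    private readonly object _gate = new object();
    private readonly int _spinIterations;

    public SpinThenBlockLock(int spinIterations = 100)
    {
        // Spinning only helps if the owner can run on another core meanwhile.
        _spinIterations = Environment.ProcessorCount > 1 ? spinIterations : 0;
    }

    public void Enter()
    {
        // Phase 1: optimistic spin, no context switch.
        var spinner = new SpinWait();
        for (int i = 0; i < _spinIterations; i++)
        {
            if (Interlocked.CompareExchange(ref _state, 1, 0) == 0) return;
            spinner.SpinOnce();
        }

        // Phase 2: give up the CPU and sleep until the owner signals.
        lock (_gate)
        {
            _waiters++;
            try
            {
                while (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
                    Monitor.Wait(_gate);
            }
            finally
            {
                _waiters--;
            }
        }
    }

    public void Exit()
    {
        // Full fence: mark the lock free before checking for sleeping waiters.
        Interlocked.Exchange(ref _state, 0);
        if (Volatile.Read(ref _waiters) > 0)
        {
            lock (_gate) { Monitor.Pulse(_gate); }
        }
    }
}
```

In real code you would simply use `lock`/`Monitor`, which already combine a short spin with a blocking wait; the sketch only makes the two phases visible.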
In the context of .Net, the Mutex class provides mutual exclusion, but it is meant to be used between multiple processes, and thus tends to be quite slow. Specifically, in the Microsoft .Net Framework, the Mutex class is a wrapper around the Win32 Mutex object.
Do note that the details may change depending on which implementation of .Net you're using, and on which operating system. I've tried to provide a .Net/Microsoft/Windows-centric treatment of the topic since that is the most common circumstance.
As an aside, if you only need locking within a single process, use the Monitor class (or the C# lock keyword, which is built on it) instead. A similar dichotomy exists for semaphores: the Semaphore class is, in the end, implemented by the operating system, so it can be used for inter-process communication and thus tends to be slow; the SemaphoreSlim class is implemented in .Net itself, can be used only within a single process, and tends to be faster. On this point, a good MSDN article to read is Overview of Synchronization Primitives.
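As an illustration of the in-process alternatives, here is a small sketch, assuming .NET Core 3.0 or later. `ThrottleDemo` and `RunThrottledAsync` are hypothetical names, and `Task.Delay` stands in for real work. `SemaphoreSlim` limits how many jobs run at once, and a plain `lock` (Monitor) guards a shared variable; neither needs a named OS object.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Runs 'jobs' jobs but lets at most 'maxConcurrency' of them run the body at once.
    // Returns the highest concurrency actually observed.
    public static async Task<int> RunThrottledAsync(int jobs, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        int current = 0, peak = 0;
        object peakLock = new object(); // plain lock (Monitor): in-process only

        async Task Job()
        {
            await gate.WaitAsync(); // waits asynchronously instead of blocking a thread
            try
            {
                int now = Interlocked.Increment(ref current);
                lock (peakLock) { if (now > peak) peak = now; }
                await Task.Delay(20); // stand-in for real work
                Interlocked.Decrement(ref current);
            }
            finally
            {
                gate.Release();
            }
        }

        await Task.WhenAll(Enumerable.Range(0, jobs).Select(_ => Job()));
        return peak;
    }
}
```

The returned peak should never exceed `maxConcurrency`.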

Related

What is a multithreading program and how does it work?

What is a multithreading program and how does it work exactly? I read some documents but I'm confused. I know that code is executed line by line, but I can't understand how the program manages this.
A simple answer would be appreciated. A C# example, please (an animation would be ideal!)
What is a multi-threading program and how does it work exactly?
The interesting part about this question is that complete books have been written on the topic, yet it still eludes a lot of people. I will try to explain it in the order detailed below.
Please note this is just meant to give a gist; an answer like this can never do justice to the depth and detail the topic requires. Regarding videos, the best ones I have come across are part of paid subscriptions (Wintellect and Pluralsight); check whether you can watch them on a trial basis, assuming you don't already have a subscription:
Wintellect, by Jeffrey Richter (based on his book CLR via C#, which has the same chapter on Thread Fundamentals)
CLR Threading by Mike Woodring
Explanation Order
What is a thread?
Why were threads introduced (main purpose)?
Pitfalls, and how to avoid them using synchronization constructs?
Thread vs. ThreadPool?
Evolution of multi-threaded programming APIs, like the Parallel API and Task API
Concurrent collections: usage?
Async-Await: thread but no thread, and why it is best for IO
What is a thread?
A thread is the basic unit of execution that the operating system schedules. Every process on Windows has at least one thread, and every method call is executed on some thread. Each process can have multiple threads to do multiple things in parallel (provided the hardware supports it).
Unix-based operating systems have traditionally favored a multi-process architecture (although they support threads too). On Windows, by contrast, even the most complex software, like Oracle (oracle.exe), runs as a single process with multiple threads for its different critical background operations.
Why were threads introduced (main purpose)?
Contrary to the perception that concurrency is the main purpose, it was robustness that led to the introduction of threads. Imagine every process on Windows running on the same thread (as in the initial 16-bit versions): if one of those processes crashed, recovering usually meant restarting the whole system. Using threads for concurrent operations, since a process can run several of them, came into the picture later. Today threads are also essential for using a multi-core processor to its full ability.
Pitfalls, and how to avoid them using synchronization constructs?
More threads means more work completed concurrently, but problems arise when the same memory is accessed, especially for writes, because that is when it can lead to:
Memory corruption
Race condition
Another issue is that a thread is a very costly resource: each thread has a thread environment block, a kernel memory allocation, and its own stack. Scheduling threads on processor cores also costs time in context switches. So misuse can quite possibly cause a huge performance penalty instead of an improvement.
To avoid thread-related corruption issues, it's important to use synchronization constructs, like lock, mutex, or semaphore, based on the requirement. Reading data that no thread modifies is thread safe, but as soon as any thread writes to it, access needs appropriate synchronization.
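A small C# sketch of the race condition and its fix (`RaceDemo` and its methods are hypothetical names). `counter++` is a read-modify-write, so without synchronization, increments from different threads can overwrite each other; wrapping it in `lock` makes each increment a critical section.

```csharp
using System.Threading.Tasks;

public static class RaceDemo
{
    // Unsynchronized: increments from different threads can be lost.
    public static int CountWithoutLock(int workers, int perWorker)
    {
        int counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++) counter++; // race: not atomic
        });
        return counter;
    }

    // Synchronized: each increment happens inside a critical section.
    public static int CountWithLock(int workers, int perWorker)
    {
        int counter = 0;
        object sync = new object();
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
            {
                lock (sync) { counter++; }
            }
        });
        return counter;
    }
}
```

`CountWithLock(8, 100_000)` always returns 800000; `CountWithoutLock(8, 100_000)` usually returns less, though on some runs it may happen to be correct, which is exactly what makes races hard to find.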
Thread vs. ThreadPool?
The Thread objects we use in C#/.Net are not the real threads; they are managed wrappers around Win32 (OS) threads. The challenge is that users can grossly misuse them, for example by creating far more threads than required or by setting processor affinity. So isn't it better to queue work items to a standard pool and let the pool decide when a new thread is required and when an already existing thread can pick up the work item? A thread is a costly resource that needs to be used sparingly, otherwise it can be a bane rather than a boon.
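A minimal C# sketch contrasting the two (`PoolDemo` and its methods are hypothetical names). Work items queued with `ThreadPool.QueueUserWorkItem` run on pool threads chosen by the pool, while `new Thread` creates a dedicated thread for a single job. The `CountdownEvent` is deliberately not disposed, because a work item may still be inside `Signal()` when `Wait()` returns.

```csharp
using System.Threading;

public static class PoolDemo
{
    // Queues 'count' work items to the thread pool and waits for all of them.
    // Returns how many of them reported running on a thread-pool thread.
    public static int RunOnPool(int count)
    {
        int onPool = 0;
        var done = new CountdownEvent(count); // not disposed on purpose (see above)
        for (int i = 0; i < count; i++)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                if (Thread.CurrentThread.IsThreadPoolThread)
                    Interlocked.Increment(ref onPool);
                done.Signal();
            });
        }
        done.Wait(); // the pool, not our code, decided how many threads to use
        return onPool;
    }

    // For comparison: a dedicated thread, created and destroyed for a single job.
    public static bool DedicatedThreadIsPoolThread()
    {
        bool isPool = true;
        var worker = new Thread(() => isPool = Thread.CurrentThread.IsThreadPoolThread);
        worker.Start();
        worker.Join();
        return isPool;
    }
}
```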
Evolution of multi-threaded programming APIs, like the Parallel API and Task API
From .Net 4.0 onward, a variety of new APIs, such as Parallel.For and Parallel.ForEach for data parallelism and Tasks for task parallelism, have made it very simple to introduce concurrency into a system. These APIs use the thread pool internally. A Task is more like scheduling a piece of work to run at some point in the future. Introducing concurrency is now a breeze, but synchronization constructs (or thread-safe collections) are still required to avoid memory corruption and race conditions.
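A short C# sketch of both styles (`ParallelDemo` and its methods are hypothetical names). The data-parallel version uses the `Parallel.For` overload with thread-local state, so each worker accumulates a private partial sum and touches shared state only once, through `Interlocked.Add`. The task-parallel version schedules two independent computations with `Task.Run`.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // Data parallelism: sum 0..n-1 with per-worker partial sums.
    public static long ParallelSum(int n)
    {
        long total = 0;
        Parallel.For<long>(0, n,
            () => 0L,                                        // localInit: each worker starts at 0
            (i, loopState, partial) => partial + i,          // body: no shared state touched
            partial => Interlocked.Add(ref total, partial)); // localFinally: merge once per worker
        return total;
    }

    // Task parallelism: two independent computations scheduled on the pool.
    public static long TwoTasks(int n)
    {
        Task<long> evens = Task.Run(() => { long s = 0; for (int i = 0; i < n; i += 2) s += i; return s; });
        Task<long> odds = Task.Run(() => { long s = 0; for (int i = 1; i < n; i += 2) s += i; return s; });
        Task.WaitAll(evens, odds);
        return evens.Result + odds.Result;
    }
}
```

Both `ParallelSum(1000)` and `TwoTasks(1000)` return 499500.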
Concurrent collections: usage?
Implementations like ConcurrentBag, ConcurrentQueue, and ConcurrentDictionary, part of System.Collections.Concurrent, are inherently thread safe. Internally they rely on lock-free techniques (including spin-waiting) or fine-grained locking, and they are much easier and quicker to use than explicit synchronization, as well as easier to manage. There is another set of APIs, like ImmutableList in System.Collections.Immutable (available via NuGet), which are thread safe because every modification returns a new instance instead of changing the existing one.
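A minimal C# sketch of a concurrent collection replacing explicit locking (`ConcurrentCountDemo` and `CountWords` are hypothetical names). `ConcurrentDictionary.AddOrUpdate` updates the count atomically; under contention the update delegate may run more than once, but only one result per attempt is stored, so the final counts are exact.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentCountDemo
{
    // Counts word occurrences from many threads without an explicit lock.
    public static ConcurrentDictionary<string, int> CountWords(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
            counts.AddOrUpdate(word, 1, (_, old) => old + 1)); // atomic add-or-increment
        return counts;
    }
}
```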
Async-Await: thread but no thread, and why it is best for IO
This is an important aspect of concurrency meant for IO calls (disk, network). The other APIs discussed so far are meant for compute-bound concurrency, where threads are important and make the work faster. For IO calls, however, a thread has no use except waiting for the call to return: the IO is carried out by the device and the OS, and completions are delivered through I/O completion ports on Windows rather than by a blocked thread.
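A small C# sketch of "thread but no thread" (`AsyncIoDemo` and `ManyWaitsAsync` are hypothetical names). `Task.Delay` is used here as a stand-in for a real I/O call, which is an assumption: it is timer-based, not an I/O completion, but like awaited I/O it holds no thread while pending. Ten thousand concurrent waits of 200 ms finish in roughly 200 ms, which would be impossible if each wait blocked its own thread.

```csharp
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncIoDemo
{
    // Starts 'count' waits at once and returns the total elapsed milliseconds.
    public static async Task<long> ManyWaitsAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        // No thread is blocked while these are pending; completion is signaled later.
        await Task.WhenAll(Enumerable.Range(0, count).Select(_ => Task.Delay(delayMs)));
        return sw.ElapsedMilliseconds;
    }
}
```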
A simple analogy might be found in the kitchen.
You've probably cooked using a recipe before -- start with the specified ingredients, follow the steps indicated in the recipe, and at the end you (hopefully) have a delicious dish ready to eat. If you do that, then you have executed a traditional (non-multithreaded) program.
But what if you have to cook a full meal, which includes a number of different dishes? The simple way to do it would be to start with the first recipe, do everything the recipe says, and when it's done, put the finished dish (and the first recipe) aside, then start on the second recipe, do everything it says, put the second dish (and second recipe) aside, and so on until you've gone through all of the recipes one after another. That will work, but you might end up spending 10 hours in the kitchen, and of course by the time the last dish is ready to eat, the first dish might be cold and unappetizing.
So instead you'd probably do what most chefs do, which is to start working on several recipes at the same time. For example, you might put the roast in the oven for 45 minutes, but instead of sitting in front of the oven waiting 45 minutes for the roast to cook, you'd spend the 45 minutes chopping the vegetables. When the oven timer rings, you put down your vegetable knife, pull the cooked roast out of the oven and let it cool, then go back to chopping vegetables, and so on. If you can do that, then you are successfully multitasking several recipes/programs. That is, you aren't literally working on multiple recipes at once (you still have only two hands!), but you are jumping back and forth from following one recipe to following another whenever necessary, and thereby making progress on several tasks rather than twiddling your thumbs a lot. Do this well and you can have the whole meal ready to eat in a much shorter amount of time, and everything will be hot and fresh at about the same time too. If you do this, you are executing a simple multithreaded program.
Then if you wanted to get really fancy, you might hire a few other chefs to work in the kitchen at the same time as you, so that you can get even more food prepared in a given amount of time. If you do this, your team is doing multiprocessing, with each chef taking one part of the total work and all of them working simultaneously. Note that each chef may well be working on multiple recipes (i.e. multitasking) as described in the previous paragraph.
As for how a computer does this sort of thing (no more analogies about chefs), it usually implements it using a list of ready-to-run threads and a timer. When the timer goes off (or when the thread that is currently executing has nothing to do for a while, because e.g. it is waiting to load data from a slow hard drive or something), the operating system does a context switch, in which it pauses the current thread (by putting it into a list somewhere and no longer executing instructions from that thread's code), then pulls another ready-to-run thread from the list of ready-to-run threads and starts executing instructions from that thread's code instead. This repeats for as long as necessary, often with context switches happening every few milliseconds, giving the illusion that multiple programs are running "at the same time" even on a single-core CPU. (On a multi-core CPU it does this same thing on each core, and in that case it's no longer just an illusion; multiple programs really are running at the same time.)
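To make the ready-list idea concrete, here is a toy C# simulation of round-robin time slicing. It is only a model under stated assumptions: `RoundRobinSimulator` and `Work` are hypothetical, each "thread" is an iterator, and each `MoveNext()` call stands for one time slice. A real scheduler preempts threads via timer interrupts rather than relying on them to return.

```csharp
using System.Collections.Generic;

public static class RoundRobinSimulator
{
    public static List<string> Run(IEnumerable<(string Name, int Steps)> threads)
    {
        // The list of ready-to-run "threads".
        var ready = new Queue<(string Name, IEnumerator<int> Body)>();
        foreach (var (name, steps) in threads)
            ready.Enqueue((name, Work(steps).GetEnumerator()));

        var trace = new List<string>();
        while (ready.Count > 0)
        {
            var (name, body) = ready.Dequeue(); // pick the next ready thread
            if (body.MoveNext())                // let it run for one time slice
            {
                trace.Add($"{name}:{body.Current}");
                ready.Enqueue((name, body));    // "context switch": back of the queue
            }
            // A thread whose work is finished is simply dropped.
        }
        return trace;
    }

    // Hypothetical workload: a thread that needs 'steps' time slices to finish.
    private static IEnumerable<int> Work(int steps)
    {
        for (int i = 1; i <= steps; i++)
            yield return i;
    }
}
```

Running two threads, A needing 2 slices and B needing 3, produces the interleaved trace `A:1, B:1, A:2, B:2, B:3`.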
Why don't you refer to Microsoft's very own documentation of the .net class System.Threading.Thread?
It has a handful of simple example programs written in C# (at the bottom of the page), just as you asked for:
Thread Examples
Multithreading means doing multiple things at the same time, so that work can be completed in parallel: you can take a task off your main thread, execute it somewhere else, and be done.

Why does the number of unused physical threads fluctuate in a .NET application?

I have a .NET application which I would expect to have 5 long-running threads operating including the main thread. I can see that indeed 4 threads are newed up across the codebase, and I believe there is no direct (e.g. work item queuing / tasks) or indirect (e.g. Timers) usage of the ThreadPool anywhere. At least none I can find.
Running the app under Performance Monitor shows that the number of recognized threads stays constant at 5 (as I would expect) but the number of physical threads fluctuates between 70 and 120 over the course of about an hour!
Does anyone know why there are so many unused (as far as I can tell) physical threads? And why this number fluctuates?
I can't find any documentation that would explain this behavior so my best guess is that the ThreadPool balances itself to accommodate changing environmental factors such as free memory and resource contention but the numbers here seem excessive.
Update
A senior support engineer at Microsoft confirmed that the physical thread counter in use definitely only reports threads for the current process, despite the odd wording in MSDN. If an answer suggests this is not the case it will need to point to a definitive source.
Both ThreadPools and the GC create threads. There is a normal (or "worker") thread pool and an IO thread pool. The normal thread pool allocates new threads as it feels it needs to in order to stay responsive: it creates threads on demand, without delay, up to the minimum number of threads (by default one per CPU), and beyond that minimum it throttles thread creation and adds threads gradually. See ThreadPool.GetMinThreads for the minimum number of worker threads the worker thread pool will create. ThreadPool.GetAvailableThreads returns the maximum minus the number of currently active worker threads, so the number of "active" worker threads is GetMaxThreads minus GetAvailableThreads. If you have long-running work on worker thread-pool threads, the pool will consider those threads in use and allocate others to service future requests.
There is also a maximum number of threads in the pool, so as threads return to the pool, it may retire some of them to get back down to a number it decides is best.
There is also a finalizer thread.
There are likely others that are undocumented or are a result of a library you're using.
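The thread pool figures mentioned above can be read directly. This sketch assumes .NET Core 3.0 or later for `ThreadPool.ThreadCount`; `PoolStats` and `WorkerSnapshot` are hypothetical names. Comparing `Existing` with `Busy` over time can help tell idle pool threads apart from pool threads that are actually running work.

```csharp
using System.Threading;

public static class PoolStats
{
    // Snapshot of the worker thread pool's limits and current usage.
    public static (int Min, int Max, int Busy, int Existing) WorkerSnapshot()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        ThreadPool.GetAvailableThreads(out int availableWorkers, out _);

        // "Available" is max minus busy, so the busy count has to be derived.
        int busy = maxWorkers - availableWorkers;

        // ThreadCount: pool threads that currently exist, including idle ones.
        return (minWorkers, maxWorkers, busy, ThreadPool.ThreadCount);
    }
}
```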
Update:
I think part of the problem is confusion over "recognized threads" and "physical threads" and "unused threads".
Recognized threads are documented as (emphasis mine)
These threads are associated with a corresponding managed thread object. The runtime does not create these threads, but they have run inside the runtime at least once.
Physical threads are documented as (emphasis mine)
native operating system threads created and owned by the common language runtime to act as underlying threads for managed thread objects
I'm guessing that the term "unused threads" used by @JRoughan refers to "physical threads", i.e. those that aren't "recognized". That doesn't really mean they're unused; they're just not in the recognized counter. As the documentation points out, "physical threads" are created by the runtime, and I don't believe you can tell from either of those counters whether a thread is "used" or "unused", depending on what @JRoughan means by "unused".
Things like this do not have a simple answer. You need to investigate either under a debugger or using ETW traces.
With ETW traces, you can get events for each thread creation/destruction, optionally with call stack.
CLR itself could create threads for itself (e.g. GC threads, background GC threads, multicore JIT thread), thread pool threads, IO threads, timer thread. There is another kind of thread: gate thread.
Normally you can tell usage from the symbolic name of thread proc once symbols are resolved.
For ETW analysis, use PerfView from Microsoft.
Is the application that you are monitoring with Performance Monitor a standalone .NET application or an application hosted in IIS? If it is a standalone application, perhaps you added some extra library/code to use Performance Monitor, which may create threads.
You can use Sysinternals' Process Explorer to watch threads in your process. You can see which method in which module started the threads.
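If you want a quick in-process view instead, `System.Diagnostics.Process.Threads` lists every OS thread of the process, managed or not (GC, finalizer, thread pool, COM, and so on). This is a minimal sketch with hypothetical names (`NativeThreadLister`, `Describe`); unlike Process Explorer, it does not show which module or method started each thread.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class NativeThreadLister
{
    // Lists the OS-level threads of the current process by thread ID.
    public static List<string> Describe()
    {
        var lines = new List<string>();
        using var process = Process.GetCurrentProcess();
        foreach (ProcessThread t in process.Threads)
            lines.Add($"OS thread {t.Id}");
        return lines;
    }
}
```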
We can only speculate, of course. My own bet would be on in-process COM servers. Those, and their associated threads, may be created when you use classes that wrap COM interfaces, such as the ones for directory services or WMI, for example. Since they're created by native code (even though it's wrapped in .NET code), they're not recognized as managed threads.

Does Thread.Yield will let the CPU do context switch to other thread in the same process or same processor?

I see the following in Joseph Albahari's Threading book (http://www.albahari.com/threading/)
Thread.Sleep(0) relinquishes the thread’s current time slice immediately, voluntarily handing over the CPU to other threads.
Framework 4.0’s new Thread.Yield() method does the same thing - except that it relinquishes only to threads running on the same processor.
Does the context switch happen to some other thread within the same process, or to any of the threads that are waiting to get the CPU?
If the answer is the latter, is there any way to do context switch to some other thread that is in wait state in the same process?
I understand that thread scheduling is taken care of by the operating system. But I got stuck with a problem caused by Thread.Sleep(0) and am trying to find a solution for it.
Editing for more clarity about the problem:
The software has two threads (say A and B). A waits up to 20 milliseconds for a signal from B and then proceeds regardless of the signal. A sets the signal and then calls Thread.Sleep(0) to let the processor continue with B, because the software is a time-critical application where every second matters. For about a second, neither A nor B continued before both resumed (we know this from the logs). We think some other process on the same processor got the CPU time slice, and we are now looking for alternatives.
The Thread.Yield method will switch to any thread that is ready to run on the current processor. It doesn't make any distinction about which process that thread exists in.
There is no way to yield to another thread in the same process, even by P/Invoke. Windows simply doesn't support it.
An alternative would be to use some kind of co-operative multitasking, such as TPL and async/await. When you await something, such as the awaitable object returned by Task.Yield(), it enables another task queued with the scheduler to start up. It's also quite a bit more efficient than using Thread.Yield(), but if you're not using it yet this will likely require a large overhaul of your app.
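A minimal sketch of that cooperative style, assuming a recent .NET runtime (`CooperativeYieldDemo` and `SumAsync` are hypothetical names). Every `yieldEvery` items the method awaits `Task.Yield()`, which queues the rest of the method back to the current scheduler or context, so other queued work gets a turn before this one continues. Unlike `Thread.Yield()`, the yielding happens within your own scheduler, not the OS scheduler.

```csharp
using System.Threading.Tasks;

public static class CooperativeYieldDemo
{
    // Sums 1..count, voluntarily yielding every 'yieldEvery' items.
    public static async Task<long> SumAsync(int count, int yieldEvery)
    {
        long sum = 0;
        for (int i = 1; i <= count; i++)
        {
            sum += i;
            if (i % yieldEvery == 0)
                await Task.Yield(); // continuation is re-queued; other work may run first
        }
        return sum;
    }
}
```

Two such tasks run side by side with `Task.WhenAll`; `SumAsync(1000, 100)` and `SumAsync(2000, 100)` return 500500 and 2001000.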
Thread.Yield() will just allow the scheduler to choose another thread within the same process that is ready to run, and resume it at whatever point it was stopped. It has nothing to do with time-slicing among processes, which is a completely different thing. (And rarely of concern unless you're programming the other process(es) as well.)
Note that the Yield() may have no effect at all, if the current thread is the only one able to run. It will just return (relatively immediately) from the Yield() call.
Your question about "context switching to another thread in the same process" is a bit misguided. You shouldn't think in those terms. If you need to wait for another thread to finish, use Join. If you need to signal to another thread that it should stop waiting and do something, there are a variety of mechanisms to use for that.
In short, your problem will get worse if you're trying to "outguess" the thread scheduler.
Perhaps you should be more explicit about the problem you're actually having.
Thread is a wrapper around OS threads, so thread scheduling is performed by the OS kernel, and Yield is just a way to tell the kernel that you want to relinquish the CPU but stay runnable (unblocked). The kernel will treat your request as a good point to reschedule and give the CPU to some other waiting thread. The OS is free to give the CPU to any waiting thread from the run queue, regardless of the process it belongs to. There is no way to influence the scheduler's decision unless it is your own scheduler and you use so-called green threads and cooperative multitasking.
In regard to your problem: you need to use explicit synchronization if you want guaranteed results.
Yielding is the wrong approach because it doesn't give you any guarantees.
There are a bunch of issues that can arise from its use.
For example, thread B may simply not have enough time to finish its work and send the signal to A before A is scheduled again; A can be scheduled onto another CPU core immediately after Yield; A can even be rescheduled before B gets a chance to run at all. Finally, another application can take the CPU. If you really care about timing, raise the priorities of both threads, but synchronize them explicitly.
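A minimal sketch of explicit synchronization for the A/B scenario, assuming a recent .NET runtime. `SignalDemo` and `WaitForWorker` are hypothetical names, and `Thread.Sleep(workDuration)` stands in for B's real work. Instead of calling `Thread.Sleep(0)` and hoping the scheduler runs B next, A blocks on an event with the 20 ms-style timeout and learns from the return value whether B actually signaled.

```csharp
using System;
using System.Threading;

public static class SignalDemo
{
    // A waits up to 'timeout' for B's signal. Returns true if B signaled in time.
    public static bool WaitForWorker(TimeSpan workDuration, TimeSpan timeout)
    {
        using var ready = new ManualResetEventSlim(false);
        var b = new Thread(() =>
        {
            Thread.Sleep(workDuration); // stand-in for B's real work
            ready.Set();                // tell A the result is available
        });
        b.Start();

        bool signaled = ready.Wait(timeout); // A blocks here; false means it timed out
        b.Join(); // don't dispose the event while B may still call Set()
        return signaled;
    }
}
```

With this, A never depends on scheduling luck: it proceeds as soon as B signals, or after the timeout at the latest, and knows which of the two happened.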

When exactly does .NET Monitor go to kernel-mode?

I would like to compile a list of all possible conditions making Monitor go to kernel-mode or use a kernel sync object.
The sync block has a field that references the kernel object. Hence I deduced that lock will go to kernel mode sometimes.
I found this: Lock (Monitor) internal implementation in .NET
But it has too many questions to be answered and the only useful information is that the OP answered his own question by simply stating that the lock will go to the kernel-mode sometimes. Also there aren’t any links to anything to support that answer.
When exactly will lock go to kernel-mode (not if and not why - when)?
I am more interested to hear about .NET 4 and 4.5 if there is any difference with older versions.
From the Richter book: "A sync block contains fields for a kernel object, the owning thread’s ID, a recursion count, and a waiting threads count."
Most of these kind of questions can be answered by looking at the CLR source code as available through the SSCLI20 distribution. It is getting pretty dated by now. It is .NET 2.0 vintage, but a lot of the core CLR features haven't changed much.
The source code file you want to look at is clr/src/vm/syncblk.cpp. Three classes play a role here. AwareLock is the low-level lock implementation that takes care of acquiring the lock, SyncBlock is the class that implements the queue of threads that are waiting to enter a lock, and CLREvent is the wrapper for the operating system synchronization object, the one you are asking about.
This is C++ code and the level of abstraction is quite high. This code heavily interacts with the garbage collector and there's a lot of testing code included. So I'll give a brief description of the process.
SyncBlock has the m_Monitor member that stores the AwareLock instance. SyncBlock::Enter() directly calls AwareLock::Enter(), which first tries to acquire the lock as cheaply as possible. It first checks whether the thread already owns the lock and, if so, just increments the lock count. Next it uses FastInterlockCompareExchange(), an internal function that's very similar to Interlocked.CompareExchange(). If the lock is not contended, this succeeds very quickly and Monitor.Enter() returns. If not, another thread already owns the lock, and AwareLock::EnterEpilog is used. The operating system's thread scheduler needs to get involved, so a CLREvent is used: it is created dynamically if necessary and its WaitOne() method is called, which involves a kernel transition.
So there is enough to answer your question: the Monitor class enters kernel mode when the lock is contended and the thread has to wait.
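The contended path can be observed from managed code. This sketch assumes .NET Core 3.0 or later, where `Monitor.LockContentionCount` reports how many times a thread had to wait for a monitor lock; `ContentionDemo` and `ContendOnce` are hypothetical names, and the 300 ms hold time is arbitrary (just long enough that any initial spinning gives up). Thread A holds the lock while the caller tries to take it, so the caller has to wait and the counter should increase.

```csharp
using System.Diagnostics;
using System.Threading;

public static class ContentionDemo
{
    // Forces one contended lock acquisition. Returns how long the caller waited
    // and how much Monitor.LockContentionCount increased meanwhile.
    public static (long WaitedMs, long ContentionDelta) ContendOnce(int holdMs)
    {
        var gate = new object();
        using var held = new ManualResetEventSlim(false);
        var a = new Thread(() =>
        {
            lock (gate)
            {
                held.Set();           // A now owns the lock
                Thread.Sleep(holdMs); // hold it long enough that spinning gives up
            }
        });
        a.Start();
        held.Wait();

        long before = Monitor.LockContentionCount;
        var sw = Stopwatch.StartNew();
        lock (gate) { }               // contended: has to wait until A exits
        sw.Stop();
        long after = Monitor.LockContentionCount;
        a.Join();
        return (sw.ElapsedMilliseconds, after - before);
    }
}
```

An uncontended `lock` would leave the counter unchanged, matching the fast `FastInterlockCompareExchange` path described above.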
When the lock is heavily contended.
If the lock is lightly contended, there is a quick CPU spinlock to wait for the lock to be free again, but if this doesn't wait long enough for the lock to be freed, the thread will do a blocking wait on the mutex, which involves a kernel-mode call to suspend the thread, along with other such management.
Beyond the spin-wait step, additional intelligence may exist, such as skipping the spin-wait on single-core machines, since there the contested lock can only be released after the waiting thread gives up the CPU.

Internal working of Monitor.Enter() method [duplicate]

To master a technology, you have to know how it is built one abstraction level lower. For multithreaded programming, that means it is good to know about synchronization primitives.
Here is the question: how is lock (Monitor) implemented in .NET?
I'm interested in the following points:
- does it use OS objects?
- does it require user mode or kernel mode?
- what is the overhead for threads that are waiting for the lock?
- in what cases can the queue of threads waiting for the lock be violated?
Updated:
"If more than one thread contends the lock, they are queued on a “ready queue” and granted the lock on a first-come, first-served basis. Note: Nuances in the behavior of Windows and the CLR mean that the fairness of the queue can sometimes be violated." [C# 4.0 in a Nutshell, Joseph Albahari] So this is what I'm asking about in the last question, concerning the 'violated queue'.
The Wikipedia article has a pretty good description of what a "Monitor" is, as well as its underlying technology, the Condition Variable.
Note that the .NET Monitor is a correct implementation of a condition variable; most published Win32 implementations of CVs are incorrect, even ones found in normally reputable sources such as Dr. Dobb's. This is because a CV cannot easily be built from the existing Win32 synchronization primitives.
Instead of just building a shallow (and incorrect) wrapper over the Win32 primitives, the .NET CV implementation takes advantage of the fact that it's on the .NET platform, implementing its own waiting queues, etc.
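For reference, this is the classic condition-variable pattern on top of Monitor, sketched as a bounded FIFO buffer. `BoundedBuffer<T>` is a hypothetical class for illustration (in real code, `BlockingCollection<T>` covers this case). The key points are that `Monitor.Wait` atomically releases the lock while blocking, and that the condition is re-checked in a `while` loop after every wake-up.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class BoundedBuffer<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _sync = new object();
    private readonly int _capacity;

    public BoundedBuffer(int capacity) => _capacity = capacity;

    public void Add(T item)
    {
        lock (_sync)
        {
            // A wake-up does not guarantee there is room, so re-check in a loop.
            while (_items.Count == _capacity)
                Monitor.Wait(_sync);   // releases _sync and blocks until pulsed
            _items.Enqueue(item);
            Monitor.PulseAll(_sync);   // wake any consumer waiting for "not empty"
        }
    }

    public T Take()
    {
        lock (_sync)
        {
            while (_items.Count == 0)
                Monitor.Wait(_sync);
            T item = _items.Dequeue();
            Monitor.PulseAll(_sync);   // wake any producer waiting for "not full"
            return item;
        }
    }
}
```

With one producer and one consumer, items come out in the order they were added, even though the buffer only ever holds `capacity` of them.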
After some investigation I've found the answers to my questions. In general CodeInChaos and Henk Holterman were right, but here are some details.
When a thread starts contending for a lock with other threads, it first runs a spin-wait loop for a while, trying to obtain the lock. All of this happens in user mode. If that doesn't succeed, an OS kernel Event object is created, and the thread switches to kernel mode and waits for a signal from this Event.
So the answers to my questions are:
1. In the best case no, but in the worst case yes (the Event object is created lazily, only if required);
2. In general it works in user mode, but if threads compete for a lock for too long, a thread can be switched to kernel mode (via an unmanaged Win API function call);
3. The overhead of switching from user mode to kernel mode (~1000 CPU cycles);
4. Microsoft claims it is a "fair" algorithm, like FIFO, but doesn't guarantee this. (E.g., if a thread in the 'waiting queue' is suspended, it moves to the end of the queue when it is resumed.)
