What is a multithreading program and how does it work exactly? I read some documents but I'm confused. I know that code is executed line by line, but I can't understand how the program manages this.
A simple answer would be appreciated. A C# example, please (only animation!)
What is a multi-threading program and how does it work exactly?
The interesting thing about this question is that entire books have been written on the topic, yet it remains elusive to a lot of people. I will try to explain it in the order outlined below.
Please note that this is just a gist; an answer like this can never do justice to the depth and detail the topic requires. As for videos, the best I have come across are part of paid subscriptions (Wintellect and Pluralsight). Check whether you can watch them on a trial basis if you don't already have a subscription:
Wintellect, by Jeffrey Richter (his book, CLR via C#, has a chapter on the same material, Thread Fundamentals)
CLR Threading by Mike Woodring
Explanation Order
What is a thread ?
Why were threads introduced, main purpose ?
Pitfalls and how to avoid them, using Synchronization constructs ?
Thread Vs ThreadPool ?
Evolution of Multi threaded programming API, like Parallel API, Task API
Concurrent Collections, usage ?
Async-Await, thread but no thread, why they are best for IO
What is a thread ?
A thread is a software abstraction provided by the operating system (Windows in particular has a multi-threaded architecture); it is the bare minimum unit of work that gets scheduled. Every process on Windows has at least one thread, and every method call is executed on some thread. A process can have multiple threads to do multiple things in parallel (provided the hardware supports it).
Unix-based operating systems have traditionally favored a multi-process architecture, whereas on Windows even very complex software such as Oracle.exe runs as a single process with multiple threads for its critical background operations.
Why were threads introduced, main purpose ?
Contrary to the perception that concurrency was the main purpose, it was robustness that led to the introduction of threads. Imagine every process on Windows running on the same thread (as in the initial 16-bit version): if one of those processes crashed, in most cases the only way to recover was a system restart. Using threads for concurrent operations, since each process can start several of them, came into the picture later. Today it is also important for using a multi-core processor to its full ability.
Pitfalls and how to avoid them, using Synchronization constructs ?
More threads means more work completed concurrently, but problems arise when the same memory is accessed by several threads, especially for writes, as that is when it can lead to:
Memory corruption
Race condition
Another issue is that a thread is a very costly resource: each thread has a thread environment block and a kernel memory allocation. Scheduling threads on a processor core also costs time for context switching. Misuse can easily cause a huge performance penalty instead of an improvement.
To avoid thread-related corruption issues, it's important to use synchronization constructs such as lock, mutex, or semaphore, depending on the requirement. Reads are thread safe as long as nothing writes to the same data at the same time, but writes always need appropriate synchronization.
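As a rough illustration of the lock construct mentioned above, here is a minimal sketch. The class and member names (LockedCounterDemo, Gate, Run) are made up for this example; the point is only that every write to the shared counter happens inside the lock, so no increments are lost.

```csharp
using System;
using System.Threading;

public static class LockedCounterDemo
{
    // Hypothetical names: a private lock object and the shared state it guards.
    private static readonly object Gate = new object();
    private static int _counter;

    // Starts threadCount threads that each increment the shared counter perThread times.
    public static int Run(int threadCount, int perThread)
    {
        _counter = 0;
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < perThread; n++)
                {
                    // Only one thread at a time may execute this block.
                    lock (Gate)
                    {
                        _counter++;
                    }
                }
            });
            threads[i].Start();
        }

        // Wait for all workers before reading the result.
        foreach (Thread t in threads) t.Join();
        return _counter;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 100000)); // prints 400000
    }
}
```

Without the lock, the increments from different threads can interleave and the final value may come out lower than expected.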
Thread Vs ThreadPool ?
The threads we use in C#/.NET are not the real threads; they are just managed wrappers around Win32 threads. The challenge is that users can grossly misuse them, for example by creating far more threads than required or by assigning processor affinity. So isn't it better to queue a work item to a standard pool and let the system decide when a new thread is required and when an already existing thread can pick up the work item? A thread is a costly resource whose usage needs to be optimized, otherwise it can be a bane rather than a boon.
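To show what "queue the work item to a pool" looks like in practice, here is a small sketch using ThreadPool.QueueUserWorkItem. The names ThreadPoolDemo and Run are invented for the example, and the work items do nothing useful; the pool, not the caller, decides which thread runs each one.

```csharp
using System;
using System.Threading;

public static class ThreadPoolDemo
{
    // Queues workItems small jobs to the shared pool and waits until all have run.
    public static int Run(int workItems)
    {
        int completed = 0;
        using (var done = new CountdownEvent(workItems))
        {
            for (int i = 0; i < workItems; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    // Placeholder for real work; just record that this item ran.
                    Interlocked.Increment(ref completed);
                    done.Signal();
                });
            }

            // Blocks until every work item has signalled.
            done.Wait();
        }
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine(Run(20)); // prints 20
    }
}
```

Note that no Thread object is created here at all; twenty work items may well be served by just a handful of pool threads.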
Evolution of Multi threaded programming, like Parallel API, Task API
From .NET 4.0 onward, a variety of new APIs, Parallel.For and Parallel.ForEach for data parallelization and Task for task parallelization, have made it very simple to introduce concurrency into a system. These APIs again use a thread pool internally. A Task is more like scheduling a piece of work for some time in the future. Introducing concurrency is now a breeze, though synchronization constructs or thread-safe collections are still required to avoid memory corruption and race conditions.
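A quick sketch of both styles, assuming a trivial workload (summing squares) chosen purely for illustration; ParallelDemo, SumOfSquares and SumOfSquaresAsync are made-up names. Note that the shared total still needs synchronization, here via Interlocked.Add.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // Data parallelism: iterations of the loop are spread over pool threads.
    public static long SumOfSquares(int count)
    {
        long total = 0;
        Parallel.For(0, count, i =>
        {
            // total is shared by all iterations, so update it atomically.
            Interlocked.Add(ref total, (long)i * i);
        });
        return total;
    }

    // Task parallelism: schedule the same work to run on the pool and get a handle to its result.
    public static Task<long> SumOfSquaresAsync(int count)
    {
        return Task.Run(() => SumOfSquares(count));
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000));             // prints 332833500
        Console.WriteLine(SumOfSquaresAsync(1000).Result); // prints 332833500
    }
}
```

For a loop body this cheap the atomic update dominates, so treat it as a demonstration of the API rather than a performance recipe.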
Concurrent Collections, usage ?
Implementations like ConcurrentBag, ConcurrentQueue, and ConcurrentDictionary, part of System.Collections.Concurrent, are inherently thread safe, relying on lightweight internal techniques such as spin-waits, and are much easier and usually quicker to use than explicit synchronization. There is another set of APIs, such as ImmutableList in System.Collections.Immutable (available via NuGet), which are thread safe because every modification returns a new instance instead of changing the existing one.
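For example, a word count written against ConcurrentDictionary needs no explicit lock. This is only a sketch; WordCountDemo and Count are hypothetical names, and the input is a toy array.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WordCountDemo
{
    // Counts occurrences of each word, with the words processed in parallel.
    public static ConcurrentDictionary<string, int> Count(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            // Atomically inserts 1 for a new word or increments the existing count.
            counts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });
        return counts;
    }

    public static void Main()
    {
        var counts = Count(new[] { "a", "b", "a", "c", "a", "b" });
        Console.WriteLine(counts["a"]); // prints 3
        Console.WriteLine(counts["b"]); // prints 2
        Console.WriteLine(counts["c"]); // prints 1
    }
}
```

With a plain Dictionary the same code could corrupt the collection, because two threads might modify it at the same time.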
Async-Await, thread but no thread, why they are best for IO
This is an important aspect of concurrency meant for IO calls (disk, network). The other APIs discussed so far are meant for compute-based concurrency, where threads matter and make things faster. For IO calls a thread has no use except waiting for the call to return; the IO itself is processed through a hardware-based queue, IO Completion Ports.
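A minimal sketch of the idea, with Task.Delay standing in for a real disk or network call (that substitution is an assumption made to keep the example self-contained; FakeDownloadAsync and DownloadAllAsync are invented names). While the two "downloads" are pending, no thread sits blocked waiting for them.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncIoDemo
{
    // Stand-in for an IO call: the await releases the thread until the delay completes.
    public static async Task<string> FakeDownloadAsync(string name, int millis)
    {
        await Task.Delay(millis);
        return name + " done";
    }

    // Starts both operations first, then awaits them together so they overlap.
    public static async Task<string[]> DownloadAllAsync()
    {
        Task<string> a = FakeDownloadAsync("a", 200);
        Task<string> b = FakeDownloadAsync("b", 200);
        return await Task.WhenAll(a, b);
    }

    public static void Main()
    {
        string[] results = DownloadAllAsync().GetAwaiter().GetResult();
        Console.WriteLine(string.Join(", ", results)); // prints "a done, b done"
    }
}
```

Both operations are in flight at once, so the total wait is roughly one delay rather than two.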
A simple analogy might be found in the kitchen.
You've probably cooked using a recipe before -- start with the specified ingredients, follow the steps indicated in the recipe, and at the end you (hopefully) have a delicious dish ready to eat. If you do that, then you have executed a traditional (non-multithreaded) program.
But what if you have to cook a full meal, which includes a number of different dishes? The simple way to do it would be to start with the first recipe, do everything the recipe says, and when it's done, put the finished dish (and the first recipe) aside, then start on the second recipe, do everything it says, put the second dish (and second recipe) aside, and so on until you've gone through all of the recipes one after another. That will work, but you might end up spending 10 hours in the kitchen, and of course by the time the last dish is ready to eat, the first dish might be cold and unappetizing.
So instead you'd probably do what most chefs do, which is to start working on several recipes at the same time. For example, you might put the roast in the oven for 45 minutes, but instead of sitting in front of the oven waiting 45 minutes for the roast to cook, you'd spend the 45 minutes chopping the vegetables. When the oven timer rings, you put down your vegetable knife, pull the cooked roast out of the oven and let it cool, then go back to chopping vegetables, and so on. If you can do that, then you are successfully multitasking several recipes/programs. That is, you aren't literally working on multiple recipes at once (you still have only two hands!), but you are jumping back and forth from following one recipe to following another whenever necessary, and thereby making progress on several tasks rather than twiddling your thumbs a lot. Do this well and you can have the whole meal ready to eat in a much shorter amount of time, and everything will be hot and fresh at about the same time too. If you do this, you are executing a simple multithreaded program.
Then if you wanted to get really fancy, you might hire a few other chefs to work in the kitchen at the same time as you, so that you can get even more food prepared in a given amount of time. If you do this, your team is doing multiprocessing, with each chef taking one part of the total work and all of them working simultaneously. Note that each chef may well be working on multiple recipes (i.e. multitasking) as described in the previous paragraph.
As for how a computer does this sort of thing (no more analogies about chefs), it usually implements it using a list of ready-to-run threads and a timer. When the timer goes off (or when the thread that is currently executing has nothing to do for a while, because e.g. it is waiting to load data from a slow hard drive or something), the operating system does a context switch, in which it pauses the current thread (by putting it into a list somewhere and no longer executing instructions from that thread's code), then pulls another ready-to-run thread from the list of ready-to-run threads and starts executing instructions from that thread's code instead. This repeats for as long as necessary, often with context switches happening every few milliseconds, giving the illusion that multiple programs are running "at the same time" even on a single-core CPU. (On a multi-core CPU it does this same thing on each core, and in that case it's no longer just an illusion; multiple programs really are running at the same time.)
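To make the ready-list-and-timer idea a bit more concrete, here is a toy simulation. It is not how a real OS scheduler is written; the names (RoundRobinDemo, Schedule, quantum) and the notion of "work units" are invented for illustration. Each simulated thread runs for at most one time slice and then goes back to the end of the ready list if it still has work left.

```csharp
using System;
using System.Collections.Generic;

public static class RoundRobinDemo
{
    // names[i] needs work[i] units of CPU; quantum is the length of one time slice.
    // Returns the order in which the simulated threads got the CPU, as "name:units".
    public static List<string> Schedule(string[] names, int[] work, int quantum)
    {
        var remaining = (int[])work.Clone();
        var ready = new Queue<int>();
        for (int i = 0; i < names.Length; i++) ready.Enqueue(i);

        var timeline = new List<string>();
        while (ready.Count > 0)
        {
            // "Context switch": take the next ready thread off the list.
            int current = ready.Dequeue();
            int slice = Math.Min(quantum, remaining[current]);
            remaining[current] -= slice;
            timeline.Add(names[current] + ":" + slice);

            // The "timer" fired before the thread finished, so it goes back in line.
            if (remaining[current] > 0) ready.Enqueue(current);
        }
        return timeline;
    }

    public static void Main()
    {
        var timeline = Schedule(new[] { "A", "B" }, new[] { 3, 1 }, 2);
        Console.WriteLine(string.Join(" ", timeline)); // prints "A:2 B:1 A:1"
    }
}
```

Thread A is interrupted after two units so that B gets a turn, then A resumes where it left off.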
Why don't you refer to Microsoft's very own documentation of the .net class System.Threading.Thread?
It has a handful of simple example programs written in C# (at the bottom of the page), just as you asked for:
Thread Examples
Multithreading actually means doing multiple pieces of work at the same time, so they can complete in parallel. You can take a task off your main thread, execute it some other way, and be done.
Consider having an application which creates 30 app-domains, then runs them (each app-domain in its own thread) and when each of these app-domains finishes running (aka its thread exits and so on) we need to cleanup by running for each appdomain some custom cleanup-logic + a call for unloading the appdomain itself.
The cleanup logic + appdomain-unloading call for each of these app-domains might require more than one attempt to succeed (due to resources being involved, taking time to be released by the system and so on). If the cleanup operation cannot be performed in a specific attempt, it doesn't take more than 100ms for us to know about this and move on.
What is the best practice, in the world of C#, to perform such cleanup in a 'perform-cleanup-in-the-background' fashion? Possible avenues off the top of my head:
Each app-domain cleanup should be performed in its very own 'new Thread()' corner. Each thread persists in a while loop with a sleep interval in case it needs to retry.
Have just a single, dedicated long-running thread with a task-queue in which we submit each and every appdomain to be cleaned-up and unloaded (again in a persistent fashion like in method #1 above).
Using a thread-pool and submitting the cleanup-tasks there
According to the following comment:
https://stackoverflow.com/a/28651533/863651
"If a method cannot be expected to exit within 100ms or so of when it starts execution, the method should be executed via some means other than the main thread pool.[ ... ] If, however, a method will take a second or longer to execute, and will spend most of its time blocked, the method should likely be run in a dedicated thread, and should almost certainly not be run in a main-threadpool thread."
I guess this discourages using method #3 above. I'm wondering if method #2 has any considerable advantages over method #1. The main thing that bugs me is that even though method #2 needs slightly more coding, it uses just 1 thread no matter what, while method #1 will require N threads for N app-domains (with all the cost that this entails in terms of spawning the threads etc).
I'm open to suggestions about any method#4+ that there might be to implement such mechanism. I'm just curious to see how other programmers apply the concept of "best threading-practices" when it comes to such a problem.
Thanks in advance.
P.S.: This application is meant to be run in contemporary desktop computers (at the time of this writing).
Presuming you know what will need to be cleaned up at some point when you're managing the AppDomains, you can attach cleanup logic to the DomainUnload event like:
appDomain.DomainUnload +=
    (sender, args) =>
    {
        // This logic needs to be specific to each AppDomain.
        // You can consider using a class like BackgroundWorker to do the work.
    };
For the single vs. multi-thread question, as long as you make your code thread safe, my take is that the multi-threaded option would both reduce the lines of code (always a goal of mine) and optimize performance, especially if the cleanup runs on background threads. Tens of threads are really a non-issue as far as tapping out system resources goes (unless the cleanup operations may exhaust memory). The main drawback is that multi-threaded programs can sometimes be trickier to debug, but I bet you'll be okay in this case.
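For comparison, the single-dedicated-thread variant from the question (method #2) could be sketched roughly like this. Everything here is hypothetical: CleanupWorker, Submit and the Func<bool> "try once, return true on success" contract are assumptions made for the example, and the real cleanup and AppDomain.Unload call would go inside the submitted delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// One long-running background thread that retries each submitted cleanup until it succeeds.
public sealed class CleanupWorker : IDisposable
{
    private readonly BlockingCollection<Func<bool>> _queue = new BlockingCollection<Func<bool>>();
    private readonly Thread _thread;
    private readonly int _retryDelayMs;

    public CleanupWorker(int retryDelayMs)
    {
        _retryDelayMs = retryDelayMs;
        _thread = new Thread(Loop) { IsBackground = true, Name = "cleanup-worker" };
        _thread.Start();
    }

    // tryCleanup returns true when the cleanup succeeded, false when it should be retried.
    public void Submit(Func<bool> tryCleanup) => _queue.Add(tryCleanup);

    private void Loop()
    {
        // Blocks while the queue is empty; ends once CompleteAdding was called and the queue drained.
        foreach (Func<bool> tryCleanup in _queue.GetConsumingEnumerable())
        {
            while (!tryCleanup())
            {
                Thread.Sleep(_retryDelayMs);
            }
        }
    }

    // Stops accepting work and waits for the queued cleanups to finish.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();
        _queue.Dispose();
    }
}

public static class CleanupDemo
{
    public static int Run()
    {
        int attempts = 0;
        using (var worker = new CleanupWorker(10))
        {
            // Succeeds on the third attempt, mimicking a resource that is slow to be released.
            worker.Submit(() => Interlocked.Increment(ref attempts) >= 3);
        }
        return attempts;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 3
    }
}
```

One trade-off of this simple version is that a stubborn cleanup holds up everything queued behind it; with one thread per app-domain that cannot happen, which is part of why the multi-threaded option is attractive here.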
I'm working on an application that processes pipelines in separate threads. During my tests I have seen that if a process is "lightweight", or the CLR determines that it is going to end quickly, the CLR recycles the thread rapidly, and various units of work can end up sharing the same thread.
On the contrary, if a process takes some time or carries more load, the CLR opens different threads.
To me, all of that makes TLS (thread-local storage) programming difficult.
In fact, my application's pipelines take some time to process, and it seems that the CLR always assigns one managed thread to each of them. BTW, if in some case two pipelines share one managed thread, they will collide, because they use TLS variables.
After all that, here comes the real question... Can I make the assumption that if a process takes some time/load it will always use its own thread, or am I crazy doing that?
From what I have been reading, working with managed threads in .NET 3.5 is like dealing with a kind of black box. So perhaps this question can never really be answered.
EDIT:
By process I am referring to the dictionary definition, "a series of actions, changes, or functions bringing about a result", and not the computer process you see in Task Manager.
Can I make the assumption that if a process takes some time/load it will
always use its own thread, or am I crazy doing that?
A process always uses its own threads. It's not possible to access another process's threads, at least not that I'm aware of.
Code run from a threadpool thread should not place anything in thread-local storage which it is not going to remove via finally block. If you need to ensure that any thread-local storage used by a piece of code will die after that code finishes executing, you need to explicitly either clean up the storage or run that code in its own thread.
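A small sketch of the "remove it in a finally block" advice, using a [ThreadStatic] field as the thread-local slot. The names (TlsCleanupDemo, _currentPipeline, Run) are invented, and the "pipeline" is just a string here.

```csharp
using System;
using System.Threading;

public static class TlsCleanupDemo
{
    // Hypothetical per-thread slot; each thread sees its own copy of this field.
    [ThreadStatic]
    private static string _currentPipeline;

    // Runs one unit of work on a pool thread and reports what the slot held
    // during the work and after the finally block cleaned it up.
    public static string[] Run(string pipelineName)
    {
        var observed = new string[2];
        using (var done = new ManualResetEventSlim(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    _currentPipeline = pipelineName;
                    observed[0] = _currentPipeline;
                }
                finally
                {
                    // Leave nothing behind for the next work item that reuses this pool thread.
                    _currentPipeline = null;
                    observed[1] = _currentPipeline ?? "(empty)";
                    done.Set();
                }
            });
            done.Wait();
        }
        return observed;
    }

    public static void Main()
    {
        string[] observed = Run("pipeline-1");
        Console.WriteLine(observed[0]); // prints "pipeline-1"
        Console.WriteLine(observed[1]); // prints "(empty)"
    }
}
```

If the cleanup were skipped, a later work item scheduled onto the same pool thread could see the stale value, which is exactly the collision described in the question.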
I am currently trying to monitor the performance of my project. When I enumerated all physical threads (Process.GetCurrentProcess().Threads), the sum of the total time they spent on the processor was much, much lower than the total processor time of the process itself. The number of threads was stable, and there were almost no threads that I could have missed (maybe some from before I opened the monitoring window). Why is that?
I also had a problem with an InvalidOperationException saying the thread had already exited (when I read TotalProcessorTime). However, when I looked at the thread's state, it was Waiting. How can I avoid the exception?
Thanks
There are a number of threads created and destroyed by the .NET Framework and the operating system which you have no control over.
The garbage collector, for example, can use multiple threads. When you call into Win32 APIs (the .NET Framework may do this for you), these can also fire off short-lived threads.
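As for the exception: since threads can come and go between enumerating them and reading their counters, one defensive option is to simply skip any thread whose counters can no longer be read. This is only a sketch under that assumption (ThreadCpuDemo and SumThreadCpuTime are made-up names), and the exact exception types may vary by platform, which is why a few are caught.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

public static class ThreadCpuDemo
{
    // Sums TotalProcessorTime over the threads that are still readable;
    // skipped reports how many threads could not be read (e.g. already exited).
    public static TimeSpan SumThreadCpuTime(out int skipped)
    {
        skipped = 0;
        TimeSpan total = TimeSpan.Zero;
        using (Process process = Process.GetCurrentProcess())
        {
            foreach (ProcessThread thread in process.Threads)
            {
                try
                {
                    total += thread.TotalProcessorTime;
                }
                catch (Exception ex) when (ex is InvalidOperationException
                                        || ex is Win32Exception
                                        || ex is NotSupportedException)
                {
                    // The thread exited (or its counters are unavailable) after enumeration.
                    skipped++;
                }
            }
        }
        return total;
    }

    public static void Main()
    {
        int skipped;
        TimeSpan total = SumThreadCpuTime(out skipped);
        Console.WriteLine("thread CPU time: " + total + ", skipped threads: " + skipped);
    }
}
```

Even with this in place, the sum will likely stay below the process total, because the time used by short-lived threads that have already exited is no longer attributable to any thread in the list.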
When several threads are running the same piece of code, how does the CLR manage to keep them from overstepping each other? Is it the AppDomain that manages these threads and defines boundaries between them, even though they might be acting on the same code (and possibly data)? If so, how?
TIA
Simple; for method variables (excluding captured variables, iterator blocks, etc), the variables are on the stack. Each thread has a different stack. This is no different to a recursive method on a single thread - the method variables are separate and independent per call.
For objects on the heap... it doesn't! No boundaries; no protection. If you don't correctly synchronize etc., you will corrupt your data.
In short, this is your job.
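A tiny sketch of the first point: several threads run the very same method, yet its locals never interfere, because each call gets its own stack frame on its own thread's stack. The names (StackLocalsDemo, SumTo, Run) are invented for the example. The results array does live on the heap, but each thread writes only to its own slot, so nothing is shared between writers.

```csharp
using System;
using System.Threading;

public static class StackLocalsDemo
{
    // Every thread runs this same code, but sum and i are separate per call.
    private static int SumTo(int n)
    {
        int sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public static int[] Run(int[] inputs)
    {
        var results = new int[inputs.Length];
        var threads = new Thread[inputs.Length];
        for (int t = 0; t < inputs.Length; t++)
        {
            // Copy the loop variable so each lambda captures its own value
            // (captured variables are one of the exceptions noted above).
            int slot = t;
            threads[t] = new Thread(() => results[slot] = SumTo(inputs[slot]));
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return results;
    }

    public static void Main()
    {
        int[] results = Run(new[] { 10, 100, 1000 });
        Console.WriteLine(string.Join(", ", results)); // prints "55, 5050, 500500"
    }
}
```

Had all threads accumulated into one shared field instead of a local, the result would depend on timing unless the updates were synchronized.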
It is an operating system implementation detail. Windows maintains the processor context for each thread. That context contains a copy of the state of the processor registers. The really important ones for your question are EIP, the instruction pointer, and ESP, the stack pointer. The instruction pointer keeps track of the machine code instructions that are executed by the thread. The stack pointer keeps track of the activation frame of the currently executing method. Every thread has its own stack.
Since each thread has its own instruction pointer, they can each execute their own code, independent of other threads. Having their own stack ensures that threads cannot stomp on each other's local variables. Your machine has hundreds of threads running at the same time. They take turns executing code for a while on an available CPU core. It's the operating system's job to make that work: it saves the processor state in the thread context whenever a thread has been running for a while, or blocks, and it is time for another thread to get a turn. Resuming that thread simply involves copying the state back from the saved context to the processor, and it continues where it left off when it was interrupted.
Threading gets tricky once threads start to access memory that's shared by all threads. In a .NET program, that's anything that's stored on the garbage collected heap as well as any static variables. Having one thread that writes such memory and other threads reading the same memory needs to be orchestrated. The lock keyword is one of the primary ways to do this.
The relevance of an AppDomain is that each one has its own garbage collected heap and 'loader heap' (the place where static variable values are stored). Which prevents threads from stomping on each other completely. It is quite equivalent to a process, without the associated operating system cost of a process. Which is quite high on Windows. AppDomains are important on custom CLR hosts, like ASP.NET and SQL Server. They help isolating client requests so that, say, one web page request that bombs with an unhandled exception cannot also corrupt the state of all other requests.