I have read on MSDN that the TPL (Task Parallel Library) is not guaranteed to run the logic/code in parallel. So my question is: under what circumstances will the code run sequentially?
When the code is deployed on a single-core processor? Or when .NET Framework thread pool starvation happens? Or when the 'hardware threads'/'logical cores' are so busy that only one 'hardware thread'/'logical core' can be allocated at that time?
It is decided by the TaskScheduler set in the ParallelOptions passed to the TPL methods. This lets you easily replace the TaskScheduler with a custom one that implements whatever parallelization plan you want.
The default scheduler that the TPL and PLINQ use is the ThreadPool. It will start by using one thread, then add more threads as its algorithm detects that more threads would be useful (however, if your task is not CPU bound, the algorithm can make some incorrect assumptions and cause you problems).
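To make the role of ParallelOptions concrete, here is a minimal sketch (the class and method names are made up for illustration). It measures how many loop bodies run at the same time, assuming that capping MaxDegreeOfParallelism at 1 is an acceptable way to force sequential execution for the demonstration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerDemo
{
    // Runs a Parallel.For under the given options and returns the highest
    // number of loop bodies that were observed running at the same time.
    public static int MaxObservedConcurrency(ParallelOptions options)
    {
        int running = 0;
        int maxSeen = 0;
        Parallel.For(0, 200, options, i =>
        {
            int now = Interlocked.Increment(ref running);
            int seen;
            do
            {
                seen = Volatile.Read(ref maxSeen);
            }
            while (now > seen &&
                   Interlocked.CompareExchange(ref maxSeen, now, seen) != seen);

            Thread.SpinWait(50000); // simulate a little CPU-bound work
            Interlocked.Decrement(ref running);
        });
        return maxSeen;
    }

    public static void Main()
    {
        var sequential = new ParallelOptions { MaxDegreeOfParallelism = 1 };
        var defaults = new ParallelOptions { TaskScheduler = TaskScheduler.Default };

        Console.WriteLine("MaxDegreeOfParallelism = 1: " + MaxObservedConcurrency(sequential));
        Console.WriteLine("Default options:            " + MaxObservedConcurrency(defaults));
    }
}
```

With the cap set to 1 the loop never overlaps two bodies. With the defaults the number depends on the machine and on what the thread pool decides at that moment, which is exactly why parallel execution is not guaranteed.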
I highly recommend you read the free book Patterns for Parallel Programming; it goes into some detail about this. However, the best book that I have read that goes into a lot of detail about how the task scheduler works is Professional Parallel Programming with C# (Chapter 8 is all about Thread Pools).
I also recommend you download the package Samples for Parallel Programming with the .NET Framework; it has a whole bunch of well-commented projects inside it that help explain a lot of concepts of parallel programming.
Related
I am looking to clarify my understanding of .NET multithreading, and in particular, which .NET methods create threads which may potentially execute at the same time on different processors or cores in a multi-processor/core system.
In the .NET TPL framework you can use the methods Parallel.Invoke, or Task.Factory.StartNew to achieve some kind of parallelism.
My understanding is that in both cases .NET creates new Tasks (behind the scenes for Parallel.Invoke), which the .NET runtime then assigns to managed threads, which in turn map to OS threads that may be allocated to different cores or processors depending on the workload. The main difference between the two methods is semantics: Parallel.Invoke executes multiple tasks and waits for them to complete; Task.Factory.StartNew starts a new task in the background. In both cases, the actual work may be done on different cores or processors, as per the Task Parallel Library (TPL) documentation.
I have a colleague who is convinced that only the Parallel.Invoke method allows the threads to be executed on different cores/processors, and that Task.Factory.StartNew starts a new thread but that thread will only be scheduled on one core/processor - so doesn't actually give parallelism.
I can't find any documentation or articles which explicitly state whether this is the case or not. My colleague refers me to the same articles that I am looking at, such as Task-based Asynchronous Programming, which I think validates my understanding but my colleague thinks validates his.
The documentation sometimes uses the term "parallel processing" with reference to Parallel.Invoke and "asynchronous tasks" with reference to "Task.Factory.StartNew", but as far as I understand the same thing happens in the background with regards to allocation to multiple processors/cores.
Can anyone please help to clarify the situation, if possible with links to documentation/articles.
I know this sounds like seeking resolution to an argument with a colleague, but I am genuinely looking to clarify whether or not I understand this correctly.
It's actually pretty easy to answer.
Task.Run()
Queues the specified work to run on the ThreadPool ....
Task Parallel Library
... In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, ....
Given that both use the same ThreadPool, how could the ThreadPool possibly determine the type of task in order to limit it to one CPU? Either they both run on all processors or they both run on a single processor.
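A quick way to check this for yourself is to ask each task what kind of thread it is running on. This is only a sketch (the PoolCheck class is made up for illustration), and it passes TaskScheduler.Default to StartNew explicitly so the result does not depend on whatever scheduler happens to be current:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PoolCheck
{
    // True if the delegate given to Task.Run executed on a thread pool thread.
    public static bool TaskRunUsesPool()
    {
        return Task.Run(() => Thread.CurrentThread.IsThreadPoolThread).Result;
    }

    // Same check for Task.Factory.StartNew with the default scheduler.
    public static bool StartNewUsesPool()
    {
        return Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default).Result;
    }

    public static void Main()
    {
        Console.WriteLine("Task.Run on pool thread: " + TaskRunUsesPool());
        Console.WriteLine("StartNew on pool thread: " + StartNewUsesPool());

        // Parallel.Invoke may run one of its actions on the calling thread,
        // so this only prints the thread IDs instead of asserting anything.
        Parallel.Invoke(
            () => Console.WriteLine("Invoke A on thread " + Thread.CurrentThread.ManagedThreadId),
            () => Console.WriteLine("Invoke B on thread " + Thread.CurrentThread.ManagedThreadId));
    }
}
```

Both kinds of task end up on ordinary thread pool threads, and nothing in that path pins them to a particular core.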
Extra Credit:
This raises the question: is the ThreadPool multi-core aware?
The answer is, surprisingly, that it doesn't care. The ThreadPool asks the operating system for a thread (just like any C# application that uses new Thread()); where that thread runs is actually the responsibility of the OS. I think it should be pretty clear by now that, with all this abstraction, even suggesting that C# can by default limit how threads are used is a pretty ridiculous assertion. (Yes, you can run a thread on whatever core you want, etc., but that is not how the ThreadPool works by default.)
I highly recommend reading StartNew is Dangerous... TLDR? Use Task.Run().
Although operating systems sometimes provide for "processor affinity," this is an edge-case and its use (or availability) is quite rare. So far as I am aware, .NET does not make any use of such things.
Your foundation assumption must always be: "a runnable thread/process will run where it damn well pleases," and it might switch from one CPU resource to another at any time. The .NET framework makes things a whole lot "nicer" for you in a lot of ways, but the underlying scheduling decisions are still being made – exclusively – by the host operating system.
I have a quad core PC.
I had considered programmatically utilising multi-core processing using the Task Parallel Library. However, when I Googled for examples I was informed that the CPU will handle this automatically and that it is best to leave it alone.
Now, I find another article singing the praises of this library on Code Project.
Is there any advantage to using this library?
thanks
Unless your application is actively taking advantage of parallel processing, neither the OS nor the CPU will do this for you automatically. The OS and CPU may switch execution of your application between multiple cores, but that does not make it execute simultaneously on the different cores. For that you need to make your application capable of executing at least parts in parallel.
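As a rough illustration of what "making parts execute in parallel" means, the sketch below computes the same sum twice: once in a plain loop, which only ever uses one core at a time, and once with Parallel.For, which lets the TPL spread iterations over thread pool threads. The class and method names are made up for the example:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SumOfSquares
{
    // Plain loop: runs on the calling thread only.
    public static long Sequential(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += (long)i * i;
        }
        return total;
    }

    // Parallel.For with a per-thread subtotal that is merged at the end.
    public static long Parallel_(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                      // initial subtotal per worker
            (i, state, local) => local + (long)i * i,      // accumulate without locking
            local => Interlocked.Add(ref total, local));   // merge each subtotal once
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sequential(1000));
        Console.WriteLine(Parallel_(1000));
    }
}
```

Both methods return the same value. Whether the parallel version is actually faster depends on how much work each iteration does; for a loop this small the overhead will usually dominate.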
According to MSDN Parallel Processing and Concurrency in the .NET Framework there are basically three ways to do parallel processing in .NET:
Managed threading where you handle the threads and their synchronization yourself.
Various asynchronous programming patterns.
Parallel Programming in the .NET Framework of which both the Task Parallel Library and PLINQ are a part.
Reasons for using the TPL include that, according to the MSDN article, it and the accompanying tools
simplify parallel development so that you can write efficient, fine-grained, and scalable parallel code in a natural idiom without having to work directly with threads or the thread pool.
Threads vs. Tasks has some help for deciding between threads and the TPL
with the conclusion:
The bottom line is that Task is almost always the best option; it provides a much more powerful API and avoids wasting OS threads.
The only reasons to explicitly create your own Threads in modern code are setting per-thread options, or maintaining a persistent thread that needs to maintain its own identity.
The Task Parallel Library does its work through task schedulers. You can configure which scheduler the TPL uses, and you can write a custom task scheduler that, for example, creates one thread per task. This gives you a configuration advantage over managing your threads yourself, similar to the advantage of using a Dependency Injection framework over DIY DI.
And there are already many SO entries for difference between task and Thread
Task vs Thread
Multithreading or TPL
I'm writing a web scraper in C#. I implemented it using Thread objects that take work from a pool of tasks and process all those elements against a callback and store the result. Pool of tasks means 5K - 50K input urls.
Is there any core framework object that can deal with something like this? I tried to see if the ThreadPool can be instantiated, but it can't. I'm also very unsure whether such a high number of tasks can/should be queued into the default ThreadPool.
So, is there anything already available in the core for creating a large number of tasks and a number of threads and have those threads process those tasks? Or should I just stick to my own. I've already reinvented a few wheels since I've been using C#.
Yes, there is.
http://msdn.microsoft.com/en-us/library/dd460717%28v=vs.110%29.aspx
Task Parallel Library (TPL)
The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.
Tasks are queued onto the Thread pool by default using a TaskScheduler. The scheduler works to promote efficient use of available threads and processors.
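For the scraper scenario this could look roughly like the sketch below. ProcessUrl is a hypothetical stand-in for the real download-and-parse callback, and the cap of 8 concurrent items is an arbitrary assumption, not a recommendation:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Scraper
{
    // Hypothetical placeholder for the real per-URL work.
    private static int ProcessUrl(string url)
    {
        return url.Length;
    }

    // Processes every URL on the thread pool, with at most maxParallel
    // items in flight, and collects the results in a thread-safe dictionary.
    public static ConcurrentDictionary<string, int> ProcessAll(IEnumerable<string> urls, int maxParallel)
    {
        var results = new ConcurrentDictionary<string, int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(urls, options, url =>
        {
            results[url] = ProcessUrl(url);
        });

        return results;
    }

    public static void Main()
    {
        // 5000 made-up URLs standing in for the real input list.
        List<string> urls = Enumerable.Range(0, 5000)
            .Select(i => "http://example.com/page/" + i)
            .ToList();

        var results = ProcessAll(urls, 8);
        Console.WriteLine("Processed " + results.Count + " urls");
    }
}
```

The input list can be as large as you like; Parallel.ForEach pulls items from it as workers become free rather than queuing one thread pool work item per URL up front.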
You may also be interested in the Dataflow API that sits on top of it.
http://msdn.microsoft.com/en-us/library/hh228603%28v=vs.110%29.aspx
Dataflow
This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available.
Recently I've read a lot about parallel programming in .NET but I am still confused by contradicting statements over the texts on this subject.
For example, the popup description (shown when hovering over the tag's icon) of the stackoverflow.com task-parallel-library tag:
"The Task Parallel Library is part of .NET 4. It is a set of APIs to
enable developers to program multi-core shared memory processors"
Does this mean that multi-core and parallel programming applications were impossible using prior versions of .NET?
Do I control multicore/parallel usage/distribution between cores in a .NET multithreaded application?
How can I identify the core on which a thread is to be run and assign a thread to a specific core?
What has the .NET 4.0+ Task Parallel Library enabled that was impossible to do in previous versions of .NET?
Update:
Well, it was difficult to formulate specific questions but I'd like to better understand:
What is the difference in .NET between developing a multi-threaded application and parallel programming?
So far, I could not grasp the difference between them
Update2:
MSDN "Parallel Programming in the .NET Framework" starts from version .NET 4.0 and its article Task Parallel Library tells:
"Starting with the .NET Framework 4, the TPL is the preferred way to
write multithreaded and parallel code"
Can you give me hints how to specifically create parallel code in pre-.NET4 (in .NET3.5), taking into account that I am familiar with multi-threading development?
I see "multithreading" as just what the term says: using multiple threads.
"Parallel processing" would be: splitting up a group of work among multiple threads so the work can be processed in parallel.
Thus, parallel processing is a special case of multithreading.
Does this mean that multi-core and parallel programming applications were impossible using prior versions of .NET?
Not at all. You could do it using the Thread class. It was just much harder to write, and much much harder to get it right.
Do I control multicore/parallel usage/distribution between cores in a .NET multithreaded application?
Not really, but you don't need to. You can mess around with processor affinity for your application, but at the .NET level that's hardly ever a winning strategy.
The Task Parallel Library includes a "partitioner" concept that can be used to control the distribution of work, which is a better solution than controlling the distribution of threads over cores.
How can I identify the core on which a thread is to be run and assign a thread to a specific core?
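As a small sketch of the partitioner idea (the RangeSum class and the chunk size of 100 are made up for illustration), Partitioner.Create can split an index range into fixed-size chunks, and each chunk is then handed to a worker as a unit:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RangeSum
{
    // Sums the array by handing out [from, to) index ranges of rangeSize
    // elements. The array must not be empty, because Partitioner.Create
    // rejects an empty range.
    public static long Sum(int[] data, int rangeSize)
    {
        long total = 0;
        var ranges = Partitioner.Create(0, data.Length, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                local += data[i];
            }
            Interlocked.Add(ref total, local); // one merge per chunk
        });

        return total;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();
        Console.WriteLine(Sum(data, 100));
    }
}
```

You decide how the work is cut up; which thread and which core a chunk lands on is still left to the scheduler and the OS.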
You're not supposed to do this. A .NET thread doesn't necessarily correspond with an OS thread; you're at a higher level of abstraction than that. Now, the default .NET host does map threads 1-to-1, so if you want to depend on an undocumented implementation detail, then you can poke through the abstraction and use P/invoke to determine/drive your processor affinity. But as noted above, it's not useful.
What has the .NET 4.0+ Task Parallel Library enabled that was impossible to do in previous versions of .NET?
Nothing. But it sure has made parallel processing (and multithreading) much easier!
Can you give me hints how to specifically create parallel code in pre-.NET4 (in .NET3.5), taking into account that I am familiar with multi-threading development?
First off, there's no reason to develop for that platform. None. .NET 4.5 is already out, and the last version (.NET 4.0) supports all OSes that the next older version (.NET 3.5) did.
But if you really want to, you can do simple parallel processing by spinning up Thread objects or BackgroundWorkers, or by queueing work directly to the thread pool. All of these approaches require more code (particularly around error handling) than the Task type in the TPL.
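For comparison, here is roughly what "queueing work directly to the thread pool" looks like without the TPL. It is a sketch that sticks to types available before .NET 4 (the class name is made up), and it shows the extra bookkeeping you have to write yourself:

```csharp
using System;
using System.Threading;

public static class PoolSquares
{
    // Squares every element on thread pool threads and blocks until all
    // work items have finished. Uses only pre-.NET 4 APIs.
    public static int[] SquareAll(int[] input)
    {
        int[] output = new int[input.Length];
        if (input.Length == 0)
        {
            return output;
        }

        int pending = input.Length;
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            for (int i = 0; i < input.Length; i++)
            {
                int index = i; // copy, so each work item captures its own index
                ThreadPool.QueueUserWorkItem(delegate
                {
                    try
                    {
                        output[index] = input[index] * input[index];
                    }
                    finally
                    {
                        // The last item to finish releases the waiting thread.
                        if (Interlocked.Decrement(ref pending) == 0)
                        {
                            done.Set();
                        }
                    }
                });
            }
            done.WaitOne();
        }
        return output;
    }

    public static void Main()
    {
        int[] squares = SquareAll(new int[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(", ", Array.ConvertAll(squares, delegate(int x) { return x.ToString(); })));
    }
}
```

Note that an exception thrown inside a work item is not propagated to the caller here; that is the kind of error handling the Task type gives you for free.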
What if I asked you, "Do you write business software in a language you developed yourself?" or "Do you drink water only after digging your own well?"
That's the difference between writing multithreaded code by creating and managing threads yourself and using the abstraction over threads that the TPL provides. Scheduling of threads onto cores is handled by the OS, so AFAIK you don't need to worry about whether your threads are getting executed on the cores your system supports.
Check this article; it basically sums up what was (virtually) impossible before the TPL. Even though many companies had brewed their own parallel processing libraries, none of them had been fully optimized to take advantage of all resources of the popular architectures (simply because it's a big task, and Microsoft has a lot of resources and they are good). It's also interesting to compare Intel's counterpart implementation, TBB, with the TPL.
Does this mean that multi-core and parallel programming applications were impossible using prior versions of .NET?
Not at all. Types like Thread and ThreadPool for scheduling computations on other threads and ManualResetEvent for synchronization were there since .Net 1.
Do I control multicore/parallel usage/distribution between cores in a .NET multithreaded application?
No, that's mostly the job of the OS. You can set ProcessorAffinity of a ProcessThread, but there is no simple way to get a ProcessThread from a Thread (because it was originally thought that .Net Threads may not directly correspond to OS threads). There is usually no reason to do this and you especially shouldn't do it for ThreadPool threads.
What has the .NET 4.0+ Task Parallel Library enabled that was impossible to do in previous versions of .NET?
I'd say it didn't make anything impossible possible. But it made lots of tasks much simpler.
You could always write your own version of ThreadPool and manually use synchronization primitives (like ManualResetEvent) for synchronization between threads. But doing that properly and efficiently is lots of error-prone work.
What is the difference in .NET between developing a multi-threaded application and parallel programming?
This is just a question of naming and doesn't have much to do with your previous questions. Parallel programming means performing multiple operations at the same time, but it doesn't say how you achieve parallelism. For that, you could use multiple computers, multiple processes, multiple threads, or even a single thread.
(Parallel programming on a single thread can work if the operations are not CPU-bound, like reading a file from disk or fetching some data from the internet.)
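A minimal sketch of that last point, using Task.Delay as a stand-in for a non-CPU-bound operation such as a network read (FakeDownloadAsync and the 200 ms delays are made up for illustration). Both operations are started before either is awaited, and no thread is blocked while they are pending:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class OverlapDemo
{
    // Pretends to fetch something: waits without occupying a thread.
    public static async Task<int> FakeDownloadAsync(int id, int delayMs)
    {
        await Task.Delay(delayMs);
        return id;
    }

    // Starts both operations first, then waits for both together.
    public static async Task<int[]> RunBothAsync()
    {
        Task<int> first = FakeDownloadAsync(1, 200);
        Task<int> second = FakeDownloadAsync(2, 200);
        return await Task.WhenAll(first, second);
    }

    public static void Main()
    {
        Stopwatch watch = Stopwatch.StartNew();
        int[] ids = RunBothAsync().GetAwaiter().GetResult();
        watch.Stop();

        Console.WriteLine("Finished " + ids.Length + " operations in " + watch.ElapsedMilliseconds + " ms");
    }
}
```

On an otherwise idle machine the elapsed time should be close to one delay rather than the sum of both, since the two waits overlap. Note that in a console application the continuations may still resume on thread pool threads; the point is only that the waiting itself needs no thread.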
So, multi-threaded programming is a subset of parallel programming, though one that's most commonly used on .Net.
Multithreading used to be available on single-core CPUs. I believe that in the .NET world, "parallel programming" represents compiler/language, as well as namespace and "library" additions, that facilitate multi-core capabilities (better than before). In this sense "parallel programming" is a category under multithreading that provides improved support for multiple CPUs/cores.
My own ponderings: at the same time I see .NET "parallel programming" to encompass not only multi-threading, but other techniques. Consider the fact that the new async/await facilities don't guarantee multi-threading, as in certain scenarios they are only an abstraction of the continuation-passing-style paradigm that could accomplish everything on a single thread. Include in the mix parallelism that comes from running different processes (potentially on different machines) and in that sense, multithreading is only a portion of the broader concept of "parallel programming".
But if you consider the .NET releases I think the former is a better explanation.
I used multiple threads in a few programs, but still don't feel very comfortable about it.
What multi-threading libraries for C#/.NET are out there and which advantages does one have over the other?
By multi-threading libraries I mean everything which helps make programming with multiple threads easier.
Which facilities integrated into .NET (e.g. ThreadPool) do you use regularly?
Which problems did you encounter?
There are various reasons for using multiple threads in an application:
UI responsiveness
Concurrent operations
Parallel speedup
The approach one should choose depends on what you're trying to do. For UI responsiveness, consider using BackgroundWorker, for example.
For concurrent operations (e.g. a server: something that doesn't have to be parallel, but probably does need to be concurrent even on a single-core system), consider using the thread pool or, if the tasks are long-lived and you need a lot of them, consider using one thread per task.
If you have a so-called embarrassingly parallel problem that can be easily divided up into small subproblems, consider using a pool of worker threads (as many threads as CPU cores) that pull tasks from a queue. The Microsoft Task Parallel Library (TPL) may help here. If the job can be easily expressed as a monadic stream computation (i.e. with a query in LINQ with work in transformations and aggregations etc.), Parallel LINQ (same link) which runs on top of TPL may help.
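As an example of the PLINQ case, the sketch below counts primes with an ordinary LINQ query that has AsParallel() added. The problem and the names are chosen purely for illustration; any query whose per-element work is independent would do:

```csharp
using System;
using System.Linq;

public static class PrimeCounter
{
    // Simple trial division; independent per number, so easy to parallelize.
    public static bool IsPrime(int n)
    {
        if (n < 2)
        {
            return false;
        }
        for (int d = 2; d * d <= n; d++)
        {
            if (n % d == 0)
            {
                return false;
            }
        }
        return true;
    }

    // Counts the primes in 2..limit (limit must be at least 1).
    public static int CountPrimes(int limit)
    {
        return Enumerable.Range(2, limit - 1)
            .AsParallel()
            .Count(IsPrime);
    }

    public static void Main()
    {
        Console.WriteLine(CountPrimes(100)); // 25
    }
}
```

Removing AsParallel() gives the sequential version with the same result, which makes it easy to measure whether the parallel query is worth it for your data.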
There are other approaches, such as Actor-style parallelism as seen in Erlang, which are harder to implement efficiently in .NET because of the lack of a green threading model or means to implement same, such as CLR-supported continuations.
I like this one
http://www.codeplex.com/smartthreadpool
Check out the Power Threading library.
I have written a lot of threading code in my days, even implemented my own threading pool & dispatcher. A lot of it is documented here:
http://web.archive.org/web/20120708232527/http://devplanet.com/blogs/brianr/default.aspx
Just realize that I wrote these for very specific purposes and tested them in those conditions, and there is no real silver-bullet.
My advice would be to get comfortable with the thread pool before you move to any other libraries. A lot of the framework code uses the thread pool, so even if you happen to find The Best Threads Library(TM), you will still have to work with the thread pool, so you really need to understand it.
You should also keep in mind that a lot of work has been put into implementing the thread pool and tuning it. The upcoming version of .NET has numerous improvements triggered by the development of the parallel libraries.
In my point of view many of the "problems" with the current thread pool can be amended by knowing its strengths and weaknesses.
Please keep in mind that you really should be closing threads (or allowing the thread pool to dispose of them) when you no longer need them, unless you will need them again soon. The reason I say this is that each thread requires stack memory (usually 1 MB), so when you have applications sitting on threads but not using them, you are wasting memory.
For example, Outlook on my machine right now has 20 threads open and is using 0% CPU. That is simply a waste of (at least) 20 MB of memory. Word is also using another 10 threads with 0% CPU. 30 MB may not seem like much, but what if every application was wasting 10-20 threads?
Again, if you need access to a threadpool on a regular basis then you don't need to close it (creating/destroying threads has an overhead).
You don't have to use the threadpool explicitly, you can use BeginInvoke-EndInvoke if you need async calls. It uses the threadpool behind the scenes. See here: http://msdn.microsoft.com/en-us/library/2e08f6yc.aspx
You should take a look at the Concurrency & Coordination Runtime. The CCR can be a little daunting at first as it requires a slightly different mindset. This video does a fairly good job of explaining how it works...
In my opinion this would be the way to go, and I also hear that it will use the same scheduler as the TPL.
For me the builtin classes of the Framework are more than enough. The Threadpool is odd and lame, but you can write your own easily.
I often use the BackgroundWorker class for front ends, because it makes life much easier: invoking is done automatically for the event handlers.
I regularly start threads manually and save them in a dictionary together with a ManualResetEvent, to be able to check which of them have already ended. I use the WaitHandle.WaitAll() method for this. The problem there is that WaitHandle.WaitAll does not accept arrays with more than 64 WaitHandles at once.
You might want to look at the series of articles about threading patterns. Right now it has sample code for implementing a WorkerThread and a ThreadedQueue.
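One possible way around the 64-handle limit is to wait in batches. The helper below is only a sketch (BatchWait and its method are made up for illustration), and it assumes the calling thread is not an STA thread, since WaitAll with several handles is not supported there:

```csharp
using System;
using System.Threading;

public static class BatchWait
{
    // Waits for every handle by calling WaitHandle.WaitAll on slices of
    // at most 64 handles, which is the documented per-call maximum.
    public static void WaitAllInBatches(WaitHandle[] handles)
    {
        const int MaxPerCall = 64;
        for (int offset = 0; offset < handles.Length; offset += MaxPerCall)
        {
            int count = Math.Min(MaxPerCall, handles.Length - offset);
            WaitHandle[] batch = new WaitHandle[count];
            Array.Copy(handles, offset, batch, 0, count);
            WaitHandle.WaitAll(batch);
        }
    }

    public static void Main()
    {
        ManualResetEvent[] events = new ManualResetEvent[100];
        for (int i = 0; i < events.Length; i++)
        {
            ManualResetEvent current = new ManualResetEvent(false);
            events[i] = current;
            // Each work item signals its own event when it is done.
            ThreadPool.QueueUserWorkItem(delegate { current.Set(); });
        }

        WaitAllInBatches(events);
        Console.WriteLine("All " + events.Length + " workers have signalled");

        foreach (ManualResetEvent e in events)
        {
            e.Dispose();
        }
    }
}
```

Because every handle has to be signalled anyway, the order in which the batches are waited on does not change the total waiting time.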
http://devpinoy.org/blogs/jakelite/archive/tags/Threading+Patterns/default.aspx