I have a quad core PC.
I had considered programmatically utilising multi-core processing using the Task Parallel Library. However, when I Googled for examples, I was told that the CPU handles this automatically and that it is best left alone.
Now I have found another article on Code Project singing the praises of this library.
Is there any advantage to using this library?
Thanks.
Unless your application is actively taking advantage of parallel processing, neither the OS nor the CPU will do this for you automatically. The OS and CPU may switch execution of your application between multiple cores, but that does not make it execute simultaneously on the different cores. For that you need to make your application capable of executing at least parts in parallel.
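As a minimal sketch of what "making your application capable of executing parts in parallel" looks like, the loop below is split explicitly with `Parallel.For`; neither the OS nor the CPU would perform this split on its own. The workload (a sum of squares) and the class name `ParallelSumDemo` are illustrative assumptions, not taken from the question.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSumDemo
{
    // Sequential version: runs on one core no matter how many the machine has.
    public static long SumOfSquaresSequential(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += (long)i * i;
        return total;
    }

    // Parallel version: Parallel.For hands chunks of the range to several
    // thread-pool threads. Each worker keeps its own subtotal, and the
    // subtotals are combined once per worker with Interlocked.Add.
    public static long SumOfSquaresParallel(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                           // per-worker initial subtotal
            (i, loopState, subtotal) => subtotal + (long)i * i, // loop body
            subtotal => Interlocked.Add(ref total, subtotal));  // merge once per worker
        return total;
    }
}
```

The thread-local subtotal keeps synchronization out of the hot loop; locking on every iteration would largely cancel out the benefit of running on several cores.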
According to the MSDN article "Parallel Processing and Concurrency in the .NET Framework", there are basically three ways to do parallel processing in .NET:
Managed threading where you handle the threads and their synchronization yourself.
Various asynchronous programming patterns.
Parallel Programming in the .NET Framework, of which both the Task Parallel Library and PLINQ are part.
One reason for using the TPL is that, according to the MSDN article, it and the accompanying tools
simplify parallel development so that you can write efficient, fine-grained, and scalable parallel code in a natural idiom without having to work directly with threads or the thread pool.
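PLINQ, the other half of "Parallel Programming in the .NET Framework", expresses the same idea declaratively, with no explicit thread or thread-pool code. A sketch, assuming a deliberately naive primality test as the CPU-bound work; `PlinqDemo` and `IsPrime` are illustrative names.

```csharp
using System.Linq;

public static class PlinqDemo
{
    // Deliberately naive trial division so each element costs some CPU time.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // AsParallel() opts the query into PLINQ; the runtime partitions the
    // range across the available cores.
    public static int CountPrimesParallel(int limit) =>
        Enumerable.Range(0, limit).AsParallel().Count(IsPrime);

    // Same query without AsParallel(): ordinary single-threaded LINQ.
    public static int CountPrimesSequential(int limit) =>
        Enumerable.Range(0, limit).Count(IsPrime);
}
```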
The article "Threads vs. Tasks" offers some help in deciding between threads and the TPL, with the conclusion:
The bottom line is that Task is almost always the best option; it provides a much more powerful API and avoids wasting OS threads.
The only reasons to explicitly create your own Threads in modern code are setting per-thread options, or maintaining a persistent thread that needs to maintain its own identity.
The Task Parallel Library does its work through task schedulers, and you can configure which scheduler the TPL uses. You can even write a custom task scheduler that creates one thread per task. This way you get the advantage of configuring thread management instead of hand-coding it, similar to the advantage of using a Dependency Injection framework over DIY DI.
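To make the "one thread per task" scheduler idea concrete, here is a minimal sketch of a custom `TaskScheduler`. The class names `ThreadPerTaskScheduler` and `SchedulerDemo` are hypothetical, and a production scheduler would need more thought about error handling and limits on thread creation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: every queued task gets a brand-new background thread.
public sealed class ThreadPerTaskScheduler : TaskScheduler
{
    // Tasks are handed to a thread immediately, so nothing waits in a queue.
    protected override IEnumerable<Task> GetScheduledTasks() => Enumerable.Empty<Task>();

    protected override void QueueTask(Task task)
    {
        var thread = new Thread(() => TryExecuteTask(task)) { IsBackground = true };
        thread.Start();
    }

    // Never run a task inline on the caller, so every task really gets its own thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;
}

public static class SchedulerDemo
{
    // Runs one task through the custom scheduler and reports whether it ran on a pool thread.
    public static bool RanOnThreadPool()
    {
        var scheduler = new ThreadPerTaskScheduler();
        Task<bool> task = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            TaskCreationOptions.None,
            scheduler);
        return task.Result;
    }
}
```

The calling code only changes the scheduler argument; the task itself is written exactly as it would be for the default thread-pool scheduler, which is the configuration advantage described above.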
There are also many existing SO questions on the difference between Task and Thread:
Task vs Thread
Task vs Thread
Multithreading or TPL
I'm writing a web scraper in C#. I implemented it using Thread objects that take work items from a pool of tasks, process each one with a callback, and store the result. The pool of tasks consists of 5K to 50K input URLs.
Is there any core framework object that can deal with something like this? I tried to see if the ThreadPool can be instantiated, but it can't. I'm also very unsure whether such a high number of tasks can/should be queued into the default ThreadPool.
So, is there anything already available in the core for creating a large number of tasks and a number of threads, and having those threads process those tasks? Or should I just stick with my own? I've already reinvented a few wheels since I started using C#.
Yes, there is.
http://msdn.microsoft.com/en-us/library/dd460717%28v=vs.110%29.aspx
Task Parallel Library (TPL)
The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.
Tasks are queued onto the Thread pool by default using a TaskScheduler. The scheduler works to promote efficient use of available threads and processors.
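For the scraper case (thousands of URLs, mostly I/O-bound work), one common TPL-based pattern is to create one `Task` per URL and cap the number in flight with a `SemaphoreSlim`. This is a sketch under assumptions: `FetchAsync` is a hypothetical stand-in for the real download (it only delays and returns the URL length), and `maxConcurrency` is a tuning knob, not a recommended value.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledScraper
{
    // Hypothetical stand-in for an HTTP download; replace with real I/O.
    private static async Task<int> FetchAsync(string url)
    {
        await Task.Delay(10);   // simulates network latency without blocking a thread
        return url.Length;      // pretend "result" of processing the page
    }

    // Processes every URL, but never lets more than maxConcurrency fetches run at once.
    public static async Task<IReadOnlyList<int>> ProcessAllAsync(
        IEnumerable<string> urls, int maxConcurrency)
    {
        using var throttle = new SemaphoreSlim(maxConcurrency);
        List<Task<int>> tasks = urls.Select(async url =>
        {
            await throttle.WaitAsync();     // wait asynchronously for a free slot
            try
            {
                return await FetchAsync(url);
            }
            finally
            {
                throttle.Release();         // free the slot even if the fetch fails
            }
        }).ToList();                        // materialize so each task starts exactly once

        int[] results = await Task.WhenAll(tasks);  // results keep the input order
        return results;
    }
}
```

Queuing 50K tasks this way is cheap because a waiting task does not hold a thread; only the semaphore limits how many downloads are actually in progress.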
You may also be interested in the Dataflow API that sits on top of it.
http://msdn.microsoft.com/en-us/library/hh228603%28v=vs.110%29.aspx
Dataflow
This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available.
I have read on MSDN that it is not guaranteed that the TPL (Task Parallel Library) will run the logic/code in parallel. So my question is: under what circumstances will the code run sequentially?
When the code is deployed on a single-core processor? When .NET Framework thread pool starvation happens? Or when the 'hardware threads'/'logical cores' are too busy, so that only one 'hardware thread'/'logical core' can be allocated at that time?
It is decided by the TaskScheduler set in the ParallelOptions of the TPL methods. This lets you easily replace the TaskScheduler with a custom one that implements whatever parallelization plan you want.
The default scheduler that the TPL and PLINQ use is the ThreadPool. It starts by using one thread and then adds more threads as its algorithm detects that more threads would be useful (however, if your task is not CPU-bound, the algorithm can make some incorrect assumptions and cause you problems).
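One way to see that the degree of parallelism is a policy rather than a guarantee is to set it through `ParallelOptions` and measure what actually happens. A sketch; the `ConcurrencyProbe` name and the sleep-based workload are illustrative assumptions. With `MaxDegreeOfParallelism = 1` the loop is effectively sequential; with a higher cap the observed concurrency can be anything from 1 up to that cap, depending on the cores and the current state of the thread pool.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyProbe
{
    // Returns the highest number of loop bodies observed running at the same time.
    public static int MaxObservedConcurrency(int iterations, int maxDegree)
    {
        int active = 0, peak = 0;
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegree,
            TaskScheduler = TaskScheduler.Default   // the thread-pool scheduler; a custom one could go here
        };

        Parallel.For(0, iterations, options, _ =>
        {
            int now = Interlocked.Increment(ref active);
            // Record a new peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
            Thread.Sleep(5);                        // simulated work so overlaps are visible
            Interlocked.Decrement(ref active);
        });
        return peak;
    }
}
```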
I highly recommend you read the free book Patterns for Parallel Programming; it goes into some detail about this. However, the best book I have read that goes into a lot of detail about how the task scheduler works is Professional Parallel Programming with C# (Chapter 8 is all about Thread Pools).
I also recommend you download the package Samples for Parallel Programming with the .NET Framework; it has a whole bunch of well-commented projects inside it that help explain a lot of concepts of parallel programming.
Recently I've read a lot about parallel programming in .NET, but I am still confused by contradictory statements in texts on this subject.
For example, the popup description (shown when hovering over the tag) of the stackoverflow.com task-parallel-library tag says:
"The Task Parallel Library is part of .NET 4. It is a set of APIs to enable developers to program multi-core shared memory processors"
Does this mean that multi-core and parallel programming applications were impossible with prior versions of .NET?
Do I control multicore/parallel usage and distribution between cores in a .NET multithreaded application?
How can I identify the core on which a thread is to be run, and assign a thread to a specific core?
What has the .NET 4.0+ Task Parallel Library enabled that was impossible to do in previous versions of .NET?
Update:
Well, it was difficult to formulate specific questions, but I'd like to better understand:
What is the difference in .NET between developing a multi-threaded application and parallel programming?
So far, I have not been able to grasp the difference between them.
Update2:
MSDN's "Parallel Programming in the .NET Framework" starts from .NET 4.0, and its article "Task Parallel Library" says:
"Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code"
Can you give me hints on how to create parallel code specifically in pre-.NET 4 (.NET 3.5), given that I am familiar with multithreaded development?
I see "multithreading" as just what the term says: using multiple threads.
"Parallel processing" would be: splitting up a group of work among multiple threads so the work can be processed in parallel.
Thus, parallel processing is a special case of multithreading.
Does this mean that multi-core and parallel programming applications were impossible with prior versions of .NET?
Not at all. You could do it using the Thread class. It was just much harder to write, and much much harder to get it right.
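For comparison, this is roughly what a parallel split looks like with the `Thread` class alone, as it had to be written before .NET 4: partition by hand, start one thread per partition, `Join` them all, then combine the partial results. The summation workload and the name `ManualThreadSum` are illustrative assumptions. Note that the sketch has no exception handling at all; an unhandled exception on a worker thread would terminate the process.

```csharp
using System;
using System.Threading;

public static class ManualThreadSum
{
    // Sums data using threadCount manually created threads (threadCount must be >= 1).
    public static long Sum(int[] data, int threadCount)
    {
        long[] partials = new long[threadCount];
        Thread[] threads = new Thread[threadCount];
        int chunk = (data.Length + threadCount - 1) / threadCount;   // ceiling division

        for (int t = 0; t < threadCount; t++)
        {
            int index = t;                                            // copy for the closure
            int start = index * chunk;
            int end = Math.Min(start + chunk, data.Length);
            threads[t] = new Thread(() =>
            {
                long local = 0;
                for (int i = start; i < end; i++) local += data[i];
                partials[index] = local;                              // each thread owns one slot
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads) thread.Join();             // wait for every worker

        long total = 0;
        foreach (long p in partials) total += p;
        return total;
    }
}
```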
Do I control multicore/parallel usage and distribution between cores in a .NET multithreaded application?
Not really, but you don't need to. You can mess around with processor affinity for your application, but at the .NET level that's hardly ever a winning strategy.
The Task Parallel Library includes a "partitioner" concept that can be used to control the distribution of work, which is a better solution than controlling the distribution of threads over cores.
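A sketch of the partitioner idea: `Partitioner.Create(from, to, rangeSize)` hands each worker a contiguous range instead of one index at a time, which cuts per-item overhead when the loop body is very cheap. The range sizes used here and the `RangePartitionDemo` name are assumptions for illustration. Note that you control how the work is chunked, not which core runs which chunk.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangePartitionDemo
{
    // Sums the integers 0..n-1 by processing contiguous ranges in parallel.
    public static long SumRange(int n, int rangeSize)
    {
        long total = 0;
        // Each element is a Tuple<int, int> describing the range [Item1, Item2).
        var ranges = Partitioner.Create(0, n, rangeSize);
        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++) local += i;
            Interlocked.Add(ref total, local);   // one synchronized add per range, not per item
        });
        return total;
    }
}
```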
How can I identify the core on which a thread is to be run, and assign a thread to a specific core?
You're not supposed to do this. A .NET thread doesn't necessarily correspond to an OS thread; you're at a higher level of abstraction than that. Now, the default .NET host does map threads 1-to-1, so if you want to depend on an undocumented implementation detail, you can poke through the abstraction and use P/Invoke to determine/drive your processor affinity. But as noted above, it's not useful.
What has the .NET 4.0+ Task Parallel Library enabled that was impossible to do in previous versions of .NET?
Nothing. But it sure has made parallel processing (and multithreading) much easier!
Can you give me hints on how to create parallel code specifically in pre-.NET 4 (.NET 3.5), given that I am familiar with multithreaded development?
First off, there's no reason to develop for that platform. None. .NET 4.5 is already out, and the last version (.NET 4.0) supports all OSes that the next older version (.NET 3.5) did.
But if you really want to, you can do simple parallel processing by spinning up Thread objects or BackgroundWorkers, or by queueing work directly to the thread pool. All of these approaches require more code (particularly around error handling) than the Task type in the TPL.
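To illustrate the extra error-handling code these pre-TPL approaches need, here is a sketch of queueing work with `ThreadPool.QueueUserWorkItem` using only .NET 3.5-era primitives. The caller has to build its own completion signal (`ManualResetEvent`) and marshal any exception back by hand, because an unhandled exception on a pool thread would otherwise terminate the process. `LegacyWorkRunner` and its `Run` signature are hypothetical.

```csharp
using System;
using System.Threading;

public static class LegacyWorkRunner
{
    // Runs `work` on a pool thread and returns its result, rethrowing any failure on the caller.
    public static int Run(Func<int> work)
    {
        int result = 0;
        Exception error = null;
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    result = work();
                }
                catch (Exception ex)
                {
                    error = ex;         // capture it; never let it escape the pool thread
                }
                finally
                {
                    done.Set();         // signal completion in every case
                }
            });
            done.WaitOne();             // block until the worker has finished
        }

        if (error != null)
            throw new InvalidOperationException("Background work failed.", error);
        return result;
    }
}
```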
What if I asked you, "Do you write business software in a language you developed yourself? Or do you drink water only after digging your own well?"
That's the difference between writing multithreaded code by creating threads and managing them yourself, and using the TPL's abstraction over threads. Scheduling threads onto cores is handled by the OS, so AFAIK you don't need to worry about whether your threads are executing on the cores your system supports.
Check this article; it basically sums up what was (virtually) impossible before the TPL. Even though many companies had brewed their own parallel processing libraries, none of them had been fully optimized to take advantage of all the resources of the popular architectures (simply because it's a big task, and Microsoft has a lot of resources and good people). It's also interesting to compare Intel's counterpart implementation, TBB, with the TPL.
Does this mean that multi-core and parallel programming applications were impossible with prior versions of .NET?
Not at all. Types like Thread and ThreadPool for scheduling computations on other threads, and ManualResetEvent for synchronization, have been there since .NET 1.0.
Do I control multicore/parallel usage and distribution between cores in a .NET multithreaded application?
No, that's mostly the job of the OS. You can set the ProcessorAffinity of a ProcessThread, but there is no simple way to get a ProcessThread from a Thread (because it was originally thought that .NET threads might not correspond directly to OS threads). There is usually no reason to do this, and you especially shouldn't do it for ThreadPool threads.
What has the .NET 4.0+ Task Parallel Library enabled that was impossible to do in previous versions of .NET?
I'd say it didn't make anything impossible possible. But it made lots of tasks much simpler.
You could always write your own version of ThreadPool and manually use synchronization primitives (like ManualResetEvent) for synchronization between threads. But doing that properly and efficiently is lots of error-prone work.
What is the difference in .NET between developing a multi-threaded application and parallel programming?
This is just a question of naming and doesn't have much to do with your previous questions. Parallel programming means performing multiple operations at the same time, but it doesn't say how you achieve parallelism. For that, you could use multiple computers, multiple processes, multiple threads, or even a single thread.
(Parallel programming on a single thread can work if the operations are not CPU-bound, like reading a file from disk or fetching some data from the internet.)
So, multi-threaded programming is a subset of parallel programming, though it's the one most commonly used in .NET.
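A small sketch of the "parallel without extra threads" point: three simulated I/O waits are started together and awaited with `Task.WhenAll`, so they overlap and the total time is close to the longest single wait rather than the sum. No thread sits blocked while the waits are pending. `Task.Delay` stands in for real I/O here; that substitution, and the `OverlappingIoDemo` name, are the main assumptions.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

public static class OverlappingIoDemo
{
    // Stand-in for an I/O call (file read, HTTP request); no thread is blocked during the delay.
    private static async Task<int> SimulatedIoAsync(int id, int millis)
    {
        await Task.Delay(millis);
        return id;
    }

    // Starts all three operations first, then awaits them together. Returns elapsed milliseconds.
    public static async Task<long> RunConcurrentlyAsync()
    {
        var watch = Stopwatch.StartNew();
        Task<int> a = SimulatedIoAsync(1, 300);
        Task<int> b = SimulatedIoAsync(2, 300);
        Task<int> c = SimulatedIoAsync(3, 300);
        await Task.WhenAll(a, b, c);
        return watch.ElapsedMilliseconds;   // close to one delay, not the sum of all three
    }
}
```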
Multithreading was already available on single-core CPUs. I believe that in the .NET world, "parallel programming" refers to the compiler/language additions, as well as the namespace and "library" additions, that facilitate multi-core capabilities (better than before). In this sense, "parallel programming" is a category under multithreading that provides improved support for multiple CPUs/cores.
My own ponderings: at the same time I see .NET "parallel programming" to encompass not only multi-threading, but other techniques. Consider the fact that the new async/await facilities don't guarantee multi-threading, as in certain scenarios they are only an abstraction of the continuation-passing-style paradigm that could accomplish everything on a single thread. Include in the mix parallelism that comes from running different processes (potentially on different machines) and in that sense, multithreading is only a portion of the broader concept of "parallel programming".
But if you consider the .NET releases I think the former is a better explanation.
I'm working on a network-bound application, which is supposed to have a lot (hundreds, maybe thousands) of parallel processes.
I'm looking for the best way to implement it.
When I tried setting
ThreadPool.SetMaxThreads(int.MaxValue, int.MaxValue);
and then creating 1000 threads and making them do stuff in parallel, the application's execution became really jumpy.
I've heard somewhere that delegate.BeginInvoke is somehow better than new Thread(...), so I tried it, then opened the app in the debugger, and what I saw were parallel threads.
If I have to create lots and lots of threads, what is the best way to ensure that the application is going to run smoothly?
Have you tried the new await / async pattern in C# 5 / .NET 4.5?
I haven't got sources to hand about how this operates under the hood, but one of the most common use-cases of this new feature is waiting for IO bound stuff.
Threads are not lightweight objects. They are expensive to create and to context switch to/from; hence the Thread Pool (threads that are pre-created and recycled). Most common solutions that involve networking or other I/O ports utilise lower-level I/O Completion Ports (there is a managed library here) to "wait" on a port while the thread continues executing as normal.
BeginInvoke will utilise a Thread Pool thread, so it will be better than creating your own only if a thread is available. This approach, if used too heavily, can immediately result in thread starvation.
Setting such a high thread pool count is not going to work in the long run as threads are too heavy for what it appears you want to do.
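A sketch of the asynchronous alternative for "hundreds or thousands of network operations": instead of 1000 threads, start 1000 asynchronous operations and await them together. The delay-based `SimulatedRequestAsync` is a hypothetical stand-in for a real socket or HTTP call. The demo also records how many distinct threads ran the continuations, which stays far below 1000 because no thread is parked per request.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ManyRequestsDemo
{
    // Hypothetical stand-in for a network call.
    private static async Task<int> SimulatedRequestAsync(int id, ConcurrentDictionary<int, bool> threadsSeen)
    {
        await Task.Delay(100);   // the "network wait": no thread is held while it is pending
        threadsSeen.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
        return id;
    }

    // Returns (number of completed requests, distinct threads that ran continuations).
    public static async Task<(int Completed, int ThreadsUsed)> RunAsync(int requestCount)
    {
        var threadsSeen = new ConcurrentDictionary<int, bool>();
        Task<int>[] requests = Enumerable.Range(0, requestCount)
            .Select(i => SimulatedRequestAsync(i, threadsSeen))
            .ToArray();
        int[] results = await Task.WhenAll(requests);
        return (results.Length, threadsSeen.Count);
    }
}
```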
Axum, a former Microsoft Research language, could achieve massive parallelism that would have suited this task. It operated similarly to Stackless Python or Erlang. Lots of concepts from Axum made their way into the parallelism push in C# 5 and .NET 4.5.
Setting the ThreadPool.SetMaxThreads will only affect how many threads the thread pool has, and it won't make a difference regarding threads you create yourself with new Thread().
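A quick way to check this point: `ThreadPool.GetMaxThreads` reports the limits that `SetMaxThreads` changes, while a thread created with `new Thread()` reports `IsThreadPoolThread == false`, so no pool setting applies to it. A sketch; the `PoolVsThreadDemo` name is illustrative, and the actual limits vary by runtime and machine.

```csharp
using System.Threading;

public static class PoolVsThreadDemo
{
    // The limits that ThreadPool.SetMaxThreads changes.
    public static (int Worker, int CompletionPort) PoolLimits()
    {
        ThreadPool.GetMaxThreads(out int worker, out int completionPort);
        return (worker, completionPort);
    }

    // Whether a manually created thread belongs to the pool.
    public static bool ManualThreadIsPoolThread()
    {
        bool isPool = true;
        var thread = new Thread(() => isPool = Thread.CurrentThread.IsThreadPoolThread);
        thread.Start();
        thread.Join();
        return isPool;
    }

    // Whether a work item queued to the pool runs on a pool thread.
    public static bool QueuedWorkIsPoolThread()
    {
        bool isPool = false;
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                isPool = Thread.CurrentThread.IsThreadPoolThread;
                done.Set();
            });
            done.WaitOne();
        }
        return isPool;
    }
}
```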
Go async (model, not keyword) as suggested by many.
You should follow the advice in the other answers and comments. As fsimonazzi says, creating new threads directly has nothing to do with the ThreadPool. For a quick test, lower the max worker and completion port threads and use the ThreadPool.QueueUserWorkItem method. The ThreadPool will decide what your system can handle, queue your tasks, and reuse threads whenever it can.
If your tasks are not compute-bound, you should also use asynchronous I/O. You do not want your worker threads to wait for I/O completion. You need those worker threads to return to the pool as quickly as possible and not block on I/O requests.
As some may have seen, .NET 4.0 added a new namespace, System.Threading.Tasks, which is basically what it says: tasks. I've only been using it for a few days, having previously used ThreadPool.
Which one is more efficient and less resource consuming? (Or just better overall?)
The objective of the Tasks namespace is to provide a pluggable architecture to make multi-tasking applications easier to write and more flexible.
The implementation uses a TaskScheduler object to control the handling of tasks. It has abstract and virtual members that you can override to create your own task handling. Members include, for instance:
protected internal abstract void QueueTask(Task task)
public virtual int MaximumConcurrencyLevel { get; }
There will be a tiny overhead to using the default implementation as there's a wrapper around the .NET threads implementation, but I'd not expect it to be huge.
There is a (draft) implementation of a custom TaskScheduler that implements multiple tasks on a single thread here.
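The linked draft is not reproduced here. As an independent sketch of the same idea (many tasks, one thread), assuming a `BlockingCollection<Task>` as the queue, a scheduler that overrides `QueueTask` and `MaximumConcurrencyLevel` could look like this. `SingleThreadTaskScheduler` and `SingleThreadDemo` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: all tasks run, one at a time, on a single dedicated thread.
public sealed class SingleThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread _thread;

    public SingleThreadTaskScheduler()
    {
        _thread = new Thread(() =>
        {
            // Runs until CompleteAdding is called and the queue has drained.
            foreach (Task task in _queue.GetConsumingEnumerable())
                TryExecuteTask(task);
        }) { IsBackground = true };
        _thread.Start();
    }

    public override int MaximumConcurrencyLevel => 1;

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Only inline when already on the scheduler's own thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) =>
        Thread.CurrentThread == _thread && TryExecuteTask(task);

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose()
    {
        _queue.CompleteAdding();   // let the worker finish the remaining tasks
        _thread.Join();
    }
}

public static class SingleThreadDemo
{
    // Runs several tasks through the scheduler and counts the distinct threads they used.
    public static int DistinctThreadsUsed(int taskCount)
    {
        var threadIds = new ConcurrentBag<int>();
        using (var scheduler = new SingleThreadTaskScheduler())
        {
            var tasks = new Task[taskCount];
            for (int i = 0; i < taskCount; i++)
            {
                tasks[i] = Task.Factory.StartNew(
                    () => threadIds.Add(Environment.CurrentManagedThreadId),
                    CancellationToken.None, TaskCreationOptions.None, scheduler);
            }
            Task.WaitAll(tasks);
        }
        return threadIds.Distinct().Count();
    }
}
```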
Which one is more efficient and less resource-consuming?
Irrelevant, there will be very little difference.
(Or just better overall)
The Task class is the easier one to use, as it offers a very clean interface for starting and joining threads, and it transfers exceptions. It also supports a (limited) form of load balancing.
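A sketch of what "transfers exceptions" means in practice: an exception thrown inside a `Task` is stored on the task and rethrown to whoever waits on it, wrapped in an `AggregateException` when you use `Result`/`Wait`, or unwrapped when you `await`. With `ThreadPool.QueueUserWorkItem` you would have to capture and marshal the exception back yourself. The `TaskExceptionDemo` name and the divide-by-zero failure are illustrative.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskExceptionDemo
{
    private static int Divide(int a, int b) => a / b;

    // Blocking style: Result/Wait wrap the failure in an AggregateException.
    public static Exception CaughtViaResult()
    {
        Task<int> task = Task.Run(() => Divide(1, 0));
        try
        {
            _ = task.Result;            // blocks, then throws because the task faulted
            return null;
        }
        catch (AggregateException ex)
        {
            return ex.InnerException;   // the original DivideByZeroException
        }
    }

    // Async style: await rethrows the original exception type directly.
    public static async Task<Exception> CaughtViaAwaitAsync()
    {
        try
        {
            await Task.Run(() => Divide(1, 0));
            return null;
        }
        catch (DivideByZeroException ex)
        {
            return ex;
        }
    }
}
```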
"Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code."
http://msdn.microsoft.com/en-us/library/dd460717.aspx
Thread
The bare-metal option. You probably don't need to use it; you can probably use a LongRunning Task instead and benefit from its facilities.
Tasks
An abstraction above threads. It uses the thread pool (unless you specify the task as a LongRunning operation, in which case a new thread is created under the hood for you).
Thread Pool
As the name suggests: a pool of threads. The .NET Framework manages a limited number of threads for you. Why? Because opening 100 threads to execute expensive CPU operations on a CPU with just 8 cores is definitely not a good idea. The framework maintains this pool for you, reusing the threads (rather than creating/killing them for each operation) and executing some of them in parallel in a way that doesn't overload your CPU.
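A small sketch of the Task/LongRunning distinction described above: a default `Task.Run` task executes on a pool thread, whereas `TaskCreationOptions.LongRunning` is a hint that, with the default scheduler in current .NET implementations, results in a dedicated thread. Treat the second result as observed implementation behavior rather than a documented guarantee; `LongRunningDemo` is an illustrative name.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // A normal task: queued to and executed by the thread pool.
    public static bool DefaultTaskUsesPool() =>
        Task.Run(() => Thread.CurrentThread.IsThreadPoolThread).Result;

    // A LongRunning task: the default scheduler gives it its own thread.
    public static bool LongRunningTaskUsesPool() =>
        Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default).Result;
}
```

A dedicated thread keeps a long blocking operation from occupying one of the pool's limited threads, which is why LongRunning is suggested for that case.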
OK, but when to use each one?
In summary: always use tasks.
Task is an abstraction, so it is a lot easier to use. I advise you to always try to use Tasks, and if you face a problem that requires you to handle a thread yourself (probably 1% of the time), then use threads.
BUT be aware that:
I/O bound: For I/O-bound operations (database calls, reading/writing files, API calls, etc.), never use normal tasks for blocking calls; use LongRunning tasks or threads if you need to block, but not normal tasks. Otherwise you end up with a thread pool in which a few threads are busy waiting and a lot of other tasks are waiting for their turn in the pool.
CPU Bound: For CPU bound operations just use the normal tasks and be happy.
Scheduling is an important aspect of parallel tasks.
Unlike threads, new tasks don't necessarily begin executing immediately. Instead, they are placed in a work queue. Tasks run when their associated task scheduler removes them from the queue, usually as cores become available. The task scheduler attempts to optimize overall throughput by controlling the system's degree of concurrency. As long as there are enough tasks and the tasks are sufficiently free of serializing dependencies, the program's performance scales with the number of available cores. In this way, tasks embody the concept of potential parallelism
As I read on MSDN: http://msdn.microsoft.com/en-us/library/ff963549.aspx
The difference between ThreadPool and Task is very simple.
To understand task you should know about the threadpool.
ThreadPool basically helps to manage and reuse free threads. In other words, a thread pool is a collection of background threads.
Simple definition of task can be:
A task asynchronously manages a unit of work. In simple words, a Task doesn't create new threads; instead, it efficiently manages the threads of a thread pool. Tasks are executed by a TaskScheduler, which queues tasks onto threads.
Another good point to consider about tasks: when you use ThreadPool, you have no built-in way to cancel or wait on the queued work (unless you implement it manually in the work method), but with a task it is possible. Please correct me if I'm wrong.
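A sketch of waiting on and cancelling a task. Note that cancellation is cooperative: the task body must check the `CancellationToken`, and nothing forcibly aborts the underlying thread. The loop count, the sleep interval, and the `CancellationDemo` name are illustrative assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Returns the final status of a long loop that is cancelled part-way through.
    public static TaskStatus RunAndCancel()
    {
        using var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;

        Task work = Task.Run(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                token.ThrowIfCancellationRequested();   // cooperative cancellation point
                Thread.Sleep(10);                       // simulated unit of work
            }
        }, token);                                      // same token, so the task ends as Canceled

        cts.CancelAfter(50);
        try
        {
            work.Wait();                                // waiting is built in
        }
        catch (AggregateException ex) when (ex.InnerException is OperationCanceledException)
        {
            // Expected: Wait reports the cancellation as a wrapped exception.
        }
        return work.Status;                             // TaskStatus.Canceled
    }
}
```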