The best way to implement massively parallel application in c# - c#

I'm working on a network-bound application, which is supposed to have a lot (hundreds, may be thousands) of parallel processes.
I'm looking for the best way to implement it.
When I tried setting
ThreadPool.SetMaxThreads(int.MaxValue, int.MaxValue);
and than creating 1000 threads and making those do stuff in parallel, application's execution became really jumpy.
I've heard somewhere that delegate.BeginInvoke is somehow better that new Thread(...), so I've tried it, and than opened the app in debugger, and what I've seen are parallel threads.
If I have to create lots and lots of threads, what is the best way to ensure that the application is going to run smoothly?

Have you tried the new await / async pattern in C# 5 / .NET 4.5?
I haven't got sources to hand about how this operates under the hood, but one of the most common use-cases of this new feature is waiting for IO bound stuff.
Threads are not lightweight objects. They are expensive to create and context switch to/from; hence the reason for the Thread Pool (pre-created and recycled). Most common solutions that involve networking or other IO ports utilise lower-level IO Completion Ports (there is a managed library here) to "wait" on a port, but where the thread can continue executing as normal.
BeginInvoke will utilise a Thread Pool thread, so it will be better than creating your own only if a thread is available. This approach, if used too heavily, can immediately result in thread starvation.
Setting such a high thread pool count is not going to work in the long run as threads are too heavy for what it appears you want to do.
Axum, a former Microsoft Research language, used to achieve massive parallelism that would have been suitable for this task. It operated similarly to Stackless Python or Erlang. Lots of concepts from Axum made their way into the parallelism drive into C# 5 and .NET 4.5.

Setting the ThreadPool.SetMaxThreads will only affect how many threads the thread pool has, and it won't make a difference regarding threads you create yourself with new Thread().
Go async (model, not keyword) as suggested by many.

You should follow the advice mentioned in the other answers and comments. As fsimonazzi says, creating new threads directly has nothing to do with the ThreadPool. For a quick test lower the max worker and completionPort threads and use the ThreadPool.QueueUserWorkItem method. The ThreadPool will decide what your system can handle, queue your tasks and resuse threads whenever it can.
If your tasks are not compute-bound then you should also utilize asynchronous I/O. You do not your worker threads to wait for I/O completion. You need those worker threads to return to the pool as quickly as possible and not block on I/O requests.

Related

Does the .Net Threadpool provide any mechanisms to avoid I/O performance degradation when a lot of CPU intensive work is scheduled?

I have a pretty specific question about the .NET threadpool.
I would say I have a pretty fair understanding of the threadpool, but one thing still boggles my mind.
Let's assume I run a web application which serves requests, but also performs a lot of heavy duty CPU-bound work by rendering / editing uploaded media.
Common advice when it comes to separating I/O and CPU bound tasks in an application would be to dispatch the CPU bound work to the .Net ThreadPool. Concrete, that would mean dispatching the call with Task.Run(...) - So far so good.
However, I do wonder, what would happen if this is done for a lot of requests. Let's say several hundreds / thousands, enough to really put an enourmous strain on a machine, and even up to the point the Threadpool just can't handle it anymore. Adding more Threads would obviously go only so far, when your CPU can't handle more. I would say at this point the Threadpool's Threads are also at the mercy of the CPU itself, and the scheduling algorithm.
What implications would this have on I/O bound async operations?
Would this cause I/O bound async operations to struggle with executing their continuation? Given we are in a runtime environment which executes async/await continuations on the Threadpool and discards the SynchronizationContext, what would ensure that these would still execute properly?
Does the Threadpool make any sophisticated assumption as to which Thread receives scheduling priority, to ensure throughput even when it's absolutely polluted with work?
It would be especially interesting to know how ASP.Net Core deals with this, since the request handlers are supposedly Threadpool Threads themselves.
Let's assume I run a web application which serves requests, but also performs a lot of heavy duty CPU-bound work by rendering / editing uploaded media.
Common advice when it comes to separating I/O and CPU bound tasks in an application would be to dispatch the CPU bound work to the .Net ThreadPool. Concrete, that would mean dispatching the call with Task.Run(...) - So far so good.
No, that's bad advice.
ASP.NET is already handling the request on a thread pool thread, so switching to another thread pool thread via Task.Run isn't going to help anything - in fact, it'll make things worse.
Task.Run is fine to offload CPU work to the thread pool when the calling method is a GUI thread. However, it's not a good idea to use Task.Run on ASP.NET, generally speaking.
However, I do wonder, what would happen if this is done for a lot of requests. Let's say several hundreds / thousands, enough to really put an enourmous strain on a machine, and even up to the point the Threadpool just can't handle it anymore. Adding more Threads would obviously go only so far, when your CPU can't handle more.
The thread pool will inject threads whenever the thread pool is over-full. However, the injection rate is limited, so the thread pool grows slowly.
What implications would this have on I/O bound async operations? Would this cause I/O bound async operations to struggle with executing their continuation? ... what would ensure that these would still execute properly?
First off, the I/O requests themselves (and their lowest-level, BCL-internal continuations) are not affected. That's because "the" thread pool is actually two thread pools: there's worker threads (that execute queued work) and there's I/O threads (that enlist in the I/O completion port and handle low-level I/O completion).
However, at some point most continuations do transition to the worker thread pool, so by the time your code continues, it needs a regular thread pool thread to do so. And yes, that means that if the (worker) thread pool is saturated, then that can starve await continuations.
Having ASP.NET handlers do heavy CPU work is unusual. The thread pool does have a lot of knobs to tweak if you do need to support it. And there's always the option of splitting the CPU-bound APIs internally into a separate API, which would give you two different ASP.NET apps: one I/O-bound and the other CPU-bound, which would let you tune the thread pool appropriately for each.

C# program processes in background

I'm currently writing my own socket server (as the console app), and as I am new to multithreading in C# I started wondering about threading and backgrounds tasks. I found some possible alternatives to Threads like (BackgroundWorker, but for UI) or Task...
I have currenctly written a process, which periodically runs in endless while loop, where its checking clients if they are still connected.
As I cannot get opinion from searching on google, so I'm asking, is running processes in background, like my client check, through the Thread and endless loop a proper way, or there are some better ways how to do it?
If you are doing things periodically then a Timer would be suitable. Note that there are several alternatives with different behaviors. The motivation is that timers are made to run things periodically, so the intent becomes obvious when reading the code.
If you are processing items from other threads a Thread looping over a blocking collection is suitable. This can me useful if the processing needs to be in-order or singlethreaded. This will lockup some resources for the thread, so should be used somewhat sparingly.
If you want to run some one-off process-intensive task(s) in the background then a Task is most appropriate. This will use a thread from the threadpool, and return the thread when done. This is also useful if you want to run different things concurrently.
If you need to wait for some IO operation, like disk or network, then tasks + async/await is appropriate. This will not use any thread at all when waiting.
If running some process-intensive work in parallel, then Parallel.For/Foreach is suitable.
I would probably not use BackgroundWorker. This was made during the windows forms era, and is mostly superseded by tasks.

Async-Await vs ThreadPool vs MultiThreading on High-Performance Sockets (C10k Solutions?)

I'm really confused about async-awaits, pools and threads. The main problem starts with this question: "What can I do when I have to handle 10k socket I/O?" (aka The C10k Problem).
First, I tried to make a custom pooling architecture with threads
that uses one main Queue and multiple Threads to process all
incoming datas. It was a great experience about understanding
thread-safety and multi-threading but thread is an overkill
with async-await nowadays.
Later, I implemented a simple architecture with async-await but I
can't understand why "The async and await keywords don't cause
additional threads to be created." (from MSDN)? I think there
must be some threads to do jobs like BackgroundWorker.
Finally, I implemented another architecture with ThreadPool and it
looks like my first custom pooling.
Now, I think there should be someone else with me who confused about handling The C10k. My project is a dedicated (central) server for my game project that is hub/lobby server like MCSG's lobbies or COD's matchmaking servers. I'll do the login operations, game server command executions/queries and information serving (like version, patch).
Last part might be more specific about my project but I really need some good suggestions about real world solutions about multiple (heavy) data handling.
(Also yes, 1k-10k-100k connection handling depending on server hardware but this is a general question)
The key point: Choosing Between the Task Parallel Library and the ThreadPool (MSDN Blog)
[ADDITIONAL] Good (basic) things to read who wants to understand what are we talking about:
Threads
Async, Await
ThreadPool
BackgroundWorker
async/await is roughly analogous to the "Serve many clients with each thread, and use asynchronous I/O and completion notification" approach in your referenced article.
While async and await by themselves do not cause any additional threads, they will make use of thread pool threads if an async method resumes on a thread pool context. Note that the async interaction with ThreadPool is highly optimized; it is very doubtful that you can use Thread or ThreadPool to get the same performance (with a reasonable time for development).
If you can, I'd recommend using an existing protocol - e.g., SignalR. This will greatly simplify your code, since there are many (many) pitfalls to writing your own TCP/IP protocol. SignalR can be self-hosted or hosted on ASP.NET.
No. If we use asynchronous programming pattern that .NET introduced in 4.5, in most of the cases we need not to create manual thread by us. The compiler does the difficult work that the developer used to do. Creating a new thread is costly, it takes time. Unless we need to control a thread, then “Task-based Asynchronous Pattern (TAP)” and “Task Parallel Library (TPL)” is good enough for asynchronous and parallel programming. TAP and TPL uses Task. In general Task uses the thread from ThreadPool(A thread pool is a collection of threads already created and maintained by .NET framework. If we use Task, most of the cases we need not to use thread pool directly. A thread can do many more useful things. You can read more about Thread Pooling
You can avoid performance bottlenecks and enhance the overall responsiveness of your application by using asynchronous programming. Asynchrony is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource sometimes is slow or delayed. If such an activity is blocked within a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
Await is specifically designed to deal with something taking time, most typically an I/O request. Which traditionally was done with a callback when the I/O request was complete. Writing code that relies on these callbacks is quite difficult, await greatly simplifies it. Await just takes care of dealing with the delay, it doesn't otherwise do anything that a thread does. The await expression, what's at the right of the await keyword, is what gets the job done. You can use Async with any method that returns a Task. The XxxxAsync() methods are just precooked ones in the .NET framework for common operations that take time. Like downloading data from a web server.
I would recommend you to read Asynchronous Programming with Async and Await

Tasks vs ThreadPool

I have an application in C# with a list of work to do. I'm looking to do as much of that work as possible in parallel. However I need to be able to control the maximum amount of parallel tasks.
From what I understand this is possible with a ThreadPool or with Tasks. Is there an difference in which one I use? My main concern is being able to control how many threads are active at one time.
Please take a look at ParallelOptions.MaxDegreeOfParallelism for Tasks.
I would advise you to use Tasks, because they provide a higher level abstraction than the ThreadPool.
A very good read on the topic can be found here. Really, a must-have book and it's free on top of that :)
In TPL you can use the WithDegreeOfParallelism on a ParallelEnumerable or ParallelOptions.MaxDegreeOfParallism
There is also the CountdownEvent which may be a better option if you are just using custom threads or tasks.
In the ThreadPool, when you use SetMaxThreads its global for the AppDomain so you could potentially be limiting unrelated code unnecessarily.
You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer.
If the common language runtime is hosted, for example by Internet Information Services (IIS) or SQL Server, the host can limit or prevent changes to the thread pool size.
Use caution when changing the maximum number of threads in the thread pool. While your code might benefit, the changes might have an adverse effect on code libraries you use.
Setting the thread pool size too large can cause performance problems. If too many threads are executing at the same time, the task switching overhead becomes a significant factor.
I agree with the other answer that you should use TPL over the ThreadPool as its a better abstraction of multi-threading, but its possible to accomplish what you want in both.
In this article on msdn, they explain why they recommend Tasks instead of ThreadPool for Parallelism.
Task have a very charming feature to me, you can build chains of tasks. Which are executed on certain results of the task before.
A feature I often use is following: Task A is running in background to do some long running work. I chain Task B after it, only executing when Task A has finished regulary and I configure it to run in the foreground, so I can easily update my controls with the result of long running Task A.
You can also create a semaphore to control how many threads can execute at a single time. You can create a new semaphore and in the constructor specify how many simultaneous threads are able to use that semaphore at a single time. Since I don't know how you are going to be using the threads, this would be a good starting point.
MSDN Article on the Semaphore class
-Wesley

C# - ThreadPool vs Tasks

As some may have seen in .NET 4.0, they've added a new namespace System.Threading.Tasks which basically is what is means, a task. I've only been using it for a few days, from using ThreadPool.
Which one is more efficient and less resource consuming? (Or just better overall?)
The objective of the Tasks namespace is to provide a pluggable architecture to make multi-tasking applications easier to write and more flexible.
The implementation uses a TaskScheduler object to control the handling of tasks. This has virtual methods that you can override to create your own task handling. Methods include for instance
protected virtual void QueueTask(Task task)
public virtual int MaximumConcurrencyLevel
There will be a tiny overhead to using the default implementation as there's a wrapper around the .NET threads implementation, but I'd not expect it to be huge.
There is a (draft) implementation of a custom TaskScheduler that implements multiple tasks on a single thread here.
which one is more efficient and less
resource consuming?
Irrelevant, there will be very little difference.
(Or just better overall)
The Task class will be the easier-to-use as it offers a very clean interface for starting and joining threads, and transfers exceptions. It also supports a (limited) form of load balancing.
"Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code."
http://msdn.microsoft.com/en-us/library/dd460717.aspx
Thread
The bare metal thing, you probably don't need to use it, you probably can use a LongRunning Task and benefit from its facilities.
Tasks
Abstraction above the Threads. It uses the thread pool (unless you specify the task as a LongRunning operation, if so, a new thread is created under the hood for you).
Thread Pool
As the name suggests: a pool of threads. Is the .NET framework handling a limited number of threads for you. Why? Because opening 100 threads to execute expensive CPU operations on a CPU with just 8 cores definitely is not a good idea. The framework will maintain this pool for you, reusing the threads (not creating/killing them at each operation), and executing some of they in parallel in a way that your CPU will not burn.
OK, but when to use each one?
In resume: always use tasks.
Task is an abstratcion, so it is a lot easier to use. I advise you to always try to use Tasks and if you face some problem that makes you need to handle a thread by yourself (probably 1% of the time) then use threads.
BUT be aware that:
I/O Bound: For I/O bound operations (database calls, read/write files, APIs calls, etc) never use normal tasks, use LongRunning tasks or threads if you need to, but not normal tasks. Because it would lead you to a thread pool with a few threads busy and a lot of another tasks waiting for its turn to take the pool.
CPU Bound: For CPU bound operations just use the normal tasks and be happy.
Scheduling is an important aspect of parallel tasks.
Unlike threads, new tasks don't necessarily begin executing immediately. Instead, they are placed in a work queue. Tasks run when their associated task scheduler removes them from the queue, usually as cores become available. The task scheduler attempts to optimize overall throughput by controlling the system's degree of concurrency. As long as there are enough tasks and the tasks are sufficiently free of serializing dependencies, the program's performance scales with the number of available cores. In this way, tasks embody the concept of potential parallelism
As I saw on msdn http://msdn.microsoft.com/en-us/library/ff963549.aspx
ThreadPool and Task difference is very simple.
To understand task you should know about the threadpool.
ThreadPool is basically help to manage and reuse the free threads. In
other words a threadpool is the collection of background thread.
Simple definition of task can be:
Task work asynchronously manages the the unit of work. In easy words
Task doesn’t create new threads. Instead it efficiently manages the
threads of a threadpool.Tasks are executed by TaskScheduler, which queues tasks onto threads.
Another good point to consider about task is, when you use ThreadPool, you don't have any way to abort or wait on the running threads (unless you do it manually in the method of thread), but using task it is possible. Please correct me if I'm wrong

Categories