I want to clarify the difference between these two abbreviations: TPL (Task Parallel Library) and TAP (Task-based Asynchronous Pattern).
AFAIU, TPL is the Task Parallel Library, and the main part of this library is Task and all the related stuff. So it's a technology that was implemented by Microsoft.
TAP is the pattern that underlies the async/await syntactic sugar. It is based on callback functions + a state machine + SynchronizationContext logic.
Is there something to add or correct?
TPL is a part of the BCL. It includes Task as well as several other parallelism-related higher-level abstractions including Parallel and Parallel LINQ. The focus of TPL was parallel processing, and using tasks as futures - while supported - was a relatively unused feature.
TAP is a pattern. It's called "Task-based" because it reused the Task type from the TPL as a generic Future type. Task (and related types) were enhanced to include more primitives to support TAP and asynchronous programming (e.g., GetAwaiter(), Task.WhenAll, etc). These days, TAP also works with "tasklikes" including ValueTask. TAP is focused on asynchronous programming as opposed to parallel processing.
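To make the distinction concrete, here is a minimal sketch, not taken from the answer above. The class and method names are made up for illustration, and Task.Delay stands in for real I/O. The first method uses the TPL for CPU-bound parallel processing; the other two use Task as a future in the TAP style.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TplVersusTap
{
    // TPL-style parallelism: CPU-bound work is partitioned across thread-pool threads.
    public static long SumOfSquaresParallel(int[] values)
    {
        long total = 0;
        Parallel.ForEach(
            values,
            () => 0L,                                            // per-thread subtotal
            (value, loopState, subtotal) => subtotal + (long)value * value,
            subtotal => Interlocked.Add(ref total, subtotal));   // merge subtotals safely
        return total;
    }

    // TAP-style asynchrony: the Task represents an operation that completes later.
    // Task.Delay is an assumption standing in for a network or disk call.
    public static async Task<int> GetLengthAsync(string text)
    {
        await Task.Delay(10);
        return text.Length;
    }

    public static async Task<int> TotalLengthAsync(string[] texts)
    {
        // Start every operation, then asynchronously wait for all of them.
        int[] lengths = await Task.WhenAll(texts.Select(t => GetLengthAsync(t)));
        return lengths.Sum();
    }
}
```

Both halves use the same Task type, but only the second one is "asynchronous" in the TAP sense: no thread is blocked while the operations are pending.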
I'm writing a web scraper in C#. I implemented it using Thread objects that take work from a pool of tasks and process all those elements against a callback and store the result. Pool of tasks means 5K - 50K input urls.
Is there any core framework object that can deal with something like this? I tried to see if the ThreadPool can be instantiated, but it can't. I'm also very unsure whether such a high number of tasks can or should be queued into the default ThreadPool.
So, is there anything already available in the core for creating a large number of tasks and a number of threads and having those threads process those tasks? Or should I just stick to my own? I've already reinvented a few wheels since I've been using C#.
Yes, there is.
http://msdn.microsoft.com/en-us/library/dd460717%28v=vs.110%29.aspx
Task Parallel Library (TPL)
The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.
Tasks are queued onto the Thread pool by default using a TaskScheduler. The scheduler works to promote efficient use of available threads and processors.
You may also be interested in the Dataflow API that sits on top of it.
http://msdn.microsoft.com/en-us/library/hh228603%28v=vs.110%29.aspx
Dataflow
This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available.
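For the scraper in the question, one possible shape is sketched below. It is only an illustration under some assumptions: the work is I/O-bound, fetchAsync is a hypothetical delegate standing in for the real download-and-parse step (for example something built on HttpClient), and BoundedScraper is a made-up name. It uses SemaphoreSlim rather than Dataflow to cap how many URLs are in flight at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedScraper
{
    // fetchAsync is a hypothetical stand-in for the real per-URL work.
    public static async Task<Dictionary<string, string>> ProcessAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetchAsync,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            async Task<KeyValuePair<string, string>> ProcessOneAsync(string url)
            {
                await gate.WaitAsync();          // wait for a free slot
                try
                {
                    string body = await fetchAsync(url);
                    return new KeyValuePair<string, string>(url, body);
                }
                finally
                {
                    gate.Release();              // free the slot for the next URL
                }
            }

            // Distinct() avoids duplicate keys when building the dictionary.
            var pending = urls.Distinct().Select(url => ProcessOneAsync(url));
            KeyValuePair<string, string>[] results = await Task.WhenAll(pending);
            return results.ToDictionary(pair => pair.Key, pair => pair.Value);
        }
    }
}
```

With 5K to 50K URLs this creates one Task per URL up front, which is cheap because a pending Task is just an object and not a thread; only maxConcurrency fetches run at any moment.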
Currently, from what I've researched, there are 3 ways to work with socket asynchronously:
.Net 4.5 Async example: Using .Net 4.5 Async Feature for Socket Programming (second post)
[...]Async: http://msdn.microsoft.com/en-us/library/system.net.sockets.socketasynceventargs.aspx
Begin[...]: http://msdn.microsoft.com/en-us/library/5w7b7x5f(v=vs.110).aspx
I am very confused with all the options .Net provides for working with asynchronous sockets. Why should I use one or the other? What better choice to have performance with thousands of simultaneous connections?
Methods using SocketAsyncEventArgs most closely match the underlying Windows technology (I/O Completion Ports). They are essentially a bare-metal wrapper designed to perform zero allocation and extract the highest performance at the cost of a less friendly API. This has a disadvantage of more tightly coupled code as it doesn't implement any standard Stream API. The other async socket methods all wrap this one.
Methods using a Begin/End pair are using what's called the Asynchronous Programming Model (APM). APM is the original async model of .NET. It's very easy to write spaghetti code if you use it half-cocked, but it's functional and fairly simple to use once you have some experience with it. They shouldn't see much use in modern .NET, though, because we've got something far easier and better performing:
Methods returning a Task are using the Task-based Asynchronous Pattern (TAP). Tasks are a pure upgrade to APM: they're more flexible, easier to compose, and should generally have equal or better performance. When combined with language-integrated async/await, you can write code that performs great and is significantly easier to understand and maintain.
tl;dr use Task methods, unless you've got a requirement of extreme perf. Then use SocketAsyncEventArgs methods. Don't use APM methods.
What better choice to have performance with thousands of simultaneous connections?
...
A curiosity regarding the Begin[...]. If I have a MMORPG server where one connection interacting with each other for position update, animation, effects (basic MMORPG mechanism), in numbers, which would be "heavily loaded servers"? 200~300 simultaneous connections?
On the server side, you may benefit equally well from using any asynchronous socket APIs, either Begin/End-style APM ones, event-based EAP ones or Task-based TAP ones. That's because you'll be blocking fewer threads, as opposed to using the synchronous APIs. So, more threads will be available to concurrently serve other incoming requests to your server, thus increasing its scalability.
Most likely, you won't see any performance advantage of using TAP socket APIs over their APM or EAP analogues. However, the TAP API pattern is so much easier to develop with than APM or EAP. When used with async/await, it produces shorter, more readable and less error-prone code. You get natural pseudo-linear code flow, which is not otherwise possible with APM callbacks or EAP event handlers. If you're unable to find a proper Task-based socket API, you can always make one yourself from a Begin/End APM API with Task.FromAsync (or from an EAP API, check "A reusable pattern to convert event into task").
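As a rough illustration of the Task.FromAsync approach, here is a sketch that wraps a Begin/End pair in a Task. It uses Stream.BeginRead/EndRead instead of a socket so it can run without a network connection; the class and method names are made up. The same overload shape should apply to other Begin/End pairs that take three arguments before the callback and state.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class ApmToTap
{
    // Wraps the APM pair Stream.BeginRead / Stream.EndRead in a Task<int>
    // so that callers can simply await the read.
    public static Task<int> ReadViaApmAsync(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead,      // begin method
            stream.EndRead,        // end method, produces the result
            buffer, 0, buffer.Length,
            null);                 // APM state object, not needed here
    }
}
```

A caller would then write something like int bytesRead = await ApmToTap.ReadViaApmAsync(stream, buffer); and get the pseudo-linear flow described above.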
When it comes to a client-side UI app, scalability is not that important, but there's another benefit from the TAP pattern. With little effort, it helps make your UI responsive, because you won't be blocking the UI thread (which is what usually happens while waiting for the result of a synchronous call). This is not specific to the Task-based Socket API; it applies to any Task-based API, e.g., Task.Delay() or Stream.ReadAsync().
For some good reading materials on asynchronous programming in C#, check the async/await tag wiki:
https://stackoverflow.com/tags/async-await/info
If you have the chance of using .NET 4.5 and async/await, I totally recommend it.
Basically there are these ways of doing multithreading in .NET:
Thread.
ThreadPool.QueueUserWorkItem.
XXXAsync method and the XXXCompleted event.
BeginXXX and EndXXX methods.
Task Parallel Library.
async/await
The first one is raw threads, an option you should avoid because creating threads is an expensive operation. The rest are just different ways of using the ThreadPool, a tool responsible for maintaining a collection of threads that can be used to schedule your tasks, yielding better performance than the first option.
They use different syntaxes, but to me the clearest is async/await. I recently created a WebSocket connector using sockets and async/await, and the performance is quite good. Technically, async/await does not give you a performance boost, but the clarity of the code will allow you to streamline the approach of your application, and that may give a good performance boost in comparison with messy code based on continuations.
First, you might want to check out this article on MSDN about what the differences between the various async programming mechanisms in .NET are.
Begin[…] was the first async socket implementation, using APM (Asynchronous Programming Model). It takes a callback as one of its arguments. While somewhat dated compared to newer methods, this works fine if you don't mind dealing with callbacks and the messy code they can create. There's also some extra overhead associated with this because of the state object, and on heavily loaded servers this can start to become a problem.
[…]Async uses the newer event based model, and is also a lighter implementation to help deal with the high traffic issues Begin[…] has. This way works nicely, but can also result in messy code if you aren't careful. Oh yea, there's a bug you can read about here, though it's likely something you won't care about unless you're building a very performant piece of software.
Task-based asynchronous programming (TPL) is the newest mechanism and, with the help of the async/await keywords, can have most (if not all) of the efficiency associated with […]Async while offering much easier to understand code. Also, with Tasks, it's much easier to wait on multiple operations to finish at a time. It's important to note that, while there are several native .NET functions that implement TPL and return a Task, there isn't yet one for Socket operations. There are examples of how to do this online, but it requires a bit of extra work.
I found Difference between […]Async and Begin[…] .net asynchronous APIs question but this answer confused me a little bit.
Talking about these patterns, Stephen said:
Most *Async methods (with corresponding *Completed events) are using the Event-Based Asynchronous Pattern. The older (but still perfectly valid) Begin* and End* is a pattern called the Asynchronous Programming Model.
The Socket class is an exception to this rule; its *Async methods do not have any corresponding events; it's essentially just APM done in a way to avoid excessive memory allocations.
I get it as using *Async methods are more efficient, at least when it comes to sockets.
But then he mentioned Task Parallel Library:
However, both APM and EBAP are being replaced with a much more flexible approach based on the Task Parallel Library. Since the TPL can wrap APMs easily, older classes will likely not be updated directly; extension methods are used to provide Task equivalents for the old APM methods.
I found TPL and Traditional .NET Asynchronous Programming on MSDN, I know the basics of TPL, creating tasks, cancellations, continuations, etc but I still fail to understand these:
What are the advantages of Asynchronous Programming Model (APM) and Event-based Asynchronous Pattern (EAP) compared to each other? How does TPL can wrap APMs easily mean that both APM and EAP are being replaced with TPL?
And most importantly: Which should I use in socket programming;
APM?
EAP?
APM or EAP wrapped by a Task?
TPL by using the blocking methods of Socket class in tasks?
Other?
How does TPL can wrap APMs easily mean that both APM and EAP are being replaced with TPL?
It doesn't. Whether or not APM and EAP will be replaced by TAP (Task Asynchronous Pattern) in new APIs has nothing to do with this. I would expect TAP to replace APM and EAP for a variety of reasons. The main reason to me is that the code you write for using the TAP composes much better. Doing .ContinueWith(/* ... */).ContinueWith(/* ... */) generally reads much better than the corresponding code you would need to write to chain async calls through Begin/End methods, even if you don't take into account the options you can pass to ContinueWith to determine if the continuation should run. The TPL also provides various combinators for Tasks, such as WaitAll and WaitAny, that can make some scenarios much easier. The language support coming in C# and VB.NET via the async/await keywords will make this even easier.
Being able to wrap APMs in the TAP makes it easier to switch to this pattern because it means you don't have to rewrite existing code to make it fit in the new model.
Which should I use in socket programming?
I would recommend using the TAP wrapping the APM methods on Socket. Unless you can prove that the extra overhead of wrapping the Begin/End methods into a Task is the difference between scalable/fast enough or not, I would take advantage of the ease of coding of the TAP.
Gideon had a great answer; I just wanted to provide some more background:
What are the advantages of Asynchronous Programming Model (APM) and Event-based Asynchronous Pattern (EAP) compared to each other?
APM is more common and it has a pretty strictly-defined pattern. e.g., the TPL has generic wrappers for APM methods (TaskFactory.FromAsync), but it can't do the same for EAP because EAP isn't as strictly-defined.
EAP has one great advantage: the event callbacks handle thread marshaling for you. So they're really nice for, e.g., basic background operations for a UI (BackgroundWorker).
TAP combines the best of both worlds: automatic thread marshaling by default and a strictly-defined, common pattern. It also has a nice object representation for an asynchronous operation (Task).
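Since there is no generic FromAsync-style wrapper for EAP, each EAP component tends to need a small hand-written adapter. The sketch below shows one common way to write it with TaskCompletionSource. LegacyDownloader, DownloadCompletedEventArgs and EapToTap are hypothetical types invented for this example, not framework classes.

```csharp
using System;
using System.ComponentModel;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical EAP-style component: a method that starts the work and an
// event that reports completion.
public sealed class LegacyDownloader
{
    public event EventHandler<DownloadCompletedEventArgs> DownloadCompleted;

    public void DownloadAsync(string name)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            // An empty name is reported as an error through the event args.
            Exception error = string.IsNullOrEmpty(name)
                ? new ArgumentException("name must not be empty")
                : null;
            var args = new DownloadCompletedEventArgs("content of " + name, error);
            DownloadCompleted?.Invoke(this, args);
        });
    }
}

public sealed class DownloadCompletedEventArgs : AsyncCompletedEventArgs
{
    public DownloadCompletedEventArgs(string result, Exception error)
        : base(error, false, null)
    {
        Result = result;
    }

    public string Result { get; }
}

public static class EapToTap
{
    // Adapts the event-based operation to a Task<string>.
    public static Task<string> DownloadTaskAsync(LegacyDownloader downloader, string name)
    {
        var completion = new TaskCompletionSource<string>();
        EventHandler<DownloadCompletedEventArgs> handler = null;
        handler = (sender, e) =>
        {
            downloader.DownloadCompleted -= handler;   // one-shot subscription
            if (e.Error != null) completion.TrySetException(e.Error);
            else if (e.Cancelled) completion.TrySetCanceled();
            else completion.TrySetResult(e.Result);
        };
        downloader.DownloadCompleted += handler;
        downloader.DownloadAsync(name);
        return completion.Task;
    }
}
```

This sketch assumes one operation per downloader instance at a time; a component that allows overlapping operations would also need to match completions to requests, typically through the user state object.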
How does "TPL can wrap APMs easily" mean that "both APM and EAP are being replaced with TPL"?
It doesn't.
"However, both APM and EBAP are being replaced with a much more flexible approach based on the Task Parallel Library." - meaning that new code doesn't need to include APM/EAP methods/events; new code should include TAP methods instead.
"Since the TPL can wrap APMs easily, older classes will likely not be updated directly; extension methods are used to provide Task equivalents for the old APM methods." - meaning that you can add TAP methods to an existing APM type using TaskFactory.FromAsync; I figured the TPL team would take this approach rather than modifying a ton of classes in the BCL. However, I was wrong in this conjecture. For performance reasons, the BCL/TPL team did review the entire framework and added TAP methods directly to .NET classes instead of using extension methods. The new TAP methods are included in .NET 4.5, coming soon...
I am learning asynchronous programming using C# and I usually use BeginInvoke, but I am not very sure about the other methods of creating asynchronous application.
I have asked a question about this; see the link below for more details:
How to return T value from BeginInvoke?
In the above link, Gravell said that there are four models of asynchronous development:
There's at least 4, then - a regular callback (non-APM, non-EAP) is also not uncommon
But Overflow said that there are three:
There are 3 models of asynchronous development in .NET
APM - (BeginXXX / EndXXX) which you are using here, when the long running task completes, it calls back into your code in the EndXXX method
EAP - Event based. In this model, when the long running task completes, an event is raised to inform your code.
TPL - New in .NET 4, this is the Task-based version. It looks most like synchronous programming to client code, using a fluent interface. It calls back to your code using ContinueWith.
Anyone can help me on this?
I have searched google.com a lot, but actually they are using BeginInvoke most. thanks for your help.
Thread.Start - brutal
delegate.BeginInvoke/EndInvoke - 'old' standard
ThreadPool.QueueUserWorkItem - smart
TaskFactory.StartNew - the only way to do it correctly (according to the Patterns of Parallel Programming book; I recommend you read it first for disambiguation)
There's a lot that can be caught in the term "asynchronous development."
For one, you could want to execute code on a background thread. I recently updated a blog post of mine contrasting several common approaches to executing code in the background. Here's the list, in order from most desirable to least:
Task (as used by async/await).
Task (as used by the Task Parallel Library).
BackgroundWorker.
Delegate.BeginInvoke.
ThreadPool.QueueUserWorkItem.
Thread
On another hand, you could want to represent an asynchronous operation (which may or may not be actual code executing on a background thread). In that case, there are several approaches, in order from most desirable to least:
Task (in the style of the Task-based Asynchronous Pattern (TAP))
IAsyncResult with Begin*/End* methods (which has the unfortunate name Asynchronous Programming Model (APM)).
A component written using the Event-based Asynchronous Pattern (EAP).
(As a side note, BackgroundWorker is EAP, and Delegate.BeginInvoke is APM).
On another hand, you could mean asynchronous programming in general, which can be interpreted to mean a reactive approach. In this case, there are only two approaches that I know of:
Reactive Extensions (Rx).
Event-based Asynchronous Pattern (EAP).
However, you could make a case that any event-driven program is reactive to some extent, so just handling UI events is a (simple) form of "asynchronous programming."
Also, these are only the common models. Any platform or library can add more. Here's some off the top of my head:
The Socket class has a special form of APM that can be used to minimize memory allocations. It works very similarly to APM but does not fit the pattern.
The WinRT runtime (coming in Windows 8) has its own representations of asynchronous operations (IAsyncOperation<TResult> and IAsyncInfo).
Windows Phone has specific support for a background agent, which permits you to run code in the background even if your app isn't currently running.
It will most certainly be useful to learn the methods Mikant described for asynchronous development. Just wanted to give you a heads up though that C# 5.0 is completely redesigning how the language deals with async. This will be its main theme along with introducing two new keywords, async and await. You simply call await on a long-running task and it will begin the task and return control to the calling method. Once the task is complete it proceeds with the rest of the code.
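The control flow described above can be sketched roughly as follows. The names are made up for this example, and a TaskCompletionSource stands in for the long-running work so that the order of events is easy to follow.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitFlow
{
    static async Task<string> LongRunningAsync(List<string> log, Task externalWork)
    {
        log.Add("operation started");
        await externalWork;                   // not finished yet: control returns to the caller
        log.Add("operation finished");        // resumes here once externalWork completes
        return "result";
    }

    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        var externalWork = new TaskCompletionSource<bool>();

        // The async method runs synchronously up to its first incomplete await.
        Task<string> pending = LongRunningAsync(log, externalWork.Task);
        log.Add("caller continues");

        externalWork.SetResult(true);         // simulate the long-running work completing
        string value = await pending;
        log.Add("caller got " + value);
        return log;
    }
}
```

Note that await does not start the task by itself; calling the async method does. The await is only the point where the method gives control back if the awaited operation has not finished yet.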
Here is an excellent video for the full details of its usage and explanation. It not only describes the old way of performing async operations but a complete review of the new style. It makes writing async applications a ton easier and much more readable with a natural flow.
This is the future of C# async behavior so well worth learning.
http://channel9.msdn.com/events/PDC/PDC10/FT09/
I am looking for an appropriate pattern and best modern way to solve the following problem:
My application is expecting inputs from multiple sources, for example: GUI, monitoring file-system, voice command, web request, etc. When an input is received I need to send it to some ProcessInput(InputData arg) method that would start processing the data in the background, without blocking the application to receive and process more data, and in some way return some results whenever the processing is complete. Depending on the input, the processing can take significantly different amounts of time. For starters I don't need the ability to check the progress or cancel the processing.
After reading a dozen articles on MSDN and blog posts by some rock-star programmers, I am really confused about what pattern should be used here, and more importantly which features of .NET
My findings are:
ThreadPool.QueueUserWorkItem - easiest to understand, not very convenient for returning the results
BackgroundWorker - seems to be used only for rather simple tasks, all workers run on a single thread?
Event-based Asynchronous Pattern
Tasks in Task Parallel Library
C# 5 async/await - these seem to be shortcuts for Tasks from Task Parallel
Notes:
Performance is important, so taking advantage of multi-core system when possible would be really nice.
This is not a web application.
My problem reminds me of a TCP server (really any sort of server) where the application is constantly listening for new connections/data on multiple sockets. I found the article Asynchronous Server Socket, and I am curious whether that pattern could be a possible solution for me.
My application is expecting inputs from multiple sources, for example: GUI, monitoring file-system, voice command, web request, etc.
I've done a whole lot of asynchronous programming in my time. I find it useful to distinguish between background operations and asynchronous events. A "background operation" is something that you initiate, and some time later it completes. An "asynchronous event" is something that's always going on independent of your program; you can subscribe, receive the events for a time, and then unsubscribe.
So, GUI inputs and file-system monitoring would be examples of asynchronous events; whereas web requests are background operations. Background operations can also be split into CPU-bound (e.g., processing some input in a pipeline) and I/O-bound (e.g., web request).
I make this distinction especially in .NET because different approaches have different strengths and weaknesses. When doing your evaluations, you also need to take into consideration how errors are propagated.
First, the options you've already found:
ThreadPool.QueueUserWorkItem - almost the worst option around. It can only handle background operations (no events), and doesn't handle I/O-bound operations well. Returning results and errors are both manual.
BackgroundWorker (BGW) - not the worst, but definitely not the best. It also only handles background operations (no events), and doesn't handle I/O-bound operations well. Each BGW runs in its own thread - which is bad, because you can't take advantage of the work-stealing self-balancing nature of the thread pool. Furthermore, the completion notifications are (usually) all queued to a single thread, which can cause a bottleneck in very busy systems.
Event-Based Asynchronous Pattern (EAP) - This is the first option from your list that would support asynchronous events as well as background operations, and it also can efficiently handle I/O-bound operations. However, it's somewhat difficult to program correctly, and it has the same problem as BGW where completion notifications are (usually) all queued to a single thread. (Note that BGW is the EAP applied to CPU-bound background operations). I wrote a library to help in writing EAP components, along with some EAP-based sockets. But I do not recommend this approach; there are better options available these days.
Tasks in Task Parallel Library - Task is the best option for background operations, both CPU-bound and I/O-bound. I review several background operation options on my blog - but that blog post does not address asynchronous events at all.
C# 5 async/await - These allow a more natural expression of Task-based background operations. They also offer an easy way to synchronize back to the caller's context if you want to (useful for UI-initiated operations).
Of these options, async/await are the easiest to use, with Task a close second. The problem with those is that they were designed for background operations and not asynchronous events.
Any asynchronous event source may be consumed using asynchronous operations (e.g., Task) as long as you have a sufficient buffer for those events. When you have a buffer, you can just restart the asynchronous operation each time it completes. Some buffers are provided by the OS (e.g., sockets have read buffers, UI windows have message queues, etc), but you may have to provide other buffers yourself.
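One way to picture "a buffer plus an asynchronous operation that is restarted each time it completes" is the loop below. It is only a sketch: it assumes System.Threading.Channels is available (it ships with recent .NET versions and is otherwise a separate package), which the answer itself does not mention, and EventBuffer is a made-up name. The channel plays the role of the buffer that event sources write into.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class EventBuffer
{
    // Consumes buffered inputs until the writer side is marked complete.
    public static async Task<List<string>> ConsumeAsync(ChannelReader<string> reader)
    {
        var handled = new List<string>();

        // Each WaitToReadAsync call is one asynchronous operation; when it
        // completes we drain the buffer and then start the next wait.
        while (await reader.WaitToReadAsync())
        {
            while (reader.TryRead(out string input))
            {
                handled.Add("processed " + input);
            }
        }
        return handled;
    }
}
```

Event handlers (GUI, file-system watcher and so on) would call TryWrite on the channel's writer, so they never block while earlier inputs are still being processed.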
Having said that, here's my recommendations:
Task-based Asynchronous Pattern (TAP) - using either await/async or Task directly, use TAP to model at least your background operations.
TPL Dataflow (part of VS Async) - allows you to set up "pipelines" for data to travel through. Dataflow is based on Tasks. The disadvantage to Dataflow is that it's still developing and (IMO) not as stable as the rest of the Async support.
Reactive Extensions (Rx) - this is the only option that is specifically designed for asynchronous events, not just background operations. It's officially released (unlike VS Async and Dataflow), but the learning curve is steeper.
All three of these options are efficient (using the thread pool for any actual processing), and they all have well-defined semantics for error handling and results. I do recommend using TAP as much as possible; those parts can then easily be integrated into Dataflow or Rx.
You mentioned "voice commands" as one possible input source. You may be interested in a BuildWindows video where Stephen Toub sings -- and uses Dataflow to harmonize his voice in near-realtime. (Stephen Toub is one of the geniuses behind TPL, Dataflow, and Async).
IMO using a thread pool is the way to go WRT processing the input. Take a look at http://smartthreadpool.codeplex.com. It provides a very nice API (using generics) for waiting on results. You could use this in conjunction with Asynchronous Server Socket implementation. It may also be worth your while to take a look at Jeff Richter's Power Threading Lib: http://www.wintellect.com/Resources/Downloads
I am by no means an expert in these matters, but I did some research on the subject recently and I'm very pleased with the results achieved with the MS TPL. Tasks give you a nice wrapper around ThreadPool threads and are optimized for parallel processing, so they give better performance. If you are able to use .NET 4.0 for your project, you should probably explore using tasks. They represent a more advanced way of dealing with async operations and provide a nice way to cancel operations in progress using CancellationToken objects.
Here is a short example of accessing the UI thread from a different thread using tasks:
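private void TaskUse()
{
    // Cold task: it runs on a thread-pool thread once Start() is called.
    var task = new Task<string>(() =>
    {
        Thread.Sleep(5000); // simulate five seconds of blocking work
        return "5 seconds passed!";
    });
    // The continuation is scheduled through the SynchronizationContext captured
    // here, so it runs on the UI thread and can touch the text box safely.
    task.ContinueWith((tResult) =>
    {
        TestTextBox.Text = tResult.Result;
    }, TaskScheduler.FromCurrentSynchronizationContext());
    task.Start();
}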
From the previous example you can see how easy it is to synchronize with the UI thread using TaskScheduler.FromCurrentSynchronizationContext(), assuming you call this method from the UI thread. Tasks also provide an option for blocking operations, such as scenarios where you need to wait for a service response: pass the TaskCreationOptions.LongRunning enum value to the Task constructor. This hints to the scheduler that the operation may block for a long time, so the default scheduler can give it a dedicated thread instead of occupying a thread-pool thread that other tasks are waiting for.