I've been reading a lot about async and await, and at first I didn't get it because I didn't properly understand threading or tasks. But after getting to grips with both I wonder: why use async/await if you're comfortable with threads?
The asynchrony of async/await can be achieved with thread signaling, Thread.Join(), and so on. Is it merely a time saver that means "less" hassle when coding?
Yes, it is syntactic sugar that makes dealing with threads much easier, and it also makes the code easier to maintain, because the thread management is done by the runtime. await releases the thread immediately and allows that thread, or another one, to pick up where the method left off, even when the method is running on the main thread.
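As a minimal sketch (the button handler and statusLabel control are hypothetical, and using System.Threading.Tasks is assumed), the UI thread is released at the await and picks the method back up when the delay completes:

private async void RefreshButton_Click(object sender, EventArgs e)
{
    statusLabel.Text = "Working...";
    await Task.Delay(2000);            // the UI thread is released here, not blocked
    statusLabel.Text = "Done";         // execution resumes on the UI thread where it left off
}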
Like other abstractions, if you want complete control over the mechanisms under the covers, then you are still free to implement similar logic using thread signaling, etc.
If you are interested in seeing what async/await produces then you can use Reflector or ILSpy to decompile the generated code.
Read What does async & await generate? for a description of what C# 5.0 is doing on your behalf.
If await were just calling Task.Wait, we wouldn't need special syntax and new APIs for it. The major difference is that async/await releases the current thread completely while waiting for completion. During an async IO operation there is no thread involved at all; the pending IO is just a small data structure inside the kernel.
async/await uses callback-based waiting under the hood and makes all of its nastiness (think of JavaScript callbacks...) go away.
Note that async does not (in general) just move the work to a background thread; it releases all threads involved.
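A hedged illustration of the difference (the file path and buffer are placeholders): the synchronous call keeps its thread idle inside Read for the whole wait, while the awaited call holds no thread at all until the kernel signals completion.

using System.IO;
using System.Threading.Tasks;

public static class ReadExample
{
    // Blocking: the calling thread sits idle until the disk responds.
    public static int ReadBlocking(string path, byte[] buffer)
    {
        using (var stream = new FileStream(path, FileMode.Open))
            return stream.Read(buffer, 0, buffer.Length);
    }

    // Asynchronous: no thread is tied up while the read is pending.
    public static async Task<int> ReadNonBlockingAsync(string path, byte[] buffer)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
            return await stream.ReadAsync(buffer, 0, buffer.Length);
    }
}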
Comparing async and await with threads is like comparing apples and pipe wrenches. From 10,000 feet they may look similar, but they are very different solutions to very different problems.
async and await are all about asynchronous programming; specifically, allowing a method to pause itself while it's waiting for some operation. When the method pauses, it returns to its caller (usually returning a task, which is completed when the method completes).
I assume you're familiar with threading, which is about managing threads. The closest parallel to a thread in the async world is Task.Run, which starts executing some code on a background thread and returns a task which is completed when that code completes.
async and await were carefully designed to be thread-agnostic. So they work quite well in the UI thread of WPF/Win8/WinForms/Silverlight/WP apps, keeping the UI thread responsive without tying up thread pool resources. They also work quite well in multithreaded scenarios such as ASP.NET.
If you're looking for a good intro to async/await, I wrote up one on my blog which has links to additional recommended reading.
There is a difference between threads and the async/await feature.
Think about a situation where you make a call over the network to get some data. With a synchronous call, the thread that calls into the network driver (probably running in some svchost process) stays blocked and consumes resources while it waits.
With async/await, when the call is I/O-bound rather than CPU-bound, the remainder of the method is wrapped into a callback, and the SynchronizationContext is used to receive the completion callback from the external process. This frees the thread, and the thread becomes available for other work.
Asynchrony and concurrency are two different things: the former is just calling something in an asynchronous way, while the latter is really about CPU-bound work. Threads are generally better when you need that kind of concurrency.
I wrote a blog post a while ago describing these features:
C# 5.0 vNext - New Asynchronous Pattern
async/await does not use threads; that's one of the big advantages. It keeps your application responsive without the added complexity and overhead inherent in threads.
The purpose is to make it easy to keep an application responsive when dealing with long-running, I/O intensive operations. For example, it's great if you have to download a bunch of data from a web site, or read files from disk. Spinning up a new thread (or threads) is overkill in those cases.
The general rule is to use threads via Task.Run when dealing with CPU-bound operations, and async/await when dealing with I/O bound operations.
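A minimal sketch of that rule of thumb (the URL and the summing workload are placeholders):

using System.Net.Http;
using System.Threading.Tasks;

public static class RuleOfThumb
{
    // I/O-bound: await the operation; no thread is blocked during the download.
    public static async Task<string> DownloadPageAsync(HttpClient client, string url)
    {
        return await client.GetStringAsync(url);
    }

    // CPU-bound: offload the computation to a thread-pool thread with Task.Run.
    public static Task<long> SumAsync(int[] numbers)
    {
        return Task.Run(() =>
        {
            long sum = 0;
            foreach (var n in numbers) sum += n;
            return sum;
        });
    }
}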
Stephen Toub has a great blog post on async/await that I recommend you read.
Related
I want to execute a library method that does a blocking I/O operation many times (up to 67840 calls). The library does not provide an async version of the method.
Since in most cases the call just waits for a timeout, I want to run multiple calls in parallel. My method is async, therefore it would be good if I could await the result.
Since the ThreadPool should not be used for blocking operations, I would like to do the following:
Start a number of threads (e.g. 1024)
Run the blocking calls on these threads
await the completion (e.g., via TaskCompletionSource) and process the result of each call in normal Tasks on the ThreadPool
Are there existing classes in .NET with which I could achieve something like this? I am aware of TaskCreationOptions.LongRunning, but as far as I can see this would create a new thread for each call.
blocking I/O operation... The library does not provide an async version of the method.
Just from this, you know you won't end up with an "ideal" solution. Ideally, I/O is performed asynchronously. In fact, on Windows, all I/O is performed asynchronously at the OS level, with each synchronous API call just blocking the current thread until that asynchronous operation completes.
So, the first thing you should accept is that you'll need to bend the rules a little.
Since in most cases the call just waits for a timeout, I want to run multiple calls in parallel.
Yes. Parallelism is an appropriate solution. If it were possible to do the I/O asynchronously, then parallelism would not be the appropriate solution, but since the I/O is blocking (and you have no control over that), then parallelism is the best solution you're left with.
My method is async, therefore it would be good if I could await the result.
This doesn't necessarily follow. It's acceptable for asynchronous methods to be partially blocking, as long as that's clearly documented. The asynchronous signature (i.e., "returns a Task" and has an *Async suffix) implies that the method may be asynchronous, not that it must be asynchronous.
Personally, I prefer not to do thread offloading in my logic methods, and only do it when calling them from the UI layer (link to my blog).
Since the ThreadPool should not be used for blocking operations
Well, this is one of those rules you can consider bending. The thread pool does work just fine with blocking operations, and in fact it's my first suggested solution.
Start a number of threads (e.g. 1024)... Run the blocking calls on these threads
If you toss out the "I want my own threads" part and just use the thread pool, then the answer is quite simple: Parallel or PLINQ would work quite nicely. You can set a maximum level of parallelism for both of these approaches, and you can set a larger than normal minimum thread count on the thread pool to scale up the number of threads more quickly if you want.
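A hedged sketch of that approach, where blockingScan stands in for the library's blocking call and endpoints for its 67,840 inputs:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelScanner
{
    public static List<string> ScanAll(IEnumerable<string> endpoints, Func<string, string> blockingScan)
    {
        // Let the pool grow to the desired size quickly instead of injecting threads slowly.
        ThreadPool.SetMinThreads(256, 256);

        var results = new ConcurrentBag<string>();
        Parallel.ForEach(
            endpoints,
            new ParallelOptions { MaxDegreeOfParallelism = 256 },
            endpoint => results.Add(blockingScan(endpoint)));   // blocking library call runs on a pool thread
        return new List<string>(results);
    }
}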
This does toss a lot of blocking work on the thread pool, which is generally not recommended but can work in some scenarios. Specifically, client applications like console apps or GUI apps would work fine with this. If this is in a web app, though, then you would not want to fill up the thread pool with blocking calls. In that case, I'd actually recommend splitting up the scanning to a separate app using a basic distributed architecture (link to my blog).
await the completion (e.g., via TaskCompletionSource) and process the result of each call in normal Tasks on the ThreadPool
If you want to do the parallel work on a separate thread, then you can wrap it in await Task.Run(...); mucking around with TCS isn't necessary.
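Continuing the sketch above (same usings, plus System), the whole parallel loop can simply be offloaded and awaited, with no TaskCompletionSource involved; ScanAll is the hypothetical method from the previous sketch:

public static async Task ScanAndProcessAsync(IEnumerable<string> endpoints, Func<string, string> blockingScan)
{
    // Offload the blocking, parallel work to the thread pool and await it.
    var results = await Task.Run(() => ParallelScanner.ScanAll(endpoints, blockingScan));

    foreach (var result in results)
        Console.WriteLine(result);   // continue on the original context with the results
}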
I'm really confused about async/await, pools, and threads. The main problem starts with this question: "What can I do when I have to handle 10k socket I/O operations?" (aka the C10k problem).
First, I tried to build a custom pooling architecture with threads that uses one main queue and multiple threads to process all incoming data. It was a great experience for understanding thread safety and multithreading, but raw threads feel like overkill now that async/await exists.
Later, I implemented a simple architecture with async/await, but I can't understand why "The async and await keywords don't cause additional threads to be created" (from MSDN). I thought there must be some threads doing the work, like a BackgroundWorker.
Finally, I implemented another architecture with ThreadPool, and it looks a lot like my first custom pool.
Now, I suspect I'm not the only one confused about handling the C10k problem. My project is a dedicated (central) server for my game, a hub/lobby server like MCSG's lobbies or COD's matchmaking servers. It will handle login operations, game-server command execution/queries, and information serving (such as version and patch data).
That last part is specific to my project, but I really need some good suggestions about real-world solutions for handling large amounts of (heavy) data from many connections.
(Also, yes, handling 1k/10k/100k connections depends on the server hardware, but this is a general question.)
The key point: Choosing Between the Task Parallel Library and the ThreadPool (MSDN Blog)
[ADDITIONAL] Good (basic) things to read for anyone who wants to understand what we are talking about:
Threads
Async, Await
ThreadPool
BackgroundWorker
async/await is roughly analogous to the "Serve many clients with each thread, and use asynchronous I/O and completion notification" approach in your referenced article.
While async and await by themselves do not cause any additional threads, they will make use of thread pool threads if an async method resumes on a thread pool context. Note that the async interaction with ThreadPool is highly optimized; it is very doubtful that you can use Thread or ThreadPool to get the same performance (with a reasonable time for development).
If you can, I'd recommend using an existing protocol - e.g., SignalR. This will greatly simplify your code, since there are many (many) pitfalls to writing your own TCP/IP protocol. SignalR can be self-hosted or hosted on ASP.NET.
No. If we use the asynchronous programming pattern that .NET introduced in 4.5, in most cases we do not need to create threads manually; the compiler does the difficult work that the developer used to do. Creating a new thread is costly: it takes time. Unless we need to control a thread directly, the Task-based Asynchronous Pattern (TAP) and the Task Parallel Library (TPL) are good enough for asynchronous and parallel programming. TAP and TPL use Task. In general a Task uses a thread from the ThreadPool (a collection of threads already created and maintained by the .NET Framework). If we use Task, in most cases we do not need to use the thread pool directly, and the freed-up threads can do many more useful things. You can read more about thread pooling.
You can avoid performance bottlenecks and enhance the overall responsiveness of your application by using asynchronous programming. Asynchrony is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource sometimes is slow or delayed. If such an activity is blocked within a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
Await is specifically designed to deal with something that takes time, most typically an I/O request. Traditionally that was done with a callback that ran when the I/O request completed; writing code that relies on those callbacks is quite difficult, and await greatly simplifies it. Await just takes care of dealing with the delay; it doesn't otherwise do anything that a thread does. The await expression, what's to the right of the await keyword, is what gets the job done. You can use await with any method that returns a Task. The XxxxAsync() methods are just pre-cooked ones in the .NET Framework for common operations that take time, like downloading data from a web server.
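A minimal sketch of that point: PauseAsync below is a hypothetical Task-returning method of your own, and awaiting it looks exactly like awaiting a framework XxxxAsync() call.

using System;
using System.Threading.Tasks;

public static class AwaitAnything
{
    // Your own Task-returning method; Task.Delay means no thread is tied up during the wait.
    public static Task PauseAsync(TimeSpan delay)
    {
        return Task.Delay(delay);
    }

    public static async Task RunAsync()
    {
        await PauseAsync(TimeSpan.FromSeconds(2));   // the expression to the right of 'await' does the work
        Console.WriteLine("Two seconds later, and no thread was blocked in the meantime.");
    }
}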
I would recommend reading Asynchronous Programming with Async and Await.
As per my understanding, async/await will use a ThreadPool thread to perform the asynchronous operation, and we prefer ThreadPool threads when the operation will finish within a short span of time, so the pool threads are freed quickly.
So if we use async/await or Task for downloading a huge amount of data, will it impact application performance, since the ThreadPool thread will not be freed early and the ThreadPool will have to create new threads (which is an expensive operation)?
One more thing: if async/await is not preferable in the above scenario, what should the alternative be for downloading a huge amount of data? Should we create a new thread explicitly?
Please share your thoughts. Thanks in advance.
Async IO does not use threads while it runs. That's the point.
Async IO does not make the IO itself faster; it only changes the way the operation is started and completed. It will gain you zero throughput for your big file download.
A small correction: according to the documentation, and as I have explained, thread pools do not have the overhead of creating a thread for every work item. Hence they provide the advantage of avoiding thread-creation and thread-disposal overhead.
Quoted:
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are put in queue until they can be serviced as threads become available.
So yes, having MANY downloads running simultaneously COULD outnumber the available threads in the ThreadPool.
Finally, your main question: yes, async/await is a good solution for a file download. Here is a good tutorial I used some time back.
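As a hedged sketch of such a download (the URL and path are placeholders), the data is streamed to disk and no thread is blocked while bytes are in flight:

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class Downloader
{
    public static async Task DownloadToFileAsync(HttpClient client, string url, string path)
    {
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        using (var source = await response.Content.ReadAsStreamAsync())
        using (var target = File.Create(path))
        {
            await source.CopyToAsync(target);   // completion-based I/O; pool threads stay free
        }
    }
}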
You should absolutely use async/await in this case. Using async/await does not block the calling thread, so it does not cause the creation of new threads.
The async and await keywords don't cause additional threads to be created. Async methods don't require multithreading because an async method doesn't run on its own thread.
And you are asking about an IO operation; async/await is a perfect fit for this:
The async-based approach to asynchronous programming is preferable to existing approaches in almost every case. In particular, this approach is better than BackgroundWorker for IO-bound operations because the code is simpler and you don't have to guard against race conditions.
The MSDN article has more details.
I've been reading a lot lately about this topic, and I still need to clarify something.
The whole idea of asynchronous methods is thread economy:
Allow many tasks to run on a few threads. This is done by letting the hardware driver do the job while releasing the thread back to the thread pool so it can serve other jobs.
Please notice: I'm not talking about asynchronous delegates, which tie up another thread (executing a task in parallel with the caller).
However, I've seen two main types of asynchronous method examples:
Code samples (from books) that only use existing asynchronous I/O operations such as BeginXXX/EndXXX, e.g., Stream.BeginRead.
(I couldn't find any asynchronous method samples that don't use existing .NET I/O operations such as Stream.BeginRead.)
Code samples like this (and this), which don't actually invoke an asynchronous operation (although the author thinks he does, he actually causes a thread to block!).
Question:
Are asynchronous methods used only with existing .NET I/O methods like BeginXXX/EndXXX?
I mean, if I want to create my own asynchronous methods like BeginMyDelay(int ms,...){..} and EndMyDelay(...), I couldn't do it without tying a blocked thread to them... correct?
Thank you very much.
P.S. Please note this question is tagged as .NET 4, not .NET 4.5.
You're talking about APM.
APM relies heavily on an OS concept known as I/O completion ports. That's why I/O operations are the best candidates for APM.
You could write your own APM methods.
But in practice these methods will either be built over existing APM methods, or they will be IO-bound and use some native OS mechanism (like FileStream, which uses overlapped file IO).
For compute-bound asynchronous operations, APM will only increase complexity, IMO.
A bit more clarification.
Working with hardware is asynchronous by nature. Hardware needs time to perform a request - the network card must send or receive data, the HDD must read/write, etc. If the IO is synchronous, the thread that issued the IO request waits for the response. This is where APM helps - you don't have to wait, just execute something else, and "when the IO is complete, I'll call you back," says APM.
The main point: the operation is performed outside the CPU.
When you're writing a compute-bound operation that uses the CPU for its execution, without any IO, there's nothing to wait for. So APM can't help - if you need CPU, you need a thread, which means you need the thread pool.
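To make that concrete for the question's example: a time-based operation is not compute-bound, so it can be exposed through APM without blocking any thread. Here is a minimal sketch for .NET 4 (the BeginMyDelay/EndMyDelay names come from the question) built on System.Threading.Timer; Task implements IAsyncResult, so a TaskCompletionSource can back the pair:

using System;
using System.Threading;
using System.Threading.Tasks;

public static class MyDelay
{
    public static IAsyncResult BeginMyDelay(int ms, AsyncCallback callback, object state)
    {
        var tcs = new TaskCompletionSource<bool>(state);
        Timer timer = null;
        timer = new Timer(_ =>
        {
            timer.Dispose();                            // one-shot timer
            tcs.TrySetResult(true);                     // complete without blocking any thread
            if (callback != null) callback(tcs.Task);
        });
        timer.Change(ms, Timeout.Infinite);             // start only after 'timer' is assigned
        return tcs.Task;
    }

    public static void EndMyDelay(IAsyncResult asyncResult)
    {
        ((Task)asyncResult).Wait();                     // already completed in normal use; rethrows exceptions
    }
}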
I think, but I'm not sure, that you can create your own asynchronous methods, for example by creating a new thread and waiting for it to finish some work (a DB query, ...).
In terms of overall system performance it is probably not useful, since as you say you just create another thread. But, for example, if you are running on IIS, the original request thread can be used for other requests while you are waiting for the "background" operation.
I think that IIS has a fixed number of threads (a thread pool), so in that case it can be useful.
I mean, if I want to create my own asynchronous methods like BeginMyDelay(int ms,...){..} and EndMyDelay(...), I couldn't do it without tying a blocked thread to them... correct?
While I've not dug into the implementation of async, I can't see any reason why one couldn't do this.
The simplest way would be to use existing libraries that help [e.g. timers] or some sort of event system IIRC.
However, even if you don't want to use any library helpers, you're still stuck with a problem... the "blocked thread".
Sure, the code does look something like this:
while (true)
{
    foreach (var item in WaitingTasks)
        if (item.Ready())
            /* fire item, and remove it from tasks */;
    /* Some blocking action */
}
Thing is - 'Some blocking action' doesn't have to be 'blocking'. You could yield/sleep the thread, or use it to process some data. For example, the Unity Game Engine does a similar thing with Coroutines - where the same thread that processes all the code also checks to see if various coroutines [that have been delayed due to time] need to be updated. Replace /*Some blocking action*/ with ProcessGameLoop().
Hope that helps; feel free to ask questions/post corrections, etc.
I am looking for an appropriate pattern and best modern way to solve the following problem:
My application is expecting inputs from multiple sources, for example: GUI, monitoring file-system, voice command, web request, etc. When an input is received I need to send it to some ProcessInput(InputData arg) method that would start processing the data in the background, without blocking the application from receiving and processing more data, and somehow return results whenever the processing is complete. Depending on the input, the processing can take significantly different amounts of time. For starters, I don't need the ability to check progress or cancel the processing.
After reading a dozen articles on MSDN and blog posts by some rock-star programmers, I am really confused about which pattern should be used here and, more importantly, which features of .NET.
My findings are:
ThreadPool.QueueUserWorkItem - easiest to understand, but not very convenient for returning results
BackgroundWorker - seems to be used only for rather simple tasks; do all workers run on a single thread?
Event-based Asynchronous Pattern
Tasks in Task Parallel Library
C# 5 async/await - these seem to be shortcuts for Tasks from the Task Parallel Library
Notes:
Performance is important, so taking advantage of multi-core systems when possible would be really nice.
This is not a web application.
My problem reminds me of a TCP server (really any sort of server) where the application is constantly listening for new connections/data on multiple sockets. I found the article Asynchronous Server Socket, and I am curious whether that pattern could be a possible solution for me.
My application is expecting inputs from multiple sources, for example: GUI, monitoring file-system, voice command, web request, etc.
I've done a whole lot of asynchronous programming in my time. I find it useful to distinguish between background operations and asynchronous events. A "background operation" is something that you initiate, and some time later it completes. An "asynchronous event" is something that's always going on independent of your program; you can subscribe, receive the events for a time, and then unsubscribe.
So, GUI inputs and file-system monitoring would be examples of asynchronous events; whereas web requests are background operations. Background operations can also be split into CPU-bound (e.g., processing some input in a pipeline) and I/O-bound (e.g., web request).
I make this distinction especially in .NET because different approaches have different strengths and weaknesses. When doing your evaluations, you also need to take into consideration how errors are propagated.
First, the options you've already found:
ThreadPool.QueueUserWorkItem - almost the worst option around. It can only handle background operations (no events), and doesn't handle I/O-bound operations well. Returning results and errors are both manual.
BackgroundWorker (BGW) - not the worst, but definitely not the best. It also only handles background operations (no events), and doesn't handle I/O-bound operations well. Each BGW runs in its own thread - which is bad, because you can't take advantage of the work-stealing self-balancing nature of the thread pool. Furthermore, the completion notifications are (usually) all queued to a single thread, which can cause a bottleneck in very busy systems.
Event-Based Asynchronous Pattern (EAP) - This is the first option from your list that would support asynchronous events as well as background operations, and it also can efficiently handle I/O-bound operations. However, it's somewhat difficult to program correctly, and it has the same problem as BGW where completion notifications are (usually) all queued to a single thread. (Note that BGW is the EAP applied to CPU-bound background operations). I wrote a library to help in writing EAP components, along with some EAP-based sockets. But I do not recommend this approach; there are better options available these days.
Tasks in Task Parallel Library - Task is the best option for background operations, both CPU-bound and I/O-bound. I review several background operation options on my blog - but that blog post does not address asynchronous events at all.
C# 5 async/await - These allow a more natural expression of Task-based background operations. They also offer an easy way to synchronize back to the caller's context if you want to (useful for UI-initiated operations).
Of these options, async/await are the easiest to use, with Task a close second. The problem with those is that they were designed for background operations and not asynchronous events.
Any asynchronous event source may be consumed using asynchronous operations (e.g., Task) as long as you have a sufficient buffer for those events. When you have a buffer, you can just restart the asynchronous operation each time it completes. Some buffers are provided by the OS (e.g., sockets have read buffers, UI windows have message queues, etc), but you may have to provide other buffers yourself.
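A minimal sketch of that buffering/restart pattern for a socket-like stream (the stream and the onData handler are placeholders):

using System;
using System.IO;
using System.Threading.Tasks;

public static class EventPump
{
    public static async Task ReadLoopAsync(Stream stream, Action<byte[], int> onData)
    {
        var buffer = new byte[4096];
        while (true)
        {
            // Restart the asynchronous read each time the previous one completes.
            int bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length);
            if (bytesRead == 0) break;        // the remote side closed the connection
            onData(buffer, bytesRead);        // surface the "event" to the application
        }
    }
}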
Having said that, here's my recommendations:
Task-based Asynchronous Pattern (TAP) - using either await/async or Task directly, use TAP to model at least your background operations.
TPL Dataflow (part of VS Async) - allows you to set up "pipelines" for data to travel through. Dataflow is based on Tasks. The disadvantage to Dataflow is that it's still developing and (IMO) not as stable as the rest of the Async support.
Reactive Extensions (Rx) - this is the only option that is specifically designed for asynchronous events, not just background operations. It's officially released (unlike VS Async and Dataflow), but the learning curve is steeper.
All three of these options are efficient (using the thread pool for any actual processing), and they all have well-defined semantics for error handling and results. I do recommend using TAP as much as possible; those parts can then easily be integrated into Dataflow or Rx.
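As a hedged sketch of how TAP and Dataflow can meet (InputData and the processing body are stand-ins for the question's ProcessInput), an ActionBlock gives every input source a single place to post work, and the processing runs on thread-pool threads:

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public class InputData
{
    public string Payload { get; set; }
}

public static class InputPipeline
{
    public static ActionBlock<InputData> Create()
    {
        return new ActionBlock<InputData>(
            async input =>
            {
                await Task.Delay(100);               // stand-in for real CPU- or I/O-bound processing
                Console.WriteLine(input.Payload);
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });
    }
}

// Each source (GUI handler, FileSystemWatcher event, web request, ...) simply posts:
// pipeline.Post(new InputData { Payload = "..." });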
You mentioned "voice commands" as one possible input source. You may be interested in a BuildWindows video where Stephen Toub sings -- and uses Dataflow to harmonize his voice in near-realtime. (Stephen Toub is one of the geniuses behind TPL, Dataflow, and Async).
IMO, using a thread pool is the way to go for processing the input. Take a look at http://smartthreadpool.codeplex.com. It provides a very nice API (using generics) for waiting on results. You could use this in conjunction with an Asynchronous Server Socket implementation. It may also be worth your while to take a look at Jeff Richter's Power Threading library: http://www.wintellect.com/Resources/Downloads
I am by no means an expert in these matters, but I did some research on the subject recently and I'm very pleased with the results I achieved with the TPL. Tasks give you a nice wrapper around ThreadPool threads and are optimized for parallel processing, so they offer better performance. If you are able to use .NET 4.0 for your project, you should probably explore using tasks. They represent a more advanced way of dealing with async operations and provide a nice way to cancel operations in progress using CancellationToken objects.
Here is a short example of accessing the UI thread from a different thread using tasks:
private void TaskUse()
{
    var task = new Task<string>(() =>
    {
        Thread.Sleep(5000);
        return "5 seconds passed!";
    });

    task.ContinueWith(tResult =>
    {
        TestTextBox.Text = tResult.Result;
    }, TaskScheduler.FromCurrentSynchronizationContext());

    task.Start();
}
From the previous example you can see how easy it is to synchronize with the UI thread using TaskScheduler.FromCurrentSynchronizationContext(), assuming you call this method from the UI thread. Tasks also provide a hint for blocking operations, such as scenarios where you need to wait for a service response, via the TaskCreationOptions.LongRunning enum value in the Task constructor. This tells the scheduler that the operation may block for a long time, so it doesn't tie up one of the regular worker threads, whose count is normally tuned to the number of processor cores.
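For comparison, a hedged sketch of the same flow written with async/await (C# 5 / .NET 4.5): the continuation back onto the UI thread is implicit, so no explicit TaskScheduler is needed.

private async void TaskUseAsync()
{
    await Task.Delay(5000);                   // does not block the UI thread
    TestTextBox.Text = "5 seconds passed!";   // resumes on the captured UI context
}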