Deep understanding of async / await on ASP.NET MVC - c#

I don't understand exactly what is going on behind the scenes when I have an async action on an MVC controller especially when dealing with I/O operations. Let's say I have an upload action:
public async Task<ActionResult> Upload (HttpPostedFileBase file) {
....
await ReadFile(file);
...
}
From what I know these are the basic steps that happen:
A new thread is peeked from threadpool and assigned to handle incomming request.
When await gets hit, if the call is an I/O operation then the original thread gets back into pool and the control is transfered to a so-called IOCP (Input output completion port). What I do not understand is why the request is still alive and waits for an answer because in the end the calling client will wait for our request to complete.
My question is: Who / when / how does this wait for complete blocking occurs?
Note: I saw the blog post There Is No Thread, and it makes sense for GUI applications, but for this server side scenario I don't get it. Really.

There's some good resources on the 'net that do describe this in detail. I wrote an MSDN article that describes this at a high level.
What i do not understand is why the request is still alive and waits for an answer because in the end the calling client will wait for our request to complete.
It's still alive because the ASP.NET runtime has not yet completed it. Completing the request (by sending the response) is an explicit action; it's not like the request will complete on its own. When ASP.NET sees that the controller action returns a Task/Task<T>, it will not complete the request until that task completes.
My question is: Who / when / how does this wait for complete blocking occurs ?
Nothing is waiting.
Think of it this way: ASP.NET has a collection of current requests that it's processing. For a given request, as soon as it's complete, the response is sent out and then that request is removed from the collection.
The key is that it's a collection of requests, not threads. Each of those requests may or may not have a thread working on it at any point in time. Synchronous requests always have a single thread (the same thread). Asynchronous requests may have periods when they don't have threads.
Note: i saw this thread: http://blog.stephencleary.com/2013/11/there-is-no-thread.html and it makes sense for GUI applications but for this server side scenario I don't get it.
The threadless approach to I/O works exactly the same for ASP.NET apps as it does for GUI apps.
Eventually, the file write will complete, which (eventually) completes the task returned from ReadFile. This "completing of the task" work is normally done with a thread pool thread. Since the task is now complete, the Upload action will continue executing, causing that thread to enter the request context (that is, there is now a thread executing that request again). When the Upload method is complete, then the task returned from Upload is complete, and ASP.NET writes out the response and removes the request from its collection.

Under the hood the compiler performs a sleight of hand and transforms your async \ await code into a Task-based code with a callback. In the most simple case:
public async Task X()
{
A();
await B();
C();
}
Gets changed into something like:
public Task X()
{
A();
return B().ContinueWith(()=>{ C(); })
}
So there's no magic - just a lot of Tasks and callbacks. For more complex code the transformations will be more complex too, but in the end the resulting code will be logically equivalent to what you wrote. If you want, you can take one of ILSpy/Reflector/JustDecompile and see for yourself what is compiled "under the hood".
ASP.NET MVC infrastructure in turn is intelligent enough to recognize if your action method is a normal one, or a Task based one, and alter its behavior in turn. Therefore the request doesn't "disappear".
One common misconception is that everything with async spawns another thread. In fact, it's mostly the opposite. At the end of the long chain of the async Task methods there is normally a method which performs some asynchronous IO operation (such as reading from disk or communicating via network), which is a magical thing performed by Windows itself. For the duration of this operation, there is no thread associated with the code at all - it's effectively halted. After the operation completes however, Windows calls back and then a thread from the thread pool is assigned to continue the execution. There's a bit of framework code involved to preserve the HttpContext of the request, but that's all.

The ASP.NET runtime understands what tasks are and delays sending the HTTP response until the task is done. In fact the Task.Result value is needed in order to even generate a response.
The runtime basically does this:
var t = Upload(...);
t.ContinueWith(_ => SendResponse(t));
So when your await is hit both your code and the runtimes code gets off the stack and "there is no thread" at that point. The ContinueWith callback revives the request and sends the response.

Related

How does a thread that launches a blocking I/O request under TPL return immediately?

I would like to preface this question with the following:
I'm familiar with the IAsyncStateMachine implementation that the await keyword in C# generates.
My question is not about the basic flow of control that ensures when you use the async and await keywords.
Assumption A
The default threading behaviour in any threading environment, whether it be at the Windows operating system level or in POSIX systems or in the .NET thread pool, has been that when a thread makes a request for an I/O bound operation, say for a disk read, it issues the request to the disk device driver and enters a waiting state. Of course, I am glossing over the details because they are not of moment to our discussion.
Importantly, that thread can do nothing useful until it is unblocked by an interrupt from the device driver notifying it of completion. During this time, the thread remains on the wait queue and cannot be re-used for any other work.
I would first like a confirmation of the above description.
Assumption B
Secondly, even with the introduction of TPL, and its enhancements done in v4.5 of the .NET framework, and with the language level support for asynchronous operations involving tasks, this default behaviour described in Assumption A has not changed.
Question
Then, I'm at a loss trying to reconcile Assumptions A and B with the claim that suddenly emerged in all TPL literature that:
When the, say, main thread, starts this request for this I/O bound
work, it immediately returns and continues executing the rest of
the queued up messages in the message pump.
Well, what makes that thread return back to do other work? Isn't that thread supposed to be in the waiting state in the wait queue?
You might be tempted to reply that the code in the state machine launches the task awaiter and if the awaiter hasn't completed, the main thread returns.
That beggars the question -- what thread does the awaiter run on?
And the answer that springs up to mind is: whatever the implementation of the method be, of whose task it is awaiting.
That drives us down the rabbit hole further until we reach the last of such implementations that actually delivers the I/O request.
Where is that part of the source code in the .NET framework that changes this underlying fundamental mechanism about how threads work?
Side Note
While some blocking asynchronous methods such as WebClient.DownloadDataTaskAsync, if one were to follow their code
through their (the method's and not one's own) oval tract into their
intestines, one would see that they ultimately either execute the
download synchronously, blocking the current thread if the operation
was requested to be performed synchronously
(Task.RunSynchronously()) or if requested asynchronously, they
offload the blocking I/O bound call to a thread pool thread using the
Asynchronous Programming Model (APM) Begin and End methods.
This surely will cause the main thread to return immediately because
it just offloaded blocking I/O work to a thread pool thread, thereby
adding approximately diddlysquat to the application's scalability.
But this was a case where, within the bowels of the beast, the work
was secretly offloaded to a thread pool thread. In the case of an API
that doesn't do that, say an API that looks like this:
public async Task<string> GetDataAsync()
{
var tcs = new TaskCompletionSource<string>();
// If GetDataInternalAsync makes the network request
// on the same thread as the calling thread, it will block, right?
// How then do they claim that the thread will return immediately?
// If you look inside the state machine, it just asks the TaskAwaiter
// if it completed the task, and if it hasn't it registers a continuation
// and comes back. But that implies that the awaiter is on another thread
// and that thread is happily sleeping until it gets a kick in the butt
// from a wait handle, right?
// So, the only way would be to delegate the making of the request
// to a thread pool thread, in which case, we have not really improved
// scalability but only improved responsiveness of the main/UI thread
var s = await GetDataInternalAsync();
tcs.SetResult(s); // omitting SetException and
// cancellation for the sake of brevity
return tcs.Task;
}
Please be gentle with me if my question appears to be nonsensical. The extent of knowledge of things in almost all matters is limited. I am just learning anything.
When you are talking about an async I/O operation, the truth, as pointed out here by Stephen Cleary (http://blog.stephencleary.com/2013/11/there-is-no-thread.html) is that there is no thread. An async I/O operation is completed at a lower level than the threading model. It generally occurs within interrupt handler routines. Therefore, there is no I/O thread handling the request.
You ask how a thread that launches a blocking I/O request returns immediately. The answer is because an I/O request is not at its core actually blocking. You could block a thread such that you are intentionally saying not to do anything else until that I/O request finishes, but it was never the I/O that was blocking, it was the thread deciding to spin (or possibly yield its time slice).
The thread returns immediately because nothing has to sit there polling or querying the I/O operation. That is the core of true asynchronicity. An I/O request is made, and ultimately the completion bubbles up from an ISR. Yes, this may bubble up into the thread pool to set the task completion, but that happens in a nearly imperceptible amount of time. The work itself never had to be ran on a thread. The request itself may have been issued from a thread, but as it is an asynchronous request, the thread can immediately return.
Let's forget C# for a moment. Lets say I am writing some embedded code and I request data from a SPI bus. I send the request, continue my main loop, and when the SPI data is ready, an ISR is triggered. My main loop resumes immediately precisely because my request is asynchronous. All it has to do is push some data into a shift register and continue on. When data is ready for me to read back, an interrupt triggers. This is not running on a thread. It may interrupt a thread to complete the ISR, but you could not say that it actually ran on that thread. Just because its C#, this process is not ultimately any different.
Similarly, lets say I want to transfer data over USB. I place the data in a DMA location, set a flag to tell the bus to transfer my URB, and then immediately return. When I get a response back it also is moved into memory, an interrupt occurs and sets a flag to let the system know hey, heres a packet of data sitting in a buffer for you.
So once again, I/O is never truly blocking. It could appear to block, but that is not what is happening at the low level. It is higher level processes that may decide that an I/O operation has to happen synchronously with some other code. This is not to say of course that I/O is instant. Just that the CPU is not stuck doing work to service the I/O. It COULD block if implemented that way, and this COULD involve threads. But that is not how async I/O is implemented.

ASP.NET Web API 2 Async action methods with Task.Run performance

I'm trying to benchmark (using Apache bench) a couple of ASP.NET Web API 2.0 endpoints. One of which is synchronous and one async.
[Route("user/{userId}/feeds")]
[HttpGet]
public IEnumerable<NewsFeedItem> GetNewsFeedItemsForUser(string userId)
{
return _newsFeedService.GetNewsFeedItemsForUser(userId);
}
[Route("user/{userId}/feeds/async")]
[HttpGet]
public async Task<IEnumerable<NewsFeedItem>> GetNewsFeedItemsForUserAsync(string userId)
{
return await Task.Run(() => _newsFeedService.GetNewsFeedItemsForUser(userId));
}
After watching Steve Sanderson's presentation I issued the following command ab -n 100 -c 10 http://localhost.... to each endpoint.
I was surprised as the benchmarks for each endpoint seemed to be approximately the same.
Going off what Steve explained I was expecting that the async endpoint would be more performant because it would release thread pool threads back to the thread pool immediately, thus making them available for other requests and improving throughput. But the numbers seem exactly the same.
What am I misunderstanding here?
Using await Task.Run to create "async" WebApi is a bad idea - you will still use a thread, and even from the same thread pool used for requests.
It will lead to some unpleasant moments described in good details here:
Extra (unnecessary) thread switching to the Task.Run thread pool thread. Similarly, when that thread finishes the request, it has to
enter the request context (which is not an actual thread switch but
does have overhead).
Extra (unnecessary) garbage is created. Asynchronous programming is a tradeoff: you get increased responsiveness at the expense of higher
memory usage. In this case, you end up creating more garbage for the
asynchronous operations that is totally unnecessary.
The ASP.NET thread pool heuristics are thrown off by Task.Run “unexpectedly” borrowing a thread pool thread. I don’t have a lot of
experience here, but my gut instinct tells me that the heuristics
should recover well if the unexpected task is really short and would
not handle it as elegantly if the unexpected task lasts more than two
seconds.
ASP.NET is not able to terminate the request early, i.e., if the client disconnects or the request times out. In the synchronous case,
ASP.NET knew the request thread and could abort it. In the
asynchronous case, ASP.NET is not aware that the secondary thread pool
thread is “for” that request. It is possible to fix this by using
cancellation tokens, but that’s outside the scope of this blog post.
Basically, you do not allow any asynchrony to the ASP.NET - you just hide the CPU-bound synchronous code behind the async facade. Async on its own is ideal for I/O bound code, because it allows to utilize CPU (threads) at their top efficiency (no blocking for I/O), but when you have Compute-bound code, you will still have to utilize CPU to the same extent.
And taking into account the additional overhead from Task and context switching you will get even worser results than with simple sync controller methods.
HOW TO MAKE IT TRULY ASYNC:
GetNewsFeedItemsForUser method shall be turned into async.
[Route("user/{userId}/feeds/async")]
[HttpGet]
public async Task<IEnumerable<NewsFeedItem>> GetNewsFeedItemsForUserAsync(string userId)
{
return await _newsFeedService.GetNewsFeedItemsForUser(userId);
}
To do it:
If it is some library method then look for its async variant (if there are none - bad luck, you'll have to search for some competing analogue).
If it is your custom method using file system or database then leverage their async facilities to create async API for the method.

Asynchronous DB-Query to trigger Stored Procedure

I want to performa an asynchronous DB Query in C# that calls a stored procedure for a Backup. Since we use Azure this takes about 2 minutes and we don't want the user to wait that long.
So the idea is to make it asynchronous, so that the task continues to run, after the request.
[HttpPost]
public ActionResult Create(Snapshot snapshot)
{
db.Database.CommandTimeout = 7200;
Task.Run(() => db.Database.ExecuteSqlCommandAsync("EXEC PerformSnapshot #User = '" + CurrentUser.AccountName + "', #Comment = '" + snapshot.Comment + "';"));
this.ShowUserMessage("Your snapshot has been created.");
return this.RedirectToActionImpl("Index", "Snapshots", new System.Web.Routing.RouteValueDictionary());
}
I'm afraid that I haven't understood the concept of asynchronous taks. The query will not be executed (or aborted?), if I don't use the wait statement. But actually "waiting" is the one thing I espacially don't want to do here.
So... why am I forced to use wait here?
Or will the method be started, but killed if the requst is finished?
We don't want the user to wait that long.
async-await won't help you with that. Odd as it may sound, the basic async-await pattern is about implementing synchronous behavior in a non-blocking fashion. It doesn't re-arrange your logical flow; in fact, it goes to great lengths to preserve it. The only thing you've changed by going async here is that you're no longer tying up a thread during that 2-minute database operation, which is a huge win your app's scalability if you have lots of concurrent users, but doesn't speed up an individual request one bit.
I think what you really want is to run the operation as a background job so you can respond to the user immediately. But be careful - there are bad ways to do that in ASP.NET (i.e. Task.Run) and there are good ways.
Dave, you're not forced to use await here. And you're right - from user perspective it still will take 2 minutes. The only difference is that the thread which processes your request can now process other requests meanwhile database does its job. And when database finishes, the thread will continue process your request.
Say you have limited number of threads capable to process HTTP request. This async code will help you to process more requests per time period, but it won't help user to get the job done faster.
This seems to be down to a misunderstanding as to what async and await do.
async does not mean run this on a new thread, in essence it acts as a signal to the compiler to build a state machine, so a method like this:
Task<int> GetMeAnInt()
{
return await myWebService.GetMeAnInt();
}
sort of (cannot stress this enough), gets turned into this:
Task<int> GetMeAnInt()
{
var awaiter = myWebService.GetMeAnInt().GetAwaiter();
awaiter.OnCompletion(() => goto done);
return Task.InProgress;
done:
return awaiter.Result;
}
MSDN has way more information about this, and there's even some code out there explaining how to build your own awaiters.
async and await at their very core just enable you to write code that uses callbacks under the hood, but in a nice way that tells the compiler to do the heavy lifting for you.
If you really want to run something in the background, then you need to use Task:
Task<int> GetMeAnInt()
{
return Task.Run(() => myWebService.GetMeAnInt());
}
OR
Task<int> GetMeAnInt()
{
return Task.Run(async () => await myWebService.GetMeAnInt());
}
The second example uses async and await in the lambda because in this scenario GetMeAnInt on the web service also happens to return Task<int>.
To recap:
async and await just instruct the compiler to do some jiggerypokery
This uses labels and callbacks with goto
Fun fact, this is valid IL but the C# compiler doesn't allow it for your own code, hence why the compiler can get away with the magic but you can't.
async does not mean "run on a background thread"
Task.Run() can be used to queue a threadpool thread to run an arbitrary function
Task.Factory.Start() can be used to grab a brand new thread to run an arbitrary function
await instructs the compiler that this is the point at which the result of the awaiter for the awaitable (e.g. Task) being awaited is required - this is how it knows how to structure the state machine.
As I describe in my MSDN article on async ASP.NET, async is not a silver bullet; it doesn't change the HTTP protocol:
When some developers learn about async and await, they believe it’s a way for the server code to “yield” to the client (for example, the browser). However, async and await on ASP.NET only “yield” to the ASP.NET runtime; the HTTP protocol remains unchanged, and you still have only one response per request.
In your case, you're trying to use a web request to kick off a backend operation and then return to the browser. ASP.NET was not designed to execute backend operations like this; it is only a web tier framework. Having ASP.NET execute work is dangerous because ASP.NET is only aware of work coming in from its requests.
I have an overview of various solutions on my blog. Note that using a plain Task.Run, Task.Factory.StartNew, or ThreadPool.QueueUserWorkItem is extremely dangerous because ASP.NET doesn't know anything about that work. At the very least you should use HostingEnvironment.QueueBackgroundWorkItem so ASP.NET at least knows about the work. But that doesn't guarantee that the work will actually ever complete.
A proper solution is to place the work in a persistent queue and have an independent background worker process that queue. See the Asynchronous Messaging Primer (specifically, your scenario is "Decoupling workloads").

How different async programming is from Threads?

I've been reading some async articles here: http://www.asp.net/web-forms/tutorials/aspnet-45/using-asynchronous-methods-in-aspnet-45 and the author says :
When you’re doing asynchronous work, you’re not always using a thread.
For example, when you make an asynchronous web service request,
ASP.NET will not be using any threads between the async method call
and the await.
So what I am trying to understand is, how does it become async if we don't use any Threads for concurrent execution? What does it mean "you're not always using a thread."?
Let me first explain what I know regarding working with threads (A quick example, of course Threads can be used in different situations other than UI and Worker methodology here)
You have UI Thread to take input, give output.
You can handle things in UI Thread but it makes the UI unresponsive.
So lets say we have a stream-related operation and we need to download some sort of data.
And we also allow users to do other things while it is being downloaded.
We create a new worker thread which downloads the file and changes the progress bar.
Once it is done, there is nothing to do so thread is killed.
We continue from UI thread.
We can either wait for the worker thread in UI thread depending on the situation but before that while the file is being downloaded, we can do other things with UI thread and then wait for the worker thread.
Isn't the same for async programming? If not, what's the difference? I read that async programming uses ThreadPool to pull threads from though.
Threads are not necessary for asynchronous programming.
"Asynchronous" means that the API doesn't block the calling thread. It does not mean that there is another thread that is blocking.
First, consider your UI example, this time using actual asynchronous APIs:
You have UI Thread to take input, give output.
You can handle things in UI Thread but it makes the UI unresponsive.
So lets say we have a stream-related operation and we need to download some sort of data.
And we also allow users to do other things while it is being downloaded.
We use asynchronous APIs to download the file. No worker thread is necessary.
The asynchronous operation reports its progress back to the UI thread (which updates the progress bar), and it also reports its completion to the UI thread (which can respond to it like any other event).
This shows how there can be only one thread involved (the UI thread), yet also have asynchronous operations going on. You can start up multiple asynchronous operations and yet only have one thread involved in those operations - no threads are blocked on them.
async/await provides a very nice syntax for starting an asynchronous operation and then returning, and having the rest of the method continue when that operation completes.
ASP.NET is similar, except it doesn't have a main/UI thread. Instead, it has a "request context" for every incomplete request. ASP.NET threads come from a thread pool, and they enter the "request context" when they work on a request; when they're done, they exit their "request context" and return to the thread pool.
ASP.NET keeps track of incomplete asynchronous operations for each request, so when a thread returns to the thread pool, it checks to see if there are any asynchronous operations in progress for that request; if there are none, then the request is complete.
So, when you await an incomplete asynchronous operation in ASP.NET, the thread will increment that counter and return. ASP.NET knows the request isn't complete because the counter is non-zero, so it doesn't finish the response. The thread returns to the thread pool, and at that point: there are no threads working on that request.
When the asynchronous operation completes, it schedules the remainder of the async method to the request context. ASP.NET grabs one of its handler threads (which may or may not be the same thread that executed the earlier part of the async method), the counter is decremented, and the thread executes the async method.
ASP.NET vNext is slightly different; there's more support for asynchronous handlers throughout the framework. But the general concept is the same.
For more information:
My async/await intro post tries to be both an intro yet also reasonably complete picture of how async and await work.
The official async/await FAQ has lots of great links that go into a lot of detail.
The MSDN magazine article It's All About the SynchronizationContext exposes some of the plumbing underneath.
First time when I saw async and await, I thougth they were C# Syntactic sugar for Asynchronous Programming Model. I was wrong, async and await are more than that. It is a brand new asynchronous pattern Task-based Asynchronous Pattern, http://www.microsoft.com/en-us/download/details.aspx?id=19957 is a good article to get start. Most of the FCL classes which inplement TAP are call APM methods (BegingXXX() and EndXXX()). Here are two code snaps for TAP and AMP:
TAP sample:
static void Main(string[] args)
{
GetResponse();
Console.ReadLine();
}
private static async Task<WebResponse> GetResponse()
{
var webRequest = WebRequest.Create("http://www.google.com");
Task<WebResponse> response = webRequest.GetResponseAsync();
Console.WriteLine(new StreamReader(response.Result.GetResponseStream()).ReadToEnd());
return response.Result;
}
APM sample:
static void Main(string[] args)
{
var webRequest = WebRequest.Create("http://www.google.com");
webRequest.BeginGetResponse(EndResponse, webRequest);
Console.ReadLine();
}
static void EndResponse(IAsyncResult result)
{
var webRequest = (WebRequest) result.AsyncState;
var response = webRequest.EndGetResponse(result);
Console.WriteLine(new StreamReader(response.GetResponseStream()).ReadToEnd());
}
Finally these two will be the same, because GetResponseAsync() call BeginGetResponse() and EndGetResponse() inside. When we reflector the source code of GetResponseAsync(), we will get code like this:
task = Task<WebResponse>.Factory.FromAsync(
new Func<AsyncCallback, object, IAsyncResult>(this.BeginGetResponse),
new Func<IAsyncResult, WebResponse>(this.EndGetResponse), null);
For APM, in the BeginXXX(), there is an argument for a callback method which will invoked when the task (typically is an IO heavy operation) was completed. Creating a new thread and asynchronous, both of them will immediately return in main thread, both of them are unblocked. On performance side, creating new thread will cost more resource when process I/O-bound operations such us read file, database operation and network read. There are two disadvantages in creating new thread,
like in your mentioned article, there are memory cost and CLR are
limitation on thread pool.
Context switch will happen. On the other hander, asynchronous will
not create any thread manually and it will not have context switch
when the the IO-bound operations return.
Here is an picture which can help to understand the differences:
This diagram is from a MSDN article "Asynchronous Pages in ASP.NET 2.0", which explain very detail about how the old asynchronous working in ASP.NET 2.0.
About Asynchronous Programming Model, please get more detail from Jeffrey Richter's article "Implementing the CLR Asynchronous Programming Model", also there are more detail on his book "CLR via Csharp 3rd Edition" in chapter 27.
Let’s imagine that you are implementing a web application and as each client request comes in to
your server, you need to make a database request. When a client request comes in, a thread pool
thread will call into your code. If you now issue a database request synchronously, the thread will block
for an indefinite amount of time waiting for the database to respond with the result. If during this time
another client request comes in, the thread pool will have to create another thread and again this
thread will block when it makes another database request. As more and more client requests come in,
more and more threads are created, and all these threads block waiting for the database to respond.
The result is that your web server is allocating lots of system resources (threads and their memory) that
are barely even used!
And to make matters worse, when the database does reply with the various results, threads become
unblocked and they all start executing. But since you might have lots of threads running and relatively
few CPU cores, Windows has to perform frequent context switches, which hurts performance even
more. This is no way to implement a scalable application.
To read data from the file, I now call ReadAsync instead of Read. ReadAsync internally allocates a
Task object to represent the pending completion of the read operation. Then, ReadAsync
calls Win32’s ReadFile function (#1). ReadFile allocates its IRP, initializes it just like it did in the
synchronous scenario (#2), and then passes it down to the Windows kernel (#3). Windows adds the IRP
to the hard disk driver’s IRP queue (#4), but now, instead of blocking your thread, your thread is
allowed to return to your code; your thread immediately returns from its call to ReadAsync (#5, #6,
and #7). Now, of course, the IRP has not necessarily been processed yet, so you cannot have code after
ReadAsync that attempts to access the bytes in the passed-in Byte[].

Do asynchronous operations in ASP.NET MVC use a thread from ThreadPool on .NET 4

After this question, it makes me comfortable when using async
operations in ASP.NET MVC. So, I wrote two blog posts on that:
My Take on Task-based Asynchronous Programming in C# 5.0 and ASP.NET MVC Web Applications
Asynchronous Database Calls With Task-based Asynchronous Programming Model (TAP) in ASP.NET MVC 4
I have too many misunderstandings in my mind about asynchronous operations on ASP.NET MVC.
I always hear this sentence: Application can scale better if operations run asynchronously
And I heard this kind of sentences a lot as well: if you have a huge volume of traffic, you may be better off not performing your queries asynchronously - consuming 2 extra threads to service one request takes resources away from other incoming requests.
I think those two sentences are inconsistent.
I do not have much information about how threadpool works on ASP.NET but I know that threadpool has a limited size for threads. So, the second sentence has to be related to this issue.
And I would like to know if asynchronous operations in ASP.NET MVC uses a thread from ThreadPool on .NET 4?
For example, when we implement a AsyncController, how does the app structures? If I get huge traffic, is it a good idea to implement AsyncController?
Is there anybody out there who can take this black curtain away in front of my eyes and explain me the deal about asynchrony on ASP.NET MVC 3 (NET 4)?
Edit:
I have read this below document nearly hundreds of times and I understand the main deal but still I have confusion because there are too much inconsistent comment out there.
Using an Asynchronous Controller in ASP.NET MVC
Edit:
Let's assume I have controller action like below (not an implementation of AsyncController though):
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Do an advanced looging here which takes a while
});
return View();
}
As you see here, I fire an operation and forget about it. Then, I return immediately without waiting it be completed.
In this case, does this have to use a thread from threadpool? If so, after it completes, what happens to that thread? Does GC comes in and clean up just after it completes?
Edit:
For the #Darin's answer, here is a sample of async code which talks to database:
public class FooController : AsyncController {
//EF 4.2 DbContext instance
MyContext _context = new MyContext();
public void IndexAsync() {
AsyncManager.OutstandingOperations.Increment(3);
Task<IEnumerable<Foo>>.Factory.StartNew(() => {
return
_context.Foos;
}).ContinueWith(t => {
AsyncManager.Parameters["foos"] = t.Result;
AsyncManager.OutstandingOperations.Decrement();
});
Task<IEnumerable<Bars>>.Factory.StartNew(() => {
return
_context.Bars;
}).ContinueWith(t => {
AsyncManager.Parameters["bars"] = t.Result;
AsyncManager.OutstandingOperations.Decrement();
});
Task<IEnumerable<FooBar>>.Factory.StartNew(() => {
return
_context.FooBars;
}).ContinueWith(t => {
AsyncManager.Parameters["foobars"] = t.Result;
AsyncManager.OutstandingOperations.Decrement();
});
}
public ViewResult IndexCompleted(
IEnumerable<Foo> foos,
IEnumerable<Bar> bars,
IEnumerable<FooBar> foobars) {
//Do the regular stuff and return
}
}
Here's an excellent article I would recommend you reading to better understand asynchronous processing in ASP.NET (which is what asynchronous controllers basically represent).
Let's first consider a standard synchronous action:
public ActionResult Index()
{
// some processing
return View();
}
When a request is made to this action a thread is drawn from the thread pool and the body of this action is executed on this thread. So if the processing inside this action is slow you are blocking this thread for the entire processing, so this thread cannot be reused to process other requests. At the end of the request execution, the thread is returned to the thread pool.
Now let's take an example of the asynchronous pattern:
public void IndexAsync()
{
// perform some processing
}
public ActionResult IndexCompleted(object result)
{
return View();
}
When a request is sent to the Index action, a thread is drawn from the thread pool and the body of the IndexAsync method is executed. Once the body of this method finishes executing, the thread is returned to the thread pool. Then, using the standard AsyncManager.OutstandingOperations, once you signal the completion of the async operation, another thread is drawn from the thread pool and the body of the IndexCompleted action is executed on it and the result rendered to the client.
So what we can see in this pattern is that a single client HTTP request could be executed by two different threads.
Now the interesting part happens inside the IndexAsync method. If you have a blocking operation inside it, you are totally wasting the whole purpose of the asynchronous controllers because you are blocking the worker thread (remember that the body of this action is executed on a thread drawn from the thread pool).
So when can we take real advantage of asynchronous controllers you might ask?
IMHO we can gain most when we have I/O intensive operations (such as database and network calls to remote services). If you have a CPU intensive operation, asynchronous actions won't bring you much benefit.
So why can we gain benefit from I/O intensive operations? Because we could use I/O Completion Ports. IOCP are extremely powerful because you do not consume any threads or resources on the server during the execution of the entire operation.
How do they work?
Suppose that we want to download the contents of a remote web page using the WebClient.DownloadStringAsync method. You call this method which will register an IOCP within the operating system and return immediately. During the processing of the entire request, no threads are consumed on your server. Everything happens on the remote server. This could take lots of time but you don't care as you are not jeopardizing your worker threads. Once a response is received the IOCP is signaled, a thread is drawn from the thread pool and the callback is executed on this thread. But as you can see, during the entire process, we have not monopolized any threads.
The same stands true with methods such as FileStream.BeginRead, SqlCommand.BeginExecute, ...
What about parallelizing multiple database calls? Suppose that you had a synchronous controller action in which you performed 4 blocking database calls in sequence. It's easy to calculate that if each database call takes 200ms, your controller action will take roughly 800ms to execute.
If you don't need to run those calls sequentially, would parallelizing them improve performance?
That's the big question, which is not easy to answer. Maybe yes, maybe no. It will entirely depend on how you implement those database calls. If you use async controllers and I/O Completion Ports as discussed previously you will boost the performance of this controller action and of other actions as well, as you won't be monopolizing worker threads.
On the other hand if you implement them poorly (with a blocking database call performed on a thread from the thread pool), you will basically lower the total time of execution of this action to roughly 200ms but you would have consumed 4 worker threads so you might have degraded the performance of other requests which might become starving because of missing threads in the pool to process them.
So it is very difficult and if you don't feel ready to perform extensive tests on your application, do not implement asynchronous controllers, as chances are that you will do more damage than benefit. Implement them only if you have a reason to do so: for example you have identified that standard synchronous controller actions are a bottleneck to your application (after performing extensive load tests and measurements of course).
Now let's consider your example:
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Do an advanced looging here which takes a while
});
return View();
}
When a request is received for the Index action a thread is drawn from the thread pool to execute its body, but its body only schedules a new task using TPL. So the action execution ends and the thread is returned to the thread pool. Except that, TPL uses threads from the thread pool to perform their processing. So even if the original thread was returned to the thread pool, you have drawn another thread from this pool to execute the body of the task. So you have jeopardized 2 threads from your precious pool.
Now let's consider the following:
public ViewResult Index() {
new Thread(() => {
//Do an advanced looging here which takes a while
}).Start();
return View();
}
In this case we are manually spawning a thread. In this case the execution of the body of the Index action might take slightly longer (because spawning a new thread is more expensive than drawing one from an existing pool). But the execution of the advanced logging operation will be done on a thread which is not part of the pool. So we are not jeopardizing threads from the pool which remain free for serving another requests.
Yes - all threads come from the thread-pool. Your MVC app is already multi-threaded, when a request comes in a new thread will be taken from the pool and used to service the request. That thread will be 'locked' (from other requests) until the request is fully serviced and completed. If there is no thread available in the pool the request will have to wait until one is available.
If you have async controllers they still get a thread from the pool but while servicing the request they can give up the thread, while waiting for something to happen (and that thread can be given to another request) and when the original request needs a thread again it gets one from the pool.
The difference is that if you have a lot of long-running requests (where the thread is waiting for a response from something) you might run out of threads from the the pool to service even basic requests. If you have async controllers, you don't have any more threads but those threads that are waiting are returned to the pool and can service other requests.
A nearly real life example...
Think of it like getting on a bus, there's five people waiting to get on, the first gets on, pays and sits down (the driver serviced their request), you get on (the driver is servicing your request) but you can't find your money; as you fumble in your pockets the driver gives up on you and gets the next two people on (servicing their requests), when you find your money the driver starts dealing with you again (completing your request) - the fifth person has to wait until you are done but the third and fourth people got served while you were half way through getting served. This means that the driver is the one and only thread from the pool and the passengers are the requests. It was too complicated to write how it would work if there was two drivers but you can imagine...
Without an async controller, the passengers behind you would have to wait ages while you looked for your money, meanwhile the bus driver would be doing no work.
So the conclusion is, if lots of people don't know where their money is (i.e. require a long time to respond to something the driver has asked) async controllers could well help throughput of requests, speeding up the process from some. Without an aysnc controller everyone waits until the person in front has been completely dealt with. BUT don't forget that in MVC you have a lot of bus drivers on a single bus so async is not an automatic choice.
There are two concepts at play here. First of all we can make our code run in parallel to execute faster or schedule code on another thread to avoid making the user wait. The example you had
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Do an advanced looging here which takes a while
});
return View();
}
belongs to the second category. The user will get a faster response but the total workload on the server is higher because it has to do the same work + handle the threading.
Another example of this would be:
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Make async web request to twitter with WebClient.DownloadString()
});
Task.Factory.StartNew(() => {
//Make async web request to facebook with WebClient.DownloadString()
});
//wait for both to be ready and merge the results
return View();
}
Because the requests run in parallel the user won't have to wait as long as if they where done in serial. But you should realize that we use up more resources here than if we ran in serial because we run the code at many threads while we have on thread waiting too.
This is perfectly fine in a client scenario. And it is quite common there to wrap synchronous long running code in a new task(run it on another thread) too keep the ui responsive or parallize to make it faster. A thread is still used for the whole duration though. On a server with high load this could backfire because you actually use more resources. This is what people have warned you about
Async controllers in MVC has another goal though. The point here is to avoid having threads sittings around doing nothing(which can hurt scalability). It really only matters if the API's you are calling have async methods. Like WebClient.DowloadStringAsync().
The point is that you can let your thread be returned to handle new requests untill the web request is finished where it will call you callback which gets the same or a new thread and finish the request.
I hope you understand the difference between asynchronous and parallel. Think of parallel code as code where your thread sits around and wait for the result. While asynchronous code is code where you will be notified when the code is done and you can get back working at it, in the meantime the thread can do other work.
Applications can scale better if operations run asynchronously, but only if there are resources available to service the additional operations.
Asynchronous operations ensure that you're never blocking an action because an existing one is in progress. ASP.NET has an asynchronous model that allows multiple requests to execute side-by-side. It would be possible to queue the requests up and processes them FIFO, but this would not scale well when you have hundreds of requests queued up and each request takes 100ms to process.
If you have a huge volume of traffic, you may be better off not performing your queries asynchronously, as there may be no additional resources to service the requests. If there are no spare resources, your requests are forced to queue up, take exponentially longer or outright fail, in which case the asynchronous overhead (mutexes and context-switching operations) isn't giving you anything.
As far as ASP.NET goes, you don't have a choice - it's uses an asynchronous model, because that's what makes sense for the server-client model. If you were to be writing your own code internally that uses an async pattern to attempt to scale better, unless you're trying to manage a resource that's shared between all requests, you won't actually see any improvements because they're already wrapped in an asynchronous process that doesn't block anything else.
Ultimately, it's all subjective until you actually look at what's causing a bottleneck in your system. Sometimes it's obvious where an asynchronous pattern will help (by preventing a queued resource blocking). Ultimately only measuring and analysing a system can indicate where you can gain efficiencies.
Edit:
In your example, the Task.Factory.StartNew call will queue up an operation on the .NET thread-pool. The nature of Thread Pool threads is to be re-used (to avoid the cost of creating/destroying lots of threads). Once the operation completes, the thread is released back to the pool to be re-used by another request (the Garbage Collector doesn't actually get involved unless you created some objects in your operations, in which case they're collected as per normal scoping).
As far as ASP.NET is concerned, there is no special operation here. The ASP.NET request completes without respect to the asynchronous task. The only concern might be if your thread pool is saturated (i.e. there are no threads available to service the request right now and the pool's settings don't allow more threads to be created), in which case the request is blocked waiting to start the task until a pool thread becomes available.
Yes, they use a thread from the thread pool. There is actually a pretty excellent guide from MSDN that will tackle all of your questions and more. I have found it to be quite useful in the past. Check it out!
http://msdn.microsoft.com/en-us/library/ee728598.aspx
Meanwhile, the comments + suggestions that you hear about asynchronous code should be taken with a grain of salt. For starters, just making something async doesn't necessarily make it scale better, and in some cases can make your application scale worse. The other comment you posted about "a huge volume of traffic..." is also only correct in certain contexts. It really depends on what your operations are doing, and how they interact with other parts of the system.
In short, lots of people have lots of opinions about async, but they may not be correct out of context. I'd say focus on your exact problems, and do basic performance testing to see what async controllers, etc. actually handle with your application.
First thing its not MVC but the IIS who maintains the thread pool. So any request which comes to MVC or ASP.NET application is served from threads which are maintained in thread pool. Only with making the app Asynch he invokes this action in a different thread and releases the thread immediately so that other requests can be taken.
I have explained the same with a detail video (http://www.youtube.com/watch?v=wvg13n5V0V0/ "MVC Asynch controllers and thread starvation" ) which shows how thread starvation happens in MVC and how its minimized by using MVC Asynch controllers.I also have measured the request queues using perfmon so that you can see how request queues are decreased for MVC asynch and how its worst for Synch operations.

Categories