I have a .NET 5 console app that processes 100,000+ messages per minute from RabbitMQ.
When a message is received from RabbitMQ, a thread goes off and crunches some numbers; one of those operations, however, is a call to an external API to get information about its location.
When this external API slows down and response times go up, I see thread starvation: the thread count in Windows Task Manager can climb into the thousands and the app slows to a crawl.
When the app loads, the main thread establishes a connection to RabbitMQ and subscribes to new messages. Every time a message arrives, the console app consumes it, queues a thread pool work item, and carries on receiving new messages.
private void Consumer_Received(object sender, BasicDeliverEventArgs deliveryArgs)
{
var data = Encoding.UTF8.GetString(deliveryArgs.Body.ToArray());
ThreadPool.QueueUserWorkItem(new WaitCallback(StartProcessing), data);
}
If I set a breakpoint, this method keeps being hit, and each time a new thread pool work item calls StartProcessing, which is where the CPU crunching and the external API call happen.
public void StartProcessing(object xdata)
{
//1. crunch cpu
//2. call external API
}
The CPU work takes around 100 ms per message. The external API takes 80-500 ms on a normal day, but when there are issues (possibly network) it can take up to 10 seconds to respond to a single request, and this is when the app starts to break.
My question is about this implementation and how to stop thread starvation.
This is a high-throughput multithreaded app and it needs to process as many messages as possible.
The app needs to relieve back pressure when the external API is slow to respond and it's constantly context-switching threads.
Is ThreadPool.QueueUserWorkItem the correct implementation, or should I be using async/await?
I'm also open to hearing if this is a bad implementation and if there is another pattern I should be using for this.
//////////////////////////////////
UPDATE 1
//////////////////////////////////
So I changed the code to use an async Task, and it is now very slow to get messages from RabbitMQ.
The old code got all messages (200,000) within a few seconds; the new code got through about 1,000 in a few minutes.
The new code is:
private void Consumer_Received(object sender, BasicDeliverEventArgs deliveryArgs)
{
StartProcessing(deliveryArgs.Body.ToArray()).ConfigureAwait(false);
}
public static async Task<bool> StartProcessing(ReadOnlyMemory<byte> data)
{
await Task.Run(() =>
{
ReadOnlySpan<byte> xdata = data.Span; //defensiveCopy of in memory pointer
//do stuff
}).ConfigureAwait(false);
return true;
}
Is there something wrong with my implementation?
The StartProcessing code should really be fire-and-forget, as the main thread should continue to the next message in RabbitMQ.
It seems like it's waiting for the message to be processed before continuing.
//////////////////////////////////
It sounds like this is the exact scenario asynchronous functions were made for.
If your work is CPU-bound, using background threads will help a bit, but only up to however many hardware threads you have.
But it sounds like you are mostly blocking on network IO. A thread that just sits blocked until some IO responds is quite wasteful, since each thread consumes resources, and it can easily lead to problems like maxing out the thread pool.
By now, .NET and many libraries have been updated to provide true asynchronous functions for IO. These release the thread to do other work instead of blocking, and when the IO is done the remaining work is scheduled on a thread pool thread. Using async/await lets you write the code more or less as you would write regular synchronous code, letting the compiler rewrite it into a state machine to deal with the complicated issue of maintaining state. Ideally, you should not need more threads than the number of hardware threads you have, since each thread should be doing actual work.
Keep in mind that just because an async method returns a task, it is not necessarily truly asynchronous. Some base classes/interfaces, like Stream, have been extended with asynchronous versions. And some library vendors, rather than doing the work of providing an actual asynchronous implementation, just wrap the synchronous method, providing no real benefit.
For example:
private async void Consumer_Received(...)
{
try{
var result = await Task.Run(()=> MyCpuBoundWork());
await MyNetworkCall(result);
}
catch{
// handle exceptions
}
}
As a message is received this will use another background thread to do the CPU bound work. I'm not sure how rabbitMq generates messages, the Task.Run part is only needed if it uses a single thread for all messages. After the CPU bound is done it will continue with the network call.
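The question also asks about back pressure, which the example above does not cover. One possible way to cap the number of in-flight API calls is a SemaphoreSlim. This is only a sketch under assumptions: ThrottledConsumer, MaxConcurrentCalls, and CallExternalApiAsync are hypothetical names, and Task.Delay stands in for a truly asynchronous API client.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledConsumer
{
    private const int MaxConcurrentCalls = 8;   // hypothetical limit, tune per workload
    private static readonly SemaphoreSlim Slots = new SemaphoreSlim(MaxConcurrentCalls);
    private static readonly object Gate = new object();
    private static int _inFlight;

    // Highest number of API calls that were in flight at the same time.
    public static int PeakInFlight { get; private set; }

    // Stand-in for the external API (assumption: the real client is truly async).
    private static Task CallExternalApiAsync(string data) => Task.Delay(50);

    public static async Task HandleMessageAsync(string data)
    {
        await Slots.WaitAsync();                 // waits for a free slot without blocking a thread
        try
        {
            lock (Gate) { PeakInFlight = Math.Max(PeakInFlight, ++_inFlight); }
            await CallExternalApiAsync(data);
        }
        finally
        {
            lock (Gate) { _inFlight--; }
            Slots.Release();                     // free the slot for the next message
        }
    }

    public static async Task Main()
    {
        // Simulate 100 messages arriving at once.
        var tasks = Enumerable.Range(0, 100).Select(i => HandleMessageAsync($"msg {i}"));
        await Task.WhenAll(tasks);
        Console.WriteLine($"Peak concurrent API calls: {PeakInFlight}");
    }
}
```

Note that messages waiting for a slot still sit in memory. Limiting how many unacknowledged messages the broker delivers (RabbitMQ's prefetch count, set with BasicQos) is probably the natural complement on the broker side.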
I know this question might be a bit trivial, but all the answers I find on the internet leave me confused.
I'm aware of the basic principles of how async/await works (how await asynchronously waits for the task to complete without blocking the main thread),
but I don't understand its real benefit, because it seems to me that everything you do with async/await you can do using the Task Parallel Library.
Please consider this example, to better understand what I mean:
Let's say I have a SuperComplexMethod that returns some value and I would like to execute it in parallel, meanwhile doing some other things. Normally I would do it this way:
internal class Program
{
private static void Main()
{
//I will start a task first that will run asynchronously
var task = Task.Run(() => SuperComplexMethod());
//Then I will be doing some other work, and then get the result when I need it
Console.WriteLine("Doing some other work...");
var result = task.Result;
}
static string SuperComplexMethod()
{
Console.WriteLine("Doing very complex calculations...");
Thread.Sleep(3000);
return "Some result";
}
}
Here how I would have to do it using async/await:
internal class Program
{
private static void Main()
{
var task = SuperComplexMethodAsync();
Console.WriteLine("Doing some other work...");
var result = task.Result;
}
//I have to create this async wrapper that can wait for the task to complete
async static Task<string> SuperComplexMethodAsync()
{
return await Task.Run(() => SuperComplexMethod());
}
static string SuperComplexMethod()
{
Console.WriteLine("Doing very complex calculations...");
Thread.Sleep(3000);
return "Some result";
}
}
As you can see in the second example in order to use async/await approach, I have to create a wrapper method that starts a task and asynchronously waits for it to complete. Obviously it seems redundant to me, because I can achieve the very same behavior without using this wrapper marked async/await.
Can you please explain to me what is so special about async/await, and what actual benefits it provides over using the tools of the Task Parallel Library alone?
Arguably the main reason to use async/await is thread sparing. Imagine the following scenario (I'll simplify to make the point): a) you have a web application that has 10 threads available to process incoming requests; b) all requests involve I/O (e.g. connecting to a remote database, connecting to upstream network services via HTTP/SOAP) to process/complete; c) each request takes 2 seconds to process.
Now imagine 20 requests arrive at about the same time. Without async/await, your web app would start to process the first 10 requests. While this is happening the other 10 would just sit in the queue for 2 seconds, with your web app out of threads and hence unable to process them. Only when the first 10 complete would the second 10 begin to be processed.
Under async/await, the first 10 requests would instead begin tasks, and, while awaiting those tasks, the threads that were processing them would be returned to the web app to process other requests. So your web app would begin processing the second 10 almost straight away, rather than waiting. As each of the awaited tasks from the first 10 completes, the web app would continue processing the rest of their methods, either on a thread-pool thread or one of the web app's threads (which it is depends on how you configure things). We can usually expect in an I/O scenario that the I/O is by far the bulk of the duration of the call, so we can make a reasonable assumption that in the above scenario, the network/database call might take 1.9s and the rest of the code (adapting DTOs, some business logic, etc.) might take 0.1s. If we assume the continuation (after the await) is processed by a web app thread, that thread is now only tied up for 0.1 of the 2 seconds, instead of the full 2 seconds in the non async/await scenario.
You might naturally think: well, I've just pushed the threads out of one pool of threads and into another, and that will eventually fill up too. To understand why this isn't really true in practice in truly async scenarios, you need to read There Is No Thread.
The upshot is that you are now able to concurrently process many more requests than you have threads available to process them.
You'll notice the above is focused on I/O, and that's really where async/await shines. If your web app instead processed requests by performing complex mathematical calculations using the CPU, you would not see the above benefit, hence why async/await is not really suited for nor intended for use with CPU-bound activities.
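To make the 20-request scenario concrete, here is a rough simulation. It is a sketch, not a web server: ThreadSparingDemo and HandleRequestAsync are made-up names, and Task.Delay stands in for the 1.9 s of network/database I/O.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ThreadSparingDemo
{
    // Simulated request: 1.9 s of "I/O" followed by a little CPU work.
    public static async Task<int> HandleRequestAsync(int id)
    {
        await Task.Delay(1900);   // no thread is held while this is pending
        return id * 2;            // stand-in for the short business logic
    }

    // Starts all requests at once and measures how long the whole batch takes.
    public static async Task<TimeSpan> RunAsync(int requests)
    {
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, requests).Select(HandleRequestAsync));
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        TimeSpan elapsed = await RunAsync(20);
        Console.WriteLine($"20 requests finished in {elapsed.TotalSeconds:F1} s");
    }
}
```

The batch should take close to the duration of a single request, because none of the 20 requests occupies a thread while it waits.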
Before others jump in with all the exceptions to the rules (and there are some), I'm only presenting a vanilla simplified scenario to show the value of async/await in I/O-bound scenarios. Covering everything about async/await would create a very long answer (and this one is long enough already!)
I should also add that there are other ways to process web requests asynchronously, ways that pre-date async/await, but async/await very significantly simplifies the implementation.
--
Moving briefly to say a WinForms or similar app, the scenario is very similar, except now you really only have one thread available to process UI requests, and any time you hold onto that thread, the UI will be unresponsive, so you can use a similar approach to move long-running operations off the UI thread. In the UI scenario, it becomes more reasonable to perform CPU-bound operations off the UI thread as well. When doing this, a thread pool thread will instead perform that CPU work, freeing up the UI thread to keep the UI responsive. Now there is a thread, but at least it's not the UI one. This is generally called "offloading", which is one of the other primary uses for async/await.
--
Your example is a console app - there's often not a lot to be gained in that context, except for the ability to fairly easily (arguably more easily than creating your own threads) execute several requests concurrently on the thread pool.
When using async and await the compiler generates a state machine in the background
public async Task MyMethodAsync()
{
Task<int> longRunningTask = LongRunningOperationAsync();
// independent work which doesn't need the result of LongRunningOperationAsync can be done here
//and now we call await on the task
int result = await longRunningTask;
//use the result
Console.WriteLine(result);
}
public async Task<int> LongRunningOperationAsync() // assume we return an int from this long running operation
{
await Task.Delay(1000); // 1 second delay
return 1;
}
so what happens here:
Task<int> longRunningTask = LongRunningOperationAsync(); starts executing LongRunningOperationAsync.
Independent work is done on, let's assume, the main thread (thread ID = 1); then await longRunningTask is reached.
Now, if the longRunningTask hasn't finished and it is still running, MyMethodAsync() will return to its calling method, thus the main thread doesn't get blocked. When the longRunningTask is done then a thread from the ThreadPool (can be any thread) will return to MyMethodAsync() in its previous context and continue execution (in this case printing the result to the console).
A second case would be that the longRunningTask has already finished its execution and the result is available. When reaching the await longRunningTask we already have the result so the code will continue executing on the very same thread. (in this case printing result to console). Of course this is not the case for the above example, where there's a Task.Delay(1000) involved.
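The two cases can be observed with a small console program. This is a sketch: AwaitPaths and SameThreadAfterAwait are hypothetical names. For an already-completed task the code after the await stays on the same thread; for a pending task in a console app the continuation typically runs on a thread pool thread, so that result can vary.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitPaths
{
    // Returns true when the code after the await ran on the same thread
    // as the code before it.
    public static async Task<bool> SameThreadAfterAwait(Task<int> task)
    {
        int before = Environment.CurrentManagedThreadId;
        _ = await task;
        int after = Environment.CurrentManagedThreadId;
        return before == after;
    }

    // Same shape as LongRunningOperationAsync in the answer above.
    private static async Task<int> LongRunningOperationAsync()
    {
        await Task.Delay(1000);
        return 1;
    }

    public static async Task Main()
    {
        // Second case: the result is already available, so execution
        // continues synchronously on the calling thread.
        bool completed = await SameThreadAfterAwait(Task.FromResult(1));
        Console.WriteLine($"Already completed, same thread: {completed}");

        // First case: the task is still running when the await is reached.
        bool pending = await SameThreadAfterAwait(LongRunningOperationAsync());
        Console.WriteLine($"Still running, same thread: {pending}");
    }
}
```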
For more, see:
Async/Await - Best Practices
Simplifying Asynchronous
I have a Windows service that is responsible for listening for JMS messages. I am giving a simplified version of the implementation details. As messages arrive they are handed over for processing to a different Task (thread), and the maximum number of tasks is limited with the help of a BlockingCollection. There is a retry mechanism in place that retries, with some delay between attempts, until processing succeeds or the maximum number of retry attempts is exhausted. The reason for the retry mechanism is to cope with issues in the legacy applications that consume these messages. The legacy systems are built using pessimistic locking, and sometimes processing a message runs into errors but eventually goes through after a few retry attempts. After a cost-benefit analysis, it was decided not to address the issues in the legacy systems, as those applications will be replaced in 2 to 3 years.
This retry mechanism runs on the same task thread that is responsible for processing the message. Initially I used Thread.Sleep to introduce the delay between retry attempts. It worked, but when I try to shut down the Windows service, shutdown takes longer if there are messages currently being processed and waiting to be retried.
I then went on an adventure of implementing a way to cancel the waiting mechanism if a shutdown event was triggered.
I used two different approaches.
Option #1
One using ManualResetEvent and when I have to wait I have following code in place (posting only relevant code blocks)
private readonly ManualResetEvent _lockEvent = new ManualResetEvent(false);
if (_lockEvent.WaitOne(TimeSpan.FromMilliseconds(120000)))
{
Log.Info($"Thread interrupted. Retrying will resume after windows service restarts for message id {messageId}");
return;
}
When a shutdown event occurs, I cancel the CancellationTokenSource and set the ManualResetEvent. Everything appears to do what I want. It's just that I have to do two operations so that any code that depends on the CancellationToken knows to cancel gracefully and the retry wait is also broken gracefully.
_subscriberCancellationTokenSource.Cancel();
_lockEvent.Set();
Option #2
After upgrading to .Net 4.6, I started using Task type wherever I can. I realized, I could use Task to implement a delay as well, so here is a simplified version of code that I tried
private void WaitBeforeRetrying(CancellationToken cancellationToken)
{
var waitingTask = Task.Delay(120000, cancellationToken);
waitingTask.Wait(cancellationToken);
}
Wherever I need a delay, I just invoke the method, passing a CancellationToken:
WaitBeforeRetrying(SubscriberCancellationToken);
When a shutdown event occurs, I simply invoke cancel on CancellationTokenSource and everything shuts down gracefully.
_subscriberCancellationTokenSource.Cancel();
Both Option 1 and Option 2 appear to do the job.
Are there any drawbacks to Option 2 compared with Option 1? Is there a better option than what I have so far? I really appreciate any input.
UPDATE
After reading the comments from @EricLippert, I understood what I was doing wrong. Most of my threads were going into a waiting state instead of doing any productive work. It was the result of sprinkling a few asynchronous calls into a synchronous workflow.
I now modified my delay method as follows
private async Task WaitBeforeRetrying(CancellationToken cancellationToken)
{
await Task.Delay(120000, cancellationToken);
}
And I invoke it as
await WaitBeforeRetrying(SubscriberCancellationToken);
I then refactored the rest of the code to propagate async all the way to the top layer. This not only made it easy to cancel the delay when I don't have to wait, but also stopped threads from sitting in a blocked state unnecessarily. I really appreciate everyone's feedback.
I don't think they're any different. The effect of both is that the thread is blocked until the time runs out.
If you're using this in ASP.NET, then blocking threads is not a good thing. In that case, you can make your method async and use await Task.Delay. That'll resume the code after the delay, but allow the thread to work on other things in the mean time.
private async Task WaitBeforeRetrying(CancellationToken cancellationToken)
{
await Task.Delay(120000, cancellationToken);
}
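One detail worth handling in the caller: when the token is cancelled during the delay, Task.Delay throws a TaskCanceledException (a kind of OperationCanceledException). A minimal sketch of a retry wait that reports cancellation instead of throwing, assuming hypothetical names RetryDelayDemo and WaitBeforeRetryingAsync and a 100 ms timer to simulate shutdown:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetryDelayDemo
{
    // Returns true if the full delay elapsed, false if cancellation
    // (for example a service shutdown) was requested first.
    public static async Task<bool> WaitBeforeRetryingAsync(TimeSpan delay, CancellationToken token)
    {
        try
        {
            await Task.Delay(delay, token);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        cts.CancelAfter(TimeSpan.FromMilliseconds(100));   // simulate a shutdown request

        bool waited = await WaitBeforeRetryingAsync(TimeSpan.FromMinutes(2), cts.Token);
        Console.WriteLine(waited ? "Retrying" : "Shutdown requested, giving up");
    }
}
```

Here the two-minute wait ends as soon as the token is cancelled, and no thread is blocked while waiting.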
I have a fairly simple Azure Worker Process application with a main thread that calls a number of functions.
Each function basically grabs some data via an API call to an external site, manipulates the data, then pushes it somewhere else via an API call. Each function calls a different external site.
The code looks like this:
public partial class WorkerRole : RoleEntryPoint
{
public override void Run()
{
Trace.TraceInformation("Entry point called", "Information");
while (true)
{
Function1();
Function2();
// etc etc, there are nearly a hundred functions
}
Thread.Sleep(Timeout.Infinite);
}
}
I have no experience with the async features, and very little with Azure itself. Each function takes an hour or so to run, so my 100 functions take 100 hours. How can I change the functions to run asynchronously? Is it all within .NET, or do I need to programmatically spin up additional Azure processes and hand different functions to them?
Thanks for your help.
The worker role is your application entry point, the interface between your code and the host code. The Run method is how Azure gives you a thread to run your application code. If you need to run lots of work in parallel, you can use normal .NET threading techniques to do so:
Create a new System.Threading.Thread to run each task on. Good for long-running operations where you will not yield the thread.
Create tasks using Task.Factory.StartNew to have them scheduled for execution on the Thread Pool. Good if you have short-running tasks, or tasks which regularly yield the thread.
Unless you have 100 cores available, trying to make 100 tasks execute simultaneously will cause a lot of thread-switching overhead and may result in substantially slower performance than you could reach if you were to queue up the activities to work over a smaller number of threads.
If you say you have 100 things which take 1 hour each, are they all doing CPU-central activities for that full hour? If, like most applications, they're spending most of their time making requests of other resources and waiting for results, you should try and take advantage of Asynchronous programming techniques, like using Task with async and await to yield the thread often.
Yielding a thread is the best way to increase your system throughput - it's able to do other work whilst you wait for the task to complete. Your CPU cores are kept busy and all the work can proceed more quickly.
I have read the documentation and many tutorials on TPL, but none covers the model I want to achieve.
They always had a fixed number of iterations for some algorithm.
I need constantly running threads (as many as possible):
while(true)
get data from MAIN thread
perform heavy time-consuming task (in separate thread)
update MAIN thread information
Additionally, I need a mechanism that can set an alarm clock (e.g. 5 seconds). After five seconds all work must be suspended for a while and then resumed.
Should I use Task.ContinueWith the same task? But I am not processing result of previous task launch, but instead I update data structure in MAIN Thread and then decide what will be the input of new task iteration...
How can I leave to TPL decision how many task should be created for best efficiency?
Now I am using BackgroundWorkers, because they have a nice RunEventCompleted event - inside it I am on my main thread, so I can update my MAIN structure, check time constraints and then eventually call StartAsync again on the BackgroundWorker which completed. It is nice and clear, but probably very inefficient.
I need to make it highly efficient on multi-processor, multi-core servers.
One problem is that computation is always online, never stops. There is some networking also, which enables to ask remotely of current state of MAIN structure.
Second problem is critical time control (I must have precise timer - when it stops which no thread can be restarted). Then comes special high priority task after it ends, all work is resumed.
Third problem is that there is no upper bound for operations to do.
These three constraints, from what I observed, do not go along TPL well - I can't use something like Parallel.For because the collection is modified by results of task itself in realtime...
I don't know also how to combine:
ability to let TPL decide how many threads should be created
with sort of lifetime runing of threads (with pauses and synchronization points between consecutive restarts)
creating threads only once at the begining (they should be only restarted with constantly new parameters)
Can someone give me clues?
I know how to do it in a bad, inefficient way. There are some small requirements, which I described, that prevent me from doing this right. I am a little bit confused.
You need to use messaging + actors + a scheduler, imo. And then you need to use a language capable of it. Have a look at this code that asynchronously receives from Azure Service Bus, enqueues in a shared queue and manages runtime state through an actor.
Inline:
Should I use Task.ContinueWith the same task?
No, ContinueWith will get your program killed based on exception handling inside of each continuation passing; there's no good way in TPL to marshal failed state into the call-side/main thread.
But I am not processing result of previous task launch, but
instead I update data structure in
MAIN Thread and then decide what will be the input of new task
iteration...
You need to move beyond threading for this, unless you're willing to spend A LOT of time on the problem.
How can I leave to TPL decision how many task should be created for
best efficiency?
That's handled by the framework that runs your async workflows.
Now I am using BackgroundWorkers, because they have a nice
RunEventCompleted event - inside it I am on my main thread, so I can
update my MAIN structure, check time constraints and then eventually
call StartAsync again on the BackgroundWorker which completed. It is
nice and clear, but probably very inefficient. I need to make it
highly efficient on multi-processor, multi-core servers.
One problem is that computation is always online, never stops. There
is some networking also, which enables to ask remotely of current
state of MAIN structure. Second problem is critical time control (I
must have precise timer - when it stops which no thread can be
restarted).
If you run everything asynchronously, you can pass your actor a message that suspends it. Your scheduling actor is responsible for calling all its subscribers with their scheduled messages; have a look at the paused state in the code linked. If you have outstanding requests you can pass them a cancellation token and handle a 'hard' cancellation/socket abort that way.
Then comes special high priority task after it ends, all
work is resumed. These two constraints, from what I observed, do not
go along TPL well - I can't use something like Parallel.For because
the collection is modified by results of task itself in realtime...
You probably need a pattern called pipes-and-filters. You pipe your input into a chain of workers (actors); each worker consumes from the other worker's output. Signalling is done using a control channel (in my case that is the inbox of the actor).
I think you should read
MSDN: How to implement a producer / consumer dataflow pattern
I had the same problem: one producer produced items, while several consumers consumed them and decided to send them to other consumers. Each consumer was working asynchronously and independent from other consumers.
Your main task is the producer. He produces items that your other tasks should process. The class with the code of your main task has a function:
public async Task ProduceOutputAsync(...)
Your main program starts this Task using:
var producerTask = Task.Run(() => MyProducer.ProduceOutputAsync(...));
Once this is called the producer task starts producing output. Meanwhile your main program can continue doing other things, like for instance start the consumers.
But let's first focus on the Producer task.
The producer task produces items of type T to be processed by other tasks. They are carried over to the other tasks using objects that implement ITargetBlock<T>.
Every time the producer task has finished creating an object of type T, it sends it to the target block using Post, or preferably the async version, SendAsync:
while (continueProducing())
{
    T product = await CreateProduct(...);
    bool accepted = await this.TargetBlock.SendAsync(product);
    // process the return value
}
// if here, nothing to produce anymore. Notify the consumers:
this.TargetBlock.Complete();
The producer needs an ITargetBlock<T>. In my application a BufferBlock<T> was enough. Check MSDN for the other possible targets.
Anyway, the data flow block should also implement ISourceBlock<T>. Your receiver waits for input to arrive at the source, fetches it and processes it. Once finished, it can send the result to its own target block, and wait for the next input until there is no input expected anymore. Of course if your consumer doesn't produce output it doesn't have to send anything to a target.
Waiting for input is done as follows:
ISourceBlock<T> mySource = ...;
// OutputAvailableAsync returns false once the source is completed and empty
while (await mySource.OutputAvailableAsync())
{   // an object of type T is available at the source
    T objectToProcess = await mySource.ReceiveAsync();
    // keep in mind that someone else might have fetched your object
    // so only process it if you've got it.
    if (objectToProcess != null)
    {
        await ProcessAsync(objectToProcess);
        // if your processing produces output send the output to your target:
        var myOutput = await ProduceOutput(objectToProcess);
        await myTarget.SendAsync(myOutput);
    }
}
// if here, no input expected anymore, notify my consumers:
myTarget.Complete();
construct your producer
construct all consumers
give the producer a BufferBlock to send its output to
Start the producer MyProducer.ProduceOutputAsync(...)
While the producer produces output and sends it to the buffer block:
give the consumers the same BufferBlock
Start the consumers as a separate task
await Task.WhenAll(...) to wait for all tasks to complete.
Each consumer will stop as soon as it hears that no input is expected anymore.
After all tasks have completed your main function can read the results and return
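Putting those steps together, a small self-contained version could look like the following. It is a sketch under assumptions: PipelineDemo and its method names are invented, the items are plain ints, the consumers just sum what they receive, and a bounded capacity of 10 is chosen arbitrarily. It uses TryReceive rather than ReceiveAsync so that a consumer that loses the race for an item simply goes back to waiting.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PipelineDemo
{
    // Producer: sends 1..count to the target, then signals completion.
    public static async Task ProduceAsync(ITargetBlock<int> target, int count)
    {
        for (int i = 1; i <= count; i++)
        {
            await target.SendAsync(i);   // waits while a bounded block is full
        }
        target.Complete();               // tell consumers nothing more is coming
    }

    // Consumer: sums every item it manages to take from the source.
    public static async Task<int> ConsumeAsync(IReceivableSourceBlock<int> source)
    {
        int sum = 0;
        while (await source.OutputAvailableAsync())
        {
            // Another consumer may have taken the item already, so try
            // to receive instead of assuming one is there.
            while (source.TryReceive(out int item))
            {
                sum += item;
            }
        }
        return sum;
    }

    public static async Task<int> RunAsync(int items, int consumers)
    {
        var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 10 });

        Task producer = ProduceAsync(buffer, items);
        Task<int>[] workers = Enumerable.Range(0, consumers)
            .Select(_ => ConsumeAsync(buffer))
            .ToArray();

        await producer;
        int[] sums = await Task.WhenAll(workers);
        return sums.Sum();
    }

    public static async Task Main()
    {
        // Every item is consumed exactly once, so this is the sum of 1..100.
        Console.WriteLine(await RunAsync(100, 3));
    }
}
```

The bounded capacity also gives a simple form of back pressure: the producer's SendAsync does not complete until a consumer has made room in the buffer.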
After asking this question, I became comfortable using async operations in ASP.NET MVC, so I wrote two blog posts on that:
My Take on Task-based Asynchronous Programming in C# 5.0 and ASP.NET MVC Web Applications
Asynchronous Database Calls With Task-based Asynchronous Programming Model (TAP) in ASP.NET MVC 4
I have a lot of misunderstandings about asynchronous operations in ASP.NET MVC.
I always hear this sentence: Application can scale better if operations run asynchronously
And I have heard this kind of sentence a lot as well: if you have a huge volume of traffic, you may be better off not performing your queries asynchronously - consuming 2 extra threads to service one request takes resources away from other incoming requests.
I think those two sentences are inconsistent.
I do not know much about how the thread pool works in ASP.NET, but I know that the thread pool has a limited number of threads. So the second sentence has to be related to this issue.
And I would like to know if asynchronous operations in ASP.NET MVC uses a thread from ThreadPool on .NET 4?
For example, when we implement an AsyncController, how is the app structured? If I get huge traffic, is it a good idea to implement AsyncController?
Is there anybody out there who can take this black curtain away from in front of my eyes and explain to me the deal about asynchrony in ASP.NET MVC 3 (.NET 4)?
Edit:
I have read the document below nearly a hundred times and I understand the main idea, but I am still confused because there are too many inconsistent comments out there.
Using an Asynchronous Controller in ASP.NET MVC
Edit:
Let's assume I have controller action like below (not an implementation of AsyncController though):
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Do some advanced logging here which takes a while
});
return View();
}
As you see here, I fire an operation and forget about it. Then I return immediately without waiting for it to complete.
In this case, does this have to use a thread from the thread pool? If so, what happens to that thread after it completes? Does the GC come in and clean up just after it completes?
Edit:
For @Darin's answer, here is a sample of async code which talks to the database:
public class FooController : AsyncController {
//EF 4.2 DbContext instance
MyContext _context = new MyContext();
public void IndexAsync() {
AsyncManager.OutstandingOperations.Increment(3);
Task<IEnumerable<Foo>>.Factory.StartNew(() => {
return
_context.Foos;
}).ContinueWith(t => {
AsyncManager.Parameters["foos"] = t.Result;
AsyncManager.OutstandingOperations.Decrement();
});
Task<IEnumerable<Bar>>.Factory.StartNew(() => {
return
_context.Bars;
}).ContinueWith(t => {
AsyncManager.Parameters["bars"] = t.Result;
AsyncManager.OutstandingOperations.Decrement();
});
Task<IEnumerable<FooBar>>.Factory.StartNew(() => {
return
_context.FooBars;
}).ContinueWith(t => {
AsyncManager.Parameters["foobars"] = t.Result;
AsyncManager.OutstandingOperations.Decrement();
});
}
public ViewResult IndexCompleted(
IEnumerable<Foo> foos,
IEnumerable<Bar> bars,
IEnumerable<FooBar> foobars) {
//Do the regular stuff and return
}
}
Here's an excellent article I would recommend reading to better understand asynchronous processing in ASP.NET (which is what asynchronous controllers basically represent).
Let's first consider a standard synchronous action:
public ActionResult Index()
{
// some processing
return View();
}
When a request is made to this action a thread is drawn from the thread pool and the body of this action is executed on this thread. So if the processing inside this action is slow you are blocking this thread for the entire processing, so this thread cannot be reused to process other requests. At the end of the request execution, the thread is returned to the thread pool.
Now let's take an example of the asynchronous pattern:
public void IndexAsync()
{
// perform some processing
}
public ActionResult IndexCompleted(object result)
{
return View();
}
When a request is sent to the Index action, a thread is drawn from the thread pool and the body of the IndexAsync method is executed. Once the body of this method finishes executing, the thread is returned to the thread pool. Then, using the standard AsyncManager.OutstandingOperations, once you signal the completion of the async operation, another thread is drawn from the thread pool and the body of the IndexCompleted action is executed on it and the result rendered to the client.
So what we can see in this pattern is that a single client HTTP request could be executed by two different threads.
Now the interesting part happens inside the IndexAsync method. If you have a blocking operation inside it, you are totally wasting the whole purpose of the asynchronous controllers because you are blocking the worker thread (remember that the body of this action is executed on a thread drawn from the thread pool).
So when can we take real advantage of asynchronous controllers you might ask?
IMHO we can gain most when we have I/O intensive operations (such as database and network calls to remote services). If you have a CPU intensive operation, asynchronous actions won't bring you much benefit.
So why can we gain benefit from I/O intensive operations? Because we could use I/O Completion Ports. IOCP are extremely powerful because you do not consume any threads or resources on the server during the execution of the entire operation.
How do they work?
Suppose that we want to download the contents of a remote web page using the WebClient.DownloadStringAsync method. You call this method which will register an IOCP within the operating system and return immediately. During the processing of the entire request, no threads are consumed on your server. Everything happens on the remote server. This could take lots of time but you don't care as you are not jeopardizing your worker threads. Once a response is received the IOCP is signaled, a thread is drawn from the thread pool and the callback is executed on this thread. But as you can see, during the entire process, we have not monopolized any threads.
The same holds true for methods such as FileStream.BeginRead, SqlCommand.BeginExecuteReader, ...
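To make the FileStream case concrete, here is a minimal sketch that uses the newer async/await form rather than BeginRead. The class and method names are made up for illustration, and whether an I/O completion port is actually used underneath depends on the platform (as far as I know that is the case on Windows when useAsync is true).

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncFileReadSketch
{
    // Reads a whole file without parking a thread while the OS performs the read.
    public static async Task<string> ReadAllTextAsync(string path)
    {
        // useAsync: true requests asynchronous (overlapped) I/O from the OS.
        using var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true);
        using var reader = new StreamReader(stream, Encoding.UTF8);

        // The calling thread is released at this await and a pool thread
        // resumes the method once the data has arrived.
        return await reader.ReadToEndAsync();
    }

    public static async Task Main()
    {
        // Temporary file used only so the sketch has something to read.
        string path = Path.GetTempFileName();
        try
        {
            await File.WriteAllTextAsync(path, "hello from disk");
            Console.WriteLine(await ReadAllTextAsync(path));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The same shape applies to network and database calls that expose a genuinely asynchronous API: the await is where the thread goes back to the pool.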
What about parallelizing multiple database calls? Suppose that you had a synchronous controller action in which you performed 4 blocking database calls in sequence. It's easy to calculate that if each database call takes 200ms, your controller action will take roughly 800ms to execute.
If you don't need to run those calls sequentially, would parallelizing them improve performance?
That's the big question, which is not easy to answer. Maybe yes, maybe no. It will entirely depend on how you implement those database calls. If you use async controllers and I/O Completion Ports as discussed previously you will boost the performance of this controller action and of other actions as well, as you won't be monopolizing worker threads.
On the other hand, if you implement them poorly (with a blocking database call performed on a thread from the thread pool), you will lower the total execution time of this action to roughly 200ms, but you will have consumed 4 worker threads to do it. That may degrade other requests, which can starve because there are no threads left in the pool to process them.
So this is difficult to get right. If you don't feel ready to perform extensive tests on your application, do not implement asynchronous controllers, as chances are that you will do more harm than good. Implement them only if you have a reason to do so: for example, you have identified that standard synchronous controller actions are a bottleneck in your application (after performing extensive load tests and measurements, of course).
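For the "done properly" variant, a rough sketch might look like the following. QueryAsync is a hypothetical stand-in for a truly asynchronous database call (it only waits on a timer via Task.Delay), so the timings are illustrative rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelQueriesSketch
{
    // Hypothetical async "database call": waits without holding a thread.
    public static async Task<int> QueryAsync(int id, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return id * 10;
    }

    // One call after another: total time is roughly 4 x latency.
    public static async Task<int[]> RunSequentiallyAsync(int latencyMs)
    {
        var results = new int[4];
        for (int i = 0; i < 4; i++)
        {
            results[i] = await QueryAsync(i, latencyMs);
        }
        return results;
    }

    // All four calls are started before any is awaited: total time is
    // roughly 1 x latency, and no worker thread is blocked meanwhile.
    public static Task<int[]> RunConcurrentlyAsync(int latencyMs)
    {
        var tasks = Enumerable.Range(0, 4).Select(i => QueryAsync(i, latencyMs));
        return Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        await RunSequentiallyAsync(200);
        Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        int[] results = await RunConcurrentlyAsync(200);
        Console.WriteLine($"concurrent: {sw.ElapsedMilliseconds} ms, results: {string.Join(",", results)}");
    }
}
```

Replacing the Task.Delay with a blocking call wrapped in Task.Run would give similar wall-clock time for this one action, but at the cost of four pool threads, which is exactly the poor variant described above.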
Now let's consider your example:
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Do an advanced logging here which takes a while
});
return View();
}
When a request is received for the Index action, a thread is drawn from the thread pool to execute its body, but its body only schedules a new task using the TPL. So the action execution ends and the thread is returned to the thread pool. However, the TPL also uses threads from the thread pool to do its processing. So even though the original thread was returned to the thread pool, you have drawn another thread from the same pool to execute the body of the task. You have jeopardized 2 threads from your precious pool.
Now let's consider the following:
public ViewResult Index() {
new Thread(() => {
//Do an advanced logging here which takes a while
}).Start();
return View();
}
In this case we are manually spawning a thread. The execution of the body of the Index action might take slightly longer (because spawning a new thread is more expensive than drawing one from an existing pool), but the advanced logging operation will run on a thread which is not part of the pool. So we are not jeopardizing threads from the pool, which remain free to serve other requests.
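If you want to check for yourself where the work lands in each case, a small console sketch along these lines should do. The class and method names are mine, and TaskScheduler.Default is passed explicitly so the result does not depend on whatever scheduler happens to be current.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadOriginSketch
{
    // Work scheduled through the TPL runs on a thread-pool thread.
    public static Task<bool> ViaTaskAsync() =>
        Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default);

    // Work started with new Thread(...) runs on a dedicated, non-pool thread.
    public static bool ViaDedicatedThread()
    {
        bool isPoolThread = true;
        var thread = new Thread(() => isPoolThread = Thread.CurrentThread.IsThreadPoolThread);
        thread.Start();
        thread.Join(); // wait so the captured flag is safe to read
        return isPoolThread;
    }

    public static async Task Main()
    {
        Console.WriteLine($"Task.Factory.StartNew on pool thread: {await ViaTaskAsync()}"); // True
        Console.WriteLine($"new Thread on pool thread: {ViaDedicatedThread()}");            // False
    }
}
```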
Yes - all threads come from the thread-pool. Your MVC app is already multi-threaded, when a request comes in a new thread will be taken from the pool and used to service the request. That thread will be 'locked' (from other requests) until the request is fully serviced and completed. If there is no thread available in the pool the request will have to wait until one is available.
If you have async controllers they still get a thread from the pool but while servicing the request they can give up the thread, while waiting for something to happen (and that thread can be given to another request) and when the original request needs a thread again it gets one from the pool.
The difference is that if you have a lot of long-running requests (where the thread is waiting for a response from something) you might run out of threads in the pool to service even basic requests. With async controllers you don't have any more threads, but the threads that would otherwise sit waiting are returned to the pool and can service other requests.
A nearly real life example...
Think of it like getting on a bus. There are five people waiting to get on. The first gets on, pays and sits down (the driver serviced their request). You get on (the driver is servicing your request) but you can't find your money. As you fumble in your pockets, the driver gives up on you and gets the next two people on (servicing their requests). When you find your money, the driver starts dealing with you again (completing your request). The fifth person has to wait until you are done, but the third and fourth people got served while you were halfway through being served. Here the driver is the one and only thread from the pool and the passengers are the requests. It was too complicated to write how it would work if there were two drivers, but you can imagine...
Without an async controller, the passengers behind you would have to wait ages while you looked for your money, meanwhile the bus driver would be doing no work.
So the conclusion is: if lots of people don't know where their money is (i.e. they take a long time to respond to something the driver has asked), async controllers could well help the throughput of requests, speeding up the process for some. Without an async controller, everyone waits until the person in front has been completely dealt with. BUT don't forget that in MVC you have a lot of bus drivers on a single bus, so async is not an automatic choice.
There are two concepts at play here. First of all we can make our code run in parallel to execute faster or schedule code on another thread to avoid making the user wait. The example you had
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Do an advanced logging here which takes a while
});
return View();
}
belongs to the second category. The user will get a faster response but the total workload on the server is higher because it has to do the same work + handle the threading.
Another example of this would be:
public ViewResult Index() {
Task.Factory.StartNew(() => {
//Make async web request to twitter with WebClient.DownloadString()
});
Task.Factory.StartNew(() => {
//Make async web request to facebook with WebClient.DownloadString()
});
//wait for both to be ready and merge the results
return View();
}
Because the requests run in parallel, the user won't have to wait as long as if they were done in serial. But you should realize that we use up more resources here than if we ran them in serial, because we run the code on many threads while we also have one thread waiting.
This is perfectly fine in a client scenario. It is quite common there to wrap synchronous long-running code in a new task (run it on another thread) to keep the UI responsive, or to parallelize it to make it faster. A thread is still used for the whole duration, though. On a server under high load this could backfire because you actually use more resources. This is what people have warned you about.
Async controllers in MVC have another goal, though. The point here is to avoid having threads sitting around doing nothing (which can hurt scalability). It really only matters if the APIs you are calling have async methods, like WebClient.DownloadStringAsync().
The point is that your thread can be returned to handle new requests until the web request is finished, at which point your callback is called on the same or a new thread and finishes the request.
I hope you understand the difference between asynchronous and parallel. Think of parallel code as code where your thread sits around and waits for the result, while asynchronous code is code where you are notified when the work is done and can get back to it; in the meantime the thread can do other work.
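A rough way to see that difference on your own machine is sketched below. Both methods start the same number of 200ms waits: one parks a pool thread per wait (Thread.Sleep inside Task.Run), the other holds no thread while waiting (Task.Delay). The names and numbers are illustrative assumptions, and the blocking timing in particular depends on core count and on how quickly the pool injects new threads.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingVersusAsyncSketch
{
    // Each operation occupies a pool thread for the whole wait.
    public static Task RunBlockingAsync(int operations, int waitMs) =>
        Task.WhenAll(Enumerable.Range(0, operations)
            .Select(_ => Task.Run(() => Thread.Sleep(waitMs))));

    // Each operation only registers a timer; no thread is held while waiting.
    public static Task RunNonBlockingAsync(int operations, int waitMs) =>
        Task.WhenAll(Enumerable.Range(0, operations)
            .Select(_ => Task.Delay(waitMs)));

    // Measures how long the supplied batch takes to complete.
    public static async Task<long> TimeAsync(Func<Task> run)
    {
        var sw = Stopwatch.StartNew();
        await run();
        return sw.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        long blocking = await TimeAsync(() => RunBlockingAsync(200, 200));
        long nonBlocking = await TimeAsync(() => RunNonBlockingAsync(200, 200));
        Console.WriteLine($"blocking: {blocking} ms, non-blocking: {nonBlocking} ms");
    }
}
```

On a typical machine the blocking batch should take noticeably longer, because 200 sleeping work items have to share however many threads the pool is willing to provide, while the non-blocking batch finishes in roughly one wait period.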
Applications can scale better if operations run asynchronously, but only if there are resources available to service the additional operations.
Asynchronous operations ensure that you're never blocking an action because an existing one is in progress. ASP.NET has an asynchronous model that allows multiple requests to execute side-by-side. It would be possible to queue the requests up and process them FIFO, but this would not scale well when you have hundreds of requests queued up and each request takes 100ms to process.
If you have a huge volume of traffic, you may be better off not performing your queries asynchronously, as there may be no additional resources to service the requests. If there are no spare resources, your requests are forced to queue up, take exponentially longer or outright fail, in which case the asynchronous overhead (mutexes and context-switching operations) isn't giving you anything.
As far as ASP.NET goes, you don't have a choice: it uses an asynchronous model, because that's what makes sense for the server-client model. If you were to write your own code internally that uses an async pattern to attempt to scale better, then unless you're trying to manage a resource that's shared between all requests, you won't actually see any improvement, because your requests are already wrapped in an asynchronous process that doesn't block anything else.
It's all subjective until you actually look at what's causing a bottleneck in your system. Sometimes it's obvious where an asynchronous pattern will help (by preventing a queued resource from blocking). Ultimately, only measuring and analysing a system can indicate where you can gain efficiencies.
Edit:
In your example, the Task.Factory.StartNew call will queue up an operation on the .NET thread-pool. The nature of Thread Pool threads is to be re-used (to avoid the cost of creating/destroying lots of threads). Once the operation completes, the thread is released back to the pool to be re-used by another request (the Garbage Collector doesn't actually get involved unless you created some objects in your operations, in which case they're collected as per normal scoping).
As far as ASP.NET is concerned, there is no special operation here. The ASP.NET request completes without respect to the asynchronous task. The only concern might be if your thread pool is saturated (i.e. there are no threads available to service the request right now and the pool's settings don't allow more threads to be created), in which case the request is blocked waiting to start the task until a pool thread becomes available.
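If you suspect saturation, you can at least look at the pool's limits and current headroom from code. This is only a sketch with names of my own choosing; the numbers it prints depend on the machine and runtime, and a snapshot like this is no substitute for proper load measurements.

```csharp
using System;
using System.Threading;

public static class PoolHeadroomSketch
{
    // Returns the worker-thread limits and how many workers are currently free.
    public static (int Min, int Max, int Available) WorkerThreads()
    {
        ThreadPool.GetMinThreads(out int min, out _);
        ThreadPool.GetMaxThreads(out int max, out _);
        ThreadPool.GetAvailableThreads(out int available, out _);
        return (min, max, available);
    }

    public static void Main()
    {
        var (min, max, available) = WorkerThreads();
        // max - available approximates the number of busy worker threads.
        Console.WriteLine(
            $"worker threads: min={min}, max={max}, available={available}, busy={max - available}");
    }
}
```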
Yes, they use a thread from the thread pool. There is actually a pretty excellent guide from MSDN that will tackle all of your questions and more. I have found it to be quite useful in the past. Check it out!
http://msdn.microsoft.com/en-us/library/ee728598.aspx
Meanwhile, the comments + suggestions that you hear about asynchronous code should be taken with a grain of salt. For starters, just making something async doesn't necessarily make it scale better, and in some cases can make your application scale worse. The other comment you posted about "a huge volume of traffic..." is also only correct in certain contexts. It really depends on what your operations are doing, and how they interact with other parts of the system.
In short, lots of people have lots of opinions about async, but they may not be correct out of context. I'd say focus on your exact problems, and do basic performance testing to see how async controllers, etc. actually behave with your application.
First of all, it's not MVC but IIS that maintains the thread pool. So any request that comes to an MVC or ASP.NET application is served by threads maintained in that thread pool. Only when you make the action asynchronous does it run its work on a different thread and release the request thread immediately, so that other requests can be taken.
I have explained this in a detailed video (http://www.youtube.com/watch?v=wvg13n5V0V0/ "MVC Asynch controllers and thread starvation") which shows how thread starvation happens in MVC and how it is minimized by using MVC async controllers. I have also measured the request queues using perfmon so that you can see how the queues shrink with async MVC and how much worse they are for synchronous operations.