HostingEnvironment.QueueBackgroundWorkItem - Clarification? - c#

I've read Stephen's article about fire and forget background actions in Asp.net.
It is not recommended to use Task.Run for fire-and-forget because Asp.net doesn't know that you've queued a task.
So if a recycle is about to occur, the task has no way of knowing it.
That's where HostingEnvironment.QueueBackgroundWorkItem comes in.
It will know that a recycle is about to happen and will invoke the Cancellation Token.
But!
AFAIK, background tasks are "terminated" once the main thread has finished.
That means that if a request comes in (a new thread is created or fetched) and it invokes Task.Run, and the response has finished (but the Task has not), then the Task will be terminated.
Question:
Does QueueBackgroundWorkItem solve this problem? Or does it only exist to warn about a recycle?
In other words, if there's a request which runs QueueBackgroundWorkItem and the response has finished, will the QueueBackgroundWorkItem continue to execute its code?
The docs say "independent of any request", but I'm not sure whether that answers my question.

According to the documentation, this method tries to delay application shutdown until background work has completed.
Differs from a normal ThreadPool work item in that ASP.NET can keep track of how many work items registered through this API are currently running, and the ASP.NET runtime will try to delay AppDomain shutdown until these work items have finished executing.
Also, it does not flow certain contexts which are associated with the current request and are inappropriate for request-independent background work:
This overloaded method doesn't flow the ExecutionContext or SecurityContext from the caller to the callee. Therefore, members of those objects, such as the CurrentPrincipal property, will not flow from the caller to the callee.
In ASP.NET there is no way to make sure that background work ever completes. The machine could blue screen, there could be a bug terminating the worker process, there could be a timeout forcing termination and many other things.
Or, your code could have a bug and crash. That causes the queued work to be lost as well.
If you need something executed reliably, execute it synchronously before confirming completion, or queue it somewhere (message queue, database, ...).
That means that if a request gets in (a new thread is being created/fetched) and it invokes Task.Run, and the response has finished (but Task has not) then the Task will be terminated.
No, Task.Run works independently of HTTP requests. In fact, there is no way to cancel a Task except if the task's code cancels itself.
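A minimal console sketch of that last point, assuming plain .NET with no ASP.NET host. `HandleRequest` is a made-up stand-in for a request handler, not an ASP.NET API. It only shows that work started with Task.Run is not tied to the method that started it; it says nothing about surviving an AppDomain recycle, which is the case QueueBackgroundWorkItem is meant to help with.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForgetDemo
{
    // Hypothetical stand-in for a request handler: queues work and returns immediately.
    private static Task HandleRequest(ManualResetEventSlim gate)
    {
        // The work blocks on the gate, so it cannot finish before the handler returns.
        return Task.Run(() => gate.Wait());
    }

    // Returns true if the work was still pending after the handler returned
    // and then ran to completion on its own.
    public static bool WorkOutlivesHandler()
    {
        using (var gate = new ManualResetEventSlim(false))
        {
            Task background = HandleRequest(gate);
            bool pendingAfterReturn = !background.IsCompleted;
            gate.Set();        // the "response" is long finished; let the work complete
            background.Wait(); // demo only: wait so the result can be inspected
            return pendingAfterReturn && background.Status == TaskStatus.RanToCompletion;
        }
    }
}
```

The gate is there only to make the ordering deterministic for the demo; real fire-and-forget work would simply run.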

Related

How a program is executed asynchronously: [duplicate]

I thought that they were basically the same thing — writing programs that split tasks between processors (on machines that have 2+ processors). Then I'm reading this, which says:
Async methods are intended to be non-blocking operations. An await
expression in an async method doesn’t block the current thread while
the awaited task is running. Instead, the expression signs up the rest
of the method as a continuation and returns control to the caller of
the async method.
The async and await keywords don't cause additional threads to be
created. Async methods don't require multithreading because an async
method doesn't run on its own thread. The method runs on the current
synchronization context and uses time on the thread only when the
method is active. You can use Task.Run to move CPU-bound work to a
background thread, but a background thread doesn't help with a process
that's just waiting for results to become available.
and I'm wondering whether someone can translate that to English for me. It seems to draw a distinction between asynchronicity (is that a word?) and threading and imply that you can have a program that has asynchronous tasks but no multithreading.
Now I understand the idea of asynchronous tasks such as the example on pg. 467 of Jon Skeet's C# In Depth, Third Edition
async void DisplayWebsiteLength ( object sender, EventArgs e )
{
    label.Text = "Fetching ...";
    using ( HttpClient client = new HttpClient() )
    {
        Task<string> task = client.GetStringAsync("http://csharpindepth.com");
        string text = await task;
        label.Text = text.Length.ToString();
    }
}
The async keyword means "This function, whenever it is called, will not be called in a context in which its completion is required for everything after its call to be called."
In other words, writing it in the middle of some task
int x = 5;
DisplayWebsiteLength();
double y = Math.Pow((double)x,2000.0);
, since DisplayWebsiteLength() has nothing to do with x or y, will cause DisplayWebsiteLength() to be executed "in the background", like
processor 1                             | processor 2
-------------------------------------------------------------------
int x = 5;                              | DisplayWebsiteLength()
double y = Math.Pow((double)x,2000.0);  |
Obviously that's a stupid example, but am I correct or am I totally confused or what?
(Also, I'm confused about why sender and e aren't ever used in the body of the above function.)
Your misunderstanding is extremely common. Many people are taught that multithreading and asynchrony are the same thing, but they are not.
An analogy usually helps. You are cooking in a restaurant. An order comes in for eggs and toast.
Synchronous: you cook the eggs, then you cook the toast.
Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
Now does it make sense that multithreading is only one kind of asynchrony? Threading is about workers; asynchrony is about tasks. In multithreaded workflows you assign tasks to workers. In asynchronous single-threaded workflows you have a graph of tasks where some tasks depend on the results of others; as each task completes it invokes the code that schedules the next task that can run, given the results of the just-completed task. But you (hopefully) only need one worker to perform all the tasks, not one worker per task.
It will help to realize that many tasks are not processor-bound. For processor-bound tasks it makes sense to hire as many workers (threads) as there are processors, assign one task to each worker, assign one processor to each worker, and have each processor do the job of nothing else but computing the result as quickly as possible. But for tasks that are not waiting on a processor, you don't need to assign a worker at all. You just wait for the message to arrive that the result is available and do something else while you're waiting. When that message arrives then you can schedule the continuation of the completed task as the next thing on your to-do list to check off.
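The eggs-and-toast analogy can be sketched in code. This is only an illustration under assumptions: `CookAsync` and the timings are invented, and Task.Delay stands in for any operation that is waiting rather than computing. In a console program the continuations may run on thread pool threads, but no thread is blocked while the timers are pending.

```csharp
using System.Threading.Tasks;

public static class Kitchen
{
    // Hypothetical "cooking" step: it waits on a timer instead of using a CPU.
    private static async Task<string> CookAsync(string item, int milliseconds)
    {
        await Task.Delay(milliseconds); // no worker is tied up while the timer runs
        return item;
    }

    public static async Task<string[]> MakeBreakfastAsync()
    {
        Task<string> eggs = CookAsync("eggs", 200);   // start the eggs, set a timer
        Task<string> toast = CookAsync("toast", 100); // start the toast, set a timer
        // The cook is free to do something else here while both are cooking.
        return await Task.WhenAll(eggs, toast);       // results come back in argument order
    }
}
```

Both items are in progress at once, yet no extra cooks (threads) were hired to watch them.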
So let's look at Jon's example in more detail. What happens?
Someone invokes DisplayWebsiteLength. Who? We don't care.
It sets a label, creates a client, and asks the client to fetch something. The client returns an object representing the task of fetching something. That task is in progress.
Is it in progress on another thread? Probably not. Read Stephen's article on why there is no thread.
Now we await the task. What happens? We check to see if the task has completed between the time we created it and we awaited it. If yes, then we fetch the result and keep running. Let's suppose it has not completed. We sign up the remainder of this method as the continuation of that task and return.
Now control has returned to the caller. What does it do? Whatever it wants.
Now suppose the task completes. How did it do that? Maybe it was running on another thread, or maybe the caller that we just returned to allowed it to run to completion on the current thread. Regardless, we now have a completed task.
The completed task asks the correct thread -- again, likely the only thread -- to run the continuation of the task.
Control passes immediately back into the method we just left at the point of the await. Now there is a result available so we can assign text and run the rest of the method.
It's just like in my analogy. Someone asks you for a document. You send away in the mail for the document, and keep on doing other work. When it arrives in the mail you are signalled, and when you feel like it, you do the rest of the workflow -- open the envelope, pay the delivery fees, whatever. You don't need to hire another worker to do all that for you.
In-browser Javascript is a great example of an asynchronous program that has no multithreading.
You don't have to worry about multiple pieces of code touching the same objects at the same time: each function will finish running before any other JavaScript is allowed to run on the page. (Update: Since this was written, JavaScript has added async functions and generator functions. These functions do not always run to completion before any other JavaScript is executed: whenever they reach a yield or await keyword, they yield execution to other JavaScript, and can continue execution later, similar to C#'s async methods.)
However, when doing something like an AJAX request, no code is running at all, so other javascript can respond to things like click events until that request comes back and invokes the callback associated with it. If one of these other event handlers is still running when the AJAX request gets back, its handler won't be called until they're done. There's only one JavaScript "thread" running, even though it's possible for you to effectively pause the thing you were doing until you have the information you need.
In C# applications, the same thing happens any time you're dealing with UI elements--you're only allowed to interact with UI elements when you're on the UI thread. If the user clicked a button, and you wanted to respond by reading a large file from the disk, an inexperienced programmer might make the mistake of reading the file within the click event handler itself, which would cause the application to "freeze" until the file finished loading because it's not allowed to respond to any more clicking, hovering, or any other UI-related events until that thread is freed.
One option programmers might use to avoid this problem is to create a new thread to load the file, and then tell that thread's code that when the file is loaded it needs to run the remaining code on the UI thread again so it can update UI elements based on what it found in the file. Until recently, this approach was very popular because it was what the C# libraries and language made easy, but it's fundamentally more complicated than it has to be.
If you think about what the CPU is doing when it reads a file at the level of the hardware and Operating System, it's basically issuing an instruction to read pieces of data from the disk into memory, and to hit the operating system with an "interrupt" when the read is complete. In other words, reading from disk (or any I/O really) is an inherently asynchronous operation. The concept of a thread waiting for that I/O to complete is an abstraction that the library developers created to make it easier to program against. It's not necessary.
Now, most I/O operations in .NET have a corresponding ...Async() method you can invoke, which returns a Task almost immediately. You can add callbacks to this Task to specify code that you want to have run when the asynchronous operation completes. You can also specify which thread you want that code to run on, and you can provide a token which the asynchronous operation can check from time to time to see if you decided to cancel the asynchronous task, giving it the opportunity to stop its work quickly and gracefully.
Until the async/await keywords were added, C# was much more obvious about how callback code gets invoked, because those callbacks were in the form of delegates that you associated with the task. In order to still give you the benefit of using the ...Async() operation, while avoiding complexity in code, async/await abstracts away the creation of those delegates. But they're still there in the compiled code.
So you can have your UI event handler await an I/O operation, freeing up the UI thread to do other things, and more-or-less automatically returning to the UI thread once you've finished reading the file--without ever having to create a new thread.
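As a rough sketch of awaiting a file read without creating a thread yourself: the class and method names below are made up, and the `useAsync: true` flag is an assumption about how you would want the stream opened so that the OS can use asynchronous I/O where it is supported.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class FileLengthDemo
{
    // Reads the whole file asynchronously and returns its length in characters.
    public static async Task<int> CountCharsAsync(string path)
    {
        // useAsync: true asks for asynchronous (overlapped) I/O where the platform supports it.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        using (var reader = new StreamReader(stream))
        {
            // The calling thread is released here until the read completes.
            string text = await reader.ReadToEndAsync();
            return text.Length;
        }
    }
}
```

In a UI event handler you would await `CountCharsAsync` and then update the UI; the code after the await resumes on the UI thread by default.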

How to wrap a single sync operation with a Task with a CancellationToken?

I have a single synchronous operation that could take a lot of time to complete. The caller of the operation provides a CancellationToken and the operation should be stopped immediately when the token is cancelled (within a few ms after cancellation would also work in this case).
How can I wrap this in a task with a CancellationToken?
I can't change the calling code nor the call itself.
What it used to be: LongOperation();
What I have now: await Task.Run(() => LongOperation(), cancellationToken).ConfigureAwait(false);
Clearly this doesn't work as you have to poll the token inside the action given to Task.Run.
I can't change the calling code nor the call itself.
Clearly this doesn't work as you have to poll the token inside the action given to Task.Run.
By far the easiest solution is to lift one of the requirements: either allow the CancellationToken to be ignored, or change the called code.
If that's really, truly, honestly not possible, then you'll need to run the code in another process. So, you'll need to kick off a child process that has access to that method, marshal all the arguments over to it, and then marshal back any result value or exception. Then, when the token is cancelled, kill the process.
There are less safe ways of doing the same thing: you can run the code in another AppDomain and unload the AppDomain on cancel, or you can run the code in another Thread and Abort the Thread on cancel. But both of those can easily cause resource leaks or application stability problems. The only truly safe way is a separate process.
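To make the question's observation concrete, here is a small sketch of what the token passed to Task.Run actually does. `LongOperation` below is a made-up stand-in that just sleeps. The token is only checked before the delegate starts; once the delegate is running, cancelling the token has no effect unless the delegate polls it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TokenDemo
{
    // Hypothetical stand-in for the long synchronous call that cannot be changed.
    private static void LongOperation() => Thread.Sleep(300);

    // Cancelling after the delegate has started does not stop it.
    public static bool CancelAfterStartIsIgnored()
    {
        using (var cts = new CancellationTokenSource())
        using (var started = new ManualResetEventSlim(false))
        {
            Task task = Task.Run(() => { started.Set(); LongOperation(); }, cts.Token);
            started.Wait(); // make sure the delegate is already running
            cts.Cancel();
            task.Wait();    // still runs to the end
            return task.Status == TaskStatus.RanToCompletion;
        }
    }

    // Cancelling before the delegate starts prevents it from running at all.
    public static bool CancelBeforeStartPreventsRun()
    {
        bool ran = false;
        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel();
            Task task = Task.Run(() => { ran = true; }, cts.Token);
            try { task.Wait(); } catch (AggregateException) { /* task was cancelled */ }
            return task.IsCanceled && !ran;
        }
    }
}
```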

.NET Task Performance with 1000s of blocked Tasks

I have some .NET4 code that needs to know if/when a network request times out.
Is the following code going to cause a new Thread to be added to the .NET ThreadPool each time a task runs, and then release it when it exits?
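var wait = new Task(() =>
{
    using (var pauseEvent = new ManualResetEvent(false))
        pauseEvent.WaitOne(TimeSpan.FromMilliseconds(delay));
});
// Start() must be called on the antecedent task, not on the continuation returned by ContinueWith.
wait.ContinueWith(action);
wait.Start();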
https://stackoverflow.com/a/15096427/464603 suggests this approach would work, but have performance implications for the general system.
If so, how would you recommend handling a high number of request timeouts per second, probably 1000 timeouts/s when bursting?
In Python I have previously used something like a tornado IOLoop to make sure this isn't heavy on the Kernel / ThreadPool.
I have some .NET4 code that needs to know if/when a network request times out.
The easiest way to do this is to use a timeout right at the API level, e.g., WebRequest.Timeout or CancellationTokenSource.CancelAfter. That way the operation itself will actually stop with an error when the timeout occurs. This is the proper way to do a timeout.
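A minimal sketch of a timeout applied to the operation itself, using CancellationTokenSource.CancelAfter. Note the assumptions: `SlowOperationAsync` is a made-up stand-in for a network call that honours its token, and CancelAfter needs .NET 4.5 or later rather than plain .NET 4.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Hypothetical stand-in for a network request that observes its cancellation token.
    private static Task SlowOperationAsync(CancellationToken token)
    {
        return Task.Delay(5000, token);
    }

    // Returns true if the operation was stopped by the timeout.
    public static async Task<bool> TimesOutAsync(int timeoutMilliseconds)
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.CancelAfter(timeoutMilliseconds); // timer-based; no thread waits for it
            try
            {
                await SlowOperationAsync(cts.Token);
                return false; // finished before the timeout
            }
            catch (OperationCanceledException)
            {
                return true;  // the operation itself was cancelled, not just the wait
            }
        }
    }
}
```

Because the token reaches the operation, the work stops when the timeout fires instead of continuing in the background.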
Doing a timed wait is quite different. (Your code does a timed wait). With a timed wait, it's only the wait that times out; the operation is still going, consuming system resources, and has no idea that it's supposed to stop.
If you must do a timed wait on a WaitHandle like ManualResetEvent, then you can use ThreadPool.RegisterWaitForSingleObject, which allows a thread pool thread to wait for 31 objects at a time instead of just one. However, I would consider this a last-ditch extreme solution, only acceptable if the code simply cannot be modified to use proper timeouts.
P.S. Microsoft.Bcl.Async adds async/await support for .NET 4.
P.P.S. Don't ever use StartNew or ContinueWith without explicitly specifying a scheduler. As I describe on my blog, it's dangerous.
First of all, adding Tasks to the Thread Pool doesn't necessarily cause a new Thread to be added to the Thread Pool. When you add a new Task to the Thread Pool, it is added to an internal queue. Existing Threads from the Thread Pool take Tasks from this queue one by one and execute them. The Thread Pool will start new Threads or stop them as it deems appropriate.
Adding a Task with blocking logic inside will cause Threads from the Thread Pool to block. It means that they won't be able to execute other Tasks from the queue, which will lead to performance issues.
One way to add delay to some action is to use Task.Delay method which internally uses timers.
Task.Delay(delay).ContinueWith(action);
This will not block any Threads from the Thread Pool. After the specified delay, action will be added to the Thread Pool and executed.
You may also use timers directly.
As someone suggested in a comment, you may also use async methods. I believe the following code would be equivalent to your sample.
public async Task ExecuteActionAfterDelay()
{
    await Task.Delay(3000);
    action();
}
You might also want to look at this question Asynchronously wait for Task<T> to complete with timeout.

How does a thread that launches a blocking I/O request under TPL return immediately?

I would like to preface this question with the following:
I'm familiar with the IAsyncStateMachine implementation that the await keyword in C# generates.
My question is not about the basic flow of control that ensues when you use the async and await keywords.
Assumption A
The default threading behaviour in any threading environment, whether it be at the Windows operating system level or in POSIX systems or in the .NET thread pool, has been that when a thread makes a request for an I/O bound operation, say for a disk read, it issues the request to the disk device driver and enters a waiting state. Of course, I am glossing over the details because they are not of moment to our discussion.
Importantly, that thread can do nothing useful until it is unblocked by an interrupt from the device driver notifying it of completion. During this time, the thread remains on the wait queue and cannot be re-used for any other work.
I would first like a confirmation of the above description.
Assumption B
Secondly, even with the introduction of TPL, and its enhancements done in v4.5 of the .NET framework, and with the language level support for asynchronous operations involving tasks, this default behaviour described in Assumption A has not changed.
Question
Then, I'm at a loss trying to reconcile Assumptions A and B with the claim that suddenly emerged in all TPL literature that:
When the, say, main thread, starts this request for this I/O bound
work, it immediately returns and continues executing the rest of
the queued up messages in the message pump.
Well, what makes that thread return back to do other work? Isn't that thread supposed to be in the waiting state in the wait queue?
You might be tempted to reply that the code in the state machine launches the task awaiter and if the awaiter hasn't completed, the main thread returns.
That begs the question -- what thread does the awaiter run on?
And the answer that springs up to mind is: whatever the implementation of the method be, of whose task it is awaiting.
That drives us down the rabbit hole further until we reach the last of such implementations that actually delivers the I/O request.
Where is that part of the source code in the .NET framework that changes this underlying fundamental mechanism about how threads work?
Side Note
With some blocking asynchronous methods such as WebClient.DownloadDataTaskAsync, if one were to follow their code
through their (the method's and not one's own) oral tract into their
intestines, one would see that they ultimately either execute the
download synchronously, blocking the current thread if the operation
was requested to be performed synchronously
(Task.RunSynchronously()), or, if requested asynchronously, they
offload the blocking I/O bound call to a thread pool thread using the
Asynchronous Programming Model (APM) Begin and End methods.
This surely will cause the main thread to return immediately because
it just offloaded blocking I/O work to a thread pool thread, thereby
adding approximately diddlysquat to the application's scalability.
But this was a case where, within the bowels of the beast, the work
was secretly offloaded to a thread pool thread. In the case of an API
that doesn't do that, say an API that looks like this:
public async Task<string> GetDataAsync()
{
    var tcs = new TaskCompletionSource<string>();
    // If GetDataInternalAsync makes the network request
    // on the same thread as the calling thread, it will block, right?
    // How then do they claim that the thread will return immediately?
    // If you look inside the state machine, it just asks the TaskAwaiter
    // if it completed the task, and if it hasn't it registers a continuation
    // and comes back. But that implies that the awaiter is on another thread
    // and that thread is happily sleeping until it gets a kick in the butt
    // from a wait handle, right?
    // So, the only way would be to delegate the making of the request
    // to a thread pool thread, in which case, we have not really improved
    // scalability but only improved responsiveness of the main/UI thread
    var s = await GetDataInternalAsync();
    tcs.SetResult(s); // omitting SetException and
                      // cancellation for the sake of brevity
    return await tcs.Task; // an async method returns the string, not the Task itself
}
Please be gentle with me if my question appears to be nonsensical. The extent of knowledge of things in almost all matters is limited. I am just learning anything.
When you are talking about an async I/O operation, the truth, as pointed out here by Stephen Cleary (http://blog.stephencleary.com/2013/11/there-is-no-thread.html) is that there is no thread. An async I/O operation is completed at a lower level than the threading model. It generally occurs within interrupt handler routines. Therefore, there is no I/O thread handling the request.
You ask how a thread that launches a blocking I/O request returns immediately. The answer is because an I/O request is not at its core actually blocking. You could block a thread such that you are intentionally saying not to do anything else until that I/O request finishes, but it was never the I/O that was blocking, it was the thread deciding to spin (or possibly yield its time slice).
The thread returns immediately because nothing has to sit there polling or querying the I/O operation. That is the core of true asynchronicity. An I/O request is made, and ultimately the completion bubbles up from an ISR. Yes, this may bubble up into the thread pool to set the task completion, but that happens in a nearly imperceptible amount of time. The work itself never had to be run on a thread. The request itself may have been issued from a thread, but as it is an asynchronous request, the thread can immediately return.
Let's forget C# for a moment. Let's say I am writing some embedded code and I request data from a SPI bus. I send the request, continue my main loop, and when the SPI data is ready, an ISR is triggered. My main loop resumes immediately precisely because my request is asynchronous. All it has to do is push some data into a shift register and continue on. When data is ready for me to read back, an interrupt triggers. This is not running on a thread. It may interrupt a thread to complete the ISR, but you could not say that it actually ran on that thread. Just because it's C#, this process is not ultimately any different.
Similarly, let's say I want to transfer data over USB. I place the data in a DMA location, set a flag to tell the bus to transfer my URB, and then immediately return. When I get a response back, it is also moved into memory, and an interrupt occurs and sets a flag to let the system know: hey, here's a packet of data sitting in a buffer for you.
So once again, I/O is never truly blocking. It could appear to block, but that is not what is happening at the low level. It is higher level processes that may decide that an I/O operation has to happen synchronously with some other code. This is not to say of course that I/O is instant. Just that the CPU is not stuck doing work to service the I/O. It COULD block if implemented that way, and this COULD involve threads. But that is not how async I/O is implemented.
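A rough model of this in C#, under loud assumptions: there is no real device here, and a System.Threading.Timer callback plays the role of the completion interrupt. The point is only the shape: the caller issues the request, gets a Task back at once, and the Task is completed later by a callback. No thread is parked waiting for the "I/O" in between.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CompletionDemo
{
    // Hypothetical "device read": returns immediately; a timer callback stands in
    // for the interrupt that signals completion.
    public static Task<string> ReadAsync()
    {
        var tcs = new TaskCompletionSource<string>();

        // With this constructor the timer passes itself as the callback state.
        var timer = new Timer(state =>
        {
            ((Timer)state).Dispose();
            tcs.TrySetResult("data"); // completion "bubbles up" and finishes the task
        });
        timer.Change(100, Timeout.Infinite); // fire once after 100 ms

        return tcs.Task; // nothing is blocked while the request is outstanding
    }
}
```

Awaiting `ReadAsync()` registers a continuation on the returned Task; the continuation runs when the callback calls TrySetResult.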

What happens to the thread when reaching 'await' on 'async' method?

My question as the title suggest is about the background of 'async' and 'await'.
Is it true to say that when the current thread reaches the 'await' keyword, it goes to "sleep",
and wakes up when the awaited method is done?
Thanks!
Guy
Is it true to say that when the current thread reaches the 'await' keyword, it goes to "sleep", and wakes up when the awaited method is done?
No. The whole point of async is to avoid having threads sleeping when they could be doing other work. Additionally, if the thread running the async method is a UI thread, you really don't want it to be sleeping at all - you want it to be available to other events.
When execution reaches an await expression, the generated code will check whether the thing you're awaiting is already available. If it is, you can use it and keep going. Otherwise, it will add a continuation to the "awaitable" part, and return immediately.
The continuation makes sure that the rest of the async method gets run when the awaitable value is ready. Which thread that happens in depends on the context in which you're awaiting - if the async method is running in thread pool threads, the continuation could run on a different thread to the one the method started on... but that shouldn't matter. (The rest of the context will still be propagated.)
Note that it's fine for the async method to return without having completed, because an async method can't return a value directly: it always returns a Task<T> (or Task, or void), and the task returned by the method will only be completed when the async method has really reached the end.
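The sequence described above can be observed with a small sketch. The names are invented, and a TaskCompletionSource stands in for "the thing you're awaiting" so that the demo controls exactly when it becomes ready.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitDemo
{
    private static async Task WorkAsync(Task awaited, List<string> log)
    {
        log.Add("before await");
        await awaited;           // not ready yet: add a continuation and return to the caller
        log.Add("after await");  // runs only once the awaited task has completed
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var gate = new TaskCompletionSource<bool>();

        Task work = WorkAsync(gate.Task, log); // returns at the await with an incomplete Task
        log.Add("caller continues");           // the calling thread did not go to sleep

        gate.SetResult(true); // the awaited value is ready; the continuation gets run
        work.Wait();          // demo only: block until the async method reaches its end
        log.Add("done");
        return log;
    }
}
```

The log shows the caller carrying on between the two halves of the async method, which is the opposite of the thread sleeping at the await.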
async is only syntactic sugar that allows the await keyword to be used.
If async and await are used in ASP.NET Core, then your request thread will be released to the thread pool.
As Stephen Cleary says:
Asynchronous request handlers operate differently. When a request
comes in, ASP.NET takes one of its thread pool threads and assigns it
to that request. This time the request handler will call that external
resource asynchronously. This returns the request thread to the thread
pool until the call to the external resource returns. Figure 3
illustrates the thread pool with two threads while the request is
asynchronously waiting for the external resource.
The important difference is that the request thread has been returned
to the thread pool while the asynchronous call is in progress. While
the thread is in the thread pool, it’s no longer associated with that
request. This time, when the external resource call returns, ASP.NET
takes one of its thread pool threads and reassigns it to that request.
That thread continues processing the request. When the request is
completed, that thread is again returned to the thread pool. Note that
with synchronous handlers, the same thread is used for the lifetime of
the request; with asynchronous handlers, in contrast, different
threads may be assigned to the same request (at different times).
For desktop application:
await releases the current thread, but NOT to the thread pool.
The UI thread doesn't come from the thread pool. If you run an asynchronous method,
e.g. ExecuteScalarAsync, without the async and await keywords, then this method
will run asynchronously no matter what. The calling thread won't be
affected.
Special thanks for nice comments to Panagiotis Kanavos.
E.g. you have a heavy stored procedure that takes 10 minutes to execute. If you run this code from C# without the async and await keywords, then your execution thread will wait for the stored procedure for 10 minutes. This waiting thread will do nothing; it will just wait for the stored procedure.
However, if the async and await keywords are used, then your thread will not wait for the stored procedure. The thread will be free to do other work.
Although this question has already been answered by Jon Skeet who is a highly skilled person (and one of my favorites), it is worth reading the contents that I mention below for other readers of this post.
By using an async keyword on a method, the original asynchronous method creates a state machine instance, initializes it with the captured state (including this pointer if the method is not static), and then starts the execution by calling AsyncTaskMethodBuilder<T>.Start with the state machine instance passed by reference.
As soon as control reaches an await keyword, the current thread (which can be a .NET thread pool worker thread) creates a callback (as a delegate) to execute the rest of the code after the await keyword (the continuation) using the SynchronizationContext/TaskScheduler APIs (a SynchronizationContext may not be present in all applications, such as Console Applications or ASP.NET Core Web Applications). The captured SynchronizationContext is stored in the state machine as an object, the IO work is sent to an IOCP thread, and the current thread is then released.
The IOCP thread binds to an IOCP (IO Completion Port), opens a connection, and asks it to execute the code that has been waited, and the IOCP sends the execution command to the corresponding device (socket/drive).
Whenever the IO work is finished by the relevant device, a signal from the IOCP is returned to the IOCP thread along with the result of the IO work, and then the IOCP thread, based on that captured SynchronizationContext determines which thread of thread pool should process the continuation/callback (that was stored in the state machine).
Also, the following articles can be useful:
https://devblogs.microsoft.com/premier-developer/dissecting-the-async-methods-in-c/
https://tooslowexception.com/net-asyncawait-in-a-single-picture/
https://devblogs.microsoft.com/dotnet/configureawait-faq/#what-is-a-synchronizationcontext
No. The current thread doesn't actually go to sleep; execution continues. That is the whole trick. You may have code that processes data while asynchronous actions are still pending, which means that until those async actions complete, your main thread is free to run and process other data.
As for the other part of the question - async just executes on another thread, not the current one. I believe that CLR is responsible for spinning those threads, so that many async actions are allowed at the same time (i.e. you may be retrieving data asynchronously from different web servers at the same time).
