What are the scalability benefits of async (non-blocking) code? - c#

Blocking threads is considered a bad practice for 2 main reasons:
Threads cost memory.
Threads cost processing time via context switches.
Here are my difficulties with those reasons:
Non-blocking, async code should also cost pretty much the same amount of memory, because the callstack should be saved somewhere right before executing the async call (the context is saved, after all). And if threads are significantly inefficient (memory-wise), why doesn't the OS/CLR offer a more light-weight version of threads (saving only the callstack's context and nothing else)? Wouldn't that be a much cleaner solution to the memory problem than forcing us to re-architect our programs in an asynchronous fashion (which is significantly more complex, harder to understand and maintain)?
When a thread gets blocked, it is put into a waiting state by the OS. The OS won't context-switch to the sleeping thread. Since way over 95% of the thread's life cycle is spent on sleeping (assuming IO-bound apps here), the performance hit should be negligible, since the processing sections of the thread would probably not be pre-empted by the OS because they should run very fast, doing very little work. So performance-wise, I can't see a whole lot of benefit to a non-blocking approach either.
What am I missing here or why are those arguments flawed?

Non-blocking, async code should also cost pretty much the same amount of memory, because the callstack should be saved somewhere right before executing the async call (the context is saved, after all).
The entire call stack is not saved when an await occurs. Why do you believe that the entire call stack needs to be saved? The call stack is the reification of continuation and the continuation of the awaited task is not the continuation of the await. The continuation of the await is on the stack.
Now, it may well be the case that when every asynchronous method in a given call stack has awaited, information equivalent to the call stack has been stored in the continuations of each task. But the memory burden of those continuations is garbage collected heap memory, not a block of a million bytes of committed stack memory. The continuation state size is order n in the size of the number of tasks; the burden of a thread is a million bytes whether you use it or not.
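To make that concrete, here is a minimal sketch (the method and parameter names are illustrative, not from the question): the local below survives the await because the compiler lifts it into a small state-machine object on the GC heap, so what is kept alive is proportional to what the method actually needs, not a megabyte of reserved stack per pending operation.

    using System.Net.Http;
    using System.Threading.Tasks;

    static async Task<int> CountBytesAsync(HttpClient client, string url)
    {
        // The compiler rewrites this method into a state machine. At the await,
        // 'client', 'url' and the continuation are captured in a heap-allocated
        // object; no thread and no 1MB stack is held while the download is in flight.
        byte[] data = await client.GetByteArrayAsync(url);
        return data.Length;
    }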
if threads are significantly inefficient (memory-wise), why doesn't the OS/CLR offer a more light-weight version of threads
The OS does. It offers fibers. Of course, fibers still have a stack, so that's maybe not better. You could have a thread with a small stack I suppose.
Wouldn't it be a much cleaner solution to the memory problem, instead of forcing us to re-architecture our programs in an asynchronous fashion
Suppose we made threads -- or for that matter, processes -- much cheaper. That still doesn't solve the problem of synchronizing access to shared memory.
For what it's worth, I think it would be great if processes were lighter weight. They're not.
Moreover, the question somewhat contradicts itself. You're doing work with threads, so you are already willing to take on the burden of managing asynchronous operations. A given thread must be able to tell another thread when it has produced the result that the first thread asked for. Threading already implies asynchrony, but asynchrony does not imply threading. Having an async architecture built in to the language, runtime and type system only benefits people who have the misfortune to have to write code that manages threads.
Since way over 95% of the thread's life cycle is spent on sleeping (assuming IO-bound apps here), the performance hit should be negligible, since the processing sections of the thread would probably not be pre-empted by the OS because they should run very fast, doing very little work.
Why would you hire a worker (thread) and pay their salary to sit by the mailbox (sleeping the thread) waiting for the mail to arrive (handling an IO message)? IO interrupts don't need a thread in the first place. IO interrupts exist in a world below the level of threads.
Don't hire a thread to wait on IO; let the operating system handle asynchronous IO operations. Hire threads to do insanely huge amounts of high latency CPU processing, and then assign one thread to each CPU you own.
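A small sketch of that difference for a file read (File.ReadAllBytesAsync is available on .NET Core 2.0 and later): the first call parks a whole thread until the disk answers, the second hands the operation to the OS and keeps no thread waiting.

    using System.IO;
    using System.Threading.Tasks;

    // Blocking: the calling thread sleeps inside the OS until the read completes.
    static byte[] ReadBlocking(string path) => File.ReadAllBytes(path);

    // Non-blocking: the read is issued to the OS, no thread waits on it, and the
    // continuation of whoever awaits the task runs on a pool thread afterwards.
    static Task<byte[]> ReadNonBlocking(string path) => File.ReadAllBytesAsync(path);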
Now we come to your question:
What are the benefits of async (non-blocking) code?
Not blocking the UI thread
Making it easier to write programs that live in a world with high latency
Making more efficient use of limited CPU resources
But let me rephrase the question using an analogy. You're running a delivery company. There are many orders coming in, many deliveries going out, and you cannot tell a customer that you will not take their delivery until every delivery before theirs is completed. Which is better:
hire fifty guys to take calls, pick up packages, schedule deliveries, and deliver packages, and then require that 46 of them be idle at all times or
hire four guys and make each of them really good at first, doing a little bit of work at a time, so that they are always responsive to customer requests, and second, really good at keeping a to-do list of jobs they need to do in the future
The latter seems like a better deal to me.

You are mixing up multithreading and async concepts here.
Both your "difficulties" come from the assumption that each async method gets assigned a dedicated thread on which it does the work. However, the state of affairs is quite the opposite: each time an async operation needs to be executed, the CLR picks an idle (thus already created) thread from the threadpool and executes that method on the selected thread.
The core concept here is that async doesn't mean always creating new threads, it means scheduling the execution on existing threads so that no thread is sitting idle.
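A rough sketch of that behaviour (Task.Delay stands in here for real IO; ThreadPool.ThreadCount requires .NET Core 3.0 or later): a thousand concurrent operations in flight do not translate into a thousand threads.

    using System;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    static async Task Demo()
    {
        // 1000 concurrent "requests", each awaiting simulated IO. While they are
        // all waiting, no pool thread is held on their behalf.
        var requests = Enumerable.Range(0, 1000).Select(async i =>
        {
            await Task.Delay(1000);        // stands in for a database or HTTP call
            return i;
        });

        await Task.WhenAll(requests);

        Console.WriteLine(ThreadPool.ThreadCount);   // typically a handful, not ~1000
    }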

Related

Is Blocking code really expensive on modern systems?

I'm trying to grasp a bit better the concepts of async programming (mostly for C#) and blocking/non-blocking code.
In C#, if I call .Wait() on a Task, is it always considered "blocking"?
I understand that the current thread will be blocked. However, the thread is put in a "waiting" state (AFAIK), and AFAIK it will never be scheduled by the OS until it is woken up when the Task completes (I assume the thread is woken up by kernel magic).
In that case, the CPU time taken by this blocking operation should be negligible during the waiting period. Is it indeed the case?
So where does the advantage of async programming come from? Is it because it allows you to go beyond the 1000 or so threads that the OS wouldn't otherwise allow? Is it because the memory overhead per async task is lower than the overhead of a thread?
Keep in mind that the "event loop" that manages all the tasks in an async context also has work to do to manage the scheduling of all the async tasks, bookkeeping, etc. Is it really less work than what the kernel has to do in the blocking case to manage threads?
Wait() will block your thread the same as calling a non-async I/O.
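A minimal sketch of that equivalence (the class and method names are made up for illustration):

    using System.Net.Http;
    using System.Threading.Tasks;

    class PageFetcher
    {
        static readonly HttpClient Client = new HttpClient();

        // Blocking: .Result (like .Wait()) parks the calling thread until the response arrives.
        public static string FetchBlocking(string url) => Client.GetStringAsync(url).Result;

        // Non-blocking: the thread is released at the await and can serve other work;
        // the remainder of the method runs once the response arrives.
        public static async Task<string> FetchAsync(string url) => await Client.GetStringAsync(url);
    }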
Blocking is not inherently inefficient. In fact, it can be more performant if you have a process that will have very few threads. Windows' scheduler actually has some interesting special designs for I/O-blocked threads which you can read about in the Windows Internals books, such as boosting a thread to front of the line if it's been waiting on an I/O for a long time.
However, it doesn't scale. Every thread you create has overhead: memory for stack and register space, thread-local storage used by your app and inside of .NET, cache thrashing caused by all the extra memory needed, context switching, and so on. It's generally not going to be an efficient use of resources especially when each thread will spend a majority of its time blocked.
Async takes advantage of the fact that conceptually we don't really need everything a thread has to offer -- we only want concurrency, so we can make more domain-relevant optimizations in how we use our resources.
It rarely hurts a project to be async by default. If your app doesn't need to be hyper-optimized for scalability, it won't hurt or help you. If your app does, then it'll be a huge help. Things like async/await can just help you model your concurrency better, so regardless of your perf goals it can be useful.
Async I/O is moving towards an even cooler place: I/O APIs like Windows RIO and Linux's io_uring allow you to do I/O without even context switching. Currently .NET does not take advantage of these things, but PipeWriter and PipeReader were built with it in mind for the future.

Are there advantages to asynchronous code on dedicated backend servers with no UI thread?

I had a developer challenge the use of asynchronous server side code the other day. He asked why asynchronous code is superior to synchronous code on a server with no UI thread to block? I gave him the typical thread exhaustion answer but after thinking about it for a while I was no longer sure my answer was correct. After doing a little research I found that the upper limit to threads in an OS is governed by memory not an arbitrary number. And servers like Kestrel support unlimited threads. So in "theory" the number of requests (threads) a server can block on in parallel is governed by memory. Which is no different than async code in .NET; it lifts stack variables to the heap but it's still memory bound.
I've always assumed that smarter people than me had thought this through and async code was the right way to handle IO bound code. But what are the measurable advantages of async .NET code when running in a dedicated server farm with no UI thread? Does a move to the cloud (AWS) change the answer?
The purpose of server-side asynchronous code is completely different from that of asynchronous UI code.
Asynchronous UI code makes the UI more responsive (especially when multiple CPU cores are available); it allows multiple UI tasks to run in parallel, which improves the user experience.
The purpose of server-side asynchronous code on the other hand is to minimise the resources necessary to serve multiple clients simultaneously. In fact it is beneficial even if there is only one CPU core or a single-threaded event loop like in Node.js. And it all boils down to a simple concept of
Asynchronous IO.
The difference between synchronous and asynchronous IO is that with the former, the thread that initiates an IO operation is paused until the operation completes (e.g. until the DB request has executed or the file on disk has been read), and the same thread is then un-paused to process the result. Note: even though the paused thread is most likely not using any CPU (it has probably been put to sleep by the thread scheduler), its resources are still tied to that particular IO operation and are essentially wasted while the hardware does the work. Effectively, with synchronous IO you need at least one thread per client request currently being processed, even though most of those threads are probably asleep waiting for their IO to complete. In .NET each thread has at least 1MB of stack allocated, so if the server is currently processing, say, 1000 requests, that is almost 1GB of memory allocated purely for thread stacks, plus an additional burden on the thread scheduler and more CPU time spent on context switches: the more threads there are, the slower the overall system. More memory allocated also means less efficient use of memory/CPU caches.
Asynchronous IO is more efficient because a worker thread only initiates an IO operation; instead of waiting for it to complete, the thread is immediately switched to another useful task (e.g. the continuation of another client's request), and when the hardware completes the IO, processing of the result resumes on any available worker thread. As a result, depending on the ratio between the time spent waiting for hardware to complete IO and the time spent on CPU work (e.g. serialising the result of an IO operation into JSON), this approach can use fewer threads to serve the same number of simultaneous client requests: if, say, 90% of the time is spent in IO, we can potentially use only 100 threads to serve the same 1000 simultaneous requests. The more your server-side code is IO-bound rather than CPU-bound, the more simultaneous client requests it can process with a given amount of resources: CPU and memory.
What is the drawback of asynchronous code? Mainly that it is generally harder to write than synchronous code. Asynchronous code uses callbacks to resume operation, so instead of simple linear code a programmer needs to pass a delegate (a continuation) to the IO method, which is later called by the system when the IO operation completes (potentially on a different thread). However, modern C# with its async/await facilities makes this much less complicated and even makes asynchronous code look almost like synchronous code. The only thing to remember: asynchronous code only pays off when it is asynchronous "all the way down": even a single Task.Wait or Task.Result somewhere in the stack of calls between the initial HTTP request processing and the DB call makes the whole chain synchronous, forcing the current worker thread to wait for that Wait call to finish and defeating the purpose. Note: await in C# does not actually wait for the result of the call; the compiler converts it into something like a ContinueWith, i.e. a continuation callback (in practice it is a bit more complicated than that, but the complexity is hidden from the programmer), so nowadays writing efficient asynchronous code is a relatively straightforward task.
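A minimal sketch of the "all the way down" point (IOrderRepository, Order and the method names are made-up): the first method throws away the asynchrony of the data layer by blocking on it, while the second keeps the chain asynchronous.

    using System.Threading.Tasks;

    public record Order(int Id);
    public interface IOrderRepository { Task<Order> FindOrderAsync(int id); }

    public class OrdersController
    {
        private readonly IOrderRepository _repository;
        public OrdersController(IOrderRepository repository) => _repository = repository;

        // Sync-over-async: the request thread is parked at .Result, so the
        // asynchronous database call underneath buys nothing.
        public Order GetOrderBlocking(int id) => _repository.FindOrderAsync(id).Result;

        // Asynchronous all the way down: the thread is released at the await and a
        // pool thread picks the request back up when the data arrives.
        public async Task<Order> GetOrderAsync(int id) => await _repository.FindOrderAsync(id);
    }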

Conditions to use async-methods in c# .net-core web-apis

I'm implementing several small services, each of which uses entity-framework to store certain (but little) data. They also have a fair bit of business-logic so it makes sense to separate them from one another.
I'm certainly aware that async-methods and the async-await pattern itself can solve many problems in regards to performance especially when it comes to any I/O or cpu-intensive operations.
I'm uncertain whether to use the async-methods of entity-framework logic (e.g. SaveChangesAsync or FirstOrDefaultAsync) because I can't find metrics that say "now you do it, and now you don't" other than "Is it I/O or CPU-intensive or not?".
What I've found when researching this topic (not limited to this but these are showing the problem):
not using it can lead to your application becoming unresponsive, because the threads (not CPU threads, but OS threads) can run out due to the blocking I/O calls to the database.
using it bloats your code and decreases performance because of the context-switches at every method. Especially when I apply this to entity-framework calls, it means that I have at least three context switches for one call, from controller to business-logic to the repository to the database.
What I don't know, and that's what I would like to know from you:
How many virtual OS threads are there? Or to be more precise: if I expect my application and server to be able to handle 100 requests to this service within five seconds (and I don't expect more; 100 is already exaggerated), should I back away from using async/await there?
What are the precise metrics that I could look at to answer this question for any of my services?
Or should I rather always use async-methods for I/O calls because they are already there and it could always happen that the load-situation on my server changes and there's so much going on that the async-methods would help me a great deal with that?
I'm certainly aware that async-methods and the async-await pattern itself can solve many problems in regards to performance especially when it comes to any I/O or cpu-intensive operations.
Sort of. The primary benefit of asynchronous code is that it frees up threads. UI apps (i.e., desktop/mobile) manifest this benefit in more responsive user interfaces. Services such as the ones you're writing manifest this benefit in better scalability - the performance benefits are only visible when under load. Also, services only receive this benefit from I/O operations; CPU-bound operations require a thread no matter what, so using await Task.Run on service applications doesn't help at all.
not using it can lead to your application becoming unresponsive, because the threads (not CPU threads, but OS threads) can run out due to the blocking I/O calls to the database.
Yes. More specifically, the thread pool has a limited injection rate, so it can only grow so far so quickly. Asynchrony (freeing up threads) helps your service handle bursty traffic and heavy load. Quote:
Bear in mind that asynchronous code does not replace the thread pool. This isn’t thread pool or asynchronous code; it’s thread pool and asynchronous code. Asynchronous code allows your application to make optimum use of the thread pool. It takes the existing thread pool and turns it up to 11.
Next question:
using it bloats your code and decreases performance because of the context-switches at every method.
The main performance drawback to async is usually memory related. There are additional structures that need to be allocated to keep track of ongoing asynchronous work. In the synchronous world, the thread stack itself holds this information.
What I don't know, and that's what I would like to know from you: [when should I use async?]
Generally speaking, you should use async for any new code doing I/O-based operations (including all EF operations). The metrics-based arguments are more about cost/benefit analysis of converting to async - i.e., given an existing old synchronous codebase, at what point is it worth investing the time to convert it to async.
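As a rough sketch of what that looks like with EF Core (Blog, BlogContext and BlogService are illustrative names; only FirstOrDefaultAsync and SaveChangesAsync come from EF itself):

    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    public class Blog { public int Id { get; set; } public string Name { get; set; } = ""; }
    public class BlogContext : DbContext { public DbSet<Blog> Blogs => Set<Blog>(); }

    public class BlogService
    {
        private readonly BlogContext _db;
        public BlogService(BlogContext db) => _db = db;

        public async Task RenameAsync(int id, string newName)
        {
            // Both calls go to the database, so both are awaited rather than blocked on.
            var blog = await _db.Blogs.FirstOrDefaultAsync(b => b.Id == id);
            if (blog is null) return;

            blog.Name = newName;
            await _db.SaveChangesAsync();
        }
    }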
TLDR: Should I use async? YES!
You seem to have fallen for the most common mistake when trying to understand async/await. Async is orthogonal to multi-threading.
To answer your question, when should you use the async method?
    if (currentContext.IsAsync && method.HasAsyncVersion)
        return UseAsync.Yes;
    else
        return UseAsync.No;
That above is the short version.
Async/Await actually solves a few problems
Unblock UI thread
M:N threading
Multithreaded scheduling and synchronization
Interrupt/Event-based asynchronous scheduling
Given the large number of different use cases for async/await, the "assumptions" you state only apply to certain cases.
For example, context switching only happens with multi-threading. Single-threaded, interrupt-based async actually reduces context switching by reducing blocking times and keeping the OS thread well fed with work.
Finally, your question on OS threads is fundamentally wrong.
Firstly, OS threads each require the creation of a stack (a contiguous reservation of up to several MB; 100 threads with 4MB stacks means 400MB of address space set aside before any work is even done).
Secondly, unless you have 100 physical cores in your PC, your CPUs will have to context switch between the OS threads, stalling while each thread's state is loaded. By using M:N threading, you can keep the CPU running by reducing the number of OS threads and instead using green threads (Task in dotnet).
Thirdly, not all "await" results in "async" behavior. Tasks are able to synchronously return, short-circuiting all of the "bloat".
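One way that short-circuiting shows up in practice is a cached fast path; the sketch below uses ValueTask, which is my choice of illustration rather than something this answer prescribes.

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    public class PriceCache
    {
        private readonly ConcurrentDictionary<string, decimal> _cache = new();

        // When the symbol is already cached, the call completes synchronously:
        // nothing is scheduled and no continuation runs later.
        public ValueTask<decimal> GetPriceAsync(string symbol)
        {
            if (_cache.TryGetValue(symbol, out var price))
                return new ValueTask<decimal>(price);           // synchronous fast path

            return new ValueTask<decimal>(FetchAndCacheAsync(symbol));
        }

        private async Task<decimal> FetchAndCacheAsync(string symbol)
        {
            await Task.Delay(100);                              // stands in for a real IO call
            return _cache[symbol] = 42m;
        }
    }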
In short, without digging really deep, it is hard to find optimization opportunities by switching from async to sync methods.

What resources do blocked threads take up

One of the main purposes of writing code in the asynchronous programming model (more specifically - using callbacks instead of blocking the thread) is to minimize the number of blocking threads in the system.
For running threads, this goal is obvious, because of context switches and synchronization costs.
But what about blocked threads? why is it so important to reduce their number?
For example, when waiting for a response from a web server, a thread is blocked, doesn't take up any CPU time, and does not participate in any context switches.
So my question is:
other than RAM (about 1MB per thread?), what other resources do blocked threads take up?
And another more subjective question:
In what cases will this cost really justify the hassle of writing asynchronous code (the price could be, for example, splitting your nice coherent method into lots of BeginXXX and EndXXX methods, and moving parameters and local variables to class fields)?
UPDATE - additional reasons I didn't mention or didn't give enough weight to:
More threads means more locking on communal resources
More threads means more creation and disposing of threads which is expensive
The system can definitely run-out of threads/RAM and then stop servicing clients (in a web server scenario this can actually bring down the service)
So my question is: other than RAM (about 1MB per thread?), what other resources do blocked threads take up?
This is one of the largest ones. That being said, there's a reason that the ThreadPool in .NET allows so many threads per core - in 3.5 the default was 250 worker threads per core in the system. (In .NET 4, it depends on system information such as virtual address size, platform, etc. - there isn't a fixed default now.) Threads, especially blocked threads, really aren't that expensive...
However, I would say, from a code management standpoint, it's worth reducing the number of blocked threads. Every blocked thread is an operation that should, at some point, return and become unblocked. Having many of these means you have quite a complicated set of code to manage. Keeping this number reduced will help keep the code base simpler - and more maintainable.
And another more subjective question: In what cases will this cost really justify the hassle of writing asynchronous code (the price could be, for example, splitting your nice coherent method into lots of BeginXXX and EndXXX methods, and moving parameters and local variables to class fields)?
Right now, it's often a pain. It depends a lot on the scenario. The Task<T> class in .NET 4 dramatically improves this for many scenarios, however. Using the TPL, it's much less painful than it was previously using the APM (BeginXXX/EndXXX) or even the EAP.
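For example, an old Begin/End (APM) pair can be wrapped in a Task with Task.Factory.FromAsync so callers simply await it; a minimal sketch using Stream.BeginRead/EndRead:

    using System.IO;
    using System.Threading.Tasks;

    static class ApmBridge
    {
        // Wraps the classic APM pair in a Task<int>, so callers can 'await' it
        // instead of wiring up BeginXXX/EndXXX callbacks by hand.
        public static Task<int> ReadAsync(Stream stream, byte[] buffer) =>
            Task<int>.Factory.FromAsync(stream.BeginRead, stream.EndRead,
                                        buffer, 0, buffer.Length, null);
    }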
This is why the language designers are putting so much effort into improving this situation in the future. Their goals are to make async code much simpler to write, in order to allow it to be used more frequently.
Besides any resources the blocked thread might hold a lock on, thread pool size is also a consideration. If you have reached the maximum thread pool size (if I recall correctly, for .NET 4 the max thread count is 100 per CPU) you simply won't be able to get anything else to run on the thread pool until at least one thread gets freed up.
I would like to point out that the 1MB figure for stack memory (or 256KB, or whatever it's set to) is a reserve; while it does take away from available address space, the actual memory is only committed as it's needed.
On the other hand, having a very large number of threads is bound to bog down the task scheduler somewhat as it has to keep track of them (which have become runnable since the last tick, and so on).

Why is the .NET ThreadPool used only for short time span tasks?

I've read in many places that the .NET ThreadPool is meant for short time span tasks (maybe no more than 3 seconds). In all these mentions I've not found a concrete reason why it should not be used.
Some people even said that it leads to nasty results if we use it for long-running tasks, and can also lead to deadlocks.
Can somebody explain in plain English, with the technical reasons, why we should not use the thread pool for long time span tasks?
To be specific, I would even like to give a scenario and know why the ThreadPool should not be used in it, with the proper reasons behind it.
Scenario: I need to process some thousands of users' data. Each user's data is retrieved from a local database; using that information I need to connect to an API hosted at some other location, and the response from the API will be stored in the local database after processing.
Can someone explain the pitfalls in this scenario if I use the ThreadPool with a thread limit of 20? Processing time for each user may range from 3 seconds to 1 minute (or more).
The point of the threadpool is to avoid the situation where the time spent creating the thread is longer than the time spent using it. By reusing existing threads, we get to avoid that overhead.
The downside is that the threadpool is a shared resource: if you're using a thread, something else can't. So if you have lots of long-running tasks, you could end up with thread-pool starvation, possibly even leading to deadlock.
Don't forget that your application's code may not be the only code using the thread pool... the system code uses it a lot too.
It sounds like you might want to have your own producer/consumer queue, with a small number of threads processing it. Alternatively, if you could talk to your other service using an asynchronous API, you may find that each bit of processing on your computer would be short-lived.
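A rough sketch of that producer/consumer shape (UserWorkItem, the worker count of 4, and the Process body are all illustrative): a few dedicated long-running workers drain the queue, and the thread pool is left alone for short-lived work.

    using System.Collections.Concurrent;
    using System.Linq;
    using System.Threading.Tasks;

    public record UserWorkItem(int UserId);

    public class UserProcessor
    {
        private readonly BlockingCollection<UserWorkItem> _queue = new();

        // A handful of dedicated (long-running) workers drain the queue, so the
        // multi-second jobs never occupy ordinary thread-pool threads.
        public Task[] Start() =>
            Enumerable.Range(0, 4).Select(_ =>
                Task.Factory.StartNew(() =>
                {
                    foreach (var item in _queue.GetConsumingEnumerable())
                        Process(item);                 // 3 seconds to 1 minute each
                }, TaskCreationOptions.LongRunning)).ToArray();

        public void Add(UserWorkItem item) => _queue.Add(item);
        public void Complete() => _queue.CompleteAdding();

        private void Process(UserWorkItem item) { /* local DB read + remote API call + save */ }
    }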
It is related to the way the threadpool scheduler works. It tries hard to ensure that it won't release more waiting threads than you have CPU cores. Which is a good idea: running more threads than cores is wasteful, as Windows spends time switching context between threads, making the overall time needed to complete the jobs longer.
As soon as a TP thread completes, another one is allowed to run. Two times per second, the TP scheduler steps in when the running threads do not complete. It cannot tell why these threads are taking so much time to get their job done. Half a second is a lot of CPU cycles, a cool billion or so. It therefore assumes that the threads are blocking, waiting for some kind of I/O to complete. Like a dbase query, a disk read, a socket connection attempt, stuff like that.
And it allows another thread to run. You've now got more threads than you have cores. Which isn't really a problem if those original threads are indeed blocking; they're not consuming any CPU cycles.
You can see where this leads: if your thread runs for 3 seconds then it's creating a bit of a logjam. It delays, but won't block, other TP threads that are waiting to run. If your thread needs to spend so much time because it is constantly blocking then you are better off creating a regular Thread. And if you really care that the thread does not get delayed by the TP scheduler then you should use a Thread as well.
The TP scheduler was tinkered with in .NET 4.0 btw; what I wrote is really only true for earlier releases. The basics are still there, it just uses a smarter scheduling algorithm, scheduling dynamically based on feedback by measuring throughput. This really only matters if you have a lot of TP threads going.
Two reasons not really touched upon:
The threadpool is used as the normal means of handling I/O callback functions, which are usually supposed to happen very soon after associated I/O operation completes. In general, timeliness is more important with short tasks than long ones, but long-running tasks in the threadpool will delay the execution of notification tasks which could have (and should have) started up, run, and completed quickly.
If a threadpool task becomes blocked until such time as some other threadpool task runs, it may hog a threadpool thread, thus delaying or in some cases blocking altogether the start of that other task (or any others).
Generally, having a threadpool thread acquire a lock (waiting if necessary) isn't a problem. If it's necessary for one threadpool thread to wait for another threadpool thread to release a lock, the fact that the latter thread acquired the lock in the first place implies that it got started. On the other hand, waiting for e.g. some data to arrive from a connection may cause deadlock if an I/O callback routine is used to flag the arrival of data. If too many threadpool threads are waiting for the I/O callback to signal that data has arrived, the system may decide to defer the callback until one of the threadpool threads completes.
