The ThreadPool recycles threads, which gives better performance than creating many instances of the Thread class. However, how does this apply when the methods the ThreadPool runs contain while loops?
As an example, if we were to assign a ThreadPool thread to a client that has connected to a TCP server, that client would need a while loop to keep checking for incoming data. The loop can be exited to disconnect the client, but only when the server shuts down or the client requests a disconnection.
If that is the case, then how would having a ThreadPool help when masses of clients connect? Either way the same amount of memory is used if the clients stay connected. If they stay connected, then the threads cannot be recycled. If so, then ThreadPool would not help much until a client disconnects and opens up a thread to recycle.
On the other hand, it was suggested to me to use the NetworkStream.BeginRead and NetworkStream.EndRead asynchronous methods to avoid threads altogether and save RAM and CPU usage. Is this true or not?
Either way the same amount of memory is used if the clients stay connected.
So far this is true. It's up to your app to decide how much state it needs to keep per client.
If they stay connected, then the threads cannot be recycled. If so, then ThreadPool would not help much until a client disconnects and opens up a thread to recycle.
This is untrue, because it assumes that all interesting operations performed by these threads are synchronous. This is a naive mode of operation, and in fact real world code is asynchronous: a thread makes a call to request an action and is then free to do other things. When a result is made available as a result of that action, some thread looking for other things to do will run the code that acts on the result.
On the other hand, it was suggested to me to use the NetworkStream.BeginRead and NetworkStream.EndRead asynchronous methods to avoid threads altogether and save RAM and CPU usage. Is this true or not?
As explained above, async methods like these will allow you to service a potentially very large number of clients with only a small number of worker threads -- but by itself it will do nothing to either help or hurt the memory situation.
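As an illustration, here is a minimal sketch of that idea using NetworkStream.ReadAsync (the await-based equivalent of the Begin/End pair). HandleClientAsync and the buffer handling are invented for the example, not part of the question:

using System.Net.Sockets;
using System.Threading.Tasks;

// Sketch only: one receive loop per connected client, but no thread is blocked
// while waiting for data; a pool thread is borrowed only when bytes actually arrive.
async Task HandleClientAsync(TcpClient client)
{
    var buffer = new byte[4096];
    using (NetworkStream stream = client.GetStream())
    {
        while (true)
        {
            int read = await stream.ReadAsync(buffer, 0, buffer.Length);
            if (read == 0)
                break;                 // remote side closed the connection
            // ... hand buffer[0..read] to whatever processes incoming data ...
        }
    }
}

Many clients can each be "inside" this loop at once while only a handful of worker threads do any actual work.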
You are correct. Slow blocking code can cause poor performance on both the client side and the server side. You can run slow work on a separate thread, and that might work well enough on the client side, but it may not help on the server side. Blocking methods can diminish a server's overall performance because they can lead to a situation where the server has a large number of threads running, all of them blocked, so even a simple request may end up taking a long time. It is better to use asynchronous APIs for slow-running tasks when they are available, as in your situation. (Note: even if an asynchronous operation is not available, you can implement one yourself, for example with a custom awaiter class.) This is better for clients as well as servers. The main point of asynchronous code is to reduce the number of threads: reducing the number of threads needed to handle a given number of clients lets the server keep more requests in progress simultaneously, which improves scalability.
If you don't need fine-grained control over the threads or the thread pool, you can go with the asynchronous approach.
Also, each thread reserves about 1 MB of address space for its stack. So asynchronous methods will definitely help reduce memory usage. However, I think the nature of the work you have described here is going to take pretty much the same amount of time with the multi-threaded approach as with the asynchronous one.
I'm implementing several small services, each of which uses entity-framework to store certain (but little) data. They also have a fair bit of business-logic so it makes sense to separate them from one another.
I'm certainly aware that async methods and the async/await pattern itself can solve many problems with regard to performance, especially when it comes to I/O- or CPU-intensive operations.
I'm uncertain whether to use the async methods for the entity-framework logic (e.g. SaveChangesAsync or FirstOrDefaultAsync) because I can't find metrics that say "now you do it, and now you don't", apart from "Is it I/O- or CPU-intensive or not?".
What I've found when researching this topic (not limited to this but these are showing the problem):
Not using it can lead to your application becoming unresponsive, because the threads (not the CPU's hardware threads, but the OS's virtual threads) can run out due to the blocking I/O calls to the database.
Using it bloats your code and decreases performance because of the context switches at every method. Especially when I apply this to entity-framework calls, it means at least three context switches for one call: from the controller to the business logic to the repository to the database.
What I don't know, and that's what I would like to know from you:
How many virtual OS threads are there? Or to be more precise: if I expect my application and server to handle 100 requests to this service within five seconds (and I don't expect more; 100 is already exaggerated), should I back away from using async/await there?
What are the precise metrics that I could look at to answer this question for any of my services?
Or should I always use async methods for I/O calls, because they are already there and the load situation on my server could always change to the point where the async methods would help a great deal?
I'm certainly aware that async methods and the async/await pattern itself can solve many problems with regard to performance, especially when it comes to I/O- or CPU-intensive operations.
Sort of. The primary benefit of asynchronous code is that it frees up threads. UI apps (i.e., desktop/mobile) manifest this benefit in more responsive user interfaces. Services such as the ones you're writing manifest this benefit in better scalability - the performance benefits are only visible when under load. Also, services only receive this benefit from I/O operations; CPU-bound operations require a thread no matter what, so using await Task.Run on service applications doesn't help at all.
Not using it can lead to your application becoming unresponsive, because the threads (not the CPU's hardware threads, but the OS's virtual threads) can run out due to the blocking I/O calls to the database.
Yes. More specifically, the thread pool has a limited injection rate, so it can only grow so far so quickly. Asynchrony (freeing up threads) helps your service handle bursty traffic and heavy load. Quote:
Bear in mind that asynchronous code does not replace the thread pool. This isn’t thread pool or asynchronous code; it’s thread pool and asynchronous code. Asynchronous code allows your application to make optimum use of the thread pool. It takes the existing thread pool and turns it up to 11.
Next question:
Using it bloats your code and decreases performance because of the context switches at every method.
The main performance drawback to async is usually memory related. There's additional structures that need to be allocated to keep track of ongoing asynchronous work. In the synchronous world, the thread stack itself has this information.
What I don't know, and that's what I would like to know from you: [when should I use async?]
Generally speaking, you should use async for any new code doing I/O-based operations (including all EF operations). The metrics-based arguments are more about cost/benefit analysis of converting to async - i.e., given an existing old synchronous codebase, at what point is it worth investing the time to convert it to async.
TLDR: Should I use async? YES!
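For concreteness, here is a minimal sketch of what that looks like with the EF methods named in the question (SaveChangesAsync and FirstOrDefaultAsync); the _db context, the Order entity, and the MarkShippedAsync name are hypothetical:

// Sketch only: the thread is released during both database round-trips instead
// of being blocked in the repository while EF talks to the database.
// Requires using Microsoft.EntityFrameworkCore; (or System.Data.Entity for EF6).
public async Task<Order> MarkShippedAsync(int orderId)
{
    var order = await _db.Orders.FirstOrDefaultAsync(o => o.Id == orderId);
    if (order == null)
        return null;

    order.Shipped = true;
    await _db.SaveChangesAsync();
    return order;
}

The pattern only pays off if the callers await it all the way up to the controller action; a sync-over-async call anywhere in the chain gives the blocking back.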
You seem to have fallen for the most common mistake when trying to understand async/await. Async is orthogonal to multi-threading.
To answer your question: when should you use the async method?
if (currentContext.IsAsync && method.HasAsyncVersion)
    return UseAsync.Yes;
else
    return UseAsync.No;
That above is the short version.
Async/Await actually solves a few problems
Unblock UI thread
M:N threading
Multithreaded scheduling and synchronization
Interrupt/event-based asynchronous scheduling
Given the large number of different use cases for async/await, the "assumptions" you state only apply to certain cases.
For example, context switching only happens with multi-threading. Single-threaded, interrupt-based async actually reduces context switching by reducing blocking time and keeping the OS thread well fed with work.
Finally, your question about OS threads is fundamentally wrong.
Firstly, each OS thread requires its own stack (by default about 1 MB of contiguous virtual address space is reserved per thread in .NET, so 100 threads claim roughly 100 MB of address space before any work is even done).
Secondly, unless you have 100 physical cores in your PC, the CPUs will have to context-switch between those OS threads, stalling while each thread's state is loaded. By using M:N threading, you can keep the CPUs busy by reducing the number of OS threads and using green threads instead (Task in .NET).
Thirdly, not all "await" results in "async" behavior. Tasks are able to synchronously return, short-circuiting all of the "bloat".
In short, without digging really deep, it is hard to find optimization opportunities by switching from async to sync methods.
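To make the third point concrete, here is a hedged sketch (the cache field and LoadFromStoreAsync are invented names): when the value is already cached, the method hands back an already-completed task, the await in the caller short-circuits, and no continuation is ever scheduled.

private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

public async Task<string> GetValueAsync(string key)
{
    // Hot path: completes synchronously, no thread switch, no scheduled continuation.
    if (_cache.TryGetValue(key, out var cached))
        return cached;

    // Cold path: genuinely asynchronous I/O (LoadFromStoreAsync is hypothetical).
    var value = await LoadFromStoreAsync(key);
    _cache[key] = value;
    return value;
}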
Everybody knows that asynchrony gives you "better throughput", "scalability", and more efficient resource consumption. I also thought this (simplistic) way before doing the experiment below. It basically shows that if we take into account all the overhead of asynchronous code and compare it against properly configured synchronous code, asynchrony yields little to no advantage in performance, throughput, or resource consumption.
The question: Does asynchronous code actually perform so much better compared to synchronous code with a correctly configured thread pool? Maybe my performance tests are flawed in some dramatic way?
Test setup: two ASP.NET Web API methods, with JMeter calling them from a 200-thread thread group (30-second ramp-up time).
[HttpGet]
[Route("async")]
public async Task<string> AsyncTest()
{
    await Task.Delay(_delayMs);
    return "ok";
}

[HttpGet]
[Route("sync")]
public string SyncTest()
{
    Thread.Sleep(_delayMs);
    return "ok";
}
Here is the response time (log scale). Notice how the synchronous code becomes faster once the Thread Pool has injected enough threads. If we were to set up the Thread Pool beforehand (via SetMinThreads), it would outperform async right from the start.
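For reference, pre-warming the pool that way is a single call; 200 here only mirrors this test's 200 JMeter users, it is not a general recommendation:

// Ask the pool to keep at least 200 worker and 200 I/O-completion threads ready,
// so the synchronous version is not throttled by the pool's slow injection rate.
ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);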
What about resource consumption, you ask? "Threads have a big cost in terms of CPU scheduling, context switching, and RAM footprint." Not so fast. Thread scheduling and context switching are efficient. As for stack usage, a thread does not instantly consume RAM; it merely reserves virtual address space and commits only the tiny fraction that is actually needed.
Let's look at what the data says. Even with a larger number of threads, the sync version has a smaller memory footprint (working set, which maps onto physical memory).
UPDATE. I want to post the results of follow-up experiment which should be more representational since avoids some biases of the first one.
First of all, the results of the first experiment were taken using IIS Express, which is basically a dev-time server, so I needed to move away from that. Also, considering the feedback, I isolated the load-generation machine from the server (two Azure VMs in the same network). I also discovered that some IIS threading limits range from hard to impossible to override, and I ended up switching to ASP.NET Web API self-hosting to eliminate IIS from the variables as well. Note that the memory footprints and CPU times are radically different with this test; please do not compare numbers across the different test runs, as the setups are totally different (hosting, hardware, machine configuration). Additionally, when I moved to other machines and another hosting solution, the Thread Pool strategy changed (it is dynamic) and the injection rate increased.
Settings: Delay 100ms, 200 JMeter "users", 30 sec ramp-up time.
I want to conclude these experiments with the following: yes, under some specific (more laboratory-like) circumstances it's possible to get comparable results for sync vs. async, but in real-world cases, where the workload cannot be 100% predictable and is uneven, we will inevitably hit some kind of threading limit: either server-side limits or Thread Pool growth limits (and bear in mind that thread pool management is an automatic mechanism whose behaviour is not always easy to predict). Additionally, the sync version does have a bigger memory footprint (both working set and a far larger virtual memory size). As far as CPU consumption is concerned, async also wins (by the CPU-time-per-request metric).
On IIS with default settings the situation is even more dramatic: the synchronous version is order(s) of magnitude slower (with lower throughput) due to a quite tight limit on the thread count: 20 per CPU.
PS. Do use asynchronous pipelines for IO! [... sigh of relief...]
Everybody knows that asynchrony gives you "better throughput", "scalability", and more efficient resource consumption.
Scalability, yes. Throughput: it depends. Each asynchronous request is slower than the equivalent synchronous request, so you would only see a throughput benefit when scalability comes into play (i.e., there are more requests than threads available).
Does asynchronous code actually perform so much better compared to synchronous code with a correctly configured thread pool?
Well, the catch there is "correctly configured thread pool". What you're assuming is that you can 1) predict your load, and 2) have a server big enough to handle it using one thread per request. For many (most?) real-world production scenarios, either or both of these are not true.
From my article on async ASP.NET:
Why not just increase the size of the thread pool [instead of using async]? The answer is twofold: Asynchronous code scales both further and faster than blocking thread pool threads.
First, asynchronous code scales further than synchronous code. With more realistic example code, the total scalability of ASP.NET servers (stress tested) showed a multiplicative increase. In other words, an asynchronous server could handle several times the number of continuous requests as a synchronous server (with both thread pools turned up to the maximum for that hardware). However, these experiments (not done by me) were done on an expected "realistic baseline" for average ASP.NET apps. I don't know how the same results would carry over to a no-op string return.
Second, asynchronous code scales faster than synchronous code. This one is pretty obvious; synchronous code scales fine up to the number of thread pool threads, but then can't scale faster than the thread injection rate. So you get that really slow response to a sudden heavy load, as shown in the beginning of your response time graph.
I think the work you've done is interesting; I am particularly surprised at the memory usage differences (or rather, lack of difference). I'd love to see you work this into a blog post. Recommendations:
Use ASP.NET Core for your tests. The old ASP.NET had only a partially-asynchronous pipeline; ASP.NET Core would be necessary for a more "pure" comparison of sync vs async.
Don't test locally; there are a lot of caveats when doing that. I'd recommend choosing a VM size (or single-instance Docker container or whatever) and testing in the cloud for repeatability.
Also try stress testing in addition to load testing. Keep increasing load until the server is totally overwhelmed, and see how both the async and sync servers respond.
As a final reminder (also from my article):
Bear in mind that asynchronous code does not replace the thread pool. This isn’t thread pool or asynchronous code; it’s thread pool and asynchronous code. Asynchronous code allows your application to make optimum use of the thread pool. It takes the existing thread pool and turns it up to 11.
Truly asynchronous code (I/O) is more scalable because it releases thread pool threads for other work instead of blocking them. So, for the same number of threads, it can handle more requests.
But it does that at the cost of more control data structures and more work. So, (other than saving thread pool threads) it consumes more resources (memory, CPU).
It's all about availability, not performance.
Blocking threads is considered a bad practice for 2 main reasons:
Threads cost memory.
Threads cost processing time via context switches.
Here are my difficulties with those reasons:
Non-blocking, async code should also cost pretty much the same amount of memory, because the call stack should be saved somewhere right before executing the async call (the context is saved, after all). And if threads are significantly inefficient (memory-wise), why doesn't the OS/CLR offer a more light-weight version of threads (saving only the call stack's context and nothing else)? Wouldn't it be a much cleaner solution to the memory problem, instead of forcing us to re-architect our programs in an asynchronous fashion (which is significantly more complex, harder to understand and maintain)?
When a thread gets blocked, it is put into a waiting state by the OS. The OS won't context-switch to the sleeping thread. Since way over 95% of the thread's life cycle is spent on sleeping (assuming IO-bound apps here), the performance hit should be negligible, since the processing sections of the thread would probably not be pre-empted by the OS because they should run very fast, doing very little work. So performance-wise, I can't see a whole lot of benefit to a non-blocking approach either.
What am I missing here or why are those arguments flawed?
Non-blocking, async code should also cost pretty much the same amount of memory, because the call stack should be saved somewhere right before executing the async call (the context is saved, after all).
The entire call stack is not saved when an await occurs. Why do you believe that the entire call stack needs to be saved? The call stack is the reification of the continuation, and the continuation of the awaited task is not the continuation of the await; the continuation of the await is on the stack.
Now, it may well be the case that when every asynchronous method in a given call stack has awaited, information equivalent to the call stack has been stored in the continuations of each task. But the memory burden of those continuations is garbage-collected heap memory, not a block of a million bytes of committed stack memory. The continuation state size is order n in the number of tasks; the burden of a thread is a million bytes whether you use it or not.
if threads are significantly inefficient (memory-wise), why doesn't the OS/CLR offer a more light-weight version of threads
The OS does. It offers fibers. Of course, fibers still have a stack, so that's maybe not better. You could have a thread with a small stack I suppose.
Wouldn't it be a much cleaner solution to the memory problem, instead of forcing us to re-architect our programs in an asynchronous fashion
Suppose we made threads -- or for that matter, processes -- much cheaper. That still doesn't solve the problem of synchronizing access to shared memory.
For what it's worth, I think it would be great if processes were lighter weight. They're not.
Moreover, the question somewhat contradicts itself. You're doing work with threads, so you are already willing to take on the burden of managing asynchronous operations. A given thread must be able to tell another thread when it has produced the result that the first thread asked for. Threading already implies asynchrony, but asynchrony does not imply threading. Having an async architecture built in to the language, runtime and type system only benefits people who have the misfortune to have to write code that manages threads.
Since way over 95% of the thread's life cycle is spent on sleeping (assuming IO-bound apps here), the performance hit should be negligible, since the processing sections of the thread would probably not be pre-empted by the OS because they should run very fast, doing very little work.
Why would you hire a worker (thread) and pay their salary to sit by the mailbox (sleeping the thread) waiting for the mail to arrive (handling an IO message)? IO interrupts don't need a thread in the first place. IO interrupts exist in a world below the level of threads.
Don't hire a thread to wait on IO; let the operating system handle asynchronous IO operations. Hire threads to do insanely huge amounts of high latency CPU processing, and then assign one thread to each CPU you own.
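As a small, hedged illustration of that rule (the method names and path handling are made up, and File.ReadAllBytesAsync only exists on newer .NET; older frameworks would use a FileStream opened for asynchronous access):

// Blocking: "hires a worker to sit by the mailbox" -- a thread is parked
// for the entire duration of the read.
byte[] LoadTicksBlocking(string path)
{
    return File.ReadAllBytes(path);
}

// Non-blocking: the OS performs the I/O; a thread is only borrowed once the
// data has arrived and the continuation needs to run.
async Task<byte[]> LoadTicksAsync(string path)
{
    return await File.ReadAllBytesAsync(path);
}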
Now we come to your question:
What are the benefits of async (non-blocking) code?
Not blocking the UI thread
Making it easier to write programs that live in a world with high latency
Making more efficient use of limited CPU resources
But let me rephrase the question using an analogy. You're running a delivery company. There are many orders coming in, many deliveries going out, and you cannot tell a customer that you will not take their delivery until every delivery before theirs is completed. Which is better:
hire fifty guys to take calls, pick up packages, schedule deliveries, and deliver packages, and then require that 46 of them be idle at all times or
hire four guys and make each of them really good at first, doing a little bit of work at a time, so that they are always responsive to customer requests, and second, really good at keeping a to-do list of jobs they need to do in the future
The latter seems like a better deal to me.
You are mixing up multithreading and async concepts here.
Both your "difficulties" come from the assumption that each async method gets assigned a specialized thread on which it does the work. However, the state of affairs is quite opposite: each time an async operation needs to be executed, the CLR picks an idle (thus already created) thread from the threadpool and executes that method on the selected thread.
The core concept here is that async doesn't mean always creating new threads, it means scheduling the execution on existing threads so that no thread is sitting idle.
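A tiny sketch makes this visible (a console app; the exact thread IDs will vary): the code before and after the await can run on two different, already-existing pool threads, and no thread at all is in use while the delay is pending.

static async Task Main()
{
    Console.WriteLine($"Before await: thread {Environment.CurrentManagedThreadId}");
    await Task.Delay(1000);   // nothing blocks here; no thread is consumed during the delay
    Console.WriteLine($"After await: thread {Environment.CurrentManagedThreadId}");
}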
I'm making a multi-threaded application using delegates to handle the processing of requests in a WCF service. I want the clients to be able to send the request, disconnect, and then wait for a callback announcing that the work is done (which will most likely be searching through a database). I don't know how many requests may come in at once; it could be one every once in a while, or it could spike to dozens.
As far as I know, .Net's threadpool has 25 threads available to use. What happens when I spawn 25 delegates or more? Does it throw an error, does it wait, does it pause an existing operation and start working on the new delegate, or some other behavior?
Beyond that, what happens if I want to spawn up to or more than 25 delegates while other operations (such as incoming/outgoing connections) want to start, and/or when another operation is working and I want to spawn another delegate?
I want to make sure this is scalable without being too complex.
Thanks
All operations are queued (I am assuming that you are using the threadpool directly or indirectly). It is the job of the threadpool to munch through the queue and dispatch operations onto threads. Eventually all threads may become busy, which will just mean that the queue will grow until threads are free to start processing queued work items.
You're confusing delegates with threads, and number of concurrent connections.
With WCF 2-way bindings, the connection remains open while waiting for the callback.
IIS 7 or above, on modern hardware, should have no difficulty maintaining a few thousand concurrent connections if they're sitting idle.
Delegates are just method pointers - you can have as many as you wish. That doesn't mean they're being invoked concurrently.
If you are using ThreadPool.QueueUserWorkItem then it just queues the extra items until a thread is available.
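A minimal sketch of that queueing behaviour (the counts and the Sleep are arbitrary stand-ins for your delegate work): queue more work items than the pool has threads and the surplus simply waits in the queue; nothing throws and nothing gets pre-empted.

for (int i = 0; i < 100; i++)
{
    int job = i;   // capture a stable copy for the closure
    ThreadPool.QueueUserWorkItem(_ =>
    {
        Thread.Sleep(1000);   // stand-in for the real work (e.g. the database search)
        Console.WriteLine($"Job {job} done on thread {Environment.CurrentManagedThreadId}");
    });
}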
The ThreadPool's default maximum number of threads is 250 per CPU, not 25. You can still set a higher limit for the ThreadPool if you need it.
If your ThreadPool runs out of threads, two things may happen: all operations are queued until the next thread becomes available, and if there are finished threads that are still "in use", the GC will eventually free some of them up, providing you with new resources.
However you can also create Threads not using the ThreadPool.
Everything that I read about sockets in .NET says that the asynchronous pattern gives better performance (especially with the new SocketAsyncEventArgs which saves on the allocation).
I think this makes sense if we're talking about a server with many client connections where it's not possible to allocate one thread per connection. Then I can see the advantage of using the ThreadPool threads and getting async callbacks on them.
But in my app, I'm the client and I just need to listen to one server sending market tick data over one tcp connection. Right now, I create a single thread, set the priority to Highest, and call Socket.Receive() with it. My thread blocks on this call and wakes up once new data arrives.
If I were to switch this to an async pattern so that I get a callback when there's new data, I see two issues
The threadpool threads will have default priority so it seems they will be strictly worse than my own thread which has Highest priority.
I'll still have to send everything through a single thread at some point. Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data. The N byte arrays that they deliver can't be processed on the threadpool threads because there's no guarantee that they represent N unique market data messages because TCP is stream based. I'll have to lock and put the bytes into an array anyway and signal some other thread that can process what's in the array. So I'm not sure what having N threadpool threads is buying me.
Am I thinking about this wrong? Is there a reason to use the async pattern in my specific case of one client connected to one server?
UPDATE:
So I think that I was misunderstanding the async pattern in (2) above. I would get a callback on one worker thread when there was data available. Then I would begin another async receive and get another callback, etc. I wouldn't get N callbacks at the same time.
The question is still the same, though: is there any reason the callbacks would be better in my specific situation, where I'm the client and only connected to one server?
The slowest part of your application will be the network communication. It's highly likely that you will make almost no difference to performance for a one thread, one connection client by tweaking things like this. The network communication itself will dwarf all other contributions to processing or context switching time.
Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data.
Why is that going to happen? If you have one socket, you Begin an operation on it to receive data, and you get exactly one callback when it's done. You then decide whether to do another operation. It sounds like you're overcomplicating it, though maybe I'm oversimplifying it with regard to what you're trying to do.
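In code, that single-outstanding-receive pattern looks roughly like this (sketch only; _socket, _buffer, and OnReceive are illustrative names): each callback processes the bytes it got and then posts the next receive, so there is never more than one callback in flight for the connection.

private Socket _socket;                          // already connected to the market data server
private readonly byte[] _buffer = new byte[8192];

void StartReceive()
{
    _socket.BeginReceive(_buffer, 0, _buffer.Length, SocketFlags.None, OnReceive, null);
}

void OnReceive(IAsyncResult ar)
{
    int bytesRead = _socket.EndReceive(ar);
    if (bytesRead == 0)
        return;                                  // server closed the connection

    // ... append _buffer[0..bytesRead] to the message assembler here ...

    StartReceive();                              // post the next receive
}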
In summary, I'd say: pick the simplest programming model that gets you what you want; considering choices available in your scenario, they would be unlikely to make any noticeable difference to performance whichever one you go with. With the blocking model, you're "wasting" a thread that could be doing some real work, but hey... maybe you don't have any real work for it to do.
The number one rule of performance is only try to improve it when you have to.
I see you mention standards but never mention problems; if you are not having any, then you don't need to worry about what the standards say.
"This class was specifically designed for network server applications that require high performance."
As I understand, you are a client here, having only a single connection.
Data on this connection arrives in order, consumed by a single thread.
You will probably lose performance if you instead receive small amounts on separate threads, just so that you can assemble them later in a serialized (and thus effectively single-threaded) manner.
Much Ado about Nothing.
You do not really need to speed this up, and you probably cannot.
What you can do, however is to dispatch work units to other threads after you receive them.
You do not need SocketAsyncEventArgs for this. This might speed things up.
As always, measure & measure.
Also, just because you can, it does not mean you should.
If the performance is enough for the foreseeable future, why complicate matters?