I'm implementing an API. The front-end will likely be REST/HTTP, back-end MSSQL, with a lightweight middle-tier in between. Probably IIS hosted.
Each incoming request will have a non-unique Id attached. Any requests that share the same Id must be processed serially (and in FIFO order), whereas requests with different Ids can (and should) be processed concurrently for performance and efficiency.
Assume that when clients call my API a new thread is created to process the request (thread per call model).
Assume that every request is roughly the same size and involves the same amount of computational work.
Current Design
My current design and implementation is very simple and straightforward. A Monitor is created on a per-Id basis, and all requests are processed through a critical section of code that enforces serialisation. In other words, the Monitor for Id X will block all threads carrying a request with Id X until the current thread carrying Id X has completed its work.
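A minimal sketch of this per-Id critical section, in Python for brevity (the original would presumably use a C# dictionary of lock objects with lock/Monitor; all names here are illustrative):

```python
import threading
from collections import defaultdict

# One lock per Id; defaultdict creates a lock the first time an Id is seen.
# The outer _registry_lock protects the dictionary itself.
_registry_lock = threading.Lock()
_id_locks = defaultdict(threading.Lock)

def handle_request(request_id, work):
    """Serialise all requests sharing request_id; distinct Ids run concurrently."""
    with _registry_lock:
        id_lock = _id_locks[request_id]   # get-or-create this Id's lock
    with id_lock:                         # critical section: one thread per Id at a time
        return work()
```

Each request-carrying thread calls `handle_request` itself, so no data crosses thread boundaries; contention exists only between threads that share an Id.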
An advantage of this current design is its simplicity (it was easy to implement).
A second advantage is that there is none of the cost that comes with switching data between threads - each thread carries a request all the way from its initial arrival at the API through to sending a response back to the client.
One possible disadvantage is that where many requests arrive sharing the same Id, there will be lots of threads blocking (and, ultimately, unblocking again).
Another possible disadvantage is that this design does not lend itself easily to Asynchronous code for potentially increasing scalability (scalability would likely have to be realised in other ways).
Alternate Design
Another design might consist of a more complex arrangement:
Create a dedicated BlockingCollection for each Id encountered
Create a single, dedicated long-running consumer thread for each BlockingCollection
Each thread that processes a request acts as a producer by enqueuing the request to the relevant BlockingCollection
The producing thread then waits (in async style) until a response is ready to be collected
Each consumer thread processes items from its BlockingCollection in a serialised manner, and signals the awaiting thread once its response is ready
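A sketch of this producer/consumer arrangement, again in Python (queue.Queue standing in for BlockingCollection, and concurrent.futures.Future for the awaited response; names are illustrative):

```python
import threading
import queue
from concurrent.futures import Future

_queues = {}                     # request Id -> its dedicated FIFO queue
_queues_lock = threading.Lock()  # protects the dictionary of queues

def _consumer(q):
    """Dedicated consumer: drains one Id's queue in FIFO order, serialised."""
    while True:
        work, future = q.get()
        try:
            future.set_result(work())    # wake the waiting producer
        except Exception as exc:
            future.set_exception(exc)

def submit(request_id, work):
    """Producer side: enqueue the work and return a Future the caller waits on."""
    with _queues_lock:
        q = _queues.get(request_id)
        if q is None:                    # first request for this Id: spawn its consumer
            q = _queues[request_id] = queue.Queue()
            threading.Thread(target=_consumer, args=(q,), daemon=True).start()
    f = Future()
    q.put((work, f))
    return f
```

A caller would do `result = submit("X", lambda: do_work()).result()`; the single consumer per Id gives both serialisation and FIFO ordering for free, at the cost of two thread hand-offs per request.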
A disadvantage of this design is its complexity.
A second disadvantage is the overhead of switching data between threads (at least twice per request).
However, I think that on a reasonably busy system, where lots of requests are coming in, this design might be better at reducing the number of blocked threads;
and possibly the total number of threads required will be fewer.
It also lends itself better to Asynchronous code than the original design, which might make it scale better.
Questions
Given the effort and complexity of a re-implementation using the Alternate Design, is it likely to be worthwhile?
(At the moment I am leaning towards a NO and sticking with my current design: but any views or general thoughts would be much appreciated.)
If there is no straightforward answer to the above question, then what are the key considerations that I need to factor in to my decision?
Your current solution will scale terribly if the number of requests gets too high (requests will start queuing). Each request spawns a new thread, which allocates its own resources as well.
Have a look at the Actor Model.
You would spawn a thread per request Id and push each request to its actor as a "message".
Consider using lazy initialization for the actors, meaning you only spawn a thread if there is actually a request for that Id in flight. If an actor's message queue is empty you can terminate it, and only spawn it again when a new request with its Id comes in.
An implementation built on the ThreadPool should also help with performance in the future.
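A minimal sketch of the lazy-actor idea in Python, assuming unbounded mailboxes and a short idle timeout before an actor terminates (all names illustrative):

```python
import threading
import queue

class LazyActors:
    def __init__(self, handler):
        self._handler = handler
        self._mailboxes = {}          # actor id -> its mailbox queue
        self._lock = threading.Lock() # protects the mailbox registry

    def tell(self, actor_id, message):
        with self._lock:
            box = self._mailboxes.get(actor_id)
            if box is None:           # no live actor for this id: spawn one lazily
                box = self._mailboxes[actor_id] = queue.Queue()
                threading.Thread(target=self._run, args=(actor_id, box),
                                 daemon=True).start()
            box.put(message)          # put under the lock so termination can't race it

    def _run(self, actor_id, box):
        while True:
            try:
                msg = box.get(timeout=0.1)
            except queue.Empty:
                with self._lock:      # mailbox drained: terminate this actor
                    if box.empty():
                        del self._mailboxes[actor_id]
                        return
                continue
            self._handler(actor_id, msg)
```

Since each actor is a single thread draining one queue, messages for an Id are handled serially and in FIFO order, and idle Ids cost nothing.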
Related
Reading this article gives the impression that using async means a web server can concurrently serve more requests than it has threads.
I don't understand how this works, though. I do understand (at least I think I do) that with async one can start multiple IO requests without blocking inside a request and without starting new threads. I believe there's some magic going on, but deep under the covers there's probably a call to select().
However, even then, I still think you need one thread per request. The thread presumably holds the current stack, and other information about where execution in that thread is up to (like an instruction pointer). I can't see how one can just discard that information while waiting for a file descriptor in a select() call, lest you forget where you're at and also risk all your active data being garbage collected. But the article referenced above suggests this somehow happens, and I quote:
Asynchronous requests allow a smaller number of threads to handle a larger number of requests.
I don't really understand the mechanics of how this happens, and to me it seems impossible. Sure, async will reduce the number of additional threads needed to deal with IO requests, but I still can't see how you can avoid at least one thread per request. Unless you do something weird like tearing the thread down but saving its state to the heap and then restoring it later, but I don't really see how that achieves much (you could just apply the thread pool to "running threads" and achieve basically the same thing).
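For what it's worth, the "weird" trick the question dismisses is essentially what async frameworks do: the resume point and local variables are saved on the heap (as a state machine or coroutine object), not on a thread's stack. A Python sketch, where 1000 concurrent "requests" are all serviced by a single thread:

```python
import asyncio
import threading

async def handle_request(i):
    # await suspends this coroutine: its locals and resume point live in a
    # heap-allocated coroutine object, not on a thread's stack, so the one
    # thread is free to run other coroutines in the meantime.
    await asyncio.sleep(0.1)    # stands in for an IO wait
    return (i, threading.current_thread().name)

async def main():
    results = await asyncio.gather(*(handle_request(i) for i in range(1000)))
    return len(results), {name for _, name in results}

count, threads_used = asyncio.run(main())
```

All 1000 waits overlap, the whole batch takes roughly 0.1 seconds, and `threads_used` contains exactly one thread name.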
Is async/await useful in a backend / webservice scenario?
Given the case that there is only one thread for all requests/work: if this thread awaits a task it is not blocked, but it also has no other work to do, so it just idles. (It can't accept another request because the current execution is waiting for the task to resolve.)
Given the case there is one thread per request / "work item". The Thread still idles because the other request is handled by another thread.
The only case I can imagine for doing two async operations at the same time is something like reading a file and sending an HTTP request. But that sounds like a rare case: I should read the file first and then post the content, not post something I haven't even read.
Given the case there is one thread per request / "work item". The Thread still idles because the other request is handled by another thread.
That's closer to reality but the server doesn't just keep adding threads ad infinitum - at some point it'll let requests queue if there's not a thread free to handle the request. And that's where freeing up a thread that's got no other work to usefully do at the moment starts winning.
It's hard to read your question without feeling that you misunderstand how webservers work and how async/await & threads work. To make it simple, just think of it like this: async/await is almost always good to use when you query an external resource (e.g. database, web service/API, system file, etc). If you follow this simple rule, you don't need to think too deeply about each situation.
However, as you read and learn more on these subjects and gain experience, deeper thinking becomes essential in each case, because there are always exceptions to any rule; there are scenarios where the overhead of async/await and threads outweighs their benefits. For example, Microsoft decided not to use it for the logger in ASP.NET Core, and there is even a comment about it in the source code.
In your case, the web server uses many more threads than you seem to think, and for many more reasons than you seem to think. Also, when a thread is idling waiting for something, it cannot do anything else. What async/await does is untie the thread from the currently awaited task so the thread can go back to the pool and do something else. When the awaited task finishes, a thread (possibly a different one) is pulled out of the pool to continue the job. You seem to understand this to some degree, but perhaps you just don't know what other things a thread in a web server can do. Believe me, there is a lot to do.
Finally, remember that threads are generic workers, they can do anything. Webservers may have specialized threads for different tasks, but they fall into two or three categories. Threads can still do anything within their category. Webservers can even move threads to different categories when required. All of that is done for you so you don't need to think about it in most cases and you can just focus on freeing the threads so the webserver can do its job.
Given the case there is only one thread for all requests / work.
I would call that a very abstruse case. Even before multi-core servers became standard, ASP.NET used 50+ threads per core.
If this thread awaits a task it is not blocked but it also has no other work to do so it just idles.
No, it goes back into the pool to handle other requests. Most web services want to handle as many requests as possible with as few resources as possible. Servers handling only one client are a rare edge case; extremely rare. Most web services will handle as many requests as the plethora of clients throw at them.
ThreadPool recycles threads for better performance than creating multiple instances of the Thread class. However, how does this apply to methods with while loops inside them running on the ThreadPool?
As an example, if we were to apply a thread in the ThreadPool to a client that has connected to a TCP server, that client would need a while loop to keep checking for incoming data. The loop can be exited to disconnect the client, but only if the server closes or if the client demands a disconnection.
If that is the case, then how would a ThreadPool help when masses of clients connect? Either way the same amount of memory is used if the clients stay connected. And if they stay connected, the threads cannot be recycled, so the ThreadPool would not help much until a client disconnects and frees up a thread to recycle.
On the other hand, it was suggested to me to use the Network.BeginReceive and NetworkStream.EndReceive asynchronous methods to avoid threads altogether and save RAM and CPU usage. Is this true or not?
Either way the same amount of memory is used if the clients stay connected.
So far this is true. It's up to your app to decide how much state it needs to keep per client.
If they stay connected, then the threads cannot be recycled. If so, then ThreadPool would not help much until a client disconnects and opens up a thread to recycle.
This is untrue, because it assumes that all interesting operations performed by these threads are synchronous. That is a naive mode of operation; real-world code is asynchronous: a thread makes a call to request an action and is then free to do other things. When a result of that action becomes available, some thread looking for work will run the code that acts on the result.
On the other hand it was suggested to me to use the Network.BeginReceive and NetworkStream.EndReceive asynchronous methods to avoid threads all together to save RAM usage and CPU usage. Is this true or not?
As explained above, async methods like these will allow you to service a potentially very large number of clients with only a small number of worker threads -- but by itself it will do nothing to either help or hurt the memory situation.
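A sketch of that model in Python's asyncio, where each connection gets a coroutine (heap-allocated state) rather than a dedicated thread stuck in a receive loop (illustrative echo handler; the .NET equivalent would be the async Begin/End or async/await socket APIs):

```python
import asyncio

async def handle_client(reader, writer):
    """One coroutine per connection, but no dedicated thread per connection.
    While awaiting data this coroutine is suspended on the heap; the event
    loop's single thread services whichever connections are ready."""
    while True:
        data = await reader.read(4096)
        if not data:                 # client disconnected
            break
        writer.write(data)           # echo the data back
        await writer.drain()
    writer.close()

async def serve():
    # Port 0: let the OS pick a free port (illustrative setup).
    return await asyncio.start_server(handle_client, "127.0.0.1", 0)
```

Thousands of connected-but-idle clients then cost only their per-connection state (buffers, coroutine object), not a thread each.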
You are correct: slow blocking code can cause poor performance on both the client side and the server side. You can run slow work on a separate thread, and that may work well enough on the client side, but it may not help on the server side. Blocking methods on the server can diminish its overall performance, because they can lead to a situation where the server has a large number of threads running, all of them blocked; then even simple requests can end up taking a long time. It is better to use asynchronous APIs for slow-running tasks when they are available, as in the situation you are in. (Note: even if an asynchronous operation is not available, you can implement one yourself, for example with a custom awaiter class.) This is better for clients as well as servers. The main point of asynchronous code is to reduce the number of threads: handling a given number of clients with fewer threads lets the server have a larger number of requests in progress simultaneously, which improves scalability.
If you don't need fine control over the threads or the thread pool, you can go with the asynchronous approach.
Also, each thread reserves about 1 MB for its stack, so asynchronous methods will definitely help reduce memory usage. However, I think the nature of the work you have described is going to take pretty much the same amount of time in the multi-threaded approach as in the asynchronous one.
One of the main purposes of writing code in the asynchronous programming model (more specifically - using callbacks instead of blocking the thread) is to minimize the number of blocking threads in the system.
For running threads, this goal is obvious, because of context-switch and synchronization costs.
But what about blocked threads? why is it so important to reduce their number?
For example, while waiting for a response from a web server, a blocked thread doesn't take up any CPU time and does not participate in any context switch.
So my question is:
Other than RAM (about 1 MB per thread?), what other resources do blocked threads take up?
And another more subjective question:
In what cases will this cost really justify the hassle of writing asynchronous code? (The price could be, for example, splitting your nice coherent method into lots of BeginXXX and EndXXX methods, and moving parameters and local variables into class fields.)
UPDATE - additional reasons I didn't mention or didn't give enough weight to:
More threads means more locking on communal resources
More threads means more creation and disposal of threads, which is expensive
The system can genuinely run out of threads/RAM and then stop servicing clients (in a web-server scenario this can actually bring down the service)
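For concreteness, the "splitting" cost mentioned above can be sketched like this (Python, with hypothetical client/store objects standing in for the BeginXXX/EndXXX pattern; none of these names come from a real API):

```python
# Coherent, blocking version: locals stay local, control flow is obvious.
def fetch_and_save_blocking(client, url, store):
    body = client.get(url)          # blocks a thread while waiting
    store.save(body)
    return len(body)

# Callback version: the method is split at the wait point, and anything
# the second half needs must be captured in the callback's environment
# (or, in the C# APM style, promoted to class fields).
def fetch_and_save_async(client, url, store, on_done):
    def on_response(body):          # the "EndXXX" half
        store.save(body)
        on_done(len(body))
    client.get_async(url, on_response)   # the "BeginXXX" half returns at once
```

Both variants do the same work; the second frees the calling thread during the wait at the cost of the contorted structure the question complains about (which async/await later eliminated).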
So my question is: other than RAM (about 1 MB per thread?), what other resources do blocked threads take up?
This is one of the largest ones. That being said, there's a reason that the ThreadPool in .NET allows so many threads per core - in 3.5 the default was 250 worker threads per core in the system. (In .NET 4, it depends on system information such as virtual address size, platform, etc. - there isn't a fixed default now.) Threads, especially blocked threads, really aren't that expensive...
However, I would say, from a code management standpoint, it's worth reducing the number of blocked threads. Every blocked thread is an operation that should, at some point, return and become unblocked. Having many of these means you have quite a complicated set of code to manage. Keeping this number reduced will help keep the code base simpler - and more maintainable.
And another more subjective question: in what cases will this cost really justify the hassle of writing asynchronous code? (The price could be, for example, splitting your nice coherent method into lots of BeginXXX and EndXXX methods, and moving parameters and local variables into class fields.)
Right now, it's often a pain. It depends a lot on the scenario. The Task<T> class in .NET 4 drastically improves this for many scenarios, however. Using the TPL, it's much less painful than it was with the APM (BeginXXX/EndXXX) or even the EAP.
This is why the language designers are putting so much effort into improving this situation in the future. Their goals are to make async code much simpler to write, in order to allow it to be used more frequently.
Besides any resources the blocked thread might hold a lock on, thread-pool size is also a consideration. If you have reached the maximum thread-pool size (if I recall correctly, for .NET 4 the maximum is 100 threads per CPU), you simply won't be able to get anything else to run on the thread pool until at least one thread is freed up.
I would like to point out that the 1 MB figure for stack memory (or 256 KB, or whatever it's set to) is a reserve; while it does take away from available address space, the actual memory is only committed as it's needed.
On the other hand, having a very large number of threads is bound to bog down the task scheduler somewhat as it has to keep track of them (which have become runnable since the last tick, and so on).
Everything that I read about sockets in .NET says that the asynchronous pattern gives better performance (especially with the new SocketAsyncEventArgs which saves on the allocation).
I think this makes sense if we're talking about a server with many client connections, where it's not possible to allocate one thread per connection. Then I can see the advantage of using ThreadPool threads and getting async callbacks on them.
But in my app, I'm the client and I just need to listen to one server sending market tick data over one tcp connection. Right now, I create a single thread, set the priority to Highest, and call Socket.Receive() with it. My thread blocks on this call and wakes up once new data arrives.
If I were to switch this to an async pattern so that I get a callback when there's new data, I see two issues
The threadpool threads will have default priority so it seems they will be strictly worse than my own thread which has Highest priority.
I'll still have to funnel everything through a single thread at some point. Say I get N callbacks at almost the same time on N different thread-pool threads, notifying me that there's new data. The N byte arrays they deliver can't be processed on the thread-pool threads, because there's no guarantee they represent N unique market-data messages, since TCP is stream-based. I'll have to lock, put the bytes into an array anyway, and signal some other thread to process what's in the array. So I'm not sure what having N thread-pool threads is buying me.
Am I thinking about this wrong? Is there a reason to use the async pattern in my specific case of one client connected to one server?
UPDATE:
So I think I was misunderstanding the async pattern in (2) above. I would get a callback on one worker thread when data was available. Then I would begin another async receive and get another callback, and so on. I wouldn't get N callbacks at the same time.
The question is still the same, though: is there any reason the callbacks would be better in my specific situation, where I'm the client and connected to only one server?
The slowest part of your application will be the network communication. It's highly likely that you will make almost no difference to performance for a one thread, one connection client by tweaking things like this. The network communication itself will dwarf all other contributions to processing or context switching time.
Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data.
Why is that going to happen? If you have one socket, you Begin an operation on it to receive data, and you get exactly one callback when it's done. You then decide whether to do another operation. It sounds like you're overcomplicating it, though maybe I'm oversimplifying it with regard to what you're trying to do.
In summary, I'd say: pick the simplest programming model that gets you what you want. Given the choices available in your scenario, they are unlikely to make any noticeable difference to performance whichever one you go with. With the blocking model you're "wasting" a thread that could be doing some real work, but hey... maybe you don't have any real work for it to do.
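For comparison, the blocking single-connection model being defended here is about as simple as it gets. A Python sketch (names illustrative; the original uses a dedicated C# thread calling Socket.Receive()):

```python
import socket

def run_client(host, port, process):
    """One dedicated blocking receive loop: the thread sleeps inside recv()
    and wakes exactly when data arrives, preserving arrival order."""
    with socket.create_connection((host, port)) as sock:
        while True:
            data = sock.recv(65536)
            if not data:          # server closed the connection
                break
            process(data)         # handle the tick data in order
```

With one connection there is nothing for an async model to overlap, so the extra machinery buys little; the dedicated thread costs one stack reservation and otherwise sleeps in the kernel.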
The number-one rule of performance: only try to improve it when you have to.
I see you mention standards but never mention problems; if you're not having any, then you don't need to worry about what the standards say.
"This class was specifically designed for network server applications that require high performance."
As I understand, you are a client here, having only a single connection.
Data on this connection arrives in order, consumed by a single thread.
You will probably lose performance if you instead receive small amounts on separate threads, just so that you can reassemble them later in a serialized, and thus effectively single-threaded, manner.
Much Ado about Nothing.
You do not really need to speed this up; you probably can't.
What you can do, however, is dispatch work units to other threads after you receive them; that might speed things up.
You do not need SocketAsyncEventArgs for this.
As always, measure & measure.
Also, just because you can, it doesn't mean you should.
If the performance is sufficient for the foreseeable future, why complicate matters?