I want to create a high performance server in C# which could take about ~10k clients. Now i started writing a TcpServer with C# and for each client-connection i open a new thread. I also use one thread to accept the connections. So far so good, works fine.
The server has to deserialize AMF incoming objects do some logic ( like saving the position of a player ) and send some object back ( serializing objects ). I am not worried about the serializing/deserializing part atm.
My main concern is that I will have a lot of threads with 10k clients and i've read somewhere that an OS can only hold like a few hunderd threads.
Are there any sources/articles available on writing a decent async threaded server ? Are there other possibilties or will 10k threads work fine ? I've looked on google, but i couldn't find much info about design patterns or ways which explain it clearly
You're going to run into a number of problems.
You can't spin up 10,000 threads for a couple of reasons. It'll trash the kernel scheduler. If you're running a 32-bit, then the default stack address space of 1MB means that 10k threads will reserve about 10GB of address space. That'll fail.
You can't use a simple select system either. At it's heart, select is O(N) for the number of sockets. With 10k sockets, that's bad.
You can use IO Completion Ports. This is the scenario they're designed for. To my knowledge there is no stable, managed IO Completion port library. You'll have to write your own using P/Invoke or Managed C++. Have fun.
The way to write an efficient multithreaded server is to use I/O completion ports (using a thread per request is quite inefficient, as #Marcelo mentions).
If you use the asynchronous version of the .NET socket class, you get this for free. See this question which has pointers to documentation.
You want to look into using IO completion ports. You basically have a threadpool and a queue of IO operations.
I/O completion ports provide an
efficient threading model for
processing multiple asynchronous I/O
requests on a multiprocessor system.
When a process creates an I/O
completion port, the system creates an
associated queue object for requests
whose sole purpose is to service these
requests. Processes that handle many
concurrent asynchronous I/O requests
can do so more quickly and efficiently
by using I/O completion ports in
conjunction with a pre-allocated
thread pool than by creating threads
at the time they receive an I/O
request.
You definitely don't want a thread per request. Even if you have fewer clients, the overhead of creating and destroying threads will cripple the server, and there's no way you'll get to 10,000 threads; the OS scheduler will die a horrible death long before then.
There are numerous articles online about asynchronous server programming in C# (e.g., here). Just google around a bit.
Related
I'm trying to create a socket server which can handle relatively large amount of clients. There are different approaches used for such a task, first is to use a separate thread for each incoming connection, and second is to use async/await pattern.
First approach is bad as there will be relatively small number of threads such as all system's resources will be lost on context switching.
Second approach from the first sight is good as we can have our own threadpool with limited number of worker threads, so dispatcher will receive incoming connections, add them to some queue and call async socket read methods which in turn will receive data from socket and add this data/errors to queue for further processing(error handling client responses, DB-related work).
There is not so much info on internal implementation of async/await I could found, but as I understood while using non-UI application all continuation is done through TaskScheduler.Current which is using runtime's threadpool and so it's resources are limited. Greater amount of incoming connections will result in no free threads in runtime's threadpool or amount will be so large that system will stop responding.
In this matter async/await will result in same problem as with 1-client/1-thread concern, however with little advantage as runtime threadpool's threads may not occupy so much address space as default System.Threading.Thread (I believe 1MB stack size + ~1/2MB of control data).
Is there any way I can made one thread to wait for some kernel interrupt on say 10 sockets so application will only use my explicitly sized thread pool? (I mean that in case there is any further data on one from 10 sockets, one thread will wake up and handle it.)
In this matter async/await will result in same problem as with 1-client/1-thread concern
When thread reach code that is running asynchronously then control is returned to caller so that means thread is returned to thread pool and can handle another request so it is any superior to 1-client/1-thread because thread isn't blocked.
There is some any intersting blog about asnyc/await:
1
ThreadPool utilizes recycling of threads for optimal performance over using multiple of the Thread class. However, how does this apply to processing methods with while loops inside of the ThreadPool?
As an example, if we were to apply a thread in the ThreadPool to a client that has connected to a TCP server, that client would need a while loop to keep checking for incoming data. The loop can be exited to disconnect the client, but only if the server closes or if the client demands a disconnection.
If that is the case, then how would having a ThreadPool help when masses of clients connect? Either way the same amount of memory is used if the clients stay connected. If they stay connected, then the threads cannot be recycled. If so, then ThreadPool would not help much until a client disconnects and opens up a thread to recycle.
On the other hand it was suggested to me to use the Network.BeginReceive and NetworkStream.EndReceive asynchronous methods to avoid threads all together to save RAM usage and CPU usage. Is this true or not?
Either way the same amount of memory is used if the clients stay
connected.
So far this is true. It's up to your app to decide how much state it needs to keep per client.
If they stay connected, then the threads cannot be recycled. If so,
then ThreadPool would not help much until a client disconnects and
opens up a thread to recycle.
This is untrue, because it assumes that all interesting operations performed by these threads are synchronous. This is a naive mode of operation, and in fact real world code is asynchronous: a thread makes a call to request an action and is then free to do other things. When a result is made available as a result of that action, some thread looking for other things to do will run the code that acts on the result.
On the other hand it was suggested to me to use the
Network.BeginReceive and NetworkStream.EndReceive asynchronous methods
to avoid threads all together to save RAM usage and CPU usage. Is this
true or not?
As explained above, async methods like these will allow you to service a potentially very large number of clients with only a small number of worker threads -- but by itself it will do nothing to either help or hurt the memory situation.
You are correct. Slow blocking codes can cause poor performances both on the client-side as well as server-side. You can run slow work on a separate thread and that might work well enough on the client-side but may not help on the server-side. Having blocking methods in the server can diminish the overall performance of the server because it can lead to a situation where your server has a large no of threads running and all blocked. So, even simple request might end up taking a long time. It is better to use asynchronous APIs if they are available for slow running tasks just like the situation you are in. (Note: even if the asynchronous operations are not available, you can implement one by implementing a custom awaiter class) This is better for the clients as well as servers. The main point of asynchronous code is to reduce the no of threads. Because servers can have larger no of requests in progress simultaneously because reducing no of threads to handle a particular no of clients can improve scalability.
If you dont need to have more control over the threads or the thread-pool you can go with asynchronous approach.
Also, each thread takes 1 MB space on the heap. So, asynchronous methods will definitely help reduce memory usage. However, I think the nature of the work you have described here is going to take pretty much the same amount of time in multi-threaded as well as asynchronous approach.
I'm running an application that has 5000 instances of the UdpClient class since I need to transmit packets concurrently on different ports.
I'm currently using a System.Timers.Timer, which runs on the ThreadPool. It has a very short Interval with many Elapsed events firing.
Would it be better to modify the application to work using BeginRead/EndRead functions of the UdpClient?
When you work with IO operations (i.e, network, file system, etc.), you should always use async operations, i.e BeginXXX/EndXXX, XXXAsync/XXXCompleted, XXXAsync in context of async/await. These operations use so known IO complition ports. In two words, it doesn't consume any CPU resources while the data is transmitting. As soon as the requested data is loaded, it takes a thread from the ThreadPool and queues the handler to this thread. In your case, you are wasting CPU resources. Instead of doing some useful work, the threads just wait until the data is transmitted. Also, the ThreadPool has limited number of threads (usually equal to number of CPU cores). So, in you case it sends/receives data only to/from 2 clients in a time.
The asynchronous methods might seem very complicated (especially BeginXXX/EndXXX), but there are a lot of wrappers which significantly simplifies thier useage. For example, you can use Rx's FromAsyncPattern extension method, or you can use new async/await asyncronous model.
Hopefully two simple questions relating to creating a server application:
Is there a theoretical/practical limit on the number of simultaneous sockets that can be open? Ignoring the resources required to process the data once it has arrived! If its of relevance I am targeting the .net framework
Should each connection be run in a separate thread that's permanently assigned to it, or should use of a Thread Pool be made? The dedicated thread approach seems simpler, but it seems odd to have 100+ threads running it once. Is this acceptable practice?
Any advice is greatly appreciated
Venatu
You may find the following answer useful. It illustrates how to write a scalable TCP server using the .NET thread pool and asynchronous sockets methods (BeginAccept/EndAccept and BeginReceive/EndReceive).
This being said it is rarely a good idea to write its own server when you could use one of the numerous WCF bindings (or even write custom ones) and benefit from the full power of the WCF infrastructure. It will probably scale better than every custom written server.
There are practical limits, yes. However, you will most likely run out of resources to handle the load long before you reach them. CPU, or memory are more likely to be exhausted before number of connections.
For maximum scalability, you don't want a seperate thread per connection, but rather you would use an Asynchronous model that only uses threads when servicing active (as in receiving or sending data) connections.
As I remember correctly (did sockets long time ago) the very best way of implementing them is with ReceiveAsync (.NET 3.5) / BeginReceive methods using asynchronous callbacks which will utilize thread pool. Don't open a thread for every connection, it is a waste of resources.
I'm working on an application that requires for one type of message to go hit a database, and the other type of message to go and hit some external xml api.
I have to process A LOT... one of the big challenges is to get HttpWebRequest class performing well. I initially started with just using the standard synchronous methods and threadpooling the whole thing. This was not good.
So after a bit of reading I saw that the recommended way to do this was to use the Begin/End methods to delegate the work to IO completion ports, thus freeing up the threadpool and yielding better performance. This doesn't seem to be the case... the performance is marginally better but I certainly can't see the IO completion ports being used that much compared to threadpool.
I have a thread that spins round and sends me the available worker threads + completion ports in the threadpool. Completion ports is always very low (max I've seen is 9 used) and I'm always using about 120 worker threads (sometimes more). I use the begin / end pattern for all methods in httpwebrequest:
Begin/EndGetRequestStream
Begin/EndWrite (Stream)
Begin/EndGetResponse
Begin/EndRead (Stream)
Am I doing it right? Am I missing something? I can use (sometimes) up to 2048 http connections simultaneously (from netstat output) - why would the completion port numbers be so low?
If anyone could give some serious advice about how to do well with this managing worker threads, completion ports and httpwebrequest it would be hugely appreciated!
EDIT: is .NET a reasonable tool for this? Can I get a high volume of httpconnections working with .NET and the System.Net stack? It's been suggested to use something like WinHttp (or some other C++ library), and pInvoke it from .NET, but this isn't something I especially want to do!
The way I understand it, you don't tie up an I/O completion port all the time that an asynchronous request is outstanding - it's only "busy" when data has been returned and is being processed on the corresponding thread. Hopefully you don't have very much work to do in the callback, which is why you don't have many in-use ports at any one time.
Are you actually getting poor performance though? Is your cause for concern merely the low numbers? Are you getting the throughput you'd expect?
One problem you may have is that the HTTP connection pool for any one host is relatively small. If you have hundreds of requests to the same machine, then by default only 2 requests will actually be made at a time, to avoid DoS-attacking the host in question (and to get the benefits of keep-alive). You can increase this programmatically or using app.config. Of course, this may not be an issue in your case, either because you've already fixed the problem or because all your requests are to different hosts. (If netstat is showing 2048 connections then that doesn't sound bad.)
Maybe your EndRead methods should only write the result to a thread safe queue that you then read from a small number of worker threads that are under your control. And/Or use the fact that HttpWebRequest will signal a waitable object when it is done and write your own logic to wait on all the outstanding requests from a single (or small number of) threads.
Having only 9 completion port threads actually means you're probably using them correctly and efficiently. I'm going to assume that the machine you're running on has either 8 cores or 4 hyperthreaded cores which means that the OS will try to keep up to 8 active (not sleeping/blocking/waiting) completion port threads at any time.
If one of the running threads becomes inactive (sleep/block/wait) and there are additional work items to process, then an additional thread will be created to keep the active count at 8. If you see 9 threads, that means that you are introducing virtually no blocking in the methods on your completion port threads and actually doing CPU work with them.
If you have 8 threads actively doing CPU bound work on 8 cores, then adding more threads will only slow things down (context switching between threads will be the wasted time).
What you should be looking in to is why you have 120 other threads and what those are doing.