I'm developing a server in C#. This server will act as the data server for a backup service: a client will send data, a lot of it, continuously; specifically, it will send chunks of up to five files over the same TCP channel. The client will send data to the server slowly because I don't want to saturate the customer's bandwidth, so I don't need to push data at maximum speed and, for this reason, a single TCP channel is enough for everything.
That said, the server currently uses the BeginReceive method to acquire data from the client, which on Windows means IOCP. My question is: how will BeginReceive perform on Linux/FreeBSD through Mono? From everything I've read it performs very well on Windows, but the server part of this software will run on Linux or FreeBSD through Mono, and I don't know how these methods are implemented there!
Also, to reduce the continuous allocation of an async state object for the (Begin|End)Receive calls, I keep one per TCP connection, and in the BeginReceive callback I copy the data out before reusing it (naturally I don't clear the buffer, because I know how much was read from the EndReceive return value). The buffer is set to 8 KB, so at most I copy out 8 KB of data per read; it shouldn't strain resources.
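For clarity, this is roughly the pattern I mean (a simplified sketch; ConnectionState is just an illustrative name, not my actual class): the buffer is allocated once per connection, and only the bytes actually read are copied out before the next receive is posted.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical per-connection state: one 8 KB buffer reused for every receive.
class ConnectionState
{
    public Socket Socket;
    public byte[] Buffer = new byte[8 * 1024];
}

class Receiver
{
    public static void StartReceive(ConnectionState state)
    {
        state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                  SocketFlags.None, OnReceive, state);
    }

    static void OnReceive(IAsyncResult ar)
    {
        ConnectionState state = (ConnectionState)ar.AsyncState;
        int read = state.Socket.EndReceive(ar);
        if (read <= 0) { state.Socket.Close(); return; }  // peer closed the connection

        // Copy out only what was actually read, then reuse the same buffer.
        byte[] chunk = new byte[read];
        Buffer.BlockCopy(state.Buffer, 0, chunk, 0, read);
        // ... hand 'chunk' to the file-writing logic here ...

        StartReceive(state);  // post the next receive with the same state object
    }
}
```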
My target is at most 400/500 connections. That isn't much, but in the meantime the same machine will be handling files through a filesystem of our own (developed with FUSE, first in C# and later in C) on LVM + Linux software RAID mirror, plus antivirus checks with ClamAV, so the software must be as light as possible!
EDIT: I forgot to say that the machine will (probably) be an Intel Core 2 Duo 2.66+ GHz (3 MB L2, 1066 MHz FSB) with 2 GB of RAM, and the OS will be 64-bit.
Does Mono use epoll (or libevent) on Linux and kqueue on FreeBSD? Is there anything specific I should do to maximize performance? And can I do anything more to avoid wasting resources while receiving data packets?
I know it's a little late, but I just found this question...
Mono is able to handle the number of connections that you need and much more. I regularly test xsp2 (the Mono ASP.NET standalone server) with over 1k simultaneous connections. If this is going to be a high-load situation, you should experiment with MONO_THREADS_PER_CPU until you find the right number of threads for the ThreadPool.
On Linux, Mono uses epoll when available (which is always, these days).
I can't speak specifically about the performance of that one function on Mono, but in general Mono performs very well these days. 400-500 connections is, as you say, not very many, so I doubt you'd have any issues.
That said, it shouldn't be very hard to set up a test for this kind of thing. I think that's probably the only way you'll get a conclusive answer for your situation.
Ok this might be a vague question, but I'll try to keep it specific!
In C#, I'm running one server that listens on a UDP socket. The server runs a Timer object to track the time of each client's last packet and forget the client after 15 seconds of silence.
Each of the UDP clients launched (in its own process / terminal) ALSO runs its own Timer object to send a packet every now and then (more or less a keep-alive).
Now, I ran a stress test and could "only" reach about 104 simultaneous UDP client connections, each in its own terminal / command-line window. After that, it just gives up and shows me this:
So, being new to network programming, I don't have a clue which specific resource was exhausted or what limit was reached to make it hang at that particular nth process. It makes it hard to understand how some frameworks claim to reach >100,000 concurrent connections!
Now, if I were to take a few guesses, I'm thinking I've run into a limit of:
UDP sockets?
Used up as many concurrent ports as the OS allows?
Timers the OS can process? (I sure hope not!)
Too many terminal windows open, especially from the same process *.exe?
Thread-Locking the wrong parts / at the wrong time?
Maybe my tests are just wrong / poorly conducted? (Maybe instead of testing individual UDP clients each in their own process, I should batch a few in the same terminal, as in the sketch below?)
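For what it's worth, a minimal sketch of that batching idea: many UdpClient senders hosted in one process with a single shared timer sending the keep-alives. The server endpoint below is a placeholder.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

class BatchedUdpClients
{
    static void Main()
    {
        IPEndPoint server = new IPEndPoint(IPAddress.Loopback, 9000); // placeholder endpoint
        UdpClient[] clients = new UdpClient[200];
        for (int i = 0; i < clients.Length; i++)
            clients[i] = new UdpClient();   // each client gets its own local port

        byte[] keepAlive = { 0x01 };
        // One shared timer instead of one Timer object per terminal window.
        Timer timer = new Timer(_ =>
        {
            foreach (UdpClient c in clients)
                c.Send(keepAlive, keepAlive.Length, server);
        }, null, 0, 5000);

        Console.WriteLine("Sending keep-alives from {0} clients...", clients.Length);
        Console.ReadLine();
        timer.Dispose();
    }
}
```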
For the more visual, here's basically what's happening:
I'm developing a system for exchanging data between client and server application using sockets via TCP.
The server was written long ago in C++ (by a third party); the client is the application I'm still developing in C#. The basics have already been completed, and both applications are communicating properly. The problem is that data is exchanged (in my opinion) too slowly.
I emphasize "in my opinion" because I have no idea what to actually expect from sockets over TCP. Currently, measuring by eye across several tests, the communication speed averages 200-250 KB per second, both receiving and sending.
Are there applications that let you evaluate the speed of data exchange between two endpoints on a specific port? What speed should I expect?
More specifically, the client and the server are developed for file sharing. Currently, the client can receive and send one file at a time. The protocol would allow otherwise (receiving/sending multiple files at once), but for safety I preferred not to: if you're receiving more than one file and you lose the connection, you lose all the files you were receiving.
Could changing this significantly affect the speed of data exchange? In what way?
First, you have to know the bandwidth of the link your host provides in order to know what to expect. If you are hosting the server yourself, your upload speed may be much slower than your download speed. You can log into your server and run a speed test.
I had a situation similar to yours (I developed a server and a client to send it binary data), and at the time (2-3 years ago) I used a version of this tool to measure the speed of the data exchange. You install it on your server, set up the monitor (ports you want to watch, etc.), configure the charts, and let it run for a while. It worked really well. You can find other similar tools by searching for "Bandwidth Traffic Monitor".
If you allow your application to exchange multiple files at the same time, keep in mind that your link speed will be divided among the connections and, depending on how many clients are using your application, the multiple connections and file writes may be limited by your server's processing capacity in addition to its bandwidth.
What speed should I expect?
The full speed the slowest part of the network between the two nodes can transfer, minus the bandwidth already consumed by existing traffic over that slowest part. On a 100 Mbit LAN (12.5 MB/s theoretical, before protocol overhead), I would expect a one-way transfer speed of roughly 10-11 MB/s.
Could changing this significantly affect the speed of data exchange? In what way?
That totally depends on the protocol being used and on the client and server implementations. You'll have to benchmark to find out where the hard work is being done and where your bottleneck is.
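For a very rough first measurement from the C# client side, something like the sketch below could work. The host and port are placeholders, and it assumes the server simply reads and discards whatever it receives; it just times how long it takes to push a fixed amount of data through the socket.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;

class ThroughputCheck
{
    static void Main()
    {
        const int total = 64 * 1024 * 1024;          // send 64 MB of dummy data
        byte[] block = new byte[64 * 1024];

        using (TcpClient client = new TcpClient("server-host", 9000)) // placeholder endpoint
        using (NetworkStream stream = client.GetStream())
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int sent = 0; sent < total; sent += block.Length)
                stream.Write(block, 0, block.Length);
            sw.Stop();

            double mbPerSec = (total / (1024.0 * 1024.0)) / sw.Elapsed.TotalSeconds;
            Console.WriteLine("Sent {0} MB in {1:F1} s ({2:F1} MB/s)",
                              total / (1024 * 1024), sw.Elapsed.TotalSeconds, mbPerSec);
        }
    }
}
```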
I'm building a tool that transfers very large streaming data sets (possibly on the order of terabytes in a single stream; routinely in the tens of gigabytes) from one server to another. The client portion of the tool will read blocks from the source disk, and send them over the network. The server side will read these blocks off the network and write them to a file on the server disk.
Right now I'm trying to decide which transport to use. Options are raw TCP, and HTTP.
I really, REALLY want to be able to use HTTP. The HttpListener (or WCF if I want to go that route) makes it easy to plug in to the HTTP Server API (http.sys), and I can get things like authentication and SSL for free. The problem right now is performance.
I wrote a simple test harness that sends 128K blocks of NULL bytes using the BeginWrite/EndWrite async I/O idiom, with async BeginRead/EndRead on the server side. I've modified this test harness so I can do this with either HTTP PUT operations via HttpWebRequest/HttpListener, or plain old socket writes using TcpClient/TcpListener. To rule out issues with network cards or network pathways, both the client and server are on one machine and communicate over localhost.
On my 12-core Windows 2008 R2 test server, the TCP version of this test harness can push bytes at 450MB/s, with minimal CPU usage. On the same box, the HTTP version of the test harness runs between 130MB/s and 200MB/s depending upon how I tweak it.
In both cases CPU usage is low, and the vast majority of the CPU time used is kernel time, so I'm pretty sure my usage of C# and the .NET runtime is not the bottleneck. The box has two 6-core Xeon X5650 processors, 24GB of single-rank DDR3 RAM, and is used exclusively by me for my own performance testing.
I already know about HTTP client tweaks like ServicePointManager.MaxServicePointIdleTime, ServicePointManager.DefaultConnectionLimit, ServicePointManager.Expect100Continue, and HttpWebRequest.AllowWriteStreamBuffering.
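For reference, a hedged sketch of how those knobs might be wired up around a chunked PUT upload; the URL and block counts are placeholders, and whether each knob helps on a given box is exactly what the harness is meant to measure.

```csharp
using System;
using System.Net;

class HttpPutSender
{
    static void Main()
    {
        // Client-side tweaks mentioned above.
        ServicePointManager.DefaultConnectionLimit = 32;
        ServicePointManager.Expect100Continue = false;       // skip the 100-continue round trip
        ServicePointManager.MaxServicePointIdleTime = 120000; // milliseconds

        HttpWebRequest request =
            (HttpWebRequest)WebRequest.Create("http://localhost:8080/upload"); // placeholder URL
        request.Method = "PUT";
        request.SendChunked = true;                 // stream without a Content-Length
        request.AllowWriteStreamBuffering = false;  // don't buffer the whole body in memory

        byte[] block = new byte[128 * 1024];        // 128K blocks of NULL bytes, as in the harness
        using (var body = request.GetRequestStream())
        {
            for (int i = 0; i < 8 * 1024; i++)      // ~1 GB total
                body.Write(block, 0, block.Length);
        }
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            Console.WriteLine("Server replied: {0}", response.StatusCode);
    }
}
```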
Does anyone have any ideas for how I can get HTTP.sys performance beyond 200MB/s? Has anyone seen it perform this well on any environment?
UPDATE:
Here's a bit more detail on the performance I'm seeing with TcpListener vs HttpListener:
First, I wrote a TcpClient/TcpListener test. On my test box that was able to push 450MB/s.
Then, using Reflector, I figured out how to get the raw Socket object underlying HttpWebRequest, and modified my HTTP client test to use it. Still no joy; barely 200MB/s.
My current theory is that http.sys is optimized for the typical IIS use case: lots of concurrent small requests, and lots of concurrent and possibly large responses. I suspect that achieving this optimization came at the expense of what I'm trying to accomplish, which is very high throughput on a single very large request with a very small response.
For what it's worth, I also tried up to 32 concurrent HTTP PUT operations to see if it could scale out, but there was still no joy; about 200MB/s.
Interestingly, on my development workstation, which is a quad-core Xeon Precision T7400 running 64-bit Windows 7, my TcpClient implementation is about 200MB/s, and the HTTP version is also about 200MB/s. Once I take it to a higher-end server-class machine running Server 2008 R2, the TcpClient code gets up to 450MB/s, while the HTTP.sys code stays around 200.
At this point I've sadly concluded that HTTP.sys is not the right tool for the job I need done, and will have to continue to use the hand-rolled socket protocol we've been using all along.
I can't see too much of interest except for this Tech Note. It might be worth having a fiddle with MaxBytesPerSend.
If you're going to send files over the LAN then UDP is the way to go, because TCP's overhead is wasted in that case. TCP provides rate limiting to avoid losing too many packets, whereas with UDP the application has to sort that out by itself. NFS would do the job, were it not that you're stuck with Windows; but I'm sure there must be ready-made UDP-based tools. Also use the tool "iperf" (available on Linux, probably also Windows) to benchmark the network link irrespective of the protocol. Some network cards are plain bad and lean on the CPU too much, which will limit your speed to around 200 Mbit; you want a proper network card that does its own processing (an offload engine, roughly speaking).
I'm thinking of the methods games like Counter-Strike, WoW, etc. use. In CS you often have a ping of around 50 ms; is there any way to send information to an online MySQL database at that speed?
Currently I'm using an online PHP script which my program requests, but this is really slow, because the program first has to send headers and POST data to it, and then retrieve the result as an ordinary web page.
There really has to be an easier, faster way of doing this? I've heard about TCP/IP; is this what I should use here? Is it possible to connect to the database in a faster way than indirectly via the PHP script?
The TCP/IP suite includes, among many others, three protocols worth knowing about here:
TCP
UDP
ICMP
ICMP is what you are using when you ping another computer on a network.
Games like Counter-Strike don't care about what you previously did, so there's no requirement for completeness, no need to be able to reconstruct what you did (which is why competitors have to record their play). This is what UDP is used for - there's no guarantee that data is delivered or received. Which is why lag can be such a problem: you're already dead, you just didn't know it.
TCP guarantees that data is sent and received, but it's slower than UDP.
There are numerous things to be aware of to keep a connection fast - fewer hops, etc.
Client-to-server for latency-critical stuff? Use non-blocking UDP.
For reliable stuff that can be a little slower, if you use TCP make sure you do so in a non-blocking fashion (select(), non-blocking send, etc.).
The big reason to use UDP is if you have time-sensitive data - if the position of a critter gets dropped, you're better off ignoring it and sending the next position packet rather than re-sending the last one.
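A minimal sketch of that fire-and-forget style in C#; the packet layout (entity id, x, y, sequence number) and the endpoint are made up for illustration. If a datagram is lost, the next one simply supersedes it.

```csharp
using System;
using System.Net.Sockets;

class PositionSender
{
    readonly UdpClient _udp = new UdpClient();
    ushort _sequence;

    public PositionSender(string host, int port)
    {
        _udp.Connect(host, port);   // placeholder game-server endpoint
    }

    // Send the latest position; no retries, no acks - stale data is worthless anyway.
    public void SendPosition(int entityId, float x, float y)
    {
        byte[] packet = new byte[14];
        BitConverter.GetBytes(entityId).CopyTo(packet, 0);
        BitConverter.GetBytes(x).CopyTo(packet, 4);
        BitConverter.GetBytes(y).CopyTo(packet, 8);
        BitConverter.GetBytes(_sequence++).CopyTo(packet, 12);
        _udp.Send(packet, packet.Length);
    }
}
```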
And I don't think any high-performance game has each and every call resolve to a database call. It's more common (if a database is even used at all) to persist data occasionally, or at important events.
You're not going to implement Counter-Strike or anything similar on top of HTTP.
Most games like the ones you cite use UDP for this (one of the TCP/IP suite of protocols). UDP is chosen over TCP for this purpose since it's lighter weight, which allows for better performance, and TCP's reliability features aren't necessary.
Keep in mind, though, that those games have standalone clients and servers, usually written in C or C++. If your application is browser-based and you're trying to do this over HTTP, then use a long-lived connection and strip the headers back as much as possible, including cookies. The Tornado framework may be of interest to you there. You may also want to look into HTML5 WebSockets, although widespread support is still a fair way off.
If you are targeting a browser-based plugin like Flash, Java, SilverLight then you may be able to use UDP but I don't know enough about those platforms to confirm.
Edit:
Also worth mentioning: once your networking code and protocol are sufficiently optimized, there are still things you can do to improve the experience for players with high pings.
I have a .NET 2.0 server that seems to be running into scaling problems, probably due to poor design of the socket-handling code, and I am looking for guidance on how I might redesign it to improve performance.
Usage scenario: 50 - 150 clients, high rate (up to 100s / second) of small messages (10s of bytes each) to / from each client. Client connections are long-lived - typically hours. (The server is part of a trading system. The client messages are aggregated into groups to send to an exchange over a smaller number of 'outbound' socket connections, and acknowledgment messages are sent back to the clients as each group is processed by the exchange.) OS is Windows Server 2003, hardware is 2 x 4-core X5355.
Current client socket design: A TcpListener spawns a thread to read each client socket as clients connect. The threads block on Socket.Receive, parsing incoming messages and inserting them into a set of queues for processing by the core server logic. Acknowledgment messages are sent back out over the client sockets using async Socket.BeginSend calls from the threads that talk to the exchange side.
Observed problems: As the client count has grown (now 60-70), we have started to see intermittent delays of up to 100s of milliseconds while sending and receiving data to/from the clients. (We log timestamps for each acknowledgment message, and we can see occasional long gaps in the timestamp sequence for bunches of acks from the same group that normally go out in a few ms total.)
Overall system CPU usage is low (< 10%), there is plenty of free RAM, and the core logic and the outbound (exchange-facing) side are performing fine, so the problem seems to be isolated to the client-facing socket code. There is ample network bandwidth between the server and clients (gigabit LAN), and we have ruled out network or hardware-layer problems.
Any suggestions or pointers to useful resources would be greatly appreciated. If anyone has any diagnostic or debugging tips for figuring out exactly what is going wrong, those would be great as well.
Note: I have the MSDN Magazine article Winsock: Get Closer to the Wire with High-Performance Sockets in .NET, and I have glanced at the Kodart "XF.Server" component - it looks sketchy at best.
Socket I/O performance has improved in .NET 3.5. You can use ReceiveAsync/SendAsync instead of BeginReceive/BeginSend for better performance. Check this out:
http://msdn.microsoft.com/en-us/library/bb968780.aspx
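A minimal sketch of the SocketAsyncEventArgs pattern those methods use; the key point is that the args object and its buffer are allocated once per connection and reused, which is where most of the gain over Begin/End comes from. Names here are illustrative.

```csharp
using System;
using System.Net.Sockets;

class AsyncReceiver
{
    readonly Socket _socket;
    readonly SocketAsyncEventArgs _args = new SocketAsyncEventArgs();

    public AsyncReceiver(Socket connectedSocket)
    {
        _socket = connectedSocket;
        _args.SetBuffer(new byte[8 * 1024], 0, 8 * 1024);  // reused for every receive
        _args.Completed += (s, e) => ProcessReceive(e);
    }

    public void Start()
    {
        // ReceiveAsync returns false if the operation completed synchronously,
        // in which case the Completed event will NOT fire and we handle it inline.
        if (!_socket.ReceiveAsync(_args))
            ProcessReceive(_args);
    }

    void ProcessReceive(SocketAsyncEventArgs e)
    {
        if (e.SocketError != SocketError.Success || e.BytesTransferred == 0)
        {
            _socket.Close();
            return;
        }
        // ... parse e.Buffer (first e.BytesTransferred bytes) and enqueue the message ...
        Start();  // post the next receive with the same args object
    }
}
```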
A lot of this has to do with many threads running on your system and the kernel giving each of them a time slice. The design is simple, but does not scale well.
You should probably look at using Socket.BeginReceive, which will execute on the .NET thread pool (you can specify the number of threads it uses), and then push the parsed messages onto a queue from the asynchronous callback (which can run on any of the .NET threads). This should give you much higher performance.
A thread per client seems massively overkill, especially given the low overall CPU usage here. Normally you would want a small pool of threads to service all clients, using BeginReceive to wait for work asynchronously and then simply dispatching the processing to one of the workers (perhaps by adding the work to a synchronized queue on which all the workers are waiting).
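If it helps, a rough sketch of that dispatch shape, assuming messages arrive as byte arrays and a fixed set of worker threads drains a shared queue. All names here are illustrative, not taken from the original server.

```csharp
using System.Collections.Generic;
using System.Threading;

class MessageDispatcher
{
    readonly Queue<byte[]> _queue = new Queue<byte[]>();
    readonly object _lock = new object();

    public MessageDispatcher(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
        {
            Thread worker = new Thread(Worker);
            worker.IsBackground = true;
            worker.Start();
        }
    }

    // Called from the BeginReceive callback: enqueue the message and wake one worker.
    public void Enqueue(byte[] message)
    {
        lock (_lock)
        {
            _queue.Enqueue(message);
            Monitor.Pulse(_lock);
        }
    }

    void Worker()
    {
        while (true)
        {
            byte[] message;
            lock (_lock)
            {
                while (_queue.Count == 0)
                    Monitor.Wait(_lock);
                message = _queue.Dequeue();
            }
            // ... core server logic processes 'message' here ...
        }
    }
}
```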
I am not a C# guy by any stretch, but for high-performance socket servers the most scalable solution is to use I/O Completion Ports with a number of active threads appropriate for the CPU(s) the process is running on, rather than the one-thread-per-connection model.
In your case, with an 8-core machine you would want 16 total threads with 8 running concurrently. (The other 8 are basically held in reserve.)
The Socket.BeginConnect and Socket.BeginAccept are definitely useful. I believe they use the ConnectEx and AcceptEx calls in their implementation. These calls wrap the initial connection negotiation and data transfer into one user/kernel transition. Since the initial send/receive buffer is already ready, the kernel can just send it off, either to the remote host or to userspace.
They also keep a queue of listeners/connectors ready, which probably gives a bit of a boost by avoiding the latency involved in userspace accepting/receiving a connection and handing it off (and all the user/kernel switching).
To use BeginConnect with a buffer it appears that you have to write the initial data to the socket before connecting.
As others have suggested, the best way to implement this would be to make the client-facing code all asynchronous. Use BeginAcceptTcpClient() on the TcpListener so that you don't have to manually spawn a thread. Then use BeginRead()/BeginWrite() on the underlying NetworkStream that you get from the accepted TcpClient.
However, there is one thing I don't understand here. You said that these are long-lived connections and a large number of clients. Assuming the system has reached a steady state where you have your maximum number of clients (say 70) connected, you have 70 threads listening for client packets, and the system should still be responsive, unless your application has memory/handle leaks and you are running out of resources so that your server is paging. I would put a timer around the call to Accept() where you kick off a client thread and see how much time that takes. I would also start Task Manager and PerfMon and monitor "Non Paged Pool", "Virtual Memory", and "Handle Count" for the app to see whether it is in a resource crunch.
While it is true that going async is the right way to go, I am not convinced it will really solve the underlying problem. I would monitor the app as I suggested and make sure there are no intrinsic memory or handle leaks. In this regard, "BigBlackMan" above was right: you need more instrumentation to proceed. Don't know why he was downvoted.
Random intermittent ~250msec delays might be due to the Nagle algorithm used by TCP. Try disabling that and see what happens.
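Disabling Nagle is a one-line change per client-facing socket; a minimal sketch follows (whether it actually helps depends on whether the delays really come from Nagle/delayed-ACK interaction, which is what you'd be testing).

```csharp
using System.Net.Sockets;

static class NagleTweak
{
    public static void DisableNagle(Socket clientSocket)
    {
        // Send small messages immediately instead of waiting to coalesce them.
        clientSocket.NoDelay = true;
        // Equivalent, via the socket-option API:
        // clientSocket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
    }
}
```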
One thing I would want to rule out is something as simple as the garbage collector running. If all your messages are on the heap, you are generating 10,000 objects a second.
Take a read of Garbage Collection every 100 seconds
The only solution is to keep your messages off the heap.
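One common way to approach that (a sketch of the general idea; the pool class here is made up) is to reuse message buffers so that steady-state traffic allocates nothing new and the GC has little to collect.

```csharp
using System.Collections.Generic;

// Minimal buffer pool: rent a byte[] for each incoming message, return it
// once the message has been processed, so steady-state traffic allocates nothing new.
class BufferPool
{
    readonly Stack<byte[]> _free = new Stack<byte[]>();
    readonly int _bufferSize;

    public BufferPool(int bufferSize, int preallocate)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < preallocate; i++)
            _free.Push(new byte[bufferSize]);
    }

    public byte[] Rent()
    {
        lock (_free)
            return _free.Count > 0 ? _free.Pop() : new byte[_bufferSize];
    }

    public void Return(byte[] buffer)
    {
        lock (_free)
            _free.Push(buffer);
    }
}
```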
I had the same issue 7 or 8 years ago, with 100 ms to 1 s pauses; the problem was garbage collection. We had about 400 MB in use out of 4 GB, BUT there were a lot of objects.
I ended up storing the messages in C++, but you could use the ASP.NET cache (which used to use COM and so moved them out of the managed heap).
I don't have an answer, but to get more information I'd suggest sprinkling your code with timers and logging the average and max time taken for suspect operations like adding to the queue or opening a socket.
At least that way you will have an idea of what to look at and where to begin.
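Something as simple as this Stopwatch-based helper would do (a sketch; the names are made up): wrap each suspect operation and periodically dump the average and maximum.

```csharp
using System;
using System.Diagnostics;

// Illustrative helper: time an operation and keep a running average and maximum.
class OpTimer
{
    readonly string _name;
    long _count, _totalMs, _maxMs;

    public OpTimer(string name) { _name = name; }

    public void Measure(Action operation)
    {
        Stopwatch sw = Stopwatch.StartNew();
        operation();
        sw.Stop();

        long ms = sw.ElapsedMilliseconds;
        _count++;
        _totalMs += ms;
        if (ms > _maxMs) _maxMs = ms;
    }

    public void Report()
    {
        if (_count == 0) return;
        Console.WriteLine("{0}: {1} calls, avg {2:F1} ms, max {3} ms",
                          _name, _count, (double)_totalMs / _count, _maxMs);
    }
}

// Usage (hypothetical): var t = new OpTimer("enqueue"); t.Measure(() => queue.Enqueue(msg));
```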