C# Socket: is multiple sending less efficient than a single send?

I am writing a high-throughput server serving thousands of connections. Suppose I have 400 bytes to send via a socket. Suppose I do it in two ways:
Call Socket.Send() 40 times, each time sending 10 bytes.
Call Socket.Send() once, sending 400 bytes.
Do these two ways make much difference in terms of speed, CPU load, etc?

If Socket.NoDelay is left at false, then it will very rarely make any difference - most of the time you're just buffering locally, albeit with a bit more P/Invoke overhead than is strictly necessary (due to the extra calls through the socket layer). Note that Socket.NoDelay should usually be set to true in anything where latency matters.
If Socket.NoDelay is true, and everything is working maximally, then you might introduce additional packet fragmentation by using 40 sends of 10 bytes, which would be avoided by using one send of 400 bytes. However, in many cases the various abstractions and layers in the OS/hardware stacks mean that a lot of the 10-byte chunks will probably end up sharing packets anyway. That's still a lot more packets than 1 in the optimal case, though.
Note also that this is always a trade-off: packet fragmentation will decrease overall throughput, but sending the first bytes sooner could reduce the perceived latency, if the other 390 bytes are going to take a measurable (but presumably small) amount of time to construct.
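For illustration, here is a minimal sketch of the single-send case, assuming a connected Socket named socket and a chunks collection holding the 40 ten-byte arrays (both names are hypothetical):
socket.NoDelay = true;                       // disable Nagle's algorithm when latency matters
var buffer = new byte[400];
int offset = 0;
foreach (byte[] chunk in chunks)             // 'chunks': the 40 ten-byte arrays (assumed)
{
    Buffer.BlockCopy(chunk, 0, buffer, offset, chunk.Length);
    offset += chunk.Length;
}
socket.Send(buffer, 0, offset, SocketFlags.None);   // one call, one (or very few) packets on the wire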
In most cases: this is unlikely to be a bottleneck. If you can avoid packet fragmentation without causing latency, that may be desirable. If it was me, I'd probably be more concerned with efficient buffer management to maximise scalability while avoiding pauses due to GC; tools like the new "pipelines" IO API can really help with that, and Kestrel can be used to host a TCP server based on "pipelines" in a lot less code than you would be using if you wrote your own socket listener - and it then deals with all the buffer management for you.
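As a rough sketch of the "pipelines" approach (this assumes the System.IO.Pipelines package, a connected Socket named socket and the Memory-based ReceiveAsync overload available in newer runtimes; ProcessBuffer is a hypothetical handler):
var pipe = new Pipe();

// Producer: rent memory from the pipe, fill it straight from the socket, then flush.
Memory<byte> memory = pipe.Writer.GetMemory(512);
int bytesRead = await socket.ReceiveAsync(memory, SocketFlags.None);
pipe.Writer.Advance(bytesRead);
await pipe.Writer.FlushAsync();

// Consumer: read whatever is buffered, hand it off, then mark it consumed.
ReadResult result = await pipe.Reader.ReadAsync();
ProcessBuffer(result.Buffer);                 // ProcessBuffer is a hypothetical handler
pipe.Reader.AdvanceTo(result.Buffer.End);
The pipe owns the buffers, so your code never allocates per read and never has to deal with pinning directly.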

Related

TcpClient performance - sending 4 scalar values much slower than sending 1 byte array containing all values

I'm writing a system in which two applications (say server and client) communicate via a TCP connection on localhost.
The code is fairly performance critical, so I'm trying to optimize as best as possible.
The code below is from the server application. To send messages, my naive approach was to create a BinaryWriter from the TcpClient's stream, and write each value of the message via the BinaryWriter.
So let's say the message consists of 4 values: a long, followed by a boolean value, and then 2 more longs. The naive approach was:
TcpClient client = ...;
var writer = new BinaryWriter(client.GetStream());
// The following takes ca. 0.55ms:
writer.Write((long)123);
writer.Write(true);
writer.Write((long)456);
writer.Write((long)2);
With 0.55ms execution time, this strikes me as fairly slow.
Then, I've tried the following instead:
TcpClient client = ...;
// The following takes ca. 0.15ms:
var b1 = BitConverter.GetBytes((long)123);
var b2 = BitConverter.GetBytes(true);
var b3 = BitConverter.GetBytes((long)456);
var b4 = BitConverter.GetBytes((long)2);
var result = new byte[b1.Length + b2.Length + b3.Length + b4.Length];
Array.Copy(b1, 0, result, 0, b1.Length);
Array.Copy(b2, 0, result, b1.Length, b2.Length);
Array.Copy(b3, 0, result, b1.Length + b2.Length, b3.Length);
Array.Copy(b4, 0, result, b1.Length + b2.Length + b3.Length, b4.Length);
client.GetStream().Write(result, 0, result.Length);
The latter runs in ca. 0.15 ms, while the first approach takes roughly 0.55 ms, so the first is 3-4 times slower.
I'm wondering ... why?
And more importantly, what would be the best way to write messages as fast as possible (while maintaining at least a minimum of code readability)?
The only way I could think of right now is to create a custom class similar to BinaryWriter,
but instead of writing each value directly to the stream, it would buffer a certain amount of data (say 10,000 bytes or so) and only send it to the stream when its internal buffer is full, or when some .Flush() method is explicitly called (e.g. when a message is done being written).
This should work, but I wonder if I'm overcomplicating things and there's an even simpler way to achieve good performance?
And if this was indeed the best way - any suggestions on how big the internal buffer should ideally be? Does it make sense to align this with Winsock's send and receive buffers, or is it best to make it as big as possible (or rather, as big as is sensible given memory constraints)?
Thanks!
The first code does four blocking network-IO operations, while the second one does only one. Most types of IO operations incur quite heavy overhead, so you generally want to avoid small writes/reads and batch things up.
You should always serialize your data and, if possible, batch it into a single message. This way you avoid as much IO overhead as possible.
Probably the question is more about Interprocess Communication (IPC) than about the TCP protocol. There are multiple options for IPC (see the Interprocess Communications page on Microsoft Dev Center). First you need to define your system requirements (how the system should perform/scale), then you need to choose the simplest option that works best in your particular scenario, using performance metrics.
Relevant excerpt from Performance Culture article by Joe Duffy:
Decent engineers intuit. Good engineers measure. Great engineers do both.
Measure what, though?
I put metrics into two distinct categories:
Consumption metrics. These directly measure the resources consumed by running a test.
Observational metrics. These measure the outcome of running a test, observationally, using metrics “outside” of the system.
Examples of consumption metrics are hardware performance counters, such as instructions retired, data cache misses, instruction cache misses, TLB misses, and/or context switches. Software performance counters are also good candidates, like number of I/Os, memory allocated (and collected), interrupts, and/or number of syscalls. Examples of observational metrics include elapsed time and cost of running the test as billed by your cloud provider. Both are clearly important for different reasons.
As for TCP, I don't see the point of writing data in small pieces when you can write it all at once. You can use BufferedStream to decorate the TcpClient stream instance and use the same BinaryWriter with it. Just make sure you don't mix reads and writes in a way that forces BufferedStream to seek in the underlying stream, because NetworkStream does not support seeking. See the Is it better to send 1 large chunk or lots of small ones when using TCP? and Why would BufferedStream.Write throw "This stream does not support seek operations"? discussions on StackOverflow.
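A minimal sketch of that approach, reusing the TcpClient from the question (the 8 KB buffer size is an arbitrary assumption):
var stream = client.GetStream();
var buffered = new BufferedStream(stream, 8192);   // buffers the small writes in memory
var writer = new BinaryWriter(buffered);
writer.Write((long)123);
writer.Write(true);
writer.Write((long)456);
writer.Write((long)2);
writer.Flush();   // pushes the whole message to the NetworkStream in one write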
For more information check Example of Named Pipes, C# Sockets vs Pipes, IPC Mechanisms in C# - Usage and Best Practices, When to use .NET BufferedStream class? and When is optimisation premature? discussions on StackOverflow.

socket buffer size: pros and cons of bigger vs smaller

I've never really worked with COM sockets before, and now have some code that is listening to a rather steady stream of data (500Kb/s - 2000Kb/s).
I've experimented with different sizes, but am not really sure what I'm conceptually doing.
byte[] m_SocketBuffer = new byte[4048];
//vs
byte[] m_SocketBuffer = new byte[8096];
The socket I'm using is System.Net.Sockets.Socket, and this is my constructor:
new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
My questions are:
Is there a general trade-off for large vs. small buffers?
How does one go about sizing the buffer? What should you use as a gauge?
I'm retrieving the data like this:
string socketData = Encoding.ASCII.GetString(m_SocketBuffer, 0, iReceivedBytes);
while (socketData.Length > 0)
{ //do stuff }
Does my reading event happen when the buffer is full? Like whenever the socket buffer hits the threshold, that's when I can read from it?
Short version:
Optimal buffer size is dependent on a lot of factors, including the underlying network transport and how your own code handles the I/O.
Tens of KB is probably about right for a high-volume server moving a lot of data. But you can use much smaller buffers if you know that remote endpoints won't ever be sending you a lot of data all at once.
Buffers are pinned for the duration of the I/O operation, which can cause or exacerbate heap fragmentation. For a really high-volume server, it might make sense to allocate very large buffers (larger than 85,000 bytes) so that the buffer is allocated from the large-object heap (which either has no fragmentation issues, or is in a perpetual state of fragmentation, depending on how you look at it :) ), and then use just a portion of each large buffer for any given I/O operation.
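As a rough sketch of that large-buffer idea (assuming a connected Socket named socket; the sizes and the pool shape are illustrative assumptions, not a definitive implementation):
// One array above the 85,000-byte threshold lands on the large-object heap; slices of it
// are handed out per I/O operation, so only this single array ever gets pinned.
const int SegmentSize = 4096;
const int SegmentCount = 64;
byte[] bigBuffer = new byte[SegmentSize * SegmentCount];     // 256 KB, allocated once
var freeSegments = new Stack<ArraySegment<byte>>();          // System.Collections.Generic
for (int i = 0; i < SegmentCount; i++)
    freeSegments.Push(new ArraySegment<byte>(bigBuffer, i * SegmentSize, SegmentSize));

// Per receive: borrow a slice, use it, return it once the operation has completed.
ArraySegment<byte> segment = freeSegments.Pop();
int received = socket.Receive(segment.Array, segment.Offset, segment.Count, SocketFlags.None);
freeSegments.Push(segment);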
Re: your specific questions:
Is there a general trade-off for large vs. small buffers?
Probably the most obvious is the usual: a buffer larger than you will ever actually need is just wasting space.
Making buffers too small forces more I/O operations, possibly forcing more thread context switches (depending on how you are doing I/O), and for sure increasing the number of program statements that have to be executed.
There are other trade-offs of course, but to go into each and every one of them would be far too broad a discussion for this forum.
How does one go about sizing the buffer? What should you use as a gauge?
I'd start with a size that seems "reasonable", and then experiment from there. Adjust the buffer size in various load testing scenarios, increasing and decreasing, to see what if any effect there is on performance.
Does my reading event happen when the buffer is full? Like whenever the socket buffer hits the threshold, that's when I can read from it?
When you read from the socket, the network layer will put as much data into your buffer as it can. If there is more data available than will fit, it fills the buffer. If there is less data available than will fit, the operation completes without filling the buffer, but only once at least one byte has been placed into it; the only time a read operation completes with zero bytes is when the connection is being shut down.
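In code, a typical receive loop built on that behaviour might look like this (a sketch, assuming a connected Socket named socket; the buffer size and the ASCII decoding just mirror the question):
byte[] m_SocketBuffer = new byte[8192];
int iReceivedBytes;
// Receive blocks until at least one byte is available; 0 means the peer has shut down.
while ((iReceivedBytes = socket.Receive(m_SocketBuffer, 0, m_SocketBuffer.Length, SocketFlags.None)) > 0)
{
    string socketData = Encoding.ASCII.GetString(m_SocketBuffer, 0, iReceivedBytes);
    // process 'socketData' here - it may hold a partial message or several messages at once
}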

SocketAsyncEventArgs vs TcpListener/TcpClient [duplicate]

Is there a valid reason not to use TcpListener, instead of SocketAsyncEventArgs, for implementing a high-performance/high-throughput TCP server?
I've already implemented this high-performance/high-throughput TCP server using SocketAsyncEventArgs and went through all sorts of headaches handling the pinned buffers, using a big pre-allocated byte array and pools of SocketAsyncEventArgs for accepting and receiving, putting it together with some low-level code and some TPL Dataflow and Rx, and it works perfectly; almost textbook, in this endeavour - actually I've learnt more than 80% of this stuff from other people's code.
However there are some problems and concerns:
Complexity: I cannot delegate any sort of modification to this server to another member of the team. That binds me to this kind of task and I cannot pay enough attention to other parts of other projects.
Memory usage (pinned byte arrays): With SocketAsyncEventArgs the pools need to be pre-allocated, so for handling 100,000 concurrent connections (the worst case, even across different ports) a big pile of RAM uselessly sits there, pre-allocated (even if those conditions are only met occasionally, the server should be able to handle 1 or 2 such peaks every day).
TcpListener actually works well: I actually put TcpListener to the test (with some tricks, like using AcceptTcpClient on a dedicated thread rather than the async version, then sending the accepted connections to a ConcurrentQueue instead of creating Tasks in place, and the like), and with the latest version of .NET it worked very well - almost as well as SocketAsyncEventArgs, with no data loss and a low memory footprint, which helps avoid wasting too much RAM on the server, and no pre-allocation is needed.
So why do I not see TcpListener being used anywhere, and why is everybody (including myself) using SocketAsyncEventArgs? Am I missing something?
I see no evidence that this question is about TcpListener at all. It seems you are only concerned with the code that deals with a connection that already has been accepted. Such a connection is independent of the listener.
SocketAsyncEventArgs is a CPU-load optimization. I'm convinced you can achieve a higher rate of operations per second with it. How significant is the difference to normal APM/TAP async IO? Certainly less than an order of magnitude. Probably between 1.2x and 3x. Last time I benchmarked loopback TCP transaction rate I found that the kernel took about half of the CPU usage. That means your app can get at most 2x faster by being infinitely optimized.
Remember that SocketAsyncEventArgs was added to the BCL around 2007 (.NET 3.5), when CPUs were far less capable.
Use SocketAsyncEventArgs only when you have evidence that you need it. It causes you to be far less productive. More potential for bugs.
Here's the template that your socket processing loop should look like:
while (ConnectionEstablished())
{
    var someData = await ReadFromSocketAsync(socket);
    await ProcessDataAsync(someData);
}
Very simple code. No callbacks thanks to await.
In case you are concerned about managed heap fragmentation: allocate a new byte[1024 * 1024] on startup. When you want to read from a socket, read a single byte into some free portion of this buffer. When that single-byte read completes, ask how many bytes are actually there (Socket.Available) and synchronously pull the rest. That way you only pin a single, rather small buffer and can still use async IO to wait for data to arrive.
This technique does not require polling. Since Socket.Available can only increase until you read from the socket, we do not risk accidentally performing a read that is too small.
Alternatively, you can combat managed heap fragmentation by allocating a few very big buffers and handing out chunks.
Or, if you don't find this to be a problem, you don't need to do anything.
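A hedged sketch of that single-byte-read trick (the slot offset/size, the socket variable and the task-returning ReceiveAsync overload are assumptions about your setup):
byte[] sharedBuffer = new byte[1024 * 1024];     // allocated once at startup
int slotOffset = 0, slotSize = 4096;             // this connection's slice of the buffer (assumed layout)

// Await a single-byte read so only a tiny region needs to stay pinned while waiting.
int got = await socket.ReceiveAsync(new ArraySegment<byte>(sharedBuffer, slotOffset, 1), SocketFlags.None);
if (got == 1)
{
    // Data has arrived; synchronously pull whatever else is already buffered.
    int available = Math.Min(socket.Available, slotSize - 1);
    if (available > 0)
        got += socket.Receive(sharedBuffer, slotOffset + 1, available, SocketFlags.None);
    // 'got' bytes now sit at sharedBuffer[slotOffset .. slotOffset + got]
}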

Improve C# Socket performance

We have a TCP Async socket game server written in C# serving many concurrent connections. Everything else works fine except for the problems below:
Frequent disconnections for some users (Not all mind you)
Degraded performance for users on slow connections
Most of the users who face this problem are using GSM connections (portable USB dongles based on CDMA) and often the signal strength is poor. So basically they have rather poor bandwidth. Perhaps under 1KB per sec.
My question: What can we do to make the connections more stable and improve performance even for the slower connections?
I am thinking of dynamic TCP buffers on the client and server side, but I'm not really sure about the performance degradation from the overhead of doing this dynamically for each connection, or whether my direction is even correct.
Max data packet size is under 1 KB.
Current TCP buffer size on server and client is 16KB
Any pointers or references on how to write stable async socket code in C# for poor or slow connections would be a great help. Thanks in advance.
"Performance" is a very relative term. It looks like your main concern is data transfer rates. Unfortunately you can't do much about that given low-bandwidth connections - maybe data compression can help, but the actual effect depends on your data, and there's always a trade-off between transfer-rate improvement and compression/decompression delays. There's also latency to consider for any sort of interactive game.
As @Pierre suggested in the comments, you might consider UDP for the transport, but that only works if you can tolerate packet loss and reordering, and that again depends on the data and what you do with it.
Another approach I would suggest investigating is to provide two different quality-of-service modes. Clients on good links can use full functionality (say, full-resolution game images), while low-bandwidth clients would get reduced options (say, much smaller size low-quality images). You can measure initial round-trip times on client connect/handshake to select the mode.
Hope this helps a bit.

Fastest form of downloading using sockets

Hi
I have a TCP/IP client-server application. I want to send a large serialized object, around 1 MB, through sockets.
Is it possible to get better performance by splitting the byte array into, for example, 10 chunks, opening a socket for each and sending them asynchronously, compared to opening one socket and sending all the data through it?
Thanks
Splitting the data into pieces smaller than the MTU will introduce more overhead, as there will be more packets - this will actually slow things down. What you are proposing is already being done as part of the protocol, i.e. splitting and re-assembling. I would experiment with sending less data, e.g. via compression.
No, this doesn't speed up the transfer under normal conditions; it only adds overhead. It would only help if you have a slow network segment which is otherwise quite busy and the traffic is shaped per TCP connection.
Make sure that your sockets code is efficient, because wrong buffer (and therefore packet) sizes, synchronous operation and other issues may slow the transfer down.
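For what it's worth, a minimal sketch of sending the whole ~1 MB over one socket asynchronously and letting TCP do the packet-level splitting (the payload, host and port below are placeholders):
byte[] serialized = new byte[1024 * 1024];                        // stand-in for the ~1 MB serialized object
using (var client = new TcpClient())
{
    await client.ConnectAsync("server.example", 9000);            // placeholder host and port
    NetworkStream stream = client.GetStream();
    await stream.WriteAsync(serialized, 0, serialized.Length);    // one socket; TCP handles the segmentation
}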
