We have a production system that gathers telemetry data from remote devices. The data is sent at a reasonable frequency, and at peak times we end up receiving up to thousands of messages a second. The payload is circa 80 bytes per message.
I am starting to do some performance testing of various storage mechanisms, but I thought first of all I would see how fast I could push UDP without any data storage involved. I am getting approximately 70,000 messages a second maximum throughput testing on my local machine (it seems to be around the same if I use another machine to send the test data). From my rough calculations, this is way lower than I expected given the network link capacity. The sender sits in a tight loop sending data.
I am fully aware of all the issues with UDP re: lost packets, etc. I just want to get an idea of our system's weak points.
Is the throughput so low because of the small packet size?
Matt
private IPEndPoint _receiveEndpoint = new IPEndPoint(IPAddress.Any, _receivePort);
private Stopwatch sw = new Stopwatch();
private int _recievedCount = 0;
private long _lastCount = 0;
private Thread _receiverThread;
private bool _running = true;
_clientReceive = new UdpClient();
_clientReceive.Client.Bind(_receiveEndpoint);
_receiverThread = new Thread(DoReceive);
_receiverThread.Start();
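// Body of DoReceive, running on _receiverThread: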
while (_running)
{
Byte[] receiveBytes = _clientReceive.Receive(ref _receiveEndpoint);
if (!sw.IsRunning)
sw.Start();
string receiveString = Encoding.ASCII.GetString(receiveBytes);
_recievedCount++;
long howLong = sw.ElapsedMilliseconds;
if (howLong/1000 > _lastCount)
{
_lastCount = howLong/1000;
Invoke(new MethodInvoker(() => { Text = _recievedCount + " iterations in " + sw.ElapsedMilliseconds + " msecs"; }));
}
}
Lots of small UDP packets are definitely going to result in lower network throughput than you'd get with larger packets. However, did you include the IP and UDP header sizes in your calculations?
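As a rough illustration, assuming standard Ethernet framing and no jumbo frames (swap in whatever link speed you actually have), the per-packet overhead looks like this:
// Back-of-the-envelope wire cost of one 80-byte UDP payload on Ethernet.
// The framing constants are standard sizes, not measured from your network.
const int payload = 80;
const int udpHeader = 8, ipHeader = 20;
const int ethernetOverhead = 14 + 4 + 8 + 12;    // header + FCS + preamble + inter-frame gap
int bitsOnWire = (payload + udpHeader + ipHeader + ethernetOverhead) * 8;   // 1168 bits
double ppsAt1Gbit = 1e9 / bitsOnWire;            // roughly 856,000 packets/s theoretical ceiling
double ppsAt100Mbit = 1e8 / bitsOnWire;          // roughly 85,600 packets/s theoretical ceiling
Console.WriteLine($"{ppsAt1Gbit:N0} pkt/s at 1 Gbit, {ppsAt100Mbit:N0} pkt/s at 100 Mbit");
So on a gigabit link 70,000 packets/s is nowhere near the wire ceiling, which points at per-datagram CPU and syscall cost on the receive path rather than raw bandwidth; on 100 Mbit you would already be close to line rate.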
Apart from that, 70k messages/second is very, very high and definitely not something you'd want happening across the internet if that is where the app is eventually going to be deployed. Even thousands of messages/second is high, and if it were me I'd be looking to make the communications from the telemetry equipment less chatty, perhaps by bundling multiple readings into a single transmission.
If that's not an option and you are on a private network and need to increase the network throughput, you may have to start looking at your network card, its driver, and then fine-tuning some Windows networking parameters. But whatever you do with the messages, you are almost certainly going to bottleneck on whatever processing you do on them, especially if it involves disk, way before you get to 70k messages/second (I'd be surprised if you can even get to 10k/second once you're doing anything useful with them).
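One thing that is easy to try from managed code before you start on drivers and registry settings is enlarging the socket receive buffer, so that bursts get queued by the kernel instead of dropped. The 4 MB figure below is just an illustrative value, applied to the _clientReceive field from your snippet:
// Ask the OS for a larger receive buffer (4 MB here is an arbitrary example value).
_clientReceive.Client.ReceiveBufferSize = 4 * 1024 * 1024;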
Yes, you should do measurements with various payload sizes and see how much throughput you get. For small payloads, the UDP/IP/Ethernet header overhead may be what is reducing your throughput.
Also see the following article on SO: Having trouble achieving 1Gbit UDP throughput
I'm trying to send many small (~350-byte) UDP messages to different remote hosts, one packet for each remote host. I'm using a background thread to listen for responses:
private void ReceivePackets()
{
while (true)
{
try
{
receiveFrom = new IPEndPoint(IPAddress.Any, localPort);
byte[] data = udpClientReceive.Receive(ref receiveFrom);
// Decode message & short calculation
}
catch (Exception ex)
{
Log("Error receiving data: " + ex.ToString());
}
}
}
and a main thread for sending messages using
udpClientSend.SendAsync(send_buffer, send_buffer.Length, destinationIPEP);
Both UdpClient udpClientReceive and UdpClient udpClientSend are bound to the same port.
The problem is SendAsync() takes around 15 ms to complete and I need to send a few thousand packets per second. What I already tried is using udpClientSend.Send(send_buffer, send_buffer.Length, destination); which is just as slow. I also set both receive/send buffers higher, and I tried setting udpClientSend.Client.SendTimeout = 1; which has no effect. I suspect it might have to do with the remote host changing for every single packet? If that is the case, will using many UdpClients in separate threads make things faster?
Thanks for any help!
Notes:
Network bandwidth is not the problem and I need to use UDP not TCP.
I've seen similar questions on this website but none have a satisfying answer.
Edit
There is only one thread for sending, it runs a simple loop in which udpClientSend.SendAsync() is called.
I'm querying nodes in the DHT (the BitTorrent distributed hash table), so multicasting is not an option (?); every host only gets one packet.
Exchanging the UdpClient class for the Socket class and using its asynchronous SendTo variant does not speed things up (or only insignificantly).
I have narrowed down the problem: Changing the remote host address to some fixed IP & port increases throughput to over 3000 packets/s. Thus changing the destination address too often seems to be the bottleneck.
I'm thinking my problem might be related to UDP "Connect" speed in C#, and that UdpClient.Connect() is slowing down the code. If so, is there a fix for this? Is it a language problem or an OS problem?
Why are you using UdpClient.Connect before sending? It's an optional step for when you want to set a default remote host. It looks like you want to send to multiple destinations, so you can specify the remote host each time you call the Send method.
If you remove the Connect call you may see an improvement, but it still looks like you have a bottleneck somewhere else in your code. Isolate the Send calls and measure the time, then start adding steps back until you find what's slowing you down.
var port = 4242;
var ipEndPoint1 = new IPEndPoint(IPAddress.Parse("192.168.56.101"), port);
var ipEndPoint2 = new IPEndPoint(IPAddress.Parse("192.168.56.102"), port);
var buff = new byte[350];
var client = new UdpClient();
int count = 0;
var stopWatch = new Stopwatch();
stopWatch.Start();
while (count < 3000)
{
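// Alternate the destination on every send; no Connect() call is made,
// so each Send() supplies its own endpoint and the socket stays unconnected.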
IPEndPoint endpoint = ipEndPoint1;
if ((count % 2) == 0)
endpoint = ipEndPoint2;
client.Send(buff, buff.Length, endpoint);
count++;
}
stopWatch.Stop();
Console.WriteLine("RunTime " + stopWatch.Elapsed.TotalMilliseconds.ToString());
I'm currently writing a chat-server, bottom up, in C#.
It's like one single big room, with all the clients in, and then you can initiate private chats also. I've also laid the code out for future integration of multiple rooms (but not necessary right now).
It's been written mostly for fun, but also because I'm going to make a new chat site for young people like myself, as there are none left here in Denmark.
I've just tested it out with 170 clients (Written in Javascript with JQuery and a Flash bridge to socket connectivity). The response time on local network from a message being sent to it being delivered was less than 1 second. But now I'm considering what kind of performance I'm able to squeeze out of this.
I can see that if I connect two clients and then 168 others, and write on client 2 while watching client 1, the message comes up immediately on client 1. CPU and RAM usage show no signs of server stress at all. It copes fine, and I think it can scale to at least 1000 to 1500 clients without the slightest problem.
I have, however, noticed something: if I open the 170 clients again and send a message on client 1 while watching client 170, there is a lag of around 750 milliseconds or so.
I know the problem: when the server receives a chat message it broadcasts it to every client on the server, and enumerating all those clients takes time. The delay right now is very acceptable for a chat, but I'm worried that client 1 sending to client 750 (not tested yet) might take 2 to 3 seconds. I'm also worried about what happens when I start getting maybe 2 to 3 messages a second.
So to sum it up, I want to speed up the server broadcasting process. I'm already utilizing a parallel foreach loop and I'm also using asynchronous sockets.
Here is the broadcasting code:
lock (_clientLock)
{
Parallel.ForEach(_clients, c =>
{
c.Value.Send(message);
});
}
And here is the send function being invoked on each client:
try {
byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
_socket.BeginSend(bytesOut, 0, bytesOut.Length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
}
catch (Exception ex) { Drop(); }
I want to know if there is any way to speed this up?
I've considered writing some kind of helper class that accepts messages in a queue and then uses maybe 20 threads or so to split up the broadcasting list (a sketch of this idea follows the updated code below).
But I want to know YOUR opinions on this topic, I'm a student and I want to learn! (:
By the way, I like how you spot problems in your own code when you're about to post to Stack Overflow. I've now made an overloaded function that accepts a byte array from the server class when broadcasting, so the UTF-8 conversion only needs to happen once. To play it safe, the byte array length is now also calculated only once. See the updated version below.
But I'm still interested in ways of improving this even more!
Updated broadcast function:
lock (_clientLock)
{
byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
int bytesOutLength = bytesOut.Length;
Parallel.ForEach(_clients, c =>
{
c.Value.Send(bytesOut, bytesOutLength);
});
}
Updated send function on client object:
public void Send(byte[] message, int length)
{
try
{
_socket.BeginSend(message, 0, length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
}
catch (Exception ex) { Drop(); }
}
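For the queue-based helper mentioned earlier, here is a minimal sketch; BroadcastQueue, IChatClient, and the use of BlockingCollection are my own illustration of the idea, not part of the existing server:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
// Hypothetical shape of a client; your Client class already has this Send overload.
interface IChatClient
{
    void Send(byte[] message, int length);
}
// Hypothetical helper: chat handlers call Enqueue() and return immediately;
// a single long-running worker drains the queue and fans each message out.
class BroadcastQueue
{
    private readonly BlockingCollection<byte[]> _pending = new BlockingCollection<byte[]>();
    private readonly IEnumerable<IChatClient> _clients;
    public BroadcastQueue(IEnumerable<IChatClient> clients)
    {
        _clients = clients;
        Task.Factory.StartNew(Worker, TaskCreationOptions.LongRunning);
    }
    public void Enqueue(string message)
    {
        // Encode once, up front, exactly as in the updated broadcast function.
        _pending.Add(Encoding.UTF8.GetBytes(message + "\0"));
    }
    private void Worker()
    {
        foreach (byte[] bytesOut in _pending.GetConsumingEnumerable())
        {
            foreach (IChatClient client in _clients)
            {
                try { client.Send(bytesOut, bytesOut.Length); }
                catch { /* the real code calls Drop() on failure */ }
            }
        }
    }
}
You would still need to respect _clientLock (or hand the worker a snapshot of the client list), and whether this actually beats the Parallel.ForEach depends on whether any client's BeginSend ever blocks; the main win is that the thread handling the incoming chat message returns immediately.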
~1 s sounds really slow for a local network; average LAN latency is around 0.3 ms. Is Nagle enabled or disabled? I'm guessing it is enabled... so change that (Socket.NoDelay). That does mean you have to take responsibility for not writing to the socket in an overly fragmented way, of course, so don't drip the message out character by character. Assemble the message to send (or better: multiple outstanding messages) in memory, and send it as a unit.
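If it is Nagle, turning it off is a one-liner on the per-client socket (the _socket field from the send function above), done right after the connection is accepted:
// Disable Nagle's algorithm so small writes go out immediately
// instead of being held back waiting to coalesce with later data.
_socket.NoDelay = true;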
I have run into an issue with slow C# start-up time causing UDP packets to drop initially. Below is what I have done to mitigate this start-up delay. I essentially wait an additional 10 ms between the first two packet transmissions. This fixes the initial drops, at least on my machine. My concern is that a slower machine may need a longer delay than this.
private void FlushPacketsToNetwork()
{
MemoryStream packetStream = new MemoryStream();
while (packetQ.Count != 0)
{
byte[] packetBytes = packetQ.Dequeue().ToArray();
packetStream.Write(packetBytes, 0, packetBytes.Length);
}
byte[] txArray = packetStream.ToArray();
udpSocket.Send(txArray);
txCount++;
ExecuteStartupDelay();
}
// socket takes too long to transmit unless I give it some time to "warm up"
private void ExecuteStartupDelay()
{
if (txCount < 3)
{
timer.SpinWait(10e-3);
}
}
So I am wondering: is there a better approach to letting C# fully load all of its dependencies? I really don't mind if it takes several seconds to completely load; I just do not want to do any high-bandwidth transmissions until C# is ready for full speed.
Additional relevant details
This is a console application, the network transmission is run from a separate thread, and the main thread just waits for a key press to terminate the network transmitter.
In the Program.Main method I have tried to get the most performance from my application by using the highest priorities reasonable:
public static void Main(string[] args)
{
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
...
Thread workerThread = new Thread(new ThreadStart(worker.Run));
workerThread.Priority = ThreadPriority.Highest;
workerThread.Start();
...
Console.WriteLine("Press any key to stop the stream...");
WaitForKeyPress();
worker.RequestStop = true;
workerThread.Join();
}
Also, the socket settings I am currently using are shown below:
udpSocket = new Socket(targetEndPoint.Address.AddressFamily,
SocketType.Dgram,
ProtocolType.Udp);
udpSocket.Ttl = ttl;
udpSocket.SendBufferSize = 1024 * 1024;
udpSocket.Blocking = true;
udpSocket.Connect(targetEndPoint);
The default SendBufferSize is 8192, so I went ahead and increased it to a megabyte, but this setting did not seem to have any effect on the dropped packets at the beginning.
From the comments I learned that TCP is not an option for you (because of inherent delays in transmission), and also that you do not want to lose packets due to the other side not being fully loaded.
So you actually need to implement some of the features present in TCP (retransmission), but in a more robust and lightweight fashion. I also assume that you are in control of the receiving side.
I propose that you send some predetermined number of packets and then wait for confirmation. For instance, every packet can carry a monotonically increasing id. Every N packets, the receiving application sends the id of the last received packet back to the sender. On receiving this number, the sender knows whether it needs to repeat the last N packets.
This approach should not hurt your bandwidth very much, and you will get some information about what was received (although not a guarantee).
Otherwise it is best to switch to TCP. By the way, did you try using TCP? How much does your bandwidth suffer because of it?
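Here is a minimal sketch of the sender-side bookkeeping for that scheme; the 4-byte sequence-number prefix, the ReliableSender name, and the ack-handling shape are my own illustration, not something from your existing code:
using System;
using System.Collections.Generic;
// Every datagram carries a 4-byte sequence number; the receiver acks every N packets,
// and anything newer than the last ack is kept around for possible resending.
class ReliableSender
{
    private uint _nextId;
    private readonly Dictionary<uint, byte[]> _unacked = new Dictionary<uint, byte[]>();
    public byte[] Frame(byte[] payload)
    {
        byte[] datagram = new byte[payload.Length + 4];
        BitConverter.GetBytes(_nextId).CopyTo(datagram, 0);       // sequence number prefix
        Buffer.BlockCopy(payload, 0, datagram, 4, payload.Length);
        _unacked[_nextId] = datagram;                             // remember until acked
        _nextId++;
        return datagram;                                          // hand this to Socket.Send
    }
    // Called when the receiver reports the highest id it has seen so far.
    public IEnumerable<byte[]> HandleAck(uint lastReceivedId)
    {
        var toResend = new List<byte[]>();
        foreach (var pair in new List<KeyValuePair<uint, byte[]>>(_unacked))
        {
            if (pair.Key <= lastReceivedId)
                _unacked.Remove(pair.Key);      // delivered
            else
                toResend.Add(pair.Value);       // still outstanding; resend or keep waiting
        }
        return toResend;
    }
}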
I'm currently writing a prototype application in C#/.NET 4 where I need to transfer an unknown amount of data. The data is read in from a text file and then serialized into a byte array.
Now I need to implement both transmission methods, UDP and TCP. The transmission works fine both ways, but I am struggling with UDP. I assumed that transmission over UDP would be much faster than over TCP, but in fact my tests showed that the UDP transmission is about 7 to 8 times slower than TCP.
I tested the transmission with a 12 megabyte file; the TCP transmission took about 1 second, whereas the UDP transmission took about 7 seconds.
In the application I use plain sockets to transmit the data. Since a UDP datagram only allows a maximum of 65,535 bytes per message, I split the serialized byte array of the file into several parts, where each part has the size of the socket SendBufferSize, and then I transfer each part using the Socket.Send() method call.
Here is the code for the Sender part.
while (startOffset < data.Length)
{
if ((startOffset + payloadSize) > data.Length)
{
payloadSize = data.Length - startOffset;
}
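// Build one datagram: a 16-byte UdpMessagePrefix followed by the payload chunk
// (messageOffset is presumably 16, i.e. the prefix length, defined elsewhere).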
byte[] subMessageBytes = new byte[payloadSize + 16];
byte[] messagePrefix = new UdpMessagePrefix(data.Length, payloadSize, messageCount, messageId).ToByteArray();
Buffer.BlockCopy(messagePrefix, 0, subMessageBytes, 0, 16);
Buffer.BlockCopy(data, startOffset, subMessageBytes, messageOffset, payloadSize);
messageId++;
startOffset += payloadSize;
udpClient.Send(subMessageBytes, subMessageBytes.Length);
messages.Add(subMessageBytes);
}
This code simply copies the next part to be sent into a byte array and then calls the Send method on the socket. My first guess was that the splitting/copying of the byte arrays was slowing down the performance, but I isolated and tested the splitting code and it took only a few milliseconds, so that was not the problem.
int receivedMessageCount = 1;
Dictionary<int, byte[]> receivedMessages = new Dictionary<int, byte[]>();
while (receivedMessageCount != totalMessageCount)
{
byte[] data = udpClient.Receive(ref remoteIpEndPoint);
UdpMessagePrefix p = UdpMessagePrefix.FromByteArray(data);
receivedMessages.Add(p.MessageId, data);
//Console.WriteLine("Received packet: " + receivedMessageCount + " (ID: " + p.MessageId + ")");
receivedMessageCount++;
//Console.WriteLine("ReceivedMessageCount: " + receivedMessageCount);
}
Console.WriteLine("Done...");
return receivedMessages;
This is the server-side code where I receive the UDP messages. Each message has a prefix of a few bytes in which the total number of messages and the size are stored. So I simply call Receive in a loop until I have received the number of messages specified in the prefix.
My assumption is that I may not have implemented the UDP transmission code "efficiently" enough... Maybe one of you already sees a problem in the code snippets, or has any other suggestion or hint as to why my UDP transmission is slower than TCP.
Thanks in advance!
While a UDP datagram can be up to 64K, the actual wire frames are usually 1500 bytes (the normal Ethernet MTU). That also has to fit an IP header of at least 20 bytes and a UDP header of 8 bytes, leaving you with 1472 bytes of usable payload.
What you are seeing is the result of your OS network stack fragmenting the UDP datagrams on the sender side and then re-assembling them on the receiver side. That takes time, thus your results.
TCP, on the other hand, does its own packetization and tries to find path MTU, so it's more efficient in this case.
Limit your data chunks to 1472 bytes and measure again.
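Applied to the sender loop in the question, that just means capping payloadSize so that prefix plus payload fits in a single frame; the numbers below assume a standard 1500-byte MTU and the question's 16-byte prefix:
// 1500 (Ethernet MTU) - 20 (IP header) - 8 (UDP header) = 1472 bytes of UDP payload,
// minus the 16-byte UdpMessagePrefix, leaves 1456 bytes of file data per datagram.
const int MaxUdpPayload = 1472;
int payloadSize = MaxUdpPayload - 16;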
I think you should measure CPU usage and network throughput for the duration of the test.
If the CPU is pegged, this is your problem: Turn on a profiler.
If the network (cable) is pegged this is a different class of problems. I wouldn't know what to do about it ;-)
If neither is pegged, run a profiler and see where most wall-clock time is spent. There must be some waiting going on.
If you don't have a profiler just hit break 10 times in the debugger and see where it stops most often.
Edit: My response to your measurement: we know that 99% of all execution time is spent receiving data, but we don't know yet if the CPU is busy. Look in Task Manager and see which process is busy.
My guess is that it is the System process. This is the Windows kernel, and probably the UDP component of it.
This might have to do with packet fragmentation. Over Ethernet, an unfragmented IP packet can carry only about 1472 bytes of UDP payload, so your larger UDP packets are being fragmented and reassembled on the receiving machine. I am surprised that is taking so much CPU time.
Try sending packets with a total size of 1000 bytes and of 1472 bytes (try both!) and report the results.
I'm working on a C# Server application for a game engine I'm writing in ActionScript 3. I'm using an authoritative server model as to prevent cheating and ensure fair game. So far, everything works well:
When the client begins moving, it tells the server and starts rendering locally; the server then tells everyone else that client X has begun moving, along with details so they can also begin rendering it. When the client stops moving, it tells the server, which performs calculations based on the time the client began moving and the client render tick delay, and replies to everyone so they can update with the correct values.
The thing is, with the default 20 ms tick delay on server calculations, when the client moves a rather long distance there's a noticeable tilt forward when it stops. If I increase the delay slightly to 22 ms, everything runs very smoothly on my local network, but in other locations the tilt is still there. After experimenting a little, I noticed that the extra delay needed is pretty much tied to the latency between client and server. I even boiled it down to a formula that works quite nicely: delay = 20 + (latency / 10).
So, how would I go about obtaining the latency between a certain client and the server (I'm using asynchronous sockets)? The CPU effort can't be too high, so as not to make the server run slowly. Also, is this really the best way, or is there a more efficient/easier way to do this?
Sorry that this isn't directly answering your question, but generally speaking you shouldn't rely too heavily on measuring latency because it can be quite variable. Not only that, you don't know if the ping time you measure is even symmetrical, which is important. There's no point applying 10ms of latency correction if it turns out that the ping time of 20ms is actually 19ms from server to client and 1ms from client to server. And latency in application terms is not the same as in networking terms - you may be able to ping a certain machine and get a response in 20ms but if you're contacting a server on that machine that only processes network input 50 times a second then your responses will be delayed by an extra 0 to 20ms, and this will vary rather unpredictably.
That's not to say latency measurement doesn't have a place in smoothing predictions out, but it's not going to solve your problem, just clean it up a bit.
On the face of it, the problem here seems to be that you're sent information in the first message which you use to extrapolate data from until the last message is received. If all else stays constant, then the movement vector given in the first message multiplied by the time between the messages will give the server the correct end position that the client was in at roughly now-(latency/2). But if the latency changes at all, the time between the messages will grow or shrink. The client may know he's moved 10 units, but the server simulated him moving 9 or 11 units before being told to snap him back to 10 units.
The general solution to this is to not assume that latency will stay constant but to send periodic position updates, which allow the server to verify and correct the client's position. With just 2 messages as you have now, all the error is found and corrected after the 2nd message. With more messages, the error is spread over many more sample points allowing for smoother and less visible correction.
It can never be perfect though: all it takes is a lag spike in the last millisecond of movement and the server's representation will overshoot. You can't get around that if you're predicting future movement based on past events, as there's no real alternative to choosing either correct-but-late or incorrect-but-timely since information takes time to travel. (Blame Einstein.)
One thing to keep in mind when using ICMP based pings is that networking equipment will often give ICMP traffic lower priority than normal packets, especially when the packets cross network boundaries such as WAN links. This can lead to pings being dropped or showing higher latency than traffic is actually experiencing and lends itself to being an indicator of problems rather than a measurement tool.
The increasing use of Quality of Service (QoS) in networks only exacerbates this and as a consequence though ping still remains a useful tool, it needs to be understood that it may not be a true reflection of the network latency for non-ICMP based real traffic.
There is a good post on the Itrinegy blog, "How do you measure Latency (RTT) in a network these days?", about this.
You could use the already available Ping Class. Should be preferred over writing your own IMHO.
Have a "ping" command, where you send a message from the server to the client, then time how long it takes to get a response. Barring CPU overload scenarios, it should be pretty reliable. To get the one-way trip time, just divide the time by 2.
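A minimal sketch of such an application-level ping on the server side; LatencyProbe, the "PING"/"PONG" strings, and the sendToClient/OnMessage hooks are placeholders for whatever your protocol and asynchronous socket layer already provide:
using System;
using System.Diagnostics;
// Hypothetical server-side probe: stamp the ping, stop the clock on the matching pong.
class LatencyProbe
{
    private readonly Stopwatch _pingClock = new Stopwatch();
    public long OneWayEstimateMs { get; private set; }
    public void SendPing(Action<string> sendToClient)
    {
        _pingClock.Restart();
        sendToClient("PING");                 // client is expected to answer with "PONG"
    }
    public void OnMessage(string message)     // call this from your receive path
    {
        if (message == "PONG" && _pingClock.IsRunning)
        {
            _pingClock.Stop();
            // Halving the round trip assumes symmetric latency, which is only approximate.
            OneWayEstimateMs = _pingClock.ElapsedMilliseconds / 2;
        }
    }
}
You could then feed OneWayEstimateMs into the delay = 20 + (latency / 10) formula from the question, ideally smoothed over several samples.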
We can measure the round-trip time using the Ping class of the .NET Framework.
Instantiate a Ping and subscribe to the PingCompleted event:
Ping pingSender = new Ping();
pingSender.PingCompleted += PingCompletedCallback;
Add code to configure and send the ping.
Our PingCompleted event handler (PingCompletedEventHandler) has a PingCompletedEventArgs argument. The PingCompletedEventArgs.Reply gets us a PingReply object. PingReply.RoundtripTime returns the round trip time (the "number of milliseconds taken to send an Internet Control Message Protocol (ICMP) echo request and receive the corresponding ICMP echo reply message"):
public static void PingCompletedCallback(object sender, PingCompletedEventArgs e)
{
...
Console.WriteLine($"Roundtrip Time: {e.Reply.RoundtripTime}");
...
}
Code-dump of a full working example, based on MSDN's example. I have modified it to write the RTT to the console:
public static void Main(string[] args)
{
string who = "www.google.com";
AutoResetEvent waiter = new AutoResetEvent(false);
Ping pingSender = new Ping();
// When the PingCompleted event is raised,
// the PingCompletedCallback method is called.
pingSender.PingCompleted += PingCompletedCallback;
// Create a buffer of 32 bytes of data to be transmitted.
string data = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
byte[] buffer = Encoding.ASCII.GetBytes(data);
// Wait 12 seconds for a reply.
int timeout = 12000;
// Set options for transmission:
// The data can go through 64 gateways or routers
// before it is destroyed, and the data packet
// cannot be fragmented.
PingOptions options = new PingOptions(64, true);
Console.WriteLine("Time to live: {0}", options.Ttl);
Console.WriteLine("Don't fragment: {0}", options.DontFragment);
// Send the ping asynchronously.
// Use the waiter as the user token.
// When the callback completes, it can wake up this thread.
pingSender.SendAsync(who, timeout, buffer, options, waiter);
// Prevent this example application from ending.
// A real application should do something useful
// when possible.
waiter.WaitOne();
Console.WriteLine("Ping example completed.");
}
public static void PingCompletedCallback(object sender, PingCompletedEventArgs e)
{
// If the operation was canceled, display a message to the user.
if (e.Cancelled)
{
Console.WriteLine("Ping canceled.");
// Let the main thread resume.
// UserToken is the AutoResetEvent object that the main thread
// is waiting for.
((AutoResetEvent)e.UserState).Set();
return;
}
// If an error occurred, display the exception to the user.
if (e.Error != null)
{
Console.WriteLine("Ping failed:");
Console.WriteLine(e.Error.ToString());
// Let the main thread resume.
((AutoResetEvent)e.UserState).Set();
return;
}
Console.WriteLine($"Roundtrip Time: {e.Reply.RoundtripTime}");
// Let the main thread resume.
((AutoResetEvent)e.UserState).Set();
}
You might want to perform several pings and then calculate an average, depending on your requirements of course.
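For example, a simple synchronous variant (arbitrary host name, sample count, and timeout) that averages a few samples using the same Ping class:
// Average the round-trip time over a handful of synchronous pings.
var pingSender = new Ping();
const int samples = 5;
long totalMs = 0;
int successes = 0;
for (int i = 0; i < samples; i++)
{
    PingReply reply = pingSender.Send("www.google.com", 1000);   // 1-second timeout
    if (reply.Status == IPStatus.Success)
    {
        totalMs += reply.RoundtripTime;
        successes++;
    }
}
if (successes > 0)
    Console.WriteLine($"Average RTT over {successes} replies: {totalMs / successes} ms");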