I've written a simple .Net client that connects to a hardware device (FPGA) over TCP/IP. When I click a button in the client it sends a small (4 byte) "request" to the device, then immediately reads a block of data (approx 32kb) that the device responds with. The client code looks something like this:-
var stream = _tcpClient.GetStream();
stream.Write(requestData, 0, requestData.Length);
using (var ms = new MemoryStream())
{
var tempBuffer = new byte[65535];
do
{
var numBytesRead = stream.Read(tempBuffer, 0, tempBuffer.Length);
ms.Write(tempRead, 0, numBytesRead);
}
while(ms.Length < ExpectedResponseSize);
_hardwareResponse = ms.ToArray();
}
Using a Stopwatch in the above code typically reports 2-3ms to read the entire 32kb response back, and this timing remains consistent if I repeatedly click the button slowly (e.g. once per second).
If I start clicking the button more rapidly (e.g. every half a second) then after a few seconds the timings suddenly drop to around 12ms and stay there, even if I go back to clicking the button slowly. If I close then reopen the connection on the client and try again, it's back to 2-3ms times.
WireShark shows 3-4 ACKs coming out of the PC during the faster responses, but this increases to a dozen or more once the timing drops to 12ms. In both cases the number and size of packets coming from the FPGA is the same. I'm as confident as I can be that it's not a problem in the code on the client or FPGA (neither can get much simpler) - gut feeling is that it's a protocol or network thing. Any thoughts?
Have you looked at the various answers in this similar question: https://serverfault.com/questions/215674/latency-in-tcp-ip-over-ethernet-networks/215963 ? I suspect that the issue is window size or ack algorithm reconfiguring itself pessimistically, but the only real way to get to the bottom is to try adjusting one thing at a time on the client and server until you spot the performance difference. A lot of adaptive algorithms (like Nagle) are in use by default and can sometimes have unintended consequences on local traffic.
Related
I run into a problem that googling seems can't solve. To keep it simple I have a client written in C# and a server running Linux written in C. Client is calling Send(buffer) in a loop 100 times. The problem is that server receives only a dozen of them. If I put a sleep, big enough, in a loop everything turns out fine. The buffer is small - about 30B. I read about Nagle's algorithm and ACK delay but it doesn't answer my problems.
for(int i = 0; i < 100; i++)
{
try
{
client.Send(oneBuffer, 0, oneBuffer.Length, SocketFlags.None)
}
catch (SocketException socE)
{
if ((socE.SocketErrorCode == SocketError.WouldBlock)
|| (socE.SocketErrorCode == SocketError.NoBufferSpaceAvailable)
|| (socE.SocketErrorCode == SocketError.IOPending))
{
Console.WriteLine("Never happens :(");
}
}
Thread.Sleep(100); //problem solver but why??
}
It's look like send buffer gets full and rejects data until it gets empty again, in blocking mode and nonblocking mode. Even better, I never get any exception!? I would expect some of the exceptions to raise but nothing. :( Any ideas? Thnx in advance.
TCP is stream oriented. This means that recv can read any amount of bytes between one and the total number of bytes outstanding (sent but not yet read). "Messages" do not exist. Sent buffers can be split or merged.
There is no way to get message behavior from TCP. There is no way to make recv read at least N bytes. Message semantics are constructed by the application protocol. Often, by using fixed-size messages or a length prefix. You can read at least N bytes by doing a read loop.
Remove that assumption from your code.
I think this issue is due to the nagle algorithm :
The Nagle algorithm is designed to reduce network traffic by causing
the socket to buffer small packets and then combine and send them in
one packet under certain circumstances. A TCP packet consists of 40
bytes of header plus the data being sent. When small packets of data
are sent with TCP, the overhead resulting from the TCP header can
become a significant part of the network traffic. On heavily loaded
networks, the congestion resulting from this overhead can result in
lost datagrams and retransmissions, as well as excessive propagation
time caused by congestion. The Nagle algorithm inhibits the sending of
new TCP segments when new outgoing data arrives from the user if any
previouslytransmitted data on the connection remains unacknowledged.
Calling client.Send function doesn't mean a TCP segment will be sent.
In your case, as buffers are small, the naggle algorithm will regroup them into larger segments. Check on server side that the dozen of buffers received contains the whole data.
When you add a Thread.Sleep(100), you will receive 100 packets on server side because nagle algotithm won't wait longer for further data.
If you really need a short latency in your application, you can explicitly disable nagle algorithm for your TcpClient : set the NoDelay property to true. Add this line at the begening of your code :
client.NoDelay = true;
I was naive thinking there was a problem with TCP stack. It was with my server code. Somewhere in between the data manipulation I used strncpy() function on a buffer that stores messages. Every message contained \0 at the end. Strncpy copied only the first message (the first string) out of the buffer regardless the count that was given (buffer length). That resulted in me thinking I had lost messages.
When I used the delay between send() calls on client, messages didn't get buffered. So, strncpy() worked on a buffer with one message and everything went smoothly. That "phenomenon" led we into thinking that speed rate of send calls is causing my problems.
Again thanks on help, your comments made me wonder. :)
I'm currently writing a chat-server, bottom up, in C#.
It's like one single big room, with all the clients in, and then you can initiate private chats also. I've also laid the code out for future integration of multiple rooms (but not necessary right now).
It's been written mostly for fun, but also because I'm going to make a new chatsite for young people like myself, as there are no one left here in Denmark.
I've just tested it out with 170 clients (Written in Javascript with JQuery and a Flash bridge to socket connectivity). The response time on local network from a message being sent to it being delivered was less than 1 second. But now I'm considering what kind of performance I'm able to squeeze out of this.
I can see if I connect two clients and then 168 other, and write on client 2 and watch client 1, it comes up immediately on client 1. And the CPU usage and RAM usage shows no signs of server stress at all. It copes fine and I think it can scale to at least 1000 - 1500 without the slightest problem.
I have however noticed something, and that is if I open the 170 clients again and send a message on client 1 and watch on client 170, there is a log around 750 milliseconds or so.
I know the problem, and that is, when the server receives a chat message it broadcasts it to every client on the server. It does however need to enumerate all these clients, and that takes time. The delay right now is very acceptable for a chat, but I'm worried client 1 sending to client 750 maybe (not tested yet) will take 2 - 3 seconds. And i'm also worried when I begin to get maybe 2 - 3 messages a second.
So to sum it up, I want to speed up the server broadcasting process. I'm already utilizing a parallel foreach loop and I'm also using asynchronous sockets.
Here is the broadcasting code:
lock (_clientLock)
{
Parallel.ForEach(_clients, c =>
{
c.Value.Send(message);
});
}
And here is the send function being invoked on each client:
try {
byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
_socket.BeginSend(bytesOut, 0, bytesOut.Length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
}
catch (Exception ex) { Drop(); }
I want to know if there is any way to speed this up?
I've considered writing some kind of helper class accepting messages in a que and then using maybe 20 threads or so, to split up the broadcasting list.
But I want to know YOUR opinions on this topic, I'm a student and I want to learn! (:
Btw. I like how you spot problems in your code when about to post to stack overflow. I've now made an overloaded function to accept a byte array from the server class when using broadcast, so the UTF-8 conversion only needs to happen once. Also to play it safe, the calculation of the byte array length only happens once now. See the updated version below.
But I'm still interested in ways of improving this even more!
Updated broadcast function:
lock (_clientLock)
{
byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
int bytesOutLength = bytesOut.Length;
Parallel.ForEach(_clients, c =>
{
c.Value.Send(bytesOut, bytesOutLength);
});
}
Updated send function on client object:
public void Send(byte[] message, int length)
{
try
{
_socket.BeginSend(message, 0, length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
}
catch (Exception ex) { Drop(); }
}
~1s sounds really slow for a local network. Average LAN latency is 0.3ms. Is Nagle enabled or disabled? I'm guessing it is enabled... so: change that (Socket.NoDelay). That does mean you have to take responsibility for not writing to the socket in an overly-fragmented way, of course - so don't drip the message in character-by-character. Assemble the message to send (or better: multiple outstanding messages) in memory, and send it as a unit.
I have run into an issue with the slow C# start-up time causing UDP packets to drop initially. Below, I is what I have done to mitigate this start-up delay. I essentially wait an additional 10ms between the first two packet transmissions. This fixes the initial drops at least on my machine. My concern is that a slower machine may need a longer delay than this.
private void FlushPacketsToNetwork()
{
MemoryStream packetStream = new MemoryStream();
while (packetQ.Count != 0)
{
byte[] packetBytes = packetQ.Dequeue().ToArray();
packetStream.Write(packetBytes, 0, packetBytes.Length);
}
byte[] txArray = packetStream.ToArray();
udpSocket.Send(txArray);
txCount++;
ExecuteStartupDelay();
}
// socket takes too long to transmit unless I give it some time to "warm up"
private void ExecuteStartupDelay()
{
if (txCount < 3)
{
timer.SpinWait(10e-3);
}
}
So, I am wondering is there a better approach to let C# fully load all of its dependencies? I really don't mind if it takes several seconds to completely load; I just do not want to do any high bandwidth transmissions until C# is ready for full speed.
Additional relevant details
This is a console application, the network transmission is run from a separate thread, and the main thread just waits for a key press to terminate the network transmitter.
In the Program.Main method I have tried to get the most performance from my application by using the highest priorities reasonable:
public static void Main(string[] args)
{
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
...
Thread workerThread = new Thread(new ThreadStart(worker.Run));
workerThread.Priority = ThreadPriority.Highest;
workerThread.Start();
...
Console.WriteLine("Press any key to stop the stream...");
WaitForKeyPress();
worker.RequestStop = true;
workerThread.Join();
Also, the socket settings I am currently using are shown below:
udpSocket = new Socket(targetEndPoint.Address.AddressFamily,
SocketType.Dgram,
ProtocolType.Udp);
udpSocket.Ttl = ttl;
udpSocket.SendBufferSize = 1024 * 1024;
udpSocket.Blocking = true;
udpSocket.Connect(targetEndPoint);
The default SendBufferSize is 8192, so I went ahead and moved it up to a megabyte, but this setting did not seem to have any affect on the dropped packets at the beginning.
From the comments I learned that TCP is not an option for you (because of inherent delays in transmission), also you do not want to loose packets due to other side being not fully loaded.
So you actually need to implement some features present in TCP (retransmission) but in more robust and lightweight fashion. I also assume that you are in control of the receiving side.
I propose that you send some predetermined number of packets. And then wait for confirmation. For instance, every packet can have an id that constantly grows. Every N packets, receiving application sends the number of last received packet to the sender. After receiving this number sender will know if it is necessary to repeat last N packets.
This approach should not hurt your bandwidth very much and you will get some sort of information about received data (although not guaranteed).
Otherwise it is best to switch to TCP. By the way did you try using TCP? How much your bandwidth hurts because of it?
I'm trying to write a simple c# application which downloads a large number of small files from an FTP server.
I've tried two approaches:
1 - generic socket programming
2 - using FtpWebRequest and FtpWebResponse objects
The download speed (for the same file) when using the first approach varies from 1.5s to 7s, the 2nd gives more less the same results - about 2.5s each time.
Considering that about 1.4s out of those 2.5s takes the process of initiating the FtpWebRequest object (only 1.1s for receiving data) the difference is quite significant.
The question is how to achieve for the 1st approach the same good stable download speed as for the 2nd one?
For the 1st approach the problem seems to lay in the loop below (as it takes about 90% of the download time):
Int32 intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
while (intResponseLength != 0)
{
localFile.Write(buffer, 0, intResponseLength);
intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
}
Equivalent part of code for the 2nd approach (always takes about 1.1s for particular file):
Int32 intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
while (intResponseLength != 0)
{
localFile.Write (buffer, 0, intResponseLength);
intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
}
I've tried buffers from 56b to 32kB - no significant difference.
Also creating a stream on the open data socket:
Stream str = new NetworkStream(dataSocket);
and reading it (instead of using dataSocket.Receive)
str.Read(buffer, 0, intBufferSize);
doesn't help... in fact it's even slower.
Thanks in advance for any suggestion!
You need to use Socket.Poll or Socket.Select methods to check availability of data. What you do not only slows down operation, but also causes extensive CPU load. Poll or Select will yield processor time until data is available or timeout elapses. You can keep the same loop but include a call to one of the above methods, and play with timeouts (try values from 10 ms to 500 ms to find timeout, optimal for your task).
I'm working on a C# Server application for a game engine I'm writing in ActionScript 3. I'm using an authoritative server model as to prevent cheating and ensure fair game. So far, everything works well:
When the client begins moving, it tells the server and starts rendering locally; the server, then, tells everyone else that client X has began moving, among with details so they can also begin rendering. When the client stops moving, it tells the server, which performs calculations based on the time the client began moving and the client render tick delay and replies to everyone, so they can update with the correct values.
The thing is, when I use the default 20ms tick delay on server calculations, when the client moves for a rather long distance, there's a noticeable tilt forward when it stops. If I increase slightly the delay to 22ms, on my local network everything runs very smoothly, but in other locations, the tilt is still there. After experimenting a little, I noticed that the extra delay needed is pretty much tied to the latency between client and server. I even boiled it down to a formula that would work quite nicely: delay = 20 + (latency / 10).
So, how would I proceed to obtain the latency between a certain client and the server (I'm using asynchronous sockets). The CPU effort can't be too much, as to not have the server run slowly. Also, is this really the best way, or is there a more efficient/easier way to do this?
Sorry that this isn't directly answering your question, but generally speaking you shouldn't rely too heavily on measuring latency because it can be quite variable. Not only that, you don't know if the ping time you measure is even symmetrical, which is important. There's no point applying 10ms of latency correction if it turns out that the ping time of 20ms is actually 19ms from server to client and 1ms from client to server. And latency in application terms is not the same as in networking terms - you may be able to ping a certain machine and get a response in 20ms but if you're contacting a server on that machine that only processes network input 50 times a second then your responses will be delayed by an extra 0 to 20ms, and this will vary rather unpredictably.
That's not to say latency measurement it doesn't have a place in smoothing predictions out, but it's not going to solve your problem, just clean it up a bit.
On the face of it, the problem here seems to be that that you're sent information in the first message which you use to extrapolate data from until the last message is received. If all else stays constant then the movement vector given in the first message multiplied by the time between the messages will give the server the correct end position that the client was in at roughly now-(latency/2). But if the latency changes at all, the time between the messages will grow or shrink. The client may know he's moved 10 units, but the server simulated him moving 9 or 11 units before being told to snap him back to 10 units.
The general solution to this is to not assume that latency will stay constant but to send periodic position updates, which allow the server to verify and correct the client's position. With just 2 messages as you have now, all the error is found and corrected after the 2nd message. With more messages, the error is spread over many more sample points allowing for smoother and less visible correction.
It can never be perfect though: all it takes is a lag spike in the last millisecond of movement and the server's representation will overshoot. You can't get around that if you're predicting future movement based on past events, as there's no real alternative to choosing either correct-but-late or incorrect-but-timely since information takes time to travel. (Blame Einstein.)
One thing to keep in mind when using ICMP based pings is that networking equipment will often give ICMP traffic lower priority than normal packets, especially when the packets cross network boundaries such as WAN links. This can lead to pings being dropped or showing higher latency than traffic is actually experiencing and lends itself to being an indicator of problems rather than a measurement tool.
The increasing use of Quality of Service (QoS) in networks only exacerbates this and as a consequence though ping still remains a useful tool, it needs to be understood that it may not be a true reflection of the network latency for non-ICMP based real traffic.
There is a good post at the Itrinegy blog How do you measure Latency (RTT) in a network these days? about this.
You could use the already available Ping Class. Should be preferred over writing your own IMHO.
Have a "ping" command, where you send a message from the server to the client, then time how long it takes to get a response. Barring CPU overload scenarios, it should be pretty reliable. To get the one-way trip time, just divide the time by 2.
We can measure the round-trip time using the Ping class of the .NET Framework.
Instantiate a Ping and subscribe to the PingCompleted event:
Ping pingSender = new Ping();
pingSender.PingCompleted += PingCompletedCallback;
Add code to configure and action the ping.
Our PingCompleted event handler (PingCompletedEventHandler) has a PingCompletedEventArgs argument. The PingCompletedEventArgs.Reply gets us a PingReply object. PingReply.RoundtripTime returns the round trip time (the "number of milliseconds taken to send an Internet Control Message Protocol (ICMP) echo request and receive the corresponding ICMP echo reply message"):
public static void PingCompletedCallback(object sender, PingCompletedEventArgs e)
{
...
Console.WriteLine($"Roundtrip Time: {e.Reply.RoundtripTime}");
...
}
Code-dump of a full working example, based on MSDN's example. I have modified it to write the RTT to the console:
public static void Main(string[] args)
{
string who = "www.google.com";
AutoResetEvent waiter = new AutoResetEvent(false);
Ping pingSender = new Ping();
// When the PingCompleted event is raised,
// the PingCompletedCallback method is called.
pingSender.PingCompleted += PingCompletedCallback;
// Create a buffer of 32 bytes of data to be transmitted.
string data = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
byte[] buffer = Encoding.ASCII.GetBytes(data);
// Wait 12 seconds for a reply.
int timeout = 12000;
// Set options for transmission:
// The data can go through 64 gateways or routers
// before it is destroyed, and the data packet
// cannot be fragmented.
PingOptions options = new PingOptions(64, true);
Console.WriteLine("Time to live: {0}", options.Ttl);
Console.WriteLine("Don't fragment: {0}", options.DontFragment);
// Send the ping asynchronously.
// Use the waiter as the user token.
// When the callback completes, it can wake up this thread.
pingSender.SendAsync(who, timeout, buffer, options, waiter);
// Prevent this example application from ending.
// A real application should do something useful
// when possible.
waiter.WaitOne();
Console.WriteLine("Ping example completed.");
}
public static void PingCompletedCallback(object sender, PingCompletedEventArgs e)
{
// If the operation was canceled, display a message to the user.
if (e.Cancelled)
{
Console.WriteLine("Ping canceled.");
// Let the main thread resume.
// UserToken is the AutoResetEvent object that the main thread
// is waiting for.
((AutoResetEvent)e.UserState).Set();
}
// If an error occurred, display the exception to the user.
if (e.Error != null)
{
Console.WriteLine("Ping failed:");
Console.WriteLine(e.Error.ToString());
// Let the main thread resume.
((AutoResetEvent)e.UserState).Set();
}
Console.WriteLine($"Roundtrip Time: {e.Reply.RoundtripTime}");
// Let the main thread resume.
((AutoResetEvent)e.UserState).Set();
}
You might want to perform several pings and then calculate an average, depending on your requirements of course.