Fastest way to "broadcast" to list of TCP clients - c#

I'm currently writing a chat-server, bottom up, in C#.
It's like one single big room, with all the clients in, and then you can initiate private chats also. I've also laid the code out for future integration of multiple rooms (but not necessary right now).
It's been written mostly for fun, but also because I'm going to make a new chat site for young people like myself, as there are none left here in Denmark.
I've just tested it with 170 clients (written in JavaScript with jQuery and a Flash bridge for socket connectivity). The response time on the local network, from a message being sent to it being delivered, was less than 1 second. But now I'm considering what kind of performance I'm able to squeeze out of this.
I can see that if I connect two clients and then 168 others, write on client 2 and watch client 1, the message comes up immediately on client 1. The CPU and RAM usage show no signs of server stress at all. It copes fine, and I think it can scale to at least 1000 - 1500 without the slightest problem.
I have, however, noticed something: if I open the 170 clients again and send a message on client 1 and watch on client 170, there is a lag of around 750 milliseconds or so.
I know the problem: when the server receives a chat message, it broadcasts it to every client on the server. It needs to enumerate all these clients, and that takes time. The delay right now is very acceptable for a chat, but I'm worried that client 1 sending to client 750 (not tested yet) might take 2 - 3 seconds. I'm also worried about what happens when I begin to get maybe 2 - 3 messages a second.
So to sum it up, I want to speed up the server broadcasting process. I'm already utilizing a parallel foreach loop and I'm also using asynchronous sockets.
Here is the broadcasting code:
lock (_clientLock)
{
    Parallel.ForEach(_clients, c =>
    {
        c.Value.Send(message);
    });
}
And here is the send function being invoked on each client:
try
{
    byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
    _socket.BeginSend(bytesOut, 0, bytesOut.Length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
}
catch (Exception ex) { Drop(); }
I want to know if there is any way to speed this up?
I've considered writing some kind of helper class that accepts messages into a queue and then uses maybe 20 threads or so to split up the broadcasting list.
But I want to know YOUR opinions on this topic; I'm a student and I want to learn! (:
Btw, I like how you spot problems in your own code when you're about to post to Stack Overflow. I've now made an overloaded function that accepts a byte array from the server class when broadcasting, so the UTF-8 conversion only needs to happen once. Also, to play it safe, the calculation of the byte array length only happens once now. See the updated version below.
But I'm still interested in ways of improving this even more!
Updated broadcast function:
lock (_clientLock)
{
    byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
    int bytesOutLength = bytesOut.Length;
    Parallel.ForEach(_clients, c =>
    {
        c.Value.Send(bytesOut, bytesOutLength);
    });
}
Updated send function on client object:
public void Send(byte[] message, int length)
{
    try
    {
        _socket.BeginSend(message, 0, length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
    }
    catch (Exception ex) { Drop(); }
}
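For reference, here is a rough sketch of the queue-and-worker helper idea mentioned above. It is only an illustration under my own assumptions: the BroadcastQueue and Client names are hypothetical, and it uses a single pump thread (needs System.Collections.Concurrent and System.Threading.Tasks) rather than 20 threads:

public class BroadcastQueue
{
    // Pending broadcasts; Enqueue() returns quickly so the receive path isn't blocked.
    private readonly BlockingCollection<byte[]> _pending = new BlockingCollection<byte[]>();
    private readonly ConcurrentDictionary<Guid, Client> _clients;

    public BroadcastQueue(ConcurrentDictionary<Guid, Client> clients)
    {
        _clients = clients;
        // Dedicated long-running task that drains the queue and fans messages out.
        Task.Factory.StartNew(Pump, TaskCreationOptions.LongRunning);
    }

    public void Enqueue(string message)
    {
        // Encode once per broadcast, not once per client.
        _pending.Add(System.Text.Encoding.UTF8.GetBytes(message + "\0"));
    }

    private void Pump()
    {
        foreach (byte[] bytesOut in _pending.GetConsumingEnumerable())
        {
            foreach (var c in _clients)
            {
                c.Value.Send(bytesOut, bytesOut.Length);
            }
        }
    }
}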

~1s sounds really slow for a local network. Average LAN latency is 0.3ms. Is Nagle enabled or disabled? I'm guessing it is enabled... so: change that (Socket.NoDelay). That does mean you have to take responsibility for not writing to the socket in an overly-fragmented way, of course - so don't drip the message in character-by-character. Assemble the message to send (or better: multiple outstanding messages) in memory, and send it as a unit.
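To make that concrete, here is a minimal sketch of those two points, assuming the _socket field and OnSocketSent callback from the question:

// Disable Nagle's algorithm so small writes aren't held back waiting to coalesce.
_socket.NoDelay = true;

// Assemble the whole message (or several queued messages) into one buffer
// and hand it to the socket as a single unit, rather than dripping bytes out.
byte[] payload = System.Text.Encoding.UTF8.GetBytes(message + "\0");
_socket.BeginSend(payload, 0, payload.Length, SocketFlags.None, OnSocketSent, null);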

Related

How to increase processor utilization for a specific action

I have an application that uses the PcapDotNet DLLs to send packets.
For some Pcap files with a high speed rate (~50 Mbit/s or ~9000 packets per second), playback takes a long time compared to the original Pcap duration, and I can see that only ~25% of my CPU (I have 4 cores) is utilized.
I asked this question on the project page and the developer suggested I parallelize my program so I can better utilize my resources, because this is a single-threaded program.
This is an example of my function that sends the packets, and it does most of the work. The function also includes several events, like reporting how many packets were sent, etc. (not included in my example).
So my question is: assuming this function is the one doing most of the work, how can I divide my CPU resources in a better way?
using (PacketSendBuffer sendBuffer = new PacketSendBuffer((uint)capLength))
{
    while (inputCommunicator.ReceivePacket(out packet) == PacketCommunicatorReceiveResult.Ok) // Read the packets from the file
    {
        if (packet != null)
        {
            try
            {
                _outputCommunicator.SendPacket(packet); // Send the packets
                _sentPackets++;
            }
            catch (Exception ex)
            {
                // Throw exception
            }
        }
    }
}
Are you sure you're not spinning? What's the read timeout on your inputCommunicator? It might be a better idea to use callbacks rather than infinite loops to read the packets:
http://pcapdotnet.codeplex.com/wikipage?title=Pcap.Net%20Tutorial%20-%20Opening%20an%20adapter%20and%20capturing%20the%20packets
Or you could use asynchronous I/O instead, given that you're not actually doing any real CPU work.
In any case, try profiling the application. Find out where it spends time. Find any concurrency issues. You'll have a clearer picture then.
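As a rough illustration of the callback style from that tutorial (a sketch only, reusing the inputCommunicator and _outputCommunicator fields from your snippet; check the exact ReceivePackets signature against the version of PcapDotNet you're using):

// Let PcapDotNet drive the loop and call HandlePacket once per packet;
// a count of 0 means "keep reading until the file/capture ends".
private void PlayFile()
{
    inputCommunicator.ReceivePackets(0, HandlePacket);
}

private void HandlePacket(Packet packet)
{
    _outputCommunicator.SendPacket(packet); // forward the packet
    _sentPackets++;
}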

TCP segments disappearing

I've run into a problem that googling can't seem to solve. To keep it simple: I have a client written in C# and a server running Linux written in C. The client calls Send(buffer) in a loop 100 times. The problem is that the server receives only a dozen of them. If I put a big enough sleep in the loop, everything turns out fine. The buffer is small - about 30 B. I read about Nagle's algorithm and ACK delay, but it doesn't answer my problem.
for (int i = 0; i < 100; i++)
{
    try
    {
        client.Send(oneBuffer, 0, oneBuffer.Length, SocketFlags.None);
    }
    catch (SocketException socE)
    {
        if ((socE.SocketErrorCode == SocketError.WouldBlock)
            || (socE.SocketErrorCode == SocketError.NoBufferSpaceAvailable)
            || (socE.SocketErrorCode == SocketError.IOPending))
        {
            Console.WriteLine("Never happens :(");
        }
    }
    Thread.Sleep(100); // problem solver but why??
}
It looks like the send buffer gets full and rejects data until it empties again, in both blocking and non-blocking mode. Even better, I never get any exception!? I would expect some of those exceptions to be raised, but nothing. :( Any ideas? Thanks in advance.
TCP is stream oriented. This means that recv can read any number of bytes between one and the total number of bytes outstanding (sent but not yet read). "Messages" do not exist: sent buffers can be split or merged.
There is no way to get message behavior from TCP, and no single recv call can be made to return at least N bytes. Message semantics are constructed by the application protocol, often by using fixed-size messages or a length prefix. You can read at least N bytes by doing a read loop.
Remove that assumption from your code.
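For illustration, a minimal sketch of such a read loop on the C# side (the same idea applies to recv on the C side); this assumes a connected System.Net.Sockets.Socket:

// Read exactly 'count' bytes into 'buffer', looping until it is full.
// Returns false if the peer closes the connection before enough data arrives.
static bool ReadExactly(Socket socket, byte[] buffer, int count)
{
    int offset = 0;
    while (offset < count)
    {
        int read = socket.Receive(buffer, offset, count - offset, SocketFlags.None);
        if (read == 0)
            return false; // connection closed by the peer
        offset += read;
    }
    return true;
}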
I think this issue is due to the Nagle algorithm:
The Nagle algorithm is designed to reduce network traffic by causing the socket to buffer small packets and then combine and send them in one packet under certain circumstances. A TCP packet consists of 40 bytes of header plus the data being sent. When small packets of data are sent with TCP, the overhead resulting from the TCP header can become a significant part of the network traffic. On heavily loaded networks, the congestion resulting from this overhead can result in lost datagrams and retransmissions, as well as excessive propagation time caused by congestion. The Nagle algorithm inhibits the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged.
Calling the client.Send function doesn't mean a TCP segment will be sent immediately.
In your case, as the buffers are small, the Nagle algorithm will regroup them into larger segments. Check on the server side that the dozen buffers received contain all of the data.
When you add a Thread.Sleep(100), you will receive 100 packets on the server side because the Nagle algorithm won't wait any longer for further data.
If you really need low latency in your application, you can explicitly disable the Nagle algorithm for your TcpClient: set the NoDelay property to true. Add this line at the beginning of your code:
client.NoDelay = true;
I was naive to think there was a problem with the TCP stack. It was my server code. Somewhere in between the data manipulation I used the strncpy() function on the buffer that stores messages. Every message contained \0 at the end. strncpy() copied only the first message (the first string) out of the buffer, regardless of the count that was given (the buffer length). That made me think I had lost messages.
When I used a delay between send() calls on the client, messages didn't get buffered, so strncpy() worked on a buffer with one message and everything went smoothly. That "phenomenon" led me to think that the rate of send calls was causing my problems.
Again, thanks for the help; your comments made me wonder. :)

C# Begin Send within a foreach loop issue

I have a group of "Packets", which are custom classes that are converted to byte[] and then sent to the client. When a client joins, they are updated with the previous "Catch Up Packets" that were sent before the user joined. Think of it as a chat room where you are brought up to date with the previous conversations.
My issue is that on the client end we do not receive all of the information; sometimes none at all.
Below is pseudo C# code for what I see; the code looks like this.
lock (CatchUpQueue.SyncRoot)
{
    foreach (Packet packet in CatchUpQueue)
    {
        // If I put Console.WriteLine("I am Sending Packets"); here, it works fine with up to 2 client sockets, otherwise it fails again.
        clientSocket.BeginSend(data, 0, data.Length, SocketFlags.None, new AsyncCallback(EndSend), data);
    }
}
Is this some sort of throttling issue, or an issue with sending too many times, i.e. if there are 4 packets in the queue then it calls BeginSend 4 times?
I have searched for a similar topic and cannot find one. Thank you for your help.
Edit: I would also like to point out that the sending between clients continues normally for any sends after the client connects. But for some reason the packets within this for loop are not all sent.
I would suspect that you are flooding the TCP port with packets, and probably overflowing its send buffer, at which point it will probably return errors rather than sending the data.
The idea of Async I/O is not to allow you to send an infinite amount of data packets simultaneously, but to allow your foreground thread to continue processing while a linear sequence of one or more I/O operations occurs in the background.
As the TCP stream is a serial stream, try respecting that and send each packet in turn. That is, after BeginSend, use the Async callback to detect when the Send has completed before you send again. You are effectively doing this by adding a Sleep, but this is not a very good solution (you will either be sending packets more slowly than possible, or you may not sleep for long enough and packets will be lost again)
Or, if you don't need the I/O to run in the background, use your simple foreach loop, but use a synchronous rather than Async send.
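Here is a minimal sketch of that chaining idea, under my own assumptions (a simple queue of already-serialized byte[] payloads, no locking shown, and hypothetical member names):

// Keep only one send in flight: the completion callback for each BeginSend
// dequeues the next payload and starts the next send.
private readonly Queue<byte[]> _sendQueue = new Queue<byte[]>();

private void SendNext()
{
    if (_sendQueue.Count == 0)
        return;
    byte[] data = _sendQueue.Dequeue();
    clientSocket.BeginSend(data, 0, data.Length, SocketFlags.None, OnSendCompleted, null);
}

private void OnSendCompleted(IAsyncResult ar)
{
    clientSocket.EndSend(ar); // complete the previous send
    SendNext();               // then start the next one
}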
Okay,
Apparently a fix, which still has me confused, is to Thread.Sleep after each send for a number of milliseconds equal to the number of packets I am sending.
So...
for (int i = 0; i < PacketQueue.Count; i++)
{
    Packet packet = PacketQueue[i];
    clientSocket.BeginSend(data, 0, data.Length, SocketFlags.None, new AsyncCallback(EndSend), data);
    Thread.Sleep(PacketQueue.Count);
}
I assume that for some reason the loop stops some of the calls from happening... Well I will continue to work with this and try to find the real answer.

MSMQ Receive() method timeout

My original question from a while ago is MSMQ Slow Queue Reading; however, I have advanced from that and now think I understand the problem a bit more clearly.
My code (well actually part of an open source library I am using) looks like this:
queue.Receive(TimeSpan.FromSeconds(10), MessageQueueTransactionType.Automatic);
This uses the System.Messaging.MessageQueue.Receive function, and queue is a MessageQueue. The problem is as follows.
The above line of code will be called with the specified timeout (10 seconds). The Receive(...) function is a blocking function, and is supposed to block until a message arrives in the queue at which time it will return. If no message is received before the timeout is hit, it will return at the timeout. If a message is in the queue when the function is called, it will return that message immediately.
However, what is happening is the Receive(...) function is being called, seeing that there is no message in the queue, and hence waiting for a new message to come in. When a new message comes in (before the timeout), it isn't detecting this new message and continues waiting. The timeout is eventually hit, at which point the code continues and calls Receive(...) again, where it picks up the message and processes it.
Now, this problem only occurs after a number of days/weeks. I can make it work normally again by deleting & recreating the queue. It happens on different computers, and different queues. So it seems like something is building up, until some point when it breaks the triggering/notification ability that the Receive(...) function uses.
I've checked a lot of different things, and everything seems normal & isn't different from a queue that is working normally. There is plenty of disk space (13gig free) and RAM (about 350MB free out of 1GB from what I can tell). I have checked registry entries which all appear the same as other queues, and the performance monitor doesn't show anything out of the normal. I have also run the TMQ tool and can't see anything noticably wrong from that.
I am using Windows XP on all the machines and they all have service pack 3 installed. I am not sending a large amount of messages to the queues, at most it would be 1 every 2 seconds but generally a lot less frequent than that. The messages are only small too and nowhere near the 4MB limit.
The only thing I have just noticed is that the p0000001.mq and r0000067.mq files in C:\WINDOWS\system32\msmq\storage are both 4,096 KB; however, they are that size on other computers as well, which are not currently experiencing the problem. The problem does not happen to every queue on the computer at once, as I can recreate one problem queue on the computer and the other queues still experience the problem.
I am not very experienced with MSMQ so if you post possible things to check can you please explain how to check them or where I can find more details on what you are talking about.
Currently the situation is:
ComputerA - 4 queues normal
ComputerB - 2 queues experiencing problem, 1 queue normal
ComputerC - 2 queues experiencing problem
ComputerD - 1 queue normal
ComputerE - 2 queues normal
So I have a large number of computers/queues to compare and test against.
Any particular reason you aren't using an event handler to listen to the queues? The System.Messaging library allows you to attach a handler to a queue instead of, if I understand what you are doing correctly, looping Receive every 10 seconds. Try something like this:
class MSMQListener
{
    public void StartListening(string queuePath)
    {
        MessageQueue msQueue = new MessageQueue(queuePath);
        msQueue.ReceiveCompleted += QueueMessageReceived;
        msQueue.BeginReceive();
    }

    private void QueueMessageReceived(object source, ReceiveCompletedEventArgs args)
    {
        MessageQueue msQueue = (MessageQueue)source;

        // once a message is received, stop receiving
        Message msMessage = null;
        msMessage = msQueue.EndReceive(args.AsyncResult);

        // do something with the message

        // begin receiving again
        msQueue.BeginReceive();
    }
}
We are also using NServiceBus and had a similar problem inside our network.
Basically, MSMQ is using UDP with two-phase commits. After a message is received, it has to be acknowledged. Until it is acknowledged, it cannot be received on the client side as the receive transaction hasn't been finalized.
This was caused by different things at different times for us:
once, it was due to the Distributed Transaction Coordinator being unable to communicate between machines because of a firewall misconfiguration
another time, we were using cloned virtual machines without sysprep, which made the internal MSMQ ids non-unique and caused a message to be received on one machine and acknowledged on another. Eventually MSMQ figures things out, but it takes quite a while.
Try this overloaded function:
public Message Receive(TimeSpan timeout, Cursor cursor)
To get a cursor for a MessageQueue, call the CreateCursor method for that queue.
A Cursor is used with such methods as Peek(TimeSpan, Cursor, PeekAction) and Receive(TimeSpan, Cursor) when you need to read messages that are not at the front of the queue. This includes reading messages synchronously or asynchronously. Cursors do not need to be used to read only the first message in a queue.
When reading messages within a transaction, Message Queuing does not roll back cursor movement if the transaction is aborted. For example, suppose there is a queue with two messages, A1 and A2. If you remove message A1 while in a transaction, Message Queuing moves the cursor to message A2. However, if the transaction is aborted for any reason, message A1 is inserted back into the queue but the cursor remains pointing at message A2.
To close the cursor, call Close.
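A minimal sketch of what that can look like (the queue path here is only a placeholder):

// Receive using an explicit cursor rather than the queue's implicit one.
var queue = new System.Messaging.MessageQueue(@".\private$\myqueue");
var cursor = queue.CreateCursor();
try
{
    var message = queue.Receive(TimeSpan.FromSeconds(10), cursor);
    // process the message...
}
finally
{
    cursor.Close(); // close the cursor when done, as the documentation recommends
}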
If you want something completely synchronous and without events, you can try this method:
public object Receive(string path, int millisecondsTimeout)
{
    var mq = new System.Messaging.MessageQueue(path);
    var asyncResult = mq.BeginReceive();
    var handles = new System.Threading.WaitHandle[] { asyncResult.AsyncWaitHandle };
    var index = System.Threading.WaitHandle.WaitAny(handles, millisecondsTimeout);
    if (index == System.Threading.WaitHandle.WaitTimeout) // 258: the wait timed out
    {
        mq.Close();
        return null;
    }
    var result = mq.EndReceive(asyncResult);
    return result;
}

How do I obtain the latency between server and client in C#?

I'm working on a C# server application for a game engine I'm writing in ActionScript 3. I'm using an authoritative server model to prevent cheating and ensure a fair game. So far, everything works well:
When the client begins moving, it tells the server and starts rendering locally; the server then tells everyone else that client X has begun moving, along with details so they can also begin rendering. When the client stops moving, it tells the server, which performs calculations based on the time the client began moving and the client's render tick delay, and replies to everyone so they can update with the correct values.
The thing is, when I use the default 20ms tick delay in the server calculations, when the client moves for a rather long distance there's a noticeable tilt forward when it stops. If I increase the delay slightly to 22ms, everything runs very smoothly on my local network, but in other locations the tilt is still there. After experimenting a little, I noticed that the extra delay needed is pretty much tied to the latency between client and server. I even boiled it down to a formula that works quite nicely: delay = 20 + (latency / 10).
So, how would I go about obtaining the latency between a certain client and the server (I'm using asynchronous sockets)? The CPU effort can't be too much, so as not to have the server run slowly. Also, is this really the best way, or is there a more efficient/easier way to do this?
Sorry that this isn't directly answering your question, but generally speaking you shouldn't rely too heavily on measuring latency because it can be quite variable. Not only that, you don't know if the ping time you measure is even symmetrical, which is important. There's no point applying 10ms of latency correction if it turns out that the ping time of 20ms is actually 19ms from server to client and 1ms from client to server. And latency in application terms is not the same as in networking terms - you may be able to ping a certain machine and get a response in 20ms but if you're contacting a server on that machine that only processes network input 50 times a second then your responses will be delayed by an extra 0 to 20ms, and this will vary rather unpredictably.
That's not to say latency measurement doesn't have a place in smoothing predictions out, but it's not going to solve your problem, just clean it up a bit.
On the face of it, the problem here seems to be that you're sent information in the first message, which you use to extrapolate data until the last message is received. If all else stays constant, then the movement vector given in the first message, multiplied by the time between the messages, will give the server the correct end position that the client was in at roughly now-(latency/2). But if the latency changes at all, the time between the messages will grow or shrink. The client may know he's moved 10 units, but the server simulated him moving 9 or 11 units before being told to snap him back to 10 units.
The general solution to this is to not assume that latency will stay constant but to send periodic position updates, which allow the server to verify and correct the client's position. With just 2 messages as you have now, all the error is found and corrected after the 2nd message. With more messages, the error is spread over many more sample points allowing for smoother and less visible correction.
It can never be perfect though: all it takes is a lag spike in the last millisecond of movement and the server's representation will overshoot. You can't get around that if you're predicting future movement based on past events, as there's no real alternative to choosing either correct-but-late or incorrect-but-timely since information takes time to travel. (Blame Einstein.)
One thing to keep in mind when using ICMP based pings is that networking equipment will often give ICMP traffic lower priority than normal packets, especially when the packets cross network boundaries such as WAN links. This can lead to pings being dropped or showing higher latency than traffic is actually experiencing and lends itself to being an indicator of problems rather than a measurement tool.
The increasing use of Quality of Service (QoS) in networks only exacerbates this, and as a consequence, though ping still remains a useful tool, it needs to be understood that it may not be a true reflection of the network latency experienced by real, non-ICMP traffic.
There is a good post at the Itrinegy blog How do you measure Latency (RTT) in a network these days? about this.
You could use the already available Ping Class. Should be preferred over writing your own IMHO.
Have a "ping" command, where you send a message from the server to the client, then time how long it takes to get a response. Barring CPU overload scenarios, it should be pretty reliable. To get the one-way trip time, just divide the time by 2.
We can measure the round-trip time using the Ping class of the .NET Framework.
Instantiate a Ping and subscribe to the PingCompleted event:
Ping pingSender = new Ping();
pingSender.PingCompleted += PingCompletedCallback;
Add code to configure and send the ping.
Our PingCompleted event handler (PingCompletedEventHandler) has a PingCompletedEventArgs argument. The PingCompletedEventArgs.Reply gets us a PingReply object. PingReply.RoundtripTime returns the round trip time (the "number of milliseconds taken to send an Internet Control Message Protocol (ICMP) echo request and receive the corresponding ICMP echo reply message"):
public static void PingCompletedCallback(object sender, PingCompletedEventArgs e)
{
    ...
    Console.WriteLine($"Roundtrip Time: {e.Reply.RoundtripTime}");
    ...
}
Code-dump of a full working example, based on MSDN's example. I have modified it to write the RTT to the console:
// Requires: using System; using System.Text; using System.Threading; using System.Net.NetworkInformation;
public static void Main(string[] args)
{
    string who = "www.google.com";
    AutoResetEvent waiter = new AutoResetEvent(false);

    Ping pingSender = new Ping();

    // When the PingCompleted event is raised,
    // the PingCompletedCallback method is called.
    pingSender.PingCompleted += PingCompletedCallback;

    // Create a buffer of 32 bytes of data to be transmitted.
    string data = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
    byte[] buffer = Encoding.ASCII.GetBytes(data);

    // Wait 12 seconds for a reply.
    int timeout = 12000;

    // Set options for transmission:
    // The data can go through 64 gateways or routers
    // before it is destroyed, and the data packet
    // cannot be fragmented.
    PingOptions options = new PingOptions(64, true);

    Console.WriteLine("Time to live: {0}", options.Ttl);
    Console.WriteLine("Don't fragment: {0}", options.DontFragment);

    // Send the ping asynchronously.
    // Use the waiter as the user token.
    // When the callback completes, it can wake up this thread.
    pingSender.SendAsync(who, timeout, buffer, options, waiter);

    // Prevent this example application from ending.
    // A real application should do something useful
    // when possible.
    waiter.WaitOne();
    Console.WriteLine("Ping example completed.");
}

public static void PingCompletedCallback(object sender, PingCompletedEventArgs e)
{
    // If the operation was canceled, display a message to the user.
    if (e.Cancelled)
    {
        Console.WriteLine("Ping canceled.");

        // Let the main thread resume.
        // UserToken is the AutoResetEvent object that the main thread
        // is waiting for.
        ((AutoResetEvent)e.UserState).Set();
        return;
    }

    // If an error occurred, display the exception to the user.
    if (e.Error != null)
    {
        Console.WriteLine("Ping failed:");
        Console.WriteLine(e.Error.ToString());

        // Let the main thread resume.
        ((AutoResetEvent)e.UserState).Set();
        return;
    }

    Console.WriteLine($"Roundtrip Time: {e.Reply.RoundtripTime}");

    // Let the main thread resume.
    ((AutoResetEvent)e.UserState).Set();
}
You might want to perform several pings and then calculate an average, depending on your requirements of course.
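For example, a small synchronous sketch of that (the host name, attempt count, and per-attempt timeout are placeholders; uses System.Net.NetworkInformation):

// Average the ICMP round-trip time over several attempts.
static double AverageRoundtripMs(string host, int attempts)
{
    using (var ping = new Ping())
    {
        long total = 0;
        int successes = 0;
        for (int i = 0; i < attempts; i++)
        {
            PingReply reply = ping.Send(host, 1000); // 1 second timeout per attempt
            if (reply.Status == IPStatus.Success)
            {
                total += reply.RoundtripTime;
                successes++;
            }
        }
        return successes > 0 ? (double)total / successes : -1; // -1: no successful replies
    }
}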
