I have an async socket server written in C#. My users often complain that they are disconnected from the server for no apparent reason. Other internet applications on their computers (MSN, Skype, etc.) appear to stay connected, so it does not seem to be a problem with their internet connection. Can anyone provide some information on how to make the connection between client and server rock solid (i.e. prevent frequent disconnections), or where I can look for the cause of the disconnections?
Network Address Translators (aka "home routers", "wireless gateways", etc.) that share an ISP-assigned IP address among multiple computers may be at play here. NATs keep a table of active connections (sometimes known as a "port mapping table") that tracks how to translate the ip:port pair for active TCP connections and observed UDP sessions. Some brands of routers are known to clear a port mapping out of memory if they don't observe any traffic on the TCP connection for some period of time. Apps like MSN Messenger and Skype mitigate this by periodically sending "keep-alive" messages between client and server, and over their other persistent connections. These messages go in both directions.
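To make the keep-alive idea concrete, here is a minimal sketch in C (the language of the Linux `setsockopt` snippet further down this page). It assumes a hypothetical one-byte keep-alive message that the peer's protocol knows to discard, and an `interval` chosen below the router's idle timeout, which you generally don't know, so err on the short side. Since the messages go in both directions, each side of the connection would run the same logic.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical keep-alive message: a single byte the peer must recognise
 * and discard. */
#define KEEPALIVE_BYTE 0x00

struct conn {
    int fd;
    time_t last_sent;   /* last time anything was written to fd */
};

/* True when nothing has been sent for at least `interval` seconds. */
bool keepalive_due(const struct conn *c, time_t now, time_t interval)
{
    return now - c->last_sent >= interval;
}

/* Call periodically, e.g. from a timer. Sends one keep-alive byte if the
 * connection has been idle long enough. Returns 1 if sent, 0 if not needed,
 * -1 if send() failed (treat the connection as broken). */
int maybe_send_keepalive(struct conn *c, time_t now, time_t interval)
{
    if (!keepalive_due(c, now, interval))
        return 0;
    unsigned char b = KEEPALIVE_BYTE;
    if (send(c->fd, &b, 1, 0) != 1)
        return -1;
    c->last_sent = now;
    return 1;
}
```

Normal data written on the connection should also update `last_sent`, so keep-alives only go out when the link is otherwise idle.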
You didn't give much information to go on, so the NAT problem is just one possibility out of many.
The best way to diagnose connectivity failures is to invest heavily in logging network messages (both on client and server), socket events, and all return values from the socket APIs. That way, if a customer reports a problem, he can send you his logs - and hopefully you can fetch the corresponding server logs for comparison. You can then diagnose where the disconnect originated from and/or what socket error reset the connection. There's a high probability something in your code is triggering the disconnect.
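As a sketch of what such logging might look like, here is a C wrapper around `recv()` that records every outcome with a timestamp, including the `errno` text on failure. The name `logged_recv` and the log format are illustrative assumptions; the same idea applies to `send()`, `connect()`, `accept()` and close events, and in .NET to the `SocketException` error codes your async callbacks see.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Wraps recv() and writes every outcome to `log`, including the errno text
 * on failure, so client and server logs can later be lined up by time. */
ssize_t logged_recv(FILE *log, int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    int err = errno;                      /* save before any other call */
    time_t now = time(NULL);

    if (n > 0)
        fprintf(log, "%ld fd=%d recv %zd bytes\n", (long)now, fd, n);
    else if (n == 0)
        fprintf(log, "%ld fd=%d recv 0: peer closed the connection\n",
                (long)now, fd);
    else
        fprintf(log, "%ld fd=%d recv failed: errno=%d (%s)\n",
                (long)now, fd, err, strerror(err));
    fflush(log);
    errno = err;                          /* let the caller inspect it too */
    return n;
}
```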
I assume you are talking about TCP here. The socket library your clients' application uses is pretty much the same as the one they use to connect to Skype or MSN. Usually the reason for a disconnect is that one of the parties closes the socket (e.g. after the connection has been idle for some time). Check the places in your code where you close your connections.
Related
I created extended TCP server and TCP client classes (in C#) for network communication in my project.
As far as I understand, a client cannot really know whether a server is down unless it requests something that expects a reply and does not get one.
In our application, time and availability (of the server) are critical factors, as it involves heavy machinery for automation. Hence, according to the design discussion, the server is supposed to send a "Heart Beat" periodically so that if a client does not receive anything from the server for a period of time, it will:
Attempt its own recovery actions, and if that still fails,
Raise an alarm to the service officer in the control room.
I am supposed to implement the "heart beat" part in the server, and I have a simple implementation for creating the "Heart Beat":
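```csharp
public void SendHeartBeatToAllClients(byte[] hbdata) {
    // Copy the sockets to a list first so the client collection can change during the loop
    foreach (Socket socket in clientNoSocketList.Select(x => x.Value).ToList())
        socket.Send(hbdata);
}
```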
So far it works fine, but one thing worries me: the heartbeat data (hbdata) is short (only a few pre-arranged bytes, to save time when talking to many machines) and self-defined. Since the server also sends other data besides hbdata, and given possible latency or other unexpected cases, there is always a chance that hbdata gets mixed up with that data. Also, in my "heart beat" implementation, the client does not need to reply anything to the server.
So here are my questions:
Is my worry not well-grounded (as it is fine so far)? Is there any flaw?
Is Ping a better or a common way to have such heart beat functionality over TCP? Why or why not?
If Ping is to be implemented, considering that Ping has a reply, is there a way to implement a reply-less Ping?
Any suggestions for making the heartbeat robust enough while using as little data as possible?
Your first question is probably the hardest to answer. Can you provide a little more detail? Why do you think your server can't handle sending more than a few bytes? Are we talking about thousands of machines here? Is everything on a local LAN, or does this go across multiple networks, or the internet?
Ping is an ICMP echo request. Ping is very commonly used by network monitoring software, etc., to check that clients are online. Typically you do not need to implement your own if you are just pinging for network access (see: https://msdn.microsoft.com/en-us/library/system.net.networkinformation.ping(v=vs.110).aspx).
Also note that ping does not run over TCP at all, but over ICMP, a different protocol used for network diagnostics, among other things. But that brings me to your third question...
Ping without a reply is kind of pointless. For what you have in mind, I think the protocol you want is UDP: you can send an arbitrary datagram with no need for any kind of handshake or reply (TCP by definition involves establishing a session with a handshake); it just sends. These would be Sockets with SocketType.Dgram instead of SocketType.Stream, and ProtocolType.Udp instead of Tcp or Icmp. If you want to get a little more involved, you can use broadcast to send the same thing to the entire LAN, or multicast to send it to a specific group of clients.
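A minimal C sketch of a fire-and-forget UDP heartbeat. `SOCK_DGRAM` with `IPPROTO_UDP` is the native counterpart of .NET's `SocketType.Dgram` / `ProtocolType.Udp`; the function names and the `"HB"` payload are hypothetical. UDP gives no delivery guarantee, so a receiver should tolerate an occasional missing heartbeat rather than raise an alarm on the first one.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens a UDP socket; no connect() or handshake is needed to send. */
int open_udp(void)
{
    return socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
}

/* Sends one heartbeat datagram to ip:port and expects no reply.
 * Returns 0 on success, -1 on failure. For a LAN-wide broadcast, enable
 * SO_BROADCAST on fd first and use a broadcast address. */
int send_udp_heartbeat(int fd, const char *ip, unsigned short port,
                       const void *hb, size_t len)
{
    struct sockaddr_in to;
    memset(&to, 0, sizeof to);
    to.sin_family = AF_INET;
    to.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &to.sin_addr) != 1)
        return -1;
    ssize_t n = sendto(fd, hb, len, 0, (struct sockaddr *)&to, sizeof to);
    return n == (ssize_t)len ? 0 : -1;
}
```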
Again, are you sure you need to be that concerned about limiting traffic, etc here?
Personally, I would flip it around, and have the clients "Check In" at a set interval, reporting a status code to the server. If the server notices a client hasn't checked in for a while, it should send a message to the client and expect a reply.
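The "check in" design can be sketched as a table of last-seen timestamps on the server. This is a minimal sketch assuming small integer client ids and a hypothetical fixed capacity `MAX_CLIENTS`; a real server would key the table by connection. Clients that have been silent too long are the ones the server should message and expect a reply from.

```c
#include <stddef.h>
#include <time.h>

#define MAX_CLIENTS 64   /* hypothetical fixed capacity for the sketch */

struct checkin_table {
    time_t last_seen[MAX_CLIENTS];   /* 0 = slot unused */
};

/* Records a status report ("check in") from client `id`. */
void record_checkin(struct checkin_table *t, int id, time_t now)
{
    if (id >= 0 && id < MAX_CLIENTS)
        t->last_seen[id] = now;
}

/* Writes the ids of clients silent for more than `max_silence` seconds into
 * out (at most cap entries) and returns how many were found. */
size_t find_silent_clients(const struct checkin_table *t, time_t now,
                           time_t max_silence, int *out, size_t cap)
{
    size_t n = 0;
    for (int id = 0; id < MAX_CLIENTS && n < cap; id++)
        if (t->last_seen[id] != 0 && now - t->last_seen[id] > max_silence)
            out[n++] = id;
    return n;
}
```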
If you really are having issues scaling that up, I would have the server send the "Heart beats" via UDP at a set interval, and if the client thinks it's missing them, have a mechanism for it to hit the server and ask for a reply - and then if it doesn't get a response, raise the alarm.
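The client-side logic described here (and in the question: attempt recovery, then raise an alarm) maps naturally onto a small state machine. A sketch with hypothetical names; the timeouts are parameters because suitable values depend on the heartbeat interval.

```c
#include <time.h>

enum hb_state { HB_OK, HB_PROBING, HB_ALARM };

struct watchdog {
    enum hb_state state;
    time_t last_heartbeat;   /* last heartbeat (or probe reply) received */
    time_t probe_sent;       /* when the explicit request to the server went out */
    time_t hb_timeout;       /* silence allowed before probing the server */
    time_t probe_timeout;    /* time allowed for the probe reply */
};

/* Call whenever a heartbeat or a reply to the probe arrives. */
void wd_heard(struct watchdog *w, time_t now)
{
    w->last_heartbeat = now;
    w->state = HB_OK;
}

/* Call periodically and return the current state. On a transition to
 * HB_PROBING the caller should ask the server for a reply (and may start
 * its own recovery); on HB_ALARM it should raise the alarm. */
enum hb_state wd_tick(struct watchdog *w, time_t now)
{
    if (w->state == HB_OK && now - w->last_heartbeat > w->hb_timeout) {
        w->state = HB_PROBING;
        w->probe_sent = now;
    } else if (w->state == HB_PROBING &&
               now - w->probe_sent > w->probe_timeout) {
        w->state = HB_ALARM;
    }
    return w->state;
}
```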
Edit: just saw Prabhu's answer - he's right, ping will only tell you if the computer is up, you definitely want something inside the actual application to report back, not just the status of the network connection.
in my "heart beat" implementation, the client does not need to reply anything to the server.
Application-level keep-alives need to be two-way, don't they? What the above enables is that clients can be sure the server is alive and healthy when they receive the heartbeat. But if the client does not respond, the server will not know the true status of the client. If a client becomes unreachable, heartbeats pile up in the server's send buffer, and the server application will be oblivious to the fact.
Is my worry not well-grounded (as it is fine so far)? Is there any flaw?
Small heartbeat messages shouldn't be a problem; it's better for heartbeats to be small.
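The worry in the question about the heartbeat bytes being "mixed up" with other data is really a framing problem: TCP delivers a byte stream, not messages, so a few pre-arranged bytes can also appear inside ordinary payload or be split across reads. A common fix is to prefix every message, heartbeat included, with a type and a length. The sketch below assumes a hypothetical 3-byte header (1 type byte, 2-byte big-endian length); a heartbeat is then just a header with length 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MSG_HEARTBEAT = 1, MSG_DATA = 2 };   /* hypothetical message types */
#define HDR_LEN 3                            /* type (1) + length (2, big-endian) */

/* Writes header + payload into out; returns total bytes, or 0 if it won't fit. */
size_t encode_msg(uint8_t type, const uint8_t *payload, uint16_t len,
                  uint8_t *out, size_t cap)
{
    if (cap < HDR_LEN + (size_t)len)
        return 0;
    out[0] = type;
    out[1] = (uint8_t)(len >> 8);
    out[2] = (uint8_t)(len & 0xFF);
    if (len > 0)
        memcpy(out + HDR_LEN, payload, len);
    return HDR_LEN + (size_t)len;
}

/* Tries to decode one complete message from the front of buf (the bytes
 * received so far). Returns bytes consumed, or 0 if more data is needed. */
size_t decode_msg(const uint8_t *buf, size_t avail, uint8_t *type,
                  const uint8_t **payload, uint16_t *len)
{
    if (avail < HDR_LEN)
        return 0;
    uint16_t n = (uint16_t)((buf[1] << 8) | buf[2]);
    if (avail < HDR_LEN + (size_t)n)
        return 0;                            /* message split across reads */
    *type = buf[0];
    *len = n;
    *payload = buf + HDR_LEN;
    return HDR_LEN + (size_t)n;
}
```

The receiver appends whatever each `recv()` returns to a buffer, calls `decode_msg` in a loop until it returns 0, and treats any `MSG_HEARTBEAT` as proof of life without passing it on to the application.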
Is Ping a better or a common way to have such heart beat functionality over TCP? Why or why not?
Ping will be positive even if the client application is down but the system is healthy.
I'm new to sockets and have a couple of questions on their usage in .NET. This is a consumer program so there won't be any scaling issues as the user runs the server and client.
1) Is it better to keep a socket connection open until the server is closed, or should I open a connection only when the user requests it and close it upon completion? It's not a real time game so requests would be intermittent, but are there any downsides to leaving the socket connection open?
2) Do sockets require the user to have admin rights if they're running the server? I looked around and it seemed that RAW sockets do, but I plan on using Stream or Dgram instead depending on which works best for my program.
If you're talking about a single socket, then no, it's not a big deal to leave it open. There are lots of ports available, and if your socket is just sitting in a wait state it's going to consume a negligible amount of system resources.
TCP and UDP socket connections do not require admin rights to open. However, depending on the user's firewall settings, a firewall exception may be required to allow your application to make an outside connection, and depending on the firewall software, that may or may not require admin rights.
I have a client server situation where the client opens a TCP socket to the server, and sometimes long periods of time will pass with no data being sent between them. I have encountered an issue where the server tries to send data to the client, and it seems to be successful, but the client never receives it, and after a few minutes, it looks like the client then gets disconnected.
Do I need to send some kind of keep alive packet every once in a while?
Edit: Note that both peers are on the same computer. The computer is behind a NAT that forwards a range of ports to it. The client opens the connection to the server via DNS, i.e. it connects using mydomain.net and the port.
On Windows, sockets with no data sent are a big source of trouble in many applications and must be handled correctly.
The problem is that SO_KEEPALIVE's period can only be set system-wide (otherwise the default is a useless two hours) or with the later Winsock API.
Therefore, many applications send an occasional byte of data (to be disregarded by the peer) purely so that the network layer declares a disconnection when no ACK is received (after the layer has done all due retransmissions and the ACK timeout has expired).
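On Linux, the per-socket equivalent of those keep-alive timings is exposed through the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options; on Windows the long-standing per-socket mechanism is the `SIO_KEEPALIVE_VALS` ioctl. The sketch below is Linux-specific, and the helper name `enable_keepalive` is an assumption.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux-specific: enables TCP keep-alive on one socket with its own timings
 * instead of the system-wide default (typically 2 hours of idle time).
 *   idle:  seconds of silence before the first probe
 *   intvl: seconds between unanswered probes
 *   cnt:   unanswered probes before the connection is declared dead
 * Returns 0 on success, -1 if any option could not be set. */
int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) != 0)
        return -1;
    return 0;
}
```

With `enable_keepalive(fd, 60, 10, 5)`, a silent, unresponsive peer would be detected after roughly 60 + 5 x 10 = 110 seconds instead of hours.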
Answering your question: no, the sockets do not disconnect automatically.
Still, you must be careful with the above issue. What complicates it further is that this behavior is very hard to test. For example, if you set everything up correctly and expect disconnection to be detected properly, you cannot test it by disconnecting the physical layer. This is because the NIC will sense the carrier loss, and the socket layer will signal all application sockets that relied on it to close. A good way to test it is to connect the two computers through two switches (three cable segments) and disconnect the middle segment, which prevents carrier loss while still physically disconnecting the machines.
There are timeouts you can adjust (see the SendTimeout and ReceiveTimeout properties of the Socket class), but I suspect that is not your problem. A NAT router may also have an expiration time for TCP connections, after which it removes them from its port forwarding table. If no traffic passes within that timeout, the router will block all incoming traffic (since it has cleared the forwarding information from its memory, it does not know which computer to send the traffic to). Also, outgoing traffic on the connection will likely be mapped to a different source port, so the server may not recognize it as the same connection.
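For reference, `SendTimeout` and `ReceiveTimeout` correspond to the `SO_SNDTIMEO` and `SO_RCVTIMEO` socket options: they bound how long a single blocking call waits, not how long TCP keeps an idle connection. A minimal C sketch (the helper names are hypothetical):

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Limits how long one blocking recv() on fd may wait, in milliseconds.
 * SO_SNDTIMEO can be set the same way for the send side. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* True if a recv() result means the timeout expired (no data arrived in
 * time) rather than a real error or an orderly close. */
int recv_timed_out(ssize_t n)
{
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

A timed-out receive leaves the connection itself untouched, so it is a hook for running your own liveness check, not a detection of disconnection.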
It's safer to use the keep-alive option (SO_KEEPALIVE under Linux) to prevent disconnects due to inactivity, but this may generate some extra packets.
This sample code does it under Linux:
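```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int val = 1;
/* .... after creating the socket s: */
if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val)) != 0)
    fprintf(stderr, "setsockopt failure: %s\n", strerror(errno));
```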
Regards.
TCP sockets don't close automatically at all; TCP connections, however, can be dropped. But if this is happening between peers on the same computer, the connection should never be dropped as long as both peers exist and have their sockets open.
I have a C# application that has been running fine for several years. It connects via a TCP/IP socket to a machine that sends me stock trade executions.
Recently, I've tried to deploy it to some machines in a new data center that is behind a hardware firewall, and I've started to see some weird disconnects.
When a disconnect happens, in my app (the client side) I see nothing unusual except that I stop receiving data over the socket. Wireshark confirms that no data is reaching the socket, and my application's receive thread is blocking on the Receive() call when I stop it in the debugger. The socket shows as ESTABLISHED in netstat.
But from the server side, it looks like my client is disconnecting. Looking at their logs, the socket on their end usually ends up with either (nRecvd=-1,errno=104) or (nRecvd=0,errno=11). (104 is connection reset by peer.)
The disconnect only seems to happen after a period of inactivity. I have solved this for now by implementing a heartbeat between my client and their server that just sends a short message every 20 seconds and gets a reply. This has caused the disconnects to drop to 0 over the past few days.
At first, I figured that the hardware firewall was the problem and was causing the socket to time out after inactivity. But the person in charge of the firewall claims that the timeout for connects on this port (8887) is 2160 minutes.
I am running Windows Server 2003 and .NET 3.5. The trades server is a Linux machine (SLES 9, I believe, though I'm not sure).
Any ideas on what might be going on? What could I do to debug this more given that I don't have any access to the firewall logs and no ability to change the code on the trade server?
Thanks,
Mike
What you describe is common, and it's common to implement a heartbeat to keep TCP sockets alive through such firewalls/gateways like you did.
That hardware might have a hard 2160-minute timeout (in my experience 20-30 minutes is more common, though), but connections are usually dropped much more aggressively if there's any kind of load. Such firewalls have limited resources, and when they need to track more connections they tend to drop the oldest connection without any activity, regardless of the hard timeout set.
If you want to debug this further, sniff traffic on the server side of the firewall and see what, if anything, happens when the server gets a disconnect.
I would set up Wireshark on both sides of the firewall to see what happens at the TCP level (and below).
And when the admin says the "timeout for connects" is 2160 minutes, is that the timeout for an idle, established connection? Anything else wouldn't make sense, I guess.
Also, are you using the KeepAlive option for TCP? And does the firewall forward those keep-alive packets or not?
As I said, you probably want to run Wireshark on both sides of the firewall...
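About the server-side log entries in the question: on Linux, errno 104 is `ECONNRESET` and errno 11 is `EAGAIN` (the same value as `EWOULDBLOCK`). `recv()` does not set `errno` when it returns 0, so the `errno=11` next to `nRecvd=0` is most likely a leftover value; `nRecvd=0` by itself means the peer closed the connection in an orderly way. A small sketch of how such log pairs could be interpreted (the enum and function names are hypothetical):

```c
#include <errno.h>
#include <sys/types.h>

enum recv_outcome {
    RECV_DATA,          /* n > 0: bytes arrived */
    RECV_PEER_CLOSED,   /* n == 0: orderly close (peer sent FIN) */
    RECV_RESET,         /* n < 0, ECONNRESET (104 on Linux): RST received */
    RECV_WOULD_BLOCK,   /* n < 0, EAGAIN/EWOULDBLOCK (11 on Linux): no data yet */
    RECV_OTHER_ERROR
};

/* Interprets an (nRecvd, errno) pair logged after recv(). errno is only
 * looked at when recv() returned -1, because it is not set otherwise. */
enum recv_outcome classify_recv(ssize_t nrecvd, int err)
{
    if (nrecvd > 0)
        return RECV_DATA;
    if (nrecvd == 0)
        return RECV_PEER_CLOSED;
    if (err == ECONNRESET)
        return RECV_RESET;
    if (err == EAGAIN || err == EWOULDBLOCK)
        return RECV_WOULD_BLOCK;
    return RECV_OTHER_ERROR;
}
```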
I have three applications that talk to each other using sockets. They can all live on their own machines, but they can also share a machine. Right now two of them are on the same machine and the third is on its own machine. I'm trying to make my communication bulletproof, so I unplug cables and kill the applications to make sure everything works as intended.
Here's a quick sketch of the thing:
Now, when I unplug the network cable to PC2 (the red connection "Con B"), the internal connection (the blue connection "Con A") stops talking: data I send from "App 1" on that socket never reaches "App 2".
I have built a mechanism that detects this, disconnects, and then reconnects; after that I can unplug the cable as often as I want and "Con A" just keeps working. It only happens the first time.
I have confirmed having communication through "Con A" before disconnecting "Con B".
I connect and reconnect exactly the same way, it's the same code, so there's no difference.
What's happening?
Additional information triggered by answers:
PC 1 and PC 2 share addresses down to the last byte.
I have an internal keep alive mechanism, I send a message and expect a response every 10 seconds.
When I kill App 3, this does not happen, only when unplugging the cable.
What address are you using for "Con A"? If you are using an address that is bound to the external network adapter, even though you're talking to the same machine, then what you describe could happen.
What you can do is use the address localhost (127.0.0.1) for "Con A", which should be completely independent of what happens on the external network.
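A minimal C sketch of binding "Con A" to the loopback address only, assuming both apps can agree on the port (here the OS picks it via port 0 and reports it back). In .NET the corresponding address is `IPAddress.Loopback`. The function names are illustrative.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listens on 127.0.0.1 only, so the connection never depends on an external
 * adapter. Pass port 0 to let the OS choose; the port is written to *port_out.
 * Returns the listening socket or -1. */
int listen_loopback(unsigned short port, unsigned short *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* 127.0.0.1 */
    a.sin_port = htons(port);
    socklen_t len = sizeof a;
    if (bind(fd, (struct sockaddr *)&a, sizeof a) != 0 || listen(fd, 8) != 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) != 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(a.sin_port);
    return fd;
}

/* Connects to 127.0.0.1:port; returns the connected socket or -1. */
int connect_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&a, sizeof a) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```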
On some platforms (Windows), pulling the network cable tells the network stack to actively invalidate open socket connections associated with the interface.
In this scenario, pulling a network cable is actually a bad test, because it provides positive feedback to your application that it may not receive in a real-life situation.
One common error people make when writing client/server applications is to not incorporate an application-layer keep-alive, or at least enable keep-alives at the transport layer. Otherwise, an application recv()ing data can be forever oblivious to any failure condition until it write()s and the write fails due to a transport-layer timeout.
Pulling the network cable has different effects depending on the OS you're running. As another poster said, Windows detects it and invalidates any existing connections. Your application should get a connection closed message in that case.
My Linux server on the other hand deals with it quite gracefully. After an extended (30-40 seconds) de-cabling the other day the SSH connection from my laptop to the server was still happily available and responsive.
As long as the cable is not unplugged longer than the TCP timeouts the stack should be able to buffer up packets and retransmit them as soon as possible. TCP is designed for that. If you're not using TCP then the packets will fall out of the Ethernet hole and evaporate into the atmosphere.
#einstein: If you're using select() or derivatives it pays to never select with a NULL timeout. Always have a sensible timeout and check the socket status if it expires.