I have three applications that talk to each other using sockets. Each can live on its own machine, but they can also share a machine. Right now two of them run on the same machine and the third runs on its own. I'm trying to make my communication bulletproof, so I unplug cables and kill the applications to make sure everything works as intended.
Here's a quick sketch of the thing:
Now, when I unplug the network cable to PC2 (the red connection "Con B"), the internal connection stops talking (the blue connection "Con A"). I send stuff from "App 1" on the socket that never gets to "App 2".
I have built a mechanism that detects this, disconnects, and then reconnects. After that I can unplug the cable all I want and "Con A" just keeps working. It only fails that first time.
I have confirmed having communication through "Con A" before disconnecting "Con B".
I connect and reconnect exactly the same way, it's the same code, so there's no difference.
What's happening?
Additional information triggered by answers:
PC 1 and PC 2 have addresses that are identical except for the last byte.
I have an internal keep alive mechanism, I send a message and expect a response every 10 seconds.
When I kill App 3, this does not happen, only when unplugging the cable.
What address are you using for "Con A"? If you are using an address that is bound to the external network adapter, even though you're talking to the same machine, then what you describe could happen.
What you can do is use the address localhost (127.0.0.1) for "Con A", which should be completely independent of what happens on the external network.
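A minimal sketch of what that looks like with POSIX sockets in C (the question does not say which language or platform is in use, so this is only an illustration; loopback_addr and connect_local are hypothetical helper names):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build an IPv4 address that always refers to this machine (127.0.0.1),
 * regardless of what happens to the external network adapter. */
struct sockaddr_in loopback_addr(unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return addr;
}

/* Connect "App 1" to "App 2" over loopback.
 * Returns the connected socket, or -1 on failure. */
int connect_local(unsigned short port)
{
    assert(port != 0); /* port 0 is not a valid destination */
    struct sockaddr_in addr = loopback_addr(port);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For this to work, "App 2" has to listen on 127.0.0.1 or on the wildcard address, not only on the external adapter's address.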
On some platforms (Windows), pulling the network cable tells the network stack to actively invalidate open socket connections associated with the interface.
In this scenario pulling a network cable is actually a bad test because it provides positive feedback to your application that it may not receive in a real life situation.
One common error for people to make when writing client/server applications is to not incorporate an application-layer keep-alive, or at least enable keep-alives at the transport layer. An application recv()ing data can otherwise be forever oblivious to any failure condition until it write()s and the write fails due to a transport-layer timeout.
Pulling the network cable has different effects depending on the OS you're running. As another poster said, Windows detects it and invalidates any existing connections. Your application should get a connection closed message in that case.
My Linux server on the other hand deals with it quite gracefully. After an extended (30-40 seconds) de-cabling the other day the SSH connection from my laptop to the server was still happily available and responsive.
As long as the cable is not unplugged longer than the TCP timeouts the stack should be able to buffer up packets and retransmit them as soon as possible. TCP is designed for that. If you're not using TCP then the packets will fall out of the Ethernet hole and evaporate into the atmosphere.
#einstein: If you're using select() or derivatives it pays to never select with a NULL timeout. Always have a sensible timeout and check the socket status if it expires.
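A small sketch of that advice, assuming POSIX select() (wait_readable is a hypothetical helper name, and the timeout value is up to the caller):

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 if the timeout expired, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* Never pass NULL as the last argument: that would block forever. */
    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

When this returns 0, the caller gets a chance to check the connection, for example by sending a keep-alive message, instead of sitting in select() indefinitely.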
I created extended TCP server and TCP client classes (in C#) for communication over network for my project use.
And as far as I understand, a client cannot really know if a server is down unless it requests something that expects a reply and does not get one.
In our application, time and availability (of the server) are critical factors, as it involves heavy machines for automation. Hence, according to the design discussion, the server is supposed to send its "Heart Beat" periodically, so that if a client does not receive anything from the server for a period of time, it will:
Start to attempt its own recovery actions and if it still fails,
It will raise alarm to the service officer in the control room
I am supposed to implement the "heart beat" part in the server, and I have a simple implementation of it:
public void SendHeartBeatToAllClients(byte[] hbdata) {
    // ToList() takes a snapshot, so the loop is not affected if the client list changes meanwhile.
    foreach (Socket socket in clientNoSocketList.Select(x => x.Value).ToList())
        socket.Send(hbdata);
}
So far it works fine, but one thing worries me. The heartbeat data (hbdata) is short (only a few pre-arranged bytes, to save time when talking to many machines) and self-defined. Since the server also sends other data besides hbdata, and considering possible latency or other unexpected cases, there is always a possibility of hbdata being mixed up with that other data. Also, in my "heart beat" implementation, the client does not need to reply anything to the server.
So here are my questions:
Is my worry not well-grounded (as it is fine so far)? Is there any flaw?
Is Ping a better or a common way to have such heart beat functionality over TCP? Why or why not?
If Ping is to be implemented, considering that Ping has reply, is there a way to implement replyless Ping?
Any suggestion to make the heart beat robust enough yet in the shortest amount of data possible?
This is probably the hardest question to answer. Can you provide a little more detail? Why do you think that your server can't handle sending more than a few bytes? Are we talking thousands of machines here? Is everything on a local LAN, or does this go across multiple networks, or the internet?
Ping is an ICMP echo request - ping is very commonly used by networking monitor software, etc to ensure that clients are online. Typically you do not need to implement your own, if you are just pinging for network access (see: https://msdn.microsoft.com/en-us/library/system.net.networkinformation.ping(v=vs.110).aspx).
Also note that ping is not over TCP at all, but rather ICMP, a somewhat different protocol, used for network diagnostics among other things. But that brings me to number 3...
Ping without a reply is kind of pointless. For what you have in mind, I think the protocol you want is UDP - you can broadcast an arbitrary datagram, with no need for any kind of handshake or reply (TCP by definition involves establishing a session with a handshake) - it just sends. These would be Sockets with SocketType.Dgram instead of SocketType.Stream, and ProtocolType.Udp instead of Tcp or ICMP. If you want to get a little more involved, you can use Broadcast to send the same thing to the entire LAN, or Multicast to send to a specific group of clients.
Again, are you sure you need to be that concerned about limiting traffic, etc here?
Personally, I would flip it around, and have the clients "Check In" at a set interval, reporting a status code to the server. If the server notices a client hasn't checked in for a while, it should send a message to the client and expect a reply.
If you really are having issues scaling that up, I would have the server send the "Heart beats" via UDP at a set interval, and if the client thinks it's missing them, have a mechanism for it to hit the server and ask for a reply - and then if it doesn't get a response, raise the alarm.
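The bookkeeping for the "check in" idea is small. Here is a rough sketch in C (the struct and function names are made up for illustration, and how the check-ins actually arrive over the network is left out):

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct client {
    int id;
    time_t last_checkin; /* when this client last reported in */
};

/* Record a check-in from a client. */
void client_checkin(struct client *c, time_t now)
{
    assert(c != NULL);
    c->last_checkin = now;
}

/* Copy the ids of clients that have been silent for more than max_silence
 * seconds into out (which must have room for n ids).
 * Returns how many overdue clients were found. */
size_t find_overdue(const struct client *clients, size_t n, time_t now,
                    double max_silence, int *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (difftime(now, clients[i].last_checkin) > max_silence)
            out[count++] = clients[i].id;
    }
    return count;
}
```

The server would run find_overdue on a timer and send an explicit "are you there?" message to each overdue client before raising any alarm.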
Edit: just saw Prabhu's answer - he's right, ping will only tell you if the computer is up, you definitely want something inside the actual application to report back, not just the status of the network connection.
in my "heart beat" implementation, the client does not need to reply anything to the server.
Application-level keep-alives need to be two-way, don't they? What the above enables is that clients can be sure the server is alive and healthy when they receive the heartbeat. If the client does not respond, the server will not know the true status of the client. If a client becomes unreachable, heartbeats pile up in the server's send buffer, and the server application will be oblivious to the fact.
Is my worry not well-grounded (as it is fine so far)? Is there any flaw?
Small messages shouldn't be a problem. It's better for the heartbeats to be small.
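On the worry about hbdata getting mixed up with other data: TCP is a byte stream with no message boundaries, so the usual remedy is to frame every message. The following is only a sketch of one possible framing, not something from the question: a 1-byte type, a 2-byte big-endian length, then the payload. The type codes and function names are invented for the example. A heartbeat is then a 3-byte frame with an empty payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MSG_HEARTBEAT = 1, MSG_DATA = 2 }; /* hypothetical type codes */
#define FRAME_HEADER 3                    /* 1 type byte + 2 length bytes */

/* Write one frame into out. Returns its total size, or 0 if it does not fit. */
size_t frame_encode(uint8_t type, const void *payload, uint16_t len,
                    uint8_t *out, size_t cap)
{
    assert(out != NULL && (len == 0 || payload != NULL));
    if (cap < (size_t)FRAME_HEADER + len)
        return 0;
    out[0] = type;
    out[1] = (uint8_t)(len >> 8);
    out[2] = (uint8_t)(len & 0xFF);
    if (len > 0)
        memcpy(out + FRAME_HEADER, payload, len);
    return (size_t)FRAME_HEADER + len;
}

/* Try to read one frame from the first n received bytes.
 * Returns the number of bytes consumed, or 0 if the frame is incomplete. */
size_t frame_decode(const uint8_t *buf, size_t n, uint8_t *type,
                    const uint8_t **payload, uint16_t *len)
{
    if (n < FRAME_HEADER)
        return 0;
    uint16_t l = (uint16_t)((buf[1] << 8) | buf[2]);
    if (n < (size_t)FRAME_HEADER + l)
        return 0;
    *type = buf[0];
    *payload = buf + FRAME_HEADER;
    *len = l;
    return (size_t)FRAME_HEADER + l;
}
```

The receiver appends whatever recv() returns to a buffer and calls frame_decode in a loop until it returns 0, so a heartbeat that arrives glued to other data is still recognised by its type byte.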
Is Ping a better or a common way to have such heart beat functionality over TCP? Why or why not?
Ping will be positive even if the client application is down but the system is healthy.
Many questions relating to port 80 being used have answers saying that there are many programs that use it as their default port. This post mentions some: Skype, IIS, Apache...
Since only one application can listen on any one port at a time - How can that be? And if the answer is that that's only their default port - how will an application know it has to send information to a different port? For example - if IIS listens on port 81 because Skype is listening on 80 - how will anyone requesting a web page know to send the request to theip:81 as opposed to theip:80?
My goal is to have a robust way of setting up a connection between programs, when any hard coded port might fail due to some application already listening on it. The port will only need to be used once in order to communicate what dynamic port will be used for the rest of the session. This is a problem for both network connections and for connecting several applications on the same computer.
Registering with IANA is not always possible, and won't even necessarily solve the problem - someone might still be listening on a registered port. And obviously the solution of "hope for no collisions" - just doesn't cut it.
(I do understand that a connection has two sockets (and a protocol) and therefore one socket can have multiple connections. My question is about listening on a socket in order to establish the connection.)
What I would expect, is there to exist some service on the OS (Windows) that I could register my application with, and receive all incoming traffic with some signature - even if it's simply some magic string. Or perhaps some port where multiple applications can listen concurrently - and all would get every incoming message. But I haven't found anything like that so far.
How can that be? Simply...it's not. Only one application will listen on each port. – Adriano Repetti
Right. When Skype listens on those ports before I start my web-server, the server fails. It took me a while to find out why.
Only one app can listen on a socket in a sane way. The OS allows multiple apps to listen on the same port if you specify special options but that's insane. Accepted connections are then dispatched to different applications in an unspecified (i.e. random) way.
IIS can run multiple web-apps on the same port because it opens the port once in kernel mode and dispatches connections to its worker processes.
I do not believe it is ever possible for multiple sockets to listen on the same (TCP) port. If you try to bind a socket to a port that is already open, you will get an error.
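Both points (the error on a port that is already taken, and asking the OS for a free port instead of hard-coding one) can be seen in a short POSIX sketch. It is written for Linux rather than Windows, and listen_on is a hypothetical helper name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on 127.0.0.1:port. Pass port 0 to let the OS pick a free port.
 * On success returns the socket and stores the real port in *actual.
 * On failure returns -1 with errno set (EADDRINUSE if the port is taken). */
int listen_on(unsigned short port, unsigned short *actual)
{
    assert(actual != NULL);
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &alen) < 0) {
        int saved = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    *actual = ntohs(addr.sin_port);
    return fd;
}
```

Calling listen_on(0, &port) yields a port that is free at that moment; a second listen_on with that same port number fails while the first socket is open. The remaining problem, telling the other side which port was chosen, still needs some agreed meeting point.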
I believe Skype gets around the problem you describe by using their own servers as a rendezvous point. The simple explanation being:
Alice starts her client, connects to the central server, and informs it of what port she is listening on.
Bob starts his client and likewise informs the central server.
Now, Alice wants to connect to Bob, but doesn't know which port to send packets to.
Alice will then query the central server for Bob's port number.
With this information, a direct connection is then established with Bob using that port.
The logic can of course extend to learning the other party's IP address as well as even obtaining public keys.
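The central server's side of those steps boils down to a name-to-port table. A toy in-memory version in C might look like this (purely illustrative: the names, the table sizes, and the absence of any networking or authentication are all simplifications, and this is not how Skype is actually implemented):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PEERS 64
#define NAME_LEN  32

struct peer { char name[NAME_LEN]; unsigned short port; };
struct registry { struct peer peers[MAX_PEERS]; size_t count; };

/* Record (or update) the port a peer says it is listening on.
 * Returns 0 on success, -1 if the table is full or the name is too long. */
int registry_announce(struct registry *r, const char *name, unsigned short port)
{
    assert(r != NULL && name != NULL);
    if (strlen(name) >= NAME_LEN)
        return -1;
    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->peers[i].name, name) == 0) {
            r->peers[i].port = port;
            return 0;
        }
    }
    if (r->count == MAX_PEERS)
        return -1;
    strcpy(r->peers[r->count].name, name);
    r->peers[r->count].port = port;
    r->count++;
    return 0;
}

/* Look up a peer's port. Returns 0 if the peer has not announced itself. */
unsigned short registry_lookup(const struct registry *r, const char *name)
{
    for (size_t i = 0; i < r->count; i++)
        if (strcmp(r->peers[i].name, name) == 0)
            return r->peers[i].port;
    return 0;
}
```

Alice and Bob each call the equivalent of registry_announce when they start, and Alice's query for Bob's port is registry_lookup.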
Note that there's actually a bit more involved with most modern peer-to-peer applications, Skype being no exception. The problem being that most computers are now behind at least one NAT router. Getting two devices each behind their own router to connect to each other is known as NAT traversal - the most common technique having a central coordinating server instruct both clients to simultaneously connect to each other. For more information on this, I recommend Steve Gibson's Security Now!, episode #42
On the port forwarding setup page I just put a comma and add another port to open. It won't allow you to set up an additional rule with listen port 80, but it will allow you to trigger multiple ports with that one listen port.
For TCP, you can only have one application listening on a single port at one time. Now if you had 2 network cards or created a virtual interface, you could have one application listen on the first IP and the second one on the second IP using the same port number.
For UDP (Multicasts), multiple applications can subscribe to the same port.
One application listening on a single port: that's the reason why ports exist, to allow multiple applications to share the network without conflicts.
But there are ways to do what you requested:
You could write a master process, which possesses the port and notifies slave processes using some separation logic.
On Linux and BSD you can set up remapping rules that redirect packets from the visible port to different ones (such as the listener app), again by using some separation logic (e.g. redirect according to network origin etc.).
Note: For TCP, multiple applications can listen on the same port by setting the SO_REUSEADDR option before binding (this is the Windows behaviour; on Linux the comparable option is SO_REUSEPORT), but each incoming connection is then handed to only one of the listeners.
I have a client server situation where the client opens a TCP socket to the server, and sometimes long periods of time will pass with no data being sent between them. I have encountered an issue where the server tries to send data to the client, and it seems to be successful, but the client never receives it, and after a few minutes, it looks like the client then gets disconnected.
Do I need to send some kind of keep alive packet every once in a while?
Edit: To note, this is with peers on the same computer. The computer is behind a NAT, that forwards a range of ports used to this computer. The client that connects with the server opens the connection via DNS. i.e. it uses the mydomain.net & port to connect.
On Windows, sockets with no data sent are a big source for trouble in many applications and must be handled correctly.
The problem is that SO_KEEPALIVE's period can be set only system-wide (the default is a useless two hours) or with the later Winsock API.
Therefore, many applications send an occasional byte of data (to be disregarded by the peer) only to make the network layer declare a disconnection when no ACK is received (after all the retransmissions done by the layer and the ACK timeout).
Answering your question: no, the sockets do not disconnect automatically.
Yet, you must be careful with the above issue. What complicates it further is that testing this behavior is very hard. For example, if you set everything correctly and you expect to detect disconnection properly, you cannot test it by disconnecting the physical layer. This is because the NIC will sense the carrier loss and the socket layer will signal to close all application sockets that relied on it. A good way to test it is connect two computers with 3 legs and two switches in between, disconnecting the middle leg, thus preventing carrier loss but still physically disconnecting the machines.
There is a timeout built in to TCP, but you can adjust it; see SendTimeout and ReceiveTimeout of the Socket class. However, I have a suspicion that this is not your problem. A NAT router may also have an expiration time for TCP connections before it removes them from its port forwarding table. If no traffic passes within that timeout, the router will block all incoming traffic (it has cleared the forwarding information from its memory, so it does not know which computer to send the traffic to). Also, the outgoing connection will likely get a different source port, so the server may not recognize it as the same connection.
It's safer to use the keep-alive option (SO_KEEPALIVE under Linux) to prevent disconnects due to inactivity, but this may generate some extra packets.
This sample code does it under Linux:
int val = 1;
....
// After creating the socket
if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (char *)&val, sizeof(val)))
    fprintf(stderr, "setsockopt failure : %d\n", errno);
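SO_KEEPALIVE on its own uses the system-wide timers. On Linux the timers can also be shortened per socket. A sketch, assuming the Linux-specific TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are available (enable_keepalive is a hypothetical helper name, and the values passed in are up to you):

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on TCP keep-alive and set its timers for this socket only.
 *   idle:     seconds of silence before the first probe is sent
 *   interval: seconds between probes
 *   count:    unanswered probes before the connection is declared dead
 * Returns 0 on success, -1 on failure with errno set. */
int enable_keepalive(int fd, int idle, int interval, int count)
{
    assert(idle > 0 && interval > 0 && count > 0);
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(s, 30, 10, 3), a dead peer should be noticed after roughly 30 + 3 * 10 = 60 seconds of silence. The probes also count as traffic, which helps keep NAT and firewall entries alive as long as the idle time is shorter than their timeout.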
Regards.
TCP sockets don't automatically close at all. However TCP connections do. But if this is happening between peers in the same computer the connection should never be dropped as long as both peers exist and have their sockets open.
I have an async socket server written in C#. Often, my users complain that they are disconnected from the server without any reason. Other internet applications on the user's computer (MSN, Skype, etc.) appear to be well connected, and therefore it does not seem to be an issue with their internet connection. Can anyone provide some information on how to make a connection between client and server rock solid (I mean, prevent frequent disconnection) or where I can look for disconnection issues?
Network Address Translators (aka "home routers", "wireless gateways", etc.) that share an ISP assigned IP address among multiple computers may be at play here. NATs keep a table of active connections (sometimes known as a "port mapping table") that keeps track of how to translate the ip:port pair for active TCP connections and observed UDP sessions. There are some brands of routers that are known to clear the port mapping out of memory if it doesn't observe any traffic going over the TCP connection for some period of time. Apps like MSN Messenger and Skype mitigate this by periodically sending "keep alive" between client, server, and other persistent connections. These messages go in both directions.
You didn't give much information to go on, so the NAT problem is just one possibility out of many.
The best way to diagnose connectivity failures is to invest heavily in logging network messages (both on client and server), socket events, and all return values from the socket APIs. That way, if a customer reports a problem, he can send you his logs - and hopefully you can fetch the corresponding server logs for comparison. You can then diagnose where the disconnect originated from and/or what socket error reset the connection. There's a high probability something in your code is triggering the disconnect.
I assume you are talking about TCP here. The socket library which the clients of your application use is pretty much the same as the one that they use to connect to Skype or MSN. Usually the reason for disconnect is that one of the parties closes the socket (e.g. after connection is idle for some time). Check the places in your code where you close your connections.
I have a C# application that has been running fine for several years. It connects via a TCP/IP socket to a machine that sends me stock trade executions.
Recently, I've tried to deploy it to some machines in a new data center that is behind a hardware firewall, and I've started to see some weird dis-connects.
When a dis-connect happens, in my app (the client side), I see nothing unusual except that I stop receiving data over the socket. Wireshark confirms that no data is reaching the socket and my application's receive thread is blocking on the Receive() call when I stop it in the debugger. The socket shows as ESTABLISHED in netstat.
But from the server side, it looks like my client is dis-connecting. Looking at their logs, it looks like the socket on their end usually ends up with either (nRecvd=-1,errno=104) or (nRecvd=0,errno=11). (104 is connection reset by peer).
The dis-connect only seems to happen after a period of in-activity. I have solved this for now by implementing a heartbeat between my client and their server that just sends a short message every 20 seconds and gets a reply. This has caused the dis-connects to drop to 0 over the past few days.
At first, I figured that the hardware firewall was the problem. It was causing the socket to time out after in-activity. But the person in charge of the firewall claims that the timeout for connects on this port (8887) is 2160 minutes.
I am running Windows Server 2003 and .NET 3.5. The trades server is a linux machine (sles9 I believe though I'm not sure).
Any ideas on what might be going on? What could I do to debug this more given that I don't have any access to the firewall logs and no ability to change the code on the trade server?
Thanks,
Mike
What you describe is common, and it's common to implement a heartbeat to keep TCP sockets alive through such firewalls/gateways like you did.
That hardware might have hard 2160-minute timeouts (in my experience 20-30 minutes is more common, though), but connections are usually dropped much more aggressively if there's any kind of load. Such firewalls have limited resources, and when they need more connection tracking they tend to drop the oldest tracked connection without any activity, regardless of the hard timeout set.
If you want to debug this more, go sniff on the server side of the firewall and see what, if anything, happens when the server gets a disconnect.
I would set up Wireshark on both sides of the firewall to see what happens on TCP (and lower levels).
And when the admin says the "timeout for connects" is some value, is that the timeout for an idle, established connection? Anything else would not make sense, I guess.
Also, are you using KeepAlive option for TCP? And is that forwarded by the firewall or not?
As I said, probably want to run wireshark on both sides of the firewall...