Socket Disconnects On One End, Firewall? - c#

I have a C# application that has been running fine for several years. It connects via a TCP/IP socket to a machine that sends me stock trade executions.
Recently, I've tried to deploy it to some machines in a new data center that sit behind a hardware firewall, and I've started to see some weird disconnects.
When a disconnect happens, I see nothing unusual in my app (the client side) except that I stop receiving data over the socket. Wireshark confirms that no data is reaching the socket, and my application's receive thread is blocked on the Receive() call when I stop it in the debugger. The socket shows as ESTABLISHED in netstat.
But from the server side, it looks like my client is disconnecting. Looking at their logs, the socket on their end usually ends up with either (nRecvd=-1,errno=104) or (nRecvd=0,errno=11). (104 is ECONNRESET, connection reset by peer; 11 is EAGAIN.)
The disconnect only seems to happen after a period of inactivity. I have solved this for now by implementing a heartbeat between my client and their server that just sends a short message every 20 seconds and gets a reply. This has reduced the disconnects to zero over the past few days.
At first, I figured the hardware firewall was the problem: it was timing out the socket after inactivity. But the person in charge of the firewall claims that the timeout for connections on this port (8887) is 2160 minutes.
I am running Windows Server 2003 and .NET 3.5. The trades server is a Linux machine (SLES 9, I believe, though I'm not sure).
Any ideas on what might be going on? What could I do to debug this further, given that I have no access to the firewall logs and no ability to change the code on the trade server?
Thanks,
Mike

What you describe is common, and it's common to implement a heartbeat to keep TCP sockets alive through such firewalls/gateways, as you did.
That hardware might have a hard timeout of 2160 minutes (in my experience 20-30 minutes is more common), but connections are usually dropped much more aggressively if there's any kind of load. Such firewalls have limited resources, and when they need to track more connections they tend to drop the oldest idle connection, regardless of the configured hard timeout.
If you want to debug this further, sniff on the server side of the firewall and see what, if anything, happens when the server registers a disconnect.
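For reference, here is a minimal sketch of such a client-side heartbeat in C# (the one-byte ping message and 20-second interval are illustrative; the real message format depends on the server's protocol):
using System;
using System.Net.Sockets;
using System.Threading;

class Heartbeat
{
    private readonly Socket _socket;
    private readonly Timer _timer;

    public Heartbeat(Socket socket)
    {
        _socket = socket;
        // Send a short ping every 20 seconds so the firewall's
        // connection-tracking entry never goes idle.
        _timer = new Timer(SendPing, null,
            TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(20));
    }

    private void SendPing(object state)
    {
        try
        {
            _socket.Send(new byte[] { 0x00 }); // hypothetical ping byte
        }
        catch (SocketException)
        {
            _timer.Dispose(); // connection is gone; let the app reconnect
        }
    }
}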

I would set up Wireshark on both sides of the firewall to see what happens at the TCP (and lower) level.
And when the admin says the "timeout for connects" is 2160 minutes: is that the timeout for an idle, established connection? Anything else would not make much sense here.
Also, are you using the TCP KeepAlive option? And does the firewall forward those keepalive probes or not?
As I said, you probably want to run Wireshark on both sides of the firewall...
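If you want per-socket keepalive on Windows/.NET rather than the system-wide two-hour default, a sketch might look like this (the byte layout mirrors the Win32 tcp_keepalive structure; the timing values passed by callers are up to them):
using System;
using System.Net.Sockets;

static class KeepAliveHelper
{
    // Enables TCP keepalive on a socket and sets its timing.
    // Layout: { onoff, keepalivetime, keepaliveinterval }, each a
    // 32-bit little-endian value in milliseconds.
    public static void Enable(Socket socket, uint idleMs, uint intervalMs)
    {
        socket.SetSocketOption(SocketOptionLevel.Socket,
            SocketOptionName.KeepAlive, true);
        byte[] values = new byte[12];
        BitConverter.GetBytes(1u).CopyTo(values, 0);         // on
        BitConverter.GetBytes(idleMs).CopyTo(values, 4);     // idle before first probe
        BitConverter.GetBytes(intervalMs).CopyTo(values, 8); // gap between probes
        socket.IOControl(IOControlCode.KeepAliveValues, values, null);
    }
}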

Related

Disconnecting a socket after a certain amount of time with no data received

I'm making my server disconnect sockets that have sent no data for a certain amount of time, like 20 seconds.
I wonder whether working with timers is a good approach for that, or whether there is something built into the socket library for it. Running a timer on the server for every socket makes it heavy.
Is it unsafe to make the client program handle that instead? For example, every client disconnects itself after not sending data for a while.
This should be very easy to implement as part of your keep-alive checking. Unless you're completely ignoring the issue of dropped connections, you probably have a keep-alive system that periodically sends a message client->server and vice versa if there's been no communication. It should be trivial to add a simple "last data received time" value to the socket state, and then close the socket if it gets too far from DateTime.Now.
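A rough sketch of that idea, assuming a hypothetical per-connection state object and a single sweep timer rather than one timer per socket:
using System;
using System.Collections.Generic;
using System.Net.Sockets;
using System.Threading;

class ConnectionState
{
    public Socket Socket;
    public DateTime LastReceived; // update on every successful Receive
}

class IdleSweeper
{
    private readonly List<ConnectionState> _connections;
    private readonly TimeSpan _idleLimit = TimeSpan.FromSeconds(20);
    private readonly Timer _timer;

    public IdleSweeper(List<ConnectionState> connections)
    {
        _connections = connections;
        // One timer sweeps all sockets instead of a timer per socket.
        _timer = new Timer(Sweep, null,
            TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
    }

    private void Sweep(object state)
    {
        lock (_connections)
        {
            // UtcNow avoids surprises when the wall clock changes.
            foreach (ConnectionState idle in _connections.FindAll(
                c => DateTime.UtcNow - c.LastReceived > _idleLimit))
            {
                idle.Socket.Close(); // quiet for too long
                _connections.Remove(idle);
            }
        }
    }
}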
But the more important question is "Why?". The best solution depends on your reasons for doing this in the first place. Do you want to make the server usable to more clients by dumping those that aren't sending data? You'll probably make everything worse: a closed TCP socket lingers (in TIME_WAIT) for more like 2-4 minutes, so when you disconnect the client after 20 seconds and it reconnects, it will now be tying up two server-side ports instead of one. Oops.
As for your comment on the deleted answer, "and connection without data send and receive i think it gonna waste your threads" points closer to your real problem: the number of connections your server has should have no relation to how many threads the server uses to service those connections. So the only thing an open connection "wastes" is a bit of memory (depending on the per-connection state you keep, plus the socket with its buffers) and a TCP port. This can be an issue in some applications, but if you ever get to that level of load, you can probably congratulate yourself already. You will much more likely run out of other resources before getting anywhere close to the port limits (an assumption based on the fact that it sounds like you're making an MMO game). If you really do run into those issues, you probably want to drop TCP anyway and rewrite everything in UDP (or preferably, some ready-made solution on top of UDP).
The client-server model describes how a client should connect to a server and perform requests.
What I would recommend is to connect to the server, and when you finish retrieving all the data you need, close the socket (on the client side).
The server will eventually notice that the socket has been closed and release its resources. You can check the socket's Connected property to release them sooner, but note that Connected only reflects the state as of the last send or receive; the reliable signal of a graceful close is a Receive() call returning 0.
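To illustrate that last point, a small sketch of a server-side receive loop that detects the client's close (the names are illustrative):
using System.Net.Sockets;

static class Server
{
    static void ReceiveLoop(Socket clientSocket)
    {
        byte[] buffer = new byte[4096];
        while (true)
        {
            int received = clientSocket.Receive(buffer);
            if (received == 0)
            {
                // The peer performed an orderly shutdown; free our end
                // now rather than waiting for a later send to fail.
                clientSocket.Close();
                return;
            }
            // ... process 'received' bytes ...
        }
    }
}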
When the client disconnects from the server, the server gets a disconnect event. With the socket.io API, for example, it looks like this:
socket.on('disconnect', function () {
// Disconnect event handling
});
On the client side you can also catch a disconnect event, in which you need to reconnect to the server.

Is it possible that a repeated OnConnected will be called before the previous OnDisconnected?

Imagine some spherical horse in a vacuum:
I lost control of my client application; maybe some error happened. And I tried to re-enter the hub immediately.
Is it possible that OnConnected fires faster than OnDisconnected and I turn up twice on the server?
Edited:
Sorry, I didn't say that I meant the SignalR library. I think that if my application doesn't call stop(), the server will wait about 30 seconds by default. And I can connect to the server again before OnDisconnected is called, can't I?
You'll have to look at it from the client's side. Also note that if you're using TCP, the following would take place:
TCP ensures that your packets arrive in the order they were sent. So let's imagine that at the same moment the "horse" hit the vacuum and the connection broke, your server was sending the next packet that would check the connection (assuming you implemented your server well enough).
Here, two things may happen:
The client has already recovered and can respond in time. Meaning the interval during which the connection had problems was small enough that the next packet from the server hadn't arrived yet. So, responding to your question: there was no disconnection in the first place.
The next packet from the server arrived but the client is not responding (the connection is severed). The server takes note of this, raising the OnDisconnected event. If the client recovers at virtually the same time the server takes note, it initiates another connection (OnConnected).
So there's no chance that the client would turn up twice. If anything, the disconnection interval will be small enough for the server not to notice the problem in the first place.
Again, another protocol may behave differently. But TCP is designed to guarantee a well-established connection and reliable communication between a server and clients.
It's worth mentioning that many of the communication frameworks (if not all) use TCP implicitly by default.
A client can connect a second time while the first connection is open (it will have a separate connection id though).
If the client doesn't manage to notify the server that it's closing the connection, the server will wait for a certain amount of time before removing the connection (DisconnectTimeout).
So in that case, if you restart the connection immediately, it will be a new logical connection to the server with a new connection id.
SignalR will also try to reconnect to the existing connection when it is lost, in which case it would retain its connection id once reconnected. I would recommend reading the entire article about SignalR connection lifetime events.
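For reference, the disconnect timeout mentioned above is configurable at application start; a sketch (SignalR 2.x API, and the value shown is just illustrative):
using System;
using Microsoft.AspNet.SignalR;

public static class SignalRConfig
{
    public static void Configure()
    {
        // How long the server waits after a transport-level drop
        // before raising OnDisconnected for that connection id.
        GlobalHost.Configuration.DisconnectTimeout = TimeSpan.FromSeconds(30);
    }
}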

Do TCP sockets automatically close after some time if no data is sent?

I have a client server situation where the client opens a TCP socket to the server, and sometimes long periods of time will pass with no data being sent between them. I have encountered an issue where the server tries to send data to the client, and it seems to be successful, but the client never receives it, and after a few minutes, it looks like the client then gets disconnected.
Do I need to send some kind of keep alive packet every once in a while?
Edit: Note that this is with peers on the same computer. The computer is behind a NAT that forwards a range of ports to it. The client that connects to the server opens the connection via DNS, i.e. it uses mydomain.net and the port to connect.
On Windows, sockets over which no data is sent are a big source of trouble in many applications and must be handled correctly.
The problem is that SO_KEEPALIVE's period can be set system-wide (otherwise the default is a useless two hours) or with the later WinSock API.
Therefore, many applications send an occasional byte of data every now and then (to be disregarded by the peer) just so the network layer will declare a disconnection when the ACK is not received (after all due retransmissions by the layer and the ACK timeout).
Answering your question: no, the sockets do not disconnect automatically.
Yet you must be careful with the above issue. What complicates it further is that testing this behavior is very hard. For example, if you set everything correctly and expect to detect disconnection properly, you cannot test it by disconnecting the physical layer: the NIC will sense the carrier loss, and the socket layer will signal all application sockets that relied on it to close. A good way to test it is to connect two computers with three cable legs and two switches in between, then disconnect the middle leg; this avoids carrier loss while still physically disconnecting the machines.
There is a timeout built into TCP, and you can adjust it; see the SendTimeout and ReceiveTimeout properties of the Socket class. But I have a suspicion that is not your problem. A NAT router may also have an expiration time for TCP connections before it removes them from its port-forwarding table. If no traffic passes within that timeout, the router will block all incoming traffic (having cleared the forwarding information from memory, it no longer knows which computer to send the traffic to). The re-established outgoing connection will also likely have a different source port, so the server may not recognize it as the same connection.
It's safer to use the keep-alive option (SO_KEEPALIVE under Linux) to prevent disconnects due to inactivity, though this may generate some extra packets.
This sample code does it under Linux:
int val = 1;
....
// After creating the socket: enable TCP keepalive probes on it
if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (char *)&val, sizeof(val)) < 0)
    fprintf(stderr, "setsockopt failure: %d\n", errno);
Regards.
TCP sockets don't automatically close at all; TCP connections, however, do. But if this is happening between peers on the same computer, the connection should never be dropped as long as both peers exist and have their sockets open.

High performance C# TCP server problem: No connection could be made because the target machine actively refused it

I have developed a TCP server according to your advice: High performance TCP server in C#
It is based on the asynchronous pattern.
I also developed a stress-test application to test its performance. My server can accept thousands of connections in parallel from my stress-test app, parse the data, and save it to my database.
When I stress my server, I can get a System.Net.Sockets.SocketException, "No connection could be made because the target machine actively refused it", so I have to reconnect to it. If I test with 5000 concurrent connections, I have to retry 10-20% of the connections because of this problem; with 10K concurrent connections, it can be 30-40%, and sometimes (very rarely) more than 50%. It seems the server cannot keep up with connection accepts: my stress test makes new connections as fast as my test machine can, about 120 connections/sec.
So, what can cause this kind of exception? How should I handle it? What can I do in the server-side implementation to avoid this problem? How can I tune TCP connection accepting?
Thanks in advance!
You might be running out of available ports every now and then. You can view this easily using SysInternals' TcpView utility.
On Windows, when you release a port, it doesn't immediately become available again; instead it sits in a TIME_WAIT state for some interval. Until it leaves this state, no app can use that port. The time delay, the max number of ports, and the available port ranges all differ by OS: XP vs. Win7 vs. Win2008 Server.
There are two registry entries that can reduce this time interval:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay
and increase the max number of ports that can be opened by an app:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort
EDIT: MaxFreeTcbs seems to be a third setting which could help (I haven't tried this yet), mentioned in this TechNet article which has more advice on tracking down odd network problems. HTH.
You are making connections faster than the software can accept new ones; in other words, you are hitting the connections-per-second limit of that port. You could roughly double the connections per second by listening on a second port; on the client side, you should just reconnect when you get the exception.
There are also limits on the number of connections; for those, see Chris O's answer.
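A hedged sketch of the client-side retry suggested above (the attempt count and back-off delay are arbitrary choices):
using System;
using System.Net.Sockets;
using System.Threading;

static class Retry
{
    public static Socket ConnectWithRetry(string host, int port, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            Socket s = new Socket(AddressFamily.InterNetwork,
                SocketType.Stream, ProtocolType.Tcp);
            try
            {
                s.Connect(host, port);
                return s;
            }
            catch (SocketException ex)
            {
                s.Close();
                // Only retry the "actively refused" case, up to the limit.
                if (ex.SocketErrorCode != SocketError.ConnectionRefused
                    || attempt >= maxAttempts)
                    throw;
                Thread.Sleep(100 * attempt); // simple linear back-off
            }
        }
    }
}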

What happens to sockets when I unplug a network cable?

I have three applications that talk to each other using sockets. They can all live on their own machines, but they can also share a machine. Right now two of them are on the same machine and the third is on its own machine. I'm trying to make my communication bulletproof, so I unplug cables and kill the applications to make sure everything works as intended.
Here's a quick sketch of the thing:
Now, when I unplug the network cable to PC2 (the red connection "Con B"), the internal connection (the blue connection "Con A") stops talking. I send stuff from "App 1" on the socket that never gets to "App 2".
I have made a mechanism that discovers this, disconnects, and then reconnects; after that I can unplug the cable all I want and "Con A" just keeps working. It's only that first time.
I have confirmed having communication through "Con A" before disconnecting "Con B".
I connect and reconnect exactly the same way, it's the same code, so there's no difference.
What's happening?
Additional information triggered by answers:
PC 1 and PC 2 have IP addresses that are identical except for the last byte.
I have an internal keep-alive mechanism: I send a message and expect a response every 10 seconds.
When I kill App 3, this does not happen, only when unplugging the cable.
What address are you using for "Con A"? If you are using an address that is bound to the external network adapter, even though you're talking to the same machine, then what you describe could happen.
What you can do is use the address localhost (127.0.0.1) for "Con A", which should be completely independent of what happens on the external network.
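A minimal sketch of that suggestion (the port number is illustrative):
using System.Net;
using System.Net.Sockets;

class LoopbackDemo
{
    static void Main()
    {
        // Server side of "Con A": listen on loopback only, so it is
        // unaffected by the state of the external adapter.
        TcpListener listener = new TcpListener(IPAddress.Loopback, 9000);
        listener.Start();

        // Client side: connect via 127.0.0.1 rather than the
        // machine's external address.
        TcpClient client = new TcpClient();
        client.Connect(IPAddress.Loopback, 9000);
    }
}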
On some platforms (Windows), pulling the network cable tells the network stack to actively invalidate open socket connections associated with the interface.
In this scenario, pulling a network cable is actually a bad test, because it provides positive feedback to your application that it may not receive in a real-life situation.
One common error people make when writing client/server applications is to not incorporate an application-layer keep-alive, or at least enable keepalives at the transport layer. An application recv()ing data can otherwise remain forever oblivious to a failure condition until it write()s and the write fails due to a transport-layer timeout.
Pulling the network cable has different effects depending on the OS you're running. As another poster said, Windows detects it and invalidates any existing connections. Your application should get a connection closed message in that case.
My Linux server, on the other hand, deals with it quite gracefully. After an extended (30-40 second) de-cabling the other day, the SSH connection from my laptop to the server was still happily available and responsive.
As long as the cable is not unplugged longer than the TCP timeouts, the stack should be able to buffer up packets and retransmit them as soon as possible. TCP is designed for that. If you're not using TCP, the packets will fall out of the Ethernet hole and evaporate into the atmosphere.
@einstein: If you're using select() or derivatives, it pays to never select with a NULL timeout. Always have a sensible timeout, and check the socket status if it expires.
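The .NET analogue of that advice might look like this sketch, using Socket.Poll with a finite timeout instead of blocking indefinitely (the 5-second timeout is illustrative):
using System.Net.Sockets;

static class PollingReceive
{
    // Returns true if data was received; false on timeout or peer close.
    public static bool TryReceive(Socket socket, byte[] buffer, out int received)
    {
        received = 0;
        // Poll returns true if data (or a pending close) is readable
        // within the timeout, which is given in microseconds.
        if (!socket.Poll(5 * 1000 * 1000, SelectMode.SelectRead))
            return false; // timed out: a good moment to check connection health

        received = socket.Receive(buffer);
        return received > 0; // 0 means the peer closed the connection
    }
}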
