I have a real head scratcher here (for me).
I have the following setup:
Kubernetes Cluster in Azure (linux VMs)
ASP.NET docker image with TCP server
Software simulating TCP clients
RabbitMQ for notifying incoming messages
Peer behaviour:
The client sends its heartbeat every 10 minutes
The server sends a keep-alive every 5 minutes (nginx-ingress kills connections after being idle for ~10 minutes)
I am testing the performance of my new TCP server. The previous one, written in Java, could easily handle the load I am about to explain. For some reason, the new TCP server, written in C#, loses the connection after about 10-15 minutes.
Here is what I do:
Use the simulator to start 500 clients with a ramp-up of 300s
All connections are established correctly
Most of the time, the first heartbeats and keep-alives are sent and received
After 10+ minutes, I receive 0 bytes from Stream.EndRead() on BOTH ends of the connection.
This is the piece of code that is triggering the error.
var numberOfBytesRead = Stream.EndRead(result);
if (numberOfBytesRead == 0)
{
    This.Close("no bytes read").Sync(); //this is where I end up
    return;
}
In my logging on the server side, I see lots of disconnected ('no bytes read') lines and a lot of exceptions indicating that RabbitMQ is too busy: None of the specified endpoints were reachable.
My guesses would be that the Azure Load Balancer just bounces the connections, but that does not happen with the Java TCP server. Or that the ASP.NET environment is missing some configuration.
Does anyone know why this is happening and, more importantly, how to fix it?
--UPDATE #1--
I just used 250 devices and that worked perfectly.
I halved the ramp-up and that was a problem again. So this seems to be a performance issue. A component in my chain is too busy.
--UPDATE #2--
I disabled publishing to RabbitMQ and now it keeps working. So I have to fix the RabbitMQ performance.
I ended up processing the incoming data in a new Task.
This is my code now:
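public void ReceiveAsyncLoop(IAsyncResult? result = null)
{
    try
    {
        if (result != null)
        {
            var numberOfBytesRead = Stream.EndRead(result);
            if (numberOfBytesRead == 0)
            {
                // 0 bytes means the remote side closed the connection.
                This.Close("no bytes read").Sync();
                return;
            }
            // Copy the bytes out first: the BeginRead below reuses Buffer
            // while the task may still be processing this data.
            var received = new ArraySegment<byte>(Buffer.Array!, Buffer.Offset, numberOfBytesRead).ToArray();
            var newSegment = new ArraySegment<byte>(received);
            // This.OnDataReceived(newSegment); <-- previously this
            // Note: Task.Run does not guarantee that segments are processed in arrival order.
            Task.Run(() => This.OnDataReceived(newSegment));
        }
        Stream.BeginRead(Buffer.Array!, Buffer.Offset, Buffer.Count, ReadingClient.ReceiveAsyncLoop, null);
    }
    catch (ObjectDisposedException) { /*ILB*/ }
    catch (Exception ex)
    {
        Log.Exception(ex, $"000001: {ex.Message}");
    }
}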
Now, everything is super fast.
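If the order of incoming messages matters, one alternative to a bare Task.Run per read is a per-connection queue with a single consumer. The following is only a sketch under assumptions: ReceiveQueue and its members are hypothetical names that are not part of the code above, and it relies on System.Threading.Channels (.NET Core 3.0 or later). The read loop stays fast because it only copies the bytes and enqueues them, while one worker handles them in arrival order.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper: one instance per connection.
public sealed class ReceiveQueue
{
    private readonly Channel<byte[]> _channel = Channel.CreateUnbounded<byte[]>(
        new UnboundedChannelOptions { SingleReader = true });
    private readonly Task _worker;

    public ReceiveQueue(Func<byte[], Task> handler)
    {
        // A single consumer drains the channel, so messages are handled in order.
        // If the handler throws, the worker task faults and processing stops.
        _worker = Task.Run(async () =>
        {
            await foreach (var message in _channel.Reader.ReadAllAsync())
            {
                await handler(message);
            }
        });
    }

    // Called from the read callback: copy the bytes so the shared
    // receive buffer can be reused immediately by the next read.
    public void Enqueue(byte[] source, int offset, int count)
    {
        var copy = new byte[count];
        Buffer.BlockCopy(source, offset, copy, 0, count);
        _channel.Writer.TryWrite(copy);
    }

    // Signals that no more data will arrive and waits for the worker to finish.
    public Task CompleteAsync()
    {
        _channel.Writer.Complete();
        return _worker;
    }
}
```

The channel here is unbounded, so a slow handler (for example a busy RabbitMQ publisher) makes memory grow rather than stalling the socket; a bounded channel would apply back-pressure instead.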
Related
We have a .NET Framework 4.5.2 service running that connects to a WebSphere MQ Server (v7.5.0.9). Our service needs to connect to a Queue and put a message. It doesn't need to receive anything after putting the message. We have this set up in a Test and a Production environment, and it has been running for a while without any issues. Now we are facing an error only in the Production environment. The same code works fine in the Test environment, but Production is showing very inconsistent results and we are unable to recreate the issue anywhere else.
The only way we are currently able to get it working is by restarting the .NET service multiple times until the service is able to connect to all Queue Managers. Every time we restart the service we get a different result. We may start the service and it would not be able to connect to any of the Queue Managers and then we restart again and 2 of the Queue Managers are able to connect. Once the connection has been made it is stable, the service will be able to put messages in any of the Queues without it ever disconnecting.
Some of the things we have tried
Before this issue, we were using the SYSTEM.DEF.SVRCONN channel to connect to the Queue Managers but we have changed that to use a "Server Connection" channel we have created in each Queue Manager. We can see the new channels are in an Active state but only if it is able to make the initial connection.
Originally we were connecting to a Queue Manager, putting a message, and closing the Queue but we were leaving the Queue Manager open. We have tried to Close and Disconnect the Queue Managers after every message but that seemed to make things worse.
The .NET service and WebSphere are on the same box, but we have tried disabling the Windows Firewall on the server in case something was blocking the connection. That didn't seem to make a difference either.
My background is in .NET so I'm not very familiar with the WebSphere UI and even less with the CLI. Any ideas on places to look or commands to run to get any insight on what is going on would be helpful.
The only error we get in WebSphere is "CompCode: 2, Reason: 2009", and the exception we catch in the service says "Error Message: MQRC_CONNECTION_BROKEN".
Below is the code used to connect and send a message. We are using the amqmdnet.dll
try
{
    properties = new Hashtable();
    properties.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED);
    properties.Add(MQC.HOST_NAME_PROPERTY, hostName);
    properties.Add(MQC.PORT_PROPERTY, port);
    properties.Add(MQC.CHANNEL_PROPERTY, channelName);
    if (!QueueManagers.ContainsKey(queueManagerName))
    {
        queueManager = new MQQueueManager(queueManagerName, properties);
        QueueManagers[queueManagerName] = queueManager;
    }
    else
    {
        queueManager = QueueManagers[queueManagerName];
        if (!queueManager.IsConnected)
        {
            queueManager = new MQQueueManager(queueManagerName, properties);
            QueueManagers[queueManagerName] = queueManager;
        }
    }
    queue = queueManager.AccessQueue(queueName, MQC.MQOO_OUTPUT + MQC.MQOO_FAIL_IF_QUIESCING);
    message = new MQMessage();
    message.ClearMessage();
    message.Format = MQC.MQFMT_STRING;
    message.Encoding = MQC.MQENC_NATIVE;
    message.CorrelationId = MQC.MQCI_NONE;
    message.CharacterSet = MQC.MQCCSI_Q_MGR;
    message.WriteString(messageString);
    queue.Put(message);
}
catch (Exception ex)
{
    sentToMQServer = false;
    QueueManagers.TryRemove(queueManagerName, out var mgr);
    queueManager?.Close();
    queueManager?.Disconnect();
    if (retry)
        SendToMQServer(remoteClient, Message, false);
}
finally
{
    message = null;
    //QueueManagers.TryRemove(queueManagerName, out var mgr);
    if (properties != null)
    {
        properties.Clear();
        properties = null;
    }
    if (queue != null)
    {
        queue.Close();
        queue = null;
    }
    //queueManager.Close();
    //queueManager.Disconnect();
}
I wrote a TCP server. Each time a client connection is accepted, the socket instance returned by Accept or EndAccept (called handler) is stored, together with other information, in an object called TcpClientConnection. I need to determine at specific intervals whether a connection is still connected. The Socket.Connected property is not reliable, and according to the documentation I should use the Poll method with the SelectRead option instead.
In a test scenario I unplug the client's cable and wait for the broken-connection alarm, which is built upon handler.Poll(1, SelectMode.SelectRead). It should return true, but that never happens.
This is fundamentally caused by the way the TCP and IP protocols work. The only way to detect whether a connection is broken is to send some data over it. The underlying TCP protocol will cause acknowledgements to be sent from the receiver back to the sender, thereby allowing a broken connection to be detected.
These articles provide some more information
Do I need to heartbeat to keep a TCP connection open?
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
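As a complement to sending application data, transport-level keep-alive probes (the subject of the HOWTO linked above) can be switched on per socket. This is a minimal sketch, assuming .NET Core 3.0 or later, where the TcpKeepAliveTime, TcpKeepAliveInterval, and TcpKeepAliveRetryCount socket options are available; KeepAliveConfig is a hypothetical helper name, and suitable timing values depend on your network.

```csharp
using System.Net.Sockets;

// Hypothetical helper for enabling TCP keep-alive probes on a socket.
public static class KeepAliveConfig
{
    public static void Enable(Socket socket, int idleSeconds, int intervalSeconds, int retryCount)
    {
        // Turn keep-alive on for this socket.
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
        // Idle time before the first probe is sent.
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, idleSeconds);
        // Delay between successive probes.
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, intervalSeconds);
        // Unanswered probes before the connection is reported as broken.
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount, retryCount);
    }
}
```

With probes enabled, a pending or later receive or send fails once the probes go unanswered, so an unplugged cable is eventually noticed even on an otherwise idle connection.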
According to the documentation of Socket.Poll:
This method cannot detect certain kinds of connection problems, such as a broken network cable, or that the remote host was shut down ungracefully. You must attempt to send or receive data to detect these kinds of errors.
In other words, Poll is useful for checking whether some data has arrived and is available to your local OS networking stack.
If you need to detect connection issues, you need to call a blocking read (e.g. Socket.Receive).
You can also build a simple initialization mini-protocol that exchanges an agreed 'hello' message back and forth.
Here is a simplified example of how you can do it:
private bool VerifyConnection(Socket socket)
{
    byte[] b = new byte[1];
    try
    {
        if (socket.Receive(b, 0, 1, SocketFlags.None) == 0)
            throw new SocketException(System.Convert.ToInt32(SocketError.ConnectionReset));
        socket.NoDelay = true;
        socket.Send(new byte[1] { SocketHelper.HelloByte });
        socket.NoDelay = false;
    }
    catch (Exception e)
    {
        this._logger.LogException(LogLevel.Fatal, e, "Attempt to connect (from: [{0}]), but encountered error during reading initialization message", socket.RemoteEndPoint);
        socket.TryCloseSocket(this._logger);
        return false;
    }
    if (b[0] != SocketHelper.HelloByte)
    {
        this._logger.Log(LogLevel.Fatal,
            "Attempt to connect (from: [{0}]), but incorrect initialization byte sent: [{1}], Ignoring the attempt",
            socket.RemoteEndPoint, b[0]);
        socket.TryCloseSocket(this._logger);
        return false;
    }
    return true;
}
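For completeness, here is a sketch of the commonly used Poll idiom and of what it can and cannot tell you. SocketProbe and PeerHasClosed are hypothetical names. The check reports a connection that the peer closed or reset, because SelectRead is signalled while no bytes are available to read. As the documentation quoted above says, it stays false when a cable is unplugged, since no packet ever arrives to signal the change.

```csharp
using System.Net.Sockets;

// Hypothetical helper illustrating the Poll + Available idiom.
public static class SocketProbe
{
    // Returns true when the peer has closed or reset the connection.
    // Returns false for a healthy connection AND for a silently broken one
    // (for example an unplugged cable), because nothing has been received.
    public static bool PeerHasClosed(Socket socket, int timeoutMicroseconds = 0)
    {
        // SelectRead is signalled when data is readable or the connection was closed/reset.
        bool readable = socket.Poll(timeoutMicroseconds, SelectMode.SelectRead);
        // Readable with zero bytes available means end of stream.
        return readable && socket.Available == 0;
    }
}
```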
I am using TCP as a keep-alive mechanism. Here is my code:
Client
TcpClient keepAliveTcpClient = new TcpClient();
keepAliveTcpClient.Connect(HostId, tcpPort);
//this 'read' is supposed to block until a legal disconnect is requested
//or until the server unexpectedly disappears
int numberOfBytes = keepAliveTcpClient.GetStream().Read(new byte[10], 0, 10);
//more client code...
Server
TcpListener _tcpListener = new TcpListener(IPAddress.Any, 1000);
_tcpListener.Start();
_tcpClient = _tcpListener.AcceptTcpClient();
Tracer.Write(Tracer.TraceLevel.INFO, "get a client");
buffer = new byte[10];
numOfBytes = _tcpClient.GetStream().Read(buffer, 0, buffer.Length);
if (numOfBytes == 0)
{
    //shouldn't reach here unless the connection is closed...
}
I included only the relevant code. What happens is that the client blocks on Read as expected, but the server's Read returns immediately with numOfBytes equal to 0. Even if I retry the Read on the server, it returns immediately, while the client's Read is still blocked! So the server mistakenly thinks the client has disconnected, while the client thinks it is still connected. Can someone tell me how this is possible, or what is wrong with my mechanism?
Edit: After a failure I wrote to the log these properties:
_tcpClient: _tcpClient.Connected=true
Socket: (_tcpClient.Client properties)
_tcpClient.Client.Available=0
_tcpClient.Client.Blocking=true
_tcpClient.Client.Connected=true
_tcpClient.Client.IsBound=true
Stream details
_tcpClient.GetStream().DataAvailable=false;
Even when correctly implemented, this approach will only detect some remote server failures. Consider the case where the intervening network partitions the two machines. Then the system will detect the failure only when the underlying TCP stack sends a transport-level keep-alive. Keepalive is a good description of the problem, and "Does a TCP socket connection have a 'keep alive'?" is a companion question. The RFC indicates the functionality is optional.
The only certain way to reliably confirm that the other party is still alive is to occasionally send actual data between the two endpoints. This will result in TCP promptly detecting the failure and reporting it back to the application.
Maybe something that will give a clue: it happens only when 10 or more clients connect to the server at the same time (the server listens on 10 or more ports).
If you're writing this code on Windows 7/8, you may be running into a connection limit issue. Microsoft's license allows 20 concurrent connections, but the wording is very specific:
[Start->Run->winver, click "Microsoft Software License Terms"]
3e. Device Connections. You may allow up to 20 other devices to access software installed on the licensed computer to use only File Services, Print Services, Internet Information Services and Internet Connection Sharing and Telephony Services.
Since what you're doing isn't file, print, IIS, ICS, or telephony, it's possible that the previous connection limit of 10 from XP/Vista is still enforced in these circumstances. Set a limit of concurrent connections to 9 in your code temporarily, and see if it keeps happening.
The way I interpret the MSDN remarks, that behavior seems to be expected: if there is no data to read, the Read method returns.
With that in mind I think what I would try is to send data at a specified interval like some of the previous suggestions along with a "timeout" of some sort. If you don't see the "ping" within your designated interval you could fail the keepalive. With TCP you have to keep in mind that there is no requirement to deem a connection "broken" just because you aren't seeing data. You could completely unplug the network cables and the connection will still be considered good up until the point that you send some data. Once you send data you'll see one of 2 behaviors. Either you'll never see a response (listening machine was shutdown?) or you'll get an "ack-reset" (listening machine is no longer listening on that particular socket)
https://msdn.microsoft.com/en-us/library/vstudio/system.net.sockets.networkstream.read(v=vs.100).aspx
Remarks:
This method reads data into the buffer parameter and returns the number of bytes successfully read. If no data is available for reading, the Read method returns 0. The Read operation reads as much data as is available, up to the number of bytes specified by the size parameter. If the remote host shuts down the connection, and all available data has been received, the Read method completes immediately and return zero bytes.
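The "ping plus timeout" idea from this answer can be sketched as a small bookkeeping class. HeartbeatMonitor is a hypothetical name and the interval is whatever your protocol agrees on. Time is passed in explicitly so the logic is easy to test. The sketch is not thread-safe, so add locking if the receive callback and the timer run on different threads.

```csharp
using System;

// Hypothetical helper: tracks when the peer was last heard from.
public sealed class HeartbeatMonitor
{
    private readonly TimeSpan _timeout;
    private DateTime _lastSeenUtc;

    public HeartbeatMonitor(TimeSpan timeout, DateTime nowUtc)
    {
        _timeout = timeout;
        _lastSeenUtc = nowUtc;
    }

    // Call whenever any bytes (data or ping) arrive from the peer.
    public void MarkSeen(DateTime nowUtc) => _lastSeenUtc = nowUtc;

    // True when nothing has arrived within the allowed interval;
    // the caller should then treat the connection as dead and close it.
    public bool IsExpired(DateTime nowUtc) => nowUtc - _lastSeenUtc > _timeout;
}
```

A periodic timer would call IsExpired(DateTime.UtcNow) and close the connection when it returns true, while the receive path calls MarkSeen(DateTime.UtcNow) on every successful read.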
As I can see you are reading data on both sides, server and client. You need to write some data from the server to the client, to ensure that your client will have something to read. You can find a small test program below (The Task stuff is just to run the Server and Client in the same program).
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

class Program
{
    private static Task _tcpServerTask;
    private const int ServerPort = 1000;

    static void Main(string[] args)
    {
        StartTcpServer();
        KeepAlive();
        Console.ReadKey();
    }

    private static void StartTcpServer()
    {
        // Start listening before the client connects to avoid a startup race.
        var tcpListener = new TcpListener(IPAddress.Any, ServerPort);
        tcpListener.Start();
        _tcpServerTask = Task.Run(() =>
        {
            var tcpClient = tcpListener.AcceptTcpClient();
            Console.WriteLine("Server got client ...");
            using (var stream = tcpClient.GetStream())
            {
                const string message = "Stay alive!!!";
                var arrayMessage = Encoding.UTF8.GetBytes(message);
                stream.Write(arrayMessage, 0, arrayMessage.Length);
            } // disposing the stream closes the connection
            tcpListener.Stop();
        });
    }

    private static void KeepAlive()
    {
        var tcpClient = new TcpClient();
        tcpClient.Connect("127.0.0.1", ServerPort);
        using (var stream = tcpClient.GetStream())
        {
            var buffer = new byte[16];
            int bytesRead;
            // Read returns 0 once the server has closed the connection.
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) != 0)
                Console.WriteLine("Client received: {0} ", Encoding.UTF8.GetString(buffer, 0, bytesRead));
        }
    }
}
I have a TcpClient that connects to the machine, and everything is working fine. As an extra step, I want to monitor the connection status every 60 seconds with the help of a timer. From basic research on the topic I learned that there is no direct way to test it, so I tried to infer it from the result of the most recent message sent to the machine when the application goes off the network.
Here is the code:
// Find out whether the socket is connected to the remote host.
// Send a message to the machine
try
{
    byte[] notify = Encoding.ASCII.GetBytes("Hello");
    stream.Write(notify, 0, notify.Length);
}
catch { }
// Check whether it reached the machine or failed
bool getConnectionStatus = client.Connected;
if (getConnectionStatus == true)
{
    //Do nothing
}
else
{
    //Stop the thread
    _shutdownEvent.WaitOne(0);
    _thread.Abort();
    //Start Again
    _thread = new Thread(DoWork);
    _thread.Start();
}
The astonishing thing is that if the machine is off the network, the first write still succeeds, so the connection status is reported as connected even though the machine is unreachable. The second attempt to send data fails and, as expected, the status is disconnected.
The main problem I am facing is: once it is disconnected from the network, why is it still able to send the data? Because of this I am losing all the data that was buffered on the machine while the network was down.
Please help me.
Under the hood, the Write operation just sends the data to the network layer; you may get a "success" result before an attempt is made to transmit the data. The network layer may even delay sending the data for a while if the data is small, in an attempt to send one batch of a few messages at once.
What Alex K. said with a few words is that the most reliable way to check a network connection is to wait for a response. If no such response is received within a certain amount of time, the connection is lost.
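Let's say you keep using "Hello" and the server should respond with "Yeah!". On the client side, you could extend your current code with:
try
{
    byte[] notify = Encoding.ASCII.GetBytes("Hello");
    stream.Write(notify, 0, notify.Length);
    // Give up if no reply arrives in time (5 seconds is an arbitrary example value).
    stream.ReadTimeout = 5000;
    byte[] notifyResult = new byte[5];
    int bytesRead = stream.Read(notifyResult, 0, 5);
    if (bytesRead == 0)
    {
        // No network error, but server has disconnected
    }
    // Arriving here, notifyResult should contain ASCII "Yeah!"
    // (Read may return fewer than 5 bytes; loop until all 5 have arrived.)
}
catch (IOException)
{
    // Network error or timeout (NetworkStream wraps the SocketException in an IOException)
}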
On the server, you should recognize the "Hello" being sent, and simply respond with "Yeah!". I don't know what your server currently does, but it could be something similar to:
switch (receivedMessage)
{
    case "Hello":
        stream.Write(Encoding.ASCII.GetBytes("Yeah!"), 0, 5);
        break;
}
Note that you should consider wrapping your messages in information packets, for example:
<Message Type> <Message> <Terminator Character>
e.g. "KHello\n"
Or
<Size of Message> <Message Type> <Message>
e.g. "0005KHello"
Here message type 'K' is a keep-alive message, the newline "\n" is the terminator character, and "0005" is the message length excluding the message type.
This way the server will always be able to tell whether it has received the full message, and the message type could indicate whether "Hello" was sent as data or as a keep-alive packet.
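The length-prefixed variant ("0005KHello") could be sketched as follows. The Frame class and its method names are hypothetical, and the sketch assumes ASCII payloads of at most 9999 bytes to match the four-digit length shown above. TryDecode returns false while the buffer does not yet hold a complete message, so the caller can simply keep reading.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical framing helper: <4-digit length><1-char type><payload>.
public static class Frame
{
    private const int HeaderLength = 5; // 4 length digits + 1 type character

    public static byte[] Encode(char type, string payload)
    {
        byte[] body = Encoding.ASCII.GetBytes(payload);
        if (body.Length > 9999)
            throw new ArgumentException("Payload too long for a 4-digit length.", nameof(payload));
        // The length counts the payload only, excluding the type character.
        string header = body.Length.ToString("D4", CultureInfo.InvariantCulture) + type;
        byte[] frame = new byte[HeaderLength + body.Length];
        Encoding.ASCII.GetBytes(header, 0, header.Length, frame, 0);
        body.CopyTo(frame, HeaderLength);
        return frame;
    }

    // Tries to decode one message from the first 'count' bytes of 'buffer'.
    // 'consumed' tells the caller how many bytes to drop from its buffer.
    public static bool TryDecode(byte[] buffer, int count, out char type, out string payload, out int consumed)
    {
        type = '\0';
        payload = string.Empty;
        consumed = 0;
        if (count < HeaderLength) return false; // header incomplete
        string digits = Encoding.ASCII.GetString(buffer, 0, 4);
        if (!int.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out int length))
            throw new FormatException("Invalid length prefix.");
        if (count < HeaderLength + length) return false; // payload incomplete
        type = (char)buffer[4];
        payload = Encoding.ASCII.GetString(buffer, HeaderLength, length);
        consumed = HeaderLength + length;
        return true;
    }
}
```

The receiver appends each chunk returned by Read to its own buffer and calls TryDecode in a loop, because a single Read may deliver half a message or several messages at once.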
Is there a reason why a Socket should close by itself, after 2h? I am receiving data every second from this socket, and writing back some tiny "keep-alive" data every 30s.
Before sending, I check if socket is still connected using the following method:
public bool IsSocketReadyForWriting(Socket s)
{
    try
    {
        if (!s.Connected)
        {
            Log.Info("Socket.Connected was false");
            return false;
        }
        // following line will throw if socket disconnected
        bool poll = s.Poll(2000, SelectMode.SelectWrite);
        if (!poll)
        {
            try
            {
                // if poll is false, socket is closed
                Log.Info("poll is false");
                this.Close();
            }
            catch { }
            return false;
        }
        Log.Debug("still connected");
        return true;
    }
    catch (Exception ex)
    {
        Log.Error("Error while checking if socket connected", ex);
        return false;
    }
}
Everything works fine for about 2h, then suddenly Socket.Poll returns false, and the Socket gets closed.
Is there a setting which controls this, or am I doing something really wrong?
[Edit]
Forgot to mention: I control both server and client side of the link. These are both C# apps, one of them creates a listening socket, the other one opens a connection and sends data. They communicate without problems for 2h (no memory leaks and stuff), then the socket closes.
When this happens, I can easily reconnect the socket, but I am just wondering if anyone knows what the reason for this could be.
By default a TCP socket is writable when there's at least one byte of space available in the socket send buffer. To reverse that - the socket is not writable when there's enough unacknowledged data sitting in the "output queue".
That said, pull out wireshark or whatever Microsoft provides for packet sniffing and see what's going on on the wire. Are your heartbeat chunks getting ACK-ed? Does the receiver window stay open or does it go to zero? Or are you just getting explicit RST or a FIN from some intermediate switch?
One way to mitigate a temporarily clogged pipe is to increase the send buffer size, which is kind of tiny by default on Windows - 8192 iirc. See setsockopt (.NET probably has a version of that) and the SO_SNDBUF option.
Could it be the server that is closing the connection? Do you have control over it?
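In .NET, SO_SNDBUF is exposed through the Socket.SendBufferSize property. A minimal sketch follows; SendBufferConfig is a hypothetical helper name and the size you request is your own choice. The operating system may round or clamp the value, so the helper reads it back.

```csharp
using System.Net.Sockets;

// Hypothetical helper around Socket.SendBufferSize (SO_SNDBUF).
public static class SendBufferConfig
{
    // Requests a new send buffer size and returns the size the OS actually applied.
    public static int Resize(Socket socket, int requestedBytes)
    {
        socket.SendBufferSize = requestedBytes;
        // The effective value can differ from the request (for example, Linux doubles it).
        return socket.SendBufferSize;
    }
}
```

A larger buffer only hides short stalls. If Poll with SelectWrite keeps returning false, the receiver is not draining data, and a packet capture remains the way to find out why.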