IBM MQ has an automatic client reconnect feature with a default timeout of 30 minutes: after 30 minutes it stops trying to reconnect (source - p35).
I want to increase the timeout so the client keeps retrying for longer (for example 2 hours). I assume I can use the property XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT for this, since it's available in the XMSC class.
Testing
I can simulate a connection failure by blocking port 1414, the port on which the client application connects to IBM MQ. For testing purposes, I lower the timeout value to 5 minutes (instead of 30 minutes).
What I can see in the logging is that the client application receives an XMSException with reason code 2544 (reconnecting):
IBM.XMS.XMSException: MQ delivered an asynchronous event with completion code 1, and reason 2544.
XMSWMQ2014.explanation
XMSWMQ2014.useraction
Linked Exception : CompCode: 1, Reason: 2544
This keeps occurring for 30 minutes, after which I get an XMSException with reason code 2009 (connection broken) and the automatic reconnect gives up.
XMSException occurred: IBM.XMS.XMSException: MQ delivered an asynchronous event with completion code 2, and reason 2009.
XMSWMQ2014.explanation
XMSWMQ2014.useraction
Linked Exception : CompCode: 2, Reason: 2009
From this I conclude that changing the timeout value has no effect... Am I configuring the reconnect timeout the wrong way?
Below is the code snippet I'm using:
XMSFactoryFactory factory = XMSFactoryFactory.GetInstance(XMSC.CT_WMQ);
IConnectionFactory connectionFactory = factory.CreateConnectionFactory();
connectionFactory.SetStringProperty(XMSC.WMQ_HOST_NAME, "hostname");
connectionFactory.SetIntProperty(XMSC.WMQ_PORT, 1414);
connectionFactory.SetStringProperty(XMSC.WMQ_CHANNEL, "channel_name");
connectionFactory.SetIntProperty(XMSC.WMQ_CONNECTION_MODE, XMSC.WMQ_CM_CLIENT_UNMANAGED);
connectionFactory.SetStringProperty(XMSC.WMQ_QUEUE_MANAGER, "*");
connectionFactory.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_OPTIONS, XMSC.WMQ_CLIENT_RECONNECT_Q_MGR);
connectionFactory.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT, 300); //300 seconds = 5 minutes
IConnection conn = connectionFactory.CreateConnection();
conn.Start();
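For completeness, the asynchronous 2544/2009 events shown above surface through the connection's exception listener; this is a minimal sketch of how I wire it up (assuming the standard IBM.XMS ExceptionListener delegate):
// Set before conn.Start(). Logs CompCode 1 / Reason 2544 while reconnecting,
// and CompCode 2 / Reason 2009 once the connection is finally declared broken.
conn.ExceptionListener = (Exception ex) =>
{
    Console.WriteLine("XMSException occurred: " + ex);
};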
IBM MQ Client version: 8.0.0.5
Notes
If I unblock the port in time, it can successfully reconnect.
Official IBM documentation: https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_8.0.0/com.ibm.mq.msc.doc/xms_automatic_client_reconnection.htm
I've found a way to accomplish this, but unfortunately not by code...
The reconnect timeout can be set in mqclient.ini.
Example:
CHANNELS:
MQReconnectTimeout = 14400
With this configuration applied, the client application should keep retrying for 4 hours (14400 seconds).
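If you want to at least select that configuration file from code, one option (a sketch, assuming the standard MQCLNTCF environment variable that tells the MQ client where to find mqclient.ini; the path is only a placeholder) is:
// Point the MQ client at an mqclient.ini containing the CHANNELS stanza with
// MQReconnectTimeout = 14400, before any connection is created.
Environment.SetEnvironmentVariable("MQCLNTCF", @"C:\mq\mqclient.ini");

XMSFactoryFactory factory = XMSFactoryFactory.GetInstance(XMSC.CT_WMQ);
// ... build the connection factory and connection as in the snippet above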
Related
I'd like to consume an SSE endpoint with a precise connection timeout on .NET 4.8.
I use this lib https://github.com/launchdarkly/dotnet-eventsource
It uses HttpClient internally.
I tried setting ResponseStartTimeout (an alias for HttpClient.Timeout), but it applies to all processing, from connecting through reading the response, not only connecting.
I wonder if it's possible to throw an exception if the connection fails within the first 30 seconds, but keep it open for a few hours once it's established, just like timeouts in Java HTTP clients work.
WinHttpHandler doesn't contain that property. SocketsHttpHandler is not supported in 4.8.
Correct me if I'm wrong, but if I keep the default 100-second timeout, it sends a new request every 100 seconds, wasting bandwidth just like polling.
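One idea I'm experimenting with, sketched below without dotnet-eventsource (the URL and the 30-second value are placeholders): approximate a connect timeout by sending the request with HttpCompletionOption.ResponseHeadersRead and a cancellation token that is only armed until the response headers arrive.
var client = new HttpClient { Timeout = Timeout.InfiniteTimeSpan };

using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
{
    var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/sse");

    // ResponseHeadersRead completes as soon as the headers arrive,
    // so the 30-second token effectively covers only connect + headers.
    var response = await client.SendAsync(
        request, HttpCompletionOption.ResponseHeadersRead, cts.Token);

    // Connection established: disarm the timer so it cannot abort the long read.
    cts.CancelAfter(Timeout.InfiniteTimeSpan);

    using (var stream = await response.Content.ReadAsStreamAsync())
    using (var reader = new StreamReader(stream))
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            Console.WriteLine(line); // handle each SSE line here
        }
    }
}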
I have been trying to set up a gRPC API capable of streaming events to a client. Basically, after a client has subscribed, the server will use gRPC's "Server Streaming" feature to send any new event to the client.
I expect there to be periods of inactivity, where the connection should remain active. However, with my current setup it seems Nginx is cutting the connection after 60 seconds of inactivity with the following exception at the client:
Grpc.Core.RpcException: Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: The request was aborted. IOException: The response ended prematurely, with at least 9 additional bytes expected.", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.IO.IOException: The request was aborted.
---> System.IO.IOException: The response ended prematurely, with at least 9 additional bytes expected.
The question is: why? And how can I prevent it?
My setup
The API is built in ASP.NET Core 3 (will probably upgrade to .NET 5 soon) and is running in a Docker container on a Digital Ocean server.
Nginx is also running in a Docker container on the server and works as a reverse proxy for the API (among other things).
The client is a simple C# client written in .NET Core and is run locally.
What have I tried?
I have tried connecting to the Docker image directly on the server using grpc_cli (bypassing Nginx), and there the connection remains active through long periods of inactivity without any issues. So I can't see what else it could be except Nginx. Also, most of Nginx's default timeout values seem to be 60 seconds.
I have tried these Nginx settings and various combinations of them, but haven't found the right one (or the right combination) yet:
location /commands.CommandService/ {
grpc_pass grpc://commandApi;
grpc_socket_keepalive on;
grpc_read_timeout 3000s;   # These are recommended everywhere,
grpc_send_timeout 3000s;   # but I haven't had any success
grpc_next_upstream_timeout 0;
proxy_request_buffering off;
proxy_buffering off;
proxy_connect_timeout 3000s;
proxy_send_timeout 3000s;
proxy_read_timeout 3000s;
proxy_socket_keepalive on;
keepalive_timeout 90s;
send_timeout 90s;
client_body_timeout 3000s;
}
The most common suggestion for people with similar issues is to use grpc_read_timeout and grpc_send_timeout, but they don't work for me. I guess it makes sense since I'm not actively sending/receiving anything.
My client code looks like this:
var httpClientHandler = new HttpClientHandler();
var channel = GrpcChannel.ForAddress("https://myapi.com", new GrpcChannelOptions()
{
HttpClient = new HttpClient(httpClientHandler) { Timeout = Timeout.InfiniteTimeSpan },
});
var commandService = channel.CreateGrpcService<ICommandService>();
var request = new CommandSubscriptionRequest()
{
HandlerId = _handlerId
};
var sd = new CancellationTokenSource();
var r = new CallContext(callOptions: new CallOptions(deadline: null, cancellationToken: sd.Token));
await foreach (var command in commandService.SubscribeCommandsAsync(request, r))
{
Console.WriteLine("Processing command: " + command.Id);
}
return channel;
To be clear, the call to the API works and I can receive commands from the server. If I just keep sending commands from the API, everything works beautifully. But as soon as I stop for 60 seconds (I have timed it), the connection breaks.
A possible workaround would be to just keep sending a kind of heartbeat to keep the connection open, but I would prefer not to.
Does anyone know how I can fix it? Am I missing something obvious?
UPDATE: Turns out it wasn't Nginx. After I updated the API and the client to .NET 5 the problem disappeared. I can't say in what version this was fixed, but at least it's gone in .NET 5.
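For anyone who still sees idle streams being dropped by a proxy after moving to .NET 5, here is a sketch of the client-side HTTP/2 keep-alive pings that SocketsHttpHandler gained in .NET 5 (the interval values are arbitrary, and it assumes a Grpc.Net.Client version that exposes GrpcChannelOptions.HttpHandler):
var handler = new SocketsHttpHandler
{
    // Send HTTP/2 PING frames while the stream is otherwise idle so
    // intermediaries such as Nginx see traffic and keep the connection open.
    KeepAlivePingDelay = TimeSpan.FromSeconds(30),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(10),
    KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always,
    PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan
};

var channel = GrpcChannel.ForAddress("https://myapi.com", new GrpcChannelOptions
{
    HttpHandler = handler
});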
I'm not sure this is an Nginx issue; it looks like a client connection problem.
Your results look very similar to an issue I had that should have been fixed in a .NET 3.0 patch. Try updating to a newer version of .NET and see if that fixes the problem.
Alternatively, it could be a problem with the maximum number of connections. Try setting MaxConcurrentConnections for the Kestrel server (in appsettings.json):
{
"Kestrel": {
"Limits": {
"MaxConcurrentConnections": 100,
"MaxConcurrentUpgradedConnections": 100
}
}
}
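The same limits can also be set in code, roughly like this (a sketch, assuming the usual ConfigureWebHostDefaults setup in Program.cs, where webBuilder is the IWebHostBuilder):
// Equivalent to the appsettings.json snippet above.
webBuilder.ConfigureKestrel(options =>
{
    options.Limits.MaxConcurrentConnections = 100;
    options.Limits.MaxConcurrentUpgradedConnections = 100;
});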
I am currently using EventingBasicConsumer from the RabbitMQ.Client C# client; we spawn a separate thread to handle each message that is delivered to the consumer.
We encountered a strange behavior: the RabbitMQ server closes connections at times with the error missed heartbeats from client, timeout: 60s. A few moments later the client reports an error saying Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=541. I also see the error client unexpectedly closed TCP connection happening more frequently.
In some situations the clients may take more than 60 seconds to process one job request and this error happens under such conditions.
Is it required that a job be processed within 60 seconds? For our process this can vary between 30 seconds and 5 minutes.
RabbitMQ server: 3.6.6
RabbitMQ.Client.dll (C# client): RabbitMQ.Client.4.1.1
Any insight into this issue is greatly appreciated.
I used to run much longer jobs (minutes) with EasyNetQ. It's a higher-level client that wraps RabbitMQ.Client.
For me, the reason for these errors is something like what Evk wrote in this comment. I would try EasyNetQ, as it likely decouples fetching messages from the handling process.
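A rough sketch of what an EasyNetQ consumer looks like (API from the version I used back then; JobMessage and ProcessJobAsync are placeholders for your own types):
var bus = RabbitHutch.CreateBus("host=localhost");

// The handler runs on EasyNetQ's own consumer dispatch, so long-running
// work is kept away from the connection's I/O (and its heartbeats).
bus.SubscribeAsync<JobMessage>("subscriptionId", async message =>
{
    await ProcessJobAsync(message);   // may take several minutes
});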
You can increase the TTL (time to live) in RabbitMQ, both per queue and per message:
IBasicProperties mqProps = model.CreateBasicProperties();
mqProps.ContentType = "text/plain";
mqProps.DeliveryMode = 2;          // 2 = persistent
mqProps.Expiration = "300000";     // per-message TTL in milliseconds (5 minutes)
model.BasicPublish(exchangeName,
                   routingKey, mqProps,
                   messageBodyBytes);
Documentation is at
https://www.rabbitmq.com/ttl.html
But I think you're better off rewriting the actual processing of the message to an async pattern.
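A rough sketch of what I mean (QueueName and ProcessMessageAsync are placeholders; note that IModel is not thread-safe, so if several messages are processed concurrently the channel calls need to be synchronized):
channel.BasicQos(0, 1, false);   // only one unacknowledged message at a time

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (sender, ea) =>
{
    // Keep the callback short and offload the long-running work,
    // acknowledging only once the job has actually finished.
    Task.Run(async () =>
    {
        await ProcessMessageAsync(ea.Body);      // may take minutes
        channel.BasicAck(ea.DeliveryTag, false);
    });
};

channel.BasicConsume("QueueName", false, consumer);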
This might give you some inspiration for doing async message processing with RabbitMQ:
https://codereview.stackexchange.com/questions/42836/listen-to-multiple-rabbitmq-queue-by-task-and-process-the-message
And this question also contains quite a lot of information on async message consumption:
multi-threading based RabbitMQ consumer
We have written a simple WebSocket client using System.Net.WebSockets. The KeepAliveInterval on the ClientWebSocket is set to 30 seconds.
The connection is opened successfully and traffic flows as expected in both directions, or if the connection is idle, the client sends Pong requests every 30 seconds to the server (visible in Wireshark).
But after 100 seconds the connection is abruptly terminated due to the TCP socket being closed at the client end (watching in Wireshark we see the client send a FIN). The server responds with a 1001 Going Away before closing the socket.
After a lot of digging we have tracked down the cause and found a rather heavy-handed workaround. Despite a lot of Google and Stack Overflow searching we have only seen a couple of other examples of people posting about the problem and nobody with an answer, so I'm posting this to save others the pain and in the hope that someone may be able to suggest a better workaround.
The source of the 100 second timeout is that the WebSocket uses a System.Net.ServicePoint, which has a MaxIdleTime property to allow idle sockets to be closed. On opening the WebSocket if there is an existing ServicePoint for the Uri it will use that, with whatever the MaxIdleTime property was set to on creation. If not, a new ServicePoint instance will be created, with MaxIdleTime set from the current value of the System.Net.ServicePointManager MaxServicePointIdleTime property (which defaults to 100,000 milliseconds).
The issue is that neither WebSocket traffic nor WebSocket keep-alives (Ping/Pong) appear to register as traffic as far as the ServicePoint idle timer is concerned. So exactly 100 seconds after opening the WebSocket it just gets torn down, despite traffic or keep-alives.
Our hunch is that this may be because the WebSocket starts life as an HTTP request which is then upgraded to a websocket. It appears that the idle timer is only looking for HTTP traffic. If that is indeed what is happening that seems like a major bug in the System.Net.WebSockets implementation.
The workaround we are using is to set the MaxIdleTime on the ServicePoint to int.MaxValue. This allows the WebSocket to stay open indefinitely. But the downside is that this value applies to any other connections for that ServicePoint. In our context (which is a Load test using Visual Studio Web and Load testing) we have other (HTTP) connections open for the same ServicePoint, and in fact there is already an active ServicePoint instance by the time that we open our WebSocket. This means that after we update the MaxIdleTime, all HTTP connections for the Load test will have no idle timeout. This doesn't feel quite comfortable, although in practice the web server should be closing idle connections anyway.
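In code the workaround boils down to something like this (a sketch; mapping the wss:// address onto an https URI to look up the ServicePoint is my assumption, and the URI is a placeholder):
// Look up (or create) the ServicePoint used by the WebSocket endpoint
// and disable its idle timeout.
var wsUri = new Uri("wss://example.com/socket");
var httpUri = new UriBuilder(wsUri) { Scheme = "https" }.Uri;

ServicePoint sp = ServicePointManager.FindServicePoint(httpUri);
sp.MaxIdleTime = int.MaxValue;   // note: affects every connection on this ServicePoint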
We also briefly explored whether we could create a new ServicePoint instance reserved just for our WebSocket connection, but couldn't see a clean way of doing that.
One other little twist which made this harder to track down is that although the System.Net.ServicePointManager MaxServicePointIdleTime property defaults to 100 seconds, Visual Studio is overriding this value and setting it to 120 seconds - which made it harder to search for.
I ran into this issue this week. Your workaround got me pointed in the right direction, but I believe I've narrowed down the root cause.
If a "Content-Length: 0" header is included in the "101 Switching Protocols" response from a WebSocket server, WebSocketClient gets confused and schedules the connection for cleanup in 100 seconds.
Here's the offending code from the .Net Reference Source:
//if the returned contentlength is zero, preemptively invoke calldone on the stream.
//this will wake up any pending reads.
if (m_ContentLength == 0 && m_ConnectStream is ConnectStream) {
((ConnectStream)m_ConnectStream).CallDone();
}
According to RFC 7230 Section 3.3.2, Content-Length is prohibited in 1xx (Informational) messages, but I've found it mistakenly included in some server implementations.
For additional details, including some sample code for diagnosing ServicePoint issues, see this thread: https://github.com/ably/ably-dotnet/issues/107
I set the KeepAliveInterval for the socket to 0 like this:
theSocket.Options.KeepAliveInterval = TimeSpan.Zero;
That eliminated the problem of the websocket shutting down when the timeout was reached. But then again, it also probably turns off the send of ping messages altogether.
I studied this issue recently, compared packet captures in Wireshark (Python's websocket-client versus .NET's ClientWebSocket), and found what happens. In the .NET client, Options.KeepAliveInterval only sends a packet to the server when no message has been received from the server within that period. But some servers only check whether there is active traffic coming from the client. So we have to manually send arbitrary packets to the server at regular intervals (not necessarily ping packets; WebSocketMessageType has no ping type), even if the server side is continuously sending packets. That's the solution.
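For illustration, a minimal sketch of that periodic send (the interval and payload are arbitrary, theSocket is the ClientWebSocket instance, and the sends must not overlap with any other outgoing send on the same socket):
// Periodically send a tiny application-level message so servers that only
// watch client-to-server traffic keep the connection alive.
var keepAlive = Task.Run(async () =>
{
    var payload = new ArraySegment<byte>(Encoding.UTF8.GetBytes("ping"));
    while (theSocket.State == WebSocketState.Open)
    {
        await theSocket.SendAsync(payload, WebSocketMessageType.Text, true, CancellationToken.None);
        await Task.Delay(TimeSpan.FromSeconds(30));
    }
});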
MSDN states that Socket.Shutdown can throw a SocketException. I've had this happen to me in production recently after introducing a load balancer between my clients and my server. But I cannot reproduce it in testing without a load balancer. Can you?
Some background - I have a server application written in C# that uses TCP sockets to communicate with clients. The application protocol is very simple for the server: accept connection, read request, send response, wait for client shutdown (read expecting 0 bytes), shutdown.
This code has been in production without issue for many years. However after introducing a load balancer in front of multiple server machines one of the server processes crashed due to an unhandled SocketException that was raised when the server called Socket.Shutdown. The particular client had timed out whilst waiting for the server to respond and attempted to close the connection early. The exception message on the server was "An existing connection was forcibly closed by the remote host." It is not unusual for the client to do this, but obviously prior to the load balancer the server was raising this error at a different point in the code. Still it's clearly a server bug and the fix is obvious - handle the exception.
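(The fix itself is just the obvious guard, sketched below; serverSocket stands for the accepted connection.)
try
{
    serverSocket.Shutdown(SocketShutdown.Both);
}
catch (SocketException ex)
{
    // e.g. "An existing connection was forcibly closed by the remote host."
    // Nothing useful is left to do with this connection.
    Console.WriteLine("Shutdown failed: " + ex.SocketErrorCode);
}
finally
{
    serverSocket.Close();
}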
However, using a test client application (also written in C#), I cannot find a sequence of operations that will cause the server to raise an exception during Socket.Shutdown. It appears that the load balancer did something unusual to the TCP packets, but still, I dislike using that as an excuse for failing to reproduce the issue.
I can run both server and client code in debug and I have WireShark watching the packets.
On the client side, after the connection is established, the operations are:
Socket.Send() // single call
Socket.Receive() // this one times out in our scenario
Socket.XXX() // various choices as described below
On the server side, after the connection is established, the operations are:
1) Socket.Receive() //multiple calls until complete message is received
2) // Processing...
3) Socket.Send() // single call
4) Socket.Receive() // single call expecting 0 bytes
5) Socket.Shutdown()
Presume each call is wrapped with try..catch(SocketException)
A) If I pause the server during step 2, wait for the client to time out, and initiate a client shutdown using Socket.Shutdown(SocketShutDown.Send) a FIN packet is sent to the server. When the server resumes processing, all the calls will succeed (3 thru 5) because that's a perfectly acceptable TCP flow.
B) If I pause the server during step 2, wait for the client to time out, and initiate a client shutdown using Socket.Shutdown(SocketShutDown.Both) or Socket.Close() again a FIN packet is sent to the server. When the server resumes processing step 3 succeeds, but it causes the client to send a RST packet in response as it is not accepting more data. If this RST arrives before step 4 then Socket.Receive throws and step 5 succeeds. If it arrives after step 4, then Socket.Receive succeeds (returns 0 bytes), and yet step 5 succeeds.
C) If the client has "Dont Linger" set (Linger enabled with 0 timeout), and I pause the server during processing, wait for the client to time out, and initiate a client shutdown using Socket.Shutdown(SocketShutDown.Both) or Socket.Close() a "RST" packet is immediately sent to the server. When the server resumes processing steps 3 and 4 will fail but still step 5 succeeds.
I think what puzzles me most is that Socket.Shutdown appears to ignore my test client RST packets and yet evidently my load balancer was able to send a RST packet that was not ignored. What am I missing? What else can I try?