RabbitMQ BrokerUnreachableException timeout intermittently from a machine - C#

We have 11 Windows web app machines running IIS. These send messages to a RabbitMQ server for tasks. We are using Rabbit for basic work-queue functionality. For each message published, a new connection and channel are created, pretty much like in the tutorial here - https://www.rabbitmq.com/tutorials/tutorial-two-dotnet.html
This works great most of the time, but in production, sporadically, from a different machine each time, once or twice a day we start getting this exception on ConnectionFactory.CreateConnection:
[BrokerUnreachableException: None of the specified endpoints were reachable]
RabbitMQ.Client.ConnectionFactory.CreateConnection():56
[TimeoutException: Connection to amqp://machinename.domain.net:5672 timed out]
RabbitMQ.Client.Impl.SocketFrameHandler.Connect(TcpClient socket, AmqpTcpEndpoint endpoint, Int32 timeout):65
RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout):52
RabbitMQ.Client.Framing.Impl.ProtocolBase.CreateFrameHandler(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout):8
RabbitMQ.Client.ConnectionFactory.CreateConnection():45
This is causing message loss. I have been investigating the max-concurrent-connections setting for each machine, but that did not lead anywhere. It does not coincide with our peak traffic either. The most interesting clue I have is that it happens in bursts, and when it happens it is ONLY happening to one of the 11 machines publishing messages to the queue at a time.
I am using the RabbitMQ .NET client.
Any ideas or pointers on what could be the possible cause?

Probably some sort of packet loss? Why not try...catch...retry?
Run ping RabbitServerHostName -t (where RabbitServerHostName is the server where Rabbit is installed) in a command window and see after a couple of days how many packets were lost.

Because of packet drops and network instability, retrying your connection creation is almost always the recommended approach. The EasyNetQ library does this really well. It is, however, not very complex to implement your own timer-based retry that keeps trying when you get this exception until a connection is established.
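For illustration, here is a minimal retry sketch using the RabbitMQ .NET client (the host name, attempt count, and backoff are assumptions, not values from the question):

using System;
using System.Threading;
using RabbitMQ.Client;
using RabbitMQ.Client.Exceptions;

static class RabbitConnector
{
    // Hypothetical helper: retry CreateConnection a few times before giving up.
    public static IConnection ConnectWithRetry(ConnectionFactory factory, int maxAttempts = 5)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return factory.CreateConnection();
            }
            catch (BrokerUnreachableException) when (attempt < maxAttempts)
            {
                // Transient network blip: back off briefly, then retry.
                Thread.Sleep(TimeSpan.FromSeconds(2 * attempt));
            }
        }
    }
}

Usage would be something like: var connection = RabbitConnector.ConnectWithRetry(new ConnectionFactory { HostName = "machinename.domain.net" });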

Related

Rabbitmq server drops connection when client takes more than 60 seconds to acknowledge a message

I am currently using EventingBasicConsumer from the RabbitMQ.Client.dll C# client; we spawn a different thread to handle each message that is delivered to the consumer.
We encountered a strange behavior: the RabbitMQ server closes connections at times with the error "missed heartbeats from client, timeout: 60s". A few moments later the client reports an error saying "Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=541". I also see the error "client unexpectedly closed TCP connection" happening more frequently.
In some situations the clients may take more than 60 seconds to process one job request and this error happens under such conditions.
Is it required that a job be processed within 60 seconds? For our process this can vary between 30 seconds and 5 minutes.
RabbitMQ server: 3.6.6
RabbitMQ.Client.dll (C# client): RabbitMQ.Client.4.1.1
Any insight into this issue is greatly appreciated.
I used to run much longer jobs (minutes) with EasyNetQ. It's a higher-level client that wraps RabbitMQ.Client.
For me, the reason for these errors is something like what Evk wrote in this comment. I would try EasyNetQ, as it likely decouples fetching messages from handling them.
You can increase the TTL in RabbitMQ, both per queue and per message:
IBasicProperties mqProps = model.CreateBasicProperties();
mqProps.ContentType = "text/plain";
mqProps.DeliveryMode = 2;        // 2 = persistent delivery
mqProps.Expiration = "300000";   // per-message TTL in milliseconds (5 minutes)
model.BasicPublish(exchangeName,
                   routingKey,
                   mqProps,
                   messageBodyBytes);
The documentation is at https://www.rabbitmq.com/ttl.html
But I think you're better off rewriting the actual processing of the message to an async pattern.
This might give you some inspiration for doing async message processing with RabbitMQ:
https://codereview.stackexchange.com/questions/42836/listen-to-multiple-rabbitmq-queue-by-task-and-process-the-message
And this question also contains quite a lot of information on async message consumption:
multi-threading based RabbitMQ consumer
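As a rough illustration of that idea, here is a minimal sketch (assuming RabbitMQ.Client 4.x, with model being an open IModel and DoWork a hypothetical long-running handler) that offloads each delivery to a Task and acknowledges afterwards, so the connection's I/O loop stays free to answer heartbeats:

using System.Threading.Tasks;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

var ackLock = new object();
var consumer = new EventingBasicConsumer(model);
consumer.Received += (sender, ea) =>
{
    // Return from the callback quickly; do the minutes-long work elsewhere.
    Task.Run(() =>
    {
        DoWork(ea.Body);       // hypothetical job, 30 seconds to 5 minutes
        lock (ackLock)         // IModel is not thread-safe
        {
            model.BasicAck(ea.DeliveryTag, false);
        }
    });
};
model.BasicConsume(queueName, false, consumer);  // false = manual acks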

MSMQ - Generic System.Messaging.MessageQueueException - no error message

We have an application in which there are three separate MSMQ queues that we peek/receive asynchronously. These queues are peeked continuously (every 2s) to check for new messages. If a peek succeeds, the message is received to pull it out of the queue.
The queues are private on a remote server and the client connects on a path like so:
FormatName:DIRECT=OS:servername\Private$\qname
Recently one of our environments started receiving MessageQueueExceptions unceasingly while trying to peek the queues. The peek code looks something like this:
var messageQueue = new System.Messaging.MessageQueue(path);
var message = messageQueue.Peek(_config.QueFetchTimeout);
The exception is as follows:
System.Messaging.MessageQueueException (0x80004005)
at System.Messaging.MessageQueue.MQCacheableInfo.get_ReadHandle()
at System.Messaging.MessageQueue.StaleSafeReceiveMessage(UInt32 timeout, Int32 action, MQPROPS properties, NativeOverlapped* overlapped, ReceiveCallback receiveCallback, CursorHandle cursorHandle, IntPtr transaction)
at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at System.Messaging.MessageQueue.Peek(TimeSpan timeout)
at Run() in C:\workspace\ourclass.cs:line 53
Additional member values:
ErrorMessage: (Empty)
MessageQueueErrorCode: Generic
ErrorCode: -2147467259
Notice that there is no error message for this exception. I haven't been able to find anything on what exactly this entails; it just appears to be a general failure.
Another important note is that this error only seems to occur when multiple threads are executing peeks on their separate queues. When only one queue is running continuously, the peeks succeed. Other environments are not seeing this issue. It also arose out of the blue, without any changes to the queuing code.
No errors in the OS's event log.
EnableConnectionCache is false
The machine works perfectly with local private queues. It only seems to be an issue when they are remote.
Windows Server 2012 R2, MSMQ 6.3
Are there any ideas as to what may have caused this?
Update:
As the OP stated, the issue does not exist when only a single thread is running peeks in the application. So the next obvious step was to lock the peek operation across all threads. Indeed, this workaround solved the issue; however, I am still looking for an answer as to why the peek could not handle multiple threads on one particular machine running Windows 2012 R2, and why the error was so uninformative.
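The workaround, for reference, was essentially to funnel every peek through one shared lock, along these lines (a sketch; the single global lock is an assumption about granularity):

using System;
using System.Messaging;

static class SerializedPeeker
{
    private static readonly object PeekLock = new object();

    public static Message Peek(MessageQueue queue, TimeSpan timeout)
    {
        // Only one thread talks to MSMQ at a time. Note this also serializes
        // the waits across queues, so keep the peek timeout short.
        lock (PeekLock)
        {
            return queue.Peek(timeout);
        }
    }
}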

.NET WebSockets forcibly closed despite keep-alive and activity on the connection

We have written a simple WebSocket client using System.Net.WebSockets. The KeepAliveInterval on the ClientWebSocket is set to 30 seconds.
The connection is opened successfully and traffic flows as expected in both directions, or if the connection is idle, the client sends Pong requests every 30 seconds to the server (visible in Wireshark).
But after 100 seconds the connection is abruptly terminated due to the TCP socket being closed at the client end (watching in Wireshark we see the client send a FIN). The server responds with a 1001 Going Away before closing the socket.
After a lot of digging we have tracked down the cause and found a rather heavy-handed workaround. Despite a lot of Google and Stack Overflow searching we have only seen a couple of other examples of people posting about the problem and nobody with an answer, so I'm posting this to save others the pain and in the hope that someone may be able to suggest a better workaround.
The source of the 100 second timeout is that the WebSocket uses a System.Net.ServicePoint, which has a MaxIdleTime property to allow idle sockets to be closed. On opening the WebSocket if there is an existing ServicePoint for the Uri it will use that, with whatever the MaxIdleTime property was set to on creation. If not, a new ServicePoint instance will be created, with MaxIdleTime set from the current value of the System.Net.ServicePointManager MaxServicePointIdleTime property (which defaults to 100,000 milliseconds).
The issue is that neither WebSocket traffic nor WebSocket keep-alives (Ping/Pong) appear to register as traffic as far as the ServicePoint idle timer is concerned. So exactly 100 seconds after opening the WebSocket it just gets torn down, despite traffic or keep-alives.
Our hunch is that this may be because the WebSocket starts life as an HTTP request which is then upgraded to a websocket. It appears that the idle timer is only looking for HTTP traffic. If that is indeed what is happening that seems like a major bug in the System.Net.WebSockets implementation.
The workaround we are using is to set the MaxIdleTime on the ServicePoint to int.MaxValue. This allows the WebSocket to stay open indefinitely. But the downside is that this value applies to any other connections for that ServicePoint. In our context (which is a Load test using Visual Studio Web and Load testing) we have other (HTTP) connections open for the same ServicePoint, and in fact there is already an active ServicePoint instance by the time that we open our WebSocket. This means that after we update the MaxIdleTime, all HTTP connections for the Load test will have no idle timeout. This doesn't feel quite comfortable, although in practice the web server should be closing idle connections anyway.
We also briefly explored whether we could create a new ServicePoint instance reserved just for our WebSocket connection, but couldn't see a clean way of doing that.
One other little twist which made this harder to track down is that although the System.Net.ServicePointManager MaxServicePointIdleTime property defaults to 100 seconds, Visual Studio is overriding this value and setting it to 120 seconds - which made it harder to search for.
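For concreteness, the workaround described above amounts to something like this (a sketch; ServicePoints are keyed by scheme/host/port, so you look up the HTTP(S) equivalent of your WebSocket URI, which here is illustrative):

using System;
using System.Net;

// Find (or create) the ServicePoint the WebSocket will use and disable its
// idle teardown, before opening the ClientWebSocket.
var servicePoint = ServicePointManager.FindServicePoint(new Uri("https://example.com/socket"));
servicePoint.MaxIdleTime = int.MaxValue;  // caveat: applies to every connection on this ServicePoint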
I ran into this issue this week. Your workaround got me pointed in the right direction, but I believe I've narrowed down the root cause.
If a "Content-Length: 0" header is included in the "101 Switching Protocols" response from a WebSocket server, WebSocketClient gets confused and schedules the connection for cleanup in 100 seconds.
Here's the offending code from the .Net Reference Source:
//if the returned contentlength is zero, preemptively invoke calldone on the stream.
//this will wake up any pending reads.
if (m_ContentLength == 0 && m_ConnectStream is ConnectStream) {
    ((ConnectStream)m_ConnectStream).CallDone();
}
According to RFC 7230 Section 3.3.2, Content-Length is prohibited in 1xx (Informational) messages, but I've found it mistakenly included in some server implementations.
For additional details, including some sample code for diagnosing ServicePoint issues, see this thread: https://github.com/ably/ably-dotnet/issues/107
I set the KeepAliveInterval for the socket to 0 like this:
theSocket.Options.KeepAliveInterval = TimeSpan.Zero;
That eliminated the problem of the websocket shutting down when the timeout was reached. But it also probably turns off the sending of ping messages altogether.
I studied this issue recently, compared packet captures in Wireshark (a Python websocket client versus .NET's ClientWebSocket), and found out what happens. With ClientWebSocket, Options.KeepAliveInterval only sends a packet to the server when no message has been received from the server within that period. But some servers only check whether there is active traffic coming from the client. So we have to manually send arbitrary packets (not necessarily ping packets; WebSocketMessageType has no ping type) to the server at regular intervals, even if the server side is continuously sending packets. That's the solution.
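A sketch of that manual keep-alive (the payload text and the 30-second interval are arbitrary choices, not values from this thread):

using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

static async Task KeepAliveLoopAsync(ClientWebSocket socket, CancellationToken ct)
{
    var payload = new ArraySegment<byte>(Encoding.UTF8.GetBytes("keepalive"));
    while (socket.State == WebSocketState.Open && !ct.IsCancellationRequested)
    {
        // Any client-to-server frame counts as activity for servers that
        // only watch for traffic coming from the client.
        await socket.SendAsync(payload, WebSocketMessageType.Text, true, ct);
        await Task.Delay(TimeSpan.FromSeconds(30), ct);
    }
}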

StackExchange.Redis timeouts & socket failures

I am using Azure Redis (via StackExchange.Redis) as a cache store and it's generally working fine. But I am getting timeout errors now and then, and I can't nail down why they are happening.
My redis connection settings:
value="dev.redis.cache.windows.net,ssl=true,password=secret,abortConnect=false,syncTimeout=3000"
I am getting all these exceptions in the same second (multiple calls). [I get these on GET operations as well. Almost all of these exceptions are on StringSet and StringGet; I rarely get exceptions on HashSet or HashGet.]
Timeout performing SET {key}, inst: 1, mgr: ExecuteSelect, queue: 6, qu=0, qs=6, qc=0, wr=0/0, in=0/0
SocketFailure on SET
SocketFailure on SET
No connection is available to service this operation: SET
I am guessing that setting the object is taking longer than expected. This could be because the object is large, so I could potentially increase the sync timeout, but would that be hiding some other problem?
I am only getting these exceptions on synchronous calls to StackExchange.Redis; I have not seen an exception when the call is asynchronous.
Stacktrace:
StackExchange.Redis.RedisConnectionException: SocketFailure on SET
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisDatabase.StringSet(RedisKey key, RedisValue value, Nullable`1 expiry, When when, CommandFlags flags)
at calling method
Edit: I am using StackExchange.Redis 1.0.414 package and I am using MessagePack to serialize my objects
Timeouts are typically caused by one of a few things. Here are some examples:
Client or server CPU hitting 100%
Poorly configured ThreadPool settings, combined with bursts of traffic (see the sketch after the tips links below)
Clients sending expensive commands to the server.
Maxing out your network Bandwidth (on client or on server)
Tips for Client side issues: https://gist.github.com/JonCole/db0e90bedeb3fc4823c2
Tips for server side issues: https://gist.github.com/JonCole/9225f783a40564c9879d
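On the ThreadPool point, the kind of startup tuning those tips describe looks roughly like this (the floor of 200 is purely illustrative; measure before picking a value):

using System;
using System.Threading;

int workerMin, iocpMin;
ThreadPool.GetMinThreads(out workerMin, out iocpMin);
// Raise the minimums so a burst of traffic doesn't queue behind the
// pool's slow thread-injection rate and trip the 3000ms sync timeout.
ThreadPool.SetMinThreads(Math.Max(workerMin, 200), Math.Max(iocpMin, 200));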
I would also recommend upgrading to a newer version of StackExchange.Redis. Version 1.1.603 has more detailed diagnostic info in the timeout error message that may help you identify some of the common client-side issues listed above.
As for socket failures, a couple of common causes of connection drops between the client and the server that I have seen are:
Scaling the client - I have seen brief client side connectivity issues when scaling client apps in Azure.
When Redis is patched, there will be some connection blips. Azure Redis patching is explained here: https://gist.github.com/JonCole/317fe03805d5802e31cfa37e646e419d
Please check the port number on which you are running Redis. In my case my port was 6359, but the actual port number was 6379.

How do I adjust the maximum message size for a BrokeredMessage in Windows Server Service Bus?

I've set up Windows Server Service Bus 1.0 on a VM running Windows Server 2008 R2 Datacenter.
I've written a console application to send and receive messages to and from it and this works well.
I've been sending messages of increasing size successfully, but the console app currently falls over at 5,226,338 bytes (5,226,154 bytes of message body + 184 bytes of header, I believe), which is just under 5MB. Ideally we need a bit more room to play with.
Some of the stack trace is as below...
Unhandled Exception:
Microsoft.ServiceBus.Messaging.MessagingCommunicationException: The
socket connection was aborted. This could be caused by an error
processing your message or a receive timeout being exceeded by the
remote host, or an underlying network resource issue. Local socket
timeout was '00:01:09.2720000'. --->
System.ServiceModel.CommunicationException: The socket connection was
aborted. This could be caused by an error processing your message or a
receive timeout being exceeded by the remote host, or an underlying
network resource issue. Local socket timeout was '00:01:09.2720000'.
---> System.IO.IOException: The write operation failed, see inner exception. ---> System.ServiceModel.CommunicationException: The socket
connection was aborted. This could be caused by an error processing
your message or a receive timeout being exceeded by the remote host,
or an underlying network resource issue. Local socket timeout was
'00:01:09.2720000'. ---> System.Net.Sockets.SocketException: An
established connection was aborted by the software in your host
machine
The Windows Azure Service Bus apparently has a fixed limit of 256KB, but the on-premises Windows Server Service Bus has a limit of 50MB. See the articles below.
Mention of the Azure limit of 256KB - http://msdn.microsoft.com/en-us/library/windowsazure/hh694235.aspx
Mention of 50MB - http://msdn.microsoft.com/en-us/library/windowsazure/jj193026.aspx
I'm struggling to achieve the 50MB limit and would like to know whether there is something I need to configure, or perhaps whether the message needs to be sent in a certain way. I noticed there was a parameter name in the above article and wondered if that could be used in PowerShell.
I've struggled to find good information on this online. There is much confusion out there, with some articles relating to Azure Service Bus and others to Windows Server Service Bus.
There is Service Bus 1.1, but I think it is in preview at the moment and I'm not sure it will help.
I am using code similar to the below to send the message:
var namespaceManager = NamespaceManager.Create();
var messagingFactory = MessagingFactory.Create();
var queueClient = messagingFactory.CreateQueueClient(queueName);
queueClient.Send(new BrokeredMessage(new string('x', 5226154)));
This was taken from a combination of the articles below, the first being slightly outdated and the second making it slightly clearer what needed to change to get things working.
http://haishibai.blogspot.co.uk/2012/08/walkthrough-setting-up-development.html
http://msdn.microsoft.com/en-us/library/windowsazure/jj218335(v=azure.10).aspx
I hope someone can help.
I had the same problem, but I figured it out after a few tries.
Just open the file
C:\Program Files\Service Bus\1.1\Microsoft.ServiceBus.Gateway.exe.config
and in the netTcp binding named netMessagingProtocolHead set
maxReceivedMessageSize="2147483647"
maxBufferSize="2147483647"
then restart all the Service Bus services.
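For illustration, the binding element being edited would take the standard WCF shape below (a sketch reconstructed from the description above; the exact surrounding elements in Microsoft.ServiceBus.Gateway.exe.config may differ):

<netTcpBinding>
  <binding name="netMessagingProtocolHead"
           maxReceivedMessageSize="2147483647"
           maxBufferSize="2147483647">
    <!-- leave the binding's other attributes and child elements unchanged -->
  </binding>
</netTcpBinding>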
Now I'm able to send and receive messages of size new string('A', 49 * 1024 * 1024).
Enjoy :-)
Martin
The exception you're getting is a timeout, so your best bet is probably to fine-tune your timeouts a little. Have you tried setting the client-side timeout to a higher value? You can do that via the MessagingFactorySettings object. Also, have you checked the server-side logs to see if anything there gives you a clue?
The parameter you mention sets a quota. When you send a message bigger than the quota, it should be immediately rejected. In your case the message is being accepted, but it is apparently timing out in transit.
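If you go down the client-timeout route, a minimal sketch (the address and the five-minute value are assumptions; a TokenProvider may also be needed, depending on your setup):

using System;
using Microsoft.ServiceBus.Messaging;

var settings = new MessagingFactorySettings
{
    // Give large messages more time to transfer before the client gives up.
    OperationTimeout = TimeSpan.FromMinutes(5)
};
var factory = MessagingFactory.Create(new Uri("sb://yourserver/ServiceBusDefaultNamespace"), settings);
var queueClient = factory.CreateQueueClient(queueName);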
