We have an application with three separate MSMQ queues that we peek/receive asynchronously. These queues are peeked continuously (every 2 s) to check for new messages. When a peek succeeds, the application receives the message to remove it from the queue.
The queues are private on a remote server and the client connects on a path like so:
FormatName:DIRECT=OS:servername\Private$\qname
Recently one of our environments started receiving MessageQueueExceptions unceasingly while trying to peek the queues. The peek code looks something like this:
_messageQueue = new System.Messaging.MessageQueue(path);
message = _messageQueue.Peek(_config.QueFetchTimeout);
The exception is as follows:
System.Messaging.MessageQueueException (0x80004005)
at System.Messaging.MessageQueue.MQCacheableInfo.get_ReadHandle()
at System.Messaging.MessageQueue.StaleSafeReceiveMessage(UInt32 timeout, Int32 action, MQPROPS properties, NativeOverlapped* overlapped, ReceiveCallback receiveCallback, CursorHandle cursorHandle, IntPtr transaction)
at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at System.Messaging.MessageQueue.Peek(TimeSpan timeout)
at Run() in C:\workspace\ourclass.cs:line 53
Additional member values:
ErrorMessage: (Empty)
MessageQueueErrorCode: Generic
ErrorCode: -2147467259
Notice that this exception has no error message. I haven't been able to find anything on what exactly it means; it appears to be a generic failure.
Another important note: this error only seems to occur when multiple threads are executing peeks on their separate queues. When only one queue is polled continuously, the peeks succeed. Other environments are not seeing this issue, and it appeared out of the blue without any changes to the queuing code.
There are no errors in the OS event log.
EnableConnectionCache is false.
The machine works perfectly with local private queues; the problem only appears when the queues are remote.
Windows Server 2012 R2, MSMQ 6.3
Does anyone have an idea as to what may have caused this?
Update:
As the OP stated, the issue does not occur when only a single thread in the application is running peeks. So the next obvious step was to lock the peek operation across all threads. This workaround did solve the issue; however, I am still looking for an answer as to why the peek could not handle multiple threads on this one particular machine running Windows Server 2012 R2, and why the error was so uninformative.
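A minimal sketch of that workaround, assuming one process-wide lock is acceptable (it serializes every peek, so only one remote call is in flight at a time). `SerializedPeek` is a hypothetical helper name, not part of System.Messaging; the peek is passed in as a delegate so the sketch compiles on its own, and each thread still peeks its own queue.

```csharp
using System;

// Hypothetical helper (not part of System.Messaging): funnels every peek in the
// process through one shared lock so that only one thread at a time talks to
// the remote MSMQ server.
public static class SerializedPeek
{
    // static => shared by every polling thread in this AppDomain
    private static readonly object PeekLock = new object();

    public static T Run<T>(Func<T> peek)
    {
        lock (PeekLock)
        {
            return peek();
        }
    }
}

// Usage inside each polling thread (names taken from the question):
// message = SerializedPeek.Run(() => _messageQueue.Peek(_config.QueFetchTimeout));
```

The lock hides the symptom rather than explaining it, so it is worth keeping the peek timeout short: a thread blocked in `Peek` holds the lock for the whole timeout and delays the other queues.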
Related
The other day I received these errors trying to receive messages from transactional, public queues.
System.Messaging.MessageQueueException (0x80004005): Cannot import the transaction.
at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at System.Messaging.MessageQueue.Receive(TimeSpan timeout, MessageQueueTransactionType transactionType)
This is an application that has been running for years, 24/7, and this is the first time the error occurred. The app is configured to shut itself down after a couple of repeated errors like this, but when it was restarted some 15 minutes later, everything worked fine again.
Fwiw, the app is receiving from 3 different queues, on a per-queue dedicated thread. The error occurred on all 3 threads, and between every error (retrying without pausing in-between), there was an interval of between 1 and 2 seconds, and this error state went on for about 6 seconds, before the app shut itself down.
I do not expect a distributed transaction to occur in this app, but I do not explicitly set TransactionScopeOption to RequiresNew either.
I cannot seem to find a detailed explanation of that error and would like to know how to get to the bottom of this temporary glitch.
Does this error only occur for transactions involving DTC? Or could it occur for internal transactions as well?
I developed a very simple app using RabbitMQ: one machine, multiple queues and exchanges, one publisher and one consumer. After reading further about clustering and HA, I connected a second machine to create a cluster, and I mirrored the queues so that each has at least one replica. Now when I publish data to a queue, I use the first machine as my host and it works fine, but if the RabbitMQ service on the first machine is not running, my app crashes. My question is: how do I know which machine is up when creating a connection, and how do I change the host while publishing messages?
UPDATE: I use one of the CreateConnection overloads to pass all my hosts when creating a connection. OK, this solves the problem of finding an available machine to connect to. But the second question remains; look at the code below:
for (int i = 0; i < 300; i++) {
// Build a unique text body and publish it with routing key "123456" and no message properties
var message = string.Format("Message #{0}: {1}", i, Guid.NewGuid());
var messageBodyBytes = Encoding.UTF8.GetBytes(message);
channel.BasicPublish(ExchangeName, "123456", null, messageBodyBytes);
}
This code works perfectly when the connection is OK. But suppose the service stops unexpectedly in the middle of publishing messages to an exchange: first a System.IO.FileLoadException is raised, and if I continue execution, a RabbitMQ.Client.Exceptions.AlreadyClosedException is raised, saying:
Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=320, text="CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", classId=0, methodId=0, cause=
I think there must be a way to change the host when the connection is closed while publishing messages, but I have no idea how.
You must close the channel and the current connection, then open a new connection and a new channel mid-loop. Because the connection is created from the full host list, the new one should go to a node that is still up. You only have to do this when the exception is caught, not on every iteration of the loop.
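A minimal sketch of that advice, assuming a retry limit per message is acceptable. `ResilientPublisher` and its delegate parameters are hypothetical; they stand in for the RabbitMQ calls so the sketch compiles without RabbitMQ.Client. In the question's code, `openChannel` would wrap the `CreateConnection` overload with the host list plus `CreateModel()`, and `publish` would call `channel.BasicPublish(...)`.

```csharp
using System;

// Keep one channel for the whole loop; only when a publish throws, discard the
// channel/connection, open fresh ones and retry that same message.
public static class ResilientPublisher
{
    public static void PublishAll<TChannel>(
        int messageCount,
        Func<TChannel> openChannel,          // new connection + channel (hypothetical wiring)
        Action<TChannel> closeChannel,       // close channel, then connection
        Action<TChannel, int> publish,       // publish message number i
        int maxAttemptsPerMessage = 3)
    {
        TChannel channel = openChannel();
        try
        {
            for (int i = 0; i < messageCount; i++)
            {
                for (int attempt = 1; ; attempt++)
                {
                    try
                    {
                        publish(channel, i);
                        break; // message i handed to the client; go to the next one
                    }
                    catch (Exception) when (attempt < maxAttemptsPerMessage)
                    {
                        // e.g. AlreadyClosedException after the broker shut down
                        TryClose(closeChannel, channel);
                        channel = openChannel(); // may land on another cluster node
                    }
                }
            }
        }
        finally
        {
            TryClose(closeChannel, channel);
        }
    }

    private static void TryClose<TChannel>(Action<TChannel> closeChannel, TChannel channel)
    {
        try { closeChannel(channel); }
        catch (Exception) { /* already closed by the broker; nothing to do */ }
    }
}
```

Retrying gives at-least-once behaviour: if the broker received a message just before the connection dropped, that message may be published twice. Publisher confirms are the usual way to find out which messages the broker actually accepted.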
We have 11 Windows web app machines running IIS. These send messages to a RabbitMQ server for tasks; we use RabbitMQ for basic work-queue functionality. For each message published, a new connection and channel are created, pretty much as in the tutorial here - https://www.rabbitmq.com/tutorials/tutorial-two-dotnet.html
This works great most of the time, but in production, sporadically (once or twice a day, on a different machine each time), we start getting this exception from ConnectionFactory.CreateConnection:
[BrokerUnreachableException: None of the specified endpoints were reachable]
RabbitMQ.Client.ConnectionFactory.CreateConnection():56
[TimeoutException: Connection to amqp://machinename.domain.net:5672 timed out]
RabbitMQ.Client.Impl.SocketFrameHandler.Connect(TcpClient socket, AmqpTcpEndpoint endpoint, Int32 timeout):65
RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout):52
RabbitMQ.Client.Framing.Impl.ProtocolBase.CreateFrameHandler(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout):8
RabbitMQ.Client.ConnectionFactory.CreateConnection():45
This causes message loss. I have investigated the max-concurrent-connections-per-machine setting, but it did not lead anywhere. It also does not coincide with our peak traffic. The most interesting clue I have is that it happens in bursts, and when it happens it affects ONLY one of the 11 machines publishing messages to the queue at a time.
I am using the RabbitMQ .NET client.
Any ideas or pointers on what could be the possible cause?
Probably some sort of packet loss? Why not try/catch and retry?
Run ping RabbitServerHostName -t (where RabbitServerHostName is the server where RabbitMQ is installed) in a command window and, after a couple of days, see how much packet loss you have.
Because of packet drops and network instability, it is almost always recommended to retry connection creation. The EasyNetQ library does this really well. However, it is not very complex to implement your own timer-based retry that, when you get this exception, keeps trying until a connection is established.
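A minimal sketch of such a retry, assuming exponential backoff with a cap is acceptable (the answer only says "timer-based"). `ConnectRetry.WithRetry` is a hypothetical helper; the connection attempt is passed in as a delegate, so it does not depend on RabbitMQ.Client.

```csharp
using System;
using System.Threading;

// Hypothetical helper: retry a connection attempt, doubling the delay between
// attempts (capped at maxDelay) until it succeeds or maxAttempts is reached.
public static class ConnectRetry
{
    public static T WithRetry<T>(Func<T> connect, int maxAttempts, TimeSpan initialDelay, TimeSpan maxDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return connect();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Transient failure (e.g. BrokerUnreachableException): wait, then try again.
                // On the last attempt the filter is false and the exception propagates.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
    }
}

// Usage (assumes an existing ConnectionFactory named factory):
// var connection = ConnectRetry.WithRetry(() => factory.CreateConnection(), 5,
//     TimeSpan.FromMilliseconds(500), TimeSpan.FromSeconds(10));
```

In real code the catch would normally be narrowed to `BrokerUnreachableException` so that configuration errors fail fast instead of being retried.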
We have an issue in our Rebus/RabbitMQ setup where Rebus suddenly stops retrieving/handling messages from RabbitMQ. This has happened two times in the last month and we're not really sure how to proceed.
Our RabbitMQ setup has two nodes on different servers, and the Rebus side is a windows service.
We see no errors in Rebus or in the eventlog on the server where Rebus runs. We also do not see errors on the RabbitMQ servers.
Rebus (and the Windows service) keeps running, as we do see other log messages, like the DueTimeOutSchedular and timeout replies. However, it seems the worker thread stops running without any errors being logged.
The result is a RabbitMQ input queue that keeps growing :(. We're adding logging to monitor this so we get notified if it happens again.
But I'm looking for advice on how to continue the "investigation" and ideas on how to prevent this. Maybe some of you have experienced this before?
UPDATE
It seems that we actually did have a node crash, at least the last time it happened. The master RabbitMQ node crashed (the server crashed) and the slave was promoted to master. As far as I can see from the RabbitMQ logs on the nodes, everything went according to plan. There are no other errors in the RabbitMQ logs.
At the time this happened Rebus was configured to connect only to the node that was the slave (then promoted to master) so Rebus did not experience the rabbitmq failure and thus no Rebus connection errors. However, it seems that Rebus stopped handling messages when the failure occurred.
It seems we are actually experiencing this on a few queues, and some of them, but not all, seem to have ended up in an unsynchronized state.
UPDATE 2
I was able to reproduce the problem quite easily, so it might be a configuration issue in our setup. This is what we do to reproduce it:
Start two nodes in a cluster, e.g. rabbit1 (master) and rabbit2 (slave)
Rebus connects to rabbit2, the slave
Close rabbit1, the master. rabbit2 is promoted to master
The queues are mirrored
We have two small test apps to reproduce this: a "sender" that sends a message every second and a "consumer" that handles the messages.
When rabbit1 is closed, the "consumer" stops handling messages, but the "sender" keeps sending the messages and the queue keeps growing.
Start rabbit1 again, it joins as slave
This has no effect and the "consumer" still does not handle messages.
Restart the "consumer" app
When the "consumer" is restarted it retrieves all the messages and handles them.
I think I have followed the setup guides correctly, but it might be a configuration issue on our part. I can't seem to find anything that would suggest what we have done wrong.
Rebus is still connected to RabbitMQ; we can see that in the Connections tab of the management site. The consumer's send/receive rate drops to about 2 B/s when it stops handling messages.
UPDATE 3
OK, so I downloaded the Rebus source and attached to our process so I could see what happens in the RabbitMqMessageQueue class when it stops. When rabbit1 is closed, the BasicDeliverEventArgs is null; this is the code:
BasicDeliverEventArgs ea;
if (!threadBoundSubscription.Next((int)BackoffTime.TotalMilliseconds, out ea))
{
return null;
}
// wtf??
if (ea == null)
{
return null;
}
See: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L178
I like the "wtf ??" comment :)
That sounds very weird!
Whenever Rebus' RabbitMQ transport experiences an error on the connection, it will throw out the connection, wait a few seconds, and ensure that the connection is re-established again when it can.
You can see the relevant place in the source here: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L205
So I guess the question is whether the RabbitMQ client library can somehow enter a faulted state silently, without throwing an exception when Rebus attempts to get the next message...?
When you experienced the error, did you check out the 'connections' tab in RabbitMQ management UI and see if the client was still connected?
Update:
Thanks for your thorough investigation :)
The "wtf??" is in there because I once experienced a hiccup when ea had apparently been null, which was unexpected at the time, thus causing a NullReferenceException later on and the vomiting of exceptions all over my logs.
According to the docs, Next will return true and set the result to null when it reaches "end-of-stream", which is apparently what happens when the underlying model is closed.
The correct behavior in that case for Rebus would be to throw a proper exception and let the connection be re-established - I'll implement that right away!
Sit tight, I'll have a fix ready for you in a few minutes!
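A minimal sketch of that fix, assuming the `Next` contract described above (false on timeout; true with a null result at end-of-stream). `TryNext`, `DeliveryReader` and `SubscriptionClosedException` are hypothetical names, not Rebus or RabbitMQ.Client types; the point is only to show the end-of-stream case being turned into an exception so the reconnect logic can take over.

```csharp
using System;

// Hypothetical delegate mirroring the contract described above for
// Subscription.Next(millisecondsTimeout, out result):
//   false          -> timed out, no message
//   true, non-null -> a delivery arrived
//   true, null     -> end-of-stream (the underlying model was closed)
public delegate bool TryNext<T>(int millisecondsTimeout, out T result) where T : class;

// Hypothetical exception type used to signal "throw out the connection and reconnect".
public sealed class SubscriptionClosedException : Exception
{
    public SubscriptionClosedException()
        : base("Subscription reached end-of-stream; the channel was closed.") { }
}

public static class DeliveryReader
{
    public static T ReceiveOrNull<T>(TryNext<T> next, int millisecondsTimeout) where T : class
    {
        if (!next(millisecondsTimeout, out T delivery))
        {
            return null; // ordinary timeout: poll again later
        }
        if (delivery == null)
        {
            // End-of-stream: returning null here would look like "no message"
            // forever. Throwing lets the caller discard the connection and
            // re-establish it.
            throw new SubscriptionClosedException();
        }
        return delivery;
    }
}
```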
I am connecting to a remote MSMQ within a Windows Service, and doing a BeginReceive as follows:
msmq.ReceiveCompleted += new ReceiveCompletedEventHandler(Process);
msmq.BeginReceive();
The Process method gets the message and calls EndReceive like this:
message = msmq.EndReceive(asyncResult.AsyncResult);
and then processes the message, then calls BeginReceive again like this:
msmq.BeginReceive();
The problem is that for some reason when the MSMQ server reboots, the Process method fires, and gets to the EndReceive line which then throws a MessageQueueException. Once the remote server reboots, no more messages get received and processed until I restart the Windows Service.
It seems odd to me that the ReceiveCompletedEventHandler method (Process) fires at all, and also that no more messages are received after the remote server reboots. I'm not sure how to ensure that the connection is re-established after a reboot.
Does anyone know why this is happening? (and how to fix it?).
Note - I've now added code that handles the case where the EndReceive call throws this specific error, and loops calling BeginReceive() again (with Thread.Sleep calls in between) until there is no longer an error. Annoyingly, even though this appears to work once the MSMQ server is back up, and BeginReceive seems to succeed (i.e. it doesn't throw any errors), still NO messages are received.
I seem to have fixed the problem. I've taken the following steps:
1) I've now moved my BeginReceive call into a separate method, which loops around calling BeginReceive() until there are no exceptions any more (sleeping for X seconds in between).
2) Wrapped the EndReceive call in a try catch, to catch the odd case where the ReceiveCompletedEventHandler is called when the MSMQ server is rebooted, and throws a MessageQueueException.
3) In the catch, I call Close() on the MessageQueue (this is important; without it, it didn't work), then I call my BeginReceive method again.
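A sketch of those three steps, assuming the message body can be simplified to a string. `ResilientReceiver` is a hypothetical class, and the queue operations are passed in as delegates so the logic compiles without System.Messaging; in the service they would be `msmq.BeginReceive()`, `msmq.EndReceive(asyncResult)` and `msmq.Close()`, and the catch blocks would normally be narrowed to `MessageQueueException`.

```csharp
using System;
using System.Threading;

public sealed class ResilientReceiver
{
    private readonly Action _beginReceive;
    private readonly Func<IAsyncResult, string> _endReceive;
    private readonly Action _close;
    private readonly Action<string> _process;
    private readonly TimeSpan _retryDelay;

    public ResilientReceiver(Action beginReceive, Func<IAsyncResult, string> endReceive,
                             Action close, Action<string> process, TimeSpan retryDelay)
    {
        _beginReceive = beginReceive;
        _endReceive = endReceive;
        _close = close;
        _process = process;
        _retryDelay = retryDelay;
    }

    // Step 1: call BeginReceive until it stops throwing, sleeping in between.
    public void StartReceiving()
    {
        while (true)
        {
            try
            {
                _beginReceive();
                return;
            }
            catch (Exception)
            {
                Thread.Sleep(_retryDelay);
            }
        }
    }

    // Hook up with: msmq.ReceiveCompleted += (s, e) => receiver.OnReceiveCompleted(e.AsyncResult);
    public void OnReceiveCompleted(IAsyncResult asyncResult)
    {
        string message;
        try
        {
            // Step 2: EndReceive can throw when the remote server has rebooted.
            message = _endReceive(asyncResult);
        }
        catch (Exception)
        {
            // Step 3: Close() releases the stale queue handle, then receiving restarts.
            _close();
            StartReceiving();
            return;
        }
        _process(message);
        StartReceiving();
    }
}
```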
I seem to have a similar problem, but it seems to lie in the fact that the underlying MSMQ object has lost its connection to the queue. I'm experimenting with the Refresh() method...