I have the following setup and problem with MSMQ. Based on previous experience with MSMQ I'm betting that it is something simple I'm missing but I just don't know what it is.
The Setup
I have 3 load-balanced web servers (let's call them W1, W2 and W3) and 1 server (which I'll call P) that processes certain events/data outside of web requests. When a particular event occurs within the web application, any of the 3 web servers sends a message to a remote private queue on Server P, which then processes each message from the queue and carries out some task.
The Problem
For the most part (at a guess, 95% of the time) everything runs fine, but occasionally Server P does not receive messages from the web servers. Either W1, W2 or W3 are not sending them or P is not receiving them; I just can't tell which. This means I'm missing vital user events from the web application, yet I cannot find any errors in my own logs.
The Details
Here are all the details I can think of which may help explain my setup and what I've figured out so far:
The private queue on Server P is non-transactional.
The private queue has permissions set up for Everyone to both Send and Receive Messages.
This is the code I use (C#) to send the message to the remote private queue:
var queue = new MessageQueue(@"FormatName:DIRECT=OS:ServerP\PRIVATE$\MyMessageQueue");
var defaultProperties = queue.DefaultPropertiesToSend;
defaultProperties.AcknowledgeType = AcknowledgeTypes.FullReachQueue | AcknowledgeTypes.FullReceive;
defaultProperties.Recoverable = true;
defaultProperties.UseDeadLetterQueue = true;
defaultProperties.UseJournalQueue = true;
queue.Send(requestData);
Sending the message using the code above does not appear to throw an exception - if it did my error handler in the web application would have caught and logged it, so I'm assuming it is sent.
There are outgoing queues on W1, W2 and W3 all pointing to the private queue on P - all these are empty.
On W1, W2 and W3 I cannot see any "dead-letter" messages.
On P the private queue is empty so messages are being processed (which I can verify from my database).
On P there are no "dead-letter" messages. There are journal messages but they don't seem to correspond to any recent date/times.
All servers are running Windows Server 2012.
Most of the time messages are sent, received and processed just fine but, without any pattern visible to me, sometimes they are not. Can anyone see what is going wrong? Or explain to me how I can try and figure out what is happening?
Are you sure that the receiver on P does not crash/lose the message somehow? Because your queue is not transactional, if somehow processing fails then that's one lost message.
Anyway, there are many possible causes why this could fail.
What kind of logging do you have (DEBUG/INFO levels)?
I think logging at the following points will help track down the issue:
When an event is generated in the web app.
Right before you send an event from the web app, via MSMQ.
In the receiver when you get a message from the queue.
This way you could at least match sent messages to received messages and to processed messages.
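As a rough illustration of that matching step, here is a minimal sketch. It assumes every message carries a unique ID (a Guid here) that gets written to the log at each of the three points; the MessageAudit class and its method names are hypothetical and stand in for whatever log comparison you actually do across the web servers and P.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: records the ID seen at each logging point so that
// gaps between "sent", "received" and "processed" can be listed.
public sealed class MessageAudit
{
    private readonly object _gate = new object();
    private readonly HashSet<Guid> _sent = new HashSet<Guid>();
    private readonly HashSet<Guid> _received = new HashSet<Guid>();
    private readonly HashSet<Guid> _processed = new HashSet<Guid>();

    // Call right before the web app sends the message.
    public void LogSent(Guid id) { lock (_gate) { _sent.Add(id); } }

    // Call in the receiver as soon as the message is taken from the queue.
    public void LogReceived(Guid id) { lock (_gate) { _received.Add(id); } }

    // Call once the receiver has finished its work for the message.
    public void LogProcessed(Guid id) { lock (_gate) { _processed.Add(id); } }

    // IDs that left a web server but never showed up in the receiver.
    public IReadOnlyList<Guid> SentButNotReceived()
    {
        lock (_gate) { return _sent.Except(_received).ToList(); }
    }

    // IDs the receiver took from the queue but never finished processing.
    public IReadOnlyList<Guid> ReceivedButNotProcessed()
    {
        lock (_gate) { return _received.Except(_processed).ToList(); }
    }
}
```

If SentButNotReceived is non-empty, the loss is between the sender and the queue; if ReceivedButNotProcessed is non-empty, the receiver is dropping messages after dequeuing them, which is the case the non-transactional queue cannot protect against.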
As a side note, when you check for dead-letter messages you do so on the source computer and on any intermediary hops, not on the destination one. If you don't have any hops, then they will be relayed to the non-transactional dead-letter queue on the web servers.
Related
I have 3 machines, let's call them A, B (both servers) and Dev (my local machine).
I want to send a message queue item from A to B.
The actual C# code I have is rather simple and honestly, I really do not think it is the problem here. (It's a webapi that takes a POSTed object and just shoves it down the queue.)
I can send these messages just fine from Dev to B (while logged into a domain admin account) without a problem and I can inspect the body of the messages. However I cannot send messages from A to B. The private queue on B is set to allow "Everyone" the "Full Control" permissions.
If I pause the outbound queue on A and send the messages, they sit in the outbound queue and the body is exactly as I would expect it to be, but when I resume that outbound queue again, they are never received on the other end at B.
I can't figure out what's going on for the life of me. I tried the 'TCP' method but I need to refer to my queues by machine name not IP.
For reference, the code used to send the message is:
using (var queue = new MessageQueue($"FormatName:Direct=OS:MachineB\\private$\\Queue"))
{
    var queueItem = new QueueItem();
    queueItem.Object = this.postedObject;
    var message = new System.Messaging.Message(queueItem);
    queue.Send(message);
}
For reference and anyone else who comes across this:
The project sending the messages to the queue was a webapi. It was running under the AppPoolIdentity account in IIS and despite the receiving queue allowing Everyone access, it was denying these messages, but gracefully. So the webapi would fire the message and wouldn't throw any exceptions, it'd hit the outbound, but then never arrive.
We switched the account to NetworkService and it worked just fine.
I have a WCF service hosted with a Net.TCP binding, to which many clients (> 100) may connect to receive various broadcast messages. The same message is sent to all clients. Currently, a dedicated thread waits on a BlockingCollection for new messages; as soon as a new message arrives, it iterates over the list of client callback connections and calls a method on each that takes the message as an argument.
So my code currently looks like this:
var message = ... get message from queue ...
foreach (var client in clients)
    client.SendMessage(message);
This design has the following problems:
Clients cannot receive a new message until I have finished sending the previous message to all clients
I would like to detect slow clients and possibly disconnect them
The message is serialized as many times as I have clients (I could serialize the message once before sending it, but I would need to change the signature of SendMessage to SendMessage(byte[] content), and this is not something I would like to do)
Does anybody have experience with such problems? Any tips/tricks/hints?
It seems you have to use multicasts instead of dedicated communication. So each new client will need to join the cast channel (see IGMP for details) and then your server will fire-and-forget once per message you need to publish.
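Separately from the multicast suggestion, the first two problems in the question (one slow client delaying everyone, and detecting slow clients) can also be approached by giving each client its own bounded queue and sender thread. The sketch below is only an illustration under that assumption: ClientOutbox is a hypothetical class, and the send delegate stands in for the WCF callback's SendMessage. It does not address the repeated serialization.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-client outbox. 'send' stands in for the client's
// SendMessage callback; it is not part of any WCF API.
public sealed class ClientOutbox<T> : IDisposable
{
    private readonly BlockingCollection<T> _queue;
    private readonly Task _pump;

    public ClientOutbox(Action<T> send, int capacity)
    {
        _queue = new BlockingCollection<T>(capacity);

        // One pump per client, so a slow client only delays itself.
        _pump = Task.Factory.StartNew(() =>
        {
            foreach (var item in _queue.GetConsumingEnumerable())
            {
                send(item);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Never blocks. Returns false when this client's queue is already full,
    // which is the signal that the client is not keeping up.
    public bool TryEnqueue(T message)
    {
        return _queue.TryAdd(message);
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // let the pump drain what is left and exit
        _pump.Wait();            // rethrows if 'send' threw inside the pump
        _queue.Dispose();
    }
}
```

The broadcasting thread would then call TryEnqueue on every client's outbox and treat a false result as the cue to disconnect that client. With more than 100 clients this means more than 100 long-running tasks, so whether that trade-off is acceptable depends on the server.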
So...this one's got me baffled.
The target queue lives on ServerA where MSMQ is running in Workgroup mode. The queue is a non-transactional, private queue, with Full rights on pretty much the world (including NETWORK SERVICE, but EXCLUDING ANONYMOUS LOGON).
I'm specifying the queue address as such: FormatName:DIRECT=OS:ServerA\private$\targetqueue.
If I'm interested in sending "fire-and-forget"-style (no need for transaction as there is no other persistence going on), I would assume it would be fine to simply call:
Message message = ConstructMessageWithObjectPayload(serializableObject);
using (MessageQueue queue = new MessageQueue(queueAddress))
{
    queue.Send(message);
}
But strangely, the message never arrives in the target queue and enabling negative source journaling (which interestingly enough causes the message to be sent to the Dead-letter messages queue on the target server) tells me that it is a "Nontransactional message".
Consequently, using
queue.Send(message, MessageQueueTransactionType.Single);
works! Having a hard time wrapping my head around this. What am I missing?
Also, I've seen a good number of posts by others where their similar problem was solved by giving ANONYMOUS LOGON Full rights. In what scenario is this necessary? Giving NETWORK SERVICE access somewhat made sense because that is the account that MSMQ itself runs under. If running in Workgroup mode like I am, is it necessary at all to assign rights to Everyone or even the account that my process runs under?
Appreciate the help!
We have an issue in our Rebus/RabbitMQ setup where Rebus suddenly stops retrieving/handling messages from RabbitMQ. This has happened two times in the last month and we're not really sure how to proceed.
Our RabbitMQ setup has two nodes on different servers, and the Rebus side is a windows service.
We see no errors in Rebus or in the eventlog on the server where Rebus runs. We also do not see errors on the RabbitMQ servers.
Rebus (and the windows service) keeps running as we do see other log messages, like the DueTimeOutSchedular and timeoutreplies. However it seems the worker thread stops running, but without any errors being logged.
It results in a RabbitMQ input queue that keeps growing :(, we're adding logging to monitor this so we get notified if it happens again.
But I'm looking for advice on how to continue the "investigation" and ideas on how to prevent this. Maybe some of you have experienced this before?
UPDATE
It seems that we actually did have a node crashing, at least the last time it happened. The master RabbitMQ node crashed (the server crashed) and the slave was promoted to master. As far as I can see from the RabbitMQ logs on the nodes, everything went according to plan. There are no other errors in the RabbitMQ logs.
At the time this happened Rebus was configured to connect only to the node that was the slave (then promoted to master) so Rebus did not experience the rabbitmq failure and thus no Rebus connection errors. However, it seems that Rebus stopped handling messages when the failure occurred.
We are actually experiencing this on a few queues, it seems, and some of them (but not all) seem to have ended up in an unsynchronized state.
UPDATE 2
I was able to reproduce the problem quite easily, so it might be a configuration issue in our setup. This is what we do to reproduce it:
Start two nodes in a cluster, e.g. rabbit1 (master) and rabbit2 (slave)
Rebus connects to rabbit2, the slave
Close rabbit1, the master. rabbit2 is promoted to master
The queues are mirrored
We have two small test apps to reproduce this, a "sender" that sends a message every second and a "consumer" that handles the messages.
When rabbit1 is closed, the "consumer" stops handling messages, but the "sender" keeps sending the messages and the queue keeps growing.
Start rabbit1 again, it joins as slave
This has no effect and the "consumer" still does not handle messages.
Restart the "consumer" app
When the "consumer" is restarted it retrieves all the messages and handles them.
I think I have followed the setup guides correctly, but it might be a configuration issue on our part. I can't seem to find anything that would suggest what we have done wrong.
Rebus is still connected to RabbitMQ; we can see that in the connections tab on the management site. The "consumer's" sent/received B/s drops to about 2 B/s when it stops handling messages.
UPDATE 3
Ok, so I downloaded the Rebus source and attached to our process so I could see what happens in the "RabbitMqMessageQueue" class when it stops. When "rabbit1" is closed, the "BasicDeliverEventArgs" is null. This is the code:
BasicDeliverEventArgs ea;
if (!threadBoundSubscription.Next((int)BackoffTime.TotalMilliseconds, out ea))
{
    return null;
}
// wtf??
if (ea == null)
{
    return null;
}
See: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L178
I like the "wtf ??" comment :)
That sounds very weird!
Whenever Rebus' RabbitMQ transport experiences an error on the connection, it will throw out the connection, wait a few seconds, and ensure that the connection is re-established again when it can.
You can see the relevant place in the source here: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L205
So I guess the question is whether the RabbitMQ client library can somehow enter a faulted state, silently, without throwing an exception when Rebus attempts to get the next message...?
When you experienced the error, did you check out the 'connections' tab in RabbitMQ management UI and see if the client was still connected?
Update:
Thanks for your thorough investigation :)
The "wtf??" is in there because I once experienced a hiccup when ea had apparently been null, which was unexpected at the time, thus causing a NullReferenceException later on and the vomiting of exceptions all over my logs.
According to the docs, Next will return true and set the result to null when it reaches "end-of-stream", which is apparently what happens when the underlying model is closed.
The correct behavior in that case for Rebus would be to throw a proper exception and let the connection be re-established - I'll implement that right away!
Sit tight, I'll have a fix ready for you in a few minutes!
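To make the described behavior concrete, here is a minimal sketch of "treat a true result with a null value as a closed connection". It is not the actual Rebus fix. The TryNext delegate, SubscriptionClosedException and Receiver names are hypothetical; only the Next(timeout, out result) shape is taken from the snippet in the question.

```csharp
using System;

// Hypothetical delegate with the same shape as the subscription's
// Next(int, out T) call shown in the question.
public delegate bool TryNext<T>(int millisecondsTimeout, out T result);

// Hypothetical exception type used to signal that a reconnect is needed.
public sealed class SubscriptionClosedException : Exception
{
    public SubscriptionClosedException(string message) : base(message) { }
}

public static class Receiver
{
    // Returns null when the wait timed out (no message yet).
    // Throws when 'next' reports success but hands back null, which is the
    // end-of-stream case, so the caller's error handling can drop the
    // connection and re-establish it instead of silently polling forever.
    public static T ReceiveOrThrow<T>(TryNext<T> next, int millisecondsTimeout)
        where T : class
    {
        T result;
        if (!next(millisecondsTimeout, out result))
        {
            return null; // plain timeout: nothing to deliver yet
        }

        if (result == null)
        {
            throw new SubscriptionClosedException(
                "Subscription reached end-of-stream; the underlying model is probably closed.");
        }

        return result;
    }
}
```

The point of throwing rather than returning null is that the two cases look identical to the caller otherwise, which matches the symptom in the question: the consumer keeps running and logs nothing while the queue grows.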
I'm building an ISO8583 Payment Gateway application. It is a client-server application that can act as either a client or a server. In this case, I'm handling the client side of the application.
At first, I connected the (client) app to an external server. I sent an inquiry message and it ran well (returning a success message). Then I tried to run the app as both client and server (running 2 instances and setting my own IP as the host IP), one as the client and the other as the server. When I send the inquiry message now, it keeps returning response code 67 (other error), whereas it succeeds when I run the app as a client only.
I don't know if it helps but here's the inquiry method
/// <summary>
/// Send Inquiry Message
/// </summary>
private void SendInquiryMessage()
{
    var requestMsg = new Iso8583Message(200);
    DateTime transmissionDate = DateTime.Now;
    requestMsg.Fields.Add(7, string.Format("{0}{1}",
        string.Format("{0:00}{1:00}", transmissionDate.Month, transmissionDate.Day),
        string.Format("{0:00}{1:00}{2:00}", transmissionDate.Hour,
            transmissionDate.Minute, transmissionDate.Second)));
    requestMsg.Fields.Add(11, _sequencer.Increment().ToString());
    requestMsg.Fields.Add((int)ISO8583ProtocolFields.PROCESSING_CODE, "341019");
    requestMsg.Fields.Add((int)ISO8583ProtocolFields.ADDITIONAL_DATA_61, "5271720012002010802012");
    #region Send 0200
    SendRequestHandlerCtrl sndCtrl = _client.SendExpectingResponse(requestMsg, 1000, true, null);
    sndCtrl.WaitCompletion(); // Wait send completion.
    if (!sndCtrl.Successful)
    {
        Console.WriteLine(string.Format("Client: unsuccessful request # {0} ({1}).",
            _sequencer.CurrentValue(), sndCtrl.Message));
        if (sndCtrl.Error != null)
            Console.WriteLine(sndCtrl.Error);
    }
    else
    {
        sndCtrl.Request.WaitResponse();
        if (sndCtrl.Request.IsExpired)
            _expiredRequests++;
        else
            _requestsCnt++;
    }
    latestInquiryMessage = sndCtrl.Request.ReceivedMessage as Iso8583Message;
    // No response (failed or expired request) leaves this null, so guard before reading field 39.
    if (latestInquiryMessage != null)
        Console.WriteLine(latestInquiryMessage.Fields[39].Value);
    #endregion
}
Does anyone know what the problem is? What could I be missing?
Thank you!
I don't know which specific ISO-8583 implementation you are writing against, but here are a couple of guesses based on what I do and do not see in your question.
It seems especially odd that it works when it is communicating with the remote server as a client but not when it acts as both. Where is your communications configuration?
This points to your TCP/IP configuration. My guess is that you are attempting to listen and communicate on the same port, and so are not truly completing the TCP/IP handshake. While I believe you technically can listen on a port and communicate out of it from a different process, I think it unnecessarily complicates things.
So my guess is that you are attempting to communicate with yourself and are not getting fully connected. Instead of reporting "91 - Issuer Not Available" or "96 - System error", it gives you the odd "67 other error" because it may not have been able to actually send the message.
Do you have a trace? If not, have you watched the connection with netstat -a 1 or, even better, Wireshark, to verify that it gets fully established?