I am creating a consumer that runs in an infinite loop to read messages from the queue. I am looking for advice/sample code on how to recover and continue within my infinite loop even if there are network disruptions. The consumer has to stay running as it will be installed as a Windows Service.
1) Can someone please explain how to properly use these settings? What is the difference between them?
NetworkRecoveryInterval
AutomaticRecoveryEnabled
RequestedHeartbeat
2) Please see my current sample code for the consumer. I am using the .NET RabbitMQ Client v3.5.6.
How will the above settings do the "recovery" for me?
e.g. will consumer.Queue.Dequeue block until it is recovered?
That doesn't seem right
so...
Do I have to code for this manually? e.g. will consumer.Queue.Dequeue throw an exception for which I have to detect and manually re-create my connection, channel, and consumer? Or just the consumer, as "AutomaticRecovery" will recover the channel for me?
Does this mean I should move the consumer creation inside the while loop? what about the channel creation? and the connection creation?
3) Assuming I have to do some of this recovery code manually, are there event callbacks (and how do I register for them) to tell me that there are network problems?
Thanks!
public void StartConsumer(string queue)
{
    using (IModel channel = this.Connection.CreateModel())
    {
        var consumer = new QueueingBasicConsumer(channel);
        const bool noAck = false;
        channel.BasicConsume(queue, noAck, consumer);
        // do I need these conditions? or should I just do while(true)???
        while (channel.IsOpen &&
               Connection.IsOpen &&
               consumer.IsRunning)
        {
            try
            {
                BasicDeliverEventArgs item;
                if (consumer.Queue.Dequeue(Timeout, out item))
                {
                    string message = System.Text.Encoding.UTF8.GetString(item.Body);
                    DoSomethingMethod(message);
                    channel.BasicAck(item.DeliveryTag, false);
                }
            }
            catch (EndOfStreamException ex)
            {
                // this is likely due to some connection issue -- what am I to do?
            }
            catch (Exception ex)
            {
                // should never happen, but lets say my DoSomethingMethod(message); throws an exception
                // presumably, I'll just log the error and keep on going
            }
        }
    }
}

public IConnection Connection
{
    get
    {
        if (_connection == null) // _connection defined in class -- private static IConnection _connection;
        {
            _connection = CreateConnection();
        }
        return _connection;
    }
}

private IConnection CreateConnection()
{
    ConnectionFactory factory = new ConnectionFactory()
    {
        HostName = "RabbitMqHostName",
        UserName = "RabbitMqUserName",
        Password = "RabbitMqPassword",
    };
    // why do we need to set this explicitly? shouldn't this be the default?
    factory.AutomaticRecoveryEnabled = true;
    // what is a good value to use?
    factory.NetworkRecoveryInterval = TimeSpan.FromSeconds(5);
    // what is a good value to use? How is this different from NetworkRecoveryInterval?
    factory.RequestedHeartbeat = 5;
    IConnection connection = factory.CreateConnection();
    return connection;
}
RabbitMQ features
The documentation on RabbitMQ's site is actually really good. If you want to recover queues, exchanges and consumers, you're looking for topology recovery, which is enabled by default. Automatic Recovery (which is enabled by default) includes:
Reconnect
Restore connection listeners
Re-open channels
Restore channel listeners
Restore channel basic.qos setting, publisher confirms and transaction settings
The NetworkRecoveryInterval is the amount of time before a retry on an automatic recovery is performed (defaults to 5s).
Heartbeat has another purpose, namely to identify dead TCP connections. There is more to read about that on RabbitMQ's site.
Code sample
Writing reliable code for recovery is tricky. The EndOfStreamException is (as you suspect) most likely due to network problems. If you use the management plugin, you can reproduce this by closing the connection from there and see that the exception is triggered. For production-like applications, you might want to have a set of brokers that you alternate between in case of connection failure. If you have several RabbitMQ brokers, you might also want to guard yourself against long-term server failure on one or more of the servers. You might want to implement error strategies, like requeuing the message, or using a dead letter exchange.
I've been thinking a bit about these things and have written a thin client, RawRabbit, that handles some of them. Maybe it could be something for you? If not, I would suggest that you change the QueueingBasicConsumer to an EventingBasicConsumer. It is event driven, rather than thread blocking.
var eventConsumer = new EventingBasicConsumer(channel);
eventConsumer.Received += (sender, args) =>
{
    var body = args.Body;
    eventConsumer.Model.BasicAck(args.DeliveryTag, false);
};
channel.BasicConsume(queue, false, eventConsumer);
If you have topology recovery activated, the consumer will be restored by the RabbitMQ Client and start receiving messages again.
For more granular control, hook up event handlers for ConsumerCancelled and Shutdown to detect connectivity problems and Registered to know when the consumer can be used again.
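As a rough sketch of how those handlers might be registered, assuming the 3.5.x-era .NET client from the question (later major versions may differ). The class name ConsumerDiagnostics, the method name CreateMonitoredConsumer and the Console logging are hypothetical and only there for illustration:

```csharp
using System;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

public static class ConsumerDiagnostics
{
    // Hypothetical helper: creates an event-driven consumer and logs its lifecycle events.
    public static EventingBasicConsumer CreateMonitoredConsumer(IModel channel, string queue)
    {
        var consumer = new EventingBasicConsumer(channel);

        consumer.Received += (sender, args) =>
        {
            // Process args.Body here, then acknowledge on the consumer's channel.
            consumer.Model.BasicAck(args.DeliveryTag, false);
        };

        // The consumer was cancelled (for example, because its queue was deleted).
        consumer.ConsumerCancelled += (sender, args) =>
            Console.WriteLine("Consumer cancelled.");

        // The channel or connection behind the consumer was shut down.
        consumer.Shutdown += (sender, args) =>
            Console.WriteLine("Consumer shut down: " + args.ReplyText);

        // The consumer is registered with the broker and can receive messages.
        consumer.Registered += (sender, args) =>
            Console.WriteLine("Consumer registered.");

        // false = manual acknowledgement, as in the question's code.
        channel.BasicConsume(queue, false, consumer);
        return consumer;
    }
}
```

In a Windows Service you would probably replace the Console calls with your own logging, and use the Shutdown handler to decide whether to wait for automatic recovery or to rebuild things yourself.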
Related
I'm trying to use ActiveMQ with an NMS (C#) consumer to get messages, do some processing and then send the contents to a web service via HttpClient.PostAsync(), all running within a Windows service (via Topshelf).
The downstream system I'm communicating with is extremely touchy and I'm using individual acknowledgement so that I can check the response and act accordingly by acknowledging or triggering a custom retry (i.e. not session.recover).
Since the downstream system is unreliable, I've been trying a few different ways to reduce the throughput of my consumer. I thought I'd be able to accomplish this by converting to be synchronous and using prefetch, but it doesn't appear to have worked.
My understanding is that with an async consumer the prefetch 'limit' will never be hit but using synchronous method the prefetch queue will only be eaten away as messages are acknowledged, meaning that I can tune my listener to pass messages at a rate which the downstream component can handle.
With a queue loaded with 100 messages, if I kick off my code using a listener (i.e. asynchronously), I can successfully log that 100 messages have been through.
When I change it to use consumer.Receive() (or ReceiveNoWait) then I never get a message.
Here is a snippet of what I'm trying for the synchronous consumer, with the async option included but commented out:
public Worker(LogWriter logger, ServiceConfiguration config, IConnectionFactory connectionFactory, IEndpointClient endpointClient)
{
    log = logger;
    configuration = config;
    this.endpointClient = endpointClient;
    connection = connectionFactory.CreateConnection();
    connection.RedeliveryPolicy = GetRedeliveryPolicy();
    connection.ExceptionListener += new ExceptionListener(OnException);
    session = connection.CreateSession(AcknowledgementMode.IndividualAcknowledge);
    queue = session.GetQueue(configuration.JmsConfig.SourceQueueName);
    consumer = session.CreateConsumer(queue);
    // Asynchronous
    //consumer.Listener += new MessageListener(OnMessage);
    // Synchronous
    var message = consumer.Receive(TimeSpan.FromSeconds(5));
    while (true)
    {
        if (!Equals(message, null))
        {
            OnMessage(message);
        }
    }
}

public void OnMessage(IMessage message)
{
    log.DebugFormat("Message {count} Received. Attempt:{attempt}", message.Properties.GetInt("count"), message.Properties.GetInt("NMSXDeliveryCount"));
    message.Acknowledge();
}
I believe you need to call Start() on your connection, e.g.:
connection.Start();
Calling Start() indicates that you want messages to flow.
It's also worth noting that there's no way to break out of your while(true) loop aside from throwing an exception from OnMessage.
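To illustrate the loop issue, here is a minimal sketch of a polling loop that receives inside the loop and can be stopped from outside. It is deliberately library-free: the receive delegate stands in for consumer.Receive(timeout), which only yields messages once connection.Start() has been called, and ReceiveLoop.Run and its parameters are hypothetical names rather than part of NMS:

```csharp
using System;
using System.Threading;

public static class ReceiveLoop
{
    // Polls for messages until cancellation is requested; returns how many were handled.
    // 'receive' stands in for consumer.Receive(timeout), which returns null on timeout.
    public static int Run<T>(Func<TimeSpan, T> receive, Action<T> handle,
                             TimeSpan timeout, CancellationToken token) where T : class
    {
        int handled = 0;
        while (!token.IsCancellationRequested)
        {
            // Receive inside the loop so that each iteration sees a fresh message.
            T message = receive(timeout);
            if (message == null)
            {
                continue; // timed out; re-check the token and poll again
            }
            handle(message);
            handled++;
        }
        return handled;
    }
}
```

With NMS this would be called with something like ReceiveLoop.Run<IMessage>(consumer.Receive, OnMessage, TimeSpan.FromSeconds(5), token), preferably from a worker thread rather than from the constructor, so that the service can start and stop cleanly.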
I have a rabbitmq setup with a producer and many consumers.
What would the best practice way to tell the consumers that the producer isn't able to send due to crash or some other failure?
In case of a failure in the producer I'd like to notify and show a fitting message to all consumers.
There is no automatic way to do that. In general, messaging systems are designed to decouple producers from consumers; the basic idea is that the consumers don't know anything about the producers.
That said, you should handle producer crashes yourself, and perhaps adopt mechanisms such as publisher confirms if you want more control over your producers.
I know it's kind of late to answer your question, but here is my method for letting the consumers know that the producer is alive: you can publish a ping message to RabbitMQ every X seconds, for example from inside a Task.
This solution works with ACKs coming back from RMQ, and it works when you have a large number of messages coming in. Using ACKs does not affect your performance.
For example, in C#:
...
m_mainTimer = new System.Timers.Timer();
m_mainTimer.Interval = 10000; // every 10 secs
m_mainTimer.Elapsed += m_mainTimer_Elapsed;
m_mainTimer.AutoReset = false; // makes it fire only once
m_mainTimer.Start(); // Start
...
void m_mainTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e){
    try {
        // send to RMQ
        sendMessageToRabbitMQ("PING", "error");
        m_timerTaskSuccess = true;
    } catch (Exception ex) {
        m_timerTaskSuccess = false;
    } finally {
        if (m_timerTaskSuccess) {
            m_mainTimer.Start();
        }
    }
}
The actual message in RMQ:
{
"Message": "PING",
"Timestamp": 1620303014184
}
If you don't get this message in less than 11 seconds you know there is a problem.
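On the consumer side, that check could look roughly like the sketch below. PingWatchdog and its members are hypothetical names; the 11-second limit comes from the interval above plus a little slack, and the current time is passed in explicitly to keep the logic easy to test:

```csharp
using System;

public sealed class PingWatchdog
{
    private readonly TimeSpan _maxSilence;
    private readonly object _gate = new object();
    private DateTime _lastPingUtc;

    public PingWatchdog(TimeSpan maxSilence, DateTime startedUtc)
    {
        _maxSilence = maxSilence;
        _lastPingUtc = startedUtc;
    }

    // Call from the consumer's message handler whenever a "PING" arrives.
    public void RecordPing(DateTime nowUtc)
    {
        lock (_gate) { _lastPingUtc = nowUtc; }
    }

    // True while the most recent ping is no older than the allowed silence.
    public bool IsProducerAlive(DateTime nowUtc)
    {
        lock (_gate) { return nowUtc - _lastPingUtc <= _maxSilence; }
    }
}
```

A consumer would create it with TimeSpan.FromSeconds(11), call RecordPing(DateTime.UtcNow) for every ping it receives, and poll IsProducerAlive(DateTime.UtcNow) from a timer to decide when to show the "producer is down" message.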
I hope it helps the others as well.
I have an application that uses WebSphere MQ to send data through WebSphere to a datacentre in the Cloud. Part of the functionality is that if the server-side subscriber detects that a message has not been received for 30 minutes, the thread is paused for 5 minutes, and the connection is removed. When it restarts, it reconnects.
In practice, I've found that disconnecting has not removed the subscription. When attempting to reconnect, I see this error:
"There may have been a problem creating the subscription due to it being used by another message consumer.
Make sure any message consumers using this subscription are closed before trying to create a new subscription under the same name. Please see the linked exception for more information."
This shows the message handler is still connected, meaning disconnect has failed. Disconnect code for the XmsClient object (part of the library, although one of my colleagues might have changed it) is:
public override void Disconnect()
{
    _producer.Close();
    _producer.Dispose();
    _producer = null;
    _consumer.MessageListener = null;
    _consumer.Close();
    _consumer.Dispose();
    _consumer = null;
    _sessionRead.Close();
    _sessionRead.Dispose();
    _sessionRead = null;
    _sessionWrite.Close();
    _sessionWrite.Dispose();
    _sessionWrite = null;
    _connection.Stop();
    _connection.Close();
    _connection.Dispose();
    _connection = null;
    //GC.Collect();
    IsConnected = false;
}
Anyone have any thoughts as to why the connection still exists?
From the error description, it looks like the server-side subscriber is creating a durable subscription. A durable subscription continues to receive messages even when the subscribing application is not running. To remove a durable subscription you must call Session.Unsubscribe(). Simply closing the consumer does not remove the subscription.
If your intention was to close a subscriber without removing the subscription, then call Connection.Stop() first, then deregister the message listener, and then close the consumer. Calling Connection.Stop() stops message delivery.
I've got a project called DotRas on CodePlex that exposes a component called RasConnectionWatcher, which uses the RasConnectionNotification Win32 API to receive notifications when connections on a machine change. One of my users recently brought to my attention that if the machine comes out of sleep mode and attempts to redial the connection, the connection goes into a loop indicating the connection is already being dialed even though it isn't. This loop will not end until the application is restarted, even if the dial is done through a synchronous call in which all values on the structs are unique for that specific call and none of them is retained once the call completes.
I've done as much as I can to fix the problem, but I fear the problem is something I've done with the RasConnectionNotification API and using ThreadPool.RegisterWaitForSingleObject which might be blocking something else in Windows.
The below method is used to register 1 of the 4 change types the API supports, and the handle to associate with it to monitor. During runtime, the below method would be called 4 times during initialization to register all 4 change types.
private void Register(NativeMethods.RASCN changeType, RasHandle handle)
{
    AutoResetEvent waitObject = new AutoResetEvent(false);
    int ret = SafeNativeMethods.Instance.RegisterConnectionNotification(handle, waitObject.SafeWaitHandle, changeType);
    if (ret == NativeMethods.SUCCESS)
    {
        RasConnectionWatcherStateObject stateObject = new RasConnectionWatcherStateObject(changeType);
        stateObject.WaitObject = waitObject;
        stateObject.WaitHandle = ThreadPool.RegisterWaitForSingleObject(waitObject, new WaitOrTimerCallback(this.ConnectionStateChanged), stateObject, Timeout.Infinite, false);
        this._stateObjects.Add(stateObject);
    }
}
The event passed into the API gets signaled when Windows detects a change in the connections on the machine. The callback used just takes the change type registered from the state object and then processes it to determine exactly what changed.
private void ConnectionStateChanged(object obj, bool timedOut)
{
lock (this.lockObject)
{
if (this.EnableRaisingEvents)
{
try
{
// Retrieve the active connections to compare against the last state that was checked.
ReadOnlyCollection<RasConnection> connections = RasConnection.GetActiveConnections();
RasConnection connection = null;
switch (((RasConnectionWatcherStateObject)obj).ChangeType)
{
case NativeMethods.RASCN.Disconnection:
connection = FindEntry(this._lastState, connections);
if (connection != null)
{
this.OnDisconnected(new RasConnectionEventArgs(connection));
}
if (this.Handle != null)
{
// The handle that was being monitored has been disconnected.
this.Handle = null;
}
this._lastState = connections;
break;
}
}
catch (Exception ex)
{
this.OnError(new System.IO.ErrorEventArgs(ex));
}
}
}
}
}
Everything works perfectly, other than when the machine comes out of sleep. Now the strange thing is when this happens, if a MessageBox is displayed (even for 1 ms and closed by using SendMessage) it will work. I can only imagine something I've done is blocking something else in Windows so that it can't continue processing while the event is being processed by the component.
I've stripped down a lot of the code here, the full source can be found at:
http://dotras.codeplex.com/SourceControl/changeset/view/68525#1344960
I've come for help from people much smarter than myself, I'm outside of my comfort zone trying to fix this problem, any assistance would be greatly appreciated!
Thanks! - Jeff
After a lot of effort, I tracked down the problem. Thankfully it wasn't a blocking issue in Windows.
For those curious, basically once the machine came out of sleep the developer was attempting to immediately dial a connection (via the Disconnected event). Since the network interfaces hadn't finished initializing, an error was returned and the connection handle was not being closed. Any attempts to close the connection would throw an error indicating the connection was already closed, even though it wasn't. Since the handle was left open, any subsequent attempts to dial the connection would cause an actual error.
I just had to make an adjustment in the HangUp code to hide the error thrown when a connection is closed that has already been closed.
We're using TIBCO EMS from our ASP.NET 3.5 app for one interface to an external system, and it appears to be working just fine - except that the guys running the other side tell us we're racking up connections like crazy and never closing them....
What I'm doing is routing all TIBCO traffic through a single class with static member variables for both the TIBCO ConnectionFactory and the Connection itself, having been told that constructing them is pretty resource- and time-intensive:
private static ConnectionFactory Factory
{
    get
    {
        if (HttpContext.Current.Application["EMSConnectionFactory"] == null)
        {
            ConnectionFactory connectionFactory = CreateTibcoFactory();
            HttpContext.Current.Application["EMSConnectionFactory"] = connectionFactory;
        }
        return HttpContext.Current.Application["EMSConnectionFactory"] as ConnectionFactory;
    }
}

private static Connection EMSConnection
{
    get
    {
        if (HttpContext.Current.Application["EMSConnection"] == null)
        {
            Connection connection = Factory.CreateConnection(*username*, *password*);
            connection.ExceptionHandler += new EMSExceptionHandler(TibcoConnectionExceptionHandler);
            connection.Start();
            HttpContext.Current.Application["EMSConnection"] = connection;
        }
        return HttpContext.Current.Application["EMSConnection"] as Connection;
    }
}
Now my trouble is: where and how could I
tell the TIBCO connection to "auto-close" when no longer needed (like with the SqlConnection)
close the TIBCO connection on an error
close the TIBCO connection before our ASP.NET app finishes (or the user logs off)
I don't really seem to find much useful information on how to use TIBCO EMS from the C# / .NET world...... any takers?? Thanks!!
Firstly, I don't understand how you could be running out of connections. Since you're storing the connection in the application, you should only have a single connection for the entire IIS application.
That put aside, I would do the following:
When the connection is retrieved, create the connection as you do now;
After you've created the connection, spin up a background thread;
Set a DateTime to DateTime.Now;
Let the background thread check (e.g. every second or every 10 seconds) what the difference is between the date you've set and DateTime.Now. If that's longer than a specific timeout, kill the connection and set Application["EMSConnection"] to null;
When the background thread kills the connection, close the background thread;
Every time the connection gets requested, reset the DateTime to DateTime.Now.
This way, the connections should be closed automatically.
Note that you will have to introduce locking. You can use Application.Lock() and Application.Unlock() for this.
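The steps above can be sketched as follows. This is only an outline under a few assumptions: IdleClosingConnection and its members are hypothetical names, a private lock object is swapped in for Application.Lock()/Application.Unlock(), the create and close delegates stand in for the TIBCO calls, and the periodic check is left to whatever timer or background thread you choose:

```csharp
using System;

public sealed class IdleClosingConnection<T> where T : class
{
    private readonly Func<T> _create;
    private readonly Action<T> _close;
    private readonly TimeSpan _idleTimeout;
    private readonly object _gate = new object();
    private T _connection;
    private DateTime _lastUsedUtc;

    public IdleClosingConnection(Func<T> create, Action<T> close, TimeSpan idleTimeout)
    {
        _create = create;
        _close = close;
        _idleTimeout = idleTimeout;
    }

    // Returns the shared connection, creating it on first use, and resets the idle clock.
    public T Get(DateTime nowUtc)
    {
        lock (_gate)
        {
            if (_connection == null) { _connection = _create(); }
            _lastUsedUtc = nowUtc;
            return _connection;
        }
    }

    // Call periodically from the background thread or a timer.
    // Closes the connection once it has been idle for the timeout; returns true if it did.
    public bool CloseIfIdle(DateTime nowUtc)
    {
        lock (_gate)
        {
            if (_connection == null || nowUtc - _lastUsedUtc < _idleTimeout) { return false; }
            _close(_connection);
            _connection = null;
            return true;
        }
    }
}
```

Note that a caller may still be holding a connection it obtained just before the timeout fired, so the timeout should be comfortably longer than any single request.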
Concerning closing on an error: I see that you've attached an exception handler to the connection instance. Can't you close the connection with that?