We have a .NET Framework 4.5.2 service that connects to a WebSphere MQ server (v7.5.0.9). The service needs to connect to a Queue and put a message; it doesn't need to receive anything afterwards. We have this set up in both a Test and a Production environment, and it has been running for a while without any issues. Now we are facing an error, but only in Production. The same code works fine in Test, while Production shows very inconsistent results, and we are unable to recreate the issue anywhere else.
The only way we are currently able to get it working is by restarting the .NET service, sometimes multiple times, until it manages to connect to all Queue Managers. Every restart gives a different result: one time the service cannot connect to any of the Queue Managers, and after the next restart 2 of them connect. Once a connection has been made, it is stable: the service can put messages on any of the Queues without ever disconnecting.
Some of the things we have tried:
Before this issue, we were using the SYSTEM.DEF.SVRCONN channel to connect to the Queue Managers, but we have since switched to a "Server Connection" channel that we created on each Queue Manager. The new channels show as Active, but only when the service manages to make the initial connection.
Originally we connected to a Queue Manager, put a message, and closed the Queue, but left the Queue Manager connection open. We tried closing and disconnecting the Queue Manager after every message, but that seemed to make things worse.
The .NET service and WebSphere MQ run on the same box, but we still tried disabling the Windows Firewall on the server in case something was blocking the connection. That didn't make a difference either.
My background is in .NET, so I'm not very familiar with the WebSphere MQ UI, and even less with the CLI. Any ideas on where to look, or which commands to run to get some insight into what is going on, would be helpful.
The only error we get in WebSphere is "CompCode: 2, Reason: 2009", and the exception we catch in the service says "Error Message: MQRC_CONNECTION_BROKEN" (reason code 2009 is MQRC_CONNECTION_BROKEN, so both report the same failure).
Below is the code used to connect and send a message. We are using amqmdnet.dll (the IBM MQ classes for .NET):
```csharp
try
{
    properties = new Hashtable();
    properties.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED);
    properties.Add(MQC.HOST_NAME_PROPERTY, hostName);
    properties.Add(MQC.PORT_PROPERTY, port);
    properties.Add(MQC.CHANNEL_PROPERTY, channelName);

    if (!QueueManagers.ContainsKey(queueManagerName))
    {
        queueManager = new MQQueueManager(queueManagerName, properties);
        QueueManagers[queueManagerName] = queueManager;
    }
    else
    {
        queueManager = QueueManagers[queueManagerName];
        if (!queueManager.IsConnected)
        {
            queueManager = new MQQueueManager(queueManagerName, properties);
            QueueManagers[queueManagerName] = queueManager;
        }
    }

    queue = queueManager.AccessQueue(queueName, MQC.MQOO_OUTPUT + MQC.MQOO_FAIL_IF_QUIESCING);
    message = new MQMessage();
    message.ClearMessage();
    message.Format = MQC.MQFMT_STRING;
    message.Encoding = MQC.MQENC_NATIVE;
    message.CorrelationId = MQC.MQCI_NONE;
    message.CharacterSet = MQC.MQCCSI_Q_MGR;
    message.WriteString(messageString);
    queue.Put(message);
}
catch (Exception ex)
{
    sentToMQServer = false;
    QueueManagers.TryRemove(queueManagerName, out var mgr);
    queueManager?.Close();
    queueManager?.Disconnect();
    if (retry)
        SendToMQServer(remoteClient, Message, false);
}
finally
{
    message = null;
    //QueueManagers.TryRemove(queueManagerName, out var mgr);
    if (properties != null)
    {
        properties.Clear();
        properties = null;
    }
    if (queue != null)
    {
        queue.Close();
        queue = null;
    }
    //queueManager.Close();
    //queueManager.Disconnect();
}
```
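One way to avoid restarting the whole service is to retry the connect in-process with a growing delay. The sketch below is a generic helper under that assumption; `Retry.WithBackoff` is a hypothetical name, not part of amqmdnet.dll, and it does not address whatever makes the first attempt fail in Production.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of amqmdnet.dll): retries an operation with an
// exponentially growing delay and rethrows the last exception if every attempt fails.
public static class Retry
{
    public static T WithBackoff<T>(Func<T> operation, int maxAttempts, TimeSpan initialDelay,
                                   Action<int, Exception> onFailure = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts)
            {
                // Not the last attempt: report the failure, wait, then double the delay.
                // On the last attempt the filter is false, so the exception propagates.
                onFailure?.Invoke(attempt, ex);
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

In the code above, the `new MQQueueManager(queueManagerName, properties)` calls could be wrapped as `Retry.WithBackoff(() => new MQQueueManager(queueManagerName, properties), 5, TimeSpan.FromSeconds(1))`. Because the delay doubles after each failure, five attempts wait 1 + 2 + 4 + 8 = 15 seconds in total before the last exception is rethrown.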
Related
I have a real head-scratcher here (for me, at least).
I have the following setup:
Kubernetes cluster in Azure (Linux VMs)
ASP.NET Docker image with a TCP server
Software simulating TCP clients
RabbitMQ for notifications about incoming messages
Peer behaviour:
The client sends its heartbeat every 10 minutes
The server sends a keep-alive every 5 minutes (nginx-ingress kills connections after being idle for ~10 minutes)
I am testing the performance of my new TCP server. The previous one, written in Java, could easily handle the load described below. For some reason, the new TCP server, written in C#, loses its connections after about 10-15 minutes.
Here is what I do:
Use the simulator to start 500 clients with a ramp-up of 300s
All connections are established correctly
Most of the time, the first heartbeats and keep-alives are sent and received
After 10+ minutes, I receive 0 bytes from Stream.EndRead() on BOTH ends of the connection.
This is the piece of code that is triggering the error.
```csharp
var numberOfBytesRead = Stream.EndRead(result);
if (numberOfBytesRead == 0)
{
    This.Close("no bytes read").Sync(); //this is where I end up
    return;
}
```
In my server-side logging, I see lots of disconnect ('no bytes read') lines and many exceptions indicating that RabbitMQ is too busy: "None of the specified endpoints were reachable".
My guesses are that the Azure Load Balancer simply drops the connections (although that does not happen with the Java TCP server), or that the ASP.NET environment is missing some configuration.
Does anyone know why this is happening and, more importantly, how to fix it?
--UPDATE #1--
I just used 250 devices and that worked perfectly.
When I halved the ramp-up time, the problem came back. So this seems to be a performance issue: some component in my chain is too busy.
--UPDATE #2--
I disabled publishing to RabbitMQ, and now it keeps working. Next, I have to fix the RabbitMQ performance.
I ended up processing the incoming data in a new Task.
This is my code now:
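```csharp
public void ReceiveAsyncLoop(IAsyncResult? result = null)
{
    try
    {
        if (result != null)
        {
            var numberOfBytesRead = Stream.EndRead(result);
            if (numberOfBytesRead == 0)
            {
                This.Close("no bytes read").Sync();
                return;
            }

            // Copy the bytes: Buffer is reused by the BeginRead below, so a segment that
            // points into it could be overwritten before the background task reads it.
            var data = new byte[numberOfBytesRead];
            Array.Copy(Buffer.Array!, Buffer.Offset, data, 0, numberOfBytesRead);
            var newSegment = new ArraySegment<byte>(data);

            // This.OnDataReceived(newSegment)); <-- previously this
            // Note: separate tasks may run concurrently, so chunks are not guaranteed
            // to be processed in the order they were read.
            Task.Run(() => This.OnDataReceived(newSegment));
        }

        Stream.BeginRead(Buffer.Array!, Buffer.Offset, Buffer.Count, ReadingClient.ReceiveAsyncLoop, null);
    }
    catch (ObjectDisposedException) { /*ILB*/ }
    catch (Exception ex)
    {
        Log.Exception(ex, $"000001: {ex.Message}");
    }
}
```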
Now, everything is super fast.
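Running each chunk through Task.Run keeps the read loop fast, but two things are worth checking: the original ArraySegment points into the shared read buffer, which the next BeginRead may overwrite, and separate tasks may process chunks concurrently or out of order, which matters if a message can span two reads. A hedged alternative, assuming .NET Core 3.0 or later for System.Threading.Channels and `await foreach`, is to copy each chunk into a channel drained by a single background task. `ReceivePipeline` and its members are hypothetical names.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper: decouples the socket read loop from slow processing
// (e.g. publishing to RabbitMQ) while keeping chunks in arrival order.
public sealed class ReceivePipeline
{
    private readonly Channel<byte[]> _chunks =
        Channel.CreateUnbounded<byte[]>(new UnboundedChannelOptions { SingleReader = true });
    private readonly Func<byte[], Task> _onDataReceived;
    private readonly Action<Exception> _onError;
    private readonly Task _consumer;

    public ReceivePipeline(Func<byte[], Task> onDataReceived, Action<Exception> onError)
    {
        _onDataReceived = onDataReceived;
        _onError = onError;
        _consumer = Task.Run(() => ConsumeAsync());
    }

    // Call this from the read callback. The bytes are copied because the
    // read buffer is handed straight back to the next BeginRead.
    public bool Post(byte[] buffer, int offset, int count)
    {
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        return _chunks.Writer.TryWrite(copy);
    }

    // Stops accepting data and waits until every queued chunk has been processed.
    public Task CompleteAsync()
    {
        _chunks.Writer.TryComplete();
        return _consumer;
    }

    // A single consumer processes chunks one at a time, in the order they were posted.
    private async Task ConsumeAsync()
    {
        await foreach (byte[] chunk in _chunks.Reader.ReadAllAsync())
        {
            try
            {
                await _onDataReceived(chunk);
            }
            catch (Exception ex)
            {
                // Report and keep going, so one bad chunk does not stop the pipeline.
                _onError(ex);
            }
        }
    }
}
```

In ReceiveAsyncLoop, the Task.Run line would become something like `pipeline.Post(Buffer.Array!, Buffer.Offset, numberOfBytesRead);`. The channel is unbounded, so if RabbitMQ stays slow, memory grows; `Channel.CreateBounded` (with `WriteAsync` in place of `TryWrite`) would apply back-pressure instead.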
I have a bit of a problem. I have developed a C# Xamarin app for Android, which is a client, and in parallel a server in another language. They communicate with each other over TCP/IP sockets.
The Android app is, broadly speaking, a geolocation app. Because of the nature of mobile networks, I have implemented a way to detect whether there is a connection to the server and, if not, to reconnect automatically. When an operation on a socket fails, the app is put straight into offline mode.
During this time, everything is supposed to keep working: the main activity is a map that follows the user.
If I make the server crash, it works as expected: the app tries to reconnect every so often, the map keeps updating in the meantime (the angle changes depending on the bearing, etc.), a "no connection" button appears, and when the server is back online, it reconnects.
But when I lose the network or put my phone in airplane mode, the app freezes.
There is no exception. I've put breakpoints everywhere, to no effect. I have no idea what is going on.
On top of that, the Android system itself seems to freeze. To be honest, I don't really know how Android carries out what my app asks of it, or what would cause this freeze. Most of the time I have to either reboot the phone or wait a very long time. The bearing isn't updated, and the map doesn't move the way it does when the server crashes.
I have tried setting a timeout on my socket. I've tried using other properties of the socket to detect that it is disconnected. I've put try/catch blocks around every use of the socket, forcing a disconnect whenever an operation fails.
I can't really show entire activities, as the project has several thousand lines. I can, however, show how I determine the connection status and how I connect/reconnect my socket.
How I check connection status:
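```csharp
public bool checkConnectionStatus()
{
    bool part1;
    bool part2;
    try
    {
        // Poll with SelectRead returns true if data is waiting OR the connection has been
        // closed/reset (timeout is in microseconds: 1000 µs = 1 ms).
        part1 = sock.Poll(1000, SelectMode.SelectRead);
        // "Readable" with nothing to read means the peer closed the connection.
        // If the network vanishes without a FIN/RST reaching the device, Poll keeps
        // returning false, so this check still reports "connected".
        part2 = (sock.Available == 0);
    }
    catch
    {
        connected = false;
        return false;
    }

    if (part1 && part2)
    {
        connected = false;
        return false;
    }
    else
    {
        connected = true;
        return true;
    }
}
```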
This is called to start the socket for the first time, and also whenever it's disconnected and I want it to reconnect:
```csharp
public void socketStartup()
{
    socketStartupLocked = true;
    ip = IPAddress.Parse(ipstring);
    ipe = new IPEndPoint(ip, port);
    try
    {
        sock = new Socket(AddressFamily.InterNetwork, System.Net.Sockets.SocketType.Stream, ProtocolType.Tcp);
    }
    catch
    {
        connected = false;
        socketStartupLocked = false;
        return;
    }
    try
    {
        // Socket.Connect is synchronous: it blocks the calling thread until it succeeds or fails.
        sock.Connect(ipe);
        connected = true;
    }
    catch
    {
        connected = false;
        socketStartupLocked = false;
        return;
    }
    socketStartupLocked = false;
}
```
I expect my app to behave when there's no network exactly as it does when the server is down.
You should probably only check the server/socket connection after checking the device's network connectivity.
For reference on how to do that:
https://github.com/xamarin/docs-archive/tree/master/Recipes/android/networking/detect-network-connection
And remember to add the following permission to your manifest:
```xml
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
```
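Separately from the connectivity check, socketStartup calls the blocking Socket.Connect; if it runs on the UI thread, the map cannot update until that call returns. A hedged sketch, assuming the connect can be moved off the UI thread and a .NET Standard 2.0-compatible Xamarin.Android version (for the Task-returning `ConnectAsync`), bounds the wait with Task.WhenAny and gives up after a timeout. `SocketConnector.TryConnectAsync` is a hypothetical helper, not part of Xamarin.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper: attempts a TCP connection without waiting longer than
// `timeout`. Returns the connected socket, or null on failure or timeout.
public static class SocketConnector
{
    public static async Task<Socket> TryConnectAsync(IPEndPoint endPoint, TimeSpan timeout)
    {
        var socket = new Socket(endPoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            Task connectTask = socket.ConnectAsync(endPoint);
            if (await Task.WhenAny(connectTask, Task.Delay(timeout)).ConfigureAwait(false) == connectTask)
            {
                await connectTask.ConfigureAwait(false); // rethrows SocketException if the connect failed
                return socket;
            }
            // Timed out: observe the pending task's eventual exception so it is not left unobserved.
            _ = connectTask.ContinueWith(t => t.Exception, TaskContinuationOptions.OnlyOnFaulted);
        }
        catch (SocketException)
        {
            // Connection refused, network unreachable, etc.
        }

        socket.Dispose(); // also aborts a connect that is still pending
        return null;
    }
}
```

From the reconnect logic, this could be used as `sock = await SocketConnector.TryConnectAsync(ipe, TimeSpan.FromSeconds(5)); connected = sock != null;`, so a missing network costs at most the timeout instead of an open-ended block.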
My .NET code can connect and put a message on a remote queue successfully. However, the same code does not work with a local queue: it throws a 2085 error. What setting needs to change in the code to make it work with a local queue?
Here is my code:
```csharp
Hashtable queueProperties = new Hashtable();
queueProperties[MQC.HOST_NAME_PROPERTY] = "10.x.x.x";
queueProperties[MQC.PORT_PROPERTY] = 1451;
queueProperties[MQC.CHANNEL_PROPERTY] = "TST1.TRADE.CHANNEL";

try
{
    // Attempt the connection
    queueManager = new MQQueueManager("MYQUEUEMANAGER", queueProperties);
    strReturn = "Connected Successfully";
}
catch (MQException mexc)
{
    // TODO: Setup other exception handling
    throw new Exception(mexc.Message
        + " ReasonCode: " + mexc.ReasonCode
        + "\n" + GetReason(mexc.ReasonCode), mexc);
}
```
Here, the code internally uses the IIS user ID (the application pool user) to connect to MQ, because it runs as part of a WCF service.
If you run the mqrc utility, you can find out what the error code translates to:
```
$ mqrc 2085
2085 0x00000825 MQRC_UNKNOWN_OBJECT_NAME
```
This means the queue name you are attempting to open does not exist on the queue manager you are connected to.
I noted that the source you posted does not include any code related to opening the queue. You should check that the queue name you are attempting to open does in fact exist on the queue manager you are connecting to.
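The question's code calls a GetReason helper that is not shown. A minimal sketch of such a lookup, assuming you only need readable names for a few common codes, can mirror the mqrc output; `MqReasonCodes.Describe` is hypothetical, and for any other code it falls back to suggesting mqrc.

```csharp
using System.Collections.Generic;

// Hypothetical helper: a small subset of what `mqrc` reports, for logging a
// readable name next to the numeric MQException.ReasonCode.
public static class MqReasonCodes
{
    private static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        [2009] = "MQRC_CONNECTION_BROKEN",
        [2035] = "MQRC_NOT_AUTHORIZED",
        [2059] = "MQRC_Q_MGR_NOT_AVAILABLE",
        [2085] = "MQRC_UNKNOWN_OBJECT_NAME",
    };

    // Formats the code in decimal and as 8-digit hex, like mqrc does.
    public static string Describe(int reasonCode) =>
        Names.TryGetValue(reasonCode, out var name)
            ? $"{reasonCode} (0x{reasonCode:X8}) {name}"
            : $"{reasonCode} (0x{reasonCode:X8}) unknown; run: mqrc {reasonCode}";
}
```

For example, `MqReasonCodes.Describe(2085)` returns `2085 (0x00000825) MQRC_UNKNOWN_OBJECT_NAME`, matching the mqrc output above.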
I am creating a consumer that runs in an infinite loop to read messages from the queue. I am looking for advice/sample code on how to recover and continue within my infinite loop even if there are network disruptions. The consumer has to stay running, as it will be installed as a Windows Service.
1) Can someone please explain how to properly use these settings? What is the difference between them?
NetworkRecoveryInterval
AutomaticRecoveryEnabled
RequestedHeartbeat
2) Please see my current sample code for the consumer below. I am using the .NET RabbitMQ Client v3.5.6.
How will the above settings do the "recovery" for me?
e.g., will consumer.Queue.Dequeue block until it is recovered?
That doesn't seem right, so...
Do I have to code for this manually? e.g., will consumer.Queue.Dequeue throw an exception that I have to detect, and then manually re-create my connection, channel, and consumer? Or just the consumer, as "AutomaticRecovery" will recover the channel for me?
Does this mean I should move the consumer creation inside the while loop? What about the channel creation? And the connection creation?
3) Assuming I have to do some of this recovery manually, are there event callbacks that tell me about network problems, and how do I register for them?
Thanks!
```csharp
public void StartConsumer(string queue)
{
    using (IModel channel = this.Connection.CreateModel())
    {
        var consumer = new QueueingBasicConsumer(channel);
        const bool noAck = false;
        channel.BasicConsume(queue, noAck, consumer);

        // do I need these conditions? or should I just do while(true)???
        while (channel.IsOpen &&
               Connection.IsOpen &&
               consumer.IsRunning)
        {
            try
            {
                BasicDeliverEventArgs item;
                if (consumer.Queue.Dequeue(Timeout, out item))
                {
                    string message = System.Text.Encoding.UTF8.GetString(item.Body);
                    DoSomethingMethod(message);
                    channel.BasicAck(item.DeliveryTag, false);
                }
            }
            catch (EndOfStreamException ex)
            {
                // this is likely due to some connection issue -- what am I to do?
            }
            catch (Exception ex)
            {
                // should never happen, but let's say my DoSomethingMethod(message); throws an exception
                // presumably, I'll just log the error and keep on going
            }
        }
    }
}

public IConnection Connection
{
    get
    {
        if (_connection == null) // _connection defined in class -- private static IConnection _connection;
        {
            _connection = CreateConnection();
        }
        return _connection;
    }
}

private IConnection CreateConnection()
{
    ConnectionFactory factory = new ConnectionFactory()
    {
        HostName = "RabbitMqHostName",
        UserName = "RabbitMqUserName",
        Password = "RabbitMqPassword",
    };
    // why do we need to set this explicitly? shouldn't this be the default?
    factory.AutomaticRecoveryEnabled = true;
    // what is a good value to use?
    factory.NetworkRecoveryInterval = TimeSpan.FromSeconds(5);
    // what is a good value to use? How is this different from NetworkRecoveryInterval?
    factory.RequestedHeartbeat = 5;
    IConnection connection = factory.CreateConnection();
    return connection;
}
```
RabbitMQ features
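The documentation on RabbitMQ's site is actually really good. If you want queues, exchanges, and consumers to be recovered, you're looking for topology recovery, which is enabled by default but only takes effect when automatic recovery is on. Automatic recovery (which must be enabled explicitly with AutomaticRecoveryEnabled = true in the 3.x .NET client used here; later major versions turn it on by default) includes: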
Reconnect
Restore connection listeners
Re-open channels
Restore channel listeners
Restore channel basic.qos setting, publisher confirms and transaction settings
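NetworkRecoveryInterval is the amount of time the client waits before retrying an automatic recovery (it defaults to 5 seconds).
RequestedHeartbeat has another purpose, namely to detect dead TCP connections: it is the heartbeat timeout, in seconds, that the client asks the broker for. In short, the heartbeat determines how quickly a dead connection is noticed, while NetworkRecoveryInterval determines how long the client waits between reconnection attempts. There is more to read about heartbeats on RabbitMQ's site.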
Code sample
Writing reliable recovery code is tricky. The EndOfStreamException is (as you suspect) most likely caused by network problems. If you use the management plugin, you can reproduce it by closing the connection from there and watching the exception being thrown. For production-like applications, you might want a set of brokers that you alternate between in case of connection failure. If you have several RabbitMQ brokers, you might also want to guard against long-term failure of one or more of the servers. You might also want to implement error strategies, such as requeuing the message or using a dead-letter exchange.
I've been thinking a bit about these issues and have written a thin client, RawRabbit, that handles some of them. Maybe it could be something for you? If not, I would suggest that you change the QueueingBasicConsumer to an EventingBasicConsumer, which is event-driven rather than thread-blocking.
```csharp
var eventConsumer = new EventingBasicConsumer(channel);
eventConsumer.Received += (sender, args) =>
{
    var body = args.Body;
    eventConsumer.Model.BasicAck(args.DeliveryTag, false);
};
channel.BasicConsume(queue, false, eventConsumer);
```
If you have topology recovery activated, the consumer will be restored by the RabbitMQ Client and start receiving messages again.
For more granular control, hook up event handlers for ConsumerCancelled and Shutdown to detect connectivity problems, and for Registered to know when the consumer can be used again.
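If you would rather keep a manual safety net (the option raised in the question of re-creating the connection, channel, and consumer), one hedged sketch is to build all three inside a single "session" and let a supervisor loop restart that session whenever it throws. `Supervisor.RunAsync` is hypothetical, and the RabbitMQ-specific session body is not shown: it would create the connection and channel, start the consumer, and return or throw once a Shutdown event or an EndOfStreamException signals that the connection is gone.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for a Windows Service: keeps running `session` until `stop`
// is cancelled. Each session is expected to create its own connection, channel and
// consumer, and to return or throw when they become unusable.
public static class Supervisor
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> session,
        TimeSpan retryDelay,
        Action<Exception> onError,
        CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            try
            {
                // Normally runs until the connection fails or the service stops.
                await session(stop).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (stop.IsCancellationRequested)
            {
                break; // the service is stopping
            }
            catch (Exception ex)
            {
                onError(ex); // e.g. EndOfStreamException after a network drop
            }

            try
            {
                // Wait before building a fresh connection, channel and consumer.
                await Task.Delay(retryDelay, stop).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                break;
            }
        }
    }
}
```

Because the connection is created inside the session in this design, every restart begins from a fresh connection, channel, and consumer rather than reusing a broken one.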
I have an application that uses WebSphere MQ to send data through WebSphere to a datacentre in the Cloud. Part of the functionality is that if the server-side subscriber detects that no message has been received for 30 minutes, the thread is paused for 5 minutes and the connection is removed. When it restarts, it reconnects.
In practice, I've found that disconnecting does not remove the subscription. When attempting to reconnect, I see this error:
"There may have been a problem creating the subscription due to it being used by another message consumer.
Make sure any message consumers using this subscription are closed before trying to create a new subscription under the same name. Please see the linked exception for more information."
This shows the message handler is still connected, meaning the disconnect has failed. The Disconnect code for the XmsClient object (part of the library, although one of my colleagues might have changed it) is:
```csharp
public override void Disconnect()
{
    _producer.Close();
    _producer.Dispose();
    _producer = null;

    _consumer.MessageListener = null;
    _consumer.Close();
    _consumer.Dispose();
    _consumer = null;

    _sessionRead.Close();
    _sessionRead.Dispose();
    _sessionRead = null;

    _sessionWrite.Close();
    _sessionWrite.Dispose();
    _sessionWrite = null;

    _connection.Stop();
    _connection.Close();
    _connection.Dispose();
    _connection = null;

    //GC.Collect();
    IsConnected = false;
}
```
Does anyone have any thoughts as to why the connection still exists?
From the error description, it looks like the server-side subscriber is creating a durable subscription. A durable subscription continues to receive messages even when the subscribing application is not running. To remove a durable subscription, you must call Session.Unsubscribe(); simply closing the consumer does not remove the subscription.
If your intention is to close a subscriber without removing the subscription, call Connection.Stop() first (it stops message delivery), then deregister the message listener, and then close the consumer.