Detect Lost Connection In Publish And Change Host RabbitMQ - C#

I am developing a very simple app using RabbitMQ: one machine, multiple queues and exchanges, one publisher and one consumer. After reading further about clustering and HA, I connected a second machine to create a cluster and mirrored the queues so there is at least one replica. Now when I want to publish some data to a queue, I use the first machine as my host and it works fine, but if the RabbitMQ service on the first machine is not running my app crashes. My question is: how do I know which machine is up when creating the connection, and how do I change the host while publishing messages?
UPDATE: I use one of the CreateConnection overloads to pass all my hosts when creating a connection. OK, this solves the problem of finding an available machine to create the connection, but the second question is still there. Look at the code below:
for (int i = 0; i < 300; i++)
{
    var message = string.Format("Message #{0}: {1}", i, Guid.NewGuid());
    var messageBodyBytes = Encoding.UTF8.GetBytes(message);
    channel.BasicPublish(ExchangeName, "123456", null, messageBodyBytes);
}
These lines of code work perfectly while the connection is OK, but assume that in the middle of publishing messages to an exchange the service stops unexpectedly. In that case a System.IO.FileLoadException is raised first, and if I continue execution a RabbitMQ.Client.Exceptions.AlreadyClosedException is raised which says:
Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=320, text="CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", classId=0, methodId=0, cause=
I think there must be a way to change the host when the connection is closed while publishing messages, but how? No idea!

You must close the channel and the current connection and open a new one of each mid-loop; the new connection can then be made against a different host. You only have to do this when the exception is caught, not on every iteration of the loop.
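A minimal sketch of that approach, assuming "factory" is a ConnectionFactory and "hosts" is the list of cluster hostnames you already pass to CreateConnection (both names are placeholders for your own setup):
IConnection connection = factory.CreateConnection(hosts);
IModel channel = connection.CreateModel();
for (int i = 0; i < 300; i++)
{
    var message = string.Format("Message #{0}: {1}", i, Guid.NewGuid());
    var messageBodyBytes = Encoding.UTF8.GetBytes(message);
    try
    {
        channel.BasicPublish(ExchangeName, "123456", null, messageBodyBytes);
    }
    catch (RabbitMQ.Client.Exceptions.AlreadyClosedException)
    {
        // The broker we were talking to has gone away: discard the old objects
        // and let CreateConnection pick whichever host is still reachable.
        try { channel.Dispose(); connection.Dispose(); } catch { /* already gone */ }
        connection = factory.CreateConnection(hosts);
        channel = connection.CreateModel();
        channel.BasicPublish(ExchangeName, "123456", null, messageBodyBytes);
    }
}
Newer versions of the .NET client also offer factory.AutomaticRecoveryEnabled = true, which re-establishes connections and channels for you after a broker failure, although publishes attempted during the outage will still throw and need to be retried.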
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

Related

The format of the specified network name is not valid : HTTPListener error on system restart

I have implemented an HTTP listener in C# as a Windows service. The Windows service is set to start automatically when the machine is restarted. When I manually start the service after installing it, the HTTP listener works fine and responds to the requests it receives. But when the service is started on a system restart, I get the following error:
System.Net.HttpListenerException (0x80004005): The format of the specified network name is not valid
I get this error on listener.Start().
The code of http listener is like this:
HttpListener listener = new HttpListener();
listener.Prefixes.Add("http://myip:port/");
listener.Start();
I got a suggestion from this already asked question. If I follow what's given in the answer, it still doesn't work.
Furthermore, I tried running:
netsh http show iplisten
in PowerShell, and the list is empty. Even when the HTTP listener works (the first time I install the service and run it), the output of this command is an empty list. So I don't think this is the issue.
Any suggestions will be really helpful.
Answering my own question. It seems there are some other services that need to be running before an HTTP listener can be started, and these are not yet started by the time Windows starts my service. I found two solutions for this. One is to use delayed start:
sc.exe config myservicename start= delayed-auto
The other is to have a try/catch while starting the HTTP listener, and if it fails, try again after a few seconds. In my case time is of the essence, so I'm using the second approach because it starts the listener about 2 minutes faster than the first approach.
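A rough sketch of the retry approach, assuming "prefix" holds the URL to bind (e.g. "http://myip:port/") and that roughly two minutes of retrying is enough for the dependent services to come up:
// Requires the System.Net and System.Threading namespaces.
HttpListener listener = null;
for (int attempt = 0; attempt < 24; attempt++)      // ~2 minutes at 5 seconds per attempt
{
    try
    {
        listener = new HttpListener();
        listener.Prefixes.Add(prefix);
        listener.Start();                            // throws HttpListenerException until the HTTP stack is ready
        break;                                       // started successfully
    }
    catch (HttpListenerException)
    {
        listener.Close();                            // throw the failed instance away
        Thread.Sleep(TimeSpan.FromSeconds(5));       // wait and try again
    }
}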

Kafka - C# - confluent-kafka-dotnet - Message time out

We have a simple standalone-mode deployment of Kafka 1.1.0 on our Linux machine. In server.properties we have modified:
listeners = PLAINTEXT://10.0.5.66:9092
advertised.listeners is commented out, so it falls back to the value of the listeners property.
We are using a .NET (C#) producer which is pushing messages through confluent-kafka-dotnet (0.11.4).
Sometimes the message gets transferred to Kafka and sometimes we receive a "Message time out" error on the producer side.
We are running out of ideas about what might cause this issue. It happens from time to time; if one message fails, another that comes a few seconds after the first one usually goes through.
Another clue could be that from time to time we see the following message in the Kafka logs on the server: WARN: Attempting to send a response via a channel for which there is no open connection <IP:PORT>. This message sometimes contains the IP address and port of the producer.
Any idea of what might be wrong?
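For comparison, a simplified sketch of this kind of 0.11.x producer setup with an explicit delivery timeout and a delivery-report check (the topic name, the message.timeout.ms value and the error handling here are illustrative assumptions, not the actual production code):
// Requires the Confluent.Kafka and Confluent.Kafka.Serialization namespaces.
var config = new Dictionary<string, object>
{
    { "bootstrap.servers", "10.0.5.66:9092" },
    // librdkafka setting: how long a message may wait for acknowledgement
    // before the client reports it as timed out.
    { "message.timeout.ms", 30000 }
};
using (var producer = new Producer<Null, string>(config, null, new StringSerializer(Encoding.UTF8)))
{
    var report = producer.ProduceAsync("my-topic", null, "hello").Result;
    if (report.Error.HasError)
    {
        Console.WriteLine("Delivery failed: {0}", report.Error.Reason);
    }
    producer.Flush(10000);   // give any queued messages up to 10s to be delivered
}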

SignalR Groups.Add times out and fails

I'm trying to add a member to a Group using SignalR 2.2. Every single time, I hit a 30 second timeout and get a "System.Threading.Tasks.TaskCanceledException: A task was canceled." error.
From a GroupSubscriptionController that I've written, I'm calling:
var hubContext = GlobalHost.ConnectionManager.GetHubContext<ProjectHub>();
await hubContext.Groups.Add(connectionId, groupName);
I've found this issue where people are periodically encountering this, but it happens to me every single time. I'm running the backend (ASP.NET 4.5) on one VS2015 launched localhost port, and the frontend (AngularJS SPA) on another VS 2015 launched localhost port.
I had gotten SignalR working to the point where messages were being broadcast to every connected client. It seemed so easy. Now, adding in the Groups part (so that people only get select messages from the server) has me pulling my hair out...
That task cancellation error may be thrown because the connectionId can't be found in the SignalR registry of connected clients.
How are you getting this connectionId? You have multiple servers/ports going - is it possible that you're getting your wires crossed?
I know there is an accepted answer to this, but I came across this once for a different reason.
First off, do you know what Groups.Add does?
I had expected Groups.Add's task to complete almost immediately every time, but not so. Groups.Add returns a task that only completes when the client (i.e. the JavaScript side) acknowledges that it has been added to the group - this is useful on reconnect, so the client can resubscribe to all its old groups. Note that this acknowledgement is not visible to developer code and is nicely covered up for you.
The problem is that the client may not respond because it has disconnected (i.e. it has navigated to another page). This means the await will have to wait until the connection times out (default 30 seconds) before giving up and throwing a TaskCanceledException.
See http://www.asp.net/signalr/overview/guide-to-the-api/working-with-groups for more detail on groups
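If you mainly want to stop the request from blocking for the full 30 seconds, one hedged workaround (a sketch built on the controller code above, not something from the SignalR docs) is to race the task against a shorter timeout of your own:
var addTask = hubContext.Groups.Add(connectionId, groupName);
var completed = await Task.WhenAny(addTask, Task.Delay(TimeSpan.FromSeconds(5)));
if (completed != addTask)
{
    // No acknowledgement within 5 seconds - most likely the connectionId is stale
    // or belongs to a client that has already disconnected.
}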

Rebus stops retrieving messages from RabbitMQ

We have an issue in our Rebus/RabbitMQ setup where Rebus suddenly stops retrieving/handling messages from RabbitMQ. This has happened two times in the last month and we're not really sure how to proceed.
Our RabbitMQ setup has two nodes on different servers, and the Rebus side is a windows service.
We see no errors in Rebus or in the eventlog on the server where Rebus runs. We also do not see errors on the RabbitMQ servers.
Rebus (and the Windows service) keeps running, as we do see other log messages, like the DueTimeOutSchedular and timeoutreplies. However, it seems the worker thread stops running, without any errors being logged.
It results in a RabbitMQ input queue that keeps growing :( We're adding logging to monitor this so we get notified if it happens again.
But I'm looking for advice on how to continue the "investigation" and ideas on how to prevent this. Maybe some of you have experienced this before?
UPDATE
It seems that we actually did have a node crashing, at least the last time it happened. The master RabbitMQ node crashed (the server crashed) and the slave was promoted to master. As far as I can see from the RabbitMQ logs on the nodes, everything went according to plan. There are no other errors in the RabbitMQ logs.
At the time this happened, Rebus was configured to connect only to the node that was the slave (then promoted to master), so Rebus did not experience the RabbitMQ failure and thus there were no Rebus connection errors. However, it seems that Rebus stopped handling messages when the failure occurred.
We are actually experiencing this on a few queues, it seems, and some of them, but not all, seem to have ended up in an unsynchronized state.
UPDATE 2
I was able to reproduce the problem quite easily, so it might be a configuration issue in our setup. This is what we do to reproduce it:
1. Start two nodes in a cluster, e.g. rabbit1 (master) and rabbit2 (slave). The queues are mirrored.
2. Rebus connects to rabbit2, the slave. We have two small test apps to reproduce this: a "sender" that sends a message every second and a "consumer" that handles the messages.
3. Close rabbit1, the master; rabbit2 is promoted to master. When rabbit1 is closed, the "consumer" stops handling messages, but the "sender" keeps sending messages and the queue keeps growing.
4. Start rabbit1 again; it joins as slave. This has no effect and the "consumer" still does not handle messages.
5. Restart the "consumer" app. When the "consumer" is restarted, it retrieves all the messages and handles them.
I think I have followed the setup guides correctly, but it might be a configuration issue on our part. I can't seem to find anything that would suggest what we have done wrong.
Rebus is still connected to RabbitMQ; we can see that in the connections tab on the management site. The "consumer's" send/received B/s drops to about 2 B/s when it stops handling messages.
UPDATE 3
OK, so I downloaded the Rebus source and attached to our process so I could see what happens in the "RabbitMqMessageQueue" class when it stops. When "rabbit1" is closed, the "BasicDeliverEventArgs" is null. This is the code:
BasicDeliverEventArgs ea;
if (!threadBoundSubscription.Next((int)BackoffTime.TotalMilliseconds, out ea))
{
return null;
}
// wtf??
if (ea == null)
{
return null;
}
See: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L178
I like the "wtf ??" comment :)
That sounds very weird!
Whenever Rebus' RabbitMQ transport experiences an error on the connection, it will throw out the connection, wait a few seconds, and ensure that the connection is re-established again when it can.
You can see the relevant place in the source here: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L205
So I guess the question is whether the RabbitMQ client library can somehow enter a faulted state, silently, without throwing an exception when Rebus attempts to get the next message...?
When you experienced the error, did you check out the 'connections' tab in RabbitMQ management UI and see if the client was still connected?
Update:
Thanks for your thorough investigation :)
The "wtf??" is in there because I once experienced a hiccup when ea had apparently been null, which was unexpected at the time, thus causing a NullReferenceException later on and the vomiting of exceptions all over my logs.
According to the docs, Next will return true and set the result to null when it reaches "end-of-stream", which is apparently what happens when the underlying model is closed.
The correct behavior in that case for Rebus would be to throw a proper exception and let the connection be re-established - I'll implement that right away!
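A rough sketch of that change (not the actual commit) would be to treat the null case as a closed model instead of silently returning null:
BasicDeliverEventArgs ea;
if (!threadBoundSubscription.Next((int)BackoffTime.TotalMilliseconds, out ea))
{
    // timed out waiting for a message - not an error
    return null;
}
if (ea == null)
{
    // end-of-stream: the underlying model/connection has been closed,
    // so raise an error and let the transport re-establish the connection
    throw new InvalidOperationException(
        "Subscription returned null - the underlying RabbitMQ model appears to be closed");
}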
Sit tight, I'll have a fix ready for you in a few minutes!

How do I adjust the maximum message size for a BrokeredMessage in Windows Server Service Bus?

I've set up Windows Server Service Bus 1.0 on a VM running Windows Server 2008 R2 Datacenter.
I've written a console application to send and receive messages to and from it and this works well.
I've been sending messages of increasing size successfully but the console app currently falls over when getting to 5,226,338 bytes (5,226,154 bytes message body + 184 bytes header I believe) which is just under 5MB. Ideally we need a bit more room to play with.
Some of the stack trace is as below...
Unhandled Exception:
Microsoft.ServiceBus.Messaging.MessagingCommunicationException: The
socket connection was aborted. This could be caused by an error
processing your message or a receive timeout being exceeded by the
remote host, or an underlying network resource issue. Local socket
timeout was '00:01:09.2720000'. --->
System.ServiceModel.CommunicationException: The socket connection was
aborted. This could be caused by an error processing your message or a
receive timeout being exceeded by the remote host, or an underlying
network resource issue. Local socket timeout was '00:01:09.2720000'.
---> System.IO.IOException: The write operation failed, see inner exception. ---> System.ServiceModel.CommunicationException: The socket
connection was aborted. This could be caused by an error processing
your message or a receive timeout being exceeded by the remote host,
or an underlying network resource issue. Local socket timeout was
'00:01:09.2720000'. ---> System.Net.Sockets.SocketException: An
established connection was aborted by the software in your host
machine
The Windows Azure Service Bus apparently has a fixed limit of 256KB, but the on-premises Windows Server Service Bus has a limit of 50MB. See the articles below.
Mention of the Azure limit of 256KB - http://msdn.microsoft.com/en-us/library/windowsazure/hh694235.aspx
Mention of 50MB - http://msdn.microsoft.com/en-us/library/windowsazure/jj193026.aspx
I'm struggling to reach the 50MB limit and would like to know if there is something I need to configure, or whether the message needs to be sent in a certain way. I noticed there was a parameter name in the above article and wondered if it could be used in PowerShell.
I've struggled to find some good information on this online. There is much confusion out there with some articles relating to Azure Service Bus but others relating to Windows Server Service Bus.
There is Service Bus 1.1 but I think this is in preview at the moment and I'm not sure this will help.
I am using code similar to the below to send the message.
namespaceManager = NamespaceManager.Create();
messagingFactory = MessagingFactory.Create();
queueClient = messagingFactory.CreateQueueClient(queueName);
queueClient.Send(new BrokeredMessage(new string('x', 5226154)));
This was taken from a combination of the articles below, the first one being slightly outdated and second one making it slightly clearer what needed to be changed in order to get things working.
http://haishibai.blogspot.co.uk/2012/08/walkthrough-setting-up-development.html
http://msdn.microsoft.com/en-us/library/windowsazure/jj218335(v=azure.10).aspx
I hope someone can help.
I've had the same problem, but I figured it out after a few tries.
Just open the file
C:\Program Files\Service Bus\1.1\Microsoft.ServiceBus.Gateway.exe.config
and in the netTcp binding named netMessagingProtocolHead set
maxReceivedMessageSize="2147483647"
maxBufferSize="2147483647"
and then restart all Service Bus services.
Now I'm able to send and receive a message of size new string('A', 49 * 1024 * 1024).
Enjoy :-)
Martin
The exception you're getting is a timeout, so your best bet is probably to fine-tune your timeouts a little bit. Have you tried setting the client-side timeout to a higher value? You can do that via the MessagingFactorySettings object. Also, have you checked the server-side logs to see if anything there gives you a clue?
The parameter you mention sets a quota. When you send a message that is bigger than the quota, it should be rejected immediately. In your case, the message is being accepted, but it is apparently timing out in transit.
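A hedged sketch of raising the client-side timeout via MessagingFactorySettings (address and credential handling depend on your configuration, so treat this as an outline rather than the exact call sequence):
var settings = new MessagingFactorySettings
{
    // give the client more time before it gives up on large sends
    OperationTimeout = TimeSpan.FromMinutes(5)
    // you may also need to set TokenProvider here to match your existing setup
};
var namespaceManager = NamespaceManager.Create();
var messagingFactory = MessagingFactory.Create(namespaceManager.Address, settings);
var queueClient = messagingFactory.CreateQueueClient(queueName);
queueClient.Send(new BrokeredMessage(new string('x', 5226154)));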
