StackExchange.Redis timeouts & socket failures - C#

I am using Azure Redis (via StackExchange.Redis) as a cache store and it's generally working fine. But I get timeout errors now and then, and I can't pin down why they happen.
My redis connection settings:
value="dev.redis.cache.windows.net,ssl=true,password=secret,abortConnect=false,syncTimeout=3000"
I am getting all of these exceptions in the same second (across multiple calls). I get them on GET operations as well. Almost all of them are on StringSet and StringGet; I rarely get exceptions on HashSet or HashGet.
Timeout performing SET {key}, inst: 1, mgr: ExecuteSelect, queue: 6, qu=0, qs=6, qc=0, wr=0/0, in=0/0
SocketFailure on SET
SocketFailure on SET
No connection is available to service this operation: SET
I am guessing that setting the object is taking longer than expected. This could be because the object is large, so I could increase syncTimeout, but would that hide some other problem?
I only get these exceptions on synchronous calls to StackExchange.Redis; I have not seen an exception when the call is asynchronous.
Stacktrace:
StackExchange.Redis.RedisConnectionException: SocketFailure on SET
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) i
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisDatabase.StringSet(RedisKey key, RedisValue value, Nullable`1 expiry, When when, CommandFlags flags)
at calling method
Edit: I am using the StackExchange.Redis 1.0.414 package, and I serialize my objects with MessagePack.

Timeouts are typically caused by one of a few things. Here are some examples:
Client or server CPU hitting 100%
Poorly configured ThreadPool settings, combined with bursts of traffic
Clients sending expensive commands to the server.
Maxing out your network Bandwidth (on client or on server)
Tips for Client side issues: https://gist.github.com/JonCole/db0e90bedeb3fc4823c2
Tips for server side issues: https://gist.github.com/JonCole/9225f783a40564c9879d
I would also recommend upgrading to a newer version of StackExchange.Redis. Version 1.1.603 includes more detailed diagnostic info in the timeout error message, which may help you identify some of the common client-side issues I listed above.
As for Socket failures, a couple of common causes for connection drops between the client and server that I have seen are:
Scaling the client - I have seen brief client side connectivity issues when scaling client apps in Azure.
When Redis is patched, there will be some connection blips. Azure Redis patching is explained here: https://gist.github.com/JonCole/317fe03805d5802e31cfa37e646e419d
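As an illustration of the "poorly configured ThreadPool settings" cause listed above, here is a minimal sketch of raising the ThreadPool minimums at application startup, so that a burst of traffic does not have to wait for the pool to grow. The helper name EnsureMinThreads and the value 50 in the usage note are assumptions made for this example, not recommendations from the answer; suitable numbers depend on the workload.

```csharp
using System;
using System.Threading;

public static class ThreadPoolTuning
{
    // Hypothetical helper: raises the ThreadPool minimums to at least the
    // requested values. Returns true if the runtime accepted the settings.
    public static bool EnsureMinThreads(int minWorker, int minIocp)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIocp);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);

        // Never lower an existing minimum, and never exceed the maximum.
        int worker = Math.Min(Math.Max(currentWorker, minWorker), maxWorker);
        int iocp = Math.Min(Math.Max(currentIocp, minIocp), maxIocp);

        return ThreadPool.SetMinThreads(worker, iocp);
    }
}
```

It would be called once, early in startup, for example ThreadPoolTuning.EnsureMinThreads(50, 50). Whether this helps depends on the client actually being starved of threads, which the more detailed timeout message in newer client versions should make easier to judge.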

Please check the port number on which you are running Redis. In my case I had configured port 6359, but the actual port number was 6379.

Related

Trace messages on StackExchange Redis client

We are using this StackExchange Redis C# client and occasionally experience errors such as this:
StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: EXISTS OnDemand:ExportDocument:Subscription:f3d45517-c26e-4e99-82c0-5c532a68081b
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\TeamCity\buildAgent\work\3ae0647004edff78\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 1734
Looking through the code, it appears that we can enable some sort of verbose tracing to better understand what is happening under the hood. I looked through the configuration part of the documentation and there is no mention of tracing there.
Does anyone have ideas on how to enable tracing on this client?
ConnectionMultiplexer.Connect() optionally accepts a TextWriter for logging.
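A minimal sketch of that suggestion, assuming the StackExchange.Redis package is referenced. The class and method names are made up for this example, and the captured text covers only what the client writes to the log during connection, not later per-command tracing.

```csharp
using System;
using System.IO;
using StackExchange.Redis;

public static class RedisConnectLogging
{
    // Hypothetical helper: connects and hands back whatever the client
    // wrote to the TextWriter passed to Connect().
    public static ConnectionMultiplexer ConnectWithLog(
        string configuration, out string connectLog)
    {
        // StringWriter holds no unmanaged resources, so it is deliberately
        // left undisposed in case the client writes to it late.
        var log = new StringWriter();
        try
        {
            return ConnectionMultiplexer.Connect(configuration, log);
        }
        finally
        {
            // Capture the log even when Connect() throws.
            connectLog = log.ToString();
        }
    }
}
```

The returned text can then be forwarded to whatever logging the application already uses. Passing Console.Out directly also works in a console application.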

RabbitMQ Brokerunreachable exception timeout intermittently from a machine

We have 11 Windows web app machines running IIS. These send messages to a RabbitMQ server for tasks. We are using RabbitMQ for basic work queue functionality. For each message published, a new connection and a channel are created, pretty much as in the tutorial here - https://www.rabbitmq.com/tutorials/tutorial-two-dotnet.html
This works great most of the time, but in production, once or twice a day and from a different machine every time, we sporadically start getting this exception on ConnectionFactory.CreateConnection.
[BrokerUnreachableException: None of the specified endpoints were reachable]
RabbitMQ.Client.ConnectionFactory.CreateConnection():56
[TimeoutException: Connection to amqp://machinename.domain.net:5672 timed out]
RabbitMQ.Client.Impl.SocketFrameHandler.Connect(TcpClient socket, AmqpTcpEndpoint endpoint, Int32 timeout):65
RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout):52
RabbitMQ.Client.Framing.Impl.ProtocolBase.CreateFrameHandler(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout):8
RabbitMQ.Client.ConnectionFactory.CreateConnection():45
This is causing message loss. I have been investigating the max concurrent connections setting for each machine, but that did not lead anywhere. The errors do not coincide with our peak traffic either. The most interesting clue I have is that they happen in bursts, and when they happen, they affect ONLY one of the 11 machines publishing messages to the queue at a time.
I am using the RabbitMQ .NET client.
Any ideas or pointers on what the possible cause could be?
Probably some sort of packet loss? Why not Try...Catch...Retry?
Run ping RabbitServerHostName -t (where RabbitServerHostName is the server where you have RabbitMQ installed) in a command window and, after a couple of days, see how many packets were lost.
Because of packet drops and network instability, retrying connection creation is almost always the recommended approach. The EasyNetQ library does this really well. It is, however, not very complex to implement your own timer-based retry that runs when you get this exception, until the connection is established.
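The retry idea described above could look roughly like the following sketch. It uses only the standard library; the ConnectRetry name, the attempt limit, and the fixed delay are assumptions made for this example, and a bounded number of attempts is used here instead of retrying forever.

```csharp
using System;
using System.Threading;

public static class ConnectRetry
{
    // Hypothetical helper: calls connect() until it succeeds or maxAttempts
    // is reached, sleeping for a fixed delay between attempts.
    public static T Execute<T>(Func<T> connect, int maxAttempts, TimeSpan delay)
    {
        if (connect == null) throw new ArgumentNullException(nameof(connect));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return connect();
            }
            // The filter lets the exception from the final attempt propagate.
            // Real code would likely catch only BrokerUnreachableException.
            catch (Exception) when (attempt < maxAttempts)
            {
                Thread.Sleep(delay);
            }
        }
    }
}
```

With the RabbitMQ client this might be used as ConnectRetry.Execute(() => factory.CreateConnection(), 5, TimeSpan.FromSeconds(2)), where factory is the ConnectionFactory from the question. Reusing one connection across publishes, rather than opening one per message, would also reduce how often a connect can fail.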

How do I adjust the maximum message size for a BrokeredMessage in Windows Server Service Bus?

I've set up Windows Server Service Bus 1.0 on a VM running Windows Server 2008 R2 Datacenter.
I've written a console application to send and receive messages to and from it and this works well.
I've been sending messages of increasing size successfully but the console app currently falls over when getting to 5,226,338 bytes (5,226,154 bytes message body + 184 bytes header I believe) which is just under 5MB. Ideally we need a bit more room to play with.
Some of the stack trace is as below...
Unhandled Exception:
Microsoft.ServiceBus.Messaging.MessagingCommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:01:09.2720000'.
---> System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:01:09.2720000'.
---> System.IO.IOException: The write operation failed, see inner exception.
---> System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:01:09.2720000'.
---> System.Net.Sockets.SocketException: An established connection was aborted by the software in your host machine
The Windows Azure Service Bus apparently has a fixed limit of 256KB, but the on-premises Windows Server Service Bus has a limit of 50MB. See the articles below.
Mention of the Azure limit of 256KB - http://msdn.microsoft.com/en-us/library/windowsazure/hh694235.aspx
Mention of 50MB - http://msdn.microsoft.com/en-us/library/windowsazure/jj193026.aspx
I'm struggling to achieve the 50MB limit and would like to know if there is something I need to do to configure this somehow or perhaps the message needs to be sent in a certain way. I noticed there was a parameter name in the above article and wondered if that could be used in PowerShell.
I've struggled to find some good information on this online. There is much confusion out there with some articles relating to Azure Service Bus but others relating to Windows Server Service Bus.
There is Service Bus 1.1 but I think this is in preview at the moment and I'm not sure this will help.
I am using code similar to the below to send the message.
namespaceManager = NamespaceManager.Create();
messagingFactory = MessagingFactory.Create();
queueClient = messagingFactory.CreateQueueClient(queueName);
queueClient.Send(new BrokeredMessage(new string('x', 5226154)));
This was taken from a combination of the articles below, the first one being slightly outdated and second one making it slightly clearer what needed to be changed in order to get things working.
http://haishibai.blogspot.co.uk/2012/08/walkthrough-setting-up-development.html
http://msdn.microsoft.com/en-us/library/windowsazure/jj218335(v=azure.10).aspx
I hope someone can help.
I had the same problem, but I figured it out after a few tries.
Just open the file
C:\Program Files\Service Bus\1.1\Microsoft.ServiceBus.Gateway.exe.config
and, on the nettcp binding named netMessagingProtocolHead, set
maxReceivedMessageSize="2147483647"
maxBufferSize="2147483647"
and restart all Service Bus services.
Now I am able to send and receive a message of size new string('A', 49 * 1024 * 1024).
Enjoy :-)
Martin
The exception you're getting is a timeout, so your best bet is probably to fine tune your timeouts a little bit. Have you tried setting the client side timeout to a higher value? You can do that via the MessagingFactorySettings object. Also, have you checked the server side logs to see if anything there gives you a clue?
The parameter you mention sets a quota. When you send a message that is bigger than the quota, it should be rejected immediately. In your case, the message is being accepted, but it is apparently timing out in transit.

Getting error reason code 2059 on MQ client (C#) when reconnecting to QueueManager after a while

I can't reconnect to MQQueueManager after a while as an exception (reason code 2059 - MQRC_Q_MGR_NOT_AVAILABLE) is thrown when I'm constructing new object of MQQueueManager. My client app is written in .NET/C# and I'm running it on Win2003.
However I can connect to QM after I have restarted my client app. This would indicate that some state is incorrect in QM libraries? How can I reset the state in code so that I could reconnect to QM? Is there a way to reset/disconnect all active TCP connections to QM from client app code?
My connection code:
Hashtable properties = new Hashtable();
properties.Add( MQC.HOST_NAME_PROPERTY, Host );
properties.Add( MQC.PORT_PROPERTY, Port );
properties.Add( MQC.USER_ID_PROPERTY, UserId );
properties.Add( MQC.PASSWORD_PROPERTY, Password );
properties.Add( MQC.CHANNEL_PROPERTY, ChannelName );
properties.Add( MQC.TRANSPORT_PROPERTY, TransportType );
// Following line throws an exception randomly
MQQueueManager queueManager = new MQQueueManager( qmName, properties );
Stack trace:
Source: amqmdnet
CompletionCode: 2
ReasonCode: 2059
Reason: 2059
Stack Trace:
at IBM.WMQ.MQBase.throwNewMQException()
at IBM.WMQ.MQQueueManager.Connect(String queueManagerName)
at IBM.WMQ.MQQueueManager..ctor(String qmName, Hashtable properties)
at WebSphereMQOutboundAdapter.WebSphereMQOutbound.ConnectToWebSphereMQ()
Connections are per-thread so if you are attempting to create a new connection while the previous QMgr object is still instantiated, you would get this. If you close the previous connection and destroy the object before creating a new object you should be OK. Since queues and other WMQ objects depend on a connection handle these will also need to be destroyed and then reinstantiated after the new connection is made.
There are of course a few other explanations for this behavior but these are much less likely. For example, it is possible that a channel exit or (in WMQ v7) configuration could be limiting the number of simultaneous connections from a given IP address. When a connection is severed rather than closed, the channel agent holding the connection on the QMgr side has to time out before the QMgr sees the connection as closed. If connection limiting is in place, these "ghost" connections reduce the available pool. But as I said, this is far less common than programs not cleaning up old objects prior to a reconnect attempt.
There is also the possibility that this is a bug. To reduce that possibility, and for a variety of other reasons such as WMQ v6 going end of life next year, I'd recommend use of WMQ v7.0.1.2 for this project, at both the client and server side. In general, you can use v7.0.1.2 client with a v6.0.x server as long as you stick to v6 functionality. Among other things, .Net code is better integrated in v7 and the Cat-3 SupportPacs are now included in the base install media rather than a separate download.
After some months fighting with this issue and IBM support, the best solution I found is to change the connect/disconnect code in IBM MQ Driver.
Instead of calling manager.Disconnect() and manager.Close() for each GET/PUT, connect once and then reconnect only if you get an exception (such as losing the connection).
What I've figured out is that there is a bug in the IBM MQ Driver that caches some information for each connect/disconnect. When this buffer is full, the application stops reconnecting.
The driver version (client DLLs) with which I have this issue is 7.0.1.6.
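The "connect once, reconnect only after an exception" approach could be sketched as follows. The block is deliberately generic and uses only the standard library: ReusableConnection and its delegates are hypothetical names for this example, and with WebSphere MQ the connect delegate would wrap new MQQueueManager(qmName, properties) while the close delegate would call Disconnect(). The lock serializes all use of the connection, which is a simplification.

```csharp
using System;

// Hypothetical holder: keeps one long-lived connection and rebuilds it
// only after an operation on it has failed.
public sealed class ReusableConnection<T> where T : class
{
    private readonly Func<T> _connect;
    private readonly Action<T> _close;
    private readonly object _gate = new object();
    private T _current;

    public ReusableConnection(Func<T> connect, Action<T> close)
    {
        _connect = connect ?? throw new ArgumentNullException(nameof(connect));
        _close = close ?? throw new ArgumentNullException(nameof(close));
    }

    public TResult Use<TResult>(Func<T, TResult> operation)
    {
        lock (_gate)
        {
            // Connect lazily, and only when no live connection is held.
            if (_current == null) _current = _connect();
            try
            {
                return operation(_current);
            }
            catch (Exception)
            {
                // Drop the failed connection so the next call reconnects.
                T broken = _current;
                _current = null;
                try { _close(broken); } catch (Exception) { /* already broken */ }
                throw;
            }
        }
    }
}
```

This sketch discards the connection on any exception. Real code would probably inspect the MQ reason code and reconnect only for connection-level failures.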

What should the client do while the TIBCO EMS server attempts failover?

The TIBCO EMS user's guide (pg 292) says:
The backup server will work indefinitely to either A) become the primary server or B) reconnect to the primary server.
It also says clients may receive fail-over notification when the switch is successful (see also TIBCO EMS .NET reference pg 220).
I have some questions spinning off of these facts...
What kind of errors occur on the client side while the servers are attempting fail-over/reconnect?
What is the appropriate response from the client?
Get new Connection objects from the ConnectionFactory until one works?
Wait for fail-over notification? (are current Connection instances fixed at this time? or do I need to get a new instance?)
I hope the scenario is clear, any related information or advice would be appreciated too.
I can at least answer #1 above.
If you have enabled Tibems.SetExceptionOnFTSwitch(true); and have set up an exception handler to capture the messages the server sends to the client, you will see the following:
For single-server, non-fault tolerant connection failures:
"Connection has been terminated".
For fault-tolerant connection failures:
"Connection has performed fault-tolerant switch to "
If you attempt to publish while the connection is down, a TIBCO.EMS.IllegalStateException is thrown with the "Producer is closed" message.
For #2 above, I think the answer is to let the EMS library handle as much as possible. Once we got the EMS reconnect functionality to work, it gracefully tried to reconnect until the server became available again, and once it reconnected, it was as if there had never been a problem. The only gotcha is probably trying to publish a message before the EMS connection is back. This is where the exception handler comes in. Once notified that you are in failover mode, you can adjust exception handling on the publisher side to suppress the error until the connection is back. What I don't know is how to tell when all reconnect attempts have been exhausted.
Anyway, it seems our two worlds are closely related when it comes to EMS. I hope our findings (based on your comments on my questions) help you.
We use TEMS (TIBCO EMS, a TIBCO product for WCF), so it becomes a custom binding. We tried to break it by doing things like bouncing the server to force switchovers, and it works really well. Make sure you are using version 1.2, not 1.1, because otherwise you cannot do anything other than client acknowledgement.
