SignalR Groups.Add times out and fails - c#

I'm trying to add a member to a Group using SignalR 2.2. Every single time, I hit a 30-second timeout and get a "System.Threading.Tasks.TaskCanceledException: A task was canceled." error.
From a GroupSubscriptionController that I've written, I'm calling:
var hubContext = GlobalHost.ConnectionManager.GetHubContext<ProjectHub>();
await hubContext.Groups.Add(connectionId, groupName);
I've found this issue where people are periodically encountering this, but it happens to me every single time. I'm running the backend (ASP.NET 4.5) on one VS2015-launched localhost port, and the frontend (AngularJS SPA) on another VS2015-launched localhost port.
I had gotten SignalR working to the point where messages were being broadcast to every connected client. It seemed so easy. Now, adding in the Groups part (so that people only get select messages from the server) has me pulling my hair out...

That task cancellation error is most likely thrown because the connectionId can't be found in SignalR's registry of connected clients.
How are you getting this connectionId? You have multiple servers/ports going - is it possible that you're getting your wires crossed?

I know there is an accepted answer to this, but I came across this once for a different reason.
First off, do you know what Groups.Add does?
I had expected Groups.Add's task to complete almost immediately every time, but not so. Groups.Add returns a task that only completes when the client (i.e. the JavaScript side) acknowledges that it has been added to the group - this is useful on reconnects, so a client can resubscribe to all its old groups. Note that this acknowledgement is not visible to your code; it is nicely covered up for you.
The problem is that the client may never respond, because it has disconnected (e.g. it has navigated to another page). That means the await has to wait until the connection is considered disconnected (default timeout 30 seconds) before giving up by throwing a TaskCanceledException.
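Given that, one defensive option is to avoid awaiting the add indefinitely - a minimal sketch, where the 5-second window is an arbitrary choice:
var addTask = hubContext.Groups.Add(connectionId, groupName);
// If the client never acknowledges the add (e.g. it has already navigated
// away), give up after a shorter window instead of blocking for the full
// 30-second disconnect timeout.
if (await Task.WhenAny(addTask, Task.Delay(TimeSpan.FromSeconds(5))) != addTask)
{
    // Treat the subscription as failed - the client is probably gone.
}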
See http://www.asp.net/signalr/overview/guide-to-the-api/working-with-groups for more detail on groups

Related

The format of the specified network name is not valid : HTTPListener error on system restart

I have implemented an HTTP listener in C# as a Windows service. The service is set to start automatically when the machine restarts. When I manually start the service after installing it, the HTTP listener works fine and responds to the requests it receives. But when the service is started by a system restart, I get the following error:
System.Net.HttpListenerException (0x80004005): The format of the specified network name is not valid
I get this error on listener.Start().
The HTTP listener code is like this:
HttpListener listener = new HttpListener();
listener.Prefixes.Add("http://myip:port/");
listener.Start();
I got a suggestion from this already-asked question, but even after following what's given in the answer, it still doesn't work.
Furthermore, I tried running:
netsh http show iplisten
in PowerShell, and the list is empty. Even when the HTTP listener works (the first time I install the service and run it), the output of this command is an empty list. So I don't think this is the issue.
Any suggestions will be really helpful.
Answering my own question. It seems there are some other services that need to be running before an HTTP listener can start, and these are not yet started by the time Windows starts my service. I found two solutions for this. One is to use delayed start:
sc.exe config myservicename start=delayed-auto
The other is to wrap the listener start in a try/catch and, if it fails, try again after a few seconds (sketched below). In my case time is of the essence, so I'm using the second approach because it starts the listener about 2 minutes faster than the first.
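For reference, the retry approach can be as simple as this (a sketch - the retry count and delay are arbitrary choices):
HttpListener listener = new HttpListener();
listener.Prefixes.Add("http://myip:port/");

// Start() can fail just after a reboot while the HTTP stack's dependencies
// are still coming up, so retry a few times before giving up.
for (int attempt = 1; ; attempt++)
{
    try
    {
        listener.Start();
        break;
    }
    catch (HttpListenerException) when (attempt < 10)
    {
        System.Threading.Thread.Sleep(TimeSpan.FromSeconds(5));
    }
}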

How to make PostAsync respond if API is down

I am calling an API using these commands:
byte[] messageBytes = System.Text.Encoding.UTF8.GetBytes(message);
var content = new ByteArrayContent(messageBytes);
content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/json");
HttpResponseMessage response = client.PostAsync(ApiUrl, content).Result;
However, the code stops executing at the PostAsync line. I put a breakpoint on the next line but it is never reached. It does not throw an error immediately, but a few minutes later it throws an error like:
System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
I presume this means the API is down. What can I do to make PostAsync spit back an error immediately even if the API is down so that I can handle the error and inform the user?
Thank you.
Broadly speaking, what you're asking is "How can I check if an API is available?", and the answer depends on how low-level you want to get and what you want to do for each level of unavailability:
Is there internet connectivity? Is it worth probing this locally first (as it's relatively quick to check)?
Is the server address correct? If it's wrong it doesn't matter how long you wait. Can the user configure this?
Is the address correct but the server is unable or unwilling to respond? What then?
If you're willing to lump them all into a single "can't contact server in a reasonable amount of time" bucket, there are a few approaches:
Decrease timeouts (beware)
In the case you gave, it sounds like your request is simply timing out: the address or port is wrong, the server is under immense load and can't respond in a timely fashion, you're attempting to contact a non-SSL endpoint using SSL or vice versa, etc. In any of these cases, you can't know whether the request has timed out until it actually times out. One thing you can do is reduce the HttpClient request timeout, as shown below. Beware: going too low will cause slow connections to time out for your users, which is a worse problem than the one you have.
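For example (a sketch - 10 seconds is an arbitrary value, tune it to your situation):
var client = new HttpClient
{
    // Fail fast instead of waiting minutes for the socket-level timeout.
    // Too low, and slow-but-working connections will start failing.
    Timeout = TimeSpan.FromSeconds(10)
};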
Pre-check
You could, either before each call, periodically, or at some point early in the client initialisation, do a quick probe of the API to see if it's responsive. This can be spun off into either an async or background task while the UI is being built, etc. This gives you more time to wait for a response, and as an added bonus if the API is responding slowly you can notify your users of this so they know not to expect immediate responses to their clicks. This will improve user experience. If the pre-check fails, you could show an error and advise the user to either check connectivity, check server address (if it's configurable), retry, etc.
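A minimal probe might look like this (a sketch - the "/ping" path is hypothetical, substitute whatever cheap endpoint your API offers):
// Returns false if the API doesn't answer within 5 seconds (arbitrary value).
static async Task<bool> IsApiReachableAsync(HttpClient client, string apiUrl)
{
    try
    {
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
        {
            var response = await client.GetAsync(apiUrl + "/ping", cts.Token);
            return response.IsSuccessStatusCode;
        }
    }
    catch (HttpRequestException) { return false; }      // DNS, refused, no network
    catch (OperationCanceledException) { return false; } // timed out
}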
Use a CancellationToken
You could pass a CancellationToken into PostAsync with a suitable timeout set, which also allows you to let the user cancel the request if they want to. Read up on CancellationToken for more information.
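For example (a sketch; the 10-second timeout is an arbitrary choice):
using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10)))
{
    try
    {
        HttpResponseMessage response = await client.PostAsync(ApiUrl, content, cts.Token);
        // ... handle response ...
    }
    catch (OperationCanceledException)
    {
        // Timed out (or the user cancelled) - inform the user here.
    }
}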
EDIT: as Alex pointed out, this line is not usually how you deal with async tasks:
HttpResponseMessage response = client.PostAsync(ApiUrl, content).Result;
Change this instead to:
HttpResponseMessage response = await client.PostAsync(ApiUrl, content);
Of course the calling method will then also need to be marked as async, and so on ("it's async all the way up"), but this is a good thing - it means your code is not blocking a thread while it waits for a response from the server.
Have a read here for some good material.
Hope that helps

Detect and handle unresponsive RabbitMQ in .NET application

We recently had an outage where one of our APIs became unresponsive due to our rabbit cluster being given artificially high load. We were running out of threads in Mono (.NET) and requests to the API failed. Although this is unlikely to happen again, we would like to put some protection in against it. Ideally we would have calls to bus.Publish() time out after a set amount of time, but we can't work out how.
We then came across the blocked connections notification feature of RabbitMQ and thought this might help. However we can't figure out how to get at the connection object that is in the IServiceBus. So far we have tried
_serviceBus = serviceBus;
var connection =
((MassTransit.Transports.RabbitMq.RabbitMqEndpointAddress) _serviceBus.Endpoint.Address)
.ConnectionFactory.CreateConnection();
connection.ConnectionBlocked += Connection_ConnectionBlocked;
connection.ConnectionUnblocked += Connection_ConnectionUnblocked;
But when we do this we get a BrokerUnreachableException which I don't understand.
My questions are, is this the right approach to detect timeouts and fail (we have a backup mechanism to collect the data in the message and repost later) and if this is correct, how do we make it work?
I think you can manage this by combining System.Timers.Timer or Observable.Timer to schedule checks with a check that uses request-response. The consumer for the request should be inside the same process. You can specify a cancellation token with a reasonable timeout for the Request call, and if you get a timeout, either your messaging infrastructure is down or too busy, or your endpoint is too busy.
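The timeout itself can be as simple as racing the check against a delay - a sketch, where HealthCheckAsync is a hypothetical wrapper around whatever request-response call you set up:
Task<bool> check = HealthCheckAsync(); // hypothetical request-response wrapper
if (await Task.WhenAny(check, Task.Delay(TimeSpan.FromSeconds(5))) != check)
{
    // No reply within 5 seconds (arbitrary): the broker or the endpoint is
    // down or too busy - fall back to the backup mechanism and repost later.
}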

.NET WebSockets forcibly closed despite keep-alive and activity on the connection

We have written a simple WebSocket client using System.Net.WebSockets. The KeepAliveInterval on the ClientWebSocket is set to 30 seconds.
The connection is opened successfully and traffic flows as expected in both directions, or, if the connection is idle, the client sends Pong requests every 30 seconds to the server (visible in Wireshark).
But after 100 seconds the connection is abruptly terminated due to the TCP socket being closed at the client end (watching in Wireshark we see the client send a FIN). The server responds with a 1001 Going Away before closing the socket.
After a lot of digging we have tracked down the cause and found a rather heavy-handed workaround. Despite a lot of Google and Stack Overflow searching we have only seen a couple of other examples of people posting about the problem and nobody with an answer, so I'm posting this to save others the pain and in the hope that someone may be able to suggest a better workaround.
The source of the 100 second timeout is that the WebSocket uses a System.Net.ServicePoint, which has a MaxIdleTime property to allow idle sockets to be closed. On opening the WebSocket if there is an existing ServicePoint for the Uri it will use that, with whatever the MaxIdleTime property was set to on creation. If not, a new ServicePoint instance will be created, with MaxIdleTime set from the current value of the System.Net.ServicePointManager MaxServicePointIdleTime property (which defaults to 100,000 milliseconds).
The issue is that neither WebSocket traffic nor WebSocket keep-alives (Ping/Pong) appear to register as traffic as far as the ServicePoint idle timer is concerned. So exactly 100 seconds after opening the WebSocket it just gets torn down, despite traffic or keep-alives.
Our hunch is that this may be because the WebSocket starts life as an HTTP request which is then upgraded to a websocket. It appears that the idle timer is only looking for HTTP traffic. If that is indeed what is happening that seems like a major bug in the System.Net.WebSockets implementation.
The workaround we are using is to set the MaxIdleTime on the ServicePoint to int.MaxValue. This allows the WebSocket to stay open indefinitely. But the downside is that this value applies to any other connections for that ServicePoint. In our context (which is a Load test using Visual Studio Web and Load testing) we have other (HTTP) connections open for the same ServicePoint, and in fact there is already an active ServicePoint instance by the time that we open our WebSocket. This means that after we update the MaxIdleTime, all HTTP connections for the Load test will have no idle timeout. This doesn't feel quite comfortable, although in practice the web server should be closing idle connections anyway.
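For reference, the workaround itself is just a couple of lines (a sketch - endpointUri here stands for the http/https form of the server address, since ServicePoints are keyed by that):
// Find (or create) the ServicePoint for the endpoint and disable its idle
// timeout. Note this affects every connection sharing that ServicePoint.
ServicePoint sp = ServicePointManager.FindServicePoint(endpointUri);
sp.MaxIdleTime = int.MaxValue;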
We also briefly explored whether we could create a new ServicePoint instance reserved just for our WebSocket connection, but couldn't see a clean way of doing that.
One other little twist which made this harder to track down is that although the System.Net.ServicePointManager MaxServicePointIdleTime property defaults to 100 seconds, Visual Studio is overriding this value and setting it to 120 seconds - which made it harder to search for.
I ran into this issue this week. Your workaround got me pointed in the right direction, but I believe I've narrowed down the root cause.
If a "Content-Length: 0" header is included in the "101 Switching Protocols" response from a WebSocket server, WebSocketClient gets confused and schedules the connection for cleanup in 100 seconds.
Here's the offending code from the .Net Reference Source:
//if the returned contentlength is zero, preemptively invoke calldone on the stream.
//this will wake up any pending reads.
if (m_ContentLength == 0 && m_ConnectStream is ConnectStream) {
    ((ConnectStream)m_ConnectStream).CallDone();
}
According to RFC 7230 Section 3.3.2, Content-Length is prohibited in 1xx (Informational) messages, but I've found it mistakenly included in some server implementations.
For additional details, including some sample code for diagnosing ServicePoint issues, see this thread: https://github.com/ably/ably-dotnet/issues/107
I set the KeepAliveInterval for the socket to 0 like this:
theSocket.Options.KeepAliveInterval = TimeSpan.Zero;
That eliminated the problem of the websocket shutting down when the timeout was reached. But then again, it probably also turns off the sending of ping messages altogether.
I studied this issue recently, comparing packet captures in Wireshark (Python's websocket-client versus .NET's ClientWebSocket), and found what happens. In .NET, Options.KeepAliveInterval only sends a packet to the server when no message has been received from the server within that period. But some servers only check whether there is active traffic from the client. So we have to manually send arbitrary packets (not necessarily ping packets - WebSocketMessageType has no ping type) to the server at regular intervals, even if the server side continuously sends packets. That's the solution.
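A sketch of such a manual keep-alive loop (the interval and payload are arbitrary choices):
// Periodically send a small application-level message so the server sees
// traffic coming from the client, even when we have nothing to say.
async Task KeepAliveLoopAsync(ClientWebSocket socket, CancellationToken token)
{
    var payload = new ArraySegment<byte>(Encoding.UTF8.GetBytes("keepalive"));
    while (socket.State == WebSocketState.Open)
    {
        await Task.Delay(TimeSpan.FromSeconds(30), token);
        await socket.SendAsync(payload, WebSocketMessageType.Text, true, token);
    }
}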

Rebus stops retrieving messages from RabbitMQ

We have an issue in our Rebus/RabbitMQ setup where Rebus suddenly stops retrieving/handling messages from RabbitMQ. This has happened two times in the last month and we're not really sure how to proceed.
Our RabbitMQ setup has two nodes on different servers, and the Rebus side is a windows service.
We see no errors in Rebus or in the eventlog on the server where Rebus runs. We also do not see errors on the RabbitMQ servers.
Rebus (and the Windows service) keeps running, as we do see other log messages, like the DueTimeOutSchedular and timeout replies. However, it seems the worker thread stops running, without any errors being logged.
It results in a RabbitMQ input queue that keeps growing :( We're adding logging to monitor this so we get notified if it happens again.
But I'm looking for advice on how to continue the investigation, and ideas on how to prevent this. Maybe some of you have experienced this before?
UPDATE
It seems that we actually did have a node crashing, at least the last time it happened. The master RabbitMQ node crashed (the server crashed) and the slave was promoted to master. As far as I can see from the RabbitMQ logs on the nodes, everything went according to plan. There are no other errors in the RabbitMQ logs.
At the time this happened Rebus was configured to connect only to the node that was the slave (then promoted to master) so Rebus did not experience the rabbitmq failure and thus no Rebus connection errors. However, it seems that Rebus stopped handling messages when the failure occurred.
It seems we are actually experiencing this on a few queues, and some of them, but not all, seem to have ended up in an unsynchronized state.
UPDATE 2
I was able to reproduce the problem quite easily, so it might be a configuration issue in our setup. This is what we do to reproduce it:
Start two nodes in a cluster, ex. rabbit1 (master) and rabbit2 (slave)
Rebus connects to rabbit2, the slave
Close rabbit1, the master. rabbit2 is promoted to master
The queues are mirrored
We have two small test apps to reproduce this: a "sender" that sends a message every second and a "consumer" that handles the messages.
When rabbit1 is closed, the "consumer" stops handling messages, but the "sender" keeps sending the messages and the queue keeps growing.
Start rabbit1 again, it joins as slave
This has no effect and the "consumer" still does not handle messages.
Restart the "consumer" app
When the "consumer" is restarted it retrieves all the messages and handles them.
I think I have followed the setup guides correctly, but it might be a configuration issue on our part. I can't seem to find anything that would suggest what we have done wrong.
Rebus is still connected to RabbitMQ - we can see that in the connections tab on the management site - but the "consumer's" send/received B/s drops to about 2 B/s when it stops handling messages.
UPDATE 3
Ok, so I downloaded the Rebus source and attached to our process so I could see what happens in the "RabbitMqMessageQueue" class when it stops. When "rabbit1" is closed, the "BasicDeliverEventArgs" is null. This is the code:
BasicDeliverEventArgs ea;
if (!threadBoundSubscription.Next((int)BackoffTime.TotalMilliseconds, out ea))
{
    return null;
}
// wtf??
if (ea == null)
{
    return null;
}
See: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L178
I like the "wtf ??" comment :)
That sounds very weird!
Whenever Rebus' RabbitMQ transport experiences an error on the connection, it will throw out the connection, wait a few seconds, and ensure that the connection is re-established again when it can.
You can see the relevant place in the source here: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L205
So I guess the question is whether the RabbitMQ client library can somehow enter a faulted state, silently, without throwing an exception when Rebus attempts to get the next message...?
When you experienced the error, did you check out the 'connections' tab in RabbitMQ management UI and see if the client was still connected?
Update:
Thanks for your thorough investigation :)
The "wtf??" is in there because I once experienced a hiccup when ea had apparently been null, which was unexpected at the time, thus causing a NullReferenceException later on and the vomiting of exceptions all over my logs.
According to the docs, Next will return true and set the result to null when it reaches "end-of-stream", which is apparently what happens when the underlying model is closed.
The correct behavior in that case for Rebus would be to throw a proper exception and let the connection be re-established - I'll implement that right away!
Sit tight, I'll have a fix ready for you in a few minutes!
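For illustration, the change would presumably be along these lines (a sketch of the fix described above, not the actual commit):
// Sketch of the described fix: treat end-of-stream as an error so Rebus'
// reconnect logic kicks in, instead of silently returning null.
if (ea == null)
{
    throw new InvalidOperationException(
        "Subscription returned null - the underlying model has probably been closed");
}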
