RabbitMQ seems to get into a weird state. Our application installs RabbitMQ and Erlang, one of its prerequisites. When we try to send a message to the queue, it throws exceptions, so the queue just fills up. We need to either reboot the PC or restart the RabbitMQ server before we can send messages again.
Note - I don't have any logs and don't know the exact exception, as this happened during installation at a customer site and we have no access to their machines. The issue has only been seen at customer sites, on many of the platforms.
I'd like suggestions as to what may cause this. Is there a way I can test for this weird state of RabbitMQ from code and restart it when it occurs? Or is there any generic way to handle it from code?
We are trying to find out why background requests to a particular endpoint (http or https) are never getting to IIS. The IIS logs show other requests from our device but none of our background requests. They do show up in the HTTP.SYS logs, though, with no error code, just a 'Request_Cancelled'.
Under a different environment in test these requests do work as expected. So the only differences seem to be firewall settings or something we haven't found yet.
Having searched for possible causes, I have seen information about possible invalid SSL certificates. We currently don't use https where we are experiencing the issue, and in test we have tried both http and https successfully. Another possible cause may have been latency, so we tried adding a 10 second delay to all requests in test and this also worked as expected.
What else could cause this error and prevent a request from reaching IIS?
I think it's a bit difficult to find the root cause for Request_Cancelled.
You could try to capture and analyze the http.sys ETL log.
How to capture the http.sys ETL log:
1. Run CMD.EXE as administrator.
2. Use cd c:\etl to change to the folder where you want to place the ETL log.
3. Run logman start httptrace -p Microsoft-Windows-HttpService 0xFFFF -o httptrace.etl -ets
4. Try to reproduce the problem.
5. Stop the trace with the command logman stop httptrace -ets
6. You should then see an .etl file in that folder.
How to analyze these logs:
1. Download and install MS Network Monitor:
https://www.microsoft.com/en-us/download/details.aspx?id=4865
2. Open the ETL log with Network Monitor.
3. Select Parser Profiles->network monitor parser->Windows in the upper right corner.
4. You should then be able to see all events that happened when you accessed the endpoint from IOS.
Since we can't find much documentation on how to troubleshoot the Request_Cancelled error, analyzing the http.sys ETL log may be a good place to start.
I've just experienced a similar issue with a vendor-supplied application: one of the XHR requests was failing, with the web server just abandoning the connection. The IIS log records the request as successful (200), but the HTTPERR log reports it as "Request_Cancelled".
In my case, this was the result of a response limit applied by the vendor in the web.config (specifically: configuration/system.serviceModel/behaviors/serviceBehaviors/behavior/dataContractSerializer#maxItemsInObjectGraph - as soon as the number of objects serialised in a single response exceeded this, the request got cancelled), and removing this limit allowed the application to work correctly.
Whilst this obviously isn't going to be the cause in all cases of unexpected 'Request_Cancelled's, it may be worth checking both your web.config and your system-wide C:\Windows\System32\inetsrv\config\applicationHost.config for anything that limits the size of the response.
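For orientation, this is roughly what such a limit looks like in a web.config. It is only a sketch built from the configuration path quoted above: the behavior name and the value are illustrative assumptions, not the vendor's actual settings.

```xml
<configuration>
  <system.serviceModel>
    <behaviors>
      <serviceBehaviors>
        <!-- "ExampleBehavior" is a hypothetical name; yours will differ -->
        <behavior name="ExampleBehavior">
          <!-- Illustrative value: responses that serialise more objects
               than this in a single object graph are cut off -->
          <dataContractSerializer maxItemsInObjectGraph="65536" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
  </system.serviceModel>
</configuration>
```

If you find an element like this, raising or removing the attribute and retesting is a quick way to confirm or rule out this cause.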
The Problem
I have a web service that saves a record in the database and, sometimes, sends out notification emails to a group of users letting them know an event occurred.
I am getting infrequent time-out errors from the client. Since the data are committed to the database, I think my problem is that the SMTP server sending the emails is taking longer than the timeout on the client.
The Need
What I need to do is either send the email in the background, or add it to some sort of queue for sending later, and return.
Constraints
Our site runs .Net 2.0 and IIS 6
I do not have admin rights to the server, although I do have file-system access to our web site. While I may be able to convince our server admin to install a custom windows service for me, I would prefer to avoid this if possible.
Our web server is old and slow and is shared with several other web sites.
This problem occurred in an important online data entry system, where downtime and errors cause political issues.
Ideas
I've looked at several solutions, but need some direction as to which way would be best.
1. I could spawn a thread to send the emails, but I don't know if that would work, since the web service code would fall out of scope upon return.
2. I could add the task to some sort of queue, and periodically send queued emails.
3. I could increase the timeout on the client side and ignore the problem.
Under #2, I've looked at Jeff Atwood's use of the HttpRuntime.Cache to simulate a windows service, but I am very concerned by this warning:
You need to really be careful on the length of the task running.
Every new Task is a new Worker Thread and there’s a limited number of
those – as it “borrows” a thread from the managed thread pool.
An unresponsive web page is worse than the error I'm trying to solve.
What direction should I go?
The Web Service Code
[WebMethod(CacheDuration = 0)]
public static string SaveRecord(comRecord record, IList<QIData> qiItems)
{
    using (WebDatabase db = new WebDatabase())
    {
        db.SaveRecord(record, qiItems, UserId, ComId);
        if (qiItems.Count > 0)
        {
            /* Then somehow invoke or queue the routine
            db.SendQINotice(record, UserId, (int)ComId);
            */
        }
    }
}
Interesting - there are several ways to do it, but since you are on a web site, I would add an entry to an e-mail queue in a database and have another task send out the e-mail. Then you have the freedom to do some better error handling on the e-mail send if you need to without slowing down the web site. For instance, you could add some "transient" error handling to such an application. If you are interested in this approach, I can add to my response the "transient" error handler that I am using to retry on an exception to overcome some temporary error conditions.
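As a rough idea of what such a "transient" retry could look like, here is a minimal sketch. It is not the handler mentioned above: TransientRetry, its parameters and the linear back-off are all assumptions made for illustration. It avoids lambdas so it would also compile on .NET 2.0.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not from the original answer): retries an action
// when it throws, waiting a little longer before each new attempt.
public static class TransientRetry
{
    public delegate void Operation();

    // Runs 'operation' up to 'maxAttempts' times. Returns the number of
    // attempts used; rethrows the last exception if every attempt fails.
    public static int Execute(Operation operation, int maxAttempts, int baseDelayMs)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return attempt;
            }
            catch (Exception)
            {
                // Assumption: every exception is treated as transient here.
                // A real sender would inspect the exception (for example the
                // SMTP status) before deciding whether a retry makes sense.
                if (attempt >= maxAttempts) throw;

                // Linear back-off: wait baseDelayMs, then 2x, then 3x, ...
                Thread.Sleep(baseDelayMs * attempt);
            }
        }
    }
}
```

The separate task that drains the e-mail queue table would wrap each send in TransientRetry.Execute and only mark the row as sent once the call returns, so a failed send stays in the queue for the next run and the web service itself never waits on SMTP.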
I'm currently investigating this but thought I'd ask anyway. Will post an answer once I find out if not answered.
The problem is as follows:
An application calls RabbitHutch.CreateBus to create an instance of IBus/IAdvancedBus to publish messages to RabbitMQ. The instance is returned but the IsConnected flag is set to false (i.e. connection retry is done in the background). When the application serves a specific request, IAdvancedBus.PublishAsync is called to publish a message while the bus still isn't connected. Under significant load, requests to the application end up timing out as the bus was never able to connect to RabbitMQ.
Same behaviour is observed when connectivity to RabbitMQ is lost while processing requests.
The question is:
How is EasyNetQ handling attempts to publish messages while the bus is disconnected?
Are messages queued in memory until the connection can be established? If so, is it disposing of messages after it reaches some limit? Is this configurable?
Or is it forcing the bus to try to connect to RabbitMQ?
Or is it dumping the message altogether?
Does having PublisherConfirms switched on affect the behavior?
I haven't been able to test all scenarios described above, but it looks like before trying to publish to RabbitMQ, EasyNetQ is checking that the bus is connected. If it isn't, it is entering a connection loop more or less as described here: https://github.com/EasyNetQ/EasyNetQ/wiki/Error-Conditions#there-is-a-network-failure-between-my-subscriber-and-the-rabbitmq-broker
As we increase load, it looks as if the connection loops are spiralling out of control, as none of them ever manages to connect to RabbitMQ because our infrastructure or configuration is broken. I have not yet identified why we are getting timeouts, but I suspect there could be a concurrency issue when several connection loops attempt to connect simultaneously.
I also doubt that switching off PublisherConfirms would help at all as we are not able to publish messages and therefore not waiting for acknowledgement from RabbitMQ.
Our solution:
So why have I not got a clear answer to this question? The truth is, at this point in time the messages that we are trying to publish are not mission critical, strictly speaking. If our configuration is wrong, deployment will fail when running a health check and we'll essentially abort the deployment. If RabbitMQ becomes unavailable for some reason, we are OK with not having these messages published.
Also, to avoid timing out, we're wrapping message publishing in a circuit breaker so that we stop publishing if we detect that the circuit between our application and RabbitMQ is open. Roughly speaking, this works as follows:
var bus = RabbitHutch.CreateBus(...).Advanced;
var rabbitMqCircuitBreaker = new CircuitBreaker(...);
rabbitMqCircuitBreaker.AttemptCall(() => {
    // Report a disconnected bus to the circuit breaker as a failure
    if (!bus.IsConnected)
        throw new Exception(...);
    bus.Publish(...);
});
Notice that we are notifying our circuit breaker that there is a problem when the IsConnected flag is set to false by throwing an exception. If the exception is thrown X number of times over a configured period of time, the circuit will open and we will stop trying to publish messages for a configured amount of time. We think that this is acceptable as the connection should be really quick and available 99.xxx% of the time if RabbitMQ is available. Also worth noting that the bus is created when our application is starting up, not before each call, therefore the likelihood of checking the flag before it is actually set in a valid scenario is pretty low.
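For readers who have not used the pattern, a minimal circuit breaker can be sketched as below. This is not the CircuitBreaker class used in the snippet above, which the post does not show: the class name, the consecutive-failure rule (a simplification of "X times over a configured period") and the injectable clock are all assumptions.

```csharp
using System;

// Hypothetical minimal circuit breaker; names and rules are assumptions.
public class SimpleCircuitBreaker
{
    private readonly int failureThreshold;
    private readonly TimeSpan openDuration;
    private readonly Func<DateTime> clock;
    private readonly object gate = new object();
    private int consecutiveFailures;
    private DateTime openedAtUtc = DateTime.MinValue;

    // 'clock' is injectable only so the behavior can be tested without waiting.
    public SimpleCircuitBreaker(int failureThreshold, TimeSpan openDuration,
                                Func<DateTime> clock = null)
    {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
        this.clock = clock ?? (() => DateTime.UtcNow);
    }

    // Open means: too many failures, and the cool-down has not elapsed yet.
    public bool IsOpen
    {
        get
        {
            lock (gate)
            {
                return consecutiveFailures >= failureThreshold
                    && clock() - openedAtUtc < openDuration;
            }
        }
    }

    // Returns true if the action ran and succeeded; false if it was
    // skipped (circuit open) or threw. Exceptions are swallowed here.
    public bool AttemptCall(Action action)
    {
        if (IsOpen) return false;
        try
        {
            action();
            lock (gate) { consecutiveFailures = 0; }
            return true;
        }
        catch (Exception)
        {
            lock (gate)
            {
                consecutiveFailures++;
                // (Re)start the cool-down once the threshold is reached.
                if (consecutiveFailures >= failureThreshold) openedAtUtc = clock();
            }
            return false;
        }
    }
}
```

Once the cool-down has elapsed, the next call is let through as a trial: a success closes the circuit again, while another failure reopens it immediately. Whether to swallow the exception or rethrow it to the caller is a design choice; swallowing fits the situation described above, where losing a message is acceptable.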
Works for us at the moment, any additional information would be appreciated.
To be clear, there are no errors for the hosted service, just a generic Windows service error.
The error message says:
Error 1053: The service did not respond to the start or control request in a timely fashion.
If I run NServiceBus.Host explicitly (where the windows service is installed) I am presented with relevant messages indicating a successful "spinning up" of the end point, and, in fact, I can see subscription message(s) are persisted into a relevant private MSMQ queue and the exe then sits and waits, like a good server should, for something to happen upon it.
If I start the windows service (hosting the endpoint) there are no exceptions or events in the event viewer, or entries in the log file, to indicate any errors or give me reason to believe something bad is happening. If I look in the log file and queue I can see that subscription messages are indicated as dispatched; in effect, the same behavior as running it standalone, with the only difference being that the service won't start.
EDIT:
The windows service is provided by the NServiceBus framework in the form of a generic host, and therefore implementation of the various required windows service methods is not something I have control of, which you would normally have if you were creating the windows service yourself.
The most common reason that I've found for this is down to logging.
The user account running the service must have Performance Monitoring Access.
I add this through Server Manager > Users & Groups > Groups > Performance Log Users > Add.
I've recently started hosting a side project of mine on the new Azure VMs. The app uses Redis as an in-memory cache. Everything was working fine in my local environment but now that I've moved the code to Azure I'm seeing some weird exceptions coming out of Booksleeve.
When the app first fires up everything works fine. However, after about 5-10 minutes of inactivity the next request to the app experiences a network exception (I'm at work right now and don't have the exact error messages on me, so I will post them when I get home if people think they're germane to the discussion) This causes the internal MessageQueue to close, which results in every subsequent Enqueue() throwing an exception ("The Queue Is Closed").
So after some googling I found this SO post: Maintaining an open Redis connection using BookSleeve about a DIY connection manager. I can certainly implement something similar if that's the best course of action.
So, questions:
Is it normal for the RedisConnection to close periodically after a certain amount of time?
I've seen the conn.SetKeepAlive() method but I've tried many different values and none seem to make a difference. Is there more to this or am I barking up the wrong tree?
Is the connection manager idea from the post above the best way to handle this scenario?
Can anyone shed any additional light on why hosting my Redis instance in a new Azure VM causes this issue? I can also confirm that if I run my local environment against the Azure Redis VM I experience this issue.
Like I said, if it's unusual for a Redis connection to die after inactivity, I will post the stack traces and exceptions from my logs when I get home.
Thanks!
UPDATE
Didier pointed out in the comments that this may be related to the load balancer that Azure uses: http://blogs.msdn.com/b/avkashchauhan/archive/2011/11/12/windows-azure-load-balancer-timeout-details.aspx
Assuming that's the case, what would be the best way to implement a connection manager that could account for this goofy problem? I assume I shouldn't create a connection per unit of work, right?
From other answers/comments, it sounds like this is caused by the azure infrastructure shutting down sockets that look idle. You could simply have a timer somewhere that performs some kind of operation periodically, but note that this is already built into Booksleeve: when it connects, it checks what the redis connection timeout is, and configures a heartbeat to prevent redis from closing the socket. You might be able to piggy-back this to prevent azure closing the socket too. For example, in a redis-cli session:
config set timeout 30
should configure redis (on the fly, without having to restart) to have a 30 second connection timeout. Booksleeve should then automatically take steps to ensure that there is a heartbeat shortly before 30 seconds. Note that if this is successful, you should also edit your configuration file so that this setting applies after the next restart too.
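If you do end up needing the "timer somewhere" variant rather than the built-in heartbeat, it could be sketched roughly as follows. Heartbeat is a hypothetical helper, not part of Booksleeve, and the ping delegate stands in for whatever cheap command you choose to send on your connection.

```csharp
using System;
using System.Threading;

// Hypothetical keep-alive helper: calls 'ping' on a fixed interval so the
// connection never looks idle to anything sitting between client and server.
public sealed class Heartbeat : IDisposable
{
    private readonly Timer timer;
    private readonly Action ping;
    private readonly Action<Exception> onError;

    public Heartbeat(Action ping, TimeSpan interval, Action<Exception> onError)
    {
        if (ping == null) throw new ArgumentNullException("ping");
        this.ping = ping;
        this.onError = onError;
        // First tick after 'interval', then repeat every 'interval'.
        timer = new Timer(Tick, null, interval, interval);
    }

    private void Tick(object state)
    {
        try
        {
            ping();
        }
        catch (Exception ex)
        {
            // A failed ping usually means the socket is already gone;
            // hand the exception to the caller so it can reconnect.
            if (onError != null) onError(ex);
        }
    }

    // Stops the heartbeat.
    public void Dispose()
    {
        timer.Dispose();
    }
}
```

The interval would need to be comfortably shorter than whatever idle timeout applies in your environment. Reconnecting is left to the onError callback, since how to rebuild the connection depends on how your application manages it.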
The Load Balancer in Windows Azure will close the connection after X amount of time, depending on the total connection load on the load balancer, and because of this you will get seemingly random timeouts on your connection.
As I am not very familiar with Redis connections, I am unable to suggest how to implement this correctly; in general, however, the suggested workaround is to have a heartbeat pulse to keep your session alive. Have you had a chance to look at the workaround suggested in the blog and try implementing it for Redis, to see if that works out for you?