Debugging application behavior during IIS App pool recycle

I have a web service written in C#.
It behaves rather strangely during pool recycling.
If I configure a pool with 5 worker processes that should each recycle after, say, 100 requests (in production it's actually 10000, but never mind that), I get the proper response for the first 100 requests per process (i.e. 500 requests), but after that some of the requests return an improper result (I also get timeouts, but that is okay as the process is recycling).
Since these improper results seem to happen AFTER the recycle, while the service is starting up, it is kind of hard to just attach the debugger and see what happens (as the debugger is detached when the recycle occurs).
So my question(s) is/are:
1. Does anybody know a good method for debugging this kind of thing?
Edit: 2. Does anybody happen to have an idea of what might be wrong (the service has no state information between requests)? - I found the error by attaching the debugger and luckily seeing an exception (caught in a global exception handler - god, I hate those). But question 1 still stands: is there an easier way than attaching the debugger and hoping you make it in time to see the error?

You should make it clear what the improper result is. If it is not a .NET error, you should review your code and add some application-level logging of your own.
A debugger can only help when you have nothing else to resort to.

What I have ended up doing (for now) is to remove most of those "semi-global" try/catch/do-nothing handlers and then write a SoapExtension for handling "Unhandled Exceptions" that dumps out all the information I can get hold of.
I got most of the inspiration from Jeff Atwood's article on CodeProject: http://www.codeproject.com/kb/aspnet/ASPNETExceptionHandling.aspx
It's not really the same as attaching the debugger, but it will have to do for now.
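A minimal sketch of what such an extension could look like, not the poster's actual code. The class name ExceptionLoggingExtension and the Describe helper are hypothetical, and it assumes a classic ASMX service with a reference to System.Web.Services and a trace listener configured to receive the output. The process id is included so that entries from different worker processes can be told apart across recycles.

```csharp
using System;
using System.Diagnostics;
using System.Web.Services.Protocols;

// Hypothetical extension: logs any exception that escapes a web method.
public class ExceptionLoggingExtension : SoapExtension
{
    // No per-service or per-method state is needed for this sketch.
    public override object GetInitializer(Type serviceType) { return null; }

    public override object GetInitializer(LogicalMethodInfo methodInfo, SoapExtensionAttribute attribute) { return null; }

    public override void Initialize(object initializer) { }

    public override void ProcessMessage(SoapMessage message)
    {
        // Once the response (or fault) has been serialized, message.Exception
        // is non-null if the web method threw.
        if (message.Stage == SoapMessageStage.AfterSerialize && message.Exception != null)
        {
            Trace.TraceError(Describe(message.MethodInfo.Name, message.Exception));
        }
    }

    // Builds one log entry: timestamp, worker process id, method name, full exception text.
    public static string Describe(string method, Exception ex)
    {
        int pid = Process.GetCurrentProcess().Id;
        return string.Format("{0:o} pid={1} method={2}{3}{4}",
            DateTime.UtcNow, pid, method, Environment.NewLine, ex);
    }
}
```

The extension only runs once it is registered, for example under soapExtensionTypes in web.config. Because it logs from inside the worker process, it keeps working through a recycle, which an attached debugger does not.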

Related

How to identify internal failures in a Windows service

We use a lot of custom Windows services in our applications. However, the one I'm currently working on has an infuriating problem: while the service keeps running, it simply stops functioning.
The Main method of the service is wrapped in a try/catch block, like this:
static void Main()
{
    IRepository rep = new Repository();
    // GetType() is an instance method and cannot be called from a static Main;
    // typeof(Program) assumes the containing class is named Program.
    ILogger log = LogManager.GetLogger(typeof(Program).Name);
    TimeSpan loadWindowStart = new TimeSpan(9, 0, 0);
    TimeSpan loadWindowEnd = new TimeSpan(18, 0, 0);
    foreach (SuppressionLoad sl in rep.GetSuppressionLoads().ToList())
    {
        try
        {
            // do stuff
        }
        catch (Exception ex)
        {
            // log error
        }
    }
}
The service also logs as it does stuff, and we can watch the logs fill up while it's busy.
Sometimes, however, the logs just stop. And activity elsewhere in the database suggests the entire service has stopped working. Checking in Services on the server, the service still shows a Status of "Started". It takes up almost zero system resources while it's in this state, although it's normally quite processor intensive. If you try and stop it, it just times out trying and, as far as we can tell, it never stops of its own accord. The process has to be killed in Task Manager.
There is nothing untoward in the log in the run up to these stalls. There is also nothing we can find in Event Viewer.
Since it doesn't log an error, I'm at a loss as to what's going on here, or what we can do to try and diagnose the fault from here. It's highly intermittent - it will often run for several days without problem before entering the state. What can we do to investigate what's going on?

Matt, obscure problems such as these are difficult to find in the best of conditions - if your service happens to use threads (which I assume it does), it becomes tremendously more difficult and you can't rely on a global try/catch.
A simple thing to try would be NBug (no association). It will catch unhandled exceptions and give you some info about them. I don't think it will get you enough though.
The general way to find these sorts of things is log, log, log. You have to be able to come as close to recreating the problem as possible - you need logs that tell you the entry points into each method, the variable values, exception stack traces if hit, how long you spent in each method, etc. There are some really good logging tools out there, so I won't bother recommending any. You can wrap your logging in a conditional compile switch so that once you find your issue you won't suffer a performance hit when you turn it off.
Probably not the answer you wanted, but the only thing that has really worked for me over the years.
SteveJ
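As a rough illustration of the conditional compile switch mentioned above, the sketch below uses System.Diagnostics.ConditionalAttribute. The symbol VERBOSE_TRACE, the TraceLog class and the ProcessItem method are made-up names, and Console stands in for whatever logging library is actually in use.

```csharp
#define VERBOSE_TRACE // remove this line (or define the symbol per build configuration) to compile the calls out

using System;
using System.Diagnostics;

public static class TraceLog
{
    // Calls to this method, including evaluation of their arguments, are omitted
    // by the compiler unless VERBOSE_TRACE is defined where the call is compiled.
    [Conditional("VERBOSE_TRACE")]
    public static void Write(string message)
    {
        Console.WriteLine("{0:o} [thread {1}] {2}",
            DateTime.UtcNow, Environment.CurrentManagedThreadId, message);
    }
}

public static class Program
{
    // Hypothetical unit of work, logging entry, exit, duration and any exception.
    public static int ProcessItem(int id)
    {
        TraceLog.Write("ENTER ProcessItem id=" + id);
        Stopwatch timer = Stopwatch.StartNew();
        try
        {
            return id * 2; // placeholder for the real work
        }
        catch (Exception ex)
        {
            TraceLog.Write("ERROR ProcessItem " + ex);
            throw;
        }
        finally
        {
            TraceLog.Write("EXIT ProcessItem after " + timer.ElapsedMilliseconds + " ms");
        }
    }

    public static void Main()
    {
        ProcessItem(21);
    }
}
```

If the last line in the log is an ENTER with no matching EXIT, that method is where the service stalled.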

It sounds like the issue could be anywhere and doesn't necessarily have much to do with code provided.
Suggestions on how to go about it
When the service hangs, attach a debugger, take a look at the threads and see where each one is. You may need to rebuild and run a debug version of your solution so that the debugger has the necessary symbol data. Questions to ask:
Are all the threads that I'm expecting to be there actually there, or are some gone or unaccounted for?
Are threads stuck in a deadlock (I suspect that's what's happening), and if so, on what resources?
Turn on detailed logging and sprinkle in more debug log statements to isolate where in the code flow it last was and where it didn't make it to, and then keep narrowing down the location. Consider logging contextual data so that when you isolate the problematic line or code block, you have context to try to understand why the odd behavior takes place. Just be mindful of logging sensitive information (e.g. passwords, PII, etc.).
With full credit to IInspectable's comment, you can try to take a full dump of the process (Sysinternals' Process Explorer or ProcDump lets you do that, as does Task Manager). Using the tool tends to be quite an involved experience, but used right it can give a lot of insight, and possibly find the issue on first occurrence.
Considering that it happens infrequently, and the field of what and where is wide open, it'll likely take a few iterations of having the problem trigger in order to narrow down the scope.
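If a deadlock is the suspicion, one way to make it show up in the logs is to temporarily replace plain lock statements around the suspect resources with a timed acquire that reports when it gives up. This is a different technique from attaching a debugger or taking a dump, and the sketch below is only an illustration: GuardedLock, TryRun and the resource name are hypothetical.

```csharp
using System;
using System.Threading;

public static class GuardedLock
{
    // Runs body under the lock if it can be acquired within the timeout;
    // otherwise logs which thread was blocked on which resource and returns false.
    public static bool TryRun(object gate, TimeSpan timeout, string resourceName, Action body)
    {
        bool taken = false;
        try
        {
            Monitor.TryEnter(gate, timeout, ref taken);
            if (!taken)
            {
                Console.Error.WriteLine("{0:o} thread {1} could not acquire '{2}' within {3}",
                    DateTime.UtcNow, Environment.CurrentManagedThreadId, resourceName, timeout);
                return false;
            }
            body();
            return true;
        }
        finally
        {
            if (taken) Monitor.Exit(gate);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        object gate = new object();
        using (var held = new ManualResetEventSlim(false))
        using (var release = new ManualResetEventSlim(false))
        {
            // This thread takes the lock and keeps it until told to let go.
            var holder = new Thread(() =>
            {
                lock (gate) { held.Set(); release.Wait(); }
            });
            holder.Start();
            held.Wait();

            // The main thread times out instead of blocking forever.
            bool ran = GuardedLock.TryRun(gate, TimeSpan.FromMilliseconds(100), "gate", () => { });
            Console.WriteLine("Acquired while held: " + ran);

            release.Set();
            holder.Join();
        }
    }
}
```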

IIS7 stops working after 5 requests

Here is my problem:
I have just been brought onto a massive asp.net C# project and I've been charged with fixing some performance issues (not my area of expertise). More specifically after 5 - 7 redirects/ajax calls the web server stops responding and the whole page (and eventually the browser) freezes.
I don't think this is a coding issue as I've set up break points in a few pages (Page_Load method) and after the 5 requests it does not even reach the break points.
I don't believe this is related to this issue as I've increased the browser's maximum connections per server parameter and I got the same behavior. Furthermore, after these 5 requests in one browser (IE), the application stops working in FF as well.
This is not a resource issue as the w3wp.exe process never exceeds 500MB memory.
One thing I've noticed when using Fiddler and other tools to monitor the requests is that the server takes a very long time when loading image files (png, jpg). I don't know if this is relevant.
I've enabled failed request tracing on the server and the only thing I've noticed is that some requests fail with a 401 error even though I've set Anonymous Authentication to enabled.
Here is the exact message
MODULE_SET_RESPONSE_ERROR_STATUS
ModuleName ManagedPipelineHandler
Notification 128
HttpStatus 401
HttpReason Unauthorized
HttpSubStatus 0
ErrorCode 0
ConfigExceptionInfo
Notification EXECUTE_REQUEST_HANDLER
ErrorCode The operation completed successfully. (0x0)
This message is sometimes thrown with ModuleName: ScriptModule
I have already wasted 2 days on this thing and I'm running out of ideas so any suggestions would be appreciated.

Like any large generic problem, your best bet in diagnosing the issue is to figure out how to break it down into smaller parts, how to form hypotheses about the cause, and how to validate or invalidate those hypotheses. My first inclination would be to hypothesize that the server-side processes in this particular case are taking a long time, causing your client requests to block and making the whole thing seem frozen.
From there, I would attempt to replicate the long running server side processes by creating isolated client side tests - perhaps if the URLs are HTTP gets, I would test the same URLs individually. If they were HTTP posts, I'd create an isolated test form if feasible to see what happens with each request. If a long running server side process is found then you have a starting point.
If there are no long running server side processes then it may be JavaScript / client side coding issues that need to be looked into. But definitely when you're working on a large, unfamiliar project, your best bet is to figure out how to break down the issue into smaller components that can then be tested.

I solved the issue finally. Here is what I did:
Experimented with IIS settings and App_Pool recycling and noticed that there is nothing wrong with the way it handles requests that actually reach it.
I focused on the Http.sys module and noticed that in the log files there were a lot of Timer_ConnectionIdle and Client_Reset errors.
After some more experimentation and a lot of Google searches, I accidentally found this answer and it solved my issue. As the answer suggests the problem was caused by the AVG antivirus installed and incorrectly configured on the server.
Thanks for all the help and suggestions.

If it's ajax calls that are causing your browser to freeze, make sure they are not blocking ajax calls.

Just appending to Shan's answer, which is a good one.
First off, there is obviously a code issue as this is by no means 'normal' behavior for IIS.
That said, you must isolate it as Shan indicated. For example, given that the server itself no longer accepts connections, we can pretty well eliminate JavaScript as the source of the problem and relegate it to being just a symptom.
Typically when a worker process spins into space like this it is due to either an infinite loop or an issue where multiple threads are trying to lock the same resource. I bet if you let it run long enough IIS itself will timeout, kill and restart the process.
With that in mind you want to look for any type of multithreaded garbage (which I highly recommend you don't do in a web server) or for anything that indicates a tight infinite loop. A loop is going to become apparent if you execute the requests individually. A multi-threaded issue will only show up if you happen to get a collision.
Run various performance counters on the web server. Also, once it locks up, let it sit that way for a while. Once IIS performs its own reset on the worker process, go look for indicators in the event log.

Service call works in main thread, but crashes when multithreaded

My company has an application that keeps track of information related to web sites that are hosted on various machines. A central server runs a windows service that gets a list of sites to check, and then queries a service running on those target sites to get a response that can be used to update the local data.
My task has been to apply multithreading to this process to reduce the time it takes to run through all the sites (almost 3000 sites that take about 8 hours to run sequentially). The service runs through successfully when it's not multithreaded, but the moment I spread out the work to multiple threads (testing with 3 right now, plus a watcher thread) there's a bizarre crash that seems to originate from the call to the remote services that are supposed to provide the data. It's a SOAP/XML call.
When run on the test server, the service just gives up and doesn't complete its task, but doesn't stop running. When run through the debugger (Dev Studio 2010) the whole thing just stops. I'll run it, and seconds later it'll stop debugging, but not because it completed. It does not throw an exception or give me any kind of message. With breakpoints I can walk through to the point where it just stops. Event logging leads me to the same spot. It stops on the line of code that tries to get a response from the web service on the other sites. And again: it only does that when multithreaded.
I found some information that suggested there's a limit to the number of connections that defaults to 2. The proposed solution is to add some tags to the app.config, but that hasn't solved the problem...
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="20"/>
  </connectionManagement>
</system.net>
I still think it might be related to the number of allowed connections, but I have been unable to find much information about it online. Is there something straightforward I'm missing? Any help would be much appreciated.
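One way to check whether the config setting is actually being picked up is to read the limit back at runtime, or to set it in code before the first request is made. The sketch below assumes the .NET Framework and an HttpWebRequest-based SOAP proxy; the URL is a placeholder, not one of the real target sites.

```csharp
using System;
using System.Net;

public static class Program
{
    public static void Main()
    {
        // What the process currently uses for new connections.
        Console.WriteLine("Default limit before: " + ServicePointManager.DefaultConnectionLimit);

        // Programmatic counterpart of the maxconnection setting; it only affects
        // ServicePoint objects created after this line runs.
        ServicePointManager.DefaultConnectionLimit = 20;

        // Placeholder address: inspect the limit that applies to one target host.
        ServicePoint point = ServicePointManager.FindServicePoint(new Uri("http://example.com/service.asmx"));
        Console.WriteLine("Limit for this host: " + point.ConnectionLimit);
    }
}
```

If the value printed for a host is still 2 after the config change, the app.config in use at runtime is probably not the one that was edited.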

No crash, however bizarre, will escape the stack dump. Try going through that dump and see if it points to some obvious function.
Are you using some third-party tool or some other component for the actual service call? If yes, then please check the documentation, or contact the person who wrote it, to confirm that the components are thread safe. If they are not, you have a large task ahead. :) (I have worked on DBs which are not thread safe, so trust me, it is not very uncommon to find a few global static variables thrown around.)
Lastly, if you are 100% sure that this is due to multiple threads, then put a lock in your worker thread. Initially, say, it covers the entire main-while-loop. Theoretically it should not crash now, because even though it is multi-threaded, you have serialized the execution.
The next step is to reduce the scope of the lock. Say there are three functions in the main-while-loop, f1(), f2() and f3(); then start by locking f2() and f3() while leaving f1() unlocked. If things work out, then the problem is somewhere in f2() or f3().
I hope you got the idea of what I am suggesting.
I know this is like a blind man guessing at an elephant, but that is the best you can do if your code uses a LOT of external components which are not adequately documented.
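The lock-narrowing idea can be sketched roughly as follows. F1, F2 and F3 are empty stand-ins for the real steps of the loop, and the thread and site counts are arbitrary; none of these names come from the original service.

```csharp
using System;
using System.Threading;

public static class Worker
{
    private static readonly object Gate = new object();
    public static int Completed;

    // Stand-ins for the real work done per site.
    static void F1(int site) { }
    static void F2(int site) { }
    static void F3(int site) { Interlocked.Increment(ref Completed); }

    // Step 1: the whole body is under one lock, so execution is effectively serialized.
    public static void ProcessSerialized(int site)
    {
        lock (Gate) { F1(site); F2(site); F3(site); }
    }

    // Step 2: F1 runs concurrently; if the crash stays away, F1 is probably not
    // the culprit and the problem is somewhere in F2 or F3.
    public static void ProcessNarrowed(int site)
    {
        F1(site);
        lock (Gate) { F2(site); F3(site); }
    }

    // Splits the sites across threadCount threads and waits for all of them.
    public static void RunOnThreads(Action<int> process, int threadCount, int sitesPerThread)
    {
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int first = t * sitesPerThread; // copied so each closure sees its own value
            threads[t] = new Thread(() =>
            {
                for (int site = first; site < first + sitesPerThread; site++) process(site);
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
    }
}

public static class Program
{
    public static void Main()
    {
        Worker.RunOnThreads(Worker.ProcessSerialized, 3, 10);
        Worker.RunOnThreads(Worker.ProcessNarrowed, 3, 10);
        Console.WriteLine("Sites processed: " + Worker.Completed);
    }
}
```

Repeating the narrowing step, one function at a time, eventually leaves the lock around only the call that is not thread safe.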

IIS hosted web service method call randomly dies

We have an IIS hosted web method which is randomly dying on us about 10% of the time. In trying to debug this we've added Log.Debug() messages in front of every real code line and it appears to be dying on random lines.
Has anyone seen this or have an idea on how to debug this?
[Additional Details]
We've spent a lot of time looking at it and have discovered the following...
We have a separate self-hosted WCF Service that accesses the same database and lives on the same machine. When it is under heavy load the web method croaks every time. If it's not under load then things usually work fine (but not 100%).
High CPU doesn't seem to be part of the problem. We ran a small app that created a high cpu load and the web service did not die.
The web service dies when we either new up an XmlSerializer (without doing the sgen precomp) OR have NHibernate create a SessionFactory. The only two things these have in common are that they 1) seem like things people commonly do and 2) seem like they would be fairly intensive.
We've added a Global.asax to try to capture Application_End and Application_Error but neither event gets fired. This to me implies that we're not dealing with a normal application pool resetting?

Sounds like it might be a threading issue. You are using informative debug messages -- you should try to reproduce the issue while running the debugger and breaking on all exceptions. Make sure you check all the windows logs for information on why the app pool crashed.
Per comment: It's hard to say, but many things can cause a thread to appear to "just die." Memory issues: are you doing any interop? Improper marshaling: are you touching data on another thread? But I will play the probabilities and ask if you're sure you're handling any exception that might be happening and logging it. Are you sure you are not gobbling up an exception and not reporting it? Somewhere down low? Is this a permissions issue? Are you running in partial trust or on a low privilege user account?

Figured it out.. two problems really..
We added Global.asax but it didn't get copied over which explains why we weren't seeing any messages. We fixed this and found out that...
Our WCF log was being written out to the bin directory of the IIS Web Service. In retrospect this is kind of silly since the WS is an old school web service. The WCF stuff is in the same directory only for some reason that is unknown to us since the initial person who set things up is gone..
Lesson learned.. Somewhere there is a message that explains everything.. you just have to find it.

How to debug random crashes?

We have a .NET 2.0 desktop WinForms app and it seems to randomly crash. No stack trace, event log entry, or anything. It just disappears.
There are a few theories:
machine simply runs out of resources. some people have said you will always get a window handle exception or a gdi exception but others say it might simply cause crashes.
we are using wrappers around non managed code for 2 modules. exceptions inside either of these modules could cause this behavior.
Again, this is not reproducible, so I wanted to see if there were any suggestions on how to debug better, or anything I can put on the machine to "catch" the crash before it happens to help us understand what's going on.

Your best bet is to purchase John Robbins' book "Debugging Microsoft .NET 2.0 Applications". Your question can go WAY deeper than we have room to type here.

Sounds to me like you need to log first - maybe you can attach a logger to your methods with PostSharp (see Log4PostSharp). This will certainly slow you down a lot and produce tons of messages. But you should be able to narrow down the problematic code ... Add more logging there and remove it elsewhere. Maybe you can stress-test these parts later. If the suspect parts are small enough you might even do a code review there.
I know, your question was about debugging - but this could be an approach, too.

You could use Process Monitor from SysInternals (now a part of Microsoft) filtered down to just what you want to monitor. This would give you a good place to start. Just start with a tight focus or make sure you have plenty of space for the log file.

I agree with Boydski. But I also offer this suggestion. Take a close look at the threads if you're doing multi-threading. I had an error like this once that took a long long time to figure out and actually ended up with John Robbins on the phone helping out with it. It turned out to be improper exception handling with threads.

Run your app with its PDB files available, attach WinDbg and let it run.
When the crash occurs, WinDbg stops the app.
Execute this command to generate dump file :
.dump /ma c:\myapp.dmp
An auxiliary tool of analysis is ADPlus
Or try this :
Capturing user dumps using Performance alert
How to use ADPlus to troubleshoot "hangs" and "crashes"
Debugging on the Windows Platform

Do you have a try/catch block around the 'Application.Run' call that starts the GUI thread? If not, do add one and put some logging in for any exceptions thrown there.
You can also wire up the Application.ThreadException event and log that, in case that gives you some more hints.
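Put together, that suggestion could look roughly like the sketch below. It assumes a WinForms project; the log file name and the FormatEntry helper are made up, the empty Form stands in for the real main form, and the lambda syntax needs a newer compiler than the one that shipped with .NET 2.0. A crash inside unmanaged code may still bypass these handlers.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

public static class Program
{
    [STAThread]
    public static void Main()
    {
        // Route UI-thread exceptions to Application.ThreadException; this must be
        // set before the first window is created.
        Application.SetUnhandledExceptionMode(UnhandledExceptionMode.CatchException);
        Application.ThreadException += (sender, e) => Log("UI thread", e.Exception);

        // Last-chance notification for exceptions on any other thread.
        AppDomain.CurrentDomain.UnhandledException +=
            (sender, e) => Log("AppDomain", e.ExceptionObject as Exception);

        try
        {
            Application.Run(new Form()); // placeholder for the real main form
        }
        catch (Exception ex)
        {
            Log("Application.Run", ex);
            throw;
        }
    }

    // Hypothetical helper: one log entry with timestamp, source and full exception text.
    public static string FormatEntry(string source, Exception ex)
    {
        return string.Format("{0:o} [{1}] {2}{3}", DateTime.Now, source, ex, Environment.NewLine);
    }

    static void Log(string source, Exception ex)
    {
        // Made-up file name; written to the temp folder so no special permissions are needed.
        File.AppendAllText(Path.Combine(Path.GetTempPath(), "myapp-crash.log"), FormatEntry(source, ex));
    }
}
```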

you should use a global exception handler:
http://msdn.microsoft.com/en-us/library/system.windows.forms.application.threadexception.aspx

In the past I have seen this kind of behaviour primarily in relation to COM objects not being released and/or threading issues. Dive into the suggestions that you have gotten here already, but I would also suggest looking into whether you properly release the non-managed objects, so that they do not leak memory.
