I have a windows c# program (.net 3.5) that runs in the background on a real time controller (Windows 7 Embedded).
After a few days the process apparantly terminates.
Now I am thoroughly logging the actions and exceptions of my program.
Everything that can throw an exception is surrounded by try/catch and is being writting to a log file, and if the logger has an exception that is written to the windows event log.
But there is nothing in the logs that hints to an exception that could cause a program termination.
Are there other techniques to catch and log program termination?
So far I couldn't reproduce the bug in an online debug session.
The program itself is very simple.
It uses two stopwatches and spawns a thread to communicate with a real time program. It writes an 'alive' boolean, an 'error' integer and reads another integer. The log file size is limited to 10 MBytes. If the size is reached, the file is renamed with a date and a new file is opened. Files older than 30days will be deleted.
I have checked the windows event logs and couldn't find anything that directly relates to the program.
What would you do?
I could my answer my own question.
Unfortunately I left out a detail in my question which I deemed irrelevant.
The application is started by the windows task scheduler at system boot before any user logs in. There is a checkbox at the task properties that terminates the task if it takes longer than 3 days. I forgot to uncheck it :(.
Thank you for your contributions.
The best approach for you would be to consider the global exceptions. You can Catching the First Chance Exceptions which will catch every exception that occurs in your application which can be done as simply as adding the following piece of code:
AppDomain.CurrentDomain.FirstChanceException += (sender, eventArgs) =>
{
Debug.WriteLine(eventArgs.Exception.ToString());
};
Also for more details about handling the global exception you can refer to this really helpful article.
Related
I have an application (written in C# / ClickOnce) which, for the most part, works fine; it has no memory leaks and runs reliably and is stable for days at a time.
However, it also utilises MEF (so plugins/extensions can be dynamically added to the core assembly). Again, this 'works' currently, but if an exception/fatal error occurs in an externally linked assembly/plugin, it will crash the entire application.
After some recent testing, I found that the application had crashed after around 14 hours of [successful] operation.
With that in mind, my question is really two-fold:
a) is it possible to catch any unhanded exception a plugin (or the main application) may throw, so it can at least output info for debugging assistance?
b) I can't be sure if it was the plugin or the main application which failed. Therefore, I cannot think where to start debugging/tracing the issue. How does one go about finding a bug which only occurs after such a long period of time?
Thanks for any input.
As noted in the comments (I guess I should have read those before I started typing up an answer...)
When an application throws an exception that is unhandled, it fires the UnhandledException event just before death and you can subscribe to that in order to log any details that will help you figure out what happened.
See this SO post for an example on how to use it (includes ThreadException).
I need a little advice.
I've got windows service that runs at night. In my development environment, it runs without exception, but when I running it "installed on other machines", when I come in the morning, I'm welcomed with a System.Overflow exception that says that I've set an int32 to a value that is too large or small.
I've carefully combed the service's c# code, and I have try/catch statements around everything, that should catch any error and write it to a log without completely stopping my service with this overflow exception. But still, it occurs and stops the service.
I'd appreciate any conceptual advice on how to pin point what's causing an error such as this.
The problem with adding try/catch blocks throughout your code is it's easy to miss a place and hence not log the exception. A much more robust approach is to use the value AppDomain.UnhandledException. This will fire for any exception which is not handled within the current AppDomain. You can hook into this to write to your log file and hopefully get a better idea at where the error is coming form
AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;
...
void OnUnhandledException(object sender, UnhandledExceptionEventArgs e) {
// Log the exception here
}
You must have missed an instance of try/catch or you have an error in your error handler or it would be handled.
In windows you can setup your service to restart if it ends unexpected manner.
First, let me tell you that overflow/underflow can be disabled, by opening project properties, selecting "Build" (from the left menu), and then "Advanced...".
By the way, I do it all the time, because it adds overhead, which I don't want in my scientific applications. Manually, then I control things.
But that does not solve the issues... It just hides the "problem", if it is a problem.
If you only care to check about a zero/non zero condition, somewhere in your code, disabling overflow at project settings solves the problem, because even when an overflow occurs you get a nonzero value, and your service continues without problems.
Also, for WinForms applications:
http://msdn.microsoft.com/en-us/library/system.windows.forms.application.threadexception.aspx
And WPF:
http://msdn.microsoft.com/en-us/library/system.windows.application.dispatcherunhandledexception.aspx
To get visibility of exceptions on UI threads, since they're not forwarded to AppDomain.CurrentDomain.UnhandledException.
I am working on a tool that monitors a number of applications and ensures they are always running and in a clean state. Some of these applications have unhandled exceptions which do occur periodically and present the 'send crash report' window. I do not have the source code to these applications.
Is there any mechanism I could use to catch the exceptions, or simply identify their exception type, as well as identify the application's main executable file that threw the exception.
I'm not trying to do anything crazy like catch and handle it on the applications behalf, I'm simply trying to capture the exception type, log it and then restart the application.
Trapping unhandled exceptions requires calling SetUnhandledExceptionFilter() in the process. That's going to be difficult to do if you don't have source code, although it is technically possible by injecting a DLL into the process. This however cannot be done with managed code, you can't get the CLR initialized properly.
The default unhandled exception handler that Windows installs will always run WerFault.exe, the Windows Error Reporting tool. That can be turned off but that's a system setting. Expecting your user or admin to do this is not realistic. Only after WER runs will the JIT debugger get a shot at it.
I recommend a simpler approach, one that's also much more selective. Use the Process class to get the program you're interested in protecting started. When the Exited event fires, use the ExitCode property to find out how it terminated. Any negative value is a sure sign that the process died on an unhandled exception, the exit code matches the exception code. You can use the EventLog class to read the event message that WER writes to the Windows event log. And you can restart it the same way you got it started.
Without modifying the source of the application or injecting a DLL into the process I do not believe this is possible in a reliable fashion. What you're attempting to do is inspect type information across a process boundary. This is not easy to achieve for a number of reasons
Getting the name of the exception implies executing code in the target process. The name can be generated a number of ways (.ToString, or .GetType().Name) but all involve executing some method
You want to do this after the application has already crashed and hence may be in a very bad state. Consider trying to execute code if memory was corrupted (perhaps even corrupting the in memory definitions of the type).
The crash could occur in native code and not involve any managed data
If you want to monitor application crashes system wide, you can register yourself as a just-in-time debugger. You can edit the registry to specify which debugger to run when an application crashes. The example they give is Doctor Watson, but it could be your application instead.
there is this check-in-out program here at my workplace, it only takes the data from check-in-out machine and store it in our database, but suddenly out of nowhere started to report an error on Thursdays but only once at a random time during the day, so when I detect the error, I run the program but nothing happens, so I want to debug it every 5-10 mins to see if I catch the error to see what is happening, how can I do this?
Logging is your friend. Add lots of logging (either use the built-in Trace logging or use some framework such as log4net). Use the different log levels to control how much logging you get out. At verbose levels you can for instance log when you enter and exit important methods, log the input arguments and return values. Log in catch blocks and so on. Then analyse the log files after the next error is reported.
What kind of error logging are you currently implementing in this application? If none, would you consider adding in comprehensive application logging, such as the Log4Net tool? Or if this is a web application the ELMAH tool?
This way you can log every error that happens along with its details, like stack trace to track down the problem.
Some thoughts:
Check the server event log to see if there are any crash minidumps you can pull out. These can tell you a lot about what happened when the program crashed (call stack, etc).
Or write a wrapper program that can run your program and detect when it fails, then take a snapshot of the server's state at that moment so you can (hopefully) re-execute the task with the necessary data to get a repeatable crash in your debugger.
Or just add loads of logging. You could use PostSharp to add trace that tells you every method that you enter and exit, so you can easily determine which method was running when it failed.
And you can add robustification code. Check religiously for nulls, etc. and you may well find you've corrected the problem without necessarily knowing which fix fixed it.
And if the program's not too big, just being old fashioned and desk checking (reading a print-out of) the code may well turn up some bugs.
Another approach (getting a bit more experimental) might be to modify the program to run continuously so you can stick it in a debugger and leave it till you hit an exception. (run a loop, wait for a trigger file to be refreshed or something, and then kick off the normal process - about 5 lines of code would probably suffice)
So I am sold on the concept of attempting to collect data automatically from a program - i.e., popping up a dialog box that asks the user to send the report when something goes wrong.
I'm working in MS Visual Studio C#.
From an implementation point of view, does it make sense to put a try/catch loop in my main program.cs file, around where the application is run? Like this:
try
{
Application.Run(new myMainForm());
}
catch (Exception ex)
{
//the code to build the report I want to send and to
//pop up the Problem Report form and ask the user to send
}
or does it make sense to put try/catch loops throughout pieces of the code to catch more specific exception types? (I'm thinking not because this is a new application, and putting in more specific exception catches means I have an idea of what's going to go wrong... I don't, which is why the above seems to make sense to me.)
-Adeena
I think you are right, you would not know what's going to go wrong, which is the point.
However, you might as well consider adding a handler to the ThreadException event instead.
The code above will work but there will be scenarios where multi threading might be an issue with such code, since not all code inside your windows forms program will be running in the main Application.Run loop thread.
Here's a sample code from the linked article:
[STAThread]
static void Main()
{
System.Windows.Forms.Application.ThreadException += new ThreadExceptionEventHandler(ReportError);
System.Windows.Forms.Application.Run(new MainForm());
}
private static void ReportError(object sender, ThreadExceptionEventArgs e)
{
using (ReportErrorDialog errorDlg = new ReportErrorDialog(e.Exception))
{
errorDlg.ShowDialog();
}
}
More documentation on MSDN.
On a minor point, using the ThreadException event also allow your main message loop to continue running in case the exception isn't fatal (i.e. fault-tolerance scenarios) whilst the try/catch approach may requires that you restart the main message loop which could cause side effects.
From an implementation point of view, does it make sense to put a try/catch loop in my main program.cs file, around where the application is run?
Sure and always.
You should use Try/Catch-Blocks wherever you do something critical, that might raise an exception.
Therefore you can not really stick to a pattern for that, because you should now, when what exception will be raised. Otherwise these are unhandled exceptions, which let your program crash.
But there are many exceptions, that need not to stop the application entirely, exceptions, that just can be swallowed over, as they are expected and do not critically need the application to stop. Example for that are UnauthorizedAccessExceptions when moving or accessing data with your programm.
You should try to keep you Try/Catch-Blocks as small as needed, as well as to use not too much of them, because of performance.
Some out there use Try/Catch for steering the program's execution. That should entirely be avoided wherever possible, cause raising an Exception is performance killer number 1.
Wrapping a try catch around the whole application will mean the application will exit upon error.
While using a try and catch around each method is hard to maintain.
Best practice is to use specific try catches around units of code that will throw specific exception types such as FormatException and leave the general exception handling to the application level event handlers.
try
{
//Code that could error here
}
catch (FormatException ex)
{
//Code to tell user of their error
//all other errors will be handled
//by the global error handler
}
Experience will tell you the type of things that could go wrong. Over time you will notice your app often throwing say IO exceptions around file access so you may then later catch these and give the user more information.
The global handlers for errors will catch everything else. You use these by hooking up event handlers to the two events System.Windows.Forms.Application.ThreadException (see MSDN) and AppDomain.UnhandledException (see MSDN)
Be aware that Out of Memory exceptions and StackOverflowException may not be caught by any error catching.
If you do want to automatically get stack traces, Microsoft do allow you to submit them via the Error Reporting Services. All you need to do is sign up for a Digital Certificate from VeriSign and register it (for free) with Microsoft.
Microsoft then gives you a login for you to download the mini-dumps from a website, which are submitted when a user clicks "Send Error Report".
Although people can hit "Don't Send", at least it is a Microsoft dialog box and possibly not one that you have to code yourself. It would be running 24/7, you wouldn't have to worry about uptime of your web server AND you can submit workaround details for users AND you can deliver updates via Windows Update.
Information about this service is located in this "Windows Error Reporting: Getting Started" article.
The best approach is to sing up for
AppDomain.UnhandledException and Application.ThreadException In your application's main function. This will allow you to record any unhandled exceptions in your application. Wraping run in a try catch block does not catch every thing.
if you just want to capture crashes, then ignore all errors and let DrWatson generate a minidump for you. Then you can look at that ina debugger (windbg is preferred for minidumps) and it'll show you the line your code went wrong on, and also all the parameters, stack trace, and registers. You can set Drwatson to generate a full dump where you'll get an entire memory core dump to investigate.
I wouldn't recommend putting a try/catch around the entire app unless you want your app to never "crash" in front of the user - it'll always be handled, and probably ignored as there's nothing you can do with the exception at that point.
Sending the minidump to you is another matter though, here's an article, you'll have to do some work to get it sent via email/http/ftp/etc.