There is a check-in/check-out program here at my workplace. It only takes the data from the check-in/check-out machine and stores it in our database. Suddenly, out of nowhere, it started to report an error on Thursdays, only once and at a random time during the day. When I detect the error and run the program, nothing happens. So I want to debug it every 5-10 minutes to see if I can catch the error and find out what is happening. How can I do this?
Logging is your friend. Add lots of logging (either use the built-in Trace logging or a framework such as log4net). Use the different log levels to control how much output you get. At verbose levels you can, for instance, log when you enter and exit important methods, along with their input arguments and return values. Log in catch blocks, and so on. Then analyse the log files after the next error is reported.
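A minimal sketch of that advice using only the built-in System.Diagnostics.TraceSource, with no third-party framework. Entry and exit are logged at Verbose level and failures at Error level, so lowering the switch level later keeps only the errors. The source name "CheckInOut" and the Import method are hypothetical placeholders, not part of the original program.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ImportLogging
{
    // Hypothetical source name; one TraceSource per component is typical.
    public static readonly TraceSource Log =
        new TraceSource("CheckInOut", SourceLevels.Verbose);

    // Hypothetical stand-in for the real import: counts the records and throws
    // on null input so that the catch-and-log path can be exercised.
    public static int Import(string[] records)
    {
        Log.TraceEvent(TraceEventType.Verbose, 0, "Enter Import(records={0})",
            records == null ? "null" : records.Length.ToString());
        try
        {
            if (records == null)
                throw new ArgumentNullException(nameof(records));

            int imported = records.Length;
            Log.TraceEvent(TraceEventType.Verbose, 0, "Exit Import -> {0}", imported);
            return imported;
        }
        catch (Exception ex)
        {
            // Log the whole exception (type, message, stack trace), then rethrow.
            Log.TraceEvent(TraceEventType.Error, 0, "Import failed: {0}", ex);
            throw;
        }
    }

    // Sends trace output to the given writer (a log file in production) and
    // sets how much of it is emitted.
    public static void AttachListener(TextWriter writer, SourceLevels level)
    {
        Log.Listeners.Add(new TextWriterTraceListener(writer));
        Log.Switch.Level = level; // Verbose while hunting the bug, Warning afterwards
    }
}
```

While hunting the Thursday error you would attach a file listener at Verbose; once it is found, switching to SourceLevels.Warning keeps the error logging without the noise. In .NET Framework the same switch can also be set in app.config, without recompiling.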
What kind of error logging does this application currently have? If none, would you consider adding comprehensive application logging with a tool such as log4net? Or, if this is a web application, ELMAH?
This way you can log every error that happens along with its details, such as the stack trace, to track down the problem.
Some thoughts:
Check the server event log to see if there are any crash minidumps you can pull out. These can tell you a lot about what happened when the program crashed (call stack, etc).
Or write a wrapper program that can run your program and detect when it fails, then take a snapshot of the server's state at that moment so you can (hopefully) re-execute the task with the necessary data to get a repeatable crash in your debugger.
Or just add loads of logging. You could use PostSharp to add tracing that records every method you enter and exit, so you can easily determine which method was running when it failed.
And you can add robustification code. Check religiously for nulls, etc., and you may well find you've corrected the problem without necessarily knowing which fix fixed it.
And if the program's not too big, just being old-fashioned and desk checking the code (reading a print-out of it) may well turn up some bugs.
Another approach (getting a bit more experimental) might be to modify the program to run continuously so you can stick it in a debugger and leave it till you hit an exception. (run a loop, wait for a trigger file to be refreshed or something, and then kick off the normal process - about 5 lines of code would probably suffice)
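A sketch of that loop, assuming a hypothetical trigger file that you (or a scheduled task) touch whenever you want a run; `job` stands in for the program's existing "read the machine, write the database" routine. Exceptions deliberately escape the loop so that a debugger attached to the long-running process stops at the first one, with the call stack and locals intact.

```csharp
using System;
using System.IO;
using System.Threading;

public static class TriggerLoop
{
    // True when the trigger file exists and was written after the last run.
    public static bool IsTriggered(string triggerPath, DateTime lastRunUtc)
    {
        return File.Exists(triggerPath)
            && File.GetLastWriteTimeUtc(triggerPath) > lastRunUtc;
    }

    // Runs the job each time the trigger file is touched, polling at the given
    // interval until cancelled. No try/catch here on purpose: under a debugger,
    // an exception thrown by the job breaks execution right where it happened.
    public static void Run(string triggerPath, Action job, TimeSpan pollInterval,
                           CancellationToken token)
    {
        DateTime lastRunUtc = DateTime.MinValue;
        while (!token.IsCancellationRequested)
        {
            if (IsTriggered(triggerPath, lastRunUtc))
            {
                lastRunUtc = File.GetLastWriteTimeUtc(triggerPath);
                job();
            }
            token.WaitHandle.WaitOne(pollInterval);
        }
    }
}
```

For the 5-10 minute idea in the question, a scheduled task that only updates the trigger file's timestamp would drive the loop while the debugger stays attached.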
Related
I have a Windows C# program (.NET 3.5) that runs in the background on a real-time controller (Windows 7 Embedded).
After a few days the process apparently terminates.
I am now thoroughly logging the actions and exceptions of my program.
Everything that can throw an exception is surrounded by try/catch and written to a log file, and if the logger itself throws an exception, that is written to the Windows event log.
But there is nothing in the logs that hints to an exception that could cause a program termination.
Are there other techniques to catch and log program termination?
So far I couldn't reproduce the bug in an online debug session.
The program itself is very simple.
It uses two stopwatches and spawns a thread to communicate with a real-time program. It writes an 'alive' boolean and an 'error' integer, and reads another integer. The log file size is limited to 10 MB. If the size is reached, the file is renamed with a date and a new file is opened. Files older than 30 days are deleted.
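The rotation policy described above (10 MB limit, rename with a date, delete files older than 30 days) could look roughly like this. The archive naming scheme is an assumption. Because this code path runs only rarely, an exception thrown here (for instance, File.Move failing while the log file is still open) would surface only after days of uptime, so it is worth testing on its own.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class LogRotation
{
    public const long MaxBytes = 10L * 1024 * 1024;               // 10 MB limit
    public static readonly TimeSpan MaxAge = TimeSpan.FromDays(30);

    // Hypothetical naming scheme: checkin.log -> checkin_20240105_134501.log.
    // InvariantCulture keeps the date Gregorian and ASCII on any machine.
    public static string ArchiveName(string logPath, DateTime now)
    {
        string dir = Path.GetDirectoryName(logPath) ?? "";
        string name = Path.GetFileNameWithoutExtension(logPath);
        string stamp = now.ToString("yyyyMMdd_HHmmss", CultureInfo.InvariantCulture);
        return Path.Combine(dir, name + "_" + stamp + Path.GetExtension(logPath));
    }

    // Renames the active log once it reaches the limit, then deletes archives
    // older than MaxAge. Call before opening the log for writing.
    public static void RotateIfNeeded(string logPath, DateTime now)
    {
        var info = new FileInfo(logPath);
        if (info.Exists && info.Length >= MaxBytes)
            File.Move(logPath, ArchiveName(logPath, now));

        string dir = Path.GetDirectoryName(Path.GetFullPath(logPath));
        string pattern = Path.GetFileNameWithoutExtension(logPath) + "_*" + Path.GetExtension(logPath);
        foreach (string archive in Directory.GetFiles(dir, pattern))
        {
            if (now - File.GetLastWriteTime(archive) > MaxAge)
                File.Delete(archive);
        }
    }
}
```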
I have checked the Windows event logs and couldn't find anything that directly relates to the program.
What would you do?
I was able to answer my own question.
Unfortunately, I left out a detail in my question that I deemed irrelevant.
The application is started by the Windows Task Scheduler at system boot, before any user logs in. The task properties have a checkbox that stops the task if it runs longer than 3 days. I forgot to uncheck it :(.
Thank you for your contributions.
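The best approach for you would be to consider the global exception events. One of them is the first-chance notification: AppDomain.FirstChanceException is raised for every exception thrown in your application, before any catch block runs and whether or not the exception is later handled. Note that this event was introduced in .NET Framework 4.0, so a .NET 3.5 program would have to be retargeted to use it. Subscribing is as simple as adding the following piece of code:
// Requires: using System; using System.Diagnostics;
AppDomain.CurrentDomain.FirstChanceException += (sender, eventArgs) =>
{
    // Trace.WriteLine rather than Debug.WriteLine: Debug calls are removed from Release builds.
    Trace.WriteLine(eventArgs.Exception.ToString());
};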
For more details about handling global exceptions, you can refer to this really helpful article.
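For the termination question itself, a complementary hook is AppDomain.CurrentDomain.UnhandledException, which exists since .NET 1.0 (so also on 3.5). It fires for an exception that no catch block handled, on any thread, just before the runtime tears the process down; ProcessExit fires on a normal shutdown. Neither fires when the process is killed from outside (Task Manager, TerminateProcess, or a scheduler time limit, as turned out to be the case here), so the absence of both log lines is itself a clue. The log path below is a hypothetical placeholder.

```csharp
using System;
using System.IO;

public static class TerminationLogger
{
    // Hypothetical location; point this at the file your logger already uses.
    public static string LogPath = Path.Combine(Path.GetTempPath(), "termination.log");

    public static void Install()
    {
        // Last notification before the runtime terminates the process.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Append("Unhandled exception (terminating=" + e.IsTerminating + "): " + e.ExceptionObject);

        // Raised on normal shutdown; NOT raised when the process is killed externally.
        AppDomain.CurrentDomain.ProcessExit += (sender, e) =>
            Append("Process exiting normally, exit code " + Environment.ExitCode);
    }

    public static void Append(string message)
    {
        // Write synchronously: the process may be gone a moment later.
        File.AppendAllText(LogPath, DateTime.Now.ToString("o") + " " + message + Environment.NewLine);
    }
}
```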
We use a lot of custom Windows services in our applications. However, the one I'm currently working on has an infuriating problem: while the service keeps running, it simply stops functioning.
The Main method of the service wraps the processing of each item in a try/catch block, like this:
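static void Main()
{
    IRepository rep = new Repository();
    // GetType() cannot be called from a static method; use the declaring type instead
    // (needs using System.Reflection; ToList() below needs using System.Linq).
    ILogger log = LogManager.GetLogger(MethodBase.GetCurrentMethod().DeclaringType.Name);
    TimeSpan loadWindowStart = new TimeSpan(9, 0, 0);
    TimeSpan loadWindowEnd = new TimeSpan(18, 0, 0);
    foreach (SuppressionLoad sl in rep.GetSuppressionLoads().ToList())
    {
        try
        {
            // do stuff
        }
        catch (Exception ex)
        {
            // log error
        }
    }
}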
The service also logs as it does stuff, and we can watch the logs fill up while it's busy.
Sometimes, however, the logs just stop, and activity elsewhere in the database suggests the entire service has stopped working. Checking in Services on the server, the service still shows a Status of "Started". It uses almost zero system resources while it's in this state, although it's normally quite processor-intensive. If you try to stop it, the stop request just times out and, as far as we can tell, it never stops of its own accord. The process has to be killed in Task Manager.
There is nothing untoward in the log in the run up to these stalls. There is also nothing we can find in Event Viewer.
Since it doesn't log an error, I'm at a loss as to what's going on here, or what we can do to try and diagnose the fault from here. It's highly intermittent - it will often run for several days without problem before entering the state. What can we do to investigate what's going on?
Matt, obscure problems such as these are difficult to find in the best of conditions. If your service happens to use threads (which I assume it does), it becomes tremendously more difficult, and you can't rely on a global try/catch, because an exception on a worker thread never reaches the try/catch in Main.
A simple thing to try would be NBug (no association). It will catch unhandled exceptions and give you some information about them. I don't think it will get you far enough, though.
The general way to find these sorts of things is log, log, log. You have to be able to come as close to recreating the problem as possible: you need logs that show your entry into each method, the variable values, exception stack traces if hit, how long you spent in each method, etc. There are some really good logging tools out there, so I won't bother recommending any. You can wrap your logging in a conditional compilation switch, so once you find your issue you won't suffer a performance hit when you turn it off.
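A sketch of entry/exit/elapsed-time logging behind a compile-time switch, assuming a hypothetical TRACE_CALLS symbol that you define only in the diagnostic build. CallScope and ImportJob.CountRecords are illustrative names. Because the using line sits inside #if TRACE_CALLS, a normal build contains no logging code at that call site at all.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Logs entry, exit and elapsed time for one method call.
public sealed class CallScope : IDisposable
{
    private readonly string _method;
    private readonly Action<string> _sink;
    private readonly Stopwatch _watch;

    private CallScope(string method, Action<string> sink)
    {
        _method = method;
        _sink = sink;
        _sink("Enter " + method);
        _watch = Stopwatch.StartNew();
    }

    // [CallerMemberName] fills in the calling method's name automatically.
    public static CallScope Enter(Action<string> sink, [CallerMemberName] string method = "")
    {
        return new CallScope(method, sink);
    }

    // Also runs when the method exits via an exception, so the time is still logged.
    public void Dispose()
    {
        _sink("Exit " + _method + " after " + _watch.ElapsedMilliseconds + " ms");
    }
}

public static class ImportJob
{
    // Hypothetical method showing the call-site pattern.
    public static int CountRecords(string[] records, Action<string> sink)
    {
#if TRACE_CALLS
        using (CallScope.Enter(sink))
#endif
        {
            return records.Length;
        }
    }
}
```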
Probably not the answer you wanted, but the only thing that has really worked for me over the years.
SteveJ
It sounds like the issue could be anywhere and doesn't necessarily have much to do with code provided.
Suggestions on how to go about it
When the service hangs, attach a debugger and look at the threads to see where each one is. You may need to rebuild and run a debug version of your solution so that the debugger has the necessary symbol data. Questions to ask:
Are all the threads I expect to be there actually there, or are some gone or unaccounted for?
Are threads stuck in a deadlock (I suspect that's what's happening), and if so, on what resources?
Turn on detailed logging and sprinkle in more debug log statements to isolate where in the code flow it last was and where it didn't make it to, and then keep narrowing down the location. Consider logging contextual data so that when you isolate the problematic line or code block, you have the context to understand why the odd behavior takes place. Just be careful not to log sensitive information (e.g. passwords, PII).
With full credit to IInspectable's comment, you can try taking a full dump of the process (Sysinternals' Process Explorer or ProcDump lets you do that, as does Task Manager). Analysing the dump tends to be quite an involved experience, but done right it can give a lot of insight and possibly find the issue on the first occurrence.
Considering that it happens infrequently, and the field of what and where is wide open, it'll likely take a few iterations of having the problem trigger in order to narrow down the scope.
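One way to get the "where did it stop" information without waiting for someone to notice the hang is a watchdog: the worker records a breadcrumb at each step, and a timer reports the last breadcrumb once no progress has been recorded for too long. This is a sketch with hypothetical names (Watchdog, Beat). It only tells you where the worker was last seen, so its alert is a good moment to attach a debugger or capture a dump.

```csharp
using System;
using System.Threading;

public sealed class Watchdog : IDisposable
{
    private readonly TimeSpan _limit;
    private readonly Action<string> _alert;
    private readonly Timer _timer;
    private long _lastBeatTicks = DateTime.UtcNow.Ticks;
    private string _lastCheckpoint = "(none)";
    private int _alerted;

    public Watchdog(TimeSpan limit, TimeSpan checkEvery, Action<string> alert)
    {
        _limit = limit;
        _alert = alert;
        _timer = new Timer(_ => Check(), null, checkEvery, checkEvery);
    }

    // Call from the worker at each step ("loaded batch", "saved record 12", ...).
    public void Beat(string checkpoint)
    {
        Volatile.Write(ref _lastCheckpoint, checkpoint);
        Interlocked.Exchange(ref _lastBeatTicks, DateTime.UtcNow.Ticks);
        Interlocked.Exchange(ref _alerted, 0);
    }

    private void Check()
    {
        var lastBeat = new DateTime(Interlocked.Read(ref _lastBeatTicks), DateTimeKind.Utc);
        TimeSpan silent = DateTime.UtcNow - lastBeat;
        // Alert once per stall, naming the last step the worker reported.
        if (silent > _limit && Interlocked.Exchange(ref _alerted, 1) == 0)
            _alert("No progress for " + silent + "; last checkpoint: " + Volatile.Read(ref _lastCheckpoint));
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```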
I am working on a tool that monitors a number of applications and ensures they are always running and in a clean state. Some of these applications have unhandled exceptions which do occur periodically and present the 'send crash report' window. I do not have the source code to these applications.
Is there any mechanism I could use to catch the exceptions, or simply identify their exception type, as well as identify the main executable of the application that threw the exception?
I'm not trying to do anything crazy like catch and handle it on the application's behalf; I'm simply trying to capture the exception type, log it, and then restart the application.
Trapping unhandled exceptions requires calling SetUnhandledExceptionFilter() in the process. That's going to be difficult to do if you don't have the source code, although it is technically possible by injecting a DLL into the process. However, this cannot be done with managed code; you can't get the CLR initialized properly.
The default unhandled exception handler that Windows installs will always run WerFault.exe, the Windows Error Reporting tool. That can be turned off but that's a system setting. Expecting your user or admin to do this is not realistic. Only after WER runs will the JIT debugger get a shot at it.
I recommend a simpler approach, one that's also much more selective. Use the Process class to get the program you're interested in protecting started. When the Exited event fires, use the ExitCode property to find out how it terminated. Any negative value is a sure sign that the process died on an unhandled exception, the exit code matches the exception code. You can use the EventLog class to read the event message that WER writes to the Windows event log. And you can restart it the same way you got it started.
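A synchronous sketch of that approach. It uses WaitForExit instead of the Exited event mentioned above, only to keep the example short, and it restarts only after a non-zero exit code (a design choice, not part of the answer). The three exit codes it names are standard Windows values (access violation, stack overflow, and the code the CLR uses for an unhandled managed exception), but the list is far from complete.

```csharp
using System;
using System.Diagnostics;

public static class Supervisor
{
    // Formats a negative exit code as the exception code it carries,
    // e.g. -1073741819 -> 0xC0000005 (access violation).
    public static string DescribeExitCode(int exitCode)
    {
        if (exitCode >= 0)
            return "exit code " + exitCode;

        uint code = unchecked((uint)exitCode);
        string hex = "0x" + code.ToString("X8");
        switch (code)
        {
            case 0xC0000005: return hex + " (access violation)";
            case 0xC00000FD: return hex + " (stack overflow)";
            case 0xE0434352: return hex + " (unhandled .NET exception)";
            default: return hex + " (unhandled exception or fatal error)";
        }
    }

    // Starts the program, waits for it to exit, logs how it ended, and restarts
    // it after a failure until maxRestarts is used up.
    public static void Keep(string exePath, string arguments, Action<string> log, int maxRestarts)
    {
        int restarts = 0;
        while (true)
        {
            int exitCode;
            var startInfo = new ProcessStartInfo(exePath, arguments) { UseShellExecute = false };
            using (Process process = Process.Start(startInfo))
            {
                process.WaitForExit();
                exitCode = process.ExitCode;
            }

            log(exePath + " ended with " + DescribeExitCode(exitCode));
            if (exitCode == 0 || restarts >= maxRestarts)
                return;
            restarts++;
        }
    }
}
```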
Without modifying the source of the application or injecting a DLL into the process, I do not believe this is possible in a reliable fashion. What you're attempting to do is inspect type information across a process boundary. This is not easy to achieve for a number of reasons:
Getting the name of the exception implies executing code in the target process. The name can be generated a number of ways (.ToString() or .GetType().Name), but all involve executing some method.
You want to do this after the application has already crashed and hence may be in a very bad state. Consider trying to execute code if memory was corrupted (perhaps even corrupting the in-memory definitions of the type).
The crash could occur in native code and not involve any managed data.
If you want to monitor application crashes system wide, you can register yourself as a just-in-time debugger. You can edit the registry to specify which debugger to run when an application crashes. The example they give is Doctor Watson, but it could be your application instead.
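If you try the just-in-time debugger route, registration comes down to two values under the AeDebug registry key: Debugger (the command line to run) and Auto ("1" launches it without prompting). This sketch assumes a hypothetical monitor executable whose -p and -e arguments receive the crashing process ID and an event handle, which Windows substitutes for the two %ld placeholders. To learn anything about the exception, the monitor would still have to attach as a real debugger; the sketch covers only registration, needs administrator rights, and replaces whatever debugger is currently registered, so save the old values first.

```csharp
using System;
using Microsoft.Win32;

public static class JitDebuggerRegistration
{
    // Read by Windows when a crash reaches the default unhandled-exception handling.
    // On 64-bit Windows, 32-bit processes use the WOW6432Node copy of this key.
    public const string AeDebugKey = @"SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug";

    // Windows replaces the two %ld placeholders with the crashing process ID and
    // an event handle. The -p/-e switch names belong to the hypothetical monitor.
    public static string BuildCommand(string monitorExePath)
    {
        return "\"" + monitorExePath + "\" -p %ld -e %ld";
    }

    // Windows only; requires administrator rights.
    public static void Register(string monitorExePath)
    {
        using (RegistryKey key = Registry.LocalMachine.CreateSubKey(AeDebugKey))
        {
            key.SetValue("Debugger", BuildCommand(monitorExePath));
            key.SetValue("Auto", "1"); // launch without asking the user
        }
    }
}
```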
I have to implement an info terminal. I chose .NET, and the terminal is only a touchpad.
So this system runs 24 hours a day, 7 days a week.
It calls a web service, displays data, and shows website content. Many things can go wrong.
Do you have any recommendations for this scenario?
Wrap every function in a try/catch? Use the AppDomain.CurrentDomain.UnhandledException event?
Thanks, Andreas
Basically, you should handle any error as soon as possible. So if you're calling a web service, wrap all the calls in a try/catch block and handle the error there. You can, for example, log the exact error and aggregate the many web-service-related exceptions into a more generic DataSourceFaultException (the name is only an example). The UI then receives that exception, can easily determine that it can't display the requested info because communication failed, and can choose to retry, notify the user, or do anything else.
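A sketch of that wrapping, using the answer's hypothetical DataSourceFaultException. Which exception types count as communication failures depends on the client technology; WebException, HttpRequestException, and TimeoutException are assumptions here (a WCF client, for example, throws CommunicationException instead). Anything else, such as a NullReferenceException from a bug, is deliberately left to propagate rather than being disguised as a network problem.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical exception type from the answer: the UI catches only this one.
public class DataSourceFaultException : Exception
{
    public DataSourceFaultException(string message, Exception inner) : base(message, inner) { }
}

public static class DataSource
{
    // Runs one web-service call and translates communication failures.
    public static T Call<T>(string operation, Func<T> serviceCall)
    {
        try
        {
            return serviceCall();
        }
        catch (WebException ex)
        {
            throw new DataSourceFaultException(operation + " failed: " + ex.Status, ex);
        }
        catch (HttpRequestException ex)
        {
            throw new DataSourceFaultException(operation + " failed: " + ex.Message, ex);
        }
        catch (TimeoutException ex)
        {
            throw new DataSourceFaultException(operation + " timed out", ex);
        }
    }
}
```

The UI side then needs a single catch (DataSourceFaultException) around each refresh to decide between retrying, showing cached data, or telling the user.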
However, with a long-running application there are many more errors you'll have to deal with. Many of them are not easy to predict, as they're not necessarily related to any specific call: you can run out of memory, a recursion might cause a stack overflow (which, since .NET 2.0, terminates the process before any handler runs), a system timer can reach its maximum value and start again from the beginning (Environment.TickCount, for example, wraps after about 24.9 days), etc.
You shouldn't handle those errors in every method, as that will only hurt code readability and be error-prone. Those errors are best handled by the UnhandledException event. However, you must remember that when an exception reaches the UnhandledException event, you cannot assume anything about the state of your application: the error might have corrupted some (or even all) of the internal state. So when such a condition occurs, it's best to try to write an error log and gracefully restart the application. (Not necessarily the whole application: if it is possible to reinitialize the application's state, that's a valid option too. However, you must be aware that you won't be able to recover from some errors, and handle that situation anyway.)
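A last-chance handler along those lines might look like the sketch below: log as little as possible, start a fresh instance, and let the failing process terminate. It assumes the application runs as its own executable (when launched with `dotnet app.dll`, MainModule.FileName is the dotnet host instead) and that briefly having a second instance is acceptable; the crash-log path is hypothetical. In practice you would also rate-limit restarts so that a crash at start-up does not turn into a restart loop.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class LastChance
{
    // Hypothetical crash-log location next to the executable.
    public static string CrashLogPath =
        Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "crash.log");

    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            try
            {
                // Keep this minimal: the application's own state may be corrupt.
                File.AppendAllText(CrashLogPath, FormatReport(e.ExceptionObject, DateTime.Now));
                // Start a fresh copy of this executable; this process then terminates.
                Process.Start(Process.GetCurrentProcess().MainModule.FileName);
            }
            catch
            {
                // Nothing sensible left to do; the runtime ends the process anyway.
            }
        };
    }

    public static string FormatReport(object exceptionObject, DateTime when)
    {
        return "=== " + when.ToString("o") + " ===" + Environment.NewLine
             + exceptionObject + Environment.NewLine;
    }
}
```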
It depends.
If you can appropriately handle an exception within a function, handle it. If not, create a global exception handler to inform the user or log it.
We have a .NET 2.0 desktop WinForms app, and it seems to crash randomly. No stack trace, event log entry, or anything. It just disappears.
There are a few theories:
The machine simply runs out of resources. Some people have said you will always get a window handle exception or a GDI exception, but others say it might simply cause crashes.
We are using wrappers around unmanaged code for 2 modules. Exceptions inside either of these modules could cause this behavior.
Again, this is not reproducible, so I wanted to see if there were any suggestions on how to debug better, or anything I can put on the machine to "catch" the crash before it happens, to help us understand what's going on.
Your best bet is to purchase John Robbins' book "Debugging Microsoft .NET 2.0 Applications". Your question can go WAY deeper than we have room to type here.
It sounds to me like you need logging first. Maybe you can attach a logger to your methods with PostSharp (see Log4PostSharp). This will certainly slow you down a lot and produce tons of messages, but you should be able to narrow down the problematic code. Then add more logging there and remove it elsewhere. Later, maybe you can stress-test these parts. If the suspect parts are small enough, you might even do a code review there.
I know, your question was about debugging - but this could be an approach, too.
You could use Process Monitor from SysInternals (now a part of Microsoft) filtered down to just what you want to monitor. This would give you a good place to start. Just start with a tight focus or make sure you have plenty of space for the log file.
I agree with Boydski. But I also offer this suggestion. Take a close look at the threads if you're doing multi-threading. I had an error like this once that took a long long time to figure out and actually ended up with John Robbins on the phone helping out with it. It turned out to be improper exception handling with threads.
Run your app with its PDB files, attach WinDbg, and let the app run.
When the crash occurs, WinDbg stops the app.
Execute this command to generate a dump file:
.dump /ma c:\myapp.dmp
ADPlus is an auxiliary tool for this kind of analysis.
Or try these:
Capturing user dumps using Performance alert
How to use ADPlus to troubleshoot "hangs" and "crashes"
Debugging on the Windows Platform
Do you have a try/catch block around the line 'Application.Run' call that starts the GUI thread? If not do add one and put some logging in for any exceptions thrown there.
You can also wire up the Application.ThreadException event and log from it, in case that gives you some more hints.
You should use a global exception handler:
http://msdn.microsoft.com/en-us/library/system.windows.forms.application.threadexception.aspx
In the past I have seen this kind of behaviour, primarily in relation to COM objects not being released and/or threading issues. Dive into the suggestions you have already got here, but I would also suggest looking into whether you properly release the unmanaged objects, so that they do not leak memory.