How to debug random crashes? - c#

we have a dotnet 2.0 desktop winforms app and it seems to randomly crash. no stack trace, vent log, or anything. it just dissapears.
There are a few theories:
machine simply runs out of resources. some people have said you will always get a window handle exception or a gdi exception but others say it might simply cause crashes.
we are using wrappers around non managed code for 2 modules. exceptions inside either of these modules could cause this behavior.
again, this is not reproducible so i wanted to see if there were any suggestions on how to debug better or anything i can put on the machine to "catch" the crash before it happens to help us understand whats going on.

Your best bet is to purchase John Robbins' book "Debugging Microsoft .NET 2.0 Applications". Your question can go WAY deeper than we have room to type here.

Sounds for me like you need to log at first - maybe you can attach with PostSharp a logger to your methods (see Log4PostSharp) . This will certainly slow you down a lot and produce tons of messages. But you should be able to narrow the problematic code down ... Attach there more logs - remove others. Maybe you can stress-test this parts, later. If the suspect parts are small enough you might even do a code review there.
I know, your question was about debugging - but this could be an approach, too.

You could use Process Monitor from SysInternals (now a part of Microsoft) filtered down to just what you want to monitor. This would give you a good place to start. Just start with a tight focus or make sure you have plenty of space for the log file.

I agree with Boydski. But I also offer this suggestion. Take a close look at the threads if you're doing multi-threading. I had an error like this once that took a long long time to figure out and actually ended up with John Robbins on the phone helping out with it. It turned out to be improper exception handling with threads.

Run your app, with pdb files, and attach WinDbg, make run.
Whem crash occur WinDbg stop app.
Execute this command to generate dump file :
.dump /ma c:\myapp.dmp
An auxiliary tool of analysis is ADPlus
Or try this :
Capturing user dumps using Performance alert
How to use ADPlus to troubleshoot "hangs" and "crashes"
Debugging on the Windows Platform

Do you have a try/catch block around the line 'Application.Run' call that starts the GUI thread? If not do add one and put some logging in for any exceptions thrown there.
You can also wrie up the Application.ThreadException event and log that in case that gives you some more hints.

you should use a global exception handler:
http://msdn.microsoft.com/en-us/library/system.windows.forms.application.threadexception.aspx

I my past I have this kind of behaviour primarily in relation to COM objects not being released and/or threading issues. Dive into the suggestions that you have gotten here already, but I would also suggest to look into whether you properly release the non-managed objects, so that they do not leak memory.

Related

C# : catch all errors/exceptions of a mixed managed/unmanaged process

I have a big and complex process that runs on a production environment that's basically a WPF user interface developed in C#. It also hosts threads and DLL's written in C++ unmanaged and managed code.
Typically, if an exception raises, it's caught and the related stack dump is written in a log file for post-mortem debugging purposes. Unfortunately, from time to time, the application crashes without writing any information in the log so we have no clue about who's causing the crash.
Does anybody know how to detect and eventually trace all the causes that make the application crash and are not detected with a simple try-catch block?
To give an example I saw that StackOverflow Exception is not caught and also errors like 0xc0000374 coming from unmanaged code are not detected. This is not a matter of debugging it. I know I can attach a debugger to the system and try to reproduce the problem. But as I told this is a production system and I have to analyze issues coming from the field after the problem occurred.
Unlike C# exceptions, C++ exceptions do not catch hardware exceptions such as access violations or stack overflows since C++ apps run unmanaged and directly on the cpu.
For post-crash analysis I would suggest using something like breakpad. breakpad will create a dump file which will give you very useful information such as a call-stack, running threads and stack/heap memory depending on your configuration.
This way you would not need to witness the crash happening or even try to reproduce it which I know from experience can be very difficult. All you would need is a way to retrieve these crash dumps from your users devices.
You can log exception by subscribing to AppDomain.UnhandledException event. Its args.ExceptionObject argument is of type object and is designed not to be limited by C# exceptions, so you can call ToString method to log it somewhere.
Also check MSDN docs for its limitations. For instance:
Starting with the .NET Framework 4, this event is not raised for exceptions that corrupt the state of the process, such as stack overflows or access violations, unless the event handler is security-critical and has the HandleProcessCorruptedStateExceptionsAttribute attribute.
Solved ! I followed Mohamad Elghawi suggestion and I integrated breakpad. After I struggled a lot in order to make it compiling, linking and working under Visual Studio 2008, I was able to catch critical system exceptions and generate a crash dump. I'm also able to generate a dump on demand, this is useful when the application stuck for some reason and I'm able to identify this issue with an external thread that monitors all the others.
Pay attention ! The visual studio solution isn't included in the git repo and the gyp tool, in contradiction as wrongly mentioned in some threads, it's also not there. You have to download the gyp tool separately and work a bit on the .gyp files inside the breadpad three in order to generate a proper solution. Furthermore some include files and definitions are missing if you want to compile it in Visual Studio 2008 so you have also to manage this.
Thanks guys !

How to debug an application which suddenly terminates without any feedback?

The application uses Xamarin.Android, which may be a big problem in itself. The problem is that sometimes it just quits (process is being terminated) and there's nothing in the log that can be associated with it. (although I guess that it's related to running out of memory, but I can't yet prove it — according to DDMS, most of the times all is OK, and if Xamarin.Android uses another pool of memory, then I don't know how to measure it)
I've searched the code base for "Environment.Exit" and, of course, didn't found anything.
What are the options for finding the culprit of such thing?
You could try to use the garbage collector by yourself. Just run
Runtime.getRuntime().gc();
The Runtime instance has also a method to read the free memory space. So you could figure out by yourself whether it's a memory problem.
EDIT:
Oh I read that Xamarin uses the C# language. But I'm quite sure that C# has similar methods.
When you say log, are you referring to an application log, or the device log?
When tracking down these sorts of bugs, I've always found aLogCat invaluable.
I open it, clear all the current logs, then use my application up to the point where it crashes. Then I quickly go back to aLogCat, pause it and scroll up to where the error is - it's usually found in the nearest red/orange blocks.
There's a blog post here about how I found attributes left out by the Xamarin linker using this method.

Try... Catch block infestation

I'm developing a suite of Excel add-ins for a company. I haven't done add-ins before, so I'm not terribly familiar with some of the intricacies. After delivering my first product, the user encountered errors that I didn't experience/encounter/notice during my testing. Additionally, I was having difficulty reproducing them from within Visual Studios debug environment.
I wound up writing a light weight logging class that received messages from various parts of the program. The program isn't huge, so it wasn't a whole lot of work. But what I did end up with was nearly every single line of code wrapped up in Try... Catch blocks so I could log things happening in the users environment.
I think I implemented it decently enough, I tried to avoid wrapping calls to other classes or modules and instead putting the block inside the call, so I could more accurately identify who was throwing, and I didn't swallow anything, I always threw the exception after I recorded the information I was interested in.
My question is, essentially, is this okay? Is there a better way to tackle this? Am I waaaay off base?
Quick Edit: Importantly, it did work. And I was able to nail down the bug and resolve it.
No, you are not way off base. I believe this is the only way to handle errors when writing Add-ins. I am selling an Outlook add-in myself which uses this pattern. A couple of notes though:
You only need to wrap the top-level methods, either exposed to the user interface directly or triggered by other events.
Make sure your logging routine traverses the Exception tree recursively, also logging InnerExceptions.
Instead of rethrowing the exception you might consider displaying some sort of error form instead.
And then a couple of comments to those notes:
I'm sure you understand this, but your comment "nearly every single line of code is wrapped(...)" made me want to underline this. But yes, all your code should eventually end up in a catch (System.Exception)-block so that you can log your Exception. I disagree completely with Greg saying this is "dangerous". What is dangerous is not handling your exceptions.
If you do this I don't think you need to "avoid wrapping calls to other classes and modules", if I understand you correctly. I have a published a convenient extension method GetAsString that allows me to log what I need at github.
In Outlook, if an Exception bubbles up to Outlook itself, your Add-in might get disabled or even crash Outlook if it happens on a background thread. Isn't it the same in Excel? Therefore I go to great lengths not to let any exception out of my application. Of course you need to make sure your application can continue running after this, or allow for a graceful shutdown.

Find out where a multi-threading C# program is stuck

I have written a C# program having some thousand lines and several threads. Execution works fine for several hours/days until the application, which is running on a Windows server, starts to slow everything (other programs/web server) down as it suddenly begins to use up about 50-80% of the CPU.
I assume that it is stuck in some while-loop, but I do not know where exactly. It would already be a help to know which thread uses up the largest share of the system resources. As there is not any exception thrown, I do not see any direct possibility to find out.
The code has already been inspected but I did not find any significant/obvious programming mistakes.
Does anybody know a good way to let Visual Studio monitor the current CPU-load in order to show where it is used up?
Oh, how I love Managed Stack Explorer. http://mse.codeplex.com/ for most versions of .NET and https://github.com/vadimskipin/MSE for 4.0
Point it at your running application, and it snapshots the stacks of each thread. After looking at a few snapshots, you'll work out where the problem is. Oh, the nastiness of the problem I was facing when I first learnt of this. Oh the joy when it let me find what I'd been hunting for days in just a few minutes.
You can use Perfview (http://www.microsoft.com/en-us/download/details.aspx?id=28567) to capture and the analyse ETL traces when this happens.

What is Environment.FailFast?

What is Environment.FailFast?
How is it useful?
It is used to kill an application. It's a static method that will instantly kill an application without being caught by any exception blocks.
Environment.FastFail(String) can
actually be a great debugging tool.
For example, say you have an
application that is just downright
giving you some weird output. You have
no idea why. You know it's wrong, but
there are just no exceptions bubbling
to the surface to help you out. Well,
if you have access to Visual Studio
2005's Debug->Exceptions... menu item,
you can actually tell Visual Studio to
allow you to see those first chance
exceptions. If you don't have that,
however you can put
Environment.FastFail(String) in an
exception, and use deductive reasoning
and process of elimination to find out
where your problem in.
Reference
It also creates a dump and event viewer entry, which might be useful.
It's a way to immediately exit your application without throwing an exception.
Documentation is here.
Might be useful in some security or data-critical contexts.
Failfast can be used in situations where you might be endangering the user's data. Say in a database engine, when you detect a corruption of your internal data structures, the only sane course of action is to halt the process as quickly as possible, to avoid writing garbage to the database and risk corrupting it and lose the user's data. This is one possible scenario where failfast is useful.
Another use is to catch programmer errors. Say you are writing a library and some function accepts a pointer that cannot be null in any circumstance, that is, if it's null, you are clearly in presence of a programmer error. You can return an error like E_POINTER or throw some InvalidArgument exception and hope someone notices, but you'll get their attention better by failing fast :-)
Note that I'm not restricting the example to pointers, you can generalize to any parameter or condition that should never happen. Failing fast ultimately results in better quality apps, as many bugs no longer go unnoticed.
Finally, failing fast helps with capturing the state of the process as faithfully as possible (as a memory dump gets created), in particular when failing fast immediately upon detecting an unrecoverable error or a really unexpected condition.
If the process was allowed to continue, say the 'finally' clauses would run or the stack would be unwound, and things would get destroyed or disposed-of, before a memory dump is taken, then the state of the process might be altered in such as way that makes it much more difficult to diagnose the root cause of the problem.
It kills the application and even skips try/finally blocks.
From .NET Framework Design Guidelines on Exception Throwing:
✓ CONSIDER terminating the process by calling System.Environment.FailFast (.NET Framework 2.0 feature) instead of throwing an exception if your code encounters a situation where it is unsafe for further execution.
Joe Duffy discusses failing fast and the discipline to make it useful, here.
http://joeduffyblog.com/2014/10/13/if-youre-going-to-fail-do-it-fast/
Essentially, he's saying that for programming bugs - i.e. unexpected errors that are the fault of the programmer and not the programme user or other inputs or situations that can be reasonable expected to be bad - then deciding to always fail fast for unexpected errors has been seen to improve code quality.
I think since its an optional team decision and discipline, use of this API in C# is rare since in reality we're all mostly writing LoB apps for 12 people in HR or an online shop at best.
So for us, we'd maybe use this when we want deny the consumer of our API the opportunity of making any further moves.
An unhandled exception that is thrown (or rethrown) within a Task won't take effect until the Task is garbage-collected, at some perhaps-random time later.
This method lets you crash the process now -- see this answer.

Categories