I have an application (written in C# / ClickOnce) which, for the most part, works fine; it has no memory leaks and runs reliably and is stable for days at a time.
However, it also utilises MEF (so plugins/extensions can be dynamically added to the core assembly). Again, this 'works' currently, but if an exception/fatal error occurs in an externally linked assembly/plugin, it will crash the entire application.
After some recent testing, I found that the application had crashed after around 14 hours of [successful] operation.
With that in mind, my question is really two-fold:
a) is it possible to catch any unhanded exception a plugin (or the main application) may throw, so it can at least output info for debugging assistance?
b) I can't be sure if it was the plugin or the main application which failed. Therefore, I cannot think where to start debugging/tracing the issue. How does one go about finding a bug which only occurs after such a long period of time?
Thanks for any input.
As noted in the comments (I guess I should have read those before I started typing up an answer...)
When an application throws an exception that is unhandled, it fires the UnhandledException event just before death and you can subscribe to that in order to log any details that will help you figure out what happened.
See this SO post for an example on how to use it (includes ThreadException).
Related
In all WPF application I develop there is a global exception handler subscribed to AppDomain.CurrentDomain.UnhandledException which logs everything it can find and then shows a dialog box telling the user to contact the author, where the log file is etc. This works extremely well and both clients and me are very happy with it because it allows fixing problems fast.
However during development of a mixed WPF / C# / CLI / C++ application there are sometimes application crashes that do not make it to the aforementioned exception handler. Instead, a standard windows dialog box saying "XXX hast stopped working" pops up. In the details it shows eg
Problem Event Name: BEX
Application Name: XXX.exe
Fault Module Name: clr.dll
...
This mostly happens when calling back a managed function from within umanaged code, and when that function updates the screen. I didn't took me long to figure out the root cause of problem, but only because I can reproduce the crash at my machine and hook up the debugger: in all occasions the native thread was still at the point of calling the function pointer to the managed delegate that calls directly into C#/WPF code.
The real problem would be when this happens on a client machine: given that usually clients are not the best error reporters out there it migh take very, very long to figure out what is wrong when all they can provide me are the details above.
The question: what can I do to get more information for crashes like these? Is there a way to get an exception like this call a custom error handler anyway? Or get a process dump? When loading the symbols for msvcr100_clr0004.dll and clr.dll (loaded on the thread where the break occurrs), the call stack is like this:
msvcr100_clr0400.dll!__crt_debugger_hook()
clr.dll!___report_gsfailure() + 0xeb bytes
clr.dll!_DoJITFailFast#0() + 0x8 bytes
clr.dll!CrawlFrame::CheckGSCookies() + 0x2c3b72 bytes
Can I somehow hook some native C++ code into __crt_debugger_hook() (eg for writing a minidump)? Which leads me to an additional question: how does CheckGSCookies behave on a machine with no debugger installed, would it still call the same code?
update some clarification on the code: native C++ calls a CLI delegate (to which a native function pointer is acquired using GetFunctionPointerForDelegate which in turn calls a C# System.Action. This Action updates a string (bound to a WPF label) and raises a propertychanged event. This somehow invokes a buffer overflow (when updating very fast) in an unnamed thread that was not created directly in my code.
update looking into SetUnhandledExceptionFilter, which didn't do anything at first, I found this nifty article explaining how to catch any exception. It works, and I was able to write a minidump in an exception filter installed by using that procedure. The dump gives basically the same information as hooking the debugger: the real problem seems to be that the Status string is being overwritten (by being called from the native thread) while at the same time being read from the ui thread. All nice ans such, but it does require dll hooking which is not my favorite method to solve things. Another way would still be nice.
The .NET 4 version of the CLR has protection against buffer overflow attacks. The basic scheme is that the code writes a "cookie" at the end of an array or stack frame, then checks later if that cookie still has the original value. If it has changed, the runtime assumes that malware has compromised the program state and immediately aborts the program. This kind of abort does not go through the usual unhandled exception mechanism, that would be too exploitable.
The odds that your program is really being attacked are small of course. Much more likely is that your native code has a pointer bug and scribbles junk into a CLR structure or stack frame. Debugging that isn't easy, the damage is usually done well before the crash. One approach is to comment out code until the crash disappears.
I'm getting close to desperate.. I am developing a field service application for Windows Mobile 6.1 using C# and quite some p/Invoking. (I think I'm referencing about 50 native functions)
On normal circumstances this goes without any problem, but when i start stressing the GC i'm getting a nasty 0xC0000005 error witch seems uncatchable. In my test i'm rapidly closing and opening a dialog form (the form did make use of native functions, but for testing i commented these out) and after a while the Windows Mobile error reporter comes around to tell me that there was an fatal error in my application.
My code uses a try-catch around the Application.Run(masterForm); and hooks into the CurrentDomain.UnhandledException event, but the application still crashes. Even when i attach the debugger, visual studio just tells me "The remote connection to the device has been lost" when the exception occurs..
Since I didn't succeed to catch the exception in the managed environment, I tried to make sense out of the Error Reporter log file. But this doesn't make any sense, the only consistent this about the error is the application where it occurs in.
The thread where the application occurs in is unknown to me, the module where the error occurs differs from time to time (I've seen my application.exe, WS2.dll, netcfagl3_5.dll and mscoree3_5.dll), even the error code is not always the same. (most of the time it's 0xC0000005, but i've also seen an 0X80000002 error, which is a warning accounting the first byte?)
I tried debugging through bugtrap, but strangely enough this crashes with the same error code (0xC0000005). I tried to open the kdmp file with visual studio, but i can't seem to make any sense out of this because it only shows me disassembler code when i step into the error (unless i have the right .pbb files, which i don't). Same goes for WinDbg.
To make a long story short: I frankly don't have a single clue where to look for this error, and I'm hoping some bright soul on stackoverflow does. I'm happy to provide some code but at this moment I don't know which piece to provide..
Any help is greatly appreciated!
[EDIT May 3rd 2010]
As you can see in my comment to Hans I retested the whole program after I uncommented all P/Invokes, but that did not solve my problem. I tried reproducing the error with as little code as possible and eventually it looks like multi-threaded access is the one giving me all the problems.
In my application I have a usercontrol that functions as a finger / flick scroll list. In this control I use a bitmap for each item in the list as a canvas. Drawing on this canvas is handled by a separate thread and when i disable this thread, the error seems to disappear.. I'll do some more tests on this and will post the results here.
Catching this exception is not an option. It is the worst kind of heart attack a thread can suffer, the CPU has detected a serious problem and cannot continue running code. This is invariably caused by misbehaving unmanaged code, it sounds like you've got plenty of it running in your program. You need to focus on debugging that unmanaged code to get somewhere.
The two most common causes of an AV are
Heap corruption. The unmanaged code has written data to the heap improperly, destroying the structural integrity of the heap. Typically caused by overflowing the boundary of an allocated block of memory. Or using a heap block after it was freed. Very hard to diagnose, the exception will be raised long after the damage was done.
Stack corruption. Most typically caused by overflowing the boundaries of an array that was allocated on the stack. This can overwrite the values of other variables on the stack or destroy the function return address. A bit easier to diagnose, it tends to repeat well and has an immediate effect. One side-effect is that the debugger loses its ability to display the call stack right after the damage was done.
Heap corruption is the likely one and the hard one. This is most typically tackled by debugging the code in the debug build with a debug allocator that watches the integrity of the heap. The <crtdbg.h> header provides one. It's not a guaranteed approach, you can have some really nasty Heisenbugs that only rear their head in the Release build. Very few options available then, other than careful code review. Good luck, you'll need it.
It turns out to be an exception caused by Interlocked.
In my code there is an integer _drawThreadIsRunning which is set to 1 when the draw-thread is running, and set to 0 otherwise. I set this value using Interlocked:
if (Interlocked.Exchange(ref _drawThreadIsRunning, 1) == 0) { /* run thread */ }
When i change this line the whole thing works, so it seems that there is a problem with threadsafety somewhere, but i can't figure it out. (ie. i don't want to waste more time figuring it out)
Thanks for the help guys!
Using C#'s System.Diagnostics.Process object, I start an unmanaged exe, that later starts yet another unmanaged exe.
The 2nd exe is causing an unhandled-exception that I'd like my application to ignore, but can't seem to.
I'm using a try/catch statement when I start the first process, but it doesn't seem to catch the exception raised by the 2nd process. When the exception occurs, the just-in-time debugger notifies me and halts my application until I manually click "yes" I want to debug or "no". Then my application proceeds.
The JIT debugger doesn't have source code for the 2ndprocess.exe that is throwing the exception. So, it doesn't tell me what the exception is. I don't really care what the exception is, I just want to know how to catch it and ignore it so my application doesn't get halted by it. By the time the exception occurs, the work is done anyway.
Can anyone offer some insight?
You should be properly handling the exception in the second executable. Your main program won't catch the exception because it isn't throwing one, it is executing something that is.
Edit:
Do you have access to the source of the second process (the one throwing the exception)? Your application shouldn't ever just crash. If the exceptional case gets handled correctly in the second process, you won't have this problem in your primary application.
Edit2:
Since you have access to the source (open source) I recommend you fix the bug. This will help you in two ways:
1) Your program will finally work.
2) You can say you contributed to an open source project.
And, as a special bonus, you get to help out a project you use frequently. Win/Win
Since you are using process.start to actually launch the application, the application creates a separate application domain. Capturing the exception from that application is not something that I believe will be possible, since more than likely the JIT dialog is coming up due to that failed process.
Although not a solution you could stop the dialog if needed, but that has issues of its own.
I have a multithreaded .Net C# application, it uses Direct3D 9/10 and XAudio2. (Direct3D is accessed by only one thread, same for XAudio2. Direct3D isn't the problem cause the error is manifest in either DX9 or DX10 mode without any change in its behaviour.)
Sometimes (there are some areas that gives this problem randomly) this application crash in a rather unspectacular way. Even if the application is started through visual studio with debugger it crash without giving ANY kind of exception. (It start by saying "applicationname.svchost.exe is crashed, etc..etc..Do you want to debug?", if I press yes it tells me "you cannot debug an application already closed.)
There is no way to find out what's the cause of the crash? Cause i've run out of ideas, the debugger isn't giving me any information at all. Without an exception I can't even do a stacktrace or a dump. :P (I'm supposing is a synchronization problem (even thought in that area I'm only doing sequential work...), but hey why isn't launching an exception? :|)
In the areas where the problem occurs I'm unloading a reloading a series of classes related to a novel (in a sequential core thread, so I doubt it can be an issue) and starting a new music through XAudio2. (BTW, what are the multithreading consideration about XAudio2? Is it safe to call from multiple thread?)
Thanks for the help.
P.S. There is a software to attach to mine to monitor all the calls and tells me what's the last call before the crash?
You should try using Windbg, analyzing the crash dump should point you to the problem, if your suspicion is right and it is a synchronization problem, the cause of the problem may be hard to spot.
Have you checked Event Logs in your Windows Administration Panel?
All error of any kind are always logged in this section with minimal details.
One time I had an application that was crashing without exceptions and the only help I found was the Event Log Viewer where I discovered that the source of the crash was a StackOverflowException.
I know I'm opening myself to a royal flaming by even asking this, but I thought I would see if StackOverflow has any solutions to a problem that I'm having...
I have a C# application that is failing at a client site in a way that I am unable to reproduce locally. Unfortunately, it is very difficult (impossible) for me to get any information that at all helps in isolating the source of the problem.
I have in place a rather extensive error monitoring framework which is watching for unhandled exceptions in all the usual places:
Backstop exception handler in threads I control
Application.ThreadException for WinForms exceptions
AppDomain.CurrentDomain.UnhandledException
Which logs detailed information in a place where I have access to them.
This has been very useful in the past to identify issues in production code, but has not been giving me any information at about the current series of issues.
My best guess is that the core issue is one of the "rude" exception types (thread abort, out of memory, stack overflow, access violation, etc.) that are escalating to a rude shutdown that are ripping down the process before I have a chance to see what is going on.
Is there anything that I can be doing to snapshot information as my process is crashing that would be useful? Ideally, I would be able to write out my custom log format, but I would be happy if I could have a reliable way of ensuring that a crash dump is written somewhere.
I was hoping that I could implement class deriving from CriticalFinalizerObject and have it spit a last-chance error log out when it is disposing, but that doesn't seem to be triggered in the StackOverflow scenario which I tested.
I am unable to use Windows Error Reporting and friends due to the lack of a code signing certificate.
I'm not trying to "recover" from arbitrary exceptions, I'm just trying to make a note of what went wrong on the way down.
Any ideas?
You could try creating a minidump file. This is a C++ API, but it should be possible to write a small C++ program that starts your application keeps a handle to the process, waits on the process handle, and then uses the process handle to create a minidump when the application dies.
If you have done what you claim:
Try-Catch on the Application.Run
Unhandled Domain Exceptions
Unhandled Thread Exceptions
Try Catch handlers in all threads
Then you would have caught the exception except perhaps if it is being thrown by a third party or COM component.
You certainly haven't given enough information.
What events does the client say leads up to the exception?
What COM or third party components do you use? (Do you properly instance and reference these components? Do you pass valid arguments to COM function calls?)
Do you make use of any un-managed - un-safe code?
Are you positive that you have all throw-capable calls covered with try-catch?
I'm just saying that no-one can offer you any helpful advice unless you post a heck of lot more information and even at that we probably can only speculate as to the source of you problem.
Have a set of fresh eyes look at your code.
Some errors cannot be caught by logging.
See this similar question for more details:
StackOverflowException in .NET
Here's a link explaining asynchronous exceptions (and why you can't recover from them):
http://www.bluebytesoftware.com/blog/PermaLink.aspx?guid=c1898a31-a0aa-40af-871c-7847d98f1641