Third-party dll crashes program with no exception thrown - c#

I am using Visual Studio 2010, and coding in C#. I have a third-party dll that I am using in my project. When I attempt to use a specific method, at seemingly random occasions, the program simply crashes, with no exception thrown. The session simply ends. Is there any way I can trace what is going on?

The way the stack for a thread is laid out in Windows goes like this (roughly; this is not an exact description of everything that goes on, just enough to give you the gist. And the way the CLR handles stack pages is somewhat different than how unmanaged code handles it also.)
At the top of the stack there are all the committed pages that you are using. Then there is a "guard page" - if you hit that page then the guard page becomes a new page of stack, and the following page becomes the new guard page. However, the last page of stack is special. If you hit it once, you get a stack overflow exception. If you hit it twice then the process is terminated immediately. By "immediately" I mean "immediately" - no exception, go straight to jail, do not pass go, do not collect $200. The operating system reasons that at this point the process is deeply diseased and possibly it has become actively hostile to the user. The stack has overflowed and the code that is overflowing the stack might be trying to write arbitrarily much garbage into memory. (*)
Since the process is potentially a hazard to itself and others, the operating system takes it down without allowing any more code to run.
My suspicion is that something in your unmanaged code is hitting the final stack page twice. Almost every time I see a process suddenly disappear with no exception or other explanation its because the "don't mess with me" stack page was hit.
(*) Back in the early 1990s I worked on database drivers for a little operating system called NetWare. It did not have these sorts of protections that more modern operating systems now have routinely. I needed to be able to "switch stacks" dynamically while running at kernel protection level; I knew when my driver had accidentally blown the stack because it would eventually write into screen memory and I could then debug the problem by looking at what garbage had been written directly to the screen. Ah, those were the days.

Have you checked the Windows Event Log? You can access that in the Admin Tools menu > Event Viewer. Check in the Application and System logs particularly.

Try to force the debugger to catch even handled exceptions - especially the bad ones like Access Violation and Stack Overflow. You can do this in Debug -> Exceptions.
It is possible that the third-party DLL catches all exceptions and then calls exit() or some similar beauty which quits the whole program.

If your third-party dll is managed, using Runtime Flow (developed by me) you can see what happens inside of it before the crash - a stack overflow, a forceful exit or an exception will be clearly identifiable.

Related

WPF crash dump with original exception call stack and memory

I have a large WPF application which also uses C++ libraries for some functionality.
From time to time the application crashes due to a unhandled exception or access violation in the C++ code. EDIT: it sometime also crashes on the main thread due to unhandled C# exception.
I already employ the following handlers to log information about the crash:
DispatcherUnhandledException
TaskScheduler.UnobservedTaskException
AppDomain.CurrentDomain.UnhandledException
(EDIT: I register to these events very similar to this example: https://stackoverflow.com/a/46804709/2523211)
However, if I enabled dump file creation and these functions are reached (i.e. in an unhandled exception scenario) then the stack of the original exception has already unwound itself and there is no way for me to inspect the call stack along with the memory and threads at the moment of the error itself.
I can obviously see the stack trace itself in the exception that I get as argument to those handlers, but that is as far as that goes and I wish to see the state of my code when the exception was thrown.
(EDIT: The call stack just shows that I'm in a dispatcher frame and I cannot inspect the variables and other memory state of the application at the moment of the exception. I can use the data from the exception itself and see the call stack from it, but that's not enough to reproduce or really understand why the exception happened)
If I don't subscribe to those events then nothing changes, I still can't see the original call stack in the dump file. (EDIT: because I only get a call stack in the dispatcher)
I've also tried to set e.Handled = false but I think that that is the default value anyways.
Is there someway to indicate to WPF's dispatcher or maybe the issue is somewhere else, so that if an exception does happen, let it propagate all the way up such that when a dump is created for it, I will be able to have a helpful dump file?
What you are asking for is impossible to do within dotnet. You want to handle an exception without the stack being unwound, and dump your core for later analysis.
You can however do this outside of dotnet using the WinDbg APIs. Luckily, there is a wrapper for the WinDbg APIs called clrmd.
Just implement IDebugEventCallbacks.Exception and then call WriteDumpFile.
See https://github.com/microsoft/clrmd/blob/124b189a2519c4f13610c6bf94e516243647af2e/src/TestTasks/src/Debugger.cs for more details.
However, you should note. This solution would be attaching a debugger to production. Thus has the associated costs.
If you are just trying to collect information to solve the problem some options include
Use ProcDump https://learn.microsoft.com/en-us/sysinternals/downloads/procdump This can be configured to dump on 1st chance and 2nd chance exceptions, collecting multiple dump files if required i.e. allow app to keep running during 1st chance exceptions while capturing the app state in a dump file. Extensive configuration is possible via cmd line i.e. only handle specific exceptions etc, number of dump files to generate.
Use Time Travel Debugging feature of WinDbg, in this case you can walk backwards through your trace to find cause. This will severely degrade performance and if it takes more than several minutes to reproduce your issue these traces can quickly get very massive. If you copy the TTD folder out of WinDbg installation you can use it via command line to do a "ring mode" trace and specify a maximum buffer size which is more suitable if it takes a long time to reproduce issue, finding the options by running ttd.exe with no parameters.
Create a parent process that attaches as a debugger and will give you near full control over how you to decide to handle exceptions in the child process. Example here of creating a basic debugger

StackOverflowException in .NET >= 4.0 - give other threads chance to gracefully exit

Is there a way how to at least postpone termination of managed app (by few dozens of milliseconds) and set some shared flag to give other threads chance to gracefully terminate (the SO thread itself wouldn't obviously execute anything further)? I'm contemplating to use JIT debugger or CLR hosting for this - I'm curios if anybody tried this before.
Why would I want to do something so wrong?:
Without too much detail - imagine this analogy - you are in a casino betting on a roulette and suddenly find out that the roulette is unreliable fake. So you want to immediately leave the casino, BUT likely want to collect your bets from the table first.
Unfortunately I cannot leverage separate process for this as there are very tight performance requirements.
Tried and didn't work:
.NET behavior for StackOverflowException (and contradicting info on MSDN) has been discussed several times on SO - to quickly sum up:
HandleProcessCorruptedStateExceptionsAttribute (e.g. on appdomain unhandled exception handler) doesn't work
ExecuteCodeWithGuaranteedCleanup doesn't work
legacyUnhandledExceptionPolicy doesn't work
There may be few other attempts how to handle StackOverflowExceptions - but it seems to be apparent that CLR terminates the whole process as is mentioned in this great answer by Hans Passant.
Considering to try:
JIT debugger - leave the thread with exception frozen, set some
shared flag (likely in pinned location) and thaw other threads for a
short time.
CLR hosting and setting unhandled exception policy
Do you have any other idea? Or any experience (successful/unsuccessful) with those two ways?
The word "fake" isn't quite the correct one for your casino analogy. There was a magnitude 9 earth quake and the casino building along with the roulette table, the remaining chips and the player disappeared in a giant cloud of smoke and dust.
The only shot you have at running code after an SOE is to stay far away from that casino, it has to run in another process. A "guard" process that starts your misbehaving program, it can use the Process.ExitCode to detect the crash. It will be -1073741571 (0xc00000fd). The process state is gone, you'll have to use one of the .NET out-of-process interop methods (like WCF, named pipes, sockets, memory-mapped file) to make the guard process aware of things that need to be done to clean up. This needs to be transactional, you cannot reason about the exact point in time that the crash occurred since it might have died while updating the guard.
Do beware that this is rarely worth the effort. Because an SOE is pretty indistinguishable from an everyday process abort. Like getting killed by Task Manager. Or the machine losing power. Or being subjected to the effects of an earth quake :)
A StackOverflowException is an immediate and critical exception from which the runtime cannot recover - that's why you can't catch it, or recover from it, or anything else. In order to run another method (whether that's a cleanup method or anything else), you have to be able to create a stack frame for that method, and the stack is already full (that's what a StackOverflowException means!). You can't run another method because running a method is what causes the exception in the first place!
Fortunately, though, this kind of exception is always caused by program structure. You should be able to diagnose and fix the error in your code: when you get the exception, you will see in your call stack that there's a loop of one or more methods recursing indefinitely. You need to identify what the faulty logic is and fix it, and that'll be a lot easier than trying to fix the unfixable exception.

How to get information on a Buffer Overflow Exception in a mixed application?

In all WPF application I develop there is a global exception handler subscribed to AppDomain.CurrentDomain.UnhandledException which logs everything it can find and then shows a dialog box telling the user to contact the author, where the log file is etc. This works extremely well and both clients and me are very happy with it because it allows fixing problems fast.
However during development of a mixed WPF / C# / CLI / C++ application there are sometimes application crashes that do not make it to the aforementioned exception handler. Instead, a standard windows dialog box saying "XXX hast stopped working" pops up. In the details it shows eg
Problem Event Name: BEX
Application Name: XXX.exe
Fault Module Name: clr.dll
...
This mostly happens when calling back a managed function from within umanaged code, and when that function updates the screen. I didn't took me long to figure out the root cause of problem, but only because I can reproduce the crash at my machine and hook up the debugger: in all occasions the native thread was still at the point of calling the function pointer to the managed delegate that calls directly into C#/WPF code.
The real problem would be when this happens on a client machine: given that usually clients are not the best error reporters out there it migh take very, very long to figure out what is wrong when all they can provide me are the details above.
The question: what can I do to get more information for crashes like these? Is there a way to get an exception like this call a custom error handler anyway? Or get a process dump? When loading the symbols for msvcr100_clr0004.dll and clr.dll (loaded on the thread where the break occurrs), the call stack is like this:
msvcr100_clr0400.dll!__crt_debugger_hook()
clr.dll!___report_gsfailure() + 0xeb bytes
clr.dll!_DoJITFailFast#0() + 0x8 bytes
clr.dll!CrawlFrame::CheckGSCookies() + 0x2c3b72 bytes
Can I somehow hook some native C++ code into __crt_debugger_hook() (eg for writing a minidump)? Which leads me to an additional question: how does CheckGSCookies behave on a machine with no debugger installed, would it still call the same code?
update some clarification on the code: native C++ calls a CLI delegate (to which a native function pointer is acquired using GetFunctionPointerForDelegate which in turn calls a C# System.Action. This Action updates a string (bound to a WPF label) and raises a propertychanged event. This somehow invokes a buffer overflow (when updating very fast) in an unnamed thread that was not created directly in my code.
update looking into SetUnhandledExceptionFilter, which didn't do anything at first, I found this nifty article explaining how to catch any exception. It works, and I was able to write a minidump in an exception filter installed by using that procedure. The dump gives basically the same information as hooking the debugger: the real problem seems to be that the Status string is being overwritten (by being called from the native thread) while at the same time being read from the ui thread. All nice ans such, but it does require dll hooking which is not my favorite method to solve things. Another way would still be nice.
The .NET 4 version of the CLR has protection against buffer overflow attacks. The basic scheme is that the code writes a "cookie" at the end of an array or stack frame, then checks later if that cookie still has the original value. If it has changed, the runtime assumes that malware has compromised the program state and immediately aborts the program. This kind of abort does not go through the usual unhandled exception mechanism, that would be too exploitable.
The odds that your program is really being attacked are small of course. Much more likely is that your native code has a pointer bug and scribbles junk into a CLR structure or stack frame. Debugging that isn't easy, the damage is usually done well before the crash. One approach is to comment out code until the crash disappears.

log a StackOverflowException in .NET

I recently had a stack overflow exception in my .NET app (asp.net website), which I know because it showed up in my EventLog. I understand that a StackOverflow exception cannot be caught or handled, but is there a way to log it before it kills your application? I have 100k lines of code. If I knew the stack trace, or just part of the stack trace, I could track down the source of the infinite loop/recursion. But without any helpful diagnostic information, it looks like it'll take a lot of guess-and-test.
I tried setting up an unhandled exception handler on my app domain (in global.asax), but that didn't seem to execute, which makes sense if a stack overflow is supposed to terminate and not run any catch/finally blocks.
Is there any way to log these, or is there some secret setting that I can enable that will save the stack frames to disk when the app crashes?
Your best bet is to use ADPlus.
The Windows NT Kernel Debugging Team has a nice blog post on how to catch CLR exceptions with it. There are a lot of details on that there about different configuration options for it.
ADPlus will monitor the application. You can specify that it run in crash mode so you can tell ADPlus to take a memory dump right as the StackOverflowException is happening.
Once you have a memory dump, you can use WinDbg and a managed debugging extension like Psscor to see what was going on during the stack overflow.
Very wise to ask about StackOverFlowException on StackOverflow :)
AFAIK, about the only thing you can do is get a dump on a crash. The CLR is going to take you down unless you host your own CLR.
A related question on getting a dump of IIS worker process on crash:
Getting IIS Worker Process Crash dumps
Here's a couple other SO related threads:
How to print stack trace of StackOverflowException
C# catch a stack overflow exception
Hope these pointers help.
You could try logging messages at key points in your application and take out the part where the flow of log messages break.
Then, try and replicate it with that section of code using unit tests.
The only way to get a trace for a stack overflow error that I know of is to have an intermediary layer taking snapshots of the running code and writing that out. This always has a heavy performance impact.

Uncatchable AccesViolationException

I'm getting close to desperate.. I am developing a field service application for Windows Mobile 6.1 using C# and quite some p/Invoking. (I think I'm referencing about 50 native functions)
On normal circumstances this goes without any problem, but when i start stressing the GC i'm getting a nasty 0xC0000005 error witch seems uncatchable. In my test i'm rapidly closing and opening a dialog form (the form did make use of native functions, but for testing i commented these out) and after a while the Windows Mobile error reporter comes around to tell me that there was an fatal error in my application.
My code uses a try-catch around the Application.Run(masterForm); and hooks into the CurrentDomain.UnhandledException event, but the application still crashes. Even when i attach the debugger, visual studio just tells me "The remote connection to the device has been lost" when the exception occurs..
Since I didn't succeed to catch the exception in the managed environment, I tried to make sense out of the Error Reporter log file. But this doesn't make any sense, the only consistent this about the error is the application where it occurs in.
The thread where the application occurs in is unknown to me, the module where the error occurs differs from time to time (I've seen my application.exe, WS2.dll, netcfagl3_5.dll and mscoree3_5.dll), even the error code is not always the same. (most of the time it's 0xC0000005, but i've also seen an 0X80000002 error, which is a warning accounting the first byte?)
I tried debugging through bugtrap, but strangely enough this crashes with the same error code (0xC0000005). I tried to open the kdmp file with visual studio, but i can't seem to make any sense out of this because it only shows me disassembler code when i step into the error (unless i have the right .pbb files, which i don't). Same goes for WinDbg.
To make a long story short: I frankly don't have a single clue where to look for this error, and I'm hoping some bright soul on stackoverflow does. I'm happy to provide some code but at this moment I don't know which piece to provide..
Any help is greatly appreciated!
[EDIT May 3rd 2010]
As you can see in my comment to Hans I retested the whole program after I uncommented all P/Invokes, but that did not solve my problem. I tried reproducing the error with as little code as possible and eventually it looks like multi-threaded access is the one giving me all the problems.
In my application I have a usercontrol that functions as a finger / flick scroll list. In this control I use a bitmap for each item in the list as a canvas. Drawing on this canvas is handled by a separate thread and when i disable this thread, the error seems to disappear.. I'll do some more tests on this and will post the results here.
Catching this exception is not an option. It is the worst kind of heart attack a thread can suffer, the CPU has detected a serious problem and cannot continue running code. This is invariably caused by misbehaving unmanaged code, it sounds like you've got plenty of it running in your program. You need to focus on debugging that unmanaged code to get somewhere.
The two most common causes of an AV are
Heap corruption. The unmanaged code has written data to the heap improperly, destroying the structural integrity of the heap. Typically caused by overflowing the boundary of an allocated block of memory. Or using a heap block after it was freed. Very hard to diagnose, the exception will be raised long after the damage was done.
Stack corruption. Most typically caused by overflowing the boundaries of an array that was allocated on the stack. This can overwrite the values of other variables on the stack or destroy the function return address. A bit easier to diagnose, it tends to repeat well and has an immediate effect. One side-effect is that the debugger loses its ability to display the call stack right after the damage was done.
Heap corruption is the likely one and the hard one. This is most typically tackled by debugging the code in the debug build with a debug allocator that watches the integrity of the heap. The <crtdbg.h> header provides one. It's not a guaranteed approach, you can have some really nasty Heisenbugs that only rear their head in the Release build. Very few options available then, other than careful code review. Good luck, you'll need it.
It turns out to be an exception caused by Interlocked.
In my code there is an integer _drawThreadIsRunning which is set to 1 when the draw-thread is running, and set to 0 otherwise. I set this value using Interlocked:
if (Interlocked.Exchange(ref _drawThreadIsRunning, 1) == 0) { /* run thread */ }
When i change this line the whole thing works, so it seems that there is a problem with threadsafety somewhere, but i can't figure it out. (ie. i don't want to waste more time figuring it out)
Thanks for the help guys!

Categories