WPF crash dump with original exception call stack and memory - c#

I have a large WPF application which also uses C++ libraries for some functionality.
From time to time the application crashes due to a unhandled exception or access violation in the C++ code. EDIT: it sometime also crashes on the main thread due to unhandled C# exception.
I already employ the following handlers to log information about the crash:
DispatcherUnhandledException
TaskScheduler.UnobservedTaskException
AppDomain.CurrentDomain.UnhandledException
(EDIT: I register to these events very similar to this example: https://stackoverflow.com/a/46804709/2523211)
However, if I enabled dump file creation and these functions are reached (i.e. in an unhandled exception scenario) then the stack of the original exception has already unwound itself and there is no way for me to inspect the call stack along with the memory and threads at the moment of the error itself.
I can obviously see the stack trace itself in the exception that I get as argument to those handlers, but that is as far as that goes and I wish to see the state of my code when the exception was thrown.
(EDIT: The call stack just shows that I'm in a dispatcher frame and I cannot inspect the variables and other memory state of the application at the moment of the exception. I can use the data from the exception itself and see the call stack from it, but that's not enough to reproduce or really understand why the exception happened)
If I don't subscribe to those events then nothing changes, I still can't see the original call stack in the dump file. (EDIT: because I only get a call stack in the dispatcher)
I've also tried to set e.Handled = false but I think that that is the default value anyways.
Is there someway to indicate to WPF's dispatcher or maybe the issue is somewhere else, so that if an exception does happen, let it propagate all the way up such that when a dump is created for it, I will be able to have a helpful dump file?

What you are asking for is impossible to do within dotnet. You want to handle an exception without the stack being unwound, and dump your core for later analysis.
You can however do this outside of dotnet using the WinDbg APIs. Luckily, there is a wrapper for the WinDbg APIs called clrmd.
Just implement IDebugEventCallbacks.Exception and then call WriteDumpFile.
See https://github.com/microsoft/clrmd/blob/124b189a2519c4f13610c6bf94e516243647af2e/src/TestTasks/src/Debugger.cs for more details.
However, you should note. This solution would be attaching a debugger to production. Thus has the associated costs.

If you are just trying to collect information to solve the problem some options include
Use ProcDump https://learn.microsoft.com/en-us/sysinternals/downloads/procdump This can be configured to dump on 1st chance and 2nd chance exceptions, collecting multiple dump files if required i.e. allow app to keep running during 1st chance exceptions while capturing the app state in a dump file. Extensive configuration is possible via cmd line i.e. only handle specific exceptions etc, number of dump files to generate.
Use Time Travel Debugging feature of WinDbg, in this case you can walk backwards through your trace to find cause. This will severely degrade performance and if it takes more than several minutes to reproduce your issue these traces can quickly get very massive. If you copy the TTD folder out of WinDbg installation you can use it via command line to do a "ring mode" trace and specify a maximum buffer size which is more suitable if it takes a long time to reproduce issue, finding the options by running ttd.exe with no parameters.
Create a parent process that attaches as a debugger and will give you near full control over how you to decide to handle exceptions in the child process. Example here of creating a basic debugger

Related

Exception thrown not caught by a `catch` block [duplicate]

NOTE ADDED AFTER SOLUTION: An AccessViolationException was being thrown inside the method called by reflection. This was the reason the TargetInvocationException couldn't be caught.
NOTE: This is OUTSIDE of the IDE. The referenced question is NOT the same.
TL;DR
Can't get a stack trace
Can't get inner exceptions
Can't use debugger (copy protection scheme of a third party library gets in the way)
Any changes to the code prevent the exception from occurring - means I can't add logging to find out where the exception occurrs
How can I get the exception to be caught or get the needed information some other way?
Long description:
I am having a problem with an exception that occurs in a method that gets called by reflection. The exception itself actually occurs in the called method, but since the method is called by reflection, the real exception gets wrapped in a System.Reflection.TargetInvocationException. No problem, just catch it and get the internal exception - except the System.Reflection.TargetInvocationException doesn't get caught. My program crashes, and I get a dump along with an entry in the Windows event log.
The Windows event log doesn't contain the internal exceptions, and neither does the dump.
I can't attach a debugger to the program because then an external library (which is needed to get to the reflection call) won't run - copy protection, don't you know.
If I put a try/catch into the offending method, the exception doesn't occur - which is bad. The cause isn't fixed, it just doesn't happen any more.
The same effect happens if I put logging into the offending method - the exception doesn't occur any more.
I can't use logging, I can't use a debugger, and in the one place where I could catch the exception and log it the exception doesn't get caught.
I'm using Visual Studio 2010 and dotnet 4.0.
To make it clear: The try/catch doesn't work when the program is run outside of Visual Studio, and I can't run it inside of Visual Studio in the debugger because then the program can't reach the point where the exception occurs. This is NOT within the IDE.
Eliminating the reflection isn't an option (I've tried it for just the one case, and the exception goes away.)
The method being called does a lot of stuff, but breaking it down into smaller methods doesn't help - the exception just goes away.
The exception doesn't occur all the time, only when I do a certain sequence of steps - and when it occurs then it is always on the second time through the whole sequence.
In the sequence I am using, the method gets called by two threads almost simultaneously - a particular bunch of data gets entered, which causes a copy of a report and another document to be printed on two separate printers - one report and document to each printer. Since generating the reports and printing the document can take a while, they are done on threads in the background so the user can continue working.
I suspect that the threads are stepping on each others toes (lots of file and database manipulations going on) but without knowing what is really happening I can't fix it.
The code below shows a simplified version of the call by reflection.
Does any one have a suggestion as to what may be causing the System.Reflection.TargetInvocationException to not be caught, or maybe an alternative way to catch the inner exception?
Try
Dim methode As System.Reflection.MethodInfo
methode = GetType(AParticularClass).GetMethod("OneOfManyMethods", Reflection.BindingFlags.NonPublic Or Reflection.BindingFlags.Static)
Dim resultobject As Object = methode.Invoke(Nothing, Reflection.BindingFlags.InvokeMethod Or Reflection.BindingFlags.NonPublic Or Reflection.BindingFlags.Static, Nothing, New Object() {SomeBooleanVariable, SomeStringVariable}, Nothing)
result = DirectCast(resultobject, DataSet)
Catch ex As Exception
'Log the error here.
End Try
Found the reason why I couldn't catch the exception:
The actual exception was an AccessViolationException, which can't be caught in dotnet 4.0 without taking special steps (How to handle AccessViolationException.)
To make things more fun, when AccessViolationException gets thrown in a method called by reflection, only a TargetInvocationException gets logged in the Windows event log, and only the TargetInvocationException is available in the dump.
Once I managed to get the real exception, I found that that it was a call to Application.DoEvents() from a non-GUI thread that caused the AccessViolation. DoEvents() can cause enough fun when called on the GUI-Thread (Use of Application.DoEvents()), let alone when called from a background thread.
Once that was fixed, I found that our third party library (the one with the copy protection) doesn't like being called simultaneously in separate instances. Fixing that was a matter of a synclock in the right place.
The code causing all of this fun was at one time all in the GUI-Thread, and was originally written back in the days of dotnet 1.1 - this explains the call to DoEvents. The code has been converted piece-wise to run in parallel in background threads through several different stages, with no one single developer having a complete overview of the process.
I am not sure I can replicate your issue, but this is how we get it and it work's just fine...
Try
'YOUR CODE'
Catch ex As Exception
'This is where we grab it from... It needs to be in this block to work...
System.Reflection.MethodInfo.GetCurrentMethod.ToString
End Try
Let me know how it work's out for you?

How to get information on a Buffer Overflow Exception in a mixed application?

In all WPF application I develop there is a global exception handler subscribed to AppDomain.CurrentDomain.UnhandledException which logs everything it can find and then shows a dialog box telling the user to contact the author, where the log file is etc. This works extremely well and both clients and me are very happy with it because it allows fixing problems fast.
However during development of a mixed WPF / C# / CLI / C++ application there are sometimes application crashes that do not make it to the aforementioned exception handler. Instead, a standard windows dialog box saying "XXX hast stopped working" pops up. In the details it shows eg
Problem Event Name: BEX
Application Name: XXX.exe
Fault Module Name: clr.dll
...
This mostly happens when calling back a managed function from within umanaged code, and when that function updates the screen. I didn't took me long to figure out the root cause of problem, but only because I can reproduce the crash at my machine and hook up the debugger: in all occasions the native thread was still at the point of calling the function pointer to the managed delegate that calls directly into C#/WPF code.
The real problem would be when this happens on a client machine: given that usually clients are not the best error reporters out there it migh take very, very long to figure out what is wrong when all they can provide me are the details above.
The question: what can I do to get more information for crashes like these? Is there a way to get an exception like this call a custom error handler anyway? Or get a process dump? When loading the symbols for msvcr100_clr0004.dll and clr.dll (loaded on the thread where the break occurrs), the call stack is like this:
msvcr100_clr0400.dll!__crt_debugger_hook()
clr.dll!___report_gsfailure() + 0xeb bytes
clr.dll!_DoJITFailFast#0() + 0x8 bytes
clr.dll!CrawlFrame::CheckGSCookies() + 0x2c3b72 bytes
Can I somehow hook some native C++ code into __crt_debugger_hook() (eg for writing a minidump)? Which leads me to an additional question: how does CheckGSCookies behave on a machine with no debugger installed, would it still call the same code?
update some clarification on the code: native C++ calls a CLI delegate (to which a native function pointer is acquired using GetFunctionPointerForDelegate which in turn calls a C# System.Action. This Action updates a string (bound to a WPF label) and raises a propertychanged event. This somehow invokes a buffer overflow (when updating very fast) in an unnamed thread that was not created directly in my code.
update looking into SetUnhandledExceptionFilter, which didn't do anything at first, I found this nifty article explaining how to catch any exception. It works, and I was able to write a minidump in an exception filter installed by using that procedure. The dump gives basically the same information as hooking the debugger: the real problem seems to be that the Status string is being overwritten (by being called from the native thread) while at the same time being read from the ui thread. All nice ans such, but it does require dll hooking which is not my favorite method to solve things. Another way would still be nice.
The .NET 4 version of the CLR has protection against buffer overflow attacks. The basic scheme is that the code writes a "cookie" at the end of an array or stack frame, then checks later if that cookie still has the original value. If it has changed, the runtime assumes that malware has compromised the program state and immediately aborts the program. This kind of abort does not go through the usual unhandled exception mechanism, that would be too exploitable.
The odds that your program is really being attacked are small of course. Much more likely is that your native code has a pointer bug and scribbles junk into a CLR structure or stack frame. Debugging that isn't easy, the damage is usually done well before the crash. One approach is to comment out code until the crash disappears.

log a StackOverflowException in .NET

I recently had a stack overflow exception in my .NET app (asp.net website), which I know because it showed up in my EventLog. I understand that a StackOverflow exception cannot be caught or handled, but is there a way to log it before it kills your application? I have 100k lines of code. If I knew the stack trace, or just part of the stack trace, I could track down the source of the infinite loop/recursion. But without any helpful diagnostic information, it looks like it'll take a lot of guess-and-test.
I tried setting up an unhandled exception handler on my app domain (in global.asax), but that didn't seem to execute, which makes sense if a stack overflow is supposed to terminate and not run any catch/finally blocks.
Is there any way to log these, or is there some secret setting that I can enable that will save the stack frames to disk when the app crashes?
Your best bet is to use ADPlus.
The Windows NT Kernel Debugging Team has a nice blog post on how to catch CLR exceptions with it. There are a lot of details on that there about different configuration options for it.
ADPlus will monitor the application. You can specify that it run in crash mode so you can tell ADPlus to take a memory dump right as the StackOverflowException is happening.
Once you have a memory dump, you can use WinDbg and a managed debugging extension like Psscor to see what was going on during the stack overflow.
Very wise to ask about StackOverFlowException on StackOverflow :)
AFAIK, about the only thing you can do is get a dump on a crash. The CLR is going to take you down unless you host your own CLR.
A related question on getting a dump of IIS worker process on crash:
Getting IIS Worker Process Crash dumps
Here's a couple other SO related threads:
How to print stack trace of StackOverflowException
C# catch a stack overflow exception
Hope these pointers help.
You could try logging messages at key points in your application and take out the part where the flow of log messages break.
Then, try and replicate it with that section of code using unit tests.
The only way to get a trace for a stack overflow error that I know of is to have an intermediary layer taking snapshots of the running code and writing that out. This always has a heavy performance impact.

Third-party dll crashes program with no exception thrown

I am using Visual Studio 2010, and coding in C#. I have a third-party dll that I am using in my project. When I attempt to use a specific method, at seemingly random occasions, the program simply crashes, with no exception thrown. The session simply ends. Is there any way I can trace what is going on?
The way the stack for a thread is laid out in Windows goes like this (roughly; this is not an exact description of everything that goes on, just enough to give you the gist. And the way the CLR handles stack pages is somewhat different than how unmanaged code handles it also.)
At the top of the stack there are all the committed pages that you are using. Then there is a "guard page" - if you hit that page then the guard page becomes a new page of stack, and the following page becomes the new guard page. However, the last page of stack is special. If you hit it once, you get a stack overflow exception. If you hit it twice then the process is terminated immediately. By "immediately" I mean "immediately" - no exception, go straight to jail, do not pass go, do not collect $200. The operating system reasons that at this point the process is deeply diseased and possibly it has become actively hostile to the user. The stack has overflowed and the code that is overflowing the stack might be trying to write arbitrarily much garbage into memory. (*)
Since the process is potentially a hazard to itself and others, the operating system takes it down without allowing any more code to run.
My suspicion is that something in your unmanaged code is hitting the final stack page twice. Almost every time I see a process suddenly disappear with no exception or other explanation its because the "don't mess with me" stack page was hit.
(*) Back in the early 1990s I worked on database drivers for a little operating system called NetWare. It did not have these sorts of protections that more modern operating systems now have routinely. I needed to be able to "switch stacks" dynamically while running at kernel protection level; I knew when my driver had accidentally blown the stack because it would eventually write into screen memory and I could then debug the problem by looking at what garbage had been written directly to the screen. Ah, those were the days.
Have you checked the Windows Event Log? You can access that in the Admin Tools menu > Event Viewer. Check in the Application and System logs particularly.
Try to force the debugger to catch even handled exceptions - especially the bad ones like Access Violation and Stack Overflow. You can do this in Debug -> Exceptions.
It is possible that the third-party DLL catches all exceptions and then calls exit() or some similar beauty which quits the whole program.
If your third-party dll is managed, using Runtime Flow (developed by me) you can see what happens inside of it before the crash - a stack overflow, a forceful exit or an exception will be clearly identifiable.

How to catch exceptions from another program (for logging)?

I am working on a tool that monitors a number of applications and ensures they are always running and in a clean state. Some of these applications have unhandled exceptions which do occur periodically and present the 'send crash report' window. I do not have the source code to these applications.
Is there any mechanism I could use to catch the exceptions, or simply identify their exception type, as well as identify the application's main executable file that threw the exception.
I'm not trying to do anything crazy like catch and handle it on the applications behalf, I'm simply trying to capture the exception type, log it and then restart the application.
Trapping unhandled exceptions requires calling SetUnhandledExceptionFilter() in the process. That's going to be difficult to do if you don't have source code, although it is technically possible by injecting a DLL into the process. This however cannot be done with managed code, you can't get the CLR initialized properly.
The default unhandled exception handler that Windows installs will always run WerFault.exe, the Windows Error Reporting tool. That can be turned off but that's a system setting. Expecting your user or admin to do this is not realistic. Only after WER runs will the JIT debugger get a shot at it.
I recommend a simpler approach, one that's also much more selective. Use the Process class to get the program you're interested in protecting started. When the Exited event fires, use the ExitCode property to find out how it terminated. Any negative value is a sure sign that the process died on an unhandled exception, the exit code matches the exception code. You can use the EventLog class to read the event message that WER writes to the Windows event log. And you can restart it the same way you got it started.
Without modifying the source of the application or injecting a DLL into the process I do not believe this is possible in a reliable fashion. What you're attempting to do is inspect type information across a process boundary. This is not easy to achieve for a number of reasons
Getting the name of the exception implies executing code in the target process. The name can be generated a number of ways (.ToString, or .GetType().Name) but all involve executing some method
You want to do this after the application has already crashed and hence may be in a very bad state. Consider trying to execute code if memory was corrupted (perhaps even corrupting the in memory definitions of the type).
The crash could occur in native code and not involve any managed data
If you want to monitor application crashes system wide, you can register yourself as a just-in-time debugger. You can edit the registry to specify which debugger to run when an application crashes. The example they give is Doctor Watson, but it could be your application instead.

Categories