I recently had a stack overflow exception in my .NET app (asp.net website), which I know because it showed up in my EventLog. I understand that a StackOverflow exception cannot be caught or handled, but is there a way to log it before it kills your application? I have 100k lines of code. If I knew the stack trace, or just part of the stack trace, I could track down the source of the infinite loop/recursion. But without any helpful diagnostic information, it looks like it'll take a lot of guess-and-test.
I tried setting up an unhandled exception handler on my app domain (in global.asax), but that didn't seem to execute, which makes sense if a stack overflow is supposed to terminate and not run any catch/finally blocks.
Is there any way to log these, or is there some secret setting that I can enable that will save the stack frames to disk when the app crashes?
Your best bet is to use ADPlus.
The Windows NT Kernel Debugging Team has a nice blog post on how to catch CLR exceptions with it. The post goes into a lot of detail about its different configuration options.
ADPlus will monitor the application. You can specify that it run in crash mode so you can tell ADPlus to take a memory dump right as the StackOverflowException is happening.
Once you have a memory dump, you can use WinDbg and a managed debugging extension like Psscor to see what was going on during the stack overflow.
Very wise to ask about StackOverflowException on StackOverflow :)
AFAIK, about the only thing you can do is get a dump on a crash. The CLR is going to take you down unless you host your own CLR.
A related question on getting a dump of IIS worker process on crash:
Getting IIS Worker Process Crash dumps
Here are a couple of other related SO threads:
How to print stack trace of StackOverflowException
C# catch a stack overflow exception
Hope these pointers help.
You could try logging messages at key points in your application and isolate the part where the flow of log messages breaks.
Then, try and replicate it with that section of code using unit tests.
The only way to get a trace for a stack overflow error that I know of is to have an intermediary layer taking snapshots of the running code and writing that out. This always has a heavy performance impact.
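A lightweight variant of the "log at key points" idea is to count nesting depth in the methods you suspect and write out the current stack once the count passes a threshold, while the process is still healthy. The following is only a sketch under assumptions: RecursionGuard, MaxDepth, Log and Sample.Countdown are hypothetical names, not part of any library, and the guard only sees methods you instrument by hand. It will not help if the overflow happens in unmanaged or uninstrumented code.

```csharp
using System;

// Hypothetical helper: counts nesting depth per thread and logs the
// current stack once when the depth reaches a configurable threshold.
public static class RecursionGuard
{
    // Assumed threshold; choose a value far below the depth at which
    // the real stack would overflow.
    public static int MaxDepth = 500;

    // Assumed logging hook; replace with your own logger.
    public static Action<string> Log = message => Console.Error.WriteLine(message);

    [ThreadStatic]
    private static int depth;

    public static void Enter(string method)
    {
        depth++;
        if (depth == MaxDepth)
        {
            // Environment.StackTrace captures the managed call stack at this
            // point, which should show the repeating frames.
            Log("Possible runaway recursion in " + method
                + Environment.NewLine + Environment.StackTrace);
        }
    }

    public static void Exit()
    {
        depth--;
    }
}

// Hypothetical instrumented method, shown only to illustrate usage.
public static class Sample
{
    public static int Countdown(int n)
    {
        RecursionGuard.Enter("Countdown");
        try
        {
            return n == 0 ? 0 : 1 + Countdown(n - 1);
        }
        finally
        {
            RecursionGuard.Exit();
        }
    }
}
```

Logging only when the counter equals the threshold (rather than exceeds it) keeps a genuinely infinite recursion from flooding the log before the process dies.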
I have a large WPF application which also uses C++ libraries for some functionality.
From time to time the application crashes due to an unhandled exception or access violation in the C++ code. EDIT: it sometimes also crashes on the main thread due to an unhandled C# exception.
I already employ the following handlers to log information about the crash:
DispatcherUnhandledException
TaskScheduler.UnobservedTaskException
AppDomain.CurrentDomain.UnhandledException
(EDIT: I register to these events very similar to this example: https://stackoverflow.com/a/46804709/2523211)
However, if I enable dump file creation and these functions are reached (i.e. in an unhandled exception scenario), then the stack of the original exception has already unwound and there is no way for me to inspect the call stack, memory and threads as they were at the moment of the error itself.
I can obviously see the stack trace itself in the exception that I get as argument to those handlers, but that is as far as that goes and I wish to see the state of my code when the exception was thrown.
(EDIT: The call stack just shows that I'm in a dispatcher frame and I cannot inspect the variables and other memory state of the application at the moment of the exception. I can use the data from the exception itself and see the call stack from it, but that's not enough to reproduce or really understand why the exception happened)
If I don't subscribe to those events then nothing changes, I still can't see the original call stack in the dump file. (EDIT: because I only get a call stack in the dispatcher)
I've also tried to set e.Handled = false but I think that that is the default value anyways.
Is there some way to tell WPF's dispatcher (or maybe the issue is somewhere else) to let an exception propagate all the way up, so that when a dump is created for it, I get a helpful dump file?
What you are asking for is impossible to do within dotnet. You want to handle an exception without the stack being unwound, and dump your core for later analysis.
You can however do this outside of dotnet using the WinDbg APIs. Luckily, there is a wrapper for the WinDbg APIs called clrmd.
Just implement IDebugEventCallbacks.Exception and then call WriteDumpFile.
See https://github.com/microsoft/clrmd/blob/124b189a2519c4f13610c6bf94e516243647af2e/src/TestTasks/src/Debugger.cs for more details.
Note, however, that this solution means attaching a debugger to a production process, with all the associated costs.
If you are just trying to collect information to solve the problem some options include
Use ProcDump https://learn.microsoft.com/en-us/sysinternals/downloads/procdump This can be configured to dump on 1st chance and 2nd chance exceptions, collecting multiple dump files if required, i.e. it lets the app keep running during 1st chance exceptions while capturing the app state in a dump file. Extensive configuration is possible via the command line, e.g. handling only specific exceptions or limiting the number of dump files to generate.
Use the Time Travel Debugging feature of WinDbg, which lets you walk backwards through a trace to find the cause. This severely degrades performance, and if it takes more than several minutes to reproduce your issue, the traces can quickly become very large. If you copy the TTD folder out of the WinDbg installation, you can use it from the command line to record a "ring mode" trace with a maximum buffer size, which is more suitable if the issue takes a long time to reproduce. Run ttd.exe with no parameters to see the options.
Create a parent process that attaches as a debugger, which gives you near full control over how you decide to handle exceptions in the child process. Example here of creating a basic debugger
I have a big and complex process that runs in a production environment; it is basically a WPF user interface developed in C#. It also hosts threads and DLLs written in C++, both unmanaged and managed code.
Typically, if an exception is raised, it's caught and the related stack dump is written to a log file for post-mortem debugging purposes. Unfortunately, from time to time, the application crashes without writing any information to the log, so we have no clue about what is causing the crash.
Does anybody know how to detect and eventually trace all the causes that make the application crash and are not detected with a simple try-catch block?
To give an example, I saw that a StackOverflow exception is not caught, and errors like 0xc0000374 coming from unmanaged code are not detected either. This is not a matter of debugging it. I know I can attach a debugger to the system and try to reproduce the problem. But as I said, this is a production system and I have to analyze issues coming from the field after the problem occurred.
Unlike C# exception handling, C++ exception handling does not catch hardware exceptions such as access violations or stack overflows, since C++ apps run unmanaged and directly on the CPU.
For post-crash analysis I would suggest using something like breakpad. Breakpad will create a dump file which gives you very useful information such as the call stack, the running threads and stack/heap memory, depending on your configuration.
This way you would not need to witness the crash happening or even try to reproduce it, which I know from experience can be very difficult. All you would need is a way to retrieve these crash dumps from your users' devices.
You can log the exception by subscribing to the AppDomain.UnhandledException event. Its args.ExceptionObject argument is of type object and is not limited to C# exceptions, so you can call its ToString method to log it somewhere.
Also check MSDN docs for its limitations. For instance:
Starting with the .NET Framework 4, this event is not raised for exceptions that corrupt the state of the process, such as stack overflows or access violations, unless the event handler is security-critical and has the HandleProcessCorruptedStateExceptionsAttribute attribute.
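As a rough illustration of the subscription described above, a handler might look like the sketch below. CrashLogger, LogPath and Format are hypothetical names introduced here, and the log location is an assumption. Given the limitation quoted above, this handler should not be expected to run for stack overflows or access violations.

```csharp
using System;
using System.IO;

// Hypothetical helper that appends unhandled exceptions to a text file.
public static class CrashLogger
{
    // Assumed log location; pick a path your process can write to.
    public static string LogPath = Path.Combine(Path.GetTempPath(), "crash.log");

    // Call once at startup, before any other work begins.
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;
    }

    private static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        File.AppendAllText(LogPath, Format(e.ExceptionObject, e.IsTerminating));
    }

    // ExceptionObject is typed as object, so ToString() is the only call
    // that is safe regardless of what was thrown.
    public static string Format(object exceptionObject, bool isTerminating)
    {
        string details = exceptionObject == null
            ? "(no exception object)"
            : exceptionObject.ToString();
        return DateTime.UtcNow.ToString("o") + " terminating=" + isTerminating
            + Environment.NewLine + details + Environment.NewLine;
    }
}
```

Keeping the formatting separate from the file write makes it possible to check the log text without crashing a process.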
Solved! I followed Mohamad Elghawi's suggestion and integrated breakpad. After struggling a lot to get it compiling, linking and working under Visual Studio 2008, I was able to catch critical system exceptions and generate a crash dump. I'm also able to generate a dump on demand, which is useful when the application gets stuck for some reason and I can detect that with an external thread that monitors all the others.
Pay attention! The Visual Studio solution isn't included in the git repo, and, contrary to what some threads claim, neither is the gyp tool. You have to download the gyp tool separately and work a bit on the .gyp files inside the breakpad tree in order to generate a proper solution. Furthermore, some include files and definitions are missing if you want to compile it in Visual Studio 2008, so you also have to deal with that.
Thanks guys!
I am trying to figure out a way to debug exceptions that I have received in Azure's Application insights.
I am new to this type of debugging since I've only really dealt with bugs in Visual Studio, where an active debugger is running. However, with Application Insights, there are null reference exceptions which only provide a call stack, and no useful exception message.
Exception Message: Arg_NullReferenceException
Callstack:at SharedLibrary!<BaseAddress>+0x68d4c5
--- End of stack trace from previous location where exception was thrown ---
at SharedLibrary!<BaseAddress>+0x329115
at SharedLibrary!<BaseAddress>+0x329207
at SharedLibrary!<BaseAddress>+0x34d603
Other exceptions have messages such as Excep_FromHResult 0x800455A0, while others actually show the methods they trace back to.
Is there a way to find where these exceptions came from by deciphering the call stack, the base address or the HResult?
This will be very useful in eliminating bugs in my app.
UPDATE: This is now supported by the HockeyApp telemetry stack. See: http://support.hockeyapp.net/kb/client-integration-windows-and-windows-phone/crash-reporting-for-uwp
When your application is compiled with .NET Native, the resulting binary doesn't contain all of the rich metadata that is normally available to .NET applications. (You get the same behavior if you call Environment.StackTrace when compiled with .NET Native.) We do write all of that data into the pdb file that is generated but it's not available at runtime.
The solution here is to post facto rebuild your stacks using the information from the pdb files. I know the AppInsights team had this on their backlog but it doesn't seem to have happened. We have some diagnostic tools that we're attempting to get published so you can do this re-combobulation yourself but there's a bit of a morass getting them published.
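The first step of any such reconstruction is pulling the module name and offset out of each frame, so they can later be matched against the pdb. The sketch below covers only that parsing step, assuming frames look like the ones quoted in the question. NativeFrame and NativeStackParser are hypothetical names, and the actual symbol lookup depends on tooling that is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for one unresolved frame.
public sealed class NativeFrame
{
    public string Module;
    public long Offset; // offset from the module's base address
}

public static class NativeStackParser
{
    // Matches frames of the form "at Module!<BaseAddress>+0x1234".
    private static readonly Regex FramePattern = new Regex(
        @"at\s+(?<module>[^!\s]+)!<BaseAddress>\+0x(?<offset>[0-9a-fA-F]+)");

    public static List<NativeFrame> Parse(string callStack)
    {
        var frames = new List<NativeFrame>();
        foreach (Match match in FramePattern.Matches(callStack))
        {
            frames.Add(new NativeFrame
            {
                Module = match.Groups["module"].Value,
                Offset = long.Parse(
                    match.Groups["offset"].Value,
                    NumberStyles.HexNumber,
                    CultureInfo.InvariantCulture)
            });
        }
        return frames;
    }
}
```

Lines that are not frames, such as the "End of stack trace from previous location" separator, are simply skipped because they do not match the pattern.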
If you send a mail to dotnetnative#microsoft.com describing this issue it may help to grease some wheels.
I have inherited a large and complex C# windows service project that crashes every now and then. The logging system is not logging any messages which I initially thought strange but I now understand that logging might fail if there's a stack overflow or out-of-memory exception.
So one of the tasks that I have is to try and find any recursive functions that might blow the stack. Is there any tooling in VS2010 or other code analysis software that would help detect recursive code?
As a second question: What else could cause logging to fail in a windows service?
(Project uses VS2010 but still targets .net 3.5 with C# 3.0)
Download the Debug Diagnostic Tool, point it at your service, add stack overflow to the exception list and let it run. When the service fails, it will dump the memory. Open the dump in Visual Studio and check all stacks on all threads to identify the offending code. You might need the original debugging symbols for your service to get intelligible information.
More about debugging memory dumps with VS2010 here. For more about debugging this kind of problem with Tess Ferrandez, watch this
Update: Tutorial on a stack overflow exception with details. It is based on a web app in IIS but you can easily apply the same technique to a service, it is just the way you take the memory dump that is different.
HTH
Are you attaching to the AppDomain.UnhandledException event? It should be raised if an unhandled exception occurs. Also, have you checked the Event Log?
It's very difficult to try and guess what could cause your service to crash. If you are attached to the event I mentioned then I guess it could only really be one of a few events, a StackOverflow exception being one. If you're not attaching to that event it could be anything.
If you're really at a loss you can always try to run the service as a console application from within Visual Studio. Visual Studio should then show you the error if it does occur. This is not always possible depending on your environment.
I am using Visual Studio 2010, and coding in C#. I have a third-party dll that I am using in my project. When I attempt to use a specific method, at seemingly random occasions, the program simply crashes, with no exception thrown. The session simply ends. Is there any way I can trace what is going on?
The way the stack for a thread is laid out in Windows goes like this (roughly; this is not an exact description of everything that goes on, just enough to give you the gist. And the way the CLR handles stack pages is somewhat different than how unmanaged code handles it also.)
At the top of the stack there are all the committed pages that you are using. Then there is a "guard page" - if you hit that page then the guard page becomes a new page of stack, and the following page becomes the new guard page. However, the last page of stack is special. If you hit it once, you get a stack overflow exception. If you hit it twice then the process is terminated immediately. By "immediately" I mean "immediately" - no exception, go straight to jail, do not pass go, do not collect $200. The operating system reasons that at this point the process is deeply diseased and possibly it has become actively hostile to the user. The stack has overflowed and the code that is overflowing the stack might be trying to write arbitrarily much garbage into memory. (*)
Since the process is potentially a hazard to itself and others, the operating system takes it down without allowing any more code to run.
My suspicion is that something in your unmanaged code is hitting the final stack page twice. Almost every time I see a process suddenly disappear with no exception or other explanation, it's because the "don't mess with me" stack page was hit.
(*) Back in the early 1990s I worked on database drivers for a little operating system called NetWare. It did not have these sorts of protections that more modern operating systems now have routinely. I needed to be able to "switch stacks" dynamically while running at kernel protection level; I knew when my driver had accidentally blown the stack because it would eventually write into screen memory and I could then debug the problem by looking at what garbage had been written directly to the screen. Ah, those were the days.
Have you checked the Windows Event Log? You can access that in the Admin Tools menu > Event Viewer. Check in the Application and System logs particularly.
Try to force the debugger to catch even handled exceptions - especially the bad ones like Access Violation and Stack Overflow. You can do this in Debug -> Exceptions.
It is possible that the third-party DLL catches all exceptions and then calls exit() or some similar beauty which quits the whole program.
If your third-party dll is managed, using Runtime Flow (developed by me) you can see what happens inside of it before the crash - a stack overflow, a forceful exit or an exception will be clearly identifiable.