Uncatchable AccessViolationException - C#

I'm getting close to desperate. I am developing a field service application for Windows Mobile 6.1 using C# and quite a lot of P/Invoke (I think I'm referencing about 50 native functions).
Under normal circumstances this works without any problem, but when I start stressing the GC I get a nasty 0xC0000005 error which seems uncatchable. In my test I'm rapidly closing and opening a dialog form (the form did make use of native functions, but for testing I commented these out), and after a while the Windows Mobile error reporter comes around to tell me that there was a fatal error in my application.
My code uses a try-catch around Application.Run(masterForm); and hooks into the CurrentDomain.UnhandledException event, but the application still crashes. Even when I attach the debugger, Visual Studio just tells me "The remote connection to the device has been lost" when the exception occurs.
Since I didn't succeed in catching the exception in the managed environment, I tried to make sense of the Error Reporter log file. But this doesn't make any sense either; the only consistent thing about the error is the application it occurs in.
The thread on which the error occurs is unknown to me, and the module where it occurs differs from time to time (I've seen my application.exe, WS2.dll, netcfagl3_5.dll and mscoree3_5.dll). Even the error code is not always the same: most of the time it's 0xC0000005, but I've also seen a 0x80000002 error, which is a warning according to the first byte?
I tried debugging with BugTrap, but strangely enough it crashes with the same error code (0xC0000005). I tried to open the kdmp file with Visual Studio, but I can't make any sense of it because it only shows me disassembly when I step into the error (unless I have the right .pdb files, which I don't). The same goes for WinDbg.
To make a long story short: I frankly don't have a single clue where to look for this error, and I'm hoping some bright soul on Stack Overflow does. I'm happy to provide some code, but at this moment I don't know which piece to provide.
Any help is greatly appreciated!
[EDIT May 3rd 2010]
As you can see in my comment to Hans I retested the whole program after I uncommented all P/Invokes, but that did not solve my problem. I tried reproducing the error with as little code as possible and eventually it looks like multi-threaded access is the one giving me all the problems.
In my application I have a user control that functions as a finger/flick scroll list. In this control I use a bitmap for each item in the list as a canvas. Drawing on this canvas is handled by a separate thread, and when I disable this thread the error seems to disappear. I'll do some more tests on this and will post the results here.

Catching this exception is not an option. It is the worst kind of heart attack a thread can suffer, the CPU has detected a serious problem and cannot continue running code. This is invariably caused by misbehaving unmanaged code, it sounds like you've got plenty of it running in your program. You need to focus on debugging that unmanaged code to get somewhere.
The two most common causes of an AV are:
Heap corruption. The unmanaged code has written data to the heap improperly, destroying the structural integrity of the heap. This is typically caused by overflowing the boundary of an allocated block of memory, or by using a heap block after it was freed. It is very hard to diagnose, because the exception will be raised long after the damage was done.
Stack corruption. Most typically caused by overflowing the boundaries of an array that was allocated on the stack. This can overwrite the values of other variables on the stack or destroy the function return address. A bit easier to diagnose, it tends to repeat well and has an immediate effect. One side-effect is that the debugger loses its ability to display the call stack right after the damage was done.
Heap corruption is the likely one and the hard one. It is most typically tackled by debugging the code in the Debug build with a debug allocator that watches the integrity of the heap; the <crtdbg.h> header provides one. It's not a guaranteed approach: you can have some really nasty Heisenbugs that only rear their heads in the Release build. There are very few options then, other than careful code review. Good luck, you'll need it.
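The debug-allocator idea can be borrowed in a small way on the managed side of a P/Invoke call. The sketch below assumes a hypothetical `GuardedBuffer` class (not part of any library). It allocates extra canary bytes behind the buffer handed to native code and checks them afterwards, so an overrun past the end is noticed at the call that caused it rather than much later. It cannot detect use-after-free or writes before the start of the block.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: a native buffer followed by canary ("guard") bytes.
// Hand Pointer and Size to the native function, then call CheckGuard().
sealed class GuardedBuffer : IDisposable
{
    private const int GuardSize = 16;    // assumption: 16 guard bytes
    private const byte GuardFill = 0xFD; // assumption: arbitrary fill pattern

    public IntPtr Pointer { get; private set; }
    public int Size { get; }

    public GuardedBuffer(int size)
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        Size = size;
        Pointer = Marshal.AllocHGlobal(size + GuardSize);
        for (int i = 0; i < GuardSize; i++)
            Marshal.WriteByte(Pointer, size + i, GuardFill);
    }

    // Returns false if anything was written past the end of the usable area.
    public bool CheckGuard()
    {
        if (Pointer == IntPtr.Zero) throw new ObjectDisposedException(nameof(GuardedBuffer));
        for (int i = 0; i < GuardSize; i++)
            if (Marshal.ReadByte(Pointer, Size + i) != GuardFill)
                return false;
        return true;
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
        }
    }
}
```

In use, each suspicious native call would be wrapped as allocate, call, then assert `buffer.CheckGuard()` before reading the results.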

It turns out to be an exception caused by Interlocked.
In my code there is an integer _drawThreadIsRunning which is set to 1 when the draw-thread is running, and set to 0 otherwise. I set this value using Interlocked:
if (Interlocked.Exchange(ref _drawThreadIsRunning, 1) == 0) { /* run thread */ }
When I change this line the whole thing works, so it seems that there is a thread-safety problem somewhere, but I can't figure it out (i.e., I don't want to waste more time figuring it out).
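For reference, a common shape of the single-runner guard from the line above, assuming the draw thread resets the flag itself, in a `finally` block, once it is done. `DrawScheduler` and the `drawItems` delegate are hypothetical names standing in for the real control and drawing code. This only shows the intended use of `Interlocked.Exchange`; it does not explain the original crash.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical class: guarantees that at most one draw thread runs at a time.
sealed class DrawScheduler
{
    private int _drawThreadIsRunning; // 0 = idle, 1 = running
    private readonly Action _drawItems;

    public DrawScheduler(Action drawItems) => _drawItems = drawItems;

    // Returns the started thread, or null if a draw thread is already running.
    public Thread? TryStartDrawThread()
    {
        // Atomically set the flag to 1; only the caller that saw 0 may start a thread.
        if (Interlocked.Exchange(ref _drawThreadIsRunning, 1) != 0)
            return null;

        var thread = new Thread(() =>
        {
            try { _drawItems(); }
            finally { Interlocked.Exchange(ref _drawThreadIsRunning, 0); } // always release the flag
        }) { IsBackground = true };
        thread.Start();
        return thread;
    }
}
```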
Thanks for the help guys!

Related

COM Add-in: Resolve the error DisconnectedContext in WinWord.exe

I built an add-on to Microsoft Word. When the user clicks a button, it runs a number of processes that export a list of Microsoft Word documents to Filtered HTML. This works fine.
Where the code falls down is in processing large amounts of files. After the file conversions are done and I call the next function, the app crashes and I get this information from Visual Studio:
Managed Debugging Assistant 'DisconnectedContext' has detected a problem in 'C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE'.
Additional information: Transition into COM context 0x56255b88 for this RuntimeCallableWrapper failed with the following error: System call failed. (Exception from HRESULT: 0x80010100 (RPC_E_SYS_CALL_FAILED)). This is typically because the COM context 0x56255b88 where this RuntimeCallableWrapper was created has been disconnected or it is busy doing something else. Releasing the interfaces from the current COM context (COM context 0x56255cb0). This may cause corruption or data loss. To avoid this problem, please ensure that all COM contexts/apartments/threads stay alive and are available for context transition, until the application is completely done with the RuntimeCallableWrappers that represents COM components that live inside them.
After some testing, I realized that if I simply remove all the code after the file conversions, there are no problems. As a workaround, I placed the remainder of my code behind yet another button.
The problem is I don't want to give the user two buttons. After reading various other threads, it sounds like my code has a memory or threading issue. The answers I am reading do not help me truly understand what to do next.
I feel like this is what I want to do:
1- Run conversion.
2- Close thread/cleanup memory issue from conversion.
3- Continue running code.
Unfortunately, I really don't know how to do #2 or if it is even possible. Your help is very much appreciated.
"...or it is busy doing something else"
The managed debugging assistant diagnostic you got is pretty gobbledygooky but that's the part of the message that accurately describes the real problem. You have a firehose problem, the 3rd most common issue associated with threading. The mishap is hard to diagnose because this goes wrong inside the Word plumbing and not your code.
Trying not to commit the same gobbledygook sin myself: what goes wrong is that the interop calls you make into the Office program are queued, waiting for their turn to get executed. The underlying "system call" that the error code hints at is PostMessage(). Wherever there is a queue, there is a risk that the queue gets too large. That happens when the producer (your program) adds items to the queue far faster than the consumer (the Office program) removes them: the firehose problem. Unless the producer slows down, the queue grows without bounds, and something is going to fail if it is allowed to grow endlessly; at a minimum, the process runs out of memory.
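The firehose effect, and the cure of making the producer wait, can be illustrated outside Office with an in-process queue. This is only an analogy: Word's message queue is not under your control, and a `BlockingCollection<T>` with a bounded capacity stands in for it here. `FirehoseDemo` is a hypothetical name. When the queue is full, `Add` blocks, so the fast producer is forced down to the consumer's pace.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class FirehoseDemo
{
    // Produces `items` work items into a queue holding at most `capacity` entries
    // and returns the largest queue length the producer observed.
    public static int Run(int items, int capacity)
    {
        using var queue = new BlockingCollection<int>(boundedCapacity: capacity);
        int maxObserved = 0;

        // Slow consumer, standing in for the Office program working through its queue.
        var consumer = Task.Run(() =>
        {
            foreach (int item in queue.GetConsumingEnumerable())
                Thread.Sleep(1);
        });

        // Fast producer: instead of failing when the queue is full (as PostMessage
        // does at 10,000 messages), Add() blocks until the consumer makes room.
        for (int i = 0; i < items; i++)
        {
            queue.Add(i);
            maxObserved = Math.Max(maxObserved, queue.Count);
        }
        queue.CompleteAdding();
        consumer.Wait();
        return maxObserved;
    }
}
```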
In practice it is not allowed to get anywhere near that point. The underlying queue that PostMessage() uses is protected by the OS: Windows fails the call when the queue already contains 10,000 messages. That's a fatal error that RPC does not know how to recover from, or rather should not try to recover from. Something is amiss and it isn't pretty. It returns an error code to your program to tell you about it. That's RPC_E_SYS_CALL_FAILED. Nothing much better happens in your program; the CLR doesn't know how to recover from it either, nor does your code. So the show is over: the interop call you made got lost and was not executed by Word.
Finding a completely reliable workaround for this awkward problem is not that straightforward. Beware that this can happen on any interop call, so catching the exception and trying again is drastically impractical. But do keep in mind that the Q+D (quick and dirty) fix is very simple. The plain problem is that your program is running too fast; slowing it down with a Thread.Sleep() or Task.Delay() call is quite crude but will always fix the issue. Well, assuming you delay enough.
I think, but don't know for a fact because nobody ever posts repro code, that this issue is also associated with using a console mode app or a worker thread in your program. If it is a console mode app, then try applying the [STAThread] attribute to your Main() method. If it is a worker thread, then call Thread.SetApartmentState() before starting the thread, but beware that it is very important to also create the Application interface on that worker thread. Neither of these is a workaround for an add-in, though.
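A minimal sketch of the worker-thread variant, assuming a hypothetical `StaRunner.RunOnStaThread` helper. The apartment state has to be set before `Start()`, and the Office `Application` object (omitted here) would have to be created inside `work`, on that same thread. The `OperatingSystem.IsWindows()` check (.NET 5 or later) only exists so the sketch also runs where COM apartments are unavailable.

```csharp
#nullable enable
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

static class StaRunner
{
    // Runs `work` on a new thread that is placed in a single-threaded apartment
    // (STA) on Windows, waits for it, and returns its result or rethrows its exception.
    public static T RunOnStaThread<T>(Func<T> work)
    {
        T result = default!;
        ExceptionDispatchInfo? error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ExceptionDispatchInfo.Capture(ex); }
        });

        // The apartment state can only be set before Start(). COM apartments are a
        // Windows concept, so other platforms skip this step in this sketch.
        if (OperatingSystem.IsWindows())
            thread.SetApartmentState(ApartmentState.STA);

        thread.Start();
        thread.Join();
        error?.Throw();
        return result;
    }
}
```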
If neither of those workarounds is effective, or both are too impractical, then consider that you can automagically slow your program down, and ensure the queue is emptied, by occasionally reading something back from the Office program. Something silly; any property getter call will do. Necessarily, you can't get the property value until the Office program catches up. That can still fail, since there is also a 60-second timeout on the interop call. But that's something you can fix: you can call CoRegisterMessageFilter() in your program to install a callback that runs when the timeout trips. Very gobbledygooky as well, but the cut-and-paste code is readily available.
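Both slow-down approaches from this answer, a plain delay and an occasional read-back, fit the same small loop. In this sketch `Pacing.ProcessWithPacing`, `batchSize` and the `pace` callback are hypothetical names. No batch size or delay is known to be safe; both would have to be tuned against the real workload.

```csharp
using System;
using System.Collections.Generic;

static class Pacing
{
    // Calls `process` for each item and invokes `pace` after every `batchSize` items,
    // giving the out-of-process server a chance to catch up.
    public static void ProcessWithPacing<T>(IEnumerable<T> items, Action<T> process,
                                            int batchSize, Action pace)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        int count = 0;
        foreach (var item in items)
        {
            process(item);
            if (++count % batchSize == 0)
                pace();
        }
    }
}
```

In the add-in, `pace` could be `() => Thread.Sleep(100)` for the crude fix, or a cheap getter such as `() => { _ = wordApp.ActiveDocument.Name; }` for the read-back fix, where `wordApp` stands for the add-in's Word `Application` object. The getter cannot return until Word has worked through the calls queued before it.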

StackOverflowException in .NET >= 4.0 - give other threads chance to gracefully exit

Is there a way to at least postpone termination of a managed app (by a few dozen milliseconds) and set some shared flag to give other threads a chance to terminate gracefully (the SO thread itself obviously wouldn't execute anything further)? I'm contemplating using a JIT debugger or CLR hosting for this, and I'm curious whether anybody has tried this before.
Why would I want to do something so wrong?:
Without too much detail, imagine this analogy: you are in a casino betting on roulette and suddenly find out that the roulette wheel is an unreliable fake. So you want to leave the casino immediately, BUT you likely want to collect your bets from the table first.
Unfortunately I cannot use a separate process for this, as there are very tight performance requirements.
Tried and didn't work:
.NET behavior for StackOverflowException (and contradicting info on MSDN) has been discussed several times on SO - to quickly sum up:
HandleProcessCorruptedStateExceptionsAttribute (e.g. on appdomain unhandled exception handler) doesn't work
ExecuteCodeWithGuaranteedCleanup doesn't work
legacyUnhandledExceptionPolicy doesn't work
There may be a few other attempts at handling StackOverflowExceptions, but it seems apparent that the CLR terminates the whole process, as mentioned in this great answer by Hans Passant.
Considering to try:
JIT debugger - leave the thread with the exception frozen, set some shared flag (likely in a pinned location) and thaw the other threads for a short time.
CLR hosting and setting unhandled exception policy
Do you have any other idea? Or any experience (successful/unsuccessful) with those two ways?
The word "fake" isn't quite the right one for your casino analogy. There was a magnitude 9 earthquake, and the casino building, along with the roulette table, the remaining chips and the player, disappeared in a giant cloud of smoke and dust.
The only shot you have at running code after an SOE is to stay far away from that casino: it has to run in another process. A "guard" process that starts your misbehaving program can use Process.ExitCode to detect the crash; it will be -1073741571 (0xc00000fd). The process state is gone, so you'll have to use one of the .NET out-of-process interop methods (like WCF, named pipes, sockets or a memory-mapped file) to make the guard process aware of things that need to be cleaned up. This needs to be transactional: you cannot reason about the exact point in time that the crash occurred, since it might have died while updating the guard.
Do beware that this is rarely worth the effort, because an SOE is pretty indistinguishable from an everyday process abort, like getting killed by Task Manager, or the machine losing power, or being subjected to the effects of an earthquake :)
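A bare-bones sketch of the guard side, under stated assumptions: `Guard.RunAndWait` is a hypothetical helper, the child executable name is a placeholder, and the transactional cleanup hand-off (WCF, pipes, memory-mapped file) is left out. It only shows how to start the child and recognise the stack-overflow exit code. Note that -1073741571 is simply 0xC00000FD (STATUS_STACK_OVERFLOW) read as a signed 32-bit integer, and that this value is Windows-specific.

```csharp
using System;
using System.Diagnostics;

static class Guard
{
    // STATUS_STACK_OVERFLOW, as reported by Process.ExitCode on Windows.
    public const int StackOverflowExitCode = unchecked((int)0xC00000FD); // == -1073741571

    public static bool IsStackOverflow(int exitCode) => exitCode == StackOverflowExitCode;

    // Starts the (hypothetical) worker executable and blocks until it exits.
    public static int RunAndWait(string fileName, string arguments)
    {
        using var child = Process.Start(new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false
        }) ?? throw new InvalidOperationException("Process did not start.");
        child.WaitForExit();
        return child.ExitCode;
    }
}
```

Usage would look like `int code = Guard.RunAndWait("Worker.exe", "");` followed by `if (Guard.IsStackOverflow(code)) { ... }`, where the body reads the last committed state from whatever channel the worker used and performs the cleanup.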
A StackOverflowException is an immediate and critical exception from which the runtime cannot recover - that's why you can't catch it, or recover from it, or anything else. In order to run another method (whether that's a cleanup method or anything else), you have to be able to create a stack frame for that method, and the stack is already full (that's what a StackOverflowException means!). You can't run another method because running a method is what causes the exception in the first place!
Fortunately, though, this kind of exception is always caused by program structure. You should be able to diagnose and fix the error in your code: when you get the exception, you will see in your call stack that there's a loop of one or more methods recursing indefinitely. You need to identify what the faulty logic is and fix it, and that'll be a lot easier than trying to fix the unfixable exception.
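Two common ways to keep deep recursion from ending in an SOE are shown in this sketch, using a hypothetical linked `Node` type. The first calls `RuntimeHelpers.EnsureSufficientExecutionStack()`, which throws a catchable `InsufficientExecutionStackException` when the remaining stack is getting low. The second replaces the recursion with a loop, so depth no longer consumes stack at all. Neither replaces finding the faulty logic; they only turn the crash into an ordinary exception or remove the stack dependency.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;

// Hypothetical singly linked node used to illustrate deep recursion.
sealed class Node
{
    public Node? Next;
}

static class ChainDepth
{
    // Recursive version, guarded: throws InsufficientExecutionStackException
    // (which can be caught) instead of overflowing the stack.
    public static int Recursive(Node? node)
    {
        RuntimeHelpers.EnsureSufficientExecutionStack();
        return node == null ? 0 : 1 + Recursive(node.Next);
    }

    // Iterative version: uses no extra stack per element.
    public static int Iterative(Node? node)
    {
        int depth = 0;
        for (; node != null; node = node.Next)
            depth++;
        return depth;
    }
}
```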

program has exited with code -1073610751 (0xc0020001)

I'm getting a strange error on a SharpDX program I made.
The program contains one form MainForm, which inherits from SharpDX.Windows.RenderForm (I'm doing Direct3D 9). I have some logic that kills the program by calling MainForm.Close(), and it works perfectly.
However, when I close the form with the X button, or by double clicking the top left corner of the screen, the program ends with code -1073610751 (0xc0020001).
This is a relatively minor annoyance, because it only happens when the program is finishing, so it doesn't really matter if it exits with an error, because it is actually finishing.
However, this error does not happen when I set a breakpoint at the last line of my Main(). If I do so, and then close the window as I explained, the breakpoint gets hit, and resuming ends the program with code 0.
Apart from SharpDX and one pure C DLL I am calling to one-shot process some data, I am not doing mixed code, or any other weird stuff.
I've looked around, but this code appears to be related to string bindings? Other people seem to have this problem when doing weird mixed C++/CLI stuff, but I'm not doing anything like that.
Any ideas? Or at least how to get more concise information on this error code?
It is a very low-level RPC error. RPC is very likely used in your program; it is the underlying protocol on top of which COM runs. There are plenty of candidates: SharpDX itself uses the COM interop layer to make DirectX calls, and DirectX itself is very likely to make such calls to your video driver.
It is also the kind of error code you'd expect to get triggered if there's a shutdown-order problem, like using a COM interface after it was already released. Shutting down a program cleanly can be a difficult problem to solve, especially when there are lots of threads, and there are in any DirectX app. It is also very easy to ignore such a problem, even if it is known and recorded in somebody's bug database, because, as you noted, the program otherwise shuts down okay without any nasty exceptions. RPC already prevented it from blowing up; you are seeing the error code it generated.
There's very little you can do yourself about this problem, this is code you did not write and you'll never find the programmer who did. If you see a first-chance exception notification in the Output window then you could enable the unmanaged debugger, use Debug + Exceptions and tick the Thrown checkbox for Win32 exception, enable the Microsoft Symbol server and you'll get a stack trace when the exception is thrown. Beware this will be in the bowels of native code with no source to look at. But it could pin-point the DLL that's causing the problem. Still nothing you can do to fix that DLL. I'd recommend a video driver update, the most common source of trouble. That's about as far as you can take it.
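To get a little more out of such a code yourself, it helps to convert the signed exit code back to hex and split it into the NTSTATUS fields: bits 31-30 are the severity (3 = error, 2 = warning), bit 29 marks a customer-defined code, bits 27-16 hold the facility and bits 15-0 the code. A sketch with a hypothetical `StatusCode` helper: -1073610751 becomes 0xC0020001, severity 3, facility 2 (which ntstatus.h names FACILITY_RPC_RUNTIME), code 1.

```csharp
using System;

// Hypothetical helper that splits an NTSTATUS-style exit code into its fields.
readonly struct StatusCode
{
    public uint Value { get; }
    public StatusCode(int exitCode) => Value = unchecked((uint)exitCode);

    public uint Severity => Value >> 30;               // 0 success, 1 info, 2 warning, 3 error
    public bool Customer => (Value & 0x20000000) != 0; // bit 29: customer-defined code
    public uint Facility => (Value >> 16) & 0xFFF;     // bits 16-27
    public uint Code => Value & 0xFFFF;                // bits 0-15

    public override string ToString() =>
        $"0x{Value:X8} (severity {Severity}, facility 0x{Facility:X}, code 0x{Code:X})";
}
```

For example, `new StatusCode(-1073610751).ToString()` yields `0xC0020001 (severity 3, facility 0x2, code 0x1)`; the facility and code can then be looked up in ntstatus.h.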

Windows Application doesn't crash when debugging, but crashes otherwise

I have a Windows application that calls an external .dll. After a while, fatal errors related to user marshaling were brought to my attention. A source I found online said that for that particular error I should change my target to x86 rather than AnyCPU. I did so, and now whenever I let the app run, it exits debug mode and the application crashes. But if I set a breakpoint immediately after the .dll call and step over each line until I receive control of the application again, it doesn't crash. Is there anything specific that could be causing this? How does one debug this issue?
Thanks!
When stepping through code makes an issue go away, that is often a symptom of timing problems in the original code. If an external resource loads asynchronously, it will not show up on the stack of the current thread in the debugger, but it can still be called. Stepping over code introduces a delay in the flow.
Thank you all for your suggestions! Fortunately, I ended up getting it to work (with minimal understanding as to why it works) by changing the build target specifically to x86 rather than "AnyCPU". This was suggested by a website I can no longer find :\ Hope this helps others who run into a similar issue!
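If timing is the problem, the robust fix is to make every consumer wait for the resource explicitly instead of depending on the delay that stepping happens to add. A minimal sketch, assuming the load can be exposed as a `Task`; `ExternalResource` and `LoadAsync` are hypothetical, and the `Task.Delay` only simulates a slow load.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper around a resource that is initialised asynchronously.
sealed class ExternalResource
{
    private readonly Task<string> _loading;

    public ExternalResource() => _loading = LoadAsync();

    private static async Task<string> LoadAsync()
    {
        await Task.Delay(100);   // simulated load time; real code would call the DLL here
        return "handle";
    }

    // Every consumer goes through this method, so no one can use the resource
    // before LoadAsync has finished, regardless of timing or debugger delays.
    public async Task<int> UseAsync()
    {
        string handle = await _loading;
        return handle.Length;
    }
}
```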
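One plausible reason the x86 target matters, offered as a guess rather than a diagnosis: under AnyCPU the app runs as a 64-bit process on 64-bit Windows, where a 32-bit native DLL cannot be loaded at all, and P/Invoke declarations that use `int` for pointer-sized native values (handles, pointers) marshal incorrectly. A hypothetical `Bitness` startup check makes that assumption explicit instead of leaving it to the build setting.

```csharp
using System;

static class Bitness
{
    // Throws early if the process bitness does not match what the native DLL was built for.
    public static void Require32BitProcess()
    {
        if (Environment.Is64BitProcess)
            throw new PlatformNotSupportedException(
                "The native DLL is 32-bit; build this application for x86.");
    }

    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static int PointerSize => IntPtr.Size;
}
```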
I consider the most common cause of this sort of thing to be uninitialized variables. They pick up whatever was in memory and the presence of a debugger can easily change what's in the unused part of the stack--the memory that's going to become local variables when the next routine is called. Check the DLLs code.
Note that your "fix" makes me even more suspicious that this is the real answer.
(Then there's also the really crazy case of a problem with the debugger. Long ago I hit a case where the debugger had no problem loading an invalid value into a segment register if you were single stepping.)

Application crash without any kind of Exception

I have a multithreaded .NET C# application; it uses Direct3D 9/10 and XAudio2. (Direct3D is accessed by only one thread, same for XAudio2. Direct3D isn't the problem, because the error manifests in either DX9 or DX10 mode without any change in its behaviour.)
Sometimes (there are some areas that randomly give this problem) this application crashes in a rather unspectacular way. Even if the application is started through Visual Studio with the debugger, it crashes without giving ANY kind of exception. (It starts by saying "applicationname.svchost.exe is crashed, etc. etc. Do you want to debug?", and if I press yes it tells me "you cannot debug an application already closed".)
Is there no way to find out the cause of the crash? I've run out of ideas, and the debugger isn't giving me any information at all. Without an exception I can't even get a stack trace or a dump. :P (I suppose it is a synchronization problem (even though in that area I'm only doing sequential work...), but hey, why isn't it throwing an exception? :|)
In the areas where the problem occurs I'm unloading and reloading a series of classes related to a novel (in a sequential core thread, so I doubt it can be an issue) and starting new music through XAudio2. (BTW, what are the multithreading considerations for XAudio2? Is it safe to call from multiple threads?)
Thanks for the help.
P.S. Is there software I can attach to my application to monitor all the calls and tell me what the last call before the crash was?
You should try using WinDbg; analyzing the crash dump should point you to the problem. If your suspicion is right and it is a synchronization problem, the cause may be hard to spot.
Have you checked the Event Logs in your Windows Administration Panel?
Errors of all kinds are logged in this section, with minimal details.
One time I had an application that was crashing without exceptions and the only help I found was the Event Log Viewer where I discovered that the source of the crash was a StackOverflowException.
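As for the P.S. question: a crude substitute for a call monitor is a breadcrumb log that is flushed on every entry, so the last line in the file survives a crash that takes the process down without an exception. A sketch with a hypothetical `Breadcrumbs` class; flushing every write is slow, so this is meant for diagnosis only, and it is only intended to survive a process crash, not a power cut.

```csharp
using System;
using System.IO;

// Hypothetical breadcrumb logger: every Mark() call is flushed immediately,
// so the last line written before a hard crash remains in the file.
sealed class Breadcrumbs : IDisposable
{
    private readonly StreamWriter _writer;
    private readonly object _sync = new object();

    public Breadcrumbs(string path)
    {
        _writer = new StreamWriter(path, append: true) { AutoFlush = true };
    }

    public void Mark(string what)
    {
        lock (_sync) // the application is multithreaded, so serialise writes
        {
            _writer.WriteLine($"{DateTime.UtcNow:O} [T{Environment.CurrentManagedThreadId}] {what}");
        }
    }

    public void Dispose()
    {
        lock (_sync) _writer.Dispose();
    }
}
```

Calls such as `log.Mark("unloading novel classes")` and `log.Mark("starting XAudio2 music")` around the suspicious areas then show, after a crash, which step was reached last and on which thread.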
