Crash in GC finalizer thread, what's the problem with "DestroyScout"? - c#

I'm dealing with a .NET server application that crashes almost weekly with a problem in the "GC Finalizer Thread", more precisely at line 798 of "mscorlib.dll ...~DestroyScout()", according to Visual Studio.
Visual Studio also tries to open the file "DynamicILGenerator.cs". I don't have this file, but I've found a version of it in which line 798 is indeed inside the destructor of DestroyScout (whatever that might mean).
I have the following information in my Visual Studio environment:
Threads :
Not Flagged > 5892 0 Worker Thread GC Finalizer Thread mscorlib.dll!System.Reflection.Emit.DynamicResolver.DestroyScout.~DestroyScout
Call stack:
[Managed to Native Transition]
> mscorlib.dll!System.Reflection.Emit.DynamicResolver.DestroyScout.~DestroyScout() Line 798 C#
[Native to Managed Transition]
kernel32.dll!#BaseThreadInitThunk#12() Unknown
ntdll.dll!__RtlUserThreadStart() Unknown
ntdll.dll!__RtlUserThreadStart#8() Unknown
Locals (no way to be sure if that $exception object is correct):
+ $exception {"Exception of type 'System.ExecutionEngineException' was thrown."} System.ExecutionEngineException
this Cannot obtain value of the local variable or argument because it is not available at this instruction pointer,
possibly because it has been optimized away. System.Reflection.Emit.DynamicResolver.DestroyScout
Stack objects No CLR objects were found in the stack memory range of the current frame.
Source code of "DynamicILGenerator.cs", mentioning the DestroyScout class (line 798 is mentioned in comment):
private class DestroyScout
{
    internal RuntimeMethodHandleInternal m_methodHandle;

    [System.Security.SecuritySafeCritical] // auto-generated
    ~DestroyScout()
    {
        if (m_methodHandle.IsNullHandle())
            return;

        // It is not safe to destroy the method if the managed resolver is alive.
        if (RuntimeMethodHandle.GetResolver(m_methodHandle) != null)
        {
            if (!Environment.HasShutdownStarted &&
                !AppDomain.CurrentDomain.IsFinalizingForUnload())
            {
                // Somebody might have been holding a reference on us via weak handle.
                // We will keep trying. It will be hopefully released eventually.
                GC.ReRegisterForFinalize(this);
            }
            return;
        }

        RuntimeMethodHandle.Destroy(m_methodHandle); // <===== line 798
    }
}
Watch window (m_methodHandle):
m_methodHandle Cannot obtain value of the local variable or argument because
it is not available at this instruction pointer,
possibly because it has been optimized away.
System.RuntimeMethodHandleInternal
General dump module information:
Dump Summary
------------
Dump File: Application_Server2.0.exe.5296.dmp : C:\Temp_Folder\Application_Server2.0.exe.5296.dmp
Last Write Time: 14/06/2022 19:08:30
Process Name: Application_Server2.0.exe : C:\Runtime\Application_Server2.0.exe
Process Architecture: x86
Exception Code: 0xC0000005
Exception Information: The thread tried to read from or write to a virtual address
for which it does not have the appropriate access.
Heap Information: Present
System Information
------------------
OS Version: 10.0.14393
CLR Version(s): 4.7.3920.0
Modules
-------
Module Name Module Path Module Version
----------- ----------- --------------
...
clr.dll C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll 4.7.3920.0
...
Be aware: the dump was created on a Windows Server 2016 computer, and I'm investigating it in my Windows 10 environment (don't be misled by the OS version in the dump summary)!
Edit
What might the destroyscout be trying to destroy? That might be very interesting.

I don't know what exactly is causing this crash, but I can tell you what DestroyScout does.
It's related to creating dynamic methods. The class DynamicResolver needs to clean up related unmanaged memory, which is not tracked by GC. But it cannot be cleaned up until there are definitely no references to the method anymore.
However, malicious (or outright weird) code can use a long WeakReference, which can survive a GC, and thereby resurrect the reference to the dynamic method after its finalizer has run. Hence DestroyScout comes along with its strange GC.ReRegisterForFinalize code to ensure that it is the last reference to be destroyed.
This is explained in a comment in the source code:
// We can destroy the unmanaged part of dynamic method only after the managed part is definitely gone and thus
// nobody can call the dynamic method anymore. A call to finalizer alone does not guarantee that the managed
// part is gone. A malicious code can keep a reference to DynamicMethod in long weak reference that survives finalization,
// or we can be running during shutdown where everything is finalized.
//
// The unmanaged resolver keeps a reference to the managed resolver in long weak handle. If the long weak handle
// is null, we can be sure that the managed part of the dynamic method is definitely gone and that it is safe to
// destroy the unmanaged part. (Note that the managed finalizer has to be on the same object that the long weak handle
// points to in order for this to work.) Unfortunately, we can not perform the above check when out finalizer
// is called - the long weak handle won't be cleared yet. Instead, we create a helper scout object that will attempt
// to do the destruction after next GC.
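The pattern described in that comment can be sketched in ordinary C#. This is only an illustration under stated assumptions, not the runtime's implementation: the class name RetryingScout, the cleanup callback, and the choice of a GCHandle of type WeakTrackResurrection as the "long weak handle" are all mine.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; this is NOT the runtime's DestroyScout.
sealed class RetryingScout
{
    // Long weak handle: it keeps tracking the target even after the
    // target's own finalizer has run (and the target was possibly resurrected).
    private GCHandle _longWeakHandle;
    private readonly Action _cleanup;

    public RetryingScout(object target, Action cleanup)
    {
        _longWeakHandle = GCHandle.Alloc(target, GCHandleType.WeakTrackResurrection);
        _cleanup = cleanup;
    }

    // True while the garbage collector has not yet reclaimed the target.
    public bool TargetIsAlive =>
        _longWeakHandle.IsAllocated && _longWeakHandle.Target != null;

    ~RetryingScout()
    {
        if (TargetIsAlive)
        {
            // Somebody may still reach the target: try again after a later GC.
            if (!Environment.HasShutdownStarted)
                GC.ReRegisterForFinalize(this);
            return;
        }

        if (_longWeakHandle.IsAllocated)
            _longWeakHandle.Free();
        _cleanup(); // the target is definitely gone, so cleanup is safe
    }
}

static class ScoutDemo
{
    static void Main()
    {
        var target = new object();
        var scout = new RetryingScout(target, () => Console.WriteLine("cleanup ran"));
        Console.WriteLine(scout.TargetIsAlive);
        GC.KeepAlive(target); // keep the target reachable up to this point
    }
}
```

A GCHandle is used here rather than a WeakReference object because a WeakReference is itself finalizable and could already have been finalized by the time the scout's finalizer runs.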
As to your crash, this is happening in internal code, and is causing an ExecutionEngineException. This most likely happens when there is memory corruption, when memory is used in a way it wasn't supposed to be.
Memory corruption can happen for a number of reasons. In order of likelihood:
Incorrect use of PInvoke to native Win32 functions (DllImport and associated marshalling).
Incorrect use of unsafe (including library classes such as Unsafe and Buffer which do the same thing).
Multi-threaded race conditions on objects which the Runtime does not expect to be used multi-threaded. This can cause such problems as torn reads and memory-barrier violations.
A bug in .NET itself. This can be the easiest to exclude: just upgrade to the latest build.
Consider submitting the crash report to Microsoft for investigation.
Edit from the author:
In order to submit a crash report to Microsoft, the following URL can be used: https://www.microsoft.com/en-us/unifiedsupport. Take into account that this is a paid service and that you might need to deliver your entire source code to Microsoft in order to get a full analysis of your crash dump.

Related

.NET Interop Memory Leak, strong handle cannot be dumped

I am trying to track down a memory leak in my mixed VC++/C# application. Interop is done via COM. I found that the garbage collector does not clean up the managed object LcUI.Portal.PortalMainForm. It is created and released in the native part of the application, but each time this happens a new instance is created and never cleaned up.
Here is sample output from WinDbg:
!dumpheap -type LcUI.Portal.PortalMainForm
Address MT Size
234528ac 1dcf176c 652
!gcroot finds the object referenced as (strong handle) in the HandleTable:
!gcroot -all 234528ac
HandleTable:
02fd530c (strong handle)
-> 234528ac LcUI.Portal.PortalMainForm
But I'm not able to get the handle info:
!handle 02fd530c
Could not duplicate handle 2fd530c, error 6
Because of the (strong handle) I believe a static self reference from LcUI.Portal.PortalMainForm to the instance of LcUI.Portal.PortalMainForm prevents cleanup by the gc. Or is this a misinterpretation?
Via !DumpObj /d 234528ac I did not find such a reference. How can I find out who still holds the handle, and where?
After a long time of debugging I found it. It was the native window handle, left hanging in the WinForms framework, that caused the memory leak.
This happened because the close was canceled in the OnClosing event, but a method called from that event also called Close.
After I cleaned up this state so that there was no need to cancel the close, the memory leak was gone.

RCW Finalizer Access Violation

I am using COM interop for creating a managed plugin into an unmanaged application using VS2012/.NET 4.5/Win8.1. All the interop stuff seems to be going ok, but when I close the app I get an MDA exception telling me AV's have happened while Releasing COM objects the RCW's were holding onto during Finalizing.
This is the call stack:
clr.dll!MdaReportAvOnComRelease::ReportHandledException() + 0x91 bytes
clr.dll!SafeRelease_OnException() + 0x55 bytes
clr.dll!SafeReleasePreemp() + 0x312d5f bytes
clr.dll!RCW::ReleaseAllInterfaces() + 0xf3 bytes
clr.dll!RCW::ReleaseAllInterfacesCallBack() + 0x4f bytes
clr.dll!RCW::Cleanup() + 0x24 bytes
clr.dll!RCWCleanupList::ReleaseRCWListRaw() + 0x16 bytes
clr.dll!RCWCleanupList::ReleaseRCWListInCorrectCtx() + 0x9c bytes
clr.dll!RCWCleanupList::CleanupAllWrappers() + 0x2cd1b6 bytes
clr.dll!RCWCache::ReleaseWrappersWorker() + 0x277 bytes
clr.dll!AppDomain::ReleaseRCWs() + 0x120cb2 bytes
clr.dll!ReleaseRCWsInCaches() + 0x3f bytes
clr.dll!InnerCoEEShutDownCOM() + 0x46 bytes
clr.dll!WKS::GCHeap::FinalizerThreadStart() + 0x229 bytes
clr.dll!Thread::intermediateThreadProc() + 0x76 bytes
kernel32.dll!BaseThreadInitThunk() + 0xd bytes
ntdll.dll!RtlUserThreadStart() + 0x1d bytes
My guess is that the application has already destroyed its COM objects, some references to which were passed to the managed plugin, and the call to IUnknown::Release that the RCW makes causes it to go boom.
I can clearly see in the output window (VS) that the app has already started unloading some of its DLLs.
'TestHost.exe': Unloaded 'C:\Windows\System32\msls31.dll'
'TestHost.exe': Unloaded 'C:\Windows\System32\usp10.dll'
'TestHost.exe': Unloaded 'C:\Windows\System32\riched20.dll'
'TestHost.exe': Unloaded 'C:\Windows\System32\version.dll'
First-chance exception at 0x00000001400cea84 in VST3PluginTestHost.exe: 0xC0000005: Access violation reading location 0xffffffffffffffff.
First-chance exception at 0x00000001400cea84 in VST3PluginTestHost.exe: 0xC0000005: Access violation reading location 0xffffffffffffffff.
Managed Debugging Assistant 'ReportAvOnComRelease' has detected a problem in 'C:\Program Files\Steinberg\VST3PluginTestHost\VST3PluginTestHost.exe'.
Additional Information: An exception was caught but handled while releasing a COM interface pointer through Marshal.Release or Marshal.ReleaseComObject or implicitly after the corresponding RuntimeCallableWrapper was garbage collected. This is the result of a user refcount error or other problem with a COM object's Release. Make sure refcounts are managed properly. The COM interface pointer's original vtable pointer was 0x406975a8. While these types of exceptions are caught by the CLR, they can still lead to corruption and data loss so if possible the issue causing the exception should be addressed
So I thought I would manage the lifetime myself and wrote a ComReference class that calls Marshal.ReleaseComObject. That did not work correctly, and after reading up on it I have to agree that calling Marshal.ReleaseComObject in a scenario where references are passed around freely is not a good idea.
Marshal.ReleaseComObject Considered Dangerous
So the question is: Is there a way to manage this situation in order not to cause AV's when exiting the host application?
There are only three real solutions to this problem, and I think that interpreting the "Marshal.ReleaseComObject considered dangerous" article as "Don't use Marshal.ReleaseComObject" can mislead you. Your takeaway could just as easily have been "don't share RCWs freely".
Your three solutions are:
1: Change the execution of your host application to unload plugins before it unloads itself. That's easier said than done. If the plugin system of the host process includes a shutdown event, that would be a good place to deal with it. All of your services that are holding on to RCWs need to release them during shutdown.
2: Use Marshal.ReleaseComObject in a Dispose()-like pattern, ensuring that objects are only stored within a local scope in a manner similar to a using block. This is straightforward to implement, allows you to release the COM references deterministically, and is generally a very good first approach.
3: Use a COM object broker that can hand out reference-counted instances of RCWs and then release those objects when no one is using them. Ensure that every consumer of those objects cleans up prior to the application unloading.
Option #2 works fine as long as you don't store/share references to the managed RCW. I would use #2 up until you identify that your COM object has high activation costs and that caching/sharing is relevant.
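A minimal sketch of what option #2 could look like follows. The ComScope<T> name and shape are hypothetical (they do not come from the question's ComReference class); the only framework calls used are Marshal.IsComObject and Marshal.ReleaseComObject. It assumes the wrapper is the sole owner of the RCW and that the reference never leaves the using block.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical wrapper: owns exactly one RCW and releases it on Dispose.
sealed class ComScope<T> : IDisposable where T : class
{
    private T _instance;

    public ComScope(T instance)
    {
        _instance = instance ?? throw new ArgumentNullException(nameof(instance));
    }

    // Access the wrapped object; fails once the scope has been disposed.
    public T Instance =>
        _instance ?? throw new ObjectDisposedException(nameof(ComScope<T>));

    public void Dispose()
    {
        // Swap the field to null first so a second Dispose is a no-op.
        T obj = Interlocked.Exchange(ref _instance, null);
        if (obj != null && Marshal.IsComObject(obj))
            Marshal.ReleaseComObject(obj);
    }
}

static class ComScopeDemo
{
    static void Main()
    {
        // A plain object stands in for an RCW here, so Dispose simply drops it.
        using (var scope = new ComScope<object>(new object()))
        {
            Console.WriteLine(scope.Instance != null);
        }
    }
}
```

The IsComObject guard is only there so the sketch can be exercised without a real COM object; with a genuine RCW the release happens deterministically at the end of the using block.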
This is a problem with the native COM reference counts. Your object is being Release()d from native code with refcount=1, it is destroyed, then the CLR comes along and tries Release() it. You need to track down where the reference count is going wrong. It crashes in the CLR because it runs cleanup after the native code is finished.
The first step is to track down the type of object that isn't being counted properly. I did this by running gflags.exe against my .exe file and turning on "User mode stack traces". "Full page heap" may also help.
Run the application in windbg. Run .symfix. Run bp clr!SafeReleasePreemp "r rcx; gc"; g to log the interface pointers. When it crashes, the previous log entry should contain the interface pointer that was already destroyed. Run !heap -p -a [address of COM pointer] and it will print the stack of where it was released.
If you're unlucky, it won't crash right away, and the interface pointer that is causing problems won't be the most recent log. If you can run your native COM under the Debug configuration, it may help.
MS made the RCW header available. The members m_pIdentity (offset 0x88 on x64) and m_aInterfaceEntries (offset 0x8 on x64) are of interest. The RCW is in #rdx on entry to SafeReleasePreemp
Next step is to rerun with breakpoints on Interface::AddRef, Interface::QueryInterface, and Interface::Release to see which one is mismatched. _ATL_DEBUG_INTERFACES may help if you're using ATL.

Debugging .NET memory leaks - how to know what is holding a reference to what?

I am working on a .NET application where there appears to be a memory leak. I know the text-book answers, that events should be unsubscribed, disposable objects should be disposed etc...
I have a test harness that can reproduce the error. In the finalizer of a certain class I write to console
public class Foo
{
    // Ctor
    public Foo()
    {
    }

    // Finalizer (a finalizer cannot have an access modifier)
    ~Foo()
    {
        Console.WriteLine("Foo Finalized");
    }
}
In the test harness, I am creating a single instance of Foo (which in turn creates and interacts with hundreds of other types) then removing it and invoking the Garbage collector.
I am finding the Foo Finalizer is never called. I have a similar class with this setup which is finalized as a control test.
So my question is this:
How can I determine using commercial or open source tools exactly what
is holding a reference to Foo?
I have a professional license to dotTrace Memory profiler but can't figure out from the help files how to use it.
Update: I am now using dotMemory 4.0, which is the successor to the (good, but unusable) dotTrace Memory 3.5.
Have a look at the SOS debugger extension (it's free, and can be used within Visual Studio).
You may find this and this helpful to get started.
If you have successfully set up SOS (this can be tricky sometimes), finding out what holds a reference to what is as easy as
// load sos
.load sos
// list of all instances of YourTypeName in memory with their method tables
!DumpHeap -type YourTypeName
// put here the method table displayed by the previous command
// it will show you the memory address of the object
!DumpHeap -mt 07f66b44
// displays information about references the object at the specified address
!GCRoot 02d6ec94
Debugging memory leaks can be quite an involved process and requires a thorough understanding of your program logic and at least some .NET internals (especially garbage collector behaviour).
For more information see the following links:
Good introduction
http://msdn.microsoft.com/en-us/library/ee658248.aspx
Hands-on course:
http://www.dotnetfunda.com/articles/article508.aspx
http://www.dotnetfunda.com/articles/article524.aspx
GC and .Net internals
http://blogs.msdn.com/b/tess/archive/2008/04/17/how-does-the-gc-work-and-what-are-the-sizes-of-the-different-generations.aspx
http://msdn.microsoft.com/en-us/magazine/cc163491.aspx
http://blogs.msdn.com/b/maoni/archive/2004/06/03/148029.aspx
WinDbg with SOS extension
http://www.codeproject.com/Articles/19490/Memory-Leak-Detection-in-NET
http://www.simple-talk.com/dotnet/.net-framework/investigating-.net-memory-management-and-garbage-collection/
Good Luck!
The finalizer isn't deterministically called, so beware of using it to track things in a reliable way. If you remove the finalizer and instead use a WeakReference<Foo> you should be able to determine whether the object was collected.
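As a rough sketch of that WeakReference<Foo> check (the LeakCheck class and its method names are made up for illustration, and Foo here is an empty stand-in for the real class): the object is created in a separate, non-inlined method so that no local in the caller keeps it alive. Debug builds and an attached debugger can extend object lifetimes, so the result is most meaningful in a Release build.

```csharp
using System;
using System.Runtime.CompilerServices;

class Foo { }

static class LeakCheck
{
    // Creates the object in its own stack frame so that no local variable
    // or JIT temporary in the caller holds a strong reference to it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference<Foo> CreateAndRelease()
    {
        var foo = new Foo();
        // ... exercise foo here, then drop every reference you know about ...
        return new WeakReference<Foo>(foo);
    }

    // Forces a full collection and reports whether the target is still reachable.
    public static bool IsStillAlive(WeakReference<Foo> weak)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.TryGetTarget(out Foo _);
    }

    static void Main()
    {
        WeakReference<Foo> weak = CreateAndRelease();
        Console.WriteLine(IsStillAlive(weak)
            ? "Foo is still referenced by something"
            : "Foo was collected");
    }
}
```

If the check reports that Foo is still referenced, a profiler or !GCRoot is still needed to find out by what; the weak reference only tells you whether the leak exists.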
All memory profilers should be able to find an issue such as this, but with varying degrees of difficulty. I have personally used ANTS, which is very easy to use but not free. It shows you a reference diagram for the Foo instance, all the way from a GC root object. With this diagram it is usually easy to spot who is holding the reference.
You can use memory profilers to identify memory leaks. Here are some:
MemProfiler
ANTS Profiler
Firstly you shouldn't use a finalizer, because:
Finalize operations have the following limitations:
The exact time when the finalizer executes during garbage collection
is undefined. Resources are not guaranteed to be released at any
specific time, unless calling a Close method or a Dispose method.
The finalizers of two objects are not guaranteed to run in any
specific order, even if one object refers to the other. That is, if
Object A has a reference to Object B and both have finalizers, Object
B might have already finalized when the finalizer of Object A starts.
The thread on which the finalizer is run is unspecified.
Quote from: http://msdn.microsoft.com/en-us/library/system.object.finalize.aspx
I would suggest using Dispose method instead.
Secondly, any memory profiler should be able to find what holds those references. Personally I was using ANTS Profiler, it's a very nice tool and has quite rich documentation. You can try reading this doc: http://downloads.red-gate.com/HelpPDF/ANTS_Memory_Profiler/InstanceCategorizer.pdf
Instance categorizer displays chains of references from sets of objects to GC root.

CallbackOnCollectedDelegate - what happens when no debugger is attached?

I'm trying to diagnose a client crash which we cannot reproduce thus far in a debug environment.
I am trying to determine whether a CallbackOnCollectedDelegate MDA notification (resulting from third-party code) would have otherwise resulted in a crash if the debugger was not attached.
So, the question is, could the problem in the third-party code that is causing callbacks on collected delegates be the cause of this behaviour - an MDA when debugging and a client-crash when not?
Info on this MDA: http://msdn.microsoft.com/en-us/library/43yky316(v=vs.80).aspx
If you got that MDA warning then you definitely repro-ed the problem. Yes, that will be a hard crash without a debugger, the native code will bomb when it makes the callback. The stub that marshals the call from native to managed code is no longer there. The likelihood for an AVE is high, albeit never 100% guaranteed since the memory location might refer to a valid address when it got re-used after the stub was collected. Random code execution is then the failure mode. Either outcome is excessively ugly and hard to diagnose, never let it come this far.
It is caused by not storing a reference to the delegate that you passed to the native code. Or not keeping the object that stores the reference alive, same thing. The garbage collector cannot see and does not know that the native code is using the stub. In fact, the CLR destroys the stub when the delegate gets collected, that is how it manages the memory allocations for stubs.
It is up to you to ensure this cannot happen. The most often correct solution is to store the delegate object reference in a private static variable. Only set it back to null when you explicitly told the native code to no longer make callbacks. Never setting it back to null is quite common. Also add a test to ensure it is null before you assign the variable, throw an InvalidOperationException if it is not. If you need an extra level of indirection then use GCHandle.Alloc(Object). Same recipe, don't call Free() until you know it is safe.
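The recipe above could look roughly like this. NativeCallbacks, NotifyCallback and the Register/Unregister methods are hypothetical names, and the actual native registration call is left out because it depends on the third-party API; only the lifetime handling is shown.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeCallbacks
{
    // Hypothetical callback signature; the real one must match the native API.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate void NotifyCallback(int code);

    // The static field keeps the delegate, and therefore its native stub, alive.
    private static NotifyCallback s_callback;

    public static IntPtr Register(NotifyCallback callback)
    {
        if (callback == null)
            throw new ArgumentNullException(nameof(callback));
        if (s_callback != null)
            throw new InvalidOperationException("A callback is already registered.");

        s_callback = callback;
        // This pointer is what would be handed to the native registration function.
        return Marshal.GetFunctionPointerForDelegate(s_callback);
    }

    public static void Unregister()
    {
        // Call this only after native code was told to stop making callbacks.
        s_callback = null;
    }
}

static class CallbackDemo
{
    static void Main()
    {
        IntPtr stub = NativeCallbacks.Register(code => Console.WriteLine(code));
        Console.WriteLine(stub != IntPtr.Zero);

        try
        {
            NativeCallbacks.Register(code => { });
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }

        NativeCallbacks.Unregister();
    }
}
```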

Memory Management Of Unmanaged Component By CLR

I am a little confused; maybe this is a silly question.
Where is the memory for an unmanaged component allocated?
If I instantiate an unmanaged component from my .NET code, where is that component loaded and where is its memory allocated?
How does the CLR marshal calls between the managed and unmanaged heaps?
EDIT
Thanks for your reply, but what I am asking is this: suppose I do a DllImport of User32.dll, which is clearly an unmanaged DLL, and I call some function in it. How does the CLR marshal my call to this unmanaged DLL?
It starts out pretty easy. The pinvoke marshaller first calls LoadLibrary and passes the DLL name you specified, the DllImportAttribute.Value property. In your case, user32.dll is already loaded because it gets loaded by the .NET bootstrapper, its reference count just gets incremented. But normally the Windows loader gets the DLL mapped into the address space of the process so the exported functions can be called.
Next is GetProcAddress to get the address of the function to call, the DllImportAttribute.EntryPoint property. The marshaller makes a couple of tries unless you used ExactSpelling. A function name like "foo" is tested several possible ways, foo and fooW or fooA. Nasty implementation detail of Win32 related to the difference between Unicode and Ansi characters. The CharSet property matters here.
Now I need to wave hands a bit because it gets tricky. The marshaller constructs a stack frame, setting up the arguments that need to be passed to the exported function. This requires low-level code, carefully excluded from prying eyes. Take it at face value that it performs the kind of translations that the Marshal class supports to convert between managed and unmanaged types. The DllImportAttribute.CallingConvention property matters here because it determines which argument value needs to be placed where so that the called function can read it properly.
Next it sets up an SEH exception handler so that hardware exceptions raised by the called code can be caught and translated into a managed exception. The most common one is AccessViolationException, but there are others.
Next, it pushes a special cookie on the stack to indicate that unmanaged code is about to start using the stack. This prevents the garbage collector from blundering into unmanaged stack frames and interpreting the pointers it finds there as managed object references. You can see this cookie in the debugger's call stack as [Managed to Native Transition].
Next, just an indirect call to the function address as found with GetProcAddress(). That gets the unmanaged code running.
After the call, cleanup might need to be done to release memory that was allocated to pass the unmanaged arguments. The return value might need to be translated back to a managed value. And that's it, assuming nothing nasty happened, execution continues on the next managed code statement.
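For reference, here is how the properties mentioned in this walk-through appear in a declaration. This is a small Windows-only sketch; the NativeMethods and PInvokeDemo class names are arbitrary, while GetSystemMetrics and MessageBoxW are real user32.dll exports.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // EntryPoint + ExactSpelling: look up exactly "GetSystemMetrics",
    // without probing for an A or W suffixed variant.
    [DllImport("user32.dll", EntryPoint = "GetSystemMetrics", ExactSpelling = true,
               CallingConvention = CallingConvention.Winapi)]
    public static extern int GetSystemMetrics(int nIndex);

    // CharSet.Unicode + ExactSpelling: bind to the wide-character export
    // MessageBoxW and marshal the strings as UTF-16. Declared only, not called.
    [DllImport("user32.dll", EntryPoint = "MessageBoxW", CharSet = CharSet.Unicode,
               ExactSpelling = true)]
    public static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);
}

static class PInvokeDemo
{
    const int SM_CXSCREEN = 0; // width of the primary display, in pixels

    static void Main()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            Console.WriteLine("user32.dll is only available on Windows.");
            return;
        }

        // The first call triggers the LoadLibrary/GetProcAddress steps described above.
        Console.WriteLine(NativeMethods.GetSystemMetrics(SM_CXSCREEN));
    }
}
```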
Unmanaged memory allocations come from the process heap. You are responsible for allocating/deallocating the memory, since it will not get garbage collected because the GC does not know about these objects.
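A minimal sketch of that responsibility, using Marshal.AllocHGlobal and Marshal.FreeHGlobal (the UnmanagedBufferDemo class and RoundTrip method are illustrative names only): the block is invisible to the GC, so it is freed explicitly in a finally clause.

```csharp
using System;
using System.Runtime.InteropServices;

static class UnmanagedBufferDemo
{
    // Writes a value into unmanaged memory, reads it back, and frees the block.
    public static int RoundTrip(int value)
    {
        IntPtr buffer = Marshal.AllocHGlobal(sizeof(int));
        try
        {
            Marshal.WriteInt32(buffer, value);
            return Marshal.ReadInt32(buffer);
        }
        finally
        {
            // The GC never reclaims this block; forgetting this line is a leak.
            Marshal.FreeHGlobal(buffer);
        }
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip(42)); // prints 42
    }
}
```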
Just as an academic piece of info expanding on what has been posted here:
There are about 8 different heaps that the CLR uses:
Loader Heap: contains CLR structures and the type system
High Frequency Heap: statics, MethodTables, FieldDescs, interface map
Low Frequency Heap: EEClass, ClassLoader and lookup tables
Stub Heap: stubs for CAS, COM wrappers, P/Invoke
Large Object Heap: memory allocations that require more than 85k bytes
GC Heap: user allocated heap memory private to the app
JIT Code Heap: memory allocated by mscoree (Execution Engine) and the JIT compiler for managed code
Process/Base Heap: interop/unmanaged allocations, native memory, etc
HTH
Part of your question is answered by Michael. I answer the other part.
If the CLR is loaded into an unmanaged process, this is called CLR hosting. It usually involves calling an entry point in the mscoree DLL, after which the default AppDomain is loaded. In such a case, the CLR asks the process for a block of memory; once granted, that block becomes its memory space, with its own stack and heap.
