(I know, this is a ridiculously long question. I tried to separate the question from my investigation so far, so it's slightly easier to read.)
I'm running my unit tests using MSTest.exe. Occasionally, I see this test error:
On the individual unit test method: "The agent process was stopped while the test was running."
On the entire test run:
One of the background threads threw exception:
System.NullReferenceException: Object reference not set to an instance of an object.
at System.Runtime.InteropServices.Marshal.ReleaseComObject(Object o)
at System.Management.Instrumentation.MetaDataInfo.Dispose()
at System.Management.Instrumentation.MetaDataInfo.Finalize()
So, here's what I think I need to do: I need to track down what is causing the error in MetaDataInfo, but I'm drawing a blank. My unit test suite takes over half an hour to run, and the error doesn't happen every time, so it's hard to get it to reproduce.
Has anyone else seen this type of failure in running unit tests? Were you able to track it down to a specific component?
Edit:
The code under test is a mix of C#, C++/CLI, and a little bit of unmanaged C++ code. The unmanaged C++ is used only from the C++/CLI, never directly from the unit tests. The unit tests are all C#.
The code under test will be running in a standalone Windows Service, so there's no complication from ASP.net or anything like that. In the code under test, there's threads starting & stopping, network communication, and file I/O to the local hard drive.
My investigation so far:
I spent some time digging around the multiple versions of the System.Management assembly on my Windows 7 machine, and I found the MetaDataInfo class in System.Management that's in my Windows directory. (The version that's under Program Files\Reference Assemblies is much smaller, and doesn't have the MetaDataInfo class.)
Using Reflector to inspect this assembly, I found what seems to be an obvious bug in MetaDataInfo.Dispose():
// From class System.Management.Instrumentation.MetaDataInfo:
public void Dispose()
{
if (this.importInterface == null) // <---- Should be "!="
{
Marshal.ReleaseComObject(this.importInterface);
}
this.importInterface = null;
GC.SuppressFinalize(this);
}
With this 'if' statement backwards, MetaDataInfo will leak the COM object if present, or throw a NullReferenceException if not. I've reported this on Microsoft Connect: https://connect.microsoft.com/VisualStudio/feedback/details/779328/
Using reflector, I was able to find all uses of the MetaDataInfo class. (It's an internal class, so just searching the assembly should be a complete list.) There is only one place it is used:
public static Guid GetMvid(Assembly assembly)
{
using (MetaDataInfo info = new MetaDataInfo(assembly))
{
return info.Mvid;
}
}
Since all uses of MetaDataInfo are being properly Disposed, here's what's happening:
If MetaDataInfo.importInterface is not null:
static method GetMvid returns MetaDataInfo.Mvid
The using calls MetaDataInfo.Dispose
Dispose leaks the COM object
Dispose sets importInterface to null
Dispose calls GC.SuppressFinalize
Later, when the GC collects the MetaDataInfo, the finalizer is skipped.
.
If MetaDataInfo.importInterface is null:
static method GetMvid gets a NullReferenceException calling MetaDataInfo.Mvid.
Before the exception propagates up, the using calls MetaDataInfo.Dispose
Dispose calls Marshal.ReleaseComObject
Marshal.ReleaseComObject throws a NullReferenceException.
Because an exception is thrown, Dispose doesn't call GC.SuppressFinalize
The exception propagates up to GetMvid's caller.
Later, when the GC collects the MetaDataInfo, it runs the Finalizer
Finalize calls Dispose
Dispose calls Marshal.ReleaseComObject
Marshal.ReleaseComObject throws a NullReferenceException, which propagates all the way up to the GC, and the application is terminated.
For what it's worth, here's the rest of the relevant code from MetaDataInfo:
public MetaDataInfo(string assemblyName)
{
Guid riid = new Guid(((GuidAttribute) Attribute.GetCustomAttribute(typeof(IMetaDataImportInternalOnly), typeof(GuidAttribute), false)).Value);
// The above line retrieves this Guid: "7DAC8207-D3AE-4c75-9B67-92801A497D44"
IMetaDataDispenser o = (IMetaDataDispenser) new CorMetaDataDispenser();
this.importInterface = (IMetaDataImportInternalOnly) o.OpenScope(assemblyName, 0, ref riid);
Marshal.ReleaseComObject(o);
}
private void InitNameAndMvid()
{
if (this.name == null)
{
uint num;
StringBuilder szName = new StringBuilder {
Capacity = 0
};
this.importInterface.GetScopeProps(szName, (uint) szName.Capacity, out num, out this.mvid);
szName.Capacity = (int) num;
this.importInterface.GetScopeProps(szName, (uint) szName.Capacity, out num, out this.mvid);
this.name = szName.ToString();
}
}
public Guid Mvid
{
get
{
this.InitNameAndMvid();
return this.mvid;
}
}
Edit 2:
I was able to reproduce the bug in the MetaDataInfo class for Microsoft. However, my reproduction is slightly different from the issue I'm seeing here.
Reproduction: I try to create a MetaDataInfo object on a file that isn't a managed assembly. This throws an exception from the constructor before importInterface is initialized.
My issue with MSTest: MetaDataInfo is constructed on some managed assembly, and something happens to make importInterface null, or to exit the constructor before importInterface is initialized.
I know that MetaDataInfo is created on a managed assembly, because MetaDataInfo is an internal class, and the only API that calls it does so by passing the result of Assembly.Location.
However, re-creating the issue in Visual Studio meant that it downloaded the source to MetaDataInfo for me. Here's the actual code, with the original developer's comments.
public void Dispose()
{
// We implement IDisposable on this class because the IMetaDataImport
// can be an expensive object to keep in memory.
if(importInterface == null)
Marshal.ReleaseComObject(importInterface);
importInterface = null;
GC.SuppressFinalize(this);
}
~MetaDataInfo()
{
Dispose();
}
The original code confirms what was seen in reflector: The if statement is backwards, and they shouldn't be accessing the managed object from the Finalizer.
I said before that because it was never calling ReleaseComObject, that it was leaking the COM object. I read up more on the use of COM objects in .Net, and if I understand it properly, that was incorrect: The COM object isn't released when Dispose() is called, but it is released when the garbage collector gets around to collecting the Runtime Callable Wrapper, which is a managed object. Even though it's a wrapper for an unmanaged COM object, the RCW is still a managed object, and the rule about "don't access managed objects from the finalizer" should still apply.
Try to add the following code to your class definition:
bool _disposing = false // class property
public void Dispose()
{
if( !disposing )
Marshal.ReleaseComObject(importInterface);
importInterface = null;
GC.SuppressFinalize(this);
disposing = true;
}
If MetaDataInfo uses the IDisposable pattern, then there should also be a finalizer (~MetaDataInfo() in C#). The using statement will make sure to call Dispose(), which sets the importInterface to null. Then when the GC is ready to finalize, the ~MetaDataInfo() is called, which would normally call Dispose (or rather the overload taking a bool disposing: Dispose(false)).
I would say that this error should turn up quite often.
Are you trying to fix this for your tests? If so, rewrite your using. Don't dispose of it yourself but write some code to use reflection to access the private fields and dispose of them correctly and then call GC.SuppressFinalize to prevent the finalizer from running.
As a brief aside (loved your investigation btw) you state that Dispose calls Finalize. It's the other way round, Finalize when invoked by the GC calls Dispose.
Related
How do I handle situations when the my app is terminating, using a callback prior to termination?
The .NET handlers do not work in the following scenario, is SetUnhandledExceptionHandler the correct choice? It appears to have the shortcomings discussed in the following.
Scenario
I want to respond to all cases of app termination with a message and error report to our service in our .net app.
However, I have a WPF app in which two of our testers get unhandled exceptions that bypass:
AppDomain.UnhandledException (most importantly)
Application.ThreadException
Dispatcher.UnhandledException
They are marked SecuirtyCritical and HandleProcessCorruptedStateExceptions.
legacyCorruptedStateExceptionsPolicy is set to true in the app.config
My two examples in the wild
VirtualBox running widows10 throws inside some vboxd3d.dll when initialising WPF somewhere (turning off vbox 3d accel "fixes it")
Win8 machine with suspicious option to "run on graphics card A/B" in system context menu, crashes somewhere (:/) during WPF startup but only when anti-cracking tools are applied.
Either way, when live, the app must to respond to these kinds of failures prior to termination.
I can reproduce this with an unmanaged exception, that occurs in an unmanaged thread of a PInvoked method in .net:
test.dll
BOOL APIENTRY DllMain( HMODULE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}
DWORD WINAPI myThread(LPVOID lpParameter)
{
long testfail = *(long*)(-9022);
return 1;
}
extern "C" __declspec(dllexport) void test()
{
DWORD tid;
HANDLE myHandle = CreateThread(0, 0, myThread, NULL, 0, &tid);
WaitForSingleObject(myHandle, INFINITE);
}
app.exe
class TestApp
{
[DllImport("kernel32.dll")]
static extern FilterDelegate SetUnhandledExceptionFilter(FilterDelegate lpTopLevelExceptionFilter);
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
delegate int FilterDelegate(IntPtr exception_pointers);
static int Win32Handler(IntPtr nope)
{
MessageBox.Show("Native uncaught SEH exception"); // show + report or whatever
Environment.Exit(-1); // exit and avoid WER etc
return 1; // thats EXCEPTION_EXECUTE_HANDLER, although this wont be called due to the previous line
}
[DllImport("test.dll")]
static extern void test();
[STAThread]
public static void Main(string[] args)
{
AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);
SetUnhandledExceptionFilter(Win32Handler);
test(); // This is caught by Win32Handler, not CurrentDomain_UnhandledException
}
[SecurityCritical, HandleProcessCorruptedStateExceptions ]
static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
{
Exception ex = e.ExceptionObject as Exception;
MessageBox.Show(ex.ToString()); // show + report or whatever
Environment.Exit(-1); // exit and avoid WER etc
}
}
This handles the failure in the vboxd3d.dll in a bare WPF test app, which of course also has the WCF Dispatcher and WinForms Application (why not) exception handlers registered.
Updates
In the production code I am trying to use this on, the handler appears to get overwritten by some other caller, I can get around that by calling the method every 100ms which is stupid of course.
On the machine with the vbox3d.dll problem, doing the above replaces the exception with one in clr.dll.
It appears at the time of crash, the managed function pointer passed into kernel32 is no longer valid. Setting the handler with a native helper dll, which calls a native function inside appears to be working. The managed function is a static method - I'm not sure pinning applies here, perhaps the clr is in the process of terminating...
Indeed the managed delegate was being collected. No "overwriting" of the handler was occuring. I've added as an answer..not sure what to accept or what the SO convention is here...
The problem with the code in the question was this:
SetUnhandledExceptionFilter(Win32Handler);
Which since a delegate is automatically created, is eqivilant to:
FilterDelegate del = new FilterDelegate(Win32Handler);
SetUnhandledExceptionFilter(del);
Problem being, that the GC can collect it, and the native->managed thunk that is created, at any point after it's final reference. So:
SetUnhandledExceptionFilter(Win32Handler);
GC.Collect();
native_crash_on_unmanaged_thread();
Will always cause a nasty crash where the handler passed into kernel32.dll is no longer a valid function pointer. This is remedied by not allowing the GC to collect:
public class Program
{
static FilterDelegate mdel;
public static void Main(string[] args)
{
FilterDelegate del = new FilterDelegate(Win32Handler);
SetUnhandledExceptionFilter(del);
GC.KeepAlive(del); // do not collect "del" in this scope (main)
// You could also use mdel, which I dont believe is collected either
GC.Collect();
native_crash_on_unmanaged_thread();
}
}
The other answers are also a great resource; not sure what to mark as the answer right now.
I've had to deal with, shall we say, unpredictable unmanaged libraries.
If you're P/Invoking into the unmanaged code, you may have problems there. I've found it easier to use C++/CLI wrappers around the unmanaged code and in some cases, I've written another set of unmanaged C++ wrappers around the library before getting to the C++/CLI.
You might be thinking, "why on earth would you write two sets of wrappers?"
The first is that if you isolate the unmanaged code, it makes it easier to trap exceptions and make them more palatable.
The second is purely pragmatic - if you have a library (not a dll) which uses stl, you will find that the link will magically give all code, managed and unmanaged, CLI implementation of the stl functions. The easiest way to prevent that is to completely isolate the code that uses stl, which means that everytime you access a data structure through stl in unmanaged code you end up doing multiple transitions between managed and unmanaged code and your performance will tank. You might think to yourself, "I'm a scrupulous programmer - I'll be super careful to put #pragma managed and/or #pragma unmanaged wrappers in the right places and I'm all set." Nope, nope, and nope. Not only is this difficult and unreliable, when (not if) you fail to do it properly, you won't have a good way to detect it.
And as always, you should ensure that whatever wrappers you write are chunky rather than chatty.
Here is a typical chunk of unmanaged code to deal with an unstable library:
try {
// a bunch of set up code that you don't need to
// see reduced to this:
SomeImageType *outImage = GetImage();
// I was having problems with the heap getting mangled
// so heapcheck() is conditional macro that calls [_heapchk()][1]
heapcheck();
return outImage;
}
catch (std::bad_alloc &) {
throw MyLib::MyLibNoMemory();
}
catch (MyLib::MyLibFailure &err)
{
throw err;
}
catch (const char* msg)
{
// seriously, some code throws a string.
throw msg;
}
catch (...) {
throw MyLib::MyLibFailure(MyKib::MyFailureReason::kUnknown2);
}
An exception that can't be handled properly can always happen, and the process may die unexpectedly no matter how hard you try to protect it from within. However, you can monitor it from the outside.
Have another process that monitors your main process. If the main process suddenly disappears without logging an error or reporting things gracefully, the second process can do that. The second process can be a lot simpler, with no unmanaged calls at all, so chances of it disappearing all of a sudden are significantly smaller.
And as a last resort, when your processes start check if they've shut down properly. If not, you can report a bad shutdown then. This will be useful if the entire machine dies.
I have a class that implements IDisposable, because it uses image resources (Bitmap class) from GDI+. I use it for wrapping all that gimmicky LockBits/UnlockBits. And it works fine, be it when I call Dispose() or with the using statement.
However, if I leave the program to terminate without disposing, I get a System.AccessViolationException. Intuitively, I think the GC would call the Dispose the same way I do, and the object would gracefully release the resources, but that's not what happening. Why?
Here's the IDisposable code:
private bool _disposing = false;
~QuickBitmap() {
Dispose(false);
}
public void Dispose() {
Dispose(true);
GC.SuppressFinalize(this);
}
private void Dispose(bool safeDispose) {
if (_disposing)
return;
SaveBits(); // private wrapper to UnlockBits
bytes = null; // byte[] of the image
bmpData = null; // BitmapData object
if (safeDispose && bm != null) {
bm.Dispose(); // Bitmap object
bm = null;
}
_disposing = true;
}
Here's when it works ok:
using (var qbm = new QuickBitmap("myfile.jpg"))
{
// use qbm.GetPixel/qbm.SetPixel at will
}
Here's when it doesn't work:
public static void Main (string[] args) {
// this is just an example, simply constructing the object and doing nothing will throw the exception
var qbm = new QuickBitmap(args[0]);
qbm.SetPixel(0, 0, Color.Black);
qbm.Save();
}
The complete excetion is (there's no inner exception):
An unhandled exception of type 'System.AccessViolationException' occurred in mscorlib.dll
Additional information: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
The reproduction is 100%, even on different machines. I do figure out we should use using or Dispose(), not using it is bad and all that stuff. I just want to know: why does this happen? Why is the memory "protected"? Which kind of access am I violating?
The problem occurs because you included an unnecessary finalizer in your implementation. Code executed from a finalizer generally cannot access managed objects safely. It is likely that the call to SaveBits results in the use of managed objects in violation of this rule, though you did not include the code for that method.
The best solution would be to simply remove the finalizer from QuickBitmap, since the QuickBitmap class does not directly own unmanaged resources.
The whole point of IDisposable is to gracefully clean up unmanaged resources. The literal definition of an unmanaged resource is a resource that the GC won't be able to clean up on its own. Unsurprisingly, it won't be able to clean it up on its own if you don't. Objects that can be cleaned up entirely through the GC need not be disposable, and this require no manual disposal. If the object could be cleaned up without manual disposable then it wouldn't have a need to implement IDisposable in the first place.
I'm puzzled with an occurance of AccessViolationException. It's quite impossible (see answer) to have a clean reproduction but here goes the general idea:
class MyClass
{
public List<SomeType> MyMethod(List<string> arg)
{
// BREAKPOINT here
// Simple stuff here, nothing fancy, no external libs used
}
}
delegate List<SomeType> MyDelegate(List<string> arg);
...
var myObject = new MyClass();
Func<List<string>, List<SomeType>> myFunc = myObject.MyMethod;
MyDelegate myDelegate = myObject.MyMethod;
myFunc(null) // works fine
myDelegate(null) // works fine
myObject.MyMethod(null) // throws AccessViolationException
The weird part is I'm not using any unsafe code. I don't have any dependencies on external libraries anywhere close (and anywhere in the whole program execution AFAIK).
The weirdest part is this is 100% reproducable and even after refactoring the code slightly, moving the method invocation elsewhere, putting extra code before it etc. - in all cases AccessViolationException is still thrown on that particular method invocation. The method is never entered when invoked directly (breakpoint is not hit). Invoking it through delegate or Func<> works fine.
Any clues as to what could cause it or how to debug it?
UPDATE
Following antiduh's question: There is no call to a virtual method from a constructor anywhere close. The actual stack trace when this happens is very simple, just two static methods and a simple instance one.
The only clue seems to be threading. There is Parallel.ForEach() and Thread.Sleep() invoked before this in program execution (not in call stack). Any clues as to how mishandled threading (with regular, managed classes) could cause AVE?
UPDATE
Narrowed it down to a VS bug, see my answer.
This seems to be a VS bug. Have a look at this full solution. Code is simple:
using System;
using System.Collections.Generic;
namespace Foo
{
public class MyClass
{
public virtual object Foo(object o1, object o2, object o3, object o4)
{
return new object();
}
}
public sealed class Program
{
public static void Main(string[] args)
{
var myClass = new MyClass();
object x = new object();
myClass.Foo(null, null, null, new object()); // put a breakpoint here and once it stops, step over (F10) - AccessViolationException should be thrown in VS
}
}
}
The important fact I have missed before is that the code actually works fine when ran normally. Only when that particular line is being stepped over in VS (F10), the Access Violation occurs and it actually occurs in VS hosting process (even though the final stack frame is my code). It's possible to continue execution fine.
The issue happens for me on VS 2013, version 12.0.21005.1 REL. It also happens on 3 other machines I tested this on.
UPDATE
Installing .NET Framework 4.5.2 solves this.
I have the weirdest problem, and most likely there is something I am missing or I don't know.
I created a C# COM interface/class that for this question we'll call it: Ics_obj and Ccs_obj.
In the destructor of the object I added a trace to the output debug:
~Ccs_obj()
{
OutputDebugString("Releasing: "+this.GetHashCode()); // OutputDebugString with p/invoke
}
That I wrote a C++ class that Ics_obj is a member in that class, and the member is the com_ptr_t wrapper of Ics_obj generated in the .tlh:
class lib_exp_macro cpp_client
{
public:
cpp_client();
~cpp_client();
...
...
Ics_objPtr _csobj; // the Ics_obj* wrapped in com_ptr_t.
}
In the constructor I create an instance of Ccs_obj, and in the destructor I release it. I also added the following traces:
cpp_client::cpp_client()
{
HRESULT hr = _csobj.CreateInstance( __uuidof( Ccs_obj ) );
if(FAILED(hr)){ /* throw exception */ }
OutputDebugString("Creating client");
}
cpp_client::~cpp_client()
{
OutputDebugString("before releasing _csobj");
}
Now, I have created 2 instances of cpp_client through my code (creating them with the same thread during initialization of my application), and I get 2 traces of "Creating Client".
The problem is that during shutdown I get ACCESS VIOLATION and the traces are as follow:
before release // first object being released
Releasing: (some number X)
Releasing: (some number Y, Y != X) // <- where did that come from?
before releasing _csobj
SYSTEM CRASH HORRIBLY WITH ACCESS VIOLATION! :-(
When I debug I get ACCESS VIOLATION for accessing the v-table of the COM object.
Does anyone know why I get ACCESS VIOLATION? what am I doing wrong? I am REALLY LOST HERE! :-(
Thanks in advance!
UPDATE:
With the help of two answers (that were deleted), I understood more things, but I still have open questions...
First and the most important thing is that the 2nd release was in dllmain PROCESS_DETACH. When I moved the release code out of this context, everything was fine, but why can't I release the COM object in the PROCESS_DETACH context?!
One of the answers that were deleted said that CLR of the .Net has shutdown, therefore the 2nd release in my traces. This made a lot of sense and what lead me into moving the releasing code out of PROCESS_DETACH context, but if it is so and CLR does shutdown because the PROCESS_DETACH code, why don't I get any of the "Releasing: ..." traces if I'm not releasing the COM object (by detaching _com_ptr_t)??? Or in other words, why does CLR doesn't shutdown if I don't release the 1st object?
Thanks again for your amazing help!
This is a logging class with a constructor:
public QFXLogger(
int maxFileSize,
TraceLevel logLevel)
{
this.maxFileSize = maxFileSize;
logSwitch.Level = logLevel;
//Configure log listener
traceListener = new FileLogTraceListener();
traceListener.DiskSpaceExhaustedBehavior = DiskSpaceExhaustedOption.DiscardMessages;
traceListener.CustomLocation = #".\Log";
traceListener.BaseFileName = "QFXLog";
traceListener.AutoFlush = true;
//Remove all other listeners
Trace.Listeners.Clear();
//Add QFX listener
Trace.Listeners.Add(traceListener);
//Write header
WriteSessionHeader();
}
And this is the destrcutor:
~QFXLogger()
{
WriteSessionFooter();
traceListener.Close();
}
I just want to write a footer to the underlying stream before the logger gets GC.
Without the destructor everything is fine, but with it I get the following:
Unhandled Exception: System.ObjectDisposedException: Cannot access a closed file
.
at System.IO.__Error.FileNotOpen()
at System.IO.FileStream.Flush(Boolean flushToDisk)
at System.IO.FileStream.Flush()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.IO.StreamWriter.Flush()
at Microsoft.VisualBasic.Logging.FileLogTraceListener.ReferencedStream.CloseS
tream()
at Microsoft.VisualBasic.Logging.FileLogTraceListener.CloseCurrentStream()
at Microsoft.VisualBasic.Logging.FileLogTraceListener.Write(String message)
at System.Diagnostics.TraceInternal.Write(String message)
at System.Diagnostics.Trace.Write(String message)
at QFXShell.QFXLogger.WriteSessionFooter()
at QFXShell.QFXLogger.Finalize()
It seems to me that the underlying stream was already closed.
How can I suppress this closing(of the underlying stream) or is this another issue?
Finalizers (destructors) in c# should not be used in this method. Finalizers are intended only to release unmanaged resources. When a finalizer is called, access to other .net objects is not guaranteed and should only release unmanaged resources that you directly allocated.
Finalize operations have the following limitations:
The finalizers of two objects are not guaranteed to run in any specific order, even if one object refers to the other. That is, if Object A has a reference to Object B and both have finalizers, Object B might have already finalized when the finalizer of Object A starts.
What you need to do is implement the IDisposable Interface to properly close your logging file. Some additional information can be found at Kelly Leahy's IDisposable and Garbage Collection.
If you are not in a situation where a disposable class will help (global object, etc.) you can always implement a Close method yourself to ensure the file is valid before it is released.
It is too late for you to close the tracer object in the destructor. In this moment a lot of objects which are not needed any more can be already disposed.
I would propose to implement a IDisposable interface with a public Dispose method and to call this method as soon as you know that you are not going to need this QFXLogger object any more. In this Dispose method, I would check if the tracer object is not NULL, open and then I would call Close on it.
See Proper use of the IDisposable interface for more details.