In C# (or maybe in .NET in general) individual assemblies cannot be unloaded from memory.
Unloading can only occur at the AppDomain level.
I am wondering what are there reasons behind this design? Other languages support this feature (C++ i think)
Here is an MSDN blog post listing some reasons why not. The main issue is:
First off, you are running that code in the app domain (duh!). That means there are potentially call sites and call stacks with addresses in them that are expecting to keep working. Have you ever gotten an access violation where your EIP points to 0x???????? That is an example where someone freed up a DLL, the pages got unmapped by the memory system, and then you tried to branch to it. This typically happens in COM when you have a ref counting error and you make an interface method call. We cannot afford to be as lose with managed code. We must guarantee we know all of the code you are executing and that it is type safe and verifiable. That means explicit tracking of anything that could be using that code, including GC objects and COM interop wrappers. This tracking is handled today around an app domain boundary. Tracking it at the assembly level becomes quite expensive.
I'll summarise this in higher-level language:
Basically, things that go wrong if you simply delete executable code go wrong on the unmanaged level. You would have compiled code that points to other compiled code that is no longer there, so your code would jump into an area that is invalid, and possibly contains arbitrary data.
This is unacceptable in managed code, because things are meant to be safe and have some guarantees around them. One of these guarantees is that your code can't execute arbitrary sections of memory.
To handle this issue properly you'd have to track many more things more closely, and this would be a large overhead. The alternative is to only track these things at appdomain boundaries, which is what is done.
Related
We've been utilizing the OR tools to solve linear optimizations in a real-time, .NET application. That is, solving linear optimizations regularly using different inputs as time progresses.
Recently we ran into an issue that we haven't seen before while running our application on a server for extended periods of time, in which seemingly random attempts to solve the optimizations were causing AccessViolationExceptions. Specifically,
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
at Google.OrTools.LinearSolver.operations_research_linear_solverPINVOKE.Variable_SolutionValue(System.Runtime.InteropServices.HandleRef)
...
I'm trying to find out more specifically where this is happening in the pipeline, but given the output there I believe it is a section in which we are trying to retrieve the individual variable solution values out of the solver after solving the optimization.
We are using a wide variety of constraints over a decent sized number of variables.
Has anyone seen this before?
Reference github issue link
After some testing we found that what appears to have been happening is that the garbage collector was collecting some of the Variables we were using during the P/Invoke, as per this.
Unfortunately, this seems to be a side effect of the way that SWIG creates its .NET wrappers and their IDisposable implementations, using HandleRefs instead of something like SafeHandles, which 'handle' this as per the documentation:
Platform invoke operations automatically increment the reference count of handles encapsulated by a SafeHandle and decrement them upon completion. This ensures that the handle will not be recycled or closed unexpectedly.
More information here.
Without wanting to get into the business of creating our own SWIG typemap or compiling a new version of SWIG, .NET provides a way of keeping objects 'alive' with regard to the Garbage Collector. That is, calling GC.KeepAlive on all of the objects which we will be accessing values from via P/Invoke (in our case the Solver and our Variables) at the end of the optimization procedure, prevents the garbage collector from thinking that they are collectible until the end of the scope of the KeepAlive method without side effects (as per their documentation).
Preliminary testing has shown this to work, though given that it was already intermittently occurring before, we'll be watching for this happening going forward.
Going forward, I think either making a request of SWIG to use SafeHandles is probably the best idea (it has been discussed before and is still an open issue) or changing the typemap to use SafeHandles directly, is likely the best option. I may try investigating the later option myself, but because this fix ended up only adding 3 lines of code (plus a host of comments) to our code base for what seems like a full fix, it's going to be low priority for me. That said, a fix for this would be nice for an upcoming version.
Sometimes you need to use a particular third-party library, like in my case, one that loads up PowerPoints and allows the user to modify them in code. We discovered that this particular library has some memory leaks, but we would still like to use it because these leaks only occur in one particular scenario that occurs very rarely. You can see objects lying around despite trying to dispose all references to them, and despite having these objects go out of scope, and despite having manually invoked garbage collection. For sure, this library creates leaks. Our application is single-threaded.
Now, that being said, I am wondering if there is any way to clean up all memory that the library has used during runtime. Are there any ways to unload and reload the DLL that might cause all memory allocations from that library to be cleared, or anything that we can do at runtime at all to clean the memory that this third party library uses and then potentially reload the library in our application?
You could investigate loading the referenced library inside a custom AppDomain; an app-domain is a unit of isolation inside a process - and can be unloaded. It does, however, require you to do some communications between the two app-domains (the default domain and the hosted domain); MarshalByRefObject is the easiest trick there.
Alternatively: just use an entire separate process for this work. On windows, creating a process is relatively expensive, but not so expensive that you should never do it. Shutting down the process when done is the equivalent of nuking it from orbit. You can always re-spawn another process later.
There is an entire series of "How to" topics about AppDomain linked from MSDN here
I'm having issues with finalizers seemingly being called early in a C++/CLI (and C#) project I'm working on. This seems to be a very complex problem and I'm going to be mentioning a lot of different classes and types from the code. Fortunately it's open source, and you can follow along here: Pstsdk.Net (mercurial repository) I've also tried linking directly to the file browser where appropriate, so you can view the code as you read. Most of the code we deal with is in the pstsdk.mcpp folder of the repository.
The code right now is in a fairly hideous state (I'm working on that), and the current version of the code I'm working on is in the Finalization fixes (UNSTABLE!) branch. There are two changesets in that branch, and to understand my long-winded question, we'll need to deal with both. (changesets: ee6a002df36f and a12e9f5ea9fe)
For some background, this project is a C++/CLI wrapper of an unmanaged library written in C++. I am not the coordinator of the project, and there are several design decisions that I disagree with, as I'm sure many of you who look at the code will, but I digress. We wrap much of the layers of original library in the C++/CLI dll, but expose the easy-to-use API in the C# dll. This is done because the intention of the project is to convert the entire library to managed C# code.
If you're able to get the code to compile, you can use this test code to reproduce the problem.
The problem
The latest changeset, entitled moved resource management code to finalizers, to show bug, shows the original problem I was having. Every class in this code is uses the same pattern to free the unmanaged resources. Here is an example (C++/CLI):
DBContext::~DBContext()
{
this->!DBContext();
GC::SuppressFinalize(this);
}
DBContext::!DBContext()
{
if(_pst.get() != nullptr)
_pst.reset(); // _pst is a clr_scoped_ptr (managed type)
// that wraps a shared_ptr<T>.
}
This code has two benefits. First, when a class such as this is in a using statement, the resources are properly freed immediately. Secondly, if a dispose is forgotten by the user, when the GC finally decides to finalize the class, the unmanaged resources will be freed.
Here is the problem with this approach, that I simply cannot get my head around, is that occasionally, the GC will decide to finalize some of the classes that are used to enumerate over data in the file. This happens with many different PST files, and I've been able to determine it has something to do with the Finalize method being called, even though the class is still in use.
I can consistently get it to happen with this file (download)1. The finalizer that gets called early is in the NodeIdCollection class that is in DBAccessor.cpp file. If you are able to run the code that was linked to above (this project can be difficult to setup because of the dependencies on the boost library), the application would fail with an exception, because the _nodes list is set to null and the _db_ pointer was reset as a result of the finalizer running.
1) Are there any glaring problems with the enumeration code in the NodeIdCollection class that would cause the GC to finalize this class while it's still in use?
I've only been able to get the code to run properly with the workaround I've described below.
An unsightly workaround
Now, I was able to work around this problem by moving all of the resource management code from the each of the finalizers (!classname) to the destructors (~classname). This has solved the problem, though it hasn't solved my curiosity of why the classes are finalized early.
However, there is a problem with the approach, and I'll admit that it's more a problem with the design. Due to the heavy use of pointers in the code, nearly every class handles its own resources, and requires each class be disposed. This makes using the enumerations quite ugly (C#):
foreach (var msg in pst.Messages)
{
// If this using statement were removed, we would have
// memory leaks
using (msg)
{
// code here
}
}
The using statement acting on the item in the collection just screams wrong to me, however, with the approach it's very necessary to prevent any memory leaks. Without it, the dispose never gets called and the memory is never freed, even if the dispose method on the pst class is called.
I have every intention trying to change this design. The fundamental problem when this code was first being written, besides the fact that I knew little to nothing about C++/CLI, was that I couldn't put a native class inside of a managed one. I feel it might be possible to use scoped pointers that will free the memory automatically when the class is no longer in use, but I can't be sure if that's a valid way to go about this or if it would even work. So, my second question is:
2) What would be the best way to handle the unmanaged resources in the managed classes in a painless way?
To elaborate, could I replace a native pointer with the clr_scoped_ptr wrapper that was just recently added to the code (clr_scoped_ptr.h from this stackexchange question). Or would I need to wrap the native pointer in something like a scoped_ptr<T> or smart_ptr<T>?
Thank you for reading all of this, I know it was a lot. I hope I've been clear enough so that I might get some insight from people a little more experienced than I am. It's such a large question, I intend on adding a bounty when it allows me too. Hopefully, someone can help.
Thanks!
1This file is part of the freely available enron dataset of PST files
The clr_scoped_ptr is mine, and comes from here.
If it has any errors, please let me know.
Even if my code isn't perfect, using a smart pointer is the correct way to deal with this issue, even in managed code.
You do not need to (and should not) reset a clr_scoped_ptr in your finalizer. Each clr_scoped_ptr will itself be finalized by the runtime.
When using smart pointers, you do not need to write your own destructor or finalizer. The compiler-generated destructor will automatically call destructors on all subobjects, and every subobject finalizer will run when it is collected.
Looking closer at your code, there is indeed an error in NodeIdCollection. GetEnumerator() must return a different enumerator object each time it is called, so that each enumeration would begin at the start of the sequence. You're reusing a single enumerator, meaning that position is shared between successive calls to GetEnumerator(). That's bad.
Refreshing my memory of destructors/finalalisers, from some Microsoft documentation, you could at least simplify your code a little, I think.
Here's my version of your sequence:
DBContext::~DBContext()
{
this->!DBContext();
}
DBContext::!DBContext()
{
delete _pst;
_pst = NULL;
}
The "GC::SupressFinalize" is automatically done by C++/CLI, so no need for that. Since the _pst variable is initialised in the constructor (and deleting a null variable causes no problems anyway), I can't see any reason to complicate the code by using smart pointers.
On a debugging note, I wonder if you can help make the problem more apparent by sprinkling in a few calls to "GC::Collect". That should force finalization on dangling objects for you.
Hope this helps a little,
I've got a p-invoke call to an unmanaged DLL that was failing in my WPF app but not in a simple, starter WPF app. I tried to figure out what the problem was but eventually came to the conclusion that if I assign too much memory before making the call, the call fails. I had two separate blocks of code, both of which would succeed on their own, but that would cause failure if both were run. (They had nothing to do with what the p-invoke call is trying to do).
What kind of issues in the unmanaged library would cause such an issue? I thought that the managed and unmanaged heaps were supposed to be automatically separated.
The crash as far as I can tell is happening in a dynamically loaded secondary DLL from the one p-invoked into. Could that have something to do with it?
Unmanaged code is prone to corrupt the heap. The side effects of that corruption are very unpredictable, it depends on what happens afterwards with that corrupted memory. It is not uncommon that nothing bad happens if the corruption is not in a crucial location. Changing the memory allocation pattern of your program can change that outcome.
All you really know right now is that the unmanaged code can't be trusted. Doing something about it is invariably hard, especially from a managed host program. You won't get anywhere until you start writing unit tests for that unmanaged code, using unmanaged code to exercise it, and find a reproducible bomb that you could tackle with an unmanaged debugger.
A shot in the dark given there is not much info to work with.
Is it possible that the unmanaged DLL needs to be loaded at a specific base address and when you allocate too much memory or other assemblies are loaded, the DLL is not able to load at the correct address.
http://msdn.microsoft.com/en-us/library/w368ysh2.aspx
I wrote C++ for 10 years. I encountered memory problems, but they could be fixed with a reasonable amount of effort.
For the last couple of years I've been writing C#. I find I still get lots of memory problems. They're difficult to diagnose and fix due to the non-determinancy, and because the C# philosophy is that you shouldn't have to worry about such things when you very definitely do.
One particular problem I find is that I have to explicitly dispose and cleanup everything in code. If I don't, then the memory profilers don't really help because there is so much chaff floating about you can't find a leak within all the data they're trying to show you. I wonder if I've got the wrong idea, or if the tool I've got isn't the best.
What kind of strategies and tools are useful for tackling memory leaks in .NET?
I use Scitech's MemProfiler when I suspect a memory leak.
So far, I have found it to be very reliable and powerful. It has saved my bacon on at least one occasion.
The GC works very well in .NET IMO, but just like any other language or platform, if you write bad code, bad things happen.
Just for the forgetting-to-dispose problem, try the solution described in this blog post. Here's the essence:
public void Dispose ()
{
// Dispose logic here ...
// It's a bad error if someone forgets to call Dispose,
// so in Debug builds, we put a finalizer in to detect
// the error. If Dispose is called, we suppress the
// finalizer.
#if DEBUG
GC.SuppressFinalize(this);
#endif
}
#if DEBUG
~TimedLock()
{
// If this finalizer runs, someone somewhere failed to
// call Dispose, which means we've failed to leave
// a monitor!
System.Diagnostics.Debug.Fail("Undisposed lock");
}
#endif
We've used Ants Profiler Pro by Red Gate software in our project. It works really well for all .NET language-based applications.
We found that the .NET Garbage Collector is very "safe" in its cleaning up of in-memory objects (as it should be). It would keep objects around just because we might be using it sometime in the future. This meant we needed to be more careful about the number of objects that we inflated in memory. In the end, we converted all of our data objects over to an "inflate on-demand" (just before a field is requested) in order to reduce memory overhead and increase performance.
EDIT: Here's a further explanation of what I mean by "inflate on demand." In our object model of our database we use Properties of a parent object to expose the child object(s). For example if we had some record that referenced some other "detail" or "lookup" record on a one-to-one basis we would structure it like this:
class ParentObject
Private mRelatedObject as New CRelatedObject
public Readonly property RelatedObject() as CRelatedObject
get
mRelatedObject.getWithID(RelatedObjectID)
return mRelatedObject
end get
end property
End class
We found that the above system created some real memory and performance problems when there were a lot of records in memory. So we switched over to a system where objects were inflated only when they were requested, and database calls were done only when necessary:
class ParentObject
Private mRelatedObject as CRelatedObject
Public ReadOnly Property RelatedObject() as CRelatedObject
Get
If mRelatedObject is Nothing
mRelatedObject = New CRelatedObject
End If
If mRelatedObject.isEmptyObject
mRelatedObject.getWithID(RelatedObjectID)
End If
return mRelatedObject
end get
end Property
end class
This turned out to be much more efficient because objects were kept out of memory until they were needed (the Get method was accessed). It provided a very large performance boost in limiting database hits and a huge gain on memory space.
You still need to worry about memory when you are writing managed code unless your application is trivial. I will suggest two things: first, read CLR via C# because it will help you understand memory management in .NET. Second, learn to use a tool like CLRProfiler (Microsoft). This can give you an idea of what is causing your memory leak (e.g. you can take a look at your large object heap fragmentation)
Are you using unmanaged code? If you are not using unmanaged code, according to Microsoft, memory leaks in the traditional sense are not possible.
Memory used by an application may not be released however, so an application's memory allocation may grow throughout the life of the application.
From How to identify memory leaks in the common language runtime at Microsoft.com
A memory leak can occur in a .NET
Framework application when you use
unmanaged code as part of the
application. This unmanaged code can
leak memory, and the .NET Framework
runtime cannot address that problem.
Additionally, a project may only
appear to have a memory leak. This
condition can occur if many large
objects (such as DataTable objects)
are declared and then added to a
collection (such as a DataSet). The
resources that these objects own may
never be released, and the resources
are left alive for the whole run of
the program. This appears to be a
leak, but actually it is just a
symptom of the way that memory is
being allocated in the program.
For dealing with this type of issue, you can implement IDisposable. If you want to see some of the strategies for dealing with memory management, I would suggest searching for IDisposable, XNA, memory management as game developers need to have more predictable garbage collection and so must force the GC to do its thing.
One common mistake is to not remove event handlers that subscribe to an object. An event handler subscription will prevent an object from being recycled. Also, take a look at the using statement which allows you to create a limited scope for a resource's lifetime.
This blog has some really wonderful walkthroughs using windbg and other tools to track down memory leaks of all types. Excellent reading to develop your skills.
I just had a memory leak in a windows service, that I fixed.
First, I tried MemProfiler. I found it really hard to use and not at all user friendly.
Then, I used JustTrace which is easier to use and gives you more details about the objects that are not disposed correctly.
It allowed me to solve the memory leak really easily.
If the leaks you are observing are due to a runaway cache implementation, this is a scenario where you might want to consider the use of WeakReference. This could help to ensure that memory is released when necessary.
However, IMHO it would be better to consider a bespoke solution - only you really know how long you need to keep the objects around, so designing appropriate housekeeping code for your situation is usually the best approach.
I prefer dotmemory from Jetbrains
Big guns - Debugging Tools for Windows
This is an amazing collection of tools. You can analyze both managed and unmanaged heaps with it and you can do it offline. This was very handy for debugging one of our ASP.NET applications that kept recycling due to memory overuse. I only had to create a full memory dump of living process running on production server, all analysis was done offline in WinDbg. (It turned out some developer was overusing in-memory Session storage.)
"If broken it is..." blog has very useful articles on the subject.
After one of my fixes for managed application I had the same thing, like how to verify that my application will not have the same memory leak after my next change, so I've wrote something like Object Release Verification framework, please take a look on the NuGet package ObjectReleaseVerification. You can find a sample here https://github.com/outcoldman/OutcoldSolutions-ObjectReleaseVerification-Sample, and information about this sample http://outcoldman.com/en/blog/show/322
The best thing to keep in mind is to keep track of the references to your objects. It is very easy to end up with hanging references to objects that you don't care about anymore.
If you are not going to use something anymore, get rid of it.
Get used to using a cache provider with sliding expirations, so that if something isn't referenced for a desired time window it is dereferenced and cleaned up. But if it is being accessed a lot it will say in memory.
One of the best tools is using the Debugging Tools for Windows, and taking a memory dump of the process using adplus, then use windbg and the sos plugin to analyze the process memory, threads, and call stacks.
You can use this method for identifying problems on servers too, after installing the tools, share the directory, then connect to the share from the server using (net use) and either take a crash or hang dump of the process.
Then analyze offline.
From Visual Studio 2015 consider to use out of the box Memory Usage diagnostic tool to collect and analyze memory usage data.
The Memory Usage tool lets you take one or more snapshots of the managed and native memory heap to help understand the memory usage impact of object types.
one of the best tools I used its DotMemory.you can use this tool as an extension in VS.after run your app you can analyze every part of memory(by Object, NameSpace, etc) that your app use and take some snapshot of that, Compare it with other SnapShots.
DotMemory