Early finalization and memory leaks in C++/CLI library

Early finalization and memory leaks in C++/CLI library - c#

I'm having issues with finalizers seemingly being called early in a C++/CLI (and C#) project I'm working on. This seems to be a very complex problem and I'm going to be mentioning a lot of different classes and types from the code. Fortunately it's open source, and you can follow along here: Pstsdk.Net (mercurial repository) I've also tried linking directly to the file browser where appropriate, so you can view the code as you read. Most of the code we deal with is in the pstsdk.mcpp folder of the repository.
The code right now is in a fairly hideous state (I'm working on that), and the current version of the code I'm working on is in the Finalization fixes (UNSTABLE!) branch. There are two changesets in that branch, and to understand my long-winded question, we'll need to deal with both. (changesets: ee6a002df36f and a12e9f5ea9fe)
For some background, this project is a C++/CLI wrapper of an unmanaged library written in C++. I am not the coordinator of the project, and there are several design decisions that I disagree with, as I'm sure many of you who look at the code will, but I digress. We wrap much of the layers of original library in the C++/CLI dll, but expose the easy-to-use API in the C# dll. This is done because the intention of the project is to convert the entire library to managed C# code.
If you're able to get the code to compile, you can use this test code to reproduce the problem.
The problem
The latest changeset, entitled moved resource management code to finalizers, to show bug, shows the original problem I was having. Every class in this code is uses the same pattern to free the unmanaged resources. Here is an example (C++/CLI):
DBContext::~DBContext()
{
this->!DBContext();
GC::SuppressFinalize(this);
}
DBContext::!DBContext()
{
if(_pst.get() != nullptr)
_pst.reset(); // _pst is a clr_scoped_ptr (managed type)
// that wraps a shared_ptr<T>.
}
This code has two benefits. First, when a class such as this is in a using statement, the resources are properly freed immediately. Secondly, if a dispose is forgotten by the user, when the GC finally decides to finalize the class, the unmanaged resources will be freed.
Here is the problem with this approach, that I simply cannot get my head around, is that occasionally, the GC will decide to finalize some of the classes that are used to enumerate over data in the file. This happens with many different PST files, and I've been able to determine it has something to do with the Finalize method being called, even though the class is still in use.
I can consistently get it to happen with this file (download)1. The finalizer that gets called early is in the NodeIdCollection class that is in DBAccessor.cpp file. If you are able to run the code that was linked to above (this project can be difficult to setup because of the dependencies on the boost library), the application would fail with an exception, because the _nodes list is set to null and the _db_ pointer was reset as a result of the finalizer running.
1) Are there any glaring problems with the enumeration code in the NodeIdCollection class that would cause the GC to finalize this class while it's still in use?
I've only been able to get the code to run properly with the workaround I've described below.
An unsightly workaround
Now, I was able to work around this problem by moving all of the resource management code from the each of the finalizers (!classname) to the destructors (~classname). This has solved the problem, though it hasn't solved my curiosity of why the classes are finalized early.
However, there is a problem with the approach, and I'll admit that it's more a problem with the design. Due to the heavy use of pointers in the code, nearly every class handles its own resources, and requires each class be disposed. This makes using the enumerations quite ugly (C#):
foreach (var msg in pst.Messages)
{
// If this using statement were removed, we would have
// memory leaks
using (msg)
{
// code here
}
}
The using statement acting on the item in the collection just screams wrong to me, however, with the approach it's very necessary to prevent any memory leaks. Without it, the dispose never gets called and the memory is never freed, even if the dispose method on the pst class is called.
I have every intention trying to change this design. The fundamental problem when this code was first being written, besides the fact that I knew little to nothing about C++/CLI, was that I couldn't put a native class inside of a managed one. I feel it might be possible to use scoped pointers that will free the memory automatically when the class is no longer in use, but I can't be sure if that's a valid way to go about this or if it would even work. So, my second question is:
2) What would be the best way to handle the unmanaged resources in the managed classes in a painless way?
To elaborate, could I replace a native pointer with the clr_scoped_ptr wrapper that was just recently added to the code (clr_scoped_ptr.h from this stackexchange question). Or would I need to wrap the native pointer in something like a scoped_ptr<T> or smart_ptr<T>?
Thank you for reading all of this, I know it was a lot. I hope I've been clear enough so that I might get some insight from people a little more experienced than I am. It's such a large question, I intend on adding a bounty when it allows me too. Hopefully, someone can help.
Thanks!
1This file is part of the freely available enron dataset of PST files

The clr_scoped_ptr is mine, and comes from here.
If it has any errors, please let me know.
Even if my code isn't perfect, using a smart pointer is the correct way to deal with this issue, even in managed code.
You do not need to (and should not) reset a clr_scoped_ptr in your finalizer. Each clr_scoped_ptr will itself be finalized by the runtime.
When using smart pointers, you do not need to write your own destructor or finalizer. The compiler-generated destructor will automatically call destructors on all subobjects, and every subobject finalizer will run when it is collected.
Looking closer at your code, there is indeed an error in NodeIdCollection. GetEnumerator() must return a different enumerator object each time it is called, so that each enumeration would begin at the start of the sequence. You're reusing a single enumerator, meaning that position is shared between successive calls to GetEnumerator(). That's bad.

Refreshing my memory of destructors/finalalisers, from some Microsoft documentation, you could at least simplify your code a little, I think.
Here's my version of your sequence:
DBContext::~DBContext()
{
this->!DBContext();
}
DBContext::!DBContext()
{
delete _pst;
_pst = NULL;
}
The "GC::SupressFinalize" is automatically done by C++/CLI, so no need for that. Since the _pst variable is initialised in the constructor (and deleting a null variable causes no problems anyway), I can't see any reason to complicate the code by using smart pointers.
On a debugging note, I wonder if you can help make the problem more apparent by sprinkling in a few calls to "GC::Collect". That should force finalization on dangling objects for you.
Hope this helps a little,

Related

What are the rules for when dispose() is required?

Although I have been coding for some time, I'm really just barely into what I would call an intermediate level coder. So I understand the principle of dispose(), which is to release memory reserved for variables and/or resources. I have also found sometimes using EF I have to dispose() in order for other operations to work properly. What I don't understand is just exactly what requires a release, when to employ dispose().
For example, we don't dispose variables like string, integer or booleans. But somewhere we cross 'a line' and the variables and/or resources we use need to be disposed. I don't understand where the line is.
Is there a single principle or a few broad principles to apply when knowing when to use dispose()?
I read these SO posts (a specific situation, more about how rather than when ) but I don't feel like I understand the basics of knowing when to use dispose(). One comment I saw asked if memory is released when a variable goes out of scope, and that got my attention because until I saw the response was no, it doesn't get released just because it goes out of scope, I would have thought that it does get released when it goes out of scope. I don't want to be what one person in the 2nd link called a 'clueless developer', although I thought that was a little harsh. Some of us are still learning.
So that's why my question is "What determines when a dispose() is really necessary?"
My question is less one of how and more one of when. Of course comments on how would be useful, but even if the method for invoking dispose() is a Using statement, I still need to know when.
Edit to original question: I know this is a long explanation as the marked as duplicate comment note requests, and it's not a rant, I just don't know how to make sure I get the focus on my precise question. Often times we just trip up over the way we ask something. As I mention at the end of this long text I'll edit all this out after we've gotten focused on my issue, assuming we get there. Based on what I've read, I think it's an important question.
The proposed "answer" post is a great post, but doesn't really answer my question. CodeNotFound's comment below also gives a great link but it doesn't really answer my question either. I've provided comments regarding these posts to try and help refine my precise question:
When should I dispose my objects in .NET?: The first answer starts with a comment that
Disposable objects represent objects holding a valuable resource which the CLR is not intrinsically aware of.
Unfortunately I don't understand what the term "Disposable objects ... the CLR is not intrinsically aware of" includes. That's what I'm asking. How do I know if something falls into the category of what I must dispose? We define things to use in code all the time. When do we cross the line and it becomes an object I need to dispose()? BTW, I noticed the author of that post never marked an answer. I don't know if that means he didn't feel the question was answered or if it was just poor follow up on his part, but hopefully I've refined a little what I hoping to understand. When you look closely at the answers, they're not really addressing the issue of which objects require a developer's action to dispose() of them, or how I might go about knowing how to identify which objects. I just don't know what objects or things I create require that I am responsible for disposing. And I get it that GC and other provisions come into play, but again, that's just the how. What seems clear is that most experienced and professional developers know when something they've created needs to be disposed of. I don't understand how to know that.
Proper use of the IDisposable interface: Clearly a popular answer (1681 upvotes), but the marked answer begins with
The point of Dispose is to free unmanaged resources".
OK, but my question is how do I know by looking at something that it is an unmanaged resource? And I don't understand how the note that follows applies to what needs to be disposed.
If you found it in the .NET framework it's managed. If you went poking around MSDN yourself, it's unmanaged... and you're now responsible for cleaning it up."
I don't understand how to use that type of explanation to categorize what I need to dispose() and what I don't. There's all kinds of stuff in the .net framework; how do I separate out things that require I dispose() of them? What do I look at to tell me I'm responsible for it?
After that, the answer goes on to speak at great length about how to dispose(), but I'm still stuck back at what needs to be disposed. To further complicate the topic for me, that author later says "So now we will...
get rid of unmanaged resources (because we have to), and
get rid of managed resources (because we want to be helpful)
So now I need to consider disposing a whole new set of objects that use memory, and I don't know what those are either. The author of the answer later says
For anyone who likes the style of this answer (explaining the why, so the how becomes obvious)...
I understand the author was suggesting other articles, but the author's suggestion that understanding the "why" makes the "how" obvious isn't really legitimate because what's obvious to one person isn't always obvious to another. And even at that, the author focused more on the why and how, and my question is about when, meaning what needs to be disposed(), as opposed to when I'm done with it. I know when I'm done with things, I just don't know which things I'm responsible for when I'm done with them.
It may be obvious or instinctive to most developers what needs to be disposed(), but it's not obvious to me and I'm sure many others at my stage of experience and I was hoping to get a more focused dialogue on what. Certainly why is useful, but in this case only if the why is attached to a what. For example: You have to dispose a DbContext because the CLR won't dispose it - the because explains why, but in this case, it's the DbContext that is the what that must be disposed.
I was hoping there is a general principle for what must be disposed rather than a long list of specific items which would not be particularly useful for people like me who are looking for simple guidelines.
Again, I get it that memory release is important, and also that a lot of experience and expertise goes into learning why and how, but I'm still left struggling to understand what needs to be disposed. Once I understand what I have to dispose(), then I can begin the struggle to learn how to do it.
So is this still a bad question? I'll edit out all of this explanation later to leave the post more succinct assuming we're able to achieve more focus on what I'm asking.
Final Edit: Although I said above I would edit out what I originally thought to be unnecessary text in the question, I think it better to leave it in. I think the way questions are asked has the potential to help us understand the answer. Even though the answer never changes, if we don't connect the answer to the way we framed the question in our minds, we may not really understand the answer. So if the way this question was framed connects with someone, I encourage you to fully read the post marked as the answer, along with the comments. While the answer in the end was really simple, there's a great deal of history and also context that is important to understand the answer to this question. For clarity's sake, the answer was also edited back and forth over the life of this discussion regarding dispose(). Enjoy...

I understand the principle of dispose(), which is to release memory reserved for variables and/or resources.
You do not understand the purpose of dispose. It is not for releasing memory associated with variables.
What I don't understand is just exactly what requires a release, when to employ dispose().
Dispose anything that implements IDisposable when you are sure that you are done with it.
For example, we don't dispose variables like string, integer or booleans. But somewhere we cross 'a line' and the variables and/or resources we use need to be disposed. I don't understand where the line is.
The line is demarcated for you. When an object implements IDisposable, it should be disposed.
I note that variables are not things that are disposed at all. Objects are disposed. Objects are not variables and variables are not objects. Variables are storage locations for values.
Is there a single principle or a few broad principles to apply when knowing when to use dispose()?
A single principle: dispose when the object is disposable.
I don't feel like I understand the basics of knowing when to use dispose().
Dispose all objects that are disposable.
One comment I saw asked if memory is released when a variable goes out of scope, and that got my attention because until I saw the response was no, it doesn't get released just because it goes out of scope, I would have thought that it does get released when it goes out of scope.
Be careful in your use of language. You are confusing scope and lifetime, and you are confusing variables with the contents of the variables.
First: the scope of a variable is the region of program text in which that variable may be referred to by name. The lifetime of a variable is the period of time during the execution of the program in which the variable is considered to be a root of the garbage collector. Scope is purely a compile-time concept, lifetime is purely a run-time concept.
The connection between scope and lifetime is that a local variable's lifetime is often starting when control enters the scope of the variable, and ending when it leaves. However, various things can change the lifetime of a local, including being closed over, in an iterator block, or in an async method. The jitter optimizer may also shorten or extend the life of a local.
Remember also that a variable is storage, and that it may refer to storage. When the lifetime of a local ends, the storage associated with the local might be reclaimed. But there is no guarantee whatsoever that the storage associated with the thing the local refers to will be reclaimed, at that time or ever.
So that's why my question is "What determines when a dispose() is really necessary?"
Dispose is necessary when the object implements IDisposable. (There are a small number of objects that are disposable that need not be disposed. Tasks, for instance. But as a general rule, if it is disposable, dispose it.)
My question is less one of how and more one of when.
Only dispose a thing when you are done with it. Not before, and not later.
When should I dispose my objects in .NET?
Dispose objects when they implement IDisposable, and you are done using them.
How do I know if something falls into the category of what I must dispose?
When it is implementing IDisposable.
I just don't know what objects or things I create require that I am responsible for disposing.
The disposable ones.
most experienced and professional developers know when something they've created needs to be disposed of. I don't understand how to know that.
They check to see if the object is disposable. If it is , they dispose it.
The point of Dispose is to free unmanaged resources". OK, but my question is how do I know by looking at something that it is an unmanaged resource?
It implements IDisposable.
I don't understand how to use that type of explanation to categorize what I need to dispose() and what I don't. There's all kinds of stuff in the .net framework; how do I separate out things that require I dispose() of them? What do I look at to tell me I'm responsible for it?
Check to see if it is IDisposable.
After that, the answer goes on to speak at great length about how to dispose(), but I'm still stuck back at what needs to be disposed.
Anything that implements IDisposable needs to be disposed.
my question is about when, meaning what needs to be disposed(), as opposed to when I'm done with it. I know when I'm done with things, I just don't know which things I'm responsible for when I'm done with them.
The things that implement IDisposable.
I was hoping there is a general principle for what must be disposed rather than a long list of specific items which would not be particularly useful for people like me who are looking for simple guidelines.
The simple guideline is that you should dispose disposable things.
Again, I get it that memory release is important, and also that a lot of experience and expertise goes into learning why and how, but I'm still left struggling to understand what needs to be disposed. Once I understand what I have to dispose(), then I can begin the struggle to learn how to do it.
Dispose things that implement IDisposable by calling Dispose().
So is this still a bad question?
It is a very repetitive question.
Your patience is a kindness.
Thanks for taking this somewhat silly answer in the humour in which it was intended!
WRT scope != lifetime & variables != objects, very helpful.
These are very commonly confused, and most of the time, it makes little difference. But I find that often people who are struggling to understand a concept are not at all well served by vagueness and imprecision.
In VS is it so simple as looking in Object Browser / Intellisense to see if the object includes Dispose()?
The vast majority of the time, yes.
There are some obscure corner cases. As I already mentioned, the received wisdom from the TPL team is that disposing Task objects is not only unnecessary, but can be counterproductive.
There are also some types that implement IDisposable, but use the "explicit interface implementation" trick to make the "Dispose" method only accessible by casting to IDisposable. In the majority of these cases there is a synonym for Dispose on the object itself, typically called "Close" or some such thing. I don't much like this pattern, but some people use it.
For those objects, the using block will still work. If for some reason you want to explicitly dispose such objects without using using then either (1) call the "Close" method, or whatever it is called, or (2) cast to IDisposable and dispose it.
The general wisdom is: if the object is disposable then it does not hurt to dispose it, and it is a good practice to do so when you're done with it.
The reason being: disposable objects often represent a scarce shared resource. A file, for example, might be opened in a mode that denies the rights of other processes to access that file while you have it opened. It's the polite thing to do to ensure that the file is closed as soon as you're done with it. If one process wanted to use a file, odds are pretty good another one will soon.
Or a disposable might represent something like a graphics object. The operating system will stop giving out new graphics objects if there are more than ten thousand active in a process, so you've got to let them go when you're done with them.
WRT implementing IDisposable #Brian's comment suggests in "normal" coding I likely don't need to. So would I only do that if my class pulled in something unmanaged?
Good question. There are two scenarios in which you should implement IDisposable.
(1) The common scenario: you are writing an object that holds onto another IDisposable object for a long time, and the lifetime of the "inner" object is the same as the lifetime of the "outer" object.
For example: you are implementing a logger class that opens a log file and keeps it open until the log is closed. Now you have a class that is holding onto a disposable, and so it itself should also be disposable.
I note that there is no need in this case for the "outer" object to be finalizable. Just disposable. If for some reason the dispose is never called on the outer object, the finalizer of the inner object will take care of finalization.
(2) the rare scenario: you are implementing a new class which asks the operating system or other external entity for a resource that must be aggressively cleaned up, and the lifetime of that resource is the same as the lifetime of the object holding onto it.
In this exceedingly rare case you should first, ask yourself if there is any way to avoid it. This is a bad situation to be in for a beginner-to-intermediate programmer. You really need to understand how the CLR interacts with unmanaged code in order to get this stuff solid.
If you cannot avoid it, you should prefer to not attempt to implement the disposal and finalization logic yourself, particularly if the unmanaged object is represented by a Windows handle. There should already be wrappers around most of the OS services represented by handles, but if there is not one, what you want to do is carefully study the relationship between IntPtr, SafeHandle and HandleRef. IntPtr, SafeHandle and HandleRef - Explained
If you really truly do need to write the disposal logic for an unmanaged, non-handle-based resource, and the resource requires backstopping the disposal with finalization, then you have a significant engineering challenge.
The standard dispose pattern code may look simple but there are real subtleties to writing correct finalization logic that is robust in the face of error conditions. Remember, a finalizer runs on a different thread and can run on that thread concurrent with the constructor in thread abort scenarios. Writing threadsafe logic which cleans up an object while it is still being constructed on another thread can be extraordinarily difficult and I recommend against trying.
For more on the challenges of writing finalizers, see my series of articles on the subject: http://ericlippert.com/2015/05/18/when-everything-you-know-is-wrong-part-one/
A question you did not ask but I will answer anyways:
Are there scenarios in which I should not be implementing IDisposable?
Yes. Many people implement IDisposable any time they wish to have a coding pattern that has the semantics of:
Make a change in the world
Do stuff in the new world
Revert the change
So, for example, "impersonate an administrator, do some admin tasks, revert to normal user". Or "start handling an event, do stuff when the event happens, stop handling the event". Or "create an in-memory error tracker, do some stuff that might make errors, stop tracking errors". And so on. You get the general pattern.
This is a poor fit for the disposable pattern, but that doesn't stop people from writing up classes that represent no unmanaged resource whatsoever, but still implement IDisposable as though they did.
This opinion puts me in a minority; lots of people have no problem whatsoever with this abuse of the mechanism. But when I see a disposable I think "the author of this class wishes me to be polite and clean up after myself when I am good and ready." But the actual contract of the class is often "you must dispose this at a particular point in the program and if you do not then the rest of the program logic will be wrong until you do". That is not the contract I expect to have to implement when I see a disposable. I expect that I have to make a good-faith effort to clean up a resource at my convenience.

If you have an object that implements IDisposable then you must always explicitly call .Dispose() on that object. If it doesn't implement IDisposable then obviously you don't call .Dispose() because you can't.
If you are writing your own objects then the rule as to whether or not to implement IDisposable is simply this: If your object holds references to unmanaged objects OR if it holds references to objects that implement IDisposable then it must implement IDisposable.
The GC never calls .Dispose() for you. You must always do it - either directly or through a finalizer.
The GC may (most likely, but not always) call a finalizer, so you can write a finalizer to call dispose, but be careful that you implement the disposable pattern correctly, and be certain that you understand that the finalizer may not ever run so if your dispose method does something vitally important it might be better to call .Dispose() directly before you lose the reference to the object.

The garbage collector (GC) guarantees that managed memory resources that are no longer used are released before the memory limit is reached.
Let's break that down:
managed: Loosely, this means resources entirely within .NET/CLR. Memory allocated by non-.NET C++ libraries, for example, are not released by the GC.
memory: The GC only provides a guarantee on memory usage. Many other resource types exist, such as file handles. The GC has no logic to ensure file handles are released expediently.
no longer used: This means that all variables with a reference to that memory have ended their lifetime. As Eric Lippert explained, lifetime != scope.
before the memory limit is reached: The GC monitors "memory pressure" and makes sure to release memory when needed. It uses a bunch of algorithms to decide when it is most efficient to do this, but it's important to note that the GC decides when to release resources all on its own (nondeterministically to your program).
This leaves us with several scenarios when relying on the GC is not appropriate:
The object references unmanaged resources (including referencing a managed object that references an unmanaged resource).
The object references a resource other than memory that must be released.
The object references a resource that must be released at a specific point in the code (deterministically).
In any of these cases, the object should implement IDisposable to handle resource cleanup.
Any object instantiated that implements IDisposable must be cleaned up by either calling Dispose or using the using block (which takes care of calling Dispose for you).

The Dispose pattern can be used to clear both managed and unmanaged resources. If you have unmanaged resources in your class, according to the proper IDisposable implementation you must have both Dispose and Finalize methods.
How do I know what GC knows/doesn't?
GC knows/interested only about managed objects. GC is supposed to clear objects those don’t have any strong references. It doesn’t know about your logics. For a simple and obvious example.
Let’s say you have a MainView instance which is a lasting a long time and you create another LittleView that subscribe to an event in the MainView instance.
Then you close the LittleView and it disappears. You know that you don’t need that LittleView instance anymore. But GC doesn’t know if you still need LittleView or not as there is an active event subscription to the MainWindow.
So, GC will not bother to clear LittleView instance from memory. So, what you should do is unsubscribe the event when you close the view. Then GC knows that there are no strong references to the LittleView and it’s reachable.
The post also complicates things saying managed resources may include
unmanaged resources. Wow. This is getting deeper than I initially
imagined. I'm still looking to easily recognize how to know what needs
to be disposed. Is it the case this is a complicated list w/
conditions and context?
That’s correct, managed objects also can have unmanaged resources. All those managed objects have their finalize method to clear umanaged resources.
Wonder why you need finalize method in addition to the Dispose?
Unmanaged resources can create the most dangerous memory leaks as such memory leaks can hold on to memory until you restart the PC. So, they are very bad.
Let’s say there is managed instance InsA with some unmanaged. InsA has implemented the Dispose method in a way to clear unmanaged resources too. But what will happen if you don’t/forget to call the Dispose method? It will never clear that unmanaged memory. That’s why Finalization has come in to the functionality. So, if you forget/don’t call the Dispose, finalization will gurantee to execute the Dispose in a way that it release unmanaged resources.

How do we distinguish between managed and unmanaged resources in C#? Is TextFieldParser unmanaged?

I recently learned how to use Microsoft.VisualBasic.FileIO.TextFieldParser to parse a text input. In the example given to me, the TextFieldParser is called using the keyword using
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(new StringReader(str)))
Though after some further researches, I notice that the practice of using using keyword for TextFieldParser is not universal.
As far as I understand, .Net Framework has both managed and unmanaged resource. And when we use unmanaged resource, we should worry about memory leak and thus we should take care of the disposal of the unmanaged resource we use. One best way to do this is by putting them on using context.
All these drive me to have two questions in my mind, one particular and one general. Here are my questions:
The particular: Is TextFieldParser actually managed or unmanaged?
The general: Is there a definite way for us to know if a resource is managed or unmanaged(such as looking at X thing in the class, or such alike, or even check something from MSDN - if any should be checked - will do). I was told some guidance along my short programming experience such as that (i) most of the .Net classes are managed, (ii) System.Drawing classes have some unmanaged resources, (iii) beware of all database, network, and COM classes because they are typically unmanaged, etc... and the list goes on which I keep adding till now. But I wonder if there is any definite way to know this?
I would really appreciate if the more experienced could help to further guide me on this issue.

You're missing the point. Whenever any class implements IDisposable, you should call Dispose when you're done with it.
Whether the class uses unmanaged resources or not is internal to the class, you shouldn't care about it at all. Every class that uses unmanaged resources should also have a finalizer that clears up those unmanaged resources if you didn't dispose of the class explicitly. Dispose just allows you to clean its resources (both managed and unmanaged, though this doesn't necessarily mean freeing up the memory immediately) in a more deterministic and immediate fashion. For example, Disposeing a FileStream will release the file handle immediately, while if you don't Dispose (or Close), the file will be opened until the next collection and finalization.
EDIT:
To show that Dispose may also be necessary to clean up managed resources, we only have to look at event handlers. In particular, the case when you're subscribing on an event of a class that has a longer lifetime than you do:
var control = new MyHelperControl();
MyParentForm.Click += control.DoSomething();
Now, even if control goes out of scope, it will survive as long as MyParentForm - it's still referenced by the event handler. The same problem grows to absurd proportions when the parent has the same lifetime as the whole application - that's can be a huge memory leak. An example would be registering event handlers on the main form of an application, or on a static event.
There might also be other things that happen in the Dispose. For example, again with Windows forms, when you call Dispose on a Control, a whole lot of things happen:
All the unmanaged resources are released. Since Winforms controls are wrappers of the native controls to an extent, this is usually a lot of resources - the control itself, any pens, brushes, images... all of those are native resources. They will also be released if you forget a Dispose, since they all have finalizers, but it might take more time - which is especially painful when you create and destroy a lot of those objects. For example, there's a limited supply of GDI+ object handles, and if you run out, you get OutOfMemoryException and you're out.
Dispose of all sub-controls.
Remove any context menu handlers (remember, context menu is a separate component that's only linked to another control).
Remove any data bindings.
Remove itself from the parent container, if any.
If the control is a window with its own message loop, kill the message loop.
The funny part is that the unmanaged resources matter the least, really - they always have a finalizer. The managed resources are trickier, because you're more or less forbidden to handle managed references in a finalizer (because they might have already been released, or they might be in the middle of being released, or they might start being released in the middle of your finalizer... it's complicated). So doing MyParentForm.Click -= this.OnClick; is not a good thing to have in your finalizer - not to mention that it would require you to make every such class finalizable, which isn't exactly free, especially when you expect the finalizer to actually run (when you do a Dispose, the GC is alerted that this instance no longer needs a finalization).

The use of any classes implment IDisposable interface should be either wrapped in using (...) {} or be disposed nicely where appropriate.

This use of GC.SuppressFinalize() doesn't feel right

I have been having some issues with utilizing a vendor library where occasionally an entity calculated by the library would be null when it should always have valid data in it.
The functioning code (after debugging the issue with the vendor) is roughly as follows:
Task.Factory.StartNew(() => ValidateCalibration(pelRectRaw2Ds, crspFeatures, Calibration.Raw2DFromPhys3Ds));
.....
private void ValidateCalibration(List<Rectangle> pelRectRaw2Ds, List<List<3DCrspFeaturesCollection>> crspFeatures, List<3DCameraCalibration> getRaw2DFromPhys3Ds)
{
var calibrationValidator = new 3DCameraCalibrationValidator();
// This is required according to vendor otherwise validationResultsUsingRecomputedExtrinsics is occasionally null after preforming the validation
GC.SuppressFinalize(calibrationValidator);
3DCameraCalibrationValidationResult validationResultUsingOriginalCalibrations;
3DCameraCalibrationValidationResult validationResultsUsingRecomputedExtrinsics;
calibrationValidator.Execute(pelRectRaw2Ds, crspFeatures, getRaw2DFromPhys3Ds, out validationResultUsingOriginalCalibrations, out validationResultsUsingRecomputedExtrinsics);
Calibration.CalibrationValidations.Add(new CalibrationValidation
{
Timestamp = DateTime.Now,
UserName = Globals.InspectionSystemObject.CurrentUserName,
ValidationResultUsingOriginalCalibrations = validationResultUsingOriginalCalibrations,
ValidationResultsUsingRecomputedExtrinsics = validationResultsUsingRecomputedExtrinsics
});
}
The validation process is a fairly time consuming operation so I hand it off to a Task. The problem I had was that originally I did not have the call to GC.SuppressFinalize(calibrationValidator) and when the application was run from a Release build, then the out parameter validationResultsUsingRecomputedExtrinsics would be null. If I ran the application from a Debug build (either with or without the Debugger attached) then validationResultsUsingRecomputedExtrinsics would contain valid data.
I don't fully understand what GC.SuppressFinalize() has done in this situation, or how it has fixed the problem. Everything I can find regarding GC.SuppressFinalize() is that it is used when implementing IDisposable. I can't find any use of it in "standard" code.
How/why does the addition of the call to GC.SuppressFinalize(calibrationValidator) fix this problem?
I understand that without intimate knowledge of the internals of the vendor library, it might not be possible to know for sure, but any insight would help.
The application is compiled with VS2012, targeting .NET 4.0. That vendor library requires that the useLegacyV2RuntimeActivationPolicy="true" option is specified in app.config.
This is the justification I received from the vendor:
The SuppressFinalize command makes sure that the garbage collector will not clean something up “early”. It seems like for some reason your application was sometimes having the garbage collector get a bit zealous and clean up the object before you were truly done with it; it is almost certainly scope related and possibly due to the multi-threading causing confusion on the scope of the calibrationValidator. Below is the response I got from Engineering.
Because the variable was created in the local scope, and that function runs in the background thread, Garbage Collection runs in the main thread, and it seems that the Garbage collection is not smart enough in handling multi-thread situations. Sometimes, it just releases it too early (internal execution of validator not finished yet, and still needs this variable).

This is in all likelihood a hack to solve a premature garbage collection problem. Not uncommon with unmanaged code, typical in camera applications. It is not a healthy hack, good odds that this will cause a resource leak because the finalizer doesn't execute. Wrappers for unmanaged code almost always have something to do in the finalizer, it is very common that they need to release unmanaged memory.
At issue is that the calibrationValidator object can be garbage collected while the unmanaged code is running. Having another thread in your program makes this likely since that other thread can be allocating objects and trigger a GC. This is very easy to miss by the owner of the code while testing, either by never having tested it while using multiple threads or just not getting lucky enough to trigger a GC at the wrong time.
The proper fix on your end is to ensure that the jitter marks the object in use past the call so that the garbage collector won't collect it. You do so by adding GC.KeepAlive(calibrationValidator) after the Execute() call.

When it comes to understanding IDisposable, GC.SuppressFinalize, and finalizers in C#, I don't think a better explanation exists than the following article.
DG Update: Dispose, Finalization, and Resource Management
Alright! Here it is: the revised “Dispose, Finalization, and Resource Management” Design Guideline entry. I mentioned this work previously here and here. At ~25 printed pages, it's not what I would consider to be a minor update. Took me much longer than anticipated, but I'm happy with the result. I got to work with and received good amounts of feedback from HSutter, BrianGru, CBrumme, Jeff Richter, and a couple other folks on it... Good fun.
Key Concept for this question:
It is so obvious that GC.SuppressFinalize() should only be called on this that the article doesn't even mention that directly. It does however mention the practice of wrapping finalizable objects to isolate them from a public API in order to ensure that external code is not able to call GC.SuppressFinalize() on those resources (see the following quote). Whoever designed the library described in the original question has no grasp on the way finalization in .NET works.
Quoted from the blog article:
Even in the absence of one of the rare situations noted above, a finalizable object with a publicly accessible reference could have its finalization suppressed by any arbitrary untrusted caller. Specifically, they can call GC.SuppressFinalize on you and prevent finalization from occurring altogether, including critical finalization. A good mitigation strategy to deal with this is to wrap critical resources in a non-public instance that has a finalizer. So long as you do not leak this to callers, they will not be able to suppress finalization. If you migrate to using SafeHandle in your class and never expose it outside your class, you can guarantee finalization of your resources (with the caveats mentioned above and assuming a correct SafeHandle implementation).

There are some mentions of multithreading or native code being the cause of this issue. But the same thing can happen in a purely managed and mostly single-threaded program.
Consider the following program:
using System;
class Program
{
private static void Main()
{
var outer = new Outer();
Console.WriteLine(outer.GetValue() == null);
}
}
class Outer
{
private Inner m_inner = new Inner();
public object GetValue()
{
return m_inner.GetValue();
}
~Outer()
{
m_inner.Dispose();
}
}
class Inner
{
private object m_value = new object();
public object GetValue()
{
GC.Collect();
GC.WaitForPendingFinalizers();
return m_value;
}
public void Dispose()
{
m_value = null;
}
}
Here, while outer.GetValue() is being called, outer will be garbage collected and finalized (at least in Release mode). The finalizer nulls out the field of the Inner object, which means GetValue() will return null.
In real code, you most likely wouldn't have the GC calls there. Instead you would create some managed object, which (non-deterministically) causes the garbage collector to run.
(I said this code is mostly single-threaded. In fact, the finalizer will run on another thread, but because of the call to WaitForPendingFinalizers(), it's almost as if it ran on the main thread.)

How to find dispose and memory issues? C#

My app was using 150mb of memory not to long ago, now it is at 286mb. It slowly rises so i must be forgetting to dispose something. This isnt much of a problem for me since i have 4gb but i want to send this to others who have only 1gb of ram. other then going through the code line by line how can i find objects that need to be disposed of or just generally large objects?

Extending both JP and Reed's answers.
I wanted to clear up a bit of confusion. If you are seeing significant increases in memory the issue is unlikely to be a problem with calling Dispose. Dispose is typically used to free up unmanaged resources like handles. These don't take up much memory but instead are more precious as resources.
Increases in memory are generally associated with large objects or collections being accessible from a managed object being rooted directly or indirectly via a stack object or a strong GC handle. This is the area you will likely want to focus your investigation on.

Check out the .NET Memory Profiler. There is a 15-day trial and it's well worth the license fee.
Easily identify memory leaks by
collecting and comparing snapshots of
.NET memory Snapshots include data
about the .NET instance allocations
and live instances at the time the
snapshot was collected. They provide a
lot of useful information and make it
easy to identify potential memory
leaks, especially when two snapshots
are compared.

heres a method i use:
http://www.codeproject.com/KB/dotnet/Memory_Leak_Detection.aspx

You can also use WinDbg and SOS. These have the advantage of being free and very, very thorough, if a bit tricky to get used to.
Here's a blog post describing the process.

Here are couple of tricks using the ANTS Memory Profiler to help find undisposed objects and fix the leaks.
The ANTS Memory profiler allows a filter to be set that show just
objects that contain a Dispose() method. Turn this on, and you'll be
given a list of live objects that have not been disposed.
While finding undisposed objects is useful, even more useful is
knowing why these objects are not being disposed. Finding where these
undisposed objects are created goes a long way to finding the
reason for the leak. If you are able to change the code of the leaking object, a
useful trick is to introduce a field to hold a stacktrace, and
populate this field at the time of object construction. Then because
the ANTS memory profiler allows you to inspect fields of objects,
you can simply read off the stacktrace as it was at the time the leaking objects were created. This will give you a strong clue as to who the
owner of the leaking objects should be, and how they should go about calling Dispose on the objects they are responsible for.

Code project currently has a link to an app specifically for finding undisposed objects.
http://www.codeproject.com/KB/dotnet/undisposed.aspx

Check out this link
Stephen Toub goes into great length to explain various techniques for doing this,
Following are some brief highlights from his article
By adding a finalizer for debugging purposes, you can introduce a way to find out when
a class was not properly disposed,if the finalizer is never invoked, you know that
disposed wasn't called
To get additional information about the instance, threadIds etc to aid in narrowing down
which instance didn't have it's disposed invoked, he creates a FinalizationDebgger class
which your disposable class would hold onto which wouldin turn call the Dispose of the
FinalizationDebugger class instance when it itself is disposed. If Dispose isn't called on
your class instance then when the Finalizer runs it will invoke the finalizer for
FinalizationDebgger instance where in you could assert or throw an exception to help debug
the problem,
Move all the tracking related code into a base class which your disposable class would
then inherit from, which makes the code much cleaner. This approach may or may not work
since your burn a base class and if you are inheriting from another base already.
In the last option everything is factored out into a static class that your instances
call into. The FinalizationDebugger becomes a static class that exposes three static
methods: Constructor, Dispose, and Finalizer. The idea is that you call these methods
from the appropriate place in your class(dispose/finalize/constructor).This is minimally
invasive into your code, as it typically involves adding only three lines of code. All
of these methods are marked with a ConditionalAttribute such that they'll only be called
by your class when you compile your class in DEBUG mode.
Stephen also explains the pros and cons of each of his approaches.
The solutions presents various options and you would need to evaluate each to figure out which one works best for your situation. A MUST read IMHO
Hope this helps.

Also try out ANTS Memory Profiler. There's a 14 day fully functional free trial, and I really like the UI.
Disclosure: They sponsor Herding Code right now, which is why I tried it out. But I'm really impressed with it - I profiled an application for 30 hours and got tons of useful information out of it. The UI is really helpful - it guides you through the process, and it looks dang purty.
alt text http://www.red-gate.com/products/ants_memory_profiler/images/object_retention_graph.gif

What strategies and tools are useful for finding memory leaks in .NET?

I wrote C++ for 10 years. I encountered memory problems, but they could be fixed with a reasonable amount of effort.
For the last couple of years I've been writing C#. I find I still get lots of memory problems. They're difficult to diagnose and fix due to the non-determinancy, and because the C# philosophy is that you shouldn't have to worry about such things when you very definitely do.
One particular problem I find is that I have to explicitly dispose and cleanup everything in code. If I don't, then the memory profilers don't really help because there is so much chaff floating about you can't find a leak within all the data they're trying to show you. I wonder if I've got the wrong idea, or if the tool I've got isn't the best.
What kind of strategies and tools are useful for tackling memory leaks in .NET?

I use Scitech's MemProfiler when I suspect a memory leak.
So far, I have found it to be very reliable and powerful. It has saved my bacon on at least one occasion.
The GC works very well in .NET IMO, but just like any other language or platform, if you write bad code, bad things happen.

Just for the forgetting-to-dispose problem, try the solution described in this blog post. Here's the essence:
public void Dispose ()
{
// Dispose logic here ...
// It's a bad error if someone forgets to call Dispose,
// so in Debug builds, we put a finalizer in to detect
// the error. If Dispose is called, we suppress the
// finalizer.
#if DEBUG
GC.SuppressFinalize(this);
#endif
}
#if DEBUG
~TimedLock()
{
// If this finalizer runs, someone somewhere failed to
// call Dispose, which means we've failed to leave
// a monitor!
System.Diagnostics.Debug.Fail("Undisposed lock");
}
#endif

We've used Ants Profiler Pro by Red Gate software in our project. It works really well for all .NET language-based applications.
We found that the .NET Garbage Collector is very "safe" in its cleaning up of in-memory objects (as it should be). It would keep objects around just because we might be using it sometime in the future. This meant we needed to be more careful about the number of objects that we inflated in memory. In the end, we converted all of our data objects over to an "inflate on-demand" (just before a field is requested) in order to reduce memory overhead and increase performance.
EDIT: Here's a further explanation of what I mean by "inflate on demand." In our object model of our database we use Properties of a parent object to expose the child object(s). For example if we had some record that referenced some other "detail" or "lookup" record on a one-to-one basis we would structure it like this:
class ParentObject
Private mRelatedObject as New CRelatedObject
public Readonly property RelatedObject() as CRelatedObject
get
mRelatedObject.getWithID(RelatedObjectID)
return mRelatedObject
end get
end property
End class
We found that the above system created some real memory and performance problems when there were a lot of records in memory. So we switched over to a system where objects were inflated only when they were requested, and database calls were done only when necessary:
class ParentObject
Private mRelatedObject as CRelatedObject
Public ReadOnly Property RelatedObject() as CRelatedObject
Get
If mRelatedObject is Nothing
mRelatedObject = New CRelatedObject
End If
If mRelatedObject.isEmptyObject
mRelatedObject.getWithID(RelatedObjectID)
End If
return mRelatedObject
end get
end Property
end class
This turned out to be much more efficient because objects were kept out of memory until they were needed (the Get method was accessed). It provided a very large performance boost in limiting database hits and a huge gain on memory space.

You still need to worry about memory when you are writing managed code unless your application is trivial. I will suggest two things: first, read CLR via C# because it will help you understand memory management in .NET. Second, learn to use a tool like CLRProfiler (Microsoft). This can give you an idea of what is causing your memory leak (e.g. you can take a look at your large object heap fragmentation)

Are you using unmanaged code? If you are not using unmanaged code, according to Microsoft, memory leaks in the traditional sense are not possible.
Memory used by an application may not be released however, so an application's memory allocation may grow throughout the life of the application.
From How to identify memory leaks in the common language runtime at Microsoft.com
A memory leak can occur in a .NET
Framework application when you use
unmanaged code as part of the
application. This unmanaged code can
leak memory, and the .NET Framework
runtime cannot address that problem.
Additionally, a project may only
appear to have a memory leak. This
condition can occur if many large
objects (such as DataTable objects)
are declared and then added to a
collection (such as a DataSet). The
resources that these objects own may
never be released, and the resources
are left alive for the whole run of
the program. This appears to be a
leak, but actually it is just a
symptom of the way that memory is
being allocated in the program.
For dealing with this type of issue, you can implement IDisposable. If you want to see some of the strategies for dealing with memory management, I would suggest searching for IDisposable, XNA, memory management as game developers need to have more predictable garbage collection and so must force the GC to do its thing.
One common mistake is to not remove event handlers that subscribe to an object. An event handler subscription will prevent an object from being recycled. Also, take a look at the using statement which allows you to create a limited scope for a resource's lifetime.

This blog has some really wonderful walkthroughs using windbg and other tools to track down memory leaks of all types. Excellent reading to develop your skills.

I just had a memory leak in a windows service, that I fixed.
First, I tried MemProfiler. I found it really hard to use and not at all user friendly.
Then, I used JustTrace which is easier to use and gives you more details about the objects that are not disposed correctly.
It allowed me to solve the memory leak really easily.

If the leaks you are observing are due to a runaway cache implementation, this is a scenario where you might want to consider the use of WeakReference. This could help to ensure that memory is released when necessary.
However, IMHO it would be better to consider a bespoke solution - only you really know how long you need to keep the objects around, so designing appropriate housekeeping code for your situation is usually the best approach.

I prefer dotmemory from Jetbrains

Big guns - Debugging Tools for Windows
This is an amazing collection of tools. You can analyze both managed and unmanaged heaps with it and you can do it offline. This was very handy for debugging one of our ASP.NET applications that kept recycling due to memory overuse. I only had to create a full memory dump of living process running on production server, all analysis was done offline in WinDbg. (It turned out some developer was overusing in-memory Session storage.)
"If broken it is..." blog has very useful articles on the subject.

After one of my fixes for managed application I had the same thing, like how to verify that my application will not have the same memory leak after my next change, so I've wrote something like Object Release Verification framework, please take a look on the NuGet package ObjectReleaseVerification. You can find a sample here https://github.com/outcoldman/OutcoldSolutions-ObjectReleaseVerification-Sample, and information about this sample http://outcoldman.com/en/blog/show/322

The best thing to keep in mind is to keep track of the references to your objects. It is very easy to end up with hanging references to objects that you don't care about anymore.
If you are not going to use something anymore, get rid of it.
Get used to using a cache provider with sliding expirations, so that if something isn't referenced for a desired time window it is dereferenced and cleaned up. But if it is being accessed a lot it will say in memory.

One of the best tools is using the Debugging Tools for Windows, and taking a memory dump of the process using adplus, then use windbg and the sos plugin to analyze the process memory, threads, and call stacks.
You can use this method for identifying problems on servers too, after installing the tools, share the directory, then connect to the share from the server using (net use) and either take a crash or hang dump of the process.
Then analyze offline.

From Visual Studio 2015 consider to use out of the box Memory Usage diagnostic tool to collect and analyze memory usage data.
The Memory Usage tool lets you take one or more snapshots of the managed and native memory heap to help understand the memory usage impact of object types.

one of the best tools I used its DotMemory.you can use this tool as an extension in VS.after run your app you can analyze every part of memory(by Object, NameSpace, etc) that your app use and take some snapshot of that, Compare it with other SnapShots.
DotMemory

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.