Assume I have a C# method like this: (obviously not real code)
byte[] foo()
{
    var a = MethodThatReturns500mbObject();
    var b = MethodThatReturns200mbObject(a);
    byte[] c = MethodThatReturns150mbByteArray(b);
    byte[] d = UnwiselyCopyThatHugeArray(c);
    return d;
}
As you can guess by the naming, the objects that are returned by these methods are gigantic. Hundreds of megabytes total RAM required by each, although the first two objects are composed of millions of smaller objects instead of one huge chunk like the latter two arrays.
We're going to optimize this into a streaming solution soon, but in the meantime I'd like to make sure that at least we're not preventing GC of the earlier objects while executing code to produce the later objects.
My question is this: will object a be eligible for GC as soon as MethodThatReturns200mbObject(a) returns? If not, what's the best way to let the GC know that there's a 500MB present waiting for it?
The core of my question is whether the .NET GC's determination of "this object has no references" is smart enough to know that a cannot be referenced after MethodThatReturns200mbObject(a) returns. Even though var a is still theoretically available to later code, a is not referenced anywhere below the second line of the method. In theory, the compiler could let the GC know that a is unreferenced. But in practice, I'm not sure how it behaves. Do you know?
The correct answer is that it depends on the project configuration
whether the object will be eligible for garbage collection at the end
of the method. As discussed in When do I need to use GC.KeepAlive?
(which also describes the purpose of GC.KeepAlive – in short, it’s a
way of referencing or “using” a variable, making sure that the
optimizer won’t optimize the usage away), the garbage collector might
decide to collect objects as soon as they are not usable by any
executing code anymore. This can very well happen in situations where
it would be valid to access a reference (at compile time), but no such
code has been written.
However, when compiling and executing code in Debug-mode, the compiler
prevents this from happening to ease debugging. As a result, the
correct implementation of our test method includes a preprocessor
directive:
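A minimal sketch of what such a test method could look like (illustrative names, assuming the usual WeakReference-based check; this is a reconstruction, not the original listing):

using System;

static class GcEligibilityTest
{
    static void TestMethod()
    {
        var a = new byte[500];                  // stand-in for the huge object
        var weakRef = new WeakReference(a);

        // ... last use of a happens above this point ...

#if DEBUG
        // In Debug builds the JIT keeps locals alive until the end of the method,
        // so clear the reference explicitly to make the test meaningful there too.
        a = null;
#endif

        GC.Collect();

        // Typically prints False in an optimized (Release) build, because a is
        // no longer reachable once its last use has passed.
        Console.WriteLine("Still alive: " + weakRef.IsAlive);
    }
}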
Another good read: When do I need to use GC.KeepAlive?
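Coming back to the original foo() example: if the goal in the meantime is simply to make a and b collectible as early as possible regardless of build configuration, one option (a sketch reusing the question's placeholder method names) is to clear each local after its last use:

byte[] foo()
{
    var a = MethodThatReturns500mbObject();
    var b = MethodThatReturns200mbObject(a);
    a = null;   // redundant in optimized builds, where the JIT already treats a as dead,
                // but it guarantees eligibility in Debug builds as well

    byte[] c = MethodThatReturns150mbByteArray(b);
    b = null;

    byte[] d = UnwiselyCopyThatHugeArray(c);
    return d;
}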
I suspect that the reason such a function (pinning an arbitrary, non-blittable object) doesn't exist is that implementing it is complex and that few people need it. To be safe, you'd want pinning to work transitively, i.e., you'd want the entire graph of reachable objects to be pinned. But it doesn't seem like something that fundamentally can't be done.
E.g., suppose you have the following class:
[StructLayout(LayoutKind.Sequential)]
class SomeObject
{
    public SomeObject r;
}
which you allocate like:
SomeObject o = new SomeObject();
and you try to pin it with:
GCHandle oh = GCHandle.Alloc(o, GCHandleType.Pinned);
you'll get the dreaded:
Object contains non-primitive or non-blittable data.
OK, fine, I can live with that. But suppose I had access to .NET's garbage collector implementation. What would be the obstacles? Here are the obstacles I see:
Circular references.
You want the garbage collector to limit itself to objects inside the app's heap(s).
It could take a long time.
It would be hard/painful to make the operation atomic.
It seems to me that the GC already has to deal with some of these issues. So what am I forgetting?
NOTE: Before you ask "What are you trying to accomplish?", etc., the purpose of my asking is for research code, not necessarily limited to C#, and not necessarily limited to the CLR. I understand that fiddling with the runtime's own memory is not a typical scenario. In any case, this isn't a purely speculative question.
NOTE 2: Also, I don't care about marshaling. I'm just concerned about pinning.
The GC just knows that whatever you are going to do next isn't going to work. You pin memory for a reason; surely it is to obtain a stable IntPtr to the object, which you then, say, pass to unmanaged code.
There is, however, a problem with the content of the pointed-to memory: it contains a pointer to a managed object. That pointer is going to randomly change whenever another thread allocates memory and triggers a collection, which will play havoc with any code that uses the pinned memory's content. There is no way to obtain a stable pointer; you can't "freeze" the collector. Pinning that inner pointer doesn't work either, it just passes the buck to the next pointed-to object. Hopefully it becomes null sooner or later (it would have to), but GCHandle.Alloc doesn't traverse the entire dependency graph to check that, and there's no decent upper bound on how long that could take. Getting an entire generation pinned is possible; that's a very hard deadlock.
Ugly problems; it was just much simpler to forbid it. Not much of a real problem, pinvoke happens every day anyway.
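For what it's worth, here is a small sketch of how pinning is normally used in practice: pin only blittable data (such as a byte[]) when you need a stable pointer, and use a Normal handle when you merely want to keep a non-blittable object alive without pinning it. Names here are illustrative:

using System;
using System.Runtime.InteropServices;

static class PinningSketch
{
    static void Demo()
    {
        // Blittable data can be pinned and exposed as a stable pointer.
        byte[] buffer = new byte[1024];
        GCHandle pinned = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr p = pinned.AddrOfPinnedObject();
            // ... pass p to unmanaged code ...
        }
        finally
        {
            pinned.Free();
        }

        // A non-blittable object cannot be pinned, but a Normal handle keeps
        // it alive; the GC is still free to move it around.
        object o = new object();
        GCHandle normal = GCHandle.Alloc(o, GCHandleType.Normal);
        // ... use o ...
        normal.Free();
    }
}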
I'm converting a C# project to C++ and have a question about deleting objects after use. In C# the GC of course takes care of deleting objects, but in C++ it has to be done explicitly using the delete keyword.
My question is, is it OK to just follow each object's usage throughout a method and then delete it as soon as it goes out of scope (i.e., method end or re-assignment)?
I know, though, that the GC waits for a certain amount of garbage (~1 MB) before collecting; does it do this because there is an overhead to using delete?
As this is a game I am creating, there will potentially be lots of objects being created and deleted every second, so would it be better to keep track of pointers that go out of scope, and once that size reaches 1 MB, delete them?
(as a side note: later when the game is optimised, objects will be loaded once at startup so there is not much to delete during gameplay)
Your problem is that you are using pointers in C++.
This is a fundamental problem that you must fix, then all your problems go away. As chance would have it, I got so fed up with this general trend that I created a set of presentation slides on this issue. – (CC BY, so feel free to use them).
Have a look at the slides. While they are certainly not entirely serious, the fundamental message is still true: Don’t use pointers. But more accurately, the message should read: Don’t use delete.
In your particular situation you might find yourself with a lot of small, short-lived objects. This is indeed a situation which a modern GC handles quite well, and which reference-counting smart pointers (shared_ptr) handle less efficiently. If (and only if!) this becomes a performance problem, consider switching to a small object allocator library.
You should be using RAII as much as possible in C++ so you do not have to explicitly delete anything.
Once you use RAII through smart pointers and your own resource-managing classes, every dynamic allocation you make will live only as long as there are references to it; you do not have to manage any resources explicitly.
Memory management in C# and C++ is completely different. You shouldn't try to mimic the behavior of .NET's GC in C++. In .NET allocating memory is super fast (basically moving a pointer) whereas freeing it is the heavy task. In C++ allocating memory isn't that lightweight for several reasons, mainly because a large enough chunk of memory has to be found. When memory chunks of different sizes are allocated and freed many times during the execution of the program the heap can get fragmented, containing many small "holes" of free memory. In .NET this won't happen because the GC will compact the heap. Freeing memory in C++ is quite fast, though.
Best practices in .NET don't necessarily work in C++. For example, pooling and reusing objects in .NET isn't recommended most of the time, because the objects get promoted to higher generations by the GC. The GC works best for short lived objects. On the other hand, pooling objects in C++ can be very useful to avoid heap fragmentation. Also, allocating a larger chunk of memory and using placement new can work great for many smaller objects that need to be allocated and freed frequently, as it can occur in games. Read up on general memory management techniques in C++ such as RAII or placement new.
Also, I'd recommend getting the books "Effective C++" and "More effective C++".
Well, the simplest solution might be to just use garbage collection in
C++. The Boehm collector works well, for example. Still, there are
pros and cons (but porting code originally written in C# would be a
likely candidate for a case where the pros largely outweigh the cons.)
Otherwise, if you convert the code to idiomatic C++, there shouldn't be
that many dynamically allocated objects to worry about. Unlike C#, C++
has value semantics by default, and most of your short lived objects
should be simply local variables, possibly copied if they are returned,
but not allocated dynamically. In C++, dynamic allocation is normally
only used for entity objects, whose lifetime depends on external events;
e.g. a Monster is created at some random time, with a probability
depending on the game state, and is deleted at some later time, in
reaction to events which change the game state. In this case, you
delete the object when the monster ceases to be part of the game. In
C#, you probably have a dispose function, or something similar, for
such objects, since they typically have concrete actions which must be
carried out when they cease to exist—things like deregistering as
an Observer, if that's one of the patterns you're using. In C++, this
sort of thing is typically handled by the destructor, and instead of
calling dispose, you delete the object.
Substituting a shared_ptr everywhere you use a reference in C# would get you the closest approximation, at probably the lowest effort, when converting the code.
However, you specifically mention following an object's use through a method and deleting it at the end. A better approach is not to new up the object at all, but simply to instantiate it inline/on the stack. In fact, with the new copy/move semantics being introduced, this becomes an efficient way to deal with returned objects as well, so in almost every scenario there is no need to use pointers.
There are a lot more things to take into considerations when deallocating objects than just calling delete whenever it goes out of scope. You have to make sure that you only call delete once and only call it once all pointers to that object have gone out of scope. The garbage collector in .NET handles all of that for you.
The construct in C++ that corresponds most closely is tr1::shared_ptr<>, which keeps a reference count to the object and deallocates it when the count drops to zero. A first approach to getting things running would be to turn all C# references into tr1::shared_ptr<> in the C++ code. Then you can go into the places that are performance bottlenecks (only after you've verified with a profiler that they actually are) and change to more efficient memory handling.
Garbage collection for C++ has been discussed a lot on SO.
Try reading through this: Garbage Collection in C++
I'm having issues with finalizers seemingly being called early in a C++/CLI (and C#) project I'm working on. This seems to be a very complex problem and I'm going to be mentioning a lot of different classes and types from the code. Fortunately it's open source, and you can follow along here: Pstsdk.Net (mercurial repository) I've also tried linking directly to the file browser where appropriate, so you can view the code as you read. Most of the code we deal with is in the pstsdk.mcpp folder of the repository.
The code right now is in a fairly hideous state (I'm working on that), and the current version of the code I'm working on is in the Finalization fixes (UNSTABLE!) branch. There are two changesets in that branch, and to understand my long-winded question, we'll need to deal with both. (changesets: ee6a002df36f and a12e9f5ea9fe)
For some background, this project is a C++/CLI wrapper of an unmanaged library written in C++. I am not the coordinator of the project, and there are several design decisions that I disagree with, as I'm sure many of you who look at the code will, but I digress. We wrap many of the layers of the original library in the C++/CLI dll, but expose the easy-to-use API in the C# dll. This is done because the intention of the project is to convert the entire library to managed C# code.
If you're able to get the code to compile, you can use this test code to reproduce the problem.
The problem
The latest changeset, entitled moved resource management code to finalizers, to show bug, shows the original problem I was having. Every class in this code uses the same pattern to free the unmanaged resources. Here is an example (C++/CLI):
DBContext::~DBContext()
{
    this->!DBContext();
    GC::SuppressFinalize(this);
}

DBContext::!DBContext()
{
    if(_pst.get() != nullptr)
        _pst.reset(); // _pst is a clr_scoped_ptr (managed type)
                      // that wraps a shared_ptr<T>.
}
This code has two benefits. First, when a class such as this is in a using statement, the resources are properly freed immediately. Secondly, if a dispose is forgotten by the user, when the GC finally decides to finalize the class, the unmanaged resources will be freed.
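For readers coming from C#, the C++/CLI destructor/finalizer pair above corresponds roughly to the familiar Dispose pattern; here is a sketch with illustrative names (not the project's actual code):

using System;

public class DbContextSketch : IDisposable
{
    public void Dispose()               // plays the role of ~DBContext()
    {
        ReleaseUnmanagedResources();
        GC.SuppressFinalize(this);
    }

    ~DbContextSketch()                  // plays the role of !DBContext()
    {
        ReleaseUnmanagedResources();
    }

    private void ReleaseUnmanagedResources()
    {
        // free the wrapped native resources here
    }
}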
The problem with this approach, which I simply cannot get my head around, is that occasionally the GC will decide to finalize some of the classes that are used to enumerate over data in the file. This happens with many different PST files, and I've been able to determine that it has something to do with the Finalize method being called, even though the class is still in use.
I can consistently get it to happen with this file (download)[1]. The finalizer that gets called early is in the NodeIdCollection class in the DBAccessor.cpp file. If you are able to run the code that was linked to above (this project can be difficult to set up because of the dependencies on the boost library), the application would fail with an exception, because the _nodes list is set to null and the _db_ pointer is reset as a result of the finalizer running.
1) Are there any glaring problems with the enumeration code in the NodeIdCollection class that would cause the GC to finalize this class while it's still in use?
I've only been able to get the code to run properly with the workaround I've described below.
An unsightly workaround
Now, I was able to work around this problem by moving all of the resource management code from each of the finalizers (!classname) to the destructors (~classname). This has solved the problem, though it hasn't satisfied my curiosity about why the classes are finalized early.
However, there is a problem with the approach, and I'll admit that it's more a problem with the design. Due to the heavy use of pointers in the code, nearly every class handles its own resources, and requires each class be disposed. This makes using the enumerations quite ugly (C#):
foreach (var msg in pst.Messages)
{
    // If this using statement were removed, we would have
    // memory leaks
    using (msg)
    {
        // code here
    }
}
The using statement acting on the item in the collection just screams wrong to me; however, with this approach it is necessary to prevent memory leaks. Without it, Dispose never gets called and the memory is never freed, even if the Dispose method on the pst class is called.
I have every intention of trying to change this design. The fundamental problem when this code was first being written, besides the fact that I knew little to nothing about C++/CLI, was that I couldn't put a native class inside of a managed one. I feel it might be possible to use scoped pointers that will free the memory automatically when the class is no longer in use, but I can't be sure if that's a valid way to go about this or if it would even work. So, my second question is:
2) What would be the best way to handle the unmanaged resources in the managed classes in a painless way?
To elaborate, could I replace a native pointer with the clr_scoped_ptr wrapper that was just recently added to the code (clr_scoped_ptr.h from this stackexchange question)? Or would I need to wrap the native pointer in something like a scoped_ptr<T> or smart_ptr<T>?
Thank you for reading all of this, I know it was a lot. I hope I've been clear enough that I might get some insight from people a little more experienced than I am. It's such a large question, I intend to add a bounty when it allows me to. Hopefully, someone can help.
Thanks!
[1] This file is part of the freely available Enron dataset of PST files
The clr_scoped_ptr is mine, and comes from here.
If it has any errors, please let me know.
Even if my code isn't perfect, using a smart pointer is the correct way to deal with this issue, even in managed code.
You do not need to (and should not) reset a clr_scoped_ptr in your finalizer. Each clr_scoped_ptr will itself be finalized by the runtime.
When using smart pointers, you do not need to write your own destructor or finalizer. The compiler-generated destructor will automatically call destructors on all subobjects, and every subobject finalizer will run when it is collected.
Looking closer at your code, there is indeed an error in NodeIdCollection. GetEnumerator() must return a different enumerator object each time it is called, so that each enumeration would begin at the start of the sequence. You're reusing a single enumerator, meaning that position is shared between successive calls to GetEnumerator(). That's bad.
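The principle, sketched in C# (the real NodeIdCollection is C++/CLI, and _nodeIds here is a hypothetical backing store): with an iterator method, every call to GetEnumerator() hands out a fresh enumerator with its own position.

using System.Collections;
using System.Collections.Generic;

public class NodeIdCollectionSketch : IEnumerable<uint>
{
    private readonly List<uint> _nodeIds = new List<uint>();

    public IEnumerator<uint> GetEnumerator()
    {
        // Each call produces a new enumerator object, so every foreach
        // starts at the beginning of the sequence with its own state.
        foreach (var id in _nodeIds)
            yield return id;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}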
Refreshing my memory of destructors/finalizers from some Microsoft documentation, I think you could at least simplify your code a little.
Here's my version of your sequence:
DBContext::~DBContext()
{
    this->!DBContext();
}

DBContext::!DBContext()
{
    delete _pst;
    _pst = NULL;
}
The "GC::SuppressFinalize" is done automatically by C++/CLI, so there is no need for that. Since the _pst variable is initialised in the constructor (and deleting a null pointer causes no problems anyway), I can't see any reason to complicate the code by using smart pointers.
On a debugging note, I wonder if you can help make the problem more apparent by sprinkling in a few calls to "GC::Collect". That should force finalization on dangling objects for you.
Hope this helps a little,
Possible Duplicate:
Garbage Collection: Is it necessary to set large objects to null in a Dispose method?
Does anyone know if explicitly de-referencing an object, e.g.
finalResults = null;
gives the garbage collector any more of a nudge to clean up? I have a rather large object (not huge, but big enough that I don't want it hanging around for too long after it's been used).
Would the above help, or is it pointless code? I am specifically avoiding programmatically talking to the GC itself; I just need to know if the above would act as any sort of prompt/hint to it.
Is finalResults = null; pointless?
Not enough information.
If finalResults is a local variable then it is pointless and potentially even harmful. You're just interfering with the optimizer.
If it is a class-member (property or field) it may be useful. Not very often but if you have a point in time where you can be very sure the value won't be used anymore then it won't hurt to set it to null.
As far as I know, not really.
The main rule the garbage collector uses to decide whether it needs to do a collection is whether there is enough space in the Gen-0 heap to allocate a new object when one is requested. If it can't allocate the object, it then performs a collection.
Collections are messy and noisy (because of heap compaction and the promotion of objects from Gen-0 to Gen-1 and Gen-1 to Gen-2), so it's best to leave the GC to worry about it.
The GC will finalize your object when it needs to, so don't worry about it sitting around.
If you're really concerned, then in a debug build try putting a call to
GC.Collect();
where you'd set the object to null, and see what effect it has. But really, my best advice is to not lose any sleep over it.
Don't spend any time setting variables to null. It does nothing to "nudge" the garbage collector.
The whole point of having a GC is that you don't need to worry about object lifetime.
Setting a variable to null is only of value if the C# compiler cannot work out by itself that it is not going to be used again.
In well-written, clear code there are very few cases where the C# compiler cannot track the last use of a local variable by itself.
Now, if finalResults were a field, it would be a different case.
The answers above are correct, but nobody seems to be making this distinction, so I will:
It depends on how finalResults was declared. If it's a local variable that was declared in a method, then there will be no effect at all; the object it was referencing would become eligible for garbage collection once the method returns in any case (and it will still be up to the garbage collector to figure out when it wants to clean up).
If, however, finalResults was a class field, or property, then it's a slightly different scenario (although the small "f" seems to suggest that it is not). In this case, the object it forms part of will hold a reference to the object referenced by finalResults, until it can be garbage collected itself (which happens when there's nothing holding a reference to it, in turn). In a situation like this, you may actually want to set it to null, to allow for the object to be eligible for GC earlier (assuming the referencing object is still going to be around for a significant amount of time).
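A small sketch of that distinction (LoadHugeArray and Use are hypothetical stand-ins):

class ResultsHolder
{
    private byte[] finalResults;            // field: clearing it can matter

    public void Process()
    {
        var temp = LoadHugeArray();         // local: clearing it is pointless in an
        Use(temp);                          // optimized build, the JIT already knows
                                            // when its last use has passed

        finalResults = LoadHugeArray();
        Use(finalResults);
        finalResults = null;                // lets the array be collected even while
                                            // this ResultsHolder instance lives on
    }

    private static byte[] LoadHugeArray() { return new byte[100 * 1024 * 1024]; }
    private static void Use(byte[] data) { /* ... */ }
}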
Check the generated IL; you may well find that the compiler knows the variable is never referenced again, so doesn't bother to generate any code for that line, in which case you know there won't be a difference in behaviour.
I need to dispose of an object so it can release everything it owns, but it doesn't implement IDisposable, so I can't use it in a using block. How can I make the garbage collector collect it?
You can force a collection with GC.Collect(). Be very careful using this, since a full collection can take some time. The best practice is to just let the GC determine when the best time to collect is.
Does the object contain unmanaged resources but does not implement IDisposable? If so, it's a bug.
If it doesn't, it shouldn't matter if it gets released right away, the garbage collector should do the right thing.
If it "owns" anything other than memory, you need to fix the object to use IDisposable. If it's not an object you control this is something worth picking a different vendor over, because it speaks to the core of how well your vendor really understands .Net.
If it does just own memory, even a lot of it, all you have to do is make sure the object goes out of scope. Don't call GC.Collect() — it's one of those things that if you have to ask, you shouldn't do it.
You can't perform garbage collection on a single object. You could request a garbage collection by calling GC.Collect(), but this will affect all objects subject to cleanup. It is also highly discouraged, as it can have a negative effect on the performance of later collections.
Also, calling Dispose on an object does not clean up its memory. It only allows the object to release its references to unmanaged resources. For example, calling Dispose on a StreamWriter closes the stream and releases the Windows file handle. The memory for the object on the managed heap does not get reclaimed until a subsequent garbage collection.
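A quick illustration of that distinction (the file name is arbitrary):

using System.IO;

static class DisposeVersusCollect
{
    static void WriteLog()
    {
        // Dispose releases the unmanaged file handle immediately; the managed
        // StreamWriter object itself is reclaimed later, whenever the GC runs.
        using (var writer = new StreamWriter("log.txt"))
        {
            writer.WriteLine("hello");
        }   // handle released here; memory reclaimed at some later collection
    }
}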
Chris Sells also discussed this on .NET Rocks. I think it was during his first appearance but the subject might have been revisited in later interviews.
http://www.dotnetrocks.com/default.aspx?showNum=10
This article by Francesco Balena is also a good reference:
When and How to Use Dispose and Finalize in C#
http://www.devx.com/dotnet/Article/33167/0/page/1
Garbage collection in .NET is non-deterministic, meaning you can't really control when it happens. You can suggest, but that doesn't mean it will listen.
Tell us a little bit more about the object and why you want to do this; we can make some suggestions based on that. Code always helps. And depending on the object, there might be a Close method or something similar, in which case the intended usage is to call that. If there is no Close or Dispose type of method, you probably don't want to rely on that object, as you will probably get memory leaks if it does in fact hold resources that need to be released.
If the object goes out of scope and has no external references, it will be collected rather quickly (likely on the next collection).
BEWARE of fragmentation: in many cases, GC.Collect() or IDisposable is not very helpful, especially for large objects (the LOH is for objects of roughly 80 KB and up, performs no compaction, and is subject to high levels of fragmentation for many common use cases), which can then lead to out-of-memory (OOM) issues even with potentially hundreds of MB free. As time marches on, things get bigger, though perhaps not this ~80 KB threshold for LOH-relegated objects, and high degrees of parallelism exacerbate the issue, simply due to more objects (likely of varying sizes) being instantiated and released in less time.
Arrays are the usual suspects for this problem (it's also often hard to identify, due to non-specific exceptions and assertions from the runtime; something like "high % of large object heap fragmentation" would be swell). The remedy for code suffering from this problem is to implement an aggressive re-use strategy.
A class in System.Collections.Concurrent.ObjectPool from the Parallel Extensions beta 1 samples helps (unfortunately there is no simple, ubiquitous pattern that I have seen, like maybe some attached property/extension methods?). It is simple enough to drop in or re-implement for most projects: you assign a generator Func<> and use Get/Put helper methods to re-use your previous objects and forgo the usual garbage collection. It is usually sufficient to focus on the arrays themselves and not the individual array elements.
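The Get/Put idea is small enough to sketch. This is not the actual Parallel Extensions sample class, just an illustrative re-implementation built on a generator Func<>:

using System;
using System.Collections.Concurrent;

public sealed class SimpleObjectPool<T>
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _generator;

    public SimpleObjectPool(Func<T> generator)
    {
        _generator = generator;
    }

    // Hand back a pooled instance if one is available, otherwise create one.
    public T Get()
    {
        T item;
        return _items.TryTake(out item) ? item : _generator();
    }

    // Return an instance to the pool for later re-use.
    public void Put(T item)
    {
        _items.Add(item);
    }
}

// Usage: re-use large buffers instead of repeatedly allocating them on the LOH.
//   var pool = new SimpleObjectPool<byte[]>(() => new byte[1 << 20]);
//   byte[] buffer = pool.Get();
//   ... work with buffer ...
//   pool.Put(buffer);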
It would be nice if .NET 4 updated all of the .ToArray() methods everywhere to include .ToArray(T target).
Getting the hang of using SOS/windbg (.loadby sos mscoreei for CLR v4) to analyze this class of issue can help. Thinking about it, the current garbage collection system is more like garbage recycling (using the same physical memory again); ObjectPool is analogous to garbage re-using. If anybody remembers the 3 R's, reducing your memory use is a good idea too, for performance's sake ;)