When an array is subject to Garbage Collection?

When an array is subject to Garbage Collection? - c#

A few years ago I read the book the CLR via C# and the other day I got asked whether an array and still got a bit puzzled, the question was to figure out when the array in the method below is available to garbage collection:
public static double ThingRatio()
{
var input = new [] { 1, 1, 2 ,3, 5 ,8 };
var count = input.Length;
// Let's suppose that the line below is the last use of the array input
var thingCount = CountThings(input);
input = null;
return (double)thingCount / count;
}
According to the answer given here: When is an object subject to garbage collection? which states:
They will both become eligible for collection as soon as they are not
needed anymore. This means that under some circumstances, objects can
be collected even before the end of the scope in which they were
defined. On the other hand, the actual collection might also happen
much later.
I would tend to say that starting after line 6 (i.e. input = null;) the array becomes subject to GC but I am not that sure... (I mean the array is supposedly surely no longer needed after the assignment, but also struggling that it's after the CountThings call but at the same time the array is "needed" for the null assignment).

Remember objects and variables are not the same thing. A variable has a scope to particular method or type, but the object it refers to or used to refer to has no such concept; it's just a blob of memory. If the GC runs after input = null; but before the end of the method, the array is just one more orphaned object. It's not reachable, and therefore eligible for collection.
And "reachable" (rather then "needed" ) is the key word here. The array object is no longer needed after this line: var thingCount = CountThings(input);. However, it's still reachable, and so could not be collected at that point...
We also need to remember it isn't collected right away. It's only eligible to be collected. As a practical matter, I've found the .Net runtime doesn't tend to invoke the GC in the middle of a user method unless it really has to. Generally speaking, it is not needed or helpful to set a variable to null early, and in some rare cases can even be harmful.
Finally, I'll add that the code we read and write is not the same code actually used by the machine. Remember, there is also a compile step to translate all this to IL, and later a JIT process to create the final machine code that really runs. Even concept of one line following next is already an abstraction away from what actually happens. One line may expand to be several lines of actual IL, or in some cases even be re-written to involve all new compiler-generated types as with closures or iterator blocks. So everything here is really only referring to the simple case.

GC Myth: setting an object's reference to null will force the GC to collect it right away.
GC Truth: setting an object's reference to null will sometimes allow the GC to collect it sooner.
Taking part of the blogpost I'm referencing below and applying it to your question, the answer is as follows:
The JIT is usually smart enough to realize that input = null can be optimized away. That leaves CountThings(input) as the last reference to the object. So after that call, the input is no longer used and is removed as a GC Root. That leaves the Object in memory orphaned (no references pointing to it), making it eligible for collection. When the GC actually goes about collecting it, is another matter.
More information to be found at To Null or Not to Null

No object can be garbage-collected while it is recognized as existing. An object will exist in .NET for as along as any reference to it exists or it has a registered finalizer, and will cease to exist once neither condition applies. References in objects will exist as long as the objects themselves exist, and references in automatic variables will exist as long as there is any means via which they will be observed. If the garbage collector detects that the only references to an object with no registered finalizer are held in weak references, those references will be destroyed, causing the object to cease to exist. If the garbage collector detects that the only references to an object with a registered finalizer are held in weak references, any weak references whose "track resurrection" property is false, a reference to the object will be placed in a strongly-rooted list of objects needing "immediate" finalization, and the finalizer will be unregistered (thus allowing it to cease to exist if and when the finalizer reaches a point in execution where no reference to the object could ever be observed).
Note that some sources confuse the triggering of an object's finalizer with garbage-collection, but an object whose finalizer is triggered is guaranteed to continue to exist for at least as long as that finalizer takes to execute, and may continue to exist indefinitely if any references to it exist when the finalizer finishes execution.
In your example, there are three scenarios that could apply, depending upon what CountThings does with the passed-in reference:
If CountThings does not store a copy of the reference anywhere, or any copies of references that it does store get overwritten before input gets overwritten, then it will cease to exist as soon as input gets overwritten or ceases to exist [automatic-duration variables may cease to exist any time a compiler determines that their value will no longer be observed].
If CountThings stores a copy of the reference somewhere that continues to exist after it returns, and the last extant reference is held by something other than a weak reference, then the object will cease to exist as soon as the last reference is destroyed.
If the last existing reference the array ends up being held in a weak reference, the array will continue to exist until the first GC cycle where that is the case, whereupon the weak reference will be cleared, causing the array to cease to exist. Note that the lack of non-weak references to the array will only be relevant when a GC cycle occurs. It is possible (and not particularly uncommon) for a program to store a copy of a reference into a WeakReference, ConditionalWeakTable, or other object holding some form of weak reference, destroy all other copies, and then read out the weak reference to produce a non-weak copy of the reference before the next GC cycle. If that occurs, the system will neither know nor care that there was a time when non non-weak copies of the reference existed. If the GC cycle occurs before the reference gets read out, however, then code which later examines the weak reference will find it blank.
A key observation is that while finalizers and weak references complicate things slightly, the only way in which the GC destroys objects is by invalidating weak forms of references. As far as the GC is concerned, the only kinds of storage that exist when the system isn't actually performing a GC cycle are those used by objects that exist, those used for .NET's internal purposes, and regions of storage that are available to satisfy future allocations. If an object is created, the storage it occupied will cease to be a region of storage available for future allocations. If the object later ceases to exist, the storage that had contained the object will also cease to exist in any form the GC knows about until the next GC cycle. The next GC cycle won't destroy the object (which had already ceased to exist), but will instead add the storage which had contained it back to its list of areas that are available to add future allocations (causing that storage to exist again).

Related

How does garbage collector decodes that a reference type object is of no use? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
CLR garbage collector actively goes through all objects that have been created and works out if they are being used. But, how does garbage collector decide which object are to be killed and which are in use?
I understand the concept of assigning a null value to object will suffice. But, what if I write only
string obj = new string(new char[] {'a'});
and not null assignment lineobj = null;.
How will garbage collector determine when to clean it?

The CLR Garbage Collector is (at its core) a so-called tracing GC. (The other "big" class of garbage collectors are so-called reference-counting GCs.)
Tracing GCs work by, well recursively "tracing" the set of reachable objects from a set of objects that are already known to be reachable. Here's how that works:
Assume that we already have a set of objects that we know are reachable. For every object in that set, follow all the references (e.g. fields, and also internal pointers such as the class pointer etc.). Those objects are also reachable. Repeat until you have visited all objects at least once. Now you know all reachable objects. (We could say that we have computed the transitive closure with respect to reachability.) All objects that you haven't visited are not reachable and thus eligible for garbage collection.
Now, we just have to figure out how to start this algorithm, i.e. how do get the first set of objects that are known to be reachable. Well, every language usually has a set of objects that are known to be always reachable. This set from which we are starting our trace from, is called the root set. It includes things like:
globals
pointers in CPU registers
objects referenced by local variables on the stack
objects on the stack
Thread-Local Storage
unsafe memory
native memory
VM-internal data structures
the root namespace
…
That's it.
There are, of course, many variations of this theme. The most simple implementation of this tracing idea is called mark-sweep. It has two phases, mark and sweep (duh!) The mark phase is the tracing phase, you trace the reachable objects, and then you set a bit in the object header which says "yep, reachable". In the sweep phase, you collect all objects which don't have the bit set and reset the bit to false in the other objects.
A slight improvement of this scheme is to keep a separate marking table. For one, you don't have to write all over the entire RAM just to set those marking bits (which throws all data out of the cache, for example, and also triggers a copy-on-write if the memory is shared with another process). And secondly, you don't have to visit the reachable objects to reset the marking bit, you can just throw away the marking table after you're done.
The biggest dis-advantage of this scheme is that it leads to memory fragmentation. The biggest advantage is that objects don't move around in memory, which means for example that you can hand out pointers to objects without fear that those pointers may become invalid.
Another, very simple scheme, is Henry Baker's semi-space copying collector. It is called "semi-space" because it always uses at most 50% of the allocated memory. It is also a tracing collector, but it is a copying collector instead of mark-sweep. Instead of marking the objects when visiting them, it copies them over to the empty half of the memory. Afterwards, the old half can be simply freed in constant time.
The advantage is that everytime you copy the objects, they will be neatly tightly packed in memory without holes, so there is no fragmentation. But, they move around in memory, so you cannot just hand out pointers to those objects.
Note: the CLR's Garbage Collectors (it actually has two of them!) are much more complex and sophisticated than those two schemes I presented. They are, however, both tracing GCs.
The second big class of collectors are reference-counting collectors. Instead of tracing references only when a collection occurs, they count references, everytime a reference is created or destroyed. So, when you assign an object to a local variable, or a field, or pass it as an argument, …, the system increments a reference counter in the object header, and everytime you assign a different object to a local variable, or the local variable goes out of scope, or the object that the field belongs to gets GCd, …, the reference is decremented. If the reference count hits 0, there are no more references, and the object is eligible for garbage collection.
The big advantage of this scheme is that you always know exactly when an object becomes unreachable. The big disadvantage is that you can get disconnected cycles whose reference count(s) will never be 0. If you have a reference from A → B, from B → C, from C → A, and from D → B, then A's reference count is 1, B's reference count is 2, C's reference count is 1. If you now remove the reference from D, B's reference count drops to 1, and there is no reference from the rest of the system to either A or B or C, so they are all not reachable, but their reference count will never drop to 0, so they will never be collected.
A third big idea in GC is the Generational Hypothesis:
Objects die young
Older objects don't reference younger objects
As it turns out, for typical systems, this is true for almost all objects. Which means, it makes sense to treat objects differently depending on their age. A generational GC divides the objects into different generations, and has different garbage collection and memory allocation strategies for each. (Let's leave it at that.)
For more information about garbage collection in general, you should read The Garbage Collection Handbook – The art of automatic memory management by Richard Jones, Antony Hosking, Eliot Moss.

Scope of Garbage Collector [duplicate]

Consider the below code:
public class Class1
{
public static int c;
~Class1()
{
c++;
}
}
public class Class2
{
public static void Main()
{
{
var c1=new Class1();
//c1=null; // If this line is not commented out, at the Console.WriteLine call, it prints 1.
}
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(Class1.c); // prints 0
Console.Read();
}
}
Now, even though the variable c1 in the main method is out of scope and not referenced further by any other object when GC.Collect() is called, why is it not finalized there?

You are being tripped up here and drawing very wrong conclusions because you are using a debugger. You'll need to run your code the way it runs on your user's machine. Switch to the Release build first with Build + Configuration manager, change the "Active solution configuration" combo in the upper left corner to "Release". Next, go into Tools + Options, Debugging, General and untick the "Suppress JIT optimization" option.
Now run your program again and tinker with the source code. Note how the extra braces have no effect at all. And note how setting the variable to null makes no difference at all. It will always print "1". It now works the way you hope and expected it would work.
Which does leave with the task of explaining why it works so differently when you run the Debug build. That requires explaining how the garbage collector discovers local variables and how that's affected by having a debugger present.
First off, the jitter performs two important duties when it compiles the IL for a method into machine code. The first one is very visible in the debugger, you can see the machine code with the Debug + Windows + Disassembly window. The second duty is however completely invisible. It also generates a table that describes how the local variables inside the method body are used. That table has an entry for each method argument and local variable with two addresses. The address where the variable will first store an object reference. And the address of the machine code instruction where that variable is no longer used. Also whether that variable is stored on the stack frame or a cpu register.
This table is essential to the garbage collector, it needs to know where to look for object references when it performs a collection. Pretty easy to do when the reference is part of an object on the GC heap. Definitely not easy to do when the object reference is stored in a CPU register. The table says where to look.
The "no longer used" address in the table is very important. It makes the garbage collector very efficient. It can collect an object reference, even if it is used inside a method and that method hasn't finished executing yet. Which is very common, your Main() method for example will only ever stop executing just before your program terminates. Clearly you would not want any object references used inside that Main() method to live for the duration of the program, that would amount to a leak. The jitter can use the table to discover that such a local variable is no longer useful, depending on how far the program has progressed inside that Main() method before it made a call.
An almost magic method that is related to that table is GC.KeepAlive(). It is a very special method, it doesn't generate any code at all. Its only duty is to modify that table. It extends the lifetime of the local variable, preventing the reference it stores from getting garbage collected. The only time you need to use it is to stop the GC from being to over-eager with collecting a reference, that can happen in interop scenarios where a reference is passed to unmanaged code. The garbage collector cannot see such references being used by such code since it wasn't compiled by the jitter so doesn't have the table that says where to look for the reference. Passing a delegate object to an unmanaged function like EnumWindows() is the boilerplate example of when you need to use GC.KeepAlive().
So, as you can tell from your sample snippet after running it in the Release build, local variables can get collected early, before the method finished executing. Even more powerfully, an object can get collected while one of its methods runs if that method no longer refers to this. There is a problem with that, it is very awkward to debug such a method. Since you may well put the variable in the Watch window or inspect it. And it would disappear while you are debugging if a GC occurs. That would be very unpleasant, so the jitter is aware of there being a debugger attached. It then modifies the table and alters the "last used" address. And changes it from its normal value to the address of the last instruction in the method. Which keeps the variable alive as long as the method hasn't returned. Which allows you to keep watching it until the method returns.
This now also explains what you saw earlier and why you asked the question. It prints "0" because the GC.Collect call cannot collect the reference. The table says that the variable is in use past the GC.Collect() call, all the way up to the end of the method. Forced to say so by having the debugger attached and by running the Debug build.
Setting the variable to null does have an effect now because the GC will inspect the variable and will no longer see a reference. But make sure you don't fall in the trap that many C# programmers have fallen into, actually writing that code was pointless. It makes no difference whatsoever whether or not that statement is present when you run the code in the Release build. In fact, the jitter optimizer will remove that statement since it has no effect whatsoever. So be sure to not write code like that, even though it seemed to have an effect.
One final note about this topic, this is what gets programmers in trouble that write small programs to do something with an Office app. The debugger usually gets them on the Wrong Path, they want the Office program to exit on demand. The appropriate way to do that is by calling GC.Collect(). But they'll discover that it doesn't work when they debug their app, leading them into never-never land by calling Marshal.ReleaseComObject(). Manual memory management, it rarely works properly because they'll easily overlook an invisible interface reference. GC.Collect() actually works, just not when you debug the app.

[ Just wanted to add further on the Internals of Finalization process ]
You create an object and when the object is garbage collected, the object's Finalize method should be called. But there is more to finalization than this very simple assumption.
CONCEPTS:
Objects not implementing Finalize methods: their memory is reclaimed immediately, unless of course, they are not reachable by application code any more.
Objects implementing Finalize method: the concepts of Application Roots, Finalization Queue, Freachable Queue need to be understood since they are involved in the reclamation process.
Any object is considered garbage if it is not reachable by application code.
Assume: classes/objects A, B, D, G, H do not implement the Finalize method and C, E, F, I, J do implement the Finalize method.
When an application creates a new object, the new operator allocates memory from the heap. If the object's type contains a Finalize method, then a pointer to the object is placed on the finalization queue. Therefore pointers to objects C, E, F, I, J get added to the finalization queue.
The finalization queue is an internal data structure controlled by the garbage collector. Each entry in the queue points to an object that should have its Finalize method called before the object's memory can be reclaimed.
The figure below shows a heap containing several objects. Some of these objects are reachable from the application roots, and some are not. When objects C, E, F, I, and J are created, the .NET framework detects that these objects have Finalize methods and pointers to these objects are added to the finalization queue.
When a GC occurs (1st Collection), objects B, E, G, H, I, and J are determined to be garbage. A,C,D,F are still reachable by application code depicted as arrows from the yellow box above.
The garbage collector scans the finalization queue looking for pointers to these objects. When a pointer is found, the pointer is removed from the finalization queue and appended to the freachable queue ("F-reachable", i.e. finalizer reachable). The freachable queue is another internal data structure controlled by the garbage collector. Each pointer in the freachable queue identifies an object that is ready to have its Finalize method called.
After the 1st GC, the managed heap looks something similar to figure below. Explanation given below:
The memory occupied by objects B, G, and H has been reclaimed immediately because these objects did not have a finalize method that needed to be called.
However, the memory occupied by objects E, I, and J could not be reclaimed because their Finalize method has not been called yet. Calling the Finalize method is done by freachable queue.
A, C, D, F are still reachable by application code depicted as arrows from yellow box above, so they will not be collected in any case.
There is a special runtime thread dedicated to calling Finalize methods. When the freachable queue is empty (which is usually the case), this thread sleeps. But when entries appear, this thread wakes, removes each entry from the queue, and calls each object's Finalize method. The garbage collector compacts the reclaimable memory and the special runtime thread empties the freachable queue, executing each object's Finalize method. So here finally is when your Finalize method gets executed.
The next time the garbage collector is invoked (2nd GC), it sees that the finalized objects are truly garbage, since the application's roots don't point to it and the freachable queue no longer points to it (it's EMPTY too), therefore the memory for the objects E, I, J may be reclaimed from the heap. See figure below and compare it with figure just above.
The important thing to understand here is that two GCs are required to reclaim memory used by objects that require finalization. In reality, more than two collections cab be even required since these objects may get promoted to an older generation.
NOTE: The freachable queue is considered to be a root just like global and static variables are roots. Therefore, if an object is on the freachable queue, then the object is reachable and is not garbage.
As a last note, remember that debugging application is one thing, garbage collection is another thing and works differently. So far you can't feel garbage collection just by debugging applications. If you wish to further investigate memory get started here.

Garbage Collection with For Loop in .Net 4.5/4.5.1 [duplicate]

Consider the below code:
public class Class1
{
public static int c;
~Class1()
{
c++;
}
}
public class Class2
{
public static void Main()
{
{
var c1=new Class1();
//c1=null; // If this line is not commented out, at the Console.WriteLine call, it prints 1.
}
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(Class1.c); // prints 0
Console.Read();
}
}
Now, even though the variable c1 in the main method is out of scope and not referenced further by any other object when GC.Collect() is called, why is it not finalized there?

Are C# weak references in fact soft?

The basic difference is that weak references are supposed to be claimed on each run of the GC (keep memory footprint low) while soft references ought to be kept in memory until the GC actually requires memory (they try to expand lifetime but may fail anytime, which is useful for e.g. caches especially of rather expensive objects).
To my knowledge, there is no clear statement as to how weak references influence the lifetime of an object in .NET. If they are true weak refs they should not influence it at all, but that would also render them pretty useless for their, I believe, main purpose of caching (am I wrong there?). On the other hand, if they act like soft refs, their name is a little misleading.
Personally, I imagine them to behave like soft references, but that is just an impression and not founded.
Implementation details apply, of course. I'm asking about the mentality associated with .NET's weak references - are they able to expand lifetime, or do they behave like true weak refs?
(Despite a number of related questions I could not find an answer to this specific issue yet.)

Are C# weak references in fact soft?
No.
am I wrong there?
You are wrong there. The purpose of weak references is absolutely not caching in the sense that you mean. That is a common misconception.
are they able to expand lifetime, or do they behave like true weak refs?
No, they do not expand lifetime.
Consider the following program (F# code):
do
let x = System.WeakReference(Array.create 0 0)
for i=1 to 10000000 do
ignore(Array.create 0 0)
if x.IsAlive then "alive" else "dead"
|> printfn "Weak reference is %s"
This heap allocates an empty array that is immediately eligible for garbage collection. Then we loop 10M times allocating more unreachable arrays. Note that this does not increase memory pressure at all so there is no motivation to collect the array referred to by the weak reference. Yet the program prints "Weak reference is dead" because it was collected nevertheless. This is the behaviour of a weak reference. A soft reference would have been retained until its memory was actually needed.
Here is another test program (F# code):
open System
let isAlive (x: WeakReference) = x.IsAlive
do
let mutable xs = []
while true do
xs <- WeakReference(Array.create 0 0)::List.filter isAlive xs
printfn "%d" xs.Length
This keeps filtering out dead weak references and prepending a fresh one onto the front of a linked list, printing out the length of the list each time. On my machine, this never exceeds 1,000 surviving weak references. It ramps up and then falls to zero in cycles, presumably because all of the weak references are collected at every gen0 collection. Again, this is the behaviour of a weak reference and not a soft reference.
Note that this behaviour (aggressive collection of weakly referenced objects at gen0 collections) is precisely what makes weak references a bad choice for caches. If you try to use weak references in your cache then you'll find your cache getting flushed a lot for no reason.

I have seen no information that indicates that they would increase the lifetime of the object they point to. And the articles I read about the algorithm the GC uses to determine reachability do not mention them in this way either. So I expect them to have no influence on the lifetime of the object.
Weak
This handle type is used to track an object, but allow it to be collected. When an object is collected, the contents of the GCHandle are zeroed. Weak references are zeroed before the finalizer runs, so even if the finalizer resurrects the object, the Weak reference is still zeroed.
WeakTrackResurrection
This handle type is similar to Weak, but the handle is not zeroed if the object is resurrected during finalization.
http://msdn.microsoft.com/en-us/library/83y4ak54.aspx
There are a few mechanism by which an object that's unreachable can survive a garbage collection.
The generation of the object is larger than the generation of the GC that happened. This is particularly interesting for large objects, which are allocated on the large-object-heap and are always considered Gen2 for this purpose.
Objects with a finalizer and all objects reachable from them survive the GC.
There might be a mechanism where former references from old objects can keep young objects alive, but I'm not sure about that.

Yes
Weak references do not extend the lifespan of an object, thus allowing it to be garbage collected once all strong references have gone out of scope. They can be useful for holding on to large objects that are expensive to initialize but should be avaialble for garabage collection if they are not actively in use.

object lifecycle in List<T>.add(T _element) in C#

I know the add function actually only adds a reference of object _element. My question is, if my list is a global one, by I use add function in a function, so the _element is local as well. Is it true that even after I exit from thst function, the _element that was declared is still on the heap until say when the global List is dead?
Thanks.

Yes, the object pointed too by the reference will now stay alive. More generally, objects almost always outlive the code that creates them, since regular GC is non-deterministic.

This is partially true. The element will stay on the heap as long as their are references (except for weak references) towards it. If the list is "dead" as you state it or the element is removed. The element will be removed from the heap, but not right away.
Even when there are no references refering to the element anymore, we still have to wait for a garbage collection to clean it up. There is a chance that the element moved to a higher generation within the Garbage collector, and therefore even a first level garbage collection wont remove the element from he heap.
Garbage Collection (in .NET) is heavy stuff, i suggest you read http://msdn.microsoft.com/en-us/magazine/bb985010.aspx for more information

Yes, all the while objects are held in that list, memory they hold will not be reclaimed by GC. If you wish to have GC collect objects in static or long living collection, you may be interested to look at WeakReference.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.