Are C# weak references in fact soft? - c#

The basic difference is that weak references are supposed to be claimed on each run of the GC (keep memory footprint low) while soft references ought to be kept in memory until the GC actually requires memory (they try to expand lifetime but may fail anytime, which is useful for e.g. caches especially of rather expensive objects).
To my knowledge, there is no clear statement as to how weak references influence the lifetime of an object in .NET. If they are true weak refs they should not influence it at all, but that would also render them pretty useless for their, I believe, main purpose of caching (am I wrong there?). On the other hand, if they act like soft refs, their name is a little misleading.
Personally, I imagine them to behave like soft references, but that is just an impression and not founded.
Implementation details apply, of course. I'm asking about the mentality associated with .NET's weak references - are they able to expand lifetime, or do they behave like true weak refs?
(Despite a number of related questions I could not find an answer to this specific issue yet.)

Are C# weak references in fact soft?
No.
am I wrong there?
You are wrong there. The purpose of weak references is absolutely not caching in the sense that you mean. That is a common misconception.
are they able to expand lifetime, or do they behave like true weak refs?
No, they do not expand lifetime.
Consider the following program (F# code):
do
let x = System.WeakReference(Array.create 0 0)
for i=1 to 10000000 do
ignore(Array.create 0 0)
if x.IsAlive then "alive" else "dead"
|> printfn "Weak reference is %s"
This heap allocates an empty array that is immediately eligible for garbage collection. Then we loop 10M times allocating more unreachable arrays. Note that this does not increase memory pressure at all so there is no motivation to collect the array referred to by the weak reference. Yet the program prints "Weak reference is dead" because it was collected nevertheless. This is the behaviour of a weak reference. A soft reference would have been retained until its memory was actually needed.
Here is another test program (F# code):
open System
let isAlive (x: WeakReference) = x.IsAlive
do
let mutable xs = []
while true do
xs <- WeakReference(Array.create 0 0)::List.filter isAlive xs
printfn "%d" xs.Length
This keeps filtering out dead weak references and prepending a fresh one onto the front of a linked list, printing out the length of the list each time. On my machine, this never exceeds 1,000 surviving weak references. It ramps up and then falls to zero in cycles, presumably because all of the weak references are collected at every gen0 collection. Again, this is the behaviour of a weak reference and not a soft reference.
Note that this behaviour (aggressive collection of weakly referenced objects at gen0 collections) is precisely what makes weak references a bad choice for caches. If you try to use weak references in your cache then you'll find your cache getting flushed a lot for no reason.

I have seen no information that indicates that they would increase the lifetime of the object they point to. And the articles I read about the algorithm the GC uses to determine reachability do not mention them in this way either. So I expect them to have no influence on the lifetime of the object.
Weak
This handle type is used to track an object, but allow it to be collected. When an object is collected, the contents of the GCHandle are zeroed. Weak references are zeroed before the finalizer runs, so even if the finalizer resurrects the object, the Weak reference is still zeroed.
WeakTrackResurrection
This handle type is similar to Weak, but the handle is not zeroed if the object is resurrected during finalization.
http://msdn.microsoft.com/en-us/library/83y4ak54.aspx
There are a few mechanism by which an object that's unreachable can survive a garbage collection.
The generation of the object is larger than the generation of the GC that happened. This is particularly interesting for large objects, which are allocated on the large-object-heap and are always considered Gen2 for this purpose.
Objects with a finalizer and all objects reachable from them survive the GC.
There might be a mechanism where former references from old objects can keep young objects alive, but I'm not sure about that.

Yes
Weak references do not extend the lifespan of an object, thus allowing it to be garbage collected once all strong references have gone out of scope. They can be useful for holding on to large objects that are expensive to initialize but should be avaialble for garabage collection if they are not actively in use.

Related

C# Clearing list before nulling

Today I have seen a piece of code that first seemed odd to me at first glance and made me reconsider. Here is a shortened version of the code:
if(list != null){
list.Clear();
list = null;
}
My thought was, why not replace it simply by:
list = null;
I read a bit and I understand that clearing a list will remove the reference to the objects allowing the GC to do it's thing but will not "resize". The allocated memory for this list stays the same.
On the other side, setting to null would also remove the reference to the list (and thus to its items) also allowing the GC to do it's thing.
So I have been trying to figure out a reason to do it the like the first block. One scenario I thought of is if you have two references to the list. The first block would clear the items in the list so even if the second reference remains, the GC can still clear the memory allocated for the items.
Nonetheless, I feel like there's something weird about this so I would like to know if the scenario I mentioned makes sense?
Also, are there any other scenarios where we would have to Clear() a list right before setting the reference to null?
Finally, if the scenario I mentioned made sense, wouldn't it be better off to just make sure we don't hold multiple references to this list at once and how would we do that (explicitly)?
Edit: I get the difference between Clearing and Nulling the list. I'm mostly curious to know if there is something inside the GC that would make it so that there would be a reason to Clear before Nulling.
The list.Clear() is not necessary in your scenario (where the List is private and only used within the class).
A great intro level link on reachability / live objects is http://levibotelho.com/development/how-does-the-garbage-collector-work :
How does the garbage collector identify garbage?
In Microsoft’s
implementation of the .NET framework the garbage collector determines
if an object is garbage by examining the reference type variables
pointing to it. In the context of the garbage collector, reference
type variables are known as “roots”. Examples of roots include:
A reference on the stack
A reference in a static variable
A reference in another object on the managed heap that is not eligible for garbage
collection
A reference in the form of a local variable in a method
The key bit in this context is A reference in another object on the managed heap that is not eligible for garbage collection. Thus, if the List is eligible to be collected (and the objects within the list aren't referenced elsewhere) then those objects in the List are also eligible to be collected.
In other words, the GC will realise that list and its contents are unreachable in the same pass.
So, is there an instance where list.Clear() would be useful? Yes. It might be useful if you have two references to a single List (e.g. as two fields in two different objects). One of those references may wish to clear the list in a way that the other reference is also impacted - in which list.Clear() is perfect.
This answer started as a comment for Mick, who claims that:
It depends on which version of .NET you are working with. On mobile platforms like Xamarin or mono, you may find that the garbage collector needs this kind of help in order to do its work.
That statement is begging to be fact checked. So, let us see...
.NET
.NET uses a generational mark and sweep garbage collector. You can see the abstract of the algorithm in What happens during a garbage collection
. For summary, it goes over the object graph, and if it cannot reach a object, that one can be erased.
Thus, the garbage collector will correctly identify the items of the list as collectible in the same iteration, regardless of whatever or not you clear the list. There is no need to decouple the objects beforehand.
This means that clearing the list does not help the garbage collector on the regular implementation of .NET.
Note: If there were another reference to the list, then the fact that you cleared the list would be visible.
Mono and Xamarin
Mono
As it turns out, the same is true for Mono.
Xamarin.Android
Also true for Xamarin.Android.
Xamarin.iOS
However, Xamarin.iOS requires additional considerations. In particular, MonoTouch will use wrapped Objective-C objects which are beyond the garbage collector. See Avoid strong circular references under iOS Performance. These objects require different semantics.
Xamarin.iOS will minimize the use of Objetive-C objects by keeping a cache:
C# NSObjects are also created on demand when you invoke a method or a property that returns an NSObject. At this point, the runtime will look into an object cache and determine whether a given Objective-C NSObject has already been surfaced to the managed world or not. If the object has been surfaced, the existing object will be returned, otherwise a constructor that takes an IntPtr as a parameter is invoked to construct the object.
The system keeps these objects alive even there are no references from managed code:
User-subclasses of NSObjects often contain C# state so whenever the Objective-C runtime performs a "retain" operation on one of these objects, the runtime creates a GCHandle that keeps the managed object alive, even if there are no C# visible references to the object. This simplifies bookeeping a lot, since the state will be preserved automatically for you.
Emphasis mine.
Thus, under Xamarin.iOS, if there were a chance that the list might contain wrapped Objetive-C objects, this code would help the garbage collector.
See the question How does memory management works on Xamarin.IOS, Miguel de Icaza explains in his answer that the semantics are to "retain" the object when you take a reference and "release" it when the reference is null.
On the Objetive-C side, "release" does not mean to destroy the object. Objetive-C uses a reference count garbage collector. When we "retain" the object the counter is incremented and when we "release" the counter is decreased. The system destroys the object when the counter reaches zero. See: About Memory Management.
Therefore, Objetive-C is bad at handling circular references (if A references B and B references A, their reference count is not zero, even if they cannot be reached), thus, you should avoid them in Xamarin.iOS. In fact, forgetting to decouple references will lead to leaks in Xamarin.iOS... See: Xamarin iOS memory leaks everywhere.
Others
dotGNU also uses a generational mark and sweep garbage collector.
I also had a look at CrossNet (that compiles IL to C++), it appears they attempted to implement it too. I do not know how good it is.
It depends on which version of .NET you are working with. On mobile platforms like Xamarin or mono, you may find that the garbage collector needs this kind of help in order to do its work. Whereas on desktop platforms the garbage collector implementation may be more elaborate. Each implementation of the CLI out there is going to have it's own implementation of the garbage collector and it is likely to behave differently from one implementation to another.
I can remember 10 years ago working on a Windows Mobile application which had memory issues and this sort of code was the solution. This was probably due to the mobile platform requiring a garbage collector that was more frugal with processing power than the desktop.
Decoupling objects helps simplify the analysis the garbage collector needs to do and helps avoid scenarios where the garbage collector fails to recognise a large graph of objects has actually become disconnected from all the threads in your application. Which results in memory leaks.
Anyone who believes you can't have memory leaks in .NET is an inexperienced .NET developer. On desktop platforms just ensuring Dispose is called on objects which implement them may be enough, however with other implementations you may find it is not.
List.Clear() will decouple the objects in the list from the list and each other.
EDIT: So to be clear I'm not claiming that any particular implementation currently out there is susceptible to memory leaks. And again depending on when this answer is read the robustness of the garbage collector on any implementation of the CLI currently out there could have changed since the time writing this.
Essentially I'm suggesting if you know that your code needs to be cross platform and used across many implementations of the .NET framework, especially implementations of the .NET framework for mobile devices, it could be worth investing time into decoupling objects when they are no longer required. In that case I'd start off by adding decoupling to classes that already implement Dispose, and then if needed look at implementing IDisposable on classes that don't implement IDisposable and ensuring Dispose is called on those classes.
How to tell for sure if it's needed? You need to instrument and monitor the memory usage of your application on each platform it is to be deployed on. Rather than writing lots of superfluous code, I think the best approach is to wait until your monitoring tools indicate you have memory leaks.
As mentioned in the docs:
List.Clear Method (): Count is set to 0, and references to other
objects from elements of the collection are also released.
In your 1st snippet:
if(list != null){
list.Clear();
list = null;
}
If you just set the list to null, it means that you release the reference of your list to the actual object in the memory (so the list itself is remain in the memory) and waiting for the Garbage Collector comes and release its allocated memory.
But the problem is that your list may contain elements that hold a reference to another objects, for example:
list → objectA, objectB, objectC
objectB → objectB1, objectB2
So, after setting the list to null, now list has no reference and it should be collected by Garbage Collector later, but objectB1 and objectB2 has a reference from objectB (still be in the memory) and because of that, Garbage Collector need to analyse the object reference chain. To make it less confusing, this snippet use .Clear() function to remove this confusion.
Clearing the list ensures that if the list is not garbage collected for some reason, then at the very least, the elements it contained can still be disposed of.
As stated in the comments, preventing other references to the list from existing requires careful planning, and clearing the list before nulling it doesn't incur a big enough performance hit to justify trying to avoid doing so.

How does garbage collector decodes that a reference type object is of no use? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
CLR garbage collector actively goes through all objects that have been created and works out if they are being used. But, how does garbage collector decide which object are to be killed and which are in use?
I understand the concept of assigning a null value to object will suffice. But, what if I write only
string obj = new string(new char[] {'a'});
and not null assignment lineobj = null;.
How will garbage collector determine when to clean it?
The CLR Garbage Collector is (at its core) a so-called tracing GC. (The other "big" class of garbage collectors are so-called reference-counting GCs.)
Tracing GCs work by, well recursively "tracing" the set of reachable objects from a set of objects that are already known to be reachable. Here's how that works:
Assume that we already have a set of objects that we know are reachable. For every object in that set, follow all the references (e.g. fields, and also internal pointers such as the class pointer etc.). Those objects are also reachable. Repeat until you have visited all objects at least once. Now you know all reachable objects. (We could say that we have computed the transitive closure with respect to reachability.) All objects that you haven't visited are not reachable and thus eligible for garbage collection.
Now, we just have to figure out how to start this algorithm, i.e. how do get the first set of objects that are known to be reachable. Well, every language usually has a set of objects that are known to be always reachable. This set from which we are starting our trace from, is called the root set. It includes things like:
globals
pointers in CPU registers
objects referenced by local variables on the stack
objects on the stack
Thread-Local Storage
unsafe memory
native memory
VM-internal data structures
the root namespace
…
That's it.
There are, of course, many variations of this theme. The most simple implementation of this tracing idea is called mark-sweep. It has two phases, mark and sweep (duh!) The mark phase is the tracing phase, you trace the reachable objects, and then you set a bit in the object header which says "yep, reachable". In the sweep phase, you collect all objects which don't have the bit set and reset the bit to false in the other objects.
A slight improvement of this scheme is to keep a separate marking table. For one, you don't have to write all over the entire RAM just to set those marking bits (which throws all data out of the cache, for example, and also triggers a copy-on-write if the memory is shared with another process). And secondly, you don't have to visit the reachable objects to reset the marking bit, you can just throw away the marking table after you're done.
The biggest dis-advantage of this scheme is that it leads to memory fragmentation. The biggest advantage is that objects don't move around in memory, which means for example that you can hand out pointers to objects without fear that those pointers may become invalid.
Another, very simple scheme, is Henry Baker's semi-space copying collector. It is called "semi-space" because it always uses at most 50% of the allocated memory. It is also a tracing collector, but it is a copying collector instead of mark-sweep. Instead of marking the objects when visiting them, it copies them over to the empty half of the memory. Afterwards, the old half can be simply freed in constant time.
The advantage is that everytime you copy the objects, they will be neatly tightly packed in memory without holes, so there is no fragmentation. But, they move around in memory, so you cannot just hand out pointers to those objects.
Note: the CLR's Garbage Collectors (it actually has two of them!) are much more complex and sophisticated than those two schemes I presented. They are, however, both tracing GCs.
The second big class of collectors are reference-counting collectors. Instead of tracing references only when a collection occurs, they count references, everytime a reference is created or destroyed. So, when you assign an object to a local variable, or a field, or pass it as an argument, …, the system increments a reference counter in the object header, and everytime you assign a different object to a local variable, or the local variable goes out of scope, or the object that the field belongs to gets GCd, …, the reference is decremented. If the reference count hits 0, there are no more references, and the object is eligible for garbage collection.
The big advantage of this scheme is that you always know exactly when an object becomes unreachable. The big disadvantage is that you can get disconnected cycles whose reference count(s) will never be 0. If you have a reference from A → B, from B → C, from C → A, and from D → B, then A's reference count is 1, B's reference count is 2, C's reference count is 1. If you now remove the reference from D, B's reference count drops to 1, and there is no reference from the rest of the system to either A or B or C, so they are all not reachable, but their reference count will never drop to 0, so they will never be collected.
A third big idea in GC is the Generational Hypothesis:
Objects die young
Older objects don't reference younger objects
As it turns out, for typical systems, this is true for almost all objects. Which means, it makes sense to treat objects differently depending on their age. A generational GC divides the objects into different generations, and has different garbage collection and memory allocation strategies for each. (Let's leave it at that.)
For more information about garbage collection in general, you should read The Garbage Collection Handbook – The art of automatic memory management by Richard Jones, Antony Hosking, Eliot Moss.

C# - Garbage Collection

Ok so I understand about the stack and the heap (values live on the Stack, references on the Heap).
When I declare a new instance of a Class, this lives on the heap, with a reference to this point in memory on the stack. I also know that C# does it's own Garbage Collection (ie. It determines when an instanciated class is no longer in use and reclaims the memory).
I have 2 questions:
Is my understanding of Garbage Collection correct?
Can I do my own? If so is there any real benefit to doing this myself or should I just leave it.
I ask because I have a method in a For loop. Every time I go through a loop, I create a new instance of my Class. In my head I visualise all of these classes lying around in a heap, not doing anything but taking up memory and I want to get rid of them as quickly as I can to keep things neat and tidy!
Am I understanding this correctly or am I missing something?
Ok so I understand about the stack and the heap (values live on the Stack, references on the Heap
I don't think you understand about the stack and the heap. If values live on the stack then where does an array of integers live? Integers are values. Are you telling me that an array of integers keeps its integers on the stack? When you return an array of integers from a method, say, with ten thousand integers in it, are you telling me that those ten thousand integers are copied onto the stack?
Values live on the stack when they live on the stack, and live on the heap when they live on the heap. The idea that the type of a thing has to do with the lifetime of its storage is nonsense. Storage locations that are short lived go on the stack; storage locations that are long lived go on the heap, and that is independent of their type. A long-lived int has to go on the heap, same as a long-lived instance of a class.
When I declare a new instance of a Class, this lives on the heap, with a reference to this point in memory on the stack.
Why does the reference have to go on the stack? Again, the lifetime of the storage of the reference has nothing to do with its type. If the storage of the reference is long-lived then the reference goes on the heap.
I also know that C# does it's own Garbage Collection (ie. It determines when an instanciated class is no longer in use and reclaims the memory).
The C# language does not do so; the CLR does so.
Is my understanding of Garbage Collection correct?
You seem to believe a lot of lies about the stack and the heap, so odds are good no, it's not.
Can I do my own?
Not in C#, no.
I ask because I have a method in a For loop. Every time I go through a loop, I create a new instance of my Class. In my head I visualise all of these classes lying around in a heap, not doing anything but taking up memory and I want to get rid of them as quickly as I can to keep things neat and tidy!
The whole point of garbage collection is to free you from worrying about tidying up. That's why its called "automatic garbage collection". It tidies for you.
If you are worried that your loops are creating collection pressure, and you wish to avoid collection pressure for performance reasons then I advise that you pursue a pooling strategy. It would be wise to start with an explicit pooling strategy; that is:
while(whatever)
{
Frob f = FrobPool.FetchFromPool();
f.Blah();
FrobPool.ReturnToPool(f);
}
rather than attempting to do automatic pooling using a resurrecting finalizer. I advise against both finalizers and object resurrection in general unless you are an expert on finalization semantics.
The pool of course allocates a new Frob if there is not one in the pool. If there is one in the pool, then it hands it out and removes it from the pool until it is put back in. (If you forget to put a Frob back in the pool, the GC will get to it eventually.) By pursuing a pooling strategy you cause the GC to eventually move all the Frobs to the generation 2 heap, instead of creating lots of collection pressure in the generation 0 heap. The collection pressure then disappears because no new Frobs are allocated. If something else is producing collection pressure, the Frobs are all safely in the gen 2 heap where they are rarely visited.
This of course is the exact opposite of the strategy you described; the whole point of the pooling strategy is to cause objects to hang around forever. Objects hanging around forever is a good thing if you're going to use them.
Of course, do not make these sorts of changes before you know via profiling that you have a performance problem due to collection pressure! It is rare to have such a problem on the desktop CLR; it is rather more common on the compact CLR.
More generally, if you are the kind of person who feels uncomfortable having a memory manager clean up for you on its schedule, then C# is not the right language for you. Consider C instead.
values live on the Stack, references on the Heap
This is an implementation detail. There is nothing to stop a .NET Framework from storing both on the stack.
I also know that C# does it's own Garbage Collection
C# has nothing to do with this. This is a service provided by the CLR. VB.NET, F#, etc all still have garbage collection.
The CLR will remove an object from memory if it has no strong roots. For example, when your class instance goes out of scope in your for loop. There will be a few lying around, but they will get collected eventually, either by garbage collection or the program terminating.
Can I do my own? If so is there any real benefit to doing this myself or should I just leave it?
You can use GC.Collect to force a collection. You should not do it because it is an expensive operation. More expensive than letting a few objects occupy memory a little bit longer than they are absolutely needed. The GC is incredibly good at what it does on its own. You will also force short lived objects to promote to generations they wouldn't get normally.
First off, to Erics seminal post about The truth about value types
Secondly on Garbage collection, the collector knows far more about your running program than you do, don't try to second guess it unless you're in the incredibly unlikely situation that you have a memory leak.
So to your second question, no don't try to "help" the GC.
I'll find a post to this effect on the CG and update this answer.
Can I do my own? If so is there any real benefit to doing this myself or should I just leave it.
Yes you can with GC.Collect but you shouldn't. The GC is optimized for variables that are short lived, ones in a method, and variables that are long lived, ones that generally stick around for the life time of the application.
Variables that are in-between aren't as common and aren't really optimum for the GC.
By forcing a GC.Collect you're more likely to cause variables in scope to be in forced into that in-between state which is the opposite from you are trying to accomplish.
Also from the MSDN article Writing High-Performance Managed Applications : A Primer
The GC is self-tuning and will adjust itself according to applications
memory requirements. In most cases programmatically invoking a GC will
hinder that tuning. "Helping" the GC by calling GC.Collect will more
than likely not improve your applications performance
Your understanding of Garbage Collection is good enough. Essentially, an unreferenced instance is deemed as being out-of-scope and no longer needed. Having determined this, the collector will remove an unreferenced object at some future point.
There's no way to force the Garbage Collector to collect just a specific instance. You can ask it to do its normal "collect everything possible" operation GC.Collect(), but you shouldn't.; the garbage-collector is efficient and effective if you just leave it to its own devices.
In particular it excels at collecting objects which have a short lifespan, just like those that are created as temporary objects. You shouldn't have to worry about creating loads of objects in a loop, unless they have a long lifespan that prevents immediate collection.
Please see this related question with regard to the Stack and Heap.
In your specific scenario, agreed, if you new up objects in a for-loop then you're going to have sub-optimal performance. Are the objects stored (or otherwise used) within the loop, or are they discarded? If the latter, can you optimize this by newing up one object outside the loop and re-using it?
With regard to can you implement your own GC, there is no explicit delete keyword in C#, you have to leave it to the CLR. You can however give it hints such as when to collect, or what to ignore during collection, however I'd leave that unless absolutely necessary.
Best regards,
Read the following article by Microsoft to get a level of knowledge about Garbage Collection in C#. I'm sure it'll help anyone who need information regarding this matter.
Memory Management and Garbage Collection in the .NET Framework
If you are interested in performance of some areas in your code when writing C#, you can write unsafe code. You will have a plus of performance, and also, in your fixed block, the garbage collector most likely will not occur.
Garbage collection is basically reference tracking. I can't think of any good reason why you would want to change it. Are you having some sort of problem where you find that memory isn't being freed? Or maybe you are looking for the dispose pattern
Edit:
Replaced "reference counting" with "reference tracking" to not be confused with the Increment/Decrement Counter on object Reference/Dereference (eg from Python).
I thought it was pretty common to refer to the object graph generation as "Counting" like in this answer:
Why no Reference Counting + Garbage Collection in C#?
But I will not pick up the glove of (the) Eric Lippert :)

Does WeakReference make a good cache?

i have a cache that uses WeakReferences to the cached objects to make them automatically removed from the cache in case of memory pressure. My problem is that the cached objects are collected very soon after they have been stored in the cache. The cache runs in a 64-Bit application and in spite of the case that more than 4gig of memory are still available, all the cached objects are collected (they usually are stored in the G2-heap at that moment). There are no garbage collection induced manually as the process explorer shows.
What methods can i apply to make the objects live a litte longer?
Using WeakReferences as the primary means of referencing cached objects is not really a great idea, because as Josh said, your at the mercy of any future behavioral changes to WeakReference and the GC.
However, if your cache needs any kind of resurrection capability, use of WeakReferences for items that are pending purge is useful. When an item meets eviction criteria, rather than immediately evicting it, you change its reference to a weak reference. If anything requests it before it is GC'ed, you restore its strong reference, and the object can live again. I have found this useful for some caches that have hard to predict hit rate patterns with frequent enough "resurrections" to be beneficial.
If you have predictable hit rate patterns, then I would forgoe the WeakReference option and perform explicit evictions.
There is one situation where a WeakReference-based cache may be good: when the usefulness of an item in the class is predicated upon the existence of a reference to it. In such a situation, a weak interning cache may be useful. For example, if one had an application which would deserialize many large immutable objects, many of which were expected to be duplicates, and would have to perform many comparisons between them. If X and Y are references to some immutable class type, testing X.Equals(Y) will be very fast if both variables point to the same instance, but may be very slow if they point to distinct instances that happen to be equal. If a deserialized object happens to match another object to which a reference already exists, fetching a from the dictionary a reference to that latter object (requiring one slow comparison) may expedite future comparisons. On the other hand, if it matched an item in the dictionary but the dictionary was the only reference to that item, there would be little advantage to using the dictionary object instead of simply keeping the object that was read in; probably not enough advantage to justify the cost of the comparison. For an interning cache, having WeakReferences get invalidated as soon as possible once no other references exist to an object would be a good thing.
In .net, a WeakReference is not considered a reference from the GC standpoint at all, so any object that only has weak references will be collected in the next GC run (for the appropriate generation).
That makes weak reference completely inappropriate for caching - as your experience shows.
You need a "real" cache component, and the most important thing about caching is to get one where the eviction policy (that is, the rules about when to drop an object from the cache) are a good match for you application's usage pattern.
No, WeakReference is not good for that because the behavior of the garbage collector can and will change over time and your cache should not be relying on today's behavior. Also many factors outside of your control could affect memory pressure.
There are many implementations of a cache for .NET. You could find probably a dozen on CodePlex. I guess what you need to add to it is something that looks at the application's current working set to use that as a trigger for purging.
One more note about why your objects are being collected so frequently. The GC is very aggressive at cleaning up Gen0 objects. If your objects are very short-lived (up until the only reference to it is a weak reference) then the GC is doing what it's designed to do by cleaning up as quickly as it can.
I believe the problem you are having is that the Garbage Collector removes weakly referenced objects in response not only in response to memory pressure - instead it will do collection quite aggressively sometimes just because the runtime system thinks some objects may likely have become unreachable.
You may be better off using e.g. System.Runtime.Caching.MemoryCache which can be configured with a memory limit, or custom eviction policies for the items.
The answer actually depends on usage characteristics of the cache you are trying to build. I have successfully used WeakReference based caching strategy for improving performance in many of my projects where the cached objects are expected to be used in short bursts of multiple reads. As others pointed out, the weak references are pretty much garbage from GC's point of view and will be collected whenever the next GC cycle is run. It's nothing to do with the memory utilization.
If, however, you need a cache that survives such brutality from GC, you need to use or mimic the functionality provided by System.Runtime.Caching namespace. Keep in mind that you'd need an additional thread that cleans up the cache when the memory usage is crossing your thresholds.
A bit late, but here's a relevant use case:
I need to cache two types of objects: large (deserialised) data files that take 10 minutes to load and cost 15G of ram each, and smaller (dynamically compiled) objects that contain internal references to those data files (the smaller objects are also cached because they take ~10s to generate). These caches are hidden within the factories that supply the objects (the former component having no knowledge of the latter), and have different eviction policies.
When my `data file' cache evicts an object, it replaces it by a weak reference, so if that object is still available when next requested, we can resurrect it (and renew its cache timeout). In this way we avoid losing (or accidentally duplicating) any object before it is truly defunct (i.e. not used anywhere else). Notice that neither cache is required to be aware of the other, and that no other client objects need to be aware that there are any caches at all (eg: we avoid needing 'keepalives', callbacks, registration, retrieve-and-return scopes, etc - things get a lot simpler).
So although using WeakReference by itself (instead of a cache) is a terrible idea (because modern GCs are typically tuned to the size of the L2 CPU cache, and regular code will burn through this many times per minute), it's very useful as a way to hide your caches from the rest of your code.

When should weak references be used?

I recently came across a piece of Java code with WeakReferences - I had never seen them deployed although I'd come across them when they were introduced. Is this something that should be routinely used or only when one runs into memory problems? If the latter, can they be easily retrofitted or does the code need serious refactoring? Can the average Java (or C#) programmer generally ignore them?
EDIT Can any damage be done by over-enthusiastic use of WRs?
Weak references are all about garbage collection. A standard object will not "disappear" until all references to it are severed, this means all the references your various objects have to it have to be removed before garbage collection will consider it garbage.
With a weak reference just because your object is referenced by other objects doesn't necessarily mean it's not garbage. It can still get picked up by GC and get removed from memory.
An example: If I have a bunch of Foo objects in my application I might want to use a Set to keep a central record of all the Foo's I have around. But, when other parts of my application remove a Foo object by deleting all references to it, I don't want the remaining reference my Set holds to that object to keep it from being garbage collected! Really I just want it to disappear from my set. This is where you'd use something like a Weak Set (Java has a WeakHashMap) instead, which uses weak references to its members instead of "strong" references.
If your objects aren't being garbage collected when you want them to then you've made an error in your book keeping, something's still holding a reference that you forgot to remove. Using weak references can ease the pain of such book keeping, since you don't have to worry about them keeping an object "alive" and un-garbage-collected, but you don't have to use them.
You use them whenever you want to have a reference to an object without keeping the object alive yourself. This is true for many caching-like features, but also play an important role in event handling, where a subscriber shall not be kept alive by being subscribed to an event.
A small example: A timer event which refreshes some data. Any number of objects can subscribe on the timer in order to get notified, but the bare fact that they subscribed on the timer should not keep them alive. So the timer should have weak references to the objects.
Can any damage be done by
over-enthusiastic use of WRs?
Yes it can.
One concern is that weak references make your code more complicated and potentially error prone. Any code that uses a weak reference needs to deal with the possibility that the reference has been broken each time it uses it. If you over-use weak references you end up writing lots of extra code. (You can mitigate this by hiding each weak reference behind a method that takes care of the checking, and re-creates the discarded object on demand. But this may not necessarily be as simple as that; e.g. if the re-creation process involves network access, you need to cope with the possibility of re-creation failure.)
A second concern is that there are runtime overheads with using weak references. The obvious costs are those of creating weak references and calling get on them. A less obvious cost is that significant extra work needs to be done each time the GC runs.
A final concern is that if you use a weak references for something that your application is highly likely to need in the future, you may incur the cost of repeatedly recreating it. If this cost is high (in terms of CPU time, IO bandwidth, network traffic, whatever) your application may perform badly as a result. You may be better off giving the JVM more memory and not using weak references at all.
Off course, this does not mean you should avoid using weak references entirely. Just that you need to think carefully. And probably you should first run a memory profiler on your application to figure out where your memory usage problems stem from.
A good question to ask when considering use of a WeakReference is how one would feel if the weak reference were invalidated the instant no strong references existed to the object. If that would make the WeakReference less useful, then a WeakReference is probably not the best thing to use. If that would be an improvement over the non-deterministic invalidation that comes from garbage-collection, then a WeakReference is probably the right choice.

Categories