Does WeakReference make a good cache? - c#

i have a cache that uses WeakReferences to the cached objects to make them automatically removed from the cache in case of memory pressure. My problem is that the cached objects are collected very soon after they have been stored in the cache. The cache runs in a 64-Bit application and in spite of the case that more than 4gig of memory are still available, all the cached objects are collected (they usually are stored in the G2-heap at that moment). There are no garbage collection induced manually as the process explorer shows.
What methods can i apply to make the objects live a litte longer?

Using WeakReferences as the primary means of referencing cached objects is not really a great idea, because as Josh said, your at the mercy of any future behavioral changes to WeakReference and the GC.
However, if your cache needs any kind of resurrection capability, use of WeakReferences for items that are pending purge is useful. When an item meets eviction criteria, rather than immediately evicting it, you change its reference to a weak reference. If anything requests it before it is GC'ed, you restore its strong reference, and the object can live again. I have found this useful for some caches that have hard to predict hit rate patterns with frequent enough "resurrections" to be beneficial.
If you have predictable hit rate patterns, then I would forgoe the WeakReference option and perform explicit evictions.

There is one situation where a WeakReference-based cache may be good: when the usefulness of an item in the class is predicated upon the existence of a reference to it. In such a situation, a weak interning cache may be useful. For example, if one had an application which would deserialize many large immutable objects, many of which were expected to be duplicates, and would have to perform many comparisons between them. If X and Y are references to some immutable class type, testing X.Equals(Y) will be very fast if both variables point to the same instance, but may be very slow if they point to distinct instances that happen to be equal. If a deserialized object happens to match another object to which a reference already exists, fetching a from the dictionary a reference to that latter object (requiring one slow comparison) may expedite future comparisons. On the other hand, if it matched an item in the dictionary but the dictionary was the only reference to that item, there would be little advantage to using the dictionary object instead of simply keeping the object that was read in; probably not enough advantage to justify the cost of the comparison. For an interning cache, having WeakReferences get invalidated as soon as possible once no other references exist to an object would be a good thing.

In .net, a WeakReference is not considered a reference from the GC standpoint at all, so any object that only has weak references will be collected in the next GC run (for the appropriate generation).
That makes weak reference completely inappropriate for caching - as your experience shows.
You need a "real" cache component, and the most important thing about caching is to get one where the eviction policy (that is, the rules about when to drop an object from the cache) are a good match for you application's usage pattern.

No, WeakReference is not good for that because the behavior of the garbage collector can and will change over time and your cache should not be relying on today's behavior. Also many factors outside of your control could affect memory pressure.
There are many implementations of a cache for .NET. You could find probably a dozen on CodePlex. I guess what you need to add to it is something that looks at the application's current working set to use that as a trigger for purging.
One more note about why your objects are being collected so frequently. The GC is very aggressive at cleaning up Gen0 objects. If your objects are very short-lived (up until the only reference to it is a weak reference) then the GC is doing what it's designed to do by cleaning up as quickly as it can.

I believe the problem you are having is that the Garbage Collector removes weakly referenced objects in response not only in response to memory pressure - instead it will do collection quite aggressively sometimes just because the runtime system thinks some objects may likely have become unreachable.
You may be better off using e.g. System.Runtime.Caching.MemoryCache which can be configured with a memory limit, or custom eviction policies for the items.

The answer actually depends on usage characteristics of the cache you are trying to build. I have successfully used WeakReference based caching strategy for improving performance in many of my projects where the cached objects are expected to be used in short bursts of multiple reads. As others pointed out, the weak references are pretty much garbage from GC's point of view and will be collected whenever the next GC cycle is run. It's nothing to do with the memory utilization.
If, however, you need a cache that survives such brutality from GC, you need to use or mimic the functionality provided by System.Runtime.Caching namespace. Keep in mind that you'd need an additional thread that cleans up the cache when the memory usage is crossing your thresholds.

A bit late, but here's a relevant use case:
I need to cache two types of objects: large (deserialised) data files that take 10 minutes to load and cost 15G of ram each, and smaller (dynamically compiled) objects that contain internal references to those data files (the smaller objects are also cached because they take ~10s to generate). These caches are hidden within the factories that supply the objects (the former component having no knowledge of the latter), and have different eviction policies.
When my `data file' cache evicts an object, it replaces it by a weak reference, so if that object is still available when next requested, we can resurrect it (and renew its cache timeout). In this way we avoid losing (or accidentally duplicating) any object before it is truly defunct (i.e. not used anywhere else). Notice that neither cache is required to be aware of the other, and that no other client objects need to be aware that there are any caches at all (eg: we avoid needing 'keepalives', callbacks, registration, retrieve-and-return scopes, etc - things get a lot simpler).
So although using WeakReference by itself (instead of a cache) is a terrible idea (because modern GCs are typically tuned to the size of the L2 CPU cache, and regular code will burn through this many times per minute), it's very useful as a way to hide your caches from the rest of your code.

Related

What are approaches to optimize the mark phase of a non-generational GC?

I am running on Unity's Boehm–Demers–Weiser garbage collector, which is a non-generational GC.
I have a large tree of managed objects in memory (~100k objects, ~200MiB allocation).
These objects are essentially a cache and never go out of scope, so they never actually get sweeped by the GC.
However, because Boehm is non-generational, this stale cache never gets moved up to higher generations. This causes the mark phase to take a very high amount of processing time, as it has to traverse this whole cache on every collection, causing noticeable lag spikes.
This is "by-design", as the Unity documentation puts it:
Crucially, Unity’s garbage collection – which uses the Boehm GC algorithm – is non-generational and non-compacting. “Non-generational” means that the GC must sweep through the entire heap when performing a collection pass, and its performance therefore degrades as the heap expands.
I am well aware of approaches to reduce recurring garbage allocation, however I cannot find any information on how to optimize a large, stale, baseline allocation in a non-generational GC.
More specifically:
Is there any way to mark a root pointer (e.g. static field) as ignored from GC entirely?
Are there some data structure patterns that are faster to traverse in the mark phase?
Conversely, are there known data structure patterns that hinder the mark phase speed?
These questions are just some of my hypotheses to solve this, but I'm open to all suggestions.
One could approximate generational behavior by separating program startup with initialization of static data structures from steady state operation. All pointers into the startup memory region could be ignored while pointers from it should not exist since nothing allocated after the switchpoint (which would be under GC control) has been allocated yet.
One could even GC the startup region once before switching to a new region. Essentially you would end up with a limited form of region-based, non-moving collector where references between regions only happen in one direction.

.Net garbage collection mark phase and huge linked lists

If there is a linked list with 4M+ nodes, does the mark phase needs to traverse the entire list each time to build the graph? Are there any optimizations applied in this case? In the plain sight it doesn't look efficient. Is there a way to verify if GC traverses the entire list or not?
TIA.
Yes, it will need to traverse the whole object graph. I can't think how there could be any optimizations, to be honest... but it doesn't need to do very much on each node. Most of the time will probably be spent waiting on memory, I suspect, as obviously it'll burn through the cache. Of course, by the time the linked list ends up in gen2 (and if you're allocating millions of nodes, most of it will be in gen2 pretty quickly), it will only need to do that very rarely.
If this is the most reasonable data structure for your app, I would use it for the moment, but keep track of the performance hit of garbage collection using Performance Monitor etc. If it turns out to be a problem, you can consider alternative strategies.
What Jon said.
Also, once an object ends up in Gen2, an optimisation that's available (on Windows but not other platforms IIRC) is that the GC can register with the kernel for notifications to a given page of memory. In cases where a page remains unchanged between GC events, some work need not be repeated.
There is one very important optimization being made. The .NET GC is generational, and data in gen2 is only rarely traversed.
With large data structures (such as huge linked lists), most of your data will quickly end up in gen2, where the GC will only rarely access it.
Also, the GC only traverses live data during collections, dead data is collected "for free". So when your list becomes unreachable (or if most of its nodes, but not all, do), then the GC will be able to collect millions of nodes basically for free.

Best time to cull WeakReferences in a collection in .NET

I have a collection (I'm writing a Weak Dictionary) and I need to cull the dead WeakReferences periodically. What I've usually seen is checks in the Add and Remove methods that say, "After X modifications to the collection, it's time to cull." This will be acceptable for me, but it seems like there should be a better way.
I would really like to know when the GC ran and run my cleanup code immediately after. After all, the GC is probably the best mechanism for determining when is a good time to clean up dead references. I found Garbage Collection Notifications, but it doesn't look like this is what I want. I don't want to spawn a separate thread just to monitor the GC. Ideally, my collection would implement IWantToRunCodeDuringGC or subscribe to a System.GC.Collected event. But the .NET framework probably can't trust user code to run during a GC...
Or maybe there's another approach I'm overlooking.
EDIT: I don't think it matters if my code runs after, before, or during the GC.
I guess you want to create a weak dictionary as a way of “lazy” caching of some data.
You need to consider the fact that GC will occur very often and your weak references will be dead most of the time if no other objects will reference them. GC occurs approximately after every 256KB of memory is allocated. It is pretty often.
You probably will be better by implementing your cache as a dictionary with a maximum number of elements. Then you can use least-recently used algorithm or time-based algorithm for pushing elements out of the collection. Usually such approach has better performance and memory consumption than using weak references.
Can't you just use an object with no references, then call some code in the finalizer? The finalizer is called when the GC is collecting the object.
EDIT: Then of course, you wouldn't know when the GC was done.. Hmm.
From the standpoint of avoiding memory leaks, I would suggest that when something is added to a dictionary, you check whether the number of garbage-collections that have been performed. If the number of items that have been added between the last time the dictionary was checked and the time the last collection occurred exceeds a reasonable fraction of the size of the dictionary (say, 10%, or some minimum number of items, whichever is less), that would be a sign that the dictionary should be swept. Note that this approach will limit the number of excessive items in the dictionary to a certain fraction of the dictionary size while offering reasonable performance regardless of dictionary size.

.Net 4 MemoryCache Leaks with Concurrent Garbage Collection

I'm using the new MemoryCache in .Net 4, with a max cache size limit in MB (I've tested it set between 10 and 200MB, on systems with between 1.75 and 8GB of memory). I don't set any time based expiration on the objects, as I'm using the cache simply as a high performance drive, and as long as there is space, I want it used. To my surprise, the cache refused to evict any objects, to the point that I would get SystemOutOfMemory exceptions.
I fired up perfmon, wired up my application to .Net CLR Memory\#Bytes In All Heaps, .Net Memory Cache 4.0, and Process\Private Bytes -- indeed, the memory consumption was out of control, and no cache trims were being registered.
Did some googling and stackoverflowing, downloaded and attached the CLRProfiler, and wham: evictions everywhere! The memory stayed within reasonable bounds based upon the memory size limit I had set. Ran it in debug mode again, no evictions. CLRProfiler again, evictions.
I finally noticed that the profiler forced the application to run without concurrent garbage collection (also see useful SO Concurrent Garbage Collection Question). I turned it off in my app.config, and, sure enough, evictions!
This seems like at best an outrageous lack of documentation to not say: this only works with non-concurrent garbage collection -- though I image since its ported from ASP.NET, they may not have had to worry about concurrent garbage collection.
So has anyone else seen this? I'd love to get some other experiences out there, and maybe some more educated insights.
Update 1
I've reproduced the issue within a single method: it seems that the cache must be written to in parallel for the cache evictions not to fire (in concurrent garbage collection mode). If there is some interest, I'll upload the test code to a public repo. I'm definitely getting toward the deep end of the the CLR/GC/MemoryCache pool, and I think I forgot my floaties...
Update 2
I published test code on CodePlex to reproduce the issue. Also, possibly of interest, the original production code runs in Azure, as a Worker Role. Interesting, changing the GC concurrency setting in the role's app.config has no effect. Possibly Azure overrides GC settings much like ASP.NET? Further, running the test code under WPF vs a Console application will produce slightly different eviction results.
You can "force" a garbage collection right after the problematic method and see if the problem reproduces executing:
System.Threading.Thread.Sleep(200);
GC.Collect();
GC.WaitForPendingFinalizers();
right at the end of the method (make sure that you free any handles to reference objects and null them out). If this prevents memory leakage, and then yes, there may be a runtime bug.
Stop-the-world garbage collection is based on determining whether a strong live reference to an object exists at the moment the world is stopped. Concurrent garbage collection usually determines whether a strong live reference to an object has existed since some particular time in the past. My conjecture would be that many strong references to objects held in WeakReferences are being individually created and discarded. If a stop-the-world garbage collector fires between the time a particular object is created and the time it's discarded, that particular object will be kept alive, but previously-discarded objects will not. By contrast, a concurrent garbage collector may not detect that all strong references an object have been discarded until a certain amount of time goes by without any strong references to that object being created.
I've sometimes wished that .net would offer something between a strong reference and a weak one, which would prevent an object from being wiped from memory, but would not protect it from being finalized or having weak WeakReferences to it invalidated. Such references would slightly complicate the GC process, requiring every object to have separate flags indicating whether strong and quasi-weak references exist to it, and whether it has been scanned for both strong and quasi-weak references, but such a feature could be helpful in many "weak event" scenarios.
I found this entry while searching for a similiar topic and I'm focusing on your Out of Memory exception.
If you put an object in the cache then it still may be referencing other objects and therefore these objects would not be garbage collected -- hence the out of memory exception and probably a CPU being pegged out due to Gen 2 garbage collection.
Are you putting "used" objects on the cache or clones of "used" objects on the cache? If you put a clone on the cache then the "used" object that possible references other objects could be garbage collected.
If you shut off your caching mechanism does your program still run out of memory? If it doesn't run out of memory then that would prove that the objects you would otherwise be putting on the cache are still holding references to other objects hindering garbage collection.
Forcing garbage collection is not a best practice and shouldn't have to be done. In this scenario forcing a garbage collection wouldn't dispose of referenced objects anyway.
MemoryCache definately has some issues. It ate 160Mb of memory on my asp.net server, just changed to simple list and added some extra logic to get the same functionality.

When should weak references be used?

I recently came across a piece of Java code with WeakReferences - I had never seen them deployed although I'd come across them when they were introduced. Is this something that should be routinely used or only when one runs into memory problems? If the latter, can they be easily retrofitted or does the code need serious refactoring? Can the average Java (or C#) programmer generally ignore them?
EDIT Can any damage be done by over-enthusiastic use of WRs?
Weak references are all about garbage collection. A standard object will not "disappear" until all references to it are severed, this means all the references your various objects have to it have to be removed before garbage collection will consider it garbage.
With a weak reference just because your object is referenced by other objects doesn't necessarily mean it's not garbage. It can still get picked up by GC and get removed from memory.
An example: If I have a bunch of Foo objects in my application I might want to use a Set to keep a central record of all the Foo's I have around. But, when other parts of my application remove a Foo object by deleting all references to it, I don't want the remaining reference my Set holds to that object to keep it from being garbage collected! Really I just want it to disappear from my set. This is where you'd use something like a Weak Set (Java has a WeakHashMap) instead, which uses weak references to its members instead of "strong" references.
If your objects aren't being garbage collected when you want them to then you've made an error in your book keeping, something's still holding a reference that you forgot to remove. Using weak references can ease the pain of such book keeping, since you don't have to worry about them keeping an object "alive" and un-garbage-collected, but you don't have to use them.
You use them whenever you want to have a reference to an object without keeping the object alive yourself. This is true for many caching-like features, but also play an important role in event handling, where a subscriber shall not be kept alive by being subscribed to an event.
A small example: A timer event which refreshes some data. Any number of objects can subscribe on the timer in order to get notified, but the bare fact that they subscribed on the timer should not keep them alive. So the timer should have weak references to the objects.
Can any damage be done by
over-enthusiastic use of WRs?
Yes it can.
One concern is that weak references make your code more complicated and potentially error prone. Any code that uses a weak reference needs to deal with the possibility that the reference has been broken each time it uses it. If you over-use weak references you end up writing lots of extra code. (You can mitigate this by hiding each weak reference behind a method that takes care of the checking, and re-creates the discarded object on demand. But this may not necessarily be as simple as that; e.g. if the re-creation process involves network access, you need to cope with the possibility of re-creation failure.)
A second concern is that there are runtime overheads with using weak references. The obvious costs are those of creating weak references and calling get on them. A less obvious cost is that significant extra work needs to be done each time the GC runs.
A final concern is that if you use a weak references for something that your application is highly likely to need in the future, you may incur the cost of repeatedly recreating it. If this cost is high (in terms of CPU time, IO bandwidth, network traffic, whatever) your application may perform badly as a result. You may be better off giving the JVM more memory and not using weak references at all.
Off course, this does not mean you should avoid using weak references entirely. Just that you need to think carefully. And probably you should first run a memory profiler on your application to figure out where your memory usage problems stem from.
A good question to ask when considering use of a WeakReference is how one would feel if the weak reference were invalidated the instant no strong references existed to the object. If that would make the WeakReference less useful, then a WeakReference is probably not the best thing to use. If that would be an improvement over the non-deterministic invalidation that comes from garbage-collection, then a WeakReference is probably the right choice.

Categories