I need to create a thread-safe cache of disposable objects. Here is how I see it:
I have some data class that I want to cache, e.g. MyData.
I create some collection (a ConcurrentDictionary, for example) for MyData.
I have a method that creates a new instance of MyData using some key.
When I need MyData for some key, I check whether it exists in my storage. If it does, I use the instance from the collection; otherwise I create a new instance and put it into the collection.
I have some event on which I should invalidate the cache. On this event I clear the storage.
The problem is that MyData is disposable, and I don't know when I should call Dispose. I can't call Dispose when clearing the collection on the clear-cache event, because some thread may be using these MyData instances at that moment.
What pattern should I choose?
I can't see much sense in a concurrent collection in a case where consumers of the cache must be able to complete their current operation with an item from the cache. You really need synchronization here.
The most obvious thing you can do here is to use ReaderWriterLockSlim and a wrapper around a regular Dictionary<TKey, TValue>.
When someone wants to use an item from the cache, it acquires read access. When someone wants to modify the cache (add an item, or invalidate the cache altogether), it acquires write access (hence, a writer can't invalidate the cache until the last reader releases the lock).
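A minimal sketch of the reader/writer approach described above. The names DisposableCache, Use, and Invalidate are hypothetical, and the sketch assumes that callers never keep an item after the delegate passed to Use returns, and that the delegate does not call back into the cache (ReaderWriterLockSlim is not recursive by default).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: a reader holds the read lock for as long as it uses an item,
// so Invalidate (a writer) cannot dispose anything while a reader is still inside Use.
public sealed class DisposableCache<TKey, TValue> where TValue : class, IDisposable
{
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Func<TKey, TValue> _factory;

    public DisposableCache(Func<TKey, TValue> factory) { _factory = factory; }

    // Runs 'action' against the cached item while holding the read lock.
    public TResult Use<TResult>(TKey key, Func<TValue, TResult> action)
    {
        while (true)
        {
            _lock.EnterReadLock();
            try
            {
                if (_items.TryGetValue(key, out TValue item))
                    return action(item);
            }
            finally { _lock.ExitReadLock(); }

            // Cache miss: create the item under the write lock, then loop
            // so that the item is used under the read lock like any other hit.
            _lock.EnterWriteLock();
            try
            {
                if (!_items.ContainsKey(key))
                    _items.Add(key, _factory(key));
            }
            finally { _lock.ExitWriteLock(); }
        }
    }

    // Blocks until all readers have left, then disposes and removes every item.
    public void Invalidate()
    {
        _lock.EnterWriteLock();
        try
        {
            foreach (TValue item in _items.Values) item.Dispose();
            _items.Clear();
        }
        finally { _lock.ExitWriteLock(); }
    }
}
```

The trade-off in this sketch is that a long-running reader delays invalidation, and the factory runs under the write lock, so an expensive factory blocks all readers while it executes.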
Another option is to simply catch ObjectDisposedException. But this approach assumes that the current operation can be interrupted from the outside.
I think you should encapsulate the cache mechanism so that no outside code calls Dispose; the object that wraps the dictionary should do it without outside intervention. A Singleton pattern would be good here, and all access to the dictionary should be locked. On the event that invalidates the cache, just take the lock again, dispose of the items, and clear the dictionary. If you need help with the singleton, tell me.
If the objects are logically immutable, are somewhat expensive to create, and are known to consume no non-fungible resources and only a modest quantity of fungible ones; and if tracking all references to them would be impractical, you might consider using a cache of short weak references. Such a situation would be one of the few times I might consider it reasonable to "rely" upon finalization, since only one object matching a given key would ever exist outside the "freachable" queue (list of objects needing cleanup) at any given time. One advantage of that approach is that provided you avoid resurrecting objects which have become eligible for finalization, your code shouldn't have to worry too much about threading issues in most cases. You'll have to periodically clean out entries whose associated weak references have gone dead, and watch out for the possibility that a cache entry might get updated with a new object after it's been recognized as dead, but the GC should ensure that objects don't get cleaned up while anyone's using them.
I am working on a multi-thread application, where I load data from external feeds and store them in internal collections.
These collections are updated once per X minutes, by loading all data from the external feeds again.
There is no other adding/removing from these collection, just reading.
Normally I would use locking during the updating, same as everywhere I am accessing the collections.
Question:
Do the concurrent collections make my life easier in this case?
Basically I see two approaches
Load the data from the external feed and then remove the items which are no longer present, add the missing ones, and update the changed ones. I guess this is a good solution with the help of a concurrent collection (no locking required, right?), but it requires too much code on my side.
Simply replace the old collection object with a new one (e.g. _data = new ConcurrentBag(newData)). Here I am quite sure that using the concurrent collections has no advantage at all, am I right? A locking mechanism is required.
Is there out of the box solution I can use, using the concurrent collections? I would not like to reinvent the wheel again.
Yes, for concurrent collections the locking mechanism is stored inside the collections, so if you new up a collection in place of the old one, that just defeats the purpose. They are mostly used in producer-consumer situations, usually in combination with a BlockingCollection<T>. If your producer does more than just add data, it makes things a bit more complicated.
The benefit to not using concurrent collections is that your locking mechanism no longer depends on the collection - you can have a separate synchronization object that you lock on, and inside the critical section you're free to assign another instance like you wanted.
To answer your question - I don't know of any out-of-the-box mechanism to do what you want, but I wouldn't call using a simple lock statement "reinventing the wheel". That's a bit like saying that using for loops is reinventing the wheel. Just have a separate synchronization object alongside your non-concurrent collection.
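A minimal sketch of the "separate synchronization object" idea above. FeedStore, Replace, and Snapshot are hypothetical names, and the sketch assumes that a published list is never mutated after it has been swapped in, so readers can keep using an old snapshot safely.

```csharp
using System.Collections.Generic;

// Hypothetical store: a plain List<T> guarded by a dedicated lock object.
public sealed class FeedStore
{
    private readonly object _sync = new object();
    private List<string> _data = new List<string>();

    // Called every X minutes with the freshly loaded feed data.
    public void Replace(IEnumerable<string> newData)
    {
        var fresh = new List<string>(newData); // build the new list outside the lock
        lock (_sync) { _data = fresh; }        // swap the reference inside the critical section
    }

    // Readers grab the current list under the lock and then read it without locking,
    // which is safe only because published lists are never modified.
    public IReadOnlyList<string> Snapshot()
    {
        lock (_sync) { return _data; }
    }
}
```

A reader that took a snapshot before an update keeps seeing the old data until it asks for a new snapshot.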
Hello and thank you very much for your help!
Does anybody have a good idea for finding unreferenced objects of a specific class before garbage collection (preferably as soon as possible)?
In my case, I need to create a lot of small objects of a specific class for temporary use only. The problem is that I don’t know when the object is not needed anymore. I would like to collect the objects of that class which are not referenced any more (as soon as possible) before garbage collection so that I can recycle them and don’t need to create them new. I think that would make the code much faster.
Kind Regards,
David
First off, before you do this you should do extensive profiling to determine that you really, truly do have a performance problem caused by collection pressure. The garbage collector is highly tuned and works quite well most of the time; situations where you need to pool objects for performance reasons are rare.
I actually am in that scenario; we have determined through extensive testing that there are certain objects we use all the time on a temporary basis, ("builders" of other objects, essentially) and that the cost of collection pressure caused by re-allocating them frequently is measurable and high.
What we do is we have a pool class which maintains an array of "blank" objects. When you need a new object, the pool checks the array and returns an object that is in the array if we have one, nulling out the array entry. If we don't have one then it creates a new object. When the temporary user is done with the object, it passes it back to the pool, which "blanks" it and sticks it back in the array. (Growing the array if necessary.)
If a user forgets to put the object back into the pool, or cannot do so because an exception was thrown before the "back in the pool" call, who cares? All we've done in that case is perhaps slightly de-optimized a future allocation. The cost is that you need to remember to put the object back in the pool when you're done with it.
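The pool described above could look roughly like the following sketch. BuilderPool, Rent, and Return are hypothetical names, StringBuilder stands in for the "builder" objects, and the lock is an assumption added for multi-threaded use that the description above does not mention.

```csharp
using System;
using System.Text;

// Hypothetical pool of "blank" StringBuilder instances kept in an array.
public sealed class BuilderPool
{
    private readonly object _sync = new object();
    private StringBuilder[] _slots = new StringBuilder[4];

    // Returns a pooled instance if one is available, nulling out its slot;
    // otherwise creates a new one.
    public StringBuilder Rent()
    {
        lock (_sync)
        {
            for (int i = 0; i < _slots.Length; i++)
            {
                if (_slots[i] != null)
                {
                    StringBuilder pooled = _slots[i];
                    _slots[i] = null;
                    return pooled;
                }
            }
        }
        return new StringBuilder();
    }

    // "Blanks" the instance and puts it back, growing the array if it is full.
    public void Return(StringBuilder builder)
    {
        builder.Clear();
        lock (_sync)
        {
            for (int i = 0; i < _slots.Length; i++)
            {
                if (_slots[i] == null) { _slots[i] = builder; return; }
            }
            int oldLength = _slots.Length;
            Array.Resize(ref _slots, oldLength * 2);
            _slots[oldLength] = builder;
        }
    }
}
```

If a caller never calls Return, the instance is simply left to the garbage collector, which matches the "who cares?" reasoning above.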
There's no way to "hook" the garbage collector to put stuff back in the pool automatically that I know of.
You can't directly control garbage collection, but you could create a manager class that is responsible for creating, holding the references and disposing of these objects. As long as the manager class is in scope, its objects will not be garbage collected.
In a complex application (involving inversion of control and quite a few classes) it is hardly possible to know when a certain object won't be referenced any longer.
First question: Does the statement above suggest that there is a design flaw in such an application, since there is a pattern saying: "In all OO programming it is about objects using other types of objects to ease up implementation. However: For any object created there should be some owner that will take care of its lifetime."
I assume it is safe to state that traditional unmanaged OO programming works as stated above: some owner will eventually free / release the used object.
However, the benefit of a managed language is that in principle you don't have to care about lifetime management anymore. As long as an object is referenced somehow (event handler...) and from anywhere (maybe not the "owner") it lives and should live, since it is still in use.
I really like that idea and that you don't have to think in terms of owner relationships. However at some point in a program it might get obvious that you want to get rid of an object (or at least mute it in a way as it wouldn't be there).
IStoppable: a suggestion of a design pattern
There could be an interface like "IStoppable", with a "Stop()" method and a "Stopped" event, so that any other object using it can remove its references to the object. (They would therefore need to unplug their OnStopped event handler within the event handler, if that is possible.) As a result the object is no longer needed and will get collected.
Maybe it is naive, but what I like to believe about that idea is that there wouldn't be an undefined state of the object. Even if some other object failed to unregister itself on OnStopped, the object will just stay alive and can still be called. Nothing gets broken just by removing most references to it.
I think this pattern can be viewed as an anarchistic app design, since
it is based on the idea that ANY other object can manage the lifetime of an IStoppable
there is no need for an owner
it would be considered as OK to leave the decision of unregistering from an IStoppable to those using it
you don't need to dispose, destroy or throw away - you just stop and let live (let GC do the dirty part)
IDisposable: from scratch, and just to check a related pattern:
The disposable pattern suggests that you should still think and work like in unmanaged OO programming: dispose an object that you don't need any longer.
using is your friend in a method (very comfortable!)
your own IDisposable implementation is your friend otherwise.
after using it / calling Dispose you shouldn't call it any longer: undefined behaviour.
implementation and resource centric: it is not so much about when and why, but more about the details of reclaiming resources
So again: in an application where I can't keep track of whether anything other than an "owner" is pointing to an object, it is hard to ensure that no one will reference and call it any longer.
I read of a "Dispose" event in the Component class of .NET. Is there a design pattern around it?
Why would I want to think in terms of Disposables? Why should I?
In a managed world...
Thanks!
Sebastian
I personally don't like the idea of IStoppable, as defined above. You're saying you want any object to manage the lifetime of the object; however, a defined lifecycle really suggests ownership. Allowing multiple objects to manage the lifetime of a single object is going to cause issues in the long run.
IDisposable is, however, a well-defined pattern in the .NET world. I wrote an entire series on implementing IDisposable which is a decent introduction to its usage. However, its purpose is handling resources which have an unmanaged component: when you have a managed object that refers to a native resource, it's often desirable to have explicit control of the lifetime of that resource. IDisposable is a defined pattern for handling that situation.
That being said, a proper implementation of IDisposable will still clean up your resources if you fail to call Dispose(). The downside is that the resource will be cleaned up during the object's finalization, which could occur at any arbitrary point after the object is no longer used. This can be very bad for quite a few reasons - especially if you're using native resources that are limited in nature. By not disposing of the resource immediately, you can run out of resources before the GC runs on the object, especially if there isn't a lot of memory pressure in the system.
Ok first I would point out a few things I find uncomfortable about your IStoppable suggestion.
IStoppable raises event Stopped, consumers must know about this and release references. This is a bit complex at best, problematic at worst. Consumers must know where every reference is in order to remove/reset the reference.
You claim "... Nothing got broken just by removing most references onto it.". That entirely depends on the object implementing IStoppable and its uses. Say, for example, my IStoppable object is an object cache. Now I forget about or ignore the event and suddenly I'm using a different object cache than the rest of the world... maybe that is ok, maybe not.
Events are a horrible way to provide behavior like this because exceptions are difficult to handle. What does it mean when the third of 10 event handlers throws an exception in the IStoppable.Stopped event?
I think what you're trying to express is an object that may be 'owned' by many things and can be forcefully released by one? In this case you might consider using a reference counter pattern, more like old-school COM. That of course has issues as well, but they are less of a problem in a managed world.
The issue with a reference counter around an object is that you come back to the idea of an invalid/uninitialized object. One possible way to solve this is to provide the reference counter with a valid 'default' instance (or a factory delegate) to use when all references have been released and someone still wants an instance.
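A minimal sketch of the reference counter with a factory delegate mentioned above. RefCounted, Acquire, and Release are hypothetical names; the sketch assumes that every caller pairs each Acquire with exactly one Release and stops using the instance after releasing it.

```csharp
using System;

// Hypothetical reference-counted holder: the instance is disposed when the last
// holder releases it, and the factory creates a fresh one on the next Acquire.
public sealed class RefCounted<T> where T : class, IDisposable
{
    private readonly object _sync = new object();
    private readonly Func<T> _factory;
    private T _instance;
    private int _count;

    public RefCounted(Func<T> factory) { _factory = factory; }

    // Returns the shared instance, creating it if no one currently holds one.
    public T Acquire()
    {
        lock (_sync)
        {
            if (_instance == null) _instance = _factory();
            _count++;
            return _instance;
        }
    }

    // Gives up one reference; disposes the instance when the count reaches zero.
    public void Release()
    {
        lock (_sync)
        {
            if (_count == 0) throw new InvalidOperationException("Release without matching Acquire.");
            if (--_count == 0)
            {
                _instance.Dispose();
                _instance = null;
            }
        }
    }
}
```

Because the factory supplies a new instance on demand, a caller never receives an invalid or already disposed object from Acquire.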
I think you have a misunderstanding of modern OO languages; in particular scope and garbage collection.
The lifetime of objects is very much controlled by their scope, whether the scope is limited to a using clause, a method, or even the appdomain.
Although you don't necessarily "care" about the lifetime of the object, the runtime does, and the object becomes eligible for garbage collection once it goes out of scope and is no longer reachable.
You can speed up that process by purposely telling the garbage collector to run now, but that's usually a pointless exercise, as the garbage collector will run at the most opportune time anyway.
If you are talking about objects in multi-threaded applications, these already expose mechanisms to stop their execution or otherwise kill them on demand.
Which leaves us with unmanaged resources. For those, the wrapper should implement IDisposable. I'll skip talking about it as Reed Copsey has already covered that ground nicely.
While there are times a Disposed event (like the one used by Windows Forms) can be useful, events do add a fair bit of overhead. In cases where an object will keep all the IDisposables it ever owns until it's disposed (a common situation) it may be better to keep a List(Of IDisposable) and have a private function "T RegDisp<T>(T obj) where T:IDisposable" which will add an object to the disposables list and return it. Instead of setting a field to SomeDisposable, set it to RegDisp(SomeDisposable). Note that in VB, provided all constructor calls are wrapped in factory methods, it's possible to safely use RegDisp() within field initializers, but that cannot be done in C#.
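A minimal C# sketch of the RegDisp helper described above. The RegDisp signature comes from the text; ReportWriter, its fields, and IsBufferOpen are hypothetical and only there to make the example self-contained. Registration happens in the constructor because, as noted above, C# field initializers cannot call instance methods.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical owner that keeps every IDisposable it creates until it is disposed itself.
public class ReportWriter : IDisposable
{
    private readonly List<IDisposable> _disposables = new List<IDisposable>();
    private readonly MemoryStream _buffer;
    private readonly StreamWriter _writer;

    public ReportWriter()
    {
        // Instead of assigning the new object directly, route it through RegDisp.
        _buffer = RegDisp(new MemoryStream());
        _writer = RegDisp(new StreamWriter(_buffer));
    }

    // Adds the object to the disposables list and returns it unchanged.
    private T RegDisp<T>(T obj) where T : IDisposable
    {
        _disposables.Add(obj);
        return obj;
    }

    public bool IsBufferOpen => _buffer.CanWrite;

    public void Dispose()
    {
        // Dispose in reverse registration order, so dependents go before what they wrap.
        for (int i = _disposables.Count - 1; i >= 0; i--)
            _disposables[i].Dispose();
        _disposables.Clear();
    }
}
```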
Incidentally, if an IDisposable's constructor accepts an IDisposable as a parameter, it may often be helpful to have it accept a Boolean indicating whether or not ownership of that object will be transferred. If a possibly-owned IDisposable will be exposed in a mutable property (e.g. PictureBox.Image) the property itself should be read-only, with a setter method that accepts an ownership flag. Calling the set method when the object owns the old object should Dispose the old object before setting the new one. Using that approach will eliminate much of the need for a Disposed event.
I have a cache that uses WeakReferences to the cached objects so that they are automatically removed from the cache in case of memory pressure. My problem is that the cached objects are collected very soon after they have been stored in the cache. The cache runs in a 64-bit application, and even though more than 4 GB of memory is still available, all the cached objects are collected (they are usually stored in the G2 heap at that moment). No garbage collection is induced manually, as Process Explorer shows.
What methods can I apply to make the objects live a little longer?
Using WeakReferences as the primary means of referencing cached objects is not really a great idea because, as Josh said, you're at the mercy of any future behavioral changes to WeakReference and the GC.
However, if your cache needs any kind of resurrection capability, use of WeakReferences for items that are pending purge is useful. When an item meets eviction criteria, rather than immediately evicting it, you change its reference to a weak reference. If anything requests it before it is GC'ed, you restore its strong reference, and the object can live again. I have found this useful for some caches that have hard to predict hit rate patterns with frequent enough "resurrections" to be beneficial.
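A minimal sketch of the "downgrade to a weak reference on eviction" idea above. ResurrectingCache and its members are hypothetical names; the eviction criteria themselves are left to the caller, which is assumed to call Evict when an item qualifies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: evicted items are held weakly until the GC collects them,
// and are promoted back to a strong reference if requested before that happens.
public sealed class ResurrectingCache<TKey, TValue> where TValue : class
{
    private readonly object _sync = new object();
    private readonly Dictionary<TKey, TValue> _strong = new Dictionary<TKey, TValue>();
    private readonly Dictionary<TKey, WeakReference<TValue>> _pendingPurge =
        new Dictionary<TKey, WeakReference<TValue>>();

    public void Add(TKey key, TValue value)
    {
        lock (_sync)
        {
            _strong[key] = value;
            _pendingPurge.Remove(key);
        }
    }

    // Eviction only swaps the strong reference for a weak one.
    public void Evict(TKey key)
    {
        lock (_sync)
        {
            if (_strong.TryGetValue(key, out TValue value))
            {
                _strong.Remove(key);
                _pendingPurge[key] = new WeakReference<TValue>(value);
            }
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_sync)
        {
            if (_strong.TryGetValue(key, out value)) return true;

            if (_pendingPurge.TryGetValue(key, out WeakReference<TValue> weak))
            {
                _pendingPurge.Remove(key);
                if (weak.TryGetTarget(out value))
                {
                    _strong[key] = value; // resurrect: restore the strong reference
                    return true;
                }
            }
            value = null;
            return false;
        }
    }
}
```

Whether an evicted item can still be resurrected depends entirely on whether the GC has run and whether any other code still holds a reference to it, so callers must treat a miss as normal.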
If you have predictable hit rate patterns, then I would forgo the WeakReference option and perform explicit evictions.
There is one situation where a WeakReference-based cache may be good: when the usefulness of an item in the cache is predicated upon the existence of a reference to it. In such a situation, a weak interning cache may be useful. For example, suppose one had an application which would deserialize many large immutable objects, many of which were expected to be duplicates, and would have to perform many comparisons between them. If X and Y are references to some immutable class type, testing X.Equals(Y) will be very fast if both variables point to the same instance, but may be very slow if they point to distinct instances that happen to be equal. If a deserialized object happens to match another object to which a reference already exists, fetching from the dictionary a reference to that latter object (requiring one slow comparison) may expedite future comparisons. On the other hand, if it matched an item in the dictionary but the dictionary was the only reference to that item, there would be little advantage to using the dictionary object instead of simply keeping the object that was read in; probably not enough advantage to justify the cost of the comparison. For an interning cache, having WeakReferences get invalidated as soon as possible once no other references exist to an object would be a good thing.
In .net, a WeakReference is not considered a reference from the GC standpoint at all, so any object that only has weak references will be collected in the next GC run (for the appropriate generation).
That makes weak reference completely inappropriate for caching - as your experience shows.
You need a "real" cache component, and the most important thing about caching is to get one whose eviction policy (that is, the rules about when to drop an object from the cache) is a good match for your application's usage pattern.
No, WeakReference is not good for that because the behavior of the garbage collector can and will change over time and your cache should not be relying on today's behavior. Also many factors outside of your control could affect memory pressure.
There are many implementations of a cache for .NET. You could find probably a dozen on CodePlex. I guess what you need to add to it is something that looks at the application's current working set to use that as a trigger for purging.
One more note about why your objects are being collected so frequently. The GC is very aggressive at cleaning up Gen0 objects. If your objects are very short-lived (up until the only reference to it is a weak reference) then the GC is doing what it's designed to do by cleaning up as quickly as it can.
I believe the problem you are having is that the garbage collector removes weakly referenced objects not only in response to memory pressure; it will sometimes collect quite aggressively just because the runtime thinks some objects may have become unreachable.
You may be better off using e.g. System.Runtime.Caching.MemoryCache which can be configured with a memory limit, or custom eviction policies for the items.
The answer actually depends on usage characteristics of the cache you are trying to build. I have successfully used WeakReference based caching strategy for improving performance in many of my projects where the cached objects are expected to be used in short bursts of multiple reads. As others pointed out, the weak references are pretty much garbage from GC's point of view and will be collected whenever the next GC cycle is run. It's nothing to do with the memory utilization.
If, however, you need a cache that survives such brutality from GC, you need to use or mimic the functionality provided by System.Runtime.Caching namespace. Keep in mind that you'd need an additional thread that cleans up the cache when the memory usage is crossing your thresholds.
A bit late, but here's a relevant use case:
I need to cache two types of objects: large (deserialised) data files that take 10 minutes to load and cost 15G of ram each, and smaller (dynamically compiled) objects that contain internal references to those data files (the smaller objects are also cached because they take ~10s to generate). These caches are hidden within the factories that supply the objects (the former component having no knowledge of the latter), and have different eviction policies.
When my `data file' cache evicts an object, it replaces it by a weak reference, so if that object is still available when next requested, we can resurrect it (and renew its cache timeout). In this way we avoid losing (or accidentally duplicating) any object before it is truly defunct (i.e. not used anywhere else). Notice that neither cache is required to be aware of the other, and that no other client objects need to be aware that there are any caches at all (eg: we avoid needing 'keepalives', callbacks, registration, retrieve-and-return scopes, etc - things get a lot simpler).
So although using WeakReference by itself (instead of a cache) is a terrible idea (because modern GCs are typically tuned to the size of the L2 CPU cache, and regular code will burn through this many times per minute), it's very useful as a way to hide your caches from the rest of your code.
If .NET has garbage collection then why do you have to explicitly call IDisposable?
Garbage collection is for memory. You need to dispose of non-memory resources - file handles, sockets, GDI+ handles, database connections etc. That's typically what underlies an IDisposable type, although the actual handle can be quite a long way down a chain of references. For example, you might Dispose an XmlWriter which disposes a StreamWriter it has a reference to, which disposes the FileStream it has a reference to, which releases the file handle itself.
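A minimal sketch of the disposal chain described above. XmlExport and WriteGreeting are hypothetical names. The sketch uses nested using statements so that each object is disposed explicitly, rather than assuming that disposing the outer writer also closes the objects it wraps.

```csharp
using System.IO;
using System.Xml;

public static class XmlExport
{
    // Writes a tiny XML document and releases the file handle before returning.
    public static void WriteGreeting(string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        using (var text = new StreamWriter(file))
        using (var xml = XmlWriter.Create(text))
        {
            xml.WriteElementString("greeting", "hello");
        } // disposed in reverse order: xml, then text, then file (the handle is released here)
    }
}
```

Because the handle is released deterministically at the closing brace, another part of the program can open the same file exclusively right after the call, without waiting for a garbage collection.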
Expanding a bit on other comments:
The Dispose() method should be called on all objects that have references to unmanaged resources. Examples of such would include file streams, database connections etc. A basic rule that works most of the time is: "if the .NET object implements IDisposable then you should call Dispose() when you are done with the object."
However, some other things to keep in mind:
Calling dispose does not give you control over when the object is actually destroyed and memory released. GC handles that for us and does it better than we can.
Dispose cleans up all native resources, all the way down the stack of base classes as Jon indicated. Then it calls SuppressFinalize() to indicate that the object is ready to be reclaimed and no further work is needed. The next run of the GC will clean it up.
If Dispose is not called, the GC finds that the object needs to be cleaned up, but Finalize must be called first to make sure resources are released. That request for Finalize is queued up and the GC moves on, so the lack of a call to Dispose forces one more GC to run before the object can be cleaned. This causes the object to be promoted to the next "generation" of GC. This may not seem like a big deal, but in a memory-pressured application, promoting objects up to higher generations of GC can push a high-memory application over the wall to being an out-of-memory application.
Do not implement IDisposable in your own objects unless you absolutely need to. Poorly implemented or unnecessary implementations can actually make things worse instead of better. Some good guidance can be found here:
Implementing a Dispose Method
Or read that whole section of MSDN on Garbage Collection
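The interplay between Dispose, the finalizer, and SuppressFinalize described above can be sketched as follows. NativeBuffer and IsDisposed are hypothetical; the class holds a block of unmanaged memory purely for illustration, and real code would often wrap such a handle in a SafeHandle instead of writing a finalizer.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
public class NativeBuffer : IDisposable
{
    private IntPtr _memory;
    private bool _disposed;

    public NativeBuffer(int size) { _memory = Marshal.AllocHGlobal(size); }

    public bool IsDisposed => _disposed;

    public void Dispose()
    {
        Dispose(true);
        // The resource is already released, so the finalizer no longer needs to run
        // and the object can be reclaimed without the extra finalization pass.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return; // Dispose must be safe to call more than once
        // When 'disposing' is true, managed IDisposable fields could be disposed here too.
        Marshal.FreeHGlobal(_memory);
        _memory = IntPtr.Zero;
        _disposed = true;
    }

    // Safety net: runs only if Dispose was never called.
    ~NativeBuffer() { Dispose(false); }
}
```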
Because objects sometimes hold resources besides memory. GC releases the memory; IDisposable is so you can release anything else.
Because you want to control when the resources held by your object will get cleaned up.
See, GC works, but it does so when it feels like it, and even then, the finalisers you add to your objects will get called only after 2 garbage collections. Sometimes, you want to clean those objects up immediately.
This is when IDisposable is used. By calling Dispose() explicitly (or using the syntactic sugar of a using block) you can get your object to clean itself up in a standard way (i.e. you could have implemented your own cleanup() call and called that explicitly instead).
Example resources you would want to clean up immediately are: database handles, file handles, network handles.
In order to use the using keyword the object must implement IDisposable. http://msdn.microsoft.com/en-us/library/yh598w02(VS.71).aspx
The IDisposable interface is often described in terms of resources, but most such descriptions fail to really consider what "resource" really means.
Some objects need to ask outside entities to do something on their behalf, to the detriment of other entities, until further notice. For example, an object encompassing a file stream may need to ask a file system (which may be anywhere in the connected universe) to grant exclusive access to a file. In many cases, the object's need for the outside entity will be tied to outside code's need for the object. Once client code has done everything it's going to do with the aforementioned file stream object, for example, that object will no longer need to have exclusive access (or any access for that matter) to its associated file.
In general, an object X which asks an entity to do something until further notice incurs an obligation to deliver such notice, but can't deliver such notice as long as X's client might need X's services. The purpose of IDisposable is to provide a uniform way of letting objects know that their services will no longer be required, so that they can notify entities (if any) that were acting on their behalf that their services are no longer required. The code which calls IDisposable need neither know nor care about what (if any) services an object has requested from outside entities, since IDisposable merely invites an object to fulfill obligations (if any) to outside entities.
To put things in terms of "resources", an object acquires a resource when it asks an outside entity to do something on its behalf (typically, though not necessarily, granting exclusive use of something) until further notice, and releases a resource when it tells that outside entity its services are no longer required. Code that acquires a resource doesn't gain a "thing" so much as it incurs an obligation; releasing a resource doesn't give up a "thing", but instead fulfills an obligation.