I have some places where implementing some sort of cache might be useful. For example, when doing resource lookups based on custom strings, finding property names using reflection, or having only one PropertyChangedEventArgs instance per property name.
A simple example of the last one:
public static class Cache
{
    private static Dictionary<string, PropertyChangedEventArgs> cache;

    static Cache()
    {
        cache = new Dictionary<string, PropertyChangedEventArgs>();
    }

    public static PropertyChangedEventArgs GetPropertyChangedEventArgs(
        string propertyName)
    {
        if (cache.ContainsKey(propertyName))
            return cache[propertyName];

        return cache[propertyName] = new PropertyChangedEventArgs(propertyName);
    }
}
But will this work well? For example, if we had a whole load of different property names, we would end up with a huge cache sitting there that is never garbage collected. I imagine that if the cached values are larger and the application is long-running, this could end up being a real problem... or what do you think? How should a good cache be implemented? Is this one good enough for most purposes? Any examples of nice cache implementations that are not too hard to understand or too complex to implement?
This is a large problem; you need to determine the domain of the problem and apply the correct techniques. For instance, how would you describe the expiration of the objects? Do they become stale over a fixed interval of time? Do they become stale from an external event? How frequently does this happen? Additionally, how many objects do you have? Finally, how much does it cost to generate each object?
The simplest strategy would be to do straight memoization, as you have above. This assumes that objects never expire, that there are not so many of them as to run your memory dry, and that the cost of creating these objects warrants the use of a cache to begin with.
The next layer might be to limit the number of objects and use an implicit expiration policy, such as LRU (least recently used). To do this you'd typically use a doubly linked list in addition to your dictionary: every time an object is accessed it is moved to the front of the list, and if you need to add a new object but are over your limit of total objects, you remove from the back of the list.
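As a rough sketch of that layer (my illustration, not part of the original question; the GetOrAdd name and delegate-based factory are assumptions, and it is not thread-safe):

using System;
using System.Collections.Generic;

public class LruCache<TKey, TValue>
{
    private readonly int capacity;
    // The dictionary gives O(1) lookup; the linked list tracks recency of use.
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity)
    {
        this.capacity = capacity;
    }

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            // Accessed: move the entry to the front of the recency list.
            order.Remove(node);
            order.AddFirst(node);
            return node.Value.Value;
        }

        if (map.Count >= capacity)
        {
            // Over the limit: evict the least recently used entry at the back.
            map.Remove(order.Last.Value.Key);
            order.RemoveLast();
        }

        node = order.AddFirst(new KeyValuePair<TKey, TValue>(key, factory(key)));
        map.Add(key, node);
        return node.Value.Value;
    }
}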
Next, you might need to enforce explicit expiration, either based on time, or some external stimulus. This would require you to have some sort of expiration event that could be called.
As you can see, there is a lot of design in caching, so you need to understand your domain and engineer appropriately. I felt you did not provide enough detail for me to discuss specifics.
P.S. Please consider using Generics when defining your class so that many types of objects can be stored, thus allowing your caching code to be reused.
You could wrap each of your cached items in a WeakReference. This would allow the GC to reclaim items if and when required; however, it doesn't give you any granular control over when items will disappear from the cache, or allow you to implement explicit expiration policies, etc.
(Ha! I just noticed that the example given on the MSDN page is a simple caching class.)
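To make the idea concrete, here is a sketch of the original Cache rewritten around WeakReference (my illustration, not the MSDN example; note that dead entries linger in the dictionary until overwritten, which a real implementation would need to prune):

using System;
using System.Collections.Generic;
using System.ComponentModel;

public static class WeakCache
{
    private static readonly Dictionary<string, WeakReference> cache =
        new Dictionary<string, WeakReference>();

    public static PropertyChangedEventArgs GetPropertyChangedEventArgs(string propertyName)
    {
        WeakReference wr;
        PropertyChangedEventArgs args = null;
        if (cache.TryGetValue(propertyName, out wr))
            args = (PropertyChangedEventArgs)wr.Target; // null once the GC has collected it

        if (args == null)
        {
            args = new PropertyChangedEventArgs(propertyName);
            cache[propertyName] = new WeakReference(args);
        }
        return args;
    }
}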
Looks like .NET 4.0 now supports System.Runtime.Caching for caching many types of things. You should look into that first, instead of re-inventing the wheel. More details:
http://msdn.microsoft.com/en-us/library/system.runtime.caching%28VS.100%29.aspx
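A minimal usage sketch (assuming .NET 4.0; the key and the CreateExpensiveValue factory are illustrative placeholders):

using System;
using System.Runtime.Caching;

public static class CacheDemo
{
    public static object GetValue()
    {
        ObjectCache cache = MemoryCache.Default;
        object value = cache.Get("myKey");
        if (value == null)
        {
            value = CreateExpensiveValue();
            CacheItemPolicy policy = new CacheItemPolicy();
            policy.SlidingExpiration = TimeSpan.FromMinutes(10); // evict when idle
            cache.Add("myKey", value, policy);
        }
        return value;
    }

    private static object CreateExpensiveValue()
    {
        return new object(); // placeholder for the real expensive work
    }
}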
This is a nice debate to have, but depending on your application, here are some tips:
You should define the maximum size of the cache and what to do with old items when the cache is full, have a scavenging strategy, determine a time-to-live for objects in the cache, and decide whether the cache can or must be persisted somewhere other than memory in case of abnormal application termination, and so on.
This is a common problem that has many solutions depending on your application need.
It is so common that Microsoft released a whole library to address it.
You should check out Microsoft Velocity before rolling your own cache.
http://msdn.microsoft.com/en-us/data/cc655792.aspx
Hope this helps.
You could use a WeakReference, but if your object is not that large then don't, because the WeakReference itself takes more memory than the object, which is not a good trade-off. Also, if the object is short-lived and will never make it from generation 0 to generation 1 in the GC, there is not much need for a WeakReference; implementing IDisposable on the object, with GC.SuppressFinalize called when it is disposed, would be the better fit.
If you want to control the lifetime, you need a timer to check the datetime/timespan against the desired expiration time of each object in your cache.
The important thing is: if the object is large, opt for the WeakReference; otherwise use a strong reference. You can also set the capacity on the Dictionary and queue requests for additional objects into a temporary bin, serializing each object and loading it back when there is room in the Dictionary, then clearing it from the temp directory.
If I have a public method that returns a reference-type value, which is a private field in the current class, do I need to return a copy of it? In my case I need to return a List, but this method is called very often and my list holds ~100 items. The point is that if I return the same variable, everybody can modify it, but if I return a copy, the performance will degrade. In my case I'm trying to generate a sudoku table, which is not a fast procedure.
An internal class SudokuTable holds the cells with their possible values. A public class SudokuGame handles UI requests and generates/solves the SudokuTable. Is it good practice to choose performance over OOP principles? If someone wants to make another library using my SudokuTable class, they won't be aware that they can break its state by modifying the List that it returns.
Performance and object-oriented programming are not mutually exclusive - your code can be object-oriented and perform badly, etc.
In the case you state here, I don't think it would be wise to allow external parties to edit the internal state of the object, so I would return an array or a ReadOnlyCollection of the entries. (One possibility would be to use an ObservableCollection and monitor for out-of-bounds tampering, 'handling' it accordingly, say with an exception; I'm unsure how desirable that would be.)
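For example, a sketch of that approach (the shape of SudokuTable here is my assumption about the asker's class):

using System.Collections.Generic;
using System.Collections.ObjectModel;

public class SudokuTable
{
    private readonly List<int> cells = new List<int>(); // assumed internal storage

    // AsReadOnly returns a thin wrapper over the same underlying list,
    // so there is no copying cost, but callers cannot modify it.
    public ReadOnlyCollection<int> Cells
    {
        get { return cells.AsReadOnly(); }
    }
}

Since the wrapper is a live view rather than a copy, returning it on every call avoids the copying cost the asker was worried about.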
From there, you might consider how you expose access to these entries, trying to minimise the need for callers to get the full collection when all they need is to look up and return a specific one.
It's worth noting that an uneditable collection doesn't necessarily mean the state cannot be altered, either; if the entries are represented by a reference type rather than a value type, then returning an entry leaves it open to tampering (potentially, depending on the class definition), so you might be better off with structs for the entry types.
Without a concrete example of where you're having problems, though, this is all a bit subjective and theoretical at the moment. Have you tried restricting the collection? If so, how was the performance, and where were the issues? And so on.
Assume that I have the following class
public class MyClass<T, V>
{
    public ReadOnlyDictionary<T, V> Dict
    {
        get
        {
            // Expensive call whose result the asker wants to cache (see below).
            return createDictionary();
        }
    }
}
Assume that ReadOnlyDictionary is a read-only wrapper around Dictionary<T, V>.
The createDictionary method takes significant time to complete, and the returned dictionary is relatively large.
Obviously, I want to implement some sort of caching so I can reuse the result of createDictionary, but I also do not want to abuse the garbage collector or use too much memory.
I thought of using a WeakReference for the dictionary, but I am not sure if this is the best approach.
What would you recommend? How do I properly handle the result of a costly method that might be called multiple times?
UPDATE:
I am interested in advice for a C# 2.0 library (single DLL, non-visual). The library might be used in a desktop or a web application.
UPDATE 2:
The question is relevant for read-only objects as well. I changed the type of the property from Dictionary to ReadOnlyDictionary.
UPDATE 3:
The T is a relatively simple type (string, for example). The V is a custom class. You may assume that an instance of V is costly to create. The dictionary might contain from zero to a couple of thousand elements.
The code is assumed to be accessed from a single thread, or from multiple threads with an external synchronization mechanism.
I am fine with the dictionary being GC-ed when no one uses it. I am trying to find a balance between time (I want to somehow cache the result of createDictionary) and memory expenses (I do not want to keep memory occupied longer than necessary).
WeakReference is not a good solution for a cache, since your object won't survive the next GC if nobody else is referencing your dictionary. You can make a simple cache by storing the created value in a member variable and reusing it if it is not null.
This is not thread-safe, and in some situations you would end up creating the dictionary several times if you have heavy concurrent access to it. You can use the double-checked locking pattern to guard against this with minimal perf impact.
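A sketch of the double-checked locking pattern applied to the property above (the field names and the placeholder createDictionary body are mine; ReadOnlyDictionary here stands in for the asker's wrapper and is assumed to have a constructor taking a Dictionary):

using System.Collections.Generic;
using System.Collections.ObjectModel;

public class MyClass<T, V>
{
    private readonly object sync = new object();
    private volatile ReadOnlyDictionary<T, V> cached;

    public ReadOnlyDictionary<T, V> Dict
    {
        get
        {
            ReadOnlyDictionary<T, V> local = cached;
            if (local == null)
            {
                lock (sync)
                {
                    // Re-check inside the lock so the dictionary is built at most once.
                    if (cached == null)
                        cached = createDictionary();
                    local = cached;
                }
            }
            return local;
        }
    }

    private ReadOnlyDictionary<T, V> createDictionary()
    {
        return new ReadOnlyDictionary<T, V>(new Dictionary<T, V>()); // placeholder
    }
}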
To help you further, you would need to specify whether concurrent access is an issue for you, how much memory your dictionary consumes, and how it is created. If, for example, the dictionary is the result of an expensive query, it might help to simply serialize the dictionary to disk and reuse it until you need to recreate it (this depends on your specific needs).
Caching is another word for memory leak if you have no clear policy for when your object should be removed from the cache. Since you are trying WeakReference, I assume you do not know exactly when a good time would be to clear the cache.
Another option is to compress the dictionary into a less memory-hungry structure. How many keys does your dictionary have, and what are the values?
There are four major mechanisms available to you (Lazy<T> comes in .NET 4.0, so it is not an option):
lazy initialization
virtual proxy
ghost
value holder
Each has its own advantages. I suggest a value holder, which populates the dictionary on the first call of the holder's GetValue method. You can then use that value as long as you want to, and it is only built once, and only when needed. For more information, see Martin Fowler's page.
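A C# 2.0-style sketch of such a value holder (the delegate-based factory is my framing, not Fowler's exact formulation):

public delegate TValue Factory<TValue>();

public class ValueHolder<TValue> where TValue : class
{
    private readonly Factory<TValue> factory;
    private TValue value;

    public ValueHolder(Factory<TValue> factory)
    {
        this.factory = factory;
    }

    public TValue GetValue()
    {
        // Populated only on the first call; reused afterwards.
        if (value == null)
            value = factory();
        return value;
    }
}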
Are you sure you need to cache the entire dictionary?
From what you say, it might be better to keep a Most-Recently-Used list of key-value pairs.
If the key is found in the list, just return the value.
If it is not, create that one value (which is presumably faster than creating all of them, and uses less memory too) and store it in the list, thereby removing the key-value pair that hasn't been used the longest.
Here's a very simple MRU list implementation; it might serve as inspiration:
using System.Collections.Generic;
using System.Linq;

internal sealed class MostRecentlyUsedList<T> : IEnumerable<T>
{
    private readonly List<T> items;
    private readonly int maxCount;

    public MostRecentlyUsedList(int maxCount, IEnumerable<T> initialData)
        : this(maxCount)
    {
        this.items.AddRange(initialData.Take(maxCount));
    }

    public MostRecentlyUsedList(int maxCount)
    {
        this.maxCount = maxCount;
        this.items = new List<T>(maxCount);
    }

    /// <summary>
    /// Adds an item to the top of the most recently used list.
    /// </summary>
    /// <param name="item">The item to add.</param>
    /// <returns><c>true</c> if the list was updated, <c>false</c> otherwise.</returns>
    public bool Add(T item)
    {
        int index = this.items.IndexOf(item);
        if (index != 0)
        {
            // item is not already the first in the list
            if (index > 0)
            {
                // item is in the list, but not in the first position
                this.items.RemoveAt(index);
            }
            else if (this.items.Count >= this.maxCount)
            {
                // item is not in the list, and the list is full already
                this.items.RemoveAt(this.items.Count - 1);
            }

            this.items.Insert(0, item);
            return true;
        }
        else
        {
            return false;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        return this.items.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
In your case, T is a key-value pair. Keep maxCount small enough that searching stays fast and memory usage stays low. Call Add each time you use an item.
An application should use WeakReference as a caching mechanism if the useful lifetime of an object's presence in the cache will be comparable to the reference lifetime of the object. Suppose, for example, that you have a method which will create a ReadOnlyDictionary based on deserializing a String. If a common usage pattern would be to read a string, create a dictionary, do some stuff with it, abandon it, and start again with another string, WeakReference is probably not ideal. On the other hand, if your objective is to deserialize many strings (quite a few of which will be equal) into ReadOnlyDictionary instances, it may be very useful if repeated attempts to deserialize the same string yield the same instance. Note that the savings would not just come from the fact that one only had to do the work of building the instance once, but also from the facts that (1) it would not be necessary to keep multiple instances in memory, and (2) if ReadOnlyDictionary variables refer to the same instance, they can be known to be equivalent without having to examine the instances themselves. By contrast, determining whether two distinct ReadOnlyDictionary instances were equivalent might require examining all the items in each. Code which would have to do many such comparisons could benefit from using a WeakReference cache so that variables which hold equivalent instances would usually hold the same instance.
I think you have two mechanisms you can rely on for caching, instead of developing your own. The first, as you yourself suggested, is to use a WeakReference and let the garbage collector decide when to free this memory up.
You have a second mechanism - memory paging. If the dictionary is created in one swoop, it'll probably be stored in a more or less contiguous part of the heap. Just keep the dictionary alive, and let Windows page it out to the swap file if you don't need it. Depending on your usage (how random your dictionary access is), you may end up with better performance than with the WeakReference.
This second approach is problematic if you're close to your address space limits (this happens only in 32-bit processes).
My code has to generate millions of objects to perform some algorithm (millions of objects will be created, and at the same time two thirds of them should be destroyed).
I know that object creation causes performance problems.
Could someone recommend how to manage such a huge number of objects, garbage collection, and so on?
Thank you.
Elaborating a bit on my "make them a value type" comment above.
If you have a struct Foo, then preparing for the algorithm with e.g. var storage = new Foo[1000000] will only allocate one big block of memory (I'm assuming the required amount of contiguous memory will be available).
You can then manually manage the memory inside that block to avoid performing more memory allocations (see the sketch after this list):
Keep a count of how many slots in the array are actually used
To "create" a new Foo, put it at the first unused slot and increment the counter
To "delete" a Foo, swap it with the one in last used slot and decrement the counter
Of course, making an algorithm work with value types rather than reference types is not as simple as changing class to struct. But where workable, it will allow you to side-step all of this overhead for a one-time startup cost.
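A minimal sketch of that scheme (the struct Foo and the pool API are hypothetical):

struct Foo
{
    public int Data; // illustrative payload
}

class FooPool
{
    private readonly Foo[] storage; // one big up-front allocation
    private int count;              // number of slots actually in use

    public FooPool(int capacity)
    {
        storage = new Foo[capacity];
    }

    // "Create" a Foo by claiming the first unused slot; returns its index.
    public int Create(Foo value)
    {
        storage[count] = value;
        return count++;
    }

    // "Delete" the Foo at index by swapping in the last used slot.
    // Note this changes the index of the element that was moved.
    public void Delete(int index)
    {
        storage[index] = storage[--count];
    }
}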
If it is possible in your algorithm, then try to reuse objects - if two thirds are destroyed immediately, then you can try to use them again.
You can implement the IDisposable interface on the type whose objects are being created. Then you can use the using keyword and write whatever logic involving the object within the using scope. The following links will give you a fair idea of what I am trying to say. Hope they are of some help.
http://www.codeguru.com/csharp/csharp/cs_syntax/interfaces/article.php/c8679
Am I implementing IDisposable correctly?
Regards,
Samar
I was recently profiling an application trying to work out why certain operations were extremely slow. One of the classes in my application is a collection based on LinkedList. Here's a basic outline, showing just a couple of methods and some fluff removed:
public class LinkInfoCollection : PropertyNotificationObject, IEnumerable<LinkInfo>
{
    private LinkedList<LinkInfo> _items;

    public LinkInfoCollection()
    {
        _items = new LinkedList<LinkInfo>();
    }

    public void Add(LinkInfo item)
    {
        _items.AddLast(item);
    }

    public LinkInfo this[Guid id]
    {
        get { return _items.SingleOrDefault(i => i.Id == id); }
    }
}
The collection is used to store hyperlinks (represented by the LinkInfo class) in a single list. However, each hyperlink also has a list of hyperlinks which point to it, and a list of hyperlinks which it points to. Basically, it's a navigation map of a website. As this means you can have infinite recursion when links point back to each other, I implemented this as a linked list - as I understand it, this means that for every hyperlink, no matter how many times it is referenced by another hyperlink, there is only ever one copy of the object.
The ID property in the above example is a GUID.
With that long-winded description out of the way, my problem is simple - according to the profiler, when constructing this map for a fairly small website, the indexer referred to above is called no fewer than 27,906 times, which is an extraordinary amount. I still need to work out whether it really needs to be called that many times, but at the same time I would like to know if there's a more efficient way of implementing the indexer, as this is the primary bottleneck identified by the profiler (also assuming it isn't lying!). I still need the linked-list behaviour, as I certainly don't want more than one copy of these hyperlinks floating around killing my memory, but I also need to be able to access them by a unique key.
Does anyone have any advice on improving the performance of this indexer? I also have another indexer which uses a URI rather than a GUID, but that one is less problematic, as the building of incoming/outgoing links is done by GUID.
Thanks;
Richard Moss
You should use a Dictionary<Guid, LinkInfo>.
You don't need to use LinkedList in order to have only one copy of each LinkInfo in memory. Remember that LinkInfo is a managed reference type, and so you can place it in any collection, and it'll just be a reference to the object that gets placed in the list, not a copy of the object itself.
That said, I'd implement the LinkInfo class as containing two lists of Guids: one for the things this links to, one for the things linking to this. I'd have just one Dictionary<Guid, LinkInfo> to store all the links. Dictionary is a very fast lookup, I think that'll help with your performance.
The fact that this[] is getting called 27,000 times doesn't seem like a big deal to me; what's making it show up in your profiler is probably the SingleOrDefault call on the LinkedList, which scans the whole list on every lookup. Linked lists are best for situations where you need fast insertions and removals, particularly in the middle of the list. For quick lookups, which are probably more important here, let the Dictionary do its work with hash tables.
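As a sketch of this suggestion (the field layout of LinkInfo is my reading of the question; names are illustrative):

using System;
using System.Collections.Generic;

public class LinkInfo
{
    public Guid Id;
    public List<Guid> OutgoingLinks = new List<Guid>(); // links this page points to
    public List<Guid> IncomingLinks = new List<Guid>(); // links pointing to this page
}

public class LinkInfoCollection
{
    private readonly Dictionary<Guid, LinkInfo> _items =
        new Dictionary<Guid, LinkInfo>();

    public void Add(LinkInfo item)
    {
        _items.Add(item.Id, item);
    }

    // O(1) hash lookup instead of an O(n) scan of the linked list.
    public LinkInfo this[Guid id]
    {
        get
        {
            LinkInfo result;
            _items.TryGetValue(id, out result);
            return result; // null if absent, matching SingleOrDefault
        }
    }
}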
I've been reading Rockford Lhotka's "Expert C# 2008 Business Objects", where there is such a thing as a data portal which nicely abstracts where the data comes from. When you call DataPortal.Update(this), which as you might guess persists 'this' to the database, an object is returned - the persisted 'this' with any changes the db made to it, e.g. a timestamp.
Lhotka has written, often and very casually, that you have to make sure to update all references to the old object so that they point to the newly returned object. Makes sense, but is there an easy way to find all references to the old object and change them? Obviously the GC tracks references; is it possible to tap into that?
Cheers
There are profiling APIs to do this, but nothing for general consumption. One possible solution, and one which I've used myself, is to implement in a base class a tracking mechanism where each instance of the object adds a WeakReference to itself to a static collection.
I have this conditionally compiled for DEBUG builds but it probably wouldn't be a good idea to rely on this in a release build.
// simplified example
// do not use. performance would suck
abstract class MyCommonBaseClass {
    static readonly List<WeakReference> instances = new List<WeakReference>();

    protected MyCommonBaseClass() {
        lock (instances) {
            RemoveDeadOnes();
            instances.Add(new WeakReference(this));
        }
    }

    // prune entries whose targets have already been collected
    // (body filled in; the original post omitted it)
    static void RemoveDeadOnes() {
        instances.RemoveAll(delegate(WeakReference wr) { return !wr.IsAlive; });
    }
}
The GC doesn't actually track the references to the objects. Instead, it calculates which objects are reachable, starting from global and stack objects at runtime and executing some variant of a "flood fill" algorithm.
Specifically for your problem, why not just have a proxy holding a reference to the "real" object? That way you need to update it in only one place.
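A rough sketch of the proxy idea (BusinessObject is a placeholder type; DataPortal.Update is the call from the question):

public class BusinessObjectProxy
{
    private BusinessObject real; // the single reference that ever needs updating

    public BusinessObjectProxy(BusinessObject real)
    {
        this.real = real;
    }

    public BusinessObject Value
    {
        get { return real; }
    }

    public void Save()
    {
        // Everyone holding the proxy sees the replacement automatically.
        real = DataPortal.Update(real);
    }
}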
There isn't a simple way to do this directly; however, Son of Strike has this capability. It allows you to delve into all object references tracked by the CLR, and look at what objects are referencing any specific object, etc.
Here is a good tutorial for learning CLR debugging via SoS.
If you are passing object references around and those object references remain unchanged, then any changes made to the object in a persistence layer will be instantly visible to any other consumers of the object. However if your object is crossing a service boundary then the assemblies on each side of the object will be viewing different objects that are just carbon copies. Also if you have made clones of the object, or have created anonymous types that incorporate properties from the original object, then those will be tough to track down - and of course to the GC these are new objects that have no tie-in to the original object.
If you have some sort of key or ID in the object, then this becomes easier. The key doesn't have to be a database ID; it can be a GUID that is new'ed up when the object is instantiated and does not change for the entire lifecycle of the object (i.e. it is a property that has a getter but no setter). As it is a property, it will persist across service boundaries, so your object will still be identifiable. You can then use LINQ, or even old-fashioned loops (icky!), to iterate through any collection that is likely to hold a copy of the updated object, and if one is found you can then merge the changes back in.
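As a sketch of the identity-key idea plus a LINQ-based merge pass (the Order type and its members are illustrative placeholders):

using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    private readonly Guid id = Guid.NewGuid(); // assigned once at construction

    public Guid Id
    {
        get { return id; } // getter only; the key never changes
    }
}

public static class OrderMerger
{
    // Replace a stale copy in a collection with the freshly updated instance.
    public static void Merge(IList<Order> orders, Order updated)
    {
        Order stale = orders.FirstOrDefault(o => o.Id == updated.Id);
        if (stale != null)
        {
            orders[orders.IndexOf(stale)] = updated;
        }
    }
}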
Having said this, I wouldn't think that you have too many copies floating around. If you do, the places where these copies live should be very localized. Ensuring that your object implements INotifyPropertyChanged will also help propagate notifications of changes if you hold a list in one spot which is then bound to, directly or indirectly, in several other spots.