I am implementing an MRU (Most Recently Used) cache in my project using C#.
I read up on the concepts and implementations of MRU and its opposite, LRU (Least Recently Used), and found this article http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=626 that describes the implementation of an MRU collection in C#.
What confuses me is that I think this implementation is LRU rather than MRU. Could anyone help me confirm whether this collection class is MRU or not?
The following code block is the whole MruDictionary class. Thanks.
class MruDictionary<TKey, TValue>
{
private LinkedList<MruItem> items;
private Dictionary<TKey, LinkedListNode<MruItem>> itemIndex;
private int maxCapacity;
public MruDictionary(int cap)
{
maxCapacity = cap;
items = new LinkedList<MruItem>();
itemIndex = new Dictionary<TKey, LinkedListNode<MruItem>>(maxCapacity);
}
public void Add(TKey key, TValue value)
{
if (itemIndex.ContainsKey(key))
{
throw new ArgumentException("An item with the same key already exists.");
}
if (itemIndex.Count == maxCapacity)
{
LinkedListNode<MruItem> node = items.Last;
items.RemoveLast(); // Why do we remove the last node rather than the first here? The most recently accessed node is moved to the front of the list.
itemIndex.Remove(node.Value.Key);
}
LinkedListNode<MruItem> newNode = new LinkedListNode<MruItem>(new MruItem(key, value));
items.AddFirst(newNode);
itemIndex.Add(key, newNode);
}
public bool TryGetValue(TKey key, out TValue value)
{
LinkedListNode<MruItem> node;
if (itemIndex.TryGetValue(key, out node))
{
value = node.Value.Value;
items.Remove(node);
items.AddFirst(node);
return true;
}
value = default(TValue);
return false;
}
}
class MruItem
{
private TKey _key;
private TValue _value;
public MruItem(TKey k, TValue v)
{
_key = k;
_value = v;
}
public TKey Key
{
get { return _key; }
}
public TValue Value
{
get { return _value; }
}
}
http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
Most Recently Used (MRU): discards, in contrast to LRU, the most recently used items first.
According to my understanding, since the most recently accessed node is moved to the front of the list, when the cache is full we should remove the first node of the list rather than the last.
It looks to me like an MRU implementation. Notice how searches start from the beginning of the linked list and go back, and whenever a node is accessed it's moved to the front of the list. In Add(), the node is added using AddFirst(), and in TryGetValue(), it removes the node and adds it to the front of the list.
Based on what is documented here: http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
It's LRU. Think of the items as an "ordered" list.
The most recently used item is at the "front".
When a new item is added they call items.AddFirst(newNode); which adds it to the front of the list.
When an item is "touched", they move it to the front of the list using these calls:
items.Remove(node);
items.AddFirst(node);
When the list is full, it pushes the "last" / "oldest" item from the list using items.RemoveLast();
The cache is removing the "least recently used" items first when it hits capacity.
Microsoft's "MRU" lists correctly use an LRU cache replacement algorithm.
Note that Microsoft in this case uses different terminology for MRU lists than the cache community.
The cache community uses MRU / LRU to talk about replacement (or eviction) strategies. When your cache is full, and you need to put a new item in the list, which item should be removed from the list?
Microsoft provides tools for getting the most recently used items, like for a drop down or a recent documents list.
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/mru-source-list-functions
https://www.codeproject.com/articles/78/most-recently-used-list-in-a-combobox
This means that to correctly implement an MRU list, you need to implement an LRU Cache eviction strategy.
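To make the difference between the two eviction policies concrete, here is a minimal sketch. The class name RecencyCache, the Touch method and the evictMostRecent flag are hypothetical names invented for this illustration; they do not come from the article or from any Microsoft API. The front of the linked list always holds the most recently used key, exactly as in the MruDictionary above, and the only thing that changes between the policies is which end gets evicted.

```csharp
using System.Collections.Generic;

// Hypothetical illustration: tracks keys by recency and evicts from either end.
public class RecencyCache<TKey>
{
    // Front of the list = most recently used, back = least recently used.
    private readonly LinkedList<TKey> order = new LinkedList<TKey>();
    private readonly Dictionary<TKey, LinkedListNode<TKey>> index =
        new Dictionary<TKey, LinkedListNode<TKey>>();
    private readonly int capacity;
    private readonly bool evictMostRecent;

    public RecencyCache(int capacity, bool evictMostRecent)
    {
        this.capacity = capacity;
        this.evictMostRecent = evictMostRecent;
    }

    // Records a use of the key, adding it (and evicting if full) when it is new.
    public void Touch(TKey key)
    {
        if (index.TryGetValue(key, out var node))
        {
            // Existing key: move it to the front.
            order.Remove(node);
            order.AddFirst(node);
            return;
        }
        if (index.Count == capacity)
        {
            // LRU eviction drops the back; MRU eviction drops the front.
            var victim = evictMostRecent ? order.First : order.Last;
            order.Remove(victim);
            index.Remove(victim.Value);
        }
        index[key] = order.AddFirst(key);
    }

    public bool Contains(TKey key) => index.ContainsKey(key);
}
```

With a capacity of 2 and the sequence Touch(1), Touch(2), Touch(1), Touch(3), the LRU policy (evictMostRecent: false) drops key 2, while the MRU policy (evictMostRecent: true) drops key 1.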
I am trying to change the value of Keys in my dictionary as follows:
//This part loads the data in the iterator
List<Recommendations> iterator = LoadBooks().ToList();
//This part adds the data to a list
List<Recommendations> list = new List<Recommendations>();
foreach (var item in iterator.Take(100))
{
list.Add(item);
}
//This part adds Key and List as key pair value to the Dictionary
if (!SuggestedDictionary.ContainsKey(bkName))
{
SuggestedDictionary.Add(bkName, list);
}
//This part loops over the dictionary contents
for (int i = 0; i < 10; i++)
{
foreach (var entry in SuggestedDictionary)
{
rec.Add(new Recommendations() { bookName = entry.Key, Rate = CalculateScore(bkName, entry.Key) });
entry.Key = entry.Value[i];
}
}
But it says "Property or indexer 'KeyValuePair<TKey, TValue>.Key' cannot be assigned to -- it is read only". What I want to do is change the dictionary key here and assign it another value.
The only way to do this will be to remove and re-add the dictionary item.
Why? Because a dictionary works by chaining and buckets (it's similar to a hash table with a different collision resolution strategy).
When an item is added to a dictionary, it is added to the bucket that its key hashes to and, if there's already an entry there, it's prepended to a chained list. If you were to change the key, the dictionary would need to work out again where the item belongs. So the easiest and most sane solution is to just remove and re-add the item.
Solution
var data = SomeFunkyDictionary[key];
SomeFunkyDictionary.Remove(key);
SomeFunkyDictionary.Add(newKey,data);
Or write yourself an extension method
public static class Extensions
{
public static void ReplaceKey<T, U>(this Dictionary<T, U> source, T key, T newKey)
{
if(!source.TryGetValue(key, out var value))
throw new ArgumentException("Key does not exist", nameof(key));
source.Remove(key);
source.Add(newKey, value);
}
}
Usage
SomeFunkyDictionary.ReplaceKey(oldKey, newKey);
Side note: adding to and removing from a dictionary incurs a penalty; if you don't need fast lookups, it may be more suitable not to use a dictionary at all, or to use some other strategy.
How can I avoid duplicates in a list?
One way is to check whether the element already exists before adding a new item, but this requires more code and iterating over the whole list on every add.
Another way is to use a HashSet: when I try to add a new item, it checks by itself whether the item exists; if not, it adds the item, and if so, it does nothing.
But I believe a HashSet is less efficient and needs more resources than a list, so I don't know whether avoiding duplicates is a good use of a HashSet.
Is there any other alternative?
Thanks.
You can achieve this in a single line of code:
List<long> longs = new List<long> { 1, 2, 3, 4, 3, 2, 5 };
List<long> unique = longs.Distinct().ToList();
unique will contain only 1, 2, 3, 4, 5.
You cannot avoid duplicates in a List; it does no verification of items.
If you don't care about the order of items, use a HashSet.
If you want to preserve the order of items and still be sure that all items are unique, you should write your own list class, i.e. something that implements the IList<T> interface. (There is a little ambiguity here: should an item appear at the index of its first addition or of its last addition?)
public class ListWithoutDuplicates<T> : IList<T>
You have different options here; e.g. you should decide what is more important for you: fast addition or low memory consumption. For fast Add and Contains operations you need some hash-based data structure, which is unordered. Here is a sample implementation that uses a HashSet to store the hashes of all items held in the internal list. You will need the following fields:
private readonly HashSet<int> hashes = new HashSet<int>();
private readonly List<T> items = new List<T>();
private static readonly Comparer<T> comparer = Comparer<T>.Default;
Adding items is simple (warning: no null-checks here and further) - use item hash code to quickly O(1) check if it's already added. Use same approach for removing items:
public void Add(T item)
{
var hash = item.GetHashCode();
if (hashes.Contains(hash))
return;
hashes.Add(hash);
items.Add(item);
}
public bool Remove(T item)
{
var hash = item.GetHashCode();
if (!hashes.Contains(hash))
return false;
hashes.Remove(item.GetHashCode());
return items.Remove(item);
}
Some index-based operations:
public int IndexOf(T item)
{
var hash = item.GetHashCode();
if (!hashes.Contains(hash))
return -1;
return items.IndexOf(item);
}
public void Insert(int index, T item)
{
var itemAtIndex = items[index];
if (comparer.Compare(item, itemAtIndex) == 0)
return;
var hash = item.GetHashCode();
if (!hashes.Contains(hash))
{
hashes.Remove(itemAtIndex.GetHashCode());
items[index] = item;
hashes.Add(hash);
return;
}
throw new ArgumentException("Cannot add duplicate item");
}
public void RemoveAt(int index)
{
var item = items[index];
hashes.Remove(item.GetHashCode());
items.RemoveAt(index);
}
And left-overs:
public T this[int index]
{
get { return items[index]; }
set { Insert(index, value); }
}
public int Count => items.Count;
public bool Contains(T item) => hashes.Contains(item.GetHashCode());
public IEnumerator<T> GetEnumerator() => items.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => items.GetEnumerator();
That's it. Now you have a list implementation which adds an item only once (the first time). E.g.
var list = new ListWithoutDuplicates<int> { 1, 2, 1, 3, 5, 2, 5, 3, 4 };
Will create list with items 1, 2, 3, 5, 4. Note: if memory consumption is more important than performance, then instead of using hashes use items.Contains operation which is O(n).
BTW, what we just did is actually an IList decorator.
A List is a data structure that may contain duplicates. Duplicate elements are disambiguated by their index.
One way is to check whether the element already exists before adding a new item, but this requires more code and iterating over the whole list on every add.
This is possible, but it is error-prone and slow. You will need to iterate through the entire list every time you want to add an element. It is also possible that you will forget to check somewhere in your code.
Another way is to use a HashSet: when I try to add a new item, it checks by itself whether the item exists; if not, it adds the item, and if so, it does nothing.
This is the preferred way. It is best to use the standard library to enforce the constraints that you want.
But I believe a HashSet is less efficient and needs more resources than a list, so I don't know whether avoiding duplicates is a good use of a HashSet.
The efficiency depends on what you are trying to do; see https://stackoverflow.com/a/23949528/1256041.
Is there any other alternative?
You could implement your own ISet<T> on top of a List<T>. This would make insertion much slower (you would need to iterate over the whole collection), but you would gain O(1) random access.
A HashSet is the best way to check whether an item exists, because the lookup is O(1).
So you can insert the items into both a list and a HashSet,
and before inserting a new item, check whether it already exists in the HashSet.
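A minimal sketch of that list-plus-set idea follows. The class name UniqueList and its members are hypothetical, chosen only for this example. It stores the items themselves in the HashSet (rather than only their hash codes), so two distinct items whose hash codes happen to collide are not mistaken for duplicates.

```csharp
using System.Collections.Generic;

// Hypothetical helper: an insertion-ordered list that silently ignores duplicates.
public class UniqueList<T>
{
    private readonly List<T> items = new List<T>();      // keeps insertion order
    private readonly HashSet<T> seen = new HashSet<T>(); // O(1) membership checks

    // Returns true if the item was added, false if it was already present.
    public bool Add(T item)
    {
        if (!seen.Add(item)) // HashSet<T>.Add returns false for an existing item
            return false;
        items.Add(item);
        return true;
    }

    public bool Contains(T item) => seen.Contains(item);

    public IReadOnlyList<T> Items => items;
}
```

The cost is that every item is referenced twice, which is the memory trade-off the question is worried about; whether that matters depends on the size of the collection.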
I need to provide access to a Key/Value pair store that persists for all users across session.
I could easily create a singleton for this, but for performance reasons I want to limit the size of the dictionary to 10000 items (or any performant number, as the object will persist indefinitely)
Is there a form of dictionary where I can specify a limit to the number of objects stored, and when that limit is exceeded, remove the oldest entry?
There is no such built-in dictionary, but you can build your own. You will need a queue for keys, which lets you quickly find the oldest entry and remove it. You will also need a plain dictionary for the values, which lets you look them up quickly:
public class SuperDictionary<TKey, TValue>
{
private Dictionary<TKey, TValue> dictionary;
private Queue<TKey> keys;
private int capacity;
public SuperDictionary(int capacity)
{
this.keys = new Queue<TKey>(capacity);
this.capacity = capacity;
this.dictionary = new Dictionary<TKey, TValue>(capacity);
}
public void Add(TKey key, TValue value)
{
if (dictionary.Count == capacity)
{
var oldestKey = keys.Dequeue();
dictionary.Remove(oldestKey);
}
dictionary.Add(key, value);
keys.Enqueue(key);
}
public TValue this[TKey key]
{
get { return dictionary[key]; }
}
}
NOTE: You can implement IDictionary<TKey,TValue> interface, to make this class a 'true' dictionary.
Use the Cache, rather than Session. It's not user specific, and you can set the maximum size of the cache. When new items are added and the cache is full, it'll remove items to make space. It allows for sophisticated aging mechanisms, such as items being removed after a fixed period of time, a fixed period of time after their last use, priorities (to be taken into consideration when deciding what to remove), etc.
No, there is no built-in dictionary that does this. In fact, all of the generic collections are unbounded.
However, you could easily make a Queue<KeyValuePair<string, int>> and a function that checks the count and dequeues when an element is added and the queue is too long. Dictionary is a difficult choice here because there is no way to determine "age" (unless you make it part of the key or value).
Something like:
private readonly Queue<KeyValuePair<string, int>> queue = new Queue<KeyValuePair<string, int>>();
public void AddDataToDictionary(string key, int value)
{
// Drop the oldest entry once the limit is reached.
if (queue.Count >= 10000)
queue.Dequeue();
queue.Enqueue(new KeyValuePair<string, int>(key, value));
}
Here's a dictionary implementation that has the following removal strategies:
EmptyRemovalStrategy<TKey> – Removes the first item in its internal collection. Does not track access in any way.
MruRemovalStrategy<TKey> – Removes the most recently used (most accessed) item in the CacheDictionary.
LruRemovalStrategy<TKey> – Removes the least recently used (least accessed) item in the CacheDictionary.
The CacheDictionary is a dictionary with a limited number of items. So you'd be able to specify a max size of 1000. With this implementation you would also be able to determine the "age" of an entry and remove the least used (hence a cache)
http://alookonthecode.blogspot.com/2012/03/implementing-cachedictionarya.html
Is there a way to remove an entry from a Dictionary (by Key) AND retrieve its Value in the same step?
For example, I'm calling
Dictionary.Remove(Key);
but I also want it to return the Value at the same time. The function only returns a bool.
I know I can do something like
Value = Dictionary[Key];
Dictionary.Remove(Key);
but it seems like this will search the dictionary twice (once to get the value, and another time to remove it from the dictionary). How can I (if possible) do both WITHOUT searching the dictionary twice?
Starting with .NET Core 2.0, we have:
public bool Remove (TKey key, out TValue value);
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2.remove?view=netcore-2.0#System_Collections_Generic_Dictionary_2_Remove__0__1__
Note this API hasn't been included in .NET Standard 2.0 and .NET Framework 4.7.
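As a small sketch of how that overload can be used (assuming a target framework that has it, such as .NET Core 2.0 or later; the helper class RemoveDemo and its Take method are hypothetical names for this example):

```csharp
using System.Collections.Generic;

public static class RemoveDemo
{
    // Removes the entry for key and returns its value in a single lookup,
    // or returns fallback if the key is not present.
    public static TValue Take<TKey, TValue>(Dictionary<TKey, TValue> dict, TKey key, TValue fallback)
    {
        return dict.Remove(key, out TValue value) ? value : fallback;
    }
}
```

The bool result tells you whether the key was found, so the out value only needs to be trusted when Remove returns true.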
Because both have the missing method, I tried Microsoft's ConcurrentDictionary and C5 from University of Copenhagen http://www.itu.dk/research/c5/, and I can tell that, at least in my use case, they were very slow (5x - 10x slower) compared to Dictionary.
I think C5 is sorting both keys and values all the time and ConcurrentDictionary is "too worried" about the calling thread. I am not here to discuss why those two incarnations of Dictionary are slow.
My algorithm was looking up and replacing entries, where the oldest keys would be removed and new keys would be added (a sort of queue).
The only thing left to do was to modify the original .NET mscorlib Dictionary. I downloaded the source code from Microsoft and included the Dictionary class in my source code. To compile it I also needed to bring along the HashHelpers class and the ThrowHelper class. All that was left was to comment out some lines (e.g. [DebuggerTypeProxy(typeof(Mscorlib_DictionaryDebugView<,>))] and some resource fetching). Obviously I had to add the missing method to the copied class. Also, do not try to compile the whole Microsoft source code; you will be doing that for hours. I was lucky enough to get it going.
public bool Remove(TKey key, out TValue value)
{
if (key == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
if (buckets != null)
{
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int bucket = hashCode % buckets.Length;
int last = -1;
for (int i = buckets[bucket]; i >= 0; last = i, i = entries[i].next)
{
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
{
if (last < 0)
{
buckets[bucket] = entries[i].next;
}
else
{
entries[last].next = entries[i].next;
}
entries[i].hashCode = -1;
entries[i].next = freeList;
entries[i].key = default(TKey);
value = entries[i].value;
entries[i].value = default(TValue);
freeList = i;
freeCount++;
version++;
return true;
}
}
}
value = default(TValue);
return false;
}
Lastly I modified the namespace to System.Collection.Generic.My
In my algorithm I only had two places where I was getting the value and then removing it in the next line. I replaced those with the new method and obtained a steady performance gain of 7%-10%.
Hope it helps this use case and any other cases where re-implementing Dictionary from scratch is just not what one should do.
Even though this is not what the OP has asked for, I could not help myself but post a corrected extension method:
public static bool Remove<TKey, TValue>(this Dictionary<TKey, TValue> self, TKey key, out TValue target)
{
self.TryGetValue(key, out target);
return self.Remove(key);
}
ConcurrentDictionary has a TryRemove method that attempts to remove and return the value that has the specified key from the System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>.
It returns false, and sets the out value to the default value of the TValue type, if the key does not exist.
https://msdn.microsoft.com/en-us/library/dd287129(v=vs.110).aspx
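A short sketch of TryRemove in use; the helper class TryRemoveDemo and its Describe method are hypothetical names for this illustration:

```csharp
using System.Collections.Concurrent;

public static class TryRemoveDemo
{
    // Removes the entry if present and reports what happened.
    public static string Describe(ConcurrentDictionary<int, string> dict, int key)
    {
        return dict.TryRemove(key, out string value) ? "removed " + value : "missing";
    }
}
```

Whether the thread-safety overhead is acceptable for a single-threaded workload is something to measure for your own use case.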
You can do it with an Extension method:
// Extension methods must be declared inside a static class.
public static TValue GetValueAndRemove<TKey, TValue>(this Dictionary<TKey, TValue> dict, TKey key)
{
TValue val = dict[key];
dict.Remove(key);
return val;
}
static void Main(string[] args)
{
Dictionary<int, string> a = new Dictionary<int, string>();
a.Add(1, "sdfg");
a.Add(2, "sdsdfgadfhfg");
string value = a.GetValueAndRemove<int, string>(1);
}
You can extend the class to add that functionality:
public class PoppableDictionary<T, V> : Dictionary<T, V>
{
public V Pop(T key)
{
V value = this[key];
this.Remove(key);
return value;
}
}
I've got a class Foo with a property Id. My goal is that there are no two instances of Foo with the same Id at the same time.
So I created a factory method CreateFoo which uses a cache in order to return the same instance for the same Id.
static Foo CreateFoo(int id) {
Foo foo;
if (!cache.TryGetValue(id, out foo)) {
foo = new Foo(id);
foo.Initialize(...);
cache.Put(id, foo);
}
return foo;
}
The cache is implemented as a Dictionary<TKey,WeakReference>, based on @JaredPar's Building a WeakReference Hashtable:
class WeakDictionary<TKey, TValue> where TValue : class {
private readonly Dictionary<TKey, WeakReference> items;
public WeakDictionary() {
this.items = new Dictionary<TKey, WeakReference>();
}
public void Put(TKey key, TValue value) {
this.items[key] = new WeakReference(value);
}
public bool TryGetValue(TKey key, out TValue value) {
WeakReference weakRef;
if (!this.items.TryGetValue(key, out weakRef)) {
value = null;
return false;
} else {
value = (TValue)weakRef.Target;
return (value != null);
}
}
}
The problem is that the WeakReferences remain in the dictionary after their targets have been garbage collected. This implies the need for some strategy to manually "garbage collect" dead WeakReferences, as explained by @Pascal Cuoq in What happens to a WeakReference after GC of WeakReference.Target.
My question is: What's the best strategy to compact a WeakReference Dictionary?
The options that I see are:
1. Don't remove WeakReferences from the Dictionary. IMO this is bad, because the cache is used for the full lifetime of my application, and a lot of dead WeakReferences will accumulate over time.
2. Walk the entire dictionary on each Put and TryGetValue, and remove dead WeakReferences. This somewhat defeats the purpose of a dictionary because both operations become O(n).
3. Walk the entire dictionary periodically in a background thread. What would be a good interval, given that I don't know the usage pattern of CreateFoo?
4. Append each inserted KeyValuePair to a double-ended linked list. Each call to Put and TryGetValue examines the head of the list. If the WeakReference is alive, move the pair to the end of the list. If it is dead, remove the pair from the list and the WeakReference from the Dictionary.
5. Implement a custom hash table with the minor difference that, when a bucket is full, dead WeakReferences are first removed from the bucket before proceeding as usual.
Are there other strategies?
The best strategy is probably an algorithm with amortized time complexity. Does such a strategy exist?
If you can switch the managed object to be the key of the dictionary, then you can use .Net 4.0's ConditionalWeakTable (namespace System.Runtime.CompilerServices).
According to Mr. Richter, ConditionalWeakTable is notified of object collection by the garbage collector rather than using a polling thread.
static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();
void Window_Loaded(object sender, RoutedEventArgs e)
{
...
dataGrid.SelectionChanged += (_sender, _e) =>
{
var cs = dataGrid.SelectedItem as ClientSession;
this.tabControl.Items.Clear();
foreach (var tid in cs.GetThreadIDs())
{
tid.tabItem = new TabItem() { Header = ... };
tid.tabItem.AddHandler(UIElement.MouseDownEvent,
new MouseButtonEventHandler((__sender, __e) =>
{
tabControl_SelectionChanged(tid.tabItem);
}), true);
tidByTab.Add(tid.tabItem, tid);
this.tabControl.Items.Add(tid.tabItem);
}
};
}
void tabControl_SelectionChanged(TabItem tabItem)
{
this.tabControl.SelectedItem = tabItem;
if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out tidExec))
{
tidExec.EnsureBlocksLoaded();
ShowStmt(tidExec.CurrentStmt);
}
else
throw new Exception("huh?");
}
What's important here is that the only things referencing the TabItem object are the tabControl.Items collection and the key of the ConditionalWeakTable, and the key of a ConditionalWeakTable does not count as a reference. So when we clear all the items from the tabControl, those TabItems can be garbage-collected. When they are garbage-collected, the ConditionalWeakTable is notified and the entry with that key is removed. So my bulky TIDExec objects also become collectable at that point (nothing references them except the value slot of the ConditionalWeakTable).
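Stripped of the WPF details, the same pattern can be sketched as follows. The Tag and ObjectTags types are hypothetical names made up for this example; only ConditionalWeakTable and its Add and TryGetValue methods come from the framework.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical payload type attached to arbitrary objects.
public sealed class Tag
{
    public string Label { get; }
    public Tag(string label) { Label = label; }
}

public static class ObjectTags
{
    // Keys are held weakly: an entry does not keep its key object alive.
    private static readonly ConditionalWeakTable<object, Tag> tags =
        new ConditionalWeakTable<object, Tag>();

    // Throws ArgumentException if the owner already has a tag.
    public static void Attach(object owner, string label)
    {
        tags.Add(owner, new Tag(label));
    }

    // Returns the label attached to owner, or null if none was attached.
    public static string Find(object owner)
    {
        return tags.TryGetValue(owner, out Tag tag) ? tag.Label : null;
    }
}
```

Note that this only fits the original problem if the object being cached can serve as the key, because keys are compared by reference rather than by an int id.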
Your Option 3 (a thread) has the big disadvantage of making synchronization necessary on all Put/TryGetValue actions. If you do use it, measure your interval not in milliseconds but in every N TryGet actions.
Option 2, scanning the Dictionary, would incur a serious overhead. You can improve on it by scanning only once every 1000 actions and/or by watching how often the GC has run.
But I would seriously consider option 1: do nothing. You may have "a lot" of dead entries, but on the other hand they are pretty small (and get recycled). It is probably not an option for a server app, but for a client application I would try to measure how many entries (kBytes) per hour we are talking about.
After some discussion:
Does such a[n amortized] strategy exist?
I would guess no. Your problem is a miniature version of the GC. You will have to scan the whole thing once in a while. So only options 2) and 3) provide a real solution. And they are both expensive but they can be (heavily) optimized with some heuristics. Option 2) would still give you the occasional worst-case though.
Approach #5 is interesting, but has the disadvantage that it could be difficult to know what the real level of hash-table utilization is, and consequently when the hash table should be expanded. That difficulty might be overcome if, whenever it "seems" like the hash table should be expanded, one first does a whole-table scan to remove dead entries. If more than half of the entries in the table were dead, don't bother expanding it. Such an approach should yield amortized O(1) behavior, since one wouldn't do the whole-table scan until one had added back as many entries as had been deleted.
A simpler approach, which would also yield O(1) amortized time and O(1) space per recently-live element would be to keep a count of how many items were alive after the last time the table was purged, and how many elements have been added since then. Whenever the latter count exceeds the first, do a whole-table scan-and-purge. The time required for a scan and purge will be proportional to the number of elements added between purges, thus retaining amortized O(1) time, and the number of total elements in the collection will not exceed twice the number of elements that were recently observed to be alive, so the number of dead elements cannot exceed twice the number of recently-live elements.
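The counting scheme from the previous paragraph could be sketched like this. PurgingWeakDictionary and its field names are hypothetical and chosen for this illustration; the class is not thread-safe, and it uses the generic WeakReference<T> rather than the non-generic WeakReference from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: purge dead entries once more entries have been added
// since the last purge than were alive after it.
public class PurgingWeakDictionary<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference<TValue>> items =
        new Dictionary<TKey, WeakReference<TValue>>();
    private int liveAfterLastPurge;   // entries that survived the last purge
    private int addedSinceLastPurge;  // new keys inserted since then

    public int Count => items.Count;

    public void Put(TKey key, TValue value)
    {
        if (!items.ContainsKey(key))
            addedSinceLastPurge++;
        items[key] = new WeakReference<TValue>(value);

        // The O(n) scan is paid for by the insertions made since the last purge.
        if (addedSinceLastPurge > liveAfterLastPurge)
            Purge();
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        value = null;
        return items.TryGetValue(key, out var weakRef) && weakRef.TryGetTarget(out value);
    }

    private void Purge()
    {
        // Collect dead keys first; the dictionary cannot be modified while enumerating.
        var deadKeys = items.Where(p => !p.Value.TryGetTarget(out _))
                            .Select(p => p.Key)
                            .ToList();
        foreach (var key in deadKeys)
            items.Remove(key);

        liveAfterLastPurge = items.Count;
        addedSinceLastPurge = 0;
    }
}
```

Because a purge only runs after the number of new keys exceeds the previous live count, the interval between scans roughly doubles as the live set grows, which is where the amortized O(1) bound comes from.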
I had this same problem, and solved it like this (WeakDictionary is the class I was trying to clean up):
internal class CleanerRef
{
~CleanerRef()
{
if (handle.IsAllocated)
handle.Free();
}
public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
{
handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
Dictionary = dictionary;
}
public bool IsAlive
{
get {return handle.IsAllocated && handle.Target != null;}
}
public object Target
{
get {return IsAlive ? handle.Target : null;}
}
GCHandle handle;
public WeakDictionary Dictionary;
}
internal class WeakDictionaryCleaner
{
public WeakDictionaryCleaner(WeakDictionary dict)
{
refs.Add(new CleanerRef(this, dict));
}
~WeakDictionaryCleaner()
{
foreach(var cleanerRef in refs)
{
if (cleanerRef.Target == this)
{
cleanerRef.Dictionary.ClearGcedEntries();
refs.Remove(cleanerRef);
break;
}
}
}
private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}
What these two classes try to achieve is to "hook" the GC. You activate this mechanism by creating an instance of WeakDictionaryCleaner during the construction of the weak collection:
new WeakDictionaryCleaner(weakDictionary);
Notice that I don't create any reference to the new instance, so that the GC will dispose it during the next cycle. In the ClearGcedEntries() method I create a new instance again, so that each GC cycle will have a cleaner to finalize that in turn will execute the collection compaction.
You can make the CleanerRef.Dictionary also a weak reference so that it won't hold the dictionary in memory.
Hope this helps
I guess this is the right place to put it, even though it might look like necromancy, just in case someone stumbles upon this question like I did. The lack of a dedicated Identity Map in .NET is somewhat surprising, and I feel the most natural way for it to work is as described in the last option: when the table is full and about to double its capacity, it checks whether there are enough dead entries that can be recycled so that growing is not necessary.
static IdentityMap<int, Entity> Cache = new IdentityMap<int, Entity>(e => e.ID);
...
var entity = Cache.Get(id, () => LoadEntity(id));
The class exposes just one public method Get with key and optional value parameter that lazily loads and caches an entity if it is not in the cache.
using System;
class IdentityMap<TKey, TValue>
where TKey : IEquatable<TKey>
where TValue : class
{
Func<TValue, TKey> key_selector;
WeakReference<TValue>[] references;
int[] buckets;
int[] bucket_indexes;
int tail_index;
int entries_count;
int capacity;
public IdentityMap(Func<TValue, TKey> key_selector, int capacity = 10) {
this.key_selector = key_selector;
Init(capacity);
}
void Init(int capacity) {
this.bucket_indexes = new int[capacity];
this.buckets = new int[capacity];
this.references = new WeakReference<TValue>[capacity];
for (int i = 0; i < capacity; i++) {
bucket_indexes[i] = -1;
buckets[i] = i - 1;
}
this.tail_index = capacity - 1;
this.entries_count = 0;
this.capacity = capacity;
}
public TValue Get(TKey key, Func<TValue> value = null) {
int bucket_index = Math.Abs(key.GetHashCode() % this.capacity);
var ret = WalkBucket(bucket_index, true, key);
if (ret == null && value != null) Add(bucket_index, ret = value());
return ret;
}
void Add(int bucket_index, TValue value) {
if (this.entries_count == this.capacity) {
for (int i = 0; i < capacity; i++) WalkBucket(i, false, default(TKey));
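if (this.entries_count * 2 > this.capacity) {
var old_references = references;
Init(this.capacity * 2);
foreach (var old_reference in old_references) {
TValue old_value;
if (old_reference.TryGetTarget(out old_value)) {
// Rehash each surviving entry by its own key, not the key of the value being added.
int hash = key_selector(old_value).GetHashCode();
Add(Math.Abs(hash % this.capacity), old_value);
}
}
// The table was resized, so the bucket index computed by the caller is stale.
bucket_index = Math.Abs(key_selector(value).GetHashCode() % this.capacity);
}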
}
int new_index = this.tail_index;
this.tail_index = buckets[this.tail_index];
this.entries_count += 1;
buckets[new_index] = bucket_indexes[bucket_index];
if (references[new_index] != null) references[new_index].SetTarget(value);
else references[new_index] = new WeakReference<TValue>(value);
bucket_indexes[bucket_index] = new_index;
}
TValue WalkBucket(int bucket_index, bool is_searching, TKey key) {
int curr_index = bucket_indexes[bucket_index];
int prev_index = -1;
while (curr_index != -1) {
TValue value;
int next_index = buckets[curr_index];
if (references[curr_index].TryGetTarget(out value)) {
if (is_searching && key_selector(value).Equals(key)) return value;
prev_index = curr_index;
} else {
if (prev_index != -1) buckets[prev_index] = next_index;
else bucket_indexes[bucket_index] = next_index;
buckets[curr_index] = this.tail_index;
this.tail_index = curr_index;
this.entries_count -= 1;
}
curr_index = next_index;
}
return null;
}
}
You could remove the "invalid" WeakReference inside TryGetValue:
[Edit] My mistake, these solutions actually do nothing more than what you suggested, since Put method will swap the old object with the new one anyway. Just ignore it.
public bool TryGetValue(TKey key, out TValue value) {
WeakReference weakRef;
if (!this.items.TryGetValue(key, out weakRef)) {
value = null;
return false;
} else {
value = (TValue)weakRef.Target;
if (value == null)
this.items.Remove(key);
return (value != null);
}
}
Or, you can immediately create a new instance inside your dictionary whenever it is needed:
public TValue GetOrCreate(TKey key, Func<TKey, TValue> ctor) {
WeakReference weakRef;
TValue value = null;
if (this.items.TryGetValue(key, out weakRef))
value = (TValue)weakRef.Target;
if (value == null) {
// Missing or already collected: create a fresh instance and cache it.
value = ctor(key);
this.Put(key, value);
}
return value;
}
You would then use it like this:
static Foo CreateFoo(int id)
{
return cache.GetOrCreate(id, key => new Foo(key));
}
[Edit]
According to windbg, WeakReference instance alone occupies 16 bytes. For 100,000 collected objects, this would not be such a serious burden, so you could easily let them live.
If this is a server app and you believe you could benefit from collecting, I would consider going for a background thread, but also implementing a simple algorithm to increase waiting time whenever you collect a relatively small number of objects.
A little specialization: when the target classes know the weak dictionary reference and their TKey value, you can remove the entry from a finalizer call.
public class Entry<TKey>
{
TKey key;
Dictionary<TKey, WeakReference> weakDictionary;
public Entry(Dictionary<TKey, WeakReference> weakDictionary, TKey key)
{
this.key = key;
this.weakDictionary = weakDictionary;
}
~Entry()
{
weakDictionary.Remove(key);
}
}
When cached objects are subclasses of Entry<TKey>, no empty WeakReference leaks, since the finalizer runs once the instance has been found unreachable by the garbage collector.