Remove from Dictionary by Key and Retrieve Value - c#

Is there a way to remove an entry from a Dictionary (by Key) AND retrieve its Value in the same step?
For example, I'm calling
Dictionary.Remove(Key);
but I also want it to return the Value at the same time. The function only returns a bool.
I know I can do something like
Value = Dictionary[Key];
Dictionary.Remove(Key);
but it seems like this will search the dictionary twice (once to get the value, and another time to remove it from the dictionary). How can I (if possible) do both WITHOUT searching the dictionary twice?

Starting with .NET Core 2.0, we have:
public bool Remove (TKey key, out TValue value);
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2.remove?view=netcore-2.0#System_Collections_Generic_Dictionary_2_Remove__0__1__
Note this API hasn't been included in .NET Standard 2.0 and .NET Framework 4.7.
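As a minimal sketch of how that overload can be used (assuming a runtime that ships it, such as .NET Core 2.0 or later; the class and method names below are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class RemoveWithValueDemo
{
    // Removes `key` and returns its value in one dictionary operation,
    // or returns `fallback` when the key is absent.
    // Relies on Dictionary<TKey, TValue>.Remove(TKey, out TValue).
    public static TValue RemoveOrDefault<TKey, TValue>(
        Dictionary<TKey, TValue> dict, TKey key, TValue fallback)
    {
        return dict.Remove(key, out TValue value) ? value : fallback;
    }
}
```

For example, RemoveWithValueDemo.RemoveOrDefault(stock, "apple", -1) would return the stored count for "apple" and delete the entry, or return -1 if there was no such key.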

Because they both have the missing method, I tried Microsoft's ConcurrentDictionary and C5 from University of Copenhagen (http://www.itu.dk/research/c5/). In my use case, at least, both were very slow (5x to 10x slower) compared to Dictionary.
I think C5 is sorting both keys and values all the time, and ConcurrentDictionary is "too worried" about the calling thread. I am not here to discuss why those two incarnations of Dictionary are slow.
My algorithm was seeking and replacing some entries, whereby the first keys would be removed and new keys would be added (some sort of queue).
The only thing left to do was to modify the original Dictionary from .NET's mscorlib. I downloaded the source code from Microsoft and included the Dictionary class in my own source code. To compile it I also needed to bring along the HashHelpers and ThrowHelper classes. All that was left was to comment out some lines (e.g. [DebuggerTypeProxy(typeof(Mscorlib_DictionaryDebugView<,>))] and some resource fetching) and, obviously, to add the missing method to the copied class. Do not try to compile the whole Microsoft source code, though; you will be at it for hours. I was lucky enough to get it going.
public bool Remove(TKey key, out TValue value)
{
if (key == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
if (buckets != null)
{
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int bucket = hashCode % buckets.Length;
int last = -1;
for (int i = buckets[bucket]; i >= 0; last = i, i = entries[i].next)
{
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
{
if (last < 0)
{
buckets[bucket] = entries[i].next;
}
else
{
entries[last].next = entries[i].next;
}
entries[i].hashCode = -1;
entries[i].next = freeList;
entries[i].key = default(TKey);
value = entries[i].value;
entries[i].value = default(TValue);
freeList = i;
freeCount++;
version++;
return true;
}
}
}
value = default(TValue);
return false;
}
Lastly, I changed the namespace to System.Collection.Generic.My.
In my algorithm there were only two places where I was getting the value and then removing it on the next line. I replaced those with the new method and obtained a steady performance gain of 7%-10%.
Hope it helps this use case and any other cases where re-implementing Dictionary from scratch is just not what one should do.

Even though this is not what the OP has asked for, I could not help myself but post a corrected extension method:
public static bool Remove<TKey, TValue>(this Dictionary<TKey, TValue> self, TKey key, out TValue target)
{
self.TryGetValue(key, out target);
return self.Remove(key);
}

ConcurrentDictionary<TKey, TValue> (in System.Collections.Concurrent) has a TryRemove method that attempts to remove the entry with the specified key and return its value.
If the key does not exist, it returns false and the out value is set to the default value of the TValue type.
https://msdn.microsoft.com/en-us/library/dd287129(v=vs.110).aspx
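A short sketch of TryRemove in use; the helper name Take is hypothetical and only wraps the call:

```csharp
using System;
using System.Collections.Concurrent;

public static class TryRemoveDemo
{
    // Removes `key` from the map and returns its value,
    // or returns null when the key was not present.
    public static string Take(ConcurrentDictionary<int, string> map, int key)
    {
        return map.TryRemove(key, out string value) ? value : null;
    }
}
```

Checking the bool result (rather than the out value) is what distinguishes "key missing" from "key present with a default value".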

You can do it with an Extension method:
// Must be declared inside a static class. Throws KeyNotFoundException if the key is missing.
public static TValue GetValueAndRemove<TKey, TValue>(this Dictionary<TKey, TValue> dict, TKey key)
{
    TValue val = dict[key];
    dict.Remove(key);
    return val;
}
static void Main(string[] args)
{
Dictionary<int, string> a = new Dictionary<int, string>();
a.Add(1, "sdfg");
a.Add(2, "sdsdfgadfhfg");
string value = a.GetValueAndRemove<int, string>(1);
}

You can extend the class to add that functionality:
public class PoppableDictionary<T, V> : Dictionary<T, V>
{
public V Pop(T key)
{
V value = this[key];
this.Remove(key);
return value;
}
}

Related

Most efficient way to retrieve a KeyValuePair from Dictionary

In my app I have a Dictionary<ContainerControl, int>.
I need to check if a key is present in the dictionary and alter its corresponding value if key is found or add the key if not already present.
The key for my dictionary is a ContainerControl object.
I could use this method:
var dict = new Dictionary<ContainerControl, int>();
/*...*/
var c = GetControl();
if (dict.ContainsKey(c))
{
dict[c] = dict[c] + 1;
}
else
{
dict.Add(c, 0);
}
but I think that this way, if the key is already present, my dictionary is searched three times: once in ContainsKey and twice in the if branch.
I wonder if there is a more efficient way to do this, something like
var dict = new Dictionary<ContainerControl, int>();
/*...*/
var c = GetControl();
var kvp = dict.GetKeyValuePair(c); /* there is no such function in Dictionary */
if (kvp != null)
{
kvp.Value++;
}
else
{
dict.Add(c, 0);
}
This is possible using linq:
var kvp = dict.SingleOrDefault(x => x.Key == c);
but what about performance?
As noted in comments, finding a key in a dictionary doesn't mean iterating over the whole dictionary. But in some cases it's still worth trying to reduce the lookups.
KeyValuePair<,> is a struct anyway, so if GetKeyValuePair did exist, your kvp.Value++ wouldn't compile (as Value is read-only) and wouldn't work even if it did (as the pair wouldn't be the "original" in the dictionary).
You can use TryGetValue to reduce this to a single "read" operation and a single "write" operation:
// value will be 0 if TryGetValue returns false
if (dict.TryGetValue(c, out var value))
{
value++;
}
dict[c] = value;
Or change to ConcurrentDictionary and use AddOrUpdate to perform the change in a single call.
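A rough sketch of the AddOrUpdate variant, keeping the question's semantics (insert 0 for a new key, otherwise increment). The wrapper name Bump is an assumption for illustration:

```csharp
using System;
using System.Collections.Concurrent;

public static class CounterDemo
{
    // Adds `key` with value 0 if it is missing, otherwise increments the stored value.
    // Returns the value that ends up in the dictionary.
    public static int Bump<TKey>(ConcurrentDictionary<TKey, int> counts, TKey key)
    {
        return counts.AddOrUpdate(key, 0, (k, old) => old + 1);
    }
}
```

Note that ConcurrentDictionary pays for thread safety, so this is only likely to be a win when the dictionary is shared between threads anyway.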
You could also store a reference type in the dict. This means an extra allocation when you insert an item, but you can mutate items without another dictionary access. You'll need a profiler to tell you whether this is a net improvement!
class IntBox
{
public int Value { get; set; }
}
if (dict.TryGetValue(c, out var box))
{
box.Value++;
}
else
{
dict[c] = new IntBox();
}
With .NET 6 you can use CollectionsMarshal.GetValueRefOrAddDefault for a single lookup:
ref int value = ref CollectionsMarshal.GetValueRefOrAddDefault(dict, c, out bool exists);
if(exists) value++; // changes the value in the dictionary even if it's a value type
Demo: https://dotnetfiddle.net/tnW9P5

How to index the Values property of C# Dictionary

Using the Values property of C# Dictionary,
var myDict = new Dictionary<string, object>();
How would I get the values in
myDict.Values
I tried
var theValues = myDict.Values;
object obj = theValues[0];
But that is a syntax error.
Added:
I am trying to compare the values in two dictionaries that have
the same keys
You can't. The values do not have a fixed order. You could write the values into a new List<object> and index them there, but obviously that's not terribly helpful if the dictionary's contents change frequently.
You can also use linq: myDict.Values.ElementAt(0) but:
The elements will change position as the dictionary grows
It's really inefficient, since it's just calling foreach on the Values collection for the given number of iterations.
You could also use SortedList<TKey, TValue>. That maintains the values in order according to the key, which may or may not be what you want, and it allows you to access the values by key or by index. It has very unfortunate performance characteristics in certain scenarios, however, so be careful about that!
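A small sketch of the SortedList<TKey, TValue> option, assuming string keys as in the question. The helper names are hypothetical; the Keys and Values properties of SortedList are IList-based and therefore indexable:

```csharp
using System;
using System.Collections.Generic;

public static class SortedListDemo
{
    // Copies a dictionary into a SortedList so entries can be read by position.
    // Positions follow the sort order of the keys, not insertion order.
    public static SortedList<string, object> ToSortedList(Dictionary<string, object> source)
    {
        return new SortedList<string, object>(source);
    }

    // Reads the value at a given position in key order.
    public static object ValueAt(SortedList<string, object> list, int index)
    {
        return list.Values[index];
    }
}
```

The copy is a snapshot: later changes to the original dictionary are not reflected in the sorted list.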
Here's a linq solution to determine if the values for matching keys also match. This only works if you're using the default equality comparer for the key type. If you're using a custom equality comparer, you can do this with method call syntax.
IEnumerable<bool> matches =
    from pair1 in dict1
    join pair2 in dict2
    on pair1.Key equals pair2.Key
    select pair1.Value.Equals(pair2.Value);
bool allValuesMatch = matches.All(match => match);
If you require that all items in one dictionary have a matching item in the other, you could do this:
bool allKeysMatch = new HashSet<string>(dict1.Keys).SetEquals(dict2.Keys);
bool dictionariesMatch = allKeysMatch && allValuesMatch;
Well, you could use Enumerable.ElementAt if you really had to, but you shouldn't expect the order to be stable or meaningful. Alternatively, call ToArray or ToList to take a copy.
Usually you only use Values if you're going to iterate over them. What exactly are you trying to do here? Do you understand that the order of entries in a Dictionary<,> is undefined?
EDIT: It sounds like you want something like:
var equal = dict1.Count == dict2.Count &&
dict1.Keys.All(key => ValuesEqual(key, dict1, dict2));
...
private static bool ValuesEqual<TKey, TValue>(TKey key,
IDictionary<TKey, TValue> dict1,
IDictionary<TKey, TValue> dict2)
{
TValue value1, value2;
return dict1.TryGetValue(key, out value1) && dict2.TryGetValue(key, out value2) &&
EqualityComparer<TValue>.Default.Equals(value1, value2);
}
EDIT: Note that this isn't as fast as it could be, because it performs lookups on both dictionaries. This would be more efficient, but less elegant IMO:
var equal = dict1.Count == dict2.Count &&
dict1.All(pair => ValuesEqual(pair.Key, pair.Value, dict2));
...
private static bool ValuesEqual<TKey, TValue>(TKey key, TValue value1,
IDictionary<TKey, TValue> dict2)
{
TValue value2;
return dict2.TryGetValue(key, out value2) &&
EqualityComparer<TValue>.Default.Equals(value1, value2);
}
To add to @JonSkeet's answer, Dictionary<TKey, TValue> is backed by a hash table, which is an unordered data structure. The index of a value is therefore meaningless: it is perfectly valid to get, say, A,B,C with one call and C,B,A with the next.
EDIT:
Based on the comment you made on JS's answer ("I am trying to compare the values in two dictionaries with the same keys"), you want something like this:
public bool DictionariesContainSameKeysAndValues<TKey, TValue>(Dictionary<TKey, TValue> dict1, Dictionary<TKey, TValue> dict2) {
    if (dict1.Count != dict2.Count) return false;
    foreach (var key1 in dict1.Keys)
        if (!dict2.ContainsKey(key1) || !dict2[key1].Equals(dict1[key1]))
            return false;
    return true;
}
You could use an indexer property to look up the value by its string key.
It is still not an index, but it is one more way:
using System.Collections.Generic;
...
class Client
{
private Dictionary<string, yourObject> yourDict
= new Dictionary<string, yourObject>();
public void Add (string id, yourObject value)
{ yourDict.Add (id, value); }
public yourObject this [string id] // indexer
{
get { return yourDict[id]; }
set { yourDict[id] = value; }
}
}
public class Test
{
public static void Main( )
{
Client client = new Client();
client.Add("A1", new yourObject() { Name = "Bla", ... });
Console.WriteLine ("Your result: " + client["A1"]); // indexer access
}
}

Retrieving the key of a value from a hash table c#

I have a hash table that contains values of a^j. j is the key and a^j is the value.
I am now calculating another value a^m. I basically want to see if a^m is in the hash table.
I used the ContainsValue fn. to find the value. How would i go about finding out the key of the value?
Here is a little snippet of where i want to implement the search for the value.
Dictionary<BigInteger, BigInteger> b = new Dictionary<BigInteger, BigInteger>();
***add a bunch of BigIntegers into b***
for(int j=0; j < n; j++)
{
z = q* BigInteger.ModPow(temp,j,mod);
***I want to implement to search for z in b here****
}
Does this change anything? the fact that i am searching while inside a for loop?
The fastest way is probably to iterate through the hashtable's DictionaryEntry items to find the value, which in turn gives you the key. I don't see how else to do it.
Firstly, you should absolutely be using Dictionary<TKey, TValue> instead of Hashtable - if you're using BigInteger from .NET 4, there's no reason not to use generic collections everywhere you can. Chances are for the most part you'd see no difference in how it's used - just create it with:
Dictionary<BigInteger, BigInteger> map =
new Dictionary<BigInteger, BigInteger>();
to start with. One thing to watch out for is that the indexer will throw an exception if the key isn't present in the map - use TryGetValue to fetch the value if it exists and a bool to say whether or not it did exist.
As for finding the key by value - there's no way to do that efficiently from a Dictionary. You can search all the entries, which is most easily done with LINQ:
var key = map.Where(pair => pair.Value == value)
.Select(pair => pair.Key)
.First();
but that will iterate over the whole dictionary until it finds a match, so it's an O(n) operation.
If you want to do this efficiently, you should keep two dictionaries - one from j to a^j and one from a^j to j. When you add an entry, add it both ways round. Somewhere on Stack Overflow I've got some sample code of a class which does this for you, but I doubt I'd be able to find it easily. EDIT: There's one which copes with multiple mappings here; the "single mapping" version is in the answer beneath that one.
Anyway, once you've got two dictionaries, one in each direction, it's easy - obviously you'd just lookup a^m as a key in the second dictionary to find the original value which created it.
Note that you'll need to consider whether it's possible for two original keys to end up with the same value - at that point you obviously wouldn't be able to have both mappings in one reverse dictionary (unless it was a Dictionary<BigInteger, List<BigInteger>> or something similar).
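The two-dictionary idea can be sketched roughly as follows. This is not the sample class mentioned above; TwoWayMap and its member names are made up for illustration, and it assumes a strict one-to-one mapping (duplicates are rejected):

```csharp
using System;
using System.Collections.Generic;

public sealed class TwoWayMap<TFirst, TSecond>
{
    private readonly Dictionary<TFirst, TSecond> forward = new Dictionary<TFirst, TSecond>();
    private readonly Dictionary<TSecond, TFirst> reverse = new Dictionary<TSecond, TFirst>();

    // Adds the pair in both directions; rejects duplicates on either side
    // so the two dictionaries can never disagree.
    public void Add(TFirst first, TSecond second)
    {
        if (forward.ContainsKey(first) || reverse.ContainsKey(second))
            throw new ArgumentException("Duplicate key or value.");
        forward.Add(first, second);
        reverse.Add(second, first);
    }

    // O(1) lookup from the original key (e.g. j) to its value (e.g. a^j).
    public bool TryGetByFirst(TFirst first, out TSecond second)
    {
        return forward.TryGetValue(first, out second);
    }

    // O(1) lookup from the value (e.g. a^j) back to its key (e.g. j).
    public bool TryGetBySecond(TSecond second, out TFirst first)
    {
        return reverse.TryGetValue(second, out first);
    }
}
```

Inside the loop from the question, one would call TryGetBySecond(z, out var j) instead of scanning the values. The price is roughly double the memory.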
Edit: Changed to use Dictionary<TKey, TValue>
Dictionary<TKey, TValue> is an IEnumerable<KeyValuePair<TKey, TValue>>. If you do a foreach over it directly, you can get both the key and value for each entry.
class SomeType
{
public int SomeData = 5;
public override string ToString()
{
return SomeData.ToString();
}
}
// ...
var blah = new Dictionary<string, SomeType>();
blah.Add("test", new SomeType() { SomeData = 6 });
foreach (KeyValuePair<string, SomeType> item in blah)
{
if (item.Value.SomeData == 6)
{
Console.WriteLine("Key: {0}, Value: {1}", item.Key, item.Value);
}
}
If you have a newer version of the .Net framework, you could use Linq to find your matches, and place them in their own collection. Here's a code sample showing a little bit of Linq syntax:
using System;
using System.Collections.Generic;
using System.Linq;
class SomeType
{
public int SomeData = 5;
public override string ToString()
{
return SomeData.ToString();
}
}
class Program
{
static void Main(string[] args)
{
var blah = new Dictionary<string, SomeType>();
blah.Add("test", new SomeType() { SomeData = 6 });
// Build an enumeration of just matches:
var entriesThatMatchValue = blah
.Where(e => e.Value.SomeData == 6);
foreach (KeyValuePair<string, SomeType> item in entriesThatMatchValue)
{
Console.WriteLine("Key: {0}, Value: {1}", item.Key, item.Value);
}
// or: ...
// Build a sub-enumeration of just keys from matches:
var keysThatMatchValue = entriesThatMatchValue.Select(e => e.Key);
// Build a list of keys from matches in-line, using method chaining:
List<string> matchingKeys = blah
.Where(e => e.Value.SomeData == 6)
.Select(e => e.Key)
.ToList();
}
}
private object GetKeyByValue(object searchValue)
{
foreach (DictionaryEntry entry in myHashTable)
{
if (entry.Value.Equals(searchValue))
{
return entry.Key;
}
}
return null;
}

How do I use HashSet<T> as a dictionary key?

I wish to use HashSet<T> as the key to a Dictionary:
Dictionary<HashSet<T>, TValue> myDictionary = new Dictionary<HashSet<T>, TValue>();
I want to look up values from the dictionary such that two different instances of HashSet<T> that contain the same items will return the same value.
HashSet<T>'s implementations of Equals() and GetHashCode() don't seem to do this (I think they're just the defaults). I can override Equals() to use SetEquals() but what about GetHashCode()? I feel like I am missing something here...
You could use the set comparer provided by HashSet<T>:
var myDictionary = new Dictionary<HashSet<T>, TValue>(HashSet<T>.CreateSetComparer());
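A quick sketch that checks the behaviour asked for in the question, using int elements as an arbitrary choice. The class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class SetKeyDemo
{
    // Stores a value under one set and looks it up with a different
    // HashSet instance that contains the same elements.
    public static bool LookupByEqualSet()
    {
        var map = new Dictionary<HashSet<int>, string>(HashSet<int>.CreateSetComparer());
        map[new HashSet<int> { 1, 2, 3 }] = "found";

        var sameItemsOtherInstance = new HashSet<int> { 3, 2, 1 };
        return map.TryGetValue(sameItemsOtherInstance, out string value) && value == "found";
    }
}
```

One caveat with any set-valued key: if a set is modified after it has been used as a key, its hash code changes and the entry can no longer be found reliably.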
digEmAll's answer is clearly the better choice in practice, since it uses built in code instead of reinventing the wheel. But I'll leave this as a sample implementation.
You can implement an IEqualityComparer<HashSet<T>> that uses SetEquals, then pass it to the constructor of the Dictionary. Something like the following (untested):
class HashSetEqualityComparer<T>: IEqualityComparer<HashSet<T>>
{
public int GetHashCode(HashSet<T> hashSet)
{
if(hashSet == null)
return 0;
int h = 0x14345843; //some arbitrary number
foreach(T elem in hashSet)
{
h = unchecked(h + hashSet.Comparer.GetHashCode(elem));
}
return h;
}
public bool Equals(HashSet<T> set1, HashSet<T> set2)
{
if(set1 == set2)
return true;
if(set1 == null || set2 == null)
return false;
return set1.SetEquals(set2);
}
}
Note that the hash function here is commutative, that's important because the enumeration order of the elements in the set is undefined.
One other interesting point is that you can't just use elem.GetHashCode since that will give wrong results when a custom equality comparer was supplied to the set.
You can provide a IEqualityComparer<HashSet<T>> to the Dictionary constructor and make the desired implementation in that comparer.

Compacting a WeakReference Dictionary

I've got a class Foo with a property Id. My goal is that there are no two instances of Foo with the same Id at the same time.
So I created a factory method CreateFoo which uses a cache in order to return the same instance for the same Id.
static Foo CreateFoo(int id) {
Foo foo;
if (!cache.TryGetValue(id, out foo)) {
foo = new Foo(id);
foo.Initialize(...);
cache.Put(id, foo);
}
return foo;
}
The cache is implemented as a Dictionary<TKey,WeakReference>, based on @JaredPar's Building a WeakReference Hashtable:
class WeakDictionary<TKey, TValue> where TValue : class {
private readonly Dictionary<TKey, WeakReference> items;
public WeakDictionary() {
this.items = new Dictionary<TKey, WeakReference>();
}
public void Put(TKey key, TValue value) {
this.items[key] = new WeakReference(value);
}
public bool TryGetValue(TKey key, out TValue value) {
WeakReference weakRef;
if (!this.items.TryGetValue(key, out weakRef)) {
value = null;
return false;
} else {
value = (TValue)weakRef.Target;
return (value != null);
}
}
}
The problem is that the WeakReferences remain in the dictionary after their targets have been garbage collected. This implies the need for some strategy to manually "garbage collect" dead WeakReferences, as explained by @Pascal Cuoq in What happens to a WeakReference after GC of WeakReference.Target.
My question is: What's the best strategy to compact a WeakReference Dictionary?
The options that I see are:
1. Don't remove WeakReferences from the Dictionary. IMO this is bad, because the cache is used in the full lifetime of my application, and a lot of dead WeakReferences will accumulate over time.
2. Walk the entire dictionary on each Put and TryGetValue, and remove dead WeakReferences. This defeats somewhat the purpose of a dictionary because both operations become O(n).
3. Walk the entire dictionary periodically in a background thread. What would be a good interval, given that I don't know the usage pattern of CreateFoo?
4. Append each inserted KeyValuePair to a double-ended linked list. Each call to Put and TryGetValue examines the head of the list. If the WeakReference is alive, move the pair to the end of the list. If it is dead, remove the pair from the list and the WeakReference from the Dictionary.
5. Implement a custom hash table with the minor difference that, when a bucket is full, dead WeakReferences are first removed from the bucket before proceeding as usual.
Are there other strategies?
The best strategy is probably an algorithm with amortized time complexity. Does such a strategy exist?
If you can switch the managed object to be the key of the dictionary, then you can use .Net 4.0's ConditionalWeakTable (namespace System.Runtime.CompilerServices).
According to Mr. Richter, ConditionalWeakTable is notified of object collection by the garbage collector rather than using a polling thread.
static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();
void Window_Loaded(object sender, RoutedEventArgs e)
{
...
dataGrid.SelectionChanged += (_sender, _e) =>
{
var cs = dataGrid.SelectedItem as ClientSession;
this.tabControl.Items.Clear();
foreach (var tid in cs.GetThreadIDs())
{
tid.tabItem = new TabItem() { Header = ... };
tid.tabItem.AddHandler(UIElement.MouseDownEvent,
new MouseButtonEventHandler((__sender, __e) =>
{
tabControl_SelectionChanged(tid.tabItem);
}), true);
tidByTab.Add(tid.tabItem, tid);
this.tabControl.Items.Add(tid.tabItem);
}
};
}
void tabControl_SelectionChanged(TabItem tabItem)
{
this.tabControl.SelectedItem = tabItem;
if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out TIDExec tidExec))
{
tidExec.EnsureBlocksLoaded();
ShowStmt(tidExec.CurrentStmt);
}
else
throw new Exception("huh?");
}
What's important here is that the only things referencing the TabItem object are the tabControl.Items collection and the key of the ConditionalWeakTable. The key of a ConditionalWeakTable does not count as a reference. So when we clear all the items from the tabControl, those TabItems can be garbage-collected (because nothing references them any longer; again, the key of the ConditionalWeakTable does not count). When they are garbage-collected, ConditionalWeakTable is notified and the entry with that key is removed. So my bulky TIDExec objects are also garbage-collected at that point (nothing references them except the value of the ConditionalWeakTable).
Your Option 3 (a Thread) has the big disadvantage of making synchronization necessary on all Put/TryGetValue actions. If you do use this, your interval should not be measured in milliseconds but in every N TryGet actions.
Option 2, scanning the Dictionary, would incur a serious overhead. You can improve by only scanning 1 in 1000 actions and/or by watching how often the GC has run.
But I would seriously consider option 1: Do nothing. You may have "a lot" of dead entries, but on the other hand they are pretty small (and get recycled). Probably not an option for a server app, but for a client application I would try to get a measure of how many entries (kByte) per hour we are talking about.
After some discussion:
Does such a[n amortized] strategy exist?
I would guess no. Your problem is a miniature version of the GC. You will have to scan the whole thing once in a while. So only options 2) and 3) provide a real solution. And they are both expensive but they can be (heavily) optimized with some heuristics. Option 2) would still give you the occasional worst-case though.
Approach #5 is interesting, but has the disadvantage that it could be difficult to know what the real level of hash-table utilization is, and consequently when the hash table should be expanded. That difficulty might be overcome if, whenever it "seems" like the hash table should be expanded, one first does a whole-table scan to remove dead entries. If more than half of the entries in the table were dead, don't bother expanding it. Such an approach should yield amortized O(1) behavior, since one wouldn't do the whole-table scan until one had added back as many entries as had been deleted.
A simpler approach, which would also yield O(1) amortized time and O(1) space per recently-live element would be to keep a count of how many items were alive after the last time the table was purged, and how many elements have been added since then. Whenever the latter count exceeds the first, do a whole-table scan-and-purge. The time required for a scan and purge will be proportional to the number of elements added between purges, thus retaining amortized O(1) time, and the number of total elements in the collection will not exceed twice the number of elements that were recently observed to be alive, so the number of dead elements cannot exceed twice the number of recently-live elements.
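The counting approach just described could be sketched like this. It is one possible reading of the idea, not a tested production cache: PurgingWeakDictionary and its field names are invented here, it uses the generic WeakReference<T> (.NET 4.5 or later) instead of the question's WeakReference, and it is not thread-safe:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PurgingWeakDictionary<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference<TValue>> items =
        new Dictionary<TKey, WeakReference<TValue>>();
    private int aliveAfterLastPurge;  // entries that survived the last purge
    private int addedSinceLastPurge;  // Put calls since the last purge

    // Number of stored entries, including dead ones not yet purged.
    public int Count { get { return items.Count; } }

    public void Put(TKey key, TValue value)
    {
        items[key] = new WeakReference<TValue>(value);
        addedSinceLastPurge++;
        // Scan the whole table only once the additions outnumber the
        // entries that were alive after the previous scan.
        if (addedSinceLastPurge > aliveAfterLastPurge) Purge();
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        value = null;
        WeakReference<TValue> weakRef;
        return items.TryGetValue(key, out weakRef) && weakRef.TryGetTarget(out value);
    }

    private void Purge()
    {
        // Collect dead keys first; the dictionary must not be modified while enumerating.
        TValue ignored;
        var deadKeys = items.Where(p => !p.Value.TryGetTarget(out ignored))
                            .Select(p => p.Key)
                            .ToList();
        foreach (var key in deadKeys) items.Remove(key);
        aliveAfterLastPurge = items.Count;
        addedSinceLastPurge = 0;
    }
}
```

In this sketch an overwrite of an existing key also counts as an addition, which only makes purges slightly more frequent. Each O(n) purge is paid for by the Put calls made since the previous one, which is where the amortized O(1) cost comes from.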
I had this same problem, and solved it like this (WeakDictionary is the class I was trying to clean up):
internal class CleanerRef
{
~CleanerRef()
{
if (handle.IsAllocated)
handle.Free();
}
public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
{
handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
Dictionary = dictionary;
}
public bool IsAlive
{
get {return handle.IsAllocated && handle.Target != null;}
}
public object Target
{
get {return IsAlive ? handle.Target : null;}
}
GCHandle handle;
public WeakDictionary Dictionary;
}
internal class WeakDictionaryCleaner
{
public WeakDictionaryCleaner(WeakDictionary dict)
{
refs.Add(new CleanerRef(this, dict));
}
~WeakDictionaryCleaner()
{
foreach(var cleanerRef in refs)
{
if (cleanerRef.Target == this)
{
cleanerRef.Dictionary.ClearGcedEntries();
refs.Remove(cleanerRef);
break;
}
}
}
private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}
What these two classes try to achieve is to "hook" the GC. You activate this mechanism by creating an instance of WeakDictionaryCleaner during the construction of the weak collection:
new WeakDictionaryCleaner(weakDictionary);
Notice that I don't create any reference to the new instance, so that the GC will dispose it during the next cycle. In the ClearGcedEntries() method I create a new instance again, so that each GC cycle will have a cleaner to finalize that in turn will execute the collection compaction.
You can make the CleanerRef.Dictionary also a weak reference so that it won't hold the dictionary in memory.
Hope this helps
I guess this is the right place to put it, even though it might look like necromancy, just in case someone stumbles upon this question like I did. The lack of a dedicated Identity Map in .NET is somewhat surprising, and I feel the most natural way for it to work is as described in the last option: when the table is full and about to double its capacity, it checks whether there are enough dead entries that can be recycled so that growing is not necessary.
static IdentityMap<int, Entity> Cache = new IdentityMap<int, Entity>(e => e.ID);
...
var entity = Cache.Get(id, () => LoadEntity(id));
The class exposes just one public method Get with key and optional value parameter that lazily loads and caches an entity if it is not in the cache.
using System;
class IdentityMap<TKey, TValue>
where TKey : IEquatable<TKey>
where TValue : class
{
Func<TValue, TKey> key_selector;
WeakReference<TValue>[] references;
int[] buckets;
int[] bucket_indexes;
int tail_index;
int entries_count;
int capacity;
public IdentityMap(Func<TValue, TKey> key_selector, int capacity = 10) {
this.key_selector = key_selector;
Init(capacity);
}
void Init(int capacity) {
this.bucket_indexes = new int[capacity];
this.buckets = new int[capacity];
this.references = new WeakReference<TValue>[capacity];
for (int i = 0; i < capacity; i++) {
bucket_indexes[i] = -1;
buckets[i] = i - 1;
}
this.tail_index = capacity - 1;
this.entries_count = 0;
this.capacity = capacity;
}
public TValue Get(TKey key, Func<TValue> value = null) {
int bucket_index = Math.Abs(key.GetHashCode() % this.capacity);
var ret = WalkBucket(bucket_index, true, key);
if (ret == null && value != null) Add(bucket_index, ret = value());
return ret;
}
void Add(int bucket_index, TValue value) {
if (this.entries_count == this.capacity) {
for (int i = 0; i < capacity; i++) WalkBucket(i, false, default(TKey));
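if (this.entries_count * 2 > this.capacity) {
var old_references = references;
Init(this.capacity * 2);
foreach (var old_reference in old_references) {
TValue old_value;
if (old_reference.TryGetTarget(out old_value)) {
// Rehash each surviving entry by its own key, not by the key of the value being added.
int hash = key_selector(old_value).GetHashCode();
Add(Math.Abs(hash % this.capacity), old_value);
}
}
// The capacity changed, so the bucket for the new value must be recomputed.
bucket_index = Math.Abs(key_selector(value).GetHashCode() % this.capacity);
}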
}
int new_index = this.tail_index;
this.tail_index = buckets[this.tail_index];
this.entries_count += 1;
buckets[new_index] = bucket_indexes[bucket_index];
if (references[new_index] != null) references[new_index].SetTarget(value);
else references[new_index] = new WeakReference<TValue>(value);
bucket_indexes[bucket_index] = new_index;
}
TValue WalkBucket(int bucket_index, bool is_searching, TKey key) {
int curr_index = bucket_indexes[bucket_index];
int prev_index = -1;
while (curr_index != -1) {
TValue value;
int next_index = buckets[curr_index];
if (references[curr_index].TryGetTarget(out value)) {
if (is_searching && key_selector(value).Equals(key)) return value;
prev_index = curr_index;
} else {
if (prev_index != -1) buckets[prev_index] = next_index;
else bucket_indexes[bucket_index] = next_index;
buckets[curr_index] = this.tail_index;
this.tail_index = curr_index;
this.entries_count -= 1;
}
curr_index = next_index;
}
return null;
}
}
You could remove the "invalid" WeakReference inside TryGetValue:
[Edit] My mistake, these solutions actually do nothing more than what you suggested, since Put method will swap the old object with the new one anyway. Just ignore it.
public bool TryGetValue(TKey key, out TValue value) {
WeakReference weakRef;
if (!this.items.TryGetValue(key, out weakRef)) {
value = null;
return false;
} else {
value = (TValue)weakRef.Target;
if (value == null)
this.items.Remove(key);
return (value != null);
}
}
Or, you can immediately create a new instance inside your dictionary whenever it is needed:
public TValue GetOrCreate(TKey key, Func<TKey, TValue> ctor) {
    WeakReference weakRef;
    TValue value = null;
    if (this.items.TryGetValue(key, out weakRef)) {
        value = (TValue)weakRef.Target;
    }
    if (value == null) {
        // Missing entry or collected target: create and store a fresh instance.
        value = ctor(key);
        this.Put(key, value);
    }
    return value;
}
You would then use it like this:
static Foo CreateFoo(int id)
{
return cache.GetOrCreate(id, key => new Foo(key));
}
[Edit]
According to windbg, WeakReference instance alone occupies 16 bytes. For 100,000 collected objects, this would not be such a serious burden, so you could easily let them live.
If this is a server app and you believe you could benefit from collecting, I would consider going for a background thread, but also implementing a simple algorithm to increase waiting time whenever you collect a relatively small number of objects.
A little specialization: when the target classes know the weak dictionary and their own TKey value, they can remove their entry from the finalizer.
public class Entry<TKey>
{
TKey key;
Dictionary<TKey, WeakReference> weakDictionary;
public Entry(Dictionary<TKey, WeakReference> weakDictionary, TKey key)
{
this.key = key;
this.weakDictionary = weakDictionary;
}
~Entry()
{
weakDictionary.Remove(key);
}
}
When the cached objects are subclasses of Entry<TKey>, no empty WeakReference leaks, since the finalizer is called once the instance has been garbage collected.
