Compacting a WeakReference Dictionary - c#

I've got a class Foo with a property Id. My goal is that there are no two instances of Foo with the same Id at the same time.
So I created a factory method CreateFoo which uses a cache in order to return the same instance for the same Id.
static Foo CreateFoo(int id) {
Foo foo;
if (!cache.TryGetValue(id, out foo)) {
foo = new Foo(id);
foo.Initialize(...);
cache.Put(id, foo);
}
return foo;
}
The cache is implemented as a Dictionary<TKey,WeakReference>, based on @JaredPar's Building a WeakReference Hashtable:
class WeakDictionary<TKey, TValue> where TValue : class {
private readonly Dictionary<TKey, WeakReference> items;
public WeakDictionary() {
this.items = new Dictionary<TKey, WeakReference>();
}
public void Put(TKey key, TValue value) {
this.items[key] = new WeakReference(value);
}
public bool TryGetValue(TKey key, out TValue value) {
WeakReference weakRef;
if (!this.items.TryGetValue(key, out weakRef)) {
value = null;
return false;
} else {
value = (TValue)weakRef.Target;
return (value != null);
}
}
}
The problem is that the WeakReferences remain in the dictionary after their targets have been garbage collected. This implies the need for a strategy to manually "garbage collect" dead WeakReferences, as explained by @Pascal Cuoq in What happens to a WeakReference after GC of WeakReference.Target.
My question is: What's the best strategy to compact a WeakReference Dictionary?
The options that I see are:
Don't remove WeakReferences from the Dictionary. IMO this is bad, because the cache is used in the full lifetime of my application, and a lot of dead WeakReferences will accumulate over time.
Walk the entire dictionary on each Put and TryGetValue, and remove dead WeakReferences. This defeats somewhat the purpose of a dictionary because both operations become O(n).
Walk the entire dictionary periodically in a background thread. What would be a good interval, given that I don't know the usage pattern of CreateFoo?
Append each inserted KeyValuePair to a double-ended linked list. Each call to Put and TryGetValue examines the head of the list. If the WeakReference is alive, move the pair to the end of the list. If it is dead, remove the pair from the list and the WeakReference from the Dictionary (a sketch of this appears below the list).
Implement a custom hash table with the minor difference that, when a bucket is full, dead WeakReferences are first removed from the bucket before proceeding as usual.
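For concreteness, option 4 might be sketched like this (untested; Put would also have to append each new entry to the list, which is omitted here):
private readonly LinkedList<KeyValuePair<TKey, WeakReference>> age =
    new LinkedList<KeyValuePair<TKey, WeakReference>>();
// Called once from Put and once from TryGetValue: examine the oldest entry
// and either recycle it to the back of the list or evict it together with
// its dictionary entry.
private void ExamineOldest() {
    var node = this.age.First;
    if (node == null) return;
    this.age.RemoveFirst();
    if (node.Value.Value.IsAlive)
        this.age.AddLast(node);            // target still alive: back of the queue
    else
        this.items.Remove(node.Value.Key); // target dead: drop the WeakReference too
}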
Are there other strategies?
The best strategy is probably an algorithm with amortized time complexity. Does such a strategy exist?

If you can switch the managed object to be the key of the dictionary, then you can use .Net 4.0's ConditionalWeakTable (namespace System.Runtime.CompilerServices).
According to Mr. Richter, ConditionalWeakTable is notified of object collection by the garbage collector rather than using a polling thread.
static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();
void Window_Loaded(object sender, RoutedEventArgs e)
{
...
dataGrid.SelectionChanged += (_sender, _e) =>
{
var cs = dataGrid.SelectedItem as ClientSession;
this.tabControl.Items.Clear();
foreach (var tid in cs.GetThreadIDs())
{
tid.tabItem = new TabItem() { Header = ... };
tid.tabItem.AddHandler(UIElement.MouseDownEvent,
new MouseButtonEventHandler((__sender, __e) =>
{
tabControl_SelectionChanged(tid.tabItem);
}), true);
tidByTab.Add(tid.tabItem, tid);
this.tabControl.Items.Add(tid.tabItem);
}
};
}
void tabControl_SelectionChanged(TabItem tabItem)
{
this.tabControl.SelectedItem = tabItem;
if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out var tidExec))
{
tidExec.EnsureBlocksLoaded();
ShowStmt(tidExec.CurrentStmt);
}
else
throw new Exception("huh?");
}
What's important here is that the only things referencing the TabItem objects are the tabControl.Items collection and the keys of the ConditionalWeakTable - and the keys of a ConditionalWeakTable do not count. So when we clear all the items from the tabControl, those TabItems can be garbage-collected (because nothing references them any longer). When they are garbage-collected, the ConditionalWeakTable is notified and the entries with those keys are removed. So my bulky TIDExec objects are also garbage-collected at that point (nothing references them except the values of the ConditionalWeakTable).
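Here is a minimal, UI-free sketch of the same mechanism (the class names are mine, purely illustrative):
using System;
using System.Runtime.CompilerServices;

class Key { }                                          // stands in for TabItem
class Payload { public byte[] Data = new byte[1024]; } // stands in for TIDExec

static class CwtDemo
{
    static readonly ConditionalWeakTable<Key, Payload> table =
        new ConditionalWeakTable<Key, Payload>();

    static void Main()
    {
        var key = new Key();
        table.Add(key, new Payload()); // the table does not keep 'key' alive

        Payload p;
        Console.WriteLine(table.TryGetValue(key, out p)); // True while 'key' is reachable

        key = null;  // drop the last strong reference to the key
        GC.Collect();
        // The entry, and the Payload with it, is now eligible for collection;
        // no manual compaction step is needed.
    }
}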

Your option 3 (a thread) has the big disadvantage of making synchronization necessary on all Put/TryGetValue actions. If you do use it, make your interval not a number of milliseconds but every N TryGet actions.
Option 2, scanning the whole Dictionary, would incur serious overhead. You can improve it by only scanning once per 1000 actions and/or by watching how often the GC has run.
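A sketch of the GC-watching variant (GC.CollectionCount is the real API; the surrounding method bodies are my own assumption):
private int lastGen2Count; // gen-2 collection count at the time of the last purge

public void Put(TKey key, TValue value) {
    int gen2 = GC.CollectionCount(2);
    if (gen2 != lastGen2Count) { // a full GC has run since the last purge,
        lastGen2Count = gen2;    // so there may be fresh dead references
        RemoveDeadEntries();     // full scan, as in option 2
    }
    this.items[key] = new WeakReference(value);
}

private void RemoveDeadEntries() {
    var deadKeys = new List<TKey>();
    foreach (var pair in this.items)
        if (!pair.Value.IsAlive) deadKeys.Add(pair.Key);
    foreach (var key in deadKeys) this.items.Remove(key);
}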
But I would seriously consider option 1: do nothing. You may have "a lot" of dead entries, but on the other hand they are pretty small (and get recycled). Probably not an option for a server app, but for a client application I would try to get a measure of how many entries (kBytes) per hour we are actually talking about.
After some discussion:
Does such a[n amortized] strategy exist?
I would guess no. Your problem is a miniature version of the GC. You will have to scan the whole thing once in a while. So only options 2) and 3) provide a real solution. And they are both expensive but they can be (heavily) optimized with some heuristics. Option 2) would still give you the occasional worst-case though.

Approach #5 is interesting, but has the disadvantage that it could be difficult to know what the real level of hash-table utilization is, and consequently when the hash table should be expanded. That difficulty might be overcome if, whenever it "seems" like the hash table should be expanded, one first does a whole-table scan to remove dead entries. If more than half of the entries in the table were dead, don't bother expanding it. Such an approach should yield amortized O(1) behavior, since one wouldn't do the whole-table scan until one had added back as many entries as had been deleted.
A simpler approach, which would also yield O(1) amortized time and O(1) space per recently-live element would be to keep a count of how many items were alive after the last time the table was purged, and how many elements have been added since then. Whenever the latter count exceeds the first, do a whole-table scan-and-purge. The time required for a scan and purge will be proportional to the number of elements added between purges, thus retaining amortized O(1) time, and the number of total elements in the collection will not exceed twice the number of elements that were recently observed to be alive, so the number of dead elements cannot exceed twice the number of recently-live elements.
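In code, that counting strategy might look like this (a sketch; the names are mine):
private int liveAfterLastPurge; // entries alive after the previous purge
private int addedSincePurge;    // entries added since the previous purge

public void Put(TKey key, TValue value) {
    if (addedSincePurge > liveAfterLastPurge)
        PurgeAndRecount();
    this.items[key] = new WeakReference(value);
    addedSincePurge++;
}

private void PurgeAndRecount() {
    var deadKeys = new List<TKey>();
    foreach (var pair in this.items)
        if (!pair.Value.IsAlive) deadKeys.Add(pair.Key);
    foreach (var key in deadKeys) this.items.Remove(key);
    liveAfterLastPurge = this.items.Count; // the scan cost is covered by the adds since the last purge
    addedSincePurge = 0;
}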

I had this same problem, and solved it like this (WeakDictionary is the class I was trying to clean up):
internal class CleanerRef
{
~CleanerRef()
{
if (handle.IsAllocated)
handle.Free();
}
public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
{
handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
Dictionary = dictionary;
}
public bool IsAlive
{
get {return handle.IsAllocated && handle.Target != null;}
}
public object Target
{
get {return IsAlive ? handle.Target : null;}
}
GCHandle handle;
public WeakDictionary Dictionary;
}
internal class WeakDictionaryCleaner
{
public WeakDictionaryCleaner(WeakDictionary dict)
{
refs.Add(new CleanerRef(this, dict));
}
~WeakDictionaryCleaner()
{
foreach(var cleanerRef in refs)
{
if (cleanerRef.Target == this)
{
cleanerRef.Dictionary.ClearGcedEntries();
refs.Remove(cleanerRef);
break;
}
}
}
private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}
What these two classes try to achieve is to "hook" the GC. You activate this mechanism by creating an instance of WeakDictionaryCleaner during the construction of the weak collection:
new WeakDictionaryCleaner(weakDictionary);
Notice that I don't keep any reference to the new instance, so the GC will collect it during the next cycle. In the ClearGcedEntries() method I create a new instance again, so that each GC cycle has a cleaner to finalize, which in turn executes the collection compaction.
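ClearGcedEntries() is not shown above; assuming the dictionary is the Dictionary<TKey, WeakReference>-based one from the question, it could look something like this (my sketch):
public void ClearGcedEntries()
{
    var deadKeys = new List<TKey>();
    foreach (var pair in this.items)
        if (!pair.Value.IsAlive) deadKeys.Add(pair.Key);
    foreach (var key in deadKeys) this.items.Remove(key);
    new WeakDictionaryCleaner(this); // re-arm: schedule the next purge for the next GC cycle
}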
You can make the CleanerRef.Dictionary also a weak reference so that it won't hold the dictionary in memory.
Hope this helps

I guess this is the right place to put it, even though it might look like necromancy. Just in case someone stumbles upon this question like I did: the lack of a dedicated identity map in .NET is somewhat surprising, and I feel the most natural way for it to work is as described in the last option: when the table is full and about to double its capacity, it checks whether there are enough dead entries that can be recycled for further use, so that growing is not necessary.
static IdentityMap<int, Entity> Cache = new IdentityMap<int, Entity>(e => e.ID);
...
var entity = Cache.Get(id, () => LoadEntity(id));
The class exposes just one public method, Get, which takes a key and an optional value factory and lazily loads and caches an entity if it is not in the cache.
using System;
class IdentityMap<TKey, TValue>
where TKey : IEquatable<TKey>
where TValue : class
{
Func<TValue, TKey> key_selector;
WeakReference<TValue>[] references;
int[] buckets;
int[] bucket_indexes;
int tail_index;
int entries_count;
int capacity;
public IdentityMap(Func<TValue, TKey> key_selector, int capacity = 10) {
this.key_selector = key_selector;
Init(capacity);
}
void Init(int capacity) {
this.bucket_indexes = new int[capacity];
this.buckets = new int[capacity];
this.references = new WeakReference<TValue>[capacity];
for (int i = 0; i < capacity; i++) {
bucket_indexes[i] = -1;
buckets[i] = i - 1;
}
this.tail_index = capacity - 1;
this.entries_count = 0;
this.capacity = capacity;
}
public TValue Get(TKey key, Func<TValue> value = null) {
int bucket_index = Math.Abs(key.GetHashCode() % this.capacity);
var ret = WalkBucket(bucket_index, true, key);
if (ret == null && value != null) Add(bucket_index, ret = value());
return ret;
}
void Add(int bucket_index, TValue value) {
if (this.entries_count == this.capacity) {
for (int i = 0; i < capacity; i++) WalkBucket(i, false, default(TKey));
if (this.entries_count * 2 > this.capacity) {
var old_references = references;
Init(this.capacity * 2);
foreach (var old_reference in old_references) {
TValue old_value;
if (old_reference.TryGetTarget(out old_value)) {
int hash = key_selector(old_value).GetHashCode(); // hash the old entry's own key, not the value being added
Add(Math.Abs(hash % this.capacity), old_value);
}
}
}
}
int new_index = this.tail_index;
this.tail_index = buckets[this.tail_index];
this.entries_count += 1;
buckets[new_index] = bucket_indexes[bucket_index];
if (references[new_index] != null) references[new_index].SetTarget(value);
else references[new_index] = new WeakReference<TValue>(value);
bucket_indexes[bucket_index] = new_index;
}
TValue WalkBucket(int bucket_index, bool is_searching, TKey key) {
int curr_index = bucket_indexes[bucket_index];
int prev_index = -1;
while (curr_index != -1) {
TValue value;
int next_index = buckets[curr_index];
if (references[curr_index].TryGetTarget(out value)) {
if (is_searching && key_selector(value).Equals(key)) return value;
prev_index = curr_index;
} else {
if (prev_index != -1) buckets[prev_index] = next_index;
else bucket_indexes[bucket_index] = next_index;
buckets[curr_index] = this.tail_index;
this.tail_index = curr_index;
this.entries_count -= 1;
}
curr_index = next_index;
}
return null;
}
}

You could remove the "invalid" WeakReference inside TryGetValue:
[Edit] My mistake - these solutions actually do nothing more than what you suggested, since the Put method will swap the old object for the new one anyway. Just ignore this.
public bool TryGetValue(TKey key, out TValue value) {
WeakReference weakRef;
if (!this.items.TryGetValue(key, out weakRef)) {
value = null;
return false;
} else {
value = (TValue)weakRef.Target;
if (value == null)
this.items.Remove(key);
return (value != null);
}
}
Or, you can immediately create a new instance inside your dictionary whenever it is needed:
public TValue GetOrCreate(TKey key, Func<TKey, TValue> ctor) {
WeakReference weakRef;
if (!this.items.TryGetValue(key, out weakRef)) {
TValue result = ctor(key);
this.Put(key, result);
return result;
}
TValue value = (TValue)weakRef.Target;
if (value == null)
{
TValue result = ctor(key);
this.Put(key, result);
return result;
}
return value;
}
You would then use it like this:
static Foo CreateFoo(int id)
{
return cache.GetOrCreate(id, i => new Foo(i));
}
[Edit]
According to windbg, WeakReference instance alone occupies 16 bytes. For 100,000 collected objects, this would not be such a serious burden, so you could easily let them live.
If this is a server app and you believe you could benefit from collecting, I would consider going for a background thread, but also implementing a simple algorithm that increases the waiting time whenever a pass collects relatively few objects.
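Such an adaptive background loop might look like this (a sketch; PurgeDeadEntries is a hypothetical scan-and-remove that returns the number of entries it removed, and all access to the cache would need locking):
int interval = 5000; // start with 5 seconds
while (true)
{
    Thread.Sleep(interval);
    int removed = cache.PurgeDeadEntries();
    if (removed < 10)
        interval = Math.Min(interval * 2, 300000); // few corpses: back off, cap at 5 minutes
    else
        interval = Math.Max(interval / 2, 1000);   // many corpses: purge more often
}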

A little specialization: when the target classes know the weak dictionary reference and their TKey value, you can remove the entry from the finalizer.
public class Entry<TKey>
{
TKey key;
Dictionary<TKey, WeakReference> weakDictionary;
public Entry(Dictionary<TKey, WeakReference> weakDictionary, TKey key)
{
this.key = key;
this.weakDictionary = weakDictionary;
}
~Entry()
{
weakDictionary.Remove(key);
}
}
When cached objects are subclasses of Entry<TKey>, no empty WeakReference leaks, since the finalizer is called after its instance has been garbage collected.
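Usage could look like this (my illustration; note that the finalizer runs on the finalizer thread, so real code would have to synchronize access to the shared dictionary):
class Foo : Entry<int>
{
    public Foo(Dictionary<int, WeakReference> cache, int id)
        : base(cache, id)
    {
        // initialize the Foo itself here
    }
}

// cache[id] = new WeakReference(new Foo(cache, id));
// When the Foo becomes unreachable, ~Entry() removes cache[id] automatically.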

Related

Auto increment abstract base class's typeid

So I have a Dictionary<Type, ...> that has become the bottleneck in a tight loop. I'd like to do away with using Type as the key. I'm lucky enough that all the types that end up as a key in this dictionary implement a specific interface that I have access to changing, and I think I'd like to change it to an abstract base class.
I think what I'm ultimately trying to do is have something like this:
public abstract class MyBaseType
{
public virtual int TypeId { get; }
}
Then I would go about having the derived classes from this auto increment the TypeId somehow. Is this possible in an automatic manner? So I don't have to specifically set a number for each one?
What I'm trying to avoid:
public class MyDerived : MyBaseType
{
public override int TypeId => 0;
}
public class MyDerived2 : MyBaseType
{
public override int TypeId => 1;
}
...etc
Any ideas?
Edit:
It's the lookup on type that is causing my bottleneck. Would changing this to a type besides Type actually gain me anything? Is it even likely that it's because of GetHashCode() on Type that I'm bottlenecking in the first place?
Edit again:
I've narrowed down the most expensive part to be exactly where I thought it was: the dictionary index operator.
Is this possible in an automatic manner? So I don't have to specifically set a number for each one?
No.
Would changing this to a type besides Type actually gain me anything?
Why don't you try and find out? Setting up a simple test where you compare different key performances is rather straightforward.
If you bothered to do it, you would see that an int key performs better than a Type key; in the test below, the int is slightly more than twice as fast.
But the most meaningful result from the test is that, on my machine, 1,677,000 lookups are performed in under 40 ms for ints and slightly over 80 ms for types. That is simply blindingly fast (and don't forget those times include the test key lookup too). If that's your bottleneck, then you need to start thinking about parallelizing your work somehow.
Benchmark code:
static void Main(string[] args)
{
var d1 = new Dictionary<int, object>();
var d2 = new Dictionary<Type, object>();
var keys1 = new List<int>();
var keys2 = new List<Type>();
var counter = 0;
var types = Assembly.GetAssembly(typeof(int)).GetTypes();
foreach (var t in types)
{
d1.Add(counter, null);
keys1.Add(counter++);
d2.Add(t, null);
keys2.Add(t);
}
//warmup run. JITTER
benchMarkDictionary(d1, keys1);
//good runs
for (var repetition = 0; repetition < 10; repetition++)
{
Console.WriteLine($"Test #{repetition} --------");
Console.WriteLine($"int key: {benchMarkDictionary(d1, keys1)}");
Console.WriteLine($"type key: {benchMarkDictionary(d2, keys2)}");
}
}
static long benchMarkDictionary<TKey, TValue>(
Dictionary<TKey, TValue> dict,
IList<TKey> keys)
{
var count = dict.Count;
TValue result = default(TValue);
var watch = new Stopwatch();
watch.Start();
for (var lookups = 0; lookups < count * 1000; lookups++)
{
result = dict[keys[lookups % count]];
}
watch.Stop();
Console.WriteLine(result);
return watch.ElapsedMilliseconds;
}
I ended up doing the (very repetitive) work to get the key down to an int and the speed wasn't improved that much. What this tells me is that I just had unrealistic expectations of the cost of the work I was performing. I'll investigate threading as an alternative way to see improvements.

Is this algorithm implementation LRU or MRU?

I am working on implementing a MRU(Most Recently Used) cache in my project using C#.
I googled some conceptions and implementations about MRU, and its contrary, LRU(Least Recently Used), and found this article http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=626 that describes the implementation of MRU collection in C#.
What confuses me is that I think this implementation is LRU rather than MRU. Could anyone help me confirm whether this collection class is MRU or not?
Following code block is the whole MRUCollection class. Thanks.
class MruDictionary<TKey, TValue>
{
private LinkedList<MruItem<TKey, TValue>> items;
private Dictionary<TKey, LinkedListNode<MruItem<TKey, TValue>>> itemIndex;
private int maxCapacity;
public MruDictionary(int cap)
{
maxCapacity = cap;
items = new LinkedList<MruItem<TKey, TValue>>();
itemIndex = new Dictionary<TKey, LinkedListNode<MruItem<TKey, TValue>>>(maxCapacity);
}
public void Add(TKey key, TValue value)
{
if (itemIndex.ContainsKey(key))
{
throw new ArgumentException("An item with the same key already exists.");
}
if (itemIndex.Count == maxCapacity)
{
LinkedListNode<MruItem<TKey, TValue>> node = items.Last;
items.RemoveLast(); //Why do we remove the last rather than the first here? The node accessed most recently is moved to the front of the list.
itemIndex.Remove(node.Value.Key);
}
LinkedListNode<MruItem<TKey, TValue>> newNode = new LinkedListNode<MruItem<TKey, TValue>>(new MruItem<TKey, TValue>(key, value));
items.AddFirst(newNode);
itemIndex.Add(key, newNode);
}
public bool TryGetValue(TKey key, out TValue value)
{
LinkedListNode<MruItem<TKey, TValue>> node;
if (itemIndex.TryGetValue(key, out node))
{
value = node.Value.Value;
items.Remove(node);
items.AddFirst(node);
return true;
}
value = default(TValue);
return false;
}
}
class MruItem<TKey, TValue>
{
private TKey _key;
private TValue _value;
public MruItem(TKey k, TValue v)
{
_key = k;
_value = v;
}
public TKey Key
{
get { return _key; }
}
public TValue Value
{
get { return _value; }
}
}
http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
Most Recently Used (MRU): discards, in contrast to LRU, the most recently used items first.
According to my understanding, since the node accessed most recently is moved to the front of the list, when the cache is full we should remove the first node of the list rather than the last.
It looks to me like an MRU implementation. Notice how searches start from the beginning of the linked list and go back, and whenever a node is accessed it's moved to the front of the list. In Add(), the node is added using AddFirst(), and in TryGetValue(), it removes the node and adds it to the front of the list.
Based on what is documented here: http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
It's LRU. Think of the items as an "ordered" list.
The most recently used item is at the "front".
When a new item is added they call items.AddFirst(newNode); which adds it to the front of the list.
When an item is "touched", they move it to the front of the list using these calls:
items.Remove(node);
items.AddFirst(node);
When the list is full, it pushes out the "last" / "oldest" item from the list using items.RemoveLast();
The cache is removing the "least recently used" items first when it hits capacity.
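For what it's worth, turning the posted Add into a true MRU (evict the most recently used item) policy would be a small change (my sketch):
if (itemIndex.Count == maxCapacity)
{
    // MRU evicts the most recently used item, which this list keeps at the front
    LinkedListNode<MruItem<TKey, TValue>> node = items.First;
    items.RemoveFirst();
    itemIndex.Remove(node.Value.Key);
}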
Microsoft's "MRU" lists correctly use an LRU cache replacement algorithm.
Note that Microsoft in this case uses different terminology for MRU lists than the cache community.
The cache community uses MRU / LRU to talk about replacement (or eviction) strategies. When your cache is full, and you need to put a new item in the list, which item should be removed from the list?
Microsoft provides tools for getting the most recently used items, like for a drop down or a recent documents list.
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/mru-source-list-functions
https://www.codeproject.com/articles/78/most-recently-used-list-in-a-combobox
This means that to correctly implement an MRU list, you need to implement an LRU Cache eviction strategy.

ConcurrentDictionary doesn't seem to mark elements for GC when they are removed

I was surprised to find that my app's memory footprint kept growing - the longer it ran, the more memory it consumed. So with some windbg magic I pinpointed the problem to my little LRU cache based on ConcurrentDictionary. The CD has a bunch of benefits that were very cool for me (one of which is that its data never ends up in the LOH). TryAdd and TryRemove are the two methods used to add and evict items. !gcroot on some older element led me back to my cache. Some investigation with ILSpy led me to this conclusion:
TryRemove does not really remove an element. All it does is change the linked-list pointers to skip the node; it never assigns the value of the array element to null. This prevents the GC from collecting old evicted objects.
Really? Is that a known problem? If so, is my only option TryUpdate(key, null) and then TryRemove(key)? If so, then I have to lock around the ConcurrentDictionary access, which is oxymoronic.
Here is ILSpy dump:
// System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>
private bool TryRemoveInternal(TKey key, out TValue value, bool matchValue, TValue oldValue)
{
while (true)
{
ConcurrentDictionary<TKey, TValue>.Tables tables = this.m_tables;
int num;
int num2;
this.GetBucketAndLockNo(this.m_comparer.GetHashCode(key), out num, out num2, tables.m_buckets.Length, tables.m_locks.Length);
lock (tables.m_locks[num2])
{
if (tables != this.m_tables)
{
continue;
}
ConcurrentDictionary<TKey, TValue>.Node node = null;
ConcurrentDictionary<TKey, TValue>.Node node2 = tables.m_buckets[num];
while (node2 != null)
{
if (this.m_comparer.Equals(node2.m_key, key))
{
bool result;
if (matchValue && !EqualityComparer<TValue>.Default.Equals(oldValue, node2.m_value))
{
value = default(TValue);
result = false;
return result;
}
if (node == null)
{
Volatile.Write<ConcurrentDictionary<TKey, TValue>.Node>(ref tables.m_buckets[num], node2.m_next);
}
else
{
node.m_next = node2.m_next;
}
value = node2.m_value;
tables.m_countPerLock[num2]--;
result = true;
return result;
}
else
{
node = node2;
node2 = node2.m_next;
}
}
}
break;
}
value = default(TValue);
return false;
}

Remove from Dictionary by Key and Retrieve Value

Is there a way to remove an entry from a Dictionary (by Key) AND retrieve its Value in the same step?
For example, I'm calling
Dictionary.Remove(Key);
but I also want it to return the Value at the same time. The function only returns a bool.
I know I can do something like
Value = Dictionary[Key];
Dictionary.Remove(Key);
but it seems like this will search the dictionary twice (once to get the value, and another time to remove it from the dictionary). How can I (if possible) do both WITHOUT searching the dictionary twice?
Starting with .NET Core 2.0, we have:
public bool Remove (TKey key, out TValue value);
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2.remove?view=netcore-2.0#System_Collections_Generic_Dictionary_2_Remove__0__1__
Note this API hasn't been included in .NET Standard 2.0 and .NET Framework 4.7.
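Usage is straightforward, for example:
var dict = new Dictionary<string, int> { ["a"] = 1 };
if (dict.Remove("a", out int value)) // removes the entry and hands back its value in one lookup
{
    Console.WriteLine(value); // prints 1
}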
Because they both have the desired missing method, I tried Microsoft's ConcurrentDictionary and C5 from the University of Copenhagen http://www.itu.dk/research/c5/ and I can tell you that, at least with my use case, both were super slow (I mean 5x - 10x slower) compared to Dictionary.
I think C5 is sorting both keys and values all the time, and ConcurrentDictionary is "too worried" about the calling thread. I am not here to discuss why those two incarnations of Dictionary are slow.
My algorithm was seeking and replacing some entries, where the first keys would be removed and new keys would be added (some sort of queue)...
The only thing left to do was to modify the original .NET mscorlib Dictionary. I downloaded the source code from Microsoft and included the Dictionary class in my own source. To compile it I also needed to drag along just the HashHelpers and ThrowHelper classes. All that was left was to comment out some lines (e.g. [DebuggerTypeProxy(typeof(Mscorlib_DictionaryDebugView<,>))] and some resource fetching), and of course to add the missing method to the copied class. Do not try to compile the whole Microsoft source code - you will be doing that for hours; I was lucky enough to get this going.
public bool Remove(TKey key, out TValue value)
{
if (key == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
if (buckets != null)
{
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int bucket = hashCode % buckets.Length;
int last = -1;
for (int i = buckets[bucket]; i >= 0; last = i, i = entries[i].next)
{
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
{
if (last < 0)
{
buckets[bucket] = entries[i].next;
}
else
{
entries[last].next = entries[i].next;
}
entries[i].hashCode = -1;
entries[i].next = freeList;
entries[i].key = default(TKey);
value = entries[i].value;
entries[i].value = default(TValue);
freeList = i;
freeCount++;
version++;
return true;
}
}
}
value = default(TValue);
return false;
}
Lastly I modified the namespace to System.Collection.Generic.My
In my algorithm I only had two places where I was getting the value and then removing it on the next line; I replaced those with the new method and obtained a steady performance gain of 7%-10%.
Hope it helps this use case and any other cases where re-implementing Dictionary from scratch is just not what one should do.
Even though this is not what the OP has asked for, I could not help myself but post a corrected extension method:
public static bool Remove<TKey, TValue>(this Dictionary<TKey, TValue> self, TKey key, out TValue target)
{
self.TryGetValue(key, out target);
return self.Remove(key);
}
The ConcurrentDictionary has a TryRemove method that attempts to remove and return the value with the specified key from the System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>.
If the key does not exist, it returns false and sets the out value to the default value of the TValue type.
https://msdn.microsoft.com/en-us/library/dd287129(v=vs.110).aspx
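For example:
var cd = new ConcurrentDictionary<int, string>();
cd.TryAdd(1, "one");
string removed;
if (cd.TryRemove(1, out removed)) // atomically removes the entry and returns its value
{
    Console.WriteLine(removed); // prints "one"
}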
You can do it with an Extension method:
public static TValue GetValueAndRemove<TKey, TValue>(this Dictionary<TKey, TValue> dict, TKey key)
{
TValue val = dict[key];
dict.Remove(key);
return val;
}
static void Main(string[] args)
{
Dictionary<int, string> a = new Dictionary<int, string>();
a.Add(1, "sdfg");
a.Add(2, "sdsdfgadfhfg");
string value = a.GetValueAndRemove<int, string>(1);
}
You can extend the class to add that functionality:
public class PoppableDictionary<T, V> : Dictionary<T, V>
{
public V Pop(T key)
{
V value = this[key];
this.Remove(key);
return value;
}
}

Updating fields of values in a ConcurrentDictionary

I am trying to update entries in a ConcurrentDictionary something like this:
class Class1
{
public int Counter { get; set; }
}
class Test
{
private ConcurrentDictionary<int, Class1> dict =
new ConcurrentDictionary<int, Class1>();
public void TestIt()
{
foreach (var foo in dict)
{
foo.Value.Counter = foo.Value.Counter + 1; // Simplified example
}
}
}
Essentially I need to iterate over the dictionary and update a field on each Value. I understand from the documentation that I need to avoid using the Value property. Instead I think I need to use TryUpdate except that I don’t want to replace my whole object. Instead, I want to update a field on the object.
After reading this blog entry on the PFX team blog, perhaps I need to use AddOrUpdate and simply do nothing in the add delegate.
Does anyone have any insight as to how to do this?
I have tens of thousands of objects in the dictionary which I need to update every thirty seconds or so. Creating new ones in order to update the property is probably not feasible. I would need to clone the existing object, update it and replace the one in the dictionary. I’d also need to lock it for the duration of the clone/add cycle. Yuck.
What I’d like to do is iterate over the objects and update the Counter property directly if possible.
My latest research has led me to to Parallel.ForEach which sounds great but it is not supposed to be used for actions that update state.
I also saw mention of Interlocked.Increment which sounds great but I still need to figure out how to use it on each element in my dictionary in a thread safe way.
First, to solve your locking problem:
class Class1
{
// this must be a variable so that we can pass it by ref into Interlocked.Increment.
private int counter;
public int Counter
{
get{return counter; }
}
public void Increment()
{
// this is about as thread safe as you can get.
// From MSDN: Increments a specified variable and stores the result, as an atomic operation.
Interlocked.Increment(ref counter);
// you can return the result of Increment if you want the new value,
// but DO NOT assign the result back to counter (i.e. counter = Interlocked.Increment(ref counter);) - that would break the atomicity.
}
}
Iterating just the values should be faster than iterating the key/value pairs. [Though I think iterating a list of keys and doing the look-ups will still be faster on the ConcurrentDictionary in most situations.]
class Test
{
private ConcurrentDictionary<int, Class1> dictionary = new ConcurrentDictionary<int, Class1>();
public void TestIt()
{
foreach (var foo in dictionary.Values)
{
foo.Increment();
}
}
public void TestItParallel()
{
Parallel.ForEach(dictionary.Values, x => x.Increment());
}
}
ConcurrentDictionary doesn't help you with accessing members of stored values concurrently, just with the elements themselves.
If multiple threads call TestIt, you should get a snapshot of the collection and lock the shared resources (which are the individual dictionary values):
foreach (KeyValuePair<int, Class1> kvp in dict.ToArray())
{
Class1 value = kvp.Value;
lock (value)
{
value.Counter = value.Counter + 1;
}
}
However, if you want to update the counter for a specific key, ConcurrentDictionary can help you with atomically adding a new key value pair if the key does not exist:
Class1 value = dict.GetOrAdd(42, key => new Class1());
lock (value)
{
value.Counter = value.Counter + 1;
}
AddOrUpdate and TryUpdate indeed are for cases in which you want to replace the value for a given key in a ConcurrentDictionary. But, as you said, you don't want to change the value, you want to change a property of the value.
You can use the AddOrUpdate function.
Here is how you can increment the current value by 1 (note this assumes a dictionary that stores the counter directly, e.g. a ConcurrentDictionary<int, int>, rather than the Class1 wrapper above):
dict.AddOrUpdate(key, 1, (k, oldValue) => oldValue + 1);
