Dictionary with item limit - C#

I need to provide access to a key/value pair store that persists for all users across sessions.
I could easily create a singleton for this, but for performance reasons I want to limit the size of the dictionary to 10,000 items (or whatever number performs well, since the object will persist indefinitely).
Is there a form of dictionary where I can specify a limit on the number of objects stored, and, when that limit is exceeded, remove the oldest entry?

There is no such built-in dictionary, but you can build your own. You will need a queue of keys - that lets you quickly find the oldest entry and remove it. You will also need a plain dictionary for keeping your values - that lets you look them up quickly:
public class SuperDictionary<TKey, TValue>
{
    private Dictionary<TKey, TValue> dictionary;
    private Queue<TKey> keys; // tracks insertion order, so the front is always the oldest key
    private int capacity;

    public SuperDictionary(int capacity)
    {
        this.keys = new Queue<TKey>(capacity);
        this.capacity = capacity;
        this.dictionary = new Dictionary<TKey, TValue>(capacity);
    }

    public void Add(TKey key, TValue value)
    {
        // Evict the oldest entry once the capacity is reached
        if (dictionary.Count == capacity)
        {
            var oldestKey = keys.Dequeue();
            dictionary.Remove(oldestKey);
        }
        dictionary.Add(key, value);
        keys.Enqueue(key);
    }

    public TValue this[TKey key]
    {
        get { return dictionary[key]; }
    }
}
NOTE: You can implement the IDictionary<TKey, TValue> interface to make this class a 'true' dictionary.
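For example, here is a quick sketch of the eviction behaviour once the capacity is reached (the keys and values here are made up):
var cache = new SuperDictionary<string, string>(2);
cache.Add("a", "1");
cache.Add("b", "2");
cache.Add("c", "3"); // capacity exceeded: "a", the oldest key, is evicted
// cache["b"] and cache["c"] still work; cache["a"] now throws KeyNotFoundException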

Use the Cache, rather than Session. It is not user-specific, and you can set the maximum size of the cache. When new items are added and the cache is full, it removes items to make space. It also allows for sophisticated aging mechanisms, such as items being removed after a fixed period of time, or a fixed period of time after their last use, with priorities taken into consideration when deciding what to remove.
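A minimal sketch of what that looks like, assuming an ASP.NET host (the method names and the 20-minute sliding window are placeholder choices):
using System;
using System.Web;
using System.Web.Caching;

public static class SharedStore
{
    public static void Put(string key, string value)
    {
        HttpRuntime.Cache.Insert(
            key,
            value,
            null,                        // no cache dependency
            Cache.NoAbsoluteExpiration,  // no fixed expiry time
            TimeSpan.FromMinutes(20),    // evict 20 minutes after last use
            CacheItemPriority.Normal,    // weighed when deciding what to remove
            null);                       // no removal callback
    }

    public static string Get(string key)
    {
        return (string)HttpRuntime.Cache[key]; // null if evicted or never added
    }
}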

No, there is no built-in dictionary that does this. In fact, all of the generic collections are unbounded.
However, you could easily make a Queue<KeyValuePair<string, int>> and a method that checks the count and dequeues when an element is added and the queue is too long. A Dictionary is a difficult choice here because there is no way to determine the "age" of an entry (unless you make it part of the key or value).
Something like:
public void AddDataToDictionary(string key, int value)
{
    if (queue.Count > 10000)
        queue.Dequeue(); // drop the oldest entry first
    queue.Enqueue(new KeyValuePair<string, int>(key, value));
}

Here's a dictionary implementation that has the following removal strategies:
EmptyRemovalStrategy<TKey> – Removes the first item in its internal collection. Does not track access in any way.
MruRemovalStrategy<TKey> – Removes the most recently used (most accessed) item in the CacheDictionary.
LruRemovalStrategy<TKey> – Removes the least recently used (least accessed) item in the CacheDictionary.
The CacheDictionary is a dictionary with a limited number of items, so you'd be able to specify a max size of 1000. With this implementation you can also determine the "age" of an entry and remove the least used (hence a cache):
http://alookonthecode.blogspot.com/2012/03/implementing-cachedictionarya.html

C# how to avoid duplicates in a list?

What approach could I use to avoid duplicates in a list?
One way is to check whether the element exists before I add a new item, but this makes me write more code and iterate over the whole list to check whether it exists.
Another way is to use a HashSet: if I try to add a new item, it checks whether the item exists; if not, it adds the new item, and if it exists, it does nothing.
But I know that a HashSet is less efficient and needs more resources than a list, so I don't know whether using a HashSet to avoid duplicates is a good use of it.
Are there any other alternatives?
Thanks.
You can achieve this in a single line of code:
List<long> longs = new List<long> { 1, 2, 3, 4, 3, 2, 5 };
List<long> unique = longs.Distinct().ToList();
unique will contain only 1, 2, 3, 4, 5.
You cannot avoid duplicates in a List - there is no verification of items.
If you don't care about the order of items, use a HashSet.
If you want to preserve the order of items (there is actually a little ambiguity here - should an item appear at the index of its first addition or of its last addition?) while ensuring that all items are unique, then you should write your own list class, i.e. something which implements the IList<T> interface:
public class ListWithoutDuplicates<T> : IList<T>
And you have different options here. E.g. you should decide what is more important to you - fast addition or low memory consumption - because for fast Add and Contains operations you should use some hash-based data structure, which is unordered. Here is a sample implementation that uses a HashSet to store the hashes of all items held in the internal list. You will need the following fields:
private readonly HashSet<int> hashes = new HashSet<int>();
private readonly List<T> items = new List<T>();
private static readonly Comparer<T> comparer = Comparer<T>.Default;
Adding items is simple (warning: no null checks here or below) - use the item's hash code to quickly (in O(1)) check whether it has already been added. Use the same approach for removing items:
public void Add(T item)
{
    // Note: distinct items whose hash codes collide will be treated as duplicates here
    var hash = item.GetHashCode();
    if (hashes.Contains(hash))
        return;
    hashes.Add(hash);
    items.Add(item);
}
public bool Remove(T item)
{
    var hash = item.GetHashCode();
    if (!hashes.Contains(hash))
        return false;
    hashes.Remove(hash);
    return items.Remove(item);
}
Some index-based operations:
public int IndexOf(T item)
{
    var hash = item.GetHashCode();
    if (!hashes.Contains(hash))
        return -1;
    return items.IndexOf(item);
}
public void Insert(int index, T item)
{
    var itemAtIndex = items[index];
    if (comparer.Compare(item, itemAtIndex) == 0)
        return; // the same item is already at this index
    var hash = item.GetHashCode();
    if (!hashes.Contains(hash))
    {
        // replace the item currently at this index
        hashes.Remove(itemAtIndex.GetHashCode());
        items[index] = item;
        hashes.Add(hash);
        return;
    }
    throw new ArgumentException("Cannot add duplicate item");
}
public void RemoveAt(int index)
{
    var item = items[index];
    hashes.Remove(item.GetHashCode());
    items.RemoveAt(index);
}
And the remaining members:
public T this[int index]
{
    get { return items[index]; }
    set { Insert(index, value); }
}
public int Count => items.Count;
public bool Contains(T item) => hashes.Contains(item.GetHashCode());
public IEnumerator<T> GetEnumerator() => items.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => items.GetEnumerator();
That's it. Now you have a list implementation which will add each item only once (the first time). E.g.
var list = new ListWithoutDuplicates<int> { 1, 2, 1, 3, 5, 2, 5, 3, 4 };
will create a list with the items 1, 2, 3, 5, 4. Note: if memory consumption is more important than performance, then instead of storing hashes use an items.Contains check, which is O(n).
BTW, what we just did is actually an IList decorator.
A List is a data-structure that may contain duplicates. Duplicate elements are disambiguated by their index.
One way is to check whether the element exists before I add a new item, but this makes me write more code and iterate over the whole list to check whether it exists.
This is possible, but it is error-prone and slow. You will need to iterate through the entire list every time you want to add an element. It is also possible that you will forget to check somewhere in your code.
Another way is to use a HashSet: if I try to add a new item, it checks whether the item exists; if not, it adds the new item, and if it exists, it does nothing.
This is the preferred way. It is best to use the standard library to enforce the constraints that you want.
But I know that a HashSet is less efficient and needs more resources than a list, so I don't know whether using a HashSet to avoid duplicates is a good use of it.
The efficiency depends on what you are trying to do; see https://stackoverflow.com/a/23949528/1256041.
Are there any other alternatives?
You could implement your own ISet using List. This would make insertion much slower (you would need to iterate the whole collection), but you would gain O(1) random-access.
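A minimal sketch of that idea (only the Add path and indexer are shown; a full ISet<T> has many more members, and the class name is made up):
public class ListBackedSet<T>
{
    private readonly List<T> items = new List<T>();

    public bool Add(T item)
    {
        if (items.Contains(item)) // O(n) scan on every insert
            return false;
        items.Add(item);
        return true;
    }

    public T this[int index]
    {
        get { return items[index]; } // O(1) random access
    }
}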
A HashSet is the best way to check whether an item exists, because it's O(1). So you can insert the items into both a list and a HashSet, and before inserting a new item, check whether it already exists in the HashSet.
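A minimal sketch of that approach (the class name is made up; HashSet<T> requires .NET 3.5 or later):
public class UniqueList<T>
{
    private readonly List<T> items = new List<T>();
    private readonly HashSet<T> seen = new HashSet<T>();

    public bool Add(T item)
    {
        if (!seen.Add(item)) // HashSet<T>.Add returns false if the item is already present
            return false;
        items.Add(item);     // the list keeps insertion order
        return true;
    }

    public IEnumerable<T> Items
    {
        get { return items; }
    }
}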

Make an existing Dictionary case insensitive in .NET

I know how to make a new dictionary case insensitive with the code below:
var caseInsensitiveDictionary = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
But I'm using WebApi which serializes JSON objects into a class we've created.
public class Notification : Common
{
    public Notification();

    [JsonProperty("substitutionStrings")]
    public Dictionary<string, string> SubstitutionStrings { get; set; }
}
So besides rebuilding the dictionary after receiving the "Notification" object, is there a way to set this dictionary to case insensitive in the first place or after it's been created?
Thanks
So besides rebuilding the dictionary after receiving the "Notification" object, is there a way to set this dictionary to case insensitive in the first place or after it's been created?
No, it is impossible. You need to create a new dictionary.
Currently the dictionary has all of the keys in various different buckets; changing the comparer would mean that a bunch of keys would all suddenly be in the wrong buckets. You'd need to go through each key and re-compute where it needs to go and move it, which is basically the same amount of work as creating a new dictionary would be.
Whenever an item is added to a dictionary, the dictionary will compute its hash code and make note of it. Whenever a dictionary is asked to look up an item, the dictionary will compute the hash code on the item being sought and assume that any item in the dictionary which had returned a different hash code cannot possibly match it, and thus need not be examined.
In order for a dictionary to regard "FOO", "foo", and "Foo" as equal, the hash code function it uses must yield the same value for all of them. If a dictionary was built using a hash function which returns different values for "FOO", "foo", and "Foo", changing to a hash function which yielded the same value for all three strings would require that the dictionary re-evaluate the hash value of every item contained therein. Doing this would require almost as much work as building a new dictionary from scratch, and for that reason .NET does not support any means of changing the hash function associated with a dictionary other than copying all the items from the old dictionary to a new dictionary, abandoning the old one.
Note that one could design a SwitchablyCaseSensitiveComparator whose GetHashCode() method would always return a case-insensitive hash value, but whose Equals method could be switched between case-sensitive and case-insensitive operation. If one were to implement such a thing, one could add items to a dictionary and then switch between case-sensitive and case-insensitive modes. The biggest problem with doing that is that if the dictionary is in case-sensitive mode when two items are added which differ only in case, attempts to retrieve either of those items when the dictionary is in case-insensitive mode might not behave as expected. Populating a dictionary in case-insensitive mode and performing some look-ups in case-sensitive mode should be relatively safe, however.
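A rough sketch of such a comparer as described above (the class itself and its toggle property are hypothetical, not a library type):
public class SwitchablyCaseSensitiveComparator : IEqualityComparer<string>
{
    // Hypothetical toggle between case-sensitive and case-insensitive equality
    public bool CaseSensitive { get; set; }

    public bool Equals(string x, string y)
    {
        return CaseSensitive
            ? string.Equals(x, y, StringComparison.Ordinal)
            : string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string obj)
    {
        // Always case-insensitive, so an item's bucket never depends on the mode
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj);
    }
}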
Try changing your class definition to something like this:
public class Notification : Common
{
    public Notification()
    {
        this.substitutionStringsBackingStore =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    }

    [JsonProperty("substitutionStrings")]
    public Dictionary<string, string> SubstitutionStrings
    {
        get { return substitutionStringsBackingStore; }
        set { substitutionStringsBackingStore = value; }
    }

    private Dictionary<string, string> substitutionStringsBackingStore;
}
You do have to re-create the dictionary, but this can be done with extensions:
public static class Extensions
{
    public static Dictionary<string, T> MakeCI<T>(this Dictionary<string, T> dictionary)
    {
        return dictionary.ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase);
    }
}
I've specified string type for the key as this is what we want to be CI, but the value can be any type.
You would use it like so:
myDict = myDict.MakeCI();

Is this algorithm implementation LRU or MRU?

I am working on implementing an MRU (Most Recently Used) cache in my project using C#.
I googled some concepts and implementations of MRU, and of its opposite, LRU (Least Recently Used), and found this article http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=626 that describes the implementation of an MRU collection in C#.
What confuses me is that I think this implementation is LRU rather than MRU. Could anyone help me confirm whether this collection class is MRU or not?
The following code block is the whole MruDictionary class. Thanks.
class MruDictionary<TKey, TValue>
{
    private LinkedList<MruItem<TKey, TValue>> items;
    private Dictionary<TKey, LinkedListNode<MruItem<TKey, TValue>>> itemIndex;
    private int maxCapacity;

    public MruDictionary(int cap)
    {
        maxCapacity = cap;
        items = new LinkedList<MruItem<TKey, TValue>>();
        itemIndex = new Dictionary<TKey, LinkedListNode<MruItem<TKey, TValue>>>(maxCapacity);
    }

    public void Add(TKey key, TValue value)
    {
        if (itemIndex.ContainsKey(key))
        {
            throw new ArgumentException("An item with the same key already exists.");
        }
        if (itemIndex.Count == maxCapacity)
        {
            // Why do we remove the last node rather than the first here?
            // The node accessed most recently is moved to the front of the list.
            LinkedListNode<MruItem<TKey, TValue>> node = items.Last;
            items.RemoveLast();
            itemIndex.Remove(node.Value.Key);
        }
        var newNode = new LinkedListNode<MruItem<TKey, TValue>>(new MruItem<TKey, TValue>(key, value));
        items.AddFirst(newNode);
        itemIndex.Add(key, newNode);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        LinkedListNode<MruItem<TKey, TValue>> node;
        if (itemIndex.TryGetValue(key, out node))
        {
            value = node.Value.Value;
            // Move the accessed node to the front of the list
            items.Remove(node);
            items.AddFirst(node);
            return true;
        }
        value = default(TValue);
        return false;
    }
}

class MruItem<TKey, TValue>
{
    private TKey _key;
    private TValue _value;

    public MruItem(TKey k, TValue v)
    {
        _key = k;
        _value = v;
    }

    public TKey Key
    {
        get { return _key; }
    }

    public TValue Value
    {
        get { return _value; }
    }
}
http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
Most Recently Used (MRU): discards, in contrast to LRU, the most recently used items first.
According to my understanding, since the node accessed most recently is moved to the front of the list, when the cache is full we should remove the first node of the list rather than the last.
It looks to me like an MRU implementation. Notice how searches start from the beginning of the linked list and go back, and whenever a node is accessed it's moved to the front of the list. In Add(), the node is added using AddFirst(), and in TryGetValue(), it removes the node and adds it to the front of the list.
Based on what is documented here: http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
It's LRU. Think of the items as an "ordered" list.
The most recently used item is at the "front".
When a new item is added, they call items.AddFirst(newNode); which adds it to the front of the list.
When an item is "touched", they move it to the front of the list using these calls:
items.Remove(node);
items.AddFirst(node);
When the list is full, it evicts the "last"/"oldest" item from the list using items.RemoveLast();
The cache is removing the "least recently used" items first when it hits capacity.
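For contrast, a true MRU eviction would discard the front of the list (the most recently used node) instead of the back. A sketch of just that eviction step, reusing the names from the code above:
if (itemIndex.Count == maxCapacity)
{
    LinkedListNode<MruItem<TKey, TValue>> node = items.First; // most recently used
    items.RemoveFirst();
    itemIndex.Remove(node.Value.Key);
}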
Microsoft's "MRU" lists correctly use an LRU cache replacement algorithm.
Note that Microsoft in this case uses different terminology for MRU lists than the cache community.
The cache community uses MRU / LRU to talk about replacement (or eviction) strategies. When your cache is full, and you need to put a new item in the list, which item should be removed from the list?
Microsoft provides tools for getting the most recently used items, like for a drop down or a recent documents list.
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/mru-source-list-functions
https://www.codeproject.com/articles/78/most-recently-used-list-in-a-combobox
This means that to correctly implement an MRU list, you need to implement an LRU Cache eviction strategy.

Reverse lookup a nested dictionary

Dictionary<string, Dictionary<string, ... >...> nestedDictionary;
The above dictionary has a one-to-many relationship at each level, from top to bottom. Adding an item is pretty easy, since we have the leaf object and we start from the bottom, creating dictionaries and adding each one to the relevant parent.
My problem is when I want to find an item in the inner dictionaries. There are two options:
1. Nested foreach loops to find the item, then snapshot all the loops at the moment we find the item and exit all loops. Then we know the item's pedigree: string1->string2->...->stringN. Problems with this solution: A) performance; B) thread safety (since I want to remove the item, then its parent if it has no children, then that parent's parent if it has no children, and so on).
2. Creating a reverse look-up dictionary and indexing added items: something like a Tuple for all the outer dictionaries. Then add the item as the key and all the outer parents as Tuple members. Problems: A) redundancy; B) keeping the reverse look-up dictionary synchronized with the main dictionary.
Any idea for a fast and thread-safe solution?
It looks like you actually have more than two levels of Dictionary. Since you cannot support a variable number of dictionaries using this type syntax:
Dictionary<string, Dictionary<string, ... >...> nestedDictionary;
I can only assume that it is some number greater than two. Let's say that it's three. For any data structure you construct, you have an intended use and operations that you want to perform efficiently.
I'm going to assume you need calls like this:
var dictionary = new ThreeLevelDictionary();
dictionary.Add(string1, string2, string3, value);
var value = dictionary[string1, string2, string3];
dictionary.Remove(string1, string2, string3);
And (critical to the question) the reverse lookup you are describing:
var strings = dictionary.FindKeys(value);
If these are the operations that you need to perform and to perform quickly, then one data structure that you can use is a Dictionary with a Tuple key:
public class ThreeLevelDictionary<TValue> : Dictionary<Tuple<string, string, string>, TValue>
{
    public void Add(string s1, string s2, string s3, TValue value)
    {
        Add(Tuple.Create(s1, s2, s3), value);
    }

    public TValue this[string s1, string s2, string s3]
    {
        get { return this[Tuple.Create(s1, s2, s3)]; }
        set { this[Tuple.Create(s1, s2, s3)] = value; }
    }

    public void Remove(string s1, string s2, string s3)
    {
        Remove(Tuple.Create(s1, s2, s3));
    }

    public IEnumerable<string> FindKeys(TValue value)
    {
        // Linear scan over all keys; a reverse-lookup index would speed this up
        foreach (var key in Keys)
        {
            if (EqualityComparer<TValue>.Default.Equals(this[key], value))
                return new string[] { key.Item1, key.Item2, key.Item3 };
        }
        throw new InvalidOperationException("missing value");
    }
}
Now you are perfectly positioned to create a reverse-lookup hashtable using another Dictionary if performance indicates that this is a bottleneck.
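A sketch of what that reverse index might look like inside the class above, assuming each value appears under at most one key tuple (otherwise map each value to a list of key tuples; the AddWithIndex name is made up):
private readonly Dictionary<TValue, Tuple<string, string, string>> reverseIndex =
    new Dictionary<TValue, Tuple<string, string, string>>();

public void AddWithIndex(string s1, string s2, string s3, TValue value)
{
    var key = Tuple.Create(s1, s2, s3);
    Add(key, value);           // update the main dictionary
    reverseIndex[value] = key; // keep the reverse index in sync
}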
If the operations listed above are not the ones you want to perform, then this data structure might not meet your needs. Either way, if you first describe the interface that summarizes what you want the data structure to do, it's easier to see whether there are other alternatives.
Although I have little direct experience with the C5 collection library, it sounds like you could use their TreeDictionary class. It comes with a whole suite of useful methods for finding, iterating and modifying the tree, and is surprisingly well documented.
Another option would be to use the QuickGraph library (which you can find in NuGet or on codeplex). This involves some knowledge of graph theory but is otherwise a very useful library.
Both libraries require you to handle concurrency, just like the standard BCL collections.

Removing duplicate string from List (.NET 2.0!)

I'm having issues finding the most efficient way to remove duplicates from a list of strings (List<string>).
My current implementation is a dual foreach loop checking that the instance count of each object is only 1, otherwise removing the second occurrence.
I know there are MANY other questions out there, but all the best solutions require something above .NET 2.0, which is the current build environment I'm working in. (GM and Chrysler are very resistant to change... :) )
This rules out any LINQ or HashSet-based solutions.
The code I'm using is Visual C++, but a C# solution will work just fine as well.
Thanks!
This probably isn't what you're looking for, but if you have control over this, the most efficient way would be to not add them in the first place...
Do you have control over this? If so, all you'd need to do is a myList.Contains(currentItem) call before you add the item, and you're set.
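That is, something along these lines (a sketch; myList and currentItem stand in for your own variables):
if (!myList.Contains(currentItem)) // O(n) scan, but no duplicate ever enters the list
    myList.Add(currentItem);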
You could do the following.
List<string> list = GetTheList();
Dictionary<string, object> map = new Dictionary<string, object>();
int i = 0;
while (i < list.Count)
{
    string current = list[i];
    if (map.ContainsKey(current))
    {
        list.RemoveAt(i); // duplicate: remove it and don't advance the index
    }
    else
    {
        i++;
        map.Add(current, null);
    }
}
This has the overhead of building a Dictionary<TKey,TValue> object which will duplicate the list of unique values in the list. But it's fairly efficient speed wise.
I'm no Comp Sci PhD, but I'd imagine using a dictionary, with the items in your list as the keys would be fast.
Since a dictionary doesn't allow duplicate keys, you'd only have unique strings at the end of iteration.
Just remember, when providing a custom class, to override the Equals() method (and GetHashCode(), if the type will be used as a dictionary key) in order for Contains() to function as required.
Example
List<CustomClass> clz = new List<CustomClass>();

public class CustomClass
{
    public override bool Equals(object obj)
    {
        // Put equality comparison code here...
        throw new NotImplementedException();
    }
}
If you're going the route of "just don't add duplicates", then checking List.Contains before adding an item works, but it's O(n²) where n is the number of strings you want to add. It's no different from your current solution using two nested loops.
You'll have better luck using a hash set to store items you've already added, but since you're using .NET 2.0, a Dictionary can substitute for a hash set:
static List<T> RemoveDuplicates<T>(List<T> input)
{
    List<T> result = new List<T>(input.Count);
    Dictionary<T, object> hashSet = new Dictionary<T, object>();
    foreach (T s in input)
    {
        if (!hashSet.ContainsKey(s))
        {
            result.Add(s);        // first occurrence: keep it
            hashSet.Add(s, null); // remember that we've seen it
        }
    }
    return result;
}
This runs in O(n) and uses O(n) extra space; it will generally work very well for up to 100K items. Actual performance depends on the average length of the strings - if you really need maximum performance, you can exploit more powerful data structures like tries to make inserts even faster.
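For example, a quick sketch of calling the helper above (plain Add calls, since C# 2.0 has no collection initializers):
List<string> names = new List<string>();
names.Add("apple"); names.Add("pear"); names.Add("apple"); names.Add("plum");
List<string> unique = RemoveDuplicates(names); // "apple", "pear", "plum" - order preserved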
