Retrieving Dictionary Value Best Practices

I just recently noticed Dictionary.TryGetValue(TKey key, out TValue value) and was curious as to which is the better approach to retrieving a value from the Dictionary.
I've traditionally done:
if (myDict.ContainsKey(someKey))
    someVal = myDict[someKey];
...
unless I know it has to be in there.
Is it better to just do:
if (myDict.TryGetValue(someKey, out someVal))
...
Which is the better practice? Is one faster than the other? I would imagine that the Try version would be slower, as it's 'swallowing' a try/catch inside itself and using that as logic, no?

TryGetValue is slightly faster, because FindEntry will only be called once.
How much faster? It depends on the dataset at hand. When you call the ContainsKey method, Dictionary does an internal search to find its index. If it returns true, you need another index search to get the actual value. When you use TryGetValue, it searches only once for the index and if found, it assigns the value to your variable.
FYI: It's not actually catching an error.
It's calling:
public bool TryGetValue(TKey key, out TValue value)
{
    int index = this.FindEntry(key);
    if (index >= 0)
    {
        value = this.entries[index].value;
        return true;
    }
    value = default(TValue);
    return false;
}
ContainsKey is this:
public bool ContainsKey(TKey key)
{
    return (this.FindEntry(key) >= 0);
}

Well, in fact TryGetValue is faster. How much faster? It depends on the dataset at hand. When you call the ContainsKey method, Dictionary does an internal search to find its index. If it returns true, you need another index search to get the actual value. When you use TryGetValue, it searches only once for the index and if found, it assigns the value to your variable.
Edit:
Ok, I understand your confusion so let me elaborate:
Case 1:
if (myDict.ContainsKey(someKey))
    someVal = myDict[someKey];
In this case there are 2 calls to FindEntry, one to check if the key exists and one to retrieve it.
Case 2:
myDict.TryGetValue(someKey, out someVal)
In this case there is only one call to FindEntry, because the resulting index is kept for the actual retrieval in the same method.
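To make the two cases concrete, here is a minimal sketch that puts them side by side. The class name, the sample dictionary and the -1 "not found" marker are made up for illustration and are not from the original posts; how much the single lookup saves would have to be measured on your own data.

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // Case 1: two lookups. ContainsKey searches for the key once,
    // and the indexer searches for it a second time.
    public static int GetWithContainsKey(Dictionary<string, int> dict, string key)
    {
        if (dict.ContainsKey(key))
        {
            return dict[key];
        }
        return -1; // hypothetical "not found" marker for this demo
    }

    // Case 2: one lookup. TryGetValue finds the entry and copies
    // the value out in the same call.
    public static int GetWithTryGetValue(Dictionary<string, int> dict, string key)
    {
        int value;
        return dict.TryGetValue(key, out value) ? value : -1;
    }

    public static void Main()
    {
        var dict = new Dictionary<string, int> { { "one", 1 }, { "two", 2 } };
        Console.WriteLine(GetWithContainsKey(dict, "two"));
        Console.WriteLine(GetWithTryGetValue(dict, "two"));
        Console.WriteLine(GetWithTryGetValue(dict, "missing"));
    }
}
```

Both helpers return the same results; the only difference is how many times the dictionary has to locate the key.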

I imagine that TryGetValue is doing something more like:
if (myDict.ReallyOptimisedVersionofContains(someKey))
{
    someVal = myDict[someKey];
    return true;
}
return false;
So hopefully no try/catch anywhere.
I think it is just a method of convenience really. I generally use it as it saves a line of code or two.

public bool TryGetValue(TKey key, out TValue value)
{
    int index = this.FindEntry(key);
    if (index >= 0)
    {
        value = this.entries[index].value;
        return true;
    }
    value = default(TValue);
    return false;
}
public bool ContainsKey(TKey key)
{
    return (this.FindEntry(key) >= 0);
}
As you can see, TryGetValue is the same as ContainsKey plus one array lookup.
If your logic only checks whether the key exists in the Dictionary and does nothing else with that key (such as taking its value), you should use ContainsKey.
Try also checking this similar question: is-there-a-reason-why-one-should-use-containskey-over-trygetvalue

Related

Best way to store list of string pairs for optimal query performance

Right now I use Dictionary to store some configuration data in my app. The data gets added to Dictionary only once but it gets very frequent queries. Dictionary has around 2500 items, all "keys" are unique.
So right now I have something like this:
private Dictionary<string, string> Data;
public string GetValue(string key) // This gets hit very often
{
    string value;
    if (this.Data.TryGetValue(key, out value))
    {
        return value;
    }
    ...
}
Is there a more optimal way to do this?

What you have is pretty efficient. The only way to improve performance that I can think of is to use int as the dictionary key, instead of string. You would need to run performance tests to see how much it makes a difference in your use case -- it may or may not be significant.
And I would use an enum for storing the settings for convenience. Of course, this assumes you have a known set of settings.
private Dictionary<int, string> Data;
public string GetValue(MyAppSettingsEnum key)
{
    string value;
    if (this.Data.TryGetValue((int)key, out value))
    {
        return value;
    }
    ...
}
Note that I don't use the enum directly as the dictionary key, as it is more efficient to use an int as the key. More details on that issue here.

Using TryGetValue is a pretty optimal way of returning an item so there's not much you can improve on that front. However, if this isn't causing a bottleneck at the moment, I wouldn't worry too much about trying to optimize TryGetValue.
One thing that you can do, but isn't shown in your code so I don't know if you are, is to create a Dictionary object with an estimated capacity. Since you seem to know the rough number of items that will be expected, creating the Dictionary with that capacity will improve performance as it would reduce the number of times .NET has to resize the dictionary.
From MSDN:
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary.
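As a rough sketch of that suggestion: the Dictionary constructor has an overload that takes an initial capacity. The class name, the helper and the generated keys below are invented for illustration; the 2500 figure is taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Builds a dictionary that is pre-sized for the expected number of items,
    // so it should not need to grow while the items are added.
    public static Dictionary<string, string> BuildSettings(int expectedCount)
    {
        var data = new Dictionary<string, string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            // Hypothetical keys and values, standing in for real configuration data.
            data.Add("key" + i, "value" + i);
        }
        return data;
    }

    public static void Main()
    {
        Dictionary<string, string> data = BuildSettings(2500);
        Console.WriteLine(data.Count);
    }
}
```

Note that this only affects the cost of filling the dictionary; it does not change how a single TryGetValue call works.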

The only faster way is using an array if your keys are int and have a short range.

As you can see from the source code of System.Collections.Generic.Dictionary (available at http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs) the most frequent code used in your case is
private int FindEntry(TKey key) {
    if( key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }
    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}
As you can see, the lookup is fast if comparer.GetHashCode is fast and produces a well-distributed hash code, ideally a perfect hash function.
The dictionary construction code is not visible in your example, but if you use the default constructor then the dictionary will use the default comparer EqualityComparer<string>.Default.
Providing your own comparer with a time- and space-efficient hash function might speed up the code.
If you don't know what a good hash function should look like in your case, then using interned strings may also give you some boost (see http://www.dotnetperls.com/string-intern or MSDN: String.Intern Method).
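For reference, this is roughly how a custom comparer is plugged in. The comparer below uses an FNV-1a hash purely as an example; it is my own illustration, not something from the question, and there is no guarantee it beats the default string comparer. Measure before adopting anything like it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: ordinal equality plus a hand-written FNV-1a hash.
public sealed class Fnv1aStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        unchecked
        {
            uint hash = 2166136261;      // FNV offset basis (32-bit)
            foreach (char c in s)
            {
                hash ^= c;
                hash *= 16777619;        // FNV prime (32-bit)
            }
            return (int)hash;
        }
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        // The comparer is passed to the Dictionary constructor.
        var data = new Dictionary<string, string>(new Fnv1aStringComparer());
        data.Add("timeout", "30");

        string value;
        Console.WriteLine(data.TryGetValue("timeout", out value) ? value : "(missing)");
    }
}
```

The dictionary calls GetHashCode and Equals on the supplied comparer for every lookup, which is why the cost and the distribution of the hash matter.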

Find next incremental value not in existing list using linq

I have two methods in an IntExtensions class to help generate the next available incremental value (which is not in a list of existing integers which need to be excluded).
I don't think I'm addressing the NextIncrementalValueNotInList method in the best way, and am wondering if I can better use LINQ to return the next available int?
public static bool IsInList(this int value, List<int> ListOfIntegers) {
    if (ListOfIntegers.Contains(value))
        return true;
    return false;
}
public static int NextIncrementalValueNotInList(this int value,
    List<int> ListOfIntegers) {
    int maxResult;
    maxResult = ListOfIntegers.Max() + 1;
    for (int i = value; i <= maxResult; i++)
    {
        if (!(i.IsInList(ListOfIntegers)))
        {
            return i;
        }
    }
    return maxResult;
}

Using LINQ, your method will look like:
return Enumerable.Range(1, ListOfIntegers.Count + 1)
    .Except(ListOfIntegers)
    .First();

I assume it starts at 1.
You could also proceed like this:
Enumerable.Range(1, ListOfIntegers.Count)
    .Where(i => !ListOfIntegers.Contains(i))
    .Union(new []{ ListOfIntegers.Count + 1 })
    .First();
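Both snippets above start counting at 1. If the search should start at the given value instead, as in the question's method, a sketch could look like this. The class and method names are my own; the idea is that a range of Count + 1 candidates must contain at least one value that is not in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NextValueDemo
{
    // Returns the smallest value >= start that does not occur in existing.
    public static int NextFree(int start, List<int> existing)
    {
        // existing holds at most Count distinct values, so among
        // Count + 1 consecutive candidates at least one survives Except.
        return Enumerable.Range(start, existing.Count + 1)
            .Except(existing)
            .First();
    }

    public static void Main()
    {
        var existing = new List<int> { 1, 2, 4 };
        Console.WriteLine(NextFree(1, existing));
        Console.WriteLine(NextFree(4, existing));
    }
}
```

With the list { 1, 2, 4 }, starting at 1 yields 3 and starting at 4 yields 5.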

You don't actually need to calculate the Max value - just keep incrementing i until you find a value that doesn't exist in the list, e.g:
public static int NextIncrementalValueNotInList(this int value,
    List<int> ListOfIntegers)
{
    int i = value;
    while (true)
    {
        if (!i.IsInList(ListOfIntegers))
        {
            return i;
        }
        i++;
    }
}
Besides that, I'm not sure if there's much more you can do about this unless:
ListOfIntegers is guaranteed to be, or needs to be, sorted, or
ListOfIntegers doesn't actually need to be a List<int>
If the answer to the first is no, and to the second is yes, then you might instead use a HashSet<int>, which might provide a faster implementation by allowing you to simply use HashSet<T>'s own bool Contains(T) method:
public static int NextIncrementalValueNotInList(this int value,
    HashSet<int> ListOfIntegers)
{
    int i = value;
    while (true)
    {
        if (!ListOfIntegers.Contains(i))
        {
            return i;
        }
        i++;
    }
}
Note that this version shows how to do away with the Max check also.
Although be careful of premature optimisation - if your current implementation is fast enough, then I wouldn't worry. You should properly benchmark any alternative solution with extreme cases as well as real-world cases to see if there's actually any difference.
Also what you don't want to do is use my suggestion above by turning your list into a HashSet for every call. I'm suggesting changing entirely your use of List to HashSet - any piecemeal conversion per-call will negate any potential performance benefits due to the overhead of creating the HashSet.
Finally, if you're not actually expecting much fragmentation in your integer list, then it's possible that a HashSet might not be much different from the current Linq version, because it's possibly going to end up doing similar amounts of work anyway.

Return the value of a dictionary element when the key exists

How do I return the value of a Dictionary<string, int> element when the key is found for the first time?
I'm trying the following code, but I'm sure I'm doing something wrong, because it takes a long time to return the value.
private int GetIndex(string term)
{
    int index = 0;
    foreach (var entry in dic)
    {
        var word = entry.Key;
        var wordFreq = entry.Value;
        if (word == term)
            index = wordFreq;
    }
    return index;
}
Can someone help please? Thanks a lot.

Just request it directly:
return dic[term];
That should do the trick!
But if you would like to return 0 when it doesn't exist, go this way:
int i;
if (dic.TryGetValue(term, out i))
    return i;
else
    return 0;

Dictionaries are not meant to be used in a linear fashion, and one does not index into them via a number. Each key is hashed, and the resulting hash code determines the internal location of the entry to be returned. Yes, one can enumerate over a dictionary as you have, but that is not how a dictionary is meant to be used.

Dictionary has things defined for this already. If you want it to throw an exception if the key is not found, use the indexer property, e.g. dic[term]. If you don't want it to throw, but instead get a bool saying whether it was found, use the TryGetValue method, e.g.
int result;
if (dic.TryGetValue(term, out result))
// do something

Critique this C# Hashmap Implementation?

I wrote a hashmap in C# as a self-study exercise. I wanted to implement chaining as a collision handling technique. At first I thought I'd simply use GetHashCode as my hashing algorithm, but I quickly found that using the numbers returned by GetHashCode would not always be viable (the size of the int causes an out-of-memory error if you want to index an array by the number, and the numbers can be negative :(). So, I came up with a kludgey method of narrowing the numbers (see MyGetHashCode).
Does anyone have any pointers/tips/criticism for this implementation (of the hash function and in general)? Thanks in advance!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace HashMap
{
    class Program
    {
        public class MyKVP<T, K>
        {
            public T Key { get; set; }
            public K Value { get; set; }
            public MyKVP(T key, K value)
            {
                Key = key;
                Value = value;
            }
        }
        public class MyHashMap<T, K> : IEnumerable<MyKVP<T,K>>
            where T:IComparable
        {
            private const int map_size = 5000;
            private List<MyKVP<T,K>>[] storage;
            public MyHashMap()
            {
                storage = new List<MyKVP<T,K>>[map_size];
            }
            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
            public IEnumerator<MyKVP<T, K>> GetEnumerator()
            {
                foreach (List<MyKVP<T, K>> kvpList in storage)
                {
                    if (kvpList != null)
                    {
                        foreach (MyKVP<T, K> kvp in kvpList)
                        {
                            yield return kvp;
                        }
                    }
                }
            }
            private int MyGetHashCode(T key)
            {
                int i = key.GetHashCode();
                if (i<0) i=i*-1;
                return i / 10000;
            }
            public void Add(T key, K data)
            {
                int value = MyGetHashCode(key);
                SizeIfNeeded(value);
                //is this spot in the hashmap null?
                if (storage[value] == null)
                {
                    //create a new chain
                    storage[value] = new List<MyKVP<T, K>>();
                    storage[value].Add(new MyKVP<T, K>(key, data));
                }
                else
                {
                    //is this spot taken?
                    MyKVP<T, K> myKvp = Find(value, key);
                    if (myKvp != null) //key exists, throw
                    {
                        throw new Exception("This key exists. no soup for you.");
                    }
                    //if we didn't throw, then add us
                    storage[value].Add(new MyKVP<T, K>(key, data));
                }
            }
            private MyKVP<T, K> Find(int value, T key)
            {
                foreach (MyKVP<T, K> kvp in storage[value])
                {
                    if (kvp.Key.CompareTo(key) == 0)
                    {
                        return kvp;
                    }
                }
                return null;
            }
            private void SizeIfNeeded(int value)
            {
                if (value >= storage.Length)
                {
                    List<MyKVP<T, K>>[] temp = storage;
                    storage = new List<MyKVP<T, K>>[value+1];
                    Array.Copy(temp, storage, temp.Length);
                }
            }
            public K this[T key]
            {
                get
                {
                    int value = MyGetHashCode(key);
                    if (value > storage.Length) { throw new IndexOutOfRangeException("Key does not exist."); }
                    MyKVP<T, K> myKvp = Find(value, key);
                    if (myKvp == null) throw new Exception("key does not exist");
                    return myKvp.Value;
                }
                set
                {
                    Add(key, value);
                }
            }
            public void Remove(T key)
            {
                int value = MyGetHashCode(key);
                if (value > storage.Length) { throw new IndexOutOfRangeException("Key does not exist."); }
                if (storage[value] == null) { throw new IndexOutOfRangeException("Key does not exist."); }
                //loop through each kvp at this hash location
                MyKVP<T, K> myKvp = Find(value, key);
                if (myKvp != null)
                {
                    storage[value].Remove(myKvp);
                }
            }
        }
        static void Main(string[] args)
        {
            MyHashMap<string, int> myHashMap = new MyHashMap<string, int>();
            myHashMap.Add("joe", 1);
            myHashMap.Add("mike", 2);
            myHashMap.Add("adam", 3);
            myHashMap.Add("dad", 4);
            Assert.AreEqual(1, myHashMap["joe"]);
            Assert.AreEqual(4, myHashMap["dad"]);
            Assert.AreEqual(2, myHashMap["mike"]);
            Assert.AreEqual(3, myHashMap["adam"]);
            myHashMap.Remove("joe");
            try
            {
                if (myHashMap["joe"] == 3) { }; //should throw
            }
            catch (Exception)
            {
                try { myHashMap.Add("mike",1); }
                catch (Exception) {
                    foreach (MyKVP<string, int> kvp in myHashMap)
                    {
                        Console.WriteLine(kvp.Key + " " + kvp.Value.ToString());
                    }
                    return;
                }
            }
            throw new Exception("fail");
        }
    }
}

Your hash method is of a fixed range. This means that a single item could cause 214748 buckets to be created (if its hashcode rehashed to 214747). A more commonly used (and almost always better) approach is to start with an initial size that is either known (due to knowledge of the domain) to be big enough for all values, or to start small and have the hashmap resize itself as appropriate. With re-probing, the obvious measure of a need to resize is how much re-probing was needed. With chaining, as you are experimenting with here, you'll want to keep both average and maximum chain sizes down. This keeps down your worst-case lookup time, and hence keeps your average lookup time closer to the best-case O(1).
The two most common approaches to such hashing (and hence to initial table size) are to use either prime numbers or powers of two. The former is considered (though there is some contention on the point) to offer better distribution of keys, while the latter allows for faster computation (both cases do a modulo on the input hash, but with a number known to be a power of 2, the modulo can be quickly done as a binary-and operation). Another advantage of using a power of two when you are chaining is that it's possible to test a chain to see if resizing the hash would actually cause that chain to be split or not (if you have an 8-value table and there's a chain whose hashes are all either 17, 1 or 33, then doubling the table size would still leave them in the same chain, but quadrupling it would re-distribute them).
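A small sketch of the power-of-two point, using the hashes 17, 1 and 33 from the example above. The class and method names are invented for illustration, and the mask trick assumes a non-negative hash and a table size that really is a power of two.

```csharp
using System;

public static class BucketDemo
{
    // General case: works for any positive table size.
    public static int BucketByModulo(int hash, int tableSize)
    {
        return hash % tableSize;
    }

    // Power-of-two case: for a non-negative hash this gives the same
    // result as the modulo, using a single binary-and.
    public static int BucketByMask(int hash, int tableSize)
    {
        return hash & (tableSize - 1);
    }

    public static void Main()
    {
        int[] hashes = { 17, 1, 33 };
        foreach (int size in new[] { 8, 16, 32 })
        {
            foreach (int hash in hashes)
            {
                Console.WriteLine("size {0}: hash {1} -> bucket {2}",
                    size, hash, BucketByMask(hash, size));
            }
        }
    }
}
```

At sizes 8 and 16 all three hashes land in bucket 1; at size 32 the hash 17 moves to bucket 17 while 1 and 33 stay in bucket 1, which is the split described above.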
You don't have a method offering replace semantics, which is usual with .NET dictionary types (where adding will error if there's already an item with that key, but assigning to an index won't).
Your error on a retrieval that would try to go beyond the number of buckets will make no sense to the user, who doesn't care whether the bucket existed or not, only the key (they need not know how your implementation works at all). Both cases where a key isn't found should throw the same error (System.Collections.Generic.KeyNotFoundException has precisely the right semantics, so you could reuse that.).
Using a List is rather heavy in this case. Generally I'd frown on anyone saying a BCL collection was too heavy, but when it comes to rolling your own collections, it's generally either because (1) you want to learn from the exercise or (2) the BCL collections don't suit your purposes. In case (1) you should learn how to complete the job you started, and in case (2) you need to be sure that List doesn't have whatever failing you found with Dictionary.
Your removal both throws a nonsensical error for someone who doesn't know about the implementation details, and an inconsistent error (whether something else existed in that bucket is not something they should care about). Since removing a non-existent item isn't harmful it is more common to merely return a bool indicating whether the item had been present or not, and let the user decide if that indicates an error or not. It is also wasteful in continuing to search the entire bucket after the item has been removed.
Your implementation does not allow null keys, which is reasonable enough (indeed, the documentation for IDictionary<TKey, TValue> says that implementations may or may not do so). However, the way you reject them is by letting the NullReferenceException caused by trying to call GetHashCode() on null propagate, rather than checking and throwing an ArgumentNullException. For the user to receive a NullReferenceException suggests that the collection itself was null. This is hence a clear bug.

A Remove method should never throw an exception. You are trying to remove an item. No harm is done if it has already been removed. All collection classes in .NET use bool as a return value to indicate whether an item was really removed.
Do not throw Exception, throw specific one. Browse through all exceptions in the Collection namespaces to find suitable ones.
Add a TryGetValue
Use KeyValuePair which already is a part of .Net instead of creating your own.
Add a constructor which can define map size.
When throwing exceptions, include details about why they were thrown. For instance, instead of writing "This key exists", write string.Format("Key '{0}' already exists", key)
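Pulling several of these points together, here is one possible sketch, not a reference implementation. ChainedMap and its members are names I made up; it keeps a fixed number of buckets (no resizing), uses KeyValuePair, takes the bucket count in the constructor, throws specific exceptions with details, and has TryGetValue and a bool-returning Remove.

```csharp
using System;
using System.Collections.Generic;

public class ChainedMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;

    public ChainedMap(int bucketCount = 16)
    {
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));
        buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
    }

    // Maps a key to a bucket; clearing the sign bit keeps the index non-negative.
    private int BucketIndex(TKey key)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        return (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
    }

    public void Add(TKey key, TValue value)
    {
        int index = BucketIndex(key);
        TValue existing;
        if (TryGetValue(key, out existing))
        {
            throw new ArgumentException(
                string.Format("Key '{0}' already exists", key), nameof(key));
        }
        if (buckets[index] == null)
        {
            buckets[index] = new List<KeyValuePair<TKey, TValue>>();
        }
        buckets[index].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        List<KeyValuePair<TKey, TValue>> chain = buckets[BucketIndex(key)];
        if (chain != null)
        {
            foreach (KeyValuePair<TKey, TValue> pair in chain)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }

    public TValue this[TKey key]
    {
        get
        {
            TValue value;
            if (!TryGetValue(key, out value))
            {
                throw new KeyNotFoundException(
                    string.Format("Key '{0}' was not found", key));
            }
            return value;
        }
    }

    // Returns false instead of throwing when the key is absent.
    public bool Remove(TKey key)
    {
        List<KeyValuePair<TKey, TValue>> chain = buckets[BucketIndex(key)];
        if (chain == null) return false;
        int position = chain.FindIndex(
            p => EqualityComparer<TKey>.Default.Equals(p.Key, key));
        if (position < 0) return false;
        chain.RemoveAt(position);
        return true;
    }
}
```

A caller would then write something like var map = new ChainedMap<string, int>(64); map.Add("joe", 1); and check the bool from map.Remove("joe") rather than catching an exception.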

Sorry to say this, but this class won't work as a HashMap or even as a simple dictionary.
First of all, the value returned from GetHashCode() is not unique. Two different objects, e.g. two strings, can return the same hash code value. The idea of using the hash code as the array index then simply leads to record loss when hash codes clash. I would suggest reading about the GetHashCode() method and how to implement it on MSDN. An obvious example: if you take the hash codes of all possible Int64 values starting at 0, the hash codes will surely clash at some point.
Another thing is that the for-loop lookup is slow. You should consider using binary search for lookup. To do so, you must keep your key-value pairs sorted by key at all times, which implies that you should use a List instead of an array for the storage variable, so that when adding a new key-value pair you can insert it at the appropriate index.
After all, when you are coding a real hash map, make sure you keep in mind that the hash code can be the same for different keys, and never do the lookup with a for-loop from 0 to len-1.

Casting C# out parameters?

Is it possible to cast out param arguments in C#? I have:
Dictionary<string,object> dict; // but I know all values are strings
string key, value;
Roughly speaking (and if I didn't have static typing) I want to do:
dict.TryGetValue(key, out value);
but this obviously won't compile because it "cannot convert from 'out string' to 'out object'".
The workaround I'm using is:
object valueAsObject;
dict.TryGetValue(key, out valueAsObject);
value = (string) valueAsObject;
but that seems rather awkward.
Is there any kind of language feature to let me cast an out param in the method call, so it does this switcheroo for me? I can't figure out any syntax that'll help, and I can't seem to find anything with google.

I don't know if it is a great idea, but you could add a generic extension method:
static bool TryGetTypedValue<TKey, TValue, TActual>(
    this IDictionary<TKey, TValue> data,
    TKey key,
    out TActual value) where TActual : TValue
{
    if (data.TryGetValue(key, out TValue tmp))
    {
        value = (TActual)tmp;
        return true;
    }
    value = default(TActual);
    return false;
}
static void Main()
{
    Dictionary<string,object> dict
        = new Dictionary<string,object>();
    dict.Add("abc","def");
    string key = "abc", value;
    dict.TryGetTypedValue(key, out value);
}

I spy with my little eye an old post that was still active a month ago...
Here's what you do:
public static class DictionaryExtensions
{
    public static bool TryGetValueAs<Key, Value, ValueAs>(this IDictionary<Key, Value> dictionary, Key key, out ValueAs valueAs) where ValueAs : Value
    {
        if (dictionary.TryGetValue(key, out Value value))
        {
            valueAs = (ValueAs)value;
            return true;
        }
        valueAs = default;
        return false;
    }
}
And because compilers are great, you can just call it like this:
dict.TryGetValueAs(key, out bool valueAs); // All generic types are filled in implicitly! :D
But say you're not creating a blackboard AI and just need to call this operation the one time. You can simply do a quicksedoodle inliner like this:
var valueAs = dict.TryGetValue(key, out var value) ? (bool)value : default;
I know these answers have been given already, but they must be pretty old because there is no cool hip modern inlining going on to condense these methods to the size we really want: no more than 1 line.

I used Marc's extension method but added a bit to it.
My problem with the original was that in some cases my dictionary would contain an Int64 whereas I would expect an Int32. In other cases the dictionary would contain a string (for example "42") while I would like to get it as an int.
There is no way to handle conversion in Marc's method so I added the ability to pass in a delegate to a conversion method:
internal static bool TryGetTypedValue<TKey, TValue, TActual>(
    this IDictionary<TKey, TValue> data,
    TKey key,
    out TActual value, Func<TValue, TActual> converter = null) where TActual : TValue
{
    TValue tmp;
    if (data.TryGetValue(key, out tmp))
    {
        if (converter != null)
        {
            value = converter(tmp);
            return true;
        }
        if (tmp is TActual)
        {
            value = (TActual) tmp;
            return true;
        }
        value = default(TActual);
        return false;
    }
    value = default(TActual);
    return false;
}
Which you can call like this:
int limit;
myParameters.TryGetTypedValue("limitValue", out limit, Convert.ToInt32);

No, there is no way around that. The out parameter must have a variable that matches exactly.
Using a string reference is not safe, as the dictionary can contain other things than strings. However if you had a dictionary of strings and tried to use an object variable in the TryGetValue call, that won't work either even though that would be safe. The variable type has to match exactly.

If you know all values are strings use Dictionary<string, string> instead. The out parameter type is set by the type of the second generic type parameter. Since yours is currently object, it will return an object when retrieving from the dictionary. If you change it to string, it will return strings.

No, you can't. The code inside the method is directly modifying the variable passed to it, it is not passed a copy of the content of the variable.

It is possible by using the Unsafe.As<TFrom, TTo>(ref TFrom source) method to do the cast inline.
var dict = new Dictionary<string, int>
{
    ["one"] = 1,
    ["two"] = 2,
    ["three"] = 3,
};
long result = 0;
dict.TryGetValue("two", out Unsafe.As<long, int>(ref result));
Depending on which platform you are on, this may require you to add a reference to System.Runtime.CompilerServices.Unsafe.
