How can I access a Dictionary through a key hashcode? - c#

I have a dictionary like that:
Dictionary<MyCompositeKey, int>
Clearly MyCompositeKey is a class I designed which implements IEqualityComparer and thus has a GetHashCode method.
As far as I know, dictionary uses the key's hash to access the value, so here's my question:
While I can easily access the value via dict.TryGetValue(new MyCompositeKey(params)), I wanted to get rid of the overhead of new on each access.
For this reason I was wondering if there's a way to access the value directly from the key's hash (which I can compute with much lower overhead).

There is no way to do that.
Note that hash collisions may occur, so there could be many keys in the Dictionary<,> matching the given hash. We need Equals to find out which (if any) is correct.
You talk about new overhead. Are you sure it is significant in your case?
If it is, you could consider making MyCompositeKey an immutable struct instead of a class. It might be faster in some cases, eliminating the need for the garbage collector to remove all those "loose" keys from memory.
If MyCompositeKey is a struct, the expression new MyCompositeKey(params) allocates nothing on the heap; it only loads the params onto the call stack (or into CPU registers, or wherever the runtime decides is best).
Addition: If you go for a struct, consider implementing IEquatable<>. It will look like this:
struct MyCompositeKey : IEquatable<MyCompositeKey>
{
    // readonly fields/properties
    public override bool Equals(object obj)
    {
        if (obj is MyCompositeKey)
            return Equals((MyCompositeKey)obj); // unbox and go to below overload
        return false;
    }
    public bool Equals(MyCompositeKey other) // implements interface, can avoid boxing
    {
        // equality logic here
    }
    public override int GetHashCode()
    {
        // hash logic here
    }
}
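As a rough sketch of how such a struct key might be used: the question does not say what MyCompositeKey contains, so the two int fields and the LookupDemo helper below are assumptions made purely for illustration. The lookup builds the key by value, and the dictionary's default comparer calls the strongly typed Equals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key: the two int fields are assumptions made for this sketch.
public readonly struct MyCompositeKey : IEquatable<MyCompositeKey>
{
    public MyCompositeKey(int groupId, int itemId)
    {
        GroupId = groupId;
        ItemId = itemId;
    }

    public int GroupId { get; }
    public int ItemId { get; }

    // Called by the dictionary's default comparer without boxing.
    public bool Equals(MyCompositeKey other)
    {
        return GroupId == other.GroupId && ItemId == other.ItemId;
    }

    public override bool Equals(object obj)
    {
        return obj is MyCompositeKey && Equals((MyCompositeKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (GroupId * 397) ^ ItemId;
        }
    }
}

public static class LookupDemo
{
    // Creating the key here does not allocate on the managed heap.
    public static bool TryFind(Dictionary<MyCompositeKey, int> dict, int groupId, int itemId, out int value)
    {
        return dict.TryGetValue(new MyCompositeKey(groupId, itemId), out value);
    }
}
```

Whether this is measurably faster than the class version depends on the workload, so it is worth profiling both before committing to either.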

You can't do that.
A Dictionary<TKey, TValue> uses an internal buckets collection which you cannot access from outside the class - it is private.
As you can see in the source code, the access method first determines the bucket (according to the hash code) and then accesses the item by index:
public bool TryGetValue(TKey key, out TValue value)
{
int i = FindEntry(key);
if (i >= 0)
{
value = entries[i].value;
return true;
}
value = default(TValue);
return false;
}
private int FindEntry(TKey key)
{
if (buckets != null)
{
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
{
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
return i;
}
}
return -1;
}

Related

Why does KeyValuePair not override Equals() and GetHashCode()?

I was going to use KeyValuePair in comparison-intensive code and was perplexed when I checked how it is implemented in .NET (see below).
Why does it not override Equals and GetHashCode for efficiency (and not implement ==) but instead uses the slow reflection-based default implementation?
I know that structs/value types have a default implementation based on reflection for their GetHashCode() and Equals(object) methods, but I suppose it is very inefficient compared to overriding equality if you do a lot of comparisons.
EDIT: I ran some tests and found that in my scenario (WPF lists) both the default KeyValuePair and my own struct overriding GetHashCode() and Equals(object) are much slower than an implementation as a class!
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/keyvaluepair.cs,8585965bb176a426
// ==++==
//
// Copyright (c) Microsoft Corporation. All rights reserved.
//
// ==--==
/*============================================================
**
** Interface: KeyValuePair
**
** <OWNER>[....]</OWNER>
**
**
** Purpose: Generic key-value pair for dictionary enumerators.
**
**
===========================================================*/
namespace System.Collections.Generic {
using System;
using System.Text;
// A KeyValuePair holds a key and a value from a dictionary.
// It is used by the IEnumerable<T> implementation for both IDictionary<TKey, TValue>
// and IReadOnlyDictionary<TKey, TValue>.
[Serializable]
public struct KeyValuePair<TKey, TValue> {
private TKey key;
private TValue value;
public KeyValuePair(TKey key, TValue value) {
this.key = key;
this.value = value;
}
public TKey Key {
get { return key; }
}
public TValue Value {
get { return value; }
}
public override string ToString() {
StringBuilder s = StringBuilderCache.Acquire();
s.Append('[');
if( Key != null) {
s.Append(Key.ToString());
}
s.Append(", ");
if( Value != null) {
s.Append(Value.ToString());
}
s.Append(']');
return StringBuilderCache.GetStringAndRelease(s);
}
}
}
As the other answers point out, you get equality and hashing "for free", so you don't need to override them. However, you get what you pay for; the default implementations of equality and hashing are (1) not particularly efficient in some cases, and (2) may do bitwise comparisons, and hence can do things like compare negative zero and positive zero doubles as different when logically they are equal.
If you expect that your struct will frequently be used in contexts that require equality and hashing, then you should write custom implementations of both and follow the appropriate rules and guidelines.
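A minimal sketch of what such a custom implementation might look like follows. The Measurement type and its two members are hypothetical and not taken from the question. Because it compares the double with double.Equals rather than bit by bit, positive and negative zero are treated as equal, and they also produce the same hash code.

```csharp
using System;

// Hypothetical struct used only to illustrate custom equality and hashing.
public readonly struct Measurement : IEquatable<Measurement>
{
    public Measurement(string label, double value)
    {
        Label = label;
        Value = value;
    }

    public string Label { get; }
    public double Value { get; }

    // Field-by-field comparison: no reflection, no boxing.
    public bool Equals(Measurement other)
    {
        return string.Equals(Label, other.Label) && Value.Equals(other.Value);
    }

    public override bool Equals(object obj)
    {
        return obj is Measurement && Equals((Measurement)obj);
    }

    // Built from exactly the fields that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            return ((Label != null ? Label.GetHashCode() : 0) * 397) ^ Value.GetHashCode();
        }
    }
}
```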
https://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
So, to answer your question: why did no one do so for a particular type? Likely because they did not believe that doing so was a good use of their time compared to all the other things they had to do to improve the base class libraries. Most people do not compare key-value pairs for equality, so optimizing it was probably not a high priority.
This is of course conjectural; if you actually want to know the reason why something did not get done on a particular day, you're going to have to track down all the people who did not do that action and ask them what else they were doing that was more important on that day.
It is a struct. Structs inherit from ValueType, and that type already overrides Equals and GetHashCode.
It does not support ==; the following won't even compile:
var result = new KeyValuePair<string, string>("KVP", "Test1") ==
new KeyValuePair<string, string>("KVP", "Test2");
You will receive the error "Operator '==' cannot be applied to operands of type KeyValuePair<string, string> and KeyValuePair<string, string>"
KeyValuePair is a struct (it implicitly inherits from ValueType), and equality works just fine:
var a = new KeyValuePair<string, string>("a", "b");
var b = new KeyValuePair<string, string>("a", "b");
bool areEqual = a.Equals(b); // true
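The reference source below shows the Equals strategy:
1- Return false if the other object is null or of a different type.
2- If the struct contains no GC references, compare it by bits.
3- Otherwise, compare each field in the struct using reflection.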
public abstract class ValueType {
[System.Security.SecuritySafeCritical]
public override bool Equals (Object obj) {
BCLDebug.Perf(false, "ValueType::Equals is not fast. "+this.GetType().FullName+" should override Equals(Object)");
if (null==obj) {
return false;
}
RuntimeType thisType = (RuntimeType)this.GetType();
RuntimeType thatType = (RuntimeType)obj.GetType();
if (thatType!=thisType) {
return false;
}
Object thisObj = (Object)this;
Object thisResult, thatResult;
// if there are no GC references in this object we can avoid reflection
// and do a fast memcmp
if (CanCompareBits(this))
return FastEqualsCheck(thisObj, obj);
FieldInfo[] thisFields = thisType.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
for (int i=0; i<thisFields.Length; i++) {
thisResult = ((RtFieldInfo)thisFields[i]).UnsafeGetValue(thisObj);
thatResult = ((RtFieldInfo)thisFields[i]).UnsafeGetValue(obj);
if (thisResult == null) {
if (thatResult != null)
return false;
}
else
if (!thisResult.Equals(thatResult)) {
return false;
}
}
return true;
}

Is it safe to override GetHashCode and get it from string property?

I have a class:
public class Item
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
The purpose of overriding GetHashCode is that I want to have only one occurrence of an object with a specified name in a Dictionary.
But is it safe to get hash code from string?
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
But is it safe to get hash code from string?
Yes, it is safe. But, what you're doing isn't. You're using a mutable string field to generate your hash code. Let's imagine that you inserted an Item as a key for a given value. Then, someone changes the Name string to something else. You now are no longer able to find the same Item inside your Dictionary, HashSet, or whichever structure you use.
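The lost-key problem described above can be reproduced with a small sketch. MutableKey is a hypothetical type in which an int Id stands in for Name, so that the hash codes involved are predictable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on a settable property.
public class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class MutationDemo
{
    // Returns true if the key can still be found after it has been mutated.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // the set still files the entry under the old hash code (1)

        return set.Contains(key);
    }
}
```

FoundAfterMutation() returns false even though the very same instance is still stored in the set: the lookup uses the new hash code, while the entry was recorded under the old one.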
Moreover, you should base the hash code on immutable state only. I'd also advise you to implement IEquatable<T>:
public class Item : IEquatable<Item>
{
public Item(string name)
{
Name = name;
}
public string Name { get; }
public bool Equals(Item other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(Name, other.Name);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Item) obj);
}
public static bool operator ==(Item left, Item right)
{
return Equals(left, right);
}
public static bool operator !=(Item left, Item right)
{
return !Equals(left, right);
}
public override int GetHashCode()
{
return (Name != null ? Name.GetHashCode() : 0);
}
}
is there any chance that two objects with different values of property
Name would return the same hash code?
Yes, there is a statistical chance that such a thing will happen. Hash codes do not guarantee uniqueness; they strive for a uniform distribution. Why? Because the hash code is an Int32, which is only 32 bits. By the Pigeonhole Principle, since there are more possible strings than 32-bit values, you may end up with two different strings that have the same hash code.
Your class is buggy, because you have a GetHashCode override, but no Equals override. You also don't consider the case where Name is null.
The rule for GetHashCode is simple:
If a.Equals(b) then it must be the case that a.GetHashCode() == b.GetHashCode().
The more cases where if !a.Equals(b) then a.GetHashCode() != b.GetHashCode() the better, indeed the more cases where !a.Equals(b) then a.GetHashCode() % SomeValue != b.GetHashCode() % SomeValue the better, for any given SomeValue (you can't predict it) so we like to have a good mix of bits in the results. But the vital thing is that two objects considered equal must have equal GetHashCode() results.
Right now this isn't the case, because you've only overridden one of these. However the following is sensible:
public class Item
{
    public string Name { get; set; }
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
    public override bool Equals(object obj)
    {
        var asItem = obj as Item;
        // Compare against the cast variable; object itself has no Name property.
        return asItem != null && Name == asItem.Name;
    }
}
The following is even better, because it allows for faster strongly-typed equality comparisons:
public class Item : IEquatable<Item>
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name == null ? 0 : Name.GetHashCode();
}
public bool Equals(Item other)
{
return other != null && Name == other.Name;
}
public override bool Equals(object obj)
{
return Equals(obj as Item);
}
}
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
Yes, this can happen, but it won't happen often, so that's fine. The hash-based collections like Dictionary and HashSet can handle a few collisions; indeed there'll be collisions even if the hash codes are all different because they're modulo'd down to a smaller index. It's only if this happens a lot that it impacts performance.
Another danger is that you'll be using a mutable value as a key. There's a myth that you shouldn't use mutable values for hash-codes, which isn't true; if a mutable object has a mutable property that affects what it is considered equal with then it must result in a change to the hash-code.
The real danger is mutating an object that is a key to a hash collection at all. If you are defining equality based on Name and you have such an object as the key to a dictionary then you must not change Name while it is used as such a key. The easiest way to ensure that is to have Name be immutable, so that is definitely a good idea if possible. If it is not possible though, you need to be careful just when you allow Name to be changed.
From a comment:
So, even if there is a collision in hash codes, when Equals returns false (because the names are different), the Dictionary will handle it properly?
Yes, it will handle it, though it's not ideal. We can test this with a class like this:
public class SuckyHashCode : IEquatable<SuckyHashCode>
{
public int Value { get; set; }
public bool Equals(SuckyHashCode other)
{
return other != null && other.Value == Value;
}
public override bool Equals(object obj)
{
return Equals(obj as SuckyHashCode);
}
public override int GetHashCode()
{
return 0;
}
}
Now if we use this, it works:
var dict = Enumerable.Range(0, 1000).Select(i => new SuckyHashCode{Value = i}).ToDictionary(shc => shc);
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = 3})); // True
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = -1})); // False
However, as the name suggests, it isn't ideal. Dictionaries and other hash-based collections all have means to deal with collisions, but those means mean that we no longer have the great nearly O(1) look-up, but rather as the percentage of collisions gets greater the look-up approaches O(n). In the case above where the GetHashCode is as bad as it could be without actually throwing an exception, the look-up would be O(n) which is the same as just putting all the items into an unordered collection and then finding them by looking at every one to see if it matches (indeed, due to differences in overheads, it's actually worse than that).
So for this reason we always want to avoid collisions as much as possible. Indeed, to not just avoid collisions, but to avoid collisions after the result has been modulo'd down to make a smaller hash code (because that's what happens internally to the dictionary).
In your case though, because string.GetHashCode() is reasonably good at avoiding collisions, and because that one string is the only thing that equality is defined by, your code would in turn be reasonably good at avoiding collisions. More collision-resistant code is certainly possible, but comes at a cost to performance in the code itself* and/or is more work than can be justified.
*(Though see https://www.nuget.org/packages/SpookilySharp/ for code of mine that is faster than string.GetHashCode() on large strings on 64-bit .NET and more collision-resistant, though it is slower to produce those hash codes on 32-bit .NET or when the string is short).
Instead of using GetHashCode to prevent duplicates from being added to a dictionary, which is risky in your case as explained already, I would recommend using a (custom) equality comparer for your dictionary.
If the key is an object, you should create your own equality comparer that compares the string Name value. If the key is the string itself, you can use StringComparer.CurrentCulture, for example.
In this case it is also important to make the string immutable, since otherwise you might invalidate your dictionary by changing the Name.
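A custom comparer along those lines might be sketched as follows. ItemNameComparer and ComparerDemo are hypothetical names, Item is a minimal stand-in with a get-only Name, and ordinal string comparison is an assumption of this sketch; a culture-aware StringComparer could be substituted, as long as Equals and GetHashCode use the same one.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the Item class from the question, with an immutable Name.
public class Item
{
    public Item(string name) { Name = name; }
    public string Name { get; }
}

// Hypothetical comparer: two items are equal when their names match exactly.
public sealed class ItemNameComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return StringComparer.Ordinal.Equals(x.Name, y.Name);
    }

    // Must agree with Equals: same comparer, same property.
    public int GetHashCode(Item obj)
    {
        return obj == null || obj.Name == null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Name);
    }
}

public static class ComparerDemo
{
    // The dictionary uses the comparer instead of Item's own Equals/GetHashCode.
    public static Dictionary<Item, int> Create()
    {
        return new Dictionary<Item, int>(new ItemNameComparer());
    }
}
```

With this in place, Item itself does not need to override anything, and different dictionaries can use different notions of equality for the same type.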

Hibernate and Equality checks with lazy loaded objects

I had some fun with hibernate. A function like that:
public class Key
{
public virtual bool IsEqual(Key key)
{
return this == key;
}
}
One would expect this function to always return true if the parameter was the same as the instance where IsEqual was called upon:
Assert.IsTrue(myKey.IsEqual(myKey));
But this is only the case as long as the instance "myKey" is not a lazily loaded object / proxy. A KeyProxy will delegate that call to the internal Key object that is wrapped, and this results in the wrapped object being compared with the proxy object (which will in turn fail).
Basically, it has also been discussed here: NHibernate, proxies and equality
The solution there is a little bit disappointing. Overriding Equals to compare the primary key properties has the drawback that it only works for objects that already have a value, whereas new objects don't have a primary key value until saved. I could try to force new objects to directly receive a valid primary key value, but that doesn't sound like a great way of handling this issue.
Is there a better (more general) way known to handle such situations? Wouldn't overriding Equals and comparing with a unique (non-persisted) property just do the trick?
Something like that?
public object Identifier { get; private set; }
public Key()
{
    Identifier = new object();
}
public override bool Equals(object obj)
{
    if (obj == null)
    {
        return false;
    }
    Key k = obj as Key;
    if (k == null)
    {
        return false;
    }
    return this.Identifier == k.Identifier;
}
To overcome this and other problems, such as using an identity column as the primary key, we added a GUID to the base class of our domain model. Object creation is handled by factory classes that give each entity a GUID, which is then persisted as part of the entity.
The GUID is then used to compare entities; basically, we use it in the Equals() and GetHashCode() methods.
public override int GetHashCode()
{
    return this.EqualityIdentifier.GetHashCode();
}
public override bool Equals(object obj)
{
    IDomainObject Obj = obj as IDomainObject;
    if (Obj == null)
    {
        return false;
    }
    return this.EqualityIdentifier == Obj.EqualityIdentifier;
}
To have a minimum performance impact, I decided to use a non-persisted readonly int property "Identifier" that is filled (lazily/at first access) by a small static and thread-safe number generator method.
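private static int _equalityIdentifierSequence;
private static int GenerateEqualityIdentifier()
{
    // Return the value produced by the atomic increment itself; re-reading the
    // field afterwards could observe another thread's increment.
    return Interlocked.Increment(ref _equalityIdentifierSequence);
}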
I am quite comfortable with the fact that two objects that were loaded from different sessions but are representing the same entity are regarded as "not equal", so the GUID strategy did not look that promising to me. The original problem of proxies compared with their wrapped objects seems to be solved with that.
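One way the lazily assigned identifier might be wired into equality is sketched below. The Entity base class and its member names are hypothetical, and the sketch assumes that a proxy forwards virtual members such as Identifier and Equals to the wrapped object, so that both sides of the comparison end up reading the same value.

```csharp
using System.Threading;

// Hypothetical entity base class; members are virtual so a proxy can forward them.
public class Entity
{
    private static int _equalityIdentifierSequence;
    private int _identifier; // 0 means "not assigned yet"

    // Assigned on first access; never persisted.
    public virtual int Identifier
    {
        get
        {
            if (_identifier == 0)
            {
                // If two threads race here, only the first value is stored.
                Interlocked.CompareExchange(
                    ref _identifier,
                    Interlocked.Increment(ref _equalityIdentifierSequence),
                    0);
            }
            return _identifier;
        }
    }

    public override bool Equals(object obj)
    {
        var other = obj as Entity;
        return other != null && Identifier == other.Identifier;
    }

    public override int GetHashCode()
    {
        return Identifier;
    }
}
```

As noted above, two instances loaded separately for the same database row get different identifiers under this scheme and therefore compare as not equal.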

Comparing objects

I have a class that contains some string members, some double members and some array objects.
I create two objects of this class. Is there a simple, efficient way of comparing these objects to say whether they're equal? Any suggestions?
I know how to write a compare function, but will it be time-consuming?
The only way you can really do this is to override bool Object.Equals(object other) to return true when your conditions for equality are met, and return false otherwise. You must also override int Object.GetHashCode() to return an int computed from all of the data that you consider when overriding Equals().
As an aside, note that the contract for GetHashCode() specifies that the return value must be equal for two objects when Equals() would return true when comparing them. This means that return 0; is a valid implementation of GetHashCode() but it will cause inefficiencies when objects of your class are used as dictionary keys, or stored in a HashSet<T>.
The way I implement equality is like this:
public class Foo : IEquatable<Foo>
{
public bool Equals(Foo other)
{
if (other == null)
return false;
if (other == this)
return true; // Same object reference.
// Compare this to other and return true/false as appropriate.
}
public override bool Equals(Object other)
{
return Equals(other as Foo);
}
public override int GetHashCode()
{
// Compute and return hash code.
}
}
A simple way of implementing GetHashCode() is to XOR together the hash codes of all of the data you consider for equality in Equals(). So if, for example, the properties you compare for equality are string FirstName; string LastName; int Id;, your implementation might look like:
public override int GetHashCode()
{
    return (FirstName != null ? FirstName.GetHashCode() : 0) ^
        (LastName != null ? LastName.GetHashCode() : 0) ^
        Id; // an int is its own hash code
}
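One caveat with plain XOR is that it is symmetric: swapping FirstName and LastName yields the same hash code, and two identical names cancel each other out. The sketch below contrasts it with an order-sensitive combination; the HashSketch class, its method names and the multiplier 397 are illustrative choices, not part of the answer above.

```csharp
using System;

public static class HashSketch
{
    // Plain XOR: symmetric, so swapping the two names gives the same result.
    public static int XorHash(string firstName, string lastName, int id)
    {
        return (firstName != null ? firstName.GetHashCode() : 0) ^
               (lastName != null ? lastName.GetHashCode() : 0) ^
               id;
    }

    // Order-sensitive variant: multiply the running hash before mixing in the next field.
    public static int CombinedHash(string firstName, string lastName, int id)
    {
        unchecked
        {
            int hash = firstName != null ? firstName.GetHashCode() : 0;
            hash = (hash * 397) ^ (lastName != null ? lastName.GetHashCode() : 0);
            hash = (hash * 397) ^ id;
            return hash;
        }
    }
}
```

XorHash("John", "Smith", 1) always equals XorHash("Smith", "John", 1), whereas CombinedHash will usually differ for the two orderings. Both are valid with respect to the GetHashCode contract; the second simply tends to spread keys better.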
I typically do not override the equality operators, as most of the time I'm concerned with equality only for the purposes of dictionary keys or collections. I would only consider overriding the equality operators if you are likely to do more comparisons by value than by reference, as it is syntactically less verbose. However, you have to remember to change all places where you use == or != on your object (including in your implementation of Equals()!) to use Object.ReferenceEquals(), or to cast both operands to object. This nasty gotcha (which can cause infinite recursion in your equality test if you are not careful) is one of the primary reasons I rarely override these operators.
The 'proper' way to do it in .NET is to implement the IEquatable interface for your class:
public class SomeClass : IEquatable<SomeClass>
{
public string Name { get; set; }
public double Value { get; set; }
public int[] NumberList { get; set; }
public bool Equals(SomeClass other)
{
// whatever your custom equality logic is
return other.Name == Name &&
other.Value == Value &&
other.NumberList == NumberList;
}
}
However, if you really want to do it right, this isn't all you should do. You should also override the Equals(object) and GetHashCode() methods so that, no matter how your calling code is comparing equality (perhaps in a Dictionary or perhaps in some loosely-typed collection), your code and not reference-type equality will be the determining factor:
public class SomeClass : IEquatable<SomeClass>
{
    public string Name { get; set; }
    public double Value { get; set; }
    public int[] NumberList { get; set; }
    // Explicit interface implementations cannot carry an access modifier.
    /// <summary>
    /// Explicitly implemented IEquatable method.
    /// </summary>
    bool IEquatable<SomeClass>.Equals(SomeClass other)
    {
        return other != null &&
            other.Name == Name &&
            other.Value == Value &&
            other.NumberList == NumberList; // reference comparison for arrays
    }
    public override bool Equals(object obj)
    {
        var other = obj as SomeClass;
        if (other == null)
            return false;
        return ((IEquatable<SomeClass>)(this)).Equals(other);
    }
    public override int GetHashCode()
    {
        // Determine some consistent way of generating a hash code, such as...
        return Name.GetHashCode() ^ Value.GetHashCode() ^ NumberList.GetHashCode();
    }
}
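Note that == on two arrays compares references, so two objects holding separate arrays with identical contents are not equal under the code above. If element-wise comparison is what is wanted, a variant might look like the following sketch; the ArraysEqual helper is a hypothetical addition, and the hash code is built from the elements so that it stays consistent with Equals.

```csharp
using System;
using System.Linq;

public class SomeClass : IEquatable<SomeClass>
{
    public string Name { get; set; }
    public double Value { get; set; }
    public int[] NumberList { get; set; }

    public bool Equals(SomeClass other)
    {
        if (ReferenceEquals(other, null)) return false;
        return Name == other.Name &&
               Value.Equals(other.Value) &&
               ArraysEqual(NumberList, other.NumberList);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as SomeClass);
    }

    // Uses the same data as Equals, including every array element.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Name != null ? Name.GetHashCode() : 0;
            hash = (hash * 397) ^ Value.GetHashCode();
            if (NumberList != null)
            {
                foreach (int n in NumberList) hash = (hash * 397) ^ n;
            }
            return hash;
        }
    }

    // Hypothetical helper: null-safe, element-by-element comparison.
    private static bool ArraysEqual(int[] a, int[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.SequenceEqual(b);
    }
}
```

Because all three properties are settable, instances of this class should not be modified while they are used as dictionary keys or stored in a HashSet.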
Just spent the whole day writing an extension method that loops through an object's properties via reflection, with various complex bits of logic to deal with different property types, and actually got it close to good. Then at 16:55 it dawned on me that if you serialize the two objects, you simply need to compare the two strings ... duh
So here is a simple serializer extension method that even works on Dictionaries
public static class TExtensions
{
public static string Serialize<T>(this T thisT)
{
var serializer = new DataContractSerializer(thisT.GetType());
using (var writer = new StringWriter())
using (var stm = new XmlTextWriter(writer))
{
serializer.WriteObject(stm, thisT);
return writer.ToString();
}
}
}
Now your test can be as simple as
Assert.AreEqual(objA.Serialize(), objB.Serialize());
Haven't done extensive testing yet, but looks promising and more importantly, simple. Either way still a useful method to have in your utility set right ?
The best answer is to implement IEquatable for your class - it may not be the answer you want to hear, but that's the best way to implement value equivalence in .NET.
Another option would be computing a unique hash of all of the members of your class and then doing value comparisons against those, but that's even more work than writing a comparison function ;)
Since these are objects, my guess is that you will have to override the Equals method for them. Otherwise the Equals method will return true only if both variables refer to the same object.
I know this is not the answer you want. But since there are only a few properties in your class, you can easily override the method.

C#: optimizing dictionary access (hash in key structures)

So, I need to create a struct in C# that will act as a key into a (quite large) dictionary, which will look like this:
private readonly IDictionary<KeyStruct, string> m_Invitations;
Problem is, I REALLY need a struct to use as a key, because it is only possible to identify entries via two separate data items, where one of them can be a null (not only empty!) string.
What will I need to implement on the struct? How would you go about creating the hash? Would an (occasional) hash collision hurt performance heavily, or would that be negligible?
I'm asking because this is "inner loop" code.
If you have ReSharper, you can generate these methods with Alt-Ins -> Equality members.
Here is the generated code for your KeyStruct:
public struct KeyStruct : IEquatable<KeyStruct>
{
public string Value1 { get; private set; }
public long Value2 { get; private set; }
public KeyStruct(string value1, long value2)
: this()
{
Value1 = value1;
Value2 = value2;
}
public bool Equals(KeyStruct other)
{
return Equals(other.Value1, Value1) && other.Value2 == Value2;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (obj.GetType() != typeof (KeyStruct)) return false;
return Equals((KeyStruct) obj);
}
public override int GetHashCode()
{
unchecked
{
return ((Value1 != null ? Value1.GetHashCode() : 0)*397) ^ Value2.GetHashCode();
}
}
public static bool operator ==(KeyStruct left, KeyStruct right)
{
return left.Equals(right);
}
public static bool operator !=(KeyStruct left, KeyStruct right)
{
return !left.Equals(right);
}
}
If KeyStruct is a structure (declared with the C# struct keyword), don't forget to override the Equals and GetHashCode methods, or provide a custom IEqualityComparer to the dictionary constructor, because the default implementation of ValueType.Equals may use reflection to compare the contents of two structure instances.
It is preferable to make KeyStruct immutable; if you do so, you can calculate the hash of a structure instance once and then simply return it from GetHashCode. But that may be a premature optimization, depending on how often you need to get a value by key.
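A sketch of the "calculate once" idea follows, assuming the same two members as the KeyStruct shown earlier (a string and a long); CachedKey is a hypothetical name. The hash is computed in the constructor and stored in a readonly field.

```csharp
using System;

// Sketch of an immutable key that computes its hash code once, in the constructor.
public readonly struct CachedKey : IEquatable<CachedKey>
{
    private readonly int _hash;

    public CachedKey(string value1, long value2)
    {
        Value1 = value1;
        Value2 = value2;
        unchecked
        {
            _hash = ((value1 != null ? value1.GetHashCode() : 0) * 397) ^ value2.GetHashCode();
        }
    }

    public string Value1 { get; }
    public long Value2 { get; }

    public bool Equals(CachedKey other)
    {
        // Cheap rejection first: different cached hashes mean different keys.
        return _hash == other._hash &&
               Value2 == other.Value2 &&
               string.Equals(Value1, other.Value1);
    }

    public override bool Equals(object obj)
    {
        return obj is CachedKey && Equals((CachedKey)obj);
    }

    public override int GetHashCode()
    {
        return _hash;
    }
}
```

A default(CachedKey) bypasses the constructor and has a cached hash of 0, which happens to match what the formula yields for (null, 0), so the two stay consistent. The gain is limited to the key used for probing, since a dictionary already stores the hash code of each entry it holds; the struct also grows by four bytes.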
Generally, it is OK to use structure as a dictionary key.
Or maybe you are asking how to implement GetHashCode method?
You need to implement (override) two methods.
1. bool Equals(object)
2. int GetHashCode()
The hash code need not be unique, but the fewer different objects that return the same hash code, the better performance you will have.
You can use something like:
public override int GetHashCode()
{
    unchecked // let the arithmetic wrap instead of throwing in a checked context
    {
        int strHash = str == null ? 0 : str.GetHashCode();
        return ((int)lng * 397) ^ strHash;
    }
}
