Do I have to implement GetHashCode() to follow best practice? - c#

In my class I have the following override.
public override bool Equals(object input)
{
    Occasion comparee = input as Occasion;
    return comparee != null && comparee.Id == Id;
}
It's used in the GUI to determine a combo box's pre-selected value, and it works perfectly. However, R# nags that, since I have overridden Equals(object), I should also override GetHashCode(). When I add the following code, it complains that the code calls the base method.
public override int GetHashCode()
{
    return base.GetHashCode();
}
I have to return something, so basically it wants me to implement a dummy method returning an arbitrary integer (it's not being used anywhere in the code as far as I can tell, because I get the same behavior whether or not I comment it out).
Since best practice is to omit the code that isn't needed, I'm confused on what the appropriate design would be.

The MSDN page for GetHashCode states that
If you override the GetHashCode method, you should also override Equals, and vice versa. If your overridden Equals method returns true when two objects are tested for equality, your overridden GetHashCode method must return the same value for the two objects.
If you are overriding the equals so that it returns true when the object ids are equal, you must define your GetHashCode method to return the same hash code when given the same id.
The hash code is used for insertion and lookup in hash-based collections, so it's unlikely to be called directly by any code you write, but it will be used by other .NET code. If you don't implement it correctly, you may run into performance issues or worse when dealing with collections of your objects. For example, if you get it wrong or don't implement it, you might not be able to determine whether an instance of your class is in the collection, or you won't be able to retrieve the instance you put in.
In your case the method is simply:
public override int GetHashCode()
{
    return Id.GetHashCode();
}
as you are just comparing the IDs in the Equals method.
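For completeness, here is a minimal sketch of the whole class under a couple of assumptions: `Id` is an `int` set once in a constructor, and the constructor and `Name` property are hypothetical additions for illustration. Because `GetHashCode` hashes exactly what `Equals` compares, a `HashSet<Occasion>` finds a different instance with the same `Id`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Occasion type: only Id takes part in equality, as in the question.
public class Occasion
{
    public int Id { get; }                 // get-only, so the hash cannot change later
    public string Name { get; set; }       // illustrative; ignored by Equals/GetHashCode

    public Occasion(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public override bool Equals(object input)
    {
        // Equal when the other object is an Occasion with the same Id.
        Occasion comparee = input as Occasion;
        return comparee != null && comparee.Id == Id;
    }

    // Hash only what Equals compares, so equal objects always share a hash code.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}
```

Making `Id` get-only is a deliberate choice in this sketch: if `Id` could change while the object sits in a `HashSet` or `Dictionary`, the stored hash would go stale.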

Related

Should GetHashCode be implemented for IEquatable<T> on mutable types?

I'm implementing IEquatable<T>, and I am having difficulty finding consensus on the GetHashCode override on a mutable class.
The following resources all provide an implementation where GetHashCode would return different values during the object's lifetime if the object changes:
https://stackoverflow.com/a/13906125/197591
https://csharp.2000things.com/tag/iequatable/
http://broadcast.oreilly.com/2010/09/understanding-c-equality-iequa.html
However, this link states that GetHashCode should not be implemented for mutable types for the reason that it could cause undesirable behaviour if the object is part of a collection (and this has always been my understanding also).
Interestingly, the MSDN example implements the GetHashCode using only immutable properties which is in line with my understanding. But I'm confused as to why the other resources don't cover this. Are they simply wrong?
And if a type has no immutable properties at all, the compiler warns that GetHashCode is missing when I override Equals(object). In this case, should I implement it and just call base.GetHashCode() or just disable the compiler warning, or have I missed something and GetHashCode should always be overridden and implemented? In fact, if the advice is that GetHashCode should not be implemented for mutable types, why bother implementing for immutable types? Is it simply to reduce collisions compared to the default GetHashCode implementation, or does it actually add more tangible functionality?
To summarise my Question, my dilemma is that using GetHashCode on mutable objects means it can return different values during the lifetime of the object if properties on it change. But not using it means that the benefit of comparing objects that might be equivalent is lost because it will always return a unique value and thus collections will always fall back to using Equals for their operations.
Having typed this Question out, another Question popped up in the 'Similar Questions' box that seems to address the same topic. The answer there seems to be quite explicit in that only immutable properties should be used in a GetHashCode implementation. If there are none, then simply don't write one. Dictionary<TKey, TValue> will still function correctly albeit not at O(1) performance.
Mutable classes work quite badly with Dictionaries and other classes that rely on GetHashCode and Equals.
In the scenario you are describing, with mutable object, I suggest one of the following:
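using System;

class ConstantHashCode : IEquatable<ConstantHashCode>
{
    public int SomeVariable;
    public virtual bool Equals(ConstantHashCode other)
    {
        return other != null && other.SomeVariable == SomeVariable;
    }
    public override bool Equals(object obj) => Equals(obj as ConstantHashCode);
    public override int GetHashCode()
    {
        // Same value for every instance: always consistent with Equals, but every key collides.
        return 0;
    }
}
or
class ThrowHashCode : IEquatable<ThrowHashCode>
{
    public int SomeVariable;
    public virtual bool Equals(ThrowHashCode other)
    {
        return other != null && other.SomeVariable == SomeVariable;
    }
    public override bool Equals(object obj) => Equals(obj as ThrowHashCode);
    public override int GetHashCode()
    {
        // Fails fast the first time the object is used in a hash-based collection.
        throw new ApplicationException("this class does not support GetHashCode and should not be used as a key for a dictionary");
    }
}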
With the first, Dictionary works (almost) as expected, with a performance penalty on lookup and insertion: in both cases, Equals will be called for every element already in the dictionary until a comparison returns true. You are effectively reverting to the performance of a List.
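The first option's trade-off can be seen in a short sketch that mirrors the answer's first class; the name `ConstantHashKey` is hypothetical, and the `Equals(object)` override is an addition so that both equality paths agree. Because every key hashes to 0, a key mutated after insertion is still found:

```csharp
using System;

// Equality on a mutable field, with a constant hash code (hypothetical name).
public class ConstantHashKey : IEquatable<ConstantHashKey>
{
    public int SomeVariable;

    public virtual bool Equals(ConstantHashKey other)
    {
        return other != null && other.SomeVariable == SomeVariable;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ConstantHashKey);
    }

    // Every instance lands in the same bucket, so mutation can never "move" a key.
    public override int GetHashCode()
    {
        return 0;
    }
}
```

The price is the linear scan described above: every lookup compares against each key in that single bucket.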
The second is a way to tell programmers who will use your class: "no, you cannot use this within a dictionary".
Unfortunately, as far as I know there is no way to detect this at compile time, but it will fail the first time the code adds an element to the dictionary, very likely quite early during development. It is not the kind of bug that only shows up in a production environment with an unpredicted set of inputs.
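A sketch of the second option, assuming `NotSupportedException` in place of the answer's `ApplicationException` (either works; the name `ThrowingHashKey` is hypothetical). The very first `Add` fails, before anything is stored:

```csharp
using System;

// Equality is defined, but hashing is deliberately unsupported (hypothetical name).
public class ThrowingHashKey : IEquatable<ThrowingHashKey>
{
    public int SomeVariable;

    public virtual bool Equals(ThrowingHashKey other)
    {
        return other != null && other.SomeVariable == SomeVariable;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ThrowingHashKey);
    }

    // Any hash-based collection calls this on insertion, so misuse surfaces immediately.
    public override int GetHashCode()
    {
        throw new NotSupportedException(
            "ThrowingHashKey does not support GetHashCode and must not be used as a dictionary key.");
    }
}
```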
Last but not least, you can ignore the "mutable" problem and implement GetHashCode using member variables: you then have to be aware that you are not free to modify an instance while it's used within a Dictionary. In some scenarios this can be acceptable; in others it's not.
It all depends on what kind of collection type you are talking about. For my answer I will assume you are talking about hash-table-based collections, and in particular I will address the .NET Dictionary and key calculation.
So the best way to find out what will happen if you modify a key (given your key is a class with a custom hash code calculation) is to look at the .NET source. There we can see that your key-value pair is wrapped in an Entry struct, which stores the hash code calculated when the value was added. This means that if the key's hash code changes after it was added, the dictionary will no longer be able to find the value.
Code to prove it:
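using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var myKey = new MyKey { MyBusinessKey = "Ohai" };
        var dic = new Dictionary<MyKey, int>();
        dic.Add(myKey, 1);                       // the entry stores the hash of "Ohai"
        Console.WriteLine(dic[myKey]);           // prints 1
        myKey.MyBusinessKey = "Changing value";  // the key's hash changes, the stored one does not
        Console.WriteLine(dic[myKey]);           // throws KeyNotFoundException
    }
}

public class MyKey
{
    public string MyBusinessKey { get; set; }

    // Mutable state feeds the hash code: this is the problem being demonstrated.
    public override int GetHashCode()
    {
        return MyBusinessKey.GetHashCode();
    }
}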
.NET source reference.
So, to answer your question: the values you base your hash code calculation on should be immutable.
Another point: if you do not override GetHashCode, the hash code of a custom class is based on the object's reference (its identity). So the concern that different objects with identical underlying values get different hash codes can be addressed by overriding GetHashCode and calculating the hash code from your business keys. For example, if you have two string properties, you could concatenate the strings and call string's GetHashCode method on the result. This guarantees the same hash code for the same underlying values of the object.
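A minimal sketch of the concatenation approach, using a hypothetical `BusinessKey` with two get-only string properties (get-only so that the hash cannot change while the object is used as a key):

```csharp
using System;

// Hypothetical key type with two string "business key" properties.
public sealed class BusinessKey
{
    public string First { get; }
    public string Last { get; }

    public BusinessKey(string first, string last)
    {
        // Rejecting nulls keeps Equals and GetHashCode simple.
        First = first ?? throw new ArgumentNullException(nameof(first));
        Last = last ?? throw new ArgumentNullException(nameof(last));
    }

    public override bool Equals(object obj)
    {
        return obj is BusinessKey other && other.First == First && other.Last == Last;
    }

    // Equal property values give an equal concatenated string, hence an equal hash.
    public override int GetHashCode()
    {
        return (First + Last).GetHashCode();
    }
}
```

Concatenation means ("ab", "c") and ("a", "bc") share a hash code. That is allowed, because Equals still tells them apart, but frequent collisions cost lookup time.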
After much discussion and reading other SO answers on the topic, it was eventually this ReSharper help page that summarised it very well for me:
MSDN documentation of the GetHashCode() method does not explicitly require that your override of this method returns a value that never changes during the object's lifetime. Specifically, it says:
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method.
On the other hand, it says that the hash code should not change at least when your object is in a collection:
You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:
You can compute the hash code from fields that are not mutable; or
You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
But why do you need to override GetHashCode() in the first place? Normally, you will do it if your object is going to be used in a Hashtable, as a key in a dictionary, etc., and it's quite hard to predict when your object will be added to a collection and how long it will be kept there.
With all that said, if you want to be on the safe side make sure that your override of GetHashCode() returns the same value during the object's lifetime. ReSharper will help you here by pointing at each non-readonly field or non-get-only property in your implementation of GetHashCode(). If possible, ReSharper will also suggest quick-fixes to make these members read-only/get-only.
Of course, it doesn't suggest what to do if the quick-fixes are not possible. However, it does indicate that those quick-fixes should only be used "if possible", which implies that the inspection could be suppressed. Gian Paolo's answer on this suggests throwing an exception, which will prevent the class from being used as a key and would surface early in development if it were inadvertently used as one.
However, GetHashCode is used in other circumstances such as when an instance of your object is passed as a parameter to a mock method setup. Therefore, the only viable option is to implement GetHashCode using the mutable values and put the onus on the rest of the code to ensure the object is not mutated while it is being used as a key, or to not use it as a key at all.

When implementing IEqualityComparer<T>.GetHashCode(T obj), can I use the current instance's state, or do I have to use obj?

How come when I implement IEqualityComparer, it has a parameter for GetHashCode(T obj)? It's not a static object of course, so why can't I just use the current instance's state to generate the hash code? Is this == obj?
I'm curious because I'm trying to do this:
public abstract class BaseClass : IEqualityComparer<BaseClass>
{
    public abstract int GetHashCode(BaseClass obj);
}
public class DerivedClass : BaseClass
{
    public int MyData;
    public override int GetHashCode(BaseClass obj)
    {
        return MyData.GetHashCode();
        // Or do I have to do this:
        // return ((DerivedClass)obj).MyData.GetHashCode();
    }
}
I'm trying to prevent doing the cast, since it's being used in really high-performance code.
I think the main issue here is that you're confusing IEqualityComparer<T> with IEquatable<T>.
IEquatable<T> defines a method for determining if the current instance (this) is equal to an instance of the same type. In other words it's used for testing objA.Equals(objB). When implementing this interface, it is recommended that you also override the GetHashCode() instance method.
IEqualityComparer<T> defines methods for testing whether two objects of the given type are equal; in other words, it's for testing comparer.Equals(objA, objB). Hence the need to provide an object as a parameter to GetHashCode (which, remember, is different from the GetHashCode that it inherits from object).
You can think of IEquatable<T> as your object's way of saying, "this is how I know if I am equal to something else," and IEqualityComparer<T> as your object's way of saying, "this is how I know if two other things are equal".
For some good examples of how these two interfaces are used in the framework see:
String which implements IEquatable<string>
StringComparer which implements IEqualityComparer<string>
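A minimal sketch of the distinction, using hypothetical `Item` and `ItemIdComparer` types: the comparer is handed to a `HashSet<T>` and computes everything from its arguments, never from its own state, so one comparer instance can serve every element.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data type with no equality overrides of its own.
public class Item
{
    public int Id { get; set; }
    public string Label { get; set; }
}

// A stateless comparer: "this" is the comparer, not one of the objects being compared.
public sealed class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Item obj)
    {
        // Hash the argument; this must agree with Equals above.
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}
```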
Should you use the current state of an IEqualityComparer<T> to determine the hash code? If the state is at all mutable, then no! Anywhere where the hash is used (e.g. HashSet<T> or Dictionary<T, V>) the hash code will be cached and used for efficient lookup. If that hash code can change because the state of the comparer changes, that would totally destroy the usefulness of the data structure storing the hash. Now, if the state is not mutable (i.e. it's set only when creating the comparer and cannot be modified throughout the lifetime of the comparer), then yes, you can, but I would still recommend against it, unless you have a really good reason.
Finally, you mentioned performance. Honestly, this sounds like premature optimization. I'd recommend not worrying so much about performance until you can be sure that this particular line of code is causing a problem.
If you don't use information from the obj argument that is passed in, your hash code will not vary across incoming objects and will not be useful. The comparer is not the instance of the object you want to get a hash code for or compare to.
Indeed, you can use the comparer's own fields in GetHashCode, and you could even return MyData as the hash code as shown in your sample; it will still satisfy the GetHashCode requirement to "return the same value for the same object". But in your sample all hash codes will be the same for a given comparer instance, so using it in a Dictionary will essentially turn the dictionary into a list.
The same applies to Equals call - indeed you can return true all the time, but how useful it will be?

Why should I *not* override GetHashCode()?

My search for a helper to correctly combine constituent hashcodes for GetHashCode() seemed to garner some hostility. I got the impression from the comments that some C# developers don't think you should override GetHashCode() often - certainly some commenters seemed to think that a library for helping get the behaviour right would be useless. Such functionality was considered useful enough in Java for the Java community to ask for it to be added to the JDK, and it's now in JDK 7.
Is there some fundamental reason that in C# you don't need to - or should definitely not - override GetHashCode() (and correspondingly, Equals()) as often as in Java? I find myself doing this often with Java, for example whenever I create a type that I know I want to keep in a HashSet or use as a key in a HashMap (equivalently, a .NET Dictionary).
C# has built-in value types which provide value equality, whereas Java does not. So writing your own hashcode in Java may be a necessity, whereas doing it in C# may be a premature optimisation.
It's common to write a type to use as a composite key in a Dictionary/HashMap. Often such types need value equality (equivalence) as opposed to reference equality (identity), for example:
IDictionary<Person, IList<Movie> > moviesByActor; // e.g. initialised from DB
// elsewhere...
Person p = new Person("Chuck", "Norris");
IList<Movie> chuckNorrisMovies = moviesByActor[p];
Here, if I need to create a new instance of Person to do the lookup, I need Person to implement value equality otherwise it won't match existing entries in the Dictionary as they have a different identity.
To get value equality, you need an overridden Equals() and GetHashCode(), in both languages.
C#'s structs (value types) implement value equality for you (albeit a potentially inefficient one), and provide a consistent implementation of GetHashCode. This may suffice for many people's needs and they won't go further to implement their own improved version unless performance problems dictate otherwise.
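A small sketch, assuming a hypothetical `PersonKey` struct with no overrides at all: the runtime's default `ValueType.Equals` compares the fields, so a freshly built key with equal (but distinct) string instances finds the existing entry.

```csharp
using System;

// Hypothetical composite key declared as a struct with no Equals/GetHashCode overrides.
public struct PersonKey
{
    public readonly string FirstName;
    public readonly string LastName;

    public PersonKey(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}
```

The default hash may be derived from only one of the struct's fields, so keys such as ("Chuck", "Norris") and ("Chuck", "Berry") can land in the same bucket; lookups stay correct, but this is the kind of collision the points below describe.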
Java has no such built-in language feature. If you want to create a type with value equality semantics to use as a composite key, you must implement equals() and correspondingly hashCode() yourself. (There are third-party helpers and libraries to help you do this, but nothing built into the language itself).
I've described C# value types as 'potentially inefficient' for use in a Dictionary because:
The implementation of ValueType.Equals itself can sometimes be slow. This is used in Dictionary lookups.
The implementation of ValueType.GetHashCode, whilst correct, can yield many collisions leading to very poor Dictionary performance also. Have a look at this answer to a Q by Jon Skeet, which demonstrates that KeyValuePair<ushort, uint> appears to always yield the same hashCode!
If your object represents a value or type, then you SHOULD override the GetHashCode() along with Equals. I never override hash codes for control classes, like "App". Though I see no reason why even overriding GetHashCode() in those circumstances would be a problem as they will never be put in a position to interfere with collection indexing or comparisons.
Example:
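public class ePoint : eViewModel, IEquatable<ePoint>
{
    public double X;
    public double Y;
    // Methods
    #region IEquatable Overrides
    public override bool Equals(object obj)
    {
        if (Object.ReferenceEquals(obj, null)) { return false; }
        if (Object.ReferenceEquals(this, obj)) { return true; }
        if (!(obj is ePoint)) { return false; }
        return Equals((ePoint)obj);
    }
    public bool Equals(ePoint other)
    {
        // Guard against null when called directly through IEquatable<ePoint>.
        if (Object.ReferenceEquals(other, null)) { return false; }
        return X == other.X && Y == other.Y;
    }
    public override int GetHashCode()
    {
        // Combine the hash codes of exactly the fields that Equals compares.
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + X.GetHashCode();
            hash = hash * 23 + Y.GetHashCode();
            return hash;
        }
    }
    #endregion
}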
I wrote a helper class to implement GetHashCode(), Equals(), and CompareTo() using value semantics from an array of properties.

Overriding GetHashCode()

In this article, Jon Skeet mentioned that he usually uses this kind of algorithm for overriding GetHashCode().
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + Id.GetHashCode();
        return hash;
    }
}
Now, I've tried using this, but Resharper tells me that the method GetHashCode() should be hashing using only read-only fields (it compiles fine, though). What would be a good practice, because right now I can't really have my fields to be read-only?
I tried generating this method by Resharper, here's the result.
public override int GetHashCode()
{
    return base.GetHashCode();
}
This doesn't contribute much, to be honest...
If all your fields are mutable and you have to implement GetHashCode method, I am afraid this is the implementation you would need to have.
public override int GetHashCode()
{
    return 1;
}
Yes, this is inefficient but this is at least correct.
The problem is that GetHashCode is being used by Dictionary and HashSet collections to place each item in a bucket. If hashcode is calculated based on some mutable fields and the fields are really changed after the object is placed into the HashSet or Dictionary, the object can no longer be found from the HashSet or Dictionary.
Note that with all the objects returning the same hash code 1, all the objects are put in the same bucket in the HashSet or Dictionary, so effectively only one bucket is ever used. When looking up an object, the collection does an equality check on each of the objects inside that single bucket. This is like searching a linked list.
Somebody may argue that implementing the hash code based on mutable fields can be fine if we can make sure the fields are never changed after the objects are added to a HashSet or Dictionary. My personal view is that this is error-prone. Somebody taking over your code two years later might not be aware of this and break the code accidentally.
Please note that your GetHashCode must go hand in hand with your Equals method. And if you can just use reference equality (when you'd never have two different instances of your class that can be equal) then you can safely use Equals and GetHashCode that are inherited from Object. This would work much better than simply return 1 from GetHashCode.
I personally tend to return a different numeric value for each implementation of GetHashCode() in a class which has no immutable fields. This means if I have a dictionary containing different implementing types, there is a chance the different instances of different types will be put in different buckets.
For example
public class A
{
    // TODO Equals override
    public override int GetHashCode()
    {
        return 21313;
    }
}
public class B
{
    // TODO Equals override
    public override int GetHashCode()
    {
        return 35507;
    }
}
Then if I have a Dictionary<object, TValue> containing instances of A, B and other types, the performance of the lookup will be better than if all the implementations of GetHashCode returned the same numeric value.
It should also be noted that I make use of prime numbers to get a better distribution.
As per the comments I have provided a LINQPad sample here which demonstrates the performance difference between using return 1 for different types and returning a different value for each type.

static Object.Equals method, default implementation of GetHashCode and the Dictionary class

I just want to confirm my understanding of a few fundamentals. Hope you don't mind!
I understand the static equals method
Object.Equals(objA, objB)
first checks for reference equality. If not equal by reference, then calls the object instance equals method
objA.Equals(objB)
Currently in my override for equals, I first check for reference equality, and if not equal referentially then check with all members to see if the semantics are the same. Is this a good approach? If so, then the static version seems superfluous?
Also what exactly does the default GetHashCode for an object do?
If I add my object to a dictionary, which is a HashTable underneath, and don't override Equals and GetHashCode, then I guess I should do so to make it sort optimally and hence get better retrieval time?
Currently in my override for equals, I first check for reference equality, and if not equal referentially then check with all members to see if the semantics are the same. Is this a good approach? If so, then the static version seems superfluous?
Yes, it's a great idea to do the fast reference-equality check. There's no guarantee that your method will be called through the static Object.Equals method - it could well be called directly. For example, EqualityComparer<T>.Default (the typical middleman for equality checking) will directly call this method in many situations (when the type does not implement IEquatable<T>) without first doing a reference-equality check.
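A sketch of the pattern the question describes, with a hypothetical `Widget` type that deliberately does not implement `IEquatable<T>`. The static `EqualsCalls` counter exists only to show that `EqualityComparer<Widget>.Default` calls the override directly:

```csharp
using System;

// Hypothetical type without IEquatable<T>, so the default comparer uses Equals(object).
public class Widget
{
    public static int EqualsCalls; // illustration only: counts calls to the override

    public string Name { get; }
    public int Size { get; }

    public Widget(string name, int size)
    {
        Name = name;
        Size = size;
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        if (ReferenceEquals(this, obj)) return true;      // fast path: same instance
        var other = obj as Widget;
        if (other is null) return false;
        return Name == other.Name && Size == other.Size;  // member-wise comparison
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Name?.GetHashCode() ?? 0);
            hash = hash * 23 + Size.GetHashCode();
            return hash;
        }
    }
}
```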
Also what exactly does the default GetHashCode for an object do?
It forwards to RuntimeHelpers.GetHashCode: a magic, internally implemented CLR method that is a compliant GetHashCode implementation for reference equality. For more information, see Default implementation for Object.GetHashCode(). You should definitely override it whenever you override Equals.
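A small check of that claim, with hypothetical types: for a class that does not override `GetHashCode`, the inherited value matches `RuntimeHelpers.GetHashCode`, which always returns the identity-based hash even for a type that does override it.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type that inherits Object.GetHashCode unchanged.
public class PlainObject { }

// Hypothetical type with a value-based override (all instances are equal).
public class AlwaysEqual
{
    public override bool Equals(object obj) => obj is AlwaysEqual;
    public override int GetHashCode() => 42;
}

public static class IdentityHashDemo
{
    // Identity-based hash, ignoring any GetHashCode override on the object's type.
    public static int IdentityHash(object o) => RuntimeHelpers.GetHashCode(o);
}
```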
EDIT:
If I add my object to a dictionary, which is a HashTable underneath, and don't override Equals and GetHashCode, then I guess I should do so to make it sort optimally and hence get better retrieval time?
If you don't override either, you'll get reference-equality with (probably) a well-balanced table.
If you override one but not the other or implement them in any other non-compliant way, you'll get a broken hashtable.
By the way, hashing is quite different from sorting.
For more information, see Why is it important to override GetHashCode when Equals method is overridden in C#?
Your first question was already answered, but I think the second was not fully answered.
Implementing GetHashCode is important if you want to use your object as a key in a hash table or a dictionary. A good implementation minimizes collisions and therefore speeds up lookups. A collision happens when two or more keys have the same hash code; for those keys, the Equals method has to be invoked. If the hash code is unique, Equals will only be called once; otherwise it will be called for every key with the same hash code until Equals returns true.
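The effect of collisions on the number of `Equals` calls can be measured with a small sketch. `CountingIdComparer` is a hypothetical helper, and the counts assume the usual `HashSet<T>` behaviour of comparing stored hash codes before calling `Equals`; looking up a missing key makes the count independent of bucket order.

```csharp
using System;
using System.Collections.Generic;

// Counts Equals calls; the hash strategy is the only difference between instances.
public sealed class CountingIdComparer : IEqualityComparer<int>
{
    private readonly bool _constantHash;
    public int EqualsCalls { get; private set; }

    public CountingIdComparer(bool constantHash)
    {
        _constantHash = constantHash;
    }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    // Constant hash puts every key in one bucket; otherwise each key gets its own hash.
    public int GetHashCode(int obj) => _constantHash ? 0 : obj;

    // Fills a HashSet with keys 0..count-1, then counts Equals calls for one missing key.
    public static int EqualsCallsForMissingKey(int count, bool constantHash)
    {
        var comparer = new CountingIdComparer(constantHash);
        var set = new HashSet<int>(comparer);
        for (int i = 0; i < count; i++)
            set.Add(i);

        comparer.EqualsCalls = 0;   // ignore calls made while inserting
        set.Contains(count);        // this key is not in the set
        return comparer.EqualsCalls;
    }
}
```

With a constant hash, the failed lookup compares against every stored key; with distinct hashes, the stored hash codes never match, so `Equals` is not called at all.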
