I just want to confirm my understanding of a few fundamentals. Hope you don't mind!
I understand the static equals method
Object.Equals(objA, objB)
first checks for reference equality. If not equal by reference, then calls the object instance equals method
objA.Equals(objB)
Currently in my override for equals, i first check for reference equality, and if not equal referentially then check with all members to see if the semantics are the same. Is this a good approach? If so, then the static version seems superfluous?
Also what exactly does the default GetHashCode for an object do?
If I add my object to a dictionary which is a HashTable underneath and don't override equals and GetHashCode, then I guess I should do to make it sort optimally hence better retrieval time?
Currently in my override for equals, i first check for reference
equality, and if not equal referentially then check with all members
to see if the semantics are the same. Is this a good approach? If so,
then the static version seems superfluous?
Yes, it's a great idea to do the fast reference-equality check. There's no guarantee that your method will be called through the static Object.Equals method - it could well be called directly. For example, EqualityComparer<T>.Default (the typical middleman for equality checking) will directly call this method in many situations (when the type does not implement IEquatable<T>) without first doing a reference-equality check.
Also what exactly does the default GetHashCode for an object do?
It forwards toRuntimeHelpers.GetHashCode: a magic, internally-implemented CLR method that is a compliant GetHashCode implementation for reference-equality. For more information, see Default implementation for Object.GetHashCode(). You should definitely override it whenever you override Equals.
EDIT:
If I add my object to a dictionary which is a HashTable underneath and
don't override equals and GetHashCode, then I guess I should do to
make it sort optimally hence better retrieval time?
If you don't override either, you'll get reference-equality with (probably) a well-balanced table.
If you override one but not the other or implement them in any other non-compliant way, you'll get a broken hashtable.
By the way, hashing is quite different from sorting.
For more information, see Why is it important to override GetHashCode when Equals method is overriden in C#?
Your first question was already answered, but I think the second was not fully answered.
Implementing your GetHashCode is important if you want to use your object as a key in a hash table or a dictionary. It minimizes collisions and therefore it speeds the lookup. A lookup collision happens when two or more keys have the same hashcode and for those equals method is invoked. If the hashcode is unique, an equals will only be called once, otherwise it will be called for every key with the same hashcode until the equals returns true.
Related
I'm implementing IEquatable<T>, and I am having difficulty finding consensus on the GetHashCode override on a mutable class.
The following resources all provide an implementation where GetHashCode would return different values during the object's lifetime if the object changes:
https://stackoverflow.com/a/13906125/197591
https://csharp.2000things.com/tag/iequatable/
http://broadcast.oreilly.com/2010/09/understanding-c-equality-iequa.html
However, this link states that GetHashCode should not be implemented for mutable types for the reason that it could cause undesirable behaviour if the object is part of a collection (and this has always been my understanding also).
Interestingly, the MSDN example implements the GetHashCode using only immutable properties which is in line with my understanding. But I'm confused as to why the other resources don't cover this. Are they simply wrong?
And if a type has no immutable properties at all, the compiler warns that GetHashCode is missing when I override Equals(object). In this case, should I implement it and just call base.GetHashCode() or just disable the compiler warning, or have I missed something and GetHashCode should always be overridden and implemented? In fact, if the advice is that GetHashCode should not be implemented for mutable types, why bother implementing for immutable types? Is it simply to reduce collisions compared to the default GetHashCode implementation, or does it actually add more tangible functionality?
To summarise my Question, my dilemma is that using GetHashCode on mutable objects means it can return different values during the lifetime of the object if properties on it change. But not using it means that the benefit of comparing objects that might be equivalent is lost because it will always return a unique value and thus collections will always fall back to using Equals for its operations.
Having typed this Question out, another Question popped up in the 'Similar Questions' box that seems to address the same topic. The answer there seems to be quite explicit in that only immutable properties should be used in a GetHashCode implementation. If there are none, then simply don't write one. Dictionary<TKey, TValue> will still function correctly albeit not at O(1) performance.
Mutable classes work quite bad with Dictionaries and other classes that relies on GetHashCode and Equals.
In the scenario you are describing, with mutable object, I suggest one of the following:
class ConstantHasCode: IEquatable<ConstantHasCode>
{
public int SomeVariable;
public virtual Equals(ConstantHasCode other)
{
return other.SomeVariable == SomeVariable;
}
public override int GetHashCode()
{
return 0;
}
}
or
class ThrowHasCode: IEquatable<ThrowHasCode>
{
public int SomeVariable;
public virtual Equals(ThrowHasCode other)
{
return other.SomeVariable == SomeVariable;
}
public override int GetHashCode()
{
throw new ApplicationException("this class does not support GetHashCode and should not be used as a key for a dictionary");
}
}
With the first, Dictionary works (almost) as expected, with performance penalty in lookup and insertion: in both cases, Equals will be called for every element already in the dictionary until a comparison return true. You are actually reverting to performance of a List
The second is a way to tell the programmers will use your class "no, you cannot use this within a dictionary".
Unfortunately, as far as I know there is no method to detect it at compile time, but this will fail the first time the code adds an element to the dictionary, very likely quite early while developping, not the kind of bug happening only in production environment with an unpredicted set of input.
Last but not least, ignore the "mutable" problem and implement GetHashCode using member variables: now you have to be aware that you are not free to modify the class when it's used withing a Dictionary. In some scenario this can be acceptable, in other it's not
It all depends of what kind of collection type you are talking about. For my answer I will assume you are talking about Hash Table based collections and in particular I will address it for .NET Dictionary and Key calculation.
So best way to identify what will happen if you modify key( given your key is a class which does custom HashCode calculation) is to look at the .NET source. From .NET source we can see that your key value pair is now wrapped into Entry struct which carries hashcode which was calculated on addition of your value. Meaning that if you change HashCode value after that time of your key was added, it will no longer be able to find a value in dictionary.
Code to prove it:
static void Main()
{
var myKey = new MyKey { MyBusinessKey = "Ohai" };
var dic = new Dictionary<MyKey, int>();
dic.Add(myKey, 1);
Console.WriteLine(dic[myKey]);
myKey.MyBusinessKey = "Changing value";
Console.WriteLine(dic[myKey]); // Key Not Found Exception.
}
public class MyKey
{
public string MyBusinessKey { get; set; }
public override int GetHashCode()
{
return MyBusinessKey.GetHashCode();
}
}
.NET source reference.
So to answer your question. You want to have immutable values for which you base your hashcode calculation on.
Another point, hashcode for custom class if you do not override GetHashCode will be based on reference of the object. So concern of returning same hashcode for different object which are identical in underlying values could be mitigated by overriding GetHashCode method and calculating your HashCode depending on your business keys. For example you would have two string properties, to calculate hashcode you would concat strings and call base string GetHashCode method. This will guarantee that you will get same hashcode for same underlying values of the object.
After much discussion and reading other SO answers on the topic, it was eventually this ReSharper help page that summarised it very well for me:
MSDN documentation of the GetHashCode() method does not explicitly require that your override of this method returns a value that never changes during the object's lifetime. Specifically, it says:
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method.
On the other hand, it says that the hash code should not change at least when your object is in a collection:
*You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:
You can compute the hash code from fields that are not mutable; or
You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.*
But why do you need to override GetHashCode() in the first place? Normally, you will do it if your object is going to be used in a Hashtable, as a key in a dictionary, etc., and it's quite hard to predict when your object will be added to a collection and how long it will be kept there.
With all that said, if you want to be on the safe side make sure that your override of GetHashCode() returns the same value during the object's lifetime. ReSharper will help you here by pointing at each non-readonly field or non-get-only property in your implementation of GetHashCode(). If possible, ReSharper will also suggest quick-fixes to make these members read-only/get-only.
Of course, it doesn't suggest what to do if the quick-fixes are not possible. However, it does indicate that those quick-fixes should only be used "if possible" which implies that the the inspection could be suppressed. Gian Paolo's answer on this suggests to throw an exception which will prevent the class from being used as a key and would present itself early in development if it was inadvertently used as a key.
However, GetHashCode is used in other circumstances such as when an instance of your object is passed as a parameter to a mock method setup. Therefore, the only viable option is to implement GetHashCode using the mutable values and put the onus on the rest of the code to ensure the object is not mutated while it is being used as a key, or to not use it as a key at all.
Attempt #3 to simplify this question:
A generic List<T> can contain any type - value or reference. When checking to see if a list contains an object, .Contains() uses the default EqualityComparer<T> for type T, and calls .Equals() (is my understanding). If no EqualityComparer has been defined, the default comparer will call .Equals(). By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
Until you need to override .Equals() to implement value equality, at which point the default comparer says two objects are the same if they have the same values. I can't think of a single case where that would be desirable for a reference type.
What I'm hearing from #Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Questions:
Am I understanding that correctly?
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
In general, how does one separate value equality from reference equality when overriding .Equals()?
The original line of code spurring this question:
// For each ID, a collection of matching rows
Dictionary<string, List<StagingDataRow>> stagingTableDictionary;
StagingTableMatches.AddRange(stagingTableDictionary[perNr].Where(row => !StagingTableMatches.Contains(row)));
.
Ok, let's handle a few misconceptions first:
By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
This is true, but only for reference types. Value types will implement a very slow reflection-based Equals function by default, so it's in your best interest to override that.
I can't think of a single case where that would be desirable for a reference type.
Oh I'm sure you can... String is a reference type for instance :)
What I'm hearing from #Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Err... No.
IEqualityComaprer<T> is an interface which lets you delegate equality comparison to a different object. If you want a different default behavior for your class, you implement IEquatable<T>, and also delegate object.Equals to that for consistency. Actually, overriding object.Equals and object.GetHashCode is sufficient to change the default equality comparison behavior, but also implementing IEquatable<T> has additional benefits:
It makes it more obvious that your type has custom equality comparison logic - think self documenting code.
It improves performance for value types, since it avoids unnecessary boxing (which happens with object.Equals)
So, for your actual questions:
Am I understanding that correctly?
You still seem a bit confused about this, but don't worry :)
Enigmativity actually suggested that you create a different type which implements IEqualityComparer<T>. Looks like you misunderstood that part.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()
By default, the (properly written) framework data structures will delegate equality comparison to EqualityComparer<StagingDataRow>.Default, which will in turn delegate to StagingDataRow.Equals.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()
Not necessarily. It should be self-consistent: if myEqualitycomaprer.Equals(a, b) then you must ensure that myEqualitycomaprer.GetHashCode(a) == myEqualitycomaprer.GetHashCode(b).
It can be the same implementation than StagingDataRow.GetHashCode, but not necessarily.
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
Well, by now I hope you've understood that the object which implements IEqualityComparer<T> is a different object, so this should make sense.
Please read my answer on Using of IEqualityComparer interface and EqualityComparer class in C# for more in-depth information.
Am I understanding that correctly?
Partially - the "default" IEqualityComparer will use either (in order):
The implementation of IEquatable<T>
An overridden Equals(object)
the base object.Equals(object), which is reference equality for reference types.
I think you are confusing two different methods of defining "equality" in a custom type. One is by implementing IEquatable<T> Which allows an instance of a type to determine if it's "equal" to another instance of the same type.
The other is IEqualityComparer<T> which is an independent interface that determines if two instance of that type are equal.
So if your definition of Equals should apply whenever you are comparing two instances, then implement IEquatable, as well as overriding Equals (which is usually trivial after implementing IEquatable) and GetHashCode.
If your definition of "equal" only applies in a particular use case, then create a different class that implements IEqualityComparer<T>, then pass an instance of it to whatever class or method you want that definition to apply to.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
No - only types and methods that accept an instance of IEqualityComparer as a parameter will use it.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
It will compute the hash code for the object that's passed in. It doesn't "compare" the hash code to anything. It does not necessarily have to return the same value as the overridden GetHashCode, but it must follow the rules for GetHashCode, particularly that two "equal" objects must return the same hash code.
It would be strange to have an instance method accept itself as a parameter...
Which is why IEqualityComparer is generally implemented on a different class. Note that IEquatable<T> doesn't have a GetHashCode() method, because it doesn't need one. It assumes that GetHashCode is overridden to match the override of object.Equals, which should match the strongly-typed implementation of IEquatable<T>
Bottom Line
If you want your definition of "equal" to be the default for that type, implement IEquatable<T> and override Equals and GetHashCode. If you want a definition of "equal" that is just for a specific use case, then create a different class that implements IEqualityComparer<T> and pass an instance of it to whatever types or methods need to use that definition.
Also, I would note that you very rarely call these methods directly (except Equals). They are usually called by the methods that use them (like Contains) to determine if two objects are "equal" or to get the hash code for an item.
I am aware of the importance to override GetHashCode when we override Equals method. I assume Equals internally calls GetHashCode.
What are the other methods that might be internally using GetHashCode?
Equals doesn't internally call GetHashCode.
GetHashCode is used by many classes as a means to improve performance: If the hash codes of two instances differ, the instances are not equal by definition so the call to Equals can be skipped.
Only if the hash codes are the same it needs to call Equals, because multiple instances can have the same hash code, even if they are different.
Concrete examples of classes that work like this:
Dictionary
HashSet
I assume Equals internally calls GetHashCode.
That would be pretty unusual actually. GetHashCode is used mainly by dictionaries and other hash-set based implementations; so: Hashtable, Dictionary<,>, HashSet<>, and a range of other things. Basically, GetHashCode serves two purposes:
getting a number which loosely represents the value, and which can be used, for example, for distributing a set of keys over a range of buckets via modulo, or any other numeric categorisation
proving non-equality (but never proving equality)
See also: Why is it important to override GetHashCode when Equals method is overridden?
I want to check if an object is in a Queue before I enqueue it. If don't explicitly define an EqualityComparer, what does the Contains() function compare?
If it compares property values, that's perfect. If it compares to see if a reference to that object exists in the Queue then that defeats what I'm trying to accomplish in my code.
For classes, the default equality operation is by reference - it assumes that object identity and equality are the same, basically.
You can overcome this by overriding Equals and GetHashCode. I'd also suggest implementing IEquatable<T> to make this clear. Your hash code implementation should generate the hash code from the same values as the equality operation.
The default for reference types is to compare the reference.
However, if the type implements IEquatable<> it can be doing a different comparison. If you need to have a specific equality comparison in place, you need to create one yourself.
Suppose I have a class T that I want to use as a key in a Dictionary<T,U> collection.
What must I implement in T so that these keys are based on values of T rather than T references?
I'm hoping it's just GetHashCode().
You must implement GetHashCode() and Equals().
Dictionary is a Hashtable below the covers, so you might want to read this: Pitfalls Of Equals/GetHashCode – How Does A Hash Table Work?
If you don't pass any IEqualityComparer<T> in the dictionary constructor, it will use EqualityComparer<T>.Default which is defined by MSDN as :
The Default property checks whether
type T implements the System.IEquatable(Of T)
interface and, if so, returns an
EqualityComparer(Of T) that
uses that implementation. Otherwise,
it returns an EqualityComparer(Of T) that uses the overrides of
Object.Equals and Object.GetHashCode provided by T.
So implementing IEquatable<T> would be my choice (if you implement it it also makes sense to override Equals and GetHashCode anyway).
Either implement Equals and GetHashCode or create an appropriate IEqualityComparer<T> which has the right form of equality matching for your map.
I rather like the IEqualityComparer<T> route: in many cases there isn't one obviously-right form of equality - you want to treat objects as equal in different ways depending on the situation. In that case, a custom equality comparer is just what you need. Of course, if there is a natural equality operation, it makes sense to implement IEquatable<T> in the type itself... if you can. (Another benefit of IEqualityComparer<T> is that you can implement it for types you have no control over.)
You need to override Equals(object obj). It is always expected that you implement GetHashCode when you modify Equals. Read this article at MSDN.