I am aware of the importance of overriding GetHashCode when we override the Equals method. I assume Equals internally calls GetHashCode.
What are the other methods that might be internally using GetHashCode?
Equals doesn't internally call GetHashCode.
GetHashCode is used by many classes as a means to improve performance: If the hash codes of two instances differ, the instances are not equal by definition so the call to Equals can be skipped.
Only if the hash codes are the same does it need to call Equals, because multiple instances can have the same hash code even if they are different.
Concrete examples of classes that work like this:
Dictionary
HashSet
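To see this in action, here is a minimal sketch with a hypothetical key type (CountingKey is made up for illustration) that counts how often Equals is invoked. It assumes the usual HashSet<T> behaviour of comparing stored hash codes before falling back on Equals.

```csharp
using System.Collections.Generic;

// Hypothetical key type that records how often Equals is invoked.
public sealed class CountingKey
{
    public static int EqualsCalls;

    public int Id { get; }
    public CountingKey(int id) { Id = id; }

    // The hash code is simply the Id, so different Ids never collide here.
    public override int GetHashCode() => Id;

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        return obj is CountingKey other && other.Id == Id;
    }
}

public static class CountingKeyDemo
{
    // Returns the number of Equals calls made by a miss and by a hit.
    public static (int missCalls, int hitCalls) Run()
    {
        var set = new HashSet<CountingKey> { new CountingKey(1) };

        CountingKey.EqualsCalls = 0;
        set.Contains(new CountingKey(2));   // hash codes differ, Equals is skipped
        int missCalls = CountingKey.EqualsCalls;

        CountingKey.EqualsCalls = 0;
        set.Contains(new CountingKey(1));   // hash codes match, Equals confirms
        int hitCalls = CountingKey.EqualsCalls;

        return (missCalls, hitCalls);
    }
}
```

Calling CountingKeyDemo.Run() should report zero Equals calls for the lookup whose hash code differs, and at least one for the lookup whose hash code matches.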
I assume Equals internally calls GetHashCode.
That would be pretty unusual actually. GetHashCode is used mainly by dictionaries and other hash-based implementations, such as Hashtable, Dictionary<,>, HashSet<> and a range of other things. Basically, GetHashCode serves two purposes:
getting a number which loosely represents the value, and which can be used, for example, for distributing a set of keys over a range of buckets via modulo, or any other numeric categorisation
proving non-equality (but never proving equality)
See also: Why is it important to override GetHashCode when Equals method is overridden?
Attempt #3 to simplify this question:
A generic List<T> can contain any type - value or reference. When checking to see if a list contains an object, .Contains() uses the default EqualityComparer<T> for type T, and calls .Equals() (is my understanding). If no EqualityComparer has been defined, the default comparer will call .Equals(). By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
Until you need to override .Equals() to implement value equality, at which point the default comparer says two objects are the same if they have the same values. I can't think of a single case where that would be desirable for a reference type.
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Questions:
Am I understanding that correctly?
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
In general, how does one separate value equality from reference equality when overriding .Equals()?
The original line of code spurring this question:
// For each ID, a collection of matching rows
Dictionary<string, List<StagingDataRow>> stagingTableDictionary;
StagingTableMatches.AddRange(stagingTableDictionary[perNr].Where(row => !StagingTableMatches.Contains(row)));
Ok, let's handle a few misconceptions first:
By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
This is true, but only for reference types. Value types inherit ValueType.Equals, which compares field by field and can fall back on slow reflection, so it's in your best interest to override it.
I can't think of a single case where that would be desirable for a reference type.
Oh I'm sure you can... String is a reference type for instance :)
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Err... No.
IEqualityComparer<T> is an interface which lets you delegate equality comparison to a different object. If you want a different default behavior for your class, you implement IEquatable<T>, and also delegate object.Equals to that for consistency. Actually, overriding object.Equals and object.GetHashCode is sufficient to change the default equality comparison behavior, but also implementing IEquatable<T> has additional benefits:
It makes it more obvious that your type has custom equality comparison logic - think self documenting code.
It improves performance for value types, since it avoids unnecessary boxing (which happens with object.Equals)
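As a rough sketch of that pattern (RowKey and its two properties are hypothetical names, not taken from the question's code), a type could implement IEquatable<T> and route object.Equals through it like this:

```csharp
using System;

// Hypothetical value-like type; the name and fields are illustrative only.
public sealed class RowKey : IEquatable<RowKey>
{
    public string PersonNumber { get; }
    public int Year { get; }

    public RowKey(string personNumber, int year)
    {
        PersonNumber = personNumber;
        Year = year;
    }

    // Strongly typed equality: no cast needed (and no boxing for value types).
    public bool Equals(RowKey other)
    {
        if (other is null) return false;
        if (ReferenceEquals(this, other)) return true;
        return PersonNumber == other.PersonNumber && Year == other.Year;
    }

    // Delegate object.Equals to the typed version for consistency.
    public override bool Equals(object obj) => Equals(obj as RowKey);

    // Built from the same fields that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (PersonNumber?.GetHashCode() ?? 0);
            hash = hash * 31 + Year;
            return hash;
        }
    }
}
```

With this in place, a List<RowKey> containing new RowKey("42", 2020) reports true from Contains(new RowKey("42", 2020)), because the default comparer picks up the IEquatable<RowKey> implementation.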
So, for your actual questions:
Am I understanding that correctly?
You still seem a bit confused about this, but don't worry :)
Enigmativity actually suggested that you create a different type which implements IEqualityComparer<T>. Looks like you misunderstood that part.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()
By default, the (properly written) framework data structures will delegate equality comparison to EqualityComparer<StagingDataRow>.Default, which will in turn delegate to StagingDataRow.Equals.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()
Not necessarily. It should be self-consistent: if myEqualityComparer.Equals(a, b) then you must ensure that myEqualityComparer.GetHashCode(a) == myEqualityComparer.GetHashCode(b).
It can be the same implementation as StagingDataRow.GetHashCode, but not necessarily.
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
Well, by now I hope you've understood that the object which implements IEqualityComparer<T> is a different object, so this should make sense.
Please read my answer on Using of IEqualityComparer interface and EqualityComparer class in C# for more in-depth information.
Am I understanding that correctly?
Partially - the "default" IEqualityComparer will use either (in order):
The implementation of IEquatable<T>
An overridden Equals(object)
the base object.Equals(object), which is reference equality for reference types.
I think you are confusing two different methods of defining "equality" in a custom type. One is by implementing IEquatable<T>, which allows an instance of a type to determine if it's "equal" to another instance of the same type.
The other is IEqualityComparer<T>, which is an independent interface that determines if two instances of that type are equal.
So if your definition of Equals should apply whenever you are comparing two instances, then implement IEquatable, as well as overriding Equals (which is usually trivial after implementing IEquatable) and GetHashCode.
If your definition of "equal" only applies in a particular use case, then create a different class that implements IEqualityComparer<T>, then pass an instance of it to whatever class or method you want that definition to apply to.
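A minimal sketch of the second approach, assuming a hypothetical StagingRow type with a PersonNumber property (neither is the asker's real StagingDataRow) and a made-up rule that rows match when their person numbers match, ignoring case:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type that keeps its default (reference) equality.
public sealed class StagingRow
{
    public string PersonNumber { get; set; }
    public string Comment { get; set; }
}

// A separate class that defines equality for one use case only.
public sealed class PersonNumberComparer : IEqualityComparer<StagingRow>
{
    public bool Equals(StagingRow x, StagingRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.PersonNumber, y.PersonNumber,
                             StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: rows that compare equal get equal hash codes.
    public int GetHashCode(StagingRow row)
    {
        return row?.PersonNumber is null
            ? 0
            : StringComparer.OrdinalIgnoreCase.GetHashCode(row.PersonNumber);
    }
}
```

The comparer is then handed to whatever should use it, for example rows.Contains(candidate, new PersonNumberComparer()) via LINQ, or new HashSet<StagingRow>(new PersonNumberComparer()). Code that is not given the comparer keeps using the type's own equality.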
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
No - only types and methods that accept an instance of IEqualityComparer as a parameter will use it.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
It will compute the hash code for the object that's passed in. It doesn't "compare" the hash code to anything. It does not necessarily have to return the same value as the overridden GetHashCode, but it must follow the rules for GetHashCode, particularly that two "equal" objects must return the same hash code.
It would be strange to have an instance method accept itself as a parameter...
Which is why IEqualityComparer is generally implemented on a different class. Note that IEquatable<T> doesn't have a GetHashCode() method, because it doesn't need one. It assumes that GetHashCode is overridden to match the override of object.Equals, which should match the strongly-typed implementation of IEquatable<T>
Bottom Line
If you want your definition of "equal" to be the default for that type, implement IEquatable<T> and override Equals and GetHashCode. If you want a definition of "equal" that is just for a specific use case, then create a different class that implements IEqualityComparer<T> and pass an instance of it to whatever types or methods need to use that definition.
Also, I would note that you very rarely call these methods directly (except Equals). They are usually called by the methods that use them (like Contains) to determine if two objects are "equal" or to get the hash code for an item.
In .NET, Whenever we override Equals() method for a class, it is a normal practice to override the GetHashCode() method as well. Doing so will ensure better performance when the object is used in Hashtables and Dictionaries. Two keys are considered to be equal in Hashtable only if their GetHashCode() values are same. My question is why can't the Hashtables use Equals() method to compare the keys?, that would have removed the burden of overriding GetHashCode() method.
Hashtable and Dictionary use Equals in case of a collision (when two hash codes are the same).
Why don't they use only Equals?
Because that would require a lot more processing than comparing an integer value (the hash code). Since hash codes are used as an index, finding the right bucket has a complexity of O(1).
A HashSet (or Hashtable, or Dictionary) uses an array of buckets to distribute the items. Those buckets are indexed by the object's hash code (which should be immutable), so finding the bucket an item is in is O(1).
Then it uses Equals within that bucket to find the exact match if there's more than one item with the same hashcode: that's O(N) since it needs to iterate over all items within that bucket to find the match.
If a hashset used only Equals, finding an item would be O(N) and you could as well be using a list, or an array.
That's also why two equal items must have the same hashcode, but two items with the same hashcode don't necessarily need to be equal.
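The bucket idea can be sketched with a deliberately simplified set. TinyHashSet is a made-up name, and the real HashSet<T> is more elaborate (it resizes and stores entries differently), so treat this only as an illustration of the two steps:

```csharp
using System.Collections.Generic;

// A toy hash set: a fixed array of buckets, each bucket a small list.
public sealed class TinyHashSet<T>
{
    private readonly List<T>[] _buckets;
    private readonly IEqualityComparer<T> _comparer = EqualityComparer<T>.Default;

    public TinyHashSet(int bucketCount = 16)
    {
        _buckets = new List<T>[bucketCount];
    }

    // Step 1, O(1): the hash code picks the bucket directly.
    private int BucketIndex(T item)
    {
        int hash = _comparer.GetHashCode(item) & 0x7FFFFFFF; // drop the sign bit
        return hash % _buckets.Length;
    }

    // Step 2: only the items in that one bucket are compared with Equals.
    public bool Contains(T item)
    {
        List<T> bucket = _buckets[BucketIndex(item)];
        if (bucket == null) return false;
        foreach (T candidate in bucket)
        {
            if (_comparer.Equals(candidate, item)) return true;
        }
        return false;
    }

    // Returns false if an equal item is already present.
    public bool Add(T item)
    {
        if (Contains(item)) return false;
        int index = BucketIndex(item);
        if (_buckets[index] == null) _buckets[index] = new List<T>();
        _buckets[index].Add(item);
        return true;
    }
}
```

With 16 buckets, the integers 1 and 17 land in the same bucket, so looking up 17 walks past 1 using Equals. A lookup for 2 never touches that bucket at all.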
Two object instances that compare as equal must always have identical hash codes. If this doesn't hold, hash-based data structures will not work correctly. It's not a matter of performance.
Two object instances that don't compare as equal should ideally have different hash codes. If this doesn't hold, hash-based data structures will have degraded performance, but at least they'll still work.
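Both rules can be illustrated with two hypothetical key types. InconsistentKey breaks the first rule on purpose (a per-instance number stands in for the identity-based default hash code), while ConstantHashKey obeys it but ignores the second rule:

```csharp
using System.Collections.Generic;

// Breaks the contract: equal keys return different hash codes.
public sealed class InconsistentKey
{
    private static int _nextId;
    private readonly int _id = ++_nextId;

    public string Name { get; }
    public InconsistentKey(string name) { Name = name; }

    public override bool Equals(object obj) =>
        obj is InconsistentKey other && other.Name == Name;

    // Unique per instance, regardless of Name.
    public override int GetHashCode() => _id;
}

// Obeys the contract, but every key collides with every other key.
public sealed class ConstantHashKey
{
    public string Name { get; }
    public ConstantHashKey(string name) { Name = name; }

    public override bool Equals(object obj) =>
        obj is ConstantHashKey other && other.Name == Name;

    // Legal but slow: all items end up in one bucket.
    public override int GetHashCode() => 0;
}

public static class HashContractDemo
{
    public static (bool brokenFound, bool slowFound) Run()
    {
        var broken = new HashSet<InconsistentKey> { new InconsistentKey("x") };
        var slow = new HashSet<ConstantHashKey>
        {
            new ConstantHashKey("x"),
            new ConstantHashKey("y")
        };

        return (broken.Contains(new InconsistentKey("x")),
                slow.Contains(new ConstantHashKey("y")));
    }
}
```

The lookup in the first set fails even though Equals would say the keys match, because the set never gets as far as calling Equals. The second set finds its item, only with every lookup degenerating into a linear scan.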
Thus, for a given object instance, GetHashCode needs to reflect the logic of Equals, to some extent.
Now if you're overriding the Equals method, you're providing custom comparison logic. As an example, let's say your custom comparison logic involves only one particular data member of the instance. For a non-virtual GetHashCode method to be useful, it would have to be general enough to understand your custom Equals logic and be able to come up with a custom hash code function (one that only involves your chosen data member) on the spot.
It's not that easy to write such a sophisticated GetHashCode and it's not worth the trouble either, when the user can simply provide a custom one-liner that honors the initial requirement.
I just want to confirm my understanding of a few fundamentals. Hope you don't mind!
I understand the static equals method
Object.Equals(objA, objB)
first checks for reference equality. If not equal by reference, then calls the object instance equals method
objA.Equals(objB)
Currently in my override for equals, I first check for reference equality, and if not equal referentially then check with all members to see if the semantics are the same. Is this a good approach? If so, then the static version seems superfluous?
Also what exactly does the default GetHashCode for an object do?
If I add my object to a dictionary which is a HashTable underneath and don't override equals and GetHashCode, then I guess I should do to make it sort optimally hence better retrieval time?
Currently in my override for equals, I first check for reference
equality, and if not equal referentially then check with all members
to see if the semantics are the same. Is this a good approach? If so,
then the static version seems superfluous?
Yes, it's a great idea to do the fast reference-equality check. There's no guarantee that your method will be called through the static Object.Equals method - it could well be called directly. For example, EqualityComparer<T>.Default (the typical middleman for equality checking) will directly call this method in many situations (when the type does not implement IEquatable<T>) without first doing a reference-equality check.
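For illustration, an override along those lines might look like the sketch below. Money and its members are hypothetical; the point is only the order of the checks.

```csharp
// Hypothetical type showing the reference-equality fast path in Equals.
public sealed class Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public override bool Equals(object obj)
    {
        // Cheap check first: the same instance is always equal to itself.
        if (ReferenceEquals(this, obj)) return true;

        // Otherwise compare member by member (also false for null or other types).
        return obj is Money other
            && Amount == other.Amount
            && Currency == other.Currency;
    }

    // Overridden together with Equals and built from the same members.
    public override int GetHashCode()
    {
        unchecked
        {
            return Amount.GetHashCode() * 31 + (Currency?.GetHashCode() ?? 0);
        }
    }
}
```

The fast path pays off whichever way the method is reached: through the static object.Equals(a, b), through EqualityComparer<Money>.Default, or by calling a.Equals(b) directly.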
Also what exactly does the default GetHashCode for an object do?
It forwards to RuntimeHelpers.GetHashCode: a magic, internally-implemented CLR method that is a compliant GetHashCode implementation for reference-equality. For more information, see Default implementation for Object.GetHashCode(). You should definitely override it whenever you override Equals.
EDIT:
If I add my object to a dictionary which is a HashTable underneath and
don't override equals and GetHashCode, then I guess I should do to
make it sort optimally hence better retrieval time?
If you don't override either, you'll get reference-equality with (probably) a well-balanced table.
If you override one but not the other or implement them in any other non-compliant way, you'll get a broken hashtable.
By the way, hashing is quite different from sorting.
For more information, see Why is it important to override GetHashCode when Equals method is overridden in C#?
Your first question was already answered, but I think the second was not fully answered.
Implementing your own GetHashCode is important if you want to use your object as a key in a hash table or a dictionary. A good implementation minimizes collisions and therefore speeds up lookups. A lookup collision happens when two or more keys have the same hash code; Equals is then invoked for those keys. If the hash code is unique, Equals is called only once; otherwise it is called for each key with the same hash code until it returns true.
I tried to find out how C# goes about comparing objects in a HashSet for equality.
I couldn't find anything here: http://msdn.microsoft.com/en-us/library/bb359438.aspx
Only when I came to Stack Overflow, I read that it uses Equals() and maybe GetHashCode()
I was planning to implement both methods anyways, but my question is:
What would you do to find out how HashSet actually compares objects?
It compares objects for equality using Equals. It determines which bucket to place them in using GetHashCode.
More generally, HashSet uses the IEqualityComparer<T> passed in to its constructor to do both. If none is specified, it uses EqualityComparer<T>.Default which calls the object's GetHashCode() and IEquatable<T>.Equals() method (or object.Equals() if the type doesn't implement IEquatable<T>).
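A quick way to observe this is to build the same set twice, once without a comparer and once with the built-in StringComparer.OrdinalIgnoreCase. The demo class name below is made up; the collection and comparer are standard library types.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetComparerDemo
{
    public static (bool defaultHit, bool ignoreCaseHit) Run()
    {
        // No comparer given: EqualityComparer<string>.Default, which is case-sensitive.
        var defaultSet = new HashSet<string> { "alpha" };

        // Comparer given: it supplies both GetHashCode and Equals for the set.
        var ignoreCaseSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "alpha" };

        return (defaultSet.Contains("ALPHA"), ignoreCaseSet.Contains("ALPHA"));
    }
}
```

Only the second set finds "ALPHA", since its comparer hashes and compares strings without regard to case.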
Suppose I have a class T that I want to use as a key in a Dictionary<T,U> collection.
What must I implement in T so that these keys are based on values of T rather than T references?
I'm hoping it's just GetHashCode().
You must implement GetHashCode() and Equals().
Dictionary is a hash table under the covers, so you might want to read this: Pitfalls Of Equals/GetHashCode – How Does A Hash Table Work?
If you don't pass any IEqualityComparer<T> in the dictionary constructor, it will use EqualityComparer<T>.Default which is defined by MSDN as :
The Default property checks whether
type T implements the System.IEquatable(Of T)
interface and, if so, returns an
EqualityComparer(Of T) that
uses that implementation. Otherwise,
it returns an EqualityComparer(Of T) that uses the overrides of
Object.Equals and Object.GetHashCode provided by T.
So implementing IEquatable<T> would be my choice (if you implement it it also makes sense to override Equals and GetHashCode anyway).
Either implement Equals and GetHashCode or create an appropriate IEqualityComparer<T> which has the right form of equality matching for your map.
I rather like the IEqualityComparer<T> route: in many cases there isn't one obviously-right form of equality - you want to treat objects as equal in different ways depending on the situation. In that case, a custom equality comparer is just what you need. Of course, if there is a natural equality operation, it makes sense to implement IEquatable<T> in the type itself... if you can. (Another benefit of IEqualityComparer<T> is that you can implement it for types you have no control over.)
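As an example of the "types you have no control over" case, here is a sketch of a comparer for int[] keys. Arrays use reference equality and cannot be changed, so the rule has to live elsewhere. IntArrayComparer is a hypothetical name and the hash combination is just one simple choice.

```csharp
using System.Collections.Generic;
using System.Linq;

// Treats two int arrays as equal when they hold the same elements in order.
public sealed class IntArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SequenceEqual(y);
    }

    // Combines the elements so that equal sequences give equal hash codes.
    public int GetHashCode(int[] values)
    {
        if (values is null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (int value in values)
            {
                hash = hash * 31 + value;
            }
            return hash;
        }
    }
}
```

It would be passed to the map at construction time, as in new Dictionary<int[], string>(new IntArrayComparer()). One caveat with this kind of key: mutating an array after it has been added changes its hash code, and the entry can no longer be found.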
You need to override Equals(object obj). It is always expected that you implement GetHashCode when you modify Equals. Read this article at MSDN.