Value vs. Reference equality in generic List<T>.Contains() - c#

Attempt #3 to simplify this question:
A generic List<T> can contain any type - value or reference. When checking to see if a list contains an object, .Contains() uses the default EqualityComparer<T> for type T, and calls .Equals() (is my understanding). If no EqualityComparer has been defined, the default comparer will call .Equals(). By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
Until you need to override .Equals() to implement value equality, at which point the default comparer says two objects are the same if they have the same values. I can't think of a single case where that would be desirable for a reference type.
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Questions:
Am I understanding that correctly?
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
In general, how does one separate value equality from reference equality when overriding .Equals()?
The original line of code spurring this question:
// For each ID, a collection of matching rows
Dictionary<string, List<StagingDataRow>> stagingTableDictionary;
StagingTableMatches.AddRange(stagingTableDictionary[perNr].Where(row => !StagingTableMatches.Contains(row)));

Ok, let's handle a few misconceptions first:
By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
This is true, but only for reference types. Value types inherit ValueType.Equals, which compares field by field and can fall back to slow reflection, so it's in your best interest to override it.
I can't think of a single case where that would be desirable for a reference type.
Oh I'm sure you can... String is a reference type for instance :)
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Err... No.
IEqualityComparer<T> is an interface which lets you delegate equality comparison to a different object. If you want a different default behavior for your class, you implement IEquatable<T>, and also delegate object.Equals to that for consistency. Actually, overriding object.Equals and object.GetHashCode is sufficient to change the default equality comparison behavior, but also implementing IEquatable<T> has additional benefits:
It makes it more obvious that your type has custom equality comparison logic - think self documenting code.
It improves performance for value types, since it avoids unnecessary boxing (which happens with object.Equals)
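To make the IEquatable<T> route concrete, here is a minimal sketch. StagingRow and its two properties are hypothetical stand-ins, not the asker's real typed DataRow, and HashCode.Combine assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in type, for illustration only.
public sealed class StagingRow : IEquatable<StagingRow>
{
    public string PerNr { get; }
    public string Name { get; }

    public StagingRow(string perNr, string name)
    {
        PerNr = perNr;
        Name = name;
    }

    // Strongly typed equality: no boxing and no casting.
    public bool Equals(StagingRow other)
    {
        if (other is null) return false;
        if (ReferenceEquals(this, other)) return true;
        return PerNr == other.PerNr && Name == other.Name;
    }

    // Delegate the untyped overload to the typed one for consistency.
    public override bool Equals(object obj) => Equals(obj as StagingRow);

    // Hash the same fields that Equals compares.
    public override int GetHashCode() => HashCode.Combine(PerNr, Name);
}

public static class EquatableDemo
{
    public static bool ListFindsEqualValue()
    {
        var rows = new List<StagingRow> { new StagingRow("42", "Ada") };
        // A different instance holding the same values.
        return rows.Contains(new StagingRow("42", "Ada")); // true
    }
}
```

With this in place, List<T>.Contains finds a row by value without any comparer being passed in.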
So, for your actual questions:
Am I understanding that correctly?
You still seem a bit confused about this, but don't worry :)
Enigmativity actually suggested that you create a different type which implements IEqualityComparer<T>. Looks like you misunderstood that part.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()
By default, the (properly written) framework data structures will delegate equality comparison to EqualityComparer<StagingDataRow>.Default, which will in turn delegate to StagingDataRow.Equals.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()
Not necessarily. It should be self-consistent: if myEqualityComparer.Equals(a, b) then you must ensure that myEqualityComparer.GetHashCode(a) == myEqualityComparer.GetHashCode(b).
It can be the same implementation as StagingDataRow.GetHashCode, but it doesn't have to be.
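As a rough illustration of a comparer that lives in its own class, the sketch below compares rows by a single key, ignoring case. Row, PerNrComparer and ComparerDemo are made-up names, and the choice to compare only PerNr is an assumption for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type that keeps the default reference equality.
public sealed class Row
{
    public string PerNr { get; set; }
    public string Comment { get; set; }
}

// A separate object that decides equality for Row: by PerNr only, ignoring case.
public sealed class PerNrComparer : IEqualityComparer<Row>
{
    public bool Equals(Row x, Row y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.PerNr, y.PerNr, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: rows that are equal must produce the same hash code.
    public int GetHashCode(Row obj) =>
        obj?.PerNr is null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.PerNr);
}

public static class ComparerDemo
{
    public static (bool byDefault, bool byComparer) Run()
    {
        var rows = new List<Row> { new Row { PerNr = "a1", Comment = "first" } };
        var probe = new Row { PerNr = "A1", Comment = "second" };

        bool byDefault = rows.Contains(probe);                       // List<T>.Contains: false
        bool byComparer = rows.Contains(probe, new PerNrComparer()); // Enumerable.Contains: true
        return (byDefault, byComparer);
    }
}
```

The comparer's GetHashCode here is deliberately different from Row.GetHashCode (which is still the default); all that matters is that it agrees with the comparer's own Equals.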
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
Well, by now I hope you've understood that the object which implements IEqualityComparer<T> is a different object, so this should make sense.
Please read my answer on Using of IEqualityComparer interface and EqualityComparer class in C# for more in-depth information.

Am I understanding that correctly?
Partially - the "default" IEqualityComparer will use either (in order):
The implementation of IEquatable<T>
An overridden Equals(object)
The base object.Equals(object), which is reference equality for reference types.
I think you are confusing two different methods of defining "equality" in a custom type. One is implementing IEquatable<T>, which allows an instance of a type to determine if it's "equal" to another instance of the same type.
The other is IEqualityComparer<T>, which is an independent interface that determines if two instances of that type are equal.
So if your definition of Equals should apply whenever you are comparing two instances, then implement IEquatable, as well as overriding Equals (which is usually trivial after implementing IEquatable) and GetHashCode.
If your definition of "equal" only applies in a particular use case, then create a different class that implements IEqualityComparer<T>, then pass an instance of it to whatever class or method you want that definition to apply to.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
No - only types and methods that accept an instance of IEqualityComparer as a parameter will use it.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
It will compute the hash code for the object that's passed in. It doesn't "compare" the hash code to anything. It does not necessarily have to return the same value as the overridden GetHashCode, but it must follow the rules for GetHashCode, particularly that two "equal" objects must return the same hash code.
It would be strange to have an instance method accept itself as a parameter...
Which is why IEqualityComparer is generally implemented on a different class. Note that IEquatable<T> doesn't have a GetHashCode() method, because it doesn't need one. It assumes that GetHashCode is overridden to match the override of object.Equals, which should match the strongly-typed implementation of IEquatable<T>.
Bottom Line
If you want your definition of "equal" to be the default for that type, implement IEquatable<T> and override Equals and GetHashCode. If you want a definition of "equal" that is just for a specific use case, then create a different class that implements IEqualityComparer<T> and pass an instance of it to whatever types or methods need to use that definition.
Also, I would note that you very rarely call these methods directly (except Equals). They are usually called by the methods that use them (like Contains) to determine if two objects are "equal" or to get the hash code for an item.

Related

What is the default equality comparer for a set type?

In the MSDN API for the HashSet constructor with no arguments it states
Initializes a new instance of the HashSet<T> class that is empty and uses the default equality comparer for the set type.
What is the default equality comparer for the set type, e.g. for a custom class?
BTW: Is it just me, or is the MSDN API documentation really a bit thin on explanations? I stumble over such questions more than once when reading it.
It means it will use the comparer returned by EqualityComparer<T>.Default for the element type T of the set.
As the documentation states:
The Default property checks whether type T implements the System.IEquatable<T> interface and, if so, returns an EqualityComparer<T> that uses that implementation. Otherwise, it returns an EqualityComparer<T> that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
So for your custom type, it will use the GetHashCode method you have defined to locate items in the set. If you have implemented IEquatable<T> it will use IEquatable<T>.Equals(T) for equality, otherwise it will use your Equals(object) method, which defaults to reference equality as defined in the object class. Therefore, if you define equality using either method, you should also override GetHashCode.
By default, it will delegate to EqualityComparer<T>.Default. This returns a comparer that can compare two objects of type T.
For a custom class, this does a few things in this order:
if the class implements IEquatable<T>, it will delegate to the class's implementation of this interface
if the class has an Equals method defined, it will use that
as a last resort, it will use reference equality
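The three cases in that list can be observed directly through EqualityComparer<T>.Default. This is only a sketch; the three types are hypothetical and each call compares two distinct instances holding the same value.

```csharp
using System;
using System.Collections.Generic;

// All three types are hypothetical and exist only for this illustration.
public sealed class WithEquatable : IEquatable<WithEquatable>
{
    public int Value;
    public bool Equals(WithEquatable other) => other != null && Value == other.Value;
    public override bool Equals(object obj) => Equals(obj as WithEquatable);
    public override int GetHashCode() => Value;
}

public sealed class WithOverride
{
    public int Value;
    public override bool Equals(object obj) => obj is WithOverride o && Value == o.Value;
    public override int GetHashCode() => Value;
}

public sealed class Plain
{
    public int Value;
}

public static class DefaultComparerDemo
{
    // Asks the default comparer for T whether a and b are equal.
    public static bool EqualByDefault<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);

    public static bool[] Run() => new[]
    {
        EqualByDefault(new WithEquatable { Value = 1 }, new WithEquatable { Value = 1 }), // true: IEquatable<T>.Equals
        EqualByDefault(new WithOverride { Value = 1 }, new WithOverride { Value = 1 }),   // true: overridden Equals(object)
        EqualByDefault(new Plain { Value = 1 }, new Plain { Value = 1 }),                 // false: reference equality
    };
}
```

A HashSet<T> created with the parameterless constructor makes the same decisions, since it uses the same default comparer.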

A questionable insight into overriding Equals

Following Guidelines for Overriding Equals() and Operator == (C# Programming Guide), it seems advisable to override GetHashCode when overriding Equals(object), as well as Equals(type).
It is my understanding that there is an endless discussion about the best implementation for overriding Equals. However, I'd still like to understand the Equals concept a little better and decide for myself.
My questions will probably be kinda noobish, but here we go:
What is the main difference between Equals(object) and Equals(type) (independently of the given parameters)?
As far as I understand (And I could be completely wrong, so this is a question at the same time):
Equals(object) is a built-in method that checks (by default) whether the object references are the same. And Equals(Type) is a local method you create. So in fact, what you have in that class is the method Equals with 2 overloads.
Why do they check for property equality twice?
In Equals(object):
return base.Equals(obj) && z == p.z;
and in Equals(type):
return base.Equals((TwoDPoint)p) && z == p.z;
Why is it advisable to implement the Equals(type) method?
Most of my questions are wrapped up in my statement in question 1, so please point out any wrong or misleading arguments. Also, feel free to add any information; it will certainly help.
Thanks in advance
First, let's distinguish the 2 methods.
object.Equals() is a method on the root object which is marked as virtual and can therefore be overridden in a derived class.
IEquatable<T>.Equals is a method obtained by implementing the IEquatable<T> interface.
The latter is used for determining equality inside a generic collection, as the documentation says:
The IEquatable<T> interface is used by generic collection objects such as Dictionary<TKey, TValue>, List<T>, and LinkedList<T> when testing for equality in such methods as Contains, IndexOf, LastIndexOf, and Remove. It should be implemented for any object that might be stored in a generic collection.
The former is used for determining equality everywhere else.
So with the groundwork in place, let's try to answer some of your specific questions.
What is the main difference between Equals(object) and Equals(type) (independently of the given parameters)?
One operates on any type, the other compares instances of the same type
Why do they check for property equality twice?
They don't; generally only one is used. However, quite often one implementation calls the other internally.
Why is it advisable to implement the Equals(type) method?
The answer is above - if you intend to store the object in a generic collection
As a side note, and one which may help you understand this, the default behaviour of equality checking is to check that the references are the same (ie, that one object is exactly the same instance as another). Quite often overriding/implementing different equality logic is used to compare some data within fields of the object (akin to your example of z == p.z)
One difference between the overloads is that, as noted, one will be invoked when comparing an object to things which are known at compile time to be of the same type, while the other will be invoked in all other circumstances. Another very important difference which has not been mentioned is that Equals(ownType) will act not only on things of ownType, but also on things that are implicitly convertible to ownType. Because of this, Equals cannot be expected to implement an equivalence relation among objects of convertible types unless one forces its operands to be of type Object. Consider, for example,
(12.0).Equals(12);
This converts the integer value 12 to the Double value 12.0. Since the type and value of the passed value precisely match the 12.0 whose Equals method is being invoked, the call returns true.
(12).Equals(12.0);
Because Double is not implicitly convertible to Int32, this call passes the Double value as Object instead. Since the Double does not match the type of the 12 whose Equals method is being invoked, the method returns false.
The virtual method Equals(Object) implements an equivalence relation; in many cases involving implicit type conversions, the type-specific methods cannot be expected to do so.
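The asymmetry described above is easy to reproduce. The sketch below wraps the two calls, plus a third one that forces the receiver to Object, in a hypothetical helper class named OverloadDemo.

```csharp
using System;

public static class OverloadDemo
{
    public static (bool doubleFirst, bool intFirst, bool boxedDoubleFirst) Run()
    {
        // Binds to Double.Equals(Double): the int argument is implicitly converted to 12.0.
        bool doubleFirst = (12.0).Equals(12);      // true

        // Binds to Int32.Equals(Object): there is no implicit double-to-int conversion,
        // so the double is boxed and the type check inside Equals fails.
        bool intFirst = (12).Equals(12.0);         // false

        // Forcing the receiver to Object dispatches to Double.Equals(Object),
        // which rejects the boxed int, so the result is symmetric again.
        object boxed = 12.0;
        bool boxedDoubleFirst = boxed.Equals(12);  // false

        return (doubleFirst, intFirst, boxedDoubleFirst);
    }
}
```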

What's compared in my Class without an EqualityComparer?

I want to check if an object is in a Queue before I enqueue it. If I don't explicitly define an EqualityComparer, what does the Contains() function compare?
If it compares property values, that's perfect. If it compares to see if a reference to that object exists in the Queue then that defeats what I'm trying to accomplish in my code.
For classes, the default equality operation is by reference - it assumes that object identity and equality are the same, basically.
You can overcome this by overriding Equals and GetHashCode. I'd also suggest implementing IEquatable<T> to make this clear. Your hash code implementation should generate the hash code from the same values as the equality operation.
The default for reference types is to compare the reference.
However, if the type implements IEquatable<> it can be doing a different comparison. If you need to have a specific equality comparison in place, you need to create one yourself.

What must be done to use the value of a reference type as a dictionary key?

Suppose I have a class T that I want to use as a key in a Dictionary<T,U> collection.
What must I implement in T so that these keys are based on values of T rather than T references?
I'm hoping it's just GetHashCode().
You must implement GetHashCode() and Equals().
Dictionary is a hash table under the covers, so you might want to read this: Pitfalls Of Equals/GetHashCode – How Does A Hash Table Work?
If you don't pass any IEqualityComparer<T> in the dictionary constructor, it will use EqualityComparer<T>.Default, which is defined by MSDN as:
The Default property checks whether type T implements the System.IEquatable(Of T) interface and, if so, returns an EqualityComparer(Of T) that uses that implementation. Otherwise, it returns an EqualityComparer(Of T) that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
So implementing IEquatable<T> would be my choice (if you implement it it also makes sense to override Equals and GetHashCode anyway).
Either implement Equals and GetHashCode or create an appropriate IEqualityComparer<T> which has the right form of equality matching for your map.
I rather like the IEqualityComparer<T> route: in many cases there isn't one obviously-right form of equality - you want to treat objects as equal in different ways depending on the situation. In that case, a custom equality comparer is just what you need. Of course, if there is a natural equality operation, it makes sense to implement IEquatable<T> in the type itself... if you can. (Another benefit of IEqualityComparer<T> is that you can implement it for types you have no control over.)
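As an example of a comparer for a type you have no control over, here is a sketch that lets int[] arrays act as dictionary keys by content. IntArrayComparer and DictionaryKeyDemo are hypothetical names, and the HashCode struct assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares int[] keys by content. Arrays cannot be modified to override Equals,
// so an external comparer is the way to change how they are matched.
public sealed class IntArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SequenceEqual(y);
    }

    // Combines the elements so that arrays with equal content hash identically.
    public int GetHashCode(int[] obj)
    {
        if (obj is null) return 0;
        var hash = new HashCode();
        foreach (int item in obj) hash.Add(item);
        return hash.ToHashCode();
    }
}

public static class DictionaryKeyDemo
{
    public static (bool byReference, bool byContent) Run()
    {
        var byRef = new Dictionary<int[], string> { [new[] { 1, 2 }] = "x" };
        var byValue = new Dictionary<int[], string>(new IntArrayComparer()) { [new[] { 1, 2 }] = "x" };

        // A fresh array with the same content is only found when the comparer is supplied.
        return (byRef.ContainsKey(new[] { 1, 2 }), byValue.ContainsKey(new[] { 1, 2 })); // (false, true)
    }
}
```

The same dictionary type gets two different notions of key equality, depending only on what is passed to the constructor.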
You need to override Equals(object obj). It is always expected that you implement GetHashCode when you modify Equals. Read this article at MSDN.

Quick question about a reference type key in a generic dictionary in .Net

I have a mutable class that I'm using as a key to a generic dictionary. Two keys should be equal only if their references are equal.
From what I've read, in this case, I don't need to override Equals or GetHashCode, or implement IEqualityComparer.
Is this correct?
Yes. The default comparison operation in System.Object uses reference equality. If this behavior is what you want, the defaults should work fine.
Yes, this is correct. As long as you don't override, reference is the default comparison.
I'll add on to what everyone else has said here (yes) but with one more point that no one seems to have mentioned here.
When using generic collections (Dictionary, List, etc.) you can implement IEquatable<T> to provide a type-specific version that can do your comparison without boxing or up/down casting. These generic collections will use this overload when present to do comparisons, and it can be a bit more efficient.
As noted in the docs, when implementing IEquatable<T> you still need to override Equals and GetHashCode from Object.
As everyone else pointed out already, yes, you are correct. In fact, you definitely do not want to override the equality members if your type is mutable (it has setters). But, if you want to have equality checking which uses values in your type, you can make your type immutable (like String) by ensuring that there are no setters (only the constructor sets values). Or use a struct.
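To see why value equality on a mutable type is risky, consider this small sketch. MutableKey is a hypothetical type whose hash code depends on a settable property.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable type with value-based equality.
public sealed class MutableKey
{
    public int Value { get; set; }
    public override bool Equals(object obj) => obj is MutableKey k && Value == k.Value;
    public override int GetHashCode() => Value;
}

public static class MutableKeyDemo
{
    public static (bool before, bool after) Run()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        bool before = set.Contains(key); // true
        key.Value = 2;                   // the hash code changes while the key is stored
        bool after = set.Contains(key);  // false: the set looks it up under the new hash

        return (before, after);
    }
}
```

The object is still in the set, but it can no longer be found, which is the kind of bug that immutable keys avoid.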
For anybody using .NET 5 or later, it comes with a ReferenceEqualityComparer class that you can pass to the dictionary's constructor. This means you don't need to worry about someone overriding GetHashCode and Equals in the future.
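A minimal sketch of that approach, assuming .NET 5 or later. Tag is a hypothetical key type whose author has added value equality, which the second dictionary deliberately ignores.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that overrides Equals and GetHashCode with value semantics.
public sealed class Tag
{
    public string Name { get; set; }
    public override bool Equals(object obj) => obj is Tag t && Name == t.Name;
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

public static class ReferenceKeyDemo
{
    public static (int valueCount, int referenceCount) Run()
    {
        var a = new Tag { Name = "x" };
        var b = new Tag { Name = "x" };

        // Default comparer: a and b are equal by value, so they share one entry.
        var byValue = new Dictionary<Tag, int>();
        byValue[a] = 1;
        byValue[b] = 2;

        // ReferenceEqualityComparer ignores the overrides: a and b are separate keys.
        var byReference = new Dictionary<Tag, int>(ReferenceEqualityComparer.Instance);
        byReference[a] = 1;
        byReference[b] = 2;

        return (byValue.Count, byReference.Count); // (1, 2)
    }
}
```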
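Yes, you are correct: doing a == comparison (or .Equals) on two objects compares their references if no other overload is specified.
// Requires: using System; using System.Diagnostics;
String s = "a";
object test1 = (object)s;
object test2 = (object)s;
// Both variables refer to the same instance, so Equals returns true.
Debug.Assert(test1.Equals(test2));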
