I was reading the MSDN documentation about object.Equals. In the remarks section it says:
If the two objects do not represent the same object reference and
neither is null, it calls objA.Equals(objB) and returns the result.
This means that if objA overrides the Object.Equals(Object) method,
this override is called.
My question is: why did they not implement this part as objA.Equals(objB) && objB.Equals(objA), to make equality symmetric instead of relying on just one side of the relation? As it stands, it can result in strange behavior when calling object.Equals.
EDIT: Strange behavior can happen when the type of objA overrides Equals and implements it in some unpredictable way, but the type of objB does not override Equals.
Basically, this would only be of any use to developers with flawed Equals implementations. From the documentation:
The following statements must be true for all implementations of the Equals(Object) method. In the list, x, y, and z represent object references that are not null.
[...]
x.Equals(y) returns the same value as y.Equals(x).
[...]
So the check is redundant in every case where the method has been correctly implemented - causing a performance hit to every developer who has done the right thing.
It isn't even terribly useful to developers who haven't done the right thing, as they may still expect object.Equals(x, y) to return true when it returns false - they could debug and find that their method returns true, after all. You could say that it would be documented to check both ways round - but we've already established that the only developers this affects are ones who don't read the documentation anyway.
Basically, when you override a method or implement an interface, you should know what you're doing and obey the specified contract. If you don't do that, you will get odd behaviour, and I don't think it's reasonable to expect every caller to try to work around implementations which don't do what they're meant to.
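To make the asymmetry concrete, here is a minimal sketch. The Sloppy class is hypothetical and deliberately breaks the contract; it is not taken from the documentation.

```csharp
using System;

// Hypothetical type with a deliberately broken (asymmetric) Equals.
sealed class Sloppy
{
    // Claims equality with anything non-null, violating the symmetry rule.
    public override bool Equals(object obj) => obj != null;
    public override int GetHashCode() => 0;
}

static class Program
{
    static void Main()
    {
        object a = new Sloppy();
        object b = new object();   // does not override Equals

        // After the reference and null checks, object.Equals(x, y) only asks x.
        Console.WriteLine(object.Equals(a, b)); // True  (Sloppy.Equals is asked)
        Console.WriteLine(object.Equals(b, a)); // False (plain reference equality)
    }
}
```

Checking both directions would turn the first result into False, but only by papering over a type that already violates the documented contract.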
Attempt #3 to simplify this question:
A generic List<T> can contain any type - value or reference. When checking to see if a list contains an object, .Contains() uses the default EqualityComparer<T> for type T, and calls .Equals() (is my understanding). If no EqualityComparer has been defined, the default comparer will call .Equals(). By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
Until you need to override .Equals() to implement value equality, at which point the default comparer says two objects are the same if they have the same values. I can't think of a single case where that would be desirable for a reference type.
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Questions:
Am I understanding that correctly?
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
In general, how does one separate value equality from reference equality when overriding .Equals()?
The original line of code spurring this question:
// For each ID, a collection of matching rows
Dictionary<string, List<StagingDataRow>> stagingTableDictionary;
StagingTableMatches.AddRange(stagingTableDictionary[perNr].Where(row => !StagingTableMatches.Contains(row)));
Ok, let's handle a few misconceptions first:
By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
This is true, but only for reference types. Value types will implement a very slow reflection-based Equals function by default, so it's in your best interest to override that.
I can't think of a single case where that would be desirable for a reference type.
Oh I'm sure you can... String is a reference type for instance :)
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Err... No.
IEqualityComparer<T> is an interface which lets you delegate equality comparison to a different object. If you want a different default behavior for your class, you implement IEquatable<T>, and also delegate object.Equals to that for consistency. Actually, overriding object.Equals and object.GetHashCode is sufficient to change the default equality comparison behavior, but also implementing IEquatable<T> has additional benefits:
It makes it more obvious that your type has custom equality comparison logic - think self-documenting code.
It improves performance for value types, since it avoids unnecessary boxing (which happens with object.Equals)
So, for your actual questions:
Am I understanding that correctly?
You still seem a bit confused about this, but don't worry :)
Enigmativity actually suggested that you create a different type which implements IEqualityComparer<T>. Looks like you misunderstood that part.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()
By default, the (properly written) framework data structures will delegate equality comparison to EqualityComparer<StagingDataRow>.Default, which will in turn delegate to StagingDataRow.Equals.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()
Not necessarily. It should be self-consistent: if myEqualityComparer.Equals(a, b) then you must ensure that myEqualityComparer.GetHashCode(a) == myEqualityComparer.GetHashCode(b).
It can be the same implementation as StagingDataRow.GetHashCode, but it doesn't have to be.
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
Well, by now I hope you've understood that the object which implements IEqualityComparer<T> is a different object, so this should make sense.
Please read my answer on Using of IEqualityComparer interface and EqualityComparer class in C# for more in-depth information.
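As a rough illustration of the "different object" idea, here is a sketch. Row is a hypothetical stand-in for StagingDataRow, and RowByIdComparer and the Id property are invented names, not anything from your code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for StagingDataRow; it keeps default reference equality.
sealed class Row
{
    public string Id { get; }
    public string Name { get; }
    public Row(string id, string name) { Id = id; Name = name; }
}

// A separate object that knows how to compare two rows by Id.
sealed class RowByIdComparer : IEqualityComparer<Row>
{
    public bool Equals(Row x, Row y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Id, y.Id, StringComparison.Ordinal);
    }

    // Must agree with Equals: rows with equal Ids get equal hash codes.
    public int GetHashCode(Row obj) =>
        obj?.Id == null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Id);
}

static class Program
{
    static void Main()
    {
        var rows = new List<Row> { new Row("42", "first") };
        var probe = new Row("42", "second");

        // List<T>.Contains uses the default comparer: reference equality here.
        Console.WriteLine(rows.Contains(probe));                        // False

        // The LINQ overload accepts the comparer explicitly.
        Console.WriteLine(rows.Contains(probe, new RowByIdComparer())); // True

        // Hash-based collections take the comparer in their constructor.
        var set = new HashSet<Row>(rows, new RowByIdComparer());
        Console.WriteLine(set.Add(probe));                              // False: already present by Id
    }
}
```

Note that GetHashCode receives whichever row the collection needs a hash for: the stored rows when they are added, the probe when it is looked up.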
Am I understanding that correctly?
Partially - the "default" IEqualityComparer will use either (in order):
The implementation of IEquatable<T>
An overridden Equals(object)
the base object.Equals(object), which is reference equality for reference types.
I think you are confusing two different methods of defining "equality" in a custom type. One is by implementing IEquatable<T>, which allows an instance of a type to determine if it's "equal" to another instance of the same type.
The other is IEqualityComparer<T>, which is an independent interface that determines if two instances of that type are equal.
So if your definition of Equals should apply whenever you are comparing two instances, then implement IEquatable, as well as overriding Equals (which is usually trivial after implementing IEquatable) and GetHashCode.
If your definition of "equal" only applies in a particular use case, then create a different class that implements IEqualityComparer<T>, then pass an instance of it to whatever class or method you want that definition to apply to.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
No - only types and methods that accept an instance of IEqualityComparer as a parameter will use it.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
It will compute the hash code for the object that's passed in. It doesn't "compare" the hash code to anything. It does not necessarily have to return the same value as the overridden GetHashCode, but it must follow the rules for GetHashCode, particularly that two "equal" objects must return the same hash code.
It would be strange to have an instance method accept itself as a parameter...
Which is why IEqualityComparer is generally implemented on a different class. Note that IEquatable<T> doesn't have a GetHashCode() method, because it doesn't need one. It assumes that GetHashCode is overridden to match the override of object.Equals, which should match the strongly-typed implementation of IEquatable<T>.
Bottom Line
If you want your definition of "equal" to be the default for that type, implement IEquatable<T> and override Equals and GetHashCode. If you want a definition of "equal" that is just for a specific use case, then create a different class that implements IEqualityComparer<T> and pass an instance of it to whatever types or methods need to use that definition.
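For the "default for that type" case, a minimal sketch might look like the following. Customer and its Id property are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose default notion of equality is "same Id".
sealed class Customer : IEquatable<Customer>
{
    public int Id { get; }
    public Customer(int id) { Id = id; }

    // Strongly typed check, picked up by EqualityComparer<Customer>.Default.
    public bool Equals(Customer other) => other != null && Id == other.Id;

    // Untyped overload delegates to the typed one so both always agree.
    public override bool Equals(object obj) => Equals(obj as Customer);

    // Derived from the same member that Equals compares.
    public override int GetHashCode() => Id.GetHashCode();
}

static class Program
{
    static void Main()
    {
        var list = new List<Customer> { new Customer(7) };
        Console.WriteLine(list.Contains(new Customer(7)));           // True

        var names = new Dictionary<Customer, string> { [new Customer(7)] = "seven" };
        Console.WriteLine(names.ContainsKey(new Customer(7)));       // True
    }
}
```

No comparer is passed anywhere here; List<T> and Dictionary<TKey, TValue> find the type's own definition through the default comparer.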
Also, I would note that you very rarely call these methods directly (except Equals). They are usually called by the methods that use them (like Contains) to determine if two objects are "equal" or to get the hash code for an item.
Following Guidelines for Overriding Equals() and Operator == (C# Programming Guide), it seems advisable to override GetHashCode when overriding Equals(object), as well as Equals(type).
It is in my understanding that there is an endless discussion about what's the best implementation for overriding Equals. However, I still like to understand the Equals concept a little better and decide for my own.
My questions will probably be kinda noobish, but here we go:
What is the main difference between Equals(object) and Equals(type) (independently of the given parameters)?
As far as I understand (And I could be completely wrong, so this is a question at the same time):
Equals(object) is a built-in method that checks (by default) whether object
references are the same. Equals(Type) is a local method you
create. So in fact, what you have in that class is the method Equals
with 2 overloads.
Why do they check for property equality twice?
In equals(object) :
return base.Equals(obj) && z == p.z;
and in equals(type) :
return base.Equals((TwoDPoint)p) && z == p.z;
Why is it advisable to implement the Equals(type) method?
Most of my questions are wrapped up in my statement in question 1, so please point out any wrong or misleading arguments. Also, feel free to add any information; it will certainly help.
Thanks in advance
First, let's distinguish the 2 methods.
object.Equals() is a method on the root object which is marked as virtual and therefore can be overridden in a derived class.
IEquatable<T>.Equals is a method obtained by implementing the IEquatable<T> interface.
The latter is used for determining equality inside a generic collection; as the documentation says:
The IEquatable<T> interface is used by generic collection objects such as Dictionary<TKey, TValue>, List<T>, and LinkedList<T> when testing for equality in such methods as Contains, IndexOf, LastIndexOf, and Remove. It should be implemented for any object that might be stored in a generic collection.
The former is used for determining equality everywhere else.
So, with the groundwork in place, let's try to answer some of your specific questions.
What is the main difference between Equals(object) and Equals(type) (independently of the given parameters)?
One operates on any type, the other compares instances of the same type
Why do they check for property equality twice?
They don't; generally only one is used. However, quite often one implementation calls the other internally.
Why is it advisable to implement the Equals(type) method?
The answer is above - if you intend to store the object in a generic collection
As a side note, and one which may help you understand this, the default behaviour of equality checking is to check that the references are the same (i.e., that one object is exactly the same instance as another). Quite often, overriding or implementing different equality logic is used to compare some data within fields of the object (akin to your example of z == p.z).
One difference between the overloads is that, as noted, one will be invoked when comparing an object to things which are known at compile time to be of the same type, while the other will be invoked in all other circumstances. Another very important difference which has not been mentioned is that Equals(ownType) will act not only on things of ownType, but also on things that are implicitly convertible to ownType. Because of this, Equals cannot be expected to implement an equivalence relation among objects of convertible types unless one forces its operands to be of type Object. Consider, for example,
(12.0).Equals(12);
This converts the integer value 12 to the Double value 12.0. Since the type and value of the passed value precisely match the 12.0 whose Equals method is being invoked, the method returns true.
(12).Equals(12.0);
Because Double is not implicitly convertible to Int32, this passes the Double value as Object instead. Since the Double does not match the type of the 12 whose Equals method is being invoked, the method returns false.
The virtual method Equals(Object) implements an equivalence relation; in many cases involving implicit type conversions, the type-specific methods cannot be expected to do so.
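The two calls above, plus the "force the operand to Object" variant, can be tried in a small program such as this sketch:

```csharp
using System;

static class Program
{
    static void Main()
    {
        // Binds to Double.Equals(double): the int 12 is implicitly widened to 12.0.
        Console.WriteLine((12.0).Equals(12));          // True

        // No implicit double -> int conversion, so this binds to Int32.Equals(object).
        Console.WriteLine((12).Equals(12.0));          // False

        // Treating the receiver as object selects the virtual Equals(object),
        // which also checks the runtime type, so both directions agree again.
        Console.WriteLine(((object)12.0).Equals(12));  // False
    }
}
```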
I just want to confirm my understanding of a few fundamentals. Hope you don't mind!
I understand the static equals method
Object.Equals(objA, objB)
first checks for reference equality. If not equal by reference, then calls the object instance equals method
objA.Equals(objB)
Currently in my override for Equals, I first check for reference equality, and if not equal referentially then check all members to see if the semantics are the same. Is this a good approach? If so, then the static version seems superfluous?
Also what exactly does the default GetHashCode for an object do?
If I add my object to a dictionary (which is a hash table underneath) and don't override Equals and GetHashCode, then I guess I should do so to make it sort optimally, hence better retrieval time?
Currently in my override for Equals, I first check for reference
equality, and if not equal referentially then check all members
to see if the semantics are the same. Is this a good approach? If so,
then the static version seems superfluous?
Yes, it's a great idea to do the fast reference-equality check. There's no guarantee that your method will be called through the static Object.Equals method - it could well be called directly. For example, EqualityComparer<T>.Default (the typical middleman for equality checking) will directly call this method in many situations (when the type does not implement IEquatable<T>) without first doing a reference-equality check.
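A sketch of that shape of override is below. Track, Title and Seconds are hypothetical names, and the hash combination is just one common pattern rather than a recommendation.

```csharp
using System;

// Hypothetical type with two members that define its value.
sealed class Track
{
    public string Title { get; }
    public int Seconds { get; }
    public Track(string title, int seconds) { Title = title; Seconds = seconds; }

    public override bool Equals(object obj)
    {
        // Fast path: an object is always equal to itself.
        if (ReferenceEquals(this, obj)) return true;

        // Null or a different type can never be equal.
        var other = obj as Track;
        if (other is null) return false;

        // Slow path: member-by-member comparison.
        return Seconds == other.Seconds
            && string.Equals(Title, other.Title, StringComparison.Ordinal);
    }

    // Built from the same members that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Title == null ? 0 : StringComparer.Ordinal.GetHashCode(Title);
            return (hash * 397) ^ Seconds;
        }
    }
}

static class Program
{
    static void Main()
    {
        var a = new Track("Intro", 90);
        var b = new Track("Intro", 90);

        Console.WriteLine(a.Equals(a));     // True, via the fast path
        Console.WriteLine(a.Equals(b));     // True, via the member comparison
        Console.WriteLine(a.Equals(null));  // False
    }
}
```

The calls in Main go straight to the instance method, so the reference check inside Equals is the only one that runs.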
Also what exactly does the default GetHashCode for an object do?
It forwards to RuntimeHelpers.GetHashCode: a magic, internally-implemented CLR method that is a compliant GetHashCode implementation for reference-equality. For more information, see Default implementation for Object.GetHashCode(). You should definitely override it whenever you override Equals.
EDIT:
If I add my object to a dictionary (which is a hash table underneath) and
don't override Equals and GetHashCode, then I guess I should do so to
make it sort optimally, hence better retrieval time?
If you don't override either, you'll get reference-equality with (probably) a well-balanced table.
If you override one but not the other or implement them in any other non-compliant way, you'll get a broken hashtable.
By the way, hashing is quite different from sorting.
For more information, see Why is it important to override GetHashCode when Equals method is overriden in C#?
Your first question was already answered, but I think the second was not fully answered.
Implementing your own GetHashCode is important if you want to use your object as a key in a hash table or a dictionary. A good implementation minimizes collisions and therefore speeds up lookups. A collision happens when two or more keys have the same hash code; for those keys the Equals method is invoked to tell them apart. If the hash code is unique, Equals will only be called once; otherwise it will be called for every key with the same hash code until Equals returns true.
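One way to see the effect is to count Equals calls during a lookup. This is only a sketch: CountingKey is a hypothetical type, and the exact counts depend on the Dictionary implementation, so no specific numbers are claimed here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key that counts Equals calls and lets the caller pick the hash quality.
sealed class CountingKey
{
    public static int EqualsCalls;
    private readonly int _id;
    private readonly bool _constantHash;

    public CountingKey(int id, bool constantHash) { _id = id; _constantHash = constantHash; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as CountingKey;
        return other != null && other._id == _id;
    }

    // Either one distinct hash per id, or the same hash for every key.
    public override int GetHashCode() => _constantHash ? 0 : _id;
}

static class Program
{
    public static int CountLookupCalls(bool constantHash)
    {
        var map = new Dictionary<CountingKey, int>();
        for (int i = 0; i < 100; i++) map[new CountingKey(i, constantHash)] = i;

        CountingKey.EqualsCalls = 0;                        // ignore calls made while inserting
        map.ContainsKey(new CountingKey(0, constantHash));  // look up the first key inserted
        return CountingKey.EqualsCalls;
    }

    static void Main()
    {
        Console.WriteLine(CountLookupCalls(constantHash: false)); // distinct hash codes
        Console.WriteLine(CountLookupCalls(constantHash: true));  // every key collides
    }
}
```

The second number should come out noticeably larger than the first, since every key lands in the same bucket and has to be ruled out with Equals.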
I've got a complex class in my C# project on which I want to be able to do equality tests. It is not a trivial class; it contains a variety of scalar properties as well as references to other objects and collections (e.g. IDictionary). For what it's worth, my class is sealed.
To enable a performance optimization elsewhere in my system (an optimization that avoids a costly network round-trip), I need to be able to compare instances of these objects to each other for equality – other than the built-in reference equality – and so I'm overriding the Object.Equals() instance method. However, now that I've done that, Visual Studio 2008's Code Analysis a.k.a. FxCop, which I keep enabled by default, is raising the following warning:
warning : CA2218 : Microsoft.Usage : Since 'MySuperDuperClass'
redefines Equals, it should also redefine GetHashCode.
I think I understand the rationale for this warning: If I am going to be using such objects as the key in a collection, the hash code is important. i.e. see this question. However, I am not going to be using these objects as the key in a collection. Ever.
Feeling justified to suppress the warning, I looked up code CA2218 in the MSDN documentation to get the full name of the warning so I could apply a SuppressMessage attribute to my class as follows:
[SuppressMessage("Microsoft.Usage",
"CA2218:OverrideGetHashCodeOnOverridingEquals",
Justification="This class is not to be used as key in a hashtable.")]
However, while reading further, I noticed the following:
How to Fix Violations
To fix a violation of this rule,
provide an implementation of
GetHashCode. For a pair of objects of
the same type, you must ensure that
the implementation returns the same
value if your implementation of Equals
returns true for the pair.
When to Suppress Warnings
-----> Do not suppress a warning from this
rule. [arrow & emphasis mine]
So, I'd like to know: Why shouldn't I suppress this warning as I was planning to? Doesn't my case warrant suppression? I don't want to code up an implementation of GetHashCode() for this object that will never get called, since my object will never be the key in a collection. If I wanted to be pedantic, instead of suppressing, would it be more reasonable for me to override GetHashCode() with an implementation that throws a NotImplementedException?
Update: I just looked this subject up again in Bill Wagner's good book Effective C#, and he states in "Item 10: Understand the Pitfalls of GetHashCode()":
If you're defining a type that won't
ever be used as the key in a
container, this won't matter. Types
that represent window controls, web
page controls, or database connections
are unlikely to be used as keys in a
collection. In those cases, do
nothing. All reference types will
have a hash code that is correct, even
if it is very inefficient. [...] In
most types that you create, the best
approach is to avoid the existence of
GetHashCode() entirely.
... that's where I originally got this idea that I need not be concerned about GetHashCode() always.
If you are reallio-trulio absosmurfly positive that you'll never use the thing as a key to a hash table then your proposal is reasonable. Override GetHashCode; make it throw an exception.
Note that hash tables hide in unlikely places. Plenty of LINQ sequence operators use hash table implementations internally to speed things up. By rejecting the implementation of GetHashCode you are also rejecting being able to use your type in a variety of LINQ queries. I like to build algorithms that use memoization for speed increases; memoizers usually use hash tables. You are therefore also rejecting the ability to memoize method calls that take your type as a parameter.
Alternatively, if you don't want to be that harsh: Override GetHashCode; make it always return zero. That meets the semantic requirement of GetHashCode, namely that two equal objects always have the same hash code. If it is ever used as a key in a dictionary performance is going to be terrible, but you can deal with that problem when it arises, which you claim it never will.
All that said: come on. You've probably spent more time typing up the question than it would take to correctly implement it. Just do it.
You should not suppress it. Look at how your equals method is implemented. I'm sure it compares one or more members on the class to determine equality. One of these members is oftentimes enough to distinguish one object from another, and therefore you could implement GetHashCode by returning membername.GetHashCode();.
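Something along these lines would do, as a sketch. Session, Token and Started are hypothetical names, and the assumption is that Token alone distinguishes most instances.

```csharp
using System;
using System.Linq;

// Hypothetical class whose Equals compares two members.
sealed class Session
{
    public string Token { get; }
    public DateTime Started { get; }
    public Session(string token, DateTime started) { Token = token; Started = started; }

    public override bool Equals(object obj)
    {
        var other = obj as Session;
        return other != null && Token == other.Token && Started == other.Started;
    }

    // One member that Equals already compares is enough to satisfy the contract:
    // equal sessions have equal tokens, hence equal hash codes.
    public override int GetHashCode() => Token == null ? 0 : Token.GetHashCode();
}

static class Program
{
    static void Main()
    {
        var t = new DateTime(2020, 1, 1);
        var sessions = new[]
        {
            new Session("abc", t),
            new Session("abc", t),   // equal to the first one
            new Session("xyz", t),
        };

        // Distinct relies on GetHashCode and Equals behind the scenes.
        Console.WriteLine(sessions.Distinct().Count()); // 2
    }
}
```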
My $0.10 worth? Implement GetHashCode.
As much as you say you'll never, ever need it, you may change your mind, or someone else may have other ideas on how to use the code. A working GetHashCode isn't hard to make, and guarantees that there won't be any problems in the future.
As soon as you forget, or another developer who isn't aware uses this, someone is going to have a painful bug to track down. I'd recommend simply implementing GetHashCode correctly and then you won't have to worry about it. Or just don't use Equals for your special equality comparison case.
The GetHashCode and Equals methods work together to provide value-based equality semantics for your type - you ought to implement them together.
For more information on this topic please see these articles:
All types are not compared equally
All types are not compared equally (part 2)
Shameless plug: These articles were written by me.
This is not a question about how to implement it, but about the purpose of this method. I mean, OK, I understand that it is needed when searching, but why is it buried as a method of the "object" class?
The story goes: I have classes whose objects are not comparable by default (in a logical sense). Each time you want to compare or search for them, you have to specify exactly how matching is done. The best options in such a case would be:
(a) there is no such ubiquitous method as Equals; problem solved, no programmer (user of my class) would fall into the trap of omitting a custom match when searching
but I cannot change C#
(b) hide inherited, unwanted methods to prevent the call (at compile time)
but this would also require a change to C#
(c) override Equals and throw an exception, so at least the programmer is notified at runtime
So I am asking because I am forced into the ugly (c), because (b) is not possible and (a) is not available.
So in short: what is the reason for forcing all objects to be comparable (Equals)? For me it is one assumption too far. Thank you in advance for enlightenment :-).
I agree that it was basically a mistake, in both .NET and Java. The same is true for GetHashCode - along with every object having a monitor.
It made a bit more sense before generics, admittedly - but with generics, overriding Equals(object) always feels pretty horrible.
I blogged about this a while ago - you may find both the post and the comments interesting.
You forgot option 4.: Do nothing, let the default reference equality take place. No big deal IMO. Even with your custom match options, you could choose a default option (I'd go for the most strict option) and use it to implement Equals().
Suppose someone has a List of Animal, and one wishes to compare two items against each other: an instance of Cat and an instance of Dog. If the Cat instance is asked whether it is the same as the Dog instance, does it make more sense for the cat to throw an InvalidTypeException, or for it to simply say "No, it's not equal"?
An Equals method is supposed to obey two rules:
Reciprocity of equality: For any X and Y, X.Equals(Y) will be true if and only if Y.Equals(X) is true.
Liskov Substitution Principle: If class Q derives from P, an operation that can be done with a P may also be done with a Q.
These together imply that if Q derives from P, it must be possible for an object of type P to call Equals on an object of type Q, which in turn implies that it must be possible for an object of type Q to call Equals on an object of type P. Further, if R also derives from P, it must be possible for an object of type Q to call Equals on an object of type R (whether or not R is related to Q).
While it may not be strictly necessary for all objects to implement Equals, it's much cleaner for all classes to have a single Equals(Object) than to have a variety of Equals methods for different base types, all of which must be overridden with identical semantics to avoid weird behaviors.
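The Cat and Dog scenario might be sketched like this. The class shapes are hypothetical, with a name standing in for whatever state would really define equality.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hierarchy: one Equals(object) in the base serves every subclass.
abstract class Animal
{
    public string Name { get; }
    protected Animal(string name) { Name = name; }

    public override bool Equals(object obj)
    {
        // Same runtime type and same name; anything else is simply "not equal".
        return obj != null && obj.GetType() == GetType() && ((Animal)obj).Name == Name;
    }

    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();
}

sealed class Cat : Animal { public Cat(string name) : base(name) { } }
sealed class Dog : Animal { public Dog(string name) : base(name) { } }

static class Program
{
    static void Main()
    {
        var animals = new List<Animal> { new Cat("Rex"), new Dog("Rex") };

        // Neither direction throws; both just answer "no".
        Console.WriteLine(animals[0].Equals(animals[1])); // False
        Console.WriteLine(animals[1].Equals(animals[0])); // False

        // Searching a mixed list works without any special casing.
        Console.WriteLine(animals.Contains(new Cat("Rex"))); // True
    }
}
```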
Edit/Addendum
Object.Equals exists to answer the question: given two object references X and Y, can object X promise that outside code which doesn't use ReferenceEquals, Reflection, etc. would be unable to show that X and Y do not refer to the same object instance? For any object X, X.Equals(X) must be true since outside code cannot possibly show that X is not the same instance as X. Further, if X.Equals(Y) is "legitimately" true, Y.Equals(X) must also be true; if not, the fact that X.Equals(X) (which is true) doesn't match Y.Equals(X) would be a demonstrable difference implying that X.Equals(Y) should be false.
For shallowly-mutable types, if X and Y refer to different object instances, one could generally demonstrate this by mutating X and observing whether the same mutation occurred in Y(*). If such mutation could be used to demonstrate that X and Y are different object instances, then X.Equals(Y) should return false. The reason two String objects containing the same characters will report themselves equal to each other is not just that they happen to contain the same characters at the time of comparison, but more significantly that if all instances of one were replaced with the other, only code that used ReferenceEquals, Reflection, or other such tricks, would even notice.
(*) It would be possible to have two distinct but indistinguishable instances X and Y of a shallowly-mutable class which both held references to each other, such that methods that would mutate one would also mutate the other. If there would be no way for outside code to tell the instances apart, one might legitimately have X.Equals(Y) report true (and vice versa). On the other hand, I can't think of any way such a class would be more useful than one in which both X and Y held immutable references to a shared mutable object. Note that X.Equals(Y) does not require X and Y to be deep-immutable; it merely requires that any mutations applied to X will have identical effects on Y.