What is the purpose of making Equals a common method? - c#

This is not a question about how to implement it, but about the purpose of this method. I understand that it is needed when searching, but why is it buried as a method of the "object" class?
The story goes: I have classes whose objects are not comparable by default (in the logical sense). Each time you want to compare or search for them, you have to specify exactly how matching is done. The best options in such a case would be:
(a) there is no such ubiquitous method as Equals: problem solved, no programmer (user of my class) would fall into the trap of omitting a custom match when searching
but I cannot change C#
(b) hide inherited, unwanted methods to prevent the call (at compile time)
but this would also require a change to C#
(c) override Equals and throw an exception: at least the programmer is notified at run time
So I am asking because I am forced into the ugly (c), because (b) is not possible and because of the lack of (a).
So in short: what is the reason for forcing all objects to be comparable (Equals)? For me it is one assumption too far. Thank you in advance for enlightenment :-).

I agree that it was basically a mistake, in both .NET and Java. The same is true for GetHashCode - along with every object having a monitor.
It made a bit more sense before generics, admittedly - but with generics, overriding Equals(object) always feels pretty horrible.
I blogged about this a while ago - you may find both the post and the comments interesting.

You forgot option 4.: Do nothing, let the default reference equality take place. No big deal IMO. Even with your custom match options, you could choose a default option (I'd go for the most strict option) and use it to implement Equals().
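A minimal sketch of how option 4 can coexist with explicit matching rules: the class keeps the default reference equality, and each matching rule lives in its own IEqualityComparer<T> that callers pass in deliberately. The Track type, TrackTitleComparer and TrackDemo are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with no single "natural" notion of equality.
// Equals/GetHashCode are deliberately not overridden, so reference equality stays the default.
public sealed class Track
{
    public string Title { get; }
    public int Year { get; }

    public Track(string title, int year)
    {
        Title = title;
        Year = year;
    }
}

// One explicit matching strategy; other strategies would be separate comparers.
public sealed class TrackTitleComparer : IEqualityComparer<Track>
{
    public bool Equals(Track x, Track y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Title, y.Title, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(Track obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Title);
}

public static class TrackDemo
{
    // The caller has to name the matching rule; nothing is matched by accident.
    public static bool ContainsByTitle(IEnumerable<Track> tracks, Track probe) =>
        tracks.Contains(probe, new TrackTitleComparer());
}
```

A plain list.Contains(probe) on a List<Track> still compares references, so forgetting the comparer yields "not found" rather than a silently wrong match.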

Suppose someone has a List of Animal, and one wishes to compare two items against each other: an instance of Cat and an instance of Dog. If the Cat instance is asked whether it is the same as the Dog instance, does it make more sense for the cat to throw an InvalidTypeException, or for it to simply say "No, it's not equal"?
An Equals method is supposed to obey two rules:
Reciprocity of equality: For any X and Y, X.Equals(Y) will be true if and only if Y.Equals(X) is true.
Liskov Substitution Principle: If class Q derives from P, an operation that can be done with a P may also be done with a Q.
These together imply that if Q derives from P, it must be possible for an object of type P to call Equals on an object of type Q, which in turn implies that it must be possible for an object of type Q to call Equals on an object of type P. Further, if R also derives from P, it must be possible for an object of type Q to call Equals on an object of type R (whether or not R is related to Q).
While it may not be strictly necessary for all objects to implement Equals, it's much cleaner for all classes to have a single Equals(Object) than to have a variety of Equals methods for different base types, all of which must be overridden with identical semantics to avoid weird behaviors.
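As a rough illustration of the Cat and Dog scenario, the sketch below keeps a single Equals(object) in the base class and answers "not equal" for mismatched runtime types instead of throwing. The Animal, Cat and Dog classes and the Name property are assumptions made for the example, not code from the answer.

```csharp
using System;

public abstract class Animal
{
    public string Name { get; }

    protected Animal(string name)
    {
        Name = name;
    }

    public override bool Equals(object obj)
    {
        // A different runtime type is simply "not equal"; nothing is thrown.
        if (obj is null || obj.GetType() != GetType()) return false;
        return Name == ((Animal)obj).Name;
    }

    // Consistent with Equals: equal objects share a type and a name.
    public override int GetHashCode() =>
        GetType().GetHashCode() ^ (Name?.GetHashCode() ?? 0);
}

public sealed class Cat : Animal
{
    public Cat(string name) : base(name) { }
}

public sealed class Dog : Animal
{
    public Dog(string name) : base(name) { }
}
```

Because both directions go through the same exact-type check, cat.Equals(dog) and dog.Equals(cat) always agree, which is the reciprocity rule stated above.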
Edit/Addendum
Object.Equals exists to answer the question: given two object references X and Y, can object X promise that outside code which doesn't use ReferenceEquals, Reflection, etc. would be unable to show that X and Y do not refer to the same object instance? For any object X, X.Equals(X) must be true since outside code cannot possibly show that X is not the same instance as X. Further, if X.Equals(Y) is "legitimately" true, Y.Equals(X) must also be true; if not, the fact that X.Equals(X) (which is true) doesn't match Y.Equals(X) would be a demonstrable difference implying that X.Equals(Y) should be false.
For shallowly-mutable types, if X and Y refer to different object instances, one could generally demonstrate this by mutating X and observing whether the same mutation occurred in Y(*). If such mutation could be used to demonstrate that X and Y are different object instances, then X.Equals(Y) should return false. The reason two String objects containing the same characters will report themselves equal to each other is not just that they happen to contain the same characters at the time of comparison, but more significantly that if all instances of one were replaced with the other, only code that used ReferenceEquals, Reflection, or other such tricks, would even notice.
(*) It would be possible to have two distinct but indistinguishable instances X and Y of a shallowly-mutable class which both held references to each other, such that methods that would mutate one would also mutate the other. If there would be no way for outside code to tell the instances apart, one might legitimately have X.Equals(Y) report true (and vice versa). On the other hand, I can't think of any way such a class would be more useful than one in which both X and Y held immutable references to a shared mutable object. Note that X.Equals(Y) does not require X and Y to be deeply immutable; it merely requires that any mutations applied to X will have identical effects on Y.

Related

If you need no casting, and you need to check the most specific (runtime) type of an object, which is faster? .GetType() & typeof(), or is operator?

1) .GetType() returns the runtime type of the object on which it is called, which is the most specific type in the inheritance hierarchy. You can use typeof() to get a Type for a class name.
2) The is operator checks whether the type of the left-hand-side object is a subtype of, or the same type as, the type specified on the right-hand side.
3) Given that you only need to check the most specific type of an object, and you need no casting, is the former (1) considerably faster than the latter (2)?
4) Does the is operator actually perform casting and check for null, or has this behaviour been modified in a later version of C#?
typeof(x) is to get a Type-Object of a type Literal, like typeof(int). It's a runtime constant.
For object.GetType() you need an object instance.
if (x is IList)
The is operator attempts the cast but returns a bool: true on success, or false if x is null or of an incompatible type.
With
if (x is IList list)
You can do the boolean test and the casting at the same time.
It does not make sense to talk about performance, because these are completely different operations.
If you want to get the type of an object instance, object.GetType() is your only option, but you can test it against a type literal like
x.GetType() == typeof(List)
While you can be sure
x.GetType() == typeof(IList)
will always be false, since GetType will never return the type of an interface.
For this test you would need
typeof(IList).IsAssignableFrom(x.GetType());
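The differences between these checks can be sketched side by side. The TypeChecks class and its method names are made up for illustration, and List<int> stands in as an arbitrary concrete type.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeChecks
{
    // Exact runtime type match only; subclasses of List<int> would not match.
    public static bool IsExactlyListOfInt(object x) =>
        x != null && x.GetType() == typeof(List<int>);

    // True for any non-null object whose type implements IList.
    public static bool ImplementsIList(object x) => x is IList;

    // Reflection-based equivalent of the 'is' test above.
    public static bool ImplementsIListViaReflection(object x) =>
        x != null && typeof(IList).IsAssignableFrom(x.GetType());

    // GetType() never returns an interface type, so this is false for every object.
    public static bool RuntimeTypeIsIList(object x) =>
        x != null && x.GetType() == typeof(IList);
}
```

For a new List<int>(), the first three methods return true and the last returns false; for null, all four return false.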
First 4) as it's simpler. The is operator is implemented with the as operator.
x is C
is the same as
x as C != null
For the source you can look at the fabulous Eric Lippert's blog. Currently (commit e09c42a), the Roslyn compiler translates both to
isinst C
ldnull
cgt.un
Where isinst is the magic instruction that tries to cast x to C and leaves the cast reference on top of the stack, or null if it fails. The remaining two instructions are the check for null.
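A small sketch of that equivalence at the source level, assuming a hypothetical class C with a subclass D. It only shows that the two spellings give the same answers; it says nothing about the IL a particular compiler version emits.

```csharp
using System;

public class C { }

public class D : C { }

public static class IsVersusAs
{
    // Type test written with 'is'.
    public static bool ViaIs(object x) => x is C;

    // The same test written with 'as' followed by a null check.
    public static bool ViaAs(object x) => (x as C) != null;
}
```

Both methods return true for instances of C and D, and false for null or for an unrelated object such as a string.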
As for performance, it's hard to say definitely. In theory, checking with is should be faster, as it's a built-in CLR instruction that's heavily optimized for what it does, while the other check has to call three methods, the GetType(), GetTypeFromHandle(RuntimeTypeHandle) and the equality on Type. There's also the standard check-for-null-and-throw-NRE involved with calling the GetType(). A very crude benchmark supports this hypothesis: link to DotNetFiddle. If someone is willing to perform a more sophisticated benchmark, go ahead.
Obviously I can imagine you could have an inheritance hierarchy so deep and complicated the is check will take longer than any overhead of GetType() ever could. Feel free to subtype my benchmark up to C100 and check if that's enough :)
EDIT:
I think I should add that this discussion is purely theoretical. In production code you should use x is C, as it's concise and more robust since it checks the entire hierarchy for subtyping. If you have a hot path that checks a given instance's type and you know for a fact that the hierarchy is flat, you probably should redesign the system to avoid that check instead of uglifying the code to squeeze out some performance.

Strange implementation of Object.Equals

I was reading the MSDN documentation about object.Equals. In the remarks section it says:
If the two objects do not represent the same object reference and
neither is null, it calls objA.Equals(objB) and returns the result.
This means that if objA overrides the Object.Equals(Object) method,
this override is called.
My question is: why did they not implement this part as objA.Equals(objB) && objB.Equals(objA), to make equality symmetric rather than relying on just one side of the relation? It can result in strange behavior when calling object.Equals.
EDIT: Strange behavior can happen when type of objA overrides Equals method and implemented it as something not predictable, but type of objB does not override Equals.
Basically, this would only be of any use to developers with flawed Equals implementations. From the documentation:
The following statements must be true for all implementations of the Equals(Object) method. In the list, x, y, and z represent object references that are not null.
[...]
x.Equals(y) returns the same value as y.Equals(x).
[...]
So the check is redundant in every case where the method has been correctly implemented - causing a performance hit to every developer who has done the right thing.
It isn't even terribly useful to developers who haven't done the right thing, as they may still expect object.Equals(x, y) to return true when it returns false - they could debug and find that their method returns true, after all. You could say that it would be documented to check both ways round - but we've already established that the only developers this affects are ones who don't read the documentation anyway.
Basically, when you override a method or implement an interface, you should know what you're doing and obey the specified contract. If you don't do that, you will get odd behaviour, and I don't think it's reasonable to expect every caller to try to work around implementations which don't do what they're meant to.
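To make the "flawed implementation" case concrete, here is a deliberately broken sketch. CaseInsensitiveName is a hypothetical class whose Equals accepts plain strings, which string itself cannot reciprocate, so the static object.Equals gives different answers depending on argument order.

```csharp
using System;

// Deliberately violates the symmetry rule quoted above; for illustration only.
public sealed class CaseInsensitiveName
{
    private readonly string _value;

    public CaseInsensitiveName(string value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public override bool Equals(object obj)
    {
        if (obj is CaseInsensitiveName other)
            return string.Equals(_value, other._value, StringComparison.OrdinalIgnoreCase);

        // Contract violation: "ABC".Equals(name) can never return true in return.
        if (obj is string s)
            return string.Equals(_value, s, StringComparison.OrdinalIgnoreCase);

        return false;
    }

    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(_value);
}
```

With var name = new CaseInsensitiveName("abc"), object.Equals(name, "ABC") is true while object.Equals("ABC", name) is false. The fix belongs in the class (drop the string branch), not in every caller.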

Continuing confusion regarding overriding Equals for mutable objects that are used in data bound collections

Background:
I've written a large scale WPF application using MVVM and it's been suffering from some intermittent problems. I initially asked the 'An item with the same key has already been added' Exception on selecting a ListBoxItem from code question here which explains the problem, but got no answers.
After some time, I managed to work out the cause of the Exceptions that I was getting and documented it in the What to return when overriding Object.GetHashCode() in classes with no immutable fields? question. Basically, it was because I had used mutable fields in the formula to return a value for GetHashCode.
From the very useful answers that I received for that question, I managed to deepen my understanding in that area. Here are three relevant rules:
If x equals y then the hash code of x must equal the hash code of y. Equivalently, if the hash code of x does not equal the hash code of y, then x and y must be unequal.
The hash code of x must remain stable while x is in a hash table.
The hash function should generate a random distribution among all
integers for all inputs.
These rules affected the possible solutions that I had to my problem of not knowing what to return from the GetHashCode method:
I couldn't return a constant because that would break the first and third rules above.
I couldn't create an additional readonly field for each class, solely to be used in the GetHashCode method for the same reasons.
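The second rule (a stable hash code while the object sits in a hash table) can be reproduced in a few lines. This is a sketch with a hypothetical MutableItem class whose hash code is simply its mutable Key property; it is not the asker's view-model code.

```csharp
using System;
using System.Collections.Generic;

public sealed class MutableItem
{
    public int Key { get; set; }

    public override bool Equals(object obj) =>
        obj is MutableItem other && Key == other.Key;

    // Depends on a mutable property, which is what breaks the stability rule.
    public override int GetHashCode() => Key;
}

public static class StaleHashDemo
{
    // Returns whether the set can still find the item after it was mutated in place.
    public static bool FoundAfterMutation()
    {
        var item = new MutableItem { Key = 1 };
        var set = new HashSet<MutableItem> { item };

        item.Key = 2; // the hash code changes while the item is stored

        return set.Contains(item);
    }
}
```

FoundAfterMutation() returns false even though the very same instance is still inside the set: the set filed it under the old hash code and now looks under the new one.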
The solution that I eventually went with was to remove each item from its ObservableCollection before editing any of the properties used in the GetHashCode method and then to re-add it again afterwards. While this has worked Ok in a number of views so far, I've run into a further problem as my UI items are animated using custom Panels. When I re-add an item (even by inserting it back to its original index in the collection), it sets off the entry animation(s) again.
I had already added a number of base class methods such as AddWithoutAnimation, RemoveWithoutAnimation, which has helped fix some of these issues, but it doesn't affect any Storyboard animations, which still get triggered after re-adding. So finally, we come to the question:
Question:
First, I'd like to clearly state that I am not using any Dictionary objects in my code... the Dictionary that throws the Exception must be internal to an ObservableCollection<T>. This point seems to have been missed by most people in my last question. Therefore, I cannot choose to simply not use a Dictionary... if only I could.
So, my question is 'is there any other way that I can implement GetHashCode in mutable classes while not breaking the three rules above, or avoid implementing it in the first place?'
I received a comment on the previous question from @HansPassant that suggested that
A good starting point is to completely remove the Equals and GetHashCode overrides, the default implementations inherited from Object are excellent and guarantee object uniqueness.
Can anyone tell me how can I remove the Equals and GetHashCode overrides? On the IEquatable<T> Interface page on MSDN it says It should be implemented for any object that might be stored in a generic collection and then on the IEquatable<T>.Equals Method page it says If you implement Equals, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behaviour is consistent with that of the IEquatable<T>.
If this is possible, it would be my preferred solution.
UPDATE >>>
After downloading and installing dotPeek, I have been able to look inside the PresentationFramework namespace where the Exception is actually occurring. I have found the exact part that uses the Dictionary that is causing this problem. It is in the internal InternalSelectedItemsStorage class constructor:
internal InternalSelectedItemsStorage(Selector.InternalSelectedItemsStorage collection, IEqualityComparer<ItemsControl.ItemInfo> equalityComparer = null)
{
    this._equalityComparer = equalityComparer ?? collection._equalityComparer;
    this._list = new List<ItemsControl.ItemInfo>((IEnumerable<ItemsControl.ItemInfo>) collection._list);
    if (collection.UsesItemHashCodes)
        this._set = new Dictionary<ItemsControl.ItemInfo, ItemsControl.ItemInfo>((IDictionary<ItemsControl.ItemInfo, ItemsControl.ItemInfo>) collection._set, this._equalityComparer);
    this._resolvedCount = collection._resolvedCount;
    this._unresolvedCount = collection._unresolvedCount;
}
This is used internally by the Selector class after the ListBoxItem.OnSelected method has been called, so I can only assume that this has something to do with when a selection is made on the Listbox.
Can anyone tell me how can I remove the Equals and GetHashCode overrides? On the IEquatable<T> Interface page on MSDN it says It should be implemented for any object that might be stored in a generic collection and then on the IEquatable<T>.Equals Method page it says If you implement Equals, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behaviour is consistent with that of the IEquatable<T>.
Mutable objects are comparable by their identity while immutable or value objects by their values.
If you have a mutable object you need to figure out its identity (e.g. if it is a representation of an entity stored in the database, the identity is the primary key of the entity; if it is just an 'ad hoc' mutable object created in memory, then its identity is the reference to this object, i.e. the default implementation of Equals and GetHashCode).
So if your object is not an entity, you simply implement IEquatable<T>.Equals(T x) { return base.Equals(x); }, i.e. you say that yes, you can compare objects of this class with objects of class T, and you compare them by reference (the Equals() method inherited from System.Object). Note that it has to be base.Equals rather than this.Equals: with an argument of type T, this.Equals(x) would bind to the same typed overload and recurse forever.
If your object is an entity and e.g. has a primary key PersonId, then you do comparison by PersonId and return PersonId.GetHashCode() from your GetHashCode() method.
Btw, in case of entities you usually use some OR mapper and Identity map pattern which ensures that within a given unit of work you don't have more than one instance of a given entity, i.e. whenever primary keys are equal the object references are equal too.
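A minimal sketch of the entity case, assuming a hypothetical Person class whose PersonId is assigned once and never changes, while DisplayName stays freely editable and is kept out of both Equals and GetHashCode.

```csharp
using System;
using System.Collections.Generic;

public sealed class Person : IEquatable<Person>
{
    public int PersonId { get; }             // identity: fixed at construction
    public string DisplayName { get; set; }  // mutable: not part of equality

    public Person(int personId, string displayName)
    {
        PersonId = personId;
        DisplayName = displayName;
    }

    public bool Equals(Person other) =>
        !(other is null) && PersonId == other.PersonId;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Derived only from the immutable key, so it stays stable inside hash tables.
    public override int GetHashCode() => PersonId.GetHashCode();
}
```

Editing DisplayName while the object is held in a HashSet<Person> or used as a Dictionary key is harmless here, because the hash code never moves.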

A questionable insight into overriding Equals

Following Guidelines for Overriding Equals() and Operator == (C# Programming Guide), it seems advisable to override GetHashCode when overriding Equals(object), as well as to implement Equals(type).
It is in my understanding that there is an endless discussion about what's the best implementation for overriding Equals. However, I still like to understand the Equals concept a little better and decide for my own.
My questions will probably be kinda noobish, but here we go:
What is the main difference between Equals(object) and Equals(type) (independently of the given parameters)?
As far as I understand (And I could be completely wrong, so this is a question at the same time):
Equals(object) is a built-in method that checks (by default) whether the object references are the same. And Equals(Type) is a local method you create. So in fact, what you have in that class is the method Equals with 2 overloads.
Why do they check for property equality twice?
In equals(object) :
return base.Equals(obj) && z == p.z;
and in equals(type) :
return base.Equals((TwoDPoint)p) && z == p.z;
Why is it advisable to implement the Equals(type) method?
Most of my questions are wrapped up in my statement in question 1, so please point out any wrong or misleading arguments. Also, feel free to add any information; it will certainly help.
Thanks in advance
First let's distinguish the 2 methods
object.Equals() is a method on the root object which is marked as virtual and therefore can be overridden in a derived class.
IEquatable<T>.Equals is a method obtained by implementing the IEquatable<T> interface.
The latter is used for determining equality inside a generic collection; as the documentation says:
The IEquatable<T> interface is used by generic collection objects such as Dictionary<TKey, TValue>, List<T>, and LinkedList<T> when testing for equality in such methods as Contains, IndexOf, LastIndexOf, and Remove. It should be implemented for any object that might be stored in a generic collection.
The former is used for determining equality everywhere else.
So with the groundwork in place lets try to answer some of your specific questions
What is the main difference between Equals(object) and Equals(type) (independently of the given parameters)?
One operates on any type, the other compares instances of the same type
Why do they check for property equality twice?
They don't; generally only one is used. However, quite often one implementation calls the other internally.
Why is it advisable to implement the Equals(type) method?
The answer is above - if you intend to store the object in a generic collection
As a side note, and one which may help you understand this, the default behaviour of equality checking is to check that the references are the same (ie, that one object is exactly the same instance as another). Quite often overriding/implementing different equality logic is used to compare some data within fields of the object (akin to your example of z == p.z)
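The usual arrangement of the two overloads might look like the sketch below, where Equals(object) forwards to the typed overload so that the field comparison is written only once. Point2D is a hypothetical class, and the two static counters exist purely to show which overload a caller ends up in.

```csharp
using System;
using System.Collections.Generic;

public sealed class Point2D : IEquatable<Point2D>
{
    // Diagnostic counters, for illustration only.
    public static int TypedCalls;
    public static int ObjectCalls;

    public int X { get; }
    public int Y { get; }

    public Point2D(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Typed overload: holds the actual comparison logic.
    public bool Equals(Point2D other)
    {
        TypedCalls++;
        return !(other is null) && X == other.X && Y == other.Y;
    }

    // Untyped override: only converts and forwards.
    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return Equals(obj as Point2D);
    }

    public override int GetHashCode() => (X * 397) ^ Y;
}
```

Calling Contains on a List<Point2D> goes through the typed overload without touching Equals(object), whereas the static object.Equals(a, b) enters through the untyped override and is then forwarded.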
One difference between the overloads is that, as noted, one will be invoked when comparing an object to things which are known at compile time to be of the same type, while the other will be invoked in all other circumstances. Another very important difference which has not been mentioned is that Equals(ownType) will act not only on things of ownType, but also on things that are implicitly convertible to ownType. Because of this, Equals cannot be expected to implement an equivalence relation among objects of convertible types unless one forces its operands to be of type Object. Consider, for example,
(12.0).Equals(12);
This converts the integer value 12 to the Double value 12.0. Since the type and value of the passed value precisely match the 12.0 whose Equals method is being invoked, the method returns true.
(12).Equals(12.0);
Because Double is not implicitly convertible to Int32, this call passes the Double value as Object instead. Since the Double does not match the type of the 12 whose Equals method is being invoked, the method returns false.
The virtual method Equals(Object) implements an equivalence relation; in many cases involving implicit type conversions, the type-specific methods cannot be expected to do so.
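The two calls can be wrapped in a tiny sketch to make the overload choice visible; the NumericEqualsDemo class and its method names are invented for the example.

```csharp
using System;

public static class NumericEqualsDemo
{
    // The int argument is implicitly converted to double, so Double.Equals(double) is chosen.
    public static bool DoubleEqualsInt() => (12.0).Equals(12);

    // double is not implicitly convertible to int, so the argument is boxed
    // and Int32.Equals(object) is chosen, which rejects a boxed Double.
    public static bool IntEqualsDouble() => (12).Equals(12.0);

    // Forcing the operand to Object restores symmetry: both directions are false.
    public static bool DoubleEqualsBoxedInt() => (12.0).Equals((object)12);
}
```

DoubleEqualsInt() returns true, while IntEqualsDouble() and DoubleEqualsBoxedInt() both return false.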

C# How to select a Hashcode for a class that violates the Equals contract?

I've got multiple classes that, for certain reasons, do not follow the official Equals contract. In the overridden GetHashCode() these classes simply return 0 so they can be used in a Hashmap.
Some of these classes implement the same interface and there are Hashmaps using this interface as key. So I figured that every class should at least return a different (but still constant) value in GetHashCode().
The question is how to select this value. Should I simply let the first class return 1, the next class 2 and so on? Or should I try something like
class SomeClass : SomeInterface {
    public override int GetHashCode() {
        return "SomeClass".GetHashCode();
    }
}
so the hash is distributed more evenly? (Do I have to cache the returned value myself or is Microsoft's compiler able to optimize this?)
Update: It is not possible to return an individual hashcode for each object, because Equals violates the contract. Specifically, I'm referring to this problem.
If it "violates the Equals contract", then I'm not sure you should be using it as a key.
If something is using that as a key, you really need to get the hashing right... it is very unclear what the Equals logic is, but two values that are considered equal must have the same hash-code. It is not required that two values with the same hash-code are equal.
Using a constant string won't really help much - you'll get the values split evenly over the types, but that is about it...
I'm curious what the reasoning would be for overriding GetHashCode() and returning a constant value. Why violate the idea of a hash rather than just violating the "contract" and not overriding the GetHashCode() function at all and leave the default implementation from Object?
Edit
If you've done that so your objects match based on their contents rather than their references, then what you propose (having different classes simply use different constants) can WORK, but is highly inefficient. What you want to do is come up with a hashing algorithm that can take the contents of your class and produce a value that balances speed with even distribution (that's hashing 101).
I guess I'm not sure what you're looking for...there isn't a "good" scheme for choosing constant numbers for this paradigm. One is not any better than the other. Try to improve your objects so that you're creating a real hash.
I ran into this exact problem when writing a vector class. I wanted to compare vectors for equality, but float operations give rounding errors, so I wanted approximate equality. Long story short, overriding equals is a bad idea unless your implementation is symmetric, reflexive, and transitive.
Other classes are going to assume equals has those properties, and so will classes using those classes, and so you can end up in weird cases. For example a list might enforce uniqueness, but end up with two elements which evaluate as equal to some element B.
A hash table is the perfect example of unpredictable behavior when you break equality. For example:
// Assume a == b, b == c, but a != c
var T = new Dictionary<YourType, int>();
T[a] = 0;
T[c] = 1;
return T[b]; // 0 or 1? who knows!
Another example would be a Set:
// Assume a == b, b == c, but a != c
var T = new HashSet<YourType>();
T.Add(a);
T.Add(c);
if (T.Contains(b)) T.Remove(b);
// surely T can't contain b anymore! I sure hope no one breaks the properties of equality!
if (T.Contains(b)) throw new Exception();
I suggest using another method, with a name like ApproxEquals. You might also consider overloading the == operator, because it isn't virtual and therefore won't be used accidentally by other classes like Equals could be.
If you really can't use reference equality for the hash table, don't ruin the performance of cases where you can. Add an IApproxEquals interface, implement it in your class, and add an extension method GetApprox to Dictionary which enumerates the keys looking for an approximately equal one, and returns the associated value. You could also write a custom dictionary especially for 3-dimensional vectors, or whatever you need.
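One possible shape for that suggestion is sketched below. The answer only names IApproxEquals and GetApprox; the generic parameter on the interface, the Vec2 class, the tolerance value and the Try-pattern variant TryGetApprox are all assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;

public interface IApproxEquals<T>
{
    bool ApproxEquals(T other);
}

// Hypothetical vector type; Equals/GetHashCode are left at reference equality.
public sealed class Vec2 : IApproxEquals<Vec2>
{
    private const double Tolerance = 1e-9; // assumed tolerance for the example

    public double X { get; }
    public double Y { get; }

    public Vec2(double x, double y)
    {
        X = x;
        Y = y;
    }

    public bool ApproxEquals(Vec2 other) =>
        !(other is null)
        && Math.Abs(X - other.X) <= Tolerance
        && Math.Abs(Y - other.Y) <= Tolerance;
}

public static class ApproxDictionaryExtensions
{
    // Linear scan over the keys: slow, but only paid by callers who ask for it.
    public static bool TryGetApprox<TKey, TValue>(
        this Dictionary<TKey, TValue> map, TKey probe, out TValue value)
        where TKey : IApproxEquals<TKey>
    {
        foreach (var pair in map)
        {
            if (pair.Key.ApproxEquals(probe))
            {
                value = pair.Value;
                return true;
            }
        }

        value = default(TValue);
        return false;
    }
}
```

Ordinary lookups by reference keep their hashed performance, and the approximate lookup is an explicit, separately named operation that cannot be triggered by accident.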
When hash collisions occur, the HashTable/Dictionary calls Equals to find the key you're looking for. Using a constant hash code removes the speed advantages of using a hash in the first place - it becomes a linear search.
You're saying the Equals method hasn't been implemented according to the contract. What exactly do you mean with this? Depending on the kind of violation, the HashTable or Dictionary will merely be slow (linear search) or not work at all.
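The linear-search effect of a constant hash code can be observed by counting Equals calls. This is a rough sketch with a hypothetical ConstantHashKey class and a static counter added only for measurement.

```csharp
using System;
using System.Collections.Generic;

public sealed class ConstantHashKey
{
    public static int EqualsCalls; // diagnostic counter, for illustration only

    public int Id { get; }

    public ConstantHashKey(int id)
    {
        Id = id;
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        return obj is ConstantHashKey other && other.Id == Id;
    }

    // Every key lands in the same bucket.
    public override int GetHashCode() => 0;
}

public static class ConstantHashDemo
{
    // Counts the Equals calls made by one failed lookup in a dictionary of 'count' keys.
    public static int EqualsCallsForMiss(int count)
    {
        var map = new Dictionary<ConstantHashKey, int>();
        for (int i = 0; i < count; i++)
        {
            map[new ConstantHashKey(i)] = i;
        }

        ConstantHashKey.EqualsCalls = 0;
        map.ContainsKey(new ConstantHashKey(-1)); // absent key: the whole chain is scanned
        return ConstantHashKey.EqualsCalls;
    }
}
```

The number of comparisons for a single lookup grows with the number of stored keys, and building the dictionary is quadratic for the same reason, since each insertion first scans the existing chain for a duplicate.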
