Why should GetHashCode implement the same logic as Equals? - c#

In this MSDN page it says:
Warning:
If you override the GetHashCode method, you should also override Equals, and vice versa. If your overridden Equals method returns true when two objects are tested for equality, your overridden GetHashCode method must return the same value for the two objects.
I have also seen many similar recommendations and I can understand that when overriding the Equals method I would also want to override the GetHashCode. As far as I can work out though, the GetHashCode is used with hash table look-ups, which is not the same as equality checking.
Here is an example to help explain what I want to ask:
public class Temperature /* Immutable */
{
public Temperature(double value, TemperatureUnit unit) { ... }
private double Value { get; set; }
private TemperatureUnit Unit { get; set; }
private double GetValue(TemperatureUnit unit)
{
/* return value converted into the specified unit */
}
...
public override bool Equals(object obj)
{
Temperature other = obj as Temperature;
if (other == null) { return false; }
return (Value == other.GetValue(Unit));
}
public override int GetHashCode()
{
return Value.GetHashCode() + Unit.GetHashCode();
}
}
In this example, two Temperature objects are considered equal even if they are not storing the same things internally (e.g. 295.15 K == 22 Celsius). At the moment the GetHashCode method will return different values for each. These two temperature objects are equal, but they are also not the same, so is it not correct that they have different hash codes?

When storing a value in a hash table, such as Dictionary<>, the framework will first call GetHashCode() and check if there's already a bucket in the hash table for that hash code. If there is, it will call .Equals() to see if the new value is indeed equal to the existing value. If not (meaning the two objects are different, but result in the same hash code), you have what's known as a collision. In this case, the items in this bucket are stored as a linked list and retrieving a certain value becomes O(n).
If you implemented GetHashCode() but did not implement Equals(), the framework would fall back to reference equality, so two distinct instances that produce the same hash code would always be treated as a collision and never as the same key.
If you implemented Equals() but did not implement GetHashCode(), you might run into a situation where you had two objects that were equal, but resulted in different hash codes meaning they'd maintain their own separate values in your hash table. This would potentially confuse anyone using your class.
As far as what objects are considered equal, that's up to you. If I create a hash table based on temperature, should I be able to refer to the same item using either its Celsius or Fahrenheit value? If so, they need to result in the same hash value and Equals() needs to return true.
Update:
Let's step back and take a look at the purpose of a hash code in the first place. Within this context, a hash code is used as a quick way to identify if two objects are most likely equal. If we have two objects that have different hash codes, we know for a fact they are not equal. If we have two objects that have the same hash code, we know they are most likely equal. I say most likely because an int can only be used to represent a few billion possible values, and strings can of course contain the complete works of Charles Dickens, or any number of possible values. Much in the .NET framework is based on these truths, and developers that use your code will assume things work in a way that is consistent with the rest of the framework.
If you were to have two instances that have different hash codes, but have an implementation of Equals() that returns true, you're breaking this convention. A developer who compares two objects might then use one of those objects to refer to a key in a hash table and expect to get an existing value out. If all of a sudden the hash code is different, this code might result in a runtime exception instead. Or perhaps return a reference to a completely different object.
Whether 295.15 K and 22 C are equal within the domain of your program is your choice (in my opinion, they are not). However, whatever you decide, objects that are equal must return the same hash code.
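To make the failure concrete, here is a minimal sketch of what goes wrong. BrokenTemperature, InKelvin and BrokenTemperatureDemo are hypothetical names, not the asker's actual class, and the fixed 273.15 offset is an assumption chosen so the comparison is exact. Equals compares the physical temperature while GetHashCode still depends on the stored unit, so a HashSet cannot find an object that Equals says is already there.

```csharp
using System;
using System.Collections.Generic;

public enum TemperatureUnit { Celsius, Kelvin }

// Hypothetical reduction of the question's class, kept deliberately broken.
public sealed class BrokenTemperature
{
    private readonly double value;
    private readonly TemperatureUnit unit;

    public BrokenTemperature(double value, TemperatureUnit unit)
    {
        this.value = value;
        this.unit = unit;
    }

    // Convert the stored value to Kelvin (assumed base unit for this sketch).
    private double InKelvin()
    {
        return unit == TemperatureUnit.Kelvin ? value : value + 273.15;
    }

    // Equality is based on the physical temperature, ignoring the unit.
    public override bool Equals(object obj)
    {
        BrokenTemperature other = obj as BrokenTemperature;
        return other != null && InKelvin() == other.InKelvin();
    }

    // Violates the contract: equal objects can return different hash codes,
    // because the hash depends on the stored value and unit.
    public override int GetHashCode()
    {
        return value.GetHashCode() ^ unit.GetHashCode();
    }
}

public static class BrokenTemperatureDemo
{
    // True when the two objects are equal yet the set cannot find one of them.
    public static bool EqualButNotFound()
    {
        var kelvin = new BrokenTemperature(273.15, TemperatureUnit.Kelvin);
        var celsius = new BrokenTemperature(0.0, TemperatureUnit.Celsius);
        var set = new HashSet<BrokenTemperature> { kelvin };
        return kelvin.Equals(celsius) && !set.Contains(celsius);
    }
}
```

HashSet compares the stored hash code before it ever calls Equals, so the lookup for the Celsius instance fails even though Equals would have returned true.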

Warning:
If you override the GetHashCode method, you should also override Equals, and vice versa. If your overridden Equals method returns true when two objects are tested for equality, your overridden GetHashCode method must return the same value for the two objects.
This is a convention in the .NET libraries. It's not enforced at compile time, or even at run-time, but code in the .NET library (and likely any other external library) expects this statement to always be true:
If two objects return true from Equals they will return the same hash code
And:
If two objects return different hash codes they are NOT equal
If you don't follow that convention, then your code will break. And worse it will probably break in ways that are really hard to trace (like putting two identical objects in a dictionary, or getting a different object from a dictionary than the one you expected).
So, follow the convention, or you will cause yourself a lot of grief.
In your particular class, you need to decide: either Equals returns false when the units are different, or GetHashCode returns the same hash code regardless of unit. You can't have it both ways.
So you either do this:
public override bool Equals(object obj)
{
Temperature other = obj as Temperature;
if (other == null) { return false; }
return (Value == other.Value && Unit == other.Unit);
}
Or you do this:
public override int GetHashCode()
{
// note that the value returned from ConvertToSomeBaseUnit
// should probably be cached as a private member
// especially if your class is supposed to be immutable
return Value.ConvertToSomeBaseUnit().GetHashCode();
}
Note that nothing is stopping you from also implementing:
public bool TemperaturesAreEqual(Temperature other)
{
if (other == null) { return false; }
return (Value == other.GetValue(Unit));
}
And using that when you want to know if two temperatures represent the same physical temperature regardless of units.

Two objects that are equal should return the same HashCode (two objects that are different could return the same hashcode too, but that's a collision).
In your case, neither your Equals nor your GetHashCode implementation is a good one. The problem is that the "real value" of the object depends on a parameter: there's no single property that defines the value of the object. You only store the initial unit in order to do the equality comparison.
So, why don't you settle on an internal definition of what's the Value of your Temperature?
I'd implement it like:
public class Temperature
{
public Temperature(double value, TemperatureUnit unit) {
Value = ConvertValue(value, unit, TemperatureUnit.Celsius);
}
private double Value { get; set; }
private double ConvertValue(double value, TemperatureUnit originalUnit, TemperatureUnit targetUnit)
{
/* return value from originalUnit converted to targetUnit */
}
private double GetValue(TemperatureUnit unit)
{
return ConvertValue(Value, TemperatureUnit.Celsius, unit);
}
public override bool Equals(object obj)
{
Temperature other = obj as Temperature;
if (other == null) { return false; }
return (Value == other.Value);
}
public override int GetHashCode()
{
return Value.GetHashCode();
}
}
That way, your internal Value is what defines if two objects are the same, and is always expressed in the same unit.
You don't really care what Unit the object has: storing it makes no sense, since to get the value back you'll always pass a unit. It only makes sense to pass it for the initial conversion.

Related

C# Equals extension is unable to check equality

I overrode the Equals method and the hash code to check for equality of two identical objects with boolean properties. When I mutate the object, making one of the boolean properties false instead of true, it fails to recognize the difference and asserts they are equal. Any ideas why?
public override bool Equals(object value)
{
if (!(value is LocationPropertyOptions options))
return false;
return Equals(options, value);
}
public bool Equals(LocationPropertyOptions options)
{
return options.GetHashCode() == GetHashCode();
}
public override int GetHashCode()
{
return ToString().GetHashCode();
}
public override string ToString()
{
return $"{Identifier}{AccountEntityKey}{Address}{Comments}{Contact}" +
$"{Coordinate}{Description}{FaxNumber}{LastOrderDate}{PhoneNumber}" +
$"{ServiceAreaOverride}{ServiceRadiusOverride}{StandardInstructions}" +
$"{WorldTimeZone_TimeZone}{ZoneField}{CommentsOptions}";
}
You cast options from value, then call Equals with options and value. That means you compare value with itself, so it always returns true:
public override bool Equals(object value)
{
if (!(value is LocationPropertyOptions options))
return false;
return Equals(options, value);
}
Try comparing this with value, like
return Equals(this, value);
This doesn't really answer your question. However, you should consider a few things when implementing equality to avoid this kind of error.
First, you have two completely different implementations to indicate equality. Your override bool Equals(object value) redirects to the static method Object.Equals(object, object), which checks reference equality first; since options and value are the same object, it always returns true. Your public bool Equals(LocationPropertyOptions) (probably an implementation for IEquatable<LocationPropertyOptions>) on the other hand simply uses your strange GetHashCode implementation, which is
Point two: you should not use a mutable member within your hash code implementation, in particular when your objects are stored within a dictionary or a hash map, which heavily depends on a good implementation of the hash code. See MSDN on GetHashCode:
You can override GetHashCode() for immutable reference types. In
general, for mutable reference types, you should override
GetHashCode() only if:
You can compute the hash code from fields that are not mutable; or
You can ensure that the hash code of a mutable object does not change
while the object is contained in a collection that relies on its hash
code.
Third and last: you shouldn't use GetHashCode in your check for equality:
Do not test for equality of hash codes to determine whether two
objects are equal
Whilst equal objects are assumed to have identical hash codes, different objects may have the exact same hash code anyway. An equal hash code is therefore just an indicator that two instances may be equal.
Two objects that are equal return hash codes that are equal. However,
the reverse is not true: equal hash codes do not imply object
equality, because different (unequal) objects can have identical hash
codes [...]
You should not assume that equal hash codes imply object equality.
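As a rough sketch of the points above, the equality check could compare the members directly instead of comparing hash codes or strings. LocationOptions and its two get-only properties are a hypothetical, cut-down stand-in for the asker's LocationPropertyOptions; the real class has many more members, each of which would need the same treatment.

```csharp
using System;

// Hypothetical two-property stand-in for LocationPropertyOptions.
public sealed class LocationOptions : IEquatable<LocationOptions>
{
    public LocationOptions(string identifier, bool serviceAreaOverride)
    {
        Identifier = identifier;
        ServiceAreaOverride = serviceAreaOverride;
    }

    // Get-only properties, so the hash code cannot change after construction.
    public string Identifier { get; }
    public bool ServiceAreaOverride { get; }

    // Compare the members themselves, never the hash codes.
    public bool Equals(LocationOptions other)
    {
        if (ReferenceEquals(other, null)) { return false; }
        return Identifier == other.Identifier
            && ServiceAreaOverride == other.ServiceAreaOverride;
    }

    // Compare this instance with the argument, not the argument with itself.
    public override bool Equals(object obj)
    {
        return Equals(obj as LocationOptions);
    }

    // Built from the same members that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Identifier == null ? 0 : Identifier.GetHashCode());
            hash = hash * 23 + ServiceAreaOverride.GetHashCode();
            return hash;
        }
    }
}
```

With this shape, flipping the boolean on an otherwise identical instance makes Equals return false, which is the behaviour the question was after.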

ConcurrentDictionary adding same keys more than once

I want to use ConcurrentDictionary to check whether a data key has been added before, but it looks like I can still add keys that were added before.
code:
public class pKeys
{
public pKeys()
{ }
public pKeys(long sID, long pID)
{
this.seID = sID;
this.pgID = pID;
}
public long seID;
public long pgID;
}
public static ConcurrentDictionary<pKeys, bool> existenceDic
= new ConcurrentDictionary<pKeys, bool>();
test code:
pKeys temKey = new pKeys(111, 222);
bool res = existenceDic.TryAdd(temKey, true);
Console.WriteLine(res);
temKey = new pKeys(111, 222);
res = existenceDic.TryAdd(temKey, true);
Console.WriteLine(res);
result:
true
true
You can add two different instances containing the same values, because you haven't overridden GetHashCode() and Equals(). This causes the default equality comparison to be used, which for reference types simply compares the references themselves. Two different instances are always considered as different values in this case.
One option is to make your type a struct instead of class. This uses a default comparison that will take into account the field values.
Alternatively, you can go ahead and override GetHashCode() and Equals(). For example:
public class pKeys
{
public pKeys()
{ }
public pKeys(long sID, long pID)
{
this.seID = sID;
this.pgID = pID;
}
public readonly long seID;
public readonly long pgID;
public override int GetHashCode()
{
return seID.GetHashCode() * 37 + pgID.GetHashCode();
}
public override bool Equals(object other)
{
pKeys otherKeys = other as pKeys;
return otherKeys != null &&
this.seID == otherKeys.seID &&
this.pgID == otherKeys.pgID;
}
}
Notes:
The hash code is calculated based on the hash codes of the individual values. One is multiplied by 37, which is simply a convenient prime number; some people prefer to use a much larger prime number for better "mixing". For most scenarios, the above will work fine IMHO.
Note that your proposed solution, converting the values to strings, concatenating them, and returning the hash code of that has several negative aspects:
You have to create three string instances just to generate the hash code! The memory overhead alone is bad enough, but of course there is the cost of formatting the two integers as well.
Generating a hash code from a string is more expensive computationally than from an integer value
You have a much higher risk of a collision, as it's easier for disparate values to result in the same string (e.g. (11, 2222) and (111, 222))
I added readonly to your fields. This would be critical if you decide to make the type a struct (i.e. even if you don't override the methods). But even for a class, mutable types that are equatable are a huge problem, because if they change after they are added to a hash-based collection, the collection is effectively corrupted. Using readonly here ensures that the type is immutable. (Also, IMHO public fields should be avoided, but if one must have them, they should definitely be readonly even if you don't override the equality methods).
Some people prefer to check for exact type equality in the Equals() method. In fact, this is often a good idea…it simplifies the scenarios where objects are compared and makes the code more maintainable. But for the sake of example, assignability (i.e. as) is easier to read, and is valid in many scenarios anyway.
See General advice and guidelines on how to properly override object.GetHashCode() for additional guidance.
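If you only need a composite key and don't need a named type, another option is a value tuple, which already compares its elements by value. This is a sketch under the assumption that you are on C# 7 or later with System.ValueTuple available; TupleKeyDemo and the element names SeId and PgId are made up for the example.

```csharp
using System.Collections.Concurrent;

public static class TupleKeyDemo
{
    // Adds the same (111, 222) key twice and reports both TryAdd results.
    public static bool[] AddSameKeyTwice()
    {
        // Value tuples implement Equals and GetHashCode over their elements.
        var seen = new ConcurrentDictionary<(long SeId, long PgId), bool>();

        bool first = seen.TryAdd((111L, 222L), true);
        bool second = seen.TryAdd((111L, 222L), true); // same values, same key

        return new[] { first, second };
    }
}
```

Here the second TryAdd is rejected, because the two tuples are equal and hash identically even though they are separate values.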

What is the proper way to implement Equation functions [duplicate]

I'm having some difficulty using Linq's .Except() method when comparing two collections of a custom object.
I've derived my class from Object and implemented overrides for Equals(), GetHashCode(), and the operators == and !=. I've also created a CompareTo() method.
In my two collections, as a debugging experiment, I took the first item from each list (which is a duplicate) and compared them as follows:
itemListA[0].Equals(itemListB[0]); // true
itemListA[0] == itemListB[0]; // true
itemListA[0].CompareTo(itemListB[0]); // 0
In all three cases, the result is as I wanted. However, when I use Linq's Except() method, the duplicate items are not removed:
List<MyObject> newList = itemListA.Except(itemListB).ToList();
Learning about how Linq does comparisons, I've discovered various (conflicting?) methods that say I need to inherit from IEquatable<T> or IEqualityComparer<T> etc.
I'm confused because when I inherit from, for example, IEquatable<T>, I am required to provide a new Equals() method with a different signature from what I've already overridden. Do I need to have two such methods with different signatures, or should I no longer derive my class from Object?
My object definition (simplified) looks like this:
public class MyObject : Object
{
public string Name {get; set;}
public DateTime LastUpdate {get; set;}
public int CompareTo(MyObject other)
{
// ...
}
public override bool Equals(object obj)
{
// allows some tolerance on LastUpdate
}
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + Name.GetHashCode();
hash = hash * 23 + LastUpdate.GetHashCode();
return hash;
}
}
// Overrides for operators
}
I noticed that when I inherit from IEquatable<T> I can do so using IEquatable<MyObject> or IEquatable<object>; the requirements for the Equals() signature change when I use one or the other. What is the recommended way?
What I am trying to accomplish:
I want to be able to use Linq (Distinct/Except) as well as the standard equality operators (== and !=) without duplicating code. The comparison should allow two objects to be considered equal if their name is identical and the LastUpdate property is within a number of seconds (user-specified) tolerance.
Edit:
Showing GetHashCode() code.
It doesn't matter whether you override object.Equals and object.GetHashCode, implement IEquatable, or provide an IEqualityComparer. All of them can work, just in slightly different ways.
1) Overriding Equals and GetHashCode from object:
This is the base case, in a sense. It will generally work, assuming you're in a position to edit the type to ensure that the implementation of the two methods are as desired. There's nothing wrong with doing just this in many cases.
2) Implementing IEquatable
The key point here is that you can (and should) implement IEquatable<YourTypeHere>. The key difference between this and #1 is that you have strong typing for the Equals method, rather than just having it use object. This is both better for convenience to the programmer (added type safety) and also means that any value types won't be boxed, so this can improve performance for custom structs. If you do this you should pretty much always do it in addition to #1, not instead of. Having the Equals method here differ in functionality from object.Equals would be...bad. Don't do that.
3) Implementing IEqualityComparer
This is entirely different from the first two. The idea here is that the object isn't getting its own hash code, or seeing if it's equal to something else. The point of this approach is that an object doesn't know how to properly get its hash or see if it's equal to something else. Perhaps it's because you don't control the code of the type (i.e. a 3rd party library) and they didn't bother to override the behavior, or perhaps they did override it but you just want your own unique definition of "equality" in this particular context.
In this case you create an entirely separate "comparer" object that takes in two different objects and informs you of whether they are equal or not, or what the hash code of one object is. When using this solution it doesn't matter what the Equals or GetHashCode methods do in the type itself, you won't use it.
Note that all of this is entirely unrelated from the == operator, which is its own beast.
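For option 3, a separate comparer passed to Except might look roughly like this. Item, ItemNameComparer and ExceptDemo are hypothetical names for the sketch, and comparing by Name only is an assumption about what "equal" should mean in that context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type that does not override Equals or GetHashCode itself.
public sealed class Item
{
    public Item(string name) { Name = name; }
    public string Name { get; }
}

// External definition of equality: two items are equal when their names match.
public sealed class ItemNameComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) { return true; }
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) { return false; }
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    // Must agree with Equals: same name, same hash code.
    public int GetHashCode(Item obj)
    {
        if (ReferenceEquals(obj, null) || obj.Name == null) { return 0; }
        return StringComparer.Ordinal.GetHashCode(obj.Name);
    }
}

public static class ExceptDemo
{
    // Names of the items in list A that have no same-named item in list B.
    public static List<string> NamesOnlyInA()
    {
        var a = new List<Item> { new Item("x"), new Item("y") };
        var b = new List<Item> { new Item("y") };
        return a.Except(b, new ItemNameComparer()).Select(i => i.Name).ToList();
    }
}
```

Except never looks at Item's own Equals or GetHashCode here; it only consults the comparer, which is why this works for types you cannot modify.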
The basic pattern I use for equality in an object is the following. Note that only 2 methods have actual logic specific to the object. The rest is just boilerplate code that feeds into these 2 methods.
class MyObject : IEquatable<MyObject> {
public bool Equals(MyObject other) {
if (Object.ReferenceEquals(other, null)) {
return false;
}
// Actual equality logic here
}
public override int GetHashCode() {
// Actual Hashcode logic here
}
public override bool Equals(Object obj) {
return Equals(obj as MyObject);
}
public static bool operator==(MyObject left, MyObject right) {
if (Object.ReferenceEquals(left, null)) {
return Object.ReferenceEquals(right, null);
}
return left.Equals(right);
}
public static bool operator!=(MyObject left, MyObject right) {
return !(left == right);
}
}
If you follow this pattern there is really no need to provide a custom IEqualityComparer<MyObject>. EqualityComparer<MyObject>.Default will be enough, as it relies on IEquatable<MyObject> to perform equality checks.
You cannot "allow some tolerance on LastUpdate" and then use a GetHashCode() implementation that uses the strict value of LastUpdate!
Suppose the this instance has LastUpdate at 23:13:13.933, and the obj instance has 23:13:13.932. Then these two might compare equal with your tolerance idea. But if so, their hash codes must be the same number. That will not happen unless you're extremely lucky, because DateTime.GetHashCode() should not give the same hash for these two times.
Besides, your Equals method must be a transitive relation mathematically, and "approximately equal to" cannot be made transitive. Its transitive closure is the trivial relation that identifies everything.
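The transitivity problem is easy to see with three timestamps one second apart and a one second tolerance. ToleranceDemo and WithinTolerance are hypothetical helpers written for this sketch, not part of the asker's code.

```csharp
using System;

public static class ToleranceDemo
{
    // "Approximately equal": the absolute difference is within the tolerance.
    public static bool WithinTolerance(DateTime a, DateTime b, TimeSpan tolerance)
    {
        return (a - b).Duration() <= tolerance;
    }

    // Returns { t0~t1, t1~t2, t0~t2 } for timestamps spaced one second apart.
    public static bool[] Check()
    {
        TimeSpan tolerance = TimeSpan.FromSeconds(1);
        DateTime t0 = new DateTime(2020, 1, 1, 12, 0, 0);
        DateTime t1 = t0.AddSeconds(1);
        DateTime t2 = t0.AddSeconds(2);

        return new[]
        {
            WithinTolerance(t0, t1, tolerance),
            WithinTolerance(t1, t2, tolerance),
            WithinTolerance(t0, t2, tolerance)
        };
    }
}
```

The first two comparisons succeed and the third fails, so t0 "equals" t1 and t1 "equals" t2 while t0 does not "equal" t2. No hash function can be consistent with that, which is why Distinct and Except give surprising results.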

Equals and GetHashCode confusion

I am trying to implement an immutable Point class where two Point instances are considered equal if they have the same Coordinates. I am using Jon Skeet's implementation of a Coordinate value type.
For comparing equality of Points I have also inherited EqualityComparer<Point> and IEquatable<Point> and I have a unit test as below:
Point.cs:
public class Point : EqualityComparer<Point>, IEquatable<Point>
{
public Coordinate Coordinate { get; private set; }
// EqualityComparer<Point>, IEquatable<Point> methods and other methods
}
PointTests.cs:
[Fact]
public void PointReferencesToSamePortalAreNotEqual()
{
var point1 = new Point(22.0, 24.0);
var point2 = new Point(22.0, 24.0);
// Value equality should return true
Assert.Equal(point1, point2);
// Reference equality should return false
Assert.False(point1 == point2);
}
Now I am really confused by the 3 interface/abstract methods that I must implement. These are:
IEquatable<Point>.Equals(Point other)
EqualityComparer<Point>.Equals(Point x, Point y)
EqualityComparer<Point>.GetHashCode(Point obj)
And since I have overridden IEquatable<Point>.Equals, according to MSDN I must also implement:
Object.Equals(object obj)
Object.GetHashCode()
Now I am really confused about all the Equals and GetHashCode methods that are required to satisfy my unit test (Reference equality should return false and value equality should return true for point1 and point2).
Can anyone explain a bit further about Equals and GetHashCode?
Because Coordinate already implements GetHashCode() and Equals(Coordinate) for you, it is actually quite easy: just use the underlying implementation.
public class Point : IEquatable<Point>
{
public Coordinate Coordinate { get; private set; }
public override int GetHashCode()
{
return Coordinate.GetHashCode();
}
public override bool Equals(object obj)
{
return this.Equals(obj as Point);
}
public bool Equals(Point point)
{
if(point == null)
return false;
return this.Coordinate.Equals(point.Coordinate);
}
}
The IEquatable<Point> is not strictly necessary, as all it does is save you an extra cast. It is mainly useful for struct types, to prevent boxing of the struct into the object passed to bool Equals(object).
Equals:
Used to check if two objects are equal. There are several checks for equality (by value, by reference), and you really want to have a look at the link to see how they work, and the pitfalls when you don't know who is overriding them how.
GetHashCode:
A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary class, the Hashtable class, or a type derived from the DictionaryBase class. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality.
Let's assume you have two huge objects with heaps of objects inside, and that comparing them might take a very long time. And then you have a collection of those objects, and you need to compare them all. As the definitions say, GetHashCode will return a simple number you can compare first, before comparing the two objects in full. (Assuming you implemented them correctly, objects that are supposed to be "equal" will always have the same hash code, while two different objects will usually, though not always, have different ones.)
And if you want Jon Skeet's opinion on something similar, look here.

Implementing GetHashCode and Equals methods for ValueObjects

There is a passage from NHibernate documentation:
Note: if you define an ISet of composite elements, it is very important to implement Equals() and GetHashCode() correctly.
What does correctly mean there? Is it necessary to implement those methods for all value objects in the domain?
EXTENDING MY QUESTION
In the article Marc attached user Albic states:
It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.
I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set through the constructor, so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.
I suddenly realized what I've got inside my AbstractEntity class:
public abstract class AbstractEntity<T> where T : AbstractEntity<T> {
private Nullable<Int32> hashCode;
public virtual Guid Id { get; protected set; }
public virtual Byte[] Version { get; set; }
public override Boolean Equals(Object obj) {
var other = obj as T;
if(other == null) {
return false;
}
var thisIsNew = Equals(this.Id, Guid.Empty);
var otherIsNew = Equals(other.Id, Guid.Empty);
if(thisIsNew && otherIsNew) {
return ReferenceEquals(this, other);
}
return this.Id.Equals(other.Id);
} // public override Boolean Equals(Object obj) {
public override Int32 GetHashCode() {
if(this.hashCode.HasValue) {
return this.hashCode.Value;
}
var thisIsNew = Equals(this.Id, Guid.Empty);
if(thisIsNew) {
this.hashCode = base.GetHashCode();
return this.hashCode.Value;
}
return this.Id.GetHashCode();
} // public override Int32 GetHashCode() {
public static Boolean operator ==(AbstractEntity<T> l, AbstractEntity<T> r) {
return Equals(l, r);
}
public static Boolean operator !=(AbstractEntity<T> l, AbstractEntity<T> r) {
return !Equals(l, r);
}
} // public abstract class AbstractEntity<T>...
As all components are nested within entities should I then implement Equals() and GetHashCode() for them?
Correctly means that GetHashCode returns the same hash code for entities that are expected to be equal, because a hash-based collection such as ISet compares hash codes first and only calls Equals on entries whose hash codes match.
On the other hand, for entities that are not equal, the hash codes should differ as often as possible.
The documentation for Equals and GetHashCode explain this well and include specific guidance on implementation for value objects. For value objects, Equals is true if the objects are the same type and the public and private fields are equal. However, this explanation applies to framework value types and you are free to create your own Equals by overriding it.
GetHashCode has two rules that must be followed:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not
compare as equal, the GetHashCode methods for the two object do not
have to return different values.
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state
that determines the return value of the object's Equals method. Note
that this is true only for the current execution of an application,
and that a different hash code can be returned if the application is
run again.
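The second rule is the one that bites with mutable objects. As a minimal sketch (MutableKey and MutableKeyDemo are hypothetical names for illustration), changing a field that feeds GetHashCode while the object sits in a HashSet makes the set unable to find it again:

```csharp
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on a settable property.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class MutableKeyDemo
{
    // True when the set can no longer find an object it still contains.
    public static bool LostAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // hash code changes while the object is in the set

        return !set.Contains(key);
    }
}
```

The set stored the entry under the old hash code, so a lookup with the new one misses it. Computing the hash from immutable state, as in the ID-based approach quoted in the question, avoids this.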
