Overriding GetHashCode() and Equals() - c#

I am having trouble overriding the GetHashCode() method and the Equals() method.
public class Coordinate
{
int x;
int y;
public Coordinate(int p,int q)
{
this.x = p ;
this.y = q;
}
}
Suppose I create two Coordinate objects with the same x and y coordinates.
I want my program to understand that they are equal.
Coordinate point1 = new Coordinate(0,0);
Coordinate point2 = new Coordinate(0,0);
By default they return different values from GetHashCode(), as expected.
I want them to return the same hash code by overriding it, and then use that hash code as a key to retrieve values from a Dictionary. After searching, I learned that I also have to override Equals().

You have to override Equals(), because if two objects have the same hashcode, it doesn't mean they are to be considered equal. The hashcode simply acts as an "index" to speed up searches.
Every time you use new, an instance is created and it will not be the same instance as another instance. This is what ReferenceEquals() checks - imagine two identical bottles of soda - they're the same, but they're not the same bottle.
Equals() is meant to check whether you (the developer) want to consider two instances as equal, even though they are not the same instance.
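To make the bottle analogy concrete, here is a minimal sketch. The Bottle class is hypothetical and deliberately does not override Equals() or GetHashCode(), so it shows the default behaviour only.

```csharp
using System;

// Hypothetical class for illustration; it does not override Equals or GetHashCode.
public class Bottle
{
    public string Label;
    public Bottle(string label) { Label = label; }
}

public static class Program
{
    public static void Main()
    {
        Bottle a = new Bottle("soda");
        Bottle b = new Bottle("soda");   // identical contents, different instance
        Bottle c = a;                    // a second reference to the same instance

        Console.WriteLine(object.ReferenceEquals(a, b)); // False
        Console.WriteLine(object.ReferenceEquals(a, c)); // True
        Console.WriteLine(a.Equals(b));                  // False: default Equals compares references
    }
}
```

Until Equals() is overridden, the last line behaves exactly like ReferenceEquals().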

You can implement something in this vein:
public override bool Equals(object o) {
if (object.ReferenceEquals(o, this))
return true;
Coordinate other = o as Coordinate;
if (null == other)
return false;
return x == other.x && y == other.y;
}
public override int GetHashCode() {
return x.GetHashCode() ^ y.GetHashCode();
}
Here Equals returns true if and only if the instances are equal, while GetHashCode() gives a quick estimate: instances with different hash codes are never equal, but instances with the same hash code are not necessarily equal. GetHashCode() should also distribute the hashes as uniformly as possible, so that Dictionary and similar structures end up with roughly the same number of entries per bucket.
https://msdn.microsoft.com/en-us/library/336aedhh(v=vs.100).aspx
https://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.110).aspx
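As a sketch of how the two overrides work together in a Dictionary lookup, the following self-contained variant of Coordinate can be run as is. Multiplying by 397 before the XOR is an assumption of this example (any odd multiplier would do); it is used because a plain x ^ y gives (1, 2) and (2, 1) the same hash code, which is still correct but causes more collisions.

```csharp
using System;
using System.Collections.Generic;

public sealed class Coordinate
{
    private readonly int x;
    private readonly int y;

    public Coordinate(int x, int y) { this.x = x; this.y = y; }

    public override bool Equals(object obj)
    {
        Coordinate other = obj as Coordinate;
        return other != null && x == other.x && y == other.y;
    }

    public override int GetHashCode()
    {
        // Multiplying by a prime before the XOR keeps (1, 2) and (2, 1) apart;
        // unchecked lets the multiplication wrap instead of throwing.
        unchecked { return (x * 397) ^ y; }
    }
}

public static class Program
{
    public static void Main()
    {
        var names = new Dictionary<Coordinate, string>();
        names[new Coordinate(0, 0)] = "origin";

        // A different instance with the same values finds the same entry.
        Console.WriteLine(names[new Coordinate(0, 0)]);             // origin
        Console.WriteLine(names.ContainsKey(new Coordinate(1, 0))); // False
    }
}
```

Note that the Coordinate object itself is the key; the Dictionary calls GetHashCode() to pick a bucket and Equals() to confirm the match.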

I would override the named methods like this. For the GetHashCode method I took one of several options from this question, but you can choose another if you like.
I also changed the class to immutable. You should only use immutable properties/fields to calculate the hashcode.
public class Coordinate {
public Coordinate(int p, int q) {
x = p;
y = q;
}
private readonly int x;
private readonly int y;
public int X { get { return x; } }
public int Y { get { return y; } }
public override int GetHashCode() {
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ x.GetHashCode();
hash = (hash * 16777619) ^ y.GetHashCode();
return hash;
}
}
public override bool Equals(object obj) {
if (obj == null)
return false;
var otherCoordinate = obj as Coordinate;
if (otherCoordinate == null)
return false;
return
this.X == otherCoordinate.X &&
this.Y == otherCoordinate.Y;
}
}

Here's a simple way to do it.
First, override the ToString() method of your class to something like this:
public override string ToString()
{
return string.Format("[{0}, {1}]", this.x, this.y);
}
Now you can easily override GetHashCode() and Equals() like this:
public override int GetHashCode()
{
return this.ToString().GetHashCode();
}
public override bool Equals(object obj)
{
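// The type check rejects null and unrelated objects that happen to print the same text.
return obj is Coordinate && obj.ToString() == this.ToString();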
}
Now if you try this:
Coordinate p1 = new Coordinate(5, 0);
Coordinate p2 = new Coordinate(5, 0);
Console.WriteLine(p1.Equals(p2));
you'll get:
True

What you are trying to do is typical for immutable objects. If you don't want to use a struct, you can do it like this:
public class Coord : IEquatable<Coord>
{
public Coord(int x, int y)
{
this.X = x;
this.Y = y;
}
public int X { get; }
public int Y { get; }
public override int GetHashCode()
{
// Just pick two numbers that are coprime
int hash = 17;
hash = hash * 23 + this.X.GetHashCode();
hash = hash * 23 + this.Y.GetHashCode();
return hash;
}
public override bool Equals(object obj)
{
var casted = obj as Coord;
if (object.ReferenceEquals(this, casted))
{
return true;
}
return this.Equals(casted);
}
public static bool operator !=(Coord first, Coord second)
{
return !(first == second);
}
public static bool operator ==(Coord first, Coord second)
{
if (object.ReferenceEquals(second, null))
{
if (object.ReferenceEquals(first, null))
{
return true;
}
return false;
}
return first.Equals(second);
}
public bool Equals(Coord other)
{
if (object.ReferenceEquals(other, null))
{
return false;
}
return object.ReferenceEquals(this, other) || (this.X.Equals(other.X) && this.Y.Equals(other.Y));
}
}
Note: You really should make your class immutable if you implement custom equality, since mutating an object after it has been added to a hash-based collection can break your code.
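The following minimal sketch shows the kind of breakage the note warns about. MutablePoint is a hypothetical class whose hash code depends on a settable property.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable class whose hash code depends on a settable property.
public class MutablePoint
{
    public int X { get; set; }
    public MutablePoint(int x) { X = x; }

    public override bool Equals(object obj)
    {
        MutablePoint other = obj as MutablePoint;
        return other != null && X == other.X;
    }

    public override int GetHashCode() { return X; }
}

public static class Program
{
    public static void Main()
    {
        var set = new HashSet<MutablePoint>();
        var p = new MutablePoint(1);
        set.Add(p);
        Console.WriteLine(set.Contains(p)); // True

        p.X = 2; // the stored entry is still filed under the old hash code
        Console.WriteLine(set.Contains(p)); // False, although p is still in the set
        Console.WriteLine(set.Count);       // 1
    }
}
```

The object is still stored, but it can no longer be found by lookup, which is why read-only fields are the safer choice for anything that feeds GetHashCode().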
I think it is considered good practice to provide all of these overloads when you want custom equality checking like you do. This matters especially because, when GetHashCode() returns the same value for two objects, Dictionary and other hash-based collections fall back on the default equality comparer, which calls IEquatable<T>.Equals if the type implements it and object.Equals otherwise.
Object.ReferenceEquals(Ob,Ob) determines reference equality, i.e. whether both references point to the same allocated object; two equal references guarantee that it is the exact same object.
Object.Equals(Ob) is the virtual method on the object class; by default it compares references, just like Object.ReferenceEquals(Ob,Ob).
Object.Equals(Ob,Ob) calls Ob.Equals(Ob), so it is just a static shorthand that checks for null beforehand, IIRC.

Related

How to guarantee equal hash codes if all properties are "Equal"?

Is my current solution, (A, B, C, D, ...).GetHashCode(), guaranteed to always be the same for tuples with "Equal" items?
public class Pair
{
public int X { get; set; }
public int Y { get; set; }
public Pair(int x, int y)
{
X = x;
Y = y;
}
public override bool Equals(object other) => Equals(other as Pair);
public virtual bool Equals(Pair other)
{
if (other is null)
{
return false;
}
if (object.ReferenceEquals(this, other))
{
return true;
}
if (this.GetType() != other.GetType())
{
return false;
}
return X == other.X && Y == other.Y;
}
public override int GetHashCode() => (X, Y).GetHashCode();
public static bool operator ==(Pair lhs, Pair rhs)
{
if (lhs is null)
{
if (rhs is null)
{
return true;
}
return false;
}
return lhs.Equals(rhs);
}
public static bool operator !=(Pair lhs, Pair rhs) => !(lhs == rhs);
}
Is this code always guaranteed to print 1:
var uniquePairs = new HashSet<Pair>();
uniquePairs.Add(new Pair(2, 4));
uniquePairs.Add(new Pair(2, 4));
uniquePairs.Add(new Pair(2, 4));
uniquePairs.Add(new Pair(2, 4));
Console.WriteLine(uniquePairs.Count);
What about for a greater number of non-trivial type properties?
What are reliable GetHashCode solutions that can be used for classes like these, which guarantee equal hash codes if all (not-necessarily-int) members are the same?
People usually combine the hash codes of the fields with some arithmetic (multiplying by constant factors and adding or XOR-ing) to get a good pseudo-random distribution; the result is still the same whenever all fields are equal. Have a look at this:
General advice and guidelines on how to properly override object.GetHashCode()
Also, look at the Microsoft documentation if you want more information on the subject.
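One way to get that kind of arithmetic without writing it by hand is System.HashCode.Combine, which is available in .NET Core 2.1 and later (it is an assumption that your target framework has it). The Person class below is hypothetical and only illustrates mixing non-int members.

```csharp
using System;

// Hypothetical class with members of different types.
public sealed class Person
{
    public string Name { get; }
    public int Age { get; }
    public DateTime Joined { get; }

    public Person(string name, int age, DateTime joined)
    {
        Name = name;
        Age = age;
        Joined = joined;
    }

    public override bool Equals(object obj)
    {
        Person other = obj as Person;
        return other != null
            && Name == other.Name
            && Age == other.Age
            && Joined == other.Joined;
    }

    // Equal members in, equal hash code out; a null Name is handled by Combine.
    public override int GetHashCode()
    {
        return HashCode.Combine(Name, Age, Joined);
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new Person("Ann", 30, new DateTime(2020, 1, 1));
        var b = new Person("Ann", 30, new DateTime(2020, 1, 1));

        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```

The actual hash values may differ from one program run to the next, so they should not be persisted; within a single run, equal members always produce equal hash codes.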
If your goal is simply to have classes whose equality is determined by fields matching rather than by reference, C# has a new record reference type you can use which does this by default. If you're using the latest version of C#/.NET this would be the way to go.
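As a rough sketch of the record approach (this assumes C# 9 or later), the whole Pair class from the question shrinks to one line, and the compiler generates Equals, GetHashCode, == and != from the members:

```csharp
using System;
using System.Collections.Generic;

// Positional record: value-based equality is generated by the compiler.
public record Pair(int X, int Y);

public static class Program
{
    public static void Main()
    {
        var uniquePairs = new HashSet<Pair>();
        uniquePairs.Add(new Pair(2, 4));
        uniquePairs.Add(new Pair(2, 4));

        Console.WriteLine(uniquePairs.Count);                // 1
        Console.WriteLine(new Pair(2, 4) == new Pair(2, 4)); // True
    }
}
```

Positional record properties are init-only, so the hash code cannot change after construction.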
If you get into anything really complicated that has to be secure, consider using a robust hash algorithm like SHA-256: take all your fields, turn them into a padded buffer of bytes, and run them through SHA-256 (all of this is found in System.Security.Cryptography). You then take the SHA-256 output and select 4 bytes of it to produce a 32-bit integer. Collision is very, very unlikely (but of course it's still possible with only 32 bits).
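A minimal sketch of that idea follows. The StableHash helper, its two parameters, and the length-prefixed buffer layout are all assumptions made for this example; real code would serialize whatever fields the class actually has.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // Hypothetical helper: serializes the fields to bytes, hashes them with SHA-256,
    // and keeps the first four bytes of the digest as a 32-bit integer.
    public static int Of(string name, int value)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        byte[] buffer = new byte[4 + nameBytes.Length + 4];

        // A length prefix keeps ("ab", ...) and ("a", ...) from producing overlapping layouts.
        BitConverter.GetBytes(nameBytes.Length).CopyTo(buffer, 0);
        nameBytes.CopyTo(buffer, 4);
        BitConverter.GetBytes(value).CopyTo(buffer, 4 + nameBytes.Length);

        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(buffer);
            return BitConverter.ToInt32(digest, 0);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(StableHash.Of("abc", 42) == StableHash.Of("abc", 42)); // True
    }
}
```

This is far slower than ordinary hash arithmetic, so it only makes sense when the properties of a cryptographic hash are actually needed.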

Remove duplicates from a list containing a generic class

I'm trying to remove duplicates from a list containing a generic class. The generic class looks like this (stripped back example):
public class Point2D<T>
{
public T x;
public T y;
public Point2D(T x, T y)
{
this.x = x;
this.y = y;
}
}
and I've created the list like this:
List<Point2D<int>> pointList = new List<Point2D<int>>();
pointList.Add(new Point2D<int>(1,1));
pointList.Add(new Point2D<int>(1,2));
pointList.Add(new Point2D<int>(1,1));
pointList.Add(new Point2D<int>(1,3));
I tried removing the duplicates by:
pointList = pointList.Distinct().ToList();
expecting that pointList would only contain: (1,1), (1,2), (1,3) but it still contains all four points that were entered. I suspect I need my own equals or comparator method in Point2D, but I don't know if this is the case, or how they should be written (unless of course I'm just missing something simple).
To do this, you need to override the Equals method:
public class Point2D<T>
{
public readonly T x;
public readonly T y;
public Point2D(T x, T y)
{
this.x = x;
this.y = y;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Point2D<T>) obj);
}
protected bool Equals(Point2D<T> other)
{
return EqualityComparer<T>.Default.Equals(x, other.x) && EqualityComparer<T>.Default.Equals(y, other.y);
}
public override int GetHashCode()
{
unchecked
{
return (EqualityComparer<T>.Default.GetHashCode(x)*397) ^ EqualityComparer<T>.Default.GetHashCode(y);
}
}
}
You also need to override GetHashCode. To do it correctly, you must make x and y readonly fields.
You can use an anonymous object. However, this creates new Point2D instances, so use it only when you do not need the previous references.
pointList = pointList.Select(x => new {x.x,x.y}).Distinct().Select(x => new Point2D<int>(x.x, x.y)).ToList();
You will need to implement the IEquatable<T> interface for this custom class. Check this link for more details and a sample:
https://msdn.microsoft.com/en-us/library/vstudio/bb348436(v=vs.100).aspx
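I would suggest overloading the == operator as well, for consistency. Note, though, that Distinct() relies on Equals and GetHashCode, not on ==, so the operator alone will not remove the duplicates.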
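If you would rather not change Point2D<T> at all, another option is to pass an IEqualityComparer to Distinct(). This is a sketch only; Point2DComparer is a hypothetical name, and the hashing mirrors the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Point2D<T>
{
    public T x;
    public T y;
    public Point2D(T x, T y) { this.x = x; this.y = y; }
}

// Hypothetical comparer: leaves Point2D<T> untouched and supplies equality from outside.
public class Point2DComparer<T> : IEqualityComparer<Point2D<T>>
{
    public bool Equals(Point2D<T> a, Point2D<T> b)
    {
        if (object.ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return EqualityComparer<T>.Default.Equals(a.x, b.x)
            && EqualityComparer<T>.Default.Equals(a.y, b.y);
    }

    public int GetHashCode(Point2D<T> p)
    {
        if (p == null) return 0;
        unchecked
        {
            return (EqualityComparer<T>.Default.GetHashCode(p.x) * 397)
                 ^ EqualityComparer<T>.Default.GetHashCode(p.y);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var pointList = new List<Point2D<int>>
        {
            new Point2D<int>(1, 1),
            new Point2D<int>(1, 2),
            new Point2D<int>(1, 1),
            new Point2D<int>(1, 3)
        };

        pointList = pointList.Distinct(new Point2DComparer<int>()).ToList();
        Console.WriteLine(pointList.Count); // 3
    }
}
```

Distinct() keeps the first occurrence of each value, so the original references for (1,1), (1,2) and (1,3) are preserved.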

I'm implementing a CaseAccentInsensitiveEqualityComparer for Strings. I'm not sure how to implement the GetHashCode

My code is like this:
public class CaseAccentInsensitiveEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return string.Compare(x, y, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0;
}
public int GetHashCode(string obj)
{
// not sure what to put here
}
}
I know the role of GetHashCode in this context; what I'm missing is how to produce the InvariantCulture, IgnoreNonSpace and IgnoreCase version of obj so that I can return its hash code.
I could remove the diacritics and the case from obj myself and then return its hash code, but I wonder if there's a better alternative.
Returning 0 inside GetHashCode() works (as pointed out by @Michael Perrenoud) because Dictionaries and HashSets call Equals() only if GetHashCode() returns the same value for two objects.
The rule is, GetHashCode() must return the same value if objects are equal.
The drawback is that the HashSet (or Dictionary) performance decreases to the point it becomes the same as using a List. To find an item it has to call Equals() for each comparison.
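To see that cost, one can count how often Equals() is called while filling a HashSet. The sketch below uses a hypothetical CountingComparer whose hash function is injected, so the same code can be run with a constant hash and with the real string hash.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer that counts Equals calls; the hash function is supplied by the caller.
public class CountingComparer : IEqualityComparer<string>
{
    private readonly Func<string, int> hash;
    public int EqualsCalls;

    public CountingComparer(Func<string, int> hash) { this.hash = hash; }

    public bool Equals(string x, string y) { EqualsCalls++; return x == y; }
    public int GetHashCode(string s) { return hash(s); }
}

public static class Program
{
    // Adds n distinct strings to a HashSet and reports how many Equals calls that took.
    public static int CountEqualsCalls(Func<string, int> hash, int n)
    {
        var comparer = new CountingComparer(hash);
        var set = new HashSet<string>(comparer);
        for (int i = 0; i < n; i++)
        {
            set.Add("item" + i);
        }
        return comparer.EqualsCalls;
    }

    public static void Main()
    {
        int constant = CountEqualsCalls(s => 0, 1000);
        int real = CountEqualsCalls(s => s.GetHashCode(), 1000);

        Console.WriteLine("constant hash: " + constant + " Equals calls");
        Console.WriteLine("real hash:     " + real + " Equals calls");
    }
}
```

With a constant hash every new item has to be compared against everything already stored, so the first number grows roughly quadratically with n, while the second stays small.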
A faster approach would be to convert the string to an accent-insensitive form and get its hash code.
Code to remove accents (diacritics), from this post:
static string RemoveDiacritics(string text)
{
return string.Concat(
text.Normalize(NormalizationForm.FormD)
.Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
UnicodeCategory.NonSpacingMark)
).Normalize(NormalizationForm.FormC);
}
Comparer code:
public class CaseAccentInsensitiveEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return string.Compare(x, y, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0;
}
public int GetHashCode(string obj)
{
return obj != null ? RemoveDiacritics(obj).ToUpperInvariant().GetHashCode() : 0;
}
private string RemoveDiacritics(string text)
{
return string.Concat(
text.Normalize(NormalizationForm.FormD)
.Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
UnicodeCategory.NonSpacingMark)
).Normalize(NormalizationForm.FormC);
}
}
Ah, excuse me, I had my methods mixed up. When I implemented something like this before, I just returned the hash code of the object itself (return obj.GetHashCode();) so that it would always enter the Equals method.
Okay, after much confusion I believe I've got myself straight. I found that returning zero, always, will force the comparer to use the Equals method. I'm looking for the code I implemented this in to prove that and put it up here.
Here's the code to prove it.
class MyArrayComparer : EqualityComparer<object[]>
{
public override bool Equals(object[] x, object[] y)
{
if (x.Length != y.Length) { return false; }
for (int i = 0; i < x.Length; i++)
{
if (!x[i].Equals(y[i]))
{
return false;
}
}
return true;
}
public override int GetHashCode(object[] obj)
{
return 0;
}
}

Override Equals and GetHashCode in class with one field

I have a class:
public abstract class AbstractDictionaryObject
{
public virtual int LangId { get; set; }
public override bool Equals(object obj)
{
if (obj == null || obj.GetType() != GetType())
{
return false;
}
AbstractDictionaryObject other = (AbstractDictionaryObject)obj;
if (other.LangId != LangId)
{
return false;
}
return true;
}
public override int GetHashCode()
{
int hashCode = 0;
hashCode = 19 * hashCode + LangId.GetHashCode();
return hashCode;
}
}
And I have derived classes:
public class Derived1:AbstractDictionaryObject
{...}
public class Derived2:AbstractDictionaryObject
{...}
AbstractDictionaryObject has only one common field: LangId.
I think this is not enough to override the methods properly.
How can I identify objects?
For one thing you can simplify both your methods:
public override bool Equals(object obj)
{
if (obj == null || obj.GetType() != GetType())
{
return false;
}
AbstractDictionaryObject other = (AbstractDictionaryObject)obj;
return other.LangId == LangId;
}
public override int GetHashCode()
{
return LangId;
}
But at that point it should be fine. If the two derived classes have other fields, they should override GetHashCode and Equals themselves, first calling base.Equals or base.GetHashCode and then applying their own logic.
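A sketch of what such an override in a derived class might look like is shown below. The Name property is an assumption made for illustration, since the question does not show the derived classes' members.

```csharp
using System;

public abstract class AbstractDictionaryObject
{
    public virtual int LangId { get; set; }

    public override bool Equals(object obj)
    {
        if (obj == null || obj.GetType() != GetType())
        {
            return false;
        }
        return ((AbstractDictionaryObject)obj).LangId == LangId;
    }

    public override int GetHashCode()
    {
        return LangId;
    }
}

// Hypothetical derived class; the Name property is an assumption for illustration.
public class Derived1 : AbstractDictionaryObject
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        // base.Equals has already checked null, the runtime type, and LangId,
        // so the cast below is safe whenever it returned true.
        return base.Equals(obj) && Name == ((Derived1)obj).Name;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return base.GetHashCode() * 31 + (Name != null ? Name.GetHashCode() : 0);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new Derived1 { LangId = 1, Name = "en" };
        var b = new Derived1 { LangId = 1, Name = "en" };
        var c = new Derived1 { LangId = 1, Name = "de" };

        Console.WriteLine(a.Equals(b)); // True
        Console.WriteLine(a.Equals(c)); // False
    }
}
```

Because the properties have setters, instances should not be modified while they are stored in a hash-based collection.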
Two instances of Derived1 with the same LangId will be equivalent as far as AbstractDictionaryObject is concerned, and so will two instances of Derived2 - but they will be different from each other as they have different types.
If you wanted to give them different hash codes you could change GetHashCode() to:
public override int GetHashCode()
{
int hash = 17;
hash = hash * 31 + GetType().GetHashCode();
hash = hash * 31 + LangId;
return hash;
}
However, hash codes for different objects don't have to be different... it just helps in performance. You may want to do this if you know you will have instances of different types with the same LangId, but otherwise I wouldn't bother.

How can I use a HashSet<MyCustomClass> to remove duplicates of MyCustomClass?

I have a HashSet<MyCustomClass> mySet = new HashSet<MyCustomClass>(); and I wish to remove all MyCustomClass's that contain the same values.
Let's say MyCustomClass looks like this:
public class MyCustomClass
{
Point point;
public MyCustomClass(int x, int y)
{
point.X = x;
point.Y = y;
}
// Other methods...
}
I tried to implement IEqualityComparer as MSDN suggests and pass it to the constructor of the HashSet<MyCustomClass>, but I was unsuccessful.
What's the correct approach?
EDIT:
This is my Chain class and my ChainEqualityComparer:
public class Chain
{
HashSet<Mark> chain;
HashSet<Mark> marks;
public Chain(HashSet<Mark> marks)
{
chain = new HashSet<Mark>();
this.marks = marks;
}
// Other methods...
}
public class ChainEqualityComparer : IEqualityComparer<Chain>
{
#region IEqualityComparer<Chain> Members
public bool Equals(Chain x, Chain y)
{
if (x.ChainWithMarks.Count == y.ChainWithMarks.Count)
{
foreach (Mark mark in x.ChainWithMarks)
{
if (!y.ChainWithMarks.Contains(mark))
return false;
}
return true;
}
return false;
}
public int GetHashCode(Chain obj)
{
return obj.GetHashCode() ^ obj.GetType().GetHashCode();
}
#endregion
}
And this is my Mark class:
public class Mark
{
int x;
int y;
public Mark(int x, int y)
{
this.x = x;
this.y = y;
}
public int X
{
get { return x; }
set { x = value; }
}
public int Y
{
get { return y; }
set { y = value; }
}
}
public class MarkEqualityComparer : IEqualityComparer<Mark>
{
#region IEqualityComparer<Mark> Members
public bool Equals(Mark x, Mark y)
{
return (x.X == y.X) && (x.Y == y.Y);
}
public int GetHashCode(Mark obj)
{
return obj.GetHashCode() ^ obj.GetType().GetHashCode();
}
#endregion
}
(I can pastebin the code if it's too much code...)
You can use the EqualityComparer or just override Equals and GetHashCode.
You must make sure that whatever you consider to be a duplicate is identified as having an equivalent hash code, and returning true when tested for equality.
My guess is that you weren't returning equal hash codes. Could you post the code from your equality comparer?
As a test, you could do:
var set = new HashSet<MyCustomClass>();
var a = new MyCustomClass(1,2);
var b = new MyCustomClass(1,2);
set.Add(a);
set.Add(b);
Assert.IsTrue(a.Equals(b));
Assert.IsTrue(b.Equals(a));
Assert.AreEqual(a.GetHashCode(), b.GetHashCode());
Assert.AreEqual(1, set.Count);
A similar set of tests would be applicable to an equality comparer too.
EDIT
Yep, as suspected it's the hash code function. You need to calculate it based on the values of the type itself. A common enough mistake.
public int GetHashCode(Mark obj)
{
// Derive the hash from the values, not from the object's identity.
return unchecked((obj.X * 397) ^ obj.Y);
}
That assumes X and Y are the only state in your type.
I think you are getting tripped up because two Mark instances with the same values won't be considered equal inside your ChainEqualityComparer class. It doesn't appear that MarkEqualityComparer is ever used.
The call in this line:
if (!y.ChainWithMarks.Contains(mark))
will always return false, so the condition is always true, unless you override Equals and GetHashCode on the Mark class. (The exception is when both Chain x and Chain y hold a reference to the same Mark instance, which I'm presuming is not what you want.)
If y.ChainWithMarks is a HashSet and you want to use MarkEqualityComparer, make sure you create that HashSet with the correct constructor including an instance of MarkEqualityComparer.
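The difference the constructor makes can be sketched as follows. The Mark class is reduced to read-only properties here, and the comparer's GetHashCode is computed from the values, which is an assumption about how you would fix it.

```csharp
using System;
using System.Collections.Generic;

public class Mark
{
    public int X { get; private set; }
    public int Y { get; private set; }
    public Mark(int x, int y) { X = x; Y = y; }
}

public class MarkEqualityComparer : IEqualityComparer<Mark>
{
    public bool Equals(Mark a, Mark b)
    {
        if (object.ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.X == b.X && a.Y == b.Y;
    }

    public int GetHashCode(Mark m)
    {
        // Value-based hash: equal marks always land in the same bucket.
        return m == null ? 0 : unchecked((m.X * 397) ^ m.Y);
    }
}

public static class Program
{
    public static void Main()
    {
        // Without a comparer the set falls back to reference equality.
        var plain = new HashSet<Mark>();
        plain.Add(new Mark(1, 2));
        Console.WriteLine(plain.Contains(new Mark(1, 2))); // False

        // With the comparer passed to the constructor, values are compared.
        var byValue = new HashSet<Mark>(new MarkEqualityComparer());
        byValue.Add(new Mark(1, 2));
        Console.WriteLine(byValue.Contains(new Mark(1, 2))); // True
    }
}
```

The comparer is fixed when the set is created, so every HashSet<Mark> inside Chain would need to be constructed this way.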
Since Mark conceptually represents a value, you might consider using a struct for it, since the .NET runtime then uses value equality instead of referential equality when comparing. I think this is actually the most correct implementation of your idea.
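A minimal sketch of the struct variant is shown below. MarkValue is a hypothetical name, chosen so it does not clash with the Mark class from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct version of Mark; no Equals or GetHashCode override is written.
public struct MarkValue
{
    public readonly int X;
    public readonly int Y;

    public MarkValue(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class Program
{
    public static void Main()
    {
        // ValueType.Equals compares the fields, so equal values are equal marks.
        Console.WriteLine(new MarkValue(1, 2).Equals(new MarkValue(1, 2))); // True

        var marks = new HashSet<MarkValue>();
        marks.Add(new MarkValue(1, 2));
        marks.Add(new MarkValue(1, 2));
        Console.WriteLine(marks.Count); // 1
    }
}
```

The built-in struct equality is convenient but not always fast; for heavy use in collections it is common to implement IEquatable<T> and override GetHashCode() on the struct as well.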
