What would be the best way to override GetHashCode when my objects are
considered equal if at least ONE of their fields matches?
For a typed Equals method, the example might look like this:
public bool Equals(Whatever other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
// Considering that the values can't be 'null' here.
return other.Id.Equals(Id) || Equals(other.Money, Money) ||
Equals(other.Code, Code);
}
Still, I'm confused about making a good GetHashCode implementation for this case.
How should this be done?
Thank you.
This is a terrible definition of Equals because it is not transitive.
Consider
x = { Id = 1, Money = 0.1, Code = "X" }
y = { Id = 1, Money = 0.2, Code = "Y" }
z = { Id = 3, Money = 0.2, Code = "Z" }
Then x == y and y == z but x != z.
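The counterexample above can be checked directly. The following is a minimal sketch, assuming a stand-in class whose property types (int, decimal, string) are guesses, since the question does not show them; the Equals body is the asker's rule.

```csharp
using System;

// Hypothetical stand-in for the asker's type; the property types are assumptions.
public sealed class Whatever
{
    public int Id { get; set; }
    public decimal Money { get; set; }
    public string Code { get; set; }

    // Same "any field matches" rule as in the question.
    public bool Equals(Whatever other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return other.Id.Equals(Id) || Equals(other.Money, Money) ||
               Equals(other.Code, Code);
    }
}

public static class TransitivityDemo
{
    public static void Main()
    {
        var x = new Whatever { Id = 1, Money = 0.1m, Code = "X" };
        var y = new Whatever { Id = 1, Money = 0.2m, Code = "Y" };
        var z = new Whatever { Id = 3, Money = 0.2m, Code = "Z" };

        Console.WriteLine(x.Equals(y)); // True: the Ids match
        Console.WriteLine(y.Equals(z)); // True: the Money values match
        Console.WriteLine(x.Equals(z)); // False: no field matches
    }
}
```

Any hash-based collection assumes that the last line cannot print False when the first two print True.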
Additionally, we can establish that the only reasonable implementation of GetHashCode is a constant map.
Suppose that x and y are distinct objects. Let z be the object
z = { Id = x.Id, Money = y.Money, Code = "Z" }
Then x == z and y == z so that x.GetHashCode() == z.GetHashCode() and y.GetHashCode() == z.GetHashCode() establishing that x.GetHashCode() == y.GetHashCode(). Since x and y were arbitrary we have established that GetHashCode is constant.
Thus, we have shown that the only possible implementation of GetHashCode is
private readonly int constant = 17;
public override int GetHashCode() {
return constant;
}
All of this put together makes it clear that you need to rethink the concept you are trying to model, and come up with a different definition of Equals.
I don't think you should be using Equals for this. People have a very explicit notion of what equals means, and if the Ids are different but the Code or Money are the same, I would not consider those "Equal". Maybe you need a different method like "IsCompatible".
If you want to be able to group them, you could use the extension method ToLookup() on a list of these objects, to use a predicate which would be your IsCompatible method. Then they would be grouped.
The golden rule is: if the objects compare equal, they must produce the same hash code.
Therefore a conforming (but let's say, undesirable) implementation would be
public override int GetHashCode()
{
return 0;
}
Frankly, if Id, Money and Code are independent of each other then I don't know if you can do any better. Putting objects of this type in a hash table is going to be painful.
Consider the following class
public class X
{
//Unique per set / never null
public ulong A { get; set; }
//Unique per set / never null
public string B { get; set; }
//Combination of C and D is Unique per set / both never null
public string C { get; set; }
public string D { get; set; }
public override bool Equals(object obj)
{
var x = (X)obj;
if (A == x.A || B==x.B)
return true;
if (C+D==x.C+x.D)
return true;
return false;
}
public override int GetHashCode()
{
return 0;
}
}
I can't think of a hash function that honours the combination of constraints in the comments above the properties, the way the Equals function does. In that case, is returning 0 from GetHashCode my best bet, or am I missing something?
This is not practically possible; it is a fundamental problem. Strictly speaking it is possible, but it is a VERY hard problem to solve.
Explanation
Just think about it in reverse: in which cases are your objects NOT equal? From the code I can see that they are equal according to this expression:
return A == x.A || B==x.B || (C+D)==(x.C+x.D)
And they are not equal according to this one:
return A!=x.A && B!=x.B && (C+D)!=(x.C+x.D)
So your hash has to be the same for any values that satisfy the equality expression, and the values themselves can vary without bound.
The only real solution that satisfies both expressions is a constant value. But this solution is not optimal for performance, because it defeats the whole purpose of overriding GetHashCode.
Consider using the IEqualityComparer interface and equality algorithms suited to the task you are solving.
I think the best way to find equal objects is indexing. Look, for example, at how databases are built and how they use bit-indexing.
Why are hashes so cruel?
If it were possible, every database in the world would simply hash everything into a single hash table, and all problems with fast access would be solved.
For example, imagine your object not as an object with properties but as an entire object state (for example, 32 boolean properties can be represented as an integer).
The hash function calculates the hash from this state, but in your case you explicitly state that some states in its space are actually equal:
class X
{
bool A;
bool B;
}
Your space is:
A B
false false -> 0
false true -> 1
true false -> 2
true true -> 3
If you define equality like this:
bool Equal(X x) { return x.A == A || x.B == B; }
You basically define this state equality:
0 == 0
0 == 1
0 == 2
0 != 3
1 == 0
1 == 1
1 != 2
1 == 3
2 == 0
2 != 1
2 == 2
2 == 3
3 != 0
3 == 1
3 == 2
3 == 3
Each of these sets must share one hash: {0,1,2} {0,1,3} {0,2,3} {1,2,3}
Because the sets overlap, all of the states must have the same hash. It follows that no hash function better than a constant value is possible.
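The argument above can be checked mechanically for the two-boolean example. The sketch below (all names are hypothetical) encodes each state as A*2 + B, as in the table, and merges any two states that the equality declares equal until nothing changes; every state ends up in the same group, so a consistent hash has only one value to hand out.

```csharp
using System;
using System.Linq;

public static class ForcedHashClasses
{
    // The two-boolean equality from the text: equal if A matches or B matches.
    public static bool StatesEqual(int s, int t)
    {
        bool sameA = (s >> 1) == (t >> 1); // high bit encodes A
        bool sameB = (s & 1) == (t & 1);   // low bit encodes B
        return sameA || sameB;
    }

    // Assigns a group id to states 0..3 so that "equal" states share an id.
    public static int[] GroupStates()
    {
        int[] group = { 0, 1, 2, 3 };
        bool changed = true;
        while (changed)
        {
            changed = false;
            for (int s = 0; s < 4; s++)
                for (int t = 0; t < 4; t++)
                    if (StatesEqual(s, t) && group[s] != group[t])
                    {
                        // Merge the two groups by adopting the smaller id.
                        int merged = Math.Min(group[s], group[t]);
                        group[s] = merged;
                        group[t] = merged;
                        changed = true;
                    }
        }
        return group;
    }

    public static void Main()
    {
        int[] group = GroupStates();
        Console.WriteLine(string.Join(",", group));  // 0,0,0,0
        Console.WriteLine(group.Distinct().Count()); // 1
    }
}
```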
In this case, I would say that the hash code that defines an object as unique (i.e. overriding GetHashCode) shouldn't be the one used for your specific HashSet.
In other words, you should consider two instances of your class equal if their properties are all equal (not if any of the properties match). But then, if you want to group them by a certain criteria, use a specific implementation of IEqualityComparer<X>.
Also, strongly consider making the class immutable.
Apart from that, the only hash code I believe will really work is a constant. Anything trying to be smarter than that will fail:
// if any of the properties match, consider the class equal
public class AnyPropertyEqualityComparer : IEqualityComparer<X>
{
public bool Equals(X x, X y)
{
if (object.ReferenceEquals(x, y))
return true;
if (object.ReferenceEquals(y, null) ||
object.ReferenceEquals(x, null))
return false;
return (x.A == y.A ||
x.B == y.B ||
(x.C + x.D) == (y.C + y.D));
}
public int GetHashCode(X x)
{
return 42;
}
}
Since you will have to evaluate all properties in any case, a HashSet will not help much here and you might as well use a plain List<T> (inserting a list of items into such a "hashset" degrades to O(n*n) anyway).
You could consider creating an anonymous type and then returning the hashcode from that:
public override int GetHashCode()
{
// Anonymous types derive their hash code from all of their members.
// A computed member needs an explicit name (CD here) to compile.
return new { A, B, CD = C + D }.GetHashCode();
}
Make sure you create some automated tests to verify that objects with the same values return the same hashcode.
Bear in mind that once the hashcode is given out, you must continue to return that code and not a new one.
I have a class that is similar to this:
public class Int16_2D
{
public Int16 a, b;
public override bool Equals(Object other)
{
return other is Int16_2D &&
a == ((Int16_2D)other).a &&
b == ((Int16_2D)other).b;
}
}
This works in HashSet<Int16_2D>. However in Dictionary<Int16_2D, myType>, .ContainsKey returns false when it shouldn't. Am I missing something in my implementation of ==?
For a class to work in a hash table or dictionary, you need to implement GetHashCode()! I have no idea why it's working in HashSet; I would guess it was just luck.
Note that it's dangerous to use mutable fields for calculating Equals or GetHashCode(). Why? Consider this:
var x = new Int16_2D { a = 1, b = 2 };
var set = new HashSet<Int16_2D> { x };
var y = new Int16_2D { a = 1, b = 2 };
Console.WriteLine(set.Contains(y)); // True
x.a = 3;
Console.WriteLine(set.Contains(y)); // False
Console.WriteLine(set.Contains(x)); // Also false!
In other words, when you set x.a = 3; you're changing x's hash code. But x's location in the hash table is based on its old hash code, so x is basically lost now. See this in action at http://ideone.com/QQw08
Also, as svick notes, implementing Equals does not implement ==. If you don't implement ==, the == operator will provide a reference comparison, so:
var x = new Int16_2D { a = 1, b = 2 };
var y = new Int16_2D { a = 1, b = 2 };
Console.WriteLine(x.Equals(y)); //True
Console.WriteLine(x == y); //False
In conclusion, you're better off making this an immutable type; since it's only 4 bytes long, I'd probably make it an immutable struct.
You need to override GetHashCode(). The fact that it works with HashSet<T> is probably just a lucky coincidence.
Both collections use the hash code obtained from GetHashCode to find a bucket (i.e. a list of objects) where the object should be placed. They then search that bucket for the object and use Equals to confirm equality. This is what gives Dictionary and HashSet their nice fast lookup properties. However, it also means that if GetHashCode is not overridden to correspond to the type's Equals method, you will not be able to find such an object in one of the collections.
You should almost always implement both GetHashCode and Equals, or neither of them.
You need to override GetHashCode as well for the dictionary to work.
You have to override GetHashCode() as well - this goes hand in hand with overriding Equals. Dictionary is using GetHashCode() to determine what bin a value would fall into - only if a suitable item is found in that bin it checks on actual equality of the items.
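Putting the advice from these answers together, a minimal sketch of an immutable version might look like the following. The name Point16 and its members are hypothetical; the point is only that Equals, GetHashCode and == are all defined over the same two fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable replacement for Int16_2D.
public readonly struct Point16 : IEquatable<Point16>
{
    public short A { get; }
    public short B { get; }

    public Point16(short a, short b) { A = a; B = b; }

    public bool Equals(Point16 other) => A == other.A && B == other.B;

    public override bool Equals(object obj) => obj is Point16 other && Equals(other);

    // Two 16-bit fields fit into one 32-bit hash code without collisions.
    public override int GetHashCode() => unchecked(((ushort)A << 16) | (ushort)B);

    public static bool operator ==(Point16 left, Point16 right) => left.Equals(right);
    public static bool operator !=(Point16 left, Point16 right) => !left.Equals(right);
}

public static class Point16Demo
{
    public static void Main()
    {
        var map = new Dictionary<Point16, string> { [new Point16(1, 2)] = "found" };

        Console.WriteLine(map.ContainsKey(new Point16(1, 2)));     // True
        Console.WriteLine(new Point16(1, 2) == new Point16(1, 2)); // True
        Console.WriteLine(map.ContainsKey(new Point16(2, 1)));     // False
    }
}
```

Because the struct is read-only, a key cannot change its hash code after it has been placed in a bucket.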
Referring to the question that I previously asked:
Compare two lists that contain a lot of objects
It is impressive to see how fast that comparison is made by implementing the IEqualityComparer interface: example here
As I mentioned in my other question, this comparison helps me back up a source folder to a destination folder. Now I want to sync two folders, so I need to compare the dates of the files. Whenever I do something like:
public class MyFileComparer2 : IEqualityComparer<MyFile>
{
public bool Equals(MyFile s, MyFile d)
{
return
s.compareName.Equals(d.compareName) &&
s.size == d.size &&
s.deepness == d.deepness &&
s.dateModified.Date <= d.dateModified.Date; // This line does not work.
// I also tried comparing the strings by converting it to a string and it does
// not work. It does not give me an error but it does not seem to include the files
// where s.dateModified.Date < d.dateModified.Date
}
public int GetHashCode(MyFile a)
{
int rt = (a.compareName.GetHashCode() * 251 + a.size.GetHashCode() * 251 + a.deepness.GetHashCode() + a.dateModified.Date.GetHashCode());
return rt;
}
}
It would be nice if I could do something similar using greater-than-or-equal signs. I also tried using the Ticks property and it does not work. Maybe I am doing something wrong. I believe it is not possible to compare things with the less-than-or-equal sign when implementing this interface. Moreover, I don't understand how this class works; I just know it is impressive how fast it iterates through the whole list.
Your whole approach is fundamentally flawed because your IEqualityComparer.Equals method is not symmetric. This means Equals(file1, file2) does not equal Equals(file2, file1) because of the way you are using the less-than-or-equal operator.
The documentation:
IEqualityComparer.Equals Method
clearly states:
Notes to Implementers
The Equals method is reflexive, symmetric, and transitive. That is, it returns true if used to compare an object with itself; true for two objects x and y if it is true for y and x; and true for two objects x and z if it is true for x and y and also true for y and z.
Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.
Instead you need to use the IComparable interface, or IEqualityComparer in combination with separate date comparisons. If you do not, things might seem to work for a while but you will regret it later.
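One way to read that advice is to keep the comparer symmetric (it only answers "is this the same file?") and to do the date comparison as an ordinary ordering test afterwards. The following is a rough sketch under assumptions: the MyFile shape, the choice of name plus depth as identity, and the SameFileComparer and SyncPlanner names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the asker's MyFile; only the members used here are modelled.
public sealed class MyFile
{
    public string compareName;
    public int deepness;
    public DateTime dateModified;
}

// Symmetric identity: the same file in both folders, ignoring the timestamp.
public sealed class SameFileComparer : IEqualityComparer<MyFile>
{
    public bool Equals(MyFile s, MyFile d)
    {
        if (ReferenceEquals(s, d)) return true;
        if (s is null || d is null) return false;
        return s.compareName == d.compareName && s.deepness == d.deepness;
    }

    // Uses exactly the fields that Equals compares.
    public int GetHashCode(MyFile f)
    {
        unchecked { return f.compareName.GetHashCode() * 251 + f.deepness; }
    }
}

public static class SyncPlanner
{
    // Returns source files that are missing from, or newer than, the destination.
    // Assumes the destination holds no two files with the same identity.
    public static List<MyFile> FilesToCopy(IEnumerable<MyFile> source,
                                           IEnumerable<MyFile> destination)
    {
        var byIdentity = destination.ToDictionary(d => d, d => d, new SameFileComparer());
        return source
            .Where(s => !byIdentity.TryGetValue(s, out var d)
                        || s.dateModified > d.dateModified)
            .ToList();
    }
}
```

The hash lookup still does the fast matching; the ordering test runs once per matched pair, so the comparer never has to answer an asymmetric question.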
Since the DateTime objects are different in the case when one DateTime is less than the other, you get different hashcodes for the objects s and d and the Equals method is not called. In order for your comparison of the dates to work, you should remove the date part from the GetHashCode method:
public int GetHashCode(MyFile a)
{
int rt = ((a.compareName.GetHashCode() * 251 + a.size.GetHashCode())
* 251 + a.deepness.GetHashCode()) *251;
return rt;
}
Your GetHashCode has a problem:
public int GetHashCode(MyFile a)
{
int rt = (((a.compareName.GetHashCode() * 251)
+ a.size.GetHashCode() * 251)
+ a.deepness.GetHashCode() *251)
+ a.dateModified.Date.GetHashCode();
return rt;
}
I changed the date part because I also needed the time, so I use the Ticks property instead. I got rid of the dateModified hash code and it works great. Here is how I modified my program:
public class MyFileComparer2 : IEqualityComparer<MyFile>
{
public bool Equals(MyFile s, MyFile d)
{
return
s.compareName.Equals(d.compareName) &&
s.size == d.size &&
s.deepness == d.deepness &&
//s.dateModified.Date <= d.dateModified.Date &&
s.dateModified.Ticks >= d.dateModified.Ticks
;
}
public int GetHashCode(MyFile a)
{
int rt = (((a.compareName.GetHashCode() * 251)
+ a.size.GetHashCode() * 251)
+ a.deepness.GetHashCode() * 251)
//+ a.dateModified.Ticks.GetHashCode();
;
return rt;
}
}
I still don't know how this hash code function works. The nice thing is that it works great.
Take a look at this class:
public class MemorialPoint:IMemorialPoint,IEqualityComparer<MemorialPoint>
{
private string _PointName;
private IPoint _PointLocation;
private MemorialPointType _PointType;
private DateTime _PointStartTime;
private DateTime _PointFinishTime;
private string _NeighborName;
private double _Rms;
private double _PointPdop;
private double _PointHdop;
private double _PointVdop;
// getters and setters omitted
public bool Equals(MemorialPoint x, MemorialPoint y)
{
if (x.PointName == y.PointName)
return true;
else if (x.PointName == y.PointName && x.PointLocation.X == y.PointLocation.X && x.PointLocation.Y == y.PointLocation.Y)
return true;
else
return false;
}
public int GetHashCode(MemorialPoint obj)
{
return (obj.PointLocation.X.ToString() + obj.PointLocation.Y.ToString() + obj.PointName).GetHashCode();
}
}
I also have a Vector class, which is merely two points and some other attributes. I don't want to have equal points in my Vector, so I came up with this method:
public void RecalculateVector(IMemorialPoint fromPoint, IMemorialPoint toPoint, int partIndex)
{
if (fromPoint.Equals(toPoint))
throw new ArgumentException(Messages.VectorWithEqualPoints);
this.FromPoint = fromPoint;
this.ToPoint = toPoint;
this.PartIndex = partIndex;
// the constructDifference method has a weird way of working:
// difference of Point1 and Point 2, so point2 > point1 is the direction
IVector3D vector = new Vector3DClass();
vector.ConstructDifference(toPoint.PointLocation, fromPoint.PointLocation);
this.Azimuth = MathUtilities.RadiansToDegrees(vector.Azimuth);
IPointCollection pointCollection = new PolylineClass();
pointCollection.AddPoint(fromPoint.PointLocation, ref _missing, ref _missing);
pointCollection.AddPoint(toPoint.PointLocation, ref _missing, ref _missing);
this._ResultingPolyline = pointCollection as IPolyline;
}
And this unit test, which should give me an exception:
[TestMethod]
[ExpectedException(typeof(ArgumentException), Messages.VectorWithEqualPoints)]
public void TestMemoriaVector_EqualPoints()
{
IPoint p1 = PointPolygonBuilder.BuildPoint(0, 0);
IPoint p2 = PointPolygonBuilder.BuildPoint(0, 0);
IMemorialPoint mPoint1 = new MemorialPoint("teste1", p1);
IMemorialPoint mPoint2 = new MemorialPoint("teste1", p2);
Console.WriteLine(mPoint1.GetHashCode().ToString());
Console.WriteLine(mPoint2.GetHashCode().ToString());
vector = new MemorialVector(mPoint1, mPoint1, 0);
}
When I use the same point, that is, mPoint1, as in the code, the exception is thrown. When I use mPoint2, even though their names and coordinates are the same, the exception is not thrown. I checked their hash codes, and they are in fact different. Based on the code I wrote in GetHashCode, I thought these two points would have the same hash code.
Can someone explain to me why this is not working as I thought it would? I'm not sure I explained this well, but.. I appreciate the help :D
George
You're implementing IEqualityComparer<T> within the type it's trying to compare - which is very odd. You should almost certainly just be implementing IEquatable<T> and overriding Equals(object) instead. That would definitely make your unit test work.
The difference between IEquatable<T> and IEqualityComparer<T> is that the former is implemented by a class to say, "I can compare myself with another instance of the same type." (It doesn't have to be the same type, but it usually is.) This is appropriate if there's a natural comparison - for example, the comparison chosen by string is ordinal equality - it's got to be exactly the same sequence of char values.
Now IEqualityComparer<T> is different - it can compare any two instances of a type. There can be multiple different implementations of this for a given type, so it doesn't matter whether or not a particular comparison is "the natural one" - it's just got to be the right one for your job. So for example, you could have a Shape class, and different equality comparers to compare shapes by colour, area or something like that.
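The distinction can be sketched with the Shape example mentioned above. Everything here is hypothetical (the Shape members, the ColourComparer name); it only illustrates one "natural" equality living on the type and one alternative comparison living outside it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Shape type used only to illustrate the two interfaces.
public sealed class Shape : IEquatable<Shape>
{
    public string Colour { get; }
    public double Area { get; }

    public Shape(string colour, double area) { Colour = colour; Area = area; }

    // The "natural" equality: every field matches.
    public bool Equals(Shape other) =>
        other != null && Colour == other.Colour && Area == other.Area;

    public override bool Equals(object obj) => Equals(obj as Shape);

    public override int GetHashCode() =>
        unchecked(Colour.GetHashCode() * 397 ^ Area.GetHashCode());
}

// One of many possible external comparisons: same colour only.
public sealed class ColourComparer : IEqualityComparer<Shape>
{
    public bool Equals(Shape x, Shape y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Colour == y.Colour);

    public int GetHashCode(Shape shape) => shape.Colour.GetHashCode();
}

public static class ShapeDemo
{
    public static void Main()
    {
        var shapes = new[] { new Shape("red", 1.0), new Shape("red", 2.0) };

        Console.WriteLine(shapes[0].Equals(shapes[1]));                         // False
        Console.WriteLine(new HashSet<Shape>(shapes).Count);                    // 2
        Console.WriteLine(new HashSet<Shape>(shapes, new ColourComparer()).Count); // 1
    }
}
```

The same two objects count as two items or one depending on which comparer the collection was given, while the type's own Equals stays unchanged.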
You need to override Object.Equals as well.
Add this to your implementation:
// In MemorialPoint:
public override bool Equals(object obj)
{
if (obj == null || GetType() != obj.GetType())
return false;
MemorialPoint y = obj as MemorialPoint;
if (this.PointName == y.PointName)
return true;
else if (this.PointName == y.PointName && this.PointLocation.X == y.PointLocation.X && this.PointLocation.Y == y.PointLocation.Y)
return true;
else
return false;
}
I'd then rework your other implementation to use the first, plus add the appropriate null checks.
public bool Equals(MemorialPoint x, MemorialPoint y)
{
if (x == null)
return (y == null);
return x.Equals(y);
}
You also need to rethink your concept of "equality", since it's not currently meeting .NET framework requirements.
If at all possible, I recommend a re-design with a Repository of memorial point objects (possibly keyed by name), so that simple reference equality can be used.
You've put an arcobjects tag on this, so I just thought I'd mention IRelationalOperator.Equals. I've never tested to see if this method honors the cluster tolerance of the geometries' spatial references. This can be adjusted using ISpatialReferenceTolerance.XYTolerance.
I have an immutable Value Object, IPathwayModule, whose value is defined by:
(int) Block;
(Entity) Module, identified by (string) ModuleId;
(enum) Status; and
(entity) Class, identified by (string) ClassId - which may be null.
Here's my current IEqualityComparer implementation which seems to work in a few unit tests. However, I don't think I understand what I'm doing well enough to know whether I am doing it right. A previous implementation would sometimes fail on repeated test runs.
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
public bool Equals(IPathwayModule x, IPathwayModule y)
{
int hx = GetHashCode(x);
int hy = GetHashCode(y);
return hx == hy;
}
public int GetHashCode(IPathwayModule obj)
{
int h;
if (obj.Class != null)
{
h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + obj.Class.ClassId.GetHashCode();
}
else
{
h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + "NOCLASS".GetHashCode();
}
return h;
}
}
IPathwayModule is definitely immutable and different instances with the same values should be equal and produce the same HashCode since they are used as items within HashSets.
I suppose my questions are:
Am I using the interface correctly in this case?
Are there cases where I might not see the desired behaviour?
Is there any way to improve the robustness, performance?
Are there any good practices that I am not following?
Don't implement Equals in terms of the hash function's results; it's too fragile. Rather, do a field value comparison for each of the fields. Something like:
return x != null && y != null && x.Name.Equals(y.Name) && x.Type.Equals(y.Type) ...
Also, hash function results aren't really amenable to addition. Try using the ^ operator instead.
return obj.Name.GetHashCode() ^ obj.Type.GetHashCode() ...
You don't need the null check in GetHashCode. If that value is null, you've got bigger problems, no use trying to recover from something over which you have no control...
The only big problem is the implementation of Equals. Hash codes are not unique, you can get the same hash code for objects which are different. You should compare each field of IPathwayModule individually.
GetHashCode() can be improved a bit. You don't need to call GetHashCode() on an int. The int itself is a good hash code. The same for enum values. Your GetHashCode could be then implemented like this:
public int GetHashCode(IPathwayModule obj)
{
unchecked {
int h = obj.Block + obj.Module.ModuleId.GetHashCode() + (int) obj.Status;
if (obj.Class != null)
h += obj.Class.ClassId.GetHashCode();
return h;
}
}
The 'unchecked' block keeps the arithmetic from throwing if the code is compiled with overflow checking enabled; on overflow the value simply wraps around, which is harmless for a hash code.
You shouldn't use GetHashCode() as the main way of comparing objects. Compare them field-wise.
There could be multiple objects with the same hash code (this is called a 'hash code collision').
Also, be careful when adding several integer values together, since in a checked context this can throw an OverflowException. Use 'exclusive or' (^) to combine hash codes, or wrap the code in an 'unchecked' block.
You should implement better versions of Equals and GetHashCode.
For instance, the hash code of enums is simply their numerical value.
In other words, with these two enums:
public enum A { x, y, z }
public enum B { k, l, m }
Then with your implementation, the following value type:
public struct AB {
public A A;
public B B;
}
the following two values would be considered equal:
AB ab1 = new AB { A = A.x, B = B.m };
AB ab2 = new AB { A = A.z, B = B.k };
I'm assuming you don't want that.
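The collision described above is easy to reproduce, and it is worth noting that XOR collides on the same pair because both + and ^ ignore the order of their operands. The sketch below compares them with an order-sensitive combination (multiply the running value by a prime, then add); the constants 17 and 31 are conventional choices, not something the thread prescribes, and CombineDemo is a hypothetical name.

```csharp
using System;

public enum A { x, y, z }
public enum B { k, l, m }

public static class CombineDemo
{
    public static int Sum(A a, B b) => (int)a + (int)b;

    public static int Xor(A a, B b) => (int)a ^ (int)b;

    // Order-sensitive: each field is added after scaling the running value.
    public static int Weighted(A a, B b)
    {
        unchecked
        {
            int h = 17;
            h = h * 31 + (int)a;
            h = h * 31 + (int)b;
            return h;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Sum(A.x, B.m) == Sum(A.z, B.k));           // True: collision
        Console.WriteLine(Xor(A.x, B.m) == Xor(A.z, B.k));           // True: collision
        Console.WriteLine(Weighted(A.x, B.m) == Weighted(A.z, B.k)); // False
    }
}
```

Equal hash codes are still allowed for unequal values; the weighted form just makes this particular kind of collision less likely.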
Also, passing the value types as interfaces will box them, this could have performance concerns, although probably not much. You might consider making the IEqualityComparer implementation take your value types directly.
Assuming that two objects are equal because their hash code is equal is wrong. You need to compare all members individually
It is probably better to use ^ rather than + to combine the hash codes.
If I understand you well, you'd like to hear some comments on your code. Here are my remarks:
1. The hash codes should be XOR'ed together, not added. XOR (^) gives a better chance of preventing collisions.
2. You compare hash codes. That's good, but only do this if the underlying object overrides GetHashCode. If not, use the properties and their hash codes and combine them.
3. Hash codes are important; they make a quick comparison possible. But if the hash codes are equal, the objects can still be different. This happens rarely, but you'll need to compare the fields of your objects if the hash codes are equal.
4. You say your value types are immutable, but you reference objects (.Class) which are not immutable.
5. Always optimize the comparison by adding a reference comparison as the first test. If the references are unequal, the objects are unequal, and then the structs are unequal.
Point 5 depends on whether you want the objects that you reference in your value type to compare as not equal when they are not the same reference.
EDIT: you compare many strings. String comparison is optimized in C#. You can, as others suggested, better use == with them in your comparison. For GetHashCode, use XOR (^), as suggested by others as well.
Thanks to all who responded. I have aggregated the feedback from everyone who responded and my improved IEqualityComparer now looks like:
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
public bool Equals(IPathwayModule x, IPathwayModule y)
{
if (x == y) return true;
if (x == null || y == null) return false;
if ((x.Class == null) ^ (y.Class == null)) return false;
if (x.Class == null) //and implicitly y.Class == null
{
return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId);
}
return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId) && x.Class.ClassId.Equals(y.Class.ClassId);
}
public int GetHashCode(IPathwayModule obj)
{
unchecked {
int h = obj.Block ^ obj.Module.ModuleId.GetHashCode() ^ (int) obj.Status;
if (obj.Class != null)
{
h ^= obj.Class.ClassId.GetHashCode();
}
return h;
}
}
}