Equals and hashcode for vertices - C#

I have a couple of vertices which I want to put into a Hashtable. Vertices which are really close to each other are considered as the same vertex. My C# vertex class looks like this:
public class Vertex3D
{
    protected double _x, _y, _z;

    public static readonly double EPSILON = 1e-10;

    public virtual double x
    {
        get { return _x; }
        set { _x = value; }
    }

    public virtual double y
    {
        get { return _y; }
        set { _y = value; }
    }

    public virtual double z
    {
        get { return _z; }
        set { _z = value; }
    }

    public Vertex3D(double p1, double p2, double p3)
    {
        this._x = p1;
        this._y = p2;
        this._z = p3;
    }

    public override bool Equals(object obj)
    {
        var other = obj as Vertex3D;
        if (other == null)
        {
            return false;
        }
        double diffx = this.x - other.x;
        double diffy = this.y - other.y;
        double diffz = this.z - other.z;
        bool eqx = diffx > -EPSILON && diffx < EPSILON;
        bool eqy = diffy > -EPSILON && diffy < EPSILON;
        bool eqz = diffz > -EPSILON && diffz < EPSILON;
        return eqx && eqy && eqz;
    }

    public override int GetHashCode()
    {
        return this.x.GetHashCode() ^ this.y.GetHashCode() ^ this.z.GetHashCode();
    }

    public override string ToString()
    {
        return "Vertex: " + x + " " + y + " " + z;
    }
}
Now let's say I put the following two vertices into a dictionary (a dictionary is a hashtable which doesn't allow null keys):
Dictionary<Vertex3D, Vertex3D> vertexList = new Dictionary<Vertex3D, Vertex3D>();
Vertex3D v0 = new Vertex3D(0.000000000000000037842417475065449, -1, 0.00000000000000011646698526992202);
Vertex3D v1 = new Vertex3D(0, -1, 0);
vertexList.Add(v0, v0);
vertexList.Add(v1, v1);
The problem is that my implementation of Equals and GetHashCode is faulty. The above two vertices are considered equal because their distance to each other is smaller than EPSILON, BUT they don't return the same hash code.
How do I implement Equals and GetHashCode correctly?

Hashtables require equivalence classes, but your Equals() is not transitive, so you cannot use a hashtable for this purpose. (If, for example, you allowed nearby objects to compare equal by rounding to lattice points, you would have transitivity and equivalence classes. But then there would still be arbitrarily close points, down to the precision of your representation, that fell on opposite sides of a threshold and thus into different equivalence classes.)
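To see the lack of transitivity concretely, here's a quick sketch using the Vertex3D class from the question (the coordinates are chosen purely for illustration):
// Each neighbor is within EPSILON (1e-10), but the endpoints are not.
var a = new Vertex3D(0, 0, 0);
var b = new Vertex3D(0.6e-10, 0, 0);   // within EPSILON of a
var c = new Vertex3D(1.2e-10, 0, 0);   // within EPSILON of b, but not of a
Console.WriteLine(a.Equals(b));  // True
Console.WriteLine(b.Equals(c));  // True
Console.WriteLine(a.Equals(c));  // False: Equals is not transitive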
There are other data structures, such as octrees, which are designed to accelerate finding nearby points. I suggest you use one of those.

Generally, mutable-thing references should only be considered equivalent if they both refer to the same object. Only references to immutable things should use any other definition of equality. It would be helpful if Object included virtual functions to test for equivalence in the scenario where two references are held by separate objects, neither of which will expose its reference to anything that might mutate it. Unfortunately, even though the effectively-immutable-instance-of-mutable-type pattern is very common (nearly all immutable collections, for example, use one or more mutable-type objects such as arrays to hold their data) there's no standard pattern for equivalence testing with it.
If you want to store vertices in a dictionary using Object.Equals for equality testing, it should be an immutable type. Alternatively, you could define a custom IEqualityComparer<T> for use with the dictionary, but you should be aware that a Dictionary can only find perfect matches. If you want to be able to find any point that's within EPSILON of a given point, you should use a dictionary which maps rounded values to lists of precise values (values should be rounded to a power of two that's at least twice as great as epsilon). If adding or subtracting EPSILON from some or all of the coordinates of a point would cause it to be rounded differently, the point should be included in the dictionary rounded in every such possible way.
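A rough sketch of that rounded-grid idea (VertexGrid and its members are illustrative names, not a standard API; it assumes the Vertex3D class from the question plus System.Collections.Generic and System.Linq):
public class VertexGrid
{
    private const double Epsilon = 1e-10;
    // A power of two that's at least twice as great as epsilon, per the advice above.
    private static readonly double CellSize = Math.Pow(2, -31); // ~4.66e-10

    private readonly Dictionary<(long, long, long), List<Vertex3D>> _cells =
        new Dictionary<(long, long, long), List<Vertex3D>>();

    private static long Cell(double v) => (long)Math.Floor(v / CellSize);

    // Store the vertex in every cell its EPSILON-neighborhood touches,
    // so a later lookup only has to inspect a single cell.
    public void Add(Vertex3D v)
    {
        for (long cx = Cell(v.x - Epsilon); cx <= Cell(v.x + Epsilon); cx++)
        for (long cy = Cell(v.y - Epsilon); cy <= Cell(v.y + Epsilon); cy++)
        for (long cz = Cell(v.z - Epsilon); cz <= Cell(v.z + Epsilon); cz++)
        {
            var key = (cx, cy, cz);
            if (!_cells.TryGetValue(key, out var list))
                _cells[key] = list = new List<Vertex3D>();
            list.Add(v);
        }
    }

    // Returns a stored vertex within EPSILON of the query, or null.
    public Vertex3D Find(Vertex3D query)
    {
        var key = (Cell(query.x), Cell(query.y), Cell(query.z));
        return _cells.TryGetValue(key, out var list)
            ? list.FirstOrDefault(v => v.Equals(query))
            : null;
    }
}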

Related

Why does List<T>.Sort() where T:IComparable<T> produce a different order than List<T>.Sort(IComparer<T>)?

List<T>.Sort() where T:IComparable<T> produces a different order than List<T>.Sort(IComparer<T>), with equal inputs.
Note the comparer is not really correct - it does not return 0 for equality.
Both orders are valid (because the difference lies in the equal elements), but I'm wondering where the difference arises from, having looked through the source, and if it is possible to alter the IComparable.CompareTo to match the behavior of the IComparer.Compare when passed into the list.
The reason I'm looking at this is because IComparable is far faster than using an IComparer (which I'm guessing is why there are two different implementations in .NET), and I was hoping to improve the performance of an open source library by switching the IComparer to IComparable.
A full example is at: https://dotnetfiddle.net/b7s6my
Here's the class/comparer:
public class Point : IComparable<Point>
{
    public int X { get; }
    public int Y { get; }
    public int UID { get; set; } // Set by outside code

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int CompareTo(Point b)
    {
        return PointComparer.Default.Compare(this, b);
    }
}

public class PointComparer : IComparer<Point>
{
    public readonly static PointComparer Default = new PointComparer();

    public int Compare(Point a, Point b)
    {
        if (a.Y == b.Y)
        {
            return (a.X < b.X) ? -1 : 1;
        }
        return (a.Y > b.Y) ? -1 : 1;
    }
}
Note the comparer is not mine (so I can't really change it) - and changing the sort order causes the surrounding code to fail.
As mentioned in the comments, the problem is with the IComparer<Point>, which for two equal objects a and b (ones that have the same X and Y) returns Compare(a, b) = 1 and Compare(b, a) = 1.
However, the question remains: why are the two sorts still different?
Checking the source of ArraySortHelper (see the comment by @Sweeper) shows two quicksort implementations (one for an explicit IComparer and one for the implicit IComparable path).
The algorithms are mostly the same; however, the function PickPivotAndPartition differs slightly. One overload is PickPivotAndPartition(Span<T> keys), the other is PickPivotAndPartition(Span<T> keys, Comparison<T> comparer).
In the first function there's the line:
while (... && GreaterThan(ref pivot, ref leftRef = ref Unsafe.Add(ref leftRef, 1))) ;
And in the second function the corresponding line looks like:
while (comparer(keys[++left], pivot) < 0) ;
So that appears to be the point: the first function's condition can be read as Compare(pivot, left) > 0, while the second's is Compare(left, pivot) < 0. When you have Compare(left, pivot) = 1 and Compare(pivot, left) = 1, the condition in the first function is true, while in the second it is false.
This means the two implementations can partition the array differently and hence produce different output.
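A quick way to see the broken contract in isolation (hypothetical driver code using the Point and PointComparer classes above):
var a = new Point(1, 1);
var b = new Point(1, 1);
// For equal points the comparer claims "greater" in both directions:
Console.WriteLine(PointComparer.Default.Compare(a, b)); // 1
Console.WriteLine(PointComparer.Default.Compare(b, a)); // 1 (a correct comparer would return 0)
// The IComparable path effectively tests Compare(pivot, element) > 0, the
// IComparer path tests Compare(element, pivot) < 0; with the asymmetric
// results above, the two partitioning loops stop at different elements.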

Why removing objects from a list with duplicate properties of types double in C# does not give consistent result using different methods?

I am trying to find the quickest way to remove duplicate entries in a list.
My list contains objects which have properties X and Y which are both of type double.
I need to remove any objects which contain the same X and Y values.
My first attempt is very slow.
It will take a list that contains 81403 objects and spit out a new list with 25900, but it takes over a minute to run. Had it run quickly, I would have compared the difference in order to add some rounding, but it's too slow.
private List<DelaunayPoint> DeleteDuplicatesSlowWay(List<DelaunayPoint> points)
{
    List<DelaunayPoint> distinctPoints = new();
    int i = 0;
    foreach (DelaunayPoint p in points)
    {
        if (i == 0)
        {
            distinctPoints.Add(p);
        }
        else
        {
            if (distinctPoints.Any(pnt => pnt.X == p.X) == false ||
                distinctPoints.Any(pnt => pnt.Y == p.Y) == false)
            {
                distinctPoints.Add(p);
            }
        }
        i++;
    }
    return distinctPoints;
}
The following method will take the same list of 81403 objects, but it will spit out a list containing 73385 objects; however, it takes less than a second to run.
private List<DelaunayPoint> DeleteDuplicatesFast(List<DelaunayPoint> points)
{
    return points
        .GroupBy(p => new { p.X, p.Y })
        .Select(output => output.First())
        .ToList();
}
Why do the above two methods give different results?
Assuming the difference is a rounding error between the two methods, how can I add rounding to the second DeleteDuplicatesFast method so I can compare the two?
I would need any rounding to not apply the rounding to the output list.
To answer the first part of your question: points are only equal if both their X and Y values are equal, but you're testing for either X or Y being equal.
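For illustration, here is a sketch of the corrected condition (it fixes the logic, not the quadratic running time):
// A point is a duplicate only if some existing point matches on BOTH X and Y,
// so the two coordinates must be tested together on the same point:
if (!distinctPoints.Any(pnt => pnt.X == p.X && pnt.Y == p.Y))
{
    distinctPoints.Add(p);
}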
As for filtering the duplicates: the fastest way is to make your DelaunayPoint class implement IEquatable<DelaunayPoint> and then add the collection to a HashSet:
class DelaunayPoint : IEquatable<DelaunayPoint>
{
    public DelaunayPoint(double x, double y)
    {
        X = x;
        Y = y;
    }

    public double X { get; }
    public double Y { get; }

    public bool Equals(DelaunayPoint other)
    {
        return other != null && this.X == other.X && this.Y == other.Y;
    }

    public override int GetHashCode()
    {
        return System.HashCode.Combine(X, Y);
    }
}
var set = new HashSet<DelaunayPoint>(points);
Now set contains distinct points. I tested it to be approx. 7 times faster than GroupBy.
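As a side note (not part of the original answer): once Equals and GetHashCode are in place, LINQ's Distinct picks them up via EqualityComparer<DelaunayPoint>.Default, so this one-liner gives the same result:
// Distinct uses the IEquatable<DelaunayPoint> implementation above.
List<DelaunayPoint> distinctPoints = points.Distinct().ToList();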

Why is Dictionary.ContainsKey() & ToString() causing GC Alloc?

There's not much more I have to offer in terms of details other than the fact that there's around 1k instances that are running their own ContainsKey() and ToString() pretty often.
Location is just my personal replacement for Unity's Vector3 to fit my needs:
[Serializable]
public struct Location
{
    public double X;
    public double Y;
    public double Z;

    public Location(double x, double y, double z) : this()
    {
        X = x;
        Y = y;
        Z = z;
    }

    public override string ToString()
    {
        return String.Format("{0}, {1}, {2}", X, Y, Z);
    }
}
(I know I'm breaking a rule of some sort with structs... just not sure how to achieve my needs another way at the moment.)
Here's a screenshot of the Profiler running:
As you can see, for most of the timeline it's stable; then all of a sudden, after my instances reach around 1k in quantity (they start around 100-250), the CPU and memory go wild due to what seems to be GC allocations. I've been going through looking for what I can clean up, but all I see that's causing ANY GC alloc is when I run:
if (_dictionary.ContainsKey(key)) {...}
and when renaming Unity GameObjects with:
part.name = "Part: " + part.Location.ToString();
If it just pertains to the unavoidable cost of the lookup, are there any alternatives to Dictionary that tend to run slower but cause less GC alloc? And is there a more effective way to override the ToString() method?
Addition: My Dictionary is Key: (my personal struct) Location, Value: Class instance.
Turning comments into an answer...
Your ToString() method is always going to create a new string, so that's no surprise. However, you're also using string concatenation, so you're creating two new strings. You could reduce this to one by just inlining the ToString() method. For example, using C# 6 interpolated strings for brevity:
var location = part.Location;
part.name = $"Part: {location.X}, {location.Y}, {location.Z}";
For the dictionary aspect, there are two issues:
You're not overriding Equals and GetHashCode, which may mean that the value is being boxed in order to call the implementations in ValueType. I'm not 100% sure on this; the rules on boxing can be complicated.
You're not implementing IEquatable<T>, so it's very likely that any Equals calls will be boxing.
You can fix both of these easily:
[Serializable]
public struct Location : IEquatable<Location>
{
    public double X;
    public double Y;
    public double Z;

    public Location(double x, double y, double z) : this()
    {
        X = x;
        Y = y;
        Z = z;
    }

    public override string ToString() => $"{X}, {Y}, {Z}";

    public override bool Equals(object obj) =>
        obj is Location loc && Equals(loc);

    public bool Equals(Location other) =>
        X == other.X && Y == other.Y && Z == other.Z;

    public override int GetHashCode()
    {
        // Replace with whatever implementation you want
        int hash = 17;
        hash = hash * 23 + X.GetHashCode();
        hash = hash * 23 + Y.GetHashCode();
        hash = hash * 23 + Z.GetHashCode();
        return hash;
    }
}
(That's using C# 7 syntax, but I'd expect that to still be okay if you're using a modern version of Unity with VS2017. If you're using an older version, you should be able to implement the same methods just in a slightly more longwinded fashion.)

C# performant alternatives to HashSet and Dictionary that do not use GetHashCode

I'm looking for built-in alternatives to HashSet and Dictionary that have better performance than lists but do not use the internal GetHashCode method. I need this because, for the class I have written, there is no way of writing a GetHashCode method that fulfills the usual contract with Equals other than
public override int GetHashCode() { return 0; } // or return any other constant value
which would turn HashSet and Dictionary into ordinary lists (performance-wise).
So what I need is a set implementation and a mapping implementation. Any suggestions?
EDIT:
My class is a tolerance-based 3-dimensional vector class:
public class Vector
{
    private const double TOL = 1E-10;
    private double x, y, z;

    public Vector(double x, double y, double z)
    {
        this.x = x; this.y = y; this.z = z;
    }

    public override bool Equals(object o)
    {
        Vector other = o as Vector;
        if (other == null)
            return false;
        return ((Math.Abs(x - other.x) <= TOL) &&
                (Math.Abs(y - other.y) <= TOL) &&
                (Math.Abs(z - other.z) <= TOL));
    }
}
Note that my Equals method is not transitive. However, in my use case I can make it "locally" transitive because at some point, I will know all vectors that I need to put into my set / mapping key set, and I also know that they will come in clusters. So when I have collected all vectors, I will choose one representative per cluster and replace all original vectors by the representative. Then Equals will be transitive among the elements of my set / mapping key set.
When I have my set or mapping, I will collect vectors from another source (for the sake of this question let's assume I'll ask a user to type in a vector). These can be any possible vector. Those will never be added to the set/mapping, but I will need to know if they are contained in the set / key set of the mapping (regarding tolerance), and I will need to know their value from the mapping.
You need a data structure that supports sorting, binary search and fast insertion. Unfortunately there is no such collection in the .NET Framework. SortedDictionary doesn't support binary search, while SortedList suffers from O(n) insertion for unsorted data. So you must look for a third-party tool. A good candidate seems to be the TreeDictionary from the C5 library. It is a red-black tree implementation that offers the important method RangeFromTo. Here is an incomplete implementation of a dictionary that has Vectors as keys, backed internally by a C5.TreeDictionary:
public class VectorDictionary<TValue>
{
    private readonly C5.TreeDictionary<double, (Vector, TValue)> _tree =
        new C5.TreeDictionary<double, (Vector, TValue)>();

    public bool TryGetKeyValue(Vector key, out (Vector, TValue) pair)
    {
        double xyz = key.X + key.Y + key.Z;
        // Hoping that not all vectors are crowded in the same diagonal line
        var range = _tree.RangeFromTo(xyz - Vector.TOL * 3, xyz + Vector.TOL * 3);
        var equalPairs = range.Where(e => e.Value.Item1.Equals(key));
        // Selecting a vector from many "equal" vectors is tricky.
        // Some may be more equal than others. :-) Let's return the first for now.
        var selectedPair = equalPairs.FirstOrDefault().Value;
        pair = selectedPair;
        return selectedPair.Item1 != null;
    }

    public Vector GetExisting(Vector key)
    {
        return TryGetKeyValue(key, out var pair) ? pair.Item1 : default;
    }

    public bool Contains(Vector key) => TryGetKeyValue(key, out var _);

    public bool Add(Vector key, TValue value)
    {
        if (Contains(key)) return false;
        _tree.Add(key.X + key.Y + key.Z, (key, value));
        return true;
    }

    public TValue this[Vector key]
    {
        get => TryGetKeyValue(key, out var pair) ? pair.Item2 : default;
        set => _tree.Add(key.X + key.Y + key.Z, (key, value));
    }

    public int Count => _tree.Count;
}
Usage example:
var dictionary = new VectorDictionary<int>();
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.5 * 1E-10, 0, 0), 1)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.6 * 1E-10, 0, 0), 2)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(1.6 * 1E-10, 0, 0), 3)}");
Console.WriteLine($"dictionary.Count: {dictionary.Count}");
Console.WriteLine($"dictionary.Contains: {dictionary.Contains(new Vector(2.5 * 1E-10, 0, 0))}");
Console.WriteLine($"dictionary.GetValue: {dictionary[new Vector(2.5 * 1E-10, 0, 0)]}");
Output:
Added: True
Added: False
Added: True
dictionary.Count: 2
dictionary.Contains: True
dictionary.GetValue: 3
You can get a reasonably good hashcode implementation in your case. Remember that the most important rule for a hash code is the following:
Two equal vectors must return the same value
This does not mean that two different vectors cannot return the same value; they obviously have to in some cases: the number of possible hashes is limited, while the number of distinct vectors, for all practical purposes, isn't.
With that in mind, simply evaluate your hash code based upon the vector's coordinates truncated to one significant digit fewer than the tolerance. All equal vectors will give you the same hash, and the small minority of non-equal vectors that differ only in the last decimal won't; you can live with that.
UPDATE: Changed rounded to truncated. Rounding is not the right choice.
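A sketch of what that could look like inside the Vector class above (illustrative only; note that equal vectors straddling a truncation boundary will still hash differently, which is the caveat discussed elsewhere on this page):
// With TOL = 1E-10, truncate each coordinate to 1E-9 granularity (one
// significant digit fewer than the tolerance) before hashing.
public override int GetHashCode()
{
    double tx = Math.Truncate(x * 1E9);
    double ty = Math.Truncate(y * 1E9);
    double tz = Math.Truncate(z * 1E9);
    return System.HashCode.Combine(tx, ty, tz);
}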

IEqualityComparer and weird results

Take a look at this class:
public class MemorialPoint : IMemorialPoint, IEqualityComparer<MemorialPoint>
{
    private string _PointName;
    private IPoint _PointLocation;
    private MemorialPointType _PointType;
    private DateTime _PointStartTime;
    private DateTime _PointFinishTime;
    private string _NeighborName;
    private double _Rms;
    private double _PointPdop;
    private double _PointHdop;
    private double _PointVdop;

    // getters and setters omitted

    public bool Equals(MemorialPoint x, MemorialPoint y)
    {
        if (x.PointName == y.PointName)
            return true;
        else if (x.PointName == y.PointName && x.PointLocation.X == y.PointLocation.X && x.PointLocation.Y == y.PointLocation.Y)
            return true;
        else
            return false;
    }

    public int GetHashCode(MemorialPoint obj)
    {
        return (obj.PointLocation.X.ToString() + obj.PointLocation.Y.ToString() + obj.PointName).GetHashCode();
    }
}
I also have a Vector class, which is merely two points and some other atributes. I don't want to have equal points in my Vector, so I came up with this method:
public void RecalculateVector(IMemorialPoint fromPoint, IMemorialPoint toPoint, int partIndex)
{
    if (fromPoint.Equals(toPoint))
        throw new ArgumentException(Messages.VectorWithEqualPoints);

    this.FromPoint = fromPoint;
    this.ToPoint = toPoint;
    this.PartIndex = partIndex;

    // the ConstructDifference method has a weird way of working:
    // difference of point1 and point2, so point2 > point1 is the direction
    IVector3D vector = new Vector3DClass();
    vector.ConstructDifference(toPoint.PointLocation, fromPoint.PointLocation);
    this.Azimuth = MathUtilities.RadiansToDegrees(vector.Azimuth);

    IPointCollection pointCollection = new PolylineClass();
    pointCollection.AddPoint(fromPoint.PointLocation, ref _missing, ref _missing);
    pointCollection.AddPoint(toPoint.PointLocation, ref _missing, ref _missing);
    this._ResultingPolyline = pointCollection as IPolyline;
}
And this unit test, which should give me an exception:
[TestMethod]
[ExpectedException(typeof(ArgumentException), Messages.VectorWithEqualPoints)]
public void TestMemorialVector_EqualPoints()
{
    IPoint p1 = PointPolygonBuilder.BuildPoint(0, 0);
    IPoint p2 = PointPolygonBuilder.BuildPoint(0, 0);
    IMemorialPoint mPoint1 = new MemorialPoint("teste1", p1);
    IMemorialPoint mPoint2 = new MemorialPoint("teste1", p2);

    Console.WriteLine(mPoint1.GetHashCode().ToString());
    Console.WriteLine(mPoint2.GetHashCode().ToString());

    vector = new MemorialVector(mPoint1, mPoint1, 0);
}
When I use the same point, that is, mPoint1, as in the code, the exception is thrown. When I use mPoint2, even though its name and coordinates are the same, the exception is not thrown. I checked their hash codes, and they are in fact different. Based on the code I created in GetHashCode, I thought these two points would have the same hash code.
Can someone explain to me why this is not working as I thought it would? I'm not sure I explained this well, but... I appreciate the help :D
George
You're implementing IEqualityComparer<T> within the type it's trying to compare - which is very odd. You should almost certainly just be implementing IEquatable<T> and overriding Equals(object) instead. That would definitely make your unit test work.
The difference between IEquatable<T> and IEqualityComparer<T> is that the former is implemented by a class to say, "I can compare myself with another instance of the same type." (It doesn't have to be the same type, but it usually is.) This is appropriate if there's a natural comparison - for example, the comparison chosen by string is ordinal equality - it's got to be exactly the same sequence of char values.
Now IEqualityComparer<T> is different - it can compare any two instances of a type. There can be multiple different implementations of this for a given type, so it doesn't matter whether or not a particular comparison is "the natural one" - it's just got to be the right one for your job. So for example, you could have a Shape class, and different equality comparers to compare shapes by colour, area or something like that.
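For illustration, a minimal sketch of that idea (Shape, ColorComparer, and AreaComparer are made up for this example):
// Two different, equally valid equality comparers for the same type.
public class Shape
{
    public string Color { get; set; }
    public double Area { get; set; }
}

public class ColorComparer : IEqualityComparer<Shape>
{
    public bool Equals(Shape a, Shape b) => a?.Color == b?.Color;
    public int GetHashCode(Shape s) => s.Color?.GetHashCode() ?? 0;
}

public class AreaComparer : IEqualityComparer<Shape>
{
    public bool Equals(Shape a, Shape b) => a?.Area == b?.Area;
    public int GetHashCode(Shape s) => s.Area.GetHashCode();
}

// The same shapes can then be grouped two different ways:
var byColor = new HashSet<Shape>(new ColorComparer());
var byArea = new HashSet<Shape>(new AreaComparer());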
You need to override Object.Equals as well.
Add this to your implementation:
// In MemorialPoint:
public override bool Equals(object obj)
{
    if (obj == null || GetType() != obj.GetType())
        return false;

    MemorialPoint y = obj as MemorialPoint;
    if (this.PointName == y.PointName)
        return true;
    else if (this.PointName == y.PointName && this.PointLocation.X == y.PointLocation.X && this.PointLocation.Y == y.PointLocation.Y)
        return true;
    else
        return false;
}
I'd then rework your other implementation to use the first, plus add the appropriate null checks.
public bool Equals(MemorialPoint x, MemorialPoint y)
{
    if (x == null)
        return (y == null);
    return x.Equals(y);
}
You also need to rethink your concept of "equality", since it's not currently meeting .NET framework requirements.
If at all possible, I recommend a re-design with a Repository of memorial point objects (possibly keyed by name), so that simple reference equality can be used.
You've put an arcobjects tag on this, so I just thought I'd mention IRelationalOperator.Equals. I've never tested to see if this method honors the cluster tolerance of the geometries' spatial references. This can be adjusted using ISpatialReferenceTolerance.XYTolerance.
