HashSet with complex equality - c#

Consider the following class
public class X
{
//Unique per set / never null
public ulong A { get; set; }
//Unique per set / never null
public string B { get; set; }
//Combination of C and D is Unique per set / both never null
public string C { get; set; }
public string D { get; set; }
public override bool Equals(object obj)
{
var x = (X)obj;
if (A == x.A || B==x.B)
return true;
if (C+D==x.C+x.D)
return true;
return false;
}
public override int GetHashCode()
{
return 0;
}
}
I can't think of a hash function that respects the constraints in the comments above the properties the way the Equals method does. Is returning 0 from GetHashCode my best bet, or am I missing something?

This is not possible in any useful way. It is a fundamental problem: a valid hash function exists, but only a degenerate one.
Explanation
Think about it in reverse: in which cases are your objects NOT equal? From the code, they are equal when this expression is true:
return A == x.A || B==x.B || (C+D)==(x.C+x.D)
And they are not equal when this one is true:
return A!=x.A && B!=x.B && (C+D)!=(x.C+x.D)
The hash must be identical for every pair of objects that the first expression treats as equal, and the property values in such pairs can vary without limit.
The only hash that satisfies this is a constant value. That is far from optimal for performance, because it defeats the purpose of overriding GetHashCode.
Consider using the IEqualityComparer interface and equality algorithms suited to the task you are solving.
I think the best way to find equal objects is indexing. See, for example, how databases are built and how they use bit indexing.
Why are hashes so cruel?
If it were possible, every database in the world could simply hash everything into a single hash table, and all problems of fast access would be solved.
For example, imagine your object not as an object with properties but as an entire object state (for example, 32 boolean properties can be represented as an integer).
A hash function calculates the hash from this state, but in your case you explicitly declare that some states in its space are equal:
class X
{
bool A;
bool B;
}
Your space is:
A B
false false -> 0
false true -> 1
true false -> 2
true true -> 3
If you define equality like this:
bool Equal(X x) { return x.A == A || x.B == B; }
You basically define this state equality:
0 == 0
0 == 1
0 == 2
0 != 3
1 == 0
1 == 1
1 != 2
1 == 3
2 == 0
2 != 1
2 == 2
2 == 3
3 != 0
3 == 1
3 == 2
3 == 3
Equal states must share a hash: 0 == 1 and 0 == 2 force states 0, 1 and 2 onto the same hash, and 1 == 3 pulls in state 3 as well.
So all states must have the SAME hash, which means it is impossible to write a hash function better than a constant value.
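The chain of forced equalities can be checked mechanically. The following is a minimal sketch, assuming the two-bool class above with its states numbered 0 to 3; `StateEquality` and its members are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: models the four states of the two-bool class and
// the "A matches OR B matches" equality from the text.
public static class StateEquality
{
    // State s encodes A in bit 1 and B in bit 0 (0 = false/false ... 3 = true/true).
    public static bool AreEqual(int s, int t)
    {
        bool sameA = (s >> 1) == (t >> 1);
        bool sameB = (s & 1) == (t & 1);
        return sameA || sameB;
    }

    // Groups states that are forced to share a hash code: whenever two
    // states compare equal, their groups are merged.
    public static int CountHashGroups()
    {
        int[] group = { 0, 1, 2, 3 };
        for (int s = 0; s < 4; s++)
            for (int t = 0; t < 4; t++)
                if (AreEqual(s, t))
                {
                    int from = group[t], to = group[s];
                    for (int i = 0; i < 4; i++)
                        if (group[i] == from) group[i] = to;
                }
        return new HashSet<int>(group).Count;
    }
}
```

`StateEquality.CountHashGroups()` ends with a single group, even though `StateEquality.AreEqual(1, 2)` is false. The relation is not transitive, which is why only a constant hash can satisfy it.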

In this case, I would say that the hash code that defines an object as unique (i.e. overriding GetHashCode) shouldn't be the one used for your specific HashSet.
In other words, you should consider two instances of your class equal if their properties are all equal (not if any of the properties match). But then, if you want to group them by certain criteria, use a specific implementation of IEqualityComparer<X>.
Also, strongly consider making the class immutable.
Apart from that, the only hash code I believe will really work is a constant. Anything trying to be smarter than that will fail:
// if any of the properties match, consider the class equal
public class AnyPropertyEqualityComparer : IEqualityComparer<X>
{
public bool Equals(X x, X y)
{
if (object.ReferenceEquals(x, y))
return true;
if (object.ReferenceEquals(y, null) ||
object.ReferenceEquals(x, null))
return false;
return (x.A == y.A ||
x.B == y.B ||
(x.C + x.D) == (y.C + y.D));
}
public int GetHashCode(X x)
{
return 42;
}
}
Since you will have to evaluate all properties in any case, a HashSet will not help much here, and you might as well use a plain List<T> (inserting a list of items into such a "hashset" degrades to O(n*n) anyway).

You could consider creating an anonymous type and then returning the hashcode from that:
public override int GetHashCode()
{
// Check that an existing code hasn't already been returned
return new { A, B, CD = C + D }.GetHashCode();
}
Make sure you create some automated tests to verify that objects with the same values return the same hashcode.
Bear in mind that once the hashcode is given out, you must continue to return that code and not a new one.
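A check of the kind suggested above might look like the following sketch. It assumes the equality from the question (one matching property is enough) combined with a hash built from all properties. `Item` and `ContractCheck` are hypothetical names, and the class is reduced to A and B to keep it short.

```csharp
using System;

// Hypothetical stand-in for the question's class, reduced to A and B.
public sealed class Item
{
    public ulong A { get; }
    public string B { get; }
    public Item(ulong a, string b) { A = a; B = b; }

    // Equality as in the question: one matching property is enough.
    public override bool Equals(object obj) =>
        obj is Item other && (A == other.A || B == other.B);

    // Hash built from all properties, as in the anonymous-type suggestion.
    public override int GetHashCode() => new { A, B }.GetHashCode();
}

public static class ContractCheck
{
    // True when the pair respects "equal objects have equal hash codes".
    public static bool HoldsFor(Item x, Item y) =>
        !x.Equals(y) || x.GetHashCode() == y.GetHashCode();
}
```

For `new Item(1, "left")` and `new Item(1, "right")`, `Equals` returns true while the two hash codes will in general differ, so `ContractCheck.HoldsFor` is expected to report a violation. That is the situation such a test is meant to catch.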

Related

Looking if List<T> has <T> (no matter of attribute orders in <T>) in C#

I have List<Moves> listOfMoves
ListOfMoves.Add(new Moves()
{
int position1= number1,
int position2= number2,
});
Now I want to check if ListOfMoves contains for example Move(2,3), but also to check if it contains Move(3,2).
I tried if(ListOfMoves.Contains(new Move(2,3))) but this does not work properly.
Method List<T>.Contains(T item) internally uses method Object.Equals to check if objects are equal. Therefore if you want to use method List<T>.Contains(T item) with your type T to check if the specified item is contained in the List<T> then you need to override method Object.Equals in your type T.
When you override Object.Equals you should also override Object.GetHashCode. Here is a good explanation "Why is it important to override GetHashCode when Equals method is overridden?".
Here is how you should override Object.Equals in the Move class to fit your requirement:
class Move
{
public Move(int p1, int p2)
{
position1 = p1;
position2 = p2;
}
public int position1 { get; }
public int position2 { get; }
public override bool Equals(object obj)
{
if (obj == null)
return false;
if (ReferenceEquals(this, obj))
return true;
Move other = obj as Move;
if (other == null)
return false;
// Here we specify how to compare two Moves. Here we implement your
// requirement that two moves are considered equal regardless of the
// order of the properties.
return (position1 == other.position1 && position2 == other.position2) ||
(position1 == other.position2 && position2 == other.position1);
}
public override int GetHashCode()
{
// When implementing GetHashCode we have to follow the next rules:
// 1. If two objects are equal then their hash codes must be equal too.
// 2. Hash code must not change during the lifetime of the object.
// Therefore Move must be immutable. (Thanks to Enigmativity's useful tip.)
return position1 + position2;
}
}
When you override Object.Equals you will be able to use condition ListOfMoves.Contains(new Move(2, 3)) to check if moves Move(2, 3) or Move(3, 2) are contained in the ListOfMoves.
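As a rough illustration of how this plays out in collections, here is a condensed sketch of the same idea. The property names are capitalized and `MoveDemo` is a hypothetical helper; neither comes from the answer above.

```csharp
using System;
using System.Collections.Generic;

// Condensed variant of the Move class: equality ignores the order of the positions.
public sealed class Move
{
    public int Position1 { get; }
    public int Position2 { get; }
    public Move(int p1, int p2) { Position1 = p1; Position2 = p2; }

    public override bool Equals(object obj) =>
        obj is Move other &&
        ((Position1 == other.Position1 && Position2 == other.Position2) ||
         (Position1 == other.Position2 && Position2 == other.Position1));

    // Addition is symmetric, so Move(2, 3) and Move(3, 2) share a hash code.
    public override int GetHashCode() => unchecked(Position1 + Position2);
}

public static class MoveDemo
{
    public static bool ListContainsReversed()
    {
        var moves = new List<Move> { new Move(2, 3) };
        return moves.Contains(new Move(3, 2));   // goes through Move.Equals
    }

    public static int DistinctCount()
    {
        // HashSet relies on both GetHashCode and Equals.
        var set = new HashSet<Move> { new Move(2, 3), new Move(3, 2), new Move(1, 4) };
        return set.Count;
    }
}
```

`ListContainsReversed()` returns true, and `DistinctCount()` returns 2: Move(1, 4) has the same hash code as Move(2, 3) but is kept apart by Equals.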
Here is a complete sample that demonstrates overriding Object.Equals.
For this you can use LINQ's Any function. If you want both combinations for the positions [ (2,3) or (3,2) ] you'll need to pass in two checks:
ListOfMoves.Any(x =>
(x.position1 == 2 && x.position2 == 3)
|| (x.position1 == 3 && x.position2 == 2) )
Any returns a bool, so you can wrap this line of code in an if statement or store the result for multiple uses.
Potential improvement
If you're going to be doing a lot of these checks (and you're using at least C# 7) you could consider some minor refactoring and use the built-in tuple types (comparing tuples with == needs C# 7.3): https://learn.microsoft.com/en-us/dotnet/csharp/tuples
Moves would become
public class Moves
{
public (int position1, int position2) positions { get; set; }
}
And the Any call would become
ListOfMoves.Any(x => x.positions == (2,3) || x.positions == (3,2))
Elsewhere in the code you can still access the underlying value of each position like so:
ListOfMoves[0].positions.position1
Obviously depends on what else is going on in your code so totally up to you!
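If the order of the two positions never matters, one possible variation is to normalize the tuple once and use it as a set key, so a single lookup covers both orders. This is only a sketch; `MoveKeys`, `Normalize` and `ContainsEitherOrder` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class MoveKeys
{
    // Puts the smaller position first so (2, 3) and (3, 2) map to the same key.
    public static (int Low, int High) Normalize(int a, int b) =>
        a <= b ? (a, b) : (b, a);

    // Value tuples implement equality and hashing element by element,
    // so they can be used directly as HashSet keys.
    public static bool ContainsEitherOrder(HashSet<(int Low, int High)> seen, int a, int b) =>
        seen.Contains(Normalize(a, b));
}
```

A set filled with `MoveKeys.Normalize(3, 2)` then answers `ContainsEitherOrder(seen, 2, 3)` with true.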
Obviously it won't work, because you can't compare the entities themselves; you have to compare property values instead, as below, using System.Linq:
ListOfMoves.Where(x => x.position1 == 2 && x.position2 == 3)
Note: your posted code shouldn't compile in the first place.
You said: I need to get true if either Move(3,2) or (2,3) is in the List.
Then use Any() with a predicate that covers both orders:
if(ListOfMoves.Any(x => (x.position1 == 2 && x.position2 == 3) || (x.position1 == 3 && x.position2 == 2)))
{
// done something here
}

SortedSet with element duplication - can't remove element

I'm working on an implementation of the A-star algorithm in C# in Unity.
I need to evaluate a collection of Node :
class Node
{
public Cell cell;
public Node previous;
public int f;
public int h;
public Node(Cell cell, Node previous = null, int f = 0, int h = 0)
{
this.cell = cell;
this.previous = previous;
this.f = f;
this.h = h;
}
}
I have a SortedSet that stores several Node objects, sorted by the h property. However, I need to be able to store two nodes with the same h. So I've implemented a specific IComparer that sorts by h and reports equality only when two nodes represent the exact same cell.
class ByHCost : IComparer<Node>
{
public int Compare(Node n1, Node n2)
{
int result = n1.h.CompareTo(n2.h);
result = (result == 0) ? 1 : result;
result = (n1.cell == n2.cell) ? 0 : result;
return result;
}
}
My problem: I have a hard time removing things from my SortedSet (I named it openSet). Here is an example:
At some point in the algorithm, I need to remove a node from the set based on some criteria (NB: I use the isCell127 variable to focus my debugging on a single cell).
int removedNodesNb = openSet.RemoveWhere((Node n) => {
bool isSame = n.cell == candidateNode.cell;
bool hasWorseCost = n.f > candidateNode.f;
if(isCell127)
{
Debug.Log(isSame && hasWorseCost); // the predicate matches exactly once and Debug.Log prints true
}
return isSame && hasWorseCost;
});
if(isCell127)
{
Debug.Log($"removed {removedNodesNb}"); // 0 nodes were removed
}
Here, the RemoveWhere method seems to find a match but doesn't remove the node.
I tried another way :
Node worseNode = openSet.SingleOrDefault(n => {
bool isSame = n.cell == candidateNode.cell;
bool hasWorseCost = n.f > candidateNode.f;
return isSame && hasWorseCost;
});
if(isCell127)
{
Debug.Log($"does worseNode exists ? {worseNode != null}"); // Debug returns true, it does exist.
}
if(worseNode != null)
{
if(isCell127)
{
Debug.Log($"openSet length {openSet.Count}"); // 10
}
openSet.Remove(worseNode);
if(isCell127)
{
Debug.Log($"openSet length {openSet.Count}"); // 10 - It should have been 9.
}
}
I think the problem is related to my rather unusual IComparer, but I can't figure out what exactly the problem is.
Also, I would like to know whether there is a significant performance improvement from using a self-sorting SortedSet instead of a manually sorted List, especially in the A-star use case.
If I write out your test cases, you get:
1) n1.h < n2.h and n1.cell = n2.cell -> final result = 0
2) n1.h > n2.h and n1.cell = n2.cell -> final result = 0
3) n1.h = n2.h and n1.cell != n2.cell -> final result = 1
4) n1.h < n2.h and n1.cell != n2.cell -> final result = -1
5) n1.h > n2.h and n1.cell != n2.cell -> final result = 1
When the h values are equal (test number 3) you always return the same result, 1. That is no good: you need a further test on cell to settle the position, because the result cannot be told apart from another case that also returns 1 (test number 5).
I could test this with a sample, but I am pretty sure you break the sort.
So once you have clarified the test, I suggest you use LINQ with a list... it performs best.
I'll answer my own question because I now have a fairly complete answer.
Comparison
The Compare method of the IComparer interface needs to follow some rules. As @frenchy said, my own comparison was broken. Here are the mathematical fundamentals of a comparison that I had totally forgotten (I found them here):
1) A.CompareTo(A) must return zero.
2) If A.CompareTo(B) returns zero, then B.CompareTo(A) must return zero.
3) If A.CompareTo(B) returns zero and B.CompareTo(C) returns zero, then A.CompareTo(C) must return zero.
4) If A.CompareTo(B) returns a value other than zero, then B.CompareTo(A) must return a value of the opposite sign.
5) If A.CompareTo(B) returns a value x not equal to zero, and B.CompareTo(C) returns a value y of the same sign as x, then A.CompareTo(C) must return a value of the same sign as x and y.
6) By definition, any object compares greater than (or follows) null, and two null references compare equal to each other.
In my case, rule 4) (symmetry) was broken.
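The violation is easy to reproduce in isolation. The sketch below copies the behaviour of the original comparer and checks rule 4 for a pair of nodes; `Cell` and `Node` are trimmed to the members the comparer touches, and `ByHCostOriginal` and `SymmetryCheck` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal types: only what the comparer needs.
public sealed class Cell { }

public sealed class Node
{
    public Cell cell;
    public int h;
    public Node(Cell cell, int h) { this.cell = cell; this.h = h; }
}

// Same logic as the original ByHCost comparer from the question.
public sealed class ByHCostOriginal : IComparer<Node>
{
    public int Compare(Node n1, Node n2)
    {
        int result = n1.h.CompareTo(n2.h);
        result = (result == 0) ? 1 : result;
        return (n1.cell == n2.cell) ? 0 : result;
    }
}

public static class SymmetryCheck
{
    // Rule 4: Compare(a, b) and Compare(b, a) must have opposite signs
    // (both zero also passes, since -0 == 0).
    public static bool IsSymmetric(IComparer<Node> comparer, Node a, Node b) =>
        Math.Sign(comparer.Compare(a, b)) == -Math.Sign(comparer.Compare(b, a));
}
```

For two nodes on different cells with the same h, both `Compare(a, b)` and `Compare(b, a)` return 1, so `IsSymmetric` is false. With different h values the check passes.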
I needed to store multiple nodes with the same h property, but also to sort by that property. So I needed to avoid equality when the h values are the same.
Instead of returning a default value when the h comparison yields 0 (which broke the 4th rule), I decided to refine the comparison so that it never yields 0, using a value that is unique for each node instance. This implementation is probably not the best, and there may be a better source of unique values, but here is what I did.
private class Node
{
private static int globalIncrement = 0;
public Cell cell;
public Node previous;
public int f;
public int h;
public int uid;
public Node(Cell cell, Node previous = null, int f = 0, int h = 0)
{
Node.globalIncrement++;
this.cell = cell;
this.previous = previous;
this.f = f;
this.h = h;
this.uid = Node.globalIncrement;
}
}
private class ByHCost : IComparer<Node>
{
public int Compare(Node n1, Node n2)
{
if(n1.cell == n2.cell)
{
return 0;
}
int result = n1.h.CompareTo(n2.h);
result = (result == 0) ? n1.uid.CompareTo(n2.uid) : result; // Additional comparison that never yields 0. Depending on the use case and the number of objects, another source of unique values may be better.
return result;
}
}
RemoveWhere method
RemoveWhere uses a predicate to look through the collection, so I didn't think it cared about the comparison. But RemoveWhere internally uses the Remove method, which does care about the comparison. So even if RemoveWhere has found an element, if your comparison is inconsistent it will silently move on. That's a pretty weird implementation, no?
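For contrast, here is a minimal sketch of a set whose comparer is consistent, so that RemoveWhere removes what its predicate matches. It is a simplification rather than the code above: the hypothetical `Entry` type keeps only an h value and a unique id, and the cell check is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified element: sort key plus a unique tie-breaker.
public sealed class Entry
{
    public int H;     // sort key, duplicates allowed
    public int Uid;   // unique per instance
    public Entry(int h, int uid) { H = h; Uid = uid; }
}

// Consistent ordering: by H first, then by Uid, so two distinct entries never compare as 0.
public sealed class ByHThenUid : IComparer<Entry>
{
    public int Compare(Entry x, Entry y)
    {
        int byH = x.H.CompareTo(y.H);
        return byH != 0 ? byH : x.Uid.CompareTo(y.Uid);
    }
}

public static class OpenSetDemo
{
    // Builds a small set and returns how many entries with the given H were removed.
    public static int RemoveWithH(int h)
    {
        var openSet = new SortedSet<Entry>(new ByHThenUid())
        {
            new Entry(5, 1), new Entry(5, 2), new Entry(7, 3)
        };
        return openSet.RemoveWhere(e => e.H == h);
    }
}
```

`OpenSetDemo.RemoveWithH(5)` removes both entries with h = 5, because the comparer lets the set locate each matched element again.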

Create Unique Hashcode for the permutation of two Order Ids

I have a collection which is a permutation of two unique orders, where OrderId is unique. Thus it contains Order1 (Id = 1) and Order2 (Id = 2) as both 12 and 21. While processing a routing algorithm, a few conditions are checked, and once a combination is included in the final result, its reverse has to be ignored and needn't be considered for processing. Since the Id is an integer, I have created the following logic:
private static int GetPairKey(int firstOrderId, int secondOrderId)
{
var orderCombinationType = (firstOrderId < secondOrderId)
? new {max = secondOrderId, min = firstOrderId}
: new { max = firstOrderId, min = secondOrderId };
return (orderCombinationType.min.GetHashCode() ^ orderCombinationType.max.GetHashCode());
}
In the logic, I create a Dictionary<int,int> whose key is created by the GetPairKey method shown above. The method orders the two ids so that a given combination always yields the same hash code, which can then be inserted into and looked up in the Dictionary; the value is a dummy and is ignored.
However, the above logic seems to have a flaw and doesn't work as expected in all cases. What am I doing wrong? Should I try something different to create the hash code? Is something like the following a better choice? Please suggest:
Tuple.Create(minOrderId,maxOrderId).GetHashCode. The following is the relevant code usage:
foreach (var pair in localSavingPairs)
{
var firstOrder = pair.FirstOrder;
var secondOrder = pair.SecondOrder;
if (processedOrderDictionary.ContainsKey(GetPairKey(firstOrder.Id, secondOrder.Id))) continue;
Adding to the Dictionary is done by the following code:
processedOrderDictionary.Add(GetPairKey(firstOrder.Id, secondOrder.Id), 0); // the value 0 is a dummy and is not used
You need a value that can uniquely represent every possible pair.
That is different to a hash-code.
You could uniquely represent each value with a long or with a class or struct that contains all of the appropriate values. Since after a certain total size using long won't work any more, let's look at the other approach, which is more flexible and more extensible:
public class KeyPair : IEquatable<KeyPair>
{
public int Min { get; private set; }
public int Max { get; private set; }
public KeyPair(int first, int second)
{
if (first < second)
{
Min = first;
Max = second;
}
else
{
Min = second;
Max = first;
}
}
public bool Equals(KeyPair other)
{
return other != null && other.Min == Min && other.Max == Max;
}
public override bool Equals(object other)
{
return Equals(other as KeyPair);
}
public override int GetHashCode()
{
return unchecked(Max * 31 + Min);
}
}
Now, the GetHashCode() here will not be unique, but the KeyPair itself will be. Ideally the hashcodes will be very different to each other to better distribute these objects, but doing much better than the above depends on information about the actual values that will be seen in practice.
The dictionary will use that to find the item, but it will also use Equals to pick between those where the hash code is the same.
(You can experiment with this by having a version for which GetHashCode() always just returns 0. It will have very poor performance because collisions hurt performance and this will always collide, but it will still work).
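The long-based alternative mentioned earlier in this answer can be sketched as follows, under the assumption that both ids are 32-bit ints, so the pair fits exactly into 64 bits. `PairKey.Create` is a hypothetical helper name.

```csharp
using System;

public static class PairKey
{
    // Packs an unordered pair of ints into one long: the smaller id goes
    // into the high 32 bits, the larger into the low 32 bits.
    public static long Create(int firstOrderId, int secondOrderId)
    {
        int min = Math.Min(firstOrderId, secondOrderId);
        int max = Math.Max(firstOrderId, secondOrderId);
        // Casting through uint keeps the raw bit patterns and avoids sign extension.
        return unchecked(((long)(uint)min << 32) | (uint)max);
    }
}
```

Unlike a hash code, this key is collision-free: `Create(1, 2)` equals `Create(2, 1)` but differs from `Create(0, 3)`. The keys can go into a `HashSet<long>`, which also removes the need for the dummy dictionary value.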
First, 42.GetHashCode() returns 42. Second, 1 ^ 2 is identical to 2 ^ 1, so there's really no point in sorting numbers. Third, your "hash" function is very weak and produces a lot of collisions, which is why you're observing the flaws.
There are two options I can think of right now:
Use a slightly "stronger" hash function
Replace your Dictionary<int, int> with a Dictionary<string, int> whose keys are your two sorted numbers separated by whatever character you prefer, e.g. 56-6472
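To make the collisions concrete, the sketch below reproduces what the XOR key computes (int.GetHashCode() returns the value itself) next to the string key from the second option. `KeyOptions`, `XorKey` and `StringKey` are hypothetical names, and the string format assumes non-negative ids.

```csharp
using System;

public static class KeyOptions
{
    // Equivalent to GetPairKey in the question: XOR of the two ids.
    public static int XorKey(int firstOrderId, int secondOrderId) =>
        firstOrderId ^ secondOrderId;

    // Sorted ids joined by a separator, e.g. "56-6472".
    public static string StringKey(int firstOrderId, int secondOrderId) =>
        Math.Min(firstOrderId, secondOrderId) + "-" + Math.Max(firstOrderId, secondOrderId);
}
```

`XorKey(1, 2)`, `XorKey(0, 3)` and `XorKey(5, 6)` all produce 3, so three unrelated pairs would be treated as already processed. `StringKey` keeps them apart while still mapping (6472, 56) and (56, 6472) to the same key.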
Given that XOR is commutative (so (a ^ b) will always be the same as (b ^ a)) it seems to me that your ordering is misguided... I'd just
(new {firstOrderId, secondOrderId}).GetHashCode()
.Net will fix you up a good well-distributed hashing implementation for anonymous types.

Array sorting by two parameters

I'm having a little difficulty with Array.Sort. I have a class with two fields: one is a random string, the other a random number. Sorting by one parameter works fine, but I would like to sort by two. The first is the SUM of the numbers (from low to high), and THEN, if these sums are equal, the random string that is given to them (from low to high).
Can you give some hints and tips on how I can "merge" these two kinds of sort?
Array.Sort(Phonebook, delegate(PBook user1, PBook user2)
{ return user1.Sum().CompareTo(user2.Sum()); });
Console.WriteLine("ORDER");
foreach (PBook user in Phonebook)
{
Console.WriteLine(user.name);
}
That's how I order it with one parameter.
I think this is what you are after:
sourcearray.OrderBy(a=> a.sum).ThenBy(a => a.random)
Here is the general algorithm that you'll use for comparing multiple fields in a CompareTo method:
public int compare(MyClass first, MyClass second)
{
int firstComparison = first.FirstValue.CompareTo(second.FirstValue);
if (firstComparison != 0)
{
return firstComparison;
}
else
{
return first.SecondValue.CompareTo(second.SecondValue);
}
}
However, LINQ does make the syntax for doing this much easier, allowing you to only write:
Phonebook = Phonebook.OrderBy(book=> book.Sum())
.ThenBy(book => book.OtherProperty)
.ToArray();
You can do this in-place by using a custom IComparer<PBook>. The following should order your array as per your original code, but if two sums are equal it should fall back on the random string (which I've called RandomString):
public class PBookComparer : IComparer<PBook>
{
public int Compare(PBook x, PBook y)
{
// Sort null items to the top; you can drop this
// if you don't care about null items.
if (x == null)
return y == null ? 0 : -1;
else if (y == null)
return 1;
// Comparison of sums.
var sumCompare = x.Sum().CompareTo(y.Sum());
if (sumCompare != 0)
return sumCompare;
// Sums are the same; return comparison of strings
return String.Compare(x.RandomString, y.RandomString);
}
}
You call this as
Array.Sort(Phonebook, new PBookComparer());
You could just do this inline but it gets a bit hard to follow:
Array.Sort(Phonebook, (x, y) => {
int sc = x.Sum().CompareTo(y.Sum());
return sc != 0 ? sc : string.Compare(x.RandomString, y.RandomString); });
... Actually, that isn't too bad, although I have dropped the null checks.
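To try the two-level comparison on sample data, a self-contained sketch could look like this. The shape of `PBook` (a string plus a list of numbers) is an assumption, since the question does not show the class, and `PhonebookSort` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Assumed shape of the question's class: a random string and some numbers to sum.
public sealed class PBook
{
    public string RandomString { get; }
    public int[] Numbers { get; }
    public PBook(string randomString, params int[] numbers)
    {
        RandomString = randomString;
        Numbers = numbers;
    }
    public int Sum() => Numbers.Sum();
}

public static class PhonebookSort
{
    // Sorts a copy by Sum(), then by RandomString, and returns the strings in order.
    public static string[] SortedStrings(PBook[] phonebook)
    {
        var copy = (PBook[])phonebook.Clone();
        Array.Sort(copy, (x, y) =>
        {
            int bySum = x.Sum().CompareTo(y.Sum());
            // Ordinal comparison gives a culture-independent tie-break.
            return bySum != 0 ? bySum : string.CompareOrdinal(x.RandomString, y.RandomString);
        });
        return copy.Select(p => p.RandomString).ToArray();
    }
}
```

With entries "b" (sum 3), "a" (sum 3) and "c" (sum 1), the result is c, a, b: the lowest sum first, then the tie between "a" and "b" resolved by the string.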

IEqualityComparer for Value Objects

I have an immutable Value Object, IPathwayModule, whose value is defined by:
(int) Block;
(Entity) Module, identified by (string) ModuleId;
(enum) Status; and
(entity) Class, identified by (string) ClassId - which may be null.
Here's my current IEqualityComparer implementation which seems to work in a few unit tests. However, I don't think I understand what I'm doing well enough to know whether I am doing it right. A previous implementation would sometimes fail on repeated test runs.
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
public bool Equals(IPathwayModule x, IPathwayModule y)
{
int hx = GetHashCode(x);
int hy = GetHashCode(y);
return hx == hy;
}
public int GetHashCode(IPathwayModule obj)
{
int h;
if (obj.Class != null)
{
h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + obj.Class.ClassId.GetHashCode();
}
else
{
h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + "NOCLASS".GetHashCode();
}
return h;
}
}
IPathwayModule is definitely immutable and different instances with the same values should be equal and produce the same HashCode since they are used as items within HashSets.
I suppose my questions are:
Am I using the interface correctly in this case?
Are there cases where I might not see the desired behaviour?
Is there any way to improve the robustness, performance?
Are there any good practices that I am not following?
Don't implement Equals in terms of the hash function's results; it's too fragile. Rather, do a field value comparison for each of the fields. Something like:
return x != null && y != null && x.Name.Equals(y.Name) && x.Type.Equals(y.Type) ...
Also, the hash functions results aren't really amenable to addition. Try using the ^ operator instead.
return obj.Name.GetHashCode() ^ obj.Type.GetHashCode() ...
You don't need the null check in GetHashCode. If that value is null, you've got bigger problems, no use trying to recover from something over which you have no control...
The only big problem is the implementation of Equals. Hash codes are not unique, you can get the same hash code for objects which are different. You should compare each field of IPathwayModule individually.
GetHashCode() can be improved a bit. You don't need to call GetHashCode() on an int. The int itself is a good hash code. The same for enum values. Your GetHashCode could be then implemented like this:
public int GetHashCode(IPathwayModule obj)
{
unchecked {
int h = obj.Block + obj.Module.ModuleId.GetHashCode() + (int) obj.Status;
if (obj.Class != null)
h += obj.Class.ClassId.GetHashCode();
return h;
}
}
The 'unchecked' block is necessary because there may be overflows in the arithmetic operations.
You shouldn't use GetHashCode() as the main way of comparing objects. Compare them field by field.
There could be multiple objects with the same hash code (this is called a 'hash code collision').
Also, be careful when adding together multiple integer values, since in a checked context you can easily cause an OverflowException. Use 'exclusive or' (^) to combine hash codes, or wrap the code in an 'unchecked' block.
You should implement better versions of Equals and GetHashCode.
For instance, the hash code of enums is simply their numerical value.
In other words, with these two enums:
public enum A { x, y, z }
public enum B { k, l, m }
Then with your implementation, the following value type:
public struct AB {
public A A;
public B B;
}
the following two values would be considered equal:
AB ab1 = new AB { A = A.x, B = B.m };
AB ab2 = new AB { A = A.z, B = B.k };
I'm assuming you don't want that.
Also, passing the value types as interfaces will box them, which could raise performance concerns, although probably not by much. You might consider making the IEqualityComparer implementation take your value types directly.
Assuming that two objects are equal because their hash codes are equal is wrong. You need to compare all members individually.
It is probably better to use ^ rather than + to combine the hash codes.
If I understand you correctly, you'd like to hear some comments on your code. Here are my remarks:
1) The hash codes in GetHashCode should be XOR'ed together, not added. XOR (^) gives a better chance of preventing collisions.
2) You compare hash codes. That's good, but only do this if the underlying object overrides GetHashCode. If not, use the properties and their hash codes and combine them.
3) Hash codes are important: they make a quick comparison possible. But if the hash codes are equal, the objects can still be different. This happens rarely, but you'll need to compare the fields of your object if the hash codes are equal.
4) You say your value types are immutable, but you reference objects (.Class) which are not immutable.
5) Always optimize comparison by adding a reference comparison as the first test. If the references are unequal, the objects are unequal, and then the structs are unequal.
Point 5 depends on whether you want the objects referenced by your value type to compare as not equal when they are not the same reference.
EDIT: you compare many strings. String comparison is optimized in C#. You can, as others suggested, use == with them in your comparison. For GetHashCode, use XOR (^), as suggested by others as well.
Thanks to all who responded. I have aggregated everyone's feedback, and my improved IEqualityComparer now looks like this:
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
public bool Equals(IPathwayModule x, IPathwayModule y)
{
if (x == y) return true;
if (x == null || y == null) return false;
if ((x.Class == null) ^ (y.Class == null)) return false;
if (x.Class == null) //and implicitly y.Class == null
{
return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId);
}
return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId) && x.Class.ClassId.Equals(y.Class.ClassId);
}
public int GetHashCode(IPathwayModule obj)
{
unchecked {
int h = obj.Block ^ obj.Module.ModuleId.GetHashCode() ^ (int) obj.Status;
if (obj.Class != null)
{
h ^= obj.Class.ClassId.GetHashCode();
}
return h;
}
}
}
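One possible way to shorten a comparer like this is to let a value tuple do the field-by-field equality and the hash combination. The sketch below uses a hypothetical, flattened `PathwayModule` class (ids as plain strings, a made-up `Status` enum) in place of IPathwayModule, so it illustrates the technique without being a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the question's types.
public enum Status { Planned, Active }

public sealed class PathwayModule
{
    public int Block { get; }
    public string ModuleId { get; }
    public Status Status { get; }
    public string ClassId { get; }   // may be null
    public PathwayModule(int block, string moduleId, Status status, string classId)
    {
        Block = block; ModuleId = moduleId; Status = status; ClassId = classId;
    }
}

public sealed class PathwayModuleComparer : IEqualityComparer<PathwayModule>
{
    // All fields that define the value, gathered in one tuple.
    private static (int, string, Status, string) Key(PathwayModule m) =>
        (m.Block, m.ModuleId, m.Status, m.ClassId);

    public bool Equals(PathwayModule x, PathwayModule y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Key(x).Equals(Key(y));   // element-wise comparison, tolerates a null ClassId
    }

    // The tuple combines the element hash codes; a null ClassId is handled too.
    public int GetHashCode(PathwayModule obj) => Key(obj).GetHashCode();
}
```

Because Equals and GetHashCode are derived from the same key, they cannot drift apart when a field is added later.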
