How to guarantee equal hash codes if all properties are "Equal"? - c#

Is my current solution, (A, B, C, D, ...).GetHashCode(), guaranteed to always be the same for tuples with "Equal" items?
public class Pair
{
public int X { get; set; }
public int Y { get; set; }
public Pair(int x, int y)
{
X = x;
Y = y;
}
public override bool Equals(object other) => Equals(other as Pair);
public virtual bool Equals(Pair other)
{
if (other is null)
{
return false;
}
if (object.ReferenceEquals(this, other))
{
return true;
}
if (this.GetType() != other.GetType())
{
return false;
}
return X == other.X && Y == other.Y;
}
public override int GetHashCode() => (X, Y).GetHashCode();
public static bool operator ==(Pair lhs, Pair rhs)
{
if (lhs is null)
{
if (rhs is null)
{
return true;
}
return false;
}
return lhs.Equals(rhs);
}
public static bool operator !=(Pair lhs, Pair rhs) => !(lhs == rhs);
}
Is this code always guaranteed to print 1?
var uniquePairs = new HashSet<Pair>();
uniquePairs.Add(new Pair(2, 4));
uniquePairs.Add(new Pair(2, 4));
uniquePairs.Add(new Pair(2, 4));
uniquePairs.Add(new Pair(2, 4));
Console.WriteLine(uniquePairs.Count);
What about for a greater number of non-trivial type properties?
What are reliable GetHashCode solutions that can be used for classes like these, which guarantee equal hash codes if all (not necessarily int) members are equal?

People usually combine the fields' hash codes with some arithmetic (multiplying by constant factors, for example) to get a well-distributed, pseudo-random result that still computes to the same value whenever all fields are equal. Have a look at this:
General advice and guidelines on how to properly override object.GetHashCode()
Also, look at the Microsoft documentation if you want more information on the subject.
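As a minimal sketch of that arithmetic approach, assuming .NET Core 2.1 or later (where System.HashCode is available): the Point3 type and its Label member below are hypothetical, not from the question. Like (X, Y).GetHashCode(), HashCode.Combine derives its result only from the members' own hash codes, so two objects whose members are all equal get the same hash code within one process, provided each member's GetHashCode is consistent with its Equals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration; not from the question.
public sealed class Point3 : IEquatable<Point3>
{
    public int X { get; }
    public int Y { get; }
    public string Label { get; }

    public Point3(int x, int y, string label)
    {
        X = x;
        Y = y;
        Label = label;
    }

    public bool Equals(Point3 other) =>
        !(other is null) && X == other.X && Y == other.Y && Label == other.Label;

    public override bool Equals(object obj) => Equals(obj as Point3);

    // Uses exactly the members that Equals compares, so equal objects
    // always produce equal hash codes within one process.
    public override int GetHashCode() => HashCode.Combine(X, Y, Label);
}

public static class HashDemo
{
    public static int CountUnique()
    {
        var set = new HashSet<Point3>
        {
            new Point3(2, 4, "a"),
            new Point3(2, 4, "a"),
            new Point3(2, 4, "a"),
        };
        return set.Count; // duplicates are rejected because Equals and GetHashCode agree
    }

    public static void Main() => Console.WriteLine(CountUnique());
}
```

The resulting values can differ from one run of the program to the next, so they are suitable for in-memory collections but should not be persisted.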
If your goal is simply to have classes whose equality is determined by fields matching rather than by reference, C# has a new record reference type you can use which does this by default. If you're using the latest version of C#/.NET this would be the way to go.
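A short sketch of the record approach, assuming C# 9 or later; PairRecord is a hypothetical name standing in for the question's Pair class. The compiler generates Equals, GetHashCode, == and != from the declared properties.

```csharp
using System;
using System.Collections.Generic;

// Assumes C# 9 or later. Positional record properties are init-only,
// so the hash code cannot change after construction.
public record PairRecord(int X, int Y);

public static class RecordDemo
{
    public static int CountUnique()
    {
        var set = new HashSet<PairRecord>();
        set.Add(new PairRecord(2, 4));
        set.Add(new PairRecord(2, 4));
        return set.Count; // the second Add is rejected as a duplicate
    }

    public static void Main() => Console.WriteLine(CountUnique());
}
```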
If you get into anything really complicated that has to be secure, consider using a robust hash algorithm like SHA-256: take all your fields, turn them into a padded buffer of bytes, and run them through SHA-256 (all of this is found in System.Security.Cryptography). You then take the SHA-256 output and select 4 bytes of it to produce a 32-bit integer. Collisions are very, very unlikely (but of course still possible with only 32 bits).
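The SHA-256 idea could look roughly like the sketch below. The StableHash helper and its three parameters are hypothetical; the layout (two fixed-size ints followed by one UTF-8 string) is an assumption chosen so that the byte encoding is unambiguous. With several variable-length fields you would also need length prefixes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ShaHashDemo
{
    // Serializes the fields into a byte buffer, hashes it with SHA-256,
    // and keeps the first four bytes of the digest as a 32-bit value.
    public static int StableHash(int x, int y, string name)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name ?? string.Empty);
        byte[] buffer = new byte[8 + nameBytes.Length];
        BitConverter.GetBytes(x).CopyTo(buffer, 0);
        BitConverter.GetBytes(y).CopyTo(buffer, 4);
        nameBytes.CopyTo(buffer, 8);

        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(buffer);
            return BitConverter.ToInt32(digest, 0);
        }
    }

    public static void Main() =>
        Console.WriteLine(StableHash(2, 4, "a") == StableHash(2, 4, "a"));
}
```

Unlike the built-in hash codes, this value does not depend on a per-process seed, although BitConverter follows the machine's byte order. It is far slower than ordinary hash combining, so it only makes sense when that stability or security actually matters.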

Related

Efficient way of retrieving an item from a collection with a mutable key

I have a collection of items Foos that have a property FooPosition, and I need to quickly access Foos by their positions.
For example : retrieve a Foo which is located at X=0 and Y=1.
My first thought was to use a dictionary for that purpose and to use the FooPosition as the dictionary key. I know that every FooPosition in my collection is unique, and I don't mind throwing an Exception if that is not the case.
This works well as long as Foos do not move all over the place.
But, as I figured out the hard way, and understood thanks to this and this posts, this does not work anymore if the FooPosition is updated. I shouldn't use mutable keys in a dictionary: the dictionary stores each entry under the hash code computed when the key was added and does not update it when the underlying FooPosition is modified. Therefore, calling dic[Position(0,1)] gives me the Foo which was at this position when the dictionary was built.
So, I am now wondering what I should use to retrieve Foos by their positions efficiently.
By efficiently I mean not going through the whole collection every time I query for a Foo by its position. Is there a suitable structure which would accommodate mutable keys?
Thanks for your help
EDIT
As mentioned rightfully in comments, there is a missing part in my question : I have no control over Foo Moves. The software is actually connected to another software (Excel via VSTO) via a COM Protocol which changes the FooPosition (Excel Ranges) without notifying the change.
Therefore, I cannot take any action when a move happens because I don't know that a change did happen.
public class FooManager
{
public void DoSomething(IList<Foo> foos) {
Dictionary<FooPosition, Foo> fooPositionDictionary = foos.ToDictionary(x => x.Position, x => x); //I know that position is unique
//Move Foos all around the place by changing their positions.
FooPosition queryPosition = new FooPosition(0, 1);
fooPositionDictionary.TryGetValue(queryPosition, out var foo1); //DOES NOT WORK
var foo2 = foos.FirstOrDefault(x => x.Position == queryPosition); //NOT EFFICIENT
//Any better idea?
}
}
public class Foo
{
public string Name { get; set; }
public FooPosition Position { get; set; }
}
public class FooPosition : IEquatable<FooPosition>
{
public int X { get; set; }
public int Y { get; set; }
public FooPosition(int x, int y)
{
X = x;
Y = y;
}
public void MoveBy(int i)
{
X = X + i;
Y = Y + i;
}
public bool Equals(FooPosition other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return X == other.X && Y == other.Y;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((FooPosition) obj);
}
public override int GetHashCode()
{
unchecked
{
return (X * 397) ^ Y;
}
}
public static bool operator ==(FooPosition left, FooPosition right)
{
return Equals(left, right);
}
public static bool operator !=(FooPosition left, FooPosition right)
{
return !Equals(left, right);
}
}
In some sense a dictionary, like any other hash-based data store, is a kind of cache; in this case the hashes are cached. But as with every cache, you need some data that stays constant during the lifetime of the store. If there is no such constant data, there's no way to cache efficiently.
So you end up storing all items in some linear collection, e.g. a List<T>, and iterating over that list again and again.
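If several lookups happen in a batch between moves, one hedged middle ground is to rebuild a short-lived index right before the batch, keyed by an immutable snapshot of each position. This is only a sketch: BuildIndex is a hypothetical helper, Foo and FooPosition are minimal stand-ins for the question's types, and it assumes positions do not change while the batch is running. It costs one linear pass per batch rather than one per query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the question's types.
public class FooPosition
{
    public int X { get; set; }
    public int Y { get; set; }
    public FooPosition(int x, int y) { X = x; Y = y; }
}

public class Foo
{
    public string Name { get; set; }
    public FooPosition Position { get; set; }
}

public static class SnapshotIndexDemo
{
    // Copies each position into an immutable value tuple, so later changes
    // to FooPosition cannot corrupt this dictionary (they only make it stale).
    public static Dictionary<(int X, int Y), Foo> BuildIndex(IEnumerable<Foo> foos) =>
        foos.ToDictionary(f => (f.Position.X, f.Position.Y), f => f);

    public static string FindNameAfterMove()
    {
        var foos = new List<Foo>
        {
            new Foo { Name = "a", Position = new FooPosition(0, 0) },
            new Foo { Name = "b", Position = new FooPosition(5, 5) },
        };

        foos[0].Position.Y = 1;          // a move we were not told about
        var index = BuildIndex(foos);    // rebuilt right before querying

        return index.TryGetValue((0, 1), out Foo found) ? found.Name : null;
    }

    public static void Main() => Console.WriteLine(FindNameAfterMove());
}
```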

Dictionary with class as Key

I am studying electronic engineering, and I am a beginner in C#. I have measured data and I would like to store it in a 2 dimensional way. I thought I could make a Dictionary like this:
Dictionary<Key, string> dic = new Dictionary<Key, string>();
"Key" here is my own class with two int variables. Now I want to store the data in this Dictionary, but it doesn't work so far. When I try to read the data with a specific Key, the error message says that the Key is not present in the Dictionary.
Here is the class Key:
public partial class Key
{
public Key(int Bahn, int Zeile) {
myBahn = Bahn;
myZeile = Zeile;
}
public int getBahn()
{
return myBahn;
}
public int getZeile()
{
return myZeile;
}
private int myBahn;
private int myZeile;
}
To test it, I did something like this:
Getting elements in:
Key KE = new Key(1,1);
dic.Add(KE, "hans");
...
Getting elements out:
Key KE = new Key(1,1);
monitor.Text = dic[KE];
Does anyone have an idea?
You need to override methods GetHashCode and Equals in your own class to use it as a key.
class Foo
{
public string Name { get; set;}
public int FooID {get; set;}
public override int GetHashCode()
{
return FooID;
}
public override bool Equals(object obj)
{
return Equals(obj as Foo);
}
public bool Equals(Foo obj)
{
return obj != null && obj.FooID == this.FooID;
}
}
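Applied to the question's Key class, the same idea could be sketched as follows. This is an adaptation, not the asker's code: the fields are made readonly and the class sealed (both assumptions), and the 397 multiplier is just a commonly used odd prime.

```csharp
using System;
using System.Collections.Generic;

public sealed class Key : IEquatable<Key>
{
    private readonly int myBahn;
    private readonly int myZeile;

    public Key(int bahn, int zeile)
    {
        myBahn = bahn;
        myZeile = zeile;
    }

    public int getBahn() => myBahn;
    public int getZeile() => myZeile;

    public bool Equals(Key other) =>
        !(other is null) && myBahn == other.myBahn && myZeile == other.myZeile;

    public override bool Equals(object obj) => Equals(obj as Key);

    // Combines both fields; unchecked lets the multiplication wrap on overflow.
    public override int GetHashCode()
    {
        unchecked
        {
            return (myBahn * 397) ^ myZeile;
        }
    }
}

public static class KeyDemo
{
    public static string Lookup()
    {
        var dic = new Dictionary<Key, string>();
        dic.Add(new Key(1, 1), "hans");
        return dic[new Key(1, 1)]; // a different instance with equal fields
    }

    public static void Main() => Console.WriteLine(Lookup());
}
```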
Though you could use a class as a key by implementing your own Equals and GetHashCode, I would not advise doing so if you're not yet familiar with C#.
These methods will be invoked by C# internal libraries, which expect them to work exactly as per specification, handling all edge cases gracefully. If you put a bug in them, you might be in for an unpleasant head scratching session.
In my opinion, it would be no less efficient and way simpler to create a key on the spot using tried, true and tested existing types that already support hashing and comparison.
From your angular coordinates, e.g.:
int Bahn = 15;
int Zeile = 30;
You could use a string (e.g. "15,30"):
String Key (int Bahn, int Zeile) { return $"{Bahn},{Zeile}"; }
var myDict = new Dictionary<string, string>();
myDict.Add (Key(Bahn,Zeile), myString);
or a two elements tuple (e.g. <15,30>) if you need something more efficient:
Tuple<int,int> Key (int Bahn, int Zeile) { return Tuple.Create(Bahn,Zeile); }
var myDict = new Dictionary<Tuple<int, int>, string>();
myDict.Add (Key(Bahn,Zeile), myString);
or a mere combination of your two angles if the range is small enough to fit into an int (e.g. 15+30*360) if you need something even more efficient:
int Key (int Bahn, int Zeile) { return Bahn+360*Zeile; }
var myDict = new Dictionary<int, string>();
myDict.Add (Key(Bahn,Zeile), myString);
That seems a lot less cumbersome than:
class Key {
// truckloads of code to implement the class,
// not to mention testing it thoroughly, including edge cases
}
var myDict = new Dictionary<Key, string>();
myDict.Add (new Key(Bahn,Zeile), myString);
Mutability of the key
Also, note that your keys must be immutable as long as they are used to index an entry.
If you change the value of Bahn or Zeile after the key has been used to add an element, you will mess up your dictionary something bad.
The behaviour is undefined, but you will most likely lose random entries, cause memory leaks and possibly crash with an exception if the internal libraries end up detecting an inconsistent state (like several entries indexed by the same key).
For instance:
var myKey = new Key(15, 30);
foreach (String data in row_of_data_sampled_every_10_degrees)
{
myDict.Add (myKey, data); // myKey must remain constant until the entry is removed
myKey.Bahn += 10; // changing it now spells the death of your dictionary
}
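To make the failure concrete, here is a small sketch with a hypothetical MutableKey class (not part of the question) whose hash code depends on settable properties. On current .NET implementations the entry becomes unreachable after the mutation: it stays in the dictionary, but neither the mutated key nor a fresh key with the original values finds it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key, used only to demonstrate the failure.
public sealed class MutableKey
{
    public int Bahn { get; set; }
    public int Zeile { get; set; }

    public MutableKey(int bahn, int zeile) { Bahn = bahn; Zeile = zeile; }

    public override bool Equals(object obj) =>
        obj is MutableKey k && Bahn == k.Bahn && Zeile == k.Zeile;

    public override int GetHashCode()
    {
        unchecked { return (Bahn * 397) ^ Zeile; }
    }
}

public static class MutationDemo
{
    public static (bool ByMutatedKey, bool ByOriginalValues) Run()
    {
        var dict = new Dictionary<MutableKey, string>();
        var key = new MutableKey(15, 30);
        dict.Add(key, "sample");

        key.Bahn += 10; // the entry is still filed under the old hash code

        return (dict.ContainsKey(key), dict.ContainsKey(new MutableKey(15, 30)));
    }

    public static void Main() => Console.WriteLine(Run());
}
```

The first lookup fails because the new hash code no longer matches the stored one; the second fails because the stored key object no longer equals (15, 30).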
A side note on hashing angular coordinates
Now the catch is, the generic hashing functions provided for ints, strings and tuples are unlikely to produce optimal results for your specific set of data.
I would advise starting with a simple solution and only resorting to specialized code if you run into actual performance issues. In that case you would probably be better off using a data structure more suited to spatial indexing (typically a quadtree in case of polar coordinates, or an octree if you want to reconstruct a 3D model from your scanner data).
For the sake of an alternate opinion, and not to be disparaging of Mr. Kuroi's solution (which is a good one), here is a simple class that can be used as a key in a map as well as for other purposes. We use a complex class as a key because we want to track other things. In this example, we arrive at a vertex in a graph and we want to know whether we have visited it before.
<code>
public class Vertex : IComparable<Vertex>, IEquatable<Vertex>, IComparable
{
String m_strVertexName = String.Empty;
bool m_bHasVisited = false;
public Vertex()
{
}
public Vertex(String strVertexName) : this()
{
m_strVertexName = strVertexName;
}
public override string ToString()
{
return m_strVertexName;
}
public string VertexName
{
get { return m_strVertexName; }
set
{
if (!String.IsNullOrEmpty(value))
m_strVertexName = value;
}
}
public bool HasVisited
{
get { return m_bHasVisited; }
set { m_bHasVisited = value; }
}
public override int GetHashCode()
{
return ToString().GetHashCode();
}
public int CompareTo(Vertex rhs)
{
if (Equals(rhs))
return 0;
return ToString().CompareTo(rhs.ToString());
}
int IComparable.CompareTo(object rhs)
{
if (!(rhs is Vertex))
throw new InvalidOperationException("CompareTo: Not a Vertex");
return CompareTo((Vertex)rhs);
}
public static bool operator < (Vertex lhs, Vertex rhs) => lhs.CompareTo(rhs) < 0;
public static bool operator > (Vertex lhs, Vertex rhs) => lhs.CompareTo(rhs) > 0;
public bool Equals (Vertex rhs) => !(rhs is null) && ToString() == rhs.ToString();
public override bool Equals(object rhs)
{
if (!(rhs is Vertex))
return false;
return Equals((Vertex)rhs);
}
public static bool operator == (Vertex lhs, Vertex rhs) => lhs.Equals(rhs);
public static bool operator != (Vertex lhs, Vertex rhs) => !(lhs == rhs);
}
</code>

Overriding GetHash() And Equals()

I am having trouble overriding the GetHashCode() method and the Equals() method.
public class Coordinate
{
int x;
int y;
public Coordinate(int p,int q)
{
this.x = p ;
this.y = q;
}
}
Suppose I create two Coordinate objects with the same x and y coordinates.
I want my program to understand that they are equal.
Coordinate point1 = new Coordinate(0,0);
Coordinate point2 = new Coordinate(0,0);
By default they return different values from GetHashCode(), as expected.
I want them to give same hash code by overriding it and then use that hash code as a Key to generate values from a Dictionary. After searching about it, I know that I also have to override Equals().
You have to override Equals(), because if two objects have the same hashcode, it doesn't mean they are to be considered equal. The hashcode simply acts as an "index" to speed up searches.
Every time you use new, an instance is created and it will not be the same instance as another instance. This is what ReferenceEquals() checks - imagine two identical bottles of soda - they're the same, but they're not the same bottle.
Equals() is meant to check whether you (the developer) want to consider two instances as equal, even though they are not the same instance.
You can implement something in this vein:
public override bool Equals(Object o) {
// Same instance: trivially equal
if (object.ReferenceEquals(o, this))
return true;
// Null or a different type: not equal
Coordinate other = o as Coordinate;
if (null == other)
return false;
return x == other.x && y == other.y;
}
public override int GetHashCode() {
return x.GetHashCode() ^ y.GetHashCode();
}
where Equals returns true if and only if the instances are equal, while GetHashCode() gives a quick estimate (instances with different hash codes are never equal; the reverse, however, is not true) and should distribute the hashes as uniformly as possible (so that Dictionary and similar structures end up with roughly the same number of values per bucket).
https://msdn.microsoft.com/en-us/library/336aedhh(v=vs.100).aspx
https://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.110).aspx
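A small sketch of why equal hash codes do not imply equal objects, using the XOR-based hash from this answer: because XOR is symmetric, (1, 2) and (2, 1) collide, and it is Equals that keeps them apart in a HashSet. The CollisionDemo class is hypothetical, and the Coordinate class here is a condensed restatement of the one above.

```csharp
using System;
using System.Collections.Generic;

public sealed class Coordinate
{
    private readonly int x;
    private readonly int y;

    public Coordinate(int p, int q) { x = p; y = q; }

    public override bool Equals(object o) =>
        o is Coordinate other && x == other.x && y == other.y;

    public override int GetHashCode() => x.GetHashCode() ^ y.GetHashCode();
}

public static class CollisionDemo
{
    public static (bool SameHash, bool AreEqual, int SetCount) Run()
    {
        var a = new Coordinate(1, 2);
        var b = new Coordinate(2, 1);
        var set = new HashSet<Coordinate> { a, b };
        // XOR is symmetric, so (1, 2) and (2, 1) collide; Equals tells them apart.
        return (a.GetHashCode() == b.GetHashCode(), a.Equals(b), set.Count);
    }

    public static void Main() => Console.WriteLine(Run());
}
```

The set still holds both points, so correctness is preserved; collisions like this only cost lookup speed, which is why an order-sensitive combination is often preferred for coordinates.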
I would override the named methods like this. For the GetHashCode method I took one of several options from this question, but you can choose another if you like.
I also changed the class to immutable. You should only use immutable properties/fields to calculate the hashcode.
public class Coordinate {
public Coordinate(int p, int q) {
x = p;
y = q;
}
private readonly int x;
private readonly int y;
public int X { get { return x; } }
public int Y { get { return y; } }
public override int GetHashCode() {
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ x.GetHashCode();
hash = (hash * 16777619) ^ y.GetHashCode();
return hash;
}
}
public override bool Equals(object obj) {
if (obj == null)
return false;
var otherCoordinate = obj as Coordinate;
if (otherCoordinate == null)
return false;
return
this.X == otherCoordinate.X &&
this.Y == otherCoordinate.Y;
}
}
Here's a simple way to do it.
First, override the ToString() method of your class to something like this:
public override string ToString()
{
return string.Format("[{0}, {1}]", this.x, this.y);
}
Now you can easily override GetHashCode() and Equals() like this:
public override int GetHashCode()
{
return this.ToString().GetHashCode();
}
public override bool Equals(object obj)
{
return obj is Coordinate && obj.ToString() == this.ToString();
}
Now if you try this:
Coordinate p1 = new Coordinate(5, 0);
Coordinate p2 = new Coordinate(5, 0);
Console.WriteLine(p1.Equals(p2));
you'll get:
True
What you are trying to do is typical for immutable objects. Anyway, if you don't want to use a struct, you can do it like this:
public class Coord : IEquatable<Coord>
{
public Coord(int x, int y)
{
this.X = x;
this.Y = y;
}
public int X { get; }
public int Y { get; }
public override int GetHashCode()
{
// Just pick numbers that are coprime (here 17 and 23)
int hash = 17;
hash = hash * 23 + this.X.GetHashCode();
hash = hash * 23 + this.Y.GetHashCode();
return hash;
}
public override bool Equals(object obj)
{
var casted = obj as Coord;
if (object.ReferenceEquals(this, casted))
{
return true;
}
return this.Equals(casted);
}
public static bool operator !=(Coord first, Coord second)
{
return !(first == second);
}
public static bool operator ==(Coord first, Coord second)
{
if (object.ReferenceEquals(second, null))
{
if (object.ReferenceEquals(first, null))
{
return true;
}
return false;
}
return first.Equals(second);
}
public bool Equals(Coord other)
{
if (object.ReferenceEquals(other, null))
{
return false;
}
return object.ReferenceEquals(this, other) || (this.X.Equals(other.X) && this.Y.Equals(other.Y));
}
}
Note: you really should make your class immutable if you implement custom equality, since a mutable key can break your code when it is used in a hash-based collection.
I think it is considered good practice to provide all of these overloads when you want custom equality checking like you do. When GetHashCode() returns the same value for two objects, Dictionary and other hash-based collections fall back to the default equality comparer, which calls IEquatable<T>.Equals if the type implements it and object.Equals otherwise.
Object.ReferenceEquals(Ob,Ob) determines reference equality, i.e. whether both references point to the same allocated object.
Object.Equals(Ob) is the virtual method on the object class; for classes it compares references by default, just like Object.ReferenceEquals(Ob,Ob).
Object.Equals(Ob,Ob) is a static helper that checks for reference equality and null first and then calls Ob.Equals(Ob).

Is it good to create wrapper struct for byte to represent char and small int?

In C# both integers and characters can be converted to byte:
byte b1 = (byte) 50;
byte b2 = (byte) '2';
However after conversion the information of the original type is lost. In the above example, b1 and b2 have the same value (50). By only looking at the value of b1 and b2, we have no way to tell if it used to represent an integer or a character.
You might be wondering why I care. I have a memory intensive application which maintains a table-like structure in memory. Different columns store value of different datatypes. Values are generally stored in large arrays within a column. I want to enjoy the memory advantage of byte since char in c# occupies two bytes and int occupies four (I know the value range is not an issue). However I need to know if the byte should be interpreted as a number or a char when I render the data to screen. Again in the above example, b2 should be displayed as "2" while b1 should be rendered as "50".
Here is the solution I came across:
public struct ByteChar
{
public readonly byte Value;
public ByteChar(char v)
{
Value = (byte) v;
}
/*
* Define two implicit conversion operators so that ByteChar can be used seamlessly with char
* ByteChar c = '1'; //char to ByteChar
* char a = c; //ByteChar to char
*/
public static implicit operator char(ByteChar v)
{
return (char)v.Value;
}
public static implicit operator ByteChar(char v)
{
return new ByteChar(v);
}
/*
* When converting to string, treat the value as a char instead of a number
*/
public override string ToString()
{
return ((char)Value).ToString();
}
public override bool Equals(Object obj)
{
return obj is ByteChar && this == (ByteChar)obj;
}
public override int GetHashCode()
{
return Value.GetHashCode();
}
public static bool operator == (ByteChar x, ByteChar y)
{
return x.Value == y.Value;
}
public static bool operator != (ByteChar x, ByteChar y)
{
return x.Value != y.Value;
}
}
Similarly I introduced ByteInt:
public struct ByteInt
{
public readonly byte Value;
public ByteInt(int v)
{
Value = (byte) v;
}
/*
* Define two implicit conversion operators so that ByteInt can be used seamlessly with int
* ByteInt i = 1; //int to ByteInt
* int j = i; //ByteInt to int
*/
public static implicit operator int(ByteInt v)
{
return (int)v.Value;
}
public static implicit operator ByteInt(int v)
{
return new ByteInt(v);
}
public override string ToString()
{
return Value.ToString();
}
public override bool Equals(Object obj)
{
return obj is ByteInt && this == (ByteInt)obj;
}
public override int GetHashCode()
{
return Value.GetHashCode();
}
public static bool operator == (ByteInt x, ByteInt y)
{
return x.Value == y.Value;
}
public static bool operator != (ByteInt x, ByteInt y)
{
return x.Value != y.Value;
}
}
Now I think this is good because not only can we developers tell what is being stored as a byte, but so can the compiler and the runtime: DoSomething (byte b), DoSomething (ByteChar b), DoSomething (ByteInt b) will be three different signatures.
My question is: is this a good way to solve my problem? Is there any bad thing about the above implementation? Have I missed any detail in the above implementation that could lead to potential pitfalls?
It is unlikely that you will save any significant amount of memory with this approach.
It is very likely that you will see reduced performance and additional complexity.
See: How are byte variables stored in memory?
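Since the question says each column holds a single data type, one lower-complexity alternative might be to keep a plain byte[] per column and record how to interpret it once at the column level. This is only a sketch under that assumption; ByteColumn and ByteColumnKind are hypothetical names.

```csharp
using System;

// Hypothetical names; the kind is stored once per column, not once per value.
public enum ByteColumnKind { Number, Character }

public sealed class ByteColumn
{
    private readonly byte[] values;

    public ByteColumnKind Kind { get; }

    public ByteColumn(ByteColumnKind kind, byte[] values)
    {
        Kind = kind;
        this.values = values;
    }

    // Interprets the raw byte according to the column's kind at render time.
    public string Render(int row) =>
        Kind == ByteColumnKind.Character
            ? ((char)values[row]).ToString()
            : values[row].ToString();
}

public static class ColumnDemo
{
    public static void Main()
    {
        var numbers = new ByteColumn(ByteColumnKind.Number, new byte[] { 50 });
        var chars = new ByteColumn(ByteColumnKind.Character, new byte[] { (byte)'2' });
        Console.WriteLine(numbers.Render(0)); // 50
        Console.WriteLine(chars.Render(0));   // 2
    }
}
```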

I'm implementing a CaseAccentInsensitiveEqualityComparer for Strings. I'm not sure how to implement the GetHashCode

My code is like this:
public class CaseAccentInsensitiveEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return string.Compare(x, y, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0;
}
public int GetHashCode(string obj)
{
// not sure what to put here
}
}
I know the role of GetHashCode in this context; what I'm missing is how to produce the InvariantCulture, IgnoreNonSpace and IgnoreCase version of obj so that I can return its hash code.
I could remove diacritics and the case from obj myself and then return its hash code, but I wonder if there's a better alternative.
Returning 0 inside GetHashCode() works (as pointed out by @Michael Perrenoud) because Dictionary and HashSet only call Equals() when GetHashCode() returns the same value for the two objects.
The rule is, GetHashCode() must return the same value if objects are equal.
The drawback is that the HashSet (or Dictionary) performance decreases to the point it becomes the same as using a List. To find an item it has to call Equals() for each comparison.
A faster approach would be converting to Accent Insensitive string and getting its hashcode.
Code to remove accent (diacritics) from this post
static string RemoveDiacritics(string text)
{
return string.Concat(
text.Normalize(NormalizationForm.FormD)
.Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
UnicodeCategory.NonSpacingMark)
).Normalize(NormalizationForm.FormC);
}
Comparer code:
public class CaseAccentInsensitiveEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return string.Compare(x, y, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0;
}
public int GetHashCode(string obj)
{
return obj != null ? RemoveDiacritics(obj).ToUpperInvariant().GetHashCode() : 0;
}
private string RemoveDiacritics(string text)
{
return string.Concat(
text.Normalize(NormalizationForm.FormD)
.Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
UnicodeCategory.NonSpacingMark)
).Normalize(NormalizationForm.FormC);
}
}
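Another option worth considering is to let CompareInfo compute the hash with the same options that Equals uses, instead of normalizing the string by hand. This sketch assumes a runtime that provides CompareInfo.GetHashCode(string, CompareOptions) (recent .NET versions do); CompareInfoComparer and ComparerDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class CompareInfoComparer : IEqualityComparer<string>
{
    private const CompareOptions Options =
        CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

    private static readonly CompareInfo Invariant =
        CultureInfo.InvariantCulture.CompareInfo;

    public bool Equals(string x, string y) => Invariant.Compare(x, y, Options) == 0;

    // Hashes with the same CompareInfo and options that Equals uses,
    // which keeps the two methods consistent with each other.
    public int GetHashCode(string obj) =>
        obj is null ? 0 : Invariant.GetHashCode(obj, Options);
}

public static class ComparerDemo
{
    public static int CountUnique(params string[] items)
    {
        var set = new HashSet<string>(items, new CompareInfoComparer());
        return set.Count;
    }

    public static void Main() =>
        Console.WriteLine(CountUnique("r\u00e9sum\u00e9", "RESUME", "Resume"));
}
```

On a runtime with full globalization data the three strings in Main should collapse into a single entry; in invariant-globalization mode accent handling may differ, so treat that part as an assumption to verify in your environment.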
Ah, excuse me, I had my methods mixed up. When I implemented something like this before I just returned the hash code of the object itself return obj.GetHashCode(); so that it would always enter the Equals method.
Okay, after much confusion I believe I've got myself straight. I found that returning zero, always, will force the comparer to use the Equals method. I'm looking for the code I implemented this in to prove that and put it up here.
Here's the code to prove it.
class MyArrayComparer : EqualityComparer<object[]>
{
public override bool Equals(object[] x, object[] y)
{
if (x.Length != y.Length) { return false; }
for (int i = 0; i < x.Length; i++)
{
if (!x[i].Equals(y[i]))
{
return false;
}
}
return true;
}
public override int GetHashCode(object[] obj)
{
return 0;
}
}
