General advice and guidelines on how to properly override object.GetHashCode() - c#

According to MSDN, a hash function must have the following properties:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
For the best performance, a hash function must generate a random distribution for all input.
I keep finding myself in the following scenario: I have created a class, implemented IEquatable<T> and overridden object.Equals(object). MSDN states that:
Types that override Equals must also override GetHashCode ; otherwise, Hashtable might not work correctly.
And then it usually stops up a bit for me. Because, how do you properly override object.GetHashCode()? Never really know where to start, and it seems to be a lot of pitfalls.
Here at StackOverflow, there are quite a few questions related to GetHashCode overriding, but most of them seems to be on quite particular cases and specific issues. So, therefore I would like to get a good compilation here. An overview with general advice and guidelines. What to do, what not to do, common pitfalls, where to start, etc.
I would like it to be especially directed at C#, but I would think it will work kind of the same way for other .NET languages as well(?).
I think maybe the best way is to create one answer per topic with a quick and short answer first (close to one-liner if at all possible), then maybe some more information and end with related questions, discussions, blog posts, etc., if there are any. I can then create one post as the accepted answer (to get it on top) with just a "table of contents". Try to keep it short and concise. And don't just link to other questions and blog posts. Try to take the essence of them and then rather link to source (especially since the source could disappear. Also, please try to edit and improve answers instead of created lots of very similar ones.
I am not a very good technical writer, but I will at least try to format answers so they look alike, create the table of contents, etc. I will also try to search up some of the related questions here at SO that answers parts of these and maybe pull out the essence of the ones I can manage. But since I am not very stable on this topic, I will try to stay away for the most part :p

Table of contents
When do I override object.GetHashCode?
Why do I have to override object.GetHashCode()?
What are those magic numbers seen in GetHashCode implementations?
Things that I would like to be covered, but haven't been yet:
How to create the integer (How to "convert" an object into an int wasn't very obvious to me anyways).
What fields to base the hash code upon.
If it should only be on immutable fields, what if there are only mutable ones?
How to generate a good random distribution. (MSDN Property #3)
Part to this, seems to choose a good magic prime number (have seen 17, 23 and 397 been used), but how do you choose it, and what is it for exactly?
How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2)
Especially when the equality is based upon mutable fields. (MSDN Property #1)
How to deal with fields that are complex types (not among the built-in C# types).
Complex objects and structs, arrays, collections, lists, dictionaries, generic types, etc.
For example, even though the list or dictionary might be readonly, that doesn't mean the contents of it are.
How to deal with inherited classes.
Should you somehow incorporate base.GetHashCode() into your hash code?
Could you technically just be lazy and return 0? Would heavily break MSDN guideline number #3, but would at least make sure #1 and #2 were always true :P
Common pitfalls and gotchas.

What are those magic numbers often seen in GetHashCode implementations?
They are prime numbers. Prime numbers are used for creating hash codes because prime number maximize the usage of the hash code space.
Specifically, start with the small prime number 3, and consider only the low-order nybbles of the results:
3 * 1 = 3 = 3(mod 8) = 0011
3 * 2 = 6 = 6(mod 8) = 1010
3 * 3 = 9 = 1(mod 8) = 0001
3 * 4 = 12 = 4(mod 8) = 1000
3 * 5 = 15 = 7(mod 8) = 1111
3 * 6 = 18 = 2(mod 8) = 0010
3 * 7 = 21 = 5(mod 8) = 1001
3 * 8 = 24 = 0(mod 8) = 0000
3 * 9 = 27 = 3(mod 8) = 0011
And we start over. But you'll notice that successive multiples of our prime generated every possible permutation of bits in our nybble before starting to repeat. We can get the same effect with any prime number and any number of bits, which makes prime numbers optimal for generating near-random hash codes. The reason we usually see larger primes instead of small primes like 3 in the example above is that, for greater numbers of bits in our hash code, the results obtained from using a small prime are not even pseudo-random - they're simply an increasing sequence until an overflow is encountered. For optimal randomness, a prime number that results in overflow for fairly small coefficients should be used, unless you can guarantee that your coefficients will not be small.
Related links:
Why is ‘397’ used for ReSharper GetHashCode override?

Why do I have to override object.GetHashCode()?
Overriding this method is important because the following property must always remain true:
If two objects compare as equal, the GetHashCode method for each object must return the same value.
The reason, as stated by JaredPar in a blog post on implementing equality, is that
Many classes use the hash code to classify an object. In particular hash tables and dictionaries tend to place objects in buckets based on their hash code. When checking if an object is already in the hash table it will first look for it in a bucket. If two objects are equal but have different hash codes they may be put into different buckets and the dictionary would fail to lookup the object.
Related links:
Do I HAVE to override GetHashCode and Equals in new Classes?
Do I need to override GetHashCode() on reference types?
Override Equals and GetHashCode Question
Why is it important to override GetHashCode when Equals method is overriden in C#?
Properly Implementing Equality in VB

Check out Guidelines and rules for GetHashCode by Eric Lippert

You should override it whenever you have a meaningful measure of equality for objects of that type (i.e. you override Equals). If you knew the object wasn't going to be hashed for any reason you could leave it, but it's unlikely you could know this in advance.
The hash should be based only on the properties of the object that are used to define equality since two objects that are considered equal should have the same hash code. In general you would usually do something like:
public override int GetHashCode()
{
int mc = //magic constant, usually some prime
return mc * prop1.GetHashCode() * prop2.GetHashCode * ... * propN.GetHashCode();
}
I usually assume multiplying the values together will produce a fairly uniform distribution, assuming each property's hashcode function does the same, although this may well be wrong. Using this method, if the objects equality-defining properties change, then the hash code is also likely to change, which is acceptable given definition #2 in your question. It also deals with all types in a uniform way.
You could return the same value for all instances, although this will make any algorithms that use hashing (such as dictionarys) very slow - essentially all instances will be hashed to the same bucket and lookup will then become O(n) instead of the expected O(1). This of course negates any benefits of using such structures for lookup.

A) You must override both Equals and GetHashCode if you want to employ value equality instead of the default reference equality. With the later, two object references compare as equal if they both refer to the same object instance. With the former they compare as equal if their value is the same even if they refer to different objects. For example, you probably want to employ value equality for Date, Money, and Point objects.
B) In order to implement value equality you must override Equals and GetHashCode. Both should depend on the fields of the object that encapsulate the value. For example, Date.Year, Date.Month and Date.Day; or Money.Currency and Money.Amount; or Point.X, Point.Y and Point.Z. You should also consider overriding operator ==, operator !=, operator <, and operator >.
C) The hashcode doesn't have to stay constant all through the object lifetime. However it must remain immutable while it participates as the key in a hash. From MSDN doco for Dictionary: "As long as an object is used as a key in the Dictionary<(Of <(TKey, TValue>)>), it must not change in any way that affects its hash value." If you must change the value of a key remove the entry from the dictionary, change the key value, and replace the entry.
D) IMO, you will simplify your life if your value objects are themselves immutable.

When do I override object.GetHashCode()?
As MSDN states:
Types that override Equals must also override GetHashCode ; otherwise, Hashtable might not work correctly.
Related links:
When to override GetHashCode()?

How to generate GetHashCode() and Equals()
Visual Studio 2017
https://learn.microsoft.com/en-us/visualstudio/ide/reference/generate-equals-gethashcode-methods?view=vs-2017
ReSharper
https://www.jetbrains.com/help/resharper/Code_Generation__Equality_Members.html

What fields to base the hash code upon? If it should only be on immutable fields, what if there are only mutable ones?
It doesn't need to be based only on immutable fields. I would base it on the fields that determine the outcome of the equals method.

How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2) Especially when the equality is based upon mutable fields. (MSDN Property #1)
You seem to misunderstand Property #2. The hashcode doesn't need to stay the same thoughout the objects lifetime. It just needs to stay the same as long as the values that determine the outcome of the equals method are not changed. So logically, you base the hashcode on those values only. Then there shouldn't be a problem.

public override int GetHashCode()
{
return IntProp1 ^ IntProp2 ^ StrProp3.GetHashCode() ^ StrProp4.GetHashCode ^ CustomClassProp.GetHashCode;
}
Do the same in the customClass's GetHasCode method. Works like a charm.

Related

What to return when overriding Object.GetHashCode() in classes with no immutable fields?

Ok, before you get all mad because there are hundreds of similar sounding questions posted on the internet, I can assure you that I have just spent the last few hours reading all of them and have not found the answer to my question.
Background:
Basically, one of my large scale applications had been suffering from a situation where some Bindings on the ListBox.SelectedItem property would stop working or the program would crash after an edit had been made to the currently selected item. I initially asked the 'An item with the same key has already been added' Exception on selecting a ListBoxItem from code question here, but got no answers.
I hadn't had time to address that problem until this week, when I was given a number of days to sort it out. Now to cut a long story short, I found out the reason for the problem. It was because my data type classes had overridden the Equals method and therefore the GetHashCode method as well.
Now for those of you that are unaware of this issue, I discovered that you can only implement the GetHashCode method using immutable fields/properties. Using a excerpt from Harvey Kwok's answer to the Overriding GetHashCode() post to explain this:
The problem is that GetHashCode is being used by Dictionary and HashSet collections to place each item in a bucket. If hashcode is calculated based on some mutable fields and the fields are really changed after the object is placed into the HashSet or Dictionary, the object can no longer be found from the HashSet or Dictionary.
So the actual problem was caused because I had used mutable properties in the GetHashCode methods. When users changed these property values in the UI, the associated hash code values of the objects changed and then items could no longer be found in their collections.
Question:
So, my question is what is the best way of handling the situation where I need to implement the GetHashCode method in classes with no immutable fields? Sorry, let me be more specific, as that question has been asked before.
The answers in the Overriding GetHashCode() post suggest that in these situations, it is better to simply return a constant value... some suggest to return the value 1, while other suggest returning a prime number. Personally, I can't see any difference between these suggestions because I would have thought that there would only be one bucket used for either of them.
Furthermore, the Guidelines and rules for GetHashCode article in Eric Lippert's Blog has a section titled Guideline: the distribution of hash codes must be "random" which highlights the pitfalls of using an algorithm that results in not enough buckets being used. He warns of algorithms that decrease the number of buckets used and cause a performance problem when the bucket gets really big. Surely, returning a constant falls into this category.
I had an idea of adding an extra Guid field to all of my data type classes (just in C#, not the database) specifically to be used in and only in the GetHashCode method. So I suppose at the end of this long intro, my actual question is which implementation is better? To summarise:
Summary:
When overriding Object.GetHashCode() in classes with no immutable fields, is it better to return a constant from the GetHashCode method, or to create an additional readonly field for each class, solely to be used in the GetHashCode method? If I should add a new field, what type should it be and shouldn't I then include it in the Equals method?
While I am happy to receive answers from anyone, I am really hoping to receive answers from advanced developers with a sound knowledge on this subject.
Go back to basics. You read my article; read it again. The two ironclad rules that are relevant to your situation are:
if x equals y then the hash code of x must equal the hash code of y. Equivalently: if the hash code of x does not equal the hash code of y then x and y must be unequal.
the hash code of x must remain stable while x is in a hash table.
Those are requirements for correctness. If you can't guarantee those two simple things then your program will not be correct.
You propose two solutions.
Your first solution is that you always return a constant. That meets the requirement of both rules, but you are then reduced to linear searches in your hash table. You might as well use a list.
The other solution you propose is to somehow produce a hash code for each object and store it in the object. That is perfectly legal provided that equal items have equal hash codes. If you do that then you are restricted such that x equals y must be false if the hash codes differ. This seems to make value equality basically impossible. Since you wouldn't be overriding Equals in the first place if you wanted reference equality, this seems like a really bad idea, but it is legal provided that equals is consistent.
I propose a third solution, which is: never put your object in a hash table, because a hash table is the wrong data structure in the first place. The point of a hash table is to quickly answer the question "is this given value in this set of immutable values?" and you don't have a set of immutable values, so don't use a hash table. Use the right tool for the job. Use a list, and live with the pain of doing linear searches.
A fourth solution is: hash on the mutable fields used for equality, remove the object from all hash tables it is in just before every time you mutate it, and put it back in afterwards. This meets both requirements: the hash code agrees with equality, and hashes of objects in hash tables are stable, and you still get fast lookups.
I would either create an additional readonly field or else throw NotSupportedException. In my view the other option is meaningless. Let's see why.
Distinct (fixed) hash codes
Providing distinct hash codes is easy, e.g.:
class Sample
{
private static int counter;
private readonly int hashCode;
public Sample() { this.hashCode = counter++; }
public override int GetHashCode()
{
return this.hashCode;
}
public override bool Equals(object other)
{
return object.ReferenceEquals(this, other);
}
}
Technically you have to look out for creating too many objects and overflowing the counter here, but in practice I think that's not going to be an issue for anyone.
The problem with this approach is that instances will never compare equal. However, that's perfectly fine if you only want to use instances of Sample as indexes into a collection of some other type.
Constant hash codes
If there is any scenario in which distinct instances should compare equal then at first glance you have no other choice than returning a constant. But where does that leave you?
Locating an instance inside a container will always degenerate to the equivalent of a linear search. So in effect by returning a constant you allow the user to make a keyed container for your class, but that container will exhibit the performance characteristics of a LinkedList<T>. This might be obvious to someone familiar with your class, but personally I see it as letting people shoot themselves in the foot. If you know from beforehand that a Dictionary won't behave as one might expect, then why let the user create one? In my view, better to throw NotSupportedException.
But throwing is what you must not do!
Some people will disagree with the above, and when those people are smarter than oneself then one should pay attention. First of all, this code analysis warning states that GetHashCode should not throw. That's something to think about, but let's not be dogmatic. Sometimes you have to break the rules for a reason.
However, that is not all. In his blog post on the subject, Eric Lippert says that if you throw from inside GetHashCode then
your object cannot be a result in many LINQ-to-objects queries that use hash tables
internally for performance reasons.
Losing LINQ is certainly a bummer, but fortunately the road does not end here. Many (all?) LINQ methods that use hash tables have overloads that accept an IEqualityComparer<T> to be used when hashing. So you can in fact use LINQ, but it's going to be less convenient.
In the end you will have to weigh the options yourself. My opinion is that it's better to operate with a whitelist strategy (provide an IEqualityComparer<T> whenever needed) as long as it is technically feasible because that makes the code explicit: if someone tries to use the class naively they get an exception that helpfully tells them what's going on and the equality comparer is visible in the code wherever it is used, making the extraordinary behavior of the class immediately clear.
Where I want to override Equals, but there is no sensible immutable "key" for an object (and for whatever reason it doesn't make sense to make the whole object immutable), in my opinion there is only one "correct" choice:
Implement GetHashCode to hash the same fields as Equals uses. (This might be all the fields.)
Document that these fields must not be altered while in a dictionary.
Trust that users either don't put these objects in dictionaries, or obey the second rule.
(Returning a constant value compromises dictionary performance. Throwing an exception disallows too many useful cases where objects are cached but not modified. Any other implementation for GetHashCode would be wrong.)
Where this runs the user into trouble anyway, it's probably their fault. (Specifically: using a dictionary where they shouldn't, or using a model type in a context where they should be using a view-model type that uses reference equality instead.)
Or perhaps I shouldn't be overriding Equals in the first place.
If the classes truly contain nothing constant on which a hash value can be calculated then I would use something simpler than a GUID. Just use a random number persisted in the class (or in a wrapper class).
A simple approach is to store the hashCode in a private member and generate it on the first use. If your entity doesn't change often, and you're not going to be using two different objects that are Equal (where your Equals method returns true) as keys in your dictionary, then this should be fine:
private int? _hashCode;
public override int GetHashCode() {
if (!_hashCode.HasValue)
_hashCode = Property1.GetHashCode() ^ Property2.GetHashCode() etc... based on whatever you use in your equals method
return _hashCode.Value;
}
However, if you have, say, object a and object b, where a.Equals(b) == true, and you store an entry in your dictionary using a as the key (dictionary[a] = value).
If a does not change, then dictionary[b] will return value, however, if you change a after storing the entry in the dictionary, then dictionary[b] will most likely fail.
The only workaround to this is to rehash the dictionary when any of the keys change.

C# .NET GetHashCode function question

Hi I have a class with 6 string properties. A unique object will have different values for atleast one of these fields
To implement IEqualityComparer's GetHashCode function, I am concatenating all 6 properties and calling the GetHashCode on the resultant string.
I had the following doubts:
Is it necessary to call the GetHashcode on a unique value?
Will the concatenation operation on the six properties make the comparison slow?
Should I use some other approach?
If your string fields are named a-f and known not to be null, this is ReSharper's proposal for your GetHashCode()
public override int GetHashCode() {
unchecked {
int result=a.GetHashCode();
result=(result*397)^b.GetHashCode();
result=(result*397)^c.GetHashCode();
result=(result*397)^d.GetHashCode();
result=(result*397)^e.GetHashCode();
result=(result*397)^f.GetHashCode();
return result;
}
}
GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object).
This means that:
If two objects compare as equal with Equals, then their GetHashCode must return the same value.
If some of the 6 string properties are not strictly read-only, they cannot take part in the GetHashCode implementation.
If you cannot satisfy both points at the same time, you should re-evaluate your design because anything else will leave the door open for bugs.
Finally, you could probably make GetHashCode faster by calling GetHashCode on each of the 6 strings and then integrating all 6 results in one value using some bitwise operations.
GetHashCode() should return the same hash code for all objects that return true if you call Equals() on those objects. This means, for example, that you can return zero as the hash code regardless of what the field values are. But that would make your object very inefficient when stored in data structures such as hash tables.
Combining the strings is one option, but note that you could for example combine just two of the stringsfor the hash code (while still comparing all the strings in equals!).
You can also combine the hashes of the six separate strings, rather than computing a single hash for a combined string. See for example
Quick and Simple Hash Code Combinations
I'm not sure if this will be significantly faster than concatenating the string.
You can use the behavior from:
http://moh-abed.com/2011/07/13/entities-and-value-objects/

C#: Same Object have to Same HashCode?

Let's assume I have two objects called K and M
if(K.Equals(M))
{
}
If that's true, K and M always has the same HashCode ?
Or It depends on the programming language ?
The contract for GetHashCode() requires it, but since anyone can make their own implementation it is never guaranteed.
Many classes (especially hashtables) require it in order to behave correctly.
If you are implementing a class, you should always make sure that two equal objects have the same hashcode.
If you are implementing an utility method/class, you can assume that two equal objects have the same hashcode (if not, it is the other class, not yours, that is buggy).
If you are implementing something with security implications, you cannot assume it.
If that's true, K and M always has the same HashCode ?
Yes.
Or rather it should be the case. Consumers of hash codes (eg. containers) can assume that equal objects have equal hash codes, or rather unequal hash codes means the objects are unequal. (Unequal objects can have the same hash code: there are more possible objects than hash codes so this has to be allowed.)
Or It depends on the programming language ?
No
If that's true, K and M always has the same HashCode ?
Yes. Unless they have a wickedly overridden Equals method. But that would be considered broken.
But note that the reverse is not true,
if K and M have the same HashCode it could still be that K.Equals(M) == false
Yes, it should return the same hash code.
I'd say it's language independent. But there's no guaranty as if other programmes has implemented that correctly.
GetHashCode returns a value based on the current instance that is
suited for hashing algorithms and data structures such as a hash
table. Two objects that are the same type and are equal must return
the same hash code to ensure that instances of
System.Collections.HashTable and
System.Collections.Generic.Dictionary work correctly.
in your application the hashcode has to uniquely identify an instance of the object. this is part of to the .net platform, so, the hashcode value should work regardless of which .net language you are authoring in.
GetHashCode() could return the same hash for different objects. You should use Equals() to compare objects not GetHashCode(), in case when GetHashCode() return the same value - implementation of Equals() should consider another object equality checks.
Hash data structure able to handle such cases by using collision resolution algotithms.
From wikipedia:
Hash collisions are practically unavoidable when hashing a random
subset of a large set of possible keys. For example, if 2,500 keys are
hashed into a million buckets, even with a perfectly uniform random
distribution, according to the birthday problem there is a 95% chance
of at least two of the keys being hashed to the same slot.
Therefore, most hash table implementations have some collision
resolution strategy to handle such events. Some common strategies are
described below. All these methods require that the keys (or pointers
to them) be stored in the table, together with the associated values.
It depends on the Equals implementation of the object. It may use GetHashCode under the hood, but it doesn´t have too. So basically if you have an object with a custom Equals implementation the HashCode may be different for both objects.

If I never ever use HashSet, should I still implement GetHashCode?

I never need to store objects in a hash table. The reason is twofold:
coming up with a good hash function is difficult and error prone.
an AVL tree is almost always fast enough, and it merely requires a strict order predicate, which is much easier to implement.
The Equals() operation on the other hand is a very frequently used function.
Therefore I wonder whether it is necessary to implement GetHashCode (which I never need) when implementing the Equals function (which I often need)?
my advise - if you don't want to use it, override it and throw new NotImplementedException(); so that you will see where did you need it.
I think you're quite wrong if you believe that implementing a strict order predicate is much easier to implement than a hash function - it needs to handle a large number of edge cases (null values, class hierarchies). And hash functions aren't that difficult, really.
An AVL tree will be much slower than a hashtable. If you are dealing with only a few items then it will not be much of an issue. Hashtables have O(1) inserts, deletes, and searches, but an AVL tree has O(log(n)) operations.
I would go ahead and override GetHashCode and Equals for two reasons.
It really is not that difficult to get a decent distribution by using a trivial XOR implementation.1
If your classes are part of a public API then someone else might want to store them in a hashtable.
Also, I have to question the choice of BST. AVL trees are bit out of style of these days. There are other more modern BSTs that are easier to implement and work just as well (sometimes better). If you really do need a data structure that maintains ordering then consider these alternatives.
Red Black Tree - Already implemented by SortedDictionary.
Skip List
Scapegoat Tree
Splay Tree
1The XOR strategy has a subtle associativity problem that can cause collisions in some cases since a^b = b^a. There is a solution from Effective Java that has achieved cult-like recognition which is fairly simple to implement as well.
If you use Dictionary or SortedList, and override Equals, you need to have a hash function, else they will break. Equals is also used all over the place in the BCL, and if anyone else uses your objects they will expect GetHashCode to behave sensibly.
Note that a hash function doesn't have to be that complicated. A basic version is to take the hash of whatever member variables you're using for equality, multiply each one with a separate coprime number, and XOR them together.
You don't need to implement it. If you write your own Equals() method I'd recommend to use some GetHashCode implementation that doesn't break HashSet though. You could for instance return a static value (typically 42). HashSet performance will degrade dramatically, but at least it will still work - you'll never know who'll use/edit/maintain your code in the future. (edit: you may want to log a warning if such a class is used in a hashed structure in order to early spot performance problems)
EDIT: don't only use XOR to combine hash codes of your properties
It has already been said by others that you may simply combine hash codes of all your properties. Instead of only using XOR I'd encourage multiplying results though. XOR may result in a 0 value if both values are equal (e.g. 0xA ^ 0xA == 0x0). This may be easily improved using 0xA * 0xA, 0xA * 31 + 0xA or 0xA ^ (0xA * 31).
Still, the intent of my answer is that any hash function is better than one that isn't consistent with equals - even if it only returns a static value. Simply select any subset of properties (from none to all) you use for equality and throw the results together. While selecting properties for hash code, prefer those small subsets which combinations are pretty unique (e.g. firstname, lastname, birthday - no need to add the whole address)
Coming up with an adequate hash function is not difficult. Most often, a simple XOR of the results from GetHashCode() of all fields is sufficient.
If you override equals you should override GetHashCode() from MSDN: "It is recommended that any class that overrides Equals also override System.Object.GetHashCode." http://msdn.microsoft.com/en-us/library/ms173147.aspx
The two functions should match in the sense that if two objects are equal they should have the same hash value. That doesn't mean that if two objects have the same hash they should be equal. You don't need an overly complex hash algorithm but it should attempt to distribute well across the integer space.

C# How to select a Hashcode for a class that violates the Equals contract?

I've got multiple classes that, for certain reasons, do not follow the official Equals contract. In the overwritten GetHashCode() these classes simply return 0 so they can be used in a Hashmap.
Some of these classes implement the same interface and there are Hashmaps using this interface as key. So I figured that every class should at least return a different (but still constant) value in GetHashCode().
The question is how to select this value. Should I simply let the first class return 1, the next class 2 and so on? Or should I try something like
class SomeClass : SomeInterface {
public overwrite int GetHashCode() {
return "SomeClass".GetHashCode();
}
}
so the hash is distributed more evenly? (Do I have to cache the returned value myself or is Microsoft's compiler able to optimize this?)
Update: It is not possible to return an individual hashcode for each object, because Equals violates the contract. Specifially, I'm refering to this problem.
If it "violates the Equals contract", then I'm not sure you should be using it as a key.
It something is using that as a key, you really need to get the hashing right... it is very unclear what the Equals logic is, but two values that are considered equal must have the same hash-code. It is not required that two values with the same hash-code are equal.
Using a constant string won't really help much - you'll get the values split evenly over the types, but that is about it...
I'm curious what the reasoning would be for overriding GetHashCode() and returning a constant value. Why violate the idea of a hash rather than just violating the "contract" and not overriding the GetHashCode() function at all and leave the default implementation from Object?
Edit
If what you've done is that so you can have your objects match based on their contents rather than their reference then what you propose with having different classes simply use different constants can WORK, but is highly inefficient. What you want to do is come up with a hashing algorithm that can take the contents of your class and produce a value that balances speed with even distribution (that's hashing 101).
I guess I'm not sure what you're looking for...there isn't a "good" scheme for choosing constant numbers for this paradigm. One is not any better than the other. Try to improve your objects so that you're creating a real hash.
I ran into this exact problem when writing a vector class. I wanted to compare vectors for equality, but float operations give rounding errors, so I wanted approximate equality. Long story short, overriding equals is a bad idea unless your implementation is symmetric, reflexive, and transitive.
Other classes are going to assume equals has those properties, and so will classes using those classes, and so you can end up in weird cases. For example a list might enforce uniqueness, but end up with two elements which evaluate as equal to some element B.
A hash table is the perfect example of unpredictable behavior when you break equality. For example:
//Assume a == b, b == c, but a != c
var T = new Dictionary<YourType, int>()
T[a] = 0
T[c] = 1
return T[b] //0 or 1? who knows!
Another example would be a Set:
//Assume a == b, b == c, but a != c
var T = new HashSet<YourType>()
T.Add(a)
T.Add(c)
if (T.contains(b)) then T.remove(b)
//surely T can't contain b anymore! I sure hope no one breaks the properties of equality!
if (T.contains(b)) then throw new Exception()
I suggest using another method, with a name like ApproxEquals. You might also consider overriding the == operator, because it isn't virtual and therefore won't be used accidentally by other classes like Equals could be.
If you really can't use reference equality for the hash table, don't ruin the performance of cases where you can. Add an IApproxEquals interface, implement it in your class, and add an extension method GetApprox to Dictionary which enumerates the keys looking for an approximately equal one, and returns the associated value. You could also write a custom dictionary especially for 3-dimensional vectors, or whatever you need.
When hash collisions occur, the HashTable/Dictionary calls Equals to find the key you're looking for. Using a constant hash code removes the speed advantages of using a hash in the first place - it becomes a linear search.
You're saying the Equals method hasn't been implemented according to the contract. What exactly do you mean with this? Depending on the kind of violation, the HashTable or Dictionary will merely be slow (linear search) or not work at all.

Categories