Why does C# not implement GetHashCode for Collections? - c#

I am porting something from Java to C#. In Java the hashcode of an ArrayList depends on the items in it. In C# I always get the same hashcode from a List...
Why is this?
For some of my objects the hashcode needs to be different because the objects in their list property make the objects non-equal. I would expect that a hashcode is always unique for the object's state and only equals another hashcode when the object is equal. Am I wrong?

In order to work correctly, hashcodes must be immutable – an object's hash code must never change.
If an object's hashcode does change, any dictionaries containing the object will stop working.
Since collections are not immutable, they cannot implement GetHashCode.
Instead, they inherit the default GetHashCode, which returns a (hopefully) unique value for each instance of an object. (Typically based on a memory address)
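A minimal sketch of that default behaviour, assuming List<int> (which does not override Equals or GetHashCode); the class and method names are made up for illustration:

```csharp
using System.Collections.Generic;

public static class DefaultListHashDemo
{
    // Returns true when the list's hash code is unaffected by adding an item.
    public static bool HashSurvivesMutation()
    {
        var list = new List<int> { 1, 2, 3 };
        int before = list.GetHashCode();
        list.Add(4);                        // mutate the contents
        return before == list.GetHashCode();
    }

    // Returns true when two lists with identical contents compare as equal.
    public static bool SameContentsAreEqual()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 1, 2, 3 };
        return a.Equals(b);                 // inherited Object.Equals: identity only
    }
}
```

Here HashSurvivesMutation() should come back true and SameContentsAreEqual() false, since both members are inherited from Object.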

Hashcodes must depend upon the definition of equality being used so that if A == B then A.GetHashCode() == B.GetHashCode() (but not necessarily the inverse; A.GetHashCode() == B.GetHashCode() does not entail A == B).
By default, the equality definition of a value type is based on its value, and of a reference type is based on its identity (that is, by default an instance of a reference type is only equal to itself), hence the default hashcode for a value type is such that it depends on the values of the fields it contains* and for reference types it depends on the identity. Indeed, since we ideally want the hashcodes for non-equal objects to be different, particularly in the low-order bits (most likely to affect the value of a re-hashing), we generally want two equivalent but non-equal objects to have different hashes.
Since an object will remain equal to itself, it should also be clear that this default implementation of GetHashCode() will continue to have the same value, even when the object is mutated (identity does not mutate even for a mutable object).
Now, in some cases reference types (or value types) re-define equality. An example of this is string, where for example "ABC" == "AB" + "C". Though there are two different instances of string compared, they are considered equal. In this case GetHashCode() must be overridden so that the value relates to the state upon which equality is defined (in this case, the sequence of characters contained).
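To make the string case concrete, here is a small sketch. One assumption to flag: "AB" + "C" on two literals is folded by the compiler and may end up being the very same interned instance as "ABC", so the second string below is built at run time to guarantee a distinct object. The names are illustrative only.

```csharp
public static class StringHashDemo
{
    public static (bool SameInstance, bool Equal, bool SameHash) Compare()
    {
        string literal = "ABC";
        // Built at run time so that it is a distinct object from the literal.
        string built = new string(new[] { 'A', 'B', 'C' });
        return (object.ReferenceEquals(literal, built),
                literal == built,
                literal.GetHashCode() == built.GetHashCode());
    }
}
```

The two strings are different instances, yet they compare equal and therefore must (and do) report the same hash code within one run of the program.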
While it is more common to do this with types that also are immutable, for a variety of reasons, GetHashCode() does not depend upon immutability. Rather, GetHashCode() must remain consistent in the face of mutability - change a value that we use in determining the hash, and the hash must change accordingly. Note though, that this is a problem if we are using this mutable object as a key into a structure using the hash, as mutating the object changes the position in which it should be stored, without moving it to that position (it's also true of any other case where the position of an object within a collection depends on its value - e.g. if we sort a list and then mutate one of the items in the list, the list is no longer sorted). However, this doesn't mean that we must only use immutable objects in dictionaries and hashsets. Rather it means that we must not mutate an object that is in such a structure, and making it immutable is a clear way to guarantee this.
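A rough sketch of the problem described above, using a hypothetical mutable type Tag whose equality and hash both depend on an int Id (an int is used so the hash values are predictable):

```csharp
using System.Collections.Generic;

// Hypothetical mutable type: equality and hash both depend on Id.
public sealed class Tag
{
    public int Id { get; set; }
    public Tag(int id) { Id = id; }
    public override bool Equals(object obj) => obj is Tag other && other.Id == Id;
    public override int GetHashCode() => Id;
}

public static class MutatedKeyDemo
{
    public static (bool FoundBefore, bool FoundAfter, int Count) Run()
    {
        var tag = new Tag(1);
        var set = new HashSet<Tag> { tag };
        bool foundBefore = set.Contains(tag);
        tag.Id = 2;                          // hash changes while the object is stored
        bool foundAfter = set.Contains(tag); // looked up under the new hash
        return (foundBefore, foundAfter, set.Count);
    }
}
```

After the mutation the set still holds the object (Count stays 1), but it can no longer find it, even when handed the very same instance.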
Indeed, there are quite a few cases where storing mutable objects in such structures is desirable, and as long as we don't mutate them during this time, this is fine. Since we don't have the guarantee immutability brings, we then want to provide it another way (spending a short time in the collection and being accessible from only one thread, for example).
Hence mutability of key values is one of those cases where something is possible, but generally a bad idea. To the person defining the hashcode algorithm though, it's not for them to assume any such case will always be a bad idea (they don't even know whether the mutation happened while the object was stored in such a structure); it's for them to implement a hashcode defined on the current state of the object, whether calling it at a given point is good or not. Hence, for example, a hashcode should not be memoised on a mutable object unless the memoisation is cleared on every mutation. (It's generally a waste to memoise hashes anyway, as structures that hit the same object's hashcode repeatedly will have their own memoisation of it.)
Now, in the case in hand, ArrayList operates on the default case of equality being based on identity, e.g.:
ArrayList a = new ArrayList();
ArrayList b = new ArrayList();
for(int i = 0; i != 10; ++i)
{
a.Add(i);
b.Add(i);
}
return a == b;//returns false
Now, this is actually a good thing. Why? Well, how do you know in the above that we want to consider a as equal to b? We might, but there are plenty of good reasons for not doing so in other cases too.
What's more, it's much easier to redefine equality from identity-based to value-based than from value-based to identity-based. Finally, there is more than one value-based definition of equality for many objects (the classic case being the different views on what makes a string equal), so there isn't even a one-and-only definition that works. For example:
ArrayList c = new ArrayList();
for(short i = 0; i != 10; ++i)
{
c.Add(i);
}
If we considered a == b above, should we consider a == c also? The answer depends on just what we care about in the definition of equality we are using, so the framework couldn't know what the right answer is for all cases, since all cases don't agree.
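As a small illustration of why there is no single obvious answer, this sketch compares the boxed elements the way Object.Equals would. The class name is made up; the behaviour shown is that of Int32.Equals(object), which only accepts another boxed int.

```csharp
using System.Collections;

public static class BoxedEqualityDemo
{
    public static (bool IntVsInt, bool IntVsShort) Run()
    {
        var ints = new ArrayList { 1 };
        var moreInts = new ArrayList { 1 };
        var shorts = new ArrayList { (short)1 };
        return (object.Equals(ints[0], moreInts[0]),   // boxed int vs boxed int
                object.Equals(ints[0], shorts[0]));    // boxed int vs boxed short
    }
}
```

The first comparison is true and the second false, so an element-wise Equals would call a and c different, while a numeric comparison would call them the same.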
Now, if we do care about value-based equality in a given case we have two very easy options. The first is to subclass and override equality:
public class ValueEqualList : ArrayList, IEquatable<ValueEqualList>
{
/*.. most methods left out ..*/
public bool Equals(ValueEqualList other)//optional but a good idea almost always when we redefine equality
{
if(other == null)
return false;
if(ReferenceEquals(this, other))//identity still entails equality, so this is a good shortcut
return true;
if(Count != other.Count)
return false;
for(int i = 0; i != Count; ++i)
if(!object.Equals(this[i], other[i]))//!= on object would compare references, so boxed values would never match
return false;
return true;
}
public override bool Equals(object other)
{
return Equals(other as ValueEqualList);
}
public override int GetHashCode()
{
int res = 0x2D2816FE;
foreach(var item in this)
{
res = res * 31 + (item == null ? 0 : item.GetHashCode());
}
return res;
}
}
This assumes that we will always want to treat such lists this way. We can also implement an IEqualityComparer for a given case:
public class ArrayListEqComp : IEqualityComparer<ArrayList>
{//we might also implement the non-generic IEqualityComparer, omitted for brevity
public bool Equals(ArrayList x, ArrayList y)
{
if(ReferenceEquals(x, y))
return true;
if(x == null || y == null || x.Count != y.Count)
return false;
for(int i = 0; i != x.Count; ++i)
if(!object.Equals(x[i], y[i]))//!= on object would compare references, so boxed values would never match
return false;
return true;
}
public int GetHashCode(ArrayList obj)
{
int res = 0x2D2816FE;
foreach(var item in obj)
{
res = res * 31 + (item == null ? 0 : item.GetHashCode());
}
return res;
}
}
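For comparison, here is a sketch of the same idea for the generic List<T>, together with how such a comparer gets handed to a dictionary. ListSequenceComparer and the demo class are hypothetical names, and the seed and multiplier are arbitrary choices rather than anything prescribed by the framework.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: element-wise, order-sensitive equality for List<T>.
public sealed class ListSequenceComparer<T> : IEqualityComparer<List<T>>
{
    private static readonly EqualityComparer<T> Item = EqualityComparer<T>.Default;

    public bool Equals(List<T> x, List<T> y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y, Item);
    }

    public int GetHashCode(List<T> list)
    {
        if (list == null) return 0;
        unchecked   // overflow simply wraps around
        {
            int hash = 17;
            foreach (T item in list)
                hash = hash * 31 + (item == null ? 0 : Item.GetHashCode(item));
            return hash;
        }
    }
}

public static class ListComparerDemo
{
    // Stores a value under one list and looks it up with a different, equal list.
    public static bool LookupByContents()
    {
        var cache = new Dictionary<List<int>, string>(new ListSequenceComparer<int>());
        cache[new List<int> { 1, 2, 3 }] = "found";
        return cache.ContainsKey(new List<int> { 1, 2, 3 });
    }
}
```

The lists used as keys must still not be mutated while they sit in the dictionary.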
In summary:
The default equality definition of a reference type is dependent upon identity alone.
Most of the time, we want that.
When the person defining the class decides that this isn't what is wanted, they can override this behaviour.
When the person using the class wants a different definition of equality again, they can use IEqualityComparer<T> and IEqualityComparer so that their dictionaries, hashmaps, hashsets, etc. use their concept of equality.
It's disastrous to mutate an object while it is the key to a hash-based structure. Immutability can be used to ensure this doesn't happen, but is not compulsory, nor always desirable.
All in all, the framework gives us nice defaults and detailed override possibilities.
*There is a bug in the case of a decimal within a struct: a short-cut is used with structs in the cases where it is safe and not otherwise, but a struct containing a decimal, which is one case where the short-cut is not safe, is incorrectly identified as a case where it is safe.

Yes, you are wrong. In both Java and C#, being equal implies having the same hash-code, but the converse is not (necessarily) true.
See GetHashCode for more information.

It is not possible for a hashcode to be unique across all variations of most non-trivial classes. In C# the concept of List equality is not the same as in Java (see here), so the hash code implementation is also not the same - it mirrors the C# List equality.

You're only partly wrong. You're definitely wrong when you think that equal hashcodes means equal objects, but equal objects must have equal hashcodes, which means that if the hashcodes differ, so do the objects.

The core reasons are performance and human nature - people tend to think about hashes as something fast but it normally requires traversing all elements of an object at least once.
Example: If you use a string as a key in a hash table every query has complexity O(|s|) - use 2x longer strings and it will cost you at least twice as much. Imagine that it was a full blown tree (just a list of lists) - oops :-)
If a full, deep hash calculation were a standard operation on a collection, an enormous percentage of programmers would just use it unwittingly and then blame the framework and the virtual machine for being slow. For something as expensive as a full traversal it is crucial that the programmer is aware of the complexity. The only way to achieve that is to make sure that you have to write your own. It's a good deterrent as well :-)
Another reason is updating tactics. Calculating and updating a hash on the fly vs. doing the full calculation every time requires a judgement call depending on concrete case in hand.
Immutability is just an academic cop-out - people use hashes as a way of detecting a change faster (file hashes, for example) and also use hashes for complex structures which change all the time. Hashes have many more uses beyond the 101 basics. The key is again that what to use for a hash of a complex object has to be a judgement call on a case-by-case basis.
Using object's address (actually a handle so it doesn't change after GC) as a hash is actually the case where the hash value remains the same for arbitrary mutable object :-) The reason C# does it is that it's cheap and again nudges people to calculate their own.

"Why" is too philosophical. Create a helper method (maybe an extension method) and calculate the hashcode however you like, for example by XORing the elements' hashcodes.
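A sketch of what such helpers might look like. Both extension methods are hypothetical names, not framework APIs. The first combines element hashes in an order-sensitive way; the second is the plain XOR suggested above, which ignores order and lets pairs of equal elements cancel out.

```csharp
using System.Collections.Generic;

public static class SequenceHashExtensions
{
    // Order-sensitive combination (multiply-and-add); constants are arbitrary.
    public static int GetSequenceHashCode<T>(this IEnumerable<T> items)
    {
        unchecked   // overflow simply wraps around
        {
            int hash = 17;
            foreach (T item in items)
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            return hash;
        }
    }

    // XOR combination: order-insensitive, and equal elements cancel each other.
    public static int GetXorHashCode<T>(this IEnumerable<T> items)
    {
        int hash = 0;
        foreach (T item in items)
            hash ^= item == null ? 0 : item.GetHashCode();
        return hash;
    }
}
```

Which one is appropriate depends on whether the equality you pair it with cares about element order.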

Related

C# Immutability and Equality

I'm trying to create and use only immutable classes where all fields are readonly immutable types, though there may be additional fields which are mutable and not considered to be part of the object's state (mainly a cached hashcode).
When implementing IEquatable I do the same as I would for non immutable objects
Ie,
public bool Equals(MyImmutableType o) =>
object.Equals(this.x, o.x) && object.Equals(this.y, o.y);
Now, since the object is immutable, this seems inefficient: the object will never change, so if I could calculate and store some unique fingerprint of it I could simply compare fingerprints instead of whole fields (which may call their own Equals etc).
I am wondering what would be a good solution for this. Would BinaryFormatter + MD5 be worth exploring?
Since you've already overridden Equals, you are required to also override GetHashCode. Remember, the fundamental rule of GetHashCode is that equal objects have equal hashes.
Therefore, you have overridden GetHashCode.
Since equal objects are required to have equal hash codes, you can implement Equals as:
public static bool Equals(M a, M b)
{
if (object.ReferenceEquals(a, b)) return true;
// If both of them are null, we're done, but maybe one is.
if (object.ReferenceEquals(null, a)) return false;
if (object.ReferenceEquals(null, b)) return false;
// Both are not null.
if (a.GetHashCode() != b.GetHashCode()) return false;
if (!object.Equals(a.x, b.x)) return false;
if (!object.Equals(a.y, b.y)) return false;
return true;
}
And now you can implement as many instance versions of Equals as you like by calling the static helper. Also overload == and != while you're at it.
That implementation takes as many early outs as possible. Of course, the worst-performing case is the case where we have value equality but not reference equality, but that's also the rarest case! In practice, most objects are unequal to each other, and most objects that are equal to each other are reference equal. In those 99% cases we get the right answer in four or fewer highly efficient comparisons.
If you are in a scenario where it is extremely common for there to be objects that are value equal but not reference equal, then solve the problem in the factory; memoize the factory!
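One way to read "memoize the factory" is sketched below. Point2 and its Create method are hypothetical; the idea is that equal arguments always yield the same instance, so the ReferenceEquals early-out catches every equal pair. Note the cache in this sketch never shrinks, which a real implementation would have to think about.

```csharp
using System.Collections.Generic;

// Hypothetical immutable type; instances can only be obtained through Create.
public sealed class Point2
{
    private static readonly Dictionary<(int, int), Point2> Cache =
        new Dictionary<(int, int), Point2>();

    public int X { get; }
    public int Y { get; }

    private Point2(int x, int y) { X = x; Y = y; }

    // Memoized factory: the same (x, y) always returns the same instance.
    public static Point2 Create(int x, int y)
    {
        lock (Cache)
        {
            if (!Cache.TryGetValue((x, y), out Point2 existing))
            {
                existing = new Point2(x, y);
                Cache[(x, y)] = existing;
            }
            return existing;
        }
    }
}
```

With this in place, value equality and reference equality coincide for Point2, so the default Equals and GetHashCode are already correct.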

How to write a good GetHashCode() implementation for a class that is compared by value?

Let's say we have such a class:
class MyClass
{
public string SomeValue { get; set; }
// ...
}
Now, let's say two MyClass instances are equal when their SomeValue property is equal. Thus, I override the Object.Equals() and the Object.GetHashCode() methods to represent that. Object.GetHashCode() returns SomeValue.GetHashCode(). But at the same time I need to follow these rules:
If two instances of an object are equal, they should return the same hash code.
The hash code should not change throughout the runtime.
But apparently, SomeValue can change, and the hash code we did get before may turn to be invalid.
I can only think of making the class immutable, but I'd like to know what others do in this case.
What do you do in such cases? Does having such a class point to a subtler problem in the design decisions?
The general contract says that if A.equals(B) is true, then their hash codes must be the same. If SomeValue changes in A in such a way that A.equals(B) is no longer true, then A.GetHashCode() can return a different value than before. Mutable objects cannot cache GetHashCode(); it must be calculated every time the method is called.
This article has detailed guidelines for GetHashCode and mutability:
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
If your GetHashCode() depends on some mutable value you have to change your hash whenever your value changes. Otherwise you break the equals law.
The rule that a hash should never change once somebody has asked for it matters if you put your object into a HashSet or use it as a key within a Dictionary. In these cases you have to ensure that the hash code won't change as long as the object is stored in such a container. This can either be ensured manually, by simply taking care of this issue when you program, or you could provide some Freeze() method on your object. Once this is called, any subsequent attempt to set a property would lead to some kind of exception (you should then also provide some Defrost() method). Additionally, you can put the call to Freeze() into your GetHashCode() implementation, so you can be quite sure that nobody alters a frozen object by mistake.
And just one last tip: If you need to alter an object within such a container, simply remove it, alter it (don't forget to defrost it) and re-add it.
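A minimal sketch of the Freeze()/Defrost() idea, under the assumption that an InvalidOperationException is an acceptable way to reject the change. FreezableKey and the demo class are hypothetical names.

```csharp
using System;

public sealed class FreezableKey
{
    private string _value;
    private bool _frozen;

    public FreezableKey(string value) { _value = value; }

    public string Value
    {
        get => _value;
        set
        {
            if (_frozen)
                throw new InvalidOperationException("Key is frozen; call Defrost() first.");
            _value = value;
        }
    }

    public void Freeze() => _frozen = true;
    public void Defrost() => _frozen = false;

    public override bool Equals(object obj) =>
        obj is FreezableKey other && other._value == _value;

    public override int GetHashCode()
    {
        Freeze();   // asking for the hash freezes the object
        return _value == null ? 0 : _value.GetHashCode();
    }
}

public static class FreezeDemo
{
    // Returns true when setting Value after hashing is rejected.
    public static bool SetterThrowsAfterHashing()
    {
        var key = new FreezableKey("a");
        key.GetHashCode();
        try { key.Value = "b"; return false; }
        catch (InvalidOperationException) { return true; }
    }
}
```

The remove, defrost, alter, re-add sequence from the tip above is then the only way to change a key that has been hashed.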
You sort of need to choose between mutability and GetHashCode returning the same value for 'equal' objects. Often when you think you want to implement 'equal' for mutable objects, you end up later deciding that you have "shades of equal" and really didn't mean Object.Equals equality.
Having a mutable object as the 'key' in any sort of data structure is a big red flag to me. For example:
MyObj a = new MyObj("alpha");
MyObj b = new MyObj("beta");
HashSet<MyObj> objs = new HashSet<MyObj>();
objs.Add(a);
objs.Add(b);
// objs.Count == 2
b.SomeValue = "alpha";
// objs.Distinct().Count() == 1, objs.Count == 2
We've badly violated the contract of HashSet<T>. This is an obvious example; there are subtler ones.

Should the hash code of null always be zero, in .NET

Given that collections like System.Collections.Generic.HashSet<> accept null as a set member, one can ask what the hash code of null should be. It looks like the framework uses 0:
// nullable struct type
int? i = null;
i.GetHashCode(); // gives 0
EqualityComparer<int?>.Default.GetHashCode(i); // gives 0
// class type
CultureInfo c = null;
EqualityComparer<CultureInfo>.Default.GetHashCode(c); // gives 0
This can be (a little) problematic with nullable enums. If we define
enum Season
{
Spring,
Summer,
Autumn,
Winter,
}
then the Nullable<Season> (also called Season?) can take just five values, but two of them, namely null and Season.Spring, have the same hash code.
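A quick way to check that claim, assuming the default comparer behaves as described in this question (0 for null, and the underlying integer value for an int-based enum); the demo class name is made up:

```csharp
using System.Collections.Generic;

public enum Season { Spring, Summer, Autumn, Winter }

public static class NullableEnumHashDemo
{
    public static (int NullHash, int SpringHash) Hashes()
    {
        var comparer = EqualityComparer<Season?>.Default;
        return (comparer.GetHashCode(null),            // no value
                comparer.GetHashCode(Season.Spring));  // underlying value 0
    }
}
```

Both hashes come out as 0, so the two values collide even though they are not equal.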
It is tempting to write a "better" equality comparer like this:
class NewNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
{
public override bool Equals(T? x, T? y)
{
return Default.Equals(x, y);
}
public override int GetHashCode(T? x)
{
return x.HasValue ? Default.GetHashCode(x) : -1;
}
}
But is there any reason why the hash code of null should be 0?
EDIT/ADDITION:
Some people seem to think this is about overriding Object.GetHashCode(). It really is not, actually. (The authors of .NET did make an override of GetHashCode() in the Nullable<> struct which is relevant, though.) A user-written implementation of the parameterless GetHashCode() can never handle the situation where the object whose hash code we seek is null.
This is about implementing the abstract method EqualityComparer<T>.GetHashCode(T) or otherwise implementing the interface method IEqualityComparer<T>.GetHashCode(T). Now, while creating these links to MSDN, I see that it says there that these methods throw an ArgumentNullException if their sole argument is null. This must certainly be a mistake on MSDN? None of .NET's own implementations throw exceptions. Throwing in that case would effectively break any attempt to add null to a HashSet<>. Unless HashSet<> does something extraordinary when dealing with a null item (I will have to test that).
NEW EDIT/ADDITION:
Now I tried debugging. With HashSet<>, I can confirm that with the default equality comparer, the values Season.Spring and null will end in the same bucket. This can be determined by very carefully inspecting the private array members m_buckets and m_slots. Note that the indices are always, by design, offset by one.
The code I gave above does not, however, fix this. As it turns out, HashSet<> will never even ask the equality comparer when the value is null. This is from the source code of HashSet<>:
// Workaround Comparers that throw ArgumentNullException for GetHashCode(null).
private int InternalGetHashCode(T item) {
if (item == null) {
return 0;
}
return m_comparer.GetHashCode(item) & Lower31BitMask;
}
This means that, at least for HashSet<>, it is not even possible to change the hash of null. Instead, a solution is to change the hash of all the other values, like this:
class NewerNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
{
public override bool Equals(T? x, T? y)
{
return Default.Equals(x, y);
}
public override int GetHashCode(T? x)
{
return x.HasValue ? 1 + Default.GetHashCode(x) : /* not seen by HashSet: */ 0;
}
}
So long as the hash code returned for nulls is consistent for the type, you should be fine. The only requirement for a hash code is that two objects that are considered equal share the same hash code.
Returning 0 or -1 for null, so long as you choose one and return it all the time, will work. Obviously, non-null hash codes should not return whatever value you use for null.
Similar questions:
GetHashCode on null fields?
What should GetHashCode return when object's identifier is null?
The "Remarks" section of this MSDN entry goes into more detail about the hash code. Notably, the documentation does not provide any coverage or discussion of null values at all - not even in the community content.
To address your issue with the enum, either re-implement the hash code to return non-zero, add a default "unknown" enum entry equivalent to null, or simply don't use nullable enums.
Interesting find, by the way.
Another problem I see with this generally is that the hash code cannot represent a 4 byte or larger type that is nullable without at least one collision (more as the type size increases). For example, the hash code of an int is just the int, so it uses the full int range. What value in that range do you choose for null? Whatever one you pick will collide with the value's hash code itself.
Collisions in and of themselves are not necessarily a problem, but you need to know they are there. Hash codes are only used in some circumstances. As stated in the docs on MSDN, hash codes are not guaranteed to return different values for different objects so shouldn't be expected to.
It doesn't have to be zero -- you could make it 42 if you wanted to.
All that matters is consistency during the execution of the program.
It's just the most obvious representation, because null is often represented as a zero internally. Which means, while debugging, if you see a hash code of zero, it might prompt you to think, "Hmm.. was this a null reference issue?"
Note that if you use a number like 0xDEADBEEF, then someone could say you're using a magic number... and you kind of would be. (You could say zero is a magic number too, and you'd be kind of right... except that it's so widely used as to be somewhat of an exception to the rule.)
Bear in mind that the hash code is used only as a first step in determining equality, and is never (and should never be) used as a de facto determination of whether two objects are equal.
If two objects' hash codes are not equal then they are treated as not equal (because we assume that the underlying implementation is correct - i.e. we don't second-guess that). If they have the same hash code, then they should be checked for actual equality, a check which, in your case, the null and the enum value will fail.
As a result - using zero is as good as any other value in the general case.
Sure, there will be situations, like your enum, where this zero is shared with a real value's hash code. The question is whether, for you, the minuscule overhead of an additional comparison causes problems.
If so, then define your own comparer for the case of the nullable for your particular type, and ensure that a null value always yields a hash code that is always the same (of course!) and a value that cannot be yielded by the underlying type's own hash code algorithm. For your own types, this is do-able. For others - good luck :)
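To see that the collision costs only an extra comparison and not correctness, here is a small sketch with the default comparer (the demo class name is hypothetical):

```csharp
using System.Collections.Generic;

public enum Season { Spring, Summer, Autumn, Winter }

public static class CollisionDemo
{
    public static (int Count, bool HasNull, bool HasSpring, bool HasSummer) Run()
    {
        // null and Season.Spring share a hash code but are not equal.
        var set = new HashSet<Season?> { null, Season.Spring };
        return (set.Count,
                set.Contains(null),
                set.Contains(Season.Spring),
                set.Contains(Season.Summer));
    }
}
```

The set keeps both members apart (Count is 2) and answers membership queries correctly, because the equality check runs after the hash codes match.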
Good question.
I just tried to code this:
enum Season
{
Spring,
Summer,
Autumn,
Winter,
}
and execute this like this:
Season? v = null;
Console.WriteLine(v);
it prints an empty line, because v is null.
If I instead assign a normal value:
Season? v = Season.Spring;
Console.WriteLine((int)v);
it prints 0, as expected, or simply Spring if we avoid casting to int.
So.. if you do the following:
Season? v = Season.Spring;
Season? vnull = null;
if(vnull == v) // never TRUE
EDIT
From MSDN
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
In other words: if two objects have the same hash code, that doesn't mean that they are equal, because real equality is determined by Equals.
From MSDN again:
The GetHashCode method for an object must consistently return the same
hash code as long as there is no modification to the object state that
determines the return value of the object's Equals method. Note that
this is true only for the current execution of an application, and
that a different hash code can be returned if the application is run
again.
But is there any reason why the hash code of null should be 0?
It could have been anything at all. I tend to agree that 0 wasn't necessarily the best choice, but it's one that probably leads to fewest bugs.
A hash function absolutely must return the same hash for the same value. Once there exists a component that does this, this is really the only valid value for the hash of null. If there were a constant for this, like, hm, object.HashOfNull, then someone implementing an IEqualityComparer would have to know to use that value. If they don't think about it, the chance they'll use 0 is slightly higher than every other value, I reckon.
at least for HashSet<>, it is not even possible to change the hash of null
As mentioned above, I think it's completely impossible full stop, just because there exist types which already follow the convention that hash of null is 0.
It is 0 for the sake of simplicity. There is no such hard requirement. You only need to ensure the general requirements of hash coding.
For example, you need to make sure that if two objects are equal, their hashcodes must always be equal too. Therefore, different hashcodes must always represent different objects (but it's not necessarily true vice versa: two different objects may have the same hashcode, even though if this happens often then this is not a good quality hash function -- it doesn't have a good collision resistance).
Of course, I restricted my answer to requirements of mathematical nature. There are .NET-specific, technical conditions as well, which you can read here. 0 for a null value is not among them.
So this could be avoided by using an Unknown enum value (although it seems a bit weird for a Season to be unknown). So something like this would negate this issue:
public enum Season
{
Unknown = 0,
Spring,
Summer,
Autumn,
Winter
}
Season some_season = Season.Unknown;
int code = some_season.GetHashCode(); // 0
some_season = Season.Autumn;
code = some_season.GetHashCode(); // 3
Then you would have unique hash code values for each season.
Personally I find using nullable values a bit awkward and try to avoid them whenever I can. Your issue is just another reason. They are sometimes very handy, but my rule of thumb is not to mix value types with null if possible, simply because these are from two different worlds. The .NET Framework seems to do the same - a lot of value types provide a TryParse method, which is a way of separating values from no value (null).
In your particular case it is easy to get rid of the problem because you handle your own Season type.
(Season?)null to me means 'season is not specified', like when you have a web form where some fields are not required. In my opinion it is better to specify that special 'value' in the enum itself rather than use the somewhat clunky Nullable<T>. It will be faster (no boxing), easier to read (Season.NotSpecified vs null) and will solve your problem with hash codes.
Of course, for other types like int you can't expand the value domain, and designating one of the values as special is not always possible. But with int? the hash code collision is a much smaller problem, if it is one at all.
Tuple.Create( (object) null! ).GetHashCode() // 0
Tuple.Create( 0 ).GetHashCode() // 0
Tuple.Create( 1 ).GetHashCode() // 1
Tuple.Create( 2 ).GetHashCode() // 2

Dictionary.ContainsKey() not working as expected

I have a dictionary.
Dictionary<YMD, object> cache = new Dictionary<YMD, object>();
The YMD class is one of my inventions; it is a class containing only the year, month, and date. The purpose is that the data will be indexed by the day it relates to. Anyhow, I have implemented the Equals() and CompareTo() functions, as well as the == and != operators.
Despite this, the Dictionary.ContainsKey() function will always return false, even if the key exists.
I immediately thought my comparison functions must be broken, but after writing unit tests for all of them it does not appear to be the case.
Is there something about the dictionary class that I do not know?
With a dictionary, GetHashCode() is critical. For things that are equal (Equals() == true) it must return the same number (but it is permitted to have collisions - i.e. two items can return the same number by coincidence but not be considered equals).
Additionally - the hash-code must not change while the item is in the dictionary. Hashing on readonly values is good for this, but alternatively: just don't change it! For example, if your equals / hashcode spans an entity's Name and Id (say), then don't change those properties of the object, or you may never see that record again (even if you pass in the same instance as the key).
You need only to override the Equals and GetHashCode functions.
A common, if simplistic, implementation of GetHashCode is to XOR (^) the hash codes of the instance's data members.
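Since the question doesn't show YMD itself, the following is only a guess at its shape (three int properties named Year, Month and Day are assumptions), with Equals and GetHashCode kept consistent. It uses multiply-and-add rather than XOR so that, say, swapping month and day changes the hash.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of YMD; the member names are assumptions.
public sealed class YMD : IEquatable<YMD>
{
    public int Year { get; }
    public int Month { get; }
    public int Day { get; }

    public YMD(int year, int month, int day) { Year = year; Month = month; Day = day; }

    public bool Equals(YMD other) =>
        !(other is null) && Year == other.Year && Month == other.Month && Day == other.Day;

    public override bool Equals(object obj) => Equals(obj as YMD);

    // Built from exactly the fields that Equals compares.
    public override int GetHashCode()
    {
        unchecked   // overflow simply wraps around
        {
            int hash = 17;
            hash = hash * 31 + Year;
            hash = hash * 31 + Month;
            hash = hash * 31 + Day;
            return hash;
        }
    }
}

public static class YmdDemo
{
    // Looks up a key using a different but equal YMD instance.
    public static bool ContainsEqualKey()
    {
        var cache = new Dictionary<YMD, object>();
        cache[new YMD(2011, 2, 28)] = "data";
        return cache.ContainsKey(new YMD(2011, 2, 28));
    }
}
```

Without the GetHashCode override, the two YMD instances would almost always hash differently and ContainsKey would report false, which matches the symptom in the question.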

C# How to select a Hashcode for a class that violates the Equals contract?

I've got multiple classes that, for certain reasons, do not follow the official Equals contract. In the overridden GetHashCode() these classes simply return 0 so they can be used in a Hashmap.
Some of these classes implement the same interface and there are Hashmaps using this interface as key. So I figured that every class should at least return a different (but still constant) value in GetHashCode().
The question is how to select this value. Should I simply let the first class return 1, the next class 2 and so on? Or should I try something like
class SomeClass : SomeInterface {
public override int GetHashCode() {
return "SomeClass".GetHashCode();
}
}
so the hash is distributed more evenly? (Do I have to cache the returned value myself or is Microsoft's compiler able to optimize this?)
Update: It is not possible to return an individual hashcode for each object, because Equals violates the contract. Specifically, I'm referring to this problem.
If it "violates the Equals contract", then I'm not sure you should be using it as a key.
If something is using that as a key, you really need to get the hashing right... it is very unclear what the Equals logic is, but two values that are considered equal must have the same hash-code. It is not required that two values with the same hash-code are equal.
Using a constant string won't really help much - you'll get the values split evenly over the types, but that is about it...
I'm curious what the reasoning would be for overriding GetHashCode() and returning a constant value. Why violate the idea of a hash rather than just violating the "contract" and not overriding the GetHashCode() function at all and leave the default implementation from Object?
Edit
If you've done that so your objects can match based on their contents rather than their reference, then what you propose, having different classes simply use different constants, can WORK, but is highly inefficient. What you want to do is come up with a hashing algorithm that can take the contents of your class and produce a value that balances speed with even distribution (that's hashing 101).
I guess I'm not sure what you're looking for...there isn't a "good" scheme for choosing constant numbers for this paradigm. One is not any better than the other. Try to improve your objects so that you're creating a real hash.
I ran into this exact problem when writing a vector class. I wanted to compare vectors for equality, but float operations give rounding errors, so I wanted approximate equality. Long story short, overriding equals is a bad idea unless your implementation is symmetric, reflexive, and transitive.
Other classes are going to assume equals has those properties, and so will classes using those classes, and so you can end up in weird cases. For example a list might enforce uniqueness, but end up with two elements which evaluate as equal to some element B.
A hash table is the perfect example of unpredictable behavior when you break equality. For example:
//Assume a == b, b == c, but a != c
var T = new Dictionary<YourType, int>();
T[a] = 0;
T[c] = 1;
return T[b]; //0 or 1? who knows!
Another example would be a Set:
//Assume a == b, b == c, but a != c
var T = new HashSet<YourType>();
T.Add(a);
T.Add(c);
if (T.Contains(b)) T.Remove(b);
//surely T can't contain b anymore! I sure hope no one breaks the properties of equality!
if (T.Contains(b)) throw new Exception();
I suggest using another method, with a name like ApproxEquals. You might also consider overloading the == operator, because it isn't virtual and therefore won't be used accidentally by other classes like Equals could be.
If you really can't use reference equality for the hash table, don't ruin the performance of cases where you can. Add an IApproxEquals interface, implement it in your class, and add an extension method GetApprox to Dictionary which enumerates the keys looking for an approximately equal one, and returns the associated value. You could also write a custom dictionary especially for 3-dimensional vectors, or whatever you need.
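A sketch of that suggestion, with a few liberties that should be treated as assumptions: the interface is made generic, the extension method follows the TryGetValue pattern (so it is called TryGetApprox rather than GetApprox), and the tolerance of 1e-9 is arbitrary. Vector2 here is a hypothetical class, not System.Numerics.Vector2.

```csharp
using System;
using System.Collections.Generic;

public interface IApproxEquals<T>
{
    bool ApproxEquals(T other);
}

// Hypothetical vector type; it keeps the default reference-based Equals.
public sealed class Vector2 : IApproxEquals<Vector2>
{
    public double X { get; }
    public double Y { get; }
    public Vector2(double x, double y) { X = x; Y = y; }

    // The tolerance is an arbitrary illustrative choice.
    public bool ApproxEquals(Vector2 other) =>
        other != null && Math.Abs(X - other.X) < 1e-9 && Math.Abs(Y - other.Y) < 1e-9;
}

public static class ApproxDictionaryExtensions
{
    // Linear scan over the keys; stops at the first approximate match.
    public static bool TryGetApprox<TKey, TValue>(
        this Dictionary<TKey, TValue> dictionary, TKey key, out TValue value)
        where TKey : IApproxEquals<TKey>
    {
        foreach (KeyValuePair<TKey, TValue> pair in dictionary)
        {
            if (pair.Key.ApproxEquals(key))
            {
                value = pair.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}
```

Exact lookups keep their hash-table speed, and only callers who explicitly ask for approximate matching pay for the linear scan.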
When hash collisions occur, the Hashtable/Dictionary calls Equals to find the key you're looking for. Using a constant hash code removes the speed advantages of using a hash in the first place - it becomes a linear search.
You're saying the Equals method hasn't been implemented according to the contract. What exactly do you mean by this? Depending on the kind of violation, the Hashtable or Dictionary will merely be slow (linear search) or not work at all.
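To get a feel for the "merely slow" case, this sketch uses a hypothetical comparer that returns a constant hash and counts how often Equals is invoked during a lookup. The exact count depends on HashSet internals, so treat the number as indicative only.

```csharp
using System.Collections.Generic;

// Hypothetical comparer: correct equality, but every key gets the same hash.
public sealed class ConstantHashComparer : IEqualityComparer<int>
{
    public int EqualsCalls { get; private set; }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int obj) => 0;   // every key collides
}

public static class ConstantHashDemo
{
    // Fills a set with itemCount values, then looks up a value that is absent.
    public static (bool Found, int EqualsCalls) LookupMissing(int itemCount)
    {
        var comparer = new ConstantHashComparer();
        var set = new HashSet<int>(comparer);
        for (int i = 0; i < itemCount; i++) set.Add(i);

        int before = comparer.EqualsCalls;
        bool found = set.Contains(-1);
        return (found, comparer.EqualsCalls - before);
    }
}
```

The answer is still correct (the missing value is not found), but the lookup has to call Equals many times, and one would expect that number to grow with the size of the set rather than stay roughly constant.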
