I have a class that is similar to this:
public class Int16_2D
{
public Int16 a, b;
public override bool Equals(Object other)
{
return other is Int16_2D &&
a == ((Int16_2D)other).a &&
b == ((Int16_2D)other).b;
}
}
This works in HashSet<Int16_2D>. However in Dictionary<Int16_2D, myType>, .ContainsKey returns false when it shouldn't. Am I missing something in my implementation of ==?
For a class to work in a hash table or dictionary, you need to implement GetHashCode()! I have no idea why it's working in HashSet; I would guess it was just luck.
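As a minimal sketch of what that could look like for the class in the question: the hash formula below (multipliers 17 and 31) is just one common choice, not the only valid one, and the LookupDemo class is made up for this example.

```csharp
using System;
using System.Collections.Generic;

public class Int16_2D
{
    public Int16 a, b;

    public override bool Equals(object other)
    {
        var that = other as Int16_2D;
        return that != null && a == that.a && b == that.b;
    }

    // Equal objects must return equal hash codes, so hash the same fields Equals compares.
    public override int GetHashCode()
    {
        unchecked // wrap on overflow instead of throwing
        {
            return (17 * 31 + a) * 31 + b;
        }
    }
}

// Hypothetical driver class, only here to show the lookup working.
public static class LookupDemo
{
    public static void Main()
    {
        var dict = new Dictionary<Int16_2D, string>();
        dict[new Int16_2D { a = 1, b = 2 }] = "found";

        // A different instance with the same field values is now located as a key.
        Console.WriteLine(dict.ContainsKey(new Int16_2D { a = 1, b = 2 })); // True
        Console.WriteLine(dict.ContainsKey(new Int16_2D { a = 2, b = 1 })); // False
    }
}
```

The fields are still mutable here, so the caveat about changing a key after it has been stored applies to this sketch as well.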
Note that it's dangerous to use mutable fields for calculating Equals or GetHashCode(). Why? Consider this (assuming GetHashCode() has been overridden to match Equals):
var x = new Int16_2D { a = 1, b = 2 };
var set = new HashSet<Int16_2D> { x };
var y = new Int16_2D { a = 1, b = 2 };
Console.WriteLine(set.Contains(y)); // True
x.a = 3;
Console.WriteLine(set.Contains(y)); // False
Console.WriteLine(set.Contains(x)); // Also false!
In other words, when you set x.a = 3; you're changing x's hash code. But x's location in the hash table is based on its old hash code, so x is basically lost now. See this in action at http://ideone.com/QQw08
Also, as svick notes, implementing Equals does not implement ==. If you don't implement ==, the == operator will provide a reference comparison, so:
var x = new Int16_2D { a = 1, b = 2 };
var y = new Int16_2D { a = 1, b = 2 };
Console.WriteLine(x.Equals(y)); //True
Console.WriteLine(x == y); //False
In conclusion, you're better off making this an immutable type; since it's only 4 bytes long, I'd probably make it an immutable struct.
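A rough sketch of such an immutable struct follows. The names Point16 and StructDemo are invented for this example, and packing the two 16-bit values into one int is only one possible hash.

```csharp
using System;
using System.Collections.Generic;

public struct Point16 : IEquatable<Point16>
{
    // readonly fields: the value, and therefore the hash code, cannot change after construction
    public readonly short A;
    public readonly short B;

    public Point16(short a, short b)
    {
        A = a;
        B = b;
    }

    public bool Equals(Point16 other)
    {
        return A == other.A && B == other.B;
    }

    public override bool Equals(object obj)
    {
        return obj is Point16 && Equals((Point16)obj);
    }

    // Two 16-bit values pack into one 32-bit int without losing information.
    public override int GetHashCode()
    {
        unchecked
        {
            return ((ushort)A << 16) | (ushort)B;
        }
    }

    // == and != are not provided automatically, so define them in terms of Equals.
    public static bool operator ==(Point16 x, Point16 y) { return x.Equals(y); }
    public static bool operator !=(Point16 x, Point16 y) { return !x.Equals(y); }
}

// Hypothetical driver class for the example.
public static class StructDemo
{
    public static void Main()
    {
        var set = new HashSet<Point16> { new Point16(1, 2) };
        Console.WriteLine(set.Contains(new Point16(1, 2)));        // True
        Console.WriteLine(new Point16(1, 2) == new Point16(1, 2)); // True
        Console.WriteLine(new Point16(1, 2) != new Point16(2, 1)); // True
    }
}
```

Since there is no way to change A or B after construction, a value stored in a HashSet or used as a Dictionary key cannot get "lost" the way the mutable x did above.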
You need to override GetHashCode(). The fact that it works with HashSet<T> is probably just a lucky coincidence.
Both collections use the hash code obtained from GetHashCode to find a bucket (i.e. a list of objects) where the object should be placed. They then search that bucket for the object, using Equals to confirm equality. This is what gives Dictionary and HashSet their nice, fast lookup properties. However, it also means that if GetHashCode is not overridden so that it corresponds to the type's Equals method, you will not be able to find such an object in either collection.
You should, almost always, implement both GetHashCode and Equals, or neither of them.
You need to override GetHashCode as well for the dictionary to work.
You have to override GetHashCode() as well; this goes hand in hand with overriding Equals. Dictionary uses GetHashCode() to determine which bin a value falls into, and only if a candidate item is found in that bin does it check the items for actual equality.
Related
Why does HashSet<T>.GetHashCode() return different hash codes for sets that have the same elements?
For instance:
[Fact]
public void EqualSetsHaveSameHashCodes()
{
var set1 = new HashSet<int>(new [] { 1, 2, 3 } );
var set2 = new HashSet<int>(new [] { 1, 2, 3 } );
Assert.Equal(set1.GetHashCode(), set2.GetHashCode());
}
This test fails. Why?
How can I get the result I need? "Equal sets give the same hashcode"
HashSet<T> by default does not have value equality semantics. It has reference equality semantics, so two distinct hash sets won't be equal or have the same hash code even if the contained elements are the same.
You need to use a special purpose IEqualityComparer<HashSet<int>> to get the behavior you want. You can roll your own or use the default one the framework provides for you:
var hashSetOfIntComparer = HashSet<int>.CreateSetComparer();
//will evaluate to true
var haveSameHash = hashSetOfIntComparer.GetHashCode(set1) ==
hashSetOfIntComparer.GetHashCode(set2);
So, to make a long story short:
How can I get the result I need? "Equal sets give the same hashcode"
You can't if you are planning on using the default implementation of HashSet<T>.GetHashCode(). You either use a special purpose comparer or you extend HashSet<T> and override Equals and GetHashCode to suit your needs.
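To make the comparer route concrete, here is a small sketch that passes the framework's set comparer to a Dictionary, so that sets with the same elements act as the same key. The SetKeyDemo class and the variable names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical driver class for the example.
public static class SetKeyDemo
{
    public static void Main()
    {
        var set1 = new HashSet<int>(new[] { 1, 2, 3 });
        var set2 = new HashSet<int>(new[] { 3, 2, 1 });

        // The comparer looks at the elements, not at the references.
        IEqualityComparer<HashSet<int>> comparer = HashSet<int>.CreateSetComparer();
        Console.WriteLine(comparer.Equals(set1, set2));                              // True
        Console.WriteLine(comparer.GetHashCode(set1) == comparer.GetHashCode(set2)); // True

        // Handing the comparer to the dictionary makes equal sets hit the same entry.
        var lookup = new Dictionary<HashSet<int>, string>(comparer);
        lookup[set1] = "first";
        Console.WriteLine(lookup.ContainsKey(set2)); // True
    }
}
```

The usual caveat applies: a set should not be modified while it is in use as a key, because its comparer-based hash code would change.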
By default (and unless otherwise specifically documented), reference types are only considered equal if they reference the same object. As a developer, you can override the Equals() and GetHashCode() methods so that objects that you consider equal return true for the Equals and the same int for GetHashCode.
Depending on which test framework you are using, there will be either CollectionAssert.AreEquivalent() or an overload of Assert.Equal that takes a comparer.
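You could implement a custom HashSet that overrides the GetHashCode function to generate a hash code from all of the contents, like below. Because the enumeration order of a HashSet<T> is not guaranteed to be the same for two equal sets, the item hashes are combined with an order-independent operation, and Equals is overridden to match:
public class HashSetWithGetHashCode<T> : HashSet<T>
{
    // Sets that hash alike should also compare equal, so override Equals too.
    public override bool Equals(object obj)
    {
        var other = obj as HashSetWithGetHashCode<T>;
        return other != null && SetEquals(other);
    }

    public override int GetHashCode()
    {
        unchecked // Overflow is fine, just wrap
        {
            // Addition is commutative, so the result does not depend on enumeration order.
            int hash = 0;
            foreach (var item in this)
                hash += item == null ? 0 : item.GetHashCode();
            return hash;
        }
    }
}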
Is there any built-in collection type (IEnumerable<S>) or IEqualityComparer<T> for an IEnumerable<S> in the framework that has its Equals (and GetHashCode accordingly) defined by the equality of the items in it?
Something like:
var x = new SomeCollection { 1, 2, 3 };
var y = new SomeCollection { 1, 2, 3 };
// so that x.Equals(y) -> true
// and x.Shuffle().Equals(y) -> false
Or a
class SomeComparer<T> : EqualityComparer<IEnumerable<T>> { }
// so that for
var x = new[] { 1, 2, 3 };
var y = new[] { 1, 2, 3 };
// gives
// new SomeComparer<int>().Equals(x, y) -> true
// new SomeComparer<int>().Equals(x.Shuffle(), y) -> false
? My question is, is there something in the framework that behaves like SomeCollection or SomeComparer<T> as shown in the code?
Why I need it: because I have a case for a Dictionary<Collection, T> where the Key part should be a collection and its equality is based on its entries.
Requirements:
Collection need be only a simple enumerable type with Add method
Order of items is important
Duplicate items can exist in the collection
Note: I can write one on my own, it's trivial. There are plenty of questions on SO helping with that. I'm asking whether there is a class in the framework itself.
Just keep it simple. Just use the Dictionary ctor that takes in a specialized IEqualityComparer (just implement your equality logic in a comparer) and you are good to go. No need for special collection types and so on...
See here
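A minimal sketch of that approach, assuming an order-sensitive comparison that allows duplicates, as the question requires. SequenceComparer<T> and SequenceKeyDemo are names invented here, not framework types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two sequences are equal when they hold equal items in the same order.
public sealed class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        if (obj == null) return 0;
        unchecked // wrap on overflow
        {
            // Multiplying before adding makes the result depend on item order.
            int hash = 17;
            foreach (T item in obj)
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            return hash;
        }
    }
}

// Hypothetical driver class for the example.
public static class SequenceKeyDemo
{
    public static void Main()
    {
        var dict = new Dictionary<IEnumerable<int>, string>(new SequenceComparer<int>());
        dict[new List<int> { 1, 2, 3 }] = "a";

        Console.WriteLine(dict.ContainsKey(new[] { 1, 2, 3 })); // True
        Console.WriteLine(dict.ContainsKey(new[] { 3, 2, 1 })); // False
    }
}
```

Every lookup enumerates the whole key to compute its hash, which is the cost the caching suggestions in the other answers try to avoid.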
If you can, it may be better to define your own immutable collection class which accepts an IEqualityComparer<T> as a constructor parameter, and have its Equals and GetHashCode() members chain to those of the underlying collection, than to try to define an IEqualityComparer<T> for the purpose. Among other things, your immutable collection class would be able to cache its own hash value, and possibly the hash values for the items contained therein. This would accelerate not only calls to GetHashCode() on the collection, but also comparisons between two collections. If two collections' hashcodes are unequal, there's no point in checking anything further; even if two collections' hashcodes are equal, it may be worthwhile to check that the hashcodes of corresponding items match before testing the items themselves for equality [note that in general, using a hash-code test as an early exit before checking equality is not particularly helpful, because the slowest Equals case (where the items match) is the one where hash codes are going to match anyway; here, however, if all but the last item match, testing the hash code of the items may find the mismatch before one has spent time inspecting each item in detail].
Starting in .NET 4.0, it became possible to write an IEqualityComparer<T> which could achieve the performance advantage of an immutable collection class which caches hash values, by using a ConditionalWeakTable to map collections to objects which would cache information about them. Nonetheless, unless one is unable to use a custom immutable-collection class, I think such a class would probably be better than an IEqualityComparer<T> in this scenario anyway.
I do not believe that such a thing exists. I had a need to compare two dictionaries' contents for equality and wrote this a while back.
public class DictionaryComparer<TKey, TValue> : EqualityComparer<IDictionary<TKey, TValue>>
{
public DictionaryComparer()
{
}
public override bool Equals(IDictionary<TKey, TValue> x, IDictionary<TKey, TValue> y)
{
// early-exit checks
if (object.ReferenceEquals(x, y))
return true;
if (null == x || y == null)
return false;
if (x.Count != y.Count)
return false;
// check keys are the same
foreach (TKey k in x.Keys)
if (!y.ContainsKey(k))
return false;
// check values are the same
foreach (TKey k in x.Keys)
{
TValue v = x[k];
// object.Equals handles nulls on either side; on a match, keep checking the remaining keys
if (!object.Equals(v, y[k]))
return false;
}
return true;
}
public override int GetHashCode(IDictionary<TKey, TValue> obj)
{
if (obj == null)
return 0;
int hash = 0;
foreach (KeyValuePair<TKey, TValue> pair in obj)
{
int key = pair.Key.GetHashCode(); // key cannot be null
int value = pair.Value != null ? pair.Value.GetHashCode() : 0;
hash ^= ShiftAndWrap(key, 2) ^ value;
}
return hash;
}
private static int ShiftAndWrap(int value, int positions)
{
positions = positions & 0x1F;
// Save the existing bit pattern, but interpret it as an unsigned integer.
uint number = BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
// Preserve the bits to be discarded.
uint wrapped = number >> (32 - positions);
// Shift and wrap the discarded bits.
return BitConverter.ToInt32(BitConverter.GetBytes((number << positions) | wrapped), 0);
}
}
What would be the best way to override the GetHashCode function when my objects are considered equal if at least ONE of their fields matches?
For the generic Equals method, the example might look like this:
public bool Equals(Whatever other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
// Considering that the values can't be 'null' here.
return other.Id.Equals(Id) || Equals(other.Money, Money) ||
Equals(other.Code, Code);
}
Still, I'm confused about making a good GetHashCode implementation for this case.
How should this be done?
Thank you.
This is a terrible definition of Equals because it is not transitive.
Consider
x = { Id = 1, Money = 0.1, Code = "X" }
y = { Id = 1, Money = 0.2, Code = "Y" }
z = { Id = 3, Money = 0.2, Code = "Z" }
Then x == y and y == z but x != z.
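The counterexample can be checked directly. This sketch reuses the Equals body from the question; the field types (int, decimal, string) and the TransitivityDemo class are assumptions made for the example.

```csharp
using System;

public class Whatever
{
    public int Id;
    public decimal Money;
    public string Code;

    // "Equal if any one field matches", as in the question.
    public bool Equals(Whatever other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return other.Id.Equals(Id) || Equals(other.Money, Money) ||
               Equals(other.Code, Code);
    }
}

// Hypothetical driver class for the example.
public static class TransitivityDemo
{
    public static void Main()
    {
        var x = new Whatever { Id = 1, Money = 0.1m, Code = "X" };
        var y = new Whatever { Id = 1, Money = 0.2m, Code = "Y" };
        var z = new Whatever { Id = 3, Money = 0.2m, Code = "Z" };

        Console.WriteLine(x.Equals(y)); // True  (same Id)
        Console.WriteLine(y.Equals(z)); // True  (same Money)
        Console.WriteLine(x.Equals(z)); // False (nothing in common)
    }
}
```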
Additionally, we can establish that the only reasonable implementation of GetHashCode is a constant map.
Suppose that x and y are distinct objects. Let z be the object
z = { Id = x.Id, Money = y.Money, Code = "Z" }
Then x == z and y == z so that x.GetHashCode() == z.GetHashCode() and y.GetHashCode() == z.GetHashCode() establishing that x.GetHashCode() == y.GetHashCode(). Since x and y were arbitrary we have established that GetHashCode is constant.
Thus, we have shown that the only possible implementation of GetHashCode is
private readonly int constant = 17;
public override int GetHashCode() {
return constant;
}
All of this put together makes it clear that you need to rethink the concept you are trying to model, and come up with a different definition of Equals.
I don't think you should be using Equals for this. People have a very explicit notion of what equals means, and if the Ids are different but the code or money are the same, I would not consider those "Equal". Maybe you need a different method like "IsCompatible".
If you want to be able to group them, you could use the extension method ToLookup() on a list of these objects, to use a predicate which would be your IsCompatible method. Then they would be grouped.
The golden rule is: if the objects compare equal, they must produce the same hash code.
Therefore a conforming (but let's say, undesirable) implementation would be
public override int GetHashCode()
{
return 0;
}
Frankly, if Id, Money and Code are independent of each other then I don't know if you can do any better. Putting objects of this type in a hash table is going to be painful.
Testing the Equals method is pretty much straightforward (as far as I know). But how on earth do you test the GetHashCode method?
Test that two distinct objects which are equal have the same hash code (for various values). Check that non-equal objects give different hash codes, varying one aspect/property at a time. While the hash codes don't have to be different, you'd be really unlucky to pick different values for properties which happen to give the same hash code unless you've got a bug.
Gallio/MbUnit v3.2 comes with convenient contract verifiers which are able to test your implementation of GetHashCode() and IEquatable<T>. More specifically, you may be interested in the EqualityContract and the HashCodeAcceptanceContract. See here, here and there for more details. Suppose you have a class like this:
public class Spot
{
private readonly int x;
private readonly int y;
public Spot(int x, int y)
{
this.x = x;
this.y = y;
}
public override int GetHashCode()
{
int h = -2128831035;
h = (h * 16777619) ^ x;
h = (h * 16777619) ^ y;
return h;
}
}
Then you declare your contract verifier like this:
[TestFixture]
public class SpotTest
{
[VerifyContract]
public readonly IContract HashCodeAcceptanceTests = new HashCodeAcceptanceContract<Spot>()
{
CollisionProbabilityLimit = CollisionProbability.VeryLow,
UniformDistributionQuality = UniformDistributionQuality.Excellent,
DistinctInstances = DataGenerators.Join(Enumerable.Range(0, 1000), Enumerable.Range(0, 1000)).Select(o => new Spot(o.First, o.Second))
};
}
It would be fairly similar to Equals(). You'd want to make sure two objects which were the "same" at least had the same hash code. That means if .Equals() returns true, the hash codes should be identical as well. As far as what the proper hashcode values are, that depends on how you're hashing.
From personal experience: aside from obvious things like the same objects giving you the same hash codes, you need to create a large enough array of unique objects and count the unique hash codes among them. If unique hash codes make up less than, say, 50% of the overall object count, then you are in trouble, as your hash function is not good.
List<int> hashList = new List<int>(testObjectList.Count);
for (int i = 0; i < testObjectList.Count; i++)
{
hashList.Add(testObjectList[i].GetHashCode());
}
hashList.Sort();
int differentValues = 1; // the first hash in the sorted list counts as a distinct value
int curValue = hashList[0];
for (int i = 1; i < hashList.Count; i++)
{
if (hashList[i] != curValue)
{
differentValues++;
curValue = hashList[i];
}
}
Assert.Greater(differentValues, hashList.Count/2);
In addition to checking that object equality implies equality of hashcodes, and the distribution of hashes is fairly flat as suggested by Yann Trevin (if performance is a concern), you may also wish to consider what happens if you change a property of the object.
Suppose your object changes while it's in a dictionary/hashset. Do you want the Contains(object) to still be true? If so then your GetHashCode had better not depend on the mutable property that was changed.
I would pre-supply a known/expected hash and compare what the result of GetHashCode is.
You create separate instances with the same value and check that the GetHashCode for the instances returns the same value, and that repeated calls on the same instance return the same value.
That is the only requirement for a hash code to work. To work well the hash codes should of course have a good distribution, but testing for that requires a lot of testing...
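Those two checks can be sketched without any particular test framework. Point, HashCodeChecks and HashCheckDemo are made-up names, and the hash formula is only an example.

```csharp
using System;

// Hypothetical immutable type under test.
public sealed class Point
{
    private readonly int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    public override bool Equals(object obj)
    {
        var other = obj as Point;
        return other != null && x == other.x && y == other.y;
    }

    public override int GetHashCode()
    {
        unchecked { return (17 * 31 + x) * 31 + y; }
    }
}

public static class HashCodeChecks
{
    // True when a and b are equal, hash alike, and each hash is stable across calls.
    public static bool EqualAndConsistent(object a, object b)
    {
        return a.Equals(b)
            && a.GetHashCode() == b.GetHashCode()
            && a.GetHashCode() == a.GetHashCode()
            && b.GetHashCode() == b.GetHashCode();
    }
}

// Hypothetical driver class for the example.
public static class HashCheckDemo
{
    public static void Main()
    {
        // Separate instances with the same values must pass the check.
        Console.WriteLine(HashCodeChecks.EqualAndConsistent(new Point(3, 4), new Point(3, 4))); // True
    }
}
```

In a real test suite the boolean would be fed to the framework's assertion method and repeated over a range of values.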
I have an immutable Value Object, IPathwayModule, whose value is defined by:
(int) Block;
(Entity) Module, identified by (string) ModuleId;
(enum) Status; and
(entity) Class, identified by (string) ClassId - which may be null.
Here's my current IEqualityComparer implementation which seems to work in a few unit tests. However, I don't think I understand what I'm doing well enough to know whether I am doing it right. A previous implementation would sometimes fail on repeated test runs.
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
public bool Equals(IPathwayModule x, IPathwayModule y)
{
int hx = GetHashCode(x);
int hy = GetHashCode(y);
return hx == hy;
}
public int GetHashCode(IPathwayModule obj)
{
int h;
if (obj.Class != null)
{
h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + obj.Class.ClassId.GetHashCode();
}
else
{
h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + "NOCLASS".GetHashCode();
}
return h;
}
}
IPathwayModule is definitely immutable and different instances with the same values should be equal and produce the same HashCode since they are used as items within HashSets.
I suppose my questions are:
Am I using the interface correctly in this case?
Are there cases where I might not see the desired behaviour?
Is there any way to improve the robustness, performance?
Are there any good practices that I am not following?
Don't implement Equals in terms of the hash function's results; it's too fragile. Rather, do a field value comparison for each of the fields. Something like:
return x != null && y != null && x.Name.Equals(y.Name) && x.Type.Equals(y.Type) ...
Also, the hash function's results aren't really amenable to addition. Try using the ^ operator instead.
return obj.Name.GetHashCode() ^ obj.Type.GetHashCode() ...
You don't need the null check in GetHashCode. If that value is null, you've got bigger problems, no use trying to recover from something over which you have no control...
The only big problem is the implementation of Equals. Hash codes are not unique, you can get the same hash code for objects which are different. You should compare each field of IPathwayModule individually.
GetHashCode() can be improved a bit. You don't need to call GetHashCode() on an int, because the int itself is a good hash code. The same goes for enum values. Your GetHashCode could then be implemented like this:
public int GetHashCode(IPathwayModule obj)
{
unchecked {
int h = obj.Block + obj.Module.ModuleId.GetHashCode() + (int) obj.Status;
if (obj.Class != null)
h += obj.Class.ClassId.GetHashCode();
return h;
}
}
The 'unchecked' block is necessary because there may be overflows in the arithmetic operations.
You shouldn't use GetHashCode() as the main way of comparing objects. Compare them field by field.
There could be multiple objects with the same hash code (this is called 'hash code collisions').
Also, be careful when adding together multiple integer values, since in a checked context you can easily cause an OverflowException. Use 'exclusive or' (^) to combine hash codes, or wrap the code in an 'unchecked' block.
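A small sketch of the difference; the OverflowDemo class and its method names are made up for this example.

```csharp
using System;

// Hypothetical helper class for the example.
public static class OverflowDemo
{
    public static int AddUnchecked(int a, int b)
    {
        unchecked { return a + b; } // wraps around silently
    }

    public static int AddChecked(int a, int b)
    {
        checked { return a + b; } // throws OverflowException on overflow
    }

    public static void Main()
    {
        Console.WriteLine(AddUnchecked(int.MaxValue, 1) == int.MinValue); // True

        try
        {
            AddChecked(int.MaxValue, 1);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }

        // XOR never overflows, but it is symmetric: (1, 2) and (2, 1) combine to the same value.
        Console.WriteLine((1 ^ 2) == (2 ^ 1)); // True
    }
}
```

For a hash code the wrapped result is perfectly usable, which is why the hash arithmetic is normally placed in an unchecked block.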
You should implement better versions of Equals and GetHashCode.
For instance, the hash code of enums is simply their numerical value.
In other words, with these two enums:
public enum A { x, y, z }
public enum B { k, l, m }
Then, with your implementation and the following value type:
public struct AB {
public A A;
public B B;
}
the following two values would be considered equal:
AB ab1 = new AB { A = A.x, B = B.m };
AB ab2 = new AB { A = A.z, B = B.k };
I'm assuming you don't want that.
Also, passing the value types as interfaces will box them, this could have performance concerns, although probably not much. You might consider making the IEqualityComparer implementation take your value types directly.
Assuming that two objects are equal because their hash codes are equal is wrong. You need to compare all members individually.
It is probably better to use ^ rather than + to combine the hash codes.
If I understand you well, you'd like to hear some comments on your code. Here're my remarks:
1. The hash codes in GetHashCode should be XOR'ed together, not added. XOR (^) gives a better chance of preventing collisions.
2. You compare hash codes. That's good, but only do this if the underlying object overrides GetHashCode. If not, use the properties and their hash codes and combine them.
3. Hash codes are important; they make a quick comparison possible. But if the hash codes are equal, the objects can still be different. This happens rarely, but you'll need to compare the fields of your objects if the hash codes are equal.
4. You say your value types are immutable, but you reference objects (.Class), which are not immutable.
5. Always optimize comparison by adding a reference comparison as the first test. If the references are unequal, the objects are unequal, and then the structs are unequal.
Point 5 depends on whether you want the objects that you reference in your value type to compare as not equal when they are not the same reference.
EDIT: you compare many strings. String comparison is optimized in C#. You can, as others suggested, better use == with them in your comparison. For GetHashCode, use XOR (^), as suggested by others as well.
Thanks to all who responded. I have aggregated the feedback from everyone who responded and my improved IEqualityComparer now looks like:
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
public bool Equals(IPathwayModule x, IPathwayModule y)
{
if (x == y) return true;
if (x == null || y == null) return false;
if ((x.Class == null) ^ (y.Class == null)) return false;
if (x.Class == null) //and implicitly y.Class == null
{
return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId);
}
return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId) && x.Class.ClassId.Equals(y.Class.ClassId);
}
public int GetHashCode(IPathwayModule obj)
{
unchecked {
int h = obj.Block ^ obj.Module.ModuleId.GetHashCode() ^ (int) obj.Status;
if (obj.Class != null)
{
h ^= obj.Class.ClassId.GetHashCode();
}
return h;
}
}
}