Null vs Empty Collections in GetHashCode - C#

In the implementation of GetHashCode below, a null Collection and an empty Collection both result in a hash code of 0.
A colleague suggested returning an arbitrary hard-coded number, like 19, to differentiate an empty collection from a null one. Why would I want to do this? Why would I care whether a null or an empty collection produces a different hash code?
public class Foo
{
    public List<int> Collection { get; set; }

    // Other properties omitted.

    public override int GetHashCode()
    {
        var hashCode = 0;
        if (this.Collection != null)
        {
            foreach (var item in this.Collection)
            {
                var itemHashCode = item == null ? 0 : item.GetHashCode();
                hashCode = ((hashCode << 5) + hashCode) ^ itemHashCode;
            }
        }
        return hashCode;
    }
}

GetHashCode is supposed to minimize the number of collisions as best it can. While some hash collisions are inevitable, you'll want to be mindful of which objects are colliding, what kind of data will be stored in your hash-based collections, and work to ensure that objects stored together in the same collection are less likely to collide.
So if you happen to know something about how hash-based collections of this type are going to be used, and that they are likely to contain both null and empty collections, then having the two not collide will improve performance. If having both a null and an empty value in the same collection is not particularly likely, then having them collide isn't actually a concern.
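For illustration, a minimal sketch of the colleague's suggestion: seed the hash with an arbitrary non-zero constant (19 here) for a non-null collection, so that null and empty no longer produce the same value.

public override int GetHashCode()
{
    // A null collection still hashes to 0.
    if (this.Collection == null)
    {
        return 0;
    }

    // An empty (non-null) collection now starts at 19, so it no longer collides with null.
    var hashCode = 19;
    foreach (var item in this.Collection)
    {
        hashCode = ((hashCode << 5) + hashCode) ^ item.GetHashCode();
    }
    return hashCode;
}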

Related

C# Hash Function for Dictionary Lookup [duplicate]

Given the following class
public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;
        if (fooItem == null)
        {
            return false;
        }
        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Which is preferred?
        return base.GetHashCode();
        //return this.FooId.GetHashCode();
    }
}
I have overridden the Equals method because Foo represents a row in the Foos table. Which is the preferred way to override GetHashCode?
Why is it important to override GetHashCode?
Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>, etc - since this is used (in the absence of a custom IEqualityComparer<T>) to group items into buckets. If the hash-code for two items does not match, they may never be considered equal (Equals will simply never be called).
The GetHashCode() method should reflect the Equals logic; the rules are:
if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
if the GetHashCode() values are equal, the objects are not necessarily the same; this is a collision, and Equals will be called to see whether it is a real equality or not.
In this case, it looks like "return FooId;" is a suitable GetHashCode() implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5) has a different hash-code to new Foo(5,3)):
In modern frameworks, the HashCode type has methods to help you create a hashcode from multiple values; on older frameworks, you'd need to go without, so something like:
unchecked // only needed if you're compiling with arithmetic checks enabled
{ // (the default compiler behaviour is *disabled*, so most folks won't need this)
    int hash = 13;
    hash = (hash * 7) + field1.GetHashCode();
    hash = (hash * 7) + field2.GetHashCode();
    ...
    return hash;
}
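On frameworks that have System.HashCode (.NET Core 2.1 and later, or via the Microsoft.Bcl.HashCode package on older targets), a minimal sketch of the same idea, assuming the same placeholder fields as above:

public override int GetHashCode() => HashCode.Combine(field1, field2);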
Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GetHashCode.
A demonstration of what happens when you get this wrong is here.
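As a rough sketch of the == / != operator pair mentioned above, assuming the FooId-based Equals from the question:

public static bool operator ==(Foo left, Foo right)
{
    if (ReferenceEquals(left, right)) return true;   // same instance, or both null
    if (ReferenceEquals(left, null) || ReferenceEquals(right, null)) return false;
    return left.Equals(right);                       // falls back to the FooId comparison
}

public static bool operator !=(Foo left, Foo right)
{
    return !(left == right);
}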
It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.
I finally found a solution to this problem when I was working with NHibernate.
My approach is to calculate the hash code from the ID of the object. The ID can only be set through the constructor, so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.
By overriding Equals you're basically stating that you know better how to compare two instances of a given type.
Below you can see an example of how ReSharper writes a GetHashCode() function for you. Note that this snippet is meant to be tweaked by the programmer:
public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        result = (result * 397) ^ m_someVar1;
        result = (result * 397) ^ m_someVar2;
        result = (result * 397) ^ m_someVar3;
        result = (result * 397) ^ m_someVar4;
        return result;
    }
}
As you can see it just tries to guess a good hash code based on all the fields in the class, but if you know your object's domain or value ranges you could still provide a better one.
Please don't forget to check the obj parameter against null when overriding Equals().
And also compare the type.
public override bool Equals(object obj)
{
    Foo fooItem = obj as Foo;
    if (fooItem == null)
    {
        return false;
    }
    return fooItem.FooId == this.FooId;
}
The reason for this is: Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx
How about:
public override int GetHashCode()
{
    return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}
Assuming performance is not an issue :)
As of .NET 4.7 the preferred method of overriding GetHashCode() is shown below. If targeting older .NET versions, include the System.ValueTuple nuget package.
// C# 7.0+
public override int GetHashCode() => (FooId, FooName).GetHashCode();
In terms of performance, this method will outperform most composite hash code implementations. The ValueTuple is a struct so there won't be any garbage, and the underlying algorithm is as fast as it gets.
Just to add to the above answers:
If you don't override Equals, the default behavior is that the references of the objects are compared. The same applies to the hash code - the default implementation is typically based on the memory address of the reference.
Because you did override Equals, the correct behavior is to compare whatever you implemented in Equals and not the references, so you should do the same for the hash code.
Clients of your class will expect the hash code to follow the same logic as the Equals method. For example, LINQ methods that use an IEqualityComparer first compare the hash codes, and only if those are equal do they call Equals(), which may be more expensive to run. If we didn't implement GetHashCode, equal objects would probably have different hash codes (because they have different memory addresses) and would wrongly be determined not equal (Equals() wouldn't even be hit).
In addition, beyond the problem that you might not be able to find your object in a dictionary (because it was inserted with one hash code and, when you look for it, the default hash code will probably be different, so Equals() won't even be called, as Marc Gravell explains in his answer), you also violate the dictionary or hash-set concept of not allowing identical keys.
You already declared that those objects are essentially the same when you overrode Equals, so you don't want both of them as different keys in a data structure that is supposed to have unique keys. But because they have different hash codes, the "same" key will be inserted as a different one.
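To make this concrete, here is a small hypothetical illustration using the Foo class from the question, assuming Equals is overridden but GetHashCode is not:

var a = new Foo { FooId = 1, FooName = "a" };
var b = new Foo { FooId = 1, FooName = "b" };

// a.Equals(b) is true, but the default GetHashCode is reference-based,
// so a and b will almost certainly land in different buckets.
var set = new HashSet<Foo> { a };
Console.WriteLine(set.Contains(b)); // almost certainly False
set.Add(b);                         // the "same" key is now stored twice
Console.WriteLine(set.Count);       // 2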
It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).
We have two problems to cope with.
You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also, often an object will NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.
If someone puts your object in a collection that calls GetHashCode(), and you have overridden Equals() without also making GetHashCode() behave correctly, that person may spend days tracking down the problem.
Therefore, by default, I do:
public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;
        if (fooItem == null)
        {
            return false;
        }
        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Some comment to explain if there is a real problem with providing GetHashCode()
        // or if I just don't see a need for it for the given class
        throw new Exception("Sorry I don't know what GetHashCode should do for this class");
    }
}
The hash code is used by hash-based collections like Dictionary, Hashtable, HashSet, etc. Its purpose is to very quickly pre-sort a specific object by putting it into a specific group (bucket). This pre-sorting helps tremendously in finding the object when you need to retrieve it from the hash collection, because the code only has to search for your object in one bucket instead of among all the objects the collection contains. The better the distribution of hash codes (the better the uniqueness), the faster the retrieval. In the ideal situation, where each object has a unique hash code, finding it is an O(1) operation; in most cases it approaches O(1).
It's not necessarily important; it depends on the size of your collections, your performance requirements, and whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large, and my time is more valuable than the few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying compiler warning) I simply use:
public override int GetHashCode()
{
    return base.GetHashCode();
}
(Of course I could use a #pragma to turn off the warning as well but I prefer this way.)
When you are in the position that you do need the performance, then all of the issues mentioned by others here apply, of course. Most important (otherwise you will get wrong results when retrieving items from a hash set or dictionary): the hash code must not vary over the lifetime of an object (more accurately, during the time the hash code is needed, such as while the object is a key in a dictionary). For example, the following is wrong, because Value is public and so can be changed externally during the lifetime of the instance, so you must not use it as the basis for the hash code:
class A
{
    public int Value;

    public override int GetHashCode()
    {
        return Value.GetHashCode(); // WRONG! Value is not constant during the instance's lifetime
    }
}
On the other hand, if Value can't be changed it's ok to use:
class A
{
    public readonly int Value;

    public override int GetHashCode()
    {
        return Value.GetHashCode(); // OK: Value is read-only and can't be changed during the instance's lifetime
    }
}
You should always guarantee that if two objects are equal, as defined by Equals(), they return the same hash code. As some of the other comments state, in theory this is not mandatory if the object will never be used in a hash-based container like HashSet or Dictionary. I would advise you to always follow this rule though. The reason is simply that it is way too easy for someone to change a collection from one type to another, with the good intention of improving performance or just conveying the code semantics in a better way.
For example, suppose we keep some objects in a List. Sometime later someone realizes that a HashSet is a much better alternative, because of the better search characteristics for example. This is when we can get into trouble. List<T> would internally use the default equality comparer for the type, which means Equals in your case, while HashSet<T> makes use of GetHashCode(). If the two behave differently, so will your program. And bear in mind that such issues are not the easiest to troubleshoot.
I've summarized this behavior with some other GetHashCode() pitfalls in a blog post where you can find further examples and explanations.
As of C# 9 (.NET 5 or .NET Core 3.1), you may want to use records, as they provide value-based equality by default.
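A minimal sketch of that, reusing the Foo from the question:

public record Foo(int FooId, string FooName);

// Records generate Equals, GetHashCode and the == / != operators from their members.
var a = new Foo(1, "x");
var b = new Foo(1, "x");
Console.WriteLine(a == b); // True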
It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects.
EDITED:
That was incorrect, the original GetHashCode() method cannot assure the equality of 2 values. Though objects that are equal return the same hash code.
Using reflection over the public properties, as below, seems to me a better option, because with it you don't have to worry about the addition or removal of properties (although that's not so common a scenario). I also found it to perform better (I compared times using a Diagnostics stopwatch).
public override int GetHashCode()
{
    PropertyInfo[] theProperties = this.GetType().GetProperties();
    int hash = 31;
    foreach (PropertyInfo info in theProperties)
    {
        if (info != null)
        {
            var value = info.GetValue(this, null);
            if (value != null)
            {
                unchecked
                {
                    hash = 29 * hash ^ value.GetHashCode();
                }
            }
        }
    }
    return hash;
}

Distinct of List<T> not working

I have this class with a comparer:
public partial class CityCountryID : IEqualityComparer<CityCountryID>
{
    public string City { get; set; }
    public string CountryId { get; set; }

    public bool Equals(CityCountryID left, CityCountryID right)
    {
        if ((object)left == null && (object)right == null)
        {
            return true;
        }
        if ((object)left == null || (object)right == null)
        {
            return false;
        }
        return left.City.Trim().TrimEnd('\r', '\n') == right.City.Trim().TrimEnd('\r', '\n')
            && left.CountryId == right.CountryId;
    }

    public int GetHashCode(CityCountryID obj)
    {
        return (obj.City + obj.CountryId).GetHashCode();
    }
}
I tried using HashSet and Distinct, but neither one is working. I did not want to do this in the database, as the list was too big and the query took forever. Why is this not working in C#? I want to get a unique country/city list.
List<CityCountryID> CityList = LoadData("GetCityList").ToList();
//var unique = new HashSet<CityCountryID>(CityList);
Console.WriteLine("Loading Completed/ Checking Duplicates");
List<CityCountryID> unique = CityList.Distinct().ToList();
Your Equals and GetHashCode methods aren't consistent. In Equals, you're trimming the city name - but in GetHashCode you're not. That means two equal values can have different hash codes, violating the normal contract.
That's the first thing to fix. I would suggest trimming the city names in the database itself for sanity, and then removing the Trim operations in your Equality check. That'll make things a lot simpler.
The second is work out why it was taking a long time in the database: I'd strongly expect it to perform better in the database than locally, especially if you have indexes on the two fields.
The next is to consider making your type immutable if at all possible. It's generally a bad idea to allow mutable properties of an object to affect equality; if you change an equality-sensitive property of an object after using it as a key in a dictionary (or after adding it to a HashSet) you may well find that you can't retrieve it again, even using the exact same reference.
EDIT: Also, as Scott noted, you either need to pass in an IEqualityComparer to perform the equality comparison or make your type override the normal Equals and GetHashCode methods. At the moment you're half way between the two (implementing IEqualityComparer<T>, but not actually providing a comparer as an argument to Distinct or the HashSet constructor). In general it's unusual for a type to implement IEqualityComparer for itself. Basically you either implement a "natural" equality check in the type or you implement a standalone equality check in a type implementing IEqualityComparer<T>. You don't have to implement IEquatable<T> - just overriding the normal Equals(object) method will work - but it's generally a good idea to implement IEquatable<T> at the same time.
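For example, since CityCountryID itself implements IEqualityComparer<CityCountryID>, one quick way to make the existing code work (as a sketch, keeping your current design) is to pass an instance of it explicitly:

var comparer = new CityCountryID(); // used purely as the comparer here
List<CityCountryID> unique = CityList.Distinct(comparer).ToList();
// or
var uniqueSet = new HashSet<CityCountryID>(CityList, comparer);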
As an aside, I would also suggest computing a hash code without using string concatenation. For example:
public override int GetHashCode()
{
    int hash = 17;
    hash = hash * 31 + CountryId.GetHashCode();
    hash = hash * 31 + City.GetHashCode();
    return hash;
}
You needed to implement the interface IEquatable<T>, not IEqualityComparer<T> (be sure to read the documentation, especially the "Notes to Implementers" section!). IEqualityComparer is for when you want to use a custom comparer other than the default one built into the class.
Also, you need to make the changes that Jon mentioned about GetHashCode not matching Equals.
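As a rough sketch of that route (overriding Equals/GetHashCode and implementing IEquatable<T> on the type itself, with the trimming assumed to have been done beforehand, as Jon suggests):

public partial class CityCountryID : IEquatable<CityCountryID>
{
    public string City { get; set; }
    public string CountryId { get; set; }

    public bool Equals(CityCountryID other)
    {
        if (ReferenceEquals(other, null)) return false;
        return City == other.City && CountryId == other.CountryId;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CityCountryID);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (CountryId != null ? CountryId.GetHashCode() : 0);
            hash = hash * 31 + (City != null ? City.GetHashCode() : 0);
            return hash;
        }
    }
}

With this in place, CityList.Distinct().ToList() and new HashSet<CityCountryID>(CityList) work without passing a comparer.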

Fast collection comparison

I have the following data type:
ISet<IEnumerable<Foo>>
So, I need to be able to create sets of sequences. E.g. this is ok:
ABC,AC,A
but this is not (since "AB" is repeated here):
AB,A,ABC,BCA,AB
But, in order to do this - for "set" to not contain duplicates, I need to wrap my IEnumerable in some kind of other data type:
ISet<Seq>
//where
Seq : IEnumerable<Foo>, IEquatable<Seq>
Thus, I will be able to compare two sequences, and provide the Set data structure with a way of eliminating duplicates.
My question is: is there a fast data structure that allows for comparing sequences? I am thinking that somehow when Seq gets created, or added to, some kind of cumulative value is computed.
In other words, is it possible to implement Seq in such a way that I could do this:
var seq1 = new Seq( IList<Foo> );
var seq2 = new Seq( IList<Foo> )
seq1.equals(seq2) // O(1)
Thanks.
I have provided an implementation of your sequence below. There are several points to note:
This only works if the IEnumerable<T> returns the same items every time it is enumerated, and those items are not mutated while this object is in use.
The hash code is cached. The first time it is requested, it is calculated (feel free to improve the hash code algorithm if you know a better one) based on a full iteration of the underlying sequence. Because it only needs to be calculated once, it can effectively be considered O(1) if you use it often. Adding to the set will likely be a bit slower (first-time computation of the hash value), but searching or removing will be very quick.
The Equals method first compares the hash codes. If the hash codes are different, the objects cannot possibly be equal (assuming the hash codes are properly implemented on all objects in the sequence, and nothing was mutated). As long as you have a low rate of collision, and are usually comparing items that aren't actually equal, equals checks will not often get past that hash code check. If they do, an iteration of the sequence is needed (there is no way around that). Because of that, Equals is likely to average O(1), even though its worst case is still O(n).
public class Foo<T> : IEnumerable<T>
{
    private IEnumerable<T> sequence;
    private int? myHashCode = null;

    public Foo(IEnumerable<T> sequence)
    {
        this.sequence = sequence;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return sequence.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return sequence.GetEnumerator();
    }

    public override bool Equals(object obj)
    {
        Foo<T> other = obj as Foo<T>;
        if (other == null)
            return false;

        // If the hash codes are different we don't need to bother doing a deep equals check.
        // The hash code is cached, so it's fast.
        if (GetHashCode() != other.GetHashCode())
            return false;

        return Enumerable.SequenceEqual(sequence, other.sequence);
    }

    public override int GetHashCode()
    {
        // Note that the hash code is cached, so the underlying sequence
        // needs to not change.
        return myHashCode ?? populateHashCode();
    }

    private int populateHashCode()
    {
        int somePrimeNumber = 37;
        myHashCode = 1;
        foreach (T item in sequence)
        {
            myHashCode = (myHashCode * somePrimeNumber) + item.GetHashCode();
        }
        return myHashCode.Value;
    }
}
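A short usage sketch, assuming the wrapper above (instantiated with char here so the sequences read like the ABC examples in the question):

var set = new HashSet<Foo<char>>();
set.Add(new Foo<char>(new[] { 'A', 'B' }));
set.Add(new Foo<char>(new[] { 'A' }));
set.Add(new Foo<char>(new[] { 'A', 'B' })); // duplicate sequence: Add returns false
Console.WriteLine(set.Count);               // 2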
O(1) essentially means you are not allowed to compare the values of the elements. If you can represent a sequence as a list of immutable objects (with caching on add so there are no duplicates across all instances), you can achieve it, as you'd only need to compare the first element - similar to how string interning works.
Inserting would then have to search all existing instances for the "current element" + "with this next element" combination. Some sort of dictionary may be a reasonable approach...
EDIT: I think this simply amounts to trying to come up with a suffix tree.

Implementation of Dictionary where equivalent contents are equal and return the same hash code regardless of order of insertion

I need to use Dictionary<long, string> collections such that, given two instances d1 and d2 that have the same KeyValuePair<long, string> contents (which could have been inserted in any order):
(d1 == d2) evaluates to true
d1.GetHashCode() == d2.GetHashCode()
The first requirement was achieved most easily by using a SortedDictionary instead of a regular Dictionary.
The second requirement is necessary because at one point I need to store a Dictionary<Dictionary<long, string>, List<string>> - the main dictionary type is used as the key of another dictionary, and if the hash codes aren't based on identical contents, then ContainsKey() will not work the way I want (i.e., if there is already an item inserted into the dictionary with d1 as its key, then dictionary.ContainsKey(d2) should evaluate to true).
To achieve this, I have created a new object class ComparableDictionary : SortedDictionary<long, string>, and have included the following:
public override int GetHashCode()
{
    StringBuilder str = new StringBuilder();
    foreach (var item in this)
    {
        str.Append(item.Key);
        str.Append("_");
        str.Append(item.Value);
        str.Append("%%");
    }
    return str.ToString().GetHashCode();
}
In my unit testing, this meets the criteria for both equality and hashcodes. However, in reading Guidelines and Rules for GetHashCode, I came across the following:
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.
If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.
Remember, objects can be put into hash tables in ways that you didn't expect. A lot of the LINQ sequence operators use hash tables internally. Don't go dangerously mutating objects while enumerating a LINQ query that returns them!
Now, the Dictionary<ComparableDictionary, List<String>> is used only once in code, in a place where the contents of all ComparableDictionary collections should be set. Thus, according to these guidelines, I think that it would be acceptable to override GetHashCode as I have done (basing it completely on the contents of the dictionary).
After that introduction my questions are:
I know that the performance of SortedDictionary is very poor compared to Dictionary (and I can have hundreds of object instantiations). The only reason for using SortedDictionary is so that I can have the equality comparison work based on the contents of the dictionary, regardless of order of insertion. Is there a better way to achieve this equality requirement without having to use a SortedDictionary?
Is my implementation of GetHashCode acceptable based on the requirements? Even though it is based on mutable contents, I don't think that should pose any risk, since the only place where it is used (I think) is after the contents have been set.
Note: while I have been setting these up using Dictionary or SortedDictionary, I am not wedded to these collection types. The main need is a collection that can store pairs of values, and meet the equality and hashing requirements defined out above.
Your GetHashCode implementation looks acceptable to me, but it's not how I'd do it.
This is what I'd do:
Use composition rather than inheritance. Aside from anything else, inheritance gets odd in terms of equality
Use a Dictionary<TKey, TValue> variable inside your class
Implement GetHashCode by taking an XOR of the individual key/value pair hash codes
Implement equality by checking whether the sizes are the same, then checking every key in "this" to see if its value is the same in the other dictionary.
So something like this:
public sealed class EquatableDictionary<TKey, TValue>
    : IDictionary<TKey, TValue>, IEquatable<EquatableDictionary<TKey, TValue>>
{
    private readonly Dictionary<TKey, TValue> dictionary;

    public override bool Equals(object other)
    {
        return Equals(other as EquatableDictionary<TKey, TValue>);
    }

    public bool Equals(EquatableDictionary<TKey, TValue> other)
    {
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        if (Count != other.Count)
        {
            return false;
        }
        foreach (var pair in this)
        {
            TValue otherValue;
            if (!other.TryGetValue(pair.Key, out otherValue))
            {
                return false;
            }
            if (!EqualityComparer<TValue>.Default.Equals(pair.Value, otherValue))
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode()
    {
        int hash = 0;
        foreach (var pair in this)
        {
            int miniHash = 17;
            miniHash = miniHash * 31 + EqualityComparer<TKey>.Default.GetHashCode(pair.Key);
            miniHash = miniHash * 31 + EqualityComparer<TValue>.Default.GetHashCode(pair.Value);
            hash ^= miniHash;
        }
        return hash;
    }

    // Implementation of IDictionary<,> which just delegates to the dictionary
}
Also note that I can't remember whether EqualityComparer<T>.Default.GetHashCode copes with null values - I have a suspicion that it does, returning 0 for null. Worth checking though :)
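A quick, throwaway way to check that (not part of the dictionary itself):

// Prints 0 on the frameworks I'd expect, but verify on your own target before relying on it.
Console.WriteLine(EqualityComparer<string>.Default.GetHashCode(null));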

IEqualityComparer for Value Objects

I have an immutable Value Object, IPathwayModule, whose value is defined by:
(int) Block;
(Entity) Module, identified by (string) ModuleId;
(enum) Status; and
(entity) Class, identified by (string) ClassId - which may be null.
Here's my current IEqualityComparer implementation which seems to work in a few unit tests. However, I don't think I understand what I'm doing well enough to know whether I am doing it right. A previous implementation would sometimes fail on repeated test runs.
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        int hx = GetHashCode(x);
        int hy = GetHashCode(y);
        return hx == hy;
    }

    public int GetHashCode(IPathwayModule obj)
    {
        int h;
        if (obj.Class != null)
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + obj.Class.ClassId.GetHashCode();
        }
        else
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + "NOCLASS".GetHashCode();
        }
        return h;
    }
}
IPathwayModule is definitely immutable and different instances with the same values should be equal and produce the same HashCode since they are used as items within HashSets.
I suppose my questions are:
Am I using the interface correctly in this case?
Are there cases where I might not see the desired behaviour?
Is there any way to improve the robustness, performance?
Are there any good practices that I am not following?
Don't implement Equals in terms of the hash function's results; it's too fragile. Rather, do a value comparison for each of the fields. Something like:
return x != null && y != null && x.Name.Equals(y.Name) && x.Type.Equals(y.Type) ...
Also, the hash function's results aren't really amenable to addition. Try using the ^ operator instead.
return obj.Name.GetHashCode() ^ obj.Type.GetHashCode() ...
You don't need the null check in GetHashCode. If that value is null, you've got bigger problems; there's no use trying to recover from something over which you have no control...
The only big problem is the implementation of Equals. Hash codes are not unique; you can get the same hash code for objects which are different. You should compare each field of IPathwayModule individually.
GetHashCode() can be improved a bit. You don't need to call GetHashCode() on an int - the int itself is a good hash code, and the same goes for enum values. Your GetHashCode could then be implemented like this:
public int GetHashCode(IPathwayModule obj)
{
    unchecked
    {
        int h = obj.Block + obj.Module.ModuleId.GetHashCode() + (int) obj.Status;
        if (obj.Class != null)
            h += obj.Class.ClassId.GetHashCode();
        return h;
    }
}
The 'unchecked' block is necessary because there may be overflows in the arithmetic operations.
You shouldn't use GetHashCode() as the main way of comparing objects. Compare them field-wise.
There can be multiple objects with the same hash code (this is called a 'hash code collision').
Also, be careful when adding together multiple integer values, since you can easily cause an OverflowException. Use 'exclusive or' (^) to combine hash codes, or wrap the code in an 'unchecked' block.
You should implement better versions of Equals and GetHashCode.
For instance, the hash code of enums is simply their numerical value.
In other words, with these two enums:
public enum A { x, y, z }
public enum B { k, l, m }
Then with your implementation, the following value type:
public struct AB {
    public A A;
    public B B;
}
the following two values would be considered equal:
AB ab1 = new AB { A = A.x, B = B.m };
AB ab2 = new AB { A = A.z, B = B.k };
I'm assuming you don't want that.
Also, passing the value types as interfaces will box them; this could have performance implications, although probably not much. You might consider making the IEqualityComparer implementation take your value types directly.
Assuming that two objects are equal because their hash codes are equal is wrong. You need to compare all members individually.
It is probably better to use ^ rather than + to combine the hash codes.
If I understand you well, you'd like to hear some comments on your code. Here are my remarks:
1. Hash codes should be XOR'ed together, not added; XOR (^) gives a better chance of preventing collisions.
2. You compare hash codes. That's fine, but only do this if the underlying objects override GetHashCode; if not, use their properties' hash codes and combine them.
3. Hash codes are important: they make a quick comparison possible. But if hash codes are equal, the objects can still be different. This happens rarely, but it means you'll need to compare the fields of your objects when the hash codes are equal.
4. You say your value types are immutable, but you reference objects (.Class), which are not immutable.
5. Always optimize comparison by adding a reference comparison as the first test: if the references are unequal, the referenced objects are unequal, and then the structs are unequal.
Point 5 depends on whether you want the objects referenced by your value type to compare as not equal when they are not the same reference.
EDIT: you compare many strings. String comparison is optimized in C#; as others suggested, you can simply use == with them in your comparison. For GetHashCode, use XOR (^), as suggested by others as well.
Thanks to all who responded. I have aggregated the feedback, and my improved IEqualityComparer now looks like this:
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        if (x == y) return true;
        if (x == null || y == null) return false;
        if ((x.Class == null) ^ (y.Class == null)) return false;
        if (x.Class == null) // and implicitly y.Class == null
        {
            return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId);
        }
        return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId) && x.Class.ClassId.Equals(y.Class.ClassId);
    }

    public int GetHashCode(IPathwayModule obj)
    {
        unchecked
        {
            int h = obj.Block ^ obj.Module.ModuleId.GetHashCode() ^ (int) obj.Status;
            if (obj.Class != null)
            {
                h ^= obj.Class.ClassId.GetHashCode();
            }
            return h;
        }
    }
}
