I want to check if an object is in a Queue before I enqueue it. If I don't explicitly define an EqualityComparer, what does the Contains() function compare?
If it compares property values, that's perfect. If it compares to see if a reference to that object exists in the Queue then that defeats what I'm trying to accomplish in my code.
For classes, the default equality operation is by reference - it assumes that object identity and equality are the same, basically.
You can overcome this by overriding Equals and GetHashCode. I'd also suggest implementing IEquatable<T> to make this clear. Your hash code implementation should generate the hash code from the same values as the equality operation.
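As a minimal sketch of that advice (the Job type and its properties are hypothetical, invented only for illustration), a class that implements IEquatable<T> and overrides Equals and GetHashCode from the same fields makes Queue<T>.Contains compare by value:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration; not from the original question.
public sealed class Job : IEquatable<Job>
{
    public Job(int id, string name) { Id = id; Name = name; }

    public int Id { get; }
    public string Name { get; }

    // Strongly typed value equality over Id and Name.
    public bool Equals(Job other) =>
        !(other is null) && Id == other.Id && Name == other.Name;

    // Keep object.Equals consistent with the typed version.
    public override bool Equals(object obj) => Equals(obj as Job);

    // Hash the same values that Equals compares.
    public override int GetHashCode() => HashCode.Combine(Id, Name);
}

public static class Demo
{
    public static void Main()
    {
        var queue = new Queue<Job>();
        queue.Enqueue(new Job(1, "import"));

        // A distinct instance with the same values.
        var candidate = new Job(1, "import");
        Console.WriteLine(queue.Contains(candidate)); // True

        if (!queue.Contains(candidate)) queue.Enqueue(candidate);
        Console.WriteLine(queue.Count); // 1
    }
}
```

Without the overrides, the same Contains call would return false, because the two Job instances are different objects.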
The default for reference types is to compare the reference.
However, if the type implements IEquatable<> it can be doing a different comparison. If you need to have a specific equality comparison in place, you need to create one yourself.
Attempt #3 to simplify this question:
A generic List<T> can contain any type - value or reference. When checking to see if a list contains an object, .Contains() uses the default EqualityComparer<T> for type T and calls .Equals() (as I understand it). If no EqualityComparer has been defined, the default comparer will call .Equals(). By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
Until you need to override .Equals() to implement value equality, at which point the default comparer says two objects are the same if they have the same values. I can't think of a single case where that would be desirable for a reference type.
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Questions:
Am I understanding that correctly?
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
In general, how does one separate value equality from reference equality when overriding .Equals()?
The original line of code spurring this question:
// For each ID, a collection of matching rows
Dictionary<string, List<StagingDataRow>> stagingTableDictionary;
StagingTableMatches.AddRange(stagingTableDictionary[perNr].Where(row => !StagingTableMatches.Contains(row)));
Ok, let's handle a few misconceptions first:
By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.
This is true, but only for reference types. Value types will implement a very slow reflection-based Equals function by default, so it's in your best interest to override that.
I can't think of a single case where that would be desirable for a reference type.
Oh I'm sure you can... String is a reference type for instance :)
What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().
Err... No.
IEqualityComparer<T> is an interface which lets you delegate equality comparison to a different object. If you want a different default behavior for your class, you implement IEquatable<T>, and also delegate object.Equals to that for consistency. Strictly speaking, overriding object.Equals and object.GetHashCode is sufficient to change the default equality comparison behavior, but also implementing IEquatable<T> has additional benefits:
It makes it more obvious that your type has custom equality comparison logic - think self documenting code.
It improves performance for value types, since it avoids unnecessary boxing (which happens with object.Equals)
So, for your actual questions:
Am I understanding that correctly?
You still seem a bit confused about this, but don't worry :)
Enigmativity actually suggested that you create a different type which implements IEqualityComparer<T>. Looks like you misunderstood that part.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
By default, the (properly written) framework data structures will delegate equality comparison to EqualityComparer<StagingDataRow>.Default, which will in turn delegate to StagingDataRow.Equals.
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
Not necessarily. It should be self-consistent: if myEqualityComparer.Equals(a, b) returns true, then you must ensure that myEqualityComparer.GetHashCode(a) == myEqualityComparer.GetHashCode(b).
It can be the same implementation as StagingDataRow.GetHashCode, but it doesn't have to be.
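To make the self-consistency rule concrete, here is a small sketch. The Row type and the RowKeyComparer class are hypothetical stand-ins (they are not StagingDataRow), and the case-insensitive key rule is only an assumed example of a custom definition of equality. The point is that GetHashCode uses the same rule as Equals, so equal rows always hash alike:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type; it deliberately does NOT override Equals/GetHashCode.
public sealed class Row
{
    public string Key { get; set; }
    public string Payload { get; set; }
}

// A separate object that defines equality for Row: keys match, ignoring case.
public sealed class RowKeyComparer : IEqualityComparer<Row>
{
    public bool Equals(Row x, Row y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Key, y.Key, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: hash the key with the same case-insensitive rule.
    public int GetHashCode(Row obj) =>
        obj.Key is null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Key);
}

public static class Demo
{
    public static void Main()
    {
        var set = new HashSet<Row>(new RowKeyComparer());

        Console.WriteLine(set.Add(new Row { Key = "abc", Payload = "first" }));  // True
        Console.WriteLine(set.Add(new Row { Key = "ABC", Payload = "second" })); // False
        Console.WriteLine(set.Count); // 1
    }
}
```

Note that the comparer's hash code has no relation to Row.GetHashCode here, which is fine: the set only ever asks the comparer.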
What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...
Well, by now I hope you've understood that the object which implements IEqualityComparer<T> is a different object, so this should make sense.
Please read my answer on Using of IEqualityComparer interface and EqualityComparer class in C# for more in-depth information.
Am I understanding that correctly?
Partially - the "default" IEqualityComparer will use either (in order):
The implementation of IEquatable<T>
An overridden Equals(object)
the base object.Equals(object), which is reference equality for reference types.
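The order above can be observed with a small experiment. This is only a sketch; the three types (Tracked, OverrideOnly, Plain) and the LastCalled field are hypothetical helpers invented for the demonstration:

```csharp
using System;
using System.Collections.Generic;

// Implements IEquatable<T> AND overrides Equals(object); records which one ran.
public sealed class Tracked : IEquatable<Tracked>
{
    public static string LastCalled = "";
    public int Value;

    public bool Equals(Tracked other)
    {
        LastCalled = "Equals(Tracked)";
        return !(other is null) && Value == other.Value;
    }

    public override bool Equals(object obj)
    {
        LastCalled = "Equals(object)";
        return obj is Tracked t && Value == t.Value;
    }

    public override int GetHashCode() => Value;
}

// Only overrides Equals(object).
public sealed class OverrideOnly
{
    public int Value;
    public override bool Equals(object obj) => obj is OverrideOnly o && Value == o.Value;
    public override int GetHashCode() => Value;
}

// No overrides at all: falls back to reference equality.
public sealed class Plain
{
    public int Value;
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(EqualityComparer<Tracked>.Default.Equals(
            new Tracked { Value = 1 }, new Tracked { Value = 1 }));           // True
        Console.WriteLine(Tracked.LastCalled);                                // Equals(Tracked)

        Console.WriteLine(EqualityComparer<OverrideOnly>.Default.Equals(
            new OverrideOnly { Value = 1 }, new OverrideOnly { Value = 1 })); // True

        Console.WriteLine(EqualityComparer<Plain>.Default.Equals(
            new Plain { Value = 1 }, new Plain { Value = 1 }));               // False
    }
}
```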
I think you are confusing two different methods of defining "equality" in a custom type. One is implementing IEquatable<T>, which allows an instance of a type to determine if it's "equal" to another instance of the same type.
The other is IEqualityComparer<T>, which is an independent interface that determines if two instances of that type are equal.
So if your definition of Equals should apply whenever you are comparing two instances, then implement IEquatable, as well as overriding Equals (which is usually trivial after implementing IEquatable) and GetHashCode.
If your definition of "equal" only applies in a particular use case, then create a different class that implements IEqualityComparer<T>, then pass an instance of it to whatever class or method you want that definition to apply to.
Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
No - only types and methods that accept an instance of IEqualityComparer as a parameter will use it.
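A short sketch of what "only where you pass it" means in practice. The Item type and ItemIdComparer are hypothetical names made up for this example. List<T>.Contains(item) has no comparer parameter and uses the default comparison, while the LINQ overload Enumerable.Contains(source, value, comparer) uses the comparer you hand it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with no equality overrides.
public sealed class Item
{
    public int Id { get; set; }
}

// Hypothetical comparer: two items are equal when their Ids match.
public sealed class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y) =>
        ReferenceEquals(x, y) || (!(x is null) && !(y is null) && x.Id == y.Id);

    public int GetHashCode(Item obj) => obj.Id;
}

public static class Demo
{
    public static void Main()
    {
        var items = new List<Item> { new Item { Id = 7 } };
        var probe = new Item { Id = 7 }; // same value, different instance

        // List<T>.Contains: default comparer, so reference equality here.
        Console.WriteLine(items.Contains(probe)); // False

        // Enumerable.Contains with an explicit comparer: value equality.
        Console.WriteLine(items.Contains(probe, new ItemIdComparer())); // True
    }
}
```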
What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
It will compute the hash code for the object that's passed in. It doesn't "compare" the hash code to anything. It does not necessarily have to return the same value as the overridden GetHashCode, but it must follow the rules for GetHashCode, particularly that two "equal" objects must return the same hash code.
It would be strange to have an instance method accept itself as a parameter...
Which is why IEqualityComparer is generally implemented on a different class. Note that IEquatable<T> doesn't have a GetHashCode() method, because it doesn't need one. It assumes that GetHashCode is overridden to match the override of object.Equals, which should match the strongly-typed implementation of IEquatable<T>
Bottom Line
If you want your definition of "equal" to be the default for that type, implement IEquatable<T> and override Equals and GetHashCode. If you want a definition of "equal" that is just for a specific use case, then create a different class that implements IEqualityComparer<T> and pass an instance of it to whatever types or methods need to use that definition.
Also, I would note that you very rarely call these methods directly (except Equals). They are usually called by the methods that use them (like Contains) to determine if two objects are "equal" or to get the hash code for an item.
Let's consider Polygon class. Check for equality should compare references most of the time, but there are many situations where value equality comes in handy (like when one compares two polygons with Assert.AreEqual).
My idea is to make value equality somewhat secondary to reference equality. In this case it's pretty obvious that ==operator should keep its default reference check implementation.
What about object.Equals() and IEquatable<Polygon>.Equals() then? MSDN doesn't imply that == and .Equals() should do the same but still - wouldn't it make the behavior of Polygon objects too ambiguous?
Also, the Polygon class is mutable.
MSDN is almost clear about it
To check for reference equality, use ReferenceEquals. To check for value equality, you should generally use Equals. However, Equals as it is implemented by Object just performs a reference identity check. It is therefore important, when you call Equals, to verify whether the type overrides it to provide value equality semantics. When you create your own types, you should override Equals.
By default, the operator == tests for reference equality by determining if two references indicate the same object, so reference types do not need to implement operator == in order to gain this functionality. When a type is immutable, meaning the data contained in the instance cannot be changed, overloading operator == to compare value equality instead of reference equality can be useful because, as immutable objects, they can be considered the same as long as they have the same value. Overriding operator == in non-immutable types is not recommended.
IEquatable documentation is also very clear
Defines a generalized method that a value type or class implements to create a type-specific method for determining equality of instances.
A major difficulty with equality testing in .NET (and also Java) is that there are two useful equivalence relations, each based on a question that can sensibly be asked of any class object, but .NET isn't consistent about which of the two Equals and GetHashCode are supposed to answer. The questions are:
Will you always and forever be equivalent to the object identified by some particular reference, no matter what happens to you?
Will you consider yourself equivalent to the object identified by some particular reference unless or until something with a reference to you does something that would affect that equivalence?
For immutable objects, both relationships should test value equality. For mutable objects, the first question should test referential equivalence and the second should test value equality. For an immutable object which holds a reference to an object which is of mutable type, but which nobody will ever mutate, both questions should test value equality of that encapsulated object.
My personal recommendation would be that mutable objects not override Object.Equals, but that they provide a static property that returns an IEqualityComparer which tests value equality. This would require that any object that immutably encapsulates the mutable object will have to get that IEqualityComparer to be able to report the encapsulated object's value-equivalence relation as its own, but having an IEqualityComparer would make it possible to store such things in e.g. a Dictionary provided they are never modified.
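One possible shape for that recommendation, as a sketch only: the Polygon below is a hypothetical simplification (a list of vertex tuples), and the ValueComparer property name is my own choice. Equals and == keep their default reference semantics, and value equality is available on request:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mutable type: Equals/GetHashCode are NOT overridden, so identity is the default.
public sealed class Polygon
{
    public List<(double X, double Y)> Vertices { get; } = new List<(double X, double Y)>();

    // Opt-in value equality, exposed as a static property.
    public static IEqualityComparer<Polygon> ValueComparer { get; } = new VertexComparer();

    private sealed class VertexComparer : IEqualityComparer<Polygon>
    {
        public bool Equals(Polygon a, Polygon b)
        {
            if (ReferenceEquals(a, b)) return true;
            if (a is null || b is null) return false;
            return a.Vertices.SequenceEqual(b.Vertices);
        }

        // Hash the same vertex sequence that Equals compares.
        public int GetHashCode(Polygon p)
        {
            var hash = new HashCode();
            foreach (var vertex in p.Vertices) hash.Add(vertex);
            return hash.ToHashCode();
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var a = new Polygon();
        a.Vertices.Add((0.0, 0.0)); a.Vertices.Add((1.0, 0.0)); a.Vertices.Add((0.0, 1.0));
        var b = new Polygon();
        b.Vertices.Add((0.0, 0.0)); b.Vertices.Add((1.0, 0.0)); b.Vertices.Add((0.0, 1.0));

        Console.WriteLine(a.Equals(b));                       // False (reference equality)
        Console.WriteLine(Polygon.ValueComparer.Equals(a, b)); // True  (value equality)

        // Safe only as long as the keys are not mutated while stored.
        var names = new Dictionary<Polygon, string>(Polygon.ValueComparer) { [a] = "triangle" };
        Console.WriteLine(names.ContainsKey(b));              // True
    }
}
```

The same comparer could be passed wherever a test needs value comparison, without changing how the type behaves everywhere else.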
I am using Dictionary using objects like DirectoryInfo and FileInfo as keys. Is it safe to use them that way? How can I make sure an object type can be safely used this way?
In general, the only way to know exactly how this will behave is to check the reference source. You can check the documentation to see whether IEquatable<T> is implemented (in which case you'll get "expected" behavior), as well.
If the type in question doesn't override Equals and GetHashCode, then the default hashing and equality for any object will be used. This means you can use it as a key, but only if your lookups are being done with the same instance of the type, since the equality is reference equality by default for System.Object.
In this case, DirectoryInfo and related classes do not override those methods, which means you'll get the default equality and hashing of object, which is likely not what you want. I would, instead, recommend using the FullName property as a key, since it's a string and provides the proper equality semantics for use in a dictionary.
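A quick sketch of the difference, using the temp directory as an arbitrary example path (any path would do; the dictionary values are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class Demo
{
    public static void Main()
    {
        string path = Path.GetTempPath();

        // Two separate objects describing the same directory.
        var first = new DirectoryInfo(path);
        var second = new DirectoryInfo(path);

        // Keyed by DirectoryInfo: lookups only succeed with the same instance.
        var byInfo = new Dictionary<DirectoryInfo, string> { [first] = "temp" };
        Console.WriteLine(byInfo.ContainsKey(first));  // True
        Console.WriteLine(byInfo.ContainsKey(second)); // False

        // Keyed by FullName: string equality makes both lookups succeed.
        var byPath = new Dictionary<string, string> { [first.FullName] = "temp" };
        Console.WriteLine(byPath.ContainsKey(second.FullName)); // True
    }
}
```

String keys compare ordinally and case-sensitively by default. If paths that differ only in case should be treated as the same key (as on a typical Windows file system), passing StringComparer.OrdinalIgnoreCase to the dictionary constructor is one option.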
To use it as a key it would make sense that two DirectoryInfo objects that represent the same directory have the same hash code.
Let's see if it's the case!
new System.IO.DirectoryInfo(@"C:\").GetHashCode(); // 25422474
new System.IO.DirectoryInfo(@"C:\").GetHashCode(); // 48007696
The hash codes are different. You can use it as a key, but it won't make much sense. You'd be better off using the full path.
You can also see from the available sources that GetHashCode is not overridden.
Normally, the objects would be recycled when the method is exited, but since you've added them to a container, they are 'pinned' in memory. So in that sense they're safe. However, if an external agent deletes the file or folder, then you have an object that refers to a non-existent entity. So test if they exist before attempting to interact with them.
I just want to confirm my understanding of a few fundamentals. Hope you don't mind!
I understand the static equals method
Object.Equals(objA, objB)
first checks for reference equality. If not equal by reference, then calls the object instance equals method
objA.Equals(objB)
Currently in my override for Equals, I first check for reference equality, and if the objects are not referentially equal I then compare all members to see if the semantics are the same. Is this a good approach? If so, then the static version seems superfluous?
Also what exactly does the default GetHashCode for an object do?
If I add my object to a dictionary, which is a hash table underneath, and don't override Equals and GetHashCode, then I guess I should do so to make it sort optimally and hence get better retrieval time?
Currently in my override for Equals, I first check for reference equality, and if the objects are not referentially equal I then compare all members to see if the semantics are the same. Is this a good approach? If so, then the static version seems superfluous?
Yes, it's a great idea to do the fast reference-equality check. There's no guarantee that your method will be called through the static Object.Equals method - it could well be called directly. For example, EqualityComparer<T>.Default (the typical middleman for equality checking) will directly call this method in many situations (when the type does not implement IEquatable<T>) without first doing a reference-equality check.
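A sketch of such an override, using a hypothetical Money type invented for illustration. The reference check sits inside the instance method, so it applies whether the call comes through the static Object.Equals, through EqualityComparer<T>.Default, or directly:

```csharp
using System;

// Hypothetical immutable type used only to illustrate the pattern.
public sealed class Money
{
    public Money(decimal amount, string currency) { Amount = amount; Currency = currency; }

    public decimal Amount { get; }
    public string Currency { get; }

    public override bool Equals(object obj)
    {
        // Fast path: the very same object is always equal to itself.
        if (ReferenceEquals(this, obj)) return true;

        // Handles null and unrelated types in one step.
        if (!(obj is Money other)) return false;

        // Slow path: member-by-member comparison.
        return Amount == other.Amount && Currency == other.Currency;
    }

    // Built from the same members that Equals compares.
    public override int GetHashCode() => HashCode.Combine(Amount, Currency);
}

public static class Demo
{
    public static void Main()
    {
        var a = new Money(10m, "EUR");
        var b = new Money(10m, "EUR");

        Console.WriteLine(a.Equals(a));            // True (fast path)
        Console.WriteLine(a.Equals(b));            // True (member comparison)
        Console.WriteLine(object.Equals(a, b));    // True (static helper calls a.Equals(b))
        Console.WriteLine(object.Equals(null, a)); // False (static helper handles null)
    }
}
```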
Also what exactly does the default GetHashCode for an object do?
It forwards to RuntimeHelpers.GetHashCode: a magic, internally-implemented CLR method that is a compliant GetHashCode implementation for reference-equality. For more information, see Default implementation for Object.GetHashCode(). You should definitely override it whenever you override Equals.
EDIT:
If I add my object to a dictionary, which is a hash table underneath, and don't override Equals and GetHashCode, then I guess I should do so to make it sort optimally and hence get better retrieval time?
If you don't override either, you'll get reference-equality with (probably) a well-balanced table.
If you override one but not the other or implement them in any other non-compliant way, you'll get a broken hashtable.
By the way, hashing is quite different from sorting.
For more information, see Why is it important to override GetHashCode when Equals method is overriden in C#?
Your first question was already answered, but I think the second was not fully answered.
Implementing GetHashCode well is important if you want to use your object as a key in a hash table or a dictionary. A good implementation minimizes collisions and therefore speeds up lookups. A lookup collision happens when two or more keys have the same hash code; Equals is then invoked to tell them apart. If the hash code is unique, Equals will only be called once; otherwise it will be called for every key with the same hash code until one call returns true.
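The effect can be made visible by counting Equals calls. The sketch below is an experiment under assumed conditions, not a benchmark: the Key type, its constantHash switch, and the EqualsCalls counter are all hypothetical devices for the demonstration, and the exact counts depend on the Dictionary implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that can be told to use a deliberately bad hash.
public sealed class Key
{
    public static int EqualsCalls;

    private readonly int id;
    private readonly bool constantHash;

    public Key(int id, bool constantHash)
    {
        this.id = id;
        this.constantHash = constantHash;
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++; // count every comparison the dictionary performs
        return obj is Key other && other.id == id;
    }

    // Either every key collides (0) or each id gets its own hash code.
    public override int GetHashCode() => constantHash ? 0 : id;
}

public static class Demo
{
    // Returns how many Equals calls a single successful lookup needed.
    public static int CountEqualsCallsForLookup(bool constantHash)
    {
        var dict = new Dictionary<Key, int>();
        for (int i = 0; i < 1000; i++) dict[new Key(i, constantHash)] = i;

        Key.EqualsCalls = 0;
        dict.ContainsKey(new Key(0, constantHash));
        return Key.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine("distinct hash codes: " + CountEqualsCallsForLookup(false));
        Console.WriteLine("constant hash code:  " + CountEqualsCallsForLookup(true));
    }
}
```

With distinct hash codes the lookup should need very few comparisons, while the constant hash forces the dictionary to walk through colliding keys one Equals call at a time.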
I have a mutable class that I'm using as a key to a generic dictionary. Two keys should be equal only if their references are equal.
From what I've read, in this case, I don't need to override Equals or GetHashCode, or implement IEqualityComparer.
Is this correct?
Yes. The default comparison operation in System.Object uses reference equality. If this behavior is what you want, the defaults should work fine.
Yes, this is correct. As long as you don't override, reference is the default comparison.
I'll add on to what everyone else has said here (yes) but with one more point that no one seems to have mentioned here.
When using generic collections (Dictionary, List, etc.) you can implement IEquatable<T> to provide a type-specific Equals that can do your comparison without boxing or up/down casting. These generic collections will use this overload when present to do comparisons, and it can be a bit more efficient.
As noted in the docs, when implementing IEquatable<T> you still need to override Equals and GetHashCode from Object.
As everyone else pointed out already, yes, you are correct. In fact, you definitely do not want to override the equality members if your type is mutable (it has setters). But, if you want to have equality checking which uses values in your type, you can make your type immutable (like String) by ensuring that there are no setters (only the constructor sets values). Or use a struct.
For anybody using .NET 5 or later, it comes with a ReferenceEqualityComparer class that you can pass to the dictionary's constructor. This means you don't need to worry about someone overriding GetHashCode and Equals in the future.
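A minimal sketch of that approach, assuming .NET 5 or later. The Tag type is hypothetical and deliberately overrides Equals and GetHashCode, to show that the dictionary ignores those overrides when given ReferenceEqualityComparer.Instance:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable type that someone has given value equality.
public sealed class Tag
{
    public string Name { get; set; }

    public override bool Equals(object obj) => obj is Tag other && other.Name == Name;
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

public static class Demo
{
    public static void Main()
    {
        var a = new Tag { Name = "x" };
        var b = new Tag { Name = "x" };
        Console.WriteLine(a.Equals(b)); // True (value equality from the override)

        // Keys are compared by reference, regardless of Tag's overrides.
        var counts = new Dictionary<Tag, int>(ReferenceEqualityComparer.Instance);
        counts[a] = 1;

        Console.WriteLine(counts.ContainsKey(a)); // True
        Console.WriteLine(counts.ContainsKey(b)); // False

        // Mutating the key does not break the lookup, since identity is unchanged.
        a.Name = "y";
        Console.WriteLine(counts.ContainsKey(a)); // True
    }
}
```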
Yes, you are correct: doing a == comparison (or calling .Equals) on two objects compares their references if no other overload or override is specified.
String s = "a";
object test1 = (object)s;
object test2 = (object)s;
Debug.Assert(test1.Equals(test2));