GetHashCode() based on a primary key - is it safe? - c#

A class has an ID property whose value comes from a primary key column of an SQL table.
Is it a good practice if I write
public override int GetHashCode()
{
return this.ID + GetType().GetHashCode();
}
in my class? (Equals is already overridden in the same way.)

Why would you particularly want to include the type in the hashcode? I can see how that could be useful if you had a lot of different types of object with the same ID in the same map, but normally I'd just use
public override int GetHashCode()
{
return ID; // If ID is an int
// return ID.GetHashCode(); // otherwise
}
Note that ideas of equality become tricky within inheritance hierarchies - another reason to prefer composition over inheritance. Do you actually need to worry about this? If you can seal your class, it will make the equality test easier as you only need to write:
public override bool Equals(object obj)
{
MyType other = obj as MyType;
return other != null && other.ID == ID;
}
(You may well want to have a strongly-typed Equals method and implement IEquatable.)
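As a minimal sketch of that suggestion, assuming a sealed class whose ID is an int set once in the constructor (the Customer name is hypothetical and not from the question), the strongly-typed version could look like this:

```csharp
using System;

// Hypothetical sealed entity whose identity is its integer primary key.
public sealed class Customer : IEquatable<Customer>
{
    public Customer(int id)
    {
        ID = id;
    }

    public int ID { get; }

    // Strongly-typed equality: callers that know the type avoid a cast.
    public bool Equals(Customer other)
    {
        return other != null && other.ID == ID;
    }

    // The object overload simply forwards to the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // Equal IDs give equal hash codes, as the contract requires.
    public override int GetHashCode()
    {
        return ID;
    }
}
```

Because the class is sealed, there is no subclass that could disagree about what equality means, so no GetType() comparison is needed.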

Why can't you just do
public override int GetHashCode() {
return this.ID.GetHashCode();
}
I am not sure whether what you are doing is good practice, because I am not familiar with how the hash code is assigned to a type instance. And the purpose of the hash code is to have a consistent representation of the object in Int32 form.

Related

Is it safe to override GetHashCode and get it from string property?

I have a class:
public class Item
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
The purpose of overriding GetHashCode is that I want to have only one occurrence of an object with a specified name in a Dictionary.
But is it safe to get hash code from string?
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
But is it safe to get hash code from string?
Yes, it is safe. But what you're doing isn't. You're using a mutable string property to generate your hash code. Let's imagine that you inserted an Item as a key for a given value. Then, someone changes the Name string to something else. You are now no longer able to find the same Item inside your Dictionary, HashSet, or whichever structure you use.
Moreover, you should rely only on immutable types for this. I'd also advise you to implement IEquatable<T>:
public class Item : IEquatable<Item>
{
public Item(string name)
{
Name = name;
}
public string Name { get; }
public bool Equals(Item other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(Name, other.Name);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Item) obj);
}
public static bool operator ==(Item left, Item right)
{
return Equals(left, right);
}
public static bool operator !=(Item left, Item right)
{
return !Equals(left, right);
}
public override int GetHashCode()
{
return (Name != null ? Name.GetHashCode() : 0);
}
}
is there any chance that two objects with different values of property Name would return the same hash code?
Yes, there is a statistical chance that this will happen. Hash codes do not guarantee uniqueness; they strive for a uniform distribution. Why? Because a hash code is an Int32, so there are only 2^32 possible values, while there are far more possible strings. By the pigeonhole principle, you may end up with two different strings that have the same hash code.
Your class is buggy, because you have a GetHashCode override, but no Equals override. You also don't consider the case where Name is null.
The rule for GetHashCode is simple:
If a.Equals(b) then it must be the case that a.GetHashCode() == b.GetHashCode().
The more cases where if !a.Equals(b) then a.GetHashCode() != b.GetHashCode() the better, indeed the more cases where !a.Equals(b) then a.GetHashCode() % SomeValue != b.GetHashCode() % SomeValue the better, for any given SomeValue (you can't predict it) so we like to have a good mix of bits in the results. But the vital thing is that two objects considered equal must have equal GetHashCode() results.
Right now this isn't the case, because you've only overridden one of these. However the following is sensible:
public class Item
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name == null ? 0 : Name.GetHashCode();
}
public override bool Equals(object obj)
{
var asItem = obj as Item;
return asItem != null && Name == asItem.Name;
}
}
The following is even better, because it allows for faster strongly-typed equality comparisons:
public class Item : IEquatable<Item>
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name == null ? 0 : Name.GetHashCode();
}
public bool Equals(Item other)
{
return other != null && Name == other.Name;
}
public override bool Equals(object obj)
{
return Equals(obj as Item);
}
}
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
Yes, this can happen, but it won't happen often, so that's fine. The hash-based collections like Dictionary and HashSet can handle a few collisions; indeed there'll be collisions even if the hash codes are all different because they're modulo'd down to a smaller index. It's only if this happens a lot that it impacts performance.
Another danger is that you'll be using a mutable value as a key. There's a myth that you shouldn't use mutable values for hash-codes, which isn't true; if a mutable object has a mutable property that affects what it is considered equal with then it must result in a change to the hash-code.
The real danger is mutating an object that is a key to a hash collection at all. If you are defining equality based on Name and you have such an object as the key to a dictionary then you must not change Name while it is used as such a key. The easiest way to ensure that is to have Name be immutable, so that is definitely a good idea if possible. If it is not possible though, you need to be careful just when you allow Name to be changed.
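To make that danger concrete, here is a small sketch. The MutableItem and MutatedKeyDemo names are hypothetical, and the class deliberately keeps a settable Name so the problem can be shown:

```csharp
using System.Collections.Generic;

// Hypothetical item whose equality and hash code both depend on a mutable Name.
public class MutableItem
{
    public string Name { get; set; }

    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        var other = obj as MutableItem;
        return other != null && Name == other.Name;
    }
}

public static class MutatedKeyDemo
{
    // Reports whether the set can still find the item after its Name changed.
    public static bool StillFoundAfterRename()
    {
        var item = new MutableItem { Name = "first" };
        var set = new HashSet<MutableItem> { item };

        // The set filed the item under the hash code of "first".
        item.Name = "second";

        // The lookup now uses the hash code of "second", so unless the two
        // strings happen to collide, the item is not found even though it
        // is still stored in the set.
        return set.Contains(item);
    }
}
```

Calling MutatedKeyDemo.StillFoundAfterRename() is expected to return false, while the set's Count is still 1: the item is in the collection but can no longer be reached through it.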
From a comment:
So, even if there is a collision in hash codes, when Equals returns false (because the names are different), the Dictionary will handle it properly?
Yes, it will handle it, though it's not ideal. We can test this with a class like this:
public class SuckyHashCode : IEquatable<SuckyHashCode>
{
public int Value { get; set; }
public bool Equals(SuckyHashCode other)
{
return other != null && other.Value == Value;
}
public override bool Equals(object obj)
{
return Equals(obj as SuckyHashCode);
}
public override int GetHashCode()
{
return 0;
}
}
Now if we use this, it works:
var dict = Enumerable.Range(0, 1000).Select(i => new SuckyHashCode{Value = i}).ToDictionary(shc => shc);
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = 3})); // True
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = -1})); // False
However, as the name suggests, it isn't ideal. Dictionaries and other hash-based collections all have means to deal with collisions, but those means mean that we no longer have the great nearly O(1) look-up, but rather as the percentage of collisions gets greater the look-up approaches O(n). In the case above where the GetHashCode is as bad as it could be without actually throwing an exception, the look-up would be O(n) which is the same as just putting all the items into an unordered collection and then finding them by looking at every one to see if it matches (indeed, due to differences in overheads, it's actually worse than that).
So for this reason we always want to avoid collisions as much as possible. Indeed, to not just avoid collisions, but to avoid collisions after the result has been modulo'd down to make a smaller hash code (because that's what happens internally to the dictionary).
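A simplified model of that reduction step is sketched below. The BucketDemo name and the exact formula are assumptions for illustration; real collections choose their own bucket count and may reduce the hash code differently:

```csharp
public static class BucketDemo
{
    // Maps a hash code to a bucket index in a table with bucketCount slots.
    // The mask clears the sign bit so that negative hash codes still give a
    // non-negative index.
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        return (hashCode & 0x7FFFFFFF) % bucketCount;
    }
}
```

With 10 buckets, the distinct hash codes 3 and 13 both map to index 3, so two keys can share a bucket even though their hash codes differ. This is why a good mix of bits matters more than mere distinctness.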
In your case though, because string.GetHashCode() is reasonably good at avoiding collisions, and because that one string is the only thing that equality is defined by, your code would in turn be reasonably good at avoiding collisions. More collision-resistant code is certainly possible, but comes at a cost to performance in the code itself* and/or is more work than can be justified.
*(Though see https://www.nuget.org/packages/SpookilySharp/ for code of mine that is faster than string.GetHashCode() on large strings on 64-bit .NET and more collision-resistant, though it is slower to produce those hash codes on 32-bit .NET or when the string is short).
Instead of using GetHashCode to prevent duplicates from being added to a dictionary, which is risky in your case as already explained, I would recommend using a (custom) equality comparer for your dictionary.
If the key is an object, you should create your own equality comparer that compares the string Name value. If the key is the string itself, you can use StringComparer.CurrentCulture, for example.
In this case too, it is key to make the string immutable, since otherwise you might invalidate your dictionary by changing the Name.
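A custom comparer along those lines might look like the sketch below. NamedItem and NameComparer are hypothetical names, and ordinal string comparison is an assumption; pick whichever StringComparer matches your notion of "same name":

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with an immutable Name and no equality overrides.
public class NamedItem
{
    public NamedItem(string name)
    {
        Name = name;
    }

    public string Name { get; }
}

// Defines equality for NamedItem keys without touching the class itself.
public sealed class NameComparer : IEqualityComparer<NamedItem>
{
    public bool Equals(NamedItem x, NamedItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(NamedItem obj)
    {
        // Must agree with Equals: same Name (ordinal) gives the same hash code.
        return (obj == null || obj.Name == null)
            ? 0
            : StringComparer.Ordinal.GetHashCode(obj.Name);
    }
}
```

The comparer is passed to the collection's constructor, for example new Dictionary<NamedItem, int>(new NameComparer()). A second Add with an equally named key then throws an ArgumentException, and ContainsKey finds an entry through any instance with the same Name.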

Hibernate and Equality checks with lazy loaded objects

I had some fun with hibernate. A function like that:
public class Key
{
public virtual bool IsEqual(Key key)
{
return this == key;
}
}
One would expect this function to always return true if the parameter was the same as the instance where IsEqual was called upon:
Assert.IsTrue(myKey.IsEqual(myKey));
But this is only the case as long as the instance "myKey" is not a lazily loaded object / proxy. A KeyProxy will delegate the call to the wrapped internal Key object, so the wrapped object ends up being compared with the proxy object (which in turn fails).
Basically, it has also been discussed here : NHibernate, proxies and equality
The solution there is a little disappointing. Overriding Equals to compare the primary key properties has the drawback that it only works for objects that already have a value, whereas new objects don't have a primary key value until saved. I could try to force new objects to receive a valid primary key value immediately, but that doesn't sound like a great way of handling this issue.
Is there a better (more general) known way to handle such situations? Wouldn't overriding Equals and comparing a unique (non-persisted) property do the trick?
Something like that?
public object Identifier {get; private set;}
public Key()
{
Identifier = new object();
}
public override bool Equals(object obj)
{
if (obj == null)
{
return false;
}
Key k = obj as Key;
if (k == null)
{
return false;
}
return this.Identifier == k.Identifier;
}
To overcome this and other problems, such as using an identity column as the primary key, we added a GUID to the base class of our domain model. Object creation is handled by factory classes that give each entity a GUID, which is then persisted as part of the entity.
The GUID is then used to compare entities; basically, we use it in the Equals() and GetHashCode() methods.
public override int GetHashCode()
{
return this.EqualityIdentifier.GetHashCode();
}
public override bool Equals(object obj)
{
IDomainObject Obj = obj as IDomainObject;
if (Obj == null)
{
return false;
}
return this.EqualityIdentifier == Obj.EqualityIdentifier;
}
To have a minimum performance impact, I decided to use a non-persisted readonly int property "Identifier" that is filled (lazily/at first access) by a small static and thread-safe number generator method.
private static int _equalityIdentifierSequence;
private static int GenerateEqualityIdentifier()
{
// Return the result of the atomic increment itself; re-reading the field
// afterwards could observe another thread's update.
return Interlocked.Increment(ref _equalityIdentifierSequence);
}
I am quite comfortable with the fact that two objects that were loaded from different sessions but are representing the same entity are regarded as "not equal", so the GUID strategy did not look that promising to me. The original problem of proxies compared with their wrapped objects seems to be solved with that.

Implementing GetHashCode and Equals methods for ValueObjects

There is a passage from NHibernate documentation:
Note: if you define an ISet of composite elements, it is very important to implement Equals() and GetHashCode() correctly.
What does correctly mean there? Is it necessary to implement those methods for all value objects in the domain?
EXTENDING MY QUESTION
In the article Marc attached user Albic states:
It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.
I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set through the constructor, so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.
I suddenly realized what I've got inside my AbstractEntity class:
public abstract class AbstractEntity<T> where T : AbstractEntity<T> {
private Nullable<Int32> hashCode;
public virtual Guid Id { get; protected set; }
public virtual Byte[] Version { get; set; }
public override Boolean Equals(Object obj) {
var other = obj as T;
if(other == null) {
return false;
}
var thisIsNew = Equals(this.Id, Guid.Empty);
var otherIsNew = Equals(other.Id, Guid.Empty);
if(thisIsNew && otherIsNew) {
return ReferenceEquals(this, other);
}
return this.Id.Equals(other.Id);
} // public override Boolean Equals(Object obj) {
public override Int32 GetHashCode() {
if(this.hashCode.HasValue) {
return this.hashCode.Value;
}
var thisIsNew = Equals(this.Id, Guid.Empty);
if(thisIsNew) {
this.hashCode = base.GetHashCode();
return this.hashCode.Value;
}
return this.Id.GetHashCode();
} // public override Int32 GetHashCode() {
public static Boolean operator ==(AbstractEntity<T> l, AbstractEntity<T> r) {
return Equals(l, r);
}
public static Boolean operator !=(AbstractEntity<T> l, AbstractEntity<T> r) {
return !Equals(l, r);
}
} // public abstract class AbstractEntity<T>...
As all components are nested within entities should I then implement Equals() and GetHashCode() for them?
Correctly means that GetHashCode returns the same hash code for entities that are expected to be equal, because hash-based collections such as a set compare hash codes first and only call Equals when the codes match.
On the other hand, entities that are not equal should have distinct hash codes as far as possible.
The documentation for Equals and GetHashCode explain this well and include specific guidance on implementation for value objects. For value objects, Equals is true if the objects are the same type and the public and private fields are equal. However, this explanation applies to framework value types and you are free to create your own Equals by overriding it.
GetHashCode has two rules that must be followed:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
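For a value object, both rules can be satisfied by deriving Equals and GetHashCode from the same immutable fields. The sketch below uses a hypothetical Money type (not from the question), and the multiply-by-31 combination is one common convention rather than anything NHibernate requires:

```csharp
using System;

// Hypothetical immutable value object: equality is defined by all its fields.
public sealed class Money : IEquatable<Money>
{
    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public decimal Amount { get; }
    public string Currency { get; }

    public bool Equals(Money other)
    {
        return other != null
            && Amount == other.Amount
            && string.Equals(Currency, other.Currency, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Money);
    }

    public override int GetHashCode()
    {
        // Uses exactly the fields that Equals compares, so equal objects
        // always produce equal hash codes. Overflow is allowed to wrap.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Amount.GetHashCode();
            hash = hash * 31 + (Currency == null ? 0 : Currency.GetHashCode());
            return hash;
        }
    }
}
```

Because the properties have no setters, the hash code cannot change while an instance sits inside an ISet, which covers the second rule as well.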

Why do we need GetHashCode() function in the Object Model Project? [duplicate]

Possible Duplicate:
Why is it important to override GetHashCode when Equals method is overriden in C#?
I was looking into the following class in my Object Model and could not understand the significance of adding GetHashCode() in the Class.
Sample Class
public class SampleClass
{
public int ID { get; set; }
public String Name { get; set; }
public String SSN_Number { get; set; }
public override bool Equals(Object obj)
{
if (obj == null || GetType() != obj.GetType())
return false;
SampleClass cls = (SampleClass)obj;
return (ID == cls.ID) &&
(Name == cls.Name) &&
(SSN_Number == cls.SSN_Number);
}
public override int GetHashCode()
{
return ID.GetHashCode() ^ Name.GetHashCode() ^ SSN_Number.GetHashCode();
}
}
Suppose I have a list of Sample Class Object and I want to get a specific index. Then Equals() can help me to get that record. Why should I use GetHashCode() ?
You need to handle both, because GetHashCode() is used by many collection implementations (like Dictionary) in concert with the Equals method. The important thing is that if you override the implementation of Equals, then you must override GetHashCode in such a way that any two objects that are Equal according to your new implementation also must return an identical Hash Code.
If they don't, the objects will not work properly in dictionaries. It's generally not that hard. One way I often do this is to take the properties of the object that I use for equality, join them together into a String, and return that string's GetHashCode().
String has a pretty good implementation of GetHashCode that returns a wide range of integers for various values that make for good spreads in a sparse collection.
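The string-joining approach described above can be sketched as follows. The Person class and the "|" separator are assumptions for illustration; note that this allocates a new string on every call, so it trades speed for simplicity:

```csharp
using System;

// Hypothetical class whose equality is defined by three properties.
public class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string SSN_Number { get; set; }

    public override bool Equals(object obj)
    {
        if (obj == null || GetType() != obj.GetType())
            return false;
        var other = (Person)obj;
        return ID == other.ID
            && Name == other.Name
            && SSN_Number == other.SSN_Number;
    }

    public override int GetHashCode()
    {
        // Join the same properties that Equals compares and hash the result.
        // Null strings become empty segments, so nulls do not throw here.
        return string.Join("|", ID, Name, SSN_Number).GetHashCode();
    }
}
```

Two objects that are equal by these three properties produce the same joined string and therefore the same hash code within one run of the application.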
It is necessary to provide an override to GetHashCode, when your custom class overrides Equals. If you omit GetHashCode, you will get a compiler warning saying "A public type overrides System.Object.Equals but does not override System.Object.GetHashCode".
GetHashCode returns a value based on the current instance that is suited for hashing algorithms and data structures such as a hash table. Two objects that are the same type and are equal must return the same hash code to ensure that instances of System.Collections.Hashtable and System.Collections.Generic.Dictionary<TKey, TValue> work correctly.
If you did not override GetHashCode in your custom class, hash-based collections would fall back to the base class's Object.GetHashCode, which would not be consistent with your Equals and so might give incorrect results for instances of your custom class.
If you look at the code you posted, your Equals method compares ID, Name and SSN of the two instances to decide equality, and the same attributes are used by the hashing algorithm (ID^Name^SSN) inside your GetHashCode method.

What should GetHashCode return when object's identifier is null?

Which of the following is correct/better, considering that identity property could be null.
public override int GetHashCode()
{
if (ID == null) {
return base.GetHashCode();
}
return ID.GetHashCode();
}
OR
public override int GetHashCode()
{
if (ID != null) {
return ID.GetHashCode();
}
return 0;
}
Update 1: Updated 2nd option.
Update 2: Below are the Equals implementations:
public bool Equals(IContract other)
{
if (other == null)
return false;
if (this.ID.Equals(other.ID)) {
return true;
}
return false;
}
public override bool Equals(object obj)
{
if (obj == null)
return base.Equals(obj);
if (!(obj is IContract)) {
throw new InvalidCastException("The 'obj' argument is not an IContract object.");
} else {
return Equals((IContract)obj);
}
}
And ID is of string type.
It really depends on what you want equality to mean - the important thing is that two equal objects return the same hashcode. What does equality mean when ID is null? Currently your Equals method would have to return true if the ID properties have the same value... but we don't know what it does if ID is null.
If you actually want the behaviour of the first version, I'd personally use:
return ID == null ? base.GetHashCode() : ID.GetHashCode();
EDIT: Based on your Equals method, it looks like you could make your GetHashCode method:
return ID == null ? 0 : ID.GetHashCode();
Note that your Equals(IContract other) method could also look like this:
return other != null && object.Equals(this.ID, other.ID);
Your current implementation will actually throw an exception if this.ID is null...
Additionally, your Equals(object) method is incorrect - you shouldn't throw an exception if you're passed an inappropriate object type, you should just return false... ditto if obj is null. So you can actually just use:
public override bool Equals(object obj)
{
return Equals(obj as IContract);
}
I'm concerned about equality based on an interface, however. Normally two classes of different types shouldn't be considered to be equal even if they implement the same interfaces.
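Putting those suggestions together for a concrete class might look like the sketch below. The sealed Contract class is hypothetical, and it assumes that two contracts with a null ID should compare as equal, which is exactly the question you need to answer for your own domain:

```csharp
using System;

// Hypothetical concrete class; equality is based on a string ID that may be null.
public sealed class Contract : IEquatable<Contract>
{
    public Contract(string id)
    {
        ID = id;
    }

    public string ID { get; }

    public bool Equals(Contract other)
    {
        // object.Equals copes with null on either side without throwing.
        return other != null && object.Equals(ID, other.ID);
    }

    public override bool Equals(object obj)
    {
        // A null or differently typed argument yields false, not an exception.
        return Equals(obj as Contract);
    }

    public override int GetHashCode()
    {
        // All null-ID contracts are equal here, so they share one hash code.
        return ID == null ? 0 : ID.GetHashCode();
    }
}
```

If null-ID objects should instead be distinct from each other, Equals would need a reference comparison for that case and GetHashCode would have to follow suit.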
You can simply return 0. You need to return the same hash code for equal values, and ID.GetHashCode() will rarely return 0, so this hash function is fine for most needs. Since you're not combining any values (such as the ID and Name hashes), it's clear that ID is the only source of the hash code, so a fixed 0 for a null ID sounds reasonable.
Otherwise, it may be that basing GetHashCode on the ID field alone is the wrong approach, and you need to combine several fields to compute the hash.
After your edits I can say that the second Equals override has too much code; simply replace it with
public override bool Equals(object obj)
{
return Equals(obj as Contract);
}
Your Equals(IContract contract) method looks buggy to me, because the only thing it compares is ID; if IContract has more fields than ID, it's going to be a bad Equals implementation.
PS: If IContract is an interface, you should probably replace IEquatable<IContract> with a concrete IEquatable<ClassName>. Treating instances of different classes that implement the same interface as equal is bad design, because an equality check normally starts by verifying that both objects have the same type (in roughly 99.9% of cases).
Perhaps what you want is something like this?
public override int GetHashCode()
{
if (ID != null)
return ID.GetHashCode();
return DBNull.Value.GetHashCode();
}
The important thing is this, should two objects with null IDs be considered equal?
