Why is GetHashCode not a property like HashCode in .NET?
Probably because it requires computation, and exposing it as a property might imply that the hash code is already available for free.
Edit:
Guidelines on this: Properties versus Methods
"The operation is expensive enough that you want to communicate to the user that they should consider caching the result."
Perhaps GetHashCode is expensive enough in some cases.
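As a rough illustration of the "consider caching the result" guideline, here is a minimal sketch of a type whose hash code is costly enough to be worth caching. The Blob class and its fields are hypothetical names invented for this example. Caching is only safe here because the contents never change after construction, and the sketch is not thread-safe.

```csharp
using System;

// Hypothetical immutable type: the bytes are copied at construction and never
// change, so the hash code can be computed once and reused.
public sealed class Blob
{
    private readonly byte[] _data;
    private int _hash;          // cached hash code
    private bool _hashComputed; // false until the first GetHashCode call

    public Blob(byte[] data)
    {
        // Defensive copy so outside code cannot mutate our state.
        _data = (byte[])data.Clone();
    }

    public override bool Equals(object obj)
    {
        var other = obj as Blob;
        if (other == null || other._data.Length != _data.Length) return false;
        for (int i = 0; i < _data.Length; i++)
        {
            if (_data[i] != other._data[i]) return false;
        }
        return true;
    }

    public override int GetHashCode()
    {
        if (!_hashComputed)
        {
            unchecked
            {
                // Walks the whole array: O(n), which is why it is cached.
                int hash = 17;
                foreach (byte b in _data) hash = hash * 31 + b;
                _hash = hash;
            }
            _hashComputed = true;
        }
        return _hash;
    }
}
```

The method syntax tells callers that the first call may do real work, which a property named HashCode would tend to hide.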
I don't think there's any good reason. Any implementation of GetHashCode should be fast enough to put into a property. That said, there are plenty of design flaws in the .NET Framework, some small, some serious. This seems like a small one.
Often it is not possible to define a HashCode for a class that makes sense, e.g. when the objects of the class don't have a well-defined concept of identity.
Therefore it is common to make the GetHashCode() method throw a NotImplementedException. This would cause all sorts of problems if HashCode were a property, as most people (and debuggers) assume it is always valid to get the value of a property.
Besides the fact that a property is nothing more than a getter and a setter method, from a design perspective a property should never contain any computation other than initialization or validation, e.g.:
private object _obj;
public object Obj
{
    get
    {
        if (_obj == null)
        {
            _obj = new object();
        }
        return _obj;
    }
    set
    {
        if (value == badvalue)
        {
            throw new ArgumentException("value");
        }
        _obj = value;
    }
}
GetHashCode() usually does not involve extensive computation, but it could be long-running (an object is free to compute its hash code in a complex manner). That is why it's a method instead of a property.
Properties should only be used if the computation behind them is really fast or cached.
Besides, most of the time the only logic in a property should be validation.
You have to remember that the .NET Framework is designed to be accessed by a wide variety of languages.
In theory you could create a compiler that is incapable of correctly overriding properties. While that would make for a pretty crappy compiler, it would not necessarily be illegal. (Remember, properties are just methods with some metadata.)
Related
I'm implementing IEquatable<T>, and I am having difficulty finding consensus on the GetHashCode override on a mutable class.
The following resources all provide an implementation where GetHashCode would return different values during the object's lifetime if the object changes:
https://stackoverflow.com/a/13906125/197591
https://csharp.2000things.com/tag/iequatable/
http://broadcast.oreilly.com/2010/09/understanding-c-equality-iequa.html
However, this link states that GetHashCode should not be implemented for mutable types for the reason that it could cause undesirable behaviour if the object is part of a collection (and this has always been my understanding also).
Interestingly, the MSDN example implements the GetHashCode using only immutable properties which is in line with my understanding. But I'm confused as to why the other resources don't cover this. Are they simply wrong?
And if a type has no immutable properties at all, the compiler warns that GetHashCode is missing when I override Equals(object). In this case, should I implement it and just call base.GetHashCode() or just disable the compiler warning, or have I missed something and GetHashCode should always be overridden and implemented? In fact, if the advice is that GetHashCode should not be implemented for mutable types, why bother implementing for immutable types? Is it simply to reduce collisions compared to the default GetHashCode implementation, or does it actually add more tangible functionality?
To summarise my question: my dilemma is that using GetHashCode on mutable objects means it can return different values during the lifetime of the object if properties on it change. But not using it means that the benefit of comparing objects that might be equivalent is lost, because it will always return a unique value and thus collections will always fall back to using Equals for their operations.
Having typed this Question out, another Question popped up in the 'Similar Questions' box that seems to address the same topic. The answer there seems to be quite explicit in that only immutable properties should be used in a GetHashCode implementation. If there are none, then simply don't write one. Dictionary<TKey, TValue> will still function correctly albeit not at O(1) performance.
Mutable classes work quite badly with Dictionaries and other classes that rely on GetHashCode and Equals.
In the scenario you are describing, with a mutable object, I suggest one of the following:
class ConstantHasCode: IEquatable<ConstantHasCode>
{
    public int SomeVariable;
    public virtual bool Equals(ConstantHasCode other)
    {
        return other.SomeVariable == SomeVariable;
    }
    public override int GetHashCode()
    {
        // Every instance lands in the same bucket.
        return 0;
    }
}
or
class ThrowHasCode: IEquatable<ThrowHasCode>
{
    public int SomeVariable;
    public virtual bool Equals(ThrowHasCode other)
    {
        return other.SomeVariable == SomeVariable;
    }
    public override int GetHashCode()
    {
        throw new ApplicationException("this class does not support GetHashCode and should not be used as a key for a dictionary");
    }
}
With the first, Dictionary works (almost) as expected, with a performance penalty in lookup and insertion: in both cases, Equals will be called for every element already in the dictionary until a comparison returns true. You are effectively reverting to the performance of a List.
The second is a way to tell the programmers who will use your class "no, you cannot use this within a dictionary".
Unfortunately, as far as I know there is no way to detect this at compile time, but it will fail the first time the code adds an element to the dictionary, very likely quite early in development. It is not the kind of bug that only shows up in production with an unpredicted set of inputs.
Last but not least, you can ignore the "mutable" problem and implement GetHashCode using member variables. You then have to be aware that you are not free to modify the object while it is used as a key within a Dictionary. In some scenarios this is acceptable, in others it is not.
It all depends on what kind of collection type you are talking about. For my answer I will assume you are talking about hash table based collections, and in particular I will address the .NET Dictionary and its key handling.
The best way to identify what will happen if you modify a key (given that your key is a class with a custom hash code calculation) is to look at the .NET source. There we can see that each key/value pair is wrapped in an Entry struct, which carries the hash code that was calculated when the value was added. This means that if the key's hash code changes after the key was added, the dictionary will no longer be able to find the value.
Code to prove it:
using System;
using System.Collections.Generic;

public static class Program
{
    static void Main()
    {
        var myKey = new MyKey { MyBusinessKey = "Ohai" };
        var dic = new Dictionary<MyKey, int>();
        dic.Add(myKey, 1);
        Console.WriteLine(dic[myKey]);
        // Mutating the key changes its hash code after it was stored.
        myKey.MyBusinessKey = "Changing value";
        Console.WriteLine(dic[myKey]); // throws KeyNotFoundException
    }
}
public class MyKey
{
    public string MyBusinessKey { get; set; }
    public override int GetHashCode()
    {
        return MyBusinessKey.GetHashCode();
    }
}
.NET source reference.
So, to answer your question: you want to base your hash code calculation on immutable values.
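A minimal sketch of that advice, assuming a hypothetical CustomerKey class with one immutable field (Id) and one mutable property (DisplayName). Both Equals and GetHashCode look only at the immutable part, so the object can be edited while it sits in a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: Id is fixed at construction, DisplayName may change.
public sealed class CustomerKey : IEquatable<CustomerKey>
{
    private readonly int _id;

    public CustomerKey(int id, string displayName)
    {
        _id = id;
        DisplayName = displayName;
    }

    public int Id { get { return _id; } }
    public string DisplayName { get; set; }

    // Equality and hashing both depend on the immutable Id only.
    public bool Equals(CustomerKey other)
    {
        return !ReferenceEquals(other, null) && other._id == _id;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CustomerKey);
    }

    public override int GetHashCode()
    {
        return _id;
    }
}

public static class CustomerKeyDemo
{
    public static int Run()
    {
        var key = new CustomerKey(42, "Ohai");
        var dic = new Dictionary<CustomerKey, int>();
        dic.Add(key, 1);
        key.DisplayName = "Changing value"; // does not affect the hash code
        return dic[key];                    // the entry is still found
    }
}
```

The trade-off is that two objects with the same Id but different DisplayName values count as equal, which may or may not be what the domain needs.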
Another point: if you do not override GetHashCode, the hash code of a custom class is based on the object's reference. If you need distinct objects with identical underlying values to return the same hash code, override GetHashCode and calculate the hash code from your business keys. For example, if you have two string properties, you could concatenate the strings and call GetHashCode on the result. This guarantees that you get the same hash code for the same underlying values of the object.
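As a sketch of the two-string case, the following hypothetical PersonName class combines the per-field hash codes arithmetically instead of concatenating the strings. This is a swapped-in technique, not the one described above: it avoids allocating a temporary string and keeps ("ab", "c") and ("a", "bc") from hashing identically by construction. The constants 17 and 31 are a common convention, not a requirement.

```csharp
using System;

// Hypothetical value-like class with two immutable string fields.
public sealed class PersonName : IEquatable<PersonName>
{
    private readonly string _first;
    private readonly string _last;

    public PersonName(string first, string last)
    {
        _first = first;
        _last = last;
    }

    public bool Equals(PersonName other)
    {
        return !ReferenceEquals(other, null)
            && string.Equals(_first, other._first)
            && string.Equals(_last, other._last);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as PersonName);
    }

    public override int GetHashCode()
    {
        unchecked // integer overflow is fine for hashing
        {
            int hash = 17;
            hash = hash * 31 + (_first == null ? 0 : _first.GetHashCode());
            hash = hash * 31 + (_last == null ? 0 : _last.GetHashCode());
            return hash;
        }
    }
}
```

On newer .NET versions, System.HashCode.Combine offers a built-in way to do the same combination. Either way, string hash codes should be treated as valid only within the running process, so the result should not be persisted.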
After much discussion and reading other SO answers on the topic, it was eventually this ReSharper help page that summarised it very well for me:
MSDN documentation of the GetHashCode() method does not explicitly require that your override of this method returns a value that never changes during the object's lifetime. Specifically, it says:
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method.
On the other hand, it says that the hash code should not change at least when your object is in a collection:
You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:
You can compute the hash code from fields that are not mutable; or
You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
But why do you need to override GetHashCode() in the first place? Normally, you will do it if your object is going to be used in a Hashtable, as a key in a dictionary, etc., and it's quite hard to predict when your object will be added to a collection and how long it will be kept there.
With all that said, if you want to be on the safe side make sure that your override of GetHashCode() returns the same value during the object's lifetime. ReSharper will help you here by pointing at each non-readonly field or non-get-only property in your implementation of GetHashCode(). If possible, ReSharper will also suggest quick-fixes to make these members read-only/get-only.
Of course, it doesn't suggest what to do if the quick-fixes are not possible. However, it does indicate that those quick-fixes should only be used "if possible", which implies that the inspection could be suppressed. Gian Paolo's answer on this suggests throwing an exception, which would prevent the class from being used as a key and would show up early in development if it was inadvertently used as one.
However, GetHashCode is used in other circumstances such as when an instance of your object is passed as a parameter to a mock method setup. Therefore, the only viable option is to implement GetHashCode using the mutable values and put the onus on the rest of the code to ensure the object is not mutated while it is being used as a key, or to not use it as a key at all.
How come when I implement IEqualityComparer, it has a parameter for GetHashCode(T obj)? It's not a static object of course, so why can't I just use the current instance's state to generate the hash code? Is this == obj?
I'm curious because I'm trying to do this:
public abstract class BaseClass : IEqualityComparer<BaseClass>
{
    public abstract int GetHashCode(BaseClass obj);
}
public class DerivedClass : BaseClass
{
    public int MyData;
    public override int GetHashCode(BaseClass obj)
    {
        return MyData.GetHashCode();
        // Or do I have to do this:
        // return ((DerivedClass)obj).MyData.GetHashCode();
    }
}
I'm trying to prevent doing the cast, since it's being used in really high-performance code.
I think the main issue here is that you're confusing IEqualityComparer<T> with IEquatable<T>.
IEquatable<T> defines a method for determining if the current instance (this) is equal to an instance of the same type. In other words it's used for testing objA.Equals(objB). When implementing this interface, it is recommended that you also override the GetHashCode() instance method.
IEqualityComparer<T> defines methods for testing whether two objects of the given type are equal; in other words, it's for testing comparer.Equals(objA, objB). Hence the need to provide an object as a parameter to GetHashCode (which, remember, is different from the GetHashCode that it inherits from object).
You can think of IEquatable<T> as your object's way of saying, "this is how I know if I am equal to something else," and IEqualityComparer<T> as your object's way of saying, "this is how I know if two other things are equal".
For some good examples of how these two interfaces are used in the framework see:
String which implements IEquatable<string>
StringComparer which implements IEqualityComparer<string>
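To make the distinction concrete, here is a minimal sketch of a standalone comparer. The Item and ItemComparer names are hypothetical. The comparer hashes the object passed in as the argument, never itself, and holds no state of its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data class; it knows nothing about equality.
public sealed class Item
{
    public int MyData;
}

// A separate, stateless object that decides equality for Item instances.
public sealed class ItemComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.MyData == y.MyData;
    }

    // Hash the argument, not the comparer.
    public int GetHashCode(Item obj)
    {
        return ReferenceEquals(obj, null) ? 0 : obj.MyData.GetHashCode();
    }
}

public static class ItemComparerDemo
{
    public static bool Run()
    {
        var set = new HashSet<Item>(new ItemComparer());
        set.Add(new Item { MyData = 7 });
        // A different instance with the same data is treated as equal.
        return set.Contains(new Item { MyData = 7 });
    }
}
```

Because the parameter is already typed as Item, no cast is needed inside GetHashCode.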
Should you use the current state of an IEqualityComparer<T> to determine the hash code? If the state is at all mutable, then no! Anywhere where the hash is used (e.g. HashSet<T> or Dictionary<T, V>) the hash code will be cached and used for efficient lookup. If that hash code can change because the state of the comparer changes, that would totally destroy the usefulness of the data structure storing the hash. Now, if the state is not mutable (i.e. it's set only when creating the comparer and cannot be modified throughout the lifetime of the comparer), then yes, you can, but I would still recommend against it, unless you have a really good reason.
Finally, you mentioned performance. Honestly, this sounds like premature optimization. I'd recommend not worrying so much about performance until you can be sure that this particular line of code is causing a problem.
If you do not use information from the passed-in obj argument, your hash code will not vary for different incoming objects and will not be useful. The comparer is not an instance of the object you want to hash or compare.
You can indeed use local fields of the comparer in GetHashCode, and can even return MyData as the hash code as shown in your sample. It will still satisfy the GetHashCode requirement to "return the same value for the same object". But in your sample all hash codes will be the same for a given comparer instance, so using it with a Dictionary will essentially turn the dictionary into a list.
The same applies to the Equals call: you can indeed return true all the time, but how useful would that be?
My search for a helper to correctly combine constituent hashcodes for GetHashCode() seemed to garner some hostility. I got the impression from the comments that some C# developers don't think you should override GetHashCode() often - certainly some commenters seemed to think that a library for helping get the behaviour right would be useless. Such functionality was considered useful enough in Java for the Java community to ask for it to be added to the JDK, and it's now in JDK 7.
Is there some fundamental reason that in C# you don't need to - or should definitely not - override GetHashCode() (and correspondingly, Equals()) as often as in Java? I find myself doing this often with Java, for example whenever I create a type that I know I want to keep in a HashSet or use as a key in a HashMap (equivalently, .net Dictionary).
C# has built-in value types which provide value equality, whereas Java does not. So writing your own hashcode in Java may be a necessity, whereas doing it in C# may be a premature optimisation.
It's common to write a type to use as a composite key to use in a Dictionary/HashMap. Often on such types you need value equality (equivalence) as opposed to reference equality(identity), for example:
IDictionary<Person, IList<Movie> > moviesByActor; // e.g. initialised from DB
// elsewhere...
Person p = new Person("Chuck", "Norris");
IList<Movie> chuckNorrisMovies = moviesByActor[p];
Here, if I need to create a new instance of Person to do the lookup, I need Person to implement value equality otherwise it won't match existing entries in the Dictionary as they have a different identity.
To get value equality, you need an overridden Equals() and GetHashCode(), in both languages.
C#'s structs (value types) implement value equality for you (albeit a potentially inefficient one), and provide a consistent implementation of GetHashCode. This may suffice for many people's needs and they won't go further to implement their own improved version unless performance problems dictate otherwise.
Java has no such built-in language feature. If you want to create a type with value equality semantics to use as a composite key, you must implement equals() and correspondingly hashCode() yourself. (There are third-party helpers and libraries to help you do this, but nothing built into the language itself).
I've described C# value types as 'potentially inefficient' for use in a Dictionary because:
The implementation of ValueType.Equals itself can sometimes be slow. This is used in Dictionary lookups.
The implementation of ValueType.GetHashCode, whilst correct, can yield many collisions leading to very poor Dictionary performance also. Have a look at this answer to a Q by Jon Skeet, which demonstrates that KeyValuePair<ushort, uint> appears to always yield the same hashCode!
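One way to keep the convenience of a struct key while sidestepping both concerns is to supply Equals and GetHashCode yourself. The sketch below uses a hypothetical PersonKey struct and a placeholder movie title. It is an illustration of the idea, not a measured performance fix.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical composite key. Implementing IEquatable<T> lets Dictionary call
// the typed Equals directly instead of the inherited ValueType.Equals.
public struct PersonKey : IEquatable<PersonKey>
{
    private readonly string _first;
    private readonly string _last;

    public PersonKey(string first, string last)
    {
        _first = first;
        _last = last;
    }

    public bool Equals(PersonKey other)
    {
        return _first == other._first && _last == other._last;
    }

    public override bool Equals(object obj)
    {
        return obj is PersonKey && Equals((PersonKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Both fields contribute, so keys differing in either one spread out.
            int hash = 17;
            hash = hash * 31 + (_first == null ? 0 : _first.GetHashCode());
            hash = hash * 31 + (_last == null ? 0 : _last.GetHashCode());
            return hash;
        }
    }
}

public static class PersonKeyDemo
{
    public static string Run()
    {
        var moviesByActor = new Dictionary<PersonKey, string>();
        moviesByActor[new PersonKey("Chuck", "Norris")] = "placeholder title";
        // A freshly constructed key with the same values finds the entry.
        return moviesByActor[new PersonKey("Chuck", "Norris")];
    }
}
```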
If your object represents a value or type, then you SHOULD override the GetHashCode() along with Equals. I never override hash codes for control classes, like "App". Though I see no reason why even overriding GetHashCode() in those circumstances would be a problem as they will never be put in a position to interfere with collection indexing or comparisons.
Example:
public class ePoint : eViewModel, IEquatable<ePoint>
{
    public double X;
    public double Y;
    // Methods
    #region IEquatable Overrides
    public override bool Equals(object obj)
    {
        if (Object.ReferenceEquals(obj, null)) { return false; }
        if (Object.ReferenceEquals(this, obj)) { return true; }
        if (!(obj is ePoint)) { return false; }
        return Equals((ePoint)obj);
    }
    public bool Equals(ePoint other)
    {
        // Guard against null so a direct call cannot throw.
        if (Object.ReferenceEquals(other, null)) { return false; }
        return X == other.X && Y == other.Y;
    }
    public override int GetHashCode()
    {
        // Consistent with Equals, but collision-prone: every point with Y == 0 hashes to 1.
        return (int)Math.Pow(X,Y);
    }
    #endregion
}
I wrote a helper class to implement GetHashCode(), Equals(), and CompareTo() using value semantics from an array of properties.
Let's say we have such a class:
class MyClass
{
public string SomeValue { get; set; }
// ...
}
Now, let's say two MyClass instances are equal when their SomeValue properties are equal. Thus, I override the Object.Equals() and Object.GetHashCode() methods to represent that. GetHashCode() returns SomeValue.GetHashCode(). But at the same time I need to follow these rules:
If two instances of an object are equal, they should return the same hash code.
The hash code should not change throughout the runtime.
But apparently, SomeValue can change, and the hash code we got before may turn out to be invalid.
I can only think of making the class immutable, but I'd like to know what others do in this case.
What do you do in such cases? Does having such a class point to a subtler problem in the design?
The general contract says that if A.Equals(B) is true, then their hash codes must be the same. If SomeValue changes in A in such a way that A.Equals(B) is no longer true, then A.GetHashCode() can return a different value than before. Mutable objects cannot cache GetHashCode(); it must be calculated every time the method is called.
This article has detailed guidelines for GetHashCode and mutability:
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
If your GetHashCode() depends on some mutable value, the hash has to change whenever that value changes. Otherwise you break the Equals contract.
The requirement that a hash should never change once somebody has asked for it matters if you put your object into a HashSet or use it as a key in a Dictionary. In these cases you have to ensure that the hash code does not change for as long as the object is stored in such a container. This can be ensured manually, by simply taking care of the issue when you program, or you could give your object a Freeze() method. Once it has been called, any subsequent attempt to set a property leads to some kind of exception (you should then also provide a Defrost() method). Additionally, you can call Freeze() from your GetHashCode() implementation, so you can be quite sure that nobody alters a frozen object by mistake.
And one last tip: if you need to alter an object within such a container, simply remove it, alter it (don't forget to defrost it) and re-add it.
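A minimal sketch of the Freeze()/Defrost() idea together with the remove, alter, re-add sequence. FreezableKey and its members are hypothetical names chosen for illustration. Note that freezing from inside GetHashCode() means anything that asks for the hash (including a debugger) freezes the object, which is a deliberate trade-off in this approach.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key that refuses changes once its hash was handed out.
public sealed class FreezableKey
{
    private string _someValue;
    private bool _frozen;

    public FreezableKey(string someValue)
    {
        _someValue = someValue;
    }

    public string SomeValue
    {
        get { return _someValue; }
        set
        {
            if (_frozen) throw new InvalidOperationException("Object is frozen; call Defrost() first.");
            _someValue = value;
        }
    }

    public void Freeze() { _frozen = true; }
    public void Defrost() { _frozen = false; }

    public override bool Equals(object obj)
    {
        var other = obj as FreezableKey;
        return !ReferenceEquals(other, null) && other._someValue == _someValue;
    }

    public override int GetHashCode()
    {
        Freeze(); // once a hash code is out, the state must not change
        return _someValue == null ? 0 : _someValue.GetHashCode();
    }
}

public static class FreezableKeyDemo
{
    public static bool Run()
    {
        var key = new FreezableKey("alpha");
        var set = new HashSet<FreezableKey>();
        set.Add(key);           // Add asks for the hash, which freezes the key

        set.Remove(key);        // 1. remove while the old hash is still valid
        key.Defrost();          // 2. allow changes again
        key.SomeValue = "beta"; // 3. alter
        set.Add(key);           // 4. re-add under the new hash

        return set.Contains(key);
    }
}
```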
You sort of need to choose between mutability and GetHashCode returning the same value for 'equal' objects. Often when you think you want to implement 'equal' for mutable objects, you end up later deciding that you have "shades of equal" and really didn't mean Object.Equals equality.
Having a mutable object as the 'key' in any sort of data structure is a big red flag to me. For example:
MyObj a = new MyObj("alpha");
MyObj b = new MyObj("beta");
HashSet<MyObj> objs = new HashSet<MyObj>();
objs.Add(a);
objs.Add(b);
// objs.Count == 2
b.SomeValue = "alpha";
// objs.Distinct().Count() == 1, objs.Count == 2
We've badly violated the contract of HashSet<T>. This is an obvious example; there are subtler ones.
I was wondering what the overhead of calling short methods is, whether the code gets optimized either way, and whether it differs from the cost of getters.
I'll just give an example because it is hard to explain.
I have a ClaimsManager for a website that gets particular claims and returns them. The process for getting one claim from another differs only by a ClaimsType string.
public string GetClaimValueByType(string ClaimType)
{
    return (from claim in _claimsIdentity.Claims
            where claim.ClaimType == ClaimType
            select claim.Value).SingleOrDefault();
}
/*Again would this be better or worse if I wanted to be able to choose if
I want the claim versus the value?
public Claim GetClaimByType(string ClaimType)
{
    return (from claim in _claimsIdentity.Claims
            where claim.ClaimType == ClaimType
            select claim).SingleOrDefault();
}
public string GetClaimValueByType(string ClaimType)
{
    return GetClaimByType(ClaimType).Value;
}
*/
public string GetEmail()
{
    return GetClaimValueByType(ClaimTypes.Email);
}
/* Or should I use getters?...
public string Email
{
    get
    {
        return GetClaimValueByType(ClaimTypes.Email);
    }
}
*/
So is it bad practice to have these short get methods? Will there be a large call overhead because they are so short, or will this be optimized? Finally, does it make more sense to actually use getters here?
Thanks
In my opinion, whatever marginal overhead there may be with using setters and getters is outweighed by the cleaner and more easily maintainable code, which would most likely be easier for any .NET developer off the street to pick up and run with.
But I guess it also depends on how huge your Claim object is. :)
Performance-wise there is no difference between a getter and a method. A getter is just syntactic sugar and is converted to a method during compilation. There are some general guidelines as to when to use a getter and when to use a method. This MSDN page advises using a method instead of a property when:
The operation is a conversion, such as Object.ToString.
The operation is expensive enough that you want to communicate to the user that they should consider caching the result.
Obtaining a property value using the get accessor would have an observable side effect.
Calling the member twice in succession produces different results.
The order of execution is important. Note that a type's properties should be able to be set and retrieved in any order.
The member is static but returns a value that can be changed.
The member returns an array. Properties that return arrays can be very misleading. Usually it is necessary to return a copy of the internal array so that the user cannot change internal state. This, coupled with the fact that a user can easily assume it is an indexed property, leads to inefficient code.
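The array guideline at the end of the list can be illustrated with a short sketch. ScoreBoard and its members are hypothetical. The array is exposed through a method that returns a copy, so callers cannot change internal state, and the method syntax hints that each call allocates.

```csharp
using System;

// Hypothetical class that keeps an internal array.
public sealed class ScoreBoard
{
    private readonly int[] _scores = { 1, 2, 3 };

    // A method rather than a property: every call allocates a fresh copy.
    public int[] GetScores()
    {
        return (int[])_scores.Clone();
    }

    // Cheap, side-effect free, same result on repeated calls: fine as a property.
    public int First
    {
        get { return _scores[0]; }
    }
}
```

A caller that writes `board.GetScores()[0] = 99` only modifies its own copy. Had this been a property, a loop like `for (i...) sum += board.Scores[i]` would silently clone the array on every iteration.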
I wouldn't use a getter for this; properties are intended to return a constant value. This means that successive calls should return the same value. This is just a conceptual thing.