Hey all, I've been reading up on the best way to implement the GetHashCode() override for objects in .NET, and most answers I run across involve somehow munging numbers together from members that are numeric types to come up with a method. Problem is, I have an object that uses an alphanumeric string as its key, and I'm wondering if there's something fundamentally wrong with just using an internal ID for objects with strings as keys, something like the following?
// Override GetHashCode() to return a permanent, unique identifier for
// this object.
static private int m_next_hash_id = 1;
private int m_hash_code = 0;
public override int GetHashCode() {
    if (this.m_hash_code == 0)
        this.m_hash_code = <type>.m_next_hash_id++;
    return this.m_hash_code;
}
Is there a better way to come up with a unique hash code for an object that uses an alphanumeric string as its key? (And no, the numeric parts of the alphanumeric strings aren't unique; some of these strings don't have numbers in them at all.) Any thoughts would be appreciated!
You can call GetHashCode() on the non-numeric values that you use in your object.
private string m_foo;
public override int GetHashCode()
{
    return m_foo.GetHashCode();
}
The approach in the question is not a good pattern for generating hashes for an object.
It's important to understand the purpose of GetHashCode(): it's a way to generate a numeric representation of the identifying properties of an object. Hash codes are used to allow an object to serve as a key in a dictionary and, in some cases, to accelerate comparisons between complex types.
If you simply generate a random value and call it a hash code, you have no repeatability. Another instance with the same key fields will have a different hash code, and will violate the behavior expected by classes like HashSet, Dictionary, etc.
If you already have an identifying string member in your object, just return its hash code.
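As a minimal sketch of that advice, assuming a hypothetical class named Part whose identity is a single alphanumeric string (the class and field names are illustrative, not from the question), Equals and GetHashCode can both be written in terms of that string:

```csharp
using System;

// Hypothetical class whose identity is one alphanumeric string key.
public sealed class Part : IEquatable<Part>
{
    // Read-only, so the hash code cannot change after construction.
    private readonly string m_key;

    public Part(string key)
    {
        if (key == null) throw new ArgumentNullException("key");
        m_key = key;
    }

    // Two parts are equal when their keys match character for character.
    public bool Equals(Part other)
    {
        return other != null && string.Equals(m_key, other.m_key, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Part);
    }

    // Delegate to the string: equal keys always give equal hash codes.
    public override int GetHashCode()
    {
        return m_key.GetHashCode();
    }
}
```

With this in place, a HashSet<Part> that contains new Part("AB-12X") should also report that it contains a second, separately constructed new Part("AB-12X").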
The documentation on MSDN for implementers of GetHashCode() is a must read for anyone that plans on overriding that method:
Notes to Implementers
A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and, for uniqueness, must use at least one of the instance fields as input.
A hash function must have the following properties:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
For the best performance, a hash function must generate a random distribution for all input.
For example, the implementation of the GetHashCode method provided by the String class returns identical hash codes for identical string values. Therefore, two String objects return the same hash code if they represent the same string value. Also, the method uses all the characters in the string to generate reasonably randomly distributed output, even when the input is clustered in certain ranges (for example, many users might have strings that contain only the lower 128 ASCII characters, even though a string can contain any of the 65,535 Unicode characters).
Hash codes don't have to be unique. Provided your Equals implementation is correct, it's OK to return the same hash code for two instances. The m_next_hash_id logic is broken, since it allows two objects to have different hash codes even if they compare equal.
MSDN gives a good set of instructions on how to implement Equals and GetHashCode. Several of the examples there implement GetHashCode in terms of the hash codes of an object's fields.
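To make the failure concrete, here is a small sketch that pairs the counter-based hash from the question with a key-based Equals. The type CounterHashed and the helper CounterHashDemo are hypothetical names made up for this illustration.

```csharp
using System.Collections.Generic;

// Hypothetical type: key-based Equals, counter-based GetHashCode.
public sealed class CounterHashed
{
    private static int s_nextHashId = 1;
    private readonly string m_key;
    private int m_hashCode;

    public CounterHashed(string key) { m_key = key; }

    public override bool Equals(object obj)
    {
        var other = obj as CounterHashed;
        return other != null && m_key == other.m_key;
    }

    // Same logic as in the question: each instance gets its own number,
    // regardless of the key it holds.
    public override int GetHashCode()
    {
        if (m_hashCode == 0)
            m_hashCode = s_nextHashId++;
        return m_hashCode;
    }
}

public static class CounterHashDemo
{
    // Asks a set holding one instance whether it contains an equal instance.
    public static bool EqualInstanceIsFound()
    {
        var set = new HashSet<CounterHashed> { new CounterHashed("AB-12X") };
        return set.Contains(new CounterHashed("AB-12X"));
    }
}
```

EqualInstanceIsFound() returns false: the two instances are equal according to Equals, but the set compares their differing hash codes first and never gets as far as calling Equals.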
Yes, a better way would be to use the hash code of the string you already have. If the alphanumeric string defines the identity of your object, its hash code will do quite nicely as the hash code of your object.
Incrementing a static field and using it as the hash code is a bad idea. The hash code should have an even distribution across the space of possible values. This ensures, amongst other things, that it will perform well when used as the key in a hashtable.
I believe you generally want GetHashCode() to return something that identifies the object by its value rather than by its instance. If I'm understanding the idea here, your method would make GetHashCode() return different hashes for two different objects with equivalent values, just because they're different instances.
GetHashCode() is meant to return a value that lets you compare two objects' values, not their references.
I've made the assumption that the (generic) Dictionary class in .NET uses the GetHashCode() method on its keys to produce hashes. I have two questions leading on from it:
Object has an overridable GetHashCode() method. For a user defined
reference type object, will this method produce a hash based on the
referenced data? e.g. If I have a class OneString which contains
only one String instance variable - will two separate instances of
this class with matching strings always produce the same hash code?
Or does the GetHashCode() method of OneString need to be overridden
to achieve this functionality?
Presumably the hash function implemented in the String class is different to the hash function implemented in a different reference type (e.g. BitmapImage). Are the hash functions implemented in the most common classes publicly available?
No.
object.GetHashCode() returns a value based on that object's identity alone.
It will not return the same value for two equivalent objects; it is completely unaware of the type or meaning of the object.
Classes that represent values (such as String) override GetHashCode() to return a hash based on the value represented.
The algorithm used is up to the class designer; GetHashCode() is written like any other method.
However, GetHashCode() is supposed to return equal values whenever Equals() returns true; if your class does not do this, it is wrong.
Object has an overridable GetHashCode() method. For a user defined
reference type object, will this method produce a hash based on the
referenced data?
No, the default GetHashCode method doesn't attempt to use the data in the class, it only bases it on the reference. Two separate instances with identical content will have different hash codes.
If I have a class OneString which contains only one String instance
variable - will two separate instances of this class with matching
strings always produce the same hash code? Or does the GetHashCode()
method of OneString need to be overridden to achieve this
functionality?
You have to override it.
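A short sketch of the difference, assuming two hypothetical variants of the OneString class from the question: one that inherits the default behaviour from System.Object and one that overrides Equals and GetHashCode.

```csharp
// No overrides: Equals and GetHashCode are inherited from System.Object,
// so both are based on the reference, not on the string inside.
public class OneStringDefault
{
    private readonly string m_value;
    public OneStringDefault(string value) { m_value = value; }
}

// Overrides both members so that matching strings mean equal objects.
public class OneString
{
    private readonly string m_value;

    // A null string is treated as empty to keep the sketch simple.
    public OneString(string value) { m_value = value ?? ""; }

    public override bool Equals(object obj)
    {
        var other = obj as OneString;
        return other != null && m_value == other.m_value;
    }

    public override int GetHashCode()
    {
        return m_value.GetHashCode();
    }
}
```

Two OneString instances built from "abc" compare equal and return the same hash code. Two OneStringDefault instances built from "abc" do not compare equal, and nothing guarantees that their hash codes match.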
Presumably the hash function implemented in the String class is
different to the hash function implemented in a different reference
type (e.g. SqlCommand). Are the hash functions implemented in the most
common classes publicly available?
Yes, the GetHashCode implementations for strings and common value types produce a working hash code from their values.
1) Different string instances with the same contents will always produce the same hash code within a single run of the application. (see: http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx)
2) GetHashCode() is a method of the base Object class, from which all types derive. So, there is always an implementation of this method for any type.
Hi, I have a class with 6 string properties. A unique object will have different values for at least one of these fields.
To implement IEqualityComparer's GetHashCode function, I am concatenating all 6 properties and calling the GetHashCode on the resultant string.
I had the following doubts:
Is it necessary to call the GetHashcode on a unique value?
Will the concatenation operation on the six properties make the comparison slow?
Should I use some other approach?
If your string fields are named a-f and known not to be null, this is ReSharper's proposal for your GetHashCode()
public override int GetHashCode() {
    unchecked {
        int result = a.GetHashCode();
        result = (result * 397) ^ b.GetHashCode();
        result = (result * 397) ^ c.GetHashCode();
        result = (result * 397) ^ d.GetHashCode();
        result = (result * 397) ^ e.GetHashCode();
        result = (result * 397) ^ f.GetHashCode();
        return result;
    }
}
GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object).
This means that:
If two objects compare as equal with Equals, then their GetHashCode must return the same value.
If some of the 6 string properties are not strictly read-only, they cannot take part in the GetHashCode implementation.
If you cannot satisfy both points at the same time, you should re-evaluate your design because anything else will leave the door open for bugs.
Finally, you could probably make GetHashCode faster by calling GetHashCode on each of the 6 strings and then integrating all 6 results in one value using some bitwise operations.
GetHashCode() should return the same hash code for all objects that return true if you call Equals() on those objects. This means, for example, that you can return zero as the hash code regardless of what the field values are. But that would make your object very inefficient when stored in data structures such as hash tables.
Combining the strings is one option, but note that you could, for example, combine just two of the strings for the hash code (while still comparing all the strings in Equals!).
You can also combine the hashes of the six separate strings, rather than computing a single hash for a combined string. See for example
Quick and Simple Hash Code Combinations
I'm not sure if this will be significantly faster than concatenating the strings.
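Since the question is about IEqualityComparer, here is a sketch of what combining the six hashes could look like there, without building a concatenated string. The Record and RecordComparer types and their member names are assumptions for illustration; the multiply-by-397 mixing step is borrowed from the ReSharper-style code quoted earlier, and null strings are hashed as 0.

```csharp
using System.Collections.Generic;

// Hypothetical class with six read-only string fields.
public sealed class Record
{
    public readonly string A, B, C, D, E, F;

    public Record(string a, string b, string c, string d, string e, string f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }
}

public sealed class RecordComparer : IEqualityComparer<Record>
{
    // Equality looks at all six fields.
    public bool Equals(Record x, Record y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.A == y.A && x.B == y.B && x.C == y.C
            && x.D == y.D && x.E == y.E && x.F == y.F;
    }

    // Combines the six individual string hashes; overflow is intentional.
    public int GetHashCode(Record r)
    {
        if (r == null) return 0;
        unchecked
        {
            int result = Hash(r.A);
            result = (result * 397) ^ Hash(r.B);
            result = (result * 397) ^ Hash(r.C);
            result = (result * 397) ^ Hash(r.D);
            result = (result * 397) ^ Hash(r.E);
            result = (result * 397) ^ Hash(r.F);
            return result;
        }
    }

    // Null-safe helper: a missing string contributes 0.
    private static int Hash(string s)
    {
        return s == null ? 0 : s.GetHashCode();
    }
}
```

The comparer is passed to the collection's constructor, for example new Dictionary<Record, int>(new RecordComparer()). Whether this beats concatenation in practice would need measuring.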
You can use the behavior from:
http://moh-abed.com/2011/07/13/entities-and-value-objects/
Let's assume I have two objects called K and M
if(K.Equals(M))
{
}
If that's true, do K and M always have the same hash code?
Or does it depend on the programming language?
The contract for GetHashCode() requires it, but since anyone can make their own implementation it is never guaranteed.
Many classes (especially hashtables) require it in order to behave correctly.
If you are implementing a class, you should always make sure that two equal objects have the same hashcode.
If you are implementing a utility method/class, you can assume that two equal objects have the same hashcode (if not, it is the other class, not yours, that is buggy).
If you are implementing something with security implications, you cannot assume it.
If that's true, do K and M always have the same hash code?
Yes.
Or rather it should be the case. Consumers of hash codes (eg. containers) can assume that equal objects have equal hash codes, or rather unequal hash codes means the objects are unequal. (Unequal objects can have the same hash code: there are more possible objects than hash codes so this has to be allowed.)
Or does it depend on the programming language?
No
If that's true, do K and M always have the same hash code?
Yes. Unless they have a wickedly overridden Equals method. But that would be considered broken.
But note that the reverse is not true,
if K and M have the same HashCode it could still be that K.Equals(M) == false
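A small sketch of that one-way relationship, using a hypothetical Name type whose hash code is deliberately weak (just the string length). The contract still holds, because equal strings always have equal lengths, yet many unequal objects share a hash code.

```csharp
// Hypothetical type with a valid but deliberately weak hash function.
public sealed class Name
{
    private readonly string m_text;

    public Name(string text) { m_text = text ?? ""; }

    public override bool Equals(object obj)
    {
        var other = obj as Name;
        return other != null && m_text == other.m_text;
    }

    // Equal texts have equal lengths, so equal objects get equal hashes.
    // Unequal texts of the same length collide, which is allowed.
    public override int GetHashCode()
    {
        return m_text.Length;
    }
}
```

Here new Name("cat") and new Name("dog") both return 3 from GetHashCode(), but Equals reports them as different. A Dictionary keyed by Name still behaves correctly, because it confirms every candidate with Equals; it just does more comparisons per lookup.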
Yes, it should return the same hash code.
I'd say it's language independent. But there's no guarantee that other programmers have implemented it correctly.
GetHashCode returns a value based on the current instance that is
suited for hashing algorithms and data structures such as a hash
table. Two objects that are the same type and are equal must return
the same hash code to ensure that instances of
System.Collections.HashTable and
System.Collections.Generic.Dictionary work correctly.
In your application, equal objects have to return the same hash code. This is part of the .NET platform, so the hash code contract applies regardless of which .NET language you are authoring in.
GetHashCode() can return the same hash for different objects. You should use Equals() to compare objects, not GetHashCode(); when GetHashCode() returns the same value for two objects, the implementation of Equals() still has to perform the actual equality checks.
Hash-based data structures are able to handle such cases by using collision resolution algorithms.
From Wikipedia:
Hash collisions are practically unavoidable when hashing a random
subset of a large set of possible keys. For example, if 2,500 keys are
hashed into a million buckets, even with a perfectly uniform random
distribution, according to the birthday problem there is a 95% chance
of at least two of the keys being hashed to the same slot.
Therefore, most hash table implementations have some collision
resolution strategy to handle such events. Some common strategies are
described below. All these methods require that the keys (or pointers
to them) be stored in the table, together with the associated values.
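The figure in the quote can be checked with a few lines of code. This is a sketch of the standard birthday-problem calculation, assuming each key lands in a uniformly random bucket; the Birthday class and its method name are made up for this example.

```csharp
public static class Birthday
{
    // Probability that at least two of `keys` items share one of `buckets`
    // equally likely slots: 1 - (m/m) * ((m-1)/m) * ... * ((m-n+1)/m).
    public static double CollisionProbability(int keys, int buckets)
    {
        if (keys > buckets) return 1.0; // pigeonhole: a collision is certain

        double noCollision = 1.0;
        for (int i = 0; i < keys; i++)
            noCollision *= (double)(buckets - i) / buckets;
        return 1.0 - noCollision;
    }
}
```

Calling Birthday.CollisionProbability(2500, 1000000) gives a value a little above 0.95, in line with the quoted 95% figure.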
It depends on the Equals implementation of the object. It may use GetHashCode under the hood, but it doesn't have to. So basically, if you have an object with a custom Equals implementation, the hash code may be different for both objects.
So I'm thinking of using a reference type as a key to a .NET Dictionary...
Example:
class MyObj
{
    private int mID;
    public MyObj(int id)
    {
        this.mID = id;
    }
}
// whatever code here
static void Main(string[] args)
{
    Dictionary<MyObj, string> dictionary = new Dictionary<MyObj, string>();
}
My question is, how is the hash generated for custom objects (ie not int, string, bool etc)? I ask because the objects I'm using as keys may change before I need to look up stuff in the Dictionary again. If the hash is generated from the object's address, then I'm probably fine... but if it is generated from some combination of the object's member variables then I'm in trouble.
EDIT:
I should've originally made it clear that I don't care about the equality of the objects in this case... I was merely looking for a fast lookup (I wanted to do a 1-1 association without changing the code of the classes involved).
Thanks
The default implementation of GetHashCode/Equals basically deals with identity. You'll always get the same hash back from the same object, and it'll probably be different to other objects (very high probability!).
In other words, if you just want reference identity, you're fine. If you want to use the dictionary treating the keys as values (i.e. using the data within the object, rather than just the object reference itself, to determine the notion of equality) then it's a bad idea to mutate any of the equality-sensitive data within the key after adding it to the dictionary.
The MSDN documentation for object.GetHashCode is a little bit overly scary. Basically, you shouldn't use it for persistent hashes (i.e. saved between process invocations), but it will be consistent for the same object, which is all that's required for it to be a valid hash for a dictionary. While it's not guaranteed to be unique, I don't think you'll run into enough collisions to cause a problem.
The hash used is the return value of the GetHashCode method on the object. By default this is essentially a value representing the reference. It is not guaranteed to be unique for an object, and in fact likely won't be in many situations. But the value for a particular reference will not change over the lifetime of the object even if you mutate it. So for this particular sample you will be OK.
In general though, it is a very bad idea to use objects which are not immutable as keys to a Dictionary. It's way too easy to fall into a trap where you override Equals and GetHashCode on an object and break code where the type was formerly used as a key in a Dictionary.
The dictionary will use the GetHashCode method defined on System.Object, which will not change over the object's lifetime regardless of field changes etc. So you won't encounter problems in the scenario you describe.
You can override GetHashCode, and should do so if you override Equals so that objects which are equal also return the same hash code. However, if the type is mutable then you must be aware that if you use the object as a key of a dictionary you will not be able to find it again if it is subsequently altered.
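The "cannot find it again" problem can be reproduced in a few lines. This sketch assumes a hypothetical MutableKey type whose Equals and GetHashCode both depend on a public, writable Id field; MutableKeyDemo is likewise an illustrative name.

```csharp
using System.Collections.Generic;

// Hypothetical mutable key: equality and hash both depend on Id.
public sealed class MutableKey
{
    public int Id;

    public MutableKey(int id) { Id = id; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class MutableKeyDemo
{
    // Returns { found before mutation, found after mutation,
    //           found via a fresh key carrying the original Id }.
    public static bool[] Run()
    {
        var key = new MutableKey(1);
        var map = new Dictionary<MutableKey, string> { { key, "value" } };

        bool before = map.ContainsKey(key);

        key.Id = 2; // changes the hash code while the key is in the map
        bool after = map.ContainsKey(key);
        bool viaOriginalId = map.ContainsKey(new MutableKey(1));

        return new[] { before, after, viaOriginalId };
    }
}
```

Run() returns { true, false, false }. After the mutation the entry is still stored under the old hash code, so a lookup with the mutated key misses it, and a lookup with a fresh key carrying the old Id matches the stored hash but then fails the Equals check.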
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx
For a custom object that derives from Object without overriding GetHashCode, the hash code is assigned once and remains the same throughout the lifetime of the instance. Furthermore, the hash code never changes when the values of internal fields or properties change.
A Hashtable or Dictionary does not use GetHashCode as a unique identifier; it only uses it to choose a "hash bucket". For example, the strings "aaa123" and "aaa456" might both have the hash "aaa", and all objects with the hash "aaa" would be stored in one bucket. Whenever you insert or retrieve an object, the Dictionary calls GetHashCode to determine the bucket and then compares the individual objects in that bucket using Equals (a reference comparison by default).
When a custom object is used as a Dictionary key, think of the Dictionary as storing only its reference (address or memory pointer): it doesn't know the object's contents, and while the contents may change, the reference never does. This also means that if two objects are exact replicas of each other but are distinct in memory, your hashtable will not consider them the same, because their references are different.
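If you run into that problem, the best way to get value-based equality is to override Equals, together with GetHashCode so the two stay consistent, as follows.
class MyObj
{
    private int mID;
    public MyObj(int id)
    {
        this.mID = id;
    }
    public override bool Equals(object obj)
    {
        MyObj mobj = obj as MyObj;
        if (mobj == null)
            return false;
        return this.mID == mobj.mID;
    }
    // Keep GetHashCode consistent with Equals: equal IDs give equal hashes.
    public override int GetHashCode()
    {
        return this.mID;
    }
}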
What is the use of GetHashCode()? Can I trace object identity using GetHashCode()? If so, could you provide an example?
Hash codes aren't about identity, they're about equality. In fact, you could say they're about non-equality:
If two objects have the same hash code, they may be equal
If two objects have different hash codes, they're not equal
Hash codes are not unique, nor do they guarantee equality (two objects may have the same hash but still be unequal).
As for their uses: they're almost always used to quickly select possibly equal objects to then test for actual equality, usually in a key/value map (e.g. Dictionary<TKey, TValue>) or a set (e.g. HashSet<T>).
No, a hash code is not guaranteed to be unique. But you already have references to your objects; they are perfect for tracking identity when compared with object.ReferenceEquals().
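A brief sketch of the distinction between identity and equality, assuming a hypothetical Token type with value-based Equals and a made-up helper class IdentityDemo:

```csharp
// Hypothetical value-like type: equality is based on the text it holds.
public sealed class Token
{
    private readonly string m_text;

    public Token(string text) { m_text = text ?? ""; }

    public override bool Equals(object obj)
    {
        var other = obj as Token;
        return other != null && m_text == other.m_text;
    }

    public override int GetHashCode()
    {
        return m_text.GetHashCode();
    }
}

public static class IdentityDemo
{
    // True only when both variables refer to the very same instance.
    public static bool SameInstance(object x, object y)
    {
        return object.ReferenceEquals(x, y);
    }
}
```

For two separately constructed tokens with the same text, Equals returns true and the hash codes match, while IdentityDemo.SameInstance returns false. Passing the same variable twice returns true.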
The value itself is used in hashing algorithms, such as hashtables.
In its default implementation, GetHashCode does not guarantee a unique value per object, so for .NET objects it should not be used as an identifier.
In your own classes, it is generally good practice to override GetHashCode to return a value based on your object's state.
It's used for algorithms and data structures that require hashing (such as a hash table). A hash code cannot on its own be used to track object identity, since two objects with the same hash are not necessarily equal. However, two equal objects should have the same hash code (which is why the C# compiler emits a warning if you override Equals without overriding GetHashCode).