Let's assume I have two objects called K and M
if(K.Equals(M))
{
}
If that's true, do K and M always have the same HashCode?
Or does it depend on the programming language?
The contract for GetHashCode() requires it, but since anyone can make their own implementation it is never guaranteed.
Many classes (especially hashtables) require it in order to behave correctly.
If you are implementing a class, you should always make sure that two equal objects have the same hashcode.
If you are implementing a utility method/class, you can assume that two equal objects have the same hashcode (if not, it is the other class, not yours, that is buggy).
If you are implementing something with security implications, you cannot assume it.
If that's true, do K and M always have the same HashCode?
Yes.
Or rather, it should be the case. Consumers of hash codes (e.g. containers) can assume that equal objects have equal hash codes, or rather that unequal hash codes mean the objects are unequal. (Unequal objects can have the same hash code: there are more possible objects than hash codes, so this has to be allowed.)
Or does it depend on the programming language?
No
If that's true, do K and M always have the same HashCode?
Yes. Unless they have a wickedly overridden Equals method. But that would be considered broken.
But note that the reverse is not true,
if K and M have the same HashCode it could still be that K.Equals(M) == false
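To make that one-way relationship concrete, here is a minimal sketch with a hypothetical Point class whose hash code is deliberately weak (X + Y). Equal points always hash the same, but unequal points such as (1, 2) and (2, 1) also collide. The class name and hash formula are illustrative assumptions, not a recommended design.

```csharp
#nullable enable
using System;

// Hypothetical type: equality is defined by X and Y.
public sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    public bool Equals(Point? other) =>
        other is not null && X == other.X && Y == other.Y;

    public override bool Equals(object? obj) => Equals(obj as Point);

    // Deliberately weak but valid: it only reads fields that Equals compares,
    // so equal points always produce equal hash codes.
    public override int GetHashCode() => X + Y;
}
```

Here new Point(1, 2) and new Point(1, 2) are equal and both hash to 3; new Point(2, 1) also hashes to 3 but is not equal to them, which is exactly the "same hash, not equal" case.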
Yes, it should return the same hash code.
I'd say it's language-independent. But there's no guarantee that other people's code has implemented it correctly.
GetHashCode returns a value based on the current instance that is
suited for hashing algorithms and data structures such as a hash
table. Two objects that are the same type and are equal must return
the same hash code to ensure that instances of
System.Collections.HashTable and
System.Collections.Generic.Dictionary work correctly.
In your application, equal instances must produce the same hash code. This is part of the .NET platform, so the hash code contract applies regardless of which .NET language you are authoring in.
GetHashCode() can return the same hash for different objects. Use Equals(), not GetHashCode(), to compare objects: when two GetHashCode() values are the same, Equals() is what actually decides whether the objects are equal.
Hash data structures handle such cases with collision resolution algorithms.
From wikipedia:
Hash collisions are practically unavoidable when hashing a random
subset of a large set of possible keys. For example, if 2,500 keys are
hashed into a million buckets, even with a perfectly uniform random
distribution, according to the birthday problem there is a 95% chance
of at least two of the keys being hashed to the same slot.
Therefore, most hash table implementations have some collision
resolution strategy to handle such events. Some common strategies are
described below. All these methods require that the keys (or pointers
to them) be stored in the table, together with the associated values.
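The 95% figure can be checked with the standard birthday-problem approximation, assuming n keys hashed independently and uniformly into m buckets:

```latex
P(\text{no collision}) = \prod_{i=1}^{n-1}\left(1 - \frac{i}{m}\right) \approx \exp\!\left(-\frac{n(n-1)}{2m}\right)

n = 2500,\; m = 10^{6}:\quad \frac{n(n-1)}{2m} = \frac{2500 \cdot 2499}{2 \cdot 10^{6}} = 3.12375,\quad e^{-3.12375} \approx 0.044

P(\text{at least one collision}) \approx 1 - 0.044 = 0.956
```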
It depends on the Equals implementation of the object. Equals may use GetHashCode under the hood, but it doesn't have to. So if an object has a custom Equals implementation and GetHashCode is not updated to match, two equal objects may have different hash codes (which breaks the contract).
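A minimal sketch of that broken case, using a hypothetical Customer class that overrides Equals to compare names but keeps the default, identity-based GetHashCode. Two equal instances then almost certainly get different hash codes, so a Dictionary lookup with an equal but distinct key will usually miss. The C# compiler flags this pattern with warning CS0659.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical class: Equals compares Name, but GetHashCode is NOT overridden,
// so it still uses the default identity-based hash (compiler warning CS0659).
public class Customer
{
    public string Name { get; }
    public Customer(string name) => Name = name;

    public override bool Equals(object? obj) =>
        obj is Customer other && Name == other.Name;
}

public static class BrokenLookupDemo
{
    // Returns true only if the dictionary finds the entry via an equal, distinct key.
    // With the default GetHashCode this is almost always false, even though the keys are equal.
    public static bool CanFindWithEqualKey()
    {
        var dict = new Dictionary<Customer, int> { [new Customer("Ada")] = 1 };
        return dict.ContainsKey(new Customer("Ada"));
    }
}
```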
In .NET, whenever we override the Equals() method for a class, it is normal practice to override the GetHashCode() method as well. Doing so will ensure better performance when the object is used in Hashtables and Dictionaries. Two keys are considered to be equal in a Hashtable only if their GetHashCode() values are the same. My question is: why can't Hashtables use the Equals() method to compare the keys? That would remove the burden of overriding the GetHashCode() method.
Hashtable/Dictionary use Equals in case of a collision (when two hash codes are the same).
Why don't they use only Equals?
Because that would require a lot more processing than comparing an integer value (the hash code). Since hash codes are used as an index, finding the right slot is O(1).
A HashSet (or Hashtable, or Dictionary) uses an array of buckets to distribute the items. Those buckets are indexed by the object's hash code (which should be immutable), so finding the bucket the item is in is O(1).
Then it uses Equals within that bucket to find the exact match if there's more than one item with the same hashcode: that's O(N) in the bucket size, since it needs to iterate over all items within that bucket to find the match.
If a hashset used only Equals, finding an item would be O(N) and you might as well be using a list, or an array.
That's also why two equal items must have the same hashcode, but two items with the same hashcode don't necessarily need to be equal.
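The bucket-then-Equals lookup described above can be sketched as a tiny separate-chaining set. This is an illustrative simplification, assuming a fixed bucket count and no resizing; it is not how HashSet<T> is actually implemented internally.

```csharp
using System.Collections.Generic;

// Simplified separate-chaining set: fixed bucket count, no resizing.
public sealed class ChainedSet<T>
{
    private readonly List<T>[] _buckets;
    private readonly IEqualityComparer<T> _comparer = EqualityComparer<T>.Default;

    public ChainedSet(int bucketCount = 16)
    {
        _buckets = new List<T>[bucketCount];
        for (int i = 0; i < bucketCount; i++) _buckets[i] = new List<T>();
    }

    // Step 1: the hash code picks the bucket in O(1).
    private List<T> BucketFor(T item)
    {
        int hash = _comparer.GetHashCode(item) & 0x7FFFFFFF; // clear the sign bit
        return _buckets[hash % _buckets.Length];
    }

    // Step 2: Equals scans only that bucket, O(bucket size).
    public bool Contains(T item)
    {
        foreach (T candidate in BucketFor(item))
            if (_comparer.Equals(candidate, item)) return true;
        return false;
    }

    // Adds the item unless an equal one is already present.
    public bool Add(T item)
    {
        if (Contains(item)) return false;
        BucketFor(item).Add(item);
        return true;
    }
}
```

Equal items must hash alike here because Contains only ever looks in one bucket: an equal item stored in a different bucket would never be compared.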
Two object instances that compare as equal must always have identical hash codes. If this doesn't hold, hash-based data structures will not work correctly. It's not a matter of performance.
Two object instances that don't compare as equal should ideally have different hash codes. If this doesn't hold, hash-based data structures will have degraded performance, but at least they'll still work.
Thus, for a given object instance, GetHashCode needs to reflect the logic of Equals, to some extent.
Now if you're overriding the Equals method, you're providing custom comparison logic. As an example, let's say your custom comparison logic involves only one particular data member of the instance. For a non-virtual GetHashCode method to be useful, it would have to be general enough to understand your custom Equals logic and be able to come up with a custom hash code function (one that only involves your chosen data member) on the spot.
It's not that easy to write such a sophisticated GetHashCode and it's not worth the trouble either, when the user can simply provide a custom one-liner that honors the initial requirement.
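For instance, if a hypothetical Employee class decides that equality depends only on its Id, the matching GetHashCode really is a one-liner over the same member:

```csharp
#nullable enable

// Hypothetical class: two employees are equal when their Ids match, regardless of Name.
public class Employee
{
    public int Id { get; }
    public string Name { get; set; }

    public Employee(int id, string name) { Id = id; Name = name; }

    public override bool Equals(object? obj) => obj is Employee other && Id == other.Id;

    // Uses only the member Equals uses, so equal objects get equal hash codes.
    public override int GetHashCode() => Id.GetHashCode();
}
```

Because Id is get-only, the hash code cannot change while the object sits in a dictionary; Name can change freely because neither method reads it.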
I'm considering implementing my own custom hashcode for a given object... and using it as a key for my dictionary. Since it's possible (likely) that 2 objects will have the same hashcode, what additional operators should I override, and what should that override (conceptually) look like?
myDictionary.Add(myObj.GetHashCode(),myObj);
vs
myDictionary.Add(myObj,myObj);
In other words, does a Dictionary use a combination of the following in order to determine uniqueness and which bucket to place an object in?
Which are more important than others?
HashCode
Equals
==
CompareTo()
Is compareTo only needed in the SortedDictionary?
What is GetHashCode used for?
It is by design useful for only one thing: putting an object in a hash table. Hence the name.
GetHashCode is designed to do only one thing: balance a hash table. Do not use it for anything else. In particular:
It does not provide a unique key for an object; probability of collision is extremely high.
It is not of cryptographic strength, so do not use it as part of a digital signature or as a password equivalent
It does not necessarily have the error-detection properties needed for checksums.
and so on.
Eric Lippert
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
It's not the buckets that cause the problem - it is actually finding the right object instance once you have determined the bucket using the hash code. Since all objects in a bucket share the same hash code, object equality (Equals) is used to find the right one. The rule is that if two objects are considered equal, they should produce the same hash code - but two objects producing the same hash codes might not be equal.
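Applied to the myDictionary.Add(myObj.GetHashCode(), myObj) idea from the question: using the hash code itself as the key turns every collision into a duplicate-key error, while using the object as the key lets Dictionary resolve collisions with Equals. A minimal sketch, assuming a hypothetical Tag type whose hash code deliberately collides for "ab" and "ba". Dictionary relies on GetHashCode and Equals here; == and CompareTo are not consulted (CompareTo matters for sorted collections such as SortedDictionary).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical key type with a deliberately collision-prone hash (sum of chars).
public sealed class Tag
{
    public string Text { get; }
    public Tag(string text) => Text = text;

    public override bool Equals(object? obj) => obj is Tag other && Text == other.Text;

    public override int GetHashCode()
    {
        int sum = 0;
        foreach (char c in Text) sum += c;
        return sum; // "ab" and "ba" both give 97 + 98 = 195
    }
}

public static class KeyChoiceDemo
{
    // Keying by the object: GetHashCode finds the bucket, Equals tells "ab" and "ba" apart.
    public static int CountKeyedByObject()
    {
        var byObject = new Dictionary<Tag, string>();
        byObject.Add(new Tag("ab"), "first");
        byObject.Add(new Tag("ba"), "second");
        return byObject.Count;
    }

    // Keying by the hash code: both objects produce key 195, so the second Add throws.
    public static bool KeyedByHashCodeThrows()
    {
        var byHash = new Dictionary<int, Tag>();
        byHash.Add(new Tag("ab").GetHashCode(), new Tag("ab"));
        try
        {
            byHash.Add(new Tag("ba").GetHashCode(), new Tag("ba"));
            return false;
        }
        catch (ArgumentException)
        {
            return true; // duplicate key
        }
    }
}
```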
Not sure whether it is sensible to reopen my earlier thread on hashing URLs.
Nonetheless, I am still curious to know how this works under the covers.
Assumption: We have a hashtable with n (where n < Infinity) elements where the asymptotic time complexity is O(1); we (the CLR) have achieved this by applying some hashing function (Hn-1 hash function where n>1).
Question: Can someone explain to me how the CLR maps a key to the hash code when we seek (retrieve) any element (if different hashing functions are used)? How does the CLR track (if it does) the hashing function of any live object (hash table)?
Thanks in advance.
Conceptually, there are two hash functions. The first hash function, as you probably have guessed, is the key object's GetHashCode method. The second hash function is a hash of the key returned by the first hash function.
So, imagine a hash table that has a capacity of 1,024 items, and you're going to insert two keys: K1 and K2.
K1.GetHashCode() returns 1,023. K2.GetHashCode() returns 65,535
The code then divides the returned hash code by the hash table size and takes the remainder, so both keys map to position 1,023 in the hash table (1,023 mod 1,024 = 1,023, and 65,535 mod 1,024 = 1,023).
K1 is added to the table. When it comes time to add K2, there is a collision. So the code resorts to the second hash function. That second hash function is probably a "bit mixer" (often the last stage in calculating a hash code) of some sort that randomizes the bits in the returned key. Conceptually, the code would look something like this:
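int hashCode = K2.GetHashCode();
// Clear the sign bit so the remainder is never negative
int slot = (hashCode & 0x7FFFFFFF) % 1024;
if (table[slot] != null)
{
    // Collision: derive a second hash from the same hash code
    int secondHashCode = BitMixer(hashCode);
    slot = (secondHashCode & 0x7FFFFFFF) % 1024;
}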
The point here is that the code doesn't have to keep track of multiple hash functions for the different keys. It knows that it can call Key.GetHashCode() to get the object's hash code. From there, it can call its own bit mixer function or functions to generate additional hash codes.
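The same idea extends to a full probe sequence. The sketch below uses double hashing, one common open-addressing scheme, assumed here for illustration: probe i goes to (h1 + i * h2) mod capacity, where h1 comes straight from GetHashCode() and h2 is derived from it by a bit mixer. The BitMixer constants are arbitrary assumptions, not the CLR's. Because the capacity is a power of two and the step is forced to be odd, the sequence visits every slot exactly once.

```csharp
using System.Collections.Generic;

public static class DoubleHashing
{
    // Hypothetical bit mixer: scrambles the bits of a hash code.
    // The multiplier and shifts are illustrative assumptions.
    public static uint BitMixer(int hashCode)
    {
        unchecked
        {
            uint x = (uint)hashCode;
            x ^= x >> 16;
            x *= 0x45D9F3Bu;
            x ^= x >> 16;
            return x;
        }
    }

    // Yields the slots probed for a key whose GetHashCode() is hashCode.
    // capacity must be a power of two, so "& mask" is the same as "mod capacity".
    public static IEnumerable<int> ProbeSequence(int hashCode, int capacity)
    {
        int mask = capacity - 1;
        int start = hashCode & mask;                           // first hash: low bits of GetHashCode()
        int step = (int)(BitMixer(hashCode) & (uint)mask) | 1; // second hash: an odd step
        for (int i = 0; i < capacity; i++)
            yield return (start + i * step) & mask;
    }
}
```

With capacity 1,024, K1 (1,023) and K2 (65,535) both start at slot 1,023; as long as their mixed steps differ, their later probes diverge. Nothing per key is stored: everything is recomputed from GetHashCode().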
A hash code does not uniquely identify an object. It's just used to quickly put that object into a bucket. The elements in one bucket may but need not be equal, but elements in different buckets must be unequal.
Conceptually you can think of the default GetHashCode() implementation on reference types as using a field in every instance containing a random value for the hashcode which gets initialized on object creation. The actual implementation is a bit more complex but that doesn't matter here.
Since there are only 2^32 (about 4.3 billion) different hash codes, the O(1) runtime of most hash table implementations will break down if you have more elements than that. And of course the distribution must be good, i.e. there must not be too many hash collisions, but having a few is no big problem.
For types with value semantics you override both Equals and GetHashCode consistently to use the fields which determine equality.
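A minimal sketch for a hypothetical Money value type: both members participate in Equals, so both feed GetHashCode through System.HashCode.Combine (available since .NET Core 2.1). HashCode.Combine is seeded randomly per process, so the values are only meaningful within one run, which is all a hash table needs.

```csharp
#nullable enable
using System;

// Hypothetical value type: equality is determined by Cents and Currency.
public readonly struct Money : IEquatable<Money>
{
    public long Cents { get; }
    public string Currency { get; }

    public Money(long cents, string currency) { Cents = cents; Currency = currency; }

    public bool Equals(Money other) => Cents == other.Cents && Currency == other.Currency;
    public override bool Equals(object? obj) => obj is Money other && Equals(other);

    // Same fields as Equals, combined into one hash.
    public override int GetHashCode() => HashCode.Combine(Cents, Currency);

    public static bool operator ==(Money left, Money right) => left.Equals(right);
    public static bool operator !=(Money left, Money right) => !left.Equals(right);
}
```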
Not sure if I understand your question well, but every object in .NET implements a GetHashCode function, which returns a hash code usable (and used) in dictionaries/hashtables, so the object itself is responsible for generating a good hash code.
Of course, there may (and will) be conflicts, as the hash code is an int. The conflicts are handled/resolved by the dictionary/hashtable.
Every object implements the GetHashCode() function and Equals() function.
For reference types, the default implementations are based on the object references. For example, a.Equals(b) returns the same as object.ReferenceEquals(a, b). This means that if two object references are equal, so are their hash codes.
There are cases that you need to provide a different semantic to the Equals() function. In these cases you must maintain the contract that if a.Equals(b) then a.GetHashCode() == b.GetHashCode().
Many hashing functions are in use, each with its own advantages and disadvantages. There is a useful explanation here. The actual function used is not something you should worry about; what matters most for keeping the average O(1) lookup time in the Hashtable is (ideally) ensuring that the GetHashCode() results of the objects you insert are as close to uniformly distributed as possible.
What is the use of GetHashCode()? Can I trace object identity using GetHashCode()? If so, could you provide an example?
Hash codes aren't about identity, they're about equality. In fact, you could say they're about non-equality:
If two objects have the same hash code, they may be equal
If two objects have different hash codes, they're not equal
Hash codes are not unique, nor do they guarantee equality (two objects may have the same hash but still be unequal).
As for their uses: they're almost always used to quickly select possibly equal objects to then test for actual equality, usually in a key/value map (e.g. Dictionary<TKey, TValue>) or a set (e.g. HashSet<T>).
No, a HashCode is not guaranteed to be unique. But you already have references to your objects, they are perfect for tracking identity, using object.ReferenceEquals().
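If identity is what you want to track, a sketch along these lines avoids hash codes as identifiers altogether: a HashSet<object> built with ReferenceEqualityComparer.Instance (available since .NET 5) compares by reference, even for types such as string that override Equals. The IdentityTracker name is hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical helper: remembers which object instances it has seen,
// by reference rather than by Equals.
public sealed class IdentityTracker
{
    private readonly HashSet<object> _seen = new HashSet<object>(ReferenceEqualityComparer.Instance);

    // Returns true the first time a given instance is seen.
    public bool MarkSeen(object instance) => _seen.Add(instance);

    public int Count => _seen.Count;
}
```

Two separately built strings with the same contents are Equals-equal but still count as two distinct instances here.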
The value itself is used in hashing algorithms, such as hashtables.
In its default implementation, GetHashCode does not guarantee the uniqueness of an object, so for .NET objects it should not be used as an identifier.
In your own classes, it is generally good practice to override GetHashCode so that unequal objects get values that are as distinct as practical (true uniqueness cannot be guaranteed, since an int has only 2^32 values).
It's used for algorithms\data structures that require hashing (such as a hash table). A hash code cannot on its own be used to track object identity since two objects with the same hash are not necessarily equal. However, two equal objects should have the same hash code (which is why C# emits a warning if you override one without overriding the other).
I've got multiple classes that, for certain reasons, do not follow the official Equals contract. In the overridden GetHashCode() these classes simply return 0 so they can be used in a Hashmap.
Some of these classes implement the same interface and there are Hashmaps using this interface as key. So I figured that every class should at least return a different (but still constant) value in GetHashCode().
The question is how to select this value. Should I simply let the first class return 1, the next class 2 and so on? Or should I try something like
class SomeClass : SomeInterface {
public override int GetHashCode() {
return "SomeClass".GetHashCode();
}
}
so the hash is distributed more evenly? (Do I have to cache the returned value myself or is Microsoft's compiler able to optimize this?)
Update: It is not possible to return an individual hashcode for each object, because Equals violates the contract. Specifically, I'm referring to this problem.
If it "violates the Equals contract", then I'm not sure you should be using it as a key.
If something is using that as a key, you really need to get the hashing right... It is very unclear what the Equals logic is, but two values that are considered equal must have the same hash code. It is not required that two values with the same hash code are equal.
Using a constant string won't really help much - you'll get the values split evenly over the types, but that is about it...
I'm curious what the reasoning would be for overriding GetHashCode() and returning a constant value. Why violate the idea of a hash rather than just violating the "contract" and not overriding the GetHashCode() function at all and leave the default implementation from Object?
Edit
If what you've done is that so you can have your objects match based on their contents rather than their reference then what you propose with having different classes simply use different constants can WORK, but is highly inefficient. What you want to do is come up with a hashing algorithm that can take the contents of your class and produce a value that balances speed with even distribution (that's hashing 101).
I guess I'm not sure what you're looking for...there isn't a "good" scheme for choosing constant numbers for this paradigm. One is not any better than the other. Try to improve your objects so that you're creating a real hash.
I ran into this exact problem when writing a vector class. I wanted to compare vectors for equality, but float operations give rounding errors, so I wanted approximate equality. Long story short, overriding equals is a bad idea unless your implementation is symmetric, reflexive, and transitive.
Other classes are going to assume equals has those properties, and so will classes using those classes, and so you can end up in weird cases. For example a list might enforce uniqueness, but end up with two elements which evaluate as equal to some element B.
A hash table is the perfect example of unpredictable behavior when you break equality. For example:
// Assume a == b, b == c, but a != c
var T = new Dictionary<YourType, int>();
T[a] = 0;
T[c] = 1;
return T[b]; // 0 or 1? Who knows!
Another example would be a Set:
// Assume a == b, b == c, but a != c
var T = new HashSet<YourType>();
T.Add(a);
T.Add(c);
if (T.Contains(b)) T.Remove(b);
// Surely T can't contain b anymore! I sure hope no one breaks the properties of equality!
if (T.Contains(b)) throw new Exception();
I suggest using another method, with a name like ApproxEquals. You might also consider overloading the == operator, because it isn't virtual and therefore won't be used accidentally by other classes the way Equals could be.
If you really can't use reference equality for the hash table, don't ruin the performance of cases where you can. Add an IApproxEquals interface, implement it in your class, and add an extension method GetApprox to Dictionary which enumerates the keys looking for an approximately equal one, and returns the associated value. You could also write a custom dictionary especially for 3-dimensional vectors, or whatever you need.
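A minimal sketch of that suggestion. IApproxEquals and GetApprox are the names proposed above; the Vector3D type, the tolerance parameter, and the exact signatures are assumptions for illustration. The lookup is a linear scan over the keys, so it is O(n), which is the price of an equality that cannot be made hash-compatible.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Approximate equality kept separate from Equals, so hash-based collections are unaffected.
public interface IApproxEquals<T>
{
    bool ApproxEquals(T other, double tolerance);
}

// Hypothetical 3D vector; Equals/GetHashCode are not overridden (reference-based).
public sealed class Vector3D : IApproxEquals<Vector3D>
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }
    public Vector3D(double x, double y, double z) { X = x; Y = y; Z = z; }

    public bool ApproxEquals(Vector3D other, double tolerance) =>
        Math.Abs(X - other.X) <= tolerance &&
        Math.Abs(Y - other.Y) <= tolerance &&
        Math.Abs(Z - other.Z) <= tolerance;
}

public static class DictionaryApproxExtensions
{
    // Linear scan: returns the value of the first key approximately equal to `key`.
    public static bool GetApprox<TKey, TValue>(
        this Dictionary<TKey, TValue> dict, TKey key, double tolerance, out TValue value)
        where TKey : notnull, IApproxEquals<TKey>
    {
        foreach (var pair in dict)
        {
            if (pair.Key.ApproxEquals(key, tolerance))
            {
                value = pair.Value;
                return true;
            }
        }
        value = default!;
        return false;
    }
}
```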
When hash collisions occur, the HashTable/Dictionary calls Equals to find the key you're looking for. Using a constant hash code removes the speed advantages of using a hash in the first place - it becomes a linear search.
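One rough way to see the linear search is to count Equals calls. The sketch below assumes a hypothetical Item class whose GetHashCode always returns 0, as in the question, and counts how often HashSet<T> has to fall back to Equals. The exact count is an implementation detail of HashSet<T>, but with every item in one bucket it grows roughly like n(n-1)/2 for n distinct items, instead of staying near one comparison per operation.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical class with a constant hash code, as described in the question.
public sealed class Item
{
    public static int EqualsCalls; // counts every Equals invocation

    public int Value { get; }
    public Item(int value) => Value = value;

    public override bool Equals(object? obj)
    {
        EqualsCalls++;
        return obj is Item other && Value == other.Value;
    }

    // Valid (equal items get equal hashes), but every item lands in the same bucket.
    public override int GetHashCode() => 0;
}

public static class ConstantHashDemo
{
    // Adds n distinct items and returns how many Equals calls that took.
    public static int EqualsCallsForAdding(int n)
    {
        Item.EqualsCalls = 0;
        var set = new HashSet<Item>();
        for (int i = 0; i < n; i++) set.Add(new Item(i));
        return Item.EqualsCalls;
    }
}
```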
You're saying the Equals method hasn't been implemented according to the contract. What exactly do you mean with this? Depending on the kind of violation, the HashTable or Dictionary will merely be slow (linear search) or not work at all.