Dictionary internal hashcode [duplicate] - c#

I understand that it is not advisable to use "mutable" objects (objects whose GetHashCode() method can return different results while they are being used as keys in a Dictionary).
Below is my understanding of how a dictionary, which is implemented as a hash table, works:
When I am adding new key, for example dict.Add(m1, "initially here was m1 object");, dict calculates the hashcode of m1 using the GetHashCode() method. Then it does some internal calculations and finally puts this object into some position of its internal array.
When I am using the key index to get the value, for example dict[m1], dict calculates the hashcode again. Then it does some internal calculations, and it gives me an object which is located at the calculated position inside of its internal array.
But I think there is an error which I can't find.
So let's assume that I have this code:
using System;
using System.Collections.Generic;
class MutableObject
{
    Int32 m_value;
    public MutableObject(Int32 value)
    {
        m_value = value;
    }
    public void Mutate(Int32 value)
    {
        m_value = value;
    }
    // The hash code follows the mutable field; Equals is not overridden.
    public override int GetHashCode()
    {
        return m_value;
    }
}
class Program
{
    static void Main(string[] args)
    {
        MutableObject m1 = new MutableObject(1);
        MutableObject m2 = new MutableObject(2);
        var dict = new Dictionary<MutableObject, String>();
        dict.Add(m1, "initially here was m1 object");
        dict.Add(m2, "initially here was m2 object");
        Console.WriteLine("Before mutation:");
        Console.WriteLine("dict[m1] = " + dict[m1]);
        Console.WriteLine("dict[m2] = " + dict[m2]);
        m1.Mutate(2);
        m2.Mutate(1);
        Console.WriteLine("After mutation:");
        Console.WriteLine("dict[m1] = " + dict[m1]);
        Console.WriteLine("dict[m2] = " + dict[m2]);
        Console.ReadKey(true);
    }
}
When I call Mutate methods, keys are swapped. So I thought it will give swapped results. But actually this line: Console.WriteLine("dict[m1] = " + dict[m1]); throws KeyNotFoundException, and I can't understand why. Obviously I am missing something here...

How .NET Dictionary implementation works with mutable objects
It doesn't. The documentation for Dictionary states:
As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value.
Since you're changing the object while it's in the Dictionary it will not work.
As for why, it's not too hard to see. We put in an object. Let's assume that the hash code is 1. We put the object in the 1 bucket of our hash table. Now the object is mutated from outside the Dictionary so that its value (and hash code) is 2. Now when someone gives that object to the dictionary's indexer, it gets the hash code, sees that it's 2, and looks in the 2 bucket. That bucket is empty, so it says, "sorry, no element".
Now let's assume that a new object is created with a value and hash of 1. It's passed to the Dictionary, who sees that the hash is 1. It looks in the 1 bucket and finds that there is indeed an item at that index. It now uses Equals to determine if the objects are in fact equal (or if this is just a hash collision).
Now, in your case, it will fail here because you don't override Equals. You're using the default implementation, which compares references, and since this is a different object it won't have the same reference. However, even if you changed it to compare the values, the first object was mutated to have a value of 2, not 1, so it won't match anyway. Others have suggested fixing this Equals method, and you really should do that, but it still won't fix your problem.
Once the object is mutated the only way of finding it is if it just so happens that the mutated value is a hash collision (which is possible, but unlikely). If it's not, then anything that is equal according to Equals will never know to check the right bucket, and anything that checks the right bucket won't be equal according to Equals.
The quote I mentioned at the start isn't just a best practice. It's not just unexpected or weird or unperformant to mutate items in a dictionary. It just doesn't work.
Now, if the object is mutable but isn't mutated while it's in the dictionary then that's fine. It may be a bit odd, and that's a case people may say is bad practice, even if it works.
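The failure described above can be reproduced in a few lines. This is a minimal sketch rather than a view into the Dictionary's internals; MutableKey and MutationDemo are hypothetical names introduced for illustration, and Equals is left at the default reference comparison, as in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code follows a mutable field.
class MutableKey
{
    public int Value;
    public MutableKey(int value) { Value = value; }
    public override int GetHashCode() { return Value; }
    // Equals is deliberately not overridden: reference equality applies.
}

static class MutationDemo
{
    // Reports whether the key is found before mutation, after mutation,
    // and after the original value has been restored.
    public static bool[] Run()
    {
        var key = new MutableKey(1);
        var dict = new Dictionary<MutableKey, string>();
        dict.Add(key, "stored under hash 1");

        bool before = dict.ContainsKey(key);        // hash 1 matches the stored entry

        key.Value = 2;                              // the hash code is now 2
        bool afterMutation = dict.ContainsKey(key); // lookup uses hash 2

        key.Value = 1;                              // the hash code is 1 again
        bool afterRestore = dict.ContainsKey(key);

        return new[] { before, afterMutation, afterRestore };
    }
}
```

Run() should yield true, false, true: the entry is still physically in the dictionary, but it can only be reached again once the key reports the hash code it was stored under.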

For a dictionary lookup, it's not enough to have the same hash code. Since hash collisions are possible, the key must also be equal to the key being looked up.

Your MutableObject class doesn't override Equals(object). Hence reference equality is used (inherited from base class System.Object).
The Dictionary<,> first (quickly) finds any keys with the correct hash code. It then examines each of those candidate keys to check if one of them Equals the key it is searching for.
Therefore Equals(object) and GetHashCode() should be overridden together. You would get a warning from the compiler if you overrode only one of them.
As soon as the hash code of a key mutates while the key is in the Dictionary<,>, that key will (probably) be misplaced inside the Dictionary<,>, be in the wrong "bucket", and hence be lost. It will not be found, because search for it will always take place in a bucket where it isn't located.
In this example, the key gets lost, and therefore can be added again:
var dict = new Dictionary<MutableObject, string>();
var m = new MutableObject(1);
dict.Add(m, "Hello");
m.Mutate(2);
dict.Add(m, "world");
foreach (var p in dict)
Console.WriteLine(p);
var otherDict = new Dictionary<MutableObject, string>(dict); // throws
I have actually seen an exception like that, during initializing of one Dictionary<,> with the items from an existing Dictionary<,> (both using the default EqualityComparer<> for the key type).
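One way to rule out the lost-key situation is to make the key immutable and to override Equals and GetHashCode together. The following is a sketch under that assumption; ImmutableKey and ImmutableKeyDemo are hypothetical names, not types from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable key: the field feeding the hash code is readonly.
sealed class ImmutableKey : IEquatable<ImmutableKey>
{
    private readonly int m_value;
    public ImmutableKey(int value) { m_value = value; }

    public bool Equals(ImmutableKey other)
    {
        return other != null && other.m_value == m_value;
    }

    public override bool Equals(object obj) { return Equals(obj as ImmutableKey); }

    // Equal keys return the same hash code, and it can never change.
    public override int GetHashCode() { return m_value; }
}

static class ImmutableKeyDemo
{
    public static string Lookup()
    {
        var dict = new Dictionary<ImmutableKey, string>();
        dict.Add(new ImmutableKey(1), "Hello");
        // A different instance with the same value finds the entry.
        return dict[new ImmutableKey(1)];
    }
}
```

Because value equality is now in place, a second instance with the same value is treated as the same key, so adding it again with Add would throw instead of silently creating a duplicate.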

Related

TryGetValue Evaluating to False Even When Key is Present

This question is basically the same as this one, although the answer to that person's problem turned out to be a simple trailing space.
My issue is that I'm retrieving data from a web API as a dictionary and then trying to get the values out of it. I'm using TryGetValue because not every item in the dictionary will necessarily contain every key. For some reason, whilst I can get the value of one key with no problems at all when it's present, for another key TryGetValue always evaluates to false and therefore doesn't return the value, even though I can see in debug that the key is present.
So, this block always retrieves the value of the "System.Description" key if it's present:
string descriptionValue = "";
if (workItem.Fields.TryGetValue("System.Description", out descriptionValue))
{
feature.Description = descriptionValue;
}
However, this almost identical block NEVER retrieves the value of the "CustomScrum.RoadmapGroup" key:
int RoadmapGroupValue = 0;
if (workItem.Fields.TryGetValue("CustomScrum.RoadmapGroup", out RoadmapGroupValue))
{
feature.RoadmapGroup = RoadmapGroupValue;
}
As you can see in this screenshot, the dictionary DOES contain a key with a name exactly matching my TryGetValue statement:
If I put a breakpoint on the code which should be run if the TryGetValue statement evaluates to true (feature.Description = descriptionValue;) it never gets hit.
The feature.RoadmapGroup variable gets set to 0 for every item in the dictionary.
I've been staring at this for the last two hours at least and I can't see what I'm doing wrong.
Here's a scenario where your cast goes wrong.
private void foo()
{
Dictionary<string, object> dict = new Dictionary<string, object>();
object obj = new object();
obj = "1";
dict.Add("CustomScrum.RoadmapGroup", obj);
object val;
var result = dict.TryGetValue("CustomScrum.RoadmapGroup", out val);
int value = (int)val;
}
TryGetValue() returns true, but the last line (the cast), throws System.InvalidCastException: 'Specified cast is not valid.', although if you use a breakpoint to see the dictionary content it looks like you have something that can be converted to an int. See below:
So I believe that when you add the value to the dictionary, you're not really adding an int but something that looks like an int.
EDIT
I just replaced int value = (int)val; with int value = Convert.ToInt32(val); which converts the value just fine. So you might want to try to use that and see if that works as well.
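The Convert.ToInt32 suggestion can be wrapped so that a missing key and an unconvertible value are both reported as a plain false. This is a sketch that assumes the fields are exposed as an IDictionary<string, object>, which the question does not confirm; FieldReader and TryGetInt are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class FieldReader
{
    // Tries to read an integer from a dictionary of boxed values.
    public static bool TryGetInt(IDictionary<string, object> fields, string key, out int result)
    {
        result = 0;
        object raw;
        if (!fields.TryGetValue(key, out raw) || raw == null)
        {
            return false; // key missing or value is null
        }
        try
        {
            // Handles boxed int, long, double, and numeric strings such as "1".
            result = Convert.ToInt32(raw, CultureInfo.InvariantCulture);
            return true;
        }
        catch (Exception ex) when (ex is FormatException || ex is InvalidCastException || ex is OverflowException)
        {
            return false; // value is present but is not a usable integer
        }
    }
}
```

A call would look like FieldReader.TryGetInt(workItem.Fields, "CustomScrum.RoadmapGroup", out roadmapGroup), again assuming the dictionary type matches.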
Are you sure that this "CustomScrum.RoadmapGroup" key is a string? If yes, then make sure that it doesn't contain any special unreadable character. You can just copy this value while debugging, put it in Watch window and check length/bytes representation, then do the same for hand-written string with the same content.

C# Hash Function for Dictionary Lookup [duplicate]

Given the following class
public class Foo
{
public int FooId { get; set; }
public string FooName { get; set; }
public override bool Equals(object obj)
{
Foo fooItem = obj as Foo;
if (fooItem == null)
{
return false;
}
return fooItem.FooId == this.FooId;
}
public override int GetHashCode()
{
// Which is preferred?
return base.GetHashCode();
//return this.FooId.GetHashCode();
}
}
I have overridden the Equals method because Foo represent a row for the Foos table. Which is the preferred method for overriding the GetHashCode?
Why is it important to override GetHashCode?
Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>, etc - since this is used (in the absence of a custom IEqualityComparer<T>) to group items into buckets. If the hash-code for two items does not match, they may never be considered equal (Equals will simply never be called).
The GetHashCode() method should reflect the Equals logic; the rules are:
if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
if the GetHashCode() is equal, it is not necessary for them to be the same; this is a collision, and Equals will be called to see if it is a real equality or not.
In this case, it looks like "return FooId;" is a suitable GetHashCode() implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5) has a different hash-code to new Foo(5,3)):
In modern frameworks, the HashCode type has methods to help you create a hashcode from multiple values; on older frameworks, you'd need to go without, so something like:
unchecked // only needed if you're compiling with arithmetic checks enabled
{ // (the default compiler behaviour is *disabled*, so most folks won't need this)
int hash = 13;
hash = (hash * 7) + field1.GetHashCode();
hash = (hash * 7) + field2.GetHashCode();
...
return hash;
}
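For the modern-framework route, System.HashCode.Combine does the mixing for you. A minimal sketch follows; Point2 is a hypothetical two-field type standing in for "multiple properties".

```csharp
using System;

// Hypothetical immutable type with two fields taking part in equality.
public sealed class Point2
{
    public int X { get; }
    public int Y { get; }
    public Point2(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        var other = obj as Point2;
        return other != null && other.X == X && other.Y == Y;
    }

    // System.HashCode is available in .NET Core 2.1+ and .NET Standard 2.1.
    // It combines exactly the fields that Equals compares.
    public override int GetHashCode() { return HashCode.Combine(X, Y); }
}
```

HashCode values are only stable within a single process run, so they should not be persisted or compared across processes.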
Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GetHashCode.
A demonstration of what happens when you get this wrong is here.
It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.
I finally found a solution to this problem when I was working with NHibernate.
My approach is to calculate the hash code from the ID of the object. The ID can only be set through the constructor, so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.
By overriding Equals you're basically stating that you know better how to compare two instances of a given type.
Below you can see an example of how ReSharper writes a GetHashCode() function for you. Note that this snippet is meant to be tweaked by the programmer:
public override int GetHashCode()
{
unchecked
{
var result = 0;
result = (result * 397) ^ m_someVar1;
result = (result * 397) ^ m_someVar2;
result = (result * 397) ^ m_someVar3;
result = (result * 397) ^ m_someVar4;
return result;
}
}
As you can see it just tries to guess a good hash code based on all the fields in the class, but if you know your object's domain or value ranges you could still provide a better one.
Please don't forget to check the obj parameter against null when overriding Equals().
And also compare the type.
public override bool Equals(object obj)
{
Foo fooItem = obj as Foo;
if (fooItem == null)
{
return false;
}
return fooItem.FooId == this.FooId;
}
The reason for this is: Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx
How about:
public override int GetHashCode()
{
return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}
Assuming performance is not an issue :)
As of .NET 4.7 the preferred method of overriding GetHashCode() is shown below. If targeting older .NET versions, include the System.ValueTuple nuget package.
// C# 7.0+
public override int GetHashCode() => (FooId, FooName).GetHashCode();
In terms of performance, this method will outperform most composite hash code implementations. The ValueTuple is a struct so there won't be any garbage, and the underlying algorithm is as fast as it gets.
Just to add on above answers:
If you don't override Equals then the default behavior is that references of the objects are compared. The same applies to the hash code: the default implementation is typically based on a memory address of the reference.
Because you did override Equals it means the correct behavior is to compare whatever you implemented on Equals and not the references, so you should do the same for the hashcode.
Clients of your class will expect the hash code to follow the same logic as the Equals method. For example, LINQ methods which use an IEqualityComparer first compare the hash codes, and only if they are equal do they call Equals(), which might be more expensive to run. If we didn't implement GetHashCode, equal objects would probably have different hash codes (because they have different memory addresses) and would wrongly be determined as not equal (Equals() won't even be hit).
In addition to the problem that you might not be able to find your object if you use it in a dictionary (because it was inserted under one hash code, and when you look for it the default hash code will probably be different, so again Equals() won't even be called, as Marc Gravell explains in his answer), you also violate the dictionary or hash set concept, which should not allow identical keys. You already declared that those objects are essentially the same when you overrode Equals, so you don't want both of them as different keys in a data structure which is supposed to have unique keys. But because they have different hash codes, the "same" key will be inserted as a different one.
It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).
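The uniqueness point can be checked with a HashSet. The sketch below uses a hypothetical Row type whose Equals and GetHashCode both derive from Id, so the set recognises the second instance as a duplicate.

```csharp
using System.Collections.Generic;

// Hypothetical row type: equality and hash code both derive from Id.
class Row
{
    public int Id { get; }
    public Row(int id) { Id = id; }

    public override bool Equals(object obj)
    {
        var other = obj as Row;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id; }
}

static class RowDemo
{
    public static int CountDistinct()
    {
        var set = new HashSet<Row>();
        set.Add(new Row(7));
        set.Add(new Row(7)); // equal to the first, so Add returns false
        return set.Count;
    }
}
```

With the GetHashCode override removed, the two instances would most likely land in different buckets and the set would end up holding both, which is exactly the duplicate-key situation described above.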
We have two problems to cope with.
You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also, often an object will NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.
If someone puts your object in a collection that calls GetHashCode() and you have overridden Equals() without also making GetHashCode() behave in a correct way, that person may spend days tracking down the problem.
Therefore, by default I do the following:
public class Foo
{
public int FooId { get; set; }
public string FooName { get; set; }
public override bool Equals(object obj)
{
Foo fooItem = obj as Foo;
if (fooItem == null)
{
return false;
}
return fooItem.FooId == this.FooId;
}
public override int GetHashCode()
{
// Some comment to explain if there is a real problem with providing GetHashCode()
// or if I just don't see a need for it for the given class
throw new Exception("Sorry I don't know what GetHashCode should do for this class");
}
}
Hash code is used for hash-based collections like Dictionary, Hashtable, HashSet etc. The purpose of this code is to very quickly pre-sort specific object by putting it into specific group (bucket). This pre-sorting helps tremendously in finding this object when you need to retrieve it back from hash-collection because code has to search for your object in just one bucket instead of in all objects it contains. The better distribution of hash codes (better uniqueness) the faster retrieval. In ideal situation where each object has a unique hash code, finding it is an O(1) operation. In most cases it approaches O(1).
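The pre-sorting idea can be illustrated with a toy bucket calculation. This is a simplified sketch and not the actual implementation of Dictionary or HashSet; BucketSketch and its methods are hypothetical, and int keys are used because an int's hash code is the value itself.

```csharp
using System.Collections.Generic;

static class BucketSketch
{
    // Maps a hash code to a bucket index; masking the sign bit keeps the result non-negative.
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        return (hashCode & int.MaxValue) % bucketCount;
    }

    // Groups keys by bucket so that a lookup only has to scan one group.
    public static Dictionary<int, List<int>> Distribute(IEnumerable<int> keys, int bucketCount)
    {
        var buckets = new Dictionary<int, List<int>>();
        foreach (int key in keys)
        {
            int index = BucketIndex(key.GetHashCode(), bucketCount);
            List<int> bucket;
            if (!buckets.TryGetValue(index, out bucket))
            {
                bucket = new List<int>();
                buckets[index] = bucket;
            }
            bucket.Add(key);
        }
        return buckets;
    }
}
```

With 8 buckets, the keys 1 and 9 share bucket 1 while 2 goes to bucket 2; the more evenly the hash codes spread, the shorter each group that has to be scanned.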
It's not necessarily important; it depends on the size of your collections and your performance requirements and whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large and my time is more valuable than a few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying warning by the compiler) I simply use:
public override int GetHashCode()
{
return base.GetHashCode();
}
(Of course I could use a #pragma to turn off the warning as well but I prefer this way.)
When you are in a position where you do need the performance, then all of the issues mentioned by others here apply, of course. Most important, because otherwise you will get wrong results when retrieving items from a hash set or dictionary: the hash code must not vary during the lifetime of an object (more accurately, during the time the hash code is needed, such as while the object is a key in a dictionary). For example, the following is wrong because Value is public and so can be changed from outside the class during the lifetime of the instance, so you must not use it as the basis for the hash code:
class A
{
public int Value;
public override int GetHashCode()
{
return Value.GetHashCode(); //WRONG! Value is not constant during the instance's life time
}
}
On the other hand, if Value can't be changed it's ok to use:
class A
{
public readonly int Value;
public override int GetHashCode()
{
return Value.GetHashCode(); //OK Value is read-only and can't be changed during the instance's life time
}
}
You should always guarantee that if two objects are equal, as defined by Equals(), they return the same hash code. As some of the other comments state, in theory this is not mandatory if the object will never be used in a hash-based container like HashSet or Dictionary. I would advise you to always follow this rule, though. The reason is simply that it is way too easy for someone to change a collection from one type to another with the good intention of actually improving the performance or just conveying the code semantics in a better way.
For example, suppose we keep some objects in a List. Sometime later someone actually realizes that a HashSet is a much better alternative because of the better search characteristics for example. This is when we can get into trouble. List would internally use the default equality comparer for the type which means Equals in your case while HashSet makes use of GetHashCode(). If the two behave differently, so will your program. And bear in mind that such issues are not the easiest to troubleshoot.
I've summarized this behavior with some other GetHashCode() pitfalls in a blog post where you can find further examples and explanations.
As of C# 9 (which ships with .NET 5), you may want to use records, as they provide value-based equality by default.
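A short sketch of the record approach, assuming a C# 9 or later compiler; FooRecord and RecordDemo are hypothetical names mirroring the Foo class from the question.

```csharp
// Requires C# 9 or later. The compiler generates Equals, GetHashCode, == and !=.
public record FooRecord(int FooId, string FooName);

public static class RecordDemo
{
    public static bool SameValueSameHash()
    {
        var a = new FooRecord(1, "first");
        var b = new FooRecord(1, "first");
        // Records compare all declared members, so a and b are equal
        // and must therefore produce the same hash code.
        return a.Equals(b) && a == b && a.GetHashCode() == b.GetHashCode();
    }
}
```

Note that the generated equality covers every member (FooId and FooName), whereas the Equals in the question compares FooId only, so a record is not a drop-in replacement if identity should be based on the ID alone.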
It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects.
EDITED:
That was incorrect, the original GetHashCode() method cannot assure the equality of 2 values. Though objects that are equal return the same hash code.
Using reflection, as below, seems to me a better option for public properties, since you don't have to worry about properties being added or removed (although that is not such a common scenario). I also found this to perform better (I compared timings using the Diagnostics Stopwatch).
// requires: using System.Reflection;
public override int GetHashCode()
{
    PropertyInfo[] theProperties = this.GetType().GetProperties();
    int hash = 31;
    foreach (PropertyInfo info in theProperties)
    {
        if (info != null)
        {
            // Read the current value of each public property.
            var value = info.GetValue(this, null);
            if (value != null)
            {
                unchecked
                {
                    hash = 29 * hash ^ value.GetHashCode();
                }
            }
        }
    }
    return hash;
}

How is the C# Dictionary used this way in C# in depth 2nd Edition?

I have come across the following code in C# in Depth 2nd Edition by Jon Skeet and I don't understand how it works.
Dictionary<string,int> frequencies;
frequencies = new Dictionary<string,int>();
string[] words = Regex.Split(text, @"\W+");
foreach (string word in words)
{
if (frequencies.ContainsKey(word))
{
frequencies[word]++;
}
else
{
frequencies[word] = 1;
}
}
Specifically how does the "word" key get added to the dictionary? As I see it, a new dictionary is created called frequencies, it is empty. There is then a method to split a string called text into an array of string using Regex.Split. So far all good. Next there is a foreach loop which loops through the array, but the next part trips me up, it is checking if frequencies contains the particular word, if it does then increase the value of it by 1 or if it doesn't yet have a value set it to 1. But how does the dictionary get populated with the "word" key in the first place to allow it to be checked?
It looks to happen in this line
frequencies[word] = 1;
But I can't find a reference anywhere that says specifying a dictionary object followed by square brackets and an assignment to a value also populates the key. I thought you needed to use the add method of the dictionary instance or do so when initializing the dictionary.
If I am correct what is the name of this action?
frequencies[word] = 1;
is the same as calling
frequencies.Add(word, 1);
if the key word does not already exist. Otherwise you overwrite the value.
When you call [something] on a dictionary you get a value by key something. The same goes for setting. When setting a value you can call dictionary[key] = value.
What is being used here is the dictionary's indexer (the [] operator).
I dove into the Object Browser and found this about the [] operator of the generic dictionary:
public TValue this[TKey key] { get; set; }
Member of System.Collections.Generic.Dictionary<TKey, TValue>
Summary: Gets or sets the value associated with the specified key.
Parameters: key: The key of the value to get or set.
Return Values: The value associated with the specified key. If the
specified key is not found, a get operation throws a
System.Collections.Generic.KeyNotFoundException, and a set operation
creates a new element with the specified key.
Exceptions: System.ArgumentNullException: key is null.
System.Collections.Generic.KeyNotFoundException: The property is
retrieved and key does not exist in the collection.
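The difference between the indexer and Add described in that documentation can be seen side by side. This is a small sketch; IndexerDemo and its methods are hypothetical names, and the word counting uses TryGetValue as an alternative to the ContainsKey check from the book's example.

```csharp
using System;
using System.Collections.Generic;

static class IndexerDemo
{
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var frequencies = new Dictionary<string, int>();
        foreach (string word in words)
        {
            int count;
            // count is 0 when the key is missing, because TryGetValue
            // assigns default(int) to the out parameter in that case.
            frequencies.TryGetValue(word, out count);
            frequencies[word] = count + 1; // indexer set: adds or overwrites
        }
        return frequencies;
    }

    // Returns true if Add rejects a key that is already present.
    public static bool AddThrowsOnDuplicate()
    {
        var d = new Dictionary<string, int>();
        d["a"] = 1;        // the key is absent, so the indexer adds it
        d["a"] = 2;        // the key is present, so the indexer overwrites it
        try
        {
            d.Add("a", 3); // Add throws ArgumentException for an existing key
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```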

Implementation of Dictionary where equivalent contents are equal and return the same hash code regardless of order of insertion

I need to use Dictionary<long, string> collections that given two instances d1 and d2 where they each have the same KeyValuePair<long, string> contents, which could be inserted in any order:
(d1 == d2) evaluates to true
d1.GetHashCode() == d2.GetHashCode()
The first requirement was achieved most easily by using a SortedDictionary instead of a regular Dictionary.
The second requirement is necessary because I have one point where I need to store a Dictionary<Dictionary<long, string>, List<string>>: the main Dictionary type is used as the key for another Dictionary, and if the hash codes aren't evaluated based on identical contents, then using ContainsKey() will not work the way that I want (i.e. if there is already an item inserted into the dictionary with d1 as its key, then dictionary.ContainsKey(d2) should evaluate to true).
To achieve this, I have created a new object class ComparableDictionary : SortedDictionary<long, string>, and have included the following:
public override int GetHashCode() {
StringBuilder str = new StringBuilder();
foreach (var item in this) {
str.Append(item.Key);
str.Append("_");
str.Append(item.Value);
str.Append("%%");
}
return str.ToString().GetHashCode();
}
In my unit testing, this meets the criteria for both equality and hashcodes. However, in reading Guidelines and Rules for GetHashCode, I came across the following:
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.
If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.
Remember, objects can be put into hash tables in ways that you didn't expect. A lot of the LINQ sequence operators use hash tables internally. Don't go dangerously mutating objects while enumerating a LINQ query that returns them!
Now, the Dictionary<ComparableDictionary, List<String>> is used only once in code, in a place where the contents of all ComparableDictionary collections should be set. Thus, according to these guidelines, I think that it would be acceptable to override GetHashCode as I have done (basing it completely on the contents of the dictionary).
After that introduction my questions are:
I know that the performance of SortedDictionary is very poor compared to Dictionary (and I can have hundreds of object instantiations). The only reason for using SortedDictionary is so that I can have the equality comparison work based on the contents of the dictionary, regardless of order of insertion. Is there a better way to achieve this equality requirement without having to use a SortedDictionary?
Is my implementation of GetHashCode acceptable based on the requirements? Even though it is based on mutable contents, I don't think that that should pose any risk, since the only place where it is using (I think) is after the contents have been set.
Note: while I have been setting these up using Dictionary or SortedDictionary, I am not wedded to these collection types. The main need is a collection that can store pairs of values, and meet the equality and hashing requirements defined out above.
Your GetHashCode implementation looks acceptable to me, but it's not how I'd do it.
This is what I'd do:
Use composition rather than inheritance. Aside from anything else, inheritance gets odd in terms of equality
Use a Dictionary<TKey, TValue> variable inside the dictionary
Implement GetHashCode by taking an XOR of the individual key/value pair hash codes
Implement equality by checking whether the sizes are the same, then checking every key in "this" to see if its value is the same in the other dictionary.
So something like this:
public sealed class EquatableDictionary<TKey, TValue>
    : IDictionary<TKey, TValue>, IEquatable<EquatableDictionary<TKey, TValue>>
{
    private readonly Dictionary<TKey, TValue> dictionary;
    public override bool Equals(object other)
    {
        return Equals(other as EquatableDictionary<TKey, TValue>);
    }
    public bool Equals(EquatableDictionary<TKey, TValue> other)
    {
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        if (Count != other.Count)
        {
            return false;
        }
        foreach (var pair in this)
        {
            TValue otherValue;
            if (!other.TryGetValue(pair.Key, out otherValue))
            {
                return false;
            }
            if (!EqualityComparer<TValue>.Default.Equals(pair.Value,
                                                         otherValue))
            {
                return false;
            }
        }
        return true;
    }
    public override int GetHashCode()
    {
        int hash = 0;
        foreach (var pair in this)
        {
            int miniHash = 17;
            miniHash = miniHash * 31 +
                       EqualityComparer<TKey>.Default.GetHashCode(pair.Key);
            miniHash = miniHash * 31 +
                       EqualityComparer<TValue>.Default.GetHashCode(pair.Value);
            // XOR is order-independent, so insertion order does not affect the result
            hash ^= miniHash;
        }
        return hash;
    }
    // Implementation of IDictionary<,> which just delegates to the dictionary
}
Also note that I can't remember whether EqualityComparer<T>.Default.GetHashCode copes with null values - I have a suspicion that it does, returning 0 for null. Worth checking though :)
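If wrapping the dictionary in a new type is more than is needed, the same XOR idea can be packaged as an IEqualityComparer and handed to the outer dictionary. This is a sketch of an alternative to the class above, not part of the original answer; DictionaryContentComparer and ComparerDemo are hypothetical names, and the same rule applies that key dictionaries must not be modified while they are in use as keys.

```csharp
using System.Collections.Generic;

// Hypothetical comparer: two dictionaries are equal when they hold the same pairs.
sealed class DictionaryContentComparer : IEqualityComparer<Dictionary<long, string>>
{
    public bool Equals(Dictionary<long, string> x, Dictionary<long, string> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Count != y.Count) return false;
        foreach (var pair in x)
        {
            string other;
            if (!y.TryGetValue(pair.Key, out other) || !string.Equals(pair.Value, other))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(Dictionary<long, string> d)
    {
        if (d == null) return 0;
        int hash = 0;
        unchecked
        {
            foreach (var pair in d)
            {
                int pairHash = 17;
                pairHash = pairHash * 31 + pair.Key.GetHashCode();
                pairHash = pairHash * 31 + (pair.Value == null ? 0 : pair.Value.GetHashCode());
                hash ^= pairHash; // XOR is commutative, so enumeration order does not matter
            }
        }
        return hash;
    }
}

static class ComparerDemo
{
    public static bool FindsEquivalentKey()
    {
        // Same contents, inserted in a different order.
        var d1 = new Dictionary<long, string> { { 1, "a" }, { 2, "b" } };
        var d2 = new Dictionary<long, string> { { 2, "b" }, { 1, "a" } };
        var outer = new Dictionary<Dictionary<long, string>, List<string>>(
            new DictionaryContentComparer());
        outer.Add(d1, new List<string>());
        return outer.ContainsKey(d2);
    }
}
```

This keeps plain Dictionary<long, string> instances as keys, so no SortedDictionary is needed for the equality requirement.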

Is a Dictionary's order the same if it has exactly the same content?

I know that the order of a dictionary is undefined, MSDN says so:
For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair structure representing a value and its key. The order in which the items are returned is undefined.
That's fine, but if I have two instances of a dictionary, each with the same content, will the order be the same?
I'm guessing so because as I understand, the order is determined by the hash of the keys, and if the two dictionaries have the same keys, they have the same hashes, and therefore the same order...
... Right?
Thanks!
Andy.
No, it is not guaranteed to be the same order. Imagine the scenario where you had several items in the Dictionary<TKey, TValue> with the same hash code. If they are added to the two dictionaries in different orders, it will result in different orders in enumeration.
Consider for example the following (equality conforming) code
class Example
{
public char Value;
public override int GetHashCode()
{
return 1;
}
public override bool Equals(object obj)
{
return obj is Example && ((Example)obj).Value == Value;
}
public override string ToString()
{
return Value.ToString();
}
}
class Program
{
static void Main(string[] args)
{
var e1 = new Example() { Value = 'a' };
var e2 = new Example() { Value = 'b' };
var map1 = new Dictionary<Example, string>();
map1.Add(e1, "1");
map1.Add(e2, "2");
var map2 = new Dictionary<Example, string>();
map2.Add(e2, "2");
map2.Add(e1, "1");
Console.WriteLine(map1.Values.Aggregate((x, y) => x + y));
Console.WriteLine(map2.Values.Aggregate((x, y) => x + y));
}
}
The output of running this program is
12
21
Short version: No.
Long version:
[TestMethod]
public void TestDictionary()
{
Dictionary<String, Int32> d1 = new Dictionary<string, int>();
Dictionary<String, Int32> d2 = new Dictionary<string, int>();
d1.Add("555", 1);
d1.Add("abc2", 2);
d1.Add("abc3", 3);
d1.Remove("abc2");
d1.Add("abc2", 2);
d1.Add("556", 1);
d2.Add("555", 1);
d2.Add("556", 1);
d2.Add("abc2", 2);
d2.Add("abc3", 3);
foreach (var i in d1)
{
Console.WriteLine(i);
}
Console.WriteLine();
foreach (var i in d2)
{
Console.WriteLine(i);
}
}
Output:
[555, 1]
[abc2, 2]
[abc3, 3]
[556, 1]
[555, 1]
[556, 1]
[abc2, 2]
[abc3, 3]
If MSDN says it's undefined, you have to rely on that. The thing with undefined is that it means the implementation of the dictionary is allowed to store items in whatever order it wants. This means that a programmer should never make any assumptions about the order. Personally, without looking, I would probably assume that the order of the elements in the dictionary depends on the order they went in, but I could be wrong. Whatever the answer is, though, if you want some behaviour whereby the order is the same for both, then you are doing it wrong.
"if the two dictionaries have the same
keys, they have the same hashes, and
therefore the same order..."
I do not think this is the case. Even if it might be true, I would not rely on this. If it's true it is an implementation detail, that might change, or be different on different implementations of the CLR or BCL (Mono comes to mind).
The Microsoft Dictionary implementation is a little complex, but from looking at the code for 5 minutes, I am willing to guess that the sequence of enumeration will be based on how the dictionary got to its current state, including the number of resizes and insertion order.
If the spec says the order is "undefined", you can't depend on the order without explicitly ordering it. The underlying implementation may be changed at any time with a new release or service pack, just for starters. Your dictionary may be upcast from any number of concrete implementations as well.
An underlying implementation may be sensitive to the order of operations applied. Adding keys 'a', 'b' and 'c', in that order, may result in a different data structure than adding the same set of keys in a different order (say, 'b', 'c' and 'a'). Deletions may likewise affect the data structure.
If a straight binary tree, for instance, is used as the data structure behind a dictionary and the keys are added in order, the net result is a highly unbalanced tree that is essentially a linked list. The tree will be more balanced if nodes are inserted in random order.
And some data structures morph as operations are performed. If, for instance, a dictionary is implemented with the underlying data structure being a red/black tree, tree nodes will be split/rotated in order to keep the tree balanced as inserts and deletes occur. So the actual data structure then is highly dependent on the order of operations, even if the final contents are the same.
I don't know the specifics of Microsoft's implementation, but in general your assumption holds only if there are no two items in the dictionary that hash to the same value or if those entries that do collide are added in the same order.
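If a defined order is needed, the usual way out is to impose one explicitly instead of relying on enumeration order. The following is a minimal sketch using LINQ; OrderDemo and its methods are hypothetical names, and the sample keys are taken from the test above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderDemo
{
    // Returns the keys in a defined order by sorting them explicitly.
    public static List<string> KeysInOrder(Dictionary<string, int> d)
    {
        return d.Keys.OrderBy(k => k, StringComparer.Ordinal).ToList();
    }

    public static bool SameOrderAfterSorting()
    {
        // Same contents, inserted in different orders.
        var d1 = new Dictionary<string, int> { { "555", 1 }, { "abc2", 2 }, { "556", 1 } };
        var d2 = new Dictionary<string, int> { { "556", 1 }, { "555", 1 }, { "abc2", 2 } };
        return KeysInOrder(d1).SequenceEqual(KeysInOrder(d2));
    }
}
```

A SortedDictionary<TKey, TValue> is another option when the sorted order is needed on every enumeration, at the cost of slower inserts and lookups.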
