Critique this C# Hashmap Implementation?

I wrote a hashmap in C# as a self study exercise. I wanted to implement chaining as a collision handling technique. At first I thought I'd simply use GetHashCode as my hashing algorithm, but I quickly found that using the numbers returned by GetHashCode would not always be viable (the size of the int causes an out-of-memory error if you want to index an array by that number, and the numbers can be negative :(). So, I came up with a kludgey method of narrowing the numbers (see MyGetHashCode).
Does anyone have any pointers/tips/criticism for this implementation (of the hash function and in general)? Thanks in advance!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace HashMap
{
    class Program
    {
        public class MyKVP<T, K>
        {
            public T Key { get; set; }
            public K Value { get; set; }

            public MyKVP(T key, K value)
            {
                Key = key;
                Value = value;
            }
        }

        public class MyHashMap<T, K> : IEnumerable<MyKVP<T, K>>
            where T : IComparable
        {
            private const int map_size = 5000;
            private List<MyKVP<T, K>>[] storage;

            public MyHashMap()
            {
                storage = new List<MyKVP<T, K>>[map_size];
            }

            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }

            public IEnumerator<MyKVP<T, K>> GetEnumerator()
            {
                foreach (List<MyKVP<T, K>> kvpList in storage)
                {
                    if (kvpList != null)
                    {
                        foreach (MyKVP<T, K> kvp in kvpList)
                        {
                            yield return kvp;
                        }
                    }
                }
            }

            private int MyGetHashCode(T key)
            {
                int i = key.GetHashCode();
                if (i < 0) i = i * -1;
                return i / 10000;
            }

            public void Add(T key, K data)
            {
                int value = MyGetHashCode(key);
                SizeIfNeeded(value);
                //is this spot in the hashmap null?
                if (storage[value] == null)
                {
                    //create a new chain
                    storage[value] = new List<MyKVP<T, K>>();
                    storage[value].Add(new MyKVP<T, K>(key, data));
                }
                else
                {
                    //is this spot taken?
                    MyKVP<T, K> myKvp = Find(value, key);
                    if (myKvp != null) //key exists, throw
                    {
                        throw new Exception("This key exists. no soup for you.");
                    }
                    //if we didn't throw, then add us
                    storage[value].Add(new MyKVP<T, K>(key, data));
                }
            }

            private MyKVP<T, K> Find(int value, T key)
            {
                foreach (MyKVP<T, K> kvp in storage[value])
                {
                    if (kvp.Key.CompareTo(key) == 0)
                    {
                        return kvp;
                    }
                }
                return null;
            }

            private void SizeIfNeeded(int value)
            {
                if (value >= storage.Length)
                {
                    List<MyKVP<T, K>>[] temp = storage;
                    storage = new List<MyKVP<T, K>>[value + 1];
                    Array.Copy(temp, storage, temp.Length);
                }
            }

            public K this[T key]
            {
                get
                {
                    int value = MyGetHashCode(key);
                    if (value > storage.Length) { throw new IndexOutOfRangeException("Key does not exist."); }
                    MyKVP<T, K> myKvp = Find(value, key);
                    if (myKvp == null) throw new Exception("key does not exist");
                    return myKvp.Value;
                }
                set
                {
                    Add(key, value);
                }
            }

            public void Remove(T key)
            {
                int value = MyGetHashCode(key);
                if (value > storage.Length) { throw new IndexOutOfRangeException("Key does not exist."); }
                if (storage[value] == null) { throw new IndexOutOfRangeException("Key does not exist."); }
                //loop through each kvp at this hash location
                MyKVP<T, K> myKvp = Find(value, key);
                if (myKvp != null)
                {
                    storage[value].Remove(myKvp);
                }
            }
        }

        static void Main(string[] args)
        {
            MyHashMap<string, int> myHashMap = new MyHashMap<string, int>();
            myHashMap.Add("joe", 1);
            myHashMap.Add("mike", 2);
            myHashMap.Add("adam", 3);
            myHashMap.Add("dad", 4);
            Assert.AreEqual(1, myHashMap["joe"]);
            Assert.AreEqual(4, myHashMap["dad"]);
            Assert.AreEqual(2, myHashMap["mike"]);
            Assert.AreEqual(3, myHashMap["adam"]);
            myHashMap.Remove("joe");
            try
            {
                if (myHashMap["joe"] == 3) { }; //should throw
            }
            catch (Exception)
            {
                try { myHashMap.Add("mike", 1); }
                catch (Exception)
                {
                    foreach (MyKVP<string, int> kvp in myHashMap)
                    {
                        Console.WriteLine(kvp.Key + " " + kvp.Value.ToString());
                    }
                    return;
                }
            }
            throw new Exception("fail");
        }
    }
}

Your hash method has a fixed range. This means that a single item could cause 214748 buckets to be created (if its hash code re-hashes to 214747). A more commonly used (and almost always better) approach is to start with an initial size that is either known (from knowledge of the domain) to be big enough for all values, or to start small and have the hashmap resize itself as appropriate. With re-probing, the obvious measure of a need to resize is how much re-probing was needed. With chaining, as you are experimenting with here, you'll want to keep both the average and maximum chain sizes down. This keeps down your worst-case lookup time, and hence keeps your average lookup time closer to the best-case O(1).
The two most common approaches to such hashing (and hence to initial table size) are to use either prime numbers or powers of two. The former is considered (though there is some contention on the point) to offer a better distribution of keys, while the latter allows for faster computation (both cases do a modulo on the input hash, but with a number known to be a power of 2, the modulo can be done quickly as a binary AND operation). Another advantage of using a power of two when you are chaining is that it's possible to test a chain to see whether resizing the table would actually cause that chain to be split (if you have an 8-value table and there's a chain whose hashes are all either 17, 1 or 33, then doubling the table size would still leave them in the same chain, but quadrupling it would redistribute them).
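For illustration, a minimal sketch of how the bucket index is usually derived from GetHashCode with a power-of-two table, which also sidesteps the negative-hash problem (this is not the posted code; it assumes storage.Length is a power of two and uses a made-up GetBucketIndex helper):
// Masking with 0x7FFFFFFF clears the sign bit, and (length - 1) acts
// as a cheap modulo because the length is a power of two.
private int GetBucketIndex(T key)
{
    int hash = key.GetHashCode() & 0x7FFFFFFF; // non-negative
    return hash & (storage.Length - 1);        // equivalent to hash % storage.Length
}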
You don't have a method offering replace semantics, which is usual with .NET dictionary types (where adding will error if there's already an item with that key, but assigning to an index won't).
Your error on a retrieval that would try to go beyond the number of buckets will make no sense to the user, who doesn't care whether the bucket existed or not, only the key (they need not know how your implementation works at all). Both cases where a key isn't found should throw the same error (System.Collections.Generic.KeyNotFoundException has precisely the right semantics, so you could reuse that).
Using a List is rather heavy in this case. Generally I'd frown on anyone saying a BCL collection was too heavy, but when it comes to rolling your own collections, it's generally either because (1) you want to learn from the exercise or (2) the BCL collections don't suit your purposes. In case (1) you should learn how to complete the job you started, and in case (2) you need to be sure that List doesn't have whatever failing you found with Dictionary.
Your removal both throws a nonsensical error for someone who doesn't know about the implementation details, and an inconsistent one (whether something else existed in that bucket is not something they should care about). Since removing a non-existent item isn't harmful, it is more common to simply return a bool indicating whether the item had been present, and let the user decide whether that indicates an error. It is also wasteful to keep searching the rest of the bucket after the item has been removed.
Your implementation does not allow null keys, which is reasonable enough (indeed, the documentation for IDictionary<TKey, TValue> says that implementations may or may not allow them). However, the way you reject them is by letting the NullReferenceException caused by calling GetHashCode() on null escape, rather than checking and throwing an ArgumentNullException. For the user to receive a NullReferenceException suggests that the collection itself was null. This is hence a clear bug.
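A guarding check at the top of Add, along the lines of that advice, would make the failure explicit (a sketch, not part of the original code):
public void Add(T key, K data)
{
    // Reject null keys up front so callers get a meaningful exception
    // instead of a NullReferenceException from key.GetHashCode().
    if (key == null)
        throw new ArgumentNullException(nameof(key));

    int value = MyGetHashCode(key);
    // ... rest of Add unchanged ...
}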

A Remove method should never throw an exception. You are trying to remove an item; no harm is done if it has already been removed. All collection classes in .NET use a bool return value to indicate whether an item was actually removed (see the sketch after this list).
Do not throw Exception; throw a specific one. Browse through the exceptions in the Collections namespaces to find suitable ones.
Add a TryGetValue method.
Use KeyValuePair<TKey, TValue>, which is already part of .NET, instead of creating your own.
Add a constructor which can define map size.
When throwing exceptions, include details about why the exception was thrown. For instance, instead of writing "This key exists", write string.Format("Key '{0}' already exists", key).
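A rough sketch of what the first suggestions could look like on the posted class (not the original code; it reuses the existing storage, MyGetHashCode and Find members, and the shapes follow Dictionary<TKey, TValue>):
// Remove returns a bool instead of throwing, mirroring Dictionary<TKey, TValue>.Remove.
public bool Remove(T key)
{
    int index = MyGetHashCode(key);
    if (index >= storage.Length || storage[index] == null)
        return false;

    MyKVP<T, K> found = Find(index, key);
    return found != null && storage[index].Remove(found);
}

// TryGetValue avoids exceptions on the common "key not present" path.
public bool TryGetValue(T key, out K value)
{
    int index = MyGetHashCode(key);
    if (index < storage.Length && storage[index] != null)
    {
        MyKVP<T, K> found = Find(index, key);
        if (found != null)
        {
            value = found.Value;
            return true;
        }
    }
    value = default(K);
    return false;
}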

Sorry to say this, but this class won't work as a HashMap or even a simple dictionary.
First of all, the value returned from GetHashCode() is not unique. Two different objects, e.g. two strings, can return the same hash code value. Using the hash code directly as the array index then simply leads to record loss when hash codes clash. I would suggest reading about the GetHashCode() method and how to implement it on MSDN. An obvious example: if you take the hash codes of all possible Int64 values starting at 0, the hash codes are bound to clash at some point.
Another thing is that the for-loop lookup is slow. You should consider using binary search for the lookup (sketched below). To do so, you must keep your key-value pairs sorted by key at all times, which implies that you should use a List instead of an array for the storage variable, so that when adding a new key-value pair you can insert it at the appropriate index.
Above all, make sure that when you are coding a real hash map you realize that the hash code can be the same for different keys, and never do the lookup with a for-loop from 0 to len-1.
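A minimal sketch of what that suggestion amounts to (sorted storage plus binary search; this illustrates the answer rather than the original code, and the sorted, KeyComparer and TryFind names are made up for the example):
// Assumes the pairs are kept sorted by Key at all times (when adding, insert at
// the index given by the bitwise complement of a negative BinarySearch result).
private readonly List<MyKVP<T, K>> sorted = new List<MyKVP<T, K>>();

private sealed class KeyComparer : IComparer<MyKVP<T, K>>
{
    public int Compare(MyKVP<T, K> x, MyKVP<T, K> y)
    {
        return x.Key.CompareTo(y.Key);
    }
}

public bool TryFind(T key, out K value)
{
    // BinarySearch needs a probe element of the list's item type.
    int index = sorted.BinarySearch(new MyKVP<T, K>(key, default(K)), new KeyComparer());
    if (index >= 0)
    {
        value = sorted[index].Value;
        return true;
    }
    value = default(K);
    return false;
}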

Related

Best way to store list of string pairs for optimal query performance

Right now I use a Dictionary to store some configuration data in my app. The data gets added to the Dictionary only once, but it gets queried very frequently. The Dictionary has around 2500 items and all keys are unique.
So right now I have something like this:
private Dictionary<string, string> Data;

public string GetValue(string key) // This gets hit very often
{
    string value;
    if (this.Data.TryGetValue(key, out value))
    {
        return value;
    }
    ...
}
Is there a more optimal way to do this?
What you have is pretty efficient. The only way to improve performance that I can think of is to use int as the dictionary key, instead of string. You would need to run performance tests to see how much it makes a difference in your use case -- it may or may not be significant.
And I would use an enum for storing the settings for convenience. Of course, this assumes you have a known set of settings.
private Dictionary<int, string> Data;

public string GetValue(MyAppSettingsEnum key)
{
    string value;
    if (this.Data.TryGetValue((int)key, out value))
    {
        return value;
    }
    ...
}
Note that I don't use the enum directly as the dictionary key, as it is more efficient to use an int as the key. More details on that issue here.
Using TryGetValue is a pretty optimal way of returning an item so there's not much you can improve on that front. However, if this isn't causing a bottleneck at the moment, I wouldn't worry too much about trying to optimize TryGetValue.
One thing you can do, although it isn't shown in your code so I don't know whether you already do it, is to create the Dictionary object with an estimated capacity (see the example after the quote below). Since you seem to know the rough number of items that will be expected, creating the Dictionary with that capacity will improve performance, as it reduces the number of times .NET has to resize the dictionary.
From MSDN:
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary.
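For example, along the lines of that suggestion (the 2500 figure comes from the question):
// Pre-size the dictionary so .NET does not have to grow and rehash it
// while the ~2500 configuration entries are being added.
private Dictionary<string, string> Data = new Dictionary<string, string>(2500);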
The only faster way is using an array if your keys are int and have a short range.
As you can see from the source code of System.Collections.Generic.Dictionary (available at http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs), the code used most often in your case is
private int FindEntry(TKey key) {
    if (key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }
    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}
As you can see, the lookup is fast if comparer.GetHashCode is fast and produces a good distribution of hash codes, ideally a perfect hash function.
The dictionary construction code is not visible in your example, but if you use the default constructor then the dictionary will use the default comparer EqualityComparer<string>.Default.
Providing your own comparer with a time- and space-efficient hash function might speed up the code.
If you don't know what a good hash function would look like in your case, then using interned strings may also give you some boost (see http://www.dotnetperls.com/string-intern or MSDN: String.Intern Method).
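A sketch of what supplying a custom comparer could look like (the comparer name and the FNV-1a-style hash are just an illustration, not a claim that they would beat the default comparer for your data):
// Hypothetical comparer: ordinal equality plus a simple FNV-1a hash.
sealed class FastOrdinalComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        unchecked
        {
            int hash = (int)2166136261;
            foreach (char c in s)
            {
                hash = (hash ^ c) * 16777619;
            }
            return hash;
        }
    }
}

// Passed to the dictionary at construction time:
// var data = new Dictionary<string, string>(2500, new FastOrdinalComparer());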

Is there any built-in collection type or IEqualityComparer<T> for a collection which bases equality on the items in it?

Is there any built-in collection type (IEnumerable<S>) or IEqualityComparer<T> for an IEnumerable<S> in the framework that has its Equals (and GetHashCode, accordingly) defined by the equality of the items in it?
Something like:
var x = new SomeCollection { 1, 2, 3 };
var y = new SomeCollection { 1, 2, 3 };
// so that x.Equals(y) -> true
// and x.Shuffle().Equals(y) -> false
Or a
class SomeComparer<T> : EqualityComparer<IEnumerable<T>> { }
// so that for
var x = new[] { 1, 2, 3 };
var y = new[] { 1, 2, 3 };
// gives
// new SomeComparer<int>().Equals(x, y) -> true
// new SomeComparer<int>().Equals(x.Shuffle(), y) -> false
My question is: is there something in the framework that behaves like SomeCollection or SomeComparer<T> as shown in the code?
Why I need it: because I have a case for a Dictionary<Collection, T> where the Key part should be a collection and its equality is based on its entries.
Requirements:
The collection need only be a simple enumerable type with an Add method
Order of items is important
Duplicate items can exist in the collection
Note: I can write my own, it's trivial. There are plenty of questions on SO helping with that. I'm asking whether there is a class in the framework itself.
Keep it simple: use the Dictionary constructor that takes an IEqualityComparer (implement your equality logic in a comparer) and you are good to go. No need for special collection types and so on...
See here
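A sketch of such a comparer (order-sensitive and allowing duplicates, per the requirements above; this is an illustration, not a framework type, and SequenceEqual needs System.Linq):
// Hypothetical order-sensitive comparer for sequences, usable as the
// key comparer of a Dictionary<IEnumerable<T>, TValue>.
sealed class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(IEnumerable<T> sequence)
    {
        if (sequence == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (T item in sequence)
            {
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}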
If you can, it may be better to define your own immutable collection class which accepts an IEqualityComparer<T> as a constructor parameter, and have its Equals and GetHashCode() members chain to those of the underlying collection, than to try to define an IEqualityComparer<T> for the purpose. Among other things, your immutable collection class would be able to cache its own hash value, and possibly the hash values for the items contained therein. This would accelerate not only calls to GetHashCode() on the collection, but also comparisons between two collections. If two collections' hash codes are unequal, there's no point in checking anything further; even if two collections' hash codes are equal, it may be worthwhile to check that the hash codes of corresponding items match before testing the items themselves for equality. (Note that in general, using a hash-code test as an early exit before checking equality is not particularly helpful, because the slowest Equals case, where the items match, is the one where the hash codes are going to match anyway; here, however, if all but the last item match, testing the hash codes of the items may find the mismatch before one has spent time inspecting each item in detail.)
Starting in .NET 4.0, it became possible to write an IEqualityComparer<T> which could achieve the performance advantage of an immutable collection class which caches hash values, by using a ConditionalWeakTable to map collections to objects which would cache information about them. Nonetheless, unless one is unable to use a custom immutable-collection class, I think such a class would probably be better than an IEqualityComparer<T> in this scenario anyway.
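A sketch of that ConditionalWeakTable idea, reusing the SequenceComparer<T> sketched above (the names are made up, ConditionalWeakTable lives in System.Runtime.CompilerServices, and this only makes sense if the keyed collections are never mutated):
// Hypothetical wrapper that caches each collection's hash code in a
// ConditionalWeakTable, so a given sequence is hashed at most once.
sealed class CachingSequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    private readonly SequenceComparer<T> inner = new SequenceComparer<T>();
    private readonly ConditionalWeakTable<IEnumerable<T>, object> cache =
        new ConditionalWeakTable<IEnumerable<T>, object>();

    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        return inner.Equals(x, y);
    }

    public int GetHashCode(IEnumerable<T> sequence)
    {
        if (sequence == null) return 0;
        object cached;
        if (cache.TryGetValue(sequence, out cached))
            return (int)cached;
        int hash = inner.GetHashCode(sequence);
        cache.Add(sequence, hash); // the boxed int lives as long as the sequence
        return hash;
    }
}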
I do not believe that such a thing exists. I had a need to compare two dictionaries' contents for equality and wrote this a while back.
public class DictionaryComparer<TKey, TValue> : EqualityComparer<IDictionary<TKey, TValue>>
{
    public DictionaryComparer()
    {
    }

    public override bool Equals(IDictionary<TKey, TValue> x, IDictionary<TKey, TValue> y)
    {
        // early-exit checks
        if (object.ReferenceEquals(x, y))
            return true;
        if (null == x || y == null)
            return false;
        if (x.Count != y.Count)
            return false;

        // check keys are the same
        foreach (TKey k in x.Keys)
            if (!y.ContainsKey(k))
                return false;

        // check values are the same
        foreach (TKey k in x.Keys)
        {
            TValue v = x[k];
            if (object.ReferenceEquals(v, null))
            {
                // a null value only matches a null value; keep checking the remaining keys
                if (!object.ReferenceEquals(y[k], null))
                    return false;
            }
            else if (!v.Equals(y[k]))
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode(IDictionary<TKey, TValue> obj)
    {
        if (obj == null)
            return 0;
        int hash = 0;
        foreach (KeyValuePair<TKey, TValue> pair in obj)
        {
            int key = pair.Key.GetHashCode(); // key cannot be null
            int value = pair.Value != null ? pair.Value.GetHashCode() : 0;
            hash ^= ShiftAndWrap(key, 2) ^ value;
        }
        return hash;
    }

    private static int ShiftAndWrap(int value, int positions)
    {
        positions = positions & 0x1F;
        // Save the existing bit pattern, but interpret it as an unsigned integer.
        uint number = BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
        // Preserve the bits to be discarded.
        uint wrapped = number >> (32 - positions);
        // Shift and wrap the discarded bits.
        return BitConverter.ToInt32(BitConverter.GetBytes((number << positions) | wrapped), 0);
    }
}
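Usage would then be something like this (a hypothetical example, not part of the original answer):
// Dictionaries keyed by dictionary contents, compared with the comparer above.
var byContents = new Dictionary<IDictionary<long, string>, List<string>>(
    new DictionaryComparer<long, string>());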

Fast collection comparison

I have the following data type:
ISet<IEnumerable<Foo>>
So, I need to be able to create sets of sequences. E.g. this is ok:
ABC,AC,A
but this is not (since "AB" is repeated here"):
AB,A,ABC,BCA,AB
But, in order to do this - for "set" to not contain duplicates, I need to wrap my IEnumerable in some kind of other data type:
ISet<Seq>
//where
Seq : IEnumerable<Foo>, IEquatable<Seq>
Thus, I will be able to compare two sequences, and provide the Set data structure with a way of eliminating duplicates.
My question is: is there a fast data structure that allows for comparing sequences? I am thinking that somehow when Seq gets created, or added to, some kind of cumulative value is computed.
In other words, is it possible to implement Seq in such a way that I could do this:
var seq1 = new Seq( IList<Foo> );
var seq2 = new Seq( IList<Foo> )
seq1.equals(seq2) // O(1)
Thanks.
I have provided an implementation of your sequence below. There are several points to note:
This only works if the IEnumerable<T> returns the same items every time it is enumerated, and those items are not mutated during the lifetime of this object.
The hash code is cached. The first time it is requested, it is calculated (feel free to improve the hash code algorithm if you know a better one) based on a full iteration of the underlying sequence. Because it only needs to be calculated once, this can effectively be considered O(1) if you compute it often. It's likely that adding to the set will be a bit slower (first-time computation of the hash value) but searching or removing will be very quick.
The equals method first compares the hash codes. If the hash codes are different then the objects cannot possibly be equal (if the hash codes were properly implemented on all objects in the sequence, and nothing was mutated). As long as you have a low rate of collision, and are usually comparing items that aren't actually equal, this means that equals checks will not often get past that hash code check. If they do, an iteration of the sequence is needed (there is no way around that). Because of that the equals is likely to average O(1), even though its worst case is still O(n).
public class Foo<T> : IEnumerable<T>
{
    private IEnumerable<T> sequence;
    private int? myHashCode = null;

    public Foo(IEnumerable<T> sequence)
    {
        this.sequence = sequence;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return sequence.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return sequence.GetEnumerator();
    }

    public override bool Equals(object obj)
    {
        Foo<T> other = obj as Foo<T>;
        if (other == null)
            return false;

        //if the hash codes are different we don't need to bother doing a deep equals check
        //the hash code is cached, so it's fast.
        if (GetHashCode() != other.GetHashCode())
            return false;

        return Enumerable.SequenceEqual(sequence, other.sequence);
    }

    public override int GetHashCode()
    {
        //note that the hash code is cached, so the underlying sequence
        //needs to not change.
        return myHashCode ?? populateHashCode();
    }

    private int populateHashCode()
    {
        int somePrimeNumber = 37;
        myHashCode = 1;
        foreach (T item in sequence)
        {
            myHashCode = (myHashCode * somePrimeNumber) + item.GetHashCode();
        }
        return myHashCode.Value;
    }
}
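Usage could then look like this (hypothetical, assuming the generic Foo<T> wrapper above):
// Two wrappers over equal sequences compare equal and hash identically,
// so a HashSet treats them as duplicates.
var set = new HashSet<Foo<int>>();
set.Add(new Foo<int>(new[] { 1, 2, 3 }));
bool added = set.Add(new Foo<int>(new List<int> { 1, 2, 3 })); // false: already present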
O(1) essentially means you are not allowed to compare the values of the elements. If you can represent the sequence as a list of immutable objects (with caching on add so there are no duplicates across all instances), you can achieve it, as you'd only need to compare the first element, similar to how string interning works.
Insert will have to search all existing instances of elements for the "current" + "with this next" element. Some sort of dictionary may be a reasonable approach...
EDIT: I think I simply tried to come up with a suffix tree.

Implementation of Dictionary where equivalent contents are equal and return the same hash code regardless of order of insertion

I need to use Dictionary<long, string> collections such that, given two instances d1 and d2 that have the same KeyValuePair<long, string> contents (which could have been inserted in any order):
(d1 == d2) evaluates to true
d1.GetHashCode() == d2.GetHashCode()
The first requirement was achieved most easily by using a SortedDictionary instead of a regular Dictionary.
The second requirement is necessary because I have one point where I need to store a Dictionary<Dictionary<long, string>, List<string>> - the main Dictionary type is used as the key for another Dictionary, and if the hash codes aren't based on identical contents, then ContainsKey() will not work the way that I want (i.e. if there is already an item inserted into the dictionary with d1 as its key, then dictionary.ContainsKey(d2) should evaluate to true).
To achieve this, I have created a new object class ComparableDictionary : SortedDictionary<long, string>, and have included the following:
public override int GetHashCode() {
    StringBuilder str = new StringBuilder();
    foreach (var item in this) {
        str.Append(item.Key);
        str.Append("_");
        str.Append(item.Value);
        str.Append("%%");
    }
    return str.ToString().GetHashCode();
}
In my unit testing, this meets the criteria for both equality and hashcodes. However, in reading Guidelines and Rules for GetHashCode, I came across the following:
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.
If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.
Remember, objects can be put into hash tables in ways that you didn't expect. A lot of the LINQ sequence operators use hash tables internally. Don't go dangerously mutating objects while enumerating a LINQ query that returns them!
Now, the Dictionary<ComparableDictionary, List<String>> is used only once in code, in a place where the contents of all ComparableDictionary collections should already be set. Thus, according to these guidelines, I think it would be acceptable to override GetHashCode as I have done (basing it entirely on the contents of the dictionary).
After that introduction my questions are:
I know that the performance of SortedDictionary is very poor compared to Dictionary (and I can have hundreds of object instantiations). The only reason for using SortedDictionary is so that I can have the equality comparison work based on the contents of the dictionary, regardless of order of insertion. Is there a better way to achieve this equality requirement without having to use a SortedDictionary?
Is my implementation of GetHashCode acceptable based on the requirements? Even though it is based on mutable contents, I don't think that should pose any risk, since the only place where it is used (I think) is after the contents have been set.
Note: while I have been setting these up using Dictionary or SortedDictionary, I am not wedded to these collection types. The main need is a collection that can store pairs of values, and meet the equality and hashing requirements defined out above.
Your GetHashCode implementation looks acceptable to me, but it's not how I'd do it.
This is what I'd do:
Use composition rather than inheritance. Aside from anything else, inheritance gets odd in terms of equality
Use a Dictionary<TKey, TValue> variable inside your class
Implement GetHashCode by taking an XOR of the individual key/value pair hash codes
Implement equality by checking whether the sizes are the same, then checking every key in "this" to see if its value is the same in the other dictionary.
So something like this:
public sealed class EquatableDictionary<TKey, TValue>
    : IDictionary<TKey, TValue>, IEquatable<EquatableDictionary<TKey, TValue>>
{
    private readonly Dictionary<TKey, TValue> dictionary;

    public override bool Equals(object other)
    {
        return Equals(other as EquatableDictionary<TKey, TValue>);
    }

    public bool Equals(EquatableDictionary<TKey, TValue> other)
    {
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        if (Count != other.Count)
        {
            return false;
        }
        foreach (var pair in this)
        {
            TValue otherValue;
            if (!other.TryGetValue(pair.Key, out otherValue))
            {
                return false;
            }
            if (!EqualityComparer<TValue>.Default.Equals(pair.Value,
                                                         otherValue))
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode()
    {
        int hash = 0;
        foreach (var pair in this)
        {
            int miniHash = 17;
            miniHash = miniHash * 31 +
                EqualityComparer<TKey>.Default.GetHashCode(pair.Key);
            miniHash = miniHash * 31 +
                EqualityComparer<TValue>.Default.GetHashCode(pair.Value);
            hash ^= miniHash;
        }
        return hash;
    }

    // Implementation of IDictionary<,> which just delegates to the dictionary
}
Also note that I can't remember whether EqualityComparer<T>.Default.GetHashCode copes with null values - I have a suspicion that it does, returning 0 for null. Worth checking though :)
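A quick way to check that (the result I see is 0 for a null reference, but it's worth verifying on your own framework version):
// Prints the hash code EqualityComparer<T>.Default produces for null.
Console.WriteLine(EqualityComparer<string>.Default.GetHashCode(null));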

High Runtime for Dictionary.Add for a large number of items

I have a C# application that stores data from a text file in a Dictionary object. The amount of data to be stored can be rather large, so it takes a lot of time to insert the entries. With many items in the Dictionary it gets even worse, because of the resizing of the internal array that stores the data for the Dictionary.
So I initialized the Dictionary with the number of items that will be added, but this has no impact on speed.
Here is my function:
private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections)
{
    Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count);
    foreach (NodeConnection con in connections)
    {
        ...
        resultSet.Add(nodeIdPair, newEdge);
    }
    return resultSet;
}
In my tests, I insert ~300k items.
I checked the running time with ANTS Performance Profiler and found that the average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary(); (about 0.256 ms on average for each Add).
This is definitely caused by the amount of data in the Dictionary (ALTHOUGH I initialized it with the desired size). For the first 20k items, the average time for Add is 0.03 ms for each item.
Any idea, how to make the add-operation faster?
Thanks in advance,
Frank
Here is my IdPair-Struct:
public struct IdPair
{
    public int id1;
    public int id2;

    public IdPair(int oneId, int anotherId)
    {
        if (oneId > anotherId)
        {
            id1 = anotherId;
            id2 = oneId;
        }
        else if (anotherId > oneId)
        {
            id1 = oneId;
            id2 = anotherId;
        }
        else
            throw new ArgumentException("The two Ids of the IdPair can't have the same value.");
    }
}
Since you have a struct, you get the default implementation of Equals() and GetHashCode(). As others have pointed out, this is not very efficient since it uses reflection, but I don't think the reflection is the issue.
My guess is that your hash codes get distributed unevenly by the default GetHashCode(), which could happen, for example, if the default implementation returns a simple XOR of all members (in which case hash(a, b) == hash(b, a)). I can't find any documentation of how ValueType.GetHashCode() is implemented, but try adding
public override int GetHashCode() {
    // combine the struct's two fields into one value
    return id1 << 16 | (id2 & 0xffff);
}
which might be better.
IdPair is a struct, and you haven't overridden Equals or GetHashCode. This means that the default implementation of those methods will be used.
For value-types the default implementation of Equals and GetHashCode uses reflection, which is likely to result in poor performance. Try providing your own implementation of the methods and see if that helps.
My suggested implementation, it might not be exactly what you need/want:
public struct IdPair : IEquatable<IdPair>
{
    // ...

    public override bool Equals(object obj)
    {
        if (obj is IdPair)
            return Equals((IdPair)obj);

        return false;
    }

    public bool Equals(IdPair other)
    {
        return id1.Equals(other.id1)
            && id2.Equals(other.id2);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 269;
            hash = (hash * 19) + id1.GetHashCode();
            hash = (hash * 19) + id2.GetHashCode();
            return hash;
        }
    }
}
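With those overrides in place the pair can be used as a dictionary key without falling back to the reflection-based defaults; for example (hypothetical values, Edge being the type from the question):
// IdPair(2, 1) normalizes to the same pair as IdPair(1, 2) in the constructor,
// so lookups are order-independent once Equals/GetHashCode use id1 and id2.
var edges = new Dictionary<IdPair, Edge>(300000);
bool same = new IdPair(1, 2).Equals(new IdPair(2, 1)); // true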
