How do I use HashSet<T> as a dictionary key? - c#

I wish to use HashSet<T> as the key to a Dictionary:
Dictionary<HashSet<T>, TValue> myDictionary = new Dictionary<HashSet<T>, TValue>();
I want to look up values from the dictionary such that two different instances of HashSet<T> that contain the same items will return the same value.
HashSet<T>'s implementations of Equals() and GetHashCode() don't seem to do this (I think they're just the defaults). I can override Equals() to use SetEquals() but what about GetHashCode()? I feel like I am missing something here...

You could use the set comparer provided by HashSet<T>:
var myDictionary = new Dictionary<HashSet<T>, TValue>(HashSet<T>.CreateSetComparer());
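A minimal usage sketch of the line above, assuming string elements and int values purely for illustration (SetKeyDemo is a hypothetical name). A second HashSet instance with the same items, added in a different order, finds the entry. A dictionary built without the comparer falls back to reference equality and misses.

```csharp
using System.Collections.Generic;

public static class SetKeyDemo
{
    // Looks up a value using a *different* HashSet instance that holds the same items.
    public static int LookUpWithEqualSet()
    {
        // CreateSetComparer compares sets by contents and hashes them independently of order.
        var byContents = new Dictionary<HashSet<string>, int>(HashSet<string>.CreateSetComparer());
        byContents.Add(new HashSet<string> { "a", "b" }, 42);

        var probe = new HashSet<string> { "b", "a" }; // new instance, same items
        return byContents[probe];
    }

    // Without the comparer, HashSet's default (reference-based) Equals/GetHashCode are used.
    public static bool FoundWithDefaultComparer()
    {
        var byReference = new Dictionary<HashSet<string>, int>();
        byReference.Add(new HashSet<string> { "a", "b" }, 42);
        return byReference.ContainsKey(new HashSet<string> { "a", "b" });
    }
}
```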

digEmAll's answer is clearly the better choice in practice, since it uses built-in code instead of reinventing the wheel. But I'll leave this as a sample implementation.
You can implement an IEqualityComparer<HashSet<T>> that uses SetEquals and then pass it to the Dictionary constructor. Something like the following (untested):
class HashSetEqualityComparer<T>: IEqualityComparer<HashSet<T>>
{
public int GetHashCode(HashSet<T> hashSet)
{
if(hashSet == null)
return 0;
int h = 0x14345843; //some arbitrary number
foreach(T elem in hashSet)
{
h = unchecked(h + hashSet.Comparer.GetHashCode(elem));
}
return h;
}
public bool Equals(HashSet<T> set1, HashSet<T> set2)
{
if(set1 == set2)
return true;
if(set1 == null || set2 == null)
return false;
return set1.SetEquals(set2);
}
}
Note that the hash function here is commutative; that matters because the enumeration order of a set's elements is undefined.
Another interesting point: you can't just use elem.GetHashCode(), since that gives wrong results when the set was created with a custom equality comparer.

You can pass an IEqualityComparer<HashSet<T>> to the Dictionary constructor and implement the desired equality in that comparer.

Related

Creating a custom equality comparer for IEnumerables<T> when T is IEnumerable

I want a custom equality comparer for IEnumerables. Using @PaulZahra's code, I created the following class:
class CustomEqualityComparer<T> : IEqualityComparer<IEnumerable<T>>
{
public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
{
var enumerables = new Dictionary<T, uint>();
foreach (T item in x)
{
enumerables.Add(item, 1);
}
foreach (T item in y)
{
if (enumerables.ContainsKey(item))
{
enumerables[item]--;
}
else
{
return false;
}
}
return enumerables.Values.All(v => v == 0);
}
public int GetHashCode(IEnumerable<T> obj) => obj.GetHashCode();
}
The problem is that if T itself is an IEnumerable, then ContainsKey will check for reference equality, while the point of this equality comparer is to check for value equality at any given depth.
I thought to use .Keys.Contains() instead, since it can accept an IEqualityComparer as an argument:
if (enumerables.Keys.Contains(item, this)) // not sure if "this" or a new object
but I get the following error (CS1929):
'Dictionary.KeyCollection' does not contain a definition for 'Contains' and the best extension method overload 'Queryable.Contains(IQueryable, T, IEqualityComparer)' requires a receiver of type 'IQueryable'
I am not sure how to approach this. How can I fix it? Thanks.
Edit: Note that this comparer doesn't care about order.
As others have mentioned, IEnumerable<T> can enumerate forever, so it's dangerous to do this on that interface. I'd recommend using ICollection<T> instead; it has a fixed size, and you'll find it works for almost any type you'd want to use anyway.
Furthermore, I'd recommend using TryGetValue to reduce the number of dictionary lookups.
Your code does not count the items in the first enumerable correctly: enumerables.Add(item, 1) throws an ArgumentException when an item appears twice, instead of incrementing its count.
GetHashCode needs to take every member of the collection into account, in an order-independent way, since order doesn't matter here.
All that being said, here is an adjustment of your implementation:
class CustomEqualityComparer<T> : IEqualityComparer<ICollection<T>>
{
public bool Equals(ICollection<T> x, ICollection<T> y)
{
if (x.Count != y.Count) return false;
var enumerables = new Dictionary<T, uint>(x.Count);
foreach (T item in x)
{
enumerables.TryGetValue(item, out var value);
enumerables[item] = value + 1;
}
foreach (T item in y)
{
var success = enumerables.TryGetValue(item, out var value);
if (success)
{
enumerables[item] = value - 1;
}
else
{
return false;
}
}
return enumerables.Values.All(v => v == 0);
}
public int GetHashCode(ICollection<T> obj)
{
unchecked
{
var hashCode = 0;
foreach(var item in obj)
{
hashCode += (item != null ? item.GetHashCode() : 0);
}
return hashCode;
}
}
}
To make such a comparer recursive, you simply need to pass the proper comparer to the Dictionary when T is itself an enumerable. I think getting the element type U from T (where T implements IEnumerable<U>) and then creating the equivalent of new Dictionary<T, uint>(new CustomEqualityComparer<U>()) via reflection (see "Create instance of generic type?") should achieve what you want.
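As an alternative to constructing CustomEqualityComparer<U> via reflection, a single non-generic comparer over object can recurse into itself. The sketch below (DeepMultisetComparer is a hypothetical name) treats every non-string IEnumerable as an unordered multiset, counting duplicates like the question's Equals, and compares all other values with object.Equals. It does not guard against infinite or single-use sequences.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: any non-string IEnumerable is an unordered multiset whose elements
// are compared recursively with this same comparer; other values use object.Equals.
public sealed class DeepMultisetComparer : IEqualityComparer<object>
{
    public static readonly DeepMultisetComparer Instance = new DeepMultisetComparer();

    private static bool IsSequence(object o) => o is IEnumerable && !(o is string);

    public new bool Equals(object x, object y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (!IsSequence(x) || !IsSequence(y)) return object.Equals(x, y);

        // Count x's elements. The dictionary uses *this* comparer, so nested sequences
        // are matched by contents at any depth. Nulls are counted separately because
        // Dictionary does not accept null keys.
        var counts = new Dictionary<object, int>(this);
        int nulls = 0;
        foreach (var item in (IEnumerable)x)
        {
            if (item == null) { nulls++; continue; }
            counts.TryGetValue(item, out var n);
            counts[item] = n + 1;
        }

        // Consume the counts with y's elements; any shortfall means the multisets differ.
        foreach (var item in (IEnumerable)y)
        {
            if (item == null)
            {
                if (--nulls < 0) return false;
                continue;
            }
            if (!counts.TryGetValue(item, out var n) || n == 0) return false;
            counts[item] = n - 1;
        }
        return nulls == 0 && counts.Values.All(c => c == 0);
    }

    public int GetHashCode(object obj)
    {
        if (obj == null) return 0;
        if (!IsSequence(obj)) return obj.GetHashCode();

        // Order-independent: a sum of recursive element hashes, so equal multisets
        // hash identically whatever their enumeration order.
        int hash = 17;
        foreach (var item in (IEnumerable)obj)
            hash = unchecked(hash + GetHashCode(item));
        return hash;
    }
}
```

For example, a List<List<int>> holding {1, 2} and {3} and an int[][] holding {3} and {2, 1} compare equal and get the same hash code.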
Notes:
You must provide a correct implementation of GetHashCode that matches Equals if you use the comparer with any Dictionary/HashSet. Your current GetHashCode (obj.GetHashCode()) uses the default reference-based hash, which does not align with your Equals. Note that most GetHashCode implementations for sequences depend on the order of the items, so you need one that works for sets; e.g., the sum of the items' hash codes would do, probably at the cost of a slightly worse distribution.
You may want to use LINQ set operations instead of doing them by hand; operations like Distinct, Except and Intersect already take comparers. To check that "the sets are the same", you can use Except in both directions: !x.Except(y, comparerBuiltViaReflection).Any() && !y.Except(x, comparerBuiltViaReflection).Any() (note that set operations ignore duplicates).
Beware of the limitations of such code: not every enumerable can be enumerated more than once (user input, network streams, ...), some may produce different results on re-iteration (while(count < 10){ count ++; yield return random.Next(); }), the cost of re-iteration may be significant (re-reading all lines of a huge file on each iteration), and an enumerable can represent an infinite sequence (while(true){ yield return count++; }).
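For the set operations mentioned in the notes, a sketch might look like this. StringComparer.OrdinalIgnoreCase stands in for whatever comparer you build via reflection, and SetOps is a hypothetical name. Unlike the counting Equals in the question, both checks ignore duplicates and order.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SetOps
{
    // "Same set" via LINQ: nothing in x is missing from y, and nothing in y is missing from x.
    public static bool SameSetViaExcept<T>(IEnumerable<T> x, IEnumerable<T> y, IEqualityComparer<T> comparer)
        => !x.Except(y, comparer).Any() && !y.Except(x, comparer).Any();

    // Equivalent check: build a HashSet with the same comparer and use SetEquals.
    public static bool SameSetViaHashSet<T>(IEnumerable<T> x, IEnumerable<T> y, IEqualityComparer<T> comparer)
        => new HashSet<T>(x, comparer).SetEquals(y);
}
```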

Sorting a list with two parameters using CompareTo

I am presently sorting a C# list using the 'CompareTo' method of the type contained in the list. I want to sort all items in ascending order by their WBS (Work Breakdown Structure), and I can manage this very well using the following code:
public int CompareTo(DisplayItemsEntity other)
{
string[] instanceWbsArray = this.WBS.Split('.');
string[] otherWbsArray = other.WBS.Split('.');
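int result = 0;
// Only compare the segments both codes have, so the loop can't run past the shorter array.
int maxLength = Math.Min(instanceWbsArray.Length, otherWbsArray.Length);
for (int i = 0; i < maxLength; i++)
{
if (instanceWbsArray[i].Equals(otherWbsArray[i]))
{
continue;
}
else
{
result = Int32.Parse(instanceWbsArray[i]).CompareTo(Int32.Parse(otherWbsArray[i]));
break;
}
}
// All shared segments are equal: the shorter code (e.g. "1.2") sorts before the longer one ("1.2.1").
if (result == 0)
{
result = instanceWbsArray.Length.CompareTo(otherWbsArray.Length);
}
return result;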
}
Now I would like to be able to sort by more than one parameter: first by project name (alphabetically), and then by WBS. How can I do this?
I don't know the details of your class, so I'll provide an example using a list of strings and LINQ. OrderBy orders the strings alphabetically, and ThenBy then orders any items that tie on that first key by their length. You can adapt this sample to your needs easily enough.
var list = new List<string>
{
"Foo",
"Bar",
"Foobar"
};
var sortedList = list.OrderBy(i => i).
ThenBy(i => i.Length).
ToList();
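Applied to the question's scenario, a sketch could look like the following. DisplayItemsEntity here is a hypothetical stand-in with only ProjectName and WBS; project names are compared ordinally (an arbitrary choice, since the default string comparer used by OrderBy is culture-sensitive), and the WBS comparer assumes every segment parses as an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's class; only the two sort keys are modeled.
public class DisplayItemsEntity
{
    public string ProjectName { get; set; }
    public string WBS { get; set; }
}

public static class WbsSorting
{
    // Compares WBS codes segment by segment as integers, so "1.2" < "1.10" and "1.2" < "1.2.1".
    public static readonly IComparer<string> WbsComparer = Comparer<string>.Create((a, b) =>
    {
        string[] left = a.Split('.');
        string[] right = b.Split('.');
        for (int i = 0; i < Math.Min(left.Length, right.Length); i++)
        {
            int result = int.Parse(left[i]).CompareTo(int.Parse(right[i]));
            if (result != 0) return result;
        }
        return left.Length.CompareTo(right.Length); // a shorter prefix sorts first
    });

    // Project name first, then WBS among items of the same project.
    public static List<DisplayItemsEntity> Sort(IEnumerable<DisplayItemsEntity> items) =>
        items.OrderBy(e => e.ProjectName, StringComparer.Ordinal)
             .ThenBy(e => e.WBS, WbsComparer)
             .ToList();
}
```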
What we generally do in cases like yours is this:
public int CompareTo( SomeClass other )
{
int result = this.SomeProperty.CompareTo( other.SomeProperty );
if( result != 0 )
return result;
result = this.AnotherProperty.CompareTo( other.AnotherProperty );
if( result != 0 )
return result;
// [...] compare further properties the same way
return result;
}
P.S.
When posting code, please try to include only the code which is pertinent to your question. There is a load of stuff in the code that you posted that I did not need to read, and that in fact made my eyes hurt.
I like Eve's answer because of its flexibility, but I'm kinda surprised that no one has mentioned creating a custom IComparer<T> instance.
IComparer<T> is a generic interface that defines a method for comparing two instances of the type T. The advantage of using IComparer<T> is that you can create implementations for each sort order you commonly use and then use these as and when necessary. This allows you to create a default sort order in the type's CompareTo() method and define alternative orders separately.
E.g.
public class MyComparer
: IComparer<YourType>
{
public int Compare(YourType x, YourType y)
{
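// Add your comparison logic here: return a negative number if x < y,
// zero if they are equal, and a positive number if x > y.
throw new System.NotImplementedException();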
}
}
IComparer<T> is particularly useful for composition where you can do things like have a comparer which compares some properties of a given type using another comparer that operates on the type of the property.
It's also very useful if you ever need to define a sort order on a type you don't control. Another advantage is that it doesn't require LINQ, so it can be used in older code (.NET 2.0 onwards).
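To illustrate the composition point, here is a minimal sketch of a comparer that chains other comparers; CompositeComparer is a hypothetical name, not a framework type. Each comparer is consulted only when all previous ones report a tie, and nothing here depends on LINQ.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not a framework type): chains comparers so that each one
// only breaks the ties left by the ones before it.
public sealed class CompositeComparer<T> : IComparer<T>
{
    private readonly IComparer<T>[] _comparers;

    public CompositeComparer(params IComparer<T>[] comparers)
    {
        _comparers = comparers;
    }

    public int Compare(T x, T y)
    {
        foreach (IComparer<T> comparer in _comparers)
        {
            int result = comparer.Compare(x, y);
            if (result != 0)
            {
                return result; // the first comparer that sees a difference decides
            }
        }
        return 0; // equal according to every comparer
    }
}
```

For example, chaining a length comparer with StringComparer.Ordinal and passing the result to List<T>.Sort orders words by length first and ordinally within each length.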
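First compare the project names alphabetically. If they are not equal, return that result; otherwise, perform the comparison based on the second value:
public int CompareTo(DisplayItemsEntity other)
{
// Compare this to other (not the reverse) so that names sort in ascending order.
int result = this.ProjectName.CompareTo(other.ProjectName);
if (result != 0)
{
return result;
}
// Project names are equal: do the second comparison and return its result.
// CompareWbs is a hypothetical helper holding the WBS logic from the question.
return CompareWbs(other);
}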

Implementing GetHashCode for IEqualityComparer<T> with conditional equality

I'm wondering if anyone has any suggestions for this problem.
I'm using Intersect and Except (LINQ) with a custom IEqualityComparer in order to query the set differences and set intersections of two sequences of ISyncableUsers.
public interface ISyncableUser
{
string Guid { get; }
string UserPrincipalName { get; }
}
The logic behind whether two ISyncableUsers are equal is conditional. The conditions center on whether each of the two properties, Guid and UserPrincipalName, has a value. The best way to explain this logic is with code. Below is my implementation of the Equals method of my custom IEqualityComparer.
public bool Equals(ISyncableUser userA, ISyncableUser userB)
{
if (userA == null && userB == null)
{
return true;
}
if (userA == null)
{
return false;
}
if (userB == null)
{
return false;
}
if ((!string.IsNullOrWhiteSpace(userA.Guid) && !string.IsNullOrWhiteSpace(userB.Guid)) &&
userA.Guid == userB.Guid)
{
return true;
}
if (UsersHaveUpn(userA, userB))
{
if (userB.UserPrincipalName.Equals(userA.UserPrincipalName, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
}
return false;
}
private bool UsersHaveUpn(ISyncableUser userA, ISyncableUser userB)
{
return !string.IsNullOrWhiteSpace(userA.UserPrincipalName)
&& !string.IsNullOrWhiteSpace(userB.UserPrincipalName);
}
The problem I'm having is implementing GetHashCode so that the conditional equality above is respected. The only way I've been able to get the Intersect and Except calls to work as expected is to simply always return the same value from GetHashCode(), forcing a call to Equals.
public int GetHashCode(ISyncableUser obj)
{
return 0;
}
This works, but the performance penalty is huge, as expected. (I've tested this with non-conditional equality. With two sets containing 50000 objects, a proper hashcode implementation allows Intersect and Except to execute in about 40ms. A hashcode implementation that always returns 0 takes approximately 144000ms (yes, 2.4 minutes!))
So, how would I go about implementing a GetHashCode() in the scenario above?
Any thoughts would be more than welcome!
If I'm reading this correctly, your equality relation is not transitive. Picture the following three ISyncableUsers:
A { Guid: "1", UserPrincipalName: "2" }
B { Guid: "2", UserPrincipalName: "2" }
C { Guid: "2", UserPrincipalName: "1" }
A == B because they have the same UserPrincipalName
B == C because they have the same Guid
A != C because they don't share either.
From the spec,
The Equals method is reflexive, symmetric, and transitive. That is, it returns true if used to compare an object with itself; true for two objects x and y if it is true for y and x; and true for two objects x and z if it is true for x and y and also true for y and z.
If your equality relation isn't consistent, there's no way you can implement a hash code that backs it up.
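The counterexample can be checked mechanically. The sketch below restates the question's rule on a hypothetical minimal User class (a stand-in for ISyncableUser, with null checks omitted) so that A, B and C can be compared directly.

```csharp
using System;

// Hypothetical stand-in for ISyncableUser with just the two properties the rule uses.
public sealed class User
{
    public string Guid { get; }
    public string UserPrincipalName { get; }
    public User(string guid, string upn) { Guid = guid; UserPrincipalName = upn; }
}

public static class ConditionalEquality
{
    // Same rule as the question's Equals: equal if both have the same non-blank Guid,
    // or both have the same non-blank UPN (compared case-insensitively).
    public static bool AreEqual(User a, User b)
    {
        bool sameGuid = !string.IsNullOrWhiteSpace(a.Guid) && !string.IsNullOrWhiteSpace(b.Guid)
                        && a.Guid == b.Guid;
        bool sameUpn = !string.IsNullOrWhiteSpace(a.UserPrincipalName)
                       && !string.IsNullOrWhiteSpace(b.UserPrincipalName)
                       && string.Equals(a.UserPrincipalName, b.UserPrincipalName,
                                        StringComparison.InvariantCultureIgnoreCase);
        return sameGuid || sameUpn;
    }
}
```

With A = new User("1", "2"), B = new User("2", "2") and C = new User("2", "1"), AreEqual(A, B) and AreEqual(B, C) return true while AreEqual(A, C) returns false.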
From another point of view: you're essentially looking for three functions:
G mapping GUIDs to ints (if you know the GUID but the UPN is blank)
U mapping UPNs to ints (if you know the UPN but the GUID is blank)
P mapping (guid, upn) pairs to ints (if you know both)
such that G(g) == U(u) == P(g, u) for all g and u. This is only possible if you ignore g and u completely.
If we suppose that your Equals implementation is correct, i.e. it's reflexive, transitive and symmetric, then the basic implementation of your GetHashCode function should look like this:
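public int GetHashCode(ISyncableUser obj)
{
if (obj == null)
{
return 0; // SOME_CONSTANT: any fixed value works for null
}
// CanShareUpnAcrossGuids is a hypothetical placeholder for the condition
// <can have user object with different guid and the same name>.
if (!string.IsNullOrWhiteSpace(obj.UserPrincipalName) &&
CanShareUpnAcrossGuids(obj))
{
// Must match the case-insensitive UPN comparison used in Equals.
return StringComparer.InvariantCultureIgnoreCase.GetHashCode(obj.UserPrincipalName);
}
// Guid may be null or blank; hash it without throwing.
return (obj.Guid ?? string.Empty).GetHashCode();
}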
You should also understand that you've got rather intricate dependencies between your objects.
Indeed, take two ISyncableUser objects 'u1' and 'u2' such that u1.Guid != u2.Guid, but u1.UserPrincipalName == u2.UserPrincipalName and the names are not empty. The requirements of equality then impose that for any ISyncableUser object 'u' such that u.Guid == u1.Guid, the condition u.UserPrincipalName == u1.UserPrincipalName must also hold. This reasoning dictates the GetHashCode implementation: for each user object, it should be based either on its name or on its GUID.
One way would be to maintain a dictionary of hashcodes for usernames and GUIDS.
You could generate this dictionary once at the start for all users, which would probably be the cleanest solution.
You could add or update an entry in the constructor of each user.
Or, you could maintain that dictionary inside the GetHashCode function. This means your GetHashCode function has more work to do and is not free of side effects. Getting this to work with multiple threads or parallel LINQ will need some more careful work, so I don't know whether I would recommend this approach.
Nevertheless, here is my attempt:
private Dictionary<string, int> _guidHash =
new Dictionary<string, int>();
// Same case-insensitive comparison as the UPN check in Equals.
private Dictionary<string, int> _nameHash =
new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
public int GetHashCode(ISyncableUser obj)
{
int hash = 0;
if (obj==null) return hash;
if (!String.IsNullOrWhiteSpace(obj.Guid)
&& _guidHash.TryGetValue(obj.Guid, out hash))
return hash;
if (!String.IsNullOrWhiteSpace(obj.UserPrincipalName)
&& _nameHash.TryGetValue(obj.UserPrincipalName, out hash))
return hash;
hash = System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj);
// or use some other method to generate a unique hashcode here
if (!String.IsNullOrWhiteSpace(obj.Guid))
_guidHash.Add(obj.Guid, hash);
if (!String.IsNullOrWhiteSpace(obj.UserPrincipalName))
_nameHash.Add(obj.UserPrincipalName, hash);
return hash;
}
Note that this will fail if the ISyncableUser objects do not play nice and exhibit cases like in Rawling's answer. I am assuming that users with the same GUID will have the same name or no name at all, and users with the same principalName have the same GUID or no GUID at all. (I think the given Equals implementation has the same limitations)

How to index the Values property of C# Dictionary

Using the Values property of C# Dictionary,
var myDict = new Dictionary<string, object>();
How would I access the values in
myDict.Values
by index? I tried
var theValues = myDict.Values;
object obj = theValues[0];
But that is a compile error.
Added: I am trying to compare the values in two dictionaries that have the same keys.
You can't. The values do not have a fixed order. You could write the values into a new List<object> and index them there, but obviously that's not terribly helpful if the dictionary's contents change frequently.
You can also use LINQ, myDict.Values.ElementAt(0), but:
The elements can change position as the dictionary grows.
It's really inefficient, since it's just calling foreach on the Values collection for the given number of iterations.
You could also use SortedList<TKey, TValue>. It keeps the entries ordered by key, which may or may not be what you want, and it lets you access the values by key or by index. It has poor performance characteristics in certain scenarios, however (inserting or removing entries is generally O(n), because they are stored in sorted arrays), so be careful about that!
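A sketch of the SortedList<TKey, TValue> option, with made-up keys and values (SortedListDemo is a hypothetical name): its Keys and Values properties are IList<T>, so positional indexing compiles, and positions follow key order.

```csharp
using System.Collections.Generic;

public static class SortedListDemo
{
    public static string DescribeByIndex(int index)
    {
        var byKey = new SortedList<string, object> { ["banana"] = 2, ["apple"] = 1, ["cherry"] = 3 };
        // Keys and Values are IList<T>, so indexing works (unlike Dictionary's ValueCollection).
        return byKey.Keys[index] + "=" + byKey.Values[index];
    }
}
```

DescribeByIndex(0) returns "apple=1", because the entries are stored in key order regardless of insertion order.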
Here's a LINQ solution to determine whether the values for matching keys also match. This query syntax only works if you're using the default equality comparer for the key type; if you're using a custom equality comparer, you can do the same with method call syntax (Join has an overload that takes an IEqualityComparer<TKey>).
IEnumerable<bool> matches =
from pair1 in dict1
join pair2 in dict2
on pair1.Key equals pair2.Key
select object.Equals(pair1.Value, pair2.Value); // null-safe value comparison
bool allValuesMatch = matches.All(match => match);
If you require that every key in one dictionary has a matching key in the other, you could do this:
bool allKeysMatch = new HashSet<string>(dict1.Keys).SetEquals(dict2.Keys); // string keys, as in the question
bool dictionariesMatch = allKeysMatch && allValuesMatch;
Well, you could use Enumerable.ElementAt if you really had to, but you shouldn't expect the order to be stable or meaningful. Alternatively, call ToArray or ToList to take a copy.
Usually you only use Values if you're going to iterate over them. What exactly are you trying to do here? Do you understand that the order of entries in a Dictionary<,> is undefined?
EDIT: It sounds like you want something like:
var equal = dict1.Count == dict2.Count &&
dict1.Keys.All(key => ValuesEqual(key, dict1, dict2));
...
private static bool ValuesEqual<TKey, TValue>(TKey key,
IDictionary<TKey, TValue> dict1,
IDictionary<TKey, TValue> dict2)
{
TValue value1, value2;
return dict1.TryGetValue(key, out value1) && dict2.TryGetValue(key, out value2) &&
EqualityComparer<TValue>.Default.Equals(value1, value2);
}
EDIT: Note that this isn't as fast as it could be, because it performs lookups on both dictionaries. This would be more efficient, but less elegant IMO:
var equal = dict1.Count == dict2.Count &&
dict1.All(pair => ValuesEqual(pair.Key, pair.Value, dict2));
...
private static bool ValuesEqual<TKey, TValue>(TKey key, TValue value1,
IDictionary<TKey, TValue> dict2)
{
TValue value2;
return dict2.TryGetValue(key, out value2) &&
EqualityComparer<TValue>.Default.Equals(value1, value2);
}
To add to @JonSkeet's answer: Dictionary<TKey, TValue> is backed by a hash table, which is an unordered data structure. The index of a value is therefore meaningless; the order is unspecified, so getting, say, A,B,C from one enumeration and C,B,A from the next would be perfectly valid.
EDIT:
Based on the comment you made on Jon Skeet's answer ("I am trying to compare the values in two dictionaries with the same keys"), you want something like this:
public static bool DictionariesContainSameKeysAndValues<TKey, TValue>(Dictionary<TKey, TValue> dict1, Dictionary<TKey, TValue> dict2) {
if (dict1.Count != dict2.Count) return false;
foreach (var key1 in dict1.Keys)
if (!dict2.ContainsKey(key1) || !EqualityComparer<TValue>.Default.Equals(dict2[key1], dict1[key1]))
return false;
return true;
}
You could use an indexer property to look up a value by its string key.
It is still not positional indexing, but it is one more way:
using System;
using System.Collections.Generic;
...
class Client
{
private Dictionary<string, yourObject> yourDict
= new Dictionary<string, yourObject>();
public void Add (string id, yourObject value)
{ yourDict.Add (id, value); }
public yourObject this [string id] // indexer: returns the stored object, not a string
{
get { return yourDict[id]; }
set { yourDict[id] = value; }
}
}
public class Test
{
public static void Main( )
{
Client client = new Client();
client.Add("A1", new yourObject() { Name = "Bla" /* , ... */ });
Console.WriteLine ("Your result: " + client["A1"]); // indexer access
}
}

Best way to compare two Dictionary<T> for equality

Is this the best way to create a comparer for the equality of two dictionaries? This needs to be exact. Note that Entity.Columns is a dictionary of KeyValuePair<string, object>:
public class EntityColumnCompare : IEqualityComparer<Entity>
{
public bool Equals(Entity a, Entity b)
{
var aCol = a.Columns.OrderBy(KeyValuePair => KeyValuePair.Key);
var bCol = b.Columns.OrderBy(KeyValuePAir => KeyValuePAir.Key);
if (aCol.SequenceEqual(bCol))
return true;
else
return false;
}
public int GetHashCode(Entity obj)
{
return obj.Columns.GetHashCode();
}
}
Also not too sure about the GetHashCode implementation.
Thanks!
Here's what I would do:
public bool Equals(Entity a, Entity b)
{
if (a.Columns.Count != b.Columns.Count)
return false; // Different number of items
foreach(var kvp in a.Columns)
{
object bValue;
if (!b.Columns.TryGetValue(kvp.Key, out bValue))
return false; // key missing in b
if (!Equals(kvp.Value, bValue))
return false; // value is different
}
return true;
}
That way you don't need to sort the entries (an O(n log n) operation): you only need to enumerate the entries in the first dictionary (O(n)) and try to retrieve each value by key in the second dictionary (O(1) per lookup), so the overall complexity is O(n).
Also, note that your GetHashCode method is incorrect: in most cases it will return different values for different dictionary instances, even if they have the same content. And if the hashcode is different, Equals will never be called... You have several options to implement it correctly, none of them ideal:
build the hashcode from the content of the dictionary: would be the best option, but it's slow, and GetHashCode needs to be fast
always return the same value, that way Equals will always be called: very bad if you want to use this comparer in a hashtable/dictionary/hashset, because all instances will fall in the same bucket, resulting in O(n) access instead of O(1)
return the Count of the dictionary (as suggested by digEmAll): it won't give a great distribution, but still better than always returning the same value, and it satisfies the constraint for GetHashCode (i.e. objects that are considered equal should have the same hashcode; two "equal" dictionaries have the same number of items, so it works)
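The first option could be sketched as below for the question's Dictionary<string, object> columns. It assumes the dictionaries use the default string key comparer and that every value's GetHashCode agrees with its Equals; ColumnsHash is a hypothetical name. Each call is O(n), which is the cost mentioned above.

```csharp
using System.Collections.Generic;

public static class ColumnsHash
{
    // Content-based hash consistent with an Equals that compares keys with the default
    // string comparer and values with object.Equals.
    public static int ContentHash(IDictionary<string, object> columns)
    {
        unchecked
        {
            int hash = columns.Count;
            foreach (KeyValuePair<string, object> pair in columns)
            {
                // Combine key and value into one entry hash (null values hash to 0).
                int entryHash = pair.Key.GetHashCode() * 31 + (pair.Value?.GetHashCode() ?? 0);
                // Addition is commutative, so the enumeration order doesn't matter.
                hash += entryHash;
            }
            return hash;
        }
    }
}
```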
Something like this comes to mind, but there might be something more efficient:
public static bool Equals<TKey, TValue>(IDictionary<TKey, TValue> x,
IDictionary<TKey, TValue> y)
{
return x.Count == y.Count &&
x.Keys.Intersect(y.Keys).Count() == x.Keys.Count &&
x.Keys.All(key => Object.Equals(x[key], y[key]));
}
It seems good to me; perhaps not the fastest, but it works.
You just need to fix the GetHashCode implementation, which is wrong.
For example, you could return obj.Columns.Count.GetHashCode().
