Best way to compare two Dictionary<T> for equality - c#

Is this the best way to create a comparer for the equality of two dictionaries? This needs to be exact. Note that Entity.Columns is a dictionary of KeyValuePair(string, object) :
public class EntityColumnCompare : IEqualityComparer<Entity>
{
public bool Equals(Entity a, Entity b)
{
var aCol = a.Columns.OrderBy(KeyValuePair => KeyValuePair.Key);
var bCol = b.Columns.OrderBy(KeyValuePAir => KeyValuePAir.Key);
if (aCol.SequenceEqual(bCol))
return true;
else
return false;
}
public int GetHashCode(Entity obj)
{
return obj.Columns.GetHashCode();
}
}
Also not too sure about the GetHashCode implementation.
Thanks!

Here's what I would do:
public bool Equals(Entity a, Entity b)
{
if (a.Columns.Count != b.Columns.Count)
return false; // Different number of items
foreach(var kvp in a.Columns)
{
object bValue;
if (!b.Columns.TryGetValue(kvp.Key, out bValue))
return false; // key missing in b
if (!Equals(kvp.Value, bValue))
return false; // value is different
}
return true;
}
That way you don't need to order the entries (which is an O(n log n) operation): you only need to enumerate the entries in the first dictionary (O(n)) and try to retrieve values by key in the second dictionary (O(1) per lookup), so the overall complexity is O(n).
Also, note that your GetHashCode method is incorrect: in most cases it will return different values for different dictionary instances, even if they have the same content. And if the hashcode is different, Equals will never be called... You have several options to implement it correctly, none of them ideal:
build the hashcode from the content of the dictionary: would be the best option, but it's slow, and GetHashCode needs to be fast
always return the same value, that way Equals will always be called: very bad if you want to use this comparer in a hashtable/dictionary/hashset, because all instances will fall in the same bucket, resulting in O(n) access instead of O(1)
return the Count of the dictionary (as suggested by digEmAll): it won't give a great distribution, but still better than always returning the same value, and it satisfies the constraint for GetHashCode (i.e. objects that are considered equal should have the same hashcode; two "equal" dictionaries have the same number of items, so it works)
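As a rough illustration of the first option (building the hash code from the content), here is a minimal sketch. The class name DictionaryContentComparer is hypothetical, and it assumes both dictionaries use the default key comparer and that the values have sensible Equals/GetHashCode implementations. The per-entry hashes are summed so that the result does not depend on enumeration order.

```csharp
using System.Collections.Generic;

// Hypothetical comparer: treats two dictionaries as equal when they hold the same key/value pairs.
public sealed class DictionaryContentComparer : IEqualityComparer<IDictionary<string, object>>
{
    public bool Equals(IDictionary<string, object> a, IDictionary<string, object> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null || a.Count != b.Count) return false;

        foreach (var kvp in a)
        {
            if (!b.TryGetValue(kvp.Key, out object bValue)) return false; // key missing in b
            if (!object.Equals(kvp.Value, bValue)) return false;          // value differs
        }
        return true;
    }

    public int GetHashCode(IDictionary<string, object> obj)
    {
        if (obj == null) return 0;

        int hash = obj.Count;
        foreach (var kvp in obj)
        {
            int keyHash = kvp.Key.GetHashCode();
            int valueHash = kvp.Value == null ? 0 : kvp.Value.GetHashCode();
            // Addition is commutative, so the enumeration order does not matter.
            hash = unchecked(hash + ((keyHash * 397) ^ valueHash));
        }
        return hash;
    }
}
```

This walks the whole dictionary on every call, so it trades speed for a better distribution than returning Count alone.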

Something like this comes to mind, but there might be something more efficient:
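public static bool Equals<TKey, TValue>(IDictionary<TKey, TValue> x,
IDictionary<TKey, TValue> y)
{
// Same number of entries, every key of x also in y, and equal values per key.
// Count() is the LINQ extension method; Intersect returns an IEnumerable, which has no Count property.
return x.Count == y.Count &&
x.Keys.Intersect(y.Keys).Count() == x.Keys.Count &&
x.Keys.All(key => Object.Equals(x[key], y[key]));
}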

It seems good to me, perhaps not the fastest but working.
You just need to change the GetHashCode implementation that is wrong.
For example you could return obj.Columns.Count.GetHashCode()

Related

Creating a custom equality comparer for IEnumerables<T> when T is IEnumerable

I want to have a custom equality comparer for IEnumerables. Using @PaulZahra's code, I created the following class:
class CustomEqualityComparer<T> : IEqualityComparer<IEnumerable<T>>
{
public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
{
var enumerables = new Dictionary<T, uint>();
foreach (T item in x)
{
enumerables.Add(item, 1);
}
foreach (T item in y)
{
if (enumerables.ContainsKey(item))
{
enumerables[item]--;
}
else
{
return false;
}
}
return enumerables.Values.All(v => v == 0);
}
public int GetHashCode(IEnumerable<T> obj) => obj.GetHashCode();
}
The problem is that if T itself is an IEnumerable, then ContainsKey will check for reference equality, while the point of this equality comparer is to check for value equality at any given depth.
I thought to use .Keys.Contains() instead, since it can accept an IEqualityComparer as an argument:
if (enumerables.Keys.Contains(item, this)) // not sure if "this" or a new object
but I get the following error (CS1929):
'Dictionary.KeyCollection' does not contain a definition for 'Contains' and the best extension method overload 'Queryable.Contains(IQueryable, T, IEqualityComparer)' requires a receiver of type 'IQueryable'
I am not sure how to approach this. How to fix it? Thanks.
Edit: Note that this comparer doesn't care about order.
As others have mentioned, IEnumerable<T> can enumerate forever, so it's dangerous to do this on that interface. I'd recommend using ICollection<T> instead: it has a fixed size, and you'll find it works for almost any type you'd want to use anyway.
Furthermore, I'd recommend using TryGetValue to reduce the number of times you need to look up into the dictionary.
Your code is not correctly keeping the count of each item in the first enumerable.
GetHashCode needs to take into account every member of the enumerable.
All that being said, here is an adjustment of your implementation:
class CustomEqualityComparer<T> : IEqualityComparer<ICollection<T>>
{
public bool Equals(ICollection<T> x, ICollection<T> y)
{
if (x.Count != y.Count) return false;
var enumerables = new Dictionary<T, uint>(x.Count);
foreach (T item in x)
{
enumerables.TryGetValue(item, out var value);
enumerables[item] = value + 1;
}
foreach (T item in y)
{
var success = enumerables.TryGetValue(item, out var value);
if (success)
{
enumerables[item] = value - 1;
}
else
{
return false;
}
}
return enumerables.Values.All(v => v == 0);
}
public int GetHashCode(ICollection<T> obj)
{
unchecked
{
var hashCode = 0;
foreach(var item in obj)
{
hashCode += (item != null ? item.GetHashCode() : 0);
}
return hashCode;
}
}
}
To get such a recursive comparer, you simply need to pass a proper comparer to the Dictionary if T is enumerable. I think getting the type T from IEnumerable<T> and then doing the equivalent of new Dictionary<U, uint>(new CustomEqualityComparer<U>()) (see "Create instance of generic type?") should achieve what you want.
Notes:
you must provide a correct implementation of GetHashCode that matches Equals if you use the comparer for any dictionary/HashSet. The default Equals for sequences is a reference comparison, which does not align with your Equals. Note that most implementations of GetHashCode depend on the order of the items in the collection, so you need to find one that works for sets. For example, the sum of the hash codes of each item would do, though it probably makes the distribution slightly worse.
you may want to use LINQ set operations instead of doing them by hand. All operations like Distinct already take comparers. For "sets are the same" you can use Except in both directions, e.g. !x.Except(y, comparerBuiltViaReflection).Any()
Beware of the limitations of such code: not every enumerable can be enumerated more than once (user input, network streams, ...), or it may produce a different result on re-iteration (while(count < 10){ count ++; yield return random.Next(); }); the cost of re-iteration may be significant (re-reading all lines of a huge file on each iteration); or the enumerable can represent an infinite sequence (while(true){ yield return count++; }).
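One way to get the nested behaviour without reflection is to let the caller supply the element comparer. The sketch below is an assumption-laden illustration, not the code from either answer: MultisetComparer is a hypothetical name, elements are assumed to be non-null, and nesting relies on IEqualityComparer<in T> being contravariant, so a comparer for ICollection<int> can serve as a comparer for List<int>.

```csharp
using System.Collections.Generic;

// Hypothetical order-insensitive comparer that delegates element equality to an inner comparer.
public sealed class MultisetComparer<T> : IEqualityComparer<ICollection<T>>
{
    private readonly IEqualityComparer<T> _elementComparer;

    public MultisetComparer(IEqualityComparer<T> elementComparer = null)
    {
        _elementComparer = elementComparer ?? EqualityComparer<T>.Default;
    }

    public bool Equals(ICollection<T> x, ICollection<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Count != y.Count) return false;

        // Count occurrences in x, using the element comparer for key equality.
        var counts = new Dictionary<T, int>(_elementComparer);
        foreach (T item in x)
        {
            counts.TryGetValue(item, out int n);
            counts[item] = n + 1;
        }

        // Consume one occurrence per item of y; fail on an unknown or exhausted item.
        foreach (T item in y)
        {
            if (!counts.TryGetValue(item, out int n) || n == 0) return false;
            counts[item] = n - 1;
        }
        return true; // sizes match and every item of y was paired with one of x
    }

    public int GetHashCode(ICollection<T> obj)
    {
        if (obj == null) return 0;
        int hash = 0;
        foreach (T item in obj)
            hash = unchecked(hash + _elementComparer.GetHashCode(item)); // order-independent sum
        return hash;
    }
}

public static class MultisetDemo
{
    public static bool NestedListsMatch()
    {
        var inner = new MultisetComparer<int>();
        var outer = new MultisetComparer<List<int>>(inner); // contravariant conversion
        var a = new List<List<int>> { new List<int> { 1, 2 }, new List<int> { 3 } };
        var b = new List<List<int>> { new List<int> { 3 }, new List<int> { 2, 1 } };
        return outer.Equals(a, b);
    }
}
```

Each extra level of nesting is handled by wrapping one more comparer around the previous one.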

ICollection - check if a collection contains an object

Knowing that the non-generic ICollection doesn't offer a Contains method, what's the best way to check if a given object already is in a collection?
If I had two ICollections: A and B and wanted to check if B has all elements of A, what would be the best way to accomplish that? My first thought is adding all elements of A to a HashSet and then checking if all B's elements are in the set using Contains.
If I had two ICollections A and B and wanted to check if B has all elements of A, what would be the best way to accomplish that?
Let me rephrase your question in the language of sets.
If I had two sets A and B and wanted to check if A is a subset of B, what would be the best way to accomplish that?
Now it becomes easy to see the answer:
https://msdn.microsoft.com/en-us/library/bb358446%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396
Construct a HashSet<T> from A and then use the IsSubsetOf method to see if A is a subset of B.
I note that if these are the sorts of operations you must perform frequently, then you should keep your data in HashSet<T> collections to begin with. The IsSubsetOf operation is possibly more efficient if both collections are hash sets.
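For the non-generic ICollection from the question, a minimal sketch of this might look as follows. The class and method names are hypothetical, and Cast<object>() is used only to bridge from the non-generic interface to HashSet<object>; element equality then falls back to object.Equals.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class SubsetCheck
{
    // Returns true when every element of a also appears in b (A is a subset of B).
    public static bool IsSubset(ICollection a, ICollection b)
    {
        var setA = new HashSet<object>(a.Cast<object>());
        return setA.IsSubsetOf(b.Cast<object>());
    }
}
```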
A and B and wanted to check if B has all elements of A
I think you have it backwards. Add the B to the HashSet.
HashSet.Contains is O(1)
Overall it will be O(n + m)
Going to assume string
HashSet<string> HashSetB = new HashSet<string>(iCollecionB);
foreach (string s in iCollecionA)
{
if(HashSetB.Contains(s))
{
// s is in B
}
else
{
// s is missing from B, so B does not have all elements of A
}
}
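Boolean ICollectionContains(ICollection collection, Object item)
{
foreach (Object o in collection)
{
// Object.Equals compares by value where the type supports it; == on Object is reference equality only
if (Object.Equals(o, item))
return true;
}
return false;
}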
Or in extension form:
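public static class CollectionExtensions
{
public static Boolean Contains(this ICollection collection, Object item)
{
foreach (Object o in collection)
{
// Object.Equals compares by value where the type supports it; == on Object is reference equality only
if (Object.Equals(o, item))
return true;
}
return false;
}
}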
With usage:
ICollection turboEncabulators = GetSomeTrunnions();
if (turboEncabulators.Contains(me))
Environment.FailFast("How did you find me!"); // FailFast requires a message argument

Getting List.Join to compare properly

I am trying to create a list by joining two lists if a property matches correctly. I am using the following command:
FooList = TrackedStrings.Join (FooList,
str => str,
Foo => Foo.GetString (),
(str, Foo) => Foo,
new Comparer ())
.ToList ();
And the following class to compare:
public class Comparer : IEqualityComparer<string>
{
public bool Equals (string x, string y)
{
return y.Contains (x);
}
public int GetHashCode (string str)
{
return str.GetHashCode ();
}
}
Now, the idea is that I only want to keep the items that have a GetString () containing any one of the strings from TrackedStrings. However, it doesn't work: the comparer only returns true if the strings are equal. For example, let's say that we have two lists:
List<string> TrackedActions = new List<string> { "Created", "Deleted" };
List<Foo> FooList = new List<Foo> { new Foo ("Created"), new Foo ("Deleted Something")};
With the current command, the second Foo is dropped from the list - instead of matching to TrackedActions[1] and being kept.
Thus, my question is: Why is Comparer not working the way I expect it to?
You should not use IEqualityComparer here, because "The Equals method is reflexive, symmetric, and transitive" (MSDN).
In your case it's not symmetric: Equals(a, b) != Equals(b, a).
Glorfindel's answer is not entirely correct either, because it's not transitive:
Equals("abcd","bc") == true
Equals("bcde", "bc") == true
Equals("abcd","bcde") == false
A custom comparer must make sure that the Equals relationship it defines is symmetric. This means that whenever x.Equals(y), y.Equals(x) and vice versa.
The reason for this is that you can never predict in which order the elements are compared, i.e. which one of these is called:
aStringFromLeftList.Equals(aStringFromRightList)
or
aStringFromRightList.Equals(aStringFromLeftList)
Because the relationship you need is neither symmetric nor transitive, you can't use a Comparer for your problem.
Your comparer fails because of its GetHashCode() implementation, regardless of the right way to implement IEqualityComparer.
The match is done in two steps:
1. Compare the hash codes of the 2 strings. In your case "Deleted Something" almost certainly returns a different hash code from "Deleted".
2. If the hash codes in (1) are equal, call Equals() to compare again, because hash codes are fast but can collide and so are not conclusive.
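If the goal is only to keep the items whose GetString() contains one of the tracked strings, a plain filter avoids the equality contract altogether. The following is a minimal sketch under assumptions: the Foo class here is a hypothetical stand-in for the asker's type, and FooFilter/KeepTracked are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Foo type from the question.
public class Foo
{
    private readonly string _text;
    public Foo(string text) { _text = text; }
    public string GetString() => _text;
}

public static class FooFilter
{
    // Keeps every Foo whose string contains at least one tracked string.
    public static List<Foo> KeepTracked(IEnumerable<Foo> foos, IReadOnlyCollection<string> tracked)
    {
        return foos
            .Where(foo => tracked.Any(s => foo.GetString().Contains(s)))
            .ToList();
    }
}
```

This is O(n * m) in the sizes of the two lists, which is the price of a "contains" match that cannot be hashed.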

Implementing GetHashCode for IEqualityComparer<T> with conditional equality

I'm wondering if anyone has any suggestions for this problem.
I'm using Intersect and Except (LINQ) with a custom IEqualityComparer in order to query the set differences and set intersections of two sequences of ISyncableUsers.
public interface ISyncableUser
{
string Guid { get; }
string UserPrincipalName { get; }
}
The logic behind whether two ISyncableUsers are equal is conditional. The conditions center around whether either of the two properties, Guid and UserPrincipalName, have values. The best way to explain this logic is with code. Below is my implementation of the Equals method of my custom IEqualityComparer.
public bool Equals(ISyncableUser userA, ISyncableUser userB)
{
if (userA == null && userB == null)
{
return true;
}
if (userA == null)
{
return false;
}
if (userB == null)
{
return false;
}
if ((!string.IsNullOrWhiteSpace(userA.Guid) && !string.IsNullOrWhiteSpace(userB.Guid)) &&
userA.Guid == userB.Guid)
{
return true;
}
if (UsersHaveUpn(userA, userB))
{
if (userB.UserPrincipalName.Equals(userA.UserPrincipalName, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
}
return false;
}
private bool UsersHaveUpn(ISyncableUser userA, ISyncableUser userB)
{
return !string.IsNullOrWhiteSpace(userA.UserPrincipalName)
&& !string.IsNullOrWhiteSpace(userB.UserPrincipalName);
}
The problem I'm having is with implementing GetHashCode so that the conditional equality shown above is respected. The only way I've been able to get the Intersect and Except calls to work as expected is to simply always return the same value from GetHashCode(), forcing a call to Equals.
public int GetHashCode(ISyncableUser obj)
{
return 0;
}
This works but the performance penalty is huge, as expected. (I've tested this with non-conditional equality. With two sets containing 50000 objects, a proper hashcode implementation allows execution of Intersect and Except in about 40ms. A hashcode implementation that always returns 0 takes approximately 144000ms (yes, 2.4 minutes!))
So, how would I go about implementing a GetHashCode() in the scenario above?
Any thoughts would be more than welcome!
If I'm reading this correctly, your equality relation is not transitive. Picture the following three ISyncableUsers:
A { Guid: "1", UserPrincipalName: "2" }
B { Guid: "2", UserPrincipalName: "2" }
C { Guid: "2", UserPrincipalName: "1" }
A == B because they have the same UserPrincipalName
B == C because they have the same Guid
A != C because they don't share either.
From the spec,
The Equals method is reflexive, symmetric, and transitive. That is, it returns true if used to compare an object with itself; true for two objects x and y if it is true for y and x; and true for two objects x and z if it is true for x and y and also true for y and z.
If your equality relation isn't consistent, there's no way you can implement a hash code that backs it up.
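The three-user counterexample can be checked mechanically. This is a small sketch with hypothetical names (User, TransitivityDemo, Same); Same condenses the rule from the question's Equals and does not handle null users.

```csharp
using System;

// Hypothetical concrete type standing in for an ISyncableUser implementation.
public sealed class User
{
    public string Guid { get; set; }
    public string UserPrincipalName { get; set; }
}

public static class TransitivityDemo
{
    // Equal when both GUIDs are present and match, or both UPNs are present and match (ignoring case).
    public static bool Same(User a, User b)
    {
        bool bothGuids = !string.IsNullOrWhiteSpace(a.Guid) && !string.IsNullOrWhiteSpace(b.Guid);
        if (bothGuids && a.Guid == b.Guid) return true;

        bool bothUpns = !string.IsNullOrWhiteSpace(a.UserPrincipalName)
                        && !string.IsNullOrWhiteSpace(b.UserPrincipalName);
        return bothUpns && string.Equals(a.UserPrincipalName, b.UserPrincipalName,
                                         StringComparison.InvariantCultureIgnoreCase);
    }

    // True when A equals B and B equals C, yet A does not equal C.
    public static bool IsBroken()
    {
        var a = new User { Guid = "1", UserPrincipalName = "2" };
        var b = new User { Guid = "2", UserPrincipalName = "2" };
        var c = new User { Guid = "2", UserPrincipalName = "1" };
        return Same(a, b) && Same(b, c) && !Same(a, c);
    }
}
```

Any hash function consistent with this rule would have to give A, B and C the same hash code, and by chaining such examples, every user.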
From another point of view: you're essentially looking for three functions:
G mapping GUIDs to ints (if you know the GUID but the UPN is blank)
U mapping UPNs to ints (if you know the UPN but the GUID is blank)
P mapping (guid, upn) pairs to ints (if you know both)
such that G(g) == U(u) == P(g, u) for all g and u. This is only possible if you ignore g and u completely.
If we suppose that your Equals implementation is correct, i.e. it's reflexive, transitive and symmetric, then the basic implementation for your GetHashCode function should look like this:
public int GetHashCode(ISyncableUser obj)
{
if (obj == null)
{
return SOME_CONSTANT;
}
if (!string.IsNullOrWhiteSpace(obj.UserPrincipalName) &&
<can have user object with different guid and the same name>)
{
return GetHashCode(obj.UserPrincipalName);
}
return GetHashCode(obj.Guid);
}
You should also understand that you've got rather intricate dependencies between your objects.
Indeed, let's take two ISyncableUser objects, 'u1' and 'u2', such that u1.Guid != u2.Guid but u1.UserPrincipalName == u2.UserPrincipalName, and the names are not empty. The requirements for equality impose that for any 'ISyncableUser' object 'u' such that u.Guid == u1.Guid, the condition u.UserPrincipalName == u1.UserPrincipalName must also be true. This reasoning dictates the GetHashCode implementation: for each user object it should be based either on its name or on its guid.
One way would be to maintain a dictionary of hashcodes for usernames and GUIDS.
You could generate this dictionary at the start once for all users, which would probably be the cleanest solution.
You could add or update an entry in the Constructor of each user.
Or, you could maintain that dictionary inside the GetHashCode function. This means your GetHashCode function has more work to do and is not free of side-effects. Getting this to work with multiple threads or parallel-linq will need some more careful work. So I don't know whether I would recommend this approach.
Nevertheless, here is my attempt:
private Dictionary<string, int> _guidHash =
new Dictionary<string, int>();
private Dictionary<string, int> _nameHash =
new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
public int GetHashCode(ISyncableUser obj)
{
int hash = 0;
if (obj==null) return hash;
if (!String.IsNullOrWhiteSpace(obj.Guid)
&& _guidHash.TryGetValue(obj.Guid, out hash))
return hash;
if (!String.IsNullOrWhiteSpace(obj.UserPrincipalName)
&& _nameHash.TryGetValue(obj.UserPrincipalName, out hash))
return hash;
// RuntimeHelpers is in the System.Runtime.CompilerServices namespace
hash = RuntimeHelpers.GetHashCode(obj);
// or use some other method to generate a unique hashcode here
if (!String.IsNullOrWhiteSpace(obj.Guid))
_guidHash.Add(obj.Guid, hash);
if (!String.IsNullOrWhiteSpace(obj.UserPrincipalName))
_nameHash.Add(obj.UserPrincipalName, hash);
return hash;
}
Note that this will fail if the ISyncableUser objects do not play nice and exhibit cases like in Rawling's answer. I am assuming that users with the same GUID will have the same name or no name at all, and users with the same principalName have the same GUID or no GUID at all. (I think the given Equals implementation has the same limitations)

How do I use HashSet<T> as a dictionary key?

I wish to use HashSet<T> as the key to a Dictionary:
Dictionary<HashSet<T>, TValue> myDictionary = new Dictionary<HashSet<T>, TValue>();
I want to look up values from the dictionary such that two different instances of HashSet<T> that contain the same items will return the same value.
HashSet<T>'s implementations of Equals() and GetHashCode() don't seem to do this (I think they're just the defaults). I can override Equals() to use SetEquals() but what about GetHashCode()? I feel like I am missing something here...
You could use the set comparer provided by HashSet<T>:
var myDictionary = new Dictionary<HashSet<T>, TValue>(HashSet<T>.CreateSetComparer());
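A short usage sketch of that built-in comparer, with int elements and string values chosen arbitrarily for illustration (SetKeyDemo and Lookup are hypothetical names):

```csharp
using System.Collections.Generic;

public static class SetKeyDemo
{
    public static string Lookup()
    {
        // Keys are compared by set content rather than by reference.
        var byMembers = new Dictionary<HashSet<int>, string>(HashSet<int>.CreateSetComparer());
        byMembers[new HashSet<int> { 1, 2, 3 }] = "first";

        // A different instance with the same elements finds the same entry.
        return byMembers[new HashSet<int> { 3, 2, 1 }];
    }
}
```

Note that a set should not be modified while it is in use as a key, because its hash code would change and the entry could no longer be found.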
digEmAll's answer is clearly the better choice in practice, since it uses built in code instead of reinventing the wheel. But I'll leave this as a sample implementation.
You can implement an IEqualityComparer<HashSet<T>> that uses SetEquals, then pass it to the constructor of the Dictionary. Something like the following (untested):
class HashSetEqualityComparer<T>: IEqualityComparer<HashSet<T>>
{
public int GetHashCode(HashSet<T> hashSet)
{
if(hashSet == null)
return 0;
int h = 0x14345843; //some arbitrary number
foreach(T elem in hashSet)
{
h = unchecked(h + hashSet.Comparer.GetHashCode(elem));
}
return h;
}
public bool Equals(HashSet<T> set1, HashSet<T> set2)
{
if(set1 == set2)
return true;
if(set1 == null || set2 == null)
return false;
return set1.SetEquals(set2);
}
}
Note that the hash function here is commutative, that's important because the enumeration order of the elements in the set is undefined.
One other interesting point is that you can't just use elem.GetHashCode since that will give wrong results when a custom equality comparer was supplied to the set.
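To illustrate that last point, here is a small sketch using a case-insensitive set. The names are hypothetical; the only API assumption is that StringComparer.OrdinalIgnoreCase returns the same hash code for strings that differ only in case, which is what its equality contract requires.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerHashDemo
{
    // Sums element hashes through the set's own comparer, as in the answer's GetHashCode.
    private static int SetHash(HashSet<string> set)
    {
        int h = 0;
        foreach (string s in set)
            h = unchecked(h + set.Comparer.GetHashCode(s));
        return h;
    }

    public static bool HashesAgree()
    {
        var upper = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "ALPHA" };
        var lower = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "alpha" };

        // The sets are equal under their comparer, so their hashes must match too.
        return upper.SetEquals(lower) && SetHash(upper) == SetHash(lower);
    }
}
```

Had SetHash called s.GetHashCode() directly, "ALPHA" and "alpha" would be hashed as different strings, and two equal sets could land in different buckets.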
You can provide a IEqualityComparer<HashSet<T>> to the Dictionary constructor and make the desired implementation in that comparer.
