Example usage of HashSet<T>.CreateSetComparer - c#

I am aware of the HashSet<T>.SetEquals method, but when and how should CreateSetComparer be used?
The documentation states: "checks for equality at only one level; however, you can chain together comparers at additional levels to perform deeper equality testing"
What would be a simple example of that?
In particular, if each item in the sets I am comparing also contains a HashSet, what would be the correct usage of CreateSetComparer?
Here is my starting point. I'd like to know if the CreateSetComparer method is applicable and how to properly use it:
public class Foo : IEquatable<Foo>
{
public string Label { get; set; }
public string Value { get; set; }
public override string ToString() {return String.Format("{0}:{1}", Label, Value); }
// assume for this example that Label and Value are immutable once set;
public override int GetHashCode(){ return ToString().GetHashCode(); }
// simplified equality check; assume it meets my needs for this example;
public bool Equals(Foo other){ return String.Equals(this.ToString(), other.ToString()); }
}
public class FooGroup : IEquatable<FooGroup>
{
public int GroupIndex {get; set;}
public HashSet<Foo> FooCollection {get; set;}
// -----------------------------
// Does HashSet.CreateSetComparer somehow eliminate or simplify the following code?
// -----------------------------
public override int GetHashCode()
{
int hash = GroupIndex;
foreach(Foo f in FooCollection)
hash = hash ^ (f.GetHashCode() & 0x7FFFFFFF);
return hash;
}
public bool Equals(FooGroup other)
{
// ignore missing null checks for this example
return this.GroupIndex == other.GroupIndex && this.FooCollection.SetEquals(other.FooCollection);
}
}
public class GroupCollection : IEquatable<GroupCollection>
{
public string CollectionLabel {get; set;}
public HashSet<FooGroup> AllGroups {get; set;}
// -----------------------------
// Does HashSet.CreateSetComparer somehow eliminate or simplify the following code?
// -----------------------------
public override int GetHashCode()
{
int hash = CollectionLabel.GetHashCode();
foreach(FooGroup g in AllGroups)
hash = hash ^ (g.GetHashCode() & 0x7FFFFFFF);
return hash;
}
public bool Equals(GroupCollection other)
{
// ignore missing null checks for this example
return String.Equals(this.CollectionLabel, other.CollectionLabel) && this.AllGroups.SetEquals(other.AllGroups);
}
}
Ignoring arguments about system design and such, a simplified use-case would be: imagine I have pulled a complex set of data that looks like this:
var newSetA = new GroupCollection{ ... }
var oldSetA = new GroupCollection{ ... }
I simply want to check:
if (newSetA.Equals(oldSetA))
Process(newSetA);

Let's start with the question of when CreateSetComparer would be useful. You already have a good idea here:
In particular, if each item in the sets I am comparing also contains a HashSet, what would be the correct usage of CreateSetComparer?
For example, the following shows what happens when the outer HashSet uses its default comparer, which compares the inner sets by reference only:
var set1 = new HashSet<HashSet<int>>{
new HashSet<int>{2,3,4},
new HashSet<int>{7,8,9}
};
var set2 = new HashSet<HashSet<int>>{
new HashSet<int>{2,3,4},
new HashSet<int>{7,8,9},
};
set1.SetEquals(set2).Dump(); // false :-(
set1.SequenceEqual(set2).Dump(); // false
set1.SequenceEqual(set2, HashSet<int>.CreateSetComparer()).Dump(); // true
It's also possible to use CreateSetComparer with SetEquals, like so:
// the order of elements in the second set has been changed.
var set1 = new HashSet<HashSet<int>>(HashSet<int>.CreateSetComparer()){
new HashSet<int>{2,3,4},
new HashSet<int>{7,8,9}
};
var set2 = new HashSet<HashSet<int>>{
new HashSet<int>{7,8,9},
new HashSet<int>{2,3,4},
};
set1.SetEquals(set2).Dump(); // true :-)
set1.SequenceEqual(set2).Dump(); // false
set1.SequenceEqual(set2, HashSet<int>.CreateSetComparer()).Dump(); // false
That is the usual usage. The comparer returned by CreateSetComparer also provides GetHashCode, which you could use inside your own GetHashCode, although the result is not necessarily shorter or cleaner than what you already have:
// -----------------------------
// Does HashSet.CreateSetComparer somehow eliminate or simplify the following code?
// -----------------------------
private IEqualityComparer<HashSet<FooGroup>> _ecomparer =
HashSet<FooGroup>.CreateSetComparer();
public override int GetHashCode()
{
int hash = CollectionLabel.GetHashCode();
hash ^= _ecomparer.GetHashCode(AllGroups);
return hash;
}
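To round this out, here is a minimal sketch of how FooGroup could lean on the set comparer for both Equals and GetHashCode, so that the two stay consistent. It assumes a simplified, immutable Foo (the constructors and the tuple-based hash are my additions, not part of the question). The same pattern should carry over one level up to GroupCollection with HashSet<FooGroup>.CreateSetComparer(), because FooGroup then supplies its own equality.

```csharp
using System;
using System.Collections.Generic;

// Simplified, immutable stand-in for the question's Foo (assumption).
public sealed class Foo : IEquatable<Foo>
{
    public string Label { get; }
    public string Value { get; }

    public Foo(string label, string value) { Label = label; Value = value; }

    public bool Equals(Foo other) =>
        other != null && Label == other.Label && Value == other.Value;

    public override bool Equals(object obj) => Equals(obj as Foo);

    public override int GetHashCode() => (Label, Value).GetHashCode();
}

public sealed class FooGroup : IEquatable<FooGroup>
{
    // One shared comparer: set equality and an order-independent hash.
    private static readonly IEqualityComparer<HashSet<Foo>> SetComparer =
        HashSet<Foo>.CreateSetComparer();

    public int GroupIndex { get; }
    public HashSet<Foo> FooCollection { get; }

    public FooGroup(int groupIndex, IEnumerable<Foo> foos)
    {
        GroupIndex = groupIndex;
        FooCollection = new HashSet<Foo>(foos);
    }

    public bool Equals(FooGroup other) =>
        other != null
        && GroupIndex == other.GroupIndex
        && SetComparer.Equals(FooCollection, other.FooCollection);

    public override bool Equals(object obj) => Equals(obj as FooGroup);

    // Uses the same comparer as Equals, so equal groups hash alike.
    public override int GetHashCode() =>
        GroupIndex ^ SetComparer.GetHashCode(FooCollection);
}
```

Whether this reads better than the hand-written loop is a matter of taste; the main gain is that Equals and GetHashCode are derived from one comparer.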

I've used it to give a Dictionary "multiple" keys where the order of the keys does not matter:
var dict = new Dictionary<HashSet<int>, string>(HashSet<int>.CreateSetComparer());
dict[new HashSet<int> { 1, 2 }] = "foo";
dict[new HashSet<int> { 2, 1 }].Dump();
You can provide a nicer API by wrapping it with a params indexer:
public class MultiKeyDictionary<TKey, TValue> : IDictionary<HashSet<TKey>, TValue>
{
private readonly IDictionary<HashSet<TKey>, TValue> _dict;
public MultiKeyDictionary()
{
_dict = new Dictionary<HashSet<TKey>, TValue>(HashSet<TKey>.CreateSetComparer());
}
public TValue this[params TKey[] keys]
{
get { return _dict[new HashSet<TKey>(keys)]; }
set { _dict[new HashSet<TKey>(keys)] = value; }
}
...
}
var dict = new MultiKeyDictionary<int, string>();
dict[1, 2] = "foo";
dict[2, 1].Dump();
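Since the wrapper above elides the IDictionary members, here is a smaller self-contained sketch of the same idea that skips the interface entirely. SetKeyedLookup and its TryGetValue overload are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical minimal wrapper: keys are treated as an unordered set.
public sealed class SetKeyedLookup<TKey, TValue>
{
    private readonly Dictionary<HashSet<TKey>, TValue> _dict =
        new Dictionary<HashSet<TKey>, TValue>(HashSet<TKey>.CreateSetComparer());

    // params indexer: lookup[1, 2] and lookup[2, 1] address the same entry.
    public TValue this[params TKey[] keys]
    {
        get { return _dict[new HashSet<TKey>(keys)]; }
        set { _dict[new HashSet<TKey>(keys)] = value; }
    }

    // Non-throwing lookup; the params array has to come last.
    public bool TryGetValue(out TValue value, params TKey[] keys)
    {
        return _dict.TryGetValue(new HashSet<TKey>(keys), out value);
    }
}
```

Note that because the keys go through a HashSet, repeated keys collapse: lookup[1, 1, 2] addresses the same entry as lookup[1, 2].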

Related

HashSet does not work correctly, reason or alternatives?

I have a HashSet of errors, HashSet<Error> ErrorList.
"Error" has the properties "File" and "Block".
I fill my HashSet with a number of errors, some of which are exactly the same and therefore repeat. The HashSet accepts these duplicates without complaint. As a last attempt I created a separate list and called Distinct() on it: List<Error> noDupes = ErrorList.Distinct().ToList();
But the list remains unchanged here too. Why does neither the HashSet nor my noDupes list remove the duplicates? Are there alternative solutions?
Here's the important part of my code:
#region Properties
HashSet<Error> ErrorList { get; set; } = new HashSet<Error>();
private Stopwatch StopWatch { get; set; } = new Stopwatch();
private string CSVFile { get; set; } = null;
int n;
#endregion
ErrorList.Add(new Error
{
File = x,
Block = block
});
n = FileCall.IndexOf(i);
int p = n * 100 / FileCall.Count;
SetConsoleProgress(n.ToString("N0"), p);
}
}
int nx = 0;
List<Error> noDupes = ErrorList.Distinct().ToList();
The Error-Class:
namespace ApplicationNamespace
{
public class Error
{
public string File { set; get; }
public int Block { set; get; }
}
}
For HashSet<> or Distinct() to work, override the default Equals() and GetHashCode() implementations (as others have mentioned in the comments). You can also implement IEquatable<>, which adds a strongly typed Equals(); you still need to override GetHashCode() so that equal objects return the same hash code.
public class Error : IEquatable<Error>
{
public string File { set; get; }
public int Block { set; get; }
public bool Equals(Error other)
{
// Check whether the compared object is null.
if (Object.ReferenceEquals(other, null)) return false;
// Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, other)) return true;
// Check whether the error's properties are equal.
return File == other.File && Block == other.Block;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
return $"{Block}-{File}".GetHashCode(); // adjust this as you see fit
}
}
Reference: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.distinct?view=netcore-3.1
Remember to handle null values in the File string (you could replace null with String.Empty, for instance). It's also common to cache the hash code in a private field, so that once calculated the cached value can be returned on subsequent calls to GetHashCode(). For this you will most likely also need to make the class immutable.
(You won't have to do any of this with C# 9's record types.)
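As a rough illustration of the record remark, assuming C# 9 or later: a positional record gets value-based Equals and GetHashCode generated by the compiler, so duplicates collapse in a HashSet or in Distinct() without any hand-written code. ErrorRecord and RecordDemo are hypothetical names standing in for the Error class.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record version of Error; equality is based on File and Block.
public record ErrorRecord(string File, int Block);

public static class RecordDemo
{
    private static List<ErrorRecord> Sample() => new List<ErrorRecord>
    {
        new ErrorRecord("a.txt", 1),
        new ErrorRecord("a.txt", 1), // same values as the first entry
        new ErrorRecord("b.txt", 2),
    };

    // The HashSet drops the repeated entry on insertion.
    public static int CountInSet() => new HashSet<ErrorRecord>(Sample()).Count;

    // Distinct() relies on the same generated equality members.
    public static int CountDistinct() => Sample().Distinct().Count();
}
```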

Assert Equal two list of objects UnitTesting c#

I'm currently doing some unit testing of a copy function and I need to compare the elements of the objects between the old list, and the newly copied list.
It works fine, but I was wondering if I can do it in a way that doesn't involve a for loop.
Here is my object:
new NaturePointObject
{
SId = 1,
Name = "Test",
Category = NaturePointCategory.Category1,
CreatorType = CreatorTypeEnum.1,
NaturR = NaturR.Bn,
Description = "Test",
Kumulation = Kumulation.EnEjendom,
Id = 1
}
My old list contains NaturePointObject instances and is called naturPointList; it is copied to a list called newNaturePointList.
Here is how I assert that it was copied successfully:
Assert.AreEqual(naturPointList.Count,newNaturePointList.Count);
for (var i = 0; i < newNaturePointList.Count; i++)
{
Assert.AreEqual(naturPointList[i].Category, newNaturePointList[i].Category);
Assert.AreEqual(naturPointList[i].Description, newNaturePointList[i].Description);
Assert.AreEqual(naturPointList[i].Kumulation, newNaturePointList[i].Kumulation);
Assert.AreEqual(naturPointList[i].Name, newNaturePointList[i].Name);
Assert.AreEqual(naturPointList[i].CreatorType, newNaturePointList[i].CreatorType);
Assert.AreEqual(naturPointList[i].NaturR, newNaturePointList[i].NaturR);
Assert.AreNotEqual(naturPointList[i].SId, newNaturePointList[i].SId);
}
As you can see, not all properties of the object must be equal, and I don't care about the "Id" of the object.
Is there a shorter way to do this than running a for loop?
You probably want to use CollectionAssert:
CollectionAssert.AreEqual(naturPointList, newNaturePointList, NaturePointObject.CategoryCreatorTypeComparer);
The only thing to keep in mind is that you need to implement an IComparer to pass to the Assert method:
public class NaturePointObject
{
private static readonly Comparer<NaturePointObject> CategoryCreatorTypeComparerInstance = new CategoryCreatorTypeRelationalComparer();
private sealed class CategoryCreatorTypeRelationalComparer : Comparer<NaturePointObject>
{
public override int Compare(NaturePointObject x, NaturePointObject y)
{
// compare fields which makes sense
if (ReferenceEquals(x, y)) return 0;
if (ReferenceEquals(null, y)) return 1;
if (ReferenceEquals(null, x)) return -1;
var categoryComparison = string.Compare(x.Category, y.Category, StringComparison.Ordinal);
if (categoryComparison != 0) return categoryComparison;
return string.Compare(x.CreatorType, y.CreatorType, StringComparison.Ordinal);
}
}
public static Comparer<NaturePointObject> CategoryCreatorTypeComparer
{
get
{
return CategoryCreatorTypeComparerInstance;
}
}
public int SId { get; set; }
public string Category { get; set; }
//other properties
public string CreatorType { get; set; }
}
You can try
Assert.IsTrue(naturPointList.SequenceEqual(newNaturePointList));
If you want to ignore the Id, you can create other classes (without Ids).
Later edit: you could override the Equals method and ignore the Id.
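A minimal sketch of the "ignore the Id" idea that leaves Equals on the model alone: pass an IEqualityComparer<T> to SequenceEqual that only looks at the fields you care about. PointItem, PointItemContentComparer and CopyCheck are hypothetical, cut-down stand-ins for NaturePointObject and its test helper.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced version of the object under test.
public sealed class PointItem
{
    public int SId { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
}

// Compares content only; SId is deliberately left out.
public sealed class PointItemContentComparer : IEqualityComparer<PointItem>
{
    public bool Equals(PointItem x, PointItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Name == y.Name && x.Category == y.Category;
    }

    // Must agree with Equals: built from the same two fields.
    public int GetHashCode(PointItem obj) =>
        obj is null ? 0 : (obj.Name, obj.Category).GetHashCode();
}

public static class CopyCheck
{
    // True when both sequences have equal content in the same order.
    public static bool SameContent(IEnumerable<PointItem> original,
                                   IEnumerable<PointItem> copy) =>
        original.SequenceEqual(copy, new PointItemContentComparer());
}
```

In a test this would collapse the loop into something like Assert.IsTrue(CopyCheck.SameContent(naturPointList, newNaturePointList)), at the cost of a less specific failure message than the per-field asserts.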

c#: collections with unique elements

Is there a collection in C# that guarantees me that I will have only unique elements? I've read about HashSet, but this collection can contain duplicates. Here is my code:
public class Bean
{
public string Name { get; set; }
public int Id { get; set; }
public override bool Equals(object obj)
{
var bean = obj as Bean;
if (bean == null)
{
return false;
}
return this.Name.Equals(bean.Name) && this.Id == bean.Id;
}
public override int GetHashCode()
{
return Name.GetHashCode() * this.Id.GetHashCode();
}
}
You may complain about my using non-readonly properties in GetHashCode, but this is one way of doing it (not the right one).
HashSet<Bean> set = new HashSet<Bean>();
Bean b1 = new Bean {Name = "n", Id = 1};
Bean b2 = new Bean {Name = "n", Id = 2};
set.Add(b1);
set.Add(b2);
b2.Id = 1;
var elements = set.ToList();
var elem1 = elements[0];
var elem2 = elements[1];
if (elem1.Equals(elem2))
{
Console.WriteLine("elements are equal");
}
And in this case, my set contains duplicates.
So is there a collection in C# that guarantees me that it does not contain duplicates?
"So is there a collection in C# that guarantees me that it does not contain duplicates?"
There is no existing collection class in C# that does this. You could write your own, but there is no existing one.
Some extra information regarding the issue you are experiencing
If you change a HashSet entry after adding it to the HashSet, then you need to regenerate the HashSet. The RegenerateHashSet extension method below can be used to do that.
The reason you need to regenerate is that duplicate detection only occurs at insertion time (in other words, it relies on you not changing an object after you insert it). That makes sense if you think about it: the HashSet has no way to detect that an object it contains has changed.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Test
{
public static class HashSetExtensions
{
public static HashSet<T> RegenerateHashSet<T>(this HashSet<T> original)
{
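// Copy through an array so that every element is re-hashed and re-checked;
// passing a HashSet directly may let the runtime clone its internal buckets as-is.
return new HashSet<T>(original.ToArray(), original.Comparer);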
}
}
public class Bean
{
public string Name { get; set; }
public int Id { get; set; }
public override bool Equals(object obj)
{
var bean = obj as Bean;
if (bean == null)
{
return false;
}
return Name.Equals(bean.Name) && Id == bean.Id;
}
public override int GetHashCode()
{
return Name.GetHashCode() * Id.GetHashCode();
}
}
public class Program
{
static void Main(string[] args)
{
HashSet<Bean> set = new HashSet<Bean>();
Bean b1 = new Bean { Name = "n", Id = 1 };
Bean b2 = new Bean { Name = "n", Id = 2 };
set.Add(b1);
set.Add(b2);
b2.Id = 1;
var elements = set.ToList();
var elem1 = elements[0];
var elem2 = elements[1];
if (elem1.Equals(elem2))
{
Console.WriteLine("elements are equal");
}
Console.WriteLine(set.Count);
set = set.RegenerateHashSet();
Console.WriteLine(set.Count);
Console.ReadLine();
}
}
}
Note that the above technique is not bullet-proof - if you add two objects (Object A and Object B) which are duplicates and then change Object B to be different to Object A then the HashSet will still only have one entry in it (since Object B was never added). As such, what you probably want to do is actually store your complete list in a List instead, and then use new HashSet<T>(yourList) whenever you want unique entries. The below class may assist you if you decide to go down that route.
public class RecalculatingHashSet<T>
{
private List<T> originalValues = new List<T>();
public HashSet<T> GetUnique()
{
return new HashSet<T>(originalValues);
}
public void Add(T item)
{
originalValues.Add(item);
}
}
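To make the "detection only happens at insertion time" point concrete, here is a small sketch. MutableKey and MutationDemo are hypothetical names, and the hash code is simply the Id so that the behaviour does not depend on string hashing. After the Id is changed, the set still holds the object but can no longer find it, because the entry was filed under the old hash code; rebuilding from a plain array re-hashes it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type whose hash code depends on a mutable property.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj) =>
        obj is MutableKey other && Id == other.Id;

    public override int GetHashCode() => Id;
}

public static class MutationDemo
{
    public static (bool Before, bool After, bool Rebuilt) Run()
    {
        var key = new MutableKey { Id = 2 };
        var set = new HashSet<MutableKey> { key };

        bool before = set.Contains(key);  // found under hash code 2

        key.Id = 1;                       // the stored hash code is now stale
        bool after = set.Contains(key);   // looked up under hash code 1

        // Rebuild from an array so the element is hashed again.
        var rebuilt = new HashSet<MutableKey>(set.ToArray());
        return (before, after, rebuilt.Contains(key));
    }
}
```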
If you don't write your own collection type and handle property changed events to re-evaluate the items, you need to re-evaluate the items at each access. This can be accomplished with LINQ deferred execution:
ICollection<Bean> items= new List<Bean>();
IEnumerable<Bean> reader = items.Distinct();
Rule: only use items to insert or remove elements, use reader for any read access.
Bean b1 = new Bean { Name = "n", Id = 1 };
Bean b2 = new Bean { Name = "n", Id = 2 };
items.Add(b1);
items.Add(b2);
b2.Id = 1;
var elements = reader.ToList();
var elem1 = elements[0];
var elem2 = elements[1]; // throws exception because there is only one element in the result list.

How to make Dictionary find object key by value

In my application I need to use a custom object as a dictionary key.
The problem is that keys are compared by reference.
As we all know, value types are compared by value, but objects are compared by reference, so even if two objects are equal they are stored in different places on the heap and the comparison returns false.
To do it right I need to override the Equals and GetHashCode methods (I think; correct me if I'm wrong).
I overrode the Equals method and it works:
bool IsEqual = dictionaryList.Keys.First().Equals(compareKey); // returns true
What I don't know is how to override the GetHashCode method for my case (or whether I need to).
The exception that I get:
The given key was not present in the dictionary.
How can I solve this issue, or am I doing it completely the wrong way?
Thanks
using System;
using System.IO;
using System.Threading;
using System.Linq;
using System.Collections.Generic;
public sealed class Program
{
public class Options
{
public string x { get; set; }
public string y { get; set; }
}
public class Data
{
public string myData { get; set; }
}
public class KeyClass
{
public int KeyNumber { get; set; }
public List<Options> Options { get; set; }
public override bool Equals(object obj)
{
KeyClass keyClass = obj as KeyClass;
bool IsKeyNumberEqual = (KeyNumber == keyClass.KeyNumber);
bool IsOptionsEqual = true;
if (!(Options.Count == keyClass.Options.Count) || !IsKeyNumberEqual)
{
IsOptionsEqual = false;
}
else
{
for (int i = 0; i < Options.Count; i++)
{
if (!(Options[i].x == keyClass.Options[i].x) ||
!(Options[i].y == keyClass.Options[i].y))
{
IsOptionsEqual = false;
break;
}
}
}
return (IsKeyNumberEqual && IsOptionsEqual);
}
}
public static void Main()
{
try
{
List<Options> optionsList = new List<Options>();
optionsList.Add(new Options() { x = "x1", y = "y1" });
optionsList.Add(new Options() { x = "x2", y = "y2" });
Data someData = new Data() { myData = "someData" };
Data getData = new Data();
KeyClass dictionaryKey = new KeyClass() { KeyNumber = 1, Options = optionsList };
KeyClass compareKey = new KeyClass() { KeyNumber = 1, Options = optionsList };
Dictionary<KeyClass, Data> dictionaryList = new Dictionary<KeyClass, Data>();
dictionaryList.Add(dictionaryKey, someData);
bool IsEqual = dictionaryList.Keys.First().Equals(compareKey);
getData = dictionaryList[compareKey];
}
catch (Exception ex)
{
string exMessage = ex.Message;
}
}
}
To do it right I need to override the Equals and GetHashCode methods (I think; correct me if I'm wrong).
You're correct. .NET requires that two objects that compare as equal have the same hash code. This is not limited to dictionaries.
The trivial implementation is to make every object return the same hash code. But although two different objects are allowed to have the same hash code, you should keep this to a minimum. When you have a lot of hash collisions, performance of dictionaries and other containers will be worse.
A slightly better implementation would be to return KeyNumber (or KeyNumber.GetHashCode()). This can be good enough if you almost never have identical key numbers, or if identical key numbers are a very strong indication that the options will be identical as well.
The best implementation would be to combine the hash codes of KeyNumber and all your Options values, as in Matthew Watson's answer.
You need to write a GetHashCode() that includes everything that contributes to the Equals() method.
For example:
public override int GetHashCode()
{
unchecked
{
int hash = KeyNumber * 397;
foreach (var opt in Options)
{
hash = hash*23 + opt.x.GetHashCode();
hash = hash*23 + opt.y.GetHashCode();
}
return hash;
}
}
If you implement GetHashCode() for your Options class, for example:
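public class Options
{
public readonly string x;
public readonly string y;
// readonly fields can only be assigned in a constructor
public Options(string x, string y) { this.x = x; this.y = y; }
public override int GetHashCode()
{
return x.GetHashCode() ^ y.GetHashCode();
}
}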
Then you can write GetHashCode() more simply:
public override int GetHashCode()
{
unchecked
{
int hash = KeyNumber * 397;
foreach (var opt in Options)
hash = hash*23 + opt.GetHashCode();
return hash;
}
}
One important thing I forgot to mention earlier:
It is MOST IMPORTANT that none of your fields that contribute to equality or hash code are changed after the object has been put into the dictionary.
If you change any of them after adding the object to the dictionary, it's likely that you will no longer be able to retrieve the object from the dictionary.
The best way to ensure this is to use only immutable fields for equality and hash code.
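Putting the pieces together, a minimal sketch of an immutable key whose Equals and GetHashCode are built from the same data might look like this. Option and LookupKey are hypothetical renamings of the question's Options and KeyClass, and the constructors are assumptions added to make the types immutable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable replacement for the question's Options class.
public sealed class Option : IEquatable<Option>
{
    public string X { get; }
    public string Y { get; }

    public Option(string x, string y) { X = x; Y = y; }

    public bool Equals(Option other) =>
        other != null && X == other.X && Y == other.Y;

    public override bool Equals(object obj) => Equals(obj as Option);

    public override int GetHashCode() => (X, Y).GetHashCode();
}

// Hypothetical immutable replacement for the question's KeyClass.
public sealed class LookupKey : IEquatable<LookupKey>
{
    public int KeyNumber { get; }
    public IReadOnlyList<Option> Options { get; }

    public LookupKey(int keyNumber, IEnumerable<Option> options)
    {
        KeyNumber = keyNumber;
        Options = options.ToList(); // private copy, so later changes cannot leak in
    }

    // Same number and the same options in the same order.
    public bool Equals(LookupKey other) =>
        other != null
        && KeyNumber == other.KeyNumber
        && Options.SequenceEqual(other.Options);

    public override bool Equals(object obj) => Equals(obj as LookupKey);

    // Combines exactly the values that Equals looks at.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = KeyNumber * 397;
            foreach (var opt in Options)
                hash = hash * 23 + opt.GetHashCode();
            return hash;
        }
    }
}
```

With both members in place, a second key built from equal values should find the entry stored under the first one.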

Reflection - object comparison & default values

I'm trying to compare two complex objects in C#, and produce a Dictionary containing the differences between the two.
If I have a class like so:
public class Product
{
public int Id {get; set;}
public bool IsWhatever {get; set;}
public string Something {get; set;}
public int SomeOtherId {get; set;}
}
And one instance, thus:
var p = new Product
{
Id = 1,
IsWhatever = false,
Something = "Pony",
SomeOtherId = 5
};
and another:
var newP = new Product
{
Id = 1,
IsWhatever = true
};
To get the differences between these, I'm doing something that includes this:
var oldProps = p.GetType().GetProperties();
var newProps = newP.GetType().GetProperties();
// snip
foreach(var newInfo in newProps)
{
var oldVal = oldInfo.GetValue(oldVersion, null);
var newVal = newInfo.GetValue(newVersion,null);
}
// snip - some ifs & thens & other stuff
and it's this line that's of interest
var newVal = newInfo.GetValue(newVersion,null);
Using the example objects above, this line would give me a default value of 0 for SomeOtherId (same story for bools, DateTimes and so on).
What I'm looking for is a way to have newProps include only the properties that are explicitly specified in the object, so in the above example, Id and IsWhatever. I've played about with BindingFlags to no avail.
Is this possible? Is there a cleaner/better way to do it, or a tool out there to save me the trouble?
Thanks.
There is no flag that tells you whether a property was explicitly set. What you could do is declare your properties as nullable types and compare the value to null.
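A minimal sketch of the nullable approach, assuming a separate ProductPatch type (a hypothetical name) whose value-type properties are all nullable: reflection then returns null for anything that was never assigned, so only the specified properties end up in the result.

```csharp
using System.Collections.Generic;
using System.Reflection;

// Hypothetical "patch" shape of Product: null means "not specified".
public class ProductPatch
{
    public int? Id { get; set; }
    public bool? IsWhatever { get; set; }
    public string Something { get; set; }
    public int? SomeOtherId { get; set; }
}

public static class PatchDiff
{
    // Collects name/value pairs for every public property that is non-null.
    public static Dictionary<string, object> SpecifiedValues(object patch)
    {
        var result = new Dictionary<string, object>();
        foreach (PropertyInfo prop in patch.GetType().GetProperties())
        {
            // An empty Nullable<T> is returned as null when boxed.
            object value = prop.GetValue(patch, null);
            if (value != null)
                result[prop.Name] = value;
        }
        return result;
    }
}
```

The trade-off is that an explicit null (for the string property, say) cannot be told apart from "never set".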
If I understand you correctly, this is what Microsoft did with the XML wrapper classes generated by the xsd utility, where you had an XIsSpecified property, or something like that, for each property X.
So you can do the same: instead of public int ID{get;set;}, add a private member _id, or whatever you choose to call it, and a boolean property IDSpecified which is set to true whenever the ID setter is called.
I ended up fixing the issue without using reflection (or, not using it in this way at least).
It goes, more or less, like this:
public class Comparable
{
private IDictionary<string, object> _cache;
public Comparable()
{
_cache = new Dictionary<string, object>();
}
public IDictionary<string, object> Cache { get { return _cache; } }
protected void Add(string name, object val)
{
_cache[name] = val; // the indexer overwrites; Add would throw if a property were set twice
}
}
And the product implementation goes to this:
public class Product : Comparable
{
private int _id;
private bool _isWhatever;
private string _something;
private int _someOtherId;
public int Id {get { return _id; } set{ _id = value; Add("Id", value); } }
public bool IsWhatever { get { return _isWhatever; } set{ _isWhatever = value; Add("IsWhatever", value); } }
public string Something {get { return _something; } set{ _something = value; Add("Something", value); } }
public int SomeOtherId {get { return _someOtherId; } set{ _someOtherId = value; Add("SomeOtherId", value); } }
}
And the comparison is then pretty straightforward
var dic = new Dictionary<string, object>();
foreach(var obj in version1.Cache)
{
foreach(var newObj in version2.Cache)
{
//snip -- do stuff to check equality
dic.Add(....);
}
}
Doesn't hugely dirty the model, and works nicely.
