I have a class and a comparer for this class that implements IEqualityComparer:
class Foo
{
public int Int { get; set; }
public string Str { get; set; }
public Foo(int i, string s)
{
Int = i;
Str = s;
}
private sealed class FooEqualityComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo x, Foo y)
{
if (ReferenceEquals(x, y)) return true;
if (ReferenceEquals(x, null)) return false;
if (ReferenceEquals(y, null)) return false;
if (x.GetType() != y.GetType()) return false;
return x.Int == y.Int && string.Equals(x.Str, y.Str);
}
public int GetHashCode(Foo obj)
{
unchecked
{
return (obj.Int * 397) ^ (obj.Str != null ? obj.Str.GetHashCode() : 0);
}
}
}
public static IEqualityComparer<Foo> Comparer { get; } = new FooEqualityComparer();
}
The two methods Equals and GetHashCode are used, for example, by Enumerable.Except via an instance of the comparer.
My question is: how do I properly unit test this comparer? I want to detect when someone adds a public property to Foo without updating the comparer, because in that case the comparer becomes invalid.
If I do something like:
Assert.That(new Foo(42, "answer"), Is.EqualTo(new Foo(42, "answer")).Using(Foo.Comparer));
This cannot detect that a new property was added, and that this property differs in the two objects.
Is there any way to do this?
If it is possible, can we add an attribute to a property to say that this property is not relevant in the comparison?
You can use reflection to get the properties of the type, e.g.:
var knownPropNames = new string[]
{
"Int",
"Str",
};
var props = typeof(Foo).GetProperties(BindingFlags.Public | BindingFlags.Instance);
var unknownProps = props
.Where(x => !knownPropNames.Contains(x.Name))
.Select(x => x.Name)
.ToArray();
// Use assertion instead of Console.WriteLine
Console.WriteLine("Unknown props: {0}", string.Join("; ", unknownProps));
This way, you can implement a test that fails if any properties are added. Of course, you'd have to add each new property to the array at the beginning once the comparer handles it. Because reflection is comparatively slow, I'd suggest using it only in the test, not in the comparer itself, if you need to compare lots of objects.
Please also note the use of the BindingFlags parameter so you can restrict the properties to only the public ones and the ones on instance-level.
Also, you can define a custom attribute that you use to mark properties that are not relevant. For example:
[AttributeUsage(AttributeTargets.Property)]
public class ComparerIgnoreAttribute : Attribute {}
You can apply it to a property:
[ComparerIgnore]
public decimal Dec { get; set; }
In addition, you'd have to extend the code that discovers unknown properties:
var unknownProps = props
.Where(x => !knownPropNames.Contains(x.Name)
&& !x.GetCustomAttributes(typeof(ComparerIgnoreAttribute)).Any())
.Select(x => x.Name)
.ToArray();
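Putting the pieces above together, here is a minimal sketch of a reusable check. The names SampleFoo, ComparerCoverage and FindUncoveredProperties are made up for illustration, and SampleFoo only stands in for the question's Foo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class ComparerIgnoreAttribute : Attribute { }

// Hypothetical stand-in for Foo: one ignored property and one forgotten property.
public class SampleFoo
{
    public int Int { get; set; }
    public string Str { get; set; }
    [ComparerIgnore] public decimal Dec { get; set; }
    public bool Forgotten { get; set; }
}

public static class ComparerCoverage
{
    // Returns the public instance properties that are neither listed as
    // known to the comparer nor marked with [ComparerIgnore].
    public static string[] FindUncoveredProperties(Type type, params string[] knownPropNames)
    {
        return type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => !knownPropNames.Contains(p.Name)
                     && !Attribute.IsDefined(p, typeof(ComparerIgnoreAttribute)))
            .Select(p => p.Name)
            .ToArray();
    }
}
```

In an NUnit test you could then assert that the result is empty, e.g. Assert.That(ComparerCoverage.FindUncoveredProperties(typeof(Foo), "Int", "Str"), Is.Empty). For SampleFoo above the helper reports only Forgotten.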
Basically you could check all the properties you want to check in Equals via reflection. To filter some of them out use an attribute on those properties:
class Foo
{
[MyAttribute]
public string IgnoredProperty { get; set; }
public string MyProperty { get; set; }
}
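Now in your comparer, check for that specific attribute. Afterwards, compare every property in the remaining list via PropertyInfo.GetValue:
class MyComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo x, Foo y)
{
// Reflect over Foo (not over the comparer) and skip properties marked with [MyAttribute].
var properties = typeof(Foo).GetProperties()
.Where(p => !Attribute.IsDefined(p, typeof(MyAttribute)));
var equal = true;
foreach (var p in properties)
// object.Equals compares values; == on two boxed objects compares references.
equal &= object.Equals(p.GetValue(x, null), p.GetValue(y, null));
return equal;
}
public int GetHashCode(Foo obj)
{
// Assumption: MyProperty always takes part in equality, so it is a cheap, valid hash source.
return obj.MyProperty != null ? obj.MyProperty.GetHashCode() : 0;
}
}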
However, you should have some good pre-checks within GetHashCode to avoid unnecessary calls to this slow method.
EDIT: As you've mentioned ReSharper: since the properties to compare are only determined at runtime, I assume even R# doesn't know a good way to implement GetHashCode. You will need some properties that are always available on your type and that give a good enough idea of what might be considered equal. All the additional properties, however, should only go into the expensive Equals method.
EDIT2: As mentioned in the comments, doing reflection within Equals or even GetHashCode is a bad idea, as it's usually quite slow and can often be avoided. If you know the properties to be checked for equality at compile time, you should definitely include them directly in those two methods, as doing so gives you much more safety. If you find you really need this because you have too many properties, you probably have a more basic problem: your class is doing too much.
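If you do go the reflection route anyway, one way to limit the cost is to look up the properties once and cache them. This is only a sketch under my own assumptions: CachedReflectionComparer and Bar are hypothetical names, and each PropertyInfo.GetValue call is still much slower than a hand-written comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class MyAttribute : Attribute { }

// Hypothetical sample type; IgnoredProperty is excluded from comparison.
public class Bar
{
    [MyAttribute] public string IgnoredProperty { get; set; }
    public string MyProperty { get; set; }
    public int Number { get; set; }
}

public sealed class CachedReflectionComparer<T> : IEqualityComparer<T> where T : class
{
    // The property lookup runs once per closed generic type, not on every comparison.
    private static readonly PropertyInfo[] Props = typeof(T)
        .GetProperties(BindingFlags.Public | BindingFlags.Instance)
        .Where(p => p.CanRead
                 && p.GetIndexParameters().Length == 0
                 && !Attribute.IsDefined(p, typeof(MyAttribute)))
        .ToArray();

    public bool Equals(T x, T y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return Props.All(p => object.Equals(p.GetValue(x, null), p.GetValue(y, null)));
    }

    public int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Combine the hashes of exactly the properties that Equals compares.
            int hash = 17;
            foreach (var p in Props)
            {
                object value = p.GetValue(obj, null);
                hash = hash * 31 + (value != null ? value.GetHashCode() : 0);
            }
            return hash;
        }
    }
}
```

Because Equals and GetHashCode read the same cached property list, they cannot drift apart when a property is added.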
I guess you can check properties count inside the comparer. Something like this:
private sealed class FooEqualityComparer : IEqualityComparer<Foo>
{
private List<bool> comparisonResults = new List<bool>();
private List<Func<Foo, Foo, bool>> conditions = new List<Func<Foo, Foo, bool>>{
(x, y) => x.Int == y.Int,
(x, y) => string.Equals(x.Str, y.Str)
};
private int propertiesCount = typeof(Foo)
.GetProperties(BindingFlags.Public | BindingFlags.Instance)
//.Where(someLogicToExclude (e.g. an attribute))
.Count();
public bool Equals(Foo x, Foo y)
{
if (ReferenceEquals(x, y)) return true;
if (ReferenceEquals(x, null)) return false;
if (ReferenceEquals(y, null)) return false;
if (x.GetType() != y.GetType()) return false;
//a new property exists that is neither covered by the conditions list nor excluded
if (conditions.Count() != propertiesCount) return false;
foreach(var func in conditions)
if(!func(x, y)) return false;//returns false on first mismatch
return true;//only if all conditions are satisfied
}
public int GetHashCode(Foo obj)
{
unchecked
{
return (obj.Int * 397) ^ (obj.Str != null ? obj.Str.GetHashCode() : 0);
}
}
}
This question already has answers here:
HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative?
(4 answers)
GroupBy on complex object (e.g. List<T>)
(3 answers)
Closed 2 years ago.
I have a situation where I need to GroupBy a collection on a HashSet<myClass> key, where myClass overrides Equals(myClass), Equals(object), GetHashCode(), ==, and !=.
When I perform the GroupBy() the results are however not grouped. The same occurs for Distinct(). It is created in a large LINQ query which calls ToHashSet() on values of myClass. The result is then used where the resulting HashSet itself is the key to a Dictionary<HashSet<myClass>, someOtherCollection>.
I have distilled the problem down to the simplest case, where two HashSet<myClass>, myHashSet1 and myHashSet2, both contain only the same single element. If I call myHashSet1.Equals(myHashSet2) it returns false, while myHashSet1.SetEquals(myHashSet2) returns true.
What am I doing wrong here? What can I do to make GroupBy group HashSets when all elements match?
Possibly one step along the way is HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative?
which explains how to override the default IEqualityComparer for HashSet. But if this is part of the answer, the critical remaining question becomes: how do I let GroupBy know to use this equality comparer?
I assume I should be feeding it when I call ToHashSet() , maybe ToHashSet(myHashSetEqualityComparer<myClass>), but it only takes a ToHashSet(IEqualityComparer<myClass>), not a ToHashSet(IEqualityComparer<HashSet<myClass>>)
Here's the code of myClass distilled to the essentials:
public class myClass : myBaseClass, IEquatable<myClass>
{
public string Prop1 { get; set; }
public string Prop2 { get; set; }
public Guid Prop3 { get; set; }
public override bool Equals(myClass other)
{
if (Equals(other, null)) return false;
return (Prop1 == other.Prop1 && Prop2 == other.Prop2 && Prop3 == other.Prop3);
}
public override bool Equals(object obj)
{
if (Equals(obj, null) || !(obj is myClass))
return false;
return Equals((myClass)obj);
}
public static bool operator ==(myClass left, myClass right)
{
if (Object.Equals(left, null))
return (Object.Equals(right, null)) ? true : false;
else
return left.Equals(right);
}
public static bool operator !=(myClass left, myClass right)
{
return !(left == right);
}
public override int GetHashCode()
{
return Prop3.GetHashCode() + 31 * (Prop2.GetHashCode() + 31 * Prop1.GetHashCode());
}
}
NOTE: I asked this question before, but it was closed with reference to the above linked question, which DID NOT ANSWER the original question, which clearly stated this was about a GroupBy problem. I have added more detail here regarding use in LINQ to clarify.
EDIT per request in comment this is what I am doing:
var myGroupedResult = myUngroupedCollection.
GroupBy(x => x.Value).
ToDictionary(x => x.Key, x => x.ToList());
// myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>> produced by LINQ
// myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>
I expect the result to produce a dictionary where the keys are HashSet<myClass> and the values are List<someClass>. If I have 5 distinct hashsets each with 10 occurrences of someClass, I expect a Dictionary with 5 keys, each with a value that is a List with 10 elements. Instead I get a Dictionary with 50 keys each with a value being a List that has 1 element.
I was able to solve my issue. Posting an answer here in case anyone else runs into the same issue.
The solution has two steps. First create a generic IEqualityComparer<HashSet<T>> (from the link in the question):
public class HashSetEqualityComparerBySetEquals<T> : IEqualityComparer<HashSet<T>>
{
public bool Equals(HashSet<T> x, HashSet<T> y)
{
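// Two nulls (or the same instance) are equal; a single null never is.
if (ReferenceEquals(x, y))
return true;
if (ReferenceEquals(x, null) || ReferenceEquals(y, null))
return false;
return x.SetEquals(y);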
}
public int GetHashCode(HashSet<T> set)
{
int hashCode = 0;
if (set != null)
{
foreach (T t in set)
{
hashCode = hashCode ^
(set.Comparer.GetHashCode(t) & 0x7FFFFFFF);
}
}
return hashCode;
}
}
Then provide it in the GroupBy() (hint came from here: GroupBy on complex object (e.g. List<T>), which works on List, but not as-is on HashSet, which needs an additional elementSelector as second parameter):
HashSetEqualityComparerBySetEquals<myClass> comparer = new HashSetEqualityComparerBySetEquals<myClass>();
var myGroupedResult = myUngroupedCollection.
GroupBy(x => x.Value, x => x.Key, comparer).
ToDictionary(x => x.Key, x => x.ToList());
// myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>> produced by LINQ but could be a Dictionary or another collection.
// myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>
The same IEqualityComparer can also be used when performing other LINQ operations that check for equality, such as Distinct() and FirstOrDefault():
var thisWorksAsExpected = myGroupedResult.FirstOrDefault(x => comparer.Equals(x.Key, aHashSetWithSameElements));
var thisAlsoWorks = myGroupedResult.FirstOrDefault(x => x.Key.SetEquals(aHashSetWithSameElements));
var thisDoesNotWork = myGroupedResult.FirstOrDefault(x => x.Key == aHashSetWithSameElements);
// thisDoesNotWork can fail to find a match even when all elements match, because == compares HashSet references rather than contents
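For comparison, here is a sketch of the same grouping using the built-in HashSet<T>.CreateSetComparer(), assuming the elements' default equality is good enough (which is the limitation discussed in the linked question). SetGroupingDemo and GroupBySet are made-up names, and int/string stand in for myClass/someClass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetGroupingDemo
{
    // Groups names by the set attached to them, treating two sets
    // with the same elements as the same key.
    public static Dictionary<HashSet<int>, List<string>> GroupBySet(
        IEnumerable<KeyValuePair<string, HashSet<int>>> source)
    {
        IEqualityComparer<HashSet<int>> setComparer = HashSet<int>.CreateSetComparer();
        return source
            .GroupBy(x => x.Value, x => x.Key, setComparer)
            // Passing the comparer here as well makes later dictionary lookups set-based.
            .ToDictionary(g => g.Key, g => g.ToList(), setComparer);
    }
}
```

Note that the comparer is also handed to ToDictionary. Without it the resulting dictionary falls back to reference equality for its keys, so looking up an equal but different HashSet instance would fail.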
I have a case where two objects can be compared many different ways for equality. For example:
public class HeightComparer : IEqualityComparer<Person> {
public bool Equals(Person x, Person y) {
return x.Height.Equals(y.Height);
}
public int GetHashCode(Person obj) {
return obj.Height;
}
}
And I use these comparers in Dictionary<Person,Person>(IEqualityComparer<Person>) for various methods. How would you make a comparer that guarantees each person is unique? I came up with the following, but it runs slowly since the GetHashCode() method often returns the same value.
public class NullPersonComparer : IEqualityComparer<Person> {
public bool Equals(Person x, Person y) {
return false; // always unequal
}
public int GetHashCode(Person obj) {
return obj.GetHashCode();
}
}
I could return the same value of 0 from GetHashCode(Person obj) but it still is slow populating the dictionary.
Edit
Here is a use case:
Dictionary<Person, Person> people = new Dictionary<Person, Person>(comparer);
foreach (string name in Names)
{
Person person= new Person(name);
Person realPerson;
if (people.TryGetValue(person, out realPerson))
{
realPerson.AddName(name);
}
else
{
people.Add(person, person);
}
}
If the type has not overridden the Equals or GetHashCode methods then their default implementations, from object, do what you want, namely provide equality based on their identity, rather than their value. You can use EqualityComparer<Person>.Default to get an IEqualityComparer that uses those semantics if you want.
If the Equals method has been overridden to provide some sort of value semantics, but you don't want that, you want identity semantics, then you can use object.ReferenceEquals in your own implementation:
public class IdentityComparer<T> : IEqualityComparer<T>
{
public bool Equals(T x, T y)
{
return object.ReferenceEquals(x, y);
}
public int GetHashCode(T obj)
{
return System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj);
}
}
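To illustrate the difference, here is a small sketch. Point2 and IdentityDemo are hypothetical names, and the comparer is repeated (with a class constraint added) only so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical type with value semantics: two instances with the same X are "equal".
public sealed class Point2
{
    public int X { get; set; }
    public override bool Equals(object obj)
    {
        var other = obj as Point2;
        return other != null && other.X == X;
    }
    public override int GetHashCode() { return X; }
}

public sealed class IdentityComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) { return ReferenceEquals(x, y); }
    // Ignores any GetHashCode override and hashes by object identity.
    public int GetHashCode(T obj) { return RuntimeHelpers.GetHashCode(obj); }
}

public static class IdentityDemo
{
    // Counts distinct instances, regardless of the type's own Equals override.
    public static int CountDistinctInstances(IEnumerable<Point2> items)
    {
        return new HashSet<Point2>(items, new IdentityComparer<Point2>()).Count;
    }
}
```

With two separate Point2 instances that both have X = 1, a plain HashSet<Point2> keeps one element, while CountDistinctInstances reports two.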
I have a strongly typed list of custom objects, MyObject, which has a property Id, along with some other properties.
Let's say that the Id of a MyObject defines it as unique and I want to check if my collection doesn't already have a MyObject object that has an Id of 1 before I add my new MyObject to the collection.
I want to use if(!List<MyObject>.Contains(myObj)), but how do I enforce the fact that only one or two properties of MyObject define it as unique?
Can I use IComparable? Or do I only have to override the Equals method? If so, I'd need to inherit from something first, is that right?
List<T>.Contains uses EqualityComparer<T>.Default, which in turn uses IEquatable<T> if the type implements it, or object.Equals otherwise.
You could just implement IEquatable<T> but it's a good idea to override object.Equals if you do so, and a very good idea to override GetHashCode() if you do that:
public class SomeIDdClass : IEquatable<SomeIDdClass>
{
private readonly int _id;
public SomeIDdClass(int id)
{
_id = id;
}
public int Id
{
get { return _id; }
}
public bool Equals(SomeIDdClass other)
{
return null != other && _id == other._id;
}
public override bool Equals(object obj)
{
return Equals(obj as SomeIDdClass);
}
public override int GetHashCode()
{
return _id;
}
}
Note that the hash code relates to the criteria for equality. This is vital.
This also makes it applicable for any other case where equality, as defined by having the same ID, is useful. If you have a one-off requirement to check if a list has such an object, then I'd probably suggest just doing:
return someList.Any(item => item.Id == cmpItem.Id);
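As a quick sketch of why the matching hash code matters: once Equals and GetHashCode agree, the same type works with both List<T>.Contains and hash-based collections. IdItem and ContainsDemo are names I made up for this example.

```csharp
using System;
using System.Collections.Generic;

public sealed class IdItem : IEquatable<IdItem>
{
    public IdItem(int id, string label) { Id = id; Label = label; }
    public int Id { get; private set; }
    public string Label { get; private set; }

    // Equality is defined by Id alone; Label is deliberately ignored.
    public bool Equals(IdItem other) { return other != null && Id == other.Id; }
    public override bool Equals(object obj) { return Equals(obj as IdItem); }
    // The hash uses the same field as Equals, so equal items share a hash.
    public override int GetHashCode() { return Id; }
}

public static class ContainsDemo
{
    // Adds the candidate only if no item with the same Id is present yet.
    public static bool AddIfMissing(ICollection<IdItem> items, IdItem candidate)
    {
        if (items.Contains(candidate)) return false;
        items.Add(candidate);
        return true;
    }
}
```

AddIfMissing behaves the same whether items is a List<IdItem> or a HashSet<IdItem>; the HashSet version relies on GetHashCode being consistent with Equals.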
List<T> uses the comparer returned by EqualityComparer<T>.Default and according to the documentation for that:
The Default property checks whether type T implements the System.IEquatable(Of T) interface and, if so, returns an EqualityComparer(Of T) that uses that implementation. Otherwise, it returns an EqualityComparer(Of T) that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
So you can either implement IEquatable<T> on your custom class, or override the Equals (and GetHashCode) methods to do the comparison by the properties you require. Alternatively, you could use LINQ:
bool contains = list.Any(i => i.Id == obj.Id);
You can use LINQ to do this pretty easily.
var result = MyCollection.Any(p=>p.myId == Id);
if(result)
{
//something
}
You can override Equals and GetHashCode, implement an IEqualityComparer<MyObject> and use that in the Contains call, or use an extension method like Any
if (!myList.Any(obj => obj.Property == obj2.Property && obj.Property2 == obj2.Property2))
myList.Add(obj2);
First, define a helper class that implements IEqualityComparer<T>:
public class MyEqualityComparer<T> : IEqualityComparer<T>
{
Func<T, int> _hashDelegate;
public MyEqualityComparer(Func<T, int> hashDelegate)
{
_hashDelegate = hashDelegate;
}
public bool Equals(T x, T y)
{
return _hashDelegate(x) == _hashDelegate(y);
}
public int GetHashCode(T obj)
{
return _hashDelegate(obj);
}
}
Then, in your code, just define the comparer and use it:
var myComparer = new MyEqualityComparer<MyObject>(delegate(MyObject obj){
return obj.ID;
});
var result = collection
.Where(f => anotherCollection.Contains(f.First, myComparer))
.ToArray();
This way you can define how equality is computed without modifying your classes. You can also use it to process objects from third-party libraries, since you cannot modify their code.
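One caveat with the helper above: it treats equal hash values as equal objects, which is only right when the delegate returns a unique key such as an ID. A variation that compares the selected key itself might look like the following sketch; KeyEqualityComparer is a hypothetical name, not a framework type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer;

    public KeyEqualityComparer(Func<T, TKey> keySelector, IEqualityComparer<TKey> keyComparer = null)
    {
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        _keySelector = keySelector;
        _keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    // Two items are equal when their selected keys are equal.
    public bool Equals(T x, T y)
    {
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    // The hash comes from the key, so it stays consistent with Equals.
    public int GetHashCode(T obj)
    {
        TKey key = _keySelector(obj);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}
```

Usage would be along the lines of new KeyEqualityComparer<MyObject, int>(o => o.ID), and the key can be any type, for example a string compared with StringComparer.OrdinalIgnoreCase.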
You can use IEquatable<T>. Implement this in your class, and then check to see if the T passed to the Equals has the same Id as this.Id. I'm sure this works for checking a key in a dictionary, but I've not used it for a collection.
I have two collections of type ICollection<MyType> called c1 and c2. I'd like to find the set of items that are in c2 that are not in c1, where the heuristic for equality is the Id property on MyType.
What is the quickest way to perform this in C# (3.0)?
Use Enumerable.Except and specifically the overload that accepts an IEqualityComparer<MyType>:
var complement = c2.Except(c1, new MyTypeEqualityComparer());
Note that this produces the set difference and thus duplicates in c2 will only appear in the resulting IEnumerable<MyType> once. Here you need to implement IEqualityComparer<MyType> as something like
class MyTypeEqualityComparer : IEqualityComparer<MyType> {
public bool Equals(MyType x, MyType y) {
return x.Id.Equals(y.Id);
}
public int GetHashCode(MyType obj) {
return obj.Id.GetHashCode();
}
}
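To make the duplicate behaviour concrete, here is a small sketch, assuming Id is an int. ComplementDemo and its two methods are names invented for the example; the second method shows one way to keep duplicates if you need them.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyType
{
    public int Id { get; set; }
}

public class MyTypeEqualityComparer : IEqualityComparer<MyType>
{
    public bool Equals(MyType x, MyType y) { return x.Id == y.Id; }
    public int GetHashCode(MyType obj) { return obj.Id.GetHashCode(); }
}

public static class ComplementDemo
{
    // Set difference: each Id from c2 that is missing in c1 appears once.
    public static List<int> SetDifferenceIds(IEnumerable<MyType> c1, IEnumerable<MyType> c2)
    {
        return c2.Except(c1, new MyTypeEqualityComparer()).Select(m => m.Id).ToList();
    }

    // Filter instead of set difference: duplicates in c2 are preserved.
    public static List<int> DifferenceKeepingDuplicatesIds(IEnumerable<MyType> c1, IEnumerable<MyType> c2)
    {
        var idsInC1 = new HashSet<int>(c1.Select(m => m.Id));
        return c2.Where(m => !idsInC1.Contains(m.Id)).Select(m => m.Id).ToList();
    }
}
```

With c1 holding Ids 1, 2 and c2 holding Ids 2, 3, 3, 4, the first method yields 3, 4 and the second yields 3, 3, 4.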
If using C# 3.0 + Linq:
var complement = from i2 in c2
where c1.FirstOrDefault(i1 => i2.Id == i1.Id) == null
select i2;
Loop through complement to get the items.
public class MyTypeComparer : IEqualityComparer<MyType>
{
public MyTypeComparer()
{
}
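#region IEqualityComparer<MyType> Members
public bool Equals(MyType x, MyType y)
{
return string.Equals(x.Id, y.Id);
}
public int GetHashCode(MyType obj)
{
// Hash the same field that Equals compares; base.GetHashCode() would hash the comparer itself.
return obj.Id != null ? obj.Id.GetHashCode() : 0;
}
#endregion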
}
Then, using Linq:
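// Combine both collections, then remove duplicates by Id.
var combined = new List<MyType>(c1);
combined.AddRange(c2);
var items = combined.Distinct(new MyTypeComparer());
// Note: Distinct yields every distinct item from both collections; use Except to keep only the items of c2 that are not in c1.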
You could also do it using generics and predicates. If you need a sample, let me know.
I am getting strange behaviour using the built-in C# List.Sort function with a custom comparer.
For some reason it sometimes calls the comparer class's Compare method with a null object as one of the parameters. But if I check the list with the debugger there are no null objects in the collection.
My comparer class looks like this:
public class DelegateToComparer<T> : IComparer<T>
{
private readonly Func<T,T,int> _comparer;
public int Compare(T x, T y)
{
return _comparer(x, y);
}
public DelegateToComparer(Func<T, T, int> comparer)
{
_comparer = comparer;
}
}
This allows a delegate to be passed to the List.Sort method, like this:
mylist.Sort(new DelegateToComparer<MyClass>(
(x, y) => {
return x.SomeProp.CompareTo(y.SomeProp);
}));
So the above delegate will throw a null reference exception for the x parameter, even though no elements of mylist are null.
UPDATE: Yes I am absolutely sure that it is parameter x throwing the null reference exception!
UPDATE: Instead of using the framework's List.Sort method, I tried a custom sort method (i.e. new BubbleSort().Sort(mylist)) and the problem went away. As I suspected, the List.Sort method passes null to the comparer for some reason.
This problem will occur when the comparison function is not consistent, such that x < y does not always imply y > x. In your example, you should check how two instances of the type of SomeProp are being compared.
Here's an example that reproduces the problem. Here, it's caused by the pathological compare function "CompareStrings". It's dependent on the initial state of the list: if you change the initial order to "C","B","A", then there is no exception.
I wouldn't call this a bug in the Sort function - it's simply a requirement that the comparison function is consistent.
using System.Collections.Generic;
class Program
{
static void Main()
{
var letters = new List<string>{"B","C","A"};
letters.Sort(CompareStrings);
}
private static int CompareStrings(string l, string r)
{
if (l == "B")
return -1;
return l.CompareTo(r);
}
}
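If you suspect your own comparison of this kind of inconsistency, a brute-force check over sample data can help. The following is only a sketch: ComparisonChecker and FindInconsistentPair are made-up names, and the check covers sign symmetry (including comparing an item with itself), not transitivity.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonChecker
{
    // Returns the first pair (a, b) for which sign(compare(a, b)) != -sign(compare(b, a)),
    // or null when no such pair exists among the given items.
    public static Tuple<T, T> FindInconsistentPair<T>(IList<T> items, Comparison<T> compare)
    {
        for (int i = 0; i < items.Count; i++)
        {
            // Starting at j = i also checks that compare(a, a) is 0.
            for (int j = i; j < items.Count; j++)
            {
                int ab = Math.Sign(compare(items[i], items[j]));
                int ba = Math.Sign(compare(items[j], items[i]));
                if (ab != -ba) return Tuple.Create(items[i], items[j]);
            }
        }
        return null;
    }
}
```

Run against the CompareStrings function above, this flags a pair straight away, because comparing "B" with itself returns -1 instead of 0.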
Are you sure the problem isn't that SomeProp is null?
In particular, with strings or Nullable<T> values.
With strings, it would be better to use:
list.Sort((x, y) => string.Compare(x.SomeProp, y.SomeProp));
(edit)
For a null-safe wrapper, you can use Comparer<T>.Default - for example, to sort a list by a property:
using System;
using System.Collections.Generic;
public static class ListExt {
public static void Sort<TSource, TValue>(
this List<TSource> list,
Func<TSource, TValue> selector) {
if (list == null) throw new ArgumentNullException("list");
if (selector == null) throw new ArgumentNullException("selector");
var comparer = Comparer<TValue>.Default;
list.Sort((x,y) => comparer.Compare(selector(x), selector(y)));
}
}
class SomeType {
public override string ToString() { return SomeProp; }
public string SomeProp { get; set; }
static void Main() {
var list = new List<SomeType> {
new SomeType { SomeProp = "def"},
new SomeType { SomeProp = null},
new SomeType { SomeProp = "abc"},
new SomeType { SomeProp = "ghi"},
};
list.Sort(x => x.SomeProp);
list.ForEach(Console.WriteLine);
}
}
I too have come across this problem (null reference being passed to my custom IComparer implementation) and finally found out that the problem was due to using inconsistent comparison function.
This was my initial IComparer implementation:
public class NumericStringComparer : IComparer<String>
{
public int Compare(string x, string y)
{
float xNumber, yNumber;
if (!float.TryParse(x, out xNumber))
{
return -1;
}
if (!float.TryParse(y, out yNumber))
{
return -1;
}
if (xNumber == yNumber)
{
return 0;
}
else
{
return (xNumber > yNumber) ? 1 : -1;
}
}
}
The mistake in this code was that Compare would return -1 whenever one of the values could not be parsed properly (in my case it was due to wrongly formatted string representations of numeric values so TryParse always failed).
Notice that if both x and y were formatted incorrectly (and thus TryParse failed on both of them), calling Compare(x, y) and Compare(y, x) would yield the same result: -1. This, I think, was the main problem. When debugging, Compare() would at some point be passed a null string reference as one of its arguments, even though the collection being sorted did not contain a null string.
As soon as I had fixed the TryParse issue and ensured consistency of my implementation the problem went away and Compare wasn't being passed null pointers anymore.
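For reference, a consistent version of such a comparer could look like the sketch below. The ordering rules are my own assumptions, not the original fix: unparseable strings sort before numbers, two unparseable strings are ordered ordinally, and parsing uses the invariant culture. ConsistentNumericStringComparer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class ConsistentNumericStringComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        float xNumber, yNumber;
        bool xOk = float.TryParse(x, NumberStyles.Float, CultureInfo.InvariantCulture, out xNumber);
        bool yOk = float.TryParse(y, NumberStyles.Float, CultureInfo.InvariantCulture, out yNumber);

        // Both numeric: compare the numbers.
        if (xOk && yOk) return xNumber.CompareTo(yNumber);
        // Exactly one numeric: the unparseable value sorts first (assumed rule).
        if (xOk) return 1;
        if (yOk) return -1;
        // Neither numeric: fall back to an ordinal string comparison (handles null too).
        return string.CompareOrdinal(x, y);
    }
}
```

The important property is that Compare(x, y) and Compare(y, x) now always have opposite signs (or are both 0), whichever branch is taken.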
Marc's answer is useful. I agree with him that the NullReference is due to calling CompareTo on a null property. Without needing an extension class, you can do:
mylist.Sort((x, y) =>
(Comparer<SomePropType>.Default.Compare(x.SomeProp, y.SomeProp)));
where SomePropType is the type of SomeProp.
For debugging purposes, you want your method to be null-safe. (or at least, catch the null-ref. exception, and handle it in some hard-coded way). Then, use the debugger to watch what other values get compared, in what order, and which calls succeed or fail.
Then you will find your answer, and you can then remove the null-safety.
Can you run this code ...
mylist.Sort((i, j) =>
{
Debug.Assert(i.SomeProp != null && j.SomeProp != null);
return i.SomeProp.CompareTo(j.SomeProp);
}
);
I stumbled across this issue myself, and found that it was related to a NaN property in my input. Here's a minimal test case that should produce the exception:
using System;
using System.Collections.Generic;
public class C {
double v;
public static void Main() {
var test =
new List<C> { new C { v = 0d },
new C { v = Double.NaN },
new C { v = 1d } };
test.Sort((d1, d2) => (int)(d1.v - d2.v));
}
}
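A way around this, sketched here with made-up names (NaNSortDemo, SortSafely), is to delegate to double.CompareTo instead of subtracting and casting. CompareTo defines a consistent order in which NaN sorts before every other value, and it also avoids the truncation that makes (int)(0.5 - 0.2) come out as 0.

```csharp
using System;
using System.Collections.Generic;

public static class NaNSortDemo
{
    // Sorts a copy of the input using double.CompareTo, which orders NaN
    // consistently (NaN is treated as smaller than any other value).
    public static List<double> SortSafely(IEnumerable<double> values)
    {
        var list = new List<double>(values);
        list.Sort((a, b) => a.CompareTo(b));
        return list;
    }
}
```

For the test case above the lambda would become (d1, d2) => d1.v.CompareTo(d2.v).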