Quickest way to find the complement of two collections in C#

I have two collections of type ICollection<MyType> called c1 and c2. I'd like to find the set of items that are in c2 that are not in c1, where the heuristic for equality is the Id property on MyType.
What is the quickest way to perform this in C# (3.0)?

Use Enumerable.Except and specifically the overload that accepts an IEqualityComparer<MyType>:
var complement = c2.Except(c1, new MyTypeEqualityComparer());
Note that this produces the set difference and thus duplicates in c2 will only appear in the resulting IEnumerable<MyType> once. Here you need to implement IEqualityComparer<MyType> as something like
class MyTypeEqualityComparer : IEqualityComparer<MyType> {
public bool Equals(MyType x, MyType y) {
return x.Id.Equals(y.Id);
}
public int GetHashCode(MyType obj) {
return obj.Id.GetHashCode();
}
}

If using C# 3.0 + Linq:
var complement = from i2 in c2
where c1.FirstOrDefault(i1 => i2.Id == i1.Id) == null
select i2;
Loop through complement to get the items.
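The query above rescans c1 for every element of c2, so its cost grows with the product of the two collection sizes. As a rough sketch of a faster variant, the Ids of c1 can be collected into a HashSet first. The MyType class and its int Id below are stand-ins, since the question does not show the real type. Unlike Except, this version keeps duplicates from c2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's MyType; Id is assumed to be an int.
public class MyType
{
    public int Id { get; set; }
}

public static class ComplementSketch
{
    // Returns the items of c2 whose Id does not occur in c1.
    public static List<MyType> Complement(ICollection<MyType> c1, ICollection<MyType> c2)
    {
        // One pass over c1 to collect its Ids, then one pass over c2.
        var idsInC1 = new HashSet<int>(c1.Select(i1 => i1.Id));
        return c2.Where(i2 => !idsInC1.Contains(i2.Id)).ToList();
    }
}
```

If duplicates should collapse, the Except call with an IEqualityComparer<MyType> from the first answer is the simpler choice.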

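public class MyTypeComparer : IEqualityComparer<MyType>
{
public MyTypeComparer()
{
}
#region IEqualityComparer<MyType> Members
public bool Equals(MyType x, MyType y)
{
return string.Equals(x.Id, y.Id);
}
public int GetHashCode(MyType obj)
{
// Hash the same value that Equals compares (Id is assumed to be a string here).
return obj.Id == null ? 0 : obj.Id.GetHashCode();
}
#endregion
}
Then, using Linq:
var c3 = new List<MyType>(c1);
c3.AddRange(c2);
var items = c3.Distinct(new MyTypeComparer());
Note that Distinct over the combined collection yields every distinct Id from both c1 and c2 (a union), not the complement. For the items of c2 that are not in c1, pass the same comparer to Except: c2.Except(c1, new MyTypeComparer()).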
You could also do it using generics and predicates. If you need a sample, let me know.

Related

GroupBy HashSet not grouping, whilst SetEquals is true [duplicate]

This question already has answers here:
HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative?
(4 answers)
GroupBy on complex object (e.g. List<T>)
(3 answers)
Closed 2 years ago.
I have a situation where I need to GroupBy a collection on a HashSet<myClass> key, where myClass overrides Equals(myClass), Equals(object), GetHashCode(), ==, and !=.
When I perform the GroupBy() the results are however not grouped. The same occurs for Distinct(). It is created in a large LINQ query which calls ToHashSet() on values of myClass. The result is then used where the resulting HashSet itself is the key to a Dictionary<HashSet<myClass>, someOtherCollection>.
I have distilled the problem down to the simplest case, where two HashSet<myClass>, myHashSet1 and myHashSet2, both contain only the same single element. If I call myHashSet1.Equals(myHashSet2) it returns false, while myHashSet1.SetEquals(myHashSet2) returns true.
What am I doing wrong here? What can I do to make GroupBy group HashSets when all elements match?
Possibly one step along the way is "HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative?", which explains how to override a default IEqualityComparer for HashSet. But if this is part of the answer, the critical remaining question becomes: how do I let GroupBy know to use this equality comparer?
I assume I should be feeding it in when I call ToHashSet(), maybe as ToHashSet(myHashSetEqualityComparer<myClass>), but ToHashSet only takes an IEqualityComparer<myClass>, not an IEqualityComparer<HashSet<myClass>>.
Here's the code of myClass distilled to the essentials:
public class myClass : myBaseClass, IEquatable<myClass>
{
public string Prop1 { get; set; }
public string Prop2 { get; set; }
public Guid Prop3 { get; set; }
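public bool Equals(myClass other) // implements IEquatable<myClass>; there is nothing to override unless myBaseClass declares this method as virtual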
{
if (Equals(other, null)) return false;
return (Prop1 == other.Prop1 && Prop2 == other.Prop2 && Prop3 == other.Prop3);
}
public override bool Equals(object obj)
{
if (Equals(obj, null) || !(obj is myClass))
return false;
return Equals((myClass)obj);
}
public static bool operator ==(myClass left, myClass right)
{
if (Object.Equals(left, null))
return (Object.Equals(right, null)) ? true : false;
else
return left.Equals(right);
}
public static bool operator !=(myClass left, myClass right)
{
return !(left == right);
}
public override int GetHashCode()
{
return Prop3.GetHashCode() + 31 * (Prop2.GetHashCode() + 31 * Prop1.GetHashCode());
}
}
NOTE: I asked this question before, but it was closed with reference to the above linked question, which DID NOT ANSWER the original question, which clearly stated this was about a GroupBy problem. I have added more detail here regarding use in LINQ to clarify.
EDIT per request in comment this is what I am doing:
var myGroupedResult = myUngroupedCollection.
GroupBy(x => x.Value).
ToDictionary(x => x.Key, x => x.ToList());
// myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>> produced by LINQ
// myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>
I expect the result to produce a dictionary where the keys are HashSet<myClass> and the values are List<someClass>. If I have 5 distinct hashsets each with 10 occurrences of someClass, I expect a Dictionary with 5 keys, each with a value that is a List with 10 elements. Instead I get a Dictionary with 50 keys each with a value being a List that has 1 element.

I was able to solve my issue. Posting an answer here in case anyone else runs into the same issue.
The solution has two steps. First create a generic IEqualityComparer<HashSet<T>> (from the link in the question):
public class HashSetEqualityComparerBySetEquals<T> : IEqualityComparer<HashSet<T>>
{
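public bool Equals(HashSet<T> x, HashSet<T> y)
{
if (ReferenceEquals(x, y))
return true; // same instance, or both null
if (ReferenceEquals(x, null) || ReferenceEquals(y, null))
return false; // SetEquals would throw on a null argument
return x.SetEquals(y);
}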
public int GetHashCode(HashSet<T> set)
{
int hashCode = 0;
if (set != null)
{
foreach (T t in set)
{
hashCode = hashCode ^
(set.Comparer.GetHashCode(t) & 0x7FFFFFFF);
}
}
return hashCode;
}
}
Then provide it in the GroupBy() call (the hint came from "GroupBy on complex object (e.g. List<T>)", which deals with List rather than HashSet). Here an elementSelector is also passed as the second argument, so that each group contains the someClass keys rather than the whole key/value pairs:
HashSetEqualityComparerBySetEquals<myClass> comparer = new HashSetEqualityComparerBySetEquals<myClass>();
var myGroupedResult = myUngroupedCollection.
GroupBy(x => x.Value, x => x.Key, comparer).
ToDictionary(x => x.Key, x => x.ToList());
// myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>> produced by LINQ but could be a Dictionary or another collection.
// myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>
The same IEqualityComparer can also be used when performing other LINQ operations that check for equality, such as Distinct() and FirstOrDefault():
var thisWorksAsExpected = myGroupedResult.FirstOrDefault(x => comparer.Equals(x.Key, aHashSetWithSameElements));
var thisAlsoWorks = myGroupedResult.FirstOrDefault(x => x.Key.SetEquals(aHashSetWithSameElements));
var thisDoesNotWork = myGroupedResult.FirstOrDefault(x => x.Key == aHashSetWithSameElements);
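// thisDoesNotWork: == on HashSet<T> compares references, so this finds no match (a default KeyValuePair, since KeyValuePair is a struct) unless both are the same instance, even when all elements match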
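For a self-contained illustration of the same idea, here is a minimal sketch that groups by HashSet keys using the built-in HashSet<T>.CreateSetComparer(). The string and int types are simplified stand-ins for someClass and myClass. As the linked question's title suggests, CreateSetComparer does not let you supply an element comparer, so this only fits when each set's own element comparer is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySetSketch
{
    // Groups names by the set they map to; sets with the same elements share one group.
    public static Dictionary<HashSet<int>, List<string>> Group(
        IEnumerable<KeyValuePair<string, HashSet<int>>> source)
    {
        // Built-in comparer: set-based equality with a matching hash code.
        IEqualityComparer<HashSet<int>> comparer = HashSet<int>.CreateSetComparer();
        return source
            .GroupBy(x => x.Value, x => x.Key, comparer)
            // Pass the comparer to the dictionary too, so later lookups also match by content.
            .ToDictionary(g => g.Key, g => g.ToList(), comparer);
    }
}
```

Passing the comparer to ToDictionary as well means that a lookup with a different HashSet instance holding the same elements still finds the group.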

How to implement unit tests on IEqualityComparer?

I have a class and a comparer for this class that implements IEqualityComparer:
class Foo
{
public int Int { get; set; }
public string Str { get; set; }
public Foo(int i, string s)
{
Int = i;
Str = s;
}
private sealed class FooEqualityComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo x, Foo y)
{
if (ReferenceEquals(x, y)) return true;
if (ReferenceEquals(x, null)) return false;
if (ReferenceEquals(y, null)) return false;
if (x.GetType() != y.GetType()) return false;
return x.Int == y.Int && string.Equals(x.Str, y.Str);
}
public int GetHashCode(Foo obj)
{
unchecked
{
return (obj.Int * 397) ^ (obj.Str != null ? obj.Str.GetHashCode() : 0);
}
}
}
public static IEqualityComparer<Foo> Comparer { get; } = new FooEqualityComparer();
}
The two methods Equals and GetHashCode are used for example in List.Except via an instance of the comparer.
My question is: how do I properly implement unit tests for this comparer? I want to detect when someone adds a public property to Foo without updating the comparer, because in that case the comparer becomes invalid.
If I do something like:
Assert.That(new Foo(42, "answer"), Is.EqualTo(new Foo(42, "answer")));
This cannot detect that a new property was added, and that this property differs in the two objects.
Is there any way to do this?
If it is possible, can we add an attribute to a property to say that this property is not relevant in the comparison?

You can use reflection to get the properties of the type, e.g.:
var knownPropNames = new string[]
{
"Int",
"Str",
};
var props = typeof(Foo).GetProperties(BindingFlags.Public | BindingFlags.Instance);
var unknownProps = props
.Where(x => !knownPropNames.Contains(x.Name))
.Select(x => x.Name)
.ToArray();
// Use assertion instead of Console.WriteLine
Console.WriteLine("Unknown props: {0}", string.Join("; ", unknownProps));
This way, you can implement a test that fails if any properties are added. Of course, you'd have to add new properties to the array at the beginning. As using reflection is an expensive operation from a performance point of view, I'd propose to use it in the test, not in the comparer itself if you need to compare lots of objects.
Please also note the use of the BindingFlags parameter so you can restrict the properties to only the public ones and the ones on instance-level.
Also, you can define a custom attribute that you use to mark properties that are not relevant. For example:
[AttributeUsage(AttributeTargets.Property)]
public class ComparerIgnoreAttribute : Attribute {}
You can apply it to a property:
[ComparerIgnore]
public decimal Dec { get; set; }
In addition, you'd have to extend the code that discovers unknown properties:
var unknownProps = props
.Where(x => !knownPropNames.Contains(x.Name)
&& !x.GetCustomAttributes(typeof(ComparerIgnoreAttribute)).Any())
.Select(x => x.Name)
.ToArray();
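Putting the pieces together, a helper for such a test might look like the sketch below. The Foo class here is a trimmed-down stand-in with one ignored property added for illustration, and ComparerCoverage is a hypothetical helper name, not something from the question.

```csharp
using System;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class ComparerIgnoreAttribute : Attribute { }

// Trimmed-down stand-in for the question's Foo, with one ignored property added.
public class Foo
{
    public int Int { get; set; }
    public string Str { get; set; }
    [ComparerIgnore]
    public decimal Dec { get; set; }
}

public static class ComparerCoverage
{
    // Names of public instance properties that are neither listed as known nor marked as ignored.
    public static string[] UnknownProperties(Type type, params string[] knownPropNames)
    {
        return type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => !knownPropNames.Contains(p.Name)
                && !Attribute.IsDefined(p, typeof(ComparerIgnoreAttribute)))
            .Select(p => p.Name)
            .ToArray();
    }
}
```

In an NUnit test this could be asserted with something like Assert.That(ComparerCoverage.UnknownProperties(typeof(Foo), "Int", "Str"), Is.Empty), so the test fails as soon as a new, unlisted property appears.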

Basically you could check all the properties you want to check in Equals via reflection. To filter some of them out use an attribute on those properties:
class Foo
{
[MyAttribute]
public string IgnoredProperty { get; set; }
public string MyProperty { get; set; }
}
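Now in your comparer, check for that specific attribute. Afterwards, compare every property in the remaining list via PropertyInfo.GetValue:
class MyComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo x, Foo y)
{
// Inspect Foo's properties (not the comparer's own) and skip those marked with MyAttribute.
var properties = typeof(Foo).GetProperties()
.Where(p => !Attribute.IsDefined(p, typeof(MyAttribute)));
var equal = true;
foreach(var p in properties)
// object.Equals compares values; == on two boxed objects would compare references.
equal &= object.Equals(p.GetValue(x, null), p.GetValue(y, null));
return equal;
}
// GetHashCode(Foo) is not shown here; see the remarks that follow.
}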
However, you should have some good pre-checks within GetHashCode to avoid unnecessary calls to this slow method.
EDIT: As you've mentioned ReSharper: since the properties to be validated are only determined at runtime, I assume even R# doesn't know a good way to implement GetHashCode. You will need some properties that will always be available on your type and that provide a good enough idea of what might be considered equal. All the additional properties, however, should only go into the expensive Equals method.
EDIT2: As mentioned in the comments, doing reflection within Equals or even GetHashCode is a bad idea, as it's usually quite slow and can often be avoided. If you know the properties to be checked for equality at compile time, you should definitely include them within those two methods, as doing so gives you much more safety. If you find that you really need this because you have too many properties, you probably have a more basic problem: your class is doing too much.

I guess you can check properties count inside the comparer. Something like this:
private sealed class FooEqualityComparer : IEqualityComparer<Foo>
{
private List<bool> comparisonResults = new List<bool>();
private List<Func<Foo, Foo, bool>> conditions = new List<Func<Foo, Foo, bool>>{
(x, y) => x.Int == y.Int,
(x, y) => string.Equals(x.Str, y.Str)
};
private int propertiesCount = typeof(Foo)
.GetProperties(BindingFlags.Public | BindingFlags.Instance)
//.Where(someLogicToExclude(e.g. attribute))
.Count();
public bool Equals(Foo x, Foo y)
{
if (ReferenceEquals(x, y)) return true;
if (ReferenceEquals(x, null)) return false;
if (ReferenceEquals(y, null)) return false;
if (x.GetType() != y.GetType()) return false;
//has new property which is not presented in the conditions list and not excluded
if (conditions.Count() != propertiesCount) return false;
foreach(var func in conditions)
if(!func(x, y)) return false;//returns false on first mismatch
return true;//only if all conditions are satisfied
}
public int GetHashCode(Foo obj)
{
unchecked
{
return (obj.Int * 397) ^ (obj.Str != null ? obj.Str.GetHashCode() : 0);
}
}
}

How can I make a monoid-like interface in C#?

I want to require things which implement an interface (or derive from a class) to have an implementation for Aggregate included. That is, if they are of type T I want them to have something of type Func<T,T,T>. In Haskell this is called a "monoid".
EDIT: What I want to call is something like this:
list.Aggregate((x, accum) => accum.MAppend(x));
Based on DigalD's answer, this is my best attempt, but it doesn't compile:
interface IMonoid<T>
{
T MAppend(T other);
}
class Test
{
public static void runTest<T>(IEnumerable<IMonoid<T>> list)
{
// doesn't work
list.Aggregate((x, ac) => ac.MAppend(x));
}
}

A monoid is an associative operation together with an identity for that operation.
interface Monoid<T> {
T MAppend(T t1, T t2);
T MEmpty { get; }
}
The contract of a monoid is that for all a, b, and c:
Associativity: MAppend(MAppend(a, b), c) = MAppend(a, MAppend(b, c))
Left identity: MAppend(MEmpty, a) = a
Right identity: MAppend(a, MEmpty) = a
You can use it to add up the elements in a list:
class Test {
public static T runTest<T>(IEnumerable<T> list, Monoid<T> m) {
return list.Aggregate(m.MEmpty, (a, b) => m.MAppend(a, b));
}
}

The answer by Apocalisp looks closest to the mark, but I'd prefer something like this:
public interface IMonoid<T>
{
T Combine(T x, T y);
T Identity { get; }
}
While Haskell calls the monoid identity mempty, I think it's more reasonable to use the language of abstract algebra, so I named the identity value Identity. Likewise, I prefer the term Combine over Haskell's mappend, because the word append seems to indicate some sort of list append operation, which it doesn't have to be at all. Combine, however, isn't a perfect word either, because neither the first nor the last monoids combine the values; instead, they ignore one of them. I'm open to suggestions of a better name for the binary operation...
(In Haskell, BTW, I prefer using the <> operator alias instead of the mappend function, so that sort of side-steps the naming issue...)
Using the above IMonoid<T> interface, you can now write an extension method like this:
public static class Monoid
{
public static T Concat<T>(this IMonoid<T> m, IEnumerable<T> values)
{
return values.Aggregate(m.Identity, (acc, x) => m.Combine(acc, x));
}
}
Here, I completely arbitrarily and inconsistently decided to go with Haskell's naming, so I named the method Concat.
As I describe in my article Monoids accumulate, one always has to start the accumulation with the monoidal identity, in this case m.Identity.
As I describe in my article Semigroups accumulate, instead of an imperative for loop, you can use the Aggregate extension method, but you'll have to use the overload that takes an initial seed value. That seed value is m.Identity.
You can now define various monoids, such as Sum:
public class Sum : IMonoid<int>
{
public int Combine(int x, int y)
{
return x + y;
}
public int Identity
{
get { return 0; }
}
}
or Product:
public class Product : IMonoid<int>
{
public int Combine(int x, int y)
{
return x * y;
}
public int Identity
{
get { return 1; }
}
}
Since I made the monoid argument the this argument of the Concat method, the method extends the IMonoid<T> interface, rather than IEnumerable<T>. I think this gives you a more readable API. For example:
var s = new Sum().Concat(new[] { 1000, 300, 30, 7 });
produces s == 1337, while
var p = new Product().Concat(new[] { 2, 3, 7 });
produces p == 42.
If you don't like having to create a new Sum() or new Product() object every time, you can make your monoids Singletons, like this All monoid:
public class All : IMonoid<bool>
{
public static All Instance = new All();
private All() { }
public bool Combine(bool x, bool y)
{
return x && y;
}
public bool Identity
{
get { return true; }
}
}
which you can use like this:
var a = All.Instance.Concat(new[] { true, true, true });
Here, a is true. You can use a similarly written Any monoid in the same way:
var a = Any.Instance.Concat(new[] { false, true, false });
I'll leave it as an exercise for the reader to figure out how Any is implemented.
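For readers who want to check their solution, one possible Any is sketched below: logical OR with false as the identity. The interface and the Concat extension are repeated from above so that the block compiles on its own. Treat this as one reasonable reading of the exercise, not as the author's own implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IMonoid<T>
{
    T Combine(T x, T y);
    T Identity { get; }
}

public static class Monoid
{
    // Folds the values with the monoid's operation, starting from its identity.
    public static T Concat<T>(this IMonoid<T> m, IEnumerable<T> values)
    {
        return values.Aggregate(m.Identity, (acc, x) => m.Combine(acc, x));
    }
}

// One possible Any: logical OR, with false as the identity.
public class Any : IMonoid<bool>
{
    public static readonly Any Instance = new Any();
    private Any() { }
    public bool Combine(bool x, bool y) { return x || y; }
    public bool Identity { get { return false; } }
}
```

Because the fold starts from the identity, an empty sequence yields false for Any, just as it yields true for All.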

What about this version:
interface IMonoid<T>
{
T MAppend(IMonoid<T> other);
}
class Test
{
public static void runTest<T>(IEnumerable<IMonoid<T>> list)
where T : IMonoid<T>
{
list.Aggregate((x, ac) => ac.MAppend(x));
}
}
Or better yet, enforcing it from the start:
interface IMonoid<T>
where T : IMonoid<T>
{
T MAppend(IMonoid<T> other);
}

Shouldn't you just make the Interface generic as well?
interface IMonoid<T>
{
public IMonoidHandler<T> handler {get;set;}
}

What is the proper way to set up a always false IEqualityComparer<T>?

I have a case where two objects can be compared many different ways for equality. For example:
public class HeightComparer : IEqualityComparer<Person> {
public bool Equals(Person x, Person y) {
return x.Height.Equals(y.Height);
}
public int GetHashCode(Person obj) {
return obj.Height;
}
}
And I use these comparers in Dictionary<Person,Person>(IEqualityComparer<Person>) for various methods. How would you make a comparer that guarantees each person is unique? I came up with the following, but it runs slow since the GetHashCode() method often returns the same value.
public class NullPersonComparer : IEqualityComparer<Person> {
public bool Equals(Person x, Person y) {
return false; // always unequal
}
public int GetHashCode(Person obj) {
return obj.GetHashCode();
}
}
I could return the same value of 0 from GetHashCode(Person obj) but it still is slow populating the dictionary.
Edit
Here is a use case:
Dictionary<Person, Person> people = new Dictionary<Person, Person>(comparer);
foreach (string name in Names)
{
Person person= new Person(name);
Person realPerson;
if (people.TryGetValue(person, out realPerson))
{
realPerson.AddName(name);
}
else
{
people.Add(person, person);
}
}

If the type has not overridden the Equals or GetHashCode methods then their default implementations, from object, do what you want, namely provide equality based on their identity, rather than their value. You can use EqualityComparer<Person>.Default to get an IEqualityComparer that uses those semantics if you want.
If the Equals method has been overridden to provide some sort of value semantics, but you don't want that, you want identity semantics, then you can use object.ReferenceEquals in your own implementation:
public class IdentityComparer<T> : IEqualityComparer<T>
{
public bool Equals(T x, T y)
{
return object.ReferenceEquals(x, y);
}
public int GetHashCode(T obj)
{
return System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj);
}
}
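To see the difference between the two kinds of comparer, here is a small sketch. The Person class is hypothetical (the question does not show it) and overrides Equals on Name only, so two separate instances with the same name count as equal under the default comparer but stay distinct under the identity comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical Person with value-based equality on Name.
public class Person
{
    public string Name { get; set; }
    public override bool Equals(object obj)
    {
        var other = obj as Person;
        return other != null && Name == other.Name;
    }
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

public class IdentityComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) { return ReferenceEquals(x, y); }
    // Ignores any GetHashCode override and hashes by object identity.
    public int GetHashCode(T obj) { return RuntimeHelpers.GetHashCode(obj); }
}

public static class IdentityDemo
{
    // Counts how many of the given people end up as distinct dictionary keys.
    public static int CountKeys(IEnumerable<Person> people, IEqualityComparer<Person> comparer)
    {
        var map = new Dictionary<Person, Person>(comparer);
        foreach (var p in people)
            if (!map.ContainsKey(p)) map.Add(p, p);
        return map.Count;
    }
}
```

Because the hash codes are spread out by identity, lookups stay fast, unlike a comparer whose Equals always returns false.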

List.Sort in C#: comparer being called with null object

I am getting strange behaviour using the built-in C# List.Sort function with a custom comparer.
For some reason it sometimes calls the comparer class's Compare method with a null object as one of the parameters. But if I check the list with the debugger there are no null objects in the collection.
My comparer class looks like this:
public class DelegateToComparer<T> : IComparer<T>
{
private readonly Func<T,T,int> _comparer;
public int Compare(T x, T y)
{
return _comparer(x, y);
}
public DelegateToComparer(Func<T, T, int> comparer)
{
_comparer = comparer;
}
}
This allows a delegate to be passed to the List.Sort method, like this:
mylist.Sort(new DelegateToComparer<MyClass>(
(x, y) => {
return x.SomeProp.CompareTo(y.SomeProp);
}));
So the above delegate will throw a null reference exception for the x parameter, even though no elements of mylist are null.
UPDATE: Yes I am absolutely sure that it is parameter x throwing the null reference exception!
UPDATE: Instead of using the framework's List.Sort method, I tried a custom sort method (i.e. new BubbleSort().Sort(mylist)) and the problem went away. As I suspected, the List.Sort method passes null to the comparer for some reason.

This problem will occur when the comparison function is not consistent, such that x < y does not always imply y > x (that is, Compare(x, y) and Compare(y, x) can contradict each other). In your example, you should check how two instances of the type of SomeProp are being compared.
Here's an example that reproduces the problem. Here, it's caused by the pathological compare function CompareStrings. It's dependent on the initial state of the list: if you change the initial order to "C","B","A", then there is no exception.
I wouldn't call this a bug in the Sort function - it's simply a requirement that the comparison function is consistent.
using System.Collections.Generic;
class Program
{
static void Main()
{
var letters = new List<string>{"B","C","A"};
letters.Sort(CompareStrings);
}
private static int CompareStrings(string l, string r)
{
if (l == "B")
return -1;
return l.CompareTo(r);
}
}
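One way to catch this kind of inconsistency before it reaches Sort is to check the comparison on sample data. The helper below is a hypothetical sketch (ComparisonCheck is not part of the framework) and it only tests one property, namely that swapping the arguments flips the sign. It does not check transitivity.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonCheck
{
    // Returns true when compare(a, b) and compare(b, a) have opposite signs for every pair,
    // which also requires compare(a, a) to be zero.
    public static bool IsAntisymmetric<T>(IList<T> items, Comparison<T> compare)
    {
        for (int i = 0; i < items.Count; i++)
        {
            for (int j = i; j < items.Count; j++)
            {
                int ab = Math.Sign(compare(items[i], items[j]));
                int ba = Math.Sign(compare(items[j], items[i]));
                if (ab != -ba) return false;
            }
        }
        return true;
    }
}
```

Run against the CompareStrings function above, the check fails immediately, because comparing "B" with itself returns -1 instead of 0.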

Are you sure the problem isn't that SomeProp is null?
In particular, with strings or Nullable<T> values.
With strings, it would be better to use:
list.Sort((x, y) => string.Compare(x.SomeProp, y.SomeProp));
(edit)
For a null-safe wrapper, you can use Comparer<T>.Default - for example, to sort a list by a property:
using System;
using System.Collections.Generic;
public static class ListExt {
public static void Sort<TSource, TValue>(
this List<TSource> list,
Func<TSource, TValue> selector) {
if (list == null) throw new ArgumentNullException("list");
if (selector == null) throw new ArgumentNullException("selector");
var comparer = Comparer<TValue>.Default;
list.Sort((x,y) => comparer.Compare(selector(x), selector(y)));
}
}
class SomeType {
public override string ToString() { return SomeProp; }
public string SomeProp { get; set; }
static void Main() {
var list = new List<SomeType> {
new SomeType { SomeProp = "def"},
new SomeType { SomeProp = null},
new SomeType { SomeProp = "abc"},
new SomeType { SomeProp = "ghi"},
};
list.Sort(x => x.SomeProp);
list.ForEach(Console.WriteLine);
}
}

I too have come across this problem (null reference being passed to my custom IComparer implementation) and finally found out that the problem was due to using inconsistent comparison function.
This was my initial IComparer implementation:
public class NumericStringComparer : IComparer<String>
{
public int Compare(string x, string y)
{
float xNumber, yNumber;
if (!float.TryParse(x, out xNumber))
{
return -1;
}
if (!float.TryParse(y, out yNumber))
{
return -1;
}
if (xNumber == yNumber)
{
return 0;
}
else
{
return (xNumber > yNumber) ? 1 : -1;
}
}
}
The mistake in this code was that Compare would return -1 whenever one of the values could not be parsed properly (in my case it was due to wrongly formatted string representations of numeric values so TryParse always failed).
Notice that when both x and y were formatted incorrectly (and thus TryParse failed on both of them), calling Compare(x, y) and Compare(y, x) would yield the same result: -1. This, I think, was the main problem. When debugging, Compare() would at some point be passed a null string as one of its arguments, even though the collection being sorted did not contain a null string.
As soon as I had fixed the TryParse issue and ensured consistency of my implementation the problem went away and Compare wasn't being passed null pointers anymore.
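For comparison, a consistent variant might look like the sketch below. The ordering rule is an assumption on my part rather than the fix described above: strings that fail to parse sort before all numbers and are ordered among themselves by ordinal comparison. Parsing with the invariant culture is also an assumption; the original used the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch: unparseable strings sort before all numbers and are ordered among
// themselves ordinally, so swapping the arguments always flips the sign.
public class ConsistentNumericStringComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        float xNumber, yNumber;
        bool xOk = float.TryParse(x, NumberStyles.Float, CultureInfo.InvariantCulture, out xNumber);
        bool yOk = float.TryParse(y, NumberStyles.Float, CultureInfo.InvariantCulture, out yNumber);
        if (!xOk && !yOk) return string.CompareOrdinal(x, y);
        if (!xOk) return -1; // only x is unparseable: x goes first
        if (!yOk) return 1;  // only y is unparseable: y goes first
        return xNumber.CompareTo(yNumber);
    }
}
```

The key difference from the original is that the two "only one side failed" cases return opposite signs, and the "both failed" case no longer returns -1 in both directions.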

Marc's answer is useful. I agree with him that the NullReference is due to calling CompareTo on a null property. Without needing an extension class, you can do:
mylist.Sort((x, y) =>
(Comparer<SomePropType>.Default.Compare(x.SomeProp, y.SomeProp)));
where SomePropType is the type of SomeProp

For debugging purposes, you want your method to be null-safe. (or at least, catch the null-ref. exception, and handle it in some hard-coded way). Then, use the debugger to watch what other values get compared, in what order, and which calls succeed or fail.
Then you will find your answer, and you can then remove the null-safety.

Can you run this code ...
mylist.Sort((i, j) =>
{
Debug.Assert(i.SomeProp != null && j.SomeProp != null);
return i.SomeProp.CompareTo(j.SomeProp);
}
);

I stumbled across this issue myself, and found that it was related to a NaN property in my input. Here's a minimal test case that should produce the exception:
public class C {
double v;
public static void Main() {
var test =
new List<C> { new C { v = 0d },
new C { v = Double.NaN },
new C { v = 1d } };
test.Sort((d1, d2) => (int)(d1.v - d2.v));
}
}
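A sketch of a comparison that stays consistent when NaN is present: Double.CompareTo defines a total order in which NaN is smaller than every other value and equal to itself. The NaNSafeSort helper name is made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class NaNSafeSort
{
    // Sorts in place using Double.CompareTo, which orders NaN before any other value.
    public static void Sort(List<double> values)
    {
        values.Sort((a, b) => a.CompareTo(b));
    }
}
```

Using CompareTo also avoids a second problem with the cast in the test case above: (int)(d1.v - d2.v) truncates fractional differences to 0, so values such as 0.2 and 0.5 would be treated as equal.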
