"better" (simpler, faster, whatever) version of 'distinct by delegate'?

"better" (simpler, faster, whatever) version of 'distinct by delegate'? - c#

I hate posting this since it's somewhat subjective, but it feels like there's a better method to do this that I'm just not thinking of.
Sometimes I want to 'distinct' a collection by certain columns/properties but without throwing away other columns (yes, this does lose information, as it becomes arbitrary which values of those other columns you'll end up with).
Note that this extension is less powerful than the Distinct overloads that take an IEqualityComparer<T> since such things could do much more complex comparison logic, but this is all I need for now :)
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> getKeyFunc)
{
return from s in source
group s by getKeyFunc(s) into sourceGroups
select sourceGroups.First();
}
Example usage:
var items = new[]
{
new { A = 1, B = "foo", C = Guid.NewGuid(), },
new { A = 2, B = "foo", C = Guid.NewGuid(), },
new { A = 1, B = "bar", C = Guid.NewGuid(), },
new { A = 2, B = "bar", C = Guid.NewGuid(), },
};
var itemsByA = items.DistinctBy(item => item.A).ToList();
var itemsByB = items.DistinctBy(item => item.B).ToList();

Here you go. I don't think this is massively more efficient than your own version, but it should have a slight edge. It only requires a single pass through the sequence, yielding each item as it goes, rather than needing to group the entire sequence first.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return source.DistinctBy(keySelector, null);
}
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
{
if (source == null)
throw new ArgumentNullException("source");
if (keySelector == null)
throw new ArgumentNullException("keySelector");
return source.DistinctByIterator(keySelector, keyComparer);
}
private static IEnumerable<TSource> DistinctByIterator<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
{
var keys = new HashSet<TKey>(keyComparer);
foreach (TSource item in source)
{
if (keys.Add(keySelector(item)))
yield return item;
}
}

I've previously written a generic Func => IEqualityComparer utility class just for the purpose of being able to call overloads of LINQ methods that accept an IEqualityComparer with having to write a custom class each time.
It uses a delegate (just like your example) to supply the comparison semantics. This allows me to use the built-in implementations of the library methods rather than rolling my own - which I presume are more likely to be correct and efficiently implemented.
public static class ComparerExt
{
private class GenericEqualityComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> m_CompareFunc;
public GenericEqualityComparer( Func<T,T,bool> compareFunc ) {
m_CompareFunc = compareFunc;
}
public bool Equals(T x, T y) {
return m_CompareFunc(x, y);
}
public int GetHashCode(T obj) {
return obj.GetHashCode(); // don't override hashing semantics
}
}
public static IComparer<T> Compare<T>( Func<T,T,bool> compareFunc ) {
return new GenericEqualityComparer<T>(compareFunc);
}
}
You can use this as so:
var result = list.Distinct( ComparerExt.Compare( (a,b) => { /*whatever*/ } );
I also often throw in a Reverse() method to allow for changing the ordering of operands in the comparison, like so:
private class GenericComparer<T> : IComparer<T>
{
private readonly Func<T, T, int> m_CompareFunc;
public GenericComparer( Func<T,T,int> compareFunc ) {
m_CompareFunc = compareFunc;
}
public int Compare(T x, T y) {
return m_CompareFunc(x, y);
}
}
public static IComparer<T> Reverse<T>( this IComparer<T> comparer )
{
return new GenericComparer<T>((a, b) => comparer.Compare(b, a));
}

Related

How to check object variable is a LINQ Group?

I traced group source code, and found group will new a GroupedEnumerable object. But when I tried below code it can't use is GroupedEnumerable to check.
var orders = new[] {
new {ID=1,Country="US",CreateDate=new DateTime(2021,01,01),PrdtID="P0020",Qty=100,Amount=100},
new {ID=2,Country="US",CreateDate=new DateTime(2021,01,02),PrdtID="P0021",Qty=200,Amount=200},
new {ID=3,Country="US",CreateDate=new DateTime(2021,02,03),PrdtID="P0022",Qty=300,Amount=300},
new {ID=4,Country="US",CreateDate=new DateTime(2021,02,04),PrdtID="P0023",Qty=400,Amount=400},
new {ID=5,Country="CN",CreateDate=new DateTime(2021,01,01),PrdtID="P0020",Qty=100,Amount=100},
new {ID=6,Country="CN",CreateDate=new DateTime(2021,01,02),PrdtID="P0021",Qty=200,Amount=200},
new {ID=7,Country="CN",CreateDate=new DateTime(2021,02,03),PrdtID="P0022",Qty=300,Amount=300},
new {ID=8,Country="CN",CreateDate=new DateTime(2021,02,04),PrdtID="P0023",Qty=400,Amount=400},
};
var query = from order in orders
group order by new { order.CreateDate.Year, order.CreateDate.Month } into g1
from g2 in (
from order in g1
group order by order.Country
)
group g2 by g1.Key;
;
var isGroup = query is System.Linq.GroupedEnumerable;
group source code:
public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return new GroupedEnumerable<TSource, TKey>(source, keySelector, null);
}
public interface IGrouping<out TKey, out TElement> : IEnumerable<TElement>, IEnumerable
{
TKey Key
{
get;
}
}
using System.Collections.Generic;
public GroupedEnumerable(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
if (keySelector == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.keySelector);
}
_source = source;
_keySelector = keySelector;
_comparer = comparer;
}
Why I want to do this
I want to make a helper, check if input object is group value then using specified logic to solve it.
Update:
I found GroupedEnumerable internal sealed class so it can't use is GroupedEnumerable to check

I believe you sould make your helper checking an interface,
e.g. IEnumerable<IGrouping<TKey,TElement>>
I would suggest an extension on that interface
Sample:
static class Extension
{
public static string AmI<TKey, TElement>(this IEnumerable<IGrouping<TKey, TElement>> source)
{
Console.WriteLine(typeof(TKey));
Console.WriteLine(typeof(TElement));
return string.Join(',', source.Select(g => g.Key));
}
}

Here is function and test
[TestMethod]
public void Test()
{
var array = new int[] {1, 2, 3, 4, 5, 6};
var grouping = array.GroupBy(x => x % 2);
IsGrouping(grouping)
.Should()
.BeTrue();
IsGrouping(array)
.Should()
.BeFalse();
}
private bool IsGrouping(IEnumerable someEnumerable)
{
return someEnumerable
.GetType()
.GetInterfaces()
.Any(x => x.IsGenericType
&& typeof(IEnumerable<>) == x.GetGenericTypeDefinition()
&& x.GenericTypeArguments.Length == 1
&& x.GenericTypeArguments.First().IsGenericType
&& x.GenericTypeArguments.First().GetGenericTypeDefinition() == typeof(IGrouping<,>)
);
}

Now I use a bad way checking by name to solve problem.
var isGroup = query.GetType().Name.StartsWith("GroupedEnumerable");

set unique values to lookup edit data source

I'm trying to put non duplicated values in a lookup edit using the code below but the variable unique is always null, and I don't know where is the problem.
Any help please?
List<VueItemItemUnit> liste = ObjReservation.LoadAllFamilles();
var unique =
from element in liste
group element by element.FA_CODE into Group
where Group.Count() == 1
select Group.Key;
lookUpFamille.Properties.DataSource = unique;

Try this :
var unique = liste.Distinct(element => element.FA_CODE).Select(element => element.FA_CODE);

I suggest you use the following approach:
lookUpFamille.Properties.DataSource = list.DistinctBy(e => e.FA_CODE).ToList();
//...
// DistinctBy<T,TKey> extension
static class EnumerableHelper {
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector) {
return source.Distinct(new EqualityComparer<T, TKey>(keySelector));
}
class EqualityComparer<T, TKey> : IEqualityComparer<T> {
readonly Func<T, TKey> keySelector;
public EqualityComparer(Func<T, TKey> keySelector) {
this.keySelector = keySelector;
}
bool IEqualityComparer<T>.Equals(T x, T y) {
return Equals(keySelector(x), keySelector(y));
}
int IEqualityComparer<T>.GetHashCode(T obj) {
return keySelector(obj).GetHashCode();
}
}
}

LINQ: Use .Except() on collections of different types by making them convertible/comparable?

Given two lists of different types, is it possible to make those types convertible between or comparable to each other (eg with a TypeConverter or similar) so that a LINQ query can compare them? I've seen other similar questions on SO but nothing that points to making the types convertible between each other to solve the problem.
Collection Types:
public class Data
{
public int ID { get; set; }
}
public class ViewModel
{
private Data _data;
public ViewModel(Data data)
{
_data = data;
}
}
Desired usage:
public void DoMerge(ObservableCollection<ViewModel> destination, IEnumerable<Data> data)
{
// 1. Find items in data that don't already exist in destination
var newData = destination.Except(data);
// ...
}
It would seem logical that since I know how to compare an instance of ViewModel to an instance of Data I should be able to provide some comparison logic that LINQ would then use for queries like .Except(). Is this possible?

I assume that providing a projection from Data to ViewModel is problematic, so I'm offering another solution in addition to Jason's.
Except uses a hash set (if I recall correctly), so you can get similar performance by creating your own hashset. I'm also assuming that you are identifying Data objects as equal when their IDs are equal.
var oldIDs = new HashSet<int>(data.Select(d => d.ID));
var newData = destination.Where(vm => !oldIDs.Contains(vm.Data.ID));
You might have another use for a collection of "oldData" elsewhere in the method, in which case, you would want to do this instead. Either implement IEquatable<Data> on your data class, or create a custom IEqualityComparer<Data> for the hash set:
var oldData = new HashSet<Data>(data);
//or: var oldData = new HashSet<Data>(data, new DataEqualityComparer());
var newData = destination.Where(vm => !oldData.Contains(vm.Data));

I know this is late but there is a simpler syntax using Func that eliminates the need for a comparer.
public static class LinqExtensions
{
public static IEnumerable<TSource> Except<TSource, VSource>(this IEnumerable<TSource> first, IEnumerable<VSource> second, Func<TSource, VSource, bool> comparer)
{
return first.Where(x => second.Count(y => comparer(x, y)) == 0);
}
public static IEnumerable<TSource> Contains<TSource, VSource>(this IEnumerable<TSource> first, IEnumerable<VSource> second, Func<TSource, VSource, bool> comparer)
{
return first.Where(x => second.FirstOrDefault(y => comparer(x, y)) != null);
}
public static IEnumerable<TSource> Intersect<TSource, VSource>(this IEnumerable<TSource> first, IEnumerable<VSource> second, Func<TSource, VSource, bool> comparer)
{
return first.Where(x => second.Count(y => comparer(x, y)) == 1);
}
}
so with lists of class Foo and Bar
public class Bar
{
public int Id { get; set; }
public string OtherBar { get; set; }
}
public class Foo
{
public int Id { get; set; }
public string OtherFoo { get; set; }
}
one can run Linq statements like
var fooExceptBar = fooList.Except(barList, (f, b) => f.Id == b.Id);
var barExceptFoo = barList.Except(fooList, (b, f) => b.OtherBar == f.OtherFoo);
it's basically a slight variation on above but seems cleaner to me.

If you use this :
var newData = destination.Except(data.Select(x => f(x)));
You have to project 'data' to same type contained in 'destination', but using the code below you could get rid of this limitation :
//Here is how you can compare two different sets.
class A { public string Bar { get; set; } }
class B { public string Foo { get; set; } }
IEnumerable<A> setOfA = new A[] { /*...*/ };
IEnumerable<B> setOfB = new B[] { /*...*/ };
var subSetOfA1 = setOfA.Except(setOfB, a => a.Bar, b => b.Foo);
//alternatively you can do it with a custom EqualityComparer, if your not case sensitive for instance.
var subSetOfA2 = setOfA.Except(setOfB, a => a.Bar, b => b.Foo, StringComparer.OrdinalIgnoreCase);
//Here is the extension class definition allowing you to use the code above
public static class IEnumerableExtension
{
public static IEnumerable<TFirst> Except<TFirst, TSecond, TCompared>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TCompared> firstSelect,
Func<TSecond, TCompared> secondSelect)
{
return Except(first, second, firstSelect, secondSelect, EqualityComparer<TCompared>.Default);
}
public static IEnumerable<TFirst> Except<TFirst, TSecond, TCompared>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TCompared> firstSelect,
Func<TSecond, TCompared> secondSelect,
IEqualityComparer<TCompared> comparer)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
return ExceptIterator<TFirst, TSecond, TCompared>(first, second, firstSelect, secondSelect, comparer);
}
private static IEnumerable<TFirst> ExceptIterator<TFirst, TSecond, TCompared>(
IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TCompared> firstSelect,
Func<TSecond, TCompared> secondSelect,
IEqualityComparer<TCompared> comparer)
{
HashSet<TCompared> set = new HashSet<TCompared>(second.Select(secondSelect), comparer);
foreach (TFirst tSource1 in first)
if (set.Add(firstSelect(tSource1)))
yield return tSource1;
}
}
Some may argue that's memory inefficient due to the use of an HashSet. But actually the Enumerable.Except method of the framework is doing the same with a similar internal class called 'Set' (I took a look by decompiling).

Your best bet is to provide a projection from Data to ViewModel so that you can say
var newData = destination.Except(data.Select(x => f(x)));
where f maps Data to ViewModel. You will need a IEqualityComparer<Data> too.

Remove doubles from structs array

I know 2 ways to remove doubles from an array of objects that support explicit comparing:
Using HashSet constructor and
Using LINQ's Distinct().
How to remove doubles from array of structs, comparing array members by a single field only? In other words, how to write the predicate, that could be used by Distinct().
Regards,

Well, you could implement IEqualityComparer<T> to pick out that field and use that for equality testing and hashing... or you could use DistinctBy which is in MoreLINQ.
Of course, you don't have to take a dependency on MoreLINQ really - you can implement it very simply:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
// TODO: Implement null argument checking :)
HashSet<TKey> keys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}

I would probably just loop:
var values = new HashSet<FieldType>();
var newList = new List<ItemType>();
foreach(var item in oldList) {
if(hash.Add(item.TheField)) newList.Add(item);
}

The LINQ answer has been published before. I am copying from Richard Szalay's answer here: Filtering duplicates out of an IEnumerable
public static class EnumerationExtensions
{
public static IEnumerable<TSource> Distinct<TSource,TKey>(
this IEnumerable<TSource> source, Func<TSource,TKey> keySelector)
{
KeyComparer comparer = new KeyComparer(keySelector);
return source.Distinct(comparer);
}
private class KeyComparer<TSource,TKey> : IEqualityComparer<TSource>
{
private Func<TSource,TKey> keySelector;
public DelegatedComparer(Func<TSource,TKey> keySelector)
{
this.keySelector = keySelector;
}
bool IEqualityComparer.Equals(TSource a, TSource b)
{
if (a == null && b == null) return true;
if (a == null || b == null) return false;
return keySelector(a) == keySelector(b);
}
int IEqualityComparer.GetHashCode(TSource obj)
{
return keySelector(obj).GetHashCode();
}
}
}
Which, as Richard says, is used like this:
var distinct = arr.Distinct(x => x.Name);

implement a custom IEqualityComparer<T>
public class MyStructComparer : IEqualityComparer<MyStruct>
{
public bool Equals(MyStruct x, MyStruct y)
{
return x.MyVal.Equals(y.MyVal);
}
public int GetHashCode(MyStruct obj)
{
return obj.MyVal.GetHashCode();
}
}
then
var distincts = myStructList.Distinct(new MyStructComparer());

F# Seq module implemented in C# for IEnumerable?

F# has a bunch of standard sequence operators I have come to know and love from my experience with Mathematica. F# is getting lots of my attention now, and when it is in general release, I intend to use it frequently.
Right now, since F# isn't yet in general release, I can't really use it in production code. LINQ implements some of these operators using SQL-like names (e.g. 'select' is 'map', and 'where' is 'filter'), but I can find no implementation of 'fold', 'iter' or 'partition'.
Has anyone seen any C# implementation of standard sequence operators? Is this something someone should write?

If you look carefully, many Seq operations have a LINQ equivalent or can be easily derived. Just looking down the list...
Seq.append = Concat<TSource>(IEnumerable<TSource> second)
Seq.concat = SelectMany<IEnumerable<TSource>, TResult>(s => s)
Seq.distinct_by = GroupBy(keySelector).Select(g => g.First())
Seq.exists = Any<TSource>(Func<TSource, bool> predicate)
Seq.mapi = Select<TSource, TResult>(Func<TSource, Int32, TResult> selector)
Seq.fold = Aggregate<TSource, TAccumulate>(TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func)
List.partition is defined like this:
Split the collection into two collections, containing the elements for which the given predicate returns true and false respectively
Which we can implement using GroupBy and a two-element array as a poor-man's tuple:
public static IEnumerable<TSource>[] Partition<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
return source.GroupBy(predicate).OrderByDescending(g => g.Key).ToArray();
}
Element 0 holds the true values; 1 holds the false values. GroupBy is essentially Partition on steroids.
And finally, Seq.iter and Seq.iteri map easily to foreach:
public static void Iter<TSource>(this IEnumerable<TSource> source, Action<TSource> action)
{
foreach (var item in source)
action(item);
}
public static void IterI<TSource>(this IEnumerable<TSource> source, Action<Int32, TSource> action)
{
int i = 0;
foreach (var item in source)
action(i++, item);
}

fold = Aggregate
Tell use what iter and partition do, and we might fill in the blanks. I'm guessing iter=SelectMany and partition might involve Skip/Take?
(update) I looked up Partition - here's a crude implementation that does some of it:
using System;
using System.Collections.Generic;
static class Program { // formatted for space
// usage
static void Main() {
int[] data = { 1, 2, 3, 4, 5, 6 };
var qry = data.Partition(2);
foreach (var grp in qry) {
Console.WriteLine("---");
foreach (var item in grp) {
Console.WriteLine(item);
}
}
}
static IEnumerable<IEnumerable<T>> Partition<T>(
this IEnumerable<T> source, int size) {
int count = 0;
T[] group = null; // use arrays as buffer
foreach (T item in source) {
if (group == null) group = new T[size];
group[count++] = item;
if (count == size) {
yield return group;
group = null;
count = 0;
}
}
if (count > 0) {
Array.Resize(ref group, count);
yield return group;
}
}
}

iter exists as a method in the List class which is ForEach
otherwise :
public static void iter<T>(this IEnumerable<T> source, Action<T> act)
{
foreach (var item in source)
{
act(item);
}
}

ToLookup would probably be a better match for List.partition:
IEnumerable<T> sequence = SomeSequence();
ILookup<bool, T> lookup = sequence.ToLookup(x => SomeCondition(x));
IEnumerable<T> trueValues = lookup[true];
IEnumerable<T> falseValues = lookup[false];

Rolling your own in C# is an interesting exercise, here are a few of mine. (See also here)
Note that iter/foreach on an IEnumerable is slightly controversial - I think because you have to 'finalise' (or whatever the word is) the IEnumerable in order for anything to actually happen.
//mimic fsharp map function (it's select in c#)
public static IEnumerable<TResult> Map<T, TResult>(this IEnumerable<T> input, Func<T, TResult> func)
{
foreach (T val in input)
yield return func(val);
}
//mimic fsharp mapi function (doens't exist in C#, I think)
public static IEnumerable<TResult> MapI<T, TResult>(this IEnumerable<T> input, Func<int, T, TResult> func)
{
int i = 0;
foreach (T val in input)
{
yield return func(i, val);
i++;
}
}
//mimic fsharp fold function (it's Aggregate in c#)
public static TResult Fold<T, TResult>(this IEnumerable<T> input, Func<T, TResult, TResult> func, TResult seed)
{
TResult ret = seed;
foreach (T val in input)
ret = func(val, ret);
return ret;
}
//mimic fsharp foldi function (doens't exist in C#, I think)
public static TResult FoldI<T, TResult>(this IEnumerable<T> input, Func<int, T, TResult, TResult> func, TResult seed)
{
int i = 0;
TResult ret = seed;
foreach (T val in input)
{
ret = func(i, val, ret);
i++;
}
return ret;
}
//mimic fsharp iter function
public static void Iter<T>(this IEnumerable<T> input, Action<T> action)
{
input.ToList().ForEach(action);
}

Here is an update to dahlbyk's partition solution.
It returned an array[] where "element 0 holds the true values; 1 holds the false values" — but this doesn't hold when all the elements match or all fail the predicate, in which case you've got a singleton array and a world of pain.
public static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
var partition = source.GroupBy(predicate);
IEnumerable<T> matches = partition.FirstOrDefault(g => g.Key) ?? Enumerable.Empty<T>();
IEnumerable<T> rejects = partition.FirstOrDefault(g => !g.Key) ?? Enumerable.Empty<T>();
return Tuple.Create(matches, rejects);
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

"better" (simpler, faster, whatever) version of 'distinct by delegate'? - c#

Related

How to check object variable is a LINQ Group?

set unique values to lookup edit data source

LINQ: Use .Except() on collections of different types by making them convertible/comparable?

Remove doubles from structs array

F# Seq module implemented in C# for IEnumerable?

Categories

Resources