F# Seq module implemented in C# for IEnumerable? - c#

F# has a bunch of standard sequence operators I have come to know and love from my experience with Mathematica. F# is getting lots of my attention now, and when it is in general release, I intend to use it frequently.
Right now, since F# isn't yet in general release, I can't really use it in production code. LINQ implements some of these operators using SQL-like names (e.g. 'select' is 'map', and 'where' is 'filter'), but I can find no implementation of 'fold', 'iter' or 'partition'.
Has anyone seen any C# implementation of standard sequence operators? Is this something someone should write?

If you look carefully, many Seq operations have a LINQ equivalent or can be easily derived. Just looking down the list...
Seq.append = Concat<TSource>(IEnumerable<TSource> second)
Seq.concat = SelectMany<IEnumerable<TSource>, TResult>(s => s)
Seq.distinct_by = GroupBy(keySelector).Select(g => g.First())
Seq.exists = Any<TSource>(Func<TSource, bool> predicate)
Seq.mapi = Select<TSource, TResult>(Func<TSource, Int32, TResult> selector)
Seq.fold = Aggregate<TSource, TAccumulate>(TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func)
List.partition is defined like this:
Split the collection into two collections, containing the elements for which the given predicate returns true and false respectively
Which we can implement using GroupBy and a two-element array as a poor-man's tuple:
public static IEnumerable<TSource>[] Partition<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
return source.GroupBy(predicate).OrderByDescending(g => g.Key).ToArray();
}
Element 0 holds the true values; 1 holds the false values. GroupBy is essentially Partition on steroids.
And finally, Seq.iter and Seq.iteri map easily to foreach:
public static void Iter<TSource>(this IEnumerable<TSource> source, Action<TSource> action)
{
foreach (var item in source)
action(item);
}
public static void IterI<TSource>(this IEnumerable<TSource> source, Action<Int32, TSource> action)
{
int i = 0;
foreach (var item in source)
action(i++, item);
}

fold = Aggregate
Tell use what iter and partition do, and we might fill in the blanks. I'm guessing iter=SelectMany and partition might involve Skip/Take?
(update) I looked up Partition - here's a crude implementation that does some of it:
using System;
using System.Collections.Generic;
static class Program { // formatted for space
// usage
static void Main() {
int[] data = { 1, 2, 3, 4, 5, 6 };
var qry = data.Partition(2);
foreach (var grp in qry) {
Console.WriteLine("---");
foreach (var item in grp) {
Console.WriteLine(item);
}
}
}
static IEnumerable<IEnumerable<T>> Partition<T>(
this IEnumerable<T> source, int size) {
int count = 0;
T[] group = null; // use arrays as buffer
foreach (T item in source) {
if (group == null) group = new T[size];
group[count++] = item;
if (count == size) {
yield return group;
group = null;
count = 0;
}
}
if (count > 0) {
Array.Resize(ref group, count);
yield return group;
}
}
}

iter exists as a method in the List class which is ForEach
otherwise :
public static void iter<T>(this IEnumerable<T> source, Action<T> act)
{
foreach (var item in source)
{
act(item);
}
}

ToLookup would probably be a better match for List.partition:
IEnumerable<T> sequence = SomeSequence();
ILookup<bool, T> lookup = sequence.ToLookup(x => SomeCondition(x));
IEnumerable<T> trueValues = lookup[true];
IEnumerable<T> falseValues = lookup[false];

Rolling your own in C# is an interesting exercise, here are a few of mine. (See also here)
Note that iter/foreach on an IEnumerable is slightly controversial - I think because you have to 'finalise' (or whatever the word is) the IEnumerable in order for anything to actually happen.
//mimic fsharp map function (it's select in c#)
public static IEnumerable<TResult> Map<T, TResult>(this IEnumerable<T> input, Func<T, TResult> func)
{
foreach (T val in input)
yield return func(val);
}
//mimic fsharp mapi function (doens't exist in C#, I think)
public static IEnumerable<TResult> MapI<T, TResult>(this IEnumerable<T> input, Func<int, T, TResult> func)
{
int i = 0;
foreach (T val in input)
{
yield return func(i, val);
i++;
}
}
//mimic fsharp fold function (it's Aggregate in c#)
public static TResult Fold<T, TResult>(this IEnumerable<T> input, Func<T, TResult, TResult> func, TResult seed)
{
TResult ret = seed;
foreach (T val in input)
ret = func(val, ret);
return ret;
}
//mimic fsharp foldi function (doens't exist in C#, I think)
public static TResult FoldI<T, TResult>(this IEnumerable<T> input, Func<int, T, TResult, TResult> func, TResult seed)
{
int i = 0;
TResult ret = seed;
foreach (T val in input)
{
ret = func(i, val, ret);
i++;
}
return ret;
}
//mimic fsharp iter function
public static void Iter<T>(this IEnumerable<T> input, Action<T> action)
{
input.ToList().ForEach(action);
}

Here is an update to dahlbyk's partition solution.
It returned an array[] where "element 0 holds the true values; 1 holds the false values" — but this doesn't hold when all the elements match or all fail the predicate, in which case you've got a singleton array and a world of pain.
public static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
var partition = source.GroupBy(predicate);
IEnumerable<T> matches = partition.FirstOrDefault(g => g.Key) ?? Enumerable.Empty<T>();
IEnumerable<T> rejects = partition.FirstOrDefault(g => !g.Key) ?? Enumerable.Empty<T>();
return Tuple.Create(matches, rejects);
}

Related

Class contains IEnumerable<class> of the same class, unknown amount of childelemnts [duplicate]

I need to search a tree for data that could be anywhere in the tree. How can this be done with linq?
class Program
{
static void Main(string[] args) {
var familyRoot = new Family() {Name = "FamilyRoot"};
var familyB = new Family() {Name = "FamilyB"};
familyRoot.Children.Add(familyB);
var familyC = new Family() {Name = "FamilyC"};
familyB.Children.Add(familyC);
var familyD = new Family() {Name = "FamilyD"};
familyC.Children.Add(familyD);
//There can be from 1 to n levels of families.
//Search all children, grandchildren, great grandchildren etc, for "FamilyD" and return the object.
}
}
public class Family {
public string Name { get; set; }
List<Family> _children = new List<Family>();
public List<Family> Children {
get { return _children; }
}
}
That's an extension to It'sNotALie.s answer.
public static class Linq
{
public static IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
{
return selector(source).SelectMany(c => Flatten(c, selector))
.Concat(new[] { source });
}
}
Sample test usage:
var result = familyRoot.Flatten(x => x.Children).FirstOrDefault(x => x.Name == "FamilyD");
Returns familyD object.
You can make it work on IEnumerable<T> source too:
public static IEnumerable<T> Flatten<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
{
return source.SelectMany(x => Flatten(x, selector))
.Concat(source);
}
Another solution without recursion...
var result = FamilyToEnumerable(familyRoot)
.Where(f => f.Name == "FamilyD");
IEnumerable<Family> FamilyToEnumerable(Family f)
{
Stack<Family> stack = new Stack<Family>();
stack.Push(f);
while (stack.Count > 0)
{
var family = stack.Pop();
yield return family;
foreach (var child in family.Children)
stack.Push(child);
}
}
Simple:
familyRoot.Flatten(f => f.Children);
//you can do whatever you want with that sequence there.
//for example you could use Where on it and find the specific families, etc.
IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
{
return selector(source).SelectMany(c => Flatten(selector(c), selector))
.Concat(new[]{source});
}
So, the simplest option is to write a function that traverses your hierarchy and produces a single sequence. This then goes at the start of your LINQ operations, e.g.
IEnumerable<T> Flatten<T>(this T source)
{
foreach(var item in source) {
yield item;
foreach(var child in Flatten(item.Children)
yield child;
}
}
To call simply: familyRoot.Flatten().Where(n => n.Name == "Bob");
A slight alternative would give you a way to quickly ignore a whole branch:
IEnumerable<T> Flatten<T>(this T source, Func<T, bool> predicate)
{
foreach(var item in source) {
if (predicate(item)) {
yield item;
foreach(var child in Flatten(item.Children)
yield child;
}
}
Then you could do things like: family.Flatten(n => n.Children.Count > 2).Where(...)
I like Kenneth Bo Christensen's answer using stack, it works great, it is easy to read and it is fast (and doesn't use recursion).
The only unpleasant thing is that it reverses the order of child items (because stack is FIFO). If sort order doesn't matter to you then it's ok.
If it does, sorting can be achieved easily using selector(current).Reverse() in the foreach loop (the rest of the code is the same as in Kenneth's original post)...
public static IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
{
var stack = new Stack<T>();
stack.Push(source);
while (stack.Count > 0)
{
var current = stack.Pop();
yield return current;
foreach (var child in selector(current).Reverse())
stack.Push(child);
}
}
Well, I guess the way is to go with the technique of working with hierarchical structures:
You need an anchor to make
You need the recursion part
// Anchor
rootFamily.Children.ForEach(childFamily =>
{
if (childFamily.Name.Contains(search))
{
// Your logic here
return;
}
SearchForChildren(childFamily);
});
// Recursion
public void SearchForChildren(Family childFamily)
{
childFamily.Children.ForEach(_childFamily =>
{
if (_childFamily.Name.Contains(search))
{
// Your logic here
return;
}
SearchForChildren(_childFamily);
});
}
I have tried two of the suggested codes and made the code a bit more clear:
public static IEnumerable<T> Flatten1<T>(this T source, Func<T, IEnumerable<T>> selector)
{
return selector(source).SelectMany(c => Flatten1(c, selector)).Concat(new[] { source });
}
public static IEnumerable<T> Flatten2<T>(this T source, Func<T, IEnumerable<T>> selector)
{
var stack = new Stack<T>();
stack.Push(source);
while (stack.Count > 0)
{
var current = stack.Pop();
yield return current;
foreach (var child in selector(current))
stack.Push(child);
}
}
Flatten2() seems to be a little bit faster but its a close run.
Some further variants on the answers of It'sNotALie., MarcinJuraszek and DamienG.
First, the former two give a counterintuitive ordering. To get a nice tree-traversal ordering to the results, just invert the concatenation (put the "source" first).
Second, if you are working with an expensive source like EF, and you want to limit entire branches, Damien's suggestion that you inject the predicate is a good one and can still be done with Linq.
Finally, for an expensive source it may also be good to pre-select the fields of interest from each node with an injected selector.
Putting all these together:
public static IEnumerable<R> Flatten<T,R>(this T source, Func<T, IEnumerable<T>> children
, Func<T, R> selector
, Func<T, bool> branchpredicate = null
) {
if (children == null) throw new ArgumentNullException("children");
if (selector == null) throw new ArgumentNullException("selector");
var pred = branchpredicate ?? (src => true);
if (children(source) == null) return new[] { selector(source) };
return new[] { selector(source) }
.Concat(children(source)
.Where(pred)
.SelectMany(c => Flatten(c, children, selector, pred)));
}

Native C# support for checking if an IEnumerable is sorted?

Is there any LINQ support for checking if an IEnumerable<T> is sorted? I have an enumerable that I want to verify is sorted in non-descending order, but I can't seem to find native support for it in C#.
I've written my own extension method using IComparables<T>:
public static bool IsSorted<T>(this IEnumerable<T> collection) where T : IComparable<T>
{
Contract.Requires(collection != null);
using (var enumerator = collection.GetEnumerator())
{
if (enumerator.MoveNext())
{
var previous = enumerator.Current;
while (enumerator.MoveNext())
{
var current = enumerator.Current;
if (previous.CompareTo(current) > 0)
return false;
previous = current;
}
}
}
return true;
}
And one using an IComparer<T> object:
public static bool IsSorted<T>(this IEnumerable<T> collection, IComparer<T> comparer)
{
Contract.Requires(collection != null);
using (var enumerator = collection.GetEnumerator())
{
if (enumerator.MoveNext())
{
var previous = enumerator.Current;
while (enumerator.MoveNext())
{
var current = enumerator.Current;
if (comparer.Compare(previous, current) > 0)
return false;
previous = current;
}
}
}
return true;
}
You can check if collection is IOrderedEnumerable but that will work only if ordering is the last operation which was applied to sequence. So, basically you need to check all sequence manually.
Also keep in mind, that if sequence is IOrderedEnumerable you really can't say which condition was used to sort sequence.
Here is generic method which you can use to check if sequence is sorted in ascending order by field you want to check:
public static bool IsOrdered<T, TKey>(
this IEnumerable<T> source, Func<T, TKey> keySelector)
{
if (source == null)
throw new ArgumentNullException("source");
var comparer = Comparer<TKey>.Default;
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
return true;
TKey current = keySelector(iterator.Current);
while (iterator.MoveNext())
{
TKey next = keySelector(iterator.Current);
if (comparer.Compare(current, next) > 0)
return false;
current = next;
}
}
return true;
}
Usage:
string[] source = { "a", "ab", "c" };
bool isOrdered = source.IsOrdered(s => s.Length);
You can create similar IsOrderedDescending method - just change checking comparison result to comparer.Compare(current, next) < 0.
There is no such built-in support.
Obviously if your IEnumerable<T> also implements IOrderedEnumerable<T> then you don't need to do an additional check, otherwise you'd have to implement an extension method like you did.
You might want to add a direction parameter or change its name to IsSortedAscending<T>, by the way. Also, there might be different properties in your T to sort on, so it would have to be obvious to you in some way what "sorted" means.
There is a short and simple version using Zip, although your IEnumerable does get enumerated twice.
var source = Enumerable.Range(1,100000);
bool isSorted = source.Zip(source.Skip(1),(a,b)=>b>=a).All(x=>x);
I often find usages for an extension method I've created called SelectPairs(), and also in this case:
/// <summary>
/// Projects two consecutive pair of items into tuples.
/// {1,2,3,4} -> {(1,2), (2,3), (3,4))
/// </summary>
public static IEnumerable<Tuple<T, T>> SelectPairs<T>(this IEnumerable<T> source)
{
return SelectPairs(source, (t1, t2) => new Tuple<T, T>(t1, t2));
}
/// <summary>
/// Projects two consecutive pair of items into a new form.
/// {1,2,3,4} -> {pairCreator(1,2), pairCreator(2,3), pairCreator(3,4))
/// </summary>
public static IEnumerable<TResult> SelectPairs<T, TResult>(
this IEnumerable<T> source, Func<T, T, TResult> pairCreator)
{
T lastItem = default(T);
bool isFirst = true;
foreach (T currentItem in source)
{
if (!isFirst)
{
yield return pairCreator(lastItem, currentItem);
}
isFirst = false;
lastItem = currentItem;
}
}
Use it like this:
bool isOrdered = myCollection
.SelectPairs()
.All(t => t.Item1.MyProperty < t.Item2.MyProperty);
This statement can of course be placed in another extension method:
public static bool IsOrdered<T>(
this IEnumerable<T> source, Func<T, T, int> comparer)
{
return source.SelectPairs().All(t => comparer(t.Item1, t.Item2) > 0);
}
bool isOrdered = myCollection
.IsOrdered((o1, o2) => o2.MyProperty - o1.MyProperty);
Here's an implementation that uses a predicate to select the value to order by.
public static bool IsOrdered<TKey, TValue>(this IEnumerable<TKey> list, Func<TKey, TValue> predicate) where TValue : IComparable
{
if (!list.Any()) return true;
var previous = predicate(list.First());
foreach(var entry in list.Skip(1))
{
var current = predicate(entry);
if (previous.CompareTo(current) > 0)
return false;
previous = current;
}
return true;
}

How to search Hierarchical Data with Linq

I need to search a tree for data that could be anywhere in the tree. How can this be done with linq?
class Program
{
static void Main(string[] args) {
var familyRoot = new Family() {Name = "FamilyRoot"};
var familyB = new Family() {Name = "FamilyB"};
familyRoot.Children.Add(familyB);
var familyC = new Family() {Name = "FamilyC"};
familyB.Children.Add(familyC);
var familyD = new Family() {Name = "FamilyD"};
familyC.Children.Add(familyD);
//There can be from 1 to n levels of families.
//Search all children, grandchildren, great grandchildren etc, for "FamilyD" and return the object.
}
}
public class Family {
public string Name { get; set; }
List<Family> _children = new List<Family>();
public List<Family> Children {
get { return _children; }
}
}
That's an extension to It'sNotALie.s answer.
public static class Linq
{
public static IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
{
return selector(source).SelectMany(c => Flatten(c, selector))
.Concat(new[] { source });
}
}
Sample test usage:
var result = familyRoot.Flatten(x => x.Children).FirstOrDefault(x => x.Name == "FamilyD");
Returns familyD object.
You can make it work on IEnumerable<T> source too:
public static IEnumerable<T> Flatten<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
{
return source.SelectMany(x => Flatten(x, selector))
.Concat(source);
}
Another solution without recursion...
var result = FamilyToEnumerable(familyRoot)
.Where(f => f.Name == "FamilyD");
IEnumerable<Family> FamilyToEnumerable(Family f)
{
Stack<Family> stack = new Stack<Family>();
stack.Push(f);
while (stack.Count > 0)
{
var family = stack.Pop();
yield return family;
foreach (var child in family.Children)
stack.Push(child);
}
}
Simple:
familyRoot.Flatten(f => f.Children);
//you can do whatever you want with that sequence there.
//for example you could use Where on it and find the specific families, etc.
IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
{
return selector(source).SelectMany(c => Flatten(selector(c), selector))
.Concat(new[]{source});
}
So, the simplest option is to write a function that traverses your hierarchy and produces a single sequence. This then goes at the start of your LINQ operations, e.g.
IEnumerable<T> Flatten<T>(this T source)
{
foreach(var item in source) {
yield item;
foreach(var child in Flatten(item.Children)
yield child;
}
}
To call simply: familyRoot.Flatten().Where(n => n.Name == "Bob");
A slight alternative would give you a way to quickly ignore a whole branch:
IEnumerable<T> Flatten<T>(this T source, Func<T, bool> predicate)
{
foreach(var item in source) {
if (predicate(item)) {
yield item;
foreach(var child in Flatten(item.Children)
yield child;
}
}
Then you could do things like: family.Flatten(n => n.Children.Count > 2).Where(...)
I like Kenneth Bo Christensen's answer using stack, it works great, it is easy to read and it is fast (and doesn't use recursion).
The only unpleasant thing is that it reverses the order of child items (because stack is FIFO). If sort order doesn't matter to you then it's ok.
If it does, sorting can be achieved easily using selector(current).Reverse() in the foreach loop (the rest of the code is the same as in Kenneth's original post)...
public static IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
{
var stack = new Stack<T>();
stack.Push(source);
while (stack.Count > 0)
{
var current = stack.Pop();
yield return current;
foreach (var child in selector(current).Reverse())
stack.Push(child);
}
}
Well, I guess the way is to go with the technique of working with hierarchical structures:
You need an anchor to make
You need the recursion part
// Anchor
rootFamily.Children.ForEach(childFamily =>
{
if (childFamily.Name.Contains(search))
{
// Your logic here
return;
}
SearchForChildren(childFamily);
});
// Recursion
public void SearchForChildren(Family childFamily)
{
childFamily.Children.ForEach(_childFamily =>
{
if (_childFamily.Name.Contains(search))
{
// Your logic here
return;
}
SearchForChildren(_childFamily);
});
}
I have tried two of the suggested codes and made the code a bit more clear:
public static IEnumerable<T> Flatten1<T>(this T source, Func<T, IEnumerable<T>> selector)
{
return selector(source).SelectMany(c => Flatten1(c, selector)).Concat(new[] { source });
}
public static IEnumerable<T> Flatten2<T>(this T source, Func<T, IEnumerable<T>> selector)
{
var stack = new Stack<T>();
stack.Push(source);
while (stack.Count > 0)
{
var current = stack.Pop();
yield return current;
foreach (var child in selector(current))
stack.Push(child);
}
}
Flatten2() seems to be a little bit faster but its a close run.
Some further variants on the answers of It'sNotALie., MarcinJuraszek and DamienG.
First, the former two give a counterintuitive ordering. To get a nice tree-traversal ordering to the results, just invert the concatenation (put the "source" first).
Second, if you are working with an expensive source like EF, and you want to limit entire branches, Damien's suggestion that you inject the predicate is a good one and can still be done with Linq.
Finally, for an expensive source it may also be good to pre-select the fields of interest from each node with an injected selector.
Putting all these together:
public static IEnumerable<R> Flatten<T,R>(this T source, Func<T, IEnumerable<T>> children
, Func<T, R> selector
, Func<T, bool> branchpredicate = null
) {
if (children == null) throw new ArgumentNullException("children");
if (selector == null) throw new ArgumentNullException("selector");
var pred = branchpredicate ?? (src => true);
if (children(source) == null) return new[] { selector(source) };
return new[] { selector(source) }
.Concat(children(source)
.Where(pred)
.SelectMany(c => Flatten(c, children, selector, pred)));
}

Remove doubles from structs array

I know 2 ways to remove doubles from an array of objects that support explicit comparing:
Using HashSet constructor and
Using LINQ's Distinct().
How to remove doubles from array of structs, comparing array members by a single field only? In other words, how to write the predicate, that could be used by Distinct().
Regards,
Well, you could implement IEqualityComparer<T> to pick out that field and use that for equality testing and hashing... or you could use DistinctBy which is in MoreLINQ.
Of course, you don't have to take a dependency on MoreLINQ really - you can implement it very simply:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
// TODO: Implement null argument checking :)
HashSet<TKey> keys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
I would probably just loop:
var values = new HashSet<FieldType>();
var newList = new List<ItemType>();
foreach(var item in oldList) {
if(hash.Add(item.TheField)) newList.Add(item);
}
The LINQ answer has been published before. I am copying from Richard Szalay's answer here: Filtering duplicates out of an IEnumerable
public static class EnumerationExtensions
{
public static IEnumerable<TSource> Distinct<TSource,TKey>(
this IEnumerable<TSource> source, Func<TSource,TKey> keySelector)
{
KeyComparer comparer = new KeyComparer(keySelector);
return source.Distinct(comparer);
}
private class KeyComparer<TSource,TKey> : IEqualityComparer<TSource>
{
private Func<TSource,TKey> keySelector;
public DelegatedComparer(Func<TSource,TKey> keySelector)
{
this.keySelector = keySelector;
}
bool IEqualityComparer.Equals(TSource a, TSource b)
{
if (a == null && b == null) return true;
if (a == null || b == null) return false;
return keySelector(a) == keySelector(b);
}
int IEqualityComparer.GetHashCode(TSource obj)
{
return keySelector(obj).GetHashCode();
}
}
}
Which, as Richard says, is used like this:
var distinct = arr.Distinct(x => x.Name);
implement a custom IEqualityComparer<T>
public class MyStructComparer : IEqualityComparer<MyStruct>
{
public bool Equals(MyStruct x, MyStruct y)
{
return x.MyVal.Equals(y.MyVal);
}
public int GetHashCode(MyStruct obj)
{
return obj.MyVal.GetHashCode();
}
}
then
var distincts = myStructList.Distinct(new MyStructComparer());

"better" (simpler, faster, whatever) version of 'distinct by delegate'?

I hate posting this since it's somewhat subjective, but it feels like there's a better method to do this that I'm just not thinking of.
Sometimes I want to 'distinct' a collection by certain columns/properties but without throwing away other columns (yes, this does lose information, as it becomes arbitrary which values of those other columns you'll end up with).
Note that this extension is less powerful than the Distinct overloads that take an IEqualityComparer<T> since such things could do much more complex comparison logic, but this is all I need for now :)
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> getKeyFunc)
{
return from s in source
group s by getKeyFunc(s) into sourceGroups
select sourceGroups.First();
}
Example usage:
var items = new[]
{
new { A = 1, B = "foo", C = Guid.NewGuid(), },
new { A = 2, B = "foo", C = Guid.NewGuid(), },
new { A = 1, B = "bar", C = Guid.NewGuid(), },
new { A = 2, B = "bar", C = Guid.NewGuid(), },
};
var itemsByA = items.DistinctBy(item => item.A).ToList();
var itemsByB = items.DistinctBy(item => item.B).ToList();
Here you go. I don't think this is massively more efficient than your own version, but it should have a slight edge. It only requires a single pass through the sequence, yielding each item as it goes, rather than needing to group the entire sequence first.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return source.DistinctBy(keySelector, null);
}
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
{
if (source == null)
throw new ArgumentNullException("source");
if (keySelector == null)
throw new ArgumentNullException("keySelector");
return source.DistinctByIterator(keySelector, keyComparer);
}
private static IEnumerable<TSource> DistinctByIterator<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
{
var keys = new HashSet<TKey>(keyComparer);
foreach (TSource item in source)
{
if (keys.Add(keySelector(item)))
yield return item;
}
}
I've previously written a generic Func => IEqualityComparer utility class just for the purpose of being able to call overloads of LINQ methods that accept an IEqualityComparer with having to write a custom class each time.
It uses a delegate (just like your example) to supply the comparison semantics. This allows me to use the built-in implementations of the library methods rather than rolling my own - which I presume are more likely to be correct and efficiently implemented.
public static class ComparerExt
{
private class GenericEqualityComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> m_CompareFunc;
public GenericEqualityComparer( Func<T,T,bool> compareFunc ) {
m_CompareFunc = compareFunc;
}
public bool Equals(T x, T y) {
return m_CompareFunc(x, y);
}
public int GetHashCode(T obj) {
return obj.GetHashCode(); // don't override hashing semantics
}
}
public static IComparer<T> Compare<T>( Func<T,T,bool> compareFunc ) {
return new GenericEqualityComparer<T>(compareFunc);
}
}
You can use this as so:
var result = list.Distinct( ComparerExt.Compare( (a,b) => { /*whatever*/ } );
I also often throw in a Reverse() method to allow for changing the ordering of operands in the comparison, like so:
private class GenericComparer<T> : IComparer<T>
{
private readonly Func<T, T, int> m_CompareFunc;
public GenericComparer( Func<T,T,int> compareFunc ) {
m_CompareFunc = compareFunc;
}
public int Compare(T x, T y) {
return m_CompareFunc(x, y);
}
}
public static IComparer<T> Reverse<T>( this IComparer<T> comparer )
{
return new GenericComparer<T>((a, b) => comparer.Compare(b, a));
}

Categories