I am playing with LINQ to learn about it, but I can't figure out how to use Distinct when I do not have a simple list (a simple list of integers is pretty easy to do, this is not the question). What if I want to use Distinct on a List<TElement> on one or more properties of the TElement?
Example: If an object is Person, with property Id. How can I get all Person and use Distinct on them with the property Id of the object?
Person1: Id=1, Name="Test1"
Person2: Id=1, Name="Test1"
Person3: Id=2, Name="Test2"
How can I get just Person1 and Person3? Is that possible?
If it's not possible with LINQ, what would be the best way to have a list of Person depending on some of its properties?
What if I want to obtain a distinct list based on one or more properties?
Simple! You want to group them and pick a winner out of the group.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.First())
.ToList();
If you want to define groups on multiple properties, here's how:
List<Person> distinctPeople = allPeople
.GroupBy(p => new {p.PersonId, p.FavoriteColor} )
.Select(g => g.First())
.ToList();
Note: Certain query providers are unable to resolve that each group must have at least one element, and that First is the appropriate method to call in that situation. If you find yourself working with such a query provider, FirstOrDefault may help get your query through the query provider.
Note2: Consider this answer for an EF Core (prior to EF Core 6) compatible approach. https://stackoverflow.com/a/66529949/8155
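As a minimal sketch of the grouping approach applied to the three people from the question (this uses the question's Id property rather than PersonId; the Person class and the DistinctById helper are assumptions made for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class GroupByDistinctDemo
{
    // Hypothetical helper: keeps the first Person seen for each Id.
    public static List<Person> DistinctById(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 2, Name = "Test2" },
        };

        // Prints "1: Test1" and then "2: Test2".
        foreach (var p in DistinctById(people))
            Console.WriteLine(p.Id + ": " + p.Name);
    }
}
```

In LINQ to Objects, GroupBy yields groups in the order their keys first appear, so the "winner" here is the first element with each Id.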
EDIT: This is now part of MoreLINQ.
What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (seenKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
So to find the distinct values using just the Id property, you could use:
var query = people.DistinctBy(p => p.Id);
And to use multiple properties, you can use anonymous types, which implement equality appropriately:
var query = people.DistinctBy(p => new { p.Id, p.Name });
Untested, but it should work (and it now at least compiles).
It assumes the default comparer for the keys though - if you want to pass in an equality comparer, just pass it on to the HashSet constructor.
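A rough sketch of what passing the comparer through could look like. The method name DistinctByKey is hypothetical (chosen so it does not clash with other DistinctBy methods that may be in scope), and the case-insensitive example data is made up:

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeyExtensions
{
    // Hypothetical overload: same loop as above, but the caller supplies the key comparer.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // A null comparer makes HashSet fall back to EqualityComparer<TKey>.Default.
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var names = new[] { "alice", "ALICE", "Bob" };

        // Prints "alice" and then "Bob": "ALICE" has the same key under this comparer.
        foreach (var distinctName in names.DistinctByKey(name => name, StringComparer.OrdinalIgnoreCase))
            Console.WriteLine(distinctName);
    }
}
```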
Use:
List<Person> pList = new List<Person>();
/* Fill list */
var result = pList.Where(p => p.Name != null).GroupBy(p => p.Id)
.Select(grp => grp.FirstOrDefault());
The where helps you filter the entries (could be more complex) and the groupby and select perform the distinct function.
You could also use query syntax if you want it to look all LINQ-like:
var uniquePeople = from p in people
group p by new {p.ID} //or group by new {p.ID, p.Name, p.Whatever}
into mygroup
select mygroup.FirstOrDefault();
If you only need the distinct values of the field (rather than the whole objects), I think this is enough:
list.Select(s => s.MyField).Distinct();
Solution: first group by your fields, then select the FirstOrDefault item.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.FirstOrDefault())
.ToList();
Starting with .NET 6, there is a new solution using the new DistinctBy() extension in LINQ, so we can do:
var distinctPersonsById = personList.DistinctBy(x => x.Id);
The signature of the DistinctBy method:
// Returns distinct elements from a sequence according to a specified
// key selector function.
public static IEnumerable<TSource> DistinctBy<TSource, TKey> (
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector);
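A small usage sketch for the built-in method, assuming .NET 6 or later. The Person record and the sample data are made up for illustration; the tuple key is one way to make several properties distinct at once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for the example.
public record Person(int Id, string Name);

public static class DistinctByDemo
{
    public static List<Person> ById(IEnumerable<Person> people) =>
        people.DistinctBy(p => p.Id).ToList();

    // A tuple key compares component-wise, so this is distinct on (Id, Name).
    public static List<Person> ByIdAndName(IEnumerable<Person> people) =>
        people.DistinctBy(p => (p.Id, p.Name)).ToList();

    public static void Main()
    {
        var people = new[]
        {
            new Person(1, "Test1"),
            new Person(1, "Other"),
            new Person(2, "Test2"),
        };

        Console.WriteLine(ById(people).Count);        // 2
        Console.WriteLine(ByIdAndName(people).Count); // 3
    }
}
```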
You can do this with the standard Linq.ToLookup(). This will create a collection of values for each unique key. Just select the first item in each collection:
Persons.ToLookup(p => p.Id).Select(coll => coll.First());
The following code is functionally equivalent to Jon Skeet's answer.
Tested on .NET 4.5, should work on any earlier version of LINQ.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
return source.Where(element => seenKeys.Add(keySelector(element)));
}
Incidentally, check out Jon Skeet's latest version of DistinctBy.cs on Google Code.
Update 2022-04-03
Based on a comment by Andrew McClement, it is best to take Jon Skeet's answer over this one.
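One drawback of the Where-based version, and possibly what the comment refers to (that link is an assumption), is that the HashSet is created once per call rather than once per enumeration, so enumerating the same query twice gives different results. A sketch that shows the effect, using the hypothetical name DistinctByWhere:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedSetPitfall
{
    // Same shape as the Where-based version above (hypothetical name).
    public static IEnumerable<TSource> DistinctByWhere<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Created when the method is called, then shared by every enumeration.
        var seenKeys = new HashSet<TKey>();
        return source.Where(element => seenKeys.Add(keySelector(element)));
    }
}

public static class Program
{
    public static void Main()
    {
        var query = new[] { 1, 1, 2 }.DistinctByWhere(x => x);

        Console.WriteLine(query.Count()); // 2
        Console.WriteLine(query.Count()); // 0: every key is already in the set
    }
}
```

The iterator-block version does not have this problem, because its HashSet is created inside the iterator and therefore once per enumeration.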
I've written an article that explains how to extend the Distinct function so that you can do as follows:
var people = new List<Person>();
people.Add(new Person(1, "a", "b"));
people.Add(new Person(2, "c", "d"));
people.Add(new Person(1, "a", "b"));
foreach (var person in people.Distinct(p => p.ID))
{
// Do stuff with unique list here.
}
Here's the article (now in the Web Archive): Extending LINQ - Specifying a Property in the Distinct Function
Personally I use the following class:
public class LambdaEqualityComparer<TSource, TDest> :
IEqualityComparer<TSource>
{
private Func<TSource, TDest> _selector;
public LambdaEqualityComparer(Func<TSource, TDest> selector)
{
_selector = selector;
}
public bool Equals(TSource obj, TSource other)
{
return _selector(obj).Equals(_selector(other));
}
public int GetHashCode(TSource obj)
{
return _selector(obj).GetHashCode();
}
}
Then, an extension method:
public static IEnumerable<TSource> Distinct<TSource, TCompare>(
this IEnumerable<TSource> source, Func<TSource, TCompare> selector)
{
return source.Distinct(new LambdaEqualityComparer<TSource, TCompare>(selector));
}
Finally, the intended usage:
var dates = new List<DateTime>() { /* ... */ };
var distinctYears = dates.Distinct(date => date.Year);
The advantage I found using this approach is the re-usage of LambdaEqualityComparer class for other methods that accept an IEqualityComparer. (Oh, and I leave the yield stuff to the original LINQ implementation...)
You can use DistinctBy() for getting Distinct records by an object property. Just add the following statement before using it:
using Microsoft.Ajax.Utilities;
and then use it like following:
var listToReturn = responseList.DistinctBy(x => x.Index).ToList();
where 'Index' is the property on which I want the data to be distinct.
You can do it (albeit not lightning-quickly) like so:
people.Where(p => !people.Any(q => (p != q && p.Id == q.Id)));
That is, "select all people where there isn't another different person in the list with the same ID."
Mind you, in your example, that would just select person 3. I'm not sure how to tell which you want, out of the previous two.
In case you need a Distinct method on multiple properties, you can check out my PowerfulExtensions library. Currently it's in a very young stage, but already you can use methods like Distinct, Union, Intersect, Except on any number of properties.
This is how you use it:
using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);
When we faced such a task in our project we defined a small API to compose comparators.
So, the use case was like this:
var wordComparer = KeyEqualityComparer.Null<Word>().
ThenBy(item => item.Text).
ThenBy(item => item.LangID);
...
source.Select(...).Distinct(wordComparer);
And API itself looks like this:
using System;
using System.Collections;
using System.Collections.Generic;
public static class KeyEqualityComparer
{
public static IEqualityComparer<T> Null<T>()
{
return null;
}
public static IEqualityComparer<T> EqualityComparerBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc);
}
public static KeyEqualityComparer<T, K> ThenBy<T, K>(
this IEqualityComparer<T> equalityComparer,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc, equalityComparer);
}
}
public struct KeyEqualityComparer<T, K>: IEqualityComparer<T>
{
public KeyEqualityComparer(
Func<T, K> keyFunc,
IEqualityComparer<T> equalityComparer = null)
{
KeyFunc = keyFunc;
EqualityComparer = equalityComparer;
}
public bool Equals(T x, T y)
{
return ((EqualityComparer == null) || EqualityComparer.Equals(x, y)) &&
EqualityComparer<K>.Default.Equals(KeyFunc(x), KeyFunc(y));
}
public int GetHashCode(T obj)
{
var hash = EqualityComparer<K>.Default.GetHashCode(KeyFunc(obj));
if (EqualityComparer != null)
{
var hash2 = EqualityComparer.GetHashCode(obj);
hash ^= (hash2 << 5) + hash2;
}
return hash;
}
public readonly Func<T, K> KeyFunc;
public readonly IEqualityComparer<T> EqualityComparer;
}
More details are on our site: IEqualityComparer in LINQ.
If you don't want to add the MoreLinq library to your project just to get the DistinctBy functionality then you can get the same end result using the overload of Linq's Distinct method that takes in an IEqualityComparer argument.
You begin by creating a generic custom equality comparer class that uses lambda syntax to perform custom comparison of two instances of a generic class:
public class CustomEqualityComparer<T> : IEqualityComparer<T>
{
Func<T, T, bool> _comparison;
Func<T, int> _hashCodeFactory;
public CustomEqualityComparer(Func<T, T, bool> comparison, Func<T, int> hashCodeFactory)
{
_comparison = comparison;
_hashCodeFactory = hashCodeFactory;
}
public bool Equals(T x, T y)
{
return _comparison(x, y);
}
public int GetHashCode(T obj)
{
return _hashCodeFactory(obj);
}
}
Then in your main code you use it like so:
Func<Person, Person, bool> areEqual = (p1, p2) => int.Equals(p1.Id, p2.Id);
Func<Person, int> getHashCode = (p) => p.Id.GetHashCode();
var query = people.Distinct(new CustomEqualityComparer<Person>(areEqual, getHashCode));
Voila! :)
The above assumes the following:
Property Person.Id is of type int
The people collection does not contain any null elements
If the collection could contain nulls then simply rewrite the lambdas to check for null, e.g.:
Func<Person, Person, bool> areEqual = (p1, p2) =>
{
return (p1 != null && p2 != null) ? int.Equals(p1.Id, p2.Id) : false;
};
EDIT
This approach is similar to the one in Vladimir Nesterovsky's answer but simpler.
It is also similar to the one in Joel's answer but allows for complex comparison logic involving multiple properties.
However, if your objects can only ever differ by Id then another user gave the correct answer that all you need to do is override the default implementations of GetHashCode() and Equals() in your Person class and then just use the out-of-the-box Distinct() method of Linq to filter out any duplicates.
Override Equals(object obj) and GetHashCode() methods:
class Person
{
public int Id { get; set; }
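public string Name { get; set; }
public override bool Equals(object obj)
{
// "as" instead of a cast, so null or a non-Person argument returns false.
var other = obj as Person;
return other != null && other.Id == Id;
// or:
// return other != null && other.Id == Id && other.Name == Name;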
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
and then just call:
List<Person> distinctList = new[] { person1, person2, person3 }.Distinct().ToList();
The best way to do this that will be compatible with other .NET versions is to override Equals and GetHashCode to handle this (see the Stack Overflow question "This code returns distinct values. However, what I want is to return a strongly typed collection as opposed to an anonymous type"), but if you need something that is generic throughout your code, the solutions in this article are great.
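List<Person> lst = new List<Person>();
// Distinct() only removes duplicates here if Player overrides Equals and GetHashCode.
var result1 = lst.OrderByDescending(a => a.ID).Select(a => new Player { ID = a.ID, Name = a.Name }).Distinct();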
You should be able to override Equals on Person to actually compare Person.Id (and override GetHashCode to match, since Distinct relies on hash codes). This ought to result in the behavior you're after.
If you use an older .NET version, where the extension method is not built in, then you may define your own extension method:
public static class EnumerableExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> enumerable, Func<T, TKey> keySelector)
{
return enumerable.GroupBy(keySelector).Select(grp => grp.First());
}
}
Example of usage:
var personsDist = persons.DistinctBy(item => item.Name);
Definitely not the most efficient, but for those who are looking for a short and simple answer:
list.Select(x => x.Id).Distinct().Select(x => list.First(y => x == y.Id)).ToList();
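Please try the code below. Note that it returns the groups themselves; add .Select(g => g.First()) before ToList() to get one item per Id.
var Item = GetAll().GroupBy(x => x.Id).ToList();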
Given an existing ICollection<T> instance (e.g. dest) what is the most efficient and readable way to add items from an IEnumerable<T>?
In my use case, I have some kind of utility method Collect(IEnumerable items) which returns a new ICollection with the elements from items, so I am doing it in the following way:
public static ICollection<T> Collect<T>(IEnumerable<T> items) where T:ICollection<T>
{
...
ICollection<T> dest = Activator.CreateInstance<T>();
items.Aggregate(dest, (acc, item) => { acc.Add(item); return acc; });
...
return dest;
}
Question: Is there any “better” way (more efficient or readable) of doing it?
UPDATE: I think the use of Aggregate() is quite fluent and not so inefficient as invoking ToList().ForEach(). But it does not look very readable. Since nobody else agrees with the use of Aggregate() I would like to read your reasons to NOT use Aggregate() for this purpose.
Just use Enumerable.Concat:
IEnumerable<YourType> result = dest.Concat(items);
If you want a List<T> as result use ToList:
List<YourType> result = dest.Concat(items).ToList();
// perhaps:
dest = result;
If dest is actually already a list and you want to modify it use AddRange:
dest.AddRange(items);
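The difference between the options above matters when dest must be modified in place: Concat returns a new lazy sequence and leaves dest untouched. A small sketch with made-up data (the Run helper is hypothetical and exists only to return the three counts):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatVersusAdd
{
    // Returns: dest.Count after Concat, combined count, dest.Count after adding.
    public static int[] Run()
    {
        ICollection<int> dest = new List<int> { 1, 2 };
        IEnumerable<int> items = new[] { 3, 4 };

        // Concat builds a new lazy sequence; dest itself is unchanged.
        IEnumerable<int> combined = dest.Concat(items);
        int destAfterConcat = dest.Count;
        int combinedCount = combined.Count();

        // Adding in a loop mutates dest in place.
        foreach (int item in items)
            dest.Add(item);

        return new[] { destAfterConcat, combinedCount, dest.Count };
    }

    public static void Main()
    {
        // Prints "2, 4, 4".
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```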
Update:
If you have to add items to an ICollection<T> method argument, you could use this extension:
public static void AddRange<T>(this ICollection<T> collection, IEnumerable<T> seq)
{
List<T> list = collection as List<T>;
if (list != null)
list.AddRange(seq);
else
{
foreach (T item in seq)
collection.Add(item);
}
}
// ...
public static void Foo<T>(ICollection<T> dest)
{
IEnumerable<T> items = ...
dest.AddRange(items);
}
Personally I'd go with @ckruczek's comment of a foreach loop:
foreach (var item in items)
dest.Add(item);
Simple, clean, and pretty much everybody immediately understands what it does.
If you do insist on some method call hiding the loop, then some people define a custom ForEach extension method for IEnumerable<T>, similar to what's defined for List<T>. The implementation is trivial:
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action) {
if (source == null) throw new ArgumentNullException(nameof(source));
if (action == null) throw new ArgumentNullException(nameof(action));
foreach (T item in source)
action(item);
}
Given that, you would be able to write
items.ForEach(dest.Add);
I don't see much benefit in it myself, but no drawbacks either.
We actually wrote an extension method for this (along with a bunch of other ICollection extension methods):
public static class CollectionExt
{
public static void AddRange<T>(this ICollection<T> collection, IEnumerable<T> source)
{
Contract.Requires(collection != null);
Contract.Requires(source != null);
foreach (T item in source)
{
collection.Add(item);
}
}
}
So we can just use AddRange() on an ICollection<T>:
ICollection<int> test = new List<int>();
test.AddRange(new [] {1, 2, 3});
Note: If you wanted to use List<T>.AddRange() if the underlying collection was of type List<T> you could implement the extension method like so:
public static void AddRange<T>(this ICollection<T> collection, IEnumerable<T> source)
{
var asList = collection as List<T>;
if (asList != null)
{
asList.AddRange(source);
}
else
{
foreach (T item in source)
{
collection.Add(item);
}
}
}
Most efficient:
foreach (T item in items) dest.Add(item);
Most readable (BUT inefficient because it is creating a throwaway list):
items.ToList().ForEach(dest.Add);
Less readable, but not so inefficient:
items.Aggregate(dest, (acc, item) => { acc.Add(item); return acc; });
items.ToList().ForEach(dest.Add);
If you don't want to create a new collection instance, then create an extension method.
public static class Extension
{
public static void AddRange<T>(this ICollection<T> source, IEnumerable<T> items)
{
if (items == null)
{
return;
}
foreach (T item in items)
{
source.Add(item);
}
}
}
Then you can edit your code like this:
ICollection<T> dest = ...;
IEnumerable<T> items = ...;
dest.AddRange(items);
I have an extension method that works on any class, but I want to call a special version if I am working on IEnumerable<T>.
For example:
public static class ExtensionMethods
{
public static dynamic Test<T>(this T source)
{
dynamic expandoObject = new System.Dynamic.ExpandoObject();
var dictionary = (IDictionary<string,object>)expandoObject;
dictionary["Test"] = source.ToString();
return dictionary;
}
public static IEnumerable<dynamic> Test<T>(this List<T> source)
{
var result = new List<dynamic>();
foreach(var r in source)
yield return r.Test();
}
public static IEnumerable<dynamic> Test<T>(this IEnumerable<T> source)
{
var result = new List<dynamic>();
foreach(var r in source)
yield return r.Test();
}
}
// Usage
public class X
{
string guid = Guid.NewGuid().ToString();
}
void Main()
{
List<X> list = new List<X>() { new X() };
list.Test().Dump(); // Correct but only works because there is an explicit overload for List<T>
var array = list.ToArray();
((IEnumerable<X>) array).Test().Dump(); // Correct
array.Test().Dump(); // Calls the wrong extension method
}
Is there any way I can get array.Test() to call the IEnumerable version without having to explicitly cast it?
Alternatively, if I give the extension methods different names, is there any way I can get a compiler error if I accidentally use the wrong one?
I think you are approaching this from the wrong direction. List<T> implements the IEnumerable<T> interface, so the compiler can have trouble deciding which method is the best one to invoke on a List<T>. What you could do instead is test inside the extension method whether the IEnumerable<T> is actually a list.
public static IEnumerable<dynamic> Test<T>(this IEnumerable<T> source)
{
if (source is List<T>) {
// here
}
var result = new List<dynamic>();
foreach(var r in source)
yield return r.Test();
}
You can specify T explicitly instead of relying on type inference; this tells the compiler which extension method to use. The code would look like this:
var array = list.ToArray();
array.Test<X>().Dump();
What happens is that an array is a valid argument for both method signatures:
public static dynamic Test<T>(this T source) { .. }
public static IEnumerable<dynamic> Test<T>(this IEnumerable<T> source) { .. }
In the first case the compiler can infer T as the array type itself (X[]), which is an exact match. The second signature needs a conversion from X[] to IEnumerable<X>, so overload resolution prefers the first one.
Add this extension method to explicitly catch all array types:
public static IEnumerable<dynamic> Test<T>(this T[] source)
{
var result = new List<dynamic>();
foreach(var r in source)
yield return r.Test();
}
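To see which overload the compiler picks in each case, here is a reduced sketch. The method name Which and its string return values are made up for illustration; only the parameter shapes mirror the Test overloads discussed above:

```csharp
using System;
using System.Collections.Generic;

public static class OverloadDemo
{
    // Each overload reports its own parameter shape.
    public static string Which<T>(this T source) => "T";
    public static string Which<T>(this IEnumerable<T> source) => "IEnumerable<T>";
    public static string Which<T>(this T[] source) => "T[]";
}

public static class Program
{
    public static void Main()
    {
        var list = new List<int> { 1 };
        int[] array = list.ToArray();

        // Exact match on T beats the conversion to IEnumerable<int>.
        Console.WriteLine(list.Which());                      // T

        // T[] and T both match exactly; T[] is the more specific parameter type.
        Console.WriteLine(array.Which());                     // T[]

        // With the static type IEnumerable<int>, IEnumerable<T> is more specific than T.
        Console.WriteLine(((IEnumerable<int>)array).Which()); // IEnumerable<T>
    }
}
```

Without the T[] overload, array.Which() would resolve to the plain T version, which is the behavior the question describes.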
I know when Linq's Any() extension is used to determine if an enumerable has at least one element it will only consume a single element. But how does that work actually? Does it have to cast all items in the enumerable first, or does it just cast them one at a time, starting with the first and stopping there?
Any() works on an IEnumerable<T>, so no cast is required. Its implementation is very simple: it iterates through the enumerable and stops as soon as it finds an element matching the specified criteria.
A simple implementation looks like:
public bool Any<T>(IEnumerable<T> list)
{
using (var enumerator = list.GetEnumerator())
{
return enumerator.MoveNext();
}
}
So no casting is required.
Code in the public static class Enumerable:
public static bool Any<TSource>(this IEnumerable<TSource> source) {
if(source==null) {
throw Error.ArgumentNull("source");
}
using(IEnumerator<TSource> enumerator=source.GetEnumerator()) {
if(enumerator.MoveNext()) {
return true;
}
}
return false;
}
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
if(source==null) {
throw Error.ArgumentNull("source");
}
if(predicate==null) {
throw Error.ArgumentNull("predicate");
}
foreach(TSource local in source) {
if(predicate(local)) {
return true;
}
}
return false;
}
There is no casting involved; the methods are generic.
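A quick way to observe this laziness is to count how many elements a sequence actually produces. The following is only a sketch: the Numbers iterator and the Produced counter are made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyDemo
{
    public static int Produced;

    // Lazy sequence that records how many elements were actually produced.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        Produced = 0;
        bool any = Numbers().Any();
        Console.WriteLine(any + " " + Produced);    // True 1

        Produced = 0;
        bool anyBig = Numbers().Any(n => n > 3);
        Console.WriteLine(anyBig + " " + Produced); // True 4
    }
}
```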
I have a collection:
List<Car> cars = new List<Car>();
Cars are uniquely identified by their property CarCode.
I have three cars in the collection, and two with identical CarCodes.
How can I use LINQ to convert this collection to Cars with unique CarCodes?
You can use grouping, and get the first car from each group:
List<Car> distinct =
cars
.GroupBy(car => car.CarCode)
.Select(g => g.First())
.ToList();
Use MoreLINQ, which has a DistinctBy method :)
IEnumerable<Car> distinctCars = cars.DistinctBy(car => car.CarCode);
(This is only for LINQ to Objects, mind you.)
Same approach as Guffa but as an extension method:
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
return items.GroupBy(property).Select(x => x.First());
}
Used as:
var uniqueCars = cars.DistinctBy(x => x.CarCode);
You can implement an IEqualityComparer and use that in your Distinct extension.
class CarEqualityComparer : IEqualityComparer<Car>
{
#region IEqualityComparer<Car> Members
public bool Equals(Car x, Car y)
{
return x.CarCode.Equals(y.CarCode);
}
public int GetHashCode(Car obj)
{
return obj.CarCode.GetHashCode();
}
#endregion
}
And then
var uniqueCars = cars.Distinct(new CarEqualityComparer());
Another extension method for Linq-to-Objects, without using GroupBy:
/// <summary>
/// Returns the set of items, made distinct by the selected value.
/// </summary>
/// <typeparam name="TSource">The type of the source.</typeparam>
/// <typeparam name="TResult">The type of the result.</typeparam>
/// <param name="source">The source collection.</param>
/// <param name="selector">A function that selects a value to determine unique results.</param>
/// <returns>IEnumerable{TSource}.</returns>
public static IEnumerable<TSource> Distinct<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
HashSet<TResult> set = new HashSet<TResult>();
foreach(var item in source)
{
var selectedValue = selector(item);
if (set.Add(selectedValue))
yield return item;
}
}
I think the best option in terms of performance (or in any terms) is to use Distinct with the IEqualityComparer interface.
However, implementing a new comparer for each class is cumbersome and produces boilerplate code.
So here is an extension method which produces a new IEqualityComparer on the fly for any class from a key-selector delegate.
Usage:
var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();
Extension Method Code
public static class LinqExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
return items.Distinct(comparer);
}
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
private Func<T, TKey> expr { get; set; }
public GeneralPropertyComparer (Func<T, TKey> expr)
{
this.expr = expr;
}
public bool Equals(T left, T right)
{
var leftProp = expr.Invoke(left);
var rightProp = expr.Invoke(right);
if (leftProp == null && rightProp == null)
return true;
else if (leftProp == null ^ rightProp == null)
return false;
else
return leftProp.Equals(rightProp);
}
public int GetHashCode(T obj)
{
var prop = expr.Invoke(obj);
return (prop==null)? 0:prop.GetHashCode();
}
}
You can't effectively use Distinct on a collection of objects (without additional work). I will explain why.
The documentation says:
It uses the default equality comparer, Default, to compare values.
For objects, that means it uses the default equality method to compare them (source), which relies on their hash code and Equals. Since your objects don't override the GetHashCode() and Equals() methods, the comparison falls back to reference equality, and because every instance has a different reference, no duplicates are removed.
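A short sketch of that behavior, using two hypothetical classes (PlainCar and KeyedCar) that differ only in whether they override Equals and GetHashCode:

```csharp
using System;
using System.Linq;

// No overrides: instances are compared by reference.
public class PlainCar
{
    public string CarCode { get; set; }
}

// Overrides: instances with the same CarCode are considered equal.
public class KeyedCar
{
    public string CarCode { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as KeyedCar;
        return other != null && other.CarCode == CarCode;
    }

    public override int GetHashCode()
    {
        return CarCode == null ? 0 : CarCode.GetHashCode();
    }
}

public static class Program
{
    public static void Main()
    {
        var plain = new[] { new PlainCar { CarCode = "A" }, new PlainCar { CarCode = "A" } };
        var keyed = new[] { new KeyedCar { CarCode = "A" }, new KeyedCar { CarCode = "A" } };

        Console.WriteLine(plain.Distinct().Count()); // 2
        Console.WriteLine(keyed.Distinct().Count()); // 1
    }
}
```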
Another way to accomplish the same thing...
List<Car> distinctBy = cars
.Select(car => car.CarCode)
.Distinct()
.Select(code => cars.First(car => car.CarCode == code))
.ToList();
It's possible to create an extension method to do this in a more generic way. It would be interesting if someone could evaluate the performance of this 'DistinctBy' against the GroupBy approach.
You can check out my PowerfulExtensions library. Currently it's in a very young stage, but already you can use methods like Distinct, Union, Intersect, Except on any number of properties.
This is how you use it:
using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);