I am playing with LINQ to learn about it, but I can't figure out how to use Distinct when I do not have a simple list (a simple list of integers is pretty easy to do, this is not the question). What if I want to use Distinct on a List<TElement> on one or more properties of the TElement?
Example: If an object is Person, with property Id. How can I get all Person and use Distinct on them with the property Id of the object?
Person1: Id=1, Name="Test1"
Person2: Id=1, Name="Test1"
Person3: Id=2, Name="Test2"
How can I get just Person1 and Person3? Is that possible?
If it's not possible with LINQ, what would be the best way to have a list of Person depending on some of its properties?
What if I want to obtain a distinct list based on one or more properties?
Simple! You want to group them and pick a winner out of the group.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.First())
.ToList();
If you want to define groups on multiple properties, here's how:
List<Person> distinctPeople = allPeople
.GroupBy(p => new {p.PersonId, p.FavoriteColor} )
.Select(g => g.First())
.ToList();
Note: Certain query providers are unable to resolve that each group must have at least one element, and that First is the appropriate method to call in that situation. If you find yourself working with such a query provider, FirstOrDefault may help get your query through the query provider.
Note2: Consider this answer for an EF Core (prior to EF Core 6) compatible approach. https://stackoverflow.com/a/66529949/8155
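As a minimal end-to-end sketch of the GroupBy approach above, using the three people from the question: the Person class below is an assumption based on the question's Id and Name properties (the answer's snippets use PersonId instead), and DistinctById is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Person, taken from the question (Id and Name).
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class GroupByDistinctDemo
{
    // Hypothetical helper: keeps the first Person encountered for each Id.
    public static List<Person> DistinctById(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 2, Name = "Test2" },
        };

        // LINQ to Objects yields groups in order of first appearance,
        // so this prints "1: Test1" and then "2: Test2".
        foreach (var p in DistinctById(people))
        {
            Console.WriteLine($"{p.Id}: {p.Name}");
        }
    }
}
```

With LINQ to Objects the first element of each group is the first one seen in the source, so Person1 and Person3 are the ones kept. Other query providers may not guarantee that ordering.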
EDIT: This is now part of MoreLINQ.
What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (seenKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
So to find the distinct values using just the Id property, you could use:
var query = people.DistinctBy(p => p.Id);
And to use multiple properties, you can use anonymous types, which implement equality appropriately:
var query = people.DistinctBy(p => new { p.Id, p.Name });
Untested, but it should work (and it now at least compiles).
It assumes the default comparer for the keys though - if you want to pass in an equality comparer, just pass it on to the HashSet constructor.
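A sketch of the comparer-accepting variant described in the last sentence might look like the following. The method name DistinctByKey is hypothetical and was chosen only to avoid an ambiguous call against the built-in DistinctBy on .NET 6 and later, or against MoreLINQ's.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeyExtensions
{
    // Hypothetical name; same idea as DistinctBy above, plus a key comparer.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // Passing null here makes HashSet fall back to the default comparer.
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            // Add returns false when an equal key has already been seen.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class DistinctByKeyDemo
{
    public static void Main()
    {
        var names = new[] { "alice", "Alice", "BOB", "bob" };

        // Case-insensitive keys: prints "alice" and then "BOB".
        foreach (var n in names.DistinctByKey(s => s, StringComparer.OrdinalIgnoreCase))
        {
            Console.WriteLine(n);
        }
    }
}
```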
Use:
List<Person> pList = new List<Person>();
/* Fill list */
var result = pList.Where(p => p.Name != null).GroupBy(p => p.Id)
.Select(grp => grp.FirstOrDefault());
The where helps you filter the entries (could be more complex) and the groupby and select perform the distinct function.
You could also use query syntax if you want it to look all LINQ-like:
var uniquePeople = from p in people
group p by new {p.ID} //or group by new {p.ID, p.Name, p.Whatever}
into mygroup
select mygroup.FirstOrDefault();
If you only need the distinct values of a field rather than the whole objects, I think this is enough:
list.Select(s => s.MyField).Distinct();
Solution: first group by your fields, then select the first item of each group with FirstOrDefault.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.FirstOrDefault())
.ToList();
Starting with .NET 6, there is a new solution using the built-in DistinctBy() extension method in LINQ, so we can do:
var distinctPersonsById = personList.DistinctBy(x => x.Id);
The signature of the DistinctBy method:
// Returns distinct elements from a sequence according to a specified
// key selector function.
public static IEnumerable<TSource> DistinctBy<TSource, TKey> (
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector);
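As a rough illustration (assuming .NET 6 or later), the built-in method can also be keyed on several properties at once by returning a tuple, and it has an overload that takes an IEqualityComparer<TKey>. The Person class and the sample data below are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Person for this sketch.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class BuiltInDistinctByDemo
{
    public static List<Person> SamplePeople()
    {
        return new List<Person>
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "test1" },
            new Person { Id = 2, Name = "Test2" },
        };
    }

    public static void Main()
    {
        var personList = SamplePeople();

        // One property as the key.
        var byId = personList.DistinctBy(x => x.Id).ToList();

        // Several properties: value tuples compare element by element.
        var byIdAndName = personList.DistinctBy(x => (x.Id, x.Name)).ToList();

        // Overload that takes a comparer for the key.
        var byNameIgnoringCase = personList
            .DistinctBy(x => x.Name, StringComparer.OrdinalIgnoreCase)
            .ToList();

        // Prints "2 3 2".
        Console.WriteLine($"{byId.Count} {byIdAndName.Count} {byNameIgnoringCase.Count}");
    }
}
```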
You can do this with the standard Linq.ToLookup(). This will create a collection of values for each unique key. Just select the first item in each collection:
Persons.ToLookup(p => p.Id).Select(coll => coll.First());
The following code is functionally equivalent to Jon Skeet's answer.
Tested on .NET 4.5, should work on any earlier version of LINQ.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
return source.Where(element => seenKeys.Add(keySelector(element)));
}
Incidentally, check out Jon Skeet's latest version of DistinctBy.cs on Google Code.
Update 2022-04-03
Based on a comment by Andrew McClement, it is best to take Jon Skeet's answer over this one.
I've written an article that explains how to extend the Distinct function so that you can do as follows:
var people = new List<Person>();
people.Add(new Person(1, "a", "b"));
people.Add(new Person(2, "c", "d"));
people.Add(new Person(1, "a", "b"));
foreach (var person in people.Distinct(p => p.ID))
{
// Do stuff with unique list here.
}
Here's the article (now in the Web Archive): Extending LINQ - Specifying a Property in the Distinct Function
Personally I use the following class:
public class LambdaEqualityComparer<TSource, TDest> :
IEqualityComparer<TSource>
{
private Func<TSource, TDest> _selector;
public LambdaEqualityComparer(Func<TSource, TDest> selector)
{
_selector = selector;
}
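public bool Equals(TSource obj, TSource other)
{
// EqualityComparer<TDest>.Default copes with null keys, unlike calling Equals on the key directly.
return EqualityComparer<TDest>.Default.Equals(_selector(obj), _selector(other));
}
public int GetHashCode(TSource obj)
{
return EqualityComparer<TDest>.Default.GetHashCode(_selector(obj));
}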
}
Then, an extension method:
public static IEnumerable<TSource> Distinct<TSource, TCompare>(
this IEnumerable<TSource> source, Func<TSource, TCompare> selector)
{
return source.Distinct(new LambdaEqualityComparer<TSource, TCompare>(selector));
}
Finally, the intended usage:
var dates = new List<DateTime>() { /* ... */ };
var distinctYears = dates.Distinct(date => date.Year);
The advantage I found using this approach is the re-usage of LambdaEqualityComparer class for other methods that accept an IEqualityComparer. (Oh, and I leave the yield stuff to the original LINQ implementation...)
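To illustrate the reuse point, here is a small sketch that passes the same kind of comparer to two other APIs that accept an IEqualityComparer<T>: the HashSet<T> constructor and Enumerable.Except. The comparer class is re-declared here only so the example is self-contained (with TKey in place of TDest), and the sample dates are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Re-declared for self-containment; same idea as the class above.
public class LambdaEqualityComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> _selector;

    public LambdaEqualityComparer(Func<TSource, TKey> selector)
    {
        _selector = selector;
    }

    public bool Equals(TSource x, TSource y)
    {
        return EqualityComparer<TKey>.Default.Equals(_selector(x), _selector(y));
    }

    public int GetHashCode(TSource obj)
    {
        return EqualityComparer<TKey>.Default.GetHashCode(_selector(obj));
    }
}

public static class ComparerReuseDemo
{
    public static void Main()
    {
        var byYear = new LambdaEqualityComparer<DateTime, int>(d => d.Year);
        var dates = new List<DateTime>
        {
            new DateTime(2020, 1, 1),
            new DateTime(2020, 6, 15),
            new DateTime(2021, 3, 1),
        };

        // HashSet keeps one date per year: 2 entries.
        var oneDatePerYear = new HashSet<DateTime>(dates, byYear);

        // Except drops every date whose year appears in the second sequence
        // and, being a set operation, also de-duplicates: 1 entry (2020-01-01).
        var notIn2021 = dates.Except(new[] { new DateTime(2021, 12, 31) }, byYear).ToList();

        // Prints "2 1".
        Console.WriteLine($"{oneDatePerYear.Count} {notIn2021.Count}");
    }
}
```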
You can use DistinctBy() for getting Distinct records by an object property. Just add the following statement before using it:
using Microsoft.Ajax.Utilities;
and then use it like following:
var listToReturn = responseList.DistinctBy(x => x.Index).ToList();
where 'Index' is the property on which I want the data to be distinct.
You can do it (albeit not lightning-quickly) like so:
people.Where(p => !people.Any(q => (p != q && p.Id == q.Id)));
That is, "select all people where there isn't another different person in the list with the same ID."
Mind you, in your example, that would just select person 3. I'm not sure how to tell which you want, out of the previous two.
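A quick sketch to confirm that behaviour with the question's data (the Person class and the helper name WithoutAnyDuplicate are assumptions for this example). Note that p != q is a reference comparison here, and that the nested Any makes the query quadratic in the size of the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Person, taken from the question.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class NoOtherWithSameIdDemo
{
    // Hypothetical helper: keeps only people whose Id occurs exactly once.
    public static List<Person> WithoutAnyDuplicate(List<Person> people)
    {
        return people
            .Where(p => !people.Any(q => p != q && p.Id == q.Id))
            .ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 2, Name = "Test2" },
        };

        // Both people with Id 1 are dropped, so this prints only "2: Test2".
        foreach (var p in WithoutAnyDuplicate(people))
        {
            Console.WriteLine($"{p.Id}: {p.Name}");
        }
    }
}
```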
In case you need a Distinct method on multiple properties, you can check out my PowerfulExtensions library. It is still at a very early stage, but you can already use methods like Distinct, Union, Intersect and Except on any number of properties.
This is how you use it:
using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);
When we faced such a task in our project we defined a small API to compose comparators.
So, the use case was like this:
var wordComparer = KeyEqualityComparer.Null<Word>().
ThenBy(item => item.Text).
ThenBy(item => item.LangID);
...
source.Select(...).Distinct(wordComparer);
And API itself looks like this:
using System;
using System.Collections;
using System.Collections.Generic;
public static class KeyEqualityComparer
{
public static IEqualityComparer<T> Null<T>()
{
return null;
}
public static IEqualityComparer<T> EqualityComparerBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc);
}
public static KeyEqualityComparer<T, K> ThenBy<T, K>(
this IEqualityComparer<T> equalityComparer,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc, equalityComparer);
}
}
public struct KeyEqualityComparer<T, K>: IEqualityComparer<T>
{
public KeyEqualityComparer(
Func<T, K> keyFunc,
IEqualityComparer<T> equalityComparer = null)
{
KeyFunc = keyFunc;
EqualityComparer = equalityComparer;
}
public bool Equals(T x, T y)
{
return ((EqualityComparer == null) || EqualityComparer.Equals(x, y)) &&
EqualityComparer<K>.Default.Equals(KeyFunc(x), KeyFunc(y));
}
public int GetHashCode(T obj)
{
var hash = EqualityComparer<K>.Default.GetHashCode(KeyFunc(obj));
if (EqualityComparer != null)
{
var hash2 = EqualityComparer.GetHashCode(obj);
hash ^= (hash2 << 5) + hash2;
}
return hash;
}
public readonly Func<T, K> KeyFunc;
public readonly IEqualityComparer<T> EqualityComparer;
}
More details are on our site: IEqualityComparer in LINQ.
If you don't want to add the MoreLinq library to your project just to get the DistinctBy functionality then you can get the same end result using the overload of Linq's Distinct method that takes in an IEqualityComparer argument.
You begin by creating a generic custom equality comparer class that uses lambda syntax to perform custom comparison of two instances of a generic class:
public class CustomEqualityComparer<T> : IEqualityComparer<T>
{
Func<T, T, bool> _comparison;
Func<T, int> _hashCodeFactory;
public CustomEqualityComparer(Func<T, T, bool> comparison, Func<T, int> hashCodeFactory)
{
_comparison = comparison;
_hashCodeFactory = hashCodeFactory;
}
public bool Equals(T x, T y)
{
return _comparison(x, y);
}
public int GetHashCode(T obj)
{
return _hashCodeFactory(obj);
}
}
Then in your main code you use it like so:
Func<Person, Person, bool> areEqual = (p1, p2) => int.Equals(p1.Id, p2.Id);
Func<Person, int> getHashCode = (p) => p.Id.GetHashCode();
var query = people.Distinct(new CustomEqualityComparer<Person>(areEqual, getHashCode));
Voila! :)
The above assumes the following:
Property Person.Id is of type int
The people collection does not contain any null elements
If the collection could contain nulls then simply rewrite the lambdas to check for null, e.g.:
Func<Person, Person, bool> areEqual = (p1, p2) =>
{
return (p1 != null && p2 != null) ? int.Equals(p1.Id, p2.Id) : false;
};
EDIT
This approach is similar to the one in Vladimir Nesterovsky's answer but simpler.
It is also similar to the one in Joel's answer but allows for complex comparison logic involving multiple properties.
However, if your objects can only ever differ by Id then another user gave the correct answer that all you need to do is override the default implementations of GetHashCode() and Equals() in your Person class and then just use the out-of-the-box Distinct() method of Linq to filter out any duplicates.
Override Equals(object obj) and GetHashCode() methods:
class Person
{
public int Id { get; set; }
public string Name { get; set; }
public override bool Equals(object obj)
{
// Pattern matching avoids an exception when obj is null or not a Person.
return obj is Person other && other.Id == Id;
// or:
// return obj is Person o && o.Id == Id && o.Name == Name;
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
and then just call:
List<Person> distinctList = new[] { person1, person2, person3 }.Distinct().ToList();
The best way to do this that will be compatible with other .NET versions is to override Equals and GetHashCode to handle this (see the Stack Overflow question "This code returns distinct values. However, what I want is to return a strongly typed collection as opposed to an anonymous type"), but if you need something that is generic throughout your code, the solutions in this article are great.
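List<Person> lst = new List<Person>();
// Distinct() only removes duplicates here if Player overrides Equals and GetHashCode.
var result1 = lst.OrderByDescending(a => a.ID).Select(a => new Player { ID = a.ID, Name = a.Name }).Distinct();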
You should be able to override Equals (together with GetHashCode) on Person so that it compares Person.Id. This ought to result in the behavior you're after.
If you use an older .NET version where the extension method is not built in, you can define your own extension method:
public static class EnumerableExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> enumerable, Func<T, TKey> keySelector)
{
return enumerable.GroupBy(keySelector).Select(grp => grp.First());
}
}
Example of usage:
var personsDist = persons.DistinctBy(item => item.Name);
Definitely not the most efficient, but for those who are looking for a short and simple answer:
list.Select(x => x.Id).Distinct().Select(x => list.First(y => x == y.Id)).ToList();
Please try the code below. It groups by Id and then takes the first item of each group.
var Item = GetAll().GroupBy(x => x.Id).Select(g => g.First()).ToList();
In (More)LINQ terms how do I do an ExceptBy involving different types?
For example, given the LeverPosting structure defined below, an IEnumerable<LeverPosting>, and an IEnumerable<string>, how do I find all the LeverPostings that "aren't in" the alreadyProcessedIds list?
class LeverPosting
{
public string Id {get; set;}
}
Normally you do something like:
IEnumerable<LeverPosting> postings = ...
IEnumerable<string> alreadyProcessedIds = ...
var processedIdSet = new HashSet<string>(alreadyProcessedIds);
var unprocessedPostings = postings.Where(x => !processedIdSet.Contains(x.Id));
Here's an extension method I came up with to do this:
public static class LinqExtensions
{
/// <summary>
/// Similar to MoreLinq's ExceptBy method but works on heterogeneous types.
/// </summary>
public static IEnumerable<TSource> ExceptBy<TSource, TOther, TKey>(this IEnumerable<TSource> sourceItems,
IEnumerable<TOther> otherItems, Func<TSource, TKey> sourceKeyFunc, Func<TOther, TKey> otherKeyFunc)
{
return from sourceItem in sourceItems
join otherItem in otherItems on sourceKeyFunc.Invoke(sourceItem) equals otherKeyFunc.Invoke(otherItem)
into gj
from subSourceItem in gj.DefaultIfEmpty() // left outer join
// Keep only items on the left that have no match on the right.
// Caveat: for a value-type TOther, a real match equal to default(TOther) is treated as "no match".
where EqualityComparer<TOther>.Default.Equals(subSourceItem, default(TOther))
select sourceItem;
}
}
Or another extension method, based on @xanatos's answer:
/// <summary>
/// Similar to MoreLinq's ExceptBy method but works on heterogeneous types.
/// </summary>
public static IEnumerable<TSource> ExceptBy2<TSource, TOther, TKey>(this IEnumerable<TSource> sourceItems,
IEnumerable<TOther> otherItems, Func<TSource, TKey> sourceKeyFunc, Func<TOther, TKey> otherKeyFunc)
{
var otherItemKeyHashset = otherItems
.Select(si => otherKeyFunc.Invoke(si))
.ToHashSet();
return sourceItems
.Where(oi => !otherItemKeyHashset.Contains(sourceKeyFunc.Invoke(oi)));
}
Usage:
public static IEnumerable<LeverPosting> ExceptAlreadyProcessed(this IEnumerable<LeverPosting> postings, IEnumerable<string> alreadyProcessedIds) =>
postings.ExceptBy(
alreadyProcessedIds,
posting => posting.Id,
alreadyProcessedId => alreadyProcessedId
);
Please let me know if there's a lib like MoreLinq that already does this.
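For what it's worth, on .NET 6 and later the built-in System.Linq ExceptBy already takes a second sequence of keys rather than of source items, so it covers this heterogeneous case without a custom extension. A minimal sketch, assuming .NET 6 (the sample Ids are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LeverPosting
{
    public string Id { get; set; }
}

public static class BuiltInExceptByDemo
{
    public static List<LeverPosting> ExceptAlreadyProcessed(
        IEnumerable<LeverPosting> postings,
        IEnumerable<string> alreadyProcessedIds)
    {
        // Enumerable.ExceptBy(first, secondKeys, keySelector): keeps elements of
        // `first` whose key is not found in `secondKeys`.
        return postings.ExceptBy(alreadyProcessedIds, posting => posting.Id).ToList();
    }

    public static void Main()
    {
        var postings = new[]
        {
            new LeverPosting { Id = "a" },
            new LeverPosting { Id = "b" },
            new LeverPosting { Id = "c" },
        };

        // Prints "a" and then "c".
        foreach (var posting in ExceptAlreadyProcessed(postings, new[] { "b" }))
        {
            Console.WriteLine(posting.Id);
        }
    }
}
```

One difference from the Where/Contains version: ExceptBy is a set operation, so it also drops later source items whose key repeats an earlier one.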
I would like to make a sorting extension method which will take a generic collection and sort it using one or more keys. The keys will be properties of the objects contained in the collection.
A sample LINQ query with 3 keys looks like this.
studentResults.OrderBy(x => x.CG).ThenBy(x => x.Student.Roll)
.ThenBy(x => x.Student.Name).ToList();
I have already found something which can do this with one key.
public static List<TSource> OrderByAsListOrNull<TSource, TKey>(
this ICollection<TSource> collection, Func<TSource,TKey> keySelector)
{
if (collection != null && collection.Count > 0) {
return collection
.OrderBy(x => keySelector(x))
.ToList();
}
return null;
}
I thought of using IEnumerable<Func<TSource, TKey> keySelector>, but I cannot call the function like that.
So, how may I implement a method of this kind?
In theory, you could build a multi-level sort extension which differentiates between the initial OrderBy and the subsequent ThenBys used as secondary and tertiary tiebreakers. Because the method takes multiple order functions, each of which could return a different type, you'll need to soften the projected type (I've used object, below).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public static class Extensions
{
public static IEnumerable<T> MyOrderBy<T>(
this IEnumerable<T> source,
params Func<T, object>[] orders)
{
Debug.Assert(orders.Length > 0);
var sortQuery = source.OrderBy(orders[0]);
foreach(var order in orders.Skip(1))
{
sortQuery = sortQuery.ThenBy(order);
}
return sortQuery;
}
}
public class Poco
{
public string Name {get; set;}
public int Number {get; set;}
}
void Main()
{
var items = new []{
new Poco{Name = "Zebra", Number = 99},
new Poco{Name = "Apple", Number = 123}};
foreach(var poco in items.MyOrderBy(i => i.Number, i => i.Name))
{
Console.WriteLine(poco.Name);
}
}
The problem with this (as with your original function) is that you'll probably want to order by descending at some point. Although for numeric keys this could be hacked by multiplying by -1, it's going to be really difficult to do this for an arbitrary type.
// Hack : Order a numeric descending
item => item.Number * -1
For me, I would just stay with Linq's sorting extensions, and not try to abstract them in any way!
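If you did want to pursue the abstraction anyway, one possible sketch is to pair each key selector with a direction flag instead of negating the key. The name OrderByKeys and the tuple-based parameter shape are hypothetical, and value tuples require C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultiKeySortExtensions
{
    // Hypothetical helper: each key carries its own direction flag.
    public static IEnumerable<T> OrderByKeys<T>(
        this IEnumerable<T> source,
        params (Func<T, object> Key, bool Descending)[] orders)
    {
        if (orders.Length == 0)
        {
            return source; // nothing to sort by
        }

        var (firstKey, firstDescending) = orders[0];
        IOrderedEnumerable<T> sorted = firstDescending
            ? source.OrderByDescending(firstKey)
            : source.OrderBy(firstKey);

        // Remaining keys act as tiebreakers, each in its own direction.
        foreach (var (key, descending) in orders.Skip(1))
        {
            sorted = descending ? sorted.ThenByDescending(key) : sorted.ThenBy(key);
        }

        return sorted;
    }
}

public class Poco
{
    public string Name { get; set; }
    public int Number { get; set; }
}

public static class MultiKeySortDemo
{
    public static void Main()
    {
        var items = new[]
        {
            new Poco { Name = "Zebra", Number = 99 },
            new Poco { Name = "Apple", Number = 123 },
            new Poco { Name = "Mango", Number = 99 },
        };

        Func<Poco, object> byNumber = p => p.Number;
        Func<Poco, object> byName = p => p.Name;

        // Number descending, then Name ascending: Apple, Mango, Zebra.
        foreach (var poco in items.OrderByKeys((byNumber, true), (byName, false)))
        {
            Console.WriteLine(poco.Name);
        }
    }
}
```

The keys are still boxed to object, so every key type must implement IComparable for the default comparer to work, and mixing key types within one selector will fail at run time.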
I am trying to compare instances of a type in a List by the values of their properties, and eliminate duplicates.
According to MSDN, GetHashCode() is one of the ways to compare two objects.
A hash code is intended for efficient insertion and lookup in
collections that are based on a hash table. A hash code is not a
permanent value
Considering that, I started writing my extension method as below:
public static class Linq
{
public static IEnumerable<T> DistinctObjects<T>(this IEnumerable<T> source)
{
List<T> newList = new List<T>();
foreach (var item in source)
{
if(newList.All(x => x.GetHashCode() != item.GetHashCode()))
newList.Add(item);
}
return newList;
}
}
This condition always gives me false, even though the data in the objects is the same.
newList.All(x => x.GetHashCode() != item.GetHashCode())
Finally I would like to use it like
MyDuplicateList.DistinctObjects().ToList();
If comparing all fields of the object is too much, I am okay to use it like,
MyDuplicateList.DistinctObjects(x => x.Id, x => x.Name).ToList();
Here I am telling compare only these two fields of those objects.
After reading your comments I would propose this solution:
public static IEnumerable<TSource> DistinctBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
HashSet<TResult> set = new HashSet<TResult>();
foreach(var item in source)
{
var selectedValue = selector(item);
if (set.Add(selectedValue))
yield return item;
}
}
Then you can use it like this:
var distinctedList = myList.DistinctBy(x => x.A);
or for multiple properties like that:
var distinctedList = myList.DistinctBy(x => new {x.A,x.B});
The advantage of this solution is you can exactly specify what properties should be used in distinction and you don't have to override Equals and GetHashCode for every object. You need to make sure that your properties can be compared.
You shouldn't need to create your own custom, generic method for this. Instead, provide a custom equality comparer for your data type:
var myDuplicates = myList.Distinct(new MyComparer());
Where you define a custom Comparer like this:
public class MyComparer : IEqualityComparer<Mine>
{
public bool Equals(Mine x, Mine y)
{
if (x == null && y == null) return true;
if (x == null || y == null) return false;
return x.Name == y.Name && x.Id == y.Id;
}
public int GetHashCode(Mine obj)
{
// A null Name hashes to 0 instead of throwing.
return (obj.Name?.GetHashCode() ?? 0) ^ obj.Id.GetHashCode();
}
}
Edit: I initially had incorrect code here. This should do what you want without you having to override Equals.
I have a class and list:
public class className
{
public string firstParam { get; set; }
public string secondParam { get; set; }
}
public static List<className> listName = new List<className>();
The list includes (for example):
Apple Banana
Corn Celery
Corn Celery
Corn Grapes
Raisins Pork
I am trying to edit the list (or create a new list) to get:
Apple Banana
Corn Celery
Corn Grapes
Raisins Pork
I have tried:
var listNoDupes = listName.Distinct();
And:
IEnumerable<className> listNoDupes = listName.Distinct();
But both return the list in the same condition as before, with duplicates.
You need to override/implement Equals() and GetHashCode(). Right now you are listing distinct instances, and they are correctly ALL distinct/unique from each other.
The problem you are running into is that the identity of the objects is not what you think. Your intuition is telling you that the identity is the combination of firstParam and secondParam. What is actually happening is that each instance of className has its own identity, which does not depend on its contents. You will need to override the methods inherited from System.Object, namely Equals and GetHashCode. Overriding Equals alone is not enough, because Distinct uses a hash-based set internally and only calls Equals on items whose hash codes match.
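A minimal sketch of such an override for the class in the question follows. The hash combination (17 and 31 multipliers) is just one common convention, not the only valid choice; any hash that agrees with Equals would do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class className
{
    public string firstParam { get; set; }
    public string secondParam { get; set; }

    // Two instances are equal when both parameters match.
    public override bool Equals(object obj)
    {
        return obj is className other
            && firstParam == other.firstParam
            && secondParam == other.secondParam;
    }

    // Must agree with Equals: equal objects return equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (firstParam?.GetHashCode() ?? 0);
            hash = hash * 31 + (secondParam?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class ValueEqualityDemo
{
    public static List<className> SampleList()
    {
        return new List<className>
        {
            new className { firstParam = "Apple", secondParam = "Banana" },
            new className { firstParam = "Corn", secondParam = "Celery" },
            new className { firstParam = "Corn", secondParam = "Celery" },
            new className { firstParam = "Corn", secondParam = "Grapes" },
            new className { firstParam = "Raisins", secondParam = "Pork" },
        };
    }

    public static void Main()
    {
        // With the overrides in place, plain Distinct() removes the repeated row.
        var listNoDupes = SampleList().Distinct().ToList();
        Console.WriteLine(listNoDupes.Count); // prints 4
    }
}
```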
If your class only contains those two fields, then instead of implementing Equals and GetHashCode you can also do:
var listNoDupes = listName.GroupBy(r => new { r.firstParam, r.secondParam })
.Select(grp => grp.First())
.ToList();
Or you can get an IEnumerable<T> back like:
IEnumerable<className> listNoDupes =
listName
.GroupBy(r => new { r.firstParam, r.secondParam })
.Select(grp => grp.First());
The code above groups on the properties firstParam and secondParam. grp.First() then returns a single item from each group, so you end up with one item per group (no duplicates).
There is a third possibility: use the overload of Distinct that takes an IEqualityComparer. Unfortunately, C# does not support creating anonymous, temporary implementations of interfaces, but we can create a helper class and an extension method:
public static class IEnumerableExtensions
{
public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
private Func<T, T, bool> comparer;
private Func<T, int> hash;
public LambdaEqualityComparer(Func<T, T, bool> comparer,
Func<T, int> hash)
{
this.comparer = comparer;
this.hash = hash;
}
public bool Equals(T x, T y)
{
return comparer(x, y);
}
public int GetHashCode(T x)
{
return hash(x);
}
}
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> elems,
Func<T, T, bool> comparer,
Func<T, int> hash)
{
return elems.Distinct(new LambdaEqualityComparer<T>(comparer, hash));
}
}
and then we can provide lambdas for Distinct method:
var filteredList = myList.Distinct((x, y) => x.firstParam == y.firstParam &&
x.secondParam == y.secondParam,
x => 17 * x.firstParam.GetHashCode() + x.secondParam.GetHashCode());
This allows you to de-duplicate objects in a single call, without implementing Equals and GetHashCode. If, for example, there is only one place in the project where you call such a Distinct, this extension is probably enough. If, on the other hand, the identity of className objects is a concept that spans many methods and classes, it is better to simply define Equals and GetHashCode.