I am playing with LINQ to learn about it, but I can't figure out how to use Distinct when I do not have a simple list (a simple list of integers is pretty easy to do; this is not the question). What if I want to use Distinct on a List<TElement> on one or more properties of the TElement?
Example: If an object is Person, with property Id. How can I get all Person and use Distinct on them with the property Id of the object?
Person1: Id=1, Name="Test1"
Person2: Id=1, Name="Test1"
Person3: Id=2, Name="Test2"
How can I get just Person1 and Person3? Is that possible?
If it's not possible with LINQ, what would be the best way to have a list of Person depending on some of its properties?
What if I want to obtain a distinct list based on one or more properties?
Simple! You want to group them and pick a winner out of the group.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.First())
.ToList();
If you want to define groups on multiple properties, here's how:
List<Person> distinctPeople = allPeople
.GroupBy(p => new {p.PersonId, p.FavoriteColor} )
.Select(g => g.First())
.ToList();
Note: Certain query providers are unable to resolve that each group must have at least one element, and that First is the appropriate method to call in that situation. If you find yourself working with such a query provider, FirstOrDefault may help get your query through the query provider.
Note2: Consider this answer for an EF Core (prior to EF Core 6) compatible approach. https://stackoverflow.com/a/66529949/8155
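To make the grouping approach concrete with the question's data, here is a minimal sketch. The Person class and the GroupByDistinctDemo helper are assumptions for illustration (the question uses Id where the snippets above use PersonId).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person type shaped like the question's example.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class GroupByDistinctDemo
{
    // Groups by Id and keeps the first Person encountered in each group.
    public static List<Person> DistinctById(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .ToList();
    }
}
```

Called with Person1, Person2 and Person3 from the question, this should give back two items: the first person with Id 1 and the person with Id 2.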
EDIT: This is now part of MoreLINQ.
What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (seenKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
So to find the distinct values using just the Id property, you could use:
var query = people.DistinctBy(p => p.Id);
And to use multiple properties, you can use anonymous types, which implement equality appropriately:
var query = people.DistinctBy(p => new { p.Id, p.Name });
Untested, but it should work (and it now at least compiles).
It assumes the default comparer for the keys though - if you want to pass in an equality comparer, just pass it on to the HashSet constructor.
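A rough sketch of that comparer variant follows. The method name DistinctByKey is an assumption, picked only so it does not collide with other DistinctBy implementations; the body is the same HashSet idea with the comparer handed to the HashSet constructor.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeyExtensions
{
    // Hypothetical name; same technique as DistinctBy above, plus an optional key comparer.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        // Validate eagerly, then defer the actual iteration.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterate(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> Iterate<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // A null comparer makes HashSet fall back to the default comparer for TKey.
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (var element in source)
        {
            // Add returns false when the key was already seen.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}
```

For example, words.DistinctByKey(w => w, StringComparer.OrdinalIgnoreCase) would treat "apple" and "Apple" as the same key and keep whichever came first.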
Use:
List<Person> pList = new List<Person>();
/* Fill list */
var result = pList.Where(p => p.Name != null).GroupBy(p => p.Id)
.Select(grp => grp.FirstOrDefault());
The where helps you filter the entries (could be more complex) and the groupby and select perform the distinct function.
You could also use query syntax if you want it to look all LINQ-like:
var uniquePeople = from p in people
group p by new {p.ID} //or group by new {p.ID, p.Name, p.Whatever}
into mygroup
select mygroup.FirstOrDefault();
I think it is enough:
list.Select(s => s.MyField).Distinct();
Solution: first group by your fields, then select the FirstOrDefault item of each group.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.FirstOrDefault())
.ToList();
Starting with .NET 6, there is a new solution using the new DistinctBy() extension in LINQ, so we can do:
var distinctPersonsById = personList.DistinctBy(x => x.Id);
The signature of the DistinctBy method:
// Returns distinct elements from a sequence according to a specified
// key selector function.
public static IEnumerable<TSource> DistinctBy<TSource, TKey> (
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector);
You can do this with the standard Linq.ToLookup(). This will create a collection of values for each unique key. Just select the first item in each collection:
Persons.ToLookup(p => p.Id).Select(coll => coll.First());
The following code is functionally equivalent to Jon Skeet's answer.
Tested on .NET 4.5, should work on any earlier version of LINQ.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
return source.Where(element => seenKeys.Add(keySelector(element)));
}
Incidentally, check out Jon Skeet's latest version of DistinctBy.cs on Google Code.
Update 2022-04-03
Based on a comment by Andrew McClement, it is best to take Jon Skeet's answer over this one.
I've written an article that explains how to extend the Distinct function so that you can do as follows:
var people = new List<Person>();
people.Add(new Person(1, "a", "b"));
people.Add(new Person(2, "c", "d"));
people.Add(new Person(1, "a", "b"));
foreach (var person in people.Distinct(p => p.ID))
{
// Do stuff with unique list here.
}
Here's the article (now in the Web Archive): Extending LINQ - Specifying a Property in the Distinct Function
Personally I use the following class:
public class LambdaEqualityComparer<TSource, TDest> :
IEqualityComparer<TSource>
{
private Func<TSource, TDest> _selector;
public LambdaEqualityComparer(Func<TSource, TDest> selector)
{
_selector = selector;
}
public bool Equals(TSource obj, TSource other)
{
return _selector(obj).Equals(_selector(other));
}
public int GetHashCode(TSource obj)
{
return _selector(obj).GetHashCode();
}
}
Then, an extension method:
public static IEnumerable<TSource> Distinct<TSource, TCompare>(
this IEnumerable<TSource> source, Func<TSource, TCompare> selector)
{
return source.Distinct(new LambdaEqualityComparer<TSource, TCompare>(selector));
}
Finally, the intended usage:
var dates = new List<DateTime>() { /* ... */ };
var distinctYears = dates.Distinct(date => date.Year);
The advantage I found using this approach is the re-usage of LambdaEqualityComparer class for other methods that accept an IEqualityComparer. (Oh, and I leave the yield stuff to the original LINQ implementation...)
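As an illustration of that reuse, here is a small sketch that hands the same comparer to a HashSet instead of Distinct. The comparer class is repeated so the snippet stands alone, and ComparerReuseDemo is a made-up helper name.

```csharp
using System;
using System.Collections.Generic;

// Same comparer as above, repeated so this snippet compiles on its own.
public class LambdaEqualityComparer<TSource, TDest> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TDest> _selector;

    public LambdaEqualityComparer(Func<TSource, TDest> selector)
    {
        _selector = selector;
    }

    public bool Equals(TSource obj, TSource other)
    {
        return _selector(obj).Equals(_selector(other));
    }

    public int GetHashCode(TSource obj)
    {
        return _selector(obj).GetHashCode();
    }
}

public static class ComparerReuseDemo
{
    // Hypothetical helper: counts how many different years occur in the dates.
    public static int CountDistinctYears(IEnumerable<DateTime> dates)
    {
        var byYear = new LambdaEqualityComparer<DateTime, int>(d => d.Year);
        // HashSet<T> accepts any IEqualityComparer<T>, so the comparer is reused as-is.
        var oneDatePerYear = new HashSet<DateTime>(dates, byYear);
        return oneDatePerYear.Count;
    }
}
```

Note that, as written, the comparer assumes the selector never returns null.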
You can use DistinctBy() for getting Distinct records by an object property. Just add the following statement before using it:
using Microsoft.Ajax.Utilities;
and then use it like following:
var listToReturn = responseList.DistinctBy(x => x.Index).ToList();
where 'Index' is the property on which I want the data to be distinct.
You can do it (albeit not lightning-quickly) like so:
people.Where(p => !people.Any(q => (p != q && p.Id == q.Id)));
That is, "select all people where there isn't another different person in the list with the same ID."
Mind you, in your example, that would just select person 3. I'm not sure how to tell which you want, out of the previous two.
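Here is a quick sketch of that behaviour on the question's data, assuming a simple Person class with Id and Name (both the class and the UniqueIdDemo helper are made up for the example).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person type shaped like the question's example.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class UniqueIdDemo
{
    // Keeps only people whose Id is not shared with any other instance in the list.
    // For each person the whole list is scanned again, hence "not lightning-quickly".
    public static List<Person> WithUnsharedIds(List<Person> people)
    {
        return people
            .Where(p => !people.Any(q => p != q && p.Id == q.Id))
            .ToList();
    }
}
```

With Person1, Person2 and Person3 from the question, only Person3 survives, because both Id = 1 entries rule each other out.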
In case you need a Distinct method on multiple properties, you can check out my PowerfulExtensions library. It is currently at a very early stage, but you can already use methods like Distinct, Union, Intersect and Except on any number of properties.
This is how you use it:
using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);
When we faced such a task in our project we defined a small API to compose comparators.
So, the use case was like this:
var wordComparer = KeyEqualityComparer.Null<Word>().
ThenBy(item => item.Text).
ThenBy(item => item.LangID);
...
source.Select(...).Distinct(wordComparer);
And API itself looks like this:
using System;
using System.Collections;
using System.Collections.Generic;
public static class KeyEqualityComparer
{
public static IEqualityComparer<T> Null<T>()
{
return null;
}
public static IEqualityComparer<T> EqualityComparerBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc);
}
public static KeyEqualityComparer<T, K> ThenBy<T, K>(
this IEqualityComparer<T> equalityComparer,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc, equalityComparer);
}
}
public struct KeyEqualityComparer<T, K>: IEqualityComparer<T>
{
public KeyEqualityComparer(
Func<T, K> keyFunc,
IEqualityComparer<T> equalityComparer = null)
{
KeyFunc = keyFunc;
EqualityComparer = equalityComparer;
}
public bool Equals(T x, T y)
{
return ((EqualityComparer == null) || EqualityComparer.Equals(x, y)) &&
EqualityComparer<K>.Default.Equals(KeyFunc(x), KeyFunc(y));
}
public int GetHashCode(T obj)
{
var hash = EqualityComparer<K>.Default.GetHashCode(KeyFunc(obj));
if (EqualityComparer != null)
{
var hash2 = EqualityComparer.GetHashCode(obj);
hash ^= (hash2 << 5) + hash2;
}
return hash;
}
public readonly Func<T, K> KeyFunc;
public readonly IEqualityComparer<T> EqualityComparer;
}
More details are on our site: IEqualityComparer in LINQ.
If you don't want to add the MoreLinq library to your project just to get the DistinctBy functionality then you can get the same end result using the overload of Linq's Distinct method that takes in an IEqualityComparer argument.
You begin by creating a generic custom equality comparer class that uses lambda syntax to perform custom comparison of two instances of a generic class:
public class CustomEqualityComparer<T> : IEqualityComparer<T>
{
Func<T, T, bool> _comparison;
Func<T, int> _hashCodeFactory;
public CustomEqualityComparer(Func<T, T, bool> comparison, Func<T, int> hashCodeFactory)
{
_comparison = comparison;
_hashCodeFactory = hashCodeFactory;
}
public bool Equals(T x, T y)
{
return _comparison(x, y);
}
public int GetHashCode(T obj)
{
return _hashCodeFactory(obj);
}
}
Then in your main code you use it like so:
Func<Person, Person, bool> areEqual = (p1, p2) => int.Equals(p1.Id, p2.Id);
Func<Person, int> getHashCode = (p) => p.Id.GetHashCode();
var query = people.Distinct(new CustomEqualityComparer<Person>(areEqual, getHashCode));
Voila! :)
The above assumes the following:
Property Person.Id is of type int
The people collection does not contain any null elements
If the collection could contain nulls then simply rewrite the lambdas to check for null, e.g.:
Func<Person, Person, bool> areEqual = (p1, p2) =>
{
return (p1 != null && p2 != null) ? int.Equals(p1.Id, p2.Id) : false;
};
EDIT
This approach is similar to the one in Vladimir Nesterovsky's answer but simpler.
It is also similar to the one in Joel's answer but allows for complex comparison logic involving multiple properties.
However, if your objects can only ever differ by Id then another user gave the correct answer that all you need to do is override the default implementations of GetHashCode() and Equals() in your Person class and then just use the out-of-the-box Distinct() method of Linq to filter out any duplicates.
Override Equals(object obj) and GetHashCode() methods:
class Person
{
public int Id { get; set; }
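public string Name { get; set; }
public override bool Equals(object obj)
{
// A safe cast avoids an exception when obj is null or not a Person.
var other = obj as Person;
return other != null && other.Id == Id;
// or:
// return other != null && other.Id == Id && other.Name == Name;
}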
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
and then just call:
List<Person> distinctList = new[] { person1, person2, person3 }.Distinct().ToList();
The best way to do this that will be compatible with other .NET versions is to override Equals and GetHashCode to handle this (see the Stack Overflow question "This code returns distinct values. However, what I want is to return a strongly typed collection as opposed to an anonymous type"), but if you need something that is generic throughout your code, the solutions in this article are great.
List<Person> lst = new List<Person>();
var result1 = lst.OrderByDescending(a => a.ID).Select(a => new Player { ID = a.ID, Name = a.Name }).Distinct();
You should be able to override Equals on person to actually do Equals on Person.id. This ought to result in the behavior you're after.
If you use an old .NET version where the extension method is not built in, you can define your own extension method:
public static class EnumerableExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> enumerable, Func<T, TKey> keySelector)
{
return enumerable.GroupBy(keySelector).Select(grp => grp.First());
}
}
Example of usage:
var personsDist = persons.DistinctBy(item => item.Name);
Definitely not the most efficient, but for those who are looking for a short and simple answer:
list.Select(x => x.Id).Distinct().Select(x => list.First(y => x == y.Id)).ToList();
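Please try the code below. GroupBy alone returns the groups, so take the first item of each group to get the distinct items.
var Item = GetAll().GroupBy(x => x.Id).Select(g => g.First()).ToList();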
I'm trying to convert the following method to an extension method on IEnumerable:
private static IEnumerable<TTarget> MapList<TSource, TTarget>(IEnumerable<TSource> source)
{
return source.Select(
element =>
_mapper.Map<TSource, TTarget>(element)
).ToList();
}
Right now it's called like this:
var sourceList = new List<SourceType>();
return MapList<SourceType, TargetType>(sourceList);
But I want to call it like this:
var sourceList = new List<SourceType>();
return sourceList.MapTo<TargetType>();
I have tried doing this:
public static IEnumerable<TTarget> MapTo<TTarget>(this IEnumerable<TSource> source)
{
return source.Select(
element =>
Mapper.Map<TSource, TTarget>(element)
).ToList();
}
But I get type or namespace TSource not found since it's not included in the method's type parameter list. I can make it work like this:
public static IEnumerable<TTarget> MapTo<TSource, TTarget>(this IEnumerable<TSource> source)
{
return source.Select(
element =>
Mapper.Map<TSource, TTarget>(element)
).ToList();
}
But then I have to call it like this:
var sourceList = new List<SourceType>();
sourceList.MapTo<SourceType, TargetType>();
Which I feel is not as clear as sourceList.MapTo<TargetType>().
Is there any way to do what I want?
There's not enough information in the call to fully determine the generic type parameters to pass to MapTo, and C# doesn't support inferring only some of the types. You either have to specify all the types or none of them.
However, you can get around this by redesigning your interface. Here's just one solution:
public sealed class Mappable<TSource>
{
private readonly IEnumerable<TSource> source;
public Mappable(IEnumerable<TSource> source)
{
this.source = source;
}
public IEnumerable<TTarget> To<TTarget>()
{
return source.Select(
element =>
Mapper.Map<TSource, TTarget>(element)
).ToList();
}
}
public static class Extensions
{
public static Mappable<TSource> Map<TSource>(this IEnumerable<TSource> source)
{
return new Mappable<TSource>(source);
}
}
And now you can call it like this:
var sourceList = new List<SourceType>();
var target = sourceList.Map().To<TargetType>();
Alternatively, if you give up on using extension methods, you can do it like this:
public static class MapTo<TTarget>
{
public static IEnumerable<TTarget> From<TSource>(IEnumerable<TSource> source)
{
return source.Select(
element =>
Mapper.Map<TSource, TTarget>(element)
).ToList();
}
}
And call it like this:
var sourceList = new List<SourceType>();
var target = MapTo<TargetType>.From(sourceList);
Neither of these is particularly elegant. It's up to you whether you prefer this syntax over fully specifying the generic parameters on each call.
The following code with a boolean parameter works pretty well:
public List<T> SearchByStatus(bool status, List<T> list)
{
return (List<T>)list.Where(_item => _item.Executed == status);
}
But if I want to use something like this
public List<T> SearchByCodeType(ECodes codeType, List<T> list)
{
return (List<T>)list.Where(_item => _item.CodeType == codeType);
}
the IDE throws an error saying Func<T, int, bool> doesn't accept 1 parameter.
I researched a bit and found for example this.
If I now add a second parameter, let's say
public List<T> SearchByCodeType(ECodes codeType, List<T> list)
{
return (List<T>)list.Where((_item, _index) => _item.CodeType == codeType);
}
it says Func<T, bool> doesn't accept 2 parameters.
The messages themselves are correct, but I don't get why it assumes I want to use the overloaded version of Where in the first case and the non-overloaded one in the second... Am I doing something wrong?
P.S.: The ECodes-type used is defined as
public enum ECodes : int
{
....
}
May that cause the issue?
Both of these should work fine:
public List<T> SearchByCodeType(ECodes codeType, List<T> list)
{
return list.Where((_item, _index) => _item.CodeType == codeType).ToList();
}
public List<T> SearchByCodeType(ECodes codeType, List<T> list)
{
return list.Where(_item => _item.CodeType == codeType).ToList();
}
If they don't - please check whether you have using System.Linq; at the top, and are using regular LINQ (not something obscure like LINQBridge).
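One detail worth spelling out: both versions above end in ToList() rather than the (List<T>) cast from the question. As far as I know, Where hands back a lazy iterator and not a List<T>, so the cast compiles but fails at run time with an InvalidCastException. A minimal sketch, using int for simplicity and a made-up WhereResultDemo class:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WhereResultDemo
{
    // Reports whether the object returned by Where is actually a List<int>.
    public static bool WhereReturnsList(List<int> numbers)
    {
        IEnumerable<int> filtered = numbers.Where(n => n > 1);
        return filtered is List<int>;
    }

    // Materialises the filtered sequence into a new list instead of casting.
    public static List<int> FilterToList(List<int> numbers)
    {
        return numbers.Where(n => n > 1).ToList();
    }
}
```

WhereReturnsList comes back false even though the source is a List<int>, which is why ToList() (or FindAll) is the way to get a list out.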
You could also use:
public List<T> SearchByCodeType(ECodes codeType, List<T> list)
{
return list.FindAll(_item => _item.CodeType == codeType);
}
Note that all of this assumes that you have a suitable generic constraint on T such that T.CodeType is well-defined - presumably:
class Foo<T> where T : IHazCodeType
{
List<T> SearchByCodeType(ECodes codeType, List<T> list) {...}
}
interface IHazCodeType
{
ECodes CodeType {get;}
}
I would like something like this:
public int NumberStudent()
{
int i = 0;
if (db.Tbl_Student.ToList().Count() > 0)
i = db.Tbl_Student.Max(d => d.id);
return i;
}
However, I would like to use it on any table:
public int FindMaxId(string TableName)
{
int i =0;
if ('db.'+TableName+'.ToList().Count() > 0' )
i = db. TableName.Max(d => d.id);
return i ;
}
I know it is wrong, but I'm not sure how to do it.
You can use the IEnumerable/IQueryable extension method DefaultIfEmpty for this.
var maxId = db.Tbl_Student.Select(x => x.Id).DefaultIfEmpty(0).Max();
In general, if you do Q.DefaultIfEmpty(D), it means:
If Q isn't empty, give me Q; otherwise, give me [ D ].
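A small in-memory sketch of that rule, using a plain sequence of ids instead of a database table (DefaultIfEmptyDemo and MaxOrZero are names made up for the example):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DefaultIfEmptyDemo
{
    // Returns the largest id, or 0 when there are no ids at all.
    public static int MaxOrZero(IEnumerable<int> ids)
    {
        // An empty sequence is replaced by the single-element sequence [0],
        // so Max always has something to work with.
        return ids.DefaultIfEmpty(0).Max();
    }
}
```

So MaxOrZero(new int[0]) gives 0, while MaxOrZero(new[] { 3, 7, 5 }) gives 7.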
Below I have written a simple wrapper around the existing Max extension method that allows you to provide an empty source (the table you were talking about).
Instead of throwing an exception, it will just return the default value of zero.
Original
public static class Extensions
{
public static int MaxId<TSource>(this IEnumerable<TSource> source, Func<TSource, int> selector)
{
if (source.Any())
{
return source.Max(selector);
}
return 0;
}
}
This was my attempt, which as noted by Timothy is actually quite inferior. This is because the sequence will be enumerated twice. Once when calling Any to check if the source sequence has any elements, and again when calling Max.
Improved
public static class Extensions
{
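// Expression<Func<...>> (from System.Linq.Expressions) keeps the query on the IQueryable side so the provider can translate it.
public static int MaxId<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, int>> selector)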
{
return source.Select(selector).DefaultIfEmpty(0).Max();
}
}
This implementation uses Timothy's approach. By calling DefaultIfEmpty, we are making use of deferred execution and the sequence will only be enumerated when calling Max. In addition we are now using IQueryable instead of IEnumerable which means we don't have to enumerate the source before calling this method. As Scott said, should you need it you can create an overload that uses IEnumerable too.
In order to use the extension method, you just need to provide a delegate that returns the id of the source type, exactly the same way you would for Max.
public class Program
{
static YourContext context = new YourContext();
public static int MaxStudentId()
{
return context.Student.MaxId(s => s.Id);
}
public static void Main(string[] args)
{
Console.WriteLine("Max student id: {0}", MaxStudentId());
}
}
public static class Extensions
{
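// Expression<Func<...>> (from System.Linq.Expressions) keeps the query on the IQueryable side so the provider can translate it.
public static int MaxId<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, int>> selector)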
{
return source.Select(selector).DefaultIfEmpty(0).Max();
}
}
db.Tbl_Student.Aggregate(0, (maxId, s) => Math.Max(maxId, s.Id))
or
db.Tbl_Student.Max(s => (int?)s.Id) ?? 0
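The two one-liners are not quite equivalent, which the following in-memory sketch tries to show (NullableMaxDemo and its method names are made up, and a plain sequence of ids stands in for the table). Max over int? returns null for an empty sequence, which ?? then turns into 0; Aggregate starts from the seed 0 and can therefore never return less than 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullableMaxDemo
{
    // Max over int? yields null for an empty sequence instead of throwing.
    public static int MaxOrZero(IEnumerable<int> ids)
    {
        return ids.Max(id => (int?)id) ?? 0;
    }

    // Folds the sequence starting from 0, so the result is never below 0.
    public static int MaxOrZeroViaAggregate(IEnumerable<int> ids)
    {
        return ids.Aggregate(0, (max, id) => Math.Max(max, id));
    }
}
```

For ids that are all positive both give the same answer; for something like { -4, -2 } the first returns -2 and the second returns 0.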
Just a little niggle about LINQ syntax. I'm flattening an IEnumerable<IEnumerable<T>> with SelectMany(x => x).
My problem is with the lambda expression x => x. It looks a bit ugly. Is there some static 'identity function' object that I can use instead of x => x? Something like SelectMany(IdentityFunction)?
Unless I misunderstand the question, the following seems to work fine for me in C# 4:
public static class Defines
{
public static T Identity<T>(T pValue)
{
return pValue;
}
...
You can then do the following in your example:
var result =
enumerableOfEnumerables
.SelectMany(Defines.Identity);
As well as use Defines.Identity anywhere you would use a lambda that looks like x => x.
Note: this answer was correct for C# 3, but at some point (C# 4? C# 5?) type inference improved so that the IdentityFunction method shown below can be used easily.
No, there isn't. It would have to be generic, to start with:
public static Func<T, T> IdentityFunction<T>()
{
return x => x;
}
But then type inference wouldn't work, so you'd have to do:
SelectMany(Helpers.IdentityFunction<Foo>())
which is a lot uglier than x => x.
Another possibility is that you wrap this in an extension method:
public static IEnumerable<T> Flatten<T>
(this IEnumerable<IEnumerable<T>> source)
{
return source.SelectMany(x => x);
}
Unfortunately with generic variance the way it is, that may well fall foul of various cases in C# 3... it wouldn't be applicable to List<List<string>> for example. You could make it more generic:
public static IEnumerable<TElement> Flatten<TElement, TWrapper>
(this IEnumerable<TWrapper> source) where TWrapper : IEnumerable<TElement>
{
return source.SelectMany(x => x);
}
But again, you've then got type inference problems, I suspect...
EDIT: To respond to the comments... yes, C# 4 makes this easier. Or rather, it makes the first Flatten method more useful than it is in C# 3. Here's an example which works in C# 4, but doesn't work in C# 3 because the compiler can't convert from List<List<string>> to IEnumerable<IEnumerable<string>>:
using System;
using System.Collections.Generic;
using System.Linq;
public static class Extensions
{
public static IEnumerable<T> Flatten<T>
(this IEnumerable<IEnumerable<T>> source)
{
return source.SelectMany(x => x);
}
}
class Test
{
static void Main()
{
List<List<string>> strings = new List<List<string>>
{
new List<string> { "x", "y", "z" },
new List<string> { "0", "1", "2" }
};
foreach (string x in strings.Flatten())
{
Console.WriteLine(x);
}
}
}
With C# 6.0 and if you reference FSharp.Core you can do:
using static Microsoft.FSharp.Core.Operators;
And then you're free to do:
SelectMany(Identity)
With C# 6.0 things are getting better. We can define the identity function in the way suggested by @Sahuagin:
static class Functions
{
public static T It<T>(T item) => item;
}
And then use it in SelectMany with the using static directive:
using static Functions;
...
var result = enumerableOfEnumerables.SelectMany(It);
I think it looks very laconic this way. I also find the identity function useful when building dictionaries:
class P
{
P(int id, string name) // Sad. We are not getting primary constructors in C# 6.0
{
ID = id;
Name = name;
}
int ID { get; }
string Name { get; }
static void Main(string[] args)
{
var items = new[] { new P(1, "Jack"), new P(2, "Jill"), new P(3, "Peter") };
var dict = items.ToDictionary(x => x.ID, It);
}
}
This may work in the way you want. I realize Jon posted a version of this solution, but he has a second type parameter which is only necessary if the resulting sequence type is different from the source sequence type.
public static IEnumerable<T> Flatten<T>(this IEnumerable<T> source)
where T : IEnumerable<T>
{
return source.SelectMany(item => item);
}
You can get close to what you need. Instead of a regular static function, consider an extension method for your IEnumerable<T>, as if the identity function is of the collection, not the type (a collection can generate the identity function of its items):
public static Func<T, T> IdentityFunction<T>(this IEnumerable<T> enumerable)
{
return x => x;
}
with this, you don't have to specify the type again, and write:
IEnumerable<IEnumerable<T>> deepList = ... ;
var flat = deepList.SelectMany(deepList.IdentityFunction());
This does feel a bit abusive though, and I'd probably go with x=>x. Also, you cannot use it fluently (in chaining), so it will not always be useful.
I'd go with a simple class with a single static property, and add as many as required down the line:
internal class IdentityFunction<TSource>
{
public static Func<TSource, TSource> Instance
{
get { return x => x; }
}
}
SelectMany(IdentityFunction<Foo>.Instance)