I am playing with LINQ to learn about it, but I can't figure out how to use Distinct when I do not have a simple list (a simple list of integers is pretty easy to do, this is not the question). What if I want to use Distinct on a List<TElement> based on one or more properties of TElement?
Example: If an object is Person, with property Id. How can I get all Person and use Distinct on them with the property Id of the object?
Person1: Id=1, Name="Test1"
Person2: Id=1, Name="Test1"
Person3: Id=2, Name="Test2"
How can I get just Person1 and Person3? Is that possible?
If it's not possible with LINQ, what would be the best way to have a list of Person depending on some of its properties?
What if I want to obtain a distinct list based on one or more properties?
Simple! You want to group them and pick a winner out of the group.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.First())
.ToList();
If you want to define groups on multiple properties, here's how:
List<Person> distinctPeople = allPeople
.GroupBy(p => new {p.PersonId, p.FavoriteColor} )
.Select(g => g.First())
.ToList();
Note: Certain query providers are unable to resolve that each group must have at least one element, and that First is the appropriate method to call in that situation. If you find yourself working with such a query provider, FirstOrDefault may help get your query through the query provider.
Note2: Consider this answer for an EF Core (prior to EF Core 6) compatible approach. https://stackoverflow.com/a/66529949/8155
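As a worked illustration of the grouping approach on the data from the question, here is a minimal sketch. The Person class and the GroupByDistinctDemo helper are assumptions made for the example (the question only names the Id and Name properties), and it targets LINQ to Objects rather than any particular query provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's Person type.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class GroupByDistinctDemo
{
    // Groups by Id and keeps the first person of each group.
    // In LINQ to Objects, groups come out in order of first appearance,
    // and elements inside a group keep their source order.
    public static List<Person> DistinctById(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .ToList();
    }
}
```

Called with Person1, Person2 and Person3 from the question, DistinctById returns Person1 and Person3.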
EDIT: This is now part of MoreLINQ.
What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (seenKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
So to find the distinct values using just the Id property, you could use:
var query = people.DistinctBy(p => p.Id);
And to use multiple properties, you can use anonymous types, which implement equality appropriately:
var query = people.DistinctBy(p => new { p.Id, p.Name });
Untested, but it should work (and it now at least compiles).
It assumes the default comparer for the keys though - if you want to pass in an equality comparer, just pass it on to the HashSet constructor.
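A sketch of what passing the comparer through could look like. The method name DistinctByKey is hypothetical; it is chosen here only so the example does not clash with the DistinctBy that newer .NET versions ship in System.Linq. The argument checks are split from the iterator so they run when the method is called rather than on first enumeration.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeyExtensions
{
    // Hypothetical overload: the caller decides how keys are compared.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterate(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> Iterate<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // HashSet falls back to the default comparer when given null.
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            // Add returns false when the key was already present.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}
```

For example, names.DistinctByKey(n => n, StringComparer.OrdinalIgnoreCase) treats "a" and "A" as the same key and keeps whichever comes first.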
Use:
List<Person> pList = new List<Person>();
/* Fill list */
var result = pList.Where(p => p.Name != null).GroupBy(p => p.Id)
.Select(grp => grp.FirstOrDefault());
The where helps you filter the entries (could be more complex) and the groupby and select perform the distinct function.
You could also use query syntax if you want it to look all LINQ-like:
var uniquePeople = from p in people
group p by new {p.ID} //or group by new {p.ID, p.Name, p.Whatever}
into mygroup
select mygroup.FirstOrDefault();
I think it is enough:
list.Select(s => s.MyField).Distinct();
Solution: first group by your fields, then select the FirstOrDefault item of each group.
List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.FirstOrDefault())
.ToList();
Starting with .NET 6, there is a new solution using the new DistinctBy() extension method in LINQ, so we can do:
var distinctPersonsById = personList.DistinctBy(x => x.Id);
The signature of the DistinctBy method:
// Returns distinct elements from a sequence according to a specified
// key selector function.
public static IEnumerable<TSource> DistinctBy<TSource, TKey> (
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector);
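To cover the "one or more properties" part of the question with the built-in method, a short sketch follows. It assumes .NET 6 or later; the Person record and the BuiltInDistinctByDemo class are made up for the example. A value tuple works as a composite key because tuples compare element by element, and the overload that accepts an IEqualityComparer<TKey> controls how keys are compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed model for the example.
public record Person(int Id, string Name);

// Hypothetical helper class, for illustration only.
public static class BuiltInDistinctByDemo
{
    // One person per Id.
    public static List<Person> ById(IEnumerable<Person> people) =>
        people.DistinctBy(p => p.Id).ToList();

    // One person per (Id, Name) combination; the tuple is the composite key.
    public static List<Person> ByIdAndName(IEnumerable<Person> people) =>
        people.DistinctBy(p => (p.Id, p.Name)).ToList();

    // One person per Name, with names compared case-insensitively.
    public static List<Person> ByNameIgnoringCase(IEnumerable<Person> people) =>
        people.DistinctBy(p => p.Name, StringComparer.OrdinalIgnoreCase).ToList();
}
```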
You can do this with the standard LINQ ToLookup() method. This will create a collection of values for each unique key. Just select the first item in each collection:
Persons.ToLookup(p => p.Id).Select(coll => coll.First());
The following code is functionally equivalent to Jon Skeet's answer.
Tested on .NET 4.5, should work on any earlier version of LINQ.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
return source.Where(element => seenKeys.Add(keySelector(element)));
}
Incidentally, check out Jon Skeet's latest version of DistinctBy.cs on Google Code.
Update 2022-04-03
Based on a comment by Andrew McClement, it is best to take Jon Skeet's answer over this one.
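One observable difference between the two versions is worth seeing in code: the Where-based method creates its HashSet once, when the method is called, so the set is shared by every enumeration of the returned query. The sketch below copies the method under the hypothetical name DistinctByWhere (to avoid clashing with other DistinctBy methods) and enumerates the same query twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReenumerationDemo
{
    // The Where-based version, renamed for this example.
    public static IEnumerable<TSource> DistinctByWhere<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Created once per call, not once per enumeration.
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        return source.Where(element => seenKeys.Add(keySelector(element)));
    }

    public static (int First, int Second) CountTwice()
    {
        var query = new[] { 1, 1, 2 }.DistinctByWhere(x => x);
        int first = query.Count();   // 2: the keys 1 and 2 are new to the set
        int second = query.Count();  // 0: every key is already in the set
        return (first, second);
    }
}
```

The iterator-block version does not behave this way, because its HashSet is created inside the iterator body and therefore afresh for each enumeration.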
I've written an article that explains how to extend the Distinct function so that you can do as follows:
var people = new List<Person>();
people.Add(new Person(1, "a", "b"));
people.Add(new Person(2, "c", "d"));
people.Add(new Person(1, "a", "b"));
foreach (var person in people.Distinct(p => p.ID))
{
// Do stuff with unique list here.
}
Here's the article (now in the Web Archive): Extending LINQ - Specifying a Property in the Distinct Function
Personally I use the following class:
public class LambdaEqualityComparer<TSource, TDest> :
IEqualityComparer<TSource>
{
private Func<TSource, TDest> _selector;
public LambdaEqualityComparer(Func<TSource, TDest> selector)
{
_selector = selector;
}
public bool Equals(TSource obj, TSource other)
{
return _selector(obj).Equals(_selector(other));
}
public int GetHashCode(TSource obj)
{
return _selector(obj).GetHashCode();
}
}
Then, an extension method:
public static IEnumerable<TSource> Distinct<TSource, TCompare>(
this IEnumerable<TSource> source, Func<TSource, TCompare> selector)
{
return source.Distinct(new LambdaEqualityComparer<TSource, TCompare>(selector));
}
Finally, the intended usage:
var dates = new List<DateTime>() { /* ... */ };
var distinctYears = dates.Distinct(date => date.Year);
The advantage I found using this approach is the re-usage of LambdaEqualityComparer class for other methods that accept an IEqualityComparer. (Oh, and I leave the yield stuff to the original LINQ implementation...)
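The class above calls Equals and GetHashCode directly on the selected value, which throws if the selector returns null. A possible variant that delegates to an IEqualityComparer<TKey> instead is sketched below; the name KeyComparer and the optional keyComparer parameter are assumptions made for this example, not part of the original class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical null-tolerant variant of a selector-based comparer.
public sealed class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> _selector;
    private readonly IEqualityComparer<TKey> _keyComparer;

    public KeyComparer(
        Func<TSource, TKey> selector,
        IEqualityComparer<TKey> keyComparer = null)
    {
        _selector = selector ?? throw new ArgumentNullException(nameof(selector));
        // Fall back to the default comparer for the key type.
        _keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    public bool Equals(TSource x, TSource y)
    {
        // The key comparer is expected to cope with null keys;
        // the default comparer and StringComparer both do.
        return _keyComparer.Equals(_selector(x), _selector(y));
    }

    public int GetHashCode(TSource obj)
    {
        TKey key = _selector(obj);
        // Null keys all hash to 0 so that they land in the same bucket.
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}
```

Usage mirrors the original, for example dates.Distinct(new KeyComparer<DateTime, int>(d => d.Year)).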
You can use DistinctBy() for getting Distinct records by an object property. Just add the following statement before using it:
using Microsoft.Ajax.Utilities;
and then use it like following:
var listToReturn = responseList.DistinctBy(x => x.Index).ToList();
where 'Index' is the property on which I want the data to be distinct.
You can do it (albeit not lightning-quickly) like so:
people.Where(p => !people.Any(q => (p != q && p.Id == q.Id)));
That is, "select all people where there isn't another different person in the list with the same ID."
Mind you, in your example, that would just select person 3. I'm not sure how to tell which you want, out of the previous two.
In case you need a Distinct method on multiple properties, you can check out my PowerfulExtensions library. Currently it's at a very early stage, but you can already use methods like Distinct, Union, Intersect, Except on any number of properties.
This is how you use it:
using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);
When we faced such a task in our project we defined a small API to compose comparators.
So, the use case was like this:
var wordComparer = KeyEqualityComparer.Null<Word>().
ThenBy(item => item.Text).
ThenBy(item => item.LangID);
...
source.Select(...).Distinct(wordComparer);
And API itself looks like this:
using System;
using System.Collections;
using System.Collections.Generic;
public static class KeyEqualityComparer
{
public static IEqualityComparer<T> Null<T>()
{
return null;
}
public static IEqualityComparer<T> EqualityComparerBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc);
}
public static KeyEqualityComparer<T, K> ThenBy<T, K>(
this IEqualityComparer<T> equalityComparer,
Func<T, K> keyFunc)
{
return new KeyEqualityComparer<T, K>(keyFunc, equalityComparer);
}
}
public struct KeyEqualityComparer<T, K>: IEqualityComparer<T>
{
public KeyEqualityComparer(
Func<T, K> keyFunc,
IEqualityComparer<T> equalityComparer = null)
{
KeyFunc = keyFunc;
EqualityComparer = equalityComparer;
}
public bool Equals(T x, T y)
{
return ((EqualityComparer == null) || EqualityComparer.Equals(x, y)) &&
EqualityComparer<K>.Default.Equals(KeyFunc(x), KeyFunc(y));
}
public int GetHashCode(T obj)
{
var hash = EqualityComparer<K>.Default.GetHashCode(KeyFunc(obj));
if (EqualityComparer != null)
{
var hash2 = EqualityComparer.GetHashCode(obj);
hash ^= (hash2 << 5) + hash2;
}
return hash;
}
public readonly Func<T, K> KeyFunc;
public readonly IEqualityComparer<T> EqualityComparer;
}
More details are on our site: IEqualityComparer in LINQ.
If you don't want to add the MoreLinq library to your project just to get the DistinctBy functionality then you can get the same end result using the overload of Linq's Distinct method that takes in an IEqualityComparer argument.
You begin by creating a generic custom equality comparer class that uses lambda syntax to perform custom comparison of two instances of a generic class:
public class CustomEqualityComparer<T> : IEqualityComparer<T>
{
Func<T, T, bool> _comparison;
Func<T, int> _hashCodeFactory;
public CustomEqualityComparer(Func<T, T, bool> comparison, Func<T, int> hashCodeFactory)
{
_comparison = comparison;
_hashCodeFactory = hashCodeFactory;
}
public bool Equals(T x, T y)
{
return _comparison(x, y);
}
public int GetHashCode(T obj)
{
return _hashCodeFactory(obj);
}
}
Then in your main code you use it like so:
Func<Person, Person, bool> areEqual = (p1, p2) => int.Equals(p1.Id, p2.Id);
Func<Person, int> getHashCode = (p) => p.Id.GetHashCode();
var query = people.Distinct(new CustomEqualityComparer<Person>(areEqual, getHashCode));
Voila! :)
The above assumes the following:
Property Person.Id is of type int
The people collection does not contain any null elements
If the collection could contain nulls then simply rewrite the lambdas to check for null, e.g.:
Func<Person, Person, bool> areEqual = (p1, p2) =>
{
return (p1 != null && p2 != null) ? int.Equals(p1.Id, p2.Id) : false;
};
EDIT
This approach is similar to the one in Vladimir Nesterovsky's answer but simpler.
It is also similar to the one in Joel's answer but allows for complex comparison logic involving multiple properties.
However, if your objects can only ever differ by Id then another user gave the correct answer that all you need to do is override the default implementations of GetHashCode() and Equals() in your Person class and then just use the out-of-the-box Distinct() method of Linq to filter out any duplicates.
Override Equals(object obj) and GetHashCode() methods:
class Person
{
public int Id { get; set; }
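public string Name { get; set; }
public override bool Equals(object obj)
{
// The pattern match returns false instead of throwing
// when obj is null or not a Person.
return obj is Person other && other.Id == Id;
// or:
// return obj is Person o && o.Id == Id && o.Name == Name;
}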
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
and then just call:
List<Person> distinctList = new[] { person1, person2, person3 }.Distinct().ToList();
The best way to do this that will be compatible with other .NET versions is to override Equals and GetHashCode to handle this (see the Stack Overflow question "This code returns distinct values. However, what I want is to return a strongly typed collection as opposed to an anonymous type"), but if you need something that is generic throughout your code, the solutions in this article are great.
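List<Person> lst = new List<Person>();
// Distinct() only removes duplicates here if Player overrides Equals and GetHashCode.
var result1 = lst.OrderByDescending(a => a.ID).Select(a => new Player { ID = a.ID, Name = a.Name }).Distinct();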
You should be able to override Equals on Person to compare Person.Id (and override GetHashCode to match, since Distinct relies on both). This ought to result in the behavior you're after.
If you use an older .NET version, where the extension method is not built in, you can define your own extension method:
public static class EnumerableExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> enumerable, Func<T, TKey> keySelector)
{
return enumerable.GroupBy(keySelector).Select(grp => grp.First());
}
}
Example of usage:
var personsDist = persons.DistinctBy(item => item.Name);
Definitely not the most efficient, but for those who are looking for a short and simple answer:
list.Select(x => x.Id).Distinct().Select(x => list.First(y => x == y.Id)).ToList();
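Please try the code below. GroupBy on its own returns the groups, so take the first element of each group to get one item per Id.
var Item = GetAll().GroupBy(x => x.Id).Select(g => g.First()).ToList();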
I would like to make a sorting extension method which will take a Generic Collection and sort it using one or more keys. The keys will be properties of the collection's containing objects.
A sample LINQ query with 3 keys looks like this.
studentResults.OrderBy(x => x.CG).ThenBy(x => x.Student.Roll)
.ThenBy(x => x.Student.Name).ToList();
I have already found something which can do this with one key.
public static List<TSource> OrderByAsListOrNull<TSource, TKey>(
this ICollection<TSource> collection, Func<TSource,TKey> keySelector)
{
if (collection != null && collection.Count > 0) {
return collection
.OrderBy(x => keySelector(x))
.ToList();
}
return null;
}
I thought of using IEnumerable<Func<TSource, TKey> keySelector>, but I cannot call the function like that.
So, how may I implement a method of this kind?
In theory, you could build a multi-level sort extension that differentiates between the initial OrderBy and the subsequent ThenBys used as secondary and tertiary tiebreakers. Because it takes multiple ordering functions, each of which could return a different type, you'll need to soften the projected type (I've used object, below).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public static class Extensions
{
public static IEnumerable<T> MyOrderBy<T>(
this IEnumerable<T> source,
params Func<T, object>[] orders)
{
Debug.Assert(orders.Length > 0);
var sortQuery = source.OrderBy(orders[0]);
foreach(var order in orders.Skip(1))
{
sortQuery = sortQuery.ThenBy(order);
}
return sortQuery;
}
}
public class Poco
{
public string Name {get; set;}
public int Number {get; set;}
}
void Main()
{
var items = new []{
new Poco{Name = "Zebra", Number = 99},
new Poco{Name = "Apple", Number = 123}};
foreach(var poco in items.MyOrderBy(i => i.Number, i => i.Name))
{
Console.WriteLine(poco.Name);
}
}
The problem with this (as with your original function) is that you'll probably want to order by descending at some point. Although for numeric sort keys this could be hacked by multiplying by -1, it's going to be really difficult to do this for an arbitrary type:
// Hack : Order a numeric descending
item => item.Number * -1
For me, I would just stay with Linq's sorting extensions, and not try to abstract them in any way!
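If the abstraction is still wanted, one way around the descending problem is to pass a direction flag with each key instead of negating values. The following is only a sketch under that assumption: OrderByKeys and the tuple element names Key and Descending are hypothetical, and the keys are still projected to object, so each key's values must be mutually comparable at run time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortExtensions
{
    // Hypothetical helper: every sort key carries its own direction.
    public static IEnumerable<T> OrderByKeys<T>(
        this IEnumerable<T> source,
        params (Func<T, object> Key, bool Descending)[] orders)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (orders == null || orders.Length == 0)
            throw new ArgumentException("At least one sort key is required.", nameof(orders));

        // The first key starts the ordering...
        IOrderedEnumerable<T> sorted = orders[0].Descending
            ? source.OrderByDescending(orders[0].Key)
            : source.OrderBy(orders[0].Key);

        // ...and the remaining keys act as tiebreakers.
        foreach (var order in orders.Skip(1))
        {
            sorted = order.Descending
                ? sorted.ThenByDescending(order.Key)
                : sorted.ThenBy(order.Key);
        }
        return sorted;
    }
}
```

A call such as words.OrderByKeys<string>((w => w.Length, true), (w => w, false)) sorts by length descending and then alphabetically. The type argument is written out explicitly here so the tuple literals are converted to the parameter type without relying on inference.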
I want to use LinqKit's PredicateBuilder and pass the predicate into .Any method for related model.
So I want to build a predicate:
var castCondition = PredicateBuilder.New<CastInfo>(true);
if (movies != null && movies.Length > 0)
{
castCondition = castCondition.And(c => movies.Contains(c.MovieId));
}
if (roleType > 0)
{
castCondition = castCondition.And(c => c.RoleId == roleType);
}
And then use it to filter model that has relation to model in predicate:
IQueryable<Name> result = _context.Name.AsExpandable().Where(n => n.CastInfo.Any(castCondition));
return await result.OrderBy(n => n.Name1).Take(25).ToListAsync();
But this causes a System.NotSupportedException: Could not parse expression 'n.CastInfo.Any(Convert(__castCondition_0, Func`2))': The given arguments did not match the expected arguments: Object of type 'System.Linq.Expressions.UnaryExpression' cannot be converted to type 'System.Linq.Expressions.LambdaExpression'.
I saw a similar question whose answer suggests using .Compile, and another question that builds an extra predicate.
So I tried to use extra predicate
var tp = PredicateBuilder.New<Name>(true);
tp = tp.And(n => n.CastInfo.Any(castCondition.Compile()));
IQueryable<Name> result = _context.Name.AsExpandable().Where(tp);
Or use compile directly
IQueryable<Name> result = _context.Name.AsExpandable().Where(n => n.CastInfo.Any(castCondition.Compile()));
But I get an error about Compile: System.NotSupportedException: Could not parse expression 'n.CastInfo.Any(__Compile_0)'
So is it possible to convert the result from PredicateBuilder to pass into Any?
Note: I was able to build the desired behavior combining expressions, but I don't like that I need extra variables.
System.Linq.Expressions.Expression<Func<CastInfo,bool>> castExpression = (c => true);
if (movies != null && movies.Length > 0)
{
castExpression = (c => movies.Contains(c.MovieId));
}
if (roleType > 0)
{
var existingExpression = castExpression;
castExpression = c => existingExpression.Invoke(c) && c.RoleId == roleType;
}
IQueryable<Name> result = _context.Name.AsExpandable().Where(n => n.CastInfo.Any(castExpression.Compile()));
return await result.OrderBy(n => n.Name1).Take(25).ToListAsync();
So I assume I'm just missing something about the builder.
Update about versions: I use dotnet core 2.0 and LinqKit.Microsoft.EntityFrameworkCore 1.1.10
Looking at the code, one will assume that the type of castCondition variable is Expression<Func<CastInfo, bool>> (as it was in earlier versions of PredicateBuilder).
But if that was the case, then n.CastInfo.Any(castCondition) should not even compile (assuming CastInfo is a collection navigation property, so the compiler will hit Enumerable.Any which expects Func<CastInfo, bool>, not Expression<Func<CastInfo, bool>>). So what's going on here?
In my opinion, this is a good example of C# implicit operator abuse. The PredicateBuilder.New<T> method actually returns a class called ExpressionStarter<T>, which has many methods emulating Expression, but more importantly, has implicit conversions to Expression<Func<T, bool>> and Func<T, bool>. The latter allows that class to be used with top-level Enumerable / Queryable methods as a replacement for the respective lambda func/expression. However, it also prevents the compile-time error when used inside the expression tree as in your case: the compiler emits something like n.CastInfo.Any((Func<CastInfo, bool>)castCondition), which of course causes an exception at runtime.
The whole idea of LinqKit AsExpandable method is to allow "invoking" expressions via custom Invoke extension method, which then is "expanded" in the expression tree. So back at the beginning, if the variable type was Expression<Func<CastInfo, bool>>, the intended usage is:
_context.Name.AsExpandable().Where(n => n.CastInfo.Any(c => castCondition.Invoke(c)));
But now this doesn't compile because of the reason explained earlier. So you have to convert it first to Expression<Func<T, bool>> outside of the query:
Expression<Func<CastInfo, bool>> castPredicate = castCondition;
and then use
_context.Name.AsExpandable().Where(n => n.CastInfo.Any(c => castPredicate.Invoke(c)));
or
_context.Name.AsExpandable().Where(n => n.CastInfo.Any(castPredicate.Compile()));
To let compiler infer the expression type, I would create a custom extension method like this:
using System;
using System.Linq.Expressions;
namespace LinqKit
{
public static class Extensions
{
public static Expression<Func<T, bool>> ToExpression<T>(this ExpressionStarter<T> expr) => expr;
}
}
and then simply use
var castPredicate = castCondition.ToExpression();
It still has to be done outside of the query, i.e. the following does not work:
_context.Name.AsExpandable().Where(n => n.CastInfo.Any(c => castCondition.ToExpression().Invoke(c)));
It may not be exactly related to the original question, but considering the following model :
public class Music
{
public int Id { get; set; }
public List<Genre> Genres { get; set; }
}
public class Genre
{
public int Id { get; set; }
public string Title { get; set; }
}
List<string> genresToFind = new() {"Pop", "Rap", "Classical"};
If you are trying to find all Music items whose genres appear in the genresToFind list, here's what you can do:
Create a PredicateBuilder expression chain on the Genre model:
var pre = PredicateBuilder.New<Genre>();
foreach (var genre in genresToFind)
{
pre = pre.Or(g => g.Title.Contains(genre));
}
Then execute your query like this :
var result = await _db.Musics.AsExpandable()
.Where(m => m.Genres
.Any(g => pre.ToExpression().Invoke(g)))
.ToListAsync();
ToExpression() is a generic extension method that we've created to convert ExpressionStarter<Genre> type to Expression<Func<Genre, bool>> :
public static class ExpressionExtensions
{
public static Expression<Func<T, bool>> ToExpression<T> (this
ExpressionStarter<T> exp) => exp;
}
Also, you'll need the LinqKit.Microsoft.EntityFrameworkCore package for EF Core.
I need to create a general routine in Visual Studio that takes some parameters as input and returns a list from a repository. I am using LINQ, but I am not sure how to develop this function, nor which keywords to search for to find resources.
This is a sample code that already is used in my program:
var lstReceiptDetails = Repository<TransactionDetail>()
.Where(current => current.HeaderId == headerId)
.OrderBy(current => current.DocumentRow)
.ToList();
I need to change the above linq statement to something like the following pseudocode:
private List<> GetQuery(repositoryName, conditionFieldName, orderFieldName )
{
var lstResult = Repository<repositoryName>()
.Where(current => current.ConditionFieldName == conditionFieldName)
.OrderBy(current => current.orderFieldName)
.ToList();
Return(lstResult);
}
Any help is appreciated.
Maryam
I think the closest you can get is the example below. I've tried several ways to do this, but they would harm usability and readability. This is a compromise between code duplication and readability.
A sample POCO object:
class TransactionDetail
{
public DateTime DateProcessed { get; set; }
public string AccountName { get; set; }
}
The repositories:
abstract class GenericRepository<T>
{
public List<T> GetQuery<TKey>(
Func<T, bool> conditionFieldName,
Func<T, TKey> orderFieldName)
{
var lstResult = Repository()
.Where(conditionFieldName)
.OrderBy(orderFieldName)
.ToList();
return lstResult;
}
private IEnumerable<T> Repository()
{
throw new NotImplementedException();
}
}
class TransactionDetailRepository : GenericRepository<TransactionDetail>
{
}
And caller-side:
var repository = new TransactionDetailRepository();
var transactions = repository.GetQuery(
x => x.AccountName == "Foo Bar",
x => x.DateProcessed);
Argument checks should still be implemented properly though.
If this piece of code is to be used with Entity Framework or LINQ to SQL, the parameters should be wrapped in Expression<T>; for example, Func<T, bool> becomes Expression<Func<T, bool>>.
You can try to use the LINQ Dynamic Query Library, which takes string arguments instead of type-safe language operators.
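A minimal sketch of the Expression-based signature, assuming the data source is exposed as an IQueryable<T>. QueryHelper and the source parameter are hypothetical stand-ins for the repository, and an in-memory list wrapped with AsQueryable() can play the role of the database when trying it out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Same sample POCO as above.
public class TransactionDetail
{
    public DateTime DateProcessed { get; set; }
    public string AccountName { get; set; }
}

// Hypothetical helper; 'source' stands in for the repository's IQueryable<T>.
public static class QueryHelper
{
    public static List<T> GetQuery<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, bool>> condition,
        Expression<Func<T, TKey>> orderBy)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (condition == null) throw new ArgumentNullException(nameof(condition));
        if (orderBy == null) throw new ArgumentNullException(nameof(orderBy));

        // Expression trees let a query provider translate the filter and
        // the ordering instead of running them in memory.
        return source.Where(condition).OrderBy(orderBy).ToList();
    }
}
```

The call site looks the same as with delegates, for example QueryHelper.GetQuery(details.AsQueryable(), x => x.AccountName == "Foo Bar", x => x.DateProcessed), because the compiler converts the lambdas to expression trees.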
Short example:
var result = Repository<repositoryName>().
Where("Id = 1").
Select("new(Id, Name)");
More information here: http://weblogs.asp.net/scottgu/dynamic-linq-part-1-using-the-linq-dynamic-query-library
I'm trying to maintain a list of unique models from a variety of queries. Unfortunately, the Equals method of our models is not defined, so I couldn't easily use a hash map.
As a quick fix I used the following code:
public void AddUnique(
List<Model> source,
List<Model> result)
{
if (result != null)
{
if (result.Count > 0
&& source != null
&& source.Count > 0)
{
source.RemoveAll(
s => result.Contains(
r => r.ID == s.ID));
}
result.AddRange(source);
}
}
Unfortunately, this does not work. When I step through the code, I find that even though I've checked to make sure that there was at least one Model with the same ID in both source and result, the RemoveAll(Predicate<Model>) line does not change the number of items in source. What am I missing?
The above code shouldn't even compile, as Contains expects a Model, not a predicate.
You can use Any() instead:
source.RemoveAll(s => result.Any(r => r.ID == s.ID));
This will remove the items from source correctly.
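With Any() inside RemoveAll, every element of source is checked against every element of result. If the lists are large, collecting the known IDs in a HashSet first avoids the nested scan. The sketch below makes several assumptions: Model.ID is an int, the method leaves source unmodified (unlike the original), and it also skips IDs that are duplicated within source itself.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of the question's Model type.
public class Model
{
    public int ID { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class ModelListHelper
{
    // Appends to 'result' every model from 'source' whose ID is not present yet.
    public static void AddUnique(List<Model> source, List<Model> result)
    {
        if (source == null || result == null) return;

        // IDs already in the result list.
        var knownIds = new HashSet<int>(result.Select(r => r.ID));

        foreach (var item in source)
        {
            // Add returns false when the ID was seen before.
            if (knownIds.Add(item.ID))
            {
                result.Add(item);
            }
        }
    }
}
```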
I might opt to tackle the problem a different way.
You said you do not have suitable implementations of equality inside the class. Maybe you can't change that. However, you can define an IEqualityComparer<Model> implementation that allows you to specify appropriate Equals and GetHashCode implementations external to the actual Model class itself.
var comparer = new ModelComparer();
var addableModels = newSourceOfModels.Except(modelsThatAlreadyExist, comparer);
// you can then add the result to the existing
Where you might define the comparer as
class ModelComparer : IEqualityComparer<Model>
{
public bool Equals(Model x, Model y)
{
// validations omitted
return x.ID == y.ID;
}
public int GetHashCode(Model m)
{
return m.ID.GetHashCode();
}
}
source.RemoveAll(s => result.Select(r => r.ID).Contains(s.ID));
The goal of this statement is to project result into an enumeration of IDs and then test each element of source against it. The predicate returns true for every source element whose ID also appears in result, and RemoveAll removes those elements.
Your code is removing all the models which are the same between the two lists, not those which have the same ID. Unless they're actually the same instances of the model, it won't work like you're expecting.
Sometimes I use these extension methods for that sort of thing:
public static class CollectionHelper
{
public static void RemoveWhere<T>(this IList<T> list, Func<T, bool> selector)
{
var itemsToRemove = list.Where(selector).ToList();
foreach (var item in itemsToRemove)
{
list.Remove(item);
}
}
public static void RemoveWhere<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, Func<KeyValuePair<TKey, TValue>, bool> selector)
{
var itemsToRemove = dictionary.Where(selector).ToList();
foreach (var item in itemsToRemove)
{
dictionary.Remove(item);
}
}
}