EF Core DistinctBy - c#

I am trying to find an equivalent LINQ query that EF Core can translate and run in the SQL database, in place of the code below.
var result = Logs.Where(x => x.ProjectCode.Equals(projectCode))
.OrderByDescending(x => x.SubmittedDate).DistinctBy(y => new { y.GeographyCode, y.DataSetId })
The DistinctBy operation cannot be translated when EF Core builds the query from the context. I am not sure how to write an equivalent LINQ operation, compatible with EF Core, that is distinct on two properties.
Is there a way to do this? I want the query to run on the SQL Server side.

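Try GroupBy. SQL does not guarantee that an ordering applied before the grouping carries into each group, so order inside the group before taking the first row, which EF Core 6 and later should be able to translate:
var result = Logs.Where(x => x.ProjectCode.Equals(projectCode))
.GroupBy(y => new { y.GeographyCode, y.DataSetId })
.Select(g => g.OrderByDescending(x => x.SubmittedDate).First())
.ToList();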
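
To check what a "latest row per key" query is expected to return, here is a minimal in-memory sketch using LINQ to Objects. The sample rows and values are hypothetical, and it runs on an array rather than a DbSet, so it only illustrates the result shape and says nothing about SQL translation. The ordering is applied inside each group so that First() picks the newest row.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows standing in for the Logs table.
var logs = new[]
{
    (ProjectCode: "P1", GeographyCode: "US", DataSetId: 1, SubmittedDate: new DateTime(2023, 1, 1)),
    (ProjectCode: "P1", GeographyCode: "US", DataSetId: 1, SubmittedDate: new DateTime(2023, 3, 1)),
    (ProjectCode: "P1", GeographyCode: "DE", DataSetId: 2, SubmittedDate: new DateTime(2023, 2, 1)),
    (ProjectCode: "P2", GeographyCode: "US", DataSetId: 1, SubmittedDate: new DateTime(2023, 4, 1)),
};

var projectCode = "P1";

// Group on the two-property key, then keep the newest row of each group.
var latestPerKey = logs
    .Where(x => x.ProjectCode == projectCode)
    .GroupBy(x => new { x.GeographyCode, x.DataSetId })
    .Select(g => g.OrderByDescending(x => x.SubmittedDate).First())
    .ToList();

foreach (var row in latestPerKey)
{
    Console.WriteLine($"{row.GeographyCode}/{row.DataSetId}: {row.SubmittedDate:yyyy-MM-dd}");
}
// Prints:
// US/1: 2023-03-01
// DE/2: 2023-02-01
```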

Related

EF Core Linq-to-Sql GroupBy SelectMany not working with SQL Server

I am trying the following Linq with LinqPad connecting to SQL Server with EF Core:
MyTable.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x)
I get this error:
The LINQ expression 'x => x' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explic...
However, this works:
MyTable.AsEnumerable()
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x)
I was under the impression that EF Core should be able to translate such an expression.
Am I doing anything wrong?
That exception message is an EF Core message, not EF6.
In EF 6 your expression should work, though with something like a ToList() on the end. I suspect the error you are encountering is that you may be trying to do something more prior to materializing the collection, and that is conflicting with the group by SelectMany evaluation.
For instance, something like this EF might take exception to:
var results = MyTable
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x)
.Select(x => new ViewModel { Id = x.Id, Name = x.Name} )
.ToList();
where something like this should work:
var results = MyTable
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x.Select(y => new ViewModel { Id = y.Id, Name = y.Name} ))
.ToList();
You don't want to use:
MyTable.AsEnumerable(). ...
because this materializes your entire table in memory. That might be OK if the table is guaranteed to stay relatively small, but performance degrades steadily as the production data grows.
Edit: Did a bit of digging, credit to this post as it does look like another limitation in EF Core's parser. (No idea how something that works in EF6 cannot be successfully integrated into EF Core... Reinventing wheels I guess)
This should work:
var results = MyTable
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.Select(x => x.Key)
.SelectMany(x => _context.MyTable.Where(y => y.SomeField == x))
.ToList();
So for example where I had a Parent and Child table where I wanted to group by ParentId, take the top 5 parents and select all of their children:
var results = context.Children
.GroupBy(x => x.ParentId)
.OrderBy(x => x.Key) // ParentId
.Take(5)
.Select(x => x.Key) // Select the top 5 parent ID
.SelectMany(x => context.Children.Where(c => c.ParentId == x)).ToList();
EF pieces this back together by doing a SelectMany back on the DbSet against the selected group IDs.
Credit to the discussions here: How to select top N rows for each group in a Entity Framework GroupBy with EF 3.1
Edit 2: The more I look at this, the more hacky it feels. Another alternative would be to look at breaking it up into two simpler queries:
var keys = MyTable.Select(x => x.SomeField)
.Distinct() // without this, Take(5) could return the same key several times
.OrderBy(x => x)
.Take(5)
.ToList();
var results = MyTable.Where(x => keys.Contains(x.SomeField))
.ToList();
I think that matches your original example. The gist is to select the applicable ID or discriminating keys first, then query for the desired data using those keys. So for my example of all children of the first 5 parents that have children:
var parentIds = context.Children
.Select(x => x.ParentId)
.Distinct() // one entry per parent, so Take(5) yields 5 different parents
.OrderBy(x => x)
.Take(5)
.ToList();
var children = context.Children
.Where(x => parentIds.Contains(x.ParentId))
.ToList();
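
The two-step idea can be tried out in memory first. The following is a minimal LINQ to Objects sketch with hypothetical (Id, ParentId) rows, taking 2 parents instead of 5 to keep the output short. One detail worth checking in the key query is whether duplicates are removed before Take: the sketch runs the key selection with and without Distinct() to show the difference.

```csharp
using System;
using System.Linq;

// Hypothetical child rows: (Id, ParentId).
var children = new[]
{
    (Id: 1, ParentId: 10), (Id: 2, ParentId: 10),
    (Id: 3, ParentId: 20),
    (Id: 4, ParentId: 30), (Id: 5, ParentId: 30),
    (Id: 6, ParentId: 40),
};

// Step 1: distinct keys, ordered, limited to the first 2 parents.
var parentIds = children
    .Select(c => c.ParentId)
    .Distinct()
    .OrderBy(id => id)
    .Take(2)
    .ToList();

// Same key query without Distinct(): parent 10 fills both slots.
var parentIdsWithDuplicates = children
    .Select(c => c.ParentId)
    .OrderBy(id => id)
    .Take(2)
    .ToList();

// Step 2: fetch every row whose key is in the selected set.
var selected = children
    .Where(c => parentIds.Contains(c.ParentId))
    .ToList();

Console.WriteLine(string.Join(", ", parentIds));               // 10, 20
Console.WriteLine(string.Join(", ", parentIdsWithDuplicates)); // 10, 10
Console.WriteLine(string.Join(", ", selected.Select(c => c.Id))); // 1, 2, 3
```

Against a database, the second step typically becomes a WHERE ... IN (...) over the key list, so this approach suits small key sets.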
EF Core has a limitation with this kind of query, which was fixed in EF Core 6. The limitation comes from SQL itself: there is no direct SQL translation for a GroupBy that returns the grouped rows.
When translating this GroupBy, EF Core 6 generates a query equivalent to the following:
var results = _context.MyTable
.Select(x => new { x.SomeField })
.Distinct()
.OrderBy(x => x.SomeField)
.Take(5)
.SelectMany(x => _context.MyTable.Where(y => y.SomeField == x.SomeField))
.ToList();
This is not the optimal query for the task, because in SQL it can be expressed with the ROW_NUMBER() window function partitioned on SomeField, which avoids the additional JOIN.
Also check this function, which builds such a query automatically:
_context.MyTable.TakeDistinct(5, x => x.SomeField);

How to repair LINQ query? (not work on EF Core 3.0)

I have moved my application from .NET Core 2.2 to 3.0. I've solved problems big and small, but the LINQ queries I wrote in the service layer are failing.
For example, this method (and its LINQ query) does not work:
public async Task<IList<PersonnelDto>> PersonnelChart(short year)
{
var result = await uow.Repository<Personnel>().Query()
.Where(x => !x.IsOpen)
.Include(i => i.CurrentMoney)
.GroupBy(gp => new { id = gp.CurrencyType})
.Select(s => new PersonnelChartDto
{
PersonnelId = s.Key.id,
PersonnelCode = s.Key.code,
PersonnelName = s.Key.name,
PersonnelCount = s.Count(),
SavingTotal = Math.Round(s.Sum(sm => sm.CurrentMoney
.Where(x => !x.IsDeleted)
.Select(ss => ss.SavingTotal)
.DefaultIfEmpty(0)
.Sum()), 2),
})
.ToListAsync();
return result;
}
When this method is called, EF Core throws this error:
System.InvalidOperationException: Processing of the LINQ expression 'AsQueryable((Unhandled parameter: sm).CurrentMoney)' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core.
So, how can I fix this error?

Alternative to GroupBy FirstOrDefault for EF Core 2.2

In my project I am currently having an issue with the following LINQ query:
context.WebPages.GroupBy(x => x.Url).Select(x => new { x.Key, x.FirstOrDefault()?.LogoId } ).ToList();
Basically, I am trying to get web pages from our DB, distinct by URL, along with the first logo ID assigned to each. However, I am struggling with the warning: The LINQ expression could not be translated and will be evaluated locally. Since I am storing a couple of million web pages, I don't want to load unnecessary data, and I definitely don't want the expression to be evaluated locally.
I have tried several optimizations of the LINQ expressions, e.g.:
context.WebPages.GroupBy(x => x.Url, x => new {x.LogoId}).Select(x => new { x.Key, x.FirstOrDefault()?.LogoId } ).ToList();
context.WebPages.GroupBy(x => x.Url).Select(x => new { x.Key, x.First().LogoId } ).ToList();
context.WebPages.GroupBy(x => x.Url).Select(x => x.First()).ToList();
context.WebPages.GroupBy(x => x.Url).ToList();
but I always ended up with the same warning. The only query that could be translated (but is useless to me), was:
context.WebPages.GroupBy(x => x.Url).Select(x => x.Key).ToList();
Is there any alternative LINQ expression that could work (or even a set of LINQ expressions)? Or do I need to use plain SQL expression?
Side note: We are also planning to move to .NET Core 3.0, but that is a couple of months away, and I cannot wait until then.
How about if we try an exclusion join? Assuming LogoId is a comparable type (e.g. int):
var ans = from a in context.WebPages
where !(from b in context.WebPages where a.Url == b.Url && a.LogoId > b.LogoId select b).Any()
select new {
a.Url,
a.LogoId
};
Update: I tested this with EF 2.2.6 and it generated (possibly inefficient) SQL fine.
Update 2: I also tested with EF 3 and it worked. My earlier Distinct/GroupJoin failed SQL translation in EF 3.
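
To see what the exclusion join selects, here is a minimal in-memory sketch with hypothetical (Url, LogoId) rows, run with LINQ to Objects rather than against a database. A row survives only if no other row with the same Url has a smaller LogoId, so the result is the lowest LogoId per Url, which is not necessarily the "first" row in insertion order.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for context.WebPages.
var pages = new[]
{
    (Url: "a.com", LogoId: 7),
    (Url: "a.com", LogoId: 3),
    (Url: "b.com", LogoId: 5),
};

// Keep a row only when no row with the same Url has a smaller LogoId.
var ans = (from a in pages
           where !(from b in pages
                   where a.Url == b.Url && a.LogoId > b.LogoId
                   select b).Any()
           select new { a.Url, a.LogoId }).ToList();

foreach (var row in ans)
{
    Console.WriteLine($"{row.Url}: {row.LogoId}");
}
// Prints:
// a.com: 3
// b.com: 5
```

If two rows share both the Url and the lowest LogoId, both survive the filter, so a Distinct() on the projection may be needed when duplicates are possible.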

EF Core 3.0 .Include does not work as expected and Super Slow

I have a LINQ query like this that works in EF Core 2.0, but after upgrading to EF Core 3.0 it always times out. I traced the issue to query = query.Include(x => x.Questions);.
My question: I would like to return the course with filtered questions, for example only Take(10), or with a Where condition that returns only a certain range rather than all questions.
var query = _courseRepository.Table;
query = query.Where(x => x.Id == id);
query = query.Include(x => x.Questions);
query = query.Include(x => x.CourseYear);
query = query.Include(x => x.CourseSubject);
query = query.Include(x => x.Instructors).ThenInclude(y => y.User);
query = query.Include(x => x.Instructors).ThenInclude(y => y.Course);
query = query.Include(x => x.Instructors).ThenInclude(y => y.CourseClass);
query = query.Include(x => x.CourseSections);
query = query.Include(x => x.CourseSections).ThenInclude(y => y.Lessons);
query = query.Include(x => x.CourseClasses);
query = query.Include(x => x.UserCourses).ThenInclude(y => y.User);
var result = query.FirstOrDefault();
EF Core 3.0 changed the queries generated by .Include(), and you are experiencing the cartesian explosion problem.
Specifically, the docs now carry the following red caution:
Caution
Since version 3.0.0, each Include will cause an additional JOIN to be
added to SQL queries produced by relational providers, whereas
previous versions generated additional SQL queries. This can
significantly change the performance of your queries, for better or
worse. In particular, LINQ queries with an exceedingly high number of
Include operators may need to be broken down into multiple separate
LINQ queries in order to avoid the cartesian explosion problem.
The solution, per the docs, is now to execute multiple queries.
It's very unfortunate that loading entity graphs, common with highly normalized data, performs so poorly, but that is the current state of EF.
See: Loading Related Data and scroll until you see red.
var query = _courseRepository.Table
.Where(x => x.Id == id); // filter first so the follow-up loads below only touch this course
var course = await query
.Include(x => x.Questions)
.Include(x => x.CourseClasses)
.Include(x => x.CourseYear)
.Include(x => x.CourseSubject)
.FirstOrDefaultAsync();
query.Include(x => x.Instructors).ThenInclude(y => y.User).SelectMany(a => a.Instructors).Load();
query.Include(x => x.Instructors).ThenInclude(y => y.Course).SelectMany(a => a.Instructors).Load();
query.Include(x => x.Instructors).ThenInclude(y => y.CourseClass).SelectMany(a => a.Instructors).Load();
query.Include(x => x.CourseSections).ThenInclude(y => y.Lessons).SelectMany(a => a.CourseSections).Load();
query.Include(x => x.UserCourses).ThenInclude(y => y.User).SelectMany(a => a.UserCourses).Load();
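
A rough back-of-the-envelope sketch of why the single-query form can time out: when several sibling collections are joined in one SQL query, the number of returned rows is roughly the product of the collection sizes, whereas one query per collection returns roughly their sum. The sizes below are hypothetical, not taken from the question, and the estimate assumes every collection is non-empty.

```csharp
using System;

// Hypothetical collection sizes for a single course.
long questions = 50;
long instructors = 3;
long lessons = 40;        // total lessons across all course sections
long courseClasses = 4;
long userCourses = 100;

// One query with a JOIN per collection: rows multiply.
long singleQueryRows = questions * instructors * lessons * courseClasses * userCourses;

// One query per collection: rows add up.
long separateQueryRows = questions + instructors + lessons + courseClasses + userCourses;

Console.WriteLine($"single query:     {singleQueryRows} rows");   // 2400000
Console.WriteLine($"separate queries: {separateQueryRows} rows"); // 197
```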

EF Core Mysql performance

I have a MySQL database with ~1,500,000 entities. When I execute the statement below using EF Core 1.1 and MySql.Data.EntityFrameworkCore 7.0.7-m61, it takes about 40 minutes to finish:
var results = db.Posts
.Include(u => u.User)
.GroupBy(g => g.User)
.Select(g => new { Nick = g.Key.Name, Count = g.Count() })
.OrderByDescending(e => e.Count)
.ToList();
On the other hand, running the statement below in a local mysql-cli takes around 16 seconds:
SELECT user.Name, count(*) c
FROM post
JOIN user ON post.UserId = user.Id
GROUP BY user.Name
ORDER BY c DESC
Am I doing something wrong, or is EF Core's MySQL performance really that bad?
Your queries are doing different things. Some issues in your LINQ-to-Entities query:
You call Include(...) which will eagerly load the User for every item in db.Posts.
You call Count() for each record in each group. This could be rewritten to count the records only once per group.
The biggest issue is that you're only using the Name property of the User object. You could select just this field and find the same result. Selecting, grouping, and returning 1.5 million strings should be a fast operation in EF.
Original:
var results =
db.Posts
.Include(u => u.User)
.GroupBy(g => g.User)
.Select(g => new { Nick = g.Key.Name, Count = g.Count() })
.OrderByDescending(e => e.Count)
.ToList();
Suggestion:
var results =
db.Posts
.Select(x => x.User.Name)
.GroupBy(x => x)
.Select(x => new { Name = x.Key, Count = x.Count() })
.OrderByDescending(x => x.Count)
.ToList();
If EF Core still has restrictions on the types of grouping statements it allows, you could call ToList after the first Select(...) statement, so that the grouping runs in memory on just the names.
