EF Core Mysql performance - c#

I have Mysql database with ~1 500 000 entities. When I try to execute below statement using EF Core 1.1 and Mysql.Data.EntityFrameworkCore 7.0.7-m61 it takes about 40minutes to finish:
var results = db.Posts
.Include(u => u.User)
.GroupBy(g => g.User)
.Select(g => new { Nick = g.Key.Name, Count = g.Count() })
.OrderByDescending(e => e.Count)
.ToList();
On the other hand using local mysql-cli and below statement, takes around 16 seconds to complete.
SELECT user.Name, count(*) c
FROM post
JOIN user ON post.UserId = user.Id
GROUP BY user.Name
ORDER BY c DESC
Am i doing something wrong, or EF Core performance of MySql is so terrible?

Your queries are doing different things. Some issues in your LINQ-to-Entities query:
You call Include(...) which will eagerly load the User for every item in db.Posts.
You call Count() for each record in each group. This could be rewritten to count the records only once per group.
The biggest issue is that you're only using the Name property of the User object. You could select just this field and find the same result. Selecting, grouping, and returning 1.5 million strings should be a fast operation in EF.
Original:
var results =
db.Posts
.Include(u => u.User)
.GroupBy(g => g.User)
.Select(g => new { Nick = g.Key.Name, Count = g.Count() })
.OrderByDescending(e => e.Count)
.ToList();
Suggestion:
var results =
db.Posts
.Select(x => x.User.Name)
.GroupBy(x => x)
.Select(x => new { Name = x.Key, Count = x.Count() })
.OrderByDescending(x => x.Count)
.ToList();
If EF core still has restrictions on the types of grouping statements it allows, you could call ToList after the first Select(...) statement.

Related

EF Core Linq-to-Sql GroupBy SelectMany not working with SQL Server

I am trying the following Linq with LinqPad connecting to SQL Server with EF Core:
MyTable.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x)
I get this error:
The LINQ expression 'x => x' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explic...
However, this works:
MyTable.AsEnumerable()
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x)
I was under the impression that EF Core should be able to translate such an expression.
Am I doing anything wrong?
That exception message is an EF Core message, not EF6.
In EF 6 your expression should work, though with something like a ToList() on the end. I suspect the error you are encountering is that you may be trying to do something more prior to materializing the collection, and that is conflicting with the group by SelectMany evaluation.
For instance, something like this EF might take exception to:
var results = MyTable
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x)
.Select(x => new ViewModel { Id = x.Id, Name = x.Name} )
.ToList();
where something like this should work:
var results = MyTable
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.SelectMany(x => x.Select(y => new ViewModel { Id = y.Id, Name = y.Name} ))
.ToList();
You don't want to use:
MyTable.AsEnumerable(). ...
As this is materializing your entire table into memory, which might be ok if the table is guaranteed to remain relatively small, but if the production system grows significantly it forms a cascading performance decline over time.
Edit: Did a bit of digging, credit to this post as it does look like another limitation in EF Core's parser. (No idea how something that works in EF6 cannot be successfully integrated into EF Core... Reinventing wheels I guess)
This should work:
var results = MyTable
.GroupBy(x => x.SomeField)
.OrderBy(x => x.Key)
.Take(5)
.Select(x => x.Key)
.SelectMany(x => _context.MyTable.Where(y => y.Key == x))
.ToList();
So for example where I had a Parent and Child table where I wanted to group by ParentId, take the top 5 parents and select all of their children:
var results = context.Children
.GroupBy(x => x.ParentId)
.OrderBy(x => x.Key) // ParentId
.Take(5)
.Select(x => x.Key) // Select the top 5 parent ID
.SelectMany(x => context.Children.Where(c => c.ParentId == x)).ToList();
EF pieces this back together by doing a SelectMany back on the DbSet against the selected group IDs.
Credit to the discussions here: How to select top N rows for each group in a Entity Framework GroupBy with EF 3.1
Edit 2: The more I look at this, the more hacky it feels. Another alternative would be to look at breaking it up into two simpler queries:
var keys = MyTable.OrderBy(x => x.SomeField)
.Select(x => x.SomeField)
.Take(5)
.ToList();
var results = MyTable.Where(x => keys.Contains(x.SomeField))
.ToList();
I think that translates your original example, but the gist is to select the applicable ID/Discriminating keys first, then query for the desired data using those keys. So in the case of my All children from the first 5 parents that have children:
var parentIds = context.Children
.Select(x => x.ParentId)
.OrderBy(x => x)
.Take(5)
.ToList();
var children = context.Children
.Where(x => parentIds.Contains(x.ParentId))
.ToList();
EF Core has limitation for such query, which is fixed in EF Core 6. This is SQL limitation and there is no direct translation to SQL for such GroupBy.
EF Core 6 is creating the following query when translating this GroupBy.
var results = var results = _context.MyTable
.Select(x => new { x.SomeField })
.Distinct()
.OrderBy(x => x.SomeField)
.Take(5)
.SelectMany(x => _context.MyTable.Where(y => y.SomeField == x.SomeField))
.ToList();
It is not most optimal query for such task, because in SQL it can be expressed by Window Function ROW_NUMBER() with PARTITION on SomeField and additional JOIN can be omitted.
Also check this function, which makes such query automatically.
_context.MyTable.TakeDistinct(5, x => x.SomeField);

EF Core 2.0 Group By other properties

I have 2 tables:
USERS
UserId
Name
Scores (collection of table Scores)
SCORES
UserId
CategoryId
Points
I need to show all the users and a SUM of their points, but also I need to show the name of the user. It can be filtered by CategoryId or not.
Context.Scores
.Where(p => p.CategoryId == categoryId) * OPTIONAL
.GroupBy(p => p.UserId)
.Select(p => new
{
UserId = p.Key,
Points = p.Sum(s => s.Points),
Name = p.Select(s => s.User.Name).FirstOrDefault()
}).OrderBy(p => p.Points).ToList();
The problem is that when I add the
Name = p.Select(s => s.User.Name).FirstOrDefault()
It takes so long. I don't know how to access the properties that are not inside the GroupBy or are a SUM. This example is very simple becaouse I don't have only the Name, but also other properties from User table.
How can I solve this?
It takes so long because the query is causing client evaluation. See Client evaluation performance issues and how to use Client evaluation logging to identify related issues.
If you are really on EF Core 2.0, there is nothing you can do than upgrading to v2.1 which contains improved LINQ GroupBy translation. Even with it the solution is not straight forward - the query still uses client evaluation. But it could be rewritten by separating the GroupBy part into subquery and joining it to the Users table to get the additional information needed.
Something like this:
var scores = db.Scores.AsQueryable();
// Optional
// scores = scores.Where(p => p.CategoryId == categoryId);
var points = scores
.GroupBy(s => s.UserId)
.Select(g => new
{
UserId = g.Key,
Points = g.Sum(s => s.Points),
});
var result = db.Users
.Join(points, u => u.UserId, p => p.UserId, (u, p) => new
{
u.UserId,
u.Name,
p.Points
})
.OrderBy(p => p.Points)
.ToList();
This still produces a warning
The LINQ expression 'orderby [p].Points asc' could not be translated and will be evaluated locally.
but at least the query is translated and executes as single SQL:
SELECT [t].[UserId], [t].[Points], [u].[UserId] AS [UserId0], [u].[Name]
FROM [Users] AS [u]
INNER JOIN (
SELECT [s].[UserId], SUM([s].[Points]) AS [Points]
FROM [Scores] AS [s]
GROUP BY [s].[UserId]
) AS [t] ON [u].[UserId] = [t].[UserId]

Linq - Group By Id, Order By and Then select top 5 of each grouping

Is there some way in linq group By Id, Order By descending and then select top 5 of each grouping? Right now I have some code shown below, but I used .Take(5) and it obviously selects the top 5 regardless of grouping.
Items = list.GroupBy(x => x.Id)
.Select(x => x.OrderByDescending(y => y.Value))
.Select(y => new Home.SubModels.Item {
Name= y.FirstOrDefault().Name,
Value = y.FirstOrDefault().Value,
Id = y.FirstOrDefault().Id
})
You are almost there. Use Take in the Select statement:
var items = list.GroupBy(x => x.Id)
//For each IGrouping - order nested items and take 5 of them
.Select(x => x.OrderByDescending(y => y.Value).Take(5))
This will return an IEnumerable<IEnumerable<T>>. If you want it flattened replace Select with SelectMany

SQL Azure vs. On-Premises Timeout Issue - EF

I'm working on a report right now that runs great with our on-premises DB (just refreshed from PROD). However, when I deploy the site to Azure, I get a SQL Timeout during its execution. If I point my development instance at the SQL Azure instance, I get a timeout as well.
Goal: To output a list of customers that have had an activity created during the search range, and when that customer is found, get some other information about that customer regarding policies, etc. I've removed some of the properties below for brevity (as best I can)...
UPDATE
After lots of trial and error, I can get the entire query to run fairly consistently within 1000MS so long as this block of code is not executed.
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Status.Name)
.FirstOrDefault(),
With this code in place, things begin to go haywire. I think this Where clause is a big part of it: .Where(b => b.ActivityType.IsReportable). What is the best way to grab the status name?
EXISTING CODE
Any thoughts as to why SQL Azure would timeout whereas on-premises would turn this around in less than 100MS?
return db.Customers
.Where(a => a.Activities.Where(
b => b.CreatedDateTime >= search.BeginDateCreated
&& b.CreatedDateTime <= search.EndDateCreated).Count() > 0)
.Where(a => a.CustomerGroup.Any(d => d.GroupId== search.GroupId))
.Select(a => new CustomCustomerReport
{
CustomerId = a.Id,
Manager = a.Manager.Name,
Customer = a.FirstName + " " + a.LastName,
ContactSource= a.ContactSource!= null ? a.ContactSource.Name : "Unknown",
ContactDate = a.DateCreated,
NewSale = a.Sales
.Where(p => p.Employee.IsActive)
.OrderByDescending(p => p.DateCreated)
.Select(p => new PolicyViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
ExistingSale = a.Sales
.Where(p => p.CancellationDate == null || p.CancellationDate <= myDate)
.Where(p => p.SaleDate < myDate)
.OrderByDescending(p => p.DateCreated)
.Select(p => new SalesViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Disposition.Name)
.FirstOrDefault(),
CustomerGroup = a.CustomerGroup
.Where(cd => cd.GroupId == search.GroupId)
.Select(cd => new GroupViewModel
{
//MISC PROPERTIES
}).FirstOrDefault()
}).ToList();
I cannot give you a definite answer but I would recommend approaching the problem by:
Run SQL profiler locally when this code is executed and see what SQL is generated and run. Look at the query execution plan for each query and look for table scans and other slow operations. Add indexes as needed.
Check your lambdas for things that cannot be easily translated into SQL. You might be pulling the contents of a table into memory and running lambdas on the results, which will be very slow. Change your lambdas or consider writing raw SQL.
Is the Azure database the same as your local database? If not, pull the data locally so your local system is indicative.
Remove sections (i.e. CustomerGroup then CurrentDisposition then ExistingSale then NewSale) and see if there is a significant performance improvement after removing the last section. Focus on the last removed section.
Looking at the line itself:
You use ".Count() > 0" on line 4. Use ".Any()" instead, since the former goes through every row in the database to get you an accurate count when you just want to know if at least one row satisfies the requirements.
Ensure fields referenced in where clauses have indexes, such as IsReportable.
Short answer: use memory.
Long answer:
Because of either bad maintenance plans or limited hardware, running this query in one big lump is what's causing it to fail on Azure. Even if that weren't the case, because of all the navigation properties you're using, this query would generate a staggering number of joins. The answer here is to break it down in smaller pieces that Azure can run. I'm going to try to rewrite your query into multiple smaller, easier to digest queries that use the memory of your .NET application. Please bear with me as I make (more or less) educated guesses about your business logic/db schema and rewrite the query accordingly. Sorry for using the query form of LINQ but I find things such as join and group by are more readable in that form.
var activityFilterCustomerIds = db.Activities
.Where(a =>
a.CreatedDateTime >= search.BeginDateCreated &&
a.CreatedDateTime <= search.EndDateCreated)
.Select(a => a.CustomerId)
.Distinct()
.ToList();
var groupFilterCustomerIds = db.CustomerGroup
.Where(g => g.GroupId = search.GroupId)
.Select(g => g.CustomerId)
.Distinct()
.ToList();
var customers = db.Customers
.AsNoTracking()
.Where(c =>
activityFilterCustomerIds.Contains(c.Id) &&
groupFilterCustomerIds.Contains(c.Id))
.ToList();
var customerIds = customers.Select(x => x.Id).ToList();
var newSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& s.Employee.IsActive
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(new PolicyViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var existingSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& (s.CancellationDate == null || s.CancellationDate <= myDate)
&& s.SaleDate < myDate
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(new SalesViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var currentStatuses =
(from a in db.Activities.AsNoTracking()
where customerIds.Contains(a.CustomerId)
&& a.ActivityType.IsReportable
group a by a.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Status = grouped
.OrderByDescending(x => x.DueDateTime)
.Select(x => x.Disposition.Name)
.FirstOrDefault()
}).ToList();
var customerGroups =
(from cg in db.CustomerGroups
where cg.GroupId == search.GroupId
group cg by cg.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Group = grouped
.Select(x =>
new GroupViewModel
{
// ...
})
.FirstOrDefault()
}).ToList();
return customers
.Select(c =>
new CustomCustomerReport
{
// ... simple props
// ...
// ...
NewSale = newSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
ExistingSale = existingSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
CurrentStatus = currentStatuses
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Status)
.FirstOrDefault(),
CustomerGroup = customerGroups
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Group)
.FirstOrDefault(),
})
.ToList();
Hard to suggest anything without seeing actual table definitions, espectially the indexes and foreign keys on Activities entity.
As far I understand Activity (CustomerId, ActivityTypeId, DueDateTime, DispositionId). If this is standard warehousing table (DateTime, ClientId, Activity), I'd suggest the following:
If number of Activities is reasonably small, then force the use of CONTAINS by
var activities = db.Activities.Where( x => x.IsReportable ).ToList();
...
.Where( b => activities.Contains(b.Activity) )
You can even help the optimiser by specifying that you want ActivityId.
Indexes on Activitiy entity should be up to date. For this particular query I suggest (CustomerId, ActivityId, DueDateTime DESC)
precache Disposition table, my crystal ball tells me that it's dictionary table.
For similar task to avoid constantly hitting Activity table I made another small table (CustomerId, LastActivity, LastVAlue) and updated it as the status changed.

How to create query of queries using LINQ?

I'm trying to convert query of queries used in ColdFusion to LINQ and C#. The data come from data files, rather than from the database.
I converted the first query, but have no clue as to
how to use it to query the second query.
how to include count(PDate) as DayCount in the second query.
Below is the code using query of queries in ColdFusion:
First query
<cfquery name="qSorted" dbtype = "query">
SELECT OA, CD,PDate,
FROM dataQuery
GROUP BY CD,OA,PDate,
</cfquery>
Second query
<cfquery name="qDayCount" dbtype = "query">
SELECT OA, CD, count(PDate) as DayCount
FROM qSorted // qSorted is from the first query.
GROUP BY
OA, CD
ORDER BY
OA, CD
</cfquery>
Here's the first converted LINQ query, and it works fine:
var Rows = allData.SelectMany(u => u._rows.Select(t => new
{
OA = t[4],
CD = t[5],
PDate = t[0]
}))
.GroupBy(x => new { x.CD, x.OA, x.PDate })
.Select(g => new
{
g.Key.OA,
g.Key.CD,
g.Key.PDate
})
.ToList();
Here's the pseudo-code for the second LINQ query, which I need your assistance:
var RowsDayCount = Rows //Is this correct? If not, how to do it?
.GroupBy(x => new { x.OA, x.PDate, x.CD, })
.Select(g => new
{
g.Key.OA,
g.Key.CD,
g.Key.PDate,//PDate should be PDate.Distinct().Count() asDayCount
// See DayCount in cfquery name="qDayCount" above.
})
.OrderBy(u => u.OA)
.ThenBy(u => u.CD)
.ToList();
Your second query origionally wasn't grouping on PDate, but your translation is. That's wrong. If you want to count the number of PDates for each OA/CD pair, you need to not group on PDate. Once you've made that change, you can modify the Select to pull out all of the PDate values from the group, and count the distinct values.
.GroupBy(x => new { x.OA, x.CD, })
.Select(g => new
{
g.Key.OA,
g.Key.CD,
DayCount = g.Select(item => item.PDate).Distinct().Count(),
})

Categories