Get Distinct Entries Based On Specific Column in Entity Framework - c#

I have a SQLite table that contains every test result we've run, and I'm looking to write an Entity Framework query that returns only the most recent test result per project.
Normally, I'd assume this would be "group by project id, return row with the max updated value," or, alternatively, "sort by date and return first". However, when I try the query, I keep getting Entity Framework "could not be translated" errors.
Here's what I've tried:
results = await _context.Results
.Include(x => x.Project)
.AsNoTracking()
.OrderByDescending(x => x.Updated)
.GroupBy(x => x.ProjectId, (x, y) => y.First())
.ToListAsync();
However, I keep receiving errors saying that the .First() call could not be translated by Entity Framework. Is there something I'm missing (or, alternatively, a better way to write the query that is more Entity Framework-friendly)?
For reference, here's the operation I'm trying to do in normal SQL: https://thoughtbot.com/blog/ordering-within-a-sql-group-by-clause
I'd prefer to do as much of the work on the server as possible, because there are only a small number of projects but there could be thousands of results, and I'd rather not do client-side filtering if possible.
The application is written for ASP.NET Core 3.1, using Entity Framework Core.
Minor edit: While SQLite is being used for development, the final code will run against SQL Server, hence the desire to do processing server-side.

Try with a subquery instead of a grouping. Like this:
results = await _context.Results
.Include(x => x.Project)
.AsNoTracking()
.Where(r => r.Id == _context.Results.Where(rr => rr.ProjectId == r.ProjectId).Max(rr => rr.Id))
.ToListAsync();
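Here is a minimal LINQ to Objects sketch of what that correlated subquery selects. It uses a hypothetical in-memory list of (Id, ProjectId, Updated) tuples in place of the Results table. Note that Max(rr => rr.Id) picks the latest result only if Ids are assigned in the same order as Updated. The second variant compares Updated instead, which can return more than one row per project if two results share the same timestamp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the Results table.
// Here Ids increase over time, which the Max(Id) version relies on.
var results = new List<(int Id, int ProjectId, DateTime Updated)>
{
    (1, 10, new DateTime(2020, 1, 1)),
    (2, 20, new DateTime(2020, 1, 2)),
    (3, 10, new DateTime(2020, 1, 3)),
    (4, 20, new DateTime(2020, 1, 4)),
    (5, 10, new DateTime(2020, 1, 5)),
};

// Keep a row only if its Id is the largest Id among rows with the same ProjectId.
var latest = results
    .Where(r => r.Id == results.Where(rr => rr.ProjectId == r.ProjectId).Max(rr => rr.Id))
    .ToList();

// Variant: compare timestamps instead of Ids (ties would yield several rows).
var latestByDate = results
    .Where(r => r.Updated == results.Where(rr => rr.ProjectId == r.ProjectId).Max(rr => rr.Updated))
    .ToList();

foreach (var r in latest)
    Console.WriteLine($"Project {r.ProjectId}: result {r.Id} ({r.Updated:yyyy-MM-dd})");
```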

Your query couldn't be translated to SQL because LINQ to Entities doesn't recognize that GroupBy/First() pattern. You can modify the code as below, adding AsEnumerable() after AsNoTracking():
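results = _context.Results.Include(x => x.Project).AsNoTracking()
.AsEnumerable() // everything after this runs in memory (LINQ to Objects)
.OrderByDescending(x => x.Updated)
.GroupBy(x => x.ProjectId, (x, y) => y.First())
.ToList(); // synchronous: ToListAsync() is not available on IEnumerable<T>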
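With AsEnumerable(), the data is loaded from the database and every operation after it is performed by LINQ to Objects on the data already in memory. This means the whole Results table is transferred to the client, which the question was hoping to avoid.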
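The following small LINQ to Objects sketch, using hypothetical data, shows why the grouping works once the rows are in memory. Enumerable.GroupBy keeps the elements within each group in their source order. Sorting newest-first before grouping therefore makes First() return each project's latest result. The sample Ids are deliberately out of date order to show that this approach does not depend on Id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows, as if already materialized after AsEnumerable().
var rows = new List<(int Id, int ProjectId, DateTime Updated)>
{
    (1, 10, new DateTime(2020, 1, 1)),
    (2, 20, new DateTime(2020, 1, 4)),
    (3, 10, new DateTime(2020, 1, 3)),
    (4, 20, new DateTime(2020, 1, 2)),
};

// Sort newest-first; GroupBy preserves that order inside each group,
// so First() of every group is that project's most recent row.
var latest = rows
    .OrderByDescending(x => x.Updated)
    .GroupBy(x => x.ProjectId, (key, group) => group.First())
    .ToList();

foreach (var r in latest)
    Console.WriteLine($"Project {r.ProjectId}: result {r.Id}");
```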

While this isn't portable, here's how this can be done using SQLite-compatible SQL and Entity Framework:
results = await _context.Results
.FromSqlRaw("SELECT Results.* FROM (SELECT Id, ProjectId, MAX(Updated) as Updated " +
"FROM Results GROUP BY ProjectId) as latest_results " +
"INNER JOIN Results ON Results.Id = latest_results.Id")
.Include(x => x.Project) //not required for question but useful
.OrderBy(x => x.Project.Name)
.AsNoTracking()
.ToListAsync();
If someone has a way to do this in pure LINQ/EF, but still perform the query server-side, I'll happily mark that as the answer, since this is dependent on the exact SQL dialect used.
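The derived table in that SQL selects a bare Id next to MAX(Updated). SQLite fills such a bare column from the row that holds the maximum. SQL Server rejects columns that are neither grouped nor aggregated, which is why the query is not portable.

A more portable shape groups to get MAX(Updated) per project and then joins back on (ProjectId, Updated). The sketch below shows that shape in LINQ to Objects over hypothetical tuples. It makes no claim about how EF Core 3.1 would translate it. If two results tie on Updated, both would be returned for that project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the Results table.
var results = new List<(int Id, int ProjectId, DateTime Updated)>
{
    (1, 10, new DateTime(2020, 1, 1)),
    (2, 20, new DateTime(2020, 1, 4)),
    (3, 10, new DateTime(2020, 1, 3)),
    (4, 20, new DateTime(2020, 1, 2)),
};

// Step 1: the "latest_results" derived table, keeping only grouped
// or aggregated columns (ProjectId and MAX(Updated)).
var latestPerProject = results
    .GroupBy(r => r.ProjectId)
    .Select(g => new { ProjectId = g.Key, Updated = g.Max(r => r.Updated) });

// Step 2: join back to the full rows on (ProjectId, Updated) rather than Id.
var latest = (from r in results
              join l in latestPerProject
                  on new { r.ProjectId, r.Updated } equals new { l.ProjectId, l.Updated }
              orderby r.ProjectId
              select r).ToList();

foreach (var r in latest)
    Console.WriteLine($"Project {r.ProjectId}: result {r.Id}");
```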

Related

Entity Framework Core : LINQ advise needed on better approach using include for relational tables

I have a question about Entity Framework Core and LINQ. I would like to get the details from other tables while accessing the Clients table, and I can get them using the code below. There are around 10 tables in total that I need to join. In this case, is the approach below good, or is there a better one? ClientId is the foreign key in all of those tables.
I am also getting the following warning:
[09:34:33 Warning] Microsoft.EntityFrameworkCore.Query
Compiling a query which loads related collections for more than one collection navigation either via 'Include' or through projection but no 'QuerySplittingBehavior' has been configured. By default Entity Framework will use 'QuerySplittingBehavior.SingleQuery' which can potentially result in slow query performance. See https://go.microsoft.com/fwlink/?linkid=2134277 for more information. To identify the query that's triggering this warning call 'ConfigureWarnings(w => w.Throw(RelationalEventId.MultipleCollectionIncludeWarning))'
Code:
var client = await _context.Clients
.Include(x => x.Address)
.Include(x => x.Properties)
.Include(x => x.ClientDetails)
// ... Include calls for the remaining tables ...
.Where(x => x.Enabled == activeOnly && x.Id == Id).FirstOrDefaultAsync();
When you use eager loading (Include()), EF Core by default fetches all the related data in a single query using LEFT JOINs. That is the default behavior in EF Core 5.
You can call AsSplitQuery() on your query to split the includes into separate queries, like this:
var client = await _context.Clients
.Include(x => x.Address)
.Include(x => x.Properties)
.Include(x => x.ClientDetails)
// ... Include calls for the remaining tables ...
.Where(x => x.Id == Id).AsSplitQuery().FirstOrDefaultAsync();
This approach needs more database round trips, but that is usually not a significant cost.
As a final recommendation, I advise using AsNoTracking() for read-only queries to improve performance.
Here are three approaches, depending on the version of EF Core you're using:
EF Core 5: as some previous answers mentioned, there is a new method, AsSplitQuery(), which breaks the query up into smaller queries and maps all the relations at the end.
/*rest of the query here*/.AsSplitQuery();
If you can't migrate to a newer EF Core version, you can still split the query manually:
var client = await _context.Clients.FirstOrDefaultAsync(t => t.Enabled /*other conditions*/);
var address = await _context.Addresses.FirstOrDefaultAsync(t => t.ClientId == client.Id);
// Because both entities are tracked, EF's change tracker can, under the hood,
// map the separate queries to their correct relations.
// In this case you should not use .AsNoTracking()
// unless you want to stitch the relations together yourself.
Another alternative is to write your query as a Select projection. This can greatly improve performance, but it is a bit more of a hassle to construct.
var clientResult = await _context.Clients.Where(x => x.Id == id).Select(x => new
{
client = x,
x.Address,
Properties = x.Properties.Select(property => new
{
property.Name /*sub query for one to many related*/
}).ToList(),
x.ClientDetails
}).ToListAsync();
It doesn't take many includes to create a cartesian explosion.
You can read more about the problem in this article:
cartesian explosion in EF Core
A reference on optimizing performance with EF Core can be found here:
Maximizing Entity Framework Core Query Performance
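The following sketch gives a rough sense of the cartesian explosion. It uses hypothetical numbers: a single client with 3 properties and 4 client details. A single query that joins both collections returns one row per combination. Split queries return the rows of each collection separately. The sketch only counts rows with LINQ to Objects. Real SQL results would also carry duplicated client columns and NULLs for empty collections.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data for ONE client: 3 properties and 4 client details.
var properties = Enumerable.Range(1, 3).Select(i => $"Property {i}").ToList();
var details = Enumerable.Range(1, 4).Select(i => $"Detail {i}").ToList();

// Single query: joining both collections onto the client row produces
// every (property, detail) combination, i.e. 3 * 4 rows.
var singleQueryRows = (from p in properties
                       from d in details
                       select (Property: p, Detail: d)).ToList();

// Split queries: one query per collection, so the counts add instead of
// multiply (plus one row for the client itself).
int splitQueryRows = 1 + properties.Count + details.Count;

Console.WriteLine($"Single query rows: {singleQueryRows.Count}");
Console.WriteLine($"Split query rows:  {splitQueryRows}");
```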

What is causing EF Core to retrieve data much slower than the actual SQL Query and suggestions to speed up?

When running a multi-join EF Core query against a DB context, I get my data back in about 20 seconds (this is running the query through LINQPad or actual code). When I take the generated SQL and run it against the same DB in either LINQPad or SSMS, the query returns the results in 3 seconds. I understand that there is going to be some overhead in EF, but is there any way to optimize the EF query to speed it up? The EF query loads data into the context for further use.
_context.Organizations
.Where(predicate)
.Include(a => a.OrganizationType)
.Include(a => a.OrganizationLicenses)
.Include(a => a.Contacts)
.ThenInclude(b => b.Phones.Where(p => p.IsActive))
.ThenInclude(a => a.PhoneType)
.Load();
You can try using Split Queries, instead of generating one big query, eg:
_context.Organizations
.Where(predicate)
.Include(a => a.OrganizationType)
.Include(a => a.OrganizationLicenses)
.Include(a => a.Contacts)
.ThenInclude(b => b.Phones.Where(p => p.IsActive))
.ThenInclude(a => a.PhoneType)
.AsSplitQuery()
.Load();
AsSplitQuery() is intended primarily to reduce the load on the database engine by sending simpler queries, but a side effect is that EF doesn't have to extract multiple entities from a single tabular result.

How to make multiple includes more efficient with EF Core 3.0

I have a query with many multi level includes:
var itemsToday = DatabaseContext.Items
.Where(f => f.StartTime > DateTime.Today && f.StartTime < DateTime.Today.AddDays(1))
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType1)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType2)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType3)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType4)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType5)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType6)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType7)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType8)
.Include(x => x.LocalStats).ThenInclude(x=>x.StatType9)
.Include(x => x.LocalDetails)
...
.OrderBy(f=>f.SomeOrderingCriterion);
There are more includes than this. Of course, this causes EF Core 3.0 to generate many joins in the SQL query, which means it takes forever to execute (25+ seconds to retrieve 200 records).
I have tried using the format .Include(x => x.LocalStats.StatType1) instead of Include and ThenInclude, but the results are the same.
Is there any way of making this more efficient? The docs suggest that:
LINQ queries with an exceedingly high number of Include operators may need to be broken down into multiple separate LINQ queries in order to avoid the cartesian explosion problem.
But I don't see any explanation on how to actually accomplish this.
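One way to do the break-down the docs describe is to load the parent rows first, then load each child collection in a separate query filtered by the parent ids, and stitch the results together afterwards. The sketch below shows that pattern with LINQ to Objects. Hypothetical in-memory tuples stand in for Items and LocalStats, and a fixed date replaces DateTime.Today so the output is deterministic.

Against a real context, EF Core can translate the Contains over the id list into a SQL filter such as IN (...). With tracking enabled, its relationship fix-up would connect the entities without the lookup step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fixed "today" (hypothetical) instead of DateTime.Today, for repeatable output.
var today = new DateTime(2020, 6, 1);

// Hypothetical stand-ins for the Items and LocalStats tables.
var items = new List<(int Id, DateTime StartTime)>
{
    (1, today.AddHours(9)),
    (2, today.AddHours(14)),
    (3, today.AddDays(1).AddHours(9)), // tomorrow: filtered out below
};
var localStats = new List<(int ItemId, string StatType)>
{
    (1, "StatType1"), (1, "StatType2"), (2, "StatType1"), (3, "StatType1"),
};

// Query 1: only the parent rows, with the original date filter.
var itemsToday = items
    .Where(f => f.StartTime > today && f.StartTime < today.AddDays(1))
    .ToList();

// Query 2: only the child rows that belong to those parents.
var ids = itemsToday.Select(i => i.Id).ToList();
var statsByItem = localStats
    .Where(s => ids.Contains(s.ItemId))
    .ToLookup(s => s.ItemId);

// Stitch the two result sets together in memory.
var combined = itemsToday
    .Select(i => (Item: i, Stats: statsByItem[i.Id].ToList()))
    .ToList();

foreach (var c in combined)
    Console.WriteLine($"Item {c.Item.Id}: {c.Stats.Count} stats");
```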
Eventually I ended up writing the SQL manually; I couldn't find a way to make the generated SQL efficient enough. Also consider optimizing the DB, for example by adding clustered indexes. This cut DB time down significantly.
You should consider using lazy loading for this query.
https://learn.microsoft.com/en-us/ef/core/querying/related-data

EF Core 2.1 Group By for Views

I have a view in SQL lets call it MyCustomView.
If I was to write a simple SQL query to count and sum I could do something like: SELECT COUNT(*), SUM(ISNULL(ValueA, ValueB)) FROM MyCustomView
Is it possible to translate that query in EF Core? Digging around, I found answers mentioning the use of GroupBy(p => 1) (however, this doesn't seem to work for views), i.e.
context
.Query<MyCustomView>()
.GroupBy(p => 1)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) });
The issue I am having is that whenever I attempt to run the query, I get a complaint about having to run the group by on the client. However, if I replace .Query<MyCustomView>() with a DbSet property from the context, the query works fine. So I am guessing it has to do with the fact that I am trying to execute the operation on a view. Is there a way to achieve this behaviour with a view, or am I out of luck again with EF Core :(
Querying views is notoriously slow when they are not indexed. Instead, you can convert your view results into a list first and then query that list. The grouping then runs in memory with LINQ to Objects, so EF no longer has to translate it, although every row of the view is still read and transferred to the client.
context
.Query<MyCustomView>()
.ToList()
.GroupBy(p => 1)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) });
I will say, the proper solution (if you can do it) is to index the view.
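Here is a small sketch of what the constant-key grouping computes. It runs in memory over hypothetical nullable tuples standing in for MyCustomView. Like SQL's SUM, Enumerable.Sum over int? skips nulls, so a row where both values are null is counted but adds nothing to the total.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for MyCustomView (both columns nullable).
var view = new List<(int? ValueA, int? ValueB)>
{
    (5, 100),     // ValueA is used
    (null, 7),    // falls back to ValueB
    (null, null), // both null: counted, but adds nothing to the sum
};

// Grouping by a constant puts every row into one group, mirroring
// SELECT COUNT(*), SUM(ISNULL(ValueA, ValueB)) FROM MyCustomView.
var totals = view
    .GroupBy(p => 1)
    .Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
    .First();

Console.WriteLine($"count = {totals.count}, total = {totals.total}");
```

One difference from SQL: on an empty source, GroupBy yields no group at all, so First() throws. The SQL query still returns a single row (COUNT 0, SUM NULL). FirstOrDefault() avoids the exception if the view can be empty.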
For anyone who is curious (or until someone else manages to provide an answer), I managed to get it to work by creating a LINQ query like this:
const int a = 1;
context
.Query<MyCustomView>()
// For some reason adding the below select lets it execute
.Select(p => new { p.ValueA, p.ValueB })
.GroupBy(p => a)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
.First();
Also, according to the EF Core team, this has been fixed in EF Core 3+; unfortunately, I don't have the luxury of upgrading to 3.

Best order to place the Fetch or FetchMany in nHibernate Linq statements

If you do an NHibernate LINQ query and want to eager load the related objects, where do you put the Fetch or the FetchMany?
Like this:
_session.Query<Entity>()
.Where(x => x.IsSomething)
.FetchMany(x => x.Children);
Or like this:
_session.Query<Entity>()
.FetchMany(x => x.Children)
.Where(x => x.IsSomething);
I want to know the best order to place the Fetch or FetchMany (for performance). Or does the order even matter? When I use Entity Framework I always write the includes first; is it the same in NHibernate?
We use the specification pattern with nHibernate. So is it smart to put the Fetch or FetchMany in the specifications?
It doesn't matter. An IQueryable is just a query definition, and Fetch()/FetchMany() are just settings on the NHibernate query that enlarge it to return more data, so that you don't perform lazy loading later on. The query is not sent to the database until you call ToList(), Single(), etc. The LINQ query is then transformed into a SQL query, which will contain more joins and columns, and THEN it is sent to the database server.
The following query will do a "similar" join when fetching the entity + children, but here I map it into an anonymous object:
_session
.Query<Entity>()
.Where(x => x.IsSomething)
.Select(x => new { MyEntity = x, MyEntitiesChildren = x.Children });
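The point that nothing runs until ToList() can be seen with plain LINQ to Objects, which defers execution in the same way. This is only an analogy: NHibernate translates the expression tree into SQL rather than running delegates. Its query is likewise only built, not executed, until it is enumerated. The counter below is a hypothetical instrument that shows when the filter actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluated = 0;

// A tiny hypothetical source; the counter records each time the filter runs.
var source = new List<(int Id, bool IsSomething)> { (1, true), (2, false), (3, true) };

// Composing operators only builds the query; nothing is evaluated yet.
var query = source.Where(x => { evaluated++; return x.IsSomething; });
int beforeEnumeration = evaluated;
Console.WriteLine($"After building the query: {beforeEnumeration} evaluations");

// Enumerating (ToList here, like ToList()/Single() in NHibernate) executes it.
var list = query.ToList();
Console.WriteLine($"After ToList(): {evaluated} evaluations, {list.Count} rows");
```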
