EF Core 2.1 Group By for Views - c#

I have a view in SQL; let's call it MyCustomView.
If I were to write a simple SQL query to count and sum, I could do something like: SELECT COUNT(*), SUM(ISNULL(ValueA, ValueB)) FROM MyCustomView
Is it possible to translate that query in EF Core? Digging around, I found answers mentioning the use of GroupBy(p => 1) (however, this doesn't seem to work for views), i.e.
context
.Query<MyCustomView>()
.GroupBy(p => 1)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
The issue I am having is that whenever I attempt to run the query, I get a complaint about having to run the group by on the client. However, if I replace .Query<MyCustomView>() with a DbSet property from the context, the query works fine. So I am guessing it has to do with the fact that I am trying to execute the operation on a view. Is there a way to achieve this behaviour with a view, or am I out of luck again with EF Core :(

Querying views is notoriously slow when they are not indexed. Instead, you can materialize the view's results into a list first, then query that list. This moves the grouping and aggregation from SQL to the client, at the cost of loading every row of the view into memory.
context
.Query<MyCustomView>()
.ToList()
.GroupBy(p => 1)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
I will say, the proper solution (if you can do it) is to index the view.

For anyone who is curious (or until someone else manages to provide an answer), I managed to get it to work by creating a LINQ query like this:
const int a = 1;
context
.Query<MyCustomView>()
// For some reason adding the below select lets it execute
.Select(p => new { p.ValueA, p.ValueB })
.GroupBy(p => a)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
.First();
Also, according to the EF Core team, this has been sorted in EF Core 3+. Unfortunately, I haven't got the luxury of upgrading to 3.
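For reference, here is a minimal sketch of what the GroupBy-by-constant query computes, run with LINQ to Objects on a few made-up rows (the rows and their values are hypothetical, not from the actual view). It only illustrates the shape of the aggregate, i.e. that ValueA ?? ValueB plays the role of ISNULL(ValueA, ValueB). It says nothing about how EF Core translates it.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for MyCustomView; ValueA is nullable.
var rows = new[]
{
    new { ValueA = (int?)10, ValueB = 1 },
    new { ValueA = (int?)null, ValueB = 5 },
    new { ValueA = (int?)2, ValueB = 7 },
};

// Grouping by a constant puts every row in one group,
// so Count() and Sum() aggregate over the whole set.
var result = rows
    .GroupBy(p => 1)
    .Select(grp => new
    {
        Count = grp.Count(),
        Total = grp.Sum(p => p.ValueA ?? p.ValueB) // falls back to ValueB when ValueA is null
    })
    .First();

Console.WriteLine($"{result.Count} {result.Total}"); // prints "3 17"
```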

Related

Get Distinct Entries Based On Specific Column in Entity Framework

I have a SQLite table that contains every test result we've run, and I'm looking to write an entity framework query that returns only the most recent test result per project.
Normally, I'd assume this would be "group by project id, return row with the max updated value," or, alternatively, "sort by date and return first". However, when I try the query, I keep getting Entity Framework "could not be translated" errors.
Here's what I've tried:
results = await _context.Results
.Include(x => x.Project)
.AsNoTracking()
.OrderByDescending(x => x.Updated)
.GroupBy(x => x.ProjectId, (x, y) => y.First())
.ToListAsync();
However, I keep receiving errors that the .First() command could not be translated by Entity Framework. Is there something I'm missing (or, alternatively, a better way to write the query that is more entity framework friendly)?
For reference, here's the operation I'm trying to do in normal SQL: https://thoughtbot.com/blog/ordering-within-a-sql-group-by-clause
I'd prefer to do as much as the work on the server as possible, because there are only a small number of projects, but there could be thousands of results, and I'd rather not do client-side filtering if possible.
The application is written for ASP.NET Core 3.1, using Entity Framework Core.
Minor edit: While SQLite is being used for development, the final code will run against SQL Server, hence the desire to do processing server-side.
Try with a subquery instead of a grouping. Like this:
results = await _context.Results
.Include(x => x.Project)
.AsNoTracking()
.Where(r => r.Id == _context.Results.Where(rr => rr.ProjectId == r.ProjectId).Max(rr => rr.Id))
.ToListAsync();
Your query couldn't be translated to SQL because the LINQ provider doesn't recognize it. You can modify the code as below (adding AsEnumerable after AsNoTracking):
.Include(x => x.Project)
.AsNoTracking()
.AsEnumerable()
With AsEnumerable, once the data is loaded, any further operation is performed using LINQ to Objects on the data already in memory.
While this isn't portable, here's how this can be done using SQLite-compatible SQL and Entity Framework:
results = await _context.Results
.FromSqlRaw("SELECT Results.* FROM (SELECT Id, ProjectId, MAX(Updated) as Updated " +
"FROM Results GROUP BY ProjectId) as latest_results " +
"INNER JOIN Results ON Results.Id = latest_results.Id")
.Include(x => x.Project) //not required for question but useful
.OrderBy(x => x.Project.Name)
.AsNoTracking()
.ToListAsync();
If someone has a way to do this in pure LINQ/EF, but still perform the query server-side, I'll happily mark that as the answer, since this is dependent on the exact SQL dialect used.
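One pure-LINQ shape that is sometimes suggested for this is to group only to get the latest Updated per project, then join that back to the rows. Below is a minimal sketch of that idea on in-memory data (the sample rows are made up). A GroupBy that projects only the key and an aggregate, followed by a join, is the kind of query EF Core 3.x can often translate, but whether it runs server-side against your provider is an assumption you would need to verify. Note that two results with the same Updated value in one project would both be returned.

```csharp
using System;
using System.Linq;

// Hypothetical test results; in the real application this would be _context.Results.
var results = new[]
{
    new { Id = 1, ProjectId = 1, Updated = new DateTime(2020, 1, 1) },
    new { Id = 2, ProjectId = 1, Updated = new DateTime(2020, 3, 1) },
    new { Id = 3, ProjectId = 2, Updated = new DateTime(2020, 2, 1) },
    new { Id = 4, ProjectId = 2, Updated = new DateTime(2020, 1, 15) },
};

// Step 1: latest Updated per project (key + aggregate only, no First()).
var latestPerProject = results
    .GroupBy(r => r.ProjectId)
    .Select(g => new { ProjectId = g.Key, Updated = g.Max(r => r.Updated) });

// Step 2: join back to the full rows on (ProjectId, Updated).
var latestRows = results
    .Join(latestPerProject,
          r => new { r.ProjectId, r.Updated },
          l => new { l.ProjectId, l.Updated },
          (r, l) => r)
    .OrderBy(r => r.ProjectId)
    .ToList();

foreach (var r in latestRows)
    Console.WriteLine($"project {r.ProjectId}: result {r.Id}");
// prints "project 1: result 2" then "project 2: result 3"
```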

EF Core: Fetching A List of Entities with Their Children

I have Post and PostImage entities. A Post entity can have a list of PostImage entities (i.e., one-to-many). I want to fetch a list of all posts and include all of its list of images. So I wrote the following piece of code:
var posts = _appDataContext.Posts
.Select(x => new
{
x.Id,
x.Title,
x.Body,
Images = x.Images.Select(y => new
{
y.Id
})
});
The code is all executed in the database which is what I want, but here's the catch. From the console log, it appears that EF is first fetching the list of posts, and then it loops over them to fetch their corresponding images (extra queries + extra fetching time). Is there any other way to fetch the data all at once (posts + their images). Both posts and images have extra columns, that's why I used the Select statement; to filter out the columns that I don't need. I tried using Include, but nothing has changed.
P.S. I'm using EntityFramework Core.
At once (single SQL query) - no, because this is how EF Core queries work. The minimum is one SQL query for the main data + 1 SQL query for each collection. For your case the minimum is 2 SQL queries. That is still much better than the N + 1 queries issue you are currently experiencing.
The solution is to use EF Core 2.1+ which has Optimization of correlated subqueries. Also as mentioned in the documentation link, you have to opt-in for that optimization by "including ToList() in the right place(s)":
var posts = _appDataContext.Posts
.Select(x => new
{
x.Id,
x.Title,
x.Body,
Images = x.Images.Select(y => new
{
y.Id
}).ToList() // <--
});

Use skip and take inside a LINQ include

I have an object that has a property which is a collection of another object. I would like to load just a subset of the collection property using LINQ.
Here's how I'm trying to do it:
manager = db.Managers
.Include(m => m.Transactions.Skip((page - 1) * 10).Take(10))
.Where(m => m.Id == id)
.FirstOrDefault();
The code above throws an error that says
The Include path expression must refer to a navigation property defined on the type. Use dotted paths for reference navigation properties and the Select operator for collection navigation properties. Parameter name: path
What is the right way to do this in LINQ? Thanks in advance.
You cannot do this with Include. EF simply doesn't know how to translate that to SQL. But you can do something similar with sub-query.
manager = db.Managers
.Where(m => m.Id == id)
.Select(m => new { Manager = m,
Transactions = db.Transactions
.Where(t=>t.ManagerId == m.Id)
.Skip((page-1) * 10)
.Take(10)})
.FirstOrDefault();
This will not return an instance of the Manager class, but it should be easy to modify it to suit your needs.
You also have two other options:
Load all transactions and then filter in memory. Of course, if there are a lot of transactions this might be quite inefficient.
Don't be afraid to make 2 queries to the database. This is a prime example of when that is probably the best route, and it will probably be the most efficient way of doing it.
Either way, if you are concerned with performance at all, I would advise you to test all 3 approaches and see which is the fastest. And please let us know what the results were!
Sometimes the added complexity of putting everything in a single query is not worth it. I would split this up into two separate queries:
var manager = db.Managers.SingleOrDefault(m => m.Id == id);
var transactions = db.Transactions
.Where(t => t.ManagerId == id)
// .OrderBy(...)
.Skip((page - 1) * 10).Take(10)
.ToList();
Note that after doing this, manager.Transactions can be used as well to refer to those just-loaded transactions: Entity Framework automatically links loaded entities as long as they're loaded into the same context. Just make sure lazy loading is disabled, to prevent EF from automatically pulling in all other transactions that you specifically tried to filter out.
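To make the paging arithmetic concrete, here is a small sketch using an in-memory list in place of db.Transactions (the ids and the GetPage helper are hypothetical). It assumes pages are 1-based, as in the question, and applies an explicit OrderBy first, since Skip and Take only give stable pages over a defined order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical transaction ids standing in for db.Transactions.
var transactions = Enumerable.Range(1, 25).ToList();
const int pageSize = 10;

// Returns one 1-based page; ordering first keeps pages stable between calls.
List<int> GetPage(int page) =>
    transactions
        .OrderBy(t => t)
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .ToList();

Console.WriteLine(string.Join(",", GetPage(3))); // prints "21,22,23,24,25"
```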

EF7 projection doesnt eager load collections

When selecting entities with "include", all my items get fetched with a single SQL join statement. But when I project them to some other form along with their children, the join is no longer executed; instead, a separate query per row is executed to get the children. How can I prevent this? My goal is to reduce the columns fetched and to reduce the number of queries.
This issue leads me to believe that this should work: https://github.com/aspnet/EntityFramework/issues/599
//executes ONE query as expected
context.Parents.Include(p => p.Children).ToList();
//executes MULTIPLE queries
context.Parents.Include(p => p.Children).Select(p => new {
Id = p.Id,
Name = p.Name,
Children = p.Children.Select(c => new {
Id = c.Id,
Name = c.Name
})
}).ToList();
You are seeing multiple queries sent to the database because EF Core is not yet smart enough to translate navigations in a projection to a JOIN. Here is the issue tracking this feature - https://github.com/aspnet/EntityFramework/issues/4007.
BTW, as previously mentioned by others, Include only works when the entity type is part of the result (i.e. it means "if you end up creating instances of the entity type then make sure this navigation property is populated").
Your problem is here:
Children = p.Children.Select(c => new {
Id = c.Id,
Name = c.Name
})
The eager loading statement Include() works only with queries without projections.
Instead of this, you can do:
context.Parents.Include(p => p.Children).AsEnumerable()
.Select(p => new {
Id = p.Id,
Name = p.Name,
Children = p.Children.Select(c => new {
Id = c.Id,
Name = c.Name
})
}).ToList();
AsEnumerable() tells EF that all code after it should be executed on in-memory objects and should not be translated into SQL.
This is partially fixed for EFCore 2.0.0-preview2 (currently not on nuget)
https://github.com/aspnet/EntityFramework/commit/0075cb7c831bb3618bee4a84c9bfa86a499ddc6c
This is partially addressed by #8584 - if the collection navigation is
not composed on, we re-use include pipeline which creates 2 queries,
rather than N+1. If the collection is filtered however, we still use
the old rewrite and N+1 queries are being issued
I don't know exactly what "composed on" means here, or whether it just means that the Child object cannot be filtered (probably acceptable for most cases), but my guess is that your original query as it stands would now work.

Speed up entity framework

I've been looking to optimize a web application which uses MVC 2 and EF4. Listing queries took ~22seconds for ~10k rows with 14 columns, which is obviously too slow.
So as part of this I've upgraded to MVC 4 and EF 6.1 (highest I could go with VS2010).
For read-only queries I've added .AsNoTracking() to the queries, this dropped the time to ~3seconds. I'm wondering if there's anything more I could do to get it down to ~1seconds.
my code so far is:
category = CategoryHelper.MapToOldFormat(category);
var mainIds = Repository.Categories
.Include(o => o.LinkedCategories)
.Where(o => o.Category1.Contains(category))
.AsNoTracking()
.ToList();
var linkedCats = mainIds.SelectMany(o => o.LinkedCategories).Union(mainIds).Select(c => c.Id);
var notifications = Repository.Notifications
.Include(o => o.Country)
.Include(o => o.NonEUCountries)
.Include(o => o.Language)
.Include(o => o.RAW)
.Include(o => o.RAW.Classification)
.Include(o => o.RAW.TransactionPN)
.AsNoTracking();
if (id != null)
{
notifications = notifications.Where(o => o.Id == id);
}
if (!string.IsNullOrWhiteSpace(category))
{
notifications = notifications.Where(o => linkedCats.Contains(o.RAW.Classification.CategoryID));
}
return notifications.Logged(MethodBase.GetCurrentMethod()).ToList();
In the benchmarks, category and id were null, so the IN for category doesn't get generated. I will be replacing that with an int flag in the future as a fast way to support multiple categories.
Are there any other big performance problems with this example query?
First of all, listing 10k results is painful. You need to use paging with large datasets.
Imagine the cost of moving relational data into 10k instances of some class and injecting run-time features like self-tracking or lazy-loading. That is a loop of 10k iterations where each iteration runs complex code. It should be slow by default, shouldn't it?
Thus, it seems that you need to leverage LINQ's extension methods like .Skip(...) and .Take(...).
Another improvement could come from analyzing your current database schema and your object model: mapping 1 table to 1 class (with 14 columns / properties) could be a problem, and a more segmented design might improve your scenario.
Anyway, paging will be your friend. This should reduce query times to a fraction of a second.
Update
@Phyx said in a comment:
I Inherited the project and don't have the budget to change ALL the
listings.
If you can't change that, I would say caching is the solution. One user takes the hit of these unoptimized (non-optimizable...) queries, and the rest consume an output cache. Even if cached entries only live for short intervals, that might be enough to speed up your application and reduce load times.
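As a rough illustration of the idea, here is a minimal time-limited cache around an expensive load. Everything in it is hypothetical: LoadNotifications stands in for the slow query, and Cached is a hand-rolled wrapper, not an ASP.NET or EF API. In an MVC application you would more likely reach for the framework's own output or memory caching, so treat this only as a sketch of the behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the slow notifications query; counts how often it runs.
int queryCount = 0;
List<string> LoadNotifications()
{
    queryCount++;
    return new List<string> { "notification-1", "notification-2" };
}

// Wraps a loader so its result is reused until `lifetime` has elapsed.
Func<List<string>> Cached(Func<List<string>> load, TimeSpan lifetime)
{
    var gate = new object();
    var value = new List<string>();
    var loaded = false;
    var expires = DateTime.MinValue;

    return () =>
    {
        lock (gate) // one caller refreshes; the others wait and reuse the result
        {
            if (!loaded || DateTime.UtcNow >= expires)
            {
                value = load();
                loaded = true;
                expires = DateTime.UtcNow + lifetime;
            }
            return value;
        }
    };
}

var getNotifications = Cached(LoadNotifications, TimeSpan.FromSeconds(30));
var first = getNotifications();  // runs the query
var second = getNotifications(); // served from the cache

Console.WriteLine(queryCount); // prints "1"
```

The trade-off is staleness: for up to the chosen lifetime, users can see results that no longer match the database.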
