EF7 projection doesn't eager load collections - c#

When selecting entities with Include, all my items get fetched with a single SQL join statement. But when I project the result to some other form together with its children, the join is no longer executed; instead, a separate query per row is executed to get the children. How can I prevent this? My goal is to reduce the columns fetched and to reduce the number of queries.
This issue leads me to believe that this should work: https://github.com/aspnet/EntityFramework/issues/599
//executes ONE query as expected
context.Parents.Include(p => p.Children).ToList();
//executes MULTIPLE queries
context.Parents.Include(p => p.Children).Select(p => new {
Id = p.Id,
Name = p.Name,
Children = p.Children.Select(c => new {
Id = c.Id,
Name = c.Name
})
}).ToList();

You are seeing multiple queries sent to the database because EF Core is not yet smart enough to translate navigations in a projection to a JOIN. Here is the issue tracking this feature - https://github.com/aspnet/EntityFramework/issues/4007.
BTW, as previously mentioned by others, Include only works when the entity type is part of the result (i.e. it means "if you end up creating instances of the entity type then make sure this navigation property is populated").

Your problem is here:
Children = p.Children.Select(c => new {
Id = c.Id,
Name = c.Name
})
The eager loading statement Include() only works in queries without projections.
Instead, you can do this:
context.Parents.Include(p => p.Children).AsEnumerable()
.Select(p => new {
Id = p.Id,
Name = p.Name,
Children = p.Children.Select(c => new {
Id = c.Id,
Name = c.Name
})
}).ToList();
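AsEnumerable() tells EF that all code after it should be executed in memory on the loaded objects and should not be translated to SQL. Note that this still fetches every column of both entities, because the projection only runs after the data has been loaded.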
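To illustrate what the switch at AsEnumerable() does, here is a minimal in-memory sketch. It does not use EF at all: the parents list is a hypothetical stand-in for context.Parents.Include(p => p.Children), and AsQueryable() only imitates a provider-backed query. The point it shows is that operators written after AsEnumerable() bind to the LINQ to Objects methods, so they run as ordinary delegates in your process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in data; with EF this would come from the database.
var parents = new List<(int Id, string Name, List<(int Id, string Name)> Children)>
{
    (1, "A", new List<(int Id, string Name)> { (10, "a1"), (11, "a2") }),
    (2, "B", new List<(int Id, string Name)>()),
};

// Imitates an IQueryable source such as a DbSet.
IQueryable<(int Id, string Name, List<(int Id, string Name)> Children)> query =
    parents.AsQueryable();

// After AsEnumerable(), Select resolves to Enumerable.Select (delegates),
// not Queryable.Select (expression trees a provider could translate).
var projected = query
    .AsEnumerable()
    .Select(p => new
    {
        p.Id,
        p.Name,
        Children = p.Children.Select(c => new { c.Id, c.Name }).ToList()
    })
    .ToList();

Console.WriteLine(projected[0].Children.Count); // prints 2
```

With a real context, the part before AsEnumerable() is what gets sent to the database, so any filtering you want done in SQL has to come before it.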

This is partially fixed for EFCore 2.0.0-preview2 (currently not on nuget)
https://github.com/aspnet/EntityFramework/commit/0075cb7c831bb3618bee4a84c9bfa86a499ddc6c
This is partially addressed by #8584 - if the collection navigation is
not composed on, we re-use include pipeline which creates 2 queries,
rather than N+1. If the collection is filtered however, we still use
the old rewrite and N+1 queries are being issued
I don't know exactly what "composed on" means here. If it only means that the child collection cannot be filtered (probably acceptable for most cases), then my guess is that your original query as it stands would now work.

Related

EF Core: Fetching A List of Entities with Their Children

I have Post and PostImage entities. A Post entity can have a list of PostImage entities (i.e., one-to-many). I want to fetch a list of all posts and include all of its list of images. So I wrote the following piece of code:
var posts = _appDataContext.Posts
.Select(x => new
{
x.Id,
x.Title,
x.Body,
Images = x.Images.Select(y => new
{
y.Id
})
});
The code is all executed in the database which is what I want, but here's the catch. From the console log, it appears that EF is first fetching the list of posts, and then it loops over them to fetch their corresponding images (extra queries + extra fetching time). Is there any other way to fetch the data all at once (posts + their images). Both posts and images have extra columns, that's why I used the Select statement; to filter out the columns that I don't need. I tried using Include, but nothing has changed.
P.S. I'm using EntityFramework Core.
At once (single SQL query) - no. Because this is how EF Core queries work. The minimum is one SQL query for the main data + 1 SQL query for each collection. For your case the minimum is 2 SQL queries. Still much better than N + 1 queries issue you are experiencing currently.
The solution is to use EF Core 2.1+ which has Optimization of correlated subqueries. Also as mentioned in the documentation link, you have to opt-in for that optimization by "including ToList() in the right place(s)":
var posts = _appDataContext.Posts
.Select(x => new
{
x.Id,
x.Title,
x.Body,
Images = x.Images.Select(y => new
{
y.Id
}).ToList() // <--
});

EF Core 2.1 Group By for Views

I have a view in SQL; let's call it MyCustomView.
If I were to write a simple SQL query to count and sum, I could do something like: SELECT COUNT(*), SUM(ISNULL(ValueA, ValueB)) FROM MyCustomView
Is it possible to translate that query in EF Core? Digging around, I found answers mentioning the use of GroupBy 1 (however, this doesn't seem to work for views), i.e.
context
.Query<MyCustomView>()
.GroupBy(p => 1)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
The issue I am having is that whenever I attempt to run the query, I get a complaint about having to run the group by on the client. However, if I replace the .Query<MyCustomView>() with a DbSet property from the context, the query works fine. So I am guessing it has to do with the fact that I am trying to execute the operation on a view. Is there a way to achieve this behaviour with a view, or am I out of luck again with EF Core :(
Querying views is notoriously slow when they are not indexed. Instead, you can materialize the view's rows into a list first and then group that list in memory. The view is still queried once, but the grouping and aggregation run on the client, which avoids the translation error at the cost of loading every row.
context
.Query<MyCustomView>()
.ToList()
.GroupBy(p => 1)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
I will say, the proper solution (if you can do it) is to index the view.
For anyone who is curious (or until someone else manages to provide an answer), I managed to get it to work by creating a LINQ query like this:
const int a = 1;
context
.Query<MyCustomView>()
// For some reason adding the below select lets it execute
.Select(p => new { p.ValueA, p.ValueB })
.GroupBy(p => a)
.Select(grp => new { count = grp.Count(), total = grp.Sum(p => p.ValueA ?? p.ValueB) })
.First();
Also, according to the EF Core team, this has been sorted in EF Core 3+; unfortunately, I don't have the luxury of upgrading to 3.
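For readers unfamiliar with the GroupBy-by-constant idiom, the following is a small in-memory sketch of what it computes. The rows list is made-up sample data standing in for the view, and nothing here says how any EF Core version translates the query; it only shows that grouping by a constant yields a single group over which Count and Sum act as whole-table aggregates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows standing in for MyCustomView (ValueA is nullable).
var rows = new List<(int? ValueA, int ValueB)>
{
    (1, 10),
    (null, 20),
    (3, 30),
};

// Grouping by a constant puts every row in one group, so the aggregates
// below play the role of COUNT(*) and SUM(ISNULL(ValueA, ValueB)).
var summary = rows
    .GroupBy(_ => 1)
    .Select(grp => new
    {
        Count = grp.Count(),
        Total = grp.Sum(r => r.ValueA ?? r.ValueB)
    })
    .First();

Console.WriteLine($"{summary.Count} {summary.Total}"); // prints "3 24"
```

One caveat of this shape: an empty source produces no groups, so First() would throw. FirstOrDefault() with a null check is the safer choice if the view can be empty.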

Include Path Expression Must Refer To A Navigation Property

I've searched a lot about my problem but didn't find any clear solution. I know that I can't use a Where LINQ clause inside Include, but then I don't understand how I should write this query.
var brands = await _context.Brands
.Include(x => x.FoodCategories
.Select(y => y.Products
.Where(z => z.Sugar)
.Select(w => w.FileDetail)))
.ToListAsync();
I want to apply the Where statement to Products, but I still want the entities in a hierarchy, as shown here. How can I do it?
I've already tried applying answers from different Stack Overflow questions, but I'm not getting the point. Here is my attempt:
var brands = _context.Brands
.Select(b => new
{
b,
FoodCategories = b.FoodCategories
.Where(x => x.BrandId == b.BrandId)
.Select(c => new
{
c,
Products = c.Products
.Where(y => y.FoodCategoryId == c.FoodCategoryId &&
y.Sugar)
.Select(p => new
{
p,
File = p.FileDetail
})
})
})
.AsEnumerable()
.Select(z => z.b)
.ToList();
But it is not returning all the product items instead of sugar only products.
Why you're only getting sugar products.
But it is not returning all the product items instead of sugar only products.
Of course it is. Because you're asking it to only give you the sugar products:
var brands = _context.Brands
.Select(b => new
{
b,
FoodCategories = b.FoodCategories
.Where(x => x.BrandId == b.BrandId)
.Select(c => new
{
c,
Products = c.Products
.Where(y => y.FoodCategoryId == c.FoodCategoryId
&& y.Sugar) //HERE!
.Select(p => new
{
p,
File = p.FileDetail
})
})
})
.AsEnumerable()
.Select(z => z.b)
.ToList();
If you want all products; then don't filter on only the ones where Sugar is set to true.
There is a lot of redundant code here.
b.FoodCategories.Where(x => x.BrandId == b.BrandId)
b.FoodCategories already expresses the food categories of this particular brand b. You don't need the Where.
The same applies to
c.Products.Where(y => y.FoodCategoryId == c.FoodCategoryId ... )
Here's an improved version of your (second) snippet:
var brands = _context.Brands
.Select(b => new
{
b,
FoodCategories = b.FoodCategories
.Select(c => new
{
c,
Products = c.Products
.Select(p => new
{
p,
File = p.FileDetail
})
})
})
.AsEnumerable()
.Select(z => z.b)
.ToList();
This should make it clearer that the custom Select logic isn't necessary. All you're doing is loading the related entities into properties of the same name. You can simply rely on the existing entities and their relations, there's no reason to define the same relationship again.
The only reasons a custom Select would be desirable here are:
You want to limit the retrieved columns in order to lower the data size (useful for large queries).
You want to selectively load children, not just all related children. Your code suggests that you want this, but then you say "But it is not returning all the product items", so I conclude that you don't want to filter the products on their sugar content.
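As a rough illustration of the second reason, here is an in-memory sketch of a projection that keeps every parent but only some of its children. The brand and product data are invented for the example and no EF is involved; the shape of the Select is the part that carries over.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: each brand carries its own product list.
var brands = new List<(string Name, List<(string Name, bool Sugar)> Products)>
{
    ("Acme", new List<(string Name, bool Sugar)> { ("Cola", true), ("Water", false) }),
    ("Pure", new List<(string Name, bool Sugar)> { ("Tea", false) }),
};

// The Where sits on the child collection, so it narrows the children
// of each brand without removing any brand from the result.
var result = brands
    .Select(b => new
    {
        b.Name,
        SugarProducts = b.Products
            .Where(p => p.Sugar)
            .Select(p => p.Name)
            .ToList()
    })
    .ToList();

foreach (var b in result)
    Console.WriteLine($"{b.Name}: {string.Join(", ", b.SugarProducts)}");
```

Here "Pure" is still returned, just with an empty product list. Filtering which brands come back would need a separate Where on the brands themselves.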
Why your Include didn't work.
Simply put: you cannot use Where statements in includes.
Include statements are based on the structure of the entities, whereas a Where only filters data from a set. One has nothing to do with the other.
And even though you'd think it'd be nice to do something like "include the parent only if they have an active status", that's simply not how Include was designed to work.
Include boils down to "for every [type1], also load their related [type2]". This will be done for every [type1] object that your query will instantiate and it will load every related [type2].
Taking the next step in refactoring the above snippet:
var brands = _context.Brands
.Include(b => b.FoodCategories)
.Include(b => b.FoodCategories.Select(fc => fc.Products))
.Include(b => b.FoodCategories.Select(fc => fc.Products.Select(p => p.FileDetail)))
.ToList();
The includes give Entity Framework specific instructions:
For every loaded brand, load its related food categories.
For every loaded food category, load its related products.
For every loaded product, load its related file details.
Notice that it does not instruct WHICH brands should be loaded! This is an important distinction to make. The Include statements do not in any way filter the data, they only explain what additional data needs to be retrieved for every entry that will be loaded.
Which entries will be loaded has not been defined yet. By default, you get the whole dataset, but you can apply further filtering using Where statements before you load the data.
Think of it this way:
A restaurant wants every new customer's mother to give permission to serve dessert to the customer. Therefore, the restaurant drafts a rule: "every customer must bring their mother".
This is the equivalent of db.Customers.Include(c => c.Mother).
This does not state which customers are allowed to visit the restaurant. It only states that any customer that visits the restaurant must bring their mother (if they have no mother, they will bring null instead).
Notice how this rule applies regardless of which customers visit the restaurant:
Ladies night: db.Customers.Include(c => c.Mother).Where(c => c.IsFemale)
Parents night: db.Customers.Include(c => c.Mother).Where(c => c.Children.Any())
People whose father is named Bob night: db.Customers.Include(c => c.Mother).Where(c => c.Father.Name == "Bob")
Take note of the third example. Even though you filter on the father, you will only load the mother entity. It's perfectly possible to filter items on related entity values without actually loading the entities themselves (fathers).
You may ask yourself "why Select?". That's a good question, because it's not intuitive here.
Ideally, you'd want to do something like
context.Brand.Include(b => b.FoodCategories.Products.FileDetails)
But this is not possible because of a limitation in the language. FoodCategories is a List<FoodCategory>, which does not have a Products property.
However, FoodCategory itself does have a Products property. This is why Select is used: it allows you to access the properties of the list element's type, rather than the list itself.
Internally, EF is going to deconstruct your Select statement (which is an Expression) and it will figure out which property you want to be loaded. Don't worry too much about how EF works behind the scenes. It's not always pretty.
The Include/Select syntax is not the prettiest. Especially when you drill down multiple levels, it becomes cumbersome to write (and read).
So I suggest you invert your approach (start at the lowest child, drill up to the parent). Technically, it yields the same result, but it allows for a neater Include syntax:
var brands = context.FileDetails
.Include(fd => fd.Product)
.Include(fd => fd.Product.FoodCategory)
.Include(fd => fd.Product.FoodCategory.Brand)
.Select(fd => fd.Product.FoodCategory.Brand)
Now you don't need any nasty Select workaround in order to reference the related types.
Do note that you need to put an Include for every step! You can't just use the last Include and skip the others. EF does not infer that it needs to load multiple relations from a single Include.
Note that this trick only really works if you have a chain of one-to-many relationships. Many-to-many relationships make it harder to apply this trick. At worst, you'll have to resort to using the Select syntax from the earlier example.
While I am not a fan of the Include methods that take a string parameter (I don't like hardcoded strings that can fail on typos), I do feel it's relevant to mention here that they do not suffer from this issue. If you use the string-based includes, you can do things like:
context.Brands
.Include("FoodCategories")
.Include("FoodCategories.Products")
.Include("FoodCategories.Products.FileDetails")
The parsing logic of the string include method will automatically look for the element inside the List, thereby effectively preventing the ugly syntax.
But there are other reasons why I generally don't advise using string parameters here (doesn't update when you rename a property, no intellisense, very prone to developer error)

Include() vs Select() performance

I have a parent entity with a navigation property to a child entity. The parent entity may not be removed as long as there are associated records in the child entity. The child entity can contain hundreds of thousands of records.
I'm wondering what will be the most efficient to do in Entity Framework to do this:
var parentRecord = _context.Where(x => x.Id == request.Id)
.Include(x => x.ChildTable)
.FirstOrDefault();
// check if parentRecord exists
if (parentRecord.ChildTable.Any()) {
// cannot remove
}
or
var parentRecord = _context.Where(x => x.Id == request.Id)
.Select(x => new {
ParentRecord = x,
HasChildRecords = x.ChildTable.Any()
})
.FirstOrDefault();
// check if parentRecord exists
if (parentRecord.HasChildRecords) {
// cannot remove
}
The first query may include thousands of records while the second query will not, however, the second one is more complex.
Which is the best way to do this?
I would say it depends. It depends on which DBMS you're using, on how well the optimizer works, etc.
So one single statement with a JOIN could be far faster than a lot of SELECT statements.
In general I would say when you need the rows from your Child table use .Include(). Otherwise don't include them.
Or in simple words, just read the data you need.
The answer depends on your database design. Which columns are indexed? How much data is in table?
Include() offloads work to your C# layer, but means a simpler query. It's probably the better choice here, but you should consider extracting the SQL that is generated by Entity Framework and running each query through an optimisation check.
You can output the SQL generated by Entity Framework to your Visual Studio console as noted here.
This example might create a better SQL query that suits your needs.

Check if list contains item from other list in EntityFramework

I have an entity Person which has a list of locations associated with it. I need to query the persons table and get all those that have at least one location from a list of locations (criteria). The following works but is highly inefficient:
var searchIds = new List<int>{1,2,3,4,5};
var result = persons.Where(p => p.Locations.Any(l => searchIds.Any(id => l.Id == id)));
This works fine for small lists (say 5-10 searchIds and a person with 5-10 locations). The issue is that some persons may have 100 locations, and a search can also be for 100 locations at once. When I tried to execute the above, EF actually produced a 2000+ SQL statement and failed because it was too deeply nested. While the nesting is already a problem in itself, even if it worked I still wouldn't be very happy with a 2000+ SQL statement.
Note: the real code also includes multiple levels and parent-child relations, but I did manage to get it down to this fairly flat structure using only id's, instead of full objects
What would be the best way to accomplish this in EF?
I'll suggest:
var searchIds = new List<int>{1,2,3,4,5};
var result = persons.Where(p => p.Locations.Any(l => searchIds.Contains(l.Id)));
Contains will be translated to an IN clause.
Keep in mind that the id list goes into the sql statement. If your id list is huge then you'll end up having a huge query.
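The following in-memory sketch shows the Any-plus-Contains shape on made-up data, to confirm it selects the same persons as the nested Any/Any version in the question. It uses plain collections rather than an EF context, so it says nothing about the generated SQL; the person names and location ids are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var searchIds = new List<int> { 1, 2, 3, 4, 5 };

// Hypothetical persons, each with the ids of their locations.
var persons = new List<(string Name, List<int> LocationIds)>
{
    ("Ann", new List<int> { 1, 99 }),
    ("Bob", new List<int> { 42 }),
    ("Cy",  new List<int> { 5, 3 }),
};

// A person matches when at least one of their locations is in searchIds.
var result = persons
    .Where(p => p.LocationIds.Any(id => searchIds.Contains(id)))
    .Select(p => p.Name)
    .ToList();

Console.WriteLine(string.Join(", ", result)); // prints "Ann, Cy"
```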
Try switching to joins instead of doing a massive data include:
var searchIds = new List<int>{1,2,3,4,5};
var results = (from p in persons
join l in Location on p.PersonId equals l.PersonId
where searchIds.Contains(l.Id)
select p).Distinct().ToList();
Obviously fix this line to match your classes and/or join property.
join l in Location on p.PersonId equals l.PersonId
I would expect that to generate a more friendly execution plan.
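To show why the Distinct() matters in the join version, here is a self-contained in-memory sketch with invented data. A person with several matching locations appears once per matching location after the join, and Distinct() collapses those duplicates. The tuple shapes and property names are assumptions made for the example, not the asker's real classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var searchIds = new List<int> { 1, 2, 3 };

// Hypothetical tables: persons and their locations (one-to-many).
var persons = new List<(int PersonId, string Name)>
{
    (1, "Ann"),
    (2, "Bob"),
};
var locations = new List<(int Id, int PersonId)>
{
    (1, 1),   // Ann, matches
    (2, 1),   // Ann, matches again
    (42, 2),  // Bob, no match
};

// The join yields Ann twice (once per matching location);
// Distinct() reduces that to a single row.
var results = (from p in persons
               join l in locations on p.PersonId equals l.PersonId
               where searchIds.Contains(l.Id)
               select p).Distinct().ToList();

Console.WriteLine(results.Count); // prints 1
```

Distinct() works here because value tuples compare by value; with entity classes in memory you would rely on reference identity or supply a comparer.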
You may try this.
List<EnquirePriceSub> e = getSomethings();
var data = appDb.EnquirePriceSubs.Where(w=> e.Select(s=>s.Id).Contains(w.Id)).ToList();
