I have a simple question but couldn't find an answer.
If I write
var result = _db.Table.Include(t => t.Child).Where(t => t.Id == id).Single();
when is the join executed?
Does it fetch the child only after it has found my entity, or does the generated SQL join in every child while looking for the row?
Let's look at an example based on a simple DB model:
public class Head
{
    // ... other columns
    public virtual Child Child { get; set; }
    public Guid? ChildId { get; set; }
}
void Main()
{
    // The first version of the code
    var child = _db.Head.Include(h => h.Child)
        .FirstOrDefault(/* boring stuff, but we don't need the child here */)
        ?.Child;
    if (child != null)
        foo(child);

    // The second one
    var head = _db.Head.FirstOrDefault(/* boring stuff */);
    if (head != null && head.ChildId.HasValue)
        foo(head.Child); // I know this makes a new request to our DB
}
Which of the two options is more efficient?
I'm worried about extra children being loaded by the SQL when I only need one object, selected by a filter on the parent table.
Thanks in advance!
It will evaluate the Where condition first, not in C# but in the SQL that gets generated.
This will generate SQL something like
SELECT top 1 .... FROM Table t
JOIN Child c ....
WHERE t.Id = id
Your database server will create an execution plan that looks up the item in the index and fetches the corresponding child.
Without Include the loading of Child objects is deferred until you need them. Hence, if you were to iterate parent/child groups like this
foreach (var parent in _db.Table.Include(t => t.Child).Where(p => p.Name.StartsWith("Q")))
    foreach (var child in parent.Child)
        Console.WriteLine($"{child}, child of {parent}");
the number of round-trips would be equal to the number of parents plus one.
If you use Include, all Child objects are loaded along with the parent object, without making a separate round-trip for each parent. Hence, the number of database round-trips for the above code would be equal to 1.
In the case of Single, which could be rewritten as follows
var result = _db.Table.Include(t => t.Child).Single(t => t.Id == id);
the number of round-trips would be 1 with Include and 2 without Include.
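If you really only need the child (as in the first version of the question's code), a projection avoids materializing the parent at all. A minimal sketch, assuming the same Head/Child model and that the filter is on an Id column (substitute your actual predicate):
// Translates to a single SQL query that joins Head to Child
// and returns only the child's columns.
var child = _db.Head
    .Where(h => h.Id == id)      // hypothetical filter
    .Select(h => h.Child)
    .FirstOrDefault();
if (child != null)
    foo(child);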
Related
I am using Entity Framework in a C# application and I am using lazy loading. I am experiencing performance issues when calculating the sum of a property in a collection of elements. Let me illustrate it with a simplified version of my code:
public decimal GetPortfolioValue(Guid portfolioId) {
var portfolio = DbContext.Portfolios.FirstOrDefault( x => x.Id.Equals( portfolioId ) );
if (portfolio == null) return 0m;
return portfolio.Items
.Where( i =>
i.Status == ItemStatus.Listed
&&
_activateStatuses.Contains( i.Category.Status )
)
.Sum( i => i.Amount );
}
So I want to fetch the value of all items that have a certain status and whose parent category has a specific status as well.
When logging the queries generated by EF I see it is first fetching my Portfolio (which is fine). Then it does a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item one by one. So if I have a portfolio that contains 100 items (each with a category), it literally does 100 SELECT ... FROM categories WHERE id = ... queries.
So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?
Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:
SELECT
i.id, i.amount
FROM
items i
INNER JOIN categories c ON c.id = i.category_id
WHERE
i.portfolio_id = #portfolioId
AND
i.status = 'listed'
AND
c.status IN ('active', 'pending', ...);
on which it could then calculate the sum (if it is not able to use the SUM directly in the query).
What is the problem, and how can I improve the performance other than writing a raw ADO.NET query instead of using Entity Framework?
To be complete, here are my EF entities:
public class ItemConfiguration : EntityTypeConfiguration<Item> {
    public ItemConfiguration() {
        ToTable("items");
        ...
        HasRequired(p => p.Portfolio);
    }
}
public class CategoryConfiguration : EntityTypeConfiguration<Category> {
    public CategoryConfiguration() {
        ToTable("categories");
        ...
        HasMany(c => c.Products).WithRequired(p => p.Category);
    }
}
EDIT based on comments:
I didn't think it was important, but _activeStatuses is a list of enums.
private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };
But probably more important is that I left out that the status in the database is a string ("active", "pending", ...) but I map them to an enum used in the application. And that is probably why EF cannot evaluate it? The actual code is:
... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...
EDIT2
Indeed the mapping is a big part of the problem but the query itself seems to be the biggest issue. Why is the performance difference so big between these two queries?
// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items.Where(i => i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status))
.Select(i => i.Amount).Sum();
// Fast query
var value = DbContext.Portfolios.Where(p => p.Id.Equals(portfolioId))
.SelectMany(p => p.Items.Where(i =>
i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status)))
.Select(i => i.Amount).Sum();
The first query does a LOT of small SQL queries whereas the second one just combines everything into one bigger query. I'd expect even the first query to run one query to get the portfolio value.
Calling portfolio.Items will lazy load the collection in Items and then execute the subsequent calls, including the Where and Sum expressions, in memory. See also the Loading Related Entities article.
You need to execute the call directly on the DbContext so that the Sum expression can be evaluated on the database server side.
var portfolio = DbContext.Portfolios
.Where(x => x.Id.Equals(portfolioId))
.SelectMany(x => x.Items.Where(i => i.Status == ItemStatus.Listed && _activateStatuses.Contains( i.Category.Status )).Select(i => i.Amount))
.Sum();
You also have to use the appropriate type for the _activateStatuses instance, as the contained values must match the type persisted in the database. If the database persists string values, then you need to pass a list of string values.
var _activateStatuses = new string[] {"Active", "etc"};
You could use a LINQ expression to convert the enums to their string representation.
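For example (a sketch, assuming the database stores the lower-case enum names and that CategoryStatus has a Pending member; adjust the casing to whatever the column actually contains):
// Convert the enum values to the strings persisted in the database.
var _activateStatuses = new[] { CategoryStatus.Active, CategoryStatus.Pending }
    .Select(s => s.ToString().ToLowerInvariant())
    .ToArray();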
Notes
I would recommend turning off lazy loading on your DbContext type (see the sketch below). As soon as you do that, you will start to catch issues like this at run time via exceptions and can then write more performant code.
I did not include error checking for the case where no portfolio is found, but you could extend this code accordingly.
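A minimal sketch of that recommendation, assuming EF6 and a hypothetical PortfolioContext class:
using System.Data.Entity;

public class PortfolioContext : DbContext
{
    public PortfolioContext()
    {
        // Disable lazy loading so navigation properties are only loaded
        // when explicitly requested (via Include or a projection).
        Configuration.LazyLoadingEnabled = false;

        // Optionally also disable proxy creation so plain entities are materialized.
        Configuration.ProxyCreationEnabled = false;
    }

    public DbSet<Portfolio> Portfolios { get; set; }
    public DbSet<Item> Items { get; set; }
}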
Yep, CategoryStatusMapper.MapToEnum cannot be converted to SQL, forcing the Where to run in .NET. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so that the mapping is not required.
private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };
So that the Contains becomes
... && _activateStatuses.Contains(i.Category.Status) ...
and can all be converted to SQL.
UPDATE
Given that i.Category.Status is a string in the database, then
private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };
Consider two entities, Person and Vehicle, where Person has a one-to-many collection Vehicles:
public class Person
{
public IList<Vehicle> Vehicles { get; set;}
}
public class Vehicle
{
public string Name { get; set;}
public Person Owner { get; set; }
}
I display a paginated grid of Persons that have a vehicle, showing the name of the first vehicle in each row. I use the following criteria to fetch the data for the grid view:
var criteria = DetachedCriteria.For<Person>()
    .CreateAlias("Vehicles", "vehicle", JoinType.InnerJoin)
    .SetResultTransformer(new DistinctRootEntityResultTransformer())
    .SetMaxResults(pageSize)
    .SetFirstResult((page - 1) * pageSize);
criteria.Add(Restrictions.Eq("vehicle.Name", "super"));
where page and pageSize are computed values.
The problem is that, since max results and first result are applied in the database while the distinct-root transformation is applied in memory afterwards, the number of rows returned per page does not match the page size.
Is there a way to resolve this issue?
This kind of query should always use a subquery instead of any type of JOIN. That also requires the collection item to have a reference to its parent (as it does in our case).
So, here we create the inner select for Vehicle:
var vehicles = DetachedCriteria.For<Vehicle>();
// add any amount or kind of WHERE parts
vehicles.Add(Restrictions.Eq("Name", "super"));
// and the essential SELECT of the Person ID
vehicles.SetProjection(Projections.Property("Owner.ID"));
Now we can adjust the original query to work only at the root/parent level:
var criteria = DetachedCriteria.For<Person>()
    // instead of this
    // .CreateAlias("Vehicles", "vehicle", JoinType.InnerJoin)
    // we will use the subquery
    .Add(Subqueries.PropertyIn("ID", vehicles))
    // wrong to use this approach at all
    // .SetResultTransformer(new DistinctRootEntityResultTransformer())
    .SetMaxResults(pageSize)
    .SetFirstResult((page - 1) * pageSize);
That will create a SELECT like this:
SELECT p....
FROM Person AS p
WHERE p.ID IN (
    SELECT v.OwnerId
    FROM Vehicle AS v
    WHERE v.Name = 'super' ...
)
See also:
Query on HasMany reference
In NHibernate, using a Disjunction gives double results
And how do we fetch the collection of Vehicles (until now used only for filtering)? The best (if not the only) way is to use 1+1 SELECT statements. The easy, built-in solution is the batch-size setting. Just mark the collection of Vehicles with this setting (e.g. batch-size="25"), and all the data will be loaded efficiently with a few more SELECT statements; see the sketch after the links below. See:
19.1.5. Using batch fetching
How to Eager Load Associations without duplication in NHibernate?
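A minimal sketch of that setting, assuming the mappings are written with Fluent NHibernate (an XML <bag ... batch-size="25"> attribute works the same way); the ID property and OwnerId column names are assumptions:
using FluentNHibernate.Mapping;

public class PersonMap : ClassMap<Person>
{
    public PersonMap()
    {
        Id(x => x.ID);                 // assumed identifier property

        // Load the Vehicles collections of up to 25 Persons
        // with a single SELECT instead of one SELECT per Person.
        HasMany(x => x.Vehicles)
            .KeyColumn("OwnerId")      // assumed foreign-key column
            .BatchSize(25);
    }
}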
I am projecting LINQ to SQL results to strongly typed classes: Parent and Child. The performance difference between these two queries is large:
Slow Query - logging from the DataContext shows that a separate call to the db is being made for each parent
var q = from p in parenttable
        select new Parent()
        {
            id = p.id,
            Children = (from c in childtable
                        where c.parentid == p.id
                        select c).ToList()
        };
return q.ToList(); //SLOW
Fast Query - logging from the DataContext shows a single db hit query that returns all required data
var q = from p in parenttable
        select new Parent()
        {
            id = p.id,
            Children = from c in childtable
                       where c.parentid == p.id
                       select c
        };
return q.ToList(); //FAST
I want to force LINQ to use the single-query style of the second example, but populate the Parent classes with their Children objects directly. Otherwise, the Children property is an IQueryable<Child> that has to be queried to expose the Child objects.
The referenced questions do not appear to address my situation. Using db.LoadOptions does not work; perhaps it requires the type to be a TEntity registered with the DataContext.
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Parent>(p => p.Children);
db.LoadOptions = options;
Please note: Parent and Child are simple types, not Table<TEntity> types, and there is no contextual relationship between Parent and Child. The subqueries are ad hoc.
The crux of the issue: in the second LINQ example I build IQueryable statements and never call ToList(), and for some reason LINQ knows how to generate one single query that retrieves all the required data. How do I populate my ad-hoc projection with the actual data, as is accomplished in the first query? Also, if anyone could help me phrase my question better, I would appreciate it.
It's important to remember that LINQ queries rely on deferred execution. In your second query you aren't actually fetching any information about the children. You've created the queries, but you haven't actually executed them to get their results. If you were to iterate the list, and then iterate the Children collection of each item, you'd see it taking as much time as the first query.
Your query is also inherently very inefficient. You're using a nested query to represent a join relationship. If you use a join instead, the query can be optimized appropriately by both the query provider and the database and execute much more quickly. You may also need to adjust the indexes on your database to improve performance. Here is how the join might look:
var q = from p in parenttable
        join child in childtable
            on p.id equals child.parentid into children
        select new Parent()
        {
            id = p.id,
            Children = children.ToList(),
        };
return q.ToList();
The fastest way I found to accomplish this is to run one query that returns all the results and then group them. Make sure you call .ToList() on the first query so that the grouping doesn't trigger many separate calls.
Here r should contain what you want, obtained with only a single DB query.
var q = (from p in parenttable
         join c in childtable on p.id equals c.parentid
         select c).ToList();
var r = q.GroupBy(x => x.parentid).Select(x => new { id = x.Key, Children = x });
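If you want the original strongly typed Parent objects rather than an anonymous type, the same grouping can be mapped back; a sketch, assuming Parent.Children is a List<Child> as in the slow query above:
var parents = q
    .GroupBy(c => c.parentid)
    .Select(g => new Parent
    {
        id = g.Key,
        // Materialize each group into the Children list of its Parent.
        Children = g.ToList()
    })
    .ToList();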
You must set the correct options for your data load.
options.LoadWith<Document>(d => d.Metadata);
Look at this
P.S. Include is for LINQ to Entities only.
The second query is fast precisely because Children is not being populated.
And the first one is slow just because Children is being populated.
Choose the one that fits your needs best; you simply can't have both behaviors together!
EDIT:
As @Servy says:
In your second query you aren't actually fetching any information about the children. You've created the queries, but you haven't actually executed them to get the results of those queries. If you were to iterate the list, and then iterate the Children collection of each item you'd see it taking as much time as the first query.
I need to select data in Entity Framework but have to filter on the children and great-grandchildren.
I have 4 tables: Parent -> Child -> GrandChild -> GreatGrandChild. I want to return all the parents but filter on the child and great-grandchild tables.
In other words (for example):
SELECT Parent.*
FROM Parent
INNER JOIN Child
INNER JOIN Grandchild
INNER JOIN GreatGrandChild
WHERE child.Column5 = 600 AND
GreatGrandChild.Column3 = 1000
It can't be an anonymous type because I need to update the data and call SaveChanges on the DB.
I'm using VS 2010 and EF 4.0.
Using LINQ, you need something like this:
var q = from q1 in dbContext.Parent
        join q2 in dbContext.Children
            on q1.key equals q2.fkey
        join q3 in ........
        where q4.col1 == 3000
        select q1;
This query should do what you want. Yes, it is a bit of a mess because it is so deeply nested.
var result = context.Parent
.Where(parent => parent.Child
.Any(child => (child.Column5 == 600) &&
child.GrandChild
.Any(grandchild => grandchild.GreatGrandChild
.Any(greatgrandchild => greatgrandchild.Column3 == 1000))));
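Because this returns tracked Parent entities rather than an anonymous projection, you can modify them and save the changes afterwards, which is what the question asks for. A sketch; SomeColumn is a hypothetical property:
foreach (var parent in result)
{
    // Update whatever needs to change on the filtered parents.
    parent.SomeColumn = "updated value";   // hypothetical property
}
context.SaveChanges();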
Your table structure (if your example is not just for illustration) leads me to think you may want to reconsider your model here (i.e. are the children really separate entity types, or should this be a defined relationship?).
What you describe is just joins and a where clause, though, and is written essentially the same way, assuming you are returning DbSet from your DbContext:
_context.Parents.Join(context.Child, p => p.ID, c => c.ParentID, ...)
    .Join(...Grandchild...).Where(o => o.Column5 == 600)
    .Join(...GreatGrandChild...).Where(o => o.Column3 == 1000)
EDIT: to get back the strongly typed entities you might need to do something like:
var greatgrandchildren = context.GreatGrandchildren.Where(o => o.Column3 == 1000).ToList();
var grandchildren = context.Grandchildren.Where(o => o.Column3 == 600 && greatgrandchildren.Contains(o)).ToList();
var children = context.Children.Where(o => grandchildren.Contains(o)).ToList();
var parents = context.Parent.Where(o => children.Contains(o)).ToList();
My syntax might be off, and someone is welcome to correct it. Can I avoid .ToList() to defer the round-trips until the last call?
I have 5 tables in a LINQ to SQL classes DBML: Global >> Categories >> ItemType >> Item >> ItemData. For the example below I have only gone as far as ItemType.
//cdc is my datacontext
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Global>(p => p.Category);
options.AssociateWith<Global>(p => p.Category.OrderBy(o => o.SortOrder));
options.LoadWith<Category>(p => p.ItemTypes);
options.AssociateWith<Category>(p => p.ItemTypes.OrderBy(o => o.SortOrder));
cdc.LoadOptions = options;
TraceTextWriter traceWriter = new TraceTextWriter();
cdc.Log = traceWriter;
var query =
    from g in cdc.Global
    where g.active == true && g.globalid == 41
    select g;
var globalList = query.ToList();
// In this case I have hardcoded an id while I figure this out
// but intend on trying to figure out a way to include something like globalid in (#,#,#)
foreach (var g in globalList)
{
    // I only have one result set, but if I had multiple globals this would run however many
    // times and execute multiple queries, like it does farther down in the hierarchy
    List<Category> categoryList = g.category.ToList<Category>();

    // Doing some processing that sticks the parent record into a hierarchical collection
    var categories = (from comp in categoryList
                      where comp.Type == i
                      select comp).ToList<Category>();

    foreach (var c in categories)
    {
        // Doing some processing that sticks child records into a hierarchical collection.
        // Here is where multiple queries are run for each type collection in the category.
        // I want to somehow run this above the loop once, where I can get all the Items
        // for the categories, and just do a filter.
        List<ItemType> typeList = c.ItemTypes.ToList<ItemType>();

        var itemTypes = (from cat in typeList
                         where cat.itemLevel == 2
                         select cat).ToList<ItemType>();

        foreach (var t in itemTypes)
        {
            // Doing some processing that sticks child records into a hierarchical collection
        }
    }
}
The line "List<ItemType> typeList = c.ItemTypes.ToList<ItemType>();" is executed numerous times in the foreach, and a query runs each time to fetch the results. I understand why to an extent, but I thought LoadWith would eager load, as in fetch everything with one query.
So basically I would have expected L2S, behind the scenes, to fetch the "global" records in one query, take any primary key values, and get the "category" children using one query. Then take those results and stick them into collections linked to the global. Then take all the category keys and execute one query to fetch the ItemType children and link those into their associated collections. Something on the order of: SELECT * FROM ItemTypes WHERE CategoryID IN (SELECT CategoryID FROM Categories WHERE GlobalID IN (#,#,#)).
I would like to know how to properly eager load the associated children with minimal queries, and possibly how to accomplish my routine generically without knowing in advance how far down I need to build the hierarchy: given a parent entity, grab all the associated child collections and then do what I need to do.
LINQ to SQL has some limitations with respect to eager loading:
So eager load in LINQ to SQL is only eager loading for one level at a time. As it is for lazy loading, with load options we will still issue one query per row (or object) at the root level, and this is something we really want to avoid to spare the database, which is kind of the point with eager loading. The way LINQ to SQL issues queries for the hierarchy will decrease the performance by log(n), where n is the number of root objects. Calling ToList won't change the behavior, but it will control when in time all the queries will be issued to the database.
For details see:
http://www.cnblogs.com/cw_volcano/archive/2012/07/31/2616729.html
I am sure this could be done better, but I got my code working with minimal queries. One per level. This is obviously not really eager loading using L2S, but if someone knows the right way I would like to know for future reference.
var query =
    from g in cdc.Global
    where g.active == true && g.globalId == 41
    select g;
var globalList = query.ToList();

var g = globalList.First(); // single global in this example (id 41)
List<Category> categoryList = g.category.ToList<Category>();

var categoryIds = from c in cdc.Category
                  where c.globalId == g.globalId
                  select c.categoryId;

var types = from t in cdc.ItemTypes
            where categoryIds.Any(i => i == t.categoryId)
            select t;
List<ItemType> typeList = types.ToList<ItemType>();

var items = from i in cdc.Items
            from d in cdc.ItemData
            where i.ItemId == d.ItemId && d.labelId == 1
            where types.Any(t => t == i.ItemTypes)
            select new
            {
                i.Id,
                // A bunch more fields, shortened for brevity
                d.Data
            };
var ItemList = items.ToList();
// Keep going down the hierarchy if you need more child results.
// Do your processing (pseudocode):
//   for each item in list
//       filter child list
//       for each item in child list
//           .....
I wouldn't mind knowing how to do this all using generics and a recursive method, given the top-level table.