It's more a technical (behind the scenes of EF) kind of question for a better understanding of Include for my own.
Does it make the query faster to Include another table when using a Select statement at the end?
ctx.tableOne.Include("tableTwo").Where(t1 => t1.Value1 == "SomeValueFor").Select(res => new {
res.Value1,
res.tableTwo.Value1,
res.tableTwo.Value2,
res.tableTwo.Value3,
res.tableTwo.Value4
});
Does it may depend on the number of values included from another table?
In the example above, 4 out of 5 values are from the included table. I wonder if it does have any performance-impact. Even a good or a bad one?
So, my question is: what is EF doing behind the scenes and is there any preferred way to use Include when knowing all the values I will select before?
In your case it doesn't matter if you use Include(<relation-property-name>) or not because you don't materialize the values before the Select(<mapping-expression>). If you use the SQL Server Profiler (or other profiler) you can see that EF generates two exactly the same queries.
The reason for this is because the data is not materialized in memory before the Select - you are working on IQueryable which means EF will generate an SQL query at the end (before calling First(), Single(), FirstOrDefault(), SingleOrDefault(), ToList() or use the collection in a foreach statement). If you use ToList() before the Select() it will materialize the entities from the database into your memory where Include() will come in hand not to make N+1 queries when accessing nested properties to other tables.
It is about how you want EF to load your data. If you want A 'Table' data to be pre populated than use Include. It is more handy if Include statement table is going to be used more frequently and it will be little slower as EF has to load all the relevant date before hand. Read the difference between Lazy and Eager loading. by using Include, it will be the eager loading where data will be pre populated while on the other hand EF will send a call to the secondary table when projection takes place i-e Lazy loading.
I agree with #Karamfilov for his general discussion, but in your example your query could not be the most performant. The performance can be affected by many factors, such as indexes present on the table, but you must always help EF in the generation of SQL. The Include method can produce a SQL that includes all columns of the table, you should always check what is the generated SQL and verify if you can obtain a better one using a Join.
This article explains the techniques that can be used and what impact they have on performance: https://msdn.microsoft.com/it-it/library/bb896272(v=vs.110).aspx
Related
As part of a small .NET Core 3 project, I'm trying to use the data model based in Entity Framework, but I'm having some troubles related with queries on joined tables.
When looking for data matching a condition in a single table, the model is easy to understand
List<Element> listOfElements = context.Elements.Where(predicate).ToList();
However, when this query requires joined tables, I'm not sure how to do it efficiently. After some investigation, it seems that the include (and theninclude) methods are the way to go, but I have the impression that the Where clause after the include is not executed at DB level but after all the data has been retrieved. This might work with small datasets, but I don't think it's a good idea for a production system with millions of rows.
List<Element> listOfElements = context.Elements.Include(x => x.SubElement).
Where(predicate).ToList();
I've seen some examples using EF+ library, but I'm looking for a solution using the nominal EF Core. Is there any clean/elegant way to do it?
Thank you.
There are a few scenarios when the data from DB is populating:
Deferred query execution: this is when you try to access your query results, for example, in the foreach statement.
Immediate Query Execution: this when you call ToList() (or conversions to other collections, like ToArray()).
I think the answer's to your question:
... but I have the impression that the Where clause after the include is not executed at DB level but after all the data has been retrieved.
is that your assumptions are wrong because you are calling ToList() at the end, not before the Where method.
For more information please also check here.
I've another suggestion also: to be sure about what is exactly executing at the DB level run SQL Server Profiler when executing your query.
Hope this will help ))
I have a question regarding the .AsNoTracking() extension, as this is all quite new and quite confusing.
I'm using a per-request context for a website.
A lot of my entities don't change so don't need to be tracked, but I have the following scenario where I'm unsure of what's going to the database, or even whether it makes a difference in this case.
This example is what I'm currently doing:
context.Set<User>().AsNoTracking()
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
This is the same as above but removing the .AsNoTracking() from Step 1:
context.Set<User>();
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
The Steps 1 & 2 use the same context but occur at different times. What I can't work out is whether there is any difference. As Step 2 is an update I'm guessing both will hit the database twice anyway.
Can anyone tell me what the difference is?
The difference is that in the first case the retrieved user is not tracked by the context so when you are going to save the user back to database you must attach it and set correctly state of the user so that EF knows that it should update existing user instead of inserting a new one. In the second case you don't need to do that if you load and save the user with the same context instance because the tracking mechanism handles that for you.
see this page Entity Framework and AsNoTracking
What AsNoTracking Does
Entity Framework exposes a number of performance tuning options to help you optimise the performance of your applications. One of these tuning options is .AsNoTracking(). This optimisation allows you to tell Entity Framework not to track the results of a query. This means that Entity Framework performs no additional processing or storage of the entities which are returned by the query. However, it also means that you can't update these entities without reattaching them to the tracking graph.
there are significant performance gains to be had by using AsNoTracking
No Tracking LINQ to Entities queries
Usage of AsNoTracking() is recommended when your query is meant for read operations. In these scenarios, you get back your entities but they are not tracked by your context.This ensures minimal memory usage and optimal performance
Pros
Improved performance over regular LINQ queries.
Fully materialized objects.
Simplest to write with syntax built into the programming
language.
Cons
Not suitable for CUD operations.
Certain technical restrictions, such as: Patterns using DefaultIfEmpty for
OUTER JOIN queries result in more complex queries than simple OUTER
JOIN statements in Entity SQL.
You still can’t use LIKE with general pattern matching.
More info available here:
Performance considerations for Entity Framework
Entity Framework and NoTracking
Disabling tracking will also cause your result sets to be streamed into memory. This is more efficient when you're working with large sets of data and don't need the entire set of data all at once.
References:
How to avoid memory overflow when querying large datasets with Entity Framework and LINQ
Entity framework large data set, out of memory exception
AsNoTracking() allows the "unique key per record" requirement in EF to be bypassed (not mentioned explicitly by other answers).
This is extremely helpful when reading a View that does not support a unique key because perhaps some fields are nullable or the nature of the view is not logically indexable.
For these cases the "key" can be set to any non-nullable column but then AsNoTracking() must be used with every query else records (duplicate by key) will be skipped.
If you have something else altering the DB (say another process) and need to ensure you see these changes, use AsNoTracking(), otherwise EF may give you the last copy that your context had instead, hence it being good to usually use a new context every query:
http://codethug.com/2016/02/19/Entity-Framework-Cache-Busting/
I have some distraction when i speak with my collegue and he said that Include in EF and Join in SQL is different things. Can you explain is it true? If yes, explaint in what difference is?
Thanks.
When querying EF through linq or lambda expressions, you only need join statements if the underlying schema doesn't provide FKs, and thus you don't have navigation properties on the objects.
On the other side, include (eager loading) and lazy loading can only work if there are FKs, because it uses the navigation properties.
The underlying sql in both cases will use joins (as sql has no "navigation property" concept).
As for performance, it depends on situations. Lazy loading vs Eager loading (so in FK scenario) can be a difficult choice.
I usually go with lazy loading, useful when you have a large main result, but you need "join" data only of a few items of the whole resultset.
If you know ahead that you'll need the join data of the whole resultset, eager loading could be better for performance. I'd suggest to experiment and see for yourself.
Taken from Understanding EF. Include vs Joins
When querying a large table where you need to access the navigation properties later on in code (I explicitly don't want to use lazy loading) what will perform better .Include() or .Load()? Or why use the one over the other?
In this example the included tables all only have about 10 entries and employees has about 200 entries, and it can happen that most of those will be loaded anyway with include because they match the where clause.
Context.Measurements.Include(m => m.Product)
.Include(m => m.ProductVersion)
.Include(m => m.Line)
.Include(m => m.MeasureEmployee)
.Include(m => m.MeasurementType)
.Where(m => m.MeasurementTime >= DateTime.Now.AddDays(-1))
.ToList();
or
Context.Products.Load();
Context.ProductVersions.Load();
Context.Lines.Load();
Context.Employees.Load();
Context.MeasurementType.Load();
Context.Measurements.Where(m => m.MeasurementTime >= DateTime.Now.AddDays(-1))
.ToList();
It depends, try both
When using Include(), you get the benefit of loading all of your data in a single call to the underlying data store. If this is a remote SQL Server, for example, that can be a major performance boost.
The downside is that Include() queries tend to get really complicated, especially if you have any filters (Where() calls, for example) or try to do any grouping. EF will generate very heavily nested queries using sub-SELECT and APPLY statements to get the data you want. It is also much less efficient -- you get back a single row of data with every possible child-object column in it, so data for your top level objects will be repeated a lot of times. (For example, a single parent object with 10 children will product 10 rows, each with the same data for the parent-object's columns.) I've had single EF queries get so complex they caused deadlocks when running at the same time as EF update logic.
The Load() method is much simpler. Each query is a single, easy, straightforward SELECT statement against a single table. These are much easier in every possible way, except you have to do many of them (possibly many times more). If you have nested collections of collections, you may even need to loop through your top level objects and Load their sub-objects. It can get out of hand.
Quick rule-of-thumb
Try to avoid having any more than three Include calls in a single query. I find that EF's queries get too ugly to recognize beyond that; it also matches my rule-of-thumb for SQL Server queries, that up to four JOIN statements in a single query works very well, but after that it's time to consider refactoring.
However, all of that is only a starting point.
It depends on your schema, your environment, your data, and many other factors.
In the end, you will just need to try it out each way.
Pick a reasonable "default" pattern to use, see if it's good enough, and if not, optimize to taste.
Include() will be written to SQL as JOIN: one database roundtrip.
Each Load()-instruction is "explicitly loading" the requested entities, so one database roundtrip per call.
Thus Include() will most probably be the more sensible choice in this case, but it depends on the database layout, how often this code is called and how long your DbContext lives. Why don't you try both ways and profile the queries and compare the timings?
See Loading Related Entities.
I agree with #MichaelEdenfield in his answer but I did want to comment on the nested collections scenario. You can get around having to do nested loops (and the many resulting calls to the database) by turning the query inside out.
Rather than loop down through a Customer's Orders collection and then performing another nested loop through the Order's OrderItems collection say, you can query the OrderItems directly with a filter such as the following.
context.OrderItems.Where(x => x.Order.CustomerId == customerId);
You will get the same resulting data as the Loads within nested loops but with just a single call to the database.
Also, there is a special case that should be considered with Includes. If the relationship between the parent and the child is one to one then the problem with the parent data being returned multiple times would not be an issue.
I am not sure what the effect would be if the majority case was where no child existed - lots of nulls? Sparse children in a one to one relationship might be better suited to the direct query technique that I outlined above.
Include is an example of eager loading, where as you not only load the entities you are querying for, but also all related entities.
Load is an manual override of the EnableLazyLoading. If this one is set to false. You can still lazily load the entity you asked for with .Load()
It's always hard to decide whether to go with Eager, Explicit or even Lazy Loading.
What I would recommend anyway is always to perform some profiling. That's the only way to be sure your request will be performant or not.
There're a lot of tools that will help you out. Have a look at this article from Julie Lerman where she lists several different ways to do profiling. One simple solution is to start profiling in your SQL Server Management Studio.
Do not hesitate to talk with a DBA (if you have on near you) that will help you to understand the execution plan.
You could also have a look a this presentation where I wrote a section about loading data and performance.
One more thing to add to this thread. It depends on what server you use. If you are working on sql server it's ok to use eager loading but for sqlite you will have to use .Load() to avoid crossloading exception cause sqlite can not deal with some include statements that go deeper than one dependency level
Updated answer: As of EF Core 5.0 you can use AsSplitQuery().
This is particularly useful and I personally use it all the time when I have many joins
which will result in a possible cartesian explosion, or will just take more time to complete.
As the name implies, EF will execute separate queries for each Entity instead of using joins.
So, where you would use Explicit loading, you can now use Eager loading with split queries to achieve the same result, and it's definitely more readable imo.
See https://learn.microsoft.com/en-us/ef/core/querying/single-split-queries
I have a question regarding the .AsNoTracking() extension, as this is all quite new and quite confusing.
I'm using a per-request context for a website.
A lot of my entities don't change so don't need to be tracked, but I have the following scenario where I'm unsure of what's going to the database, or even whether it makes a difference in this case.
This example is what I'm currently doing:
context.Set<User>().AsNoTracking()
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
This is the same as above but removing the .AsNoTracking() from Step 1:
context.Set<User>();
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
The Steps 1 & 2 use the same context but occur at different times. What I can't work out is whether there is any difference. As Step 2 is an update I'm guessing both will hit the database twice anyway.
Can anyone tell me what the difference is?
The difference is that in the first case the retrieved user is not tracked by the context so when you are going to save the user back to database you must attach it and set correctly state of the user so that EF knows that it should update existing user instead of inserting a new one. In the second case you don't need to do that if you load and save the user with the same context instance because the tracking mechanism handles that for you.
see this page Entity Framework and AsNoTracking
What AsNoTracking Does
Entity Framework exposes a number of performance tuning options to help you optimise the performance of your applications. One of these tuning options is .AsNoTracking(). This optimisation allows you to tell Entity Framework not to track the results of a query. This means that Entity Framework performs no additional processing or storage of the entities which are returned by the query. However, it also means that you can't update these entities without reattaching them to the tracking graph.
there are significant performance gains to be had by using AsNoTracking
No Tracking LINQ to Entities queries
Usage of AsNoTracking() is recommended when your query is meant for read operations. In these scenarios, you get back your entities but they are not tracked by your context.This ensures minimal memory usage and optimal performance
Pros
Improved performance over regular LINQ queries.
Fully materialized objects.
Simplest to write with syntax built into the programming
language.
Cons
Not suitable for CUD operations.
Certain technical restrictions, such as: Patterns using DefaultIfEmpty for
OUTER JOIN queries result in more complex queries than simple OUTER
JOIN statements in Entity SQL.
You still can’t use LIKE with general pattern matching.
More info available here:
Performance considerations for Entity Framework
Entity Framework and NoTracking
Disabling tracking will also cause your result sets to be streamed into memory. This is more efficient when you're working with large sets of data and don't need the entire set of data all at once.
References:
How to avoid memory overflow when querying large datasets with Entity Framework and LINQ
Entity framework large data set, out of memory exception
AsNoTracking() allows the "unique key per record" requirement in EF to be bypassed (not mentioned explicitly by other answers).
This is extremely helpful when reading a View that does not support a unique key because perhaps some fields are nullable or the nature of the view is not logically indexable.
For these cases the "key" can be set to any non-nullable column but then AsNoTracking() must be used with every query else records (duplicate by key) will be skipped.
If you have something else altering the DB (say another process) and need to ensure you see these changes, use AsNoTracking(), otherwise EF may give you the last copy that your context had instead, hence it being good to usually use a new context every query:
http://codethug.com/2016/02/19/Entity-Framework-Cache-Busting/