Entity framework object caching behaviour

Entity framework object caching behaviour - c#

I was sure that issuing a FirstOrDefault() on entity framework will issue a query on the database and to hit cache I can use .Find().
But helping a colleague debug a problem in EF4.2 (query data didn't reflect the db data, and query wasn't in the sql profiler) got me to the solution that works. So use ".AsNoTracking()", and I later found other posts here that states:
" The following will sometimes pull from the EF cache" :
.FirstOrDefault();
And another post with same issue: here
If you go to msdn:
By default when an entity is returned in the results of a query, just before EF materializes it, the ObjectContext will check if an entity with the same key has already been loaded into its ObjectStateManager. If an entity with the same keys is already present EF will include it in the results of the query. Although EF will still issue the query against the database, this behavior can bypass much of the cost of materializing the entity multiple times.
.........
Unlike a regular query, the Find method in DbSet (APIs included for
the first time in EF 4.1) will perform a search in memory before even
issuing the query against the database.
And other SO posts: here and here seem to support msdn?
So how does EF object caching behave?
always hits DB except for .FirstOrDefault() when it works only sometimes (black box) so it's better to use .AsNoTracking().FirstOrDefault() just to be sure?
I should expect similar issues with other methods like: First(), Single(), SingleOrDefault()

Related

In Entity Framework Core, how can I query data on joined tables with a condition at DB level (not local LINQ)?

As part of a small .NET Core 3 project, I'm trying to use the data model based in Entity Framework, but I'm having some troubles related with queries on joined tables.
When looking for data matching a condition in a single table, the model is easy to understand
List<Element> listOfElements = context.Elements.Where(predicate).ToList();
However, when this query requires joined tables, I'm not sure how to do it efficiently. After some investigation, it seems that the include (and theninclude) methods are the way to go, but I have the impression that the Where clause after the include is not executed at DB level but after all the data has been retrieved. This might work with small datasets, but I don't think it's a good idea for a production system with millions of rows.
List<Element> listOfElements = context.Elements.Include(x => x.SubElement).
Where(predicate).ToList();
I've seen some examples using EF+ library, but I'm looking for a solution using the nominal EF Core. Is there any clean/elegant way to do it?
Thank you.

There are a few scenarios when the data from DB is populating:
Deferred query execution: this is when you try to access your query results, for example, in the foreach statement.
Immediate Query Execution: this when you call ToList() (or conversions to other collections, like ToArray()).
I think the answer's to your question:
... but I have the impression that the Where clause after the include is not executed at DB level but after all the data has been retrieved.
is that your assumptions are wrong because you are calling ToList() at the end, not before the Where method.
For more information please also check here.
I've another suggestion also: to be sure about what is exactly executing at the DB level run SQL Server Profiler when executing your query.
Hope this will help ))

Entity Framework under the hood

I have some legacy code that uses Entity Framework.
When I debug the code I can see that EF DbContext contains the whole table. It was passed by OData to the frontend, and then angular processed it.
So I tried to search, is it possible to get only a single record by EF?
Everywhere I see the SingleOrDefault method, or other IQueryable, but as I understood, these are parts of the collections.
Microsoft says: Sometimes the value of default(TSource) is not the default value that you want to use if the collection contains no elements.
Does that mean EF always get all the data from the table and I can use them later?
Or is there a way to force inner query to get only one, and only one row?
We are using postgresql.

With Entity Framework, you can use LINQ to run queries and get single records or limited sets. However, in your .NET project the controller should be parsing OData query parameters and filtering the dataset before returning results to the client application. Please check your Controller code against this tutorial to see if you might be missing something.
If you are somehow bypassing the built-in OData framework, what might help is understanding which queries execute immediately vs which ones are deferred. See this list to understand exactly which operations will force a trip to the database and try to hold off on anything with immediate execution until as late as possible.

No, EF will not SELECT the entire table into memory if you use it correctly. By correctly; I mean:
context.Table.First();
Will translate into a SQL query that only returns one row, that will then map to an object to be returned to the calling code. This is because the above code uses LINQ-to-Entities. If you did something like this instead:
context.Table.ToList().First();
Then the entire table is selected to make the ToList work, and LINQ-to-Objects handles the First. So as long as you do your queries with lazy enumeration (not realizing the result ahead of time), you'll be fine.

Entity Framework performance of include

It's more a technical (behind the scenes of EF) kind of question for a better understanding of Include for my own.
Does it make the query faster to Include another table when using a Select statement at the end?
ctx.tableOne.Include("tableTwo").Where(t1 => t1.Value1 == "SomeValueFor").Select(res => new {
res.Value1,
res.tableTwo.Value1,
res.tableTwo.Value2,
res.tableTwo.Value3,
res.tableTwo.Value4
});
Does it may depend on the number of values included from another table?
In the example above, 4 out of 5 values are from the included table. I wonder if it does have any performance-impact. Even a good or a bad one?
So, my question is: what is EF doing behind the scenes and is there any preferred way to use Include when knowing all the values I will select before?

In your case it doesn't matter if you use Include(<relation-property-name>) or not because you don't materialize the values before the Select(<mapping-expression>). If you use the SQL Server Profiler (or other profiler) you can see that EF generates two exactly the same queries.
The reason for this is because the data is not materialized in memory before the Select - you are working on IQueryable which means EF will generate an SQL query at the end (before calling First(), Single(), FirstOrDefault(), SingleOrDefault(), ToList() or use the collection in a foreach statement). If you use ToList() before the Select() it will materialize the entities from the database into your memory where Include() will come in hand not to make N+1 queries when accessing nested properties to other tables.

It is about how you want EF to load your data. If you want A 'Table' data to be pre populated than use Include. It is more handy if Include statement table is going to be used more frequently and it will be little slower as EF has to load all the relevant date before hand. Read the difference between Lazy and Eager loading. by using Include, it will be the eager loading where data will be pre populated while on the other hand EF will send a call to the secondary table when projection takes place i-e Lazy loading.

I agree with #Karamfilov for his general discussion, but in your example your query could not be the most performant. The performance can be affected by many factors, such as indexes present on the table, but you must always help EF in the generation of SQL. The Include method can produce a SQL that includes all columns of the table, you should always check what is the generated SQL and verify if you can obtain a better one using a Join.
This article explains the techniques that can be used and what impact they have on performance: https://msdn.microsoft.com/it-it/library/bb896272(v=vs.110).aspx

Entity Framework query ToString won't produce SQL query

Through my research I've discovered that, since at least EF 4.1, the .ToString() method on an EF query will return the SQL to be run. Indeed, this very frequently works for me, using Entity Framework 5 and 6.
However, occasionally I call this method and get the runtime type of the query object.
Here is my specific example:
Entity input = ...;
IQueryable<Entity> query = dbContext.SetOfEntity.Where(e => e.Prop == input.Prop);
More specifically, I'm setting a breakpoint in VS2013 and hovering over the query object, and seeingSystem.Data.Entity.Infrastructure.DbQuery<Namespace.Entity> instead of the SQL to run that query. Interestingly enough, if I hover over the DBSet property (dbContext.SetOfEntity), I do see the basic select SQL for the associated table. It's only when I filter the results that I lose the SQL.
Obviously, this is a pretty simple query and I could work out the SQL for myself, but this problem has happened on more complex queries, and it would be nice to be able to debug the SQL being sent to the server without running a database trace.
Some background
A while back, I was using EF5 and the ToString() seemed to work. Shortly before switching to EF6, it seemed that none of the queries showed me SQL, but after switching to EF6, it went back to the correct behavior.
Additionally, whenever I hover over IQueryable queries and try to use the IDE's "Results View" feature, it tells me "Children could not be evaluated". This may be a separate issue, but I figure I'd include it in case it had a common cause.

If you do not need the SQL BEFORE it is executed against the DB you can do the following:
dbContext.Database.Log = s => Debug.WriteLine(s);
This would print the SQL (and some additional data) to the debug output.
See the following link for details: http://msdn.microsoft.com/de-DE/data/dn469464
Also check, as martin_costello suggested, that you are not querying the DB before you try to get the SQL via ToString(). It happened to me too, that I already got objects, because of using IEnumerable<> "to early" (instead if IQueryable<>) and so got way to many entities from the DB and did some filtering "in code" instead of "in SQL"...

What difference does .AsNoTracking() make?

I have a question regarding the .AsNoTracking() extension, as this is all quite new and quite confusing.
I'm using a per-request context for a website.
A lot of my entities don't change so don't need to be tracked, but I have the following scenario where I'm unsure of what's going to the database, or even whether it makes a difference in this case.
This example is what I'm currently doing:
context.Set<User>().AsNoTracking()
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
This is the same as above but removing the .AsNoTracking() from Step 1:
context.Set<User>();
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
The Steps 1 & 2 use the same context but occur at different times. What I can't work out is whether there is any difference. As Step 2 is an update I'm guessing both will hit the database twice anyway.
Can anyone tell me what the difference is?

The difference is that in the first case the retrieved user is not tracked by the context so when you are going to save the user back to database you must attach it and set correctly state of the user so that EF knows that it should update existing user instead of inserting a new one. In the second case you don't need to do that if you load and save the user with the same context instance because the tracking mechanism handles that for you.

see this page Entity Framework and AsNoTracking
What AsNoTracking Does
Entity Framework exposes a number of performance tuning options to help you optimise the performance of your applications. One of these tuning options is .AsNoTracking(). This optimisation allows you to tell Entity Framework not to track the results of a query. This means that Entity Framework performs no additional processing or storage of the entities which are returned by the query. However, it also means that you can't update these entities without reattaching them to the tracking graph.
there are significant performance gains to be had by using AsNoTracking

No Tracking LINQ to Entities queries
Usage of AsNoTracking() is recommended when your query is meant for read operations. In these scenarios, you get back your entities but they are not tracked by your context.This ensures minimal memory usage and optimal performance
Pros
Improved performance over regular LINQ queries.
Fully materialized objects.
Simplest to write with syntax built into the programming
language.
Cons
Not suitable for CUD operations.
Certain technical restrictions, such as: Patterns using DefaultIfEmpty for
OUTER JOIN queries result in more complex queries than simple OUTER
JOIN statements in Entity SQL.
You still can’t use LIKE with general pattern matching.
More info available here:
Performance considerations for Entity Framework
Entity Framework and NoTracking

Disabling tracking will also cause your result sets to be streamed into memory. This is more efficient when you're working with large sets of data and don't need the entire set of data all at once.
References:
How to avoid memory overflow when querying large datasets with Entity Framework and LINQ
Entity framework large data set, out of memory exception

AsNoTracking() allows the "unique key per record" requirement in EF to be bypassed (not mentioned explicitly by other answers).
This is extremely helpful when reading a View that does not support a unique key because perhaps some fields are nullable or the nature of the view is not logically indexable.
For these cases the "key" can be set to any non-nullable column but then AsNoTracking() must be used with every query else records (duplicate by key) will be skipped.

If you have something else altering the DB (say another process) and need to ensure you see these changes, use AsNoTracking(), otherwise EF may give you the last copy that your context had instead, hence it being good to usually use a new context every query:
http://codethug.com/2016/02/19/Entity-Framework-Cache-Busting/

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.