What difference does .AsNoTracking() make? - c#

I have a question regarding the .AsNoTracking() extension, as this is all quite new and quite confusing.
I'm using a per-request context for a website.
A lot of my entities don't change so don't need to be tracked, but I have the following scenario where I'm unsure of what's going to the database, or even whether it makes a difference in this case.
This example is what I'm currently doing:
context.Set<User>().AsNoTracking()
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
This is the same as above but removing the .AsNoTracking() from Step 1:
context.Set<User>();
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
The Steps 1 & 2 use the same context but occur at different times. What I can't work out is whether there is any difference. As Step 2 is an update I'm guessing both will hit the database twice anyway.
Can anyone tell me what the difference is?

The difference is that in the first case the retrieved user is not tracked by the context so when you are going to save the user back to database you must attach it and set correctly state of the user so that EF knows that it should update existing user instead of inserting a new one. In the second case you don't need to do that if you load and save the user with the same context instance because the tracking mechanism handles that for you.

see this page Entity Framework and AsNoTracking
What AsNoTracking Does
Entity Framework exposes a number of performance tuning options to help you optimise the performance of your applications. One of these tuning options is .AsNoTracking(). This optimisation allows you to tell Entity Framework not to track the results of a query. This means that Entity Framework performs no additional processing or storage of the entities which are returned by the query. However, it also means that you can't update these entities without reattaching them to the tracking graph.
there are significant performance gains to be had by using AsNoTracking

No Tracking LINQ to Entities queries
Usage of AsNoTracking() is recommended when your query is meant for read operations. In these scenarios, you get back your entities but they are not tracked by your context.This ensures minimal memory usage and optimal performance
Pros
Improved performance over regular LINQ queries.
Fully materialized objects.
Simplest to write with syntax built into the programming
language.
Cons
Not suitable for CUD operations.
Certain technical restrictions, such as: Patterns using DefaultIfEmpty for
OUTER JOIN queries result in more complex queries than simple OUTER
JOIN statements in Entity SQL.
You still can’t use LIKE with general pattern matching.
More info available here:
Performance considerations for Entity Framework
Entity Framework and NoTracking

Disabling tracking will also cause your result sets to be streamed into memory. This is more efficient when you're working with large sets of data and don't need the entire set of data all at once.
References:
How to avoid memory overflow when querying large datasets with Entity Framework and LINQ
Entity framework large data set, out of memory exception

AsNoTracking() allows the "unique key per record" requirement in EF to be bypassed (not mentioned explicitly by other answers).
This is extremely helpful when reading a View that does not support a unique key because perhaps some fields are nullable or the nature of the view is not logically indexable.
For these cases the "key" can be set to any non-nullable column but then AsNoTracking() must be used with every query else records (duplicate by key) will be skipped.

If you have something else altering the DB (say another process) and need to ensure you see these changes, use AsNoTracking(), otherwise EF may give you the last copy that your context had instead, hence it being good to usually use a new context every query:
http://codethug.com/2016/02/19/Entity-Framework-Cache-Busting/

Related

Best way to improve performance in WPF application

I'm currently working on a WPF application which was build using entity framework to access data (SQL Server database) (database first).
In the past, the database was on an internal server and I did not notice any problem regarding the performance of the application even though the database is very badly implemented (only tables, no views, no indexes or stored procedure). I'm the one who created it but it was my first job and I was not very good with databases so I felt like entity framework was the best approach to focus mainly on code.
However, the database is now on another server which is waaay slower. As you guessed it, the application has now big performance issues (more than 10 seconds to load a dozen rows, same to insert new rows,...).
Should I stay with entity framework but try to improve performance by altering the database adding views and stored procedure ?
Should I get rid off entity framework and use only "basic" code (and improve the database at the same time) ?
Is there a simple ORM I could use instead of EF ?
Time is not an issue here, I can use all the time I want to improve the application but I can't seem to make a decision about the best way to make my application evolved.
The database is quite simple (around 10 tables), the only thing that could complicates thing is that I store files in there. So I'm not sure I can really use whatever I want. And I don't know if it's important but I need to display quite a few calculated fields. Any advice ?
Feel free to ask any relevant questions.
For performance profiling, the first place I recommend looking is an SQL profiler. This can capture the exact SQL statements that EF is running, and help identify possible performance culprits. I cover a few of these here. The Schema issues are probably the most relevant place to start. The title targets MVC, but most of the items relate to WPF and any application.
A good, simple profiler that I use for SQL Server is ExpressProfiler. (https://github.com/OleksiiKovalov/expressprofiler)
With the move to a new server, and it now sending the data over the wire rather than pulling from a local database, the performance issues you're noticing will most likely be falling under the category of "loading too much, too often". Now you won't only be waiting for the database to load the data, but also for it to package it up and send it over the wire. Also, does the new database represent the same data volume and serve only a single client, or now serving multiple clients? Other catches for developers is "works on my machine" where local testing databases are smaller and not dealing with concurrent queries from the server. (where locks and such can impact performance)
From here, run a copy of the application with an isolated database server (no other clients hitting it to reduce "noise") with the profiler running against it. The things to look out for:
Lazy Loading - This is cases where you have queries to load data, but then see lots (dozens to hundreds) of additional queries being spun off. Your code may say "run this query and populate this data" which you expect should be 1 SQL query, but by touching lazy-loaded properties, this can spin off a great many other queries.
The solution to lazy loading: If you need the extra data, eager load it with .Include(). If you only need some of the data, look into using .Select() to select view models / DTO of the data you need rather than relying on complete entities. This will eliminate lazy load scenarios, but may require some significant changes to your code to work with view models/dtos. Tools like Automapper can help greatly here. Read up on .ProjectTo() to see how Automapper can work with IQueryable to eliminate lazy load hits.
Reading too much - Loading entities can be expensive, especially if you don't need all of that data. Culprits for performance include excessive use of .ToList() which will materialize entire entity sets where a subset of data is needed, or a simple exists check or count would suffice. For example, I've seen code that does stuff like this:
var data = context.MyObjects.SingleOrDefault(x => x.IsActive && x.Id = someId);
return (data != null);
This should be:
var isData = context.MyObjects.Where(x => x.IsActive && x.Id = someId).Any();
return isData;
The difference between the two is that in the first example, EF will effectively do a SELECT * operation, so in the case where data is present it will return back all columns into an entity, only to later check if the entity was present. The second statement will run a faster query to simply return back whether a row exists or not.
var myDtos = context.MoyObjects.Where(x => x.IsActive && x.ParentId == parentId)
.ToList()
.Select( x => new ObjectDto
{
Id = x.Id,
Name = x.FirstName + " " + x.LastName,
Balance = calculateBalance(x.OrderItems.ToList()),
Children = x.Children.ToList()
.Select( c => new ChildDto
{
Id = c.Id,
Name = c.Name
}).ToList()
}).ToList();
Statements like this can go on and get rather complex, but the real problems is the .ToList() before the .Select(). Often these creep in because devs try to do something that EF doesn't understand, like call a method. (i.e. calculateBalance()) and it "works" by first calling .ToList(). The problem here is that you are materializing the entire entity at that point and switching to Linq2Object. This means that any "touches" on related data, such as .Children will now trigger lazy loads, and again further .ToList() calls can saturate more data to memory which might otherwise be reduced in a query. The culprit to look out for is .ToList() calls and to try removing them. Select simpler values before calling .ToList() and then feed that data into view models where the view models can calculate resulting data.
The worst culprit like this I've seen was due to a developer wanting to use a function in a Where clause:
var data = context.MyObjects.ToList().Where(x => calculateBalance(x) > 0).ToList();
That first ToList() statement will attempt to saturate the whole table to entities in memory. A big performance impact beyond just the time/memory/bandwidth needed to load all of this data is simply the # of locks the database must make to reliably read/write data. The fewer rows you "touch" and the shorter you touch them, the nicer your queries will play with concurrent operations from multiple clients. These problems magnify greatly as systems transition to being used by more users.
Provided you've eliminated extra lazy loads and unnecessary queries, the next thing to look at is query performance. For operations that seem slow, copy the SQL statement out of the profiler and run that in the database while reviewing the execution plan. This can provide hints about indexes you can add to speed up queries. Again, using .Select() can greatly increase query performance by using indexes more efficiently and reducing the amount of data the server needs to pull back.
For file storage: Are these stored as columns in a relevant table, or in a separate table that is linked to the relevant record? What I mean by this, is if you have an Invoice record, and also have a copy of an invoice file saved in the database, is it:
Invoices
InvoiceId
InvoiceNumber
...
InvoiceFileData
or
Invoices
InvoiceId
InvoiceNumber
...
InvoiceFile
InvoiceId
InvoiceFileData
It is a better structure to keep large, seldom used data in separate tables rather than combined with commonly used data. This keeps queries to load entities small and fast, where that expensive data can be pulled up on-demand when needed.
If you are using GUIDs for keys (as opposed to ints/longs) are you leveraging newsequentialid()? (assuming SQL Server) Keys set to use newid() or in code, Guid.New() will lead to index fragmentation and poor performance. If you populate the IDs via database defaults, switch them over to use newsequentialid() to help reduce the fragmentation. If you populate IDs via code, have a look at writing a Guid generator that mimics newsequentialid() (SQL Server) or pattern suited to your database. SQL Server vs. Oracle store/index GUID values differently so having the "static-like" part of the UUID bytes in the higher order vs. lower order bytes of the data will aid indexing performance. Also consider index maintenance and other database maintenance jobs to help keep the database server running efficiently.
When it comes to index tuning, database server reports are your friends. After you've eliminated most, or at least some serious performance offenders from your code, the next thing is to look at real-world use of your system. The best thing here to learn where to target your code/index investigations are the most used and problem queries that the database server identifies. Where these are EF queries, you can usually reverse-engineer based on the tables being hit which EF query is responsible. Grab these queries and feed them through the execution plan to see if there is an index that might help matters. Indexing is something that developers either forget, or get prematurely concerned about. Too many indexes can be just as bad as too few. I find it's best to monitor real-world usage before deciding on what indexes to add.
This should hopefully give you a start on things to look for and kick the speed of that system up a notch. :)
First you need to run a performance profiler and find put what is the bottle neck here, it can be database, entity framework configuration, entity framework queries and so on
In my experience, entity framework is a good option to this kind of applications, but you need understand how it works.
Also, What entity framework are you using? the lastest version is 6.2 and has some performance improvements that olders does not have, so if you are using a old one i suggest that update it
Based on the comments I am going to hazard a guess that it is mostly a bandwidth issue.
You had an application that was working fine when it was co-located, perhaps a single switch, gigabit ethernet and 200m of cabling.
Now that application is trying to send or retrieve data to/from a remote server, probably over the public internet through an unknown number of internal proxies in contention with who knows what other traffic, and it doesn't perform as well.
You also mention that you store files in the database, and your schema has fields like Attachment.data and Doc.file_content. This suggests that you could be trying to transmit large quantities (perhaps megabytes) of data for a simple query and that is where you are falling down.
Some general pointers:
Add indexes for anywhere you are joining tables or values you
commonly query on.
Be aware of the difference between Lazy & Eager
loading in Entity Framework. There is no right or wrong answer,
but you should be know what you approach you are using and why.
Split any file content
into its own table, with the same primary key as the main table or
play with different EF classes to make sure you only retrieve files
when you need to use them.

Is there any performance impact when using AsNoTracking() on anonymous result? [duplicate]

I have a question regarding the .AsNoTracking() extension, as this is all quite new and quite confusing.
I'm using a per-request context for a website.
A lot of my entities don't change so don't need to be tracked, but I have the following scenario where I'm unsure of what's going to the database, or even whether it makes a difference in this case.
This example is what I'm currently doing:
context.Set<User>().AsNoTracking()
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
This is the same as above but removing the .AsNoTracking() from Step 1:
context.Set<User>();
// Step 1) Get user
context.Set<User>()
// Step 2) Update user
The Steps 1 & 2 use the same context but occur at different times. What I can't work out is whether there is any difference. As Step 2 is an update I'm guessing both will hit the database twice anyway.
Can anyone tell me what the difference is?
The difference is that in the first case the retrieved user is not tracked by the context so when you are going to save the user back to database you must attach it and set correctly state of the user so that EF knows that it should update existing user instead of inserting a new one. In the second case you don't need to do that if you load and save the user with the same context instance because the tracking mechanism handles that for you.
see this page Entity Framework and AsNoTracking
What AsNoTracking Does
Entity Framework exposes a number of performance tuning options to help you optimise the performance of your applications. One of these tuning options is .AsNoTracking(). This optimisation allows you to tell Entity Framework not to track the results of a query. This means that Entity Framework performs no additional processing or storage of the entities which are returned by the query. However, it also means that you can't update these entities without reattaching them to the tracking graph.
there are significant performance gains to be had by using AsNoTracking
No Tracking LINQ to Entities queries
Usage of AsNoTracking() is recommended when your query is meant for read operations. In these scenarios, you get back your entities but they are not tracked by your context.This ensures minimal memory usage and optimal performance
Pros
Improved performance over regular LINQ queries.
Fully materialized objects.
Simplest to write with syntax built into the programming
language.
Cons
Not suitable for CUD operations.
Certain technical restrictions, such as: Patterns using DefaultIfEmpty for
OUTER JOIN queries result in more complex queries than simple OUTER
JOIN statements in Entity SQL.
You still can’t use LIKE with general pattern matching.
More info available here:
Performance considerations for Entity Framework
Entity Framework and NoTracking
Disabling tracking will also cause your result sets to be streamed into memory. This is more efficient when you're working with large sets of data and don't need the entire set of data all at once.
References:
How to avoid memory overflow when querying large datasets with Entity Framework and LINQ
Entity framework large data set, out of memory exception
AsNoTracking() allows the "unique key per record" requirement in EF to be bypassed (not mentioned explicitly by other answers).
This is extremely helpful when reading a View that does not support a unique key because perhaps some fields are nullable or the nature of the view is not logically indexable.
For these cases the "key" can be set to any non-nullable column but then AsNoTracking() must be used with every query else records (duplicate by key) will be skipped.
If you have something else altering the DB (say another process) and need to ensure you see these changes, use AsNoTracking(), otherwise EF may give you the last copy that your context had instead, hence it being good to usually use a new context every query:
http://codethug.com/2016/02/19/Entity-Framework-Cache-Busting/

Do we need external caching mechanism while EF 6 use caching out of the box?

I recently exposed to Entity Framework 6 caching mechanism.
As we might figure from this article, it does it in First-Lever manner.
Our system uses EF 6 (code first) along with MemoryCache to improve performance.
The main reason we use MemoryCache is because we need to execute an intense query on every page request. We execute this query x3 times (in the worst case) on every page request since there are client call backs.
I wonder if we still need to use the MemoryCache mechanism if the EF 6 already use one.
It is worth saying that we don't use any special caching feature or cache dependencies. Just a simple MemoryCache with timeouts.
The fact that EF caches entities in context is in no way a replacement for "real" cache, for various reasons:
You should not reuse EF context for more that one logical operation, because EF context represents unit of work, and so should be used according to this pattern. Also, even if you for some reason reuse context in multiple operations - you absolutely cannot do that in multi-threaded environment, like web server application.
It does not prevent you from making multiple queries for the same data to your database, for example:
var entity1 = ctx.Entities.Where(c => c.Id == 1).First();
var entity2 = ctx.Entities.Where(c => c.Id == 1).First();
This will still execute two queries to your database, despite the fact that query is the same and returns the same entity. So nothing is really "cached" in usual sense here. Note however, that both queries will return the same entity, even if database row has been changed between two queries. That is what is meant by EF context "caching". It will execute database query two times, but second time, while evaluating the result, it will notice that there is already entity with the same key attached to the context. So it will return this existing ("cached") entity instead, and will ignore new values (if any) returned by the second query. That behaviour is additional reason to not reuse the context between multiple operations (though you should not do it anyway).
So if you want to reduce load on your database - you have to use second-level caching using whatever suits your needs (from simple InMemoryCache to caching EF provider to distributed memcached instance).
EF only implements what is called first level cache for entities, It stores the entities which have been retrieved during the life time of a context so when you ask for that entity the second time it returns the entity from context. What you need is a second level cache but EF dosen't implants this features. NCache for example implements a wonderful caching architecture and a out of the box a second level cache provider for EF. Not in its open source version.

Entity Framework performance of include

It's more a technical (behind the scenes of EF) kind of question for a better understanding of Include for my own.
Does it make the query faster to Include another table when using a Select statement at the end?
ctx.tableOne.Include("tableTwo").Where(t1 => t1.Value1 == "SomeValueFor").Select(res => new {
res.Value1,
res.tableTwo.Value1,
res.tableTwo.Value2,
res.tableTwo.Value3,
res.tableTwo.Value4
});
Does it may depend on the number of values included from another table?
In the example above, 4 out of 5 values are from the included table. I wonder if it does have any performance-impact. Even a good or a bad one?
So, my question is: what is EF doing behind the scenes and is there any preferred way to use Include when knowing all the values I will select before?
In your case it doesn't matter if you use Include(<relation-property-name>) or not because you don't materialize the values before the Select(<mapping-expression>). If you use the SQL Server Profiler (or other profiler) you can see that EF generates two exactly the same queries.
The reason for this is because the data is not materialized in memory before the Select - you are working on IQueryable which means EF will generate an SQL query at the end (before calling First(), Single(), FirstOrDefault(), SingleOrDefault(), ToList() or use the collection in a foreach statement). If you use ToList() before the Select() it will materialize the entities from the database into your memory where Include() will come in hand not to make N+1 queries when accessing nested properties to other tables.
It is about how you want EF to load your data. If you want A 'Table' data to be pre populated than use Include. It is more handy if Include statement table is going to be used more frequently and it will be little slower as EF has to load all the relevant date before hand. Read the difference between Lazy and Eager loading. by using Include, it will be the eager loading where data will be pre populated while on the other hand EF will send a call to the secondary table when projection takes place i-e Lazy loading.
I agree with #Karamfilov for his general discussion, but in your example your query could not be the most performant. The performance can be affected by many factors, such as indexes present on the table, but you must always help EF in the generation of SQL. The Include method can produce a SQL that includes all columns of the table, you should always check what is the generated SQL and verify if you can obtain a better one using a Join.
This article explains the techniques that can be used and what impact they have on performance: https://msdn.microsoft.com/it-it/library/bb896272(v=vs.110).aspx

.Include() vs .Load() performance in EntityFramework

When querying a large table where you need to access the navigation properties later on in code (I explicitly don't want to use lazy loading) what will perform better .Include() or .Load()? Or why use the one over the other?
In this example the included tables all only have about 10 entries and employees has about 200 entries, and it can happen that most of those will be loaded anyway with include because they match the where clause.
Context.Measurements.Include(m => m.Product)
.Include(m => m.ProductVersion)
.Include(m => m.Line)
.Include(m => m.MeasureEmployee)
.Include(m => m.MeasurementType)
.Where(m => m.MeasurementTime >= DateTime.Now.AddDays(-1))
.ToList();
or
Context.Products.Load();
Context.ProductVersions.Load();
Context.Lines.Load();
Context.Employees.Load();
Context.MeasurementType.Load();
Context.Measurements.Where(m => m.MeasurementTime >= DateTime.Now.AddDays(-1))
.ToList();
It depends, try both
When using Include(), you get the benefit of loading all of your data in a single call to the underlying data store. If this is a remote SQL Server, for example, that can be a major performance boost.
The downside is that Include() queries tend to get really complicated, especially if you have any filters (Where() calls, for example) or try to do any grouping. EF will generate very heavily nested queries using sub-SELECT and APPLY statements to get the data you want. It is also much less efficient -- you get back a single row of data with every possible child-object column in it, so data for your top level objects will be repeated a lot of times. (For example, a single parent object with 10 children will product 10 rows, each with the same data for the parent-object's columns.) I've had single EF queries get so complex they caused deadlocks when running at the same time as EF update logic.
The Load() method is much simpler. Each query is a single, easy, straightforward SELECT statement against a single table. These are much easier in every possible way, except you have to do many of them (possibly many times more). If you have nested collections of collections, you may even need to loop through your top level objects and Load their sub-objects. It can get out of hand.
Quick rule-of-thumb
Try to avoid having any more than three Include calls in a single query. I find that EF's queries get too ugly to recognize beyond that; it also matches my rule-of-thumb for SQL Server queries, that up to four JOIN statements in a single query works very well, but after that it's time to consider refactoring.
However, all of that is only a starting point.
It depends on your schema, your environment, your data, and many other factors.
In the end, you will just need to try it out each way.
Pick a reasonable "default" pattern to use, see if it's good enough, and if not, optimize to taste.
Include() will be written to SQL as JOIN: one database roundtrip.
Each Load()-instruction is "explicitly loading" the requested entities, so one database roundtrip per call.
Thus Include() will most probably be the more sensible choice in this case, but it depends on the database layout, how often this code is called and how long your DbContext lives. Why don't you try both ways and profile the queries and compare the timings?
See Loading Related Entities.
I agree with #MichaelEdenfield in his answer but I did want to comment on the nested collections scenario. You can get around having to do nested loops (and the many resulting calls to the database) by turning the query inside out.
Rather than loop down through a Customer's Orders collection and then performing another nested loop through the Order's OrderItems collection say, you can query the OrderItems directly with a filter such as the following.
context.OrderItems.Where(x => x.Order.CustomerId == customerId);
You will get the same resulting data as the Loads within nested loops but with just a single call to the database.
Also, there is a special case that should be considered with Includes. If the relationship between the parent and the child is one to one then the problem with the parent data being returned multiple times would not be an issue.
I am not sure what the effect would be if the majority case was where no child existed - lots of nulls? Sparse children in a one to one relationship might be better suited to the direct query technique that I outlined above.
Include is an example of eager loading, where as you not only load the entities you are querying for, but also all related entities.
Load is an manual override of the EnableLazyLoading. If this one is set to false. You can still lazily load the entity you asked for with .Load()
It's always hard to decide whether to go with Eager, Explicit or even Lazy Loading.
What I would recommend anyway is always to perform some profiling. That's the only way to be sure your request will be performant or not.
There're a lot of tools that will help you out. Have a look at this article from Julie Lerman where she lists several different ways to do profiling. One simple solution is to start profiling in your SQL Server Management Studio.
Do not hesitate to talk with a DBA (if you have on near you) that will help you to understand the execution plan.
You could also have a look a this presentation where I wrote a section about loading data and performance.
One more thing to add to this thread. It depends on what server you use. If you are working on sql server it's ok to use eager loading but for sqlite you will have to use .Load() to avoid crossloading exception cause sqlite can not deal with some include statements that go deeper than one dependency level
Updated answer: As of EF Core 5.0 you can use AsSplitQuery().
This is particularly useful and I personally use it all the time when I have many joins
which will result in a possible cartesian explosion, or will just take more time to complete.
As the name implies, EF will execute separate queries for each Entity instead of using joins.
So, where you would use Explicit loading, you can now use Eager loading with split queries to achieve the same result, and it's definitely more readable imo.
See https://learn.microsoft.com/en-us/ef/core/querying/single-split-queries

Categories