I recently moved my entity model from an ObjectContext using EF 4.1 to a DbContext using EF 5.0. I'm starting to regret doing that because I'm noticing some very poor performance on queries using the DbContext vs. the ObjectContext. Here's the test scenario:
Both contexts use the same database with about 600 tables. LazyLoading and ProxyCreation are turned off for both (not shown in the code example). Both have pre-generated views.
The test first makes one call to load up the metadata workspace. Then, in a for loop that executes 100 times, I new up a context and make one call that takes the first 10 records. (I'm creating the context inside the for loop because this simulates being used in a WCF service, which would create the context every time.)
for (int i = 0; i < 100; i++)
{
    using (MyEntities db = new MyEntities())
    {
        var a = db.MyObject.Take(10).ToList();
    }
}
When I run this with the ObjectContext it takes about 4.5 seconds. When I run it using the DbContext it takes about 17 seconds. I profiled this using Red Gate's performance profiler. For the DbContext, the major culprit seems to be a method called UpdateEntitySetMappings. It is called on every query and appears to retrieve the metadata workspace and cycle through every item in the OSpace. AsNoTracking did not help.
EDIT: To give some better detail, the problem has to do with the creation/initialization of a DbSet vs. an ObjectSet, not the actual query. When I make a call with the ObjectContext, it takes on average 42 ms to create the ObjectSet. When I make a call with the DbContext, it takes about 140 ms to create the internal DbSet. Both ObjectSet and DbSet do some entity-set mapping lookups from the metadata workspace. What I've noticed is that the DbSet does it for ALL the types in the workspace while the ObjectSet does not. I'm guessing (I haven't tried it) that with a model with fewer tables the performance difference would be smaller.
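For anyone who wants to reproduce the measurement, here is a rough sketch of how the set-creation cost can be timed separately from the query itself. MyEntities and MyObject are the names from the scenario above, and exactly where the initialization cost lands between the two stopwatches is an assumption, not something the profiler output confirms:

using (MyEntities db = new MyEntities())
{
    var swSet = System.Diagnostics.Stopwatch.StartNew();
    var set = db.Set<MyObject>();          // first touch of the set; initialization may happen here or on first query
    swSet.Stop();

    var swQuery = System.Diagnostics.Stopwatch.StartNew();
    var rows = set.Take(10).ToList();      // the actual query
    swQuery.Stop();

    Console.WriteLine("set: {0} ms, query: {1} ms", swSet.ElapsedMilliseconds, swQuery.ElapsedMilliseconds);
}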
I've also been concerned about the underperformance of the code first approach, and I've performed some benchmarks in a scenario similar to yours:
http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html
The results were no surprise; since the DbContext is a wrapper over ObjectContext, it has to sacrifice performance for the simplicity. However, my tests show that:
the more records you retrieve, the smaller the difference
the more records you retrieve, the more important it is to turn off tracking if you want to be fast
For example, retrieving just 10 records
Note that code first is significantly slower than model first and there is no noticeable difference between tracking and no tracking - both observations are exactly like yours.
However, when retrieving 10,000 rows you have
Note that there is almost no difference between code first and model first in the no-tracking version. Also, both perform surprisingly well, almost as fast as a raw ADO.NET DataReader.
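For reference, turning tracking off looks roughly like this in the two APIs. This is a sketch reusing the MyEntities/MyObject names from the question; objectContext stands for whatever ObjectContext-derived type your model exposes:

// DbContext: per-query opt-out of change tracking
var dbRows = db.MyObject.AsNoTracking().Take(10).ToList();

// ObjectContext: equivalent opt-out via MergeOption on the source set
var set = objectContext.CreateObjectSet<MyObject>();
set.MergeOption = System.Data.Objects.MergeOption.NoTracking;
var ocRows = set.Take(10).ToList();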
Please follow my blog entry for more details.
That simple benchmark helped me accept the nature of code first. I still prefer it for smaller projects because of two features: POCO entities and migrations. On the other hand, I would never pick either of the two for a project where performance is a critical requirement. This effectively means that I will probably never use the model first approach again.
(A side note: my benchmark also reveals that there is something wrong with NHibernate. I still haven't found anyone to help me explain this, even though I've consulted two independent developers who use NH daily.)
DbContext is a wrapper for ObjectContext. Here is a good answer to your question. It is possible that, to make it easier to use, they sacrificed performance.
I use Simple.Data to query millions of records and it works quite well and fast.
I'm new to using an ORM for dealing with databases. I'm currently starting a new project and I have to decide whether I'll use Entity Framework or Dapper. I've read many articles which say that Dapper is faster than Entity Framework.
So I made two simple prototype projects, one using Dapper and the other using Entity Framework, each with one function to get all the rows from one table.
The table schema is shown in the following picture,
and the code for both projects is as follows.
For the Dapper project:
// cn is assumed to be an open SqlConnection; Query<T> is Dapper's extension method.
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
IEnumerable<Emp> emplist = cn.Query<Emp>(@"Select * From Employees");
sw.Stop();
MessageBox.Show(sw.ElapsedMilliseconds.ToString());
For the Entity Framework project:
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
IEnumerable<Employee> emplist = hrctx.Employees.ToList();
sw.Stop();
MessageBox.Show(sw.ElapsedMilliseconds.ToString());
After trying the above code many times, only the first time I run the project is the Dapper code faster; after that first time I always get better results from the Entity Framework project.
I also tried the following statement in the Entity Framework project to stop lazy loading:
hrctx.Configuration.LazyLoadingEnabled = false;
but it's still the same: EF performs faster except for the first time.
Can anyone give me an explanation or guidance on what makes EF faster in this sample, although all the articles on the web say the opposite?
Update
I've changed the line of code in the Entity Framework sample to
IEnumerable<Employee> emplist = hrctx.Employees.AsNoTracking().ToList();
Using AsNoTracking, as mentioned in some articles, stops the Entity Framework caching, and after stopping the caching the Dapper sample performs better (but not by a very big difference).
An ORM (Object Relational Mapper) is a tool that creates a layer between your application and the data source and returns relational objects instead of (in terms of the C# you are using) ADO.NET objects. This is the basic thing that every ORM does.
To do this, ORMs generally execute the query and map the returned DataReader to the POCO class. Dapper is limited to this.
To extend this further, some ORMs (also called "full ORMs") do many more things, like generating the query for you to make your application database independent, caching your data for future calls, managing the unit of work for you, and a lot more. All of these are good tools and add value to the ORM, but they come with a cost. Entity Framework falls into this class.
To generate the query, EF has to execute additional code. Caching improves performance, but managing the cache requires executing additional code. The same is true for the unit of work and any other add-on feature provided by EF. All of this saves you from writing additional code, and EF pays the cost.
And the cost is performance. As Dapper does a very basic job, it is faster, but you have to write more code. As EF does much more than that, it is (a bit) slower, but you have to write less code.
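To make that trade-off concrete, here is roughly what the "basic job" looks like when done by hand with plain ADO.NET, which is essentially what Dapper automates. This sketch assumes System.Data.SqlClient, an Employee class with matching properties, and illustrative column names:

var employees = new List<Employee>();
using (var cn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT Id, Name FROM Employees", cn))
{
    cn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Manual DataReader-to-POCO mapping; Dapper generates code like this for you.
            employees.Add(new Employee
            {
                Id = reader.GetInt32(reader.GetOrdinal("Id")),
                Name = reader.GetString(reader.GetOrdinal("Name"))
            });
        }
    }
}

With Dapper the same thing collapses to cn.Query<Employee>("SELECT Id, Name FROM Employees"), which is why its overhead stays so close to raw ADO.NET.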
So why do your tests show the opposite result?
Because the tests you are executing are not comparable.
Full ORMs have many good features, as explained above; one of them is the unit of work. Tracking is one of the responsibilities of the UoW. When an object is requested (via a SQL query) for the first time, it causes a round trip to the database. The object is then saved in the memory cache. The full ORM keeps track of changes done to the already loaded object(s). If the same object is requested again (another SQL query in the same UoW scope that includes the loaded object), it does not do a database round trip; instead, it returns the object from the memory cache. This way, considerable time is saved.
Dapper does not support this feature, which causes it to perform slower in your tests.
But this benefit only applies if the same object(s) are loaded multiple times. Also, if the number of objects loaded in memory is too high, it will instead slow the full ORM down, as the time required to check the objects in memory will be higher. So again, this benefit depends on the use case.
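As a small illustration of that tracking benefit (a sketch; HrContext is a placeholder for the question's context type, and an integer primary key is assumed), DbSet.Find checks the context's local cache before hitting the database:

using (var hrctx = new HrContext())
{
    var first = hrctx.Employees.Find(1);    // round trip to the database
    var second = hrctx.Employees.Find(1);   // served from the context, no SQL issued
    Console.WriteLine(ReferenceEquals(first, second));  // True: same tracked instance
}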
I read many articles which say that Dapper is faster than Entity Framework
The problem with most of the benchmarks on the internet is that they compare EF LINQ to Dapper, and that's what you did too, which is unfair. An auto-generated query (EF) is often not equal to one written by a good developer.
This,
IEnumerable<Employee> emplist = hrctx.Employees.ToList();
should be replaced by this.
IEnumerable<Employee> emplist = hrctx.Employees.FromSql(@"Select * From Employees").AsNoTracking();
Edit:
As pointed out by @mjwills, below are the results for insert, update, and select statements.
Dapper outperforms EF Core 2. However, it can be seen that for plain EF queries the difference is minimal. I have posted the complete details here.
There is no problem with mixing them together. In my current project I'm using Dapper for selecting data and EF for creating, updating, and database migrations.
Dapper becomes extremely helpful when it comes to complex queries where more than two tables are involved or where there are some complex operations (joining on more than one column, joining with >= and <= clauses, recursive selections, CTEs, etc.), where pure SQL is much easier to use than LINQ. As far as I know, Entity Framework (unlike Dapper) cannot use the .FromSql() method on custom DTOs; it can only map an entity that is part of your database context. A sketch of such a Dapper query is shown below.
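For example, a two-table join mapped straight onto a DTO might look roughly like this. The DTO, table, and column names are made up for illustration; cn is an open SqlConnection and Query<T> comes from Dapper:

// Hypothetical DTO that exists only in application code, not in the EF model.
public class EmployeeWithDepartment
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string DepartmentName { get; set; }
}

// Dapper maps the joined result set directly onto the DTO.
var rows = cn.Query<EmployeeWithDepartment>(@"
    SELECT e.Id, e.Name, d.Name AS DepartmentName
    FROM Employees e
    JOIN Departments d ON d.Id = e.DepartmentId").ToList();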
The article Entity Framework Core 2.0 vs. Dapper performance benchmark, querying SQL Azure tables confirms that Dapper is a bit quicker, but not enough to ignore "full ORM" benefits.
I was recently exposed to the Entity Framework 6 caching mechanism.
As we can figure from this article, it does it in a first-level manner.
Our system uses EF 6 (code first) along with MemoryCache to improve performance.
The main reason we use MemoryCache is that we need to execute an intense query on every page request. We execute this query three times (in the worst case) on every page request, since there are client callbacks.
I wonder if we still need to use the MemoryCache mechanism if EF 6 already uses one.
It is worth saying that we don't use any special caching feature or cache dependencies. Just a simple MemoryCache with timeouts.
The fact that EF caches entities in the context is in no way a replacement for a "real" cache, for various reasons:
You should not reuse an EF context for more than one logical operation, because the EF context represents a unit of work and so should be used according to that pattern. Also, even if you reuse a context across multiple operations for some reason, you absolutely cannot do that in a multi-threaded environment, such as a web server application.
It does not prevent you from making multiple queries for the same data to your database, for example:
var entity1 = ctx.Entities.Where(c => c.Id == 1).First();
var entity2 = ctx.Entities.Where(c => c.Id == 1).First();
This will still execute two queries against your database, despite the fact that the query is the same and returns the same entity. So nothing is really "cached" in the usual sense here. Note, however, that both queries will return the same entity, even if the database row has changed between the two queries. That is what is meant by EF context "caching". It will execute the database query twice, but the second time, while evaluating the result, it will notice that there is already an entity with the same key attached to the context. So it will return this existing ("cached") entity instead and will ignore any new values returned by the second query. That behaviour is an additional reason not to reuse the context between multiple operations (though you should not do it anyway).
So if you want to reduce the load on your database, you have to use second-level caching, using whatever suits your needs (from a simple in-memory cache, to a caching EF provider, to a distributed memcached instance).
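A minimal second-level cache sketch along those lines, using System.Runtime.Caching.MemoryCache (the key, timeout, MyContext type, and query are illustrative; any other cache store would work the same way):

private static readonly MemoryCache Cache = MemoryCache.Default;

public List<Entity> GetEntities()
{
    const string key = "entities-all";
    var cached = Cache.Get(key) as List<Entity>;
    if (cached != null)
        return cached;                                      // served from the cache, no database hit

    using (var ctx = new MyContext())
    {
        // AsNoTracking returns detached entities that are safe to share across requests.
        var result = ctx.Entities.AsNoTracking().ToList();
        Cache.Set(key, result, DateTimeOffset.UtcNow.AddMinutes(5));
        return result;
    }
}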
EF only implements what is called a first-level cache for entities: it stores the entities which have been retrieved during the lifetime of a context, so when you ask for that entity a second time it returns the entity from the context. What you need is a second-level cache, but EF doesn't implement this feature. NCache, for example, implements a wonderful caching architecture and an out-of-the-box second-level cache provider for EF, though not in its open source version.
I had a model with approximately 500 entity types. Now I have added approximately 2500 entity types for future use, so I have approximately 3000 entity types.
At the moment my program does the same as it did when I had only 500 entities. It just builds a graph of entities, i.e. it instantiates a lot of entities and connects them via references.
Unfortunately my program takes approximately 20 times longer to run than before I added the new entity types, even though I don't deal with instances of the new entity types.
Is it correct that there is substantial overhead in Entity Framework, and that it grows very significantly with the number of entities in the model, even though the majority of the model will not be used during the lifetime of a DbContext?
EF does a lot of reflection work at startup (which takes considerable time) over all the entities defined, regardless of whether they are actually used or not. So if you see a (much) longer startup, that is somewhat expected.
If you encounter this delay on subsequent queries and operations, then you probably have another issue, and you would need to provide more information for a solution.
You can try to generate the 'EF views' at compile time, instead of at run time.
(I'm not exactly sure what they are, but they are something that EF builds at start-up time.)
See here
Generate Views - Generates pre-compiled views used by the EF runtime to improve start-up performance.
This is hard to explain so bear with me.
I have an Entity Framework Context being used by a View Model. Essentially, it is a search box which has a service which uses the context to run queries based on the search criteria.
The problem is, when the first search is performed, the DbContext kicks into action and looks at the database to generate the entities and relationships (at least this is what I think is happening).
This is demonstrated below:
The first search takes a few seconds, as Entity Framework is doing its thing. After the first search is performed, all other searches happen pretty much instantaneously. It's just the first search which takes a long time.
Now, onto my question.
Is it possible to force the DbContext to load the relationships and generally do its thing (asynchronously) before any action, i.e. a query, is performed on the context?
Ideally, the first search should be as quick as the other searches.
Yes, simply query the entities, but do nothing with them. The DbContext then caches the results.
What takes a lot of time on first use depends on the size of your db schema (building EF's virtual tables) and is done once at runtime, on first instantiation.
Just initialise a context on another thread at startup and run any query on it, and it will take that performance hit asynchronously.
Don't try to keep a reference to that context either; creating contexts is cheap and they are meant to be short lived. What is expensive is only the first one you create in your process. A warm-up sketch is shown below.
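A minimal warm-up sketch along those lines (MyContext/MyEntities are placeholder names; any cheap query works, since it is the first materialization that pays the metadata cost):

// Fire-and-forget warm-up at application startup (System.Threading.Tasks).
Task.Run(() =>
{
    using (var ctx = new MyContext())
    {
        // Any query forces EF to build its metadata/views for this model.
        ctx.MyEntities.FirstOrDefault();
    }
});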
If the slowdown is an issue even asynchronously, you can have EF do this work at compile time, but it is somewhat involved.
I am working on a legacy application, and we have poor performance with Entity Framework (4.0.0) and massive inserts.
When I tried the POCO generator (T4), the issue got worse: SaveChanges was three times slower. This is huge; if you have any idea why I have this issue, I am interested.
I don't have any performance metrics for the different generators, but the bottleneck should not be in your context anyway. You should know that EF will generate one SQL statement per insert, update, and delete, and if you didn't explicitly open the connection first, it will log on to and log off from SQL Server once per SQL statement.
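One cheap mitigation, if that connection churn turns out to matter, is to open the context's connection explicitly around SaveChanges so EF reuses it for every statement. This is a sketch against the EF 4.0 ObjectContext API; 'context' stands for your generated context:

context.Connection.Open();     // EF will not close a connection it did not open itself
try
{
    context.SaveChanges();     // all generated statements reuse the already-open connection
}
finally
{
    context.Connection.Close();
}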
Also, the context must maintain states and relationships, so performance degrades as your context gets larger and larger. SaveChanges must first figure out what's happening in the context, which should be the reason why the POCO generator vs. EntityObject generator end up with different execution times. As for it being three times longer, more details will be needed to figure that out.
PS: if you are stuck with the legacy code, you should look into using bulk copy alongside EF, as sketched below.
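For the bulk-copy route, a sketch using SqlBulkCopy outside of EF (the destination table, batch size, and the assumption that dataTable's columns match the target table are all illustrative):

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.TargetTable";
    bulk.BatchSize = 5000;
    bulk.WriteToServer(dataTable);   // one bulk operation instead of one INSERT per row
}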