Entity Framework takes about 30 seconds on first query - c#

I'm using Entity Framework 6 on a SQL Server database to query an existing database (database first, so there's an EDMX in my project).
I've noticed that the first time I request an entity, it can take up to thirty seconds for the query to be executed. Subsequent queries to the same object then get completed in a matter of milliseconds. The actual SQL being executed is very fast so it's not a slow query.
I've found that Entity Framework generates views in the background and that this is the most likely culprit. What I haven't found, however, is a good solution for it. There's a NuGet package that can handle the view generation (EFInteractiveViews), but it hasn't been updated since 2014 and I can hardly find any information on how to use it.
What options do I have nowadays? I've tried initializing Entity Framework in Application_Start by running a few queries, but this doesn't seem to help much. It's also quite difficult to run the real queries at Application_Start, because most queries use data from the current user, who is not yet logged on at that point, so they can't easily be run in advance.
I've thought about creating an ashx file that constantly polls the application by calling the API to keep it alive. I've also set the Application Pool to "AlwaysRunning" so that EF doesn't restart when the app pool is recycled.
Does anyone have any tips or ideas on how I can resolve this or things I can try?
Thanks a lot in advance. I've spent the better part of two days already searching for a viable solution.

There are many practices for speeding up Entity Framework; I will mention some of them:
Turn off lazy loading (open the EDMX file, right-click anywhere => Properties => set Lazy Loading Enabled to false).
Use AsNoTracking().ToList() for read-only queries; when you want to update, use Attach and set the object's state to EntityState.Modified (see the sketch after this list).
Use indexes on your tables.
Use paging; do not load all the data at once.
Split your EDMX into several smaller ones and only include the entities you need on a given page (this will affect performance in a good way).
If you want to load related objects, "be eager, not lazy": use Include. You may need a using System.Data.Entity directive to get the lambda Include overloads (also shown in the sketch after this list).
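A minimal sketch combining these tips, assuming an EF6 database-first context named MyEntities with a People set and a Country navigation property (all placeholder names):

using System.Data.Entity; // brings in the lambda Include() overloads and EntityState
using System.Linq;

// Read-only query: AsNoTracking() skips change tracking, and Include()
// loads the related Country eagerly in the same query.
using (var db = new MyEntities())
{
    var people = db.People.AsNoTracking()
        .Include(p => p.Country)
        .Where(p => p.IsActive)
        .ToList();
}

// Updating without re-querying: attach a detached object and mark it modified.
using (var db = new MyEntities())
{
    var person = new Person { Id = 42, FirstName = "Jane" };
    db.People.Attach(person);
    db.Entry(person).State = EntityState.Modified;
    db.SaveChanges();
}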
Example of splitting your EDMX:
Suppose you have the following objects for a car rental app: Country, City, Person, Car, Rent, Gender, Engine, Manufacturer, etc.
Now:
If you are working on a screen to manage (CRUD) a Person, you don't need Car, Rent, or Manufacturer, so create a ManagePerson.edmx containing Country, City, Person, and Gender.
If you are working on managing (CRUD) a Car, you don't need Person, City, Gender, or Rent, so you can create a ManageCar.edmx containing Car, Manufacturer, Country, and Engine.

Entity Framework must first compile and translate your LINQ queries into SQL, but after this it caches them. The first hit to a query is always going to take a long time, but, as you mention, after that the query will run very quickly.
When I first used EF this was constantly an issue brought up by testers, but when the system went live and was used frequently (so the queries stayed cached) it wasn't an issue.
See Hadi Hassan's answer for general speed-up tips.

Related

Best way to improve performance in WPF application

I'm currently working on a WPF application which was built using Entity Framework (database first) to access data in a SQL Server database.
In the past, the database was on an internal server and I did not notice any problems with the performance of the application, even though the database is implemented very badly (only tables; no views, indexes, or stored procedures). I'm the one who created it, but it was my first job and I was not very good with databases, so I felt Entity Framework was the best approach, letting me focus mainly on code.
However, the database is now on another server which is waaay slower. As you can guess, the application now has big performance issues (more than 10 seconds to load a dozen rows, the same to insert new rows, ...).
Should I stay with Entity Framework but try to improve performance by altering the database, adding views and stored procedures?
Should I get rid of Entity Framework and use only "basic" code (and improve the database at the same time)?
Is there a simple ORM I could use instead of EF ?
Time is not an issue here; I can use all the time I want to improve the application, but I can't seem to make a decision about the best way to make my application evolve.
The database is quite simple (around 10 tables); the only thing that could complicate things is that I store files in it, so I'm not sure I can really use whatever I want. And I don't know if it's important, but I need to display quite a few calculated fields. Any advice?
Feel free to ask any relevant questions.
For performance profiling, the first place I recommend looking is a SQL profiler. This can capture the exact SQL statements that EF is running and help identify possible performance culprits. I cover a few of these here. The schema issues are probably the most relevant place to start. The title targets MVC, but most of the items apply to WPF and any other application.
A good, simple profiler that I use for SQL Server is ExpressProfiler. (https://github.com/OleksiiKovalov/expressprofiler)
With the move to a new server that now sends the data over the wire, rather than pulling from a local database, the performance issues you're noticing will most likely fall under the category of "loading too much, too often". Now you're not only waiting for the database to load the data, but also for it to be packaged up and sent over the wire. Also, does the new database hold the same data volume and serve only a single client, or is it now serving multiple clients? Another catch for developers is "works on my machine", where local test databases are smaller and don't deal with concurrent queries from other clients (where locks and such can impact performance).
From here, run a copy of the application against an isolated database server (no other clients hitting it, to reduce "noise") with the profiler running against it. The things to look out for:
Lazy loading - These are cases where you have queries to load data, but then see lots (dozens to hundreds) of additional queries being spun off. Your code may say "run this query and populate this data", which you expect to be one SQL query, but touching lazy-loaded properties can spin off a great many other queries.
The solution to lazy loading: if you need the extra data, eager-load it with .Include(). If you only need some of the data, look into using .Select() to project view models / DTOs of just the data you need rather than relying on complete entities. This will eliminate lazy-load scenarios, but may require some significant changes to your code to work with view models/DTOs. Tools like AutoMapper can help greatly here; read up on .ProjectTo() to see how AutoMapper can work with IQueryable to eliminate lazy-load hits.
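A hedged sketch of both options (Order, Customer, and OrderDto are placeholder names; the lambda Include() overload needs using System.Data.Entity):

// Eager load when you genuinely need the related entities:
var orders = context.Orders
    .Include(o => o.Customer) // one query with a JOIN, no lazy loads later
    .Where(o => o.IsOpen)
    .ToList();

// Project when you only need a few columns - EF translates this into a
// SELECT of just those columns, and lazy loading never comes into play:
var orderDtos = context.Orders
    .Where(o => o.IsOpen)
    .Select(o => new OrderDto
    {
        Id = o.Id,
        CustomerName = o.Customer.Name
    })
    .ToList();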
Reading too much - Loading entities can be expensive, especially if you don't need all of that data. Culprits include excessive use of .ToList(), which materializes entire entity sets where a subset of the data would do, or where a simple exists check or count would suffice. For example, I've seen code that does stuff like this:
var data = context.MyObjects.SingleOrDefault(x => x.IsActive && x.Id == someId);
return (data != null);
This should be:
var isData = context.MyObjects.Where(x => x.IsActive && x.Id == someId).Any();
return isData;
The difference between the two is that in the first example, EF will effectively do a SELECT * operation, so where data is present it returns all columns into an entity, only to check afterwards whether the entity was there at all. The second statement runs a much smaller query that simply returns whether a row exists or not.
var myDtos = context.MyObjects.Where(x => x.IsActive && x.ParentId == parentId)
    .ToList()
    .Select(x => new ObjectDto
    {
        Id = x.Id,
        Name = x.FirstName + " " + x.LastName,
        Balance = calculateBalance(x.OrderItems.ToList()),
        Children = x.Children.ToList()
            .Select(c => new ChildDto
            {
                Id = c.Id,
                Name = c.Name
            }).ToList()
    }).ToList();
Statements like this can go on and get rather complex, but the real problem is the .ToList() before the .Select(). Often these creep in because devs try to do something EF doesn't understand, like calling a method (i.e. calculateBalance()), and it "works" once they first call .ToList(). The problem is that you materialize the entire entity at that point and switch to LINQ to Objects. This means that any "touches" on related data, such as .Children, will now trigger lazy loads, and further .ToList() calls can pull even more data into memory that the query could otherwise have reduced. The culprit to look out for is .ToList() calls: try removing them, select simpler values before calling .ToList(), and then feed that data into view models where the view models can calculate the resulting data, as in the sketch below.
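A hedged reworking of that query with the early .ToList() removed: project the slim values you need in SQL, materialize once, then compute in memory. This assumes calculateBalance() boils down to a sum over order item fields; Price and Quantity are invented placeholder columns.

var myDtos = context.MyObjects
    .Where(x => x.IsActive && x.ParentId == parentId)
    .Select(x => new
    {
        x.Id,
        x.FirstName,
        x.LastName,
        // nested projections still run as SQL - no lazy loads triggered
        OrderItems = x.OrderItems.Select(o => new { o.Price, o.Quantity }),
        Children = x.Children.Select(c => new { c.Id, c.Name })
    })
    .ToList() // single round trip; only the columns above come back
    .Select(x => new ObjectDto
    {
        Id = x.Id,
        Name = x.FirstName + " " + x.LastName,
        // stand-in for calculateBalance(), now computed in memory from slim rows
        Balance = x.OrderItems.Sum(o => o.Price * o.Quantity),
        Children = x.Children
            .Select(c => new ChildDto { Id = c.Id, Name = c.Name })
            .ToList()
    })
    .ToList();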
The worst culprit like this I've seen was due to a developer wanting to use a function in a Where clause:
var data = context.MyObjects.ToList().Where(x => calculateBalance(x) > 0).ToList();
That first ToList() call attempts to materialize the whole table as entities in memory. Beyond the time, memory, and bandwidth needed to load all of that data, a big performance impact is simply the number of locks the database must take to reliably read and write data. The fewer rows you "touch", and the shorter the time you touch them for, the nicer your queries will play with concurrent operations from multiple clients. These problems magnify greatly as systems transition to being used by more users.
Provided you've eliminated extra lazy loads and unnecessary queries, the next thing to look at is query performance. For operations that seem slow, copy the SQL statement out of the profiler and run that in the database while reviewing the execution plan. This can provide hints about indexes you can add to speed up queries. Again, using .Select() can greatly increase query performance by using indexes more efficiently and reducing the amount of data the server needs to pull back.
For file storage: are the files stored as columns in a relevant table, or in a separate table linked to the relevant record? What I mean by this is: if you have an Invoice record and also keep a copy of the invoice file in the database, is it:
Invoices
    InvoiceId
    InvoiceNumber
    ...
    InvoiceFileData

or

Invoices
    InvoiceId
    InvoiceNumber
    ...

InvoiceFile
    InvoiceId
    InvoiceFileData
It is a better structure to keep large, seldom-used data in separate tables rather than combined with commonly used data. This keeps the queries that load entities small and fast, while the expensive data can be pulled on demand when needed.
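A hedged sketch of the second shape (shown code-first style for brevity; the classes and query are placeholders, and [Key] comes from System.ComponentModel.DataAnnotations):

public class Invoice
{
    [Key]
    public int InvoiceId { get; set; }
    public string InvoiceNumber { get; set; }
}

public class InvoiceFile
{
    [Key]
    public int InvoiceId { get; set; } // shared primary key with Invoices
    public byte[] InvoiceFileData { get; set; }
}

// Listing invoices never touches the blob table; pull the bytes on demand:
var fileBytes = context.InvoiceFiles
    .Where(f => f.InvoiceId == invoiceId)
    .Select(f => f.InvoiceFileData)
    .SingleOrDefault();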
If you are using GUIDs for keys (as opposed to ints/longs), are you leveraging newsequentialid()? (assuming SQL Server) Keys set to use newid(), or Guid.NewGuid() in code, will lead to index fragmentation and poor performance. If you populate the IDs via database defaults, switch them over to newsequentialid() to help reduce the fragmentation. If you populate IDs via code, have a look at writing a GUID generator that mimics newsequentialid() (SQL Server), or whatever pattern suits your database; SQL Server and Oracle store/index GUID values differently, so having the "static-like" part of the UUID bytes in the higher-order vs. lower-order bytes of the data will aid indexing performance. Also consider index maintenance and other database maintenance jobs to help keep the database server running efficiently.
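A hedged sketch of such a generator for SQL Server, following the well-known COMB pattern rather than being an exact newsequentialid() clone: SQL Server weights the last 6 bytes of a uniqueidentifier highest when sorting, so stamping a timestamp there keeps inserted keys roughly sequential.

using System;

public static class SequentialGuid
{
    public static Guid NewSequentialGuid()
    {
        byte[] guidBytes = Guid.NewGuid().ToByteArray();

        // Milliseconds, packed big-endian so the values sort in time order.
        long timestamp = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMillisecond;
        byte[] timestampBytes = BitConverter.GetBytes(timestamp);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(timestampBytes);

        // Overwrite the 6 bytes SQL Server treats as most significant when ordering.
        Array.Copy(timestampBytes, 2, guidBytes, 10, 6);

        return new Guid(guidBytes);
    }
}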
When it comes to index tuning, database server reports are your friends. After you've eliminated most, or at least the most serious, performance offenders from your code, the next thing is to look at real-world use of your system. The best way to learn where to target your code and index investigations is the set of most-used and problem queries that the database server identifies. Where these are EF queries, you can usually reverse-engineer which EF query is responsible from the tables being hit. Grab those queries and feed them through the execution plan to see if there is an index that might help. Indexing is something developers either forget about or get prematurely concerned with: too many indexes can be just as bad as too few. I find it's best to monitor real-world usage before deciding on what indexes to add.
This should hopefully give you a start on things to look for and kick the speed of that system up a notch. :)
First you need to run a performance profiler and find out where the bottleneck is; it could be the database, the Entity Framework configuration, the Entity Framework queries, and so on.
In my experience, Entity Framework is a good option for this kind of application, but you need to understand how it works.
Also, which version of Entity Framework are you using? The latest is 6.2 and has some performance improvements that older versions do not, so if you are on an old one I suggest updating.
Based on the comments I am going to hazard a guess that it is mostly a bandwidth issue.
You had an application that was working fine when it was co-located, perhaps a single switch, gigabit ethernet and 200m of cabling.
Now that application is trying to send or retrieve data to/from a remote server, probably over the public internet through an unknown number of internal proxies in contention with who knows what other traffic, and it doesn't perform as well.
You also mention that you store files in the database, and your schema has fields like Attachment.data and Doc.file_content. This suggests that you could be transmitting large quantities (perhaps megabytes) of data for a simple query, and that is where you are falling down.
Some general pointers:
Add indexes anywhere you join tables, and on values you commonly query on.
Be aware of the difference between lazy and eager loading in Entity Framework. There is no right or wrong answer, but you should know which approach you are using and why.
Split any file content into its own table, with the same primary key as the main table, or play with different EF classes to make sure you only retrieve files when you need to use them.

Extremely inconsistent AWS remote database query times

I am querying for values from a database in AWS Sydney (I am in New Zealand). Using Stopwatch, I measured the query time: it is wildly inconsistent, sometimes in the tens of milliseconds and sometimes in the hundreds of milliseconds, for the exact same query. I have no idea why.
var device = db.things.AsQueryable().FirstOrDefault(p => p.ThingName == model.thingName);
The things table only has 5 entries. I have tried it without the AsQueryable() and it seems to make no difference. I am using Visual Studio 2013 and Entity Framework 6.1.1.
EDIT:
Because this is for a business, I cannot put much code up. As another timing example, the same query went from 34 ms to 400 ms.
thanks
This can be related to cold vs. warm query execution.
The very first time any query is made against a given model, the Entity Framework does a lot of work behind the scenes to load and validate the model. We frequently refer to this first query as a "cold" query. Further queries against an already loaded model are known as "warm" queries, and are much faster.
You can find more information about this in the following article:
https://msdn.microsoft.com/en-us/library/hh949853(v=vs.113).aspx
One way to confirm this is the problem is to write a stored procedure and fetch the data through it (still via Entity Framework), to see whether the time is spent in the connection or in Entity Framework's query pipeline itself.
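A minimal sketch of that check in EF6 (dbo.GetThingByName is a hypothetical procedure name):

using System.Data.SqlClient;
using System.Linq;

// Raw SQL path: skips LINQ-to-Entities translation, so timing this against
// the LINQ version shows whether the overhead is EF's or the connection's.
var device = db.Database
    .SqlQuery<Thing>("EXEC dbo.GetThingByName @name",
                     new SqlParameter("@name", model.thingName))
    .FirstOrDefault();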

Pre-loading the context before it's being used

This is hard to explain so bear with me.
I have an Entity Framework context being used by a view model. Essentially, it is a search box backed by a service that uses the context to run queries based on the search criteria.
The problem is that when the first search is performed, the DbContext kicks into action and looks at the database to generate the entities and relationships (at least, this is what I think is happening).
The first search takes a few seconds while Entity Framework does its thing. After the first search is performed, all other searches happen pretty much instantaneously; it's just the first one that takes a long time.
Now, onto my question.
Is it possible to force the DbContext to load the relationships and generally do its thing (asynchronously) before any action, i.e. a query, is performed on the context?
Ideally, the first search should be as quick as the other searches.
Yes: simply query the entities, but do nothing with them. The DbContext caches the expensive setup work after that first query.
What takes a lot of time on first use depends on the size of your db schema (building EF's internal views of the tables), and it is done once per process, on first use.
Just initialise a context on another thread at startup and run any query on it; it will take that performance hit asynchronously, as sketched below.
Don't try to keep a reference to that context either: creating contexts is cheap and they are meant to be short-lived. What is expensive is only the first one you create in your process.
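A minimal sketch of that warm-up (MyContext and Things are placeholder names):

using System.Linq;
using System.Threading.Tasks;

// At startup, pay EF's one-time model/view generation cost on a background
// thread so the user's first real query finds everything already built.
Task.Run(() =>
{
    using (var warmup = new MyContext())
    {
        warmup.Things.FirstOrDefault(); // any trivial query will do
    }
});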
If the slowdown is an issue even asynchronously, you can have EF do this work at compile time instead (pre-generating the views), but it is somewhat involved.

I want to improve performance of my Edmx by using AutoDetectChanges

I have a reasonably large EDMX generated from a database, and I have recently been working on performance to improve my application. I have read a number of articles in a variety of places, some here, some not:
this one on disabling automatic change detection: http://msdn.microsoft.com/en-us/data/jj556205.aspx
this one on improving performance on delete: DbContext is very slow when adding and deleting
this one (which I think is pretty good): http://www.codeproject.com/Articles/38922/Performance-and-the-Entity-Framework
I am already using myentities.tablename.MergeOption = MergeOption.NoTracking, I am using compiled queries, I have pre-generated my views using EdmGen, I have reduced the data I am fetching, etc., and of course I have gained performance in leaps and bounds, so that a page that was loading in 54 seconds now takes 16.1 seconds. However, I have to get it to 3 seconds, so I am still looking for the next improvement.
So the research is all well and good, and as a result I have upgraded to the latest Entity Framework, regenerated my .edmx from the db, etc., and tried a variety of things, but I simply cannot find a myEntities.Configuration.AutoDetectChangesEnabled in order to set it to false. I must be missing a simple trick - how do I get my EDMX to have this option?
I am in this environment: .NET 4.0.3, Visual Studio 2010, the latest version of Entity Framework, MVC 4.0... All I need is for somebody to say "aha", you need to go and do this...
Currently, if I delete 1000 records from one of my larger tables (134 million rows), it takes nearly 10 minutes to SaveChanges. From what I have read, AutoDetectChangesEnabled is what I need to alter, but it doesn't exist in my classes. Where is it, and what must I do to get it?
Any help appreciated; I am trying to solve this one quickly.
Regards Julian
Right, I eventually found this item on Stack Overflow, Get DbContext for Entities, which describes what is needed to change your database-first EDMX into a version that has .Configuration.AutoDetectChangesEnabled. This was great and I was able to progress. However, it did not get me the solution I was looking for, as saving the deletes still took an inordinate amount of time.
So the moral is: yes, apply all of the performance tricks:
pre-generate your views,
set AutoDetectChangesEnabled = false,
use compiled queries,
use smart connection strings,
create fake objects instead of fetching the data first,
etc...
You can probably get acceptable performance in most cases, but if you really need to do things quickly, you will have to go to T-SQL and do it by hand.
Regards Julian
AutoDetectChanges sits in DbContext.Configuration.AutoDetectChangesEnabled. For deletion, what you can also try is to get the list of IDs you want to delete, create fake objects that have only these IDs set, attach those objects, and then delete them.
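A sketch of that pattern (MyDbContext, MyEntity, and idsToDelete are placeholders), with automatic change detection switched off for the batch:

using (var context = new MyDbContext())
{
    context.Configuration.AutoDetectChangesEnabled = false;

    foreach (var id in idsToDelete)
    {
        // Stub object: only the key is set, nothing is fetched from the db.
        var stub = new MyEntity { Id = id };
        context.MyEntities.Attach(stub);
        context.MyEntities.Remove(stub);
    }

    // Issues one DELETE per row, but no SELECTs beforehand.
    context.SaveChanges();
}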
However, we also had a similar problem recently, and we are currently deleting with ADO.NET (or there is a method on DbContext where you can push SQL directly). In general EF works great for our app, but in 2-3 places we need performance, as the number of records is huge. Unfortunately we had to use ADO.NET in those places; it's just many times faster when you work with mass data.
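The DbContext method for pushing SQL is Database.ExecuteSqlCommand; a minimal sketch (the table and parameter are placeholders):

// One set-based DELETE on the server - no entities are materialized at all.
int rowsDeleted = context.Database.ExecuteSqlCommand(
    "DELETE FROM dbo.MyLargeTable WHERE CategoryId = @p0", categoryId);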

C# Entity Framework - Handling destructive autogen DB scripts with model first design

I recently started a new personal project to learn Entity Framework. My end goal is to make a desktop game that uses SQL Server Compact for data management and Entity Framework for the game objects. Not actually knowing there were multiple ways to start with EF (model first, code first, db first), I went with the most obvious choice: model first.
I've been working with it successfully now, but one thing concerns me, especially post-development. My goal with the game is that users can update to the latest version without losing any of their existing data. The current issue is that all the generated scripts are destructive by nature (dropping everything and then recreating it), which means I can't run them against the users' SQL CE databases out in "production", so I need to come up with an alternative plan of action.
That said, does anyone have recommended solutions or best practices? In previous desktop apps, I've traditionally used XML/binary to store data, which allowed me to easily update the "schema" without affecting existing data (versioning in the app tailors Load() according to the version, while Save() always saves in the latest version).
What are some recommendations on handling this problem using SQLCE?
If I've understood right, what you need is to utilize the migrations that come with EF. Since the question is general, this link should best guide you to what you need, I think:
http://blogs.msdn.com/b/adonet/archive/2012/02/09/ef-4-3-code-based-migrations-walkthrough.aspx
Migrations come in the shape of code applied incrementally at each point of change; you can tailor them manually if needed, and you can also supply your own 'seeding' if required.
That is, you should be able to do most of what you require: delete or remove old incompatible data, and seed the new data you have, all tied to a particular migration step.
How that works with your app deployment specifically is a bit more complex, I guess, but this should get you started. Then, with each version-breaking db change, your update would contain all the migrations since the previous update (or usually just one, i.e. make it one per update) and the code to tear down or create new things.
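For illustration, a hedged sketch of a single code-based migration (the table and column names are invented): Up() evolves the schema without dropping data and can massage existing rows, while Down() reverses the step.

using System.Data.Entity.Migrations;

public partial class AddPlayerScore : DbMigration
{
    public override void Up()
    {
        // Non-destructive: existing rows are kept and receive the default.
        AddColumn("dbo.Players", "Score", c => c.Int(nullable: false, defaultValue: 0));
        // Optional data transform/seed for rows that already exist.
        Sql("UPDATE dbo.Players SET Score = 10 WHERE IsVeteran = 1");
    }

    public override void Down()
    {
        DropColumn("dbo.Players", "Score");
    }
}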
hope this helps,
