Entity Framework - Reducing round trips to the database - C#

I'm writing an app using WPF, Entity Framework and SQL Server, all very run-of-the-mill stuff. I had a look at what calls get made to the database using SQL Profiler and found quite a few unnecessary ones. The first one was solved pretty easily, but I have included it for anyone reading this thread in the future. Assume I have a table structure with 3 tables like this: Invoice -> InvoiceDetail -> Product
1) When I load up an Invoice object, it executes a separate statement to retrieve each InvoiceDetail item. This is solved pretty easily by using the Include method, e.g.
context.Invoices.Include("InvoiceDetails").Where(i => i.Something == somethingelse);
2) When I delete an Invoice, the database has a cascade delete which automatically deletes all of the InvoiceDetails. However, EF still insists on issuing a delete for each of the InvoiceDetail objects that it has in memory. If an invoice has 100 items on it, it executes 101 statements instead of 1. This is bad.
3) In addition to the extra statements executed in point 2: assuming each InvoiceDetail points to a Product, and I have caused the Products to be loaded into memory (this would happen if I showed the invoice before I deleted it), EF also executes a useless update statement on every Product! In fact, this update is worse than useless, because if someone else has changed something about the Product in the meantime, this code will change the data back! If I'm logging changes, we also get useless log entries. I suspect it does this because each Product would have had an InvoiceDetails collection which has had some items removed, but the Product itself has not changed, so why the update?
Thanks for reading
Cheers,
Michael

The initial behavior is known as lazy loading. You have replaced it with eager loading, which is exactly the right solution for this problem.
For Entity Framework this is the only possible behavior, because EF doesn't support batch modifications. Every record must be deleted with its own statement and its own round trip to the database. Once you load the entities into memory you simply have to delete them one by one, otherwise you will get an exception before any database call is made (so the database cascade delete will not help you). The only workaround is a custom stored procedure for the deletion, combined with disposing the current context after running it, because the context's internal state will no longer be consistent with the database.
This is interesting. It would require a little more investigation, but it may simply be a design flaw / bug in EF, and you will most probably not avoid it (unless you use a stored procedure as described in point 2). If you want to avoid overwriting changes to Product you must use optimistic concurrency. In that case your changes will not be overwritten, but your delete will fail with an OptimisticConcurrencyException. I will check this behavior later and let you know if I'm able to reproduce it and find any workaround.
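For reference, the usual shape of that optimistic concurrency handling is below. This is a minimal sketch, assuming an EF4 ObjectContext model (MyEntities and invoiceId are made-up names) in which the checked Product columns have ConcurrencyMode set to Fixed:
using System.Data;            // OptimisticConcurrencyException (EF4)
using System.Data.Objects;    // RefreshMode
using System.Linq;

using (var context = new MyEntities())
{
    var invoice = context.Invoices.First(i => i.Id == invoiceId);
    context.Invoices.DeleteObject(invoice);
    try
    {
        context.SaveChanges();
    }
    catch (OptimisticConcurrencyException)
    {
        // Somebody changed one of the affected rows in the meantime.
        // Re-read the store values for the conflicting entity, then
        // retry the save (client wins).
        context.Refresh(RefreshMode.ClientWins, invoice);
        context.SaveChanges();
    }
}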

I've been using this as a solution to let SQL Server handle the cascading deletes without the EF hit:
Public Sub DeleteCheckedOutByUser(ByVal username As String)
    ' Pass the username as a parameter ({0}) rather than formatting it
    ' into the string, which would leave the call open to SQL injection.
    _context.ExecuteStoreCommand("DELETE FROM Maintenance.CheckoutManager WHERE CheckOutTo = {0}", username)
End Sub
Sorry it's in VB; that's what my current client is using. If you have any trouble translating what I'm saying, just let me know.
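A C# equivalent, for anyone following the rest of this thread (assuming _context is an ObjectContext, as above):
public void DeleteCheckedOutByUser(string username)
{
    // ExecuteStoreCommand turns {0} into a real DbParameter, so the
    // value is never concatenated into the SQL text.
    _context.ExecuteStoreCommand("DELETE FROM Maintenance.CheckoutManager WHERE CheckOutTo = {0}", username);
}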

To remove the cascading deletes (and presumably rely on SQL Server to do the deletes), see the approach here: http://geekswithblogs.net/danemorgridge/archive/2010/12/17/ef4-cpt5-code-first-remove-cascading-deletes.aspx

Related

Entity Framework takes about 30 seconds on first query

I'm using Entity Framework 6 on a SQL Server database to query an existing database (database first, so there's an EDMX in my project).
I've noticed that the first time I request an entity, it can take up to thirty seconds for the query to be executed. Subsequent queries to the same object then complete in a matter of milliseconds. The actual SQL being executed is very fast, so it's not a slow query.
I've found that Entity Framework generates views in the background and that this is the most likely culprit. What I haven't found, however, is a good solution for it. There's a NuGet package that can handle the view generation (EFInteractiveViews), but it hasn't been updated since 2014 and I can hardly find any information on how to use it.
What options do I have nowadays? I've tried initializing Entity Framework in Application_Start by doing a few queries, but this doesn't seem to help much at all. It's also quite difficult to perform the real queries in Application_Start, because most queries use data from the current user (who is not yet logged on at that point), so it's difficult to run them in advance.
I've thought about creating an .ashx file that constantly polls the application by calling the API to keep it alive. I've also set the Application Pool to "AlwaysRunning" so that EF doesn't restart when the app pool is recycled.
Does anyone have any tips or ideas on how I can resolve this or things I can try?
Thanks a lot in advance. I've spent the better part of two days already searching for a viable solution.
There are many ways to speed up Entity Framework; I will mention some of them:
Turn off lazy loading (open the EDMX file, right-click anywhere => Properties => set Lazy Loading Enabled to false).
Use AsNoTracking().ToList() for read-only queries; when you want to update, use Attach and set the object's state to EntityState.Modified (see the sketch after this list).
Use indexes on your tables.
Use paging; do not load all the data at once.
Split your EDMX into several smaller ones and only include the entities you need on each page (this will affect performance in a good way).
If you want to load related objects, "be eager and not lazy": use Include. You may need a using System.Data.Entity directive to get the lambda Include overloads.
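A minimal sketch of the AsNoTracking / Attach tip from point 2 (MyDbContext, Person and Name are made-up names):
using System.Collections.Generic;
using System.Data.Entity;   // AsNoTracking(), EntityState
using System.Linq;

// Read without change tracking: faster, and no snapshot kept in memory.
List<Person> people;
using (var db = new MyDbContext())
{
    people = db.People.AsNoTracking().ToList();
}

// ...the user edits a detached object...
var edited = people.First();
edited.Name = "New name";

// Attach it to a fresh context and mark it modified so that
// SaveChanges issues an UPDATE for it.
using (var db = new MyDbContext())
{
    db.People.Attach(edited);
    db.Entry(edited).State = EntityState.Modified;
    db.SaveChanges();
}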
Example of splitting your EDMX:
Say you have the following objects for a rent-a-car app: Country, City, Person, Car, Rent, Gender, Engine, Manufacturer, etc.
Now:
If you are working on a screen to manage (CRUD) Person, you don't need Car, Rent or Manufacturer, so create a ManagePerson.edmx containing (Country, City, Person, Gender).
If you are working on managing (CRUD) Car, you don't need Person, City, Gender or Rent, so create a ManageCar.edmx containing (Car, Manufacturer, Country, Engine).
Entity Framework must first compile and translate your LINQ queries into SQL, but after this it caches them. The first hit on a query is always going to take longer, but as you mention, after that the same query runs very quickly.
When I first used EF this was constantly brought up as an issue by testers, but when the system went live and was used frequently (so the queries were cached), it wasn't a problem.
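One option is to pay that first-hit cost at application startup instead of on the first user request. A rough sketch, assuming an EF6 DbContext called MyDbContext with some set SomeEntities (both made-up names); call it from Application_Start:
using (var db = new MyDbContext())
{
    // Builds the model without touching any data; this is a large part
    // of the cost of the first query.
    db.Database.Initialize(force: false);

    // One deliberately cheap query to trigger view generation and open
    // the connection before a real user ever hits the app.
    db.SomeEntities.AsNoTracking().FirstOrDefault();
}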
See Hadi Hassan's answer for general speed-up tips.

Reading an entity's intended SQL query?

I have a system where the customer wants to rework the current model so that every time a user makes a change, an administrator must accept the change before it's written to the database.
I was thinking of doing a quick fix for this by overriding SaveChanges, taking each object in the ObjectStateManager and adding its intended SQL to a limbo table that would keep the intended SQL query saved until an admin has accepted it (and then run it).
I know that you can use ToTraceString() on database queries, but can you somehow pull the intended SQL query from an object taken from the ObjectStateManager?
Was thinking something like this:
var modified = DB.ObjectStateManager.GetObjectStateEntries(System.Data.EntityState.Modified);
foreach (var mod in modified)
{
    // Insert the query into the limbo table
    tblPendingChanges change = new tblPendingChanges();
    // Code omitted
    change.sql = mod.Query;
    // Code omitted
    DB.tblPendingChanges.AddObject(change);
    mod.Delete();
}
DB.SaveChanges();
Your solution is terrible. If you have the requirement that each change must be approved, that leads to an approval workflow where you save changes to some temporary store and move them to the main tables once approved. It is really not done at the SQL level. If you need something working at the SQL level, don't use a high-level tool like Entity Framework, because such tools are really not designed to support this; for example, EF will not give you the SQL commands it generates for data modifications.
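To make the approval workflow concrete, here is a rough sketch of capturing the proposed values (rather than SQL text) into a staging table. It reuses tblPendingChanges and DB from the question, but the EntityName and Payload columns are hypothetical:
using System.Data;   // EntityState
using System.Linq;
using System.Text;

var modified = DB.ObjectStateManager
                 .GetObjectStateEntries(EntityState.Modified)
                 .ToList();

foreach (var entry in modified)
{
    // Flatten the proposed values into a simple name=value payload.
    var payload = new StringBuilder();
    for (int i = 0; i < entry.CurrentValues.FieldCount; i++)
    {
        payload.AppendFormat("{0}={1};",
            entry.CurrentValues.GetName(i),
            entry.CurrentValues.GetValue(i));
    }

    var change = new tblPendingChanges
    {
        EntityName = entry.EntitySet.Name,   // hypothetical column
        Payload = payload.ToString()         // hypothetical column
    };
    DB.tblPendingChanges.AddObject(change);

    // Accept the in-memory change without persisting it, so the
    // un-approved edit never reaches the real table.
    entry.ChangeState(EntityState.Unchanged);
}

DB.SaveChanges();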
I solved this issue by using an entity wrapper found here.
This allowed me to read each SQL statement before it was sent to the server, and redirect it.
I had to edit the wrapper so that parameters are inserted correctly into the statement, allowing the SQL statement to be run later.

Entity Framework POCO long-term change tracking

I'm using .NET Entity Framework 4.1 with the code-first approach to effectively solve the following problem, simplified here.
There's a database table with tens of thousands of entries.
Several users of my program need to be able to
View the (entire) table in a grid, which implies that the entire table has to be downloaded.
Modify values of any row; changes are frequent but need not be persisted immediately. It's expected that different users will usually modify different rows, but this is not always true. Some loss of changes is permitted, as users will most likely update the same rows to the same values.
On occasion add new rows.
Sounds simple enough. My initial approach was to use a long-running DbContext instance. This one DbContext was supposed to track changes to the entities, so that when SaveChanges() is called, most of the legwork is done automatically. However, many have pointed out that this is not an optimal solution in the long run, notably here. I'm still not sure I understand the reasons, and I don't see what a unit of work would be in my scenario either. The user chooses herself when to persist changes, and let's say for simplicity that the client always wins. It's also important to note that objects that have not been touched don't overwrite any data in the database.
Another approach would be to track changes manually, or use objects that track changes for me; however, I'm not too familiar with such techniques, and I would welcome a nudge in the right direction.
What's the correct way to solve this problem?
I understand that this question is a bit wishy-washy, but think of it as more fundamental: I lack a fundamental understanding of how to solve this class of problems. It seems to me that a long-lived DbContext is the right way, but knowledgeable people tell me otherwise, which leads me to confusion and imprecise questions.
EDIT1
Another point of confusion is the existence of the Local property on the DbSet<> object. It invites me to use a long-running context, as another user has posted here.
The problem with a long-running context is that it doesn't refresh data; I discussed the problems more here. So if your user opens the list and modifies data for half an hour, she doesn't know about other users' changes. But in the case of WPF, if your business action is:
Open the list
Do as many actions as you want
Trigger saving changes
then this whole flow is a unit of work and you can use a single context instance for it. If you have a scenario where the last edit wins, you should not have problems with this unless somebody else deletes the record the current user is editing. Additionally, after saving or cancelling changes you should dispose of the current context and load the data again; this will ensure that you really have fresh data for the next unit of work.
The context offers some features to refresh data, but it only refreshes data previously loaded (without relations), so, for example, new unsaved records will still be included.
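A rough sketch of that open / edit / save / dispose cycle in a WPF view model (MyDbContext and Invoice are placeholder names):
using System.Collections.Generic;
using System.Linq;

public class InvoiceListViewModel
{
    private MyDbContext _context;

    public IList<Invoice> Invoices { get; private set; }

    // One unit of work = one context instance.
    public void Load()
    {
        _context = new MyDbContext();
        Invoices = _context.Invoices.ToList();   // the user edits these
    }

    public void Save()
    {
        _context.SaveChanges();   // persist everything edited so far
        _context.Dispose();
        Load();                   // fresh context => genuinely fresh data
    }
}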
Perhaps you can also read about the MS Sync Framework and local data caching.
It sounds to me like your users could have a copy (cache) of the data for an indefinite period of time. The longer the users work with cached data, the greater the odds that they could become disconnected from the database connection in the DbContext. My guess is EF doesn't handle this well, and you probably want to deal with that (e.g. an occasionally connected architecture). I would expect implementing that to solve many of your issues.

Checking if an item exists before saving

I have a SQL database with various tables that save info about a product (it's for an online shop) and I'm coding in C#. There are options associated with a given product, and as mentioned, the info recorded about these options is spread across a few tables when saved.
Now when I come to edit this product in the CMS, I see a list of the existing product options and I can add to that list or delete from it, as you'd expect.
When I save the product I need to check whether each record already exists: if so, update it; if not, save a new record. I'm trying to find an efficient way of doing this. It's very important that I maintain the IDs associated with the product options, so clearing them all out each time and re-saving them isn't viable, unfortunately.
To describe it again, possibly more clearly: imagine I have a collection of options when I load the product; this is loaded into memory and added to / deleted from depending on what the user chooses. When they click 'Save' I need to check which options are updates and which are new to the list.
Any suggestions of an efficient way of doing this?
Thanks.
If the efficiency you are looking to achieve is in relation to the number of round trips to the database then you could write a stored procedure to do the update or insert for you.
In most cases, however, it's not really necessary to avoid the SELECT first; provided you have appropriate primary keys or unique indices on your tables, this should be very quick.
If the efficiency is in terms of elegant or reduced code on the server side then I would look at using some sort of ORM, for example Entity Framework 4.0. With a proper ORM architecture you can almost stop thinking in terms of the database records and INSERT/UPDATE and just work with a collection of objects in memory.
I usually do this by performing the following:
For each item, execute an update query that will update the item if it exists.
After each update, check how many rows were updated (using @@ROWCOUNT in SQL Server). If zero rows were updated, execute an insert to create the row.
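A minimal ADO.NET sketch of this update-then-insert pattern; batching the two statements also keeps it to a single round trip (the table, columns and variables are made up):
using System.Data.SqlClient;

// UPDATE first; if @@ROWCOUNT reports that no row matched, INSERT.
using (var conn = new SqlConnection(connectionString))
using (var cmd = conn.CreateCommand())
{
    cmd.CommandText = @"
        UPDATE ProductOptions SET Name = @name WHERE Id = @id;
        IF @@ROWCOUNT = 0
            INSERT INTO ProductOptions (Id, Name) VALUES (@id, @name);";
    cmd.Parameters.AddWithValue("@id", optionId);
    cmd.Parameters.AddWithValue("@name", optionName);

    conn.Open();
    cmd.ExecuteNonQuery();
}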
Alternatively, you can do the opposite, if you create a unique constraint that prevents duplicate rows:
For each item, try to insert it.
If the insert fails because of the constraint (check the error code), perform the update instead.
Run a SELECT query checking for the ID. If it exists, you need to update; if it does not exist, you need to insert.
Without more details I'm not really sure what else to tell you. This is fairly standard.
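With the DbContext API, that select-then-decide approach looks roughly like this (ShopContext, ProductOptions and option are illustrative names):
using (var db = new ShopContext())
{
    // Look the option up by primary key; add it if missing, otherwise
    // copy the edited values onto the tracked instance.
    var existing = db.ProductOptions.Find(option.Id);
    if (existing == null)
        db.ProductOptions.Add(option);                       // -> INSERT
    else
        db.Entry(existing).CurrentValues.SetValues(option);  // -> UPDATE

    db.SaveChanges();
}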

Is there a benefit to override the default Insert/Update/Delete queries for EF

As the question asks really.
The EF modeller tool allows us to map the Insert/Update/Delete functions to a sproc; is there any benefit to overriding them?
If it requires some custom validation then obviously yes, but if I'm happy with how it is now, is it worth creating sprocs for them all?
I can't remember how to view the SQL that EF executes for these operations to find out the exact queries, but I imagine they'd be pretty similar to standard Insert/Update/Delete statements.
I can think of a few cases where it could be useful:
You're working with a legacy database which doesn't quite map to your EF model precisely.
You need extra queries to be executed on insert/update/delete, but you don't have the rights to create triggers on your database.
Soft deletes in your database that you want to abstract away from, so that a regular delete actually performs a soft delete.
I'm not quite sure how viable these options are, as I'm personally more of an NHibernate guy; these are theoretical options.
As for viewing the executed queries, there are a few possibilities. You could attach a profiler to your SQL Server instance and look at the raw queries that are executed. There's also the Entity Framework Profiler (by Ayende / Oren Eini), which isn't free, but it does make reading and debugging the queries a lot easier.
Yes. There is a benefit to overriding them.
Not everybody actually updates or deletes a row of data when an update or delete happens.
In some cases, deleting a record really just means setting an EffectiveUntil date on an existing record and keeping the row in the database for historical purposes.
The same can go for an update: instead of updating the existing row, the current row gets its EffectiveUntil date set and a brand-new row is inserted with the new data and a null EffectiveUntil date (or a similar mechanism).
By providing the Insert/Update/Delete logic to Entity Framework yourself, you can specify exactly what those operations mean in terms of your database, rather than what they mean in the scope of an RDBMS.
As for the second question (which I apparently missed originally): if you're happy with what is currently being generated, then no, it's not worth creating them. You'd just add the extra headache of having to remember to update your stored procedures whenever you change the table structure.
