Entity Framework too slow / memory leak - C#

I'm doing a lot of work with Entity Framework, like millions of inserts and updates.
However, over time it gets slower and slower...
I tried some ways to improve performance, like:
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
I also tried:
db.Table.AsNoTracking();
When I change all these things it really does get faster. However, memory usage starts to increase until it throws an exception.
Has anyone had this situation?
Thanks

The DbContext stores all the entities you have fetched or added to a DbSet. As others have suggested, you need to dispose of the context after each group of operations (a set of closely-related operations - e.g. a web request) and create a new one.
In the case of inserting millions of entities, that might mean creating a new context every 1,000 entities for example. This answer gives you all you need to know about inserting thousands of entities.
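A minimal sketch of that batching pattern, assuming a MyContext/MyEntity model, a hypothetical GetEntitiesToInsert() source, and a 1,000-entity batch size (all placeholders to adjust):

const int batchSize = 1000;
var db = new MyContext();
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
try
{
    int count = 0;
    foreach (var entity in GetEntitiesToInsert())   // hypothetical source of entities
    {
        db.MyEntities.Add(entity);
        if (++count % batchSize == 0)
        {
            db.SaveChanges();
            db.Dispose();                            // throw away the tracked entities
            db = new MyContext();                    // start fresh for the next batch
            db.Configuration.AutoDetectChangesEnabled = false;
            db.Configuration.ValidateOnSaveEnabled = false;
        }
    }
    db.SaveChanges();                                // flush the final partial batch
}
finally
{
    db.Dispose();
}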

If you are doing only inserts and updates, try raw SQL via db.Database.ExecuteSqlCommand(sql, parameters) (db.Database.SqlQuery is the equivalent for queries that return results).
Entity Framework keeps all attached objects in memory, so having millions of them may cause what looks like a memory leak.
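As an illustration of the raw-SQL route, a hedged sketch (the table, columns and variables here are made up):

// Executes a parameterized UPDATE without materializing or tracking any entities.
// "Articles", newTitle and articleId are placeholders.
db.Database.ExecuteSqlCommand(
    "UPDATE Articles SET Title = @p0 WHERE Id = @p1",
    newTitle, articleId);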

https://github.com/loresoft/EntityFramework.Extended offers a clean interface for doing faster bulk updates and deletes. I think it only works with SQL Server, but it may give you a quick solution to your performance issue.
Deletes can be done like this:
context.Users.Where(u => u.FirstName == "Firstname").Delete();
Updates can be done in a similar fashion:
context.Tasks.Where(t => t.StatusId == 1).Update(t => new Task { StatusId = 2 });

For millions of inserts and updates, everything gave me out-of-memory exceptions; I've tried it all.
It only worked for me when I stopped using the context and used ADO.NET or another micro-ORM like Dapper.
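For what it's worth, a minimal sketch of the Dapper route (the table, the article objects and the connection string are assumptions, not from the original post):

// requires: using Dapper; using System.Data.SqlClient;
// Dapper executes the INSERT once per element of the collection,
// with no change tracker holding on to anything.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        connection.Execute(
            "INSERT INTO Articles (Title, Body) VALUES (@Title, @Body)",
            articles,          // IEnumerable of objects with Title/Body properties
            transaction);
        transaction.Commit();
    }
}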

Related

Multiple Execute Calls

I'm trying to update a field on a phone call entity, then close it. Currently, as far as I can tell, doing so takes two calls. But this is painfully slow: it's taken 30 minutes to process 60 phone calls and I have around 200,000 to do. Is there a way to combine both into one call?
Here's my current code -
foreach (phonecall phonepointer in _businessEntityCollection.BusinessEntities.Cast<phonecall>()
    .Where(phonepointer => phonepointer.statecode.Value == PhoneCallState.Open))
{
    // Update fiserv_contactstatus value
    phonepointer.fiserv_contactstatus = Picklist;
    crmService.Update(phonepointer);

    // Cancel activity
    setStatePhoneCallRequest.PhoneCallState = PhoneCallState.Canceled;
    setStatePhoneCallRequest.PhoneCallStatus = 200011;
    setStatePhoneCallRequest.EntityId = phonepointer.activityid.Value;
    crmService.Execute(setStatePhoneCallRequest);
}
Unfortunately, there's little you can do.
You COULD try and use the new SDK and the XRM context (strongly typed classes) to batch-update the phone call entities (this should be faster), but you'll still need to use the old-fashioned CrmService to actually change the state of each entity, one by one.
EDIT:
You could also directly change the state of the entities in the database, but this should be your last resort, as manual changes to the CRM DB are unsupported and dangerous.
Seriously, last resort! No, I'm NOT joking!

new objects added during long loop

We currently have a production application that runs as a Windows service. Many times this application will end up in a loop that can take several hours to complete. We are using Entity Framework for .NET 4.0 for our data access.
I'm looking for confirmation that if we load new data into the system, after this loop is initialized, it will not result in items being added to the loop itself. When the loop is initialized we are looking for data "as of" that moment. Although I'm relatively certain that this will work exactly like using ADO and doing a loop on the data (the loop only cycles through data that was present at the time of initialization), I am looking for confirmation for co-workers.
Thanks in advance for your help.
Update: here's some sample code in C#. The question is the same: will the enumeration change if new items are added to the table that EF is querying?
IEnumerable<myobject> myobjects = (from o in db.theobjects where o.id == myID select o);
foreach (myobject obj in myobjects)
{
    // perform action on obj here
}
It depends on your precise implementation.
Once a query has been executed against the database, the results of that query will not change (assuming you aren't using lazy loading). To ensure this you can dispose of the context after retrieving the query results; this effectively "cuts the cord" between the retrieved data and the database.
Lazy loading can result in a mix of "initial" and "new" data; however, once the data has been retrieved it becomes a fixed snapshot and is not susceptible to updates.
You mention this is a long-running process, which implies that there may be a very large amount of data involved. If you aren't able to fully retrieve all the data to be processed (due to memory limitations or other bottlenecks), then you likely can't ensure that you are working against the original data. The results are not fixed until a query is executed, and any updates prior to query execution will appear in the results.
I think your best bet is to change the logic of your application so that when the "loop" logic is determining whether it should do another iteration or exit, it takes the opportunity to load any newly added items into the list. See the pseudo-code below:
var repo = new Repository();
while (repo.HasMoreItemsToProcess())
{
    var entity = repo.GetNextItem();
}
Let me know if this makes sense.
The easiest way to ensure this happens - if the data itself isn't too big - is to convert the data you retrieve from the database to a List<>, e.g. something like this (pulled at random from my current project):
var sessionIds = room.Sessions.Select(s => s.SessionId).ToList();
And then iterate through the list, not through the IEnumerable<> that would otherwise be returned. Converting it to a list triggers the enumeration, and then throws all the results into memory.
If there's too much data to fit into memory, and you need to stick with an IEnumerable<>, then the answer to your question depends on various database and connection settings.
I'd take a snapshot of the IDs to be processed -- quickly and as a transaction -- then work that list in the fashion you're doing today.
In addition to accomplishing the goal of not changing the sample mid-stream, this also gives you the ability to extend your solution to track status on each item as it's processed. For a long-running process, this can be very helpful for progress reporting, restart/retry capabilities, etc.
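A rough sketch of that approach, assuming an integer key and a status column used to track progress (both assumptions, not from the question; entity and set names are placeholders):

// 1) Snapshot the IDs up front, in one short-lived context.
List<int> idsToProcess;
using (var db = new MyContext())
{
    idsToProcess = db.TheObjects
        .Where(o => o.StatusId == 1)      // "pending" - assumed status column
        .Select(o => o.Id)
        .ToList();                        // the snapshot is fixed here
}

// 2) Work the snapshot, one item per short-lived context, recording progress as you go.
foreach (int id in idsToProcess)
{
    using (var db = new MyContext())
    {
        var item = db.TheObjects.Find(id);
        if (item == null) continue;       // deleted since the snapshot was taken

        // ... perform the actual work on item ...

        item.StatusId = 2;                // mark as processed (assumed convention)
        db.SaveChanges();
    }
}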

Why is inserting entities in EF 4.1 so slow compared to ObjectContext?

Basically, I insert 35000 objects within one transaction:
using (var uow = new MyContext())
{
    for (int i = 1; i < 35000; i++)
    {
        var o = new MyObject()...;
        uow.MySet.Add(o);
    }
    uow.SaveChanges();
}
This takes forever!
If I use the underlying ObjectContext (via IObjectContextAdapter), it's still slow but takes around 20s. It looks like DbSet<> is doing some linear searches, which take quadratic time overall...
Anyone else seeing this problem?
As already indicated by Ladislav in the comment, you need to disable automatic change detection to improve performance:
context.Configuration.AutoDetectChangesEnabled = false;
This change detection is enabled by default in the DbContext API.
The reason why DbContext behaves so different from the ObjectContext API is that many more functions of the DbContext API will call DetectChanges internally than functions of the ObjectContext API when automatic change detection is enabled.
Here you can find a list of those functions which call DetectChanges by default. They are:
The Add, Attach, Find, Local, or Remove members on DbSet
The GetValidationErrors, Entry, or SaveChanges members on DbContext
The Entries method on DbChangeTracker
Especially Add calls DetectChanges, which is responsible for the poor performance you experienced.
In contrast to this, the ObjectContext API calls DetectChanges automatically only in SaveChanges, but not in AddObject and the other corresponding methods mentioned above. That's the reason why the default performance of ObjectContext is faster.
Why did they introduce this default automatic change detection in DbContext in so many functions? I'm not sure, but it seems that disabling it and calling DetectChanges manually at the proper points is considered advanced and can easily introduce subtle bugs into your application, so use it with care.
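If you do take the manual route, a minimal sketch (DbContext API; the context, set and source names are placeholders) could look like this:

using (var context = new MyContext())
{
    context.Configuration.AutoDetectChangesEnabled = false;

    foreach (var entity in entitiesToInsert)
    {
        context.MySet.Add(entity);           // Add no longer triggers DetectChanges
    }

    // One manual DetectChanges call instead of one per Add. Strictly speaking plain
    // Adds don't need it (Add marks the entity as Added directly), but it is needed
    // if you also modified already-tracked entities while detection was off.
    context.ChangeTracker.DetectChanges();

    context.SaveChanges();
}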
A little empirical test with EF 4.3 Code First:
Removed 1000 objects with AutoDetectChanges = true: 23 sec
Removed 1000 objects with AutoDetectChanges = false: 11 sec
Inserted 1000 objects with AutoDetectChanges = true: 21 sec
Inserted 1000 objects with AutoDetectChanges = false: 13 sec
In EF Core 2.0 this was moved to:
context.ChangeTracker.AutoDetectChangesEnabled = false;
Besides the answers you have found here, it is important to know that at the database level an insert is more work than an update. The database has to extend/allocate new space. Then it has to update at least the primary key index. Although indexes may also be updated when updating, it is a lot less common. If there are any foreign keys, it has to read those indexes as well to make sure referential integrity is maintained. Triggers can also play a role, although those can affect updates in the same way.
All that database work makes sense for day-to-day insert activity originating from user entries. But if you are just uploading an existing database, or have a process that generates a lot of inserts, you may want to look at ways of speeding that up by postponing the work to the end. Disabling indexes while inserting is a common way. There are very complex optimizations that can be done depending on the case, but they can be a bit overwhelming.
Just know that, in general, inserts will take longer than updates.

How do I speed up DbSet.Add()?

I have to import about 30k rows from a CSV file to my SQL database; sadly this takes 20 minutes.
Troubleshooting with a profiler shows me that DbSet.Add is taking the most time, but why?
I have these Entity Framework Code-First classes:
public class Article
{
    // About 20 properties, each property doesn't store excessive amounts of data
}

public class Database : DbContext
{
    public DbSet<Article> Articles { get; set; }
}
For each item in my for loop I do:
db.Articles.Add(article);
Outside the for loop I do:
db.SaveChanges();
It's connected to my local SQL Express server, but I guess nothing is written until SaveChanges is called, so I guess the server isn't the problem...
As per Kevin Ramen's comment (Mar 29)
I can confirm that setting db.Configuration.AutoDetectChangesEnabled = false makes a huge difference in speed
Running Add() on 2324 items by default ran 3min 15sec on my machine, disabling the auto-detection resulted in the operation completing in 0.5sec.
http://blog.larud.net/archive/2011/07/12/bulk-load-items-to-a-ef-4-1-code-first-aspx
I'm going to add to Kervin Ramen's comment by saying that if you are only doing inserts (no updates or deletes) then you can, in general, safely set the following properties before doing any inserts on the context:
DbContext.Configuration.AutoDetectChangesEnabled = false;
DbContext.Configuration.ValidateOnSaveEnabled = false;
I was having a problem with a once-off bulk import at my work. Without setting the above properties, adding about 7500 complicated objects to the context was taking over 30 minutes. Setting the above properties (so disabling EF checks and change tracking) reduced the import down to seconds.
But, again, I stress: only use this if you are doing inserts. If you need to mix inserts with updates/deletes, you can split your code into two paths, disable the EF checks for the insert part, and then re-enable the checks for the update/delete path. I have used this approach successfully to get around the slow DbSet.Add() behaviour.
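A hedged sketch of that two-path split (the context, entity sets and source collections are placeholders):

using (var db = new ImportContext())
{
    // Path 1: inserts only, with the EF checks off.
    db.Configuration.AutoDetectChangesEnabled = false;
    db.Configuration.ValidateOnSaveEnabled = false;

    foreach (var article in articlesToInsert)
    {
        db.Articles.Add(article);
    }
    db.SaveChanges();

    // Path 2: re-enable the checks before touching existing entities.
    db.Configuration.AutoDetectChangesEnabled = true;
    db.Configuration.ValidateOnSaveEnabled = true;

    foreach (var change in articlesToUpdate)
    {
        var existing = db.Articles.Find(change.Id);
        existing.Title = change.Title;       // normal tracked update
    }
    db.SaveChanges();
}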
Each item in a unit-of-work has overhead, as it must check (and update) the identity manager, add to various collections, etc.
The first thing I would try is batching into, say, groups of 500 (change that number to suit), starting with a fresh (new) object-context each time - as otherwise you can reasonably expect telescoping performance. Breaking it into batches also prevents a megalithic transaction bringing everything to a stop.
Beyond that: SqlBulkCopy. It is designed for large imports with minimal overhead. It isn't EF, though.
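If you do step outside EF, a minimal SqlBulkCopy sketch might look like this (the destination table, connection string and the ToDataTable helper are assumptions):

// requires: using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Articles";   // assumed target table
        bulkCopy.BatchSize = 5000;

        DataTable table = ToDataTable(articles);          // hypothetical helper mapping
                                                          // Article properties to columns
        bulkCopy.WriteToServer(table);
    }
}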
There is an extremely easy to use and very fast extension here:
https://efbulkinsert.codeplex.com/
It's called "Entity Framework Bulk Insert".
The extension itself is in the namespace EntityFramework.BulkInsert.Extensions, so to bring the extension method into scope add:
using EntityFramework.BulkInsert.Extensions;
And then you can do this:
context.BulkInsert(entities);
BTW - if you do not wish to use this extension for some reason, you could also try, instead of calling db.Articles.Add(article) for each article, building up a list of several articles each time and then using AddRange (new in EF 6, along with RemoveRange) to add them to the context together.
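A rough sketch of that AddRange variant (EF 6 or later; the CSV-reading helper is hypothetical):

var batch = new List<Article>();
foreach (var article in ReadArticlesFromCsv(path))   // hypothetical CSV reader
{
    batch.Add(article);
}
db.Articles.AddRange(batch);   // change detection runs once for the whole list
db.SaveChanges();

The gain comes from AddRange triggering change detection only once for the whole collection, rather than once per Add call.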
I haven't really tried this, but my thinking would be to use an ODBC driver to load the file into a DataTable and then use a SQL stored procedure, passing the table to the procedure as a parameter.
For the first part, try:
http://www.c-sharpcorner.com/UploadFile/mahesh/AccessTextDb12052005071306AM/AccessTextDb.aspx
For the second part try this for SQL procedure:
http://www.builderau.com.au/program/sqlserver/soa/Passing-table-valued-parameters-in-SQL-Server-2008/0,339028455,339282577,00.htm
Then create a SqlCommand object in C# and add to its Parameters collection a SqlParameter of type SqlDbType.Structured.
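A hedged sketch of that last step (the procedure name, parameter name and table type are assumptions and must match what you create on the server):

// requires: using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportArticles", connection))
{
    command.CommandType = CommandType.StoredProcedure;

    SqlParameter tvp = command.Parameters.AddWithValue("@Articles", articlesTable);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.ArticleTableType";   // user-defined table type on the server

    connection.Open();
    command.ExecuteNonQuery();
}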
Well, I hope it helps.

How to clear the DataContext cache on Linq to Sql

I'm using LINQ to SQL to query a database. I only use LINQ to read data from the DB, and I make changes to it by other means. (This cannot be changed; it's a restriction from the app we are extending, and all updates must go through its SDK.)
This is fine, but I'm hitting some cache problems. Basically, I query a row using LINQ, then I delete it through external means, and then I create a new row externally. If I query that row again using LINQ, I get the old (cached) data.
I cannot turn off object tracking because that seems to prevent the data context from auto-loading associated properties (foreign keys).
Is there any way to clear the DataContext cache?
I found a method surfing the net but it doesn't seem safe: http://blog.robustsoftware.co.uk/2008/11/clearing-cache-of-linq-to-sql.html
What do you think? What are my options?
If you want to refresh a specific object, then the Refresh() method may be your best bet.
Like this:
Context.Refresh(RefreshMode.OverwriteCurrentValues, objectToRefresh);
You can also pass an array of objects or an IEnumerable as the 2nd argument if you need to refresh more than one object at a time.
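For example, assuming results holds entities you loaded earlier, refreshing them all at once would look like this:

// Re-reads current database values over every object in the collection.
Context.Refresh(RefreshMode.OverwriteCurrentValues, results);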
Update
I see what you're talking about in the comments; in Reflector you can see this happening inside .Refresh():
object objectByKey = context.Services.GetObjectByKey(trackedObject.Type, keyValues);
if (objectByKey == null)
{
    throw Error.RefreshOfDeletedObject();
}
The method you linked seems to be your best option; the DataContext class doesn't provide any other way to clear a deleted row. The disposal checks and such are inside the ClearCache() method... it's really just checking for disposal and calling ResetServices() on the CommonDataServices underneath... the only ill effect would be clearing any pending inserts, updates or deletes that you have queued.
There is one more option: can you fire up another DataContext for whatever operation you're doing? It wouldn't have any cache to it... but that does involve some computational cost, so if the pending inserts, updates and deletes aren't an issue, I'd stick with the ClearCache() approach.
I wrote this code to really CLEAR the "cached" entities by detaching them.
var entidades = Ctx.ObjectStateManager.GetObjectStateEntries(EntityState.Added | EntityState.Deleted | EntityState.Modified | EntityState.Unchanged);
foreach (var objectStateEntry in entidades)
    Ctx.Detach(objectStateEntry.Entity);
Where Ctx is my context.
You should be able to just requery the result sets that are using these objects. This would not pull a cached set, but would actually return the final results. I know that this may not be as easy or feasible depending on how you set up your app...
HTH.
