Any decent compiler should eliminate dead code, at least to a certain extent. However, I am curious how the compiler (specifically the C# compiler invoked by MSBuild) handles a situation like the following:
// let's assume LazyLoadingEnabled = false;
var users = db.Users.ToList();
// more code that never touches 'users'
Since LazyLoadingEnabled = false, will the compiled code:
1. Eagerly load the results from the database call,
2. Make the call to the database without storing the results, or
3. Never make the call to begin with?
I was cleaning up some old code at work and I found several cases of this occurring, so I'm curious as to whether we've been wasting resources or not.
It feels like the right answer is number 3, but I haven't found any solid evidence to back up my claims. Thank you for your help!
The answer is #1.
Not only will this execute the database query to select all the records from the Users table, it will also fetch all of those records and construct an entity for each one. That is very expensive if you have many records. Of course, the GC will eventually collect the wasted objects.
If you want to prove the above for yourself, just add the following line after you create your DbContext to log the SQL being executed:
db.Database.Log = s => Console.WriteLine(s);
By the way, the LazyLoadingEnabled setting has no effect on the observed behavior. LazyLoadingEnabled determines whether navigation properties are loaded lazily when they are first accessed. In this case, db.Users is not a navigation property, so the setting makes no difference.
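To see this end to end, here is a minimal sketch (MyDbContext is an assumed EF6 DbContext with a Users DbSet; the names are placeholders, not from the question):

// Minimal sketch: MyDbContext is a placeholder EF6 DbContext with a Users DbSet.
using (var db = new MyDbContext())
{
    // Log every SQL command EF sends to the database.
    db.Database.Log = s => Console.WriteLine(s);

    // ToList() enumerates the query immediately: the SELECT runs and every
    // User entity is materialized, even though 'users' is never read again.
    var users = db.Users.ToList();
}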
I'm running into some speed issues in my project and it seems like the primary cause is calls to the database using Entity Framework. Every time I call the database, it is always done as
database.Include(...).Where(...)
and I'm wondering if that is different than
database.Where(...).Include(...)?
My thinking is that the first way includes everything for all the elements in the target table, then filters out the ones I want, while the second one filters out the ones I want, then only includes everything for those. I don't fully understand entity framework, so is my thinking correct?
Entity Framework delays its querying as long as it can, up until the point where your code starts working on the data. To illustrate:
var query = db.People
    .Include(p => p.Cars)
    .Where(p => p.Employer.Name == "Globodyne")
    .Select(p => p.Employer.Founder.Cars);
With all these chained calls, EF has not yet called the database. Instead, it has kept track of what you're trying to fetch, and it knows what query to run if you start working with the data. If you never do anything else with query after this point, then you will never hit the database.
However, if you do any of the following:
var result = query.ToList();
var firstCar = query.FirstOrDefault();
var founderHasCars = query.Any();
Now EF is forced to go to the database, because it cannot answer your question without actually fetching the data. At this point, and not before, EF actually hits the database.
For reference, this trigger to fetch the data is often referred to as "enumerating the collection", i.e. turning a query into an actual result set.
By deferring the execution of that query for as long as possible, EF is able to wait and see if you're going to filter/order/paginate/transform/... the result set, which could mean EF needs to return less data than if it executed every command immediately.
This also means that when you call Include, you're not actually hitting the database yet, so as long as you haven't enumerated the collection, you won't load data for items that your Where clause will later filter out.
Take these two examples:
var list1 = db.People
    .Include(p => p.Cars)
    .ToList() // <= enumeration
    .Where(p => p.Name == "Bob");

var list2 = db.People
    .Include(p => p.Cars)
    .Where(p => p.Name == "Bob")
    .ToList(); // <= enumeration
These lists will eventually yield the same result. However, the first list will fetch data before you filter it because you called ToList before Where. This means you're going to be loading all people and their cars in memory, only to then filter that list in memory.
The second list, however, will only enumerate the collection when it already knows about the Where clause, and therefore EF will only load people named Bob and their cars into memory. The filtering will happen on the database before it gets sent back to your runtime.
You did not show enough code for me to verify whether you are prematurely enumerating the collection. I hope this answer helps you in determining whether this is the cause of your performance issues.
database.Include(...).Where(...) and I'm wondering if that is different than database.Where(...).Include(...)?
Assuming this code is verbatim (except for the missing DbSet) and nothing happens in between the Include and the Where, the order does not change the execution, and therefore it is not the source of your performance issue.
I generally advise you to put your Include statements before anything else (i.e. right after db.MyTable), as a matter of readability. The other operations depend on the specific query you're trying to construct.
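If you want to convince yourself that the ordering makes no difference for your particular query, one low-effort check (assuming EF6, where calling ToString() on the IQueryable returns the SQL it would execute; People/Cars are placeholder sets) is:

// Requires: using System.Data.Entity;  (for the lambda overload of Include)
var includeFirst = db.People
    .Include(p => p.Cars)
    .Where(p => p.Name == "Bob");

var whereFirst = db.People
    .Where(p => p.Name == "Bob")
    .Include(p => p.Cars);

// In EF6 the query's ToString() returns the SQL that would be sent,
// so you can compare the two shapes without hitting the database.
Console.WriteLine(includeFirst.ToString());
Console.WriteLine(whereFirst.ToString());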
Most of the time, the order of clauses will not make any difference.
An Include statement tells SQL to join one table with another,
while Where results in, yes, a SQL WHERE clause.
When you write something like database.Include(...).Where(...) you are building an IQueryable object that is translated to actual SQL only once you try to access it, e.g. with .ToList() or .FirstOrDefault(), and those queries are already optimized.
So if you still have performance issues, you should use a profiler to look for bottlenecks, and maybe consider using stored procedures (those can be integrated with EF).
I'm doing a lot of work with Entity Framework, like millions of inserts and updates.
However, over time it gets slower and slower...
I tried a few things to improve performance, like:
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
I also tried:
db.Table.AsNoTracking();
When I change all of these things it really does get faster. However, memory usage starts to increase until it eventually throws an exception.
Has anyone had this situation?
Thanks
The DbContext stores all the entities you have fetched or added to a DbSet. As others have suggested, you need to dispose of the context after each group of operations (a set of closely-related operations - e.g. a web request) and create a new one.
In the case of inserting millions of entities, that might mean creating a new context every 1,000 entities for example. This answer gives you all you need to know about inserting thousands of entities.
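A rough sketch of that pattern (the batch size, MyContext, MyEntities and the in-memory itemsToInsert list are illustrative, not taken from the question):

const int batchSize = 1000;
MyContext context = null;
try
{
    context = new MyContext();
    context.Configuration.AutoDetectChangesEnabled = false;

    for (int i = 0; i < itemsToInsert.Count; i++)
    {
        context.MyEntities.Add(itemsToInsert[i]);

        // Flush and replace the context periodically so the change tracker
        // never holds more than one batch of entities in memory.
        if ((i + 1) % batchSize == 0)
        {
            context.SaveChanges();
            context.Dispose();
            context = new MyContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    context.SaveChanges(); // persist the final partial batch
}
finally
{
    if (context != null) context.Dispose();
}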
If you are doing only inserts and updates, try dropping down to raw SQL with db.Database.ExecuteSqlCommand(sql, parameters) instead of going through the change tracker.
Entity Framework keeps every attached object in memory, so having millions of them can exhaust memory.
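For example, a minimal sketch of pushing the inserts through as raw SQL so nothing is ever attached to the change tracker (the table, columns and itemsToInsert list are made up for illustration):

using (var db = new MyContext())
{
    foreach (var item in itemsToInsert)
    {
        // ExecuteSqlCommand runs the statement directly; EF neither
        // materializes nor tracks an entity for the affected row.
        db.Database.ExecuteSqlCommand(
            "INSERT INTO Items (Name, Quantity) VALUES (@p0, @p1)",
            item.Name, item.Quantity);
    }
}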
https://github.com/loresoft/EntityFramework.Extended offers a clean interface for doing faster bulk updates, and deletes. I think it only works with SQL Server, but it may give you a quick solution to your performance issue.
Deletes can be done like this:
context.Users.Where(u => u.FirstName == "Firstname").Delete();
Updates can be done in a similar fashion:
context.Tasks.Where(t => t.StatusId == 1).Update(t => new Task { StatusId = 2 });
For millions of inserts and updates, everything gave me out-of-memory errors; I've tried it all.
It only worked for me when I stopped using the context and used ADO.NET or another micro-ORM like Dapper.
We currently have a production application that runs as a Windows service. Many times this application will end up in a loop that can take several hours to complete. We are using Entity Framework for .NET 4.0 for our data access.
I'm looking for confirmation that if we load new data into the system, after this loop is initialized, it will not result in items being added to the loop itself. When the loop is initialized we are looking for data "as of" that moment. Although I'm relatively certain that this will work exactly like using ADO and doing a loop on the data (the loop only cycles through data that was present at the time of initialization), I am looking for confirmation for co-workers.
Thanks in advance for your help.
Update: here's some sample code in C#. The question is the same: will the enumeration change if new items are added to the table that EF is querying?
IEnumerable<myobject> myobjects = (from o in db.theobjects where o.id == myID select o);
foreach (myobject obj in myobjects)
{
    // perform action on obj here
}
It depends on your precise implementation.
Once a query has been executed against the database, the results of that query will not change (assuming you aren't using lazy loading). To ensure this, you can dispose of the context after retrieving the query results; this effectively "cuts the cord" between the retrieved data and the database.
Lazy loading can result in a mix of "initial" and "new" data; however once the data has been retrieved it will become a fixed snapshot and not susceptible to updates.
You mention this is a long-running process, which implies that there may be a very large amount of data involved. If you aren't able to fully retrieve all the data to be processed (due to memory limitations or other bottlenecks), then you likely can't ensure that you are working against the original data. The results are not fixed until a query is executed, and any updates prior to query execution will appear in the results.
I think your best bet is to change the logic of your application so that when the "loop" logic is determining whether it should do another iteration or exit, you take the opportunity to load any newly added items into the list. See the pseudocode below:
var repo = new Repository();
while (repo.HasMoreItemsToProcess())
{
    var entity = repo.GetNextItem();
    // process 'entity' here
}
Let me know if this makes sense.
The easiest way to ensure this (if the data itself isn't too big) is to convert the data you retrieve from the database to a List<>, e.g. something like this (pulled at random from my current project):
var sessionIds = room.Sessions.Select(s => s.SessionId).ToList();
And then iterate through the list, not through the IEnumerable<> that would otherwise be returned. Converting it to a list triggers the enumeration, and then throws all the results into memory.
If there's too much data to fit into memory, and you need to stick with an IEnumerable<>, then the answer to your question depends on various database and connection settings.
I'd take a snapshot of the IDs to be processed (quickly, and as a transaction), then work that list in the fashion you're doing today.
In addition to accomplishing the goal of not changing the sample mid-stream, this also gives you the ability to extend your solution to track the status of each item as it's processed. For a long-running process, this can be very helpful for progress reporting, restart/retry capabilities, etc.
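A rough sketch of that approach (MyContext, WorkItems and the status filter are invented for illustration):

List<int> idsToProcess;

// Take the snapshot of IDs quickly, in its own short-lived context.
using (var db = new MyContext())
{
    idsToProcess = db.WorkItems
        .Where(w => w.Status == "Pending")
        .Select(w => w.Id)
        .ToList(); // materialized: rows added later won't join this run
}

// Work the fixed list; each item is loaded fresh when its turn comes.
foreach (var id in idsToProcess)
{
    using (var db = new MyContext())
    {
        var item = db.WorkItems.Find(id);
        if (item == null) continue; // row was deleted after the snapshot

        // ...process the item and record its status/progress here...
        db.SaveChanges();
    }
}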
Basically, I insert 35000 objects within one transaction:
using (var uow = new MyContext())
{
    for (int i = 1; i < 35000; i++)
    {
        var o = new MyObject()...;
        uow.MySet.Add(o);
    }
    uow.SaveChanges();
}
This takes forever!
If I use the underlying ObjectContext (via IObjectContextAdapter), it's still slow but takes around 20 seconds. It looks like DbSet<> is doing some linear searches, which takes a quadratic amount of time...
Anyone else seeing this problem?
As already indicated by Ladislav in the comment, you need to disable automatic change detection to improve performance:
context.Configuration.AutoDetectChangesEnabled = false;
This change detection is enabled by default in the DbContext API.
The reason why DbContext behaves so differently from the ObjectContext API is that, when automatic change detection is enabled, many more functions of the DbContext API call DetectChanges internally than functions of the ObjectContext API do.
Here you can find a list of those functions which call DetectChanges by default. They are:
The Add, Attach, Find, Local, or Remove members on DbSet
The GetValidationErrors, Entry, or SaveChanges members on DbContext
The Entries method on DbChangeTracker
Especially Add calls DetectChanges which is responsible for the poor performance you experienced.
In contrast to this, the ObjectContext API calls DetectChanges automatically only in SaveChanges, but not in AddObject and the other corresponding methods mentioned above. That's the reason why the default performance of ObjectContext is faster.
Why did they introduce this default automatic change detection in DbContext in so many functions? I am not sure, but it seems that disabling it and calling DetectChanges manually at the proper points is considered advanced and can easily introduce subtle bugs into your application, so use it with care.
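To make that concrete, here is a sketch of the question's insert loop with automatic detection switched off (MyContext/MyObject mirror the placeholder names used above):

using (var uow = new MyContext())
{
    // Skip the full change-tracker scan that Add would otherwise trigger.
    uow.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        for (int i = 1; i < 35000; i++)
        {
            uow.MySet.Add(new MyObject { /* initialize properties here */ });
        }

        // SaveChanges still runs DetectChanges once, so nothing is missed.
        uow.SaveChanges();
    }
    finally
    {
        uow.Configuration.AutoDetectChangesEnabled = true;
    }
}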
A little empirical test with EF 4.3 Code First:
Removed 1000 objects with AutoDetectChanges = true: 23 sec
Removed 1000 objects with AutoDetectChanges = false: 11 sec
Inserted 1000 objects with AutoDetectChanges = true: 21 sec
Inserted 1000 objects with AutoDetectChanges = false: 13 sec
In EF Core 2.0 this setting was moved to:
context.ChangeTracker.AutoDetectChangesEnabled = false;
Besides the answers you have found here, it is important to know that at the database level an insert is more work than an update. The database has to extend or allocate new space. Then it has to update at least the primary key index. Indexes may also be updated on an update, but that is a lot less common. If there are any foreign keys, it has to read those indexes as well to make sure referential integrity is maintained. Triggers can also play a role, although those can affect updates the same way.
All that database work makes sense for the daily insert activity originated by user entries. But if you are just uploading an existing database, or have a process that generates a lot of inserts, you may want to look at ways of speeding that up by postponing some of that work to the end. Disabling indexes while inserting is a common approach, as sketched below. There are very complex optimizations that can be done depending on the case, and they can be a bit overwhelming.
Just know that, in general, inserts will take longer than updates.
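If you do experiment with disabling non-clustered indexes around a large load, on SQL Server it could look roughly like this (the index and table names are made up; note that a disabled index must be rebuilt before queries can use it again):

using (var db = new MyContext())
{
    // Disable a non-clustered index before the bulk insert...
    db.Database.ExecuteSqlCommand(
        "ALTER INDEX IX_Orders_CustomerId ON dbo.Orders DISABLE");

    // ...perform the bulk insert here (SqlBulkCopy, batched SaveChanges, etc.)...

    // ...then rebuild the index so it is usable again.
    db.Database.ExecuteSqlCommand(
        "ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD");
}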
I'm using LINQ to SQL to query a database. I only use LINQ to read data from the DB; I make changes to it by other means. (This cannot be changed; it is a restriction from the app that we are extending, and all updates must go through its SDK.)
This is fine, but I'm hitting some cache problems. Basically, I query a row using LINQ, then I delete it through external means, and then I create a new row externally. If I query that row again using LINQ I get the old (cached) data.
I cannot turn off object tracking because that seems to prevent the data context from auto-loading associated properties (foreign keys).
Is there any way to clear the DataContext cache?
I found a method surfing the net but it doesn't seem safe: http://blog.robustsoftware.co.uk/2008/11/clearing-cache-of-linq-to-sql.html
What do you think? What are my options?
If you want to refresh a specific object, then the Refresh() method may be your best bet.
Like this:
Context.Refresh(RefreshMode.OverwriteCurrentValues, objectToRefresh);
You can also pass an array of objects or an IEnumerable as the 2nd argument if you need to refresh more than one object at a time.
Update
I see what you're talking about in the comments; in Reflector you can see this happening inside .Refresh():
object objectByKey = context.Services.GetObjectByKey(trackedObject.Type, keyValues);
if (objectByKey == null)
{
    throw Error.RefreshOfDeletedObject();
}
The method you linked seems to be your best option; the DataContext class doesn't provide any other way to clear a deleted row. The disposal checks and such are inside the ClearCache() method. It's really just checking for disposal and calling ResetServices() on the underlying CommonDataServices. The only ill effect would be clearing any pending inserts, updates or deletes that you have queued.
There is one more option: can you fire up another DataContext for whatever operation you're doing? It wouldn't have any cache, but that does involve some computational cost, so if the pending inserts, updates and deletes aren't an issue, I'd stick with the ClearCache() approach.
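The fresh-context option is as simple as it sounds; a sketch (MyDataContext, MyRows, MyRow and rowId are placeholders for your generated LINQ to SQL context, table, row type and key):

// Read the row with a throwaway context so no stale cached copy is involved.
MyRow current;
using (var freshContext = new MyDataContext(connectionString))
{
    current = freshContext.MyRows.SingleOrDefault(r => r.Id == rowId);
}
// 'current' reflects what is actually in the database; the trade-off is the
// cost of the extra context, which also does not see any pending, unsaved
// changes tracked by your main DataContext.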
I made this code to really CLEAR the "cached" entities, detaching them.
var entidades = Ctx.ObjectStateManager.GetObjectStateEntries(
    EntityState.Added | EntityState.Deleted | EntityState.Modified | EntityState.Unchanged);
foreach (var objectStateEntry in entidades)
    Ctx.Detach(objectStateEntry.Entity);
Where Ctx is my context.
You should be able to just re-query the result sets that are using these objects. This would not pull a cached set, but would actually return the final results. I know that this may not be as easy or feasible depending on how you set up your app...
HTH.