Big Batch of Entity Framework Updates Much Slower Than Batching Myself

Big Batch of Entity Framework Updates Much Slower Than Batching Myself - c#

Updating a bunch of records is much slower using what I think are standard entity framework techniques than batching the same queries it would generate myself. For 250 records I see entity framework about 10 times as slow. For 1000 records it goes up to about 20 times slower.
When I log the database activity for entity framework, I see it is generating the same basic queries I would generate myself, but it seems to be running them one at a time instead of all at once, even though I only call SaveChanges once. Is there any way to ask it to run the queries all at once?
I can't do a simple mass SQL update because in my real use case each row needs to be processed separately to determine what to set the fields to.
Sample timing code is below:
var stopwatchEntity = new System.Diagnostics.Stopwatch();
var stopwatchUpdate = new System.Diagnostics.Stopwatch();
using (var dbo = new ProjDb.dbo("Server=server;Database=database;Trusted_Connection=True;"))
{
var resourceIds = dbo.Resources.Select(r => r.ResourceId).Take(250).ToList();
//dbo.Database.Log += (s) => System.Diagnostics.Debug.WriteLine(s);
stopwatchEntity.Start();
foreach (var resourceId in resourceIds)
{
var resource = new ProjDb.Models.dbo.Resource { ResourceId = resourceId };
dbo.Resources.Attach(resource);
resource.IsBlank = false;
}
dbo.SaveChanges();
stopwatchEntity.Stop();
stopwatchUpdate.Start();
var updateStr = "";
foreach (var resourceId in resourceIds)
updateStr += "UPDATE Resources SET IsBlank = 0 WHERE ResourceId = " + resourceId + ";";
dbo.Database.ExecuteSqlCommand(updateStr);
stopwatchUpdate.Stop();
MessageBox.Show(stopwatchEntity.Elapsed.TotalSeconds.ToString("f") + ", " + stopwatchUpdate.Elapsed.TotalSeconds.ToString("f"));
}

As #EricEJ and #Kirchner reported, EF6 doesn't support batch update. However, some third-party libraries do.
Disclaimer: I'm the owner of the project Entity Framework Plus
EF+ Batch Update allows updating multiples rows with the same value/formula.
For example:
context.Resources
.Where(x => resourceIds.Contains(x => x.ResourceId)
.Update(x => new Resource() { IsBlank = false });
Since entities are not loaded in the context, you should get the best performance available.
Read more: http://entityframework-plus.net/batch-update
Disclaimer: I'm the owner of the project Entity Framework Extensions
If the value must differ from a row to another, this library allows BulkUpdate features. This library is a paid library but pretty much supports everything you need for performance:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
For example:
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
context.BulkMerge(customers);

Entity Framework 6 does not support batching, EF Core does

Related

EF 6 performance while updating multiple records with different values in same table

I have a table , whose values are updated conditional basis, and when I am calling
db.SaveChanges()
there is a huge performance drop.
I am also setting properties
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
still results are not as expected.
Edit 1:
using(var db= new MyEntities())
{
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
foreach(var acc in myacclist)
{
//will update my account objects here
}
db.SaveChanges();
}

Unfortunately, there is no way you will be able to have good performance with Entity Framework and SaveChanges.
SaveChange makes a database round-trip for every record update. So if you currently have 10,000 accounts, 10k database round-trip is performed.
Setting AutoDetectChangesEnabled and ValidateOnSaveEnabled is usually a very bad idea and will not really improve the performance since it the number of database round-trip the real issue.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows to dramatically improve performance by performing:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
Example:
using(var db= new MyEntities())
{
foreach(var acc in myacclist)
{
//will update my account objects here
}
db.BulkSaveChanges();
}

Building a Update query in string builder and saving it for every 1k records improved my performance
using (var db = new MyEntities())
{
StringBuilder finalquery = new StringBuilder();
int i = 0;
foreach (var acc in myacclist)
{
i++;
//will update my account objects here
finalquery.Append(stmnt);
if (1 % 1000 = 0) { db.Database.ExecuteSqlCommand(finalquery.ToString()); }
}
db.SaveChanges();
}

Inserting many rows with Entity Framework is extremely slow

I'm using Entity Framework to build a database. There's two models; Workers and Skills. Each Worker has zero or more Skills. I initially read this data into memory from a CSV file somewhere, and store it in a dictionary called allWorkers. Next, I write the data to the database as such:
// Populate database
using (var db = new SolverDbContext())
{
// Add all distinct skills to database
db.Skills.AddRange(allSkills
.Distinct(StringComparer.InvariantCultureIgnoreCase)
.Select(s => new Skill
{
Reference = s
}));
db.SaveChanges(); // Very quick
var dbSkills = db.Skills.ToDictionary(k => k.Reference, v => v);
// Add all workers to database
var workforce = allWorkers.Values
.Select(i => new Worker
{
Reference = i.EMPLOYEE_REF,
Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
DefaultRegion = "wa",
DefaultEfficiency = i.TECH_EFFICIENCY
});
db.Workers.AddRange(workforce);
db.SaveChanges(); // This call takes 00:05:00.0482197
}
The last db.SaveChanges(); takes over five minutes to execute, which I feel is far too long. I ran SQL Server Profiler as the call is executing, and basically what I found was thousands of calls to:
INSERT [dbo].[SkillWorkers]([Skill_SkillId], [Worker_WorkerId])
VALUES (#0, #1)
There are 16,027 rows being added to SkillWorkers, which is a fair amount of data but not huge by any means. Is there any way to optimize this code so it doesn't take 5min to run?
Update: I've looked at other possible duplicates, such as this one, but I don't think they apply. First, I'm not bulk adding anything in a loop. I'm doing a single call to db.SaveChanges(); after every row has been added to db.Workers. This should be the fastest way to bulk insert. Second, I've set db.Configuration.AutoDetectChangesEnabled to false. The SaveChanges() call now takes 00:05:11.2273888 (In other words, about the same). I don't think this really matters since every row is new, thus there are no changes to detect.
I think what I'm looking for is a way to issue a single UPDATE statement containing all 16,000 skills.

One easy method is by using the EntityFramework.BulkInsert extension.
You can then do:
// Add all workers to database
var workforce = allWorkers.Values
.Select(i => new Worker
{
Reference = i.EMPLOYEE_REF,
Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
DefaultRegion = "wa",
DefaultEfficiency = i.TECH_EFFICIENCY
});
db.BulkInsert(workforce);

What does Future() means in NHibernate?

I'm new to NHibernate
the description for IEnumerable Future(); says the following
// Summary:
// Get a enumerable that when enumerated will execute a batch of queries in
// a single database roundtrip
Just wondering what does it means, the description has nothing to do with the word 'future'

Future allows to execute two or more sql in a single roundtrip, as long as the database supports it.
It's also almost transparent, so you'll want to use Futures whenever possible. If NHibernate can't execute the queries in a single roundtrip, it will execute the queries in two or more, as expected.
From http://ayende.com/blog/3979/nhibernate-futures
Let us take a look at the following piece of code:
using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
var blogs = s.CreateCriteria<Blog>()
.SetMaxResults(30)
.List<Blog>();
var countOfBlogs = s.CreateCriteria<Blog>()
.SetProjection(Projections.Count(Projections.Id()))
.UniqueResult<int>();
Console.WriteLine("Number of blogs: {0}", countOfBlogs);
foreach (var blog in blogs)
{
Console.WriteLine(blog.Title);
}
tx.Commit();
}
This code would generate two queries to the database
Two queries to the database is a expensive, we can see that it took us
114ms to get the data from the database. We can do better than that, let us
tell NHibernate that it is free to do the optimization in any way that it likes
using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
var blogs = s.CreateCriteria<Blog>()
.SetMaxResults(30)
.Future<Blog>();
var countOfBlogs = s.CreateCriteria<Blog>()
.SetProjection(Projections.Count(Projections.Id()))
.FutureValue<int>();
Console.WriteLine("Number of blogs: {0}", countOfBlogs.Value);
foreach (var blog in blogs)
{
Console.WriteLine(blog.Title);
}
tx.Commit();
}
Instead of going to the database twice, we only go once, with both
queries at once. The speed difference is quite dramatic, 80 ms instead
of 114 ms, so we saved about 30% of the total data access time and a
total of 34 ms.

Optimize query on unknown database

A third party application creates one database per project. All the databases have the same tables and structure. New projects may be added at anytime so I can't use any EF schema.
What I do now is:
private IEnumerable<Respondent> getListRespondentWithStatuts(string db)
{
return query("select * from " + db + ".dbo.respondent");
}
private List<Respondent> query(string sqlQuery)
{
using (var sqlConx = new SqlConnection(Settings.Default.ConnectionString))
{
sqlConx.Open();
var cmd = new SqlCommand(sqlQuery, sqlConx);
return transformReaderIntoRespondentList(cmd.ExecuteReader());
}
}
private List<Respondent> transformReaderIntoRespondentList(SqlDataReader sqlDataReader)
{
var listeDesRépondants = new List<Respondent>();
while (sqlDataReader.Read())
{
var respondent = new Respondent
{
CodeRépondant = (string)sqlDataReader["ResRespondent"],
IsActive = (bool?)sqlDataReader["ResActive"],
CodeRésultat = (string)sqlDataReader["ResCodeResult"],
Téléphone = (string)sqlDataReader["Resphone"],
IsUnContactFinal = (bool?)sqlDataReader["ResCompleted"]
};
listeDesRépondants.Add(respondent);
}
return listeDesRépondants;
}
This works fine, but it is deadly slow (20 000 records per minutes). Do you have any hints on what strategy should be faster? For info, what is really slow is transformReaderIntoRespondentList method
Thanks!!

Generally speaking anything SELECT * FROM is bad practice, but it could also be resulting in you having to pull back more data than is actually required. The transform is operating on only a few columns are more columns than required being returned? Consider replacing with:
private IEnumerable<Respondent> getListRespondentWithStatuts(string db)
{
return query("select ResRespondent, ResActive, ResCodeResult, Resphone, ResCompleted from " + db + ".dbo.respondent");
}
Also, gaurd against SQL-Injection attacks; concating strings for SQL queries is very dangerous.
When pulling data from a DataReader, I find that using the non-named lookups work best:
var respondent = new Respondent
{
CodeRépondant = sqlDataReader.GetString(0),
IsActive = sqlDataReader.IsDBNull(1) ? (Boolean?)null : sqlDataReader.GetBoolean(1),
CodeRésultat = sqlDataReader.GetString(2),
Téléphone = sqlDataReader.GetString(3),
IsUnContactFinal = sqlDataReader.IsDBNull(4) ? (Boolean?)null : sqlDataReader.GetBoolean(4)
};
I have not explcicitly tested the performance difference in a long while; but that used to make a notable difference. The ordinal checks did not have to do a named lookup and also avoided boxing/unboxing values.
Other than that, without more info it is hard to say... do you need all 20,000 records?
UPDATE
Ran a simple local test case with 300,000 records and reduced the time to load all data by almost 50%. I imagine these results will vary depending on the type of data being retrieved; but it still does make a difference on overall execution time. That being said, in my environment we are talking a drop from 650ms to just over 300ms.
NOTE
If respondent is a view, what is likely "really slow" is the database building up the result set; although the data reader will start processing information as soon as records are available, the ultimate bottleneck will be the database itself and/or network latency. Other than the above optimizations, there is not going to be much that you can do with your code unless you can index the view/table to optimize the query and or reduce the information required.

Update a record without first querying?

Lets say I query the database and load a list of items. Then I open one of the items in a detail view form, and instead of re-querying the item out of the database, I create an instance of the item from the datasource in the list.
Is there a way I can update the database record without fetching the record of the individual item?
Here is a sample how I am doing it now:
dataItem itemToUpdate = (from t in dataEntity.items
where t.id == id
select t).FirstOrDefault();
Then after pulling the record I update some values in the item and push the record back:
itemToUpdate.itemstatus = newStatus;
dataEntity.SaveChanges();
I would think there would be a better way to do this, any ideas?

You should use the Attach() method.
Attaching and Detaching Objects

You can also use direct SQL against the database using the context of the datastore. Example:
dataEntity.ExecuteStoreCommand
("UPDATE items SET itemstatus = 'some status' WHERE id = 123 ");
For performance reasons, you may want to pass in variables instead of a single hard coded SQL string. This will allow SQL Server to cache the query and reuse with parameters. Example:
dataEntity.ExecuteStoreCommand
("UPDATE items SET itemstatus = 'some status' WHERE id = {0}", new object[] { 123 });
UPDATE - for EF 6.0
dataEntity.Database.ExecuteSqlCommand
("UPDATE items SET itemstatus = 'some status' WHERE id = {0}", new object[] { 123 });

The code:
ExampleEntity exampleEntity = dbcontext.ExampleEntities.Attach(new ExampleEntity { Id = 1 });
exampleEntity.ExampleProperty = "abc";
dbcontext.Entry<ExampleEntity>(exampleEntity).Property(ee => ee.ExampleProperty).IsModified = true;
dbcontext.Configuration.ValidateOnSaveEnabled = false;
dbcontext.SaveChanges();
The result TSQL:
exec sp_executesql N'UPDATE [dbo].[ExampleEntities]
SET [ExampleProperty ] = #0
WHERE ([Id] = #1)
',N'#0 nvarchar(32),#1 bigint',#0='abc',#1=1
Note:
The "IsModified = true" line, is needed because when you create the new ExampleEntity object (only with the Id property populated) all the other properties has their default values (0, null, etc). If you want to update the DB with a "default value", the change will not be detected by entity framework, and then DB will not be updated.
In example:
exampleEntity.ExampleProperty = null;
will not work without the line "IsModified = true", because the property ExampleProperty, is already null when you created the empty ExampleEntity object, you needs to say to EF that this column must be updated, and this is the purpose of this line.

If the DataItem has fields EF will pre-validate (like non-nullable fields), we'll have to disable that validation for this context:
DataItem itemToUpdate = new DataItem { Id = id, Itemstatus = newStatus };
dataEntity.Entry(itemToUpdate).Property(x => x.Itemstatus).IsModified = true;
dataEntity.Configuration.ValidateOnSaveEnabled = false;
dataEntity.SaveChanges();
//dataEntity.Configuration.ValidateOnSaveEnabled = true;
Otherwise we can try satisfy the pre-validation and still only update the single column:
DataItem itemToUpdate = new DataItem
{
Id = id,
Itemstatus = newStatus,
NonNullableColumn = "this value is disregarded - the db original will remain"
};
dataEntity.Entry(itemToUpdate).Property(x => x.Itemstatus).IsModified = true;
dataEntity.SaveChanges();
Assuming dataEntity is a System.Data.Entity.DbContext
You can verify the query generated by adding this to the DbContext:
/*dataEntity.*/Database.Log = m => System.Diagnostics.Debug.Write(m);

Now native support for this in EF Core 7 — ExecuteUpdate:
Finally! After a long wait, EF Core 7.0 now has a natively supported way to run UPDATE (and also DELETE) statements while also allowing you to use arbitrary LINQ queries (.Where(u => ...)), without having to first retrieve the relevant entities from the database: The new built-in method called ExecuteUpdate — see "What's new in EF Core 7.0?".
ExecuteUpdate is precisely meant for these kinds of scenarios, it can operate on any IQueryable instance, and lets you update specific columns on any number of rows, while always issuing a single UPDATE statement behind the scenes, making it as efficient as possible.
Usage:
Imagine you wanted to update the Email column of a specific user:
dbContext.Users
.Where(u => u.Id == someId)
.ExecuteUpdate(b =>
b.SetProperty(u => u.Email, "NewEmail#gmail.com")
);
As you can see, calling ExecuteUpdate requires you to make calls to the SetProperty method, to specify which property to update, and also what new value to assign to it.
EF Core will translate this into the following UPDATE statement:
UPDATE [u]
SET [u].[Email] = "NewEmail#gmail.com"
FROM [Users] AS [u]
WHERE [u].[Id] = someId
Also, ExecuteDelete for deleting rows:
There's also a counterpart to ExecuteUpdate called ExecuteDelete, which, as the name implies, can be used to delete a single or multiple rows at once without having to first fetch them.
Usage:
// Delete all users that haven't been active in 2022:
dbContext.Users
.Where(u => u.LastActiveAt.Year < 2022)
.ExecuteDelete();
Similar to ExecuteUpdate, ExecuteDelete will generate DELETE SQL statements behind the scenes — in this case, the following one:
DELETE FROM [u]
FROM [Users] AS [u]
WHERE DATEPART(year, [u].[LastActiveAt]) < 2022
Other notes:
Keep in mind that both ExecuteUpdate and ExecuteDelete are "terminating", meaning that the update/delete operation will take place as soon as you call the method. You're not supposed to call dbContext.SaveChanges() afterwards.
If you're curious about the SetProperty method, and you're confused as to why ExectueUpdate doesn't instead receive a member initialization expression (e.g. .ExecuteUpdate(new User { Email = "..." }), then refer to this comment (and the surrounding ones) on the GitHub issue for this feature.
Furthermore, if you're curious about the rationale behind the naming, and why the prefix Execute was picked (there were also other candidates), refer to this comment, and the preceding (rather long) conversation.
Both methods also have async equivalents, named ExecuteUpdateAsync, and ExecuteDeleteAsync respectively.

I recommend using Entity Framework Plus
Updating using Entity Framework Core can be very slow if you need to update hundreds or thousands of entities with the same expression. Entities are first loaded in the context before being updated which is very bad for the performance and then, they are updated one by one which makes the update operation even worse.
EF+ Batch Update updates multiple rows using an expression in a single database roundtrip and without loading entities in the context.
// using Z.EntityFramework.Plus; // Don't forget to include this.
// UPDATE all users inactive for 2 years
var date = DateTime.Now.AddYears(-2);
ctx.Users.Where(x => x.LastLoginDate < date)
.Update(x => new User() { IsSoftDeleted = 1 });

Simple and elegant extension method:
I've written an extension method for DbContext that does exactly what the OP asked for.
In addition to that, it only requires you to provide a member initialization expression (e.g. new User { ... }), and it then figures out on its own what properties you've changed, so you won't have to specify them by hand:
public static void UpdateEntity<TEntity>(
this DbContext context,
int id,
Expression<Func<TEntity>> updateExpression
) where TEntity : BaseEntity, new()
{
if (updateExpression.Body is not MemberInitExpression memberInitExpr)
throw new ArgumentException("The update expression should be a member initialization.");
TEntity entityToUpdate = updateExpression.Compile().Invoke();
entityToUpdate.Id = id;
context.Attach(entityToUpdate);
var updatedPropNames = memberInitExpr.Bindings.Select(b => b.Member.Name);
foreach (string propName in updatedPropNames)
context.Entry(entityToUpdate).Property(propName).IsModified = true;
}
You also need a BaseEntity class or interface that has your primary key in it, like:
public abstract class BaseEntity
{
public int Id { get; set; }
}
Usage:
Here's how you'd use the method:
dbContext.UpdateEntity(1234 /* <- this is the ID */, () => new User
{
Name = "New Name",
Email = "TheNewEmail#gmail.con",
});
dbContext.SaveChanges();
Nice and simple! :D
And here's the resulting SQL that gets generated by Entity Framework:
UPDATE [Users]
SET [Name] = #p0, [Email] = #p1
WHERE [Id] = #p2;
Limitation:
This method only allows you to update a single row using its primary key.
So, it doesn't work with .Where(...), IQueryable<...>, and so on. If you don't have the PK, or you want to bulk-update, then this wouldn't be your best option. In general, if you have more complex update operations, then I'd recommend you use Entity Framework Plus, or similar libraries.

It works somewhat different in EF Core:
There may be a faster way to do this in EF Core, but the following ensures an UPDATE without having to do a SELECT (tested with EF Core 2 and JET on the .NET Framework 4.6.2):
Ensure your model does not have IsRequired properties
Then use the following template (in VB.NET):
Using dbContext = new MyContext()
Dim bewegung = dbContext.MyTable.Attach(New MyTable())
bewegung.Entity.myKey = someKey
bewegung.Entity.myOtherField = "1"
dbContext.Entry(bewegung.Entity).State = EntityState.Modified
dbContext.Update(bewegung.Entity)
Dim BewegungenDescription = (From tp In dbContext.Model.GetEntityTypes() Where tp.ClrType.Name = "MyTable" Select tp).First()
For Each p In (From prop In BewegungenDescription.GetProperties() Select prop)
Dim pp = dbContext.Entry(bewegung.Entity).Property(p.Name)
pp.IsModified = False
Next
dbContext.Entry(bewegung.Entity).Property(Function(row) row.myOtherField).IsModified = True
dbContext.SaveChanges()
End Using

ef core 7 :
public async Task<int> Update(UpdateLevelVm vm)
{
return await _db.Levels.Where(l => l.Id == vm.LevelId)
.ExecuteUpdateAsync(u => u
.SetProperty(l => l.GradeId, vm.GradeId)
.SetProperty(l => l.Title, vm.Title)
);
}

this has worked for me in EF core 3.1
await _unitOfWork.Context.Database.ExecuteSqlRawAsync("UPDATE Student SET Age = 22 Where StudentId = 123");

Generally speaking, if you used Entity Framework to query all the items, and you saved the entity object, you can update the individual items in the entity object and call SaveChanges() when you are finished. For example:
var items = dataEntity.Include("items").items;
// For each one you want to change:
items.First(item => item.id == theIdYouWant).itemstatus = newStatus;
// After all changes:
dataEntity.SaveChanges();
The retrieval of the one item you want should not generate a new query.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Big Batch of Entity Framework Updates Much Slower Than Batching Myself - c#

Entity Framework 6 does not support batching, EF Core does

Related

EF 6 performance while updating multiple records with different values in same table

Inserting many rows with Entity Framework is extremely slow

What does Future() means in NHibernate?

Optimize query on unknown database

Update a record without first querying?

Categories

Resources