Bulk deleting rows with RemoveRange()

Bulk deleting rows with RemoveRange() - c#

I am trying to delete multiple rows from a table.
In regular SQL Server, this would be simple as this:
DELETE FROM Table
WHERE
Table.Column = 'SomeRandomValue'
AND Table.Column2 = 'AnotherRandomValue'
In Entity Framework 6, they have introduced RemoveRange() method.
However, when I use it, rather than deleting rows using the where clauses that I provided, Entity Framework queries the database to get all rows that match the where clauses and delete them one by one using their primary keys.
Is this the current limitation of EntityFramework?
Or am I using RemoveRange() wrong?
Following is how I am using RemoveRange():
db.Tables.RemoveRange(
db.Tables
.Where(_ => _.Column == 'SomeRandomValue'
&& _.Column2 == 'AnotherRandomValue')
);

I think we reached here a limitation of EF.
Sometimes you just have to use ExecuteSqlCommand to stay performant.

What you are looking for is a Batch Delete Library which deletes multiple records in a database from a LINQ Query without loading entities.
Multiple libraries supporting this feature exist.
You can find the list here: Entity Framework Batch Delete Library
Disclaimer: I'm the owner of the project Entity Framework Plus
// using Z.EntityFramework.Plus; // Don't forget to include this.
// DELETE directly in SQL (without loading entities)
db.Tables.Where(_ => _.Column == 'SomeRandomValue'
&& _.Column2 == 'AnotherRandomValue')
.Delete();
// DELETE using a BatchSize
db.Tables.Where(_ => _.Column == 'SomeRandomValue'
&& _.Column2 == 'AnotherRandomValue')
.Delete(x => x.BatchSize = 1000);
Wiki: EF+ Batch Delete

It's a bit broken, try
db.Tables.RemoveRange(
db.Tables
.Where(_ => _.Column == 'SomeRandomValue'
&& _.Column2 == 'AnotherRandomeValue').AsEnumerable().ToList()
);
db.SaveChanges();

var db1 = db.Tables
.Where(_ => _.Column == 'SomeRandomValue'
&& _.Column2 == 'AnotherRandomeValue').AsEnumerable().ToList();
db.Tables.RemoveRange(db1);
db.SaveChanges();
Use an Variable to Store Removable list and pass it to RemoveRange().
I Usually do like this, and That's Work.
Hope This also work in your case.

Step back and think. Do you really want to download the records from the database in order to delete them? Just because you can doesn't make it a good idea.
Perhaps you could consider deleting the items from the database with a stored procedure? EF also allows to do that...

I'm dealing with this myself, and agree with Adi - just use sql.
I'm cleaning up old rows in a log table, and EF RemoveRange took 3 minutes to do the same thing this did in 3 seconds:
DELETE FROM LogEntries WHERE DATEDIFF(day, GETDATE(), Date) < -6
Date is the name of the column containing the date. To make it correct, use a parameter, of course, like this:
context.Database.ExecuteSqlCommand
("DELETE FROM LogEntries WHERE DATEDIFF(day, GETDATE(), Date) < #DaysOld", new System.Data.SqlClient.SqlParameter(
"DaysOld", - Settings.DaysToKeepDBLogEntries));
Note that there are a lot of rows involved in my case. When I started the project and didn't have a lot of data, RemoveRange worked fine.

I have met this problem as well, this is my workaround with a library called entityframework.extended by LoreSoft (which you can get it from nuget package manager):
1.You first query them your list.
2.Then use .Delete(), which is a function from .extended library.
var removeList = db.table.Where(_ => _.Column == 'SomeRandomValue'&& _.Column2 == 'AnotherRandomValue')
removeList .Delete();
Note: According to Entity Framework EF extended, as of the time writing, this entityframework.extended is deprecated. Therefore one might need to consider entityframework plus library. While i have not tested on the plus extension, but i can confirmed working it works using .extended library.

Related

Entity Framework equivalent of SQL relative update

If I want to increase the value in a table column with SQL I can do it as follows:
UPDATE mytable m SET m.mycolumn = m.mycolumn + 1;
This is great because it doesn't rely on being executed in any order and uses an absolute minimum of locking.
How can this be done using C# and Entity Framework, with the same (or as close as possible) minimal overhead?

In short, you can't. You can execute a sql statement through the ado.net properties available on the DbContext.Database. You could also created a stored procedure if this is a specific action that occurs where the stored proc does this. That is the closest you will get to this.
This answer assumes you already know how to retrieve all the records, iterate over them to change the value of the property, and then persist the changes using SaveChanges on the DbContext and that this is not what you are looking for as it generates 1 update statement per record.

As already said, it's not possible in Pure EF.
As always you're free to create your pure SQL statements and run them through your context.
The closest thing to your question is to use Entity Framework Extensions. The have an UpdateFromQuery method: https://entityframework-extensions.net/update-from-query
This would look like this:
context.mytable.UpdateFromQuery(x => new mytable() {mycolumn = x.mycolumn + 1});

You can use EF Core extension linq2db.EntityFrameworkCore (disclaimer: I'm one of the creators)
context.mytable
.Set(x => x.mycolumn, x => x.mycolumn + 1)
.Update();
Library has it's own LINQ Translator which eliminates a lot of EF Core limitations.

dbContext.Entity.ForEach(x -> x.property = value);
or
dbContext.Entity.Select(x -> {x.property =100; return c;}).ToList();
See Using LINQ to Update A Property in a List of Entities

Include() vs Select() performance

I have a parent entity with a navigation property to a child entity. The parent entity may not be removed as long as there are associated records in the child entity. The child entity can contain hundreds of thousands of records.
I'm wondering what will be the most efficient to do in Entity Framework to do this:
var parentRecord = _context.Where(x => x.Id == request.Id)
.Include(x => x.ChildTable)
.FirstOrDefault();
// check if parentRecord exists
if (parentRecord.ChildTable.Any()) {
// cannot remove
}
or
var parentRecord = _context.Where(x => x.Id == request.Id)
.Select(x => new {
ParentRecord = x,
HasChildRecords = x.ChildTable.Any()
})
.FirstOrDefault();
// check if parentRecord exists
if (parentRecord.HasChildRecords) {
// cannot remove
}
The first query may include thousands of records while the second query will not, however, the second one is more complex.
Which is the best way to do this?

I would say it depens. It depends on which DBMS you're using. it depends on how good the optimizer works etc.
So one single statement with a JOIN could be far faster than a lot of SELECT statements.
In general I would say when you need the rows from your Child table use .Include(). Otherwise don't include them.
Or in simple words, just read the data you need.

The answer depends on your database design. Which columns are indexed? How much data is in table?
Include() offloads work to your C# layer, but means a more simple query. It's probably the better choice here but you should consider extracting the SQL that is generated by entity framework and running each through an optimisation check.
You can output the sql generated by entity framework to your visual studio console as note here.
This example might create a better sql query that suites your needs.

How do I update multiple Entity models in one SQL statement?

I had the following:
List<Message> unreadMessages = this.context.Messages
.Where( x =>
x.AncestorMessage.MessageID == ancestorMessageID &&
x.Read == false &&
x.SentTo.Id == userID ).ToList();
foreach(var unreadMessage in unreadMessages)
{
unreadMessage.Read = true;
}
this.context.SaveChanges();
But there must be a way of doing this without having to do 2 SQL queries, one for selecting the items, and one for updating the list.
How do i do this?

Current idiomatic support in EF
As far as I know, there is no direct support for "bulk updates" yet in Entity Framework (there has been an ongoing discussion for bulk operation support for a while though, and it is likely it will be included at some point).
(Why) Do you want to do this?
It is clear that this is an operation that, in native SQL, can be achieved in a single statement, and provides some significant advantages over the approach followed in your question. Using the single SQL statement, only a very small amount of I/O is required between client and DB server, and the statement itself can be completely executed and optimized by the DB server. No need to transfer to and iterate through a potentially large result set client side, just to update one or two fields and send this back the other way.
How
So although not directly supported by EF, it is still possible to do this, using one of two approaches.
Option A. Handcode your SQL update statement
This is a very simple approach, that does not require any other tools/packages and can be performed Async as well:
var sql = "UPDATE TABLE x SET FIELDA = #fieldA WHERE FIELDB = #fieldb";
var parameters = new SqlParameter[] { ..., ... };
int result = db.Database.ExecuteSqlCommand(sql, parameters);
or
int result = await db.Database.ExecuteSqlCommandAsync(sql, parameters);
The obvious downside is, well breaking the nice linqy paradigm and having to handcode your SQL (possibly for more than one target SQL dialect).
Option B. Use one of the EF extension/utility packages
Since a while, a number of open source nuget packages are available that offer specific extensions to EF. A number of them do provide a nice "linqy" way to issue a single update SQL statement to the server. Two examples are:
Entity Framework Extended Library that allows performing a bulk update using a statement like:
context.Messages.Update(
x => x.Read == false && x.SentTo.Id == userID,
x => new Message { Read = true });
It is also available on github
EntityFramework.Utilities that allows performing a bulk update using a statement like:
EFBatchOperation
.For(context, context.Messages)
.Where(x => x.Read == false && x.SentTo.Id == userID)
.Update(x => x.Read, x => x.Read = true);
It is also available on github
And there are definitely other packages and libraries out there that provide similar support.

Even SQL has to do this in two steps in a sense, in that an UPDATE query with a WHERE clause first runs the equivalent of a SELECT behind the scenes, filtering via the WHERE clause, then applying the update. So really, I don't think you need to be worried about improving this.
Further, the reason why it's broken into two steps like this in LINQ is precisely for performance reasons. You want that "select" to be as minimal as possible, i.e. you don't want to load any more objects from the database into in memory objects than you have to. Only then do you alter objects (in the foreach).
If you really want to run a native UPDATE on the SQL side, you could use a System.Data.SqlClient.SqlCommand to issue the update, instead of having LINQ give you back objects that you then update. That will be faster, but then you conceptually move some of your logic out of your C# code object model space into the database model space (you are doing things in the database, not in your object space), even if the SqlCommand is being issued from your code.

How to boost Entity Framework Unit Of Work Performance

Please see the following situation:
I do have a CSV files of which I import a couple of fields (not all in SQL server using Entity Framework with the Unit Of Work and Repository Design Pattern).
var newGenericArticle = new GenericArticle
{
GlnCode = data[2],
Description = data[5],
VendorId = data[4],
ItemNumber = data[1],
ItemUOM = data[3],
VendorName = data[12]
};
var unitOfWork = new UnitOfWork(new AppServerContext());
unitOfWork.GenericArticlesRepository.Insert(newGenericArticle);
unitOfWork.Commit();
Now, the only way to uniquely identify a record, is checking on 4 fields: GlnCode, Description, VendorID and Item Number.
So, before I can insert a record, I need to check whether or not is exists:
var unitOfWork = new UnitOfWork(new AppServerContext());
// If the article is already existing, update the vendor name.
if (unitOfWork.GenericArticlesRepository.GetAllByFilter(
x => x.GlnCode.Equals(newGenericArticle.GlnCode) &&
x.Description.Equals(newGenericArticle.Description) &&
x.VendorId.Equals(newGenericArticle.VendorId) &&
x.ItemNumber.Equals(newGenericArticle.ItemNumber)).Any())
{
var foundArticle = unitOfWork.GenericArticlesRepository.GetByFilter(
x => x.GlnCode.Equals(newGenericArticle.GlnCode) &&
x.Description.Equals(newGenericArticle.Description) &&
x.VendorId.Equals(newGenericArticle.VendorId) &&
x.ItemNumber.Equals(newGenericArticle.ItemNumber));
foundArticle.VendorName = newGenericArticle.VendorName;
unitOfWork.GenericArticlesRepository.Update(foundArticle);
}
If it's existing, I need to update it, which you see in the code above.
Now, you need to know that I'm importing around 1.500.000 records, so quite a lot.
And it's the filter which causes the CPU to reach almost 100%.
The `GetAllByFilter' method is quite simple and does the following:
return !Entities.Any() ? null : !Entities.Where(predicate).Any() ? null : Entities.Where(predicate).AsQueryable();
Where predicate equals Expression<Func<TEntity, bool>>
Is there anything that I can do to make sure that the server's CPU doesn't reach 100%?
Note: I'm using SQL Server 2012
Kind regards

Wrong tool for the task. You should never process a million+ records one at at time. Insert the records to a staging table using bulk insert and clean (if need be) and then use a stored proc to do the processing in a set-based way or use the tool designed for this, SSIS.

I've found another solution which wasn't proposed here, so I'll be answering my own question.
I will have a temp table in which I will import all the data, and after the import, I'll execute a stored procedure which will execute a Merge command to populate the destinatio table. I do believe that this is the most performant.

Have you indexed on those four fields in your database? That is the first thing that I would do.
Ok, I would recommend trying the following:
Improving bulk insert performance in Entity framework
To summarize,
Do not call SaveChanges() after every insert or update. Instead, call every 1-2k records so that the inserts/updates are made in batches to the database.
Also, optionally change the following parameters on your context:
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;

Entity Framework - How many trips to the database on DeleteObject

Is this entity framework call actually making two trips to the database?
var result = from p in entities.people where p.id == 6 select p;
entities.DeleteObject(result);
It strikes me that maybe DeleteObject would force the first trip to get results and THEN, having the object to work with, would execute the delete function.
If that is the case, how do I avoid the second trip? How does one do a remove-by-query in entity framework with a single database trip?
Thanks!
EDIT
My original example was misleading, because it was a query by primary key. I guess the real question is whether there is a way to have a single-trip function that can delete items from an IQueryable. For example:
var result = from p in entities.people where p.cityid == 6 select p;
foreach (var r in result)
{
entities.DeleteObject(r);
}
(Notice that the query is of a foreign key, so there may be multiple results).

You can do it like this:
entities.ExecuteStoreCommand("DELETE FROM people WHERE people.cityid={0}", 6);
this is one trip to the database for sure, and effective as it can be.
EDIT:
Also, take a look here, they suggest the same solution. And to answer the question, this is the only way to delete entities, not referenced by primary key, from a database using entity framework, without fetching these entities (and without writing some helper extension methods like suggested in this answer).

Direct delete:
var people = new people() { id = 6 };
entities.people.Attach(people);
entities.people.Remove(people);
entities.SaveChanges();
If you want to see it for yourself, fire up a profiler.
EDIT:
This will allow you to use Linq but it won't be one trip.
var peopleToDelete = entities.people.Where(p => p.id == 6);
foreach (var people in peopleToDelete )
entities.people.DeleteObject(people );
entities.SaveChanges();
There's no easy way to do that out of the box in EF, a big annoyance indeed (as long as one does not want to resort to using direct SQL, which personally I don't). One of the other posters links to an answer that in turn links to this article, which describes a way to make your own function for this, using ToTraceString.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Bulk deleting rows with RemoveRange() - c#

I think we reached here a limitation of EF. Sometimes you just have to use ExecuteSqlCommand to stay performant.

It's a bit broken, try db.Tables.RemoveRange( db.Tables .Where(_ => _.Column == 'SomeRandomValue' && _.Column2 == 'AnotherRandomeValue').AsEnumerable().ToList() ); db.SaveChanges();

Step back and think. Do you really want to download the records from the database in order to delete them? Just because you can doesn't make it a good idea. Perhaps you could consider deleting the items from the database with a stored procedure? EF also allows to do that...

Related

Entity Framework equivalent of SQL relative update

Include() vs Select() performance

How do I update multiple Entity models in one SQL statement?

How to boost Entity Framework Unit Of Work Performance

Entity Framework - How many trips to the database on DeleteObject

Categories

Resources