Performance - Delete multiple MSSQL database entries with LINQ - c#

I have an MSSQL database table with a few million entries.
Each new entry gets an ID one higher than the last, so lower ID numbers are older entries. Now I want to delete old entries from the database by their ID.
I delete every Entry whose ID is lower than maxID.
while (true)
{
    List<Entry> entries = entity.Entry.Where(z => z.id < maxID).Take(1000).ToList();
    foreach (var entry in entries)
    {
        entity.Entry.DeleteObject(entry);
    }
    entity.SaveChanges(); // without this, the deletes are never sent and the loop never ends
    if (entries.Count < 1000)
    {
        break;
    }
}
I can't fetch all entries with one query because that raises a System.OutOfMemoryException, so I only take 1000 entries at a time and repeat the delete until every entry is gone.
My question is: what is the best number of entries to .Take() for performance?

It's faster to drop and recreate the tables in the database. Alternatively, you can execute commands directly against the database by calling your stored procedure using the ExecuteStoreCommand method.
Any commands automatically generated by the Entity Framework may be more complex than similar commands written explicitly by a database developer. If you need explicit control over the commands executed against your data source, consider defining a mapping to a table-valued function or stored procedure. -MSDN
As far as I can see from your code (please correct me if I am wrong, or improve the answer), your code actually loads the entities into memory, which is an overhead cost given that you only need to perform a delete operation, and it will generate a separate DELETE query for each entity marked by DeleteObject. So in terms of performance it is better to call a stored procedure or to execute your query directly against the database.
ExecuteStoreCommand Method
Directly Execute commands
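For illustration, a minimal sketch of the direct approach, assuming entity is the ObjectContext from the question and the underlying table is named Entry; deleting in chunks via TOP keeps each transaction small:
int rowsDeleted;
do
{
    // Each round trip deletes up to 10,000 rows on the server,
    // without materializing any entities in memory.
    rowsDeleted = entity.ExecuteStoreCommand(
        "DELETE TOP (10000) FROM Entry WHERE id < {0}", maxID);
} while (rowsDeleted > 0);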

Try this...
entity.Entry.Where(z => z.id < maxID).ToList().ForEach(entity.Entry.DeleteObject);
entity.SaveChanges();

Related

Best way to compare two large lists, C#

This is for one of my ETL projects, to sync two databases. Some tables are 4 GB, so the ETL job only loads updated data to insert or update, and that works fine. But the source table also deletes some records, and I want to delete them from my table too. What I did is:
List<long> SourceIDList; // load all IDs from the source table
List<long> MyIDList;     // load all IDs from my table
var NeedRemoveIDList = MyIDList.Except(SourceIDList);
foreach (var ID in NeedRemoveIDList)
{
    // remove from my table
}
The logic works, but loading the IDs from a 4 GB table into a List throws an "out of memory" exception. Is there a better way?
Thanks for all the comments. I ended up doing this in the database: I insert the two ID lists into temp tables and use SQL to compare them. It takes some time to insert the data, but since this is an ETL job, a few extra minutes are OK.
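A rough sketch of that temp-table approach (the connection string and the MyTable/Id names are illustrative): bulk-copy the source IDs into a temp table and let the server compute the difference in one set-based DELETE.
using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // The temp table lives for the lifetime of this connection
    using (var create = new SqlCommand(
        "CREATE TABLE #SourceIds (Id BIGINT PRIMARY KEY)", conn))
        create.ExecuteNonQuery();

    var ids = new DataTable();
    ids.Columns.Add("Id", typeof(long));
    foreach (long id in SourceIDList)
        ids.Rows.Add(id);

    // Streams the IDs to the server far faster than row-by-row inserts
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#SourceIds" })
        bulk.WriteToServer(ids);

    // Delete everything that no longer exists in the source
    using (var del = new SqlCommand(
        "DELETE FROM MyTable WHERE Id NOT IN (SELECT Id FROM #SourceIds)", conn))
        del.ExecuteNonQuery();
}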

How to know how many persistent objects were deleted using Session.Delete(query);

We are refactoring a project from plain MySQL queries to the usage of NHibernate.
In the MySQL connector there is the ExecuteNonQuery function, which returns the number of rows affected. So
int RowsDeleted = ExecuteNonQuery("DELETE FROM `table` WHERE ...");
would show me how many rows were effectively deleted.
How can I achieve the same with NHibernate? So far I can see it is not possible with Session.Delete(query).
My current workaround is to first load all of the objects that are about to be deleted and delete them one by one, incrementing a counter on each delete. But I assume that will cost performance.
If you don't mind that NHibernate creates a delete statement for each row, and possibly additional statements for orphans and/or other relationships, you can use session.Delete.
For better performance I would recommend batch deletes (see the example below).
session.Delete
If you delete many objects with session.Delete, NHibernate makes sure that integrity is preserved; it will load everything into the session if needed anyway. So there is no real reason to count your objects or to have a method that returns the number of deleted objects, because you can simply run a query before the delete to determine the number of objects that will be affected.
The following statement deletes all Post entities by id.
The select statement queries the database only for the ids, so it is actually quite performant:
var idList = session.Query<Post>().Select(p => p.Id).ToList<int>();
session.Delete(string.Format("from Post where Id in ({0})", string.Join(",", idList.ToArray())));
The number of objects deleted will equal the number of ids in the list.
In terms of the queries NHibernate fires against your database, this is actually the same as using Query<T>, looping over the result, and deleting the objects one by one.
Batch delete
You can use session.CreateSQLQuery to run native SQL commands. It also lets you use input and output parameters.
The following statement simply deletes everything from the table, as you would expect:
session.CreateSQLQuery(@"Delete from MyTableName");
To retrieve the number of rows deleted, we use the normal T-SQL @@ROWCOUNT variable and output it via a select. To read the selected row count, we add an output parameter to the query via AddScalar, and UniqueResult simply returns the integer:
var rowsAffected = session.CreateSQLQuery(@"
    Delete from MyTableName;
    Select @@ROWCOUNT as NumberOfRows")
    .AddScalar("NumberOfRows", NHibernateUtil.Int32)
    .UniqueResult();
Input variables can be passed with .SetParameter(<name>, <value>):
var rowsAffected = session.CreateSQLQuery(@"
    DELETE from MyTableName where ColumnName = :val;
    select @@ROWCOUNT as NumberOfRows;")
    .AddScalar("NumberOfRows", NHibernateUtil.Int32)
    .SetParameter("val", 1)
    .UniqueResult();
I'm not so comfortable with MySQL; the example I wrote is for MSSQL. I think the MySQL equivalent of @@ROWCOUNT would be SELECT ROW_COUNT();

asp.net stored procedure insert multiple pairs of data into the DB

I have a method that accepts an int[] (userIDs) and an int (groupID) as parameters. It then runs a stored procedure and inserts the data into the DB.
For example:
if userIDs=[1,2,3] and groupID=4,
then I want the following data to be inserted into the DB
userID  groupID
1       4
2       4
3       4
I have two solutions to this problem. The first is to write a stored procedure that inserts a single record into the DB. In the method, I loop through the int[] and call the stored procedure n times:
method()
{
    for (int i = 0; i < userIDs.Length; i++)
    {
        // call stored procedure to insert a single record
    }
}
The second solution is to pass the int[] and the int as parameters to the stored procedure and do the looping inside the stored procedure.
Which is the better solution? (If the second is better, can someone provide guidance on handling an int[] in a stored procedure?)
Is there a good reason why you don't want to use an O/R mapper?
Your example looks like server-side code-behind, and you could use Entity Framework (or another mapper) to insert your new values.
If you can't use one, then I would use the for-loop approach (the second in your posting). But as written it's dangerous, because you are not within any transaction.
You can start your Entity Framework investigation here: http://www.asp.net/entity-framework.
If you are, for any reason, not able to use EF, consider using a transaction scope for your SQL commands (see http://msdn.microsoft.com/en-us/library/777e5ebh.aspx to start reading).
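A minimal sketch of that transaction-scope suggestion (the MyEntities context and UserGroup entity are invented for illustration):
using System.Transactions;

using (var scope = new TransactionScope())
using (var context = new MyEntities())
{
    foreach (int userId in userIDs)
    {
        // Stage one membership row per user
        context.UserGroups.AddObject(new UserGroup { UserID = userId, GroupID = groupID });
    }
    context.SaveChanges();
    scope.Complete(); // commits all inserts atomically; otherwise everything rolls back
}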
Geeeeeeeeeeeenerally speaking, code is faster for this stuff. You're better off iterating through your array, C# side, then calling a stored procedure to handle each specific record.
Now of course specifics will change, and this particular case certainly sounds best handled in one big shot. Convert your int array to a DataTable and then you can have some real fun...
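A hedged sketch of that DataTable idea using a table-valued parameter (the dbo.IntList type and dbo.AddUsersToGroup procedure are invented; TVPs require SQL Server 2008 or later):
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("UserID", typeof(int));
foreach (int id in userIDs)
    table.Rows.Add(id);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.AddUsersToGroup", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@groupID", groupID);

    var tvp = cmd.Parameters.AddWithValue("@userIDs", table);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.IntList"; // user-defined table type on the server

    conn.Open();
    cmd.ExecuteNonQuery();
}
Inside the procedure, the insert is then a single set-based statement, no loop required: INSERT INTO UserGroups (UserID, GroupID) SELECT UserID, @groupID FROM @userIDs.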
If all you're doing is adding rows to a database, I don't see the need for a Stored Procedure (unless that's a requirement coming from a DBA or policy).
Just loop through your items and add the entries to the database using ADO.NET or Entity Framework.

Batching Stored Procedure Commands in EF 4.2

I've got a call to a stored procedure that is basically an INSERT stored procedure. It inserts into Table A, then into Table B with the identity from Table A.
Now I need to call this stored procedure N times from my application code.
Is there any way I can batch this? At the moment it's doing N round trips to the DB; I would like it to be one.
The only approach I can think of is to pass the entire list of items across the wire via a user-defined table type.
But the problem with this approach is that I would need a CURSOR in the sproc to loop through each item in order to do the inserts (because of the identity field).
Basically: can we batch DbCommand.ExecuteNonQuery() with EF 4.2?
Or can we do it with something like Dapper?
You can keep it like that and, in the stored procedure, just do a MERGE between your target table and the table parameter. Because you are always coming in with new records, the MERGE will only take the INSERT branch.
Used like this, MERGE is an easy way of doing batch inserts without a cursor.
Another way that also avoids a cursor is to use an INSERT ... SELECT statement in the SP.
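What makes MERGE solve the identity problem is its OUTPUT clause: unlike a plain INSERT ... OUTPUT, it can reference the source columns, so each new identity can be captured next to its originating row. A hedged sketch of the sproc body (all table and column names are invented):
-- @items is the user-defined table type parameter
DECLARE @mapping TABLE (NewId INT, SourceKey INT);

MERGE TableA AS target
USING @items AS src
ON 1 = 0                       -- never matches, so every source row inserts
WHEN NOT MATCHED THEN
    INSERT (Name) VALUES (src.Name)
OUTPUT inserted.Id, src.SourceKey INTO @mapping;

-- The second insert uses the captured identities, no cursor required
INSERT INTO TableB (TableAId, SourceKey)
SELECT NewId, SourceKey FROM @mapping;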

DataSet TableAdapter Fill Method

I have a DataSet with two TableAdapters (1-to-many relationship) that was created using Visual Studio 2010's Configuration Wizard.
I make a call to an external source and populate a Dictionary with the results. These results should be all of the entries in the database. To synchronize the DB, I don't want to just clear all of the tables and then repopulate them, like dropping the tables and recreating them with new data in SQL.
Is there a clean way, possibly using the TableAdapter.Fill() method, or do I have to loop through the two tables row by row, decide whether each row stays or gets deleted, and then add the new entries? What is the best approach to make the data in the dictionary the only data in my two tables in the DataSet?
First question: if it's the same DB, why do you have two tables with the same information?
To the question at hand: that largely depends on the sizes. If the tables are not big, then use a transaction, clear the table (DELETE FROM TABLE or whatever), and write your data in there again.
If the tables are big, on the other hand, the question is: can you even load all of this into your dictionary?
Of course, you have to ask yourself what happens to inconsistent data (another user/app changed the data while you had it in your dictionary).
If this takes too long, you could remember what you did to the data; that means: flag the changed data, remember the deleted keys and newly inserted rows, and make your updates based on that.
Both can be achieved by remembering the filled DataTable and using it as a backing field, or by implementing your own mechanisms.
In any case, I would recommend thinking about the problem: do you really need the dictionary? Why not make queries against the database to get the data? Or only cache a part of the data for quick access?
PS: the Update method on your DataAdapter will do all the work (changing the changed, removing the deleted, and inserting the new DataRows), but it will also update the DataTable/DataSet, so this will only work once.
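A rough sketch of the small-table path described above (the adapter, its DeleteAll query, and the Insert signature are illustrative):
using System.Transactions;

using (var scope = new TransactionScope())
{
    // Assumed custom TableAdapter query wrapping "DELETE FROM MyTable"
    myTableAdapter.DeleteAll();

    foreach (var pair in myDictionary)
        myTableAdapter.Insert(pair.Key, pair.Value);

    scope.Complete(); // without this, the whole refill rolls back
}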
It could be that it is quicker to repopulate the entire table than to iterate through it and decide which records go or stay. Could you not decide whether a record is deleted via an SQL statement (DELETE FROM table WHERE active = false)? If you want them to stay in the database but not in the DataSet: SELECT * FROM table WHERE active = true.
You could also have a date field and select all records that have been added since the date you last 'polled' the database (SELECT * FROM table WHERE active = true AND date_added > #12:30#).
