Entity Framework Batch Update by ID - C#

I'm working on an Entity Framework project in which I need to batch-update records by ID. The IDs (i.e. the primary keys of a particular table) are available at runtime, and I would like to update all matching records with a query like the following:
UPDATE EntityTable
SET Column = @p0
WHERE EntityID IN (1, 2, 3, [...])
The problem I'm running into is that I have around 60k IDs (at worst) that I will need to handle, and our database software (SQL Server 2008) can't handle this:
The query processor ran out of internal resources and could not
produce a query plan. This is a rare event and only expected for
extremely complex queries or queries that reference a very large
number of tables or partitions. Please simplify the query. If you
believe you have received this message in error, contact Customer
Support Services for more information.
Through Google searches I've found people using old-school DataTable and SqlDataAdapter calls to accomplish this, but I'd like to stay within the spirit of Entity Framework if possible, or use raw SQL if necessary. Is there any way to do this in a reasonably efficient manner?

EF doesn't directly support batch updates, so you have to use direct SQL. Your best choice is a stored procedure with a table-valued parameter containing your IDs. EF doesn't support table-valued parameters, and because of that you have to use ADO.NET directly.
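A rough sketch of that table-valued parameter route with plain ADO.NET (the type dbo.IdList, the procedure dbo.UpdateEntitiesByIds, and the parameter names are all hypothetical; requires System.Data and System.Data.SqlClient):

```csharp
// Assumes (hypothetical) database objects:
//   CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
//   CREATE PROCEDURE dbo.UpdateEntitiesByIds @NewValue ..., @Ids dbo.IdList READONLY ...
var idTable = new DataTable();
idTable.Columns.Add("Id", typeof(int));
foreach (var id in ids)
    idTable.Rows.Add(id);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.UpdateEntitiesByIds", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@NewValue", newValue);

    // Pass all 60k IDs in one structured parameter instead of an IN list
    var tvp = command.Parameters.AddWithValue("@Ids", idTable);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.IdList";

    connection.Open();
    command.ExecuteNonQuery();
}
```

The stored procedure would then join @Ids against EntityTable to perform the whole update as a single set-based statement.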
Your current solution can be improved only by dividing your IDs into smaller groups and executing the update for each subset separately.
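A minimal sketch of that chunking, assuming EF6's Database.ExecuteSqlCommand and a List<int> of IDs (the batch size and column names are illustrative):

```csharp
const int batchSize = 2000;
for (int i = 0; i < ids.Count; i += batchSize)
{
    // Inlining the ID list is safe here only because the IDs are integers
    // read from our own table, never user-supplied strings.
    var idList = string.Join(",", ids.Skip(i).Take(batchSize));
    context.Database.ExecuteSqlCommand(
        "UPDATE EntityTable SET Column = @p0 WHERE EntityID IN (" + idList + ")",
        new SqlParameter("@p0", newValue));
}
```

Each round trip then stays well under the size that triggers the "ran out of internal resources" error.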

Related

C# Entity Framework - Update column value in 500,000+ records

We need to process 500,000 records in a database by adding a certain value for a specific column in each record.
Currently, we are running multiple Tasks in parallel using TPL, each taking the records in batches (of size 1000), updating the values, and writing them back to the database using a DbContext. This takes around 10 minutes to process.
Are there more efficient ways to process large databases?
EDIT - the value that we update with is generated dynamically, depending on the record information
Are there more efficient ways to process large databases?
Run a SQL statement to change all of the data at once. Don't feel like you have to use entities for every DB update - there's still nothing wrong with running SQL scripts on the back-end database directly. There are methods within EF to run custom SQL, or you could have a separate "support" app that does not use EF but manages the data directly.
If you are unable to use T-SQL directly then change your approach to produce the T-SQL needed to run it directly. If the values must be calculated beforehand and are different for each record this will be a much faster approach than trying to use Entity Framework on such a large dataset.
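One possible shape of that, sketched under the assumptions of EF6, integer keys, numeric computed values, and a hypothetical Batch helper that splits the pairs into groups of 1000:

```csharp
// Send the precomputed (Id, NewValue) pairs in set-based batches, joining
// the target table against a VALUES list instead of one UPDATE per row.
foreach (var batch in Batch(computedValues, 1000))
{
    var rows = string.Join(",", batch.Select(p => $"({p.Id}, {p.NewValue})"));
    context.Database.ExecuteSqlCommand(
        "UPDATE r SET r.SomeColumn = v.NewValue " +
        "FROM Records r JOIN (VALUES " + rows + ") AS v(Id, NewValue) " +
        "ON r.Id = v.Id;");
}
```

Inlining values this way is only acceptable because both columns are numeric; anything string-typed should go through parameters or a table-valued parameter.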
Architecture and design of your codebase play a key role here. These types of problems are why we cleanly separate domain logic from data access logic, so the processing and computation of business rules and values does not interfere with how you need to persist them, and vice versa.
For example, if you had 500,000 business entity classes that you retrieve from a repository class and compute all the values for them, you could simply then enumerate all of them and produce the desired SQL or pass the new values and identities to your data access layer to perform an optimized bulk update.
The reason I did not provide code in this answer is because there are many ways to develop a solution to this problem using my suggested approach.
It is important to understand that Entity Framework was designed around the unit-of-work concept (at least EF6) and is not optimized for bulk workloads (except in select-query scenarios). A well-designed codebase will have a mix of EF and T-SQL in the data access layer (or in the database via functions and stored procedures) to handle performance-critical operations.

linq2sql C#: How to query from a table with changing schema name

I have a webservice which tries to connect to a database of a desktop accounting application.
It has tables with the same name but different schema names, such as:
[DatabaseName].[202001].[CustomerCredit]
[DatabaseName].[202002].[CustomerCredit]
...
[DatabaseName].[202014].[CustomerCredit]
[DatabaseName].[202015].[CustomerCredit]
[DatabaseName].[202016].[CustomerCredit]
...
[DatabaseName].[2020xx].[CustomerCredit]
Schema names are in the format [Year+IncrementalNumber], such as [202014], [202015], [202016], and so on.
Whenever I want to query customer credit information in the database, I should fetch it from the schema with the highest number, such as [DatabaseName].[202016].[CustomerCredit] if 202016 is the latest schema in my DB.
Note:
Creation of new schemas in the accounting application database follows no rules and is decided entirely by the user of the accounting application; every instance of the application installed in a different place may have a different number of schemas.
So when developing my webservice I have no way of knowing ahead of time which schema to connect to. At run time I can find the correct schema to query, but I don't know how to fetch table information with the correct schema name in the query.
I usually create a LINQ to SQL .dbml class and use its definitions to read information from the DB, but I don't know how to manage the schema change this way.
The DBML designer manages schema names like this:
[global::System.Data.Linq.Mapping.TableAttribute(Name="[202001].CustomerCredit")]
However, since my app can only retrieve the schema name at run time, I don't know how to fix the table declaration in my special case.
It is so easy to handle in ADO.NET, but I don't know its equivalent in Linq2SQL:
string query = "select count(*) from [" + Variables.FinancialYearSchemaName + "].CustomerCredit where SFC_Status = 100;";
Ultimately, no: most ORMs do not expect the schema to vary at runtime, so most, including EF and LINQ-to-SQL, do not support this scenario. One possible option would be to have different connection strings, each with a different user account that has a different default schema configured at the database, and initialize your DB-context with a connection string or connection that matches the required account. Then if EF asks the RDBMS for [CustomerCredit], it will look first in that account's default schema (e.g. [202014].[CustomerCredit]). You should probably avoid having a [dbo].[CustomerCredit] in that scenario, to prevent confusion. This is, however, a pretty hacky and ugly solution. But... it should work.
Alternatively, you would have to take more control over the data access, essentially writing your own SQL (presumably with a token replacement for the schema, which has problems of its own).
That schema is essentially a manual partitioning of the CustomerCredit table. The best solution would be one that makes partitioning transparent to all users. The code shouldn't know how the data is partitioned.
Database Solutions
The benefit of database solutions is that they are transparent or almost transparent to users and require minimal maintenance.
Table Partitioning
The clean solution would be to use table partitioning, making the different partitions transparent to all users. Table partitioning used to be an Enterprise-only feature, but it became available in all editions with SQL Server 2016 SP1, even Express. This means it's free in all versions still in mainstream support.
The table is partitioned based on a function (eg a date-based function) and stored in different filegroups. Whenever possible, the query optimizer can check the partition boundaries and the query conditions and read only the partitions that contain the relevant data. Eg in a date-partitioned table, queries that contain a date filter can search only the relevant partitions.
Partitioned views
Another option, available since at least SQL Server 2000, is to use partitioned views, essentially a UNION ALL view that combines all table partitions, eg:
SELECT <select_list1>
FROM [202001].[CustomerCredit]
UNION ALL
SELECT <select_list2>
FROM [202002].[CustomerCredit]
UNION ALL
...
SELECT <select_listn>
FROM [2020xx].[CustomerCredit];
EF can map entities to views instead of tables. If the criteria for updatable views are met, the partitioned view itself will be updatable and any modifications will be made to the correct table.
The query optimizer can take advantage of CHECK constraints on the tables to search only one table at a time, similar to how partitioned tables work.
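A sketch of that mapping in EF Core (the view name "CustomerCredits_All" is hypothetical):

```csharp
public class AppDbContext : DbContext
{
    public DbSet<CustomerCredit> CustomerCredits { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map the entity to the partitioned view rather than any single
        // [2020xx] table. Using ToTable() with the view's name lets EF
        // emit SELECTs and, if the view meets the updatable-view criteria,
        // INSERT/UPDATE/DELETE against it as if it were an ordinary table.
        modelBuilder.Entity<CustomerCredit>().ToTable("CustomerCredits_All");
    }
}
```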
Code solutions
This requires raw SQL queries, and a way to identify the correct table/schema each time a change is made. It requires modifications to the application each time the table partitioning changes, whether those are code modifications, or changes in a configuration file.
In all cases, one query can only read from one table at a time.
Keep ADO.NET
One possibility is to keep using ADO.NET, replacing the table/schema name in a query template. The code will have to map the results to objects if needed, the same way it already does.
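A minimal sketch of that template substitution, assuming an open SqlConnection and the [Year+IncrementalNumber] naming convention from the question:

```csharp
// Find the latest schema at runtime; the name comes from sys.schemas,
// not from user input, so splicing it into the query is safe.
string latestSchema;
using (var cmd = new SqlCommand(
    "SELECT MAX(name) FROM sys.schemas WHERE name LIKE '2020%'", connection))
{
    latestSchema = (string)cmd.ExecuteScalar();
}

using (var countCmd = new SqlCommand(
    $"SELECT COUNT(*) FROM [{latestSchema}].CustomerCredit WHERE SFC_Status = 100",
    connection))
{
    var count = (int)countCmd.ExecuteScalar();
}
```

The LIKE '2020%' filter matches the naming in the question; real code would need a pattern that survives year changes.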
EF Raw SQL
Another is to use EF's raw SQL features, eg EF Core's FromSqlRaw, to query a specific table the same way ADO.NET would. The benefit is that EF will map the query results to objects. In EF Core, the raw query can be combined with LINQ operators:
var query = $"select * from [DatabaseName].[{schemaName}].[CustomerCredit]";
var credits = context.CustomerCredits
    .FromSqlRaw(query)
    .Where(...)
    .ToList();
Dapper
Another option is to use Dapper or another micro-ORM with an ad-hoc query, similar to ADO.NET, and map the results to objects:
var query = $"select * from [DatabaseName].[{schemaName}].[CustomerCredit] where customerID = @ID";
var credits = connection.Query<CustomerCredit>(query, new { ID = someID });

Can Entity Framework generate an UPDATE ... WHERE statement in SQL?

Building a data access layer for an internal enterprise MVC application with Entity Framework and the repository pattern. In some cases, I need to update, say, 100k rows in a table with 100 million or so rows in a SQL Server 2008 R2 database.
I am using EF6, targeting an existing legacy database we are refactoring. I am new to Entity Framework, but have much experience with SQL Server.
No matter how I seem to structure my update statement, when I run a profiler, I see 60k individual updates by ID. This can take up to 5 minutes to run. However, I have say a "batch number" that is indexed for a bunch of these records. Is there any way to UPDATE these records with a single where clause generated from EF? My solution has been to just write a simple sp that I then have EF call.
Entity Framework in its current version does not support bulk operations in the sense that you are wanting it to. Each update or delete is one individual statement per record. To avoid this, you can run SQL directly from Entity Framework if you do not want to go through the hassle of creating a stored procedure (though they are pretty easy to get up and running):
using (var context = new MyDbContext())
{
    context.Database.ExecuteSqlCommand(
        "UPDATE MyTable SET Value = {0} WHERE SomeColumn = {1}",
        updateValue, queryValue);

    context.Database.ExecuteSqlCommand(
        "DELETE FROM MyTable WHERE Value = {0}",
        myValueToQueryBy);
}
Note the use of parameters, with syntax that looks very similar to string.Format(). Please see MSDN for more information.

Using paging (Skip, Take) over multiple DbSets

I have an Entity Framework DbContext with two different DbSets.
In my view I am combining these two sets into the same view model and listing them in the same table.
I want to support table paging to be able to query only a page of records at a time, sorted by a particular column. I can't see how to do this without reading all of the records from the database and then paging in memory.
For example, I want to be able to sort by date ascending since both tables have a date column. I could simply take the page size from both tables and then sort in memory but the problem comes into play when I am skipping records. I do not know how many to skip in each table since it depends on how many records are found in the other table.
Is there a way to manipulate Entity Framework to do this?
It is possible.
Join them in the database (can be done in EF).
Project that (select new {}) into the final object.
Order by, skip and take on that projection.
It will be crap performance-wise, but there is no way around that given you have a broken database model. It basically has to get a temporary view of all rows for the SQL to find the first ones - that will be slow.
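A sketch of those three steps, using Concat (which EF translates to UNION ALL) over two hypothetical sets, TableA and TableB, that share compatible Id and Date columns:

```csharp
var page = context.TableA
    .Select(a => new { a.Id, a.Date, Source = "A" })
    .Concat(context.TableB
        .Select(b => new { b.Id, b.Date, Source = "B" }))
    .OrderBy(x => x.Date)        // sort across both sets on the server
    .Skip(pageIndex * pageSize)  // then page the combined projection
    .Take(pageSize)
    .ToList();
```

The whole expression runs as one SQL statement, so only a single page of rows crosses the wire.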
Your best bet is going to be to combine them with a stored procedure or view, and then map that sp/view into Entity Framework. Combining them on the client is going to kill performance - let the server do it for you; it is clearly a server-side task.

Forced usage of stored procedures with ADO.NET Entity Framework

I would like to have your advice.
I'm now developing a small WPF client application using C#, bindings, ADO.Net Entity Framework, ODP.net and an Oracle database.
The application is a small one: two XAML screens, about 15 tables. I was developing using entities, filling them through the application and saving with the SaveChanges method.
However, our DBA told me that I'm not allowed to access the tables directly, only through stored procedures. I asked him why and he said it is a security measure, because using stored procedures forces you to provide the row identifier when deleting a record from a table.
According to him, the risk is that the application might delete all the rows in a table instead of only one row, unless the id is provided through a stored procedure.
I find that a lot of overkill for only 15 tables.
What do you think about that?
Have you suggested to your DBA that you use LINQ to SQL? That way you can extract objects representing individual rows, which would make it far less likely that you would accidentally delete multiple rows.
Personally I think EDM might be overkill for a DB of this size.
I should say I'm a big proponent of LINQ to SQL and not a big fan of SPs, however...
LINQ2SQL on top of ODP.NET is a great stack. And I agree with Andrew, because you would have to write code to load the records, delete all of them, and commit the changes, it's not exactly something that can happen "easily".
Forgetting a where clause in a LINQ statement is no easier or harder than forgetting a where clause in a stored procedure.
