I am building a data access layer for an internal enterprise MVC application with Entity Framework and the repository pattern. In some cases, I need to update, say, 100k rows in a table with 100 million or so rows in a SQL Server 2008 R2 database.
I am using EF6, targeting an existing legacy database we are refactoring. I am new to Entity Framework, but have much experience with SQL Server.
No matter how I structure my update statement, when I run a profiler, I see 60k individual updates by ID. This can take up to 5 minutes to run. However, I have, say, a "batch number" that is indexed for a bunch of these records. Is there any way to UPDATE these records with a single WHERE clause generated from EF? My solution has been to just write a simple stored procedure that I then have EF call.
Entity Framework in its current version does not support bulk operations in the sense that you want. Each update or delete is sent to the database as a separate statement per record. To avoid this, you can run SQL directly from Entity Framework if you do not want to go through the hassle of creating a stored procedure (though they are pretty easy to get up and running):
using (var context = new MyDbContext())
{
    // EF6 DbContext API; on the older ObjectContext API the equivalent call is ExecuteStoreCommand.
    context.Database.ExecuteSqlCommand(
        "Update MyTable Set Value = {0} Where SomeColumn = {1}",
        updateValue, queryValue);
    context.Database.ExecuteSqlCommand(
        "Delete From MyTable Where Value = {0}",
        myValueToQueryBy);
}
Note that the parameters use placeholder syntax that looks very similar to string.Format(); the values are sent to the database as query parameters rather than being spliced into the SQL text. Please see MSDN for more information.
I have a web service that connects to the database of a desktop accounting application.
It has tables with the same name but different schema names, such as:
[DatabaseName].[202001].[CustomerCredit]
[DatabaseName].[202002].[CustomerCredit]
.
.
.
[DatabaseName].[202014].[CustomerCredit]
[DatabaseName].[202015].[CustomerCredit]
[DatabaseName].[202016].[CustomerCredit]
...
..
[DatabaseName].[2020xx].[CustomerCredit]
The schema name has the format [Year+IncrementalNumber], such as [202014], [202015], [202016], etc.
Whenever I want to query customer credit information, I should fetch it from the schema with the biggest number, such as [DatabaseName].[202016].[CustomerCredit] if 202016 is the latest schema in my database.
Note:
The creation of new schemas in the accounting application's database follows no rules and is decided entirely by the user of the accounting application, so every installation of the application may have a different number of schemas.
So while developing my web service I have no way of knowing in advance which schema to connect to. At run time I can find the correct schema, but I don't know how to query its tables with the correct schema name.
I usually create a LINQ-to-SQL DBML class and use its definitions to read information from the database, but I don't know how to handle a changing schema this way.
The DBML designer handles schema names like this:
[global::System.Data.Linq.Mapping.TableAttribute(Name="[202001].CustomerCredit")]
However, since my app can only retrieve the schema name at run time, I don't know how to fix the table declaration in my special case.
It is easy to handle in ADO.NET, but I don't know the equivalent in LINQ to SQL:
select count(*) from [" + Variables.FinancialYearSchemaName + "].CustomerCredit where SFC_Status = 100;
Ultimately, no: most ORMs do not expect the schema to vary at runtime, so most, including EF and LINQ-to-SQL, do not support this scenario. One possible option would be to have different connection strings, each with a different user account that has a different default schema configured in the database, and initialize your DB context with a connection string or connection that matches the required account. Then if EF asks the RDBMS for [CustomerCredit], it will look first in that account's default schema (for example [202014].[CustomerCredit]). You should probably avoid having a [dbo].[CustomerCredit] in that scenario, to prevent confusion. This is, however, a pretty hacky and ugly solution. But... it should work.
Alternatively, you would have to take more control over the data access, essentially writing your own SQL (presumably with a token replacement for the schema, which has problems of its own).
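As a rough sketch of the token-replacement route and the main problem it brings: a schema name cannot be passed as a SQL parameter, so it has to be validated before it is spliced into the query text. The class and method names below are hypothetical, and the six-digit rule is an assumption based on the [Year+IncrementalNumber] examples in the question. The candidate names could be read from the sys.schemas catalog view.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FinancialYearSchema
{
    // Assumption: a financial-year schema name is exactly six ASCII digits.
    private static readonly Regex SchemaPattern = new Regex(@"\A[0-9]{6}\z");

    // Returns the highest-numbered matching schema, or null if none match.
    // Ordinal ordering is numeric ordering here because all names have equal length.
    public static string PickLatest(IEnumerable<string> schemaNames)
    {
        return schemaNames
            .Where(name => name != null && SchemaPattern.IsMatch(name))
            .OrderBy(name => name, StringComparer.Ordinal)
            .LastOrDefault();
    }

    // Builds the query text. Only the validated schema name is spliced in;
    // the status value stays a regular parameter (@status).
    public static string BuildCountQuery(string schemaName)
    {
        if (schemaName == null || !SchemaPattern.IsMatch(schemaName))
            throw new ArgumentException("Unexpected schema name.", nameof(schemaName));

        return "select count(*) from [" + schemaName + "].[CustomerCredit] where SFC_Status = @status";
    }
}
```

The resulting text can then be run through ADO.NET, or any raw-SQL facility, with @status supplied as an ordinary parameter.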
That schema is essentially a manual partitioning of the CustomerCredit table. The best solution would be one that makes partitioning transparent to all users. The code shouldn't know how the data is partitioned.
Database Solutions
The benefit of database solutions is that they are transparent or almost transparent to users and require minimal maintenance.
Table Partitioning
The clean solution would be to use table partitioning, making the different partitions transparent to all users. Table partitioning used to be an Enterprise-only feature, but it has been available in all editions, even Express, since SQL Server 2016 SP1. This means it's free in all versions still in mainstream support.
The table is partitioned based on a function (eg a date based function) and stored in different files. Whenever possible, the query optimizer can check the partition boundaries and the query conditions and use only the file that contains the relevant data. Eg in a date-partitioned table, queries that contain a date filter can search only the relevant partitions.
Partitioned views
Another option, available since 2000 at least, is to use partitioned views, essentially a UNION ALL view that combines all table partitions, eg:
SELECT <select_list1>
FROM [202001].[CustomerCredit]
UNION ALL
SELECT <select_list2>
FROM [202002].[CustomerCredit]
UNION ALL
...
SELECT <select_listn>
FROM Tn;
EF can map entities to views instead of tables. If the criteria for updatable views are met, the partitioned view itself will be updatable and any modifications will be made to the correct table.
The query optimizer can take advantage of CHECK constraints on the tables to search only one table at a time, similar to how partitioned tables work.
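Because users of the accounting application can add schemas at any time, such a view would have to be regenerated whenever a schema appears. The following is only a sketch of how the UNION ALL body could be produced from a list of schema names. The class name, the six-digit schema rule, and the column names in the usage note are assumptions, and the caller is expected to pass a trusted, hard-coded column list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class CustomerCreditView
{
    // Assumption: a financial-year schema name is exactly six ASCII digits.
    private static readonly Regex SchemaPattern = new Regex(@"\A[0-9]{6}\z");

    // Returns one SELECT per matching schema, joined with UNION ALL.
    // Names that do not match the pattern (dbo, sys, ...) are ignored.
    public static string BuildSelect(IEnumerable<string> schemaNames, string columnList)
    {
        List<string> branches = schemaNames
            .Where(name => name != null && SchemaPattern.IsMatch(name))
            .OrderBy(name => name, StringComparer.Ordinal)
            .Select(name => "SELECT " + columnList + " FROM [" + name + "].[CustomerCredit]")
            .ToList();

        if (branches.Count == 0)
            throw new InvalidOperationException("No financial-year schemas found.");

        return string.Join(Environment.NewLine + "UNION ALL" + Environment.NewLine, branches);
    }
}
```

For example, BuildSelect(new[] { "202002", "dbo", "202001" }, "CustomerID, Amount") yields two SELECT branches, for [202001] and [202002], that can be placed after a CREATE VIEW ... AS header.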
Code solutions
This requires raw SQL queries, and a way to identify the correct table/schema each time a change is made. It requires modifications to the application each time the table partitioning changes, whether those are code modifications, or changes in a configuration file.
In all cases, one query can only read from one table at a time.
Keep ADO.NET
One possibility is to keep using ADO.NET, replacing the table/schema name in a query template. The code will have to map to objects if needed, the same way it already did.
EF Raw SQL
Another is to use EF's raw SQL features, eg EF Core's FromSqlRaw, to query a specific table the same way ADO.NET would. The benefit is that EF will map the query results to objects. In EF Core, the raw query can be combined with LINQ operators:
// schemaName must come from a trusted or validated source, because it is spliced into the SQL text.
var query = $"select * from [DatabaseName].[{schemaName}].[CustomerCredit]";
var credits = context.CustomerCredits
    .FromSqlRaw(query)
    .Where(...)
    .ToList();
Dapper
Another option is to use Dapper or another micro-ORM with an ad-hoc query, similar to ADO.NET, and map the results to objects:
var query = $"select * from [DatabaseName].[{schemaName}].[CustomerCredit] where customerID=@ID";
var credits = connection.Query<CustomerCredit>(query, new { ID = someID });
I have a big database that was created by Entity Framework Core. It stores roughly 5 million records. To improve query speed I'd like to aggregate the data from previous days.
In this case I would like to execute a SQL command once a day at 00:00 and aggregate yesterday's data.
In the past I created stored procedures that were executed by a database job in MSSQL. But those databases were created manually, and now I'd like to get similar functionality using Entity Framework.
I read that there shouldn't be any logic in the database. So how could I do this instead? (The article where I got the base information is: Can you create sql views / stored procedure using Entity Framework 4.1 Code first approach)
So I'm looking for a good way to run an "aggregation" function every day and store the aggregated data in the database.
You use the method you used before! It's ideally solved by SQL Agent and a proc, almost anything else will have more issues and worse performance.
If you really wanted to do it differently then you need two parts:
a scheduler; this will most likely be the OS one, which has nowhere near as many features as SQL Agent.
the actual program; a .NET app using EF will do this, but EF is not required: simple ADO will work, as will any other library.
The only reason you'd choose this route is if you had further requirements that SQL would be inappropriate for, so that you needed a more general-purpose language.
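If the program route is taken anyway, the date arithmetic is the part most easily gotten wrong. Here is a minimal sketch, with hypothetical class and method names, of computing the wait until the next midnight and the half-open time window that covers yesterday. It uses the local clock value passed in and ignores daylight-saving transitions.

```csharp
using System;

// Hypothetical helper, not part of any library.
public static class DailyAggregation
{
    // Time to wait from 'now' until the next midnight.
    // Called exactly at midnight, it returns a full day.
    public static TimeSpan DelayUntilNextMidnight(DateTime now)
    {
        return now.Date.AddDays(1) - now;
    }

    // Half-open window [start, end) covering the whole day before 'now'.
    public static Tuple<DateTime, DateTime> YesterdayWindow(DateTime now)
    {
        DateTime end = now.Date;
        return Tuple.Create(end.AddDays(-1), end);
    }
}
```

The two window bounds would typically be passed as parameters to the aggregation query, filtering with >= on the start and < on the end so that no row is counted twice across runs.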
I have my first project with Entity Framework and SQL Server Compact.
The database has about 15 tables which all have foreign keys to other tables.
I have to read thousands of XML files and import their data to the database. The database structure mirrors the XML file structure. There is a table hierarchy with up to 5 levels. So for each record in the "top" table I have to insert one or more in the underlying tables.
I am using Entity Framework for inserting and it works fine, but the performance is very very poor :(.
I think the main problem is that for most records the ID has to be read back to be used for records in underlying tables.
The other thing is that, if I understand correctly, Entity Framework inserts each record with a separate command.
Is there a way to increase the performance dramatically?
Thank you
Use the SQL Compact Bulk Insert Library to insert the data in bulk.
If you need to update any records, then use this technique:
Create a staging table in your database
Use the library to bulk insert into the staging table
Then execute a stored procedure to do an update by reading from the staging table and updating the target table's records.
First, make sure to use AddRange or an alternative approach to avoid the poor performance caused by the DetectChanges method.
See: http://entityframework.net/improve-ef-add-performance
Disclaimer: I'm the owner of Entity Framework Extensions
This library supports all major providers, including SQL Server Compact
By default, BulkInsert automatically gets the IDs of the inserted rows.
This library allows you to perform all bulk operations you need for your scenarios:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
Example
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
// Customize Primary Key
context.BulkMerge(customers, operation => {
    operation.ColumnPrimaryKeyExpression =
        customer => customer.Code;
});
I would like to have your advice.
I'm now developing a small WPF client application using C#, bindings, ADO.Net Entity Framework, ODP.net and an Oracle database.
The application is a small one, two XAML screens, about 15 tables. I was developing using entities by filling my entities through the application and using the SaveChanges method.
However, our DBA told me that I am not allowed to access the tables directly, only through stored procedures. I asked him why and he told me it is for security reasons, because using stored procedures forces the caller to provide the row identifier when deleting a record from a table.
According to him, the risk is that the application might delete all the rows in a table instead of only one row if the id is provided through the stored procedure.
I find that is a lot of overkill for only 15 tables.
What do you think about that?
Have you suggested to your DBA that you use Linq to SQL? That way you can extract objects, representing individual rows and it would make it far less likely you would accidentally delete multiple rows.
Personally I think EDM might be overkill for the size of DB.
I should say I'm a big proponent of LINQ to SQL and not a big fan of SPs however....
LINQ2SQL on top of ODP.NET is a great stack. And I agree with Andrew, because you would have to write code to load the records, delete all of them, and commit the changes, it's not exactly something that can happen "easily".
Forgetting a where clause in a LINQ statement is no easier or harder than forgetting a where clause in a stored procedure.
I'm using an Entity Framework project in which I need to batch update records by ID. The ID (i.e. the primary key of a particular table) is available at runtime, and I would like to update all records like in the following query:
UPDATE EntityTable
SET Column = @p0
WHERE EntityID IN '1,2,3,[...]'
The problem I'm running into is that I have around 60k IDs (at worst) that I will need to handle, and our database software (SQL Server 2008) can't handle this:
The query processor ran out of internal resources and could not
produce a query plan. This is a rare event and only expected for
extremely complex queries or queries that reference a very large
number of tables or partitions. Please simplify the query. If you
believe you have received this message in error, contact Customer
Support Services for more information.
Through Google searches I've found people using old-school DataTable and SqlDataAdapter calls to accomplish this, but I'd like to stay within the spirit of Entity Framework if possible, or raw sql if necessary. Is there any way to do this in a reasonably efficient manner?
EF doesn't directly support batch updates, so you have to use direct SQL. Your best choice is a stored procedure with a table-valued parameter containing your IDs. EF doesn't support table-valued parameters, and because of that you have to use ADO.NET directly.
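A minimal ADO.NET sketch of that approach follows. The table type dbo.IdList, the procedure dbo.UpdateEntitiesByIds, and its parameter names are all hypothetical and would have to be created in the database first; the C# class name is made up as well.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical helper. Assumed database objects (not shown in the thread):
//   CREATE TYPE dbo.IdList AS TABLE (Id int PRIMARY KEY);
//   a procedure dbo.UpdateEntitiesByIds(@Value ..., @Ids dbo.IdList READONLY)
//   that updates EntityTable WHERE EntityID IN (SELECT Id FROM @Ids).
public static class IdListParameter
{
    // Builds a one-column DataTable whose shape matches the assumed table type.
    public static DataTable ToDataTable(IEnumerable<int> ids)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        foreach (int id in ids)
        {
            table.Rows.Add(id);
        }
        return table;
    }

    // Sends all IDs to the server in a single call as a table-valued parameter.
    public static int UpdateByIds(string connectionString, IEnumerable<int> ids, string newValue)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.UpdateEntitiesByIds", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@Value", newValue);

            SqlParameter idsParameter = command.Parameters.Add("@Ids", SqlDbType.Structured);
            idsParameter.TypeName = "dbo.IdList";
            idsParameter.Value = ToDataTable(ids);

            connection.Open();
            return command.ExecuteNonQuery();
        }
    }
}
```

With this shape the 60k IDs travel as rows of one parameter, so the server joins against a small table instead of parsing a huge IN list.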
Your current solution can be improved only by dividing your IDs into smaller groups and executing your update for each subset separately.
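One way to sketch the batching, assuming the IDs are ints and using hypothetical helper names: split the list into fixed-size groups and build one parameterized UPDATE per group. SQL Server accepts at most 2100 parameters per command, so a batch size of around 2000 leaves room for the value parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class IdBatching
{
    // Splits the IDs into consecutive batches of at most batchSize items.
    // This is an iterator, so the argument check runs when enumeration starts.
    public static IEnumerable<List<int>> Split(IReadOnlyList<int> ids, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < ids.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, ids.Count - start);
            var batch = new List<int>(count);
            for (int i = 0; i < count; i++)
            {
                batch.Add(ids[start + i]);
            }
            yield return batch;
        }
    }

    // Builds the UPDATE text for one batch, with one parameter per ID (@id0, @id1, ...).
    // [Column] is bracketed because COLUMN is a reserved word in T-SQL.
    public static string BuildUpdateSql(int idCount)
    {
        IEnumerable<string> names = Enumerable.Range(0, idCount).Select(i => "@id" + i);
        return "UPDATE EntityTable SET [Column] = @p0 WHERE EntityID IN (" + string.Join(", ", names) + ")";
    }
}
```

Each batch is then executed as its own command, with @p0 bound to the new value and @id0 onward bound to the IDs of that batch.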