How to increase database insert performance with Entity Framework - C#

I have my first project with Entity Framework and SQL Server Compact.
The database has about 15 tables which all have foreign keys to other tables.
I have to read thousands of XML files and import their data into the database. The database structure mirrors the XML file structure, with a table hierarchy up to 5 levels deep. So for each record in the "top" table I have to insert one or more records into the underlying tables.
I am using Entity Framework for the inserts and it works, but the performance is very poor :(.
I think the main problem is that for most records the generated ID has to be read back so it can be used for records in the underlying tables.
The other thing is that, if I understand correctly, Entity Framework inserts each record with a separate command.
Is there a way to increase the performance dramatically?
Thank you

Use the SQL Compact Bulk Insert Library to insert the data in bulk.
If you need to update any records, then use this technique:
Create a staging table in your database.
Use the library to bulk insert into the staging table.
Then run an UPDATE that reads from the staging table and updates the target table's rows (SQL Server Compact does not support stored procedures, so this is a plain SQL statement). A sketch of the bulk-insert step follows.
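A minimal sketch of the bulk-insert step, assuming ErikEJ's SqlCeBulkCopy class from that library and a staging table named Records_Staging (table and column names are placeholders for your own schema):

using System.Data;
using ErikEJ.SqlCe; // the SQL Compact Bulk Insert Library

// Build a DataTable whose columns match the staging table
var staging = new DataTable();
staging.Columns.Add("Id", typeof(int));
staging.Columns.Add("Name", typeof(string));
// ... fill staging.Rows from the parsed XML files ...

// One bulk insert instead of thousands of single-row INSERTs
using (var bulkCopy = new SqlCeBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "Records_Staging";
    bulkCopy.WriteToServer(staging);
}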

First, make sure to use AddRange (or an alternative solution) so you don't get poor performance from the DetectChanges method being called on every Add.
See: http://entityframework.net/improve-ef-add-performance
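For example, a minimal sketch of the difference, assuming a Customers DbSet (entity and set names are placeholders):

// Slow: each Add() can trigger DetectChanges over the entire change tracker,
// so the cost grows with the number of already-tracked entities
foreach (var customer in customers)
    context.Customers.Add(customer);

// Faster: AddRange runs DetectChanges only once for the whole batch
context.Customers.AddRange(customers);
context.SaveChanges();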
Disclaimer: I'm the owner of Entity Framework Extensions
This library supports all major providers, including SQL Server Compact.
By default, BulkInsert automatically retrieves the IDs of the inserted rows.
This library lets you perform all the bulk operations you need for your scenario:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
Example
// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform bulk operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize the primary key
context.BulkMerge(customers, operation => {
    operation.ColumnPrimaryKeyExpression = customer => customer.Code;
});

Related

Can Entity Framework generate an UPDATE ... WHERE statement in SQL?

Building a data access layer for an internal enterprise MVC application with Entity Framework and the repository pattern. In some cases, I need to update, say, 100k rows in a table with 100 million or so rows in a SQL Server 2008 R2 database.
I am using EF6, targeting an existing legacy database we are refactoring. I am new to Entity Framework, but have much experience with SQL Server.
No matter how I structure my update, when I run a profiler I see 60k individual updates by ID, which can take up to 5 minutes to run. However, I do have an indexed "batch number" shared by a bunch of these records. Is there any way to UPDATE these records with a single WHERE clause generated from EF? My workaround so far has been to write a simple stored procedure that I then have EF call.
Entity Framework in its current version does not support bulk operations in the sense that you are wanting it to. Each update or delete is sent as an individual statement per record. To avoid this, you can run SQL directly from Entity Framework if you do not want to go through the hassle of creating a stored procedure (though they are pretty easy to get up and running):
using (var context = new MyDbContext())
{
    // {0}/{1} placeholders are sent as real DbParameters, not concatenated text
    context.Database.ExecuteSqlCommand(
        "UPDATE MyTable SET Value = {0} WHERE SomeColumn = {1}",
        updateValue, queryValue);

    context.Database.ExecuteSqlCommand(
        "DELETE FROM MyTable WHERE Value = {0}",
        myValueToQueryBy);
}
Note the parameter syntax, which looks very similar to string.Format() but passes the values as real query parameters. Please see MSDN for more information.

SqlBulkCopy to call a stored procedure to insert or update data in a SQL DB

I have a list of records which I need to insert or update in a SQL database, depending on whether each record is already present.
The current flow is that I process the records one by one and call a stored procedure from my C# code, which does the insert or update.
This process is very inefficient. Can I use SqlBulkCopy to insert these into the SQL DB in one go?
Will that improve performance?
Regards
Ankur
SqlBulkCopy can only insert. If you need to upsert, you might want to SqlBulkCopy into a staging table (a separate table off to one side that isn't part of the main model), and then do the merge in TSQL. You might also want to think about concurrency (how many people can be using the staging table at once, etc).
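A minimal sketch of that approach, assuming a staging table dbo.Records_Staging mirroring a target table dbo.Records with Id and Name columns (all names are placeholders):

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Bulk insert the incoming rows into the staging table
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.Records_Staging";
        bulk.WriteToServer(recordsTable); // a DataTable holding the incoming records
    }

    // 2. Upsert from staging into the target table in a single statement
    const string mergeSql = @"
        MERGE dbo.Records AS target
        USING dbo.Records_Staging AS source ON target.Id = source.Id
        WHEN MATCHED THEN UPDATE SET target.Name = source.Name
        WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (source.Id, source.Name);
        TRUNCATE TABLE dbo.Records_Staging;";
    using (var cmd = new SqlCommand(mergeSql, conn))
        cmd.ExecuteNonQuery();
}

If several clients can load the staging table concurrently, consider a batch ID column or a session-scoped staging table so one client's merge doesn't pick up another client's rows.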

Seeding large lookup table data with Entity Framework Code First migrations

I'm about to start a new project and I'd like to use Entity Framework Code First migrations; i.e., write the database in code and have it all auto-generated for me and the schema updated etc.
However, my stumbling block is that I have one lookup table which I need to import, and it has over 2 million records (it's a postcode lookup table).
My question is, how do you deal with such large pre-populated lookup tables within Entity Framework Code First migrations?
Your migration doesn't actually have to drop/recreate the whole table (and won't unless you specify that it should). Normally, the migrations just do the Up/Down methods to alter the table with additional columns, etc.
Do you actually need to drop the table? If so, do you really need to seed it from EF? The EF cost for doing 2 million inserts will be astounding, so if you could do it as a manual step with something more efficient (that will use bulk inserts), it would be very much preferred.
If I had to do that many inserts, I would probably break it out into SQL files and do something like what's mentioned here: EF 5 Code First Migration Bulk SQL Data Seeding
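For illustration, a minimal sketch of such a migration, assuming the postcode data has been exported to a CSV file readable by the SQL Server instance (the table name and file path are placeholders):

using System.Data.Entity.Migrations;

public partial class SeedPostcodes : DbMigration
{
    public override void Up()
    {
        // Server-side BULK INSERT: far faster than 2 million EF Adds
        Sql(@"BULK INSERT dbo.Postcodes
              FROM 'C:\Seed\postcodes.csv'
              WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')",
            suppressTransaction: true);
    }

    public override void Down()
    {
        Sql("TRUNCATE TABLE dbo.Postcodes");
    }
}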

can I use Entity Framework when the DBMS-type is not known at build time?

I'm developing an application (a CRUD web service) that connects to a single database that can be either Oracle or SQL Server, and the type is read from a configuration file. Can I use EF to implement the DAL? Some requirements:
the tables do not have primary keys and I can't add them
C# / .NET 4.0
all operations are always done and committed directly to the database
all operations are simple CRUD and nothing is cached for reads
transaction support (one request can deal with multiple tables) -> commit / rollback
the model has to be built for only 10 tables while the DB has hundreds of tables
if the DB is Oracle, the tables are divided into tablespaces, and not all of these 10 tables are in the same tablespace
If I can do this, a link or an example would be nice.
I asked this question another way earlier but instantly got 2 close votes, so here's my second try.

Entity Framework Batch Update by ID

I'm using an Entity Framework project in which I need to batch update records by ID. The ID (i.e. the primary key of a particular table) is available at runtime, and I would like to update all records like in the following query:
UPDATE EntityTable
SET Column = @p0
WHERE EntityID IN (1, 2, 3, [...])
The problem I'm running into is that I have around 60k IDs (at worst) that I will need to handle, and our database software (SQL Server 2008) can't handle this:
The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
Through Google searches I've found people using old-school DataTable and SqlDataAdapter calls to accomplish this, but I'd like to stay within the spirit of Entity Framework if possible, or use raw SQL if necessary. Is there any way to do this in a reasonably efficient manner?
EF doesn't directly support batch updates, so you have to use direct SQL. Your best choice is a stored procedure with a table-valued parameter containing your IDs. EF doesn't support table-valued parameters, and because of that you have to use ADO.NET directly.
Your current solution can otherwise be improved only by dividing your IDs into smaller groups and executing the update for each subset separately.
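For illustration, a minimal sketch of the TVP approach via ADO.NET, assuming a user-defined table type dbo.IdList (CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY)) and a stored procedure dbo.UpdateEntitiesByIds that joins the IDs against EntityTable (both are placeholders you would create):

using System.Data;
using System.Data.SqlClient;

// Pack the 60k IDs into a DataTable shaped like dbo.IdList
var idTable = new DataTable();
idTable.Columns.Add("Id", typeof(int));
foreach (var id in entityIds)
    idTable.Rows.Add(id);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.UpdateEntitiesByIds", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@newValue", newValue);

    // Table-valued parameter: all IDs travel in one round trip
    var ids = cmd.Parameters.AddWithValue("@ids", idTable);
    ids.SqlDbType = SqlDbType.Structured;
    ids.TypeName = "dbo.IdList";

    conn.Open();
    cmd.ExecuteNonQuery();
}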
