Seeding large lookup table data with Entity Framework Code First migrations

I'm about to start a new project and I'd like to use Entity Framework Code First migrations, i.e., define the database in code and have the schema generated and updated automatically.
However, my stumbling block is one lookup table that I need to import, which has over 2 million records (it's a postcode lookup table).
My question is, how do you deal with such large pre-populated lookup tables within Entity Framework Code First migrations?

Your migration doesn't actually have to drop/recreate the whole table (and won't unless you specify that it should). Normally, the migrations just do the Up/Down methods to alter the table with additional columns, etc.
Do you actually need to drop the table? If so, do you really need to seed it from EF? The cost of doing 2 million inserts through EF will be very high, so if you can do it as a manual step with something more efficient (something that uses bulk inserts), that is much preferable.
If I had to do that many inserts, I would probably break it out into SQL files and do something like what's mentioned here: EF 5 Code First Migration Bulk SQL Data Seeding
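As an alternative to SQL files, the manual bulk-insert step suggested above could be sketched with ADO.NET's SqlBulkCopy. This is only a minimal sketch under stated assumptions: the table name dbo.Postcodes, its three columns, and the simple comma-separated input file are all hypothetical, and real postcode data would need proper CSV parsing and error handling.

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

public static class PostcodeImporter
{
    // Rows are sent to the server in chunks so the whole file is never held in memory.
    private const int ChunkSize = 10000;

    public static void Import(string csvPath, string connectionString)
    {
        // Hypothetical schema: adjust the column names and types to the real table.
        var buffer = new DataTable();
        buffer.Columns.Add("Postcode", typeof(string));
        buffer.Columns.Add("Latitude", typeof(double));
        buffer.Columns.Add("Longitude", typeof(double));

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Postcodes"; // hypothetical table name
            bulk.BulkCopyTimeout = 0;                    // 0 disables the timeout

            // Map by name so the column order in the database does not matter.
            foreach (DataColumn column in buffer.Columns)
            {
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }

            foreach (var line in File.ReadLines(csvPath))
            {
                // Assumes "postcode,latitude,longitude" with no quoting or header row.
                var parts = line.Split(',');
                buffer.Rows.Add(
                    parts[0],
                    double.Parse(parts[1], CultureInfo.InvariantCulture),
                    double.Parse(parts[2], CultureInfo.InvariantCulture));

                if (buffer.Rows.Count == ChunkSize)
                {
                    bulk.WriteToServer(buffer);
                    buffer.Clear();
                }
            }

            // Flush whatever is left after the last full chunk.
            if (buffer.Rows.Count > 0)
            {
                bulk.WriteToServer(buffer);
            }
        }
    }
}
```

Run as a one-off step after the migration has created the table, this keeps the 2 million rows out of the migration itself.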

Related

Migrate data from one table to another in EF core code first approach

I am using a code-first approach in EF Core, and I am in a situation where I want to move a column from one table to another.
My approach is to insert the data using raw SQL in the migration:
migrationBuilder.Sql("Insert query to new table");
and then drop the column from the first table:
migrationBuilder.DropColumn(name: "FirstName", table: "Customer");
Is there any better approach to migrate data from one table to another and drop the column from the first table?

I've used the same approach as what you're suggesting before and it has worked well for us. One of its benefits is that queries executed with migrationBuilder.Sql will be wrapped in the same transaction as the migration, so if the query fails or anything goes wrong then all migration changes are rolled back and you don't end up with a corrupted database.
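Put together, such a migration might look like the sketch below. The target table CustomerName, its CustomerId column, and the Id key on Customer are hypothetical names introduced for illustration, and the sketch assumes the target table was created by an earlier migration. A real migration would also carry the attributes that the tooling generates in the designer file.

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

// Hypothetical migration: table and column names other than
// Customer.FirstName are assumptions, not taken from the question.
public partial class MoveFirstNameOutOfCustomer : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // 1. Copy the data while the source column still exists.
        migrationBuilder.Sql(
            @"INSERT INTO CustomerName (CustomerId, FirstName)
              SELECT Id, FirstName FROM Customer;");

        // 2. Only then drop the column from the original table.
        migrationBuilder.DropColumn(name: "FirstName", table: "Customer");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // Reverse order: restore the column first, then copy the data back.
        migrationBuilder.AddColumn<string>(
            name: "FirstName",
            table: "Customer",
            nullable: true);

        // Assumes at most one CustomerName row per customer.
        migrationBuilder.Sql(
            @"UPDATE Customer
              SET FirstName = (SELECT n.FirstName
                               FROM CustomerName n
                               WHERE n.CustomerId = Customer.Id);");
    }
}
```

The order of operations matters: the copy has to run before DropColumn in Up, and after AddColumn in Down.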

Incremental ETL on code first many-to-many association table

I'm setting up a data warehouse (in SQL Server), and together with our engineers we have almost everything up and running. Our main application also uses SQL Server as its backend and aims to be code first, using Entity Framework. In most tables we added a column like updatedAt to allow incremental loading into our data warehouse, but there is a many-to-many association table created by Entity Framework which we cannot modify. The table consists of two GUID columns with a composite key, so it has nothing to iterate over, unlike an incrementing integer or a date. We are now trying to work out how to enable incremental loading on this table, but there is little information to be found.
After searching for a while I mostly came across posts explaining that it's not possible to manually add columns (such as updatedAt) to the association table, such as this one: Create code first, many to many, with additional fields in association table. The suggestion there is to split the table into two one-to-many tables. We would like to avoid this if possible.
Another potential option would be to turn on change data capture on the server, but that would potentially defeat the purpose of code first in the application.
Another thought was to add a column in the database itself, not in code, with a default value of the current datetime. But that might also be impossible or incompatible with Entity Framework, as well as defeating the code first principle.
Are we missing anything? Are there other solutions for this? The ideal solution would be a code first solution, or a solution in the ETL process without affecting the base application, without changing too much. Any suggestions are appreciated.

How to increase database insert performance with Entity Framework

I have my first project with Entity Framework and SQL Server Compact.
The database has about 15 tables which all have foreign keys to other tables.
I have to read thousands of XML files and import their data to the database. The database structure mirrors the XML file structure. There is a table hierarchy with up to 5 levels. So for each record in the "top" table I have to insert one or more in the underlying tables.
I am using Entity Framework for inserting and it works fine, but the performance is very very poor :(.
I think the main problem is that for most records the ID has to be read back to be used for records in underlying tables.
The other thing is that, as far as I know, Entity Framework inserts each record with a separate command.
Is there a way to increase the performance dramatically?
Thank you

Use the SQL Compact Bulk Insert Library to insert the data in bulk.
If you need to update any records, then use this technique:
Create a staging table in your database
Use the library to bulk insert into the staging table
Then execute a stored procedure to do an update by reading from the staging table and updating the target table's records.

First, make sure to use AddRange (or an alternative) so that performance doesn't suffer from repeated calls to the DetectChanges method.
See: http://entityframework.net/improve-ef-add-performance
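A minimal sketch of the AddRange advice, assuming Entity Framework 6 and a hypothetical Customer entity and ShopContext (neither is from the question). With Add, change detection runs on every call; AddRange runs it once for the whole collection.

```csharp
using System.Collections.Generic;
using System.Data.Entity;

// Hypothetical entity and context, for illustration only.
public class Customer
{
    public int Id { get; set; }
    public string Code { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

public static class CustomerImporter
{
    public static void Insert(IEnumerable<Customer> customers)
    {
        using (var context = new ShopContext())
        {
            // One AddRange call instead of calling Add in a loop,
            // so DetectChanges is not triggered once per entity.
            context.Customers.AddRange(customers);

            // SaveChanges still issues one INSERT per row; AddRange only
            // removes the change-tracking overhead, not the round trips.
            context.SaveChanges();
        }
    }
}
```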
Disclaimer: I'm the owner of Entity Framework Extensions
This library supports all major providers, including SQL Server Compact.
By default, BulkInsert automatically gets the IDs of the inserted rows.
This library allows you to perform all bulk operations you need for your scenarios:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
Example
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
// Customize Primary Key
context.BulkMerge(customers, operation => {
    operation.ColumnPrimaryKeyExpression =
        customer => customer.Code;
});

EF Core - Clear __EFMigrationsHistories table

How can I clear this table: __EFMigrationsHistories?
I don't want to delete my migration or anything like that; I explicitly want to clear this table from code.
Edit:
I'll try to explain a little why I want to do this. On every startup I want to run the same (and only) migration.
This migration loops through all my models and calls the onUpdateMethod of each, so every model can handle its own update.

If you want to clear the data directly in SQL, here is the query:
DELETE FROM [TableName]
If you want to clear the data from within your application, run a query through Entity Framework like below:
context.Database.ExecuteSqlCommand("TRUNCATE TABLE [TableName]");
The TRUNCATE TABLE statement is a fast, efficient method of deleting all rows in a table. TRUNCATE TABLE is similar to the DELETE statement without a WHERE clause. However, TRUNCATE TABLE is faster and uses fewer system and transaction log resources.
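Applied to the migrations history table, this might look like the sketch below. It assumes EF Core 3.0 or later (where ExecuteSqlRaw replaces ExecuteSqlCommand), SQL Server bracket quoting, and the default history table name, which in EF Core is __EFMigrationsHistory; if the table name has been customised, substitute it. The helper class name is hypothetical.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical helper, for illustration only.
public static class MigrationHistoryCleaner
{
    // Deletes every row from the history table and returns the number of rows removed.
    // Assumes the default table name; adjust if it was changed in the context options.
    public static int Clear(DbContext context)
    {
        return context.Database.ExecuteSqlRaw("DELETE FROM [__EFMigrationsHistory]");
    }
}
```

DELETE is used here rather than TRUNCATE TABLE because the history table is small and DELETE needs fewer permissions. Once the rows are gone, EF Core treats every migration as unapplied the next time migrations are run.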

entity framework with existing, non related database

I have an odd situation. I am working on a project with a very large existing database that has no relationships defined between its tables, although the tables do contain corresponding IDs. It's as if someone copied the database but never related the tables.
In Entity Framework, is there a way to go EF code first and create the relationships in code, but not apply those relationships in the database? I would like to go through and relate the database but the client doesn't want to pay to fix it.
Thanks!

In this instance, it seems you would be best to add relationships directly to your database (or to a duplicated database for testing/staging) and then just update your entities using your test connection and regression test your app.
