I'm using EF 4.3 Code First on SQL Server 2008. I run several test suites that delete and recreate the database with CreateIfNotExists. This works fine but is dog slow. It can take up to 15 seconds to create the database on the first call, and typically 3-6 seconds after that. I have several places where this is called. I've already optimized to call this as few times as I can. Is there something I can do to speed up database creation programmatically? I'm willing to go around EF to do this if that helps, but I would like to keep my database build in code and not go back to a SQL script. Thanks!
This works fine but is dog slow.
Yes. The point is to use the real database only for integration tests, which don't have to be executed so often; the whole set of integration tests is usually run only on the build server.
It can take up to 15 seconds to create the database on the first call
This is because of the slow initialization of EF when unit testing (you can try switching to x86). The time is also consumed by view generation. Views can be pre-generated, which is usually done to reduce the startup and initialization time of the real system, but for speeding up unit tests view pre-generation will not help much because you just move the time from the test run to the build.
I'm willing to go around EF to do this if that helps, but I would like to keep my database build in code and not go back to a SQL
Going around EF would just mean using a plain old SQL script. The additional time needed for this operation may be spent generating that SQL. I think the SQL is not cached because a normal application doesn't need it more than once, but you can ask EF to give you at least the most important part of that SQL, cache it somewhere, and execute it yourself every time you need it. EF is able to give you the SQL for tables and constraints:
var dbSql = ((IObjectContextAdapter) context).ObjectContext.CreateDatabaseScript();
You just need your own small SQL script to create the database itself and use the two together. Even something like the following should be enough:
CREATE DATABASE YourDatabaseName
USE YourDatabaseName
You must also turn off database generation in code first to make this work and to take control over the process:
Database.SetInitializer<YourContextType>(null);
When executing the database creation SQL you will need a separate connection string pointing to the master database.
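A rough sketch of how this could be wired up (assuming your context type has a constructor that accepts a connection string; the FastDatabaseBuilder helper, the drop/create handling, and YourContextType are illustrative, not part of EF):

using System.Data.Entity.Infrastructure;
using System.Data.SqlClient;

public static class FastDatabaseBuilder
{
    // CreateDatabaseScript() output (tables and constraints) is cached for the whole test run.
    private static string _ddl;

    public static void Recreate(string masterConnectionString, string dbConnectionString, string dbName)
    {
        // 1. Drop and recreate the empty database outside of EF (dbName is trusted test input).
        using (var master = new SqlConnection(masterConnectionString))
        {
            master.Open();
            var createSql = "IF DB_ID('" + dbName + "') IS NOT NULL DROP DATABASE [" + dbName + "]; " +
                            "CREATE DATABASE [" + dbName + "]";
            new SqlCommand(createSql, master).ExecuteNonQuery();
        }

        // 2. Generate the schema DDL once and cache it for subsequent calls.
        if (_ddl == null)
        {
            using (var context = new YourContextType(dbConnectionString))
            {
                _ddl = ((IObjectContextAdapter)context).ObjectContext.CreateDatabaseScript();
            }
        }

        // 3. Run the cached DDL against the freshly created, empty database.
        using (var db = new SqlConnection(dbConnectionString))
        {
            db.Open();
            new SqlCommand(_ddl, db).ExecuteNonQuery();
        }
    }
}

Remember to set Database.SetInitializer<YourContextType>(null) as shown above so EF doesn't try to create the database itself.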
Short Question
Is there something, other than a DB, against which you can run SQL commands that have JOINs and WHEREs?
Long Question
I am putting unit tests into a brownfield WinForms app.
I have complete freedom of choice over which unit test framework I use.
The problem I have is that there are masses of SQL string statements in the code.
Think something like this
SELECT *
FROM Sale
INNER JOIN SaleItem ON Sale.ID = SaleItem.SaleID
WHERE ID = 5
It is parameterized, and has IF statements to build up the WHERE clause, so it might be WHERE CustomerId = 5, or DispatchDate was in the last year.
The query is a lot bigger than this, and I kinda want to check that all the joins work and all the possible WHEREs work. Do you think this might be me looking at the detail too much?
I don't want to have to manage a database of data which, if the data changes, will break tests; I'm scared the tests will rot and people will just kill them off.
I want to run this SQL against some object or thing that is NOT a DB and get an item back.
It has to be smart enough to actually filter, so if the Sale data looked like the following table, it would only return the row with ID 5.
ID   DateDispatched   CustomerID
1    1/1/1            5
2    2/2/2            6
5    3/3/3            7
I have thought of running SQL commands on DataSets and XML, and realised that won't work.
I guess LINQ has spoiled me over the last few years, because I can't work out how to do this. And I'm afraid there is so much logic building up massive SQL statements that I have to put some tests around it.
I would be more than willing to hear about other options, like moving the SQL to stored procedures in the DB, if you can recommend a good unit testing framework for that.
Now, I don't like SQL being built in the app and would love to change it to Entity Framework, but it's a 10-year-old application and that's just not an option.
Okay some quick edits
The database is on SQL Server 2012, so stored procedures are an option; in some places they already use stored procedures.
Let me try to understand your problem.
You have got a WinForms application and you are writing unit tests for it. But if you run the tests, you are afraid they are going to hit the DB and spoil the data. So you want some mechanism which allows you to run your unit tests without hitting the actual database. Correct?
If I got your problem right, I suggest separating out your db interaction logic and making it interface driven. Then you can create mock objects in which you define the expectations for your db interaction interfaces: for example, if some GetSales() method is called, what should be returned from it, and so on. Sharing some links with details about unit testing and mocking.
https://msdn.microsoft.com/en-us/library/ff650441.aspx
https://github.com/Moq/moq4
http://www.developerhandbook.com/unit-testing/writing-unit-tests-with-nunit-and-moq/
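For example, a minimal sketch of the interface-plus-mock approach (ISaleRepository, Sale, and GetSales are hypothetical names; the mock uses Moq and the test uses NUnit):

using System.Collections.Generic;
using System.Linq;
using Moq;
using NUnit.Framework;

public class Sale
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
}

// All SQL lives behind this interface; production code implements it with ADO.NET.
public interface ISaleRepository
{
    IList<Sale> GetSales(int customerId);
}

[TestFixture]
public class SaleServiceTests
{
    [Test]
    public void Returns_only_sales_for_requested_customer()
    {
        // The mock stands in for the real, SQL-backed repository.
        var repository = new Mock<ISaleRepository>();
        repository.Setup(r => r.GetSales(5))
                  .Returns(new List<Sale> { new Sale { Id = 1, CustomerId = 5 } });

        var sales = repository.Object.GetSales(5);

        Assert.AreEqual(1, sales.Count);
        Assert.IsTrue(sales.All(s => s.CustomerId == 5));
    }
}

Note that this tests the code that consumes the query results; it does not exercise the SQL itself, which is the trade-off of mocking the data access layer.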
A SQLite database is designed for exactly this kind of requirement.
The database is a simple file. The database driver reads from and writes to this file. You can run all the SQL queries and so on that you are used to against the database. (However, you might have problems if your SQL uses language or syntax specific to SQL Server.)
It is true that you have to manage a 'database' full of data. But you should be able to:
Write a script which quickly sets up all the tables. You might already have a piece of code which creates a database with the same schema, or you might be able to dump one automatically.
Keep some test data around in CSVs, SQL files, or similar. This isn't easy, but it is very, very useful. You should add the minimum and only build it up as the testing demands.
Check the whole SQLite database file into source control if you like.
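A minimal sketch of that setup, assuming the System.Data.SQLite provider (the file path and the Sale/SaleItem schema, taken from the question, are placeholders):

using System.Data.SQLite;   // System.Data.SQLite NuGet package

public static class TestDatabase
{
    public static SQLiteConnection Create(string path = "test.db")
    {
        var connection = new SQLiteConnection("Data Source=" + path + ";Version=3;");
        connection.Open();

        // Set up the schema once; keep the resulting file (or this script) in source control.
        using (var cmd = connection.CreateCommand())
        {
            cmd.CommandText =
                @"CREATE TABLE IF NOT EXISTS Sale (ID INTEGER PRIMARY KEY, DateDispatched TEXT, CustomerID INTEGER);
                  CREATE TABLE IF NOT EXISTS SaleItem (ID INTEGER PRIMARY KEY, SaleID INTEGER REFERENCES Sale(ID));";
            cmd.ExecuteNonQuery();
        }
        return connection;
    }
}

You can then run the application's query strings against this connection, bearing in mind the dialect caveat above.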
Thanks for the question - I have been thinking a lot about testable database code recently, and I hadn't figured out a solution to this type of problem until your question made me realize I already knew it.
In my opinion, there are three approaches to make this kind of testing as painless as possible in the long-run:
Use an ORM or another wrapper layer, such as Entity Framework as you mentioned. This means that when testing you don't need a 'real database' at all - just a test double of your wrapper.
Only use standard, portable SQL such as JOIN, SELECT, etc., with nothing that can't be run on a SQLite database. This can be very restrictive, as types vary so much between DBMSes.
Use SPs exclusively as an interface to your database. This means that your test double only has to recognize which SP is being called, and respond correctly to that. I personally don't like this approach as I think lots of untested, unversioned business logic ends up in the SPs.
Here is an option that we use for our unit testing with MSTest.
Use the TransactionScope from the System.Transactions namespace
In your TestInitialize method create an instance of the transaction, and in the TestCleanup method dispose of it. You can do the insert into the db in the TestInitialize method or in the individual test methods.
private TransactionScope _trans;

[TestInitialize]
public void TestInit()
{
    // Everything the test does runs inside this ambient transaction.
    _trans = new TransactionScope();
}

[TestCleanup]
public void TestClean()
{
    // Complete() is never called, so disposing rolls back any data the test inserted.
    _trans.Dispose();
}

[TestMethod]
public void TestQuery()
{
    // arrange: insert test data here (or in TestInitialize)

    // act
    Obj target = Obj.New();

    // assert
    Assert.AreEqual("someValue", target.SomeProperty);
}
Okay, I took the old adage of "if you can't change your employer, change your employer."
I think tSQLt http://tsqlt.org/ would have been the best fit for this exact problem.
As Entity Framework would not be allowed and they had so many dependent tables, tSQLt would have let me move all the logic into stored procedures and mock the tables. It's kind of a mix of Yogi's and JWG's answers.
We have a lot of data that needs to be loaded into a number of tables. As far as I can see we have two options:
Include the data as part of the Configuration class seed method
Problems
1.a. This would be slow and involve writing a lot of C# code.
Use bulk insert with code first migrations - a lot quicker and probably a better solution.
Problems
2.a. It may prove tricky working with other data that gets loaded into the same tables as part of the seed.
2.b. It requires SQL Identity Insert to be switched on.
What solution is best and if it is 2 how do I go about bulk insert with code first migrations and how can I address the problems?
Bypassing EF and using ADO.NET/SQL is definitely a good approach for bulk data upload. The best approach depends on whether you want the data to be inserted as part of migration or just logic that runs on app start.
If you want it to be inserted as part of a migration (which may be nice since then you don't have to worry about defensive checks if the data exists etc.) then you can use the Sql(string) method to execute sql that uses whatever format and sql features you want (including switching IDENTITY_INSERT on/off). In EF6.1 there is also an overload that allows you to easily run a .sql file rather than having everything in code as a string.
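For illustration, a rough sketch of the migration approach (the table, column names, and values are placeholders; Sql() is the standard DbMigration method, and the SqlFile overload mentioned above requires EF 6.1 or later):

using System.Data.Entity.Migrations;

public partial class SeedReferenceData : DbMigration
{
    public override void Up()
    {
        // Inline SQL gives full control, including IDENTITY_INSERT.
        Sql(@"SET IDENTITY_INSERT dbo.Country ON;
              INSERT INTO dbo.Country (Id, Name) VALUES (1, 'France'), (2, 'Germany');
              SET IDENTITY_INSERT dbo.Country OFF;");

        // EF 6.1+: run a .sql file shipped with the project instead of an inline string.
        // SqlFile("Migrations\\SeedCountries.sql");
    }

    public override void Down()
    {
        Sql("DELETE FROM dbo.Country WHERE Id IN (1, 2);");
    }
}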
If you want to do it on app start, then just create an instance of your context and then access Database.Connection to get the raw SqlConnection and use ADO.NET directly to insert the data.
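And a sketch of the app-start variant, here using SqlBulkCopy for speed (the table name, DataTable contents, and loader class are placeholders):

using System.Data;
using System.Data.Entity;
using System.Data.SqlClient;

public static class StartupDataLoader
{
    public static void Load(DbContext context, DataTable rows)
    {
        // DataTable column names must match the destination table's columns.
        var connection = (SqlConnection)context.Database.Connection;
        connection.Open();
        try
        {
            // KeepIdentity preserves the Id values from the source data.
            using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, null))
            {
                bulk.DestinationTableName = "dbo.Country";
                bulk.WriteToServer(rows);
            }
        }
        finally
        {
            connection.Close();
        }
    }
}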
According to this blog post most companies using EF Migrations are supposedly not updating the database schema of production databases with EF migrations. Instead the blog post's author recommends to use Schema update scripts as part of the deployment process.
I've used Schema update scripts for a few years now and while they work, I was planning to use EF migrations instead in the future for the following reasons:
Faster deployment, less downtime
A simpler deployment procedure
Much easier migration of existing data than would be possible with T-SQL
A more comprehensible syntax of the changes waiting to be applied (DbMigration class with clean C# syntax vs. clunky T-SQL Migration script in a traditional environment).
There is an easy and fast downgrade path to the old db schema if the deployment of the new software version should fail
One reason I can think of that would prohibit the use of EF to migrate a production DB would be if the DB schema was only altered by the DBAs as opposed to the Developers. However, I am both DBA and Developer, so this does not matter in my case.
So, what are the risks of updating a production database using EF?
Edit: I would like to add that, as solomon8718 already suggested, I am always pulling a fresh copy of the production database to my staging server and test the EF Migrations to be applied on the staging server before applying them to a production server. IMO this is essential for any schema update to a production system, whether I'm using EF migrations or not.
Well, I'll try and answer anyhow. I would say No, there's no reason not to use Code First Migrations in production. After all, what's the point of this easy to use system if you can't take it all the way?
The biggest problems I see with it are all problems that you can have with any system, which you've noted already. As long as the whole team (DBA included if applicable) is on board with it, I think allowing EF to manage the schema through migrations is less complex, and hence less error-prone than traditional script-based management. I would still take a backup before performing a migration on a production system, but then you'd do that anyhow.
There's nothing that says a DBA can't perform a migration from Visual Studio, either. The access could still be locked down with privileges at the database level, and he/she could review the migration (in a helpful SQL export format using -Script, if desired) before performing the actual operation. Then they're still in control, but you can use code-first migrations. Hell, they might even end up liking it!
Update: since SPROCs and TVFs were brought up, we handle those in migrations as well, although they are actually done with straight-up SQL statements using a DbMigration.Sql() call in the Up(), and the reverse of them in the Down() (You can also use CreateStoredProcedure and DropStoredProcedure for simple SPROCs, but I think you still have to define the body itself in SQL). I guess you could say that's a caveat; there isn't yet a way for an entire, comprehensive database to be written purely in C#. However, you can use migrations which include SQL scripts to manage the entire schema. One benefit we've found from this process is you can use the C# config file for schema object names (different server names for production vs dev for example) with a simple String.Format, combined with XML Transformation for the config files themselves.
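For illustration, a sketch of that pattern (the procedure name and body are placeholders):

using System.Data.Entity.Migrations;

public partial class AddGetSalesByCustomerProc : DbMigration
{
    public override void Up()
    {
        // Straight-up SQL keeps the full T-SQL body versioned alongside the migration.
        Sql(@"CREATE PROCEDURE dbo.GetSalesByCustomer @CustomerId INT AS
              BEGIN
                  SELECT * FROM dbo.Sale WHERE CustomerId = @CustomerId;
              END");
    }

    public override void Down()
    {
        Sql("DROP PROCEDURE dbo.GetSalesByCustomer;");
    }
}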
Yes there are good reasons not to use an automated system such as Code First Migrations to make production database changes. But as always there are exceptions to the rules.
One reason which has been mentioned would be access permissions, which would be directly related to your organization's change management rules and security policies.
Another reason would be your level of trust in the Migrations tool itself. Are we sure the tool doesn't have a bug in it? What happens if the tool fails midway through? Are you certain you have up-to-date backups and a process to roll-back if need be?
The change scripts may execute unexpected or inefficient SQL. I've experienced cases where the generated SQL copied the data into a temp table, dropped the original table, and then recreated it, for operations like adding a new column when you accidentally (or purposefully) change the order in which the columns appear, or when you rename the table. If millions of records are involved this could cause serious performance issues.
My recommendation:
Assuming you have a Staging database that mirrors your production schema, use the Migrations tool to generate its change scripts against that system. We usually restore our stage database from a fresh production copy before running. We then examine the change scripts manually to check for issues. After that we run the scripts against our stage database to make sure it executes properly and that all the changes expected took place. Now we are sure that the scripts are both safe to run in production and perform the expected changes. This process would address all three issues I listed above.
One other caveat I found: If you have several websites using the same data context, you need to make sure that all of them are updated at the same time. Otherwise there might be a constant database update / downgrade fight between the websites. Other than that, it worked fine for me.
EDIT: My own perspective one year after starting to use EF Migrations in production:
EF Migrations is actually pretty cool, even for production use, provided that you
Test the migrations on a staging system. I test all migrations by migrating all the way down and up again on my CI server before running integration tests.
Do not trigger migrations automatically, but with a batch file that is launched by an admin. This is essentially the same as running the sql for a migration manually in SSMS.
I use it in production for a couple of projects. Once you get the hang of it I think it's fine.
During development you can keep automatic migrations on, but at the end you can connect to the live db right from the Package Manager Console and generate a migration. It will give you one migration for all the changes.
But always always always use the -script option with update-database and fire the SQL yourself.
I would also advise not using the update-database option from Web Deploy. With that, there is no way to tell how much of the migration has already been run when an error occurs. I've run into trouble with that a few times, so it's best to get the SQL and run it manually.
I am using NHibernate for ORM, and everything works fine.
Now I have started to write some unit tests (using the DB; I don't want to put too much effort into abstracting this away, I know it's not perfect, but it works...).
I need to be sure that the DB is completely empty for some tests. I can, of course, create the whole DB, but that seems to be overkill and I think it takes longer...
Is there a DELETE_ALL command which clears all tables, I can use in NHibernate?
Chris
EDIT: A short update. I decided to go the SQLite way; it's no problem to switch to this with NHibernate. There are some pitfalls: I am using the config below, and it works. Otherwise you might get "table not found" errors, due to NHibernate closing the connection while in session, resulting in a "lost" database...
For your convenience: copy and paste...
.Database(SQLiteConfiguration.Standard.ConnectionString("Data Source=:memory:;Version=3;New=True;Pooling=True;Max Pool Size=1;")
.Raw("connection.release_mode", "on_close"))
.Mappings(obj => obj.AutoMappings.Add(_config.APModel));
Drop and recreate the database. You can use schemaexport:
var export = new SchemaExport(config);
export.Drop(false, true);
export.Create(true, true);
SQLite in memory runs faster for tests than a "normal" database, but the disadvantage is that the SQLite dialect can differ from the SQL dialect in production.
I would recommend you check out NDbUnit. It's not NHibernate-specific, but I've used it for testing NHibernate projects in the past and it works well. Basically, it provides functions to clear the database, prefill it with test data, or restore it to known states after each test. You just have to provide an XSD of the database schema, and optionally some XML data for pre-filling.
I believe I first saw this in the Summer of NHibernate screen-cast series, so check those out to see it in use.
You're not writing unit tests (i.e. tests that test one unit), you're writing integration tests where units interact (i.e. with your database).
Part of your test infrastructure could run a sql script that does one of the following:
Drop db and recreate.
Truncate all tables (a small helper sketch follows).
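For the truncate option, a minimal helper along these lines could be part of that infrastructure (the table list, connection string, and class name are placeholders; delete order matters because of foreign keys):

using System.Data.SqlClient;

public static class DatabaseCleaner
{
    // Delete in child-to-parent order so foreign key constraints are not violated.
    private static readonly string[] Tables = { "SaleItem", "Sale", "Customer" };

    public static void ClearAll(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            foreach (var table in Tables)
            {
                // DELETE rather than TRUNCATE so tables referenced by foreign keys can also be cleared.
                new SqlCommand("DELETE FROM " + table, connection).ExecuteNonQuery();
            }
        }
    }
}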
Ideally, you do want to put a bit of work in abstracting the db away, especially since you have NH which makes it much easier than some other frameworks.
Use an in memory database like SQLite, and setup the necessary data in it before each test. The initial setup takes a bit of time, but each test runs very fast afterwards and you can make sure that you start off with a clean slate. Ayende has a few blog posts about how to set it up.
Just in case the drop/create DB approach does not suit your needs (for example if the db contains objects that NHibernate is not aware of, like SPs, functions etc.), you could always make a backup point with the DB empty and, after you're done testing, just restore to that point.
I have an existing SQL Server database whose structure I can't really change, although I can add stored procedures or new tables if I want. I have to write a stand-alone program to access the DB, process the data and produce some reports. I've chosen C# and Visual Studio as we're pretty much an MS shop.
I've made a start at exploring using VS 2008 to create said program. I'm trying to decide where to put some of the SQL logic. My primary aims are to keep the development as simple as possible and to perform quickly.
Should I put the SQL logic into a stored procedure and simply call the stored procedure and have SQL Server do the grunt work and hand me the results? Or am I better off keeping the SQL query in my code, creating the corresponding command and executing it against the SQL Server?
I have a feeling the former might perform better, but I've then got to manage the stored procedure separately to the rest of my code base, don't I?
UPDATE: It's been pointed out the performance should be the same if it's the same SQL code in a C# program or a stored procedure. If this is the case, which is the easiest to maintain?
2009-10-02: I had to really think about which answer to select. At the time of writing, there were 8 answers, basically split 5-3 in favour of putting the SQL logic in the application. On the other hand, there were 11 up-votes, split 9-2 in favour of putting the SQL logic in stored procedures (along with a couple of warnings about going this way). So I'm torn. In the end I'm going with the up-votes. However, if I run into trouble I'm going to come back and change my selected answer :)
If it involves heavy data manipulation, keep it in the db in stored procedures. If the queries might change somewhat, the better place would be the db too; otherwise a redeploy might be required for each change.
Keeping the mainstay of the work in stored procedures has the advantage of flexibility - I find it easier to modify a procedure than implement a program change. Unfortunately flexibility is a double-edged sword; it's much easier to make an ill-advised change as well.
I suggest taking a look at LINQ to Entities, which provides an Object Relational Mapping wrapper around any SQL statements (CRUD), abstracting away the logic needed to write to the database, and allowing you to write OO code instead of using SQLConnections and SQLCommands.
OO code (the save method does not exist but you get the gist of it):
// this adds a new car to the Car table in SQL, without using ANY SQL code
Car car = new Car();
car.BrandName = "Audi";
car.Save(); // Save is called something else and lives on the
            // data context the car is in, but for brevity's sake..
SQL code as string in SqlCommand:
// open a sql connection in your app and
// create a command that inserts the car
SqlConnection conn = new SqlConnection(connstring);
SqlCommand comm = new SqlCommand("INSERT INTO CAR...", conn);
// open the connection and execute
conn.Open();
comm.ExecuteNonQuery();
Versioning and maintaining stored procedures is a nightmare. If you don't hit serious performance issues (that you think would be resolved by using stored procedures), I think it will be better to implement the logic in your C# code (LINQ, SubSonic, or anything like that).
With regard to your point concerning performance variation between embedding your code in .NET source or within SQL Server stored procedures, you should actually see no difference between the two methods!
This is because the same execution plan will be generated by SQL server, provided the data access T-SQL within the two different sources is the same.
You can see this in action by running a SQL Server Profiler trace and comparing the execution plans that are generated by the two different T-SQL query sources.
In light of this, and back to the main point of your question, your choice of implementation should be determined by ease of development and your future extensibility requirements. As you appear to be the sole individual who will be working on the project, go with what you prefer, which I suspect is to keep the code centralised, i.e. within a Visual Studio data access layer (DAL).
Stored Procedures can come into their own however when you have separate development functions within your organisation/team. For example, you may have database developers on your team who can create your data access code for you and do so independently of the application, freeing you to work on other code modules.
Update deployment: if you need to update the procedure, you can update a stored procedure without your users ever knowing, without taking the server offline. Updating the C# means pushing out a new EXE to all your users!
Have a look at Entity Spaces. It's a code generation tool - but it'll do more.
There's a small amount of leg work to do in learning the tool, but once you're up and running you'll never look back. Saves hours of work. (I don't work for them BTW!)
Should I put the SQL logic into a stored procedure
Well that depends on what the “SQL logic” is, doesn't it? If it's purely database-related, a stored procedure might be most appropriate. If it's ‘business logic’, the rules that decide how your application operates, it definitely belongs in your application.
which is the easiest to maintain?
Personally I find application-side code easier as modern languages like C# have much more expressive power than SQL. Doing any significant processing in T-SQL quickly becomes tedious and difficult to read.