Background
I need to write about 120 integration tests in C# for a C#/SQL Server application. Before any test runs, the database will already exist; it is created by a set of scripts that take about 20 minutes to run. When a test runs, it performs CRUD operations against a handful of tables: for example, rows are added in 10-11 tables, updated in 15-16 tables, and deleted in 4-5 tables.
Problem
After every test is run, the database needs to be reset to its original state. How can I achieve that?
Bad Solution
After every test run, re-run the database creation scripts (20 minutes of running time). Since there will be around 120 tests, this comes to 40 hours, which is not an option. Secondly, another process has several connections open against this database, so the database cannot be dropped and re-created.
Good Solution?
I would like to know if there is another way of solving this problem. A further complication is that, for each of those tests, I don't even know which tables will be updated, so if I were to revert the database to its original state manually by writing queries, I would first have to go and check which tables were touched.
You should take a look at the possibilities SQL Server gives you for taking a snapshot of your database. Reverting to a snapshot is potentially a lot faster than restoring a backup or recreating the database.
Managing a test database
In a testing environment, it can be useful when repeatedly running a test protocol for the database to contain identical data at the start of each round of testing. Before running the first round, an application developer or tester can create a database snapshot on the test database. After each test run, the database can be quickly returned to its prior state by reverting the database snapshot.
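To make that concrete, a test fixture can create the snapshot once before the whole run and revert it after each test from C#. The sketch below is only illustrative and rests on assumptions that are not in the question: a database called AppDb whose logical data file is also named AppDb, a snapshot folder that already exists, and an account with permission to create and restore snapshots. Note that reverting needs exclusive access, so the other process's connections will be kicked for the duration of the revert.

using System.Data.SqlClient;

static class SnapshotHelper
{
    // Connect to master so the target database itself can be reverted.
    const string Master = "Server=.;Database=master;Integrated Security=true";

    // Run once, before the test run: capture the freshly scripted state.
    public static void CreateSnapshot()
    {
        // NAME must be the logical data file name of the source database.
        Execute(@"CREATE DATABASE AppDb_TestSnap
                  ON (NAME = AppDb, FILENAME = 'C:\Snapshots\AppDb_TestSnap.ss')
                  AS SNAPSHOT OF AppDb;");
    }

    // Run after each test: throw away that test's inserts, updates and deletes.
    public static void RevertToSnapshot()
    {
        Execute("ALTER DATABASE AppDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;");
        Execute("RESTORE DATABASE AppDb FROM DATABASE_SNAPSHOT = 'AppDb_TestSnap';");
        Execute("ALTER DATABASE AppDb SET MULTI_USER;");
    }

    static void Execute(string sql)
    {
        using (var conn = new SqlConnection(Master))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

Reverting a snapshot typically takes seconds rather than minutes, because only the pages changed since the snapshot was taken need to be copied back.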
Related
We started developing an application in C# recently and decided to try Entity Framework 6.1.3 with a code-first approach to handle persistence. The data is being stored in a SQL Server Express 2012 instance which is running on a local server. The application is still very small. Currently we are only persisting two POCOs and have two tables in the database: one with 5 fields (about 10 rows of data), and one with 4 fields (2 rows).
We are experiencing a 3 second delay when Entity Framework processes the first query with subsequent queries being extremely quick. After some searching we found that this is normal for Entity Framework so, although it seemed excessive for such a small database, we've been coping with it.
After doing some work today, none of which was specifically related to persistence, suddenly I found that the first query was only taking a quarter of a second to run. I couldn't see anything obvious in the changes I'd made so I uploaded them to source control and asked my colleague to download everything. When he builds and runs it on his computer, it still takes 3 seconds for the first query. I've compiled the application, tried it on two test computers and they experience the initial 3 second delay.
There seems to be no correlation between the problem and the computers/operating systems. My computer is running Windows 7 SP1 x64. My colleague's development computer is running Windows 10 x64. The other two test computers are running Windows 7 SP1 x86. They are all a similar specification (Core i5, 4GB/8GB RAM).
In my research of the delay I found that there are several things you can do to improve performance (pre-generating views etc) and I have done none of this. I haven't installed anything or made any changes to my system, although I suppose it's possible an update was installed in the background. Nothing has changed on the database server or in the database itself or in the POCOs. We are all connecting to the same database on the same server.
This raises a few obviously related questions. If it's possible for it to start up in a quarter of a second, why has it been taking 3 seconds up until now? What happened on my computer to suddenly improve performance and how can I replicate this on the other computers that are still slow?
Can anyone offer any advice please?
EDIT
I turned on logging in Entity Framework to see what queries were being run and how long they were taking. Before the first (and, for testing purposes, only) query is run, EF runs 3 migration-related queries. The query generated to retrieve my data then follows:
SELECT
[Extent1].[AccountStatusId] AS [AccountStatusId],
[Extent1].[Name] AS [Name],
[Extent1].[Description] AS [Description],
[Extent1].[SortOrder] AS [SortOrder]
FROM [dbo].[AccountStatus] AS [Extent1]
-- Executing at 28/01/2016 20:55:16 +00:00
-- Completed in 0 ms with result: SqlDataReader
As you can see it runs really quickly which is hardly surprising considering there are only 2 records in that table. The 3 migration queries and my query take no longer than 5ms to run in total on both my computer and my colleague's computer.
Copying that query into SSMS and running it from various other machines produces the same result. It's so fast it doesn't register a measurable time. It certainly doesn't look like the query causes the delay.
EDIT 2: Screenshots of diagnostic tools
In order to give a good comparison I've altered the code so that the query runs at application start. I've added a red arrow to indicate the point at which the form appears. I hadn't noticed before but when my colleague runs the application the first time after starting Visual Studio, it's about a second quicker. All subsequent times are slower.
1) Colleague's computer - first run after loading Visual Studio
2) Colleague's computer - all subsequent runs
3) My computer - all runs
So every time my colleague runs the application (apart from the first time) there is a second's pause in addition to the usual delay. The first run immediately after starting Visual Studio seems to eliminate this second's pause, but it's still nowhere near the speed on my computer.
Incidentally, there is a normal delay of around a quarter of a second caused by the application starting. If I change the application to require a button click for the first query, the second's pause and usual 2 second delay happen only after the button is clicked.
Another thing of note is the amount of memory the application uses. Most of the time on my computer it will use around 40MB, but on the other computer it never seems to use more than 35MB. Could there be some kind of memory optimisation going on that is slowing things down for the other computer? Maybe my computer is loading some additional/cached information into memory that the others are having to generate. If this is possible, any thoughts on where I might look for this?
EDIT 3
I've been holding off making changes to the model and database because I was worried the delay would come back and I'd not have anything to test against. Just wanted to add that after exhausting all other possibilities, I've tried modifying a POCO and the database and it's still quick on my computer but slow on others.
I've altered the title of this question to more accurately reflect the problem I'm trying to solve.
Query plans in SQL Server can change over time. It may be that your machine has cached a good query plan while your co-worker's machine has not. In other words, it may have nothing to do with EF. You could potentially confirm or deny this theory by running the same query by hand in Management Studio.
In order to tackle performance problems related to EF-generated queries, I advise you to use Profiler or an alternative (as Express editions do not have it) to see how much of your time is actually consumed running the query.
If most of your time is used for running the query, as already suggested by jackmott, you can run it in SSMS by toggling Actual Execution Plan to see the generated plan in each situation.
If time is spent on something else (some C# warming up or something similar), Visual Studio 2015 has built-in performance analysis that can be used to see where it is spending that time.
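If you want to separate EF's one-time start-up cost (model compilation, view generation, the migration checks) from the query itself, a cheap first step is to wrap the very first query in a Stopwatch while EF6's built-in logging is switched on. A minimal sketch, assuming a DbContext called MyDbContext with an AccountStatuses set; both names are assumptions based on the query shown in the question:

using System.Diagnostics;
using System.Linq;

static void TimeFirstQuery()
{
    var sw = Stopwatch.StartNew();
    using (var context = new MyDbContext())
    {
        // EF6 built-in logging: prints each SQL statement and its measured duration.
        context.Database.Log = s => Debug.WriteLine(s);

        // The first materialized query pays the model/view compilation cost.
        var statuses = context.AccountStatuses.ToList();
    }
    sw.Stop();
    Debug.WriteLine("First query, end to end: " + sw.ElapsedMilliseconds + " ms");
}

If the logged SQL reports a few milliseconds but the Stopwatch reports seconds, the time is going into EF/CLR warm-up on that machine rather than into the database, which matches what the edits above describe.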
I created a .NET C# application that uses a SQLite 3.0 database.
In the software, data is inserted roughly every minute, every day.
But on very rare occasions (about once a month), it loses data as if it had never been written.
All my tables have identity columns.
To track the loss, I write the insertion time of each new row in every table.
When the loss occurred, I observed that about an hour's worth of rows was missing, but the identity values were not skipped and continued without a gap.
I checked my transactions and they were fine.
There is one connection, created and opened when the program starts.
That connection is used for the whole runtime without being closed and reopened. At runtime, there are many DB actions such as insert, update, delete and select.
Could that be the reason for the loss?
Should I open and close a connection for every DB action?
Since your unique keys aren't being skipped, it is more likely that you are not writing to the database rather than losing data.
I've never had a problem leaving a connection open for long periods, though (locally, of course). I would scrutinize what's writing to the database.
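One way to rule out silent write failures is to make every write explicit about its outcome: run it inside a transaction, check the rows-affected count, and log any exception rather than swallowing it. A rough sketch using System.Data.SQLite; the table and column names are made up for illustration:

using System;
using System.Data.SQLite;

static void InsertReading(SQLiteConnection conn, double value)
{
    using (var tx = conn.BeginTransaction())
    using (var cmd = conn.CreateCommand())
    {
        cmd.Transaction = tx;
        cmd.CommandText = "INSERT INTO Reading (Value, InsertedAt) VALUES (@v, @t)";
        cmd.Parameters.AddWithValue("@v", value);
        cmd.Parameters.AddWithValue("@t", DateTime.UtcNow);

        int rows = cmd.ExecuteNonQuery();
        if (rows != 1)
            throw new InvalidOperationException("Insert affected " + rows + " rows");

        tx.Commit(); // if this is never reached, the row was rolled back, not silently 'lost'
    }
}

If every failed commit gets logged and the hour of "lost" rows still shows no errors, that points away from the long-lived connection and towards the rows never being written in the first place, as suggested above.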
I have roughly 30M rows to insert/update in SQL Server per day. What are my options?
If I use SqlBulkCopy, does it handle not inserting data that already exists?
In my scenario I need to be able to run this over and over with the same data without duplicating data.
At the moment I have a stored procedure with an update statement and an insert statement which read data from a DataTable.
What should I be looking for to get better performance?
The usual way to do something like this is to maintain a permanent work table (or tables) that have no constraints on them. Often these might live in a separate work database on the same server.
To load the data, you empty the work tables, blast the data in via BCP/bulk copy. Once the data is loaded, you do whatever cleanup and/or transforms are necessary to prep the newly loaded data. Once that's done, as a final step, you migrate the data to the real tables by performing the update/delete/insert operations necessary to implement the delta between the old data and the new, or by simply truncating the real tables and reloading them.
Another option, if you've got something resembling a steady stream of data flowing in, might be to set up a daemon to monitor for the arrival of data and then do the inserts. For instance, if your data is flat files get dropped into a directory via FTP or the like, the daemon can monitor the directory for changes and do the necessary work (as above) when stuff arrives.
One thing to consider, if this is a production system, is that doing massive insert/delete/update statements is likely to cause blocking while the transaction is in-flight. Also, a gigantic transaction failing and rolling back has its own disadvantages:
The rollback can take quite a while to process.
Locks are held for the duration of the rollback, so more opportunity for blocking and other contention in the database.
Worst of all, after all that happens, you've achieved no forward motion, so to speak: a lot of time and effort and you're right back where you started.
So, depending on your circumstances, you might be better off doing your insert/update/deletes in smaller batches so as to guarantee that you achieve forward progress. 30 million rows over 24 hours works out to be c. 350 per second.
Bulk insert into a holding table, then perform either a single MERGE statement or an UPDATE and an INSERT statement. Either way, you want to compare your source table to your holding table to see which action to perform.
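SqlBulkCopy itself does not de-duplicate anything; it just streams rows into whatever table you point it at. The usual pattern is the one described above: bulk copy into a staging (holding) table, then let a single MERGE, or an UPDATE plus an INSERT, reconcile it with the real table. A rough sketch, assuming a staging table dbo.Stage_Item with the same shape as dbo.Item and an Id key column; all of these names are illustrative:

using System.Data;
using System.Data.SqlClient;

static void UpsertBatch(string connString, DataTable batch)
{
    using (var conn = new SqlConnection(connString))
    {
        conn.Open();

        // 1. Empty the staging table and bulk load the new batch into it.
        using (var truncate = new SqlCommand("TRUNCATE TABLE dbo.Stage_Item", conn))
            truncate.ExecuteNonQuery();
        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Stage_Item", BatchSize = 10000 })
            bulk.WriteToServer(batch);

        // 2. Reconcile staging with the real table in one set-based statement.
        const string merge = @"
            MERGE dbo.Item AS target
            USING dbo.Stage_Item AS source ON target.Id = source.Id
            WHEN MATCHED THEN
                UPDATE SET target.Name = source.Name, target.Price = source.Price
            WHEN NOT MATCHED BY TARGET THEN
                INSERT (Id, Name, Price) VALUES (source.Id, source.Name, source.Price);";
        using (var cmd = new SqlCommand(merge, conn) { CommandTimeout = 600 })
            cmd.ExecuteNonQuery();
    }
}

Calling this per batch (say 50,000-500,000 rows at a time) also gives you the smaller-transaction behaviour recommended in the other answer, rather than one giant 30M-row transaction.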
I have various large data modification operations in a project built on c# and Fluent NHibernate.
The DB is SQLite (on disk rather than in memory, as I'm interested in performance).
I wanted to check performance of these so I created some tests to feed in large amounts of data and let the processes do their thing. The results from 2 of these processes have got me pretty confused.
The first is a fairly simple case of taking data supplied in an XML file doing some light processing and importing it. The XML contains around 172,000 rows and the process takes a total of around 60 seconds to run with the actual inserts taking around 40 seconds.
In the next process, I do some processing on the same set of data. So I have a DB with approx 172,000 rows in one table. The process then works through this data, doing some heavier processing and generating a whole bunch of DB updates (inserts and updates to the same table).
In total, this results in around 50,000 rows inserted and 80,000 updated.
In this case, the processing takes around 30 seconds, which is fine, but saving the changes to the DB takes over 30 minutes, and it crashes before it finishes with a SQLite 'disk or I/O error'.
So the question is: why are the inserts/updates in the second process so much slower? They are working on the same table of the same database with the same connection. In both cases, IStatelessSession is used and ado.batch_size is set to 1000.
In both cases, the code that does the update looks like this:
BulkDataInsert((IStatelessSession session) =>
{
    foreach (Transaction t in transToInsert) { session.Insert(t); }
    foreach (Transaction t in transToUpdate) { session.Update(t); }
});
(Although the first process has no 'transToUpdate' line as it only does inserts; removing the update line from the second process and just doing the inserts still takes almost 10 minutes.)
The transTo* variables are List<Transaction> collections holding the objects to be inserted/updated.
BulkDataInsert creates the session and handles the DB transaction.
I didn't understand your second process. However, here are some things to consider:
Are there any clustered or non-clustered indexes on the table?
How many disk drives do you have?
How many threads are writing to the DB in the second test?
It seems that you are experiencing IO bottlenecks that can be resolved by having more disks, more threads, indexes, etc.
So, assuming a lot of things, here is what I "think" is happening:
In the first test your table probably has no indexes, and since you are just inserting data, it is a sequential insert in a single thread which can be pretty fast - especially if you are writing to one disk.
Now, in the second test, you are reading data and then updating data. Your SQL instance has to find the record that it needs to update. If you do not have any indexes this "find" action is basically a table scan, which will happen for each one of those 80,000 row updates. This will make your application really really slow.
The simplest thing you could probably do is add a clustered index on the table for a unique key, and the best option is to use the columns that you are using in the where clause to "update" those rows.
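Since the question's store is SQLite, which clusters rows by rowid rather than by a user-chosen clustered index, the equivalent step (if the updates locate rows by something other than the INTEGER PRIMARY KEY/rowid) is an ordinary index on the column(s) used in the WHERE clause. A rough sketch using System.Data.SQLite; the table and column names are only illustrative:

using System.Data.SQLite;

static void EnsureUpdateIndex(SQLiteConnection conn)
{
    using (var cmd = conn.CreateCommand())
    {
        // Index whatever the updates filter on, so each of the ~80,000 updates
        // becomes an index seek instead of a full table scan.
        cmd.CommandText =
            "CREATE INDEX IF NOT EXISTS IX_Transaction_ExternalId ON [Transaction](ExternalId);";
        cmd.ExecuteNonQuery();
    }
}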
Hope this helps.
DISCLAIMER: I made quite a few assumptions
The problem was due to my test setup.
As is pretty common with nhibernate based projects, I had been using in-memory sqlite databases for unit testing. These work great but one downside is that if you close the session, it destroys the database.
Consequently, my unit of work implementation contains a 'PreserveSession' property to keep the session alive and just create new transactions when needed.
My new performance tests are using on-disk databases but they still use the common code for setting up test databases and so have PreserveSession set to true.
It seems that having several sessions all left open (even though they're not doing anything) starts to cause problems after a while, including the performance drop-off and the disk I/O error.
I re-ran the second test with PreserveSession set to false and immediately I'm down from over 30 minutes to under 2 minutes, which is more like where I'd expect it to be.
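For anyone hitting the same thing, this is roughly the shape of the setup described above; the class and member names here are illustrative rather than the project's actual code. The point is that with PreserveSession enabled the IStatelessSession outlives each unit of work, and several of those left open against an on-disk SQLite file is what eventually produced the slowdown and the disk I/O error.

using System;
using NHibernate;

public class BulkUnitOfWork
{
    private readonly ISessionFactory _factory;
    private IStatelessSession _session;

    // True in the old test setup, to stop in-memory SQLite databases being destroyed.
    public bool PreserveSession { get; set; }

    public BulkUnitOfWork(ISessionFactory factory) { _factory = factory; }

    public void BulkDataInsert(Action<IStatelessSession> work)
    {
        var session = _session ?? (_session = _factory.OpenStatelessSession());
        using (var tx = session.BeginTransaction())
        {
            work(session);
            tx.Commit();
        }

        if (!PreserveSession)
        {
            session.Close();   // with PreserveSession = true this never runs,
            _session = null;   // so sessions accumulate across test fixtures
        }
    }
}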
I have a C# application that performs an ETL process. For a self-referencing table, the application runs "ALTER TABLE [tableName] NOCHECK CONSTRAINT [constraintName]", which turns off the FK constraint check(s) for that table. Once all the data is loaded, the constraint(s) are enabled again.
The database command timeout is set to 3 minutes; however, the above SQL command fails because the database times out after 30 seconds.
What could be the cause of this timeout?
Are there database system tables I should check for abnormality?
Other information:
I checked the app; it only has one active thread doing the ETL, so I don't think the application is locking any database resource. In addition, the database runs on the same machine as the application.
Even after the application closes all its database connections, it times out again the next time it runs the ETL process. If I run the SQL manually in SQL Server Management Studio, there is no problem at all.
Thanks
UPDATE - The application is turning off a number of constraints. It turns out the timeout only happens with one particular constraint. This constraint references the Date Dimension table.
UPDATE - It looks like there is some weird abnormality with the testing database I was working on. I tried the same ETL process with another data warehouse and it has had no problem so far. Other developers on the team also haven't encountered this issue. This application runs every midnight. I will keep it running overnight and hopefully I can reproduce the same issue on other databases. So far no luck figuring out what is going on.
Altering a table requires an exclusive lock for the table. If there is another process reading/writing to the table in question, the schema change can't take place until that process releases its lock.
When you experience the long run time, run sp_who2 in a different connection and see if any connections are blocking your ETL connection. You can then look at the command buffer for that connection to determine what it's doing.
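A quick way to see who is blocking whom, rather than eyeballing sp_who2, is to query sys.dm_exec_requests from a second connection while the ALTER TABLE is hanging. A rough sketch; the connection string is illustrative and the query only returns rows when something is actually blocked:

using System;
using System.Data.SqlClient;

static void ShowBlockers(string connString)
{
    const string sql = @"
        SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, t.text
        FROM sys.dm_exec_requests r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
        WHERE r.blocking_session_id <> 0;";

    using (var conn = new SqlConnection(connString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                Console.WriteLine("session {0} blocked by {1} ({2}, {3} ms): {4}",
                    reader["session_id"], reader["blocking_session_id"],
                    reader["wait_type"], reader["wait_time"], reader["text"]);
            }
        }
    }
}

If the ALTER TABLE session shows up with a schema-modification lock wait (LCK_M_SCH_M) and a non-zero blocking_session_id, that blocking session is the one to investigate.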