I created a .NET C# application that uses an SQLite 3.0 database.
In the application, data is inserted roughly once a minute, every day.
But on very rare occasions (about once a month), it loses data as if it had never been written.
All of my tables have identity columns.
To track the loss, I write the insertion time of each new row in every table.
When the loss occurred, I observed that about an hour's worth of rows was missing, yet the identity values were not skipped and continued without gaps.
I checked my transactions and they looked fine.
There is one connection, created and opened when the program starts.
That connection is used for the entire runtime without being closed and reopened. At runtime there are many database actions such as inserts, updates, deletes, and selects.
Could this be the cause of the loss?
Should I open and close a connection for every database action?
Since your identity values aren't being skipped, it is more likely that those rows are never being written to the database than that data is being lost afterwards.
I've never had a problem leaving a connection open for long periods, though (locally, of course). I would scrutinize whatever is writing to the database.
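To scrutinize the writes, here is a minimal sketch of what I mean, assuming the System.Data.SQLite provider and a made-up Readings table: wrap each insert in an explicit transaction, check the rows-affected count, and commit explicitly, so a failed write surfaces as an exception or a log entry instead of disappearing silently.

using System;
using System.Data.SQLite; // sketch assumes the System.Data.SQLite provider

static class ReadingWriter
{
    public static void InsertReading(SQLiteConnection connection, double value)
    {
        // "Readings" and its columns are hypothetical, for illustration only.
        using (var tx = connection.BeginTransaction())
        using (var cmd = connection.CreateCommand())
        {
            cmd.Transaction = tx;
            cmd.CommandText =
                "INSERT INTO Readings (Value, InsertedAt) VALUES (@value, @insertedAt)";
            cmd.Parameters.AddWithValue("@value", value);
            cmd.Parameters.AddWithValue("@insertedAt", DateTime.UtcNow);

            int affected = cmd.ExecuteNonQuery();
            if (affected != 1)
            {
                // If this ever fires, the row was never written in the first place.
                Console.WriteLine($"Insert affected {affected} rows at {DateTime.UtcNow:o}");
            }

            tx.Commit(); // an exception here means the data never reached the file
        }
    }
}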
Related
Scenario
I have a database into which real-time data is inserted, e.g. prices of symbols.
The data is inserted every 5 seconds (around 10,000 rows each time) through some third-party tool.
Things I have already tried:
SqlDependency, which detects changes in the table and alerts me when changes are made.
On change, I fire the fetch query.
The problem is that the connection pool gets full and throws an exception.
Thread.Sleep would be a problem, since even if the row insertions take less time, I would still be waiting for the fixed interval.
What I want:
My web application to fetch the data once insertion becomes stable, i.e. no more rows are inserted within a time span of 1-2 seconds.
Environment
C#-MVC, SQL Server
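What I have in mind is something like this debounce sketch: every SqlDependency notification resets a timer, and the fetch only runs after no notification has arrived for the quiet period. The class and method names here are made up, and the SqlDependency registration itself is left out.

using System;
using System.Threading;

public class DebouncedFetcher
{
    private static readonly TimeSpan QuietPeriod = TimeSpan.FromSeconds(2);
    private readonly Timer _timer;

    public DebouncedFetcher()
    {
        // Timer starts disabled; it is (re)armed on every change notification.
        _timer = new Timer(_ => FetchLatestPrices(), null, Timeout.Infinite, Timeout.Infinite);
    }

    // Call this from the SqlDependency OnChange handler.
    public void OnDependencyChange()
    {
        // Each notification pushes the fetch back by another quiet period,
        // so the fetch only fires once inserts have paused for ~2 seconds.
        _timer.Change(QuietPeriod, Timeout.InfiniteTimeSpan);
    }

    private void FetchLatestPrices()
    {
        // Hypothetical: open one short-lived connection, run the fetch query, dispose it.
        // Keeping this to a single connection per fetch avoids exhausting the pool.
    }
}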
Background
I need to write some integration tests in C# (about 120 of them) for a C#/SQL Server application. Initially, before any test, the database will already be there; the reason it will be there is that a lot of scripts are run to set it up (about 20 minutes of running time). When I run my tests, a few tables will be updated (CRUD operations). For example, in 10-11 tables a few rows will be added, in 15-16 tables a few rows will be updated, and in 4-5 tables a few rows will be deleted.
Problem
After every test is run, the database needs to be reset to its original state. How can I achieve that?
Bad Solution
After every run of a test, re-run the database creation scripts (20 minutes of running time). Since there will be around 120 tests, this comes to about 40 hours, which is not an option. Secondly, there is a process that holds several connections open against this database, so the database cannot be dropped and re-created.
Good Solution?
I would like to know if there is any other way of solving this problem. Another problem I have is that, for each of those tests, I don't even know which tables will be updated, so if I were to revert the database to its original state manually by writing delete and update queries, I would first have to go and check which tables were touched.
You should take a look at the possibilities MSSQL gives you for taking a snapshot of your database. It is potentially a lot faster than reverting to a backup or recreating the database.
Managing a test database
In a testing environment, it can be useful when repeatedly running a test protocol for the database to contain identical data at the start of each round of testing. Before running the first round, an application developer or tester can create a database snapshot on the test database. After each test run, the database can be quickly returned to its prior state by reverting the database snapshot.
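Here is a hedged sketch of driving the snapshot create/revert from the C# test fixture. The database name MyDb, the logical file name MyDb_Data, and the snapshot path are placeholders; note that reverting requires this to be the only snapshot of the database and that no other connections are using it, so the process that keeps connections open would still need to be dealt with.

using System.Data.SqlClient;

static class SnapshotHelper
{
    public static void CreateSnapshot(string connectionString)
    {
        // Placeholder names: MyDb is the database, MyDb_Data its logical data file.
        const string sql = @"
            CREATE DATABASE MyDb_Snapshot ON
                (NAME = MyDb_Data, FILENAME = 'C:\Snapshots\MyDb_Snapshot.ss')
            AS SNAPSHOT OF MyDb;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    public static void RevertToSnapshot(string connectionString)
    {
        // Switch to master first so this session isn't using the database being restored.
        const string sql = @"
            USE master;
            RESTORE DATABASE MyDb FROM DATABASE_SNAPSHOT = 'MyDb_Snapshot';";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}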
I have roughly 30M rows to insert/update in SQL Server per day. What are my options?
If I use SqlBulkCopy, does it handle not inserting data that already exists?
In my scenario I need to be able to run this over and over with the same data without duplicating data.
At the moment I have a stored procedure with an update statement and an insert statement which read data from a DataTable.
What should I be looking for to get better performance?
The usual way to do something like this is to maintain a permanent work table (or tables) that have no constraints on them. Often these might live in a separate work database on the same server.
To load the data, you empty the work tables, blast the data in via BCP/bulk copy. Once the data is loaded, you do whatever cleanup and/or transforms are necessary to prep the newly loaded data. Once that's done, as a final step, you migrate the data to the real tables by performing the update/delete/insert operations necessary to implement the delta between the old data and the new, or by simply truncating the real tables and reloading them.
Another option, if you've got something resembling a steady stream of data flowing in, might be to set up a daemon to monitor for the arrival of data and then do the inserts. For instance, if your data arrives as flat files dropped into a directory via FTP or the like, the daemon can monitor the directory for changes and do the necessary work (as above) when stuff arrives.
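For the daemon approach, here is a minimal sketch using FileSystemWatcher; the drop directory, file filter, and the commented-out LoadFile call are hypothetical.

using System;
using System.IO;

static class DropFolderWatcher
{
    public static void Watch()
    {
        // Watch a hypothetical drop directory for incoming flat files.
        var watcher = new FileSystemWatcher(@"C:\Drop", "*.csv");
        watcher.Created += (sender, e) =>
        {
            // The file may still be mid-transfer; a production version would
            // retry until the file can be opened exclusively.
            Console.WriteLine($"New file arrived: {e.FullPath}");
            // LoadFile(e.FullPath);  // hypothetical: bulk copy into the work table, then merge
        };
        watcher.EnableRaisingEvents = true;

        Console.WriteLine("Watching for files. Press Enter to stop.");
        Console.ReadLine();
    }
}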
One thing to consider, if this is a production system, is that doing massive insert/delete/update statements is likely to cause blocking while the transaction is in-flight. Also, a gigantic transaction failing and rolling back has its own disadvantages:
The rollback can take quite a while to process.
Locks are held for the duration of the rollback, so more opportunity for blocking and other contention in the database.
Worst of all, after all that happens you've achieved no forward motion, so to speak: a lot of time and effort, and you're right back where you started.
So, depending on your circumstances, you might be better off doing your insert/update/deletes in smaller batches so as to guarantee that you achieve forward progress. 30 million rows over 24 hours works out to be c. 350 per second.
Bulk insert into a holding table, then perform either a single MERGE statement or an UPDATE and an INSERT statement. Either way, you want to compare your holding table to your target table to see which action to perform.
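Here is a rough sketch of that pattern, assuming a DataTable as the source (as in the existing stored-procedure approach); the table and column names (dbo.Items, dbo.Items_Staging, Id, Value) are placeholders.

using System.Data;
using System.Data.SqlClient;

static class BulkUpserter
{
    public static void BulkUpsert(string connectionString, DataTable rows)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1. Blast the incoming rows into an unconstrained holding table.
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.Items_Staging";
                bulk.BatchSize = 10000;
                bulk.WriteToServer(rows);
            }

            // 2. Compare the holding table to the real table and apply the delta.
            const string merge = @"
                MERGE dbo.Items AS target
                USING dbo.Items_Staging AS source
                    ON target.Id = source.Id
                WHEN MATCHED THEN
                    UPDATE SET target.Value = source.Value
                WHEN NOT MATCHED BY TARGET THEN
                    INSERT (Id, Value) VALUES (source.Id, source.Value);

                TRUNCATE TABLE dbo.Items_Staging;";

            using (var cmd = new SqlCommand(merge, conn))
            {
                cmd.CommandTimeout = 0; // the set-based statement may run for a while
                cmd.ExecuteNonQuery();
            }
        }
    }
}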
I have various large data-modification operations in a project built on C# and Fluent NHibernate.
The DB is SQLite (on disk rather than in memory, as I'm interested in performance).
I wanted to check performance of these so I created some tests to feed in large amounts of data and let the processes do their thing. The results from 2 of these processes have got me pretty confused.
The first is a fairly simple case of taking data supplied in an XML file doing some light processing and importing it. The XML contains around 172,000 rows and the process takes a total of around 60 seconds to run with the actual inserts taking around 40 seconds.
In the next process, I do some processing on the same set of data. So I have a DB with approx 172,000 rows in one table. The process then works through this data, doing some heavier processing and generating a whole bunch of DB updates (inserts and updates to the same table).
In total, this results in around 50,000 rows inserted and 80,000 updated.
In this case, the processing takes around 30 seconds, which is fine, but saving the changes to the DB takes over 30 minutes! And it crashes before it finishes with an SQLite 'disk or I/O error'.
So the question is: why are the inserts/updates in the second process so much slower? They are working on the same table of the same database with the same connection. In both cases, IStatelessSession is used and ado.batch_size is set to 1000.
In both cases, the code that does the update looks like this:
BulkDataInsert((IStatelessSession session) =>
{
    foreach (Transaction t in transToInsert) { session.Insert(t); }
    foreach (Transaction t in transToUpdate) { session.Update(t); }
});
(Although the first process has no 'transToUpdate' loop, as it only does inserts; removing the update loop and doing just the inserts still takes almost 10 minutes.)
The transTo* variables are lists of the objects to be inserted/updated.
BulkDataInsert creates the session and handles the DB transaction.
I didn't understand your second process. However, here are some things to consider:
Are there any clustered or non-clustered indexes on the table?
How many disk drives do you have?
How many threads are writing to the DB in the second test?
It seems that you are experiencing IO bottlenecks that can be resolved by having more disks, more threads, indexes, etc.
So, assuming a lot of things, here is what I "think" is happening:
In the first test your table probably has no indexes, and since you are just inserting data, it is a sequential insert in a single thread which can be pretty fast - especially if you are writing to one disk.
Now, in the second test, you are reading data and then updating data. Your SQL instance has to find the record that it needs to update. If you do not have any indexes, this "find" action is basically a table scan, which will happen for each of those 80,000 row updates. This will make your application really, really slow.
The simplest thing you could probably do is add a clustered index on the table for a unique key; better still, index the columns you are using in the WHERE clause to update those rows.
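If the updates really are doing table scans, here is a hedged example of adding such an index through the existing connection. SQLite has no clustered indexes in the SQL Server sense, but an ordinary index on the lookup column serves the same purpose; the table and column names here are made up, and with NHibernate you would normally declare the index in the mapping or schema script instead.

using System.Data.SQLite;

static class IndexHelper
{
    public static void EnsureLookupIndex(SQLiteConnection connection)
    {
        // Hypothetical: index whatever column(s) the UPDATE's WHERE clause uses.
        const string sql =
            "CREATE INDEX IF NOT EXISTS IX_Transaction_ExternalRef " +
            "ON [Transaction] (ExternalRef);";

        using (var cmd = new SQLiteCommand(sql, connection))
        {
            cmd.ExecuteNonQuery();
        }
    }
}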
Hope this helps.
DISCLAIMER: I made quite a few assumptions
The problem was due to my test setup.
As is pretty common with NHibernate-based projects, I had been using in-memory SQLite databases for unit testing. These work great, but one downside is that if you close the session, it destroys the database.
Consequently, my unit of work implementation contains a 'PreserveSession' property to keep the session alive and just create new transactions when needed.
My new performance tests are using on-disk databases but they still use the common code for setting up test databases and so have PreserveSession set to true.
It seems that having several sessions all left open (even though they're not doing anything) starts to cause problems after a while, including the performance drop-off and the disk I/O error.
I re-ran the second test with PreserveSession set to false, and immediately I was down from over 30 minutes to under 2 minutes, which is much more where I'd expect it to be.
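For illustration, a simplified sketch of what that flag amounts to in the unit of work; the class shape here is hypothetical, not my actual code.

using System;
using NHibernate;

public class UnitOfWork : IDisposable
{
    private readonly IStatelessSession _session;

    // True for in-memory SQLite tests, where closing the session destroys the DB.
    // False for on-disk databases, so sessions don't pile up and cause I/O errors.
    public bool PreserveSession { get; set; }

    public UnitOfWork(IStatelessSession session)
    {
        _session = session;
    }

    public void Dispose()
    {
        if (!PreserveSession)
        {
            _session.Dispose();
        }
    }
}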
I have a C# application that performs an ETL process. For self-referencing tables, the application runs "ALTER TABLE [tableName] NOCHECK CONSTRAINT [constraintName]", which turns off the FK constraint check(s) on the table. Once all the data is loaded, the constraint(s) are enabled again.
The database timeout is set to 3 minutes; however, the above SQL command fails because the database times out in 30 seconds.
What could be the cause of this timeout?
Are there database system tables I should check for abnormalities?
Other information:
I checked the app; it only has one active thread doing the ETL, so I don't think the application is locking any database resources. In addition, the database runs on the same machine as the application.
Even if the application closes all its database connections, it times out again the next time the ETL process runs. If I run the SQL manually using SQL Server Management Studio, there is no problem at all.
Thanks
UPDATE - The application turns off a number of constraints. It turns out the timeout only happens for one particular constraint, which references the Date Dimension table.
UPDATE - It looks like there is some weird abnormality in the test database I was working with. I tried the same ETL process against another data warehouse and it has had no problems so far. Other developers on the team also haven't encountered this issue. The application runs every midnight, so I will keep it running overnight and hopefully reproduce the same issue on other databases. So far no luck figuring out what is going on.
Altering a table requires an exclusive lock on the table. If there is another process reading from or writing to the table in question, the schema change can't take place until that process releases its lock.
When you experience the long run time, run sp_who2 in a different connection and see if any connections are blocking your ETL connection. You can then look at the command buffer for that connection to determine what it's doing.
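If you'd rather check for blockers from the application itself, here is a hedged sketch that queries sys.dm_exec_requests for sessions that are being blocked and shows who is blocking them and what they are running; the connection string handling is a placeholder.

using System;
using System.Data.SqlClient;

static class BlockingReporter
{
    public static void ReportBlocking(string connectionString)
    {
        const string sql = @"
            SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, t.text
            FROM sys.dm_exec_requests AS r
            CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
            WHERE r.blocking_session_id <> 0;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(
                        "Session {0} is blocked by session {1} ({2}, {3} ms): {4}",
                        reader["session_id"], reader["blocking_session_id"],
                        reader["wait_type"], reader["wait_time"], reader["text"]);
                }
            }
        }
    }
}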