Good Morning.
I have a service I am writing in C# that performs an insert of a single row every 15 seconds or so. This interval is user-definable, but at the moment 15 seconds is the targeted interval.
I have read this article: SqlBulkCopy on a single record?, which clearly explains the throughput benefits of using a regular insert over performing a single-row bulk insert. But it also mentions (or implies) that the regular insert is a bit heavier on CPU than the bulk insert.
As the service I am writing is a support service, and as the insert is artificially restricted to only one per interval, I am wondering whether the implication above is justification for using SqlBulkCopy over a regular insert. Obviously I want my service to be as lightweight as possible.
The answer in your article says SqlBulkCopy is 4.4x slower and takes double the CPU of a simple SqlCommand. You misread the answer (or I did); it states SqlBulkCopy is heavier on CPU than a regular insert, which makes more sense.
The answer is simple: stick with SqlCommand if you are 100% sure only one record needs to be inserted.
Even my library, C# Bulk Operations, does not use SqlBulkCopy until it reaches a specific number of rows (normally ten), because SqlBulkCopy is too heavy.
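For a single row every 15 seconds, a plain parameterized SqlCommand is about as light as it gets. A minimal sketch, assuming a made-up table, columns, and connection string:

using System;
using System.Data;
using System.Data.SqlClient;

// Table, columns, and connection string below are illustrative only.
const string connectionString = "Server=.;Database=SupportDb;Integrated Security=true;";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "INSERT INTO dbo.Heartbeat (RecordedAt, Payload) VALUES (@recordedAt, @payload);",
    connection))
{
    command.Parameters.Add("@recordedAt", SqlDbType.DateTime2).Value = DateTime.UtcNow;
    command.Parameters.Add("@payload", SqlDbType.NVarChar, 200).Value = "status ok";

    connection.Open();
    command.ExecuteNonQuery();
}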
Related
I am working to increase the performance of bulk loads of hundreds of millions of records (and growing) daily.
I moved this over to use the IDataReader interface in lieu of data tables and did get a noticeable performance boost (500,000 more records a minute). The current setup is:
A custom cached reader to parse the delimited files.
Wrapping the stream reader in a buffered stream.
A custom object reader class that enumerates over the objects and implements the IDataReader interface.
Then SqlBulkCopy writes to the server (roughly sketched below).
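Roughly, the write side looks like this (names simplified; DelimitedObjectReader stands in for my custom object reader class that implements IDataReader, and the connection string and table are placeholders):

using System.Data;
using System.Data.SqlClient;
using System.IO;

// Simplified sketch; DelimitedObjectReader is the custom IDataReader described above,
// not a framework class. Connection string and table name are placeholders.
const string connectionString = "Server=.;Database=LoadTarget;Integrated Security=true;";

using (var fileStream = File.OpenRead(@"C:\loads\extract_0001.txt"))
using (var bufferedStream = new BufferedStream(fileStream, 1 << 20))
using (var streamReader = new StreamReader(bufferedStream))
using (IDataReader reader = new DelimitedObjectReader(streamReader))
using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.TargetHeap";
    bulkCopy.BatchSize = 10000;
    bulkCopy.BulkCopyTimeout = 0;
    bulkCopy.WriteToServer(reader); // this call is where the 15+ minutes go
}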
The bulk of the performance bottleneck is directly in SqlBulkCopy.WriteToServer. If I unit test the process up to, but excluding, WriteToServer, it returns in roughly 1 minute. WriteToServer takes an additional 15+ minutes. The unit test runs on my local machine, on the same drive the database lives on, so it's not having to copy the data across the network.
I am using a heap table (no indexes, clustered or nonclustered); I have played around with various batch sizes without major differences in performance.
There is a need to decrease the load times, so I am hoping someone might know a way to squeeze a little more blood out of this turnip.
Why not use SSIS directly?
Anyway, if you went with streaming from parsing into an IDataReader, you're already on the right path. To optimize SqlBulkCopy itself you need to turn your focus to SQL Server. The key is minimally logged operations. You must read these MSDN articles:
Prerequisites for Minimal Logging in Bulk Import.
Optimizing Bulk Import Performance.
If your target is a B-Tree (i.e. a clustered indexed table), unfortunately one of the most important tenets of performant bulk insert, namely the sorted-input rowset, cannot be declared. Simple as it is: ADO.NET SqlClient does not have the equivalent of SSPROP_FASTLOADOPTIONS -> ORDER(Column) (OleDb). Since the engine does not know that the data is already sorted, it will add a Sort operator to the plan, which is not that bad except when it spills. To avoid spills, use a small batch size (~10k). See my original point: all of these are just options and clicks to set in SSIS rather than digging through the OleDb MSDN spec...
If your data stream is unsorted to start with, or the destination is a heap, then my point above is moot.
However, achieving minimal logging is still a must for decent performance.
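On the ADO.NET side the knobs are limited, but for a heap destination something along these lines covers the minimally logged path (the table name and surrounding method are illustrative, and the database must also be in the SIMPLE or BULK_LOGGED recovery model):

using System.Data;
using System.Data.SqlClient;

// Sketch only: 'reader' is whatever streaming IDataReader feeds the load.
static void BulkLoad(IDataReader reader, string connectionString)
{
    // TableLock requests a bulk update (BU) lock on the heap, one of the
    // prerequisites for minimal logging.
    using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName = "dbo.TargetHeap"; // illustrative name
        bulkCopy.BatchSize = 10000;   // per the above, small batches help avoid Sort spills on indexed targets
        bulkCopy.BulkCopyTimeout = 0; // no timeout for long loads
        bulkCopy.WriteToServer(reader);
    }
}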
I have to transfer data from a DataTable to a database. The total is around 15k records each time.
I want to reduce the time for inserting the data. Should I use SqlBulkCopy, or something else?
If you use C#, take a look at the SqlBulkCopy class. There is a good article on CodeProject about how to use it with a DataTable.
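In short, something like this (the destination table and column names are made up; the DataTable is whatever you already have):

using System.Data;
using System.Data.SqlClient;

static void CopyToServer(DataTable table, string connectionString)
{
    using (var bulkCopy = new SqlBulkCopy(connectionString))
    {
        bulkCopy.DestinationTableName = "dbo.TargetTable"; // made-up name
        bulkCopy.BatchSize = 5000;

        // Map source to destination columns explicitly if the ordinals don't line up.
        bulkCopy.ColumnMappings.Add("Id", "Id");
        bulkCopy.ColumnMappings.Add("Name", "Name");

        bulkCopy.WriteToServer(table);
    }
}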
If you just aim for speed, then bulk copy might suit you well. Depending on the requirements you can also reduce the logging level.
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/edf280b9-2b27-48bb-a2ff-cab4b563dcb8/
How to copy huge table data into another table in SQL Server
SQL bulk copy is definitely fastest with this kind of sample size, as it streams the data rather than actually using regular SQL commands. This gives it a slightly longer startup time, but it wins once you're inserting more than ~120 rows at a time (according to my tests).
Check out my post below if you want some stats on how SQL bulk copy stacks up against a few other methods.
http://blog.staticvoid.co.nz/2012/8/17/mssql_and_large_insert_statements
15k records is not that much, really. I have a laptop with an i5 CPU and 4GB of RAM, and it inserts 15k records in several seconds. I wouldn't really bother spending too much time on finding an ideal solution.
I am trying to use SQLite in my application as a sort of cache. I say "sort of" because items never expire from my cache and I am not storing anything beyond the ids. I simply need to use the cache to store all the ids I processed before; I don't want to process anything twice.
I am entering items into the cache at 10,000 messages/sec for a total of 150 million messages. My table is pretty simple: it only has one text column, which stores the ids. I was doing this all in memory using a dictionary; however, I am processing millions of messages and, although it is fast that way, I ran out of memory after some time.
I have researched sqlite and performance and I understand that configuration is key, however, I am still getting horrible performance on inserts (I haven't tried selects yet). I am not able to keep up with even 5000 inserts/sec. Maybe this is as good as it gets.
My connection string is as below:
Data Source=filename;Version=3;Count Changes=off;Journal Mode=off;
Pooling=true;Cache Size=10000;Page Size=4096;Synchronous=off
Thanks for any help you can provide!
If you are doing lots of inserts or updates at once, put them in a transaction.
Also, if you are executing essentially the same SQL each time, use a parameterized statement.
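Put together, that looks roughly like this with System.Data.SQLite (the table and column names are just examples):

using System.Collections.Generic;
using System.Data;
using System.Data.SQLite;

// Example only: 'ProcessedIds(Id TEXT)' is an assumed table.
static void InsertIds(string databaseFile, IEnumerable<string> ids)
{
    var connectionString = "Data Source=" + databaseFile + ";Version=3;Journal Mode=Off;Synchronous=Off;";

    using (var connection = new SQLiteConnection(connectionString))
    {
        connection.Open();

        using (var transaction = connection.BeginTransaction())
        using (var command = new SQLiteCommand("INSERT INTO ProcessedIds (Id) VALUES (@id);", connection, transaction))
        {
            var idParameter = command.Parameters.Add("@id", DbType.String);

            foreach (var id in ids)
            {
                idParameter.Value = id;   // the same parameterized command is reused for every row
                command.ExecuteNonQuery();
            }

            transaction.Commit();          // one commit for the whole batch
        }
    }
}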
Have you looked at the SQLite Optimization FAQ? (It's a bit old.)
SQLite performance tuning and optimization on embedded systems
If you have many threads writing to the same database, then you're going to run into concurrency problems with that many transactions per second. SQLite always locks the whole database for writes so only one write transaction can be processed at a time.
An alternative is Oracle Berkeley DB with SQLite. The latest version of Berkeley DB includes a SQLite front end that has a page-level locking mechanism instead of database-level locking. This provides much higher numbers of transactions per second when there is a high concurrency requirement.
http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html
It includes the same SQLite.NET provider and is supposed to be a drop-in replacement.
Since your requirements are so specific, you may be better off with something more dedicated, like memcached. This will provide a very high-throughput caching implementation that will be a lot more memory-efficient than a simple hashtable.
Is there a port of memcache to .Net?
I have about 6500 files for a sum of about 17 GB of data, and this is the first time that I've had to move what I would call a large amount of data. The data is on a network drive, but the individual files are relatively small (max 7 MB).
I'm writing a program in C#, and I was wondering if I would notice a significant difference in performance if I used BULK INSERT instead of SqlBulkCopy. The table on the server also has an extra column, so if I use BULK INSERT I'll have to use a format file and then run an UPDATE for each row.
I'm new to forums, so if there was a better way to ask this question feel free to mention that as well.
In my testing, BULK INSERT is much faster. After an hour using SqlBulkCopy, I was maybe a quarter of the way through my data, and I had finished writing the alternative method (and having lunch). By the time I finished writing this post (~3 minutes), BULK INSERT was about a third of the way through.
For anyone who is looking at this as a reference, it is also worth mentioning that the upload is faster without a primary key.
It should be noted that one of the major causes for this could be that the server was a significantly more powerful computer, so this is not an analysis of the efficiency of the two approaches; however, I would still recommend using BULK INSERT, as the average server is probably significantly faster than the average desktop computer.
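For reference, issuing the BULK INSERT from the C# side is just another SqlCommand; the paths, table name, format file, and connection string below are placeholders:

using System.Data.SqlClient;

// Placeholders only; substitute your own server, table, data file, and format file.
const string connectionString = "Server=myServer;Database=Staging;Integrated Security=true;";
const string bulkInsertSql = @"
    BULK INSERT dbo.StagingTable
    FROM '\\fileserver\share\data\file0001.dat'
    WITH (FORMATFILE = '\\fileserver\share\data\file.fmt', TABLOCK);";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(bulkInsertSql, connection))
{
    command.CommandTimeout = 0; // large files can take a while
    connection.Open();
    command.ExecuteNonQuery();
}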
I'm dealing with chunks of data that are 50k rows each.
I'm inserting them into an SQL database using LINQ:
for (int i = 0; i < 50000; i++)
{
    DB.TableName.InsertOnSubmit(
        new TableName
        {
            Value1 = Array[i, 0],
            Value2 = Array[i, 1]
        });
}
DB.SubmitChanges();
This takes about 6 minutes, and I want it to take much less if possible. Any suggestions?
If you are reading in a file, you'd be better off using BULK INSERT (Transact-SQL), and if you are writing that much (50K rows) at one time from memory, you might be better off writing to a flat file first and then using BULK INSERT on that file.
As you are doing a simple insert and not gaining much from the use of LINQ to SQL, have a look at SqlBulkCopy; it will remove most of the round trips and reduce the overhead on the SQL Server side as well. You will have to make very few coding changes to use it (a rough sketch follows below).
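A rough sketch of what that could look like for your two-column data (the table name, column names, element type, and connection string are assumptions):

using System.Data;
using System.Data.SqlClient;

// 'values' stands in for the Array[50000, 2] in the question; assumed to hold strings.
string[,] values = new string[50000, 2];

var table = new DataTable();
table.Columns.Add("Value1", typeof(string));
table.Columns.Add("Value2", typeof(string));

for (int i = 0; i < values.GetLength(0); i++)
{
    table.Rows.Add(values[i, 0], values[i, 1]);
}

const string connectionString = "Server=.;Database=MyDb;Integrated Security=true;"; // placeholder

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.TableName"; // assumed name
    bulkCopy.WriteToServer(table);
}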
Also look at pre-sorting your data by the column that the table is indexed on, as this will lead to better cache hits when SQL Server is updating the table.
Also consider whether you should upload the data to an unindexed temp staging table, then use a stored proc to insert into the main table with a single SQL statement. This may let SQL Server spread the indexing work over all your CPUs.
There are a lot of things you need to check/do.
How much disk space is allocated to the database? Is there enough free space to do all of the inserts without the file auto-growing? If not, increase the database file size, as otherwise it has to stop every so many inserts to auto-resize the database itself.
Do NOT do individual inserts. They take way too long. Instead use either table-valued parameters (SQL 2008), SQL bulk copy, or a single insert statement (in that order of preference); see the table-valued parameter sketch after this list.
Drop any indexes on that table before the load and recreate them after. With that many inserts they are probably going to be fragmented to hell anyway.
If you have any triggers, consider dropping them until the load is complete.
Do you have enough RAM available on the database server? You need to check on the server itself to see if it's consuming ALL the available RAM. If so, you might consider doing a reboot prior to the load... SQL Server has a tendency to just consume and hold on to everything it can get its hands on.
Along the RAM lines, we like to keep enough RAM in the server to hold the entire database in memory. I'm not sure if this is feasible for you or not.
How is its disk speed? Is the queue depth pretty long? Other than hardware replacement there's not much to be done here.
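To illustrate the table-valued parameter option from the list above, a sketch (the table type, stored procedure, table, and column names are all made up for the example):

using System.Data;
using System.Data.SqlClient;

// Assumes objects created up front on the server, e.g.:
//   CREATE TYPE dbo.RowValueType AS TABLE (Value1 nvarchar(100), Value2 nvarchar(100));
//   CREATE PROCEDURE dbo.InsertRows @rows dbo.RowValueType READONLY
//   AS INSERT INTO dbo.TargetTable (Value1, Value2) SELECT Value1, Value2 FROM @rows;
static void InsertRows(DataTable rows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.InsertRows", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        var parameter = command.Parameters.AddWithValue("@rows", rows);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.RowValueType";

        connection.Open();
        command.ExecuteNonQuery(); // one round trip for the whole batch
    }
}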