I have to transfer data from a DataTable to a database. Each batch is around 15k records.
I want to reduce the time the insert takes. Should I use SqlBulkCopy, or something else?
If you use C#, take a look at the SqlBulkCopy class. There is a good article on CodeProject about how to use it with a DataTable.
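For reference, a minimal sketch of that approach. The connection string, destination table name, and batch size here are placeholders you would adapt to your own schema:

```csharp
using System.Data;
using System.Data.SqlClient;

static void BulkInsert(DataTable table, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "dbo.TargetTable"; // assumed table name
            bulk.BatchSize = 5000;                         // tune for your load

            // Map columns by name so column order doesn't have to match.
            foreach (DataColumn column in table.Columns)
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);

            bulk.WriteToServer(table);
        }
    }
}
```

For 15k rows this typically completes in a fraction of the time that 15k individual INSERT commands would take, because the rows are streamed to the server in one operation.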
If you just aim for speed, then bulk copy might suit you well. Depending on the requirements, you can also reduce the logging level.
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/edf280b9-2b27-48bb-a2ff-cab4b563dcb8/
How to copy a huge table data into another table in SQL Server
SQL bulk copy is definitely the fastest at this kind of sample size, as it streams the data rather than using regular SQL commands. This gives it a slightly longer startup time, but it comes out ahead at more than ~120 rows at a time (according to my tests).
Check out my post here if you want some stats on how SQL bulk copy stacks up against a few other methods.
http://blog.staticvoid.co.nz/2012/8/17/mssql_and_large_insert_statements
15k records is not that much, really. I have a laptop with an i5 CPU and 4 GB of RAM, and it inserts 15k records in a few seconds. I wouldn't spend too much time looking for an ideal solution.
Related
Good Morning.
I have a service I am writing in C# that performs an insert of a single row every 15 seconds or so. This interval is user-definable, but at the moment 15 seconds is the targeted interval.
I have read this article: SqlBulkCopy on a single record?, which clearly explains the throughput benefits of using a regular insert over performing a single-row bulk insert. But it also mentions (or implies) that the regular insert is a bit heavier on CPU than the bulk insert.
As the service I am writing is a support service, and as the insert is artificially restricted to only one per interval, I am wondering whether the implication above is justification for using SqlBulkCopy over a regular insert. Obviously I want my service to be as lightweight as possible.
The answer in your article says that SqlBulkCopy is 4.4x slower and takes double the CPU of a simple SQL command. You misread the answer (or I did): it states that SqlBulkCopy is heavier on CPU than a regular insert, which makes more sense.
The answer is simple: stick with a SQL command if you are 100% sure only one record needs to be inserted.
Even my library, C# Bulk Operations, does not use SqlBulkCopy until it reaches a certain number of rows (normally ten), because below that SqlBulkCopy is too heavy.
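For a single row, the regular insert amounts to one short parameterized command. A minimal sketch (table, column, and parameter names are assumptions for illustration):

```csharp
using System.Data.SqlClient;

static void InsertOne(string connectionString, int sensorId, double reading)
{
    const string sql =
        "INSERT INTO dbo.Readings (SensorId, Value) VALUES (@SensorId, @Value)";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@SensorId", sensorId);
        command.Parameters.AddWithValue("@Value", reading);
        connection.Open();
        command.ExecuteNonQuery(); // one round trip, no bulk-copy setup cost
    }
}
```

There is nothing for SqlBulkCopy to amortize its startup cost over when only one row arrives every 15 seconds, which is why the plain command stays lighter.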
I need to load the results of multiple SQL statements from SQL Server into DataTables. Most of the statements return some 10,000 to 100,000 records, and each takes up to a few seconds to load.
My guess is that this is simply due to the amount of data that needs to be shoved around. The statements themselves don't take much time to process.
So I tried using Parallel.For() to load the data in parallel, hoping that the overall processing time would decrease. I do get a 10% performance increase, but that is not enough. One reason might be that my machine is only a dual core, limiting the benefit here. The server on which the program will be deployed has 16 cores, though.
My question is how I could improve performance further. Would asynchronous data service queries (BeginExecute, etc.) be a better solution than PLINQ? Or maybe some other approach?
The SQL Server is running on the same machine. This is also the case on the deployment server.
EDIT:
I've run some tests using a DataReader instead of a DataTable. This already decreased the load times by about 50%. Great! Still, I am wondering whether parallel processing with BeginExecute would improve the overall load time on a multiprocessor machine. Does anybody have experience with this? Thanks for any help!
UPDATE:
I found that about half of the loading time was consumed by processing the SQL statement. In SQL Server Management Studio the statements took only a fraction of the time, but somehow they take much longer through ADO.NET. So by using DataReaders instead of loading DataTables, and by adapting the SQL statements, I've come down to about 25% of the initial loading time. Loading the DataReaders in parallel threads with Parallel.For() does not bring an improvement here. So for now I am happy with the result and will leave it at that. Maybe when we update to .NET 4.5 I'll give asynchronous DataReader loading a try.
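For anyone curious what that .NET 4.5-style asynchronous DataReader loading looks like, a sketch (the query text and connection string are placeholders; the row handling here just counts rows):

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

static async Task<int> LoadAsync(string connectionString, string sql)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        await connection.OpenAsync();
        int rowCount = 0;
        using (var reader = await command.ExecuteReaderAsync())
        {
            while (await reader.ReadAsync())
                rowCount++; // process each row here instead of just counting
        }
        return rowCount;
    }
}
```

Several such tasks can then be started together and awaited with Task.WhenAll, which lets the queries overlap without tying up one thread per query the way Parallel.For() does.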
My guess is that this is simply due to the amount of data that needs to be shoved around.
No, it is due to using a SLOW framework. I am pulling nearly a million rows into a dictionary in less than 5 seconds in one of my apps. DataTables are SLOW.
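To illustrate the difference, here is a sketch of loading rows into a Dictionary straight from a SqlDataReader, skipping the DataTable entirely (the query and column layout are assumptions):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

static Dictionary<int, string> LoadLookup(string connectionString)
{
    var result = new Dictionary<int, string>();

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT Id, Name FROM dbo.Items", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Typed getters avoid the per-row DataRow allocation and
                // boxing that make DataTable.Fill comparatively slow.
                result[reader.GetInt32(0)] = reader.GetString(1);
            }
        }
    }
    return result;
}
```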
You have to change the nature of the problem. Let's be honest: who needs to view 10,000 to 100,000 records per request? I think no one.
You should consider paging, and in your case paging should be done on SQL Server. To make this clear, let's say you have a stored procedure named "GetRecords". Modify it to accept a page parameter and return only the data relevant for that specific page (say, 100 records) plus the total page count. In the app, just show those 100 records (they will fly) and track the selected page index.
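A sketch of one way to do the server-side paging, using OFFSET/FETCH (available in SQL Server 2012 and later; on older versions ROW_NUMBER() serves the same purpose). The table, ordering column, and page size are assumptions:

```csharp
using System.Data;
using System.Data.SqlClient;

static DataTable GetPage(string connectionString, int pageIndex, int pageSize = 100)
{
    const string sql = @"
        SELECT Id, Name
        FROM dbo.Records
        ORDER BY Id
        OFFSET @Skip ROWS FETCH NEXT @Take ROWS ONLY;";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@Skip", pageIndex * pageSize);
        command.Parameters.AddWithValue("@Take", pageSize);

        var page = new DataTable();
        new SqlDataAdapter(command).Fill(page); // only ~100 rows cross the wire
        return page;
    }
}
```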
Hope this helps, best regards!
Do you often have to load these requests? If so, why not use a distributed cache?
I have a database that contains more than 100 million records. I am running a query that returns more than 10 million records. This process takes too much time, so I need to shorten it. I want to save the resulting record list as a CSV file. How can I do this as quickly and optimally as possible? Looking forward to your suggestions. Thanks.
I'm assuming that your query is already constrained to the rows/columns you need, and makes good use of indexing.
At that scale, the only critical thing is that you don't try to load it all into memory at once; so forget about things like DataTable, and most full-fat ORMs (which typically try to associate rows with an identity-manager and/or change-manager). You would have to use either the raw IDataReader (from DbCommand.ExecuteReader), or any API that builds a non-buffered iterator on top of that (there are several; I'm biased towards dapper). For the purposes of writing CSV, the raw data-reader is probably fine.
Beyond that: you can't make it go much faster, since you are bandwidth constrained. The only way you can get it faster is to create the CSV file at the database server, so that there is no network overhead.
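A sketch of the raw data-reader approach: stream each row straight to the file so nothing is buffered in memory. The query, output path, and quoting rules here are assumptions, not a complete CSV implementation:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

static void ExportCsv(string connectionString, string path)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT * FROM dbo.BigTable", connection))
    {
        connection.Open();
        // SequentialAccess streams large rows instead of buffering each one.
        using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
        using (var writer = new StreamWriter(path))
        {
            while (reader.Read())
            {
                var fields = new string[reader.FieldCount];
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    // InvariantCulture keeps numbers and dates portable.
                    string value = Convert.ToString(
                        reader.GetValue(i), CultureInfo.InvariantCulture) ?? "";
                    fields[i] = "\"" + value.Replace("\"", "\"\"") + "\"";
                }
                writer.WriteLine(string.Join(",", fields));
            }
        }
    }
}
```

Memory use stays flat regardless of row count, because only one row is ever materialized at a time.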
Chances are pretty slim you need to do this in C#. This is the domain of bulk data loading/exporting (commonly used in Data Warehousing scenarios).
Many (free) tools (I imagine even Toad by Quest Software) will do this more robustly and more efficiently than you can write it in any platform.
I have a hunch that you don't actually need this for an end-user (the simple observation is that the department secretary doesn't actually need to mail out copies of that; it is too large to be useful in that way).
I suggest using the right tool for the job. And whatever you do,
do not roll your own datatype conversions
use CSV with quoted literals, and think about escaping the double quotes inside them
think of regional options (IOW: always use InvariantCulture for export/import!)
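The last two points boil down to a small helper like this (a sketch, not a full CSV writer: quote every field, double embedded quotes, and format via InvariantCulture so a decimal exports as 1.5 rather than 1,5 on a European locale):

```csharp
using System;
using System.Globalization;

static string ToCsvField(object value)
{
    // InvariantCulture ignores regional number/date formats.
    string text = Convert.ToString(value, CultureInfo.InvariantCulture) ?? "";

    // RFC 4180-style quoting: wrap in quotes, double any embedded quotes.
    return "\"" + text.Replace("\"", "\"\"") + "\"";
}
```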
"This process takes too much time so I need to shorten this time."
This process consists of three sub-processes:
Retrieving > 10m records
Writing records to file
Transferring records across the network (my presumption is you are working with a local client against a remote database)
Any or all of those issues could be a bottleneck. So, if you want to reduce the total elapsed time you need to figure out where the time is spent. You will probably need to instrument your C# code to get the metrics.
If it turns out the query is the problem, then you will need to tune it. Indexes won't help here, as you're retrieving a large chunk of the table (> 10%), so what helps is increasing the performance of a full table scan: for instance, increasing memory to avoid disk sorts. Parallel query could be useful (if you have Enterprise Edition and sufficient CPUs). Also check that the problem isn't a hardware issue (spindle contention, dodgy interconnects, etc.).
Can writing to a file be the problem? Perhaps your disk is slow for some reason (e.g. fragmentation) or perhaps you're contending with other processes writing to the same directory.
Transferring large amounts of data across a network is obviously a potential bottleneck. Are you certain you're only sending relevant data to the client?
An alternative architecture: use PL/SQL to write the records to a file on the dataserver, using bulk collect to retrieve manageable batches of records, and then transfer the file to where you need it at the end, via FTP, perhaps compressing it first.
The real question is why you need to read so many rows from the database (and such a large proportion of the underlying dataset). There are lots of approaches which should make this scenario avoidable, obvious ones being synchronous processing, message queueing and pre-consolidation.
Leaving that aside for now...if you're consolidating the data or sifting it, then implementing the bulk of the logic in PL/SQL saves having to haul the data across the network (even if it's just to localhost, there's still a big overhead). Again if you just want to dump it out into a flat file, implementing this in C# isn't doing you any favours.
I have about 6500 files for a sum of about 17 GB of data, and this is the first time that I've had to move what I would call a large amount of data. The data is on a network drive, but the individual files are relatively small (max 7 MB).
I'm writing a program in C#, and I was wondering whether I would notice a significant difference in performance if I used BULK INSERT instead of SqlBulkCopy. The table on the server also has an extra column, so if I use BULK INSERT I'll have to use a format file and then run an UPDATE for each row.
I'm new to forums, so if there was a better way to ask this question feel free to mention that as well.
In my test, BULK INSERT was much faster. After an hour using SqlBulkCopy, I was maybe a quarter of the way through my data, and I had finished writing the alternative method (and had lunch). By the time I finished writing this post (~3 minutes), BULK INSERT was about a third of the way through.
For anyone who is looking at this as a reference, it is also worth mentioning that the upload is faster without a primary key.
It should be noted that a major cause of this could be that the server was a significantly more powerful computer, so this is not an analysis of the efficiency of the two approaches. However, I would still recommend using BULK INSERT, as the average server is probably significantly faster than the average desktop computer.
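For the record, BULK INSERT can still be driven from the C# service via an ordinary SqlCommand. A sketch (file paths, format file, and table name are placeholders; note the data file is opened by the SQL Server service account on the server side, not by this program):

```csharp
using System.Data.SqlClient;

static void BulkInsertFile(string connectionString)
{
    const string sql = @"
        BULK INSERT dbo.TargetTable
        FROM 'C:\data\import.csv'
        WITH (FORMATFILE = 'C:\data\import.fmt', TABLOCK);";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.CommandTimeout = 0; // large loads can exceed the 30 s default
        connection.Open();
        command.ExecuteNonQuery();
    }
}
```

The format file is what handles the extra column mentioned above, by mapping only the fields present in the data file.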
I'm dealing with chunks of data that are 50k rows each.
I'm inserting them into a SQL database using LINQ to SQL:
for (int i = 0; i < 50000; i++)
{
    DB.TableName.InsertOnSubmit(new TableName
    {
        Value1 = Array[i, 0],
        Value2 = Array[i, 1]
    });
}
DB.SubmitChanges();
This takes about 6 minutes, and I want it to take much less if possible. Any suggestions?
If you are reading from a file, you'd be better off using BULK INSERT (Transact-SQL); and if you are writing that much (50k rows) at one time from memory, you might be better off writing to a flat file first and then using BULK INSERT on that file.
Since you are doing a simple insert and not gaining much from LINQ to SQL, have a look at SqlBulkCopy. It will remove most of the round trips and reduce the overhead on the SQL Server side as well. You will have to make very few coding changes to use it.
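A sketch of what the change looks like, mirroring the names from the snippet above (the connection string and the column types are assumptions):

```csharp
using System.Data;
using System.Data.SqlClient;

static void BulkCopyArray(string[,] array, string connectionString)
{
    // Stage the rows in a DataTable with the destination's column names.
    var table = new DataTable();
    table.Columns.Add("Value1", typeof(string));
    table.Columns.Add("Value2", typeof(string));
    for (int i = 0; i < array.GetLength(0); i++)
        table.Rows.Add(array[i, 0], array[i, 1]);

    // One streamed operation replaces the 50,000 individual inserts.
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.TableName";
        bulk.WriteToServer(table);
    }
}
```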
Also look at pre-sorting your data by the column the table is indexed on, as this will lead to better cache hits when SQL Server updates the table.
Also consider uploading the data to an unindexed temp staging table, then using a stored procedure to insert into the main table with a single SQL statement. This may let SQL Server spread the indexing work over all your CPUs.
There are a lot of things you need to check/do.
How much disk space is allocated to the database? Is there enough free space to do all of the inserts without the file auto-growing? If not, increase the database file size up front, as otherwise the server has to stop every so many inserts to auto-resize the database.
do NOT do individual inserts. They take way too long. Instead use table-valued parameters (SQL 2008+), SQL bulk copy, or a single insert statement (in that order of preference).
drop any indexes on that table before the load and recreate them after. With that many inserts, they are probably going to be fragmented to hell anyway.
If you have any triggers, consider dropping them until the load is complete.
Do you have enough RAM available on the database server? Check on the server itself to see whether it's consuming ALL the available RAM. If so, you might consider a reboot prior to the load... SQL Server has a tendency to consume and hold on to everything it can get its hands on.
Along the same lines, we like to keep enough RAM in the server to hold the entire database in memory. I'm not sure whether that is feasible for you.
How is its disk speed? Is the queue depth pretty long? Other than hardware replacement, there's not much to be done here.
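To flesh out the table-valued parameter option from the list above, a sketch (it assumes a user-defined table type and a matching stored procedure already exist on the server, e.g. CREATE TYPE dbo.RowType AS TABLE (Value1 nvarchar(100), Value2 nvarchar(100)) and a proc dbo.InsertRows that selects from its @Rows parameter into the target table):

```csharp
using System.Data;
using System.Data.SqlClient;

static void InsertViaTvp(DataTable rows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.InsertRows", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        // The whole DataTable travels as one structured parameter,
        // so the server receives every row in a single call.
        var parameter = command.Parameters.AddWithValue("@Rows", rows);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.RowType"; // assumed table type name

        connection.Open();
        command.ExecuteNonQuery();
    }
}
```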