What is the fastest way to get a DataTable into SQL Server? - c#

I have a DataTable in memory that I need to dump straight into a SQL Server temp table.
After the data has been inserted, I transform it a little bit, and then insert a subset of those records into a permanent table.
The most time consuming part of this operation is getting the data into the temp table.
Now, I have to use temp tables, because more than one copy of this app is running at once, and I need a layer of isolation until the actual insert into the permanent table happens.
What is the fastest way to do a bulk insert from a C# DataTable into a SQL Temp Table?
I can't use any 3rd party tools for this, since I am transforming the data in memory.
My current method is to create a parameterized SqlCommand:
INSERT INTO #table (col1, col2, ... col200) VALUES (@col1, @col2, ... @col200)
and then for each row, clear and set the parameters and execute.
There has to be a more efficient way. I'm able to read and write the records on disk in a matter of seconds...

SqlBulkCopy will get the data in very fast.
I blogged not that long ago how to maximise performance. Some stats and examples in there. I compared 2 techniques, 1 using an SqlDataAdapter and 1 using SqlBulkCopy - bottom line was for bulk inserting 100K records, the data adapter approach took ~25 seconds compared to only ~0.8s for SqlBulkCopy.

You should use the SqlBulkCopy class.
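As a rough illustration of the SqlBulkCopy suggestion, here is a minimal sketch that loads a DataTable into a temp table. The helper name, the `#staging` table and its two columns are hypothetical placeholders, not taken from the question; it also assumes the DataTable column names match the temp table column names. The temp table is session-scoped, so it is created on the same open connection that SqlBulkCopy then uses.

```csharp
using System.Data;
using System.Data.SqlClient;

static class TempTableLoader
{
    // Hypothetical helper: bulk-copies `rows` into a temp table on `openConnection`.
    public static void Load(SqlConnection openConnection, DataTable rows)
    {
        // Assumed schema for illustration only; replace with the real columns.
        // The command has no parameters, so it runs as a plain batch and the
        // temp table stays alive for the rest of this connection's session.
        using (var create = new SqlCommand(
            "CREATE TABLE #staging (col1 INT, col2 NVARCHAR(50))", openConnection))
        {
            create.ExecuteNonQuery();
        }

        using (var bulk = new SqlBulkCopy(openConnection))
        {
            bulk.DestinationTableName = "#staging";
            bulk.BatchSize = 5000;      // tuning value, an assumption
            bulk.BulkCopyTimeout = 0;   // 0 means no time limit

            // Map by name so column order in the DataTable does not matter.
            foreach (DataColumn column in rows.Columns)
            {
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }

            bulk.WriteToServer(rows);
        }
    }
}
```

The transform and the insert into the permanent table would then run as ordinary commands on the same connection, before it is closed.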

Related

What is the fastest way to insert record in SQL Server from C#

I have 5 tables, for each table i want to delete data and insert 100 records.
all operation should happen in single transaction.
In real terms, 100 records is not much; if performance is your aim, you could try using SqlBulkCopy - this uses raw TDS, and supports transactions via the optional constructor argument. In terms of how to feed data in: SqlBulkCopy accepts either DataTable or a data reader; "FastMember" allows you to treat a List<T> or similar as a data-reader, if that helps (see example at the bottom of this page). However, you should be careful to actually time that against doing the same thing with regular parameterized TSQL operations, as for a number like 100 it isn't guaranteed that SqlBulkCopy would be faster.
An alternative to SqlBulkCopy would be "table valued parameters"; this requires more config, as the user type needs to be defined on the server etc., but it can work.
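A minimal sketch of the transactional SqlBulkCopy route described above, assuming each table's new rows are already in a DataTable whose column order matches the destination table. The class and method names are hypothetical, and the table names are concatenated into SQL, so they must come from trusted code, never from user input.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class TableRefresher
{
    // Hypothetical helper: for every (table name, rows) pair, deletes the old
    // rows and bulk-inserts the new ones, all inside a single transaction.
    public static void ReplaceAll(string connectionString,
                                  IDictionary<string, DataTable> tables)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (SqlTransaction transaction = connection.BeginTransaction())
            {
                try
                {
                    foreach (KeyValuePair<string, DataTable> entry in tables)
                    {
                        // Table name must be a trusted identifier (see lead-in).
                        using (var delete = new SqlCommand(
                            "DELETE FROM " + entry.Key, connection, transaction))
                        {
                            delete.ExecuteNonQuery();
                        }

                        // The transaction is passed via the constructor argument.
                        using (var bulk = new SqlBulkCopy(
                            connection, SqlBulkCopyOptions.Default, transaction))
                        {
                            bulk.DestinationTableName = entry.Key;
                            // No ColumnMappings: columns are matched by ordinal.
                            bulk.WriteToServer(entry.Value);
                        }
                    }
                    transaction.Commit();
                }
                catch
                {
                    transaction.Rollback();
                    throw;
                }
            }
        }
    }
}
```

For only 100 rows per table it is still worth timing this against plain parameterized inserts, as the answer notes.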

How to insert millions of data of different RDBMS in to SQL Server database with insert statement?

I have two databases in my SQL Server with each database containing 1 single table as of now.
I have 2 database like below :
1) Db1 (MySQL)
2) Db2 (Oracle)
Now what I want to do is fill my database table of SQL Server db1 with data from Db1 from MySQL like below :
Insert into Table1 select * from Table1
Select * from Table1(Mysql Db1) - Data coming from Mysql database
Insert into Table1(Sql server Db1) - Insert data coming from Mysql database considering same schema
I don't want to use SqlBulkCopy, because I don't want to insert the data chunk by chunk. I want to insert all of the data in one go, even with millions of rows, because my operation is not limited to inserting records into the database. Otherwise the user has to sit and wait a long time, first while millions of rows are inserted chunk by chunk, and then again for my further operation, which is also long-running.
So if I have this process speed up then I can have my second operation also speed up considering all records are in my 1 local sql server instance.
Is this possible to achieve in a C# application?
Update: I researched linked servers, as @GorDon Linoff suggested that a linked server can be used for this scenario, but based on my research it seems I cannot create a linked server through code.
I want to do this with the help of ado.net.
This is what I am trying to do exactly:
Consider I have 2 different client RDBMS with 2 database and some tables in client premises.
So database is like this :
Sql Server :
Db1
Order
Id Amount
1 100
2 200
3 300
4 400
Mysql or Oracle :
Db1:
Order
Id Amount
1 1000
2 2000
3 3000
4 400
Now I want to compare Amount column from source (SQL Server) to destination database (MySQL or Oracle).
I will join the tables from these 2 different RDBMS databases to compare the Amount columns.
In C# what I can do is fetch the records chunk by chunk into my DataTable (in memory) and then compare them in code, but this will take a long time with millions of records.
So I want to do something better than this.
Hence I was thinking that I would bring the records from these 2 RDBMSs into 2 databases on my local SQL Server instance, then write a join query on these 2 tables based on Id, and take advantage of the DBMS's processing capability, which can compare millions of records efficiently.
Query like this compares millions of records efficiently :
select SqlServer.Id, Mysql.Id, SqlServer.Amount, Mysql.Amount from SqlServerDb.dbo.[Order] as SqlServer
Left join MysqlDb.dbo.[Order] as Mysql on SqlServer.Id = Mysql.Id
where SqlServer.Amount != Mysql.Amount
The above query works when I have the data from these 2 different RDBMSs in my local server instance, in the databases SqlServerDb and MysqlDb, and it fetches the records whose amounts do not match.
So I am trying to get those records, from the source (SQL Server DB) to MySQL, whose Amount column value does not match.
Expected Output :
Id Amount
1 1000
2 2000
3 3000
So is there any way to achieve this scenario?
On the SELECT side, create a .csv file (tab-delimited) using SELECT ... INTO OUTFILE ...
On the INSERT side, use LOAD DATA INFILE ... (or whatever the target machine syntax is).
Doing it all at once may be easier to code than chunking, and may (or may not) be faster running.
SqlBulkCopy can accept either a DataTable or a System.Data.IDataReader as its input.
Using your query to read the source DB, set up an ADO.NET DataReader on the source MySQL or Oracle DB and pass the reader to the WriteToServer() method of SqlBulkCopy.
This can copy almost any number of rows without limit. I have copied hundreds of millions of rows using the data reader approach.
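The data reader approach can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the class and method names are hypothetical, the source connection is any already-open ADO.NET connection (for example from a MySQL or Oracle provider), and the source query is assumed to return columns in the same order as the destination table.

```csharp
using System.Data;
using System.Data.SqlClient;

static class StreamingCopy
{
    // Hypothetical helper: streams the result of `selectSql` from any ADO.NET
    // source connection straight into a SQL Server table.
    public static void Copy(IDbConnection openSource, string selectSql,
                            string targetConnectionString, string targetTable)
    {
        using (IDbCommand command = openSource.CreateCommand())
        {
            command.CommandText = selectSql;

            using (IDataReader reader = command.ExecuteReader())
            using (var bulk = new SqlBulkCopy(targetConnectionString))
            {
                bulk.DestinationTableName = targetTable;
                bulk.EnableStreaming = true; // stream rows from the reader
                bulk.BatchSize = 10000;      // tuning value, an assumption
                bulk.BulkCopyTimeout = 0;    // 0 means no time limit

                // Rows are pulled from the reader as they are sent, so the
                // full result set is never held in memory at once.
                bulk.WriteToServer(reader);
            }
        }
    }
}
```

Because the reader is consumed row by row, memory use should stay roughly flat regardless of how many rows the source query returns.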
What about adding a changed date in the remote database.
Then you could get all rows that have changed since the last sync and just compare those?
First of all, do not use a linked server. It is tempting, but it will be more trouble than it is worth. For example, updates and inserts will fetch all of the target DB to the source DB, do the insert/update, and then post all of the data back to the target.
As far as I understand you are trying to copy changed data to target system for some stuff.
I recommend using a timestamp column on source table. When anything changes on source table timestamp column is updated by sql server.
On the target, get the max ID and max timestamp: two queries at most.
On the source, rows where source.ID <= target.MaxID && source.timestamp >= target.MaxTimestamp are the rows that changed after the last sync (need update). Rows where source.ID > target.MaxID are the rows that were inserted after the last sync.
Now you do not have to compare two worlds, and you just got all updates and inserts.
You need to create a linked server connection using ODBC and the proper driver, after that you can execute the queries using openquery.
Take a look at openquery:
https://msdn.microsoft.com/en-us/library/ms188427(v=sql.120).aspx
Yes, SQL Server is very efficient when it's working with sets so let's keep that in play.
In a nutshell, what I'm pitching is
Load data from the source to a staging table on the target database (staging table = table to temporarily hold raw data from the source table, same structure as the source table... add tracking columns to taste). This will be done by your C# code... select from source_table into DataTable then SqlBulkCopy to the staging table.
Have a stored proc on the target database to reconcile the data between your target table and the staging table. Your C# code calls the stored proc.
Given that you're talking about millions of rows, another thing that can make things faster is dropping indices on the staging table before inserting to it and recreating those after the inserts and before any select is performed.

Importing large text file into sql database

I read txt files and save rows from these files to a local database. The problem is that the program reads 700 000 rows and it takes a long time to read the whole file. I use LINQ to SQL: first I read the row, then I split it into the Table object, and then I submit it to the DB.
For example The row has format
2014-03-01 00:08:02.380 00000000000001100111
This row is split into a DateTime and 20 columns (each column represents 1 channel (CH1 - CH20)).
Is there a better (faster) way?
You can use the FileHelpers http://filehelpers.sourceforge.net/ to feed directly into SqlBulkCopy. http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
That is by far the easiest and fastest approach.
You can still use Linq-2-sql for reads and non-batch writes, but for bulk inserts it is simply too slow.
That would be slow with linq to sql submitting that many items.
Bulk insert or bulk update would be preferable for this task, which you can't do with linq to sql. See also this post: bulk insert with linq-to-sql
I suggest something else than linq to sql for this task.
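One possible shape for the parsing side, sketched under assumptions: the line layout is taken from the sample row in the question (a 23-character timestamp, one space, then 20 digits), the leftmost digit is assumed to be CH1, and the class, method and column names are hypothetical. The resulting DataTable could then be handed to SqlBulkCopy.WriteToServer.

```csharp
using System;
using System.Data;
using System.Globalization;

static class ChannelLineParser
{
    // Builds an empty table: one DateTime column plus CH1..CH20 as booleans.
    public static DataTable CreateTable()
    {
        var table = new DataTable();
        table.Columns.Add("Timestamp", typeof(DateTime));
        for (int channel = 1; channel <= 20; channel++)
        {
            table.Columns.Add("CH" + channel, typeof(bool));
        }
        return table;
    }

    // Parses one line such as "2014-03-01 00:08:02.380 00000000000001100111"
    // and appends it to the table.
    public static void AddLine(DataTable table, string line)
    {
        // Characters 0..22 hold the timestamp "yyyy-MM-dd HH:mm:ss.fff".
        DateTime stamp = DateTime.ParseExact(
            line.Substring(0, 23), "yyyy-MM-dd HH:mm:ss.fff",
            CultureInfo.InvariantCulture);

        // Character 23 is the separator; the 20 channel digits follow.
        string bits = line.Substring(24, 20);

        DataRow row = table.NewRow();
        row["Timestamp"] = stamp;
        for (int i = 0; i < 20; i++)
        {
            // Assumption: leftmost digit is CH1, rightmost is CH20.
            row["CH" + (i + 1)] = bits[i] == '1';
        }
        table.Rows.Add(row);
    }
}
```

For very large files, filling and flushing the table in batches (say, every few tens of thousands of lines) keeps memory bounded.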

Efficient insert statement

I'm looking for an efficient way of inserting records into SQL server for my C#/MVC application. Anyone know what the best method would be?
Normally I've just done a while loop and insert statement within, but then again I've not had quite so many records to deal with. I need to insert around half a million, and at 300 rows a minute with the while loop, I'll be here all day!
What I'm doing is looping through a large holding table, and using its rows to create records in a different table. I've set up some functions for lookup data which is necessary for the new table, and this is no doubt adding to the drain.
So here is the query I have. Extremely inefficient for large amounts of data!
Declare @HoldingID int
Set @HoldingID = (Select min(HoldingID) From Holding)
While @HoldingID IS NOT NULL
Begin
Insert Into Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
Where HoldingID = @HoldingID
Set @HoldingID = (Select MIN(HoldingID) From Holding Where HoldingID > @HoldingID)
End
I've heard about set-based approaches - is there anything that might work for the above problem?
If you want to insert a lot of data into a MSSQL Server then you should use BULK INSERTs - there is a command line tool called the bcp utility for this, and also a C# wrapper for performing Bulk Copy Operations, but under the covers they are all using BULK INSERT.
Depending on your application you may want to insert your data into a staging table first, and then either MERGE or INSERT INTO SELECT... to transfer those rows from the staging table(s) to the target table(s) - if you have a lot of data then this will take some time, however will be a lot quicker than performing the inserts individually.
If you want to speed this up then there are various things that you can do, such as changing the recovery model or tweaking / removing triggers and indexes (depending on whether or not this is a live database). If it's still really slow then you should look into doing this process in batches (e.g. 1000 rows at a time).
This should be exactly what you are doing now.
Insert Into Journeys(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
ORDER BY HoldingID ASC
You are (probably) able to do it in one statement of the form
INSERT INTO JOURNEYS
SELECT * FROM HOLDING;
Without more information about your schema it is difficult to be absolutely sure.
SQLServer 2008 introduced Table Parameters. These allow you to insert multiple rows in a single trip to the database (send it as a large blob). Without using a temporary table. This article describes how it works (step four in the article)
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
It differs from bulk inserts in that you do not need special utilities and that all constraints and foreign keys are checked.
I quadrupled my throughput using this and parallelizing the inserts. Now at 15.000 inserts/second in the same table sustained. Regular table with indexes and over a billion rows.
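A minimal sketch of passing a table-valued parameter from C#, applied to the Journeys example from the question. It assumes a table type has already been created on the server; the type name `dbo.JourneyRow`, its column types, and the helper name are hypothetical. The DataTable columns must be in the same order as the columns of the table type.

```csharp
using System.Data;
using System.Data.SqlClient;

static class JourneyInserter
{
    // Assumes this type exists on the server (hypothetical definition):
    //   CREATE TYPE dbo.JourneyRow AS TABLE
    //   (DepartureID INT, ArrivalID INT, ProviderID INT,
    //    JourneyNumber NVARCHAR(20));
    public static int Insert(SqlConnection openConnection, DataTable rows)
    {
        const string sql =
            "INSERT INTO Journeys " +
            "(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active) " +
            "SELECT DepartureID, ArrivalID, ProviderID, JourneyNumber, 1 " +
            "FROM @rows";

        using (var command = new SqlCommand(sql, openConnection))
        {
            // Structured marks the parameter as table-valued; TypeName must
            // name the server-side table type.
            SqlParameter parameter =
                command.Parameters.Add("@rows", SqlDbType.Structured);
            parameter.TypeName = "dbo.JourneyRow";
            parameter.Value = rows;

            // All rows travel to the server in a single round trip.
            return command.ExecuteNonQuery();
        }
    }
}
```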

fastest Import of a csv to a database table

I have implemented import functionality which takes data from a csv file in an ASP.NET application. The size of the file can vary from a few KB to a max of 10 MB.
However, when an import occurs and the file size is > 50000, it takes around 20 minutes,
which is way too much time. I need to perform an import of around 300000 records within a timespan of 2-3 minutes.
I know that the import to a database also depends on the physical memory of the db server. I create insert scripts in bulk and execute them. I also know that using SqlBulkCopy would be another option, but in my case it is not just the inserting of products that takes place but also updates and deletes; there is a field called "FUNCTION CODE" which decides whether to insert, update or delete.
Any suggestions regarding as to how to go about this would be greatly appreciated.
One approach towards this would be to implement multiple threads which carry out processes simultaneously, but I have never implemented threading to date and hence am not aware of the complications I would incur by implementing it.
Thanks & Regards,
Francis P.
SqlBulkCopy is definitely going to be fastest. I would approach this by inserting the data into a temporary table on the database. Once the data is in the temp table, you could use SQL to merge/insert/delete accordingly.
I guess you are using SQL Server...
If you are using 2005/2008 consider using SSIS to process the file. Technet
Importing huge amount of data within the asp.net process is not the best thing you can do. You might upload the file and start a process that is doing the magic for you.
If this is a repeated process and the file is uploaded via asp.net, plus you are doing some decision making on the data to decide whether to insert, update or delete, then try out the fast csv reader at http://www.codeproject.com/KB/database/CsvReader.aspx. It's quite quick and economical with memory.
You are doing all your database queries with 1 connection sequentially. So for every insert/update/delete you are sending the command through the wire, waiting for the db to do its thing, and then waking up again when something is sent back.
Databases are optimized for heavy parallel access. So there are 2 easy routes for a significant speedup:
Open X connections to the database (where you have to tweak X but just start with 5) and either: spin up 5 threads who each do a chunk of the same work you were doing.
or: use asynchronous calls and when a callback arrives shoot in the next query.
I suggest using the XML functionality in SQL Server 2005/2008, which will allow you to bulk insert and bulk update. I'd take the following approach:
Process the entire file into an in-memory data structure.
Create a single XML document from this structure to pass to a stored proc.
Create a stored proc to load data from the XML document into a temporary table, then perform the inserts and updates. See below for guidance on creating the stored proc.
There are numerous advantages to this approach:
The entire operation is completed in one database call, although if your dataset is really large you may want to batch it.
You can wrap all the database writes into a single transaction easily, and roll back if anything fails.
You are not using any dynamic SQL, which could have posed a security risk.
You can return the IDs of the inserted, updated and/or deleted records using the OUTPUT clause.
In terms of the stored proc you will need, something like the following should work:
CREATE PROCEDURE MyBulkUpdater
(
@p_XmlData VARCHAR(MAX)
)
AS
DECLARE @hDoc INT
EXEC sp_xml_preparedocument @hDoc OUTPUT, @p_XmlData
-- Temporary table, should contain the same schema as the table you want to update
CREATE TABLE #MyTempTable
(
-- ...
)
INSERT INTO #MyTempTable
(
[Field1],
[Field2]
)
SELECT
XMLData.Field1,
XMLData.Field2
FROM OPENXML (@hDoc, 'ROOT/MyRealTable', 1)
WITH
(
[Field1] int,
[Field2] varchar(50),
[__ORDERBY] int
) AS XMLData
EXEC sp_xml_removedocument @hDoc
Now you can simply insert, update and delete your real table from your temporary table as required, e.g.
INSERT INTO MyRealTable (Field1, Field2)
SELECT Field1, Field2
FROM #MyTempTable
WHERE ...
UPDATE rt
SET rt.Field2 = tt.Field2
FROM MyRealTable rt
JOIN #MyTempTable tt ON tt.Field1 = rt.Field1
WHERE ...
For an example of the XML you need to pass in, you can do:
SELECT TOP 1 *, 0 AS __ORDERBY FROM MyRealTable AS MyRealTable FOR XML AUTO, ROOT('ROOT')
For more info, see OPENXML, sp_xml_preparedocument and sp_xml_removedocument.
