Reducing INSERT statement time when dealing with large amounts of data - C#

I read about SqlBulkCopy and how it can reduce the time needed to insert large numbers of rows. My scenario is: I have an Excel file which I convert into a DataTable, then I send this DataTable to a stored procedure (whose code I can't change) that inserts all the rows from the DataTable into a SQL table in the database.
The problem is that I have around 10,000 to 50,000 rows to insert. Is there any workaround to reduce the time taken by the stored procedure?

The best way to do this would be to use SqlBulkCopy to load the data into a temporary table and then feed that data into the stored proc. You will need to write some SQL code to do the processing, but the performance benefits of doing it this way should be worth the effort.
If you create a new stored proc, you get the added benefit of running all of this code inside the database engine, so you will not be switching back and forth between your application and the DB engine.
Some code:
var importData = new DataSet();
xmlData.Position = 0;
importData.ReadXml(xmlData);

using (var connection = new SqlConnection(myConnectionString))
{
    connection.Open();
    using (var trans = connection.BeginTransaction())
    {
        using (var sbc = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, trans) { DestinationTableName = myTableName })
        {
            foreach (DataColumn col in importData.Tables[0].Columns)
            {
                sbc.ColumnMappings.Add(col.ColumnName, col.ColumnName);
            }

            sbc.WriteToServer(importData.Tables[0]); // table 0 is the main table in this dataset

            // Now let's call the stored proc. The command must be enlisted
            // in the open transaction, or ADO.NET will throw.
            var cmd = new SqlCommand("ProcessDataImport", connection)
            {
                CommandType = CommandType.StoredProcedure,
                Transaction = trans
            };
            cmd.CommandTimeout = 1200;
            cmd.ExecuteNonQuery();
            trans.Commit();
        }
        connection.Close();
        return null;
    }
}
where xmlData is a stream with the XML data matching your bulk import, and myTableName contains the table you want to import into. Remember, when doing a bulk copy, the column names must match 100%. Case is important too.
The proc would look something like this:
CREATE PROCEDURE [ProcessDataImport]
AS
BEGIN
    DECLARE @IMPORTCOL INT

    WHILE EXISTS (SELECT 1 FROM TEMPTABLE)
    BEGIN
        SELECT @IMPORTCOL = (SELECT TOP 1 COLUMN1 FROM TEMPTABLE)
        EXEC DOTHEIMPORT @IMPORTCOL
        DELETE FROM TEMPTABLE WHERE COLUMN1 = @IMPORTCOL
    END
END
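One detail the code above glosses over: the staging table must exist before SqlBulkCopy can write to it. A minimal sketch of creating it up front, assuming a staging table named TEMPTABLE (the column names and types here are illustrative and must mirror your DataTable):

// Hypothetical sketch: create the staging table before the bulk copy runs.
// dbo.TEMPTABLE and its columns are assumptions; match them to your DataTable.
using (var create = new SqlCommand(
    @"IF OBJECT_ID('dbo.TEMPTABLE') IS NULL
          CREATE TABLE dbo.TEMPTABLE (COLUMN1 INT, COLUMN2 NVARCHAR(100));",
    connection))
{
    create.ExecuteNonQuery();
}

Run this after opening the connection and before the bulk copy, so the WriteToServer call has a destination to land in.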

Related

Why does my SQL update for 20,000 records take over 5 minutes?

I have a piece of C# code which updates two specific columns for ~1000x20 records in a database on the localhost. As far as I know (though I am really far from being a database expert), it should not take long, but it takes more than 5 minutes.
I tried SQL transactions, with no luck. SqlBulkCopy seems a bit of an overkill, since it's a large table with dozens of columns and I only have to update one or two columns for a set of records, so I would like to keep it simple. Is there a better approach to improve efficiency?
The code itself:
public static bool UpdatePlayers(List<Match> matches)
{
    using (var connection = new SqlConnection(Database.myConnectionString))
    {
        connection.Open();
        SqlCommand cmd = connection.CreateCommand();
        foreach (Match m in matches)
        {
            cmd.CommandText = "";
            foreach (Player p in m.Players)
            {
                // Some player specific calculation, which takes almost no time.
                p.Morale = SomeSpecificCalculationWhichMilisecond();
                p.Condition = SomeSpecificCalculationWhichMilisecond();
                cmd.CommandText += "UPDATE [Players] SET [Morale] = @morale, [Condition] = @condition WHERE [ID] = @id;";
                cmd.Parameters.AddWithValue("@morale", p.Morale);
                cmd.Parameters.AddWithValue("@condition", p.Condition);
                cmd.Parameters.AddWithValue("@id", p.ID);
            }
            cmd.ExecuteNonQuery();
        }
    }
    return true;
}
Updating 20,000 records one at a time is a slow process, so taking over 5 minutes is to be expected.
Based on your query, I would suggest putting the data into a temp table and then joining the temp table in the update. That way the table being updated only has to be scanned once, updating all values in a single statement.
Note: it could still take a while to do the update if you have indexes on the fields you are updating and/or there is a large amount of data in the table.
Example update query:
UPDATE P
SET [Morale] = TT.[Morale], [Condition] = TT.[Condition]
FROM [Players] AS P
INNER JOIN #TempTable AS TT ON TT.[ID] = P.[ID];
Populating the temp table
How to get the data into the temp table is up to you. I suspect you could use SqlBulkCopy but you might have to put it into an actual table, then delete the table once you are done.
If possible, I recommend putting a Primary Key on the ID column in the temp table. This may speed up the update process by making it faster to find the related ID in the temp table.
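To sketch that idea end to end, assuming a pre-created staging table dbo.PlayerUpdates with int columns (both the table name and the types are assumptions):

// Sketch: bulk-copy the new values into a staging table, then run the
// joined UPDATE once. dbo.PlayerUpdates and the int types are assumptions.
var staging = new DataTable();
staging.Columns.Add("ID", typeof(int));
staging.Columns.Add("Morale", typeof(int));
staging.Columns.Add("Condition", typeof(int));
foreach (Match m in matches)
    foreach (Player p in m.Players)
        staging.Rows.Add(p.ID, p.Morale, p.Condition);

using (var connection = new SqlConnection(Database.myConnectionString))
{
    connection.Open();
    using (var sbc = new SqlBulkCopy(connection) { DestinationTableName = "dbo.PlayerUpdates" })
    {
        sbc.WriteToServer(staging);
    }

    var updateSql = @"UPDATE P
                      SET [Morale] = TT.[Morale], [Condition] = TT.[Condition]
                      FROM [Players] AS P
                      INNER JOIN dbo.PlayerUpdates AS TT ON TT.[ID] = P.[ID];
                      TRUNCATE TABLE dbo.PlayerUpdates;";
    using (var update = new SqlCommand(updateSql, connection))
    {
        update.ExecuteNonQuery();
    }
}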
Minor improvements:
use a StringBuilder for the command text
ensure your parameter names are actually unique
clear your parameters before the next use
depending on how many players are in each match, batch N commands together rather than one match at a time (these are combined in the sketch below)
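A rough sketch of those minor improvements combined, replacing the body of the outer loop in the question's code (batching one match per command here is purely illustrative):

// Sketch: unique parameter names per row, a StringBuilder for the text,
// and parameters cleared between batches. Requires using System.Text;
var sb = new StringBuilder();
cmd.Parameters.Clear();
int i = 0;
foreach (Player p in m.Players)
{
    sb.AppendFormat(
        "UPDATE [Players] SET [Morale] = @morale{0}, [Condition] = @condition{0} WHERE [ID] = @id{0};", i);
    cmd.Parameters.AddWithValue("@morale" + i, p.Morale);
    cmd.Parameters.AddWithValue("@condition" + i, p.Condition);
    cmd.Parameters.AddWithValue("@id" + i, p.ID);
    i++;
}
cmd.CommandText = sb.ToString();
cmd.ExecuteNonQuery();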
Bigger improvement:
use a table-valued parameter and a MERGE SQL statement, which should look something like this (untested):
CREATE TYPE [MoraleUpdate] AS TABLE (
    [Id] ...,
    [Morale] ...,
    [Condition] ...
)
GO
MERGE [dbo].[Players] AS [Target]
USING @Updates AS [Source]
ON [Target].[Id] = [Source].[Id]
WHEN MATCHED THEN
    UPDATE SET [Morale] = [Source].[Morale],
               [Condition] = [Source].[Condition];
(Note that the type's column order matches the DataTable below; a DataTable passed as a table-valued parameter maps columns by ordinal position, not by name.)
DataTable dt = new DataTable();
dt.Columns.Add("Id", typeof(...));
dt.Columns.Add("Morale", typeof(...));
dt.Columns.Add("Condition", typeof(...));

foreach (...)
{
    dt.Rows.Add(p.Id, p.Morale, p.Condition);
}

SqlParameter sqlParam = cmd.Parameters.AddWithValue("@Updates", dt);
sqlParam.SqlDbType = SqlDbType.Structured;
sqlParam.TypeName = "dbo.MoraleUpdate";
cmd.ExecuteNonQuery();
You could also implement a DbDataReader to stream the values to the server while you are calculating them.
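As a sketch of the streaming idea: an IEnumerable<SqlDataRecord> is the lighter-weight alternative to implementing a full custom DbDataReader, and a structured parameter accepts either. The int column types are assumptions standing in for the elided type definition above:

// Sketch: yield TVP rows one at a time so values are computed and streamed
// without building a DataTable first. Requires using System.Data,
// System.Collections.Generic and Microsoft.SqlServer.Server.
static IEnumerable<SqlDataRecord> StreamUpdates(IEnumerable<Player> players)
{
    var meta = new[]
    {
        new SqlMetaData("Id", SqlDbType.Int),        // types assumed int
        new SqlMetaData("Morale", SqlDbType.Int),
        new SqlMetaData("Condition", SqlDbType.Int)
    };

    foreach (var p in players)
    {
        var record = new SqlDataRecord(meta);
        record.SetInt32(0, p.Id);
        record.SetInt32(1, p.Morale);    // computed just before sending
        record.SetInt32(2, p.Condition);
        yield return record;
    }
}

// Usage: sqlParam.Value = StreamUpdates(allPlayers); instead of the DataTable.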

Select after bulk insert doesn't work in MS Access database

There is a Windows Forms application. I am using an MS Access database for some data manipulation. I want to copy data from one database to another. The table name, schema, and data types are the same in both tables.
I am using the query below to bulk insert data into the destination database by selecting data from the source database.
INSERT INTO [Table1] IN 'C:\Data\Users.mdf' SELECT * FROM [Table1]
After the data is inserted, I query the target table to fetch the inserted data. I am using OleDbConnection for the database operations.
The issue I am facing is that when I execute the SELECT statement right after the INSERT query, I don't get the data. However, when I check in debugging mode, I do get the data.
I noticed that if I wait for some time after the INSERT statement is executed, the data comes back correctly. So I assume the bulk insert operation needs some time (a delay?) to complete.
I tried adding Task.Delay(20000) after the INSERT query execution, but no luck. Could someone help me resolve this issue? Any help is highly appreciated.
I didn't find a good way to handle this, but I did work around it. After the data is inserted into the table, I fire another query to check whether there is any data in the target table or not. This happens in a do..while loop as follows. The table is dropped every time the operation is completed.
var insertQuery = @"INSERT INTO [Table1] IN 'C:\Data\Users.mdf' SELECT * FROM [Table1]";
ExecuteQuery(insertQuery, connProd);

var count = 10;
do
{
    var selectQuery = "SELECT TOP 1 * FROM " + tableProdCopy;
    var dtTopRowData = GetQueryData(selectQuery, connOther);
    if (dtTopRowData != null && dtTopRowData.Rows.Count > 0)
    {
        count = 0;
        break;
    }
    System.Threading.Thread.Sleep(2000);
    count = count - 1;
} while (count > 0);
private DataTable GetQueryData(string query, OleDbConnection conn)
{
    using (OleDbCommand cmdOutput = new OleDbCommand(query, conn))
    {
        using (OleDbDataAdapter adapterOutput = new OleDbDataAdapter(cmdOutput))
        {
            var dtOutput = new DataTable();
            adapterOutput.Fill(dtOutput);
            return dtOutput;
        }
    }
}

private void ExecuteQuery(string query, OleDbConnection conn)
{
    using (OleDbCommand cmdInput = new OleDbCommand(query, conn))
    {
        cmdInput.ExecuteNonQuery();
    }
}

Parameterize insert of multiple rows

Is there any way to parameterize an SQL INSERT statement (in C#) which inserts multiple rows? Currently I can only think of one way to generate a statement that inserts multiple rows, but it is quite open to SQL injection:
string sql = " INSERT INTO my_table"
    + " (a, b, c)"
    + " VALUES";

// Add each row of values to the statement
foreach (var item in collection) {
    sql = sql
        + String.Format(" ({0}, {1}, {2}),",
            aVal, bVal, cVal);
}

// Remove the excessive comma
sql = sql.Remove(sql.Length - 1);
What is the smarter/safer way to do this?
You could add parameters inside the loop, like:
using (var comm = new SqlCommand()) {
    var counter = 0;
    foreach (var item in collection) {
        sql = sql + String.Format(" (@a{0}, @b{0}, @c{0}),", counter);
        comm.Parameters.AddWithValue("@a" + counter, aVal);
        comm.Parameters.AddWithValue("@b" + counter, bVal);
        comm.Parameters.AddWithValue("@c" + counter, cVal);
        counter++;
    }
}
But I really wouldn't do a multi-row insert like this. IIRC the maximum number of parameters in a query is about 2100, and this could get very big very fast. Since you're looping through a collection anyway, you could just send each row to the database inside your loop, something like:
using (var con = new SqlConnection("connectionString here"))
{
    con.Open();
    var sql = "INSERT INTO my_table (a, b, c) VALUES (@a, @b, @c);";
    using (var comm = new SqlCommand(sql, con))
    {
        comm.Parameters.Add("@a", SqlDbType.Int);
        comm.Parameters.Add("@b", SqlDbType.NVarChar);
        comm.Parameters.Add("@c", SqlDbType.Int);
        foreach (var item in collection)
        {
            comm.Parameters["@a"].Value = aVal;
            comm.Parameters["@b"].Value = bVal;
            comm.Parameters["@b"].Size = bVal.Length;
            comm.Parameters["@c"].Value = cVal;
            comm.ExecuteNonQuery();
        }
    }
}
The statement is prepared only once (and is faster than a huge statement with hundreds of parameters), and it doesn't fail all records when one record fails (add some exception handling for that). If you do want all records to fail when one record fails, you could wrap the whole thing in a transaction.
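For example, a minimal sketch of that transactional variant (the parameter setup and loop body are the same as above, elided here):

// Sketch: same per-row inserts, wrapped in a transaction so that a single
// failure rolls back the whole batch.
using (var con = new SqlConnection("connectionString here"))
{
    con.Open();
    using (var tran = con.BeginTransaction())
    using (var comm = new SqlCommand(
        "INSERT INTO my_table (a, b, c) VALUES (@a, @b, @c);", con, tran))
    {
        try
        {
            // ... add the parameters and loop over the collection as above ...
            tran.Commit();
        }
        catch
        {
            tran.Rollback(); // nothing is inserted if any row failed
            throw;
        }
    }
}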
Edit:
Of course, when you regularly have to insert thousands of rows, this approach isn't the most efficient either, and your DBA might start to complain.
There are other approaches to this problem to remove the strain from the database: for example, create a stored procedure in your database that will insert the data from an xml document, or use Table Valued Parameters.
NYCdotNet wrote 2 nice blogs about these options, which I won't recreate here, but they're worth exploring (I'll paste some code below from the blog, as per guidelines, but credit where it's due: NYCdotNet)
XML document approach
Table Valued Parameters
The "meat" from the blog about TVP (in VB.NET but that shouldn't matter):
So I created this "generic" table-valued type:
CREATE TYPE dbo.UniqueIntegerList AS TABLE
(
TheInteger INT NOT NULL
PRIMARY KEY (TheInteger)
);
Creating the Save Stored Procedure
Next, I created a new stored procedure which would accept my new Table-Valued Type as a parameter.
CREATE PROC DoTableValuedParameterInsert(@ProductIDs dbo.UniqueIntegerList READONLY)
AS BEGIN
    INSERT INTO ProductsAccess(ProductID)
    SELECT TheInteger AS [ProductID]
    FROM @ProductIDs;
END
In this procedure, I am passing in a parameter called @ProductIDs. This is of type "dbo.UniqueIntegerList", which I just created in the previous step. SQL Server looks at this and says "oh, I know what this is - this type is actually a table". Since it knows that the UniqueIntegerList type is a table, I can select from it just like I could select from any other table-valued variable. You have to mark the parameter as READONLY because SQL 2008 doesn't support updating and returning a passed table-valued parameter.
Creating the Save Routine
Then I had to create a new save routine on my business object that would call the new stored procedure. The way you prepare the Table-Valued parameter is to create a DataTable object with the same column signature as the Table-Valued type, populate it, and then pass it inside a SqlParameter object as SqlDbType.Structured.
Public Sub SaveViaTableValuedParameter()

    'Prepare the Table-valued Parameter'
    Dim objUniqueIntegerList As New DataTable
    Dim objColumn As DataColumn = objUniqueIntegerList.Columns.Add("TheInteger", _
        System.Type.GetType("System.Int32"))
    objColumn.Unique = True

    'Populate the Table-valued Parameter with the data to save'
    For Each Item As Product In Me.Values
        objUniqueIntegerList.Rows.Add(Item.ProductID)
    Next

    'Connect to the DB and save it.'
    Using objConn As New SqlConnection(DBConnectionString())
        objConn.Open()
        Using objCmd As New SqlCommand("dbo.DoTableValuedParameterInsert")
            objCmd.CommandType = CommandType.StoredProcedure
            objCmd.Connection = objConn
            objCmd.Parameters.Add("ProductIDs", SqlDbType.Structured)
            objCmd.Parameters(0).Value = objUniqueIntegerList
            objCmd.ExecuteNonQuery()
        End Using
        objConn.Close()
    End Using

End Sub

SQL Server - Update table and return the Updated rows

I have a SQL Server database which has a lot of information inside.
I want to select the top 50 rows in a single query (which I did, with no problem), but then I want to update a column from false to true so that next time I select, I won't get the same rows. My code looks like this:
string Command = "UPDATE HubCommands SET [Alreadytaken] = 'true' FROM (SELECT TOP 50 [CommandId],[DeviceId],[Commandtext], [HashCommand],[UserId] FROM HubCommands) I WHERE [HubId] = '18353fe9-82fd-4ac2-a078-51c199d9072b'";

using (SqlConnection myConnection = new SqlConnection(SqlConnection))
{
    using (SqlDataAdapter myDataAdapter = new SqlDataAdapter(Command, myConnection))
    {
        DataTable dtResult = new DataTable();
        myDataAdapter.Fill(dtResult);

        foreach (DataRow row in dtResult.Rows)
        {
            Guid CommandId, DeviceId, UserId;
            Guid.TryParse(row["CommandId"].ToString(), out CommandId);
            Guid.TryParse(row["DeviceId"].ToString(), out DeviceId);
            Guid.TryParse(row["UserId"].ToString(), out UserId);

            Console.WriteLine("CommandId" + CommandId);
        }
    }
}
This code does work, and it updates what I ask it to update, but I get nothing back in the data table; it's like it is always updating but never selecting.
If I do a plain SELECT, it does return data.
Does anyone have any idea how to update and get some data back, in a single query?
So your question is:
How can I update a table in SQL Server using C# and return the truly updated rows as a DataTable?
First, you have multiple issues in your query: you should use 1 and 0, not 'true' or 'false'. SQL Server has a bit data type, not a Boolean.
Second, this is how you should've constructed your query:
DECLARE @IDs TABLE
(
    [CommandId] uniqueidentifier
);

INSERT INTO @IDs
SELECT TOP 50 [CommandId] FROM HubCommands
WHERE [HubId] = '18353fe9-82fd-4ac2-a078-51c199d9072b' AND [Alreadytaken] = 0;

UPDATE HubCommands
SET [Alreadytaken] = 1
WHERE CommandId IN
(
    SELECT [CommandId] FROM @IDs
);

SELECT * FROM HubCommands
WHERE CommandId IN
(
    SELECT [CommandId] FROM @IDs
);
Wrap all of the above in a single string and use a SqlDataReader. There is no need for an adapter in your case (since we're mixing commands, unlike what the adapter usually does):
var sqlCommand = new SqlCommand(Command, myConnection);
SqlDataReader dataReader = sqlCommand.ExecuteReader();
DataTable dtResult = new DataTable();
dtResult.Load(dataReader);
I highly advise you to create a stored procedure accepting HubId as a parameter that does all the above work. It is neater and better for maintenance.
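A sketch of the C# side of that, assuming the batch above is wrapped in a hypothetical procedure MarkAndGetHubCommands with @HubId as its parameter:

// Hypothetical sketch: the T-SQL batch above moved into a stored procedure
// MarkAndGetHubCommands(@HubId uniqueidentifier); names are assumptions.
using (var myConnection = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("MarkAndGetHubCommands", myConnection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@HubId", SqlDbType.UniqueIdentifier).Value =
        Guid.Parse("18353fe9-82fd-4ac2-a078-51c199d9072b");

    myConnection.Open();
    var dtResult = new DataTable();
    dtResult.Load(cmd.ExecuteReader());
}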

Dynamically print number of rows updated

The company that I work for has large databases, millions of records in a single table. I have written a C# program that migrates tables between remote servers.
I first create all the tables using SMO without copying data and then the data insertion is done after all the tables have been created.
During the record insertion, since there are so many records, the console window remains blank until all the rows have been inserted. Due to the sheer volume of data this takes a long time.
What I want now is a way to print "n rows updated", like the MSSQL Import/Export Data wizard does.
The insert part is just a simple INSERT INTO ... SELECT * query.
It sounds like you might be using SqlCommands; if so, here is a sample:
using (SqlConnection connection = new SqlConnection(Connection.ConnectionString))
{
    using (SqlCommand command = new SqlCommand("insert into OldCustomers select * from customers", connection))
    {
        connection.Open();
        var numRows = command.ExecuteNonQuery();
        Console.WriteLine("Affected Rows: {0}", numRows);
    }
}
You definitely need to look at the OUTPUT clause. There are useful examples on MSDN.
using (SqlConnection conn = new SqlConnection(connectionStr))
{
    var sqlCmd = @"
        CREATE TABLE #tmp (
            InsertedId BIGINT
        );
        INSERT INTO TestTable
        OUTPUT Inserted.Id INTO #tmp
        VALUES ....
        SELECT COUNT(*) FROM #tmp;";

    using (SqlCommand cmd = new SqlCommand(sqlCmd, conn))
    {
        conn.Open();
        var numRows = cmd.ExecuteScalar(); // reads the COUNT(*) from #tmp
        Console.WriteLine("Affected Rows: {0}", numRows);
    }
}
I also suggest using a stored procedure for such purposes.
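Neither snippet above prints running progress the way the wizard does. If you want periodic output, one hedged option is to copy in batches and report after each one; this sketch assumes a unique key column Id on both tables (the names, batch size, and open connection are illustrative):

// Hypothetical sketch: batched INSERT ... SELECT with progress output.
// Assumes "connection" is an open SqlConnection and Id is a unique key.
int total = 0, batchRows;
do
{
    using (var cmd = new SqlCommand(@"
        INSERT INTO OldCustomers
        SELECT TOP (10000) c.*
        FROM customers c
        WHERE NOT EXISTS (SELECT 1 FROM OldCustomers o WHERE o.Id = c.Id);",
        connection))
    {
        cmd.CommandTimeout = 600;
        batchRows = cmd.ExecuteNonQuery();
    }

    total += batchRows;
    Console.WriteLine("{0} rows inserted so far...", total);
} while (batchRows > 0);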
