I need to update a very large table periodically, and SqlBulkCopy is perfect for that, except that I have a two-column unique index that prevents duplicates. Is there a way to use SqlBulkCopy as "insert or update if exists"?
If not, what is the most efficient way of doing so? Again, I am talking about a table with millions of records.
Thank you
I published a nuget package (SqlBulkTools) to solve this problem.
Here's a code example that would achieve a bulk upsert.
var bulk = new BulkOperations();
var books = GetBooks();
using (TransactionScope trans = new TransactionScope())
{
    using (SqlConnection conn = new SqlConnection(ConfigurationManager
        .ConnectionStrings["SqlBulkToolsTest"].ConnectionString))
    {
        bulk.Setup<Book>()
            .ForCollection(books)
            .WithTable("Books")
            .AddAllColumns()
            .BulkInsertOrUpdate()
            .MatchTargetOn(x => x.ISBN)
            .Commit(conn);
    }
    trans.Complete();
}
For very large tables, there are options to add table locks and temporarily disable non-clustered indexes. See SqlBulkTools Documentation for more examples.
I would bulk load data into a temporary staging table, then do an upsert into the final table. See here for an example of doing an upsert.
Not in one step, but in SQL Server 2008, you could:
bulk load into staging table
apply a MERGE statement to update/insert into your real table
Read more about the MERGE statement
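The two steps above can be sketched in T-SQL roughly as follows. This is only a minimal sketch: the table names dbo.Target and dbo.Target_Staging and the columns KeyA, KeyB and Payload are hypothetical, and it assumes the staging table has already been filled by SqlBulkCopy and contains at most one row per key pair.

```sql
-- Hypothetical schema: dbo.Target(KeyA, KeyB, Payload) with a unique index on (KeyA, KeyB).
-- dbo.Target_Staging has the same columns and was filled by SqlBulkCopy beforehand.
BEGIN TRANSACTION;

MERGE INTO dbo.Target AS t
USING dbo.Target_Staging AS s
    ON t.KeyA = s.KeyA AND t.KeyB = s.KeyB
WHEN MATCHED THEN
    -- The key pair already exists: overwrite the non-key columns.
    UPDATE SET t.Payload = s.Payload
WHEN NOT MATCHED BY TARGET THEN
    -- The key pair is new: insert the whole row.
    INSERT (KeyA, KeyB, Payload)
    VALUES (s.KeyA, s.KeyB, s.Payload);

-- Empty the staging table for the next load.
TRUNCATE TABLE dbo.Target_Staging;

COMMIT TRANSACTION;
```

If the staging data can contain the same key pair more than once, it would need to be de-duplicated first, because MERGE raises an error when several source rows match the same target row.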
Instead of creating a new temporary table, which consumes more space and memory, I created an INSTEAD OF INSERT trigger and used a MERGE statement inside it.
Don't forget to pass SqlBulkCopyOptions.FireTriggers to the SqlBulkCopy constructor, otherwise the trigger will not fire.
This is my two cents.
Another alternative would be to not use a temporary table but use a stored procedure with a table valued parameter. Pass a datatable to the sp and do the merge there.
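A rough sketch of the server side of this approach is shown below. All names here (dbo.TargetRowType, dbo.UpsertTargetRows, dbo.Target and its columns) are hypothetical and only illustrate the shape of the solution.

```sql
-- Hypothetical table type describing the rows passed from the client.
CREATE TYPE dbo.TargetRowType AS TABLE
(
    KeyA    INT           NOT NULL,
    KeyB    INT           NOT NULL,
    Payload NVARCHAR(100) NULL,
    PRIMARY KEY (KeyA, KeyB)
);
GO

-- Hypothetical procedure: merges the passed rows into dbo.Target.
CREATE PROCEDURE dbo.UpsertTargetRows
    @Rows dbo.TargetRowType READONLY   -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE INTO dbo.Target AS t
    USING @Rows AS s
        ON t.KeyA = s.KeyA AND t.KeyB = s.KeyB
    WHEN MATCHED THEN
        UPDATE SET t.Payload = s.Payload
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (KeyA, KeyB, Payload)
        VALUES (s.KeyA, s.KeyB, s.Payload);
END;
GO
```

On the C# side, the DataTable would be passed as a SqlParameter with SqlDbType.Structured and TypeName set to the table type's name.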
Got a hint from @Ivan. For those who might need it, here's what I did.
create trigger yourschema.Tr_your_trigger_name
on yourschema.yourtable
instead of INSERT
as
    merge into yourschema.yourtable as target
    using inserted as source
    on (target.yourtableID = source.yourtableID)
    when matched then
        update
        set target.ID = source.ID,
            target.some_column = source.some_column,
            target.Amount = source.Amount
    when not matched by target then
        insert (some_column, Amount)
        values (source.some_column, source.Amount);
go
Related
I'm copying some data from one SQL Server database to another SQL Server database.
That works fine, what I need is to check if some data already exists, then not copy it. How can I do that? Any suggestions?
string Source = ConfigurationManager.ConnectionStrings["Db1"].ConnectionString;
string Destination = ConfigurationManager.ConnectionStrings["Db2"].ConnectionString;
using (SqlConnection sourceCon = new SqlConnection(Source))
{
    SqlCommand cmd = new SqlCommand("SELECT [Id],[Client] FROM [Db1].[dbo].[Client]", sourceCon);
    sourceCon.Open();
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        using (SqlConnection destCon = new SqlConnection(Destination))
        {
            using (SqlBulkCopy bc = new SqlBulkCopy(destCon))
            {
                bc.DestinationTableName = "Clients";
                bc.ColumnMappings.Add("Id", "ClientId");
                bc.ColumnMappings.Add("Client", "Client");
                destCon.Open();
                bc.WriteToServer(rdr);
            }
        }
    }
}
One way to do what you're after would be to bulk-copy into a staging table (a separate table with similar layout), and then perform a conditional insert from the staging table into the real table.
You could also do something similar using a table-valued-parameter instead of SqlBulkCopy, and treat the table-valued-parameter as the staging table.
Copy all tables from your source database to your destination database as temp tables, then run SQL to add the missing records from each temp table to the corresponding destination table. The final step is to delete the temp tables.
Hope that works for you.
You could make a database link from the source db to the destination and run a query to work out which rows need to be transferred, but be careful not to drag too much data over the link, as that could make the process slow. Realistically you only need the columns you will use to determine whether a row in the source equals a row in the destination.
Typically, though, it's easier to bulk copy all the data into a temporary table at the destination and then use a MERGE or an INSERT with a LEFT JOIN to insert only the missing rows from the temporary table into the real table.
Here's an example of how to insert only some rows that don't already exist:
INSERT INTO real(column1,column2...)
SELECT temp.column1,temp.column2... FROM
temp
LEFT JOIN real ON real.ID = temp.ID
WHERE
real.ID IS NULL
In C# terms it would look like:
new SqlCommand(@"INSERT INTO real(column1,column2...)
SELECT temp.column1,temp.column2... FROM
temp
LEFT JOIN real ON real.ID = temp.ID
WHERE
real.ID IS NULL", conn).ExecuteNonQuery();
You need to run this using a conn connected to your destination database
I am facing an issue I hope to get it solved by here. I have 3 different tables in a DataSet and I want to insert it in the database table.
I know I can do this using SqlBulkCopy but there is a catch and that is I want to check if the data already exists in the database then I want it to get updated instead of insert.
And if the data doesn't exist in the database table, I want to insert it then. Any help on this would be appreciated.
I know I can iterate through each record and fire a procedure that checks for its existence, then updates it if it exists or inserts it otherwise. But the data set is huge and iterating through each record would be time-consuming, so I don't want to use this approach.
Regards
Disclaimer: I'm the owner of the project Bulk Operations
This project lets you BulkInsert, BulkUpdate, BulkDelete, and BulkMerge (Upsert).
Under the hood, it does almost what @marc_s has suggested (use SqlBulkCopy into a temporary table, then perform a MERGE statement to insert or update depending on the primary key).
var bulk = new BulkOperation(connection);
bulk.BulkMerge(dt);
I want to do a bulk copy of data from one database to another. It needs to be dynamic enough that when the users of the source database create new fields, there are minimal changes at the destination end (my end!).
I've done this using the SqlBulkCopy class, with column mappings set up in a separate table, so that if anything new is created all I need to do is create the new field and set up the mapping (no code or stored procedure changes):
foreach (var mapping in columnMapping)
{
    var split = mapping.Split(new[] { ',' });
    sbc.ColumnMappings.Add(split.First(), split.Last());
}
try
{
    sbc.WriteToServer(sourcedatatable);
}
However, the requirements have now changed.
I need to keep more data, sourced from elsewhere, in other columns in this table which means I can't truncate the whole table and write everything with the sqlbulkcopy. Now, I need to be able to Insert new records or Update the relevant fields for current records, but still be dynamic enough that I won't need code changes if the users create new fields.
Does anyone have any ideas?
Comment on original question from mdisibio - it looks like the SQL MERGE statement would have been the answer.
I have used BulkCopy command to transfer rows from one table to another table with bulk data about 3 to 5 million rows. I want to update these rows.
Is there any BulkUpdate command similar to the BulkCopy command? I'm using ASP.NET with C#.
No, there isn't.
Q: What's an "lac"?
This might help:
http://itknowledgeexchange.techtarget.com/itanswers/bulk-update-in-sql-server-2005/
Assuming that you have a column with distinct values to show you which
rows are which between the two tables, this can be done with a simple
update statement.
UPDATE TableA
SET TableA.A1 = TableB.B1,
TableA.A2 = TableB.B2
FROM TableB
WHERE TableA.A3 = TableB.B3
If you are worried about creating one massive transaction you can
batch the operation into smaller chunks. This is done via the TOP
keyword.
UPDATE TOP (1000) TableA
SET TableA.A1 = TableB.B1,
TableA.A2 = TableB.B2
FROM TableB
WHERE TableA.A3 = TableB.B3
-- Skip rows that already match; OR ensures a row differing in either column is still updated
AND (TableA.A1 <> TableB.B1
OR TableA.A2 <> TableB.B2)
You can put that into a loop...
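One possible shape for such a loop is sketched below, reusing the TableA/TableB names from the answer. The variable names and the batch size are assumptions. The loop ends only because the WHERE clause excludes rows that already match, so each pass has fewer rows left to touch.

```sql
DECLARE @BatchSize INT = 1000;   -- assumed batch size; tune for your workload
DECLARE @Rows INT = 1;           -- rows affected by the previous batch

WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) TableA
    SET TableA.A1 = TableB.B1,
        TableA.A2 = TableB.B2
    FROM TableB
    WHERE TableA.A3 = TableB.B3
      -- Only touch rows that still differ, otherwise the loop would never end.
      AND (TableA.A1 <> TableB.B1 OR TableA.A2 <> TableB.B2);

    -- Zero rows updated means every remaining row already matches.
    SET @Rows = @@ROWCOUNT;
END;
```

Note that the <> comparisons evaluate to unknown when either side is NULL, so rows with NULL values in A1/A2 or B1/B2 would be skipped by this sketch and need separate handling.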
Here's another link (with basically the same solution):
http://www.sqlusa.com/bestpractices2005/hugeupdate/
A common approach here is:
bulk-load (SqlBulkCopy) into an empty staging table - meaning: a table with the same columns/types as the actual data, but not part of the main transactional system
now do an update joining the real data to the staging data, to update the values in the real data
Disclaimer: I'm the owner of the project Bulk Operations
The Bulk Operations library lets you insert, delete, update, and merge millions of rows in a few seconds.
It's very easy to learn and use if you already know the SqlBulkCopy class.
var bulk = new BulkOperation(connection);
// ... Mappings ....
bulk.BulkUpdate(dt);
This is my first post.. I have 2 SQL Server databases located on different servers..
Let's say SDT for source data table from source database SDB to DDT (Destination data table) for Database DDB
I'm using C# for bulk copying from SDT to DDT..
My code is something like this:
sqlcommand = "DELETE FROM DDT WHERE locID = @LocIDParam" // @LocIDParam is the parameter for a specific location
then bulk copy "SELECT * FROM SDT WHERE locID = @LocIDParam" // the steps are well known..
I just don't want to go for useless details..
However, my SDT holds a huge amount of data, so bulk copying the whole table causes heavy traffic.
Is there any way to bulk copy only the updated records from SDT to DDT, as well as insert the new ones?
Do you think using an SQL trigger for updated and newly inserted data is the best idea for this kind of scenario? (The trigger would insert the primary key value of each new or updated row into a single-column table, and I would then delete from and insert into DDT based on that.)
PS. I don't want to use SQL replication for that since it has a lot of problems..
Thank you in advance
From the date, I suppose you have already found your solution. In case not, here is how we deal with a somewhat similar situation.
On the source table we have a column that shows whether the data has to be sent to the destination. We use a boolean, but you could also have a datetime field that holds the last update date.
Then our pull process does the following:
Pull all the flagged data into a temporary table on the destination server
Update records that exist in both tables
Insert all records from temporary table that don't exist in destination table
Drop the temporary table
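The update, insert and drop steps above could look roughly like this in T-SQL. It is a minimal sketch with hypothetical names: #Pulled is the temporary table that already holds the flagged rows, dbo.Destination is the real table, Id is the key, and Col1/Col2 stand in for the data columns.

```sql
-- Hypothetical names: #Pulled holds the flagged rows copied from the source,
-- dbo.Destination is the real table, Id is the key, Col1/Col2 are data columns.
BEGIN TRANSACTION;

-- Update records that exist in both tables.
UPDATE d
SET d.Col1 = p.Col1,
    d.Col2 = p.Col2
FROM dbo.Destination AS d
INNER JOIN #Pulled AS p ON p.Id = d.Id;

-- Insert records that exist only in the temporary table.
INSERT INTO dbo.Destination (Id, Col1, Col2)
SELECT p.Id, p.Col1, p.Col2
FROM #Pulled AS p
WHERE NOT EXISTS (SELECT 1 FROM dbo.Destination AS d WHERE d.Id = p.Id);

COMMIT TRANSACTION;

-- Drop the temporary table once both steps have succeeded.
DROP TABLE #Pulled;
```

Running the update before the insert avoids re-updating rows that were only just inserted.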
If you use SQL Server 2008, there is a MERGE statement that I am not familiar with. Here is a link that explains it:
SQL 2008 MERGE command
Hope this will help you if you still need.