MySQL insert/update records very slow - c#

I found out mysql update records very slow.
I have a logic whereby, if the records exist then, it will update, else it will insert.
but the total record for source database having 14k rows, and it only successfully inserted each row in few sec, therefore, it is very slow, how can i make it 1 minutes to insert 14k rows?
by the way, i'm using c#
foreach (row in rows)
checkrecords(row); // if exist update, else create.
Regards,
MH

If exists may be the root of the problem.
Make sure you are using a unique index on your check whether the record exists or not.

Try setting values:
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT (for non-windows machine)
innodb_buffer_pool_size=25GB (currently it is close to 21GB)
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=0...1000 (try different values, beginning with 200)
References:
MySQL docs for description of different variables.
MySQL Server Setting Tuning
MySQL Performance Optimization basics
Hope it helps...
Please refer Why is MySQL InnoDB insert so slow?
Source

Restructure the query and be sure the indexes are set up correctly.
Load the rows to be updated into a temporary table. Then do the insert/update as:
insert into t(. . .)
select . . .
from temptable tt
on duplicate key update col1 = values(col1), . . .;
Make sure that the conditions that you are checking for the existence of the record are combined into a unique index. So, if you are looking for a combination of three columns, then make sure you have a unique index on t(col1, col2, col3).

Related

getting "DateTime" of the last change in rows in a table in SQL Server?

Is there any way to find "DateTime" of the last change in rows in a table in SQL Server?
The changes (Insert / Update) are submitted by another windows application
And all I have in this table is insert_Date and there is no update_Date (and I can't add any columns or use triggers)
I've tried some queries, but all I got was the number of "User Updates" in a table, not the IDs of modified rows!
I want to get rows which are modified or inserted after a specific DateTime
If the information isn't stored in the table (or in another one by using a trigger for example) then it's impossible to track which rows were inserted after a determined datetime. You might find the time the last operation was executed at a table/index level (by querying sys.dm_db_index_usage_stats) but not at a record level.
You can't find data that doesn't exist!

Insert/Update whole DataTable into database table C#

I am facing an issue I hope to get it solved by here. I have 3 different tables in a DataSet and I want to insert it in the database table.
I know I can do this using SqlBulkCopy but there is a catch and that is I want to check if the data already exists in the database then I want it to get updated instead of insert.
And if the data doesn't exist in the database table, I want to insert it then. Any help on this would be appreciated.
I know I can iterate it through each record and then fire a procedure which will check for its existence if it exists den update or else insert. But the data size is huge and iterating through each record would be a time taking process, I don't want to use this approach.
Regards
Disclaimer: I'm the owner of the project Bulk Operations
This project allows to BulkInsert, BulkUpdate, BulkDelete, and BulkMerge (Upsert).
Under the hood, it does almost what #marc_s have suggested (Use SqlBulkCopy into a temporary table and perform a merge statement to insert or update depending on the primary key).
var bulk = new BulkOperation(connection);
bulk.BulkMerge(dt);

How to we skip error with SQLBulkCopy [duplicate]

I am trying to insert huge amount of data into SQL server. My destination table has an unique index called "Hash".
I would like to replace my SqlDataAdapter implementation with SqlBulkCopy. In SqlDataAapter there is a property called "ContinueUpdateOnError", when set to true adapter.Update(table) will insert all the rows possible and tag the error rows with RowError property.
The question is how can I use SqlBulkCopy to insert data as quickly as possible while keeping track of which rows got inserted and which rows did not (due to the unique index)?
Here is the additional information:
The process is iterative, often set on a schedule to repeat.
The source and destination tables can be huge, sometimes millions of rows.
Even though it is possible to check for the hash values first, it requires two transactions per row (first for selecting the hash from destination table, then perform the insertion). I think in the adapter.update(table)'s case, it is faster to check for the RowError than checking for hash hits per row.
SqlBulkCopy, has very limited error handling facilities, by default it doesn't even check constraints.
However, its fast, really really fast.
If you want to work around the duplicate key issue, and identify which rows are duplicates in a batch. One option is:
start tran
Grab a tablockx on the table select all current "Hash" values and chuck them in a HashSet.
Filter out the duplicates and report.
Insert the data
commit tran
This process will work effectively if you are inserting huge sets and the size of the initial data in the table is not too huge.
Can you please expand your question to include the rest of the context of the problem.
EDIT
Now that I have some more context here is another way you can go about it:
Do the bulk insert into a temp table.
start serializable tran
Select all temp rows that are already in the destination table ... report on them
Insert the data in the temp table into the real table, performing a left join on hash and including all the new rows.
commit the tran
That process is very light on round trips, and considering your specs should end up being really fast;
Slightly different approach than already suggested; Perform the SqlBulkCopy and catch the SqlException thrown:
Violation of PRIMARY KEY constraint 'PK_MyPK'. Cannot insert duplicate
key in object 'dbo.MyTable'. **The duplicate key value is (17)**.
You can then remove all items from your source from ID 17, the first record that was duplicated. I'm making assumptions here that apply to my circumstances and possibly not yours; i.e. that the duplication is caused by the exact same data from a previously failed SqlBulkCopy due to SQL/Network errors during the upload.
Note: This is a recap of Sam's answer with slightly more details
Thanks to Sam for the answer. I have put it in an answer due to comment's space constraints.
Deriving from your answer I see two possible approaches:
Solution 1:
start tran
grab all possible hit "hash" values by doing "select hash in destinationtable where hash in (val1, val2, ...)
filter out duplicates and report
insert data
commit tran
solution 2:
Create temp table to mirror the
schema of destination table
bulk insert into the temp table
start serializable transaction
Get duplicate rows: "select hash from
tempTable where
tempTable.hash=destinationTable.hash"
report on duplicate rows
Insert the data in the temp table
into the destination table: "select * into destinationTable from temptable left join temptable.hash=destinationTable.hash where destinationTable.hash is null"
commit the tran
Since we have two approaches, it comes down to which approach is the most optimized? Both approaches have to retrieve the duplicate rows and report while the second approach requires extra:
temp table creation and delete
one more sql command to move data from temp to destination table
depends on the percentage of hash collision, it also transfers a lot of unnecessary data across the wire
If these are the only solutions, it seems to me that the first approach wins. What do you guys think? Thanks!

Efficient insert statement

I'm looking for an efficient way of inserting records into SQL server for my C#/MVC application. Anyone know what the best method would be?
Normally I've just done a while loop and insert statement within, but then again I've not had quite so many records to deal with. I need to insert around half a million, and at 300 rows a minute with the while loop, I'll be here all day!
What I'm doing is looping through a large holding table, and using it's rows to create records in a different table. I've set up some functions for lookup data which is necessary for the new table, and this is no doubt adding to the drain.
So here is the query I have. Extremely inefficient for large amounts of data!
Declare #HoldingID int
Set #HoldingID = (Select min(HoldingID) From HoldingTable)
While #JourneyHoldingID IS NOT NULL
Begin
Insert Into Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
Where HoldingID = #HoldingID
Set #HoldingID = (Select MIN(HoldingID) From Holding Where HoldingID > #HoldingID)
End
I've heard about set-based approaches - is there anything that might work for the above problem?
If you want to insert a lot of data into a MSSQL Server then you should use BULK INSERTs - there is a command line tool called the bcp utility for this, and also a C# wrapper for performing Bulk Copy Operations, but under the covers they are all using BULK INSERT.
Depending on your application you may want to insert your data into a staging table first, and then either MERGE or INSERT INTO SELECT... to transfer those rows from the staging table(s) to the target table(s) - if you have a lot of data then this will take some time, however will be a lot quicker than performing the inserts individually.
If you want to speed this up then are various things that you can do such as changing the recovery model or tweaking / removing triggers and indexes (depending on whether or not this is a live database or not). If its still really slow then you should look into doing this process in batches (e.g. 1000 rows at a time).
This should be exactly what you are doing now.
Insert Into Journeys(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
ORDER BY HoldingID ASC
you (probably) are able to do it in one statement of the form
INSERT INTO JOURNEYS
SELECT * FROM HOLDING;
Without more information about your schema it is difficult to be absolutely sure.
SQLServer 2008 introduced Table Parameters. These allow you to insert multiple rows in a single trip to the database (send it as a large blob). Without using a temporary table. This article describes how it works (step four in the article)
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
It differs from bulk inserts in that you do not need special utilities and that all constraints and foreign keys are checked.
I quadrupled my throughput using this and parallelizing the inserts. Now at 15.000 inserts/second in the same table sustained. Regular table with indexes and over a billion rows.

SqlBulkCopy Error handling / continue on error

I am trying to insert huge amount of data into SQL server. My destination table has an unique index called "Hash".
I would like to replace my SqlDataAdapter implementation with SqlBulkCopy. In SqlDataAapter there is a property called "ContinueUpdateOnError", when set to true adapter.Update(table) will insert all the rows possible and tag the error rows with RowError property.
The question is how can I use SqlBulkCopy to insert data as quickly as possible while keeping track of which rows got inserted and which rows did not (due to the unique index)?
Here is the additional information:
The process is iterative, often set on a schedule to repeat.
The source and destination tables can be huge, sometimes millions of rows.
Even though it is possible to check for the hash values first, it requires two transactions per row (first for selecting the hash from destination table, then perform the insertion). I think in the adapter.update(table)'s case, it is faster to check for the RowError than checking for hash hits per row.
SqlBulkCopy, has very limited error handling facilities, by default it doesn't even check constraints.
However, its fast, really really fast.
If you want to work around the duplicate key issue, and identify which rows are duplicates in a batch. One option is:
start tran
Grab a tablockx on the table select all current "Hash" values and chuck them in a HashSet.
Filter out the duplicates and report.
Insert the data
commit tran
This process will work effectively if you are inserting huge sets and the size of the initial data in the table is not too huge.
Can you please expand your question to include the rest of the context of the problem.
EDIT
Now that I have some more context here is another way you can go about it:
Do the bulk insert into a temp table.
start serializable tran
Select all temp rows that are already in the destination table ... report on them
Insert the data in the temp table into the real table, performing a left join on hash and including all the new rows.
commit the tran
That process is very light on round trips, and considering your specs should end up being really fast;
Slightly different approach than already suggested; Perform the SqlBulkCopy and catch the SqlException thrown:
Violation of PRIMARY KEY constraint 'PK_MyPK'. Cannot insert duplicate
key in object 'dbo.MyTable'. **The duplicate key value is (17)**.
You can then remove all items from your source from ID 17, the first record that was duplicated. I'm making assumptions here that apply to my circumstances and possibly not yours; i.e. that the duplication is caused by the exact same data from a previously failed SqlBulkCopy due to SQL/Network errors during the upload.
Note: This is a recap of Sam's answer with slightly more details
Thanks to Sam for the answer. I have put it in an answer due to comment's space constraints.
Deriving from your answer I see two possible approaches:
Solution 1:
start tran
grab all possible hit "hash" values by doing "select hash in destinationtable where hash in (val1, val2, ...)
filter out duplicates and report
insert data
commit tran
solution 2:
Create temp table to mirror the
schema of destination table
bulk insert into the temp table
start serializable transaction
Get duplicate rows: "select hash from
tempTable where
tempTable.hash=destinationTable.hash"
report on duplicate rows
Insert the data in the temp table
into the destination table: "select * into destinationTable from temptable left join temptable.hash=destinationTable.hash where destinationTable.hash is null"
commit the tran
Since we have two approaches, it comes down to which approach is the most optimized? Both approaches have to retrieve the duplicate rows and report while the second approach requires extra:
temp table creation and delete
one more sql command to move data from temp to destination table
depends on the percentage of hash collision, it also transfers a lot of unnecessary data across the wire
If these are the only solutions, it seems to me that the first approach wins. What do you guys think? Thanks!

Categories