DataTable: is deleting old DataRows before inserting new safe? - c#

i have a many-to-many relationship table in a typed DataSet.
For convenience on an update i'm deleting old relations before i'm adding the new(maybe the same as before).
Now i wonder if this way is failsafe or if i should ensure only to delete which are really deleted(for example with LINQ) and only add that one which are really new.
In SQL-Server is a unique constraint defined for the relation table, the two foreign keys are a composite primary key.
Is the order the DataAdapter updates the DataRows which RowState are <> Unchanged predictable or not?
In other words: is it possible that DataAdapter.Update(DataTable) will result in an exception when the key already exists?
This is the datamodel:
This is part of the code(LbSymptomCodes is an ASP.Net ListBox):
Dim daTrelRmaSymptomCode As New ERPModel.dsRMATableAdapters.trelRMA_SymptomCodeTableAdapter
For Each oldTrelRmaSymptomCodeRow As ERPModel.dsRMA.trelRMA_SymptomCodeRow In thisRMA.GettrelRMA_SymptomCodeRows
oldTrelRmaSymptomCodeRow.Delete()
Next
For Each item As ListItem In LbSymptomCodes.Items
If item.Selected Then
Dim newTrelRmaSymptomCodeRow As ERPModel.dsRMA.trelRMA_SymptomCodeRow = Services.dsRMA.trelRMA_SymptomCode.NewtrelRMA_SymptomCodeRow
newTrelRmaSymptomCodeRow.fiRMA = Services.IdRma
newTrelRmaSymptomCodeRow.fiSymptomCode = CInt(item.Value)
Services.dsRMA.trelRMA_SymptomCode.AddtrelRMA_SymptomCodeRow(newTrelRmaSymptomCodeRow)
End If
Next
daTrelRmaSymptomCode.Update(Services.dsRMA.trelRMA_SymptomCode)
Thank you in advance.

I think that the DataAdapter in ADO.NET is clever enough to perform the delete/inserts in the correct order.
However, if you really want to ensure that updates are done in the correct order you should do it manually by using the Select method to return an array of data rows for each particular row state. You could then call the Update method on the array of data rows
DataTable tbl = ds.Tables["YourTable"];
// Process any Deleted rows first
adapter.Update(tbl.Select(null, null, DataViewRowState.Deleted));
// Process any Updated/Modified rows
adapter.Update(tbl.Select(null, null, DataViewRowState.ModifiedCurrent));
// Process the Inserts last
adapter.Update(tbl.Select(null, null, DataViewRowState.Added));

Not sure about the DA but in theory DB transactions should be performed in the following order Deletes, Inserts, Updates.
looking at msdn the exact wording for the update method is
Blockquote
Attempts to save all changes in the DataTable to the database. (This includes removing any rows deleted from the table, adding rows inserted to the table, and updating any rows in the table that have changed.)
Blockquote
In regards to your solution of deleting items and possibly re-inserting the same items, typically speaking this should be avoided because it creates a load on the DB. In high volume applications you want to do everything you can to minimize calls to the DB as they are very expensive; computation time, from determining which row updates are spurious, is cheap.

Related

SQLiteAsyncConnection inserting multiple row in order?

I used SQLite before, and adding multiple rows using Insert in a for loop was slow. The solution was using a transaction.
Now that I am using SQLiteAsyncConnection in SQLite.Net (for ORM), I also tried to use a transaction. It works but with only one problem. The insert order is not the order of the data.
Database.RunInTransactionAsync(
(SQLiteConnection conn) => {
foreach (var row in rows)
{
conn.InsertOrReplace(row);
}
conn.Commit();
}
);
If rows contained [1,2,3,4,5,6], the rows in the database was something like [3,1,2,6,4,5]. How can I keep the original order?
Note that I only mean newly inserted rows. Even thought the code is replacing existing rows, when testing there were no existing rows in the database to be replaced.
PS: The row has ID field which is the [PrimaryKey], but in the rows the rows are not sorted by ID. It seems that in the database the rows are sorted by ID. I do not want it to be sorted by ID but the original order to be maintained.
PS 2: I need to know the ID of the last-inserted row. When viewing the database using a GUI tool like DB Browser for SQLite or getting the last item by LIMIT 1, it seems the SQLite had automatically sorted the rows by ID. I did some Google search and it said by the rules of SQL, when there is no ORDER BY, the order of the returned rows are not guaranteed to be the physical order, anyway. Should I create another field and set it as the primary, auto-increasing field?
Currently, ID is guaranteed to be unique per row, but 'ID' is part of the data itself, not a field specially added for the use with the database.
SQL tables are logically unordered, so if you want a certain order, you always have to use ORDER BY in your queries.
If your data does not contain any values (e.g., timestamp) that corresponds to the insertion order, then you have to use the rowid, i.e., add a column declared as INTEGER PRIMARY KEY.

How to we skip error with SQLBulkCopy [duplicate]

I am trying to insert huge amount of data into SQL server. My destination table has an unique index called "Hash".
I would like to replace my SqlDataAdapter implementation with SqlBulkCopy. In SqlDataAapter there is a property called "ContinueUpdateOnError", when set to true adapter.Update(table) will insert all the rows possible and tag the error rows with RowError property.
The question is how can I use SqlBulkCopy to insert data as quickly as possible while keeping track of which rows got inserted and which rows did not (due to the unique index)?
Here is the additional information:
The process is iterative, often set on a schedule to repeat.
The source and destination tables can be huge, sometimes millions of rows.
Even though it is possible to check for the hash values first, it requires two transactions per row (first for selecting the hash from destination table, then perform the insertion). I think in the adapter.update(table)'s case, it is faster to check for the RowError than checking for hash hits per row.
SqlBulkCopy, has very limited error handling facilities, by default it doesn't even check constraints.
However, its fast, really really fast.
If you want to work around the duplicate key issue, and identify which rows are duplicates in a batch. One option is:
start tran
Grab a tablockx on the table select all current "Hash" values and chuck them in a HashSet.
Filter out the duplicates and report.
Insert the data
commit tran
This process will work effectively if you are inserting huge sets and the size of the initial data in the table is not too huge.
Can you please expand your question to include the rest of the context of the problem.
EDIT
Now that I have some more context here is another way you can go about it:
Do the bulk insert into a temp table.
start serializable tran
Select all temp rows that are already in the destination table ... report on them
Insert the data in the temp table into the real table, performing a left join on hash and including all the new rows.
commit the tran
That process is very light on round trips, and considering your specs should end up being really fast;
Slightly different approach than already suggested; Perform the SqlBulkCopy and catch the SqlException thrown:
Violation of PRIMARY KEY constraint 'PK_MyPK'. Cannot insert duplicate
key in object 'dbo.MyTable'. **The duplicate key value is (17)**.
You can then remove all items from your source from ID 17, the first record that was duplicated. I'm making assumptions here that apply to my circumstances and possibly not yours; i.e. that the duplication is caused by the exact same data from a previously failed SqlBulkCopy due to SQL/Network errors during the upload.
Note: This is a recap of Sam's answer with slightly more details
Thanks to Sam for the answer. I have put it in an answer due to comment's space constraints.
Deriving from your answer I see two possible approaches:
Solution 1:
start tran
grab all possible hit "hash" values by doing "select hash in destinationtable where hash in (val1, val2, ...)
filter out duplicates and report
insert data
commit tran
solution 2:
Create temp table to mirror the
schema of destination table
bulk insert into the temp table
start serializable transaction
Get duplicate rows: "select hash from
tempTable where
tempTable.hash=destinationTable.hash"
report on duplicate rows
Insert the data in the temp table
into the destination table: "select * into destinationTable from temptable left join temptable.hash=destinationTable.hash where destinationTable.hash is null"
commit the tran
Since we have two approaches, it comes down to which approach is the most optimized? Both approaches have to retrieve the duplicate rows and report while the second approach requires extra:
temp table creation and delete
one more sql command to move data from temp to destination table
depends on the percentage of hash collision, it also transfers a lot of unnecessary data across the wire
If these are the only solutions, it seems to me that the first approach wins. What do you guys think? Thanks!

DataSet TableAdapter Fill Method

I have a DataSet with two TableAdapters (1 to many relationship) that was created using visual studio 2010's Configuration Wizard.
I make a call to an external source and populate a Dictionary with the results. These results should be all of the entries in the database. To synchronize the DB I don't want to just clear all of the tables and then repopulate them like dropping the tables and creating them with new data in sql.
Is there a clean way possibly using the TableAdapter.Fill() method or do I have to loop through the two tables row by row and decide if it stay or gets deleted and then add the new entries? What is the best approach to make the data that is in the dictionary be the only data in my two tables with the DataSet?
First Question: if it's the same DB why do you have 2 tables with the same information?
To the question at hand: that largley depend on the sizes. If the tables are not big then use a transaction, clear the table (DELETE * FROM TABLE or whatever) and write your data in there again.
If the tables are big on the other hand the question is: can you load all this into your dictionary?
Of course you have to ask yourself what happens to inconsistent data (another user/app changed the data while you had it in your dictionary).
If this takes to long you could remember what you did to the data - that means: flag the changed data and remember the deleted keys and new inserted rows and make your updates based on that.
Both can be achieved by remembering the Filled DataTable and use this as backing field or by implementing your own mechanisms.
In any way I would recommend think on the problem: do you really need the dictionary? Why not make queries against the database to get the data? Or only cache a part of the data for quick access?
PS: the update method on you DataAdapter will do all the work (changing the changed, removing the deleted and inserting the new datarows but it will update the DataTable/Set so this will only work once)
It could be that it is quicker to repopulate the entire table than to itterate through and decide what record go / stay. Could you not do the process of deciding if a records is deleteed via an sql statement ? (Delete from table where active = false) if you want them to stay in the database but not in the dataset (select * from table where active = true)
You could have a date field and select all records that have been added since the date you late 'pooled' the database (select * from table where active = true and date-added > #12:30#)

How to refresh datagridview binding in vb.net or C#?

I created two tables(FIRSTtable and SECONDtable) in the mysql database and two tables that are related.
The FIRST table, has a columns (product_id (pK), product_name).
The SECOND table has an columns (machine_id, production_date, product_id (fK),
product_quantity, operator_id).
Relations between the two tables using the product_id column with UpdateCascade and DeleteCascade. Both relationships are functioning normally when I try with the sql script. Suppose I delete all product_id in the FIRST table, all existing data in the SECOND table will be deleted.
Both of these tables displayed in datagridview. When I delete all the data in the FIRST table, the all rows in datagridview FIRST table will be deleted, also the data in mysql the FIRST table will be deleted.
I try to open the mysql database, the data are in SECOND Table also deleted, the problem why the view that in the second datagridview, can not be deleted, still keep the previous data? How to refresh datagridview binding in vb.net or C#? Thanks.
With Me.SECOND_DataGridView
.Datasource = Nothing ' tried this, but failed.
.DataSource = MyDataset.Tables("SECOND_table")
End With
I believe what you are running into is the fact the the MySQL Engine is actually performing the cascading deletes for you.
When you query the MySQL Data into a localized C# "DataTable" (Table within a DataSet), that data is now in memory and not directly linked to that on the disk. When you go to delete the rows in the "memory" version of the first data table, its causing the deletions to occur at the SERVER for the second level table and NOT directly updating you in-memory version of data table two.
That being said, you will probably have to do one of two things... Requery the entire dataset (tables one and two) to get a full refresh of what is STILL in the actual database... OR... As you are calling the delete from table one of the dataset, you'll have to perform the delete handling in the local datatable TWO as well to keep it in synch.

SqlBulkCopy Error handling / continue on error

I am trying to insert huge amount of data into SQL server. My destination table has an unique index called "Hash".
I would like to replace my SqlDataAdapter implementation with SqlBulkCopy. In SqlDataAapter there is a property called "ContinueUpdateOnError", when set to true adapter.Update(table) will insert all the rows possible and tag the error rows with RowError property.
The question is how can I use SqlBulkCopy to insert data as quickly as possible while keeping track of which rows got inserted and which rows did not (due to the unique index)?
Here is the additional information:
The process is iterative, often set on a schedule to repeat.
The source and destination tables can be huge, sometimes millions of rows.
Even though it is possible to check for the hash values first, it requires two transactions per row (first for selecting the hash from destination table, then perform the insertion). I think in the adapter.update(table)'s case, it is faster to check for the RowError than checking for hash hits per row.
SqlBulkCopy, has very limited error handling facilities, by default it doesn't even check constraints.
However, its fast, really really fast.
If you want to work around the duplicate key issue, and identify which rows are duplicates in a batch. One option is:
start tran
Grab a tablockx on the table select all current "Hash" values and chuck them in a HashSet.
Filter out the duplicates and report.
Insert the data
commit tran
This process will work effectively if you are inserting huge sets and the size of the initial data in the table is not too huge.
Can you please expand your question to include the rest of the context of the problem.
EDIT
Now that I have some more context here is another way you can go about it:
Do the bulk insert into a temp table.
start serializable tran
Select all temp rows that are already in the destination table ... report on them
Insert the data in the temp table into the real table, performing a left join on hash and including all the new rows.
commit the tran
That process is very light on round trips, and considering your specs should end up being really fast;
Slightly different approach than already suggested; Perform the SqlBulkCopy and catch the SqlException thrown:
Violation of PRIMARY KEY constraint 'PK_MyPK'. Cannot insert duplicate
key in object 'dbo.MyTable'. **The duplicate key value is (17)**.
You can then remove all items from your source from ID 17, the first record that was duplicated. I'm making assumptions here that apply to my circumstances and possibly not yours; i.e. that the duplication is caused by the exact same data from a previously failed SqlBulkCopy due to SQL/Network errors during the upload.
Note: This is a recap of Sam's answer with slightly more details
Thanks to Sam for the answer. I have put it in an answer due to comment's space constraints.
Deriving from your answer I see two possible approaches:
Solution 1:
start tran
grab all possible hit "hash" values by doing "select hash in destinationtable where hash in (val1, val2, ...)
filter out duplicates and report
insert data
commit tran
solution 2:
Create temp table to mirror the
schema of destination table
bulk insert into the temp table
start serializable transaction
Get duplicate rows: "select hash from
tempTable where
tempTable.hash=destinationTable.hash"
report on duplicate rows
Insert the data in the temp table
into the destination table: "select * into destinationTable from temptable left join temptable.hash=destinationTable.hash where destinationTable.hash is null"
commit the tran
Since we have two approaches, it comes down to which approach is the most optimized? Both approaches have to retrieve the duplicate rows and report while the second approach requires extra:
temp table creation and delete
one more sql command to move data from temp to destination table
depends on the percentage of hash collision, it also transfers a lot of unnecessary data across the wire
If these are the only solutions, it seems to me that the first approach wins. What do you guys think? Thanks!

Categories