How to Avoid Duplicate Key Exception - C#

I am using a TableAdapter to insert records into a table within a loop.
foreach(....)
{
....
....
teamsTableAdapter.Insert(_teamid, _teamname);
....
}
TeamID is the primary key in the table, and _teamid supplies its value. I am extracting the data from an XML file, which contains unique team IDs.
After the first run of this loop, Insert throws a duplicate-primary-key exception. To handle this, I have done the following:
foreach(....)
{
....
....
try
{
_teamsTableAdapter.Insert(_teamid, _teamname);
}
catch (System.Data.SqlClient.SqlException e)
{
if (e.Number != 2627)
MessageBox.Show(e.Message);
}
....
....
}
But using a try/catch statement is costly. How can I avoid this exception in the first place? I am working in VS2010, and MySQL's INSERT ... ON DUPLICATE KEY UPDATE does not work in SQL Server. I want to handle the duplicates without try/catch statements.

Based on your comments to other answers, I would suggest that TeamID no longer be the primary key (if possible) and that a new Idx column be set up as the primary key. You can then set a trigger on your DB so that, when a new record is inserted with a duplicate TeamID, it will update the original record and delete the new one.
If that is not possible, I would modify the stored procedure that inserts the record so that, instead of just inserting, it first checks for a duplicate TeamID. If there isn't a duplicate ID, the record inserts; else it just selects 0.
pseudo-code example:
Declare @Count int
Set @Count = (Select Count(TeamId) From [Table] Where TeamId = @TeamId)
If (@Count > 0)
Begin
    Select 0
End
Else
    --Insert Logic Here
Then your Insert method in code can use ExecuteScalar() instead of ExecuteNonQuery(). Your code would handle it this way:
if (_teams.TableAdapter.Insert(_teamId, _teamName) == 0)
{
    _teams.TableAdapter.Update(_teamId, _teamName);
}
Alternatively, if you just wanted to handle it all in SQL (so your C# code doesn't have to change) you could do something like this:
Declare @Count int
Set @Count = (Select Count(TeamId) From [Table] Where TeamId = @TeamId)
If (@Count > 0)
Begin
    --Update Logic Here
End
Else
Begin
    --Insert Logic Here
End
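If your SQL Server is 2008 or later, the same check-then-act logic can also be collapsed into a single MERGE statement, which is the closest SQL Server equivalent to MySQL's INSERT ... ON DUPLICATE KEY UPDATE. A minimal sketch, assuming a Teams table whose TeamId and TeamName columns match the parameters of your Insert call:
MERGE INTO Teams AS t
USING (SELECT @TeamId AS TeamId, @TeamName AS TeamName) AS s
    ON t.TeamId = s.TeamId
WHEN MATCHED THEN
    UPDATE SET TeamName = s.TeamName
WHEN NOT MATCHED THEN
    INSERT (TeamId, TeamName) VALUES (s.TeamId, s.TeamName);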
But, again, I'd just modify the table if that's an option.

Does the table you're using have a primary key? If not, you should create one as it will prevent duplicate records, and might make it easier to access keys for other parts of your program.
Usually this is done with an identity column or something similar. (It looks like you might already have one in TeamID, in which case you only need to make it the primary key in either SQL Server Management Studio or VS2010.)
Edit: To designate a primary key as an identity column (teamID in your example) using Visual Studio:
Go to the Server Explorer and navigate to the relevant table. Right-click it and choose "Open Table Definition". Click on the primary key column. Scroll the Properties window until you reach "Identity Specification" and change it to "Yes" (you can set the increment/seed to whatever you wish; usually 1,1 is fine). Now all you have to do is insert a team name into the table, and the TeamID is generated automatically.
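For reference, the same thing can be done in plain T-SQL when the table is created; a minimal sketch, assuming just the two columns from the Insert call (note that an existing column cannot be altered into an identity column, so the table has to be created, or rebuilt, this way):
CREATE TABLE Teams (
    TeamID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- SQL Server assigns this value
    TeamName NVARCHAR(100) NOT NULL                 -- the length here is an assumption
);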

There are clearly duplicates in your data. Either you need to eliminate them first, or use some type of merge statement to do an insert if the row is new or an update if it is not.
To see what data is causing the problem, run Profiler while you run the loop from your application and see what statements are actually being sent. That should point you towards which record(s) are duplicated.
If this is a large file, a bulk insert (after cleaning the dups) will be faster than row-by-row processing.
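A sketch of that bulk path, assuming the XML rows are first loaded into a hypothetical TeamsStaging table (via BULK INSERT or SqlBulkCopy); only rows whose keys are not already present get inserted, which sidesteps the duplicate-key exception entirely:
INSERT INTO Teams (TeamId, TeamName)
SELECT DISTINCT s.TeamId, s.TeamName
FROM TeamsStaging s
WHERE NOT EXISTS (SELECT 1 FROM Teams t WHERE t.TeamId = s.TeamId);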

Related

Map identity value to object after merge statement

I have a table called People with the following schema:
Id INT NOT NULL IDENTITY(1, 1)
FirstName NVARCHAR(64) NOT NULL
LastName NVARCHAR(64) NOT NULL
I am using a query like this one to perform inserts and updates in one query:
MERGE INTO People AS TARGET
USING ( VALUES
(@id0, @firstName0, @lastname0),
(@id1, @firstName1, @lastname1)
...
) AS SOURCE ([Id],[FirstName],[LastName])
ON TARGET.[Id] = SOURCE.[Id]
WHEN MATCHED THEN
UPDATE SET
[FirstName] = SOURCE.[FirstName],
[LastName] = SOURCE.[LastName]
WHEN NOT MATCHED BY TARGET THEN
INSERT ([FirstName],[LastName])
VALUES (SOURCE.[FirstName], SOURCE.[LastName])
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT $action, INSERTED.*;
My application is structured such that the client calls back to the server to load the existing state of the app. The client then creates/modifies/deletes entities locally and pushes those changes to the server in one bunch.
Here's an example of what my "SaveEntities" code currently looks like:
public void SavePeople(IEnumerable<Person> people)
{
// Returns the query I mentioned above
var query = GetMergeStatement(people);
// a connection is needed for the command to actually execute
// (connectionString is assumed to be defined elsewhere)
using(var connection = new SqlConnection(connectionString))
using(var command = new SqlCommand(query, connection))
{
connection.Open();
using(var reader = command.ExecuteReader())
{
while(reader.Read())
{
// how do I tie these records back to
// the objects in the people collection?
}
}
}
}
I can use the value in the $action column to filter down to just INSERTED records. INSERTED.* returns all of the columns in TARGET for the inserted record. The problem is I have no way of distinctly linking those results back to the collection of objects passed into this method.
The only solution I could think of was to add a writable GUID column to the table and allow the MERGE statement to specify that value so I could link back to these objects in code using that and assign the ID value from there, but that seems like it defeats the purpose of having an automatic identity column and feels convoluted.
I'm really curious how this can work, because I know Entity Framework does something to mitigate this problem (to be clear, I believe I'd have the same problem were I using a pure INSERT statement instead of a MERGE). In EF I can add objects to the model and call Entity.SaveChanges() and have the entity's ID property auto-update using magic. I guess it's that kind of magic I'm looking to understand more.
Also, I know I could structure my saves to insert one record at a time and cascade the changes appropriately (by returning SCOPE_IDENTITY for every insert) but this would be terribly inefficient.
One of the things I love about the MERGE statement is that the source data is in scope in the OUTPUT clause.
OUTPUT $action, SOURCE.Id, INSERTED.Id;
On insert, this will give you three columns: 'INSERT' in the first, the values of @id0 and @id1 in the second, and the matching, newly inserted Id values in the third.
In your C# code, just read the rows as you normally would.
while (reader.Read())
{
string action = reader.GetString(0);
if (action == "INSERT")
{
int oldId = reader.GetInt32(1);
int newId = reader.GetInt32(2);
// Now do what you want with them.
}
}
You can check for "DELETE" and "UPDATE" too, but keep in mind that ordinal 2 will be NULL on "DELETE" so you need to make sure you check for that before calling reader.GetInt32 in that case.
I've used this, in combination with table variables (OUTPUT SOURCE.Id, INSERTED.Id INTO @PersonMap ([OldId], [NewId])), to copy hierarchies 4 and 5 tables deep, all with identity columns.
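Putting the pieces together, here is a minimal sketch of that mapping pattern against the People table from the question. @People stands in for the incoming rows (a table variable or table-valued parameter holding the client-side Id values), and the DELETE branch is left out to keep the output simple:
DECLARE @PersonMap TABLE ([OldId] INT, [NewId] INT);

MERGE INTO People AS TARGET
USING @People AS SOURCE        -- assumed columns: (Id, FirstName, LastName)
ON TARGET.[Id] = SOURCE.[Id]
WHEN MATCHED THEN
    UPDATE SET [FirstName] = SOURCE.[FirstName],
               [LastName] = SOURCE.[LastName]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([FirstName],[LastName])
    VALUES (SOURCE.[FirstName], SOURCE.[LastName])
OUTPUT SOURCE.[Id], INSERTED.[Id] INTO @PersonMap ([OldId], [NewId]);

-- rows whose OldId and NewId differ are the freshly inserted ones
SELECT [OldId], [NewId] FROM @PersonMap;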

Bulk Insert With Auto Increment - No Identity column

I am trying to implement a bulk insert of data from a DataTable. In my MS SQL destination table I have a primary key column that is not an identity column, so I have to increment it manually. But that is not possible in code, because multiple threads will be inserting into the same table. Please give me a suggestion if you have any.
public void BulkInsert(DataTable dtTable)
{
DataTable dtProductSold = dtTable;
//creating object of SqlBulkCopy
SqlBulkCopy objbulk = new SqlBulkCopy(ConStr.ToString());
//assigning Destination table name
objbulk.DestinationTableName = "BatchData_InvReportMapping";
//Mapping Table column
objbulk.ColumnMappings.Add("InvPK", "InvPK");
objbulk.ColumnMappings.Add("DateValue", "DateDalue");
objbulk.ColumnMappings.Add("TextValue", "TextValue");
objbulk.ColumnMappings.Add("NumericValue", "NumericValue");
objbulk.ColumnMappings.Add("ErrorValue", "ErrorValue");
//inserting bulk Records into DataBase
objbulk.WriteToServer(dtProductSold);
}
Thanks in advance,
This is too long for a comment.
If you have a primary key column, then you need to take responsibility for its being unique and non-NULL when you insert rows. SQL Server offers a very handy mechanism to help with this, which is the identity column.
If you do not have an identity, then you basically have two options:
Load data that has a valid primary key column.
Create a trigger that assigns the value when rows are loaded in.
Oh, wait. The default option for bulk insert is not to fire triggers, so the second choice really isn't a good option.
Instead, modify the table to have an identity primary key column. Then define a view on the table without the primary key and do the bulk insert into the view. The primary key will then be assigned automatically.
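A minimal sketch of that view trick, with column types assumed loosely from the mapping code in the question (shown as a fresh CREATE TABLE, since an existing column cannot simply be altered into an identity):
CREATE TABLE BatchData_InvReportMapping (
    InvPK        INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    DateValue    DATETIME NULL,       -- the types here are assumptions
    TextValue    NVARCHAR(255) NULL,
    NumericValue DECIMAL(18,4) NULL,
    ErrorValue   NVARCHAR(255) NULL
);
GO
-- The view hides the identity column; point SqlBulkCopy's DestinationTableName
-- at the view and InvPK is assigned automatically on insert.
CREATE VIEW BatchData_InvReportMapping_Load
AS
SELECT DateValue, TextValue, NumericValue, ErrorValue
FROM BatchData_InvReportMapping;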
EDIT:
There is a third option, which might be feasible. Load the data into a staging table. Then insert from the staging table into the final table, calculating the primary key value. Something like this:
insert into finaltable (pk, . . .)
    select m.maxpk + seqnum, . . .
    from (select row_number() over (order by (select null)) as seqnum,
                 . . .
          from stagingtable
         ) s cross join
         (select max(pk) as maxpk
          from finaltable
         ) m;
I had one idea.
Generally we use tables to store records; even if you insert the data through the front end, it is ultimately stored in a table. So I suggest using a sequence with an insert trigger on the table: when you insert data, the trigger is called first, the sequence is incremented, and the incremented value is stored along with the other values in the row. Just try this. In Oracle 11g there is no identity(), so we use sequences and an insert trigger in place of an identity column.
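For what it's worth, SQL Server 2012 and later support sequences directly, so the same idea works there without a trigger; a sketch reusing the (assumed) table and column names from the question:
CREATE SEQUENCE dbo.InvPKSeq START WITH 1 INCREMENT BY 1;

-- New rows that omit InvPK get the next sequence value by default.
ALTER TABLE BatchData_InvReportMapping
    ADD CONSTRAINT DF_InvPK DEFAULT (NEXT VALUE FOR dbo.InvPKSeq) FOR InvPK;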
Create a table called Ids, with columns TableName VARCHAR(50) and Id INT.
When you want to generate your ids, read the relevant row and increment it by the number of rows you want to insert, within the same transaction.
You can now bulk insert these rows whenever you want without worrying about other threads inserting them.
This is similar to how NHibernate's HiLo generator works.
http://weblogs.asp.net/ricardoperes/making-better-use-of-the-nhibernate-hilo-generator
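A sketch of the reservation step, assuming the Ids table described above. The single UPDATE both increments the counter and captures the new value, so concurrent callers can never reserve overlapping blocks:
DECLARE @rowsToInsert INT = 5000;   -- size of the block to reserve
DECLARE @lastId INT;

UPDATE Ids
SET @lastId = Id = Id + @rowsToInsert
WHERE TableName = 'BatchData_InvReportMapping';

-- The reserved keys are (@lastId - @rowsToInsert + 1) through @lastId;
-- assign them to the DataTable rows before calling SqlBulkCopy.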

How do I structure this transaction?

We have an ASP.NET/MSSQL based web app which generates orders with sequential order numbers.
When a user saves a form, a new order is created as follows:
SELECT MAX(order_number) FROM order_table, call this max_order_number
set new_order_number = max_order_number + 1
INSERT a new order record, with this new_order_number (it's just a field in the order record, not a database key)
If I enclose the above 3 steps in a single transaction, will it prevent duplicate order numbers from being created if two customers save a new order at the same time? (And let's say the system is eventually on a web farm with multiple IIS servers and one MSSQL server.)
I want to avoid two customers selecting the same MAX(order_number) due to concurrency somewhere in the system.
What isolation level should be used? Thank you.
Why not just use an Identity as the order number?
Edit:
As far as I know, you can make the current order_number column an Identity (you may have to reset the seed, it's been a while since I've done this). You might want to do some tests.
Here's a good read about what actually goes on when you change a column to an Identity in SSMS. The author mentions how this may take a while if the table already has millions of rows.
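If you do end up needing to reset the seed, it's a one-liner; a sketch assuming the order table and column from the question:
DECLARE @max INT;
SELECT @max = MAX(order_number) FROM order_table;
-- Make the next identity value continue after the current maximum.
DBCC CHECKIDENT ('order_table', RESEED, @max);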
Using an identity is by far the best idea. I create all my tables like this:
CREATE TABLE mytable (
mytable_id int identity(1, 1) not null primary key,
name varchar(50)
)
The "identity" flag means, "Let SQL Server assign this number for me". The (1, 1) means that identity numbers should start at 1 and be incremented by 1 each time someone inserts a record into the table. Not Null means that nobody should be allowed to insert a null into this column, and "primary key" means that we should create a clustered index on this column. With this kind of a table, you can then insert your record like this:
-- We don't need to insert into mytable_id column; SQL Server does it for us!
INSERT INTO mytable (name) VALUES ('Bob Roberts')
But to answer your literal question, I can give a lesson about how transactions work. It's certainly possible, although not optimal, to do this:
-- Begin a transaction - this means everything within this region will be
-- executed atomically, meaning that nothing else can interfere.
BEGIN TRANSACTION

DECLARE @id bigint

-- Retrieve the maximum order number from the table. The UPDLOCK/HOLDLOCK hints
-- keep the lock until commit; without them, under the default isolation level,
-- two concurrent transactions could both read the same MAX.
SELECT @id = MAX(order_number) FROM order_table WITH (UPDLOCK, HOLDLOCK)

-- While those locks are held, this insert statement cannot
-- produce a duplicate order number
INSERT INTO order_table (order_number) VALUES (@id + 1)

-- Committing the transaction releases your lock and allows other programs
-- to work on the order table
COMMIT TRANSACTION
Just keep in mind that declaring your table with an identity primary key column does this all for you automatically.
The risk is two processes selecting the MAX(order_number) before one of them inserts the new order. A safer way is to do it in one statement (note that SQL Server does not allow a subquery inside a VALUES clause, so this is written as an INSERT ... SELECT):
INSERT INTO order_table
(order_number, /* other fields */)
SELECT MAX(order_number) + 1,
/* other values */
FROM order_table
I agree with G_M; use an Identity field. When you add your record, just
INSERT INTO order_table (/* other fields */)
VALUES (/* other fields */) ; SELECT SCOPE_IDENTITY()
The return value from Scope Identity will be your order number.

How can I Insert/Update into two related tables in one command?

A database exists with two tables
Data_t: DataID, the primary key, IDENTITY(1,1); it also has another field, LEFT TINYINT.
Data_Link_t: DataID, a PK and FK that MUST exist in Data_t; it also has another field, RIGHT SMALLINT.
Coming from a Microsoft Access environment into C# and SQL Server, I'm looking for a good method of importing a record into this relationship.
The record contains information that belongs on both sides of this join (possibly inserting/updating upwards of 5000 records at once). Bonus points for processing the entire batch in some kind of LINQ list-type command, but even if this is done record by record, the key goal is that BOTH sides of the record are processed in the same step.
There are countless approaches, and I'm looking at too many to determine which way I should go, so I thought it faster to ask the general public. Is LINQ to SQL an option for inserting/updating a big list like this? Should I go record by record? What approach should I use to add a record to normalized tables that, when joined, create the full record?
Sounds like a case where I'd write a small stored proc and call that from C# - e.g. as a function on my Linq-to-SQL data context object.
Something like:
CREATE PROCEDURE dbo.InsertData(@Left TINYINT, @Right SMALLINT)
AS BEGIN
    DECLARE @DataID INT

    -- LEFT and RIGHT are reserved words, hence the brackets
    INSERT INTO dbo.Data_t([LEFT]) VALUES(@Left)
    SELECT @DataID = SCOPE_IDENTITY();

    INSERT INTO dbo.Data_Link_t(DataID, [RIGHT]) VALUES(@DataID, @Right)
END
If you import that into your data context, you could call this something like:
using(YourDataContext ctx = new YourDataContext())
{
    foreach(YourObjectType obj in YourListOfObjects)
    {
        ctx.InsertData(obj.Left, obj.Right);
    }
}
and let the stored proc handle all the rest (all the details, like determining and using the IDENTITY from the first table in the second one) for you.
I have never tried it myself, but you might be able to do exactly what you are asking for by creating an updateable view and then inserting records into the view.
UPDATE
I just tried it, and it doesn't look like it will work.
Msg 4405, Level 16, State 1, Line 1
View or function 'Data_t_and_Data_Link_t' is not updatable because the modification affects multiple base tables.
I guess this is just one more thing for all the Relational Database Theory purists to hate about SQL Server.
ANOTHER UPDATE
Further research has found a way to do it. It can be done with a view and an "instead of" trigger.
create table Data_t
(
DataID int not null identity primary key,
[LEFT] tinyint,
)
GO
create table Data_Link_t
(
DataID int not null primary key foreign key references Data_T (DataID),
[RIGHT] smallint,
)
GO
create view Data_t_and_Data_Link_t
as
select
d.DataID,
d.[LEFT],
dl.[RIGHT]
from
Data_t d
inner join Data_Link_t dl on dl.DataID = d.DataID
GO
create trigger trgInsData_t_and_Data_Link_t on Data_t_and_Data_Link_t
instead of insert
as
    insert into Data_t ([LEFT]) select [LEFT] from inserted
    -- @@IDENTITY picks up the value generated by the insert above
    insert into Data_Link_t (DataID, [RIGHT]) select @@IDENTITY, [RIGHT] from inserted
go
insert into Data_t_and_Data_Link_t ([LEFT],[RIGHT]) values (1, 2)
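One caveat with the trigger above: @@IDENTITY reflects only the last generated value, so it maps rows correctly only when one row is inserted at a time. For multi-row inserts, here is a sketch of a safer replacement (drop the original trigger first; a view allows only one INSTEAD OF INSERT trigger). It uses MERGE, whose OUTPUT clause can see source columns and can therefore pair each new DataID with its RIGHT value:
create trigger trgInsData_t_and_Data_Link_t on Data_t_and_Data_Link_t
instead of insert
as
begin
    declare @map table (DataID int, [RIGHT] smallint)

    merge into Data_t as t
    using inserted as i
    on 1 = 0                         -- never matches, so every row inserts
    when not matched then
        insert ([LEFT]) values (i.[LEFT])
    output INSERTED.DataID, i.[RIGHT] into @map (DataID, [RIGHT]);

    insert into Data_Link_t (DataID, [RIGHT])
    select DataID, [RIGHT] from @map
end
go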

TSQL: UPDATE with INSERT INTO SELECT FROM

So I have an old database that I'm migrating to a new one. The new one has a slightly different but mostly-compatible schema. Additionally, I want to renumber all tables from zero.
Currently I have been using a tool I wrote that manually retrieves the old record, inserts it into the new database, and updates a v2 ID field in the old database to show its corresponding ID location in the new database.
For example, I'm selecting from MV5.Posts and inserting into MV6.Posts. Upon the insert, I retrieve the ID of the new row in MV6.Posts and update the MV5.Posts.MV6ID field in the old database with it.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually? I'm using SQL Server 2005, dev edition.
The key with migration is to do several things:
First, do not do anything without a current backup.
Second, if the keys will be changing, you need to store both the old and new keys in the new structure, at least temporarily (permanently if the key field is exposed to the users, because they may be searching by it to get old records).
Next you need to have a thorough understanding of the relationships to child tables. If you change the key field all related tables must change as well. This is where having both old and new key stored comes in handy. If you forget to change any of them, the data will no longer be correct and will be useless. So this is a critical step.
Pick out some test cases of particularly complex data making sure to include one or more test cases for each related table. Store the existing values in work tables.
To start the migration you insert into the new table using a select from the old table. Depending on the amount of records, you may want to loop through batches (not one record at a time) to improve performance. If the new key is an identity, you simply put the value of the old key in its field and let the database create the new keys.
Then do the same with the related tables. Then use the old key value in the table to update the foreign key fields with something like:
Update t2
set t2.fkfield = t1.newkey
from table2 t2
join table1 t1 on t1.oldkey = t2.fkfield
Test your migration by running the test cases and comparing the data with what you stored from before the migration. It is utterly critical to thoroughly test migration data or you can't be sure the data is consistent with the old structure. Migration is a very complex action; it pays to take your time and do it very methodically and thoroughly.
Probably the simplest way would be to add a column on MV6.Posts for oldId, then insert all the records from the old table into the new table. Last, update the old table matching on oldId in the new table with something like:
UPDATE o
SET o.newid = n.id
FROM mv5.posts o
JOIN mv6.posts n ON o.id = n.oldid
You could clean up and drop the oldId column afterwards if you wanted to.
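End to end, the whole pass looks something like this sketch, with hypothetical title and body columns standing in for the real post fields:
ALTER TABLE mv6.posts ADD oldid INT NULL;

-- Copy everything across, remembering where each row came from;
-- mv6.posts.id is assumed to be an identity column, so it renumbers itself.
INSERT INTO mv6.posts (title, body, oldid)
SELECT title, body, id
FROM mv5.posts;

-- Stitch the new ids back onto the old rows.
UPDATE o
SET o.newid = n.id
FROM mv5.posts o
JOIN mv6.posts n ON n.oldid = o.id;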
The best approach I know of uses the OUTPUT clause, assuming you have SQL Server 2005 or 2008.
USE AdventureWorks;
GO
DECLARE @MyTableVar table( ScrapReasonID smallint,
                           Name varchar(50),
                           ModifiedDate datetime);
INSERT Production.ScrapReason
OUTPUT INSERTED.ScrapReasonID, INSERTED.Name, INSERTED.ModifiedDate
    INTO @MyTableVar
VALUES (N'Operator error', GETDATE());
It still would require a second pass to update the original table; however, it might help make your logic simpler. Do you need to update the source table? You could just store the new id's in a third cross reference table.
Heh. I remember doing this in a migration.
Putting the old_id in the new table makes both steps easier: the initial copy can just be an insert into newtable select ... from oldtable, and the subsequent "stitching" of records is straightforward. In the "stitch" you'll either update the child tables' foreign keys during their insert, by doing a subselect on the new parent (insert into newchild select ..., (select id from new_parent where old_id = oldchild.fk) as fk, ... from oldchild), or you'll insert the children first and do a separate update to fix the foreign keys.
Doing it in one insert is faster; doing it in a separate step means that your inserts aren't order dependent and can be redone if necessary.
After the migration, you can either drop the old_id columns or, if you have a case where the legacy system exposed the ids and users treated the keys as data, keep them to allow lookups based on the old_id.
Indeed, if you have the foreign keys correctly defined, you can use the system tables/INFORMATION_SCHEMA views to generate your insert statements.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually?
Since you wouldn't want to do it manually, but automatically, create a trigger on MV6.Posts so that UPDATE occurs on MV5.Posts automatically when you insert into MV6.Posts.
And your trigger might look something like,
create trigger trg_MV6Posts
on MV6.Posts
after insert
as
begin
    -- assumes MV6.Posts carries an OldID column holding the originating MV5 id,
    -- since the inserted rows must be matched back to their source rows somehow
    update o
    set o.MV6ID = i.ID
    from MV5.Posts o
    join inserted i on i.OldID = o.ID
end
AFAIK, you cannot update two different tables with a single SQL statement.
You can, however, use triggers to achieve what you want to do.
Make a column MV6.Posts.OldMV5Id, then do an
insert into MV6.Posts
select .. from MV5.Posts
and then update MV5.Posts.MV6ID by matching on OldMV5Id.
