Map identity value to object after MERGE statement - C#

I have a table called People with the following schema:
Id INT NOT NULL IDENTITY(1, 1)
FirstName NVARCHAR(64) NOT NULL
LastName NVARCHAR(64) NOT NULL
I am using a query like this one to perform inserts and updates in one statement:
MERGE INTO People AS TARGET
USING ( VALUES
    (@id0, @firstName0, @lastname0),
    (@id1, @firstName1, @lastname1)
    ...
) AS SOURCE ([Id],[FirstName],[LastName])
ON TARGET.[Id] = SOURCE.[Id]
WHEN MATCHED THEN
    UPDATE SET
        [FirstName] = SOURCE.[FirstName],
        [LastName] = SOURCE.[LastName]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([FirstName],[LastName])
    VALUES (SOURCE.[FirstName], SOURCE.[LastName])
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, INSERTED.*;
My application is structured such that the client calls back to the server to load the existing state of the app. The client then creates/modifies/deletes entities locally and pushes those changes to the server in one bunch.
Here's an example of what my "SaveEntities" code currently looks like:
public void SavePeople(IEnumerable<Person> people)
{
    // Returns the query I mentioned above
    var query = GetMergeStatement(people);
    using(var connection = new SqlConnection(connectionString))
    using(var command = new SqlCommand(query, connection))
    {
        connection.Open();
        using(var reader = command.ExecuteReader())
        {
            while(reader.Read())
            {
                // how do I tie these records back to
                // the objects in the people collection?
            }
        }
    }
}
I can use the value in the $action column to filter down to just INSERTED records. INSERTED.* returns all of the columns in TARGET for the inserted record. The problem is I have no way of distinctly linking those results back to the collection of objects passed into this method.
The only solution I could think of was to add a writable GUID column to the table and allow the MERGE statement to specify that value so I could link back to these objects in code using that and assign the ID value from there, but that seems like it defeats the purpose of having an automatic identity column and feels convoluted.
I'm really curious how this can work because I know Entity Framework does something to mitigate this problem (to be clear, I believe I'd have the same problem were I to use a pure INSERT statement instead of a MERGE). In EF I can add objects to the model, call SaveChanges(), and have the entity's ID property auto-update using magic. I guess it's that kind of magic I'm looking to understand more.
Also, I know I could structure my saves to insert one record at a time and cascade the changes appropriately (by returning SCOPE_IDENTITY for every insert) but this would be terribly inefficient.

One of the things I love about the MERGE statement is that the source data is in scope in the OUTPUT clause.
OUTPUT $action, SOURCE.Id, INSERTED.Id;
On insert, this will give you three columns: 'INSERT' in the first, the values of @id0 and @id1 in the second, and the matching, newly inserted Id values in the third.
In your C# code, just read the rows as you normally would.
while (reader.Read())
{
    string action = reader.GetString(0);
    if (action == "INSERT")
    {
        int oldId = reader.GetInt32(1);
        int newId = reader.GetInt32(2);
        // Now do what you want with them.
    }
}
You can check for "DELETE" and "UPDATE" too, but keep in mind that ordinal 2 will be NULL on "DELETE", so check reader.IsDBNull before calling reader.GetInt32 in that case.
I've used this, in combination with table variables (OUTPUT SOURCE.Id, INSERTED.Id INTO @PersonMap ([OldId], [NewId])), to copy hierarchies 4 and 5 tables deep, all with identity columns.
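To tie those rows back to the people collection from the question, a minimal sketch (this assumes the client assigns placeholder Ids, e.g. negative numbers, that are unique within the batch):

// map OUTPUT rows back to the input objects via the placeholder ids
var peopleById = people.ToDictionary(p => p.Id);
while (reader.Read())
{
    if (reader.GetString(0) == "INSERT")
    {
        var person = peopleById[reader.GetInt32(1)]; // ordinal 1 = SOURCE.Id
        person.Id = reader.GetInt32(2);              // ordinal 2 = INSERTED.Id
    }
}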

Related

Primary Key value in datagrid with ObservableCollection (Entity Framework)

Please don't throw stones at me; I am quite new to programming, especially Entity Framework.
I created a small program that records some variables to a SQL table. To present the recorded variables I use a DataGrid bound to an ObservableCollection. The DataGrid looks like this:
[screenshot of the DataGrid of recorded variables]
Id is the primary key with seed 1. The problem is how to show the actual Id number without querying the SQL table?
I tried this:
VariableRecordOCtoShow.Add(new VariableRecord() // VariableRecord is a class from the entity model
{
    Value = newVariableRecord.Value,
    Time = newVariableRecord.Time,
    IdVariableAssignment = newVariableRecord.IdVariableAssignment,
    IdUser = newVariableRecord.IdUser
});
But in Id column in datagrid were only zeros.
I also tried this: first, query the database for the last Id number:
public int getIdValue()
{
    var query = (from p in context.VariableRecords
                 orderby p.Id descending
                 select p).Take(1).Single();
    return query.Id;
}
Second, write it to a variable:
idVariable = getIdValue();
Then write it to the Id of the item added to the ObservableCollection (not the entity model class) and increment this variable after each record (below is part of the writing method):
VariableRecordOCtoShow.Add(new VariableRecord() // in this big block I just write the input values + the Id into the ObservableCollection
{
    Id = idVariable, // fake Id (calculated from query)
    Value = newVariableRecord.Value,
    Time = newVariableRecord.Time,
    IdVariableAssignment = newVariableRecord.IdVariableAssignment,
    IdUser = newVariableRecord.IdUser
});
idVariable++;
It works, but only at first: when I tested my program overnight, there was a performance problem (some variables weren't written in time), and by the end the Id in the ObservableCollection no longer matched the Id in SQL.
Can you help me? How can I display the real primary key value in the DataGrid without querying while recording is running?
Thank you in advance.
The problem is how to show actual Id number without querying a SQL table?
I am not sure about your question.
Firstly, why don't you want to query the SQL table?
Secondly, we generally don't show the ID (identity) field in the view; it's not good practice.
If you still want to show it, you can always query it in EF:
ObservableCollection<VariableRecords> v =
    new ObservableCollection<VariableRecords>(from p in context.VariableRecords
                                              select p);
Also, your second approach might create problems in the future, as the identity field ID is not always Max(ID) + 1. See the documentation:
Reuse of values – For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a particular insert statement fails or if the insert statement is rolled back then the consumed identity values are lost and will not be generated again. This can result in gaps when the subsequent identity values are generated.
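For what it's worth, the usual way to get the real Id without an extra query is to let EF write it back after SaveChanges and put the same tracked entity into the collection instead of a copy; a minimal sketch, assuming newVariableRecord is the entity being saved:

context.VariableRecords.Add(newVariableRecord);
context.SaveChanges(); // EF writes the database-generated Id back onto the entity
VariableRecordOCtoShow.Add(newVariableRecord); // Id now holds the real key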

Entity Framework 6 and SQL Server Sequences

I am using EF6 with a database first project. We have a requirement to use sequences which was a feature introduced in SQL server 2012 (I believe).
On the table the identity column has a default value set using:
(NEXT VALUE FOR [ExhibitIdentity])
This is used as we have two tables which store exhibit information for separate departments but we need the identity to be unique across both of the tables as it is then used as a reference in lots of other shared common tables.
My problem is using this within Entity Framework. I have googled but couldn't find much information about whether EF6 supports them. I have tried setting StoreGeneratedPattern in the EF designer to Identity, but when saving this complains that zero rows were affected, as EF uses scope_identity() to see whether the insert succeeded, and since we are using sequences this comes back as null.
Setting it to computed throws an error saying I should set it to identity and setting it to none causes it to insert 0 as the id value and fail.
Do I need to call a function/procedure in order to get the next sequence and then assign it to the id value before saving the record?
Any help is much appreciated.
It's clear that you can't escape from this catch-22 by playing with DatabaseGeneratedOptions.
The best option, as you suggested, is to set DatabaseGeneratedOption.None and get the next value from the sequence (e.g. as in this question) right before you save a new record. Then assign it to the Id value, and save. This is concurrency-safe, because you will be the only one drawing that specific value from the sequence (let's assume no one resets the sequence).
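A minimal sketch of that option (the entity and DbSet names here are assumptions; Database.SqlQuery is standard EF6):

// draw the next value from the sequence, then assign it before saving
var nextId = context.Database
    .SqlQuery<int>("SELECT NEXT VALUE FOR dbo.ExhibitIdentity;")
    .Single();
exhibit.Id = nextId; // 'exhibit' is a hypothetical entity instance
context.Exhibits.Add(exhibit);
context.SaveChanges();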
However, there is a possible hack...
A bad one, and I should stop here...
EF 6 introduced the command interceptor API. It allows you to manipulate EF's SQL commands and their results before and after the commands are executed. Of course we shouldn't tamper with these commands... or should we?
Well... if we look at an insert command that is executed when DatabaseGeneratedOption.Identity is set, we see something like this:
INSERT [dbo].[Person]([Name]) VALUES (@0)
SELECT [Id]
FROM [dbo].[Person]
WHERE @@ROWCOUNT > 0 AND [Id] = scope_identity()
The SELECT command is used to fetch the generated primary key value from the database and set the new object's identity property to this value. This enables EF to use this value in subsequent insert statements that refer to this new object by a foreign key in the same transaction.
When the primary key is generated by a default taking its value from a sequence (as you do) it is evident that there is no scope_identity(). There is however a current value of the sequence, which can be found by a command like
SELECT current_value FROM sys.sequences WHERE name = 'PersonSequence'
If only we could make EF execute this command after the insert instead of scope_identity()!
Well, we can.
First, we have to create a class that implements IDbCommandInterceptor, or inherits from the default implementation DbCommandInterceptor:
using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;

class SequenceReadCommandInterceptor : DbCommandInterceptor
{
    public override void ReaderExecuting(DbCommand command,
        DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
    }
}
We add this class to the interception context with the command:
DbInterception.Add(new SequenceReadCommandInterceptor());
The ReaderExecuting method runs just before the command is executed. If this is an INSERT command with an identity column, its text looks like the command above. Now we could replace the scope_identity() part with the query that gets the current sequence value:
command.CommandText = command.CommandText
    .Replace("scope_identity()",
        "(SELECT current_value FROM sys.sequences WHERE name = 'PersonSequence')");
Now the command will look like
INSERT [dbo].[Person]([Name]) VALUES (@0)
SELECT [Id]
FROM [dbo].[Person]
WHERE @@ROWCOUNT > 0 AND [Id] =
    (SELECT current_value FROM sys.sequences
     WHERE name = 'PersonSequence')
And if we run this, the funny thing is: it works. Right after the SaveChanges command the new object has received its persisted Id value.
I really don't think this is production-ready. You'd have to modify the command when it's an insert command, choose the right sequence based on the inserted entity, all by dirty string manipulation in a rather obscure place. And I don't know if with heavy concurrency you will always get the right sequence value back. But who knows, maybe a next version of EF will support this out of the box.
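A rough guard for the "only insert commands" concern might look like this inside ReaderExecuting (a sketch; matching on command text is an assumption, not a robust way to classify EF commands):

if (command.CommandText.StartsWith("INSERT", StringComparison.OrdinalIgnoreCase)
    && command.CommandText.Contains("scope_identity()"))
{
    // only rewrite commands that look like EF identity inserts
    command.CommandText = command.CommandText.Replace(
        "scope_identity()",
        "(SELECT current_value FROM sys.sequences WHERE name = 'PersonSequence')");
}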
This issue has already been solved:
https://github.com/dotnet/ef6/issues/165

A workaround for this issue was added in 8e38ed8. It requires setting the following flag in the application:

SqlProviderServices.UseScopeIdentity = false;

Here is the description of the new flag (which includes some caveats worth knowing):

Gets or sets a value indicating whether to use the SCOPE_IDENTITY() function to retrieve values generated by the database for numeric columns during an INSERT operation. The default value of true is recommended and can provide better performance if all numeric values are generated using IDENTITY columns. If set to false, an OUTPUT clause will be used instead. An OUTPUT clause makes it possible to retrieve values generated by sequences or other means.
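For example, a one-time setup call (where exactly you put it is up to you; application startup, before the first SaveChanges, is the usual place):

using System.Data.Entity.SqlServer;

// opt out of SCOPE_IDENTITY() so EF uses an OUTPUT clause instead
SqlProviderServices.UseScopeIdentity = false;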

How to retrieve server generated Identity values when using SqlBulkCopy

I know I can do a bulk insert into my table with an identity column by not specifying the SqlBulkCopyOptions.KeepIdentity as mentioned here.
What I would like to be able to do is get the identity values that the server generates and put them in my datatable, or even a list. I saw this post, but I want my code to be general, and I can't have a version column in all my tables. Any suggestions are much appreciated. Here is my code:
public void BulkInsert(DataTable dataTable, string DestinationTbl, int batchSize)
{
    // Get the DataTable
    DataTable dtInsertRows = dataTable;
    using (SqlBulkCopy sbc = new SqlBulkCopy(sConnectStr))
    {
        sbc.DestinationTableName = DestinationTbl;
        // Number of records to be processed in one go
        sbc.BatchSize = batchSize;
        // Add your column mappings here
        foreach (DataColumn dCol in dtInsertRows.Columns)
        {
            sbc.ColumnMappings.Add(dCol.ColumnName, dCol.ColumnName);
        }
        // Finally write to server
        sbc.WriteToServer(dtInsertRows);
    }
}
AFAIK, you can't.
The only way (that I know of) to get the value(s) of the identity field is by using either SCOPE_IDENTITY() when you insert row-by-row, or the OUTPUT approach when inserting an entire set.
The 'simplest' approach would probably be to SqlBulkCopy the records into the table and then fetch them back again later on. The problem is that it could be hard to properly (and quickly) fetch those rows from the server again (e.g. it would be rather ugly (and slow) to have a WHERE clause with IN (guid1, guid2, .., guid999998, guid999999) =)
I'm assuming performance is an issue here, as you're already using SqlBulkCopy, so I'd suggest going for the OUTPUT approach. In that case you'll first need a staging table to SqlBulkCopy your records into. The table should include some kind of batch identifier (a GUID?) so that multiple threads can run side by side. You'll need a stored procedure to INSERT <table> OUTPUT inserted.* SELECT the data from the staging table into the actual destination table, and also to clean up the staging table again. The recordset returned by that procedure will match 1:1 with the original dataset that filled the staging table, but of course you should NOT rely on its order. In other words: your next challenge will be matching the returned identity fields back to the original records in your application.
Thinking things over, I'd say that in all cases -- except the row-by-row SCOPE_IDENTITY() approach, which is going to be dog-slow -- you'll need to have (or add) a 'key' to your data to link the generated ids back to the original data =/
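A minimal sketch of that staging flow (the table and column names, connectionString, and batchId are all assumptions, not from the question):

const string flushSql = @"
INSERT INTO dbo.Person (FirstName, LastName)
OUTPUT INSERTED.Id, INSERTED.FirstName, INSERTED.LastName
SELECT FirstName, LastName FROM dbo.PersonStaging WHERE BatchId = @batchId;
DELETE FROM dbo.PersonStaging WHERE BatchId = @batchId;";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(flushSql, connection))
{
    command.Parameters.AddWithValue("@batchId", batchId);
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // reader[0] is the generated Id; match it back to your source rows
            // by the natural columns, not by result order
        }
    }
}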
You can take an approach similar to the one deroby describes above, but instead of retrieving the rows back via a WHERE IN (guid1, etc.), you match them back up to the rows inserted in memory based on their order.
So I would suggest adding a column to the table that ties each row to a SqlBulkCopy transaction, and then doing the following to match the generated Ids back to the in-memory collection of rows you just inserted:
Create a new Guid and set this value on all the rows in the bulk copy, mapped to the new column.
Run the WriteToServer method of the BulkCopy object.
Retrieve all the rows that have that same key.
Iterate through this list, which will be in the order they were added; these will be in the same order as the in-memory collection of rows, so you will then know the generated id for each item.
This will give you better performance than giving each individual row a unique key. So after you bulk insert the data table you could do something like this (In my example I will have a list of objects from which I will create the data table and then map the generated ids back to them)
List<myObject> myCollection = new List<myObject>();
Guid identifierKey = Guid.NewGuid();

// Do your bulk insert where all the rows inserted have the identifierKey
// set on the new column. In this example you would create a data table based
// off the myCollection object.

// Identifier is a column specifically for matching a group of rows to a sql
// bulk copy command
var myAddedRows = myDbContext.DatastoreRows.AsNoTracking()
    .Where(d => d.Identifier == identifierKey)
    .OrderBy(d => d.Id) // identity values ascend in insertion order
    .ToList();
for (int i = 0; i < myAddedRows.Count; i++)
{
    var savedRow = myAddedRows[i];
    var inMemoryRow = myCollection[i];
    int generatedId = savedRow.Id;

    // Now you know the generated id for the in-memory object, so you can
    // set a property on it to store the value
    inMemoryRow.GeneratedId = generatedId;
}
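A sketch of the first step, stamping the batch key onto the rows before WriteToServer (this assumes the DataTable has an Identifier column of type uniqueidentifier):

// stamp every row with the batch key so the group can be fetched back later
foreach (DataRow row in dataTable.Rows)
{
    row["Identifier"] = identifierKey;
}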

MSDataSetGenerator using ExecuteScalar() instead of ExecuteNonQuery()

I'm using a strongly-typed dataset (MSDataSetGenerator run on an XSD) to do some of the DB access in a project I'm working on, and I've hit upon a bit of a problem.
As I'm using Identity columns as the primary key of my tables I want to return the newly-generated ID when I call the Insert() method. However, the TableAdapter is being generated with the insert method as follows:
public int Insert(...stuff...)
{
    // Sets up the command..
    return this.InsertCommand.ExecuteNonQuery();
}
which returns the number of rows affected (i.e. 1).
This is despite the fact that the InsertCommandText is generated as:
INSERT INTO table VALUES (values...);
SELECT Id, ...stuff... FROM table WHERE (Id = SCOPE_IDENTITY());
Which can obviously be used to return the ID by instead doing the following:
public int Insert(...stuff...)
{
    // Sets up the command..
    return (int)this.InsertCommand.ExecuteScalar();
}
Does anyone know if there's a way to make the MSDataSetGenerator use the ExecuteScalar function as opposed to ExecuteNonQuery()? It seems odd that it would generate an insert command that selects the new data straight after the insert but then doesn't allow you to retrieve that data!
Thanks, Ed
Check the ExecuteMode property of the query in the TableAdapter's properties and change it from NonQuery to Scalar.

TSQL: UPDATE with INSERT INTO SELECT FROM

So, I have an old database that I'm migrating to a new one. The new one has a slightly different but mostly-compatible schema. Additionally, I want to renumber all tables from zero.
Currently I have been using a tool I wrote that manually retrieves the old record, inserts it into the new database, and updates a v2 ID field in the old database to show its corresponding ID location in the new database.
for example, I'm selecting from MV5.Posts and inserting into MV6.Posts. Upon the insert, I retrieve the ID of the new row in MV6.Posts and update it in the old MV5.Posts.MV6ID field.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually? I'm using SQL Server 2005, dev edition.
The key with migration is to do several things:
First, do not do anything without a current backup.
Second, if the keys will be changing, you need to store both the old and new in the new structure at least temporarily (Permanently if the key field is exposed to the users because they may be searching by it to get old records).
Next you need to have a thorough understanding of the relationships to child tables. If you change the key field all related tables must change as well. This is where having both old and new key stored comes in handy. If you forget to change any of them, the data will no longer be correct and will be useless. So this is a critical step.
Pick out some test cases of particularly complex data making sure to include one or more test cases for each related table. Store the existing values in work tables.
To start the migration you insert into the new table using a select from the old table. Depending on the amount of records, you may want to loop through batches (not one record at a time) to improve performance. If the new key is an identity, you simply put the value of the old key in its field and let the database create the new keys.
Then do the same with the related tables. Then use the old key value in the table to update the foreign key fields with something like:
UPDATE t2
SET fkfield = t1.newkey
FROM table2 t2
JOIN table1 t1 ON t1.oldkey = t2.fkfield
Test your migration by running the test cases and comparing the data with what you stored from before the migration. It is utterly critical to thoroughly test migration data or you can't be sure the data is consistent with the old structure. Migration is a very complex action; it pays to take your time and do it very methodically and thoroughly.
Probably the simplest way would be to add a column on MV6.Posts for oldId, then insert all the records from the old table into the new table. Last, update the old table matching on oldId in the new table with something like:
UPDATE o
SET o.newid = n.id
FROM mv5.posts o
JOIN mv6.posts n ON o.id = n.oldid
You could clean up and drop the oldId column afterwards if you wanted to.
The best approach I know of is the OUTPUT clause, assuming you have SQL Server 2005 or 2008.
USE AdventureWorks;
GO
DECLARE @MyTableVar table( ScrapReasonID smallint,
                           Name varchar(50),
                           ModifiedDate datetime);
INSERT Production.ScrapReason
OUTPUT INSERTED.ScrapReasonID, INSERTED.Name, INSERTED.ModifiedDate
    INTO @MyTableVar
VALUES (N'Operator error', GETDATE());
It still would require a second pass to update the original table; however, it might help make your logic simpler. Do you need to update the source table? You could just store the new id's in a third cross reference table.
Heh. I remember doing this in a migration.
Putting the old_id in the new table makes both the update easier -- you can just do an insert into newtable select ... from oldtable -- and the subsequent "stitching" of records easier. In the "stitch" you'll either update the child tables' foreign keys in the insert, by doing a subselect on the new parent (insert into newchild select ... (select id from new_parent where old_id = oldchild.fk) as fk, ... from oldchild), or you'll insert the children and do a separate update to fix the foreign keys.
Doing it in one insert is faster; doing it in a separate step means that your inserts aren't order-dependent, and can be re-done if necessary.
After the migration, you can either drop the old_id columns, or, if you have a case where the legacy system exposed the ids and so users used the keys as data, you can keep them to allow use lookup based on the old_id.
Indeed, if you have the foreign keys correctly defined, you can use systables/information-schema to generate your insert statements.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually?
Since you wouldn't want to do it manually, but automatically, create a trigger on MV6.Posts so that UPDATE occurs on MV5.Posts automatically when you insert into MV6.Posts.
And your trigger might look something like,
create trigger trg_MV6Posts
on MV6.Posts
after insert
as
begin
    -- assumes MV6.Posts carries the originating MV5 id (an OldMV5Id column,
    -- as suggested below) so the inserted rows can be matched back
    update o
    set o.MV6ID = i.ID
    from MV5.Posts o
    join inserted i on i.OldMV5Id = o.ID
end
AFAIK, you cannot update two different tables with a single SQL statement.
You can however use triggers to achieve what you want to do.
Make a column MV6.Post.OldMV5Id, then do an
insert into MV6.Post
select .. from MV5.Post
and then run an update to fill MV5.Post.MV6ID from it.
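Putting the pieces together, a minimal sketch of the whole flow (the Title and Body columns, and connectionString, are placeholders, not from the question):

const string migrateSql = @"
INSERT INTO MV6.Posts (Title, Body, OldMV5Id)
SELECT Title, Body, Id FROM MV5.Posts;

UPDATE o
SET o.MV6ID = n.Id
FROM MV5.Posts o
JOIN MV6.Posts n ON n.OldMV5Id = o.Id;";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(migrateSql, connection))
{
    connection.Open();
    command.ExecuteNonQuery(); // one round trip: copy rows, then stitch ids
}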
