Entity Framework 6 and SQL Server Sequences

I am using EF6 with a database first project. We have a requirement to use sequences which was a feature introduced in SQL server 2012 (I believe).
On the table the identity column has a default value set using:
(NEXT VALUE FOR [ExhibitIdentity])
This is used as we have two tables which store exhibit information for separate departments but we need the identity to be unique across both of the tables as it is then used as a reference in lots of other shared common tables.
My problem is using this within the Entity Framework. I have googled but couldn't find much information on whether EF6 supports sequences. I have tried setting StoreGeneratedPattern in the EF designer to Identity, but on saving it complains that zero rows were affected: it uses scope_identity() to check whether the insert succeeded, and because we are using sequences this comes back as null.
Setting it to Computed throws an error saying I should set it to Identity, and setting it to None causes it to insert 0 as the id value and fail.
Do I need to call a function/procedure in order to get the next sequence and then assign it to the id value before saving the record?
Any help is much appreciated.

It's clear that you can't escape from this catch-22 by playing with DatabaseGeneratedOptions.
The best option, as you suggested, is to set DatabaseGeneratedOption.None and get the next value from the sequence (e.g. as in this question) right before you save a new record. Then assign it to the Id value, and save. This is concurrency-safe, because you will be the only one drawing that specific value from the sequence (let's assume no one resets the sequence).
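As a rough sketch of that approach, the following draws the next value with EF6's Database.SqlQuery and assigns it before saving. The Exhibit, ExhibitContext and ExhibitSaver names are hypothetical; only the ExhibitIdentity sequence comes from the question. It is written code-first to stay self-contained; in a database-first model the equivalent is StoreGeneratedPattern = None in the designer. The CAST to bigint is an assumption so that the result maps to long regardless of the sequence's declared type.

```csharp
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity;
using System.Linq;

// Hypothetical entity; the real one would come from your model.
public class Exhibit
{
    public long Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical context.
public class ExhibitContext : DbContext
{
    public DbSet<Exhibit> Exhibits { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // The application assigns the key, so EF must not expect a generated value.
        modelBuilder.Entity<Exhibit>()
            .Property(e => e.Id)
            .HasDatabaseGeneratedOption(DatabaseGeneratedOption.None);
    }
}

public static class ExhibitSaver
{
    public static long Save(ExhibitContext context, Exhibit exhibit)
    {
        // Each call draws a distinct value from the sequence.
        exhibit.Id = context.Database
            .SqlQuery<long>("SELECT CAST(NEXT VALUE FOR [ExhibitIdentity] AS bigint)")
            .Single();

        context.Exhibits.Add(exhibit);
        context.SaveChanges();
        return exhibit.Id;
    }
}
```

The cost is one extra round trip per new record; if the save fails afterwards, the drawn value is simply skipped and leaves a gap.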
However, there is a possible hack...
A bad one, and I should stop here...
EF 6 introduced the command interceptor API. It allows you to manipulate EF's SQL commands and their results before and after the commands are executed. Of course we should not tamper with these commands, should we?
Well... if we look at an insert command that is executed when DatabaseGeneratedOption.Identity is set, we see something like this:
INSERT [dbo].[Person]([Name]) VALUES (@0)
SELECT [Id]
FROM [dbo].[Person]
WHERE @@ROWCOUNT > 0 AND [Id] = scope_identity()
The SELECT command is used to fetch the generated primary key value from the database and set the new object's identity property to this value. This enables EF to use this value in subsequent insert statements that refer to this new object by a foreign key in the same transaction.
When the primary key is generated by a default taking its value from a sequence (as you do) it is evident that there is no scope_identity(). There is however a current value of the sequence, which can be found by a command like
SELECT current_value FROM sys.sequences WHERE name = 'PersonSequence'
If only we could make EF execute this command after the insert instead of scope_identity()!
Well, we can.
First, we have to create a class that implements IDbCommandInterceptor, or inherits from the default implementation DbCommandInterceptor:
using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;

class SequenceReadCommandInterceptor : DbCommandInterceptor
{
    public override void ReaderExecuting(DbCommand command,
        DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // Inspect or rewrite command.CommandText here, before EF executes it.
    }
}
We add this class to the interception context by the command
DbInterception.Add(new SequenceReadCommandInterceptor());
The ReaderExecuting method runs just before the command is executed. If this is an INSERT command with an identity column, its text looks like the command above. Now we could replace the scope_identity() part by the query getting the current sequence value:
command.CommandText = command.CommandText
    .Replace("scope_identity()",
        "(SELECT current_value FROM sys.sequences " +
        "WHERE name = 'PersonSequence')");
Now the command will look like
INSERT [dbo].[Person]([Name]) VALUES (@0)
SELECT [Id]
FROM [dbo].[Person]
WHERE @@ROWCOUNT > 0 AND [Id] =
    (SELECT current_value FROM sys.sequences
     WHERE name = 'PersonSequence')
And if we run this, the funny thing is: it works. Right after the SaveChanges command the new object has received its persisted Id value.
I really don't think this is production-ready. You'd have to modify the command when it's an insert command, choose the right sequence based on the inserted entity, all by dirty string manipulation in a rather obscure place. And I don't know if with heavy concurrency you will always get the right sequence value back. But who knows, maybe a next version of EF will support this out of the box.

This issue has been already solved:
https://github.com/dotnet/ef6/issues/165
A workaround for this issue was added in 8e38ed8. It requires setting the following flag in the application:
SqlProviderServices.UseScopeIdentity = false;
Here is the description of the new flag (which includes some caveats worth knowing):
Gets or sets a value indicating whether to use the SCOPE_IDENTITY() function to retrieve values generated by the database for numeric columns during an INSERT operation. The default value of true is recommended and can provide better performance if all numeric values are generated using IDENTITY columns. If set to false, an OUTPUT clause will be used instead. An OUTPUT clause makes it possible to retrieve values generated by sequences or other means.
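A minimal sketch of where that flag might be set, assuming an EF6 version that includes the commit mentioned above. The EfStartup class and Configure method are hypothetical names; the point is only that the flag is set once, before the first context is used.

```csharp
using System.Data.Entity.SqlServer;

// Hypothetical startup helper.
public static class EfStartup
{
    // Call once at application start (e.g. from Main or Application_Start),
    // before any DbContext is created.
    public static void Configure()
    {
        // Makes the SQL Server provider read generated values back with an
        // OUTPUT clause instead of SCOPE_IDENTITY(), so sequence defaults work.
        SqlProviderServices.UseScopeIdentity = false;
    }
}
```

With this in place, the key column would keep StoreGeneratedPattern set to Identity, so that EF still expects the database to supply the value.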

Related

Map identity value to object after merge statement

I have a table called People with the following schema:
Id INT NOT NULL IDENTITY(1, 1)
FirstName NVARCHAR(64) NOT NULL
LastName NVARCHAR(64) NOT NULL
I am using a query like this one to perform inserts and updates in one query:
MERGE INTO People AS TARGET
USING ( VALUES
    (@id0, @firstName0, @lastname0),
    (@id1, @firstName1, @lastname1)
    ...
) AS SOURCE ([Id],[FirstName],[LastName])
ON TARGET.[Id] = SOURCE.[Id]
WHEN MATCHED THEN
    UPDATE SET
        [FirstName] = SOURCE.[FirstName],
        [LastName] = SOURCE.[LastName]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([FirstName],[LastName])
    VALUES ([FirstName],[LastName])
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, INSERTED.*;
My application is structured such that the client calls back to the server to load the existing state of the app. The client then creates/modifies/deletes entities locally and pushes those changes to the server in one bunch.
Here's an example of what my "SaveEntities" code currently looks like:
public void SavePeople(IEnumerable<Person> people)
{
// Returns the query I mentioned above
var query = GetMergeStatement(people);
using(var command = new SqlCommand(query))
{
using(var reader = command.ExecuteReader())
{
while(reader.Read())
{
// how do I tie these records back to
// the objects in the people collection?
}
}
}
}
I can use the value in the $action column to filter down to just INSERTED records. INSERTED.* returns all of the columns in TARGET for the inserted record. The problem is I have no way of distinctly linking those results back to the collection of objects passed into this method.
The only solution I could think of was to add a writable GUID column to the table and allow the MERGE statement to specify that value so I could link back to these objects in code using that and assign the ID value from there, but that seems like it defeats the purpose of having an automatic identity column and feels convoluted.
I'm really curious how this can work because I know Entity Framework does something to mitigate this problem (to be clear, I believe I'd have the same problem were I to use a pure INSERT statement instead of a MERGE). In EF I can add objects to the model and call Entity.SaveChanges() and have the entity's ID property auto-update using magic. I guess it's that kind of magic I'm looking to understand more.
Also, I know I could structure my saves to insert one record at a time and cascade the changes appropriately (by returning SCOPE_IDENTITY for every insert) but this would be terribly inefficient.

One of the things I love about the MERGE statement is that the source data is in scope in the OUTPUT clause.
OUTPUT $action, SOURCE.Id, INSERTED.Id;
On insert, this will give you three columns: 'INSERT' in the first, the values of @id0 and @id1 in the second, and the matching, newly inserted Id values in the third.
In your C# code, just read the rows as you normally would.
while (reader.Read())
{
string action = reader.GetString(0);
if (action == "INSERT")
{
int oldId = reader.GetInt32(1);
int newId = reader.GetInt32(2);
// Now do what you want with them.
}
}
You can check for "DELETE" and "UPDATE" too, but keep in mind that ordinal 2 will be NULL on "DELETE" so you need to make sure you check for that before calling reader.GetInt32 in that case.
I've used this, in combination with table variables (OUTPUT SOURCE.Id, INSERTED.Id INTO @PersonMap ([OldId], [NewId])), to copy hierarchies 4 and 5 tables deep, all with identity columns.
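To tie the (action, old id, new id) rows back to the objects that were passed in, one possible sketch is a dictionary keyed by the placeholder Id. It assumes every Person in the collection carries a unique Id, with new objects given placeholders that cannot match an existing row (negative numbers, for example). MergeResultMapper and ApplyNewIds are hypothetical names, and the rows are passed as tuples so the mapping step can be shown without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical helper: applies the ids returned by the MERGE OUTPUT clause.
public static class MergeResultMapper
{
    // Each row is ($action, SOURCE.Id, INSERTED.Id) as read from the data reader.
    public static void ApplyNewIds(
        IEnumerable<Person> people,
        IEnumerable<Tuple<string, int, int>> rows)
    {
        // Throws if two people share an Id, which would also confuse the MERGE.
        Dictionary<int, Person> byPlaceholderId = people.ToDictionary(p => p.Id);

        foreach (Tuple<string, int, int> row in rows)
        {
            Person person;
            if (row.Item1 == "INSERT" && byPlaceholderId.TryGetValue(row.Item2, out person))
            {
                // Replace the client-side placeholder with the identity value.
                person.Id = row.Item3;
            }
        }
    }
}
```

Inside the reader loop you would collect Tuple.Create(action, oldId, newId) for each row and call ApplyNewIds once the reader is closed.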

How to properly reserve identity values for usage in a database?

We have some code in which we need to maintain our own identity (PK) column in SQL. We have a table in which we bulk insert data, but we add data to related tables before the bulk insert is done, thus we can not use an IDENTITY column and find out the value up front.
The current code is selecting the MAX value of the field and incrementing it by 1. Although there is a highly unlikely chance that two instances of our application will be running at the same time, it is still not thread-safe (not to mention that it goes to the database every time).
I am using the ADO.net entity model. How would I go about 'reserving' a range of id's to use, and when that range runs out, grab a new block to use, and guarantee that the same range will not be used.

1) Use a more universal unique identifier data type such as UNIQUEIDENTIFIER (UUID) instead of INTEGER. In this case you can create it on the client side, pass it to the SQL and not have to worry about it. The disadvantage, of course, is the size of this field.
2) Create a simple table in the database, CREATE TABLE ID_GEN (ID INTEGER IDENTITY), and use it as a factory to give you the identifiers. Ideally you would create a stored procedure (or function) to which you pass the number of identifiers you need. The stored procedure then inserts that number of (empty) rows into the ID_GEN table and returns all the new IDs, which you can use in your code. Obviously, your original tables will not have the IDENTITY anymore.
3) Create your own variation of the ID_Factory above.
I would choose simplicity (UUID) if you are not constrained otherwise.

If it's viable to change the structure of the table, then perhaps use a uniqueidentifier for the PK instead along with newid() [SQL] or Guid.NewGuid() [C#] in your row generation code.
From Guid.NewGuid() doco:
There is a very low probability that the value of the new Guid is all zeroes or equal to any other Guid.

Why are you using ADO.net Entity Framework to do what sounds like ETL work? (See critique of ADO.NET Entity Framework and ORM in general below. It is rant free).
Why use ints at all? Using a uniqueidentifier would solve the "multiple instances of the application running" issue.
Using a uniqueidentifier as a column default will be slower than using an int IDENTITY... it takes more time to generate a guid than an int. A guid will also be larger (16 byte) than an int (4 bytes). Try this first and if it results in acceptable performance, run with it.
If the delay introduced by generating a guid on each row insert is unacceptable, create guids in bulk (or on another server) and cache them in a table.
Sample TSQL code:
CREATE TABLE testinsert
(
date_generated datetime NOT NULL DEFAULT GETDATE(),
guid uniqueidentifier NOT NULL,
TheValue nvarchar(255) NULL
)
GO
CREATE TABLE guids
(
guid uniqueidentifier NOT NULL DEFAULT newid(),
used bit NOT NULL DEFAULT 0,
date_generated datetime NOT NULL DEFAULT GETDATE(),
date_used datetime NULL
)
GO
CREATE PROCEDURE GetGuid
    @guid uniqueidentifier OUTPUT
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @return int = 0
    BEGIN TRY
        BEGIN TRANSACTION
        SELECT TOP 1 @guid = guid FROM guids WHERE used = 0
        IF @guid IS NOT NULL
            UPDATE guids
            SET
                used = 1,
                date_used = GETDATE()
            WHERE guid = @guid
        ELSE
        BEGIN
            SET @return = -1
            PRINT 'GetGuid Error: No Unused guids are available'
        END
        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        SET @return = ERROR_NUMBER() -- some error occurred
        SET @guid = NULL
        PRINT 'GetGuid Error: ' + CAST(ERROR_NUMBER() as varchar) + CHAR(13) + CHAR(10) + ERROR_MESSAGE()
        ROLLBACK
    END CATCH
    RETURN @return
END
GO
CREATE PROCEDURE InsertIntoTestInsert
    @TheValue nvarchar(255)
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @return int = 0
    DECLARE @guid uniqueidentifier
    DECLARE @getguid_return int
    EXEC @getguid_return = GetGuid @guid OUTPUT
    IF @getguid_return = 0
    BEGIN
        INSERT INTO testinsert(guid, TheValue) VALUES (@guid, @TheValue)
    END
    ELSE
        SET @return = -1
    RETURN @return
END
GO
-- generate the guids
INSERT INTO guids(used) VALUES (0)
INSERT INTO guids(used) VALUES (0)
--Insert data through the stored proc
EXEC InsertIntoTestInsert N'Foo 1'
EXEC InsertIntoTestInsert N'Foo 2'
EXEC InsertIntoTestInsert N'Foo 3' -- will fail, only two guids were created
-- look at the inserted data
SELECT * FROM testinsert
-- look at the guids table
SELECT * FROM guids
The fun question is... how do you map this to ADO.Net's Entity Framework?
This is a classic problem that started in the early days of ORM (Object Relational Mapping).
If you use relational-database best practices (never allow direct access to base tables, only allow data manipulation through views and stored procedures), then you add headcount (someone capable and willing to write not only the database schema, but also all the views and stored procedures that form the API) and introduce delay (the time to actually write this stuff) to the project.
So everyone cuts this and people write queries directly against a normalized database, which they don't understand... thus the need for ORM, in this case, the ADO.NET Entity Framework.
ORM scares the heck out of me. I've seen ORM tools generate horribly inefficient queries which bring otherwise performant database servers to their knees. What was gained in programmer productivity was lost in end-user waiting and DBA frustration.

The Hi/Lo algorithm may be of interest to you:
What's the Hi/Lo algorithm?
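For readers who have not met it, here is a simplified sketch of the block-reservation idea behind Hi/Lo. HiLoIdGenerator is a hypothetical class, and the reserveBlock delegate stands in for whatever atomically reserves a block on the server and returns its first id (a single UPDATE on a counter row, or sys.sp_sequence_get_range on SQL Server 2012 and later). The generator is only as safe as that reservation step.

```csharp
using System;

// Hypothetical client-side id generator that hands out ids from reserved blocks.
public sealed class HiLoIdGenerator
{
    private readonly Func<int, long> _reserveBlock; // returns the first id of a new block
    private readonly int _blockSize;
    private readonly object _gate = new object();
    private long _next;  // next id to hand out
    private long _limit; // exclusive upper bound of the current block

    public HiLoIdGenerator(Func<int, long> reserveBlock, int blockSize)
    {
        if (reserveBlock == null) throw new ArgumentNullException("reserveBlock");
        if (blockSize <= 0) throw new ArgumentOutOfRangeException("blockSize");
        _reserveBlock = reserveBlock;
        _blockSize = blockSize;
    }

    public long NextId()
    {
        lock (_gate)
        {
            // Go back to the server only when the current block is used up.
            if (_next >= _limit)
            {
                _next = _reserveBlock(_blockSize);
                _limit = _next + _blockSize;
            }
            return _next++;
        }
    }
}
```

With a block size of 3 and a reservation source that starts at 1, the first four calls to NextId return 1, 2, 3 and 4, and the source is consulted twice. Ids left unused when the process exits are lost, so gaps are expected.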

Two clients could reserve the same block of id's.
There is no solution short of serializing your inserts by locking.
See Locking Hints in MSDN.

If you have a lot of child tables you might not want to change the PK. Plus, the integer fields are likely to perform better in joins. But you could still add a GUID field and populate it in the bulk insert with pre-generated values. Then you could leave the identity insert alone (almost always a bad idea to turn it off) and use the GUID values you pre-generated to get back the identity values you just inserted for the insert into child tables.
If you use a regular set-based insert (one with the select clause instead of the values clause) instead of a bulk insert, you could use the output clause to get the identities back for the rows if you are using SQL Server 2008.

The most general solution is to generate client-side identifiers that never collide with database identifiers (usually negative values), then replace them with the identifiers generated by the database on insert.
This approach is safe in an application where many users insert data simultaneously. Other approaches, except GUIDs, are not multiuser-safe.
But in the rare case where an entity's primary key must be known before the entity is saved to the database, and it is impossible to use a GUID, you can use an identifier generation algorithm that prevents identifiers from overlapping.
The simplest is to assign a unique identifier prefix to each connected client and prepend it to each identifier that client generates.
If you are using the ADO.NET Entity Framework, you probably should not worry about identifier generation: EF handles database-generated identifiers by itself; just mark the primary key of the entity as IsDbGenerated=true.
Strictly speaking, Entity Framework, like other ORMs, does not require an identifier for objects that are not yet saved to the database; an object reference is enough to work correctly with new entities. The actual primary key value is required only when updating or deleting the entity, and when updating, deleting or inserting an entity that references the new entity, i.e. in cases where the actual primary key value is about to be written to the database. If an entity is new, other entities that reference it cannot be saved until it has been saved, and ORMs maintain a specific save order that takes the reference map into account.

How to get the primary key from a table without making a second trip?

How would I get the primary key ID number from a Table without making a second trip to the database in LINQ To SQL?
Right now, I submit the data to a table, and make another trip to figure out what id was assigned to the new field (in an auto increment id field). I want to do this in LINQ To SQL and not in Raw SQL (I no longer use Raw SQL).
Also, the second part of my question is: I am always careful to know the ID of a user that's online because I'd rather look up their information in various tables using their ID as opposed to using a GUID or a username, which are all long strings. I do this because I think that SQL Server doing a numeric compare is much (?) more efficient than doing a username (string) or even a guid (very long string) compare. My question is, am I more concerned than I should be? Is the difference worth always keeping the userid (int32) in, say, session state?
@RedFilter provided some interesting/promising leads for the first question. Because I am unable to try them at this stage, can anyone confirm the changes he recommended in the comments section of his answer?

If you have a reference to the object, you can just use that reference and call the primary key after you call db.SubmitChanges(). The LINQ object will automatically update its (Identifier) primary key field to reflect the new one assigned to it via SQL Server.
Example (vb.net):
Dim db As New NorthwindDataContext
Dim prod As New Product
prod.ProductName = "cheese!"
db.Products.InsertOnSubmit(prod)
db.SubmitChanges()
MessageBox.Show(prod.ProductID)
You could probably include the above code in a function and return the ProductID (or equivalent primary key) and use it somewhere else.
EDIT: If you are not doing atomic updates, you could add each new product to a separate Collection and iterate through it after you call SubmitChanges. I wish LINQ provided a 'database sneak peek' like a dataset would.

Unless you are doing something out of the ordinary, you should not need to do anything extra to retrieve the primary key that is generated.
When you call SubmitChanges on your Linq-to-SQL datacontext, it automatically updates the primary key values for your objects.
Regarding your second question - there may be a small performance improvement by doing a scan on a numeric field as opposed to something like varchar() but you will see much better performance either way by ensuring that you have the correct columns in your database indexed. And, with SQL Server if you create a primary key using an identity column, it will by default have a clustered index over it.

Linq to SQL automatically sets the identity value of your class with the ID generated when you insert a new record. Just access the property. I don't know if it uses a separate query for this or not, having never used it, but it is not unusual for ORMs to require another query to get back the last inserted ID.
Two ways you can do this independent of Linq To SQL (that may work with it):
1) If you are using SQL Server 2005 or higher, you can use the OUTPUT clause:
Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, or DELETE statement. These results can be returned to the processing application for use in such things as confirmation messages, archiving, and other such application requirements. Alternatively, results can be inserted into a table or table variable.
2) Alternately, you can construct a batch INSERT statement like this:
insert into MyTable
(field1)
values
('xxx');
select scope_identity();
which works at least as far back as SQL Server 2000.

In T-SQL, you could use the OUTPUT clause, saying:
INSERT table (columns...)
OUTPUT inserted.ID
SELECT columns...
So if you can configure LINQ to use that construct for doing inserts, then you can probably get it back easily. But whether LINQ can get a value back from an insert, I'll let someone else answer that.
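Outside LINQ to SQL, the same construct can be used from plain ADO.NET, where the inserted key comes back from the INSERT itself in a single round trip. This is a sketch under assumptions: the Products table and its ProductName and ProductID columns are borrowed from the Northwind example above, and ProductInserter is a hypothetical name.

```csharp
using System;
using System.Data.SqlClient;

// Hypothetical helper showing INSERT ... OUTPUT with ExecuteScalar.
public static class ProductInserter
{
    public static int Insert(string connectionString, string productName)
    {
        // OUTPUT INSERTED.ProductID returns the generated key as the result set.
        const string sql =
            "INSERT INTO Products (ProductName) " +
            "OUTPUT INSERTED.ProductID " +
            "VALUES (@name);";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@name", productName);
            connection.Open();

            // ExecuteScalar reads the first column of the first row.
            return Convert.ToInt32(command.ExecuteScalar());
        }
    }
}
```

One caveat: SQL Server rejects an OUTPUT clause without INTO when the target table has enabled triggers, in which case the scope_identity() batch shown earlier is the simpler choice.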

Calling a stored procedure from LINQ that returns the ID as an output parameter is probably the easiest approach.

Insert a Row Only if a Row does not Exist

I am building a hit counter. I have an article directory and am tracking unique visitors. When a visitor comes, I insert the article id and their IP address into the database. First I check whether the IP exists for the article id; if it does not, I make the insert. This is two queries. Is there a way to make this one query?
Also, I am not using stored procedures; I am using regular inline SQL.

Here are some options:
INSERT IGNORE INTO `yourTable`
SET `yourField` = 'yourValue',
`yourOtherField` = 'yourOtherValue';
From the MySQL reference manual: "If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted." If the record doesn't yet exist, it will be created.
Another option would be:
INSERT INTO yourTable (yourfield,yourOtherField) VALUES ('yourValue','yourOtherValue')
ON DUPLICATE KEY UPDATE yourField = yourField;
Doesn't throw error or warning.

Yes, you create a UNIQUE constraint on the columns article_id and ip_address. When you attempt to INSERT a duplicate the INSERT will be refused with an error. Just answered the same question here for SQLite.

IF NOT EXISTS (SELECT * FROM MyTable where IPAddress...)
INSERT...

Not with SQL Server. With T-SQL you have to check for the existence of a row, then use either INSERT or UPDATE as appropriate.
Another option is to try UPDATE first, and then examine the row count to see if there was a record updated. If not, then INSERT. Given a 50/50 chance of a row being there, you have executed a single query 50% of the time.
MySQL has an extension called REPLACE that has the capability that you seek.

The only way I can think of is execute dynamic SQL using the SqlCommand object.
IF NOT EXISTS(SELECT 1 FROM IPTable where IpAddr=<ipaddr>)
--Insert Statement

I agree with Larry about using uniqueness, but I would implement it like this:
IP_ADDRESS, pk
ARTICLE_ID, pk, fk
This ensures that each record is a unique hit. Attempts to insert duplicates would get an error from the database.

I would really use procedures! :)
But either way, this will probably work:
Create a UNIQUE index for both the IP and article ID columns, the insert query will fail if they already exist, so technically it'll work! (tested on mysql)

try this (it's a real kludge, but it should work...):
Insert TableName ([column list])
Select Distinct @PK, @valueA, @ValueB, etc. -- list all values to be inserted
From TableName
Where Not Exists
    (Select * From TableName
     Where PK = @PK)
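For SQL Server specifically, the check and the insert can be folded into one parameterized inline statement sent from C#. This is a sketch under assumptions: the ArticleHits table with ArticleId and IpAddress columns and the HitCounter class are hypothetical names, and a UNIQUE constraint on the two columns is still advisable because two concurrent requests can both pass the NOT EXISTS check.

```csharp
using System.Data.SqlClient;

// Hypothetical helper for the hit counter from the question.
public static class HitCounter
{
    // Returns true if a new hit row was inserted, false if it already existed.
    public static bool RecordHit(string connectionString, int articleId, string ipAddress)
    {
        // A single statement: the SELECT yields a row only when no match exists.
        const string sql =
            "INSERT INTO ArticleHits (ArticleId, IpAddress) " +
            "SELECT @articleId, @ip " +
            "WHERE NOT EXISTS (SELECT 1 FROM ArticleHits " +
            "WHERE ArticleId = @articleId AND IpAddress = @ip);";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@articleId", articleId);
            command.Parameters.AddWithValue("@ip", ipAddress);
            connection.Open();

            // ExecuteNonQuery reports the number of rows inserted (0 or 1).
            return command.ExecuteNonQuery() == 1;
        }
    }
}
```

Unlike the version that selects from the target table itself, this form also works when the table is still empty.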

Using SqlServer uniqueidentifier/updated date columns with Linq to Sql - Best Approach

Rightly or wrongly, I am using uniqueidentifier as the primary key for tables in my SQL Server database. I have generated a model using LINQ to SQL (C#). However, whereas for an identity column LINQ to SQL picks up the key generated when a new record is inserted, for a guid/uniqueidentifier column it inserts the default value of 00000000-0000-0000-0000-000000000000.
I know that I can set the guid in my code, in the LINQ to SQL model or elsewhere, or use the default value defined when creating the SQL Server table (though this is overridden by the value generated in the code). But where is the best place to generate this key, given that my tables are always going to change as my solution develops and I will therefore regenerate my LINQ to SQL model each time?
Does the same solution apply for a column to hold current datetime (of the insert), which would be updated with each update?

As you noted in your own post, you can use the extensibility methods. In addition, you can look at the partial methods created in the datacontext for inserting and updating each table. Example with a table called "test" and a "changeDate" column:
partial void InsertTest(Test instance)
{
instance.idCol = System.Guid.NewGuid();
this.ExecuteDynamicInsert(instance);
}
partial void UpdateTest(Test instance)
{
instance.changeDate = DateTime.Now;
this.ExecuteDynamicUpdate(instance);
}

Thanks, I've tried this out and it seems to work OK.
I have another approach, which I think I shall use for guids: sqlserver default value to newid(), then in linqtosql set auto generated value property to true. This has to be done on each generation of the model, but this is fairly simple.

There's two things you can do:
either just generate the GUID in your C# client side code and use that value
create a DEFAULT constraint on the GUID column in SQL Server that defaults to newid() for the column - the SQL Server will make SURE to always add a default - unless you specify a value yourself
As for the self-updating date/time columns - here you probably have to use either client-side logic to do that, or if you want to do it on SQL Server, you'll have to write a trigger. That's really the only way to update a specific column every time the row gets updated - there's no "autoupdate" constraint or anything like this (and the default constraint only works on INSERTs, not updates).
Something like this might work:
CREATE TRIGGER TRG_UpdateDateTimestamp
ON (your table name)
AFTER UPDATE
AS
IF NOT UPDATE(DateTimeStamp)
BEGIN
UPDATE (yourtablename)
SET DateTimeStamp = GETDATE()
WHERE EXISTS (SELECT * FROM inserted AS i
WHERE i.OID = (yourtable).OID)
END
Marc
