How do I structure this transaction?

How do I structure this transaction? - c#

We have an ASP.NET/MSSQL based web app which generates orders with sequential order numbers.
When a user saves a form, a new order is created as follows:
SELECT MAX(order_number) FROM order_table, call this max_order_number
set new_order_number = max_order_number + 1
INSERT a new order record, with this new_order_number (it's just a field in the order record, not a database key)
If I enclose the above 3 steps in single transaction, will it avoid duplicate order numbers from being created, if two customers save a new order at the same time? (And let's say the system is eventually on a web farm with multiple IIS servers and one MSSQL server).
I want to avoid two customers selecting the same MAX(order_number) due to concurrency somewhere in the system.
What isolation level should be used? Thank you.

Why not just use an Identity as the order number?
Edit:
As far as I know, you can make the current order_number column an Identity (you may have to reset the seed, it's been a while since I've done this). You might want to do some tests.
Here's a good read about what actually goes on when you change a column to an Identity in SSMS. The author mentions how this may take a while if the table already has millions of rows.

Using an identity is by far the best idea. I create all my tables like this:
CREATE TABLE mytable (
mytable_id int identity(1, 1) not null primary key,
name varchar(50)
)
The "identity" flag means, "Let SQL Server assign this number for me". The (1, 1) means that identity numbers should start at 1 and be incremented by 1 each time someone inserts a record into the table. Not Null means that nobody should be allowed to insert a null into this column, and "primary key" means that we should create a clustered index on this column. With this kind of a table, you can then insert your record like this:
-- We don't need to insert into mytable_id column; SQL Server does it for us!
INSERT INTO mytable (name) VALUES ('Bob Roberts')
But to answer your literal question, I can give a lesson about how transactions work. It's certainly possible, although not optimal, to do this:
-- Begin a transaction - this means everything within this region will be
-- executed atomically, meaning that nothing else can interfere.
BEGIN TRANSACTION
DECLARE #id bigint
-- Retrieves the maximum order number from the table
SELECT #id = MAX(order_number) FROM order_table
-- While you are in this transaction, no other queries can change the order table,
-- so this insert statement is guaranteed to succeed
INSERT INTO order_table (order_number) VALUES (#id + 1)
-- Committing the transaction releases your lock and allows other programs
-- to work on the order table
COMMIT TRANSACTION
Just keep in mind that declaring your table with an identity primary key column does this all for you automatically.

The risk is two processes selecting the MAX(order_number) before one of them inserts the new order. A safer way is to do it in one step:
INSERT INTO order_table
(order_number, /* other fields */)
VALUES
( (SELECT MAX(order_number)+1 FROM order_table ) order_number,
/* other values */
)

I agree with G_M; use an Identity field. When you add your record, just
INSERT INTO order_table (/* other fields */)
VALUES (/* other fields */) ; SELECT SCOPE_IDENTITY()
The return value from Scope Identity will be your order number.

Related

Explain Code First CRUD auto-generated SQL for Identity column

Code-first auto generates an insert procedure code as below for a table that has ProductID as primary key (identity column).
CREATE PROCEDURE [dbo].[InsertProducts]
#ProductName [nvarchar](max),
#Date [datetime],
AS
BEGIN
INSERT dbo.ProductsTable([ProductName], [Date])
VALUES (#ProductName, #Date)
-- identity stuff starts here
DECLARE #ProductID int
SELECT #ProductID = [ProductID]
FROM dbo.FIT_StorageLocations
WHERE ##ROWCOUNT > 0 AND [ProductID] = scope_identity()
SELECT t0.[ProductID]
FROM dbo.ProductsTable AS t0
WHERE ##ROWCOUNT > 0 AND t0.[ProductID] = #ProductID
END
GO
Could you please explain the code that handles the identity column? Also, if an insert procedure is to be manually written from scratch, would it be handled differently?
If for example I would remove this auto generated code, I would encounter one of the following errors:
Procedure ....expects parameter '#ProductID', which was not supplied
Store update, insert, or delete statement affected an unexpected number of rows (0). Entities may have been modified or deleted since entities were loaded. See http://go.microsoft.com/fwlink/?LinkId=472540 for information on understanding and handling optimistic concurrency exceptions.
In the app, this is how I call the procedure which works fine until I try to mess with the code first auto generated SQL:
using (var db = new AppContext())
{
var record = new ProductObj()
{
ProductName= this.ProductName,
Date = DateTime.UtcNow
};
db.ProductDbSet.Add(record);
db.SaveChanges();
}

I guess there are two things to be explained here.
Why a SELECT statement when I insert stuff?
Let's first see what a regular insert by Entity Framework looks like. By "regular" I mean an insert without mapping CUD actions to stored procedures. The normal pattern is:
INSERT [dbo].[Product]([Name], ...)
VALUES (#0, ...)
SELECT [Id]
FROM [dbo].[Product]
WHERE ##ROWCOUNT > 0 AND [Id] = scope_identity()
So the INSERT is followed by a SELECT. This is because EF needs to know the identity value that the database assigns to the new Product to assign it to the entity object's Product.ProductId property and to track the entity. If for some reason you'd decide to do an update immediately after the insert, EF will be able to generate an update statement like UPDATE ... WHERE Id = #0.
When the insert is handled by a stored procedure, the sproc should return the new Id value in a way that looks like the regular insert. It expects to receive a one-column result set of which the column is named after the identity column. It should contain one row, the new identity value.
So that's why there is a SELECT statement in there, and why EF complains if you remove it. But, you might ask, does EF really need 7 lines of code to get an assigned identity value?
Why so much code?
Honestly, I have to speculate a bit here, because it isn't documented as far as I can find. But let's look at a minimal working version:
INSERT [dbo].[Products]([Name])
VALUES (#Name)
SELECT scope_identity() AS ProductId;
This does the job. It's even the standard example of many tutorials, including official ones, on mapping CUD actions to stored procedures.
But a database can be stuffed with triggers, constraints, defaults, etc. It's hard to predict their influence on the returned scope_identity() under the wide range of circumstances EF may encounter. So EF wants to guarantee that the returned value really belongs to the newly inserted record. And that a record has actually been inserted in the first place. That's why it adds the SELECT from the Product table, including the ##ROWCOUNT.
To implement these safeguards, a minimal version would be:
INSERT [dbo].[Products]([Name])
VALUES (#Name)
SELECT t0.[ProductId]
FROM [dbo].[Products] AS t0
WHERE ##ROWCOUNT > 0 AND t0.[ProductId] = scope_identity()
Same as in the regular insert.
That's as far as I can follow EF. It puzzles me a bit that this single SELECT apparently is enough for a regular INSERT but not for a stored procedure. I can't explain why there are two SELECTs in the generated code.

Tracking who makes changes to SQL Server within C# application or SQL Server

I'm converting an application from Access to SQL Server 2014. One of the capabilities of this tool is to allow users to create ad-hoc SQL queries to modify delete or add data to a number of tables.
Right now in Access there is no tracking of who does what so if something gets messed up on accident, there is no way to know who it was or when it happened (it has happened enough times that it is a serious issue and one of many reasons the tool is being rewritten).
The application I'm writing is a Windows application in C#. I'm looking for ANY and all suggestions on ways this can be done without putting a huge demand on the server (processing or space). Since the users are creating their own queries I can't just add a column for user name and date (also that would only track the most recent change).
We don't need to keep the old data or even identifying exactly what was changed. Just who changed data and when they did. I want to be able to look at something (view, table or even separate database) that shows me a list of users that made a change and when they did it.

You haven't specified the SQL Server Version, anyway if you have a version >= 2008 R2 you can use Extended Events to monitor your system.
On stackoverflow you can read my answer to similar problem

You can consider to use triggers and a log table, this will work on all SQL Servers. Triggers are a bit more expensive that CDC, but if your users already are updating directly on your tables, this should not be a problem. I think this also will depend on how many tables you want to log.
I will provide you with a simple example for logging the users that has changed a table, or several tables (just add the trigger to the tables):
CREATE TABLE UserTableChangeLog
(
ChangeID INT PRIMARY KEY IDENTITY(1,1)
, TableName VARCHAR(128) NOT NULL
, SystemUser VARCHAR(256) NOT NULL DEFAULT SYSTEM_USER
, ChangeDate DATETIME NOT NULL DEFAULT GETDATE()
)
GO
CREATE TABLE TestTable
(
ID INT IDENTITY(1,1)
, Test VARCHAR(255)
)
GO
--This sql can be added for multiple tables, just change the trigger name, and the table name
CREATE TRIGGER TRG_TABLENAME_Log ON TestTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
SET NOCOUNT ON;
--Can be used to get type of change, and wich data that was altered.
--SELECT * FROM INSERTED;
--SELECT * FROM DELETED;
DECLARE #tableName VARCHAR(255) = (SELECT OBJECT_NAME( parent_id ) FROM sys.triggers WHERE object_id = ##PROCID);
INSERT INTO UserTableChangeLog (TableName) VALUES (#tableName);
END
GO
This is how it will work:
INSERT INTO TestTable VALUES ('1001');
INSERT INTO TestTable VALUES ('2002');
INSERT INTO TestTable VALUES ('3003');
GO
UPDATE dbo.TestTable SET Test = '4004' WHERE ID = 2
GO
SELECT * FROM UserTableChangeLog

How can i safely allocate a list of ID numbers in a multi-user environment

I have a list of ID numbers provided by a third party. for the moment lets assume that the numbers are not in any order or incremental algorithm. How can i safely allocate a number upon request when there is a possibility of multiple people attempting the same thing at the same time? everything i have thought of i can also come up with a condition where it would fail.
I am using c# and sql server for this.
EG: add numbers to a table with an auto generated identity column, then match that identity to a table with the list of id numbers - no good because there exists the possibility of the need for reallocation of an id number, and the identity column may get out of sync with the id table for whatever reason.

I've always seen it done similar to this: Generate the ID first (probably guid), mark your record (locks the table so you can handle multiple simultaneous requests), then pull the data.
DECLARE #ID UniqueIdentifier
SET #ID = NEWID()
--ProductTable contains third party ID information
UPDATE TOP(1) ProductTable
SET Assigned = 1,
ID = #ID
WHERE Assigned <> 1
SELECT
ThirdPartyID
FROM ProductTable
WHERE ID = #ID
You could also generate your guid in C# if you don't want to do that in SQL
Here is an easier way with which you could use an identity column instead of a GUID, as Bertrand noted it would be a smaller storage cost. This one assumes that you have loaded the third party numbers into a table with ID as an identity column
DECLARE #ID INT
UPDATE TOP(1) ProductTable
SET Assigned = 1,
#ID = ID
WHERE Assigned <> 1
SELECT
ThirdPartyID
FROM ProductTable
WHERE ID = #ID

Well if I get you the simplest approach, a Table
ThirdPartyID (PK)
UserID (FK) (or session ?)
TimeAllocated DateTime
Then stored procs to allocate and deallocate
Use the Time column to look for ones that have been allocated longer than is sensible due to some failure. If you are using sessions and you are happy to release the 3rd party Id on it being deleted then a cascade when you tidy up dead sessions would do the job instead.
Making this fool prooof as in it will never fail can get very expensive, in fact given a finite list of IDs, you could run out, so not possible at all.
Think to work one is you don't habnd out more than one id to a user, or an id to more than one user. You could in theory get a key violation but a retry should sort that in almost all circumstances.
Start simple, see what happens.

How can I Insert/Update into two related tables in one command?

A database exists with two tables
Data_t : DataID Primary Key that is
Identity 1,1. Also has another field
'LEFT' TINYINT
Data_Link_t : DataID PK and FK where
DataID MUST exist in Data_t. Also has another field 'RIGHT' SMALLINT
Coming from a microsoft access environment into C# and sql server I'm looking for a good method of importing a record into this relationship.
The record contains information that belongs on both sides of this join (Possibly inserting/updating upwards 5000 records at once). Bonus to process the entire batch in some kind of LINQ list type command but even if this is done record by record the key goal is that BOTH sides of this record should be processed in the same step.
There are countless approaches and I'm looking at too many to determine which way I should go so I thought faster to ask the general public. Is LINQ an option for inserting/updating a big list like this with LINQ to SQL? Should I go record by record? What approach should I use to add a record to normalized tables that when joined create the full record?

Sounds like a case where I'd write a small stored proc and call that from C# - e.g. as a function on my Linq-to-SQL data context object.
Something like:
CREATE PROCEDURE dbo.InsertData(#Left TINYINT, #Right SMALLINT)
AS BEGIN
DECLARE #DataID INT
INSERT INTO dbo.Data_t(Left) VALUES(#Left)
SELECT #DataID = SCOPE_IDENTITY();
INSERT INTO dbo.Data_Link_T(DataID, Right) VALUES(#DataID, #Right)
END
If you import that into your data context, you could call this something like:
using(YourDataContext ctx = new YourDataContext)
{
foreach(YourObjectType obj in YourListOfObjects)
{
ctx.InsertData(obj.Left, obj.Right)
}
}
and let the stored proc handle all the rest (all the details, like determining and using the IDENTITY from the first table in the second one) for you.

I have never tried it myself, but you might be able to do exactly what you are asking for by creating an updateable view and then inserting records into the view.
UPDATE
I just tried it, and it doesn't look like it will work.
Msg 4405, Level 16, State 1, Line 1
View or function 'Data_t_and_Data_Link_t' is not updatable because the modification affects multiple base tables.
I guess this is just one more thing for all the Relational Database Theory purists to hate about SQL Server.
ANOTHER UPDATE
Further research has found a way to do it. It can be done with a view and an "instead of" trigger.
create table Data_t
(
DataID int not null identity primary key,
[LEFT] tinyint,
)
GO
create table Data_Link_t
(
DataID int not null primary key foreign key references Data_T (DataID),
[RIGHT] smallint,
)
GO
create view Data_t_and_Data_Link_t
as
select
d.DataID,
d.[LEFT],
dl.[RIGHT]
from
Data_t d
inner join Data_Link_t dl on dl.DataID = d.DataID
GO
create trigger trgInsData_t_and_Data_Link_t on Data_t_and_Data_Link_T
instead of insert
as
insert into Data_t ([LEFT]) select [LEFT] from inserted
insert into Data_Link_t (DataID, [RIGHT]) select ##IDENTITY, [RIGHT] from inserted
go
insert into Data_t_and_Data_Link_t ([LEFT],[RIGHT]) values (1, 2)

How to properly reserve identity values for usage in a database?

We have some code in which we need to maintain our own identity (PK) column in SQL. We have a table in which we bulk insert data, but we add data to related tables before the bulk insert is done, thus we can not use an IDENTITY column and find out the value up front.
The current code is selecting the MAX value of the field and incrementing it by 1. Although there is a highly unlikely chance that two instances of our application will be running at the same time, it is still not thread-safe (not to mention that it goes to the database everytime).
I am using the ADO.net entity model. How would I go about 'reserving' a range of id's to use, and when that range runs out, grab a new block to use, and guarantee that the same range will not be used.

use more universal unique identifier data type like UNIQUEIDENTIFIER (UUID) instead of INTEGER. In this case you can basically create it on the client side, pass it to the SQL and do not have to worry about it. The disadvantage is that, of course, the size of this field.
create a simple table in the database CREATE TABLE ID_GEN (ID INTEGER IDENTITY), and use this as a factory to give you the identifiers. Ideally you would create a stored procedure (or function), to which you would pass the number of identifiers you need. The stored procedure will then insert this number of rows (empty) into this ID_GEN table and will return you all new ID's, which you can use in your code. Obviously, your original tables will not have the IDENTITY anymore.
create your own variation of the ID_Factory above.
I would choose simplicity (UUID) if you are not constrained otherwise.

If it's viable to change the structure of the table, then perhaps use a uniqueidentifier for the PK instead along with newid() [SQL] or Guid.NewGuid() [C#] in your row generation code.
From Guid.NewGuid() doco:
There is a very low probability that the value of the new Guid is all zeroes or equal to any other Guid.

Why are you using ADO.net Entity Framework to do what sounds like ETL work? (See critique of ADO.NET Entity Framework and ORM in general below. It is rant free).
Why use ints at all? Using a uniqueidentifier would solve the "multiple instances of the application running" issue.
Using a uniqueidentifier as a column default will be slower than using an int IDENTITY... it takes more time to generate a guid than an int. A guid will also be larger (16 byte) than an int (4 bytes). Try this first and if it results in acceptable performance, run with it.
If the delay introduced by generating a guid on each row insert it unacceptable, create guids in bulk (or on another server) and cache them in a table.
Sample TSQL code:
CREATE TABLE testinsert
(
date_generated datetime NOT NULL DEFAULT GETDATE(),
guid uniqueidentifier NOT NULL,
TheValue nvarchar(255) NULL
)
GO
CREATE TABLE guids
(
guid uniqueidentifier NOT NULL DEFAULT newid(),
used bit NOT NULL DEFAULT 0,
date_generated datetime NOT NULL DEFAULT GETDATE(),
date_used datetime NULL
)
GO
CREATE PROCEDURE GetGuid
#guid uniqueidentifier OUTPUT
AS
BEGIN
SET NOCOUNT ON
DECLARE #return int = 0
BEGIN TRY
BEGIN TRANSACTION
SELECT TOP 1 #guid = guid FROM guids WHERE used = 0
IF #guid IS NOT NULL
UPDATE guids
SET
used = 1,
date_used = GETDATE()
WHERE guid = #guid
ELSE
BEGIN
SET #return = -1
PRINT 'GetGuid Error: No Unused guids are available'
END
COMMIT TRANSACTION
END TRY
BEGIN CATCH
SET #return = ERROR_NUMBER() -- some error occurred
SET #guid = NULL
PRINT 'GetGuid Error: ' + CAST(ERROR_NUMBER() as varchar) + CHAR(13) + CHAR(10) + ERROR_MESSAGE()
ROLLBACK
END CATCH
RETURN #return
END
GO
CREATE PROCEDURE InsertIntoTestInsert
#TheValue nvarchar(255)
AS
BEGIN
SET NOCOUNT ON
DECLARE #return int = 0
DECLARE #guid uniqueidentifier
DECLARE #getguid_return int
EXEC #getguid_return = GetGuid #guid OUTPUT
IF #getguid_return = 0
BEGIN
INSERT INTO testinsert(guid, TheValue) VALUES (#guid, #TheValue)
END
ELSE
SET #return = -1
RETURN #return
END
GO
-- generate the guids
INSERT INTO guids(used) VALUES (0)
INSERT INTO guids(used) VALUES (0)
--Insert data through the stored proc
EXEC InsertIntoTestInsert N'Foo 1'
EXEC InsertIntoTestInsert N'Foo 2'
EXEC InsertIntoTestInsert N'Foo 3' -- will fail, only two guids were created
-- look at the inserted data
SELECT * FROM testinsert
-- look at the guids table
SELECT * FROM guids
The fun question is... how do you map this to ADO.Net's Entity Framework?
This is a classic problem that started in the early days of ORM (Object Relational Mapping).
If you use relational-database best practices (never allow direct access to base tables, only allow data manipulation through views and stored procedures), then you add headcount (someone capable and willing to write not only the database schema, but also all the views and stored procedures that form the API) and introduce delay (the time to actually write this stuff) to the project.
So everyone cuts this and people write queries directly against a normalized database, which they don't understand... thus the need for ORM, in this case, the ADO.NET Entity Framework.
ORM scares the heck out of me. I've seen ORM tools generate horribly inefficient queries which bring otherwise performant database servers to their knees. What was gained in programmer productivity was lost in end-user waiting and DBA frustration.

The Hi/Lo algorithm may be of interest to you:
What's the Hi/Lo algorithm?

Two clients could reserve the same block of id's.
There is no solution short of serializing your inserts by locking.
See Locking Hints in MSDN.

I fyou have a lot of child tables you might not want to change the PK. PLus the integer filedsa relikely to perform better in joins. But you could still add a GUID field and populate it in the bulk insert with pre-generated values. Then you could leave the identity insert alone (almost alawys a bad idea to turn it off) and use the GUID values you pre-generated to get back the Identity values you just inserted for the insert into child tables.
If you use a regular set-based insert (one with the select clause instead of the values clause) instead of a bulk insert, you could use the output clause to get the identities back for the rows if you are using SQL Server 2008.

The most general solution is generate client identifiers that never across with database identifiers - usually it is negative values, then update identifiers with identifier generated by database on inserting.
This way is safe to use in application with many users inserts the data simultaneously. Any other ways except GUIDs are not multiuser-safe.
But if you have that rare case when entity's primary key is required to be known before entity is saved to database, and it is impossible to use GUID, you may use identifier generation algorithm which are prevent identifier overlapping.
The most simple is assigning a unique identifier prefix for each connected client, and prepend it to each identifier generated by this client.
If you are using ADO.NET Entity Framework, you probably should not worry about identifier generation: EF generates identifiers by itself, just mark primary key of the entity as IsDbGenerated=true.
Strictly saying, entity framework as other ORM does not require identifier for objects are not saved to database yet, it is enought object reference for correctly operating with new entities. Actual primary key value is required only on updating/deleting entity, and on updating/deleting/inserting entity that references new entity, e.i. in cases when actual primary key value is about to be written in database. If entity is new, it is impossible to save other entites that are referenced new entity until new entity is not saved to database, and ORMs maintains specific order of entities saving which take references map into account.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.