I have a table with sequential numbers (think invoice numbers or student IDs).
At some point, a user needs to look up the most recent number in order to calculate the next one. Once the user knows the current number, they generate the next number and add it to the table.
My worry is that two users will be able to erroneously generate two identical numbers due to concurrent access.
I've heard of stored procedures, and I know they might be one solution. Is there a best practice here for avoiding concurrency issues?
Edit: Here's what I have so far:
USE [master]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sp_GetNextOrderNumber]
AS
BEGIN
BEGIN TRAN
DECLARE @recentYear INT
DECLARE @recentMonth INT
DECLARE @recentSequenceNum INT
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- get the most recent numbers
SELECT @recentYear = Year, @recentMonth = Month, @recentSequenceNum = OrderSequenceNumber
FROM dbo.OrderNumbers
WITH (XLOCK)
WHERE Id = (SELECT MAX(Id) FROM dbo.OrderNumbers)
-- increment the numbers
IF (YEAR(getDate()) > IsNull(@recentYear,0))
BEGIN
SET @recentYear = YEAR(getDate());
SET @recentMonth = MONTH(getDate());
SET @recentSequenceNum = 0;
END
ELSE
BEGIN
IF (MONTH(getDate()) > IsNull(@recentMonth,0))
BEGIN
SET @recentMonth = MONTH(getDate());
SET @recentSequenceNum = 0;
END
ELSE
SET @recentSequenceNum = @recentSequenceNum + 1;
END
-- insert the new numbers as a new record
INSERT INTO dbo.OrderNumbers(Year, Month, OrderSequenceNumber)
VALUES (@recentYear, @recentMonth, @recentSequenceNum)
COMMIT TRAN
END
This seems to work and gives me the values I want. So far I have not added any locking to prevent concurrent access.
Edit 2: Added WITH (XLOCK) to lock the table until the transaction completes. I'm not going for performance here; as long as no duplicate entries are added and no deadlocks occur, this should work.
You know that SQL Server can do this for you, right? You can use an identity column if you need a sequential number, or a computed column if the new value is calculated from another one.
But if that doesn't solve your problem, or if generating the new number requires a calculation too complicated for a simple insert, I suggest writing a stored procedure that locks the table, gets the last value, generates the new one, inserts it, and then releases the lock.
Read about transaction isolation levels to understand your options.
Just make sure to keep the "locking" period as short as possible.
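For the identity / computed-column route, here's a minimal sketch; the Orders table, its columns, and the 'ORD-' prefix are made-up names for illustration, not from the question:
CREATE TABLE Orders
(
OrderId int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
OrderDate datetime NOT NULL DEFAULT GETDATE(),
-- Computed column: derives a display number from the identity value
OrderNumber AS ('ORD-' + RIGHT('000000' + CAST(OrderId AS varchar(10)), 6))
)
-- SQL Server assigns OrderId (and therefore OrderNumber) on insert
INSERT INTO Orders (OrderDate) VALUES (GETDATE())
SELECT OrderId, OrderNumber FROM Orders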
Here is a sample Counter implementation. The basic idea is to use an insert trigger to assign numbers to, let's say, invoices. The first step is to create a table that holds the last assigned number:
create table [Counter]
(
LastNumber int
)
and initialize it with a single row:
insert into [Counter] values(0)
Sample invoice table:
create table invoices
(
InvoiceID int identity primary key,
Number varchar(8),
InvoiceDate datetime
)
The stored procedure LastNumber first updates the Counter row and then retrieves the value. As the value is an int, it is simply returned as the procedure's return value; otherwise an output parameter would be required. The procedure takes the number of values to reserve as a parameter; the output is the last number assigned.
create proc LastNumber (@NumberOfNextNumbers int = 1)
as
begin
declare @LastNumber int
update [Counter]
set LastNumber = LastNumber + @NumberOfNextNumbers -- Holds update lock
select @LastNumber = LastNumber
from [Counter]
return @LastNumber
end
The trigger on the Invoices table counts the invoices inserted in one statement, asks the stored procedure for that many numbers, and updates the inserted invoices with them.
create trigger InvoiceNumberTrigger on Invoices
after insert
as
set NoCount ON
declare @InvoiceID int
declare @LastNumber int
declare @RowsAffected int
select @RowsAffected = count(*)
from Inserted
exec @LastNumber = dbo.LastNumber @RowsAffected
update Invoices
-- Year/month parts of number are missing
set Number = right ('000' + ltrim(str(@LastNumber - rowNumber)), 3)
from Invoices
inner join
( select InvoiceID,
row_number () over (order by InvoiceID desc) - 1 rowNumber
from Inserted
) insertedRows
on Invoices.InvoiceID = InsertedRows.InvoiceID
Because the counter is updated inside the insert's transaction, a rollback rolls the counter update back too, so no gaps are left. The Counter table could easily be expanded with keys for different sequences; in that case a valid-until date might be nice, because you could prepare the table beforehand and let LastNumber worry about selecting the counter for the current year/month (a sketch of such an expanded table follows the usage example).
Example of usage:
insert into invoices (invoiceDate) values(GETDATE())
As the Number column's value is autogenerated, one should re-read it after the insert. I believe Entity Framework has provisions for that.
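The Counter table expanded for multiple sequences could look like the following; the SequenceKey and ValidUntil columns are only an illustration of the idea, not part of the sample above:
create table [Counter]
(
SequenceKey varchar(20) not null primary key, -- e.g. one row per warehouse or per year/month
ValidUntil datetime not null,                 -- lets you prepare future periods in advance
LastNumber int not null
)
LastNumber would then take (or derive) the appropriate key and update only that row.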
The way that we handle this in SQL Server is by using the UPDLOCK table hint within a single transaction.
For example:
INSERT INTO MyTable (
MyNumber,
MyField1 )
SELECT IsNull(MAX(MyNumber), 0) + 1,
'Test'
FROM MyTable WITH (UPDLOCK)
It's not pretty, but since we were provided the database design and cannot change it due to legacy applications accessing the database, this was the best solution that we could come up with.
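If the number is needed before the insert (for example to build a formatted string), a variant of the same idea is to hold the lock across an explicit transaction. The UPDLOCK + HOLDLOCK combination below is a sketch of how that could look, not something taken from our original code:
BEGIN TRAN
DECLARE @nextNumber int
-- UPDLOCK and HOLDLOCK keep the range locked until COMMIT, so no other
-- session can read the same maximum in the meantime
SELECT @nextNumber = IsNull(MAX(MyNumber), 0) + 1
FROM MyTable WITH (UPDLOCK, HOLDLOCK)
INSERT INTO MyTable (MyNumber, MyField1)
VALUES (@nextNumber, 'Test')
COMMIT TRAN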
Related
I have a huge table (log) which keeps some history data. It has more than 10 columns:
Id, Year, Month, Day, data1, data2, data3, ......
Because the table is huge, it has a lot of indexes on it.
The system keeps inserting lots of new data into this table. However, because of the way the system works, duplicated data is sometimes inserted (only the id differs). The ids of the duplicates (ids only) are also inserted into another table (log_existing). We have another service that deletes the duplicates from both tables. Here is what we are doing now.
SET @TotalRows = 0;
SET @Rows = 0;
WHILE 1=1
BEGIN
DECLARE @Ids TABLE (id BIGINT);
INSERT INTO @Ids
SELECT TOP (@BatchSize) Id
FROM Log
DELETE FROM Log WHERE Id IN (SELECT id FROM @Ids)
DELETE FROM Log_Existing WHERE Id IN (SELECT id FROM @Ids)
SET @Rows = @@ROWCOUNT
IF(@Rows < @BatchSize)
BEGIN
BREAK;
END
SET @TotalRows = @TotalRows + @Rows
IF(@TotalRows >= @DeleteSize)
BEGIN
BREAK;
END
SET @Rows = 0;
END
Basically, the service runs every 2 minutes (or 5 minutes, configurable) to execute this batch delete, with @BatchSize = 2000 and @DeleteSize = 1000000; one pass usually takes longer than the 2/5 minutes.
It worked OK for some time, but now we realize there are too many duplicates and this process cannot delete them fast enough. So the database keeps growing larger and larger, and the process gets slower and slower.
Is there a way to make it faster? or some kind of guideline?
Thanks
I would try to avoid inserting duplicates into the Log table in the first place. From your description this should be possible by identifying the columns that (besides the Id) make an entry unique.
One option is the IGNORE_DUP_KEY option on a unique index. When such an index exists and an INSERT statement tries to insert a row that violates the index's unique constraint, that row is simply ignored. See the Microsoft SQL Server documentation; the documentation example below is followed by a sketch applied to your Log table.
CREATE TABLE #Test (C1 nvarchar(10), C2 nvarchar(50), C3 datetime);
GO
CREATE UNIQUE INDEX AK_Index ON #Test (C2)
WITH (IGNORE_DUP_KEY = ON);
GO
INSERT INTO #Test VALUES (N'OC', N'Ounces', GETDATE());
INSERT INTO #Test SELECT * FROM Production.UnitMeasure;
GO
SELECT COUNT(*)AS [Number of rows] FROM #Test;
GO
DROP TABLE #Test;
GO
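Applied to the Log table it might look like the sketch below; the choice of Year, Month, Day and data1 as the uniqueness columns is just an assumption, use whatever actually makes a row logically unique in your system:
CREATE UNIQUE INDEX UX_Log_NoDuplicates
ON Log (Year, Month, Day, data1)
WITH (IGNORE_DUP_KEY = ON);
Duplicate inserts then produce a warning instead of an error and are silently skipped, so new duplicates never reach the table.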
I think that if you use DELETE statements with a JOIN, something like the following, it should do better. (SQL Server cannot delete from two tables in one statement, so it takes two DELETEs.)
DELETE L FROM Log AS L
INNER JOIN Log_Existing AS E ON L.Id = E.Id
DELETE E FROM Log_Existing AS E
WHERE NOT EXISTS (SELECT 1 FROM Log AS L WHERE L.Id = E.Id)
I am working on a C# .NET (and T-SQL) project that requires generating configurable serial tracking numbers that depend on the data each record contains.
Background
To elaborate, the software is used to track vehicles entering a factory. Each vehicle is assigned an RFID card and a tracking number. The tracking number is based on the first of the (possibly many) warehouses to which the truck is headed. The admin can configure the format of the numbers for each warehouse, typically something like #YYYYMMDD-XX002211, where XX is a prefix unique to each warehouse. The number restarts from 1 whenever the month changes.
At the time the master record (vehicle + RFID card details) is created, the tracking number is null. Once the first line (warehouse details) is added, the system generates a tracking number and assigns it to the master. So, until a line is added, the tracking number field can have duplicates, i.e. many records with null.
The problem
The code to generate the tracking number is written in C# and uses SQL transactions; however, there have been infrequent collisions in the tracking numbers, which make the system unstable. I researched T-SQL transactions and isolation level settings, but I have not been successful with them.
Solution?
I have tried the following approaches:
Check in the master table for a possible duplicate before updating
Use Repeatable Read isolation level to ensure data isn't updated during the transaction execution
Should I?
Move the whole thing to a Stored Procedure?
Start a daemon which is responsible for the number generation, ensuring sequence by a single threaded execution
Thanks for your help!
For simplicity, I would suggest using a single stored procedure to update and fetch the sequence number, then do the rest of the formatting in C#:
CREATE PROCEDURE GetSequenceOfWareHouse
@WareHouse char(2)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Result table (Sequence int)
BEGIN Tran
UPDATE SequenceTable SET Sequence = Sequence + 1
OUTPUT inserted.Sequence INTO @Result
WHERE WareHouse = @WareHouse AND Month = YEAR(GETDATE()) * 100 + MONTH(GETDATE())
IF NOT EXISTS(SELECT * FROM @Result)
INSERT SequenceTable (Sequence, WareHouse, Month)
OUTPUT inserted.Sequence INTO @Result
VALUES (1, @WareHouse, YEAR(GETDATE()) * 100 + MONTH(GETDATE()))
COMMIT Tran
SELECT Sequence FROM @Result
END
GO
CREATE Table SequenceTable
(
Sequence int,
WareHouse char(2),
Month int,
unique (Sequence, WareHouse, Month)
)
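Example call; the 'XX' value is a placeholder for one of your warehouse prefixes. The procedure returns the new sequence number as a one-row result set, which the C# side then combines with the date and prefix:
EXEC GetSequenceOfWareHouse @WareHouse = 'XX'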
We have an ASP.NET/MSSQL based web app which generates orders with sequential order numbers.
When a user saves a form, a new order is created as follows:
SELECT MAX(order_number) FROM order_table, call this max_order_number
set new_order_number = max_order_number + 1
INSERT a new order record, with this new_order_number (it's just a field in the order record, not a database key)
If I enclose the above 3 steps in a single transaction, will it prevent duplicate order numbers from being created if two customers save a new order at the same time? (And let's say the system will eventually run on a web farm with multiple IIS servers and one MSSQL server.)
I want to avoid two customers selecting the same MAX(order_number) due to concurrency somewhere in the system.
What isolation level should be used? Thank you.
Why not just use an Identity as the order number?
Edit:
As far as I know, you can make the current order_number column an Identity (you may have to reset the seed, it's been a while since I've done this). You might want to do some tests.
Here's a good read about what actually goes on when you change a column to an Identity in SSMS. The author mentions how this may take a while if the table already has millions of rows.
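If you do convert the column, reseeding afterwards is a one-liner; a small sketch, where order_table is your table and 1000 stands in for the current maximum order number:
-- After this, the next inserted row gets order number 1001
DBCC CHECKIDENT ('order_table', RESEED, 1000)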
Using an identity is by far the best idea. I create all my tables like this:
CREATE TABLE mytable (
mytable_id int identity(1, 1) not null primary key,
name varchar(50)
)
The "identity" flag means, "Let SQL Server assign this number for me". The (1, 1) means that identity numbers should start at 1 and be incremented by 1 each time someone inserts a record into the table. Not Null means that nobody should be allowed to insert a null into this column, and "primary key" means that we should create a clustered index on this column. With this kind of a table, you can then insert your record like this:
-- We don't need to insert into mytable_id column; SQL Server does it for us!
INSERT INTO mytable (name) VALUES ('Bob Roberts')
But to answer your literal question, I can give a lesson about how transactions work. It's certainly possible, although not optimal, to do this:
-- Begin a transaction - everything below is committed or rolled back
-- as a single unit of work
BEGIN TRANSACTION
DECLARE @id bigint
-- Retrieve the maximum order number; UPDLOCK and HOLDLOCK keep the lock
-- until the transaction ends, so no other session can read the same maximum
SELECT @id = MAX(order_number) FROM order_table WITH (UPDLOCK, HOLDLOCK)
-- Because the lock is still held, this insert cannot create a duplicate number
INSERT INTO order_table (order_number) VALUES (@id + 1)
-- Committing the transaction releases the lock and allows other programs
-- to work on the order table
COMMIT TRANSACTION
Just keep in mind that declaring your table with an identity primary key column does this all for you automatically.
The risk is two processes selecting the MAX(order_number) before one of them inserts the new order. A safer way is to do it in one step:
INSERT INTO order_table
(order_number /* , other fields */)
SELECT MAX(order_number) + 1 /* , other values */
FROM order_table
I agree with G_M; use an Identity field. When you add your record, just
INSERT INTO order_table (/* other fields */)
VALUES (/* other fields */) ; SELECT SCOPE_IDENTITY()
The return value from Scope Identity will be your order number.
I have a clean-up process that needs to delete around 8 million rows in a table each day (sometimes more). This process is written in C# and uses SMO to query the schema for table indexes, disabling them before executing a sproc that deletes in batches of 500K rows.
My problem is that the entire operation ends up living inside a single transaction. The sproc is executed inside a TransactionScope configured with TransactionScopeOption.Suppress (it runs alongside other things that each start their own TransactionScope), which I thought would keep it out of any ambient transaction, and there are explicit commit points inside the sproc.
The C# part of the process could be summarized as this:
try {
DisableIndexes(table);
CleanTable(table);
}
finally {
RebuildIndexes(table);
}
And the sproc has a loop inside that's basically:
DECLARE @rowCount bigint = 1
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
WHILE @rowCount <> 0 BEGIN
DELETE TOP (@rowsPerBatch) Table
WHERE
ID <= @maxID
SET @rowCount = @@rowcount
END
Last night this process timed out half an hour after it started, took half an hour to roll back, and then spent another half hour rebuilding indexes... too much downtime for zero work. =(
Update: I've run the process on a small sample database (and with a small timeout), and it's not as I thought. Evidently the process is correctly removing rows and making progress as I want. Still, the log is getting consumed. Since I'm using the SIMPLE recovery model, shouldn't the log avoid growing in this case? Or is the deletion sproc so 'fast' that I'm not giving the process that actually deletes the rows the time it needs to keep the log clean?
Depending on your circumstances you may be able to use partitioned views instead of partitioned tables. Something like:
create table A
(
X int,
Y int,
Z varchar(300),
primary key (X,Y),
check (X = 1)
)
create table B
(
X int,
Y int,
Z varchar(300),
primary key (X,Y),
check (X = 2)
)
create table C
(
X int,
Y int,
Z varchar(300),
primary key (X,Y),
check (X = 3)
)
go
create view ABC
as
select * from A
union all
select * from B
union all
select * from C
go
insert abc (x,y,z)
values (1,4,'test')
insert abc (x,y,z)
values (2,99,'test'), (3,123,'test')
insert abc (x,y,z)
values (3,15125,'test')
select * from abc
truncate table c
select * from abc
If I understand your problem correctly, what you want is an automatic sliding window and partitioned tables.
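For the partitioned-table route, here is a rough sketch of a daily sliding window; every object name, the DayNumber partitioning column, and the boundary values are illustrative assumptions:
-- RANGE RIGHT puts each boundary value into the partition to its right
create partition function pfByDay (int)
as range right for values (1, 2, 3)
go
create partition scheme psByDay
as partition pfByDay all to ([PRIMARY])
go
create table LogPartitioned
(
DayNumber int not null,
Id bigint not null,
Data varchar(300),
constraint PK_LogPartitioned primary key (DayNumber, Id)
) on psByDay (DayNumber)
go
-- Empty staging table with the same structure, on the same filegroup
create table LogStaging
(
DayNumber int not null,
Id bigint not null,
Data varchar(300),
constraint PK_LogStaging primary key (DayNumber, Id)
)
go
-- Sliding the window: switching the oldest partition out is a metadata-only
-- operation, and truncating the staging table is nearly instant
alter table LogPartitioned switch partition 1 to LogStaging
truncate table LogStaging
Note that partitioned tables require Enterprise Edition on older SQL Server versions, which is one reason the partitioned-view approach above can be the more practical choice.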
I have some C#/Linq code used to merge data from excel file into db, which needs better performance.
There are:
1. A List read from the Excel file: List<Score> newScoreList
2. A DB table named Scores, with primary keys peopleId and testDate
I need to merge data from the list to the table, and if there is any duplicate data, update it.
My current solution is:
1) Find the duplicate data with this LINQ expression:
var dupliData =
from newScore in newScoreList
from oldScore in db.Scores
where newScore.peopleId == oldScore.peopleId && newScore.testDate == oldScore.testDate
select oldScore;
2) Delete the duplicate data.
db.Scores.DeleteAllOnSubmit(dupliData);
3) Insert the new data from list.
db.Scores.InsertAllOnSubmit(newScoreList);
Could anybody give me a better solution?
I really hate stored procedures in general, but this is probably a perfect case for using one. My TSQL is rusty, but this should give an idea.
CREATE PROCEDURE dbo.InsertOrUpdateScore
(
@id as Int,
@date as DateTime,
@result as varchar(20)
)
AS
if not exists(SELECT id FROM Scores WHERE id = @id AND date = @date)
begin
INSERT INTO Scores (id, date, result) values (@id, @date, @result)
end
else
begin
UPDATE Scores
SET result = @result
WHERE id = @id AND date = @date
end
GO
Now, in the LINQ to SQL designer, select the Score entity and change its Insert and Update behaviour to use the stored procedure you just created. Make sure the user accessing the database has EXECUTE permission on the sproc.
This should perform quite a bit quicker than your version. You're trading an IN clause for N SELECTs against an index, which may be quicker; in addition, the matching rows are no longer transported back to the client over the network, which could save quite a bit of time.
Profile exactly how long your method is taking before implementing this, so you can gauge if this is truly quicker.
I'm not sure if this is the only way to create a Score in your application, but you might want to consider the case where you're INSERTing a record that doesn't yet have an ID. You'll need to modify the SPROC to allow @id as null, and handle the INSERT appropriately.
Then it should just be:
db.Scores.InsertAllOnSubmit(newScoreList);
If you are using SQL Server 2008, you can use the MERGE statement:
http://www.builderau.com.au/program/sqlserver/soa/Using-SQL-Server-2008-s-MERGE-statement/0,339028455,339283059,00.htm
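A minimal sketch of what that could look like for the Scores scenario; the ScoresStaging table (bulk-loaded with the rows from the Excel file) and the result column are assumptions, not taken from the question:
MERGE dbo.Scores AS target
USING dbo.ScoresStaging AS source
    ON target.peopleId = source.peopleId
   AND target.testDate = source.testDate
WHEN MATCHED THEN
    UPDATE SET target.result = source.result
WHEN NOT MATCHED BY TARGET THEN
    INSERT (peopleId, testDate, result)
    VALUES (source.peopleId, source.testDate, source.result);
This replaces the delete-then-reinsert round trips with a single set-based statement on the server.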