Parallel Inserts/Updates in a SQL Server table - c#

I have a multithreaded environment where every thread wants to select a row in a table (or insert it if it does not exist) and increment something in it.
Basically, every thread does something like this:
using (var context = new Entity.DBContext()) {
    if (!context.MyTable.Any(...)) {
        var obj = new MyTable() {
            SomeValue = 0
        };
        context.MyTable.Add(obj);
    }
    var row = context.MyTable.SingleOrDefault(...);
    row.SomeValue += 1;
    context.SaveChanges();
}
The problem, by example: a specific row has SomeValue = 0.
Two threads select this specific row at the same time, and they both see 0.
-> They both increment it once, and the final result in SomeValue is 1, but we want it to be 2.
I assume the thread that arrives just after the other should wait (using a lock?) for the first one to finish, but I can't make it work properly.
Thanks.

Assuming SQL Server, you can do something like this:
create table T1 (
    Key1 int not null,
    Key2 int not null,
    Cnt int not null
)
go
create procedure P1
    @Key1 int,
    @Key2 int
as
merge into T1 WITH (HOLDLOCK) t
using (select @Key1 k1, @Key2 k2) s
on
    t.Key1 = s.k1 and
    t.Key2 = s.k2
when matched then update set Cnt = Cnt + 1
when not matched then insert (Key1, Key2, Cnt) values (s.k1, s.k2, 0)
output inserted.Key1, inserted.Key2, inserted.Cnt;
go
exec P1 1,5
go
exec P1 1,5
go
exec P1 1,3
go
exec P1 1,5
go
(Note: it doesn't have to be a procedure, and I'm just calling it from one thread to show how it works.)
Results:
Key1 Key2 Cnt
----------- ----------- -----------
1 5 0
Key1 Key2 Cnt
----------- ----------- -----------
1 5 1
Key1 Key2 Cnt
----------- ----------- -----------
1 3 0
Key1 Key2 Cnt
----------- ----------- -----------
1 5 2
Even with multiple threads calling this, I believe that it should serialize access. I'm producing outputs just to show that each caller can also know what value they've set the counter to (here, the column Cnt), even if another caller immediately afterwards changes the value.
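If you're calling this through EF from C#, a minimal sketch might look like the following; the CounterRow class and the helper are illustrative assumptions, not part of the original answer:
using System.Data.SqlClient;
using System.Linq;

// Hypothetical result type matching the columns of the OUTPUT clause.
public class CounterRow
{
    public int Key1 { get; set; }
    public int Key2 { get; set; }
    public int Cnt { get; set; }
}

public static class CounterClient
{
    public static int Increment(int key1, int key2)
    {
        using (var context = new Entity.DBContext())
        {
            // SqlQuery materializes the single row that the MERGE's
            // OUTPUT clause returns, so the caller learns its own value.
            return context.Database
                .SqlQuery<CounterRow>("exec P1 @Key1, @Key2",
                    new SqlParameter("@Key1", key1),
                    new SqlParameter("@Key2", key2))
                .Single()
                .Cnt;
        }
    }
}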

If only one process writes to the database at a time, you could wrap your code in a C# lock (obj) { } statement. That limits you to one active query, which will not make optimal use of the database, but if that's okay, it's a simple solution.
Another option is to create a unique index on the columns that define whether the row already exists. If you insert a row that violates it, you'll get a duplicate key exception, which you can catch in C# and run an update instead.
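A minimal sketch of that catch-and-update pattern with EF 6, reusing the names from the question; the Key column is an assumption, and 2601/2627 are SQL Server's duplicate-key error numbers:
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Data.SqlClient;
using System.Linq;

public static void IncrementOrInsert(int key) // the Key column is assumed
{
    using (var context = new Entity.DBContext())
    {
        var obj = new MyTable { Key = key, SomeValue = 1 };
        context.MyTable.Add(obj);
        try
        {
            context.SaveChanges();
        }
        catch (DbUpdateException ex)
        {
            var sqlEx = ex.GetBaseException() as SqlException;
            if (sqlEx == null || (sqlEx.Number != 2601 && sqlEx.Number != 2627))
                throw; // something other than a duplicate key

            // The row already existed: stop tracking the failed insert
            // and increment the existing row instead.
            context.Entry(obj).State = EntityState.Detached;
            var row = context.MyTable.Single(r => r.Key == key);
            row.SomeValue += 1;
            context.SaveChanges();
        }
    }
}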
If you can write raw SQL, you can use locking hints, for example WITH (UPDLOCK, HOLDLOCK), or set the isolation level to SERIALIZABLE. That probably gives you the best performance, at the cost of complexity.
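For example, a sketch of an atomic check-and-increment in raw SQL through EF 6; the Key column is again an assumption:
public static void IncrementOrInsertRaw(int key)
{
    using (var context = new Entity.DBContext())
    {
        // UPDLOCK + HOLDLOCK serializes concurrent callers on the key,
        // so the existence check and the write behave atomically.
        // EF 6 wraps ExecuteSqlCommand in a transaction by default.
        context.Database.ExecuteSqlCommand(@"
            IF EXISTS (SELECT 1 FROM MyTable WITH (UPDLOCK, HOLDLOCK) WHERE [Key] = @p0)
                UPDATE MyTable SET SomeValue = SomeValue + 1 WHERE [Key] = @p0;
            ELSE
                INSERT INTO MyTable ([Key], SomeValue) VALUES (@p0, 1);", key);
    }
}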

Related

Batching data from a temp table

I create a temp table, insert data into it (with OUTPUT) and select from it. I need to read the results in batches.
Example: if I have 110 records and I want to read a batch of 50 records each time, then I will read rows 1-50 the first time, 51-100 the second time and 101-110 the third time.
The queries are as follows:
create table #mytable
(
    customerID int not null
)
insert into #mytable (customerID)
output inserted.customerID
select top 50 customerID from customer where customerID not in (select c.customerID from #mytable c)
In C#, my code is like this:
do {
    getCustomerInformation(); // this method does the query processing
} while (customerInfo.Any());
This query works when I run it on SQL Server, but not from C#: it keeps returning the first 50 rows every time.
Simply use this query; the first time the @PageNumber parameter will be 1, the second time 2 and the third time 3:
DECLARE @PageNumber AS INT, @RowsPerPage AS INT
SET @PageNumber = 1 -- 2, 3
SET @RowsPerPage = 50

SELECT customerID FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY customerID) AS Numero,
           customerID
    FROM customer
) AS TBL
WHERE Numero BETWEEN ((@PageNumber - 1) * @RowsPerPage + 1) AND (@PageNumber * @RowsPerPage)
ORDER BY customerID
In SSMS you have your temp table, and you determine the IDs for your batch by checking that temp table.
Temp tables normally only live as long as the session, so if your C# code opens a new session each time you call the method, that session will not see the temp table from the previous one.
You can get around this by using a global temp table (##, two hashes), or by creating an actual staging table.
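A minimal sketch of keeping one session open across batches with plain ADO.NET; connectionString and the processing step are placeholders:
using System.Data.SqlClient;

static void ReadInBatches(string connectionString)
{
    // One connection = one session, so #mytable survives between batches.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (var create = new SqlCommand(
            "create table #mytable (customerID int not null)", conn))
        {
            create.ExecuteNonQuery();
        }

        bool gotRows;
        do
        {
            gotRows = false;
            using (var batch = new SqlCommand(@"
                insert into #mytable (customerID)
                output inserted.customerID
                select top 50 customerID from customer
                where customerID not in (select c.customerID from #mytable c)", conn))
            using (var reader = batch.ExecuteReader())
            {
                while (reader.Read())
                {
                    gotRows = true;
                    int customerId = reader.GetInt32(0);
                    // process customerId here
                }
            }
        } while (gotRows);
    }
}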
There are other ways:
1) From SQL Server 2012 onwards, you can use the OFFSET and FETCH NEXT clauses to achieve the pagination.
2) For all versions of SQL Server since 2005, you can achieve the same with a CTE and ROW_NUMBER(), passing the page number as a parameter (as in the answer above).
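A hedged sketch of the OFFSET/FETCH option from C#; the customer table comes from the question, and pageNumber here is 1-based like @PageNumber above:
using System.Collections.Generic;
using System.Data.SqlClient;

// Returns one page of customer IDs; OFFSET/FETCH requires an ORDER BY.
static List<int> GetPage(SqlConnection conn, int pageNumber, int pageSize)
{
    var ids = new List<int>();
    using (var cmd = new SqlCommand(@"
        select customerID from customer
        order by customerID
        offset @skip rows fetch next @take rows only", conn))
    {
        cmd.Parameters.AddWithValue("@skip", (pageNumber - 1) * pageSize);
        cmd.Parameters.AddWithValue("@take", pageSize);
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                ids.Add(reader.GetInt32(0));
        }
    }
    return ids;
}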
You can achieve paging in C# using the Take() and Skip() methods provided by LINQ:
const int batchSize = 50;
do {
    // materialize the current page before advancing the sequence
    var currentBatch = customerInfo.Take(batchSize).ToList();
    // process currentBatch here...
    customerInfo = customerInfo.Skip(batchSize);
} while (customerInfo.Any());

Correctly generating a unique invoice id

I've been asked to clean up someone else's controller code, which generates an invoice, and I've run into something I don't know how to fix. The code in question is as follows (this is using EF 6: Code First):
var invid = db.TransportJobInvoice.Where(c => c.CompanyId == CompanyId)
.Max(i => i.InvoiceId);
var invoiceId = invid == null ? 1 : (int)invid + 1;
The code is supposed to generate an invoiceId based on the company the invoice is being created for. So a small table of this might look as follows:
------------------------------
| Id | CompanyId | InvoiceId |
------------------------------
| 1 | 1 | 1 |
------------------------------
| 2 | 1 | 2 |
------------------------------
| 3 | 1 | 3 |
------------------------------
| 4 | 2 | 1 |
------------------------------
| 5 | 2 | 2 |
------------------------------
As you can see, the invoiceId would be generated based on the current number of invoices for the company in question. However, I think it's reasonable to suggest that two threads could execute the query before this line is evaluated:
var invoiceId = invid == null ? 1 : (int)invid + 1;
which would result in the same invoiceId being generated for two different invoices.
Is there a simple solution to this, possibly leveraging Entity Framework to do this automatically?
I suggest using the identity column for the primary key; this is very important!
I would then add a column for CustomerInvoiceID and put a compound unique key on CustomerID and CustomerInvoiceID.
Then create a stored procedure that populates CustomerInvoiceID after the row has been inserted. Here is some pseudo code:
CREATE PROCEDURE usp_PopulateCustomerInvoiceID
    @PrimaryKey INT, -- this is your primary key identity column
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @cnt INT;

    -- The new row is already inserted, so this count includes it.
    SELECT @cnt = COUNT(1)
    FROM tbl
    WHERE CustomerID = @CustomerID
      AND PrimaryKeyColumn <= @PrimaryKey;

    UPDATE tbl
    SET CustomerInvoiceID = @cnt
    WHERE PrimaryKeyColumn = @PrimaryKey;
END
Two possibilities:
Server-side: don't compute max(ID)+1 on the client. Instead, compute max(ID)+1 as part of the INSERT statement, via an INSERT..SELECT (see the sketch after this list).
Client-side: instead of an incrementing int, generate a GUID on the client and use that as your InvoiceID.
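A minimal sketch of the server-side option with EF 6, reusing the table name from the question; the locking hints and the two-column insert are illustrative assumptions:
static void CreateInvoice(int companyId)
{
    using (var db = new MyDbContext()) // your EF context
    {
        // MAX(InvoiceId)+1 is computed inside the INSERT itself, so no
        // client round trip can interleave; UPDLOCK/HOLDLOCK keeps two
        // concurrent inserts for the same company from reading the same max.
        db.Database.ExecuteSqlCommand(@"
            INSERT INTO TransportJobInvoice (CompanyId, InvoiceId)
            SELECT @p0, COALESCE(MAX(InvoiceId), 0) + 1
            FROM TransportJobInvoice WITH (UPDLOCK, HOLDLOCK)
            WHERE CompanyId = @p0", companyId);
    }
}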
A rather different approach would be to create a separate table holding the NextId for each CustomerId. As new customers are added, you add a new row to this table. It has the advantage that the numbers assigned to invoices remain unique even if you allow deleting invoices.
create procedure GetInvoiceIdForCustomer
    @CustomerId as Int,
    @InvoiceId as Int Output
as
begin
    set nocount on

    begin transaction

    update CustomerInvoiceNumbers
    set @InvoiceId = NextId, NextId += 1
    where CustomerId = @CustomerId

    if @@RowCount = 0
    begin
        set @InvoiceId = 1
        insert into CustomerInvoiceNumbers ( CustomerId, NextId ) values ( @CustomerId, @InvoiceId + 1 )
    end

    commit transaction
end
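Calling it from C# with an output parameter might look like this (a sketch with plain ADO.NET):
using System.Data;
using System.Data.SqlClient;

static int GetInvoiceIdForCustomer(SqlConnection conn, int customerId)
{
    using (var cmd = new SqlCommand("GetInvoiceIdForCustomer", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@CustomerId", customerId);

        var invoiceId = cmd.Parameters.Add("@InvoiceId", SqlDbType.Int);
        invoiceId.Direction = ParameterDirection.Output;

        cmd.ExecuteNonQuery();
        return (int)invoiceId.Value;
    }
}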
If you use an Identity field in SQL Server, this will be handled automatically.
I don't know if you can make the invoice ID auto-generated unless it's being treated as a foreign key (which I think it isn't).
Your problem with multiple threads could be solved using a lock statement:
lock (myLock)
{
    var invid = db.TransportJobInvoice.Where(c => c.CompanyId == CompanyId)
                  .Max(i => i.InvoiceId);
    var invoiceId = invid == null ? 1 : (int)invid + 1;
}
This guarantees that only one thread executes these statements at a time.
Be careful though: this could cause performance issues when those statements are executed a lot in parallel and the query takes a significant time to execute.

Stored Procedure for date ranges in a single column

I'm looking for a solution to an issue in my project.
I have stages associated with contracts. That is, a contract can be in the Active stage, the Process stage or the Terminated stage.
I need to get the number of days the contract was in each stage.
For example, if a contract C1 was in the Active stage from 20/10/2013 to 22/10/2013, then in the Process stage from 22/10/2013 to 25/10/2013, then in the Terminated stage from 25/10/2013 to 26/10/2013, and then Active again from 26/10/2013 to 28/10/2013, then I should get as result:
Active = 4 days
Process = 3 days
Terminated = 1 day
My table is created with these columns:
EntryId (primary key)
StageId (foreign key to Stage table)
ContractId (foreign key to contract table)
DateofStageChange
How can I do this in SQL Server?
As asked, please find the table entries below:
EntryID | StageID | ContractID | DateChange
1       | A1      | C1         | 20/10/2013
2       | P1      | C1         | 22/10/2013
3       | T1      | C1         | 25/10/2013
4       | A1      | C1         | 26/10/2013
5       | P1      | C1         | 28/10/2013
6       | T1      | C1         | NULL (currently in this stage)
I need to use GROUP BY on StageID.
It is important to check how the data is populated in your table; this is based on just your sample data. Also note that if your EntryId is not in sequence, you can create one using ROW_NUMBER().
declare @t table (EntryId int identity(1,1), StageId int, ContractId varchar(10), DateofStageChange date)
insert into @t values
 (1,'C1','2013-10-20'),(1,'C1','2013-10-22'),(2,'C1','2013-10-22'),(2,'C1','2013-10-25')
,(3,'C1','2013-10-25'),(3,'C1','2013-10-26'),(1,'C1','2013-10-26'),(1,'C1','2013-10-28')

select StageId, sum([noOfDays]) [totalNoOfDays] from
    (select a.StageId, a.ContractId, a.DateofStageChange [FromDate], b.DateofStageChange [ToDate]
          , datediff(day, a.DateofStageChange, b.DateofStageChange) [noOfDays]
     from @t a
     inner join @t b on a.StageId = b.StageId and b.EntryId - a.EntryId = 1) t4
group by StageId
You can't with your current structure.
You can get the age of the latest stage change with datediff(d, DateOfStageChange, getdate()), but you don't have any history, so you can't get the previous stages.
This can be done in SQL with CTE.
You didnt provide your tablenames, so you'll need to change where I've indicated below, but it would look like this:
;WITH cte
AS (
SELECT
DateofStageChange, StageID, ContractID,
ROW_NUMBER() OVER (ORDER BY ContractID, StageId, DateofStageChange) AS RowNum
FROM
DateOfStageChangeTable //<==== Change this table name
)
SELECT
a.ContractId,
a.StageId,
Coalesce(sum(DATEDIFF(d ,b.DateofStageChange,a.DateofStageChange)), 'CurrentState`) as Days
FROM
cte AS A
LEFT OUTER JOIN
cte AS B
ON A.RowNum = B.RowNum + 1 and a.StageId = b.StageId and a.ContractId = b.ContractId
group by a.StageId, a.ContractId
This really is just a self join that creates a row number on a table, orders the table by StageID and date and then joins to itself. The first date on the first row of the stage id and date, joins to the second date on the second row, then the daterange is calculated in days.
This assumes that you only have 2 dates for each stage, if you have several, you would just need to do a min and max on the cte table.
EDIT:
Based on your sample data, the above query should work well. Let me know if you get any syntax errors and I'll fix them.
I added a coalesce to indicate the state they are currently in.

Best way to avoid adding duplicates in database

I have a SQL Server table with three columns:
Table1
col1 int
col2 int
col3 string
I have a unique constraint defined for all three columns (col1, col2, col3)
Now, I have a .csv file from which I want to add records to this table, and the .csv file can have duplicate records.
I have searched for various options for avoiding duplicates in the above scenario. Below are three options that are working well for me. Please have a look and share your thoughts on the pros/cons of each method so I can choose the best one.
Option #1:
Avoiding duplicates in the first place, i.e. while adding objects to the list from the CSV file. I have used HashSet<T> for this and overridden the methods below for type T:
public override int GetHashCode()
{
    return col1.GetHashCode() + col2.GetHashCode() + col3.GetHashCode();
}

public override bool Equals(object obj)
{
    var other = obj as T;
    if (other == null)
    {
        return false;
    }
    return col1 == other.col1
        && col2 == other.col2
        && col3 == other.col3;
}
Option #2:
Having List<T> instead of HashSet<T> and removing duplicates after all the objects are added to the List<T>:
List<T> distinctObjects = allObjects
    .GroupBy(x => new { x.col1, x.col2, x.col3 })
    .Select(x => x.First()).ToList();
Option #3:
Removing duplicates after all the objects are added to a DataTable:
public static DataTable RemoveDuplicatesRows(DataTable dataTable)
{
    IEnumerable<DataRow> uniqueRows = dataTable.AsEnumerable().Distinct(DataRowComparer.Default);
    DataTable dataTable2 = uniqueRows.CopyToDataTable();
    return dataTable2;
}
Although I have not compared their running times, I prefer option #1, as I am removing duplicates as a first step and moving ahead only with what is required.
Please share your views so I can choose the best one.
Thanks a lot!
I like option 1: the HashSet<T> provides a fast way of avoiding duplicates before ever sending them to the DB. You should implement a better GetHashCode, e.g. using Skeet's implementation from What is the best algorithm for an overridden System.Object.GetHashCode?
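For example, a sketch following the multiply-and-add pattern from that answer, as a drop-in for the override above; the prime constants are conventional choices:
public override int GetHashCode()
{
    unchecked // overflow just wraps, which is fine for hashing
    {
        int hash = 17;
        hash = hash * 31 + col1.GetHashCode();
        hash = hash * 31 + col2.GetHashCode();
        hash = hash * 31 + (col3 == null ? 0 : col3.GetHashCode());
        return hash;
    }
}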
But there's a problem: what if the table already contains data that can be a duplicate of your CSV? You'd have to copy the whole table down first for a simple HashSet to really work. You could do just that, but to solve this, I might pair option 1 with a temporary table and an insert statement like the one from Skip-over/ignore duplicate rows on insert:
INSERT dbo.Table1 (col1, col2, col3)
SELECT col1, col2, col3
FROM dbo.tmp_holding_Table1 AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.Table1 AS d
                  WHERE d.col1 = t.col1
                    AND d.col2 = t.col2
                    AND d.col3 = t.col3);
With this combination, the volume of data transferred to/from your DB is minimized.
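A minimal sketch of that combination with ADO.NET; csvDataTable (the de-duplicated rows from option 1) and connectionString are assumptions:
using System.Data;
using System.Data.SqlClient;

static void ImportWithoutDuplicates(string connectionString, DataTable csvDataTable)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Upload the de-duplicated rows to the holding table in one shot.
        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "dbo.tmp_holding_Table1";
            bulk.WriteToServer(csvDataTable);
        }

        // Then run the INSERT .. WHERE NOT EXISTS from above.
        using (var cmd = new SqlCommand(@"
            INSERT dbo.Table1 (col1, col2, col3)
            SELECT col1, col2, col3
            FROM dbo.tmp_holding_Table1 AS t
            WHERE NOT EXISTS (SELECT 1 FROM dbo.Table1 AS d
                              WHERE d.col1 = t.col1
                                AND d.col2 = t.col2
                                AND d.col3 = t.col3);", conn))
        {
            cmd.ExecuteNonQuery();
        }
    }
}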
Another solution could be the IGNORE_DUP_KEY = { ON | OFF } option when creating or rebuilding an index. This option prevents errors when duplicate rows are inserted; instead, SQL Server generates a warning: Duplicate key was ignored.
CREATE TABLE dbo.MyTable (Col1 INT, Col2 INT, Col3 INT);
GO
CREATE UNIQUE INDEX IUN_MyTable_Col1_Col2_Col3
ON dbo.MyTable (Col1,Col2,Col3)
WITH (IGNORE_DUP_KEY = ON);
GO
INSERT dbo.MyTable (Col1,Col2,Col3)
VALUES (1,11,111);
INSERT dbo.MyTable (Col1,Col2,Col3)
SELECT 1,11,111 UNION ALL
SELECT 2,22,222 UNION ALL
SELECT 3,33,333;
INSERT dbo.MyTable (Col1,Col2,Col3)
SELECT 2,22,222 UNION ALL
SELECT 3,33,333;
GO
/*
(1 row(s) affected)
(2 row(s) affected)
Duplicate key was ignored.
*/
SELECT * FROM dbo.MyTable;
/*
Col1 Col2 Col3
----------- ----------- -----------
1 11 111
2 22 222
3 33 333
*/
Note: because you have a UNIQUE constraint, if you try to change the index options with ALTER INDEX
ALTER INDEX IUN_MyTable_Col1_Col2_Col3
ON dbo.MyTable
REBUILD WITH (IGNORE_DUP_KEY = ON)
you will get the following error:
Msg 1979, Level 16, State 1, Line 1
Cannot use index option ignore_dup_key to alter index 'IUN_MyTable_Col1_Col2_Col3' as it enforces a primary or unique constraint.
So, if you choose this solution, the options are:
1) Create another UNIQUE index and drop the UNIQUE constraint (this requires more storage space, but a UNIQUE index/constraint stays active the whole time), or
2) Drop the UNIQUE constraint and create a UNIQUE index with the WITH (IGNORE_DUP_KEY = ON) option (I wouldn't recommend this last option).

Auto-incrementing a number that is part of a string value in a SQL Server database

How can I auto-increment a number that is part of a string value in a SQL Server database?
For example, here is my table:
EMP_ID EMPNAME EMPSECTION
EMP_1 ROSE S-11
EMP_2 JANE R-11
When I add a new record, what I would like to do is automatically increment the number that follows EMP_. For example, EMP_3, EMP_4, etc.
One option is to have a table that has an auto-increment ID field, and to write a trigger on your table that, on insert, fires an insert on the auto-increment table and fetches the current value. Then concatenate that value onto the end of EMP_.
In C# it's easy to do: each time you want to insert a new row, generate the key first by following these steps:
1. Get a list of your ID field.
2. Loop over it to find the maximum key value, something like this:
int maxID = 1;
foreach (var l in list)
{
    // strip the "EMP_" prefix and compare the numeric part
    int current = int.Parse(l.ID.Replace("EMP_", ""));
    if (current > maxID)
    {
        maxID = current;
    }
}
maxID = maxID + 1;
string ID = "EMP_" + maxID.ToString();
And ID is your new ID!
But if your application is accessed by multiple clients (for example, if it's a website), I really don't suggest doing something like this, because: 1. it's time-consuming; 2. under some conditions the same key value might be generated for multiple clients and you will get an error while inserting.
You can have an identity column in your table and display 'EMP_' appended to its value in your user interface. If you want to do it in a custom way, you'll need a sequence table.
Create a sequence table:
Sequence
-------------------
Seq_Name | Seq_Val
-------------------
EMPLOYEE | 0
Then you need a stored procedure to perform this:
BEGIN
    DECLARE @curVal int

    SELECT @curVal = Seq_Val + 1 FROM Sequence WHERE Seq_Name = 'EMPLOYEE'
    UPDATE Sequence SET Seq_Val = Seq_Val + 1 WHERE Seq_Name = 'EMPLOYEE'

    INSERT INTO Employee VALUES ('EMP_' + CAST(@curVal AS VARCHAR), 'Rose', 'S-11')
END
You can do something like:
create table dbo.foo
(
    id int not null identity(1,1) , -- actual primary key
    .
    .
    .
    formatted_id as 'emp_' + convert(varchar,id) , -- surrogate/alternate key
    constraint foo_PK primary key ( id ) ,
    constraint foo_AK01 unique ( formatted_id )
)
But I can't for the life of me think of just why one might want to do that.
