We are creating a client-server application using WPF/C# with SQL. We generate a unique number by checking the DB for the last maximum number, incrementing that max value by 1, and storing the result in the DB. When another user is working on the same screen and creating unique numbers at the same time, the unique numbers sometimes get duplicated and an exception is thrown.
We found this is a concurrency issue.
Indeed, fetching a number out, adding one, and hoping it still isn't in use is a race - both between threads and between multiple clients - and should be avoided.
Options:
use an IDENTITY column in the database, and let the database generate the value itself during INSERT; the database server knows how to do this safely and reliably
if that isn't possible, you might want to delay this code until you are ready to INSERT so it is all part of a single database operation - and even then, if it isn't in a "serializable transaction" (with key-range read locks, etc), then you would have to loop on "get the max, increment, try to insert but note that we might have lost a race, so only insert if the value doesn't exist - which it might; repeat from start if unsuccessful"
alternatively, you could create the new record when you first need the number (even though the rest of the data isn't available), noting that you might still need the "loop until successful" approach
Frankly, the IDENTITY column approach is the simplest.
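For illustration, here is a minimal sketch of the IDENTITY approach, assuming SQL Server and a hypothetical Orders table (all names here are made up):

using Microsoft.Data.SqlClient;

// Assumes a table created as:
//   CREATE TABLE Orders (
//       OrderNumber INT IDENTITY(1,1) PRIMARY KEY,
//       CustomerName NVARCHAR(100) NOT NULL
//   );
static class OrderInsert
{
    public static int InsertOrder(string connectionString, string customerName)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        // OUTPUT INSERTED.OrderNumber returns the server-generated number;
        // no client ever computes "max + 1", so there is nothing to race on.
        using var command = new SqlCommand(
            "INSERT INTO Orders (CustomerName) OUTPUT INSERTED.OrderNumber VALUES (@name);",
            connection);
        command.Parameters.AddWithValue("@name", customerName);
        return (int)command.ExecuteScalar();
    }
}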
Finally, we followed the singleton pattern with a lock to resolve this issue.
Thanks.
I want to emphasize that I'm looking for ideas, not necessarily a concrete answer since it's difficult to show what my queries look like, but I don't believe that's needed.
The process looks like this:
Table A keeps filling up, like a bucket - an SQL job keeps calling SP_Proc1 every minute or less and it inserts multiple records into table A.
At the same time a C# process keeps calling another procedure SP_Proc2 every minute or less that does an ordered TOP 5 select from table A and returns the results to the C# method. After C# code finishes processing the results it deletes the selected 5 records from table A.
I bolded the problematic part above. It is necessary that the records from table A be processed 5 at a time in the order specified, but a few times a month SP_Proc2 selects the ordered TOP 5 records in the wrong order, even though all the records are present in table A and have correct values in the columns used for ordering.
Something to note:
I'm ordering by integers, not varchar.
The C# part is using 1 thread.
Both SP_Proc1 and SP_Proc2 use a transaction with the READ COMMITTED or READ COMMITTED SNAPSHOT transaction isolation level.
One column that is used for ordering is a computed value, but a very simple one. It just checks if another column in table A is not null and sets the computed column to either 1 or 0.
There's a unique nonclustered index on primary key Id and a clustered index composed of the same columns used for ordering in SP_Proc2.
I'm using SQL Server 2012 (v11.0.3000)
I'm beginning to think that this might be an SQL bug or maybe the records or index in table A get corrupted and then deleted by the C# process and that's why I can't catch it.
Edit:
To clarify, SP_Proc1 commits a big batch of N records to table A at once, and SP_Proc2 pulls records from table A in batches of 5: it orders the records in the table and selects the TOP 5. Sometimes a wrong batch is selected - the batch itself is ordered correctly, but a different batch should have been selected according to the ORDER BY. I believe Rob Farley might have the right idea.
My guess is that your “out of order TOP 5” is ordered, but that a later five overlaps. Like, one time you get 1231, 1232, 1233, 1234, and 1236, and the next batch is 1235, 1237, and so on.
This can be an issue with locking and blocking. You’ve indicated your processes use transactions, so it wouldn’t surprise me if your 1235 hasn’t been committed yet, but can just be ignored by your snapshot isolation, and your 1236 can get picked up.
It doesn’t sound like there’s a bug here. What I’m describing above is a definite feature of snapshot isolation. If you must have 1235 picked up in an earlier batch than 1236, then don’t use snapshot isolation, and force your table to be locked until each block of inserts is finished.
An alternative suggestion would be to use a table lock (tablock) for the reading and writing procedures.
Though this is expensive, if you desire absolute consistency then this may be the way to go.
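As a rough sketch of what the reading side could look like with that hint, assuming SQL Server and made-up table and column names (TABLOCKX plus HOLDLOCK takes an exclusive table lock for the duration of the transaction):

using Microsoft.Data.SqlClient;

static class BatchReader
{
    public static void ReadBatchWithTableLock(string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();
        using var transaction = connection.BeginTransaction();

        // The exclusive table lock serializes this reader against the
        // inserting procedure, so no half-committed batch can slip in
        // between the rows being ordered and the TOP 5 being taken.
        using var command = new SqlCommand(
            @"SELECT TOP 5 Id, SortColumn1, SortColumn2
              FROM TableA WITH (TABLOCKX, HOLDLOCK)
              ORDER BY SortColumn1, SortColumn2;",
            connection, transaction);

        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // process the row; the lock is held until Commit below
            }
        }

        transaction.Commit();
    }
}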
TL;DR: Several concurrent Tasks are trying to place identical records into a database; essentially, SEVERAL tasks are being spun up, each opening files that could be identical.
It is vital to save all the information, in a heavily nested table, based on the IP Address. Here is what I have tried so far in the last 4 days of work (even during Christmas!):
Tried to use a Transaction within a do while() loop (with context.Rollback()). [Didn't work!]
Tried to put random sleeps within each of the inserts to stop the race condition. [Didn't work!]
Made the code no longer async. [Didn't work!]
Current algorithm doesn't work and pegs the CPU! [Doesn't work!]
Separately add EACH object to the table individually. [Didn't work!]
Each of the objects increments during insert. This is why this doesn't make sense. I am at a loss for words.
Object Relationships
IP has many Incidents;
I think you might have a problem in these lines:
Vendor vendorInstancer = new Vendor();
vendorInstance.IncidentID = IncidentId;
context.Vendors.Add(vendorInstancer);
Note the variable names. You create vendorInstancer but update the ID of vendorInstance - that is, not the entity you're saving to the database. That one-letter difference is hard to spot.
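For clarity, the corrected lines would presumably be (one variable used throughout):

Vendor vendorInstance = new Vendor();   // same name everywhere
vendorInstance.IncidentID = IncidentId;
context.Vendors.Add(vendorInstance);    // now adds the entity that was updated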
There are two data centers with 3 nodes each. I'm doing two simple inserts (very fast back to back) to the same table with a consistency level of local quorum. The table has one partitioning key and no clustering columns.
Sometimes the first insert wins over the second one. The data produced by the first insert statement is what gets saved in the database even though I do an insert right after that.
C# Code
var statement = "Insert Into customer (id,name) Values (1, "foo")";
statement.SetConsistencyLevel(ConsistencyLevel.LocalQuorum);
session.Execute(statement);
Set the timestamp on the client. In most newer drivers this is done automatically to better ensure order is preserved. However, with older drivers, or before Cassandra 2.1, it's not supported and needs to be in the query. I don't know which driver or version you are using, but you can also put it in the CQL. It's supported at the protocol level, though, so the driver should have a better mechanism.
Something like: var statement = "INSERT INTO customer (id,name) VALUES (1, 'foo') USING TIMESTAMP {microsecond timestamp}";
The best approach is to use a monotonic timestamp so that each call is always higher than the last (i.e. use the current milliseconds and add a counter). I don't know C# well enough to tell you how best to approach that. Look at https://docs.datastax.com/en/developer/csharp-driver/3.3/features/query-timestamps/#using-a-timestamp-generator
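As a rough illustration of what such a generator could look like in C# (an assumption-laden sketch, not driver code): take the current time in microseconds, bumped by one whenever the clock hasn't advanced, so each call within a process returns a strictly higher value than the last:

using System;
using System.Threading;

public static class MonotonicTimestamp
{
    private static long _last;

    public static long NextMicroseconds()
    {
        while (true)
        {
            long now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() * 1000;
            long last = Interlocked.Read(ref _last);
            long candidate = Math.Max(now, last + 1);  // never repeat, never go backwards
            if (Interlocked.CompareExchange(ref _last, candidate, last) == last)
                return candidate;
        }
    }
}

The value can then be passed in USING TIMESTAMP as above, or via the driver's timestamp APIs if your version supports them.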
If you don't have a timestamp set on the mutation, the coordinator will assign one after it parses the query. Since networks and netty queues can do funny things, order is not a sure thing, especially as the writes may end up on different nodes that have some clock drift.
Suppose we have the following situation:
We have 2-3 tables in a database with a huge amount of data (say 50-100 million records) and we want to add 2,000 new records. But before adding them we need to check our DB for duplicates: if those 2,000 contain records we already have in the DB, we should ignore them. To find out whether a new record is a duplicate, we need info from both tables (for example, we need to do a LEFT JOIN).
The idea of the solution: one task or thread creates suitable data for comparison and pushes it into a queue (in batches, not record by record), so our queue (or ConcurrentQueue) is a global variable. A second thread takes a batch from the queue and looks through it. But there's a problem - memory keeps growing...
How can I free the memory after I've gone through a batch?
P.S. If somebody has another idea how to optimize this process, please describe it...
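For what it's worth, here is a minimal sketch of the described two-thread pipeline with one change that addresses the growing memory: a bounded queue, so the producer blocks instead of buffering without limit (all types and names here are illustrative, not from the question's code):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class DedupPipeline
{
    public static void Run()
    {
        // Capacity of 4 batches: Add() blocks once the queue is full,
        // so memory stays flat even if the consumer falls behind.
        using var queue = new BlockingCollection<string[]>(boundedCapacity: 4);

        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 100; i++)
            {
                var batch = BuildComparisonBatch(i); // placeholder for the real data prep
                queue.Add(batch);                    // blocks while the consumer catches up
            }
            queue.CompleteAdding();
        });

        var consumer = Task.Run(() =>
        {
            foreach (var batch in queue.GetConsumingEnumerable())
            {
                // look through the batch; once this iteration ends the batch
                // has no remaining references and can be garbage collected
            }
        });

        Task.WaitAll(producer, consumer);
    }

    static string[] BuildComparisonBatch(int i) => new[] { $"record-{i}" };
}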
This is not a specific answer to the question you are asking, because what you are asking doesn't really make sense to me.
If you are looking to update specific rows:
INSERT INTO tablename (UniqueKey,columnname1, columnname2, etc...)
VALUES (UniqueKeyValue,value1,value2, etc....)
ON DUPLICATE KEY
UPDATE columnname1=value1, columnname2=value2, etc...
If not, simply ignore/remove the update statement.
This would be darn fast, considering it would use the unique index on whatever field you want to be unique, and just do an insert or update. No need to validate against a separate table or anything.
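A rough sketch of calling that from C#, assuming MySQL (ON DUPLICATE KEY UPDATE is MySQL syntax) and the MySql.Data client, reusing the placeholder names from above:

using MySql.Data.MySqlClient;

static class Upsert
{
    public static void UpsertRecord(string connectionString, string key, string value1)
    {
        using var connection = new MySqlConnection(connectionString);
        connection.Open();

        // One round trip per record: insert if the unique key is new,
        // otherwise update the existing row in place.
        using var command = new MySqlCommand(
            @"INSERT INTO tablename (UniqueKey, columnname1)
              VALUES (@key, @value1)
              ON DUPLICATE KEY UPDATE columnname1 = @value1;",
            connection);
        command.Parameters.AddWithValue("@key", key);
        command.Parameters.AddWithValue("@value1", value1);
        command.ExecuteNonQuery();
    }
}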
I have a table with one of its attributes set as an identity. I want to get the value of the identity attribute that will be generated when I enter a value into the database.
I have an EmpTable made up of EmpID and EmpName. EmpID is set as an identity. I want to fetch the EmpID value before inserting a new row into the database.
I would advise against trying to do this with a table that is set up to use an integer column as the primary key. You will run into concurrency problems if you simply fetch the previous ID and increment it. Instead you should use a GUID (uniqueidentifier in SQL) as your primary key.
This will allow you to generate a new GUID in your code that can safely be saved to the database at a later stage.
http://msdn.microsoft.com/en-us/library/system.guid.newguid.aspx
http://msdn.microsoft.com/en-us/library/ms187942.aspx
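A minimal sketch of the GUID approach, using the EmpTable names from the question:

using System;

public class Employee
{
    public Guid EmpID { get; set; }     // uniqueidentifier column in SQL Server
    public string EmpName { get; set; } = "";
}

class Demo
{
    static void Main()
    {
        var emp = new Employee
        {
            EmpID = Guid.NewGuid(),     // generated client-side, safe under concurrency
            EmpName = "Alice"
        };
        // EmpID is known here, before any INSERT has happened,
        // so it can be used in related records or the UI immediately.
        Console.WriteLine(emp.EmpID);
    }
}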
Sure the server knows where the auto-increment count is in its sequence, but there is almost nothing useful you can do with that information. Imagine you go to the Post Office and they hand out numbered tickets so they can serve customers in order. Of course you could ask them what the next number they'll give out is, but since anyone can walk in at any time you don't know you'll get that number. If you don't know that you'll get it, you can't do anything with it - e.g. writing it as a reference number on a form would be a mistake.
Depending on what you're trying to do, your two main options are:
Use a client-generated guid as your identifier. This kind of messes up the order so the analogy isn't great, but imagine if each customer who walked in could generate a random number that they are sure would never have been used before. They could use that to fill out forms before taking a number.
Take a number, but do it in a transaction with the other operations. A customer can take a number and use it to fill out some paperwork. If they realize they left their money at home, they just throw everything away and you never call their number.
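Here is a minimal sketch of that second option, assuming SQL Server and the EmpTable from the question: the value is read via SCOPE_IDENTITY() inside the transaction, and a rollback throws the number away, like the discarded ticket:

using Microsoft.Data.SqlClient;

static class TransactionalInsert
{
    public static int InsertEmployee(string connectionString, string empName)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();
        using var transaction = connection.BeginTransaction();
        try
        {
            using var command = new SqlCommand(
                "INSERT INTO EmpTable (EmpName) VALUES (@name); " +
                "SELECT CAST(SCOPE_IDENTITY() AS INT);",
                connection, transaction);
            command.Parameters.AddWithValue("@name", empName);
            int empId = (int)command.ExecuteScalar();

            // ... use empId for whatever paperwork needs it ...

            transaction.Commit();   // the number is now "called"
            return empId;
        }
        catch
        {
            transaction.Rollback(); // number discarded; no one else received it meanwhile
            throw;
        }
    }
}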
Why do you think you need this information? Can you use either of these strategies instead?