Concurrent reading and updating in a database table - C#

I have an Oracle database that I access using Devart and Entity Framework.
There's a table called IMPORTJOBS with a column STATUS.
I also have multiple processes running at the same time. They each read the first row in IMPORTJOBS that has status 'REGISTERED', set it to status 'EXECUTING', and when done set it to status 'EXECUTED'.
Now because these processes are running in parallel, I believe the following could happen:
process A reads row 10 which has status REGISTERED,
process B also reads row 10 which has still status REGISTERED,
process A updates row 10 to status EXECUTING.
Process B should not be able to read row 10 as process A already read it and is going to update its status.
How should I solve this? Put read and update in a transaction? Or should I use some versioning approach or something else?
Thanks!
EDIT: thanks to the accepted answer I got it working and documented it here: http://ludwigstuyck.wordpress.com/2013/02/28/concurrent-reading-and-writing-in-an-oracle-database.

You should use the built-in locking mechanisms of the database. Don't reinvent the wheel, especially since RDBMSs are designed to deal with concurrency and consistency.
In Oracle 11g, I suggest you use the SKIP LOCKED feature. For example, each process could call a function like this (assuming the ids are numbers):
CREATE OR REPLACE TYPE tab_number IS TABLE OF NUMBER;
/

CREATE OR REPLACE FUNCTION reserve_jobs RETURN tab_number IS
   -- SKIP LOCKED: rows already locked by another session are silently skipped
   CURSOR c IS
      SELECT id
        FROM IMPORTJOBS
       WHERE STATUS = 'REGISTERED'
         FOR UPDATE SKIP LOCKED;
   l_result tab_number := tab_number();
   l_id     NUMBER;
BEGIN
   OPEN c;
   FOR i IN 1 .. 10 LOOP           -- reserve at most 10 jobs per call
      FETCH c INTO l_id;
      EXIT WHEN c%NOTFOUND;
      l_result.extend;
      l_result(l_result.count) := l_id;
   END LOOP;
   CLOSE c;
   RETURN l_result;
END;
/
This will return up to 10 rows that are not locked. These rows will be locked, and the sessions will not block each other.
In 10g and earlier, since Oracle returns consistent results, use FOR UPDATE wisely and you should not have the problem you describe. For instance, consider the following SELECT:
SELECT *
FROM IMPORTJOBS
WHERE STATUS = 'REGISTERED'
AND rownum <= 10
FOR UPDATE;
What would happen if all processes reserved their rows with this SELECT? In your scenario:
Session A gets 10 rows that are not processed.
Session B would get the same 10 rows, is blocked and waits for session A.
Session A updates the selected rows' statuses and commits its transaction.
Oracle will now (automatically) rerun Session B's select from the beginning since the data has been modified and we have specified FOR UPDATE (this clause forces Oracle to get the latest version of the block).
This means that session B will get 10 new rows.
So in this scenario, you have no consistency problem. Also, assuming that the transaction to request a row and change its status is fast, the concurrency impact will be light.
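On the C# side, the claim can stay in plain SQL inside a single transaction. The following is a minimal sketch, not the answer's own code; it assumes an ADO.NET provider for Oracle (such as Devart's dotConnect, whose OracleConnection would be passed in as the DbConnection) and reuses the table and column names from the question:

using System;
using System.Collections.Generic;
using System.Data.Common;

static List<long> ReserveJobs(DbConnection conn, int batchSize)
{
    // conn is an already-open connection from your Oracle provider
    // (e.g. Devart's OracleConnection).
    var ids = new List<long>();
    using (DbTransaction tx = conn.BeginTransaction())
    {
        // SKIP LOCKED (11g+): rows locked by another worker are skipped,
        // so parallel workers never claim the same job.
        using (DbCommand select = conn.CreateCommand())
        {
            select.Transaction = tx;
            select.CommandText =
                "SELECT id FROM IMPORTJOBS WHERE STATUS = 'REGISTERED' " +
                "FOR UPDATE SKIP LOCKED";
            using (DbDataReader reader = select.ExecuteReader())
            {
                while (ids.Count < batchSize && reader.Read())
                    ids.Add(Convert.ToInt64(reader.GetValue(0)));
            }
        }

        // Flip the claimed rows to EXECUTING before committing, so the status
        // change becomes visible at the same moment the row locks are released.
        foreach (long id in ids)
        {
            using (DbCommand update = conn.CreateCommand())
            {
                update.Transaction = tx;
                update.CommandText =
                    "UPDATE IMPORTJOBS SET STATUS = 'EXECUTING' WHERE id = :id";
                DbParameter p = update.CreateParameter();
                p.ParameterName = "id";
                p.Value = id;
                update.Parameters.Add(p);
                update.ExecuteNonQuery();
            }
        }

        tx.Commit();
    }
    return ids;
}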

Each process can issue a SELECT ... FOR UPDATE to lock the row when they read it. In this scenario, process A will read and lock the row, process B will attempt to read the row and block until process A releases the lock by committing (or rolling back) its transaction. Oracle will then determine whether the row still meets B's criteria and, in your example, won't return the row to B. This works but it means that your multi-threaded process may now be effectively single-threaded depending on how your transaction control needs to work.
Possible ways to improve scalability
A relatively common approach on the consumer side is to have a single coordinator thread that reads the data from the table, parcels out work to different threads, and updates the table appropriately (including knowing how to re-assign a job if the thread it was assigned to has died); a sketch of this coordinator pattern follows below.
If you are using Oracle 11.1 or later, you can use the SKIP LOCKED clause on your FOR UPDATE so that each session gets back the first row that meets their criteria and is not locked (the clause existed in earlier versions but was not documented so it may not work correctly).
Rather than using a table for ImportJobs, you can use a queue with multiple consumers. This will allow Oracle to distribute messages to each process without you needing to build any additional locking (Oracle queues are doing it all behind the scenes).
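For the coordinator option above, here is a minimal sketch of the pattern in C#; the data-access methods at the bottom are placeholders, not real APIs from the question:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class JobCoordinator
{
    // Only the coordinator touches the table, so no row-level locking is needed
    // beyond a single claiming update per batch.
    private readonly BlockingCollection<long> _queue = new BlockingCollection<long>();

    public void Run(int workerCount)
    {
        var workers = new List<Task>();
        for (int i = 0; i < workerCount; i++)
            workers.Add(Task.Run(() => Worker()));

        // Coordinator loop: claim REGISTERED jobs and hand the ids to workers.
        foreach (long id in ClaimRegisteredJobs())   // placeholder data access
            _queue.Add(id);

        _queue.CompleteAdding();                     // signal: no more work
        Task.WaitAll(workers.ToArray());
    }

    private void Worker()
    {
        foreach (long id in _queue.GetConsumingEnumerable())
        {
            ProcessJob(id);                          // placeholder
            MarkExecuted(id);                        // placeholder: STATUS = 'EXECUTED'
        }
    }

    // Placeholders for the actual EF/ADO.NET calls.
    private IEnumerable<long> ClaimRegisteredJobs() { yield break; }
    private void ProcessJob(long id) { }
    private void MarkExecuted(long id) { }
}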

Use versioning and optimistic concurrency.
The IMPORTJOBS table should have a timestamp column that you mark as ConcurrencyMode = Fixed in your model. Now when EF tries to do an update, the timestamp column is incorporated into the UPDATE statement's WHERE clause: WHERE timestamp = xxxxx.
For B, the timestamp changed in the meantime, so a concurrency exception is raised, which, in this case, you handle by skipping the update.
I'm from a SQL Server background and I don't know the Oracle equivalent of timestamp (or rowversion), but the idea is that it's a field that auto-updates whenever a record is updated.
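As a rough sketch of the "skip on conflict" handling with EF's DbContext API (the ImportJobsContext and ImportJob names are made up for illustration; with the older ObjectContext API the exception to catch is OptimisticConcurrencyException instead):

using System.Data.Entity.Infrastructure; // DbUpdateConcurrencyException (DbContext API)

// Returns true if this process won the row; false if another process updated it first.
static bool TryClaimJob(ImportJobsContext db, ImportJob job)   // hypothetical context/entity
{
    try
    {
        job.Status = "EXECUTING";
        db.SaveChanges();   // generated UPDATE includes the concurrency column in its WHERE clause
        return true;
    }
    catch (DbUpdateConcurrencyException)
    {
        // The row changed since it was read: skip it and move on to the next job.
        return false;
    }
}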


Ideas on incorrect ORDER BY results

I want to emphasize that I'm looking for ideas, not necessarily a concrete answer, since it's difficult to show what my queries look like; I don't believe that's needed, though.
The process looks like this:
Table A keeps filling up like a bucket: a SQL job calls SP_Proc1 every minute or less and inserts multiple records into table A.
At the same time, a C# process calls another procedure, SP_Proc2, every minute or less; it does an ordered TOP 5 select from table A and returns the results to the C# method. After the C# code finishes processing the results, it deletes the selected 5 records from table A.
The problematic part is the ordered TOP 5 select: the records from table A must be processed 5 at a time in the specified order, but a few times a month SP_Proc2 selects the TOP 5 records in the wrong order, even though all the records are present in table A and have correct values in the columns used for ordering.
Something to note:
I'm ordering by integers, not varchar.
The C# part is using 1 thread.
Both SP_Proc1 and SP_Proc2 use a transaction and the READ COMMITTED or READ COMMITTED SNAPSHOT isolation level.
One column that is used for ordering is a computed value, but a very simple one. It just checks if another column in table A is not null and sets the computed column to either 1 or 0.
There's a unique nonclustered index on primary key Id and a clustered index composed of the same columns used for ordering in SP_Proc2.
I'm using SQL Server 2012 (v11.0.3000)
I'm beginning to think that this might be a SQL Server bug, or that the records or index in table A get corrupted and are then deleted by the C# process, which is why I can't catch it.
Edit:
To clarify: SP_Proc1 commits a big batch of N records to table A at once, and SP_Proc2 pulls records from table A in batches of 5. It orders the records in the table and selects the TOP 5, and sometimes the wrong batch is selected; the batch itself is ordered correctly, but a different batch should have been selected according to the ORDER BY. I believe Rob Farley might have the right idea.
My guess is that your “out of order TOP 5” is ordered, but that a later five overlaps. Like, one time you get 1231, 1232, 1233, 1234, and 1236, and the next batch is 1235, 1237, and so on.
This can be an issue with locking and blocking. You’ve indicated your processes use transactions, so it wouldn’t surprise me if your 1235 hasn’t been committed yet, but can just be ignored by your snapshot isolation, and your 1236 can get picked up.
It doesn’t sound like there’s a bug here. What I’m describing above is a definite feature of snapshot isolation. If you must have 1235 picked up in an earlier batch than 1236, then don’t use snapshot isolation, and force your table to be locked until each block of inserts is finished.
An alternative suggestion would be to use a table lock (TABLOCK) for the reading and writing procedures.
Though this is expensive, if you desire absolute consistency then this may be the way to go.

What is the proper way to lock rows in SQL while a user works with the data?

I am currently working on an ERP-like WPF application with SQL Server as the database.
Up to now, I have only had to work with small tasks that do not need row locking on the server side. The basic pattern was "create SqlConnection -> select data into a DataTable -> close connection".
Now I would like to create the functionality to work on orders.
How can I lock the records that have been selected until the user finishes the work, so that no other user can read those rows?
I think I should use transactions, but I am not sure how to keep the transaction alive until the next statement, because I am closing the connection after each command.
Locking data like that is bad practice. A transaction is intended to ensure that your data is saved completely or not at all; it is not intended to lock the data for the reason described in your question.
It sounds like a lot of data may be entered, so you don't want a user to spend time entering data only to be met with an error because someone else changed it. You could have a locked_by column that you set when a user is editing the data and simply not allow anyone else to edit the data while that column is not NULL. You could still allow reads of the data, or exclude locked data from view with queries, depending on your needs.
You may also want to include a locked_time column so you know when it was locked. You could then clear the lock if it's stale, or at least query how long it's been locked allowing for an admin user to look for lengthy locks so they can contact that user or clear the lock.
The query could look like this:
UPDATE [Table] SET locked_by = @lockedByUser, locked_time = @lockedTime
WHERE Id = @fetchId AND locked_by IS NULL

SELECT * FROM [Table] WHERE Id = @fetchId AND locked_by = @lockedByUser
If no data is returned, the lock failed or the id doesn't exist; either way, the data isn't available. You could also check the number of records updated to verify whether the lock succeeded.
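A minimal sketch of issuing that claim from C# with SqlClient (the [Table] name is the placeholder used above; the rest of the names are assumptions):

using System;
using System.Data;
using System.Data.SqlClient;

// Returns the locked row, or null if the lock failed (or the id does not exist).
static DataTable TryLockRecord(string connectionString, int id, string user)
{
    const string sql =
        @"UPDATE [Table] SET locked_by = @lockedByUser, locked_time = @lockedTime
          WHERE Id = @fetchId AND locked_by IS NULL;

          SELECT * FROM [Table] WHERE Id = @fetchId AND locked_by = @lockedByUser;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@lockedByUser", user);
        cmd.Parameters.AddWithValue("@lockedTime", DateTime.UtcNow);
        cmd.Parameters.AddWithValue("@fetchId", id);

        conn.Open();
        var result = new DataTable();
        using (var reader = cmd.ExecuteReader())
            result.Load(reader);   // only the SELECT produces a result set

        return result.Rows.Count == 0 ? null : result;
    }
}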
Don't close the connection:
open a transaction
on the SELECT, use an UPDLOCK hint so the record(s) stay locked
perform the updates
commit or roll back the transaction
Put some type of timer on it so a stale lock is eventually released.
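A minimal sketch of that sequence with SqlClient, assuming a hypothetical dbo.Orders table:

using System.Data.SqlClient;

static void EditOrder(string connectionString, int orderId)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();                                    // keep the connection open for the whole edit
        using (var tx = conn.BeginTransaction())
        {
            // UPDLOCK takes and holds an update lock on the row until commit/rollback,
            // so another session trying the same read will wait.
            var read = new SqlCommand(
                "SELECT * FROM dbo.Orders WITH (UPDLOCK) WHERE OrderId = @id", conn, tx);
            read.Parameters.AddWithValue("@id", orderId);
            using (var reader = read.ExecuteReader())
            {
                // ... materialize the order for the user ...
            }

            // ... perform the updates ...
            var write = new SqlCommand(
                "UPDATE dbo.Orders SET Status = 'InProgress' WHERE OrderId = @id", conn, tx);
            write.Parameters.AddWithValue("@id", orderId);
            write.ExecuteNonQuery();

            tx.Commit();                                // or tx.Rollback()
        }
    }
}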
One way to handle concurrency at the application level is to add some kind of "LastServerUpdateDateTime" column to the table you are working on.
When user A pulls the data for a row, the ViewModel keeps that LastServerUpdateDateTime value. User A makes their changes and then tries to save back to the DB. If the LastServerUpdateDateTime value is still the same, there were no updates while they were working, so the save goes through (and LastServerUpdateDateTime is updated as well). If at any point while user A is working on the data, user B comes in, makes changes and saves, then when user A eventually saves, the LastServerUpdateDateTime will differ from what they initially pulled down and the save is rejected. Yes, user A then has to redo their changes, but that shouldn't happen often (depending on your application, of course), and you don't have to deal with direct DB locking or anything like that.
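A minimal sketch of the check itself, using plain SqlClient and hypothetical table/column names:

using System;
using System.Data.SqlClient;

// Returns false if someone else saved the row since it was read.
static bool TrySave(SqlConnection conn, int id, string newValue, DateTime originalStamp)
{
    var cmd = new SqlCommand(
        @"UPDATE dbo.Orders
             SET SomeColumn = @value,
                 LastServerUpdateDateTime = SYSUTCDATETIME()
           WHERE Id = @id
             AND LastServerUpdateDateTime = @originalStamp", conn);
    cmd.Parameters.AddWithValue("@value", newValue);
    cmd.Parameters.AddWithValue("@id", id);
    cmd.Parameters.AddWithValue("@originalStamp", originalStamp);

    // Zero rows affected means the timestamp changed underneath us: reject the save.
    return cmd.ExecuteNonQuery() == 1;
}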
I will describe the mechanism that I have used with success in the past.
1) Create a document ID table. In this table, each record represents a document type and an ID which can be incremented whenever a new document is created. The importance of this table is really as a root lock; the document ID is not strictly needed.
2) Create a lock table. In this table, each record represents a lock which includes a reference to a document record, a reference to the lock owner, and some additional data such as when the lock was created, when it was last acted upon, its status, or anything else you find useful. Each record means "user A holds a lock on document type X, document ID Y".
3) When locking a document (fetch + lock), lock (SELECT/UPDATE) the relevant record in the document ID table. Then, check the lock table for an existing lock record, and INSERT a new one as appropriate. At this point you may choose to over-write an existing lock, or return an error to the user.
4) When updating a document, again lock (SELECT/UPDATE) the relevant record in the document ID table. Then verify the user holds a lock, and if so do the actual update, and then DELETE the lock record. If the user does not hold a lock, you may choose to allow the update if no other user holds a lock, or return an error.
With this mechanism, a user goes through an open/lock operation, and then a save/unlock or discard/unlock operation. Additionally, locks can be removed by a cron job or by an administrator, in case users fail to update or discard (which they will).
This approach avoids holding record locks and transactions open for long periods of time, which can cause concurrency issues. It also allows locks to survive software crashes. It also allows all kinds of flexibility; for example, my implementation allowed a lock to be "demoted" after some period of time, and once a lock was demoted, it could be over-written by an ordinary user, while still allowing the owner to perform an update as long as the lock remained.

Building a simple multithreaded newsletter engine using Entity Framework

I understand the concepts around multithreading and using thread pools. One concept I am trying to figure out is how to keep track of which emails each thread has already sent to. So imagine each thread is responsible for pulling x records, iterating through those emails, applying an email template, then saving the email to a pick-up directory. Obviously, I need a way to tell each thread not to pull the same data as another thread.
One solution I was considering is to page the data and keep a global variable or array that tracks the pages already sent, so each thread can examine that variable and start from the next available page. The only issue I can think of is that if the data changes, the available pages might get out of sync.
Another solution is to set a boolean flag in the database to indicate whether an account has been emailed. EF would pull X records and update those records as claimed for emailing, so each query would only look for emails that have not yet been claimed.
I wanted to get some other suggestions, if possible, or expand on the solutions I provided.
Given that you may one day want to scale to more than one app server, memory synchronization implementations might also not be sufficient to guarantee that emails are not duplicated.
One of the simplest ways to solve this is to implement the batch-processing mechanism at the database level.
Under a Unit of Work
Read N records with pessimistic locking (i.e. preventing other threads from pulling the same emails)
Stamp these records with a batch id (or an IsProcessed indicator)
Return the records to your app
e.g. a batching proc in SQL Server might look something like this (assuming table dbo.Emails, which has a PK EmailId and a processed-indicator BIT field IsProcessed):
CREATE PROC dbo.GetNextBatchOfEmails
AS
BEGIN
    -- Identify the next N emails to be batched. UPDLOCK prevents another thread
    -- from batching the same emails.
    SELECT TOP 100 EmailId
    INTO #tmpBatch
    FROM dbo.Emails WITH (UPDLOCK)
    WHERE IsProcessed = 0

    -- Stamp the emails as processed. It is assumed that the proc is called under a
    -- unit of work; the batch IS the UOW.
    UPDATE e
    SET e.IsProcessed = 1
    FROM dbo.Emails e
    INNER JOIN #tmpBatch t
        ON e.EmailId = t.EmailId

    -- Return the batch of emails to the caller
    SELECT e.*
    FROM dbo.Emails e
    INNER JOIN #tmpBatch t
        ON e.EmailId = t.EmailId
END
Then expose the proc as an EF function import mapped to your Email entity. Under a TransactionScope ts, you can call the function import, send the emails, and call ts.Complete() on success.
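A minimal sketch of that calling pattern; the NewsletterEntities context and the GetNextBatchOfEmails function import name are assumptions, not part of the answer:

using System.Linq;
using System.Transactions;

static void SendNextBatch(NewsletterEntities context)   // hypothetical EF context
{
    using (var ts = new TransactionScope())
    {
        // The proc both claims the rows (IsProcessed = 1) and returns them,
        // so the claim and the read share one unit of work.
        var batch = context.GetNextBatchOfEmails().ToList();

        foreach (var email in batch)
        {
            // apply the template, drop the message in the pick-up directory, etc.
        }

        ts.Complete();   // without this, the claim rolls back when the scope is disposed
    }
}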
In addition to nonnb's method, you can accomplish it all in one statement if you are using SQL Server 2005 or later.
;WITH q AS
(
    SELECT TOP 10 *
    FROM dbo.your_queue_table WITH (ROWLOCK, READPAST)
    WHERE IsProcessing = 0
    --you can obviously include more filtering criteria to meet your needs
)
UPDATE q
SET IsProcessing = 1
OUTPUT INSERTED.*
There is also some great information located here about using database tables as queues.

How do I lock a table from write operations until my insert is complete with Linq-to-sql

I am working on an auction system, and one of the issues I am trying to guard against is two people placing a bid for the same item at the exact same time.
To do this I need to put a lock on the table, get the highest bid for the current item, make sure the entered bid is greater than that bid, add a new bid entry into the table, then unlock the table.
I need to lock this so a second webserver does not trigger a bid insert between when I check for the highest bid and when I insert my new bid into the table, as this would cause data issues.
How do I accomplish this with Linq-to-sql?
Note: I don't know if TransactionScopes can do this, but I can't use them, as they tend to trigger a distributed transaction due to our web farm setup, and I can't use distributed transactions.
There seem to be a couple of obstacles to implementing a solution in pure LINQ:
You should definitely avoid a table lock: it would make it impossible for several items to be bid on during the processing of one single bid, severely harming performance.
LINQ to SQL does not seem to support pessimistic locking, as stated in other answers on SO.
If you cannot have transactions in your code, I suggest the following procedure:
generate a GUID for your operation
pseudo-lock the item's record using the GUID:
UPDATE Items SET LockingGuid = @guid
WHERE ItemId = @ItemId AND LockingGuid IS NULL
SELECT @recordsAffected = @@ROWCOUNT
the lock succeeded if @@ROWCOUNT = 1
perform your bidding operation
UPDATE the record back to LockingGuid = NULL
if the lock fails, either raise the failure to the .Net client, or busy-wait using WAITFOR.
You should implement proper exception handling so that item records do not get locked indefinitely by a dying or failing process, probably by adding a datetime column storing the timestamp the lock occurred, and cleaning up orphaned locks.
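A minimal sketch of the pseudo-lock using LINQ to SQL's ExecuteCommand (the AuctionDataContext name is hypothetical; Items/ItemId/LockingGuid follow the snippet above):

using System;

// Try to pseudo-lock the item; returns the operation GUID on success, null otherwise.
static Guid? TryLockItem(AuctionDataContext dc, int itemId)   // hypothetical DataContext
{
    Guid token = Guid.NewGuid();

    // ExecuteCommand returns the number of rows affected, i.e. @@ROWCOUNT.
    int affected = dc.ExecuteCommand(
        "UPDATE Items SET LockingGuid = {0} WHERE ItemId = {1} AND LockingGuid IS NULL",
        token, itemId);

    return affected == 1 ? (Guid?)token : null;
}

// After the bid has been written, release the pseudo-lock:
// dc.ExecuteCommand("UPDATE Items SET LockingGuid = NULL WHERE ItemId = {0} AND LockingGuid = {1}",
//                   itemId, token);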
If your architecture allows for separate backend operation, you might want to have a look and CQRS and Event Sourcing for processing such bidding operations.
You could use a separate table to store information when this processing occurs. For example, your second table could be something like:
Table name:
ItemProcessing
Columns:
ItemId (int)
ProcessingToken (guid)
When a process wants to check on a current bid, it writes the ID of the item and a token/guid to the ItemProcessing table. That tells other processes that this item is currently being inspected. If there is already a row in the ItemProcessing table for this item, the other process must wait or abort. When the original process is done, it removes the token (sets it to null), or removes the row from ItemProcessing altogether. Then other processes know they can process that item.
Of course, you'll need a way to make sure both processes don't write to this processing table at the same time. You could accomplish that by inserting into this table only where the ProcessingToken is null. If another process just beat it to it, the second process won't be able to insert because the ProcessingToken will already exist.
While not a full solution, in detail, that's the basic idea.
You can manually begin a transaction and pass that transaction to the DataContext.
http://geekswithblogs.net/robp/archive/2009/04/02/your-own-transactions-with-linq-to-sql.aspx
I think it is necessary as well to manually control the opening and closing of the Connection to avoid an unwanted escalation to a distributed transaction. It seems that the DataContext will actually get in its own way and try to open two connections sometimes, thus causing a promotion to a distributed transaction.
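A minimal sketch of wiring this up yourself (the AuctionDataContext and the bid logic are placeholders):

using System.Data;

static void PlaceBid(string connectionString, int itemId, decimal amount)
{
    using (var dc = new AuctionDataContext(connectionString))   // hypothetical LINQ to SQL context
    {
        // Open the connection yourself so the DataContext reuses it rather than
        // opening a second one (which is what can promote to a distributed transaction).
        dc.Connection.Open();
        using (var tx = dc.Connection.BeginTransaction(IsolationLevel.Serializable))
        {
            dc.Transaction = tx;

            // ... read the current highest bid for itemId, validate 'amount',
            //     InsertOnSubmit the new bid ...
            dc.SubmitChanges();

            tx.Commit();   // or roll back if the bid was too low
        }
    }
}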

Getting last inserted ID

I'm currently using the method below to get the ID of the last inserted row.
Database.ExecuteNonQuery(query, parameters);
//if another client/connection inserts a record at this time,
//could the line below return the incorrect row ID?
int fileid = Convert.ToInt32(Database.Scalar("SELECT last_insert_rowid()"));
return fileid;
This method has been working fine so far, but I'm not sure it's completely reliable.
Suppose there are two client applications, each with its own separate database connection, that invoke the server-side method above at exactly the same time. Keeping in mind that the client threads run in parallel and that SQLite can only run one operation at a time (or so I've heard), would it be possible for one client instance to return the row ID of a record that was inserted by the other instance?
Lastly, is there a better way of getting the last inserted row ID?
If another client/connection inserts a record at this time, could the line below return the incorrect row ID?
No, since the write will either happen after the read, or before the read, but not during the read.
Keeping in mind that the client threads run in parallel and that SQLite can only run one operation at a time, would it be possible for one client to get the row ID of the record that was inserted by the other client?
Yes, of course.
It doesn't matter that the server-side methods are invoked at exactly the same time. The database's locking allows concurrent reads, though not concurrent writes, or reads while writing.
If you haven't already, have a read over SQLite's file-locking model.
If both the insert command and the get last inserted row id command are inside the same write lock and no other insert command in that write lock can run between those two commands, then you're safe.
If you start the write lock after the insert command, then there's no way to be sure that another thread didn't get a write lock first. If another thread did get a write lock first, then you won't be able to execute a search for the row id until after that other thread has released its lock. By then, it could be too late if the other thread inserted a new row.
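One way to keep the INSERT and the id lookup tied to the same connection is to run them in the same command. A minimal sketch with the System.Data.SQLite provider (the files table is hypothetical):

using System;
using System.Data.SQLite;

static long InsertFile(SQLiteConnection conn, string name)
{
    // Same connection, same command: nothing else on this connection can insert
    // between our INSERT and the last_insert_rowid() call.
    using (var cmd = new SQLiteCommand(
        "INSERT INTO files (name) VALUES (@name); SELECT last_insert_rowid();", conn))
    {
        cmd.Parameters.AddWithValue("@name", name);
        return Convert.ToInt64(cmd.ExecuteScalar());
    }
}

I believe System.Data.SQLite also exposes the same value through the connection's LastInsertRowId property, which avoids the extra SELECT.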
