How do you do versioning in NHibernate? - C#

I can't believe it is so hard to get someone to show me a simple working example. It makes me think that everyone can only talk as if they know how to do it, but in reality they don't.
I have shortened the post down to only what I want the example to do. Maybe the post was getting too long and scaring people away.
To get this bounty I am looking for a WORKING EXAMPLE that I can copy into VS 2010 and run.
What the example needs to do.
Show what data type the version property should be in my domain model for a timestamp in MSSQL 2008.
Show NHibernate automatically throwing the StaleObjectStateException.
Show me working examples of these 3 scenarios
Scenario 1
User A comes to the site and edits Row1. User B comes along (note he can see Row1) and clicks to edit Row1. User B should be denied from editing the row until User A is finished.
Scenario 2
User A comes to the site and edits Row1. User B comes 30 minutes later and clicks to edit Row1. User B should be able to edit this row and save. This is because User A took too long to edit the row and lost his right to edit.
Scenario 3
User A comes back from being away. He clicks the update-row button and should be greeted with a StaleObjectStateException.
I am using ASP.NET MVC and Fluent NHibernate, and I'm looking for the example to be done with these.
What I tried
I tried to build my own but I can't get it to throw the StaleObjectStateException, nor can I get the version number to increment. I tried opening 2 separate browsers and loading up the index page. Both browsers showed the same version number.
public class Default1Controller : Controller
{
    //
    // GET: /Default1/
    public ActionResult Index()
    {
        var sessionFactory = CreateSessionFactory();
        using (var session = sessionFactory.OpenSession())
        {
            using (var transaction = session.BeginTransaction())
            {
                var firstRecord = session.Query<TableA>().FirstOrDefault();
                transaction.Commit();
                return View(firstRecord);
            }
        }
    }

    public ActionResult Save()
    {
        var sessionFactory = CreateSessionFactory();
        using (var session = sessionFactory.OpenSession())
        {
            using (var transaction = session.BeginTransaction())
            {
                var firstRecord = session.Query<TableA>().FirstOrDefault();
                firstRecord.Name = "test2";
                transaction.Commit();
                return View();
            }
        }
    }

    private static ISessionFactory CreateSessionFactory()
    {
        return Fluently.Configure()
            .Database(MsSqlConfiguration.MsSql2008
                .ConnectionString(c => c.FromConnectionStringWithKey("Test")))
            .Mappings(m => m.FluentMappings.AddFromAssemblyOf<TableA>())
            // .ExposeConfiguration(BuildSchema)
            .BuildSessionFactory();
    }

    private static void BuildSchema(NHibernate.Cfg.Configuration config)
    {
        new NHibernate.Tool.hbm2ddl.SchemaExport(config).Create(false, true);
    }
}
public class TableA
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }

    // Not sure what data type this should be for timestamp.
    // To avoid changing too much I started with an int version,
    // but in the end I want a timestamp.
    public virtual int Version { get; set; }
}

public class TableAMapping : ClassMap<TableA>
{
    public TableAMapping()
    {
        Id(x => x.Id);
        Map(x => x.Name);
        Version(x => x.Version);
    }
}

Will NHibernate stop the row from being retrieved?
No. Locks are only held for the duration of a transaction, which in a web application ends when the request ends. Also, the default transaction isolation level is read committed, which means that read locks are released as soon as the select statement completes. If you are reading and making edits in the same request and transaction, you could place a read and write lock on the row at hand, which would prevent other transactions from writing to or reading from that row. However, this type of concurrency control doesn't work well in a web application.
Or would User B still be able to see the row, but if he tried to save it would crash?
This is what happens if optimistic concurrency is being used. In NHibernate, optimistic concurrency works by adding a version field. Save/update commands are issued with the version upon which the update was based. If that differs from the version in the database table, no rows are updated and NHibernate will throw a StaleObjectStateException.
What happens if User A, say, cancels and does not edit? Do I have to release the lock myself, or is there a timeout that can be set to release the lock?
No, the lock is released at the end of the request.
Overall, your best bet is to opt for optimistic concurrency with version fields managed by NHibernate.

How does it look in code? Do I set up my Fluent NHibernate mapping to generate a timestamp (not sure if I would use the TimeSpan data type)?
I would suggest using a version column. If you're using Fluent NHibernate with auto mappings, a property called Version of type int/long will be used for versioning by default; alternatively, you can use the Version() method in the mapping to do so (it's similar for timestamps).
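For the SQL Server timestamp (rowversion) variant the question asks about, here is a hedged sketch; it assumes your Fluent NHibernate version exposes CustomSqlType and Generated on the Version part, and maps the property as byte[]:

// Sketch: SQL Server rowversion as the NHibernate-managed version column.
// The property must be byte[] because rowversion is an 8-byte binary value.
public class TableA
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
    public virtual byte[] Version { get; set; }
}

public class TableAMapping : ClassMap<TableA>
{
    public TableAMapping()
    {
        Id(x => x.Id);
        Map(x => x.Name);
        // The database generates the value; NHibernate reads it back after
        // insert/update and includes it in the WHERE clause of every UPDATE.
        Version(x => x.Version)
            .CustomSqlType("rowversion")
            .Generated.Always();
    }
}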
So now I have somehow generated the timestamp and the user is editing a row (through a GUI). Should I be storing the timestamp in memory or something? Then when the user submits, recall the timestamp and id of the row from memory and check?
When the user starts editing a row, you retrieve it and store the current version (the value of the version property). I would recommend putting the current version in a hidden field in the form. When the user saves his changes, you can either do a manual check against the version in the database (check that it's the same as the version in the hidden field), or you can set the version property to the value from the hidden field (if you are using databinding, this could happen automatically). If you set the version property, then when you try to save the entity, NHibernate will check that the version you're saving matches the version in the database, and throw an exception if it doesn't.
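A hedged sketch of the manual-check flavour in an MVC action, using the int Version from the question's TableA (the view model and hidden field are assumptions, not from the question):

[HttpPost]
public ActionResult Save(TableAViewModel vm) // vm.Version came from @Html.HiddenFor(m => m.Version)
{
    using (var session = CreateSessionFactory().OpenSession())
    using (var tx = session.BeginTransaction())
    {
        var entity = session.Get<TableA>(vm.Id);

        // Manual check: the version the user edited against must still
        // match the current database version.
        if (entity.Version != vm.Version)
            throw new StaleObjectStateException(typeof(TableA).FullName, vm.Id);

        entity.Name = vm.Name;
        tx.Commit(); // flush bumps Version and uses it in the UPDATE's WHERE clause
        return RedirectToAction("Index");
    }
}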
NHibernate will issue an update query something like:
UPDATE xyz
SET Name = 'test2',
    Version = 16
WHERE Id = 1234 AND Version = 15
(assuming your version was 15); in the process it also increments the version field.
If so, that means the business logic is keeping track of the "row locking", but in theory someone could still just go Where(x => x.Id == id), grab that row, and update at will.
If someone else updates the row via NHibernate, it will increment the version automatically, so when your user tries to save it with the wrong version you will get an exception, which you need to decide how to handle (i.e. show some merge screen, or tell the user to try again with the new data).
What happens when the row gets updated? Do you set the timestamp to null?
It updates the version or timestamp automatically (a timestamp will be updated to the current time).
What happens if the user never actually finishes updating and leaves? How does the row ever become unlocked again?
The row is not locked per se; it is instead using optimistic concurrency, where you assume that no one will change the same row at the same time, and if someone does, then you need to retry the update.
Is there still a race condition, or is this next to impossible to happen? I am just concerned that 2 people try to edit the same row, both of them see it in their GUI for editing, but one of them is actually going to get denied in the end because they lost the race.
If 2 people try to edit the same row at the same time, one of them will lose if you're using optimistic concurrency. The benefit is that they will KNOW that there was a conflict, as opposed to either losing their changes and thinking that it updated, or overwriting someone else's changes without knowing about it.
So I did something like this:

var test = session.Query<TableA>().Where(x => x.Id == id).FirstOrDefault();
// send to user for editing. Has versioning on it.
// user edits and sends back the data 30 minutes later.

Code does:

test.Id = vm.Id;
test.ColumnA = vm.ColumnA;
test.Version = vm.Version;
session.Update(test);
transaction.Commit();

So the above will work, right?
The above will throw an exception if someone else has gone in and changed the row. That's the point of it, so you know that a concurrency issue has arisen. Typically you'd show the user a message saying "Someone else has changed this row" with the new row there and possibly their changes also so the user has to select which changes win.
but if I do this:

test.Id = vm.Id;
test.ColumnA = vm.ColumnA;
session.Update(test);
transaction.Commit();

it would not commit, right?
Correct, as long as you haven't reloaded test (i.e. you did test = new Xyz(), not test = session.Load()), because the timestamp on the row wouldn't match.
If someone else updates the row via NHibernate, it will increment the version automatically, so when your user tries to save it with the wrong version you will get an exception, which you need to decide how to handle (i.e. show some merge screen, or tell the user to try again with the new data).
Can I make it so this is checked when the record is grabbed? I want to keep it simple at first: only one person can edit at a time. The other guy won't even be able to access the record to edit while someone is editing it.
That's not optimistic concurrency. As a simple alternative, you could add a CheckOutDate property which you set when someone starts editing and set back to null when they finish. Then, when they start to edit, or when you show them the rows to edit, you could exclude all rows whose CheckOutDate is within, say, the last 10 minutes (then you wouldn't need a scheduled task to reset it periodically).
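A rough sketch of that check-out idea, assuming TableA gains a nullable CheckOutDate property (all names here are illustrative):

// Rows are editable when never checked out, or checked out more than
// 10 minutes ago; stale check-outs expire by themselves, so no sweeper
// task is needed.
public IList<TableA> GetEditableRows(ISession session)
{
    var cutoff = DateTime.UtcNow.AddMinutes(-10);
    return session.Query<TableA>()
        .Where(x => x.CheckOutDate == null || x.CheckOutDate < cutoff)
        .ToList();
}

// Stamp the row when a user opens it for editing; set CheckOutDate
// back to null when they save or cancel.
public void CheckOut(ISession session, Guid id)
{
    using (var tx = session.BeginTransaction())
    {
        var row = session.Get<TableA>(id);
        row.CheckOutDate = DateTime.UtcNow;
        tx.Commit();
    }
}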
The row is not locked per se; it is instead using optimistic concurrency, where you assume that no one will change the same row at the same time, and if someone does, then you need to retry the update.
I am not sure what you're saying. Does this mean I can do session.Query<TableA>().Where(x => x.Id == id).FirstOrDefault(); all day long and it will keep getting me the record (I thought it would keep incrementing the version)?
The query will NOT increment the version, only an update to it will increment the version.

I don't know that much about NHibernate itself, but if you are prepared to create some stored procs on the database it can, sort of, be done.
You will need one extra data column and two fields in your object model to store information against each row:
A 'hash' of all the field values other than the hash field itself and the EditTimestamp field (using SQL Server CHECKSUM on 2008 and later, or HASHBYTES for earlier editions). This could be persisted to the table using INSERT/UPDATE triggers if need be.
An 'edit-timestamp' of type datetime.
Change your procedures to do the following:
The 'select' procedure should include a where clause similar to 'edit-timestamp < (Now - 30 minutes)' and should update the 'edit-timestamp' to the current time. Run the select with appropriate locking BEFORE updating the row; I'm thinking a stored procedure with hold locking, such as the one below. Use a persistent date/time rather than something like GETDATE().
Example (using fixed values):
BEGIN TRAN

DECLARE @now DATETIME
SET @now = '2012-09-28 14:00:00'

SELECT *, @now AS NewEditTimestamp, CHECKSUM(ID, [Description]) AS RowChecksum
FROM TestLocks
WITH (HOLDLOCK, ROWLOCK)
WHERE ID = 3 AND EditTimestamp < DATEADD(mi, -30, @now)

/* Do all your stuff here while the record is locked */

UPDATE TestLocks
SET EditTimestamp = @now
WHERE ID = 3 AND EditTimestamp < DATEADD(mi, -30, @now)

COMMIT TRAN
If you get a row back from this procedure then you 'have' the 'lock', otherwise, no rows will be returned and there's nothing to edit.
The 'update' procedure should add a where clause similar to 'hash = previously returned hash'
Example (using fixed values):
BEGIN TRAN

DECLARE @RowChecksum INT
SET @RowChecksum = -845335138

UPDATE TestLocks
SET [Description] = 'New Description'
WHERE ID = 3 AND CHECKSUM(ID, [Description]) = @RowChecksum

SELECT @@ROWCOUNT AS RowsUpdated

COMMIT TRAN
So in your scenarios:
User A edits a row. When you return this record from the database, the 'edit-timestamp' has been updated to the current time and you have a row so you know you can edit. User B would not get a row because the timestamp is still too recent.
User B edits the row after 30 minutes. They get a row back because the timestamp has passed more than 30 minutes ago. The hash of the fields will be the same as for user A 30 minutes ago as no updates have been written.
Now user B updates. The previously retrieved hash still matches the hash of the fields in the row, so the update statement succeeds, and we return the row count to show that the row was updated. User A, however, tries to update next. Because the value of the description field has changed, the hash value has changed, so nothing is updated by the UPDATE statement. We get a result of 'zero rows updated', so we know that either the row has since been changed or the row was deleted.
There are probably some issues regarding scalability with all these locks going on, and the above code could be optimised (you might get problems with clocks going forward/back, for example; use UTC), but I wrote these examples just to explain how it could work.
Outside of that I can't see how you can do this without utilising database-level row locking within the select transaction. It might be that you can request those locks via NHibernate, but that's beyond my knowledge of NHibernate I'm afraid.
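For what it's worth, NHibernate does expose pessimistic lock modes that translate to FOR UPDATE / UPDLOCK hints. A minimal sketch (assuming the SQL Server dialect, where LockMode.Upgrade becomes WITH (UPDLOCK, ROWLOCK)); note the lock only lives until the transaction ends, which is the web-request problem discussed above:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // Issues SELECT ... WITH (UPDLOCK, ROWLOCK) on SQL Server; other
    // writers block until this transaction commits or rolls back.
    var row = session.Get<TableA>(id, LockMode.Upgrade);
    row.Name = "edited under a pessimistic lock";
    tx.Commit(); // lock released here
}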

Have you looked at the ISaveOrUpdateEventListener interface?
public class SaveListener : NHibernate.Event.ISaveOrUpdateEventListener
{
    public void OnSaveOrUpdate(NHibernate.Event.SaveOrUpdateEvent e)
    {
        NHibernate.Persister.Entity.IEntityPersister p = e.Session.GetEntityPersister(null, e.Entity);
        if (p.IsVersioned)
        {
            //TODO: check types etc...
            MyEntity m = (MyEntity) e.Entity;
            DateTime oldversion = (DateTime) p.GetVersion(m, e.Session.EntityMode);
            DateTime currversion = (DateTime) p.GetCurrentVersion(m.ID, e.Session);
            if (oldversion < currversion.AddMinutes(-30))
                throw new StaleObjectStateException("MyEntity", m.ID);
        }
    }
}
Then in your Configuration, register it.
private static void Configure(NHibernate.Cfg.Configuration cfg)
{
    cfg.EventListeners.SaveOrUpdateEventListeners =
        new NHibernate.Event.ISaveOrUpdateEventListener[] { new SaveListener() };
}

public static ISessionFactory CreateSessionFactory()
{
    return Fluently.Configure()
        .Database(...)
        .Mappings(...)
        .ExposeConfiguration(Configure)
        .BuildSessionFactory();
}
And version the Properties you want to version in your Mapping class.
public class MyEntityMap : ClassMap<MyEntity>
{
    public MyEntityMap()
    {
        Table("MyTable");
        Id(x => x.ID);
        Version(x => x.Timestamp);
        Map(x => x.PropA);
        Map(x => x.PropB);
    }
}

The short answer to your question is that you can't/shouldn't do this in a simple web application with NHibernate's optimistic (version) and pessimistic (row lock) locking. The fact that your transactions only last as long as a request is your limiting factor.
What you CAN do is create another table, an entity class, and mappings that manage these "locks". At the lowest level you need the Id of the object being edited, the Id of the user performing the editing, and a datetime of when the lock was acquired. I would make the Id of the object being edited the primary key, since you want the lock to be exclusive...
When a user clicks on a row to edit, you can try to acquire a lock (create a new record in that table with the ids and the current datetime). If the lock already exists for another user, then the insert will fail because you are trying to violate a primary key constraint.
If a lock is acquired, then when the user clicks save you need to check that they still have a valid "lock" before performing the actual save. Then perform the actual save and remove the lock record (see the sketch below).
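A minimal sketch of the acquire step, assuming a hypothetical RowLock entity mapped with the edited object's id as an assigned primary key (the exact exception type on a key violation depends on your driver and exception-converter configuration):

// Hypothetical lock record: the locked object's id is the primary key,
// so two users can never hold a lock on the same row at once.
public class RowLock
{
    public virtual Guid LockedObjectId { get; set; }
    public virtual Guid UserId { get; set; }
    public virtual DateTime AcquiredAt { get; set; }
}

public bool TryAcquireLock(ISession session, Guid objectId, Guid userId)
{
    try
    {
        using (var tx = session.BeginTransaction())
        {
            session.Save(new RowLock
            {
                LockedObjectId = objectId,
                UserId = userId,
                AcquiredAt = DateTime.UtcNow
            });
            tx.Commit(); // primary key violation surfaces here if a lock exists
            return true;
        }
    }
    catch (NHibernate.Exceptions.GenericADOException)
    {
        return false; // someone else holds the lock
    }
}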
I would also recommend a background service/process that sweeps these locks periodically and removes the ones that have expired or are older than your time limit.
This is my prescribed way of dealing with "locks" in a web environment. Good luck!

Yes, it is possible to lock a row with NHibernate, but if I understand well, your scenario is in a web context, and then it is not the best practice.
The best practice is to use optimistic locking with automatic versioning, as you mentioned.
Locking a row when a page opens and releasing it when the page unloads will quickly lead to rows staying locked (a JavaScript issue, the page not being closed properly...).
Optimistic locking will make NHibernate throw an exception when flushing a transaction which contains objects modified by another session.
If you want true concurrent modification of the same information, you may have to think about a system which merges many users' input into a single document, but that is a system of its own, not managed by the ORM.
You will have to choose a way to deal with sessions in a web environment.
http://nhibernate.info/doc/nh/en/index.html#transactions-optimistic
The only approach that is consistent with high concurrency and high scalability is optimistic concurrency control with versioning. NHibernate provides for three possible approaches to writing application code that uses optimistic concurrency.

Hey, you can try these sites:
http://thesenilecoder.blogspot.ca/2012/02/nhibernate-samples-row-versioning-with.html
http://stackingcode.com/blog/2010/12/09/optimistic-concurrency-and-nhibernate

Related

Will DbContextTransaction.BeginTransaction prevent this race condition

I have a method that needs to "claim" a payment number to ensure it is available at a later time. I cannot just get a new payment number when I'm ready to commit to the database, because the number is added to a signed token first; the payment number is later taken from the signed token when committing to the database, so that the token can be linked to the payment afterwards.
Payment numbers are sequential and the current method used in existing code is:
Create a Payment
Get the last payment number from the database
Increment the payment number
Use this payment number for the Payment
Update the database with the incremented payment number
In my service I am trying to prevent the following race-condition:
My service reads the payment number (eg. 100)
Another service uses and updates the payment number (now 101)
My service increments the number locally (to 101) and updates the database (still 101)
This would produce two payments with a payment number of 100.
Here is my implementation so far, in my Transaction class:
private DbSet<PaymentIdentifier> paymentIdentifier;
//...
private int ClaimNextPaymentNumber()
{
    int nextPaymentNumber = -1;
    using (var dbTransaction = db.Database.BeginTransaction())
    {
        int lastPaymentNumber = paymentIdentifier.ElementAt(0).Identifier;
        nextPaymentNumber = lastPaymentNumber + 1;
        paymentIdentifier.ElementAt(0).Identifier = nextPaymentNumber;
        db.SaveChanges();
        dbTransaction.Commit();
    }
    return nextPaymentNumber;
}
The PaymentIdentifier table has a single row and a single column "Identifier" (hence the .ElementAt(0)). I am unable to change the database structure as there is lots of legacy code relying on it that is very brittle.
Will having the code wrapped in a transaction (as I have done) protect against the race condition, or is there some Entity Framework / PostgreSQL idiosyncrasies I need to deal with to protect the identifier from being read whilst performing the transaction?
Thank you!
(As a side point, I believe lots of legacy code in the other software connecting to the database simply ignores the race condition and relies on it being "very fast")
It helps you with the race condition only if all code, including legacy, uses this method. If there is still code that continues using client-side incrementing without a transaction, you'll get the same problem. Just exchange 'My service' and 'Another service' in your description:
1. Another service reads the payment number (eg. 100) **without** transaction
2. My service uses and updates the payment number (now 101) **with** transaction
3. Another service increments the number locally (to 101) and updates the database (still 101) **without** transaction
Note that you can replace your code with a simpler one by executing this query without an explicit transaction:
update PaymentIdentifier set Identifier = Identifier + 1 returning Identifier;
But again, it will not solve your concurrency problem until you replace all places where the Identifier is incremented. If you can change that, you would do better to use a SEQUENCE or generators that will safely provide you with incremental ids.
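If you can funnel every caller through one method, a hedged sketch of issuing that single atomic statement from EF6 follows (identifier quoting and casing may need adjusting for your actual schema):

private int ClaimNextPaymentNumber()
{
    // One atomic statement: no window between the read and the write,
    // so two concurrent callers can never observe the same Identifier.
    return db.Database
        .SqlQuery<int>("UPDATE \"PaymentIdentifier\" SET \"Identifier\" = \"Identifier\" + 1 RETURNING \"Identifier\";")
        .Single();
}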
A transaction does not automatically lock your table. A transaction just ensures that multiple changes to the database are done altogether or not at all (see the A (atomic) in ACID). But the thing you want is that only one session can read, add one, and update the value, and only after that is done is the next session allowed to do the same thing.
So you now have different possibilities:
Use a sequence: you can get the next value, for example, like this: SELECT nextval('mysequencename'). If two sessions try to get a value at the same time, they will get two different values.
If you have more complex needs and want to store every "token" with additional data in a table, so that every token is a row with additional columns, you could use table locking. With this you can restrict access to the table so that only one session is allowed to access it at a time. But make sure that you hold locks for as short a time as possible, because this will become your performance bottleneck.
The database prevents the race condition by throwing a concurrency violation error in this case. So, I looked at how this is handled in the legacy code (following the suggestion by @sergey-l) and it uses a simple retry mechanism. So, I did the same:
private int ClaimNextPaymentNumber()
{
    bool failed;
    int paymentNumber = -1;
    do
    {
        failed = false;
        using (var dbTransaction = db.Database.BeginTransaction())
        {
            try
            {
                paymentNumber = TryToClaimNextPaymentNumber();
                // Only commit when the claim succeeded; disposing the
                // transaction without Commit rolls back a failed attempt.
                dbTransaction.Commit();
                concurrencyExceptionRetryCount = 0;
            }
            catch (DbUpdateConcurrencyException ex)
            {
                failed = true;
                ResetForClaimPaymentNumberRetry(ex);
            }
        }
    }
    while (failed);
    return paymentNumber;
}

Is it possible for data to change within a transaction with repeatable read isolation?

I have some .NET code wrapped up in a repeatable read transaction that looks like this:
using (var transaction = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead },
    TransactionScopeAsyncFlowOption.Enabled))
{
    int theNextValue = GetNextValueFromTheDatabase();
    var entity = new MyEntity
    {
        Id = Guid.NewGuid(),
        PropertyOne = theNextValue, // An identity column
        PropertyTwo = Convert.ToString(theNextValue),
        PropertyThree = theNextValue,
        ...
    };
    DbSet<MyEntity> myDbSet = GetEntitySet();
    myDbSet.Add(entity);
    await this.databaseContext.Entities.SaveChangesAsync();
    transaction.Complete();
}
The first method, GetNextValueFromTheDatabase, retrieves the max value stored in a column in a table in the database. I'm using repeatable read because I don't want two users to read and use the same value. Then, I simply create an Entity in memory and call SaveChangesAsync() to write the values to the database.
Sporadically, I see that the values of entity.PropertyOne, entity.PropertyTwo, and entity.PropertyThree do not match each other. For example, entity.PropertyOne has a value of 500, but entity.PropertyTwo and entity.PropertyThree have a value of 499. How is that possible? Even if the code weren't wrapped in a transaction, I would expect the values to match (just maybe duplicated across the Entities if two users ran at the same time).
I am using Entity Framework 6 and Sql Server 2008R2.
Edit:
Here is the code for GetNextValueFromTheDatabase
public async Task<int> GetNextValueFromTheDatabase()
{
    return await myQuerable
        .OrderByDescending(x => x.PropertyOne) // PropertyOne is an identity column (surprise!)
        .Select(x => x.PropertyOne)
        .Take(1)
        .SingleAsync() + 1;
}
So this question cannot be definitively answered because GetNextValueFromTheDatabase is not shown. I'm going off of what you said it does:
REPEATABLE READ in SQL Server S-locks rows that you have read. When you read the current maximum, presumably from an index, that row is S-locked. Now, if a new maximum appears that row is unaffected by the lock. That's why the lock does not prevent other, competing maximum values from appearing.
You need SERIALIZABLE isolation if you obtain the maximum by reading the largest values from a table. This will result in deadlocks in your specific case. That can be solved through locking hints or retries.
You could also keep a separate table that stores the current maximum value. REPEATABLE READ is enough here because you always access the same row of that table. You will be seeing deadlocks here as well even with REPEATABLE READ without locking hints.
Retries are a sound solution to deadlocks.
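A hedged sketch of such a retry loop around the question's transaction, reusing the question's names (SQL Server reports deadlock victims as error 1205; EF often wraps the SqlException, so the helper walks InnerException; PropertyOne, the identity column, is deliberately left unset):

MyEntity entity = null;
for (int attempt = 0; attempt < 5 && entity == null; attempt++)
{
    try
    {
        using (var transaction = new TransactionScope(
            TransactionScopeOption.Required,
            new TransactionOptions { IsolationLevel = IsolationLevel.Serializable },
            TransactionScopeAsyncFlowOption.Enabled))
        {
            int theNextValue = await GetNextValueFromTheDatabase(); // now runs under SERIALIZABLE
            entity = new MyEntity
            {
                Id = Guid.NewGuid(),
                PropertyTwo = Convert.ToString(theNextValue),
                PropertyThree = theNextValue,
            };
            GetEntitySet().Add(entity);
            await this.databaseContext.Entities.SaveChangesAsync();
            transaction.Complete();
        }
    }
    catch (Exception ex)
    {
        if (!IsDeadlock(ex)) throw;
        entity = null; // chosen as the deadlock victim: discard and retry
    }
}

private static bool IsDeadlock(Exception ex)
{
    for (; ex != null; ex = ex.InnerException)
    {
        var sql = ex as SqlException;
        if (sql != null && sql.Number == 1205)
            return true;
    }
    return false;
}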
I think that you are basically experiencing a phantom read.
Consider two transactions T1 and T2 that are scheduled for execution as shown below. The thing is that in T1's first read you do not get the value (X) that is inserted by transaction T2. The second time, you do get the value (X) in your select statement. This is the scary nature of repeatable read: it does not block insertion into the whole table when some rows are read from it; it only locks existing rows.
T1                                   T2
SELECT A.X FROM WeirdTable
                                     INSERT INTO WeirdTable (A) VALUES (X)
SELECT A.X FROM WeirdTable
UPDATE
It seems that this answer turned out to be irrelevant for this specific question. It is related to the repeatable read isolation level, matches the keywords of this question, and is not conceptually wrong, so I will leave it here.
I finally figured this out. As described in usr's response, multiple transactions can read the same max value at the same time (S-lock). The problem was that one of the columns is an identity column. EF allows you to specify an identity column's value when inserting, but ignores the value you specify. So the identity column seemed to update with the expected value most of the time, but in fact the value specified in the domain entity just happened to match what the database was generating internally.
So, for example, let's say the current max number is 499, and transactions A and B both read 499. When transaction A finishes, it successfully writes 500 to all three properties. Transaction B then attempts to write 500 to all three columns. The non-identity columns are updated successfully to 500, but the identity column's value is incremented to the next available value automatically (without throwing an error).
A few solutions
The solution I used is to not set the value for any of the columns when inserting the record. Once the record is inserted, I update the other two columns with the database-assigned identity column's value.
Another option would be to change the column's configuration to .HasDatabaseGeneratedOption(DatabaseGeneratedOption.None), which would perform better than the first option but would require the changes usr suggested to mitigate the lock issues.

How to totally lock a row in Entity Framework

I am working with a situation where we are dealing with money transactions.
For example, I have a table of users wallets, with their balance in that row.
UserId; Wallet Id; Balance
Now in our website and web services, every time a certain transaction happens, we need to:
check that there are enough funds available to perform that transaction;
deduct the costs of the transaction from the balance.
How and what is the correct way to go about locking that row / entity for the entire duration of my transaction?
From what I have read, there are some solutions where EF marks an entity and then compares that mark when it saves it back to the DB. However, what does it do when another user / program has already edited the amount?
Can I achieve this with EF? If not what other options do I have?
Would calling a stored procedure possibly allow for me to lock the row properly so that no one else can access that row in the SQL Server whilst program A has the lock on it?
EF doesn't have a built-in locking mechanism; you would probably need to use a raw query, like:

using (var scope = new TransactionScope(...))
{
    using (var context = new YourContext(...))
    {
        var wallet = context.ExecuteStoreQuery<UserWallet>(
            "SELECT UserId, WalletId, Balance FROM UserWallets WITH (UPDLOCK) WHERE ...");
        // your logic
        scope.Complete();
    }
}
You can set the isolation level on the transaction in Entity Framework to ensure no one else can change it:
YourDataContext.Database.BeginTransaction(IsolationLevel.RepeatableRead)
RepeatableRead
Summary:
Locks are placed on all data that is used in a query, preventing other users from updating the data. Prevents non-repeatable reads but phantom rows are still possible.
The whole point of a transactional database is that the consumer of the data determines how isolated their view of the data should be.
Irrespective of whether your transaction is serializable, someone else can perform a dirty read on the same data that you just changed but did not commit.
You should first concern yourself with the integrity of your view, and only then accept a degradation of the quality of that view to improve system performance where you are sure it is required.
Wrap everything in a TransactionScope with Serializable isolation level and you personally cannot really go wrong. Only drop the isolation level when you see it is genuinely required (i.e. when getting things wrong sometimes is OK).
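As a hedged sketch of that advice for the wallet case (the DbSet name UserWallets and the variables userId and transactionCost are assumptions):

using (var scope = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.Serializable }))
using (var context = new YourContext())
{
    var wallet = context.UserWallets.Single(w => w.UserId == userId);

    // Check and deduct inside one serializable transaction.
    if (wallet.Balance < transactionCost)
        throw new InvalidOperationException("Insufficient funds.");

    wallet.Balance -= transactionCost;
    context.SaveChanges();
    scope.Complete(); // competing serializable transactions block (or deadlock and must retry)
}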
Someone asks about this here: SQL Server: preventing dirty reads in a stored procedure

Concurrent reading and updating in a database table

I have an Oracle database that I access using Devart and Entity Framework.
There's a table called IMPORTJOBS with a column STATUS.
I also have multiple processes running at the same time. They each read the first row in IMPORTJOBS that has status 'REGISTERED', put it to status 'EXECUTING', and if done put it to status 'EXECUTED'.
Now because these processes are running in parallel, I believe the following could happen:
process A reads row 10 which has status REGISTERED,
process B also reads row 10 which has still status REGISTERED,
process A updates row 10 to status EXECUTING.
Process B should not be able to read row 10 as process A already read it and is going to update its status.
How should I solve this? Put read and update in a transaction? Or should I use some versioning approach or something else?
Thanks!
EDIT: thanks to the accepted answer I got it working and documented it here: http://ludwigstuyck.wordpress.com/2013/02/28/concurrent-reading-and-writing-in-an-oracle-database.
You should use the built-in locking mechanisms of the database. Don't reinvent the wheel, especially since RDBMS are designed to deal with concurrency and consistency.
In Oracle 11g, I suggest you use the SKIP LOCKED feature. For example, each process could call a function like this (assuming ids are numbers):
CREATE OR REPLACE TYPE tab_number IS TABLE OF NUMBER;

CREATE OR REPLACE FUNCTION reserve_jobs RETURN tab_number IS
   CURSOR c IS
      SELECT id FROM IMPORTJOBS WHERE STATUS = 'REGISTERED'
      FOR UPDATE SKIP LOCKED;
   l_result tab_number := tab_number();
   l_id     number;
BEGIN
   OPEN c;
   FOR i IN 1..10 LOOP
      FETCH c INTO l_id;
      EXIT WHEN c%NOTFOUND;
      l_result.extend;
      l_result(l_result.size) := l_id;
   END LOOP;
   CLOSE c;
   RETURN l_result;
END;
This will return 10 rows (if possible) that are not locked. These rows will be locked and the sessions will not block each other.
In 10g and before, since Oracle returns consistent results, use FOR UPDATE wisely and you should not have the problem that you describe. For instance, consider the following SELECT:
SELECT *
  FROM IMPORTJOBS
 WHERE STATUS = 'REGISTERED'
   AND rownum <= 10
   FOR UPDATE;
What would happen if all processes reserved their rows with this SELECT? How would that affect your scenario:
Session A gets 10 rows that are not processed.
Session B would get the same 10 rows, is blocked and waits for session A.
Session A updates the selected rows' statuses and commits its transaction.
Oracle will now (automatically) rerun Session B's select from the beginning since the data has been modified and we have specified FOR UPDATE (this clause forces Oracle to get the last version of the block).
This means that session B will get 10 new rows.
So in this scenario, you have no consistency problem. Also, assuming that the transaction to request a row and change its status is fast, the concurrency impact will be light.
Each process can issue a SELECT ... FOR UPDATE to lock the row when they read it. In this scenario, process A will read and lock the row, process B will attempt to read the row and block until process A releases the lock by committing (or rolling back) its transaction. Oracle will then determine whether the row still meets B's criteria and, in your example, won't return the row to B. This works but it means that your multi-threaded process may now be effectively single-threaded depending on how your transaction control needs to work.
Possible ways to improve scalability
A relatively common approach on the consumer to resolving this is to have a single coordinator thread that reads the data from the table, parcels out work to different threads, and updates the table appropriately (including knowing how to re-assign a job if the thread that was assigned it has died).
If you are using Oracle 11.1 or later, you can use the SKIP LOCKED clause on your FOR UPDATE so that each session gets back the first row that meets their criteria and is not locked (the clause existed in earlier versions but was not documented so it may not work correctly).
Rather than using a table for ImportJobs, you can use a queue with multiple consumers. This will allow Oracle to distribute messages to each process without you needing to build any additional locking (Oracle queues are doing it all behind the scenes).
Use versioning and optimistic concurrency.
The IMPORTJOBS table should have a timestamp column that you mark as ConcurrencyMode = Fixed in your model. Now when EF tries to do an update the timestamp column is incorporated in the update statement: WHERE timestamp = xxxxx.
For B, the timestamp changed in the meantime, so a concurrency exception is raised, which, in this case, you handle by skipping the update.
I'm from a SQL Server background and I don't know the Oracle equivalent of timestamp (or rowversion), but the idea is that it's a field that auto-updates when an update is made to the record.
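A hedged sketch of what that looks like from the EF side (entity and method names are illustrative; with ObjectContext the exception is OptimisticConcurrencyException rather than DbUpdateConcurrencyException):

// Claim a job optimistically; assumes the version/timestamp column is
// mapped with ConcurrencyMode.Fixed so EF adds it to the UPDATE's WHERE.
var job = context.ImportJobs.FirstOrDefault(j => j.Status == "REGISTERED");
if (job != null)
{
    try
    {
        job.Status = "EXECUTING";
        context.SaveChanges(); // UPDATE ... WHERE Id = :id AND Version = :original
        Process(job);          // hypothetical worker method
    }
    catch (DbUpdateConcurrencyException)
    {
        // Another process claimed this job between our read and update;
        // skip it and pick the next REGISTERED row.
    }
}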

SQL Server 2008 change tracking by C# client

I have 2 applications, one an ASP.NET site the user interfaces with, and then a second C# application that schedules jobs for execution. I'm having trouble with change tracking in the C# application, so that it can respond to updates to the database from the user. The basic idea is this:
The user adds a new row to the Products database via the ASP.NET website. The C# application should then be notified of the new row, and it will spawn a Quartz.NET job to do something with this newly created product. If the product details are updated, the C# application is notified and updates the Quartz.NET job appropriately.
Pseudocode:
while (true) // it's not really a loop like this, but this suffices
{
    var newProducts = from p in dc.Products
                      where p.Added_On > lastSweepTime // hypothetical: the time of the previous sweep
                      select p;
    foreach (Product product in newProducts)
        CreateNewJob(product);
}
Then, each job is responsible for tracking changes to its individual record:
void Execute(...)
{
    var product = (from p in dc.Products
                   where p.Id == this._product.Id
                   select p).FirstOrDefault();
    if (CompareProduct(this._product, product) == false)
        _product = product;
    // do work with the product
}
or with a timestamp:
void Execute(...)
{
    var updated = (from p in dc.Products
                   where p.Id == _product.Id && p.Updated_On > _lastChecked // hypothetical last-run marker
                   select p).FirstOrDefault();
    if (updated != null)
        _product = updated;
}
Since more often than not the products won't be changing, is there a better solution than to query for changes every time a job executes?
The requirement is that if the data has changed, the Quartz.NET job should not execute with stale data. At the beginning of Execute() it should be the most recent changes. It is OK if the state changes during execution though.
I've looked at SqlDependency, but everything I've read says it does not tell you what changed, only that something changed. It seems infeasible for a large database to constantly pull down the full table, and do the comparison myself.
Change tracking also seems that it might be inefficient. I'd have to pull down all the changes, find the Quartz.NET job corresponding to the changed item, and update it with the most recent data.
Probably the best way is to do this the old-fashioned way: use a SQL TIMESTAMP (rowversion) column to do the change tracking yourself. Then your scheduler only needs to grab rows whose TIMESTAMP is greater than the last-run TIMESTAMP to schedule. When a job runs, it can check the row to see if the TIMESTAMP has changed and grab the new data before running.
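A hedged sketch of that sweep with LINQ to SQL (the Products table is assumed to have a rowversion column called TimeStamp; the helper names are invented):

// One scheduler sweep: anything whose rowversion is newer than the
// high-water mark from the previous sweep gets (re)scheduled.
IEnumerable<Product> GetChangedProducts(DataContext dc, byte[] lastRunTimestamp)
{
    // rowversion compares with ordinary operators in T-SQL, which is
    // awkward to express in LINQ, so drop to raw SQL for the filter.
    return dc.ExecuteQuery<Product>(
        "SELECT * FROM Products WHERE [TimeStamp] > {0}", lastRunTimestamp);
}

foreach (Product product in GetChangedProducts(dc, lastRunTimestamp))
    CreateOrUpdateJob(product); // invented: create the job or refresh its data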
Now, if you've got multiple tables to worry about, this gets to be a bit more complex.
