How to totally lock a row in Entity Framework - c#

I am working with a situation where we are dealing with money transactions.
For example, I have a table of users wallets, with their balance in that row.
UserId; Wallet Id; Balance
Now in our website and web services, every time a certain transaction happens, we need to:
check that there are enough funds available to perform that transaction;
deduct the costs of the transaction from the balance.
How and what is the correct way to go about locking that row / entity for the entire duration of my transaction?
From what I have read, there are some solutions where EF marks an entity and then compares that mark when it saves it back to the DB. However, what does it do when another user / program has already edited the amount?
Can I achieve this with EF? If not what other options do I have?
Would calling a stored procedure possibly allow for me to lock the row properly so that no one else can access that row in the SQL Server whilst program A has the lock on it?

EF doesn't have a built-in pessimistic locking mechanism; you would probably need to use a raw query, like:
using (var scope = new TransactionScope(...))
{
    using (var context = new YourContext(...))
    {
        // UPDLOCK holds an update lock on the row until the transaction completes
        var wallet = context.ExecuteStoreQuery<UserWallet>(
            "SELECT UserId, WalletId, Balance FROM UserWallets WITH (UPDLOCK) WHERE ...");

        // your logic

        scope.Complete();
    }
}

You can set the isolation level on the transaction in Entity Framework to ensure no one else can change it:
YourDataContext.Database.BeginTransaction(IsolationLevel.RepeatableRead)
RepeatableRead
Summary:
Locks are placed on all data that is used in a query, preventing other users from updating the data. Prevents non-repeatable reads but phantom rows are still possible.
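To make that concrete, here is a minimal sketch of the read-check-deduct pattern under that isolation level. It assumes an EF6 DbContext named YourDataContext with a Wallets set, and userId/cost variables in scope; all of those names are placeholders, not from the question.

using (var context = new YourDataContext())
using (var tx = context.Database.BeginTransaction(System.Data.IsolationLevel.RepeatableRead))
{
    // the shared lock taken by this read is held until Commit; note that two
    // concurrent callers can both read the row and then deadlock on the update below,
    // which an UPDLOCK hint or Serializable isolation would avoid
    var wallet = context.Wallets.Single(w => w.UserId == userId);

    if (wallet.Balance < cost)
        throw new InvalidOperationException("Insufficient funds.");

    wallet.Balance -= cost;   // deduct the cost of the transaction
    context.SaveChanges();

    tx.Commit();
}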

The whole point of a transactional database is that the consumer of the data determines how isolated their view of the data should be.
Irrespective of whether your transaction is serializable, someone else can perform a dirty read on the same data that you just changed but did not commit.
You should first concern yourself with the integrity of your view, and only accept a degradation of the quality of that view to improve system performance where you are sure it is required.
Wrap everything in a TransactionScope with Serializable isolation level and you personally cannot really go wrong. Only drop the isolation level when you see it is genuinely required (i.e. when getting things wrong sometimes is OK).
Someone asks about this here: SQL Server: preventing dirty reads in a stored procedure
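For completeness, a TransactionScope at Serializable isolation can be spelled out like this. This is only a sketch; the context name and the body of the scope are placeholders.

// requires a reference to System.Transactions
var options = new TransactionOptions
{
    IsolationLevel = System.Transactions.IsolationLevel.Serializable,
    Timeout = TimeSpan.FromSeconds(30)
};

using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
using (var context = new YourContext())
{
    // read, validate, and update inside the serializable transaction
    scope.Complete();
}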

Related

What is the proper way to lock rows in SQL while a user works with the data?

I am currently working on some kind of ERP like WPF application with SQL Server as the database.
Up to now, I have had to work only with small tasks that do not need row locking on the server side. So the basic pattern was "create SqlConnection -> select data into the DataTable -> close connection".
Now I would like to create the functionality to work on orders.
How could I lock the records that have been selected until the user finishes the work, so that no other user can read those rows?
I think I should use transactions, but I am not sure how to keep the transaction alive until the next statement, because I am closing the connection after each command.
Locking data like that is a bad practice. Transactions are intended to ensure that your data is saved completely or not at all; they are not intended to lock the data for the reason specified in your question.
It sounds like a lot of data may be entered, so you don't want a user to spend time entering data only to be met with an error because someone else changed it. You could have a locked_by column that you set when a user is editing the data, and simply not allow anyone else to edit the data if that column is not NULL. You could still allow reads of the data, or exclude locked data from view with queries, depending on your need.
You may also want to include a locked_time column so you know when it was locked. You could then clear the lock if it's stale, or at least query how long it's been locked allowing for an admin user to look for lengthy locks so they can contact that user or clear the lock.
The query could look like this:
UPDATE Table SET locked_by = @lockedByUser, locked_time = @lockedTime
WHERE Id = @fetchId AND locked_by IS NULL

SELECT * FROM Table WHERE locked_by = @lockedByUser
If no data is returned, the lock failed or the id doesn't exist. Either way, the data isn't available. You could also check the updated record count to verify whether the lock was successful.
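As a rough illustration of claiming the lock from C# and checking whether it succeeded (the Orders table name, connection string, and variables here are assumptions, not from the question):

// using System.Data.SqlClient;
const string claimSql =
    "UPDATE Orders SET locked_by = @lockedByUser, locked_time = @lockedTime " +
    "WHERE Id = @fetchId AND locked_by IS NULL";

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(claimSql, conn))
{
    cmd.Parameters.AddWithValue("@lockedByUser", currentUser);
    cmd.Parameters.AddWithValue("@lockedTime", DateTime.UtcNow);
    cmd.Parameters.AddWithValue("@fetchId", recordId);

    conn.Open();
    bool lockAcquired = cmd.ExecuteNonQuery() == 1;   // 0 rows => someone else already holds the lock
}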
Don't close the connection.
Open a transaction.
On the SELECT, use an UPDLOCK hint so the record(s) stay locked.
Perform the updates.
Commit or roll back the transaction.
Put some type of timer on it so the lock is not held indefinitely.
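A rough sketch of this keep-the-connection-open approach in plain ADO.NET; the Orders table, its columns, and the connection string are assumptions used only for illustration.

// using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        // UPDLOCK + HOLDLOCK keeps the selected row locked for the life of the transaction
        var select = new SqlCommand(
            "SELECT Id, Status FROM Orders WITH (UPDLOCK, HOLDLOCK) WHERE Id = @id",
            conn, tx);
        select.Parameters.AddWithValue("@id", orderId);
        using (var reader = select.ExecuteReader())
        {
            // read the row the user will work on
        }

        var update = new SqlCommand(
            "UPDATE Orders SET Status = @status WHERE Id = @id", conn, tx);
        update.Parameters.AddWithValue("@status", newStatus);
        update.Parameters.AddWithValue("@id", orderId);
        update.ExecuteNonQuery();

        tx.Commit();   // or tx.Rollback() on failure / timeout
    }
}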
One way to handle concurrency via application is implement some kind of "LastServerUpdateDateTime" column on the table you are working on.
When User A pulls the data for a row, the ViewModel keeps that LastServerUpdateDateTime value. User A does their updates and then tries to save back to the DB. If the LastServerUpdateDateTime value is the same, there were no updates while they were working and the save goes through (and LastServerUpdateDateTime is updated as well). If at any point while User A is working on a set of data, User B comes in, makes their changes, and saves, then when User A eventually saves, the LastServerUpdateDateTime will be different from what they initially pulled down and the save is rejected. Yes, User A then has to redo their changes, but that shouldn't happen often (depending on your application, of course), and you don't have to deal with direct DB locking or anything like that.
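If you happen to use Entity Framework for data access, the same idea can be expressed with a concurrency token so that EF does the timestamp comparison for you. A minimal sketch, with illustrative entity, property, and variable names:

// using System.ComponentModel.DataAnnotations;
// using System.Data.Entity.Infrastructure;
public class Order
{
    public int Id { get; set; }
    public string Status { get; set; }

    [ConcurrencyCheck]
    public DateTime LastServerUpdateDateTime { get; set; }
}

// later, when User A saves:
try
{
    order.Status = newStatus;
    order.LastServerUpdateDateTime = DateTime.UtcNow;
    context.SaveChanges();   // EF adds "WHERE LastServerUpdateDateTime = <original value>" to the UPDATE
}
catch (DbUpdateConcurrencyException)
{
    // someone else saved first; reload the row and ask the user to redo their changes
}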
I will describe the mechanism that I have used with success in the past.
1) Create a document ID table. In this table, each record represents a document type and an ID which can be incremented whenever a new document is created. The importance of this table is really as a root lock; the document ID is not strictly needed.
2) Create a lock table. In this table, each record represents a lock which includes a reference to a document record, a reference to the lock owner, and some additional data such as when the lock was created, when it was last acted upon, its status, or anything else you find useful. Each record means "user A holds a lock on document type X, document ID Y".
3) When locking a document (fetch + lock), lock (SELECT/UPDATE) the relevant record in the document ID table. Then, check the lock table for an existing lock record, and INSERT a new one as appropriate. At this point you may choose to over-write an existing lock, or return an error to the user.
4) When updating a document, again lock (SELECT/UPDATE) the relevant record in the document ID table. Then verify the user holds a lock, and if so do the actual update, and then DELETE the lock record. If the user does not hold a lock, you may choose to allow the update if no other user holds a lock, or return an error.
With this mechanism, a user goes through an open/lock operation and then either a save/unlock or a discard/unlock operation. Additionally, locks can be removed by a cron job or by an administrator, in case users fail to update or discard (which they will).
This approach avoids holding record locks and transactions open for long periods of time, which can cause concurrency issues. It also allows locks to survive software crashes. It also allows all kinds of flexibility; for example, my implementation allowed a lock to be "demoted" after some period of time, and once a lock was demoted, it could be over-written by an ordinary user, while still allowing the owner to perform an update as long as the lock remained.
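A compressed sketch of step 3 (fetch + lock) in ADO.NET; the DocumentIds and DocumentLocks tables, their columns, and the variables are assumptions used only to illustrate the mechanism described above.

// using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        // lock the root record for this document so concurrent lock checks are serialized
        var lockRoot = new SqlCommand(
            "SELECT DocId FROM DocumentIds WITH (UPDLOCK) WHERE DocType = @type AND DocId = @id",
            conn, tx);
        lockRoot.Parameters.AddWithValue("@type", docType);
        lockRoot.Parameters.AddWithValue("@id", docId);
        lockRoot.ExecuteScalar();

        // insert a lock record only if no one else holds one
        var insertLock = new SqlCommand(
            @"INSERT INTO DocumentLocks (DocType, DocId, OwnerId, CreatedUtc)
              SELECT @type, @id, @owner, SYSUTCDATETIME()
              WHERE NOT EXISTS (SELECT 1 FROM DocumentLocks WHERE DocType = @type AND DocId = @id)",
            conn, tx);
        insertLock.Parameters.AddWithValue("@type", docType);
        insertLock.Parameters.AddWithValue("@id", docId);
        insertLock.Parameters.AddWithValue("@owner", userId);
        bool gotLock = insertLock.ExecuteNonQuery() == 1;

        tx.Commit();   // the transaction is short-lived; the "lock" now lives in the table
    }
}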

Entity Framework insertion performance

I'm testing Entity Framework against an Azure SQL database.
When inserting 1 record, the action takes 400 ms. When inserting 20, it takes 2500 ms.
400 ms for inserting 1 record via EF seems like a lot.
What is the normal performance rate for EF?
Am I doing something wrong?
I'm aware that bulk insertion can be improved, but I thought that a single insert could be done a lot faster!?
var start = DateTime.Now;
testdbEntities testdbEntities = new testdbEntities();
for (int i = 0; i < 20; i++)
testdbEntities.Users.Add(new User{Name = "New user"});
testdbEntities.SaveChanges();
var end = DateTime.Now;
var timeElapsed = (end - start).TotalMilliseconds;
All common tricks like:
AutoDetectChangesEnabled = false
Use AddRange over Add
Etc.
Will not work, as you have already noticed, since the performance problem is not within Entity Framework but with SQL Azure.
SQL Azure may look pretty cool at first, but it's slow as hell unless you pay for a very good Premium database tier.
As Evk recommended, you should try to execute a simple SQL command like "SELECT 1"; you will notice it probably takes more than 100 ms, which is ridiculously slow.
Solution:
Move to a better SQL Azure Tier
Move away from SQL Azure
Disclaimer: I'm the owner of the project Entity Framework Extensions
Another solution is using this library, which will batch multiple queries/bulk operations. However, again, even though this library is very fast, you will need a better SQL Azure tier, since it looks like every database round-trip takes more than 200 ms in your case.
Each insert results in a commit and causes a log harden (flush to disk). When writing in batches, this does not result in one flush per insert (only when the log buffers fill up). So try to batch the inserts somehow, for example using table-valued parameters.
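For example, a table-valued parameter lets you send all the rows in one round-trip and one commit. A sketch, assuming a user-defined table type such as CREATE TYPE dbo.UserNameList AS TABLE (Name nvarchar(200)); the type, table, and column names are illustrative, not from the question.

// using System.Data; using System.Data.SqlClient;
var names = new DataTable();
names.Columns.Add("Name", typeof(string));
for (int i = 0; i < 20; i++)
    names.Rows.Add("New user");

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("INSERT INTO Users (Name) SELECT Name FROM @names", conn))
{
    var p = cmd.Parameters.AddWithValue("@names", names);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.UserNameList";

    conn.Open();
    cmd.ExecuteNonQuery();   // one round-trip and one commit for the whole batch
}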
You can disable the auto detect changes during your insert. It can really improve performance. https://msdn.microsoft.com/en-us/data/jj556205.aspx
I hope it helps :)
Most EF applications make use of persistence-ignorant POCO entities and snapshot change tracking. This means that there is no code in the entities themselves to keep track of changes or notify the context of changes.
When using most POCO entities the determination of how an entity has changed (and therefore which updates need to be sent to the database) is handled by the Detect Changes algorithm. Detect Changes works by detecting the differences between the current property values of the entity and the original property values that are stored in a snapshot when the entity was queried or attached.
Snapshot change detection takes a copy of every entity in the system when it is added to the Entity Framework tracking graph. Then, as entities change, each entity is compared to its snapshot to see whether anything changed. This occurs by calling the DetectChanges method. What's important to know about DetectChanges is that it has to go through all of your tracked entities each time it's called, so the more stuff you have in your context, the longer it takes to traverse.
What Auto Detect Changes does is plug into events that happen on the context and call DetectChanges as they occur.
Whenever you add a new User object, EF internally tracks it and keeps the current state of the newly added object in its snapshot.
When you add many records one by one, EF calls DetectChanges repeatedly against an ever-growing context, so the execution time is roughly (time required to insert all records + time spent re-scanning the EF context).
You can make your DB insertion relatively faster by disabling AutoDetectChanges. So your code will look like,
using (var context = new YourContext())
{
    try
    {
        context.Configuration.AutoDetectChangesEnabled = false;
        // do your DB operations
    }
    finally
    {
        context.Configuration.AutoDetectChangesEnabled = true;
    }
}

Entity Framework Concurrency insertion issue

We are working on an MVC application. In this application we have a payment module. Once a user starts a recurring subscription, the application gets two "payment complete" responses from PayPal with the same TransactionId.
One comes through the "Success URL" and the other through the IPN listener.
We are using a "Transaction" table to keep the PayPal transaction details.
When a response arrives from PayPal, the application checks whether the "TransactionId" already exists in the database. So the net result should be that only the first response from PayPal inserts into the "Transaction" table.
Recently we have been having issues related to Entity Framework concurrency. If the two responses come in parallel, both records are inserted into the transaction table with the same "transaction id", even though we have code that checks for the existence of the transaction id.
How do we prevent this duplicate insertion?
Both insertions are happening from different contexts.
var ipnDetail = unitOfWork.TransactionDetailRepository.GetTransaction(transactionNumber);
if (ipnDetail == null)
{
}
We are using the same code for both insertions. The only difference is that we are calling it from different EF contexts.
Note also that the first inserted entry has a later time than the second inserted record; we are setting the date from code.
How do we solve this concurrency issue?
We tried to use "ObjectContext.Refresh" as a solution, but it did not help.
((IObjectContextAdapter)context).ObjectContext.Refresh(System.Data.Objects.RefreshMode.StoreWins, ((IObjectContextAdapter)context).ObjectContext.ObjectStateManager.GetObjectStateEntries(EntityState.Added));
Any help would be appreciated. Please note that the application is in a production environment.
Best Regards,
Ranish
If you have SQL Server 2008 or greater, the MERGE command is exactly what you need. It allows you to put the if condition at the right place in the operation.
The example below inserts your new transactionId if it does not exist in the database. Most alternatives involve a query followed by an insert, leaving a window in which another connection can sneak in an insert before yours completes.
You can find resources on the internet about calling a stored procedure from Entity Framework.
CREATE proc [dbo].[usp_InsertNewTransactionId](@transactionDate datetime2, @transactionId varchar(255))
as
begin
    ;with data as (select @transactionDate as transactionDate, @transactionId as transactionId)
    merge transactions t
    using data s
    on s.transactionId = t.transactionId
    when not matched by target
        then insert ([date], transactionId) values (s.transactionDate, s.transactionId);
end
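For example, the procedure could be invoked from an EF6 DbContext roughly like this; the variable names are placeholders.

// using System.Data.SqlClient;
context.Database.ExecuteSqlCommand(
    "EXEC dbo.usp_InsertNewTransactionId @transactionDate, @transactionId",
    new SqlParameter("@transactionDate", DateTime.UtcNow),
    new SqlParameter("@transactionId", transactionId));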
Wrap the entire checking and inserting logic inside TransactionScope.
using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew))
{
    // read & write logic here
    scope.Complete();
}
RequiresNew causes a new transaction to be used, and the check is then blocked at the DB level, so the other request has to wait until the first one has completed its transaction, whether or not it added the Id.

How do I lock a table from write operations until my insert is complete with Linq-to-sql

I am working on an auction system and one of the issues I am trying to make sure I don't get affected by is a situation where 2 people put in a bid at the exact same time for the same item.
To do this I need to put a lock on the table, get the highest bid for the current item, make sure the entered bid is greater than that bid, add a new bid entry into the table, then unlock the table.
I need to lock this so a second webserver does not trigger a bid insert between when I check for the highest bid and when I insert my new bid into the table, as this would cause data issues.
How do I accomplish this with Linq-to-sql?
Note: I don't know whether TransactionScopes can do this, but I can't use them anyway, as they tend to escalate to a distributed transaction due to our web farm setup, and I can't use distributed transactions.
There seem to be a couple of obstacles to implementing a solution in pure LINQ:
You should definitely avoid a table lock.
A table lock would make it impossible for several items to be bid on during the processing of one single bid, thus severely harming performance.
LINQ to SQL does not seem to support pessimistic locking, as stated in other answers on SO.
If you cannot have transactions in your code, I suggest the following procedure:
generate a GUID for your operation
pseudo-lock the item's record using the guid:
UPDATE Items SET LockingGuid = @guid
WHERE ItemId = @ItemId AND LockingGuid IS NULL

SELECT @recordsaffected = @@ROWCOUNT

the lock succeeded if @@ROWCOUNT == 1
perform your bidding operation
UPDATE the record back to LockingGuid = NULL
if the lock fails, either raise the failure to the .Net client, or busy-wait using WAITFOR.
You should implement proper exception handling so that item records do not get locked indefinitely by a dying or failing process, probably by adding a datetime column storing the timestamp the lock occurred, and cleaning up orphaned locks.
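In LINQ to SQL, the pseudo-lock above can be issued with ExecuteCommand, which returns the number of rows affected. A sketch; the DataContext, the Items columns, and the variables are assumptions.

// using System.Data.Linq;
var lockingGuid = Guid.NewGuid();

int rows = db.ExecuteCommand(
    "UPDATE Items SET LockingGuid = {0}, LockedAtUtc = {1} WHERE ItemId = {2} AND LockingGuid IS NULL",
    lockingGuid, DateTime.UtcNow, itemId);

if (rows == 1)
{
    // lock acquired: read the highest bid, validate, insert the new bid...

    // ...then release the lock
    db.ExecuteCommand(
        "UPDATE Items SET LockingGuid = NULL, LockedAtUtc = NULL WHERE ItemId = {0} AND LockingGuid = {1}",
        itemId, lockingGuid);
}
else
{
    // another server holds the lock: retry after a short delay or report failure
}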
If your architecture allows for separate backend operation, you might want to have a look at CQRS and Event Sourcing for processing such bidding operations.
You could use a separate table to store information when this processing occurs. For example, your second table could be something like:
Table name:
ItemProcessing
Columns:
ItemId (int)
ProcessingToken (guid)
When a process wants to check on a current bid, it writes the ID of the item and a token/guid to the ItemProcessing table. That tells other processes that this item is currently being inspected. If there is already a row in the ItemProcessing table for this item, the other process must wait or abort. When the original process is done, it removes the token (sets it to null), or removes the row from ItemProcessing altogether. Then other processes know they can process that item.
Of course, you'll need a way to make sure both processes don't write to this processing table at the same time. You could accomplish that by inserting into this table only where ProcessingToken is NULL. If another process just beat this one to it, the second process won't be able to insert because the ProcessingToken will already exist.
While not a full, detailed solution, that's the basic idea.
You can manually begin a transaction and pass that transaction to the DataContext.
http://geekswithblogs.net/robp/archive/2009/04/02/your-own-transactions-with-linq-to-sql.aspx
I think it is necessary as well to manually control the opening and closing of the Connection to avoid an unwanted escalation to a distributed transaction. It seems that the DataContext will actually get in its own way and try to open two connections sometimes, thus causing a promotion to a distributed transaction.
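A rough sketch of owning the connection and transaction yourself so the DataContext cannot open a second connection; the AuctionDataContext name and the bid logic are placeholders.

// using System.Data.SqlClient; using System.Data.Linq;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    using (var db = new AuctionDataContext(conn))
    {
        db.Transaction = tx;   // make the DataContext enlist in our local transaction

        // read the current high bid, validate, InsertOnSubmit the new bid...
        db.SubmitChanges();

        tx.Commit();
    }
}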

Mid-Tier Help Needed

In one sentence, what I ultimately need to know is how to share objects between mid-tier functions without requiring the application tier to pass the data model objects.
I'm working on building a mid-tier layer in our current environment for the company I am working for. Currently we are using primarily .NET for programming and have built custom data models around all of our various database systems (ranging from Oracle, OpenLDAP, MSSQL, and others).
I'm running into issues trying to pull our model from the application tier and move it into a series of mid-tier libraries. The main issue I'm running into is that the application tier has the ability to hang on to a cached object throughout the duration of a process and make updates based on the cached data, but the Mid-Tier operations do not.
I'm trying to keep the model objects out of the application as much as possible so that when we make a change to the underlying database structure, we can edit and redeploy the mid-tier easily and multiple applications will not need to be rebuilt. I'll give a brief illustration of the issue in pseudo-code, since that is what we developers understand best :)
main
{
    MidTierServices.UpdateCustomerName("testaccount", "John", "Smith");

    // since the data takes up to 4 seconds to be replicated from
    // write server to read server, the function below is going to
    // grab old data that does not contain the first name and last
    // name update.... John Smith will be overwritten w/ previous
    // data
    MidTierServices.UpdateCustomerPassword("testaccount", "jfjfjkeijfej");
}

MidTierServices
{
    void UpdateCustomerName(string username, string first, string last)
    {
        Customer custObj = DataRepository.GetCustomer(username);

        /*******************
        validation checks and business logic go here...
        *******************/

        custObj.FirstName = first;
        custObj.LastName = last;
        DataRepository.Update(custObj);
    }

    void UpdateCustomerPassword(string username, string password)
    {
        // does not contain first and last updates
        Customer custObj = DataRepository.GetCustomer(username);

        /*******************
        validation checks and business logic go here...
        *******************/

        custObj.Password = password;

        // overwrites changes made by other functions since data is stale
        DataRepository.Update(custObj);
    }
}
On a side note, the options I've considered are: building a home-grown caching layer, which takes a lot of time and is a very difficult concept to sell to management; or using a different modeling layer that has built-in caching support, such as NHibernate. The latter would also be hard to sell to management, because it would take a very long time to tear apart our entire custom model and replace it with a third-party solution. Additionally, not a lot of vendors support our large array of databases. For example, .NET has LINQ to ActiveDirectory, but not a LINQ to OpenLDAP.
Anyway, sorry for the novel, but it's more of an enterprise architecture type question, and not a simple code question such as "How do I get the current date and time in .NET?"
Edit
Sorry, I forgot to add some very important information in my original post. I feel very bad because Cheeso went through a lot of trouble to write a very in depth response which would have fixed my issue were there not more to the problem (which I stupidly did not include).
The main reason I'm facing the current issue is in concern to data replication. The first function makes a write to one server and then the next function makes a read from another server which has not received the replicated data yet. So essentially, my code is faster than the data replication process.
I could resolve this by always reading and writing to the same LDAP server, but my admins would probably murder me for that. They specifically set up a server that is only used for writing, and then 4 other servers, behind a load balancer, that are only used for reading. I'm in no way an LDAP administrator, so I'm not aware whether that is standard procedure.
You are describing a very common problem.
The normal approach to address it is through the use of Optimistic Concurrency Control.
If that sounds like gobbledegook, it's not. It's a pretty simple idea. The concurrency part of the term refers to the fact that there are updates happening to the data-of-record, and those updates are happening concurrently. Possibly many writers. (Your situation is a degenerate case where a single writer is the source of the problem, but it's the same basic idea.) The optimistic part I'll get to in a minute.
The Problem
It's possible when there are multiple writers that the read+write portions of two updates become interleaved. Suppose you have A and B, both of whom read and then update the same row in a database. A reads the database, then B reads the database, then B updates it, then A updates it. With a naive approach, the "last write" wins, and B's writes may be destroyed.
Enter optimistic concurrency. The basic idea is to presume that the update will work, but check. Sort of like the "trust but verify" approach to arms control from a few years back. The way to do this is to include a field in the database table, which must also be included in the domain object, that provides a way to distinguish one "version" of the db row or domain object from another. The simplest is to use a timestamp field, named LastUpdated, which holds the time of last update. There are other more complex ways to do the consistency check, but a timestamp field is good for illustration purposes.
Then, when the writer or updater wants to update the DB, it can only update the row for which the key matches (whatever your key is) and also when the lastUpdate matches. This is the verify part.
Since developers understand code, I'll provide some pseudo-SQL. Suppose you have a blog database, with an index, a headline, and some text for each blog entry. You might retrieve the data for a set of rows (or objects) like this:
SELECT ix, Created, LastUpdated, Headline, Dept FROM blogposts
WHERE CONVERT(Char(10), Created, 102) = @targdate
This sort of query might retrieve all the blog posts in the database for a given day, or month, or whatever.
With simple optimistic concurrency, you would update a single row using SQL like this:
UPDATE blogposts SET Headline = @NewHeadline, LastUpdated = @NewLastUpdated
WHERE ix = @ix AND LastUpdated = @PriorLastUpdated
The update can only happen if the index matches (and we presume that's the primary key), and the LastUpdated field is the same as what it was when the data was read. Also note that you must ensure that the LastUpdated field is updated on every update to the row.
A more rigorous update might insist that none of the columns had been updated. In this case there's no timestamp at all. Something like this:
UPDATE Table1 SET Col1 = @NewCol1Value,
       Col2 = @NewCol2Value,
       Col3 = @NewCol3Value
WHERE Col1 = @OldCol1Value AND
      Col2 = @OldCol2Value AND
      Col3 = @OldCol3Value
Why is it called "optimistic"?
OCC is used as an alternative to holding database locks, which is a heavy-handed approach to keeping data consistent. A DB lock might prevent anyone from reading or updating the db row, while it is held. This obviously has huge performance implications. So OCC relaxes that, and acts "optimistically", by presuming that when it comes time to update, the data in the table will not have been updated in the meantime. But of course it's not blind optimism - you have to check right before update.
Using Optimistic Concurrency in practice
You said you use .NET. I don't know if you use DataSets for your data access, strongly typed or otherwise. But .NET DataSets, or specifically DataAdapters, include built-in support for OCC. You can specify and hand-code the UpdateCommand for any DataAdapter, and that is where you can insert the consistency checks. This is also possible within the Visual Studio design experience.
If you get a violation, the update will return a result showing that ZERO rows were updated. You can check this in the DataAdapter.RowUpdated event. (Be aware that in the ADO.NET model, there's a different DataAdapter for each sort of database. The link there is for SqlDataAdapter, which works with SQL Server, but you'll need a different DA for different data sources.)
In the RowUpdated event, you can check for the number of rows that have been affected, and then take some action if the count is zero.
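For illustration, the zero-rows check in the RowUpdated event might look like this; the adapter is assumed to already have an UpdateCommand carrying the LastUpdated predicate shown earlier.

// using System.Data; using System.Data.SqlClient;
adapter.RowUpdated += (sender, e) =>
{
    if (e.StatementType == StatementType.Update && e.RecordsAffected == 0)
    {
        // zero rows updated: the row changed since it was read
        e.Row.RowError = "Concurrency violation: the record was modified by another user.";
        e.Status = UpdateStatus.SkipCurrentRow;   // keep processing the remaining rows
    }
};

adapter.Update(dataTable);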
Summary
Verify the contents of the database have not been changed, before writing updates. This is called optimistic concurrency control.
Other links:
MSDN on Optimistic Concurrency Control in ADO.NET
Tutorial on using SQL Timestamps for OCC
