I have a web application where users register by clicking a "Join" button. There can be a very large number of users on the site, so to keep my database queries fast I chose not to add foreign key constraints to the database (even though it is a relational database).
Now, when a user with the same userId opens the application in two different browsers and hits the "Join" button at exactly the same time, two rows for the same user are inserted into the database, which is wrong.
The ideas I have to stop this are:
Do the check/insert logic in a stored procedure, inside a transaction with the SQL transaction isolation level set to SERIALIZABLE; but with this approach the table will be locked even when two different users hit the "Join" button at the same time.
Use the lock keyword in C# and perform the check/insert logic inside it, but I believe that if the same user hits "Join" from two browsers, the requests may each acquire their own lock (for example when the application runs in more than one process or server) and still produce two entries in the database. It would also be a problem for different users, since everyone's code would have to wait for the first request to release the resource.
Use optimistic concurrency, which Entity Framework supports out of the box, but I am not sure it will solve my problem.
Could you please help me with this?
You can easily solve your problem by creating a unique index on the user name. Only the first insert will be saved; the next one will be reported as an error, because it would violate the unique index.
In fact, it should be the primary key.
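As a minimal sketch (the table name, column name, and error numbers are illustrative assumptions, not taken from the question), the index is created once and a duplicate-key error on insert is treated as "already joined":

using System.Data.SqlClient;

// One-time setup, e.g. in a deployment script:
//   CREATE UNIQUE INDEX UX_Users_UserId ON Users (UserId);

static bool TryJoin(string connectionString, int userId)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "INSERT INTO Users (UserId) VALUES (@userId)", connection))
    {
        command.Parameters.AddWithValue("@userId", userId);
        connection.Open();
        try
        {
            command.ExecuteNonQuery();
            return true;   // first click wins
        }
        catch (SqlException ex) when (ex.Number == 2601 || ex.Number == 2627)
        {
            return false;  // duplicate key: this user has already joined
        }
    }
}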
According to your comments, your table is huge. So searching the whole table for a matching row on every insert, without an index, must be far more expensive than maintaining an index on each insert/update/delete operation. You should take that into account.
In any case, the only way to avoid inserting a value that already exists is to check for it.
Optimistic concurrency has nothing to do with this. Optimistic concurrency is about reading data, modifying it, and saving the changes without locking the table. What optimistic concurrency does can be explained in these steps:
read the original row from the DB, without any locks or transactions
the app modifies the original row
when the app tries to save the changes, it checks whether the row in the DB is exactly as it was when it was read in step 1. If it is, the changes are saved; if it isn't, a concurrency exception is thrown.
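For illustration, step 3 often boils down to a version-checked UPDATE like this (a sketch; the table, columns, and variables are assumptions):

// Requires: using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "UPDATE Users SET Email = @email " +
    "WHERE Id = @id AND RowVersion = @originalVersion", connection))
{
    command.Parameters.AddWithValue("@email", newEmail);
    command.Parameters.AddWithValue("@id", userId);
    command.Parameters.AddWithValue("@originalVersion", originalVersion);
    connection.Open();
    // Zero rows affected means the row changed since it was read in step 1.
    if (command.ExecuteNonQuery() == 0)
        throw new DBConcurrencyException("Row was modified by someone else.");
}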
So optimistic concurrency will not help you.
I insist on using a unique index, which is the safest, simplest, and probably most performant solution.
I would use Entity Framework and its optimistic concurrency.
It will wrap the operation in a transaction and handle these problems for you. Remember to put both an identity column and a primary key on the table. If the username has to be unique, add a unique annotation to the table as well.
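A code-first sketch of such a table (assuming EF 6.1 or later, where the Index data annotation exists; on older versions create the unique index in SQL instead):

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class User
{
    public int Id { get; set; }            // identity primary key by EF convention

    [Index(IsUnique = true)]               // the database itself rejects duplicates
    [MaxLength(256)]                       // indexed string columns need a bounded length
    public string UserName { get; set; }

    [Timestamp]                            // rowversion column driving optimistic concurrency
    public byte[] RowVersion { get; set; }
}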
Related
I am new to C# and SQL Server.
I am developing an application using Winforms.
I am using a DataSet.
In master and details transactions, suppose I retrieve one transaction from the database.
At the same time another user also retrieves the same transaction.
My requirement is when I am changing this transaction no one else should be allowed to update or delete the same transaction.
As the DataSet works in disconnected mode, how can I achieve a locking mechanism?
using (var transaction = connection.BeginTransaction())
{
    // The adapter's insert/update/delete commands must be enlisted
    // in the transaction explicitly before calling Update.
    adapter.UpdateCommand.Transaction = transaction;
    adapter.Update(dataSet);
    transaction.Commit();
}
If the update affects a small number of rows within a table, SQL Server grants a row-level lock to the caller. If the change affects a large number of rows, SQL Server escalates to a table-level lock. It's all automatic, so concurrency is taken care of.
The problem, however, is that with many users simultaneously working on the same set of rows, the chance of a deadlock is high. The CQRS design pattern promoted by Udi Dahan takes care of that. However, if your application is small, applying CQRS would be overkill.
If I'm correct in assuming you are using C#, then you should look into ORM frameworks, as they can handle collisions like this for you, or at least alert you when they have happened so you can deal with them in your code. You could, for instance, inform the user that someone else has made a change and refresh their display, or merge the changes and save the merged data.
Have a look at Entity Framework. There are loads of tutorials and examples available. These should get you started:
http://www.asp.net/entity-framework
http://msdn.microsoft.com/en-us/library/bb386876.aspx
This one specifically covers data concurrency (it's MVC, but the principles are the same):
http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
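As a minimal sketch of what handling such a collision looks like in Entity Framework (assuming an entity mapped with a rowversion/[Timestamp] column; the merge policy is up to you):

// Requires: using System.Linq; using System.Data.Entity;
//           using System.Data.Entity.Infrastructure;
static void SaveWithConcurrencyHandling(DbContext context)
{
    try
    {
        context.SaveChanges();
    }
    catch (DbUpdateConcurrencyException ex)
    {
        // Someone else changed the row since it was read.
        var entry = ex.Entries.Single();
        // Refresh the original values from the database so a retry can
        // succeed; re-display or merge the values before saving again.
        entry.OriginalValues.SetValues(entry.GetDatabaseValues());
    }
}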
I'm using .NET Entity Framework 4.1 with the code-first approach, trying to solve the following problem, simplified here.
There's a database table with tens of thousands of entries.
Several users of my program need to be able to
View the (entire) table in a grid, which implies that the entire table has to be downloaded.
Modify values of any arbitrary row; changes are frequent but need not be persisted immediately. It's expected that different users will modify different rows, but this is not always true. Some loss of changes is permitted, as users will most likely update the same rows to the same values.
On occasion add new rows.
Sounds simple enough. My initial approach was to use a long-running DbContext instance. This one DbContext was supposed to track changes to the entities, so that when SaveChanges() is called, most of the legwork is done automatically. However, many have pointed out that this is not an optimal solution in the long run, notably here. I'm still not sure I understand the reasons, and I don't see what a unit of work would be in my scenario either. The user chooses when to persist changes, and let's say for simplicity that the client always wins. It's also important to note that objects that have not been touched must not overwrite any data in the database.
Another approach would be to track changes manually, or to use objects that track changes for me; however, I'm not too familiar with such techniques, and I would welcome a nudge in the right direction.
What's the correct way to solve this problem?
I understand that this question is a bit wishy-washy, but think of it as more fundamental: I lack a fundamental understanding of how to solve this class of problems. It seems to me that a long-living DbContext is the right way, but knowledgeable people tell me otherwise, which leads to my confusion and imprecise questions.
EDIT1
Another point of confusion is the existence of the Local property on the DbSet<> object. It invites me to use a long-running context, as another user has posted here.
The problem with a long-running context is that it doesn't refresh data (I discussed these problems in more detail here). So if your user opens the list and modifies data for half an hour, she doesn't know about changes made by others. But in the case of WPF, if your business action is:
Open the list
Do as many actions as you want
Trigger saving changes
Then this whole sequence is a unit of work, and you can use a single context instance for it. If you have a last-edit-wins scenario, you should not have problems with this unless somebody else deletes the record the current user is editing. Additionally, after saving or cancelling changes you should dispose of the current context and load the data again; this will ensure that you really have fresh data for the next unit of work.
The context offers some features to refresh data, but it only refreshes data previously loaded (without relations), so, for example, new unsaved records will still be included.
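A sketch of that lifecycle (ShopContext and Products are illustrative names):

// One unit of work = one context instance.
using (var context = new ShopContext())
{
    var products = context.Products.ToList();  // load the list; entities are tracked

    // ... the user edits the tracked entities for as long as needed ...

    context.SaveChanges();                      // persist the whole batch at once
}

// Then start a fresh context, so the next unit of work sees the
// current database state instead of stale tracked data.
using (var context = new ShopContext())
{
    var freshProducts = context.Products.ToList();
    // ...
}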
Perhaps you can also read about the MS Sync Framework and local data caching.
It sounds to me like your users could have a (cached) copy of the data for an indefinite period of time. The longer the users work with cached data, the greater the odds that they become disconnected from the database connection in the DbContext. My guess is that EF doesn't handle this well, and you probably want to deal with that (e.g. an occasionally connected architecture). I would expect implementing that to solve many of your issues.
Using only Microsoft-based technologies (MS SQL Server, C#, EAB, etc.), if you needed to keep track of the changes made to a record in a database, which strategy would you use? Triggers, AOP on the DAL, something else? And how would you display the collected data? Is there a pattern for this? Is there a tool or framework that helps implement this kind of solution?
The problem with Change Data Capture is that it isn't flexible enough for real auditing: you can't add the columns you need. It also discards captured records after three days by default (you can change this, but I don't think you can keep them forever), so you have to have a job copying the records to a real audit table if you need to keep the data for a long time, which is typical of audit requirements (we never purge our audit records).
I prefer the trigger approach. You have to be careful when you write the triggers to make sure they capture the data when multiple records are changed. We have two tables for each audited table: one to store the datetime and the id of the user or process that took the action, and one to store the old and new data. Since we run a lot of multi-record processes, this is critical for us. If someone reports one bad record, we want to be able to see whether it was a process that made the change and, if so, what other records might have been affected as well.
At the time you create the audit process, also create the scripts to restore a set of audited data to its old values. It's a lot easier to do this, when you're under the gun to fix things, if you already have it set up.
SQL Server 2008 R2 has this built in: look up Change Data Capture in Books Online.
This is probably not a popular opinion, but I'm going to throw it out there anyhow.
I prefer stored procedures for all database writes. If auditing is required, it's right there in the stored procedure. There's no magic happening outside the code, everything that happens is documented right at the point where writes occur.
If, in the future, a table needs to change, one has to go to the stored procedure to make the change. The need to update the audit is documented right there. And because we used a stored procedure, it's simpler to "version" both the table and its audit table.
I have a SQL database with various tables that save info about a product (it's for an online shop), and I'm coding in C#. There are options associated with a given product and, as mentioned, the info recorded about these options is spread across a few tables when saved.
Now when I come to edit this product in the CMS I see a list of the existing product options and I can add to that list or delete from it, as you'd expect.
When I save the product I need to check whether each record already exists and, if so, update it; if not, insert a new record. I'm trying to find an efficient way of doing this. It's very important that I maintain the IDs associated with the product options, so clearing them all out each time and re-saving them isn't viable, unfortunately.
To describe it again, possibly more clearly: imagine I have a collection of options when I load the product; this is loaded into memory and added to or deleted from depending on what the user chooses. When they click 'Save' I need to check which options are updates and which are new to the list.
Any suggestions of an efficient way of doing this?
Thanks.
If the efficiency you are looking to achieve is in the number of round trips to the database, then you could write a stored procedure to do the update-or-insert for you.
In most cases, however, it's not really necessary to avoid the initial SELECT; provided you have appropriate primary keys or unique indexes on your tables, it should be very quick.
If the efficiency is in terms of elegant or reduced code on the server side, then I would look at using some sort of ORM, for example Entity Framework 4.0. With a proper ORM architecture you can almost stop thinking in terms of database records and INSERT/UPDATE, and just work with a collection of objects in memory.
I usually do this by performing the following:
For each item, execute an update query that will update the item if it exists.
After each update, check how many rows were updated (using @@ROWCOUNT in SQL Server). If zero rows were updated, execute an insert to create the row (a code sketch follows below).
Alternatively, you can do the opposite, if you create a unique constraint that prevents duplicate rows:
For each item, try to insert it.
If the insert fails because of the constraint (check the error code), perform the update instead.
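Here is a sketch of the first pattern, with both statements sent as one batch (the ProductOptions table and its columns are illustrative names):

const string upsertSql =
    "UPDATE ProductOptions SET Name = @name WHERE Id = @id; " +
    "IF @@ROWCOUNT = 0 " +
    "    INSERT INTO ProductOptions (Id, Name) VALUES (@id, @name);";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(upsertSql, connection))
{
    command.Parameters.AddWithValue("@id", option.Id);
    command.Parameters.AddWithValue("@name", option.Name);
    connection.Open();
    command.ExecuteNonQuery();  // updates the row if it exists, inserts it otherwise
}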
Run a select query checking for the ID. If it exists then you need to update. If it does not exist then you need to insert.
Without more details I'm not really sure what else to tell you. This is fairly standard.
I receive a daily XML file which I use to update a database with its content. The file is always a complete file, i.e. everything is included whether it has changed or not. I am using LINQ to SQL to update the database, and I was debating whether to check if anything had changed in each record (most will not change) and only update those which did, or to just update every record with the current data.
I feel that I need to hit the database with an update for each record so that I can weed out the records which are not included in the XML file: I set a processed date on each record, then revisit the unprocessed ones to delete them. That made me wonder whether I should simply find the corresponding record in the database and update the object with the current information, whether it has changed or not, which led me to take a closer look at the SQL generated for updates. I found that only the data which has changed is set in the UPDATE statement, but the WHERE clause includes all of the columns in the record, not just the primary key. This seems very wasteful in terms of data flying around the system, so I wondered why this is the case and whether there is a setting for the LINQ to SQL context to use only the primary key in the clause.
So I have two questions:
Why does the LINQ to SQL WHERE clause include all of the current data, not just the primary key?
Is there a way to configure the context to use only the primary key in the WHERE clause?
This is optimistic concurrency - it's basically making sure that it doesn't stomp on changes made by anything else. You can tweak the concurrency settings in various ways, although I'm not an expert on it.
The MSDN page for LINQ to SQL optimistic concurrency is a good starting point.
If you have a column representing the "version" of the row (e.g. an automatically updated timestamp) you can use just that, or you can set UpdateCheck=Never on all the columns if you know nothing else will have changed the data.
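In attribute-mapped form that looks roughly like this (a sketch; the entity and its columns are illustrative). With an IsVersion column present, LINQ to SQL checks only the key and the version in the WHERE clause:

// Requires: using System.Data.Linq; using System.Data.Linq.Mapping;
[Table(Name = "Products")]
public class Product
{
    [Column(IsPrimaryKey = true)]
    public int Id { get; set; }

    // UpdateCheck.Never keeps this column out of the WHERE clause entirely.
    [Column(UpdateCheck = UpdateCheck.Never)]
    public string Name { get; set; }

    // A rowversion column: updates are then checked against Id + Version only.
    [Column(IsVersion = true, IsDbGenerated = true)]
    public Binary Version { get; set; }
}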
You haven't really described enough about your use of the processed date to answer the third point.
To answer #2: in the DBML designer, set the "Update Check" property to "Never" at the column level, for each column in the table, to avoid the generation of massive WHERE clauses.