Avoiding concurrent access of data in MSSQL - c#

We are developing a C# application that used to work as a single instance application. Now we need to change it to be a multi-user application, meaning the GUI front-end will be run on multiple workstations while accessing a single MS SQL Server 2008 R2 data store.
Part of the work this application manages is queue based, meaning there's a pool of workitems (the list of workitems is in a single SQL table) from which each user can "take" the next available workitem. What I want to accomplish is the following:
once a workitem is "taken" by a user, no other user should have access to it in any way (including reading) until the first user finished working,
handle timeouts (user goes home for the weekend while workitem is taken) and frozen clients (reset button is pressed on the station while workitem is taken).
I know this is a rather general question (more of a research topic), so I'm not expecting a detailed solution, just useful links, best practices and/or some literature to read on the subject. Any help is really appreciated since I'm completely lost as to where to start.

I've seen this done with a transactional resource lock table or column. For example, you assign the record to someone (be it by setting a user ID or some other mechanism) and you simultaneously record a timestamp of when that resource was locked. When accessing the data, be it querying it or trying to update it, you first check this lock table/column to make sure the record is available. If it is not, you reject the request or the changes.
This also supports timeouts. If the timestamp is too old, the lock is released. You can either treat a lock as released automatically whenever its timestamp is too old, or write a scheduled service that checks for expired locks and unlocks them. I'd prefer the second way, as it makes checking for a lock less costly (simple boolean logic: does a row exist, or is the field value not null). But I've seen it done both ways.
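A minimal sketch of the lock-column idea above, written in Python with sqlite3 only so that it runs anywhere; it is not SQL Server code. The table name workitems, the columns locked_by and locked_at, the 30-minute timeout and the helpers take_next and release are all assumptions made for illustration. It shows the first timeout variant, where a lock with an old timestamp is treated as released at claim time.

```python
import sqlite3

LOCK_TIMEOUT_SECONDS = 30 * 60  # hypothetical timeout, pick your own

def create_schema(conn):
    # Hypothetical table layout, not taken from the question.
    conn.execute(
        "CREATE TABLE workitems ("
        " id INTEGER PRIMARY KEY,"
        " locked_by TEXT,"   # NULL means the item is free
        " locked_at REAL)"   # Unix time when the lock was taken
    )

def take_next(conn, user, now):
    """Claim the next free (or expired) work item; return its id or None."""
    cutoff = now - LOCK_TIMEOUT_SECONDS
    while True:
        row = conn.execute(
            "SELECT id FROM workitems"
            " WHERE locked_by IS NULL OR locked_at < ?"
            " ORDER BY id LIMIT 1",
            (cutoff,),
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause repeats the availability check, so the UPDATE
        # affects the row only if nobody claimed it in the meantime.
        cur = conn.execute(
            "UPDATE workitems SET locked_by = ?, locked_at = ?"
            " WHERE id = ? AND (locked_by IS NULL OR locked_at < ?)",
            (user, now, row[0], cutoff),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
        # Lost the race for this row: loop and try the next candidate.

def release(conn, item_id, user):
    """Give the item back; only the current holder can release it."""
    conn.execute(
        "UPDATE workitems SET locked_by = NULL, locked_at = NULL"
        " WHERE id = ? AND locked_by = ?",
        (item_id, user),
    )
    conn.commit()
```

The part that matters is the conditional UPDATE: two clients racing for the same row cannot both get an affected-row count of 1, so only one of them ends up owning the item.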

Related

Blocking editing fields when already opened by another user [duplicate]

I have a SQL Server 2008 database and an asp.net frontend.
I would like to implement a lock when a user is currently editing a record, but I am unsure which approach is best.
My idea is to have an isLocked column for the records and it gets set to true when a user pulls that record, meaning all other users have read only access until the first user finishes the editing.
However, what if the session times out and he/she never saves/updates the record, the record will remain with isLocked = true, meaning others cannot edit it, right?
How can I implement some sort of session timeout and have isLocked automatically set to false when the session times out (or after a predefined period)?
Should this be implemented on the asp.net side or the SQL side?
Don't do it at all. Use optimistic concurrency instead.
Pessimistic locking is possible, but not from .Net applications. .Net app farms are not technically capable of maintaining a long lived session to keep a lock (obtained via sp_getapplock or, worse, obtained by real data locking) because .Net app farms:
load balance requests across instances
do not keep a request stack between HTTP calls
recycle the app domain
Before you say 'I don't have a farm, is only one IIS server' I will point out that you may only have one IIS server now and if you rely on it you will never be able to scale out, and you still have the problem of app-domain recycle.
Simulating locking via app specific updates (eg. 'is_locked' field) is deeply flawed in real use, for reasons you already started to see, and many more. When push comes to shove this is the only approach that can be made to work, but I never heard of anyone saying 'Gee, I'm really happy we implemented pessimistic locking with data writes!'. Nobody, ever.
App layer locking is also not workable, for exactly the same reasons .Net farms cannot use back-end locking (load-balancing, lack of context between calls, app-domain recycle). Writing a distributed locking app-protocol is just not going to work, that road is paved with bodies.
Just don't do it. Optimistic concurrency is sooooo much better in every regard.
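For readers who have not seen optimistic concurrency in practice, here is a minimal sketch, again in Python with sqlite3 purely so it is runnable. The records table, its version column and the load/save helpers are hypothetical names chosen for the example; on SQL Server a rowversion column is one common way to get the same effect.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical table; "version" is bumped on every successful save.
    conn.execute(
        "CREATE TABLE records ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT,"
        " version INTEGER NOT NULL DEFAULT 0)"
    )

def load(conn, record_id):
    """Return (body, version) as seen when the edit form is opened."""
    return conn.execute(
        "SELECT body, version FROM records WHERE id = ?", (record_id,)
    ).fetchone()

def save(conn, record_id, new_body, version_read):
    """Write only if the row still has the version that was read."""
    cur = conn.execute(
        "UPDATE records SET body = ?, version = version + 1"
        " WHERE id = ? AND version = ?",
        (new_body, record_id, version_read),
    )
    conn.commit()
    # Zero affected rows means someone else saved first.
    return cur.rowcount == 1
```

Nothing is locked while the user is editing. A conflicting save simply fails, and the application can then reload the record and ask the user how to proceed.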

Scalability and availability

I am quite confused on which approach to take and what is best practice.
Let's say I have a C# application which does the following:
Sends emails from a queue. The emails to send and all their content are stored in the DB.
Now, I know how to make my C# application almost scalable but I need to go somewhat further.
I want some way of distributing the tasks across, say, X servers, so that it is not just one server doing all the processing but the work is shared amongst the servers.
If one server goes down, then the load is shared between the other servers. I know NLB does this, but I'm not looking for NLB here.
Sure, you could add a column of some kind in the DB table to indicate which server should be assigned to process that record, and each of the applications on the servers would have an ID of some kind that matches the value in the DB and they would only pull their own records - but this I consider to be cheap, bad practice and unrealistic.
Having a DB table row lock as well, is not something I would do due to potential deadlocks and other possible issues.
I am also NOT indicating using threading "to the extreme" here but yes, there will be threading per item to process or batching them up per thread for x amount of threads.
How should I approach this, and what do you recommend for making a C# application that is scalable and has high availability? The aim is to have X servers, each with the same application, and for each to be able to get records and process them, with the items to process shared amongst the servers, so that if one server or service fails, the others can take on that load until the failed server is put back.
Sorry for my lack of understanding or knowledge, but I have been thinking about this quite a lot and have lost sleep trying to think of a good, robust solution.
I would be thinking of batching up the work, so each app only pulls back x number of records at a time, marking those retrieved records as taken with a bool field in the table. I'd amend the SELECT statement to pull only records not marked as taken/done. Table locks would be OK in this instance for very short periods to ensure there is no overlap of apps processing the same records.
EDIT: It's not very elegant, but you could have a datestamp and a status for each entry (instead of a bool field as above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which have a status of In Progress but which have gone beyond a time threshold without being set to complete. They would be ready for reprocessing by another app later on.
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as un-sophisticated and work just fine. The best things work with the least complexity.
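The batching idea and the periodic reset job from the EDIT could look roughly like this. It is a sketch in Python with sqlite3 so that it can be run as-is, not SQL Server or Agent job code; the email_queue table, its columns, the status values, the 10-minute threshold and the function names are all assumptions for illustration.

```python
import sqlite3
import uuid

STALE_AFTER_SECONDS = 600  # hypothetical threshold for "stuck" rows

def create_schema(conn):
    # Hypothetical queue table with a status and a datestamp per entry.
    conn.execute(
        "CREATE TABLE email_queue ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'Pending',"
        " claim_token TEXT,"
        " claimed_at REAL)"
    )

def claim_batch(conn, batch_size, now):
    """Mark up to batch_size pending rows as taken and return their ids."""
    token = uuid.uuid4().hex  # identifies exactly this batch
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE email_queue"
            " SET status = 'InProgress', claim_token = ?, claimed_at = ?"
            " WHERE id IN (SELECT id FROM email_queue"
            "              WHERE status = 'Pending' ORDER BY id LIMIT ?)",
            (token, now, batch_size),
        )
        rows = conn.execute(
            "SELECT id FROM email_queue WHERE claim_token = ? ORDER BY id",
            (token,),
        ).fetchall()
    return [r[0] for r in rows]

def mark_done(conn, item_id):
    with conn:
        conn.execute(
            "UPDATE email_queue SET status = 'Done' WHERE id = ?", (item_id,)
        )

def reset_stale(conn, now):
    """The periodic job: put timed-out InProgress rows back in the pool."""
    with conn:
        cur = conn.execute(
            "UPDATE email_queue SET status = 'Pending', claim_token = NULL"
            " WHERE status = 'InProgress' AND claimed_at < ?",
            (now - STALE_AFTER_SECONDS,),
        )
    return cur.rowcount
```

Each server calls claim_batch in a loop and marks items done as it goes. If a server dies, its rows stay InProgress until reset_stale returns them to the pool, at which point another server picks them up.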

What's the best way to manage concurrency in a database access application?

A while ago, I wrote an application used by multiple users to handle trades creation.
I haven't done development for some time now, and I can't remember how I managed the concurrency between the users. Thus, I'm seeking some advice in terms of design.
The original application had the following characteristics:
One heavy client per user.
A single database.
Access to the database for each user to insert/update/delete trades.
A grid in the application reflecting the trades table. That grid being updated each time someone changes a deal.
I am using WPF.
Here's what I'm wondering:
Am I correct in thinking that I shouldn't care about the connection to the database for each application? Considering that there is a singleton in each, I would expect one connection per client with no issue.
How can I go about preventing concurrent access? I guess I should lock when modifying the data, but I don't remember how to.
How do I set up the grid to automatically update whenever my database is updated (by another user, for example)?
Thank you in advance for your help!
Consider leveraging Connection Pooling to reduce # of connections. See: http://msdn.microsoft.com/en-us/library/8xx3tyca.aspx
Lock as late as possible and release as soon as possible to maximize concurrency. You can use TransactionScope (see: http://msdn.microsoft.com/en-us/library/system.transactions.transactionscope.aspx and http://blogs.msdn.com/b/dbrowne/archive/2010/05/21/using-new-transactionscope-considered-harmful.aspx) if you have multiple db actions that need to go together to maintain consistency, or just handle them in a DB stored proc. Keep your queries simple. Follow these tips to understand how locking works and how to reduce resource contention and deadlocks: http://www.devx.com/gethelpon/10MinuteSolution/16488
I am not sure about other databases, but for SQL Server you can use SqlDependency, see http://msdn.microsoft.com/en-us/library/a52dhwx7(v=vs.80).aspx
Concurrency is usually managed by the DBMS using locks. A lock is a kind of semaphore that grants exclusive access to a certain resource and causes other accesses to be restricted or queued (only restricted if you use uncommitted reads).
The number of connections itself does not pose a problem while you are not reaching heights where you might touch on the max_connections setting of your DBMS. Otherwise, you might get a problem connecting to it for maintenance purposes or for shutting it down.
DBMSes usually use a concept of either table locks (MyISAM) or row locks (InnoDB, most other DBMSes). The type of lock determines the volume of the lock. Table locks can be very fast but are usually considered inferior to row level locks.
Row level locks occur inside a transaction (implicit or explicit). When manually starting a transaction, you begin your transaction scope. Until you manually close the transaction scope, all changes you make will be attributed to this exact transaction. The changes you make will also obey the ACID paradigm.
Transaction scope and how to use it is a topic far too long for this platform, if you want, I can post some links that carry more information on this topic.
For the automatic updates, most databases support some kind of trigger mechanism, which is code that is run on specific actions in the database (for instance the creation of a new record or the change of a record). You could put your code inside this trigger. However, you should only inform a receiving application of the changes, not really "do" the changes from the trigger, even if the language might make it possible. Remember that the action which triggered the code is suspended until your trigger code finishes. This means that a lean trigger is best, if it is needed at all.

Prevent duplicate editing / Locking DB records while editing - single backend server

Situation: multiple front-ends (e.g. Silverlight, ASP) sharing a single back-end server (WCF RIA or other web service).
I am looking for a standard to prevent multiple people from editing the same form. I understand that this is not an easy topic, but requirements are requirements.
Previously I used the DB last modified date against the submitted data and give a warning or error if the data was modified since it was loaded. The initial system simply overrode the data without any warning. The problem is that I have a new requirement to prevent both these situations. There will be many UIs, so a locking system might be a challenge, and there is obviously no guarantee that the client will not close the window/browser in the middle of an edit.
I would appreciate any help.
If I'm correct, it seems what you are talking about is a form of check-out/edit/check-in style workflow. What you want is that while one user is editing a record, no other user can even begin to edit the same record.
This is a form of pessimistic concurrency. Many web and data access frameworks have support for (the related) optimistic concurrency - that is, they will tell you that someone else already changed the record when you tried to save. Optimistic has no notion of locking, really - it makes sure that no other user saved between the time you fetched and the time you save.
What you want is not an easy requirement over the web, since the server really has no way to enforce the check-in when a user aborts an edit (say, by closing the browser). I'm not aware of any frameworks that handle this in general.
Basically what you need is to hold checkout information on the server. A user process when editing would need to request a checkout, and the server would grant/deny this based on what they are checking out. The server would also have to hold the information that the resource is checked out. When a user saves the server releases the lock and allows a new checkout when requested. The problem comes when a user aborts the edit - if it's through the UI, no problem... just tell the server to release the lock.
But if it is through closing the browser, powering off the machine, etc then you have an orphaned lock. Most people solve this one of two ways:
1. A timeout. The lock will eventually be released. The upside here is that it is fairly easy and reliable. The downsides are that the record is locked for a while where it's not really in edit. And, you must make your timeout long enough that if the user takes a really, really long time to save they don't get an error because the lock timed out (and they have to start over).
2. A heartbeat. The user has a periodic ping back to the server to say "yep, still editing". This is basically the timeout option from #1, but with a really short timeout that can be refreshed on demand. The upside is that you can make it arbitrarily short. The downside is increased complexity and network usage.
Checkin/checkout tokens are really not that hard to implement if you already have a transacted persistent store (like a DB): the hard part is integrating it into your user experience.
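To make the timeout and heartbeat options concrete, here is a minimal in-memory sketch in Python. The class CheckoutManager, its method names and the 30-second lease are hypothetical; a real server would keep this state in the transacted store mentioned above and would need to guard it against concurrent requests, which this sketch does not do.

```python
import time

class CheckoutManager:
    """In-memory checkout table whose locks expire unless renewed."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock      # injectable so the logic can be tested
        self._locks = {}         # record_id -> (user, expires_at)

    def _holder(self, record_id):
        entry = self._locks.get(record_id)
        if entry is None:
            return None
        user, expires_at = entry
        if expires_at <= self._clock():
            del self._locks[record_id]  # orphaned lock: lease ran out
            return None
        return user

    def checkout(self, record_id, user):
        """Grant the lock if the record is free or already held by user."""
        holder = self._holder(record_id)
        if holder not in (None, user):
            return False
        self._locks[record_id] = (user, self._clock() + self._lease)
        return True

    def heartbeat(self, record_id, user):
        """'Yep, still editing': extend the lease if user still holds it."""
        if self._holder(record_id) != user:
            return False
        self._locks[record_id] = (user, self._clock() + self._lease)
        return True

    def checkin(self, record_id, user):
        """Release the lock on save or on an explicit cancel."""
        if self._holder(record_id) != user:
            return False
        del self._locks[record_id]
        return True
```

With a long lease and no calls to heartbeat this behaves like option 1. With a short lease and a client that pings heartbeat periodically it behaves like option 2. A heartbeat that returns False tells the client its lock was lost, which is the case the user experience has to handle.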

C#/SQL Database listener

I have a requirement to monitor database rows continuously to check for changes (updates). If there are changes or updates from other sources, an event should be fired in my application (I am using WCF). Is there any way to listen to a database row continuously for changes?
I may have a larger number of events monitoring different rows in the same table. Would there be any performance problem? I am using a C# web service to monitor the SQL Server back end.
You could use an AFTER UPDATE trigger on the respective tables to add an item to a SQL Server Service Broker queue. Then have the queued notifications sent to your web service.
Another poster mentioned SqlDependency, which I also thought of mentioning but the MSDN documentation is a little strange in that it provides a windows client example but also offers this advice:
"SqlDependency was designed to be used in ASP.NET or middle-tier services where there is a relatively small number of servers having dependencies active against the database. It was not designed for use in client applications, where hundreds or thousands of client computers would have SqlDependency objects set up for a single database server."
Ref.
I had a very similar requirement some time ago, and I solved it using a CLR SP to push the data into a message queue.
To ease deployment, I created a CLR SP with a tiny little function called SendMessage that was just pushing a message into a Message Queue, and tied it to my tables using an AFTER INSERT trigger (normal trigger, not CLR trigger).
Performance was my main concern in this case, but I have stress tested it and it greatly exceeded my expectations. And compared to SQL Server Service Broker, it's a very easy-to-deploy solution. The code in the CLR SP is really trivial as well.
Monitoring "continuously" could mean every few hours, minutes, seconds or even milliseconds. This solution might not work for millisecond updates: but if you only have to "monitor" a table a few times a minute you could simply have an external process check a table for updates. (If there is a DateTime column present.) You could then process the changed or newly added rows and perform whatever notification you need to. So you wouldn't be listening for changes, you'd be checking for them. One benefit of doing the checking in this manner would be that you wouldn't risk as much of a performance hit if a lot of rows were updated during a given quantum of time since you'd bulk them together (as opposed to responding to each and every change individually.)
"I pondered the idea of a CLR function or something of the sort that calls the service after successfully inserting/updating/deleting data from the tables. Is that even good in this situation?"
Probably it's not a good idea, but I guess it's still better than getting into table trigger hell.
I assume your problem is you want to do something after every data modification, let's say, recalculate some value or whatever. Letting the database be responsible for this is not a good idea because it can have severe impacts on performance.
You mentioned you want to detect inserts, updates and deletes on different tables. Doing it the way you are leaning towards, this would require you to setup three triggers/CLR functions per table and have them post an event to your WCF Service (is that even supported in the subset of .net available inside sql server?). The WCF Service takes the appropriate actions based on the events received.
A better solution for the problem would be moving the responsibility for detecting data modification from your database to your application. This can actually be implemented very easily and efficiently.
Each table has a primary key (int, GUID or whatever) and a timestamp column, indicating when the entry was last updated. This is a setup you'll see very often in optimistic concurrency scenarios, so it may not even be necessary to update your schema definitions. Though, if you need to add this column and can't offload updating the timestamp to the application using the database, you just need to write a single update trigger per table, updating the timestamp after each update.
To detect modifications, your WCF Service/Monitoring application builds up a local dictionary (preferably a hashtable) of primary key/timestamp pairs at a given time interval. Using a covering index in the database, this operation should be really fast. The next step is to compare the new dictionary with the one from the previous interval, and voilà, there you go.
There are some caveats to this approach though. One of them is the number of records per table, another one is the update frequency (if it gets too low it's ineffective), and yet another point is whether you need access to the data as it was before the modification/insertion.
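The comparison step can be sketched in a few lines of Python. The function name diff_snapshots and the shape of the dictionaries (primary key mapped to last-updated timestamp) are assumptions for the example; how the snapshots are loaded from the database is left out.

```python
def diff_snapshots(previous, current):
    """Compare two {primary_key: timestamp} dicts from consecutive polls.

    Returns three sorted lists of keys: inserted, updated, deleted.
    """
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return inserted, updated, deleted
```

For example, with previous = {1: 10, 2: 10, 3: 10} and current = {1: 10, 2: 12, 4: 11}, the function reports key 4 as inserted, key 2 as updated and key 3 as deleted. After each poll the current snapshot becomes the previous one for the next interval.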
Hope this helps.
Why don't you use SQL Server Notification Services? I think that's the exact thing you are looking for. Go through the Notification Services documentation and see if it fits your requirement.
I think there are some great ideas here; from the scalability perspective I'd say that externalizing the check (e.g. Paul Sasik's answer) is probably the best one so far (+1 to him).
If, for some reason, you don't want to externalize the check, then another option would be to use the HttpCache to store a watcher and a callback.
In short, when you put the record in the DB that you want to watch, you also add it to the cache (using the .Add method) and set a SqlCacheDependency on it, and a callback to whatever logic you want to call when the dependency is invoked and the item is ejected from the cache.
