I have a C# console application which does some processing and then writes to the database. I have it deployed multiple times on a server with different config settings to do slightly different things. However, they all have to write to the same database (and may need to insert the same data into the same table if it doesn't already exist) using LINQ to Entities.
If I were using threads I could lock the method, or with stored procedures I could queue up the writes to avoid clashes, but is there any way to keep these as separate applications and prevent them from trying to write the same thing to the database at the same time?
I'm getting an exception every so often when there is a conflict.
Edit:
I'm not necessarily trying to debug why I'm getting the exception; I'm looking more for suggestions of a 'best practice' way of doing this, e.g. should this be handled at the console app level, the L2E level, or the database level?
Why can't you start a transaction with high isolation level so that the lock is active at the server side?
You may use locks (pessimistic concurrency model) or timestamps (optimistic concurrency model) to deal with concurrency issues.
It is a very wide topic, so I would suggest you start by googling for "database concurrency".
Related
I'm facing the following situation:
A system I'm working on has a few different parts (services and ASP.NET) with separate responsibilities. These parts share two resources: an MSSQL database and files on a Windows filesystem.
Currently all these parts access these resources individually. I think this is causing unpredictability and inconsistency.
I'm thinking of introducing a service that regulates access to these resources. I'm not sure if this is an accepted design principle.
The general question is:
What kind of solution should I be looking at and what should I keep in mind when designing this?
Specific questions:
Is this just a Data Access Layer?
Is it bad to introduce a SPOF like this?
Can you recommend any reading material aimed at this kind of solution? (especially if there's specific material for C#)
edit because of a great question by allen-smithee:
The database is currently accessed by embedded queries. They are separated into a class, but these are different for every service, so it's not a shared library.
1/ A Data Access Layer simply encapsulates the data logic, what you need is concurrency control to ensure consistency of your data model across the independent services.
2/ Depending how you implement concurrency it can be a single point of failure but I don't think there is anything wrong with that - "plan for failure" is a great design mantra. You can build in redundancy and fail-over mechanisms, or you can distribute your concurrency control across your services.
3/ The way you choose to implement concurrency will depend on how your application functions and what your users expect. To give some specific scenarios:
Scenario A
When a service begins an update start a transaction and take out one or more row-level locks for the records involved. If any other service tries to edit the record at the same time either block or return an error such as 'this record is currently locked'. Note that all locks have to be taken before reading and kept for the duration of the update to ensure consistency with other writes.
Pros - Fairly straightforward to implement for small data models. MSSQL supports plenty of locking scenarios and even custom application locks that you can use to group resources.
Cons - If your transaction needs to access multiple tables/rows and different services or functions access overlapping tables you can easily get into all sorts of deadlock problems.
MSSQL generally prefers pessimistic locking and can escalate locks from row to page and table level, which means read and write locks may behave in ways you wouldn't initially expect. You may need to spend a considerable amount of time debugging these interactions in SQL Server Profiler and be prepared to make changes to your data model to work around these issues.
Scenario B
Each table row has an incrementing version number. A service reads the data it needs, performs a series of updates, and then, inside a short transaction, checks the current row version against the one it read before the update. If the version numbers do not match it rolls back the transaction, cancelling the update. The service may then attempt the operation again, starting with reading the data.
Pros - Readers are not blocked and the lock is held only very briefly while the service tries to commit the update. MSSQL has built-in support for this concurrency method in the form of 'Row Versioning' with the 'Snapshot Isolation' level. If conflicts are rare this method can be extremely responsive - perfect for real-time applications.
Cons - This method may require significant changes to your data model and the service behaviour.
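As a rough illustration of Scenario B, here is a minimal sketch in Python using the standard sqlite3 module rather than MSSQL. The `items` table, its `version` column, and the `update_with_version_check` helper are all hypothetical names chosen for the example. It also folds the version check into the UPDATE's WHERE clause instead of checking and then rolling back, which is one common way to get the same effect.

```python
import sqlite3

def update_with_version_check(conn, row_id, new_value, expected_version):
    """Apply the update only if the row still carries the version we read."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE items SET value = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_value, row_id, expected_version),
        )
    # rowcount is 0 when another writer bumped the version first
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO items VALUES (1, 'original', 1)")
conn.commit()

# Two services read the same row (and therefore the same version) ...
version_a = conn.execute("SELECT version FROM items WHERE id = 1").fetchone()[0]
version_b = version_a

# ... then both try to write. Only the first one wins.
first_ok = update_with_version_check(conn, 1, "from service A", version_a)
second_ok = update_with_version_check(conn, 1, "from service B", version_b)
```

The losing service sees `False` and can re-read the row and retry, as described above.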
Scenario C
A single data service is responsible for all data access. Other services request data from and submit updates to this service. The service is responsible for reading and writing to the database and filesystem, and performs some level of data integrity checking and resolves data conflicts.
Pros - Encapsulates data integrity and control in one module, simplifying other services. Allows you to implement caching, locking etc at the application level providing finer-grained control.
Cons - Significant changes to existing architecture required. Resolving data conflicts can require a significant amount of code if you choose to resolve at the field level. Services will need to be able to handle a rejected update when resolution is not possible.
Those are the major scenarios I can think of off the top of my head, but there are plenty more. Generally all concurrency control for data will revolve around locking while performing an action (pessimistic locking); performing an action and then checking for a conflict (optimistic locking via versioning); or performing an action and then merging conflicts (conflict resolution).
Thinking about your specific data model and how the model is updated will guide which mix of these techniques you will use. Searching for any of the terms above will give you plenty to read and there are a lot of Technet articles that specifically address these issues in an MSSQL context. Take heart - I've seen good programmers get this stuff wrong, it really is a challenging problem, but it is solvable if you work through it methodically.
I'm maintaining an ASP/C# program that uses MS SQL Server 2008 R2 for its database requirements.
On normal and perfect days, everything works fine as it is. But we don't live in a perfect world.
An Application (for Leave, Sick Leave, Overtime, Undertime, etc.) Approval process requires up to ten separate connections to the database. The program connects to the database, passes around some relevant parameters, and uses stored procedures to do the job. Ten times.
Now, due to the structure of the entire thing, which I cannot change, a dip in the connection (or heck, a debug point in VS2005 that I leave hanging long enough) leaves the Application Approval Process incomplete. The tables are often just joined together, so a data mismatch (a missing value here, a primary key that failed to update there) would mean an entire row would be useless.
Now, I know that there is nothing I can do to prevent this - this is a connection issue, after all.
But are there ways to minimize connection lag / failure? Or a way to inform the users that something went wrong with the process? A rollback changes feature (either via program, or SQL), so that any incomplete data in the database will be undone?
Thanks.
But are there ways to minimize connection lag / failure? Or a way to
inform the users that something went wrong with the process? A
rollback changes feature (either via program, or SQL), so that any
incomplete data in the database will be undone?
As we discussed in the comments, transactions will address many of your concerns.
A transaction comprises a unit of work performed within a database
management system (or similar system) against a database, and treated
in a coherent and reliable way independent of other transactions.
Transactions in a database environment have two main purposes:
To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system
failure, when execution stops (completely or partially) and many
operations upon a database remain uncompleted, with unclear status.
To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes
are possibly erroneous.
Source
Transactions in .Net
As you might expect, the database is integral to providing transaction support for database-related operations. However, creating transactions from your business tier is quite easy and allows you to use a single transaction across multiple database calls.
Quoting from my answer here:
I see several reasons to control transactions from the business tier:
Communication across data store boundaries. Transactions don't have to be against a RDBMS; they can be against a variety of entities.
The ability to rollback/commit transactions based on business logic that may not be available to the particular stored procedure you are calling.
The ability to invoke an arbitrary set of queries within a single transaction. This also eliminates the need to worry about transaction count.
Personal preference: C# has a more elegant structure for declaring transactions: a using block. By comparison, I've always found transactions inside stored procedures to be cumbersome when jumping to rollback/commit.
Transactions are most easily declared using the TransactionScope (reference) abstraction which does the hard work for you.
using System.Transactions;

using (var ts = new TransactionScope())
{
    // Do some work here that may or may not succeed.

    // If this line is reached, the transaction will commit when the scope is
    // disposed. If an exception is thrown before this line is reached, the
    // transaction will be rolled back.
    ts.Complete();
}
Since you are just starting out with transactions, I'd suggest testing out a transaction from your .Net code.
Call a stored procedure that performs an INSERT.
After the INSERT, purposely have the procedure generate an error of any kind.
You can validate your implementation by seeing that the INSERT was rolled back automatically.
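The test described in the steps above can be sketched in a few lines. This is only an illustration of the idea, written in Python against an in-memory SQLite database instead of a SQL Server stored procedure. The `approvals` table and the `approve_then_fail` function are hypothetical stand-ins for the procedure that inserts and then errors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE approvals (id INTEGER PRIMARY KEY, status TEXT)")
conn.commit()

def approve_then_fail(conn):
    """Stand-in for a procedure that performs an INSERT and then errors out."""
    with conn:  # transaction: commit on success, rollback on exception
        conn.execute("INSERT INTO approvals (status) VALUES ('approved')")
        raise RuntimeError("simulated failure after the INSERT")

try:
    approve_then_fail(conn)
    failed = False
except RuntimeError:
    failed = True

# The INSERT happened inside the failed transaction, so nothing was kept.
row_count = conn.execute("SELECT COUNT(*) FROM approvals").fetchone()[0]
```

If `row_count` is zero after the simulated failure, the rollback worked; the same check applies to the .Net version.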
Transactions in the Database
Of course, you can also declare transactions inside a stored procedure (or any sort of TSQL statement). See here for more information.
If you use the same SqlConnection, or another connection type that implements IDbConnection, you can do something similar to a TransactionScope but without the need to create the security risk that is a TransactionScope.
In VB:
Using scope As IDbTransaction = mySqlCommand.Connection.BeginTransaction()
    If blnEverythingGoesWell Then
        scope.Commit()
    Else
        scope.Rollback()
    End If
End Using
If you don't specify commit, the default is to rollback the transaction.
I am not sure whether this has been asked before (I did google it).
I have written a web service that will be hosted with a SQLite database.
Many clients will be performing CRUD operations on it. I planned to use SQLite just for simplicity.
I have now written most of my methods, and it occurred to me that there is (I suppose) no DBMS behind SQLite, so there may be conflicts and data inconsistency issues if two or more client applications write to my application.
Does SQLite support managing operations for multiple connections, or do I have to switch to SQL Server 2008?
SQLite "supports managing of operation for multiple connections" in the sense that it won't blow up or cause data corruption. It is not, however, designed to be as efficient as MS-SQL Server is with a high load of concurrent operations. So, what it boils down to is how many is "Many clients". If you are talking about tens of simultaneous requests, you will be fine with SQLite. If you are talking about hundreds of simultaneous requests, you will probably need to migrate to MS-SQL Server. Note that in order for two requests to be simultaneous the two clients must press the 'Submit' button within roughly the same few-millisecond time window. So it takes hundreds of simultaneously connected clients to get dozens of simultaneous requests.
The short answer is yes. Take a look at this SQLite FAQ entry. The longer answer is a bit more complicated... Would you want to use SQLite in an architecture that is meant to handle heavy transaction loads? Probably not. If you do want to move in that direction I would suggest starting with SQL Server Express. If you need to upgrade to a full-blown SQL Server it won't be an issue at all...
SQLite excerpt:
(5) Can multiple applications or multiple instances of the same application access a single database file at the same time?
Multiple processes can have the same database open at the same time.
Multiple processes can be doing a SELECT at the same time. But only
one process can be making changes to the database at any moment in
time, however.
SQLite uses reader/writer locks to control access to the database. [...]
Yes, SQLite supports concurrency and locking.
Question: I currently store ASP.net application data in XML files.
Now the problem is I have asynchronous operations, which means I ran into the problem of simultaneous write access to an XML file...
Now, I'm considering moving to an embedded database to solve the issue.
I'm currently considering SQLite and embeddable Firebird.
I'm not sure, however, whether SQLite or Firebird can handle multiple concurrent writes.
And I certainly don't want the same problem again.
Does anybody know?
SQLite certainly is better known, but which one is better, SQLite or Firebird? I tend to say Firebird, but I don't really know.
No MS-Access or MS-SQL-Express recommendations please, I'm a sane person.
I would choose Firebird for many reasons, including this one:
Although it is transactional, SQLite
does not support concurrent
transactions, so if your embedded
application needs two or more
connections, they must be serialized.
An embedded Firebird database is
simple to upgrade to a fully shared
database - just change the shared
library.
Maybe you can also check this
SQLite can be configured to gracefully handle simultaneous writes in most situations. When one thread or process begins a write to the db, the file is locked. When a second write is attempted and encounters the lock, it backs off for a short period before attempting the write again, until it succeeds or times out. The timeout is configurable, but otherwise all this happens without the application code having to do anything special except enabling the option, like this:
// set SQLite to wait and retry for up to 100ms if database locked
sqlite3_busy_timeout( db, 100 );
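
For comparison, Python's standard sqlite3 module exposes the same setting through the `timeout` argument of `connect()`, given in seconds. The sketch below is only an illustration: the `log` table and the temporary file name are made up, and because everything runs in one thread the first writer never releases the lock during the wait, so the second write is guaranteed to time out rather than succeed on retry.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT statements below are issued explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE log (msg TEXT)")

# Second connection: wait and retry for up to 100 ms if the database is locked.
other = sqlite3.connect(path, timeout=0.1, isolation_level=None)

writer.execute("BEGIN IMMEDIATE")  # take the write lock and hold it
writer.execute("INSERT INTO log VALUES ('first writer')")

try:
    other.execute("INSERT INTO log VALUES ('second writer')")
    blocked = False
except sqlite3.OperationalError:  # "database is locked" after the timeout
    blocked = True

writer.execute("COMMIT")  # release the lock
other.execute("INSERT INTO log VALUES ('second writer')")  # now it goes through
rows = [r[0] for r in other.execute("SELECT msg FROM log ORDER BY rowid")]
```

In a real deployment the first writer would normally finish within the timeout, and the second write would succeed without the application noticing the wait.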
All this works very well and without any difficulty, except in two circumstances:
If an application does a great many writes, say a thousand inserts, all in one transaction, then the database will be locked for a significant period, which can cause problems for any other application attempting to write. The solution is to break up such large writes into separate transactions, so other applications can get access to the database.
If the database is shared by different processes running on different machines, sharing a network-mounted disk. Many operating systems have bugs in network-mounted disks that make file locking unreliable. There is no answer to this. If you need to share a db on a network-mounted disk, you need another database engine such as MySQL.
I do not have any experience with Firebird. I have used SQLite in situations like this for many applications over several years.
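The first point, splitting a large write into several short transactions, might look roughly like the following Python sketch. The `events` table, the `insert_in_batches` helper, and the batch size of 100 are assumptions made for the example, not recommendations from the answer.

```python
import sqlite3

def insert_in_batches(conn, rows, batch_size=100):
    """Insert rows using one short transaction per batch; return the commit count."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits at the end of each batch, releasing the write lock
            conn.executemany("INSERT INTO events (payload) VALUES (?)", batch)
        commits += 1
    return commits

conn = sqlite3.connect(":memory:", timeout=0.1)
conn.execute("CREATE TABLE events (payload TEXT)")
conn.commit()

rows = [(f"event {i}",) for i in range(1000)]
commits = insert_in_batches(conn, rows, batch_size=100)
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The trade-off is that the thousand inserts are no longer atomic as a whole: if the process fails halfway, the batches already committed stay in the database.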
Have you looked into Berkeley DB with the SQLite API for SQL support?
It sounds like SQLite will be a good fit. We use SQLite in a number of production apps; it supports, and in fact prefers, transactions, which go a long way toward handling concurrency.
transactional sqlite? in C#
I would add #3 to the list from ravenspoint above: if you have a large call-center or order-processing center, say, where dozens of people might be hitting the SAVE button at the same time, even if each is updating or inserting just one record, you can run into problems using the busy timeout approach.
For scenario #3, a true SQL engine that can serialize is ideal; less ideal but serviceable is a dbms that can do byte-range record locking of a shared-file. But be aware that even a byte-range record lock will be inadequate for a large number of concurrent writes when new records are appended to the end of the file like a caboose on the end of a freight train, so that multiple processes are trying at the same time to set a lock on the same byte-range. On the other hand, a byte-range record locking scheme coupled with a hashed-key sparse file approach (e.g. the old Revelation/OpenInsight database for LANs) will be far superior to ISAM for this scenario.
I am dealing with a database containing tens of millions of records. I have an application which connects to the database, reads all the data from a single column in a table, performs some operation on it, and updates it (for SQL Server, using cursors).
With millions of records the update takes a very long time, so I want to make it faster by
using multiple threads with an independent connection for each thread,
or
using a single connection shared across all the threads to fire the update queries.
Which one is faster? If you have any other ideas, please explain.
I need a solution which is independent of database type, but if you know specific solutions for particular databases, please reply with those too.
The speedup you're trying to achieve won't work. To the contrary, it will slow down the overall processing, as the database now also has to keep multiple connections/sessions/transactions in sync.
Stick with as few connections/transactions as possible for repetitive and comparable operations.
If it takes too long for your taste, try to analyze whether the queries can be optimized somehow. Also have a look at database-specific extensions (e.g. bulk operations) suitable for your problem.
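One form such query optimization can take, when the per-row operation can be expressed in SQL, is replacing the row-by-row cursor loop with a single set-based UPDATE on one connection. The Python sketch below uses the standard sqlite3 module. The `people` table and the "trim and lowercase the name" operation are invented for illustration, since the question does not say what the real operation is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("  alice ",), ("BOB",), ("Carol  ",)],
)
conn.commit()

def update_set_based(conn):
    """Let the database transform every row in one statement and one transaction."""
    with conn:
        cur = conn.execute("UPDATE people SET name = lower(trim(name))")
    return cur.rowcount  # number of rows the statement touched

updated = update_set_based(conn)
names = [r[0] for r in conn.execute("SELECT name FROM people ORDER BY id")]
```

If the operation cannot be written in SQL, the same idea still applies in a weaker form: compute the new values in the application and send them back in large batches within few transactions, rather than one round trip per row.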
All depends on the database, and the hardware it is running on.
Multiple connections help only if the database can make use of concurrent processing and avoids contention on shared resources (e.g. page-based locks would span multiple records; record-based locks would not). Shared resources in this case include hardware: a single-core box will not be able to execute multiple CPU-intensive activities (e.g. parsing SQL) truly in parallel.
Network latency is something you might help alleviate with concurrent inserts even if the database is itself not able to exploit concurrency.
As with any question of performance, there is no substitute for testing in your specific scenario.
If possible, try to use a stored procedure to do all the processing and update the records.