C# and SQL Server: multithreading, connections and transactions

I'm looking for a solution to a thorny problem.
My colleagues and I have built an 'engine' (in C#) that performs various processing tasks on a SQL Server database.
Initially, this processing lived in many stored procedures called in series in a nightly batch. It was a system with many flaws.
Now we have extracted every single query from each stored procedure and, strange as it may sound, we have stored the queries in the database.
(Note: the reasons are different and I'm not listing them all, but you just need to know that, for business reasons, we do not have the opportunity to make frequent software releases... but we have a lot of freedom with SQL scripts).
In short, the logic behind our engine is:
there are Phases, which are run sequentially
each Phase contains several Steps, which are grouped into Sets
a Set is a group of Steps that are executed sequentially
the Sets, unless otherwise specified, run in parallel with each other
a Step that does not belong to any Set by default is wrapped in a Set of its own (created at runtime)
before starting, a Set may have to wait for the completion of one or more Steps
a Step corresponds to an atomic (or nearly atomic) SQL query or C# method to run
at startup the engine queries the database, then builds the Phases, Steps and Sets (and related configuration), which are then executed
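For readers who want something concrete, the execution model described above might be sketched roughly as follows. This is only an illustration under assumptions, not the engine's actual code: the names Step, StepSet, WaitForSteps and PhaseRunner are hypothetical, each Step is reduced to a plain delegate, and the sketch assumes C# 9 or later (records).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical model types (not the engine's real classes).
public sealed record Step(string Name, Action Run);
public sealed record StepSet(IReadOnlyList<Step> Steps, IReadOnlyList<string> WaitForSteps);

public static class PhaseRunner
{
    // Runs one Phase: Sets start in parallel, Steps inside a Set run in order,
    // and a Set first waits for the Steps listed in WaitForSteps.
    public static async Task<IReadOnlyList<string>> RunPhaseAsync(IReadOnlyList<StepSet> sets)
    {
        // One completion signal per Step, so that other Sets can wait on it.
        var finished = sets.SelectMany(s => s.Steps).ToDictionary(
            step => step.Name,
            _ => new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously));
        var completedOrder = new ConcurrentQueue<string>();

        var running = sets.Select(set => Task.Run(async () =>
        {
            try
            {
                foreach (var name in set.WaitForSteps)
                    await finished[name].Task;           // wait for prerequisite Steps

                foreach (var step in set.Steps)          // sequential inside the Set
                {
                    step.Run();
                    completedOrder.Enqueue(step.Name);
                    finished[step.Name].SetResult(true);
                }
            }
            catch (Exception ex)
            {
                // Fault every unfinished Step of this Set so that waiting Sets do not hang.
                foreach (var step in set.Steps)
                    finished[step.Name].TrySetException(ex);
                throw;
            }
        })).ToList();

        await Task.WhenAll(running);                     // rethrows if any Set failed
        return completedOrder.ToArray();
    }
}
```

In this sketch a failing Step faults its whole Set and every Set waiting on it, which is the point where a transactional Phase would have to decide to roll back.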
We have created the engine, we have all the configurations... and everything works.
However, we have a need: some phases must have a transaction. If even a single step of that phase fails, we need to rollback the entire phase.
What creates problems is the management of the transaction.
Initially we created a single transaction and connection for the entire phase, but we soon realized that - because of multithreading - this is not thread-safe.
In addition, after several tests, we have had exceptions regarding the transaction. Apparently, when a phase contains a LOT of steps (= many database queries), the same transaction cannot execute any further statements.
So we changed the design: each step in a phase that requires a transaction now opens its own connection and its own transaction, and if everything goes well, they all commit (otherwise they all roll back).
It works. However, we have noticed a limitation: the use of temporary tables.
In a transactional phase, when I create a temporary table (#TempTable1) in step x, I can't use #TempTable1 in the next step y (SELECT TOP 1 1 FROM #TempTable1).
This is logical: step y runs on a separate connection, and #TempTable1 is visible only to the session that created it and is dropped when that session ends.
Then we tried a global temp table, ##TempTable2, but in step y the SELECT blocks until the timeout expires.
I also tried lowering the transaction isolation level, but that did not help.
Now we are in the unfortunate situation of having to create real tables instead of using temporary tables.
I'm looking for a compromise between using transactions across a large number of steps and using temporary tables. I believe the crux of the matter is how the transactions are managed. Suggestions?

Related

Select query blocking database table

Our production setup is that we have an application server with applications connecting to a SQL Server 2016 database. On the application server there are several IIS applications which run under a GMSA account. The GMSA account has db_datawriter and db_datareader privileges on the database.
Our team have db_datareader privileges on the same SQL Server database. We require this for production support purposes.
We recently had an incident where a team member invoked a query on SQL Server Management Studio on their local machine:
SELECT * FROM [DatabaseA].[dbo].[TableA] order by CreateDt desc;
TableA has about 1.4m records and there are multiple blob type columns. CreateDt is a DATETIME2 type column.
We have RedGate SQL Monitor configured for the SQL Server Database Server. This raised a long-running query alert that ran for 1738 seconds.
At the same time one of our web applications (.NET 4.6) which exclusively inserts new records to TableA was experiencing constant query timeout errors:
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
These errors occurred for almost the exact same 1738 second period. This leads me to believe these are connected.
My understanding is that a SELECT query only creates a Shared lock and would not block access to this table for another connection. Is my understanding correct here?
My question is: is db_datareader safe for team members? Is there a lesser privilege that would allow reading data but leave absolutely no way of creating blocking behaviour?

The presence of SELECT * (SELECT star) in a query generally prevents the use of an index and leads to a scan of the table.
With many LOBs (BLOBs, CLOBs or NCLOBs) and many rows, the ORDER BY clause will take a long time to:
generate the entries
sort them on CreateDt
So a read lock (shared lock) is held while all the data of the table is read. This lock is compatible with other shared locks but prevents an exclusive lock from being taken to modify data (INSERT, UPDATE, DELETE). This guarantees to other users that the data will not be modified while it is being read.
This locking technique is known as pessimistic locking. The locks are taken before the data is read and released afterwards. So readers block writers, and writers block everyone.
The other technique that SQL Server supports, called optimistic locking, works on a copy of the data without taking any locks, and verifies at the end of the execution that the data involved in writes has not been modified since the beginning. So there is less blocking...
To use optimistic locking you have the choice to allow it or to force it:
ALTER DATABASE CURRENT SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON;

In SQL Server, writers block readers, and readers block writers.
This query doesn't have a WHERE clause and will touch the entire table, probably starting with an IS (Intent Shared) lock and eventually escalating to a shared lock on the table, which updates/inserts/deletes cannot get past while it is held. This lock is likely held during the very long sort that the ORDER BY is causing.
It can be bypassed in several ways, but I don't assume you're actually after how, seeing as whoever ran the query was probably not really thinking straight anyway, and this is not a regular occurrence.
Nevertheless, here are some ways to bypass:
Read Committed Snapshot Isolation
With (nolock). But only if you don't really care about the data that is retrieved, as it can return rows twice, rows that were never committed and skip rows altogether.
Reducing the columns you return and reading from a non-clustered index instead.
But to answer your question, yes selects can block inserts.

Is there any way to wrap a SQL and Mongo update into a single transaction

Morning all,
I am working on a project where I am saving data to MongoDB and also to SQL Server (Entity Framework code first). There are a number of scenarios where I carry out a write to each database in a single block of code. I'm wondering, can any of you suggest a way to handle this in something similar to a transaction, such that if any part of the code block fails, the whole thing fails and rolls back?

I don't think there's any bullet-proof way of doing this since you not only have two separate connections but also two different architectures.
Assuming your SQL Server has the more complicated data model (the one that's more likely to fail for some reason) I came up with an approach in the past that worked for me:
1. Execute the operations sequentially, not both at the same time.
2. Execute the SQL statement first. If it fails, don't execute the MongoDB statement and you'll be consistent.
3. Should it succeed, execute the MongoDB statement next.
4. If the MongoDB statement fails, write an error log. Make sure the log is not on a remote machine, so that the possibility that the logging could fail is as small as possible.
You can later use the error log to either manually or automatically salvage your data. In any case you should implement a retry policy for all statements, since the most likely reason for a failed operation is (given your code is correct) a timing issue and retrying solves this in general. If you're doing it right there will be maybe like one exotic error a month.
Of course in step 4 you could try to revert the SQL operation, instead of (or in addition to) writing a log entry. But this is mostly cumbersome and leaves you with the question what to do should the revert fail.
Maybe there still is some magic middleware to integrate MongoDB statements into an SQL transaction but for the time being I would just acknowledge that data consistency and using different databases are opposing goals. Try to monitor errors closely and reduce the error potential.
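The sequence described in this answer (SQL first, then MongoDB, retry each statement, write a local log if MongoDB still fails) could be sketched along these lines. This is a minimal sketch under assumptions: DualWriter, DualWriteResult and the delegate parameters are hypothetical names, the two writes are passed in as plain delegates rather than real Entity Framework or MongoDB driver calls, and it assumes both writes are safe to retry.

```csharp
using System;
using System.Threading;

public enum DualWriteResult { BothSucceeded, SqlFailed, MongoFailedAndLogged }

public static class DualWriter
{
    // Runs the operation up to 'attempts' times; returns false if every attempt throws.
    private static bool TryWithRetry(Action operation, int attempts, TimeSpan delay)
    {
        for (int i = 1; i <= attempts; i++)
        {
            try
            {
                operation();
                return true;
            }
            catch (Exception) when (i < attempts)
            {
                Thread.Sleep(delay);   // most failures are assumed to be transient timing issues
            }
            catch (Exception)
            {
                return false;          // last attempt failed
            }
        }
        return false;
    }

    // writeSql, writeMongo and logLocally are supplied by the caller (hypothetical hooks).
    public static DualWriteResult Write(Action writeSql, Action writeMongo,
                                        Action<string> logLocally,
                                        int attempts = 3, int delayMs = 100)
    {
        var delay = TimeSpan.FromMilliseconds(delayMs);

        // Step 2: SQL first. If it fails, MongoDB is never touched, so both stay consistent.
        if (!TryWithRetry(writeSql, attempts, delay))
            return DualWriteResult.SqlFailed;

        // Step 3: only now write to MongoDB.
        if (TryWithRetry(writeMongo, attempts, delay))
            return DualWriteResult.BothSucceeded;

        // Step 4: record the inconsistency locally so it can be repaired later.
        logLocally("MongoDB write failed after the SQL write succeeded");
        return DualWriteResult.MongoFailedAndLogged;
    }
}
```

The log entry is the only trace of the inconsistency, so in practice it would need enough detail (for example the record's key) to repair the data later.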

Two-phase commit will suit your scenario. In a single transaction we can hit any number of databases (normally two) and keep our data synchronized across them.
More info on Two Phase Commit
https://lostechies.com/jimmybogard/2013/05/09/ditching-two-phased-commits/
https://www.coursera.org/learn/data-manipulation/lecture/mXS0H/two-phase-commit-and-consensus-protocols
https://sankarsan.wordpress.com/tag/two-phase-commit/
Read this post
How to force only one transaction within multiple DbContext classes?

Deadlock on transaction with multiple tables

My scenario is common:
I have a stored procedure that needs to update multiple tables.
If one of the updates fails, all the updates should be rolled back.
The straightforward answer is to include all the updates in one transaction and just roll that back. However, in a system like ours, this will cause concurrency issues.
When we break the updates into multiple short transactions, we get a throughput of ~30 concurrent executions per second before deadlocking issues start to emerge.
If we put them in one transaction that spans all of them, we get ~2 concurrent executions per second before deadlocks show up.
In our case, we place a try-catch block after every short transaction and manually DELETE/UPDATE back the changes from the previous ones. So essentially we mimic the transaction behavior in a very expensive way...
It works all right, since it's well written and we don't get many "rollbacks"...
One thing this approach cannot resolve at all is a command timeout from the web server / client.
I have read extensively in many forums and blogs and scanned through MSDN and cannot find a good solution. Many have presented the problem, but I have yet to see a good solution.
The question is this: is there ANY solution to this issue that allows a stable rollback of updates to multiple tables, without requiring an exclusive lock on all of the rows for the entire duration of the long transaction?
Assume that it is not an optimization issue. The tables are probably almost at maximum optimization and can give very high throughput as long as deadlocks don't hit. There are no table locks, page locks, etc., only row locks on updates. But when you have so many concurrent sessions, some of them need to update the same row...
It can be via SQL, client-side C#, server-side C# (extending SQL Server?).
Is there such a solution in any book/blog that I have not found?
We are using SQL Server 2008 R2, with a .NET client/web server connecting to it.
Code example:
Create procedure sptest
Begin transaction
Update table1
Update table2
Commit transaction
In this case, if sptest is run twice, the second instance cannot update table 1 until instance 1 has committed.
Compared to this
Create sptest2
Update table1
Update table2
Sptest2 has a much higher throughput, but it has a chance of corrupting the data.
This is what we are trying to solve. Is there even a theoretical solution to this?
Thanks,
JS

I would say that you should dig deeper to find out why the deadlocks occur. Possibly you should change the order of the updates to avoid them. Maybe some index is "guilty".
You cannot roll back changes if other transactions can change the same data in the meantime, so you need to hold update locks on it. But you can use the snapshot isolation level to allow consistent reads before the update commits.

For all inner-joined tables that are mostly static, or where reading dirty data is very unlikely to affect the query, you can apply:
INNER JOIN LookupTable lut WITH (NOLOCK) ON lut.ID = SomeOtherTableID
This tells the query that I do not care about uncommitted updates made to LookupTable.
This can reduce your issue in most cases. For more difficult deadlocks I have implemented a deadlock graph that is generated and emailed when a deadlock occurs and contains all the detailed info for the deadlock.

How do I minimize or inform users of database connection lag / failure?

I'm maintaining an ASP/C# program that uses MS SQL Server 2008 R2 for its database requirements.
On normal and perfect days, everything works fine as it is. But we don't live in a perfect world.
An Application (for Leave, Sick Leave, Overtime, Undertime, etc.) Approval process requires up to ten separate connections to the database. The program connects to the database, passes around some relevant parameters, and uses stored procedures to do the job. Ten times.
Now, due to the structure of the entire thing, which I can not change, a dip in the connection, or heck, if I put a debug point in VS2005 and let it hang there long enough, the Application Approval Process goes incomplete. The tables are often just joined together, so a data mismatch - a missing data here, a primary key that failed to update there - would mean an entire row would be useless.
Now, I know that there is nothing I can do to prevent this - this is a connection issue, after all.
But are there ways to minimize connection lag / failure? Or a way to inform the users that something went wrong with the process? A rollback changes feature (either via program, or SQL), so that any incomplete data in the database will be undone?
Thanks.

"But are there ways to minimize connection lag / failure? Or a way to inform the users that something went wrong with the process? A rollback changes feature (either via program, or SQL), so that any incomplete data in the database will be undone?"
As we discussed in the comments, transactions will address many of your concerns.
"A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous."
Source
Transactions in .Net
As you might expect, the database is integral to providing transaction support for database-related operations. However, creating transactions from your business tier is quite easy and allows you to use a single transaction across multiple database calls.
Quoting from my answer here:
I see several reasons to control transactions from the business tier:
Communication across data store boundaries. Transactions don't have to be against a RDBMS; they can be against a variety of entities.
The ability to rollback/commit transactions based on business logic that may not be available to the particular stored procedure you are calling.
The ability to invoke an arbitrary set of queries within a single transaction. This also eliminates the need to worry about transaction count.
Personal preference: c# has a more elegant structure for declaring transactions: a using block. By comparison, I've always found transactions inside stored procedures to be cumbersome when jumping to rollback/commit.
Transactions are most easily declared using the TransactionScope (reference) abstraction which does the hard work for you.
using (var ts = new TransactionScope())
{
    // do some work here that may or may not succeed

    // if this line is reached, the transaction will commit. If an exception is
    // thrown before this line is reached, the transaction will be rolled back.
    ts.Complete();
}
Since you are just starting out with transactions, I'd suggest testing out a transaction from your .Net code.
Call a stored procedure that performs an INSERT.
After the INSERT, purposely have the procedure generate an error of any kind.
You can validate your implementation by seeing that the INSERT was rolled back automatically.
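If no database is at hand, the same commit-or-rollback check can be tried with an in-memory stand-in. The sketch below is an assumption-laden illustration rather than the setup described above: RecordingResource and ScopeDemo are hypothetical names, and the "INSERT" is replaced by a volatile resource enlisted through System.Transactions that simply records which outcome the transaction manager reports to it.

```csharp
using System;
using System.Transactions;

// Hypothetical stand-in for the database: it only records the transaction outcome.
public sealed class RecordingResource : IEnlistmentNotification
{
    public string Outcome { get; private set; } = "pending";

    public void Prepare(PreparingEnlistment preparingEnlistment) => preparingEnlistment.Prepared();
    public void Commit(Enlistment enlistment) { Outcome = "committed"; enlistment.Done(); }
    public void Rollback(Enlistment enlistment) { Outcome = "rolled back"; enlistment.Done(); }
    public void InDoubt(Enlistment enlistment) { Outcome = "in doubt"; enlistment.Done(); }
}

public static class ScopeDemo
{
    // Simulates "do the INSERT, then optionally fail" inside a TransactionScope.
    public static string Run(bool failAfterInsert)
    {
        var resource = new RecordingResource();
        try
        {
            using (var ts = new TransactionScope())
            {
                // Stand-in for the INSERT: take part in the ambient transaction.
                Transaction.Current.EnlistVolatile(resource, EnlistmentOptions.None);

                if (failAfterInsert)
                    throw new InvalidOperationException("simulated error after the INSERT");

                ts.Complete(); // not reached on failure, so the scope rolls back on dispose
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed only so the demo can report the outcome.
        }
        return resource.Outcome;
    }
}
```

With a real SqlConnection opened inside the scope, the connection enlists in the ambient transaction in the same way, which is why the failed INSERT disappears.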
Transactions in the Database
Of course, you can also declare transactions inside a stored procedure (or any sort of TSQL statement). See here for more information.

If you use the same SqlConnection, or another connection type that implements IDbConnection, you can do something similar to a TransactionScope but without the need to create the security risk that is a TransactionScope.
In VB:
Using scope As IDbTransaction = mySqlCommand.Connection.BeginTransaction()
    If blnEverythingGoesWell Then
        scope.Commit()
    Else
        scope.Rollback()
    End If
End Using
If you don't call Commit, the default is to roll back the transaction.

How to rollback transaction at later stage?

I have a data entry ASP.NET application. During one complete data entry session, many transactions occur. I would like to keep track of all those transactions so that, if the user wants to abandon the data entry, all the transactions I have been keeping a record of can be rolled back.
I am using SQL Server 2008, .NET Framework 4.0 and C#.

This is always a tough lesson to learn for people that are new to web development. But here it is:
Each round trip web request is a separate, stand-alone thread of execution
That means, simply put, each time you submit a page request (click a button, navigate to a new page, even refresh a page) then it can run on a different thread than the previous one. What's more, even if you do get the same thread twice, several other web requests may have been processed by the thread in the time between your two requests.
This makes it effectively impossible to span simple transactions across more than one web request.
Here's another concept that you should keep in mind:
Transactions are intended for batch operations, not interactive operations.
What this means is that transactions are meant to be short-lived, and to encompass several operations executing sequentially (or simultaneously) in which all operations are atomic, and intended to either all complete, or all fail. Transactions are not typically designed to be long-lived (meaning waiting for a user to decide on various actions interactively).
Web apps are not desktop apps. They don't function like them. You have to change your thinking when you do web apps. And the biggest lesson to learn, each request is a stand-alone unit of execution.
Now, above, I said "simple transactions", also known as lightweight or local transactions. There's also what's known as a Distributed Transaction, and using those requires a Distributed Transaction Coordinator. MSDTC is pretty commonly used. However, distributed transactions perform much more slowly than lightweight ones. Also, they require that the infrastructure be set up to use a DTC.
It's possible to span a transaction over web requests using a DTC. This is done by "enlisting" in a Distributed Transaction, and then somehow sharing this transaction identifier between requests. But this is a lot of work to set up and deal with, and it has a lot of error-prone situations. It's not something you want to do if you have other options.
In general, you're better off adding the data to a temporary table or tables, and then when the final save is done, transfer that data to the permanent tables. Another option is to maintain some state (such as using ViewState or Session) to keep track of the changes.
One popular way of doing this is to perform operations client-side using JavaScript and then submitting all the changes to the server when you are done. This is difficult to implement if you need to navigate to different pages, however.

From your question, it appears that the transactions are complete when the user exercises the option to roll them back. In such cases, I doubt if the DBMS's transaction rollback semantics would be available. So, I would provide such semantics at the application layer as follows:
Any atomic operation that can be performed on the database should be encapsulated in a Command object. Each command will implement the undo method that would revert the action performed by its execute method.
Each transaction would contain a list of commands that were run as part of it. The transaction is persisted as is for further operations in future.
The user would be provided with a way to view these transactions that can be potentially rolled back. Upon selection of a transaction by user to roll it back, the list of commands corresponding to such a transaction are retrieved and the undo method is called on all those command objects.
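The Command-object idea above might look roughly like this. It is a minimal sketch under assumptions: IUndoableCommand, DelegateCommand and RecordedTransaction are hypothetical names, the commands act on an in-memory list instead of a database, persisting the command list is left out, and undoing in reverse order of execution is a design choice of this sketch rather than something stated above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract: every atomic operation knows how to revert itself.
public interface IUndoableCommand
{
    void Execute();
    void Undo();
}

// Small helper so the example does not need one class per operation.
public sealed class DelegateCommand : IUndoableCommand
{
    private readonly Action _execute;
    private readonly Action _undo;

    public DelegateCommand(Action execute, Action undo)
    {
        _execute = execute;
        _undo = undo;
    }

    public void Execute() => _execute();
    public void Undo() => _undo();
}

// Keeps the commands that ran, so they can be reverted at a later stage.
public sealed class RecordedTransaction
{
    private readonly Stack<IUndoableCommand> _executed = new Stack<IUndoableCommand>();

    public void Run(IUndoableCommand command)
    {
        command.Execute();
        _executed.Push(command);   // recorded only if Execute did not throw
    }

    // Application-level "rollback": undo the most recent command first.
    public void RollBack()
    {
        while (_executed.Count > 0)
            _executed.Pop().Undo();
    }
}
```

In a real application each Undo would issue the compensating SQL (for example a DELETE for an earlier INSERT), and the recorded commands would have to be stored somewhere that survives between web requests.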
HTH.

You can also store them in a temporary table and move those records to your original table "at a later stage".

If you are just managing transactions during a single save operation, use TransactionScope. But it doesn't sound like that is the case.
If the user may wish to abandon n number of previous save operations, it suggests that an item may exist in draft form. There might be one working draft or many. Subsequently, there must be a way to promote a draft to a final version, either implicitly or explicitly. Think of how an email program saves a draft. It doesn't actually send your message, you may abandon it at any time, and you may recall it at a later time. When you send the message, you have "committed the transaction".
You might also add a user interface to rollback to a specific version.
This will be a fair amount of work, but if you are willing to save and manage multiple copies of the same item it can be accomplished.
You may save a copy of the same data in the same schema, using a status flag to indicate that it is a draft, or you might store the data in an intermediate format in separate table(s). I would prefer the first approach in that it allows the same structures to be used.
