I'm having a major performance issue with LINQ2SQL and transactions. My code does the following using IDE-generated LINQ2SQL code:
Run a stored proc checking for an existing record
Create the record if it doesn't exist
Run a stored proc that wraps its own code in a transaction
When I run the code with no transaction scope, I get 20 iterations per second. As soon as I wrap the code in a transaction scope, it drops to 3-4 iterations per second. I don't understand why the addition of a transaction at the top level reduces the performance by so much. Please help?
Pseudo stored proc with transaction:
begin transaction
update some_table_1;
insert into some_table_2;
commit transaction;
select some, return, values
Pseudo LINQ code without transaction:
var db = new SomeDbContext();
var exists = db.RecordExists(some arguments);
if (!exists)
{
    var record = new SomeRecord
    {
        // Assign property values
    };
    db.RecordsTable.InsertOnSubmit(record);
    db.SubmitChanges();
    var result = db.SomeStoredProcWithTransactions();
}
Pseudo LINQ code with transaction:
var db = new SomeDbContext();
var exists = db.RecordExists(some arguments);
if (!exists)
{
    using (var ts = new TransactionScope())
    {
        var record = new SomeRecord
        {
            // Assign property values
        };
        db.RecordsTable.InsertOnSubmit(record);
        db.SubmitChanges();
        var result = db.SomeStoredProcWithTransactions();
        ts.Complete();
    }
}
I know the transaction isn't being escalated to the DTC because I've disabled the DTC. SQL Profiler shows that several of the queries take much longer with the TransactionScope enabled, but I'm not sure why. The queries involved are very short-lived and I've verified that my indexes are being used. I'm unable to determine why the addition of a parent transaction causes so much degradation in performance.
Any ideas?
EDIT:
I've traced the problem to the following query within the final stored procedure:
if exists
(
    select * from entries where
        ProfileID = @ProfileID and
        Created >= @PeriodStart and
        Created < @PeriodEnd
) set @Exists = 1;
If I add with(nolock) as shown below, the problem disappears.
if exists
(
    select * from entries with(nolock) where
        ProfileID = @ProfileID and
        Created >= @PeriodStart and
        Created < @PeriodEnd
) set @Exists = 1;
However, I'm concerned that doing so may cause problems down the road. Any advice?
One big thing changes as soon as you get a transaction: the isolation level. Is your database under heavy contention? If so: by default a TransactionScope uses the highest, "serializable", isolation level, which involves read locks, key-range locks, etc. If it can't acquire those locks immediately, it will slow down while it is blocked. You could investigate by reducing the isolation level of the transaction (via the constructor). For example (but pick your own isolation level):
using (var tran = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.Snapshot }))
{
    // code
    tran.Complete();
}
However, picking an isolation level is... tricky; serializable is the safest (hence the default). You can also use granular hints (but not via LINQ-to-SQL) such as NOLOCK and UPDLOCK to help control locking of specific tables.
You could also investigate whether the slowdown is due to trying to talk to DTC. Enable DTC and see if it speeds up. The LTM is good, but I've seen composite operations to a single database escalate to DTC before...
Although you are using a single datacontext, your code sample is likely to use more than one connection and that will escalate your transaction to a distributed transaction.
Try initializing your datacontext with an explicit db connection, or call db.Connection.Open() right after creating the datacontext. That removes the overhead of distributed transactions...
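A minimal sketch of that suggestion, reusing the hypothetical SomeDbContext from the question:

var db = new SomeDbContext();
db.Connection.Open(); // keep one open connection for the context's lifetime

using (var ts = new TransactionScope())
{
    // ... RecordExists check, InsertOnSubmit, SubmitChanges, stored proc call ...
    // With a single connection held open, LINQ to SQL doesn't open and close
    // a connection per operation, which is what can trigger DTC escalation.
    ts.Complete();
}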
Does the Stored Procedure you call participate in the ambient (parent) transaction? - that is the question.
It's likely that the stored procedure participates in the ambient transaction, which is causing the degradation. There's an MSDN article here discussing how they interrelate.
From the article:
"When a TransactionScope object joins an existing ambient transaction, disposing of the scope object may not end the transaction, unless the scope aborts the transaction. If the ambient transaction was created by a root scope, only when the root scope is disposed of, does Commit get called on the transaction. If the transaction was created manually, the transaction ends when it is either aborted, or committed by its creator."
There's also a serious-looking document on nested transactions, which looks directly applicable, located on MSDN here.
Note:
"If TransProc is called when a transaction is active, the nested transaction in TransProc is largely ignored, and its INSERT statements are committed or rolled back based on the final action taken for the outer transaction."
I think that explains the difference in performance - it's essentially the cost of maintaining the parent transaction. Kristofer's suggestion may help to reduce the overhead.
Related
What is the difference between these two pieces of code:
using (TransactionScope tran = new TransactionScope())
{
    using (Entities ent = new Entities())
    {
and
using (Entities ent = new Entities())
{
    using (TransactionScope tran = new TransactionScope())
    {
Does line order matter?
Thanks
In this case it wouldn't matter. Entities looks like a model/data class, which cannot (and needn't) enlist in a transaction itself, so either order works. If anything goes wrong in an Entities operation, read or DML, the transaction will still abort, provided the operation takes place inside the ambient transaction context. The objects that actually do the work, such as an IDbConnection, are enlisted in the transaction automatically when opened inside the transaction context (unless enlistment is explicitly disabled); a connection created outside the transaction context is not enlisted automatically and needs explicit enlisting in the transaction context.
In Short
For your current code it doesn't matter, up to the point where you deal with an object that needs transaction enlistment, such as an IDbConnection. Of the two snippets my preference would be the first, where the ambient transaction encompasses everything: every object that needs to enlist does so automatically, and we don't leave any object out by chance.
Important Difference
One thing you may want to be careful about is whether you need to access the Entities outside the transaction context: that is not possible with the first option, but is no problem with the second.
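To illustrate (the table name is made up):

// Option 2: the context outlives the scope, so it remains usable
// after the transaction has completed.
using (Entities ent = new Entities())
{
    using (TransactionScope tran = new TransactionScope())
    {
        // transactional work against ent ...
        tran.Complete();
    }

    // Still valid here: plain, non-transactional reads.
    var check = ent.SomeTable.FirstOrDefault(); // SomeTable is hypothetical
}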
Yes the order matters. Or rather we can't say it doesn't matter without looking at your code.
DbConnection instances will enlist in an ambient transaction if one exists when they are opened.
Your DbContext constructor might open the underlying DbConnection, in which case the two patterns differ.
The first one is the normal pattern, and you should stick to that.
Also, if you are using SQL Server, don't use the default constructor of TransactionScope. See Using new TransactionScope() Considered Harmful.
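In short, that means always passing explicit TransactionOptions, along these lines (ReadCommitted shown; pick the level your code actually needs):

using (var tran = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted }))
{
    // work
    tran.Complete();
}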
I use Entity Framework to process long-running tasks (10-30 seconds on average). I have many instances of workers; each worker fetches the next task id from a database table and with that id gets the work description.
Of course, the access to the task table must be serialized so that each request from a worker gets a new id. I thought this would do it:
static int? GetNextDetailId()
{
    int? id = null;
    using (var ctx = Context.GetContext())
    using (var tsx = ctx.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
    {
        var obj = ctx.DbsInstrumentDetailRaw
                     .Where(x => x.ProcessState == ProcessState.ToBeProcessed)
                     .FirstOrDefault();
        if (obj != null)
        {
            id = obj.Id;
            obj.ProcessState = ProcessState.InProcessing;
            ctx.SaveChanges();
        }
        tsx.Commit();
    }
    return id;
} // GetNextDetailId
Unfortunately when I run it with 10 workers I nearly immediately get
Transaction (Process ID 65) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
I do not have any explanation for this behavior. I know deadlock situations: we have two or more resources and two or more processes that try to acquire the resources in different orders. But here we only have one resource! All I want is for the processes to have sequential access to this resource. So if A has a transaction open, B should simply wait until A commits or rolls back. That seems not to happen here.
Can someone please
Shed some light on what is going on here, to educate me.
Give a ("THE?") solution to the problem. I assume this problem must be very common in programming.
Thanks
Martin
You can verify this by using SQL Profiler to sniff the SQL statements being executed on your SQL Server. The likely issue is that, even though you are inside a transaction with the isolation level set to Serializable, an exclusive lock is not issued; what happens is that two threads access the same row at the same time and both then try to update it.
The best advice I've seen is that, if you need to control locking at this level, execute a stored procedure or SQL instead of attempting to use LINQ.
Locking a table with a select in Entity Framework
shed some light what is going on here, to educate me.
ctx.DbsInstrumentDetailRaw.Where ... acquires a shared lock on the table. With serializable isolation level, this lock is held until the transaction is committed or rolled back.
ctx.SaveChanges() needs an exclusive lock to update the row.
Two or more transactions can simultaneously get a shared lock in step 1, but then none of them can get an exclusive lock in step 2. Deadlock.
Give a ( "THE?" ) solution to the problem.
I can think of 2 ways to solve this problem.
Change the order of operations: update one row, then return it. You will have to use a stored procedure to do it in EF (see the sketch after this list).
Use a lower isolation level (e.g. read committed) and optimistic concurrency. You will not get deadlocks (shared locks are released immediately after the select). When two workers try to update the same row, one of them will get a concurrency exception.
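A hedged sketch of the first option: the claim can be done as a single atomic UPDATE that returns the claimed id, issued here as raw SQL through EF6 (table, column and enum names are borrowed from the question and may need adjusting):

static int? GetNextDetailId()
{
    // Atomically claim one pending row and return its id: there is no
    // separate select/update window in which a deadlock can form.
    const string claimSql =
        @"update top (1) InstrumentDetailRaw
          set ProcessState = @p0
          output inserted.Id
          where ProcessState = @p1;";

    using (var ctx = Context.GetContext())
    {
        return ctx.Database.SqlQuery<int?>(claimSql,
                (int)ProcessState.InProcessing,   // @p0
                (int)ProcessState.ToBeProcessed)  // @p1
            .FirstOrDefault();
    }
}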
OK, I now do this:
select 1 from InstrumentDetailRaw with (tablockx, holdlock) where 0 = 1
at the beginning of my transaction, which, according to this post:
Locking a table with a select in Entity Framework
does the trick. No deadlocks with 10 workers running for hours now.
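For reference, this is roughly how that statement can be issued from EF at the start of the transaction (a sketch, assuming the same EF6 context as above):

using (var ctx = Context.GetContext())
using (var tsx = ctx.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
{
    // Take an exclusive table lock up front and hold it until the
    // transaction ends; "where 0 = 1" means no rows are actually read.
    ctx.Database.ExecuteSqlCommand(
        "select 1 from InstrumentDetailRaw with (tablockx, holdlock) where 0 = 1");

    // ... select / update the next row as in GetNextDetailId ...
    tsx.Commit();
}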
I need to perform an atomic operation: check the value of a field on an Entity Framework model and increment it if its value is 0.
I thought about transactions, something like:
bool controlPassed = false;
using (TransactionScope scope = new TransactionScope())
{
    var model = ...ModelEntities.FirstOrDefault(...);
    if (model.field == 0)
    {
        ++model.field;
        ...SaveChanges();
        controlPassed = true;
    }
    scope.Complete();
}
if (controlPassed)
{
    ...
    using (TransactionScope scope = new TransactionScope())
    {
        --model.field;
        ...SaveChanges();
        scope.Complete();
    }
}
Of course, everything in try catch and so on.
My question is: how would it work?
It is really hard to check.
I have multithreaded application.
Is there a possibility, that two or more threads would pass control (check that field == 0 and increase it)?
What would be blocked in the database (the database, a table, a row, a field)?
I can't let two or more threads to be in controlPassed section simultaneously.
Is there a possibility, that two or more threads would pass control
(check that field == 0 and increase it)?
You have a Serializable transaction (the default for TransactionScope). It means that two threads can both see field == 0, but immediately after that a deadlock happens, because the transaction for the first thread holds a shared lock on the field and the transaction for the second thread holds another shared lock on the same field. Neither of these transactions can upgrade its lock to exclusive to save changes, because it is blocked by the shared lock in the other transaction. I think the same would happen for the RepeatableRead isolation level.
If you change the isolation level to ReadCommitted (the default for SaveChanges without TransactionScope when using MS SQL Server) the answer is again yes, but this time without a deadlock, because EF uses plain selects without any table hints, which means no lock on the record is held once the select completes. Only saving changes (modification) locks records until the transaction commits or rolls back.
To lock a record with a select in a ReadCommitted transaction you must use a native SQL query with the UPDLOCK table hint (it locks the record for update during the select and holds the lock until the transaction ends). EF queries do not support table hints.
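A sketch of that workaround, assuming a DbContext-based model with a Models set, a key column Id, and an id variable in scope (names are placeholders taken from the question's pseudo-code):

using (var scope = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted }))
using (var ctx = new ModelEntities())
{
    // UPDLOCK takes an update lock during the select and holds it until
    // the transaction ends, so only one thread at a time passes the check.
    var model = ctx.Models.SqlQuery(
        "select top (1) * from Models with (updlock) where Id = @p0", id)
        .FirstOrDefault();

    if (model != null && model.field == 0)
    {
        ++model.field;
        ctx.SaveChanges();
    }
    scope.Complete();
}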
Edit: I wrote a long article about pessimistic concurrency which describes why your solution doesn't work and what must be changed to make it work correctly.
I want to understand the trade-off/downside of using TransactionScopeOption.RequiresNew with Entity Framework (w/ SQL Server 2008): what are the reasons why we should not always use RequiresNew?
Regards.
You should use Required, not RequiresNew. RequiresNew means every operation will use a new transaction, even if there is already an encompassing transaction scope. This will certainly lead to deadlocks. Even with Required there is another serious problem with TransactionScope, namely that by default it creates a Serializable transaction, which is a horribly bad choice and yet another shortcut to deadlock hell and poor scalability. See using new TransactionScope() Considered Harmful. You should always create a transaction scope with explicit TransactionOptions, setting the isolation level to ReadCommitted, which is a much, much saner isolation level:
using (TransactionScope scope = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions
    {
        IsolationLevel = IsolationLevel.ReadCommitted
    }))
{
    // do work here
    ...
    scope.Complete();
}
I just wanted to add that in a couple of specific cases the method I've written runs inside a parent transaction scope that may or may not be closed with scope.Complete(). In those cases I didn't want to be dependent on the parent transaction, so we needed to use RequiresNew.
In general though I agree it's not necessary, and ReadCommitted should be used.
http://msdn.microsoft.com/en-us/library/ms172152(v=vs.90).aspx
I'm writing an integration test where I will be inserting a number of objects into a database and then checking to make sure whether my method retrieves those objects.
My connection to the database is through NHibernate, and my usual method of creating such a test would be the following:
NHibernateSession.BeginTransaction();
//use nhibernate to insert objects into database
//retrieve objects via my method
//verify actual objects returned are the same as those inserted
NHibernateSession.RollbackTransaction();
However, I've recently found out about TransactionScope which apparently can be used for this very purpose...
Some example code I've found is as follows:
public static int AddDepartmentWithEmployees(Department dept)
{
    int res = 0;
    DepartmentAdapter deptAdapter = new DepartmentAdapter();
    EmployeeAdapter empAdapter = new EmployeeAdapter();
    using (TransactionScope txScope = new TransactionScope())
    {
        res += deptAdapter.Insert(dept.DepartmentName);
        // Custom method made to return the Department ID
        // after inserting the department ("identity column")
        dept.DepartmentID = deptAdapter.GetInsertReturnValue();
        foreach (Employee emp in dept.Employees)
        {
            emp.EmployeeDeptID = dept.DepartmentID;
            res += empAdapter.Insert(emp.EmployeeName, emp.EmployeeDeptID);
        }
        txScope.Complete();
    }
    return res;
}
I believe that if I don't include the line txScope.Complete(), the inserted data will be rolled back. But unfortunately I don't understand how that is possible. How does the txScope object keep track of the deptAdapter and empAdapter objects and their transactions on the database?
I feel like I'm missing some information here. Am I really able to replace my BeginTransaction() and RollbackTransaction() calls by surrounding my code with a TransactionScope?
If not, how then does TransactionScope work to roll back transactions?
Essentially, TransactionScope doesn't track your adapters; what it tracks is database connections. When you open a DB connection, the connection will look for an ambient transaction (TransactionScope) and, if there is one, enlist with it. Caution: if there is more than one connection to the same SQL Server, the transaction will escalate to a distributed transaction.
Because you're using a using block, you ensure that Dispose will be called even if an exception occurs. So if Dispose is called before txScope.Complete(), the TransactionScope will tell the connections to roll back their transactions (or tell the DTC to do so).
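In other words, with the question's code (a minimal sketch):

using (TransactionScope txScope = new TransactionScope())
{
    res += deptAdapter.Insert(dept.DepartmentName); // connection enlists here

    // If an exception is thrown anywhere in the block, Dispose runs
    // without Complete() having been called, and the enlisted
    // connections roll the work back.

    txScope.Complete(); // reached only on success; commit happens on Dispose
}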
The TransactionScope class works with the Transaction class, which is thread-specific.
When the TransactionScope is created, it checks to see if there is a Transaction for the thread; if one exists then it uses that, otherwise, it creates a new one and pushes it onto the stack.
If it uses an existing one, then it just increments a counter for releases (since you have to call Dispose on it). On the last release, if the Transaction was not committed, it rolls back all the work.
As for why classes seem to magically know about transactions, that is left as an implementation detail for those classes that wish to work with this model.
When you create your deptAdapter and empAdapter instances, they check to see if there is a current transaction on the thread (the static Current property on the Transaction class). If there is, they register themselves with the Transaction to take part in the commit/rollback sequence (which the Transaction controls, and might propagate to various transaction coordinators, such as kernel, distributed, etc.).
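For illustration, this is roughly what such a class does under the hood, using the real System.Transactions enlistment API (the class itself is made up):

using System.Transactions;

class MyAdapter : IEnlistmentNotification
{
    public MyAdapter()
    {
        // Look for the thread's ambient transaction...
        Transaction current = Transaction.Current;
        if (current != null)
        {
            // ...and ask to be notified during the two-phase commit.
            current.EnlistVolatile(this, EnlistmentOptions.None);
        }
    }

    public void Prepare(PreparingEnlistment e) { e.Prepared(); }        // vote yes
    public void Commit(Enlistment e)   { /* make work permanent */ e.Done(); }
    public void Rollback(Enlistment e) { /* undo pending work */ e.Done(); }
    public void InDoubt(Enlistment e)  { e.Done(); }
}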