What are the rules for how a linq-to-sql datacontext keeps the database connection open?
The question came up when we ran a few performance tests comparing one SubmitChanges() per updated entity against a single SubmitChanges() for the entire batch of entities. Results:
Inserting 3000 items in one SubmitChanges() call... Duration: 1318ms
Inserting 3000 items in one SubmitChanges() call, within a TransactionScope... Duration: 1280ms
Inserting 3000 items in individual SubmitChanges() calls... Duration: 4377ms
Inserting 3000 items in individual SubmitChanges() calls, within a transaction... Duration: 2901ms
Note that when doing an individual SubmitChanges() for each changed entity, putting everything within a transaction improves performance, which was quite unexpected to us. In SQL Server Profiler we can see that the individual SubmitChanges() calls within the transaction do not reset the DB connection for each call, unlike the run without the transaction.
In what cases does the data context keep the connection open? Is there any detailed documentation available on how linq-to-sql handles connections?
You aren't showing the entire picture; LINQ to SQL wraps a call to SubmitChanges in a transaction by default. If you wrap it in another transaction yourself, then you won't see the connection reset; it can't be reset until all of the SubmitChanges calls are complete and the external transaction is committed.
There may be a number of factors that could be influencing the timings besides when connections are opened/closed.
edit: I've removed the bit about tracked entities after realizing how linq2sql manages the cached entities and the dirty entities separately.
You can get a good idea of how the connections are managed under the covers by using Reflector or some other disassembler to examine the methods on the SqlConnectionManager class. SubmitChanges will call ClearConnection on its IProvider (typically SqlProvider, which in turn uses SqlConnectionManager) after the submit if it wrapped the submit in its own transaction, but not if the SubmitChanges is part of a larger transaction. When the connection is opened and closed depends on whether there is other activity making use of the SqlConnectionManager.
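For reference, here is a rough sketch of the two call patterns being compared; MyDataContext, its Items table, the Item type and the newItems collection are placeholders rather than anything from the question:

using System.Collections.Generic;
using System.Transactions;

static void IndividualSubmits(IEnumerable<Item> newItems)
{
    // Each SubmitChanges runs in its own implicit transaction, so the provider
    // is free to clear/release the connection after every call.
    using (var db = new MyDataContext())
    {
        foreach (var item in newItems)
        {
            db.Items.InsertOnSubmit(item);
            db.SubmitChanges();
        }
    }
}

static void IndividualSubmitsInOneTransaction(IEnumerable<Item> newItems)
{
    // Every SubmitChanges enlists in the ambient transaction, so the connection
    // cannot be reset until scope.Complete() commits the whole batch.
    using (var scope = new TransactionScope())
    using (var db = new MyDataContext())
    {
        foreach (var item in newItems)
        {
            db.Items.InsertOnSubmit(item);
            db.SubmitChanges();
        }
        scope.Complete();
    }
}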
I messed about with this recently as well. Calling SubmitChanges 3000 times is not a good idea, but depending on how critical it is that each record gets inserted, you may want to do it anyway; after all, it only takes about 1000ms more.
The TransactionScope with multiple SubmitChanges calls is what I'd expect to see. Since you're still within one transaction, I'd expect SQL Server to handle this better, which it seems to. One SubmitChanges with an explicit or implicit TransactionScope seems to yield the same result, which is to be expected; there shouldn't be much of a performance difference there.
I think connections are created when needed, but remember that connections are pooled by the provider, so unless your connection string changes you will get the same connection pool, and therefore the same performance, regardless of approach. Since LINQ to SQL uses SqlConnection behind the scenes, some information about it is available here:
http://msdn.microsoft.com/en-us/library/8xx3tyca(VS.80).aspx
If you're after brute-force performance, look at moving the insert into a stored procedure with an explicit TransactionScope. If that isn't fast enough, look at using SqlBulkCopy. 3000 rows should insert in less than 1000ms.
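If you go the SqlBulkCopy route, a minimal sketch looks something like this (the table name, column layout and DataTable source are assumptions):

using System.Data;
using System.Data.SqlClient;

static void BulkInsert(string connectionString, DataTable rows)
{
    // rows is assumed to have columns matching dbo.Items.
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "dbo.Items";
            bulk.BatchSize = 3000;       // send all 3000 rows in one batch
            bulk.WriteToServer(rows);    // typically much faster than row-by-row inserts
        }
    }
}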
Have you tried opening and closing the connection yourself:
Force the Opening of the DataContext's Connection (LINQ)
I think in that case you do not need the extra transaction.
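The linked approach boils down to something like the sketch below (the context, table and collection names are placeholders): open the DataContext's connection up front so the repeated SubmitChanges calls reuse it, and close it yourself when you are done.

using System.Collections.Generic;

static void SubmitWithExplicitConnection(MyDataContext db, IEnumerable<Item> newItems)
{
    db.Connection.Open();            // keep one connection open for all the submits
    try
    {
        foreach (var item in newItems)
        {
            db.Items.InsertOnSubmit(item);
            db.SubmitChanges();      // reuses the already-open connection
        }
    }
    finally
    {
        db.Connection.Close();       // the context won't close a connection it didn't open
    }
}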
This is prior to EF 6. My company has a process that works with all of our other clients except one. The process opens a connection to the client's database, reads out 1000 records at a time, and commits them to our database.
For this client, we read and commit the first 1000 records just fine. When it starts to read again I get "Underlying provider failed on Open". I understand that EF transactions open and close for each read, so the failure happens when it tries to reopen the connection for the next read.
Details: We connect through a VPN to the client database.
The code flow is:
connection.Open()
create DataReader
while DataReader.Read()
    get 1000 records
    bulk commit
    db.SaveChanges()
    get next 1000 records
... and so on until all records are processed
After the first SaveChanges is when we get the error.
Any help is appreciated.
Prior to EF6, the DbContext closed the connection when it was disposed, regardless of whether it owned it. Starting with EF6, the context honors the contextOwnsConnection flag passed to the constructor (see here). It's not clear from your pseudocode how you instantiate the connection and the context, so I presume you create the context inside the loop and pass it the opened connection. If that's the case, you have a few options:
Upgrade to EF6, or
Use only one DbContext for all the saves, or
Load all the records into memory and process them in chunks, each in its own DbContext, or
Load them in chunks and process them in chunks
If you want to avoid reusing the same context for performance reasons, you can use .AsNoTracking(). There is an article on MSDN about EF performance tuning in case you need more.
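For illustration, here is a rough sketch of the chunked option combined with contextOwnsConnection and .AsNoTracking(); every name in it (Record, ClientDbContext, the column layout) is a placeholder, not something from your code:

using System.Data.Entity;          // EF6
using System.Data.SqlClient;
using System.Linq;

// Placeholder entity and context types.
public class Record
{
    public int Id { get; set; }
    public string Payload { get; set; }
}

public class ClientDbContext : DbContext
{
    // contextOwnsConnection: false tells EF6 not to close or dispose the
    // connection when this context is disposed.
    public ClientDbContext(SqlConnection connection)
        : base(connection, contextOwnsConnection: false) { }

    public DbSet<Record> Records { get; set; }
}

public static class ChunkedReader
{
    public static void ReadInChunks(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();             // opened once, reused by every context
            const int chunkSize = 1000;
            int offset = 0;

            while (true)
            {
                using (var db = new ClientDbContext(connection))
                {
                    var chunk = db.Records
                                  .AsNoTracking()       // read-only, skip change tracking
                                  .OrderBy(r => r.Id)
                                  .Skip(offset)
                                  .Take(chunkSize)
                                  .ToList();

                    if (chunk.Count == 0)
                        break;

                    // ... bulk-copy / save this chunk to the destination here ...
                    offset += chunk.Count;
                }
            }
        }
    }
}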
Thanks for everyone's help. It turns out the connection being lost was to our database, not the client's. I'm not entirely sure why, but what seemed to help was having our BulkInsert method create the SqlBulkCopy object inside a using block. We also re-establish the connection at the point where it failed. It's a little hacky, but it's working.
I'm using Entity Framework 6 with a PostgreSQL database (via the Npgsql connector). Everything works fine, except for the poor performance of this setup. When I try to insert a not-so-large number of objects into the database (about 20k records), it takes much more time than it should. As this is my first time using Entity Framework, I was rather confused why inserting 20k records into a database on my local machine would take more than a minute.
In order to optimize the inserts I followed every tip I found. I tried setting AutoDetectChangesEnabled to false, calling SaveChanges() every 100 or 1000 records, re-creating the database context object, and using DbContextTransaction objects (by calling dbContext.Database.BeginTransaction() and committing the transaction at the end of the operation or every 100/1000 records). Nothing improved insert performance even a little bit.
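For concreteness, the batching attempts described above look roughly like the sketch below (entity and context names are placeholders); as noted, none of it made a measurable difference in my case:

using System.Collections.Generic;
using System.Data.Entity;

// Sketch of the batching pattern: AutoDetectChangesEnabled = false,
// SaveChanges every N rows, and a fresh context per batch.
static void BatchedInsert(IEnumerable<Item> items, int batchSize = 1000)
{
    var db = NewContext();
    try
    {
        int count = 0;
        foreach (var item in items)
        {
            db.Items.Add(item);
            if (++count % batchSize == 0)
            {
                db.SaveChanges();       // flush the current batch
                db.Dispose();
                db = NewContext();      // a fresh context keeps the change tracker small
            }
        }
        db.SaveChanges();               // flush the remainder
    }
    finally
    {
        db.Dispose();
    }
}

static MyDbContext NewContext()
{
    var db = new MyDbContext();
    db.Configuration.AutoDetectChangesEnabled = false;   // skip per-Add change detection
    return db;
}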
By logging the SQL queries generated by Entity Framework I was finally able to discover that, no matter what I do, every object is inserted separately and every insert takes 2-4 ms. Without re-creating DB context objects and without transactions, there is just one commit after all 20k inserts. When I use transactions and commit every few records, there are more commits and new transactions being created (the same happens when I re-create the DB context object, just with the connection being re-established as well). If I use transactions and commit them every few records, I should notice a performance boost, no? But in the end there is no difference in performance whether I use multiple transactions or not. I know transactions won't improve performance drastically, but they should help at least a little bit. Instead, every insert still takes at least 2ms to execute on my local DB.
A database on the local machine is one thing, but creating 20k objects on a remote database takes much, much, MUCH longer than one minute; the logs indicate that a single insert can take as much as 30ms (!), with transactions being committed and created again every 100 or 1000 records. On the other hand, if I execute a single insert manually (taking it straight from the log), it takes less than 1ms. It seems like Entity Framework takes its sweet time inserting every single object into the database, even though it uses transactions to wrap larger numbers of inserts together. I don't really get it...
What can I do to speed it up for real?
In case anyone's interested, I found a solution to my problem. Entity Framework 6 is unable to provide fast bulk inserts without additional third-party libraries (as mentioned in the comments to my question), which are either expensive or don't support databases other than SQL Server. Entity Framework Core, on the other hand, is another story. It supports fast bulk insertions and can replace EF 6 in a project with just a handful of code changes: https://learn.microsoft.com/pl-pl/ef/core/index
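For comparison, the EF Core version of the insert ends up being roughly the sketch below (context and entity names are placeholders); EF Core batches the generated INSERT statements inside a single SaveChanges call instead of issuing one round trip per row:

using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

static void InsertAll(IEnumerable<Item> items)
{
    using (var db = new MyCoreDbContext())
    {
        db.Items.AddRange(items);   // track all entities in one go
        db.SaveChanges();           // sent to the server in batched commands
    }
}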
I have a situation where I am using a transaction scope in .NET.
Within it are multiple method calls; the first ones perform database updates, and the last one reads from the database.
My question is: will the database reads pick up the changes made by the earlier method calls that update the database? (Note there are commits in these methods, but they are not truly committed until the transaction scope completes.)
E.g. using a TransactionScope:

using (var scope = new TransactionScope())
{
    Method1();        // insert new comment into database
    Method2();        // retrieve all comments from database
    scope.Complete();
}
Will method 2 results include the method 1 insert?
The thing that is confusing me is that I have run loads of tests, and sometimes the update is there, sometimes it's not!
I am aware (at a high level) that there are isolation levels; is there one that would allow reads of uncommitted data ONLY within the TransactionScope?
Any and all help greatly appreciated.
You can do any operations you want on the database (MS SQL), and even before you call
transaction.Commit()
the changes will be visible within that same transaction.
Even if you insert a NEW record in one transaction, you can read its value back in that same transaction (provided, of course, you don't Rollback() it).
Yes, this is the purpose of transactions. Think about the situation where you have two tables, and one has a foreign key to the other. In your transaction, you insert into the first table and then into the second with a foreign key referencing your first insert, and it works. If the data were not visible to you, transactions would be pointless: you would be limited to one operation at a time, each atomic, which would negate the need for transactions.
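As a concrete illustration (the connection string and Comments table are made up), an insert followed by a read on the same connection inside one TransactionScope sees the uncommitted row:

using System;
using System.Data.SqlClient;
using System.Transactions;

static void InsertThenReadInOneScope(string connectionString)
{
    using (var scope = new TransactionScope())
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();   // enlists in the ambient transaction

        using (var insert = new SqlCommand(
            "INSERT INTO Comments (Body) VALUES (@body)", connection))
        {
            insert.Parameters.AddWithValue("@body", "first!");
            insert.ExecuteNonQuery();
        }

        using (var count = new SqlCommand("SELECT COUNT(*) FROM Comments", connection))
        {
            // Runs in the same transaction, so it counts the row inserted
            // above even though nothing has been committed yet.
            Console.WriteLine(count.ExecuteScalar());
        }

        scope.Complete();    // commit both operations together
    }
}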
I'm maintaining a ASP/C# program that uses an MS SQL Server 2008 R2 for its database requirements.
On normal and perfect days, everything works fine as it is. But we don't live in a perfect world.
An Application (for Leave, Sick Leave, Overtime, Undertime, etc.) Approval process requires up to ten separate connections to the database. The program connects to the database, passes around some relevant parameters, and uses stored procedures to do the job. Ten times.
Now, due to the structure of the entire thing, which I cannot change, a dip in the connection (or, heck, a breakpoint in VS2005 left hanging long enough) leaves the Application Approval process incomplete. The tables are often just joined together, so a data mismatch - missing data here, a primary key that failed to update there - means an entire row becomes useless.
Now, I know that there is nothing I can do to prevent this - this is a connection issue, after all.
But are there ways to minimize connection lag / failure? Or a way to inform the users that something went wrong with the process? A rollback changes feature (either via program, or SQL), so that any incomplete data in the database will be undone?
Thanks.
But are there ways to minimize connection lag / failure? Or a way to inform the users that something went wrong with the process? A rollback changes feature (either via program, or SQL), so that any incomplete data in the database will be undone?
As we discussed in the comments, transactions will address many of your concerns.
A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.
Source
Transactions in .Net
As you might expect, the database is integral to providing transaction support for database-related operations. However, creating transactions from your business tier is quite easy and allows you to use a single transaction across multiple database calls.
Quoting from my answer here:
I see several reasons to control transactions from the business tier:
Communication across data store boundaries. Transactions don't have to be against a RDBMS; they can be against a variety of entities.
The ability to rollback/commit transactions based on business logic that may not be available to the particular stored procedure you are calling.
The ability to invoke an arbitrary set of queries within a single transaction. This also eliminates the need to worry about transaction count.
Personal preference: C# has a more elegant structure for declaring transactions: a using block. By comparison, I've always found transactions inside stored procedures to be cumbersome when jumping to rollback/commit.
Transactions are most easily declared using the TransactionScope (reference) abstraction which does the hard work for you.
using (var ts = new TransactionScope())
{
    // do some work here that may or may not succeed

    // if this line is reached, the transaction will commit; if an exception is
    // thrown before this line is reached, the transaction will be rolled back
    ts.Complete();
}
Since you are just starting out with transactions, I'd suggest testing out a transaction from your .Net code.
Call a stored procedure that performs an INSERT.
After the INSERT, purposely have the procedure generate an error of any kind.
You can validate your implementation by seeing that the INSERT was rolled back automatically.
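A sketch of that test from the .Net side; the connection string and the dbo.InsertAndFail procedure are placeholders for whatever you create:

using System.Data;
using System.Data.SqlClient;
using System.Transactions;

static void RunRollbackTest(string connectionString)
{
    using (var ts = new TransactionScope())
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.InsertAndFail", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        connection.Open();           // enlists in the ambient transaction
        command.ExecuteNonQuery();   // the procedure INSERTs, then raises an error
        ts.Complete();               // never reached; disposing the scope rolls the INSERT back
    }
}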
Transactions in the Database
Of course, you can also declare transactions inside a stored procedure (or any sort of TSQL statement). See here for more information.
If you use the same SqlConnection, or another connection type that implements IDbConnection, you can do something similar to a TransactionScope but without creating the security risk that a TransactionScope entails.
In VB:
Using scope As IDbTransaction = mySqlCommand.Connection.BeginTransaction()
    If blnEverythingGoesWell Then
        scope.Commit()
    Else
        scope.Rollback()
    End If
End Using
If you don't specify commit, the default is to rollback the transaction.
The question is solely about rolling back the changes, not committing them.
Let's say I fetch some data, change it, submit the changes (an optional step), and then roll back the transaction. Everywhere you look, every author writes that this cancels the changes.
But I found out that this is only half true -- the LINQ DataContext will keep the changed data! I tested this using both TransactionScope and DataContext.Transaction; in both cases I got the same behaviour.
A workaround would be to recreate the DataContext after the rollback (however, this leads to other problems, like cached data and handling nested transactions) or to manually discard the changes in the DataContext. Nevertheless, those are just workarounds.
Questions
So what am I missing? Is LINQ to SQL not suited for transactions? How do I use transactions so they REALLY roll back the changes?
Example
MyTable record = null;
db.Connection.Open();
using (db.Transaction = db.Connection.BeginTransaction())
{
    record = db.MyTable.First();
    record.BoolField = !record.BoolField; // changed
    db.SubmitChanges();
    db.Transaction.Rollback();
}
A data-context should be considered as a unit-of-work. How granular you make that is up to you - it could be a page request, or a single operation; but - if you get an exception (or pretty much anything unexpected) - stop; abandon the data-context and rollback. After a rollback, your data-context is going to be confused, so just don't keep it.
Additionally; don't keep a data-context for longer than necessary. It is not intended as an app-long data cache.
What you seem to be asking for is an in-memory cache of the database (or some part of it) rather than a lightweight ORM. I would say that LINQ to SQL is just fine for transactions and as a lightweight ORM, but not so good to use out of the box as a database cache. The data context functions best, in my opinion, using the Unit of Work pattern. Create the context for a particular task, perform the task, then dispose of the context. If the task happens to include a failed transaction, then you need to figure out how to respond to the failure. This could be by either correcting the errors and retrying with the existing context or, as in a web context, passing back the attempted changes to the user, then trying again with a new context when the data is resubmitted.
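In code, that unit-of-work shape is simply the following (reusing the table and field from the question; MyDataContext is a placeholder name):

// One short-lived context per task; SubmitChanges runs in its own implicit
// transaction and the context is discarded as soon as the task is done.
using (var db = new MyDataContext())
{
    var record = db.MyTable.First();
    record.BoolField = !record.BoolField;
    db.SubmitChanges();
}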
Two things:
1) stale datacontext
What you observe is commonly referred to as a 'stale' DataContext. The entities in the DataContext do not notice your rollback. You would get similar behaviour if you executed a stored procedure after your SubmitChanges; that would also not be noticed by the DataContext. However, your transaction will be rolled back in the DB! (And likewise the stored procedure would be executed.)
2) about transactions
There is no need to manage the transaction yourself; LINQ to SQL already creates a transaction for you inside SubmitChanges.
If you really want to manage the transactions (e.g. across multiple DataContexts, or a stored procedure combined with some LINQ to SQL), wrap the whole thing in a TransactionScope and call Complete() on the scope at the point where you want to commit.
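A minimal sketch of that approach (the two context types and their tables are placeholders): each SubmitChanges enlists in the ambient transaction, and nothing is committed until Complete() is called. Note that spanning two connections inside one scope may escalate it to a distributed transaction.

using System.Linq;
using System.Transactions;

using (var scope = new TransactionScope())
{
    using (var db1 = new OrdersDataContext())
    using (var db2 = new AuditDataContext())
    {
        var order = db1.Orders.First();
        order.IsShipped = true;
        db1.SubmitChanges();                 // joins the ambient transaction

        db2.AuditEntries.InsertOnSubmit(
            new AuditEntry { Message = "Order shipped" });
        db2.SubmitChanges();                 // same ambient transaction
    }

    scope.Complete();                        // commit the work of both contexts together
}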