Deadlock: inserts into a FileTable appear to block one another - c#

We are having trouble inserting into a filetable. This is our current setup; I will try to explain it in as much detail as possible:
Basically we have three tables:
T_Document (main metadata of a document)
T_Version (versioned metadata of a document)
T_Content (the binary content of a document, FileTable)
Our WCF service saves the documents and is used by multiple people concurrently. The service opens a transaction and calls the SaveDocument method, which saves the documents:
//This is the point where the transaction starts
using (IDBTransaction tran = db.GetTransaction())
{
    try
    {
        m_commandFassade.SaveDocument(document, m_loginName, db, options, lastVersion);
        tran.Commit();
        return document;
    }
    catch
    {
        tran.Rollback();
        throw;
    }
}
The SaveDocument method looks like this:
public void SaveDocument(E2TDocument document, string login, IDBConnection db, DocumentUploadOptions options, int lastVersion)
{
    document.GuardNotNull();
    options.GuardNotNull();
    if (lastVersion == -1)
    {
        //inserting new T_Document
        SaveDocument(document, db);
    }
    else
    {
        //updating the existing T_Document
        UpdateDocument(document, db); //document already exists, updating it
    }
    Guid contentID = Guid.NewGuid();
    //inserting the content
    SaveDocumentContent(document, contentID, db);
    //inserting the new / initial version
    SaveDocumentVersion(document, contentID, db);
}
Basically all the methods you see either insert into or update those three tables. The content insert query, which appears to be causing the trouble, looks like this:
INSERT INTO T_Content
    (stream_id
    ,file_stream
    ,name)
VALUES
    (@ContentID
    ,@Content
    ,@Title)
And the method (please take this as pseudo code):
private void SaveDocumentContent(E2TDocument e2TDokument, Guid contentID, IDBConnection db)
{
    using (m_log.CreateScope<MethodScope>(GlobalDefinitions.TracePriorityForData))
    {
        Command cmd = CommandFactory.CreateCommand("InsertContents");
        cmd.SetParameter("ContentID", contentID);
        cmd.SetParameter("Content", e2TDokument.Content);
        string title = string.Concat(e2TDokument.Titel.RemoveIllegalPathCharacters(), GlobalDefinitions.UNTERSTRICH,
            contentID).TrimToMaxLength(MaxLength_T_Contents_Col_Name, SuffixLength_T_Contents_Col_Name);
        cmd.SetParameter("Title", title);
        db.Execute(cmd);
    }
}
I have no experience in deadlock analysis, but the deadlock graphs show me that the insert of the content into the FileTable is deadlocking with another process writing into the same table at the same time.
(The other side of the graph shows the same statement; my application log confirms two concurrent attempts to save documents.)
The same deadlock appears about 30 times a day. I have already shrunk the transaction to a minimum, removing all unnecessary selects, but so far I have had no luck solving this issue.
What I'm most curious about is how it's possible to deadlock on an insert into a FileTable. Are there internal statements being executed that I'm not aware of? I saw some strange statements in the profiler trace on that table that we don't use anywhere in our code, e.g.:
set @pathlocator = convert(hierarchyid, @path_locator__bin)
and things like:
if exists (
    select 1
    from [LGOL_Content01].[dbo].[T_Contents]
    where parent_path_locator = @path_locator
)
If you need any more details, please let me know. Any tips how to proceed would be awesome.
Edit 1:
Below you find the execution plan for the T_Content insert:
(execution plan image not included)

So, after hours and hours of research and consulting with Microsoft: the deadlock is actually a FileTable / SQL Server related bug, which will be hotfixed by Microsoft.

Related

Strange SaveChanges behavior in Entity Framework and SQL Server

I have some code (you can check the project on GitHub); the error occurs in the UploadContoller method GetExtensionId.
Database diagram:
Code (in this controller I send the files to upload):
[HttpPost]
public ActionResult UploadFiles(HttpPostedFileBase[] files, int? folderid, string description)
{
    foreach (HttpPostedFileBase file in files)
    {
        if (file != null)
        {
            string fileName = Path.GetFileNameWithoutExtension(file.FileName);
            string fileExt = Path.GetExtension(file.FileName)?.Remove(0, 1);
            int? extensionid = GetExtensionId(fileExt);
            if (CheckFileExist(fileName, fileExt, folderid))
            {
                fileName = fileName + $" ({DateTime.Now.ToString("dd-MM-yy HH:mm:ss")})";
            }
            File dbFile = new File();
            dbFile.folderid = folderid;
            dbFile.displayname = fileName;
            dbFile.file_extensionid = extensionid;
            dbFile.file_content = GetFileBytes(file);
            dbFile.description = description;
            db.Files.Add(dbFile);
        }
    }
    db.SaveChanges();
    return RedirectToAction("Partial_UnknownErrorToast", "Toast");
}
I want to create the Extension in the database if it doesn't exist yet, and I do that with GetExtensionId:
private static object locker = new object();

private int? GetExtensionId(string name)
{
    int? result = null;
    lock (locker)
    {
        var extItem = db.FileExtensions.FirstOrDefault(m => m.displayname == name);
        if (extItem != null) return extItem.file_extensionid;

        var fileExtension = new FileExtension()
        {
            displayname = name
        };
        db.FileExtensions.Add(fileExtension);
        db.SaveChanges();
        result = fileExtension.file_extensionid;
    }
    return result;
}
In the SQL Server database I have a unique constraint on the displayname column of FileExtension.
The problem only starts if I upload several files with the same extension and that extension doesn't exist in the database yet.
If I remove the lock, GetExtensionId throws an exception about the unique constraint.
Maybe, for some reason, the next iteration of the foreach loop calls GetExtensionId without waiting? I don't know.
But my code only works fine if I use the lock.
If you know why this happens, please explain.
This sounds like a simple concurrency race condition. Imagine two requests come in at once; they both check FirstOrDefault, which correctly says "nope" for both. Then they both try to insert; one wins, one fails because the DB has changed. While EF manages transactions around SaveChanges, that transaction doesn't start from when you query the data initially.
The lock appears to work by preventing them from getting into the lookup code at the same time, but this is not a reliable solution in general, as it only works inside a single process, let alone a single node.
So: a few options here:
your code could detect the unique-constraint violation exception and recheck from the start (FirstOrDefault etc.), which keeps things simple in the success case (which is going to be the majority of the time) and isn't horribly expensive in the failure case (just an exception and an extra DB hit) - pragmatic enough; a rough sketch of this follows after the list
you could move the "select if exists, insert if it doesn't" into a single operation inside the database inside a transaction (ideally serializable isolation level, and/or using the UPDLOCK hint) - this requires writing TSQL yourself, rather than relying on EF, but minimises round trips and avoids writing "detect failure and compensate" code
you could perform the selects and possible inserts inside a transaction via EF - complicated and messy, frankly: don't do this (and it would again need to be serializable isolation level, but now the serializable transaction spans multiple round trips, which can start to impact locking, if at scale)
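A rough sketch of the first option might look like the following. This is only a sketch: it assumes EF6 on SQL Server, where unique constraint / unique index violations surface as SqlException error numbers 2627 or 2601, and the method and property names simply mirror the question's code (DbUpdateException is in System.Data.Entity.Infrastructure, SqlException in System.Data.SqlClient, EntityState in System.Data.Entity).
private int? GetOrCreateExtensionId(string name)
{
    // optimistic path: most of the time the extension already exists
    var existing = db.FileExtensions.FirstOrDefault(m => m.displayname == name);
    if (existing != null) return existing.file_extensionid;

    var fileExtension = new FileExtension { displayname = name };
    db.FileExtensions.Add(fileExtension);
    try
    {
        db.SaveChanges();
        return fileExtension.file_extensionid;
    }
    catch (DbUpdateException ex) when (ex.GetBaseException() is SqlException sqlEx
                                       && (sqlEx.Number == 2627 || sqlEx.Number == 2601))
    {
        // a concurrent request inserted the same extension between our check and our insert;
        // detach the failed entity so it isn't retried on the next SaveChanges, then re-read the winning row
        db.Entry(fileExtension).State = EntityState.Detached;
        return db.FileExtensions
                 .Where(m => m.displayname == name)
                 .Select(m => (int?)m.file_extensionid)
                 .FirstOrDefault();
    }
}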

EntityFramework and handling duplicate primary key / concurrency / race condition situations

I wrote a library, referenced by numerous applications, that tracks who is online and which application and page they are viewing.
The data is stored, using EF6, in a Sql Server 2008 table which tracks their username (primary key), application, page and timestamp. I only want to store the latest request for each person so each username should only be stored once.
The library code, which is called from the Global.asax of each application looks like this:
public static void Add(ApplicationType application, string username, string pageRequested)
{
    using (var db = new CommonDAL()) // EF context
    {
        var exists = db.ActiveUsers.Find(username);
        if (exists != null)
            db.ActiveUsers.Remove(exists);
        var activeUser = new ActiveUser() { ApplicationID = application.Value(), Username = username, PageRequested = pageRequested, TimeRequested = DateTime.Now };
        db.ActiveUsers.Add(activeUser);
        db.SaveChanges();
    }
}
I'm intermittently getting the error Violation of PRIMARY KEY constraint 'PK_tblActiveUser_Username'. Cannot insert duplicate key in object 'dbo.tblActiveUser'. The duplicate key value is (xxxxxxxx)
What I can only guess is happening is Request A comes in, removes the existing username. Request B (from same user) then comes in, tries to remove the username, sees nothing exists. Request A then adds the username. Request B then tries to add the username. The error frequently seems to be triggered when a web server sends a client a 401 status, which again points to multiple requests within a short period of time triggering this.
I'm having trouble mocking this race condition in unit tests, as I haven't done much async programming before, but I have tried to create async tests with delays to mock multiple simultaneous slow requests. I've tried both using (var transaction = new TransactionScope()) and using (var transaction = db.Database.BeginTransaction(System.Data.IsolationLevel.ReadCommitted)) to lock the requests so request A can complete before request B begins, but I can't verify that either one fixes the issue because I can't mock the situation reliably.
1) What is the right way to prevent the exception (the most recent request is the one that ultimately gets stored)?
2) What is the right way to write a unit test to prove this is working?
Since you only want to store the latest item, you could use a "last update wins" approach and avoid the race condition over who can insert first; the database handles the locks, and the last caller to update (which is the most recent request) is what ends up in the table.
Something like the following should handle any primary key errors if you do hit the edge case where a brand-new user has two requests at the same time, and it avoids an "infinite" loop of errors (well, until a stack overflow exception, anyway).
public static void Add(ApplicationType application,
                       string username,
                       string pageRequested,
                       int recursionCount = 0)
{
    const int maxRetries = 3;

    using (var db = new CommonDAL()) // EF context
    {
        var exists = db.ActiveUsers.Find(username);
        if (exists != null)
        {
            // last update wins: overwrite the existing row instead of delete + insert
            exists.ApplicationID = application.Value();
            exists.PageRequested = pageRequested;
            exists.TimeRequested = DateTime.Now;
        }
        else
        {
            var activeUser = new ActiveUser
            {
                ApplicationID = application.Value(),
                Username = username,
                PageRequested = pageRequested,
                TimeRequested = DateTime.Now
            };
            db.ActiveUsers.Add(activeUser);
        }
        try
        {
            db.SaveChanges();
        }
        catch (DbUpdateException ex) when (ex.GetBaseException() is SqlException sqlEx
                                           && (sqlEx.Number == 2627 || sqlEx.Number == 2601))
        {
            // primary key / unique index violation: another request inserted the row first
            if (recursionCount < maxRetries)
            {
                // pass recursionCount + 1, not recursionCount++ (post-increment would pass
                // the old value and the retry count would never grow)
                Add(application, username, pageRequested, recursionCount + 1);
            }
            else
            {
                throw;
            }
        }
    }
}
As for unit testing this, it will be very hard unless you insert an artificial delay or can force both threads to run at the same time; a sketch of one way to force the overlap follows below. Sometimes the timing on these race conditions is in the millisecond range, depending on the issue. Tasks may not work because they are not guaranteed to run at the same time; you throw them onto the background thread pool and they run when they can. Old-school threads may work, but I don't know how to force the interleaving, since the time between the read and the remove & create is most likely in the 5 ms range or less.
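If you do want to try, one rough sketch is to release two calls at the same instant with a System.Threading.Barrier, which at least maximises the chance that both calls land inside the find/remove/insert window together (it still can't guarantee the race fires on every run). The holder type ActiveUserTracker and the enum value ApplicationType.Web are assumptions here, since the question doesn't show them; the attribute is MSTest's.
[TestMethod]
public void Add_TwoSimultaneousRequestsForSameUser_DoesNotThrow()
{
    // both tasks block on the barrier and are released together
    var barrier = new Barrier(participantCount: 2);

    var tasks = Enumerable.Range(0, 2)
        .Select(_ => Task.Run(() =>
        {
            barrier.SignalAndWait();
            // hypothetical holder type and enum value standing in for the question's Add call
            ActiveUserTracker.Add(ApplicationType.Web, "someUser", "/home");
        }))
        .ToArray();

    // throws an AggregateException if either concurrent call failed with a PK violation
    Task.WaitAll(tasks);
}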

How to make multiple database operations atomic/one transaction in C#?

I need to make the code below atomic/fail or succeed as a single unit. How could I go about achieving that?
void Processor(Input input)
{
    var mapper = new Mapper(recordDetails);
    int remainingRecords = GetCountForRemainingRecords(recordDetails);
    try
    {
        while (remainingRecords > 0)
        {
            mapper.CreateRecords(dataset);
            Validate(dataset);
            //the Save(dataset) uses SqlBulkCopy; maps tables, uses a transaction, and saves it..
            Save(dataset);
            //I cannot perform the operation below on the dataset directly because the dataset doesn't have the records that are in the database
            //the method below eventually calls a stored proc that sends a list of users that were recently created
            OutdateDuplicateUsers(dataset.userTable);
            remainingRecords = MethodToGetUpdatedCount();
        }
    }
    catch (Exception exception)
    {
        //exception handler..
    }
}
Now if OutdateDuplicateUsers throws an exception, I will still end up with the accounts that the Save method persisted. I do not want that to happen.
I want both the Save and OutdateDuplicateUsers methods to be atomic. I read a great article about TransactionScope and it seemed to be exactly what I want. However, I could not get it to work. The implementation seems straightforward from the article, but I couldn't get it working myself.
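For reference, the basic pattern from the article is roughly this (a minimal sketch; it assumes the connections used inside Save and OutdateDuplicateUsers are opened inside the scope so they enlist in the ambient System.Transactions transaction, and it relies on Dispose rolling back whenever Complete was not called):
using (var scope = new TransactionScope())
{
    Save(dataset);                            // SqlBulkCopy work
    OutdateDuplicateUsers(dataset.userTable);

    // only reached if no exception was thrown; otherwise Dispose() rolls everything back
    scope.Complete();
}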
What I tried:
void Processor(Input input)
{
    var mapper = new Mapper(recordDetails);
    int remainingRecords = GetCountForRemainingRecords(recordDetails);
    while (remainingRecords > 0)
    {
        using (var scope = new TransactionScope())
        {
            try
            {
                mapper.CreateRecords(dataset);
                Validate(dataset);
                //the method Save(dataset) is using SqlBulkCopy; maps tables, uses a transaction, and saves it..
                Save(dataset);
                //I cannot perform this operation on the dataset directly because the dataset doesn't have the records that are in the database
                //the method below eventually calls a stored proc that sends a list of users that were recently created
                OutdateDuplicateUsers(dataset.userTable);
                remainingRecords = MethodToGetUpdatedCount();
                scope.Complete();
            }
            catch (Exception)
            {
                //not both at the same time. I tried using both, one at a time though.
                scope.Dispose();
                Transaction.Current.Rollback();
                //exception handler
            }
        }
    }
}
update:
The dataset is a strongly typed dataset and is schema only. The CreateRecords and Validate method populates the data based on the business logic. The 'mapper' takes in recordDetails which is, for instance, a list of Users (updated the snippet).
What I mean by "doesn't work" is that if the OutdateDuplicateUsers() method throws an exception and cannot complete the outdating operation, I can still see that the records from the Save(dataset) method have been persisted in the database, which is what I am trying to prevent.

Child Parent Transactions roll back

I have a scenario in which I have to process multiple .sql files; every file contains 3-4 insert or update queries. Currently, when any query in a file fails, I roll back the whole transaction, meaning the whole file is rolled back, while all files executed before it stay committed. I want to give the user two options: roll back the entire transaction (all queries in the current file plus all files executed before the one containing the error), or skip just the file with the error (roll back only that file and commit all the other files). I am using SqlTransaction right now, not TransactionScope, but I can obviously switch to TransactionScope() if needed and possible.
Currently the pseudocode for what I want is as follows:
var files = [...]
foreach (string query in files)
{
    Execute(query)
    if (success)
        CommitQuery()
    else
    {
        result = MsgBox("Do you want to abort all files or skip this one?")
        if (result == abort)
            RollbackAll()
        else
            RollbackQuery()
    }
}
It seems you are looking for savepoints, i.e. the option to partially roll back and then resume a larger transaction. AFAIK TransactionScope doesn't support savepoints, so you'll need to deal directly with the native provider (e.g. SqlClient if your RDBMS is SQL Server); i.e. you cannot leverage TransactionScope's distributed-transaction abilities to get a DTC equivalent of savepoints, e.g. across distributed databases, disparate RDBMSs, or parallel transactions.
That said, I would suggest a strategy where the user elects to skip or abort up front, before transactional processing begins, as it will be expensive to wait on a UI response while a large number of rows are still locked - this will likely cause contention issues.
Edit
Here's a small sample of using SavePoints. Foo1 and Foo3 are inserted, Foo2 is rolled back to the preceding save point.
using (var conn = new SqlConnection(ConfigurationManager.ConnectionStrings["Foo"].ConnectionString))
{
    conn.Open();
    using (var txn = conn.BeginTransaction("Outer"))
    {
        txn.Save("BeforeFoo1");
        InsertFoo(txn, "Foo1");

        txn.Save("BeforeFoo2");
        InsertFoo(txn, "Foo2");
        txn.Rollback("BeforeFoo2");

        txn.Save("BeforeFoo3");
        InsertFoo(txn, "Foo3");

        txn.Commit();
    }
}
Where InsertFoo is:
private void InsertFoo(SqlTransaction txn, string fooName)
{
    using (var cmd = txn.Connection.CreateCommand())
    {
        cmd.Transaction = txn;
        cmd.CommandType = CommandType.Text;
        cmd.CommandText = "INSERT INTO FOO(Name) VALUES(@Name)";
        cmd.Parameters.Add(new SqlParameter("@Name", SqlDbType.VarChar)).Value = fooName;
        cmd.ExecuteNonQuery();
    }
}
And the underlying table is:
create table Foo
(
FooId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
Name NVARCHAR(50)
)
Keep all insert and update queries in a try {...} catch (...) {...} block, and if any exception occurs, roll the DB transaction back in the catch:
private void InsertFoo(SqlTransaction txn, string fooName)
{
    using (var cmd = txn.Connection.CreateCommand())
    {
        try
        {
            // do your processing here...
            txn.Commit();
        }
        catch (Exception ex)
        {
            txn.Rollback();
        }
    }
}

Selecting million records from SQL Server

We need to index (in ASP.NET) all our records stored in a SQL Server table. That table has around 2M records with text (nvarchar) data too in each row.
Is it okay to fetch all records in one go as we need to index them (for search)? What is the other option (I want to avoid pagination)?
Note: I am not displaying these records, just need all of them in one go so that I can index them via a background thread.
Do I need to set any long time outs for my query? If yes, what is the most effective method for setting longer time outs if I am running the query from ASP.NET page?
If I needed something like this, just thinking about it from the database side, I'd probably export it to a file. Then that file can get moved around pretty easily. Moving around data sets that large is a huge pain to all involved. You can use SSIS, sqlcmd or even bcp in a batch command to get it done.
Then, you just have to worry about what you're doing with it on the app side, no worries about locking & everything on the database side once you've exported it.
I don't think a page is a good place for this regardless. There should be a different process or program that does this. On a related note maybe something like http://incubator.apache.org/lucene.net/ would help you?
Is it okay to fetch all records in one go as we need to index them
(for search)? What is the other option (I want to avoid pagination)?
Memory management / performance issue
You can face a System.OutOfMemoryException if you bring back 2 million records in one go,
as you will be keeping all of those records in a DataSet, and that DataSet is held entirely in RAM.
Do I need to set any long time outs for my query? If yes, what is the
most effective method for setting longer time outs if I am running the
query from ASP.NET page?
using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand())
{
    cmd.CommandTimeout = 0; // 0 means no timeout - the command waits indefinitely
}
Suggestion
It's better to filter the records at the database level if you can.
Otherwise, fetch all records from the database once and save them to a file; access that file for any intermediate operations.
What you describe is Extract, Transform, Load (ETL). There are two options I'm aware of:
SSIS which is part of sql server
Rhino.ETL
I prefer Rhino.Etl as it's completely written in C#, you can create scripts in Boo, and it's much easier to test and compose ETL processes. The library is built to handle large sets of data, so memory management is built in.
One final note: while asp.net might be the entry point to start the indexing process, I wouldn't run the process within asp.net, as it could take minutes or hours depending on the number of records and the processing involved.
Instead, have asp.net be just the entry point that fires off a background task to process the records (a rough illustration follows below). Ideally it should be completely independent of asp.net, so you avoid any timeout or shutdown issues.
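As a rough illustration only: on .NET 4.5.2+ you could use HostingEnvironment.QueueBackgroundWorkItem (System.Web.Hosting) so ASP.NET at least knows about the work and doesn't tear it down the instant the request ends; RecordIndexer and IndexAllRecords are placeholder names for whatever reads the table in batches and feeds the search index, and a separate Windows service or console job remains the more robust choice for multi-hour runs.
// called from wherever the indexing is triggered (a page event handler, an MVC action, etc.)
public static void StartIndexing()
{
    HostingEnvironment.QueueBackgroundWorkItem(cancellationToken =>
    {
        // placeholder for the actual batch-reading / indexing implementation
        new RecordIndexer().IndexAllRecords(cancellationToken);
    });
}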
Process your records in batches. You are going to have two main issues: (1) you need to index all of the existing records, and (2) you will want to update the index with records that were added, updated or deleted. It might sound easier to just drop the index and recreate it, but that should be avoided if possible. Below is an example of processing [Production].[TransactionHistory] from the AdventureWorks2008R2 database in batches of 10,000 records. It does not load all of the records into memory. Output on my local computer produces Processed 113443 records in 00:00:00.2282294. Obviously, this doesn't take into consideration a remote computer or the processing time for each record.
class Program
{
    private static string ConnectionString
    {
        get { return ConfigurationManager.ConnectionStrings["db"].ConnectionString; }
    }

    static void Main(string[] args)
    {
        int recordCount = 0;
        int lastId = -1;
        bool done = false;
        Stopwatch timer = Stopwatch.StartNew();
        do
        {
            done = true;
            IEnumerable<TransactionHistory> transactionDataRecords = GetTransactions(lastId, 10000);
            foreach (TransactionHistory transactionHistory in transactionDataRecords)
            {
                lastId = transactionHistory.TransactionId;
                done = false;
                recordCount++;
            }
        } while (!done);
        timer.Stop();
        Console.WriteLine("Processed {0} records in {1}", recordCount, timer.Elapsed);
    }

    // Get a new open connection
    private static SqlConnection GetOpenConnection()
    {
        SqlConnection connection = new SqlConnection(ConnectionString);
        connection.Open();
        return connection;
    }

    private static IEnumerable<TransactionHistory> GetTransactions(int lastTransactionId, int count)
    {
        const string sql = "SELECT TOP(@count) [TransactionID],[TransactionDate],[TransactionType] FROM [Production].[TransactionHistory] WHERE [TransactionID] > @LastTransactionId ORDER BY [TransactionID]";
        return GetData<TransactionHistory>((connection) =>
        {
            SqlCommand command = new SqlCommand(sql, connection);
            command.Parameters.AddWithValue("@count", count);
            command.Parameters.AddWithValue("@LastTransactionId", lastTransactionId);
            return command;
        }, DataRecordToTransactionHistory);
    }

    // function to convert a data record to the TransactionHistory object
    private static TransactionHistory DataRecordToTransactionHistory(IDataRecord record)
    {
        TransactionHistory transactionHistory = new TransactionHistory();
        transactionHistory.TransactionId = record.GetInt32(0);
        transactionHistory.TransactionDate = record.GetDateTime(1);
        transactionHistory.TransactionType = record.GetString(2);
        return transactionHistory;
    }

    private static IEnumerable<T> GetData<T>(Func<SqlConnection, SqlCommand> commandBuilder, Func<IDataRecord, T> dataFunc)
    {
        using (SqlConnection connection = GetOpenConnection())
        {
            using (SqlCommand command = commandBuilder(connection))
            {
                using (IDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        T record = dataFunc(reader);
                        yield return record;
                    }
                }
            }
        }
    }
}

public class TransactionHistory
{
    public int TransactionId { get; set; }
    public DateTime TransactionDate { get; set; }
    public string TransactionType { get; set; }
}
