I need to make the code below atomic, so that it fails or succeeds as a single unit. How could I go about achieving that?
void Processor(Input input)
{
var mapper = new Mapper(recordDetails);
int remainingRecords = GetCountForRemainingRecords(recordDetails);
try
{
while (remainingRecords > 0)
{
mapper.CreateRecords(dataset);
Validate(dataset);
// Save(dataset) uses SqlBulkCopy: it maps the tables, uses a transaction, and saves the data
Save(dataset);
// I cannot perform the operation below on the dataset directly because the dataset doesn't contain the records that are already in the database
// the method below eventually calls a stored proc, passing it the list of users that were recently created
OutdateDuplicateUsers(dataset.userTable);
remainingRecords = MethodToGetUpdatedCount();
}
}
catch (Exception exception)
{
//exception handler..
}
}
Now, if OutdateDuplicateUsers throws an exception, I still end up with the accounts that the Save method persisted. I do not want that to happen.
I want the Save and OutdateDuplicateUsers calls to be atomic. I read a great article about TransactionScope, and it seemed to be exactly what I want. The implementation looks straightforward in the article, but I could not get it working myself.
What I tried:
void Processor(Input input)
{
var mapper = new Mapper(recordDetails);
int remainingRecords = GetCountForRemainingRecords(recordDetails);
try
{
while (remainingRecords > 0)
{
using (var scope = new TransactionScope())
{
try
{
mapper.CreateRecords(dataset);
Validate(dataset);
// Save(dataset) uses SqlBulkCopy: it maps the tables, uses a transaction, and saves the data
Save(dataset);
// I cannot perform this operation on the dataset directly because the dataset doesn't contain the records that are already in the database
// the method below eventually calls a stored proc, passing it the list of users that were recently created
OutdateDuplicateUsers(dataset.userTable);
remainingRecords = MethodToGetUpdatedCount();
scope.Complete();
}
catch (Exception)
{
//not both at the same time. I tried using both, one at a time though.
TransactionScope.Dispose();
TransactionScope.Current.Rollback();
//exception handler
}
}
}
}
}
update:
The dataset is a strongly typed dataset and is schema only. The CreateRecords and Validate methods populate the data based on the business logic. The 'mapper' takes in recordDetails, which is, for instance, a list of Users (I updated the snippet).
By "doesn't work" I mean that if OutdateDuplicateUsers() throws an exception and cannot complete the outdating operation, the records saved by Save(dataset) are still persisted in the database, which is what I am trying to prevent.
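For reference, here is a minimal sketch of the behaviour TransactionScope is expected to give. It uses only System.Transactions; RecordingResource is a hypothetical in-memory stand-in for a database connection, and the Save and OutdateDuplicateUsers steps are simulated, so treat it as an illustration of the pattern rather than a fix for the code above.

```csharp
using System;
using System.Transactions;

// Hypothetical stand-in for a database connection: it enlists in the ambient
// transaction and records whether it was told to commit or roll back.
sealed class RecordingResource : IEnlistmentNotification
{
    public string Outcome { get; private set; } = "pending";

    public void Prepare(PreparingEnlistment preparingEnlistment) => preparingEnlistment.Prepared();
    public void Commit(Enlistment enlistment) { Outcome = "committed"; enlistment.Done(); }
    public void Rollback(Enlistment enlistment) { Outcome = "rolled back"; enlistment.Done(); }
    public void InDoubt(Enlistment enlistment) { Outcome = "in doubt"; enlistment.Done(); }
}

static class ScopeDemo
{
    // Runs a simulated Save followed by a simulated OutdateDuplicateUsers in one scope.
    public static string Run(bool outdateFails)
    {
        var resource = new RecordingResource();
        try
        {
            using (var scope = new TransactionScope())
            {
                // Simulated Save: the resource joins the ambient transaction,
                // as a connection opened inside the scope would.
                Transaction.Current.EnlistVolatile(resource, EnlistmentOptions.None);

                // Simulated OutdateDuplicateUsers.
                if (outdateFails)
                    throw new InvalidOperationException("outdating failed");

                scope.Complete(); // only reached when every step succeeded
            } // Dispose commits if Complete() was called, otherwise it rolls back
        }
        catch (InvalidOperationException)
        {
            // No explicit rollback call is needed; leaving the scope already did it.
        }
        return resource.Outcome;
    }

    static void Main()
    {
        Console.WriteLine(Run(outdateFails: false));
        Console.WriteLine(Run(outdateFails: true));
    }
}
```

The point of the sketch is that the rollback happens by leaving the using block without calling Complete(), so the catch block does not need to call Dispose or Rollback itself. Whether this helps with the real code depends on details that are not shown: the assumption is that Save opens its connection inside the scope and does not commit a transaction of its own (for example via SqlBulkCopyOptions.UseInternalTransaction), because work committed independently of the ambient transaction will not be undone.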
Related
We are having trouble inserting into a FileTable. This is our current setup; I will try to explain it in as much detail as possible:
Basically we have three tables:
T_Document (main metadata of a document)
T_Version (versioned metadata of a document)
T_Content (the binary content of a document, FileTable)
Our WCF service saves the documents and is used by multiple people. The service opens a transaction and calls the method SaveDocument, which saves the documents:
//This is the point where the transaction starts
using (IDBTransaction tran = db.GetTransaction())
{
try
{
m_commandFassade.SaveDocument(document, m_loginName, db, options, lastVersion);
tran.Commit();
return document;
}
catch
{
tran.Rollback();
throw;
}
}
The SaveDocument method looks like this:
public void SaveDocument(E2TDocument document, string login, IDBConnection db, DocumentUploadOptions options, int lastVersion)
{
document.GuardNotNull();
options.GuardNotNull();
if (lastVersion == -1)
{
//inserting new T_Document
SaveDocument(document, db);
}
else
{
//updating the existing T_Document
UpdateDocument(document, db); //document already exists, updating it
}
Guid contentID = Guid.NewGuid();
//inserting the content
SaveDocumentContent(document, contentID, db);
//inserting the new / initial version
SaveDocumentVersion(document, contentID, db);
}
Basically, all the methods you see either insert into or update those three tables. The insert-content query, which appears to cause the trouble, looks like this:
INSERT INTO T_Content
(stream_id
,file_stream
,name)
VALUES
(@ContentID
,@Content
,@Title)
And the method (please take this as pseudo code):
private void SaveDocumentContent(E2TDocument e2TDokument, Guid contentID, IDBConnection db)
{
using (m_log.CreateScope<MethodScope>(GlobalDefinitions.TracePriorityForData))
{
Command cmd = CommandFactory.CreateCommand("InsertContents");
cmd.SetParameter("ContentID", contentID);
cmd.SetParameter("Content", e2TDokument.Content);
string title = string.Concat(e2TDokument.Titel.RemoveIllegalPathCharacters(), GlobalDefinitions.UNTERSTRICH,
contentID).TrimToMaxLength(MaxLength_T_Contents_Col_Name, SuffixLength_T_Contents_Col_Name);
cmd.SetParameter("Title", title);
db.Execute(cmd);
}
}
I have no experience in deadlock analysis, but the deadlock graphs show that the insert of the content into the FileTable is deadlocked with another process writing to the same table at the same time.
(The other side shows the same statement, and my application log confirms two concurrent attempts to save documents.)
The same deadlock appears 30 times a day. I have already shrunk the transaction to a minimum and removed all unnecessary selects, but I have had no luck solving this issue.
What I'm most curious about is how it's possible to deadlock on an insert into a FileTable. Are there internal operations being executed that I'm not aware of? I saw some strange statements on that table in the profiler trace that we don't use anywhere in our code, e.g.:
set @pathlocator = convert(hierarchyid, @path_locator__bin)
and things like:
if exists (
select 1
from [LGOL_Content01].[dbo].[T_Contents]
where parent_path_locator = @path_locator
)
If you need any more details, please let me know. Any tips on how to proceed would be awesome.
Edit 1:
Below is the execution plan for the T_Content insert:
So, after hours and hours of research and consulting with Microsoft, it turns out the deadlock is actually a FileTable / SQL Server bug, which will be hotfixed by Microsoft.
I have an application that reads data from one database, and transforms that data into a new form and writes it into a new database. Some of the tables in the new database are made from multiple tables in the old database so there is a large amount of reading and writing going on. Here is the basic concept of the system:
public void TransferData()
{
OldEntities oldContext = new OldEntities();
NewEntities newContext = new NewEntities();
using(var transaction = newContext.Database.BeginTransaction())
{
try{
TransferTable(oldContext, newContext);
} catch (Exception e) {
transaction.Rollback();
}
}
}
public void TransferTable(OldEntities oldContext, NewEntities newContext)
{
List<Entity1> mainTable = oldContext.Where();
Parallel.ForEach(mainTable, (row) =>
{
using(NewEntities anotherNewContext = new NewEntities())
{
anotherNewContext.Database.UseTransaction(newContext.Database.CurrentTransaction.UnderlyingTransaction);
// Do Work
}
});
}
This causes the following exception:
The transaction passed in is not associated with the current connection. Only transactions associated with the current connection may be used.
How can I get around this? The transaction will always be coming from a different EF context, but I need them all to share the same transaction. I couldn't find a way to create the new context as a "child" of the original, and I am trying to avoid creating a transaction entirely separate from the EF context. Any suggestions?
There is an excellent overview of transactions here, which explains how to use transactions in a variety of contexts, some of which are similar to yours. Rather than trying to fix your code as is, a modified approach may help.
I assume you are using EF6.
I am currently trying to save an EntityCollection that is populated with both new and dirty entity objects in different scenarios.
I have set up a transaction to roll back in the event of a failure while saving.
However, it always fails and throws an error in both cases, whether saving a new or an existing EntityCollection.
I also have a method that adds individual entities, i.e. LanguageTranslationEntity, to an EntityCollection that is defined as a property in the class.
public EntityCollection<LanguageTranslationEntity> LanguagetranslationCollection { get; set; }
public void AddLanguageTranslationToCollection(LanguageTranslationEntity prompt,bool isnew)
{
//Add the prompt to the collection
LanguagetranslationCollection.Add(prompt);
Isnewcollection = isnew;
}
However, an exception is always thrown, regardless of whether I try to save new or old entities, as shown below.
An exception was caught during the execution of an action query: Violation of PRIMARY KEY constraint 'PK_LanguageTranslations'. Cannot insert duplicate key in object 'dbo.LanguageTranslations'. The duplicate key value is (translation_10374, 1).
public void SaveLanguageTranslationCollection(DataAccessAdapter adapter)
{
using (DataAccessAdapter newadapter = adapter)
{
adapter.SaveEntityCollection(LanguagetranslationCollection);
}
}
Should I save each entity on its own? And how should I use SaveEntityCollection()?
I intend to use it to save a number of LanguageTranslationEntity objects by adding them to an EntityCollection and saving them all at once, using a transaction so that I can roll back if an exception is thrown.
Kindly help.
The exception suggests that one of the entities inside LanguagetranslationCollection is marked as 'new' but its primary key is already used in your DB.
So you don't have to save them individually, but doing so could help identify the duplicate entity. Once you have identified it, you can investigate why it is using a primary key that is already taken.
I finally figured it out :-)
Within a transaction, you must not have any methods that reinitialize the DataAccessAdapter, i.e.
using(var adapter = new DataAccessAdapter())
{
//Do saving here
SaveLanguageTranslationCollection(adapter);
};
This is what causes the OutOfSyncException to be thrown, as the state data is cleared and initialized anew for the transaction that had been created with the initial DataAccessAdapter.
Here is an example.
public void Save(PromptEntity prompt)
{
using (var adapter = new DataAccessAdapter())
{
//start transaction
adapter.StartTransaction(IsolationLevel.ReadCommitted, "SavePrompt");
try
{
//saving occurs here.
adapter.SaveEntity(prompt);
SaveLanguageTranslationCollection(adapter);
adapter.Commit();
}
catch (Exception)
{
adapter.Rollback();
throw;
}
}
}
You must pass the adapter that is running the transaction to the methods that do the saving, i.e.
private void savetranslationprompt(LanguageTranslationEntity translationentity,
DataAccessAdapter adapter)
{
adapter.SaveEntity(translationentity);
}
I am working on an existing application. This application reads data from a huge file and then, after doing some calculations, stores the data in another table.
But the loop doing this (see below) takes a really long time. Since the file sometimes contains 1,000s of records, the entire process takes days.
Can I replace this foreach loop with something else? I tried using Parallel.ForEach and it did help. I am new to this, so I would appreciate your help.
foreach (record someRecord in someReport.r)
{
try
{
using (var command = new SqlCommand("[procname]", sqlConn))
{
command.CommandTimeout = 0;
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add(…);
IAsyncResult result = command.BeginExecuteReader();
while (!result.IsCompleted)
{
System.Threading.Thread.Sleep(10);
}
command.EndExecuteReader(result);
}
}
catch (Exception e)
{
…
}
}
After reviewing the answers, I removed the async calls and edited the code as below. But this did not improve performance.
using (command = new SqlCommand("[sp]", sqlConn))
{
command.CommandTimeout = 0;
command.CommandType = CommandType.StoredProcedure;
foreach (record someRecord in someReport.)
{
command.Parameters.Clear();
command.Parameters.Add(....)
command.Prepare();
using (dr = command.ExecuteReader())
{
while (dr.Read())
{
if ()
{
}
else if ()
{
}
}
}
}
}
Instead of hitting the SQL connection so many times, have you considered extracting the whole set of data from SQL Server and processing it in memory via a dataset?
Edit: I decided to explain further what I meant.
You can do the following (pseudocode):
Use a select * to get all the information from the database and store it in a list of a class or a dictionary.
Do your foreach(record someRecord in someReport) and do the condition matching as usual.
Step 1: Ditch the attempt at async. It isn't implemented properly and you're blocking anyway. So just execute the procedure and see if that helps.
Step 2: Move the SqlCommand outside of the loop and reuse it for each iteration. That way you don't incur the cost of creating and destroying it for every item in your loop.
Warning: Make sure you reset/clear/remove parameters you don't need from the previous iteration. We did something like this with optional parameters and had 'bleed-thru' from the previous iteration because we didn't clean up parameters we didn't need!
Your biggest problem is that you're looping over this:
IAsyncResult result = command.BeginExecuteReader();
while (!result.IsCompleted)
{
System.Threading.Thread.Sleep(10);
}
command.EndExecuteReader(result);
The entire idea of the asynchronous model is that the calling thread (the one doing this loop) should be spinning up ALL of the asynchronous tasks using the Begin method before starting to work with the results with the End method. If you are using Thread.Sleep() within your main calling thread to wait for an asynchronous operation to complete (as you are here), you're doing it wrong, and what ends up happening is that each command, one at a time, is being called and then waited for before the next one starts.
Instead, try something like this:
public void BeginExecutingCommands(Report someReport)
{
foreach (record someRecord in someReport.r)
{
var command = new SqlCommand("[procname]", sqlConn);
command.CommandTimeout = 0;
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add(…);
command.BeginExecuteReader(ReaderExecuted,
new object[] { command, someReport, someRecord });
}
}
void ReaderExecuted(IAsyncResult result)
{
var state = (object[])result.AsyncState;
var command = state[0] as SqlCommand;
var someReport = state[1] as Report;
var someRecord = state[2] as Record;
try
{
using (SqlDataReader reader = command.EndExecuteReader(result))
{
// work with reader, command, someReport and someRecord to do what you need.
}
}
catch (Exception ex)
{
// handle exceptions that occurred during the async operation here
}
}
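The same "start everything first, then collect the results" idea can also be sketched with Task-based async. This is only an illustration under stated assumptions: ProcessRecordAsync is a hypothetical stand-in for one stored-procedure call, and Task.Delay simulates the database round trip so that the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

static class ConcurrentCallsDemo
{
    // Hypothetical stand-in for one stored-procedure call;
    // Task.Delay simulates the round trip to the database.
    static async Task<int> ProcessRecordAsync(int recordId)
    {
        await Task.Delay(100);
        return recordId * 2;
    }

    // Start every call first, then wait for all of the results together.
    public static async Task<int[]> ProcessAllAsync(IEnumerable<int> recordIds)
    {
        List<Task<int>> pending = recordIds.Select(ProcessRecordAsync).ToList();
        return await Task.WhenAll(pending); // results keep the input order
    }

    static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();
        int[] results = await ProcessAllAsync(Enumerable.Range(1, 20));
        Console.WriteLine($"{results.Length} results in {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

Because all twenty simulated calls are in flight at once, the total time should be close to one call rather than twenty. With a real database, each concurrent command would typically need its own connection, since a single SqlConnection does not support several active readers unless MARS is enabled.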
In SQL, on the other end of a write is a (one) disk. You can rarely write faster in parallel. In fact, writing in parallel often slows things down due to index fragmentation. If you can, sort the data by the primary (clustered) key prior to loading. In a big load, even disable the other keys, load the data, and then rebuild the keys.
I'm not really sure what you are doing with the async calls, but it was certainly not doing what you expected, as it was waiting on itself.
try
{
using (var command = new SqlCommand("[procname]", sqlConn))
{
command.CommandTimeout = 0;
command.CommandType = CommandType.StoredProcedure;
foreach (record someRecord in someReport.r)
{
command.Parameters.Clear();
command.Parameters.Add(…);
using (var rdr = command.ExecuteReader())
{
while (rdr.Read())
{
…
}
}
}
}
}
catch (…)
{
…
}
As we were talking about in the comments, storing this data in memory and working with it there may be a more efficient approach.
So one easy way to do that is to start with Entity Framework. Entity Framework will automatically generate the classes for you based on your database schema. Then you can import a stored procedure which holds your SELECT statement. The reason I suggest importing a stored proc into EF is that this approach is generally more efficient than doing your queries in LINQ against EF.
Then run the stored proc and store the data in a List like this...
var data = db.MyStoredProc().ToList();
Then you can do anything you want with that data. Or, as I mentioned, if you're doing a lot of lookups on primary keys, use ToDictionary(), something like this...
var data = db.MyStoredProc().ToDictionary(k => k.MyPrimaryKey);
Either way, you'll be working with your data in memory at this point.
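To make the in-memory lookup concrete, here is a minimal sketch that needs no database. LookupRow and its sample values are hypothetical; in practice the rows would come from something like db.MyStoredProc().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape standing in for the stored procedure's result type.
sealed class LookupRow
{
    public LookupRow(int myPrimaryKey, string value)
    {
        MyPrimaryKey = myPrimaryKey;
        Value = value;
    }

    public int MyPrimaryKey { get; }
    public string Value { get; }
}

static class InMemoryLookupDemo
{
    // Returns the values of the rows whose key appears in recordKeys.
    public static List<string> Match(IEnumerable<LookupRow> rows, IEnumerable<int> recordKeys)
    {
        // One pass to build the index; each later lookup is a hash lookup
        // instead of a round trip to the database.
        Dictionary<int, LookupRow> byKey = rows.ToDictionary(r => r.MyPrimaryKey);

        var matched = new List<string>();
        foreach (int key in recordKeys)
        {
            if (byKey.TryGetValue(key, out LookupRow row))
                matched.Add(row.Value);
        }
        return matched;
    }

    static void Main()
    {
        var rows = new[] { new LookupRow(1, "a"), new LookupRow(2, "b"), new LookupRow(3, "c") };
        Console.WriteLine(string.Join(",", Match(rows, new[] { 3, 1, 7 }))); // prints "c,a"
    }
}
```

Note that ToDictionary throws if two rows share the same key, so this only fits columns that are actually unique.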
It seems that executing your SQL command puts a lock on some required resources, and that is what forced you to use the async methods (my guess).
If the database is not in use, try getting exclusive access to it. If there are still internal transactions due to data-model complexity, consider consulting the database designer.
static Object LockEx=new Object();
public void SaveMyData(IEnumerable<MyData> list)
{
lock (LockEx)
{
using (PersistencyContext db = new PersistencyContext())
{
foreach (var el in list)
{
try
{
db.MyData.Add(el);
db.SaveChanges();
}
catch (DbUpdateException)
{
db.Entry(el).State = EntityState.Modified;
db.SaveChanges();
}
}
}
}
}
This method is called from multiple threads. Right now I use a static lock to prevent two threads from saving data at the same time, though this seems wrong because I only want to save data. The catch is used to issue an update query in case the insert (Add) fails because the entry already exists.
What happens if I remove the lock? How will SaveChanges work? What should my code look like? Thanks
I would remove the lock, because the database already handles concurrency by design. I would also check whether the record exists before trying to add it, and then do the add or the update depending on the result, just to avoid exceptions, because they are performance killers.
Building on Davide's answer, you could also call SaveChanges once, after you have added all the new entities. That should be faster.