I am building a batch processing system. Batches of Units come in quantities from 20 to 1,000. Each Unit is essentially a hierarchy of models (one main model and many child models). My task involves saving each model hierarchy to the database as a single transaction (each hierarchy either commits or rolls back). Unfortunately, EF was unable to handle two portions of the model hierarchy because they can contain thousands of records.
What I've done to resolve this is set up SqlBulkCopy to handle these two potentially high-count models and let EF handle the rest of the inserts (and referential integrity).
Batch Loop:
foreach (var unitDetails in BatchUnits)
{
    var unitOfWork = new Unit(unitDetails);
    Task.Factory.StartNew(() =>
    {
        unitOfWork.ProcessX(); // data preparation
        unitOfWork.ProcessY(); // data preparation
        unitOfWork.PersistCase();
    });
}
Unit:
class Unit
{
    public void PersistCase()
    {
        using (var dbContext = new CustomDbContext())
        {
            // Need an explicit transaction so that
            // EF + SqlBulkCopy act as a single block
            using (var scope = new TransactionScope(TransactionScopeOption.Required,
                new TransactionOptions() {
                    IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted
                }))
            {
                // Let EF insert most of the records
                // Note: insert is all it is doing, no update or delete
                dbContext.Units.Add(thisUnit);
                dbContext.SaveChanges(); // deadlocks, DbConcurrencyExceptions here

                // Copy the auto-incremented Id (set by EF) to the DataTables
                // for referential integrity of the SqlBulkCopy inserts
                CopyGeneratedId(thisUnit.AutoIncrementedId, dataTables);

                // Execute SqlBulkCopy for potentially numerous model #1
                SqlBulkCopy bulkCopy1 = new SqlBulkCopy(...);
                ...
                bulkCopy1.WriteToServer(dataTables["#1"]);

                // Execute SqlBulkCopy for potentially numerous model #2
                SqlBulkCopy bulkCopy2 = new SqlBulkCopy(...);
                ...
                bulkCopy2.WriteToServer(dataTables["#2"]);

                // Commit transaction
                scope.Complete();
            }
        }
    }
}
Right now I'm essentially stuck between a rock and a hard place. If I leave the IsolationLevel set to ReadCommitted, I get deadlocks between EF INSERT statements in different Tasks.
If I set the IsolationLevel to ReadUncommitted (which I thought would be fine since I'm not doing any SELECTs) I get DbConcurrencyExceptions.
I've been unable to find any good information about DbConcurrencyExceptions and Entity Framework, but I'm guessing that ReadUncommitted is essentially causing EF to receive invalid "rows inserted" information.
UPDATE
Here is some background information on what is actually causing my deadlocking issues while doing INSERTS:
http://connect.microsoft.com/VisualStudio/feedback/details/562148/how-to-avoid-using-scope-identity-based-insert-commands-on-sql-server-2005
Apparently this same issue was present a few years ago when LINQ to SQL came out, and Microsoft fixed it by changing how scope_identity() gets selected. I'm not sure why their position has changed to this being a SQL Server problem when the same issue came up again with Entity Framework.
This issue is explained fairly well here: http://connect.microsoft.com/VisualStudio/feedback/details/562148/how-to-avoid-using-scope-identity-based-insert-commands-on-sql-server-2005
Essentially it's an internal EF issue. I migrated my code to use LINQ to SQL and it now works fine (it no longer does the unnecessary SELECT for the identity value).
Relevant quote from the exact same issue in LINQ to SQL, which was fixed:
When a table has an identity column, Linq to SQL generates extremely
inefficient SQL for insertion into such a table. Assume the table is
Order and the identity column is Id. The SQL generated is:
exec sp_executesql N'INSERT INTO [dbo].[Order]([Column1], [Column2])
VALUES (@p0, @p1)
SELECT [t0].[Id] FROM [dbo].[Order] AS [t0] WHERE [t0].[Id] =
(SCOPE_IDENTITY())',N'@p0 int,@p1 int',@p0=124,@p1=432
As one can see, instead of returning SCOPE_IDENTITY() directly by using
'SELECT SCOPE_IDENTITY()', the generated SQL performs a SELECT on the
Id column using the value returned by SCOPE_IDENTITY(). When the
number of records in the table is large, this significantly slows
down the insertion. When the table is partitioned, the problem gets
even worse.
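For contrast, the efficient pattern returns the identity directly in the same batch. Below is a minimal ADO.NET sketch of that pattern (my own illustration, not the SQL EF or LINQ to SQL actually generates), reusing the hypothetical Order table and columns from the quote:

using System.Data.SqlClient;

// Sketch: insert a row and return the new identity directly with
// SELECT SCOPE_IDENTITY(), instead of re-selecting the inserted row by its Id.
static int InsertOrderReturningId(string connectionString, int column1, int column2)
{
    const string sql =
        @"INSERT INTO [dbo].[Order]([Column1], [Column2]) VALUES (@p0, @p1);
          SELECT CAST(SCOPE_IDENTITY() AS int);";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@p0", column1);
        cmd.Parameters.AddWithValue("@p1", column2);
        conn.Open();
        return (int)cmd.ExecuteScalar(); // no extra lookup against the Order table
    }
}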
Related
I have a strange performance issue executing a simple MERGE SQL command with Entity Framework 6.
First, my Entity Framework code:
var command = #"MERGE [StringData] AS TARGET
USING (VALUES (#DCStringID_Value, #TimeStamp_Value)) AS SOURCE ([DCStringID], [TimeStamp])
ON TARGET.[DCStringID] = SOURCE.[DCStringID] AND TARGET.[TimeStamp] = SOURCE.[TimeStamp]
WHEN MATCHED THEN
UPDATE
SET [DCVoltage] = #DCVoltage_Value,
[DCCurrent] = #DCCurrent_Value
WHEN NOT MATCHED THEN
INSERT ([DCStringID], [TimeStamp], [DCVoltage], [DCCurrent])
VALUES (#DCStringID_Value, #TimeStamp_Value, #DCVoltage_Value, #DCCurrent_Value);";
using (EntityModel context = new EntityModel())
{
    for (int i = 0; i < 100; i++)
    {
        var entity = _buffer.Dequeue();
        context.ContextAdapter.ObjectContext.ExecuteStoreCommand(command, new object[]
        {
            new SqlParameter("@DCStringID_Value", entity.DCStringID),
            new SqlParameter("@TimeStamp_Value", entity.TimeStamp),
            new SqlParameter("@DCVoltage_Value", entity.DCVoltage),
            new SqlParameter("@DCCurrent_Value", entity.DCCurrent),
        });
    }
}
Execution time ~20 seconds.
This looks a little bit slow, so I tried running the same command directly in Management Studio (also 100 times in a row).
SQL Server Management Studio:
Execution time <1 second.
Ok that is strange!?
Some tests:
First, I compared both execution plans (Entity Framework and SSMS), but they are absolutely identical.
Second, I tried using a transaction inside my code.
using (PowerdooModel context = PowerdooModel.CreateModel())
{
    using (var dbContextTransaction = context.Database.BeginTransaction())
    {
        try
        {
            for (int i = 0; i < 100; i++)
            {
                var entity = _buffer.Dequeue();
                context.ContextAdapter.ObjectContext.ExecuteStoreCommand(command, new object[]
                {
                    new SqlParameter("@DCStringID_Value", entity.DCStringID),
                    new SqlParameter("@TimeStamp_Value", entity.TimeStamp),
                    new SqlParameter("@DCVoltage_Value", entity.DCVoltage),
                    new SqlParameter("@DCCurrent_Value", entity.DCCurrent),
                });
            }
            dbContextTransaction.Commit();
        }
        catch (Exception)
        {
            dbContextTransaction.Rollback();
        }
    }
}
Third, I added 'OPTION(RECOMPILE)' to avoid parameter sniffing.
Execution time is still ~10 seconds, which is still very poor performance.
Question: what am I doing wrong? Please give me a hint.
-------- Some more tests - edited 18.11.2016 ---------
If I execute the commands inside a transaction (like above), the following times come up:
Duration complete: 00:00:06.5936006
Average Command: 00:00:00.0653457
Commit: 00:00:00.0590299
Is it not strange that the commit nearly takes no time and the average command takes nearly the same time?
In SSMS you are running one long batch that contains 100 separate MERGE statements. In C# you are running 100 separate batches. Obviously that takes longer.
Running 100 separate batches in 100 separate transactions is obviously slower than 100 batches in 1 transaction. Your measurements confirm that and show you how much longer it takes.
To make it efficient, use a single MERGE statement that processes all 100 rows from a table-valued parameter in one go. See also Table-Valued Parameters for .NET Framework.
Often a table-valued parameter is a parameter of a stored procedure, but you don't have to use a stored procedure. It could be a single statement, but instead of multiple simple scalar parameters you'd pass the whole table at once.
I've never used Entity Framework, so I can't show you a C# example of how to call it. I'm sure if you search for "how to pass a table-valued parameter in Entity Framework" you'll find an example. I use the DataTable class to pass a table as a parameter.
I can show you an example of a T-SQL stored procedure.
First, you define a table type that pretty much follows the definition of your StringData table:
CREATE TYPE dbo.StringDataTableType AS TABLE(
DCStringID int NOT NULL,
TimeStamp datetime2(0) NOT NULL,
DCVoltage float NOT NULL,
DCCurrent float NOT NULL
)
Then you use it as a type for the parameter:
CREATE PROCEDURE dbo.MergeStringData
@ParamRows dbo.StringDataTableType READONLY
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRANSACTION;
BEGIN TRY
MERGE INTO dbo.StringData WITH (HOLDLOCK) as Dst
USING
(
SELECT
TT.DCStringID
,TT.TimeStamp
,TT.DCVoltage
,TT.DCCurrent
FROM
@ParamRows AS TT
) AS Src
ON
Dst.DCStringID = Src.DCStringID AND
Dst.TimeStamp = Src.TimeStamp
WHEN MATCHED THEN
UPDATE SET
Dst.DCVoltage = Src.DCVoltage
,Dst.DCCurrent = Src.DCCurrent
WHEN NOT MATCHED BY TARGET THEN
INSERT
(DCStringID
,TimeStamp
,DCVoltage
,DCCurrent)
VALUES
(Src.DCStringID
,Src.TimeStamp
,Src.DCVoltage
,Src.DCCurrent)
;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
-- TODO: handle the error
ROLLBACK TRANSACTION;
END CATCH;
END
Again, it doesn't have to be a stored procedure, it can be just a MERGE statement with one table-valued parameter.
I'm pretty sure that it would be much faster than your loop with 100 separate queries.
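For completeness, calling such a procedure from C# could look roughly like the sketch below. It uses plain ADO.NET and a DataTable rather than Entity Framework; StringDataRow is a hypothetical POCO with the four fields from the question, and the procedure and type names are the ones defined above.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Sketch: pass all rows at once as a table-valued parameter to dbo.MergeStringData.
static void MergeStringData(string connectionString, IEnumerable<StringDataRow> rows)
{
    var table = new DataTable();
    table.Columns.Add("DCStringID", typeof(int));
    table.Columns.Add("TimeStamp", typeof(DateTime));
    table.Columns.Add("DCVoltage", typeof(double));
    table.Columns.Add("DCCurrent", typeof(double));

    foreach (var r in rows)
        table.Rows.Add(r.DCStringID, r.TimeStamp, r.DCVoltage, r.DCCurrent);

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.MergeStringData", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        var p = cmd.Parameters.AddWithValue("@ParamRows", table);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.StringDataTableType"; // the table type created above
        conn.Open();
        cmd.ExecuteNonQuery();                  // one round trip for all rows
    }
}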
Details on why there should be a HOLDLOCK hint with MERGE: “UPSERT” Race Condition With MERGE
A side note:
It is not strange that Commit in your last test is very fast. It doesn't do much, because everything is already written to the database. If you tried to do Rollback, that would take some time.
It looks like the difference is that you are using OPTION (RECOMPILE) in your EF code only, but when you run the code in SSMS there is no recompile.
Recompiling 100 times would certainly add some time to execution.
Related to the answer from @Vladimir Baranov, I took a look at how Entity Framework supports bulk operations (in my case, a bulk merge).
Short answer... no! Bulk delete, update and merge operations are not supported.
Then I found this sweet little library!
zzzproject Entity Framework Extensions
It provides a ready-made extension that works the way @Vladimir Baranov described (a rough manual sketch of the same steps follows the list below):
Create a temporary table in SQL Server.
Bulk Insert data with .NET SqlBulkCopy into the temporary table.
Perform a SQL statement between the temporary table and the destination table.
Drop the temporary table from the SQL Server.
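A rough manual sketch of those four steps (my own illustration with a hypothetical staging-table name, plain ADO.NET; not the library's actual code):

using System.Data;
using System.Data.SqlClient;

static void BulkMergeStringData(string connectionString, DataTable rows)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // 1. Create a temporary table in SQL Server.
        using (var create = new SqlCommand(
            "CREATE TABLE #StringDataStaging (DCStringID int, TimeStamp datetime2(0), DCVoltage float, DCCurrent float);", conn))
            create.ExecuteNonQuery();

        // 2. Bulk insert the data with SqlBulkCopy into the temporary table.
        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "#StringDataStaging";
            bulk.WriteToServer(rows);
        }

        // 3. Perform a SQL statement between the temporary table and the destination table.
        using (var merge = new SqlCommand(
            @"MERGE INTO dbo.StringData WITH (HOLDLOCK) AS Dst
              USING #StringDataStaging AS Src
                 ON Dst.DCStringID = Src.DCStringID AND Dst.TimeStamp = Src.TimeStamp
              WHEN MATCHED THEN UPDATE SET Dst.DCVoltage = Src.DCVoltage, Dst.DCCurrent = Src.DCCurrent
              WHEN NOT MATCHED THEN INSERT (DCStringID, TimeStamp, DCVoltage, DCCurrent)
                   VALUES (Src.DCStringID, Src.TimeStamp, Src.DCVoltage, Src.DCCurrent);", conn))
            merge.ExecuteNonQuery();

        // 4. Drop the temporary table (it would also go away when the connection closes).
        using (var drop = new SqlCommand("DROP TABLE #StringDataStaging;", conn))
            drop.ExecuteNonQuery();
    }
}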
You can find a short interview by Jonathan Allen about its features and how it can dramatically improve Entity Framework performance for bulk operations.
For me it really brings the time down from 10 seconds to under 1 second. Nice!
To be clear, I don't work for them and I'm not affiliated with them; this is not an advertisement. Yes, the library is commercial, but the basic version is free.
I need to empty a table and then enter around 1000 rows into it. I need for the whole thing to be a transaction, however, so I'm not stuck with an empty (or partially empty) table if any of the inserts fail.
So I experimented with the code below, where the insert (.Add) will intentionally fail. When running it, however, the call to the delete stored procedure (prDeleteFromUserTable) does not roll back with the transaction. I'm left with an empty table and no inserts.
using (var context = new Entities(_strConnection))
{
    using (var transaction = new TransactionScope())
    {
        //delete all rows in the table
        context.prDeleteFromUserTable();

        //add a row, which I intentionally made fail to test the transaction
        context.UserTable.Add(row);
        context.SaveChanges();

        //end the transaction
        transaction.Complete();
    }
}
How would I accomplish this using Linq-to-SQL?
LINQ is for queries (Language Integrated Query) and is not designed for bulk deletion and insertion. A good solution would be to use plain SQL to delete all rows (DELETE FROM myTable) and then SqlBulkCopy for the 1000 inserts.
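As a rough illustration of that suggestion (hypothetical table name, plain ADO.NET, not tied to the question's entity model), the delete and the bulk insert can share one transaction so a failed insert also rolls the delete back:

using System.Data;
using System.Data.SqlClient;

// Sketch: empty the table and bulk-insert ~1000 rows as a single transaction.
static void ReplaceUserTableContents(string connectionString, DataTable newRows)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tran = conn.BeginTransaction())
        {
            try
            {
                using (var delete = new SqlCommand("DELETE FROM dbo.UserTable;", conn, tran))
                    delete.ExecuteNonQuery();

                using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tran))
                {
                    bulk.DestinationTableName = "dbo.UserTable";
                    bulk.WriteToServer(newRows);
                }

                tran.Commit();   // both steps succeed together
            }
            catch
            {
                tran.Rollback(); // the delete is undone if any insert fails
                throw;
            }
        }
    }
}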
I have a database schema that includes tables Orders, Options [Id, OptionGroupId, ...], OrderOptions [Id, OptionId] and an associative table OrderSheetOptions [OrderSheetId, OrderOptionId].
My C# code is creating an OrderOption EF object (an object that maps to the database table):
// Select all default options
var options = Sheet.OptionGroups.Select(og => og.Option);
foreach (var option in options)
{
    // Add all default options
    OrderOptions.Add(new OrderOption
    {
        Option = option
    });
}
The problem seems to arise because I am setting the Option property of OrderOption when it gets created. This results in EF running this query:
exec sp_executesql N'SELECT
[Extent1].[Id] AS [Id],
[Extent1].[OptionId] AS [OptionId]
FROM [dbo].[OrderOptions] AS [Extent1]
WHERE [Extent1].[OptionId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1
Every Order has a default OrderOption with an OptionId = 1, so as you can see from the above query, every record is returned. As this table grows, this query is getting very slow and is severely impacting the web site UX, with delays of minutes.
I am typically only adding 3 records, as those are the default options. I have added an index to OrderOptions, but it doesn't help because every record in that table has an OptionId = 1, so the automatic EF query returns all records.
It looks like EF is querying the entire OrderOptions table to ensure that a duplicate object is not being created, but the OrderOption is Order-specific, so it only has to worry about the current order. However, Options is not directly tied to Orders; only indirectly via OrderSheetOptions. So, the EF query can't restrict itself to the current order. Can anyone tell me how I could optimize this so that EF doesn't have to query the entire OrderOptions table just to add a new item?
Thanks!
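One commonly suggested workaround, assuming the OrderOption entity exposes the OptionId foreign key column as a scalar property (a foreign key association), is to set the key value instead of the Option navigation property, so EF never has to fix up and lazily load the related OrderOptions collection:

// Sketch (assumes OrderOption has an OptionId property and Option has an Id property).
var optionIds = Sheet.OptionGroups.Select(og => og.Option.Id);
foreach (var optionId in optionIds)
{
    OrderOptions.Add(new OrderOption
    {
        OptionId = optionId // scalar FK only; no navigation property set, so no fix-up query
    });
}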
I have a database with 'transactions' and 'transaction_lines'. A transaction holds the basic shared details, and the transaction lines hold the values of the components that make up the transaction. In traditional SQL, I'd use a SQL transaction (sorry, we have an ambiguous word now...) and then I'd INSERT INTO Transaction ..., get the IDENTITY key value from the new row in the transaction table, then insert into my transaction_line table, using the identity column from the Transaction table as the foreign key.
Is there a good way to do this with linq?
Here's what I have so far:
account_transaction ac = new account_transaction
{
    transaction_date = DateTime.Now,
    is_credit = isCredit,
    account = (db.accounts.FirstOrDefault(a => a.account_id == accountid))
};
db.AddToaccount_transaction(ac);
db.SaveChanges();
I think in between the 'AddToaccount_transaction(ac)' and the 'db.SaveChanges()', I need to add my transaction_lines.
You should just be able to create new transaction_line objects and set the foreign entity to ac. In LINQ-to-SQL, you work with entity objects instead of explicit keys. By setting the entity reference to the ac object, you instruct the layer to determine what the key is and set it for you.
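A minimal sketch of that suggestion, assuming hypothetical transaction_line members (amount, description) and a generated AddTotransaction_lines method on the context:

// Sketch: point the new line at ac via the entity reference, not an explicit key.
var line = new transaction_line
{
    amount = 100m,                // hypothetical columns
    description = "First line",
    account_transaction = ac      // entity reference; the FK is filled in for you
};
db.AddTotransaction_lines(line);  // assumed generated AddTo method name
db.SaveChanges();                 // ac and its line are inserted together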
The answer above is correct, but I would wrap it all in a TransactionScope. That way, if any part of the operation fails, the transaction will roll back. Just remember to call the scope.Complete() method prior to closing the connection.
http://msdn.microsoft.com/en-us/library/ms251686(VS.80).aspx
I have a C# application using ADO.NET to connect to MSSQL.
I need to create the table (with a dynamic number of columns), then insert many records, then do a select back out of the table.
Each step must be a separate C# call, although I can keep a connection/transaction open for the duration.
There are two types of temp tables in SQL Server, local temp tables and global temp tables. From the BOL:
Prefix local temporary table names with a single number sign (#tablename), and prefix global temporary table names with a double number sign (##tablename).
Local temp tables live for just your current connection. Globals are available to all connections. Thus, if you re-use (and you did say you could) the same connection across your related calls, you can just use a local temp table without worrying about simultaneous processes interfering with each other's temp tables.
You can get more info on this from the BOL article, specifically under the "Temporary Tables" section about halfway down.
The issue is that #Temp tables exist only within the Connection AND the Scope of the execution.
When the first call from C# to SQL completes, control passes up to a higher level of scope.
This is just as if you had a T-SQL script that called two stored procedures. Each SP creates a table named #MyTable. The second SP is referencing a completely different table than the first SP.
However, if the parent T-SQL code created the table, both SPs could see it, but they can't see each other's.
The solution here is to use ##Temp tables. They cross scopes and connections.
The danger, though, is that if you use a hard-coded name, two instances of your program running at the same time could see the same table. So dynamically set the table name to something that will always be unique.
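A minimal sketch of that idea (the GUID suffix is just one way to guarantee uniqueness):

using System;

// Sketch: generate a unique global temp table name so two instances of the
// program running at the same time do not collide on the same ##table.
static string CreateUniqueGlobalTempTableName()
{
    return "##WorkTable_" + Guid.NewGuid().ToString("N"); // e.g. ##WorkTable_3f2c0a...
}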
You might take a look at the repository pattern for dealing with this concept in C#. This gives you a low-level repository layer for data access where each method performs a single task, but the connection is passed in to the method and the actual actions are performed within a transaction scope. This means you can theoretically call many different methods in your data access layer (implemented as a repository), and if any of them fail you can roll back the whole operation.
http://martinfowler.com/eaaCatalog/repository.html
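As a very rough sketch of that idea (hypothetical types, plain ADO.NET, not from the linked article), the repository receives the caller's open connection and the caller wraps several calls in one TransactionScope:

using System.Data.SqlClient;
using System.Transactions;

// Sketch: a tiny repository that uses the connection it is given, so several
// calls can be rolled back together by the caller's ambient transaction.
class WorkTableRepository
{
    private readonly SqlConnection _conn;

    public WorkTableRepository(SqlConnection conn)
    {
        _conn = conn;
    }

    public void Execute(string sql)
    {
        using (var cmd = new SqlCommand(sql, _conn))
            cmd.ExecuteNonQuery();
    }
}

// Caller (the unit of work): everything inside the scope is one transaction.
// using (var scope = new TransactionScope())
// using (var conn = new SqlConnection(connectionString))
// {
//     conn.Open();                       // enlists in the ambient transaction
//     var repo = new WorkTableRepository(conn);
//     repo.Execute("CREATE TABLE #Work (Id int)");
//     repo.Execute("INSERT INTO #Work (Id) VALUES (1)");
//     scope.Complete();                  // commit; skipping this rolls everything back
// }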
The other aspects of your question would be handled by standard SQL, where you can dynamically create a table, insert into it, delete from it, etc. The tricky part here is keeping one transaction isolated from another. You might look at using temp tables... or you might simply have a second database specifically for this dynamic table concept.
Personally, I think you are doing this the hard way. Do all the steps in one stored proc.
One way to extend the scope/lifetime of your single-pound-sign #temp table is to use a transaction. For as long as the transaction lives, the #temp table continues to exist. You can also use TransactionScope to get the same effect, because TransactionScope creates an ambient transaction in the background.
The below test methods pass, proving that the #temp table contents survive between executions.
This may be preferable to using double-pound temp tables, because ##temp tables are global objects. If you have more than one client that happens to use the same ##temp table name, then they could step on each other. Also, ##temp tables do not survive a server restart, so their lifespan is technically not forever. IMHO it's best to control the scope of #temp tables because they're meant to be limited.
using System.Linq;
using System.Transactions;
using Dapper;
using Microsoft.Data.SqlClient;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using IsolationLevel = System.Data.IsolationLevel;

namespace TestTempAcrossConnection
{
    [TestClass]
    public class UnitTest1
    {
        private string _testDbConnectionString = @"Server=(localdb)\mssqllocaldb;Database=master;trusted_connection=true";

        class TestTable1
        {
            public int Col1 { get; set; }
            public string Col2 { get; set; }
        }

        [TestMethod]
        public void TempTableBetweenExecutionsTest()
        {
            using var conn = new SqlConnection(_testDbConnectionString);
            conn.Open();
            var tran = conn.BeginTransaction(IsolationLevel.ReadCommitted);
            conn.Execute("create table #test1(col1 int, col2 varchar(20))", transaction: tran);
            conn.Execute("insert into #test1(col1,col2) values (1, 'one'),(2,'two')", transaction: tran);
            var tableResult = conn.Query<TestTable1>("select col1, col2 from #test1", transaction: tran).ToList();
            Assert.AreEqual(1, tableResult[0].Col1);
            Assert.AreEqual("one", tableResult[0].Col2);
            tran.Commit();
        }

        [TestMethod]
        public void TempTableBetweenExecutionsScopeTest()
        {
            using var scope = new TransactionScope();
            using var conn = new SqlConnection(_testDbConnectionString);
            conn.Open();
            conn.Execute("create table #test1(col1 int, col2 varchar(20))");
            conn.Execute("insert into #test1(col1,col2) values (1, 'one'),(2,'two')");
            var tableResult = conn.Query<TestTable1>("select col1, col2 from #test1").ToList();
            Assert.AreEqual(2, tableResult[1].Col1);
            Assert.AreEqual("two", tableResult[1].Col2);
            scope.Complete();
        }
    }
}