Code like this:
var compIds = from p in packinglist.List
              select p.ComponentId;
var components = from c in context.Components
                 where compIds.Contains(c.Id)
                 select c;
foreach (var item in components)
{
    item.CurrentSiteId = packinglist.DestinationId;
}
context.SaveChanges();
ends up issuing lots of SQL statements like
update [dbo].[Components] set [CurrentSiteId] = @0 where ([Id] = @1)
Is there a way to instruct EF (Code First) to issue the following statement instead:
update [dbo].[Components] set [CurrentSiteId] = @0 where ([Id] in (....))
Or should I look into using one of the SqlQuery methods available, or a separate tool like Dapper or Massive or ...?
There is currently no way to perform bulk updates in EF 4 out of the box. There are some very long, complicated workarounds that end up generating SQL, though. I suggest using a stored procedure or plain T-SQL. Here's a quick T-SQL snippet that I've used in the past:
using (var context = new YourEntities())
{
    context.ExecuteStoreCommand(
        @"UPDATE Components SET CurrentSiteId = 1 WHERE ID IN (1,2,3,4)");
}
The simplest answer for this is just to write that query yourself and run it with DbContext.Database.ExecuteSqlCommand() (SqlQuery() is for statements that return results). As mentioned, there's no way to do this in EF itself.
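If you do drop down to raw SQL for this, the id list can still be parameterized rather than concatenated into the statement. A minimal sketch of building the placeholder list (the table and column names come from the question; the @site parameter name and the surrounding program are made up for illustration):

```csharp
using System;
using System.Linq;

class InClauseSketch
{
    static void Main()
    {
        // Build "@p0,@p1,..." placeholders, one per id, so the UPDATE stays parameterized.
        var ids = new[] { 1, 2, 3, 4 };
        var placeholders = string.Join(",", ids.Select((_, i) => "@p" + i));
        var sql = $"UPDATE Components SET CurrentSiteId = @site WHERE Id IN ({placeholders})";
        Console.WriteLine(sql);
        // Prints: UPDATE Components SET CurrentSiteId = @site WHERE Id IN (@p0,@p1,@p2,@p3)
        // The id values would then be passed as DbParameters alongside @site.
    }
}
```

Note this still runs into SQL Server's parameter limit for very large lists, which the temp-table approaches discussed later avoid.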
I read that Dapper is faster than EF (at least at retrieving data) and I wanted to confirm that, so I am comparing Dapper and Entity Framework with the help of BenchmarkDotNet.
So I tried this...
[Benchmark]
public Player EntityFramework_GetByName()
{
    using (ApplicationDbContext context = new())
    {
        return context.Players.FirstOrDefault(x => x.FirstName == _name);
    }
}

[Benchmark]
public Player Dapper_GetByName()
{
    using (SqlConnection conn = new(Database.ConnectionString))
    {
        return conn.QueryFirstOrDefault<Player>($"SELECT * FROM Players WHERE FirstName = '{_name}'");
    }
}
But the results are not what I was expecting...
Then I read here about the column type "problem" and how it can affect performance, so I changed the column type to NVarchar with a max length of 100, and changed my Dapper code to this:
[Benchmark]
public Player Dapper_GetByName()
{
    using (SqlConnection conn = new(Database.ConnectionString))
    {
        return conn.QueryFirstOrDefault<Player>("SELECT * FROM Players WHERE FirstName = @name",
            new { name = new DbString { Value = _name, IsAnsi = false } });
    }
}
The results of the benchmark tests are the following:

| Method                    | Mean        | Error       | StdDev     | Allocated |
|---------------------------|-------------|-------------|------------|-----------|
| Dapper_GetByName          | 41,092.8 us | 1,400.39 us | 4,085.0 us | 4 KB      |
| EntityFramework_GetByName | 2,971.6 us  | 305.43 us   | 895.8 us   | 110 KB    |
The difference is very big. Is there a way to improve this?
Uhm, maybe you should not compare
// Open and Close a completely new database connection
using (SqlConnection conn = new(Database.ConnectionString))
vs
// Create a new Unit of Work / Transaction
using (ApplicationDbContext context = new())
Benchmark only the inner part:
return conn.QueryFirstOrDefault<Player>($"SELECT * FROM Players WHERE FirstName = '{_name}'");
I think this example shows very clearly the responsibility of SQL query generation when using Dapper, CA.Blocks.DataAccess or ADO.NET directly. When using these packages for accessing the database the developer is entirely in charge of the SQL query, its projection and execution. When using EF the responsibility of generating the query is removed from the developer and delegated to EF. This is a double-edged sword and can result in good queries as well as very bad queries. Most of the performance gains made in Dapper are from having full control over the SQL and eliminating bad SQL generation. The converse is also true, most of the performance problems with Dapper when compared to EF are due to EF creating a better query.
So what is happening here? In simple terms, EF has looked at the request and knows that you only want the first record (FirstOrDefault), so its query generation has resulted in
SELECT TOP 1 * FROM … WHERE…
The Dapper query you are making the comparison with is
SELECT * FROM … WHERE …
So the difference, I suspect, is purely the SQL. The test database probably has many records in the Players table. Given the numbers, it is likely that there is no index on FirstName, resulting in a table scan to find the matching data.
In the query generated by EF, the database can stop executing as soon as it finds the first record; in the Dapper example, the database assembles the full result set with all the matches on name and then sends that row set. Dapper simply reads the first row and closes the connection.
To make this a fair comparison you need to change the query to use TOP 1, like:
[Benchmark]
public Player Dapper_GetByName()
{
    using (SqlConnection conn = new(Database.ConnectionString))
    {
        return conn.QueryFirstOrDefault<Player>("SELECT TOP 1 * FROM Players WHERE FirstName = @name",
            new { name = new DbString { Value = _name, IsAnsi = false } });
    }
}
Also, the decision to go with Dapper for performance means you need to get to know and love SQL.
Without writing an entire foreach loop is there a way to do a Update/Set in LINQ to Entities?
Using EF 6.x
Simple update query:
UPDATE stop_detail
SET cap_unique_id = b.Delivery_Location_Id
FROM order_detail b
WHERE Stop_Detail.CAP_Unique_Id IS NULL AND ((b.customer_id = 20 OR b.customer_id = 291) AND b.id = stop_detail.order_detail_id AND stop_type = 1)
All the context names are the same.
I normally end up writing about 30 lines of C# to do this, and I know there has to be a better way!
Whether you can and whether you should are two different things.
Here's how you can.
Example from EF6 Raw SQL Queries
using (var context = new BloggingContext())
{
    context.Database.ExecuteSqlCommand(
        "UPDATE dbo.Blogs SET Name = 'Another Name' WHERE BlogId = 1");
}
Hint: you probably shouldn't
We're analyzing Azure "Query Performance Insight" to look for expensive queries. The problem is that there is no way to relate the generated SQL back to the Entity Framework query that produced it.
Is there any extension method or anything else to do something like this:
SQL generated:
-- BlahMethod
SELECT Id
FROM Table1
Entity Framework cmd:
Context.Table1.Naming("BlahMethod").ToList()
Or even better:
Context.Table1.ToList() // intercept sql generated by EF and put through reflection the Method and Namespace "MyAssembly.Foo.MyMethodName"
SQL Generated:
-- MyAssembly.Foo.MyMethodName
SELECT Id
FROM Table1
Yes, look at this article Logging and Intercepting Database Operations.
It can be as simple as using Console.Write:
using (var context = new BlogContext())
{
    context.Database.Log = Console.Write;
    // Your code here...
}
Or you can use a log class:
using (var db = new MyDBContext())
{
    db.Database.Log = s => Log.TraceVerbose("DB Context:{0}", s);
    ...
I have a List containing ids that I want to insert into a temp table using Dapper in order to avoid the SQL limit on parameters in the 'IN' clause.
So currently my code looks like this:
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        return db.Query<int>(
            @"SELECT a.animalID
              FROM dbo.animalTypes [at]
              INNER JOIN animals [a] on a.animalTypeId = at.animalTypeId
              INNER JOIN edibleAnimals e on e.animalID = a.animalID
              WHERE at.animalId in @animalIds", new { animalIds }).ToList();
    }
}
The problem I need to solve is that when there are more than 2100 ids in the animalIds list then I get a SQL error "The incoming request has too many parameters. The server supports a maximum of 2100 parameters".
So now I would like to create a temp table populated with the animalIds passed into the method. Then I can join the animals table on the temp table and avoid having a huge "IN" clause.
I have tried various combinations of syntax but not got anywhere.
This is where I am now:
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        db.Execute(@"SELECT INTO #tempAnmialIds @animalIds");
        return db.Query<int>(
            @"SELECT a.animalID
              FROM dbo.animalTypes [at]
              INNER JOIN animals [a] on a.animalTypeId = at.animalTypeId
              INNER JOIN edibleAnimals e on e.animalID = a.animalID
              INNER JOIN #tempAnmialIds tmp on tmp.animalID = a.animalID").ToList();
    }
}
I can't get the SELECT INTO working with the list of IDs. Am I going about this the wrong way? Maybe there is a better way to avoid the IN clause limit.
I do have a backup solution in that I can split the incoming list of animalIds into blocks of 1000, but I've read that a large IN clause suffers a performance hit and that joining a temp table will be more efficient. It also means I don't need extra 'splitting' code to batch up the ids into blocks of 1000.
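For reference, the "blocks of 1000" fallback mentioned above only takes a few lines. A rough sketch, not from the original post (the chunking logic is real; the per-chunk database call is indicated by a comment only):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ChunkSketch
{
    // Split the id list into blocks of at most `size` so each IN clause
    // stays well under SQL Server's 2100-parameter limit.
    public static IEnumerable<List<int>> Chunk(IList<int> ids, int size)
    {
        for (int i = 0; i < ids.Count; i += size)
            yield return ids.Skip(i).Take(size).ToList();
    }

    static void Main()
    {
        var chunks = Chunk(Enumerable.Range(1, 2500).ToList(), 1000).ToList();
        Console.WriteLine(chunks.Count); // 3 chunks: 1000, 1000 and 500 ids
        // foreach (var chunk in chunks) { /* run the IN-clause query per chunk, union the results */ }
    }
}
```

The temp-table answers below avoid the extra round trip per chunk, which is why they tend to scale better.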
Ok, here's the version you want. I'm adding this as a separate answer, as my first answer using SP/TVP utilizes a different concept.
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        // This Open() call is vital! If you don't open the connection, Dapper will
        // open/close it automagically, which means that you'll lose the created
        // temp table directly after the statement completes.
        db.Open();
        // This temp table is created having a primary key. So make sure you don't pass
        // any duplicate IDs.
        db.Execute("CREATE TABLE #tempAnimalIds(animalId int not null primary key);");
        while (animalIds.Any())
        {
            // Build the statements to insert the ids. For this, we need to split animalIds
            // into chunks of 1000, as this flavour of INSERT INTO is limited to 1000 values
            // at a time.
            var ids2Insert = animalIds.Take(1000);
            animalIds = animalIds.Skip(1000).ToList();
            var stmt = new StringBuilder("INSERT INTO #tempAnimalIds VALUES (");
            stmt.Append(string.Join("),(", ids2Insert));
            stmt.Append(");");
            db.Execute(stmt.ToString());
        }
        return db.Query<int>("SELECT animalID FROM #tempAnimalIds").ToList();
    }
}
To test:
var ids = LoadAnimalTypeIdsFromAnimalIds(Enumerable.Range(1, 2500).ToList());
You just need to amend your select statement to what it originally was. As I don't have all your tables in my environment, I just selected from the created temp table to prove it works the way it should.
Pitfalls, see comments:
- Open the connection at the beginning, otherwise the temp table will be gone after Dapper automatically closes the connection right after creating the table.
- This particular flavour of INSERT INTO is limited to 1000 values at a time, so the passed ids need to be split into chunks accordingly.
- Don't pass duplicate keys, as the primary key on the temp table will not allow that.
Edit
It seems Dapper supports a set-based operation which will make this work too:
public IList<int> LoadAnimalTypeIdsFromAnimalIdsV2(IList<int> animalIds)
{
    // This creates an IEnumerable of an anonymous type containing an Id property. This seems
    // to be necessary to be able to grab the Id by its name via Dapper.
    var namedIDs = animalIds.Select(i => new { Id = i });
    using (var db = new SqlConnection(this.connectionString))
    {
        // This is vital! If you don't open the connection, Dapper will open/close it
        // automagically, which means that you'll lose the created temp table directly
        // after the statement completes.
        db.Open();
        // This temp table is created having a primary key. So make sure you don't pass
        // any duplicate IDs.
        db.Execute("CREATE TABLE #tempAnimalIds(animalId int not null primary key);");
        // Using one of Dapper's convenient features, the INSERT becomes:
        db.Execute("INSERT INTO #tempAnimalIds VALUES(@Id);", namedIDs);
        return db.Query<int>("SELECT animalID FROM #tempAnimalIds").ToList();
    }
}
I don't know how well this will perform compared to the previous version (i.e. 2500 single inserts instead of three inserts with 1000, 1000, and 500 values each). But the docs suggest that it performs better when used together with async, MARS, and pipelining.
In your example, what I can't see is how your list of animalIds is actually passed to the query to be inserted into the #tempAnimalIDs table.
There is a way to do it without using a temp table, utilizing a stored procedure with a table value parameter.
SQL:
CREATE TYPE [dbo].[udtKeys] AS TABLE([i] [int] NOT NULL)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[myProc](@data as dbo.udtKeys readonly) AS
BEGIN
    select i from @data;
END
GO
This will create a user-defined table type called udtKeys, which contains just one int column named i, and a stored procedure that expects a parameter of that type. The proc does nothing but select the IDs you passed, but you can of course join other tables to it. For a hint regarding the syntax, see here.
C#:
var dataTable = new DataTable();
dataTable.Columns.Add("i", typeof(int));
foreach (var animalId in animalIds)
dataTable.Rows.Add(animalId);
using (SqlConnection conn = new SqlConnection("connectionString goes here"))
{
    var r = conn.Query("myProc", new { data = dataTable }, commandType: CommandType.StoredProcedure);
    // r contains your results
}
The parameter within the procedure gets populated by passing a DataTable, and that DataTable's structure must match the one of the table type you created.
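As an aside, not part of the answer above: if you'd rather not create a stored procedure, Dapper can also send a DataTable as a TVP to inline SQL via its AsTableValuedParameter extension, which takes the type name explicitly since there is no procedure to infer it from. A sketch under that assumption, untested against a real database (the connection string is a placeholder):

```csharp
// Same one-column DataTable shape as above, matching dbo.udtKeys.
var dataTable = new DataTable();
dataTable.Columns.Add("i", typeof(int));
dataTable.Rows.Add(42);

using (var conn = new SqlConnection("connectionString goes here"))
{
    // Outside a stored procedure, the UDT name must be given explicitly.
    var r = conn.Query<int>(
        "select i from @data",
        new { data = dataTable.AsTableValuedParameter("dbo.udtKeys") });
}
```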
If you really need to pass more than 2100 values, you may want to consider indexing your table type to increase performance. You can actually give it a primary key if you don't pass any duplicate keys, like this:
CREATE TYPE [dbo].[udtKeys] AS TABLE(
[i] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[i] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)
GO
You may also need to assign execute permissions for the type to the database user you execute this with, like so:
GRANT EXEC ON TYPE::[dbo].[udtKeys] TO [User]
GO
See also here and here.
For me, the best way I was able to come up with was turning the list into a comma-separated string in C#, then using string_split in SQL to insert the data into a temp table. There are probably upper limits to this, but in my case I was only dealing with 6,000 records and it worked really fast.
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        return db.Query<int>(
            @"-- Create a temp table to join to later. An index on this would probably be good too.
              CREATE TABLE #tempAnimals (Id INT)
              INSERT INTO #tempAnimals (ID)
              SELECT value FROM string_split(@animalIdStrings, ',')

              SELECT at.animalTypeID
              FROM dbo.animalTypes [at]
              JOIN animals [a] ON a.animalTypeId = at.animalTypeId
              JOIN #tempAnimals temp ON temp.ID = a.animalID -- <-- added this
              JOIN edibleAnimals e ON e.animalID = a.animalID",
            new { animalIdStrings = string.Join(",", animalIds) }).ToList();
    }
}
It might be worth noting that string_split is only available in SQL Server 2016 or higher, or, if using Azure SQL, under compatibility level 130 or higher. https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15
I am attempting to use one single UPDATE statement to update multiple records with different values (I'm not trying to update many rows to have the same values, which is pretty straightforward). Here's what I'm trying right now:
using (var cn = GetOpenConnection())
{
    // get items where we need to set calculated fields that will now be persisted in the DB
    var items = cn.Query<MaintenanceItem>("select TOP 500 * from [Maintenance] where Tolerance IS NOT NULL");
    foreach (var mi in items)
    {
        // Set calculated fields on multiple records
        logic.CalculateToleranceFields(mi, true);
    }
    var updateInput = items.Select(a => new { a.ToleranceMonths, a.ToleranceDays, a.ToleranceHours, a.ToleranceLandings, a.ToleranceCycles, a.ToleranceRIN }).ToList();
    // THIS DOESN'T WORK - attempting to update multiple rows with different values
    var numResults = cn.Execute(@"UPDATE rm
        SET rm.ToleranceMonths=ur.ToleranceMonths,
            rm.ToleranceDays=ur.ToleranceDays,
            rm.ToleranceHours=ur.ToleranceHours,
            rm.ToleranceLandings=ur.ToleranceLandings,
            rm.ToleranceCycles=ur.ToleranceCycles,
            rm.ToleranceRIN=ur.ToleranceRIN
        FROM [RoutineItems] rm
        INNER JOIN #UpdatedRecords ur ON rm.AircraftId=ur.AircraftId AND rm.ItemNumber=ur.ItemNumber", updateInput);
    Assert.IsTrue(numResults == items.Count());
}
Is this sort of bulk update possible with Dapper? I would rather do the update in bulk rather than using a for loop to push the data into the DB.
Looks like this isn't currently possible with one statement in Dapper. That's understandable when you consider what would need to be done under the covers to accomplish it.
What I ended up doing was using three statements: create a temp table, populate it with the data that needs updating, and then call an update with an inner join to the temp table:
cn.Execute(#"create table #routineUpdatedRecords
(
AircraftId int,
ItemNumber int,
ToleranceMonths int,
ToleranceDays int,
ToleranceLandings int,
ToleranceCycles decimal(12,2),
ToleranceRIN decimal(12,2),
ToleranceHours decimal(12,2)
);");
cn.Execute(#"Insert INTO #routineUpdatedRecords
VALUES(#AircraftId, #ItemNumber, #ToleranceMonths, #ToleranceDays,
#ToleranceLandings, #ToleranceCycles, #ToleranceRIN, #ToleranceHours)", updateInput);
var numResults = cn.Execute(@"UPDATE rm
    SET rm.ToleranceMonths=ur.ToleranceMonths,
        rm.ToleranceDays=ur.ToleranceDays,
        rm.ToleranceHours=ur.ToleranceHours,
        rm.ToleranceLandings=ur.ToleranceLandings,
        rm.ToleranceCycles=ur.ToleranceCycles,
        rm.ToleranceRIN=ur.ToleranceRIN
    FROM [RoutineItems] rm
    INNER JOIN #routineUpdatedRecords ur ON rm.AircraftId=ur.AircraftId AND rm.ItemNumber=ur.ItemNumber");
I believe this was faster than calling update in a loop since I was updating about 600K rows.
I know the thread is a little bit old, but instead of using a temp table you can do it like this. It gives slightly nicer syntax.
string sql = @"UPDATE rm
    SET rm.ToleranceMonths=@ToleranceMonths,
        rm.ToleranceDays=@ToleranceDays,
        rm.ToleranceHours=@ToleranceHours,
        rm.ToleranceLandings=@ToleranceLandings,
        rm.ToleranceCycles=@ToleranceCycles,
        rm.ToleranceRIN=@ToleranceRIN
    FROM [RoutineItems] rm
    WHERE rm.AircraftId=@AircraftId AND rm.ItemNumber=@ItemNumber";

var numResults = cn.Execute(sql, updateInput);