I read that Dapper is faster than EF (at least at retrieving data) and I wanted to confirm that, so I am comparing Dapper and Entity Framework with the help of BenchmarkDotNet.
So I tried this...
[Benchmark]
public Player EntityFramework_GetByName()
{
    using (ApplicationDbContext context = new())
    {
        return context.Players.FirstOrDefault(x => x.FirstName == _name);
    }
}
[Benchmark]
public Player Dapper_GetByName()
{
    using (SqlConnection conn = new(Database.ConnectionString))
    {
        return conn.QueryFirstOrDefault<Player>($"SELECT * FROM Players WHERE FirstName = '{_name}'");
    }
}
But the results are not what I was expecting...
Then I read here about the column type "problem" and how it can affect performance, so I changed the type of the column to NVARCHAR with a max length of 100 and changed my Dapper code to this:
[Benchmark]
public Player Dapper_GetByName()
{
    using (SqlConnection conn = new(Database.ConnectionString))
    {
        return conn.QueryFirstOrDefault<Player>("SELECT * FROM Players WHERE FirstName = @name",
            new { name = new DbString { Value = _name, IsAnsi = false } });
    }
}
The results of the benchmark tests are the following:

| Method                    |        Mean |       Error |     StdDev | Allocated |
|---------------------------|------------:|------------:|-----------:|----------:|
| Dapper_GetByName          | 41,092.8 us | 1,400.39 us | 4,085.0 us |      4 KB |
| EntityFramework_GetByName |  2,971.6 us |   305.43 us |   895.8 us |    110 KB |
The difference is very big. Is there a way to improve this?
Uhm, maybe you should not compare
// Open and Close a completely new database connection
using (SqlConnection conn = new(Database.ConnectionString))
vs
// Create a new Unit of Work / Transaction
using (ApplicationDbContext context = new())
Benchmark only the inner part:
return conn.QueryFirstOrDefault<Player>($"SELECT * FROM Players WHERE FirstName = '{_name}'");
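Here is a minimal sketch of what that could look like using BenchmarkDotNet's setup hooks, so the measurement covers only the query itself (the [GlobalSetup]/[GlobalCleanup] attributes are BenchmarkDotNet's; the field and method names are illustrative):

private SqlConnection _conn;

[GlobalSetup]
public void Setup()
{
    // Open the connection once, outside the measured region.
    _conn = new SqlConnection(Database.ConnectionString);
    _conn.Open();
}

[Benchmark]
public Player Dapper_GetByName_QueryOnly() =>
    _conn.QueryFirstOrDefault<Player>(
        "SELECT * FROM Players WHERE FirstName = @name",
        new { name = new DbString { Value = _name, IsAnsi = false } });

[GlobalCleanup]
public void Cleanup() => _conn.Dispose();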
I think this example shows very clearly where the responsibility for SQL query generation lies when using Dapper, CA.Blocks.DataAccess or ADO.NET directly. When using these packages to access the database, the developer is entirely in charge of the SQL query, its projection and its execution. When using EF, the responsibility of generating the query is removed from the developer and delegated to EF. This is a double-edged sword and can result in good queries as well as very bad queries. Most of the performance gains made with Dapper come from having full control over the SQL and eliminating bad SQL generation. The converse is also true: most of the performance problems with Dapper when compared to EF are due to EF creating a better query.
So what is happening here? In simple terms, EF has looked at the request and knows that you only want the first record (FirstOrDefault), so its query generation has resulted in:
SELECT TOP 1 * FROM … WHERE…
The Dapper query you are making the comparison with is
SELECT * FROM … WHERE …
So the difference, I suspect, is purely in the SQL. The test database probably has many records in the Players table, and given the numbers it is likely that there is no index on FirstName, resulting in a table scan to find the matching data (adding one, e.g. CREATE INDEX IX_Players_FirstName ON Players (FirstName), would let both queries seek instead).
In the query generated by EF, the database can stop executing as soon as it finds the first record; in the Dapper example, the database assembles the full result set with all the matches on name and then sends that row-set, and Dapper simply reads the first row and closes the connection.
To make this a fair comparison you need to change the query to select only the top row, like:
[Benchmark]
public Player Dapper_GetByName()
{
    using (SqlConnection conn = new(Database.ConnectionString))
    {
        return conn.QueryFirstOrDefault<Player>("SELECT TOP 1 * FROM Players WHERE FirstName = @name",
            new { name = new DbString { Value = _name, IsAnsi = false } });
    }
}
Also, the decision to go with Dapper for performance means you need to get to know and love SQL.
Related
Without writing an entire foreach loop, is there a way to do an UPDATE/SET in LINQ to Entities?
Using EF 6.x
Simple update query:
UPDATE stop_detail
SET cap_unique_id = b.Delivery_Location_Id
FROM order_detail b
WHERE Stop_Detail.CAP_Unique_Id IS NULL AND ((b.customer_id = 20 OR b.customer_id = 291) AND b.id = stop_detail.order_detail_id AND stop_type = 1)
All the context names are the same.
I normally end up writing about 30 lines of C# code to do this and I know there has to be a better way!
Whether you can and whether you should are two different things.
Here's how you can.
Example from EF6 Raw SQL Queries
using (var context = new BloggingContext())
{
    context.Database.ExecuteSqlCommand(
        "UPDATE dbo.Blogs SET Name = 'Another Name' WHERE BlogId = 1");
}
Hint: you probably shouldn't
I have a C#-based API and I send queries to a MySQL server. I wonder how I can read the id returned by a SELECT against a table in C#. Note that I am using MySql.Data.MySqlClient.
My code up to the execute call is below. At this step I don't know how to retrieve the desired id. I used ExecuteNonQuery but it does not seem to fit what I need.
string connectionString = @"server=x.x.x.x;userid=xxxx;password=xxxxxx;database=testdatabase";
string getLastStoryIdQuery = "SELECT MAX(ID) FROM testdatabase.test";
MySqlCommand getLastStoryIdCommand = new MySqlCommand(getLastStoryIdQuery, mySqlConnection);
int lastId = getLastStoryIdCommand.ExecuteNonQuery();
How can I retrieve the result as an Integer or in worst case as a string response? Thank you in advance. :)
int lastId = Convert.ToInt32(getLastStoryIdCommand.ExecuteScalar());
You can find the documentation on MySqlCommand here: https://dev.mysql.com/doc/dev/connector-net/8.0/html/T_MySql_Data_MySqlClient_MySqlCommand.htm
The method ExecuteNonQuery returns the number of rows affected by the query, while ExecuteScalar returns the first column of the first row. You can also use ExecuteReader to get a data reader so that you can read a result set the database produces.
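For completeness, the ExecuteReader route for the same query would look roughly like this (assuming the mySqlConnection from the question is already open):

using (MySqlDataReader reader = getLastStoryIdCommand.ExecuteReader())
{
    if (reader.Read())
    {
        // First column of the first row, same value ExecuteScalar would give you.
        int lastId = reader.GetInt32(0);
    }
}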
In practice, I rarely use DbCommand/DbReader anymore and prefer to just use Dapper for database access in most cases where performance isn't absolutely critical. It simplifies parameter creation and object mapping, which serves the vast majority of my use cases.
Dapper would look like this:
string connectionString = @"server=x.x.x.x;userid=xxxx;password=xxxxxx;database=testdatabase";
string getLastStoryIdQuery = "SELECT MAX(ID) FROM testdatabase.test";
int lastId;
using (var conn = new MySqlConnection(connectionString))
{
    lastId = conn.Query<int>(getLastStoryIdQuery).First();

    // You can also do the following in this instance, but you will use the
    // form above for results that return multiple rows or multiple columns:
    //lastId = conn.ExecuteScalar<int>(getLastStoryIdQuery);

    // Here is how you use parameters:
    //var something = conn.ExecuteScalar<int>("SELECT id FROM testdatabase.test WHERE id=@param", new { param = 10 });

    // This gets multiple columns and rows into a List<person> (assuming you have
    // a person class with fname, lname, dob properties):
    //var people = conn.Query<person>("SELECT fname,lname,dob FROM persons WHERE dob>@start", new { start = new DateTime(2000,1,1) }).ToList();
}
A recent bug report states that a method being called is crashing the service, causing it to restart. After troubleshooting, the cause was found to be an obnoxious Oracle SQL call with thousands of strings passed in. A collection of strings is passed to a method from an external service, often more than 10,000 records. The original code built the WHERE clause from the passed collection using the LIKE keyword, which I think is really, really bad.
public IList<ContainerState> GetContainerStates(IList<string> containerNumbers)
{
    string sql =
        String.Format(@"Select CTNR_NO, CNTR_STATE FROM CONTAINERS WHERE CTRN_SEQ = 0 AND ({0})",
            string.Join("OR", containerNumbers
                .Select(item => string.Concat(" cntr_no LIKE '", item.SliceLeft(10), "%' ")))
        );
    return DataBase.SelectQuery(sql, MapRecordToContainerState, new { }).ToList();
}
Clarification of in house methods used which may be confusing:
DataBase.SelectQuery is an internal generic library method that takes the SQL string, a function to map the records to .NET objects, and the parameters being passed, and returns an IEnumerable of objects of the type returned by the mapping function.
SliceLeft is an extension method from another internal helper library that just returns the first part of a string up to the number of characters specified by the parameter.
The reason that the LIKE statement was apparently used, is that the strings being passed and the strings in the database only are guaranteed to match the first 10 characters. Example ("XXXX000000-1" in the strings being passed should match a database record like "XXXX000000-8").
I believed that an IN clause using SUBSTR would be more efficient than multiple LIKE clauses, and replaced the code with:
public IList<ContainerRecord> GetContainerStates(IList<string> containerNumbers)
{
    string sql =
        String.Format(@"Select CTNR_NO, CNTR_STATE FROM CONTAINERS WHERE CTRN_SEQ = 0 AND ({0})",
            string.Format("SUBSTR(CNTR_NO, 1, 10) IN ({0}) ",
                string.Join(",", containerNumbers.Select(item => string.Format("'{0}'", item.SliceLeft(10))))
            )
        );
    return DataBase.SelectQuery(sql, MapRecordToContainerState, new { }).ToList();
}
This helped slightly, and there were fewer issues in my tests, but when huge numbers of records are passed, an exception is still thrown and core dumps occur, as the SQL statement is longer than the server can parse. The DBA suggests saving all the strings being passed to a temporary table, and then joining against that temp table.
Given that advice, I changed the function to:
public IList<ContainerRecord> GetContainerStates(IList<string> containerNumbers)
{
    string sql =
        @"
        CREATE TABLE T1(cntr_num VARCHAR2(10));
        DECLARE GLOBAL TEMPORARY TABLE SESSION.T1 NOT LOGGED;
        INSERT INTO SESSION.T1 VALUES (:containerNumbers);
        SELECT
            DISTINCT cntr_no,
            '_IT' cntr_state
        FROM
            tb_master
        WHERE
            cntr_seq = 0
            AND cntr_state IN ({0})
            AND adjustment <> :adjustment
            AND SUBSTR(CTNR_NO, 1, 10) IN (SELECT CNTR_NUM FROM SESSION.T1);
        ";
    var parameters = new
    {
        @containerNumbers = containerNumbers.Select(item => item.SliceLeft(10)).ToList()
    };
    return DataBase.SelectQuery(sql, MapRecordToContainerState, parameters).ToList();
}
Now I'm getting an "ORA-00900: invalid SQL statement" error. This is really frustrating; how can I properly write a SQL statement that will put this list of strings into a temporary table and then use it in a SELECT statement to return the list I need?
There are a couple of places that could cause this error. "DECLARE GLOBAL TEMPORARY TABLE" appears to be DB2 syntax, not Oracle; try "CREATE GLOBAL TEMPORARY TABLE" instead. Also, I don't know whether your internal API can handle multiple SQL statements in one call; as far as I know, the ODP.NET command class can only execute one statement per call. Moreover, "CREATE TABLE" is DDL, so it runs in its own transaction; I can't see any reason to put it in the same statement being executed. Following is sample code for ODP.NET:
using (OracleConnection conn = new OracleConnection(BD_CONN_STRING))
{
    conn.Open();
    using (OracleCommand cmd = new OracleCommand("create global temporary table t1(id number(9))", conn))
    {
        // actually this should execute once only
        cmd.ExecuteNonQuery();
    }
    using (OracleCommand cmd = new OracleCommand("insert into t1 values (1)", conn))
    {
        cmd.ExecuteNonQuery();
    }
    // customer table is a permanent table
    using (OracleCommand cmd = new OracleCommand("select c.id from customer c, t1 tmp1 where c.id=tmp1.id", conn))
    using (OracleDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read()) { /* read c.id */ }
    }
}
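If you go this route, ODP.NET's array binding can load all the prefixes into the temp table in a single round trip. A minimal sketch, assuming Oracle.ManagedDataAccess.Client and that t1 was created once ahead of time (the names mirror the question; SliceLeft is the in-house helper described above, and the SELECT shape is illustrative):

using (OracleConnection conn = new OracleConnection(BD_CONN_STRING))
{
    conn.Open();
    string[] prefixes = containerNumbers.Select(item => item.SliceLeft(10)).ToArray();

    using (OracleCommand insert = new OracleCommand("insert into t1 (cntr_num) values (:num)", conn))
    {
        // The statement is executed once per array element, server-side, in one round trip.
        insert.ArrayBindCount = prefixes.Length;
        insert.Parameters.Add(":num", OracleDbType.Varchar2, prefixes, ParameterDirection.Input);
        insert.ExecuteNonQuery();
    }

    using (OracleCommand select = new OracleCommand(
        "select distinct m.cntr_no from tb_master m join t1 tmp on substr(m.cntr_no, 1, 10) = tmp.cntr_num where m.cntr_seq = 0", conn))
    using (OracleDataReader reader = select.ExecuteReader())
    {
        while (reader.Read())
        {
            // map each row to a ContainerRecord here
        }
    }
}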
I'm really struggling to iron out this issue. When I use the following code to update my database for large numbers of records, it runs extremely slowly. I've got 500,000 records to update, which takes nearly an hour. During this operation, the journal file grows slowly with little change to the main SQLite db3 file - is this normal?
The operation only seems to be a problem when I have large numbers of records to update - it runs virtually instantly on smaller numbers of records.
Some other operations are performed on the database prior to this code running, so could they somehow be tying up the database? I've tried to ensure that all other connections are closed properly.
Thanks for any suggestions
using (SQLiteConnection sqLiteConnection = new SQLiteConnection("Data Source=" + _case.DatabasePath))
{
    sqLiteConnection.Open();
    using (SQLiteCommand sqLiteCommand = new SQLiteCommand("begin", sqLiteConnection))
    {
        sqLiteCommand.ExecuteNonQuery();
        sqLiteCommand.CommandText = "UPDATE CaseFiles SET areaPk = @areaPk, KnownareaPk = @knownareaPk WHERE mhash = @mhash";
        var pcatpk = sqLiteCommand.CreateParameter();
        var pknowncatpk = sqLiteCommand.CreateParameter();
        var pmhash = sqLiteCommand.CreateParameter();
        pcatpk.ParameterName = "@areaPk";
        pknowncatpk.ParameterName = "@knownareaPk";
        pmhash.ParameterName = "@mhash";
        sqLiteCommand.Parameters.Add(pcatpk);
        sqLiteCommand.Parameters.Add(pknowncatpk);
        sqLiteCommand.Parameters.Add(pmhash);
        foreach (CatItem CatItem in _knownFiless)
        {
            if (CatItem.FromMasterHashes == true)
            {
                pcatpk.Value = CatItem.areaPk;
                pknowncatpk.Value = CatItem.areaPk;
                pmhash.Value = CatItem.mhash;
            }
            else
            {
                pcatpk.Value = CatItem.areaPk;
                pknowncatpk.Value = null;
                pmhash.Value = CatItem.mhash;
            }
            sqLiteCommand.ExecuteNonQuery();
        }
        sqLiteCommand.CommandText = "end";
        sqLiteCommand.ExecuteNonQuery();
        sqLiteCommand.Dispose();
        sqLiteConnection.Close();
    }
    sqLiteConnection.Close();
}
The first thing is to ensure that you have an index on mhash. Then either:
- Group commands into batches.
- Use more than one thread.
Or:
- Bulk import the records to a temporary table, create an index on the mhash column, and perform a single UPDATE statement against it (see the sketch below).
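A minimal sketch of that last option, assuming System.Data.SQLite and the schema from the question (table, column and property names are taken from your code):

using (var conn = new SQLiteConnection("Data Source=" + _case.DatabasePath))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        using (var create = new SQLiteCommand(
            "CREATE TEMP TABLE pending (mhash TEXT PRIMARY KEY, areaPk INT, knownAreaPk INT)", conn, tx))
        {
            create.ExecuteNonQuery();
        }

        // Bulk-load the triples inside the one transaction.
        using (var insert = new SQLiteCommand(
            "INSERT INTO pending VALUES (@mhash, @areaPk, @knownAreaPk)", conn, tx))
        {
            insert.Parameters.Add(new SQLiteParameter("@mhash"));
            insert.Parameters.Add(new SQLiteParameter("@areaPk"));
            insert.Parameters.Add(new SQLiteParameter("@knownAreaPk"));
            foreach (CatItem item in _knownFiless)
            {
                insert.Parameters["@mhash"].Value = item.mhash;
                insert.Parameters["@areaPk"].Value = item.areaPk;
                insert.Parameters["@knownAreaPk"].Value =
                    item.FromMasterHashes ? (object)item.areaPk : DBNull.Value;
                insert.ExecuteNonQuery();
            }
        }

        // One set-based UPDATE instead of 500,000 single-row statements.
        using (var update = new SQLiteCommand(@"
            UPDATE CaseFiles SET
                areaPk      = (SELECT areaPk      FROM pending WHERE pending.mhash = CaseFiles.mhash),
                KnownareaPk = (SELECT knownAreaPk FROM pending WHERE pending.mhash = CaseFiles.mhash)
            WHERE mhash IN (SELECT mhash FROM pending)", conn, tx))
        {
            update.ExecuteNonQuery();
        }

        tx.Commit();
    }
}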
You need to wrap everything inside a transaction, otherwise I believe SQLite will create and commit one for you for every update, hence the slowness. You clearly know that, looking at your code, but I am not sure that issuing "begin" and "end" commands achieves the same result here; you might end up with empty transactions at the start and finish instead of one wrapping everything. Try something like this instead, just in case:
using (SQLiteTransaction mytransaction = myconnection.BeginTransaction())
{
    using (SQLiteCommand mycommand = new SQLiteCommand(myconnection))
    {
        SQLiteParameter myparam = new SQLiteParameter();
        mycommand.CommandText = "YOUR QUERY HERE";
        mycommand.Parameters.Add(myparam);
        foreach (CatItem CatItem in _knownFiless)
        {
            ...
            mycommand.ExecuteNonQuery();
        }
    }
    mytransaction.Commit();
}
This part is most certainly your problem.
foreach (CatItem CatItem in _knownFiless)
{
    ...
    sqLiteCommand.ExecuteNonQuery();
}
You are looping over a List(?) and executing a query against the database for each item. That is not a good way to do it, because database calls are quite expensive. So you should consider another way of updating these items.
The SQL code appears to be okay. The C# code is not wrong, but it has some redundancy (the explicit Close/Dispose calls are not needed, since you're already using using blocks).
There is a foreach loop over _knownFiless (is the double "s" intended?); could that be what runs slowly? It is unusual to run a query in a loop against the DB; rather, you should create one query with the respective set of parameters. Consider that (especially without an index on the hash) you will perform n * m operations (n being the iteration count of the loop, m being the table size).
Considering that m is around 500k, and assuming that m = n, you get 250,000,000,000 operations. That may well last an hour.
Former connections or operations should have no effect, as far as I know.
You should also check that the internal structure of the database is not causing problems. Is there a compound index that is affected by this operation? Any foreign keys or complex constraints?
Code like this:
var compIds = from p in packinglist.List
              select p.ComponentId;
var components = from c in context.Components
                 where compIds.Contains(c.Id)
                 select c;
foreach (var item in components)
{
    item.CurrentSiteId = packinglist.DestinationId;
}
context.SaveChanges();
Ends up issuing lots of SQL Statements like
update [dbo].[Components] set [CurrentSiteId] = @0 where ([Id] = @1)
Is there a way to instruct EF (Code First) to issue the following statement:
update [dbo].[Components] set [CurrentSiteId] = @0 where ([Id] in (....))
Or should I look into using one of the SqlQuery methods available, or a separate tool like Dapper or Massive or ...?
There is not currently a way to perform bulk updates in EF 4 out of the box. There are some very long, complicated workarounds that end up generating the SQL for you, though. I suggest using a stored procedure or raw T-SQL. Here's a quick T-SQL snippet that I've used in the past:
using (var context = new YourEntities())
{
    context.ExecuteStoreCommand(
        @"UPDATE Components SET CurrentSiteId = 1 WHERE ID IN(1,2,3,4)");
}
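Applied to your case, since the ids are integers (so composing the IN list in code is injection-safe here), a sketch like this should work with the same ObjectContext API; YourEntities and the @p0 parameter name follow the snippet above, the rest mirrors your LINQ query:

var ids = packinglist.List.Select(p => p.ComponentId);
using (var context = new YourEntities())
{
    // The site id travels as a real parameter; the integer id list is inlined.
    string sql = string.Format(
        "UPDATE [dbo].[Components] SET [CurrentSiteId] = @p0 WHERE [Id] IN ({0})",
        string.Join(",", ids));
    context.ExecuteStoreCommand(sql, new SqlParameter("@p0", packinglist.DestinationId));
}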
The simplest answer for this is just to write that query yourself and run it with one of the raw SQL methods (ExecuteStoreCommand above, or Database.ExecuteSqlCommand on a DbContext). As mentioned, there's no way to do this in EF itself.