How to fix super slow EF/LINQ query executing multiple SQL statements - c#

I have the following code, which is misbehaving:
TPM_USER user = UserManager.GetUser(context, UserId);

var tasks = (from t in user.TPM_TASK
             where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);
The first line, UserManager.GetUser, just does a simple lookup in the database to get the correct TPM_USER record. However, the second statement causes all sorts of SQL chaos.
First off, it executes two kinds of SQL statements here. The first grabs every single row in TPM_TASK that is linked to that user, which is sometimes tens of thousands of rows:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
WHERE "Extent1".USERID = :EntityKeyValue1
This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.
Next, it seems to execute a new query for each (PROJECTID, VERSIONID) pair from the list above:
SELECT
-- Columns
FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)
Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.
The query I would like to generate would look something like:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10
The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:
from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")
Is there any way to optimize this LINQ statement? I'm using .NET 4 and Oracle as the backend DB.
Solution:
This solution is based on Kirk's suggestion below. Since context.TPM_USERTASK cannot be queried directly (the join table isn't exposed as an entity set), the user filter is expressed with Any():
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
             where t.TPM_USER.Any(y => y.USERID == UserId) &&
                   t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);
It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient nonetheless.

Yes. You are pulling down a specific user and then navigating the TPM_TASK relationship. Pulling down every task attached to that user is exactly what it's supposed to do: there is no ORM-to-SQL translation of your filters when you write the query this way. You're getting a user, loading all of its tasks into memory, and then performing the filtering client-side. This all happens via lazy loading, so the SQL is exceptionally inefficient because nothing can be batched up.
Instead, rewrite your query to go directly against TPM_TASK and filter against the user:
var tasks = (from t in context.TPM_TASK
             where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);
Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.
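As a sanity check, you can ask EF for the SQL it will run before executing anything. A minimal sketch, assuming an EF 4 ObjectContext model, where LINQ to Entities queries are ObjectQuery instances under the hood:

var query = from t in context.TPM_TASK
            where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
            orderby t.DUEDATE, t.PROJECTID
            select t;

// ToTraceString() returns the generated SQL without executing the query,
// so you can confirm the STAGEID filters landed in the WHERE clause.
var sql = ((System.Data.Objects.ObjectQuery)query).ToTraceString();
Console.WriteLine(sql);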

Related

Why does a query searching through 1 million related records take considerable time, even after applying indexes?

var recordsStates = from s in db.Record
                    join ss in db.Record_State
                        on s.Id equals ss.RecordId
                    join apct in db.RecordTypesContentMapping
                        on new { t1 = ss.RecordId } equals new { t1 = apct.ContentId } into ss_apct
                    from apct in ss_apct.DefaultIfEmpty()
                    where (apct.ContentSourceName == null || apct.ContentSourceName == nameof(Record))
                          && ss.isCurrent && ss.StateId == (int)RecordStateType.Publish
                          && !ss.isDeleated && !s.isDeleted
                          && (searchRecords.CategoryIds.Count == 0 || searchRecords.CategoryIds.Contains((int)s.CategoryId))
                          && (string.IsNullOrEmpty(searchRecords.RecordTitle) || (string.IsNullOrEmpty(s.RecordNameModified)
                              ? s.RecordName.Contains(searchRecords.RecordTitle)
                              : s.RecordNameModified.Contains(searchRecords.RecordTitle)))
                    select s;
Each table has around 1 million records.
It takes around 7-8 seconds if I send RecordTitle empty, and 4-5 seconds if not.
I tried applying nonclustered indexes on the title column and so on. It's of type nvarchar(1000).
Every table is related via foreign keys. I don't know what makes it slow.
The problem is probably Cartesian explosion.
You can either improve the query itself (as you should) or use a split query.
Note that a split query can cause consistency problems: if the data changes between the first and second SQL statements (say, from another request), the second query will continue against stale data.
Split query usage msdn
Cartesian explosion
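For reference, a minimal sketch of opting in to split queries, assuming EF Core 5 or later (where AsSplitQuery() is available) and a hypothetical States navigation property on Record:

// Hypothetical EF Core 5+ model: Record with a States collection.
// AsSplitQuery() loads the included collection with a separate SQL statement
// instead of one big JOIN, which avoids the Cartesian explosion at the cost
// of an extra round trip (and the consistency caveat described above).
var records = db.Record
    .Include(r => r.States)
    .AsSplitQuery()
    .Where(r => !r.isDeleted)
    .ToList();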

How to improve AsEnumerable performance in EF

I'm working in VS2012 with Entity Framework.
I have a one-to-many table mapping in my EDMX.
var query = (from bm in this.Context.BilBillMasters.AsEnumerable()
             join g in
                 (from c in this.Context.BilBillDetails.AsEnumerable()
                  group c by new { c.BillID })
                 on bm.BillID equals (g == null ? 0 : g.Key.BillID) into bDG
             from billDetailGroup in bDG.DefaultIfEmpty()
             where bm.IsDeleted == false
                   && (companyID == 0 || bm.CompanyID == companyID)
                   && (userID == 0 || bm.CustomerID == userID)
             select new
             {
                 bm.BillID,
                 BillNo = bm.CustomCode,
                 bm.BillDate,
                 BillMonth = bm.MonthFrom,
                 TransactionTypeID = bm.TransactionTypeID ?? 0,
                 CustomerID = bm.CustomerID,
                 Total = billDetailGroup.Sum(p => p.Amount), // group result
                 bm.ReferenceID,
                 bm.ReferenceTypeID
             });
This method takes close to 30 seconds to return the result on the first run.
I'm not sure what is wrong.
I tried materializing the results as a List and calling ElementAt(0); that is also slow.
As soon as you use AsEnumerable, your query stops being an IQueryable. That means you're downloading the whole BilBillMasters and BilBillDetails tables and then doing the processing in your application, rather than on the SQL server. This is bound to be slow.
The solution is obvious: don't use AsEnumerable. It moves processing from the SQL server (which has all the data, the indexes, etc.) to your application server (which has neither, and first has to pull all of the data over from the DB server).
At the very least, you want to limit the amount of data downloaded as much as possible, e.g. filter the tables by CompanyID and CustomerID before calling AsEnumerable. Overall, though, I see no reason why the query couldn't be executed completely on the SQL server; that is usually the preferred solution for many reasons.
It sounds as if you're using AsEnumerable as a fix for another problem, but it's almost certainly a bad solution, at least without further filtering of the data before using it.
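A minimal sketch of keeping the whole thing in LINQ to Entities, assuming BilBillDetails has a BillID foreign key and Amount is a decimal column (both assumptions, since the model isn't shown):

// No AsEnumerable(): the whole statement below is translated to SQL and
// executed by the database, so only the final result rows cross the network.
var query = from bm in this.Context.BilBillMasters
            where bm.IsDeleted == false
                  && (companyID == 0 || bm.CompanyID == companyID)
                  && (userID == 0 || bm.CustomerID == userID)
            select new
            {
                bm.BillID,
                BillNo = bm.CustomCode,
                bm.BillDate,
                BillMonth = bm.MonthFrom,
                TransactionTypeID = bm.TransactionTypeID ?? 0,
                CustomerID = bm.CustomerID,
                // Correlated subquery; EF translates this to a SUM over the details table.
                // The nullable cast keeps SUM over an empty group from throwing.
                Total = this.Context.BilBillDetails
                            .Where(d => d.BillID == bm.BillID)
                            .Sum(d => (decimal?)d.Amount) ?? 0,
                bm.ReferenceID,
                bm.ReferenceTypeID
            };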

LINQ Query takes too long

I am running this query :
List<RerocdByGeo> reports = (from p in db.SOMETABLE.ToList()
                             where p.colID == prog && p.status != "some_string" && p.col_date < enddate && p.col_date > startdate
                             group p by new
                             {
                                 country = (p.some_integer_that_represents_an_id <= 0
                                     ? "unknown"
                                     : (from f in db.A_LARGE_TABLE
                                        where f.ID == p.some_integer_that_represents_an_id
                                        select f.COUNTRIES_TABLE.COU_Name).FirstOrDefault()),
                                 p.status
                             }
                             into g
                             select new TransRerocdByGeo
                             {
                                 ColA = g.Sum(x => x.ColA),
                                 ColB = g.Sum(x => x.ColB),
                                 Percentage = (g.Sum(x => x.ColA) != null && g.Sum(x => x.ColA) != 0)
                                     ? (g.Sum(x => x.ColB) / g.Sum(x => x.ColA)) * 100
                                     : 0,
                                 Status = g.Key.status,
                                 Country = g.Key.country
                             }).ToList();
A similar query in SQL against the same database runs in a few seconds, while this one takes 30-60 seconds in the good case.
The table SOMETABLE contains about 10-60K rows,
and the table called here A_LARGE_TABLE contains about 10-20 million rows.
The column some_integer_that_represents_an_id is the ID on the large table, but it can also be 0 or -1 and then needs to get the "unknown" value, so I cannot make a relationship (or can I? If so, please explain).
The COUNTRIES_TABLE contains 100-200 rows.
The colID and ID columns are identity columns.
Any suggestions?
You're calling ToList on SOMETABLE right at the start. This pulls the entire database table, with all rows and all columns, into memory, and all subsequent operations are then performed via LINQ-to-Objects on that in-memory data structure.
Not only do you suffer the penalty of transferring far more information across the network than you need (which is slow), but C# can't perform the operations nearly as efficiently as a database. That's partly because it loses access to any indexes, database caching, and cached compiled queries, it isn't as efficient at dealing with data sets that large to begin with, and it misses the higher-level optimizations of the query itself (databases tend to do a lot of that).
Next, you have a query inside your GroupBy clause, from f in db.A_LARGE_TABLE where [...], that is performed for every row in the sequence. If the entire query were evaluated at the database level, that could be optimized; and even if it weren't, you at least wouldn't be making a network round trip (which is quite slow) for each record.
from p in db.SOMETABLE.ToList()
This basically says "get every record from SOMETABLE and put it in a List", without any filtering at all. This is probably your problem.
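A minimal sketch of the same query kept entirely in the database, using the placeholder names from the question; the per-row subquery becomes a group join (left join), and IDs of 0 or -1 simply find no match and fall into the "unknown" bucket:

// No ToList() on the source: filtering, the country lookup, and the grouping
// are all translated into a single SQL statement.
var reports = (from p in db.SOMETABLE
               where p.colID == prog && p.status != "some_string"
                     && p.col_date < enddate && p.col_date > startdate
               join f in db.A_LARGE_TABLE
                   on p.some_integer_that_represents_an_id equals f.ID into pf
               from f in pf.DefaultIfEmpty() // left join: unmatched IDs yield null
               group new { p, f } by new
               {
                   country = f == null ? "unknown" : f.COUNTRIES_TABLE.COU_Name,
                   p.status
               }
               into g
               select new TransRerocdByGeo
               {
                   ColA = g.Sum(x => x.p.ColA),
                   ColB = g.Sum(x => x.p.ColB),
                   // Percentage computed from the two sums exactly as before (omitted for brevity).
                   Status = g.Key.status,
                   Country = g.Key.country
               }).ToList();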

How can I fix the performance issue of this linq query?

The C# code below executes in 3 seconds; I've listed the SQL Profiler output as well. If I change the statement to not use dynamic SQL, it executes in milliseconds. I can't find any good resources that give a solution to this problem, but I did find an article explaining that with dynamic SQL, since the parser doesn't know the values of the parameters, it cannot optimize the query plan.
public string GetIncorporation(Parcel parcel)
{
    var result = (from c in _context.Districts
                  where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
                  select c).ToList();
exec sp_executesql N'SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        MAX([Filter1].[A1]) AS [A1]
        FROM ( SELECT
            SUBSTRING([Extent1].[DISTRICT_CD], 0 + 1, 2) + N''00'' AS [A1]
            FROM [STAGE].[DISTRICT] AS [Extent1]
            WHERE ([Extent1].[PARCEL_ID] = @p__linq__0) AND ([Extent1].[DB_YEAR] = @p__linq__1) AND ([Extent1].[DISTRICT_CD] < N''9000'')
        ) AS [Filter1]
    ) AS [GroupBy1]',N'@p__linq__0 nvarchar(4000),@p__linq__1 int',@p__linq__0=N'0001-02-0003',@p__linq__1=2012
I'm trying to build a service layer; I don't want a mixed batch of stored procedures and LINQ queries.
Did you paste that query into SSMS, run the execution plan, and see if it suggests any missing indexes?
Also, if you don't need all the columns from the table, limit them by using a select:
var result = (from c in _context.Districts
              where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
              select c.Parcel_ID).ToList();
or
var result = (from c in _context.Districts
              where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
              select new { c.Parcel_ID, c.column2, c.column3 }).ToList();
The LINQ looks fine; have you got the correct indexes?
In the query from SSMS you've pasted, it's not doing any limiting on DISTRICT_CD, so make sure that is actually the query that is running.
Your performance problem is in the 'CompareTo' part. This function cannot be translated to regular SQL, so Entity Framework will first materialize all objects matching the first two conditions (fetched with pure SQL). After that (which takes some time, as you can see), the third condition is matched in memory. Avoid the CompareTo method in your LINQ query and your problems will go away.
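For what it's worth, a minimal sketch of the same range filter written with string.Compare, another construct LINQ to Entities knows how to translate to a plain < comparison; whether this actually changes the plan on your server is something to confirm in Profiler:

// string.Compare(a, b) < 0 is translated to [DISTRICT_CD] < N'9000'
// in the generated SQL, so the filtering stays in the database.
var result = (from c in _context.Districts
              where c.PARCEL_ID == parcel.PARCEL_ID
                    && c.DB_YEAR == parcel.DB_YEAR
                    && string.Compare(c.DISTRICT_CD, "9000") < 0
              select c).ToList();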

Entity-Framework multiple table query

My code takes about 3 seconds to execute for 60 employees, which is horrible performance; I would like it to run in about 0.5 seconds max. I have a method that requires 5 tables from my database. Since you can only .Include("AdjacentTable") in your queries, I have to make 3 queries, take their results, and add them to my Employee.
var feuilleDeTemps = from fdt in context.FT.Include("FTJ")
                     where (fdt.ID_Employe == employe.ID_Employe) &&
                           (fdt.DateDepart <= date) &&
                           (fdt.DateFin >= date)
                     select fdt;

var horaireEmploye = from h in context.HR
                     where h.ID_Employe == employe.ID_Employe
                     select h;

var congeCedule = from cc in context.CC.Include("C")
                  where (cc.ID_Employe == employe.ID_Employe &&
                         cc.Date <= dateFin &&
                         cc.Date >= dateDebut)
                  select cc;

Employe.FeuilleDeTemps = feuilleDeTemps;
Employe.horaireEmploye = horaireEmploye;
Employe.congeCedule = congeCedule;
return Employe;
It takes about 0.7 seconds for 60 executions of the 3 queries above, and my database doesn't have many rows. One set of these 3 queries returns 1 FT, 7 FTJ, 5 HR, 0-5 CC, and 0-5 C. There are about 300 rows in FT, 1.5k rows in FTJ, 500 rows in HR, 500 rows in CC, and 500 rows in C.
Of course these aren't the real names, but I shortened them for clearer text.
I used DateTime.Now and TimeSpans to measure the time of each query. If I run the 3 queries directly on SQL Server, they take about 300 milliseconds.
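As an aside, a minimal sketch of timing with Stopwatch, which is more precise than DateTime.Now for short intervals; note the timer has to bracket the call that actually executes the query (ToList here), since LINQ queries are deferred:

var sw = System.Diagnostics.Stopwatch.StartNew();

// ToList() forces execution; before this call the query hasn't hit the database.
var feuilleDeTempsList = feuilleDeTemps.ToList();

sw.Stop();
Console.WriteLine("FT query: {0} ms", sw.ElapsedMilliseconds);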
Here are my SQL queries:
Select e.ID_Employe, ft.*, ftj.* FROM Employe e
INNER JOIN FeuilleDeTemps ft
ON e.ID_Employe = ft.ID_Employe
INNER JOIN FeuilleDeTempsJournee ftj
ON ft.ID_FeuilleDeTemps = ftj.ID_FeuilleDeTemps
WHERE ft.DateDepart >= '2011-09-25 00:00:00.000' AND ft.DateFin <= '2011-10-01 23:59:59.000'
Select e.ID_Employe, hr.* FROM Employe e
INNER JOIN HoraireFixeEmployeParJour hr
ON hr.ID_Employe = e.ID_Employe
Select e.ID_Employe, cc.* FROM Employe e
INNER JOIN CongeCedule cc
ON cc.ID_Employe = e.ID_Employe
INNER JOIN Conge c
ON c.ID_Conge = cc.ID_Conge
We use WCF, Entity Framework and LINQ
Why is this taking so much time on Entity Framework and how can I improve it?
A bunch of questions with no answers:
Are you sure you need all of the fields you're selecting to do the work you want? Are there any children you could lazy load to reduce the number of up-front queries?
What happens if you run this code several times during a session? Does it get faster over time? If so, you may want to consider changing some of your queries to use a compiled query so that EF doesn't need to repeatedly parse the expression tree into TSQL each time (note: with 4.2, this will be done for you automatically).
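A minimal sketch of such a compiled query, assuming an EF 4 ObjectContext; the context type name MyEntities is made up, and the entity names follow the question's shortened ones:

using System.Data.Objects;

// Compiled once; subsequent calls reuse the translated TSQL instead of
// re-parsing the expression tree.
static readonly Func<MyEntities, int, IQueryable<HR>> HoraireByEmploye =
    CompiledQuery.Compile((MyEntities context, int idEmploye) =>
        from h in context.HR
        where h.ID_Employe == idEmploye
        select h);

// Usage: var horaireEmploye = HoraireByEmploye(context, employe.ID_Employe);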
I assume you have profiled your application to make sure no queries you aren't expecting are being run. Also, I expect you have run the profiler trace through the query analyzer to make sure the appropriate indexes exist on your tables.
