LINQ query takes too long - C#

I am running this query:
List<RerocdByGeo> reports =
    (from p in db.SOMETABLE.ToList()
     where p.colID == prog
           && p.status != "some_string"
           && p.col_date < enddate
           && p.col_date > startdate
     group p by new
     {
         country = (p.some_integer_that_represents_an_id <= 0)
             ? "unknown"
             : (from f in db.A_LARGE_TABLE
                where f.ID == p.some_integer_that_represents_an_id
                select f.COUNTRIES_TABLE.COU_Name).FirstOrDefault(),
         p.status
     }
     into g
     select new TransRerocdByGeo
     {
         ColA = g.Sum(x => x.ColA),
         ColB = g.Sum(x => x.ColB),
         Percentage = (g.Sum(x => x.ColA) != null && g.Sum(x => x.ColA) != 0)
             ? (g.Sum(x => x.ColB) / g.Sum(x => x.ColA)) * 100
             : 0,
         Status = g.Key.status,
         Country = g.Key.country
     }).ToList();
A similar SQL query against the same database runs in a few seconds, while this one takes about 30-60 seconds in the good case...
The table SOMETABLE contains about 10-60K rows,
and the table called here A_LARGE_TABLE contains about 10-20 million rows.
The column some_integer_that_represents_an_id is the ID on the large table, but it can also be 0 or -1 and then needs to get the "unknown" value, so I cannot make a relationship (or can I? If so, please explain).
The COUNTRIES_TABLE contains 100-200 rows.
The colID and ID columns are identity columns.
Any suggestions?

You're calling ToList on SOMETABLE right at the start. This pulls the entire database table, with all rows and all columns, into memory and then performs all of the subsequent operations via LINQ-to-Objects on that in-memory data structure.
Not only do you pay the penalty of transferring far more data across the network than you need (which is slow), but C# can't perform the operations nearly as efficiently as a database. That's partly because it loses access to any indexes, any database caching, any cached compiled queries, and any higher-level optimizations of the query itself (databases tend to do a lot of that), and it isn't as efficient at dealing with data sets that large to begin with.
Next, you have a query inside of your GroupBy clause, from f in db.A_LARGE_TABLE where [...], that is performed for every row in the sequence. If the entire query were evaluated at the database level, that subquery could potentially be optimized, and even if it weren't, you wouldn't be making a round trip across the network (which is quite slow) for each record.

from p in db.SOMETABLE.ToList()
This basically says "get every record from SOMETABLE and put it in a List", without any filtering at all. This is probably your problem.
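Something along these lines keeps everything in the database (a sketch only, not tested against your model; entity names are taken from the question, and whether the conditional grouping key translates depends on your provider). Dropping ToList() keeps the query as an IQueryable, and replacing the per-row subquery with a group join, i.e. a left join, also answers the "can I make a relationship?" question: IDs of 0 or -1 simply find no match and fall through to "unknown". Percentage is omitted here; it can be computed from ColA/ColB after materialization:
var reports =
    (from p in db.SOMETABLE                          // no ToList(): stays IQueryable, runs as SQL
     where p.colID == prog
           && p.status != "some_string"
           && p.col_date < enddate
           && p.col_date > startdate
     join f in db.A_LARGE_TABLE
         on p.some_integer_that_represents_an_id equals f.ID into fj
     from f in fj.DefaultIfEmpty()                   // left join: 0 / -1 match nothing
     group new { p, f } by new
     {
         country = f == null ? "unknown" : f.COUNTRIES_TABLE.COU_Name,
         p.status
     }
     into g
     select new TransRerocdByGeo
     {
         ColA = g.Sum(x => x.p.ColA),
         ColB = g.Sum(x => x.p.ColB),
         Status = g.Key.status,
         Country = g.Key.country
     }).ToList();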

Related

Why does a query searching through 1 million related records take considerable time even after applying indexes?

var recordsStates =
    from s in db.Record
    join ss in db.Record_State
        on s.Id equals ss.RecordId
    join apct in db.RecordTypesContentMapping
        on new { t1 = ss.RecordId } equals new { t1 = apct.ContentId } into ss_apct
    from apct in ss_apct.DefaultIfEmpty()
    where (apct.ContentSourceName == null || apct.ContentSourceName == nameof(Record))
          && ss.isCurrent && ss.StateId == (int)RecordStateType.Publish
          && !ss.isDeleated && !s.isDeleted
          && (searchRecords.CategoryIds.Count == 0 || searchRecords.CategoryIds.Contains((int)s.CategoryId))
          && (string.IsNullOrEmpty(searchRecords.RecordTitle)
              || (string.IsNullOrEmpty(s.RecordNameModified)
                  ? s.RecordName.Contains(searchRecords.RecordTitle)
                  : s.RecordNameModified.Contains(searchRecords.RecordTitle)))
    select s; // select clause truncated in the original snippet
Each table has around 1 million records.
It takes around 7-8 seconds if I send RecordTitle empty, and 4-5 seconds if not empty.
I tried applying nonclustered indexes on the title and so on. It's of type nvarchar(1000).
Every table is related via foreign keys. I don't know what makes it slow.
The problem is probably a Cartesian explosion.
You can either improve how the LINQ translates to SQL (as you should) or use a split query.
Note that a split query can cause dirty-data problems: if the data is changed by another request after the first SQL query completes but before the second one starts, the second query will continue with stale data.
Split query usage (MSDN)
Cartesian explosion
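For illustration, opting in to split-query behavior is a one-liner in EF Core 5+. A minimal sketch, assuming Record_State is a collection navigation on Record (the real model may differ, and the joins in the question would need to be restructured around navigations for this to apply):
using Microsoft.EntityFrameworkCore;

var records = db.Record
    .Include(r => r.Record_State)  // each included collection gets its own SQL statement
    .AsSplitQuery()
    .Where(r => !r.isDeleted)
    .ToList();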

How to improve AsEnumerable performance in EF

I work with EF in VS2012.
I have a 1-to-many mapping table structure in my EDMX.
var query = (
    from bm in this.Context.BilBillMasters.AsEnumerable()
    join g in
        (
            from c in this.Context.BilBillDetails.AsEnumerable()
            group c by new { c.BillID }
        )
        on bm.BillID equals (g == null ? 0 : g.Key.BillID) into bDG
    from billDetailGroup in bDG.DefaultIfEmpty()
    where bm.IsDeleted == false
          && (companyID == 0 || bm.CompanyID == companyID)
          && (userID == 0 || bm.CustomerID == userID)
    select new
    {
        bm.BillID,
        BillNo = bm.CustomCode,
        bm.BillDate,
        BillMonth = bm.MonthFrom,
        TransactionTypeID = bm.TransactionTypeID ?? 0,
        CustomerID = bm.CustomerID,
        Total = billDetailGroup.Sum(p => p.Amount), // group result
        bm.ReferenceID,
        bm.ReferenceTypeID
    }
);
This method takes close to 30 seconds to return the result on the first run.
Not sure what is wrong.
I tried getting a List of the results and calling ElementAt(0); that is also slow.
As soon as you use AsEnumerable, your query stops being a queryable. That means you're downloading the whole BilBillMasters and BilBillDetails tables and then doing the processing in your application, rather than on the SQL server. This is bound to be slow.
The solution is obvious: don't use AsEnumerable. It basically moves processing from the SQL server (which has all the data, indexes, etc.) to your application server (which has neither, and has to get the data from the DB server; all of the data).
At the very least, you want to limit the amount of data downloaded as much as possible, for example by filtering the tables by CompanyID and CustomerID before using AsEnumerable. However, overall, I see no reason why the query couldn't be executed completely on the SQL server; that is usually the preferred solution for many reasons.
Overall, it sounds as if you're using AsEnumerable as a fix for another problem, but it's almost certainly a bad solution, at least without further filtering of the data before using AsEnumerable.
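Something along these lines should execute as a single SQL statement (a sketch only, assuming the entity names above and that Amount is decimal; adjust the cast to your actual column type). The group join is replaced with a correlated aggregate that EF translates into the query:
var query =
    from bm in this.Context.BilBillMasters       // no AsEnumerable(): stays IQueryable
    where bm.IsDeleted == false
          && (companyID == 0 || bm.CompanyID == companyID)
          && (userID == 0 || bm.CustomerID == userID)
    select new
    {
        bm.BillID,
        BillNo = bm.CustomCode,
        bm.BillDate,
        BillMonth = bm.MonthFrom,
        TransactionTypeID = bm.TransactionTypeID ?? 0,
        bm.CustomerID,
        // correlated aggregate, computed by the database; the nullable cast
        // keeps an empty detail set from throwing (SUM over no rows is NULL)
        Total = this.Context.BilBillDetails
                    .Where(d => d.BillID == bm.BillID)
                    .Sum(d => (decimal?)d.Amount) ?? 0,
        bm.ReferenceID,
        bm.ReferenceTypeID
    };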

LINQ to Entities paging with large dataset too slow

I'm analyzing player data over millions of matches from an online game. I'm trying to page data into memory in chunks to reduce load times, but using OrderBy with Skip/Take takes far too long (20+ minutes even for smaller queries).
This is my query:
var playerMatches = (from p in context.PlayerMatchEntities
                     join m in context.MatchEntities
                         on p.MatchId equals m.MatchId
                     where m.GameMode == (byte)gameMode
                           && m.LobbyType == (byte)lobbyType
                     select p)
                    .OrderBy(p => p.MatchId)
                    .Skip((page - 1) * pageSize)
                    .Take(pageSize)
                    .ToList();
MatchId is indexed.
Each match has 10 players, and I currently have 3.3 million matches with 33 million rows in the PlayerMatch table, but data is being collected constantly.
Is there a way to get around the large performance drop caused by OrderBy?
This post is similar but didn't seem to be resolved.
Edit:
This is the SQL query generated:
SELECT
`Project1`.`AccountId`,
`Project1`.`MatchId`,
`Project1`.`PlayerSlot`,
`Project1`.`HeroId`,
`Project1`.`Item_0`,
`Project1`.`Item_1`,
`Project1`.`Item_2`,
`Project1`.`Item_3`,
`Project1`.`Item_4`,
`Project1`.`Item_5`,
`Project1`.`Kills`,
`Project1`.`Deaths`,
`Project1`.`Assists`,
`Project1`.`LeaverStatus`,
`Project1`.`Gold`,
`Project1`.`GoldSpent`,
`Project1`.`LastHits`,
`Project1`.`Denies`,
`Project1`.`GoldPerMin`,
`Project1`.`XpPerMin`,
`Project1`.`Level`,
`Project1`.`HeroDamage`,
`Project1`.`TowerDamage`,
`Project1`.`HeroHealing`
FROM (SELECT
`Extent2`.`AccountId`,
`Extent2`.`MatchId`,
`Extent2`.`PlayerSlot`,
`Extent2`.`HeroId`,
`Extent2`.`Item_0`,
`Extent2`.`Item_1`,
`Extent2`.`Item_2`,
`Extent2`.`Item_3`,
`Extent2`.`Item_4`,
`Extent2`.`Item_5`,
`Extent2`.`Kills`,
`Extent2`.`Deaths`,
`Extent2`.`Assists`,
`Extent2`.`LeaverStatus`,
`Extent2`.`Gold`,
`Extent2`.`GoldSpent`,
`Extent2`.`LastHits`,
`Extent2`.`Denies`,
`Extent2`.`GoldPerMin`,
`Extent2`.`XpPerMin`,
`Extent2`.`Level`,
`Extent2`.`HeroDamage`,
`Extent2`.`TowerDamage`,
`Extent2`.`HeroHealing`
FROM `match` AS `Extent1` INNER JOIN `playermatch` AS `Extent2` ON `Extent1`.`MatchId` = `Extent2`.`MatchId`
WHERE ((`Extent1`.`GameMode`) = 2) AND ((`Extent1`.`LobbyType`) = 7)) AS `Project1`
ORDER BY
`Project1`.`MatchId` ASC LIMIT 0,1000
Another approach could be to have a VIEW that does the join and indexes the appropriate columns, and then create a table-valued function that uses the VIEW and returns a table with only the page data.
You'll have to manually write the SQL query for the paging, but I think it would be faster.
I haven't tried something like that, so I can't be sure there will be a big speed boost.
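For what it's worth, calling such hand-written paging SQL from EF could look like the sketch below. This assumes EF 5/6's Database.SqlQuery and a SQL Server backend (the VIEW/TVF terminology is SQL Server's, although the generated SQL above looks MySQL-flavored); dbo.GetPlayerMatchPage is a made-up function name:
using System.Data.SqlClient;

// hypothetical TVF built on the view; returns exactly one page of rows
var pageData = context.Database
    .SqlQuery<PlayerMatchEntity>(
        "SELECT * FROM dbo.GetPlayerMatchPage(@gameMode, @lobbyType, @skip, @take)",
        new SqlParameter("@gameMode", (byte)gameMode),
        new SqlParameter("@lobbyType", (byte)lobbyType),
        new SqlParameter("@skip", (page - 1) * pageSize),
        new SqlParameter("@take", pageSize))
    .ToList();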
You didn't include enough information for a definitive answer, so I'll make a suggestion.
One way to avoid the ORDER BY is to store rows in the table already in that order. I assume MatchId is the primary key and the clustered index of MatchEntities, which means MatchEntities.MatchId is stored physically sorted. If you switch the join streams, pulling the sorted stream first and the joined stream second, you avoid an expensive sort.
Like this:
var playerMatches = (from m in context.MatchEntities // note the switch: MatchEntities goes first
                     join p in context.PlayerMatchEntities
                         on m.MatchId equals p.MatchId // outer key must come first to compile
                     where m.GameMode == (byte)gameMode
                           && m.LobbyType == (byte)lobbyType
                     select p)
                    // .OrderBy(p => p.MatchId) // no need for this any more
                    .Skip((page - 1) * pageSize)
                    .Take(pageSize)
                    .ToList();
Also look at the query plan to find out how the query is executed by the database, what type of join is used, and so on. Maybe your original query does not exploit the stored sort order at all.

How to fix super slow EF/LINQ query executing multiple SQL statements

I have the following code, which is misbehaving:
TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
The first line, UserManager.GetUser just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.
First off, it's executing two SQL statements here. The first one grabs every single row in TPM_TASK which is linked to that user, which is sometimes tens of thousands of rows:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
WHERE "Extent1".USERID = :EntityKeyValue1
This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.
Next, it seems to execute a new query for each TPM_PROJECTVERSION pair in the list above:
SELECT
-- Columns
FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)
Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.
The query I would like to generate would look something like:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10
The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:
from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")
Is there any way to optimize this LINQ statement? I'm using .NET4 and Oracle as the backend DB.
Solution:
This solution is based on Kirk's suggestions below, and works since context.TPM_USERTASK cannot be queried directly:
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
where t.TPM_USER.Any(y => y.USERID == UserId) &&
t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient nonetheless.
Yes. You are pulling down a specific user and then referencing the relationship TPM_TASK; pulling down every task attached to that user is exactly what that is supposed to do. There's no ORM-to-SQL translation of your filters when you do it this way: you're getting a user, then getting all his tasks into memory, and then performing the filtering client-side. This all happens through lazy loading, so the SQL is exceptionally inefficient, since nothing can be batched up.
Instead, rewrite your query to go directly against TPM_TASK and filter against the user:
var tasks = (from t in context.TPM_TASK
where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.

Entity-Framework multiple table query

My code takes about 3 seconds to execute for 60 employees, which is horrible performance; I would like it to run in about 0.5 seconds at most. I have a method that requires 5 tables from my database. Since you can only .Include("AdjacentTable") in your queries, I have to make 3 queries, take their results, and add them to my Employee.
var feuilleDeTemps = from fdt in context.FT.Include("FTJ")
                     where fdt.ID_Employe == employe.ID_Employe
                           && fdt.DateDepart <= date
                           && fdt.DateFin >= date
                     select fdt;
var horaireEmploye = from h in context.HR
                     where h.ID_Employe == employe.ID_Employe
                     select h;
var congeCedule = from cc in context.CC.Include("C")
                  where cc.ID_Employe == employe.ID_Employe
                        && cc.Date <= dateFin
                        && cc.Date >= dateDebut
                  select cc;
Employe.FeuilleDeTemps = feuilleDeTemps;
Employe.horaireEmploye = horaireEmploye;
Employe.congeCedule = congeCedule;
return Employe;
It takes about 0.7 seconds for 60 executions of the 3 queries above, and my database doesn't have a lot of rows. For one set of these 3 queries I return 1 FT, 7 FTJ, 5 HR, 0-5 CC and 0-5 C. There are about 300 rows in FT, 1.5k rows in FTJ, 500 rows in HR, 500 rows in CC and 500 rows in C.
Of course these aren't the real names; I made them shorter for clarity.
I used DateTime.Now and TimeSpans to measure the time of each query. If I run the 3 queries directly on SQL Server, they take about 300 milliseconds.
Here are my SQL queries:
Select e.ID_Employe, ft.*, ftj.* FROM Employe e
INNER JOIN FeuilleDeTemps ft
ON e.ID_Employe = ft.ID_Employe
INNER JOIN FeuilleDeTempsJournee ftj
ON ft.ID_FeuilleDeTemps = ftj.ID_FeuilleDeTemps
WHERE ft.DateDepart >= '2011-09-25 00:00:00.000' AND ft.DateFin <= '2011-10-01 23:59:59.000'
Select e.ID_Employe, hr.* FROM Employe e
INNER JOIN HoraireFixeEmployeParJour hr
ON hr.ID_Employe = e.ID_Employe
Select e.ID_Employe, cc.* FROM Employe e
INNER JOIN CongeCedule cc
ON cc.ID_Employe = e.ID_Employe
INNER JOIN Conge c
ON c.ID_Conge = cc.ID_Conge
We use WCF, Entity Framework and LINQ
Why does this take so much time with Entity Framework, and how can I improve it?
A bunch of open questions to consider:
Are you sure you need all of the fields you are selecting to do the work you want? Are there any children you could lazy-load to reduce the number of up-front queries?
What happens if you run this code several times during a session? Does it get faster over time? If so, you may want to consider changing some of your queries to use a compiled query so that EF doesn't need to repeatedly parse the expression tree into T-SQL each time (note: with 4.2, this will be done for you automatically); see the sketch below.
I assume you have profiled your application to make sure that no unexpected queries are being run. Also, I expect that you have run the profiler trace through the query analyzer to make sure the appropriate indexes exist on your tables.
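For reference, a compiled query looks roughly like this. A minimal sketch, assuming an ObjectContext-derived container (CompiledQuery.Compile does not accept a DbContext) and the HR/ID_Employe names from the queries above; MyEntities is a made-up context name:
using System;
using System.Data.Objects;
using System.Linq;

static class EmployeQueries
{
    // Compiled once; the expression-tree-to-SQL translation is cached
    // and reused on every invocation instead of being rebuilt per call.
    public static readonly Func<MyEntities, int, IQueryable<HR>> HoraireParEmploye =
        CompiledQuery.Compile((MyEntities ctx, int idEmploye) =>
            from h in ctx.HR
            where h.ID_Employe == idEmploye
            select h);
}

// Usage:
// var horaires = EmployeQueries.HoraireParEmploye(context, employe.ID_Employe).ToList();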
