Entity-Framework multiple table query - c#

My code takes about 3 seconds to execute for 60 employees, which is horrible performance; I would like it to run in about 0.5 seconds at most. I have a method that requires 5 tables from my database. Since you can only .Include("AdjacentTable") in your queries, I have to make 3 queries, take their results, and add them to my Employee.
var feuilleDeTemps = from fdt in context.FT.Include("FTJ")
                     where fdt.ID_Employe == employe.ID_Employe
                           && fdt.DateDepart <= date
                           && fdt.DateFin >= date
                     select fdt;
var horaireEmploye = from h in context.HR
                     where h.ID_Employe == employe.ID_Employe
                     select h;
var congeCedule = from cc in context.CC.Include("C")
                  where cc.ID_Employe == employe.ID_Employe
                        && cc.Date <= dateFin
                        && cc.Date >= dateDebut
                  select cc;
Employe.FeuilleDeTemps = feuilleDeTemps;
Employe.horaireEmploye = horaireEmploye;
Employe.congeCedule = congeCedule;
return Employe;
It takes about 0.7 seconds per 60 executions of the 3 queries above, and my database doesn't have a lot of rows. A set of these 3 queries returns 1 FT, 7 FTJ, 5 HR, 0-5 CC, and 0-5 C. There are about 300 rows in FT, 1.5k rows in FTJ, 500 rows in HR, 500 rows in CC, and 500 rows in C.
Of course these aren't the real names, but I shortened them for readability.
I used DateTime.Now and TimeSpans to determine the time of each query. If I run the 3 queries directly on SQL Server they take about 300 milliseconds.
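These queries are deferred: nothing hits the database until the results are enumerated, so the timing has to wrap the actual enumeration. A minimal measurement sketch, assuming the results get forced with ToList():
// Stopwatch is more precise than DateTime.Now deltas, and ToList()
// forces the deferred query to execute before the timer stops.
var sw = System.Diagnostics.Stopwatch.StartNew();
var fdtRows = feuilleDeTemps.ToList();
sw.Stop();
Console.WriteLine("FT query: " + sw.ElapsedMilliseconds + " ms");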
Here are my SQL queries:
Select e.ID_Employe, ft.*, ftj.* FROM Employe e
INNER JOIN FeuilleDeTemps ft
ON e.ID_Employe = ft.ID_Employe
INNER JOIN FeuilleDeTempsJournee ftj
ON ft.ID_FeuilleDeTemps = ftj.ID_FeuilleDeTemps
WHERE ft.DateDepart >= '2011-09-25 00:00:00.000' AND ft.DateFin <= '2011-10-01 23:59:59.000'
Select e.ID_Employe, hr.* FROM Employe e
INNER JOIN HoraireFixeEmployeParJour hr
ON hr.ID_Employe = e.ID_Employe
Select e.ID_Employe, cc.* FROM Employe e
INNER JOIN CongeCedule cc
ON cc.ID_Employe = e.ID_Employe
INNER JOIN Conge c
ON c.ID_Conge = cc.ID_Conge
We use WCF, Entity Framework, and LINQ.
Why is this taking so much time on Entity Framework and how can I improve it?

A bunch of questions with no answers:
Are you sure you need all of the fields you are selecting to do the work you want to do? Are there any children that you could lazy-load to reduce the number of up-front queries?
What happens if you run this code several times during a session? Does it increase in performance over time? If so, you may want to consider changing some of your queries to use a compiled query so that EF doesn't need to repeatedly parse the expression tree into TSQL each time (note: with 4.2, this will be done for you automatically); see the sketch after these questions.
I assume you have profiled your application to make sure that no other queries are being run that you aren't expecting. Also, I expect that you have run the profiler trace through the query analyzer to make sure the appropriate indexes exist on your tables.
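A minimal sketch of a hand-compiled query, assuming an ObjectContext-derived context type (MyEntities is a placeholder) and reusing the FT query shape from the question above:
using System.Data.Objects; // EF 4.x; EF6 moved this to System.Data.Entity.Core.Objects

static readonly Func<MyEntities, int, DateTime, IQueryable<FT>> CompiledFtQuery =
    CompiledQuery.Compile((MyEntities ctx, int employeId, DateTime date) =>
        from fdt in ctx.FT
        where fdt.ID_Employe == employeId
              && fdt.DateDepart <= date
              && fdt.DateFin >= date
        select fdt);

// Usage: the expression tree is translated to SQL once, then reused:
// var rows = CompiledFtQuery(context, employe.ID_Employe, date).ToList();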

Related

GroupBy query is slow; what are the other options?

I need a query to select rows with the minimum InsertDateTime, but the GroupBy query is very slow.
What can I do instead of GroupBy?
How can I improve the performance of the query?
Is there any other way to write this query using LINQ?
The number of rows in my table is around 600,000 and growing.
FullSimNumber isn't indexed, but IsDeleted and InsertDateTime are indexed.
FullSimNumber is a column composed of three indexed columns: prec + cod + subscribec.
My problem is with the LINQ queries, which always throw a timeout exception.
I changed the FullSimNumber grouping to group by prec, cod, subscribec, which are indexed, but I still get a timeout exception.
I'm using LINQ to EF (code-first style), and my query in SQL is:
SELECT *
FROM dbo.Sim AS t1
JOIN (SELECT FullSimNumber, MIN(InsertDateTime) AS insd
      FROM dbo.Sim
      GROUP BY FullSimNumber) AS t2
  ON t1.FullSimNumber = t2.FullSimNumber AND t1.InsertDateTime = t2.insd
WHERE t1.IsDeleted = 0
and my query in LINQ is:
from s in ADbContext.Numbers
where !s.IsDeleted
group s by s.FullSimNumber
into g
let sd = g.OrderBy(x => x.InsertDateTime).FirstOrDefault()
select sd;
Without any data on indexes, row counts, and such, it is hard to know exactly what is needed to make your query faster. One thing worth trying is this:
SELECT t1.*
FROM dbo.Sim AS t1
CROSS APPLY (
    -- earliest InsertDateTime for this row's FullSimNumber (ascending = MIN)
    SELECT TOP 1 InsertDateTime AS insd
    FROM dbo.Sim t2
    WHERE t2.FullSimNumber = t1.FullSimNumber
    ORDER BY InsertDateTime ASC
) AS t2
WHERE t1.IsDeleted = 0
  AND t1.InsertDateTime = t2.insd
Again, this could very well be worse! Test it and compare the execution times and load.
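On the LINQ side, one shape that often translates to a correlated subquery instead of a GROUP BY join is to filter on the per-group minimum directly; a sketch using this question's names (verify the generated SQL and the timings yourself):
// Keep only rows whose InsertDateTime equals the minimum for their
// FullSimNumber; EF turns the inner Min() into a correlated subquery.
var earliest =
    from s in ADbContext.Numbers
    where !s.IsDeleted
          && s.InsertDateTime == ADbContext.Numbers
                 .Where(x => x.FullSimNumber == s.FullSimNumber)
                 .Min(x => x.InsertDateTime)
    select s;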

Linq to Entity Paging With Large dataset too slow

I'm analyzing player data over millions of matches from an online game. I'm trying to page data into memory in chunks to reduce load times, but using OrderBy with Skip/Take takes way too long (20+ minutes even for smaller queries).
This is my query:
var playerMatches = (from p in context.PlayerMatchEntities
                     join m in context.MatchEntities
                         on p.MatchId equals m.MatchId
                     where m.GameMode == (byte)gameMode
                           && m.LobbyType == (byte)lobbyType
                     select p)
                    .OrderBy(p => p.MatchId)
                    .Skip((page - 1) * pageSize) // parentheses matter: page - 1 * pageSize skips the wrong amount
                    .Take(pageSize)
                    .ToList();
MatchId is indexed.
Each match has 10 players, and I currently have 3.3 million matches w/ 33 million rows in the PlayerMatch table, but data is being collected constantly.
Is there a way to get around the large performance drop caused by OrderBy?
This post is similar but didn't seem to be resolved.
Edit:
This is the SQL query generated:
SELECT
`Project1`.`AccountId`,
`Project1`.`MatchId`,
`Project1`.`PlayerSlot`,
`Project1`.`HeroId`,
`Project1`.`Item_0`,
`Project1`.`Item_1`,
`Project1`.`Item_2`,
`Project1`.`Item_3`,
`Project1`.`Item_4`,
`Project1`.`Item_5`,
`Project1`.`Kills`,
`Project1`.`Deaths`,
`Project1`.`Assists`,
`Project1`.`LeaverStatus`,
`Project1`.`Gold`,
`Project1`.`GoldSpent`,
`Project1`.`LastHits`,
`Project1`.`Denies`,
`Project1`.`GoldPerMin`,
`Project1`.`XpPerMin`,
`Project1`.`Level`,
`Project1`.`HeroDamage`,
`Project1`.`TowerDamage`,
`Project1`.`HeroHealing`
FROM (SELECT
`Extent2`.`AccountId`,
`Extent2`.`MatchId`,
`Extent2`.`PlayerSlot`,
`Extent2`.`HeroId`,
`Extent2`.`Item_0`,
`Extent2`.`Item_1`,
`Extent2`.`Item_2`,
`Extent2`.`Item_3`,
`Extent2`.`Item_4`,
`Extent2`.`Item_5`,
`Extent2`.`Kills`,
`Extent2`.`Deaths`,
`Extent2`.`Assists`,
`Extent2`.`LeaverStatus`,
`Extent2`.`Gold`,
`Extent2`.`GoldSpent`,
`Extent2`.`LastHits`,
`Extent2`.`Denies`,
`Extent2`.`GoldPerMin`,
`Extent2`.`XpPerMin`,
`Extent2`.`Level`,
`Extent2`.`HeroDamage`,
`Extent2`.`TowerDamage`,
`Extent2`.`HeroHealing`
FROM `match` AS `Extent1` INNER JOIN `playermatch` AS `Extent2` ON `Extent1`.`MatchId` = `Extent2`.`MatchId`
WHERE ((`Extent1`.`GameMode`) = 2) AND ((`Extent1`.`LobbyType`) = 7)) AS `Project1`
ORDER BY
`Project1`.`MatchId` ASC LIMIT 0,1000
Another approach could be to have a VIEW that does the join and indexes the appropriate columns, and then create a table-valued function that uses the VIEW and returns a TABLE with only the page data.
You'll have to write the SQL query for the paging manually, but I think it would be faster.
I haven't tried something like that, so I can't be sure there would be a big speed boost.
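A hedged sketch of how EF could consume such a function, where dbo.GetPlayerMatchPage is a hypothetical table-valued function (this assumes a SQL Server backend; the backtick-quoted SQL above suggests MySQL, which has no table-valued functions, so treat it as illustrative only):
// Hypothetical paging TVF; the function name, parameters, and the mapping
// onto PlayerMatchEntity are all assumptions for illustration.
var pageRows = context.Database.SqlQuery<PlayerMatchEntity>(
    "SELECT * FROM dbo.GetPlayerMatchPage(@p0, @p1, @p2, @p3)",
    (byte)gameMode, (byte)lobbyType, page, pageSize).ToList();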
You didn't include enough information for a definite answer, so I'll make a suggestion.
One way to avoid OrderBy is to store rows in a table already in that order. I suspect MatchId is the primary key and the clustered index of MatchEntities, which means MatchEntities is stored physically sorted by MatchId. If you switch the join streams to pull the sorted stream first and the additive stream second, you avoid an expensive sort.
Like this:
var playerMatches = (from m in context.MatchEntities // note the switch: MatchEntities goes first
                     join p in context.PlayerMatchEntities
                         on m.MatchId equals p.MatchId // outer key before equals, inner key after
                     where m.GameMode == (byte)gameMode
                           && m.LobbyType == (byte)lobbyType
                     select p)
                    // .OrderBy(p => p.MatchId) // no need for this any more
                    .Skip((page - 1) * pageSize)
                    .Take(pageSize)
                    .ToList();
Also look at the query plan to find out how the query is executed by the database, what type of join is being used, and so on. Maybe your original query does not exploit the sort order at all.
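To check what EF actually sends, one option in EF6 is the built-in command logger (older versions can cast the query to an ObjectQuery and call ToTraceString() instead); a minimal sketch:
// Writes every command EF executes, with parameters and timings,
// to the console (EF6 DbContext API).
context.Database.Log = Console.Write;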

LINQ Query takes too long

I am running this query :
List<RerocdByGeo> reports = (from p in db.SOMETABLE.ToList()
                             where p.colID == prog && p.status != "some_string"
                                   && p.col_date < enddate && p.col_date > startdate
                             group p by new
                             {
                                 country = (p.some_integer_that_represents_an_id <= 0)
                                     ? "unknown"
                                     : (from f in db.A_LARGE_TABLE
                                        where f.ID == p.some_integer_that_represents_an_id
                                        select f.COUNTRIES_TABLE.COU_Name).FirstOrDefault(),
                                 p.status
                             } into g
                             select new TransRerocdByGeo
                             {
                                 ColA = g.Sum(x => x.ColA),
                                 ColB = g.Sum(x => x.ColB),
                                 Percentage = (g.Sum(x => x.ColA) != null && g.Sum(x => x.ColA) != 0)
                                     ? (g.Sum(x => x.ColB) / g.Sum(x => x.ColA)) * 100
                                     : 0,
                                 Status = g.Key.status,
                                 Country = g.Key.country
                             }).ToList();
A similar query in SQL against the same database runs in a few seconds, while this one takes about 30-60 seconds in the good case.
The table SOMETABLE contains about 10-60K rows, and the table called A_LARGE_TABLE here contains about 10-20 million rows.
The column some_integer_that_represents_an_id is the ID into the large table, but it can also be 0 or -1, which then needs to get the "unknown" value, so I cannot make a relationship (or can I? If so, please explain).
The COUNTRIES_TABLE contains 100-200 rows.
colID and ID are identity columns.
Any suggestions?
You're calling ToList on SOMETABLE right at the start. This pulls the entire database table, with all rows and all columns, into memory, and then performs all of the subsequent operations via LINQ to Objects on that in-memory data structure.
Not only do you suffer the penalty of transferring far more information across the network than you need (which is slow), but C# can't perform the operations nearly as efficiently as a database. That's partly because it loses access to any indexes, any database caching, and any cached compiled queries; it isn't as efficient at dealing with data sets this large to begin with; and it can't apply the higher-level optimizations to the query itself that databases tend to do a lot of.
Next, you have a query inside your GroupBy clause, from f in db.A_LARGE_TABLE where [...], that is performed for every row in the sequence. If the entire query were evaluated at the database level, that could be optimized, and in any case you wouldn't be going across the network (which is quite slow) for each record.
from p in db.SOMETABLE.ToList()
This basically says "get every record from SOMETABLE and put it in a list", without filtering at all. This is probably your problem.
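A minimal sketch of the fix both answers point at: drop the early ToList() so the filter, grouping, and aggregation compose into one SQL statement, with the A_LARGE_TABLE lookup becoming a correlated subquery instead of a per-row round trip (Percentage is omitted for brevity and would be computed exactly as in the original projection):
// Same query kept as IQueryable until the final ToList(),
// so everything executes in the database.
var reports = (from p in db.SOMETABLE // note: no .ToList() here
               where p.colID == prog && p.status != "some_string"
                     && p.col_date < enddate && p.col_date > startdate
               group p by new
               {
                   country = (p.some_integer_that_represents_an_id <= 0)
                       ? "unknown"
                       : (from f in db.A_LARGE_TABLE
                          where f.ID == p.some_integer_that_represents_an_id
                          select f.COUNTRIES_TABLE.COU_Name).FirstOrDefault(),
                   p.status
               } into g
               select new TransRerocdByGeo
               {
                   ColA = g.Sum(x => x.ColA),
                   ColB = g.Sum(x => x.ColB),
                   Status = g.Key.status,
                   Country = g.Key.country
               }).ToList();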

How to fix super slow EF/LINQ query executing multiple SQL statements

I have the following code, which is misbehaving:
TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
The first line, UserManager.GetUser, just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.
First off, it's executing two SQL statements here. The first one grabs every single row in TPM_TASK which is linked to that user, which is sometimes tens of thousands of rows:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
WHERE "Extent1".USERID = :EntityKeyValue1
This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.
Next, it seems to execute a new query for each TPM_PROJECTVERSION pair in the list above:
SELECT
-- Columns
FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)
Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.
The query I would like to generate would look something like:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10
The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:
from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")
Is there any way to optimize this LINQ statement? I'm using .NET4 and Oracle as the backend DB.
Solution:
This solution is based on Kirk's suggestions below, and works since context.TPM_USERTASK cannot be queried directly:
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
where t.TPM_USER.Any(y => y.USERID == UserId) &&
t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient nonetheless.
Yes, you are pulling down a specific user and then referencing the relationship TPM_TASK. That it pulls down every task attached to that user is exactly what it's supposed to do; there's no ORM SQL translation when you do it this way. You're getting a user, then loading all his tasks into memory, and then performing some client-side filtering. This is all done using lazy loading, so the SQL is going to be exceptionally inefficient, as it can't batch anything up.
Instead, rewrite your query to go directly against TPM_TASK and filter against the user:
var tasks = (from t in context.TPM_TASK
where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.

Is this LINQ Query "correct"?

I have the following LINQ query that returns the results I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then LastWinnerDate is a single value from the winner table (possibly multiple records), indicating the DateTime of the last record entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possibly multiple records).
Video1 is basically a bool indicating whether there is a record for the user in the winner table matching on a third table, Objective (should be 1 or 0 rows).
Quiz1 is the same as Video1, matching another record from the Objective table (should be 1 or 0 rows).
Video and Quiz are repeated 12 times because this is for a report displayed to a user, listing all user records and indicating whether they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};
You're repeating the join with the winners table several times. To avoid that, you can break the query into several consecutive Selects: instead of one huge select, you make two selects with less code each. In your example, I would first select the winners collection before selecting the other result properties:
var q1 = from up in MetaData.UserProfile
         select new
         {
             up,
             winners = from winner in MetaData.Winner
                       where winner.Active
                       where winner.UserID == up.UserID
                       select winner
         };
var q = from upWinnerPair in q1
        select new RankingDTO
        {
            UserId = upWinnerPair.up.UserID,
            FirstName = upWinnerPair.up.FirstName,
            LastName = upWinnerPair.up.LastName,
            // The shared winners sequence replaces the repeated subqueries;
            // for example (the remaining properties follow the same pattern):
            LastWinnerDate = upWinnerPair.winners
                .Where(w => objectiveIds.Contains(w.ObjectiveID))
                .OrderByDescending(w => w.CreatedOn)
                .Select(w => w.CreatedOn)
                .FirstOrDefault(),
            WinnerCount = upWinnerPair.winners
                .Count(w => objectiveIds.Contains(w.ObjectiveID)),
        };
The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
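For example, a sketch for the Entity Framework case (the cast fails for other LINQ providers, so treat it as EF-specific):
// LINQ to Entities queries are ObjectQuery instances under the hood;
// ToTraceString() returns the SQL without executing the query.
var objectQuery = (System.Data.Objects.ObjectQuery)q;
Console.WriteLine(objectQuery.ToTraceString());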
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = (int?)b.Id } // cast to int? so unmatched rows yield null instead of failing
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
    a.Id as IdA,
    b.Id as IdB
from alpha a
left join beta b on a.Id = b.Id
It looks fine to me, though I can see why the multiple subqueries might trigger inefficiency worries in the eyes of a coder.
Take a look at the SQL that is produced, though (I'm guessing you're running this against a database, since you say "table" above), before you start worrying about that. Query providers can be pretty good at producing nice, efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another way to be sure of the correctness).