Linq - how to get top records? - c#

I have this code, which queries a database:
var buildInfoList = (from m in context.BuildInfoes
where m.ManagerInfoGuid == managerGuid
select m).Take(3).ToList();
The code above gives me the first 3 results. How can I change it to take the last 3?
That is, if I have 100 rows in the database, I want to get rows 98, 99, 100 and not 1, 2, 3.

Reverse the order of the query. The basic idea is to reverse the order of the entire query, fetch the first three elements, then reverse the order again to put them back in the right order:
var query = from m in context.BuildInfoes
where m.ManagerInfoGuid == managerGuid
select m;
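// AsEnumerable() moves the final step into memory, because Reverse() is not translated to SQL by every provider
var lastItems = query.OrderByDescending(x => x.ID).Take(3).AsEnumerable().Reverse().ToList();
PS: If you were using LINQ to Objects (but I guess you aren't), you could use TakeLast from MoreLINQ.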
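The order, take, reverse idea can be tried without a database. The following is a minimal sketch over an in-memory list, assuming a hypothetical `BuildInfo` class with an integer `Id` that reflects insertion order; the class and the helper name `LastByIdAscending` are illustrative and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the BuildInfoes entity.
public sealed class BuildInfo
{
    public int Id { get; set; }
    public Guid ManagerInfoGuid { get; set; }
    public string Name { get; set; } = "";
}

public static class LastRecords
{
    // Returns the last `count` builds of one manager, in ascending Id order.
    public static List<BuildInfo> LastByIdAscending(
        IEnumerable<BuildInfo> builds, Guid managerGuid, int count)
    {
        return builds
            .Where(b => b.ManagerInfoGuid == managerGuid)
            .OrderByDescending(b => b.Id) // newest first
            .Take(count)                  // keep the newest `count` rows
            .Reverse()                    // restore ascending order
            .ToList();
    }
}
```

Against a real query provider the same chain applies, with the final `Reverse()` run in memory if the provider cannot translate it.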

You are not introducing any order here, so you currently get an arbitrary 3 results, which by chance don't happen to be the ones you want. Establish an order:
var buildInfoList = (from m in context.BuildInfoes
where m.ManagerInfoGuid == managerGuid
orderby m.Name descending
select m).Take(3).ToList();
Using orderby you can specify ascending or descending to reverse the order, which will result in returning the first or last 3 elements using Take.

You can use orderby
var buildInfoList = (from m in context.BuildInfoes
where m.ManagerInfoGuid == managerGuid
orderby m.Id descending
select m).Take(3).ToList();
Or, as @MarkByers said, just use Reverse

var buildInfoList = from m in context.BuildInfoes
where m.ManagerInfoGuid == managerGuid
select m;
var count = buildInfoList.Count();
var list = buildInfoList.Skip(count > 3 ? count - 3 : 0).Take(3).ToList();
Edit: Why is this solution different from the others? Being different does not mean it is the best one.
First, the OP states that the query runs against a database, and since the query uses Take without specifying an order, I guess this is LINQ to SQL.
This solution is not actually the best because it issues two queries: one for the count and another to get the items. On the other hand, it uses only SQL to get the last 3 items and does not order objects in memory.
While testing it with LINQPad I noticed that, when no order is specified, LINQ to SQL generates an ordering over all the columns:
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[id], [t0].[A], [t0].[B], [t0].[C])
Note: the Reverse method is not translated to SQL, so it is best called after a ToList() call.
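The count-then-skip idea can be checked on plain collections. This is a minimal sketch, assuming the input is already in the desired order; the helper name `LastItems` is hypothetical. It clamps the skip amount at zero so that inputs shorter than `n` are returned whole.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipToEnd
{
    // Returns the last `n` items of an already ordered sequence.
    public static List<T> LastItems<T>(IEnumerable<T> orderedSource, int n)
    {
        // Materialize once so the count and the skip see the same data.
        var items = orderedSource.ToList();

        // Skip everything except the last n rows; never skip a negative amount.
        int toSkip = Math.Max(0, items.Count - n);

        return items.Skip(toSkip).Take(n).ToList();
    }
}
```

With 100 rows numbered 1 to 100 and `n = 3`, the skip amount is 97, which leaves rows 98, 99 and 100.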

Related

Linq return distinct values from table if not exist in another

I am trying to return all distinct rows from Staging below where Staging.CenterCode does not exist in Centers.CenterCode.
At the moment Stagings has around 850 distinct CenterCodes and Centers is empty so I should be getting all of the distinct rows, but count begs to differ :)
Any ideas?
var query =
(from s in db.Stagings
join t in db.Centers on s.CenterCode equals t.CenterCode into tj
from t in tj.DefaultIfEmpty()
where s.CenterCode != t.CenterCode
select s.CenterCode).Distinct();
var c = query.Count();
I only need the unique columns from staging so not sure if I actually need a join with the above as I am not ever using data returned from Centers - I have however tried both and get the same 0 value for count.
Any ideas?

I would not use a join; I would use Contains:
var centerCodesQuery = db.Centers
.Select(x => x.CenterCode);
var query = db.Stagings
.Where(x => !centerCodesQuery.Contains(x.CenterCode))
.Select(x => x.CenterCode)
.Distinct();
var c = query.Count();

The join is an inner join. So, if none of the rows in one table match the other table on the specified identifier, it returns 0 rows. In your case you are trying to join a table with 850 distinct rows to an empty table, which returns 0 rows.
If you actually want to return only those rows in 1 table that aren't in another you can use Except:
var query = (from s in db.Stagings
select s.CenterCode)
.Except(from t in db.Centers
select t.CenterCode);
var c = query.Count();

It looks like you are trying to implement an antijoin via a left outer join, which is one of the possible ways, but in order to make it work you need to change
where s.CenterCode != t.CenterCode
to
where t == null
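The left-join antijoin and the Except variant can be compared on in-memory data. This is a minimal sketch, assuming the center codes are plain strings; the class and method names are hypothetical. With an empty `centerCodes` input, both methods return every distinct staging code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AntiJoin
{
    // Left outer join, then keep only the staging codes without a match.
    public static List<string> ViaLeftJoin(
        IEnumerable<string> stagingCodes, IEnumerable<string> centerCodes)
    {
        return (from s in stagingCodes
                join c in centerCodes on s equals c into matches
                from m in matches.DefaultIfEmpty() // m is null when nothing matched
                where m == null
                select s).Distinct().ToList();
    }

    // Set difference: distinct staging codes that are not center codes.
    public static List<string> ViaExcept(
        IEnumerable<string> stagingCodes, IEnumerable<string> centerCodes)
    {
        return stagingCodes.Except(centerCodes).ToList();
    }
}
```

The filter `where m == null` is what turns the left join into an antijoin; comparing the two codes with `!=` instead can never select the unmatched rows in SQL, because a comparison with NULL is not true.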

Linq to SQL Slow Query

My ASP.Net application has the following Linq to SQL function to get a distinct list of height values from the product table.
public static List<string> getHeightList(string catID)
{
using (CategoriesClassesDataContext db = new CategoriesClassesDataContext())
{
var heightTable = (from p in db.Products
join cp in db.CatProducts on p.ProductID equals cp.ProductID
where p.Enabled == true && (p.CaseOnly == null || p.CaseOnly == false) && cp.CatID == catID
select new { Height = p.Height, sort = Convert.ToDecimal(p.Height.Replace("\"", "")) }).Distinct().OrderBy(s => s.sort);
List<string> heightList = new List<string>();
foreach (var s in heightTable)
{
heightList.Add(s.Height.ToString());
}
return heightList;
}
}
I ran Redgate SQL Monitor which shows that this query is using a lot of resources.
Redgate is also showing that I am running the following query:
select count(distinct [height]) from product p
join catproduct cp on p.productid = cp.productid
join cat c on cp.catid = c.catid
where p.enabled=1 and p.displayfilter = 1 and c.catid = 'C2-14'
My questions are:
A suggestion to change the function so that it uses less resources?
Also, how does linq to sql generate the above query from my function? (I did not write select count(distinct [height]) from product anywhere in the code)
There are 90,000 records in the products. This category which I am trying to get the distinct list of heights has 50,000 product records
Thank you in advance,
Nick

First of all, your posted SQL query and LINQ query don't match at all. It's not the LINQ query but the underlying SQL query itself that performs slowly. Make sure all the columns involved in the JOIN ON, WHERE, and ORDER BY clauses are indexed properly in order to get a better execution plan; otherwise you will end up with a full table scan and a file sort, and the query is bound to perform slowly.

The join multiplies the number of Products the query returns. To undo that, you apply Distinct at the end. It will certainly reduce db resources if you return unique Products right away:
var heightTable = (from p in db.Products
where p.CatProducts.Any(cp => cp.CatID == catID)
&& p.Enabled && (p.CaseOnly == null || p.CaseOnly == false)
select new
{
Height = p.Height,
sort = Convert.ToDecimal(p.Height.Replace("\"", ""))
}).OrderBy(s => s.sort);
This changes the join into a where clause. It saves the db engine the trouble of deduplicating the result.
If that still performs poorly, you should try to do the conversion and ordering in memory, i.e. after receiving the raw results from the database.
As for the count, I don't know where it comes from. Such queries typically get generated by paging libraries such as PagedList, but I see no trace of that in your code.
Side note: you can return ...
heightTable.Select(x => x.Height.ToString()).ToList()
... instead of creating the list yourself.
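Doing the conversion and ordering in memory could look roughly like the sketch below. It assumes the height strings have already been fetched from the database (for example after a `ToList()` on the raw query) and that each value is a decimal number followed by an inch mark; the helper name `SortedDistinct` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class HeightList
{
    // Deduplicates raw height strings such as 2.5" and sorts them numerically.
    public static List<string> SortedDistinct(IEnumerable<string> heights)
    {
        return heights
            .Distinct()
            // Strip the inch mark and parse with a fixed culture so "2.5" always parses.
            .OrderBy(h => decimal.Parse(h.Replace("\"", ""), CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

The database then only has to filter and return the raw column, while the string manipulation and the sort run on the application side.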

Entity Framework query ignoring my orderby

I have myself this SQL query
SELECT
db_accounts_last_contacts.id,
db_accounts_last_contacts.last_contact_date,
db_accounts_last_contacts.description,
db_accounts_last_contacts.follow_up_date,
db_accounts_last_contacts.spoke_to_person_id,
db_accounts_last_contacts.account_id
FROM
db_accounts_last_contacts ,
db_companies
WHERE db_companies.id = db_accounts_last_contacts.account_id
ORDER BY db_accounts_last_contacts.last_contact_date DESC
Which returns my results ordered by last_contact_date.
Now I have my Entity framework query
var query = (from c in context.accounts_companies
select new AccountSearchResultModel()
{
LastContacted = (from calc in context.communique_accounts_last_contacts
where calc.account_id == companyId
orderby calc.last_contact_date descending
select calc.last_contact_date).FirstOrDefault()
});
However, when I call ToList on it, my results are never ordered.
[Screenshot: my table, unordered]
[Screenshot: my list, ordered using the SQL query]
Why isn't my Entity Framework query picking up my orderby? Or, if it is, why am I always pulling out the first one?

You need to choose a property to sort by and pass it as a lambda expression to OrderByDescending, like this:
.OrderByDescending(x => x.calc.last_contact_date);
I hope this helps.
See also: Linq Orderby Descending Query

Sorry for the late answer.
What I had to do in the end was create a view, import it via the EDMX file, and then use that to pull out my results.

Linq to Entity Paging With Large dataset too slow

I'm analyzing player data over millions of matches from an online game. I'm trying to page data into memory in chunks to reduce load times but using OrderBy with skip/take takes way too long (20+ minutes even for smaller queries).
This is my query:
var playerMatches = (from p in context.PlayerMatchEntities
join m in context.MatchEntities
on p.MatchId equals m.MatchId
where m.GameMode == (byte) gameMode
&& m.LobbyType == (byte) lobbyType
select p)
.OrderBy(p => p.MatchId)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.ToList();
MatchId is indexed.
Each match has 10 players, and I currently have 3.3 million matches w/ 33 million rows in the PlayerMatch table, but data is being collected constantly.
Is there a way to get around the large performance drop caused by OrderBy?
This post is similar but didn't seem to be resolved.
Edit:
This is the SQL query generated:
SELECT
`Project1`.`AccountId`,
`Project1`.`MatchId`,
`Project1`.`PlayerSlot`,
`Project1`.`HeroId`,
`Project1`.`Item_0`,
`Project1`.`Item_1`,
`Project1`.`Item_2`,
`Project1`.`Item_3`,
`Project1`.`Item_4`,
`Project1`.`Item_5`,
`Project1`.`Kills`,
`Project1`.`Deaths`,
`Project1`.`Assists`,
`Project1`.`LeaverStatus`,
`Project1`.`Gold`,
`Project1`.`GoldSpent`,
`Project1`.`LastHits`,
`Project1`.`Denies`,
`Project1`.`GoldPerMin`,
`Project1`.`XpPerMin`,
`Project1`.`Level`,
`Project1`.`HeroDamage`,
`Project1`.`TowerDamage`,
`Project1`.`HeroHealing`
FROM (SELECT
`Extent2`.`AccountId`,
`Extent2`.`MatchId`,
`Extent2`.`PlayerSlot`,
`Extent2`.`HeroId`,
`Extent2`.`Item_0`,
`Extent2`.`Item_1`,
`Extent2`.`Item_2`,
`Extent2`.`Item_3`,
`Extent2`.`Item_4`,
`Extent2`.`Item_5`,
`Extent2`.`Kills`,
`Extent2`.`Deaths`,
`Extent2`.`Assists`,
`Extent2`.`LeaverStatus`,
`Extent2`.`Gold`,
`Extent2`.`GoldSpent`,
`Extent2`.`LastHits`,
`Extent2`.`Denies`,
`Extent2`.`GoldPerMin`,
`Extent2`.`XpPerMin`,
`Extent2`.`Level`,
`Extent2`.`HeroDamage`,
`Extent2`.`TowerDamage`,
`Extent2`.`HeroHealing`
FROM `match` AS `Extent1` INNER JOIN `playermatch` AS `Extent2` ON `Extent1`.`MatchId` = `Extent2`.`MatchId`
WHERE ((`Extent1`.`GameMode`) = 2) AND ((`Extent1`.`LobbyType`) = 7)) AS `Project1`
ORDER BY
`Project1`.`MatchId` ASC LIMIT 0,1000

Another approach could be to have a VIEW that does the join and indexes the appropriate columns, and then create a Table-Valued Function that uses the VIEW and returns a TABLE with only the page data.
You'll have to write the SQL query for the paging manually, but I think it would be faster.
I haven't tried something like that, so I can't be sure there will be a big speed boost.

You didn't include enough information for a definite answer, so here is a suggestion.
One way to avoid ORDER BY is to store the rows in a table that is already in order. I suggest making MatchId the primary key and clustered index of MatchEntities, so that MatchEntities.MatchId is stored physically sorted. If you switch the join streams to pull the sorted stream first and the additive stream second, you avoid the expensive sort.
Like this:
var playerMatches = (from m in context.MatchEntities // note the switch: MatchEntities goes first
join p in context.PlayerMatchEntities
on m.MatchId equals p.MatchId
where m.GameMode == (byte) gameMode
&& m.LobbyType == (byte) lobbyType
select p)
// .OrderBy(p => p.MatchId) // no need for this any more (note: some providers, e.g. LINQ to Entities, reject Skip without an OrderBy)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.ToList();
Also look at the query plan to find out how the database executes the query, what type of join is being used, etc. Maybe your original query does not exploit the sort order at all.
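One detail of Skip/Take paging is easy to get wrong: a 1-based page number maps to a row offset of `(page - 1) * pageSize`, and the parentheses matter because multiplication binds tighter than subtraction. The sketch below shows this on an in-memory sequence; the `Paging` class and its method names are hypothetical, and it illustrates only the offset arithmetic, not a cure for the slow ORDER BY.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Row offset of a 1-based page number.
    public static int Offset(int page, int pageSize)
    {
        return (page - 1) * pageSize;
    }

    // Returns one page of `source`, ordered by `key`.
    public static List<T> Page<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> key, int page, int pageSize)
    {
        return source
            .OrderBy(key)                  // paging needs a stable order
            .Skip(Offset(page, pageSize))  // rows that belong to earlier pages
            .Take(pageSize)
            .ToList();
    }
}
```

For page 3 with 1000 rows per page the offset is 2000, whereas the unparenthesized form `page - 1 * pageSize` evaluates to -997.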

Is this LINQ Query "correct"?

I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then the LastWinnerDate is a single record from the winner table (possible multiple records) indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possible multiple records).
Video1 is basically a bool indicating there is, or is not a record for the user in the winner table matching on a third table Objective (should be 1 or 0 rows).
Quiz1 is same as Video 1 matching another record from Objective Table (should be 1 or 0 rows).
Video and Quiz are repeated 12 times because this is for a report, displayed to a user, that lists all user records and indicates whether they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};

You're repeating the join against the winners table several times. To avoid that, you can break the query into several consecutive selects: instead of one huge select, you write two selects with less code. In your example I would first select the winners for each user before selecting the other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have simpler and less repetitive code
using the winners collection from "upWinnerPair.winners" */
// ... remaining properties, computed the same way
};
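The two-step shape can be sketched end to end on in-memory data. The classes below are simplified, hypothetical stand-ins for UserProfile, Winner and RankingDTO, and only two of the result properties are computed; how a given ORM translates the same shape to SQL is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-ins for the entities in the question.
public sealed class UserProfile { public int UserID { get; set; } }

public sealed class Winner
{
    public int UserID { get; set; }
    public bool Active { get; set; }
    public DateTime CreatedOn { get; set; }
}

public sealed class Ranking
{
    public int UserId { get; set; }
    public DateTime? LastWinnerDate { get; set; }
    public int WinnerCount { get; set; }
}

public static class RankingQuery
{
    public static List<Ranking> Build(
        IEnumerable<UserProfile> users, IEnumerable<Winner> winners)
    {
        // Step 1: pair each user with that user's active winner rows, once.
        var q1 = from up in users
                 select new
                 {
                     up,
                     winners = winners.Where(w => w.Active && w.UserID == up.UserID)
                 };

        // Step 2: derive every result property from the paired collection.
        return (from pair in q1
                select new Ranking
                {
                    UserId = pair.up.UserID,
                    // Max over a nullable projection yields null for users without winners.
                    LastWinnerDate = pair.winners.Max(w => (DateTime?)w.CreatedOn),
                    WinnerCount = pair.winners.Count()
                }).ToList();
    }
}
```

The filter on Active and UserID is written once, and each additional property becomes a short expression over `pair.winners`.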
The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.

To perform an outer join in LINQ query syntax, do the following.
Assuming we have two sources, alpha and beta, each with a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as Idb
from alpha a
left join beta b on a.Id = b.Id
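When the same left-join pattern runs over in-memory collections rather than a database, the unmatched side is null, so the projection has to be null-safe. This is a minimal sketch, assuming a hypothetical `Item` class with an integer `Id`; the names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with the common Id property.
public sealed class Item
{
    public int Id { get; set; }
}

public static class LeftJoin
{
    // Left join of alpha on beta; IdB is null when beta has no matching Id.
    public static List<(int IdA, int? IdB)> Run(
        IEnumerable<Item> alpha, IEnumerable<Item> beta)
    {
        return (from a in alpha
                join btemp in beta on a.Id equals btemp.Id into bleft
                from b in bleft.DefaultIfEmpty() // b is null for unmatched rows
                select (IdA: a.Id, IdB: b?.Id)).ToList();
    }
}
```

Every element of alpha appears in the result at least once, which is what distinguishes this from a plain `join`.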
It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).
