I am using EF6 and I would like to get the records in a table which are in a group of IDs.
In my test for example I am using 4 IDs.
I try two options, the first is with any.
dbContext.MyTable
.Where(x => myIDS.Any(y=> y == x.MyID));
And the T-SQL that this linq exrepsion generates is:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM (SELECT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[UnionAll1].[C1] AS [C1]
FROM (SELECT
cast(130 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
cast(139 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
UNION ALL
SELECT
cast(140 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
UNION ALL
SELECT
cast(141 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
WHERE [UnionAll3].[C1] = [Extent1].[MiID]
)
How can is seen, the T-SQL is a "where exists" that use many subqueries and unions.
The second option is with contains.
dbContext.MyTable
.Where(x => myIDS.Contains(x.MiID));
And the T-SQL:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))
The contains is translated into "where in", but the query is much less complex.
I have read that any it use to be faster, so I have the doubt if the any is, although it is more complex at a first glance, is faster or not.
Thank so much.
EDIT: I have some test (I don't know if this is the best way to test this).
System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Contains(x.MyID));
iq.ToArrayAsync();
}
miswContains.Stop();
System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
IQueryable<MyTable> iq = dbContext.Mytable
.Where(x => myIDS.Any(y => y == x.MyID));
iq.ToArrayAsync();
}
miswAny.Stop();
the results are that miswAny is about 850ms and the miswContains is about 4251ms.
So the second option, with contaions, is slower.
Your second option is the fastest solution I can think of (at least for not very large arrays of ids) provided your MiTabla.MiID is in an index.
If you want to read more about in clause performance: Is SQL IN bad for performance?.
If you know the ID, then using LINQ2SQL Count() method would create a much cleaner and faster SQL code (than both Any and Contains):
dbContext.MyTable
.Where(x => myIDS.Count(y=> y == x.MyID) > 0);
The generated SQL for the count should look something like this:
DECLARE #p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = #p0
You can tell by the shape of the queries that Any is not scalable at all. It doesn't take many elements in myIDS (~50 probably) to get a SQL exception that the maximum nesting level has exceeded.
Contains is much better in this respect. It can handle a couple of thousands of elements before its performance gets severely affected.
So I would go for the scalable solution, even though Any may be faster with small numbers. It is possible to make Contains even better scalable.
I have read that any it use to be faster,
In LINQ-to-objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, the generated SQL is what counts.
Related
I've done some extensive research and I've concluded that the DATEDIFF function is making my queries run very slow.
Below is the generated query by Entity Framework and it does look readable enough hopefully.
Here's the Linq that generates the T-SQL:
model.NewTotal1Week = ( from sdo in context.SubscriberDebitOrders
where
(
sdo.CampaignId == campaignId &&
( sdo.Status == ( Int32 ) DebitOrderStatus.New_Faulty ) &&
( SqlFunctions.DateDiff( "week", sdo.Collections.FirstOrDefault( c => c.TxnStatus == "U" ).ProcessDate, DateTime.Now ) <= 1 )
)
select sdo ).Count();
In the query below, I would like to get a COUNT of all Collections which fall within 1 week from the time they were Processed to today's date.
Is there anyone that can help me get rid of the DATEDIFF function? I've seen examples online but I couldn't adapt it to my scenario, forgive me I'm not very genius yet.
exec sp_executesql N'SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[SubscriberDebitOrder] AS [Extent1]
OUTER APPLY (SELECT TOP (1)
[Extent2].[ProcessDate] AS [ProcessDate]
FROM [dbo].[Collections] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[DebitOrderId]) AND (''U'' = [Extent2].[TxnStatus]) ) AS [Limit1]
WHERE ([Extent1].[CampaignId] = #p__linq__0) AND (3 = [Extent1].[Status]) AND ((DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime())) <= 1)
) AS [GroupBy1]',N'#p__linq__0 int',#p__linq__0=3
go
Thanks in advance.
Its not the just DATEDIFF, any function on the column would cause query to do a SCAN on the underlying table/index
DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime())) <=1
Above logic is fetching last week data? You can also write above without putting function around ProcessDate Column.
[Limit1].[ProcessDate] > SysDateTime()-7
This is your query:
SELECT GroupBy1.A1 AS C1
FROM (SELECT COUNT(1) AS[A1
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus
) AS [Limit1]
WHERE (Extent1.CampaignId = #p__linq__0) AND (3 = Extent1.Status) AND
(DATEDIFF(week, Limit1.ProcessDate, SysDateTime()) <= 1)
) GroupBy1;
As mentioned elsewhere, you should change the date logic and get rid of the outer query:
SELECT COUNT(1) AS A1
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus
) AS limit1
WHERE (Extent1.CampaignId = #p__linq__0) AND (3 = Extent1.Status) AND
Limit1.ProcessDate <= DATEADD(-1, week, GETDATE())
Very important note: This is not exactly equivalent to your query. Your original query counted the number of week boundaries between two dates. This depends on datefirst, but it woudld often be the number of Saturday or Sunday nights.
Based on your description, the above is more correct.
Next, you want indexes on Collections(DebitOrderId, TxnStatus, ProcessDate) and SubscriberDebitOrder(CampaignId, Status).
So I have a Topic which has these related entities.
- List<Posts>
- List<Votes>
- List<Views>
I have the following query. Where I want to pull out and order popular Topics based on the count of 3 related entities over a specific date period.
var topics = _context.Topic
.OrderByDescending(x => x.Posts.Count(c => c.DateCreated >= from && c.DateCreated <= to))
.ThenByDescending(x => x.Votes.Count(c => c.DateCreated >= from && c.DateCreated <= to))
.ThenByDescending(x => x.Views.Count(c => c.DateCreated >= from && c.DateCreated <= to))
.Take(amountToShow)
.ToList();
I'm looking for the most efficient query for doing the above? Is what I am doing the best way to do this with EntityFramework? Or am I missing something?
Any help appreciated.
If you put your above code into LINQPad, or check it with the profiler, you will see that it will probably generate something like the following SQL:
SELECT TOP #amountToShow [t0].[id] --and additional columns
FROM [Topic] AS [t0]
ORDER BY (
SELECT COUNT(*)
FROM [Posts] AS [t1]
WHERE ([t1].DateCreated >= #from AND [t1].DateCreated <= #to)
AND ([t1].[topidId] = [t0].[id])
) DESC, (
SELECT COUNT(*)
FROM [Votes] AS [t2]
WHERE ([t2].DateCreated >= #from AND [t2].DateCreated <= #to)
AND ([t2].[topicId] = [t0].[id])
) DESC , (
SELECT COUNT(*)
FROM [Views] AS [t3]
WHERE ([t3].DateCreated >= #from AND [t3].DateCreated <= #to)
AND ([t3].[topicId] = [t0].[id])
) DESC
GO
You could try rewriting the SQL a bit to GROUP the results of the subqueries and LEFT JOIN them to the original table, which does seem to be about 2x faster in the db itself:
SELECT TOP #amountToShow [t0].[id] --etc
FROM [Topic] AS [t0]
LEFT JOIN
(SELECT topicId, COUNT(*) AS num FROM Posts p
WHERE [p].DateCreated >= #from AND .DateCreated <= #to
GROUP BY topicId) [t1]
ON t0.id = t1.topicId
LEFT JOIN
(SELECT topicId, COUNT(*) AS num FROM Votes vo
WHERE [vo].DateCreated >= #from AND [vo].DateCreated <= #to
GROUP BY topidId) [t2]
ON t0.id = t2.topicId
LEFT JOIN
(SELECT topicId, COUNT(*) AS num FROM Views vi
WHERE [vi].DateCreated >= #from AND [vi].DateCreated <= #to
GROUP BY topicId) [t3]
ON t0.id = t3.topicId
ORDER BY t1.num DESC, t2.num DESC, t3.num DESC
But getting LINQ to generate code like this is iffy at best. Doing LEFT JOINs are not exactly its strong suit, and using the techniques that are out there for doing so will probably generate SQL that uses CROSS APPLY and/or OUTER APPLY instead, and will likely be as slow or slower than your current code.
If you are that worried about speed, you might consider putting your fine-tuned SQL into a view so that you know that the query being used is the one you want.
Bear in mind, too, that you or someone else will have to come back to this code and maintain it later. Your current linq statement is very straightforward and easy to understand. A complicated query is going to be harder to maintain and will take more work to alter in the future.
I'm trying to create a query similar to this:
select randomId
from myView
where ...
group by randomId
NOTE: EF doesn't support the distinct so I was thinking of going around the lack of it with the group by (or so I think)
randomId is numeric
Entity Framework V.6.0.2
This gives me the expected result in < 1 second query
When trying to do the same with EF I have been having some issues.
If I do the LINQ similar to this:
context.myView
.Where(...)
.GroupBy(mt => mt.randomId)
.Select({ Id = group.Key, Count = group.Count() } )
I will get sort of the same result but forcing a count and making the query > 6 seconds
The SQL EF generates is something like this:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [randomId],
[GroupBy1].[A1] AS [C2]
FROM (
SELECT
[Extent1].[randomId] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
GROUP BY [Extent1].[randomId]
) AS [GroupBy1]
But, if the query had the count commented out it would be back to < 1 second
If I change the Select to be like:
.Select({ Id = group.Key} )
I will get all of rows without the group by statement in the SQL query and no Distinct whatsoever:
SELECT
[Extent1].[anotherField] AS [anotherField], -- 'this field got included automatically on this query and I dont know why, it doesnt affect outcome when removed in SQL server'
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
Other failed attempts:
query.GroupBy(x => x.randomId).Select(group => group.FirstOrDefault());
The query that was generated is as follows:
SELECT
[Limit1].ALL FIELDS,...
FROM (SELECT
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...) AS [Project1]
OUTER APPLY (SELECT TOP (1)
[Extent2].ALL FIELDS,...
FROM [dbo].[myView] AS [Extent2]
WHERE (...) AS [Limit1] -- same as the where above
This query performed rather poorly and still managed to return all Ids for the where clause.
Does anyone have an idea on how to force the usage of the group by without an aggregating function like a count?
In SQL it works but then again I have the distinct keyword as well...
Cheers,
J
var query = from p in TableName
select new {Id = p.ColumnNameId};
var distinctItems = query.Distinct().ToList();
Here is the linq query however you should be able to write an equivalent from EF dbset too. If you have issues let me know.
Cheers!
I'm newish to LinqToSQL and the project that I am working on cannot be changed to something else. I am translating some old SQL code to Linq. Not being that hot at linq, I used Linqer to do the translation for me. The query took about 90 seconds to run, so I thought it must be the linqToSQL. However, when I copied the query that the LinqToSQL produced and ran an ExecuteQuery on the datacontext it was super quick as I expected. I've copied the full queries, rather than trying to distil it down, but it looks like the issue is with something LinqToSQL is doing behind the scenes.
To summarise, if I copy the T-SQL created by linq and run
var results = DB.ExecuteQuery<InvoiceBalanceCheckDTO.InvoiceBalanceCheck>(#"T-SQL created by Linq - see below").ToList()
it completes with expected results in about 0.5 seconds.
It runs about the same time directly in SSMS. However, if I use the linqToSQL code that creates the T-SQL and do ToList() it takes ages. The result is only 9 records, although without the constraint to check the balance <> 0, there would be around 19,000 records. It's as if it's getting all 19,000 and then checking <> 0 after it's got the records.
I have also changed the Linq to project into the class used above, rather than to an anonymous type, but it makes not difference
This is the original SQL :
SELECT InvoiceNum, Max(AccountCode), Sum(AmountInc) AS Balance
FROM
(SELECT InvoiceNum, AccountCode, AmountInc From TourBookAccount WHERE AccDetailTypeE IN(20,30) AND InvoiceNum >= 1000
UNION ALL
SELECT InvoiceNum, '<no matching invoice>' AS AccountCode, AccountInvoiceDetail.AmountInc
FROM AccountInvoiceDetail
INNER JOIN AccountInvoice ON AccountInvoiceDetail.InvoiceID=AccountInvoice.InvoiceID
WHERE AccDetailTypeE IN(20,30)
AND InvoiceNum >= 1000
) as t
GROUP BY InvoiceNum
HAVING (Sum(t.AmountInc)<>0)
ORDER BY InvoiceNum
and this is the linq
var test = (from t in
(
//this gets the TourBookAccount totals
from tba in DB.TourBookAccount
where
detailTypes.Contains(tba.AccDetailTypeE) &&
tba.InvoiceNum >= dto.CheckInvoiceNumFrom
select new
{
InvoiceNum = tba.InvoiceNum,
AccountCode = tba.AccountCode,
Balance = tba.AmountInc
}
)
.Concat //note that concat, since it's possible that the AccountInvoice record does not actually exist
(
//this gets the Invoice detail totals.
from aid in DB.AccountInvoiceDetail
where
detailTypes.Contains(aid.AccDetailTypeE) &&
aid.AccountInvoice.InvoiceNum >= dto.CheckInvoiceNumFrom &&
select new
{
InvoiceNum = aid.AccountInvoice.InvoiceNum,
AccountCode = "<No Account Records>",
Balance = aid.AmountInc
}
)
group t by t.InvoiceNum into g
where Convert.ToDecimal(g.Sum(p => p.Balance)) != 0m
select new
{
InvoiceNum = g.Key,
AccountCode = g.Max(p => p.AccountCode),
Balance = g.Sum(p => p.Balance)
}).ToList();
and this is the T-SQL that the linq produces
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (20, 30)) AND ([t0].[InvoiceNum] >= 1000)
UNION ALL
SELECT [t2].[InvoiceNum],'<No Account Records>' AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (20, 30)) AND ([t2].[InvoiceNum] >= 1000)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> 0
I would bet money, that the problem is in this line:
where Convert.ToDecimal(g.Sum(p => p.Balance)) != 0m
What is probably happening, is that it can't translate this to SQL and silently tries to get all rows from db to memory, and then do filtering on in memory objects (LINQ to objects)
Maybe try to change this to something like:
where g.Sum(p=>.Balance!=0)
Well, the answer turned out not to be LinqToSQL itself (although possibly the way it creates the query could be blamed) , but the way SQL server handles the query. When I was running the query on the database to check speed (and running the created T=SQL in DB.ExecuteQuery) I had all the variables hardcoded. When I changed it to use the exact sql that Linq produces (i.e. with variables that are substituted) it ran just as slow in SSMS.
Looking at the execution plans of the two, they are quite different. A quick search on SO brought me to this page : Why does a parameterized query produces vastly slower query plan vs non-parameterized query which indicated that the problem was SQL server's "Parameter sniffing".
The culprit turned out to be the "No Account Records" string
For completeness, here is the generated T-SQL that Linq creates.
Change #p10 to the actual hardcoded string, and it's back to full speed !
In the end I just removed the line from the linq and set the account code afterwards and all was good.
Thanks #Botis,#Blorgbeard,#ElectricLlama & #Scott for suggestions.
DECLARE #p0 as Int = 20
DECLARE #p1 as Int = 30
DECLARE #p2 as Int = 1000
DECLARE #p3 as Int = 20
DECLARE #p4 as Int = 30
DECLARE #p5 as Int = 1000
DECLARE #p6 as Int = 40
DECLARE #p7 as Int = 10
DECLARE #p8 as Int = 0
DECLARE #p9 as Int = 1
DECLARE #p10 as NVarChar(4000)= '<No Account Records>' /*replace this parameter with the actual text in the SQl and it's way faster.*/
DECLARE #p11 as Decimal(33,4) = 0
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (#p0, #p1)) AND ([t0].[InvoiceNum] >= #p2)
UNION ALL
SELECT [t2].[InvoiceNum], #p10 AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (#p3, #p4)) AND ([t2].[InvoiceNum] >= #p5) AND ([t2].[InvoiceStatusE] <= #p6) AND ([t2].[InvoiceTypeE] = #p7) AND ([t1].[BookNum] <> #p8) AND ([t1].[AccDetailSourceE] = #p9)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> #p11
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (20, 30)) AND ([t0].[InvoiceNum] >= 1000)
UNION ALL
SELECT [t2].[InvoiceNum], '<No Account Records>' AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (20, 30)) AND ([t2].[InvoiceNum] >= 0) AND ([t2].[InvoiceStatusE] <= 40) AND ([t2].[InvoiceTypeE] = 10) AND ([t1].[BookNum] <> 0) AND ([t1].[AccDetailSourceE] = 1)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> 0
Anybody know how to write a LINQ to SQL statement to return every nth row from a table? I'm needing to get the title of the item at the top of each page in a paged data grid back for fast user scanning. So if i wanted the first record, then every 3rd one after that, from the following names:
Amy, Eric, Jason, Joe, John, Josh, Maribel, Paul, Steve, Tom
I'd get Amy, Joe, Maribel, and Tom.
I suspect this can be done... LINQ to SQL statements already invoke the ROW_NUMBER() SQL function in conjunction with sorting and paging. I just don't know how to get back every nth item. The SQL Statement would be something like WHERE ROW_NUMBER MOD 3 = 0, but I don't know the LINQ statement to use to get the right SQL.
Sometimes, TSQL is the way to go. I would use ExecuteQuery<T> here:
var data = db.ExecuteQuery<SomeObjectType>(#"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % 25) = 1");
You could also swap out the n:
var data = db.ExecuteQuery<SomeObjectType>(#"
DECLARE #n int = 2
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % #n) = 1", n);
Once upon a time, there was no such thing as Row_Number, and yet such queries were possible. Behold!
var query =
from c in db.Customers
let i = (
from c2 in db.Customers
where c2.ID < c.ID
select c2).Count()
where i%3 == 0
select c;
This generates the following Sql
SELECT [t2].[ID], [t2]. --(more fields)
FROM (
SELECT [t0].[ID], [t0]. --(more fields)
(
SELECT COUNT(*)
FROM [dbo].[Customer] AS [t1]
WHERE [t1].[ID] < [t0].[ID]
) AS [value]
FROM [dbo].[Customer] AS [t0]
) AS [t2]
WHERE ([t2].[value] % #p0) = #p1
Here's an option that works, but it might be worth checking that it doesn't have any performance issues in practice:
var nth = 3;
var ids = Table
.Select(x => x.Id)
.ToArray()
.Where((x, n) => n % nth == 0)
.ToArray();
var nthRecords = Table
.Where(x => ids.Contains(x.Id));
Just googling around a bit I haven't found (or experienced) an option for Linq to SQL to directly support this.
The only option I can offer is that you write a stored procedure with the appropriate SQL query written out and then calling the sproc via Linq to SQL. Not the best solution, especially if you have any kind of complex filtering going on.
There really doesn't seem to be an easy way to do this:
How do I add ROW_NUMBER to a LINQ query or Entity?
How to find the ROW_NUMBER() of a row with Linq to SQL
But there's always:
peopleToFilter.AsEnumerable().Where((x,i) => i % AmountToSkipBy == 0)
NOTE: This still doesn't execute on the database side of things!
This will do the trick, but it isn't the most efficient query in the world:
var count = query.Count();
var pageSize = 10;
var pageTops = query.Take(1);
for(int i = pageSize; i < count; i += pageSize)
{
pageTops = pageTops.Concat(query.Skip(i - (i % pageSize)).Take(1));
}
return pageTops;
It dynamically constructs a query to pull the (nth, 2*nth, 3*nth, etc) value from the given query. If you use this technique, you'll probably want to create a limit of maybe ten or twenty names, similar to how Google results page (1-10, and Next), in order to avoid getting an expression so large the database refuses to attempt to parse it.
If you need better performance, you'll probably have to use a stored procedure or a view to represent your query, and include the row number as part of the stored proc results or the view's fields.