Anybody know how to write a LINQ to SQL statement to return every nth row from a table? I need to get the title of the item at the top of each page in a paged data grid, so users can scan it quickly. So if I wanted the first record, then every 3rd one after that, from the following names:
Amy, Eric, Jason, Joe, John, Josh, Maribel, Paul, Steve, Tom
I'd get Amy, Joe, Maribel, and Tom.
I suspect this can be done... LINQ to SQL statements already invoke the ROW_NUMBER() SQL function in conjunction with sorting and paging. I just don't know how to get back every nth item. The SQL statement would be something like WHERE ROW_NUMBER() % 3 = 1, but I don't know the LINQ statement to use to generate the right SQL.
Sometimes, T-SQL is the way to go. I would use ExecuteQuery<T> here:
var data = db.ExecuteQuery<SomeObjectType>(@"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % 25) = 1");
You could also swap out the n (ExecuteQuery substitutes the {0} placeholder with a query parameter):
var data = db.ExecuteQuery<SomeObjectType>(@"
DECLARE @n int = {0}
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % @n) = 1", n);
Once upon a time, there was no such thing as Row_Number, and yet such queries were possible. Behold!
var query =
    from c in db.Customers
    let i = (
        from c2 in db.Customers
        where c2.ID < c.ID
        select c2).Count() // i counts the customers with a smaller ID (0 for the first row)
    where i % 3 == 0
    select c;
This generates the following SQL:
SELECT [t2].[ID], [t2]. --(more fields)
FROM (
SELECT [t0].[ID], [t0]. --(more fields)
(
SELECT COUNT(*)
FROM [dbo].[Customer] AS [t1]
WHERE [t1].[ID] < [t0].[ID]
) AS [value]
FROM [dbo].[Customer] AS [t0]
) AS [t2]
WHERE ([t2].[value] % @p0) = @p1
Here's an option that works, but it might be worth checking that it doesn't have any performance issues in practice:
var nth = 3;
var ids = Table
    .Select(x => x.Id)
    .ToArray()                      // pull all ids into memory
    .Where((x, n) => n % nth == 0)  // the indexed Where overload runs client-side
    .ToArray();
var nthRecords = Table
    .Where(x => ids.Contains(x.Id)); // translates to WHERE Id IN (...)
Just googling around a bit, I haven't found (or experienced) any sign that LINQ to SQL directly supports this.
The only option I can offer is to write a stored procedure with the appropriate SQL query written out, and then call the sproc via LINQ to SQL; a sketch follows. It's not the best solution, especially if you have any kind of complex filtering going on.
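A minimal sketch of that idea, assuming SQL Server; the table, column, and procedure names are hypothetical:
// Assumed to already exist in the database (hypothetical names):
//   CREATE PROCEDURE dbo.GetEveryNthName @n int AS
//     SELECT [Name] FROM
//       (SELECT [Name], ROW_NUMBER() OVER (ORDER BY [Name]) AS [__row]
//        FROM [dbo].[Names]) x
//     WHERE (x.__row % @n) = 1;
// Dragging the sproc onto the LINQ to SQL designer surface generates a typed DataContext method:
using (var db = new MyDataContext()) // MyDataContext is a placeholder name
{
    foreach (var row in db.GetEveryNthName(3)) // returns an ISingleResult of a generated row type
        Console.WriteLine(row.Name);
}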
There really doesn't seem to be an easy way to do this:
How do I add ROW_NUMBER to a LINQ query or Entity?
How to find the ROW_NUMBER() of a row with Linq to SQL
But there's always:
peopleToFilter.AsEnumerable().Where((x,i) => i % AmountToSkipBy == 0)
NOTE: This still doesn't execute on the database side of things!
This will do the trick, but it isn't the most efficient query in the world:
var count = query.Count();
var pageSize = 10;
var pageTops = query.Take(1);
for(int i = pageSize; i < count; i += pageSize)
{
pageTops = pageTops.Concat(query.Skip(i - (i % pageSize)).Take(1));
}
return pageTops;
It dynamically constructs a query to pull the (nth, 2*nth, 3*nth, etc.) value from the given query. If you use this technique, you'll probably want to cap it at maybe ten or twenty names, similar to how Google pages its results (1-10, then Next), to avoid building an expression so large that the database refuses to parse it.
If you need better performance, you'll probably have to use a stored procedure or a view to represent your query, and include the row number as part of the stored proc's results or the view's fields; the view variant is sketched below.
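A minimal sketch of the view variant, assuming SQL Server; the view and entity names are hypothetical:
// Assumed to already exist in the database (hypothetical names):
//   CREATE VIEW dbo.NamesWithRowNumber AS
//     SELECT [Name], ROW_NUMBER() OVER (ORDER BY [Name]) AS [RowNum]
//     FROM [dbo].[Names];
// With the view mapped as an entity, the modulo filter composes server-side:
var pageTops = db.NamesWithRowNumbers
    .Where(v => v.RowNum % pageSize == 1) // rows 1, pageSize+1, 2*pageSize+1, ...
    .Select(v => v.Name)
    .ToList();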
Related
I have the following code to perform a full-text search. It creates a query, gets the total number of rows returned by that query and then retrieves the actual rows for only the current page.
// Create IQueryable
var query = from a in ArticleServerContext.Set<Article>()
where a.Approved
orderby a.UtcDate descending
select a;
// Get total rows (needed for pagination logic)
int totalRows = query.Count();
// Get rows for current page
query = query.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage);
This works fine, but it requires two round trips to the database. In the interest of optimizing the code, is there any way to rework this query so it makes only one round trip to the database?
Yes, you can perform these two operations with a single query to the database:
// Create IQueryable
var query = from a in ArticleServerContext.Set<Article>()
where a.Approved
orderby a.UtcDate descending
select new { a, Total = ArticleServerContext.Set<Article>().Where(x => x.Approved).Count() };
//Get raw rows for current page with Total(Count) field
var result = query.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage).ToList();
//this data you actually will use with your logic
var actualData = result.Select(x => x.a).ToList();
// Get total rows (needed for pagination logic)
int totalRows = result.First().Total;
If you use MS SQL, the query will look like this:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[UtcDate] AS [UtcDate],
[Extent1].[Approved] AS [Approved],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Articles] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Articles] AS [Extent2]
WHERE [Extent2].[Approved] ) AS [GroupBy1]
WHERE [Extent1].[Approved]
ORDER BY [Extent1].[UtcDate] DESC
I'm not sure whether it's worth the trouble, but it's doable under the following constraints:
(1) CurrentPage and RowsPerPage are not affected by the totalRows value.
(2) The query is materialized after applying the paging parameters.
The trick is to group by a constant value, which is supported by EF. The code looks like this:
var query =
from a in ArticleServerContext.Set<Article>()
where a.Approved
// NOTE: order by goes below
group a by 1 into allRows
select new
{
TotalRows = allRows.Count(),
PageRows = allRows
.OrderByDescending(a => a.UtcDate)
.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage)
};
var result = query.FirstOrDefault();
var totalRows = result != null ? result.TotalRows : 0;
var pageRows = result != null ? result.PageRows : Enumerable.Empty<Article>();
I am using EF6 and I would like to get the records in a table whose IDs are in a given group of IDs.
In my test, for example, I am using 4 IDs.
I tried two options; the first is with Any.
dbContext.MyTable
.Where(x => myIDS.Any(y => y == x.MiID));
And the T-SQL that this LINQ expression generates is:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM (SELECT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[UnionAll1].[C1] AS [C1]
FROM (SELECT
cast(130 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
cast(139 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
UNION ALL
SELECT
cast(140 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
UNION ALL
SELECT
cast(141 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
WHERE [UnionAll3].[C1] = [Extent1].[MiID]
)
As can be seen, the T-SQL is a WHERE EXISTS that uses many subqueries and unions.
The second option is with Contains.
dbContext.MyTable
.Where(x => myIDS.Contains(x.MiID));
And the T-SQL:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))
Contains is translated into WHERE IN, and the query is much less complex.
I have read that Any tends to be faster, so I'm unsure whether Any, although its query looks more complex at first glance, is actually faster or not.
Thanks so much.
EDIT: I have done some testing (I don't know if this is the best way to test this).
System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Contains(x.MiID));
iq.ToArrayAsync();
}
miswContains.Stop();
System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Any(y => y == x.MiID));
iq.ToArrayAsync();
}
miswAny.Stop();
The results are that miswAny is about 850ms (for 20 iterations) and miswContains is about 4251ms (for 100 iterations). Going by the totals, the second option, with Contains, looks slower, although per iteration the two come out nearly identical (roughly 42ms each).
Your second option is the fastest solution I can think of (at least for arrays of ids that are not very large), provided MiTabla.MiID is indexed.
If you want to read more about IN clause performance: Is SQL IN bad for performance?
If you know the ID, then using the LINQ to SQL Count() method would produce much cleaner and faster SQL than both Any and Contains:
dbContext.MyTable
.Where(x => myIDS.Count(y => y == x.MiID) > 0);
The generated SQL for the count should look something like this:
DECLARE @p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = @p0
You can tell by the shape of the queries that Any is not scalable at all. It doesn't take many elements in myIDS (around 50, probably) to get a SQL exception saying that the maximum nesting level has been exceeded.
Contains is much better in this respect. It can handle a couple of thousand elements before its performance gets severely affected.
So I would go for the scalable solution, even though Any may be faster with small numbers. It is even possible to make Contains scale better, as the sketch below shows.
"I have read that Any tends to be faster."
In LINQ-to-objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, the generated SQL is what counts.
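A minimal sketch of the chunking idea, assuming myIDS is a List<long>; the chunk size of 1000 is an assumption, not a measured optimum:
// Split the id list into chunks so each generated IN (...) clause stays small.
const int chunkSize = 1000;
var results = new List<MyTable>();
for (int i = 0; i < myIDS.Count; i += chunkSize)
{
    var chunk = myIDS.Skip(i).Take(chunkSize).ToList();
    results.AddRange(dbContext.MyTable.Where(x => chunk.Contains(x.MiID)));
}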
For example, I have a table:
Date |Value
----------|-----
2015/10/01|5
2015/09/01|8
2015/08/01|10
Is there any way using Linq-to-SQL to get a new sequence which will be an arithmetic operation between consecutive elements in the previously ordered set (for example, i.Value - (i-1).Value)? It must be executed on SQL Server 2008 side, not application side.
For example dataContext.GetTable<X>().OrderByDescending(d => d.Date).Something(.......).ToArray(); should return 3, 2.
Is it possible?
You can try this:
var q = (
    from i in Items
    orderby i.ItemDate descending
    let prev = Items
        .Where(x => x.ItemDate < i.ItemDate)
        .OrderByDescending(x => x.ItemDate) // take the immediately preceding row, not an arbitrary earlier one
        .FirstOrDefault()
    select new { Value = i.ItemValue - (prev == null ? 0 : prev.ItemValue) }
    ).ToArray();
EDIT:
If you slightly modify the above LINQ query to:
var q = (from i in Items
         orderby i.ItemDate descending
         let prev = Items
             .Where(x => x.ItemDate < i.ItemDate)
             .OrderByDescending(x => x.ItemDate)
             .FirstOrDefault()
         select new { Value = (int?)i.ItemValue - prev.ItemValue }
        ).ToArray();
then you get the following T-SQL query sent to the database:
SELECT ([t0].[ItemValue]) - ((SELECT [t2].[ItemValue]
    FROM (SELECT TOP (1) [t1].[ItemValue]
          FROM [Items] AS [t1]
          WHERE [t1].[ItemDate] < [t0].[ItemDate]
          ORDER BY [t1].[ItemDate] DESC) AS [t2]
    )) AS [Value]
FROM [Items] AS [t0]
ORDER BY [t0].[ItemDate] DESC
My guess is that if you place an index on the ItemDate field, this shouldn't perform too badly.
I wouldn't let SQL do this; it would create an inefficient SQL query (I think).
I could create a stored procedure, but if the amount of data is not too big I can also use LINQ to Objects:
List<X> items = dataContext.GetTable<X>().OrderByDescending(d => d.Date).ToList(); // bring data into memory
var res = items.Skip(1).Zip(items, (cur, prev) => cur.Value - prev.Value);
At the end, I might use a foreach for readability.
I'm analyzing player data over millions of matches from an online game. I'm trying to page data into memory in chunks to reduce load times, but using OrderBy with Skip/Take takes way too long (20+ minutes, even for smaller queries).
This is my query:
var playerMatches = (from p in context.PlayerMatchEntities
join m in context.MatchEntities
on p.MatchId equals m.MatchId
where m.GameMode == (byte) gameMode
&& m.LobbyType == (byte) lobbyType
select p)
.OrderBy(p => p.MatchId)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.ToList();
MatchId is indexed.
Each match has 10 players, and I currently have 3.3 million matches w/ 33 million rows in the PlayerMatch table, but data is being collected constantly.
Is there a way to get around the large performance drop caused by OrderBy?
This post is similar but didn't seem to be resolved.
Edit:
This is the SQL query generated:
SELECT
`Project1`.`AccountId`,
`Project1`.`MatchId`,
`Project1`.`PlayerSlot`,
`Project1`.`HeroId`,
`Project1`.`Item_0`,
`Project1`.`Item_1`,
`Project1`.`Item_2`,
`Project1`.`Item_3`,
`Project1`.`Item_4`,
`Project1`.`Item_5`,
`Project1`.`Kills`,
`Project1`.`Deaths`,
`Project1`.`Assists`,
`Project1`.`LeaverStatus`,
`Project1`.`Gold`,
`Project1`.`GoldSpent`,
`Project1`.`LastHits`,
`Project1`.`Denies`,
`Project1`.`GoldPerMin`,
`Project1`.`XpPerMin`,
`Project1`.`Level`,
`Project1`.`HeroDamage`,
`Project1`.`TowerDamage`,
`Project1`.`HeroHealing`
FROM (SELECT
`Extent2`.`AccountId`,
`Extent2`.`MatchId`,
`Extent2`.`PlayerSlot`,
`Extent2`.`HeroId`,
`Extent2`.`Item_0`,
`Extent2`.`Item_1`,
`Extent2`.`Item_2`,
`Extent2`.`Item_3`,
`Extent2`.`Item_4`,
`Extent2`.`Item_5`,
`Extent2`.`Kills`,
`Extent2`.`Deaths`,
`Extent2`.`Assists`,
`Extent2`.`LeaverStatus`,
`Extent2`.`Gold`,
`Extent2`.`GoldSpent`,
`Extent2`.`LastHits`,
`Extent2`.`Denies`,
`Extent2`.`GoldPerMin`,
`Extent2`.`XpPerMin`,
`Extent2`.`Level`,
`Extent2`.`HeroDamage`,
`Extent2`.`TowerDamage`,
`Extent2`.`HeroHealing`
FROM `match` AS `Extent1` INNER JOIN `playermatch` AS `Extent2` ON `Extent1`.`MatchId` = `Extent2`.`MatchId`
WHERE ((`Extent1`.`GameMode`) = 2) AND ((`Extent1`.`LobbyType`) = 7)) AS `Project1`
ORDER BY
`Project1`.`MatchId` ASC LIMIT 0,1000
Another approach could be to have a VIEW that does the join and indexes the appropriate columns, and then create a table-valued function that uses the VIEW and returns a TABLE with only the page data.
You'll have to manually write the SQL query for the paging, but I think it would be faster; a sketch follows.
I haven't tried something like that, so I can't be sure there's going to be a big speed boost.
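A minimal sketch of that idea, assuming SQL Server syntax; the view, function, and entity type names are hypothetical:
// Assumed to already exist in the database (hypothetical names):
//   CREATE FUNCTION dbo.GetPlayerMatchPage(@gameMode tinyint, @lobbyType tinyint, @skip int, @take int)
//   RETURNS TABLE AS RETURN
//     SELECT * FROM dbo.PlayerMatchView -- the view doing the join
//     WHERE GameMode = @gameMode AND LobbyType = @lobbyType
//     ORDER BY MatchId
//     OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY;
// EF6 can call it with a raw query; the plain values become parameters @p0..@p3:
var playerMatches = context.Database.SqlQuery<PlayerMatchEntity>(
    "SELECT * FROM dbo.GetPlayerMatchPage(@p0, @p1, @p2, @p3)",
    (byte)gameMode, (byte)lobbyType, (page - 1) * pageSize, pageSize).ToList();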
You didn't include enough information for a definitive answer, so I'll make a suggestion.
One way to avoid ORDER BY is to store the rows in the table already in that order. I assume MatchId is the primary key and the clustered index of MatchEntities, which means MatchEntities rows are physically stored sorted by MatchId. If you switch the join streams to pull from the sorted stream first and the additive stream second, you avoid the expensive sort.
Like this:
var playerMatches = (from m in context.MatchEntities // note the switch: MatchEntities goes first
                     join p in context.PlayerMatchEntities
                     on m.MatchId equals p.MatchId // outer key first, so m.MatchId equals p.MatchId
                     where m.GameMode == (byte) gameMode
                     && m.LobbyType == (byte) lobbyType
                     select p)
                     // .OrderBy(p => p.MatchId) // no need for this any more
                     .Skip((page - 1) * pageSize)
                     .Take(pageSize)
                     .ToList();
Also, look at the query plan to find out how the query is executed by the database, what type of join is being used, and so on. Maybe your original query doesn't exploit the sort order at all.
I've seen multiple questions about this matter; however, they were two years old (or more), so I'd like to know if anything has changed.
The basic idea is to populate a gridview and create custom paging. So I need the results and the row count as well.
In SQL this would be something like:
SELECT COUNT(id), Id, Name... FROM ... WHERE ...
Getting everything in one nice simple query. However, I'd like to be consistent and use LINQ to Entities.
So far I'm using the approach with two queries (against SQL Server), because it just works. I would like to optimize it, though, and use a single query instead.
I've tried this:
var query = from o in _db.Products
select o;
var prods = from o in query
select new
{
Count = query.Count(),
Products = query
};
This produces a very nasty, long query with unnecessary cross joins and other stuff that I don't need or want.
Is there a way to get the paged results plus the count of all entities in one simple query? What is the recommended approach here?
UPDATE:
Just tried Future Queries and either I'm doing something wrong, or it actually executes two queries. This is what my SQL profiler shows:
-- Query #1
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID]
) AS [GroupBy1];
And the next row:
-- Query #1
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent1].[Price] AS [Price],
[Extent1].[CategoryID] AS [CategoryID]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID];
The C# code:
internal static List<Product> GetProducts(out int _count)
{
DatabaseEntities _db = new DatabaseEntities();
var query = from o in _db.Products
where o.CategoryID == 1
select o;
var count = query.FutureCount();
_count = count.Value;
return query.Future().ToList();
}
Did I miss something? According to my profiler it does exactly the same thing, apart from the added comment row in the query (-- Query #1).
Have a look at Future Queries to do this in EntityFramework.Extended. The second example on that linked page uses FutureCount() to do exactly what you want. Adapted here:
var q = db.Products.Where(p => ...);
var qCount = q.FutureCount();
var qPage = q.Skip((pageNumber-1)*pageSize).Take(pageSize).Future();
int total = qCount.Value; // Both queries are sent to the DB here.
var pageItems = qPage.ToList();
This 'EntityFramework.Extended' library is no longer supported; use Entity Framework Plus instead. See https://entityframework-plus.net/query-future for how to get the count and the records in the same query.
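For reference, a minimal sketch with Entity Framework Plus, reusing the Products query from the question's update; the exact API is documented at the link above:
using Z.EntityFramework.Plus; // NuGet package: Z.EntityFramework.Plus.EF6

var query = _db.Products.Where(p => p.CategoryID == 1);
var futureCount = query.DeferredCount().FutureValue(); // batched, not executed yet
var futureItems = query.Future();                      // batched, not executed yet
int totalRows = futureCount.Value;                     // one round trip executes both queries
List<Product> products = futureItems.ToList();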