Entity Framework/Linq to SQL: Skip & Take - c#

Just curious as to how Skip & Take are supposed to work. I'm getting the results I want to see on the client side, but when I hook up the AnjLab SQL Profiler and look at the SQL that is being executed it looks as though it is querying for and returning the entire set of rows to the client.
Is it really returning all the rows then sorting and narrowing down stuff with LINQ on the client side?
I've tried doing it with both Entity Framework and Linq to SQL; both appear to have the same behavior.
Not sure it makes any difference, but I'm using C# in VWD 2010.
Any insight?
public IEnumerable<Store> ListStores(Func<Store, string> sort, bool desc, int page, int pageSize, out int totalRecords)
{
    var context = new TectonicEntities();
    totalRecords = context.Stores.Count();
    int skipRows = (page - 1) * pageSize;
    if (desc)
        return context.Stores.OrderByDescending(sort).Skip(skipRows).Take(pageSize).ToList();
    return context.Stores.OrderBy(sort).Skip(skipRows).Take(pageSize).ToList();
}
Resulting SQL (Note: I'm excluding the Count query):
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent1].[LegalName] AS [LegalName],
[Extent1].[YearEstablished] AS [YearEstablished],
[Extent1].[DiskPath] AS [DiskPath],
[Extent1].[URL] AS [URL],
[Extent1].[SecureURL] AS [SecureURL],
[Extent1].[UseSSL] AS [UseSSL]
FROM [dbo].[tec_Stores] AS [Extent1]
After some further research, I found that the following works the way I would expect it to:
public IEnumerable<Store> ListStores(Func<Store, string> sort, bool desc, int page, int pageSize, out int totalRecords)
{
    var context = new TectonicEntities();
    totalRecords = context.Stores.Count();
    int skipRows = (page - 1) * pageSize;
    var qry = from s in context.Stores orderby s.Name ascending select s;
    return qry.Skip(skipRows).Take(pageSize);
}
Resulting SQL:
SELECT TOP (3)
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent1].[LegalName] AS [LegalName],
[Extent1].[YearEstablished] AS [YearEstablished],
[Extent1].[DiskPath] AS [DiskPath],
[Extent1].[URL] AS [URL],
[Extent1].[SecureURL] AS [SecureURL],
[Extent1].[UseSSL] AS [UseSSL]
FROM ( SELECT [Extent1].[ID] AS [ID], [Extent1].[Name] AS [Name], [Extent1].[LegalName] AS [LegalName], [Extent1].[YearEstablished] AS [YearEstablished], [Extent1].[DiskPath] AS [DiskPath], [Extent1].[URL] AS [URL], [Extent1].[SecureURL] AS [SecureURL], [Extent1].[UseSSL] AS [UseSSL], row_number() OVER (ORDER BY [Extent1].[Name] ASC) AS [row_number]
FROM [dbo].[tec_Stores] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 3
ORDER BY [Extent1].[Name] ASC
I really like the way the first option works; Passing in a lambda expression for sort. Is there any way to accomplish the same thing in the LINQ to SQL orderby syntax? I tried using qry.OrderBy(sort).Skip(skipRows).Take(pageSize), but that ended up giving me the same results as my first block of code. Leads me to believe my issues are somehow tied to OrderBy.
====================================
PROBLEM SOLVED
Had to wrap the incoming lambda function in Expression:
Expression<Func<Store,string>> sort

The following works and accomplishes the simplicity I was looking for:
public IEnumerable<Store> ListStores(Expression<Func<Store, string>> sort, bool desc, int page, int pageSize, out int totalRecords)
{
    List<Store> stores = new List<Store>();
    using (var context = new TectonicEntities())
    {
        totalRecords = context.Stores.Count();
        int skipRows = (page - 1) * pageSize;
        if (desc)
            stores = context.Stores.OrderByDescending(sort).Skip(skipRows).Take(pageSize).ToList();
        else
            stores = context.Stores.OrderBy(sort).Skip(skipRows).Take(pageSize).ToList();
    }
    return stores;
}
The main thing that fixed it for me was changing the Func sort parameter to:
Expression<Func<Store, string>> sort

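For example, a caller can still pass a plain lambda for the sort; the compiler converts it to an expression tree that LINQ to Entities can translate. A minimal usage sketch (the paging values are illustrative):
int totalRecords;
var storesPage = ListStores(s => s.Name, false, 2, 10, out totalRecords); // page 2, 10 rows per page, ordered by Name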
As long as you don't do it like queryable.ToList().Skip(5).Take(10), it won't return the whole recordset.
Take
Doing only Take(10).ToList() generates a SELECT TOP (10) * FROM ... query.
Skip
Skip works a bit differently because there is no LIMIT clause in T-SQL. Instead, it generates a SQL query based on the approach described in this ScottGu blog post.
If you see the whole recordset being returned, it is probably because you are calling ToList() somewhere too early.
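To make the contrast concrete, here is a minimal sketch of both shapes, reusing the TectonicEntities context and Stores set from the question (the 5/10 paging values are just examples):
// Paging on the IQueryable: Skip/Take are translated into the SQL, so only one page of rows comes back.
var page = context.Stores.OrderBy(s => s.Name).Skip(5).Take(10).ToList();
// Paging after ToList(): the entire table is materialized first and then paged in memory on the client.
var badPage = context.Stores.ToList().OrderBy(s => s.Name).Skip(5).Take(10).ToList();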

Entity Framework 6 solution here...
http://anthonychu.ca/post/entity-framework-parameterize-skip-take-queries-sql/
e.g.
using System.Data.Entity;
....
int skip = 5;
int take = 10;
myQuery.Skip(() => skip).Take(() => take);

I created simple extension:
public static IEnumerable<T> SelectPage<T, T2>(this IQueryable<T> list, Expression<Func<T, T2>> sortFunc, bool isDescending, int index, int length)
{
    // IQueryable + Expression keeps the ordering and paging on the database side.
    List<T> result = null;
    if (isDescending)
        result = list.OrderByDescending(sortFunc).Skip(index).Take(length).ToList();
    else
        result = list.OrderBy(sortFunc).Skip(index).Take(length).ToList();
    return result;
}
Simple use:
using (var context = new TransportContext())
{
    var drivers = (from x in context.Drivers where x.TransportId == trasnportId select x)
        .SelectPage(x => x.Id, false, index, length)
        .ToList();
}

If you are using SQL Server as the database, you can convert
context.Users.OrderBy(u => u.Id)
    .Skip(() => 10)
    .Take(() => 5)
    .ToList();
=>
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[UserName] AS [UserName]
FROM [dbo].[AspNetUsers] AS [Extent1]
ORDER BY [Extent1].[Id] ASC
OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY
Reference: https://anthonychu.ca/post/entity-framework-parameterize-skip-take-queries-sql/

Try this:
public IEnumerable<Store> ListStores(Expression<Func<Store, string>> sort, bool desc, int page, int pageSize, out int totalRecords)
{
    var context = new TectonicEntities();
    IQueryable<Store> results = context.Stores;
    totalRecords = results.Count();
    int skipRows = (page - 1) * pageSize;
    // Skip requires an ordered sequence, so apply an OrderBy in both directions.
    results = desc ? results.OrderByDescending(sort) : results.OrderBy(sort);
    return results.Skip(skipRows).Take(pageSize).ToList();
}
In truth, that last .ToList() isn't really necessary since you are returning IEnumerable...
There will be 2 database calls, one for the count and one when the ToList() is executed.

Related

I have an IQueryable with WHERE IN (ids). Can ToList() return the same order as ids?

My code:
var ids = new int[] { 16259,16238,16240,16243 };
var teamQuery = _teamRepository.Where(team => ids.Contains(team.ID));
var teams = teamQuery.ToList();
I have an IQueryable object teamQuery which generates the query:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM [dbo].[Team] AS [Extent1]
WHERE [Extent1].[ID] IN (16259, 16238, 16240, 16243)
Can I keep the same order in the generated List teams object during ToList() materialization as in the IN method (16259, 16238, 16240, 16243)?
Now the order of objects in teams is [16238, 16240, 16243, 16259]
This is the SQL that you want to try and approximate:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM [dbo].[Team] AS [Extent1]
WHERE [Extent1].[ID] IN (16259, 16238, 16240, 16243)
ORDER BY CASE [Extent1].ID
WHEN 16259 THEN 0
WHEN 16238 THEN 1
WHEN 16240 THEN 2
WHEN 16243 THEN 3
ELSE 4
END;
I wouldn't recommend this as a generic solution unless your list of options is constrained to being very small:
var ids = new [] { 16259, 16238, 16240, 16243 };
var firstId = ids.First();
var teamQuery = _teamRepository.Where(team => ids.Contains(team.ID))
    .OrderByDescending(t => t.ID == firstId);
foreach (var id in ids.Skip(1))
{
    teamQuery = teamQuery.ThenByDescending(t => t.ID == id);
}
var teams = teamQuery.ToList();
A more efficient solution is to sort the results after executing in the database by correlating against the original array indices:
var ids = new [] { 16259, 16238, 16240, 16243 };
var teamData = _teamRepository.Where(team => ids.Contains(team.ID)).ToList();
var teams = teamData.OrderBy(team => Array.IndexOf(ids, team.ID)).ToList();
To do this efficiently in pure SQL for a large list, you would first create the ordered list as a table reference and then order by the position in that list, but LINQ to SQL could not easily replicate that process unless you define a specific user-defined type in the schema.
You could also interpolate this type of query or construct it as raw SQL; that gives you more options but is less LINQ-ish.
Generally we can sort data in these sorts of custom ways with more efficient code in C#. There are very few benefits to sorting on the database unless the list is obscenely large.
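If you do go the raw SQL route mentioned above, a rough EF6-style sketch could look like this (the context variable is an assumption, the column list is trimmed to the two shown in the question, and the values are passed as positional parameters @p0..@p3):
var ids = new[] { 16259, 16238, 16240, 16243 };
var teams = context.Database.SqlQuery<Team>(
    @"SELECT [ID], [Name]
      FROM [dbo].[Team]
      WHERE [ID] IN (@p0, @p1, @p2, @p3)
      ORDER BY CASE [ID]
                   WHEN @p0 THEN 0
                   WHEN @p1 THEN 1
                   WHEN @p2 THEN 2
                   WHEN @p3 THEN 3
                   ELSE 4
               END",
    ids[0], ids[1], ids[2], ids[3]).ToList();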

Func Delegates cause LINQ-to-Entities to pull back the entire table

Passing a Func<> as a Where/Count filter causes LINQ to pull back the entire table. Here's a simple example.
pdx.Database.Log = strWriter1.Write;
totalCount = pdx.Principals.Count(x => x.PrincipalNumber.ToLower().Contains("42"));
Looking at the log I see
SELECT [GroupBy1].[A1] AS [C1] FROM ( SELECT COUNT(1) AS [A1]
FROM [Dealer].[Principal] AS [Extent1]
WHERE LOWER([Extent1].[PrincipalNumber]) LIKE N'%42%'
) AS [GroupBy1]
Did not pull back the full table. Simple enough. Now let's assign that lambda to a Func<>
pdx.Database.Log = strWriter2.Write;
Func<Principal, bool> filter = (x => x.PrincipalNumber.ToLower().Contains("42"));
totalCount = pdx.Principals.Count(filter);
The log shows it's pulling down the entire table.
SELECT
[Extent1].[PrincipalNumber] AS [PrincipalNumber],
[Extent1].[Id] AS [Id],
[Extent1].[CompanyName] AS [CompanyName],
...
[Extent1].[DistrictSalesManagerId] AS [DistrictSalesManagerId]
FROM [Dealer].[Principal] AS [Extent1]
That's pretty bad for performance. I have functions that do LINQ queries. I want to pass lambda filters to these functions so I can filter on various things, but apparently I can't pass lambdas as Func<>s because it will kill the performance. What are my alternatives?
What I want to do...
public IEnumerable<DealerInfo> GetMyPage(Func<Principal, bool> filter, int pageNumber, int pageSize, out int totalCount)
{
    List<DealerInfo> dealers;
    using (MyContext pdx = new MyContext())
    {
        totalCount = pdx.Principals.Count(filter);
        // More LINQ stuff here, but UGH the performance...
    }
}
You actually need to pass an Expression<Func<TSource, bool>>; LINQ to Entities cannot translate a Func<> to SQL. Change the signature to be like:
public IEnumerable<DealerInfo> GetMyPage(Expression<Func<Principal, bool>> filter, int pageNumber, int pageSize, out int totalCount)
{
    List<DealerInfo> dealers;
    using (MyContext pdx = new MyContext())
    {
        totalCount = pdx.Principals.Count(filter);
        // More LINQ stuff here, but UGH the performance...
    }
}
When you pass a Func<T, TResult> to Count, the call binds to the Count extension method on IEnumerable<T>, the in-memory interface, so the whole table is loaded into memory first and the delegate is then executed against the loaded data. Passing an Expression<Func<T, bool>> instead binds the call to the Count extension method on IQueryable<T>, letting the provider translate the predicate to SQL where possible, so the correct query executes on the server and only the result comes back.
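In other words, it is the static type of the argument that decides which Count overload the call binds to. A minimal sketch of the contrast, reusing the pdx context and Principals set from the question:
Expression<Func<Principal, bool>> expr = x => x.PrincipalNumber.ToLower().Contains("42");
Func<Principal, bool> func = x => x.PrincipalNumber.ToLower().Contains("42");
var serverSideCount = pdx.Principals.Count(expr); // binds to Queryable.Count and is translated to SQL
var clientSideCount = pdx.Principals.Count(func); // binds to Enumerable.Count and loads the whole table first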

Performing two queries in a single round trip to the database

I have the following code to perform a full-text search. It creates a query, gets the total number of rows returned by that query and then retrieves the actual rows for only the current page.
// Create IQueryable
var query = from a in ArticleServerContext.Set<Article>()
            where a.Approved
            orderby a.UtcDate descending
            select a;

// Get total rows (needed for pagination logic)
int totalRows = query.Count();

// Get rows for current page
query = query.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage);
This works fine, but it requires two round trips to the database. In the interest of optimizing the code, is there any way to rework this query so it only had one round trip to the database?
Yes, you can perform these two operations with a single query to the database:
// Create IQueryable
var query = from a in ArticleServerContext.Set<Article>()
            where a.Approved
            orderby a.UtcDate descending
            select new { a, Total = ArticleServerContext.Set<Article>().Where(x => x.Approved).Count() };

// Get the raw rows for the current page, each carrying the Total (count) field
var result = query.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage).ToList();

// This is the data you will actually use in your logic
var actualData = result.Select(x => x.a).ToList();

// Get total rows (needed for pagination logic)
int totalRows = result.First().Total;
If you use MSSQL, the query will look like this:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[UtcDate] AS [UtcDate],
[Extent1].[Approved] AS [Approved],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Articles] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Articles] AS [Extent2]
WHERE [Extent2].[Approved] ) AS [GroupBy1]
WHERE [Extent1].[Approved]
ORDER BY [Extent1].[UtcDate] DESC
I'm not sure whether it's worth it, but it's doable under the following constraints:
(1) CurrentPage and RowsPerPage are not affected by the totalRows value.
(2) The query is materialized after applying the paging parameters.
The trick is to group by a constant value, which is supported by EF. The code looks like this:
var query =
from a in ArticleServerContext.Set<Article>()
where a.Approved
// NOTE: order by goes below
group a by 1 into allRows
select new
{
TotalRows = allRows.Count(),
PageRows = allRows
.OrderByDescending(a => a.UtcDate)
.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage)
};
var result = query.FirstOrDefault();
var totalRows = result != null ? result.TotalRows : 0;
var pageRows = result != null ? result.PageRows : Enumerable.Empty<Article>();

Linq-to-SQL: arithmetic operation on consecutive elements

For example, I have a table:
Date |Value
----------|-----
2015/10/01|5
2015/09/01|8
2015/08/01|10
Is there any way using Linq-to-SQL to get a new sequence which will be an arithmetic operation between consecutive elements in the previously ordered set (for example, i.Value - (i-1).Value)? It must be executed on SQL Server 2008 side, not application side.
For example dataContext.GetTable<X>().OrderByDescending(d => d.Date).Something(.......).ToArray(); should return 3, 2.
Is it possible?
You can try this:
var q = (
from i in Items
orderby i.ItemDate descending
let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
select new { Value = i.ItemValue - (prev == null ? 0 : prev.ItemValue) }
).ToArray();
EDIT:
If you slightly modify the above linq query to:
var q = (from i in Items
orderby i.ItemDate descending
let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
select new { Value = (int?)i.ItemValue - prev.ItemValue }
).ToArray();
then you get the following TSQL query sent to the database:
SELECT ([t0].[ItemValue]) - ((SELECT [t2].[ItemValue]
FROM (SELECT TOP (1) [t1].[ItemValue]
FROM [Items] AS [t1]
WHERE [t1].[ItemDate] < [t0].[ItemDate]) AS [t2]
)) AS [Value]
FROM [Items] AS [t0]
ORDER BY [t0].[ItemDate] DESC
My guess is that if you place an index on the ItemDate field, this shouldn't perform too badly.
I wouldn't let SQL do this; it would create an inefficient SQL query (I think).
I could create a stored procedure, but if the amount of data is not too big I can also use LINQ to Objects:
List<X> items = dataContext.GetTable<X>().OrderByDescending(d => d.Date).ToList(); // bring the data into memory
var res = items.Skip(1).Zip(items, (cur, prev) => cur.Value - prev.Value);
At the end, I might use a foreach for readability.
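For what it's worth, the foreach version mentioned above could look something like this (a sketch that assumes the same ordered items list and that X has an int Value property):
var diffs = new List<int>();
X previous = null;
foreach (var current in items)
{
    if (previous != null)
        diffs.Add(current.Value - previous.Value); // e.g. 8 - 5 = 3, then 10 - 8 = 2
    previous = current;
}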

Force Entity Framework to use SQL parameterization for better SQL proc cache reuse

Entity Framework always seems to use constants in generated SQL for values provided to Skip() and Take().
In the ultra-simplified example below:
int x = 10;
int y = 10;
var stuff = context.Users
.OrderBy(u => u.Id)
.Skip(x)
.Take(y)
.Select(u => u.Id)
.ToList();
x = 20;
var stuff2 = context.Users
.OrderBy(u => u.Id)
.Skip(x)
.Take(y)
.Select(u => u.Id)
.ToList();
the above code generates the following SQL queries:
SELECT TOP (10)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[Id] ASC
SELECT TOP (10)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 20
ORDER BY [Extent1].[Id] ASC
Resulting in 2 Adhoc plans added to the SQL proc cache with 1 use each.
What I'd like to accomplish is to parameterize the Skip() and Take() logic so the following SQL queries are generated:
EXEC sp_executesql N'SELECT TOP (@p__linq__0)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > @p__linq__1
ORDER BY [Extent1].[Id] ASC',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=10,@p__linq__1=10
EXEC sp_executesql N'SELECT TOP (@p__linq__0)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > @p__linq__1
ORDER BY [Extent1].[Id] ASC',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=10,@p__linq__1=20
This results in 1 Prepared plan added to the SQL proc cache with 2 uses.
I have some fairly complex queries and am experiencing significant overhead (on the SQL Server side) on the first run, and much faster execution on subsequent runs (since it can use the plan cache). Note that these more advanced queries already use sp_executesql as other values are parameterized so I'm not concerned about that aspect.
The first set of queries generated above basically means any pagination logic will create a new entry in the plan cache for each page, bloating the cache and requiring the plan generation overhead to be incurred for each page.
Can I force Entity Framework to parameterize values? I've noticed for other values e.g. in Where clauses, sometimes it parameterizes values, and sometimes it uses constants.
Am I completely out to lunch? Is there any reason why Entity Framework's existing behavior is better than the behavior I desire?
Edit:
In case it's relevant, I should mention that I'm using Entity Framework 4.2.
Edit 2:
This question is not a duplicate of Entity Framework/Linq to SQL: Skip & Take, which merely asks how to ensure that Skip and Take execute in SQL instead of on the client. This question pertains to parameterizing these values.
Update: the Skip and Take extension methods that take lambda parameters, described below, are part of Entity Framework from version 6 onwards. You can take advantage of them by importing the System.Data.Entity namespace in your code.
In general LINQ to Entities translates constants as constants and variables passed to the query into parameters.
The problem is that the Queryable versions of Skip and Take accept simple integer parameters and not lambda expressions, therefore while LINQ to Entities can see the values you pass, it cannot see the fact that you used a variable to pass them (in other words, methods like Skip and Take don't have access to the method's closure).
This not only affects the parameterization in LINQ to Entities but also the learned expectation that if you pass a variable to a LINQ query the latest value of the variable is used every time you re-execute the query. E.g., something like this works for Where but not for Skip or Take:
var letter = "";
var q = db.Beattles.Where(p => p.Name.StartsWith(letter));
letter = "p";
var beattle1 = q.First(); // Returns Paul
letter = "j";
var beattle2 = q.First(); // Returns John
Note that the same peculiarity also affects ElementAt but this one is currently not supported by LINQ to Entities.
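By contrast, with the plain int-based Skip and Take the value is baked into the expression tree as a constant when the query is built, so later changes to the variable have no effect (a sketch against the same context.Users query):
int n = 10;
var q = context.Users.OrderBy(u => u.Id).Skip(n).Take(5);
n = 20;
var page = q.ToList(); // still skips 10, and the 10 shows up as a literal in the generated SQL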
Here is a trick that you can use to force the parameterization of Skip and Take and at the same time make them behave more like other query operators:
public static class PagingExtensions
{
    private static readonly MethodInfo SkipMethodInfo =
        typeof(Queryable).GetMethod("Skip");

    public static IQueryable<TSource> Skip<TSource>(
        this IQueryable<TSource> source,
        Expression<Func<int>> countAccessor)
    {
        return Parameterize(SkipMethodInfo, source, countAccessor);
    }

    private static readonly MethodInfo TakeMethodInfo =
        typeof(Queryable).GetMethod("Take");

    public static IQueryable<TSource> Take<TSource>(
        this IQueryable<TSource> source,
        Expression<Func<int>> countAccessor)
    {
        return Parameterize(TakeMethodInfo, source, countAccessor);
    }

    private static IQueryable<TSource> Parameterize<TSource, TParameter>(
        MethodInfo methodInfo,
        IQueryable<TSource> source,
        Expression<Func<TParameter>> parameterAccessor)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (parameterAccessor == null)
            throw new ArgumentNullException("parameterAccessor");

        return source.Provider.CreateQuery<TSource>(
            Expression.Call(
                null,
                methodInfo.MakeGenericMethod(new[] { typeof(TSource) }),
                new[] { source.Expression, parameterAccessor.Body }));
    }
}
The class above defines new overloads of Skip and Take that expect a lambda expression and can hence capture variables. Using the methods like this will result in the variables being translated to parameters by LINQ to Entities:
int x = 10;
int y = 10;
var query = context.Users.OrderBy(u => u.Id).Skip(() => x).Take(() => y);
var result1 = query.ToList();
x = 20;
var result2 = query.ToList();
Hope this helps.
The methods Skip and Top of ObjectQuery<T> can be parameterized. There is an example on MSDN.
I did a similar thing in a model of my own, and SQL Server Profiler showed the parts
SELECT TOP (@limit)
and
WHERE [Extent1].[row_number] > @skip
So, yes. It can be done. And I agree with others that this is a valuable observation you made here.
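For completeness, a rough sketch of that ObjectQuery-based approach, following the pattern of the MSDN example (the objectContext and Users names are illustrative):
var skipValue = 10;
var takeValue = 5;
var pagedQuery = objectContext.Users
    .Skip("it.Id", "@skip", new ObjectParameter("skip", skipValue))
    .Top("@take", new ObjectParameter("take", takeValue));
var page = pagedQuery.ToList();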
