How do Skip and Take work in LINQ - C#

I have the following LINQ query. It works well, but I am confused about how the Skip() and Take() functions work in LINQ.
Here is my query:
(from GRD in _tblAcademicYears.GetQueryable()
where GRD.SchoolID == intSchoolID
select new AcademicYearsModel
{
AcademicYearID = GRD.AcademicYearID,
SchoolID = GRD.SchoolID,
AcademicYearName = GRD.AcademicYearName,
AcademicYearStart = GRD.AcademicYearStart,
AcademicYearEnd = GRD.AcademicYearEnd,
AcademicYearRemarks = GRD.AcademicYearRemarks,
IsActive = GRD.IsActive,
CreatedOn = GRD.CreatedOn,
CreatedBy = GRD.CreatedBy,
ModifiedOn = GRD.ModifiedOn,
ModifiedBy = GRD.ModifiedBy
}
).Where(z => z.AcademicYearName.Contains(param.sSearch) || z.AcademicYearStart.ToString().Contains(param.sSearch)
|| z.AcademicYearEnd.ToString().Contains(param.sSearch) || z.AcademicYearRemarks.Contains(param.sSearch))
.Skip(param.iDisplayStart).Take(param.iDisplayLength).ToList();
How will this query get records from the database?
Will it fetch all records from the database and then apply Skip() and Take(),
or will it fetch only the records that fall within the limits of Skip() and Take()?

When you call only .Take, it translates to the SQL TOP N syntax.
When you call .Skip and .Take together, it generates a single statement with at least two nested SELECTs, using ROW_NUMBER() to filter out the skipped rows.
So the short answer to your question is: no, it will not get all records from the database. It runs SQL that filters and selects only the requested rows.
If you are curious, you can always use SQL Profiler or check the generated SQL in debug mode.
Here is a simple MSDN article that explains it:
https://msdn.microsoft.com/library/bb386988(v=vs.100).aspx

If you are asking about LINQ to SQL, you can run SQL Profiler to see the query generated by the LINQ provider.
But I can tell you that LINQ fetches only the records within the Skip and Take limits, using the ROW_NUMBER() function in SQL.
The query will look like this (skip 3 and take 3):
SELECT TOP (3)
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM (
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
row_number() OVER (ORDER BY [Extent1].[Name] ASC) AS [row_number]
FROM [dbo].[tec_Stores] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 3
ORDER BY [Extent1].[Name] ASC
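In LINQ to Entities it works differently depending on the collection type you use: on an IQueryable, Skip and Take are translated to SQL, while on an in-memory IEnumerable they run on the client.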

The source code of all the LINQ IEnumerable extension methods can be found here:
System.Linq.Enumerable
There you can see how Skip and Take work.
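To make the in-memory behaviour concrete, here is a minimal sketch of how Skip and Take can be written as lazy iterators. MySkip and MyTake are hypothetical names used only for illustration. The real System.Linq.Enumerable code is more involved (argument checks, fast paths for lists), so treat this as an approximation of the idea rather than the actual source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for Enumerable.Skip and Enumerable.Take.
public static class PagingSketch
{
    // Skips the first `count` items, then yields the rest one by one.
    public static IEnumerable<T> MySkip<T>(this IEnumerable<T> source, int count)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            // Advance past the skipped items without yielding them.
            while (count > 0 && e.MoveNext()) count--;
            if (count <= 0)
            {
                while (e.MoveNext()) yield return e.Current;
            }
        }
    }

    // Yields at most `count` items and then stops reading the source.
    public static IEnumerable<T> MyTake<T>(this IEnumerable<T> source, int count)
    {
        if (count <= 0) yield break;
        foreach (T item in source)
        {
            yield return item;
            if (--count == 0) yield break;
        }
    }
}
```

Calling new[] { 1, 2, 3, 4, 5, 6, 7 }.MySkip(3).MyTake(3) walks the source only as far as needed. On an IEnumerable this means every skipped row has already been loaded into memory, which is why keeping the query as an IQueryable matters for database paging.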

Related

Why is there no GroupBy clause in the internal SQL of an Entity Framework LINQ query?

In the Entity Framework documentation:
https://www.entityframeworktutorial.net/querying-entity-graph-in-entity-framework.aspx
in the section on GroupBy we can read that the following code:
using (var ctx = new SchoolDBEntities())
{
var students = from s in ctx.Students
group s by s.StandardId into studentsByStandard
select studentsByStandard;
foreach (var groupItem in students)
{
Console.WriteLine(groupItem.Key);
foreach (var stud in groupItem)
{
Console.WriteLine(stud.StudentId);
}
}
}
executes internally following SQL:
SELECT
[Project2].[C1] AS [C1],
[Project2].[StandardId] AS [StandardId],
[Project2].[C2] AS [C2],
[Project2].[StudentID] AS [StudentID],
[Project2].[StudentName] AS [StudentName],
[Project2].[StandardId1] AS [StandardId1]
FROM ( SELECT
[Distinct1].[StandardId] AS [StandardId],
1 AS [C1],
[Extent2].[StudentID] AS [StudentID],
[Extent2].[StudentName] AS [StudentName],
[Extent2].[StandardId] AS [StandardId1],
CASE WHEN ([Extent2].[StudentID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT DISTINCT
[Extent1].[StandardId] AS [StandardId]
FROM [dbo].[Student] AS [Extent1] ) AS [Distinct1]
LEFT OUTER JOIN [dbo].[Student] AS [Extent2] ON ([Distinct1].[StandardId] = [Extent2].[StandardId]) OR (([Distinct1].[StandardId] IS NULL) AND ([Extent2].[StandardId] IS NULL))
) AS [Project2]
ORDER BY [Project2].[StandardId] ASC, [Project2].[C2] ASC
go
Why is there no GROUP BY clause in the SQL? If no GROUP BY clause is needed, can’t we just use a simple SELECT with ORDER BY and without joins? Can anyone explain the above query?
The bottom line is: because SQL can't return nested result sets.
Every SQL SELECT statement returns a flat list of values. LINQ is capable of returning object graphs, i.e. objects with nested objects. That's exactly what LINQ's GroupBy does.
In SQL, a GROUP BY statement only returns the grouping columns and aggregate results:
SELECT StandardId, COUNT(*)
FROM Students
GROUP BY StandardId;
The rest of the student columns are gone.
A LINQ GroupBy statement returns something like
StandardId
    StudentId  StudentName
1
    21         "Student1"
    15         "Student2"
2
    48         "Student3"
    91         "Student4"
    17         "Student5"
Therefore, a SQL GROUP BY statement can never be the source for a LINQ GroupBy.
Entity Framework (6) knows this and it generates a SQL statement that pulls all required data from the database, adds some parts that make grouping easier, and it creates the groupings client-side.
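The difference between the two result shapes can be sketched in plain LINQ to Objects. The in-memory Students array below is a hypothetical stand-in for the table, reusing the sample values from the listing above. NestedGroups keeps every member row under its key, while FlatAggregates mimics what a SQL GROUP BY can return: one flat row per key with an aggregate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingSketch
{
    // Hypothetical in-memory rows standing in for the Students table.
    public static readonly (int StandardId, int StudentId, string Name)[] Students =
    {
        (1, 21, "Student1"), (1, 15, "Student2"),
        (2, 48, "Student3"), (2, 91, "Student4"), (2, 17, "Student5"),
    };

    // LINQ GroupBy: each group keeps its key AND all of its member rows.
    public static Dictionary<int, List<int>> NestedGroups() =>
        Students.GroupBy(s => s.StandardId)
                .ToDictionary(g => g.Key, g => g.Select(s => s.StudentId).ToList());

    // SQL-style GROUP BY: one flat row per key, only the key plus an aggregate.
    public static List<(int StandardId, int Count)> FlatAggregates() =>
        Students.GroupBy(s => s.StandardId)
                .Select(g => (StandardId: g.Key, Count: g.Count()))
                .ToList();
}
```

Only the second shape maps onto a single SQL GROUP BY. The first needs all student rows, which matches the answer's explanation of why no GROUP BY appears in the generated SQL.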

EntityFramework Group by not included in SQL statement

I'm trying to create a query similar to this:
select randomId
from myView
where ...
group by randomId
NOTE: EF doesn't support the distinct so I was thinking of going around the lack of it with the group by (or so I think)
randomId is numeric
Entity Framework V.6.0.2
This gives me the expected result, and the query runs in under 1 second.
When trying to do the same with EF I have been having some issues.
If I do the LINQ similar to this:
context.myView
.Where(...)
.GroupBy(mt => mt.randomId)
.Select(group => new { Id = group.Key, Count = group.Count() })
I get roughly the same result, but this forces a count and makes the query take more than 6 seconds.
The SQL EF generates is something like this:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [randomId],
[GroupBy1].[A1] AS [C2]
FROM (
SELECT
[Extent1].[randomId] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
GROUP BY [Extent1].[randomId]
) AS [GroupBy1]
But, if the query had the count commented out it would be back to < 1 second
If I change the Select to be like:
.Select(group => new { Id = group.Key })
I will get all of rows without the group by statement in the SQL query and no Distinct whatsoever:
SELECT
[Extent1].[anotherField] AS [anotherField], -- 'this field got included automatically on this query and I dont know why, it doesnt affect outcome when removed in SQL server'
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
Other failed attempts:
query.GroupBy(x => x.randomId).Select(group => group.FirstOrDefault());
The query that was generated is as follows:
SELECT
[Limit1].ALL FIELDS,...
FROM (SELECT
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...) ) AS [Project1]
OUTER APPLY (SELECT TOP (1)
[Extent2].ALL FIELDS,...
FROM [dbo].[myView] AS [Extent2]
WHERE (...) ) AS [Limit1] -- same as the where above
This query performed rather poorly and still managed to return all Ids for the where clause.
Does anyone have an idea on how to force the usage of the group by without an aggregating function like a count?
In SQL it works but then again I have the distinct keyword as well...
Cheers,
J
var query = from p in TableName
select new {Id = p.ColumnNameId};
var distinctItems = query.Distinct().ToList();
This is the LINQ query; you should be able to write an equivalent against an EF DbSet too. If you have issues, let me know.
Cheers!
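As a rough illustration of the project-then-Distinct shape suggested above, here is a sketch over in-memory rows. The tuple row type and its RandomId and AnotherField members are hypothetical stand-ins for the view. Against an EF IQueryable the same shape is expected to become a SELECT DISTINCT, but that is an assumption worth confirming with a profiler.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Project to the id first, then de-duplicate; no aggregate is involved.
    public static List<int> DistinctIds(
        IEnumerable<(int RandomId, string AnotherField)> rows) =>
        rows.Select(r => r.RandomId).Distinct().ToList();

    // The same set of keys obtained through grouping, keeping only each key.
    public static List<int> GroupKeys(
        IEnumerable<(int RandomId, string AnotherField)> rows) =>
        rows.GroupBy(r => r.RandomId).Select(g => g.Key).ToList();
}
```

Both methods return the same set of ids. The Distinct version states the intent directly and does not depend on how a provider chooses to translate a GroupBy that has no aggregate.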

How to translate this Queryable linq function

I'm struggling to get this LINQ query translated into correct T-SQL.
Please check the following statement:
// determine the max count of exams applied by students
IQueryable query = (from at in Database.Current.AnsweredTests
where at.TestId == id
group at by at.StudentId into s
select s.Count()).Max();
As you can see, this statement is syntactically wrong, because the Max extension returns an int. What I'm trying to accomplish is to generate correct T-SQL.
Something like this:
MAX(SELECT x.COUNT()
FROM...
GROUP BY StudentId)
I did this because I want good performance, and the current approach performs poorly. So my problem is: how can I write a correct LINQ statement with aggregate functions like MAX and COUNT?
UPDATE:
SELECT [GroupBy1].[A1] AS [C1]
FROM ( SELECT
[Extent1].[StudentId] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[AnsweredTests] AS [Extent1]
WHERE CAST( [Extent1].[TestId] AS int) = @p__linq__0
GROUP BY [Extent1].[StudentId]
) AS [GroupBy1]
This is what the IQueryable generates (if I remove the Max extension, of course). I would like to know whether there is a way to include the aggregate function MAX inside that T-SQL query to improve performance on the server side.
You could also word your query in the following way:
SELECT TOP 1 COUNT(*)
FROM AnsweredTests
WHERE TestId = @id
GROUP BY StudentId
ORDER BY COUNT(*) DESC
Following that logic, this (untested) should be what you are looking for:
var result = (from at in Database.Current.AnsweredTests
where at.TestId == id
group at by at.StudentId into s
orderby s.Count() descending
select s.Count()).First();
You can do ORDER BY DESCENDING and then take first:
var Max = (from at in Database.Current.AnsweredTests
where at.TestId == id
group at by at.StudentId into s
select new { Count = s.Count() }).OrderByDescending(o=>o.Count).First();
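For comparison, the original query shape also compiles once its result is assigned to an int instead of an IQueryable. The sketch below assumes a hypothetical AnsweredTest class with TestId and StudentId properties and is only exercised against an in-memory AsQueryable() source. Whether a given provider turns it into a single MAX over the grouped counts is an assumption to check in a profiler.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MaxCountSketch
{
    // Hypothetical row shape standing in for the AnsweredTests entity.
    public sealed class AnsweredTest
    {
        public int TestId { get; set; }
        public int StudentId { get; set; }
    }

    // Largest number of attempts any single student made on the given test.
    // Max() throws InvalidOperationException in LINQ to Objects when no row matches.
    public static int MaxAttempts(IQueryable<AnsweredTest> answeredTests, int id)
    {
        return (from at in answeredTests
                where at.TestId == id
                group at by at.StudentId into s
                select s.Count()).Max();
    }
}
```

The only change from the question's code is the declared result type: Max() ends the query and returns a scalar, so there is no IQueryable left to hold.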

Linq-to-entities, get results + row count in one query

I've seen multiple questions about this matter, however they were 2 years (or more) old, so I'd like to know if anything changed about this.
The basic idea is to populate a gridview and create custom paging. So, I need the results and row count as well.
In SQL this would be something like:
SELECT COUNT(id), Id, Name... FROM ... WHERE ...
Getting everything in a nice simple query. However, I'd like to be consistent and use Linq2Entities.
So far I'm using the approach with two queries (against sql server), because it just works. I would like to optimize it though and use a single query instead.
I've tried this:
var query = from o in _db.Products
select o;
var prods = from o in query
select new
{
Count = query.Count(),
Products = query
};
This produces a very nasty and long query with really unnecessary cross joins and other stuff which I don't really need or want.
Is there a way to get the paged results + count of all entities in a one simple query? What is the recommended approach here?
UPDATE:
Just tried FutureQueries and either I'm doing something wrong, or it actually executes two queries. This shows my sql profiler:
-- Query #1
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID]
) AS [GroupBy1];
And next row:
-- Query #1
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent1].[Price] AS [Price],
[Extent1].[CategoryID] AS [CategoryID]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID];
The C# code:
internal static List<Product> GetProducts(out int _count)
{
DatabaseEntities _db = new DatabaseEntities();
var query = from o in _db.Products
where o.CategoryID == 1
select o;
var count = query.FutureCount();
_count = count.Value;
return query.Future().ToList();
}
Did I miss something? According to my profiler it does exactly the same except that added row in the query (-- Query #1).
Have a look at Future Queries to do this in EntityFramework.Extended. The second example on that linked page uses FutureCount() to do exactly what you want. Adapted here:
var q = db.Products.Where(p => ...);
var qCount = q.FutureCount();
var qPage = q.Skip((pageNumber-1)*pageSize).Take(pageSize).Future();
int total = qCount.Value; // Both queries are sent to the DB here.
var tasks = qPage.ToList();
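The paging arithmetic used above, Skip((pageNumber-1)*pageSize).Take(pageSize), can be sketched without any Future library. GetPage is a hypothetical helper that issues the count and the page as two separate executions, which is the plain fallback the question already uses. It is shown here only to make the 1-based page math explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageSketch
{
    // Returns one 1-based page plus the total row count of the unpaged query.
    // The query is assumed to be ordered already; paging needs a stable order.
    public static (List<T> Items, int Total) GetPage<T>(
        IQueryable<T> orderedQuery, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int total = orderedQuery.Count();          // first execution
        List<T> items = orderedQuery
            .Skip((pageNumber - 1) * pageSize)     // rows that belong to earlier pages
            .Take(pageSize)                        // rows on the requested page
            .ToList();                             // second execution
        return (items, total);
    }
}
```

With 25 rows and a page size of 10, page 3 skips 20 rows and returns the remaining 5 together with a total of 25.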
The 'EntityFramework.Extended' library is no longer supported; use entityframework-plus instead.
See https://entityframework-plus.net/query-future for how to get the count and the records in the same query.

Force Entity Framework to use SQL parameterization for better SQL proc cache reuse

Entity Framework always seems to use constants in generated SQL for values provided to Skip() and Take().
In the ultra-simplified example below:
int x = 10;
int y = 10;
var stuff = context.Users
.OrderBy(u => u.Id)
.Skip(x)
.Take(y)
.Select(u => u.Id)
.ToList();
x = 20;
var stuff2 = context.Users
.OrderBy(u => u.Id)
.Skip(x)
.Take(y)
.Select(u => u.Id)
.ToList();
the above code generates the following SQL queries:
SELECT TOP (10)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[Id] ASC
SELECT TOP (10)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 20
ORDER BY [Extent1].[Id] ASC
Resulting in 2 Adhoc plans added to the SQL proc cache with 1 use each.
What I'd like to accomplish is to parameterize the Skip() and Take() logic so the following SQL queries are generated:
EXEC sp_executesql N'SELECT TOP (@p__linq__0)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > @p__linq__1
ORDER BY [Extent1].[Id] ASC',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=10,@p__linq__1=10
EXEC sp_executesql N'SELECT TOP (@p__linq__0)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > @p__linq__1
ORDER BY [Extent1].[Id] ASC',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=10,@p__linq__1=20
This results in 1 Prepared plan added to the SQL proc cache with 2 uses.
I have some fairly complex queries and am experiencing significant overhead (on the SQL Server side) on the first run, and much faster execution on subsequent runs (since it can use the plan cache). Note that these more advanced queries already use sp_executesql as other values are parameterized so I'm not concerned about that aspect.
The first set of queries generated above basically means any pagination logic will create a new entry in the plan cache for each page, bloating the cache and requiring the plan generation overhead to be incurred for each page.
Can I force Entity Framework to parameterize values? I've noticed for other values e.g. in Where clauses, sometimes it parameterizes values, and sometimes it uses constants.
Am I completely out to lunch? Is there any reason why Entity Framework's existing behavior is better than the behavior I desire?
Edit:
In case it's relevant, I should mention that I'm using Entity Framework 4.2.
Edit 2:
This question is not a duplicate of Entity Framework/Linq to SQL: Skip & Take, which merely asks how to ensure that Skip and Take execute in SQL instead of on the client. This question pertains to parameterizing these values.
Update: the Skip and Take extension methods that take lambda parameters described below are part of Entity Framework from version 6 and onwards. You can take advantage of them by importing the System.Data.Entity namespace in your code.
In general LINQ to Entities translates constants as constants and variables passed to the query into parameters.
The problem is that the Queryable versions of Skip and Take accept simple integer parameters and not lambda expressions, therefore while LINQ to Entities can see the values you pass, it cannot see the fact that you used a variable to pass them (in other words, methods like Skip and Take don't have access to the method's closure).
This not only affects the parameterization in LINQ to Entities but also the learned expectation that if you pass a variable to a LINQ query the latest value of the variable is used every time you re-execute the query. E.g., something like this works for Where but not for Skip or Take:
var letter = "";
var q = db.Beattles.Where(p => p.Name.StartsWith(letter));
letter = "p";
var beattle1 = q.First(); // Returns Paul
letter = "j";
var beattle2 = q.First(); // Returns John
Note that the same peculiarity also affects ElementAt but this one is currently not supported by LINQ to Entities.
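The difference described above can be observed directly in the expression trees, without any database. In this sketch (CaptureSketch and its method names are hypothetical), passing an int to Queryable.Skip stores the current value as a constant node, while a lambda that reads a local variable produces a member access on the compiler-generated closure object, so the variable itself stays visible to the provider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class CaptureSketch
{
    // Passing an int to Queryable.Skip bakes the current value into the tree.
    public static Expression SkipArgument(int count)
    {
        IQueryable<int> query = new int[0].AsQueryable().Skip(count);
        var call = (MethodCallExpression)query.Expression;
        return call.Arguments[1];   // the node that holds the count
    }

    // A lambda keeps a reference to the captured variable instead of its value.
    public static Expression LambdaBody()
    {
        int x = 10;
        Expression<Func<int>> accessor = () => x;
        return accessor.Body;       // field access on the closure object
    }
}
```

SkipArgument(10) yields a ConstantExpression, whereas LambdaBody() yields a node whose NodeType is ExpressionType.MemberAccess. That second form is what lets a provider treat the value as a parameter and re-read it on each execution.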
Here is a trick that you can use to force the parameterization of Skip and Take and at the same time make them behave more like other query operators:
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class PagingExtensions
{
private static readonly MethodInfo SkipMethodInfo =
typeof(Queryable).GetMethod("Skip");
public static IQueryable<TSource> Skip<TSource>(
this IQueryable<TSource> source,
Expression<Func<int>> countAccessor)
{
return Parameterize(SkipMethodInfo, source, countAccessor);
}
private static readonly MethodInfo TakeMethodInfo =
typeof(Queryable).GetMethod("Take");
public static IQueryable<TSource> Take<TSource>(
this IQueryable<TSource> source,
Expression<Func<int>> countAccessor)
{
return Parameterize(TakeMethodInfo, source, countAccessor);
}
private static IQueryable<TSource> Parameterize<TSource, TParameter>(
MethodInfo methodInfo,
IQueryable<TSource> source,
Expression<Func<TParameter>> parameterAccessor)
{
if (source == null)
throw new ArgumentNullException("source");
if (parameterAccessor == null)
throw new ArgumentNullException("parameterAccessor");
return source.Provider.CreateQuery<TSource>(
Expression.Call(
null,
methodInfo.MakeGenericMethod(new[] { typeof(TSource) }),
new[] { source.Expression, parameterAccessor.Body }));
}
}
The class above defines new overloads of Skip and Take that expect a lambda expression and can hence capture variables. Using the methods like this will result in the variables being translated to parameters by LINQ to Entities:
int x = 10;
int y = 10;
var query = context.Users.OrderBy(u => u.Id).Skip(() => x).Take(() => y);
var result1 = query.ToList();
x = 20;
var result2 = query.ToList();
Hope this helps.
The Skip and Top methods of ObjectQuery<T> can be parameterized. There is an example on MSDN.
I did a similar thing in a model of my own, and SQL Server Profiler showed the parts
SELECT TOP (@limit)
and
WHERE [Extent1].[row_number] > @skip
So, yes. It can be done. And I agree with others that this is a valuable observation you made here.
