Is there a way to prevent EF Core from making multiple DB round trips for a single enumeration call?
Consider this relatively simple LINQ expression:
var query2 = context.CheckinTablets.Select(ct => new
{
Id = ct.Id,
DeviceName = ct.Name,
Status = ct.CheckinTabletStatuses
.OrderByDescending(cts => cts.TimestampUtc).FirstOrDefault()
}).ToList();
In the past, the expectation was that "one enumeration call translates to one DB call" (if you disable lazy loading). In EF Core this is no longer the case!
In EF 6.2.0 this LINQ is translated to
SELECT [Extent1].[CheckinTabletID] AS [CheckinTabletID],
[Limit1].[TimestampUtc] AS [TimestampUtc]
--...
FROM [dbo].[CheckinTablet] AS [Extent1] OUTER APPLY (
SELECT TOP (1) [Project1].[CheckinTabletStatusID] AS [CheckinTabletStatusID],
[Project1].[CheckinTabletID] AS [CheckinTabletID],
[Project1].[TimestampUtc] AS [TimestampUtc]
FROM (
SELECT [Extent2].[CheckinTabletStatusID] AS [CheckinTabletStatusID],
[Extent2].[CheckinTabletID] AS [CheckinTabletID],
[Extent2].[TimestampUtc] AS [TimestampUtc]
--...
FROM [dbo].[CheckinTabletStatus] AS [Extent2]
WHERE [Extent1].[CheckinTabletID] = [Extent2].[CheckinTabletID]
) AS [Project1] ORDER BY [Project1].[TimestampUtc] DESC
) AS [Limit1];
While quite ugly, it followed POLA (the principle of least astonishment) quite nicely. Moreover, it was something we could work with to optimize on the DB side (indexes).
With EF Core 2.1.0 we get something like this:
SELECT [ct].[CheckinTabletID] AS [Id], [ct].[strName] AS [DeviceName] FROM [CheckinTablet] AS [ct]
exec sp_executesql N'SELECT TOP(1) [cts].[CheckinTabletStatusID], [cts].[CheckinTabletID], [cts].[TimestampUtc] FROM [CheckinTabletStatus] AS [cts] WHERE @_outer_Id = [cts].[CheckinTabletID] ORDER BY [cts].[TimestampUtc] DESC',N'@_outer_Id int',@_outer_Id=1
exec sp_executesql N'SELECT TOP(1) [cts].[CheckinTabletStatusID], [cts].[CheckinTabletID], [cts].[TimestampUtc] FROM [CheckinTabletStatus] AS [cts] WHERE @_outer_Id = [cts].[CheckinTabletID] ORDER BY [cts].[TimestampUtc] DESC',N'@_outer_Id int',@_outer_Id=2
exec sp_executesql N'SELECT TOP(1) [cts].[CheckinTabletStatusID], [cts].[CheckinTabletID], [cts].[TimestampUtc] FROM [CheckinTabletStatus] AS [cts] WHERE @_outer_Id = [cts].[CheckinTabletID] ORDER BY [cts].[TimestampUtc] DESC',N'@_outer_Id int',@_outer_Id=3
exec sp_executesql N'SELECT TOP(1) [cts].[CheckinTabletStatusID], [cts].[CheckinTabletID], [cts].[TimestampUtc] FROM [CheckinTabletStatus] AS [cts] WHERE @_outer_Id = [cts].[CheckinTabletID] ORDER BY [cts].[TimestampUtc] DESC',N'@_outer_Id int',@_outer_Id=4
exec sp_executesql N'SELECT TOP(1) [cts].[CheckinTabletStatusID], [cts].[CheckinTabletID], [cts].[TimestampUtc] FROM [CheckinTabletStatus] AS [cts] WHERE @_outer_Id = [cts].[CheckinTabletID] ORDER BY [cts].[TimestampUtc] DESC',N'@_outer_Id int',@_outer_Id=5
Yes, that is one call to first get all entities (CheckinTablets), and then one call per row to get the status for each entity...
So for a single ToList() call, Entity Framework makes n+1 calls to the database. This is extremely undesirable. Is there a way to disable this behaviour, or a workaround?
Edit 1:
.Include() is not helping the issue... It still makes n+1 DB requests.
Edit 2 (credit @jmdon):
Returning simple values instead of an object results in only one call! Of course this doesn't really help if you don't want to flatten your entity, or if you want multiple values from the second table. Nevertheless, good to know!
var query2 = _context.CheckinTablets.Select(ct => new
{
Id = ct.Id,
DeviceName = ct.Name,
Status = new CheckinTabletStatus
{
Id = ct.CheckinTabletStatuses.OrderByDescending(cts => cts.TimestampUtc).FirstOrDefault().Id,
CheckinTabletId = ct.CheckinTabletStatuses.OrderByDescending(cts => cts.TimestampUtc).FirstOrDefault().CheckinTabletId,
}
}).ToList();
Produces one call to DB:
SELECT [ct].[intCheckinTabletID] AS [Id0],
[ct].[strName] AS [DeviceName],
(
SELECT TOP (1) [cts].[intCheckinTabletStatusID]
FROM [tCheckinTabletStatus] AS [cts]
WHERE [ct].[intCheckinTabletID] = [cts].[intCheckinTabletID]
ORDER BY [cts].[dtmTimestampUtc] DESC
) AS [Id],
(
SELECT TOP (1) [cts0].[intCheckinTabletID]
FROM [tCheckinTabletStatus] AS [cts0]
WHERE [ct].[intCheckinTabletID] = [cts0].[intCheckinTabletID]
ORDER BY [cts0].[dtmTimestampUtc] DESC
) AS [CheckinTabletId]
FROM [tCheckinTablet] AS [ct];
I asked Diego Vega and Smit Patel this question during .NET Conf 2018... This was their answer (paraphrased).
EF Core is not only for relational DBs... Customers did not want to see an exception if something cannot be translated to SQL... "If it needs more than one query, that is fine"... By default, multiple queries per enumeration are enabled. There is a warning system that will output a warning if this happens. They are thinking about adding a method that will upgrade the warning to an exception if multiple round trips are executed. They are working on optimizing (n+1) queries to a few (fixed-size) queries based on data structure.
It is possible to force EF Core to throw an exception when it evaluates part of the query client-side by adding this to the OnConfiguring method.
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=EFQuerying;Trusted_Connection=True;")
        .ConfigureWarnings(warnings => warnings.Throw(RelationalEventId.QueryClientEvaluationWarning));
}
More info: https://learn.microsoft.com/en-us/ef/core/querying/client-eval
I've noticed it does that when you try to return nested objects.
You can try flattening the Status object in your projection, e.g. something like:
var query2 = context.CheckinTablets.Select(ct => new
{
Id = ct.Id,
DeviceName = ct.Name,
StatusName = ct.CheckinTabletStatuses
.OrderByDescending(cts => cts.TimestampUtc).FirstOrDefault().Name
}).ToList();
I'm trying to use the SQL operator CONTAINSTABLE to get a list of search results, like this:
SELECT c.*, ccontains.[RANK]
FROM Customers c
INNER JOIN CONTAINSTABLE(Customers, LastName, @searchTerm) ccontains ON c.Id = ccontains.[KEY]
And calling this function from EF Core 2.1:
var query = DbContext.Customers.FromSql("SELECT * FROM udfSearchCustomers(@searchTerm)",
    new SqlParameter("@searchTerm", mySearchTerm));
query = query.Include(c => c.Addresses).Take(maxResults);
I want to order my search results descending by RANK, to get the most relevant results at the top. Adding an ORDER BY ccontains.[RANK] to my function is not allowed, as my SELECT * FROM udfSearchCustomers(...) will be wrapped by EF Core: ORDER BY is not allowed on an inner query. Adding query.OrderBy(c => c.Rank) is not possible, as RANK is not on the Customer entity.
I've tried using System.Linq.Dynamic, as well as other reflection solutions, to do this:
query = query.OrderBy("Rank");
But I got an exception:
"Rank" is not a member of type "Customer"
which is true. Is there any way to order on a column not on an entity, or will I need to create a MyCustomerSearchQuery query object and use AutoMapper to convert those to Customer? I'd rather not, as Customer has many properties and keeping those in sync will be a hassle.
Thanks in advance!
You can try:
query = query.OrderBy(x => x.Rank);
OR
query = query.OrderBy(x => x["Rank"]);
You can create a stored procedure for the query that takes two parameters: @searchTerm and @orderByColumn.
CREATE PROCEDURE [dbo].[UdfSearchCustomers]
    @searchTerm varchar(50),
    @orderByColumn varchar(50)
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX);
    SET @sql =' SELECT c.*, ccontains.[RANK]
    FROM Customers c
    INNER JOIN CONTAINSTABLE(Customers, LastName, ''@searchTerm'') ccontains
    ON c.Id = ccontains.[KEY]
    ORDER BY @orderByColumn'
    SET @sql = REPLACE(@sql, '@orderByColumn', @orderByColumn)
    SET @sql = REPLACE(@sql, '@searchTerm', @searchTerm)
    exec sp_executesql @sql
END
GO
Then you can query the same stored procedure as:
var query = DbContext.Customers.FromSql("exec UdfSearchCustomers @p0, @p1", mySearchTerm, "Rank");
If you want to add join to the address table then you can add the join to the stored procedure. This may give you your desired result.
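One caveat with the procedure above: it splices @orderByColumn into dynamic SQL with REPLACE, so the value should probably be restricted before it is sent. A minimal sketch of such a check on the C# side, assuming a fixed set of sortable columns; the class name OrderByColumnGuard and the column list are hypothetical and not part of the original answer:

```csharp
using System;
using System.Collections.Generic;

public static class OrderByColumnGuard
{
    // Hypothetical whitelist; adjust to the columns the procedure may sort by.
    private static readonly HashSet<string> AllowedColumns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Rank", "LastName", "Id" };

    // Returns the column name unchanged if it is whitelisted, otherwise throws.
    public static string Validate(string column)
    {
        if (column == null || !AllowedColumns.Contains(column))
            throw new ArgumentException("Unsupported ORDER BY column.", nameof(column));
        return column;
    }
}
```

The call would then pass OrderByColumnGuard.Validate("Rank") instead of the raw string. The search term is concatenated the same way inside the procedure, so it would need its own handling there.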
I've been fooling around with some LINQ over Entities and I'm getting strange results and I would like to get an explanation...
Given the following LINQ query,
// Sample # 1
IEnumerable<GroupInformation> groupingInfo;
groupingInfo = from a in context.AccountingTransaction
group a by a.Type into grp
select new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
};
I get the following SQL query (taken from SQL Profiler):
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [Type],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[Type] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[AccountingTransaction] AS [Extent1]
GROUP BY [Extent1].[Type]
) AS [GroupBy1]
So far so good.
If I change my LINQ query to:
// Sample # 2
groupingInfo = context.AccountingTransaction.
GroupBy(a => a.Type).
Select(grp => new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
});
it yields the exact same SQL query. Makes sense to me.
Here comes the interesting part... If I change my LINQ query to:
// Sample # 3
IEnumerable<AccountingTransaction> accounts;
IEnumerable<IGrouping<object, AccountingTransaction>> groups;
IEnumerable<GroupInformation> groupingInfo;
accounts = context.AccountingTransaction;
groups = accounts.GroupBy(a => a.Type);
groupingInfo = groups.Select(grp => new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
});
the following SQL is executed (I stripped a few of the fields from the actual query, but all the fields from the table (~ 15 fields) were included in the query, twice):
SELECT
[Project2].[C1] AS [C1],
[Project2].[Type] AS [Type],
[Project2].[C2] AS [C2],
[Project2].[Id] AS [Id],
[Project2].[TimeStamp] AS [TimeStamp],
-- <snip>
FROM ( SELECT
[Distinct1].[Type] AS [Type],
1 AS [C1],
[Extent2].[Id] AS [Id],
[Extent2].[TimeStamp] AS [TimeStamp],
-- <snip>
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT DISTINCT
[Extent1].[Type] AS [Type]
FROM [dbo].[AccountingTransaction] AS [Extent1] ) AS [Distinct1]
LEFT OUTER JOIN [dbo].[AccountingTransaction] AS [Extent2] ON [Distinct1].[Type] = [Extent2].[Type]
) AS [Project2]
ORDER BY [Project2].[Type] ASC, [Project2].[C2] ASC
Why are the generated SQL queries so different? After all, the exact same code is executed; it's just that sample # 3 uses intermediate variables to get the same job done!
Also, if I do:
Console.WriteLine(groupingInfo.ToString());
for sample # 1 and sample # 2, I get the exact same query that was captured by SQL Profiler, but for sample # 3, I get:
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Linq.IGrouping`2[System.Object,TestLinq.AccountingTransaction],TestLinq.GroupInformation]
What is the difference? Why can't I get the SQL Query generated by LINQ if I split the LINQ query in multiple instructions?
The ultimate goal is to be able to add operators to the query (Where, OrderBy, etc.) at run-time.
BTW, I've seen this behavior in EF 4.0 and EF 6.0.
Thank you for your help.
The reason is that in your third attempt you refer to accounts as IEnumerable<AccountingTransaction>, which causes the query to be executed using LINQ to Objects (Enumerable.GroupBy and Enumerable.Select).
On the other hand, in your first and second attempts the reference to AccountingTransaction is preserved as IQueryable<AccountingTransaction> and the query will be executed using Linq-To-Entities which will then transform it to the appropriate SQL statement.
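To see the effect of the declared type without a database, here is a small sketch that uses an in-memory list wrapped with AsQueryable(). The Transaction class and the method names are made up for illustration; only the static type of the accounts variable differs between the two methods:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryableTypeDemo
{
    // Hypothetical stand-in for the AccountingTransaction entity.
    public sealed class Transaction
    {
        public string Type { get; set; } = "";
    }

    // In-memory data wrapped as IQueryable so no database is needed.
    private static IQueryable<Transaction> Source() =>
        new List<Transaction>
        {
            new Transaction { Type = "A" },
            new Transaction { Type = "A" },
            new Transaction { Type = "B" },
        }.AsQueryable();

    // Static type IQueryable<T>: GroupBy binds to Queryable.GroupBy, so the
    // result is still an IQueryable backed by an expression tree.
    public static bool GroupedViaQueryable()
    {
        IQueryable<Transaction> accounts = Source();
        var groups = accounts.GroupBy(a => a.Type);
        return groups is IQueryable;
    }

    // Static type IEnumerable<T>: GroupBy binds to Enumerable.GroupBy, so the
    // grouping happens in memory and the result is a plain iterator.
    public static bool GroupedViaEnumerable()
    {
        IEnumerable<Transaction> accounts = Source();
        var groups = accounts.GroupBy(a => a.Type);
        return groups is IQueryable;
    }
}
```

So for composing Where, OrderBy, etc. at run-time, declaring the intermediate variables as IQueryable<T> (or var) should keep the whole chain translatable.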
I'm trying to create a query similar to this:
select randomId
from myView
where ...
group by randomId
NOTE: EF doesn't support DISTINCT (or so I think), so I was thinking of working around that with GROUP BY.
randomId is numeric
Entity Framework V.6.0.2
This query gives me the expected result in under 1 second.
When trying to do the same with EF, I have been having some issues.
If I do the LINQ similar to this:
context.myView
.Where(...)
.GroupBy(mt => mt.randomId)
.Select(group => new { Id = group.Key, Count = group.Count() })
I get roughly the same result, but it forces a count and the query takes more than 6 seconds.
The SQL EF generates is something like this:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [randomId],
[GroupBy1].[A1] AS [C2]
FROM (
SELECT
[Extent1].[randomId] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
GROUP BY [Extent1].[randomId]
) AS [GroupBy1]
But if the count is commented out of that SQL query, it is back to under 1 second.
If I change the Select to be like:
.Select(group => new { Id = group.Key })
I get all of the rows, without the GROUP BY statement in the SQL query and no DISTINCT whatsoever:
SELECT
[Extent1].[anotherField] AS [anotherField], -- 'this field got included automatically on this query and I dont know why, it doesnt affect outcome when removed in SQL server'
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
Other failed attempts:
query.GroupBy(x => x.randomId).Select(group => group.FirstOrDefault());
The query that was generated is as follows:
SELECT
[Limit1].ALL FIELDS,...
FROM (SELECT
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...) AS [Project1]
OUTER APPLY (SELECT TOP (1)
[Extent2].ALL FIELDS,...
FROM [dbo].[myView] AS [Extent2]
WHERE (...) AS [Limit1] -- same as the where above
This query performed rather poorly and still managed to return all Ids for the where clause.
Does anyone have an idea on how to force the usage of the group by without an aggregating function like a count?
In SQL it works but then again I have the distinct keyword as well...
Cheers,
J
var query = from p in TableName
select new {Id = p.ColumnNameId};
var distinctItems = query.Distinct().ToList();
This is the LINQ query; you should be able to write an equivalent against an EF DbSet too. If you have issues, let me know.
Cheers!
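Applied to the question's shape, the idea is to project down to the key first and then call Distinct(). Below is a minimal in-memory sketch; the Row class and the DistinctIds method are hypothetical stand-ins for the myView entity, and a real EF query would start from the DbSet instead of a list. As far as I know, EF 6 translates this pattern to SELECT DISTINCT, but check the generated SQL on your own model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctIdsDemo
{
    // Hypothetical stand-in for a row of myView.
    public sealed class Row
    {
        public int RandomId { get; set; }
        public string AnotherField { get; set; } = "";
    }

    // Project to the key only, then de-duplicate. No grouping or Count() needed.
    public static List<int> DistinctIds(IQueryable<Row> rows) =>
        rows.Select(r => r.RandomId).Distinct().ToList();
}
```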
I've seen multiple questions about this matter, however they were 2 years (or more) old, so I'd like to know if anything changed about this.
The basic idea is to populate a gridview and create custom paging. So, I need the results and row count as well.
In SQL this would be something like:
SELECT COUNT(id), Id, Name... FROM ... WHERE ...
Getting everything in a nice simple query. However, I'd like to be consistent and use Linq2Entities.
So far I'm using the approach with two queries (against sql server), because it just works. I would like to optimize it though and use a single query instead.
I've tried this:
var query = from o in _db.Products
select o;
var prods = from o in query
select new
{
Count = query.Count(),
Products = query
};
This produces a very nasty and long query with unnecessary cross joins and other stuff which I don't need or want.
Is there a way to get the paged results + the count of all entities in one simple query? What is the recommended approach here?
UPDATE:
Just tried FutureQueries and either I'm doing something wrong, or it actually executes two queries. This shows my sql profiler:
-- Query #1
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID]
) AS [GroupBy1];
And next row:
-- Query #1
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent1].[Price] AS [Price],
[Extent1].[CategoryID] AS [CategoryID]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID];
The C# code:
internal static List<Product> GetProducts(out int _count)
{
DatabaseEntities _db = new DatabaseEntities();
var query = from o in _db.Products
where o.CategoryID == 1
select o;
var count = query.FutureCount();
_count = count.Value;
return query.Future().ToList();
}
Did I miss something? According to my profiler it does exactly the same except that added row in the query (-- Query #1).
Have a look at Future Queries to do this in EntityFramework.Extended. The second example on that linked page uses FutureCount() to do exactly what you want. Adapted here:
var q = db.Products.Where(p => ...);
var qCount = q.FutureCount();
var qPage = q.Skip((pageNumber-1)*pageSize).Take(pageSize).Future();
int total = qCount.Value; // Both queries are sent to the DB here.
var tasks = qPage.ToList();
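For comparison, the page arithmetic itself can be sketched with plain LINQ and no Future calls. This is the two-query version (Count() and the page are executed separately against a real provider), which is what the Future variant above batches together. PagingDemo and GetPage are hypothetical names, and with EF the query passed in would need an OrderBy before Skip:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Returns the total row count plus one page of results.
    // pageNumber is 1-based, matching (pageNumber-1)*pageSize in the answer.
    public static (int Total, List<T> Page) GetPage<T>(
        IQueryable<T> query, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int total = query.Count();                      // first query
        List<T> page = query
            .Skip((pageNumber - 1) * pageSize)          // rows on earlier pages
            .Take(pageSize)
            .ToList();                                  // second query
        return (total, page);
    }
}
```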
The 'EntityFramework.Extended' library is no longer supported. Use entityframework-plus instead; see https://entityframework-plus.net/query-future for how to get the count and the records in the same query.
Entity Framework always seems to use constants in generated SQL for values provided to Skip() and Take().
In the ultra-simplified example below:
int x = 10;
int y = 10;
var stuff = context.Users
.OrderBy(u => u.Id)
.Skip(x)
.Take(y)
.Select(u => u.Id)
.ToList();
x = 20;
var stuff2 = context.Users
.OrderBy(u => u.Id)
.Skip(x)
.Take(y)
.Select(u => u.Id)
.ToList();
the above code generates the following SQL queries:
SELECT TOP (10)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[Id] ASC
SELECT TOP (10)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 20
ORDER BY [Extent1].[Id] ASC
Resulting in 2 Adhoc plans added to the SQL proc cache with 1 use each.
What I'd like to accomplish is to parameterize the Skip() and Take() logic so the following SQL queries are generated:
EXEC sp_executesql N'SELECT TOP (@p__linq__0)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > @p__linq__1
ORDER BY [Extent1].[Id] ASC',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=10,@p__linq__1=10
EXEC sp_executesql N'SELECT TOP (@p__linq__0)
[Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id] AS [Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[User] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > @p__linq__1
ORDER BY [Extent1].[Id] ASC',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=10,@p__linq__1=20
This results in 1 Prepared plan added to the SQL proc cache with 2 uses.
I have some fairly complex queries and am experiencing significant overhead (on the SQL Server side) on the first run, and much faster execution on subsequent runs (since it can use the plan cache). Note that these more advanced queries already use sp_executesql as other values are parameterized so I'm not concerned about that aspect.
The first set of queries generated above basically means any pagination logic will create a new entry in the plan cache for each page, bloating the cache and requiring the plan generation overhead to be incurred for each page.
Can I force Entity Framework to parameterize values? I've noticed for other values e.g. in Where clauses, sometimes it parameterizes values, and sometimes it uses constants.
Am I completely out to lunch? Is there any reason why Entity Framework's existing behavior is better than the behavior I desire?
Edit:
In case it's relevant, I should mention that I'm using Entity Framework 4.2.
Edit 2:
This question is not a duplicate of Entity Framework/Linq to SQL: Skip & Take, which merely asks how to ensure that Skip and Take execute in SQL instead of on the client. This question pertains to parameterizing these values.
Update: the Skip and Take extension methods that take lambda parameters described below are part of Entity Framework from version 6 and onwards. You can take advantage of them by importing the System.Data.Entity namespace in your code.
In general LINQ to Entities translates constants as constants and variables passed to the query into parameters.
The problem is that the Queryable versions of Skip and Take accept simple integer parameters and not lambda expressions, therefore while LINQ to Entities can see the values you pass, it cannot see the fact that you used a variable to pass them (in other words, methods like Skip and Take don't have access to the method's closure).
This not only affects the parameterization in LINQ to Entities but also the learned expectation that if you pass a variable to a LINQ query the latest value of the variable is used every time you re-execute the query. E.g., something like this works for Where but not for Skip or Take:
var letter = "";
var q = db.Beattles.Where(p => p.Name.StartsWith(letter));
letter = "p";
var beattle1 = q.First(); // Returns Paul
letter = "j";
var beattle2 = q.First(); // Returns John
Note that the same peculiarity also affects ElementAt but this one is currently not supported by LINQ to Entities.
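The difference is visible in the expression tree itself. The sketch below uses an in-memory IQueryable (no EF involved) and inspects what Queryable.Skip and Queryable.Where record for the same local variable; the class and method names are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class SkipConstantDemo
{
    // Node type of the count argument recorded by Queryable.Skip(int).
    public static ExpressionType SkipArgumentNodeType()
    {
        int x = 10;
        IQueryable<int> query = Enumerable.Range(0, 100).AsQueryable().Skip(x);
        var call = (MethodCallExpression)query.Expression;
        return call.Arguments[1].NodeType;
    }

    // Node type of the right-hand side of "n > x" inside a Where lambda.
    public static ExpressionType WhereOperandNodeType()
    {
        int x = 10;
        IQueryable<int> query = Enumerable.Range(0, 100).AsQueryable().Where(n => n > x);
        var call = (MethodCallExpression)query.Expression;
        var quote = (UnaryExpression)call.Arguments[1];   // the lambda is quoted
        var lambda = (LambdaExpression)quote.Operand;
        var comparison = (BinaryExpression)lambda.Body;
        return comparison.Right.NodeType;
    }
}
```

Skip only receives the value, so the tree holds a Constant node. The Where lambda holds a MemberAccess node on the closure, which is what a provider can turn into a parameter.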
Here is a trick that you can use to force the parameterization of Skip and Take and at the same time make them behave more like other query operators:
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class PagingExtensions
{
private static readonly MethodInfo SkipMethodInfo =
typeof(Queryable).GetMethod("Skip");
public static IQueryable<TSource> Skip<TSource>(
this IQueryable<TSource> source,
Expression<Func<int>> countAccessor)
{
return Parameterize(SkipMethodInfo, source, countAccessor);
}
private static readonly MethodInfo TakeMethodInfo =
typeof(Queryable).GetMethod("Take");
public static IQueryable<TSource> Take<TSource>(
this IQueryable<TSource> source,
Expression<Func<int>> countAccessor)
{
return Parameterize(TakeMethodInfo, source, countAccessor);
}
private static IQueryable<TSource> Parameterize<TSource, TParameter>(
MethodInfo methodInfo,
IQueryable<TSource> source,
Expression<Func<TParameter>> parameterAccessor)
{
if (source == null)
throw new ArgumentNullException("source");
if (parameterAccessor == null)
throw new ArgumentNullException("parameterAccessor");
return source.Provider.CreateQuery<TSource>(
Expression.Call(
null,
methodInfo.MakeGenericMethod(new[] { typeof(TSource) }),
new[] { source.Expression, parameterAccessor.Body }));
}
}
The class above defines new overloads of Skip and Take that expect a lambda expression and can hence capture variables. Using the methods like this will result in the variables being translated to parameters by LINQ to Entities:
int x = 10;
int y = 10;
var query = context.Users.OrderBy(u => u.Id).Skip(() => x).Take(() => y);
var result1 = query.ToList();
x = 20;
var result2 = query.ToList();
Hope this helps.
The methods Skip and Top of ObjectQuery<T> can be parametrized. There is an example at MSDN.
I did a similar thing in a model of my own and sql server profiler showed the parts
SELECT TOP (@limit)
and
WHERE [Extent1].[row_number] > @skip
So, yes. It can be done. And I agree with others that this is a valuable observation you made here.