I have a Select statement that is currently formatted like
dbEntity
    .GroupBy(x => x.date)
    .Select(groupedDate => new {
        Calculation1 = doCalculation1 ? groupedDate.Sum(x => x.Column1) : 0,
        Calculation2 = doCalculation2 ? groupedDate.Count() : 0
    });
In the query, doCalculation1 and doCalculation2 are bools that are set earlier. This produces a CASE statement in the generated SQL, like
DECLARE @p1 int = 1
DECLARE @p2 int = 0
DECLARE @p3 int = 1
DECLARE @p4 int = 0

SELECT (CASE WHEN @p1 = 1 THEN SUM(dbEntity.Column1)
        ELSE @p2
        END) AS Calculation1,
       (CASE WHEN @p3 = 1 THEN COUNT(*)
        ELSE @p4
        END) AS Calculation2
What I want is for the generated SQL to look like this when doCalculation1 is true
SELECT SUM(Column1) as Calculation1, Count(*) as Calculation2
and like this when doCalculation1 is false
SELECT 0 as Calculation1, Count(*) as Calculation2
Is there any way to force a query through EF to act like this?
Edit:
bool doCalculation = true;
bool doCalculation2 = false;

dbEntity
    .Where(x => x.FundType == "E")
    .GroupBy(x => x.ReportDate)
    .Select(dateGroup => new
    {
        ReportDate = dateGroup.Key,
        CountInFlows = doCalculation2 ? dateGroup.Count(x => x.Flow > 0) : 0,
        NetAssetEnd = doCalculation ? dateGroup.Sum(x => x.AssetsEnd) : 0
    })
    .ToList();
generates this sql
-- Region Parameters
DECLARE @p0 VarChar(1000) = 'E'
DECLARE @p1 Int = 0
DECLARE @p2 Decimal(5,4) = 0
DECLARE @p3 Int = 0
DECLARE @p4 Int = 1
DECLARE @p5 Decimal(1,0) = 0
-- EndRegion
SELECT [t1].[ReportDate],
    (CASE
        WHEN @p1 = 1 THEN (
            SELECT COUNT(*)
            FROM [dbEntity] AS [t2]
            WHERE ([t2].[Flow] > @p2) AND ([t1].[ReportDate] = [t2].[ReportDate]) AND ([t2].[FundType] = @p0)
            )
        ELSE @p3
     END) AS [CountInFlows],
    (CASE
        WHEN @p4 = 1 THEN CONVERT(Decimal(33,4),[t1].[value])
        ELSE CONVERT(Decimal(33,4),@p5)
     END) AS [NetAssetEnd]
FROM (
    SELECT SUM([t0].[AssetsEnd]) AS [value], [t0].[ReportDate]
    FROM [dbEntity] AS [t0]
    WHERE [t0].[FundType] = @p0
    GROUP BY [t0].[ReportDate]
    ) AS [t1]
which has many index scans and a spool and a join in the execution plan. It also takes about 20 seconds on average to run on the test set, with the production set going to be much larger.
I want it to run in the same speed as sql like
select reportdate, 1, sum(AssetsEnd)
from vwDailyFundFlowDetail
where fundtype = 'E'
group by reportdate
which runs in about 12 seconds on average, with the majority of the query tied up in a single index seek in the execution plan. The actual SQL output doesn't matter, but the performance appears to be much worse with the CASE statements.
As for why I am doing this: I need to generate dynamic Select statements, as I asked in Dynamically generate Linq Select. A user may select one or more of a set of calculations to perform, and I will not know what is selected until the request comes in. The requests are expensive, so we do not want to run them unless they are necessary. I am setting the doCalculation bools based on the user request.
This query is supposed to replace some code that inserts or deletes characters from a hardcoded SQL query stored as a string, which is then executed. That runs fairly fast but is a nightmare to maintain.
It would technically be possible to pass the Expression in your Select query through an expression tree visitor, which checks for constant values on the left-hand side of ternary operators, and replaces the ternary expression with the appropriate sub-expression.
For example:
using System.Linq.Expressions;
using System.Reflection;

public class Simplifier : ExpressionVisitor
{
    public static Expression<T> Simplify<T>(Expression<T> expr)
    {
        return (Expression<T>) new Simplifier().Visit(expr);
    }

    protected override Expression VisitConditional(ConditionalExpression node)
    {
        var test = Visit(node.Test);
        var ifTrue = Visit(node.IfTrue);
        var ifFalse = Visit(node.IfFalse);

        var testConst = test as ConstantExpression;
        if (testConst != null)
        {
            var value = (bool) testConst.Value;
            return value ? ifTrue : ifFalse;
        }
        return Expression.Condition(test, ifTrue, ifFalse);
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Closed-over variables are represented as field accesses to fields
        // on a compiler-generated closure object, which appears as a constant.
        var field = node.Member as FieldInfo;
        var closure = node.Expression as ConstantExpression;
        if (field != null && closure != null)
        {
            var value = field.GetValue(closure.Value);
            return VisitConstant(Expression.Constant(value));
        }
        return base.VisitMember(node);
    }
}
Usage example:
void Main()
{
    var b = true;
    Expression<Func<int, object>> expr = i => b ? i.ToString() : "N/A";
    Console.WriteLine(expr.ToString()); // i => IIF(value(UserQuery+<>c__DisplayClass0).b, i.ToString(), "N/A")
    Console.WriteLine(Simplifier.Simplify(expr).ToString()); // i => i.ToString()
    b = false;
    Console.WriteLine(Simplifier.Simplify(expr).ToString()); // i => "N/A"
}
So, you could use this in your code something like this:
Expression<Func<IGrouping<DateTime, MyEntity>, ClassYouWantToReturn>> select =
    groupedDate => new ClassYouWantToReturn {
        Calculation1 = doCalculation1 ? groupedDate.Sum(x => x.Column1) : 0,
        Calculation2 = doCalculation2 ? groupedDate.Count() : 0
    };

var q = dbEntity
    .GroupBy(x => x.date)
    .Select(Simplifier.Simplify(select));
However, this is probably more trouble than it's worth. SQL Server will almost undoubtedly optimize the "1 == 1" case away, and allowing Entity Framework to produce the less-pretty query shouldn't prove to be a performance problem.
Update
Looking at the updated question, this appears to be one of the few instances where producing the right query really does matter, performance-wise.
Besides my suggested solution, there are a few other choices: you could use raw sql to map to your return type, or you could use LinqKit to choose a different expression based on what you want, and then "Invoke" that expression inside your Select query.
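A rough sketch of that LinqKit approach, using the entity shape from the question (AsExpandable and Invoke are LinqKit extension methods; the exact expression types here are assumptions):

```csharp
// Pick the whole sub-expression up front, based on the flag...
Expression<Func<IGrouping<DateTime, MyEntity>, decimal>> sumExpr;
if (doCalculation)
    sumExpr = g => g.Sum(x => x.AssetsEnd);
else
    sumExpr = g => 0m;

var q = dbEntity
    .AsExpandable()                      // LinqKit: expands Invoke calls before SQL translation
    .GroupBy(x => x.ReportDate)
    .Select(g => new
    {
        ReportDate = g.Key,
        // ...so the provider only ever sees SUM(...) or the constant, never a CASE
        NetAssetEnd = sumExpr.Invoke(g)
    })
    .ToList();
```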
Related
Before anyone jumps in to mark this as a duplicate: I have looked, and everyone else is doing something slightly more complicated than what I am trying.
So I'm working on a database where there's a lot of data to check and LINQ's Any() extension translated to SQL isn't as fast as SQL's Count(1) > 0, so everywhere I'm writing:
var someBool = Ctx.SomeEntities.Count(x => x.RelatedEntity.Count(y => y.SomeProperty == SomeValue) > 0) > 0;
In Pseudo: Does any of my entities have a relationship with some other entity that has a property with a value of SomeValue.
This works fine and it works fast. However, it's not exactly readable (and I have lots of them, more embedded than that in cases) so what I'd like to do is replace it with:
var someBool = Ctx.SomeEntities.AnyX(x => x.RelatedEntity.AnyX(y => y.SomeProperty == SomeValue));
with:
public static bool AnyX<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) => source.Count(predicate) > 0;
So you see I'm not doing anything that LINQ can't translate to SQL, I'm not doing anything that LINQ doesn't already translate to SQL, but just by creating an additional extension I get:
LINQ to Entities does not recognize the method Boolean AnyX etc...
There must be some way of writing my extension, or some way of telling LINQ to just look at the code and see that it can translate it.
Not an answer to your specific question, but I suggest you rethink how you're approaching the query.
Let's use some descriptive names that make it easier to understand: do any households have a resident with the first name of "Bobby"?
// your way
Ctx.Households.Count( hh => hh.Residents.Count( r => r.FirstName == "Bobby" ) > 0 ) > 0
Yuck, it's backwards. Start with residents:
Ctx.Residents.Count( r =>
r.FirstName == "Bobby"
&& r.Household != null ) // if needed
> 0;
Now, will that generate SQL significantly different than the below?
Ctx.Residents.Any( r => r.FirstName == "Bobby" && r.Household != null)
edit:
Here's a true MCVE that results in the opposite of your conclusion:
/*
create table TestDatum
(
TestValue nchar(36) not null
)
*/
/*
set nocount on
declare @count int
declare @start datetime
declare @end datetime

set @count = 0
set @start = GETDATE()
while @count < 14000000
begin
    insert TestDatum values( CONVERT(nchar(36), NEWID()) )
    set @count = @count + 1
    if (@count % 100000) = 0
    begin
        print convert(nvarchar, @count)
    end
end
set @end = GETDATE()
select CONVERT(nvarchar, DATEDIFF(ms, @start, @end))
*/
/*
-- "Any" test
declare @startdt datetime, @enddt datetime
set @startdt = GETDATE()

DECLARE @p0 NVarChar(1000) = '%abcdef%'

SELECT
    (CASE
        WHEN EXISTS(
            SELECT NULL AS [EMPTY]
            FROM TestDatum AS [t0]
            WHERE [t0].TestValue LIKE @p0
            ) THEN 1
        ELSE 0
     END) AS [value]

set @enddt = GETDATE()
select DATEDIFF(ms, @startdt, @enddt) -- ~7000ms
*/
/*
-- "Count" test
declare @startdt datetime, @enddt datetime
set @startdt = GETDATE()
-- Region Parameters
DECLARE @p0 NVarChar(1000) = '%abcdef%'
-- EndRegion
SELECT COUNT(*) AS [value]
FROM TestDatum AS [t0]
WHERE [t0].TestValue LIKE @p0
set @enddt = GETDATE()
select DATEDIFF(ms, @startdt, @enddt) -- > 48000ms
*/
For example, I have a table:
Date |Value
----------|-----
2015/10/01|5
2015/09/01|8
2015/08/01|10
Is there any way using Linq-to-SQL to get a new sequence which will be an arithmetic operation between consecutive elements in the previously ordered set (for example, i.Value - (i-1).Value)? It must be executed on SQL Server 2008 side, not application side.
For example dataContext.GetTable<X>().OrderByDescending(d => d.Date).Something(.......).ToArray(); should return 3, 2.
Is it possible?
You can try this:
var q = (
from i in Items
orderby i.ItemDate descending
let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
select new { Value = i.ItemValue - (prev == null ? 0 : prev.ItemValue) }
).ToArray();
EDIT:
If you slightly modify the above linq query to:
var q = (from i in Items
orderby i.ItemDate descending
let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
select new { Value = (int?)i.ItemValue - prev.ItemValue }
).ToArray();
then you get the following TSQL query sent to the database:
SELECT ([t0].[ItemValue]) - ((SELECT [t2].[ItemValue]
FROM (SELECT TOP (1) [t1].[ItemValue]
FROM [Items] AS [t1]
WHERE [t1].[ItemDate] < [t0].[ItemDate]) AS [t2]
)) AS [Value]
FROM [Items] AS [t0]
ORDER BY [t0].[ItemDate] DESC
My guess now is if you place an index on ItemDate field this shouldn't perform too bad.
I wouldn't let SQL do this, it would create an inefficient SQL query (I think).
I could create a stored procedure, but if the amount of data is not too big I can also use Linq to objects:
List<X> items = dataContext.GetTable<X>().OrderByDescending(d => d.Date).ToList(); // Bring data to memory
var res = items.Skip(1).Zip(items, (cur, prev) => cur.Value - prev.Value);
In the end, I might use a foreach loop instead for readability.
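For instance, the Zip call above could be written as a plain loop over the same in-memory list (assuming Value is an int, as in the sample table):

```csharp
var res = new List<int>();
for (int i = 1; i < items.Count; i++)
{
    // Difference between each element and its predecessor in the ordered list
    res.Add(items[i].Value - items[i - 1].Value);
}
// For the sample data (5, 8, 10 after ordering by date descending) this yields 3, 2
```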
I've got a query passed to my service serialised into a set of classes. This object defines conditions in a tree-like structure to support AND/ORing data to an infinite depth. I'm then using LINQ to SQL to convert this class into a SQL query; however, my conditions (defined using PredicateBuilder) are ignored!
The PredicateBuilder seems like an obvious solution, my recursive functions build off Expression<Func<Error,bool>> instead of IQueryable<Error> to support this, I iterate over the tree recursively and append AND/OR conditions appropriately.
I call the recursive filter as follows, when debugging I can see the recursive function returning filters correctly - my issue is that these conditions are being ignored and don't surface in the output SQL (please see below) can anyone suggest why this might be?
Please let me know if any additional information is needed or if you believe this approach should work.
if ( hasConditions )
{
results.Where( RecursiveHandleFilterExpression( query.Criteria ) );
}
This is the function that appends the predicates
private Expression<Func<Error, bool>> RecursiveHandleFilterExpression( FilterExpression filterExpression )
{
    // If ANDing, start with true; ORs start with false
    Expression<Func<Error, bool>> predicate;
    if ( filterExpression.FilterOperator == LogicalOperator.And )
    {
        predicate = PredicateBuilder.True<Error>();
    }
    else
    {
        predicate = PredicateBuilder.False<Error>();
    }

    // Apply conditions
    foreach ( ConditionExpression condition in filterExpression.Conditions )
    {
        if ( filterExpression.FilterOperator == LogicalOperator.And )
        {
            predicate.And( ApplyCondition( condition ) );
        }
        else
        {
            predicate.Or( ApplyCondition( condition ) );
        }
    }

    // Apply child filters
    foreach ( FilterExpression expression in filterExpression.Filters )
    {
        if ( filterExpression.FilterOperator == LogicalOperator.And )
        {
            predicate.And( RecursiveHandleFilterExpression( expression ) );
        }
        else
        {
            predicate.Or( RecursiveHandleFilterExpression( expression ) );
        }
    }
    return predicate;
}
Generated SQL, obtained through the DataContext.Log property, missing the two conditions passed for the LoggedOn column:
SELECT [t2].[ErrorId], [t2].[OrganisationId], [t2].[Severity], [t2].[Source], [t2].[ExceptionMessage], [t2].[InnerExceptionMessage], [t2].[Details], [t2].[LoggedOn]
FROM (
    SELECT [t1].[ErrorId], [t1].[OrganisationId], [t1].[Severity], [t1].[Source], [t1].[ExceptionMessage], [t1].[InnerExceptionMessage], [t1].[Details], [t1].[LoggedOn], [t1].[ROW_NUMBER]
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ErrorId], [t0].[OrganisationId], [t0].[Severity], [t0].[Source], [t0].[ExceptionMessage], [t0].[InnerExceptionMessage], [t0].[Details], [t0].[LoggedOn]) AS [ROW_NUMBER], [t0].[ErrorId], [t0].[OrganisationId], [t0].[Severity], [t0].[Source], [t0].[ExceptionMessage], [t0].[InnerExceptionMessage], [t0].[Details], [t0].[LoggedOn]
        FROM [dbo].[Errors] AS [t0]
        WHERE [t0].[OrganisationId] = @p0
        ) AS [t1]
    WHERE [t1].[ROW_NUMBER] BETWEEN @p1 + 1 AND @p1 + @p2
    ) AS [t2]
ORDER BY [t2].[ROW_NUMBER]
-- @p0: Input UniqueIdentifier (Size = -1; Prec = 0; Scale = 0) [f311d7f3-3755-e411-940e-00155d0c0c4b]
-- @p1: Input Int (Size = -1; Prec = 0; Scale = 0) [0]
-- @p2: Input Int (Size = -1; Prec = 0; Scale = 0) [51]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.30319.17929
The And and Or methods don't mutate the expression. (The objects are immutable.) They return a new expression that represents the operation in question. You are ignoring that return value in your code.
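Concretely, the fix is to capture the return value everywhere it is currently discarded, both in the builder and at the call site (a sketch against the code in the question):

```csharp
// PredicateBuilder's And/Or return a NEW expression tree; reassign it
predicate = predicate.And( ApplyCondition( condition ) );
predicate = predicate.Or( ApplyCondition( condition ) );

// The same applies to Where, which returns a new IQueryable
results = results.Where( RecursiveHandleFilterExpression( query.Criteria ) );
```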
I have a report that shows orders made to a determined merchant, and it was working fine until I needed to add a filter for payment status.
This is how I build the query, filter by filter:
var queryOrder = context.Orders.Select(m=>m);
if (viewModel.InitialDate.HasValue)
queryOrder = queryOrder.Where(m => m.CreatedDate.Date >= viewModel.InitialDate.Value);
(...) /* continues building the query, filter by filter */
if (viewModel.SelectedPaymentStatus != null)
queryOrder = queryOrder.Where(m => viewModel.SelectedPaymentStatus.Contains(m.Payments.Select(p => p.PaymentStatusId).Single().ToString()));
queryOrder = queryOrder.Where(m => m.MerchantId == merchantId);
When I run queryOrder, even if it's only a queryOrder.Count(), it takes over 1 minute to execute. Using SQL Server's profiling tool, I extracted the generated query as this:
SELECT [t0].[Id], [t0].[CustomerId], [t0].[MerchantId], [t0].[OrderNumber], [t0].[Amount], [t0].[SoftDescriptor], [t0].[ShippingMethod], [t0].[ShippingPrice], [t0].[IpAddress], [t0].[SellerComment], [t0].[CreatedDate]
FROM [dbo].[Order] AS [t0]
WHERE ([t0].[MerchantId] = @p0)
AND ((CONVERT(NVarChar,(
    SELECT [t1].[PaymentStatusId]
    FROM [dbo].[Payment] AS [t1]
    WHERE [t1].[OrderId] = [t0].[Id]
    ))) IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8))
The @p0 parameter is a Guid for merchantId, and @p1 through @p8 are the numeral strings "1" through "8", representing the PaymentStatusIds.
If I skip the line:
if (viewModel.SelectedPaymentStatus != null)
queryOrder = queryOrder.Where(m => viewModel.SelectedPaymentStatus.Contains(m.Payments.Select(p => p.PaymentStatusId).Single().ToString()));
The query runs in under 1 second. But when I use it, the performance hits the floor. Any tips on how to solve this?
All your queries are deferred, which is both the good and the bad part of LINQ. Try splitting the queries and using some in-memory results.
Try removing the first query (it doesn't make much sense really; you're returning the same collection) and amend the second query along these lines, hoisting the status list into memory first, and see if it makes any difference:
if (viewModel.SelectedPaymentStatus != null)
{
    var selectedStatuses = viewModel.SelectedPaymentStatus.ToList();
    queryOrder = queryOrder.Where(m => selectedStatuses.Contains(
        m.Payments.Select(p => p.PaymentStatusId).Single().ToString()));
}
I've annotated the problem snippet:
if (viewModel.SelectedPaymentStatus != null) {
// Give me only orders from queryOrder where...
queryOrder = queryOrder.Where(
// ...my viewModel's SelectedPaymentStatus collection...
m => viewModel.SelectedPaymentStatus.Contains(
// ...contains the order's payment's PaymentStatusId...
m.Payments.Select(p => p.PaymentStatusId).Single()
// ... represented as a string?!
.ToString()
// Why are database IDs strings?
)
);
}
viewModel.SelectedPaymentStatus appears to be a collection of strings; therefore, you're asking the database to convert PaymentStatusId to an nvarchar and do string comparisons with the elements of SelectedPaymentStatus. Yuck.
Since viewModel.SelectedPaymentStatus is small, it might be better to create a temporary List<int> and use that in your query:
if (viewModel.SelectedPaymentStatus != null) {
    // Let's do the conversion once, in C#
    List<int> statusIds = viewModel.SelectedPaymentStatus.Select( i => Convert.ToInt32(i) ).ToList();

    // Now select the matching orders
    queryOrder = queryOrder.Where(
        m => statusIds.Contains(
            m.Payments.Select(p => p.PaymentStatusId).Single()
        )
    );
}
In SQL what I'm trying to accomplish is
SELECT
SUM(CASE WHEN Kendo=1 THEN 1 ELSE 0 END) as KendoCount,
SUM(CASE WHEN Icenium=1 THEN 1 ELSE 0 END) as IceniumCount
FROM
Contacts
I'd like to do this in a C# program using LINQ.
Contacts is a List where Contact has many Booleans such as Kendo and Icenium and I need to know how many are true for each of the Booleans.
At least with LINQ to SQL, the downside of the Count approach is that each .Count() call issues a separate SQL request. I suspect Jessie is trying to run a single scan over the table rather than multiple scans, one per predicate. Depending on the logic and the number of columns you are creating, this may not perform as well. Closer to the original request, try using Sum with a ternary clause, as follows (against Northwind):
from e in Employees
group e by "" into g
select new {
isUs = g.Sum (x => x.Country == "USA" ? 1 : 0),
NotUs = g.Sum (x => x.Country != "USA" ? 1 : 0)
}
LINQ to SQL generates the following (YMMV with other ORM's):
SELECT SUM(
    (CASE
        WHEN [t1].[Country] = @p1 THEN @p2
        ELSE @p3
     END)) AS [isUs], SUM(
    (CASE
        WHEN [t1].[Country] <> @p4 THEN @p5
        ELSE @p6
     END)) AS [NotUs]
FROM (
    SELECT @p0 AS [value], [t0].[Country]
    FROM [Employees] AS [t0]
    ) AS [t1]
GROUP BY [t1].[value]
var KendoCount = db.Contacts.Where(x => x.Kendo).Count();
var IceniumCount = db.Contacts.Where(x => x.Icenium).Count();
I would do this as two separate queries:
int kendoCount = db.Contacts.Count(c => c.Kendo);
int iceniumCount = db.Contacts.Count(c => c.Icenium);
Given that these queries will automatically translate into optimized SQL, this will likely be similar in speed or even potentially faster than any query option, and is far simpler to understand.
Note that, if this is for Entity Framework, you'll need to write this as:
int kendoCount = db.Contacts.Where(c => c.Kendo).Count();
int iceniumCount = db.Contacts.Where(c => c.Icenium).Count();
var result = Contacts
    .GroupBy(c => new
    {
        ID = "",
    })
    .Select(c => new
    {
        KendoCount = c.Sum(k => k.Kendo ? 1 : 0),
        IceniumCount = c.Sum(k => k.Icenium ? 1 : 0),
    })
    .ToArray();