When I run the following LINQ query, it is slow (1.5 s):
var rslt = (from t in context.Set<SUB_Transactions>()
            where t.UpdateDate > query.LastUpdate
               && t.TransactionID > query.Index
            select new
            {
                TransactionID = t.TransactionID
            }).OrderBy(t => t.TransactionID).Take(query.Amount).ToList();
When converted to SQL, this query is super fast (40 ms):
SELECT TOP (300)
    [Project1].[TransactionID] AS [TransactionID]
FROM ( SELECT
           [Extent1].[TransactionID] AS [TransactionID]
       FROM [dbo].[SUB_Transactions] AS [Extent1]
       WHERE ([Extent1].[UpdateDate] > @p__linq__0) AND ([Extent1].[TransactionID] > @p__linq__1)
     ) AS [Project1]
ORDER BY [Project1].[TransactionID] ASC
What is going on here?
Removing the Take from the first query gives a fast result as well (given the fact that there are no new transactions).
There is a composite index on TransactionID and UpdateDate.
As far as your example goes, this could make your query faster, but without some sample data it's not possible to test from our end:
var rslt = (from t in context.Set<SUB_Transactions>()
            where t.TransactionID > query.Index // invert the order of the filter
               && t.UpdateDate > query.LastUpdate
            orderby t.TransactionID // you can order by here
            select t.TransactionID) // remove the anonymous object
            .Take(query.Amount)
            .AsNoTracking() // you won't be changing IDs, so there is no need to track them
            .ToList();
You might also gain some performance if you used the DbSet<SUB_Transactions> property directly instead of calling Set<SUB_Transactions>(), which needs to locate the DbSet within the DbContext.
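A minimal sketch of what that looks like; the context class and the SubTransactions property name are hypothetical stand-ins for whatever the real model defines:

using System.Data.Entity;
using System.Linq;

public class MyContext : DbContext
{
    // A typed property avoids the runtime set lookup that Set<T>() performs.
    public DbSet<SUB_Transactions> SubTransactions { get; set; }
}

// Usage: the query reads the same, only the entry point changes.
var rslt = context.SubTransactions
    .Where(t => t.TransactionID > query.Index && t.UpdateDate > query.LastUpdate)
    .OrderBy(t => t.TransactionID)
    .Select(t => t.TransactionID)
    .Take(query.Amount)
    .ToList();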
Predicament
I have a large dataset that I need to perform a complex calculation upon a part of. For my calculation, I need to take a chunk of ordered data from a large set based on input parameters.
My method signature looks like this:
double Process(Entity e, DateTimeOffset? start, DateTimeOffset? end)
Potential Solutions
The two following methods spring to mind:
Method 1 - WHERE Clause
double result = 0d;
IEnumerable<Item> items = from item in e.Items
                          where (!start.HasValue || item.Date >= start.Value)
                             && (!end.HasValue || item.Date <= end.Value)
                          orderby item.Date ascending
                          select item;
...
return result;
Method 2 - SkipWhile & TakeWhile
double result = 0d;
IEnumerable<Item> items = e.Items.OrderBy(i => i.Date);
if (start.HasValue)
    items = items.SkipWhile(i => i.Date < start.Value);
if (end.HasValue)
    items = items.TakeWhile(i => i.Date <= end.Value);
...
return result;
Question
If I were just throwing this together, I'd probably just go with Method 1, but the size of my dataset and the size of the set of datasets are both too large to ignore slight efficiency losses, and it is of vital importance that the resulting enumerable is ordered.
Which approach will generate the more efficient query? And is there a more efficient approach that I am yet to consider?
Any solutions presented can make the safe assumption that the table is well indexed.
SkipWhile is not supported for translation to SQL. You need to throw that option away.
The best way to go about this is to create an index on the field you use to range select and then issue a query that is SARGable. where date >= start && date < end is SARGable and can make use of an index.
!start.HasValue || is not a good idea because that destroys SARGability. Build the query so that this is not needed. For example:
if (start != null) query = query.Where(...);
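Expanded into a fuller sketch (the Item, Date, start, and end names are taken from the question above; this is illustrative, not tested against the poster's model):

using System;
using System.Linq;

// Compose the filter step by step so the generated WHERE clause only ever
// contains plain, SARGable comparisons (no "!start.HasValue ||" branches).
IQueryable<Item> query = e.Items.AsQueryable();

if (start != null)
    query = query.Where(i => i.Date >= start.Value);

if (end != null)
    query = query.Where(i => i.Date <= end.Value);

// Ordering is applied last; the resulting sequence stays ordered by Date.
var items = query.OrderBy(i => i.Date);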
Make the index covering and you have optimal performance: not a single extra row needs to be processed.
According to this link, you can't use SkipWhile without materializing the query, so in the second case you materialize all entities and then calculate the result.
In the first scenario you can let SQL handle the query and materialize only the necessary records, so it's the better option.
EDIT:
I generated sample data; these are the queries sent to the database:
SELECT
    [Project1].[Id] AS [Id],
    [Project1].[AddedDate] AS [AddedDate],
    [Project1].[SendDate] AS [SendDate]
FROM ( SELECT
           [Extent1].[Id] AS [Id],
           [Extent1].[AddedDate] AS [AddedDate],
           [Extent1].[SendDate] AS [SendDate]
       FROM [dbo].[Alerts] AS [Extent1]
       WHERE ([Extent1].[AddedDate] >= @p__linq__0) AND ([Extent1].[AddedDate] <= @p__linq__1)
     ) AS [Project1]
ORDER BY [Project1].[AddedDate] ASC
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[AddedDate] AS [AddedDate],
    [Extent1].[SendDate] AS [SendDate]
FROM [dbo].[Alerts] AS [Extent1]
ORDER BY [Extent1].[AddedDate] ASC
I inserted 1,000,000 records and wrote a query expected to return one row. In the first case the query took 291 ms and the result materialized instantly. In the second case the query took 1065 ms and I had to wait about 10 seconds for the result to materialize.
I am using EF6 and I would like to get the records in a table whose IDs are in a given group of IDs.
In my test, for example, I am using 4 IDs.
I tried two options; the first is with Any.
dbContext.MyTable
    .Where(x => myIDS.Any(y => y == x.MyID));
And the T-SQL that this LINQ expression generates is:
SELECT
    *
FROM [dbo].[MiTabla] AS [Extent1]
WHERE EXISTS (SELECT
                  1 AS [C1]
              FROM (SELECT
                        [UnionAll2].[C1] AS [C1]
                    FROM (SELECT
                              [UnionAll1].[C1] AS [C1]
                          FROM (SELECT
                                    cast(130 as bigint) AS [C1]
                                FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
                                UNION ALL
                                SELECT
                                    cast(139 as bigint) AS [C1]
                                FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
                          UNION ALL
                          SELECT
                              cast(140 as bigint) AS [C1]
                          FROM ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
                    UNION ALL
                    SELECT
                        cast(141 as bigint) AS [C1]
                    FROM ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
              WHERE [UnionAll3].[C1] = [Extent1].[MiID]
)
As can be seen, the T-SQL is a WHERE EXISTS that uses many subqueries and unions.
The second option is with Contains.
dbContext.MyTable
    .Where(x => myIDS.Contains(x.MiID));
And the T-SQL:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))
Contains is translated into a WHERE IN, but the query is much less complex.
I have read that Any tends to be faster, so I am unsure whether Any, although it looks more complex at first glance, is actually faster or not.
Thanks so much.
EDIT: I ran some tests (I don't know if this is the best way to test this).
System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
    IQueryable<MyTable> iq = dbContext.MyTable
        .Where(x => myIDS.Contains(x.MyID));
    iq.ToArrayAsync();
}
miswContains.Stop();

System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
    IQueryable<MyTable> iq = dbContext.MyTable
        .Where(x => myIDS.Any(y => y == x.MyID));
    iq.ToArrayAsync();
}
miswAny.Stop();
The results are that miswAny takes about 850 ms and miswContains about 4251 ms.
So the second option, with Contains, is slower.
Your second option is the fastest solution I can think of (at least for not very large arrays of IDs), provided your MiTabla.MiID is in an index.
If you want to read more about IN clause performance: Is SQL IN bad for performance?
If you know the ID, then using the LINQ2SQL Count() method would create much cleaner and faster SQL code (than both Any and Contains):
dbContext.MyTable
    .Where(x => myIDS.Count(y => y == x.MyID) > 0);
The generated SQL for the count should look something like this:
DECLARE @p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = @p0
You can tell by the shape of the queries that Any is not scalable at all. It doesn't take many elements in myIDS (~50, probably) to get a SQL exception saying that the maximum nesting level has been exceeded.
Contains is much better in this respect. It can handle a couple of thousand elements before its performance gets severely affected.
So I would go for the scalable solution, even though Any may be faster with small numbers. It is even possible to make Contains scale better.
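One way to do that, sketched here as an illustration (it assumes myIDS is a List<long>; the batch size is an arbitrary choice, not a magic number from EF):

using System.Collections.Generic;
using System.Linq;

// Issue one Contains query per fixed-size batch of IDs, so no single
// IN (...) list grows beyond what the server handles comfortably.
const int batchSize = 1000;
var results = new List<MyTable>();

for (int offset = 0; offset < myIDS.Count; offset += batchSize)
{
    var batch = myIDS.Skip(offset).Take(batchSize).ToList();
    // Each iteration produces a WHERE ... IN (...) with at most batchSize values.
    results.AddRange(dbContext.MyTable.Where(x => batch.Contains(x.MiID)));
}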
"I have read that Any tends to be faster"
In LINQ-to-objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, the generated SQL is what counts.
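A minimal LINQ-to-Objects illustration of that short-circuiting (the values are made up):

using System;
using System.Linq;

var ids = Enumerable.Range(0, 1000000);

// LINQ to Objects: Any stops enumerating at the first match, so this returns
// immediately even though the sequence holds a million elements.
bool any = ids.Any(i => i == 0);

// Against a SQL backend both Any and Contains collapse into a WHERE clause,
// and the database's execution plan, not the C# operator, determines the cost.
bool contains = ids.Contains(0);

Console.WriteLine(any + " " + contains); // True True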
For example, I have a table:
Date |Value
----------|-----
2015/10/01|5
2015/09/01|8
2015/08/01|10
Is there any way, using LINQ to SQL, to get a new sequence that is an arithmetic operation on consecutive elements of the previously ordered set (for example, i.Value - (i-1).Value)? It must be executed on the SQL Server 2008 side, not the application side.
For example, dataContext.GetTable<X>().OrderByDescending(d => d.Date).Something(.......).ToArray(); should return 3, 2.
Is it possible?
You can try this:
var q = (from i in Items
         orderby i.ItemDate descending
         let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
         select new { Value = i.ItemValue - (prev == null ? 0 : prev.ItemValue) }
        ).ToArray();
EDIT:
If you slightly modify the above LINQ query to:
var q = (from i in Items
         orderby i.ItemDate descending
         let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
         select new { Value = (int?)i.ItemValue - prev.ItemValue }
        ).ToArray();
then you get the following T-SQL query sent to the database:
SELECT ([t0].[ItemValue]) - ((SELECT [t2].[ItemValue]
FROM (SELECT TOP (1) [t1].[ItemValue]
FROM [Items] AS [t1]
WHERE [t1].[ItemDate] < [t0].[ItemDate]) AS [t2]
)) AS [Value]
FROM [Items] AS [t0]
ORDER BY [t0].[ItemDate] DESC
My guess is that if you place an index on the ItemDate field, this shouldn't perform too badly.
I wouldn't let SQL do this; it would create an inefficient SQL query (I think).
I could create a stored procedure, but if the amount of data is not too big I can also use LINQ to Objects:
List<X> items = dataContext.GetTable<X>().OrderByDescending(d => d.Date).ToList(); // bring the data into memory
var res = items.Skip(1).Zip(items, (cur, prev) => cur.Value - prev.Value);
At the end, I might use a foreach for readability.
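For the sample table above this produces exactly the expected output. A self-contained sketch of the Zip pairing, with in-memory values standing in for the materialized list:

using System;
using System.Linq;

// Values already ordered descending by Date:
// 2015/10/01 -> 5, 2015/09/01 -> 8, 2015/08/01 -> 10
var values = new[] { 5, 8, 10 };

// Skip(1) shifts the sequence by one element, so Zip pairs each value with
// its predecessor: (8, 5) and (10, 8).
var diffs = values.Skip(1).Zip(values, (cur, prev) => cur - prev);

Console.WriteLine(string.Join(", ", diffs)); // prints: 3, 2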
The following query should return about 7200 records:
using (var context = new RapEntities())
{
    context.Configuration.ProxyCreationEnabled = false;
    var query = from i in context.QbTxnItems.AsNoTracking()
                where (i.ListType == "Invoice")
                   && !context.Payments.Any(p => p.QbTxnId == i.QbTxnId && p.QbTxnId != null)
                   && !context.QbTxnIgnores.Any(ti => ti.QbTxnId == i.QbTxnId)
                orderby i.RefNumber
                select i;
    var items = RapEntities.GetList(query);
}
The SQL generated (from SQL Server Profiler):
SELECT
    [Extent1].[QbTxnItemId] AS [QbTxnItemId],
    [Extent1].[ListType] AS [ListType],
    [Extent1].[QbTxnId] AS [QbTxnId],
    [Extent1].[QbEditSequence] AS [QbEditSequence],
    [Extent1].[TxnNumber] AS [TxnNumber],
    [Extent1].[RefNumber] AS [RefNumber],
    [Extent1].[TxnDate] AS [TxnDate],
    [Extent1].[TxnAmt] AS [TxnAmt],
    [Extent1].[IsPaid] AS [IsPaid],
    [Extent1].[IsCleared] AS [IsCleared],
    [Extent1].[LastGetAll] AS [LastGetAll],
    [Extent1].[GetIsCleared] AS [GetIsCleared],
    [Extent1].[LastModified] AS [LastModified],
    [Extent1].[Version] AS [Version],
    [Extent1].[RecordStatus] AS [RecordStatus],
    [Extent1].[UserId] AS [UserId],
    [Extent1].[TableID] AS [TableID]
FROM [dbo].[QbTxnItems] AS [Extent1]
WHERE (N'Invoice' = [Extent1].[ListType]) AND ( NOT EXISTS (SELECT
    1 AS [C1]
    FROM [dbo].[Payments] AS [Extent2]
    WHERE ([Extent2].[QbTxnId] = [Extent1].[QbTxnId]) AND ([Extent2].[QbTxnId] IS NOT NULL)
)) AND ( NOT EXISTS (SELECT
    1 AS [C1]
    FROM [dbo].[QbTxnIgnores] AS [Extent3]
    WHERE [Extent3].[QbTxnId] = [Extent1].[QbTxnId]
))
ORDER BY [Extent1].[RefNumber] ASC
will not complete in any reasonable amount of time when executed from Entity Framework, but executes instantaneously from SSMS.
Using Take(200) to limit the number of rows to 200, the query runs in about 50 ms even when called from EF. Increasing the number of rows to 500 increases the time to over 5 seconds.
This seems like unreasonably poor performance. EF must be capable of returning more than a few hundred rows in a reasonable amount of time. Are there any settings that can be adjusted to improve the performance of larger queries run from EF?
Looks to me like you could speed this up by actually doing some Joins here.
Try this:
var query = (from i in context.QbTxnItems.AsNoTracking()
             join p in context.Payments on i.QbTxnId equals p.QbTxnId
             join qi in context.QbTxnIgnores on i.QbTxnId equals qi.QbTxnId
             where (i.ListType == "Invoice")
             select i).OrderBy(i => i.RefNumber);
The C# code below executes in 3 seconds; I have listed the SQL Profiler output as well. If I change the statement to not use dynamic SQL, it executes in milliseconds. I can't find any good resources that offer a solution to this problem, but I was able to find an article explaining that with dynamic SQL, since the parser doesn't know the values of the parameters, it cannot optimize the query plan.
public string GetIncorporation(Parcel parcel)
{
    var result = (from c in _context.Districts
                  where c.PARCEL_ID == parcel.PARCEL_ID
                     && c.DB_YEAR == parcel.DB_YEAR
                     && c.DISTRICT_CD.CompareTo("9000") < 0
                  select c).ToList();
    // ...
}
exec sp_executesql N'SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        MAX([Filter1].[A1]) AS [A1]
        FROM ( SELECT
            SUBSTRING([Extent1].[DISTRICT_CD], 0 + 1, 2) + N''00'' AS [A1]
            FROM [STAGE].[DISTRICT] AS [Extent1]
            WHERE ([Extent1].[PARCEL_ID] = @p__linq__0) AND ([Extent1].[DB_YEAR] = @p__linq__1) AND ([Extent1].[DISTRICT_CD] < N''9000'')
        ) AS [Filter1]
    ) AS [GroupBy1]',N'@p__linq__0 nvarchar(4000),@p__linq__1 int',@p__linq__0=N'0001-02-0003',@p__linq__1=2012
I'm trying to build a service layer, and I don't want to have a mixed batch of stored procedures and LINQ queries.
Did you paste that query into SSMS, run the execution plan, and see if it suggests any missing indexes?
Also, if you don't need all the columns from the table, limit them by using a select:
var result = (from c in _context.Districts
              where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
              select c.Parcel_ID).ToList();
or
var result = (from c in _context.Districts
              where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
              select new { c.Parcel_ID, c.column2, c.column3 }).ToList();
The LINQ looks fine; have you got the correct indexes?
In the query you've pasted from SSMS, it's not doing any limiting on DISTRICT_CD, so make sure that is actually the query that is running.
Your performance problem is in the CompareTo part. This function cannot be translated to regular SQL, so Entity Framework will first materialize all objects matching the first two conditions (fetched with pure SQL). After this (which takes some time, as you can see), the third condition is matched in memory. Avoid the CompareTo method in your LINQ query and your problems will go away.
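A sketch of the same filter expressed without CompareTo, using the static String.Compare form instead (String.Compare is among the methods LINQ to Entities is documented to translate; whether it actually changes the plan here is an assumption to verify in the profiler):

var result = (from c in _context.Districts
              where c.PARCEL_ID == parcel.PARCEL_ID
                 && c.DB_YEAR == parcel.DB_YEAR
                 // string.Compare(a, b) < 0 expresses a < b for strings and is
                 // translatable to a plain comparison in the generated SQL.
                 && string.Compare(c.DISTRICT_CD, "9000") < 0
              select c).ToList();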