The C# code below executes in 3 seconds; I've included the SQL Profiler output as well. If I change the statement to not use dynamic SQL, it executes in milliseconds. I can't find any good resources offering a solution to this problem, but I did find an article explaining that with dynamic SQL, since the parser doesn't know the values of the parameters, it cannot optimize the query plan.
public string GetIncorporation(Parcel parcel)
{
var result = (from c in _context.Districts
where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
select c).ToList();
Here is the SQL Profiler output:
exec sp_executesql N'SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
MAX([Filter1].[A1]) AS [A1]
FROM ( SELECT
SUBSTRING([Extent1].[DISTRICT_CD], 0 + 1, 2) + N''00'' AS [A1]
FROM [STAGE].[DISTRICT] AS [Extent1]
WHERE ([Extent1].[PARCEL_ID] = @p__linq__0) AND ([Extent1].[DB_YEAR] = @p__linq__1) AND ([Extent1].[DISTRICT_CD] < N''9000'')
) AS [Filter1]
) AS [GroupBy1]',N'@p__linq__0 nvarchar(4000),@p__linq__1 int',@p__linq__0=N'0001-02-0003',@p__linq__1=2012
I'm trying to build a service layer; I don't want to have a mixed batch of stored procedures and LINQ queries.
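Aside: the profiler output above shows the parameters arriving as nvarchar(4000). If PARCEL_ID is stored as varchar, that mismatch alone can force an implicit conversion that defeats any index on the column. A minimal sketch of one commonly suggested fix, assuming code-first mapping and an entity class named District (neither is confirmed by the post):

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Assumption: PARCEL_ID is varchar in the database. Mapping the property
    // as non-Unicode makes EF send a varchar parameter instead of nvarchar,
    // so SQL Server can seek the index without converting the column.
    modelBuilder.Entity<District>()
        .Property(d => d.PARCEL_ID)
        .IsUnicode(false);
}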
Did you paste that query into SSMS, run the execution plan, and see if it suggests any missing indexes?
Also, if you don't need all the columns from the table, limit them by using a select:
var result = (from c in _context.Districts
where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
select c.PARCEL_ID).ToList();
or
var result = (from c in _context.Districts
where c.PARCEL_ID == parcel.PARCEL_ID && c.DB_YEAR == parcel.DB_YEAR && c.DISTRICT_CD.CompareTo("9000") < 0
select new { c.PARCEL_ID, c.column2, c.column3 }).ToList();
The LINQ looks fine; have you got the correct indexes?
In the query from SSMS you've pasted, it's not doing any limiting on DISTRICT_CD, so make sure that is actually the query that is running.
Your performance problem is in the CompareTo part. This method cannot be translated to regular SQL, so Entity Framework will first materialize all objects matching the first two conditions (fetched with pure SQL). After this (which takes some time, as you can see), the third condition is matched in memory. Avoid the CompareTo method in your LINQ query and your problems will go away.
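If CompareTo really is the problem, a commonly suggested rewrite (a sketch, not verified against this schema) is string.Compare, which LINQ providers translate into a plain server-side comparison:

var result = (from c in _context.Districts
              where c.PARCEL_ID == parcel.PARCEL_ID
                    && c.DB_YEAR == parcel.DB_YEAR
                    // intended to translate to [DISTRICT_CD] < '9000' in the generated SQL
                    && string.Compare(c.DISTRICT_CD, "9000") < 0
              select c).ToList();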
Related
When I try the following LINQ query, it is slow (1.5 s):
var rslt = (from t in context.Set<SUB_Transactions>()
where
t.UpdateDate > query.LastUpdate &&
t.TransactionID > query.Index
select new
{
TransactionID = t.TransactionID
}).OrderBy(t => t.TransactionID).Take(query.Amount).ToList();
When converted to SQL, this query is super fast (40 ms):
SELECT TOP (300)
[Project1].[TransactionID] AS [TransactionID]
FROM ( SELECT
[Extent1].[TransactionID] AS [TransactionID]
FROM [dbo].[SUB_Transactions] AS [Extent1]
WHERE ([Extent1].[UpdateDate] > @p__linq__0) AND ([Extent1].[TransactionID] > @p__linq__1)
) AS [Project1]
ORDER BY [Project1].[TransactionID] ASC
What is going on here?
Removing the Take in the first query gives a fast result as well (given the fact that there are no new transactions).
There is a composite index on TransactionID and UpdateDate.
As far as your example goes, this could make your query faster, but without some sample data it's not possible to test from our end:
var rslt = (from t in context.Set<SUB_Transactions>()
where t.TransactionID > query.Index // invert order of filter
&& t.UpdateDate > query.LastUpdate
orderby t.TransactionID // you can order by here in query syntax
select t.TransactionID) // remove anonymous object
.Take(query.Amount)
.AsNoTracking() // you won't be changing IDs so no need to track them
.ToList();
You might also gain some performance by using the DbSet<SUB_Transactions> property directly instead of calling Set<SUB_Transactions>(), which needs to locate the DbSet in the DbContext.
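For illustration, a minimal sketch (the context class name is hypothetical):

public class TransactionContext : DbContext // hypothetical context class
{
    // Exposing the set as a property avoids the runtime Set<T>() lookup.
    public DbSet<SUB_Transactions> Transactions { get; set; }
}

// Usage: context.Transactions instead of context.Set<SUB_Transactions>()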
I am using EF6 and I would like to get the records in a table whose IDs are in a given group of IDs.
In my test, for example, I am using 4 IDs.
I tried two options; the first is with Any.
dbContext.MyTable
.Where(x => myIDS.Any(y=> y == x.MyID));
And the T-SQL that this LINQ expression generates is:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM (SELECT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[UnionAll1].[C1] AS [C1]
FROM (SELECT
cast(130 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
cast(139 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
UNION ALL
SELECT
cast(140 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
UNION ALL
SELECT
cast(141 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
WHERE [UnionAll3].[C1] = [Extent1].[MiID]
)
As can be seen, the T-SQL is a WHERE EXISTS that uses many subqueries and UNIONs.
The second option is with Contains.
dbContext.MyTable
.Where(x => myIDS.Contains(x.MyID));
And the T-SQL:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))
Contains is translated into a WHERE IN, and the query is much less complex.
I have read that Any tends to be faster, so I am unsure whether Any, although its SQL looks more complex at first glance, is actually faster or not.
Thanks so much.
EDIT: I ran some tests (I don't know if this is the best way to test this).
System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Contains(x.MyID));
iq.ToArrayAsync(); // note: the task is not awaited, so the stopwatch mostly measures query setup
}
miswContains.Stop();
System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Any(y => y == x.MyID));
iq.ToArrayAsync(); // again not awaited
}
miswAny.Stop();
The results are that miswAny is about 850 ms and miswContains is about 4251 ms.
So the second option, with Contains, is slower.
Your second option is the fastest solution I can think of (at least for not very large arrays of IDs), provided MiTabla.MiID is indexed.
If you want to read more about IN clause performance, see: Is SQL IN bad for performance?
If you know the ID, then using the LINQ to SQL Count() method would create much cleaner and faster SQL than both Any and Contains:
dbContext.MyTable
.Where(x => myIDS.Count(y=> y == x.MyID) > 0);
The generated SQL for the count should look something like this:
DECLARE @p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = @p0
You can tell by the shape of the queries that Any is not scalable at all. It doesn't take many elements in myIDS (~50, probably) before you get a SQL exception that the maximum nesting level has been exceeded.
Contains is much better in this respect. It can handle a couple of thousand elements before its performance gets severely affected.
So I would go for the scalable solution, even though Any may be faster with small numbers. It is also possible to make Contains scale even better.
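For example, one way to do that (a sketch, assuming myIDS is a List<long>) is to send the IDs in fixed-size batches so each IN list stays small:

const int batchSize = 1000; // keep each generated IN (...) list small
var results = new List<MyTable>();
for (int i = 0; i < myIDS.Count; i += batchSize)
{
    var batch = myIDS.Skip(i).Take(batchSize).ToList();
    // each iteration produces one simple WHERE ... IN (...) query
    results.AddRange(dbContext.MyTable
        .Where(x => batch.Contains(x.MyID))
        .ToList());
}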
I have read that Any tends to be faster,
In LINQ to Objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, the generated SQL is what counts.
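A quick sketch of the distinction:

var ids = new List<long> { 130, 139, 140, 141 };

// LINQ to Objects: enumerates in memory and stops at the first hit.
bool found = ids.Any(id => id == 139); // checks 130, then 139, then stops

// LINQ to Entities: the lambda is translated to SQL up front, so the cost
// depends on the shape of the generated query, not on short-circuiting.
bool exists = dbContext.MyTable.Any(x => x.MyID == 139);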
I have the following code, which is misbehaving:
TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
The first line, UserManager.GetUser, just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.
First off, it's executing two kinds of SQL statement here. The first grabs every single row in TPM_TASK linked to that user, which is sometimes tens of thousands of rows:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
WHERE "Extent1".USERID = :EntityKeyValue1
This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.
Next, it seems to execute a new query for each TPM_PROJECTVERSION pair in the list above:
SELECT
-- Columns
FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)
Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.
The query I would like to generate would look something like:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10
The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:
from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")
Is there any way to optimize this LINQ statement? I'm using .NET 4 and Oracle as the backend DB.
Solution:
This solution is based on Kirk's suggestions below, and works since context.TPM_USERTASK cannot be queried directly:
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
where t.TPM_USER.Any(y => y.USERID == UserId) &&
t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient nonetheless.
Yes, you are pulling down a specific user and then referencing the TPM_TASK relationship; pulling down every task attached to that user is exactly what it's supposed to do. There's no ORM SQL translation when you do it this way: you're getting a user, then loading all his tasks into memory, and then performing some client-side filtering. This is all done with lazy loading, so the SQL is going to be exceptionally inefficient, since it can't batch anything up.
Instead, rewrite your query to go directly against TPM_TASK and filter against the user:
var tasks = (from t in context.TPM_TASK
where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.
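If the related TPM_PROJECTVERSION rows are also needed in memory, eager loading can be added to the same rewritten query (a sketch, mirroring the Include from the asker's solution above):

// Same filter as above, but TPM_PROJECTVERSION is joined into a single
// SQL statement instead of being lazy-loaded per project version.
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
             where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3
                   && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);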
The Any() LINQ method seems to load all of the entity's columns even though they're not needed.
The following code:
if(Session.Query<Project>().Any(p=>p.ID == projectID && p.ProjectOwner.Responsible == CurrentUserID))
// Current user is responsible for this project
Generates the following SQL:
select TOP (1) project0_.ProjectID as ProjectID7_,
project0_.DateCreated as DateCrea2_7_,
project0_.Type as Type7_,
project0_.ProjectOwner_FK as ProjectOy8_7_,
project0_.Address_FK as Address9_7_,
[snip: the statement selects all of Project's columns]
from [Project] project0_
inner join [OrganizationProject] organizati1_
on project0_.ProjectOwner_FK = organizati1_.OrganizationProjectID
where project0_.ProjectID = 1 /* @p0 */
and organizati1_.Responsible_FK = 1 /* @p1 */
However, the following code:
if(Context.Projects.Where(p=>p.ID == projectID && p.ProjectOwner.Responsible == CurrentUserID).Count() == 1)
// Current user is responsible for this project
Generates the following SQL, which is what is expected:
select cast(count(*) as INT) as col_0_0_
from [Project] project0_
inner join [OrganizationProject] organizati1_
on project0_.ProjectOwner_FK = organizati1_.OrganizationProjectID
where project0_.ProjectID = 1 /* @p0 */
and organizati1_.Responsible_FK = 1 /* @p1 */
The Count() method does what is expected, but it is a bit less straightforward.
Is the Any() behavior considered normal or is it a bug? It doesn't seem optimal to me, but maybe loading the entity isn't really slower than asking SQL to return the count?
In light of this, what is considered to be the best way to test a condition in Linq to NHibernate?
It's interesting, but did you try:
if(Session.Query<Project>()
.FirstOrDefault(p=>p.ID == projectID
&& p.ProjectOwner.Responsible == CurrentUserID) != null)
I think it would be faster than your current options. It doesn't check all the items (as Count does), and it doesn't fetch all the data either.
Something like this should create a query with only a single column:
if(Session
.Query<Project>()
.Where(p=>p.ID == projectID && p.ProjectOwner.Responsible == CurrentUserID)
.Select(p=>new { pID = p.ID })
.Any())
{
...
}
Projection to an anonymous type allows NHibernate to fetch only the specified column (ID, for example).
But it is usually more suitable when there is a significant number of rows to retrieve.
Does anybody know how to write a LINQ to SQL statement to return every nth row from a table? I need to get the title of the item at the top of each page in a paged data grid back for fast user scanning. So if I wanted the first record, then every 3rd one after that, from the following names:
Amy, Eric, Jason, Joe, John, Josh, Maribel, Paul, Steve, Tom
I'd get Amy, Joe, Maribel, and Tom.
I suspect this can be done... LINQ to SQL statements already invoke the ROW_NUMBER() SQL function in conjunction with sorting and paging. I just don't know how to get back every nth item. The SQL statement would be something like WHERE ROW_NUMBER MOD 3 = 0, but I don't know the LINQ statement to use to get the right SQL.
Sometimes, T-SQL is the way to go. I would use ExecuteQuery<T> here:
var data = db.ExecuteQuery<SomeObjectType>(#"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % 25) = 1");
You could also swap out the n:
var data = db.ExecuteQuery<SomeObjectType>(@"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % {0}) = 1", n);
Once upon a time, there was no such thing as Row_Number, and yet such queries were possible. Behold!
var query =
from c in db.Customers
let i = (
from c2 in db.Customers
where c2.ID < c.ID
select c2).Count()
where i%3 == 0
select c;
This generates the following SQL:
SELECT [t2].[ID], [t2]. --(more fields)
FROM (
SELECT [t0].[ID], [t0]. --(more fields)
(
SELECT COUNT(*)
FROM [dbo].[Customer] AS [t1]
WHERE [t1].[ID] < [t0].[ID]
) AS [value]
FROM [dbo].[Customer] AS [t0]
) AS [t2]
WHERE ([t2].[value] % @p0) = @p1
Here's an option that works, but it might be worth checking that it doesn't have any performance issues in practice:
var nth = 3;
var ids = Table
.Select(x => x.Id)
.ToArray()
.Where((x, n) => n % nth == 0)
.ToArray();
var nthRecords = Table
.Where(x => ids.Contains(x.Id));
Just googling around a bit, I haven't found (or experienced) an option for LINQ to SQL to directly support this.
The only option I can offer is that you write a stored procedure with the appropriate SQL query written out and then call the sproc via LINQ to SQL. Not the best solution, especially if you have any kind of complex filtering going on.
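As a sketch (the stored procedure and method names are hypothetical): once a stored procedure is added to the LINQ to SQL designer, it surfaces as an ordinary method on the generated DataContext:

// Hypothetical sproc dbo.GetPageTopNames(@pageSize int) mapped in the designer;
// the DataContext exposes it as a method returning mapped result rows.
var pageTops = db.GetPageTopNames(3).ToList();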
There really doesn't seem to be an easy way to do this:
How do I add ROW_NUMBER to a LINQ query or Entity?
How to find the ROW_NUMBER() of a row with Linq to SQL
But there's always:
peopleToFilter.AsEnumerable().Where((x,i) => i % AmountToSkipBy == 0)
NOTE: This still doesn't execute on the database side of things!
This will do the trick, but it isn't the most efficient query in the world:
var count = query.Count();
var pageSize = 10;
var pageTops = query.Take(1);
for(int i = pageSize; i < count; i += pageSize)
{
pageTops = pageTops.Concat(query.Skip(i - (i % pageSize)).Take(1));
}
return pageTops;
It dynamically constructs a query to pull the (nth, 2*nth, 3*nth, etc.) value from the given query. If you use this technique, you'll probably want to cap it at maybe ten or twenty names, similar to how Google pages its results (1-10, and Next), to avoid building an expression so large that the database refuses to attempt to parse it.
If you need better performance, you'll probably have to use a stored procedure or a view to represent your query, and include the row number as part of the stored proc results or the view's fields.
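As a sketch of the view route (all names hypothetical): if a view already exposes ROW_NUMBER() as a column, LINQ can filter on it server-side, since the modulo operator translates to SQL:

// Hypothetical: dbo.PeopleNumbered is a view with a RowNum column computed
// as ROW_NUMBER() OVER (ORDER BY Name); mapped like a table, the filter
// below runs entirely in the database.
var pageSize = 10;
var pageTops = db.PeopleNumbered
    .Where(p => p.RowNum % pageSize == 1)
    .ToList();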