I am using nHibernate for our database access. I need to do a complicated query to find all member journal entries after a certain date with certain value, PreviousId, set for each member. I can easily write the SQL for it:
SELECT J.MemberId, J.PreviousId
FROM tblMemMemberStatusJournal J
INNER JOIN (
SELECT MemberId,
MIN(EffectiveDate) AS EffectiveDate
FROM tblMemMemberStatusJournal
WHERE EffectiveDate > #StartOfMonth
AND (PreviousId is NOT null)
GROUP BY MemberId
) AS X ON (X.EffectiveDate = J.EffectiveDate AND X.MemberId = J.MemberId)
However I am having a lot of trouble trying to get nHibernate to generate this information. There is not a lot of (any) documentation for how to use QueryOver.
I have been seeing information in other places, but none of it is very clear and very little has an actual explanation as to why things are done in certain ways. The answer for Selecting on Sub Queries in NHibernate with Critieria API did not give an adequate example as to what it is doing, so I haven't been able to replicate it.
I've gotten the inner part of the query created with this:
IList<object[]> result = session.QueryOver<MemberStatusJournal>()
.SelectList(list => list
.SelectGroup(a => a.Member.ID)
.SelectMin(a => a.EffectiveDate))
.Where(j => (j.EffectiveDate > firstOfMonth) && (j.PreviousId != null))
.List<object[]>();
Which, according to the profiler, makes this SQL:
SELECT this_.MemberId as y0_,
min(this_.EffectiveDate) as y1_
FROM tblMemMemberStatusJournal this_
WHERE (this_.EffectiveDate > '2014-08-01T00:00:00' /* #p0 */
and not (this_.PreviousLocalId is null))
GROUP BY this_.MemberId
But I am not finding a good example of how to actually do join this subset with a parent query. Does anyone have any suggestions?
You aren't actually joining on a subset, you're filtering on a subset. Knowing this, you have the option of filtering via other means, in this case, a correlated subquery.
The solution below first creates a detatched query to act as the inner subquery. We can correlate properties of the inner query with properties of the outer query through the use of an alias.
MemberStatusJournal memberStatusJournalAlias = null; // This will represent the
// object of the outer query
var subQuery = QueryOver.Of<MemberStatusJournal>()
.Select(Projections.GroupProperty(Projections.Property<MemberStatusJournal>(m => m.Member.ID)))
.Where(j => (j.EffectiveDate > firstOfMonth) && (j.PreviousId != null))
.Where(Restrictions.EqProperty(
Projections.Min<MemberStatusJournal>(j => j.EffectiveDate),
Projections.Property(() => memberStatusJournalAlias.EffectiveDate)
)
)
.Where(Restrictions.EqProperty(
Projections.GroupProperty(Projections.Property<MemberStatusJournal>(m => m.Member.Id)),
Projections.Property(() => memberStatusJournalAlias.Member.Id)
));
var results = session.QueryOver<MemberStatusJournal>(() => memberStatusJournalAlias)
.WithSubquery
.WhereExists(subQuery)
.List();
This would produce an SQL query like the following:
SELECT blah
FROM tblMemMemberStatusJournal J
WHERE EXISTS (
SELECT J2.MemberId
FROM tblMemberStatusJournal J2
WHERE J2.EffectiveDate > #StartOfMonth
AND (J2.PreviousId is NOT null)
GROUP BY J2.MemberId
HAVING MIN(J2.EffectiveDate) = J.EffectiveDate
AND J2.MemberId = J.MemberId
)
This looks less efficient than the inner join query you opened the question with. But my experience is that the SQL Query Optimizer is clever enough to convert this into an inner join. If you want to confirm this, you can use SQL Studio to generate and compare the execution plans of both queries.
Related
I have a database with the following schema:
Now, I'm trying to pull all landingpages for a domain and sort those by the first UrlFilter's FilterType that matches a certain group. This is the LINQ I've come up with so far:
var baseQuery = DbSet.AsNoTracking()
.Where(e => EF.Functions.Contains(EF.Property<string>(e, "Url"), $"\"{searchTerm}*\""))
.Where(e => e.DomainLandingPages.Select(lp => lp.DomainId).Contains(domainId));
var count = baseQuery.Count();
var page = baseQuery
.Select(e => new
{
LandingPage = e,
UrlFilter = e.LandingPageUrlFilters.FirstOrDefault(f => f.UrlFilter.GroupId == groupId)
})
.Select(e => new
{
e.LandingPage,
FilterType = e.UrlFilter == null ? UrlFilterType.NotCovered : e.UrlFilter.UrlFilter.UrlFilterType
})
.OrderBy(e => e.FilterType)
.Skip(10).Take(75).ToList();
Now, while this technically works, it's quite slow with execution times ranging from 10-30 seconds, which is not good enough for the use case. The LINQ is translated to the following SQL:
SELECT [l1].[Id], [l1].[LastUpdated], [l1].[Url], CASE
WHEN (
SELECT TOP(1) [l].[LandingPageId]
FROM [LandingPageUrlFilters] AS [l]
INNER JOIN [UrlFilters] AS [u] ON [l].[UrlFilterId] = [u].[Id]
WHERE ([l1].[Id] = [l].[LandingPageId]) AND ([u].[GroupId] = #__groupId_3)) IS NULL THEN 4
ELSE (
SELECT TOP(1) [u0].[UrlFilterType]
FROM [LandingPageUrlFilters] AS [l0]
INNER JOIN [UrlFilters] AS [u0] ON [l0].[UrlFilterId] = [u0].[Id]
WHERE ([l1].[Id] = [l0].[LandingPageId]) AND ([u0].[GroupId] = #__groupId_3))
END AS [FilterType]
FROM [LandingPages] AS [l1]
WHERE CONTAINS([l1].[Url], #__Format_1) AND #__domainId_2 IN (
SELECT [d].[DomainId]
FROM [DomainLandingPages] AS [d]
WHERE [l1].[Id] = [d].[LandingPageId]
)
ORDER BY CASE
WHEN (
SELECT TOP(1) [l2].[LandingPageId]
FROM [LandingPageUrlFilters] AS [l2]
INNER JOIN [UrlFilters] AS [u1] ON [l2].[UrlFilterId] = [u1].[Id]
WHERE ([l1].[Id] = [l2].[LandingPageId]) AND ([u1].[GroupId] = #__groupId_3)) IS NULL THEN 4
ELSE (
SELECT TOP(1) [u2].[UrlFilterType]
FROM [LandingPageUrlFilters] AS [l3]
INNER JOIN [UrlFilters] AS [u2] ON [l3].[UrlFilterId] = [u2].[Id]
WHERE ([l1].[Id] = [l3].[LandingPageId]) AND ([u2].[GroupId] = #__groupId_3))
END
OFFSET #__p_4 ROWS FETCH NEXT #__p_5 ROWS ONLY
Now my question is, how can I improve the execution time of this? Either by SQL or LINQ
EDIT: So I've been tinkering with some raw SQL and this is what I've come up with:
with matched_urls as (
select l.id, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.id
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = #groupId
and contains(Url, '"barz*"')
group by l.id
) select l.id, 5 as Filter
from landingpages l
where #domainId in (
select domainid
from domainlandingpages dlp
where l.id = dlp.landingpageid
) and l.id not in (select id from matched_urls ) and contains(Url, '"barz*"')
union select * from matched_urls
order by Filter
offset 10 rows fetch next 30 rows only
This performs somewhat okay, cutting the execution time down to ~5 seconds. As this is to be used for a table search I would however like to get it down even further. Is there any way to improve this SQL?
You're right to have a look at the generated SQL. In general, I would advise to learn SQL, write a performing SQL query and work your way back (either use a stored procedure or raw SQL, or design your LINQ query with that same philosophy.
I suspect this will be better (not tested):
var page = (
from e in baseQuery
let urlFilter = e.LandingPageUrlFilters.OrderBy(f => f.UrlFilterType).FirstOrDefault(f => f.UrlFilter.GroupId == groupId)
let filterType = urlFilter == null ? UrlFilterType.NotCovered : e.UrlFilter.UrlFilter.UrlFilterType
select new
{
LandingPage = e,
FilterType = filterType
}
).Skip(10).Take(75).ToList();
one of the way to improve the execution time is see execution plan in SSMS (SQL Server Management Studio).
After look on the execution plan you can design some indexes, or if you have no experiences with this, you can see if SSMS recommends some indexes.
Next try to create the indexes and execute the query again and see if execution time was improved.
Note: this is only one of many possible ways to improve execution time...
I am executing the following LINQ to Entities query but it is stuck and does not return response until timeout. I executed the same query on SQL Server and it return 92000 in 3 sec.
var query = (from r in WinCtx.PartsRoutings
join s in WinCtx.Tab_Processes on r.ProcessName equals s.ProcessName
join p in WinCtx.Tab_Parts on r.CustPartNum equals p.CustPartNum
select new { r}).ToList();
SQL Generated:
SELECT [ I omitted columns]
FROM [dbo].[PartsRouting] AS [Extent1]
INNER JOIN [dbo].[Tab_Processes] AS [Extent2] ON ([Extent1].[ProcessName] = [Extent2].[ProcessName]) OR (([Extent1].[ProcessName] IS NULL) AND ([Extent2].[ProcessName] IS NULL))
INNER JOIN [dbo].[Tab_Parts] AS [Extent3] ON ([Extent1].[CustPartNum] = [Extent3].[CustPartNum]) OR (([Extent1].[CustPartNum] IS NULL) AND ([Extent3].[CustPartNum] IS NULL))
PartsRouting Table has 100,000+ records, Parts = 15000+, Processes = 200.
I tried too many things found online but nothing worked for me as to how I can achieve the result with same performance of SQL.
Based on the comments, looks like the issue is caused by the additional OR with IS NULL conditions in joins generated by the EF SQL translator. They were added in EF in order to emulate the C# == operator semantics which are different from SQL = for NULL values.
You can start by turning that EF behavior off through UseDatabaseNullSemantics property (it's false by default):
WinCtx.Configuration.UseDatabaseNullSemantics = true;
Unfortunately that's not enough, because it fixes the normal comparison operators, but they simply forgot to do the same for join conditions.
In case you are using joins just for filtering (as it seems), you can replace them with LINQ Any conditions which translates to SQL EXISTS and nowadays database query optimizers are treating it the same way as if it was an inner join:
var query = (from r in WinCtx.PartsRoutings
where WinCtx.Tab_Processes.Any(s => r.ProcessName == s.ProcessName)
where WinCtx.Tab_Parts.Any(p => r.CustPartNum == p.CustPartNum)
select new { r }).ToList();
You might also consider using just select r since creating anonymous type with single property just introdeces additional memory overhead with no advantages.
Update: Looking at the latest comment, you do need fields from joined tables (that's why it's important to not omit relevant parts of the query in question). In such case, you could try the alternative join syntax with where clauses:
WinCtx.Configuration.UseDatabaseNullSemantics = true;
var query = (from r in WinCtx.PartsRoutings
from s in WinCtx.Tab_Processes where r.ProcessName == s.ProcessName
from p in WinCtx.Tab_Parts where r.CustPartNum == p.CustPartNum
select new { r, s.Foo, p.Bar }).ToList();
My ASP.Net application has the following Linq to SQL function to get a distinct list of height values from the product table.
public static List<string> getHeightList(string catID)
{
using (CategoriesClassesDataContext db = new CategoriesClassesDataContext())
{
var heightTable = (from p in db.Products
join cp in db.CatProducts on p.ProductID equals cp.ProductID
where p.Enabled == true && (p.CaseOnly == null || p.CaseOnly == false) && cp.CatID == catID
select new { Height = p.Height, sort = Convert.ToDecimal(p.Height.Replace("\"", "")) }).Distinct().OrderBy(s => s.sort);
List<string> heightList = new List<string>();
foreach (var s in heightTable)
{
heightList.Add(s.Height.ToString());
}
return heightList;
}
}
I ran Redgate SQL Monitor which shows that this query is using a lot of resources.
Redgate is also showing that I am running the following query:
select count(distinct [height]) from product p
join catproduct cp on p.productid = cp.productid
join cat c on cp.catid = c.catid
where p.enabled=1 and p.displayfilter = 1 and c.catid = 'C2-14'
My questions are:
A suggestion to change the function so that it uses less resources?
Also, how does linq to sql generate the above query from my function? (I did not write select count(distinct [height]) from product anywhere in the code)
There are 90,000 records in the products. This category which I am trying to get the distinct list of heights has 50,000 product records
Thank you in advance,
Nick
First of all your posted sql query and linq query doesn't match at all. it's not the LINQ query rather the underlying SQL query itself performing slow. Make sure, all the columns involved in JOIN ON clause and WHERE clause and ORDER BY clause are indexed properly in order to have a better execution plan; else you will end up getting a FULL Table Scan and a File Sort and query will deemed to perform slow.
The join multiplies the number of Products the query returns. To undo that, you apply Distinct at the end. It will certainly reduce db resources if you return unique Products right away:
var heightTable = (from p in db.Products
where p.CatProducts.Any(cp => cp.CatID == catID)
&& p.Enabled && (p.CaseOnly == null || !p.CaseOnly)
select new
{
Height = p.Height,
sort = Convert.ToDecimal(p.Height.Replace("\"", ""))
}).OrderBy(s => s.sort);
This changes the join into a where clause. It saves the db engine the trouble of deduplicating the result.
If that still performs poorly, you should try to do the conversion and ordering in memory, i.e. after receiving the raw results from the database.
As for the count. I don't know where it comes from. Such queries typically get generated by paging libraries such as PagedList, but I see no trace of that in your code.
Side note: you can return ...
heightList.Select(x => x.Height.ToString()).ToList()
... instead of creating the list yourself.
I've looked at several other questions related to correlated subqueries but it's still not clear to me how to accomplish what I need. I'm using Entity Framework and C#, and have a table called STEWARDSHIP with the following columns:
STEWARDSHIP_ID (the primary key)
SITE_ID
VISIT_DATE
VISIT_TYPE_ID
I need to identify cases where the same combination of SITE_ID, VISIT_DATE, VISIT_TYPE_ID exists more than once because it could represent a duplicate entry made by end users in error, and then I need to report on the details of these entries. In SQL I would do this by joining to the temporary result of a GROUP BY/HAVING like so:
SELECT * FROM stewardship AS s2,
(SELECT site_id, visit_type_id, CAST(visit_date AS DATE) AS visit_date
FROM stewardship
GROUP BY site_id, visit_type_id, CAST(visit_date AS DATE)
HAVING COUNT(*) > 1) AS s
WHERE s2.site_id = s.site_id
AND s2.visit_type_id = s.visit_type_id
AND CAST(s2.visit_date AS DATE) = s.visit_date
What's the best way to accomplish this in Linq?
Since you're open to a different approach that should be more performant, here is the new SQL to get what I think you're after.
select distinct s1.*
from stewardship s1
inner join stewardship s2 on
s1.stewardship_id <> s2.stewardship_id and
s1.site_id = s2.site_id and
s1.visit_type_id = s2.visit_type_id and
cast(s1.visit_date as date) = cast(s2.visit_date as date)
order by s1.site_id, s1.visit_type_id
Now, to translate that to LINQ, you can use the following statement.
var duplicates = (
from s in Stewardships
join s2 in Stewardships
on new { s.Site_id, s.Visit_type_id, s.Visit_date.Date } equals new { s2.Site_id, s2.Visit_type_id, s2.Visit_date.Date }
where s.Stewardship_id != s2.Stewardship_id
select s)
.Distinct()
.OrderBy(s => s.Site_id)
.ThenBy(s => s.Visit_type_id)
Note that you cannot use anything other than an equijoin for expression joins, so I had to put the non-equijoin (ensuring our matches aren't on the same record via PK) in the where expression. You could also accomplish this with lambdas via the Except() extension method.
The order by is there for readability of the results and to match the SQL statement above.
I hope this helps!
It would be fairly similar to what you've already got.
from s in context.stewardships
group s by new {s.site_id, s.visit_type_id, visit_date} into g
where g.Count() > 1
select g;
This would give you groups of stewardships with similar values. You could "flatten" those results with a SelectMany afterward, but you might find them more useful to work with in groups.
Note that you may need to use SqlFunctions or something to do the equivalent of the cast to date.
So I have a SQL query with the following structure:
select p.* from
(
select max([price]) as Max_Price,
[childId] as childNodeId
from [Items] group by [childId]
) as q inner join [Items] as p on p.[price] = q.[Max_Price] and p.[childId] = q.[childNodeId]
I need to recreate this query in NHibernate, using the Criteria API. I tried using the Subqueries API, but it seems to require that the inner query returns a single column to check equality with a property in the outer query. However, I return two. I've read that this can be accomplished via the HQL API, but I need to do it with Criteria API, as we're going to be dynamically generating queries like this on the fly. Can anyone steer me in the correct direction here?
I've managed to resolve a similar problem by slightly adapting the original sql query. I've ended up with something like this (pseudo sql code):
SELECT p.* FROM [Items] as p
WHERE EXISTS
(
SELECT [childId] as childNodeId FROM [Items] as q
WHERE p.[childId] = q.[childNodeId]
GROUP BY q.[childId]
HAVING p.[price] = MAX(q.[price])
)
And this is the QueryOver implementation:
var subquery = QueryOver.Of(() => q)
.SelectList(list => list.SelectGroup(() => q.ChildId))
.Where(Restrictions.EqProperty(
Projections.Property(() => p.Price),
Projections.Max(() => q.Price)))
.And(Restrictions.EqProperty(
Projections.Property(() => p.ChildId),
Projections.Property(() => q.ChildId)));
From here you only need to pass the aliases so that NHibernate can resolve entities correctly (pseudo code):
var filter = QueryOver.Of(() => p)
.WithSubquery.WhereExists(GetSubQuery(p, criteria...));
I hope this helps in your particular case.
UPDATE: Criteria API
var subquery = DetachedCriteria.For<Items>("q")
.SetProjection(Projections.ProjectionList()
.Add(Projections.GroupProperty("q.ChildId")))
.Add(Restrictions.EqProperty("p.Price", Projections.Max("q.Price")))
.Add(Restrictions.EqProperty("p.ChildId", "q.ChildId"));
var query = DetachedCriteria.For<Items>("p")
.Add(Subqueries.Exists(subquery));
Nevertheless I would recommend sticking to the QueryOver version, it's much more intuitive and you avoid magic strings (especially that you don't have to upgrade the NH version).
Please let me know if this is working for you.