I have Linq-to-SQL code that works with a many-to-many relationship, but note that the relationship itself has its own set of attributes (in this case, Products are in Many Categories, and each product-in-category relation has its own SortOrder attribute).
I have a Linq-to-SQL block that returns matching Products with Category membership information. When I execute the code it generates optimised T-SQL code like so:
exec sp_executesql N'SELECT [t0].[ProductId], [t0].[Name], [t1].[ProductId] AS [ProductId2], [t1].[CategoryId], [t1].[SortOrder] AS [SortOrder2], [t2].[CategoryId] AS [CategoryId2], [t2].[Name] AS [Name2] (
SELECT COUNT(*)
FROM [dbo].[ProductsInCategories] AS [t3]
INNER JOIN [dbo].[Categories] AS [t4] ON [t4].[CategoryId] = [t3].[CategoryId]
WHERE [t3].[ProductId] = [t0].[ProductId]
) AS [value]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN ([dbo].[ProductsInCategories] AS [t1]
INNER JOIN [dbo].[Categories] AS [t2] ON [t2].[CategoryId] = [t1].[CategoryId]) ON [t1].[ProductId] = [t0].[ProductId]
WHERE (([t0].[OwnerId]) = #p0) AND ([t0].[Visible] = 1)
ORDER BY [t0].[SortOrder], [t0].[Name], [t0].[ProductId], [t1].[CategoryId]',N'#p0 bigint',#p0=3
However, when I add paging instructions (i.e.".Skip(0).Take(50)") to the Linq expression the generated SQL becomes this:
exec sp_executesql N'SELECT TOP (50) [t0].[ProductId], [t0].[Name]
FROM [dbo].[Products] AS [t0]
WHERE (([t0].[OwnerId]) = #p0) AND ([t0].[Visible] = 1)
ORDER BY [t0].[SortOrder], [t0].[Name]',N'#p0 bigint',#p0=3
Which means the Category membership information isn't loaded anymore, so Linq-to-SQL then executes the manual loading code 50 times over (one for each member in the returned set):
exec sp_executesql N'SELECT [t0].[ProductId], [t0].[CategoryId], [t0].[SortOrder], [t1].[CategoryId] AS [CategoryId2], [t1].[Name]
FROM [dbo].[ProductsInCategories] AS [t0]
INNER JOIN [dbo].[Categories] AS [t1] ON [t1].[CategoryId] = [t0].[CategoryId]
WHERE [t0].[ProductId] = #x1',N'#x1 bigint',#x1=1141
(obviously the "#x1" ID parameter varies for each result from the original query).
So clearly Linq paging breaks the query and causes it to load data separately. Is there a way around this or should I do paging in my own software?
...fortunately the number of products in the database is small enough (<500) to do this, but it just feels dirty because there could be tens of thousands of products, and this just wouldn't be a good query.
EDIT:
Here is my Linq:
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Product>( p => p.ProductsInCategories );
dlo.LoadWith<ProductsInCategory>( pic => pic.Category );
this.LoadOptions = dlo;
query = from p in this.Products
select p;
// The lines below are added conditionally:
query = query.OrderBy( p => p.SortOrder ).ThenBy( p => p.Name );
query = query.Where( p => p.Visible );
query = query.Where( p => p.Name.Contains( filter ) || p.Description.Contains( filter ) );
query = query.Where( p => p.OwnerId == siteId );
The skip/take lines are added optionally, and are the only differences that cause the different T-SQL generation (as far as I know):
IQueryable<Product> query = GetProducts( siteId, category, filter, showHidden, sortBySortOrder );
///////////////////////////////////
total = query.Count();
var pagedProducts = query.Skip( pageIndex * pageSize ).Take( pageSize );
return pagedProducts;
An alternative answer which first pages the products and then selects products and categories in a parent-child structure would be like this:
var filter = "a";
var pageSize = 2;
var pageIndex = 1;
// get the correct products
var query = Products.AsQueryable();
query = query.Where (q => q.Name.Contains(filter));
query = query.OrderBy (q => q.SortOrder).ThenBy(q => q.Name);
// do paging
query = query.Skip(pageSize*pageIndex).Take(pageSize);
// now get products + categories as tree structure
var query2 = query.Select(
q=>new
{
q.Name,
Categories=q.ProductsInCategories.Select (pic => pic.Category)
});
Which produces a single SQL statement
-- Region Parameters
DECLARE #p0 NVarChar(1000) = '%a%'
DECLARE #p1 Int = 2
DECLARE #p2 Int = 2
-- EndRegion
SELECT [t2].[Name], [t4].[CategoryId], [t4].[Name] AS [Name2], [t4].[Visible], (
SELECT COUNT(*)
FROM (
SELECT [t5].[CategoryId]
FROM [ProductsInCategories] AS [t5]
WHERE [t5].[ProductId] = [t2].[ProductId]
) AS [t6]
INNER JOIN [Categories] AS [t7] ON [t7].[CategoryId] = [t6].[CategoryId]
) AS [value]
FROM (
SELECT [t1].[ProductId], [t1].[Name], [t1].[ROW_NUMBER]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[SortOrder], [t0].[Name], [t0].[ProductId]) AS [ROW_NUMBER], [t0].[ProductId], [t0].[Name]
FROM [Products] AS [t0]
WHERE [t0].[Name] LIKE #p0
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN #p1 + 1 AND #p1 + #p2
) AS [t2]
LEFT OUTER JOIN ([ProductsInCategories] AS [t3]
INNER JOIN [Categories] AS [t4] ON [t4].[CategoryId] = [t3].[CategoryId]) ON [t3].[ProductId] = [t2].[ProductId]
ORDER BY [t2].[ROW_NUMBER], [t3].[CategoryId], [t3].[ProductId]
Here is a workaround: you should construct your query based on all your conditions, perform ordering there but select only the primary key on your Product table (let's assume this is ProductId column).
The next step is to take the total count (to calculate rows should be skipped and taken),
and the last step is to select all the records from your Product table whose ProductIds are in the query (note: Skip and Take extension methods should be applied to query, not to the new select itself).
This will get you a SELECT statement similar to yours (from the first example) with related entities.
EDIT:
Just created a similar DB structure (according to the original SQL from the question):
Then used:
using (var db = new TestDataContext())
{
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Product>(p => p.ProductsInCategories);
options.LoadWith<ProductsInCategory>(pic => pic.Category);
db.LoadOptions = options;
var filter = "product";
var pageIndex = 1;
var pageSize = 10;
var query = db.Products
.OrderBy(p => p.SortOrder)
.ThenBy(p => p.Name)
.Where(p => p.Name.Contains(filter) || p.Description.Contains(filter))
.Select(p => p.ProductId);
var total = query.Count();
var products = db.Products
.Where(p => query.Skip(pageIndex * pageSize).Take(pageSize).Contains(p.ProductId))
.ToList();
}
After the .ToList() call, products variable hold products with product categories with categories. This also produced 2 SQL statement, one - for .Count() statement:
exec sp_executesql N'SELECT COUNT(*) AS [value]
FROM [dbo].[Products] AS [t0]
WHERE ([t0].[Name] LIKE #p0) OR ([t0].[Description] LIKE #p1)',N'#p0 nvarchar(4000),#p1 nvarchar(4000)',#p0=N'%product%',#p1=N'%product%'
and another one for .ToList():
exec sp_executesql N'SELECT [t0].[ProductId], [t0].[Name], [t0].[Description], [t0].[SortOrder], [t1].[ProductId] AS [ProductId2], [t1].[CategoryId], [t1].[SortOrder] AS [SortOrder2], [t2].[CategoryId] AS [CategoryId2], [t2].[Name] AS [Name2], (
SELECT COUNT(*)
FROM (
SELECT NULL AS [EMPTY]
FROM [dbo].[ProductsInCategories] AS [t6]
INNER JOIN [dbo].[Category] AS [t7] ON [t7].[CategoryId] = [t6].[CategoryId]
WHERE [t6].[ProductId] = [t0].[ProductId]
) AS [t8]
) AS [value]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN ([dbo].[ProductsInCategories] AS [t1]
INNER JOIN [dbo].[Category] AS [t2] ON [t2].[CategoryId] = [t1].[CategoryId]) ON [t1].[ProductId] = [t0].[ProductId]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT [t4].[ProductId]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t3].[SortOrder], [t3].[Name], [t3].[ProductId]) AS [ROW_NUMBER], [t3].[ProductId]
FROM [dbo].[Products] AS [t3]
WHERE ([t3].[Name] LIKE #p0) OR ([t3].[Description] LIKE #p1)
) AS [t4]
WHERE [t4].[ROW_NUMBER] BETWEEN #p2 + 1 AND #p2 + #p3
) AS [t5]
WHERE [t5].[ProductId] = [t0].[ProductId]
)
ORDER BY [t0].[ProductId], [t1].[CategoryId]',N'#p0 nvarchar(4000),#p1 nvarchar(4000),#p2 int,#p3 int',#p0=N'%product%',#p1=N'%product%',#p2=10,#p3=10
No more extra queries (as SQL Server Profiler said).
Related
Using C# and Linq to SQL, I found that my query with multiple where is orders of magnitude slower than with a single where / and.
Here is the query
using (TeradiodeDataContext dc = new TeradiodeDataContext())
{
var filterPartNumberID = 71;
var diodeIDsInBlades = (from bd in dc.BladeDiodes
select bd.DiodeID.Value).Distinct();
var diodesWithTestData = (from t in dc.Tests
join tt in dc.TestTypes on t.TestTypeID equals tt.ID
where tt.DevicePartNumberID == filterPartNumberID
select t.DeviceID.Value).Distinct();
var result = (from d in dc.Diodes
where d.DevicePartNumberID == filterPartNumberID
where diodesWithTestData.Contains(d.ID)
where !diodeIDsInBlades.Contains(d.ID)
orderby d.Name
select d);
var list = result.ToList();
// ~15 seconds
}
However, when the condition in the final query is this
where d.DevicePartNumberID == filterPartNumberID
& diodesWithTestData.Contains(d.ID)
& !diodeIDsInBlades.Contains(d.ID)
// milliseconds
it is very fast.
Comparing the SQL in result before calling ToList(), here are the queries (value 71 manually added in place of #params)
-- MULTIPLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t2].[value]
FROM (
SELECT [t1].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t1]
) AS [t2]
) AS [t3]
WHERE [t3].[value] = [t0].[ID]
))) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t4].[DeviceID] AS [value], [t5].[DevicePartNumberID]
FROM [dbo].[Test] AS [t4]
INNER JOIN [dbo].[TestType] AS [t5] ON [t4].[TestTypeID] = ([t5].[ID])
) AS [t6]
WHERE [t6].[DevicePartNumberID] = (71)
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)) AND ([t0].[DevicePartNumberID] = 71)
ORDER BY [t0].[Name]
and
-- SINGLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE ([t0].[DevicePartNumberID] = 71) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t3].[value]
FROM (
SELECT [t1].[DeviceID] AS [value], [t2].[DevicePartNumberID]
FROM [dbo].[Test] AS [t1]
INNER JOIN [dbo].[TestType] AS [t2] ON [t1].[TestTypeID] = ([t2].[ID])
) AS [t3]
WHERE [t3].[DevicePartNumberID] = (71)
) AS [t4]
WHERE [t4].[value] = [t0].[ID]
)) AND (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t5].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t5]
) AS [t6]
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)))
ORDER BY [t0].[Name]
The two SQL queries execute in < 1 second in SSMS and produce the same results.
So I'm wondering why the first is slower on the LINQ side. It's worrying to me because I know I've used multiple where elsewhere, without being aware of a such a severe performance impact.
This question even has answered with both multiple & and where. And this answer even suggests using multiple where clauses.
Can anyone explain why this happens in my case?
Because writing like this
if (someParam1 != 0)
{
myQuery = myQuery.Where(q => q.SomeField1 == someParam1)
}
if (someParam2 != 0)
{
myQuery = myQuery.Where(q => q.SomeField2 == someParam2)
}
is NOT(upd) the same as (in case when someParam1 and someParam2 != 0)
myQuery = from t in Table
where t.SomeField1 == someParam1
&& t.SomeField2 == someParam2
select t;
is (NOT deleted) the same as
myQuery = from t in Table
where t.SomeField1 == someParam1
where t.SomeField2 == someParam2
select t;
UPD
Yes, I do mistake. Second query is same, first is not same.
First and Second queries not EXACTLY the same. Let me show you what I mean.
1st query with lamda-expression writen as
t.Where(r => t.SomeField1 == someParam1 && t.SomeField2 == someParam2)
2nd query as
t.Where(r => r.SomeField1 == someParam1).Where(r => r.SomeField2 == someParam2)
In this case in generated SQL Predicate with SomeField2 goes first (it is important, see below)
In 1st case we getting this SQL:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField1 = :someParam1
AND t.SomeField2 = :someParam2
In 2 case the SQL is:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField2 = :someParam2
AND t.SomeField1 = :someParam1
As we see there are 2 'same' SQLs. As we see, the OP's SQLs are also 'same', they are different in order of predicates in WHERE clause (as in my example). And I guess that SQL optimizer generate 2 different execution plans and may be(!!!) doing NOT EXISTS, then EXISTS and then filtering take more time than do first filtering and after that do EXISTS and NOT EXISTS
UPD2
It is a 'problem' of Linq Provider (ORM). I'm using another ORM (linq2db), and it generates for me EXACTLY the same SQLs in both cases.
I have the following query that works fine
var myList = (from p in db.full
group p by p.object into g
orderby g.Count() descending
select new StringIntType
{
str = g.Key,
nbr = g.Count()
}).Take(50).ToList();
The problem is that it's a bit slow due to the fact that i'm using count(), which is translated to count(*).
I need to know if is there a way to use count(object),
Here is what i got in sql server profiler
exec sp_executesql N'SELECT TOP (50)
[Project1].[C2] AS [C1],
[Project1].[object] AS [object],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [object],
1 AS [C2]
FROM ( SELECT
[Extent1].[object], AS [K1],
COUNT(1) AS [A1]
FROM (SELECT
[full].[mc_host_class] AS [mc_host_class],
[full].[event_handle] AS [event_handle],
[full].[mc_host_address] AS [mc_host_address],
[full].[mc_object_class] AS [mc_object_class],
[full].[mc_object] AS [mc_object],
[full].[mc_incident_time] AS [mc_incident_time],
[full].[date_reception] AS [date_reception],
[full].[status] AS [status],
[full].[mc_owner] AS [mc_owner],
[full].[msg] AS [msg],
[full].[duration] AS [duration],
[full].[repeat_count] AS [repeat_count],
[full].[mc_date_modification] AS [mc_date_modification],
[full].[event_class] AS [event_class],
[full].[bycn_ticket_remedy] AS [bycn_ticket_remedy],
[full].[mc_host] AS [mc_host],
[full].[acknowledge_by] AS [acknowledge_by],
[full].[acknowledge_by_time] AS [acknowledge_by_time],
[full].[assigned_by] AS [assigned_by],
[full].[assigned_to] AS [assigned_to],
[full].[assigned_by_time] AS [assigned_by_time],
[full].[closed_b y] AS [closed_by],
[full].[closed_by_time] AS [closed_by_time],
[full].[blacked_out] AS [blacked_out],
[full].[bycn_liaison_type] AS [bycn_liaison_type],
[full].[bycn_liaison_debit] AS [bycn_liaison_debit],
[full].[cause] AS [cause],
[full].[mc_location] AS [mc_location],
[full].[mc_parameter] AS [mc_parameter]
FROM [dbo].[full] AS [full]) AS [Extent1]
GROUP BY [Extent1].[object],
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC',N'#p__linq__0 datetime2(7),#p__linq__1 datetime2(7)',#p__linq__0='2015-03-14 00:00:00',#p__linq__1='2015-04-15 00:00:00'
Perhaps couple optimisations can do the trick:
Do the take first, before selecting
Count groups only once using a let keyword
So metacode (this code written in notepad and won't compile!)
var topFifty = (
from p in db.full
group p by p.object into g
let groupedCount = g.Count()
orderby groupedCount descending
select p.key, groupedCount
)
.Take(50).ToList();
var topFifty.Select(x => new StringIntType
{
str = x.Key,
nbr = x.Count
}).ToList();
I am having a query i need this to be as a plain sql query can any one help me. Also is there any alternate way with out using NorthWindDatacontext as per in the code mentioneedMy code is as follows
private void FetchData(int take, int pageSize)
{
using (NorthwindDataContext dc = new NorthwindDataContext())
{
var query = from p in dc.Customers
.OrderBy(o => o.ContactName)
.Take(take)
.Skip(pageSize)
select new
{
ID = p.CustomerID,
Name = p.ContactName,
Count = dc.Customers.Count()
};
PagedDataSource page = new PagedDataSource();
page.AllowCustomPaging = true;
page.AllowPaging = true;
page.DataSource = query;
page.PageSize = 10;
Repeater1.DataSource = page;
Repeater1.DataBind();
if (!IsPostBack)
{
RowCount = query.First().Count;
CreatePagingControl();
}
}
}
I tried as per Jon but i am unable to get the desired result can any one help me
(SELECT [t1].[CustomerID] AS [ID], [t1].[ContactName] AS [Name], (
SELECT COUNT(*)
FROM [dbo].[Customers] AS [t2]
) AS [Count]
FROM (
SELECT TOP (10) [t0].[CustomerID], [t0].[ContactName]
FROM [dbo].[Customers] AS [t0]
ORDER BY [t0].[ContactName]
) AS [t1]
ORDER BY [t1].[ContactName]
The simplest way in my view is to see what LINQ to SQL is doing.
Use DataContext.Log to write out the query information to a log of some kind (e.g. to a StringWriter whose contents you can print out afterwards). It won't always be the cleanest SQL possible, but it's a good starting point IMO.
I realise this doesn't answer the question for your specific query, but it's trying to teach you to fish as opposed to giving you a fish, as it were :)
EDIT: Using LINQPad as suggested by RoccoC5 in comments is another great way of seeing the SQL generated for a LINQ query.
Here's the SQL produced by your LINQ query, per LINQPad, where #p0 is the number of rows to Skip:
SELECT [t3].[CustomerID] AS [ID], [t3].[ContactName] AS [Name], (
SELECT COUNT(*)
FROM [Customers] AS [t4]
) AS [Count]
FROM (
SELECT [t2].[CustomerID], [t2].[ContactName], [t2].[ROW_NUMBER]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t1].[ContactName]) AS [ROW_NUMBER], [t1].[CustomerID], [t1].[ContactName]
FROM (
SELECT TOP (10) [t0].[CustomerID], [t0].[ContactName]
FROM [Customers] AS [t0]
ORDER BY [t0].[ContactName]
) AS [t1]
) AS [t2]
WHERE [t2].[ROW_NUMBER] > #p0
) AS [t3]
ORDER BY [t3].[ROW_NUMBER]
I have a bit complex linq2sql query, it doesn't contain any 'order by' statements, but somehow linq2sql thinks it is necessery and inserts it. Unfortunately this 'order by' statement hurts performance and I don't know how to remove it...
Here's the linq2sql expressions (I don't think that they might help, but anyway...)
var lockedBy = Guid.NewGuid();
var locks = LinqDataContext.Instance.LockBranches(3, lockedBy, DateTime.Now + TimeSpan.FromMinutes(3)); // Stored procedure...
var x = LinqDataContext.Instance.Branches
.Where(branch => branch.LockedBy == lockedBy)
.Select
(
branch => new
{
Branch = branch,
Leaves = branch.Leaves
.Select
(
leaf => new
{
Leaf = leaf,
Estimate = leaf.Representation.Estimates
.GroupBy(estimate=>estimate.SegmentID)
.Sum
(
estimatesBySegment => estimatesBySegment.Average
(
estimate => estimate.EstimateRequests.Average
(
estimateRequest => estimateRequest.EstimateSubmit.Value
)
)
)
}
)
}
);
It renders to the following sql
SELECT [t0].[ID], [t0].[IndexID], [t0].[IndexNo], [t0].[Sealed], [t0].[LockedBy], [t0].[UnlockOn], [t1].[ID] AS [ID2], [t1].[BranchID], [t1].[RepresentationID], [t1].[Xml], (
SELECT SUM([t7].[value])
FROM (
SELECT AVG([t6].[value]) AS [value]
FROM (
SELECT (
SELECT AVG([t5].[Value])
FROM [dbo].[EstimateRequests] AS [t4]
LEFT OUTER JOIN [dbo].[EstimateSubmits] AS [t5] ON [t5].[EstimateRequestID] = [t4].[ID]
WHERE [t4].[EstimateID] = [t3].[ID]
) AS [value], [t2].[ID], [t3].[RepresentationID], [t3].[SegmentID]
FROM [dbo].[Representations] AS [t2], [dbo].[Estimates] AS [t3]
) AS [t6]
WHERE ([t6].[ID] = [t1].[RepresentationID]) AND ([t6].[RepresentationID] = [t6].[ID])
GROUP BY [t6].[SegmentID]
) AS [t7]
) AS [Estimate], (
SELECT COUNT(*)
FROM [dbo].[Leaves] AS [t8]
WHERE [t8].[BranchID] = [t0].[ID]
) AS [value]
FROM [dbo].[Branches] AS [t0]
LEFT OUTER JOIN [dbo].[Leaves] AS [t1] ON [t1].[BranchID] = [t0].[ID]
WHERE [t0].[LockedBy] = #p0
ORDER BY [t0].[ID], [t1].[ID] <-- Here's the unnecessary ORDER BY
The order by is needed to be able to distinguish the child collection Leaves. (Every time you have a child collection there is an order by)
I am trying to convert a SQL query to LINQ. Somehow my count(distinct(x)) logic does not seem to be working correctly. The original SQL is quite efficient(or so i think), but the generated SQL is not even returning the correct result.
I am trying to fix this LINQ to do what the original SQL is doing, AND in an efficient way as the original query is doing. Help here would be really apreciated as I am stuck here :(
SQL which is working and I need to make a comparable LINQ of:
SELECT [t1].[PersonID] AS [personid]
FROM [dbo].[Code] AS [t0]
INNER JOIN [dbo].[phonenumbers] AS [t1] ON [t1].[PhoneCode] = [t0].[Code]
INNER JOIN [dbo].[person] ON [t1].[PersonID]= [dbo].[Person].PersonID
WHERE ([t0].[codetype] = 'phone') AND (
([t0].[CodeDescription] = 'Home') AND ([t1].[PhoneNum] = '111')
OR
([t0].[CodeDescription] = 'Work') AND ([t1].[PhoneNum] = '222') )
GROUP BY [t1].[PersonID] HAVING COUNT(DISTINCT([t1].[PhoneNum]))=2
The LINQ which I made is approximately as below:
var ids = context.Code.Where(predicate);
var rs = from r in ids
group r by new { r.phonenumbers.person.PersonID} into g
let matchcount=g.Select(p => p.phonenumbers.PhoneNum).Distinct().Count()
where matchcount ==2
select new
{
personid = g.Key
};
Unfortunately, the above LINQ is NOT generating the correct result, and is actually internally getting generated to the SQL shown below. By the way, this generated query is also reading ALL the rows(about 19592040) around 2 times due to the COUNTS :( Wich is a big performance issue too. Please help/point me to the right direction.
Declare #p0 VarChar(10)='phone'
Declare #p1 VarChar(10)='Home'
Declare #p2 VarChar(10)='111'
Declare #p3 VarChar(10)='Work'
Declare #p4 VarChar(10)='222'
Declare #p5 VarChar(10)='2'
SELECT [t9].[PersonID], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t13].[PhoneNum]
FROM [dbo].[Code] AS [t10]
INNER JOIN [dbo].[phonenumbers] AS [t11] ON [t11].[PhoneType] = [t10].[Code]
INNER JOIN [dbo].[Person] AS [t12] ON [t12].[PersonID] = [t11].[PersonID]
INNER JOIN [dbo].[phonenumbers] AS [t13] ON [t13].[PhoneType] = [t10].[Code]
WHERE ([t9].[PersonID] = [t12].[PersonID]) AND ([t10].[codetype] = #p0) AND ((([t10].[codetype] = #p1) AND ([t11].[PhoneNum] = #p2)) OR (([t10].[codetype] = #p3) AND ([t11].[PhoneNum] = #p4)))
) AS [t14]
) AS [cnt]
FROM (
SELECT [t3].[PersonID], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t7].[PhoneNum]
FROM [dbo].[Code] AS [t4]
INNER JOIN [dbo].[phonenumbers] AS [t5] ON [t5].[PhoneType] = [t4].[Code]
INNER JOIN [dbo].[Person] AS [t6] ON [t6].[PersonID] = [t5].[PersonID]
INNER JOIN [dbo].[phonenumbers] AS [t7] ON [t7].[PhoneType] = [t4].[Code]
WHERE ([t3].[PersonID] = [t6].[PersonID]) AND ([t4].[codetype] = #p0) AND ((([t4].[codetype] = #p1) AND ([t5].[PhoneNum] = #p2)) OR (([t4].[codetype] = #p3) AND ([t5].[PhoneNum] = #p4)))
) AS [t8]
) AS [value]
FROM (
SELECT [t2].[PersonID]
FROM [dbo].[Code] AS [t0]
INNER JOIN [dbo].[phonenumbers] AS [t1] ON [t1].[PhoneType] = [t0].[Code]
INNER JOIN [dbo].[Person] AS [t2] ON [t2].[PersonID] = [t1].[PersonID]
WHERE ([t0].[codetype] = #p0) AND ((([t0].[codetype] = #p1) AND ([t1].[PhoneNum] = #p2)) OR (([t0].[codetype] = #p3) AND ([t1].[PhoneNum] = #p4)))
GROUP BY [t2].[PersonID]
) AS [t3]
) AS [t9]
WHERE [t9].[value] = #p5
Thanks!
I think the issue might be new { r.phonenumbers.person.PersonID}.
Why are you newing up a new object here rather than just grouping by r.phonenumbers.person directly? new {} is going to be a different object every time which will never group.
After grouping by person I would Select a group => new {person = group.person, phoneNumbers = group.person.phonenumbers} and then perform the check for how many phone numbers they have and then any final projection.
Drats! It looks like the fault was on my side(GIGO principle!)
In my ORM, I had created the associations from right to left, instead of the other way round. I think that was the issue.
Only issue left now is that somehow the LINQ is generating one INNER JOIN twice, and that is keeping the final result to be retrieved correctly. If i comment it out i nthe generated sql, i am getting correct result. That's the only issue now and I guess i'll open a new question for that. Thanks for your time!.