Forcing LINQ to perform inner joins - c#

I'm trying to force LINQ to perform an inner join between two tables. Here is an example.
CREATE TABLE [dbo].[People] (
[PersonId] [int] NOT NULL,
[Name] [nvarchar](MAX) NOT NULL,
[UpdatedDate] [smalldatetime] NOT NULL
... Other fields ...
)
CREATE TABLE [dbo].[CompanyPositions] (
[CompanyPositionId] [int] NOT NULL,
[CompanyId] [int] NOT NULL,
[PersonId] [int] NOT NULL,
... Other fields ...
)
I'm working with an unusual database: for reasons beyond my control, a person can be missing from the People table yet still have records in CompanyPositions. I want to filter out CompanyPositions rows whose person is missing by joining the two tables.
return (from pos in CompanyPositions
join p in People on pos.PersonId equals p.PersonId
select pos).ToList();
LINQ sees this join as redundant and removes it from the SQL it generates:
SELECT
[Extent1].[CompanyPositionId] AS [CompanyPositionId],
[Extent1].[CompanyId] AS [CompanyId],
....
FROM [dbo].[CompanyPositions] AS [Extent1]
However, it's not redundant in my case. I can work around it like this:
// The min date check will always be true, here to force linq to perform the inner join
var minDate = DateTimeExtensions.SqlMinSmallDate;
return (from pos in CompanyPositions
join p in People on pos.PersonId equals p.PersonId
where p.UpdatedDate >= minDate
select pos).ToList();
However, this adds a needless WHERE clause to my SQL. As a purist I'd like to remove it. Any ideas, or does the current database design tie my hands?

Since PersonId is declared NOT NULL (and I assume it is declared as a foreign key to People), I'm not sure how you could have a CompanyPosition whose person does not exist. LINQ can't see how either, which is why, as you have observed, it considers the join redundant.

If you're using LINQ to SQL, you can use LoadWith, similar to this:
var context = new MyDataContext();
var options = new DataLoadOptions();
options.LoadWith<People>(x => x.CompanyPositions);
context.LoadOptions = options;

I don't know how to force LINQ to use a join, but the following statement should give you the required result:
return (from pos in CompanyPositions
where (from p in People select p.PersonId).Contains(pos.PersonId)
select pos).ToList();

Client-side transformation (select both sides so the join is kept, then discard the person in memory):
(
from pos in CompanyPositions
join p in People on pos.PersonId equals p.PersonId
select new {pos, p}
).ToList().Select(x => x.pos);
More direct filtering:
from pos in CompanyPositions
where pos.People.Any()
select pos
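The approaches above correspond to three SQL shapes: an inner join, an IN subquery (Contains), and an EXISTS check (Any). As a minimal sketch of why each one drops the orphaned positions, the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows and the helper names build_db and run_all are made up for illustration, and the SQL is hand-written, not what any particular LINQ provider is guaranteed to emit.

```python
import sqlite3

def build_db():
    """Create an in-memory database with one orphaned position (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE People (PersonId INTEGER NOT NULL, Name TEXT NOT NULL);
        CREATE TABLE CompanyPositions (
            CompanyPositionId INTEGER NOT NULL,
            CompanyId INTEGER NOT NULL,
            PersonId INTEGER NOT NULL
        );
        INSERT INTO People VALUES (1, 'Ann'), (2, 'Bob');
        -- Position 12 points at PersonId 99, which has no People row.
        INSERT INTO CompanyPositions VALUES (10, 1, 1), (11, 1, 2), (12, 1, 99);
    """)
    return conn

QUERIES = {
    # What the provider emitted once it dropped the "redundant" join.
    "no_join": """
        SELECT pos.CompanyPositionId FROM CompanyPositions AS pos
        ORDER BY pos.CompanyPositionId""",
    # The join the asker wanted.
    "inner_join": """
        SELECT pos.CompanyPositionId FROM CompanyPositions AS pos
        INNER JOIN People AS p ON p.PersonId = pos.PersonId
        ORDER BY pos.CompanyPositionId""",
    # Roughly the shape of the Contains() suggestion.
    "in_subquery": """
        SELECT pos.CompanyPositionId FROM CompanyPositions AS pos
        WHERE pos.PersonId IN (SELECT p.PersonId FROM People AS p)
        ORDER BY pos.CompanyPositionId""",
    # Roughly the shape of the Any() suggestion.
    "exists": """
        SELECT pos.CompanyPositionId FROM CompanyPositions AS pos
        WHERE EXISTS (SELECT 1 FROM People AS p WHERE p.PersonId = pos.PersonId)
        ORDER BY pos.CompanyPositionId""",
}

def run_all():
    """Run every query and return {query name: list of CompanyPositionId}."""
    conn = build_db()
    try:
        return {name: [row[0] for row in conn.execute(sql)]
                for name, sql in QUERIES.items()}
    finally:
        conn.close()
```

With this data only the query without a join returns position 12. The inner join gives the same rows as the other two filters only while PersonId is unique in People; if it were duplicated, the join would repeat positions, whereas IN and EXISTS would not.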

Related

Sql query to get newest comment added joined 3 tables

I am trying to write a SQL query that returns the Registration_Timestamp of the newest comment for a given category id.
I have three tables (shown below with only the relevant fields):
Ctm_Comments{
Id,
Page_ID,
Registration_Timestamp
}
Ctm_Forum_Categories{
Id
}
Ctm_Forum_Posts{
Id,
FK_Category_ID
}
I have tried the following, but it returns zero results:
var query = from p in Ctm_Forum_Posts
join c in Ctm_Forum_Categories on p.FK_Categori_ID equals c.Id
join ctm in Ctm_Comments on p.Id equals ctm.Page_ID
where c.Id == 1
select ctm.Reqistration_timestamp;
SQL queries like these are not my strong suit, so I hope someone here can help.
Edit: I ended up with this, based on the accepted answer:
var query = (from comments in Ctm_Comments
join posts in Ctm_Forum_Posts on comments.Page_ID equals posts.Id
join category in Ctm_Forum_Categories on posts.FK_Categori_ID equals category.Id
where category.Id == 1
orderby comments.Reqistration_timestamp descending
select comments.Reqistration_timestamp).FirstOrDefault();

The SQL (MS SQL) query you need is:
SELECT TOP 1 [Registration_Timestamp]
FROM [dbo].[Ctm_Comments] AS C
INNER JOIN [dbo].[Ctm_Forum_Posts] AS P ON C.Page_ID = P.Id
INNER JOIN [dbo].[Ctm_Forum_Categories] AS CAT ON CAT.Id = P.Category_ID
WHERE CAT.Id = 1
ORDER BY C.Registration_Timestamp DESC
This assumes that Page_ID in the comments table holds the post Id. Otherwise, the comments table is missing a post Id column, which should be the join column.
Run the script below in SQL Server Management Studio to verify:
CREATE TABLE [dbo].[Ctm_Comments] ( [Id] [int] NULL,[Page_ID] [int] NULL,[Registration_Timestamp] [datetime] NULL) ON [PRIMARY]
CREATE TABLE [dbo].[Ctm_Forum_Categories] ( [Id] [int] NULL) ON [PRIMARY]
CREATE TABLE [dbo].[Ctm_Forum_Posts] ( [Id] [int] NULL,[Category_ID] [int] NULL) ON [PRIMARY]
INSERT INTO [dbo].[Ctm_Comments] VALUES (1, 1, '2020-10-23 13:12:55')
INSERT INTO [dbo].[Ctm_Comments] VALUES (2, 1, '2020-10-26 12:12:55')
INSERT INTO [dbo].[Ctm_Comments] VALUES (3, 1, '2020-10-26 12:25:55')
INSERT INTO [dbo].[Ctm_Comments] VALUES (4, 1, '2020-10-26 13:12:55')
INSERT INTO [dbo].[Ctm_Forum_Categories] VALUES (1)
INSERT INTO [dbo].[Ctm_Forum_Posts] VALUES (1, 1)
SELECT TOP 1 [Registration_Timestamp]
FROM [dbo].[Ctm_Comments] AS C
INNER JOIN [dbo].[Ctm_Forum_Posts] AS P ON C.Page_ID = P.Id
INNER JOIN [dbo].[Ctm_Forum_Categories] AS CAT ON CAT.Id = P.Category_ID
WHERE CAT.Id = 1
ORDER BY C.Registration_Timestamp DESC
DROP TABLE [dbo].[Ctm_Comments]
DROP TABLE [dbo].[Ctm_Forum_Categories]
DROP TABLE [dbo].[Ctm_Forum_Posts]
The result is 2020-10-26 13:12:55.000

Once you fix the "my query returns 0 results" part, I'd suggest something like this:
var mostRecentCommentTimestamp = query.Max();
But as you've only selected timestamps, this can only tell you the max timestamp, nothing else about the comment.
If you want the whole most recent comment, swap the select for an orderby descending on the timestamp and take the first*, or install MoreLINQ and use its MaxBy.
*Edit, like this:
var query = from p in Ctm_Forum_Posts
join c in Ctm_Forum_Categories on p.FK_Categori_ID equals c.Id
join ctm in Ctm_Comments on p.Id equals ctm.Page_ID
where c.Id == 1
orderby ctm.Reqistration_timestamp descending
select ctm;
var firstComment = query.First();
All this said, at the moment you say your query produces no results; you need to fix that first (the joins are wrong, there is no category 1, or the database is missing data) before you can take a max or an ordering of anything.
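To make the two options concrete (ordering descending and taking the first row, versus taking the maximum), here is a small sketch using Python's built-in sqlite3 module with the sample rows from the verification script above. It assumes, as that answer does, that Page_ID holds the post Id; the function names are made up, and LIMIT 1 is SQLite's counterpart of TOP 1.

```python
import sqlite3

def build_forum_db():
    """In-memory copy of the three tables with the sample rows used above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Ctm_Forum_Categories (Id INTEGER);
        CREATE TABLE Ctm_Forum_Posts (Id INTEGER, FK_Category_ID INTEGER);
        CREATE TABLE Ctm_Comments (
            Id INTEGER, Page_ID INTEGER, Registration_Timestamp TEXT
        );
        INSERT INTO Ctm_Forum_Categories VALUES (1);
        INSERT INTO Ctm_Forum_Posts VALUES (1, 1);
        INSERT INTO Ctm_Comments VALUES
            (1, 1, '2020-10-23 13:12:55'),
            (2, 1, '2020-10-26 12:12:55'),
            (3, 1, '2020-10-26 12:25:55'),
            (4, 1, '2020-10-26 13:12:55');
    """)
    return conn

# Shared FROM/JOIN/WHERE part; assumes Page_ID is the post Id.
JOINS = """
    FROM Ctm_Comments AS c
    INNER JOIN Ctm_Forum_Posts AS p ON c.Page_ID = p.Id
    INNER JOIN Ctm_Forum_Categories AS cat ON cat.Id = p.FK_Category_ID
    WHERE cat.Id = ?
"""

def newest_by_order(conn, category_id):
    """ORDER BY ... DESC LIMIT 1: returns None when no row matches."""
    sql = ("SELECT c.Registration_Timestamp" + JOINS +
           "ORDER BY c.Registration_Timestamp DESC LIMIT 1")
    row = conn.execute(sql, (category_id,)).fetchone()
    return row[0] if row else None

def newest_by_max(conn, category_id):
    """MAX(): always yields one row, holding NULL (None) when nothing matches."""
    sql = "SELECT MAX(c.Registration_Timestamp)" + JOINS
    return conn.execute(sql, (category_id,)).fetchone()[0]
```

Both functions return the same timestamp for category 1 and None for a category with no comments. On the C# side the empty case is where First() and FirstOrDefault() differ: First() throws on an empty sequence, FirstOrDefault() returns the default value.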

EntityFramework generating poor tsql (linqpad generates much better)

I have a database structure as below:
Family (1) ----- (*) FamilyPersons ----- (1) Person (1) ----- (*) Expenses (1) ----- (0..1) GroceriesDetails
To explain the relations: a Family can have one or more persons, with a mapping table FamilyPersons between Family and Person. Each person can enter expenses, which go into the Expenses table. The Expenses table has a column ExpenseType (groceries, entertainment, etc.),
and the details of each expense type go into their own table, so we have a GroceriesDetails table (and similar tables for the other types), with a 1 to 0..1 relation between Expenses and GroceriesDetails.
Now I am writing a query to get the complete GroceriesDetails for a family:
GroceriesDetails.Where (g => g.Expenses.Person.FamilyPersons.Any(fp =>
fp.FamilyId == 1) && g.Expenses.ExpenseType == "GC" )
For this, the SQL generated by EF is:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Amount] AS [Amount]
FROM [dbo].[GroceriesDetails] AS [Extent1]
INNER JOIN (SELECT [Extent3].[Id] AS [Id1]
FROM [dbo].[Expenses] AS [Extent2]
INNER JOIN [dbo].[GroceriesDetails] AS [Extent3] ON [Extent2].[Id] = [Extent3].[Id]
WHERE N'GC' = [Extent2].[ExpenseType] ) AS [Filter1] ON [Extent1].[Id] = [Filter1].[Id1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Expenses] AS [Extent4]
INNER JOIN [dbo].[GroceriesDetails] AS [Extent5] ON [Extent4].[Id] = [Extent5].[Id]
INNER JOIN [dbo].[FamilyPerson] AS [Extent6] ON [Extent4].[PersonId] = [Extent6].[PersonId]
WHERE ([Extent1].[Id] = [Extent5].[Id]) AND (1 = [Extent6].[FamilyId])
)
In this query there is a full table join between the Expenses and GroceriesDetails tables, which is causing performance issues.
LINQPad, on the other hand, generates much better SQL:
SELECT [t0].[Id], [t0].[Amount]
FROM [GroceriesDetails] AS [t0]
INNER JOIN [Expenses] AS [t1] ON [t1].[Id] = [t0].[Id]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Expenses] AS [t2]
INNER JOIN [Person] AS [t3] ON [t3].[Id] = [t2].[PersonId]
CROSS JOIN [FamilyPerson] AS [t4]
WHERE ([t4].[FamilyId] = @p0) AND ([t2].[Id] = [t0].[Id]) AND ([t4].[PersonId] = [t3].[Id])
)) AND ([t1].[ExpenseType] = @p1)
Please note that we are using WCF Data Services, so this query is written against a WCF Data Services reference. I can't traverse from the top (Family) to the bottom (GroceriesDetails) because OData allows only one level of select.
Any help on optimizing this code is appreciated.

From the comments I learned that LINQPad uses LINQ to SQL while the app uses EF, which explains the difference.
The thing is that you have zero control over how EF generates SQL.
The only thing you can do is rewrite your LINQ query to bring it "closer" to the desired SQL.
For example, instead of
GroceriesDetails.Where (g => g.Expenses.Person.FamilyPersons.Any(fp => fp.FamilyId == 1)
&& g.Expenses.ExpenseType == "GC" )
you can try to write something like this (untested sketch; the join keys are taken from the generated SQL in the question):
from g in GroceriesDetails
join e in Expenses on g.Id equals e.Id
join p in Persons on e.PersonId equals p.Id
join f in FamilyPersons on p.Id equals f.PersonId
where f.FamilyId == 1 && e.ExpenseType == "GC"
select g
It almost always helps, as it gives the ORM a straightforward way to translate the query into SQL. The idea is that the expression tree in the original case is more complex compared to the proposed one, and by simplifying the expression tree we make the translator's job easier.
But besides manipulating the LINQ query, there is no control over how the SQL is generated from the expression tree.
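Before swapping the Any() form for flat joins, it is worth checking that the two shapes return the same rows. The sketch below does that with Python's built-in sqlite3 module and hand-written SQL. The table layout follows the question (GroceriesDetails shares its Id with Expenses), while the sample rows, the function names and the duplicate_mapping switch are invented for illustration.

```python
import sqlite3

# Shape of the original query: filter with EXISTS (what Any() usually maps to).
EXISTS_SQL = """
    SELECT g.Id FROM GroceriesDetails AS g
    INNER JOIN Expenses AS e ON e.Id = g.Id
    WHERE e.ExpenseType = 'GC'
      AND EXISTS (SELECT 1 FROM FamilyPerson AS fp
                  WHERE fp.PersonId = e.PersonId AND fp.FamilyId = ?)
    ORDER BY g.Id
"""

# Shape of the suggested rewrite: plain inner joins all the way down.
FLAT_JOIN_SQL = """
    SELECT g.Id FROM GroceriesDetails AS g
    INNER JOIN Expenses AS e ON e.Id = g.Id
    INNER JOIN Person AS p ON p.Id = e.PersonId
    INNER JOIN FamilyPerson AS fp ON fp.PersonId = p.Id
    WHERE fp.FamilyId = ? AND e.ExpenseType = 'GC'
    ORDER BY g.Id
"""

def build_expenses_db(duplicate_mapping=False):
    """Hypothetical data: persons 1 and 2 are in family 1, person 3 in family 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (Id INTEGER);
        CREATE TABLE FamilyPerson (FamilyId INTEGER, PersonId INTEGER);
        CREATE TABLE Expenses (Id INTEGER, PersonId INTEGER, ExpenseType TEXT);
        CREATE TABLE GroceriesDetails (Id INTEGER, Amount REAL);
        INSERT INTO Person VALUES (1), (2), (3);
        INSERT INTO FamilyPerson VALUES (1, 1), (1, 2), (2, 3);
        INSERT INTO Expenses VALUES (1, 1, 'GC'), (2, 2, 'GC'), (3, 3, 'GC'), (4, 1, 'EN');
        INSERT INTO GroceriesDetails VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    """)
    if duplicate_mapping:
        # A repeated (FamilyId, PersonId) row, to show where the shapes diverge.
        conn.execute("INSERT INTO FamilyPerson VALUES (1, 1)")
    return conn

def grocery_ids(conn, sql, family_id=1):
    """Return the GroceriesDetails ids selected by the given query."""
    return [row[0] for row in conn.execute(sql, (family_id,))]
```

On clean data both queries return the same ids. If the mapping table can contain the same (FamilyId, PersonId) pair twice, the flat join repeats rows while EXISTS does not, so the rewrite is only a drop-in replacement when that pair is unique.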

Pulling Data from 1 DataContext to use in another's Method

I have been trying to convert this SQL statement to LINQ and am having trouble because part of the data returned lives in a separate database (DataContext) from the rest. I am pretty sure this can be overcome, but I have not managed it or found examples of previous successful attempts.
Can someone offer some guidance on how to overcome that hurdle? Thanks.
SELECT p.PersonID, p.FirstName, p.MiddleName, p.LastName, cp.EnrollmentID, cp.EnrollmentDate, cp.DisenrollmentDate
FROM [Connect].dbo.tblPerson AS p
INNER JOIN (
SELECT c.ClientID, c.EnrollmentID, c.EnrollmentDate, c.DisenrollmentDate
FROM [CMO].dbo.tblCMOEnrollment AS c
LEFT OUTER JOIN [CMO].dbo.tblWorkerHistory AS wh
ON c.EnrollmentID = wh.EnrollmentID
INNER JOIN [CMO].dbo.tblStaffExtended AS se
ON wh.Worker = se.StaffID
WHERE (wh.EndDate IS NULL OR wh.EndDate >= getdate())
AND wh.Worker = --WorkerGUID Param here
) AS cp
ON p.PersonID = cp.ClientID
ORDER BY p.PersonID
I asked a similar question here before and was told I would need to create a view to accomplish this. Is that still true, or was it ever?

I use LINQPad for a lot of my LINQ to SQL work. One of its features is the use of multiple data contexts in one query.
For instance, here is some code that I wrote in LINQPad:
from template in RateTemplates
where
template.Policies.Any(p =>
Staging_history.Changes.Any(c =>
(c.Policies.Any(cp => cp.PolicyID == p.PolicyID) ||
c.PolicyFees.Any(cpf => cpf.PolicyID == p.PolicyID) ||
c.PolicyOptions.Any(cpo => cpo.PolicyID == p.PolicyID)) &&
c.ChangeTime > new DateTime(2012, 1, 11)
)
)
select new
{
TemplateID = template.ID,
UserID = template.UserID,
PropertyIDs = template.Properties.Select(ppty => ppty.PropertyID)
}
The table "RateTemplates" is part of my first data context (in LINQPad you do not have to name the first data context in your code, it is just assumed; in C# you would need to say explicitly which context to use). "Staging_history" is the second data context, and I am using its "Changes" table.
LINQ to SQL does all sorts of magic in the background, and the resulting SQL that gets executed is:
-- Region Parameters
DECLARE @p0 DateTime = '2012-01-11 00:00:00.000'
-- EndRegion
SELECT [t0].[ID] AS [TemplateID], [t0].[UserID], [t1].[PropertyID], (
SELECT COUNT(*)
FROM [Property] AS [t7]
WHERE [t7].[RateTemplateID] = [t0].[ID]
) AS [value]
FROM [RateTemplate] AS [t0]
LEFT OUTER JOIN [Property] AS [t1] ON [t1].[RateTemplateID] = [t0].[ID]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [Policy] AS [t2]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [staging_history].[dbo].[Change] AS [t3]
WHERE ((EXISTS(
SELECT NULL AS [EMPTY]
FROM [staging_history].[dbo].[Policy] AS [t4]
WHERE ([t4].[PolicyID] = [t2].[PolicyID]) AND ([t4].[ChangeID] = [t3].[ID])
)) OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [staging_history].[dbo].[PolicyFee] AS [t5]
WHERE ([t5].[PolicyID] = [t2].[PolicyID]) AND ([t5].[ChangeID] = [t3].[ID])
)) OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [staging_history].[dbo].[PolicyOption] AS [t6]
WHERE ([t6].[PolicyID] = [t2].[PolicyID]) AND ([t6].[ChangeID] = [t3].[ID])
))) AND ([t3].[ChangeTime] > @p0)
)) AND ([t2].[RateTemplateID] = [t0].[ID])
)
ORDER BY [t0].[ID], [t1].[PropertyID]
So it looks like you just need to load one data context for each database you want to use, and then build a LINQ query that uses both data contexts in one statement, as I have above.
Hopefully this helps and gets you the results you want without having to create a view for every cross-context query.

My understanding (I'm no guru on LINQ to SQL) was the same: that it wasn't possible without using a view or stored procedure.
However, a quick search found this workaround on the MSDN forums. Quoting from Damien's answer there:
2. Add one of the tables to the other data context (copy the DBML over and prefix the name attribute with the name of the database, e.g. database2.dbo.MyTable)
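The idea behind that workaround is that a single connection can join tables from two databases as long as the table names are qualified with the database name. The sketch below illustrates only that idea, using Python's built-in sqlite3 module, where ATTACH DATABASE plays the role of SQL Server's database-qualified names. It says nothing about how LINQ to SQL itself behaves; the alias cmo, the reduced column lists and the sample rows are assumptions for illustration.

```python
import sqlite3

def cross_database_join():
    """Join a table in the main database with one in an attached database."""
    conn = sqlite3.connect(":memory:")                 # stands in for [Connect]
    conn.execute("ATTACH DATABASE ':memory:' AS cmo")  # stands in for [CMO]
    conn.executescript("""
        CREATE TABLE main.tblPerson (PersonID INTEGER, LastName TEXT);
        CREATE TABLE cmo.tblCMOEnrollment (EnrollmentID INTEGER, ClientID INTEGER);
        INSERT INTO main.tblPerson VALUES (1, 'Smith'), (2, 'Jones');
        INSERT INTO cmo.tblCMOEnrollment VALUES (100, 2);
    """)
    # Qualifying each table with its database lets one statement span both.
    rows = conn.execute("""
        SELECT p.PersonID, p.LastName, c.EnrollmentID
        FROM main.tblPerson AS p
        INNER JOIN cmo.tblCMOEnrollment AS c ON p.PersonID = c.ClientID
        ORDER BY p.PersonID
    """).fetchall()
    conn.close()
    return rows
```

Only the person with a matching enrollment row comes back. In SQL Server the equivalent requires both databases to be reachable from the same connection and login, which is the precondition for the DBML trick as well.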

speed difference dblinq results vs sql query in prompt

I've set up a database to be used with DBLinq.
CREATE TABLE `quotes` (
`DBDate` int(8) unsigned NOT NULL,
`TickerID` int(11) unsigned NOT NULL,
`Open` double(12,4) NOT NULL,
`High` double(12,4) DEFAULT NULL,
`Low` double(12,4) DEFAULT NULL,
`Close` double(12,4) DEFAULT NULL,
`AdjClose` double(12,4) DEFAULT NULL,
`Volume` int(11) unsigned NOT NULL,
PRIMARY KEY (`TickerID`,`DBDate`),
CONSTRAINT `quotes_ibfk_1` FOREIGN KEY (`TickerID`) REFERENCES `tickers` (`TickerID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
The above is the MySQL table schema.
The MySQL server is on a different machine.
When I run this MySQL query on my test machine (so not the same machine as the server):
SELECT a.*, b.* FROM quotes a INNER JOIN quotes b ON a.DBDate =
b.DBDate AND a.TickerID=956 and b.TickerID=957 order by a.dbdate asc;
I get the expected output:
2934 rows in set (0.03 sec)
But when I try to get the same result in my C# environment using DBLinq like this:
var tradeAbleA = (from a in _db.Quotes where a.TickerID == 956 select a);
var tradeAbleB = (from a in _db.Quotes where a.TickerID == 957 select a);
var myDataSet = (from a in tradeAbleA.AsEnumerable() join b in tradeAbleB.AsEnumerable() on a.DbdAte equals b.DbdAte orderby a.DbdAte ascending select new { a, b }).ToList();
it takes over a second to fill the list. This is way too long. How can I speed this up? (I need the result in a list.)
regards,
Matthijs

Shouldn't your SQL translate to the following LINQ?
var myDataSet = (from a in _db.Quotes
join b in _db.Quotes on a.DbdAte equals b.DbdAte
where a.TickerID == 956 && b.TickerID == 957
orderby a.DbdAte ascending select new { a, b }).ToList();
In your current version, you create the queries for a and b separately, and by calling .AsEnumerable() on them in the third LINQ expression you force the join to run on the client. The results are pulled into memory, where LINQ to Objects joins them for you (which can be expensive), and the remaining items are then ordered in memory as well.
The version above lets you pass all these steps through to the query provider, which tends to be much faster.
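The difference between the two versions is where the join runs. As a rough sketch of that distinction, the code below contrasts a self-join executed by the database with fetching both tickers and joining them on the client, using Python's built-in sqlite3 module. The table is reduced to three columns, the rows are invented, and no timing is claimed: how much faster the database-side join is depends on the provider, the network and the data volume.

```python
import sqlite3

def build_quotes():
    """Small in-memory stand-in for the quotes table (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE quotes (
            DBDate INTEGER, TickerID INTEGER, Close REAL,
            PRIMARY KEY (TickerID, DBDate)
        )
    """)
    rows = [(20120101 + day, ticker, 100.0 + day + ticker)
            for ticker in (956, 957) for day in range(5)]
    conn.executemany("INSERT INTO quotes VALUES (?, ?, ?)", rows)
    # A date that exists only for ticker 956; the join must drop it.
    conn.execute("INSERT INTO quotes VALUES (20120110, 956, 1.0)")
    return conn

def join_in_database(conn):
    """One statement: the database matches the dates and orders the result."""
    return conn.execute("""
        SELECT a.DBDate, a.Close, b.Close
        FROM quotes AS a
        INNER JOIN quotes AS b ON a.DBDate = b.DBDate
        WHERE a.TickerID = 956 AND b.TickerID = 957
        ORDER BY a.DBDate
    """).fetchall()

def join_in_memory(conn):
    """Two fetches, then the join and the sort happen on the client."""
    a_rows = conn.execute(
        "SELECT DBDate, Close FROM quotes WHERE TickerID = 956").fetchall()
    b_by_date = dict(conn.execute(
        "SELECT DBDate, Close FROM quotes WHERE TickerID = 957"))
    return sorted((date, close_a, b_by_date[date])
                  for date, close_a in a_rows if date in b_by_date)
```

Both functions produce the same rows; the second one corresponds to what the AsEnumerable() version does, with two round trips and the matching work moved into the application.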

What's wrong with this LINQ syntax?

I am trying to convert a SQL query to LINQ. Somehow my count(distinct(x)) logic does not seem to be working correctly. The original SQL is quite efficient (or so I think), but the generated SQL does not even return the correct result.
I am trying to fix this LINQ to do what the original SQL does, and as efficiently. Help would be really appreciated, as I am stuck here :(
The working SQL, for which I need a comparable LINQ query:
SELECT [t1].[PersonID] AS [personid]
FROM [dbo].[Code] AS [t0]
INNER JOIN [dbo].[phonenumbers] AS [t1] ON [t1].[PhoneCode] = [t0].[Code]
INNER JOIN [dbo].[person] ON [t1].[PersonID]= [dbo].[Person].PersonID
WHERE ([t0].[codetype] = 'phone') AND (
([t0].[CodeDescription] = 'Home') AND ([t1].[PhoneNum] = '111')
OR
([t0].[CodeDescription] = 'Work') AND ([t1].[PhoneNum] = '222') )
GROUP BY [t1].[PersonID] HAVING COUNT(DISTINCT([t1].[PhoneNum]))=2
The LINQ I wrote is approximately as below:
var ids = context.Code.Where(predicate);
var rs = from r in ids
group r by new { r.phonenumbers.person.PersonID} into g
let matchcount=g.Select(p => p.phonenumbers.PhoneNum).Distinct().Count()
where matchcount ==2
select new
{
personid = g.Key
};
Unfortunately, the above LINQ does NOT produce the correct result, and is actually translated to the SQL shown below. By the way, this generated query also reads ALL the rows (about 19592040) around 2 times due to the COUNTs :( which is a big performance issue too. Please help or point me in the right direction.
Declare @p0 VarChar(10)='phone'
Declare @p1 VarChar(10)='Home'
Declare @p2 VarChar(10)='111'
Declare @p3 VarChar(10)='Work'
Declare @p4 VarChar(10)='222'
Declare @p5 VarChar(10)='2'
SELECT [t9].[PersonID], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t13].[PhoneNum]
FROM [dbo].[Code] AS [t10]
INNER JOIN [dbo].[phonenumbers] AS [t11] ON [t11].[PhoneType] = [t10].[Code]
INNER JOIN [dbo].[Person] AS [t12] ON [t12].[PersonID] = [t11].[PersonID]
INNER JOIN [dbo].[phonenumbers] AS [t13] ON [t13].[PhoneType] = [t10].[Code]
WHERE ([t9].[PersonID] = [t12].[PersonID]) AND ([t10].[codetype] = @p0) AND ((([t10].[codetype] = @p1) AND ([t11].[PhoneNum] = @p2)) OR (([t10].[codetype] = @p3) AND ([t11].[PhoneNum] = @p4)))
) AS [t14]
) AS [cnt]
FROM (
SELECT [t3].[PersonID], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t7].[PhoneNum]
FROM [dbo].[Code] AS [t4]
INNER JOIN [dbo].[phonenumbers] AS [t5] ON [t5].[PhoneType] = [t4].[Code]
INNER JOIN [dbo].[Person] AS [t6] ON [t6].[PersonID] = [t5].[PersonID]
INNER JOIN [dbo].[phonenumbers] AS [t7] ON [t7].[PhoneType] = [t4].[Code]
WHERE ([t3].[PersonID] = [t6].[PersonID]) AND ([t4].[codetype] = @p0) AND ((([t4].[codetype] = @p1) AND ([t5].[PhoneNum] = @p2)) OR (([t4].[codetype] = @p3) AND ([t5].[PhoneNum] = @p4)))
) AS [t8]
) AS [value]
FROM (
SELECT [t2].[PersonID]
FROM [dbo].[Code] AS [t0]
INNER JOIN [dbo].[phonenumbers] AS [t1] ON [t1].[PhoneType] = [t0].[Code]
INNER JOIN [dbo].[Person] AS [t2] ON [t2].[PersonID] = [t1].[PersonID]
WHERE ([t0].[codetype] = @p0) AND ((([t0].[codetype] = @p1) AND ([t1].[PhoneNum] = @p2)) OR (([t0].[codetype] = @p3) AND ([t1].[PhoneNum] = @p4)))
GROUP BY [t2].[PersonID]
) AS [t3]
) AS [t9]
WHERE [t9].[value] = @p5
Thanks!

I think the issue might be new { r.phonenumbers.person.PersonID}.
Why are you newing up a new object here rather than just grouping by r.phonenumbers.person directly? new {} is going to be a different object every time which will never group.
After grouping by person I would Select a group => new {person = group.person, phoneNumbers = group.person.phonenumbers} and then perform the check for how many phone numbers they have and then any final projection.

Drat! It looks like the fault was on my side (the GIGO principle!).
In my ORM, I had created the associations from right to left instead of the other way round. I think that was the issue.
The only issue left now is that the LINQ somehow generates one INNER JOIN twice, which prevents the final result from being retrieved correctly. If I comment it out in the generated SQL, I get the correct result. I guess I'll open a new question for that. Thanks for your time!
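For reference, the behaviour the original SQL is after (keep a person only when both the Home and the Work number match) can be checked on a tiny data set. The sketch below runs a simplified version of that query through Python's built-in sqlite3 module. The join to the person table is left out because it does not affect the grouping, and the sample rows, code values and function names are invented for illustration.

```python
import sqlite3

MATCH_BOTH_SQL = """
    SELECT pn.PersonID
    FROM Code AS c
    INNER JOIN PhoneNumbers AS pn ON pn.PhoneCode = c.Code
    WHERE c.CodeType = 'phone'
      AND ((c.CodeDescription = 'Home' AND pn.PhoneNum = ?)
        OR (c.CodeDescription = 'Work' AND pn.PhoneNum = ?))
    GROUP BY pn.PersonID
    HAVING COUNT(DISTINCT pn.PhoneNum) = 2
    ORDER BY pn.PersonID
"""

def build_phone_db():
    """Hypothetical rows: only person 1 has Home=111 and Work=222."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Code (Code TEXT, CodeType TEXT, CodeDescription TEXT);
        CREATE TABLE PhoneNumbers (PersonID INTEGER, PhoneCode TEXT, PhoneNum TEXT);
        INSERT INTO Code VALUES ('H', 'phone', 'Home'), ('W', 'phone', 'Work');
        INSERT INTO PhoneNumbers VALUES
            (1, 'H', '111'), (1, 'W', '222'),   -- both match
            (2, 'H', '111'), (2, 'W', '333'),   -- only Home matches
            (3, 'H', '222'), (3, 'W', '111');   -- right numbers, wrong types
    """)
    return conn

def people_matching_both(conn, home_number, work_number):
    """Return PersonIDs whose Home and Work numbers both match."""
    rows = conn.execute(MATCH_BOTH_SQL, (home_number, work_number))
    return [row[0] for row in rows]
```

The WHERE clause keeps only rows where a number matches under the right description, and the HAVING clause then requires two distinct matching numbers per person. A LINQ translation has to preserve both steps: group the already filtered rows by PersonID, then count the distinct numbers inside each group.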
