I've been fooling around with some LINQ over Entities and I'm getting strange results and I would like to get an explanation...
Given the following LINQ query,
// Sample # 1
IEnumerable<GroupInformation> groupingInfo;
groupingInfo = from a in context.AccountingTransaction
group a by a.Type into grp
select new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
};
I get the following SQL query (taken from SQL Profiler):
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [Type],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[Type] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[AccountingTransaction] AS [Extent1]
GROUP BY [Extent1].[Type]
) AS [GroupBy1]
So far so good.
If I change my LINQ query to:
// Sample # 2
groupingInfo = context.AccountingTransaction.
GroupBy(a => a.Type).
Select(grp => new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
});
it yields to the exact same SQL query. Makes sense to me.
Here comes the interesting part... If I change my LINQ query to:
// Sample # 3
IEnumerable<AccountingTransaction> accounts;
IEnumerable<IGrouping<object, AccountingTransaction>> groups;
IEnumerable<GroupInformation> groupingInfo;
accounts = context.AccountingTransaction;
groups = accounts.GroupBy(a => a.Type);
groupingInfo = groups.Select(grp => new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
});
the following SQL is executed (I stripped a few of the fields from the actual query, but all the fields from the table (~ 15 fields) were included in the query, twice):
SELECT
[Project2].[C1] AS [C1],
[Project2].[Type] AS [Type],
[Project2].[C2] AS [C2],
[Project2].[Id] AS [Id],
[Project2].[TimeStamp] AS [TimeStamp],
-- <snip>
FROM ( SELECT
[Distinct1].[Type] AS [Type],
1 AS [C1],
[Extent2].[Id] AS [Id],
[Extent2].[TimeStamp] AS [TimeStamp],
-- <snip>
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT DISTINCT
[Extent1].[Type] AS [Type]
FROM [dbo].[AccountingTransaction] AS [Extent1] ) AS [Distinct1]
LEFT OUTER JOIN [dbo].[AccountingTransaction] AS [Extent2] ON [Distinct1].[Type] = [Extent2].[Type]
) AS [Project2]
ORDER BY [Project2].[Type] ASC, [Project2].[C2] ASC
Why are the SQLs generated are so different? After all, the exact same code is executed, it's just that sample # 3 is using intermediate variables to get the same job done!
Also, if I do:
Console.WriteLine(groupingInfo.ToString());
for sample # 1 and sample # 2, I get the exact same query that was captured by SQL Profiler, but for sample # 3, I get:
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Linq.IGrouping`2[System.Object,TestLinq.AccountingTransaction],TestLinq.GroupInformation]
What is the difference? Why can't I get the SQL Query generated by LINQ if I split the LINQ query in multiple instructions?
The ulitmate goal is to be able to add operators to the query (Where, OrderBy, etc.) at run-time.
BTW, I've seen this behavior in EF 4.0 and EF 6.0.
Thank you for your help.
The reason is because in your third attempt you're referring to accounts as IEnumerable<AccountingTransaction> which will cause the query to be invoked using Linq-To-Objects (Enumerable.GroupBy and Enumerable.Select)
On the other hand, in your first and second attempts the reference to AccountingTransaction is preserved as IQueryable<AccountingTransaction> and the query will be executed using Linq-To-Entities which will then transform it to the appropriate SQL statement.
Related
In documentation of Entity Framework:
https://www.entityframeworktutorial.net/querying-entity-graph-in-entity-framework.aspx
in section regarding GroupBy we can read that following code:
using (var ctx = new SchoolDBEntities())
{
var students = from s in ctx.Students
group s by s.StandardId into studentsByStandard
select studentsByStandard;
foreach (var groupItem in students)
{
Console.WriteLine(groupItem.Key);
foreach (var stud in groupItem)
{
Console.WriteLine(stud.StudentId);
}
}
}
executes internally following SQL:
SELECT
[Project2].[C1] AS [C1],
[Project2].[StandardId] AS [StandardId],
[Project2].[C2] AS [C2],
[Project2].[StudentID] AS [StudentID],
[Project2].[StudentName] AS [StudentName],
[Project2].[StandardId1] AS [StandardId1]
FROM ( SELECT
[Distinct1].[StandardId] AS [StandardId],
1 AS [C1],
[Extent2].[StudentID] AS [StudentID],
[Extent2].[StudentName] AS [StudentName],
[Extent2].[StandardId] AS [StandardId1],
CASE WHEN ([Extent2].[StudentID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT DISTINCT
[Extent1].[StandardId] AS [StandardId]
FROM [dbo].[Student] AS [Extent1] ) AS [Distinct1]
LEFT OUTER JOIN [dbo].[Student] AS [Extent2] ON ([Distinct1].[StandardId] = [Extent2]. [StandardId]) OR (([Distinct1].[StandardId] IS NULL) AND ([Extent2].[StandardId] IS NULL))
) AS [Project2]
ORDER BY [Project2].[StandardId] ASC, [Project2].[C2] ASC
go
Why there is no GroupBy clause in SQL? If there is no GroupBy clause needed, can’t we just use simple Select with OrderBy and without Joins? Can anyone explain the above query?
The bottom line is: because SQL can't return nested result sets.
Every SQL SELECT statement returns a flat list of values. LINQ is capable of returning object graphs, i.e. objects with nested objects. That's exactly what LINQ's GroupBy does.
In SQL, a GROUP BY statement only returns the grouping columns and aggregate results:
SELECT StandardId, COUNT(*)
FROM Students
GROUP BY StandardId;
The rest of the student columns is gone.
A LINQ GroupBy statement returns something like
StandardId
StudentId StudentName
1
21 "Student1"
15 "Student2"
2
48 "Student3"
91 "Student4"
17 "Student5"
Therefore, a SQL GROUP BY statement can never be the source for a LINQ GroupBy.
Entity Framework (6) knows this and it generates a SQL statement that pulls all required data from the database, adds some parts that make grouping easier, and it creates the groupings client-side.
I have been experimenting trying to get the following Linq working without joy. I'm convinced that it's right, but that might just be my bad Linq. I originally added this as a answer to a similar question here:
Linq-to-entities - Include() method not loading
But as it's a very old question, and mine is more specific, I figured it would do better as an explicit question.
In the linked question, Alex James gives two interesting solutions, however if you try them and check the SQL, it's horrible.
The example I was working on is:
var theRelease = from release in context.Releases
where release.Name == "Hello World"
select release;
var allProductionVersions = from prodVer in context.ProductionVersions
where prodVer.Status == 1
select prodVer;
var combined = (from release in theRelease
join p in allProductionVersions on release.Id equals p.ReleaseID
select release).Include(release => release.ProductionVersions);
var allProductionsForChosenRelease = combined.ToList();
This follows the simpler of the two examples. Without the include it produces the perfectly respectable sql:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
But with, OMG:
SELECT
[Project1].[Id1] AS [Id],
[Project1].[Id] AS [Id1],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[Id2] AS [Id2],
[Project1].[Status] AS [Status],
[Project1].[ReleaseID] AS [ReleaseID]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent3].[Id] AS [Id2],
[Extent3].[Status] AS [Status],
[Extent3].[ReleaseID] AS [ReleaseID],
CASE WHEN ([Extent3].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [Extent3] ON [Extent1].[Id] = [Extent3].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
) AS [Project1]
ORDER BY [Project1].[Id1] ASC, [Project1].[Id] ASC, [Project1].[C1] ASC
Total garbage. The key point to note here is the fact that it returns the outer joined version of the table which has not been limited by status=1.
This results in the WRONG data being returned:
Id Id1 Name C1 Id2 Status ReleaseID
2 1 Hello World 1 1 2 1
2 1 Hello World 1 2 1 1
Note that the status of 2 is being returned there, despite our restriction. It simply does not work.
If I have gone wrong somewhere, I would be delighted to find out, as this is making a mockery of Linq. I love the idea, but the execution doesn't seem to be usable at the moment.
Out of curiosity, I tried the LinqToSQL dbml rather than the LinqToEntities edmx that produced the mess above:
SELECT [t0].[Id], [t0].[Name], [t2].[Id] AS [Id2], [t2].[Status], [t2].[ReleaseID], (
SELECT COUNT(*)
FROM [dbo].[ProductionVersions] AS [t3]
WHERE [t3].[ReleaseID] = [t0].[Id]
) AS [value]
FROM [dbo].[Releases] AS [t0]
INNER JOIN [dbo].[ProductionVersions] AS [t1] ON [t0].[Id] = [t1].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [t2] ON [t2].[ReleaseID] = [t0].[Id]
WHERE ([t0].[Name] = #p0) AND ([t1].[Status] = #p1)
ORDER BY [t0].[Id], [t1].[Id], [t2].[Id]
Slightly more compact - weird count clause, but overall same total FAIL.
Please tell me I've missed something obvious, as I really want to like Linq!
Okay, after another evening of head scratching I cracked it.
In LinqToSQL:
using (var context = new TestSQLModelDataContext())
{
context.DeferredLoadingEnabled = false;
DataLoadOptions ds = new DataLoadOptions();
ds.LoadWith<ProductionVersion>(prod => prod.Release);
context.LoadOptions = ds;
var combined = from release in context.Releases
where release.Name == "Hello World"
select from prodVer in release.ProductionVersions
where prodVer.Status == 1
select prodVer;
var allProductionsForChosenRelease = combined.ToList();
}
This produces the much more reasonable SQL:
SELECT [t2].[Id], [t2].[Status], [t2].[ReleaseID], [t0].[Id] AS [Id2], [t0].[Name], (
SELECT COUNT(*)
FROM [dbo].[ProductionVersions] AS [t3]
WHERE ([t3].[Status] = 1) AND ([t3].[ReleaseID] = [t0].[Id])
) AS [value]
FROM [dbo].[Releases] AS [t0]
OUTER APPLY (
SELECT [t1].[Id], [t1].[Status], [t1].[ReleaseID]
FROM [dbo].[ProductionVersions] AS [t1]
WHERE ([t1].[Status] =1) AND ([t1].[ReleaseID] = [t0].[Id])
) AS [t2]
WHERE [t0].[Name] = 'Hello World'
ORDER BY [t0].[Id], [t2].[Id]
Which produces the correct results:
Id Status ReleaseID Id2 Name value
2 1 1 1 Hello World 1
And in LinqToEntities (I couldn't get the Include syntax to work, so I use the quirk where including the desired table in the results links it up correctly):
using (var context = new TestEntities1())
{
var combined = (from release in context.Releases
where release.Name == "Hello World"
select from prodVer in release.ProductionVersions
where prodVer.Status == 1
select new { prodVer, Release =prodVer.Release });
var allProductionsForChosenRelease = combined.ToList();
}
And this produces the SQL:
SELECT
[Project1].[Id] AS [Id],
[Project1].[C1] AS [C1],
[Project1].[Id1] AS [Id1],
[Project1].[Status] AS [Status],
[Project1].[ReleaseID] AS [ReleaseID],
[Project1].[Id2] AS [Id2],
[Project1].[Name] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Join1].[Id1] AS [Id1],
[Join1].[Status] AS [Status],
[Join1].[ReleaseID] AS [ReleaseID],
[Join1].[Id2] AS [Id2],
[Join1].[Name] AS [Name],
CASE WHEN ([Join1].[Id1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Releases] AS [Extent1]
LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id1], [Extent2].[Status] AS [Status], [Extent2].[ReleaseID] AS [ReleaseID], [Extent3].[Id] AS [Id2], [Extent3].[Name] AS [Name]
FROM [dbo].[ProductionVersions] AS [Extent2]
INNER JOIN [dbo].[Releases] AS [Extent3] ON [Extent2].[ReleaseID] = [Extent3].[Id] ) AS [Join1] ON ([Extent1].[Id] = [Join1].[ReleaseID]) AND (1 = [Join1].[Status])
WHERE 'Hello World' = [Extent1].[Name]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC
Which is fairly mental, but it does work.
Id C1 Id1 Status ReleaseID Id2 Name
1 1 2 1 1 1 Hello World
All of which leads me to the conclusion that Linq is far from finished. It can be used, but with extreme caution. Use it as a strongly typed and compile time checked, but laborious/error prone, way of writing bad SQL. It's a trade-off. You get more security at the C# end, but man it's a lot harder than writing SQL!
Taking a second look, I now understand the elusive effect of the Include.
Just as in plain SQL, a join in LINQ will repeat results when the right side of the join is the "n" end of a 1-n association.
Let's assume you have one Release with two ProductionVersions. Without the Include, the join will give you two identical Releases, because after all the statement selects releases. Now when you add the Include, EF will not only return two releases, but will also fully populate their ProductionVersions collections.
Looking a bit deeper, in the context's cache, it appears that EF really only materialized just 1 Release and 2ProductionVersions. It's just that the releases are returned twice in the final result set.
In a way, you got what you asked for: give me releases, multiplied by their number of versions. But that's not what you intended to ask.
What you (probably) intended reveals a weak spot in EF's toolbox: we can't Include partial collections. I think you tried to get releases populated with ProductionVersions of Status = 1 only. If possible, you'd rather have done this:
context.Releases.Include(r => r.ProductionVersions.Where(v => v.Status == 1))
.Where(r => r.Name == "Hello World")
But that throws an exception:
The Include path expression must refer to a navigation property defined on the type. Use dotted paths for reference navigation properties and the Select operator for collection navigation properties.
Parameter name: path
This "filtered include" problem has been noted before and until the EF team (or a contributor) decides to grab this issue we have to do with elaborate work-arounds. I described a common one here.
I'm trying to create a query similar to this:
select randomId
from myView
where ...
group by randomId
NOTE: EF doesn't support the distinct so I was thinking of going around the lack of it with the group by (or so I think)
randomId is numeric
Entity Framework V.6.0.2
This gives me the expected result in < 1 second query
When trying to do the same with EF I have been having some issues.
If I do the LINQ similar to this:
context.myView
.Where(...)
.GroupBy(mt => mt.randomId)
.Select({ Id = group.Key, Count = group.Count() } )
I will get sort of the same result but forcing a count and making the query > 6 seconds
The SQL EF generates is something like this:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [randomId],
[GroupBy1].[A1] AS [C2]
FROM (
SELECT
[Extent1].[randomId] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
GROUP BY [Extent1].[randomId]
) AS [GroupBy1]
But, if the query had the count commented out it would be back to < 1 second
If I change the Select to be like:
.Select({ Id = group.Key} )
I will get all of rows without the group by statement in the SQL query and no Distinct whatsoever:
SELECT
[Extent1].[anotherField] AS [anotherField], -- 'this field got included automatically on this query and I dont know why, it doesnt affect outcome when removed in SQL server'
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
Other failed attempts:
query.GroupBy(x => x.randomId).Select(group => group.FirstOrDefault());
The query that was generated is as follows:
SELECT
[Limit1].ALL FIELDS,...
FROM (SELECT
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...) AS [Project1]
OUTER APPLY (SELECT TOP (1)
[Extent2].ALL FIELDS,...
FROM [dbo].[myView] AS [Extent2]
WHERE (...) AS [Limit1] -- same as the where above
This query performed rather poorly and still managed to return all Ids for the where clause.
Does anyone have an idea on how to force the usage of the group by without an aggregating function like a count?
In SQL it works but then again I have the distinct keyword as well...
Cheers,
J
var query = from p in TableName
select new {Id = p.ColumnNameId};
var distinctItems = query.Distinct().ToList();
Here is the linq query however you should be able to write an equivalent from EF dbset too. If you have issues let me know.
Cheers!
I have following SQL Query and would like to convert to LINQ to SQL which I will use in entity framework 5.0
var internationalDesksList =
from internationalDesks in _context.InternationalDesks
from subsection in
_context.Subsections.Where(
s =>
internationalDesks.EBALocationId == s.LocationId ||
internationalDesks.FELocationId == s.LocationId).DefaultIfEmpty()
where subsection.PublicationId == 1
select new {internationalDesks.Id, subsection.LocationId};
I have referred the following posts and answers. Though no luck.
LINQ to SQL - Left Outer Join with multiple join conditions Linq
left join on multiple (OR) conditions
When I tried this query in LINQPad I got the following answer which is correct.
-- Region Parameters
DECLARE #p0 Int = 1
-- EndRegion
SELECT [t0].[Id], [t1].[Id] AS [Id1]
FROM [InternationalDesks] AS [t0]
LEFT OUTER JOIN [Subsection] AS [t1] ON (([t0].[FELocationId]) = [t1].[LocationId]) OR (([t0].[EBALocationId]) = [t1].[LocationId])
WHERE [t1].[PublicationId] = #p0
However in entity framework 5 ( DBContext ) it is not providing me with the correct query. When I checked in SQL profiler all columns in subsection table is selected. That's it.
Following is the result:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Description] AS [Description],
[Extent1].[PracticeAreaId] AS [PracticeAreaId],
[Extent1].[LocationId] AS [LocationId],
...
FROM [dbo].[Subsection] AS [Extent1]
Don't know what could be problem. Please help me.
With LINQ you can't do LEFT OUTER JOIN on some boolean expression, only equijoins are supported. So, you can generate CROSS JOIN this way:
var internationalDesksList =
from internationalDesks in _context.InternationalDesks
from subsection in _context.Subsections
where subsection.PublicationId == 1 &&
(internationalDesks.EBALocationId == subsection.LocationId ||
internationalDesks.FELocationId == subsection.LocationId)
select new {
internationalDesks.Id,
subsection.LocationId
};
EF 5 will generate following SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent2].[LocationId] AS [LocationId]
FROM [dbo].[InternationalDesks] AS [Extent1]
CROSS JOIN [dbo].[Subsections] AS [Extent2]
WHERE (1 = [Extent2].[PublicationId]) AND
([Extent1].[EBALocationId] = [Extent2].[LocationId] OR
[Extent1].[FELocationId] = [Extent2].[LocationId])
As you can see, only required columns are selected. I also checked this query in LINQ to SQL - following query is generated:
DECLARE #p0 Int = 1
SELECT [t0].[Id], [t1].[LocationId]
FROM [InternationalDesks] AS [t0], [Subsections] AS [t1]
WHERE ([t1].[PublicationId] = #p0) AND
(([t0].[EBALocationId] = [t1].[LocationId]) OR
([t0].[FELocationId] = [t1].[LocationId]))
I have a problem that I know how to solve in SQL but not with Linq to Entities.
My data looks like this:
ID GROUP TIMESTAMP
-- ----- ---------
1 A 2011-06-20
2 A 2011-06-21
3 B 2011-06-21
4 B 2011-06-22
5 B 2011-06-23
6 C 2011-06-30
I want to retrieve all the Entity objects (not just the ID) such that I am only getting the most recent record from each group. (ie. the records with ids 2, 5, 6)
In SQL I would do something like this:
SELECT * FROM my_table a
WHERE a.timestamp =
(SELECT MAX(timestamp) FROM my_table b
WHERE a.group = b.group)
(For the sake of this question you can assume that timestamp is unique within each group).
I'd like to do this query against a WCF Data Service using Linq to Entities but I can't seem to have a nested query that references the outside query like this. Can anyone help?
Possibly not as clean and efficient as the hand written version but here's what I came up with
var q = from a in db.MyEntities
where a.Timestamp == (from b in db.MyEntities
where b.Group == a.Group
select b.Timestamp).Max()
select a;
which translates into this SQL
SELECT
[Project1].[Id] AS [Id],
[Project1].[Group] AS [Group],
[Project1].[Timestamp] AS [Timestamp]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Group] AS [Group],
[Extent1].[Timestamp] AS [Timestamp],
[SSQTAB1].[A1] AS [C1]
FROM [MyEntities] AS [Extent1]
OUTER APPLY
(SELECT
MAX([Extent2].[Timestamp]) AS [A1]
FROM [MyEntities] AS [Extent2]
WHERE [Extent2].[Group] = [Extent1].[Group]) AS [SSQTAB1]
) AS [Project1]
WHERE [Project1].[Timestamp] = [Project1].[C1]
Hi try to use linqer that will convert your sql statements to linq query.
Linqer
Best Regards
This should work:
var query = db.my_table
.GroupBy(p=>p.group)
.Select(p=>p.OrderByDescending(q=>q.timestamp).First());
Here you go.A simple way to do.
var result = (from x in my_table
group x by x.Group into g
select new
{
g.Key,
timestamp = g.Max(x => x.TimeStamp),
g //This will return everything in g
});