Entity Framework Group By Then Order By - c#

I'm generating a query using Entity Framework which uses a group by clause and then attempts to order each of the groups to get specific data. I attempted to optimize the order by so that it happens only once, using a let statement; the query still executes, but the results are incorrect.
Concept:
var results =
(from n in noteEntities.NoteLog
where associatedIDs.Contains(n.AssociatedID)
group n by n.AssociatedID into gn
let ogn = gn.OrderByDescending(t => t.CreatedDateTime)
let successNote = ogn.FirstOrDefault(x => x.Type == "Success")
let lastStatusNote = ogn.FirstOrDefault()
select new { Success = successNote, Status = lastStatusNote, AssociatedID = gn.Key }).ToList();
However, the problem is that ogn, which should be the ordered let variable, is not treated as a descending-ordered list in the subsequent let statements, so I'm getting the wrong success and status notes. I've also tried restructuring this as a sub-query and referencing its result, but that doesn't seem to return an ordered list either, e.g.:
var subQuery =
(from n in noteEntities.NoteLog
where associatedIDs.Contains(n.AssociatedID)
group n by n.AssociatedID into gn
select gn.OrderByDescending(t => t.CreatedDateTime));
var results =
(from s in subQuery
let successNote = s.FirstOrDefault(x => x.Type == "Success")
let lastStatusNote = s.FirstOrDefault()
select new { Success = successNote, Status = lastStatusNote }).ToList();
I can make this work by using OrderByDescending twice in the select statement or let statements for the success and status notes but this becomes very slow, and redundant, when there are a lot of notes. Is there a way to run the order by only once and get the right results back?

In SQL (at least in SQL Server), a subquery with ORDER BY must also have a TOP clause (yours does not). When LINQ detects that no FirstOrDefault or Take statement is applied to the ordered subquery, it simply strips the OrderByDescending.
If you are having a performance problem with the query perhaps you should look into indexing the table.
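A minimal sketch of the workaround the question already mentions, where each lookup pairs its own OrderByDescending with FirstOrDefault so the ordering is never separated from the operator that consumes it. It runs against an in-memory array (LINQ to Objects) rather than Entity Framework, and the sample notes and the LatestSuccess/LatestStatus names are made up for illustration; against a database, the SQL that EF generates for this shape should be checked separately.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for noteEntities.NoteLog.
var notes = new[]
{
    (AssociatedID: 1, Type: "Success", CreatedDateTime: new DateTime(2020, 1, 1)),
    (AssociatedID: 1, Type: "Success", CreatedDateTime: new DateTime(2020, 1, 2)),
    (AssociatedID: 1, Type: "Failure", CreatedDateTime: new DateTime(2020, 1, 3)),
    (AssociatedID: 2, Type: "Failure", CreatedDateTime: new DateTime(2020, 2, 1)),
};

var results =
    (from n in notes
     group n by n.AssociatedID into gn
     select new
     {
         AssociatedID = gn.Key,
         // Date of the newest "Success" note in the group; null when there is none.
         LatestSuccess = gn.Where(x => x.Type == "Success")
                           .OrderByDescending(x => x.CreatedDateTime)
                           .Select(x => (DateTime?)x.CreatedDateTime)
                           .FirstOrDefault(),
         // Type of the newest note in the group, whatever its type.
         LatestStatus = gn.OrderByDescending(x => x.CreatedDateTime)
                          .Select(x => x.Type)
                          .FirstOrDefault()
     }).ToList();

foreach (var r in results)
    Console.WriteLine($"{r.AssociatedID}: success={r.LatestSuccess}, status={r.LatestStatus}");
```

Filtering on Type before ordering means only the matching notes are sorted, which may ease some of the cost of ordering twice.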

Related

LINQ query dropping includes when adding `.Contains()` in where clause

I have a somewhat complex query I'm trying to build in Linq (EntityFramework Core 2.1), and I hit behavior I can't comprehend. The below query runs well and seemingly efficiently:
var q = (
from n in TaskUpdates.Include(t => t.Status).Include("Task").Include("Task.Requirement").Include("User").Include("User.Employee")
where n.User.Employee.EmployeeNumber == 765448466
group n by n.UpdateDate into tu
select tu.OrderByDescending(t=>t.UpdateDate).FirstOrDefault()
)
.Select(x => x.Task.Requirement);
This works as I'd expect, does all the joins I want and includes the expected fields in the SELECT clause:
SELECT [t].[TaskUpdateID], [t].[Active], [t].[TaskId], [t].[Notes], [t].[StatusId], [t].[UpdateDate], [t].[UserId], [t.Task].[TaskID], [t.Task].[Active], [t.Task].[CreatedDate], [t.Task].[RequirementId], [t.Task].[UserId], [t.Task.Requirement].[RequirementID], [t.Task.Requirement].[Active], [t.Task.Requirement].[Description], [t.Task.Requirement].[Hours], [t.Task.Requirement].[Link], [t.Task.Requirement].[Name], [t.Task.Requirement].[RequirementTypeId], [t.Task.Requirement].[ExternalId], [t.Task.Requirement].[SortOrder], [t.Status].[StatusId], [t.Status].[Active], [t.Status].[IsComplete], [t.Status].[Title], [t.User].[UserId], [t.User].[Active], [t.User].[Created], [t.User].[EmployeeNumber], [t.User].[LastLogin], [t.User].[LastUpdated], [t.User.Employee].[EMPLOYEENUMBER], [t.User.Employee].[BEGINDATE], [t.User.Employee].[CITY], [t.User.Employee].[EMPLOYEETYPE], [t.User.Employee].[ENDDATE], [t.User.Employee].[FIRST_NAME], [t.User.Employee].[GENERATION_SUFFIX], [t.User.Employee].[STATUS], [t.User.Employee].[LAST_NAME], [t.User.Employee].[MIDDLE_NAME], [t.User.Employee].[MOBILE], [t.User.Employee].[ORGCODE], [t.User.Employee].[PHONE_NUMBER], [t.User.Employee].[PRIMARYEMAIL], [t.User.Employee].[STATE], [t.User.Employee].[STREET], [t.User.Employee].[TITLE], [t.User.Employee].[ZIPCODE], [t.User.Employee].[BUILDING], [t.User.Employee].[ROOM]
FROM [TaskUpdates] AS [t]
INNER JOIN [Tasks] AS [t.Task] ON [t].[TaskId] = [t.Task].[TaskID]
LEFT JOIN [Requirements] AS [t.Task.Requirement] ON [t.Task].[RequirementId] = [t.Task.Requirement].[RequirementID]
INNER JOIN [Status] AS [t.Status] ON [t].[StatusId] = [t.Status].[StatusId]
INNER JOIN [Users] AS [t.User] ON [t].[UserId] = [t.User].[UserId]
INNER JOIN [DirectoryPeople] AS [t.User.Employee] ON [t.User].[EmployeeNumber] = [t.User.Employee].[EMPLOYEENUMBER]
WHERE [t.User.Employee].[EMPLOYEENUMBER] = 765448466
ORDER BY [t].[UpdateDate]
GO
(I'm using LINQPad to experiment with this query and get the SQL.) In particular, the ending .Select(...) method correctly returns the Requirement object from the query.
What baffles me is if I want to make this query return data for multiple employees, and I change the where clause like so:
var employeeNumbers = new int[] { 765448466 };
var q = (
from n in TaskUpdates.Include(t => t.Status).Include("Task").Include("Task.Requirement").Include("User").Include("User.Employee")
//where n.User.Employee.EmployeeNumber == 765448466
where employeeNumbers.Contains(n.User.Employee.EmployeeNumber)
group n by n.UpdateDate into tu
select tu.OrderByDescending(t=>t.UpdateDate).FirstOrDefault()
)
.Select(x => x.Task.Requirement);
This changes the resulting SQL WHERE clause exactly as I would expect, but it now completely ignores the Includes in the from clause:
SELECT [t].[TaskUpdateID], [t].[Active], [t].[TaskId], [t].[Notes], [t].[StatusId], [t].[UpdateDate], [t].[UserId]
FROM [TaskUpdates] AS [t]
INNER JOIN [Users] AS [t.User] ON [t].[UserId] = [t.User].[UserId]
INNER JOIN [DirectoryPeople] AS [t.User.Employee] ON [t.User].[EmployeeNumber] = [t.User.Employee].[EMPLOYEENUMBER]
WHERE [t.User.Employee].[EMPLOYEENUMBER] IN (765448466)
ORDER BY [t].[UpdateDate]
GO
(only joins as necessary to execute the where) and the result of the final .Select(...) now returns null.
Is this known behavior, with or without explanation? Am I using the Include directives incorrectly, or is there a better way/place for them to go that will resolve this issue?
I can't say for certain what the cause is. I suspect EF is going down a different translation path with the Contains and missing the Includes. However, as you can see, it's not translating the GroupBy at all, so the query can be reworked to better match the EF style:
TaskUpdates
.Include(x => x.Task)
.ThenInclude(x => x.Requirement)
.Where(x => employeeNumbers.Contains(x.User.Employee.EmployeeNumber))
.ToList()
.GroupBy(x => x.UpdateDate)
.Select(x => new {
UpdateDate = x.Key,
FirstRequirement = x.First().Task.Requirement
})
.ToList();
This should translate the statements before the first ToList into SQL, load the results into memory, and let C# do the GroupBy and aggregates on the whole objects, which SQL would be unable to do.

how to take 100 records from linq query based on a condition

I have a query that returns a result set. Based on a condition, I want to take only 100 records: I have a variable x, and if the value of x is 100 I have to call .Take(100); otherwise I need to get all the records.
var abc=(from st in Context.STopics
where st.IsActive==true && st.StudentID == 123
select new result()
{
name = st.name }).ToList().Take(100);
Because LINQ returns an IQueryable which has deferred execution, you can create your query, then restrict it to the first 100 records if your condition is true and then get the results. That way, if your condition is false, you will get all results.
var abc = (from st in Context.STopics
where st.IsActive && st.StudentID == 123
select new result
{
name = st.name
});
if (x == 100)
abc = abc.Take(100);
// ToList returns a List<result>, which cannot be assigned back to the IQueryable variable
var results = abc.ToList();
Note that it is important to do the Take before the ToList, otherwise, it would retrieve all the records, and then only keep the first 100 - it is much more efficient to get only the records you need, especially if it is a query on a database table that could contain hundreds of thousands of rows.
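A small runnable sketch of the same pattern, assuming an in-memory sequence in place of Context.STopics (the topic names and the value of x are made up for illustration). AsQueryable keeps the query deferred, so nothing executes until ToList.

```csharp
using System;
using System.Linq;

int x = 100; // hypothetical flag from the question

// In-memory stand-in for Context.STopics; the query stays deferred.
IQueryable<string> query = Enumerable.Range(1, 250)
    .Select(i => $"topic-{i:D3}")
    .AsQueryable()
    .OrderBy(name => name); // order first so Take picks a well-defined set

if (x == 100)
    query = query.Take(100); // only composes the query, nothing runs yet

var names = query.ToList();  // the query executes once, here
Console.WriteLine(names.Count);
```

With a real database provider, the Take would be part of the generated SQL, so only the requested rows would be fetched.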
One of the most important concepts when using the SQL TOP command is ORDER BY. You should not use TOP without ORDER BY, because it may return different results in different situations.
The same concept applies to LINQ too.
var results = Context.STopics.Where(st => st.IsActive && st.StudentID == 123)
.Select(st => new result(){name = st.name})
.OrderBy(r => r.name)
.Take(100).ToList();
Take and Skip operations are well defined only against ordered sets.
Although the other users are correct in giving you the results you want...
This is NOT how you should be using Entity Framework.
This is the better way to use EF.
var query = from student in Context.Students
where student.Id == 123
from topic in student.Topics
orderby topic.Name
select topic;
Notice how the structure more closely follows the logic of the business requirements.
You can almost read the code in English.

LINQ to EF returning all fields not just those in Select()

This is the gist of my query which I'm testing in LinqPad using Linq to Entity Framework.
In my mind the resultant SQL should begin with something like SELECT TableA.ID AS myID. Instead, the SELECT includes all fields from all of the tables. Needless to say this incurs a massive performance hit among other problems. How can I prevent this?
var AnswerList = this.Answers
.Where(x=>
..... various conditions on x and related entities...
)
.GroupBy(x => new {x.TableA,x.TableB,x.TableC})
.Select(g=>new {
myID = g.Key.TableA.ID,
})
AnswerList.Dump();
In practice I'm using a new type instead of an anonymous one but the results are the same either way.
Let me know if you need me to fill in more of the ...'s.
UPDATE
I've noticed I can prevent this problem by explicitly specifying the fields I want returned in the GroupBy method, e.g. new {x.TableA.ID ... }
But I still don't understand why it doesn't work just using the Select method (which DOES work when doing the equivalent in Linq to SQL).
Hi,
Could you please try below....?
var query = from SubCat in mySubCategory
where SubCat.CategoryID == 1
group 1 by SubCat.CategoryID into grouped
select new { Catg = grouped.Key,
Count = grouped.Count() };
Thank you,
Vishal Patel

Linq Union: How to add a literal value to the query?

I need to add a literal value to a query. My attempt
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
In the above example, I get an error:
"Local sequence cannot be used in LINQ to SQL implementation
of query operators except the Contains() operator."
If I am using Entity Framework 4 for example, what could I add to the Union statement to always include the "seed" ID?
I am trying to produce SQL code like the following:
select distinct ID
from product
union
select 0 as ID
That way, I can later join the list to itself to find all values where the next highest value is not present (finding the lowest available ID in the set).
Edit: Original Linq Query to find lowest available ID
var skuQuery = Context.Products
.Where(p => p.sku > skuSeedStart &&
p.sku < skuSeedEnd)
.Select(p => p.sku).Distinct();
var lowestSkuAvailableList =
(from p1 in skuQuery
from p2 in skuQuery.Where(a => a == p1 + 1).DefaultIfEmpty()
where p2 == 0 // zero is default for long where it would be null
select p1).ToList();
var Answer = (lowestSkuAvailableList.Count == 0
? skuSeedStart :
lowestSkuAvailableList.Min()) + 1;
This code creates two SKU sets offset by one, then selects the SKU where the next highest doesn't exist. Afterward, it selects the minimum of that (lowest SKU where next highest is available).
For this to work, the seed must be in the set joined together.
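To make the role of the seed concrete, here is a small in-memory sketch of the same idea. It uses a HashSet lookup instead of the self-join, and the LowestFree helper and the sample SKUs are hypothetical, for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the lowest value whose successor is missing, plus one.
long LowestFree(IEnumerable<long> values)
{
    var set = new HashSet<long>(values);
    return set.Where(s => !set.Contains(s + 1)).Min() + 1;
}

// With the seed 0 in the set, the gap right after the seed is found.
long withSeed = LowestFree(new long[] { 0, 2, 3 });

// Without the seed, only the gap after the highest SKU is visible.
long withoutSeed = LowestFree(new long[] { 2, 3 });

Console.WriteLine($"{withSeed} {withoutSeed}");
```

With SKUs 2 and 3 in use, the seeded set reports 1 as the lowest free ID, while the unseeded set skips it and reports 4.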
Your problem is that your query is being turned entirely into a LINQ-to-SQL query, when what you need is a LINQ-to-SQL query with local manipulation on top of it.
The solution is to tell the compiler that you want to use LINQ-to-Objects after processing the query (in other words, change the extension method resolution to look at IEnumerable<T>, not IQueryable<T>). The easiest way to do this is to tack AsEnumerable() onto the end of your query, like so:
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().AsEnumerable().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
Up front: not answering exactly the question you asked, but solving your problem in a different way.
How about this:
var a = Products.Select(p => p.sku).Distinct().ToList();
a.Add(0);
a.Dump(); // LinqPad's way of showing the values
You should create a database table for storing constant values and pass a query against this table to the Union operator.
For example, let's imagine table "Defaults" with fields "Name" and "Value" with only one record ("SKU", 0).
Then you can rewrite your expression like this:
var zero = context.Defaults.Where(_=>_.Name == "SKU").Select(_=>_.Value);
var result = context.Products.Select(p => p.sku).Distinct().Union(zero).ToList();

Using Count with Take with LINQ

Is there a way to get the whole count when using the Take operator?
You can do both.
IEnumerable<T> query = ...complicated query;
int c = query.Count();
query = query.Take(n);
Just execute the count before the take. This will cause the query to be executed twice, but I believe that is unavoidable.
If this is in a Linq2SQL context, as your comment implies, then this will in fact query the database twice. As far as lazy loading goes, though, it will depend on how the result of the query is actually used.
For example, say you have two tables, Product and ProductVersion, where each Product has multiple ProductVersions associated via a foreign key.
If this is your query:
var query = db.Products.Where(p => complicated condition).OrderBy(p => p.Name).ThenBy(...).Select(p => p);
where you are just selecting Products, then after executing the query:
var results = query.ToList();//forces query execution
results[0].ProductVersions;//<-- Lazy loading occurs
If you reference any foreign key or related object that was not part of the original query, then it will be lazy loaded. In your case, the count will not cause any lazy loading because it simply returns an int, but depending on what you actually do with the result of the Take(), lazy loading may or may not occur. It can sometimes be difficult to tell whether lazy loading is occurring; to check, log your queries using the DataContext.Log property.
The easiest way would be to just do a Count of the query, and then do Take:
var q = ...;
var count = q.Count();
var result = q.Take(...);
It is possible to do this in a single Linq-to-SQL query (where only one SQL statement will be executed). The generated SQL does look unpleasant though, so your performance may vary.
If this is your query:
IQueryable<Person> yourQuery = People
.Where(x => /* complicated query .. */);
You can append the following to it:
var result = yourQuery
.GroupBy (x => true) // This will match all of the rows from your query ..
.Select (g => new {
// .. so 'g', the group, will then contain all of the rows from your query.
CountAll = g.Count(),
TakeFive = g.Take(5),
// We could also query for a max value.
MaxAgeFromAll = g.Max(x => x.PersonAge)
})
.FirstOrDefault();
Which will let you access your data like so:
// Check that result is not null before access.
// If there are no records to find, then 'result' will return null (because of the grouping)
if(result != null) {
var count = result.CountAll;
var firstFiveRows = result.TakeFive;
var maxPersonAge = result.MaxAgeFromAll;
}
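The null check above can be seen in a small in-memory sketch. It uses LINQ to Objects rather than LINQ to SQL, with hypothetical sample people, so it only illustrates the shape of the result, not the generated SQL.

```csharp
using System;
using System.Linq;

// Hypothetical people; an in-memory array stands in for the People table.
var people = new[]
{
    new { Name = "Ann", PersonAge = 31 },
    new { Name = "Bob", PersonAge = 47 },
    new { Name = "Cid", PersonAge = 25 },
};

// All rows match, so the single 'true' group contains every row.
var summary = people
    .Where(p => p.PersonAge > 20)
    .GroupBy(p => true)
    .Select(g => new
    {
        CountAll = g.Count(),
        TakeTwo = g.Take(2).ToList(),
        MaxAge = g.Max(p => p.PersonAge)
    })
    .FirstOrDefault();

// No rows match, so no group is produced and FirstOrDefault returns null.
var empty = people
    .Where(p => p.PersonAge > 100)
    .GroupBy(p => true)
    .Select(g => new { CountAll = g.Count() })
    .FirstOrDefault();

Console.WriteLine($"{summary?.CountAll} {empty == null}");
```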
