How do you find the group-wise max in LINQ?

I'm trying to solve the "group-wise max" problem in LINQ. To start, I have a database modeled using the Entity Framework with the following structure:
Customer:
---------
CustomerID : Int32
Name : String
Order:
-------
OrderID : Int32
CustomerID : Int32
Total : Decimal
This gives me navigation from a Customer to her orders and an Order to the owner.
I'm trying to create a LINQ query that allows me to find the top-10 customer orders in the database. The simple case was pretty easy to come up with:
var q = (
from order in _data.Orders // ObjectQuery<Order>
orderby order.Amount descending select order
).Take(10);
However, I'd like to only show unique customers in this list. I'm still a bit new to LINQ, but this is what I've come up with:
var q = (
from order in _data.Orders // ObjectQuery<Order>
group order by order.Customer into o
select new {
Name = o.Key.Name,
Amount = o.FirstOrDefault().Amount
}
).OrderByDescending(o => o.Amount).Take(10);
This seems to work, but I'm not sure if this is the best approach. Specifically, I wonder about the performance of such a query against a very large database. Also, using the FirstOrDefault method from the group query looks a little strange...
Can anyone provide a better approach, or some assurance that this is the right one?

You could do:
var q = (
from order in _data.Orders // ObjectQuery<Order>
orderby order.Amount descending select order
).Distinct().Take(10);
I would normally look at the generated SQL for each candidate query and see which one performs best.

Customer
    .Select(c => new { Order = c.Orders.OrderByDescending(o => o.Total).First() })
    .OrderByDescending(o => o.Order.Total) // the projection exposes Order, so sort on Order.Total
    .Take(10);
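For comparison, here is a minimal in-memory sketch (LINQ to Objects) of the group-wise max: take each customer's largest order, then the top N of those. The Order record, the GroupwiseMax class, and the sample data are hypothetical stand-ins for the entities in the question, and records need C# 9 or later. Whether Entity Framework translates this exact shape into a single SQL statement is something to check against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Order entity in the question.
public record Order(int OrderID, int CustomerID, decimal Total);

public static class GroupwiseMax
{
    // Picks the largest order of each customer, then the top `count` of those.
    public static List<Order> TopOrdersPerCustomer(IEnumerable<Order> orders, int count) =>
        orders
            .GroupBy(o => o.CustomerID)
            .Select(g => g.OrderByDescending(o => o.Total).First())
            .OrderByDescending(o => o.Total)
            .Take(count)
            .ToList();

    public static void Main()
    {
        var orders = new[]
        {
            new Order(1, 1, 50m),
            new Order(2, 1, 120m),
            new Order(3, 2, 80m),
            new Order(4, 3, 200m),
            new Order(5, 3, 10m),
        };

        foreach (var o in TopOrdersPerCustomer(orders, 2))
            Console.WriteLine($"{o.CustomerID}: {o.Total}");
        // prints "3: 200" then "1: 120"
    }
}
```

Every group produced by GroupBy has at least one element, so First() is safe here, and each customer appears at most once in the result.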

Related

Custom Transformer for raw SQL query using NHibernate

I have a table of customerorders that are tied to a table of purchases. Every order can have multiple purchases, it's peculiar but imagine it as if you paid for each item separately at checkout. Every order has a customer name and date, every purchase has a payment type and total on them.
I have a nifty query that provided you know the name of the customer, you can find their most recent unique purchase types.
For example:
Customer A made 3 orders total, 2 via credit card and 1 via cash.
I ask the database "what's the newest unique orders by payment type for A?" Database returns 2 results - the most recent credit card order & the 1 cash order.
This is query:
String sqlQueryStr = $@"SELECT ee.PaymentType, ee.Total
FROM
(
SELECT e.CustomerName, ee.PaymentType, ee.Total, MAX(e.Date) as MaxTimestamp
FROM customerorders ee
INNER JOIN purchases e ON e.Id=ee.OrderId WHERE CustomerName='{customerName}'
GROUP BY 1, 2, 3
) AS sub
INNER JOIN purchases e ON e.Date = sub.MaxTimestamp AND e.CustomerName = sub.CustomerName
INNER JOIN customerorders ee ON ee.OrderId=e.Id AND ee.PaymentType = sub.PaymentType;"
ISQLQuery query = session.CreateSQLQuery(sqlQueryStr);
return query.SetResultTransformer(new AliasToBeanResultTransformer(typeof(Purchase)))
.List<Purchase>();
This works great on its own. The result would be as follows for Customer A, and I have what I want:
'PaymentType: Credit Total:100'
'PaymentType: Cash Total:50'
However, now I want to do something where I don't provide 'customerName'. I want to get everyone's in the same way.
Now if you recall what I said before, Purchase does not have a CustomerName. So I can't just remove the WHERE and use the transformer anymore!
When I remove the 'WHERE' and add an additional SELECT for e.CustomerName, I get this output in MySQL using the query:
'CustomerName: A PaymentType: Credit Total:100'
'CustomerName: A PaymentType: Cash Total:50'
'CustomerName: B PaymentType: Credit Total:20'
'CustomerName: C PaymentType: Credit Total:15'
I was thinking of making a custom transformer, but I'm not sure what kind of transformer will allow me to process this kind of result. It's now not just a Purchase; it has the name as well. And I would probably want the rows grouped together per customer, rather than on different rows?

public sealed class CustomTransformer: IResultTransformer
{
public IList TransformList(IList collection)
{
return collection;
}
public object TransformTuple(object[] tuple, string[] aliases)
{
return new UniqueCustomerPurchase()
{
CustomerName = (string)tuple[0],
PaymentType = (string)tuple[1],
Total = (uint)tuple[2]
};
}
}
This is what I have at the moment. This seems to work well, however, I wish there was a way to group the CustomerName to a list of Purchases(paymentType & total) instead of having to do this. I end up having to iterate over the collection a second time to group them like so:
ISQLQuery query = session.CreateSQLQuery(sqlQueryStr);
return query.SetResultTransformer(new CustomTransformer())
.List<UniqueCustomerPurchase>()
.GroupBy(cp => cp.CustomerName)
.Select(g => new KeyValuePair<string, IEnumerable<UniqueCustomerPurchase>>(g.Key, g));
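One way to sketch that second pass is to collect the flat rows into a dictionary keyed by customer name. This is only an in-memory illustration: the UniqueCustomerPurchase record below is a simplified stand-in for the class used above, PurchaseGrouping and ByCustomer are hypothetical names, and nothing here touches NHibernate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the transformer's result type.
public record UniqueCustomerPurchase(string CustomerName, string PaymentType, uint Total);

public static class PurchaseGrouping
{
    // Groups the flat rows by customer and materializes one list per customer.
    public static Dictionary<string, List<UniqueCustomerPurchase>> ByCustomer(
        IEnumerable<UniqueCustomerPurchase> rows) =>
        rows.GroupBy(r => r.CustomerName)
            .ToDictionary(g => g.Key, g => g.ToList());

    public static void Main()
    {
        var rows = new[]
        {
            new UniqueCustomerPurchase("A", "Credit", 100),
            new UniqueCustomerPurchase("A", "Cash", 50),
            new UniqueCustomerPurchase("B", "Credit", 20),
            new UniqueCustomerPurchase("C", "Credit", 15),
        };

        foreach (var pair in ByCustomer(rows))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value.Count} purchase(s)");
        }
    }
}
```

If only lookups by name are needed, ToLookup(r => r.CustomerName) is a shorter alternative that avoids building the lists explicitly.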

How to select only rows that have a unique value in one column?

Consider this example data:
field1 field2
1 100
2 100
3 101
4 102
5 102
6 103
I want to select only the records where the value in field2 occurs only once. An example of the desired return data from the above would be:
field1 field2
3 101
6 103
How would this be done with LINQ to SQL?
--- EDIT -------
Hello all, thank you for your responses. I purposely supplied simplified data to get right to the root of my question. I think all these answers return the desired results based on my example data, and I will be marking them as answers.
However, in my real data scenario, using what I've learned from your responses, I have something like this:
var RefinedSource = from a in dSource
group a by a.AssetID into g
where g.Count() == 1
select new
{
AssetID = g.Key,
AssetType = g.Min(a => a.AssetType),
IPInfo = AppUtility.GetIPInfo(g.Key),
Hostname = AppUtility.GetServerName(g.Key),
DeviceID = g.Min(a => a.DeviceID).ToString(),
Environment = AppUtility.GetSolutionAndEnvironmentNames(g.Key),
Manufacturer = g.Min(a => a.Manufacturer),
MakeModel = g.Min(a => a.MakeModel),
Location = g.Min(a => a.Location),
count = g.Count()
};
So I'm concerned about all the Min() calls. I've deduced these are necessary because of the grouping; could someone explain why they are needed? In my simple example I don't see them being an issue, but with my real data there are multiple calls to Min() just to include all the field data I need, which doesn't seem good.
The grouping lets me test the condition I need (the count that identifies duplicate values), but how can I apply a condition like this and still access my underlying data rows directly?
For example, in the query above I would like to just use a.FieldName from the original "from a in dSource" part, but it seems you can't access that after you have introduced "group by"?
Again, thanks for the info. I will be marking answers, but if anyone could explain the need for all the calls to Min() (or Max(), or whatever) I would appreciate it. Also, seeing what it looks like with my real data, is this still the way I should go?

from r in tables
group r.field1 by r.field2 into grp // group on field2, the column that must be unique
where grp.Count() == 1
select new { field1 = grp.First(), field2 = grp.Key }
I'd double check that this does one SQL call. It should, and if so I'd keep it as here because First is a very commonly used Linq method, and when there's a few dozen equally good things to use in a given case one should favour the familiar. If it did cause more than one SQL call (again, I'd be surprised), then try Max() or Min() instead of First().

Here is how it would be done in SQL (sometimes it is faster to use SQL):
SELECT max(field1), field2
FROM table
GROUP BY field2
HAVING count(field2) = 1
Example using a window function in SQL Server
(note: a window function cannot appear in the WHERE clause, so the count is computed in a derived table and filtered outside it):
SELECT *
FROM (
    SELECT COUNT(*) OVER (PARTITION BY field2) AS field2Count, *
    FROM table
) AS t
WHERE field2Count = 1

With LINQ you can simply do:
var groups = list.GroupBy(r => r.Value).Where(grp => grp.Count() == 1);
foreach (var gr in groups) {
    var field2 = gr.Key;   // Key is the grouped value, i.e. FIELD2
    var row = gr.First();  // the single row in the group, which carries FIELD1
}
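On the question of why the Min() calls seemed necessary: after "group ... into g", the original range variable is out of scope and g is a sequence of rows rather than a single row, so any field has to be read through the group. When the group is known to contain exactly one row, taking that row with First() gives direct access to all of its fields. A minimal in-memory sketch, where Row, UniqueRows, and WithUniqueField2 are hypothetical names and the data is the example table from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the field1/field2 example data.
public record Row(int Field1, int Field2);

public static class UniqueRows
{
    // Keeps only the rows whose Field2 value occurs exactly once.
    public static List<Row> WithUniqueField2(IEnumerable<Row> source) =>
        (from r in source
         group r by r.Field2 into g
         where g.Count() == 1
         select g.First()).ToList(); // the group's only row, with all its fields

    public static void Main()
    {
        var rows = new[]
        {
            new Row(1, 100), new Row(2, 100), new Row(3, 101),
            new Row(4, 102), new Row(5, 102), new Row(6, 103),
        };

        foreach (var r in WithUniqueField2(rows))
            Console.WriteLine($"{r.Field1} {r.Field2}");
        // prints "3 101" then "6 103"
    }
}
```

This runs against in-memory collections; how a particular LINQ to SQL version translates First() inside a grouped query is worth confirming by inspecting the generated SQL.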

Sort by number of child records using LINQ

I have the following two tables:
-Groups-
Id
Name
-Members-
Id
GroupId (Group.Id is related to Member.GroupId)
Name
IsActive (bit)
How can I write a LINQ query that sorts groups by their number of active members (IsActive), highest to lowest?
The query would look something like this
//pseudo code
from grp in database.Groups
orderby Count(grp.Members.where(m=>m.IsActive == true)) descending
select grp

You can use the let clause for this.
As a result you will get the following:
from grp in database.Groups
let activeCount = grp.Members.Where(m => m.IsActive == true).Count()
orderby activeCount descending
select grp
Another way to achieve the desired ordering is to use select ... into. The queries will be pretty similar, but you should be aware of the differences between these two approaches: Is linq's let keyword better than its into keyword?
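The let-based ordering can be tried out against in-memory data with a small sketch like the one below. The Member and Group records, the GroupSorting class, and the sample names are hypothetical stand-ins for the tables in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Members and Groups tables.
public record Member(int Id, int GroupId, string Name, bool IsActive);
public record Group(int Id, string Name, List<Member> Members);

public static class GroupSorting
{
    // Orders groups by their number of active members, highest first.
    public static List<Group> ByActiveMembers(IEnumerable<Group> groups) =>
        (from grp in groups
         let activeCount = grp.Members.Count(m => m.IsActive)
         orderby activeCount descending
         select grp).ToList();

    public static void Main()
    {
        var groups = new[]
        {
            new Group(1, "Red", new List<Member>
            {
                new Member(1, 1, "Ann", true),
                new Member(2, 1, "Bob", false),
            }),
            new Group(2, "Blue", new List<Member>
            {
                new Member(3, 2, "Cy", true),
                new Member(4, 2, "Dee", true),
            }),
        };

        foreach (var g in ByActiveMembers(groups))
            Console.WriteLine(g.Name);
        // prints "Blue" then "Red"
    }
}
```

Count with a predicate is equivalent to Where(...).Count() and reads a little more directly.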

Is this LINQ Query "correct"?

I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then the LastWinnerDate is a single record from the winner table (possible multiple records) indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possible multiple records).
Video1 is basically a bool indicating there is, or is not a record for the user in the winner table matching on a third table Objective (should be 1 or 0 rows).
Quiz1 is same as Video 1 matching another record from Objective Table (should be 1 or 0 rows).
Video and Quiz is repeated 12 times because it is for a report to be displayed to a user listing all user records and indicate if they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};

You're repeating the join against the winners table several times. To avoid that, you can break the query into several consecutive selects: instead of one huge select, make two smaller ones. In your example I would first select the winners collection for each user, and only then the other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have more simple and less repeatable code
using winners collection from "upWinnerPair.winners"*/

The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as Idb
from alpha a
left join beta b on a.Id = b.Id
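One detail to watch when the same pattern runs over in-memory collections (LINQ to Objects): DefaultIfEmpty() yields null for unmatched rows, so b must be null-checked before its members are read. A minimal sketch, assuming hypothetical Alpha and Beta record types with an int Id:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element types for the alpha and beta sources.
public record Alpha(int Id);
public record Beta(int Id);

public static class LeftJoinDemo
{
    // Left outer join: every alpha row is kept; IdB is null when beta has no match.
    public static List<(int IdA, int? IdB)> LeftJoin(
        IEnumerable<Alpha> alpha, IEnumerable<Beta> beta) =>
        (from a in alpha
         join btemp in beta on a.Id equals btemp.Id into bleft
         from b in bleft.DefaultIfEmpty()
         select (IdA: a.Id, IdB: b?.Id)).ToList();

    public static void Main()
    {
        var alpha = new[] { new Alpha(1), new Alpha(2), new Alpha(3) };
        var beta = new[] { new Beta(1), new Beta(3) };

        foreach (var row in LeftJoin(alpha, beta))
        {
            var right = row.IdB.HasValue ? row.IdB.Value.ToString() : "null";
            Console.WriteLine($"{row.IdA} -> {right}");
        }
        // prints "1 -> 1", "2 -> null", "3 -> 3"
    }
}
```

Against a database provider the null handling is done in SQL, but projecting the right-hand Id into a nullable type is still what lets the unmatched rows be represented.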

It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).

Linq to Sql - Populate JOIN result into a List

I am not sure if this can be done, but here's the scenario.
I want to turn this sql into linq:
SELECT * FROM Department d
INNER JOIN Employee e ON e.DepartmentID = d.DepartmentID
Department - Employee is 1 to many relationship.
I have created a custom object that I would like to populate the result into.
public class DepartmentSummary
{
public Department Department { get; set; }
public List<Employee> Employees {get; set;}
}
The Linq I came up with is
var result = from d in dba.Department
join e in dba.Employee on d.DepartmentID equals e.DepartmentID into j1
select new DepartmentSummary
{
Department = d,
Employees = j1.ToList()
};
I tried it out and it's not working. Can anyone shed some light on this for me, please? I would like to perform an inner join between Department and Employee. For each Department in the result set, I would like to create one DepartmentSummary object that holds that department and a list of the employees belonging to it.
Does LINQ provide an ad hoc solution for this, or must I iterate through the result set and create the list of DepartmentSummary objects manually?
Thanks,
EDIT:
Looks like this works for me
var result = from d in dba.Department
join e in dba.Employee on d.DepartmentID equals e.DepartmentID into j1
where j1.Count() > 0
select new DepartmentSummary
{
Department = d,
Employees = j1.ToList()
};

The thing is that you're not really taking one SQL and trying to create a Linq-query out of it.
If you were, you'd notice that your SQL query does not really produce one row per department, but it will repeat the department information for each employee in that department.
Now, an initial naive look would suggest you use a group-by clause, since that would allow you to split the data into individual groupings for each department. But grouping in SQL does not really give you a key+all-matching-rows type of result; rather, it allows you to do aggregate calculations, like "for each department, how many employees do I have".
So, in order to do what you want, you need to basically do a normal join, which will give you each employee, coupled with the appropriate department information (ie. each employee will be linked to his/her department), and then you need to construct the rest of the data structure yourself.
Now, having said that, if you have the proper relationships set in your data context related classes, each department should already have some kind of property that contains all employees in that department, so perhaps the simple query is just "give me all departments", and then you can, for each department, retrieve the employees?
Of course, doing that would likely execute one SQL for each department, but in this case, you're back to "give me all employees with their department information" and you have to build code to handle the rest.

LINQ to SQL doesn't understand your ToList() call, but you might be able to select the sequence of joined elements and then use LINQ to Objects (via AsEnumerable()) to map to your DepartmentSummary object:
var qResult = from d in dba.Department
join e in dba.Employee on d.DepartmentID equals e.DepartmentID into j1
select new
{
Department = d,
Employees = j1
};
var result = from d in qResult.AsEnumerable()
select new DepartmentSummary()
{
Department = d.Department,
Employees = d.Employees.ToList()
};

Sounds like you're looking to get around lazy loading?
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Department>(d => d.Employees);
using (var dba = new MyDataContext())
{
dba.LoadOptions = dlo;
var result = from d in dba.Department
select d;
}
Now, if you don't have a relationship defined between Department and Employees (the LINQ to SQL designer will do this for you if you have database relationships set up), then you should look into doing that. It makes it all dramatically easier. In fact, you don't even need your DepartmentSummary class.

This problem is due to the nature of the query. When you join Department to Employee, you'll get back one record for every Employee. This means that your ToList() statement is expecting multiple employees per department, but due to the join, always getting one.
Change your query to
var result =
from d in dba.Department
select new DepartmentSummary
{
Department = d,
Employees = dba.Employee.Where(e => e.DepartmentID ==
d.DepartmentID).ToList()
};
I've tested this and it works.
What it does differently is selects only one record per Department (not per employee) then it gets the zero to many corresponding employees for each dept and converts them to a list.
Good luck!
EDIT
As requested, here is the generated SQL:
SELECT [t0].*, [t1].*,
(
SELECT COUNT(*)
FROM [dbo].[Employee] AS [t2]
WHERE [t2].[DepartmentID] = [t0].[DepartmentID]
) AS [value]
FROM [dbo].[Department] AS [t0]
LEFT OUTER JOIN [dbo].[Employee] AS [t1]
ON [t1].[DepartmentID] = [t0].[DepartmentID]
ORDER BY [t0].[DepartmentID], [t1].[IndexID]
The only modification is that LINQ will not do [t0].*, instead it will enumerate each field. Since I had to guess at the fields, I left them out to make the SQL clearer.
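To see the intended shape independently of LINQ to SQL, here is a minimal in-memory sketch of the group join (join ... into) that builds one DepartmentSummary per department that has employees. The Department and Employee records and the SummaryBuilder class are hypothetical stand-ins for the mapped entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the mapped entities.
public record Department(int DepartmentID, string Name);
public record Employee(int EmployeeID, int DepartmentID, string Name);

public class DepartmentSummary
{
    public Department Department { get; set; }
    public List<Employee> Employees { get; set; }
}

public static class SummaryBuilder
{
    // Group join: one summary per department, keeping only departments
    // that have at least one employee (inner-join semantics).
    public static List<DepartmentSummary> Build(
        IEnumerable<Department> departments, IEnumerable<Employee> employees) =>
        (from d in departments
         join e in employees on d.DepartmentID equals e.DepartmentID into j1
         where j1.Any()
         select new DepartmentSummary { Department = d, Employees = j1.ToList() })
        .ToList();

    public static void Main()
    {
        var departments = new[] { new Department(1, "Sales"), new Department(2, "Empty") };
        var employees = new[] { new Employee(10, 1, "Ann"), new Employee(11, 1, "Bob") };

        foreach (var s in Build(departments, employees))
            Console.WriteLine($"{s.Department.Name}: {s.Employees.Count}");
        // prints "Sales: 2"
    }
}
```

Dropping the where clause turns this into a left outer join in which departments without employees get an empty list.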
