Select only first row from each group. Entity Framework - c#

I have a table something like this:
userId   productName  transactionId  Date
6556656  apple        3534534        25.10
6556656  apple        T423423        23.10
6556656  orange       7687898        22.10
6556656  orange       5675665        27.10
6556656  orange       1231312        25.09
6556656  banana       4564545        14.09
6556656  banana       7898878        30.09
As you can see, I have 7 rows covering 3 kinds of products. I don't need all 7 rows; I need only one row per product.
The result should contain only 3 rows, one each for apple, orange and banana, where each row is the one with the latest Date in its group.
I need to write a query something like this:
var result = _db.Fruits.GroupBy(o => o.ProductName).Select(g => g.OrderByDescending(o => o.Date).FirstOrDefault());
I have tried several variations, but without success.

Quick answer
You want to take the first occurrence of each group.
You can use OrderBy + First over a GroupBy:
var top = db.Transactions
    .GroupBy(t => t.Product)
    .Select(t => new
    {
        t.Key,
        date = t.OrderBy(x => x.Date).Select(x => x.Date).First() // <- Magic is Here!
    })
    .ToList();
That generates:
SELECT t.Product AS Key, (
SELECT t0.Date
FROM Transactions AS t0
WHERE t.Product = t0.Product
ORDER BY t0.Date
LIMIT 1) AS date
FROM Transactions AS t
GROUP BY t.Product
Note: use OrderByDescending instead of OrderBy to get the latest Date rather than the earliest.
More elaborate
If you want the whole Transaction model for each group:
var top = db.Transactions
    .GroupBy(t => t.Product)
    .Select(t => new
    {
        productname = t.Key,
        lasttransaction = t.OrderByDescending(x => x.Date).First()
    })
    .AsEnumerable() // <-- from this point on, the query is evaluated on the client
    .Select(t => new { t.productname, t.lasttransaction.Date })
    .ToList();
That is translated as:
SELECT t0.Product, t1.TransactionId, t1.Date, t1.Product
FROM (
SELECT t.Product
FROM Transactions AS t
GROUP BY t.Product
) AS t0
LEFT JOIN (
SELECT t2.TransactionId, t2.Date, t2.Product
FROM (
SELECT t3.TransactionId, t3.Date, t3.Product, ROW_NUMBER() OVER(PARTITION BY t3.Product ORDER BY t3.Date DESC) AS row
FROM Transactions AS t3
) AS t2
WHERE t2.row <= 1
) AS t1 ON t0.Product = t1.Product
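To see the GroupBy + OrderByDescending + First pattern without a database, here is a minimal LINQ to Objects sketch over the seven rows from the question. FruitRow, LatestPerProduct, Sample and Pick are hypothetical names made up for this illustration, and the year 2021 is an arbitrary assumption (the question only gives day.month). Whether the same expression translates to SQL depends on your EF version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one row of the question's table.
public record FruitRow(int UserId, string ProductName, string TransactionId, DateTime Date);

public static class LatestPerProduct
{
    // The question's seven rows; the year is an assumption.
    public static List<FruitRow> Sample() => new List<FruitRow>
    {
        new FruitRow(6556656, "apple",  "3534534", new DateTime(2021, 10, 25)),
        new FruitRow(6556656, "apple",  "T423423", new DateTime(2021, 10, 23)),
        new FruitRow(6556656, "orange", "7687898", new DateTime(2021, 10, 22)),
        new FruitRow(6556656, "orange", "5675665", new DateTime(2021, 10, 27)),
        new FruitRow(6556656, "orange", "1231312", new DateTime(2021, 9, 25)),
        new FruitRow(6556656, "banana", "4564545", new DateTime(2021, 9, 14)),
        new FruitRow(6556656, "banana", "7898878", new DateTime(2021, 9, 30)),
    };

    // Group by product, sort each group newest-first, keep the first row.
    public static List<FruitRow> Pick(IEnumerable<FruitRow> rows) =>
        rows.GroupBy(r => r.ProductName)
            .Select(g => g.OrderByDescending(r => r.Date).First())
            .ToList();

    public static void Main()
    {
        foreach (var row in Pick(Sample()))
            Console.WriteLine($"{row.ProductName} {row.TransactionId}");
    }
}
```

On this data the sketch keeps transactions 3534534 (apple), 5675665 (orange) and 7898878 (banana), one row per product.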

Related

Get only rows with the latest date for each name

I'm trying to write a query that returns only those rows that contain the latest date for each name.
So for example, this data:
Name   Date Sold   More Columns...
Bob    2021-01-05
Mike   2021-01-18
Susan  2021-01-23
Bob    2021-02-04
Susan  2021-02-16
Mike   2021-03-02
Would produce this result:
Name   Date Sold   More Columns...
Bob    2021-02-04
Susan  2021-02-16
Mike   2021-03-02
It's sort of like a GROUP BY, but I'm not aggregating anything. I only want to filter the original rows.
How could I write such a query?
NOTE: In the end, this will be a SQL Server query but I need to write it using Entity Framework.
UPDATE: In reality, this is part of a much more complex query. It would be extremely difficult for me to implement this as a raw SQL query. If at all possible, I need to implement using Entity Framework.
Two options
Select top 1 with ties *
From YourTable
Order by row_number() over (partition by Name order by Sold_Date desc)
or slightly more performant
with cte as (
Select *
,RN = row_number() over (partition by Name order by Sold_Date desc)
From YourTable
)
Select *
From cte
Where RN=1
Adapted from
Error while flattening the IQueryable<T> after GroupBy()
var names = _context.Items.Select(row => row.Name).Distinct();
var items =
from name in names
from item in _context.Items
.Where(row => row.Name == name)
.OrderByDescending(row => row.DateSold)
.Take(1)
select item;
var results = await items.ToArrayAsync();
Let's break this down:
A query expression which establishes the keys for our next query. Will eventually be run as a subquery.
var names = _context.Items.Select(row => row.Name).Distinct();
Another query, starting with the keys...
var items =
from name in names
... and for each key, let's find the matching row ...
from item in _context.Items
.Where(row => row.Name == name)
.OrderByDescending(row => row.DateSold)
.Take(1)
... and we want that row.
select item;
Run the combined query.
var results = await items.ToArrayAsync();
try this
;with Groups as
(
Select [Name], max([Date Sold]) as [Date Sold]
From Table
Group By [Name]
)
Select Table.* From Groups
Inner Join Table on Table.[Name] = Groups.Name And Table.[Date Sold] = Groups.[Date Sold]
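The "join back on the max date" idea can also be written in LINQ. The following is only an in-memory (LINQ to Objects) sketch over the question's sample data; Sale, LatestSalePerName, Sample and Pick are hypothetical names for this illustration, and I have not checked how any particular EF version translates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the question's table.
public record Sale(string Name, DateTime DateSold);

public static class LatestSalePerName
{
    // The six sample rows from the question.
    public static List<Sale> Sample() => new List<Sale>
    {
        new Sale("Bob",   new DateTime(2021, 1, 5)),
        new Sale("Mike",  new DateTime(2021, 1, 18)),
        new Sale("Susan", new DateTime(2021, 1, 23)),
        new Sale("Bob",   new DateTime(2021, 2, 4)),
        new Sale("Susan", new DateTime(2021, 2, 16)),
        new Sale("Mike",  new DateTime(2021, 3, 2)),
    };

    public static List<Sale> Pick(IEnumerable<Sale> sales)
    {
        var list = sales.ToList();

        // Step 1: the latest date per name (the "Groups" CTE).
        var latest = list
            .GroupBy(s => s.Name)
            .Select(g => new { Name = g.Key, DateSold = g.Max(s => s.DateSold) });

        // Step 2: join the original rows back on (Name, DateSold).
        return list
            .Join(latest,
                  s => new { s.Name, s.DateSold },
                  m => new { m.Name, m.DateSold },
                  (s, m) => s)
            .ToList();
    }

    public static void Main()
    {
        foreach (var s in Pick(Sample()))
            Console.WriteLine($"{s.Name} {s.DateSold:yyyy-MM-dd}");
    }
}
```

Like the SQL version, this returns more than one row for a name if two rows share the same latest date.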

Delete all but top 10 for every type Entity Framework

Let's say that I have a table like:
Id Name Category CreatedDate
1 test test 10-10-2015
2 test1 test1 10-10-2015
...
Now, I would like to delete all rows and leave only the top 10 from all categories (by top 10 I mean the 10 newest according to createdDate).
Using raw SQL, it would be like:
DELETE FROM [Product]
WHERE id NOT IN
(
SELECT id FROM
(
SELECT id, RANK() OVER(PARTITION BY Category ORDER BY createdDate DESC) num
FROM [Product]
) X
WHERE num <= 10
)
How is this done when using the DbContext in Entity Framework?
// GET all products
var list = ctx.Products.ToList();
// GROUP by category, ORDER by date descending, SKIP 10 rows by category
var groupByListToRemove = list.GroupBy(x => x.Category)
.Select(x => x.OrderByDescending(y => y.CreatedDate)
.Skip(10).ToList());
// SELECT all data to remove
var listToRemove = groupByListToRemove.SelectMany(x => x);
// Have fun!
ctx.Products.RemoveRange(listToRemove);
I'm guessing it will take a while if you have a lot of data, though.
var oldItems = efContext.Products
    .GroupBy(x => x.Category,
        (c, p) => p.OrderByDescending(x => x.CreatedDate).Skip(10))
    .SelectMany(p => p);
efContext.Products.RemoveRange(oldItems);
Will do the trick
(Written in notepad)
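For reference, the "skip the newest N per category" selection can be tried in memory like this. It is only a sketch: Product, TrimPerCategory and RowsToRemove are hypothetical names, the rows in Main are made-up sample data, and the keep count is a parameter so a small data set can demonstrate it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Product entity.
public record Product(int Id, string Name, string Category, DateTime CreatedDate);

public static class TrimPerCategory
{
    // Rows that fall outside the newest `keep` rows of their category.
    public static List<Product> RowsToRemove(IEnumerable<Product> products, int keep) =>
        products
            .GroupBy(p => p.Category)
            .SelectMany(g => g.OrderByDescending(p => p.CreatedDate).Skip(keep))
            .ToList();

    public static void Main()
    {
        // Made-up sample rows, not from the question.
        var products = new List<Product>
        {
            new Product(1, "a1", "a", new DateTime(2015, 10, 10)),
            new Product(2, "a2", "a", new DateTime(2015, 10, 11)),
            new Product(3, "a3", "a", new DateTime(2015, 10, 12)),
            new Product(4, "b1", "b", new DateTime(2015, 10, 10)),
        };

        // Keeping the 2 newest per category leaves only product 1 to remove.
        foreach (var p in RowsToRemove(products, 2))
            Console.WriteLine(p.Id);
    }
}
```

One difference from the raw SQL: RANK() gives tied dates the same rank, so the SQL can keep more than 10 rows per category, while Skip(10) always cuts after exactly 10.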

Linq query giving inappropriate output

I have two transaction tables, ParentTransaction and ChildTransaction, where the TransactionId of ParentTransaction acts as a foreign key in ChildTransaction.
Now I want to get the TransactionId of every ParentTransaction whose PayAmount has not been fully paid.
From the data below I want the record with TransactionId 3, because only 2000 of its 5000 has been paid.
The tables look like this:
ParentTransaction
Transactionid(p.k)  PayAmount
1                   1000
2                   3000
3                   5000
4                   6000
ChildTransaction
Id  TransactionId(F.k)  DepositAmount
1   1                   600
2   1                   400
3   2                   1000
4   2                   1000
5   2                   1000
6   3                   2000
This is my query:
var data = (from tmp in context.ParentTransaction
join tmp1 in context.ChildTransaction on tmp.Transactionid equals
tmp1.Transactionid where tmp.PayAmount !=tmp1.DepositAmount
select tmp);
But this returns TransactionId 1 and 2 as well, even though those transactions have been paid in full, in parts (for example 600 and 400 for TransactionId 1).
The general idea of query languages is to express the desired result, not how to get it.
Applying it to your scenario leads to a simple query like this
var query = context.ParentTransaction
.Where(t => t.PayAmount != context.ChildTransaction
.Where(ct => ct.TransactionId == t.TransactionId)
.Sum(ct => ct.DepositAmount));
If you are using EF with proper navigation properties in the model, it is even simpler:
var query = context.ParentTransaction
.Where(t => t.PayAmount != t.ChildTransactions.Sum(ct => ct.DepositAmount));
One may say the above is inefficient compared to, say, the query from @Vadim Martynov's answer. Maybe yes, maybe no. Vadim is trying to force a specific execution plan, and I can understand that: we have to do such things when we actually run into query performance issues. But it's not natural and should be a last resort, used only if we have a performance problem. Query providers and SQL query optimizers will do (and are doing) that job for us in most cases, so we don't need to think about whether to use a join or a subquery, etc.
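To check the "compare PayAmount with the sum of deposits" idea against the question's data, here is a small LINQ to Objects sketch. Parent, Child, UnpaidBySum and Find are hypothetical names used only for this illustration; the rows are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the two tables.
public record Parent(int TransactionId, decimal PayAmount);
public record Child(int Id, int TransactionId, decimal DepositAmount);

public static class UnpaidBySum
{
    public static List<Parent> Parents() => new List<Parent>
    {
        new Parent(1, 1000m), new Parent(2, 3000m),
        new Parent(3, 5000m), new Parent(4, 6000m),
    };

    public static List<Child> Children() => new List<Child>
    {
        new Child(1, 1, 600m),  new Child(2, 1, 400m),
        new Child(3, 2, 1000m), new Child(4, 2, 1000m),
        new Child(5, 2, 1000m), new Child(6, 3, 2000m),
    };

    // Ids of parents whose PayAmount differs from the total deposited.
    // Sum over an empty sequence is 0, so a parent with no children
    // counts as unpaid here.
    public static List<int> Find(IEnumerable<Parent> parents, IEnumerable<Child> children) =>
        parents
            .Where(p => p.PayAmount != children
                .Where(c => c.TransactionId == p.TransactionId)
                .Sum(c => c.DepositAmount))
            .Select(p => p.TransactionId)
            .ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Find(Parents(), Children())));
    }
}
```

On the question's rows this yields transactions 3 and 4: 3 is partly paid, and 4 has no deposits at all.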
I'm not sure that != is the best comparison. Here is a solution with a > check and grouping:
var expectedValue =
context.ChildTransaction
.GroupBy(t => t.TransactionId, (key, group) => new { TransactionId = key, Deposit = group.Sum(e => e.Deposit) })
.Join(context.ParentTransaction, grouped => grouped.TransactionId, transaction => transaction.TransactionId, (group, transaction) => new { Transaction = transaction, group.Deposit })
.Where(result => result.Transaction.PayAmount > result.Deposit)
.Select(result => result.Transaction);
This query can be read declaratively as the following requirements:
1. Group the child transactions by TransactionId and, for each group, produce an anonymous object with TransactionId (the grouping key) and Deposit (the sum of the deposits for rows with that TransactionId).
2. Join the set from step 1 to the ParentTransaction table on TransactionId. For each joined pair, produce an anonymous object with Transaction (the row from ParentTransaction) and Deposit (the sum from step 1).
3. Keep only the objects where PayAmount is greater than the sum of deposits.
4. Return only the ParentTransaction object for each remaining row.
This is an SQL-friendly formulation: the join, filter and grouping avoid nested queries, which could otherwise end up in the actual execution plan and hurt performance.
UPDATE
To handle transactions that have no deposits, you can use a LEFT JOIN:
var expectedValue = from parent in context.ParentTransaction
join child in context.ChildTransaction on parent.TransactionId equals child.TransactionId into gj
from subset in gj.DefaultIfEmpty()
let joined = new { Transaction = parent, Deposit = subset != null ? subset.Deposit : 0 }
group joined by joined.Transaction
into grouped
let g = new { Transaction = grouped.Key, Deposit = grouped.Sum(e => e.Deposit) }
where g.Transaction.PayAmount > g.Deposit
select g.Transaction;
The same query with LINQ method chain:
var expectedValue =
context.ParentTransaction
.GroupJoin(context.ChildTransaction, parent => parent.TransactionId, child => child.TransactionId, (parent, gj) => new { parent, gj })
.SelectMany(@t => @t.gj.DefaultIfEmpty(), (@t, subset) => new { @t, subset })
.Select(@t => new { @t, joined = new { Transaction = @t.@t.parent, Deposit = @t.subset != null ? @t.subset.Deposit : 0 } })
.GroupBy(@t => @t.joined.Transaction, @t => @t.joined)
.Select(grouped => new { grouped, g = new { Transaction = grouped.Key, Deposit = grouped.Sum(e => e.Deposit) } })
.Where(@t => @t.g.Transaction.PayAmount > @t.g.Deposit)
.Select(@t => @t.g.Transaction);
Now you retrieve every parent transaction and join it with its child transactions; if a parent has no children, Deposit is taken as 0. The joined entities are then grouped by ParentTransaction in a similar manner.
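In memory, the same left-join behaviour can be had more compactly by letting GroupJoin hand each parent its (possibly empty) set of children. This is a simplified LINQ to Objects sketch, not the query above: ParentTx, ChildTx, UnpaidByGroupJoin and Find are hypothetical names, and the data is the question's.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the two tables.
public record ParentTx(int TransactionId, decimal PayAmount);
public record ChildTx(int Id, int TransactionId, decimal DepositAmount);

public static class UnpaidByGroupJoin
{
    public static List<ParentTx> Parents() => new List<ParentTx>
    {
        new ParentTx(1, 1000m), new ParentTx(2, 3000m),
        new ParentTx(3, 5000m), new ParentTx(4, 6000m),
    };

    public static List<ChildTx> Children() => new List<ChildTx>
    {
        new ChildTx(1, 1, 600m),  new ChildTx(2, 1, 400m),
        new ChildTx(3, 2, 1000m), new ChildTx(4, 2, 1000m),
        new ChildTx(5, 2, 1000m), new ChildTx(6, 3, 2000m),
    };

    // GroupJoin pairs every parent with all of its children; a parent
    // without children gets an empty group, whose Sum is 0.
    public static List<int> Find(IEnumerable<ParentTx> parents, IEnumerable<ChildTx> children) =>
        parents
            .GroupJoin(children,
                       p => p.TransactionId,
                       c => c.TransactionId,
                       (p, cs) => new { Parent = p, Deposited = cs.Sum(c => c.DepositAmount) })
            .Where(x => x.Parent.PayAmount > x.Deposited)
            .Select(x => x.Parent.TransactionId)
            .ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Find(Parents(), Children())));
    }
}
```

With the question's data this selects transactions 3 and 4, the same parents the LEFT JOIN version is meant to return.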
Problem
The issue lies in this statement:
where tmp.PayAmount != tmp1.DepositAmount //the culprit
Since tmp1 is a single child transaction, the statement compares the wrong values:
Visualizer:
1000 != 600 //(result: true -> selected) comparing parent 1 and child 1
1000 != 400 //(result: true -> selected) comparing parent 1 and child 2
3000 != 1000 //(result: true -> selected) comparing parent 2 and child 3
3000 != 1000 //(result: true -> selected) comparing parent 2 and child 4
3000 != 1000 //(result: true -> selected) comparing parent 2 and child 5
5000 != 2000 //(result: true -> selected) comparing parent 3 and child 6
//However, you do not want it to behave like this actually
But what you want to have is rather:
Visualizer:
1000 != (600 + 400) //(result: false -> not selected) comparing parent 1 and child 1 & 2, based on the TransactionId
3000 != (1000 + 1000 + 1000) //(result: false -> not selected) comparing parent 2 and child 3, 4, & 5, based on the TransactionId
5000 != (2000) //(result: true -> selected) comparing parent 3 and child 6, based on the TransactionId
6000 != nothing paid //(result: true -> selected) comparing parent 4 with the whole childTransaction and finding that no payment has been made
Thus, you should make tmp1 a collection of children rather than a single child.
Solution
Unpaid Transaction
Change your code like this:
var data = (from tmp in context.ParentTransaction
join tmp1 in context.ChildTransaction.GroupBy(x => x.TransactionId) //group this by transaction id
on tmp.TransactionId equals tmp1.Key //use the key
where tmp.PayAmount > tmp1.Sum(x => x.DepositAmount) //get the sum of the deposited amount
select tmp)
.Union( //added after edit
(from tmp in context.ParentTransaction
where !context.ChildTransaction.Select(x => x.TransactionId).Contains(tmp.TransactionId)
select tmp)
);
Explanations
This line:
join tmp1 in context.ChildTransaction.GroupBy(x => x.TransactionId) //group this by transaction id
Using GroupBy in LINQ, this line makes tmp1 a group of children rather than a single child, grouped, rightly, by the foreign key TransactionId.
Then this line:
on tmp.TransactionId equals tmp1.Key //use the key
We simply equate tmp.TransactionId with the children's group key, tmp1.Key.
Then the next line:
where tmp.PayAmount > tmp1.Sum(x => x.DepositAmount) //get the sum of the deposited amount
This compares the sum of the children's DepositAmount, rather than a single child's DepositAmount, against the parent's PayAmount and keeps the parents whose sum is smaller. And then:
select tmp
This selects all the parent transactions that satisfy the criteria above. At this point, we are half done.
The next step is to consider transactions that occur in the parent table but not in the child table. These are considered unpaid too.
We can combine the result of the first query with a second query using Union:
.Union( //added after edit
(from tmp in context.ParentTransaction
where !context.ChildTransaction.Select(x => x.TransactionId).Contains(tmp.TransactionId)
select tmp)
);
This selects whatever exists in the parent table but doesn't exist at all in the child table (and is therefore considered unpaid).
You then get the right data: the ParentTransaction rows that are not fully paid, whether or not their TransactionId appears in the child table.
Paid Transaction
As for paid transactions, simply change the comparison from > to <=:
var datapaid = (from tmp in context.ParentTransaction
join tmp1 in context.ChildTransaction.GroupBy(y => y.TransactionId)
on tmp.TransactionId equals tmp1.Key
where tmp.PayAmount <= tmp1.Sum(x => x.DepositAmount)
select tmp);
Combined
We can further simplify the above query like this:
var grp = context.ChildTransaction.GroupBy(y => y.TransactionId);
var data = (from tmp in context.ParentTransaction
join tmp1 in grp //group this by transaction id
on tmp.TransactionId equals tmp1.Key //use the key
where tmp.PayAmount > tmp1.Sum(x => x.DepositAmount)
select tmp)
.Union((
from tmp in context.ParentTransaction
where !context.ChildTransaction.Select(x => x.TransactionId).Contains(tmp.TransactionId)
select tmp));
var datapaid = (from tmp in context.ParentTransaction
join tmp1 in grp
on tmp.TransactionId equals tmp1.Key
where tmp.PayAmount <= tmp1.Sum(x => x.DepositAmount)
select tmp);
List<int> obj = new List<int>();
using (DemoEntities context = new DemoEntities())
{
obj = (from ct in context.CTransactions
group ct by ct.Transactionid into grp
join pt in context.PTransactions on grp.Key equals pt.Transactionid
where grp.Sum(x => x.DepositAmount) < pt.PayAmount
select grp.Key).ToList();
}
You are checking only one child transaction at a time. You must use Sum() and compare with > instead of !=. Please try this:
var data = (from tmp in context.ParentTransaction
join tmp1 in context.ChildTransaction on tmp.Transactionid equals tmp1.Transactionid into tmp1List
where tmp.PayAmount > tmp1List.Sum(l => l.DepositAmount)
select tmp);

LINQ - Group By with Having?

I have a db table with data like below.
Name Tool Security QUANTITY PRICE
ABC ML XXX 100 50
ABC DB XXX -50 50
XYZ CS YYY 30 30
My requirement is to group by Name and Security and pick only the groups that contain both negative and positive quantities. This T-SQL query does exactly that; I need the equivalent in LINQ. For the data above, it returns both rows for ABC & XXX.
select t1.* from MyTable as t1
inner join
(
select Name,Security from MyTable
group by Name, Security
HAVING min(Quantity)<0 and max(Quantity)>0
) as t2 on t1.Name=t2.Name and t1.Security =t2.Security
This is my inner query but it's not working.
var Positions = from r in lstpositions
group r by new { r.Name, r.Security} into grp
where grp.Min(x => x.Quantity<0) && grp.Max(x => x.Quantity >0)
select grp;
Any thoughts on this ?
Your query does not work because you are taking the min and max of the result of the comparison (a bool) rather than of Quantity.
However, I think you want Any() rather than Min and Max:
where grp.Any(x => x.Quantity<0) && grp.Any(x => x.Quantity >0)
This checks for any value below 0 and any value above 0. It short-circuits, so it does not have to traverse the entire list, which should make it faster.
Or this
where grp.Min(x => x.Quantity) < 0 && grp.Max(x => x.Quantity) > 0
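Put together with the grouping, the Any() filter can be tried on the question's three rows like this. It is a LINQ to Objects sketch; Position, MixedSignGroups, Sample and Find are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the question's table.
public record Position(string Name, string Tool, string Security, int Quantity, decimal Price);

public static class MixedSignGroups
{
    // The three sample rows from the question.
    public static List<Position> Sample() => new List<Position>
    {
        new Position("ABC", "ML", "XXX", 100, 50m),
        new Position("ABC", "DB", "XXX", -50, 50m),
        new Position("XYZ", "CS", "YYY", 30, 30m),
    };

    // Keep every row of each (Name, Security) group that contains
    // at least one negative and at least one positive quantity.
    public static List<Position> Find(IEnumerable<Position> positions) =>
        positions
            .GroupBy(p => new { p.Name, p.Security })
            .Where(g => g.Any(p => p.Quantity < 0) && g.Any(p => p.Quantity > 0))
            .SelectMany(g => g)
            .ToList();

    public static void Main()
    {
        foreach (var p in Find(Sample()))
            Console.WriteLine($"{p.Name} {p.Tool} {p.Security} {p.Quantity}");
    }
}
```

The SelectMany at the end plays the role of the inner join in the T-SQL: it returns the original rows of the qualifying groups, here the two ABC/XXX rows.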

Linq query for a nested select statement with grouping and distinct

I'd like to translate the following SQL statement into a linq query:
select COUNT(*), itemid, globalid, title, preview, previewimage, previewimage_alt, link
from (
select distinct Id, itemid, globalid, title, preview, previewimage, previewimage_alt,
(select top 1 link from LikeCounter where GlobalId=x.GlobalId) as link
from [LikeCounter] x
where PortalId=1 and LanguageId=1
) as t
GROUP BY itemid, globalid, title, preview, previewimage, previewimage_alt, link
ORDER BY COUNT(*) desc
The query is over a view that holds records of objects being "liked". Since the objects can be published in multiple places, and the view was setup to allow for filtering for a certain place, it requires a distinct before grouping the records to find out the view count (that's the reason for the additional query for the "link" column).
Is a nested SELECT statement possible in one linq statement?
The inner query is no problem:
(from x in LikeCounter
where x.PortalId==1 && x.LanguageId==1
select new {x.Id, x.ItemId, x.GlobalId, x.LanguageId, x.Title, x.Preview, x.PreviewImage_alt,
Morelink=(from y in LikeCounter
where y.GlobalId==x.GlobalId
select y.Morelink).FirstOrDefault()
}).Distinct()
But is there a way to extend this with the grouping of the distinct records, that results in just one query to the database ?
Thanks in advance for any input...
Nina
Edit:
the following query almost returns what I want -- but produces multiple queries to the SQL server:
(from y in
((from x in LikeCounter
where x.PortalId==1 && x.LanguageId==1
select new {x.Id, x.ItemId, x.GlobalId, x.LanguageId, x.Title, x.Preview, x.PreviewImage_alt,
Link=(from y in Xparo_LikeCounter
where y.GlobalId==x.GlobalId
select y.Link).FirstOrDefault()
}).Distinct())
group y by y.GlobalId into grp
select new {Data=grp, Count= grp.Count()}).OrderByDescending (x => x.Count)
I think the below should work, but I can't really test it. I have no idea how many queries it would generate either.
from subq in (from x in LikeCounter
where x.PortalId==1 && x.LanguageId==1
select new {x.Id, x.ItemId, x.GlobalId, x.LanguageId, x.Title, x.Preview, x.PreviewImage_alt,
Morelink=(from y in LikeCounter
where y.GlobalId==x.GlobalId
select y.Morelink).FirstOrDefault()
}).Distinct()
group subq by new { subq.ItemId, subq.GlobalId, subq.Title, subq.Preview, subq.PreviewImage_alt, subq.Morelink } into grouped
orderby grouped.Count() descending
select new { grouped.Key, TheCount = grouped.Count() };
