I'm trying to improve the performance of a LINQ query against PostgreSQL. There are two tables (Parcels, ParcelStates) with a 1:n relation. I need to get the last 2 ParcelStates for each Parcel. It looks simple; I have the following code:
IQueryable<Parcel> parcels = _dbContext.Parcels
.OrderByDescending(x => x.Id)
.Take(100);
Then getting states:
var states = await parcels
.GroupJoin(_dbContext.ParcelStates, ps => ps.Id, p => p.ParcelId, (ps, p) => new { ps, p })
.SelectMany(x => x.p.DefaultIfEmpty().OrderByDescending(y => y.Id).Take(2), (x,c) => c)
.ToListAsync();
It returns 180 states, which is correct. But there is a performance issue, because it generates an inefficient SQL query:
SELECT *
FROM (
SELECT *
FROM parcels AS x
WHERE x.isdeleted = FALSE
ORDER BY c DESC, c0 DESC
LIMIT @__p_1 OFFSET @__p_0
) AS t
LEFT JOIN parcelstates AS p ON t.id = p.parcelid
ORDER BY t.c DESC, t.c0 DESC, t.id
It fetches every state from the database, when I need only 2 per parcel.
How can I change the LINQ query so that the result is filtered on the database side?
In logs I found:
The LINQ expression 'Take(2)' could not be translated and will be evaluated
If you insert the SelectMany expression into the GroupJoin, will it convert to SQL?
var states = await parcels
.GroupJoin(_dbContext.ParcelStates, ps => ps.Id, p => p.ParcelId,
(ps, p) => p.DefaultIfEmpty().OrderByDescending(y => y.Id).Take(2))
.ToListAsync();
We can use a foreach loop, which translates into several very fast SQL lookups (it should execute in under 1 second). It's not ideal, and I would still recommend writing a stored procedure to get this data instead of relying on LINQ-generated SQL, which isn't always the most efficient query:
// Store a list of parcel states
var parcelStates = new List<ParcelState>();
// Read the 100 most recent parcels and materialize them, so that no reader
// is still open when the per-parcel queries run on the same context
var parcels = dbContext.Parcels
.OrderByDescending(p => p.Id)
.Take(100)
.ToList();
// For each parcel, use SQL to look up the 2 most recent parcel states
foreach (var parcel in parcels)
{
// Distinct names avoid a clash between the local and the lambda parameter
var latestStates = dbContext.ParcelStates
.Where(ps => ps.ParcelId == parcel.Id)
.OrderByDescending(ps => ps.Id)
.Take(2);
parcelStates.AddRange(latestStates);
}
// Now we have the latest parcel states for those parcels
Console.WriteLine($"Found {parcelStates.Count} parcel states for {parcels.Count} parcels");
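As an alternative to one query per parcel, the "top 2 per parent" requirement can also be written as a single SelectMany whose collection selector is a correlated, ordered, limited subquery. The sketch below only demonstrates the shape and its result on in-memory data; the record types and sample values are hypothetical, and it assumes C# 9 or later. Whether a given EF Core and Npgsql version translates this shape into a lateral join instead of evaluating it on the client is an assumption to verify against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entity classes in the question.
public record Parcel(int Id);
public record ParcelState(int Id, int ParcelId);

public static class TopStatesDemo
{
    // For each parcel, pick its two most recent states (highest Id first).
    public static List<ParcelState> LastTwoStates(
        IEnumerable<Parcel> parcels, IEnumerable<ParcelState> states) =>
        parcels
            .SelectMany(p => states
                .Where(s => s.ParcelId == p.Id)
                .OrderByDescending(s => s.Id)
                .Take(2))
            .ToList();

    public static void Main()
    {
        var parcels = new[] { new Parcel(1), new Parcel(2), new Parcel(3) };
        var states = new[]
        {
            new ParcelState(10, 1), new ParcelState(11, 1), new ParcelState(12, 1),
            new ParcelState(20, 2),
        };

        // Parcel 1 contributes states 12 and 11, parcel 2 contributes state 20,
        // and parcel 3 has no states, so it contributes nothing.
        foreach (var s in LastTwoStates(parcels, states))
            Console.WriteLine($"parcel {s.ParcelId}: state {s.Id}");
    }
}
```

Unlike the DefaultIfEmpty() version in the question, this shape drops parcels that have no states at all.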
How would you write a LINQ query for the following SQL statement? I've tried several methods referenced on Stack Overflow, but they either don't work with the EF version I'm using (EF core 3.5.1) or the DBMS (SQL Server).
select a.ProductID, a.DateTimeStamp, a.LastPrice
from Products a
where a.DateTimeStamp = (select max(DateTimeStamp) from Products where a.ProductID = ProductID)
For reference, a couple that I've tried (both get run-time errors).
var results = _context.Products
.GroupBy(s => s.ProductID)
.Select(s => s.OrderByDescending(x => x.DateTimeStamp).FirstOrDefault());
var results = _context.Products
.GroupBy(x => new { x.ProductID, x.DateTimeStamp })
.SelectMany(y => y.OrderByDescending(z => z.DateTimeStamp).Take(1))
Thanks!
I understand you would like a list of the latest price of each product?
First of all, I would prefer the GROUP BY option over your first query:
select a.ProductID, a.DateTimeStamp, a.LastPrice
from Products a
where a.DateTimeStamp IN (select max(DateTimeStamp) from Products group by ProductID)
Then in LINQ:
var maxDateTimeStamps = _context.Products
.GroupBy(s => s.ProductID)
.Select(s => s.Max(x => x.DateTimeStamp)).ToArray();
var results = _context.Products.Where(s=>maxDateTimeStamps.Contains(s.DateTimeStamp));
This all assumes that the maximum DateTimeStamp values are unique across products.
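To see why the uniqueness assumption matters, here is a small in-memory sketch with hypothetical sample rows in which an older row of one product happens to share its timestamp with the latest row of another product. It compares the IN-style filter with a filter that is correlated on ProductID. The record type and method names are made up for illustration, and it assumes C# 9 or later.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Products entity.
public record Product(int ProductID, DateTime DateTimeStamp, decimal LastPrice);

public static class LatestPriceDemo
{
    public static Product[] Sample() => new[]
    {
        new Product(1, new DateTime(2020, 1, 1), 10m),
        new Product(1, new DateTime(2020, 1, 2), 11m), // latest for product 1
        new Product(2, new DateTime(2020, 1, 2), 20m), // older row, same stamp as above
        new Product(2, new DateTime(2020, 1, 3), 21m), // latest for product 2
    };

    // IN-style filter: keeps any row whose stamp is the maximum of *some* product.
    public static Product[] ByContains(Product[] rows)
    {
        var maxStamps = rows.GroupBy(r => r.ProductID)
                            .Select(g => g.Max(r => r.DateTimeStamp))
                            .ToArray();
        return rows.Where(r => maxStamps.Contains(r.DateTimeStamp)).ToArray();
    }

    // Correlated filter: compares each row only with the maximum of its own product.
    public static Product[] ByCorrelatedMax(Product[] rows) =>
        rows.Where(r => r.DateTimeStamp ==
                        rows.Where(o => o.ProductID == r.ProductID)
                            .Max(o => o.DateTimeStamp))
            .ToArray();

    public static void Main()
    {
        var rows = Sample();
        Console.WriteLine(ByContains(rows).Length);      // 3: the older product 2 row leaks in
        Console.WriteLine(ByCorrelatedMax(rows).Length); // 2: one row per product
    }
}
```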
I've managed to do it with the following which replicates the correlated sub query in the original post (other than using TOP and order by instead of the Max aggregate), though I feel like there must be a more elegant way to do this.
var results = from x
in _context.Products
where x.DateTimeStamp == (from y
in _context.Products
where y.ProductID == x.ProductID
orderby y.DateTimeStamp descending
select y.DateTimeStamp
).FirstOrDefault()
select x;
I prefer to break up these queries into IQueryable parts, so you can debug each "step".
Something like this:
IQueryable<ProductOrmEntity> pocoPerParentMaxUpdateDates =
entityDbContext.Products
//.Where(itm => itm.x == 1)/*if you need where */
.GroupBy(i => i.ProductID)
.Select(g => new ProductOrmEntity
{
ProductID = g.Key,
DateTimeStamp = g.Max(row => row.DateTimeStamp)
});
//// next line is for debugging only; do not leave it in production code
var temppocoPerParentMaxUpdateDates = await pocoPerParentMaxUpdateDates.ToListAsync(CancellationToken.None);
IQueryable<ProductOrmEntity> filteredChildren =
from itm
in entityDbContext.Products
join pocoMaxUpdateDatePerParent in pocoPerParentMaxUpdateDates
on new { a = itm.DateTimeStamp, b = itm.ProductID }
equals
new { a = pocoMaxUpdateDatePerParent.DateTimeStamp, b = pocoMaxUpdateDatePerParent.ProductID }
// where
// a query expression must end with a select (or group) clause
select itm;
// ToListAsync returns a Task, so it has to be awaited
IEnumerable<ProductOrmEntity> hereIsWhatIWantItems = await filteredChildren.ToListAsync(CancellationToken.None);
For that last step, you can put the data in an anonymous object or in a "new ProductOrmEntity() { ProductID = pocoMaxUpdateDatePerParent.ProductID }..." projection, or you can get the FULL ProductOrmEntity object. From your original code, I can't tell whether you want all columns of the Product object or only some of them.
I'm building a marketplace web app, something like eBay. A typical scenario is:
A user makes an offer which consists of one or more items, and those items are of a certain type. After that, other users bid on that offer.
Here is simplified diagram.
On SQL Fiddle (here) you can see both CREATE TABLE and INSERT INTO statements
Sample data:
There are two offers. The first offer (Id 1) consists of one item of type "watch". The other offer (Id 2) has one item of type "headphone".
Both offers have bids. On the watch there are two bids, one of 100 dollars and another of 120. On the headphones there are bids of 50 and 80 dollars.
What I want is the average bid per type. In this sample, that means I want to get 110 as the average bid for watch and 65 as the average bid for headphone. To achieve that using T-SQL, I would write a query like this:
SELECT t.name,
avg(amount)
FROM bid b
LEFT JOIN offer o ON b.OfferId = o.id
LEFT JOIN offeritem oi ON o.id = oi.OfferId
LEFT JOIN itemType t ON oi.itemtypeid = t.Id
GROUP BY t.name
So, my question is: how can I achieve that in dotnet core 3.0 EntityFramework?
Using GroupBy, like this:
_context.Bids
.Include(b => b.Offer)
.ThenInclude(o => o.OfferItems)
.ThenInclude(os => os.ItemType)
.GroupBy(b => b.Offer.OfferItems.First().ItemType.Name);
throws an exception:
Client side GroupBy is not supported.
When I try with a projection, like this:
_context.Bids
.Include(b => b.Offer)
.ThenInclude(o => o.OfferItems)
.ThenInclude(os => os.ItemType)
.GroupBy(b => b.Offer.OfferItems.First().ItemType.Name)
.Select(g => new
{
Key = g,
Value = g.Average(b => b.Amount)
});
I get an exception again:
Processing of the LINQ .... failed. This may indicate either a bug or
a limitation in EF Core.
EDIT:
This approach
_context.Bids
.Include(b => b.Offer)
.ThenInclude(o => o.OfferItems)
.ThenInclude(os => os.ItemType)
.GroupBy(b => new { b.Offer.OfferItems.First().ItemType.Name}, b => b.Amount)
.Select(g => new
{
Key = g.Key.Code,
Value = g.Average()
});
also threw an exception, but this time:
Cannot use an aggregate or a subquery in an expression used for the
group by list of a GROUP BY clause.
...
So, is there a way to group that data (to get a simple average), or should I run another query, iterate through the collection, and do the calculation myself? That would surely hurt performance (I was hoping to do the grouping on the server, but as you can see, I ran into the issues mentioned above). Any ideas? Thanks in advance.
In your case it is hard to keep the subquery out of the grouping key.
You can try it this way:
var joined =
context.Bid
.SelectMany(x =>
x.Offer.OfferItem
.Select(y => new
{
Amount = x.Amount,
Name = y.ItemType.Name
})
.Take(1));
var grouped = from i in joined
group i by i.Name into groups
select new
{
Key = groups.Key,
Amount = groups.Average(x => x.Amount)
};
It gives me this query:
SELECT [t].[Name] AS [Key], AVG([t].[Amount]) AS [Amount]
FROM [Bid] AS [b]
INNER JOIN [Offer] AS [o] ON [b].[OfferId] = [o].[Id]
CROSS APPLY (
SELECT TOP(1) [b].[Amount], [i].[Name], [o0].[Id], [i].[Id] AS [Id0], [o0].[OfferId]
FROM [OfferItem] AS [o0]
INNER JOIN [ItemType] AS [i] ON [o0].[ItemTypeId] = [i].[Id]
WHERE [o].[Id] = [o0].[OfferId]
) AS [t]
GROUP BY [t].[Name]
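The same two-step shape (flatten each bid to an amount and a type name first, then group by the plain name) can be checked against the sample data from the question with LINQ to Objects. This is only a sketch of the logic: the record types below are simplified, hypothetical stand-ins for the entities, with the item type name stored directly on the offer item, and it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, flattened stand-ins for the Bid and OfferItem entities.
public record Bid(int OfferId, decimal Amount);
public record OfferItem(int OfferId, string ItemTypeName);

public static class AverageBidDemo
{
    public static Dictionary<string, decimal> AveragePerType(
        IEnumerable<Bid> bids, IEnumerable<OfferItem> items) =>
        bids
            // Step 1: pair each bid with the type name of the offer's first item.
            .SelectMany(b => items
                .Where(i => i.OfferId == b.OfferId)
                .Select(i => new { b.Amount, Name = i.ItemTypeName })
                .Take(1))
            // Step 2: group by the plain name, so the key contains no subquery.
            .GroupBy(x => x.Name)
            .ToDictionary(g => g.Key, g => g.Average(x => x.Amount));

    public static void Main()
    {
        var items = new[] { new OfferItem(1, "watch"), new OfferItem(2, "headphone") };
        var bids = new[]
        {
            new Bid(1, 100m), new Bid(1, 120m),
            new Bid(2, 50m), new Bid(2, 80m),
        };

        var averages = AveragePerType(bids, items);
        Console.WriteLine($"watch: {averages["watch"]}");
        Console.WriteLine($"headphone: {averages["headphone"]}");
    }
}
```

As in the answer's query, Take(1) means that only the first item of each offer decides the type, so an offer with items of several types is counted under one of them.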
I have 2 tables:
USERS
UserId
Name
Scores (collection of table Scores)
SCORES
UserId
CategoryId
Points
I need to show all the users and a SUM of their points, but also I need to show the name of the user. It can be filtered by CategoryId or not.
Context.Scores
.Where(p => p.CategoryId == categoryId) // optional
.GroupBy(p => p.UserId)
.Select(p => new
{
UserId = p.Key,
Points = p.Sum(s => s.Points),
Name = p.Select(s => s.User.Name).FirstOrDefault()
}).OrderBy(p => p.Points).ToList();
The problem is that when I add the
Name = p.Select(s => s.User.Name).FirstOrDefault()
the query takes very long. I don't know how to access properties that are neither part of the GroupBy key nor an aggregate such as SUM. This example is simplified, because I need not only the Name but also other properties from the User table.
How can I solve this?
It takes so long because the query is causing client evaluation. See Client evaluation performance issues and how to use Client evaluation logging to identify related issues.
If you are really on EF Core 2.0, there is nothing you can do other than upgrade to v2.1, which contains improved LINQ GroupBy translation. Even then the solution is not straightforward: the query still uses client evaluation. But it can be rewritten by separating the GroupBy part into a subquery and joining it to the Users table to get the additional information needed.
Something like this:
var scores = db.Scores.AsQueryable();
// Optional
// scores = scores.Where(p => p.CategoryId == categoryId);
var points = scores
.GroupBy(s => s.UserId)
.Select(g => new
{
UserId = g.Key,
Points = g.Sum(s => s.Points),
});
var result = db.Users
.Join(points, u => u.UserId, p => p.UserId, (u, p) => new
{
u.UserId,
u.Name,
p.Points
})
.OrderBy(p => p.Points)
.ToList();
This still produces a warning
The LINQ expression 'orderby [p].Points asc' could not be translated and will be evaluated locally.
but at least the query is translated and executes as single SQL:
SELECT [t].[UserId], [t].[Points], [u].[UserId] AS [UserId0], [u].[Name]
FROM [Users] AS [u]
INNER JOIN (
SELECT [s].[UserId], SUM([s].[Points]) AS [Points]
FROM [Scores] AS [s]
GROUP BY [s].[UserId]
) AS [t] ON [u].[UserId] = [t].[UserId]
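The "aggregate first, then join" idea can be tried out on in-memory data. The sketch below mirrors the query above with LINQ to Objects; the record types, the Totals helper, and the sample users are hypothetical, and it assumes C# 9 or later. It says nothing about how a particular EF Core version translates the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the USERS and SCORES tables.
public record User(int UserId, string Name);
public record Score(int UserId, int CategoryId, int Points);

public static class PointsDemo
{
    public static List<(int UserId, string Name, int Points)> Totals(
        IEnumerable<User> users, IEnumerable<Score> scores, int? categoryId = null)
    {
        // Optional category filter, applied before grouping.
        var filtered = categoryId is null
            ? scores
            : scores.Where(s => s.CategoryId == categoryId);

        // Aggregate on the key only, so no non-grouped column is needed here.
        var points = filtered
            .GroupBy(s => s.UserId)
            .Select(g => new { UserId = g.Key, Points = g.Sum(s => s.Points) });

        // Join the aggregate back to the users to pick up Name and other columns.
        return users
            .Join(points, u => u.UserId, p => p.UserId,
                  (u, p) => (u.UserId, u.Name, p.Points))
            .OrderBy(r => r.Points)
            .ToList();
    }

    public static void Main()
    {
        var users = new[] { new User(1, "Ann"), new User(2, "Bob"), new User(3, "Cid") };
        var scores = new[] { new Score(1, 1, 5), new Score(1, 2, 7), new Score(2, 1, 3) };

        foreach (var r in Totals(users, scores))
            Console.WriteLine($"{r.Name}: {r.Points}");
    }
}
```

Because this is an inner join, a user without any scores (Cid in the sample) does not appear in the result. If every user must be listed, a left join with a default of 0 points would be needed instead.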
SELECT
[TimeStampDate]
,[User]
,count(*) as [Usage]
FROM [EFDP_Dev].[Admin].[AuditLog]
WHERE [target] = '995fc819-954a-49af-b056-387e11a8875d'
GROUP BY [Target], [User] ,[TimeStampDate]
ORDER BY [Target]
My database table has the columns User, TimeStampDate, and Target (which is a GUID).
I want to retrieve all items for each date for each user and display count of entries.
The above SQL query works. How can I convert it into LINQ? I am using EF 6.1, and my entity class in C# has all the above columns.
Create Filter basically returns an IQueryable of the entire AuditLogSet :
using (var filter = auditLogRepository.CreateFilter())
{
var query = filter.All
.Where(it => it.Target == '995fc819-954a-49af-b056-387e11a8875d')
.GroupBy(i => i.Target, i => i.User, i => i.TimeStamp);
audits = query.ToList();
}
I am not allowed to group by 3 columns this way in LINQ, and I am also not sure how to select the count like the SQL query above does. I'm fairly new to LINQ.
You need to specify the group-by columns in an anonymous type like this:
var query = filter.All
.Where(it => it.Target == "995fc819-954a-49af-b056-387e11a8875d")
.GroupBy(x => new { x.User, x.TimeStampDate })
.Select(x => new
{
TimeStampDate= x.Key.TimeStampDate,
User = x.Key.User,
Usage = x.Count()
}).ToList();
Many people find query syntax simpler and easier to read (this might not be the case, I don't know), here's the query syntax version anyway.
var res=(from it in filter.All
where it.Target=="995fc819-954a-49af-b056-387e11a8875d"
group it by new {it.Target, it.User, it.TimeStampDate} into g
orderby g.Key.Target
select new
{
TimeStampDate= g.Key.TimeStampDate,
User=g.Key.User,
Usage=g.Count()
});
EDIT: By the way, you don't need to group by Target or order by it, since the rows are already filtered on it. I'm leaving the exact translation of the query, though.
To use GroupBy you need to create an anonymous object like this:
filter.All
.Where(it => it.Target == "995fc819-954a-49af-b056-387e11a8875d")
.GroupBy(i => new { i.Target, i.User, i.TimeStamp });
It is unnecessary to group by target in your original SQL.
filter.All.Where( d => d.Target == "995fc819-954a-49af-b056-387e11a8875d")
.GroupBy(d => new {d.User ,d.TimeStampDate} )
.Select(d => new {
User = d.Key.User,
TimeStampDate = d.Key.TimeStampDate,
Usage = d.Count()
} );
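Grouping by an anonymous type works because anonymous types compare by the values of their properties, so two rows with the same User and TimeStampDate fall into the same group. A minimal in-memory sketch follows; the AuditLog record, the sample rows, and the shortened target id are hypothetical, Target is modelled as a string rather than a GUID, and it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the audit log entity.
public record AuditLog(string Target, string User, DateTime TimeStampDate);

public static class AuditDemo
{
    public static List<(string User, DateTime TimeStampDate, int Usage)> UsagePerUserAndDate(
        IEnumerable<AuditLog> logs, string target) =>
        logs.Where(l => l.Target == target)
            // Composite key: anonymous types use value equality over their properties.
            .GroupBy(l => new { l.User, l.TimeStampDate })
            .Select(g => (g.Key.User, g.Key.TimeStampDate, Usage: g.Count()))
            .ToList();

    public static void Main()
    {
        var day1 = new DateTime(2020, 1, 1);
        var day2 = new DateTime(2020, 1, 2);
        var logs = new[]
        {
            new AuditLog("t1", "alice", day1),
            new AuditLog("t1", "alice", day1),
            new AuditLog("t1", "alice", day2),
            new AuditLog("t1", "bob", day1),
            new AuditLog("t2", "bob", day1), // different target, filtered out
        };

        foreach (var row in UsagePerUserAndDate(logs, "t1"))
            Console.WriteLine($"{row.TimeStampDate:yyyy-MM-dd} {row.User} {row.Usage}");
    }
}
```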
Using either a Join or GroupJoin, is there any way to produce aggregate values for fields in both the parent and child tables? Given an Orders table and an OrderDetails table, using the 2 steps below I can obtain an aggregate (MAX) from the Orders and an aggregate (SUM) from the OrderDetails.
STEP 1:
var query = from o in orders
join d in details on o.OrderId equals d.OrderId
select new
{
order = o.OrderId,
maximum = o.UserId,
quantity = d.Quantity
};
Step 2:
var result = (from q in query
group q by q.order into g
select new
{
OrderId = g.Key,
MaxUnits = g.Max(q => q.maximum),
Available = (g.Max(q => q.maximum) - g.Sum(q => q.quantity))
});
However, when I try to combine these as in:
var finalresult = orders
.GroupJoin( details,
o => o.OrderId,
d => d.OrderDetailId,
(o, grp) => new {
OrderId = o.OrderId,
MaxUnits = grp.Max(o => o.maximum),
Available = (grp.Max(o => o.maximum) - grp.Sum(d => d.Quantity))
});
.. the value 'o' is out of scope inside the grouped set 'grp'. So grp.Max(o => o.maximum) results in an error. It appears that only aggregate values for the child table (OrderDetail) are available.
So does anyone know if it is possible to obtain aggregates from both the Child and Parent tables in a single query?
result is a single query. The beauty of LINQ and deferred execution is that no actual computation has happened in Step 1; only a query has been defined. Step 2 then builds on top of that query to create another single query. When you execute result, that query will be executed as a single block.
I recommend splitting up larger queries into smaller easier to understand pieces like in the first two examples. Using good names for the queries can make them much easier to read. For example, I might name query orderQuantities. from q in query does not convey much meaning, but from oq in orderQuantities lets me know what kind of data the query is over.
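Deferred execution is easy to observe with LINQ to Objects by counting how often a projection actually runs. The sketch below is a hypothetical illustration with made-up numbers, not the asker's data; with an IQueryable provider the composed query would become one expression tree instead, but the principle that defining a query does no work is the same.

```csharp
using System;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many times the projection ran before and after enumeration.
    public static (int Before, int After) Run()
    {
        int evaluations = 0;

        // "Step 1": only defines a query; the lambda has not run yet.
        var orderQuantities = new[] { 1, 2, 3 }
            .Select(q => { evaluations++; return q * 10; });

        // "Step 2": builds on step 1; still nothing has been computed.
        var largeQuantities = orderQuantities.Where(q => q > 10);

        int before = evaluations;

        // Enumerating the final query runs both steps in a single pass.
        var materialized = largeQuantities.ToList();

        return (before, evaluations);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine($"before: {before}, after: {after}"); // before: 0, after: 3
    }
}
```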
If you really think you need them together:
var query = orders.Join(details, o => o.OrderId, d => d.OrderId,
(o, d) => new {
order = o.OrderId,
maximum = o.UserId,
quantity = d.Quantity
}).GroupBy(oq => oq.order)
.Select(g => new {
OrderId = g.Key,
MaxUnits = g.Max(q => q.maximum),
Available = (g.Max(q => q.maximum) - g.Sum(q => q.quantity))
});
Now that is ugly...
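Regarding the scoping error in the GroupJoin attempt: the result selector of GroupJoin receives the parent element itself together with the group of its children, so parent fields can be read directly from the parent, and only the child values need an aggregate. Here is a minimal in-memory sketch; the record types and the MaxUnits column are hypothetical (the question does not show the real schema), and it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Orders and OrderDetails tables.
public record Order(int OrderId, int MaxUnits);
public record OrderDetail(int OrderDetailId, int OrderId, int Quantity);

public static class GroupJoinDemo
{
    public static List<(int OrderId, int MaxUnits, int Available)> Availability(
        IEnumerable<Order> orders, IEnumerable<OrderDetail> details) =>
        orders
            .GroupJoin(details,
                o => o.OrderId,
                d => d.OrderId, // join on the foreign key, not on the detail's own id
                (o, grp) => (
                    o.OrderId,
                    o.MaxUnits, // one parent row per group, so no Max() is needed
                    Available: o.MaxUnits - grp.Sum(d => d.Quantity)))
            .ToList();

    public static void Main()
    {
        var orders = new[] { new Order(1, 10), new Order(2, 5) };
        var details = new[]
        {
            new OrderDetail(100, 1, 3),
            new OrderDetail(101, 1, 4),
        };

        foreach (var row in Availability(orders, details))
            Console.WriteLine($"order {row.OrderId}: max {row.MaxUnits}, available {row.Available}");
    }
}
```

In LINQ to Objects, Sum over an empty group returns 0, so an order without details keeps its full MaxUnits as Available.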