Paging the result group by query MongoDB strongly typed C# - c#

I am wondering if anyone can help me. I have created a query which returns duplicates grouped by an identifier and then pages the result set (which works fine).
The advice I am seeking is about the most efficient way to get the total result count for the paging while using the same filter. Can the queries be combined using a count facet and a data facet, as opposed to the way I have done it below?
Working part
var filter = Builders<DuplicateOccurrence>.Filter.Eq(x => x.Id, occurrences.Id);
var data = await _baseRepository.DbCollection().Aggregate()
.Match(filter)
.SortByDescending(x => x.Identifier)
.Group(e => e.Identifier, g => new
{
Identifier= g.Key,
Occurred = g.Select(x => new
{
Id = x.Id
})
}).Skip((occurrences.CurrentPage - 1) * occurrences.PageSize).Limit(occurrences.PageSize)
.ToListAsync(cancellationToken);
Seeking advice on getting the total count
var count = _baseRepository.DbCollection()
.AsQueryable().Where(x=> x.DetectionReportId == occurrences.DetectionReportObjectId)
.GroupBy(s => s.Identifier)
.Count();
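One way to picture what a combined count-and-data query has to produce is the in-memory sketch below. It does not use the MongoDB driver at all: the sample rows, `currentPage`, and `pageSize` are hypothetical, and LINQ to Objects stands in for the aggregation pipeline. A server-side `$facet` stage would compute the same two results in one round trip. Note that the sketch sorts after grouping, because as far as I know `$group` makes no guarantee about the order of its output, so sorting before the group may not give stable pages.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for DuplicateOccurrence documents.
var occurrences = new[]
{
    new { Id = 1, Identifier = "a" },
    new { Id = 2, Identifier = "a" },
    new { Id = 3, Identifier = "b" },
    new { Id = 4, Identifier = "c" },
    new { Id = 5, Identifier = "c" },
};

int currentPage = 2, pageSize = 2; // hypothetical paging parameters

// Group once, sort the groups, and reuse the same grouped result twice.
var groups = occurrences
    .GroupBy(o => o.Identifier)
    .OrderByDescending(g => g.Key)
    .ToList();

// "Count facet": the number of groups, not the number of rows.
int totalCount = groups.Count;

// "Data facet": one page of groups.
var page = groups
    .Skip((currentPage - 1) * pageSize)
    .Take(pageSize)
    .Select(g => new { Identifier = g.Key, Occurred = g.Select(x => x.Id).ToList() })
    .ToList();

Console.WriteLine($"{totalCount} groups, {page.Count} on page {currentPage}");
```

The point of the sketch is only that both numbers come from the same filtered, grouped set, which is what keeps the total consistent with the pages.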

Related

Convert Sql to linq with groupby

I have view on which I use this request
Select Spendband, SUM(SpendCurrencyJob), SUM(SpendDocumentCount)
From analysis.vwJobSupplierMetrics
Where JobId = '500E0DD1-E3D3-4887-95EF-01D3C9EA8FD0'
Group by SpendBand
It runs successfully and returns the data I need.
How do I write it in LINQ to get the same data?
I tried this:
var data = await _dbContext.VwJobSupplierMetrics.Where(x => x.JobId == jobId)
.GroupBy(x => x.SpendBand)
.Select(x => new HumpChartDto() {SpendBand = x.SpendBand}).ToListAsync();
But on new HumpChartDto() {SpendBand = x.SpendBand} I get: Cannot resolve symbol 'SpendBand'.
How can I solve this?
First, after grouping on SpendBand, you need to access it via the Key property. Second, to compute the sums, you can use the Sum method.
var data = await _dbContext.VwJobSupplierMetrics.Where(x => x.JobId == jobId)
.GroupBy(x => x.SpendBand)
.Select(x => new HumpChartDto()
{
SpendBand = x.Key,
SumOfSpendCurrencyJob = x.Sum(s => s.SpendCurrencyJob),
SumOfSpendDocumentCount= x.Sum(s => s.SpendDocumentCount),
})
.ToListAsync();
Note: I don't know the definition of the HumpChartDto class, so change the property names SumOfSpendCurrencyJob and SumOfSpendDocumentCount to match yours.
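The Key and Sum points can be tried without a database. The following is a minimal sketch using LINQ to Objects; the sample rows are made up, and an anonymous type stands in for HumpChartDto, whose definition is not shown in the question.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for the vwJobSupplierMetrics view.
var rows = new[]
{
    new { SpendBand = "A", SpendCurrencyJob = 100m, SpendDocumentCount = 2 },
    new { SpendBand = "A", SpendCurrencyJob = 50m,  SpendDocumentCount = 1 },
    new { SpendBand = "B", SpendCurrencyJob = 70m,  SpendDocumentCount = 4 },
};

// After GroupBy each element is a group: the grouping value is exposed as Key,
// and aggregates such as Sum run over the rows inside that group.
var totals = rows
    .GroupBy(r => r.SpendBand)
    .Select(g => new
    {
        SpendBand = g.Key,
        SumOfSpendCurrencyJob = g.Sum(r => r.SpendCurrencyJob),
        SumOfSpendDocumentCount = g.Sum(r => r.SpendDocumentCount),
    })
    .ToList();

foreach (var t in totals)
    Console.WriteLine($"{t.SpendBand}: {t.SumOfSpendCurrencyJob} / {t.SumOfSpendDocumentCount}");
```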

Convert SQL query with multiple GroupBy columns to LINQ

SELECT
[TimeStampDate]
,[User]
,count(*) as [Usage]
FROM [EFDP_Dev].[Admin].[AuditLog]
WHERE [target] = '995fc819-954a-49af-b056-387e11a8875d'
GROUP BY [Target], [User] ,[TimeStampDate]
ORDER BY [Target]
My database table has the columns User, TimeStampDate, and Target (which is a GUID).
I want to retrieve all items for each date for each user and display count of entries.
The above SQL query works. How can I convert it into LINQ to SQL? I am using EF 6.1 and my entity class in C# has all the above columns.
CreateFilter basically returns an IQueryable of the entire AuditLogSet:
using (var filter = auditLogRepository.CreateFilter())
{
var query = filter.All
.Where(it => it.Target == '995fc819-954a-49af-b056-387e11a8875d')
.GroupBy(i => i.Target, i => i.User, i => i.TimeStamp);
audits = query.ToList();
}
I am not allowed to group by 3 columns in LINQ, and I am also not sure how to select the count as in the SQL query above. I am fairly new to LINQ.
You need to specify the group-by columns in an anonymous type, like this:
var query = filter.All
.Where(it => it.Target == "995fc819-954a-49af-b056-387e11a8875d")
.GroupBy(x => new { x.User, x.TimeStampDate })
.Select(x => new
{
TimeStampDate= x.Key.TimeStampDate,
User = x.Key.User,
Usage = x.Count()
}).ToList();
Many people find query syntax simpler and easier to read (this might not be the case, I don't know). Here's the query syntax version anyway.
var res=(from it in filter.All
where it.Target=="995fc819-954a-49af-b056-387e11a8875d"
group it by new {it.Target, it.User, it.TimeStampDate} into g
orderby g.Key.Target
select new
{
TimeStampDate= g.Key.TimeStampDate,
User=g.Key.User,
Usage=g.Count()
});
EDIT: By the way, you don't need to group by Target or order by it, since the rows are already filtered to a single Target. I'm leaving the exact translation of the query, though.
To use GroupBy you need to create an anonymous object like this:
filter.All
.Where(it => it.Target == "995fc819-954a-49af-b056-387e11a8875d")
.GroupBy(i => new { i.Target, i.User, i.TimeStampDate });
It is unnecessary to group by target in your original SQL.
filter.All.Where( d => d.Target == "995fc819-954a-49af-b056-387e11a8875d")
.GroupBy(d => new {d.User ,d.TimeStampDate} )
.Select(d => new {
User = d.Key.User,
TimeStampDate = d.Key.TimeStampDate,
Usage = d.Count()
} );
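The reason the anonymous type works as a composite key is that anonymous types compare by value. A minimal sketch with made-up in-memory rows (only the User and TimeStampDate columns from the question are modelled, and LINQ to Objects replaces EF):

```csharp
using System;
using System.Linq;

// Hypothetical audit rows, already filtered to a single Target.
var log = new[]
{
    new { User = "ann", TimeStampDate = new DateTime(2015, 1, 1) },
    new { User = "ann", TimeStampDate = new DateTime(2015, 1, 1) },
    new { User = "bob", TimeStampDate = new DateTime(2015, 1, 1) },
    new { User = "ann", TimeStampDate = new DateTime(2015, 1, 2) },
};

// Rows with the same User and the same date produce equal keys,
// so they land in the same group.
var usage = log
    .GroupBy(x => new { x.User, x.TimeStampDate })
    .Select(g => new { g.Key.User, g.Key.TimeStampDate, Usage = g.Count() })
    .ToList();

foreach (var u in usage)
    Console.WriteLine($"{u.TimeStampDate:yyyy-MM-dd} {u.User}: {u.Usage}");
```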

SQL Azure vs. On-Premises Timeout Issue - EF

I'm working on a report right now that runs great with our on-premises DB (just refreshed from PROD). However, when I deploy the site to Azure, I get a SQL Timeout during its execution. If I point my development instance at the SQL Azure instance, I get a timeout as well.
Goal: To output a list of customers that have had an activity created during the search range, and when that customer is found, get some other information about that customer regarding policies, etc. I've removed some of the properties below for brevity (as best I can)...
UPDATE
After lots of trial and error, I can get the entire query to run fairly consistently within 1000 ms so long as this block of code is not executed.
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Status.Name)
.FirstOrDefault(),
With this code in place, things begin to go haywire. I think this Where clause is a big part of it: .Where(b => b.ActivityType.IsReportable). What is the best way to grab the status name?
EXISTING CODE
Any thoughts as to why SQL Azure would time out whereas on-premises would turn this around in less than 100 ms?
return db.Customers
.Where(a => a.Activities.Where(
b => b.CreatedDateTime >= search.BeginDateCreated
&& b.CreatedDateTime <= search.EndDateCreated).Count() > 0)
.Where(a => a.CustomerGroup.Any(d => d.GroupId== search.GroupId))
.Select(a => new CustomCustomerReport
{
CustomerId = a.Id,
Manager = a.Manager.Name,
Customer = a.FirstName + " " + a.LastName,
ContactSource= a.ContactSource!= null ? a.ContactSource.Name : "Unknown",
ContactDate = a.DateCreated,
NewSale = a.Sales
.Where(p => p.Employee.IsActive)
.OrderByDescending(p => p.DateCreated)
.Select(p => new PolicyViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
ExistingSale = a.Sales
.Where(p => p.CancellationDate == null || p.CancellationDate <= myDate)
.Where(p => p.SaleDate < myDate)
.OrderByDescending(p => p.DateCreated)
.Select(p => new SalesViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Disposition.Name)
.FirstOrDefault(),
CustomerGroup = a.CustomerGroup
.Where(cd => cd.GroupId == search.GroupId)
.Select(cd => new GroupViewModel
{
//MISC PROPERTIES
}).FirstOrDefault()
}).ToList();
I cannot give you a definite answer, but I would recommend approaching the problem as follows:
Run SQL profiler locally when this code is executed and see what SQL is generated and run. Look at the query execution plan for each query and look for table scans and other slow operations. Add indexes as needed.
Check your lambdas for things that cannot be easily translated into SQL. You might be pulling the contents of a table into memory and running lambdas on the results, which will be very slow. Change your lambdas or consider writing raw SQL.
Is the Azure database the same as your local database? If not, pull the data locally so your local system is indicative.
Remove sections (i.e. CustomerGroup then CurrentDisposition then ExistingSale then NewSale) and see if there is a significant performance improvement after removing the last section. Focus on the last removed section.
Looking at the line itself:
You use ".Count() > 0" on line 4. Use ".Any()" instead: Count() has to tally every matching row to give you an accurate number, whereas Any() only needs to know whether at least one row satisfies the condition.
Ensure fields referenced in where clauses have indexes, such as IsReportable.
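The difference between the two forms is easiest to see in memory. The sketch below uses a hypothetical instrumented sequence to count how many elements each form actually visits. It only demonstrates LINQ to Objects behaviour; how a database provider translates each form into SQL, and how the optimizer then treats it, can vary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int visited = 0;

// Hypothetical source that records how many elements are enumerated.
IEnumerable<int> Source()
{
    for (int i = 1; i <= 1000; i++)
    {
        visited++;
        yield return i;
    }
}

// Count must walk the whole sequence before the comparison can happen.
bool viaCount = Source().Count(x => x > 0) > 0;
int visitedByCount = visited;

// Any stops as soon as the first matching element is found.
visited = 0;
bool viaAny = Source().Any(x => x > 0);
int visitedByAny = visited;

Console.WriteLine($"Count visited {visitedByCount}, Any visited {visitedByAny}");
```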
Short answer: use memory.
Long answer:
Because of either bad maintenance plans or limited hardware, running this query in one big lump is what's causing it to fail on Azure. Even if that weren't the case, because of all the navigation properties you're using, this query would generate a staggering number of joins. The answer here is to break it down in smaller pieces that Azure can run. I'm going to try to rewrite your query into multiple smaller, easier to digest queries that use the memory of your .NET application. Please bear with me as I make (more or less) educated guesses about your business logic/db schema and rewrite the query accordingly. Sorry for using the query form of LINQ but I find things such as join and group by are more readable in that form.
var activityFilterCustomerIds = db.Activities
.Where(a =>
a.CreatedDateTime >= search.BeginDateCreated &&
a.CreatedDateTime <= search.EndDateCreated)
.Select(a => a.CustomerId)
.Distinct()
.ToList();
var groupFilterCustomerIds = db.CustomerGroup
.Where(g => g.GroupId == search.GroupId)
.Select(g => g.CustomerId)
.Distinct()
.ToList();
var customers = db.Customers
.AsNoTracking()
.Where(c =>
activityFilterCustomerIds.Contains(c.Id) &&
groupFilterCustomerIds.Contains(c.Id))
.ToList();
var customerIds = customers.Select(x => x.Id).ToList();
var newSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& s.Employee.IsActive
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new PolicyViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var existingSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& (s.CancellationDate == null || s.CancellationDate <= myDate)
&& s.SaleDate < myDate
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new SalesViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var currentStatuses =
(from a in db.Activities.AsNoTracking()
where customerIds.Contains(a.CustomerId)
&& a.ActivityType.IsReportable
group a by a.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Status = grouped
.OrderByDescending(x => x.DueDateTime)
.Select(x => x.Disposition.Name)
.FirstOrDefault()
}).ToList();
var customerGroups =
(from cg in db.CustomerGroups
where cg.GroupId == search.GroupId
group cg by cg.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Group = grouped
.Select(x =>
new GroupViewModel
{
// ...
})
.FirstOrDefault()
}).ToList();
return customers
.Select(c =>
new CustomCustomerReport
{
// ... simple props
// ...
// ...
NewSale = newSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
ExistingSale = existingSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
CurrentStatus = currentStatuses
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Status)
.FirstOrDefault(),
CustomerGroup = customerGroups
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Group)
.FirstOrDefault(),
})
.ToList();
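One possible refinement of the final stitching step: each `Where(...).FirstOrDefault()` above rescans a whole list for every customer. Because each intermediate list holds at most one row per CustomerId, it can be turned into a dictionary first. The sketch below shows the idea for one list only, with made-up data and hypothetical names (`statusByCustomer`, `report`).

```csharp
using System;
using System.Linq;

// Hypothetical materialized results of the smaller queries.
var customers = new[]
{
    new { Id = 1, Name = "Ann" },
    new { Id = 2, Name = "Bob" },
};
var currentStatuses = new[]
{
    new { CustomerId = 1, Status = "Open" },
};

// One row per customer, so CustomerId is a safe dictionary key.
var statusByCustomer = currentStatuses.ToDictionary(s => s.CustomerId, s => s.Status);

// A dictionary lookup replaces the per-customer scan of the list.
var report = customers
    .Select(c => new
    {
        c.Id,
        c.Name,
        CurrentStatus = statusByCustomer.TryGetValue(c.Id, out var status) ? status : null,
    })
    .ToList();

foreach (var r in report)
    Console.WriteLine($"{r.Name}: {r.CurrentStatus ?? "(none)"}");
```

Customers without a matching row get null, which matches what FirstOrDefault returns in the version above.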
It is hard to suggest anything without seeing the actual table definitions, especially the indexes and foreign keys on the Activities entity.
As far as I understand, Activity is (CustomerId, ActivityTypeId, DueDateTime, DispositionId). If this is a standard warehousing table (DateTime, ClientId, Activity), I'd suggest the following:
If the number of activities is reasonably small, force the use of Contains by materializing them first:
var activities = db.Activities.Where( x => x.IsReportable ).ToList();
...
.Where( b => activities.Contains(b.Activity) )
You can even help the optimiser by specifying that you want ActivityId.
Indexes on the Activity entity should be up to date. For this particular query I suggest (CustomerId, ActivityId, DueDateTime DESC).
Precache the Disposition table; my crystal ball tells me that it's a dictionary table.
For a similar task, to avoid constantly hitting the Activity table, I made another small table (CustomerId, LastActivity, LastValue) and updated it as the status changed.

Linq: Best way to select parent rows order by count of children

I have got a table for Comments like following:
Comment{
ID,
Text,
ParentID
}
I am using following query to select the popular comments with paging based on number of replies.
var comments = db.Comments
.OrderByDescending(c => db.Comments.Count(r => r.ParentID == c.ID)).Skip(skip).Take(recordsPerPage).ToList();
Please let me know the best way of handling this situation when we have thousands of comments?
I would consider adding an extra column to Comment that stores the reply count. Then, instead of running a nested query, you can simply order your comments by that count.
var comments = db.Comments
.OrderByDescending(c => c.ReplyCount)
.Skip(skip).Take(recordsPerPage)
.ToList();
Unless you are prepared to pre-calculate this in the database, you either need nested queries or a single full fetch followed by doing everything in memory. The latter is my choice until it is proven to be too slow.
Here's how I'd initially do it.
First, pre-fetch:
var allComments = Comments.ToArray();
Then create a function that will quickly return the count of comments:
var childrenLookup = allComments.ToLookup(x => x.ParentID);
// Replies to comment n are the comments whose ParentID is n.
// A lookup returns an empty sequence for a missing key, so no existence check is needed.
Func<int, int> getCommentsCount = n => childrenLookup[n].Count();
Now it is almost trivial to return the results:
var comments = allComments
.OrderByDescending(c => getCommentsCount(c.ID))
.Skip(skip)
.Take(recordsPerPage)
.ToList();
(And, yes, for paging the ordering must be applied before Skip and Take.)
If you can't do this in memory then go with the pre-calculate approach.
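The in-memory approach can be sketched end to end as follows. The sample comments, `skip`, and `recordsPerPage` values are made up, and ParentID is assumed to be a nullable int that is null for top-level comments.

```csharp
using System;
using System.Linq;

// Hypothetical comments: ParentID is null for top-level comments.
var allComments = new[]
{
    new { ID = 1, ParentID = (int?)null },
    new { ID = 2, ParentID = (int?)null },
    new { ID = 3, ParentID = (int?)2 },
    new { ID = 4, ParentID = (int?)2 },
    new { ID = 5, ParentID = (int?)1 },
    new { ID = 6, ParentID = (int?)null },
};

// One pass builds the lookup; indexing a missing key yields an empty sequence.
var repliesByParent = allComments.ToLookup(c => c.ParentID);

int skip = 0, recordsPerPage = 2; // hypothetical paging parameters

// Order first, then page: Skip and Take only make sense on a sorted sequence.
var page = allComments
    .OrderByDescending(c => repliesByParent[c.ID].Count())
    .Skip(skip)
    .Take(recordsPerPage)
    .Select(c => c.ID)
    .ToList();

Console.WriteLine(string.Join(", ", page));
```

Here comment 2 has two replies and comment 1 has one, so they fill the first page in that order.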

Counting grouped data with Linq to Sql

I have a database of documents in an array, each with an owner and a document type, and I'm trying to get a list of the 5 most common document types for a specific user.
var docTypes = _documentRepository.GetAll()
.Where(x => x.Owner.Id == LoggedInUser.Id)
.GroupBy(x => x.DocumentType.Id);
This returns all the documents belonging to a specific owner, grouped as I need them. I now need a way to extract the IDs of the most common document types. I'm not too familiar with LINQ to SQL, so any help would be great.
This orders the groups by count descending and then takes the top 5 of them. You could adapt it to another number or drop the Take() entirely if it's not needed in your case:
var mostCommon = docTypes.OrderByDescending( x => x.Count()).Take(5);
To just select the top document keys:
var mostCommonDocTypes = docTypes.OrderByDescending( x => x.Count())
.Select( x=> x.Key)
.Take(5);
You can also of course combine this with your original query by appending/chaining it, just separated for clarity in this answer.
Using the Select you can get the value from the Key of the Grouping (the Id) and then a count of each item in the grouping.
var docTypes = _documentRepository.GetAll()
.Where(x => x.Owner.Id == LoggedInUser.Id)
.GroupBy(x => x.DocumentType.Id)
.Select(groupingById=>
new
{
Id = groupingById.Key,
Count = groupingById.Count(),
})
.OrderByDescending(x => x.Count);
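Both answers can be checked with a small in-memory sketch. The documents and owner IDs below are made up, flat OwnerId and DocumentTypeId fields stand in for the navigation properties in the question, and Take(2) is used instead of Take(5) to keep the sample short.

```csharp
using System;
using System.Linq;

// Hypothetical documents; owner 1 is the logged-in user.
var documents = new[]
{
    new { OwnerId = 1, DocumentTypeId = 10 },
    new { OwnerId = 1, DocumentTypeId = 20 },
    new { OwnerId = 1, DocumentTypeId = 10 },
    new { OwnerId = 1, DocumentTypeId = 30 },
    new { OwnerId = 1, DocumentTypeId = 30 },
    new { OwnerId = 1, DocumentTypeId = 10 },
    new { OwnerId = 2, DocumentTypeId = 20 },
    new { OwnerId = 2, DocumentTypeId = 20 },
};

// Filter, group by type, rank the groups by size, keep the top keys.
var mostCommonDocTypes = documents
    .Where(d => d.OwnerId == 1)
    .GroupBy(d => d.DocumentTypeId)
    .OrderByDescending(g => g.Count())
    .Select(g => g.Key)
    .Take(2)
    .ToList();

Console.WriteLine(string.Join(", ", mostCommonDocTypes));
```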
