This is mostly a curiosity rather than a real problem, as I've already fixed the bug. I would be glad to have an in-depth answer that explains the LINQ mechanics behind this wizardry. So I have this query:
List<List<IMS_CMM_Measurement>> imsCMMMeasurements =
(from i in context.IMS_CMM_Measurement
where i.Job_FK == jobId
&& currentOperationCharacteristics.Contains(i.Characteristic_FK)
select i)
.GroupBy(elm => elm.Characteristic_FK)
.Select(CharacGroup => CharacGroup.GroupBy(elm => elm.Part_Number))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.OrderByDescending(measure => measure.Timestamp)))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.FirstOrDefault()))
.Select(CharacGroup => CharacGroup.OrderBy(measure => measure.Part_Number))
.Select(CharacGroup => CharacGroup.ToList()).ToList();
Basically, it takes measurements from a database and groups them by characteristic. For each characteristic there are several measures that correspond to different parts, and some parts are measured more than once. In the latter case, we only take the most recent measure (the one with the greatest Timestamp, which is a Date-format entry). To do that, I order the measures for each part by timestamp in decreasing order. Unfortunately, this does the exact opposite: it takes the oldest measure.
I managed to get the correct result by doing this instead (adding .ToList() at the end of the LINQ query expression, right after select i):
List<List<IMS_CMM_Measurement>> imsCMMMeasurements =
(from i in context.IMS_CMM_Measurement
where i.Job_FK == jobId
&& currentOperationCharacteristics.Contains(i.Characteristic_FK)
select i).ToList()
.GroupBy(elm => elm.Characteristic_FK)
.Select(CharacGroup => CharacGroup.GroupBy(elm => elm.Part_Number))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.OrderByDescending(measure => measure.Timestamp)))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.FirstOrDefault()))
.Select(CharacGroup => CharacGroup.OrderBy(measure => measure.Part_Number))
.Select(CharacGroup => CharacGroup.ToList()).ToList();
Why does this work now? ;)
Suppose Myclass has two properties: Date and Symbol.
I frequently need to switch between groupings on those two properties, but I find that, for a List<Myclass> vector, if I use
vector.GroupBy(o => o.Date).Select(o => o)
the result is no longer of type List<IGrouping<string, Myclass>>.
And if I want to convert GroupBy(o => o.Date) to GroupBy(o => o.Symbol), I have to use
vector.GroupBy(o => o.Date).SelectMany(o => o).GroupBy(o => o.Symbol)
I tried to use SortedList<Date, Myclass>, but I am not familiar with SortedList (actually, I don't know what the difference between SortedList and GroupBy is).
Is there an efficient way to achieve this, as I depend heavily on running speed?
int volDay = 100;
DateTime today = new DateTime(2012, 1, 1);
//choose the effective database used today, that is the symbol with data more than volDay
var todayData = dataBase.Where(o => o.Date <= today).OrderByDescending(o => o.Date)
.GroupBy(o => o.Symbol).Select(o => o.Take(volDay))
.Where(o => o.Count() == volDay).SelectMany(o => o);
//Select symbols we want today
var symbolList = todayData
.Where(o => o.Date == today && o.Eqy_Dvd_Yld_12M > 0)
.OrderByDescending(o => o.CUR_MKT_CAP)
.Take((int)(1.5 * volDay)).Where(o => o.Close > o.DMA10)
.OrderBy(o => o.AnnualizedVolatility10)
.Take(volDay).Select(o => o.Symbol).ToList();
//Select the database again only for the symbols in symbolList
var portfolios = todayData.GroupBy(o => o.Symbol)
.Where(o=>symbolList.Contains(o.Key)).ToList();
This is my real code. dataBase is the full data set, and I will run this loop day by day (here only a fixed day is shown). The final list, portfolios, is the result I want to obtain; you can ignore the other properties, which are only used for selection within the collections of Date and Symbol.
It may be faster, or at least easier to read, if you performed a .Distinct().
To get distinct Dates:
var distinctDates = vector.Select(o => o.Date).Distinct()
To get distinct Symbols:
var distinctSymbols = vector.Select(o => o.Symbol).Distinct()
I asked what you were trying to accomplish so that I can provide you with a useful answer. Do you need both values together? E.g., the unique set of symbols and dates? You should only need a single group by statement depending on what you are ultimately trying to achieve.
E.g., the question "Group By Multiple Columns" would be relevant if you want to group by multiple properties and track the two unique pieces of data. A .Distinct() after the grouping should still work.
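As a rough illustration of the two suggestions above (distinct values per property, and a single grouping on both properties), here is a minimal in-memory sketch. The tuple rows are hypothetical stand-ins for Myclass, and the sample values are made up.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows standing in for List<Myclass>.
var vector = new[]
{
    (Date: new DateTime(2012, 1, 1), Symbol: "AAA"),
    (Date: new DateTime(2012, 1, 1), Symbol: "BBB"),
    (Date: new DateTime(2012, 1, 2), Symbol: "AAA"),
    (Date: new DateTime(2012, 1, 1), Symbol: "AAA"),
};

// Distinct values of each property on its own.
var distinctDates = vector.Select(o => o.Date).Distinct().ToList();
var distinctSymbols = vector.Select(o => o.Symbol).Distinct().ToList();

// One grouping on a composite key instead of regrouping back and forth.
var pairs = vector
    .GroupBy(o => new { o.Date, o.Symbol })
    .Select(g => new { g.Key.Date, g.Key.Symbol, Count = g.Count() })
    .ToList();

Console.WriteLine(distinctDates.Count);   // 2
Console.WriteLine(distinctSymbols.Count); // 2
Console.WriteLine(pairs.Count);           // 3
```

Whether this is faster than regrouping depends on the data; it mainly avoids the SelectMany round trip.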
I'm working on a report right now that runs great with our on-premises DB (just refreshed from PROD). However, when I deploy the site to Azure, I get a SQL Timeout during its execution. If I point my development instance at the SQL Azure instance, I get a timeout as well.
Goal: To output a list of customers that have had an activity created during the search range, and when that customer is found, get some other information about that customer regarding policies, etc. I've removed some of the properties below for brevity (as best I can)...
UPDATE
After lots of trial and error, I can get the entire query to run fairly consistently within 1000 ms so long as this block of code is not executed.
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Status.Name)
.FirstOrDefault(),
With this code in place, things begin to go haywire. I think this Where clause is a big part of it: .Where(b => b.ActivityType.IsReportable). What is the best way to grab the status name?
EXISTING CODE
Any thoughts as to why SQL Azure would time out whereas on-premises would turn this around in less than 100 ms?
return db.Customers
.Where(a => a.Activities.Where(
b => b.CreatedDateTime >= search.BeginDateCreated
&& b.CreatedDateTime <= search.EndDateCreated).Count() > 0)
.Where(a => a.CustomerGroup.Any(d => d.GroupId== search.GroupId))
.Select(a => new CustomCustomerReport
{
CustomerId = a.Id,
Manager = a.Manager.Name,
Customer = a.FirstName + " " + a.LastName,
ContactSource= a.ContactSource!= null ? a.ContactSource.Name : "Unknown",
ContactDate = a.DateCreated,
NewSale = a.Sales
.Where(p => p.Employee.IsActive)
.OrderByDescending(p => p.DateCreated)
.Select(p => new PolicyViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
ExistingSale = a.Sales
.Where(p => p.CancellationDate == null || p.CancellationDate <= myDate)
.Where(p => p.SaleDate < myDate)
.OrderByDescending(p => p.DateCreated)
.Select(p => new SalesViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Disposition.Name)
.FirstOrDefault(),
CustomerGroup = a.CustomerGroup
.Where(cd => cd.GroupId == search.GroupId)
.Select(cd => new GroupViewModel
{
//MISC PROPERTIES
}).FirstOrDefault()
}).ToList();
I cannot give you a definite answer but I would recommend approaching the problem by:
Run SQL profiler locally when this code is executed and see what SQL is generated and run. Look at the query execution plan for each query and look for table scans and other slow operations. Add indexes as needed.
Check your lambdas for things that cannot be easily translated into SQL. You might be pulling the contents of a table into memory and running lambdas on the results, which will be very slow. Change your lambdas or consider writing raw SQL.
Is the Azure database the same as your local database? If not, pull the data locally so your local system is indicative.
Remove sections one at a time (i.e. CustomerGroup, then CurrentStatus, then ExistingSale, then NewSale) and see whether performance improves significantly after each removal. Focus on the section whose removal made the difference.
Looking at the code itself:
You use ".Count() > 0" on line 4. Use ".Any()" instead, since the former may count every matching row when you only want to know whether at least one row satisfies the condition.
Ensure fields referenced in where clauses have indexes, such as IsReportable.
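To make the Count() versus Any() point concrete, here is a small in-memory sketch. It only illustrates LINQ to Objects behaviour; what SQL a given provider generates for either form is a separate question and should be checked in the profiler. The Activities() iterator and its dates are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int visited = 0;

// Hypothetical stand-in for a customer's activities; counts how many rows are read.
IEnumerable<DateTime> Activities()
{
    for (int day = 1; day <= 28; day++)
    {
        visited++;
        yield return new DateTime(2020, 1, day);
    }
}

var begin = new DateTime(2020, 1, 3);
var end = new DateTime(2020, 1, 20);

// Count() has to walk the whole sequence before it can be compared with 0.
bool viaCount = Activities().Where(d => d >= begin && d <= end).Count() > 0;
int visitedByCount = visited;

visited = 0;

// Any() stops as soon as the first matching row is found.
bool viaAny = Activities().Any(d => d >= begin && d <= end);
int visitedByAny = visited;

Console.WriteLine($"{viaCount} after {visitedByCount} rows"); // True after 28 rows
Console.WriteLine($"{viaAny} after {visitedByAny} rows");     // True after 3 rows
```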
Short answer: use memory.
Long answer:
Because of either bad maintenance plans or limited hardware, running this query in one big lump is what's causing it to fail on Azure. Even if that weren't the case, because of all the navigation properties you're using, this query would generate a staggering number of joins. The answer here is to break it down in smaller pieces that Azure can run. I'm going to try to rewrite your query into multiple smaller, easier to digest queries that use the memory of your .NET application. Please bear with me as I make (more or less) educated guesses about your business logic/db schema and rewrite the query accordingly. Sorry for using the query form of LINQ but I find things such as join and group by are more readable in that form.
var activityFilterCustomerIds = db.Activities
.Where(a =>
a.CreatedDateTime >= search.BeginDateCreated &&
a.CreatedDateTime <= search.EndDateCreated)
.Select(a => a.CustomerId)
.Distinct()
.ToList();
var groupFilterCustomerIds = db.CustomerGroup
.Where(g => g.GroupId == search.GroupId)
.Select(g => g.CustomerId)
.Distinct()
.ToList();
var customers = db.Customers
.AsNoTracking()
.Where(c =>
activityFilterCustomerIds.Contains(c.Id) &&
groupFilterCustomerIds.Contains(c.Id))
.ToList();
var customerIds = customers.Select(x => x.Id).ToList();
var newSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& s.Employee.IsActive
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new PolicyViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var existingSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& (s.CancellationDate == null || s.CancellationDate <= myDate)
&& s.SaleDate < myDate
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new SalesViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var currentStatuses =
(from a in db.Activities.AsNoTracking()
where customerIds.Contains(a.CustomerId)
&& a.ActivityType.IsReportable
group a by a.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Status = grouped
.OrderByDescending(x => x.DueDateTime)
.Select(x => x.Disposition.Name)
.FirstOrDefault()
}).ToList();
var customerGroups =
(from cg in db.CustomerGroups
where cg.GroupId == search.GroupId
group cg by cg.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Group = grouped
.Select(x =>
new GroupViewModel
{
// ...
})
.FirstOrDefault()
}).ToList();
return customers
.Select(c =>
new CustomCustomerReport
{
// ... simple props
// ...
// ...
NewSale = newSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
ExistingSale = existingSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
CurrentStatus = currentStatuses
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Status)
.FirstOrDefault(),
CustomerGroup = customerGroups
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Group)
.FirstOrDefault(),
})
.ToList();
Hard to suggest anything without seeing the actual table definitions, especially the indexes and foreign keys on the Activities entity.
As far as I understand, Activity is (CustomerId, ActivityTypeId, DueDateTime, DispositionId). If this is a standard warehousing table (DateTime, ClientId, Activity), I'd suggest the following:
If the number of Activities is reasonably small, then force the use of CONTAINS by
var activities = db.Activities.Where( x => x.IsReportable ).ToList();
...
.Where( b => activities.Contains(b.Activity) )
You can even help the optimiser by specifying that you want ActivityId.
Indexes on the Activity entity should be up to date. For this particular query I suggest (CustomerId, ActivityId, DueDateTime DESC).
Precache the Disposition table; my crystal ball tells me that it's a dictionary table.
For a similar task, to avoid constantly hitting the Activity table, I made another small table (CustomerId, LastActivity, LastValue) and updated it as the status changed.
What is the most efficient way to order a LocalDb table in descending order by four columns? I have a table that tracks a file storage hierarchy. Four folders act like an odometer (one digit for each folder). The table reflects this as a "storage item." I need to find the highest number using all four folders.
Here is the code I am currently using. I am worried that it is not efficient or accurate for a LocalDb database...
public StorageItem GetLastItem()
{
var item = _context.StorageItems.AsNoTracking()
.OrderByDescending(x => x.LevelA) // int
.OrderByDescending(x => x.LevelB) // int
.OrderByDescending(x => x.LevelC) // int
.OrderByDescending(x => x.ItemNumber) // int
.Where(x => !x.AuditDateDeleted.HasValue) // DateTime?
.FirstOrDefault();
// Caching logic here
return item;
}
I don't think it'll be inefficient, but chaining a bunch of OrderByDescendings is probably not what you intended to do. Currently, this should generate a SQL ORDER BY clause of ItemNumber DESC, LevelC DESC, LevelB DESC, LevelA DESC. I think you want to use ThenByDescending...
var item = _context.StorageItems.AsNoTracking()
.Where(x => !x.AuditDateDeleted.HasValue)
.OrderByDescending(x => x.LevelA)
.ThenByDescending(x => x.LevelB)
.ThenByDescending(x => x.LevelC)
.ThenByDescending(x => x.ItemNumber)
.FirstOrDefault();
Also moved the where clause higher up, although I think the database should be smart enough to optimize that.
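The difference between chaining OrderByDescending and using ThenByDescending can be seen in a small in-memory sketch. This runs against LINQ to Objects, not LocalDb, so it shows the operator semantics only; the tuple rows are hypothetical stand-ins for StorageItem with fewer columns.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for StorageItems: (LevelA, LevelB, ItemNumber).
var items = new[]
{
    (LevelA: 1, LevelB: 9, ItemNumber: 5),
    (LevelA: 2, LevelB: 0, ItemNumber: 1),
    (LevelA: 2, LevelB: 3, ItemNumber: 0),
};

// Chained OrderByDescending: each call re-sorts the whole sequence,
// so the last key (ItemNumber) ends up as the primary sort key.
var chained = items
    .OrderByDescending(x => x.LevelA)
    .OrderByDescending(x => x.LevelB)
    .OrderByDescending(x => x.ItemNumber)
    .First();

// ThenByDescending: LevelA stays the primary key; later keys only break ties.
var thenBy = items
    .OrderByDescending(x => x.LevelA)
    .ThenByDescending(x => x.LevelB)
    .ThenByDescending(x => x.ItemNumber)
    .First();

Console.WriteLine(chained); // (1, 9, 5)
Console.WriteLine(thenBy);  // (2, 3, 0)
```

Only the ThenByDescending version returns the "highest odometer reading", which is what the question asks for.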
I know that this won't work as written, but I'm struggling to see the right answer, and this non-functional code hopefully illustrates what I'm trying to achieve:
var defaults = _cilQueryContext.DefaultCharges
.Where(dc => dc.ChargingSchedule_RowId == cs.RowId);
List<DevelopmentType> devTypes =
defaults.Select(dc => dc.DevelopmentType)
.Include(d => d.DefaultCharges)
.Include(d => d.OverrideCharges.Where(oc => oc.ChargingSchedule_RowId == cs.RowId))
.Include(d => d.OverrideCharges.Select(o => o.Zone))
.ToList();
Essentially, I had presumed this required a join, but seeing as I'm trying to select a parent object containing two related types of children, I can't see what would go in the join's "select new" clause.
As far as I am aware Include does not support this type of sub-querying. Your best option is to use projection e.g.
List<DevelopmentType> devTypes =
defaults.Include(x => x.DefaultCharges)
.Include(x => x.OverrideCharges)
.Select(x => new {
DevType = x.DevelopmentType,
Zones = x.OverrideCharges.Where(oc => oc.ChargingSchedule_RowId == cs.RowId)
.Select(oc => oc.Zone).ToList()
})
.Select(x => x.DevType)
.ToList();
I am building a search function which needs to return a list ordered by relevance.
IList<ProjectDTO> projects = new List<ProjectDTO>();
projects = GetSomeProjects();
List<ProjectDTO> rawSearchResults = new List<ProjectDTO>();
//<snip> - do the various search functions here and write to the rawSearchResults
//now take the raw list of projects and group them into project number and
//number of search returns.
//we will sort by number of search returns and then last updated date
var orderedProjects = rawSearchResults.GroupBy(x => x.ProjectNbr)
.Select(x => new
{
Count = x.Count(),
ProjectNbr = x.Key,
LastUpdated = x.First().UpdatedDateTime
})
.OrderByDescending(x => x.Count)
.ThenByDescending(x => x.LastUpdated);
So far so good; the "orderedProjects" variable returns my list in the correct order. However, I need the entire object for the next step. When I try to query back to get the original object type, my results lose their order. In retrospect, this makes sense, but I need to find a way around it.
projects = (from p in projects
where orderedProjects.Any(o => o.ProjectNbr == p.ProjectNbr)
select p).ToList();
Is there a LINQ-friendly method for preserving the order in the above projects query?
I can loop through the orderedProject list and get each item, but that's not very efficient. I can also rebuild the entire object in the original orderedProjects query, but I'd like to avoid that if possible.
You need to do it the other way around:
Query orderedProjects and select the corresponding items from projects:
projects =
orderedProjects
.Select(o => projects.SingleOrDefault(p => p.ProjectNbr == o.ProjectNbr))
.Where(x => x != null) // This is only necessary if there can be
// ProjectNbrs in orderedProjects that are not in
// projects
.ToList();
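A possible alternative to the per-item SingleOrDefault lookup is a join, since Enumerable.Join keeps the order of its outer sequence. The sketch below is self-contained and uses hypothetical tuple data in place of ProjectDTO and of the orderedProjects result.

```csharp
using System;
using System.Linq;

// Hypothetical data: (ProjectNbr, Name) stands in for ProjectDTO.
var projects = new[]
{
    (ProjectNbr: 1, Name: "Alpha"),
    (ProjectNbr: 2, Name: "Beta"),
    (ProjectNbr: 3, Name: "Gamma"),
};

// Project numbers already sorted by relevance, as orderedProjects would be.
var orderedNbrs = new[] { 3, 1 };

// Join preserves the order of the outer sequence (orderedNbrs);
// numbers without a matching project simply drop out.
var sorted = orderedNbrs
    .Join(projects, nbr => nbr, p => p.ProjectNbr, (nbr, p) => p)
    .ToList();

Console.WriteLine(string.Join(", ", sorted.Select(p => p.Name))); // Gamma, Alpha
```

Join builds a lookup over the inner sequence once, so it avoids rescanning projects for every ordered entry.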
You shouldn't use "Select" in the middle there as that operator transforms the object into another type and you say that you need the original object.
var orderedProjects = rawSearchResults.GroupBy(x => x.ProjectNbr)
.OrderByDescending(x => x.Count())
.ThenByDescending(x => x.First().UpdatedDateTime);
Do they come in chronological order or something? Otherwise, I'm pretty sure you want the "ThenByDescending" to be performed on the newest or oldest project update like so:
var orderedProjects = rawSearchResults.GroupBy(x => x.ProjectNbr)
.OrderByDescending(x => x.Count())
.ThenByDescending(x => x.Max(p=>p.UpdatedDateTime));