Suppose Myclass has two properties, Date and Symbol.
I frequently need to switch between grouping by one property and grouping by the other. Given
List<Myclass> vector
if I use
vector.GroupBy(o => o.Date).Select(o => o)
the result is no longer a List; it is an IEnumerable<IGrouping<string, Myclass>>.
And if I want to convert GroupBy(o => o.Date) to GroupBy(o => o.Symbol),
I have to use
vector.GroupBy(o => o.Date).SelectMany(o => o).GroupBy(o => o.Symbol)
I tried SortedList<Date, Myclass>, but I am not familiar with SortedList (actually, I don't know what the difference is between SortedList and GroupBy).
Is there an efficient way to achieve this? I depend heavily on execution speed.
int volDay = 100;
DateTime today = new DateTime(2012, 1, 1);
//choose the effective database used today, that is the symbol with data more than volDay
var todayData = dataBase.Where(o => o.Date <= today).OrderByDescending(o => o.Date)
.GroupBy(o => o.Symbol).Select(o => o.Take(volDay))
.Where(o => o.Count() == volDay).SelectMany(o => o);
//Select symbols we want today
var symbolList = todayData
.Where(o => o.Date == today && o.Eqy_Dvd_Yld_12M > 0)
.OrderByDescending(o => o.CUR_MKT_CAP)
.Take((int)(1.5 * volDay)).Where(o => o.Close > o.DMA10)
.OrderBy(o => o.AnnualizedVolatility10)
.Take(volDay).Select(o => o.Symbol).ToList();
//Select the database again only for the symbols in symbolList
var portfolios = todayData.GroupBy(o => o.Symbol)
.Where(o=>symbolList.Contains(o.Key)).ToList();
This is my real code. dataBase holds all the data, and I will run this loop day by day (a fixed day is shown here). The final list, portfolios, is what I want to obtain. You can ignore the other properties; they are only used for selection within the Date and Symbol collections.
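One way to avoid regrouping repeatedly is to index the same objects twice with ToLookup. The following is only a minimal sketch under assumptions: the Quote class and its sample values are hypothetical stand-ins for Myclass, and whether this is actually faster for the real data would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Myclass; only Date and Symbol matter here.
class Quote
{
    public DateTime Date { get; set; }
    public string Symbol { get; set; }
}

static class Program
{
    static void Main()
    {
        var vector = new List<Quote>
        {
            new Quote { Date = new DateTime(2012, 1, 1), Symbol = "AAA" },
            new Quote { Date = new DateTime(2012, 1, 1), Symbol = "BBB" },
            new Quote { Date = new DateTime(2012, 1, 2), Symbol = "AAA" },
        };

        // ToLookup executes immediately and builds each index once.
        ILookup<DateTime, Quote> byDate = vector.ToLookup(q => q.Date);
        ILookup<string, Quote> bySymbol = vector.ToLookup(q => q.Symbol);

        // Both lookups reference the same Quote objects, so no conversion is needed.
        Console.WriteLine(byDate[new DateTime(2012, 1, 1)].Count()); // 2
        Console.WriteLine(bySymbol["AAA"].Count());                  // 2
        Console.WriteLine(bySymbol["ZZZ"].Count());                  // 0: a missing key yields an empty sequence
    }
}
```

Unlike GroupBy, which is deferred and re-evaluated on each enumeration, a lookup is built once and can then be queried by key as often as needed.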
It may be faster, or at least easier to read, if you used .Distinct().
To get distinct Dates:
var distinctDates = vector.Select(o => o.Date).Distinct();
To get distinct Symbols:
var distinctSymbols = vector.Select(o => o.Symbol).Distinct();
I asked what you were trying to accomplish so that I can provide you with a useful answer. Do you need both values together, e.g., the unique set of symbols and dates? You should only need a single GroupBy statement, depending on what you are ultimately trying to achieve.
E.g., the question "Group By Multiple Columns" would be relevant if you want to group by multiple properties and track the two unique pieces of data. A .Distinct() after the grouping should still work.
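Grouping by both properties at once can be sketched with an anonymous type as the key. This is an illustration only; the Quote class and the sample rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Myclass.
class Quote
{
    public DateTime Date { get; set; }
    public string Symbol { get; set; }
}

static class Program
{
    static void Main()
    {
        var vector = new List<Quote>
        {
            new Quote { Date = new DateTime(2012, 1, 1), Symbol = "AAA" },
            new Quote { Date = new DateTime(2012, 1, 1), Symbol = "BBB" },
            new Quote { Date = new DateTime(2012, 1, 1), Symbol = "AAA" },
            new Quote { Date = new DateTime(2012, 1, 2), Symbol = "AAA" },
        };

        // Anonymous types compare by value, so equal (Date, Symbol) pairs share a group.
        var groups = vector.GroupBy(o => new { o.Date, o.Symbol }).ToList();

        Console.WriteLine(groups.Count); // 3 distinct (Date, Symbol) pairs

        foreach (var g in groups)
        {
            Console.WriteLine($"{g.Key.Symbol}: {g.Count()} row(s)");
        }
    }
}
```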
Hello, this is a LINQ query, but it doesn't sort properly because four different dates are involved.
var EventReportRemarks = (from i in _context.pm_main_repz
.Include(a => a.PM_Evt_Cat)
.Include(b => b.department)
.Include(c => c.employees)
.Include(d => d.provncs)
where i.department.DepartmentName == "Finance"
orderby i.English_seen_by_executive_on descending
orderby i.Brief_seen_by_executive_on descending
orderby i.French_seen_by_executive_on descending
orderby i.Russian_seen_by_executive_on descending
select i).ToList();
All I want is for it to somehow combine the four dates and sort on them as a group, not one by one.
For example, at the moment it sorts all English reports by the date the executive saw them, then the Brief reports, and so on.
But I want it to check which report was seen first, and so on. For example, if the first report seen is French, then Brief, then English, then Russian, it should sort them in that order.
Is that possible?
You need to have them all in one column. Here is the approach I would take, assuming that a cell is null whenever it should not take part in the ordering:
var EventReportRemarks = (from i in _context.pm_main_repz
.Include(a => a.PM_Evt_Cat)
.Include(b => b.department)
.Include(c => c.employees)
.Include(d => d.provncs)
where i.department.DepartmentName == "Finance"
select new
{
Date =
(
i.English_seen_by_executive_on != null ? i.English_seen_by_executive_on :
i.Brief_seen_by_executive_on != null ? i.Brief_seen_by_executive_on :
i.French_seen_by_executive_on != null ? i.French_seen_by_executive_on :
i.Russian_seen_by_executive_on
)
}).ToList().OrderBy(a => a.Date);
In the select clause you could add more columns if you wish.
Reference taken from here.
Why not just use .Min() or .Max() on the dates and then .OrderBy() or .OrderByDescending() based on that?
The idea is to create a new enumerable (here, an array) holding the 4 dates of the current row and take its Max or Min, which gives the latest or earliest of the 4. Then order the records by that value.
var EventReportRemarks = (from i in _context.pm_main_repz
.Include(a => a.PM_Evt_Cat)
.Include(b => b.department)
.Include(c => c.employees)
.Include(d => d.provncs)
where i.department.DepartmentName == "Finance"
select i)
.OrderBy(i => new[]{
i.English_seen_by_executive_on,
i.Brief_seen_by_executive_on,
i.French_seen_by_executive_on,
i.Russian_seen_by_executive_on
}.Max())
.ToList();
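The Min/Max idea can be tried out in memory with nullable dates. The sketch below is LINQ to Objects only; the Report class, its property names, and the sample dates are hypothetical, and it uses Min because the question asks which report was seen first (swap in Max for the latest). Whether a database provider can translate the array expression is a separate matter that this sketch does not cover.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for a pm_main_repz row.
class Report
{
    public string Name { get; set; }
    public DateTime? EnglishSeen { get; set; }
    public DateTime? BriefSeen { get; set; }
    public DateTime? FrenchSeen { get; set; }
    public DateTime? RussianSeen { get; set; }
}

static class Program
{
    static void Main()
    {
        var reports = new List<Report>
        {
            new Report { Name = "A", EnglishSeen = new DateTime(2020, 1, 5), FrenchSeen = new DateTime(2020, 1, 2) },
            new Report { Name = "B", BriefSeen = new DateTime(2020, 1, 1) },
            new Report { Name = "C", RussianSeen = new DateTime(2020, 1, 3), EnglishSeen = new DateTime(2020, 1, 10) },
        };

        // Min over DateTime? skips nulls; it returns null only if all four are null.
        var ordered = reports.OrderBy(r =>
            new[] { r.EnglishSeen, r.BriefSeen, r.FrenchSeen, r.RussianSeen }.Min());

        Console.WriteLine(string.Join(",", ordered.Select(r => r.Name))); // B,A,C
    }
}
```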
Your problem is not a problem if you use method syntax for your LINQ query instead of query syntax.
var EventReportRemarks = _context.pm_main_repz
.Where(rep => rep.Department.DepartmentName == "Finance")
.OrderByDescending(rep => rep.English_seen_by_executive_on)
.ThenByDescending(rep => rep.Brief_seen_by_executive_on)
.ThenByDescending(rep => rep.French_seen_by_executive_on)
.ThenByDescending(rep => rep.Russian_seen_by_executive_on)
.Select(rep => ...);
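The difference between repeating OrderBy and chaining ThenBy can be seen on a small in-memory sample. This is a sketch with hypothetical data. Note that query syntax can express the same thing: several orderby clauses compile to several OrderBy calls, whereas one orderby clause with comma-separated keys compiles to OrderBy followed by ThenBy.

```csharp
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        // Hypothetical sample rows: A is the first sort key, B the second.
        var items = new[] { (A: 1, B: "b"), (A: 2, B: "a"), (A: 1, B: "a") };

        // Two OrderBy calls: the second one re-sorts the whole sequence by B.
        var twoOrderBys = items.OrderBy(x => x.A).OrderBy(x => x.B);

        // OrderBy + ThenBy: B only breaks ties between equal A values.
        var thenBy = items.OrderBy(x => x.A).ThenBy(x => x.B);

        // Query-syntax equivalent of OrderBy + ThenBy: one orderby, keys separated by commas.
        var query = from x in items orderby x.A, x.B select x;

        Console.WriteLine(string.Join(" ", twoOrderBys.Select(x => $"{x.A}{x.B}"))); // 1a 2a 1b
        Console.WriteLine(string.Join(" ", thenBy.Select(x => $"{x.A}{x.B}")));      // 1a 1b 2a
        Console.WriteLine(string.Join(" ", query.Select(x => $"{x.A}{x.B}")));       // 1a 1b 2a
    }
}
```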
Optimization
One of the slower parts of a database query is the transport of selected data from the DBMS to your local process. Hence it is wise to limit the transported data to values you actually plan to use.
You transport way more data than you need to.
For example, every pm_main_repz (my, you do love to use easy identifiers for your items, don't you?) has zero or more Employees. Every Employee belongs to exactly one pm_main_repz through a foreign key like pm_main_repzId.
If you use Include to transport pm_main_repz 4 with its 1000 Employees, every Employee will have a pm_main_repzId with value 4. You'll transport this value 1001 times, while once would have been enough.
Always use Select to fetch data from the database, and select only the properties you actually plan to use. Only use Include if you plan to update the fetched objects.
Consider using a proper Select where you only select the items that you actually plan to use:
.Select(rep => new
{
// only Select the rep properties you actually plan to use:
Id = rep.Id,
Name = rep.Name,
...
Employees = rep.Employees.Select(employee => new
{
// again: select only the properties you plan to use
Id = employee.Id,
Name = employee.Name,
// not needed: foreign key to pm_main_repz
// pm_main_repzId = employee.pm_main_repzId,
})
.ToList(),
Department = new
{
Id = rep.Department.Id,
...
}
// etc. for PM_Evt_Cat and provncs
});
This is mostly a curiosity rather than a real problem, as I've already fixed the bug. I would be glad to have an in-depth answer that explains the LINQ mechanics behind this wizardry. So I have this query:
List<List<IMS_CMM_Measurement>> imsCMMMeasurements =
(from i in context.IMS_CMM_Measurement
where i.Job_FK == jobId
&& currentOperationCharacteristics.Contains(i.Characteristic_FK)
select i)
.GroupBy(elm => elm.Characteristic_FK)
.Select(CharacGroup => CharacGroup.GroupBy(elm => elm.Part_Number))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.OrderByDescending(measure => measure.Timestamp)))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.FirstOrDefault()))
.Select(CharacGroup => CharacGroup.OrderBy(measure => measure.Part_Number))
.Select(CharacGroup => CharacGroup.ToList()).ToList();
Basically, it takes measurements from a database and groups them by characteristic. For each characteristic there are several measurements that correspond to different parts, and some parts are measured more than once. In the latter case, we only take the most recent measurement (the one with the greatest Timestamp, which is a date-format entry). To do that, I order the measurements for each part by timestamp in decreasing order. Unfortunately, this does the exact opposite: it takes the oldest measurement.
I managed to get the correct result by doing this instead (adding .ToList() at the end of the LINQ query, on the 4th line):
List<List<IMS_CMM_Measurement>> imsCMMMeasurements =
(from i in context.IMS_CMM_Measurement
where i.Job_FK == jobId
&& currentOperationCharacteristics.Contains(i.Characteristic_FK)
select i).ToList()
.GroupBy(elm => elm.Characteristic_FK)
.Select(CharacGroup => CharacGroup.GroupBy(elm => elm.Part_Number))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.OrderByDescending(measure => measure.Timestamp)))
.Select(CharacGroup => CharacGroup.Select(PartGroup => PartGroup.FirstOrDefault()))
.Select(CharacGroup => CharacGroup.OrderBy(measure => measure.Part_Number))
.Select(CharacGroup => CharacGroup.ToList()).ToList();
Why does this work now? ;)
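For reference, the intended selection (latest measurement per part within each characteristic) can be sketched in memory, with the ordering and the First() call applied together to each part group. This is LINQ to Objects only; the Measurement class and the sample data are hypothetical, and the sketch makes no claim about how a database provider translates the original chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IMS_CMM_Measurement.
class Measurement
{
    public int CharacteristicFk { get; set; }
    public string PartNumber { get; set; }
    public DateTime Timestamp { get; set; }
    public int Value { get; set; }
}

static class Program
{
    static void Main()
    {
        var measurements = new List<Measurement>
        {
            new Measurement { CharacteristicFk = 1, PartNumber = "P1", Timestamp = new DateTime(2020, 1, 1), Value = 10 },
            new Measurement { CharacteristicFk = 1, PartNumber = "P1", Timestamp = new DateTime(2020, 1, 2), Value = 20 },
            new Measurement { CharacteristicFk = 1, PartNumber = "P2", Timestamp = new DateTime(2020, 1, 1), Value = 30 },
            new Measurement { CharacteristicFk = 2, PartNumber = "P1", Timestamp = new DateTime(2020, 1, 1), Value = 40 },
        };

        List<List<Measurement>> latest = measurements
            .GroupBy(m => m.CharacteristicFk)
            .Select(characteristic => characteristic
                .GroupBy(m => m.PartNumber)
                // Newest first, then keep only that one measurement per part.
                .Select(part => part.OrderByDescending(m => m.Timestamp).First())
                .OrderBy(m => m.PartNumber)
                .ToList())
            .ToList();

        foreach (var group in latest)
        {
            Console.WriteLine(string.Join(", ", group.Select(m => $"{m.PartNumber}={m.Value}")));
        }
        // P1=20, P2=30
        // P1=40
    }
}
```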
I am building a search function which needs to return a list ordered by relevance.
IList<ProjectDTO> projects = new List<ProjectDTO>();
projects = GetSomeProjects();
List<ProjectDTO> rawSearchResults = new List<ProjectDTO>();
//<snip> - do the various search functions here and write to the rawSearchResults
//now take the raw list of projects and group them into project number and
//number of search returns.
//we will sort by number of search returns and then last updated date
var orderedProjects = rawSearchResults.GroupBy(x => x.ProjectNbr)
.Select(x => new
{
Count = x.Count(),
ProjectNbr = x.Key,
LastUpdated = x.First().UpdatedDateTime
})
.OrderByDescending(x => x.Count)
.ThenByDescending(x => x.LastUpdated);
So far so good; the "orderedProjects" variable returns my list in the correct order. However, I need the entire object for the next step. When I try to query back to get the original object type, my results lose their order. In retrospect, this makes sense, but I need to find a way around it.
projects = (from p in projects
where orderedProjects.Any(o => o.ProjectNbr == p.ProjectNbr)
select p).ToList();
Is there a LINQ-friendly method for preserving the order in the above projects query?
I can loop through the orderedProject list and get each item, but that's not very efficient. I can also rebuild the entire object in the original orderedProjects query, but I'd like to avoid that if possible.
You need to do it the other way around:
Query orderedProjects and select the corresponding items from projects:
projects =
orderedProjects
.Select(o => projects.SingleOrDefault(p => p.ProjectNbr == o.ProjectNbr))
.Where(x => x != null) // This is only necessary if there can be
// ProjectNbrs in orderedProjects that are not in
// projects
.ToList();
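Since the question mentions efficiency: the per-item SingleOrDefault scan can be replaced by a dictionary lookup. The sketch below is an illustration under assumptions: the ProjectDTO shape and sample values are hypothetical, and it assumes ProjectNbr is unique within projects (which SingleOrDefault also requires).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, minimal version of ProjectDTO.
class ProjectDTO
{
    public int ProjectNbr { get; set; }
    public string Name { get; set; }
}

static class Program
{
    static void Main()
    {
        var projects = new List<ProjectDTO>
        {
            new ProjectDTO { ProjectNbr = 1, Name = "A" },
            new ProjectDTO { ProjectNbr = 2, Name = "B" },
            new ProjectDTO { ProjectNbr = 3, Name = "C" },
        };

        // Project numbers already sorted by relevance; 9 has no matching project.
        var orderedNbrs = new[] { 3, 1, 9 };

        // Index the projects once instead of scanning the list for every number.
        Dictionary<int, ProjectDTO> byNbr = projects.ToDictionary(p => p.ProjectNbr);

        var result = new List<ProjectDTO>();
        foreach (var nbr in orderedNbrs)
        {
            // Skip numbers with no matching project; keep the relevance order.
            if (byNbr.TryGetValue(nbr, out var project))
            {
                result.Add(project);
            }
        }

        Console.WriteLine(string.Join(",", result.Select(p => p.Name))); // C,A
    }
}
```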
You shouldn't use "Select" in the middle there as that operator transforms the object into another type and you say that you need the original object.
var orderedProjects = rawSearchResults.GroupBy(x => x.ProjectNbr)
.OrderByDescending(x => x.Count())
.ThenByDescending(x => x.First().UpdatedDateTime);
Do they come in chronological order or something? Otherwise, I'm pretty sure you want the "ThenByDescending" to be performed on the newest or oldest project update like so:
var orderedProjects = rawSearchResults.GroupBy(x => x.ProjectNbr)
.OrderByDescending(x => x.Count())
.ThenByDescending(x => x.Max(p=>p.UpdatedDateTime));
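Ordering the groups themselves keeps the original objects available, so one full object per project can be picked at the end. A minimal sketch with a hypothetical ProjectDTO and made-up search hits:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, minimal version of ProjectDTO.
class ProjectDTO
{
    public int ProjectNbr { get; set; }
    public DateTime UpdatedDateTime { get; set; }
}

static class Program
{
    static void Main()
    {
        // Each entry is one search hit; a project can appear several times.
        var rawSearchResults = new List<ProjectDTO>
        {
            new ProjectDTO { ProjectNbr = 1, UpdatedDateTime = new DateTime(2020, 1, 1) },
            new ProjectDTO { ProjectNbr = 1, UpdatedDateTime = new DateTime(2020, 1, 5) },
            new ProjectDTO { ProjectNbr = 2, UpdatedDateTime = new DateTime(2020, 1, 3) },
            new ProjectDTO { ProjectNbr = 2, UpdatedDateTime = new DateTime(2020, 1, 9) },
            new ProjectDTO { ProjectNbr = 3, UpdatedDateTime = new DateTime(2020, 1, 20) },
        };

        List<ProjectDTO> ordered = rawSearchResults
            .GroupBy(x => x.ProjectNbr)
            .OrderByDescending(g => g.Count())                    // most hits first
            .ThenByDescending(g => g.Max(p => p.UpdatedDateTime)) // ties: newest update first
            .Select(g => g.First())                               // one original object per project
            .ToList();

        Console.WriteLine(string.Join(",", ordered.Select(p => p.ProjectNbr))); // 2,1,3
    }
}
```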
I have a database of documents in an array, each with an owner and a document type, and I'm trying to get a list of the 5 most common document types for a specific user.
var docTypes = _documentRepository.GetAll()
.Where(x => x.Owner.Id == LoggedInUser.Id)
.GroupBy(x => x.DocumentType.Id);
This returns all the documents belonging to a specific owner, grouped as I need them. I now need a way to extract the IDs of the most common document types. I'm not too familiar with LINQ to SQL, so any help would be great.
This orders the groups by count, descending, and then takes the top 5. You can change the number, or drop the Take() entirely if it's not needed in your case:
var mostCommon = docTypes.OrderByDescending( x => x.Count()).Take(5);
To just select the top document keys:
var mostCommonDocTypes = docTypes.OrderByDescending( x => x.Count())
.Select( x=> x.Key)
.Take(5);
You can also of course combine this with your original query by appending/chaining it, just separated for clarity in this answer.
Using the Select you can get the value from the Key of the Grouping (the Id) and then a count of each item in the grouping.
var docTypes = _documentRepository.GetAll()
.Where(x => x.Owner.Id == LoggedInUser.Id)
.GroupBy(x => x.DocumentType.Id)
.Select(groupingById=>
new
{
Id = groupingById.Key,
Count = groupingById.Count(),
})
.OrderByDescending(x => x.Count);
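The whole chain can be tried out in memory. In this sketch the documents are hypothetical (OwnerId, DocumentTypeId) tuples rather than the repository's real entities, and owner 1 stands in for the logged-in user.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
    static void Main()
    {
        // Hypothetical documents: type 1 occurs once, type 2 twice, ... type 6 six times for owner 1.
        var documents = new List<(int OwnerId, int DocumentTypeId)>();
        foreach (var typeId in new[] { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6 })
        {
            documents.Add((1, typeId));
        }
        documents.Add((2, 1)); // another owner's document, filtered out below

        List<int> mostCommonDocTypes = documents
            .Where(d => d.OwnerId == 1)
            .GroupBy(d => d.DocumentTypeId)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .Take(5)
            .ToList();

        Console.WriteLine(string.Join(",", mostCommonDocTypes)); // 6,5,4,3,2
    }
}
```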
Based on my previous question, I'm now trying to get them in the following order using the same approach, OrderByDescending and ThenBy.
Original (can be in any random order):
1:1
0:0
0:1
2:1
1:0
2:0
Output
2:0
1:0
0:0
2:1
1:1
0:1
As you can see, a is descending and b is ascending, but I'm still not getting the right sort. Any ideas why? Thanks
Think about what you would do manually:
First, sort the values by the 2nd part in ascending order.
Then, among values with the same 2nd part, sort by the 1st part in descending order.
Translated into LINQ, it's pretty much the same:
var sorted = arrayList
.Cast<string>()
.Select(x => x.Split(':'))
.OrderBy(x => x[1])
.ThenByDescending(x => x[0])
.Select(x => x[0] + ":" + x[1]);
To clarify a bit more, ThenBy/ThenByDescending methods are used to sort elements that are equal in the previous OrderBy/OrderByDescending, hence the code :)
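Applied to the sample values from the question, the two-step sort can be checked with a small program. This sketch uses a plain string array instead of the original arrayList and parses both parts as integers so that multi-digit values would also sort numerically; both choices are assumptions.

```csharp
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        var input = new[] { "1:1", "0:0", "0:1", "2:1", "1:0", "2:0" };

        var sorted = input
            .Select(s => s.Split(':'))
            .Select(parts => (A: int.Parse(parts[0]), B: int.Parse(parts[1])))
            .OrderBy(x => x.B)           // primary key: 2nd part, ascending
            .ThenByDescending(x => x.A)  // tie-breaker: 1st part, descending
            .Select(x => $"{x.A}:{x.B}");

        Console.WriteLine(string.Join(" ", sorted)); // 2:0 1:0 0:0 2:1 1:1 0:1
    }
}
```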
arrayList.Cast<string>().Select(i => { var split = i.Split(new[] { ':' }, 2);
return new { a = Int32.Parse(split[0]),
b = Int32.Parse(split[1]) };
})
.OrderByDescending(i => i.a)
.ThenBy(i => i.b)
From your question it is not clear whether you want the order-by's reversed (just swap them).
Work from there, perhaps rejoining
.Select(i => String.Format("{0}:{1}", i.a, i.b));
Good luck