Regroup an IGrouping by original group criteria - c#

This sounds stupid (maybe the answer is quite easy), but I just don't see a solution for this issue:
Let's say I have this:
public IGrouping<T, Item> process<T>(IGrouping<T,Item> group)
{
    var toIgnore = group.AsEnumerable().Where(ignoreExpression);
    var toContinue = group.AsEnumerable().Except(toIgnore);
    // how to group the "toContinue" Enumerable by the same criteria as the group ?
    //
    return newGroup;
}
What I want to do is: I get the group, I want to ignore some items in each group (not whole groups). So far so good. Then I want to re-group it, according to the original criteria - and that's where I'm stuck.
How could this be done? I can find the original criteria from the grouping (this is group.Key of course), but how can I re-group the result "toContinue" (or even toContinue.ToList()) by group.Key?
Thanks for any suggestions.
Andreas

The group key is available through the Key property of IGrouping.
Since an IGrouping is just an IEnumerable once you apply Where to it, you lose the grouping key. You have to filter before the GroupBy(), not after.
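A minimal sketch of two ways to do this, not taken from the original answer. RegroupSketch, FilterThenGroup, Process and the ignore predicate are hypothetical names, and the second method is an alternative assumption for the case where only the IGrouping is at hand:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RegroupSketch
{
    // Route suggested above: drop the unwanted items first, then group once.
    public static IEnumerable<IGrouping<TKey, TItem>> FilterThenGroup<TKey, TItem>(
        IEnumerable<TItem> items, Func<TItem, TKey> keySelector, Func<TItem, bool> ignore)
    {
        return items.Where(item => !ignore(item)).GroupBy(keySelector);
    }

    // Alternative: filter an existing group, then re-wrap the survivors under
    // the original key. Returns null if every item was ignored, because
    // GroupBy produces no group at all for an empty sequence.
    public static IGrouping<TKey, TItem> Process<TKey, TItem>(
        IGrouping<TKey, TItem> group, Func<TItem, bool> ignore)
    {
        return group.Where(item => !ignore(item))
                    .GroupBy(_ => group.Key)
                    .SingleOrDefault();
    }
}
```

Callers of Process would need to handle the null result for groups that end up empty.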

Related

Linq Efficiently find if collection contains duplicates using IQueryable.GroupBy

The key point here is that this is about Queryable.GroupBy, not Enumerable.GroupBy.
I use EntityFramework and I want to check that there are no duplicate values. Several answers on StackOverflow, like this one, suggest using GroupBy:
IQueryable<MyType> myItems = ...
IQueryable<IGrouping<string, MyType>> groupsWithSameName = myItems
.GroupBy(myItem => myItem.Name);
// note: IQueryable!
bool containsDuplicates = groupsWithSameName.Any(group => group.Skip(1).Any());
Although this is allowed on IEnumerables, Skip is not supported here on an unordered sequence. The NotSupportedException suggests calling OrderBy before using Skip.
As an alternative I could check if there are groups with more than one element using Count
bool containsDuplicates = groupsWithSameName.Any(group => group.Count() > 1);
Both methods have to scan all elements in the collection, and that is the second scan, because the elements were already scanned once to group them.
Is there a method to check for duplicates on an IQueryable more efficiently?
I don't think that scanning all the elements can be avoided. In any case, finding duplicates with SQL looks like this:
SELECT name, COUNT(*)
FROM MyType
GROUP BY name
HAVING COUNT(*) > 1
It may be worth looking for a solution along those lines; see "Linq with group by having count".
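A LINQ counterpart of that GROUP BY / HAVING query might look like the sketch below. NamedItem and DuplicateCheck are hypothetical names, and whether a given Entity Framework provider translates this into the SQL above is an assumption that should be checked against the generated query; the sample can be exercised with in-memory data through AsQueryable().

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's MyType.
public class NamedItem
{
    public string Name { get; set; }
}

public static class DuplicateCheck
{
    // Groups by name and keeps only the names that occur more than once.
    public static IQueryable<string> DuplicateNames(IQueryable<NamedItem> items)
    {
        return items.GroupBy(item => item.Name)
                    .Where(g => g.Count() > 1)
                    .Select(g => g.Key);
    }

    // True as soon as at least one name occurs more than once.
    public static bool ContainsDuplicates(IQueryable<NamedItem> items)
    {
        return DuplicateNames(items).Any();
    }
}
```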

Selecting items in an ordered list after a certain entry

I have an ordered list of objects. I can easily find an item in the list by using the following code:
purchaseOrders.FirstOrDefault(x => x.OurRef.Equals(lastPurchaseOrder, StringComparison.OrdinalIgnoreCase))
What I want to do is select all the items in the list that appear after this entry. How is this best achieved? Would it be to get the index of this item and select a range?
It sounds like you want SkipWhile:
var orders = purchaseOrders.SkipWhile(x => !x.OurRef.Equals(...));
Once the iterator has stopped skipping, it doesn't evaluate the predicate for later entries.
Note that this code will include the first entry for which the predicate returns false, i.e. the one with the given reference. It basically gives you all entries from that order onwards. You can always use .Skip(1) if you want to leave that entry out:
// Skip the exact match
var orders = purchaseOrders.SkipWhile(x => !x.OurRef.Equals(...)).Skip(1);
This will be linear, mind you... if the list is ordered by x.OurRef you could find the index with a binary search and take the range from there onwards... but I wouldn't do that unless you find that the simpler code causes you problems.
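As a small runnable illustration of the SkipWhile approach, here is a sketch that works on plain reference strings instead of purchase order objects. AfterEntry and RefsAfter are hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AfterEntry
{
    // Returns the references that come after the first case-insensitive match.
    // If nothing matches, SkipWhile consumes the whole sequence and the result is empty.
    public static List<string> RefsAfter(IEnumerable<string> orderedRefs, string lastRef)
    {
        return orderedRefs
            .SkipWhile(r => !r.Equals(lastRef, StringComparison.OrdinalIgnoreCase))
            .Skip(1) // drop the matched entry itself
            .ToList();
    }
}
```

For example, RefsAfter(new[] { "PO1", "PO2", "PO3" }, "po2") yields only "PO3".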
If I understand your question correctly, you could take a look at combining LINQ's Reverse and TakeWhile methods.
It may look like purchaseOrders.Reverse().TakeWhile(x => !x.OurRef.Equals(lastPurchaseOrder, StringComparison.OrdinalIgnoreCase)). Note that the items then come back in reverse order.
Sorry if the code is unformatted, I'm on mobile web right now.
Maybe you want something like this:
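int itemIndex = list.IndexOf(list.FirstOrDefault(x => x.OurRef.Equals(lastPurchaseOrder, StringComparison.OrdinalIgnoreCase)));
// i >= itemIndex keeps the matched entry as well; use i > itemIndex to keep only the items after it
var newList = list.Where((f, i) => i >= itemIndex);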

Best way to get an ordered list of groups by value from an unordered list

I'd like to know if there's a more efficient way to get an ordered list of groups by value from an initially unordered list, than using GroupBy() followed by OrderBy(), like this:
List<int> list = new List<int>();
IEnumerable<IEnumerable<int>> orderedGroups = list.GroupBy(x => x).OrderBy(x => x.Key);
For more detail, I have a large List<T> which I'd like to sort, however there are lots of duplicate values so I want to return the results as IEnumerable<IEnumerable<T>>, much as GroupBy() returns an IEnumerable of groups. If I use OrderBy(), I just get IEnumerable<T>, with no easy way to know whether the value has changed from one item to the next. I could group the list then sort the groups, but the list is large so this ends up being slow. Since OrderBy() returns an OrderedEnumerable which can then be sorted on a secondary field using ThenBy(), it must internally distinguish between adjacent items with the same or different values.
Is there any way I can make use of the fact that OrderedEnumerable<T> must internally group its results by value (in order to facilitate ThenBy()), or otherwise what's the most efficient way to use LINQ to get an ordered list of groups?
You can use ToLookup, which returns an ILookup<TKey, TElement> (which is also an IEnumerable<IGrouping<TKey, TElement>>), and then do an OrderBy on the values of each key on demand. Creating the lookup is O(n), and ordering the elements under one group (the values for a key) is O(h log h), where h is the number of elements in that group.
You can improve the performance to amortized O(n) by using IDictionary<TKey, IOrderedEnumerable<T>>. But if you want to order by multiple properties, it will again be O(h log h) on the group. See this answer for more info on IOrderedEnumerable. You can also use SortedList<TKey, TValue> instead of IOrderedEnumerable.
[Update]:
Here is another answer you can take a look at. But again, it involves doing an OrderBy on top of the result.
Further, you can come up with your own data structure, as I don't see any data structure in the BCL that meets this requirement.
One possible implementation:
You can have a binary search tree, which does search/delete/insert in O(log N) on average. An in-order traversal will then give you the keys in sorted order. Each node of the tree holds an ordered collection for the values.
A node roughly looks like this:
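public class MyNode<TValue>
{
    public string Key { get; set; }
    // Values stored under this key, kept in sorted order by whoever inserts them
    // (stands in for the placeholder "SortedCollection", which is not a BCL type).
    public List<TValue> SortedValues { get; } = new List<TValue>();
}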
You can traverse over the initial collection once and create this special data structure which can be queried to get fast results.
[Update 2]:
If the number of possible keys is below 100k, then I feel that implementing your own data structure is overkill. Generally an OrderBy returns pretty fast and the time taken is tiny. Unless you have large data and you order it multiple times, ToLookup should work fairly well.
Honestly, you're not going to do much better than
items.GroupBy(i => i.KeyProperty).OrderBy(g => g.Key);
GroupBy is an O(n) operation. The OrderBy is then O(k log k) where k is the number of groups.
If you call OrderBy first... well, firstly, your O(n log n) is now in your number of items rather than your number of groups, so it's already slower than the above.
And secondly, an IOrderedEnumerable doesn't have the internal magic you think it does. It isn't an ordered sequence that contains groups of same-ordered items which can then be reordered with ThenBy; it's an unordered sequence with a list of sort keys which ThenBy adds to, which is eventually ordered by each of those keys when you iterate over it.
You may be able to eke out a little more speed by rolling your own "group and sort" loop, maybe manually adding to a SortedDictionary<TKey, IList<TItem>>, but I don't think you're going to get a better big O than what out-of-the-box LINQ gets you.
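A hand-rolled "group and sort" loop along those lines could look like the following sketch. GroupAndSort and Build are hypothetical names; each insertion into the SortedDictionary costs O(log k) for k groups, so the whole loop is O(n log k), which is not a better bound than GroupBy followed by OrderBy, and any speed difference would have to be measured.

```csharp
using System;
using System.Collections.Generic;

public static class GroupAndSort
{
    // One pass over the items; the SortedDictionary keeps its keys in sorted order.
    public static SortedDictionary<TKey, List<TItem>> Build<TKey, TItem>(
        IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        var groups = new SortedDictionary<TKey, List<TItem>>();
        foreach (var item in items)
        {
            var key = keySelector(item);
            if (!groups.TryGetValue(key, out var bucket))
            {
                // First item seen for this key: start a new bucket.
                bucket = new List<TItem>();
                groups.Add(key, bucket);
            }
            bucket.Add(item);
        }
        return groups;
    }
}
```

Enumerating the returned dictionary yields the groups in ascending key order, and each bucket keeps its items in their original relative order.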
I think iterating through the list with for(;;) while populating a Dictionary<T, int>, where the value is the count of repeated elements, will be faster.
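A sketch of that counting idea, under the assumption that equal values are interchangeable, so a count per value is enough and the individual elements need not be kept. CountDuplicates and OrderedCounts are hypothetical names, and the claim that this is faster is not verified here.

```csharp
using System.Collections.Generic;

public static class CountDuplicates
{
    // Counts how often each value occurs, then sorts the distinct values.
    public static List<KeyValuePair<T, int>> OrderedCounts<T>(IReadOnlyList<T> list)
    {
        var counts = new Dictionary<T, int>();
        for (int i = 0; i < list.Count; i++)
        {
            // seen stays 0 when the value has not been counted yet.
            counts.TryGetValue(list[i], out int seen);
            counts[list[i]] = seen + 1;
        }

        // Only the distinct values are sorted, not the whole list.
        var ordered = new List<KeyValuePair<T, int>>(counts);
        ordered.Sort((a, b) => Comparer<T>.Default.Compare(a.Key, b.Key));
        return ordered;
    }
}
```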

Why does LINQ GroupBy after OrderBy discard the ordering?

I have an Action model with a Session navigation property.
Consider this code:
var x = db.Actions.OrderBy(p => p.Session.Number).ThenBy(p => p.Date); // it's OK
x is an ordered sequence of Actions, but when I group x by Action.Session, the groups do not follow the order of x:
var y = x.GroupBy(p => p.Session).ToArray();
y contains the groups (Key, IGrouping) of sessions, but why are the group keys not ordered by Session.Number?
How can I get groups of Sessions ordered by Number, with each group ordered by Date?
Because it's Enumerable.GroupBy that preserves order. No such promise is made for Queryable.GroupBy.
From the documentation of the former:
The IGrouping(Of TKey, TElement) objects are yielded in an order based on
order of the elements in source that produced the first key of each
IGrouping(Of TKey, TElement). Elements in a grouping are yielded in the order
they appear in source.
You're calling the latter, and the above is not mentioned. Call OrderBy after GroupBy to make it work.
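The documented Enumerable.GroupBy behaviour can be observed in memory with a sketch like the one below. GroupOrderDemo and KeysInYieldOrder are hypothetical names, and the tuples stand in for the Action entities; this says nothing about what a Queryable provider does.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupOrderDemo
{
    // Groups in-memory (Session, Day) pairs by session and returns the
    // group keys in the order in which Enumerable.GroupBy yields them.
    public static List<int> KeysInYieldOrder(IEnumerable<(int Session, int Day)> actions)
    {
        return actions.GroupBy(a => a.Session)
                      .Select(g => g.Key)
                      .ToList();
    }
}
```

With LINQ to Objects, the keys follow the first occurrence of each session in the input, so a source that is already sorted by session gives sorted groups.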
Update: since you apparently want to sort on more than just the GroupBy key, you should be able to use another GroupBy overload to specify that each session's list of actions is to be sorted:
db.Actions.GroupBy(
    p => p.Session,
    (session, actions) => new {
        Session = session,
        Actions = actions.OrderBy(p => p.Date)
    }).OrderBy(p => p.Session.Number).ToArray();
Because it is not defined that GroupBy preserves either insertion order or the underlying key order (in the same way that Dictionary<,> makes no such guarantee for local in-memory work). Just order after grouping instead:
var y = db.Actions.GroupBy(p => p.Session).OrderBy(grp => grp.Key.Number).ToArray();
In particular, note that to translate the order directly would require it to analyse the expression to spot which parts of the ordering overlap with the grouping (and which don't), which is non-trivial.
Thanks to @Marc Gravell and @hvd for the note that GroupBy does not preserve the order of the keys (TKey) but does preserve the order of the elements (TElement) within each IGrouping(Of TKey, TElement).
So my answer to my final question (how can I get groups of Sessions ordered by Number, with each group ordered by date?) is:
var x = db.Actions
    .OrderBy(p => p.ActionDateTime)
    .GroupBy(p => p.Session)
    .OrderBy(q => q.Key.Number)
    .ToArray();
The name GroupBy itself suggests that the data queried at that moment will be grouped (or aggregated, call it what you want) into another data unit based on the parameter provided.
In general, if you want to see the result sorted, the sorting call (OrderBy here) should be the last one in the sequence.

best way to use LINQ to query against a list

I have a collection
IEnumerable<Project>
and I want to filter it on the project's Id property, so that it includes any project whose Id is in a list:
List<int> Ids
What is the best way to write a where clause that checks whether a property is contained in a list?
var filteredProjectCollection = projectCollection.Where(p => Ids.Contains(p.Id));
You may be able to get a more efficient implementation using the Except method:
var specialProjects = Ids.Select(id => new Project(id));
var filtered = projects.Except(specialProjects, comparer);
The tricky thing is that Except works with two collections of the same type, so you need two collections of projects. You can get that by creating new "dummy" projects and using a comparer that compares projects based on the ID only. Note that Except removes the projects whose IDs are in the list; Intersect with the same comparer keeps only those projects.
Alternatively, you could use Except just on collections of IDs, but then you may need to look up projects by ID, which makes this approach less appealing.
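The comparer mentioned above is not shown in the answer, so the following is only a sketch of what it might look like. Project (with a constructor taking the ID), ProjectIdComparer and ProjectFilters are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical project type with an ID-only constructor for the "dummy" projects.
public class Project
{
    public Project(int id) { Id = id; }
    public int Id { get; }
}

// Treats two projects as equal when their Ids match.
public sealed class ProjectIdComparer : IEqualityComparer<Project>
{
    public bool Equals(Project x, Project y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Project p) => p.Id;
}

public static class ProjectFilters
{
    // Keeps only the projects whose Id is in ids.
    public static List<Project> WithIds(IEnumerable<Project> projects, IEnumerable<int> ids)
    {
        var dummies = ids.Select(id => new Project(id));
        return projects.Intersect(dummies, new ProjectIdComparer()).ToList();
    }

    // Removes the projects whose Id is in ids.
    public static List<Project> WithoutIds(IEnumerable<Project> projects, IEnumerable<int> ids)
    {
        var dummies = ids.Select(id => new Project(id));
        return projects.Except(dummies, new ProjectIdComparer()).ToList();
    }
}
```

Both Intersect and Except are set operations, so projects that share an Id are returned only once.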
var nonExcludedProjects = from p in allprojects where Ids.Contains(p.Id) select p;
If you're going to use one of the .Where(p => list.Contains(p)) answers, you should consider first making a HashSet out of the list so that it doesn't have to do an O(n) search each time. This cuts the running time from O(mn) to O(m+n).
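A minimal sketch of the HashSet variant, with IdFilter, WhereIdIn and idSelector as hypothetical names introduced for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdFilter
{
    // Keeps the items whose id is contained in ids.
    public static List<TItem> WhereIdIn<TItem>(
        IEnumerable<TItem> items, Func<TItem, int> idSelector, IEnumerable<int> ids)
    {
        // Built once, so each Contains below is a hash lookup rather than a list scan.
        var idSet = new HashSet<int>(ids);
        return items.Where(item => idSet.Contains(idSelector(item))).ToList();
    }
}
```

For the question's types, a call could look like IdFilter.WhereIdIn(projectCollection, p => p.Id, Ids).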
I'm not sure that I understand your question, but I'll have a shot.
If you have: IEnumerable<T> enumerable,
and you want to filter it such that it only contains items that are also present in the list: List<T> list,
then: IEnumerable<T> final = enumerable.Where(e => list.Contains(e));
