Why does LINQ GroupBy after OrderBy discard the ordering? - C#

I have an Action model with a Session navigation property.
Consider this code:
var x = db.Actions.OrderBy(p => p.Session.Number).ThenBy(p => p.Date); // it's OK
x is an ordered sequence of Actions, but when I group x by Action.Session, the groups are not yielded in the order of x:
var y = x.GroupBy(p => p.Session).ToArray();
y contains one group (Key, IGrouping) per session, but why are the group keys not ordered by Session.Number?
How can I get the groups of Sessions ordered by Number, with each group's actions ordered by Date?

Because it's Enumerable.GroupBy that preserves order. No such promise is made for Queryable.GroupBy.
From the documentation of the former:
The IGrouping(Of TKey, TElement) objects are yielded in an order based on
order of the elements in source that produced the first key of each
IGrouping(Of TKey, TElement). Elements in a grouping are yielded in the order
they appear in source.
You're calling the latter, whose documentation makes no such promise. Call OrderBy after GroupBy to make it work.
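A small in-memory sketch of the documented Enumerable.GroupBy behavior quoted above. It runs against LINQ to Objects only, so it says nothing about what an IQueryable provider does; the class and member names are hypothetical.

```csharp
using System.Linq;

// Hypothetical demo class; everything here is LINQ to Objects.
public static class GroupOrderDemo
{
    // Keys first appear in the order 3, 1, 2.
    public static readonly int[] Source = { 3, 1, 3, 2, 1 };

    // Groups are yielded in order of each key's first appearance: 3, 1, 2.
    public static int[] KeysAsGrouped() =>
        Source.GroupBy(n => n).Select(g => g.Key).ToArray();

    // Ordering the groups after grouping gives key order: 1, 2, 3.
    public static int[] KeysSortedAfterGrouping() =>
        Source.GroupBy(n => n).OrderBy(g => g.Key).Select(g => g.Key).ToArray();

    // Elements inside a group keep their source order: "beta", "bravo".
    public static readonly string[] Words = { "beta", "alpha", "bravo", "apple" };

    public static string[] FirstGroup() =>
        Words.GroupBy(w => w[0]).First().ToArray();
}
```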
Update: since you apparently want to sort on more than just the GroupBy key, you should be able to use another GroupBy overload to specify that each session's list of actions is to be sorted:
db.Actions.GroupBy(
    p => p.Session,
    (session, actions) => new {
        Session = session,
        Actions = actions.OrderBy(p => p.Date)
    }).OrderBy(p => p.Session.Number).ToArray();
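The same query shape can be tried in memory. This is a sketch under assumptions: Session and ActionEntry are hypothetical records standing in for the question's entities (renamed so it does not clash with System.Action), and it runs against LINQ to Objects, so whether a database provider can translate the inner OrderBy is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities.
public record Session(int Number);
public record ActionEntry(Session Session, DateTime Date);

public static class GroupWithSelectorDemo
{
    // Group by session, sort each group's actions by date inside the result
    // selector, then sort the groups by session number.
    public static (int Number, DateTime[] Dates)[] Run(IEnumerable<ActionEntry> actions) =>
        actions
            .GroupBy(
                p => p.Session,
                (session, items) => new
                {
                    Session = session,
                    Actions = items.OrderBy(p => p.Date)
                })
            .OrderBy(g => g.Session.Number)
            .Select(g => (Number: g.Session.Number, Dates: g.Actions.Select(a => a.Date).ToArray()))
            .ToArray();
}
```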

Because GroupBy is not defined to preserve either insertion order or the underlying key order (in the same way that Dictionary<,> makes no such guarantee for local in-memory work). Just order after grouping instead:
var y = db.Actions.GroupBy(p => p.Session).OrderBy(grp => grp.Key.Number).ToArray();
In particular, note that to translate the order directly would require it to analyse the expression to spot which parts of the ordering overlap with the grouping (and which don't), which is non-trivial.

Thanks to @Marc Gravell & @hvd for pointing out that GroupBy does not preserve the order of the keys (TKey) of the IGrouping(Of TKey, TElement) objects, but does preserve the order of the elements (TElement) within each group.
So my answer to my final question (how do I get the groups of Sessions ordered by Number, with each group ordered by Date?) is:
var x= db.Actions
.OrderBy(p => p.ActionDateTime)
.GroupBy(p => p.Session)
.OrderBy(q => q.Key.Number)
.ToArray();
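An in-memory sketch of this approach, again with hypothetical Session/ActionEntry records standing in for the entities (here with an ActionDateTime property). It relies on Enumerable.GroupBy keeping source order inside each group, which is documented for LINQ to Objects; a query provider is not obliged to do the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities.
public record Session(int Number);
public record ActionEntry(Session Session, DateTime ActionDateTime);

public static class SortThenGroupDemo
{
    public static IGrouping<Session, ActionEntry>[] Run(IEnumerable<ActionEntry> actions) =>
        actions
            .OrderBy(p => p.ActionDateTime)  // sort every action by date first
            .GroupBy(p => p.Session)         // groups keep that order for their elements
            .OrderBy(g => g.Key.Number)      // then order the groups themselves
            .ToArray();
}
```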

The name GroupBy itself suggests that the data queried at that point will be grouped (aggregated, or whatever you want to call it) into another data unit based on the parameter provided.
In general, if you want to see the result sorted, the sort (OrderBy) call should be the last one in the sequence.

Related

ordering of OrderBy, Where, Select in the Linq query

Consider this sample code:
System.Collections.ArrayList fruits = new System.Collections.ArrayList();
fruits.Add("mango");
fruits.Add("apple");
fruits.Add("lemon");
IEnumerable<string> query = fruits.Cast<string>()
.OrderBy(fruit => fruit)
.Where(fruit => fruit.StartsWith("m"))
.Select(fruit => fruit);
I have two questions:
Do I need to write the last Select clause if Where returns the same type by itself? The example is from MSDN; why do they always write it?
What is the correct order of these methods? Does the order affect something? What if I swap Select and Where, or OrderBy?
No, the Select is not necessary if you are not actually transforming the returned type.
In this case, the ordering of the method calls could have an impact on performance. Sorting all the objects before filtering is sure to take longer than filtering and then sorting a smaller data set.
The .Select is unnecessary in this case because .Cast already guarantees that you're working with IEnumerable<string>.
The ordering of .OrderBy and .Where doesn't affect the results of the query, but in general if you use .Where first you'll get better performance because there will be fewer elements to sort.
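Both claims can be checked in memory with a small sketch (hypothetical names): the two orderings return the same strings, and counting calls to the OrderBy key selector (instrumentation added here, not something LINQ exposes) shows that filtering first hands the sort fewer elements. The identity Select is left out because it changes nothing.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class FilterSortOrderDemo
{
    // Sort everything, then filter: the key selector runs for every fruit.
    public static (List<string> Result, int KeyCalls) SortThenFilter(ArrayList fruits)
    {
        int calls = 0;
        var result = fruits.Cast<string>()
            .OrderBy(f => { calls++; return f; })
            .Where(f => f.StartsWith("m"))
            .ToList();
        return (result, calls);
    }

    // Filter, then sort: only the surviving fruits reach OrderBy.
    public static (List<string> Result, int KeyCalls) FilterThenSort(ArrayList fruits)
    {
        int calls = 0;
        var result = fruits.Cast<string>()
            .Where(f => f.StartsWith("m"))
            .OrderBy(f => { calls++; return f; })
            .ToList();
        return (result, calls);
    }
}
```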

Best way to get an ordered list of groups by value from an unordered list

I'd like to know if there's a more efficient way to get an ordered list of groups by value from an initially unordered list, than using GroupBy() followed by OrderBy(), like this:
List<int> list = new List<int>();
IEnumerable<IEnumerable<int>> orderedGroups = list.GroupBy(x => x).OrderBy(x => x.Key);
For more detail, I have a large List<T> which I'd like to sort, however there are lots of duplicate values so I want to return the results as IEnumerable<IEnumerable<T>>, much as GroupBy() returns an IEnumerable of groups. If I use OrderBy(), I just get IEnumerable<T>, with no easy way to know whether the value has changed from one item to the next. I could group the list then sort the groups, but the list is large so this ends up being slow. Since OrderBy() returns an OrderedEnumerable which can then be sorted on a secondary field using ThenBy(), it must internally distinguish between adjacent items with the same or different values.
Is there any way I can make use of the fact that OrderedEnumerable<T> must internally group its results by value (in order to facilitate ThenBy()), or otherwise what's the most efficient way to use LINQ to get an ordered list of groups?
You can use ToLookup, which returns an ILookup<TKey, TElement> (an IEnumerable<IGrouping<TKey, TElement>>), and then do OrderBy on the values of each key on demand. This will be O(n) to create the lookup and O(h log h) to order the elements under each group (the values for a key), where h is the number of elements in that group.
You can improve the performance to amortized O(n) by using IDictionary<TKey, IOrderedEnumerable<T>>. But if you want to order by multiple properties, it will again be O(h log h) per group. See this answer for more info on IOrderedEnumerable. You can also use SortedList<TKey, TValue> instead of IOrderedEnumerable.
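A sketch of the ToLookup idea, assuming the items are hypothetical (Name, Score) pairs grouped by Score. Note that a lookup yields its groups in first-appearance order, not key order, so if you want the groups sorted you still have to sort the keys (O(k log k) for k distinct keys).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // One O(n) pass builds the lookup.
    public static ILookup<int, string> BuildLookup(IEnumerable<(string Name, int Score)> items) =>
        items.ToLookup(i => i.Score, i => i.Name);

    // Sort a single group's values only when that group is requested.
    // An unknown key yields an empty sequence rather than throwing.
    public static IEnumerable<string> SortedGroup(ILookup<int, string> lookup, int key) =>
        lookup[key].OrderBy(name => name);

    // The keys themselves still need sorting if you want groups in key order.
    public static IEnumerable<int> SortedKeys(ILookup<int, string> lookup) =>
        lookup.Select(g => g.Key).OrderBy(k => k);
}
```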
[Update]:
Here is another answer you can take a look at. But again, it involves doing OrderBy on top of the result.
Further, you can come up with your own data structure, as I don't see any data structure in the BCL meeting this requirement.
One possible implementation:
You can have a binary search tree which does search/delete/insert in O(log N) on average. An in-order traversal then gives you the keys in sorted order. Each node in the tree holds an ordered collection, for example, of the values.
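A node roughly looks like this:
using System.Collections.Generic;

public class MyNode
{
    public string Key { get; set; }
    // Values for this key, kept in sorted order ("SortedCollection" is not a BCL type).
    public List<string> MyCollection { get; } = new List<string>();
    public MyNode Left { get; set; }   // keys less than Key
    public MyNode Right { get; set; }  // keys greater than Key
}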
You can traverse over the initial collection once and create this special data structure which can be queried to get fast results.
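A minimal sketch of such a tree, under stated assumptions: GroupTree and GroupTreeNode are hypothetical names, the tree is not self-balancing (so O(log N) holds only on average, and already-sorted input degrades it to O(N) per insert), and each node keeps its values sorted by binary-search insertion into a List<TValue> rather than a dedicated sorted collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type: one key plus that key's values, kept sorted.
public sealed class GroupTreeNode<TKey, TValue>
{
    public GroupTreeNode(TKey key) => Key = key;
    public TKey Key { get; }
    public List<TValue> Values { get; } = new List<TValue>();
    public GroupTreeNode<TKey, TValue> Left { get; set; }   // smaller keys
    public GroupTreeNode<TKey, TValue> Right { get; set; }  // larger keys
}

// Unbalanced binary search tree of groups.
public sealed class GroupTree<TKey, TValue> where TKey : IComparable<TKey>
{
    private GroupTreeNode<TKey, TValue> _root;

    public void Add(TKey key, TValue value)
    {
        List<TValue> values = FindOrCreate(key).Values;
        int index = values.BinarySearch(value);             // uses Comparer<TValue>.Default
        values.Insert(index >= 0 ? index : ~index, value);  // keep the values sorted
    }

    private GroupTreeNode<TKey, TValue> FindOrCreate(TKey key)
    {
        if (_root == null) return _root = new GroupTreeNode<TKey, TValue>(key);
        var node = _root;
        while (true)
        {
            int cmp = key.CompareTo(node.Key);
            if (cmp == 0) return node;
            if (cmp < 0)
            {
                if (node.Left == null) return node.Left = new GroupTreeNode<TKey, TValue>(key);
                node = node.Left;
            }
            else
            {
                if (node.Right == null) return node.Right = new GroupTreeNode<TKey, TValue>(key);
                node = node.Right;
            }
        }
    }

    // Iterative in-order traversal: visits the keys in ascending order.
    public IEnumerable<(TKey Key, IReadOnlyList<TValue> Values)> InOrder()
    {
        var stack = new Stack<GroupTreeNode<TKey, TValue>>();
        var node = _root;
        while (node != null || stack.Count > 0)
        {
            while (node != null) { stack.Push(node); node = node.Left; }
            node = stack.Pop();
            yield return (node.Key, node.Values);
            node = node.Right;
        }
    }
}
```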
[Update 2]:
If you have fewer than 100k possible keys, I feel implementing your own data structure is overkill. Generally an OrderBy returns quickly and the time taken is tiny. Unless you have large data and you call OrderBy multiple times, ToLookup should work fairly well.
Honestly, you're not going to do much better than
items.GroupBy(i => i.KeyProperty).OrderBy(g => g.Key);
GroupBy is an O(n) operation. The OrderBy is then O(k log k) where k is the number of groups.
If you call OrderBy first... well, firstly, your O(n log n) is now in your number of items rather than your number of groups, so it's already slower than the above.
And secondly, an IOrderedEnumerable doesn't have the internal magic you think it does. It isn't an ordered sequence that contains groups of same-ordered items which can then be reordered with ThenBy; it's an unordered sequence with a list of sort keys which ThenBy adds to, and which is eventually ordered by each of those keys when you iterate over it.
You may be able to eke out a little more speed by rolling your own "group and sort" loop, maybe manually adding to a SortedDictionary<TKey, IList<TItem>>, but I don't think you're going to get a better big O than what out-of-the-box LINQ gets you.
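A sketch of that hand-rolled loop, with a hypothetical helper name. SortedDictionary<TKey, TValue> is itself a binary search tree, so each lookup or insert is O(log k) for k distinct keys, giving O(n log k) overall, and the groups come out in key order when you enumerate it. Null keys are not allowed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedGroupingDemo
{
    // Single pass: append each item to its key's list; the dictionary keeps keys sorted.
    public static SortedDictionary<TKey, List<TItem>> GroupAndSort<TItem, TKey>(
        IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        var groups = new SortedDictionary<TKey, List<TItem>>();
        foreach (var item in items)
        {
            var key = keySelector(item);
            if (!groups.TryGetValue(key, out var list))
            {
                list = new List<TItem>();
                groups.Add(key, list);
            }
            list.Add(item);  // items within a group keep their source order
        }
        return groups;
    }
}
```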
I think iterating through the list with a plain for loop while populating a Dictionary<T, int>, where the value is the count of repeated elements, will be faster.
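A sketch of the counting idea, assuming the items are plain values like the List<int> in the question, so a group can be represented by its value plus a count; for objects that merely share a key you would have to keep the items themselves. Only the distinct values get sorted.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountingDemo
{
    // Count duplicates in one for loop, then sort only the distinct values.
    public static List<KeyValuePair<int, int>> CountsInOrder(List<int> list)
    {
        var counts = new Dictionary<int, int>();
        for (int i = 0; i < list.Count; i++)
        {
            counts.TryGetValue(list[i], out int c);  // c is 0 when the value is new
            counts[list[i]] = c + 1;
        }
        return counts.OrderBy(kv => kv.Key).ToList();
    }
}
```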

Regroup an IGrouping by original group criteria

This sounds stupid (maybe the answer is quite easy), but I just don't see a solution for this issue:
Let's say I have this:
public IGrouping<T, Item> process<T>(IGrouping<T, Item> group)
{
    var toIgnore = group.AsEnumerable().Where(ignoreExpression);
    var toContinue = group.AsEnumerable().Except(toIgnore);
    // how to group the "toContinue" Enumerable by the same criteria as the group ?
    //
    return newGroup;
}
What I want to do is: I get the group, I want to ignore some items in each group (not whole groups). So far so good. Then I want to re-group it, according to the original criteria - and that's where I'm stuck.
How could this be done? I can find the original criteria from the grouping (this is group.Key of course), but how can I re-group the result "toContinue" (or even toContinue.ToList()) by group.Key?
thx for suggestions
Andreas
The group key is available through the IGrouping's Key property.
Since IGrouping is an IEnumerable, once you apply Where you lose the grouping key. You have to filter before the GroupBy(), not after.
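If you do need to filter an existing group and keep its key, one option is to wrap the filtered items in your own IGrouping implementation. This is a sketch: Grouping and RegroupDemo.Process are hypothetical names, not BCL types. It uses Where with a negated predicate instead of Except, because Except has set semantics and would also drop duplicate items.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Minimal IGrouping implementation: a key plus the elements that belong to it.
public sealed class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
    private readonly IReadOnlyList<TElement> _elements;

    public Grouping(TKey key, IEnumerable<TElement> elements)
    {
        Key = key;
        _elements = elements.ToList();  // snapshot, so the filter is not re-run on every enumeration
    }

    public TKey Key { get; }
    public IEnumerator<TElement> GetEnumerator() => _elements.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class RegroupDemo
{
    // Keep the original key; drop only the items matching the ignore predicate.
    // Works even when every item is ignored (the result is an empty group).
    public static IGrouping<TKey, TElement> Process<TKey, TElement>(
        IGrouping<TKey, TElement> group, Func<TElement, bool> ignore) =>
        new Grouping<TKey, TElement>(group.Key, group.Where(item => !ignore(item)));
}
```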

C# GroupBy: Will the GroupBy clause keep the original order of the list [duplicate]

I use LINQ to Objects instructions on an ordered array.
Which operations shouldn't I do to be sure the order of the array is not changed?
I examined the methods of System.Linq.Enumerable, discarding any that returned non-IEnumerable results. I checked the remarks of each to determine how the order of the result would differ from order of the source.
Preserves Order Absolutely. You can map a source element by index to a result element
AsEnumerable
Cast
Concat
Select
ToArray
ToList
Preserves Order. Elements are filtered or added, but not re-ordered.
Distinct
Except
Intersect
OfType
Prepend (new in .NET Framework 4.7.1)
Skip
SkipWhile
Take
TakeWhile
Where
Zip (new in .NET Framework 4)
Destroys Order - we don't know what order to expect results in.
ToDictionary
ToLookup
Redefines Order Explicitly - use these to change the order of the result
OrderBy
OrderByDescending
Reverse
ThenBy
ThenByDescending
Redefines Order according to some rules.
GroupBy - The IGrouping objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping. Elements in a grouping are yielded in the order they appear in source.
GroupJoin - GroupJoin preserves the order of the elements of outer, and for each element of outer, the order of the matching elements from inner.
Join - preserves the order of the elements of outer, and for each of these elements, the order of the matching elements of inner.
SelectMany - for each element of source, selector is invoked and a sequence of values is returned.
Union - When the object returned by this method is enumerated, Union enumerates first and second in that order and yields each element that has not already been yielded.
Edit: I've moved Distinct to "Preserves Order" based on this implementation:
private static IEnumerable<TSource> DistinctIterator<TSource>(
    IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource element in source)
        if (set.Add(element)) yield return element;
}
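A quick in-memory check of the order these set-like operators produce (hypothetical class name). This reflects the LINQ to Objects implementation, as in the DistinctIterator above, rather than a documented guarantee; the Distinct documentation describes its result as an unordered sequence. Note that Except and Intersect also remove duplicates.

```csharp
using System.Linq;

public static class OrderPreservationDemo
{
    public static readonly int[] Source = { 5, 3, 5, 1, 3, 4 };

    // First occurrences, in source order: 5, 3, 1, 4
    public static int[] DistinctResult() => Source.Distinct().ToArray();

    // Source order minus 3 (and minus duplicates): 5, 1, 4
    public static int[] ExceptResult() => Source.Except(new[] { 3 }).ToArray();

    // Order follows the first sequence, not the second: 5, 4
    public static int[] IntersectResult() => Source.Intersect(new[] { 4, 5 }).ToArray();
}
```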
Are you actually talking about SQL, or about arrays? To put it another way, are you using LINQ to SQL or LINQ to Objects?
The LINQ to Objects operators don't actually change their original data source - they build sequences which are effectively backed by the data source. The only operations which change the ordering are OrderBy/OrderByDescending/ThenBy/ThenByDescending - and even then, those are stable for equally ordered elements. Of course, many operations will filter out some elements, but the elements which are returned will be in the same order.
If you convert to a different data structure, e.g. with ToLookup or ToDictionary, I don't believe order is preserved at that point - but that's somewhat different anyway. (The order of values mapping to the same key is preserved for lookups though, I believe.)
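Two of these points can be checked in memory with a small sketch (hypothetical names): Enumerable.OrderBy is documented as a stable sort, so elements with equal keys keep their source order, and in the current implementation the values for one key in a ToLookup also come out in source order (observed behavior, as the answer hedges).

```csharp
using System.Linq;

public static class StableSortDemo
{
    // Stable sort: words of equal length stay in their original relative order.
    public static string[] SortByLength(string[] words) =>
        words.OrderBy(w => w.Length).ToArray();

    // Values sharing a lookup key come out in source order.
    public static string[] WordsOfLength(string[] words, int length) =>
        words.ToLookup(w => w.Length)[length].ToArray();
}
```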
If you are working on an array, it sounds like you are using LINQ-to-Objects, not SQL; can you confirm? Most LINQ operations don't re-order anything (the output will be in the same order as the input) - so don't apply another sort (OrderBy[Descending]/ThenBy[Descending]).
[edit: as Jon put more clearly; LINQ generally creates a new sequence, leaving the original data alone]
Note that pushing the data into a Dictionary<,> (ToDictionary) can scramble the data, as a dictionary does not guarantee any particular enumeration order.
But most common things (Select, Where, Skip, Take) should be fine.
I found a great answer in a similar question which references official documentation. To quote it:
For Enumerable methods (LINQ to Objects, which applies to List<T>), you can rely on the order of elements returned by Select, Where, or GroupBy. This is not the case for things that are inherently unordered like ToDictionary or Distinct.
From Enumerable.GroupBy documentation:
The IGrouping<TKey, TElement> objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping<TKey, TElement>. Elements in a grouping are yielded in the order they appear in source.
This is not necessarily true for IQueryable extension methods (other LINQ providers).
Source: Do LINQ's Enumerable Methods Maintain Relative Order of Elements?
Any 'group by' or 'order by' will possibly change the order.
The question here is specifically referring to LINQ-to-Objects.
If you're using LINQ-to-SQL instead, there is no order unless you impose one with something like:
mysqlresult.OrderBy(e=>e.SomeColumn)
If you do not do this with LINQ-to-SQL, the order of results can vary between subsequent queries, even on the same data, which could cause an intermittent bug.

LINQ query OrderBy doesn't work

_db.InstellingAdressens
.Where(l => l.GEMEENTE.Contains(gem_query))
.OrderBy(q => q.GEMEENTE)
.Select(q => q.GEMEENTE)
.Distinct();
This is the query. It returns a list of strings, but the strings are not ordered at all. Why does the OrderBy have no effect, and how can I fix it?
Try putting OrderBy at the end of your call.
_db.InstellingAdressens.
Where(l => l.GEMEENTE.Contains(gem_query)).
Select(q=>q.GEMEENTE).Distinct().
OrderBy(q=>q).ToList();
Distinct has no knowledge that you have ordered your items before it gets them, so it can't use that knowledge. As such, it has to assume the items are unordered, and will thus just do what it wants with them.
A typical implementation will use a hashtable, which isn't ordered by what you normally want the items to be ordered by, so the result from the distinct operation is an unordered set.
So as others have suggested, change the ordering of your calls to do the ordering last, and you should get what you want.
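An in-memory sketch of the reordered query. InstellingAdres and GemeenteQuery are hypothetical names standing in for the question's entity and query. In LINQ to Objects, Distinct happens to keep first-occurrence order, so the original query would look sorted in memory; the problem in the question presumably comes from how the database provider translates Distinct, and putting OrderBy last is what makes the order reliable in both cases.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the question's entity.
public record InstellingAdres(string GEMEENTE);

public static class GemeenteQuery
{
    public static List<string> Run(IEnumerable<InstellingAdres> rows, string gemQuery) =>
        rows.Where(l => l.GEMEENTE.Contains(gemQuery))
            .Select(q => q.GEMEENTE)
            .Distinct()
            .OrderBy(q => q)   // sort last, on the distinct strings themselves
            .ToList();
}
```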
Change the order of calls:
_db.InstellingAdressens.Where(l => l.GEMEENTE.Contains(gem_query)).Select(q=>q.GEMEENTE).Distinct().OrderBy(q=>q).ToList();
Try this, just put OrderBy last in the query:
_db.InstellingAdressens
.Where(l => l.GEMEENTE.Contains(gem_query))
.Select(q=>q.GEMEENTE)
.Distinct()
.OrderBy(q=>q).ToList();
