How to simply convert an IEnumerable into IOrderedEnumerable in O(1)? [duplicate] - c#

Say there is an extension method to order an IQueryable based on several types of Sorting (i.e. sorting by various properties) designated by a SortMethod enum.
public static IOrderedEnumerable<AClass> OrderByX(this IQueryable<AClass> values,
SortMethod? sortMethod)
{
IOrderedEnumerable<AClass> queryRes = null;
switch (sortMethod)
{
case SortMethod.Method1:
queryRes = values.OrderBy(a => a.Property1);
break;
case SortMethod.Method2:
queryRes = values.OrderBy(a => a.Property2);
break;
case null:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
default:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
}
return queryRes;
}
In the case where sortMethod is null (i.e. where it is specified that I don't care about the order of the values), is there a way, instead of ordering by some default property, to just pass the IEnumerable values through as "ordered" without having to perform the actual sort?
I would like the ability to call this extension, and then possibly perform some additional ThenBy orderings.

All you need to do for the default case is:
queryRes = values.OrderBy(a => 1);
This will effectively be a noop sort. Because OrderBy performs a stable sort, the original order is maintained when the selected keys are equal. Note that since this is an IQueryable and not an IEnumerable, it's possible for the query provider to not perform a stable sort. In that case, you need to know whether it's important that the order be maintained, or whether it's appropriate to just say "I don't care what order the result is in, so long as I can call ThenBy on the result".
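As a rough illustration of the stable-sort point for LINQ to Objects only (not for an arbitrary IQueryable provider), the sketch below orders by a constant key and then adds a ThenBy. The class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the question's code.
public static class NoopSortDemo
{
    // Every element gets the same key, so the stable sort keeps the input order.
    public static List<string> ConstantKeyOnly(IEnumerable<string> source)
    {
        return source.OrderBy(s => 1).ToList();
    }

    // All primary keys tie, so the ThenBy key decides the final order.
    public static List<string> ConstantKeyThenByLength(IEnumerable<string> source)
    {
        return source.OrderBy(s => 1).ThenBy(s => s.Length).ToList();
    }
}
```

With the input "pear", "fig", "banana", the first method returns the items unchanged, and the second returns them ordered by length.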
Another option, that allows you to avoid the actual sort is to create your own IOrderedEnumerable implementation:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
if (descending)
{
return source.OrderByDescending(keySelector, comparer);
}
else
{
return source.OrderBy(keySelector, comparer);
}
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}
With that your query can be:
queryRes = new NoopOrder<AClass>(values);
Note that the consequence of the above class is that a subsequent call to ThenBy will effectively be a top-level sort. It is in effect turning the subsequent ThenBy into an OrderBy call. (This should not be surprising; ThenBy will call the CreateOrderedEnumerable method, and in there this code is calling OrderBy, basically turning that ThenBy into an OrderBy.) From a conceptual sorting point of view, this is a way of saying that "all of the items in this sequence are equal in the eyes of this sort, but if you specify that equal objects should be tiebroken by something else, then do so".
Another way of thinking of a "no op sort" is that it orders the items based on their index in the input sequence. This means that the items are not all "equal"; it means that the order of the input sequence will be the final order of the output sequence, and since each item in the input sequence is always larger than the one before it, adding additional "tiebreaker" comparisons will do nothing, making any subsequent ThenBy calls pointless. If this behavior is desired, it is even easier to implement than the previous one:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
return new NoopOrder<T>(source);
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}

If you always return the same key value you will get an IOrderedEnumerable that preserves the original list order:
case null:
queryRes = values.OrderBy(a => 1);
break;
By the way, I don't think this is the right thing to do. You will get a collection that is supposed to be ordered but actually is not.

Bottom line, IOrderedEnumerable exists solely to provide a grammar structure to the OrderBy()/ThenBy() methods, preventing you from trying to start an ordering clause with ThenBy(). It's not intended to be a "marker" that identifies the collection as ordered, unless it was actually ordered by OrderBy(). So, the answer is that if the sorting method being null is supposed to indicate that the enumerable is in some "default order", you should specify that default order (as your current implementation does). It's disingenuous to state that the enumerable is ordered when in fact it isn't, even if, by not specifying a SortMethod, you are implying it's "ordered by nothing" and don't care about the actual order.
The "problem" inherent in trying to simply mark the collection as ordered using the interface is that there's more to the process than simply sorting. By executing an ordering method chain, such as myCollection.OrderBy().ThenBy().ThenByDescending(), you're not actually sorting the collection with each call; not yet anyway. You are instead defining the behavior of an "iterator" class, named OrderedEnumerable, which will use the projections and comparisons you define in the chain to perform the sorting at the moment you need an actual sorted element.
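The deferred behavior described above can be observed in LINQ to Objects by counting key selector calls. This is only a sketch; the class and method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; shows that OrderBy only describes the sort.
public static class DeferredSortDemo
{
    // Returns { calls made before enumeration, calls made after enumeration }.
    public static int[] CountKeySelectorCalls()
    {
        int calls = 0;
        var numbers = new List<int> { 3, 1, 2 };

        // Building the chain does not run the key selector.
        IOrderedEnumerable<int> ordered = numbers.OrderBy(n => { calls++; return n; });
        int before = calls;

        // Enumerating (here via ToList) is what triggers the actual sort.
        List<int> sorted = ordered.ToList();
        return new[] { before, calls };
    }
}
```

The first count is zero because nothing has been enumerated yet; the second is greater than zero once ToList forces the sort.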
Servy's answer, stating that OrderBy(x=>1) is a noop and should be optimized out by SQL providers, ignores the reality that this call, made against an Enumerable, will still do quite a bit of work, and that most SQL providers in fact do not optimize this kind of call. OrderBy(x=>1) will, in most Linq providers, produce a query with an "ORDER BY 1" clause, which not only forces the SQL provider to perform its own sorting, it will actually result in a change to the order, because in T-SQL at least "ORDER BY 1" means to order by the first column of the select list.

Related

Force IEnumerable<T> to evaluate without calling .ToArray() or .ToList()

If I query EF using something like this...
IEnumerable<FooBar> fooBars = db.FooBars.Where(o => o.SomeValue == something);
IIRC, This creates a lazy-evaluated, iterable state machine in the background, that does not yet contain any results; rather, it contains an expression of "how" to obtain the results when required.
If I want to force the collection to contain results I have to call .ToArray() or .ToList()
Is there a way to force an IEnumerable<T> collection to contain results without calling .ToArray() or .ToList(); ?
Rationale
I don't know if the CLR is capable of doing this, but essentially I want to forcibly create an evaluated collection that implements the IEnumerable<T> interface, but is implemented under the hood by the CLR, thus NOT a List<T> or Array<T>
Presumably this is not possible, since I'm not aware of any CLR capability to create in-memory, evaluated collections that implement IEnumerable<T>
Proposal
Say for example, I could write something like this:
var x = db.FooBars
.Where(o => o.SomeValue == something)
.Evaluate(); // Does NOT return a "concrete" impl such as List<T> or Array<T>
Console.WriteLine(x.GetType().Name);
// eg. <EvaluatedEnumerable>e__123
Is there a way to force an IEnumerable<T> collection to contain results without calling .ToArray() or .ToList(); ?
Yes, but it is perhaps not what you want:
IEnumerable<T> source = …;
IEnumerable<T> cached = new List<T>(source);
The thing is, IEnumerable<T> is not a concrete type. It is an interface (contract) representing an item sequence. There can be any concrete type "hiding behind" this interface; some might only represent a query, others actually hold the queried items in memory.
If you want to force-evaluate your sequence so that the result is actually stored in physical memory, you need to make sure that the concrete type behind IEnumerable<T> is an in-memory collection that holds the results of the evaluation. The above code example does just that.
You can use a foreach loop:
foreach (var item in fooBars) { }
Note that this evaluates all items in fooBars, but throws away the result immediately. Next time you run the same foreach loop or .ToArray(), .ToList(), the enumerable will be evaluated once again.
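To make the difference between re-evaluating a lazy sequence and reading a cached copy concrete, here is a small LINQ to Objects sketch. It uses a counting Select in place of a real EF query, and the names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; a counting selector stands in for the database query.
public static class EvaluationDemo
{
    // Returns { evaluations after two loops over the lazy sequence,
    //           evaluations after also caching it and looping twice over the copy }.
    public static int[] CountEvaluations()
    {
        int evaluations = 0;
        IEnumerable<int> lazy = Enumerable.Range(1, 3)
            .Select(n => { evaluations++; return n * 10; });

        foreach (var item in lazy) { }   // evaluates all three items
        foreach (var item in lazy) { }   // evaluates all three items again
        int afterTwoLoops = evaluations;

        // Copying into a List enumerates the source once and stores the results.
        IEnumerable<int> cached = new List<int>(lazy);
        foreach (var item in cached) { } // reads stored values, no re-evaluation
        foreach (var item in cached) { }

        return new[] { afterTwoLoops, evaluations };
    }
}
```

Each pass over the lazy sequence runs the selector again, while passes over the cached list do not.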
A concrete use case I've run into revolves around needing to ensure that an IEnumerable that wraps a DB Query has begun returning results (indicating that the query did not time out) before returning control to the calling method. But the results are too large to evaluate fully, hence the IEnumerable to support streaming.
internal class EagerEvaluator<T>
{
private readonly T _first;
private readonly IEnumerator<T> _enumerator;
private readonly bool _hasFirst;
public EagerEvaluator(IEnumerable<T> enumerable)
{
_enumerator = enumerable.GetEnumerator();
if (_enumerator.MoveNext())
{
_hasFirst = true;
_first = _enumerator.Current;
}
}
public IEnumerable<T> ToEnumerable()
{
if (_hasFirst)
{
yield return _first;
while (_enumerator.MoveNext())
{
yield return _enumerator.Current;
}
}
}
}
The usage is pretty straightforward:
fooBars = new EagerEvaluator<FooBar>(fooBars).ToEnumerable();
Another option is:
<linq expression>.All( x => true);
I use Aggregate<T>() to evaluate an IEnumerable<T> with side effects:
private static IEnumerable<T> Evaluate<T>(IEnumerable<T> source)
=> source.Aggregate(Enumerable.Empty<T>(), (evaluated, s) => evaluated.Append(s));
See it in action: https://dotnetfiddle.net/iya2l0

Checking if all items in one generic collection exist in another using custom comparison delegate

I have a situation where I need a generic method to which I can pass two collections of type T along with a delegate that compares the two collections and returns true if every element in collection 1 has an equal element in collection 2, even if they are not in the same index of the collection. What I mean by "equal" is handled by the delegate. My initial thought was to return false if the collections were different lengths and otherwise sort them and then compare them like parallel arrays. Then it occurred to me that I can't sort a collection of a generic type without the types sharing an interface. So now I am thinking a LINQ expression might do the trick, but I can't think of how to write it. Consider my current code:
private static bool HasSameCollectionItems<T>(ICollection<T> left, ICollection<T> right, Func<T, T, bool> func)
{
if (left.Count != right.Count)
{
return false;
}
foreach (var item in left)
{
bool leftItemIsInRightCollection = ??? MAGIC ???
if (!leftItemIsInRightCollection)
{
return false;
}
}
return true;
}
I would like to replace ??? MAGIC ??? with a LINQ expression to see if item is "equal" to an element in right using the passed in delegate func. Is this even possible?
Note: For reasons I don't want to bother getting into here, implementing IEquatable or overriding the Equals method is not an option here.
It looks like you want the .All() and .Any() methods (the first checks that all elements satisfy a condition, the second only checks whether such an element exists):
bool leftItemIsInRightCollection = right.Any(rItem => func(item, rItem));
Also, I'd refactor your code to something like:
private static bool HasSameCollectionItems<T>(ICollection<T> left, ICollection<T> right, Func<T, T, bool> func)
{
return left.Count == right.Count && left.All(LI => right.Any(RI => func(LI, RI)));
}
The following works by checking whether there are elements in left which are not in right.
If you insist on a delegate to determine equality, you can use the FuncEqualityComparer from here. (Note that you must also provide an implementation for Object.GetHashCode)
private static bool HasSameCollectionItems<T>(ICollection<T> left, ICollection<T> right, IEqualityComparer<T> comparer)
{
if (left.Count != right.Count) return false;
return !left.Except(right, comparer).Any();
}
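The FuncEqualityComparer referred to above is not shown in this thread, so the following is only a guess at its shape: a minimal delegate-backed IEqualityComparer<T>, assuming the caller supplies both an equality delegate and a hash delegate. The wrapper class name CollectionComparison is also invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of a delegate-backed comparer; not the original linked code.
public sealed class FuncEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public FuncEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals;
        _getHashCode = getHashCode;
    }

    public bool Equals(T x, T y) { return _equals(x, y); }

    // Except() buckets items by hash, so "equal" items must hash identically.
    public int GetHashCode(T obj) { return _getHashCode(obj); }
}

public static class CollectionComparison
{
    public static bool HasSameCollectionItems<T>(
        ICollection<T> left, ICollection<T> right, IEqualityComparer<T> comparer)
    {
        if (left.Count != right.Count) return false;
        return !left.Except(right, comparer).Any();
    }
}
```

For a case-insensitive string comparison, for example, the hash delegate could return s.ToUpperInvariant().GetHashCode() so that it agrees with the equality delegate.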

How to get excluded collection without a second LINQ query?

I have a LINQ query that looks like this:
var p = option.GetType().GetProperties().Where(t => t.PropertyType == typeof(bool));
What is the most efficient way to get the items which aren't included in this query, without executing a second iteration over the list.
I could easily do this with a for loop but I was wondering if there's a shorthand with LINQ.
var p = option.GetType().GetProperties().ToLookup(t => t.PropertyType == typeof(bool));
var bools = p[true];
var notBools = p[false];
.ToLookup() is used to partition an IEnumerable based on a key function. In this case, it will return a Lookup which will have at most 2 groups in it. Groups in the Lookup can be accessed using a key, similar to an IDictionary.
.ToLookup() is evaluated immediately and is an O(n) operation, and accessing a partition in the resulting Lookup is an O(1) operation.
Lookup is very similar to a Dictionary and has similar generic parameters (a Key type and a Value type). However, where Dictionary maps a key to a single value, Lookup maps a key to a set of values. Lookup can be implemented as IDictionary<TKey, IEnumerable<TValue>>
.GroupBy() could also be used. But it is different from .ToLookup() in that GroupBy is lazy evaluated and could possibly be enumerated multiple times. .ToLookup() is evaluated immediately and the work is only done once.
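The immediate versus lazy distinction can be checked in LINQ to Objects by counting how often the key function runs. This is a sketch with invented names, using integers instead of PropertyInfo objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; counts key function calls for ToLookup and GroupBy.
public static class PartitionDemo
{
    // Returns the call count after: ToLookup, reading both partitions,
    // calling GroupBy, and enumerating the GroupBy result twice.
    public static int[] CountKeyCalls()
    {
        int calls = 0;
        var numbers = new[] { 1, 2, 3, 4 };
        Func<int, bool> isEven = n => { calls++; return n % 2 == 0; };

        var lookup = numbers.ToLookup(isEven);   // runs the key function right away
        int afterLookup = calls;

        var evens = lookup[true].ToList();       // reading partitions does no new work
        var odds = lookup[false].ToList();
        int afterReading = calls;

        var groups = numbers.GroupBy(isEven);    // lazy: nothing runs yet
        int afterGroupBy = calls;

        groups.ToList();                         // each enumeration regroups the source
        groups.ToList();
        return new[] { afterLookup, afterReading, afterGroupBy, calls };
    }
}
```

The lookup pays for the partitioning once, whereas every enumeration of the GroupBy result runs the key function over the whole source again.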
You cannot get something that you don't ask for. So if you exclude all but bool you can't expect to get the others later. You need to ask for them.
For what it's worth, if you need both the ones you want and all the others in a single query, you could GroupBy this condition or use ToLookup, which I would prefer:
var isboolOrNotLookup = option.GetType().GetProperties()
.ToLookup(t => t.PropertyType == typeof(bool)); // use PropertyType instead
Now you can use this lookup for further processing. For example, if you want a collection of all properties which are bool:
List<System.Reflection.PropertyInfo> boolTypes = isboolOrNotLookup[true].ToList();
or just the count:
int boolCount = isboolOrNotLookup[true].Count();
So if you want to process all which are not bool:
foreach(System.Reflection.PropertyInfo prop in isboolOrNotLookup[false])
{
}
Well, you could go for source.Except(p), but it would reiterate the list and perform a lot of comparisons.
I'd say - write an extension method that does it using foreach, basically splitting the list into two destinations. Or something like this.
How about:
public class UnzipResult<T>
{
    private readonly IEnumerator<T> _enumerator;
    private readonly Func<T, bool> _filter;
    private readonly Queue<T> _nonMatching = new Queue<T>();
    private readonly Queue<T> _matching = new Queue<T>();

    public UnzipResult(IEnumerable<T> source, Func<T, bool> filter)
    {
        _enumerator = source.GetEnumerator();
        _filter = filter;
    }

    public IEnumerable<T> Matching { get { return Drain(_matching, _nonMatching, true); } }
    public IEnumerable<T> Rest { get { return Drain(_nonMatching, _matching, false); } }

    // Yields items already buffered for this side first, then pulls from the
    // shared enumerator, buffering items that belong to the other side.
    private IEnumerable<T> Drain(Queue<T> mine, Queue<T> other, bool wanted)
    {
        while (true)
        {
            if (mine.Count > 0)
            {
                yield return mine.Dequeue();
            }
            else if (_enumerator.MoveNext())
            {
                T current = _enumerator.Current;
                if (_filter(current) == wanted)
                    yield return current;
                else
                    other.Enqueue(current);
            }
            else
            {
                yield break;
            }
        }
    }
}
public static class UnzipExtensions
{
    public static UnzipResult<T> Unzip<T>(this IEnumerable<T> source, Func<T, bool> filter)
    {
        return new UnzipResult<T>(source, filter);
    }
}
This is a sketch rather than tested code, but the idea is: whichever collection you enumerate (matching or non-matching), you only enumerate the source once. And it should work fairly well with those pesky infinite collections (think yield return random.Next()), unless all elements do/don't fulfil the filter.

Does LINQ to Objects keep its order

I have a List<Person> and want to convert it, for simple processing, to a List<string>, doing the following:
List<Person> persons = GetPersonsBySeatOrder();
List<string> seatNames = persons.Select(x => x.Name).ToList();
Console.WriteLine("First in line: {0}", seatNames[0]);
Is the .Select() statement on a LINQ to Objects object guaranteed to not change the order of the list members? Assuming no explicit distinct/grouping/ordering is added
Also, if an arbitrary .Where() clause is used first, is it still guaranteed to keep the relative order, or does it sometimes use non-iterative filtering?
As Fermin commented above, this is essentially a duplicate question. I failed to pick the correct keywords when searching Stack Overflow:
Preserving order with LINQ
It depends on the underlying collection type more than anything. You could get inconsistent ordering from a HashSet, but a List is safe. Even if the ordering you want is provided implicitly, it's better to define an explicit ordering if you need it though. It looks like you're doing that judging by the method names.
The current .NET implementation uses code like the following, but there is no guarantee that this implementation will stay the same in the future.
private static IEnumerable<TResult> SelectIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, int, TResult> selector)
{
int index = -1;
foreach (TSource source1 in source)
{
checked { ++index; }
yield return selector(source1, index);
}
}
Yes, Linq Select is guaranteed to return all its results in the order of the enumeration it is passed. As with most Linq functions, what it does is fully specified. Barring handling of errors, this might as well be the code for Select:
IEnumerable<Y> Select<X, Y>(this IEnumerable<X> input, Func<X, Y> transform)
{
foreach (var x in input)
yield return transform(x);
}
But as Samantha Branham pointed out, the underlying collection might not have an intrinsic order. I've seen hashtables that rearrange themselves on read.
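For a List source, the combined Where and Select case from the question can be sketched as follows. SeatedPerson is a stand-in type made up for this example, since the question's Person class is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Person class.
public class SeatedPerson
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class OrderPreservationDemo
{
    // Where and Select both stream items in source order, so the names come
    // out in the same relative order as the matching persons in the list.
    public static List<string> NamesOfAdults(List<SeatedPerson> persons)
    {
        return persons.Where(p => p.Age >= 18).Select(p => p.Name).ToList();
    }
}
```

Filtering removes items but does not reorder the ones that remain.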

ICollection<T> is non-index based, but TakeWhile() exists

I'm trying to replace usages of T[] or List<T> as function parameters and return values with more appropriate types such as IEnumerable<T>, ICollection<T> and IList<T>.
ICollection<T>, from my understanding, is preferable to IList<T> where you only need basic/simple collection functionality (e.g. an enumerator and count functionality), as it provides this with the least restriction. From reading on here, one of the main differentiators I thought was that ICollection<T> doesn't require the underlying collection to be index based, where IList<T> does?
In switching my List<T> references over I needed to replace a List<T>.GetRange() call and I was very surprised to find the ICollection<T>.TakeWhile() extension method which has an overload supporting selection based on index?! (msdn link)
I'm confused why this method exists on ICollection when there is nothing index based on this interface. Have I misunderstood, or how can this method actually work if the underlying collection is e.g. a HashSet or something?
The method, like most of LINQ, is on IEnumerable<T>. Any features that just pass the index to the consumer (such as TakeWhile) only need to loop while incrementing a counter. Some APIs may be able to optimize using an indexer, and then it is up to them to decide whether to do that, or just use IEnumerable<T> and simply skip (etc) unwanted data.
For example:
int i = 0;
foreach(var item in source) {
if(!predicate(item, i++)) break;
yield return item;
}
Indexing can be done without the collection's support for it:
int i = -1;
foreach(var item in collection)
{
i++;
// item is at index i;
}
TakeWhile and other extension methods from System.Linq.Enumerable class work on all the types implementing IEnumerable<T>. They all iterate over the collection (using foreach statement) and perform appropriate actions.
Here is the implementation of the TakeWhile method, with some simplifications:
private static IEnumerable<TSource> TakeWhile<TSource>(IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
foreach (TSource item in source)
{
if (!predicate(item))
{
break;
}
yield return item;
}
}
As you see, it simply iterates over the collection, and evaluates the predicate. This is true for almost all other LINQ methods. The same will happen when you use any other collection, like HashSet<T>.
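As a quick check that the index-based overload needs no indexer, the sketch below can be called with a LinkedList<T>, which implements ICollection<T> but has no indexer. The class and method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; a rough GetRange(0, count) replacement.
public static class TakeWhileIndexDemo
{
    // The index passed to the predicate is just a counter kept by TakeWhile
    // while it enumerates; the collection itself is never indexed.
    public static List<string> FirstN(ICollection<string> items, int count)
    {
        return items.TakeWhile((item, index) => index < count).ToList();
    }
}
```

For a plain "first N items" case, Take(count) would express the same thing more directly; the index overload is used here only because it is the one the question asks about.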
