So I recently found myself writing a loop similar to this one:
var headers = new Dictionary<string, string>();
...
foreach (var header in headers)
{
if (String.IsNullOrEmpty(header.Value)) continue;
...
}
This works fine: it iterates through the dictionary once and does all I need it to do. However, my IDE suggests the following as a more readable / optimized alternative, but I disagree:
var headers = new Dictionary<string, string>();
...
foreach (var header in headers.Where(header => !String.IsNullOrEmpty(header.Value)))
{
...
}
But won't that iterate through the dictionary twice? Once to evaluate the .Where(...), and then once more for the foreach loop?
If not, and the second code example only iterates the dictionary once, please explain why and how.
The code with continue is about twice as fast.
I ran the following code in LINQPad, and the results consistently show that the version with continue is twice as fast.
void Main()
{
var headers = Enumerable.Range(1, 1000).ToDictionary(i => "K" + i, i => i % 2 == 0 ? null : "V" + i);
var stopwatch = new Stopwatch();
var sb = new StringBuilder();
stopwatch.Start();
foreach (var header in headers.Where(header => !String.IsNullOrEmpty(header.Value)))
sb.Append(header);
stopwatch.Stop();
Console.WriteLine("Using LINQ : " + stopwatch.ElapsedTicks);
sb.Clear();
stopwatch.Reset();
stopwatch.Start();
foreach (var header in headers)
{
if (String.IsNullOrEmpty(header.Value)) continue;
sb.Append(header);
}
stopwatch.Stop();
Console.WriteLine("Using continue : " + stopwatch.ElapsedTicks);
}
Here are some of the results I got:
Using LINQ : 1077
Using continue : 348
Using LINQ : 939
Using continue : 459
Using LINQ : 768
Using continue : 382
Using LINQ : 1256
Using continue : 457
Using LINQ : 875
Using continue : 318
In general, LINQ is always going to be slower than the foreach counterpart when working with an already evaluated IEnumerable<T>, because LINQ-to-Objects is just a high-level wrapper over these lower-level language features. The benefit of using LINQ here is not performance, but a consistent interface. LINQ absolutely does provide performance benefits, but they come into play when you are working with resources that are not already in active memory (where deferred execution lets the provider optimize the code that is actually executed). When the straightforward loop is already the optimal code, LINQ just has to go through a redundant process to call the same code you would have written anyway. To illustrate this, here is the code that is actually called when you use LINQ's Where operator on a loaded enumerable:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
if (predicate == null)
{
throw Error.ArgumentNull("predicate");
}
if (source is Iterator<TSource>)
{
return ((Iterator<TSource>) source).Where(predicate);
}
if (source is TSource[])
{
return new WhereArrayIterator<TSource>((TSource[]) source, predicate);
}
if (source is List<TSource>)
{
return new WhereListIterator<TSource>((List<TSource>) source, predicate);
}
return new WhereEnumerableIterator<TSource>(source, predicate);
}
And here is the WhereSelectEnumerableIterator<TSource, TResult> class. The predicate field is the delegate that you pass into the Where() method. You can see where it actually gets executed in the MoveNext method (along with the redundant null checks). You can also see that the enumerable is only looped through once. Stacking Where clauses results in the creation of multiple iterator classes (each wrapping its predecessor), but does not result in multiple enumeration passes (thanks to deferred execution). Keep in mind that when you write a lambda like this, you are also creating a new delegate instance, which affects performance in a minor way.
private class WhereSelectEnumerableIterator<TSource, TResult> : Enumerable.Iterator<TResult>
{
private IEnumerator<TSource> enumerator;
private Func<TSource, bool> predicate;
private Func<TSource, TResult> selector;
private IEnumerable<TSource> source;
public WhereSelectEnumerableIterator(IEnumerable<TSource> source, Func<TSource, bool> predicate, Func<TSource, TResult> selector)
{
this.source = source;
this.predicate = predicate;
this.selector = selector;
}
public override Enumerable.Iterator<TResult> Clone()
{
return new Enumerable.WhereSelectEnumerableIterator<TSource, TResult>(this.source, this.predicate, this.selector);
}
public override void Dispose()
{
if (this.enumerator != null)
{
this.enumerator.Dispose();
}
this.enumerator = null;
base.Dispose();
}
public override bool MoveNext()
{
switch (base.state)
{
case 1:
this.enumerator = this.source.GetEnumerator();
base.state = 2;
break;
case 2:
break;
default:
goto Label_007C;
}
while (this.enumerator.MoveNext())
{
TSource current = this.enumerator.Current;
if ((this.predicate == null) || this.predicate(current))
{
base.current = this.selector(current);
return true;
}
}
this.Dispose();
Label_007C:
return false;
}
public override IEnumerable<TResult2> Select<TResult2>(Func<TResult, TResult2> selector)
{
return new Enumerable.WhereSelectEnumerableIterator<TSource, TResult2>(this.source, this.predicate, Enumerable.CombineSelectors<TSource, TResult, TResult2>(this.selector, selector));
}
public override IEnumerable<TResult> Where(Func<TResult, bool> predicate)
{
return (IEnumerable<TResult>) new Enumerable.WhereEnumerableIterator<TResult>(this, predicate);
}
}
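The single-pass claim for stacked Where clauses is easy to check empirically. This is just an illustrative sketch (the Counted helper and its name are mine, not from the BCL) that counts how many elements are pulled from the source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulls = 0;

// A source that counts how many elements are pulled from it.
IEnumerable<int> Counted()
{
    for (int i = 0; i < 5; i++) { pulls++; yield return i; }
}

// Two stacked Where clauses create two wrapper iterators,
// but the underlying source is still walked exactly once.
var stacked = Counted().Where(x => x > 0).Where(x => x % 2 == 0);
var result = stacked.ToList();

Console.WriteLine(string.Join(",", result)); // 2,4
Console.WriteLine(pulls);                    // 5: one pass over the source
```

If stacking caused re-enumeration, pulls would be a multiple of the source length rather than exactly 5.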
I personally think the performance difference is completely justifiable, because LINQ code is much easier to maintain and reuse. I also do things to offset the performance issues (like declaring all my anonymous lambda delegates and expressions as static readonly fields in a common class). But in reference to your actual question, your continue clause is definitely faster than the LINQ alternative.
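As a rough sketch of the delegate-caching idea mentioned above (shown here with a local variable for brevity; in a shared class it would be a static readonly field, and all names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cache the filter delegate once instead of allocating a new lambda
// at each call site; `hasValue` is an illustrative name.
Func<KeyValuePair<string, string>, bool> hasValue =
    header => !string.IsNullOrEmpty(header.Value);

var headers = new Dictionary<string, string> { ["A"] = "1", ["B"] = null };

// Both queries reuse the same delegate instance.
var nonEmpty = headers.Where(hasValue).ToList();
int count = headers.Count(hasValue);

Console.WriteLine(nonEmpty.Count); // 1
Console.WriteLine(count);          // 1
```

Note that modern C# compilers already cache non-capturing lambdas behind the scenes, so this mainly pays off for capturing lambdas or older compilers.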
No, it won't iterate through it twice. The .Where does not actually evaluate anything by itself; the foreach pulls each element that satisfies the clause out of the Where.
Similarly, headers.Select(...) doesn't actually process anything until you put a .ToList() or something similar behind it that forces it to evaluate.
EDIT:
To explain it a bit more, as Marcus pointed out, .Where returns an iterator, so each element is iterated over and the predicate evaluated exactly once; if it matches, the element goes into the body of the loop.
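A quick way to convince yourself of both points, deferral and a single filtered pass, is to count predicate invocations; this is just an illustrative sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var headers = new Dictionary<string, string>
{
    ["A"] = "1",
    ["B"] = null,
    ["C"] = "3"
};

int calls = 0;
var filtered = headers.Where(h => { calls++; return !string.IsNullOrEmpty(h.Value); });

// Nothing has run yet: the Where call only built an iterator.
Console.WriteLine(calls); // 0

var kept = new List<string>();
foreach (var h in filtered)
    kept.Add(h.Key); // the foreach pulls items through the filter one at a time

Console.WriteLine(calls);      // 3: each entry tested exactly once
Console.WriteLine(kept.Count); // 2 non-empty entries kept
```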
I think the second example will iterate the dictionary only once.
That is because what headers.Where(...) returns is exactly an iterator, rather than a temporary collection: every time the loop advances, it applies the filter defined in Where(...), which is what makes the single pass work.
However, I am not a sophisticated C# coder, so I am not sure exactly how C# handles this situation, but I think it should behave the same way.
Related
I have a LINQ query that looks like this:
var p = option.GetType().GetProperties().Where(t => t.PropertyType == typeof(bool));
What is the most efficient way to get the items which aren't included in this query, without executing a second iteration over the list?
I could easily do this with a for loop but I was wondering if there's a shorthand with LINQ.
var p = option.GetType().GetProperties().ToLookup(t => t.PropertyType == typeof(bool));
var bools = p[true];
var notBools = p[false];
.ToLookup() is used to partition an IEnumerable based on a key function. In this case, it will return a Lookup which will have at most 2 items in it. Items in the Lookup can be accessed by key, similar to an IDictionary.
.ToLookup() is evaluated immediately and is an O(n) operation, and accessing a partition in the resulting Lookup is an O(1) operation.
Lookup is very similar to a Dictionary and has similar generic parameters (a key type and a value type). However, where Dictionary maps a key to a single value, Lookup maps a key to a set of values. A Lookup can be thought of as an IDictionary<TKey, IEnumerable<TValue>>.
.GroupBy() could also be used, but it differs from .ToLookup() in that GroupBy is lazily evaluated and could possibly be enumerated multiple times, whereas .ToLookup() is evaluated immediately and the work is only done once.
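The eager-versus-lazy difference is observable by counting key-selector calls; a small sketch (variable names are mine):

```csharp
using System;
using System.Linq;

int lookupCalls = 0, groupByCalls = 0;
var source = new[] { 1, 2, 3, 4 };

// ToLookup is eager: the key selector runs as soon as the call returns.
var lookup = source.ToLookup(x => { lookupCalls++; return x % 2 == 0; });
Console.WriteLine(lookupCalls); // 4

// GroupBy is lazy: nothing runs until the grouping is enumerated.
var groups = source.GroupBy(x => { groupByCalls++; return x % 2 == 0; });
Console.WriteLine(groupByCalls); // 0

var materialized = groups.ToList();
Console.WriteLine(groupByCalls); // 4
```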
You cannot get something that you don't ask for. So if you exclude all but bool, you can't expect to get the rest later; you need to ask for them.
For what it's worth, if you need both, the ones you want and all others, in a single query, you could GroupBy on this condition or use ToLookup, which I would prefer:
var isboolOrNotLookup = option.GetType().GetProperties()
.ToLookup(t => t.PropertyType == typeof(bool)); // use PropertyType instead
Now you can use this lookup for further processing. For example, if you want a collection of all properties which are bool:
List<System.Reflection.PropertyInfo> boolTypes = isboolOrNotLookup[true].ToList();
or just the count:
int boolCount = isboolOrNotLookup[true].Count();
So if you want to process all which are not bool:
foreach(System.Reflection.PropertyInfo prop in isboolOrNotLookup[false])
{
}
Well, you could go for source.Except(p), but it would reiterate the list and perform a lot of comparisons.
I'd say - write an extension method that does it using foreach, basically splitting the list into two destinations. Or something like this.
How about:
public class UnzipResult<T>
{
    private readonly IEnumerator<T> _enumerator;
    private readonly Func<T, bool> _filter;
    private readonly Queue<T> _nonMatching = new Queue<T>();
    private readonly Queue<T> _matching = new Queue<T>();

    public UnzipResult(IEnumerable<T> source, Func<T, bool> filter)
    {
        _enumerator = source.GetEnumerator();
        _filter = filter;
    }

    public IEnumerable<T> Matching
    {
        get
        {
            // First drain anything buffered while Rest was being walked.
            while (_matching.Count > 0)
                yield return _matching.Dequeue();
            while (_enumerator.MoveNext())
            {
                if (_filter(_enumerator.Current))
                    yield return _enumerator.Current;
                else
                    _nonMatching.Enqueue(_enumerator.Current);
            }
        }
    }

    public IEnumerable<T> Rest
    {
        get
        {
            // First drain anything buffered while Matching was being walked.
            while (_nonMatching.Count > 0)
                yield return _nonMatching.Dequeue();
            while (_enumerator.MoveNext())
            {
                if (!_filter(_enumerator.Current))
                    yield return _enumerator.Current;
                else
                    _matching.Enqueue(_enumerator.Current);
            }
        }
    }
}

public static class UnzipExtensions
{
    public static UnzipResult<T> Unzip<T>(this IEnumerable<T> source, Func<T, bool> filter)
    {
        return new UnzipResult<T>(source, filter);
    }
}
It's written in notepad, so probably doesn't compile, but my idea is: whatever collection you enumerate (matching or non-matching), you only enumerate the source once. And it should work fairly well with those pesky infinite collections (think yield return random.Next()), unless all elements do/don't fulfil filter.
To be more specific: will the Linq extension method Any(IEnumerable collection, Func predicate) stop checking all the remaining elements of the collections once the predicate has yielded true for an item?
Because I don't want to spend too much time figuring out whether I need to do the really expensive parts at all:
if(lotsOfItems.Any(x => x.ID == target.ID))
//do expensive calculation here
So if Any always checks all the items in the source, this might end up being a waste of time compared with just going with:
var candidate = lotsOfItems.FirstOrDefault(x => x.ID == target.ID)
if(candidate != null)
//do expensive calculation here
because I'm pretty sure that FirstOrDefault returns as soon as it gets a result, and only goes through the whole enumerable if it does not find a suitable entry in the collection.
Does anyone have information about the internal workings of Any, or could anyone suggest a solution for this kind of decision?
Also, a colleague suggested something along the lines of:
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.
As we can see from the source code, yes:
internal static bool Any<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
foreach (T element in source) {
if (predicate(element)) {
return true; // Attention to this line
}
}
return false;
}
Any() is the most efficient way to determine whether any element of a sequence satisfies a condition with LINQ.
also: a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure on that, so if anyone could shed some light on this as well it would be appreciated :>]
All() determines whether all elements of a sequence satisfy a condition. So, the enumeration of source is stopped as soon as the result can be determined.
Additional note:
The above is true if you are using Linq to objects. If you are using Linq to Database, then it will create a query and will execute it against database.
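The same throwing-sequence trick often used to probe Any also shows All short-circuiting in LINQ to Objects; a sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A sequence that throws if anything enumerates past its second element.
static IEnumerable<int> Source()
{
    yield return 1;
    yield return 2;
    throw new Exception("enumerated too far");
}

// All returns false at the first non-matching element (the 2),
// so the exception after it is never reached.
bool allOdd = Source().All(x => x % 2 == 1);
Console.WriteLine(allOdd); // False
```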
You could test it yourself: https://ideone.com/nIDKxr
public static IEnumerable<int> Tester()
{
yield return 1;
yield return 2;
throw new Exception();
}
static void Main(string[] args)
{
Console.WriteLine(Tester().Any(x => x == 1));
Console.WriteLine(Tester().Any(x => x == 2));
try
{
Console.WriteLine(Tester().Any(x => x == 3));
}
catch
{
Console.WriteLine("Error here");
}
}
Yes, it does :-)
also: a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure on that, so if anyone could shed some light on this as well it would be appreciated :>]
Using the same reasoning, All() could conceivably continue even after one of the elements returns false :-) But no, All() is implemented correctly too :-)
It does whatever is the quickest way of doing what it has to do.
When used on an IEnumerable this will be along the lines of:
foreach(var item in source)
if(predicate(item))
return true;
return false;
Or for the variant that doesn't take a predicate:
using(var en = source.GetEnumerator())
return en.MoveNext();
When run against a database it will be something like
SELECT EXISTS(SELECT null FROM [some table] WHERE [some where clause])
And so on. How that was executed would depend in turn on what indices were available for fulfilling the WHERE clause, so it could be a quick index lookup, a full table scan aborting on first match found, or an index lookup followed by a partial table scan aborting on first match found, depending on that.
Yet other Linq providers would have yet other implementations, but generally the people responsible will be trying to be at least reasonably efficient.
In all, you can depend upon it being at least slightly more efficient than calling FirstOrDefault, as FirstOrDefault uses similar approaches but does have to return a full object (perhaps constructing it). Likewise !All(inversePredicate) tends to be pretty much on a par with Any(predicate) as per this answer.
Single is an exception to this
Update: The following from this point on no longer applies to .NET Core, which has changed the implementation of Single.
It's important to note that in the case of linq-to objects, the overloads of Single and SingleOrDefault that take a predicate do not stop on identified failure. While the obvious approach to Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) would be something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
using(var en = source.GetEnumerator())
while(en.MoveNext())
{
var val = en.Current;
if(predicate(val))
{
while(en.MoveNext())
if(predicate(en.Current))
throw new InvalidOperationException("too many matching items");
return val;
}
}
throw new InvalidOperationException("no matching items");
}
The actual implementation is something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
var result = default(TSource);
long tally = 0;
foreach(var item in source)
if(predicate(item))
{
result = item;
checked{++tally;}
}
switch(tally)
{
case 0:
throw new InvalidOperationException("no matching items");
case 1:
return result;
default:
throw new InvalidOperationException("too many matching items");
}
}
Now, while a successful Single will have to scan everything anyway, this means that an unsuccessful Single can be much, much slower than it needs to be (and can even potentially throw an undocumented error). And if the reason for the unexpected duplicate is a bug which is duplicating items into the sequence, and hence making it far larger than it should be, then the Single that should have helped you find that problem is now dragging its way through the whole thing.
SingleOrDefault has the same issue.
This only applies to linq-to-objects, but it remains safer to do .Where(predicate).Single() rather than Single(predicate).
Any stops at the first match. All stops at the first non-match.
I don't know whether the documentation guarantees that but this behavior is now effectively fixed for all time due to compatibility reasons. It also makes sense.
Yes it stops when the predicate is satisfied once. Here is code via RedGate Reflector:
[__DynamicallyInvokable]
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
if (predicate == null)
{
throw Error.ArgumentNull("predicate");
}
foreach (TSource local in source)
{
if (predicate(local))
{
return true;
}
}
return false;
}
I am new to LINQ and would like to write some extension methods. Before doing so, I wanted to check that I would be doing it correctly. I just wanted to compare the performance of my CustomSelect extension method with the built-in Select extension method.
static void Main(string[] args)
{
List<int> list = new List<int>();
for (int i = 0; i < 10000000; i++)
list.Add(i);
DateTime now1 = DateTime.Now;
List<int> process1 = list.Select(i => i).ToList();
Console.WriteLine(DateTime.Now - now1);
DateTime now2 = DateTime.Now;
List<int> process2 = list.CustomSelect(i => i).ToList();
Console.WriteLine(DateTime.Now - now2);
}
public static IEnumerable<TResult> CustomSelect<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
foreach (TSource item in source)
{
yield return selector(item);
}
}
Timespan for built-in method: 0.18 sec
Timespan for custom method: 0.35 sec
Changing the order of processes yields the same result.
If I collect the elements in a list and return it instead of using yield return, then the timespan is nearly the same as the built-in one. But as far as I know, we should use yield return wherever possible.
So what might the code for the built-in method look like? What should my approach be?
Thanks in advance
The key difference I can see is that the inbuilt method checks for List<T> and special-cases it, exploiting the custom List<T>.Enumerator implementation, rather than IEnumerable<T> / IEnumerator<T>. You can do that special-case yourself:
public static IEnumerable<TResult> CustomSelect<TSource, TResult>(
this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
if (source is List<TSource>)
return CustomSelectList((List<TSource>)source, selector);
return CustomSelectDefault(source, selector);
}
private static IEnumerable<TResult> CustomSelectList<TSource, TResult>(
List<TSource> source, Func<TSource, TResult> selector)
{
foreach (TSource item in source)
{
yield return selector(item);
}
}
private static IEnumerable<TResult> CustomSelectDefault<TSource, TResult>(
IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
foreach (TSource item in source)
{
yield return selector(item);
}
}
You could take that a stage further by hand-rolling the entire iterator (which is what WhereSelectListIterator<TSource, TResult> does), but the above is probably close enough.
The inbuilt implementation also special-cases arrays, and handles various forms of composed queries.
There are a lot of things wrong with your performance test, which makes it inconclusive; you should look into best practices for benchmarking code in .NET. Use Stopwatch instead of DateTime.Now, use many repetitions of each approach instead of one shot at each, and make sure you're not getting hindered by the GC (.ToList() is going to mess up your measurements quite a bit).
yield return should not be used because it's faster; the idea is that it's easy to write and it's lazy. If I did Take(10) on the yield return variant, only 10 elements would ever be produced. The return variant, on the other hand, will produce the whole list, return it, and then reduce it to 10 elements.
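That difference is easy to see by counting how much work each variant does before Take(10) is satisfied; a sketch with illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int lazyWork = 0, eagerWork = 0;

// Lazy: produces elements one at a time, only as they are requested.
IEnumerable<int> Lazy(IEnumerable<int> src)
{
    foreach (var x in src) { lazyWork++; yield return x; }
}

// Eager: builds the whole list before returning anything.
IEnumerable<int> Eager(IEnumerable<int> src)
{
    var result = new List<int>();
    foreach (var x in src) { eagerWork++; result.Add(x); }
    return result;
}

var source = Enumerable.Range(0, 1000);
var a = Lazy(source).Take(10).ToList();   // only 10 elements ever produced
var b = Eager(source).Take(10).ToList();  // all 1000 produced, 990 discarded

Console.WriteLine(lazyWork);  // 10
Console.WriteLine(eagerWork); // 1000
```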
In effect, you're taking pretty much the simplest case, where there's very little reason to use Select at all (apart from clarity). Enumerables are made to handle far crazier stuff, and the LINQ methods let you do it in an easy-to-understand and concise manner, exposing an interface familiar to functional programmers. That often means you could get more performance by rewriting the whole thing in a less general way; the point is that you should really only do that if necessary. If this is not a performance bottleneck of your application (and it rarely will be), the cleaner, easier-to-extend code is the better option.
The title pretty much sums up the question. I don't have any problems with it, I'm just curious about reasons behind that design choice.
My guess? Simplicity, and compatibility across different providers.
Contrary to some other answers, this has nothing to do with deferred execution, which is an important concept but irrelevant to the issue. For example, I could write the following completely valid method:
public static IEnumerable<T> NotBuffered<T>(this IEnumerable<T> input)
{
return (IEnumerable<T>)input.ToList(); //not deferred
}
Alternatively, I could expose a WhereEnumerable that works just like an IEnumerable but has the following properties:
WhereEnumerable data = source.Where(x=> x.Name == "Cheese"); //still deferred
print(data.First());
print(data.skipped); //Number of items that failed the test.
print(data.returned); //Number of items that passed the test.
And this could conceivably be useful, as demonstrated, and easy to implement in the basic LinqToObjects implementation. However, it might be considerably harder, or even impossible, to implement the same functionality in the LinqToSQL or LinqToMongo or LinqToOpenCL providers. This would risk making code less portable between implementations, and increase the implementors' complexity.
For example, MongoDB runs the query on the server (in a specialized query language) and does not make these stats available to the user. Furthermore, with concepts such as indexes, these stats could be meaningless: users.Where(user => user.ID == "{ID}").First() on an index might 'skip' 0 records before finding the result, even if it's at position 100,412 in the index, or 40,231 on the disk, or on index node 431. That's a 'simple' problem...
Lastly, you can always write your own LINQ methods that return your own custom types with this functionality, or overloads that output a 'stats' object or similar. For a hypothetical example of the latter:
var stats = new WhereStats();
WhereEnumerable data = source.Where(x=> x.Name == "Cheese", stats);
print(data.First());
print(stats.skipped); //Number of items that failed the test.
print(stats.returned); //Number of items that passed the test.
Edit: Example of a typed where (Proof of Concept Only):
using System;
using System.Collections.Generic;
using System.Linq;
namespace TypedWhereExample
{
class Program
{
static void Main(string[] args)
{
var data = Enumerable.Range(0, 1000);
var typedWhere1 = data.TypedWhere(x => x % 2 == 0);
var typedWhere2 = typedWhere1.TypedWhere(x => x % 3 == 0);
var result = typedWhere2.Take(10).ToList(); //Works like usual Linq
//But returns additional data
Console.WriteLine("Result: " + string.Join(",", result));
Console.WriteLine("Typed Where 1 Skipped: " + typedWhere1.Skipped);
Console.WriteLine("Typed Where 1 Returned: " + typedWhere1.Returned);
Console.WriteLine("Typed Where 2 Skipped: " + typedWhere2.Skipped);
Console.WriteLine("Typed Where 2 Returned: " + typedWhere2.Returned);
Console.ReadLine();
//Result: 0,6,12,18,24,30,36,42,48,54
//Typed Where 1 Skipped: 27
//Typed Where 1 Returned: 28
//Typed Where 2 Skipped: 18
//Typed Where 2 Returned: 10
}
}
public static class MyLINQ
{
public static TypedWhereEnumerable<T> TypedWhere<T>
(this IEnumerable<T> source, Func<T, bool> filter)
{
return new TypedWhereEnumerable<T>(source, filter);
}
}
public class TypedWhereEnumerable<T> : IEnumerable<T>
{
IEnumerable<T> source;
Func<T, bool> filter;
public int Skipped { get; private set; }
public int Returned { get; private set; }
public TypedWhereEnumerable(IEnumerable<T> source, Func<T, bool> filter)
{
this.source = source;
this.filter = filter;
}
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
foreach (var o in source)
if (filter(o)) { Returned++; yield return o; }
else Skipped++;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
foreach (var o in source)
if (filter(o)) { Returned++; yield return o; }
else Skipped++;
}
}
}
Just to make sure I understand your question correctly, I'll use an example:
Take this method:
public static IEnumerable<TSource> Where<TSource>
(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
I assume your question is: why does it return IEnumerable<TSource> and not, for example, an Enumerable.WhereEnumerableIterator<TSource> ?
Note that the type above is the actual runtime type of the object returned, but the method simply declares it to be IEnumerable<TSource>.
The answer is that there would be virtually no benefit in doing otherwise, but there would be a non-zero cost. When the cost is higher than the benefit, don't do it.
Why is there no benefit?
First of all, because there would be many WhereEnumerableIterator<TSource> objects around that are still statically typed as IEnumerable<TSource>. As a result, method overloading would not work, and just as today, the Where method would have to try to cast its input to a WhereEnumerableIterator<TSource> if it wants to optimize the .Where(...).Where(...) sequence. There are several reasons why this is the case, one of them being this pattern:
IEnumerable<whatever> q = source;
if (!string.IsNullOrEmpty(searchText))
{
q = q.Where(item => item.Name.Contains(searchText));
}
if (startDate.HasValue)
{
// What is the runtime type of q? And what is its compiletime type?
q = q.Where(item => item.Date > startDate.Value);
}
The non-zero cost is composed of maintenance and documentation cost (you make it more difficult to change your implementation if you expose it, and you have to document it), and increased complexity for the user.
IEnumerables only yield a result upon enumeration. Say you have a query such as:
someEnumerable
.Select(a => new b(a))
.Where(b => b.someProp > 10)
.Select(b => new c(b));
If the return value of the LINQ steps were something with eagerly evaluated contents, such as a List<T>, then each step would have to fully evaluate in order to pass into the next. With large inputs, this could mean a noticeable wait/lag while performing that step.
LINQ queries return a lazily evaluated IEnumerable<T>. The query is performed upon enumeration. Even if your source IEnumerable<T> had millions of records the above query would be instantaneous.
Edit: Think of LINQ queries as creating a pipeline for your result rather than imperatively creating the result. Enumeration is essentially opening the resulting pipeline and seeing what comes out.
Additionally, IEnumerable<T> is the most general form of a sequence in .NET, e.g.
IList<T> :> ICollection<T> :> IEnumerable<T>
This gives you the most flexible interface available.
What would be the most readable way to apply the following to a sequence using linq:
TakeWhile elements are valid but always at least the first element
EDIT: I have updated the title, to be more precise. I'm sorry for any confusion, the answers below have definitely taught me something!
The expected behavior is this: Take while element are valid. If the result is an empty sequence, take the first element anyway.
I think this makes the intention quite clear:
things.TakeWhile(x => x.Whatever).DefaultIfEmpty(things.First());
My earlier, more verbose solution:
var query = things.TakeWhile(x => x.Whatever);
if (!query.Any()) { query = things.Take(1); }
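Exercised on sample data (the helper wrapper and its name are mine, just to make the two-step version runnable):

```csharp
using System;
using System.Linq;

// Wraps the verbose two-step approach: take while valid,
// but fall back to the first element if nothing qualified.
static int[] TakeWhileOrFirst(int[] things, Func<int, bool> valid)
{
    var query = things.TakeWhile(valid);
    if (!query.Any()) { query = things.Take(1); }
    return query.ToArray();
}

var someValid = TakeWhileOrFirst(new[] { 2, 4, 5, 6 }, x => x % 2 == 0);
Console.WriteLine(string.Join(",", someValid)); // 2,4

var noneValid = TakeWhileOrFirst(new[] { 5, 2, 4 }, x => x % 2 == 0);
Console.WriteLine(string.Join(",", noneValid)); // 5: first element kept anyway
```

Note that the deferred query is enumerated once by Any() and again by ToArray(), which is fine for in-memory data but worth knowing about.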
The following works*, and seems pretty well readable to me:
seq.Take(1).Concat(seq.TakeWhile(condition).Skip(1));
There may be a better way, not sure.
*with thanks to #Jeff M for the correction
As far as I can tell, it would be most efficient to implement this manually, to ensure that the source isn't enumerated more than necessary.
public static IEnumerable<TSource> TakeWhileOrFirst<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
using (var enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
yield break;
TSource current = enumerator.Current;
yield return current;
if (predicate(current))
{
while (enumerator.MoveNext() && predicate(current = enumerator.Current))
yield return current;
}
}
}
And for the sake of completion, an overload that includes the index:
public static IEnumerable<TSource> TakeWhileOrFirst<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate)
{
using (var enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
yield break;
TSource current = enumerator.Current;
int index = 0;
yield return current;
if (predicate(current, index++))
{
while (enumerator.MoveNext() && predicate(current = enumerator.Current, index++))
yield return current;
}
}
}
DISCLAIMER:
This is a variation of Jeff M's nice answer and as such is only meant to show the code using do-while instead. It's only provided as an extension to Jeff's answer.
public static IEnumerable<TSource> TakeWhileOrFirst<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
using (var enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
yield break;
var current = enumerator.Current;
do {
yield return current;
} while (predicate(current) &&
enumerator.MoveNext() &&
predicate(current = enumerator.Current));
}
}
Of course it's a matter of style. I personally like to keep the nesting level of my conditional logic as low as possible, but the double use of predicate might be hard to grasp and can be a slight performance hog (depending on optimization and branch prediction).