Does Any() stop on success? - C#

To be more specific: will the LINQ extension method Any(IEnumerable<T> source, Func<T, bool> predicate) stop checking the remaining elements of the collection once the predicate has yielded true for an item?
I ask because I don't want to spend too much time figuring out whether I need to do the really expensive part at all:
if (lotsOfItems.Any(x => x.ID == target.ID))
    //do expensive calculation here
So if Any always checks every item in the source, this might end up being a waste of time compared to just going with:
var candidate = lotsOfItems.FirstOrDefault(x => x.ID == target.ID);
if (candidate != null)
    //do expensive calculation here
because I'm pretty sure that FirstOrDefault returns as soon as it finds a result and only walks the whole enumerable if it does not find a suitable entry in the collection.
Does anyone have information about the internal workings of Any, or could anyone suggest how to approach this kind of decision?
Also, a colleague suggested something along the lines of:
if (!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.

As we see from the source code, Yes:
internal static bool Any<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
    foreach (T element in source) {
        if (predicate(element)) {
            return true; // Attention to this line
        }
    }
    return false;
}
Any() is the most efficient way to determine whether any element of a sequence satisfies a condition with LINQ.
also: a colleague suggested something along the lines of
if (!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time but I'm not sure on that, so if anyone could shed some light on this as well it would be appreciated :>]
All() determines whether all elements of a sequence satisfy a condition. So, the enumeration of source is stopped as soon as the result can be determined.
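For comparison, the reference-source implementation of All() follows the same pattern, bailing out at the first element that fails the predicate (a sketch, with the null checks omitted):
public static bool All<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
    foreach (TSource element in source) {
        if (!predicate(element)) {
            return false; // stops at the first non-match
        }
    }
    return true;
}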
Additional note:
The above is true if you are using LINQ to Objects. If you are using a database-backed LINQ provider, the call is translated into a query and executed against the database.

You could test it yourself: https://ideone.com/nIDKxr
public static IEnumerable<int> Tester()
{
    yield return 1;
    yield return 2;
    throw new Exception();
}

static void Main(string[] args)
{
    Console.WriteLine(Tester().Any(x => x == 1));
    Console.WriteLine(Tester().Any(x => x == 2));
    try
    {
        Console.WriteLine(Tester().Any(x => x == 3));
    }
    catch
    {
        Console.WriteLine("Error here");
    }
}
Yes, it does :-)
also: a colleague suggested something along the lines of
if (!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time but I'm not sure on that, so if anyone could shed some light on this as well it would be appreciated :>]
By the same reasoning you might suspect that All() keeps going even after one element returns false :-) No, All() is implemented correctly too and stops at the first non-match :-)
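The same Tester() trick can be used to check All(); a small sketch of what that would look like:
Console.WriteLine(!Tester().All(x => x != 1)); // True - stops at the first element, the exception is never reached
Console.WriteLine(!Tester().All(x => x != 2)); // True - stops at the second element, the exception is never reached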

It does whatever is the quickest way of doing what it has to do.
When used on an IEnumerable this will be along the lines of:
foreach(var item in source)
    if(predicate(item))
        return true;
return false;
Or for the variant that doesn't take a predicate:
using(var en = source.GetEnumerator())
    return en.MoveNext();
When run against a database it will be something like:
SELECT EXISTS(SELECT null FROM [some table] WHERE [some where clause])
And so on. How that is executed would depend in turn on what indices are available for fulfilling the WHERE clause, so it could be a quick index lookup, a full table scan aborting on the first match found, or an index lookup followed by a partial table scan aborting on the first match found, depending on the database.
Yet other LINQ providers have yet other implementations, but generally the people responsible will have tried to be at least reasonably efficient.
In all, you can depend upon it being at least slightly more efficient than calling FirstOrDefault, as FirstOrDefault uses similar approaches but does have to return a full object (perhaps constructing it). Likewise !All(inversePredicate) tends to be pretty much on a par with Any(predicate) as per this answer.
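In other words, for the scenario in the question the two forms below are logically equivalent (De Morgan) and both stop at the first element whose ID matches; this is just a sketch reusing the lotsOfItems and target names from the question:
bool viaAny = lotsOfItems.Any(x => x.ID == target.ID);
bool viaAll = !lotsOfItems.All(x => x.ID != target.ID); // same result, same short-circuiting behaviour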
Single is an exception to this
Update: The following no longer applies to .NET Core, which has changed the implementation of Single.
It's important to note that in the case of LINQ to Objects, the overloads of Single and SingleOrDefault that take a predicate do not stop as soon as failure is identified. While the obvious approach to Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) would be something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    /* do null checks */
    using(var en = source.GetEnumerator())
        while(en.MoveNext())
        {
            var val = en.Current;
            if(predicate(val))
            {
                while(en.MoveNext())
                    if(predicate(en.Current))
                        throw new InvalidOperationException("too many matching items");
                return val;
            }
        }
    throw new InvalidOperationException("no matching items");
}
The actual implementation is something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    /* do null checks */
    var result = default(TSource);
    long tally = 0;
    foreach(var item in source)
        if(predicate(item))
        {
            result = item;
            checked{++tally;}
        }
    switch(tally)
    {
        case 0:
            throw new InvalidOperationException("no matching items");
        case 1:
            return result;
        default:
            throw new InvalidOperationException("too many matching items");
    }
}
Now, while a successful Single has to scan everything anyway, this means an unsuccessful Single can be much, much slower than it needs to be (and can even potentially throw an undocumented error). And if the reason for the unexpected duplicate is a bug that is duplicating items into the sequence, and hence making it far larger than it should be, then the Single that should have helped you find that problem is now dragging its way through the whole thing.
SingleOrDefault has the same issue.
This only applies to LINQ to Objects, but it remains safer to do .Where(predicate).Single() rather than Single(predicate).
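As a short illustration of that advice, again reusing the lotsOfItems and target names from the question (a sketch, not the library code):
// Stops as soon as a second match is seen, instead of scanning the rest of the sequence:
var matchViaWhere = lotsOfItems.Where(x => x.ID == target.ID).Single();

// On the older LINQ to Objects implementation this form keeps scanning (and counting) to the end:
var matchViaSingle = lotsOfItems.Single(x => x.ID == target.ID);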

Any stops at the first match. All stops at the first non-match.
I don't know whether the documentation guarantees that but this behavior is now effectively fixed for all time due to compatibility reasons. It also makes sense.
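A quick illustration of both rules (a sketch with made-up numbers):
var numbers = new[] { 1, 2, 3, 4, 5 };
bool any = numbers.Any(n => n >= 3); // true  - stops after testing 1, 2, 3
bool all = numbers.All(n => n < 3);  // false - also stops after testing 1, 2, 3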

Yes, it stops as soon as the predicate is satisfied. Here is the code as seen via Red Gate Reflector:
[__DynamicallyInvokable]
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    if (predicate == null)
    {
        throw Error.ArgumentNull("predicate");
    }
    foreach (TSource local in source)
    {
        if (predicate(local))
        {
            return true;
        }
    }
    return false;
}

Related

Looking for alternative LINQ expression(s)

I'm working on a code generator that validates objects based on certain business rules. As an example, I'm curious to find out the various ways the logic below can be written as a LINQ expression.
The assertion should evaluate to true when the collection is null OR when the count of "TrueAndCorrect" items is anything but 1. One possible solution is:
bool assertion = report.DeclarationOfTrusteeCollection == null
    || report.DeclarationOfTrusteeCollection.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Are there other ways this LINQ could be expressed, perhaps more compactly, using Any, inverting the operators, or anything else?
The original code is:
bool assertion =
    report.DeclarationOfTrusteeCollection == null ||
    report.DeclarationOfTrusteeCollection.Count(
        f => f.FTER99.Equals("TrueAndCorrect")) != 1;
There are some problems here.
First, the intention of the null check seems to be "a null collection has the same semantics as an empty collection". This is a worst-practice in C#. Never do this! If you want to represent an empty collection, make an empty collection. There's even an Enumerable.Empty helper method for you.
So, start with that; the code should be:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
or
Debug.Assert(report.DeclarationOfTrusteeCollection != null);
if the condition is impossible.
That leaves us with
bool assertion =
    report.DeclarationOfTrusteeCollection.Count(
        f => f.FTER99.Equals("TrueAndCorrect")) != 1;
This is bad. Suppose I show you a jar that contains some number of pennies and I ask you "is there exactly one penny in the jar?" How many pennies do you have to count before you know the answer? Your code here is counting all of them, but you could stop after two.
Enumerable gives you a method which throws if a sequence is not a singleton, but no method that tests it. Fortunately it is easy to write. The best practice here is to write a helper method that has the exact semantics you want:
static class Extensions
{
    public static bool IsSingleton<T>(this IEnumerable<T> items)
    {
        bool seenOne = false;
        foreach(T item in items)
        {
            if (seenOne) return false;
            seenOne = true;
        }
        return seenOne;
    }

    public static bool IsSingleton<T>(
        this IEnumerable<T> items, Func<T, bool> predicate) =>
        items.Where(predicate).IsSingleton();
}
Done. And now your code is:
if (report.DeclarationOfTrusteeCollection == null)
    throw some appropriate exception
bool assertion =
    report.DeclarationOfTrusteeCollection.IsSingleton(f => ...);
Write the code so that it reads like what it is logically doing. That's the beauty and power of LINQ sequence operators.
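A few illustrative calls of the IsSingleton helper above (values made up):
Console.WriteLine(new int[0].IsSingleton());                  // False
Console.WriteLine(new[] { 42 }.IsSingleton());                // True
Console.WriteLine(new[] { 1, 2, 3 }.IsSingleton());           // False - returns as soon as the second item is seen
Console.WriteLine(new[] { 1, 2, 3 }.IsSingleton(x => x > 2)); // True - only 3 matches the predicate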
You could use the null-propagation operator:
bool assertion = report.DeclarationOfTrusteeCollection?.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Since null is not 1 this is also true if the collection is null.
It would be nicer if you didn't have to count the whole collection; you already know the answer as soon as there is more than one matching element. But I don't know of a built-in method for that. You could write your own extension:
public static class MyExtensions
{
    public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) return true;
        bool found = false;
        foreach (T element in source)
        {
            if (!predicate(element)) continue;
            if (found) return true; // this is the second match!
            found = true;
        }
        return !found; // false if exactly one match was found, true if there was none
    }
}
And use it:
bool assertion = report.DeclarationOfTrusteeCollection.IsNullOrHasNotExactlyOneMatching(f => f.FTER99.Equals("TrueAndCorrect"));
As mentioned by Rawling you could shorten the extension using Take():
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    return source?.Where(predicate).Take(2).Count() != 1;
}
or do this directly:
bool assertion = report.DeclarationOfTrusteeCollection?.Where(f => f.FTER99.Equals("TrueAndCorrect"))
    .Take(2).Count() != 1;
Both versions only iterate until a second match is found (or to the end if there are fewer than two matches).

C# SkipWhile leaks memory if predicate is false

// C:\logs\AzureSDK.log is a ~2.5 GB file
IEnumerable<string> lines = File.ReadLines(@"C:\logs\AzureSDK.log").SkipWhile(line => false);
Console.WriteLine(string.Join("\n", lines));
return;
This clearly does not return an iterator and allocates memory internally until I get an OOM. Returning true in the SkipWhile predicate does not lead to this and completes as expected (a couple of MB of memory usage during execution).
As per the docs, the method signature and common sense, SkipWhile must return an iterator and not load all the data into memory.
Machine info
Microsoft Windows [Version 10.0.14393]
Target 4.5.2, AnyCPU, Release
VS 2015 Update 3
NET 4.6.01586
Thoughts? I must be doing something stupid, but I'm unsure what.
UPD: well, the stupid thing was the string.Join I forgot about, which appends everything to a single StringBuilder, loading all the lines into memory.
I also checked the SkipWhile source and it's obviously perfectly fine:
public static IEnumerable<TSource> SkipWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    return SkipWhileIterator<TSource>(source, predicate);
}

static IEnumerable<TSource> SkipWhileIterator<TSource>(IEnumerable<TSource> source, Func<TSource, bool> predicate) {
    bool yielding = false;
    foreach (TSource element in source) {
        if (!yielding && !predicate(element)) yielding = true;
        if (yielding) yield return element;
    }
}
SkipWhile does return an enumerator. But then you use string.Join to concatenate everything, and therefore end up loading the whole file into memory.
If you change your code to process each line independently, you'll see that you use much less memory:
foreach (var line in File.ReadLines(@"C:\logs\AzureSDK.log").SkipWhile(_ => false))
{
    Console.WriteLine(line);
}
Your error is not in the SkipWhile; when you pass true in, it skips every line, so the join gets no results at all.
string.Join is causing the out-of-memory exception because it's trying to allocate a string that is 2.5 GB in length.
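For reference, SkipWhile only skips the leading run of elements for which the predicate is true; a small sketch of its semantics:
var items = new[] { 1, 2, 3, 1 };
var a = items.SkipWhile(x => x < 3);   // yields 3, 1 - skips only the leading run where the predicate is true
var b = items.SkipWhile(x => true);    // yields nothing - every element is skipped
var c = items.SkipWhile(x => false);   // yields 1, 2, 3, 1 - nothing is skipped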

Is this a bug in ReSharper?

I have some ReSharper squiggles here, and they tell me that I have a possible multiple enumeration of an IEnumerable going on. However, this is not true: final is explicitly declared as a list (List<Point2D>) and pointTangents is declared previously as a List<PointVector2D>.
Any idea why ReSharper might be telling me this?
Edit: experiments to see if I can replicate this with simpler code.
As you can see below, there are no squiggles and no warnings even though Bar is declared to take an IEnumerable as its argument.
Looks a lot like RSRP-429474 False-positive warning for possible multiple enumeration :
I have this code:
List<string> duplicateLabelsList = allResourcesLookup.SelectMany(x => x).Select(x => x.LoaderOptions.Label).Duplicates<string, string>().ToList();
if (duplicateLabelsList.Any())
    throw new DuplicateResourceLoaderLabelsException(duplicateLabelsList);
For both usages of duplicateLabelsList, I'm being warned about
possible multiple enumeration, despite the fact I've called ToList and
therefore there should be no multiple enumeration.
which (currently) has a Fix Version of 9.2, which (currently) isn't yet released.
The extension method public static TSource Last<TSource>(this IEnumerable<TSource> source); is defined for the type IEnumerable<TSource>.
If one looks at the implementation of Last<TSource>:
public static TSource Last<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) throw Error.ArgumentNull("source");
    IList<TSource> list = source as IList<TSource>;
    if (list != null)
    {
        int count = list.Count;
        if (count > 0) return list[count - 1];
    }
    else
    {
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            if (e.MoveNext())
            {
                TSource result;
                do
                {
                    result = e.Current;
                } while (e.MoveNext());
                return result;
            }
        }
    }
    throw Error.NoElements();
}
It is clear that if source implements IList<TSource> then the source is not enumerated, and therefore your assumption that this is a "bug" in ReSharper is correct.
I'd consider it more of a false positive, probably due to the fact that ReSharper has no general way of knowing that Last()'s implementation avoids unnecessary enumeration. It is probably deciding to flag the potential multiple enumeration simply because Last<TSource> is defined on IEnumerable<T>.

How do I verify a collection of values is unique (contains no duplicates) in C#

Surely there is an easy way to verify that a collection of values has no duplicates [using the default comparer of the collection's type] in C#/.NET? It doesn't have to be directly built in, but it should be short and efficient.
I've looked a lot but I keep hitting examples of using collection.Count() == collection.Distinct().Count() which for me is inefficient. I'm not interested in the result and want to bail out as soon as I detect a duplicate, should that be the case.
(I'd love to delete this question and/or its answer if someone can point out the duplicates)
Okay, if you just want to get out as soon as a duplicate is found, it's simple:
// TODO: add an overload taking an IEqualityComparer<T>
public static bool AllUnique<T>(this IEnumerable<T> source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    var distinctItems = new HashSet<T>();
    foreach (var item in source)
    {
        if (!distinctItems.Add(item))
        {
            return false;
        }
    }
    return true;
}
... or use All, as you've already shown. I'd argue that this is slightly simpler to understand in this case... or if you do want to use All, I'd at least separate the creation of the set from the method group conversion, for clarity:
public static bool IsUnique<T>(this IEnumerable<T> source)
{
    // TODO: validation
    var distinctItems = new HashSet<T>();
    // Add will return false if the element already exists. If
    // every element is actually added, then they must all be unique.
    return source.All(distinctItems.Add);
}
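A quick usage sketch (values made up) showing the early bail-out:
var unique = new[] { 1, 2, 3 };
var duplicated = new[] { 1, 2, 2, 3 };
Console.WriteLine(unique.IsUnique());     // True
Console.WriteLine(duplicated.IsUnique()); // False - All stops at the second 2, the 3 is never examined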
Doing it inline, you can replace:
collection.Count() == collection.Distinct().Count()
with
collection.All( new HashSet<T>().Add );
(where T is the type of your collection's elements)
Or you can extract the above to a helper extension method[1] so you can say:
collection.IsUnique()
[1]
static class EnumerableUniquenessExtensions
{
    public static bool IsUnique<T>(this IEnumerable<T> that)
    {
        return that.All( new HashSet<T>().Add );
    }
}
(and as Jon has pointed out in his answer, one really should separate and comment the two lines as such 'cuteness' is generally Not A Good Idea)

Linq + foreach loop optimization

So I recently found myself writing a loop similar to this one:
var headers = new Dictionary<string, string>();
...
foreach (var header in headers)
{
    if (String.IsNullOrEmpty(header.Value)) continue;
    ...
}
Which works fine: it iterates through the dictionary once and does all I need it to do. However, my IDE is suggesting the following as a more readable / optimized alternative, but I disagree:
var headers = new Dictionary<string, string>();
...
foreach (var header in headers.Where(header => !String.IsNullOrEmpty(header.Value)))
{
    ...
}
But won't that iterate through the dictionary twice? Once to evaluate the .Where(...) and then once for the foreach loop?
If not, and the second code example only iterates the dictionary once, please explain why and how.
The code with continue is about twice as fast.
I ran the following code in LINQPad, and the results consistently say that the clause with continue is twice as fast.
void Main()
{
    var headers = Enumerable.Range(1, 1000).ToDictionary(i => "K" + i, i => i % 2 == 0 ? null : "V" + i);
    var stopwatch = new Stopwatch();
    var sb = new StringBuilder();

    stopwatch.Start();
    foreach (var header in headers.Where(header => !String.IsNullOrEmpty(header.Value)))
        sb.Append(header);
    stopwatch.Stop();
    Console.WriteLine("Using LINQ : " + stopwatch.ElapsedTicks);

    sb.Clear();
    stopwatch.Reset();

    stopwatch.Start();
    foreach (var header in headers)
    {
        if (String.IsNullOrEmpty(header.Value)) continue;
        sb.Append(header);
    }
    stopwatch.Stop();
    Console.WriteLine("Using continue : " + stopwatch.ElapsedTicks);
}
Here are some of the results I got
Using LINQ : 1077
Using continue : 348
Using LINQ : 939
Using continue : 459
Using LINQ : 768
Using continue : 382
Using LINQ : 1256
Using continue : 457
Using LINQ : 875
Using continue : 318
In general, LINQ is always going to be slower when working with an already evaluated IEnumerable<T> than the foreach counterpart. The reason is that LINQ to Objects is just a high-level wrapper over these lower-level language features. The benefit of using LINQ here is not performance, but the provision of a consistent interface.

LINQ absolutely does provide performance benefits, but they come into play when you are working with resources that are not already in active memory (and allow you to leverage the ability to optimize the code that is actually executed). When the alternative code is already the most optimal alternative, LINQ just has to go through a redundant process to call the same code you would have written anyway.

To illustrate this, below is the code that is actually called when you use LINQ's Where operator on a loaded enumerable:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    if (predicate == null)
    {
        throw Error.ArgumentNull("predicate");
    }
    if (source is Iterator<TSource>)
    {
        return ((Iterator<TSource>) source).Where(predicate);
    }
    if (source is TSource[])
    {
        return new WhereArrayIterator<TSource>((TSource[]) source, predicate);
    }
    if (source is List<TSource>)
    {
        return new WhereListIterator<TSource>((List<TSource>) source, predicate);
    }
    return new WhereEnumerableIterator<TSource>(source, predicate);
}
And here is the WhereSelectEnumerableIterator<TSource,TResult> class. The predicate field is the delegate that you pass into the Where() method. You can see where it actually gets executed in the MoveNext method (as well as all the redundant null checks), and you can also see that the enumerable is only looped through once. Stacking Where clauses will result in the creation of multiple iterator classes (each wrapping its predecessor), but will not result in multiple enumeration passes (due to deferred execution). Keep in mind that when you write a lambda like this, you are also creating a new delegate instance (which also affects your performance in a minor way).
private class WhereSelectEnumerableIterator<TSource, TResult> : Enumerable.Iterator<TResult>
{
    private IEnumerator<TSource> enumerator;
    private Func<TSource, bool> predicate;
    private Func<TSource, TResult> selector;
    private IEnumerable<TSource> source;

    public WhereSelectEnumerableIterator(IEnumerable<TSource> source, Func<TSource, bool> predicate, Func<TSource, TResult> selector)
    {
        this.source = source;
        this.predicate = predicate;
        this.selector = selector;
    }

    public override Enumerable.Iterator<TResult> Clone()
    {
        return new Enumerable.WhereSelectEnumerableIterator<TSource, TResult>(this.source, this.predicate, this.selector);
    }

    public override void Dispose()
    {
        if (this.enumerator != null)
        {
            this.enumerator.Dispose();
        }
        this.enumerator = null;
        base.Dispose();
    }

    public override bool MoveNext()
    {
        switch (base.state)
        {
            case 1:
                this.enumerator = this.source.GetEnumerator();
                base.state = 2;
                break;
            case 2:
                break;
            default:
                goto Label_007C;
        }
        while (this.enumerator.MoveNext())
        {
            TSource current = this.enumerator.Current;
            if ((this.predicate == null) || this.predicate(current))
            {
                base.current = this.selector(current);
                return true;
            }
        }
        this.Dispose();
    Label_007C:
        return false;
    }

    public override IEnumerable<TResult2> Select<TResult2>(Func<TResult, TResult2> selector)
    {
        return new Enumerable.WhereSelectEnumerableIterator<TSource, TResult2>(this.source, this.predicate, Enumerable.CombineSelectors<TSource, TResult, TResult2>(this.selector, selector));
    }

    public override IEnumerable<TResult> Where(Func<TResult, bool> predicate)
    {
        return (IEnumerable<TResult>) new Enumerable.WhereEnumerableIterator<TResult>(this, predicate);
    }
}
I personally think the performance difference is completely justifiable, because LINQ code is much easier to maintain and reuse. I also do things to offset the performance issues (like declaring all my anonymous lambda delegates and expressions as static readonly fields in a common class). But in reference to your actual question, your continue clause is definitely faster than the LINQ alternative.
No, it won't iterate through it twice. The .Where does not actually evaluate anything by itself; the foreach pulls each element that satisfies the clause out of the Where as it goes.
Similarly, a headers.Select(...) doesn't actually process anything until you put a .ToList() or something similar behind it that forces it to evaluate.
EDIT:
To explain it a bit more, as Marcus pointed out, the .Where returns an iterator, so each element is iterated over and the predicate is evaluated once; if it matches, the element goes into the body of the loop.
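A small sketch of that deferred, single-pass behaviour (the logging Select is just for illustration):
var source = new[] { 1, 2, 3, 4 }
    .Select(i => { Console.WriteLine("visited " + i); return i; });

var filtered = source.Where(i => i % 2 == 0); // nothing printed yet - no iteration has happened

foreach (var i in filtered)                   // each "visited" line is printed exactly once
    Console.WriteLine("kept " + i);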
I think the second example will only iterate the dictionary once, because what headers.Where(...) returns is an iterator rather than a temporary collection: every time the loop iterates, it applies the filter defined in Where(...), which is what makes the single pass work.
However, I am not a sophisticated C# coder and I am not sure exactly how C# handles this situation, but I think things should work that way.
