var strs = new Collection<string>();
bool b = strs.All(str => str == "ABC");
The code creates an empty collection of string, then tries to determine if all the elements in the collection are "ABC".
If you run it, b will be true.
But the collection doesn't even have any elements in it, let alone any elements equal to "ABC".
Is this a bug, or is there a reasonable explanation?
It's certainly not a bug. It's behaving exactly as documented:
true if every element of the source sequence passes the test in the specified predicate, or if the sequence is empty; otherwise, false.
Now you can argue about whether or not it should work that way (it seems fine to me; every element of the sequence conforms to the predicate), but the very first thing to check before you ask whether something is a bug is the documentation. (It's the first thing to check as soon as a method behaves in a way other than what you expected.)
All requires the predicate to be true for all elements of the sequence. This is explicitly stated in the documentation. It's also the only thing that makes sense if you think of All as being like a logical "and" between the predicate's results for each element. The true you're getting out for the empty sequence is the identity element of the "and" operation. Likewise, the false you get from Any for the empty sequence is the identity for logical "or".
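These identities can be checked directly; a minimal sketch using only standard LINQ (nothing project-specific assumed):

```csharp
using System;
using System.Linq;

class EmptySequenceDemo
{
    static void Main()
    {
        var empty = Enumerable.Empty<string>();

        // All over an empty sequence yields the identity of logical AND:
        Console.WriteLine(empty.All(s => s == "ABC"));  // True

        // Any over an empty sequence yields the identity of logical OR:
        Console.WriteLine(empty.Any(s => s == "ABC"));  // False
    }
}
```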
If you think of All as "there are no elements in the sequence that do not satisfy the predicate", this might make more sense.
It is true because nothing (no failing element) makes it false.
The docs probably explain it. (Jon Skeet also mentioned something a few years back)
Same goes for Any (the opposite of All) returning false for empty sets.
Edit:
You can imagine All to be implemented semantically the same as:
static bool All<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (var e in source)
    {
        if (!predicate(e))
            return false;
    }
    return true; // no escape from the loop: an empty sequence has nothing that can fail
}
Most answers here seem to go along the lines of "because that's how it is defined". But there is also a logical reason why it is defined this way.
When defining a function, you want your function to be as general as possible, such that it can be applied to the largest possible number of cases. Say, for instance, that I want to define the Sum function, which returns the sum of all the numbers in a list. What should it return when the list is empty? If you'd return an arbitrary number x, you'd define the function as the:
Function that returns the sum of all numbers in the given list, or x if the list is empty.
But if x is zero, you can also define it as the
Function that returns x plus the given numbers.
Note that definition 2 implies definition 1, but 1 does not imply 2 when x is not zero, which by itself is enough reason to pick 2 over 1. Also note that 2 is more elegant and, in its own right, more general than 1. It's like placing a spotlight farther away so that it illuminates a larger area. A lot larger, actually. I'm not a mathematician myself, but I'm sure mathematicians would find a ton of connections between definition 2 and other mathematical concepts, but not so many related to definition 1 when x is not zero.
In general, you can, and most likely want to, return the identity element (the one that leaves the other operand unchanged) whenever you have a function that applies a binary operator over a set of elements and the set is empty. This is the same reason a Product function will return 1 when the list is empty (note that you could just replace "x plus" with "one times" in definition 2). And it is the same reason All (which can be thought of as the repeated application of the logical AND operator) will return true when the list is empty (p && true is equivalent to p), and the same reason Any (the OR operator) will return false.
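The identity-element pattern can be made explicit with Aggregate; a small sketch using only the standard library:

```csharp
using System;
using System.Linq;

class IdentityDemo
{
    static void Main()
    {
        var empty = Enumerable.Empty<int>();

        // Folding an empty set with the operator's identity element gives
        // the same answers as Sum, a Product function, All, and Any:
        Console.WriteLine(empty.Aggregate(0, (acc, x) => acc + x));           // 0     (sum)
        Console.WriteLine(empty.Aggregate(1, (acc, x) => acc * x));           // 1     (product)
        Console.WriteLine(empty.Aggregate(true, (acc, x) => acc && x > 0));   // True  (All)
        Console.WriteLine(empty.Aggregate(false, (acc, x) => acc || x > 0));  // False (Any)
    }
}
```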
The method cycles through all elements until it finds one that does not satisfy the condition, or finds none that fail. If none fail, true is returned.
So, if there are no elements, true is returned (since none failed).
Here is an extension that can do what OP wanted to do:
static bool All<T>(this IEnumerable<T> source, Func<T, bool> predicate, bool mustExist)
{
    // Call with mustExist: true to additionally require at least one element.
    foreach (var e in source)
    {
        if (!predicate(e))
            return false;
        mustExist = false; // at least one element has been seen
    }
    return !mustExist;
}
...and as others have pointed out already this is not a bug but well-documented intended behavior.
An alternative solution if one does not wish to write a new extension is:
strs.DefaultIfEmpty().All(str => str == "ABC");
PS: The above does not work if looking for the default value itself!
(Which for strings would be null.)
In such cases it becomes less elegant with something similar to:
strs.DefaultIfEmpty(string.Empty).All(str => str == null);
If you can enumerate more than once the easiest solution is:
strs.All(predicate) && strs.Any();
i.e. simply add a check afterwards that there actually were any elements.
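A quick sketch comparing the plain call with the "must also exist" variant, using the collection from the question:

```csharp
using System;
using System.Collections.ObjectModel;
using System.Linq;

class AllWithExistenceDemo
{
    static void Main()
    {
        var strs = new Collection<string>();

        // Vacuously true for an empty collection:
        Console.WriteLine(strs.All(s => s == "ABC"));                // True

        // Additionally require at least one element:
        Console.WriteLine(strs.All(s => s == "ABC") && strs.Any());  // False

        strs.Add("ABC");
        Console.WriteLine(strs.All(s => s == "ABC") && strs.Any());  // True
    }
}
```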
Keeping the implementation aside: does it really matter that it is true? Suppose you have some code which iterates over the enumerable and executes some logic on each element. If All() is true, that code is still not going to run, since the enumerable doesn't have any elements in it.
var hungryDogs = Enumerable.Empty<Dog>();
bool allAreHungry = hungryDogs.All(d => d.Hungry);
if (allAreHungry)
    foreach (Dog dog in hungryDogs)
        dog.Feed(biscuits); // <-- this line will not run anyway
Related
I have a single list with objects. Let's call it ItemList. Objects have properties such as Name, Code, ParentCode and so on. Some of these items have the same ParentCode and I need to compare the first element's ParentCode with the next one in some conditions. So I am using an if condition like this:
if (ItemList.First().ParentCode != ItemList.ElementAt(1).ParentCode)
However, sometimes this causes some issues because the ItemList can have single element inside it and it throws argument out of range or index out of range exception. To overcome this I changed the code to this:
if (ItemList.Count >= 2 && ItemList.First().ParentCode != ItemList.ElementAt(1).ParentCode)
Sometimes I need to run the same method when the ItemList have only one element or the first element does not have the equal ParentCode with the second element so I use this condition:
if (ItemList.Count == 1 || ItemList.Count >= 2 && ItemList.First().ParentCode != ItemList.ElementAt(1).ParentCode)
All of this seems counterintuitive, so I am open to suggestions on making the code more maintainable and readable. Thanks in advance!
A couple other Linq functions may help here, depending on the use-case. I'm not sure I fully understand why you'd only want to compare the first 2 elements though. What if it has 10 or 100 items? Only the first 2 matter? If it can only ever have 2 elements because of some other business logic, then consider creating a class that holds exactly 2 items, and put the "comparison/validation" logic inside that class. A constructor that accepts 2 parameters, first + second instance, should ensure validity of the wrapper class.
Either way... for a purely LINQ solution...
ItemList.GroupBy(x => x.ParentCode).Where(g => g.Count() > 1) will get you the "groups" that contain more than one item with the same ParentCode. Iterating those will give you a Key representing the ParentCode (or whatever you grouped by).
ItemList.Skip(1).FirstOrDefault() will get you the second element if it exists; otherwise it will be the default value for whatever type is in the list.
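A runnable sketch of both suggestions; the Item class and its data here are hypothetical stand-ins for the poster's objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the poster's item type.
class Item
{
    public string Name { get; set; }
    public string ParentCode { get; set; }
}

class ParentCodeDemo
{
    static void Main()
    {
        var itemList = new List<Item>
        {
            new Item { Name = "A", ParentCode = "P1" },
            new Item { Name = "B", ParentCode = "P1" },
            new Item { Name = "C", ParentCode = "P2" },
        };

        // Groups containing more than one item with the same ParentCode:
        foreach (var group in itemList.GroupBy(x => x.ParentCode).Where(g => g.Count() > 1))
            Console.WriteLine(group.Key);  // P1

        // Second element, or null when the list has fewer than two items:
        var second = itemList.Skip(1).FirstOrDefault();
        Console.WriteLine(second?.Name);   // B
    }
}
```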
Is there any specific reason you have to use linq here? I feel like just referencing the objects directly is just as good of a solution.
if(ItemList.Count == 2)
{
if(ItemList[0].ParentCode != ItemList[1].ParentCode)
{
..do stuff..
}
}
Yes, the code isn't as flat, but this is extremely readable. If you need to compare the 0th value to more than just 1st value, checking that the list's length is greater than or equal to 2 and using a for loop will work just fine.
if(ItemList.Count >= 2)
{
for (var i = 1; i < ItemList.Count; i++)
{
if(ItemList[0].ParentCode != ItemList[i].ParentCode)
{
..do stuff..
}
}
}
Only suggesting a non-LINQ solution because you didn't mention that it had to be LINQ. This code is still extremely readable, and there is no meaningful difference in performance. Also, unlike the code you mentioned in your post, none of this seems counterintuitive.
I am looking at this code
var numbers = Enumerable.Range(0, 20);
var parallelResult = numbers.AsParallel().AsOrdered()
.Where(i => i % 2 == 0).AsSequential();
foreach (int i in parallelResult.Take(5))
Console.WriteLine(i);
The AsSequential() is supposed to make the resulting array sorted. It is indeed sorted after its execution, but if I remove the call to AsSequential(), it is still sorted (since AsOrdered() is called).
What is the difference between the two?
AsSequential is just meant to stop any further parallel execution - hence the name. I'm not sure where you got the idea that it's "supposed to make the resulting array sorted". The documentation is pretty clear:
Converts a ParallelQuery into an IEnumerable to force sequential evaluation of the query.
As you say, AsOrdered ensures ordering (for that particular sequence).
I know that this was asked over a year ago, but here are my two cents.
In the example shown, I think AsSequential is used so that the next query operator (in this case Take) executes sequentially.
However, the Take operator prevents a query from being parallelized unless the source elements are in their original indexing position, which is why, even when you remove the AsSequential operator, the result is still sorted.
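To make the two roles visible, here is a sketch of the pipeline using only standard PLINQ operators: AsOrdered pins the source ordering across the parallel stages, while AsSequential merely switches the rest of the query back to sequential LINQ.

```csharp
using System;
using System.Linq;

class PlinqOrderingDemo
{
    static void Main()
    {
        var numbers = Enumerable.Range(0, 20);

        var result = numbers.AsParallel()
                            .AsOrdered()        // preserve source order across parallel stages
                            .Where(i => i % 2 == 0)
                            .AsSequential()     // remaining operators run sequentially
                            .Take(5);

        Console.WriteLine(string.Join(", ", result));  // 0, 2, 4, 6, 8
    }
}
```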
I am trying to build a SeparatedList using a dynamically-generated IEnumerable sequence (which is constructed by an Enumerable.Select() function call). The API function to create a SeparatedList takes two parameters, an IEnumerable<T> and an IEnumerable<SyntaxToken>. I have provided a simple function, Repeat, that is an infinite sequence generator which yields as many commas, in this case, as are requested.
The SeparatedList function appears to consume as many of the first sequence (parameter types here) as there are entries in the second sequence, which messes me up. Have I misunderstood how the function is supposed to work and has anyone else done this? Thanks
Syntax.SeparatedList<ParameterSyntax>(
functionParameterTypes,Repeat(i=>Syntax.Token(SyntaxKind.CommaToken)))
(Edit: I should add that converting the functionParameterTypes to a List<> and passing another List<> with one fewer token than elements in functionParameterTypes does work but I am trying to do this without having to explicitly build the list ahead of time.)
The XML documentation for the separators parameter says:
The number of tokens must be one less than the number of nodes.
You're right that this is not what the method actually requires: the number of tokens must be one less than the number of nodes, or the same as the number of nodes. I wouldn't be surprised if this was intentional; code like f(foo, bar, ) makes sense if you're trying to handle code that's still being written.
I think that calling ToList() on the sequence of parameters is the best choice here. And you don't have to use another List for separators, you can use Enumerable.Repeat() for that. For example like this (taken from a library I wrote where I faced the same issue):
public static SeparatedSyntaxList<T> ToSeparatedList<T>(
this IEnumerable<T> nodes, SyntaxKind separator = SyntaxKind.CommaToken)
where T : SyntaxNode
{
var nodesList = nodes == null ? new List<T>() : nodes.ToList();
return Syntax.SeparatedList(
nodesList,
Enumerable.Repeat(
Syntax.Token(separator), Math.Max(nodesList.Count - 1, 0)));
}
I also had the same need to create a SeparatedList using a dynamically generated list of parameters. My solution was to use SelectMany() and Take() to add separators (i.e. "comma") to the parameters but then remove the last trailing comma.
SyntaxFactory.SeparatedList<ParameterSyntax>(
functionParameterTypes
.SelectMany(param =>
new SyntaxNodeOrToken[]
{
param,
SyntaxFactory.Token(SyntaxKind.CommaToken)
})
.Take(functionParameterTypes.Count() * 2 - 1)
);
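The same interleave-then-trim trick works outside Roslyn as well; this sketch uses plain strings in place of ParameterSyntax nodes and SyntaxTokens (the names are illustrative only):

```csharp
using System;
using System.Linq;

class InterleaveDemo
{
    static void Main()
    {
        var parameterTypes = new[] { "int a", "string b", "bool c" };

        // Emit "item, separator" pairs, then drop the trailing separator:
        var interleaved = parameterTypes
            .SelectMany(p => new[] { p, ", " })
            .Take(parameterTypes.Length * 2 - 1);

        Console.WriteLine(string.Concat(interleaved));  // int a, string b, bool c
    }
}
```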
What is the difference between these two Linq queries:
var result = ResultLists().Where( c=> c.code == "abc").FirstOrDefault();
// vs.
var result = ResultLists().FirstOrDefault( c => c.code == "abc");
Are the semantics exactly the same?
If semantically equal, does the predicate form of FirstOrDefault offer any theoretical or practical performance benefit over Where() plus a plain FirstOrDefault()?
Either is fine.
They both run lazily - if the source list has a million items, but the tenth item matches then both will only iterate 10 items from the source.
Performance should be almost identical and any difference would be totally insignificant.
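A sketch to confirm the lazy behaviour; the counter shows that both forms stop after the same number of predicate evaluations (the counter and predicate are illustrative):

```csharp
using System;
using System.Linq;

class LazyFirstDemo
{
    static void Main()
    {
        var source = Enumerable.Range(1, 1_000_000);

        int tested = 0;
        Func<int, bool> isTen = x => { tested++; return x == 10; };

        // Where streams lazily, so FirstOrDefault stops pulling at the first match:
        var a = source.Where(isTen).FirstOrDefault();
        Console.WriteLine($"{a} after {tested} tests");  // 10 after 10 tests

        tested = 0;
        var b = source.FirstOrDefault(isTen);
        Console.WriteLine($"{b} after {tested} tests");  // 10 after 10 tests
    }
}
```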
The second one avoids an extra iterator layer, but note that, all other things being equal, both stop as soon as they find a match: Where streams its results lazily, so the first form does not have to find all matches and then pick the first of those.
Nice discussion; all the above answers are correct.
I didn't run any performance test, but in my experience FirstOrDefault() is sometimes faster and better optimized compared to Where().FirstOrDefault().
I recently fixed a memory-overflow/performance issue (in a neural-network algorithm), and the fix was changing Where(x => ...).FirstOrDefault() to simply FirstOrDefault(x => ...).
I had been ignoring the editor's recommendation to make that same change.
So I believe the correct answer to the above question is:
The second option is the best approach in all cases.
Where actually uses deferred execution: the evaluation of an expression is delayed until its realized value is actually required. It can greatly improve performance by avoiding unnecessary execution.
Where looks kind of like this, returning a new, lazily evaluated IEnumerable<T>:
static IEnumerable<T> Where<T>(this IEnumerable<T> enumerable, Func<T, bool> condition)
{
    foreach (var item in enumerable)
    {
        if (condition(item))
        {
            yield return item;
        }
    }
}
FirstOrDefault() returns default(T) (null for reference types) rather than throwing an exception when there is no result.
Just wondered if any LINQ guru might be able to shed light on how Aggregate and Any work under the hood.
Imagine that I have an IEnumerable which stores the results of testing an array for a given condition. I want to determine whether any element of the array is false. Is there any reason I should prefer one option above the other?
IEnumerable<bool> results = PerformTests();
return results.Any(r => !r); //Option 1
return results.Aggregate((h, t) => h && t); //Option 2
In production code I'd tend towards 1 as it's more obvious, but out of curiosity I wondered whether there's a difference in the way these are evaluated under the hood.
Yes, definitely prefer option 1 - it will stop as soon as it finds any value which is false.
Option 2 will go through the whole array.
Then there's the readability issue as well, of course :)
Jon beat me again, but to add some more text:
Aggregate always needs to consume the whole IEnumerable<T>, because that's exactly what it's supposed to do: To generate a dataset from your (complete) source.
It's the "Reduce" in the well-known Map/Reduce scenario.