Predicate inside a Where<> query not hit? - c#

I need to call a method for each item in the list, so I have used a Where<> query as follows:
List<string> list = new List<string>();
list.Add("Name1");
list.Add("Name2");
list.Add("Name3");
var name = list.Where(n =>
{
return CheckName(n);
}) != null;
But in the above case, CheckName() is never hit. The same method is called if I use FirstOrDefault<>. I don't know whether this is a framework bug or whether I am going about it the wrong way.
As additional info, I am using .NET Framework 4.5.
Has anyone experienced this error? If so, is there any solution to overcome this issue?

You are misunderstanding the result of the Where call. Because LINQ uses deferred execution, the predicate only runs when the query is materialized (by ToList, FirstOrDefault, Sum and so on).
The Where is never actually materialized in your current code (it was when you used FirstOrDefault, as you experienced), so it never enters the CheckName method. And since Where never returns null but, in the worst case, an empty collection (which is not null), comparing its result with null always gives true.
If you debug, you will see that name is true at the end of this. How to "overcome" this depends on your desired output:
If you want to know whether any item matches the predicate:
var result = list.Any(CheckName);
If you want to retrieve the items that match the predicate:
var result = list.Where(CheckName);
If you later want to check whether result contains anything:
if (result.Any()) { /* ... */ }
If you want the results themselves (and thus materialize the query):
var matches = list.Where(CheckName).ToList();
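A small sketch makes the deferred behaviour visible. The CheckName below is a hypothetical stand-in that only counts its own calls; the counts in the comments follow from Where being lazy, ToList enumerating every element, and Any stopping at the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int Calls;

    // Hypothetical stand-in for the question's CheckName; it just counts invocations.
    public static bool CheckName(string n)
    {
        Calls++;
        return n == "Name2";
    }

    public static void Run()
    {
        Calls = 0;
        var list = new List<string> { "Name1", "Name2", "Name3" };

        var result = list.Where(CheckName);    // only builds the query
        Console.WriteLine(Calls);              // 0: CheckName not hit yet
        Console.WriteLine(result != null);     // True, although nothing has run

        var matches = result.ToList();         // materializes: one call per item
        Console.WriteLine(Calls);              // 3
        Console.WriteLine(matches.Count);      // 1

        Calls = 0;
        bool any = list.Any(CheckName);        // stops at the first match ("Name2")
        Console.WriteLine(any + " " + Calls);  // True 2
    }
}
```

Calling DeferredDemo.Run() from any entry point prints the counts in order; the first 0 is exactly the "CheckName() is not hit" symptom from the question.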
Read more about LINQ's deferred execution here:
Linq and deffered execution
What are the benefits of a Deferred Execution in LINQ?
As a side note, see how you can simplify your current code from:
var name = list.Where(n =>
{
return CheckName(n);
});
To:
var name = list.Where(n => CheckName(n));
And eventually to:
var name = list.Where(CheckName);

LINQ follows the principle of deferred execution, which means the query is not executed until you actually enumerate the name variable. If you want to execute it immediately, you can, for example, add .ToList() at the end; FirstOrDefault likewise executes immediately instead of being deferred.
var name = list.Where(n =>
{
return CheckName(n);
}).ToList() != null;
Also, the result of Where will never be null. Even if no item in the list satisfies the condition(s) in CheckName, Where returns an empty collection.

The CheckName() method is not executed because of LINQ's deferred execution: the query is not executed until you actually enumerate it. So, in your case, to get CheckName() called you should do something like:
var name = list.Where(n =>
{
return CheckName(n);
}).ToList();

When you look at (a simplified version of) the Where method's source code, you can easily see why:
internal static IEnumerable<T> Where<T>(this IEnumerable<T> enumerable, Func<T, bool> where) {
foreach (T t in enumerable) {
if (where(t)) {
yield return t;
}
}
}
Because of the yield, the body only executes once the returned IEnumerable<T> is actually enumerated. That is what is called deferred execution.
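To see this in action, here is a minimal sketch using a hypothetical MyWhere shaped like the method above, plus a log (also made up for illustration) recording when the iterator body actually starts. Creating the query and even calling GetEnumerator do not run it; the first MoveNext does.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorDemo
{
    // Records the order in which things happen.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical simplified Where, shaped like the one above.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        Log.Add("body started");          // runs on the first MoveNext, not at call time
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    public static void Run()
    {
        Log.Clear();
        var query = new[] { 1, 2, 3 }.MyWhere(x => x > 1);
        Log.Add("query created");

        using (IEnumerator<int> e = query.GetEnumerator())
        {
            Log.Add("enumerator created");
            e.MoveNext();                  // runs the body up to the first yield return
            Log.Add("first item: " + e.Current);
        }

        // Prints: query created | enumerator created | body started | first item: 2
        Console.WriteLine(string.Join(" | ", Log));
    }
}
```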

If you need to call a method for each item in a list, then you should use a simple foreach loop:
foreach (var name in list)
CheckName(name);
Just because LINQ is available doesn't mean it should be used everywhere there is a collection. It is important to write code that makes sense and is self-documenting; using LINQ here has simultaneously introduced a flaw into your logic and made your code harder to read, understand and maintain. It's the wrong tool for the stated purpose.
Doubtless you have additional requirements not stated here, like "I want to check every name in a list and make sure that none are null". You can, and possibly should, use LINQ for this, but it looks more like:
bool allNamesOK = list.All(n => n != null);
This code is compact and reads well; we can clearly see the intention (though I wouldn't call the list "list"; "names" would be better).

Related

Why am I able to edit a LINQ list while iterating over it?

I recently came across an issue where I was able to change the IEnumerable object that I was iterating over in a foreach loop. It's my understanding that in C#, you aren't supposed to be able to edit the list you're iterating over, but after some frustration, I found that this is exactly what was happening. I basically looped through a LINQ query and used the object IDs to make changes in the database on those objects and those changes affected the values in the .Where() statement.
Does anybody have an explanation for this? It seems like the LINQ query re-runs every time it's iterated over
NOTE: The fix for this is adding .ToList() after the .Where(), but my question is why this issue is happening at all, i.e. whether it's a bug or something I'm unaware of.
using System;
using System.Linq;
namespace MyTest {
class Program {
static void Main () {
var aArray = new string[] {
"a", "a", "a", "a"
};
var i = 3;
var linqObj = aArray.Where(x => x == "a");
foreach (var item in linqObj ) {
aArray[i] = "b";
i--;
}
foreach (var arrItem in aArray) {
Console.WriteLine(arrItem); //Why does this only print out 2 a's and 2 b's, rather than 4 b's?
}
Console.ReadKey();
}
}
}
This code is just a reproducible mockup, but I'd expect it to loop through 4 times and change all of the strings in aArray into b's. However, it only loops through twice and turns the last two strings in aArray into b's.
EDIT: After some feedback and to be more concise, my main question here is this: "Why am I able to change what I'm looping over as I'm looping over it". Looks like the overwhelming answer is that LINQ does deferred execution, so it's re-evaluating as I'm looping through the LINQ IEnumerable.
EDIT 2: Looking through the answers, it seems that everyone is focused on the .Count() call, thinking that it is the issue here. However, you can comment out that line and I still have the issue of the LINQ object changing. I updated the code to reflect the main issue.
Why am I able to edit a LINQ list while iterating over it?
All of the answers that say that this is because of deferred "lazy" execution are wrong, in the sense that they do not adequately address the question that was asked: "Why am I able to edit a list while iterating over it?" Deferred execution explains why running the query twice gives different results, but does not address why the operation described in the question is possible.
The problem is actually that the original poster has a false belief:
I recently came across an issue where I was able to change the IEnumerable object that I was iterating over in a foreach loop. It's my understanding that in C#, you aren't supposed to be able to edit the list you're iterating over
Your understanding is wrong, and that's where the confusion comes from. The rule in C# is not "it is impossible to edit an enumerable from within an enumeration". The rule is you are not supposed to edit an enumerable from within an enumeration, and if you choose to do so, arbitrarily bad things can happen.
Basically what you're doing is running a stop sign and then asking "Running a stop sign is illegal, so why did the police not prevent me from running the stop sign?" The police are not required to prevent you from doing an illegal act; you are responsible for not making the attempt in the first place, and if you choose to do so, you take the chance of getting a ticket, or causing a traffic accident, or any other bad consequence of your poor choice. Usually the consequences of running a stop sign are no consequences at all, but that does not mean that it's a good idea.
Editing an enumerable while you're enumerating it is a bad practice, but the runtime is not required to be a traffic cop and prevent you from doing so. Nor is it required to flag the operation as illegal with an exception. It may do so, and sometimes it does do so, but there is not a requirement that it does so consistently.
You've found a case where the runtime does not detect the problem and does not throw an exception, but you do get a result that you find unexpected. That's fine. You broke the rules, and this time it just happens that the consequence of breaking the rules was an unexpected outcome. The runtime is not required to make the consequence of breaking the rules into an exception.
If you tried to do the same thing where, say, you called Add on a List<T> while enumerating the list, you'd get an exception because someone wrote code in List<T> that detects that situation.
No one wrote that code for "linq over an array", and so, no exception. The authors of LINQ were not required to write that code; you were required to not write the code you wrote! You chose to write a bad program that violates the rules, and the runtime is not required to catch you every time you write a bad program.
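A minimal sketch of the contrast described above, assuming LINQ to Objects (the method names are made up for illustration): calling Add on the List<T> being filtered makes the next MoveNext throw InvalidOperationException, while writing into the array being filtered goes undetected and simply changes what the query sees.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModifyWhileEnumerating
{
    // List<T>'s enumerator checks an internal version number, so adding during
    // enumeration throws on the next MoveNext.
    public static bool ListThrows()
    {
        var list = new List<string> { "a", "a" };
        try
        {
            foreach (var item in list.Where(x => x == "a"))
                list.Add("b");
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Arrays have no such check: the loop from the question just observes the new values.
    public static string[] ArrayDoesNotThrow()
    {
        var array = new[] { "a", "a", "a", "a" };
        var i = 3;
        foreach (var item in array.Where(x => x == "a"))
        {
            array[i] = "b";
            i--;
        }
        return array; // { "a", "a", "b", "b" }
    }
}
```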
It seems like the LINQ query re-runs every time it's iterated over
That is correct. A query is a question about a data structure. If you change that data structure, the answer to the question can change. Enumerating the query answers the question.
However, that is an entirely different issue than the one in the title of your question. You really have two questions here:
Why can I edit an enumerable while I am enumerating it?
You can do this bad practice because nothing stops you from writing a bad program except your good sense; write better programs that do not do this!
Does a query re-execute from scratch every time I enumerate it?
Yes; a query is a question, not an answer. An enumeration of the query is an answer, and the answer can change over time.
The explanation for your first question, why your LINQ query re-runs every time it's iterated over, is LINQ's deferred execution.
This line just declares the LINQ expression and does not execute it:
var linqList = aArray.Where(x => x == "a");
and this is where it gets executed:
foreach (var arrItem in aArray)
and
Console.WriteLine(linqList.Count());
An explicit call to ToList() runs the LINQ expression immediately. Use it like this:
var linqList = aArray.Where(x => x == "a").ToList();
Regarding the edited question:
The LINQ expression is evaluated lazily as the foreach advances, so each iteration sees the current contents of the array. The issue is not the Count(); every enumeration of the LINQ expression re-evaluates it. As mentioned above, materialize it into a List and iterate over the list.
Late edit:
Concerning @Eric Lippert's critique, I will also go into detail on the rest of the OP's questions.
//Why does this only print out 2 a's and 2 b's, rather than 4 b's?
In the first loop iteration i = 3, so after aArray[3] = "b"; your array will look like this:
{ "a", "a", "a", "b" }
In the second loop iteration, i has been decremented to 2, and after executing aArray[i] = "b"; your array will be:
{ "a", "a", "b", "b" }
At this point there are still a's in your array, but MoveNext() on the enumerator returns false and the loop exits: the enumerator used internally has moved on to the third position of the array, and because the query reads the array's current contents, the remaining elements no longer match the x == "a" condition.
Why am I able to change what I'm looping over as I'm looping over it?
You are able to do so because the built-in code analyzer in Visual Studio does not detect that you modify the collection within the loop. At runtime the array is modified, changing the outcome of the LINQ query, but the array iterator has no check for modifications, so no exception is thrown.
This missing check seems to be by design, as arrays have a fixed size, as opposed to lists, where such an exception is thrown at runtime.
Consider the following example code, which should be equivalent to your initial code example (before the edit):
using System;
using System.Linq;
namespace MyTest {
class Program {
static void Main () {
var aArray = new string[] {
"a", "a", "a", "a"
};
var iterationList = aArray.Where(x => x == "a").ToList();
foreach (var item in iterationList)
{
var index = iterationList.IndexOf(item);
iterationList.Remove(item);
iterationList.Insert(index, "b");
}
foreach (var arrItem in aArray)
{
Console.WriteLine(arrItem);
}
Console.ReadKey();
}
}
}
This code will compile and iterate the loop once before throwing a System.InvalidOperationException with the message:
Collection was modified; enumeration operation may not execute.
The reason the List implementation throws this error during enumeration is that it follows a basic concept: for and foreach are iterative control flow statements that need to be deterministic at runtime. Furthermore, the foreach statement is C#'s implementation of the iterator pattern, which implies a sequential traversal of a collection that does not change during execution. Thus the List implementation throws an exception when you modify the collection while enumerating it.
You found one of the ways to modify a loop while iterating over it, with the query being re-evaluated in each iteration. This is a bad design choice, because you might run into an infinite loop if the LINQ expression keeps changing the results and never meets an exit condition for the loop. This makes the code hard to debug, and the behavior will not be obvious when reading the code.
In contrast, the while statement is a conditional construct that is meant to be non-deterministic at runtime, with an exit condition that is expected to change during execution.
Consider this rewrite based on your example:
using System;
using System.Linq;
namespace MyTest {
class Program {
static void Main () {
var aArray = new string[] {
"a", "a", "a", "a"
};
bool arrayHasACondition(string x) => x == "a";
while (aArray.Any(arrayHasACondition))
{
var index = Array.FindIndex(aArray, arrayHasACondition);
aArray[index] = "b";
}
foreach (var arrItem in aArray)
{
Console.WriteLine(arrItem); // prints four b's: the while loop runs until no "a" is left
}
Console.ReadKey();
}
}
}
I hope this outlines the technical background and explains why your expectations were off.
Enumerable.Where returns an instance that represents a query definition. When it is enumerated*, the query is evaluated. foreach allows you to work with each item at the time it is found by the query. The query is deferred, but it is also pausable and resumable by the enumeration mechanism.
var aArray = new string[] { "a", "a", "a", "a" };
var i = 3;
var linqObj = aArray.Where(x => x == "a");
foreach (var item in linqObj )
{
aArray[i] = "b";
i--;
}
At the foreach loop, linqObj is enumerated* and the query is started.
The first item is examined and a match is found. The query is paused.
The loop body happens: item="a", aArray[3]="b", i=2
Back to the foreach loop, the query is resumed.
The second item is examined and a match is found. The query is paused.
The loop body happens: item="a", aArray[2]="b", i=1
Back to the foreach loop, the query is resumed.
The third item is examined and is "b", not a match.
The fourth item is examined and is "b", not a match.
The loop exits and the query concludes.
*Note on "is enumerated": this means GetEnumerator and MoveNext are called. It does not mean that the query is fully evaluated and its results held in a snapshot.
For further understanding, read up on yield return and how to write a method that uses that language feature. If you do this, you'll understand what you need in order to write Enumerable.Where.
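Expanding the foreach into its underlying GetEnumerator/MoveNext calls shows the pause/resume steps listed above. This is a sketch: the steps list exists only to record the order of events, and the predicate is given a logging side effect for the same reason.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PausedQueryDemo
{
    public static List<string> Run()
    {
        var steps = new List<string>();
        var aArray = new string[] { "a", "a", "a", "a" };
        var i = 3;

        // Same query as above, with logging added to show each examination.
        var linqObj = aArray.Where(x =>
        {
            steps.Add("examine " + x);
            return x == "a";
        });

        // Roughly what foreach (var item in linqObj) { ... } expands to.
        using (IEnumerator<string> e = linqObj.GetEnumerator())
        {
            while (e.MoveNext())          // resumes the query until the next match
            {
                aArray[i] = "b";          // loop body runs while the query is paused
                steps.Add("body: aArray[" + i + "]=b");
                i--;
            }
        }

        // steps: examine a, body: aArray[3]=b, examine a, body: aArray[2]=b, examine b, examine b
        return steps;
    }
}
```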
LINQ queries over IEnumerable in C# are lazy: the result is only produced when you force evaluation. In your case, Count() forces linqLIST to be evaluated every time you call it. By the way, despite its name, linqLIST is not a list.
You could upgrade the «avoid side-effects while enumerating an array» advice to a requirement, by utilizing the extension method below:
private static IEnumerable<T> DontMessWithMe<T>(this T[] source)
{
var copy = source.ToArray();
return source.Zip(copy, (x, y) =>
{
if (!EqualityComparer<T>.Default.Equals(x, y))
throw new InvalidOperationException(
"Array was modified; enumeration operation may not execute.");
return x;
});
}
Now chain this method to your query and watch what happens. 😃
var linqObj = aArray.DontMessWithMe().Where(x => x == "a");
Of course this comes with a cost: every call to DontMessWithMe copies the whole array. This is why I don't expect that anyone will use this extension, ever!

Linq Select query which returns a task enumerates multiple times

I have a LINQ query which returns a task object and stores it in an IEnumerable. For some reason the Select query keeps enumerating until the task is started or finished (I think; it's hard to debug).
The query is pretty straightforward:
Context.RetrieveDataTasks = retrievableProducts.Select(product => Context.HostController.RetrieveProductDataFiles(product));
where the signature of RetrieveProductDataFiles is:
public Task RetrieveProductDataFiles(IProduct product)
The retrievableProducts is in this case a list of 1 product:
var retrievableProducts = products
.Where(product => AFancyButIrrelevantClause)
.ToList();
I don't mind rewriting the code as a foreach loop where I fill a new list manually to avoid this problem, but I'd like to understand why the Select query keeps executing. I think it has something to do with the task waiting for activation, but I have no idea why that would cause this.
Edit:
To be complete, I'd expect the above code to work exactly the same as:
var retrievableDataTasks = new List<Task>();
foreach (var product in retrievableProducts)
{
retrievableDataTasks.Add(Context.HostController.RetrieveProductDataFiles(product));
}
Context.RetrieveDataTasks = retrievableDataTasks;
The construction with a foreach does exactly what I expect: it populates a list with tasks (in this specific case a list of 1 task), and this task is executed once. In the construction with the Select query, however, that same task is started over and over again.
I hope it's clear enough with the code I provided. I'm looking forward to learning why the Select query behaves differently (and, if possible, how to avoid it).
The correct answer to the posted question is by "bas" himself. Every time you enumerate the IEnumerable, it re-evaluates the expression inside Select and thus starts the tasks again. ToList fixes the problem because it evaluates the Select once and stores the resulting tasks in a list.
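A minimal sketch of that re-evaluation, with a hypothetical RetrieveProductDataFiles that only counts how many tasks it starts (strings stand in for IProduct): enumerating the lazy Select twice starts two tasks, while the ToList version starts one and then reuses it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SelectTaskDemo
{
    static int started;

    // Hypothetical stand-in for RetrieveProductDataFiles: counts each start.
    static Task RetrieveProductDataFiles(string product)
    {
        started++;
        return Task.Run(() => { /* pretend to retrieve files for product */ });
    }

    public static int CountStartsLazy()
    {
        started = 0;
        var products = new List<string> { "product1" };
        IEnumerable<Task> tasks = products.Select(RetrieveProductDataFiles);
        Task.WaitAll(tasks.ToArray()); // enumeration #1 starts a task
        Task.WaitAll(tasks.ToArray()); // enumeration #2 starts another one
        return started;                // 2
    }

    public static int CountStartsMaterialized()
    {
        started = 0;
        var products = new List<string> { "product1" };
        List<Task> tasks = products.Select(RetrieveProductDataFiles).ToList(); // starts once
        Task.WaitAll(tasks.ToArray());
        Task.WaitAll(tasks.ToArray()); // reuses the same Task object
        return started;                // 1
    }
}
```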
Using ToList forces the iterator to go through the entire collection, even if you only wanted, say, the first two items. If the collection has 1000 elements, ToList iterates until it reaches the last one, even though you then only use 2 of them.
You consume an iterator method by using a foreach statement or LINQ query. Each iteration of the foreach loop calls the iterator method. When a yield return statement is reached in the iterator method, expression is returned, and the current location in code is retained. Execution is restarted from that location the next time that the iterator function is called.
In your method that instantiates a list and adds to it, you could improve things a little by using yield return and thus not allocating data that doesn't need to be allocated. LINQ methods are lazily evaluated, which means no memory is allocated for the results until you try to materialize them (with ToList, for instance). While the LINQ method is running, the only memory usage you get is for the current iteration, not for everything found in your collection.
The following code snippet may help you:
private static IEnumerable<Product> GetMyProducts(IEnumerable<Product> products, bool AFancyButIrrelevantClause)
{
foreach(var product in products)
{
if(AFancyButIrrelevantClause)
yield return product;
}
}
or directly in LINQ to be more concise:
products.Where(product => AFancyButIrrelevantClause)

Using LINQ Where result in foreach: hidden if statement, double foreach?

foreach (Person criminal in people.Where(person => person.isCriminal))
{
// do something
}
I have this piece of code and want to know how it actually works. Is it equivalent to an if statement nested inside the foreach iteration, or does it first loop through the list of people and then repeat the loop with the selected values? I care to know more about this from the perspective of efficiency.
foreach (Person criminal in people)
{
if (criminal.isCriminal)
{
// do something
}
}
Where uses deferred execution.
This means that the filtering does not occur immediately when you call Where. Instead, each time you call MoveNext() on the enumerator obtained from the return value of Where, it checks whether the next element in the sequence satisfies the condition. If it does not, it skips over this element and checks the next one. When an element satisfies the condition, it stops advancing and you can get the value using Current.
Basically, it is like having an if statement inside a foreach loop.
To understand what happens, you must know how IEnumerable<T> works (because LINQ to Objects always works on IEnumerable<T>). An IEnumerable<T> returns an IEnumerator<T>, which implements an iterator. This iterator is lazy, i.e. it only yields one element of the sequence at a time. No looping is done in advance, unless you use OrderBy or another operator which requires it.
So if you have ...
foreach (string name in source.Where(x => x.IsChecked).Select(x => x.Name)) {
Console.WriteLine(name);
}
... this will happen: The foreach-statement requires the first item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The first name is printed to the console.
Then the foreach-statement requires the second item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The second name is printed to the console.
and so on.
This means that both of your code snippets are logically equivalent.
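The pull-based ordering described above can be checked with a log. This sketch uses a hypothetical Item class in place of the answer's source elements and records each Where, Select and print step; the log shows the three stages interleaved per element rather than running as separate passes.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PipelineOrderDemo
{
    // Hypothetical element type standing in for the answer's "source" items.
    public class Item
    {
        public string Name;
        public bool IsChecked;
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var source = new List<Item>
        {
            new Item { Name = "A", IsChecked = true },
            new Item { Name = "B", IsChecked = false },
            new Item { Name = "C", IsChecked = true },
        };

        var names = source
            .Where(x => { log.Add("Where " + x.Name); return x.IsChecked; })
            .Select(x => { log.Add("Select " + x.Name); return x.Name; });

        foreach (string name in names)
            log.Add("print " + name);   // stands in for Console.WriteLine(name)

        // log: Where A, Select A, print A, Where B, Where C, Select C, print C
        return log;
    }
}
```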
It depends on what people is.
If people is an IEnumerable object (like a collection, or the result of a method using yield) then the two pieces of code in your question are indeed equivalent.
A naïve Where could be implemented as:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
// Error handling left out for simplicity.
foreach (TSource item in source)
{
if (predicate(item))
{
yield return item;
}
}
}
The actual code in Enumerable is a bit different to make sure that errors from passing a null source or predicate happen immediately rather than on the deferred execution, and to optimise for a few cases (e.g. source.Where(x => x.IsCriminal).Where(x => x.IsOnParole) is turned into the equivalent of source.Where(x => x.IsCriminal && x.IsOnParole) so that there's one fewer step in the chains of iterations), but that's the basic principle.
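The eager-argument-check point can be illustrated with the usual split between a non-iterator public method and a private iterator. This is a sketch of the pattern, not the actual Enumerable source; MyWhere and NaiveWhere are made-up names, and the naive version shows that a check placed inside an iterator is itself deferred.

```csharp
using System;
using System.Collections.Generic;

public static class EagerValidation
{
    // Not an iterator (no yield), so these checks run at call time.
    public static IEnumerable<TSource> MyWhere<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));
        return MyWhereIterator(source, predicate);
    }

    // The deferred part lives in a separate iterator method.
    private static IEnumerable<TSource> MyWhereIterator<TSource>(
        IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        foreach (TSource item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    // For contrast: a check inside an iterator only runs on the first MoveNext.
    public static IEnumerable<TSource> NaiveWhere<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));
        foreach (TSource item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    public static bool ThrowsAtCallTime() => Throws(() => new[] { 1, 2, 3 }.MyWhere(null));

    public static bool NaiveThrowsAtCallTime() => Throws(() => new[] { 1, 2, 3 }.NaiveWhere(null));

    // Creates the query without ever enumerating it.
    static bool Throws(Func<IEnumerable<int>> createQuery)
    {
        try
        {
            createQuery();
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }
}
```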
If however people is an IQueryable then things are different, and depend on the details of the query provider in question.
The simplest possibility is that the query provider can't do anything special with the Where and so it ends up just doing pretty much the above, because that will still work.
But often the query provider can do something else. Let's say people is a DbSet<Person> in Entity Framework associated with a table in a database called people. If you do:
foreach(var person in people)
{
DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
and then create a Person object for each row returned. It could do the same filtering as the naïve Where above, but it can also do better.
If you do:
foreach (Person criminal in people.Where(person => person.isCriminal))
{
DoSomething(criminal);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
WHERE isCriminal = 1
This means that the logic of deciding which elements to return is done in the database before it comes back to .NET. It allows indices to be used in computing the WHERE, which can be much more efficient, but even in the worst case of there being no useful indices and the database having to do a full scan, it still means that the records we don't care about are never sent back from the database and no object is created for them just to be thrown away again, so the difference in performance can be immense.
I care to know more about this from the perspective of efficiency
Hopefully you are satisfied that there's no double pass as you suggested might happen, and happy to learn that, when possible, it's even more efficient than the foreach … if you suggested.
A bare foreach and if will still beat .Where() against an IEnumerable (but not against a database source), as there are a few overheads to Where that foreach and if don't have, but only to a degree that is worth caring about in very hot paths. Generally Where can be used with reasonable confidence in its efficiency.

deferred execution or not

Are below comments correct about DEFERRED EXECUTION?
1. var x = dc.myTables.Select(r=>r);//yes
2. var x = dc.myTables.Where(..).Select(r=>new {..});//yes
3. var x = dc.myTables.Where(..).Select(r=>new MyCustomClass {..});//no
In other words, I always thought projecting into custom class objects would always cause eager execution, but I couldn't find references supporting or refuting it (though I am seeing results contradicting it, hence the post).
Every statement in your question is an example of deferred execution. The contents of the Select and Where statements have no effect on whether the resulting value is executed deferred or not. The Select and Where operators themselves dictate that.
As a counter example consider the Sum method. This is always eagerly executed irrespective of what the input is.
var sum = dc.myTables.Sum(...); // Always eager
To prove your point, your test should look like this:
var tracer = string.Empty;
Func<inType, outType> map = r => {
tracer = "set";
return new outType(...);
};
var x = dc.myTables.Where(..).Select(map);
// this confirms x was never enumerated as tracer would be "set".
Assert.AreEqual(string.Empty, tracer);
// confirm that it would have enumerated if it could
CollectionAssert.IsNotEmpty(x);
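Since dc.myTables and the placeholder types above can't run on their own, here is an in-memory sketch of the same experiment, assuming LINQ to Objects instead of a database and using a hypothetical MyCustomClass: the projection into a custom class stays deferred, while Sum runs immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProjectionDeferralDemo
{
    // Hypothetical stand-in for MyCustomClass from the question.
    public class MyCustomClass
    {
        public int Value;
    }

    public static int Projections;
    public static int SumCalls;

    public static void Run()
    {
        Projections = 0;
        SumCalls = 0;
        var rows = new List<int> { 1, 2, 3 };

        // Projecting into a custom class is still deferred: nothing runs here.
        var x = rows.Where(r => r > 1)
                    .Select(r => { Projections++; return new MyCustomClass { Value = r }; });
        Console.WriteLine(Projections);  // 0

        // Sum returns a single value, so it executes immediately.
        int sum = rows.Sum(r => { SumCalls++; return r; });
        Console.WriteLine(SumCalls);     // 3

        // Enumerating x finally runs the projection for the two matching rows.
        var materialized = x.ToArray();
        Console.WriteLine(Projections);  // 2
    }
}
```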
It has been my observation that the only way to force execution right away is to force iteration of the collection. I do this by calling .ToArray() on my LINQ query.
Generally methods that return a sequence use deferred execution:
IEnumerable<X> ---> Select ---> IEnumerable<Y>
and methods that return a single object don't:
IEnumerable<X> ---> First ---> Y
So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.
from here
.Select(...) is always deferred.
When you're working with IQueryable<T>, this and other deferred execution methods build up an expression tree and this isn't ever compiled into an actual executable expression until it's iterated. That is, you need to:
Do a for-each on the projected enumerable.
Call a method that internally enumerates the enumerable (e.g. .Any(...), .Count(...), .ToList(...), ...).
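For the IQueryable case, here is a sketch using AsQueryable over an array, which is an in-memory provider rather than a database (an assumption of this example). The query is just an expression tree until ToArray enumerates it. Track is a hypothetical helper whose only job is to count calls.

```csharp
using System.Linq;
using System.Linq.Expressions;

public static class QueryableDemo
{
    public static int Calls;
    public static int CallsBeforeEnumeration;
    public static ExpressionType RootNodeType;

    // Hypothetical helper: calling it from the query makes execution observable.
    public static int Track(int r)
    {
        Calls++;
        return r * 10;
    }

    public static int[] Run()
    {
        Calls = 0;
        IQueryable<int> query = new[] { 1, 2, 3 }.AsQueryable()
            .Where(r => r > 1)
            .Select(r => Track(r));

        // So far only an expression tree has been built; nothing has executed.
        CallsBeforeEnumeration = Calls;            // 0
        RootNodeType = query.Expression.NodeType;  // Call (the Queryable.Select call)

        // ToArray enumerates, so the provider now compiles and runs the tree.
        return query.ToArray();                    // { 20, 30 }, Track called twice
    }
}
```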

Weird issue about IEnumerable<T> collection

I have following code:
IEnumerable<TreeItem> rootTreeItems = BuildRootTreeItems();
BuildTreeView(rootTreeItems.ElementAt(0));
private static void BuildTreeView(TreeItem treeItem)
{
TreeItem subMenuTreeItem = new TreeItem();
subMenuTreeItem.Header = "1";
TreeItem subMenuTreeItem2 = new TreeItem();
subMenuTreeItem2.Header = "2";
treeItem.TreeItems.Add(subMenuTreeItem);
treeItem.TreeItems.Add(subMenuTreeItem2);
}
The weird thing is that after BuildTreeView returns, the first element of rootTreeItems doesn't have any child nodes, although it does have them when I debug into the BuildTreeView method.
This problem has confused me for quite a long time. Does anyone have an idea? Thanks so much.
You're most likely hitting a deferred execution issue with IEnumerable<>. The thing to remember is that your IEnumerable<TreeItem> rootTreeItems is not a list, instead it is a promise to get a list each and every time it is asked to do so.
So, if BuildRootTreeItems() creates the IEnumerable<TreeItem> using a LINQ query and doesn't force the execution of the query using .ToList() or .ToArray(), then each time you use rootTreeItems you are re-executing the query inside BuildRootTreeItems().
Calling rootTreeItems.ElementAt(0) will cause the query to execute. If later you try to call rootTreeItems.ElementAt(0) again then you are re-executing the query and getting back a different instance of the first TreeItem.
Try changing the first line like so:
IEnumerable<TreeItem> rootTreeItems = BuildRootTreeItems().ToArray();
This forces the execution and will prevent re-execution later. I'll bet your problem goes away.
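A minimal reproduction of this explanation, assuming BuildRootTreeItems returns an unmaterialized Select (TreeItem, BuildRootTreeItems and AddChild below are hypothetical, simplified versions): without ToArray, the second ElementAt(0) re-runs the Select and returns a fresh TreeItem with no children.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RebuiltItemsDemo
{
    // Hypothetical minimal TreeItem with a children collection.
    public class TreeItem
    {
        public string Header;
        public List<TreeItem> TreeItems = new List<TreeItem>();
    }

    // Hypothetical BuildRootTreeItems returning an unmaterialized LINQ query.
    static IEnumerable<TreeItem> BuildRootTreeItems() =>
        new[] { "root" }.Select(h => new TreeItem { Header = h });

    static void AddChild(TreeItem item) => item.TreeItems.Add(new TreeItem { Header = "1" });

    public static int ChildrenWithoutToArray()
    {
        IEnumerable<TreeItem> roots = BuildRootTreeItems();
        AddChild(roots.ElementAt(0));               // modifies one TreeItem instance...
        return roots.ElementAt(0).TreeItems.Count;  // ...but Select re-runs: new instance, 0 children
    }

    public static int ChildrenWithToArray()
    {
        IEnumerable<TreeItem> roots = BuildRootTreeItems().ToArray();
        AddChild(roots.ElementAt(0));
        return roots.ElementAt(0).TreeItems.Count;  // same instance: 1 child
    }
}
```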
It is possible that your BuildRootTreeItems() method returns an IEnumerable whose elements are created via a yield statement or, as Gabe mentioned in a comment above, an IEnumerable created by a LINQ expression chain containing a Select method.
This can lead to the elements of the IEnumerable being recreated on each access to any element, or on each iteration over it with LINQ expressions or a foreach statement.
I would go simpler:
This also could happen if
A) TreeItem is a value type(struct)
B) TreeItem.TreeItems returns a new collection
But whether this is the case is difficult to deduce from just the code provided.
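Case B can be sketched with a hypothetical TreeItem whose TreeItems property builds a new list on every get. The Add goes into a throwaway list, so the item still appears to have no children.

```csharp
using System.Collections.Generic;

public static class NewCollectionDemo
{
    // Hypothetical TreeItem: TreeItems returns a fresh list on every access (case B).
    public class TreeItem
    {
        public string Header;
        public List<TreeItem> TreeItems => new List<TreeItem>();
    }

    public static int ChildCountAfterAdd()
    {
        var item = new TreeItem();
        item.TreeItems.Add(new TreeItem { Header = "1" }); // added to a throwaway list
        return item.TreeItems.Count;                       // another new, empty list: 0
    }
}
```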
