Why does this method result in an infinite loop? - c#

One of my coworkers came to me with a question about this method that results in an infinite loop. The actual code is a bit too involved to post here, but essentially the problem boils down to this:
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
items = items.Select(item => items.First(i => i == item));
return items;
}
This should (you would think) just be a very inefficient way to create a copy of a list. I called it with:
var foo = GoNuts(new[]{1,2,3,4,5,6});
The result is an infinite loop. Strange.
I think that modifying the parameter is, stylistically, a bad thing, so I changed the code slightly:
var foo = items.Select(item => items.First(i => i == item));
return foo;
That worked. That is, the program completed; no exception.
More experiments showed that this works, too:
items = items.Select(item => items.First(i => i == item)).ToList();
return items;
As does a simple
return items.Select(item => .....);
Curious.
It's clear that the problem has to do with reassigning the parameter, but only if evaluation is deferred beyond that statement. If I add the ToList() it works.
I have a general, vague, idea of what's going wrong. It looks like the Select is iterating over its own output. That's a little bit strange in itself, because typically an IEnumerable will throw if the collection it's iterating changes.
What I don't understand, because I'm not intimately familiar with the internals of how this stuff works, is why re-assigning the parameter causes this infinite loop.
Is there somebody with more knowledge of the internals who would be willing to explain why the infinite loop occurs here?

The key to answering this is deferred execution. When you do this
items = items.Select(item => items.First(i => i == item));
you do not iterate the items array passed into the method. Instead, you assign it a new IEnumerable<int>, which references itself back, and starts iterating only when the caller starts enumerating the results.
That is why all your other fixes have dealt with the problem: all you needed to do is to stop feeding IEnumerable<int> back to itself:
Using var foo breaks self-reference by using a different variable,
Using return items.Select... breaks self-reference by not using intermediate variables at all,
Using ToList() breaks self-reference by avoiding deferred execution: by the time items is re-assigned, old items has been iterated over, so you end up with a plain in-memory List<int>.
But if it's feeding on itself, how does it get anything at all?
That's right, it does not get anything! The moment you try iterating items and ask it for the first item, the deferred sequence asks the sequence fed to it for the first item to process, which means that the sequence is asking itself for the first item to process. At this point, it's turtles all the way down, because in order to return the first item to process the sequence must first get the first item to process from itself.

It looks like the Select is iterating over its own output
You are correct. You are returning a query that iterates over itself.
The key is that you reference items within the lambda. The items reference is not resolved ("closed over") until the query iterates, at which point items now references the query instead of the source collection. That's where the self-reference occurs.
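The variable capture at the heart of this can be shown with a much smaller sketch (hypothetical names, not from the question): a lambda closes over the variable itself, not over the value the variable held when the lambda was created.

```csharp
using System;

class CaptureDemo
{
    static void Main()
    {
        int n = 1;
        Func<int> f = () => n;  // f captures the variable n, not the value 1
        n = 2;                  // reassigning n changes what f sees
        Console.WriteLine(f()); // prints 2
    }
}
```

In GoNuts, the lambda captures the items variable the same way, so by the time the query runs, items refers to the query itself.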
Picture a deck of cards with a sign in front of it labelled items. Now picture a man standing beside the deck of cards whose assignment is to iterate the collection called items. But then you move the sign from the deck to the man. When you ask the man for the first "item" - he looks for the collection marked "items" - which is now him! So he asks himself for the first item, which is where the circular reference occurs.
When you assign the result to a new variable, you then have a query that iterates over a different collection, and so does not result in an infinite loop.
When you call ToList, you hydrate the query to a new collection and also do not get an infinite loop.
Other things that would break the circular reference:
Hydrating items within the lambda by calling ToList
Assigning items to another variable and referencing that within the lambda.

After studying the two answers given and poking around a bit, I came up with a little program that better illustrates the problem.
private int GetFirst(IEnumerable<int> items, int foo)
{
Console.WriteLine("GetFirst {0}", foo);
var rslt = items.First(i => i == foo);
Console.WriteLine("GetFirst returns {0}", rslt);
return rslt;
}
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
items = items.Select(item =>
{
Console.WriteLine("Select item = {0}", item);
return GetFirst(items, item);
});
return items;
}
If you call that with:
var newList = GoNuts(new[]{1, 2, 3, 4, 5, 6});
You'll get this output repeatedly until you finally get StackOverflowException.
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
...
What this shows is exactly what dasblinkenlight made clear in his updated answer: the query goes into an infinite loop trying to get the first item.
Let's write GoNuts a slightly different way:
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
var originalItems = items;
items = items.Select(item =>
{
Console.WriteLine("Select item = {0}", item);
return GetFirst(originalItems, item);
});
return items;
}
If you run that, it succeeds. Why? Because in this case it's clear that the call to GetFirst is passing a reference to the original items that were passed to the method. In the first case, GetFirst is passing a reference to the new items collection, which hasn't yet been realized. In turn, GetFirst says, "Hey, I need to enumerate this collection." And thus begins the first recursive call that eventually leads to StackOverflowException.
Interestingly, I was right and wrong when I said that it was consuming its own output. The Select is consuming the original input, as I would expect. The First is trying to consume the output.
Lots of lessons to be learned here. To me, the most important is "don't modify the value of input parameters."
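A minimal sketch of that lesson applied to the original method (assuming the intent was simply an inefficient copy): leave the parameter alone and let the lambda capture a variable that is never reassigned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // The lambda captures 'source', which is never reassigned,
    // so the deferred query iterates the caller's sequence, not itself.
    static IEnumerable<int> GoNuts(IEnumerable<int> source)
    {
        return source.Select(item => source.First(i => i == item));
    }

    static void Main()
    {
        var result = GoNuts(new[] { 1, 2, 3 }).ToList();
        Console.WriteLine(string.Join(", ", result)); // 1, 2, 3
    }
}
```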
Thanks to dasblinkenlight, D Stanley, and Lucas Trzesniewski for their help.

Related

RemoveRange method after the use of Take method from the List&lt;T&gt; has unexpected behavior

Does anyone know why the length after leaving the while loop body in the code below is zero? Please do not provide me with a chunking algorithm; I am already aware of several such algorithms and I know how to solve this problem. My question is only about the weird behavior of RemoveRange, or perhaps Take.
static void Main(string[] args)
{
var list = new List<int>();
for (int i = 0; i < 1000; i++)
{
list.Add(i);
}
var chunks = new List<IEnumerable<int>>();
while (list.Count > 0)
{
chunks.Add(list.Take(10));
list.RemoveRange(0, 10);
}
int length = chunks.ToList()[0].Count();
}
At the following line:
var chunks = new List<IEnumerable<int>>();
you create a list, whose Count is 0, since there aren't any items in it yet. Then at each step of the while loop you add list.Take(10), and after this you remove the first 10 items from list. The important thing here is to realize that list.Take(10) is lazily evaluated (more often you will hear the term deferred execution). The first time it would be evaluated is at the following line:
int length = chunks.ToList()[0].Count();
At this line list.Take(10) would be evaluated, and since you have removed all the items from the list, there aren't any elements left in list. For this reason, list.Take(10) returns an empty sequence, and consequently chunks.ToList()[0] would be an empty list.
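A stripped-down repro of that effect (a sketch, not the OP's code): the Take is evaluated only when counted, after the source has already been emptied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        var firstTwo = list.Take(2); // deferred: nothing is taken yet
        list.Clear();                // the source is emptied before evaluation
        Console.WriteLine(firstTwo.Count()); // 0 - Take runs now, against the empty list
    }
}
```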
Update deferred execution explanation:
Suppose we have the following list:
var list = new List<int> {1, 2, 3, 4, 5 };
var firstThree = list.Take(3);
The variable firstThree holds a reference to an iterator - specifically an IEnumerable&lt;int&gt;. It does not hold an array or a list of 1, 2, 3. The first time you use this iterator you will start to "fetch" data from the list.
For instance you could call the ToList or ToArray methods:
var firstThreeList = firstThree.ToList();
var firstThreeArray = firstThree.ToArray();
Both the above calls would put the iterator in action - in general terms, they would force the execution of your query in LINQ. At this very moment, you will traverse the list and fetch the first three items.
That being said, it is clear that if in the meantime you have modified the list by removing all the numbers from list, there wouldn't be any elements left, and you will get nothing.
As a test, I would suggest you run the above code once, and then run it again, but this time make this call before the ToList and ToArray calls:
list.RemoveAll(x => true);
You will notice now that both firstThreeArray and firstThreeList are empty.
The line
chunks.Add(list.Take(10));
does not actually take 10 items from list; it only says to take 10 when chunks[i] is first referenced (through e.g. Count()). Since you are altering the list, the reference points to an empty list. You can force the evaluation using
chunks.Add(list.Take(10).ToList());
The problem you're experiencing is that LINQ is kind of like a view. It only actually iterates through the collection when you call a method that requires it to produce a specific value (First(), Last(), Count(), etc.). So the list is only evaluated at the point where you call one of these methods.
chunks.Add(list.Take(10));
This code effectively says "take a reference to list, and when somebody iterates you, only go as far as the first 10 items". To resolve this, you can convert that small section to a list (evaluate those 10 items, and create a new list from them):
chunks.Add(list.Take(10).ToList());
Consider this code:
List<string> names = new List<string>() { "a", "b", "c" };
IEnumerable<string> skip2 = names.Skip(2);
Console.WriteLine(string.Join(", ", skip2)); // "c"
names.Add("d");
names.Add("e");
Console.WriteLine(string.Join(", ", skip2)); // "c, d, e"
Because you use the iterator (IEnumerable<string> skip2) each time you call string.Join(", ", skip2) it will iterate through the list each time, even if the list has changed.
As such, you will get "c" on the first run, and "c, d, e" on the second run.
In fact, this would be perfectly valid (although harder to read):
List<int> list = new List<int>();
IEnumerable<int> values = list.DefaultIfEmpty(0);
list.Add(5);
list.Add(10);
list.Add(15);
Console.WriteLine(values.Average()); // 10
There is no weird behaviour at all. That is exactly the expected behaviour for IEnumerable. What you should keep in mind is that IEnumerable is lazily evaluated, which means it is evaluated when it is enumerated, i.e. when you actually access the said IEnumerable. What you are doing is basically:
1. Get the reference to the list object.
2. Prepare to take the first 10 elements of the said list object, but do not evaluate yet!
3. Add the not-yet-evaluated object, in this case list.Take(10), into the chunks list.
4. Remove the first 10 elements of list.
5. Rinse and repeat until there are no more items in list.
6. Create a list made up entirely of the not-yet-evaluated items of list.Take(10).
7. Take the first element of the said chunks, which is not yet evaluated but is a reference to the first 10 elements of list, which is empty!
8. Call Count on the IEnumerable instance, which finally evaluates the first ten elements of an empty list.
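The steps above suggest the fix: materialize each chunk before removing it from the source. A sketch of the corrected loop:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var list = Enumerable.Range(0, 1000).ToList();
        var chunks = new List<List<int>>();
        while (list.Count > 0)
        {
            chunks.Add(list.Take(10).ToList()); // evaluate the 10 items right now
            list.RemoveRange(0, Math.Min(10, list.Count));
        }
        Console.WriteLine(chunks[0].Count); // 10
    }
}
```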

Resharper, linq within foreach loop

Resharper is suggesting the top example over the bottom example. However, I am under the impression that a new list of items will be created first, and thus all of the _executeFunc calls will be run before RunStoredProcedure is called.
This would normally not be an issue; however, exceptions are prone to occur, and if my hypothesis is correct then my database will not be updated despite the functions having been run.
foreach (var result in rows.Select(row => _executeFunc(row)))
{
RunStoredProcedure(result);
}
Or
foreach(var row in rows)
{
var result = _executeFunc(row);
RunStoredProcedure(result);
}
The statements are, in this case, semantically the same, as Select (and LINQ in general) uses deferred execution of delegates. It won't run any declared queries until the result is being materialised, and depending on how you write that query it will do it in the proper sequence.
A very simple example to show that:
var list = new List<string>{"hello", "world", "example"};
Func<string, string> func = (s) => {
Console.WriteLine(s);
return s.ToUpper();
};
foreach(var item in list.Select(i => func(i)))
{
Console.WriteLine(item);
}
results in
hello
HELLO
world
WORLD
example
EXAMPLE
In your first example, _executeFunc(row) will NOT be called first for each item in rows before your foreach loop begins. LINQ will defer execution. See This answer for more details.
The order of events will be:
Evaluate the first item in rows
Call _executeFunc(row) on that item
Call RunStoredProcedure(result)
Repeat with the next item in rows
Now, if your code were something like this:
foreach (var result in rows.Select(row => _executeFunc(row)).ToList())
{
RunStoredProcedure(result);
}
Then it WOULD run the LINQ .Select first for every item in rows because the .ToList() causes the collection to be enumerated.
In the top example, using Select will project the rows, by yielding them one by one.
So
foreach (var result in rows.Select(row => _executeFunc(row)))
is basically the same as
foreach(var row in rows)
Thus Select is doing something like this
for each row in source
result = _executeFunc(row)
yield result
That yield is passing each row back one by one (it's a bit more complicated than that, but this explanation should suffice for now).
If you did this instead
foreach (var result in rows.Select(row => _executeFunc(row)).ToList())
Calling ToList() will return a List of rows immediately, and that means _executeFunc() will indeed be called for every row, before you've had a chance to call RunStoredProcedure().
Thus what Resharper is suggesting is valid. To be fair, I'm sure the Jetbrains devs know what they are doing :)
Select uses deferred execution. This means that it will, in order:
take an item from rows
call _executeFunc on it
call RunStoredProcedure on the result of _executeFunc
And then it will do the same for the next item, until all the list has been processed.
The execution will be deferred, meaning both versions will have the same execution order.
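The interleaving described in these answers, and how ToList changes it, can be made visible by logging each step. This sketch uses hypothetical logging stand-ins for _executeFunc and RunStoredProcedure:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var log = new List<string>();
        var rows = new[] { 1, 2 };

        // Deferred: func and proc alternate, row by row.
        foreach (var r in rows.Select(row => { log.Add("func " + row); return row; }))
            log.Add("proc " + r);
        Console.WriteLine(string.Join(", ", log));
        // func 1, proc 1, func 2, proc 2

        log.Clear();

        // With ToList: every func runs before the first proc.
        foreach (var r in rows.Select(row => { log.Add("func " + row); return row; }).ToList())
            log.Add("proc " + r);
        Console.WriteLine(string.Join(", ", log));
        // func 1, func 2, proc 1, proc 2
    }
}
```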

Using Foreach method to perform an OR or AND Operation

I have the following piece of code which uses the foreach iterator.
foreach (var item in daysOfWeeksList)
{
daysOper |= item;
}
daysOfWeeksList is a list. I want to OR each item in the list and process the result.
This daysOfWeeksList is a
List<int> daysOfWeeksList
Say I want to do something like this. The DoSomething I want to do is the OR operation.
list.ForEach( item =>
{
item.DoSomething();
} );
How would you go about this using the ForEach method available as part of the List collection? I found plenty of examples for two operands but not for a single operand.
Assuming daysOper starts as 0, I wouldn't use ForEach at all - I'd use Aggregate from LINQ:
var daysOper = daysOfWeekList.Aggregate((current, next) => current | next);
In other words, keep a running "current" value, and keep OR-ing it with the next value each time. (The result of one iteration will be used as the "current" value for the next iteration.)
In general, you want to use the Aggregate method for stuff like this, where the standard aggregators, like Sum, don't fit.
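With the seeded overload, the same idea also handles an empty list gracefully. A small sketch with made-up flag values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var days = new List<int> { 1, 2, 4 }; // e.g. bit flags for days of the week
        // Seeded overload: starts from 0, so an empty list yields 0 instead of throwing
        var daysOper = days.Aggregate(0, (current, next) => current | next);
        Console.WriteLine(daysOper); // 7
    }
}
```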
(Edit: I assumed that the OP was doing the OR operation over a List&lt;bool&gt;, so I edited the paragraph below.)
However, if daysOfWeekList is a List&lt;bool&gt;, then you have the opportunity to optimize performance by stopping at the first instance of true. The Any method does this.
var result = daysOfWeekList.Any(daysOpr=>daysOpr);
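For completeness, Any short-circuits on the first true element, so the rest of the list is never examined. A sketch assuming a List&lt;bool&gt;:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var flags = new List<bool> { false, true, false };
        // Any stops iterating as soon as the predicate returns true
        Console.WriteLine(flags.Any(f => f)); // True
    }
}
```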

Iterator issue on yield IEnumerable

I wrote a program designed to create a random-ish list of numbers from a given starting point. It was a quick and dirty thing, but I found an interesting effect when playing with it that I don't quite understand.
void Main()
{
List<int> foo = new List<int>(){1,2,3};
IEnumerable<int> bar = GetNumbers(foo);
for (int i = 1; i < 3; i++)
{
foo = new List<int>(){1,2,3};
var wibble = GetNumbers(foo);
bar = bar.Concat(wibble);
}
Iterate(bar);
Iterate(bar);
}
public void Iterate(IEnumerable<int> numbers)
{
Console.WriteLine("iterating");
foreach(int number in numbers)
{
Console.WriteLine(number);
}
}
public IEnumerable<int> GetNumbers(List<int> input)
{
//This function originally did more but this is a cutdown version for testing.
while (input.Count>0)
{
int returnvalue = input[0];
input.Remove(input[0]);
yield return returnvalue;
}
}
The output of running this is:
iterating
1
2
3
1
2
3
1
2
3
iterating
That is to say, when I iterate through bar a second time immediately after the first, it is empty.
I assume this is something to do with the fact that the first time I iterate that it empties the lists that are being used to generate the list and subsequently it is using these same lists that are now empty to iterate.
My confusion is over why this is happening: why do my IEnumerables not start from their default state each time I enumerate over them? Can somebody explain what exactly I'm doing here?
And to be clear I know that I can solve this problem by adding a .ToList() to my call to GetNumbers() which forces immediate evaluation and storage of the results.
Your iterator does start from its initial state. However, it modifies the list it's reading from, and once the list is cleared, your iterator doesn't have anything left to do. Basically, consider
var list = new List<int> { 1, 2, 3 };
var enumerable = list.Where(i => i != 2);
foreach (var item in enumerable)
Console.WriteLine(item);
list.Clear();
foreach (var item in enumerable)
Console.WriteLine(item);
enumerable doesn't get changed by list.Clear();, but the results it gives do.
Your observation can be reproduced with this shorter version of the main method:
void Main()
{
List<int> foo = new List<int>(){1,2,3};
IEnumerable<int> bar = GetNumbers(foo);
Console.WriteLine(foo.Count); // prints 3
Iterate(bar);
Console.WriteLine(foo.Count); // prints 0
Iterate(bar);
}
What happens is the following:
When you call GetNumbers it isn't really being executed. It will only be executed when you iterate over the result. You can verify this by putting Console.WriteLine(foo.Count); between the call to GetNumbers and Iterate.
On the first call to Iterate, GetNumbers is executed and empties foo.
On the second call to Iterate, GetNumbers is executed again, but now foo is empty, so there is nothing left to return.
Well, lazy evaluation is what hit you. You see, when you create a yield return-style method, it's not executed immediately upon call. It will, however, be executed as soon as you iterate over the sequence.
So this means that the list won't be cleared during the call to GetNumbers, but only during Iterate. In fact, the whole body of GetNumbers is executed only during Iterate.
Your problem is that you made your IEnumerables depend not only on inner state, but on outer state as well. That outer state is the content of the foo lists.
So all the lists are filled until you Iterate the first time. (The IEnumerable created by GetNumbers holds a reference to them, so the fact that you overwrite foo doesn't matter.) All three are emptied during the first Iterate. The next iteration starts with the same inner state but changed outer state, giving a different result.
I'd like to note that mutation and depending on outer state are generally frowned upon in functional programming style. LINQ is actually a step toward functional programming, so it's a good idea to follow FP's rules. So you could do better by simply not removing the items from input in GetNumbers.
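Following that advice, GetNumbers could be written without mutating its input, so every enumeration starts from the full list. A sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Yields the values without removing them from the source,
    // so re-enumerating starts from the full list every time.
    static IEnumerable<int> GetNumbers(List<int> input)
    {
        foreach (var value in input)
            yield return value;
    }

    static void Main()
    {
        var foo = new List<int> { 1, 2, 3 };
        var bar = GetNumbers(foo);
        Console.WriteLine(bar.Count()); // 3
        Console.WriteLine(bar.Count()); // still 3 - foo was not emptied
    }
}
```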

in foreach loop, should I expect an error since query collection has changed?

For example
var query = myDic.Where(x => !blacklist.Contains(x.Key));
foreach (var item in query)
{
if (condition)
blacklist.Add(item.Key + 1); //Key is int type
ret.Add(item);
}
return ret;
would this code be valid? and how do I improve it?
Updated
I am expecting my blacklist.Add(item.Key + 1) to result in a smaller ret than otherwise. The ToList() approach won't achieve my intention in this sense.
Are there any other, better ideas that are correct and unambiguous?
That is perfectly safe to do and there shouldn't be any problems, as you're not directly modifying the collection that you are iterating over. Though you are making other changes that affect the Where clause, it's not going to blow up on you.
The query (as written) is lazily evaluated so blacklist is updated as you iterate through the collection and all following iterations will see any newly added items in the list as it is iterated.
The above code is effectively the same as this:
foreach (var item in myDic)
{
if (!blacklist.Contains(item.Key))
{
if (condition)
blacklist.Add(item.key + 1);
}
}
So what you should get out of this is that as long as you are not directly modifying the collection that you are iterating over (the collection named after "in" in the foreach statement), what you are doing is safe.
If you're still not convinced, consider this and what would be written out to the console:
var blacklist = new HashSet<int>(Enumerable.Range(3, 100));
var query = Enumerable.Range(2, 98).Where(i => !blacklist.Contains(i));
foreach (var item in query)
{
Console.WriteLine(item);
if ((item % 2) == 0)
{
var value = 2 * item;
blacklist.Remove(value);
}
}
Yes. Changing a collection's internal objects is strictly prohibited when iterating over that collection.
UPDATE
I initially made this a comment, but here is a further bit of information:
I should note that my knowledge comes from experience and articles I read a long time ago. There is a chance that you can execute the code above because (I believe) the query contains references to the selected objects within blacklist. blacklist might be able to change, but not query. If you were strictly iterating over blacklist, you would not be able to add to the blacklist collection.
Your code as presented would not throw an exception. The collection being iterated (myDic) is not the collection being modified (blacklist or ret).
What will happen is that each iteration of the loop will evaluate the current item against the query predicate, which would inspect the blacklist collection to see if it contains the current item's key. This is lazily evaluated, so a change to blacklist in one iteration will potentially impact subsequent iterations, but it will not be an error. (blacklist is fully evaluated upon each iteration, its enumerator is not being held.)
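By contrast, modifying the collection that is actually being enumerated does throw; a minimal sketch:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var x in list)
                list.Add(x); // mutating the collection under its own enumerator
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Collection was modified; enumeration cannot continue.");
        }
    }
}
```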
