I have this string array of names.
string[] someArray = { "Messi", "Ronaldo", "Bale", "Neymar", "Pele" };
foreach (string item in someArray)
{
    Console.WriteLine(item);
}
When I run in debug mode with a breakpoint on the foreach line and use Step Into, string item is highlighted on each pass, as if it were being declared again. I expected only item to be highlighted. Does this mean that item is declared again on every iteration?
Note: using VS 2012.
The answer here is subtle because the question is (accidentally) unclear. Let me clarify it by breaking it apart into multiple questions.
Is the loop variable logically redeclared every time I go through a foreach loop?
As the other answer correctly notes, in C# 5 and higher, yes. In C# 1 through 4, no.
What is the observable consequence of that change?
In C# 1 there is no way to tell the difference. (And the spec is actually unclear as to where the variable is declared.)
In C# 2 the team added closures in the form of anonymous methods. If there are multiple closures that capture a loop variable in C# 2, 3 and 4 they all capture the same variable, and they all see it mutate as the loop runs. This is almost never what you want.
In C# 5 we took the breaking change to say that a new loop variable is declared every time through the loop, and so each closure captures a different variable and therefore does not observe it changing.
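To make that difference observable, here is a minimal sketch. The class and method names are made up for illustration, and the foreach result assumes a C# 5 or later compiler. It stores one closure per iteration and runs them only after the loop has finished.

```csharp
using System;
using System.Collections.Generic;

public static class LoopCaptureDemo
{
    // One closure per iteration captures the foreach loop variable.
    public static List<string> CaptureWithForeach(string[] names)
    {
        var closures = new List<Func<string>>();
        foreach (string item in names)
        {
            closures.Add(() => item); // C# 5+: a fresh 'item' each time through
        }

        var results = new List<string>();
        foreach (Func<string> closure in closures)
        {
            results.Add(closure());
        }
        return results;
    }

    // The "for" loop was not changed: all closures share one variable 'i'.
    public static List<int> CaptureWithFor(int count)
    {
        var closures = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            closures.Add(() => i);
        }

        var results = new List<int>();
        foreach (Func<int> closure in closures)
        {
            results.Add(closure());
        }
        return results;
    }

    static void Main()
    {
        // With a C# 5+ compiler: Messi, Ronaldo, Bale
        Console.WriteLine(string.Join(", ", CaptureWithForeach(new[] { "Messi", "Ronaldo", "Bale" })));
        // 3, 3, 3
        Console.WriteLine(string.Join(", ", CaptureWithFor(3)));
    }
}
```

Compiled as C# 4 or earlier, the first line would instead print the last name three times, because every closure would see the single shared variable.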
Why does the debugger go back to the loop variable declaration every time I step through the loop?
To inform you that the loop variable is being mutated. If you have
IEnumerable<string> someCollection = whatever;
foreach (string item in someCollection)
{
    Console.WriteLine(item);
}
That is logically equivalent to
IEnumerable<string> someCollection = whatever;
{
    IEnumerator<string> enumerator = someCollection.GetEnumerator();
    try
    {
        while (enumerator.MoveNext())
        {
            string item; // In C# 4.0 and before, this is outside the while
            item = enumerator.Current;
            {
                Console.WriteLine(item);
            }
        }
    }
    finally
    {
        if (enumerator != null)
            ((IDisposable)enumerator).Dispose();
    }
}
If you were debugging around that expanded version of the code, you would see the debugger highlight the invocations of MoveNext and Current. Well, the debugger does the same thing in the compact form; when MoveNext is about to be called the debugger highlights the in, for lack of anything better to highlight. When Current is about to be called it highlights item. Again, what would be the better thing to highlight?
So the fact that the debugger highlights the variable has nothing to do with whether the variable is redeclared?
Correct. It highlights the variable when Current is about to be called, because that will mutate the variable. It mutates the variable regardless of whether it is a new variable, in C# 5, or the same variable as before, in C# 4.
I noticed that you changed my example to use a generic sequence rather than an array. Why did you do that?
Because if the C# compiler knows that the collection in the foreach is an array, it actually just generates a good old fashioned for loop rather than calling MoveNext and Current and all that. Doing so is typically faster and produces less collection pressure, so it's a win all around.
However, the C# team wants the debugging experience to feel the same regardless. So again, the code generator creates stepping points on in -- this time where the generated for's loop variable is mutated -- and again, on the code which mutates the foreach loop variable. That way the debugging experience is consistent.
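As a rough illustration of the array case, the following hand-written sketch approximates what the compiler might generate. It is not the actual generated code, and the names array, index and PrintAll are invented for the example.

```csharp
using System;

public static class ArrayForeachSketch
{
    // Approximates: foreach (string item in someArray) Console.WriteLine(item);
    public static int PrintAll(string[] someArray)
    {
        string[] array = someArray; // the array expression is evaluated once
        int printed = 0;

        // Checking and advancing 'index' corresponds to the stepping point on "in".
        for (int index = 0; index < array.Length; index++)
        {
            // Assigning the element corresponds to the stepping point on "item".
            string item = array[index];
            Console.WriteLine(item);
            printed++;
        }
        return printed;
    }

    static void Main()
    {
        PrintAll(new[] { "Messi", "Ronaldo", "Bale", "Neymar", "Pele" });
    }
}
```

No enumerator object is allocated here, which is where the reduced collection pressure comes from.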
It depends on which version of C# you are targeting. In C# 5, it's a new variable. Before that, the variable was reused.
See Eric Lippert's blog post here: http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx
From the post:
UPDATE: We are taking the breaking change. In C# 5, the loop variable
of a foreach will be logically inside the loop, and therefore closures
will close over a fresh copy of the variable each time. The "for" loop
will not be changed. We return you now to our original article.
Related
It's well known that the mutation of a collection within an iteration loop is not allowed. The runtime will throw an exception when, for instance, an item is removed.
However, today I was surprised to notice that there's no exception if the mutating operation is followed by any exit-loop statement. That is, the loop ends.
//this won't throw!
var coll = new List<int>(new[] { 1, 2, 3 });
foreach (var item in coll)
{
    coll.RemoveAt(1);
    break;
}
I looked at the framework code, and it's pretty clear that the exception is thrown only when the iterator is moved forward.
My question is: could the above "pattern" be considered an acceptable practice, or is there some sneaky problem with using it?
In the example you gave, and to answer the question "Is it good practice/acceptable to modify a collection during enumeration if you break after the change", it is fine to do this, as long as you are aware of the side-effects.
The primary side effect, and why I would not recommend doing this in the general case, is that the rest of your foreach loop doesn't execute. I would consider that a problem in almost every instance of a foreach that I have used.
In most (if not all) instances where you could get away with this, a simple if check would suffice (as Servy has in his answer), so you may want to look at what other options you have available if you find yourself writing this kind of code a lot.
The most common general solution is to add to a "kill" list, and then remove after your iteration:
List<int> killList = new List<int>();
foreach (int i in coll)
{
    if (i < 0)
        killList.Add(i);
    ...
}
foreach (int i in killList)
    coll.Remove(i);
There are various ways to make this code shorter, but this is the most explicit way of doing it.
You can also iterate backwards with an indexed for loop, which won't cause the exception to be thrown because no enumerator is involved. This is a neat workaround, but you may want to add a comment explaining why you are iterating backwards.
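For comparison with the kill list, here is a small sketch of the backwards loop and of one shorter alternative, List<T>.RemoveAll. The helper names are made up, and the "negative numbers" condition is simply carried over from the kill list example.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveWhileLoopingDemo
{
    // Iterating backwards: removing index i only shifts items after i,
    // which have already been visited, so nothing is skipped.
    public static List<int> RemoveNegativesBackwards(List<int> coll)
    {
        for (int i = coll.Count - 1; i >= 0; i--)
        {
            if (coll[i] < 0)
                coll.RemoveAt(i);
        }
        return coll;
    }

    // Shorter form: let the list remove everything matching a predicate.
    public static List<int> RemoveNegativesWithRemoveAll(List<int> coll)
    {
        coll.RemoveAll(x => x < 0);
        return coll;
    }

    static void Main()
    {
        // 1, 3
        Console.WriteLine(string.Join(", ", RemoveNegativesBackwards(new List<int> { 1, -2, 3, -4 })));
        // 1, 3
        Console.WriteLine(string.Join(", ", RemoveNegativesWithRemoveAll(new List<int> { 1, -2, 3, -4 })));
    }
}
```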
So your example can be relied on to work, for starters. Mutating a collection while iterating fails when you go to ask for the next item. Since this provably never asks for another item after it mutates the list, we know that won't happen. Of course, the fact that it works doesn't mean that it's clear, or that it's a good idea to use it.
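A quick way to check this for List<int> is to run both variants and see which one throws. This is only a sketch with invented method names; other collection types may behave differently.

```csharp
using System;
using System.Collections.Generic;

public static class ModifyDuringEnumerationDemo
{
    // Mutates the list, then breaks: the enumerator is never advanced again.
    public static bool RemoveThenBreakThrows()
    {
        var coll = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int item in coll)
            {
                coll.RemoveAt(1);
                break;
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Mutates the list and keeps looping: the next MoveNext notices the change.
    public static bool RemoveThenContinueThrows()
    {
        var coll = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int item in coll)
            {
                coll.RemoveAt(1);
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(RemoveThenBreakThrows());    // False
        Console.WriteLine(RemoveThenContinueThrows()); // True
    }
}
```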
What this is trying to do is remove the second item if there is an item to remove. It is designed to not break when trying to remove an item from a collection without two items. This is not a well designed way of doing that though; it's confusing to the readers and doesn't effectively convey its intentions. A much clearer method of accomplishing the same goal is something like the following:
if (coll.Count > 1)
    coll.RemoveAt(1);
In the more general case, such a Remove in a foreach can only ever be used to remove one item, so for those cases you're better off transforming the foreach into an if that validates that there is an item to remove (if needed, as it is here), and then a call to remove that single item (which may involve a query to find the item to remove, instead of using a hard-coded index).
I've always been confused about this one. Consider the following loops:
int [] list = new int [] { 1, 2, 3 };
for (int i=0; i < list.Length; i++) { }
foreach (int i in list) { }
while (list.GetEnumerator().MoveNext()) { } // Yes, yes you wouldn't call GetEnumerator with the while. Actually never tried that.
The [list] above is hard-coded. If the list was changed externally while the loop was going through iterations, what would happen?
What if the [list] was a read-only property, e.g. int[] List { get { return new int[] { 1, 2, 3 }; } }? Would this upset the loop? If not, would it create a new instance in each iteration?
Well:
The for loop checks against list.Length on each iteration; you don't actually access list within the loop, so the contents would be irrelevant.
The foreach loop only uses list to get the iterator; changing it to refer to a different list would make no difference, but if you modified the list itself structurally (e.g. by adding a value, if this were really a List<int> instead of an int[]), that would invalidate the iterator.
Your third example as written would go on forever unless the list were cleared, given that it'll get a new iterator each time. If you want a more sensible explanation, post more sensible code.
Fundamentally you need to understand the difference between the contents of an array vs changing which object a variable refers to - and then give us a very concrete situation to explain. In general though, a foreach loop only directly touches the source expression once, when it fetches the iterator - whereas a for loop has no magic in it, and how often it's accessed simply depends on the code - both the condition and the "step" part of the for loop are executed on each iteration, so if you refer to the variable in either of those parts, you'll see any changes...
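To make the property case concrete, here is a sketch that counts how often the getter runs. The Numbers property and the counter are hypothetical stand-ins for the List property in the question.

```csharp
using System;

public static class SourceEvaluationDemo
{
    static int getterCalls;

    // Returns a new array on every access, like the property in the question.
    static int[] Numbers
    {
        get
        {
            getterCalls++;
            return new int[] { 1, 2, 3 };
        }
    }

    // The for condition runs before every iteration, including the final failing check.
    public static int GetterCallsInFor()
    {
        getterCalls = 0;
        for (int i = 0; i < Numbers.Length; i++) { }
        return getterCalls;
    }

    // foreach evaluates the source expression once, before the first iteration.
    public static int GetterCallsInForeach()
    {
        getterCalls = 0;
        foreach (int i in Numbers) { }
        return getterCalls;
    }

    static void Main()
    {
        Console.WriteLine(GetterCallsInFor());     // 4
        Console.WriteLine(GetterCallsInForeach()); // 1
    }
}
```

So the property does not upset either loop, but the for loop creates a new array on every condition check, while the foreach creates exactly one.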
Suppose we have the following code:
void AFunction()
{
    foreach (AClass i in AClassCollection)
    {
        listOfLambdaFunctions.AddLast( () => { PrintLine(i.name); } );
    }
}
void Main()
{
    AFunction();
    foreach (var i in listOfLambdaFunctions)
        i();
}
One might think that the above code would output the same as the following:
void Main()
{
    foreach (AClass i in AClassCollection)
        PrintLine(i.name);
}
However, it doesn't. Instead, it prints the name of the last item in AClassCollection every time.
It appears as if the same item was being used in each lambda function. I suspect there might be some delay from when the lambda was created to when the lambda took a snapshot of the external variables used in it.
Essentially, the lambda is holding a reference to the local variable i, instead of taking a "snapshot" of i's value when the lambda was created.
To test this theory, I tried this code:
string astr = "a string";
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(astr); };
astr = "changed";
fnc();
and, surprise, it outputs changed!
I am using XNA 3.1, and whichever version of C# that comes with it.
My questions are:
What is going on?
Does the lambda function somehow store a 'reference' to the variable or something?
Is there any way around this problem?
This is a modified closure
See: similar questions like Access to Modified Closure
To work around the issue you have to store a copy of the variable inside the scope of the foreach loop:
foreach (AClass i in AClassCollection)
{
    AClass anotherI = i;
    listOfLambdaFunctions.AddLast( () => { PrintLine(anotherI.name); } );
}
does the lambda function somehow store a 'reference' to the variable or something?
Close. The lambda function captures the variable itself. There is no need to store a reference to a variable, and in fact, in .NET it is impossible to permanently store a reference to a variable. You just capture the entire variable. You never capture the value of the variable.
Remember, a variable is a storage location. The name "i" refers to a particular storage location, and in your case, it always refers to the same storage location.
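One way to see that the variable itself is captured, rather than its value, is to let two lambdas share a local. This is a minimal sketch with invented names:

```csharp
using System;

public static class SharedVariableDemo
{
    public static int IncrementTwiceThenRead()
    {
        int counter = 0;

        // Both lambdas capture the same storage location named 'counter'.
        Action increment = () => counter++;
        Func<int> read = () => counter;

        increment();
        increment();

        // 'read' observes the writes made through 'increment'.
        return read();
    }

    static void Main()
    {
        Console.WriteLine(IncrementTwiceThenRead()); // 2
    }
}
```

If lambdas captured values, read would return 0 here.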
Is there any way around this problem?
Yes. Create a new variable every time through the loop. The closure then captures a different variable every time.
This is one of the most frequently reported problems with C#. We're considering changing the semantics of the loop variable declaration so that a new variable is created every time through the loop.
For more details on this issue see my articles on the subject:
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/
what is going on? does the lambda function somehow store a 'reference' to the variable or something?
Yes, exactly that; C# lambdas capture the variable itself, not the value of the variable. You can usually get around this by introducing a temp variable and binding to that:
string astr = "a string";
var tmp = astr;
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(tmp); };
especially in foreach where this is notorious.
Yes, the lambda stores a reference to the variable (conceptually speaking, anyway).
A very simple workaround is this:
foreach (AClass i in AClassCollection)
{
    AClass j = i;
    listOfLambdaFunctions.AddLast( () => { PrintLine(j.name); } );
}
In every iteration of the foreach loop, a new j gets created, which the lambda captures.
i on the other hand, is the same variable throughout, but gets updated with every iteration (so all the lambdas end up seeing the last value)
And I agree that this is a bit surprising. :)
I've been caught by this one as well; as Calgary Coder said, it is a modified closure. I really had trouble spotting them until I got ReSharper. Since it is one of the warnings that ReSharper watches for, I am much better at identifying them as I code.
Why is a foreach loop a read only loop? What reasons are there for this?
I'm not sure exactly what you mean by a "readonly loop" but I'm guessing that you want to know why this doesn't compile:
int[] ints = { 1, 2, 3 };
foreach (int x in ints)
{
    x = 4;
}
The above code will give the following compile error:
Cannot assign to 'x' because it is a 'foreach iteration variable'
Why is this disallowed? Trying to assign to it probably wouldn't do what you want: it wouldn't modify the contents of the original collection. This is because the variable x is not a reference to the element in the collection; it is a copy of it. To avoid people writing buggy code, the compiler disallows this.
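If the goal really is to overwrite the elements, an indexed for loop writes to the array slots directly. This is a small sketch, and SetAllTo is a made-up helper name.

```csharp
using System;

public static class ElementAssignmentDemo
{
    // Writes through the index, so the array itself is modified.
    public static int[] SetAllTo(int[] ints, int value)
    {
        for (int i = 0; i < ints.Length; i++)
        {
            ints[i] = value;
        }
        return ints;
    }

    static void Main()
    {
        int[] ints = { 1, 2, 3 };
        SetAllTo(ints, 4);
        Console.WriteLine(string.Join(", ", ints)); // 4, 4, 4
    }
}
```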
I would assume it's how the iterator travels through the list.
Say you have a sorted list:
Alaska
Nebraska
Ohio
In the middle of
foreach(var s in States)
{
}
You do a States.Add("Missouri")
How do you handle that? Do you then jump to Missouri even if you're already past that index?
If, by this, you mean:
Why shouldn't I modify the collection that's being foreach'd over?
There's no surety that the items that you're getting come out in a given order, and that adding an item, or removing an item won't cause the order of items in the collection to change, or even the Enumerator to become invalid.
Imagine if you ran the following code:
var items = GetListOfTOfSomething(); // Returns 10 items
int i = 0;
foreach (var item in items)
{
    i++;
    if (i == 5)
    {
        items.Remove(item);
    }
}
As soon as you hit the iteration where i is 6 (i.e. after the item is removed), anything could happen. The enumerator might have been invalidated by the removal, or everything might have "shuffled up by one" in the underlying collection so that an item takes the place of the removed one, meaning you "skip" one.
If you meant "why can't I change the value that is provided on each iteration" then, if the collection you're working with contains value types, any changes you make won't be preserved as it's a value you're working with, rather than a reference.
The foreach statement uses the IEnumerable interface to loop through the collection. The interface only defines methods for stepping through a collection and getting the current item; there are no methods for updating the collection.
As the interface only defines the minimal methods required to read the collection in one direction, the interface can be implemented by a wide range of collections.
As you only access a single item at a time, the entire collection doesn't have to exist at the same time. This is for example used by LINQ expressions, where it creates the result on the fly as you read it, instead of first creating the entire result and then let you loop through it.
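The same on-the-fly behaviour can be sketched with a plain iterator method (yield return) instead of a LINQ query. The Squares method and the produced counter are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LazySequenceDemo
{
    static int produced;

    // Each value is computed only when the consumer asks for the next item.
    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            produced++;
            yield return i * i;
        }
    }

    // Stops early and reports how many items were actually generated.
    public static int ProducedBeforeBreak()
    {
        produced = 0;
        foreach (int square in Squares(1000))
        {
            if (square > 10)
                break;
        }
        return produced;
    }

    static void Main()
    {
        Console.WriteLine(ProducedBeforeBreak()); // 4 (1, 4, 9, 16)
    }
}
```

Only four of the thousand possible values are ever created, because the loop exits as soon as it sees 16.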
Not sure what you mean with read-only but I'm guessing that understanding what the foreach loop is under the hood will help. It's syntactic sugar and could also be written something like this:
IEnumerator<T> enumerator = list.GetEnumerator(); // list is an IEnumerable<T>
while (enumerator.MoveNext())
{
    T element = enumerator.Current;
    //body goes here
}
If you change the collection (list) during the loop, it becomes hard, if not impossible, to figure out how the iteration should proceed.
Assigning to element (in the foreach version) could be viewed as either trying to assign to enumerator.Current which is read only or trying to change the value of the local holding a ref to enumerator.Current in which case you might as well introduce a local yourself because it no longer has anything to do with the enumerated list anymore.
foreach works with everything implementing the IEnumerable interface. In order to avoid synchronization issues, the enumerable shall never be modified while iterating on it.
The problems arise if you add or remove items in another thread while iterating: depending on where you are, you might miss an item or apply your code to an extra item. The runtime detects this (in some cases, or all?) and throws an exception:
System.InvalidOperationException was unhandled
Message="Collection was modified; enumeration operation may not execute."
foreach tries to get next item on each iteration which can cause trouble if you are modifying it from another thread at the same time.
Today I wrote a function that uses two nested foreach loops. After seeing that it did not work as expected, I debugged it. But I don't see an error, and I don't think a simple error can cause the behavior I have noticed.
The part looks like this:
foreach (MyClass cItem in checkedListBoxItemList.Items)
{
    foreach (MyClass cActiveItem in ActiveItemList)
    {
        if (cActiveItem.ID == cItem.ID) /*...check checkbox for item...*/;
    }
}
Let's say checkedListBoxItemList.Items holds 4 items of type MyClass, and ActiveItemList is a List<MyClass> with 2 items.
The debugger steps into the outer foreach, reaches the inner foreach, executes the if 2 times (once per cActiveItem) and reaches the end of the outer foreach. Now the debugger jumps back to the head of the outer foreach, as it should. But instead of starting the second round of the outer foreach, the debugger suddenly jumps into the MyClass.ToString() method.
I can step through this method 4 times (number of items in checkedListBoxItemList.Items)
and then ... nothing. Visual Studio shows me my windows form, and the foreach is not continued.
When changing the code to
int ListCount = checkedListBoxItemList.Items.Count;
for (int i = 0; i < ListCount; i++)
{
    MyClass cItem = checkedListBoxItemList.Items[i] as MyClass;
    foreach (MyClass cActiveItem in ActiveItemList)
    {
        if (cActiveItem.ID == cItem.ID) /*...check checkbox for item...*/;
    }
}
everything works as expected.
I showed the problem to a colleague, but he didn't understand what happened either. I don't understand why the debugger jumps into the MyClass.ToString() method. I used F10 to step through, so there was no need to leave the function. And even if there is a reason, why isn't the foreach loop continued?
I'm using Visual Studio 2010, if that matters.
Please tell me what happened. Thanks.
When iterating a collection (using a foreach), the collection you are iterating is not allowed to change; but when you check the checkbox for the matching item, the collection of the outer loop (checkedListBoxItemList.Items) changes and the consequent error that is thrown is probably swallowed somewhere. That more or less explains why you suddenly go into the ToString method and don't continue the loop.
When you use a for statement to iterate, you don't have that restriction, because no enumerator is created over the collection when you start iterating; each pass simply reads the item at the current index.
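If you would rather keep a foreach, another common option is to enumerate a snapshot of the collection and modify the original. The sketch below uses a plain List<int> rather than the CheckedListBox from the question, and the method name is invented, so treat it as an illustration of the idea only.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotIterationDemo
{
    // Enumerates a copy, so changes to 'items' cannot invalidate the enumerator.
    public static List<int> RemoveEvensWhileIterating(List<int> items)
    {
        foreach (int item in new List<int>(items))
        {
            if (item % 2 == 0)
                items.Remove(item);
        }
        return items;
    }

    static void Main()
    {
        // 1, 3
        Console.WriteLine(string.Join(", ", RemoveEvensWhileIterating(new List<int> { 1, 2, 3, 4 })));
    }
}
```

The copy costs one extra allocation, which is usually negligible for a handful of list box items.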
Hope this explains.