Lambda capture problem with iterators? - c#

Apologies if this question has been asked already, but suppose we have this code (I've run it with Mono 2.10.2 and compiled with gmcs 2.10.2.0):
using System;
public class App {
public static void Main(string[] args) {
Func<string> f = null;
var strs = new string[]{
"foo",
"bar",
"zar"
};
foreach (var str in strs) {
if ("foo".Equals(str))
f = () => str;
}
Console.WriteLine(f()); // [1]: Prints 'zar'
foreach (var str in strs) {
var localStr = str;
if ("foo".Equals(str))
f = () => localStr;
}
Console.WriteLine(f()); // [2]: Prints 'foo'
{ int i = 0;
for (string str; i < strs.Length; ++i) {
str = strs[i];
if ("foo".Equals(str))
f = () => str;
}}
Console.WriteLine(f()); // [3]: Prints 'zar'
}
}
It seems logical that [1] print the same as [3]. But to be honest, I somehow expected it to print the same as [2]. I somehow believed the implementation of [1] would be closer to [2].
Question: Could anyone please provide a reference to the specification that describes exactly how the str variable (or perhaps even the iterator) is captured by the lambda in [1]?
I guess what I am looking for is the exact implementation of the foreach loop.

You asked for a reference to the specification; the relevant location is section 8.8.4, which states that a "foreach" loop is equivalent to:
V v;
while (e.MoveNext()) {
v = (V)(T)e.Current;
embedded-statement
}
Note that the variable v is declared outside the while loop, and therefore there is a single loop variable. That is then closed over by the lambda.
UPDATE
Because so many people run into this problem the C# design and compiler team changed C# 5 to have these semantics:
while (e.MoveNext()) {
V v = (V)(T)e.Current;
embedded-statement
}
Which then has the expected behaviour -- you close over a different variable every time. Technically that is a breaking change, but the number of people who depend on the weird behaviour you are experiencing is hopefully very small.
Be aware that C# 2, 3, and 4 are now incompatible with C# 5 in this regard. Also note that the change only applies to foreach, not to for loops.
See http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/ for details.
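The difference between the two loop forms can be sketched as follows. This is a minimal illustration, assuming a C# 5 or later compiler; the class and method names are hypothetical and chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of any library.
public static class LoopCaptureDemo
{
    // foreach: with a C# 5+ compiler each iteration gets a fresh variable v,
    // so every lambda keeps the value of its own iteration.
    public static int[] CaptureInForeach(int[] values)
    {
        var funcs = new List<Func<int>>();
        foreach (var v in values)
            funcs.Add(() => v);
        return funcs.Select(f => f()).ToArray();
    }

    // for: a single variable i is shared by all iterations, so every lambda
    // sees the value i has when the lambda is finally invoked.
    public static int[] CaptureInFor(int[] values)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < values.Length; i++)
            funcs.Add(() => i);
        return funcs.Select(f => f()).ToArray();
    }
}
```

Called with new[] { 10, 20, 30 }, CaptureInForeach returns 10, 20, 30, while CaptureInFor returns 3, 3, 3, because i is 3 by the time the lambdas run. With a C# 2, 3 or 4 compiler the foreach version would return 30, 30, 30 instead.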
Commenter abergmeier states:
C# is the only language that has this strange behavior.
This statement is categorically false. Consider the following JavaScript:
var funcs = [];
var results = [];
for(prop in { a : 10, b : 20 })
{
funcs.push(function() { return prop; });
results.push(funcs[0]());
}
abergmeier, would you care to take a guess as to what are the contents of results?

The core difference between 1 / 3 and 2 is the lifetime of the variable which is being captured. In 1 and 3 the lambda is capturing the iteration variable str. In both for and foreach loops there is one iteration variable for the lifetime of the loop. When the lambda is executed at the end of the loop it executes with the final value: zar
In 2 you are capturing a local variable whose lifetime is a single iteration of the loop. Hence you capture the value at that time, which is "foo".
The best reference I can point you to is Eric's blog post on the subject
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/

The following happens in loop 1 and 3:
The current value is assigned to the variable str. It is always the same variable, just with a different value in each iteration. This variable is captured by the lambda. As the lambda is executed after the loop finishes, it has the value of the last element in your array.
The following happens in loop 2:
The current value is assigned to a new variable localStr. It is always a new variable that gets the value assigned. This new variable is captured by the lambda. Because the next iteration of the loop creates a new variable, the value of the captured variable is not changed and because of that it outputs "foo".

For the people arriving from Google
I've fixed lambda bug using this approach:
I have changed this
for(int i=0;i<9;i++)
btn.OnTap += () => { ChangeCurField(i * 2); };
to this
for(int i=0;i<9;i++)
{
int numb = i * 2;
btn.OnTap += () => { ChangeCurField(numb); };
}
This gives each lambda its own "numb" variable, and its value is computed in that iteration of the loop rather than later, when the lambda is actually called.
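A self-contained sketch of the same fix, with the button and ChangeCurField replaced by a plain list of handlers so the effect can be observed. The class name, method name and the copyPerIteration flag are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class standing in for the button handlers above.
public static class ForLoopCopyDemo
{
    // Registers three handlers in a for loop, invokes them after the loop,
    // and returns the values each handler observed.
    public static List<int> InvokeAll(bool copyPerIteration)
    {
        var handlers = new List<Action>();
        var seen = new List<int>();
        for (int i = 0; i < 3; i++)
        {
            if (copyPerIteration)
            {
                int numb = i * 2;                     // new variable per iteration
                handlers.Add(() => seen.Add(numb));
            }
            else
            {
                handlers.Add(() => seen.Add(i * 2));  // captures the shared i
            }
        }
        foreach (var handler in handlers)
            handler();
        return seen;
    }
}
```

InvokeAll(true) yields 0, 2, 4. InvokeAll(false) yields 6, 6, 6, because all three lambdas read the same i, which is 3 once the loop has ended.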

Related

Why isn't assign on a foreach iteration variable in Array.ForEach an error?

Given an array of int numbers like:
int[] arr = new int[] { 0, 1, 2, 3, 4, 5 };
If we want to increment every number by 1, the best choice would be:
for(int i = 0; i < arr.Length; i++)
{
arr[i]++;
}
If we try to do it using foreach
foreach(int n in arr)
{
n++;
}
as expected, we meet the error:
Cannot assign to 'n' because it is a 'foreach iteration variable'
Why if we use this approach:
Array.ForEach(arr, (n) => {
n++;
});
which is equivalent to the foreach above, do Visual Studio and the compiler say nothing? The code compiles, has no visible effect at run time, and does not throw an exception either.
foreach(int n in arr)
{
n++;
}
This is a language construct, the compiler knows exactly what a foreach-loop is supposed to do and what n is. It can therefore prevent you from changing the iteration variable n.
Array.ForEach(arr, (n) => {
n++;
});
This is a regular function call passing in a lambda. It is perfectly valid to modify local variables in a function (or lambda), so changing n is okay. While the compiler could warn you that the increment has no effect as it's never been used afterwards, it's valid code, and just because the function is called ForEach and actually does something similar to the foreach-loop doesn't change the fact that this is a regular function and a regular lambda.
As pointed out by @tkausl, n with ForEach is a local variable. Therefore:
static void Main()
{
int[] arr = new int[] { 0, 1, 2, 3, 4, 5 };
Console.WriteLine(string.Join(" ",arr));
Array.ForEach(arr, (n) => {
n++;
});
Console.WriteLine(string.Join(" ",arr));
}
will output:
0 1 2 3 4 5
0 1 2 3 4 5
Meaning you don't change the values of arr.
Array.ForEach is not identical to a foreach loop. It's a static method on the Array class which iterates an array and performs an action on each of its elements.
Array.ForEach(arr, (n) => {
n++;
});
however won't modify the actual collection; it just assigns a new value to n, which has no relation to the underlying value in the array, because n is a value type that is copied into the anonymous method. So whatever you do with the parameter in your anonymous method isn't reflected back to the ForEach method and thus has no effect on your array. This is why you can do this.
Even if you had an array of reference types the result would be the same when you simply assign a new instance to the provided parameter, which again has no effect on the underlying array.
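The distinction between re-assigning the parameter and changing the object it refers to can be sketched like this. The Box class and the method names are hypothetical, introduced only for this example; Array.ForEach itself is the real static method.

```csharp
using System;

// Hypothetical demo types for illustration only.
public static class ForEachParameterDemo
{
    public sealed class Box { public int Value; }

    // n is a copy of each int, so incrementing it leaves the array unchanged.
    public static int[] IncrementCopies(int[] arr)
    {
        Array.ForEach(arr, n => { n++; });
        return arr;
    }

    // b is a copy of each reference; pointing it at a new Box does not
    // replace the element stored in the array.
    public static Box[] ReassignParameter(Box[] boxes)
    {
        Array.ForEach(boxes, b => { b = new Box { Value = -1 }; });
        return boxes;
    }

    // Mutating the object that b refers to IS visible through the array,
    // because the array element and b refer to the same Box.
    public static Box[] MutateThroughReference(Box[] boxes)
    {
        Array.ForEach(boxes, b => b.Value++);
        return boxes;
    }
}
```

Only the last method changes what a caller sees: the array still holds the same Box objects, but their Value fields have been incremented.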
Take a look at this simplified example:
class MyClass
{
void ForEach(Action<Item> a)
{
foreach(var e in myList)
a(e); // invoke the delegate that was passed in, with a copy of e
}
}
In your case the action looks like this:
x => x++
which simply assigns a new value to x. As x however is passed by value, this won't have any effect on the calling method and thus on myList.
Both are two different things.
First we need to be clear about what we need. If the requirement is to mutate the existing values, then you can use a for loop, because modifying the values while enumerating the collection shouldn't be done; that's why you get the error for the first foreach loop.
So one approach could be if mutating is the intention:
for(int i=0; i< arr.Length; i++)
{
arr[i] = arr[i] +1;
}
Secondly, if the intention is to get a new collection with the updated values, then consider using the LINQ Select method, which will return a new sequence of int.
var incrementedArray = arr.Select( x=> (x+1));
EDIT:
The key difference is that in the first example we are modifying the values of the collection while enumerating it, whereas with the lambda-based ForEach a delegate is used, which receives its input as a local variable.
The foreach statement executes a statement or a block of statements for each element in an instance of the type that implements the System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> interface. You cannot modify the iterated value because you are going through the System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> interfaces, which support deferred execution.
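If you want to modify the values in place, C# 7.3 and later also allow a ref iteration variable when iterating a Span<int> over the array:
foreach(ref int n in arr.AsSpan())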
{
n++;
}
Updated
Array.ForEach is a method that performs the specified action on each element of the specified array. It executes immediately and can only be applied to data held in memory. The Array.ForEach method takes an array and uses a for loop to iterate through it.
foreach and Array.ForEach look the same but work differently.

lifetime of local variable inside an Action in c# [duplicate]

What is a closure? Do we have them in .NET?
If they do exist in .NET, could you please provide a code snippet (preferably in C#) explaining it?
I have an article on this very topic. (It has lots of examples.)
In essence, a closure is a block of code which can be executed at a later time, but which maintains the environment in which it was first created - i.e. it can still use the local variables etc of the method which created it, even after that method has finished executing.
The general feature of closures is implemented in C# by anonymous methods and lambda expressions.
Here's an example using an anonymous method:
using System;
class Test
{
static void Main()
{
Action action = CreateAction();
action();
action();
}
static Action CreateAction()
{
int counter = 0;
return delegate
{
// Yes, it could be done in one statement;
// but it is clearer like this.
counter++;
Console.WriteLine("counter={0}", counter);
};
}
}
Output:
counter=1
counter=2
Here we can see that the action returned by CreateAction still has access to the counter variable, and can indeed increment it, even though CreateAction itself has finished.
If you are interested in seeing how C# implements Closure read "I know the answer (its 42) blog"
The compiler generates a class in the background to encapsulate the anonymous method and the variable j
[CompilerGenerated]
private sealed class <>c__DisplayClass2
{
public <>c__DisplayClass2();
public void <fillFunc>b__0()
{
Console.Write("{0} ", this.j);
}
public int j;
}
for the function:
static void fillFunc(int count) {
for (int i = 0; i < count; i++)
{
int j = i;
funcArr[i] = delegate()
{
Console.Write("{0} ", j);
};
}
}
Turning it into:
private static void fillFunc(int count)
{
for (int i = 0; i < count; i++)
{
Program.<>c__DisplayClass1 class1 = new Program.<>c__DisplayClass1();
class1.j = i;
Program.funcArr[i] = new Func(class1.<fillFunc>b__0);
}
}
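The same transformation can be written by hand, which makes it easier to see why a captured local survives the method that declared it. This is only a sketch of the idea: the CounterClosure class and the method names are hypothetical, and the real compiler-generated names and layout differ.

```csharp
using System;

// Hypothetical hand-written equivalent of a compiler-generated closure class.
public static class ManualClosureDemo
{
    // Stand-in for the "display class": the captured local becomes a field,
    // and the anonymous method becomes an instance method.
    private sealed class CounterClosure
    {
        public int counter;

        public int Increment()
        {
            counter++;
            return counter;
        }
    }

    // What the compiler roughly does for a lambda that captures a local.
    public static Func<int> CreateCounterByHand()
    {
        var closure = new CounterClosure();
        closure.counter = 0;
        return closure.Increment;   // delegate bound to the heap object
    }

    // The source-level version that leads to such a class.
    public static Func<int> CreateCounterWithLambda()
    {
        int counter = 0;
        return () => ++counter;
    }
}
```

Both factories return a delegate that yields 1, then 2, and so on, and each call to a factory produces an independent counter because each call allocates its own closure object.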
Closures are functional values that hold onto variable values from their original scope. C# can use them in the form of anonymous delegates.
For a very simple example, take this C# code:
delegate int testDel();
static void Main(string[] args)
{
int foo = 4;
testDel myClosure = delegate()
{
return foo;
};
int bar = myClosure();
}
At the end of it, bar will be set to 4, and the myClosure delegate can be passed around to be used elsewhere in the program.
Closures can be used for a lot of useful things, like delayed execution or to simplify interfaces - LINQ is mainly built using closures. The most immediate way it comes in handy for most developers is adding event handlers to dynamically created controls - you can use closures to add behavior when the control is instantiated, rather than storing data elsewhere.
Func<int, int> GetMultiplier(int a)
{
return delegate(int b) { return a * b; } ;
}
//...
var fn2 = GetMultiplier(2);
var fn3 = GetMultiplier(3);
Console.WriteLine(fn2(2)); //outputs 4
Console.WriteLine(fn2(3)); //outputs 6
Console.WriteLine(fn3(2)); //outputs 6
Console.WriteLine(fn3(3)); //outputs 9
A closure is an anonymous function passed outside of the function in which it is created.
It maintains any variables from the function in which it is created that it uses.
A closure is when a function is defined inside another function (or method) and it uses the variables from the parent method. This use of variables which are located in a method and wrapped in a function defined within it, is called a closure.
Mark Seemann has some interesting examples of closures in his blog post, where he draws a parallel between OOP and functional programming.
And to make it more detailed
var workingDirectory = new DirectoryInfo(Environment.CurrentDirectory);//when this variable
Func<int, string> read = id =>
{
var path = Path.Combine(workingDirectory.FullName, id + ".txt");//is used inside this function
return File.ReadAllText(path);
};//the entire process is called a closure.
Here is a contrived example for C# which I created from similar code in JavaScript:
public delegate T Iterator<T>() where T : class;
public Iterator<T> CreateIterator<T>(IList<T> x) where T : class
{
var i = 0;
return delegate { return (i < x.Count) ? x[i++] : null; };
}
So, here is some code that shows how to use the above code...
var iterator = CreateIterator(new string[3] { "Foo", "Bar", "Baz"});
// So, although CreateIterator() has been called and returned, the variable
// "i" within CreateIterator() will live on because of a closure created
// within that method, so that every time the anonymous delegate returned
// from it is called (by calling iterator()) its value will increment.
string currentString;
currentString = iterator(); // currentString is now "Foo"
currentString = iterator(); // currentString is now "Bar"
currentString = iterator(); // currentString is now "Baz"
currentString = iterator(); // currentString is now null
Hope that is somewhat helpful.
Closures are chunks of code that reference a variable outside themselves (from below them on the stack), and that might be called or executed later (like when an event or delegate is defined, and could get called at some indefinite future point in time). Because the outside variable that the chunk of code references may have gone out of scope (and would otherwise have been lost), the fact that it is referenced by the chunk of code (called a closure) tells the runtime to "hold" that variable in scope until it is no longer needed by the closure.
Basically closure is a block of code that you can pass as an argument to a function. C# supports closures in form of anonymous delegates.
Here is a simple example:
List.Find method can accept and execute piece of code (closure) to find list's item.
// Passing a block of code as a function argument
List<int> ints = new List<int> {1, 2, 3};
ints.Find(delegate(int value) { return value == 1; });
Using C#3.0 syntax we can write this as:
ints.Find(value => value == 1);
If you write an inline anonymous method (C#2) or (preferably) a Lambda expression (C#3+), an actual method is still being created. If that code is using an outer-scope local variable - you still need to pass that variable to the method somehow.
e.g. take this Linq Where clause (which is a simple extension method which passes a lambda expression):
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
// this is a predicate, i.e. a Func<T, bool> written as a lambda expression
// which is still a method actually being created for you in compile time
{
i++;
return true;
});
if you want to use i in that lambda expression, you have to pass it to that created method.
So the first question that arises is: should it be passed by value or reference?
Pass by reference is (I guess) more preferable as you get read/write access to that variable (and this is what C# does; I guess the team in Microsoft weighed the pros and cons and went with by-reference; According to Jon Skeet's article, Java went with by-value).
But then another question arises: Where to allocate that i?
Should it actually/naturally be allocated on the stack?
Well, if you allocate it on the stack and pass it by reference, there can be situations where it outlives its own stack frame. Take this example:
static void Main(string[] args)
{
Outlive();
var list = whereItems.ToList();
Console.ReadLine();
}
static IEnumerable<string> whereItems;
static void Outlive()
{
var i = 0;
var items = new List<string>
{
"Hello","World"
};
whereItems = items.Where(x =>
{
i++;
Console.WriteLine(i);
return true;
});
}
The lambda expression (in the Where clause) again creates a method which refers to an i. If i is allocated on the stack of Outlive, then by the time you enumerate the whereItems, the i used in the generated method will point to the i of Outlive, i.e. to a place in the stack that is no longer accessible.
Ok, so we need it on the heap then.
So what the C# compiler does to support this inline anonymous/lambda, is use what is called "Closures": It creates a class on the Heap called (rather poorly) DisplayClass which has a field containing the i, and the Function that actually uses it.
Something that would be equivalent to this (you can see the IL generated using ILSpy or ILDASM):
class <>c_DisplayClass1
{
public int i;
public bool <GetFunc>b__0()
{
this.i++;
Console.WriteLine(i);
return true;
}
}
It instantiates that class in your local scope, and replaces any code relating to i or the lambda expression with that closure instance. So - anytime you are using the i in your "local scope" code where i was defined, you are actually using that DisplayClass instance field.
So if I would change the "local" i in the main method, it will actually change _DisplayClass.i ;
i.e.
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
{
i++;
return true;
});
filtered.ToList(); // will enumerate filtered, i = 2
i = 10; // i will be overwritten with 10
filtered.ToList(); // will enumerate filtered again, i = 12
Console.WriteLine(i); // should print out 12
It will print out 12, as "i = 10" goes to that DisplayClass field and changes it just before the 2nd enumeration.
A good source on the topic is this Bart De Smet Pluralsight module (requires registration) (also ignore his erroneous use of the term "Hoisting" - what (I think) he means is that the local variable (i.e. i) is changed to refer to the new DisplayClass field).
In other news, there seems to be some misconception that "Closures" are related to loops - as I understand "Closures" are NOT a concept related to loops, but rather to anonymous methods / lambda expressions use of local scoped variables - although some trick questions use loops to demonstrate it.
A closure aims to simplify functional thinking, and it allows the runtime to manage state, releasing extra complexity for the developer. A closure is a first-class function with free variables that are bound in the lexical environment. Behind these buzzwords hides a simple concept: closures are a more convenient way to give functions access to local state and to pass data into background operations. They are special functions that carry an implicit binding to all the nonlocal variables (also called free variables or up-values) referenced. Moreover, a closure allows a function to access one or more nonlocal variables even when invoked outside its immediate lexical scope, and the body of this special function can transport these free variables as a single entity, defined in its enclosing scope. More importantly, a closure encapsulates behavior and passes it around like any other object, granting access to the context in which the closure was created, reading, and updating these values.
A simple and more understandable answer, from the book C# 7.0 in a Nutshell.
Prerequisite you should know: a lambda expression can reference the local variables and parameters of the method in which it's defined (outer variables).
static void Main()
{
int factor = 2;
//Here factor is the variable that takes part in lambda expression.
Func<int, int> multiplier = n => n * factor;
Console.WriteLine (multiplier (3)); // 6
}
The main point: outer variables referenced by a lambda expression are called captured variables. A lambda expression that captures variables is called a closure.
Last point to note: captured variables are evaluated when the delegate is actually invoked, not when the variables were captured:
int factor = 2;
Func<int, int> multiplier = n => n * factor;
factor = 10;
Console.WriteLine (multiplier (3)); // 30
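If the value at the moment of creation is what you want, copying it into a separate local that is never reassigned gives the lambda something stable to capture. A small sketch contrasting the two; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical demo class contrasting live capture with a snapshot copy.
public static class CaptureTimingDemo
{
    // The lambda captures the variable factor itself, so the later
    // assignment is visible when the delegate is invoked.
    public static int LiveCapture()
    {
        int factor = 2;
        Func<int, int> multiplier = n => n * factor;
        factor = 10;
        return multiplier(3);   // 30
    }

    // The lambda captures snapshot, which is never reassigned, so the
    // later change to factor does not affect the result.
    public static int SnapshotCapture()
    {
        int factor = 2;
        int snapshot = factor;
        Func<int, int> multiplier = n => n * snapshot;
        factor = 10;
        return multiplier(3);   // 6
    }
}
```

This is the same idea as the local-copy workaround used inside loops: the closure still captures a variable, it is just a variable that nobody changes afterwards.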
A closure is a function, defined within a function, that can access the local variables of it as well as its parent.
public things GetByName(string name)
{
List<things> theThings = new List<things>();
// List<T>.Find returns the first matching element (or the default value of T if none matches)
return theThings.Find(t => t.Name == name);
}
so the function inside the find method.
t => t.Name == name
can access the variables inside its scope, t, and the variable name, which is in its parent's scope, even though it is executed by the Find method as a delegate, from another scope altogether.

C# Closure is not working as expected

I can't understand quite clearly the difference between two blocks of code. Consider there is a program
class Program
{
static void Main(string[] args)
{
List<Number> numbers = new List<Number>
{
new Number(1),
new Number(2),
new Number(3)
};
List<Action> actions = new List<Action>();
foreach (Number numb in numbers)
{
actions.Add(() => WriteNumber(numb));
}
Number number = null;
IEnumerator<Number> enumerator = numbers.GetEnumerator();
while (enumerator.MoveNext())
{
number = enumerator.Current;
actions.Add(() => WriteNumber(number));
}
foreach (Action action in actions)
{
action();
}
Console.ReadKey();
}
public static void WriteNumber(Number num)
{
Console.WriteLine(num.Value);
}
public class Number
{
public int Value;
public Number(int i)
{
this.Value = i;
}
}
}
The output is
1
2
3
3
3
3
These two blocks of code should work identically. But you can see that the closure is not working for the first loop. What am I missing?
Thanks in advance.
You declare the number variable outside of your while loop. For each number you store the reference of it in your number variable - every time overwriting the last value.
You should just move the declaration inside the while-loop, so you have a new variable for each of your numbers.
IEnumerator<Number> enumerator = numbers.GetEnumerator();
while (enumerator.MoveNext())
{
Number number = enumerator.Current;
actions.Add(() => WriteNumber(number));
}
These two blocks of code should work identically.
No they shouldn't - at least in C# 5. In C# 3 and 4 they would, in fact.
But in the foreach loop, in C# 5, you have one variable per iteration of the loop. Your lambda expression captures that variable. Subsequent iterations of the loop create different variables which don't affect the previously-captured variable.
In the while loop, you have one variable which all the iterations capture. Changes to that variable will be seen in all of the delegates that captured it. You can see this by adding this line after your while loop:
number = new Number(999);
Then your output would be
1
2
3
999
999
999
Now in C# 3 and 4, the foreach specification was basically broken by design - it would capture a single variable across all iterations. This was then fixed in C# 5 to use a separate variable per iteration, which is basically what you always want with that sort of code.
In your loop:
Number number = null;
IEnumerator<Number> enumerator = numbers.GetEnumerator();
while (enumerator.MoveNext())
{
number = enumerator.Current;
actions.Add(() => WriteNumber(number));
}
number is declared outside of the loop scope. So when it gets set to the next current item, all your action references to number also get updated to the latest. So when you run each action, they will all use the last number.
Thanks for all your answers. But I think I was misunderstood. I WANT the closure to work. That's why I set the loop variable out of scope. The question is: why does it not work in the first case? I forgot to mention that I use C# 3.5 (not C# 5.0). So the loop variable should be defined out of scope and the two code blocks should work identically.

Should I assign parameter values to local variables first instead of using them directly?

Is there any reason to assign parameter values to local variables inside a method in order to use those values without changing them? I.e. like the following:
private void MyMethod(string path)
{
string myPath = path;
StreamReader mystream = new StreamReader(myPath);
...
}
Or can I always put it like this (and the code above is redundant and just not clean):
private void MyMethod(string path)
{
StreamReader mystream = new StreamReader(path);
...
}
I know it works both ways, but I'd like to be sure there isn't anything I missed in my understanding.
The only time you need to do this (assign locally) is if you are in a foreach loop or using Linq. Otherwise you can run into issues with modified closures.
Here is a snippet from an MSDN blog (All content below is from the link).
http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx
But I'm getting ahead of myself. What's the output of this fragment?
var values = new List<int>() { 100, 110, 120 };
var funcs = new List<Func<int>>();
foreach(var v in values)
funcs.Add( ()=>v );
foreach(var f in funcs)
Console.WriteLine(f());
Most people expect it to be 100 / 110 / 120. It is in fact 120 / 120 / 120. Why?
Because ()=>v means "return the current value of variable v", not "return the value v was back when the delegate was created". Closures close over variables, not over values. And when the methods run, clearly the last value that was assigned to v was 120, so it still has that value.
This is very confusing. The correct way to write the code is:
foreach(var v in values)
{
var v2 = v;
funcs.Add( ()=>v2 );
}
Now what happens? Every time we re-start the loop body, we logically create a fresh new variable v2. Each closure is closed over a different v2, which is only assigned to once, so it always keeps the correct value.
Basically, the problem arises because we specify that the foreach loop is a syntactic sugar for
{
IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
try
{
int m; // OUTSIDE THE ACTUAL LOOP
while(e.MoveNext())
{
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
}
finally
{
if (e != null) ((IDisposable)e).Dispose();
}
}
If we specified that the expansion was
try
{
while(e.MoveNext())
{
int m; // INSIDE
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
then the code would behave as expected.
It's exactly the same thing, the only difference is that in the first case you make a copy of the reference (which is destroyed anyway when the method gets out of scope, which happens when the execution ends).
For better readability, stick with the second case.
I prefer the second option. It makes no sense to create a new variable with the parameter. Also, from a reading perspective, it makes more sense to create a stream from a path (the one you received) instead of instantiating a "myPath" variable.

Is there a reason for C#'s reuse of the variable in a foreach?

When using lambda expressions or anonymous methods in C#, we have to be wary of the access to modified closure pitfall. For example:
foreach (var s in strings)
{
query = query.Where(i => i.Prop == s); // access to modified closure
...
}
Due to the modified closure, the above code will cause all of the Where clauses on the query to be based on the final value of s.
As explained here, this happens because the s variable declared in foreach loop above is translated like this in the compiler:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
instead of like this:
while (enumerator.MoveNext())
{
string s;
s = enumerator.Current;
...
}
As pointed out here, there are no performance advantages to declaring a variable outside the loop, and under normal circumstances the only reason I can think of for doing this is if you plan to use the variable outside the scope of the loop:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
var finalString = s;
However variables defined in a foreach loop cannot be used outside the loop:
foreach(string s in strings)
{
}
var finalString = s; // won't work: you're outside the scope.
So the compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable, or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Your criticism is entirely justified.
I discuss this problem in detail here:
Closing over the loop variable considered harmful
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable? or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The latter. The C# 1.0 specification actually did not say whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop.
I think it is fair to say that all regret that decision. This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.
The for loop will not be changed, and the change will not be "back ported" to previous versions of C#. You should therefore continue to be careful when using this idiom.
What you are asking is thoroughly covered by Eric Lippert in his blog post Closing over the loop variable considered harmful and its sequel.
For me, the most convincing argument is that having new variable in each iteration would be inconsistent with for(;;) style loop. Would you expect to have a new int i in each iteration of for (int i = 0; i < 10; i++)?
The most common problem with this behavior is making a closure over iteration variable and it has an easy workaround:
foreach (var s in strings)
{
var s_for_closure = s;
query = query.Where(i => i.Prop == s_for_closure); // captures the per-iteration copy, not s
}
My blog post about this issue: Closure over foreach variable in C#.
Having been bitten by this, I have a habit of including locally defined variables in the innermost scope which I use to transfer to any closure. In your example:
foreach (var s in strings)
query = query.Where(i => i.Prop == s); // access to modified closure
I do:
foreach (var s in strings)
{
string search = s;
query = query.Where(i => i.Prop == search); // New definition ensures unique per iteration.
}
Once you have that habit, you can avoid it in the very rare case you actually intended to bind to the outer scopes. To be honest, I don't think I have ever done so.
In C# 5.0, this problem is fixed and you can close over loop variables and get the results you expect.
The language specification says:
8.8.4 The foreach statement
(...)
A foreach statement of the form
foreach (V v in x) embedded-statement
is then expanded to:
{
E e = ((C)(x)).GetEnumerator();
try {
while (e.MoveNext()) {
V v = (V)(T)e.Current;
embedded-statement
}
}
finally {
… // Dispose e
}
}
(...)
The placement of v inside the while loop is important for how it is
captured by any anonymous function occurring in the
embedded-statement. For example:
int[] values = { 7, 9, 13 };
Action f = null;
foreach (var value in values)
{
if (f == null) f = () => Console.WriteLine("First value: " + value);
}
f();
If v was declared outside of the while loop, it would be shared
among all iterations, and its value after the for loop would be the
final value, 13, which is what the invocation of f would print.
Instead, because each iteration has its own variable v, the one
captured by f in the first iteration will continue to hold the value
7, which is what will be printed. (Note: earlier versions of C#
declared v outside of the while loop.)
