I want to know everything about the yield statement, in an easy to understand form.
I have read about the yield statement and how it simplifies implementing the iterator pattern. However, most of what I found is very dry. I would like to get under the covers and see how Microsoft handles yield return.
Also, when do you use yield break?
yield works by building a state machine internally. It stores the current state of the routine when it exits and resumes from that state next time.
You can use Reflector to see how it's implemented by the compiler.
yield break is used when you want to stop returning results. If you don't have a yield break, the compiler assumes one at the end of the function (just like a return; statement in a normal function).
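To make the pause/resume behaviour and yield break concrete, here is a minimal sketch. The names YieldBreakDemo and TakeUntilNegative are made up for illustration, and the log list is only there so the order of execution can be observed.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Yields values until the first negative number, then ends the sequence.
    public static IEnumerable<int> TakeUntilNegative(int[] values, List<string> log)
    {
        foreach (var value in values)
        {
            if (value < 0)
            {
                log.Add("stopping");
                yield break; // ends the sequence; nothing below runs afterwards
            }
            log.Add("yielding " + value);
            yield return value; // execution pauses here until the next MoveNext()
        }
        log.Add("reached the end"); // an implicit yield break follows this line
    }

    public static void Main()
    {
        var log = new List<string>();
        IEnumerable<int> sequence = TakeUntilNegative(new[] { 1, 2, -1, 3 }, log);

        // The body has not started yet: it only runs when MoveNext() is called.
        Console.WriteLine("log entries before enumeration: " + log.Count);

        foreach (int item in sequence)
            Console.WriteLine(item);

        Console.WriteLine(string.Join(", ", log));
    }
}
```

Note that calling TakeUntilNegative does not execute any of its body; the first line of the method only runs once the caller starts enumerating.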
As Mehrdad says, it builds a state machine.
As well as using Reflector (another excellent suggestion) you might find my article on iterator block implementation useful. It would be relatively simple if it weren't for finally blocks - but they introduce a whole extra dimension of complexity!
Let's rewind a little: as many others have said, the yield keyword is translated into a state machine.
More precisely, there is no built-in runtime implementation working behind the scenes. Instead, the compiler rewrites the yield-related code into a state machine that implements one of the relevant interfaces (the return type of the method containing the yield keywords).
A (finite) state machine is just a piece of code that, depending on where you are in the code (that is, on the previous state and the input), moves to another state. This is pretty much what happens when you use yield in a method whose return type is IEnumerator<T> / IEnumerator. Each yield creates a transition from the previous state to the next one, and the state management lives in the generated MoveNext() implementation.
This is exactly what the C# compiler / Roslyn does: it checks for the presence of a yield keyword and the return type of the containing method (IEnumerator<T>, IEnumerable<T>, IEnumerator or IEnumerable) and then creates a private class reflecting that method, integrating the necessary variables and states.
If you are interested in the details of the state machine and how iterators are rewritten by the compiler, you can check out these links on GitHub:
IteratorRewriter source code
StateMachineRewriter: the parent class of above source code
Trivia 1: the AsyncRewriter (used when you write async/await code) also inherits from StateMachineRewriter, since it also relies on a state machine behind the scenes.
As mentioned, the state machine is most visible in the generated bool MoveNext() implementation, which contains a switch (and sometimes some old-fashioned goto) based on a state field that represents the different execution paths to the different states in your method.
The code generated by the compiler from the user code does not look that "good", mostly because the compiler adds some weird prefixes and suffixes here and there.
For example, the code:
public class TestClass
{
private int _iAmAHere = 0;
public IEnumerator<int> DoSomething()
{
var start = 1;
var stop = 42;
var breakCondition = 34;
var exceptionCondition = 41;
var multiplier = 2;
// Rest of the code... with some yield keywords somewhere below...
The variables and types related to that piece of code above will after compilation look like:
public class TestClass
{
[CompilerGenerated]
private sealed class <DoSomething>d__1 : IEnumerator<int>, IDisposable, IEnumerator
{
// Always present
private int <>1__state;
private int <>2__current;
// Containing class
public TestClass <>4__this;
private int <start>5__1;
private int <stop>5__2;
private int <breakCondition>5__3;
private int <exceptionCondition>5__4;
private int <multiplier>5__5;
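If you want to see the generated class without a decompiler, reflection is enough. The following is a rough sketch: GeneratedTypeDemo and GetIteratorType are hypothetical names, and the exact type and field names printed depend on the compiler version, so treat them as implementation details.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class GeneratedTypeDemo
{
    public static IEnumerator<int> CountTo(int end)
    {
        for (int i = 1; i <= end; i++)
            yield return i;
    }

    // Returns the runtime type of the object that the iterator method hands back.
    public static Type GetIteratorType() => CountTo(3).GetType();

    public static void Main()
    {
        Type type = GetIteratorType();
        Console.WriteLine(type.FullName);
        Console.WriteLine(type.IsDefined(typeof(CompilerGeneratedAttribute), false));

        // List the lifted locals, parameters and bookkeeping fields.
        var flags = BindingFlags.Instance | BindingFlags.Public |
                    BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        foreach (FieldInfo field in type.GetFields(flags))
            Console.WriteLine(field.FieldType.Name + " " + field.Name);
    }
}
```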
Regarding the state machine itself, let's take a look at a very simple example with a dummy branching for yielding some even / odd stuff.
public class Example
{
public IEnumerator<string> DoSomething()
{
const int start = 1;
const int stop = 42;
for (var index = start; index < stop; index++)
{
yield return index % 2 == 0 ? "even" : "odd";
}
}
}
Will be translated in the MoveNext as:
private bool MoveNext()
{
switch (<>1__state)
{
default:
return false;
case 0:
<>1__state = -1;
<start>5__1 = 1;
<stop>5__2 = 42;
<index>5__3 = <start>5__1;
break;
case 1:
<>1__state = -1;
goto IL_0094;
case 2:
{
<>1__state = -1;
goto IL_0094;
}
IL_0094:
<index>5__3++;
break;
}
if (<index>5__3 < <stop>5__2)
{
if (<index>5__3 % 2 == 0)
{
<>2__current = "even";
<>1__state = 1;
return true;
}
<>2__current = "odd";
<>1__state = 2;
return true;
}
return false;
}
As you can see this implementation is far from being straightforward but it does the job!
Trivia 2: What happens with the IEnumerable / IEnumerable<T> method return type?
Well, instead of just generating a class that implements IEnumerator<T>, the compiler generates a class that implements both IEnumerable<T> and IEnumerator<T>, so that the implementation of IEnumerator<T> GetEnumerator() can reuse the same generated class.
A friendly reminder of the interfaces that are implemented automatically when you use the yield keyword:
public interface IEnumerable<out T> : IEnumerable
{
new IEnumerator<T> GetEnumerator();
}
public interface IEnumerator<out T> : IDisposable, IEnumerator
{
T Current { get; }
}
public interface IEnumerator
{
bool MoveNext();
object Current { get; }
void Reset();
}
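From the caller's point of view, the practical consequence of an IEnumerable<T> return type is that every GetEnumerator() call gives an independent walk over the sequence. A small sketch, with EnumerableVsEnumeratorDemo and Numbers as illustrative names only:

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableVsEnumeratorDemo
{
    public static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static void Main()
    {
        IEnumerable<int> numbers = Numbers();

        // With current compilers the returned object implements both interfaces;
        // this is an implementation detail and not something to rely on.
        Console.WriteLine(numbers is IEnumerator<int>);

        using (IEnumerator<int> first = numbers.GetEnumerator())
        using (IEnumerator<int> second = numbers.GetEnumerator())
        {
            first.MoveNext();
            first.MoveNext();
            second.MoveNext();
            // Each enumerator keeps its own position.
            Console.WriteLine(first.Current + " " + second.Current);
        }
    }
}
```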
You can also check out this example with different paths / branching and the full implementation by the compiler rewriting.
This has been created with SharpLab, you can play with that tool to try different yield related execution paths and see how the compiler will rewrite them as a state machine in the MoveNext implementation.
As for the second part of the question, i.e. yield break, it has been answered here:
It specifies that an iterator has come to an end. You can think of
yield break as a return statement which does not return a value.
Related
I'm trying to serialize/deserialize an IEnumerator generated from a function using yield.
I would like to serialize the IEnumerator at any iteration; I don't want to force it to generate all of its values.
I know that the yield keyword generates a class behind the scenes, and that is why I'm using it: to avoid writing the iterator manually, and also to make the code cleaner.
My goal is to make a small game engine similar to Nick Gravelyn - The magic of yield, where each GameElement generates an iterator describing its behavior, allowing the programmer to easily control timings (because an iterator lets a script be interrupted and continued). I want to try to add a multiplayer layer on top of that, which is why I need to serialize/deserialize an IEnumerator.
Forcing the IEnumerator to generate all of its values also forces the game to update, which should be avoided.
My first attempt was something like:
using System;
using System.Collections.Generic;
using System.Text.Json;
namespace SerializeTest
{
class Program
{
public static IEnumerator<int> CountTo(int end)
{
for(int i = 1; i <= end; i++)
{
Console.WriteLine("i = " + i);
yield return i;
}
}
static void Main(string[] args)
{
IEnumerator<int> e = CountTo(5);
string json = JsonSerializer.Serialize(e);
Console.WriteLine(json);
var f = JsonSerializer.Deserialize<IEnumerator<int>> (json); // throws: an interface cannot be deserialized
while (f.MoveNext());
}
}
}
But deserializing to an interface (IEnumerator<int>) is not allowed; we need to know the type of the object behind IEnumerator<int>.
Since the type is generated by the compiler, I tried to use reflection on the Deserialize method in order to call it with the actual type generated for the function CountTo() (SerializeTest.Program+<CountTo>d__0 if you are curious).
So my second attempt was something like:
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.Json;
namespace SerializeTest
{
class Program
{
public static IEnumerator<int> CountTo(int end)
{
for(int i = 1; i <= end; i++)
{
Console.WriteLine("i = " + i);
yield return i;
}
}
public static T Deserialize<T>(string json) => JsonSerializer.Deserialize<T>(json); // still throws
static void Main(string[] args)
{
IEnumerator<int> e = CountTo(5);
string json = JsonSerializer.Serialize(e);
Console.WriteLine(json);
Type eType = e.GetType();
MethodInfo method = typeof(Program).GetMethod("Deserialize");
MethodInfo genericMethod = method.MakeGenericMethod(eType);
object fObj = genericMethod.Invoke(null, new object[] { json });
var f = (IEnumerator<int>)fObj;
while (f.MoveNext()) ;
}
}
}
But it still crashes when calling JsonSerializer.Deserialize(), even though the type is now correct and not an interface:
System.InvalidOperationException: 'Each parameter in constructor 'Void .ctor(Int32)' on type 'SerializeTest.Program+<CountTo>d__0' must bind to an object property or field on deserialization. Each parameter name must match with a property or field on the object. The match can be case-insensitive.'
The format doesn't matter (JSON, XML...) as long as it is possible to send it to another computer and deserialize it back.
I naively hope that there is a way to serialize/deserialize an IEnumerator, because a lot of cool stuff can be done with it, although I have serious doubts that such a thing is possible.
Thanks for reading this far.
I naively hope that there is a way to serialize/deserialize an IEnumerator, because a lot of cool stuff can be done with it, although I have serious doubts that such a thing is possible.
Sending code (and the compiler-generated enumerator state machine you are trying to send is basically code) is far more complex than just serializing it to JSON/XML/etc. For example, Apache Ignite supports sending code to other nodes, including assembly peer loading. One starting point for investigation can be found here.
As for your attempt, you have probably already seen that the serialized version of the state machine contains only one property, Current: {"Current":0}, while the generated class contains other state data, looking something like this:
private sealed class <<<Main>$>g__CountTo|0_0>d : IEnumerator<int>, IEnumerator, IDisposable
{
private int <>1__state;
private int <>2__current;
public int end;
private int <i>5__1;
int IEnumerator<int>.Current
{
[DebuggerHidden]
get
{
return <>2__current;
}
}
// ... rest of the generated code
}
While you can look into writing your own custom converter that uses reflection to serialize/deserialize the internal state data (which is possibly not that hard), it can be quite brittle: class names can change, the generated code can change, the code generator itself can change, and we have not even started on multi-version client support. In the end, those fields are private for a reason. So I suggest defining a set of messages that the multiplayer clients send to each other, which you can then turn into iterators, classes, method calls, etc.
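One way to follow that advice is to keep the resumable state in a plain class that you own, instead of in the compiler-generated one. The sketch below is only an illustration of the idea: CountToState, Save and Load are hypothetical names, and it gives up the convenience of yield in exchange for state that System.Text.Json can round-trip.

```csharp
using System;
using System.Text.Json;

// Hand-written stand-in for the CountTo iterator: all state is public and serializable.
public class CountToState
{
    public int End { get; set; }
    public int Current { get; set; }

    // Advances to the next value; returns false once End has been reached.
    public bool MoveNext()
    {
        if (Current >= End) return false;
        Current++;
        return true;
    }
}

public static class SerializableIteratorDemo
{
    public static string Save(CountToState state) => JsonSerializer.Serialize(state);

    public static CountToState Load(string json) => JsonSerializer.Deserialize<CountToState>(json);

    public static void Main()
    {
        var state = new CountToState { End = 5 };
        state.MoveNext();
        state.MoveNext();

        string json = Save(state);         // could be sent to another machine
        CountToState resumed = Load(json); // continues where the original stopped

        while (resumed.MoveNext())
            Console.WriteLine("i = " + resumed.Current);
    }
}
```

The trade-off is that the "script" is no longer written as straight-line code with yield return; each pause point has to be modelled explicitly in the state class.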
I'll give silly examples for simplicity.
IEnumerable<T> Silly<T>(this IEnumerable<T> source)
{
foreach(var x in source) yield return x;
}
I know that this will be compiled into a state machine, but it's also similar to
IEnumerable<T> Silly<T>(this IEnumerable<T> source)
{
using(var sillier = source.GetEnumerator())
{
while(sillier.MoveNext()) yield return sillier.Current;
}
}
Now consider this usage
list.Silly().Take(2).ToArray();
Here you can see that the Silly enumerable may not be fully consumed, but Take(2) itself will be fully consumed.
Question: when Dispose is called on the Take enumerator, will it also call Dispose on the Silly enumerator, and more specifically on the sillier enumerator?
My guess is that the compiler can handle this simple use case because of foreach, but what about not-so-simple use cases?
IEnumerable<T> Silly<T>(this IEnumerable<T> source)
{
using(var sillier = source.GetEnumerator())
{
// move next can be called on different stages.
}
}
Will this ever be a problem? Most enumerators don't use unmanaged resources, but if one does, this can cause memory leaks.
If Dispose is not called, how do I make a disposable enumerable?
An idea: there can be an if(disposed) yield break; after every yield return. The Dispose method of the Silly enumerator would then just have to set disposed = true and move the enumerator once to dispose of all the required stuff.
The C# compiler takes care of a lot for you when it turns your iterator into real code. For instance, here's the MoveNext which contains the implementation of your second example [1]:
private bool MoveNext()
{
try
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<sillier>5__1 = this.source.GetEnumerator();
this.<>1__state = -3;
while (this.<sillier>5__1.MoveNext())
{
this.<>2__current = this.<sillier>5__1.Current;
this.<>1__state = 1;
return true;
Label_005A:
this.<>1__state = -3;
}
this.<>m__Finally1();
this.<sillier>5__1 = null;
return false;
case 1:
goto Label_005A;
}
return false;
}
fault
{
this.System.IDisposable.Dispose();
}
}
So, you'll notice that the finally clause from your using isn't there at all, and it's a state machine [2] that relies on being in certain good (>= 0) states in order to make further progress. (It's also illegal C#, but hey ho.)
Now let's look at its Dispose:
[DebuggerHidden]
void IDisposable.Dispose()
{
switch (this.<>1__state)
{
case -3:
case 1:
try
{
}
finally
{
this.<>m__Finally1();
}
break;
}
}
So we can see that <>m__Finally1 is called here (as well as when exiting the while loop in MoveNext).
And <>m__Finally1:
private void <>m__Finally1()
{
this.<>1__state = -1;
if (this.<sillier>5__1 != null)
{
this.<sillier>5__1.Dispose();
}
}
So, we can see that sillier was disposed and we moved into a negative state which means that MoveNext doesn't have to do any special work to handle the "we've already been disposed state".
So,
An idea: there can be an if(disposed) yield break; after every yield return. The Dispose method of the Silly enumerator would then just have to set disposed = true and move the enumerator once to dispose of all the required stuff.
is completely unnecessary. Trust the compiler to transform the code so that it does all of the logical things it should: it runs its finally clause just once, either when it has exhausted the iterator logic or when it is explicitly disposed.
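You can also check this behaviour from the outside, without reading generated code. The following sketch assumes a helper iterator called Source (not part of the question) whose finally block records when it is disposed, and feeds it through the question's Silly method and Take(2).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DisposePropagationDemo
{
    // Hypothetical source whose finally block records that it was disposed.
    public static IEnumerable<int> Source(List<string> log)
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            log.Add("source disposed");
        }
    }

    // Same shape as the second Silly example from the question.
    public static IEnumerable<T> Silly<T>(this IEnumerable<T> source)
    {
        using (var sillier = source.GetEnumerator())
        {
            while (sillier.MoveNext()) yield return sillier.Current;
        }
    }

    public static void Main()
    {
        var log = new List<string>();
        int[] firstTwo = Source(log).Silly().Take(2).ToArray();

        Console.WriteLine(string.Join(", ", firstTwo));
        // Source was never run to completion, yet its finally block has executed.
        Console.WriteLine(string.Join(", ", log));
    }
}
```

Disposing the outermost enumerator cascades inwards: Take disposes the Silly enumerator, whose generated Dispose runs the using block's cleanup, which in turn disposes sillier.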
[1] All code samples were produced by .NET Reflector. But it's too good at decompiling these constructs these days, so if you go and look at the Silly method itself:
[IteratorStateMachine(typeof(<Silly>d__1)), Extension]
private static IEnumerable<T> Silly<T>(this IEnumerable<T> source)
{
IEnumerator<T> <sillier>5__1;
using (<sillier>5__1 = source.GetEnumerator())
{
while (<sillier>5__1.MoveNext())
{
yield return <sillier>5__1.Current;
}
}
<sillier>5__1 = null;
}
It's managed to hide most details about that state machine away again. You need to chase the type referenced by the IteratorStateMachine attribute to see all of the gritty bits shown above.
[2] Please also note that the compiler is under no obligation to produce a state machine to make iterators work. It's an implementation detail of the current C# compilers. The C# Specification places no restriction on how the compiler transforms the iterator, only on what the effects should be.
I am getting a deadlock when I run IncrementModelClientReOrderCount but the problem goes away when I run IncrementModelClientReOrderCountLOCK.
The difference is a lock() statement.
I had assumed that using a ConcurrentDictionary would guard against the chance of a deadlock.
Am I using the ConcurrentDictionary incorrectly, perhaps?
public ConcurrentDictionary<Connection, ModelClient> ModelClients = new ConcurrentDictionary<Connection, ModelClient>();
public bool IncrementModelClientReOrderCount(Connection mc)
{
ModelClient curValue;
while (ModelClients.TryGetValue(mc, out curValue))
{
ModelClient curValue2 = curValue.Clone() as ModelClient;
curValue2.reOrderCount++;
curValue2.DSP.seen = false;
if (ModelClients.TryUpdate(mc, curValue2, curValue))
return true;
}
return false;
}
public bool IncrementModelClientReOrderCountLOCK(Connection mc)
{
lock (ModelClients)
{
ModelClient curValue;
while (ModelClients.TryGetValue(mc, out curValue))
{
ModelClient curValue2 = curValue.Clone() as ModelClient;
curValue2.reOrderCount++;
curValue2.DSP.seen = false;
if (ModelClients.TryUpdate(mc, curValue2, curValue))
return true;
}
return false;
}
}
public class ModelClient : ICloneable
{
public string Symbol;
public int Amount;
public double Price;
public ModelClient(string Symbol, int Amount, double Price)
{
this.Symbol = Symbol;
this.Amount = Amount;
this.Price = Price;
}
public object Clone() { return this.MemberwiseClone(); }
}
A "thread-safe" collection is thread-safe unto itself--not with respect to all the other code that didn't exist when the collection was written. ConcurrentDictionary is also not lock-free--which means it does lock and thus has the potential to block on calls to some of its methods (e.g. two threads dependent on one another for forward progress both calling a blocking method on ConcurrentDictionary at the same time).
The nature of a lock is such that it guards two pieces of code from executing at the same time and potentially corrupting state--which means there's almost always more than one place in the code that uses lock and thus you can write code that overlaps execution of those two blocks and causes a deadlock.
A thread-safe collection means only that you can use it from multiple threads and it itself will not corrupt its own state (a state you don't have access to and cannot protect yourself). Use of a thread-safe collection doesn't automatically make all of your code thread-safe nor does it free you from having to understand areas of potential deadlock and compensate for them with your own thread-safety primitives.
You haven't provided enough code for anyone to tell exactly how you're getting into a deadlock (or whether it is in fact a livelock). But TryUpdate can block, and if another thread called TryUpdate with the same curValue, I would expect TryUpdate to return false and your code to try all over again (i.e. a potential livelock).
It appears you have an invariant between when you get the value via TryGetValue and when you update it with TryUpdate. This invariant is unique to your code and you need to guard it. lock is a good start; but you likely need to understand it better before you accept that lock is the best solution.
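For reference, here is the read/modify/TryUpdate retry loop from the question reduced to its simplest form. This is only a sketch with made-up names (ReorderCounter, Increment) and an int value, so it does not reproduce the reported hang. One difference worth keeping in mind: TryUpdate compares the stored value with the comparison value using the default equality comparer, which for a class like ModelClient that does not override Equals means reference equality.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ReorderCounter
{
    // Compare-and-swap style increment: retry until our update wins
    // or the key is no longer present.
    public static bool Increment(ConcurrentDictionary<string, int> counts, string key)
    {
        while (counts.TryGetValue(key, out int current))
        {
            if (counts.TryUpdate(key, current + 1, current))
                return true;
            // Another thread changed the value between the read and the update;
            // loop round, read the fresh value and try again.
        }
        return false;
    }

    public static void Main()
    {
        var counts = new ConcurrentDictionary<string, int>();
        counts["client"] = 0;

        Parallel.For(0, 1000, _ => { Increment(counts, "client"); });

        Console.WriteLine(counts["client"]);
    }
}
```

Under heavy contention this loop can spin for a while before it wins, which is the kind of retry behaviour the answer above is pointing at.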
Yesterday I was giving a talk about the new C# "async" feature, in particular delving into what the generated code looked like, and the GetAwaiter() / BeginAwait() / EndAwait() calls.
We looked in some detail at the state machine generated by the C# compiler, and there were two aspects we couldn't understand:
Why the generated class contains a Dispose() method and a $__disposing variable, which never appear to be used (and the class doesn't implement IDisposable).
Why the internal state variable is set to 0 before any call to EndAwait(), when 0 normally appears to mean "this is the initial entry point".
I suspect the first point could be answered by doing something more interesting within the async method, although if anyone has any further information I'd be glad to hear it. This question is more about the second point, however.
Here's a very simple piece of sample code:
using System.Threading.Tasks;
class Test
{
static async Task<int> Sum(Task<int> t1, Task<int> t2)
{
return await t1 + await t2;
}
}
... and here's the code which gets generated for the MoveNext() method which implements the state machine. This is copied directly from Reflector - I haven't fixed up the unspeakable variable names:
public void MoveNext()
{
try
{
this.$__doFinallyBodies = true;
switch (this.<>1__state)
{
case 1:
break;
case 2:
goto Label_00DA;
case -1:
return;
default:
this.<a1>t__$await2 = this.t1.GetAwaiter<int>();
this.<>1__state = 1;
this.$__doFinallyBodies = false;
if (this.<a1>t__$await2.BeginAwait(this.MoveNextDelegate))
{
return;
}
this.$__doFinallyBodies = true;
break;
}
this.<>1__state = 0;
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
this.<a2>t__$await4 = this.t2.GetAwaiter<int>();
this.<>1__state = 2;
this.$__doFinallyBodies = false;
if (this.<a2>t__$await4.BeginAwait(this.MoveNextDelegate))
{
return;
}
this.$__doFinallyBodies = true;
Label_00DA:
this.<>1__state = 0;
this.<2>t__$await3 = this.<a2>t__$await4.EndAwait();
this.<>1__state = -1;
this.$builder.SetResult(this.<1>t__$await1 + this.<2>t__$await3);
}
catch (Exception exception)
{
this.<>1__state = -1;
this.$builder.SetException(exception);
}
}
It's long, but the important lines for this question are these:
// End of awaiting t1
this.<>1__state = 0;
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
// End of awaiting t2
this.<>1__state = 0;
this.<2>t__$await3 = this.<a2>t__$await4.EndAwait();
In both cases the state is changed again afterwards before it's next obviously observed... so why set it to 0 at all? If MoveNext() were called again at this point (either directly or via Dispose) it would effectively start the async method again, which would be wholly inappropriate as far as I can tell... and if MoveNext() isn't called, the change in state is irrelevant.
Is this simply a side-effect of the compiler reusing iterator block generation code for async, where it may have a more obvious explanation?
Important disclaimer
Obviously this is just a CTP compiler. I fully expect things to change before the final release - and possibly even before the next CTP release. This question is in no way trying to claim this is a flaw in the C# compiler or anything like that. I'm just trying to work out whether there's a subtle reason for this that I've missed :)
Okay, I finally have a real answer. I sort of worked it out on my own, but only after Lucian Wischik from the VB part of the team confirmed that there really is a good reason for it. Many thanks to him - and please visit his blog (on archive.org), which rocks.
The value 0 here is only special because it's not a valid state which you might be in just before the await in a normal case. In particular, it's not a state which the state machine may end up testing for elsewhere. I believe that using any non-positive value would work just as well: -1 isn't used for this as it's logically incorrect, as -1 normally means "finished". I could argue that we're giving an extra meaning to state 0 at the moment, but ultimately it doesn't really matter. The point of this question was finding out why the state is being set at all.
The value is relevant if the await ends in an exception which is caught. We can end up coming back to the same await statement again, but we mustn't be in the state meaning "I'm just about to come back from that await" as otherwise all kinds of code would be skipped. It's simplest to show this with an example. Note that I'm now using the second CTP, so the generated code is slightly different to that in the question.
Here's the async method:
static async Task<int> FooAsync()
{
var t = new SimpleAwaitable();
for (int i = 0; i < 3; i++)
{
try
{
Console.WriteLine("In Try");
return await t;
}
catch (Exception)
{
Console.WriteLine("Trying again...");
}
}
return 0;
}
Conceptually, the SimpleAwaitable can be any awaitable - maybe a task, maybe something else. For the purposes of my tests, it always returns false for IsCompleted, and throws an exception in GetResult.
Here's the generated code for MoveNext:
public void MoveNext()
{
int returnValue;
try
{
int num3 = state;
if (num3 == 1)
{
goto Label_ContinuationPoint;
}
if (state == -1)
{
return;
}
t = new SimpleAwaitable();
i = 0;
Label_ContinuationPoint:
while (i < 3)
{
// Label_ContinuationPoint: should be here
try
{
num3 = state;
if (num3 != 1)
{
Console.WriteLine("In Try");
awaiter = t.GetAwaiter();
if (!awaiter.IsCompleted)
{
state = 1;
awaiter.OnCompleted(MoveNextDelegate);
return;
}
}
else
{
state = 0;
}
int result = awaiter.GetResult();
awaiter = null;
returnValue = result;
goto Label_ReturnStatement;
}
catch (Exception)
{
Console.WriteLine("Trying again...");
}
i++;
}
returnValue = 0;
}
catch (Exception exception)
{
state = -1;
Builder.SetException(exception);
return;
}
Label_ReturnStatement:
state = -1;
Builder.SetResult(returnValue);
}
I had to move Label_ContinuationPoint to make it valid code - otherwise it's not in the scope of the goto statement - but that doesn't affect the answer.
Think about what happens when GetResult throws its exception. We'll go through the catch block, increment i, and then loop round again (assuming i is still less than 3). We're still in whatever state we were before the GetResult call... but when we get inside the try block we must print "In Try" and call GetAwaiter again... and we'll only do that if state isn't 1. Without the state = 0 assignment, it will use the existing awaiter and skip the Console.WriteLine call.
It's a fairly tortuous bit of code to work through, but that just goes to show the kinds of thing that the team has to think about. I'm glad I'm not responsible for implementing this :)
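If you want to try this yourself with a current compiler, something like the following should do. It is a sketch under assumptions: the SimpleAwaitable from the test above isn't shown, so this version is a guess at its shape using today's awaiter pattern (IsCompleted, OnCompleted, GetResult), it runs the continuation synchronously to keep things simple, and the Attempts counter is added purely so the retries can be observed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public class SimpleAwaitable
{
    public SimpleAwaiter GetAwaiter() => new SimpleAwaiter();
}

public class SimpleAwaiter : INotifyCompletion
{
    // Never completed up front, so the state machine always takes the "suspend" path.
    public bool IsCompleted => false;

    // Runs the continuation immediately (a simplification for this sketch).
    public void OnCompleted(Action continuation) => continuation();

    // Always fails, which forces the catch block and another trip round the loop.
    public int GetResult() => throw new InvalidOperationException("GetResult always fails");
}

public static class RetryAwaitDemo
{
    public static int Attempts;

    public static async Task<int> FooAsync()
    {
        var t = new SimpleAwaitable();
        for (int i = 0; i < 3; i++)
        {
            try
            {
                Attempts++; // stands in for Console.WriteLine("In Try")
                return await t;
            }
            catch (Exception)
            {
                // "Trying again..."
            }
        }
        return 0;
    }

    public static void Main()
    {
        int result = FooAsync().Result;
        Console.WriteLine("result = " + result + ", attempts = " + Attempts);
    }
}
```

If the state were left at the "resuming from this await" value after the exception, the start of the try block would be skipped on the next iteration; the attempt counter makes it easy to check that this does not happen.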
If it was kept at 1 (first case) you would get a call to EndAwait without a call to BeginAwait. If it was kept at 2 (second case) you'd get the same result, just on the other awaiter.
I'm guessing that calling BeginAwait returns false if it has been started already (a guess on my part) and keeps the original value to return from EndAwait. If that's the case it would work correctly, whereas if you set it to -1 you might have an uninitialized this.<1>t__$await1 in the first case.
This however assumes that BeginAwait won't actually start the action on any call after the first, and that it will return false in those cases. Starting it would of course be unacceptable, since it could have side effects or simply give a different result. It also assumes that EndAwait will always return the same value no matter how many times it's called, and that it can be called when BeginAwait returns false (as per the above assumption).
It would seem to be a guard against race conditions.
If we inline the statements where MoveNext is called by a different thread after the state = 0 in the question, it would look something like the below:
this.<a1>t__$await2 = this.t1.GetAwaiter<int>();
this.<>1__state = 1;
this.$__doFinallyBodies = false;
this.<a1>t__$await2.BeginAwait(this.MoveNextDelegate)
this.<>1__state = 0;
//second thread
this.<a1>t__$await2 = this.t1.GetAwaiter<int>();
this.<>1__state = 1;
this.$__doFinallyBodies = false;
this.<a1>t__$await2.BeginAwait(this.MoveNextDelegate)
this.$__doFinallyBodies = true;
this.<>1__state = 0;
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
//other thread
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
If the assumptions above are correct then there's some unneeded work done, such as getting the awaiter and reassigning the same value to <1>t__$await1. If the state was kept at 1 then the last part would instead be:
//second thread
//I suppose this unmatched call to EndAwait will fail
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
Further, if it was set to 2 the state machine would assume it had already gotten the value of the first action, which would be untrue, and a (potentially) unassigned variable would be used to calculate the result.
Could it be something to do with stacked/nested async calls?
i.e.:
async Task m1()
{
    await m2();
}
async Task m2()
{
    await m3();
}
async Task m3()
{
    Thread.Sleep(10000);
}
Does the MoveNext delegate get called multiple times in this situation?
Just a punt really.
Explanation of actual states:
possible states:
0 Initialized (I think so) or waiting for end of operation
>0 just called MoveNext, choosing next state
-1 ended
Is it possible that this implementation just wants to ensure that if another call to MoveNext happens from wherever (while waiting), it will re-evaluate the whole state chain again from the beginning, to re-evaluate results which could in the meantime already be outdated?
I'd love to figure it out myself, but I was wondering: roughly what is the algorithm for converting a function with yield statements into a state machine for an enumerator? For example, how does C# turn this:
IEnumerator<string> strings(IEnumerable<string> args)
{
    IEnumerator<string> enumerator2 = getAnotherEnumerator();
    foreach (var arg in args)
    {
        enumerator2.MoveNext();
        yield return arg + enumerator2.Current;
    }
}
into this:
bool MoveNext()
{ switch (this.state)
{
case 0:
this.state = -1;
this.enumerator2 = getAnotherEnumerator();
this.argsEnumerator = this.args.GetEnumerator();
this.state = 1;
while (this.argsEnumerator.MoveNext())
{
this.arg = this.argsEnumerator.Current;
this.enumerator2.MoveNext();
this.current = this.arg + this.enumerator2.Current;
this.state = 2;
return true;
state1:
this.state = 1;
}
this.state = -1;
if (this.argsEnumerator != null) this.argsEnumerator.Dispose();
break;
case 2:
goto state1;
}
return false;
}
Of course the result can be completely different depending on the original code.
The particular code sample you are looking at involves a series of transformations.
Please note that this is an approximate description of the algorithm. The actual names used by the compiler and the exact code it generates may be different. The idea is the same, however.
The first transformation is the "foreach" transformation, which transforms this code:
foreach (var x in y)
{
//body
}
into this code:
var enumerator = y.GetEnumerator();
while (enumerator.MoveNext())
{
    var x = enumerator.Current;
    //body
}
if (enumerator != null)
{
    enumerator.Dispose();
}
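The expansion above is simplified. In real code the Dispose call sits in a finally block, so the enumerator is cleaned up even when the loop body throws or exits early. A hand-written sketch of that shape, with ForeachExpansionDemo and ForEach as illustrative names:

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansionDemo
{
    // Hand-expanded equivalent of: foreach (var x in source) body(x);
    public static void ForEach<T>(IEnumerable<T> source, Action<T> body)
    {
        IEnumerator<T> enumerator = source.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                T x = enumerator.Current;
                body(x);
            }
        }
        finally
        {
            // Runs on normal completion and when body throws.
            if (enumerator != null)
                enumerator.Dispose();
        }
    }

    public static void Main()
    {
        ForEach(new[] { "a", "b", "c" }, item => Console.WriteLine(item));
    }
}
```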
The second transformation finds all the yield return statements in the function body, assigns a number to each (a state value), and creates a "goto label" right after the yield.
The third transformation lifts all the local variables and function arguments in the method body into an object called a closure.
Given the code in your example, that would look similar to this:
class ClosureEnumerable : IEnumerable<string>
{
    private IEnumerable<string> args;
    private ClassType originalThis;
    public ClosureEnumerable(ClassType originalThis, IEnumerable<string> args)
    {
        this.args = args;
        this.originalThis = originalThis;
    }
    public IEnumerator<string> GetEnumerator()
    {
        return new Closure(originalThis, args);
    }
}
class Closure : IEnumerator<string>
{
public Closure(ClassType originalThis, IEnumerable<string> args)
{
state = 0;
this.args = args;
this.originalThis = originalThis;
}
private IEnumerable<string> args;
private IEnumerator<string> enumerator2;
private IEnumerator<string> argEnumerator;
//- Here ClassType is the type of the object that contained the method
// This may be optimized away if the method does not access any
// class members
private ClassType originalThis;
//This holds the state value.
private int state;
//The current value to return
private string currentValue;
public string Current
{
get
{
return currentValue;
}
}
}
The method body is then moved from the original method to a method inside "Closure" called MoveNext, which returns a bool, and implements IEnumerator.MoveNext.
Any access to any locals is routed through "this", and any access to any class members is routed through this.originalThis.
Any "yield return expr" is translated into:
currentValue = expr;
state = //the state number of the yield statement;
return true;
Any yield break statement is translated into:
state = -1;
return false;
There is an "implicit" yield break statement at the end of the function.
A switch statement is then introduced at the beginning of the procedure that looks at the state number and jumps to the associated label.
The original method is then translated into something like this:
IEnumerator<string> strings(IEnumerable<string> args)
{
return new Closure(this, args); // a method returning IEnumerable<string> would return new ClosureEnumerable(this, args) instead
}
The fact that the state of the method is all pushed into an object and that the MoveNext method uses a switch statement / state variable is what allows the iterator to behave as if control is being passed back to the point immediately after the last "yield return" statement the next time "MoveNext" is called.
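To tie the steps together, here is a small hand-written enumerator next to the iterator method it imitates. It is a sketch of the idea, not the compiler's actual output: ManualLabels, its state numbering and its field names are all invented for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class IteratorVersion
{
    // The iterator we want to imitate by hand.
    public static IEnumerator<string> Labels(int stop)
    {
        for (int index = 1; index < stop; index++)
            yield return index % 2 == 0 ? "even" : "odd";
    }
}

public sealed class ManualLabels : IEnumerator<string>
{
    private int state;            // 0 = not started, 1 = paused after the yield return, -1 = finished
    private string current = "";  // value handed out by Current
    private int index;            // lifted local variable
    private readonly int stop;    // lifted method argument

    public ManualLabels(int stop) { this.stop = stop; }

    public string Current => current;
    object IEnumerator.Current => current;

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:
                index = 1;   // loop initialiser
                break;
            case 1:
                index++;     // resume right after the yield return: loop increment
                break;
            default:
                return false; // already finished
        }

        if (index < stop)
        {
            current = index % 2 == 0 ? "even" : "odd";
            state = 1;       // remember where to resume
            return true;
        }

        state = -1;          // implicit yield break at the end of the method
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}
```

Both versions produce the same sequence for the same stop value; the hand-written one just makes the saved state and the resume point explicit.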
It is important to point out, however, that the transformation used by the C# compiler is not the best way to do this. It suffers from poor performance when trying to use "yield" with recursive algorithms. There is a good paper that outlines a better way to do this here:
http://research.microsoft.com/en-us/projects/specsharp/iterators.pdf
It's worth a read if you haven't read it yet.
Just spotted this question - I wrote an article on it recently. I'll have to add the other links mentioned here to the article though...
Raymond Chen answers this here.