The wonders of the yield keyword

OK, while I was poking around with building a custom enumerator, I noticed this behavior concerning yield.
Say you have something like this:
public class EnumeratorExample
{
public static IEnumerable<int> GetSource(int startPoint)
{
int[] values = new int[]{1,2,3,4,5,6,7};
Contract.Invariant(startPoint < values.Length);
bool keepSearching = true;
int index = startPoint;
while(keepSearching)
{
yield return values[index];
//The mind reels here
index++;
keepSearching = index < values.Length;
}
}
}
What makes it possible underneath the compiler's hood to execute the index ++ and the rest of the code in the while loop after you technically do a return from the function?

The compiler rewrites the code into a state machine. The single method you wrote is split up into different parts. Each time you call MoveNext (either implicitly or explicitly) the state is advanced and the correct block of code is executed.
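To see the state advancing, here is a minimal sketch that drives an iterator by hand. The Numbers method and its log parameter are hypothetical names made up for this illustration; the log only records which parts of the body have run so far.

```csharp
using System;
using System.Collections.Generic;

public static class YieldStepDemo
{
    // Hypothetical iterator: the log records when each part of the body runs.
    public static IEnumerable<int> Numbers(List<string> log)
    {
        log.Add("before first yield");
        yield return 1;
        log.Add("between yields");
        yield return 2;
        log.Add("after last yield");
    }

    public static void Main()
    {
        var log = new List<string>();
        using (IEnumerator<int> e = Numbers(log).GetEnumerator())
        {
            Console.WriteLine(log.Count);                    // 0: the body has not started yet
            e.MoveNext();                                    // runs up to the first yield return
            Console.WriteLine(e.Current + " " + log.Count);  // 1 1
            e.MoveNext();                                    // resumes right after the first yield
            Console.WriteLine(e.Current + " " + log.Count);  // 2 2
            Console.WriteLine(e.MoveNext());                 // False: the body ran to its end
            Console.WriteLine(log.Count);                    // 3
        }
    }
}
```

Each MoveNext call resumes the body exactly where the previous yield return left off, which is why the statements after a yield return still execute.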
Suggested reading if you want to know more details:
The implementation of iterators in C# and its consequences - Raymond Chen
Part 1
Part 2
Part 3
Iterator block implementation details: auto-generated state machines - Jon Skeet
Eric Lippert's blog

The compiler generates a state-machine on your behalf.
From the language specification:
10.14 Iterators
10.14.4 Enumerator objects
When a function member returning an enumerator interface type is implemented using an iterator block, invoking the function member does not immediately execute the code in the iterator block. Instead, an enumerator object is created and returned. This object encapsulates the code specified in the iterator block, and execution of the code in the iterator block occurs when the enumerator object’s MoveNext method is invoked. An enumerator object has the following characteristics:
• It implements IEnumerator and IEnumerator<T>, where T is the yield type of the iterator.
• It implements System.IDisposable.
• It is initialized with a copy of the argument values (if any) and instance value passed to the function member.
• It has four potential states, before, running, suspended, and after, and is initially in the before state.
An enumerator object is typically an instance of a compiler-generated enumerator class that encapsulates the code in the iterator block and implements the enumerator interfaces, but other methods of implementation are possible. If an enumerator class is generated by the compiler, that class will be nested, directly or indirectly, in the class containing the function member, it will have private accessibility, and it will have a name reserved for compiler use (§2.4.2).
To get an idea of this, here's how Reflector decompiles your class:
public class EnumeratorExample
{
// Methods
public static IEnumerable<int> GetSource(int startPoint)
{
return new <GetSource>d__0(-2) { <>3__startPoint = startPoint };
}
// Nested Types
[CompilerGenerated]
private sealed class <GetSource>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
// Fields
private int <>1__state;
private int <>2__current;
public int <>3__startPoint;
private int <>l__initialThreadId;
public int <index>5__3;
public bool <keepSearching>5__2;
public int[] <values>5__1;
public int startPoint;
// Methods
[DebuggerHidden]
public <GetSource>d__0(int <>1__state)
{
this.<>1__state = <>1__state;
this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
}
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<values>5__1 = new int[] { 1, 2, 3, 4, 5, 6, 7 };
this.<keepSearching>5__2 = true;
this.<index>5__3 = this.startPoint;
while (this.<keepSearching>5__2)
{
this.<>2__current = this.<values>5__1[this.<index>5__3];
this.<>1__state = 1;
return true;
Label_0073:
this.<>1__state = -1;
this.<index>5__3++;
this.<keepSearching>5__2 = this.<index>5__3 < this.<values>5__1.Length;
}
break;
case 1:
goto Label_0073;
}
return false;
}
[DebuggerHidden]
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
EnumeratorExample.<GetSource>d__0 d__;
if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) && (this.<>1__state == -2))
{
this.<>1__state = 0;
d__ = this;
}
else
{
d__ = new EnumeratorExample.<GetSource>d__0(0);
}
d__.startPoint = this.<>3__startPoint;
return d__;
}
[DebuggerHidden]
IEnumerator IEnumerable.GetEnumerator()
{
return this.System.Collections.Generic.IEnumerable<System.Int32>.GetEnumerator();
}
[DebuggerHidden]
void IEnumerator.Reset()
{
throw new NotSupportedException();
}
void IDisposable.Dispose()
{
}
// Properties
int IEnumerator<int>.Current
{
[DebuggerHidden]
get
{
return this.<>2__current;
}
}
object IEnumerator.Current
{
[DebuggerHidden]
get
{
return this.<>2__current;
}
}
}
}

Yield is magic.
Well, not really. The compiler generates a full class to generate the enumeration that you're doing. It's basically sugar to make your life simpler.
Read this for an intro.
EDIT: Wrong link. The link has changed, so check again if you already followed it.

That's one of the most complex parts of the C# compiler. Best read the free sample chapter of Jon Skeet's C# in Depth (or better, get the book and read it :-)
Implementing iterators the easy way
For further explanations see Marc Gravell's answer here:
Can someone demystify the yield keyword?

Here is an excellent blog series (from Microsoft veteran Raymond Chen) that details how yield works:
https://devblogs.microsoft.com/oldnewthing/20080812-00/?p=21273
https://devblogs.microsoft.com/oldnewthing/20080813-00/?p=21253
https://devblogs.microsoft.com/oldnewthing/20080814-00/?p=21243
https://devblogs.microsoft.com/oldnewthing/20080815-00/?p=21223

Related

Operation not valid due to the current state of the object on IEnumerator usage [duplicate]

How exactly is the right way to call IEnumerator.Reset?
The documentation says:
The Reset method is provided for COM interoperability. It does not necessarily need to be implemented; instead, the implementer can simply throw a NotSupportedException.
Okay, so does that mean I'm not supposed to ever call it?
It's so tempting to use exceptions for flow control:
using (enumerator = GetSomeExpensiveEnumerator())
{
while (enumerator.MoveNext()) { ... }
try { enumerator.Reset(); } //Try an inexpensive method
catch (NotSupportedException)
{ enumerator = GetSomeExpensiveEnumerator(); } //Fine, get another one
while (enumerator.MoveNext()) { ... }
}
Is that how we're supposed to use it? Or are we not meant to use it from managed code at all?

Never; ultimately this was a mistake. The correct way to iterate a sequence more than once is to call .GetEnumerator() again, i.e. use foreach again. If your data is non-repeatable (or expensive to repeat), buffer it via .ToList() or similar.
It is a formal requirement in the language spec that iterator blocks throw exceptions for this method. As such, you cannot rely on it working. Ever.
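A minimal sketch of both points, assuming a hypothetical iterator method named Items: calling Reset on an enumerator produced by an iterator block throws NotSupportedException, while buffering with ToList lets you iterate as often as you like.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResetDemo
{
    // Hypothetical iterator block used only for this illustration.
    public static IEnumerable<int> Items()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Returns true if Reset() on the iterator's enumerator throws NotSupportedException.
    public static bool ResetThrows()
    {
        IEnumerator<int> e = Items().GetEnumerator();
        try
        {
            e.Reset();
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
        finally
        {
            e.Dispose();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ResetThrows());   // True

        // Buffer once, then iterate the list as many times as needed.
        List<int> buffered = Items().ToList();
        Console.WriteLine(buffered.Sum());  // 6
        Console.WriteLine(buffered.Sum());  // 6
    }
}
```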

I recommend not using it. A lot of modern IEnumerable implementations will just throw an exception.
Getting enumerators is hardly ever "expensive". It is enumerating them all (fully) that can be expensive.

public class PeopleEnum : IEnumerator
{
public Person[] _people;
// Enumerators are positioned before the first element
// until the first MoveNext() call.
int position = -1;
public PeopleEnum(Person[] list)
{
_people = list;
}
public bool MoveNext()
{
position++;
return (position < _people.Length);
}
public void Reset()
{
position = -1;
}
object IEnumerator.Current
{
get
{
return Current;
}
}
public Person Current
{
get
{
try
{
return _people[position];
}
catch (IndexOutOfRangeException)
{
throw new InvalidOperationException();
}
}
}
}

How does IEnumerable<T> work in background

I am wondering about the more in-depth functionality of the IEnumerable<T> interface.
Basically, it works as an intermediary step in execution. For example, if you write:
IEnumerable<int> temp = new int[]{1,2,3}.Select(x => 2*x);
The result of the Select function will not be calculated (enumerated) until something is done with temp to allow it (such as List<int> list = temp.ToList()).
However, what puzzles me is, since IEnumerable<T> is an interface, it cannot, by definition, be instantiated. So, what is the collection the actual items (in the example 2*x items) reside in?
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
I cannot seem to find a thorough (more in-depth) explanation as to the actual implementation of this interface and its functionality (for example, if there is an underlying collection, how does the yield keyword work).
Basically, what I am asking for is a more elaborate explanation on the functionality of IEnumerable<T>.

Implementation shouldn't matter. All these (LINQ) methods return IEnumerable<T>, interface members are the only members you can access, and that should be enough to use them.
However, if you really have to know, you can find actual implementations on http://sourceof.net.
Enumerable.cs
But for some of the methods you won't find an explicit class declaration, because they use yield return, which means the proper class (with a state machine) is generated by the compiler during compilation. For example, Enumerable.Range is implemented that way:
public static IEnumerable<int> Range(int start, int count) {
long max = ((long)start) + count - 1;
if (count < 0 || max > Int32.MaxValue)
throw Error.ArgumentOutOfRange("count");
return RangeIterator(start, count);
}
static IEnumerable<int> RangeIterator(int start, int count) {
for (int i = 0; i < count; i++)
yield return start + i;
}
You can read more about that on MSDN: Iterators (C# and Visual Basic)

Not all objects that implement IEnumerable defer execution in some way. The API of the interface makes it possible to defer execution, but it doesn't require it. There are likewise implementations that don't defer execution in any way.
So, what is the collection the actual items (in the example 2*x items) reside in?
There is none. Whenever the next value is requested it computes that one value on demand, gives it to the caller, and then forgets the value. It doesn't store it anywhere else.
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
There wouldn't be one. It would compute each new value immediately when you ask for the next value and it won't remember it afterward. It only stores enough information to be able to compute the next value, which means it only needs to store the element and the number of values left to yield.
While the actual .NET implementations will use much more concise means of creating such a type, creating an enumerable that defers execution is not particularly hard. Doing so even the long way is more tedious than difficult. You simply compute the next value in the MoveNext method of the iterator. In the example you asked about, Repeat, this is easy, as you only need to compute whether there is another value, not what it is:
public class Repeater<T> : IEnumerator<T>
{
private int count;
private T element;
public Repeater(T element, int count)
{
this.element = element;
this.count = count;
}
public T Current { get { return element; } }
object IEnumerator.Current
{
get { return Current; }
}
public void Dispose() { }
public bool MoveNext()
{
if (count > 0)
{
count--;
return true;
}
else
return false;
}
public void Reset()
{
throw new NotSupportedException();
}
}
(I've omitted an IEnumerable type that just returns a new instance of this type, or a static Repeat method that creates a new instance of that enumerable. There isn't anything particularly interesting to see there.)
A slightly more interesting example would be something like Count:
public class Counter : IEnumerator<int>
{
private int remaining;
public Counter(int start, int count)
{
// Start one below the first value, because MoveNext() increments before returning.
Current = start - 1;
this.remaining = count;
}
public int Current { get; private set; }
object IEnumerator.Current
{
get { return Current; }
}
public void Dispose() { }
public bool MoveNext()
{
if (remaining > 0)
{
remaining--;
Current++;
return true;
}
else
return false;
}
public void Reset()
{
throw new NotSupportedException();
}
}
Here we're not only computing if we have another value, but what that next value is, each time a new value is requested of us.

So, what is the collection the actual items (in the example 2*x items) reside in?
It is not residing anywhere. There is code that will produce the individual items "on demand" when you iterate, but the 2*x numbers are not computed upfront. They are also not stored anywhere, unless you call ToList or ToArray.
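A small sketch makes the "on demand" part visible. The Calls counter and the Doubled method are names invented for this example; the counter simply records how often the selector lambda runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how often the selector lambda is invoked.
    public static int Calls;

    public static IEnumerable<int> Doubled()
    {
        return new[] { 1, 2, 3 }.Select(x => { Calls++; return 2 * x; });
    }

    public static void Main()
    {
        IEnumerable<int> temp = Doubled();
        Console.WriteLine(Calls);        // 0: nothing has been computed yet

        List<int> list = temp.ToList();  // enumerates the query once and stores the results
        Console.WriteLine(Calls);        // 3

        Console.WriteLine(temp.Sum());   // 12: enumerating temp again recomputes every item
        Console.WriteLine(Calls);        // 6
    }
}
```

The second enumeration runs the selector three more times, which shows that temp itself never held the doubled values; only list does.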
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
The same picture is here: the returned implementation of IEnumerable is not public, and it returns its items on demand, without storing them anywhere.
The C# compiler provides a convenient way to implement IEnumerable<T> without defining a class for it. All you need is to declare your method's return type as IEnumerable<T> and use yield return to supply values on an as-needed basis.
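As a sketch of that approach, here is a Repeat-like method written with yield return. RepeatSketch is a hypothetical name, and this is not claimed to be how the framework implements Enumerable.Repeat; it only shows that no backing collection is needed.

```csharp
using System;
using System.Collections.Generic;

public static class RepeatSketch
{
    // Yields the same element 'count' times; only the element and a loop counter are stored.
    public static IEnumerable<T> Repeat<T>(T element, int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return element;
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Repeat(1, 3)));  // 1,1,1
    }
}
```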

Java's Enumeration translated to C# IEnumerator

In practice this seems simple, but I'm getting really confused about it. Java's Enumeration methods hasMoreElements() and nextElement() are related to, but of course work differently from, C#'s IEnumerator MoveNext() method and Current property. But how would I translate something like this?
//class declaration, fields constructors, unrelated code etc.
private Vector atomlist = new Vector();
public int getNumberBasis() {
Enumeration basis = this.getBasisEnumeration();
int numberBasis = 0;
while (basis.hasMoreElements()) {
Object temp = basis.nextElement();
numberBasis++;
}
return numberBasis;
}
public Enumeration getBasisEnumeration() {
return new BasisEnumeration(this);
}
private class BasisEnumeration implements Enumeration {
Enumeration atoms;
Enumeration basis;
public BasisEnumeration(Molecule molecule) {
atoms = molecule.getAtomEnumeration();
basis = ((Atom) atoms.nextElement()).getBasisEnumeration();
}
public boolean hasMoreElements() {
return (atoms.hasMoreElements() || basis.hasMoreElements());
}
public Object nextElement() {
if (basis.hasMoreElements())
return basis.nextElement();
else {
basis = ((Atom) atoms.nextElement()).getBasisEnumeration();
return basis.nextElement();
}
}
}
As you can see, the enumeration class implements these methods itself, and I don't think replacing hasMoreElements and nextElement with MoveNext and Current everywhere would work, because nextElement() calls basis.hasMoreElements() again in an if-else statement. If I were to replace hasMoreElements with MoveNext(), the code would advance twice instead of once.

You can indeed implement IEnumerable yourself, but that is generally needed only as an exercise in the internals of C#. You'd probably use either an iterator method:
IEnumerable<Atom> GetAtoms()
{
foreach(Atom item in basis)
{
yield return item;
}
foreach(Atom item in atoms)
{
yield return item;
}
}
Or Enumerable.Concat
IEnumerable<Atom> GetAtoms()
{
return basis.Concat(atoms);
}
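Closer to the Java original, which walks every atom and then every basis element of that atom, a nested iterator could look like the sketch below. The Atom and Molecule classes here are hypothetical stand-ins, and string is only a placeholder for whatever type a basis element really is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Atom type.
public class Atom
{
    public List<string> Basis = new List<string>();
}

// Hypothetical stand-in for the question's Molecule type.
public class Molecule
{
    private readonly List<Atom> atoms = new List<Atom>();

    public void AddAtom(Atom atom)
    {
        atoms.Add(atom);
    }

    // Replaces BasisEnumeration: yields every basis element of every atom, in order.
    public IEnumerable<string> GetBasis()
    {
        foreach (Atom atom in atoms)
        {
            foreach (string basis in atom.Basis)
            {
                yield return basis;
            }
        }
    }

    // Replaces getNumberBasis(): counts the flattened sequence.
    public int GetNumberBasis()
    {
        return GetBasis().Count();
    }
}

public static class MoleculeDemo
{
    public static void Main()
    {
        var molecule = new Molecule();
        molecule.AddAtom(new Atom { Basis = { "s", "p" } });
        molecule.AddAtom(new Atom { Basis = { "d" } });
        Console.WriteLine(molecule.GetNumberBasis());  // 3
    }
}
```

Because the compiler tracks the position inside both loops, there is no need to juggle hasMoreElements and nextElement by hand, and an atom with no basis elements is skipped naturally.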

What are the syntax enabling patterns in C#?

There are several pattern-features of C# language, i.e. classes need not derive from a specific interface; but rather implement a certain pattern in order to partake in some C# syntax/features.
Let's consider an example:
public class MyCollection : IEnumerable
{
public T Add(T name, T name2, ...) { }
public IEnumerator GetEnumerator() { return null; }
}
Here, T is any type. Basically we have a class that implements IEnumerable and has a method named Add() with any number of parameters.
This enables the following declaration of a new MyCollection instance:
new MyCollection{{a1, a2, ...}, {b1, b2, ...} }
Which is equivalent to:
var mc = new MyCollection();
mc.Add(a1, a2, ...);
mc.Add(b1, b2, ...);
Magic! Meanwhile, recently (I believe during the BUILD event) Anders Hejlsberg let slip that the new await/async will be implemented using patterns as well, which lets WinRT get away with returning something other than Task<T>.
So my question is twofold,
What is the pattern Anders was talking about, or did I misunderstand something? The answer should be somewhere between the type WinRT provides, something to the effect of IAsyncFoo and the unpublished specification.
Are there any other such patterns (perhaps already existing) in C#?

The draft specification is published - you can download it from the Visual Studio home page. The pattern for async is the one given in driis's answer - you can also read my Eduasync blog series for more details, with this post being dedicated to the pattern.
Note that this pattern only applies to "what you can await". An async method must return void, Task or Task<T>.
In terms of other patterns in C# beyond the collection initializer you mentioned originally:
foreach can iterate over non-IEnumerable implementations, so long as the type has a GetEnumerator method returning a type which has MoveNext() and Current members
LINQ query expressions resolve to calls to Select, Where, GroupBy etc.
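To illustrate the foreach pattern, here is a minimal sketch with a hypothetical Countdown type that implements neither IEnumerable nor IEnumerator, yet works in a foreach loop because it has the right members.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: note that it does not implement IEnumerable.
public class Countdown
{
    private readonly int start;

    public Countdown(int start)
    {
        this.start = start;
    }

    public Enumerator GetEnumerator()
    {
        return new Enumerator(start);
    }

    // Does not implement IEnumerator; it only has MoveNext() and Current.
    public class Enumerator
    {
        private int next;

        public Enumerator(int start)
        {
            next = start + 1;
        }

        public int Current
        {
            get { return next; }
        }

        public bool MoveNext()
        {
            next--;
            return next > 0;
        }
    }
}

public static class ForeachPatternDemo
{
    public static List<int> Collect(Countdown countdown)
    {
        var result = new List<int>();
        foreach (int n in countdown)  // compiles thanks to the pattern, not an interface
        {
            result.Add(n);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Collect(new Countdown(3))));  // 3,2,1
    }
}
```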

For async, it works on the awaiter pattern, which I think is described best here, by Stephen Toub:
"The languages support awaiting any instance that exposes the right method (either instance method or extension method): GetAwaiter. A GetAwaiter needs to return a type that itself exposes three members:"
bool IsCompleted { get; }
void OnCompleted(Action continuation);
TResult GetResult(); // TResult can also be void
As an example of this, in the Async CTP, Task’s GetAwaiter method returns a value of type TaskAwaiter:
public struct TaskAwaiter
{
public bool IsCompleted { get; }
public void OnCompleted(Action continuation);
public void GetResult();
}
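A small sketch of the pattern in use: adding an extension method named GetAwaiter makes a type awaitable that knows nothing about tasks. The extension on TimeSpan and the WaitThenReport method are illustrative choices for this example. Note that in the released language the awaiter must also implement INotifyCompletion, which TaskAwaiter does.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class TimeSpanAwaiterExtensions
{
    // With this extension in scope, "await someTimeSpan" compiles:
    // the compiler only looks for a suitable GetAwaiter method.
    public static TaskAwaiter GetAwaiter(this TimeSpan delay)
    {
        return Task.Delay(delay).GetAwaiter();
    }
}

public static class AwaitPatternDemo
{
    public static async Task<string> WaitThenReport()
    {
        await TimeSpan.FromMilliseconds(10);  // awaits a TimeSpan, not a Task
        return "done";
    }

    public static void Main()
    {
        // Block synchronously here only to keep the example short.
        Console.WriteLine(WaitThenReport().GetAwaiter().GetResult());  // done
    }
}
```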
If you want all the details of async, start reading Jon Skeet's posts about async. They go into great detail about the subject.
Besides collection initializers, which are pattern-based as you mention, another pattern-based feature in C# is LINQ: for the LINQ keywords, all that is required is that overload resolution finds an instance or extension method with the correct name and signature. Have a look at Eric Lippert's article about the subject. Also, foreach is pattern-based; Eric also describes the details of this pattern in the linked article.

Another pattern you can use is the using keyword.
If you have a class that implements IDisposable then you can say:
using(Resource myResource = GetResource())
{
}
Which translates to something akin to:
// Acquire the resource before entering the try block, so it is definitely assigned.
Resource myResource = GetResource();
try
{
}
finally
{
var disposable = myResource as IDisposable;
if (disposable != null) disposable.Dispose();
}
While I suppose it is less "magical" than foreach or the query operators it is a relatively nice bit of syntax.
In the same vein, you can use yield return to have the compiler implement an iterator for you:
public struct SimpleBitVector32 : IEnumerable<bool>
{
public SimpleBitVector32(uint value)
{
this.data = value;
}
private uint data;
public bool this[int offset]
{
get
{
unchecked
{
return (this.data & (1u << offset)) != 0;
}
}
set
{
unchecked
{
this.data = value ? (this.data | (1u << offset)) : (this.data & ~(1u << offset));
}
}
}
public IEnumerator<bool> GetEnumerator()
{
for (int i = 0; i < 32; i++)
{
yield return this[i];
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}

What concrete type does 'yield return' return?

What is the concrete type for this IEnumerable<string>?
private IEnumerable<string> GetIEnumerable()
{
yield return "a";
yield return "a";
yield return "a";
}

It's a compiler-generated type. The compiler generates an IEnumerator<string> implementation that returns three "a" values and an IEnumerable<string> skeleton class that provides one of these in its GetEnumerator method.
The generated code looks something like this*:
// No idea what the naming convention for the generated class is --
// this is really just a shot in the dark.
class GetIEnumerable_Enumerator : IEnumerator<string>
{
int _state;
string _current;
public bool MoveNext()
{
switch (_state++)
{
case 0:
_current = "a";
break;
case 1:
_current = "a";
break;
case 2:
_current = "a";
break;
default:
return false;
}
return true;
}
public string Current
{
get { return _current; }
}
object IEnumerator.Current
{
get { return Current; }
}
void IEnumerator.Reset()
{
// Compiler-generated iterators do not support Reset.
throw new NotSupportedException();
}
public void Dispose()
{
// Required by IEnumerator<string>; nothing to clean up here.
}
}
class GetIEnumerable_Enumerable : IEnumerable<string>
{
public IEnumerator<string> GetEnumerator()
{
return new GetIEnumerable_Enumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Or maybe, as SLaks says in his answer, the two implementations end up in the same class. I wrote this based on my choppy memory of generated code I'd looked at before; really, one class would suffice, as there's no reason the above functionality requires two.
In fact, come to think of it, the two implementations really should fall within a single class, as I just remembered the functions that use yield statements must have a return type of either IEnumerable<T> or IEnumerator<T>.
Anyway, I'll let you perform the code corrections to what I posted mentally.
*This is purely for illustration purposes; I make no claim as to its real accuracy. It's only to demonstrate in a general way how the compiler does what it does, based on the evidence I've seen in my own investigations.

The compiler will automatically generate a class that implements both IEnumerable<T> and IEnumerator<T> (in the same class).
Jon Skeet has a detailed explanation.

The concrete implementation of IEnumerable<string> returned by the method is a nested class generated by the compiler (not an anonymous type in the C# sense):
Console.WriteLine(GetIEnumerable().GetType());
Prints:
YourClass+<GetIEnumerable>d__0
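If you would rather not rely on the exact generated name, which is an implementation detail, you can inspect the type through reflection. The following is a sketch; with Microsoft's compiler all three lines are expected to print True, but none of this is guaranteed by the language specification.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class GeneratedTypeDemo
{
    public static IEnumerable<string> GetIEnumerable()
    {
        yield return "a";
        yield return "a";
        yield return "a";
    }

    public static void Main()
    {
        IEnumerable<string> sequence = GetIEnumerable();
        Type type = sequence.GetType();

        // Is the generated class nested inside the declaring class?
        Console.WriteLine(type.IsNested);

        // Does the same object also act as its own enumerator?
        Console.WriteLine(sequence is IEnumerator<string>);

        // Is the class marked as compiler-generated?
        Console.WriteLine(type.IsDefined(typeof(CompilerGeneratedAttribute), false));
    }
}
```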
