C# "Generator" Method - c#

I come from the world of Python and am trying to create a "generator" method in C#. I'm parsing a file in chunks of a specific buffer size, and only want to read and store the next chunk one at a time and yield it in a foreach loop. Here's what I have so far (simplified proof of concept):
class Page
{
public uint StartOffset { get; set; }
private uint currentOffset = 0;
public Page(MyClass c, uint pageNumber)
{
uint StartOffset = pageNumber * c.myPageSize;
if (StartOffset < c.myLength)
currentOffset = StartOffset;
else
throw new ArgumentOutOfRangeException("Page offset exceeds end of file");
while (currentOffset < c.myLength && currentOffset < (StartOffset + c.myPageSize))
// read data from page and populate members (not shown for MWE purposes)
. . .
}
}
class MyClass
{
public uint myLength { get; set; }
public uint myPageSize { get; set; }
public IEnumerator<Page> GetEnumerator()
{
for (uint i = 1; i < this.myLength; i++)
{
// start count at 1 to skip first page
Page p = new Page(this, i);
try
{
yield return p;
}
catch (ArgumentOutOfRangeException)
{
// end of available pages, how to signal calling foreach loop?
}
}
}
}
I know this is not perfect since it is a minimum working example (I don't allow many of these properties to be set publicly, but for keeping this simple I don't want to type private members and properties).
However, my main question is how do I let the caller looping over MyClass with a foreach statement know that there are no more items left to loop through? Is there an exception I throw to indicate there are no elements left?

As mentioned in the comments, you should use IEnumerable<T> instead of IEnumerator<T>. The enumerator is the technical object that is being used to enumerate over something. That something, in many cases, is an enumerable.
C# has special abilities to deal with enumerables. Most prominently, you can use a foreach loop with an enumerable (but not with an enumerator, even though the loop actually uses the enumerator of the enumerable). Also, enumerables allow you to use LINQ, which makes them even easier to consume.
So you should change your class like this:
class MyClass
{
public uint myLength { get; set; }
public uint myPageSize { get; set; }
// note the modified signature
public IEnumerable<Page> GetPages()
{
for (uint i = 1; i < this.myLength; i++)
{
Page p;
try
{
p = new Page(this, i);
}
catch (ArgumentOutOfRangeException)
{
yield break;
}
yield return p;
}
}
}
In the end, this allows you to use it like this:
var obj = new MyClass();
foreach (var page in obj.GetPages())
{
// do whatever
}
// or even using LINQ
var pageOffsets = obj.GetPages().Select(p => p.StartOffset).ToList();
Of course, you should also change the name of the method to something meaningful. If you’re returning pages, GetPages is maybe a good first step in the right direction. The name GetEnumerator is kind of reserved for types implementing IEnumerable, where the GetEnumerator method is supposed to return an enumerator of the collection the object represents.

There are two ways to do it: let execution reach the end of the GetEnumerator function, or put a yield break; in the code. The latter behaves the same as a return; in a function that returns void.
From the caller's perspective, the enumerator returned from GetEnumerator() will start returning false from MoveNext(); that is how the caller can tell that the enumerator is done.
As for your "Can't yield a value inside the body of a try block with a catch clause" error: you put the try/catch around the wrong part of the code. The exception will be thrown by the new, not by the yield return. Your code should look like this:
public IEnumerator<Page> GetEnumerator()
{
for (uint i = 1; i < this.myLength; i++)
{
// start count at 1 to skip first page
Page p;
try
{
p = new Page(this, i);
}
catch (ArgumentOutOfRangeException)
{
yield break;
}
yield return p;
}
}

Use the yield break; statement to end the sequence that your iterator method is generating.
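To make the mechanics concrete, here is a minimal sketch of what the answers above describe: yield break simply ends the sequence, and the caller only ever sees MoveNext() return false. PageNumbers, its pageCount parameter, and Drain are hypothetical names, standing in for the page-reading code in the question.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakSketch
{
    // Hypothetical stand-in for GetPages(): yields page numbers starting at 1
    // and stops as soon as the next page would lie past the end of the data.
    public static IEnumerable<uint> PageNumbers(uint pageCount)
    {
        for (uint i = 1; ; i++)
        {
            if (i >= pageCount)
                yield break;   // ends the sequence; no exception reaches the caller
            yield return i;
        }
    }

    // Roughly what a foreach loop does with the enumerator behind the scenes.
    public static List<uint> Drain(IEnumerable<uint> source)
    {
        var seen = new List<uint>();
        using (IEnumerator<uint> e = source.GetEnumerator())
        {
            while (e.MoveNext())      // returns false once the iterator hits yield break
                seen.Add(e.Current);
        }
        return seen;
    }
}
```

Calling YieldBreakSketch.Drain(YieldBreakSketch.PageNumbers(4)) collects 1, 2 and 3; the loop ends normally without any exception being thrown.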

Related

How to push 0 item to the stack of int?

I have a simple stack implementation, but I can't work out how programmers solve the following problem: it seems impossible to push a 0 onto the stack. How do you do that? I mean, how do you tell whether a slot holds a pushed 0 or just marks the end of the stack? Or is this not a problem in my implementation?
public class Stack: IStack
{
private int[] s;
private int N = 0;
public Stack(int N)
{
s = new int[N];
}
public void push(int x)
{
s[N++] = x;
if (N >= s.Length)
{
Array.Resize(ref s, s.Length*2);
}
}
public int pop()
{
s[N] = 0;
return s[--N];
}
}
You are already tracking the last element of the stack with N (or rather, N - 1). You don't need to verify whether the element is 0, and your implementation actually doesn't distinguish between zeroes and other numbers.
In the implementation you provided, it is perfectly possible to push a 0 into the stack.
By the way, I would reimplement your pop() method like this:
public int? pop()
{
if (N != 0)
{
return s[--N];
}
else
{
return null;
}
}
This way, it returns null in case the stack is empty.
You should realize that it doesn't matter what the values s[N], s[N+1], ... are, since you are only using the values s[0..N-1] in your implementation. You can consider the part s[N...] as uninitialized; adding a new element, even 0, causes s[N] to become initialized with the new value.
You can push 0; nothing prevents it. N is equal to the number of elements, and it is also used to track the index of the next item to push: N == (index of last element + 1). The problem I see is that if you run pop() too many times you will get an IndexOutOfRangeException.
You can add IsEmpty property like this:
public bool IsEmpty
{
get { return N < 1; }
}
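Putting the suggestions above together, a stack that tracks its own element count never needs a sentinel value, so 0 is as valid as any other item. The following is only a sketch; IntStack, Push and TryPop are hypothetical names, not part of the original code.

```csharp
using System;

public class IntStack
{
    private int[] items;
    private int count;   // number of live elements; items[count..] is unused space

    public IntStack(int capacity)
    {
        items = new int[Math.Max(capacity, 1)];
    }

    public bool IsEmpty { get { return count == 0; } }

    public void Push(int x)
    {
        // Grow before writing, so the write below is always in range.
        if (count == items.Length)
            Array.Resize(ref items, items.Length * 2);
        items[count++] = x;
    }

    // Reports emptiness through the return value instead of a sentinel such as 0.
    public bool TryPop(out int value)
    {
        if (count == 0)
        {
            value = 0;
            return false;
        }
        value = items[--count];
        return true;
    }
}
```

The caller checks the bool result of TryPop, so a popped 0 and an empty stack can never be confused.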

Buffering a LINQ query

FINAL EDIT:
I've chosen Timothy's answer but if you want a cuter implementation that leverages the C# yield statement check Eamon's answer: https://stackoverflow.com/a/19825659/145757
By default LINQ queries are lazily streamed.
ToArray/ToList give full buffering, but first, they're eager, and second, they may never complete with an infinite sequence.
Is there any way to combine both behaviors: streaming and buffering values on the fly as they are generated, so that the next query won't trigger the generation of elements that have already been queried?
Here is a basic use-case:
static IEnumerable<int> Numbers
{
get
{
int i = -1;
while (true)
{
Console.WriteLine("Generating {0}.", i + 1);
yield return ++i;
}
}
}
static void Main(string[] args)
{
IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0);
foreach (int n in evenNumbers)
{
Console.WriteLine("Reading {0}.", n);
if (n == 10) break;
}
Console.WriteLine("==========");
foreach (int n in evenNumbers)
{
Console.WriteLine("Reading {0}.", n);
if (n == 10) break;
}
}
Here is the output:
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
==========
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
The generation code is triggered 22 times.
I'd like it to be triggered 11 times, the first time the enumerable is iterated.
Then the second iteration would benefit from the already generated values.
It would be something like:
IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer();
For those familiar with Rx it's a behavior similar to a ReplaySubject.
IEnumerable<T>.Buffer() extension method
public static class EnumerableExtensions
{
public static BufferEnumerable<T> Buffer<T>(this IEnumerable<T> source)
{
return new BufferEnumerable<T>(source);
}
}
public class BufferEnumerable<T> : IEnumerable<T>, IDisposable
{
IEnumerator<T> source;
List<T> buffer;
public BufferEnumerable(IEnumerable<T> source)
{
this.source = source.GetEnumerator();
this.buffer = new List<T>();
}
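public IEnumerator<T> GetEnumerator()
{
return new BufferEnumerator<T>(source, buffer);
}
// the non-generic IEnumerable interface must be implemented as well
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void Dispose()
{
source.Dispose();
}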
}
public class BufferEnumerator<T> : IEnumerator<T>
{
IEnumerator<T> source;
List<T> buffer;
int i = -1;
public BufferEnumerator(IEnumerator<T> source, List<T> buffer)
{
this.source = source;
this.buffer = buffer;
}
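public T Current
{
get { return buffer[i]; }
}
// required by the non-generic IEnumerator interface
object System.Collections.IEnumerator.Current { get { return Current; } }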
public bool MoveNext()
{
i++;
if (i < buffer.Count)
return true;
if (!source.MoveNext())
return false;
buffer.Add(source.Current);
return true;
}
public void Reset()
{
i = -1;
}
public void Dispose()
{
}
}
Usage
using (var evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer())
{
...
}
Comments
The key point here is that the IEnumerable<T> source given as input to the Buffer method only has GetEnumerator called once, regardless of how many times the result of Buffer is enumerated. All enumerators for the result of Buffer share the same source enumerator and internal list.
You can use the Microsoft.FSharp.Collections.LazyList<> type from the F# power pack (yep, from C# without F# installed - no problem!) for this. It's in Nuget package FSPowerPack.Core.Community.
In particular, you want to call LazyListModule.ofSeq(...) which returns a LazyList<T> that implements IEnumerable<T> and is lazy and cached.
In your case, usage is just a matter of...
var evenNumbers = LazyListModule.ofSeq(Numbers.Where(i => i % 2 == 0));
var cachedEvenNumbers = LazyListModule.ofSeq(evenNumbers);
Though I personally prefer var in all such cases, note that this does mean the compile-time type will be more specific than just IEnumerable<> - not that this is likely to ever be a downside. Another advantage of the F# non-interface types is that they expose some efficient operations you can't do efficiently with plain IEnumerables, such as LazyListModule.skip.
I'm not sure whether LazyList is thread-safe, but I suspect it is.
Another alternative pointed out in the comments below (if you have F# installed) is SeqModule.Cache (namespace Microsoft.FSharp.Collections, it'll be in GACed assembly FSharp.Core.dll) which has the same effective behavior. Like other .NET enumerables, Seq.cache doesn't have a tail (or skip) operator you can efficiently chain.
Thread-safe: unlike other solutions to this question Seq.cache is thread-safe in the sense that you can have multiple enumerators running in parallel (each enumerator is not thread safe).
Performance I did a quick benchmark, and the LazyList enumerable has at least 4 times more overhead than the SeqModule.Cache variant, which has at least three times more overhead than the custom implementation answers. So, while the F# variants work, they're not quite as fast. Note that 3-12 times slower still isn't very slow compared to an enumerable that does (say) I/O or any non-trivial computation, so this probably won't matter most of the time, but it's good to keep in mind.
TL;DR If you need an efficient, thread-safe cached enumerable, just use SeqModule.Cache.
Building upon Eamon's answer above, here's another functional solution (no new types) that works also with simultaneous evaluation. This demonstrates that a general pattern (iteration with shared state) underlies this problem.
First we define a very general helper method, meant to allow us to simulate the missing feature of anonymous iterators in C#:
public static IEnumerable<T> Generate<T>(Func<Func<Tuple<T>>> generator)
{
var tryGetNext = generator();
while (true)
{
var result = tryGetNext();
if (null == result)
{
yield break;
}
yield return result.Item1;
}
}
Generate is like an aggregator with state. It accepts a function that sets up the initial state and returns a generator function; that generator function would have been an anonymous function with yield return in it, if that were allowed in C#. State created inside the outer function is per-enumeration, while more global state (shared between all enumerations) can be maintained by the caller of Generate, e.g. in closure variables, as we'll show below.
Now we can use this for the "buffered Enumerable" problem:
public static IEnumerable<T> Cached<T>(IEnumerable<T> enumerable)
{
var cache = new List<T>();
var enumerator = enumerable.GetEnumerator();
return Generate<T>(() =>
{
int pos = -1;
return () => {
pos += 1;
if (pos < cache.Count())
{
return new Tuple<T>(cache[pos]);
}
if (enumerator.MoveNext())
{
cache.Add(enumerator.Current);
return new Tuple<T>(enumerator.Current);
}
return null;
};
});
}
I hope this answer combines the brevity and clarity of sinelaw's answer and the support for multiple enumerations of Timothy's answer:
public static IEnumerable<T> Cached<T>(this IEnumerable<T> enumerable) {
return CachedImpl(enumerable.GetEnumerator(), new List<T>());
}
static IEnumerable<T> CachedImpl<T>(IEnumerator<T> source, List<T> buffer) {
int pos=0;
while(true) {
if(pos == buffer.Count)
if (source.MoveNext())
buffer.Add(source.Current);
else
yield break;
yield return buffer[pos++];
}
}
Key ideas are to use the yield return syntax to make for a short enumerable implementation, but you still need a state-machine to decide whether you can get the next element from the buffer, or whether you need to check the underlying enumerator.
Limitations: This makes no attempt to be thread-safe, nor does it dispose the underlying enumerator (which, in general, is quite tricky to do, as the underlying uncached enumerator must remain undisposed as long as any cached enumerable might still be used).
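To check that such a wrapper really avoids regenerating values, one can count how often the underlying generator runs across two passes. The sketch below repeats the Cached/CachedImpl pair from above (with explicit braces) so that it compiles on its own; Numbers, the counter array and GenerationsForTwoPasses are hypothetical helpers added only for the measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CachedSketch
{
    public static IEnumerable<T> Cached<T>(this IEnumerable<T> source)
    {
        return CachedImpl(source.GetEnumerator(), new List<T>());
    }

    private static IEnumerable<T> CachedImpl<T>(IEnumerator<T> source, List<T> buffer)
    {
        int pos = 0;
        while (true)
        {
            if (pos == buffer.Count)
            {
                // Buffer exhausted: pull one more item from the shared source.
                if (source.MoveNext())
                    buffer.Add(source.Current);
                else
                    yield break;
            }
            yield return buffer[pos++];
        }
    }

    // Endless generator; counter[0] records how many values were actually produced.
    private static IEnumerable<int> Numbers(int[] counter)
    {
        for (int i = 0; ; i++)
        {
            counter[0]++;
            yield return i;
        }
    }

    // Runs the two loops from the question against the cached sequence.
    public static int GenerationsForTwoPasses()
    {
        var counter = new int[1];
        IEnumerable<int> evens = Numbers(counter).Where(i => i % 2 == 0).Cached();
        foreach (int n in evens) { if (n == 10) break; }
        foreach (int n in evens) { if (n == 10) break; }
        return counter[0];
    }
}
```

GenerationsForTwoPasses() returns 11 rather than 22: the second loop is served entirely from the buffer.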
As far as I know there is no built-in way to do this, which - now that you mention it - is slightly surprising (my guess is, given the frequency with which one would want to use this option, it was probably not worth the effort needed to analyse the code to make sure that the generator gives the exact same sequence every time).
You can however implement it yourself. The easy way would be on the call-site, as
var evenNumbers = Numbers.Where(i => i % 2 == 0);
var startOfList = evenNumbers.Take(10).ToList();
// use startOfList instead of evenNumbers in the loop.
More generally and accurately, you could do it in the generator: create a List<int> cache and every time you generate a new number add it to the cache before you yield return it. Then when you loop through again, first serve up all the cached numbers. E.g.
List<int> cachedEvenNumbers = new List<int>();
IEnumerable<int> EvenNumbers
{
get
{
int i = -1;
foreach(int cached in cachedEvenNumbers)
{
i = cached;
yield return cached;
}
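// Note: this while loop now starts from the last cached value
while (true)
{
Console.WriteLine("Generating {0}.", i + 1);
++i;
if (i % 2 == 0)
{
// remember the value before handing it out, so the next pass can replay it
cachedEvenNumbers.Add(i);
yield return i;
}
}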
}
}
I guess if you think about this long enough you could come up with a general implementation of a IEnumerable<T>.Buffered() extension method - again, the requirement is that the enumeration doesn't change between calls and the question is if it is worth it.
Here's an incomplete yet compact 'functional' implementation (no new types defined).
The bug is that it does not allow simultaneous enumeration.
Original description:
The first function should have been an anonymous lambda inside the second, but C# does not allow yield in anonymous lambdas:
// put these in some extensions class
private static IEnumerable<T> EnumerateAndCache<T>(IEnumerator<T> enumerator, List<T> cache)
{
while (enumerator.MoveNext())
{
var current = enumerator.Current;
cache.Add(current);
yield return current;
}
}
public static IEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable)
{
var enumerator = enumerable.GetEnumerator();
var cache = new List<T>();
return cache.Concat(EnumerateAndCache(enumerator, cache));
}
Usage:
var enumerable = Numbers.ToCachedEnumerable();
Full credit to Eamon Nerbonne and sinelaw for their answers, just a couple of tweaks! First, to release the enumerator when it is completed. Secondly to protect the underlying enumerator with a lock so the enumerable can be safely used on multiple threads.
// This is just the same as @sinelaw's Generator but I didn't like the name
public static IEnumerable<T> AnonymousIterator<T>(Func<Func<Tuple<T>>> generator)
{
var tryGetNext = generator();
while (true)
{
var result = tryGetNext();
if (null == result)
{
yield break;
}
yield return result.Item1;
}
}
// Cached/Buffered/Replay behaviour
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> self)
{
// Rows are stored here when they've been fetched once
var cache = new List<T>();
// This counter is thread-safe in that it is incremented after the item has been added to the list,
// hence it will never give a false positive. It may give a false negative, but that falls through
// to the code which takes the lock so it's ok.
var count = 0;
// The enumerator is retained until it completes, then it is discarded.
var enumerator = self.GetEnumerator();
// This lock protects the enumerator only. The enumerable could be used on multiple threads
// and the enumerator would then be shared among them, but enumerators are inherently not
// thread-safe so a) we must protect that with a lock and b) we don't need to try and be
// thread-safe in our own enumerator
var lockObject = new object();
return AnonymousIterator<T>(() =>
{
int pos = -1;
return () =>
{
pos += 1;
if (pos < count)
{
return new Tuple<T>(cache[pos]);
}
// Only take the lock when we need to
lock (lockObject)
{
// The counter could have been updated between the check above and this one,
// so now we have the lock we must check again
if (pos < count)
{
return new Tuple<T>(cache[pos]);
}
// Enumerator is set to null when it has completed
if (enumerator != null)
{
if (enumerator.MoveNext())
{
cache.Add(enumerator.Current);
count += 1;
return new Tuple<T>(enumerator.Current);
}
else
{
enumerator = null;
}
}
}
return null;
};
});
}
I use the following extension method.
This way, the input is read at maximum speed, and the consumer processes at maximum speed.
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> input)
{
var blockingCollection = new BlockingCollection<T>();
//read from the input
Task.Factory.StartNew(() =>
{
foreach (var item in input)
{
blockingCollection.Add(item);
}
blockingCollection.CompleteAdding();
});
foreach (var item in blockingCollection.GetConsumingEnumerable())
{
yield return item;
}
}
Example Usage
This example has a fast producer (find files), and a slow consumer (upload files).
long uploaded = 0;
long total = 0;
Directory
.EnumerateFiles(inputFolder, "*.jpg", SearchOption.AllDirectories)
.Select(filename =>
{
total++;
return filename;
})
.Buffer()
.ForEach(filename =>
{
//pretend to do something slow, like upload the file.
Thread.Sleep(1000);
uploaded++;
Console.WriteLine($"Uploaded {uploaded:N0}/{total:N0}");
});

What is the use of the "yield" keyword in C#? [duplicate]

What is the use of the yield keyword in C#?
I didn't understand it from the MSDN reference... can someone explain it to me please?
I'm going to try to give you an example.
Here's the classical way of doing it, which fills up a list object and then returns it:
private IEnumerable<int> GetNumbers()
{
var list = new List<int>();
for (var i = 0; i < 10; i++)
{
list.Add(i);
}
return list;
}
The yield keyword returns items one by one, like this:
private IEnumerable<int> GetNumbers()
{
for (var i = 0; i < 10; i++)
{
yield return i;
}
}
So imagine the code that calls the GetNumbers function is the following:
foreach (int number in GetNumbers())
{
if (number == 5)
{
//do something special...
break;
}
}
Without yield, you would have to generate the whole list from 0 to 9, which is then returned and iterated over until you find the number 5.
Now, thanks to the yield keyword, you only generate numbers until you reach the one you're looking for and break out of the loop.
I don't know if I was clear enough.
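The difference can be observed directly by counting how many values the iterator body produces before the caller breaks out. This is a small sketch; the Produced counter and ProducedUntilFive are hypothetical additions to the GetNumbers example above.

```csharp
using System;
using System.Collections.Generic;

public static class LazySketch
{
    public static int Produced;   // how many values the iterator body has produced so far

    public static IEnumerable<int> GetNumbers()
    {
        for (var i = 0; i < 10; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Stops at 5, so the iterator body only runs for the values 0 through 5.
    public static int ProducedUntilFive()
    {
        Produced = 0;
        foreach (int number in GetNumbers())
        {
            if (number == 5)
                break;
        }
        return Produced;
    }
}
```

ProducedUntilFive() returns 6, not 10: the values 6 to 9 are never generated because nobody asks for them.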
My question is, when do I use it? Is there any example out there where there is no other choice but to use yield? Why did someone feel C# needed another keyword?
The article you linked provided a nice example of when and how it is used.
I hate to quote an article you linked to yourself, but in case it's too long and you didn't read it:
The yield keyword signals to the compiler that the method in which it appears is an iterator block. The compiler generates a class to implement the behavior that is expressed in the iterator block.
public static System.Collections.IEnumerable Power(int number, int exponent)
{
int counter = 0;
int result = 1;
while (counter++ < exponent)
{
result = result * number;
yield return result;
}
}
In the above example, the yield statement is used inside an iterator block. When the Power method is invoked, it returns an enumerable object that contains the powers of a number. Notice that the return type of the Power method is System.Collections.IEnumerable, an iterator interface type.
So the compiler automatically generates an IEnumerable implementation based on the things that are yielded during the method's execution.
Here is a simplified example, for the sake of completeness:
public static System.Collections.IEnumerable CountToTen()
{
int counter = 0;
while (counter++ < 10)
{
yield return counter;
}
}
public static void Main(string[] args)
{
foreach(var i in CountToTen())
{
Console.WriteLine(i);
}
}

Why can't iterator methods take either 'ref' or 'out' parameters?

I tried this earlier today:
public interface IFoo
{
IEnumerable<int> GetItems_A( ref int somethingElse );
IEnumerable<int> GetItems_B( ref int somethingElse );
}
public class Bar : IFoo
{
public IEnumerable<int> GetItems_A( ref int somethingElse )
{
// Ok...
}
public IEnumerable<int> GetItems_B( ref int somethingElse )
{
yield return 7; // CS1623: Iterators cannot have ref or out parameters
}
}
What's the rationale behind this?
C# iterators are state machines internally. Every time you yield return something, the place where you left off should be saved along with the state of local variables so that you could get back and continue from there.
To hold this state, C# compiler creates a class to hold local variables and the place it should continue from. It's not possible to have a ref or out value as a field in a class. Consequently, if you were allowed to declare a parameter as ref or out, there would be no way to keep the complete snapshot of the function at the time we had left off.
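As a rough illustration of that transformation, here is a hand-written approximation of the kind of class a compiler might produce for a tiny iterator. The real generated code differs in naming and detail; CountToEnumerator and its state numbering are purely hypothetical. The point is that the parameter and the local both end up as ordinary fields, which is exactly where a ref or out parameter could not be stored.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written stand-in for what the compiler might generate for:
//     IEnumerator<int> CountTo(int limit)
//     {
//         for (int i = 1; i <= limit; i++) yield return i;
//     }
public sealed class CountToEnumerator : IEnumerator<int>
{
    private readonly int limit;   // the parameter becomes a field
    private int i;                // the local variable becomes a field
    private int state;            // 0 = not started, 1 = suspended at the yield, -1 = finished
    private int current;

    public CountToEnumerator(int limit) { this.limit = limit; }

    public int Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:  i = 1; break;   // first call: run the loop initializer
            case 1:  i++;   break;   // resume after the yield: run the loop increment
            default: return false;   // already finished
        }
        if (i <= limit)
        {
            current = i;
            state = 1;               // remember where to continue next time
            return true;
        }
        state = -1;
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Each MoveNext() call picks up where the previous one stopped, using nothing but the saved fields.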
EDIT: Technically, not all methods that return IEnumerable<T> are considered iterators. Only those that use yield to produce a sequence directly are. Therefore, while splitting the iterator into two methods is a nice and common workaround, it doesn't contradict what I just said. The outer method (which doesn't use yield directly) is not considered an iterator.
If you want to return both an iterator and an int from your method, a workaround is this:
public class Bar : IFoo
{
public IEnumerable<int> GetItems( ref int somethingElse )
{
somethingElse = 42;
return GetItemsCore();
}
private IEnumerable<int> GetItemsCore()
{
yield return 7;
}
}
You should note that none of the code inside an iterator method (i.e. basically a method that contains yield return or yield break) is executed until the MoveNext() method in the Enumerator is called. So if you were able to use out or ref in your iterator method, you would get surprising behavior like this:
// This will not compile:
public IEnumerable<int> GetItems( ref int somethingElse )
{
somethingElse = 42;
yield return 7;
}
// ...
int somethingElse = 0;
IEnumerable<int> items = GetItems( ref somethingElse );
// at this point somethingElse would still be 0
items.GetEnumerator().MoveNext();
// but now the assignment would be executed and somethingElse would be 42
This is a common pitfall. A related issue is this:
public IEnumerable<int> GetItems( object mayNotBeNull ){
if( mayNotBeNull == null )
throw new ArgumentNullException("mayNotBeNull");
yield return 7;
}
// ...
IEnumerable<int> items = GetItems( null ); // <- This does not throw
items.GetEnumerator().MoveNext(); // <- But this does
So a good pattern is to separate iterator methods into two parts: one to execute immediately and one that contains the code that should be lazily executed.
public IEnumerable<int> GetItems( object mayNotBeNull ){
if( mayNotBeNull == null )
throw new ArgumentNullException("mayNotBeNull");
// other quick checks
return GetItemsCore( mayNotBeNull );
}
private IEnumerable<int> GetItemsCore( object mayNotBeNull ){
SlowRunningMethod();
CallToDatabase();
// etc
yield return 7;
}
// ...
IEnumerable<int> items = GetItems( null ); // <- Now this will throw
EDIT:
If you really want the behavior where moving the iterator would modify the ref-parameter, you could do something like this:
public static IEnumerable<int> GetItems( Action<int> setter, Func<int> getter )
{
setter(42);
yield return 7;
}
//...
int local = 0;
IEnumerable<int> items = GetItems((x)=>{local = x;}, ()=>local);
Console.WriteLine(local); // 0
items.GetEnumerator().MoveNext();
Console.WriteLine(local); // 42
At a highish level, a ref variable can point to many locations, including value types that live on the stack. The point at which the iterator is created by calling the iterator method and the point at which the ref variable would be assigned are two very different times. It is not possible to guarantee that the variable originally passed by reference is still around when the iterator actually executes. Hence it is not allowed (or verifiable).
Others have explained why your iterator can't have a ref parameter. Here's a simple alternative:
public interface IFoo
{
IEnumerable<int> GetItems( int[] box );
...
}
public class Bar : IFoo
{
public IEnumerable<int> GetItems( int[] box )
{
int value = box[0];
// use and change value and yield to your heart's content
box[0] = value;
}
}
If you have several items to pass in and out, define a class to hold them.
I've gotten around this problem using functions, when the value that I need to return is derived from the iterated items:
// One of the problems with Enumerable.Count() is
// that it is a 'terminator', meaning that it will
// execute the expression it is given, and discard
// the resulting sequence. To count the number of
// items in a sequence without discarding it, we
// can use this variant that takes an Action<int>
// (or Action<long>), invokes it and passes it the
// number of items that were yielded.
//
// Example: This example allows us to find out
// how many items were in the original
// source sequence 'items', as well as
// the number of items consumed by the
// call to Sum(), without causing any
// LINQ expressions involved to execute
// multiple times.
//
// int start = 0; // the number of items from the original source
// int finished = 0; // the number of items in the resulting sequence
//
// IEnumerable<KeyValuePair<string, double>> items = // assumed to be an iterator
//
// var result = items.Count( i => start = i )
// .Where( p => p.Key == "Banana" )
// .Select( p => p.Value )
// .Count( i => finished = i )
// .Sum();
//
// // by getting the count of items operated
// // on by Sum(), we can calculate an average:
//
// double average = result / (double) finished;
//
// Console.WriteLine( "started with {0} items", start );
// Console.WriteLine( "finished with {0} items", finished );
//
public static IEnumerable<T> Count<T>(
this IEnumerable<T> source,
Action<int> receiver )
{
int i = 0;
foreach( T item in source )
{
yield return item;
++i ;
}
receiver( i );
}
public static IEnumerable<T> Count<T>(
this IEnumerable<T> source,
Action<long> receiver )
{
long i = 0;
foreach( T item in source )
{
yield return item;
++i ;
}
receiver( i );
}

Some help understanding "yield"

In my everlasting quest to suck less I'm trying to understand the "yield" statement, but I keep encountering the same error.
The body of [someMethod] cannot be an iterator block because
'System.Collections.Generic.List< AClass>' is not an iterator interface type.
This is the code where I got stuck:
foreach (XElement header in headersXml.Root.Elements()){
yield return (ParseHeader(header));
}
What am I doing wrong? Can't I use yield in an iterator? Then what's the point?
In this example it said that List<ProductMixHeader> is not an iterator interface type.
ProductMixHeader is a custom class, but I imagine List is an iterator interface type, no?
--Edit--
Thanks for all the quick answers.
I know this question isn't all that new and the same resources keep popping up.
It turned out I was thinking I could return List<AClass> as a return type, but since List<T> isn't lazy, it cannot. Changing my return type to IEnumerable<T> solved the problem :D
A somewhat related question (not worth opening a new thread): is it worth giving IEnumerable<T> as a return type if I'm sure that 99% of the cases I'm going to go .ToList() anyway? What will the performance implications be?
A method using yield return must be declared as returning one of the following two interfaces (or their non-generic counterparts):
IEnumerable<SomethingAppropriate>
IEnumerator<SomethingAppropriate>
(thanks Jon and Marc for pointing out IEnumerator)
Example:
public IEnumerable<AClass> YourMethod()
{
foreach (XElement header in headersXml.Root.Elements())
{
yield return (ParseHeader(header));
}
}
yield is a lazy producer of data, only producing another item after the first has been retrieved, whereas returning a list will return everything in one go.
So there is a difference, and you need to declare the method correctly.
For more information, read Jon's answer here, which contains some very useful links.
It's a tricky topic. In a nutshell, it's an easy way of implementing IEnumerable and its friends. The compiler builds you a state machine, transforming parameters and local variables into instance variables in a new class. Complicated stuff.
I have a few resources on this:
Chapter 6 of C# in Depth (free download from that page)
Iterators, iterator blocks and data pipelines (article)
Iterator block implementation details (article)
"yield" creates an iterator block - a compiler generated class that can implement either IEnumerable[<T>] or IEnumerator[<T>]. Jon Skeet has a very good (and free) discussion of this in chapter 6 of C# in Depth.
But basically - to use "yield" your method must return an IEnumerable[<T>] or IEnumerator[<T>]. In this case:
public IEnumerable<AClass> SomeMethod() {
// ...
foreach (XElement header in headersXml.Root.Elements()){
yield return (ParseHeader(header));
}
}
List<T> implements IEnumerable<T>.
Here's an example that might shed some light on what you are trying to learn. I wrote this about 6 months ago:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace YieldReturnTest
{
public class PrimeFinder
{
private Boolean isPrime(int integer)
{
// 0 and 1 are not prime (negative numbers are rejected as well)
if (2 > integer)
return false;
for (int i = 2; i < integer; i++)
{
if (0 == integer % i)
return false;
}
return true;
}
public IEnumerable<int> FindPrimes()
{
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
yield return i;
}
}
}
}
class Program
{
static void Main(string[] args)
{
PrimeFinder primes = new PrimeFinder();
foreach (int i in primes.FindPrimes())
{
Console.WriteLine(i);
Console.ReadLine();
}
Console.ReadLine();
Console.ReadLine();
}
}
}
I highly recommend using Reflector to have a look at what yield actually does for you. You'll be able to see the full code of the class that the compiler generates for you when using yield, and I've found that people understand the concept much more quickly when they can see the low-level result (well, mid-level I guess).
To understand yield, you need to understand when to use IEnumerator and IEnumerable (because you have to use either of them). The following examples help you to understand the difference.
First, take a look at the following class, it implements two methods - one returning IEnumerator<int>, one returning IEnumerable<int>. I'll show you that there is a big difference in usage, although the code of the 2 methods is looking similar:
// 2 iterators, one as IEnumerator, one as IEnumerable
public class Iterator
{
public static IEnumerator<int> IterateOne(Func<int, bool> condition)
{
for(var i=1; condition(i); i++) { yield return i; }
}
public static IEnumerable<int> IterateAll(Func<int, bool> condition)
{
for(var i=1; condition(i); i++) { yield return i; }
}
}
Now, if you're using IterateOne you can do the following:
// 1. Using IEnumerator lets you fetch item by item
var i=Iterator.IterateOne(x => true); // iterates endlessly
// 1.a) get item by item
i.MoveNext(); Console.WriteLine(i.Current);
i.MoveNext(); Console.WriteLine(i.Current);
// 1.b) loop until 100
int j; while (i.MoveNext() && (j=i.Current)<=100) { Console.WriteLine(j); }
1.a) prints:
1
2
1.b) prints:
3
4
...
100
because it continues counting right after the 1.a) statements have been executed.
You can see that you can advance item by item using MoveNext().
In contrast, IterateAll allows you to use foreach and also LINQ statements, which is more convenient:
// 2. Using IEnumerable makes looping and LINQ easier
var k=Iterator.IterateAll(x => x<100); // limit iterator to 100
// 2.a) Use a foreach loop
foreach(var x in k){ Console.WriteLine(x); } // loop
// 2.b) LINQ: take 101..200 of endless iteration
var lst=Iterator.IterateAll(x=>true).Skip(100).Take(100).ToList(); // LINQ: take items
foreach(var x in lst){ Console.WriteLine(x); } // output list
2.a) prints:
1
2
...
99
2.b) prints:
101
102
...
200
Note: Since IEnumerator<T> and IEnumerable<T> are generic, they can be used with any type. For simplicity I have used int for T in my examples.
This means you can use either IEnumerator<ProductMixHeader> or IEnumerable<ProductMixHeader> as the return type (ProductMixHeader being the custom class mentioned in your question).
List<ProductMixHeader> is not a valid return type for a method that uses yield: although List<T> itself implements IEnumerable<T>, an iterator method must be declared as returning one of the IEnumerable/IEnumerator interfaces. That is why you can't use it that way. Example 2.b) shows how you can create a list from the result instead.
Appending .ToList() creates a list of all elements in memory, whereas an IEnumerable allows its elements to be created lazily: they are produced just in time, as late as possible, and only as many as the consumer asks for. As soon as you call .ToList(), all elements are created in memory at once. LINQ relies on this deferred execution behind the scenes.
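A small sketch of that difference, under my own assumptions: the class DeferredDemo and the counter Produced are hypothetical and exist only to make visible how many elements the iterator actually produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical counter: how many elements the iterator has produced so far.
    public static int Produced;

    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        Produced = 0;
        List<int> firstThree = Numbers().Take(3).ToList();
        // Only the elements that Take(3) asked for were produced, far fewer than 1000.
        Console.WriteLine(firstThree.Count + " items kept, " + Produced + " produced");

        Produced = 0;
        List<int> firstThreeOfAll = Numbers().ToList().Take(3).ToList();
        // ToList() came first here, so all 1000 elements were produced before Take(3) ran.
        Console.WriteLine(firstThreeOfAll.Count + " items kept, " + Produced + " produced");
    }
}
```

So the position of .ToList() in the chain matters: put it at the end, after the filtering, if you need a list at all.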
DotNetFiddle of all examples
Ian P's answer helped me a lot to understand yield and why it is used. One (major) use case for yield is a method whose result is consumed after the "in" keyword of a "foreach" loop, where you don't want to build and return a complete list first. Instead of returning the whole list at once, only one item (the next one) is produced in each iteration of the loop. So you gain performance with yield in such cases, especially if the caller stops early.
To understand it better, I have rewritten Ian P's code as follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace YieldReturnTest
{
public class PrimeFinder
{
private Boolean isPrime(int integer)
{
// 0, 1 and negative numbers are not prime; 2 falls through to "return true"
if (2 > integer)
return false;
for (int i = 2; i < integer; i++)
{
if (0 == integer % i)
return false;
}
return true;
}
public IEnumerable<int> FindPrimesWithYield()
{
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
yield return i;
}
}
}
public IEnumerable<int> FindPrimesWithoutYield()
{
var primes = new List<int>();
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
primes.Add(i);
}
}
return primes;
}
}
class Program
{
static void Main(string[] args)
{
PrimeFinder primes = new PrimeFinder();
Console.WriteLine("Finding primes until 7 with yield...very fast...");
foreach (int i in primes.FindPrimesWithYield()) // FindPrimesWithYield DOES NOT iterate over all integers at once, it returns item by item
{
if (i > 7)
{
break;
}
Console.WriteLine(i);
//Console.ReadLine();
}
Console.WriteLine("Finding primes until 7 without yield...be patient, it will take a long time...");
foreach (int i in primes.FindPrimesWithoutYield()) // FindPrimesWithoutYield DOES iterate over all integers at once, it returns the complete list of primes at once
{
if (i > 7)
{
break;
}
Console.WriteLine(i);
//Console.ReadLine();
}
Console.ReadLine(); // keep the console window open
}
}
}
What does the method you're using this in look like? I don't think this can be used in just a loop by itself.
For example...
public IEnumerable<string> GetValues() {
foreach(string value in someArray) {
if (value.StartsWith("A")) { yield return value; }
}
}
