Buffering a LINQ query - c#

FINAL EDIT:
I've chosen Timothy's answer but if you want a cuter implementation that leverages the C# yield statement check Eamon's answer: https://stackoverflow.com/a/19825659/145757
By default LINQ queries are lazily streamed.
ToArray/ToList give full buffering, but they are eager, and with an infinite sequence they never complete.
Is there a way to combine both behaviors: streaming values lazily while buffering them on the fly as they are generated, so that the next enumeration doesn't regenerate the elements that have already been queried?
Here is a basic use-case:
static IEnumerable<int> Numbers
{
get
{
int i = -1;
while (true)
{
Console.WriteLine("Generating {0}.", i + 1);
yield return ++i;
}
}
}
static void Main(string[] args)
{
IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0);
foreach (int n in evenNumbers)
{
Console.WriteLine("Reading {0}.", n);
if (n == 10) break;
}
Console.WriteLine("==========");
foreach (int n in evenNumbers)
{
Console.WriteLine("Reading {0}.", n);
if (n == 10) break;
}
}
Here is the output:
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
==========
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
The generation code is triggered 22 times.
I'd like it to be triggered 11 times, the first time the enumerable is iterated.
Then the second iteration would benefit from the already generated values.
It would be something like:
IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer();
For those familiar with Rx it's a behavior similar to a ReplaySubject.

IEnumerable<T>.Buffer() extension method
public static class EnumerableExtensions
{
public static BufferEnumerable<T> Buffer<T>(this IEnumerable<T> source)
{
return new BufferEnumerable<T>(source);
}
}
public class BufferEnumerable<T> : IEnumerable<T>, IDisposable
{
IEnumerator<T> source;
List<T> buffer;
public BufferEnumerable(IEnumerable<T> source)
{
this.source = source.GetEnumerator();
this.buffer = new List<T>();
}
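public IEnumerator<T> GetEnumerator()
{
return new BufferEnumerator<T>(source, buffer);
}
// Non-generic overload required by IEnumerable<T>
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void Dispose()
{
source.Dispose();
}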
}
public class BufferEnumerator<T> : IEnumerator<T>
{
IEnumerator<T> source;
List<T> buffer;
int i = -1;
public BufferEnumerator(IEnumerator<T> source, List<T> buffer)
{
this.source = source;
this.buffer = buffer;
}
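public T Current
{
get { return buffer[i]; }
}
// Non-generic overload required by IEnumerator<T>
object System.Collections.IEnumerator.Current
{
get { return Current; }
}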
public bool MoveNext()
{
i++;
if (i < buffer.Count)
return true;
if (!source.MoveNext())
return false;
buffer.Add(source.Current);
return true;
}
public void Reset()
{
i = -1;
}
public void Dispose()
{
}
}
Usage
using (var evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer())
{
...
}
Comments
The key point here is that the IEnumerable<T> source given as input to the Buffer method only has GetEnumerator called once, regardless of how many times the result of Buffer is enumerated. All enumerators for the result of Buffer share the same source enumerator and internal list.
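To see the effect described above, here is a minimal self-contained sketch. It does not reuse the classes from this answer: CachedDemo, Cached and GenerationCount are names made up for illustration, and the caching is written as a compact iterator rather than a hand-written enumerator. It counts how often the source generator runs when the buffered sequence is enumerated twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CachedDemo
{
    // Hypothetical counter: how many values the source has produced so far.
    public static int GenerationCount;

    // Infinite source, like Numbers in the question, minus the console output.
    public static IEnumerable<int> Numbers()
    {
        int i = -1;
        while (true)
        {
            GenerationCount++;
            yield return ++i;
        }
    }

    // Illustrative caching wrapper: one shared source enumerator, one shared list.
    public static IEnumerable<T> Cached<T>(this IEnumerable<T> source)
    {
        return CachedImpl(source.GetEnumerator(), new List<T>());
    }

    static IEnumerable<T> CachedImpl<T>(IEnumerator<T> source, List<T> buffer)
    {
        int pos = 0;
        while (true)
        {
            if (pos == buffer.Count)
            {
                // Buffer exhausted: pull one more value from the shared source.
                if (!source.MoveNext()) yield break;
                buffer.Add(source.Current);
            }
            yield return buffer[pos++];
        }
    }

    // Enumerates the even numbers up to 10 twice and returns the generation count.
    public static int Run()
    {
        GenerationCount = 0;
        var evens = Numbers().Where(i => i % 2 == 0).Cached();
        foreach (int n in evens) { if (n == 10) break; }
        foreach (int n in evens) { if (n == 10) break; }
        return GenerationCount;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 11 rather than 22
    }
}
```

Like the code in this answer, the sketch is not thread-safe; it only illustrates the shared enumerator and shared list.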

You can use the Microsoft.FSharp.Collections.LazyList<> type from the F# power pack (yep, from C# without F# installed - no problem!) for this. It's in the NuGet package FSPowerPack.Core.Community.
In particular, you want to call LazyListModule.ofSeq(...) which returns a LazyList<T> that implements IEnumerable<T> and is lazy and cached.
In your case, usage is just a matter of...
var evenNumbers = LazyListModule.ofSeq(Numbers.Where(i => i % 2 == 0));
var cachedEvenNumbers = LazyListModule.ofSeq(evenNumbers);
Though I personally prefer var in all such cases, note that this does mean the compile-time type will be more specific than just IEnumerable<> - not that this is likely to ever be a downside. Another advantage of the F# non-interface types is that they expose some efficient operations you can't do efficiently with plain IEnumerables, such as LazyListModule.skip.
I'm not sure whether LazyList is thread-safe, but I suspect it is.
Another alternative pointed out in the comments below (if you have F# installed) is SeqModule.Cache (namespace Microsoft.FSharp.Collections, it'll be in GACed assembly FSharp.Core.dll) which has the same effective behavior. Like other .NET enumerables, Seq.cache doesn't have a tail (or skip) operator you can efficiently chain.
Thread-safe: unlike other solutions to this question, Seq.cache is thread-safe in the sense that you can have multiple enumerators running in parallel (each individual enumerator is not thread-safe).
Performance: I did a quick benchmark, and the LazyList enumerable has at least 4 times more overhead than the SeqModule.Cache variant, which has at least three times more overhead than the custom implementation answers. So, while the F# variants work, they're not quite as fast. Note that 3-12 times slower still isn't very slow compared to an enumerable that does (say) I/O or any non-trivial computation, so this probably won't matter most of the time, but it's good to keep in mind.
TL;DR: If you need an efficient, thread-safe cached enumerable, just use SeqModule.Cache.

Building upon Eamon's answer above, here's another functional solution (no new types) that works also with simultaneous evaluation. This demonstrates that a general pattern (iteration with shared state) underlies this problem.
First we define a very general helper method, meant to allow us to simulate the missing feature of anonymous iterators in C#:
public static IEnumerable<T> Generate<T>(Func<Func<Tuple<T>>> generator)
{
var tryGetNext = generator();
while (true)
{
var result = tryGetNext();
if (null == result)
{
yield break;
}
yield return result.Item1;
}
}
Generate is like an aggregator with state. It accepts a generator factory: a function that is called once per enumeration, sets up that enumeration's initial state, and returns the function that would have been an anonymous iterator with yield return in it, if that were allowed in C#. State created inside the factory is per-enumeration, while a more global state (shared between all enumerations) can be maintained by the caller of Generate, e.g. in closure variables, as we'll show below.
Now we can use this for the "buffered Enumerable" problem:
public static IEnumerable<T> Cached<T>(IEnumerable<T> enumerable)
{
var cache = new List<T>();
var enumerator = enumerable.GetEnumerator();
return Generate<T>(() =>
{
int pos = -1;
return () => {
pos += 1;
if (pos < cache.Count)
{
return new Tuple<T>(cache[pos]);
}
if (enumerator.MoveNext())
{
cache.Add(enumerator.Current);
return new Tuple<T>(enumerator.Current);
}
return null;
};
});
}

I hope this answer combines the brevity and clarity of sinelaw's answer and the support for multiple enumerations of Timothy's answer:
public static IEnumerable<T> Cached<T>(this IEnumerable<T> enumerable) {
return CachedImpl(enumerable.GetEnumerator(), new List<T>());
}
static IEnumerable<T> CachedImpl<T>(IEnumerator<T> source, List<T> buffer) {
int pos=0;
while(true) {
if(pos == buffer.Count)
if (source.MoveNext())
buffer.Add(source.Current);
else
yield break;
yield return buffer[pos++];
}
}
Key ideas are to use the yield return syntax to make for a short enumerable implementation, but you still need a state-machine to decide whether you can get the next element from the buffer, or whether you need to check the underlying enumerator.
Limitations: This makes no attempt to be thread-safe, nor does it dispose the underlying enumerator (which, in general, is quite tricky to do, as the underlying uncached enumerator must remain undisposed as long as any cached enumerable might still be used).

As far as I know there is no built-in way to do this, which - now that you mention it - is slightly surprising (my guess is, given the frequency with which one would want to use this option, it was probably not worth the effort needed to analyse the code to make sure that the generator gives the exact same sequence every time).
You can however implement it yourself. The easy way would be on the call-site, as
var evenNumbers = Numbers.Where(i => i % 2 == 0);
var startOfList = evenNumbers.Take(10).ToList();
// use startOfList instead of evenNumbers in the loop.
More generally and accurately, you could do it in the generator: create a List<int> cache and every time you generate a new number add it to the cache before you yield return it. Then when you loop through again, first serve up all the cached numbers. E.g.
List<int> cachedEvenNumbers = new List<int>();
IEnumerable<int> EvenNumbers
{
get
{
int i = -1;
foreach(int cached in cachedEvenNumbers)
{
i = cached;
yield return cached;
}
// Note: this while loop now starts from the last cached value
while (true)
{
Console.WriteLine("Generating {0}.", i + 1);
// Add the new value to the cache before handing it out
cachedEvenNumbers.Add(++i);
yield return i;
}
}
}
I guess if you think about this long enough you could come up with a general implementation of an IEnumerable<T>.Buffered() extension method - again, the requirement is that the enumeration doesn't change between calls, and the question is whether it is worth it.

Here's an incomplete yet compact 'functional' implementation (no new types defined).
The bug is that it does not allow simultaneous enumeration.
Original description:
The first function should have been an anonymous lambda inside the second, but C# does not allow yield in anonymous lambdas:
// put these in some extensions class
private static IEnumerable<T> EnumerateAndCache<T>(IEnumerator<T> enumerator, List<T> cache)
{
while (enumerator.MoveNext())
{
var current = enumerator.Current;
cache.Add(current);
yield return current;
}
}
public static IEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable)
{
var enumerator = enumerable.GetEnumerator();
var cache = new List<T>();
return cache.Concat(EnumerateAndCache(enumerator, cache));
}
Usage:
var enumerable = Numbers.ToCachedEnumerable();

Full credit to Eamon Nerbonne and sinelaw for their answers, just a couple of tweaks! First, to release the enumerator when it is completed. Secondly to protect the underlying enumerator with a lock so the enumerable can be safely used on multiple threads.
// This is just the same as @sinelaw's Generate but I didn't like the name
public static IEnumerable<T> AnonymousIterator<T>(Func<Func<Tuple<T>>> generator)
{
var tryGetNext = generator();
while (true)
{
var result = tryGetNext();
if (null == result)
{
yield break;
}
yield return result.Item1;
}
}
// Cached/Buffered/Replay behaviour
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> self)
{
// Rows are stored here when they've been fetched once
var cache = new List<T>();
// This counter is thread-safe in that it is incremented after the item has been added to the list,
// hence it will never give a false positive. It may give a false negative, but that falls through
// to the code which takes the lock so it's ok.
var count = 0;
// The enumerator is retained until it completes, then it is discarded.
var enumerator = self.GetEnumerator();
// This lock protects the enumerator only. The enumerable could be used on multiple threads
// and the enumerator would then be shared among them, but enumerators are inherently not
// thread-safe so a) we must protect that with a lock and b) we don't need to try and be
// thread-safe in our own enumerator
var lockObject = new object();
return AnonymousIterator<T>(() =>
{
int pos = -1;
return () =>
{
pos += 1;
if (pos < count)
{
return new Tuple<T>(cache[pos]);
}
// Only take the lock when we need to
lock (lockObject)
{
// The counter could have been updated between the check above and this one,
// so now we have the lock we must check again
if (pos < count)
{
return new Tuple<T>(cache[pos]);
}
// Enumerator is set to null when it has completed
if (enumerator != null)
{
if (enumerator.MoveNext())
{
cache.Add(enumerator.Current);
count += 1;
return new Tuple<T>(enumerator.Current);
}
else
{
enumerator = null;
}
}
}
return null;
};
});
}

I use the following extension method.
This way, the input is read at maximum speed, and the consumer processes at maximum speed.
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> input)
{
var blockingCollection = new BlockingCollection<T>();
//read from the input
Task.Factory.StartNew(() =>
{
foreach (var item in input)
{
blockingCollection.Add(item);
}
blockingCollection.CompleteAdding();
});
foreach (var item in blockingCollection.GetConsumingEnumerable())
{
yield return item;
}
}
Example Usage
This example has a fast producer (find files), and a slow consumer (upload files).
long uploaded = 0;
long total = 0;
Directory
.EnumerateFiles(inputFolder, "*.jpg", SearchOption.AllDirectories)
.Select(filename =>
{
total++;
return filename;
})
.Buffer()
.ForEach(filename =>
{
//pretend to do something slow, like upload the file.
Thread.Sleep(1000);
uploaded++;
Console.WriteLine($"Uploaded {uploaded:N0}/{total:N0}");
});
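Note that the usage example calls .ForEach on an IEnumerable<T>. The base class library only offers List<T>.ForEach and Array.ForEach, so the example presumably relies on a helper that isn't shown. A minimal sketch of such a helper, assuming the obvious semantics (the class name EnumerableForEachExtensions is made up here):

```csharp
using System;
using System.Collections.Generic;

static class EnumerableForEachExtensions
{
    // Hypothetical helper: runs the action once for every element, in order.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (action == null) throw new ArgumentNullException(nameof(action));
        foreach (T item in source)
        {
            action(item);
        }
    }
}

static class ForEachDemo
{
    static void Main()
    {
        var seen = new List<int>();
        new[] { 1, 2, 3 }.ForEach(x => seen.Add(x));
        Console.WriteLine(string.Join(",", seen)); // 1,2,3
    }
}
```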

Related

Is there a down side to returning an IEnumerable<T> via an infinite loop yielding elements?

Is the infinite loop here causing any negative impact? ReSharper warns me that the..
Function never returns
..but I can't see any downside here.
public static class RandomEntityFactory
{
public static IEnumerable<T> Enumerate<T>() where T : class
{
while (true)
{
yield return Get<T>();
}
}
public static T Get<T>() where T : class
{
if (typeof(T) == typeof(Client)) return CreateRandomClient() as T;
if (typeof(T) == typeof(Font)) return CreateRandomFont() as T;
throw new Exception("unknown type: " + typeof(T).Name);
}
}
Is there a downside?
It depends on the intended usage. As written, some LINQ functions will never complete: Count, for example, or All when every element satisfies the predicate. LINQ queries that use deferred execution are fine as long as the consumer has a way to break the loop at some point. Clients will have to be sure to use short-circuiting functions such as Take or First (assuming that at some point a value will meet the condition).
If the intended usage is to let clients enumerate with a foreach, breaking the loop at their discretion, then it's reasonable to return items indefinitely.
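As a rough illustration of which calls are safe on an endless source, here is a small sketch. Naturals is a stand-in generator invented for this example, not the factory from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class InfiniteDemo
{
    // Endless sequence 0, 1, 2, ... (illustrative stand-in for any infinite generator).
    public static IEnumerable<int> Naturals()
    {
        int i = 0;
        while (true) yield return i++;
    }

    static void Main()
    {
        // These stop pulling from the source once they have what they need.
        Console.WriteLine(string.Join(",", Naturals().Take(3)));     // 0,1,2
        Console.WriteLine(Naturals().First(n => n > 4));              // 5
        Console.WriteLine(Naturals().TakeWhile(n => n < 3).Count());  // 3
        Console.WriteLine(Naturals().Any(n => n == 7));               // True

        // These would never return on an endless source, so they stay commented out:
        // Naturals().Count();
        // Naturals().ToList();
    }
}
```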
Having an "infinite" enumerable does not cause any problems by itself. The particular code you have written looks very odd though. I would do it this way instead:
static IEnumerable<long> Iota()
{
long i = 0;
while (true)
{
yield return i;
i++;
}
}
This is the general-purpose sequence 0, 1, 2, ...
If you want to make an endless sequence of random fonts, you can do this:
Iota().Select(_ => CreateRandomFont())
This avoids writing your Get method, which looks bad to me.

Using ImmutableSortedSet<T> for a thread safe cache

I have a method that takes a DateTime and returns the date marking the end of that quarter. Because of some complexity involving business days and holiday calendars, I want to cache the result to speed up subsequent calls. I'm using a SortedSet<DateTime> to maintain a cache of data, and I use the GetViewBetween method in order to do cache lookups as follows:
private static SortedSet<DateTime> quarterEndCache = new SortedSet<DateTime>();
public static DateTime GetNextQuarterEndDate(DateTime date)
{
var oneDayLater = date.AddDays(1.0);
var fiveMonthsLater = date.AddMonths(5);
var range = quarterEndCache.GetViewBetween(oneDayLater, fiveMonthsLater);
if (range.Count > 0)
{
return range.Min;
}
// Perform expensive calc here
}
Now I want to make my cache threadsafe. Rather than use a lock everywhere which would incur a performance hit on every lookup, I'm exploring the new ImmutableSortedSet<T> collection which would allow me to avoid locks entirely. The problem is that ImmutableSortedSet<T> doesn't have the method GetViewBetween. Is there any way to get similar functionality from the ImmutableSortedSet<T>?
[EDIT]
Servy has convinced me just using a lock with a normal SortedSet<T> is the easiest solution. I'll leave the question open though just because I'm interested to know whether the ImmutableSortedSet<T> can handle this scenario efficiently.
Let's divide the question into two parts:
How to get a functionality similar to GetViewBetween with ImmutableSortedSet<T>? I'd suggest using the IndexOf method. In the snippet below, I created an extension method GetRangeBetween which should do the job.
How to implement lock-free, thread-safe updates with immutable data structures? Although this is not the original question, there are some skeptical comments on this issue.
The immutable collections library provides a method for exactly that purpose: System.Collections.Immutable.ImmutableInterlocked.Update<T>(ref T location, Func<T, T> transformer) where T : class. The method internally relies on atomic compare/exchange operations. If you want to do this by hand, you'll find an alternative implementation below which should behave the same as ImmutableInterlocked.Update.
So here is the code:
public static class ImmutableExtensions
{
public static IEnumerable<T> GetRangeBetween<T>(
this ImmutableSortedSet<T> set, T min, T max)
{
int i = set.IndexOf(min);
if (i < 0) i = ~i;
while (i < set.Count)
{
T x = set[i++];
if (set.KeyComparer.Compare(x, min) >= 0 &&
set.KeyComparer.Compare(x, max) <= 0)
{
yield return x;
}
else
{
break;
}
}
}
public static void LockfreeUpdate<T>(ref T item, Func<T, T> fn)
where T: class
{
T x, y;
do
{
x = item;
y = fn(x);
} while (Interlocked.CompareExchange(ref item, y, x) != x);
}
}
Usage:
private static volatile ImmutableSortedSet<DateTime> quarterEndCache =
ImmutableSortedSet<DateTime>.Empty;
private static volatile int counter; // test/verification purpose only
public static DateTime GetNextQuarterEndDate(DateTime date)
{
var oneDayLater = date.AddDays(1.0);
var fiveMonthsLater = date.AddMonths(5);
var range = quarterEndCache.GetRangeBetween(oneDayLater, fiveMonthsLater);
if (range.Any())
{
return range.First();
}
// Perform expensive calc here
// -> Meaningless dummy computation for verification purpose only
long x = Interlocked.Increment(ref counter);
DateTime test = DateTime.FromFileTime(x);
ImmutableExtensions.LockfreeUpdate(
ref quarterEndCache,
c => c.Add(test));
return test;
}
[TestMethod]
public void TestIt()
{
var tasks = Enumerable
.Range(0, 100000)
.Select(x => Task.Factory.StartNew(
() => GetNextQuarterEndDate(DateTime.Now)))
.ToArray();
Task.WaitAll(tasks);
Assert.AreEqual(100000, counter);
}
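For comparison, here is a minimal sketch of the same lock-free update done with the library's own helper, ImmutableInterlocked.Update from System.Collections.Immutable, instead of the hand-rolled loop. The class ImmutableUpdateDemo and the method FillConcurrently are invented for this illustration, and the set holds plain ints rather than dates to keep it short.

```csharp
using System;
using System.Collections.Immutable;
using System.Threading.Tasks;

static class ImmutableUpdateDemo
{
    static ImmutableSortedSet<int> cache = ImmutableSortedSet<int>.Empty;

    // Adds 0..count-1 from many threads; each Add goes through the atomic update helper.
    public static int FillConcurrently(int count)
    {
        cache = ImmutableSortedSet<int>.Empty;
        Parallel.For(0, count, i =>
        {
            // Re-runs the transformer until the compare-exchange succeeds.
            ImmutableInterlocked.Update(ref cache, set => set.Add(i));
        });
        return cache.Count;
    }

    static void Main()
    {
        Console.WriteLine(FillConcurrently(1000)); // 1000
    }
}
```

Because the transformer may run more than once under contention, it should be free of side effects.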

Is IEnumerable.Any faster than a for loop with a break?

We experienced some slowness in our code opening a form and it was possibly due to a for loop with a break that was taking a long time to execute. I switched this to an IEnumerable.Any() and saw the form open very quickly. I am now trying to figure out if making this change alone increased performance or if it was accessing the ProductIDs property more efficiently. Should this implementation be faster, and if so, why?
Original Implementation:
public bool ContainsProduct(int productID) {
bool containsProduct = false;
for (int i = 0; i < this.ProductIDs.Length; i++) {
if (productID == this.ProductIDs[i]) {
containsProduct = true;
break;
}
}
return containsProduct;
}
New Implementation:
public bool ContainsProduct(int productID) {
return this.ProductIDs.Any(t => productID == t);
}
Call this an educated guess:
this.ProductIDs.Length
This probably is where the slowness lies. If the list of ProductIDs gets retrieved from database (for example) on every iteration in order to get the Length it would indeed be very slow. You can confirm this by profiling your application.
If this is not the case (say ProductIDs is in memory and Length is cached), then both should have an almost identical running time.
First implementation is slightly faster (enumeration is slightly slower than for loop). Second one is a lot more readable.
UPDATE
Oded's answer is possibly correct and well done for spotting it. The first one is slower here since it involves database roundtrip. Otherwise, it is slightly faster as I said.
UPDATE 2 - Proof
Here is some simple code showing why the first one is faster:
public static void Main()
{
int[] values = Enumerable.Range(0, 1000000).ToArray();
int dummy = 0;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < values.Length; i++)
{
dummy *= i;
}
stopwatch.Stop();
Console.WriteLine("Loop took {0}", stopwatch.ElapsedTicks);
dummy = 0;
stopwatch.Reset();
stopwatch.Start();
foreach (var value in values)
{
dummy *= value;
}
stopwatch.Stop();
Console.WriteLine("Iteration took {0}", stopwatch.ElapsedTicks);
Console.Read();
}
Here is output:
Loop took 12198
Iteration took 20922
So in this run the loop is nearly twice as fast as iteration/enumeration.
I think they would be more or less identical. I usually refer to Jon Skeet's Reimplementing LINQ to Objects blog series to get an idea of how the extension methods work. Here's the post for Any() and All()
Here's the core part of Any() implementation from that post
public static bool Any<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
...
foreach (TSource item in source)
{
if (predicate(item))
{
return true;
}
}
return false;
}
This post assumes that ProductIDs is a List<T> or an array. So I'm talking about LINQ to Objects.
LINQ is usually slower but shorter/more readable than conventional loop-based code. A factor of 2-3, depending on what you're doing, is typical.
Can you refactor your code to make this.ProductIDs a HashSet<T>? Or at least sort the array so you can use a binary search. Your problem is that you're performing a linear search, which is slow if there are many products.
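A minimal sketch of the HashSet<T> idea, assuming the IDs can be loaded once up front. ProductCatalog is a made-up class standing in for whatever type owns ProductIDs.

```csharp
using System;
using System.Collections.Generic;

class ProductCatalog
{
    // Built once; Contains is then a hash lookup instead of a linear scan.
    private readonly HashSet<int> productIds;

    public ProductCatalog(IEnumerable<int> ids)
    {
        productIds = new HashSet<int>(ids);
    }

    public bool ContainsProduct(int productID)
    {
        return productIds.Contains(productID);
    }
}

static class HashSetDemo
{
    static void Main()
    {
        var catalog = new ProductCatalog(new[] { 3, 14, 15, 92 });
        Console.WriteLine(catalog.ContainsProduct(15)); // True
        Console.WriteLine(catalog.ContainsProduct(7));  // False
    }
}
```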
I think the below implementation would be a little faster than the corresponding linq implementation, but very minor though
public bool ContainsProduct(int productID) {
var length = this.ProductIDs.Length;
for (int i = 0; i < length; i++) {
if (productID == this.ProductIDs[i]) {
return true;
}
}
return false;
}
The difference will generally be in memory usage rather than speed.
But generally you should use a for loop when you know that you will be using all elements of the array; in other cases you should try to use while or do-while.
I think that this solution uses minimal resources:
int i = this.ProductIDs.Length - 1;
while(i >= 0) {
if(this.ProductIDs[i--] == productID) {
return true;
}
}
return false;

Can someone demystify the yield keyword?

I have seen the yield keyword being used quite a lot on Stack Overflow and blogs. I don't use LINQ. Can someone explain the yield keyword?
I know that similar questions exist.
But none really explain what is its use in plain simple language.
By far the best explanation of this (that I've seen) is Jon Skeet's book - and that chapter is free! Chapter 6, C# in Depth. There is nothing I can add here that isn't covered.
Then buy the book; you will be a better C# programmer for it.
Q: Why didn't I write a longer answer here (paraphrased from comments); simple. As Eric Lippert observes (here), the yield construct (and the magic that goes behind it) is the single most complex bit of code in the C# compiler, and to try and describe it in a brief reply here is naïve at best. There are so many nuances to yield that IMO it is better to refer to a pre-existing (and fully qualified) resource.
Eric's blog now has 7 entries (and that is just the recent ones) discussing yield. I have a vast amount of respect for Eric, but his blog is probably more appropriate as a "further information" for people who are comfortable with the subject (yield in this case), as it typically describes a lot of the background design considerations. Best done in the context of a reasonable foundation.
(and yes, chapter 6 does download; I verified...)
The yield keyword is used with methods that return IEnumerable<T> or IEnumerator<T> and it makes the compiler generate a class that implements the necessary plumbing for using the iterator. E.g.
public IEnumerable<int> SequenceOfOneToThree() {
yield return 1;
yield return 2;
yield return 3;
}
Given the above the compiler will generate a class that implements IEnumerator<int>, IEnumerable<int> and IDisposable (actually it will also implement the non-generic versions of IEnumerable and IEnumerator).
This allows you to call the method SequenceOfOneToThree in a foreach loop like this
foreach(var number in SequenceOfOneToThree()) {
Console.WriteLine(number);
}
An iterator is a state machine, so each time yield return is reached the position in the method is recorded. If the iterator is moved to the next element, the method resumes right after this position. So the first iteration returns 1 and marks that position. The next iteration resumes right after that position and thus returns 2, and so forth.
Needless to say you can generate the sequence in any way you like, so you don't have to hard code the numbers like I did. Also, if you want to break the loop you can use yield break.
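To make the pause-and-resume behavior visible, here is a small sketch that logs what the iterator body does. YieldTraceDemo and its Log list are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class YieldTraceDemo
{
    // Records what the iterator body does, so the order of events can be inspected.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> OneToThree()
    {
        Log.Add("start");
        yield return 1;
        Log.Add("after 1");
        yield return 2;
        Log.Add("after 2");
        yield return 3;
        Log.Add("end");
    }

    static void Main()
    {
        var sequence = OneToThree();   // nothing has run yet
        Console.WriteLine(Log.Count);  // 0

        foreach (int n in sequence)
        {
            Log.Add("got " + n);
            if (n == 2) break;         // the rest of the method never runs
        }
        Console.WriteLine(string.Join(" | ", Log));
        // start | got 1 | after 1 | got 2
    }
}
```

Calling the method only creates the state machine; the body runs piece by piece as the loop asks for the next element.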
In an effort to demystify I'll avoid talking about iterators, since they could be part of the mystery themselves.
the yield return and yield break statements are most often used to provide "deferred evaluation" of the collection.
What this means is that when you get the value of a method that uses yield return, the collection of things you are trying to get don't exist together yet (it's essentially empty). As you loop through them (using foreach) it will execute the method at that time and get the next element in the enumeration.
Certain properties and methods will cause the entire enumeration to be evaluated at once (such as "Count").
Here's a quick example of the difference between returning a collection and returning yield:
string[] names = { "Joe", "Jim", "Sam", "Ed", "Sally" };
public IEnumerable<string> GetYieldEnumerable()
{
foreach (var name in names)
yield return name;
}
public IEnumerable<string> GetList()
{
var list = new List<string>();
foreach (var name in names)
list.Add(name);
return list;
}
// we're going to execute the GetYieldEnumerable() method
// but the foreach statement inside it isn't going to execute
var yieldNames = GetYieldEnumerable();
// now we're going to execute the GetList() method and
// the foreach method will execute
var listNames = GetList();
// now we want to look for a specific name in yieldNames.
// only the first two iterations of the foreach loop in the
// GetYieldEnumerable() method will need to be called to find it.
if (yieldNames.Contains("Jim"))
Console.WriteLine("Found Jim and only had to loop twice!");
// now we'll look for a specific name in listNames.
// the entire names collection was already iterated over
// so we've already paid the initial cost of looping through that collection.
// now we're going to have to add two more loops to find it in the listNames
// collection.
if (listNames.Contains("Jim"))
Console.WriteLine("Found Jim and had to loop 7 times! (5 for names and 2 for listNames)");
This can also be used if you need to get a reference to the Enumeration before the source data has values. For example if the names collection wasn't complete to start with:
string[] names = { "Joe", "Jim", "Sam", "Ed", "Sally" };
public IEnumerable<string> GetYieldEnumerable()
{
foreach (var name in names)
yield return name;
}
public IEnumerable<string> GetList()
{
var list = new List<string>();
foreach (var name in names)
list.Add(name);
return list;
}
var yieldNames = GetYieldEnumerable();
var listNames = GetList();
// now we'll change the source data by renaming "Jim" to "Jimbo"
names[1] = "Jimbo";
if (yieldNames.Contains("Jimbo"))
Console.WriteLine("Found Jimbo!");
// Because this enumeration was evaluated completely before we changed "Jim"
// to "Jimbo" it isn't going to be found
if (!listNames.Contains("Jimbo"))
Console.WriteLine("Couldn't find Jimbo, because he wasn't there when I was evaluated.");
The yield keyword is a convenient way to write an IEnumerator. For example:
public static IEnumerator<int> Range(int from, int to)
{
for (int i = from; i < to; i++)
{
yield return i;
}
}
is transformed by the C# compiler to something similar to:
public static IEnumerator<int> Range(int from, int to)
{
return new RangeEnumerator(from, to);
}
class RangeEnumerator : IEnumerator<int>
{
private int from, to, current;
public RangeEnumerator(int from, int to)
{
this.from = from;
this.to = to;
// Start one before "from": MoveNext increments before the first read
this.current = from - 1;
}
public bool MoveNext()
{
this.current++;
return this.current < this.to;
}
public int Current
{
get
{
return this.current;
}
}
}
Take a look at the MSDN documentation and the example. It is essentially an easy way to create an iterator in C#.
public class List
{
//using System.Collections;
public static IEnumerable Power(int number, int exponent)
{
int counter = 0;
int result = 1;
while (counter++ < exponent)
{
result = result * number;
yield return result;
}
}
static void Main()
{
// Display powers of 2 up to the exponent 8:
foreach (int i in Power(2, 8))
{
Console.Write("{0} ", i);
}
}
}
Eric White's series on functional programming is well worth the read in its entirety, but the entry on Yield is as clear an explanation as I've seen.
yield is not directly related to LINQ, but rather to iterator blocks. The linked MSDN article gives great detail on this language feature. See especially the Using Iterators section. For deep details of iterator blocks, see Eric Lippert's recent blog posts on the feature. For the general concept, see the Wikipedia article on iterators.
I came up with this to overcome a .NET shortcoming having to manually deep copy List.
I use this:
static public IEnumerable<SpotPlacement> CloneList(List<SpotPlacement> spotPlacements)
{
foreach (SpotPlacement sp in spotPlacements)
{
yield return (SpotPlacement)sp.Clone();
}
}
And at another place:
public object Clone()
{
OrderItem newOrderItem = new OrderItem();
...
newOrderItem._exactPlacements.AddRange(SpotPlacement.CloneList(_exactPlacements));
...
return newOrderItem;
}
I tried to come up with oneliner that does this, but it's not possible, due to yield not working inside anonymous method blocks.
EDIT:
Better still, use a generic List cloner:
class Utility<T> where T : ICloneable
{
static public IEnumerable<T> CloneList(List<T> tl)
{
foreach (T t in tl)
{
yield return (T)t.Clone();
}
}
}
Let me add to all of this: yield is not a reserved keyword but a contextual one.
It only has special meaning in "yield return" and "yield break"; anywhere else it behaves like a normal identifier, so you can even use it as a variable name.
It's used to return an iterator from a function. You can search further on that.
I recommend searching for "Returning Array vs Iterator"
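A short sketch of what that means in practice: the same word works as a plain identifier in one method and as part of an iterator statement in another. The class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class ContextualYieldDemo
{
    // Here "yield" is an ordinary local variable name.
    public static int Double(int value)
    {
        int yield = value * 2;
        return yield;
    }

    // Here "yield" is followed by "return", so it forms an iterator statement.
    public static IEnumerable<int> Doubles(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return Double(i);
        }
    }

    static void Main()
    {
        Console.WriteLine(Double(21));                    // 42
        Console.WriteLine(string.Join(",", Doubles(3)));  // 0,2,4
    }
}
```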

Some help understanding "yield"

In my everlasting quest to suck less I'm trying to understand the "yield" statement, but I keep encountering the same error.
The body of [someMethod] cannot be an iterator block because
'System.Collections.Generic.List< AClass>' is not an iterator interface type.
This is the code where I got stuck:
foreach (XElement header in headersXml.Root.Elements()){
yield return (ParseHeader(header));
}
What am I doing wrong? Can't I use yield in an iterator? Then what's the point?
In this example it said that List<ProductMixHeader> is not an iterator interface type.
ProductMixHeader is a custom class, but I imagine List is an iterator interface type, no?
--Edit--
Thanks for all the quick answers.
I know this question isn't all that new and the same resources keep popping up.
It turned out I was thinking I could return List<AClass> as a return type, but since List<T> isn't lazy, it cannot. Changing my return type to IEnumerable<T> solved the problem :D
A somewhat related question (not worth opening a new thread): is it worth giving IEnumerable<T> as a return type if I'm sure that 99% of the cases I'm going to go .ToList() anyway? What will the performance implications be?
A method using yield return must be declared as returning one of the following two interfaces (or their non-generic counterparts):
IEnumerable<SomethingAppropriate>
IEnumerator<SomethingAppropriate>
(thanks Jon and Marc for pointing out IEnumerator)
Example:
public IEnumerable<AClass> YourMethod()
{
foreach (XElement header in headersXml.Root.Elements())
{
yield return (ParseHeader(header));
}
}
yield is a lazy producer of data: it produces the next item only when the consumer asks for it, whereas returning a list hands over everything in one go.
So there is a difference, and you need to declare the method correctly.
For more information, read Jon's answer here, which contains some very useful links.
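The difference in ordering can be made visible with a small sketch. The names below (LazyVersusEager, Log, Lazy, Eager, Consume) are hypothetical and only serve this illustration; the log records when each item is produced and when it is consumed.

```csharp
using System;
using System.Collections.Generic;

public static class LazyVersusEager
{
    // Records what happens and in which order.
    public static readonly List<string> Log = new List<string>();

    // Iterator: each item is produced only when the consumer asks for it.
    public static IEnumerable<int> Lazy()
    {
        for (int i = 1; i <= 3; i++)
        {
            Log.Add("produce " + i);
            yield return i;
        }
    }

    // Plain method: every item is produced before the list is returned.
    public static List<int> Eager()
    {
        var result = new List<int>();
        for (int i = 1; i <= 3; i++)
        {
            Log.Add("produce " + i);
            result.Add(i);
        }
        return result;
    }

    public static void Consume(IEnumerable<int> source)
    {
        foreach (int n in source)
        {
            Log.Add("consume " + n);
        }
    }
}
```

After Consume(Lazy()) the log alternates between "produce" and "consume" entries; after Consume(Eager()) all three "produce" entries come first.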
It's a tricky topic. In a nutshell, it's an easy way of implementing IEnumerable and its friends. The compiler builds you a state machine, transforming parameters and local variables into instance variables in a new class. Complicated stuff.
I have a few resources on this:
Chapter 6 of C# in Depth (free download from that page)
Iterators, iterator blocks and data pipelines (article)
Iterator block implementation details (article)
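As a rough idea of what "the compiler builds you a state machine" means, here is a hand-written sketch. It is a simplification and not the code the compiler actually emits; the names Counting and CountToEnumerator are invented for the example. The parameter and the loop variable become fields, and MoveNext resumes where the previous call left off.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class Counting
{
    // The iterator block as you would write it.
    public static IEnumerator<int> CountTo(int max)
    {
        for (int i = 1; i <= max; i++) yield return i;
    }
}

// Roughly equivalent hand-written enumerator (simplified).
public sealed class CountToEnumerator : IEnumerator<int>
{
    private readonly int _max; // the parameter, hoisted into a field
    private int _i;            // the loop variable, hoisted into a field
    private int _state;        // 0 = not started, 1 = running, 2 = finished
    private int _current;

    public CountToEnumerator(int max) { _max = max; }

    public int Current { get { return _current; } }
    object IEnumerator.Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: _i = 1; _state = 1; break; // first call: start the loop
            case 1: _i++; break;               // later calls: resume after the yield
            default: return false;             // already finished
        }
        if (_i <= _max)
        {
            _current = _i;
            return true;
        }
        _state = 2;
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Both Counting.CountTo(3) and new CountToEnumerator(3) yield 1, 2, 3 when advanced with MoveNext().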
"yield" creates an iterator block - a compiler generated class that can implement either IEnumerable[<T>] or IEnumerator[<T>]. Jon Skeet has a very good (and free) discussion of this in chapter 6 of C# in Depth.
But basically - to use "yield" your method must return an IEnumerable[<T>] or IEnumerator[<T>]. In this case:
public IEnumerable<AClass> SomeMethod() {
// ...
foreach (XElement header in headersXml.Root.Elements()){
yield return (ParseHeader(header));
}
}
List<T> implements IEnumerable<T>, but it is not itself an iterator interface type.
Here's an example that might shed some light on what you are trying to learn. I wrote this about 6 months ago:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace YieldReturnTest
{
public class PrimeFinder
{
private Boolean isPrime(int integer)
{
// 0, 1 and negative numbers are not prime
if (2 > integer)
return false;
for (int i = 2; i < integer; i++)
{
if (0 == integer % i)
return false;
}
return true;
}
public IEnumerable<int> FindPrimes()
{
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
yield return i;
}
}
}
}
class Program
{
static void Main(string[] args)
{
PrimeFinder primes = new PrimeFinder();
foreach (int i in primes.FindPrimes())
{
Console.WriteLine(i);
Console.ReadLine();
}
Console.ReadLine();
Console.ReadLine();
}
}
}
I highly recommend using Reflector to have a look at what yield actually does for you. You'll be able to see the full code of the class that the compiler generates for you when using yield, and I've found that people understand the concept much more quickly when they can see the low-level result (well, mid-level I guess).
To understand yield, you need to understand when to use IEnumerator and IEnumerable (because you have to use either of them). The following examples help you to understand the difference.
First, take a look at the following class. It implements two methods, one returning IEnumerator<int> and one returning IEnumerable<int>. I'll show you that there is a big difference in usage, although the code of the two methods looks the same:
// 2 iterators, one as IEnumerator, one as IEnumerable
public class Iterator
{
public static IEnumerator<int> IterateOne(Func<int, bool> condition)
{
for(var i=1; condition(i); i++) { yield return i; }
}
public static IEnumerable<int> IterateAll(Func<int, bool> condition)
{
for(var i=1; condition(i); i++) { yield return i; }
}
}
Now, if you're using IterateOne you can do the following:
// 1. Using IEnumerator allows to get item by item
var i=Iterator.IterateOne(x => true); // iterate endless
// 1.a) get item by item
i.MoveNext(); Console.WriteLine(i.Current);
i.MoveNext(); Console.WriteLine(i.Current);
// 1.b) loop until 100
int j; while (i.MoveNext() && (j=i.Current)<=100) { Console.WriteLine(j); }
1.a) prints:
1
2
1.b) prints:
3
4
...
100
because it continues counting right after the 1.a) statements have been executed.
You can see that you can advance item by item using MoveNext().
In contrast, IterateAll lets you use foreach and LINQ, which is more convenient:
// 2. Using IEnumerable makes looping and LINQ easier
var k=Iterator.IterateAll(x => x<100); // limit iterator to 100
// 2.a) Use a foreach loop
foreach(var x in k){ Console.WriteLine(x); } // loop
// 2.b) LINQ: take 101..200 of endless iteration
var lst=Iterator.IterateAll(x=>true).Skip(100).Take(100).ToList(); // LINQ: take items
foreach(var x in lst){ Console.WriteLine(x); } // output list
2.a) prints:
1
2
...
99
2.b) prints:
101
102
...
200
Note: Since IEnumerator<T> and IEnumerable<T> are generic, they can be used with any type. For simplicity I have used int for the type T in my examples.
This means you can use one of the return types IEnumerator<ProductMixHeader> or IEnumerable<ProductMixHeader> (the custom class you have mentioned in your question).
List<ProductMixHeader> does implement IEnumerable<ProductMixHeader>, but it is a class and not one of these interface types itself, which is why it can't be the return type of an iterator method. Example 2.b) shows how you can create a list from an iterator.
Appending .ToList() creates a list of all elements in memory, whereas an IEnumerable creates its elements lazily: each element is produced just in time, only when the consumer asks for it. As soon as you call .ToList(), all elements are created at once. This deferred execution is how LINQ avoids doing work whose results are never consumed.
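The trade-off also shows when a sequence is read more than once. In the following sketch (DeferredVersusToList, Produced and the method names are hypothetical), a counter tracks how many items the iterator generates: a deferred query re-runs the iterator on every enumeration, while a list built with .ToList() runs it once and keeps the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredVersusToList
{
    // Counts how many items the iterator has generated.
    public static int Produced;

    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 5; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Enumerates the deferred query twice, so the iterator runs twice.
    public static int SumTwiceDeferred()
    {
        Produced = 0;
        IEnumerable<int> query = Numbers().Where(n => n % 2 == 1);
        return query.Sum() + query.Sum();
    }

    // Materializes the query once, then reads the list twice.
    public static int SumTwiceBuffered()
    {
        Produced = 0;
        List<int> list = Numbers().Where(n => n % 2 == 1).ToList();
        return list.Sum() + list.Sum();
    }
}
```

Both methods return 18, but Produced ends at 10 after SumTwiceDeferred() and at 5 after SumTwiceBuffered().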
DotNetFiddle of all examples
@Ian P's answer helped me a lot to understand yield and why it is used. One (major) use case for yield is the sequence after the "in" keyword of a "foreach" loop: instead of building and returning a complete list up front, the method hands over only the next item on each iteration. So you gain performance whenever the loop stops early, because the remaining items are never computed.
To understand it better, I have rewritten @Ian P's code as follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace YieldReturnTest
{
public class PrimeFinder
{
private Boolean isPrime(int integer)
{
// 0, 1 and negative numbers are not prime
if (2 > integer)
return false;
for (int i = 2; i < integer; i++)
{
if (0 == integer % i)
return false;
}
return true;
}
public IEnumerable<int> FindPrimesWithYield()
{
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
yield return i;
}
}
}
public IEnumerable<int> FindPrimesWithoutYield()
{
var primes = new List<int>();
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
primes.Add(i);
}
}
return primes;
}
}
class Program
{
static void Main(string[] args)
{
PrimeFinder primes = new PrimeFinder();
Console.WriteLine("Finding primes until 7 with yield...very fast...");
foreach (int i in primes.FindPrimesWithYield()) // FindPrimesWithYield DOES NOT iterate over all integers at once, it returns item by item
{
if (i > 7)
{
break;
}
Console.WriteLine(i);
//Console.ReadLine();
}
Console.WriteLine("Finding primes until 7 without yield...be patient, it will take a long time...");
foreach (int i in primes.FindPrimesWithoutYield()) // FindPrimesWithoutYield DOES iterate over all integers at once, it returns the complete list of primes at once
{
if (i > 7)
{
break;
}
Console.WriteLine(i);
//Console.ReadLine();
}
Console.ReadLine();
Console.ReadLine();
}
}
}
What does the method you're using this in look like? I don't think this can be used in just a loop by itself.
For example...
public IEnumerable<string> GetValues() {
foreach(string value in someArray) {
if (value.StartsWith("A")) { yield return value; }
}
}
