Iterator issue on yield IEnumerable

I wrote a program designed to create a randomish list of numbers from a given starting point. It was a quick and dirty thing, but I found an interesting effect when playing with it that I don't quite understand.
void Main()
{
List<int> foo = new List<int>(){1,2,3};
IEnumerable<int> bar = GetNumbers(foo);
for (int i = 1; i < 3; i++)
{
foo = new List<int>(){1,2,3};
var wibble = GetNumbers(foo);
bar = bar.Concat(wibble);
}
Iterate(bar);
Iterate(bar);
}
public void Iterate(IEnumerable<int> numbers)
{
Console.WriteLine("iterating");
foreach(int number in numbers)
{
Console.WriteLine(number);
}
}
public IEnumerable<int> GetNumbers(List<int> input)
{
//This function originally did more but this is a cutdown version for testing.
while (input.Count>0)
{
int returnvalue = input[0];
input.Remove(input[0]);
yield return returnvalue;
}
}
The output of running this is:
iterating
1
2
3
1
2
3
1
2
3
iterating
That is to say, the second time I iterate through bar, immediately after the first, it is empty.
I assume this is because the first iteration empties the lists that are used to generate the sequence, and the second iteration then uses these same, now empty, lists.
My confusion is on why this is happening? Why do my IEnumerables not start from their default state each time I enumerate over them? Can somebody explain what exactly I'm doing here?
And to be clear I know that I can solve this problem by adding a .ToList() to my call to GetNumbers() which forces immediate evaluation and storage of the results.

Your iterator does start from its initial state. However, it modifies the list it's reading from, and once the list is cleared, your iterator doesn't have anything left to do. Basically, consider
var list = new List<int> { 1, 2, 3 };
var enumerable = list.Where(i => i != 2);
foreach (var item in enumerable)
Console.WriteLine(item);
list.Clear();
foreach (var item in enumerable)
Console.WriteLine(item);
enumerable doesn't get changed by list.Clear();, but the results it gives do.

Your observation can be reproduced with this shorter version of the main method:
void Main()
{
List<int> foo = new List<int>(){1,2,3};
IEnumerable<int> bar = GetNumbers(foo);
Console.WriteLine(foo.Count); // prints 3
Iterate(bar);
Console.WriteLine(foo.Count); // prints 0
Iterate(bar);
}
What happens is the following:
When you call GetNumbers it isn't really being executed. It will only be executed when you iterate over the result. You can verify this by putting Console.WriteLine(foo.Count); between the call to GetNumbers and Iterate.
On the first call to Iterate, GetNumbers is executed and empties foo.
On the second call to Iterate, GetNumbers is executed again, but now foo is empty, so there is nothing left to return.

Well, lazy evaluation is what hit you. When you write a yield return-style method, it is not executed immediately when called. It is executed only when you iterate over the sequence.
So, this means that the list won't be cleared during GetNumbers, but only during Iterate. In fact, the whole body of the function GetNumbers will be executed only during Iterate.
Your problem is that you made your IEnumerables depend not only on inner state, but on outer state as well. That outer state is the content of the foo lists.
So, all the lists are filled until you call Iterate the first time. (Each IEnumerable created by GetNumbers holds a reference to its list, so the fact that you overwrite foo doesn't matter.) All three are emptied during the first Iterate. The next iteration then starts with the same inner state but changed outer state, giving a different result.
I'd like to note that mutation and dependence on outer state are generally frowned upon in functional programming style. LINQ is a step toward functional programming, so it's a good idea to follow FP's rules. You would do better by simply not removing the items from input in GetNumbers.
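As a minimal sketch of that last suggestion, GetNumbers could walk the list without removing anything, so every enumeration sees the same data. The index-based loop and the class name NonMutatingDemo are illustrative assumptions, not the original function (which reportedly did more):

```csharp
using System;
using System.Collections.Generic;

public static class NonMutatingDemo
{
    // Hypothetical rewrite of GetNumbers: reads the list by index and leaves it untouched.
    public static IEnumerable<int> GetNumbers(List<int> input)
    {
        for (int i = 0; i < input.Count; i++)
        {
            yield return input[i];
        }
    }

    public static void Main()
    {
        var foo = new List<int> { 1, 2, 3 };
        IEnumerable<int> bar = GetNumbers(foo);

        // Both passes print "1, 2, 3" because enumerating no longer empties foo.
        Console.WriteLine(string.Join(", ", bar));
        Console.WriteLine(string.Join(", ", bar));
    }
}
```

The sequence still depends on outer state (a later foo.Clear() would change the results), but enumerating it no longer changes that state.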

Related

RemoveRange method after the use of the Take method on List<T> has unexpected behavior

Does anyone know why the length after leaving the while loop body in the code below is zero? Please do not provide me with a chunking algorithm; I am already aware of several such algorithms and I know how to solve this problem. My question is only about the weird behavior of RemoveRange, or perhaps Take.
static void Main(string[] args)
{
var list = new List<int>();
for (int i = 0; i < 1000; i++)
{
list.Add(i);
}
var chunks = new List<IEnumerable<int>>();
while (list.Count > 0)
{
chunks.Add(list.Take(10));
list.RemoveRange(0, 10);
}
int length = chunks.ToList()[0].Count();
}

At the following line:
var chunks = new List<IEnumerable<int>>();
you create a list whose Count is 0, since there aren't any items in it yet. Then, on each pass of the while loop, you add list.Take(10) and afterwards remove the first 10 items from list. The important thing here is to realize that list.Take(10) is lazily evaluated (more often you will hear the term deferred execution). The first time it is evaluated is at the following line:
int length = chunks.ToList()[0].Count();
At this line list.Take(10) is evaluated, and since you have removed all the items from the list, there aren't any elements in list. For this reason, list.Take(10) returns an empty sequence and consequently chunks.ToList()[0] is an empty sequence.
Update: deferred execution explanation
Suppose we have the following list:
var list = new List<int> {1, 2, 3, 4, 5 };
var firstThree = list.Take(3);
The variable firstThree holds a reference to a lazy query object, specifically an IEnumerable<int>. It does not hold an array or a list of 1, 2, 3. Only when you enumerate it do you start to "fetch" data from the list.
For instance you could call the ToList or ToArray methods:
var firstThreeList = firstThree.ToList();
var firstThreeArray = firstThree.ToArray();
Both of the above calls put the iterator in action; in LINQ terms, they force the execution of your query. At that very moment, you traverse the list and fetch the first three items.
That being said, it is clear that if in the meantime you have modified the list by removing all the numbers from list, there won't be any elements in the list and you will get nothing.
As a test, I suggest you run the above code once, and then run it again with the following call added before the ToList and ToArray calls:
list.RemoveAll(x => true);
You will notice now that both firstThreeArray and firstThreeList are empty.

The line
chunks.Add(list.Take(10));
does not actually take 10 items from list; it only says to take up to 10 items whenever chunks[i] is enumerated (e.g. through Count()). Since you have altered the list by then, the query reads from an empty list. You can force the evaluation using
chunks.Add(list.Take(10).ToList());
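To make the difference concrete, here is a small sketch that runs the loop from the question both ways and reports the length of the first chunk. The class name ChunkDemo and the materialize flag are illustrative assumptions, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Returns the length of the first chunk; 'materialize' chooses between
    // storing the lazy Take(10) query and storing a Take(10).ToList() snapshot.
    public static int FirstChunkLength(bool materialize)
    {
        var list = Enumerable.Range(0, 1000).ToList();
        var chunks = new List<IEnumerable<int>>();

        while (list.Count > 0)
        {
            IEnumerable<int> chunk = list.Take(10);
            chunks.Add(materialize ? chunk.ToList() : chunk);
            list.RemoveRange(0, 10);
        }

        // The lazy query only reads 'list' here, after it has been emptied.
        return chunks[0].Count();
    }

    public static void Main()
    {
        Console.WriteLine(FirstChunkLength(materialize: false)); // 0
        Console.WriteLine(FirstChunkLength(materialize: true));  // 10
    }
}
```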

The problem you're experiencing is that LINQ is kind of like a view. It only actually iterates through the collection when you call a method that requires it to produce a specific value (First(), Last(), Count(), etc.). So the list is only evaluated at the point where you call one of these methods.
chunks.Add(list.Take(10));
This code effectively says "take a reference to list, and when somebody iterates you, only go as far as the first 10 items". To resolve this, you can convert that small section to a list (evaluate those 10 items, and create a new list from them):
chunks.Add(list.Take(10).ToList());
Consider this code:
List<string> names = new List<string>() { "a", "b", "c" };
IEnumerable<string> skip2 = names.Skip(2);
Console.WriteLine(string.Join(", ", skip2)); // "c"
names.Add("d");
names.Add("e");
Console.WriteLine(string.Join(", ", skip2)); // "c, d, e"
Because you use the iterator (IEnumerable<string> skip2) each time you call string.Join(", ", skip2) it will iterate through the list each time, even if the list has changed.
As such, you will get "c" on the first run, and "c, d, e" on the second run.
In fact, this would be perfectly valid (although harder to read):
List<int> list = new List<int>();
IEnumerable<int> values = list.DefaultIfEmpty(0);
list.Add(5);
list.Add(10);
list.Add(15);
Console.WriteLine(values.Average()); // 10

There is no weird behaviour at all. That is exactly the expected behaviour for IEnumerable. What you should keep in mind is that a LINQ query exposed as IEnumerable is lazily evaluated, which means it is evaluated when it is enumerated, i.e. when you actually access the said IEnumerable. What you are doing is basically:
① Get the reference to the list object.
② Prepare to take the first 10 elements of the said list object, but do not evaluate yet!
③ Add the not-yet-evaluated object, in this case list.Take(10), to the chunks list.
④ Remove the first 10 elements of list.
⑤ Rinse and repeat until there are no more items in list.
⑥ Create a list made up entirely of not-yet-evaluated list.Take(10) items.
⑦ Take the first element of chunks, which is not yet evaluated but refers to the first 10 elements of list, which is now empty!
⑧ Call Count on that IEnumerable instance, which finally evaluates the first ten elements of an empty list.

How to prove the method which returns IEnumerable has been called twice?

In Visual Studio, ReSharper warns: "Possible multiple enumeration of IEnumerable" for the following code:
static void Main(string[] args)
{
IEnumerable<string> items = Test2();
foreach (var item in items)
{
Console.WriteLine(item);
}
var newitems = new StringBuilder();
foreach (var item in items)
{
newitems.Append(item);
}
}
private static IEnumerable<string> Test2()
{
string[] array1 = { "1", "2", "3" };
return array1;
}
I expect that the Test2 method will be called twice, but it's called once.
What am I missing?

It's only called once because Test2() actually returns string [] which is also an IEnumerable<string>.
This string [] array remains referenced by items so each time you use items you just re-use the array.
The case you're expecting is an implementation of Test2() with an iterator block :
private static IEnumerable<string> Test2()
{
string[] array1 = { "1", "2", "3" };
foreach (var str in array1)
{
yield return str;
}
}

Take a look at this example:
void Main()
{
IEnumerable<int> items = Test2();
foreach (var item in items)
{
Console.WriteLine(item);
}
var newitems = new StringBuilder();
foreach (var item in items)
{
newitems.Append(item);
}
}
IEnumerable<int> Test2()
{
Console.WriteLine("Test2 called");
return GetEnum();
}
IEnumerable<int> GetEnum()
{
for(var i = 0; i < 5; i ++)
{
Console.WriteLine("Doing work...");
Thread.Sleep(50); //Download some information from a website, or from a database
yield return i;
}
}
Imagine that return GetEnum(); was return new int[] { 1, 2, 3 }
Now, with arrays, iterating them multiple times isn't necessarily a bad thing. In your case, you can do the work in one loop, but that's not the reason ReSharper warns you. It warns you because of the possibility that Test2() returns a lazy enumerable that does work every time it's iterated.
If you run the above code, you'll get this output:
Test2 called
Doing work...
0
Doing work...
1
Doing work...
2
Doing work...
3
Doing work...
4
Doing work...
Doing work...
Doing work...
Doing work...
Doing work...
Note that Test2 itself is only called once, but the enumerable is iterated twice (and the work is done twice!).
You can avoid this by writing:
var items = Test2().ToList();
Which will immediately evaluate the enumerable and put it into a list. In this case, the work is only done once.
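A quick way to check this claim is to count how often the iterator body runs. The sketch below is a stripped-down variant of the example above; the WorkDone counter and the materialize flag are illustrative additions, and the Thread.Sleep is left out:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WorkCounter
{
    public static int WorkDone;

    // Stand-in for the expensive lazy source; each yielded item counts as one unit of work.
    public static IEnumerable<int> GetEnum()
    {
        for (var i = 0; i < 5; i++)
        {
            WorkDone++;
            yield return i;
        }
    }

    // Enumerates the sequence twice and returns how many units of work that caused.
    public static int WorkForTwoPasses(bool materialize)
    {
        WorkDone = 0;
        IEnumerable<int> items = GetEnum();
        if (materialize)
        {
            items = items.ToList(); // runs the iterator once and stores the results
        }

        int sum = 0;
        foreach (var item in items) sum += item;
        foreach (var item in items) sum += item;
        Console.WriteLine("sum over both passes: " + sum);
        return WorkDone;
    }

    public static void Main()
    {
        Console.WriteLine(WorkForTwoPasses(materialize: false)); // 10
        Console.WriteLine(WorkForTwoPasses(materialize: true));  // 5
    }
}
```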

As many pointed out, the purpose of this warning is to point out that an expensive operation may be happening more than once. This happens because ReSharper sees that your method returns an IEnumerable, which could lead to lazy evaluation if you were using yield return or most LINQ methods.
ReSharper stops warning about multiple evaluation when it can know for sure that the thing you are iterating over is a collection. You can provide that information to ReSharper in 2 ways.
Change the return type of Test2 to IList<string>
Before the first foreach add System.Diagnostics.Debug.Assert(items is IList<string>);
If you use ToList() over the returned IEnumerable<string> ReSharper will also know that you are iterating over a collection, but you would also be creating an unnecessary temporary list (you already had an array), paying the cost of time and memory to build that new list.

IEnumerable<T> is an interface that exposes an enumerator, and a new enumerator is requested every time you iterate over your data (each foreach loop). ReSharper warns you that if your data is not already materialized in a collection, enumerating it multiple times may force the runtime to produce the whole sequence again each time, which adds load and slows down execution.
To avoid that, you can materialize your data into a collection first, e.g. call .ToArray() or .ToList() on your items variable.

Why does this method result in an infinite loop?

One of my coworkers came to me with a question about this method that results in an infinite loop. The actual code is a bit too involved to post here, but essentially the problem boils down to this:
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
items = items.Select(item => items.First(i => i == item));
return items;
}
This should (you would think) just be a very inefficient way to create a copy of a list. I called it with:
var foo = GoNuts(new[]{1,2,3,4,5,6});
The result is an infinite loop. Strange.
I think that modifying the parameter is, stylistically a bad thing, so I changed the code slightly:
var foo = items.Select(item => items.First(i => i == item));
return foo;
That worked. That is, the program completed; no exception.
More experiments showed that this works, too:
items = items.Select(item => items.First(i => i == item)).ToList();
return items;
As does a simple
return items.Select(item => .....);
Curious.
It's clear that the problem has to do with reassigning the parameter, but only if evaluation is deferred beyond that statement. If I add the ToList() it works.
I have a general, vague, idea of what's going wrong. It looks like the Select is iterating over its own output. That's a little bit strange in itself, because typically an IEnumerable will throw if the collection it's iterating changes.
What I don't understand, because I'm not intimately familiar with the internals of how this stuff works, is why re-assigning the parameter causes this infinite loop.
Is there somebody with more knowledge of the internals who would be willing to explain why the infinite loop occurs here?

The key to answering this is deferred execution. When you do this
items = items.Select(item => items.First(i => i == item));
you do not iterate the items array passed into the method. Instead, you assign it a new IEnumerable<int>, which references itself back, and starts iterating only when the caller starts enumerating the results.
That is why all your other fixes have dealt with the problem: all you needed to do is to stop feeding IEnumerable<int> back to itself:
Using var foo breaks self-reference by using a different variable,
Using return items.Select... breaks self-reference by not using intermediate variables at all,
Using ToList() breaks self-reference by avoiding deferred execution: by the time items is re-assigned, old items has been iterated over, so you end up with a plain in-memory List<int>.
But if it's feeding on itself, how does it get anything at all?
That's right, it does not get anything! The moment you try iterating items and ask it for the first item, the deferred sequence asks the sequence fed to it for the first item to process, which means that the sequence is asking itself for the first item to process. At this point, it's turtles all the way down, because in order to return the first item to process the sequence must first get the first item to process from itself.

It looks like the Select is iterating over its own output
You are correct. You are returning a query that iterates over itself.
The key is that you reference items within the lambda. The lambda closes over the variable items, not the value it holds at that moment, and the variable is only read when the query iterates, at which point items references the query instead of the source collection. That's where the self-reference occurs.
Picture a deck of cards with a sign in front of it labelled items. Now picture a man standing beside the deck of cards whose assignment is to iterate the collection called items. But then you move the sign from the deck to the man. When you ask the man for the first "item" - he looks for the collection marked "items" - which is now him! So he asks himself for the first item, which is where the circular reference occurs.
When you assign the result to a new variable, you then have a query that iterates over a different collection, and so does not result in an infinite loop.
When you call ToList, you hydrate the query to a new collection and also do not get an infinite loop.
Other things that would break the circular reference:
Hydrating items within the lambda by calling ToList
Assigning items to another variable and referencing that within the lambda.
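The capture behaviour behind all of this can be seen without any LINQ at all. A minimal sketch (the names ClosureDemo, value and read are illustrative):

```csharp
using System;

public static class ClosureDemo
{
    public static int ReadAfterReassignment()
    {
        int value = 1;
        Func<int> read = () => value; // captures the variable, not the number 1
        value = 2;                    // reassigned before the lambda ever runs
        return read();                // reads whatever 'value' holds now
    }

    public static void Main()
    {
        Console.WriteLine(ReadAfterReassignment()); // 2
    }
}
```

In GoNuts the same thing happens with items: by the time the lambda runs, the variable already refers to the new query.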

After studying the two answers given and poking around a bit, I came up with a little program that better illustrates the problem.
private int GetFirst(IEnumerable<int> items, int foo)
{
Console.WriteLine("GetFirst {0}", foo);
var rslt = items.First(i => i == foo);
Console.WriteLine("GetFirst returns {0}", rslt);
return rslt;
}
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
items = items.Select(item =>
{
Console.WriteLine("Select item = {0}", item);
return GetFirst(items, item);
});
return items;
}
If you call that with:
var newList = GoNuts(new[]{1, 2, 3, 4, 5, 6});
You'll get this output repeatedly until you finally get StackOverflowException.
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
...
What this shows is exactly what dasblinkenlight made clear in his updated answer: the query goes into an infinite loop trying to get the first item.
Let's write GoNuts a slightly different way:
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
var originalItems = items;
items = items.Select(item =>
{
Console.WriteLine("Select item = {0}", item);
return GetFirst(originalItems, item);
});
return items;
}
If you run that, it succeeds. Why? Because in this case it's clear that the call to GetFirst is passing a reference to the original items that were passed to the method. In the first case, GetFirst is passing a reference to the new items collection, which hasn't yet been realized. In turn, GetFirst says, "Hey, I need to enumerate this collection." And thus begins the first recursive call that eventually leads to StackOverflowException.
Interestingly, I was right and wrong when I said that it was consuming its own output. The Select is consuming the original input, as I would expect. The First is trying to consume the output.
Lots of lessons to be learned here. To me, the most important is "don't modify the value of input parameters."
Thanks to dasblinkenlight, D Stanley, and Lucas Trzesniewski for their help.

Check if yield return contains items

I'm trying to optimize a routine that looks sort of like this (simplified):
public async Task<IEnumerable<Bar>> GetBars(ObjectId id){
var output = new Collection<Bar>();
var page = 1;
var hasMore = true;
while(hasMore) {
var foos = await client.GetFoos(id, page);
foreach(var foo in foos) {
if(!Proceed(foo)) {
hasMore = false;
break;
}
output.Add(new Bar().Map(foo));
}
page++;
}
return output;
}
The method that calls GetBars() looks something like this
public async Task<Baz> GetBaz(ObjectId id){
var bars = await qux.GetBars(id);
if(bars.Any()) {
var bazBaseData = qux.GetBazBaseData(id);
var bazAdditionalData = qux.GetBazAdditionalData(id);
return new Baz().Map(await bazBaseData, await bazAdditionalData, bars);
}
return null; // assumed: nothing to build when there are no bars (the original snippet omits this path)
}
GetBars() returns between 0 and a lot of items. Since we run through a few million ids, we initially added the if(bars.Any()) statement as a first attempt at speeding up the application.
Since GetBars() is awaited, GetBaz() cannot continue until it has collected all its data (which can take some time). My idea was to use yield return and then replace the if(bars.Any()) with a check that tests if we get at least one element, so we can fire off the two other async methods in the meantime (which also take some time to execute).
My question is then how to do this. I know System.Linq.Count() and System.Linq.Any() defeat the whole idea of yield return, and if I check the first item in the enumerable it is removed from the enumerable.
Is there another/better option besides adding for instance an out parameter to GetBars()?
TL;DR: How do I check whether an enumerable from a yield return contains any objects without starting to iterate it?

For your actual question "How do I check whether an enumerable from a yield return contains any objects without starting to iterate it?" well, you don't.
It's that simple: you can't, period, since the only thing you can do with an IEnumerable is, well, enumerate it. Calling Any() isn't an issue, however, since it only enumerates the first element (and not the whole list). But it's not possible to enumerate nothing, as a lot of IEnumerables don't exist in any form except that of a pipeline: there may be no backing collection, and it makes no sense to check whether something that doesn't exist yet has any elements.
Edit: also, I don't see any yield in your code. Are you mixing up the await and yield concepts (which are unrelated)?
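To illustrate the point about Any(), here is a small sketch that counts how many items an iterator block actually produces. The Numbers iterator and the Produced counter are hypothetical stand-ins, not the GetBars code from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyDemo
{
    public static int Produced;

    // Hypothetical lazy source; counts every item it hands out.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static int ItemsProducedByAny()
    {
        Produced = 0;
        bool hasAny = Numbers().Any(); // stops after the first successful MoveNext
        Console.WriteLine("Any() returned " + hasAny);
        return Produced;
    }

    public static int ItemsProducedByCount()
    {
        Produced = 0;
        int count = Numbers().Count(); // has to run the iterator to the end
        Console.WriteLine("Count() returned " + count);
        return Produced;
    }

    public static void Main()
    {
        Console.WriteLine(ItemsProducedByAny());   // 1
        Console.WriteLine(ItemsProducedByCount()); // 1000
    }
}
```

Note that with an iterator block the first item is not "used up" by Any(): a later foreach starts a fresh enumeration from the beginning, which also means the work for that first item is done again.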

What is the yield keyword used for in C#?

In the How Can I Expose Only a Fragment of IList<> question one of the answers had the following code snippet:
IEnumerable<object> FilteredList()
{
foreach(object item in FullList)
{
if(IsItemInPartialList(item))
yield return item;
}
}
What does the yield keyword do there? I've seen it referenced in a couple places, and one other question, but I haven't quite figured out what it actually does. I'm used to thinking of yield in the sense of one thread yielding to another, but that doesn't seem relevant here.

The yield contextual keyword actually does quite a lot here.
The function returns an object that implements the IEnumerable<object> interface. If a calling function starts foreaching over this object, the function is called again until it "yields". This is syntactic sugar introduced in C# 2.0. In earlier versions you had to create your own IEnumerable and IEnumerator objects to do stuff like this.
The easiest way to understand code like this is to type in an example, set some breakpoints and see what happens. Try stepping through this example:
public void Consumer()
{
foreach(int i in Integers())
{
Console.WriteLine(i.ToString());
}
}
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
When you step through the example, you'll find the first call to Integers() returns 1. The second call returns 2 and the line yield return 1 is not executed again.
Here is a real-life example:
public IEnumerable<T> Read<T>(string sql, Func<IDataReader, T> make, params object[] parms)
{
using (var connection = CreateConnection())
{
using (var command = CreateCommand(CommandType.Text, sql, connection, parms))
{
command.CommandTimeout = dataBaseSettings.ReadCommandTimeout;
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
yield return make(reader);
}
}
}
}
}

Iteration. It creates a state machine "under the covers" that remembers where you were on each additional cycle of the function and picks up from there.
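To give a feel for what such a state machine looks like, here is a hand-written approximation of an enumerator for an iterator that does nothing but yield return 1; yield return 2;. It is only a sketch: the real compiler-generated class has different names and more bookkeeping, and OneTwoEnumerator and its _state field are my own invention.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Rough hand-written equivalent of: yield return 1; yield return 2;
public sealed class OneTwoEnumerator : IEnumerator<int>
{
    // 0 = not started, 1 = paused after first yield, 2 = paused after second yield, -1 = finished
    private int _state;

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                Current = 1;   // first "yield return"
                _state = 1;
                return true;
            case 1:
                Current = 2;   // resumes here and hits the second "yield return"
                _state = 2;
                return true;
            default:
                _state = -1;   // fell off the end of the method body
                return false;
        }
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}

public static class StateMachineDemo
{
    // Drives the enumerator by hand, the way a foreach loop would.
    public static List<int> Drain()
    {
        var results = new List<int>();
        using (var e = new OneTwoEnumerator())
        {
            while (e.MoveNext())
            {
                results.Add(e.Current);
            }
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Drain())); // 1, 2
    }
}
```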

Yield has two great uses:
It helps to provide custom iteration without creating temp collections.
It helps to do stateful iteration.
To demonstrate the two points above, I have created a simple video that you can watch here.

Recently Raymond Chen also ran an interesting series of articles on the yield keyword.
The implementation of iterators in C# and its consequences (part 1)
The implementation of iterators in C# and its consequences (part 2)
The implementation of iterators in C# and its consequences (part 3)
The implementation of iterators in C# and its consequences (part 4)
While yield is nominally used for easily implementing the iterator pattern, it can be generalized into a state machine. No point in quoting Raymond; the last part also links to other uses (but the example in Entin's blog is especially good, showing how to write async-safe code).

At first sight, yield return is C# syntactic sugar for returning an IEnumerable.
Without yield, all the items of the collection are created at once:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
return new List<SomeData> {
new SomeData(),
new SomeData(),
new SomeData()
};
}
}
Same code using yield, it returns item by item:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
yield return new SomeData();
yield return new SomeData();
yield return new SomeData();
}
}
The advantage of using yield is that if the function consuming your data only needs the first item of the collection, the rest of the items won't be created.
The yield keyword allows items to be created as they are demanded. That's a good reason to use it.

A list or array implementation loads all of the items immediately whereas the yield implementation provides a deferred execution solution.
In practice, it is often desirable to perform the minimum amount of work as needed in order to reduce the resource consumption of an application.
For example, we may have an application that processes millions of records from a database. The following benefits can be achieved when we use IEnumerable in a deferred execution pull-based model:
Scalability, reliability and predictability are likely to improve since the number of records does not significantly affect the application’s resource requirements.
Performance and responsiveness are likely to improve since processing can start immediately instead of waiting for the entire collection to be loaded first.
Recoverability and utilisation are likely to improve since the application can be stopped, started, interrupted or fail. Only the items in progress are lost, whereas pre-fetching all of the data wastes the work spent on any results that are never used.
Continuous processing is possible in environments where constant workload streams are added.
Here is a comparison between building a collection first, such as a list, and using yield.
List Example
public class ContactListStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
var contacts = new List<ContactModel>();
Console.WriteLine("ContactListStore: Creating contact 1");
contacts.Add(new ContactModel() { FirstName = "Bob", LastName = "Blue" });
Console.WriteLine("ContactListStore: Creating contact 2");
contacts.Add(new ContactModel() { FirstName = "Jim", LastName = "Green" });
Console.WriteLine("ContactListStore: Creating contact 3");
contacts.Add(new ContactModel() { FirstName = "Susan", LastName = "Orange" });
return contacts;
}
}
static void Main(string[] args)
{
var store = new ContactListStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
ContactListStore: Creating contact 1
ContactListStore: Creating contact 2
ContactListStore: Creating contact 3
Ready to iterate through the collection.
Note: The entire collection was loaded into memory without even asking for a single item in the list
Yield Example
public class ContactYieldStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
Console.WriteLine("ContactYieldStore: Creating contact 1");
yield return new ContactModel() { FirstName = "Bob", LastName = "Blue" };
Console.WriteLine("ContactYieldStore: Creating contact 2");
yield return new ContactModel() { FirstName = "Jim", LastName = "Green" };
Console.WriteLine("ContactYieldStore: Creating contact 3");
yield return new ContactModel() { FirstName = "Susan", LastName = "Orange" };
}
}
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
Ready to iterate through the collection.
Note: The collection wasn't executed at all. This is due to the "deferred execution" nature of IEnumerable. Constructing an item will only occur when it is really required.
Let's call the collection again and observe the behaviour when we fetch the first contact in the collection.
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection");
Console.WriteLine("Hello {0}", contacts.First().FirstName);
Console.ReadLine();
}
Console Output
Ready to iterate through the collection
ContactYieldStore: Creating contact 1
Hello Bob
Nice! Only the first contact was constructed when the client "pulled" the item out of the collection.

yield return is used with enumerators. Each time a yield return statement executes, control is returned to the caller, but the callee's state is preserved. Because of this, when the caller asks for the next element, execution continues in the callee from the statement immediately after the yield return.
Let us try to understand this with an example. In this example, corresponding to each line I have mentioned the order in which execution flows.
static void Main(string[] args)
{
foreach (int fib in Fibs(6))//1, 5
{
Console.WriteLine(fib + " ");//4, 10
}
}
static IEnumerable<int> Fibs(int fibCount)
{
for (int i = 0, prevFib = 0, currFib = 1; i < fibCount; i++)//2
{
yield return prevFib;//3, 9
int newFib = prevFib + currFib;//6
prevFib = currFib;//7
currFib = newFib;//8
}
}
Also, state is maintained separately for each enumeration. If I make another call to the Fibs() method, that call starts again from fresh state.
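One way to see that each enumeration carries its own state is to advance two enumerators over Fibs by hand. This is only a sketch; the Interleave helper and the FibDemo class are illustrative additions around the Fibs method shown above:

```csharp
using System;
using System.Collections.Generic;

public static class FibDemo
{
    public static IEnumerable<int> Fibs(int fibCount)
    {
        for (int i = 0, prevFib = 0, currFib = 1; i < fibCount; i++)
        {
            yield return prevFib;
            int newFib = prevFib + currFib;
            prevFib = currFib;
            currFib = newFib;
        }
    }

    // Advances two independent enumerators and records what each one currently sees.
    public static List<int> Interleave()
    {
        var seen = new List<int>();
        using (IEnumerator<int> a = Fibs(6).GetEnumerator())
        using (IEnumerator<int> b = Fibs(6).GetEnumerator())
        {
            // Move 'a' to its fourth value (0, 1, 1, 2).
            a.MoveNext(); a.MoveNext(); a.MoveNext(); a.MoveNext();
            seen.Add(a.Current);

            // 'b' has not been touched yet, so it starts from the beginning.
            b.MoveNext();
            seen.Add(b.Current);

            // 'a' carries on from where it left off, unaffected by 'b'.
            a.MoveNext();
            seen.Add(a.Current);
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Interleave())); // 2, 0, 3
    }
}
```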

Intuitively, the keyword returns a value from the function without leaving it, i.e. in your code example it returns the current item value and then resumes the loop. More formally, it is used by the compiler to generate code for an iterator. Iterators are functions that return IEnumerable objects. The MSDN has several articles about them.

If I understand this correctly, here's how I would phrase this from the perspective of the function implementing IEnumerable with yield.
Here's one.
Call again if you need another.
I'll remember what I already gave you.
I'll only know if I can give you another when you call again.

Here is a simple way to understand the concept:
The basic idea is, if you want a collection that you can use "foreach" on, but gathering the items into the collection is expensive for some reason (like querying them out of a database), AND you will often not need the entire collection, then you create a function that builds the collection one item at a time and yields it back to the consumer (who can then terminate the collection effort early).
Think of it this way: You go to the meat counter and want to buy a pound of sliced ham. The butcher takes a 10-pound ham to the back, puts it on the slicer machine, slices the whole thing, then brings the pile of slices back to you and measures out a pound of it. (OLD way).
With yield, the butcher brings the slicer machine to the counter, and starts slicing and "yielding" each slice onto the scale until it measures 1-pound, then wraps it for you and you're done. The Old Way may be better for the butcher (lets him organize his machinery the way he likes), but the New Way is clearly more efficient in most cases for the consumer.

The yield keyword allows you to create an IEnumerable<T> in the form of an iterator block. This iterator block supports deferred execution, and if you are not familiar with the concept it may appear almost magical. However, at the end of the day it is just code that executes without any weird tricks.
An iterator block can be described as syntactic sugar where the compiler generates a state machine that keeps track of how far the enumeration of the enumerable has progressed. To enumerate an enumerable, you often use a foreach loop. However, a foreach loop is also syntactic sugar. So you are two abstractions removed from the real code which is why it initially might be hard to understand how it all works together.
Assume that you have a very simple iterator block:
IEnumerable<int> IteratorBlock()
{
Console.WriteLine("Begin");
yield return 1;
Console.WriteLine("After 1");
yield return 2;
Console.WriteLine("After 2");
yield return 42;
Console.WriteLine("End");
}
Real iterator blocks often have conditions and loops but when you check the conditions and unroll the loops they still end up as yield statements interleaved with other code.
To enumerate the iterator block a foreach loop is used:
foreach (var i in IteratorBlock())
Console.WriteLine(i);
Here is the output (no surprises here):
Begin
1
After 1
2
After 2
42
End
As stated above foreach is syntactic sugar:
IEnumerator<int> enumerator = null;
try
{
enumerator = IteratorBlock().GetEnumerator();
while (enumerator.MoveNext())
{
var i = enumerator.Current;
Console.WriteLine(i);
}
}
finally
{
enumerator?.Dispose();
}
In an attempt to untangle this I have created a sequence diagram with the abstractions removed:
The state machine generated by the compiler also implements the enumerator but to make the diagram more clear I have shown them as separate instances. (When the state machine is enumerated from another thread you do actually get separate instances but that detail is not important here.)
Every time you call your iterator block a new instance of the state machine is created. However, none of your code in the iterator block is executed until enumerator.MoveNext() executes for the first time. This is how deferred execution works. Here is a (rather silly) example:
var evenNumbers = IteratorBlock().Where(i => i%2 == 0);
At this point the iterator has not executed. The Where clause creates a new IEnumerable<T> that wraps the IEnumerable<T> returned by IteratorBlock but this enumerable has yet to be enumerated. This happens when you execute a foreach loop:
foreach (var evenNumber in evenNumbers)
Console.WriteLine(evenNumber);
If you enumerate the enumerable twice then a new instance of the state machine is created each time and your iterator block will execute the same code twice.
Notice that LINQ methods like ToList(), ToArray(), First(), Count() etc. will use a foreach loop to enumerate the enumerable. For instance ToList() will enumerate all elements of the enumerable and store them in a list. You can now access the list to get all elements of the enumerable without the iterator block executing again. There is a trade-off between using CPU to produce the elements of the enumerable multiple times and memory to store the elements of the enumeration to access them multiple times when using methods like ToList().
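A minimal sketch of that trade-off, using a hypothetical `Runs` counter (not part of any API) to record how often the iterator body starts. It also shows why an iterator with side effects, like the one in the question, behaves differently on a second enumeration unless the results are stored first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Hypothetical counter: incremented each time the iterator body begins.
    public static int Runs;

    public static IEnumerable<int> Numbers()
    {
        Runs++;
        yield return 1;
        yield return 2;
    }

    public static void Main()
    {
        IEnumerable<int> lazy = Numbers();
        Console.WriteLine(Runs); // 0: calling the method runs none of its body

        lazy.Sum();
        lazy.Sum();
        Console.WriteLine(Runs); // 2: the body ran once per enumeration

        List<int> cached = Numbers().ToList();
        cached.Sum();
        cached.Sum();
        Console.WriteLine(Runs); // 3: ToList() ran the body once, then the list was reused
    }
}
```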

One major point about the yield keyword is lazy execution, meaning the code executes only when it is needed. This is easiest to show with an example.
Example: Not using Yield i.e. No Lazy Execution.
public static IEnumerable<int> CreateCollectionWithList()
{
var list = new List<int>();
list.Add(10);
list.Add(0);
list.Add(1);
list.Add(2);
list.Add(20);
return list;
}
Example: using Yield i.e. Lazy Execution.
public static IEnumerable<int> CreateCollectionWithYield()
{
yield return 10;
for (int i = 0; i < 3; i++)
{
yield return i;
}
yield return 20;
}
Now when I call both methods.
var listItems = CreateCollectionWithList();
var yieldedItems = CreateCollectionWithYield();
You will notice that listItems has 5 items inside it (hover your mouse over listItems while debugging).
Whereas yieldedItems will just have a reference to the method and not the items.
That means it has not yet executed the code inside the method that produces the items. This is a very efficient way of getting data only when it is needed.
Real-world uses of yield can be seen in ORMs such as Entity Framework and NHibernate.

The C# yield keyword, to put it simply, allows many calls to a body of code, referred to as an iterator, that knows how to return before it's done and, when called again, continues where it left off - i.e. it helps an iterator become transparently stateful per each item in a sequence that the iterator returns in successive calls.
In JavaScript, the same concept is called Generators.

It is a very simple and easy way to create an enumerable for your object. The compiler creates a class that wraps your method and that implements, in this case, IEnumerable<object>. Without the yield keyword, you'd have to create an object that implements IEnumerable<object>.
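To give a feel for what yield saves you, here is a rough hand-written counterpart next to the yield version. This is a sketch only: `CountToSequence`, `CountTo` and `ManualVersusYield` are hypothetical names, and the class the compiler actually generates is structured differently (it is a state machine), so treat this as an approximation of the idea rather than the real output.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written sequence 1..max, without yield.
public sealed class CountToSequence : IEnumerable<int>
{
    private readonly int _max;
    public CountToSequence(int max) { _max = max; }

    public IEnumerator<int> GetEnumerator() => new Enumerator(_max);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int _max;
        private int _current; // 0 means "before the first element"

        public Enumerator(int max) { _max = max; }

        public int Current => _current;
        object IEnumerator.Current => _current;

        public bool MoveNext()
        {
            if (_current >= _max) return false; // sequence exhausted
            _current++;
            return true;
        }

        public void Reset() { _current = 0; }
        public void Dispose() { }
    }
}

public static class ManualVersusYield
{
    // The same sequence with yield: the compiler writes the bookkeeping for you.
    public static IEnumerable<int> CountTo(int max)
    {
        for (int i = 1; i <= max; i++)
            yield return i;
    }

    public static void Main()
    {
        IEnumerable<int> manual = new CountToSequence(3);
        IEnumerable<int> viaYield = CountTo(3);
        Console.WriteLine(string.Join(",", manual));   // 1,2,3
        Console.WriteLine(string.Join(",", viaYield)); // 1,2,3
    }
}
```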

It produces an enumerable sequence. What it actually does is create a local IEnumerable sequence and return it as the method result.

Here are a few very simple examples:
public static IEnumerable<int> testYieldb()
{
for(int i=0;i<3;i++) yield return 4;
}
Notice that yield return won't return from the method. You can even put a WriteLine after the yield return
The above produces an IEnumerable of three ints: 4, 4, 4.
Here is a version with a WriteLine. Enumerating it yields 4, prints abc, yields 4 again, and then the method runs to completion and only then really returns (as a procedure without a return statement would). To the caller, the result is still an IEnumerable of ints.
public static IEnumerable<int> testYieldb()
{
yield return 4;
Console.WriteLine("abc");
yield return 4;
}
Notice also that when you use yield, the value you yield is not of the method's return type. It is of the element type of the IEnumerable.
You use yield in a method whose return type is IEnumerable or IEnumerable<T> (IEnumerator, IEnumerator<T> and, since C# 8.0, IAsyncEnumerable<T> are also allowed). If the method's return type is int or List<int> and you use yield, it won't compile. You can return IEnumerable from a method without using yield, but you can't use yield unless the return type is one of these iterator types.
And to get it to execute, you have to enumerate the result. Just calling the method runs none of its body.
static void Main(string[] args)
{
testA();
Console.Write("try again. the above won't execute any of the function!\n");
foreach (var x in testA()) { }
Console.ReadLine();
}
// static List<int> testA()
static IEnumerable<int> testA()
{
Console.WriteLine("asdfa");
yield return 1;
Console.WriteLine("asdf");
}

Nowadays you can use the yield keyword for async streams.
C# 8.0 introduces async streams, which model a streaming source of data. Data streams often retrieve or generate elements asynchronously. Async streams rely on new interfaces introduced in .NET Standard 2.1. These interfaces are supported in .NET Core 3.0 and later. They provide a natural programming model for asynchronous streaming data sources.
Source: Microsoft docs
Example below
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
public class Program
{
public static async Task Main()
{
List<int> numbers = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
await foreach(int number in YieldReturnNumbers(numbers))
{
Console.WriteLine(number);
}
}
public static async IAsyncEnumerable<int> YieldReturnNumbers(List<int> numbers)
{
foreach (int number in numbers)
{
await Task.Delay(1000);
yield return number;
}
}
}

Simple demo to understand yield
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp_demo_yield {
class Program
{
static void Main(string[] args)
{
var letters = new List<string>() { "a1", "b1", "c2", "d2" };
// Not yield
var test1 = GetNotYield(letters);
foreach (var t in test1)
{
Console.WriteLine(t);
}
// yield
var test2 = GetWithYield(letters).ToList();
foreach (var t in test2)
{
Console.WriteLine(t);
}
Console.ReadKey();
}
private static IList<string> GetNotYield(IList<string> list)
{
var temp = new List<string>();
foreach(var x in list)
{
if (x.Contains("2")) {
temp.Add(x);
}
}
return temp;
}
private static IEnumerable<string> GetWithYield(IList<string> list)
{
foreach (var x in list)
{
if (x.Contains("2"))
{
yield return x;
}
}
}
}
}

It's trying to bring in some Ruby Goodness :)
Concept: This is some sample Ruby Code that prints out each element of the array
rubyArray = [1,2,3,4,5,6,7,8,9,10]
rubyArray.each{|x|
puts x # do whatever with x
}
The Array's each method implementation yields control over to the caller (the 'puts x') with each element of the array neatly presented as x. The caller can then do whatever it needs to do with x.
However, .NET doesn't go all the way here. C# seems to have coupled yield with IEnumerable, in a way forcing you to write a foreach loop in the caller, as seen in Mendelt's response. A little less elegant.
//calling code
foreach(int i in obCustomClass.Each())
{
Console.WriteLine(i.ToString());
}
// CustomClass implementation
private int[] data = {1,2,3,4,5,6,7,8,9,10};
public IEnumerable<int> Each()
{
for(int iLooper=0; iLooper<data.Length; ++iLooper)
yield return data[iLooper];
}
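For comparison, here is a rough sketch of both styles side by side in C#. The class and the names `EachWithBlock` and `Items` are hypothetical, chosen just for this illustration. The first method mimics Ruby's each by taking the "block" as a delegate, while the second uses yield and leaves the loop to the caller.

```csharp
using System;
using System.Collections.Generic;

public class CustomClass
{
    private readonly int[] data = { 1, 2, 3 };

    // Push style, closer to Ruby's each: the method invokes the caller's block.
    public void EachWithBlock(Action<int> block)
    {
        foreach (int item in data)
            block(item);
    }

    // Pull style with yield: the caller drives the loop with foreach.
    public IEnumerable<int> Items()
    {
        foreach (int item in data)
            yield return item;
    }
}

public static class EachDemo
{
    public static void Main()
    {
        var obCustomClass = new CustomClass();

        obCustomClass.EachWithBlock(x => Console.WriteLine(x));

        foreach (int x in obCustomClass.Items())
            Console.WriteLine(x);
    }
}
```

The delegate version reads more like the Ruby code, but the yield version lets the caller stop early with break and compose the sequence with LINQ.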
