Check if yield return contains items

Check if yield return contains items - c#

I'm trying to optimize a routine that looks sort of like this (simplified):
public async Task<IEnumerable<Bar>> GetBars(ObjectId id){
var output = new Collection<Bar>();
var page = 1;
var hasMore = true;
while(hasMore) {
var foos = await client.GetFoos(id, page);
foreach(var foo : foos) {
if(!Proceed(foo)) {
hasMore = false;
break;
}
output.Add(new Bar().Map(foo)
}
page++;
return output;
}
The method that calls GetBars() looks something like this
public async Task<Baz> GetBaz(ObjectId id){
var bars = await qux.GetBars();
if(bars.Any() {
var bazBaseData = qux.GetBazBaseData(id);
var bazAdditionalData = qux.GetBazAdditionalData(id);
return new Baz().Map(await bazBaseData, await bazAdditionalData, bars);
}
}
GetBaz() returns between 0 and a lot of items. Since we run through a few million id's we initially added the if(bars.Any()) statement as an initial attempt of speeding up the application.
Since the GetBars() is awaited it blocks the thread until it has collected all its data (which can take some time). My idea was to use yield return and then replace the if(bars.Any()) with a check that tests if we get at least one element, so we can fire off the two other async methods in the meantime (which also takes some time to execute).
My question is then how to do this. I know System.Linq.Count()and System.Linq.Any() defeats the whole idea of yield return and if I check the first item in the enumerable it is removed from the enumerable.
Is there another/better option besides adding for instance an out parameter to GetBars()?
TL;DR: How do I check whether an enumerable from a yield return contains any objects without starting to iterate it?

For your actual question "How do I check whether an enumerable from a yield return contains any objects without starting to iterate it?" well, you don't.
It's that simple, you can't period since the only thing you can do with an IEnumerable is well, to enumerate it. Calling Any() isn't an issue however since that "does" only enumerate the first element (and not the whole list) but it's not possible to enumerate nothing as a lot of ienumerables don't exist in any form except that of a pipeline (there could be no backing collection, it's not possible to check if something that doesn't exist yet has any elements, by design this makes no sense)
Edit : also , i don't see any yield in your code, are you mixing up awaitable and yield concepts (totally unrelated) ?

Related

C# yield return performance

How much space is reserved to the underlying collection behind a method using yield return syntax WHEN I PERFORM a ToList() on it? There's a chance it will reallocate and thus decrease performance if compared to the standard approach where i create a list with predefined capacity?
The two scenarios:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
public IEnumerable<T> GetList2()
{
List<T> outputList = new List<T>( collection.Count() );
foreach( var item in collection )
outputList.Add( item.Property );
return outputList;
}

yield return does not create an array that has to be resized, like what List does; instead, it creates an IEnumerable with a state machine.
For instance, let's take this method:
public static IEnumerable<int> Foo()
{
Console.WriteLine("Returning 1");
yield return 1;
Console.WriteLine("Returning 2");
yield return 2;
Console.WriteLine("Returning 3");
yield return 3;
}
Now let's call it and assign that enumerable to a variable:
var elems = Foo();
None of the code in Foo has executed yet. Nothing will be printed on the console. But if we iterate over it, like this:
foreach(var elem in elems)
{
Console.WriteLine( "Got " + elem );
}
On the first iteration of the foreach loop, the Foo method will be executed until the first yield return. Then, on the second iteration, the method will "resume" from where it left off (right after the yield return 1), and execute until the next yield return. Same for all subsequent elements.
At the end of the loop, the console will look like this:
Returning 1
Got 1
Returning 2
Got 2
Returning 3
Got 3
This means you can write methods like this:
public static IEnumerable<int> GetAnswers()
{
while( true )
{
yield return 42;
}
}
You can call the GetAnswers method, and every time you request an element, it'll give you 42; the sequence never ends. You couldn't do this with a List, because lists have to have a finite size.

How much space is reserved to the underlying collection behind a method using yield return syntax?
There's no underlying collection.
There's an object, but it isn't a collection. Just how much space it will take up depends on what it needs to keep track of.
There's a chance it will reallocate
No.
And thus decrease performance if compared to the standard approach where i create a list with predefined capacity?
It will almost certainly take up less memory than creating a list with a predefined capacity.
Let's try a manual example. Say we had the following code:
public static IEnumerable<int> CountToTen()
{
for(var i = 1; i != 11; ++i)
yield return i;
}
To foreach through this will iterate through the numbers 1 to 10 inclusive.
Now let's do this the way we would have to if yield did not exist. We'd do something like:
private class CountToTenEnumerator : IEnumerator<int>
{
private int _current;
public int Current
{
get
{
if(_current == 0)
throw new InvalidOperationException();
return _current;
}
}
object IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
if(_current == 10)
return false;
_current++;
return true;
}
public void Reset()
{
throw new NotSupportedException();
// We *could* just set _current back, but the object produced by
// yield won't do that, so we'll match that.
}
public void Dispose()
{
}
}
private class CountToTenEnumerable : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
return new CountToTenEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static IEnumerable<int> CountToTen()
{
return new CountToTenEnumerable();
}
Now, for a variety of reasons this is quite different to the code you're likely to get from the version using yield, but the basic principle is the same. As you can see there are two allocations involved of objects (same number as if we had a collection and then did a foreach on that) and the storage of a single int. In practice we can expect yield to store a few more bytes than that, but not a lot.
Edit: yield actually does a trick where the first GetEnumerator() call on the same thread that obtained the object returns that same object, doing double service for both cases. Since this covers over 99% of use cases yield actually does one allocation rather than two.
Now let's look at:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
While this would result in more memory used than just return collection, it won't result in a lot more; the only thing the enumerator produced really needs to keep track of is the enumerator produced by calling GetEnumerator() on collection and then wrapping that.
This is going to be massively less memory than that of the wasteful second approach you mention, and much faster to get going.
Edit:
You've changed your question to include "syntax WHEN I PERFORM a ToList() on it", which is worth considering.
Now, here we need to add a third possibility: Knowledge of the collection's size.
Here, there is the possibilty that using new List(capacity) will prevent allocations of the list being built. That can indeed be a considerable saving.
If the object that has ToList called on it implements ICollection<T> then ToList will end up first doing a single allocation of an internal array of T and then calling ICollection<T>.CopyTo().
This would mean that your GetList2 would result in a faster ToList() than your GetList1.
However, your GetList2 has already wasted time and memory doing what ToList() will do with the results of GetList1 anyway!
What it should have done here was just return new List<T>(collection); and be done with it.
If though we need to actually do something inside GetList1 or GetList2 (e.g. convert elements, filter elements, track averages, and so on) then GetList1 is going to be faster and lighter on memory. Much lighter if we never call ToList() on it, and slightly ligher if we do call ToList() because again, the faster and lighter ToList() is offset by GetList2 being slower and heavier in the first place by exactly the same amount.

Why would a C# method returning IEnumerable work when it returns a List but not when it yields?

I have a method that generates a set of options for a floating drop-down menu. The method is of type IEnumerable<FloatMenuOption>.
When I return the values as a List, it works fine. But when I yield them one by one, every item runs the useAct lambda for the last one yielded, even though they all have correct labels.
Can anyone explain why this might happen? Why would returning a List instead of yielding the items one by one matter?
public override IEnumerable<FloatMenuOption> GetFloatMenuChoicesFor(Pawn myPawn)
{
List<FloatMenuOption> options = new List<FloatMenuOption>();
foreach ( Communicable commTarget in GetCommTargets() )
{
var localCommTarget = commTarget;
System.Action useAct = () =>
{
Job openJob = new Job();
openJob.commTarget = localCommTarget;
myPawn.MindHumanoid.TakeOrderedJob(openJob);
};
options.Add( new FloatMenuOption(localCommTarget.GetLabel(), useAct) );
}
return options;
//Simply commenting out the above line and uncommenting the two below causes the error
// foreach (var opt in options)
// yield return opt;
}
In either case, the options come out with correct labels (from localCommTarget.GetLabel()). However, if yielded, they all do the useAct lambda configured for the last item in the list, while if returned as a list they each do their own useAct lambda.
Why?

Take a look at Captured variable in a loop in C#
I believe if you change your for loop to:
foreach (var opt in options)
{
var capturedOpt = opt;
yield return capturedOpt;
}
you'll be golden.

In your implementation, if your code yields the results instead of adding them to the list, only the last assignment to localCommTarget is used. The reason for this is the fact that the execution of the lambda expression is deferred to the actual access to the return. In other words, the order of execution is different for the different return strategies.

How do I make a streaming LINQ expression that delivers the filtered out items as well as the filtered items?

I am transforming an Excel spreadsheet into a list of "Elements" (this is a domain term). During this transformation, I need to skip the header rows and throw out malformed rows that cannot be transformed.
Now comes the fun part. I need to capture those malformed records so that I can report on them. I constructed a crazy LINQ statement (below). These are extension methods hiding the messy LINQ operations on the types from the OpenXml library.
var elements = sheet
.Rows() <-- BEGIN sheet data transform
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows() <-- END sheet data transform
.ToElements(strings) <-- BEGIN domain transform
.RemoveBadRecords(out discard)
.OrderByCompositeKey();
The interesting part starts at ToElements, where I transform the row lookup to my domain object list (details: it's called an ElementRow, which is later transformed into an Element). Bad records are created with just a key (the Excel row index) and are uniquely identifiable vs. a real element.
public static IEnumerable<ElementRow> ToElements(this IEnumerable<KeyValuePair<UInt32Value, Cell[]>> map)
{
return map.Select(pair =>
{
try
{
return ElementRow.FromCells(pair.Key, pair.Value);
}
catch (Exception)
{
return ElementRow.BadRecord(pair.Key);
}
});
}
Then, I want to remove those bad records (it's easier to collect all of them before filtering). That method is RemoveBadRecords, which started like this...
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements)
{
return elements.Where(el => el.FormatId != 0);
}
However, I need to report the discarded elements! And I don't want to muddy my transform extension method with reporting. So, I went to the out parameter (taking into account the difficulties of using an out param in an anonymous block)
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements, out List<ElementRow> discard)
{
var temp = new List<ElementRow>();
var filtered = elements.Where(el =>
{
if (el.FormatId == 0) temp.Add(el);
return el.FormatId != 0;
});
discard = temp;
return filtered;
}
And, lo! I thought I was hardcore and would have this working in one shot...
var discard = new List<ElementRow>();
var elements = data
/* snipped long LINQ statement */
.RemoveBadRecords(out discard)
/* snipped long LINQ statement */
discard.ForEach(el => failures.Add(el));
foreach(var el in elements)
{
/* do more work, maybe add more failures */
}
return new Result(elements, failures);
But, nothing was in my discard list at the time I looped through it! I stepped through the code and realized that I successfully created a fully-streaming LINQ statement.
The temp list was created
The Where filter was assigned (but not yet run)
And the discard list was assigned
Then the streaming thing was returned
When discard was iterated, it contained no elements, because the elements weren't iterated over yet.
Is there a way to fix this problem using the thing I constructed? Do I have to force an iteration of the data before or during the bad record filter? Is there another construction that I've missed?
Some Commentary
Jon mentioned that the assignment /was/ happening. I simply wasn't waiting for it. If I check the contents of discard after the iteration of elements, it is, in fact, full! So, I don't actually have an assignment problem. Unless I take Jon's advice on what's good/bad to have in a LINQ statement.

When the statement was actually iterated, the Where clause ran and temp filled up, but discard was never assigned again!
It doesn't need to be assigned again - the existing list which will have been assigned to discard in the calling code will be populated.
However, I'd strongly recommend against this approach. Using an out parameter here is really against the spirit of LINQ. (If you iterate over your results twice, you'll end up with a list which contains all the bad elements twice. Ick!)
I'd suggest materializing the query before removing the bad records - and then you can run separate queries:
var allElements = sheet
.Rows()
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows()
.ToElements(strings)
.ToList();
var goodElements = allElements.Where(el => el.FormatId != 0)
.OrderByCompositeKey();
var badElements = allElements.Where(el => el.FormatId == 0);
By materializing the query in a List<>, you only process each row once in terms of ToRowLookup, ToCellLookup etc. It does mean you need to have enough memory to keep all the elements at a time, of course. There are alternative approaches (such as taking an action on each bad element while filtering it) but they're still likely to end up being fairly fragile.
EDIT: Another option as mentioned by Servy is to use ToLookup, which will materialize and group in one go:
var lookup = sheet
.Rows()
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows()
.ToElements(strings)
.OrderByCompositeKey()
.ToLookup(el => el.FormatId == 0);
Then you can use:
foreach (var goodElement in lookup[false])
{
...
}
and
foreach (var badElement in lookup[true])
{
...
}
Note that this performs the ordering on all elements, good and bad. An alternative is to remove the ordering from the original query and use:
foreach (var goodElement in lookup[false].OrderByCompositeKey())
{
...
}
I'm not personally wild about grouping by true/false - it feels like a bit of an abuse of what's normally meant to be a key-based lookup - but it would certainly work.

ToList method in Linq

If I am not wrong, the ToList() method iterate on each element of provided collection and add them to new instance of List and return this instance.Suppose an example
//using linq
list = Students.Where(s => s.Name == "ABC").ToList();
//traditional way
foreach (var student in Students)
{
if (student.Name == "ABC")
list.Add(student);
}
I think the traditional way is faster, as it loops only once, where as of above of Linq iterates twice once for Where method and then for ToList() method.
The project I am working on now has extensive use of Lists all over and I see there is alot of such kind of use of ToList() and other Methods that can be made better like above if I take list variable as IEnumerable and remove .ToList() and use it further as IEnumerable.
Do these things make any impact on performance?

Do these things make any impact on performance?
That depends on your code. Most of the time, using LINQ does cause a small performance hit. In some cases, this hit can be significant for you, but you should avoid LINQ only when you know that it is too slow for you (i.e. if profiling your code showed that LINQ is reason why your code is slow).
But you're right that using ToList() too often can cause significant performance problems. You should call ToList() only when you have to. Be aware that there are also cases where adding ToList() can improve performance a lot (e.g. when the collection is loaded from database every time it's iterated).
Regarding the number of iterations: it depends on what exactly do you mean by “iterates twice”. If you count the number of times MoveNext() is called on some collection, then yes, using Where() this way leads to iterating twice. The sequence of operations goes like this (to simplify, I'm going to assume that all items match the condition):
Where() is called, no iteration for now, Where() returns a special enumerable.
ToList() is called, calling MoveNext() on the enumerable returned from Where().
Where() now calls MoveNext() on the original collection and gets the value.
Where() calls your predicate, which returns true.
MoveNext() called from ToList() returns, ToList() gets the value and adds it to the list.
…
What this means is that if all n items in the original collection match the condition, MoveNext() will be called 2n times, n times from Where() and n times from ToList().

var list = Students.Where(s=>s.Name == "ABC");
This will only create a query and not loop the elements until the query is used. By calling ToList() will first then execute the query and thus only loop your elements once.
List<Student> studentList = new List<Student>();
var list = Students.Where(s=>s.Name == "ABC");
foreach(Student s in list)
{
studentList.add(s);
}
this example will also only iterate once. Because its only used once. Keep in mind that list will iterate all students everytime its called.. Not only just those whose names are ABC. Since its a query.
And for the later discussion Ive made a testexample. Perhaps its not the very best implementation of IEnumable but it does what its supposed to do.
First we have our list
public class TestList<T> : IEnumerable<T>
{
private TestEnumerator<T> _Enumerator;
public TestList()
{
_Enumerator = new TestEnumerator<T>();
}
public IEnumerator<T> GetEnumerator()
{
return _Enumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
internal void Add(T p)
{
_Enumerator.Add(p);
}
}
And since we want to count how many times MoveNext is called we have to implement our custom enumerator aswel. Observe in MoveNext we have a counter that is static in our program.
public class TestEnumerator : IEnumerator
{
public Item FirstItem = null;
public Item CurrentItem = null;
public TestEnumerator()
{
}
public T Current
{
get { return CurrentItem.Value; }
}
public void Dispose()
{
}
object System.Collections.IEnumerator.Current
{
get { throw new NotImplementedException(); }
}
public bool MoveNext()
{
Program.Counter++;
if (CurrentItem == null)
{
CurrentItem = FirstItem;
return true;
}
if (CurrentItem != null && CurrentItem.NextItem != null)
{
CurrentItem = CurrentItem.NextItem;
return true;
}
return false;
}
public void Reset()
{
CurrentItem = null;
}
internal void Add(T p)
{
if (FirstItem == null)
{
FirstItem = new Item<T>(p);
return;
}
Item<T> lastItem = FirstItem;
while (lastItem.NextItem != null)
{
lastItem = lastItem.NextItem;
}
lastItem.NextItem = new Item<T>(p);
}
}
And then we have a custom item that just wraps our value
public class Item<T>
{
public Item(T item)
{
Value = item;
}
public T Value;
public Item<T> NextItem;
}
To use the actual code we create a "list" with 3 entries.
public static int Counter = 0;
static void Main(string[] args)
{
TestList<int> list = new TestList<int>();
list.Add(1);
list.Add(2);
list.Add(3);
var v = list.Where(c => c == 2).ToList(); //will use movenext 4 times
var v = list.Where(c => true).ToList(); //will also use movenext 4 times
List<int> tmpList = new List<int>(); //And the loop in OP question
foreach(var i in list)
{
tmpList.Add(i);
} //Also 4 times.
}
And conclusion? How does it hit performance?
The MoveNext is called n+1 times in this case. Regardless of how many items we have.
And also the WhereClause does not matter, he will still run MoveNext 4 times. Because we always run our query on our initial list.
The only performance hit we will take is the actual LINQ framework and its calls. The actual loops made will be the same.
And before anyone asks why its N+1 times and not N times. Its because he returns false the last time when he is out of elements. Making it the number of elements + end of list.

To answer this completely, it depends on the implementation. If you are talking about LINQ to SQL/EF, there will be only one iteration in this case when .ToList is called, which internally calls .GetEnumerator. The query expression is then parsed into TSQL and passed to the database. The resulting rows are then iterated over (once) and added to the list.
In the case of LINQ to Objects, there is only one pass through the data as well. The use of yield return in the where clause sets up a state machine internally which keeps track of where the process is in the iteration. Where does NOT do a full iteration creating a temporary list and then passing those results to the rest of the query. It just determines if an item meets a criteria and only passes on those that match.

First of all, Why are you even asking me? Measure for yourself and see.
That said, Where, Select, OrderBy and the other LINQ IEnumerable extension methods, in general, are implemented as lazy as possible (the yield keyword is used often). That means that they do not work on the data unless they have to. From your example:
var list = Students.Where(s => s.Name == "ABC");
won't execute anything. This will return momentarily even if Students is a list of 10 million objects. The predicate won't be called at all until the result is actually requested somewhere, and that is practically what ToList() does: It says "Yes, the results - all of them - are required immediately".
There is however, some initial overhead in calling of the LINQ methods, so the traditional way will, in general, be faster, but composability and the ease-of-use of the LINQ methods, IMHO, more than compensate for that.
If you like to take a look at how these methods are implemented, they are available for reference from Microsoft Reference Sources.

What is the yield keyword used for in C#?

In the How Can I Expose Only a Fragment of IList<> question one of the answers had the following code snippet:
IEnumerable<object> FilteredList()
{
foreach(object item in FullList)
{
if(IsItemInPartialList(item))
yield return item;
}
}
What does the yield keyword do there? I've seen it referenced in a couple places, and one other question, but I haven't quite figured out what it actually does. I'm used to thinking of yield in the sense of one thread yielding to another, but that doesn't seem relevant here.

The yield contextual keyword actually does quite a lot here.
The function returns an object that implements the IEnumerable<object> interface. If a calling function starts foreaching over this object, the function is called again until it "yields". This is syntactic sugar introduced in C# 2.0. In earlier versions you had to create your own IEnumerable and IEnumerator objects to do stuff like this.
The easiest way understand code like this is to type-in an example, set some breakpoints and see what happens. Try stepping through this example:
public void Consumer()
{
foreach(int i in Integers())
{
Console.WriteLine(i.ToString());
}
}
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
When you step through the example, you'll find the first call to Integers() returns 1. The second call returns 2 and the line yield return 1 is not executed again.
Here is a real-life example:
public IEnumerable<T> Read<T>(string sql, Func<IDataReader, T> make, params object[] parms)
{
using (var connection = CreateConnection())
{
using (var command = CreateCommand(CommandType.Text, sql, connection, parms))
{
command.CommandTimeout = dataBaseSettings.ReadCommandTimeout;
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
yield return make(reader);
}
}
}
}
}

Iteration. It creates a state machine "under the covers" that remembers where you were on each additional cycle of the function and picks up from there.

Yield has two great uses,
It helps to provide custom iteration without creating temp collections.
It helps to do stateful iteration.
In order to explain above two points more demonstratively, I have created a simple video you can watch it here

Recently Raymond Chen also ran an interesting series of articles on the yield keyword.
The implementation of iterators in C# and its consequences (part 1)
The implementation of iterators in C# and its consequences (part 2)
The implementation of iterators in C# and its consequences (part 3)
The implementation of iterators in C# and its consequences (part 4)
While it's nominally used for easily implementing an iterator pattern, but can be generalized into a state machine. No point in quoting Raymond, the last part also links to other uses (but the example in Entin's blog is esp good, showing how to write async safe code).

At first sight, yield return is a .NET sugar to return an IEnumerable.
Without yield, all the items of the collection are created at once:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
return new List<SomeData> {
new SomeData(),
new SomeData(),
new SomeData()
};
}
}
Same code using yield, it returns item by item:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
yield return new SomeData();
yield return new SomeData();
yield return new SomeData();
}
}
The advantage of using yield is that if the function consuming your data simply needs the first item of the collection, the rest of the items won't be created.
The yield operator allows the creation of items as it is demanded. That's a good reason to use it.

A list or array implementation loads all of the items immediately whereas the yield implementation provides a deferred execution solution.
In practice, it is often desirable to perform the minimum amount of work as needed in order to reduce the resource consumption of an application.
For example, we may have an application that process millions of records from a database. The following benefits can be achieved when we use IEnumerable in a deferred execution pull-based model:
Scalability, reliability and predictability are likely to improve since the number of records does not significantly affect the application’s resource requirements.
Performance and responsiveness are likely to improve since processing can start immediately instead of waiting for the entire collection to be loaded first.
Recoverability and utilisation are likely to improve since the application can be stopped, started, interrupted or fail. Only the items in progress will be lost compared to pre-fetching all of the data where only using a portion of the results was actually used.
Continuous processing is possible in environments where constant workload streams are added.
Here is a comparison between build a collection first such as a list compared to using yield.
List Example
public class ContactListStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
var contacts = new List<ContactModel>();
Console.WriteLine("ContactListStore: Creating contact 1");
contacts.Add(new ContactModel() { FirstName = "Bob", LastName = "Blue" });
Console.WriteLine("ContactListStore: Creating contact 2");
contacts.Add(new ContactModel() { FirstName = "Jim", LastName = "Green" });
Console.WriteLine("ContactListStore: Creating contact 3");
contacts.Add(new ContactModel() { FirstName = "Susan", LastName = "Orange" });
return contacts;
}
}
static void Main(string[] args)
{
var store = new ContactListStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
ContactListStore: Creating contact 1
ContactListStore: Creating contact 2
ContactListStore: Creating contact 3
Ready to iterate through the collection.
Note: The entire collection was loaded into memory without even asking for a single item in the list
Yield Example
public class ContactYieldStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
Console.WriteLine("ContactYieldStore: Creating contact 1");
yield return new ContactModel() { FirstName = "Bob", LastName = "Blue" };
Console.WriteLine("ContactYieldStore: Creating contact 2");
yield return new ContactModel() { FirstName = "Jim", LastName = "Green" };
Console.WriteLine("ContactYieldStore: Creating contact 3");
yield return new ContactModel() { FirstName = "Susan", LastName = "Orange" };
}
}
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
Ready to iterate through the collection.
Note: The collection wasn't executed at all. This is due to the "deferred execution" nature of IEnumerable. Constructing an item will only occur when it is really required.
Let's call the collection again and obverse the behaviour when we fetch the first contact in the collection.
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection");
Console.WriteLine("Hello {0}", contacts.First().FirstName);
Console.ReadLine();
}
Console Output
Ready to iterate through the collection
ContactYieldStore: Creating contact 1
Hello Bob
Nice! Only the first contact was constructed when the client "pulled" the item out of the collection.

yield return is used with enumerators. On each call of yield statement, control is returned to the caller but it ensures that the callee's state is maintained. Due to this, when the caller enumerates the next element, it continues execution in the callee method from statement immediately after the yield statement.
Let us try to understand this with an example. In this example, corresponding to each line I have mentioned the order in which execution flows.
static void Main(string[] args)
{
foreach (int fib in Fibs(6))//1, 5
{
Console.WriteLine(fib + " ");//4, 10
}
}
static IEnumerable<int> Fibs(int fibCount)
{
for (int i = 0, prevFib = 0, currFib = 1; i < fibCount; i++)//2
{
yield return prevFib;//3, 9
int newFib = prevFib + currFib;//6
prevFib = currFib;//7
currFib = newFib;//8
}
}
Also, the state is maintained for each enumeration. Suppose, I have another call to Fibs() method then the state will be reset for it.

Intuitively, the keyword returns a value from the function without leaving it, i.e. in your code example it returns the current item value and then resumes the loop. More formally, it is used by the compiler to generate code for an iterator. Iterators are functions that return IEnumerable objects. The MSDN has several articles about them.

If I understand this correctly, here's how I would phrase this from the perspective of the function implementing IEnumerable with yield.
Here's one.
Call again if you need another.
I'll remember what I already gave you.
I'll only know if I can give you another when you call again.

Here is a simple way to understand the concept:
The basic idea is, if you want a collection that you can use "foreach" on, but gathering the items into the collection is expensive for some reason (like querying them out of a database), AND you will often not need the entire collection, then you create a function that builds the collection one item at a time and yields it back to the consumer (who can then terminate the collection effort early).
Think of it this way: You go to the meat counter and want to buy a pound of sliced ham. The butcher takes a 10-pound ham to the back, puts it on the slicer machine, slices the whole thing, then brings the pile of slices back to you and measures out a pound of it. (OLD way).
With yield, the butcher brings the slicer machine to the counter, and starts slicing and "yielding" each slice onto the scale until it measures 1-pound, then wraps it for you and you're done. The Old Way may be better for the butcher (lets him organize his machinery the way he likes), but the New Way is clearly more efficient in most cases for the consumer.

The yield keyword allows you to create an IEnumerable<T> in the form on an iterator block. This iterator block supports deferred executing and if you are not familiar with the concept it may appear almost magical. However, at the end of the day it is just code that executes without any weird tricks.
An iterator block can be described as syntactic sugar where the compiler generates a state machine that keeps track of how far the enumeration of the enumerable has progressed. To enumerate an enumerable, you often use a foreach loop. However, a foreach loop is also syntactic sugar. So you are two abstractions removed from the real code which is why it initially might be hard to understand how it all works together.
Assume that you have a very simple iterator block:
IEnumerable<int> IteratorBlock()
{
Console.WriteLine("Begin");
yield return 1;
Console.WriteLine("After 1");
yield return 2;
Console.WriteLine("After 2");
yield return 42;
Console.WriteLine("End");
}
Real iterator blocks often have conditions and loops but when you check the conditions and unroll the loops they still end up as yield statements interleaved with other code.
To enumerate the iterator block a foreach loop is used:
foreach (var i in IteratorBlock())
Console.WriteLine(i);
Here is the output (no surprises here):
Begin
1
After 1
2
After 2
42
End
As stated above foreach is syntactic sugar:
IEnumerator<int> enumerator = null;
try
{
enumerator = IteratorBlock().GetEnumerator();
while (enumerator.MoveNext())
{
var i = enumerator.Current;
Console.WriteLine(i);
}
}
finally
{
enumerator?.Dispose();
}
In an attempt to untangle this I have crated a sequence diagram with the abstractions removed:
The state machine generated by the compiler also implements the enumerator but to make the diagram more clear I have shown them as separate instances. (When the state machine is enumerated from another thread you do actually get separate instances but that detail is not important here.)
Every time you call your iterator block a new instance of the state machine is created. However, none of your code in the iterator block is executed until enumerator.MoveNext() executes for the first time. This is how deferred executing works. Here is a (rather silly) example:
var evenNumbers = IteratorBlock().Where(i => i%2 == 0);
At this point the iterator has not executed. The Where clause creates a new IEnumerable<T> that wraps the IEnumerable<T> returned by IteratorBlock but this enumerable has yet to be enumerated. This happens when you execute a foreach loop:
foreach (var evenNumber in evenNumbers)
Console.WriteLine(eventNumber);
If you enumerate the enumerable twice then a new instance of the state machine is created each time and your iterator block will execute the same code twice.
Notice that LINQ methods like ToList(), ToArray(), First(), Count() etc. will use a foreach loop to enumerate the enumerable. For instance ToList() will enumerate all elements of the enumerable and store them in a list. You can now access the list to get all elements of the enumerable without the iterator block executing again. There is a trade-off between using CPU to produce the elements of the enumerable multiple times and memory to store the elements of the enumeration to access them multiple times when using methods like ToList().

One major point about Yield keyword is Lazy Execution. Now what I mean by Lazy Execution is to execute when needed. A better way to put it is by giving an example
Example: Not using Yield i.e. No Lazy Execution.
public static IEnumerable<int> CreateCollectionWithList()
{
var list = new List<int>();
list.Add(10);
list.Add(0);
list.Add(1);
list.Add(2);
list.Add(20);
return list;
}
Example: using Yield i.e. Lazy Execution.
public static IEnumerable<int> CreateCollectionWithYield()
{
yield return 10;
for (int i = 0; i < 3; i++)
{
yield return i;
}
yield return 20;
}
Now when I call both methods.
var listItems = CreateCollectionWithList();
var yieldedItems = CreateCollectionWithYield();
you will notice listItems will have a 5 items inside it (hover your mouse on listItems while debugging).
Whereas yieldItems will just have a reference to the method and not the items.
That means it has not executed the process of getting items inside the method. A very efficient way of getting data only when needed.
Actual implementation of yield can be seen in ORM like Entity Framework and NHibernate etc.

The C# yield keyword, to put it simply, allows many calls to a body of code, referred to as an iterator, that knows how to return before it's done and, when called again, continues where it left off - i.e. it helps an iterator become transparently stateful per each item in a sequence that the iterator returns in successive calls.
In JavaScript, the same concept is called Generators.

It is a very simple and easy way to create an enumerable for your object. The compiler creates a class that wraps your method and that implements, in this case, IEnumerable<object>. Without the yield keyword, you'd have to create an object that implements IEnumerable<object>.

It's producing enumerable sequence. What it does is actually creating local IEnumerable sequence and returning it as a method result

This link has a simple example
Even simpler examples are here
public static IEnumerable<int> testYieldb()
{
for(int i=0;i<3;i++) yield return 4;
}
Notice that yield return won't return from the method. You can even put a WriteLine after the yield return
The above produces an IEnumerable of 4 ints 4,4,4,4
Here with a WriteLine. Will add 4 to the list, print abc, then add 4 to the list, then complete the method and so really return from the method(once the method has completed, as would happen with a procedure without a return). But this would have a value, an IEnumerable list of ints, that it returns on completion.
public static IEnumerable<int> testYieldb()
{
yield return 4;
console.WriteLine("abc");
yield return 4;
}
Notice also that when you use yield, what you are returning is not of the same type as the function. It's of the type of an element within the IEnumerable list.
You use yield with the method's return type as IEnumerable. If the method's return type is int or List<int> and you use yield, then it won't compile. You can use IEnumerable method return type without yield but it seems maybe you can't use yield without IEnumerable method return type.
And to get it to execute you have to call it in a special way.
static void Main(string[] args)
{
testA();
Console.Write("try again. the above won't execute any of the function!\n");
foreach (var x in testA()) { }
Console.ReadLine();
}
// static List<int> testA()
static IEnumerable<int> testA()
{
Console.WriteLine("asdfa");
yield return 1;
Console.WriteLine("asdf");
}

Nowadays you can use the yield keyword for async streams.
C# 8.0 introduces async streams, which model a streaming source of data. Data streams often retrieve or generate elements asynchronously. Async streams rely on new interfaces introduced in .NET Standard 2.1. These interfaces are supported in .NET Core 3.0 and later. They provide a natural programming model for asynchronous streaming data sources.
Source: Microsoft docs
Example below
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
public class Program
{
public static async Task Main()
{
List<int> numbers = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
await foreach(int number in YieldReturnNumbers(numbers))
{
Console.WriteLine(number);
}
}
public static async IAsyncEnumerable<int> YieldReturnNumbers(List<int> numbers)
{
foreach (int number in numbers)
{
await Task.Delay(1000);
yield return number;
}
}
}

Simple demo to understand yield
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp_demo_yield {
class Program
{
static void Main(string[] args)
{
var letters = new List<string>() { "a1", "b1", "c2", "d2" };
// Not yield
var test1 = GetNotYield(letters);
foreach (var t in test1)
{
Console.WriteLine(t);
}
// yield
var test2 = GetWithYield(letters).ToList();
foreach (var t in test2)
{
Console.WriteLine(t);
}
Console.ReadKey();
}
private static IList<string> GetNotYield(IList<string> list)
{
var temp = new List<string>();
foreach(var x in list)
{
if (x.Contains("2")) {
temp.Add(x);
}
}
return temp;
}
private static IEnumerable<string> GetWithYield(IList<string> list)
{
foreach (var x in list)
{
if (x.Contains("2"))
{
yield return x;
}
}
}
}
}

It's trying to bring in some Ruby Goodness :)
Concept: This is some sample Ruby Code that prints out each element of the array
rubyArray = [1,2,3,4,5,6,7,8,9,10]
rubyArray.each{|x|
puts x # do whatever with x
}
The Array's each method implementation yields control over to the caller (the 'puts x') with each element of the array neatly presented as x. The caller can then do whatever it needs to do with x.
However .Net doesn't go all the way here.. C# seems to have coupled yield with IEnumerable, in a way forcing you to write a foreach loop in the caller as seen in Mendelt's response. Little less elegant.
//calling code
foreach(int i in obCustomClass.Each())
{
Console.WriteLine(i.ToString());
}
// CustomClass implementation
private int[] data = {1,2,3,4,5,6,7,8,9,10};
public IEnumerable<int> Each()
{
for(int iLooper=0; iLooper<data.Length; ++iLooper)
yield return data[iLooper];
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Check if yield return contains items - c#

Related

C# yield return performance

Why would a C# method returning IEnumerable work when it returns a List but not when it yields?

How do I make a streaming LINQ expression that delivers the filtered out items as well as the filtered items?

ToList method in Linq

What is the yield keyword used for in C#?

Categories

Resources