My situation is this: I need to run some validation and massaging code on several different types of objects, but for cleanliness (and code reuse) I'd like all calls to this validation to look basically the same regardless of the object type. I am attempting to solve this through overloading, which works fine until I get to generic collection objects.
The following example should clarify what I'm talking about here:
private string DoStuff(string tmp) { ... }
private ObjectA DoStuff(ObjectA tmp) { ... }
private ObjectB DoStuff(ObjectB tmp) { ... }
...
private Collection<ObjectA> DoStuff(Collection<ObjectA> tmp) {
foreach (ObjectA obj in tmp) if (DoStuff(obj) == null) tmp.Remove(obj);
if (tmp.Count == 0) return null;
return tmp;
}
private Collection<ObjectB> DoStuff(Collection<ObjectB> tmp) {
foreach (ObjectB obj in tmp) if (DoStuff(obj) == null) tmp.Remove(obj);
if (tmp.Count == 0) return null;
return tmp;
}
...
This seems like a real waste, as I have to duplicate the exact same code for every different Collection<T> type. I would like to make a single instance of DoStuff that handles any Collection<T>, rather than make a separate one for each.
I have tried using ICollection, but this has two problems: first, ICollection does not expose a Remove method; second, I can't write the foreach loop because I don't know the type of the objects in the list. Using something more general, like object, does not work because I don't have a DoStuff method that accepts an object; I need it to call the appropriate overload for the actual object. Writing a DoStuff method that takes an object and uses a huge list of if statements to pick the right method and cast appropriately defeats the whole purpose of getting rid of redundant code; I might as well just copy and paste all those Collection<T> methods.
I have tried using a generic DoStuff<T> method, but this has the same problem in the foreach loop. Because I don't know the object type at design time, the compiler won't let me call DoStuff(obj).
Technically, the compiler should be able to tell which call needs to be made at compile time, since these are all private methods, and the specific types of the objects being passed in the calls are all known at the point the method is being called. That knowledge just doesn't seem to bubble up to the later methods being called by this method.
I really don't want to use reflection here, as that makes the code even more complicated than just copying and pasting all the Collection<T> methods, and it creates a performance slowdown. Any ideas?
---EDIT 2---
Based on a response below, I have altered my Collection<T> method to look like this:
private Collection<T> DoStuff<T>(Collection<T> tmp) {
for (int i = tmp.Count - 1; i >= 0; i--) if (DoStuff(tmp[i]) == null) tmp.RemoveAt(i);
if (tmp.Count == 0) return null;
return tmp;
}
This still does not work, however, as the compiler cannot figure out which overloaded method to call when I call DoStuff(tmp[i]).
You need to pass the method you want to call into the generic method as a parameter. That way the overload resolution happens at a point where the compiler knows what types to expect.
Alternatively, you need to make the per-item DoStuff method generic (or have it accept object) so that it supports any possible item in the collection.
(I also separated the Remove call from the first loop, so that the code isn't removing items from the same list it is iterating.)
private Collection<T> DoStuff<T>(Collection<T> tmp, Func<T, T> stuffDoer)
{
// requires: using System.Linq;
var removeList = tmp
.Where(v => stuffDoer(v) == null) // keep the original items whose result is null
.ToList();
foreach (var removeItem in removeList) tmp.Remove(removeItem);
if (tmp.Count == 0) return null;
return tmp;
}
private class ObjectA { }
private class ObjectB { }
private string DoStuff(string tmp) { return tmp; }
private ObjectA DoStuff(ObjectA tmp) { return tmp; }
private ObjectB DoStuff(ObjectB tmp) { return tmp; }
Call using this code:
var x = new Collection<ObjectA>
{
new ObjectA(),
new ObjectA(),
null
};
var result = DoStuff(x, DoStuff);
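Combining the two ideas (the Func<T, T> parameter from this answer and the backward RemoveAt loop from the question's second edit) gives a version that never modifies the collection while enumerating it. This is a minimal sketch: ObjectA and the pass-through DoStuff(ObjectA) body are placeholders, and the static class name Validator is hypothetical.

```csharp
using System;
using System.Collections.ObjectModel;

public class ObjectA { }

public static class Validator
{
    // Per-item validation (placeholder): returns null when the item should be dropped.
    public static ObjectA DoStuff(ObjectA tmp) { return tmp; }

    // One generic collection handler. The caller supplies the per-item method,
    // so overload resolution happens at the call site, where T is known.
    public static Collection<T> DoStuff<T>(Collection<T> tmp, Func<T, T> stuffDoer)
    {
        // Walk backwards so RemoveAt never shifts an index we still need to visit.
        for (int i = tmp.Count - 1; i >= 0; i--)
        {
            if (stuffDoer(tmp[i]) == null)
                tmp.RemoveAt(i);
        }
        return tmp.Count == 0 ? null : tmp;
    }
}
```

The caller passes the per-item overload as a method group, e.g. `Validator.DoStuff(x, Validator.DoStuff)`, so the compiler picks DoStuff(ObjectA) where T is already known to be ObjectA.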
Something like this?
private Collection<T> DoStuff<T>(Collection<T> tmp)
{
// This will throw an InvalidOperationException, because you are modifying the collection while enumerating it.
foreach (T obj in tmp) if (DoStuff(obj) == null) tmp.Remove(obj);
if (tmp.Count == 0) return null;
return tmp;
}
Where T is the type of the object in the collection.
Please note that you have a line that will throw at runtime. So:
private Collection<T> DoStuff<T>(Collection<T> tmp)
{
// foreach doesn't work if you are modifying the collection.
// Looping backward with an index, so we never encounter an invalid index.
for (int i = tmp.Count - 1; i >= 0; i--) if (DoStuff(tmp[i]) == null) tmp.RemoveAt(i);
if (tmp.Count == 0) return null;
return tmp;
}
But at this point... why make it generic, since you are not using T anymore? (The non-generic System.Collections.IList exposes Count, an indexer, and RemoveAt.)
private IList DoStuff(IList tmp)
{
// DoStuff can be generic, but you shouldn't need to explicitly pass it a type...
// (Assumes a DoStuff(object) overload exists for the individual items.)
for (int i = tmp.Count - 1; i >= 0; i--) if (DoStuff(tmp[i]) == null) tmp.RemoveAt(i);
if (tmp.Count == 0) return null;
return tmp;
}
Often I have to code a loop that needs a special case for the first item, and the code never seems as clear as it should ideally be.
Short of a redesign of the C# language, what is the best way to code these loops?
// this is more code to read than I would like for such a common concept,
// and it is too easy to forget to update "firstItem"
bool firstItem = true;
foreach (var x in yyy)
{
if (firstItem)
{
firstItem = false;
// other code when first item
}
// normal processing code
}
// this code is even harder to understand
if (yyy.Length > 0)
{
//Process first item;
for (int i = 1; i < yyy.Length; i++)
{
// process the other items.
}
}
How about:
using (var erator = enumerable.GetEnumerator())
{
if (erator.MoveNext())
{
ProcessFirst(erator.Current);
//ProcessOther(erator.Current); // Include if appropriate.
while (erator.MoveNext())
ProcessOther(erator.Current);
}
}
You could turn that into an extension if you want:
public static void Do<T>(this IEnumerable<T> source,
Action<T> firstItemAction,
Action<T> otherItemAction)
{
// null-checks omitted
using (var erator = source.GetEnumerator())
{
if (!erator.MoveNext())
return;
firstItemAction(erator.Current);
while (erator.MoveNext())
otherItemAction(erator.Current);
}
}
You could try:
collection.first(x=>
{
//...
}).rest(x=>
{
//...
}).run();
first / rest would look like:
static FirstPart<T> first<T>(this IEnumerable<T> c, Action<T> a)
{
return new FirstPart<T>(c, a);
}
static FirstRest<T> rest<T>(this FirstPart<T> fp, Action<T> a)
{
return new FirstRest<T>(fp.Collection, fp.Action, a);
}
You would need to define the classes FirstPart<T> and FirstRest<T>, and put the two extension methods in a static class. FirstRest<T> would need a run method like this (Collection, FirstAction, and RestAction are properties):
void run()
{
bool first = true;
foreach (var x in Collection)
{
if (first) {
FirstAction(x);
first = false;
}
else {
RestAction(x);
}
}
}
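A minimal sketch of those two classes, assuming they are made generic in T so that both actions stay strongly typed. The lowercase first, rest and run names follow the answer; the constructor and property shapes are assumptions, and the extension methods are repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Holds the collection and the first-item action until rest() is called.
public class FirstPart<T>
{
    public FirstPart(IEnumerable<T> collection, Action<T> action)
    {
        Collection = collection;
        Action = action;
    }
    public IEnumerable<T> Collection { get; }
    public Action<T> Action { get; }
}

// Holds both actions; run() performs the actual iteration.
public class FirstRest<T>
{
    public FirstRest(IEnumerable<T> collection, Action<T> firstAction, Action<T> restAction)
    {
        Collection = collection;
        FirstAction = firstAction;
        RestAction = restAction;
    }
    public IEnumerable<T> Collection { get; }
    public Action<T> FirstAction { get; }
    public Action<T> RestAction { get; }

    public void run()
    {
        bool first = true;
        foreach (var x in Collection)
        {
            if (first) { FirstAction(x); first = false; }
            else { RestAction(x); }
        }
    }
}

public static class FirstRestExtensions
{
    public static FirstPart<T> first<T>(this IEnumerable<T> c, Action<T> a) => new FirstPart<T>(c, a);

    public static FirstRest<T> rest<T>(this FirstPart<T> fp, Action<T> a) =>
        new FirstRest<T>(fp.Collection, fp.Action, a);
}
```

Nothing is enumerated until run() is called, so the chain can be built once and executed later.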
I'd be tempted to use a bit of LINQ:
using System.Linq;
var theCollectionImWorkingOn = ...
var firstItem = theCollectionImWorkingOn.First(); // throws InvalidOperationException if the collection is empty
firstItem.DoSomeWork();
foreach(var item in theCollectionImWorkingOn.Skip(1))
{
item.DoSomeOtherWork();
}
I use the first variable method all the time and it seems totally normal to me.
If you like that better you can use LINQ First() and Skip(1)
var firstItem = yyy.First();
// do the whatever on first item
foreach (var y in yyy.Skip(1))
{
// process the rest of the collection
}
The way you wrote it is probably the cleanest way it can be written. After all, there is logic specific to the first element, so it has to be represented somehow.
In cases like this I would just use a for loop like this:
for(int i = 0; i < yyy.Count; i++){
if(i == 0){
//special logic here
}
}
Using a for loop would also allow you to do something special in other cases, such as on the last item, on even items in the sequence, etc.
IMHO the cleanest way is to try to avoid special cases for the first item. That may not work in every situation, of course, but special cases may indicate that your program logic is more complex than it needs to be.
By the way, I would not code
if (yyy.Length > 0)
{
for(int i = 1; i <yyy.Length; i++)
{
// ...
}
}
but instead
for(int i = 1; i <yyy.Length; i++)
{
// ...
}
(which is itself a simple example of how to avoid unnecessary dealing with a special case.)
Here's a slightly simpler extension method that does the job. This is a combination of KeithS's solution and my answer to a related Java question:
public static void ForEach<T>(this IEnumerable<T> elements,
Action<T> firstElementAction,
Action<T> standardAction)
{
var currentAction = firstElementAction;
foreach(T element in elements)
{
currentAction(element);
currentAction = standardAction;
}
}
Whilst I wouldn't personally do this, there is another way using enumerators, which alleviates the need for conditional logic. Something like this:
void Main()
{
var numbers = Enumerable.Range(1, 5);
using (IEnumerator<int> num = numbers.GetEnumerator())
{
if (num.MoveNext()) // false when the sequence is empty
{
ProcessFirstItem(num.Current); // First item
while (num.MoveNext()) // Iterate rest
{
Console.WriteLine(num.Current);
}
}
}
}
void ProcessFirstItem(object first)
{
Console.WriteLine("First is: " + first);
}
Sample output would be:
First is: 1
2
3
4
5
Another option I came up with is
enum ItemType
{
First,
Last,
Normal
}
list.ForEach((item, itemType) =>
{
if (itemType == ItemType.First)
{
// first-item code
}
// rest of code
});
Writing the extension method is left as an exercise for the reader…
Also, should two Boolean flags, IsFirst and IsLast, be used instead of the ItemType enum, or should ItemType be an object that has IsFirst and IsLast properties?
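One possible way to write that extension method is to read one element ahead, so the loop knows whether the current element is the last one. This is a sketch under two assumptions: the method is named ForEach to match the usage above, and a single-element sequence is reported as ItemType.First, since the enum cannot express "first and last" at once (which is one argument for separate IsFirst/IsLast flags).

```csharp
using System;
using System.Collections.Generic;

public enum ItemType { First, Last, Normal }

public static class ItemTypeExtensions
{
    // Calls action once per element, tagged First, Last or Normal.
    // Assumption: a one-element sequence is tagged First only.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T, ItemType> action)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                return; // empty sequence: nothing to do

            T current = e.Current;
            bool isFirst = true;

            // Look ahead: if another element exists, the current one is not the last.
            while (e.MoveNext())
            {
                action(current, isFirst ? ItemType.First : ItemType.Normal);
                isFirst = false;
                current = e.Current;
            }

            // No more elements: current is the last one (unless it was also the first).
            action(current, isFirst ? ItemType.First : ItemType.Last);
        }
    }
}
```

Because List<T>.ForEach only accepts an Action<T>, a two-parameter lambda on a List<T> falls through to this extension method.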
Both of those are perfectly acceptable algorithms for processing the first element differently, and there really isn't a different way to do it. If this pattern is repeated a lot, you could hide it behind an overload of ForEach():
public static void ForEach<T>(this IEnumerable<T> elements, Action<T> firstElementAction, Action<T> standardAction)
{
var firstItem = true;
foreach(T element in elements)
{
if(firstItem)
{
firstItem = false;
firstElementAction(element);
}
else
standardAction(element);
}
}
...
//usage
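yyy.ForEach(t => { /* other code when first item */ }, t => { /* normal processing code */ });
LINQ makes it a little cleaner:
PerformActionOnFirstElement(yyy.FirstOrDefault());
// IEnumerable<T> has no ForEach method, so materialize the rest into a List<T> first
yyy.Skip(1).ToList().ForEach(x => { /* normal processing code */ });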
My solution:
foreach (var x in yyy.Select((o, i) => new { Object = o, Index = i }))
{
if (x.Index == 0)
{
// First item logic
}
else
{
// Rest of items
}
}
I have a ListView in virtual mode, and the underlying data is being stored in a List<MyRowObject>. Each column of the ListView corresponds to a public string property of MyRowObject. The columns of my ListView are configurable during runtime, such that any of them can be disabled and they can be reordered. To return a ListViewItem for the RetrieveVirtualItem event, I have a method similar to:
class MyRowObject
{
public string[] GetItems(List<PropertyInfo> properties)
{
string[] arr = new string[properties.Count];
for (int i = 0; i < properties.Count; i++)
{
arr[i] = (string)properties[i].GetValue(this, null);
}
return arr;
}
}
The event handler for RetrieveVirtualItem looks similar to:
private void listView_RetrieveVirtualItem(object sender, RetrieveVirtualItemEventArgs e)
{
e.Item = new ListViewItem(_virtualList[e.ItemIndex].GetItems(_currentColumns));
}
Maybe not surprisingly, benchmarking shows that this method is significantly slower than an implementation that accessed properties directly in a hardcoded order, and the slowdown is just significant enough that I would like to find a better solution.
The most promising idea I've had is to use an anonymous delegate to tell the MyRowObject class how to access the properties directly, but if that is possible, I couldn't get the semantics right (given the name of a property stored in a string, is there a way to write a closure that accesses that property directly?).
So, is there a nice way to avoid using reflection to populate my ListView without losing any functionality?
The open source extension of ListView is off limits because of company policy.
You could use these 2 functions
private List<Func<T, string>> BuildItemGetters<T>(IEnumerable<PropertyInfo> properties)
{
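List<Func<T, string>> getters = new List<Func<T, string>>();
// object.ToString(), used to convert non-string property values
// (requires System.Linq.Expressions and System.Reflection)
MethodInfo toString = typeof(object).GetMethod("ToString", Type.EmptyTypes);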
foreach (var prop in properties)
{
var paramExp = Expression.Parameter(typeof(T), "p");
Expression propExp = Expression.Property(paramExp, prop);
if (prop.PropertyType != typeof(string))
propExp = Expression.Call(propExp, toString);
var lambdaExp = Expression.Lambda<Func<T, string>>(propExp, paramExp);
getters.Add(lambdaExp.Compile());
}
return getters;
}
private string[] GetItems<T>(List<Func<T, string>> properties, T obj)
{
int count = properties.Count;
string[] output = new string[count];
for (int i = 0; i < count; i++)
output[i] = properties[i](obj);
return output;
}
Call BuildItemGetters (sorry for the name, I couldn't think of anything better) once with the list of properties you want to get from the rows. Then call GetItems for each row, where obj is the row and the list is the one returned by BuildItemGetters.
For T just use the class name of your Row, like:
var props = BuildItemGetters<MyRowObject>(properties);
string[] items = GetItems(props, row);
Of course, only call BuildItemGetters again when the columns change.
BindingSource and PropertyDescriptor are more elegant techniques for performing manual data-binding, which is more-or-less what you're doing with the ListView when it's in VirtualMode. Although it generally uses reflection internally anyway, you can rely on it to work efficiently and seamlessly.
I wrote a blog article recently which explains in detail how to use these mechanisms (although it's in a different context, the principles are the same) - http://www.brad-smith.info/blog/archives/104
Take a look at Reflection.Emit. With this, you can generate code on the fly that accesses a specific property. This CodeProject article has an interesting description of the mechanism: http://www.codeproject.com/KB/cs/fast_dynamic_properties.aspx.
I haven't thoroughly reviewed the code of the project, but my first impression is that the basic idea looks promising. However, one improvement I would make is that some pieces of the class should be static and shared, for example InitTypes and the created dynamic assembly. Otherwise, it looks like it fits what you're looking for.
I don't know enough about C# to tell you if that is possible, but I would hack my way through with something like this:
Once, get 'delegate pointers' for every member I need, through reflection (if this were C++, those pointers would be vtable offsets for the property getter functions).
Create a map from property name to pointer.
Use the map to call the getter function directly through the pointer.
Yes, it seems like magic, but I guess someone with enough CLR/MSIL knowledge can shed light on whether it is remotely possible.
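In C#, this idea roughly maps onto Delegate.CreateDelegate, which can bind a property's getter MethodInfo to a strongly typed delegate. Reflection then runs only once per type, and each row is read through plain delegate calls. A minimal sketch, assuming the columns are readable public string properties on a reference type; GetterCache and the Name/City properties are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class MyRowObject
{
    // Hypothetical columns for illustration.
    public string Name { get; set; }
    public string City { get; set; }
}

public static class GetterCache<T> where T : class
{
    // Built once per T: property name -> delegate bound directly to the getter.
    public static readonly Dictionary<string, Func<T, string>> Getters = Build();

    private static Dictionary<string, Func<T, string>> Build()
    {
        var map = new Dictionary<string, Func<T, string>>();
        foreach (PropertyInfo prop in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Only non-indexed string properties can bind directly to Func<T, string>.
            if (prop.PropertyType != typeof(string) || prop.GetIndexParameters().Length != 0)
                continue;
            MethodInfo getter = prop.GetGetMethod();
            if (getter == null)
                continue; // no public getter
            // Open-instance delegate: the row is passed as the first argument.
            map[prop.Name] = (Func<T, string>)Delegate.CreateDelegate(typeof(Func<T, string>), getter);
        }
        return map;
    }

    // Per-row call: no reflection here, just delegate invocations in column order.
    public static string[] GetItems(T row, IList<string> columns)
    {
        var arr = new string[columns.Count];
        for (int i = 0; i < columns.Count; i++)
            arr[i] = Getters[columns[i]](row);
        return arr;
    }
}
```

Reordering or hiding columns only changes the list of names passed to GetItems; the cached delegates stay the same.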
Here is another variant caching the get methods for each property.
public class PropertyWrapper<T>
{
private Dictionary<string, MethodBase> _getters = new Dictionary<string, MethodBase>();
public PropertyWrapper()
{
foreach (var item in typeof(T).GetProperties())
{
if (!item.CanRead)
continue;
_getters.Add(item.Name, item.GetGetMethod());
}
}
public string GetValue(T instance, string name)
{
MethodBase getter;
if (_getters.TryGetValue(name, out getter))
return getter.Invoke(instance, null)?.ToString() ?? string.Empty;
return string.Empty;
}
}
to get a property value:
var wrapper = new PropertyWrapper<MyObject>(); //keep it as a member variable in your form
var myObject = new MyObject { LastName = "Arne" };
var value = wrapper.GetValue(myObject, "LastName");
I have this function from a plugin (from a previous post)
// This method implements the test condition for
// finding the ResolutionInfo.
private static bool IsResolutionInfo(ImageResource res)
{
return res.ID == (int)ResourceIDs.ResolutionInfo;
}
And the code that's calling this function:
get
{
return (ResolutionInfo)m_imageResources.Find(IsResolutionInfo);
}
So basically I'd like to get rid of this helper function. It's only called twice (once in the get and once in the set), and it could possibly help me understand inline functions in C#.
get
{
return (ResolutionInfo)m_imageResources.Find(res => res.ID == (int)ResourceIDs.ResolutionInfo);
}
Does that clear it up at all?
Just to further clear things up, here is what the Find method looks like in Reflector:
public T Find(Predicate<T> match)
{
if (match == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
}
for (int i = 0; i < this._size; i++)
{
if (match(this._items[i]))
{
return this._items[i];
}
}
return default(T);
}
So as you can see, it loops through the collection, and for every item it passes the item at that index to the Predicate you passed in (your lambda). Since we're dealing with generics, it automatically knows the type you're dealing with: it is T, whatever type is in your collection. Makes sense?
Just to add: does the Find function on a list (which is what m_imageResources is) automatically pass the parameter to the IsResolutionInfo function?
Also, which happens first, the cast or the function call?
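A small sketch that puts the pieces side by side: List<T>.Find accepts either a named method (converted to Predicate<T>) or an equivalent lambda, and calls it once per element until it returns true. In (ResolutionInfo)list.Find(...), the call runs first and the cast is applied to the value Find returns. ImageResource, ResolutionInfo (assumed here to derive from ImageResource), and the ID value 1005 are hypothetical stand-ins, since the plugin's types are not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the plugin's types.
public class ImageResource
{
    public int ID { get; set; }
}

public class ResolutionInfo : ImageResource { }

public static class FindDemo
{
    const int ResolutionInfoId = 1005; // hypothetical ID value

    // Named predicate: Find passes each element in as 'res'.
    static bool IsResolutionInfo(ImageResource res) => res.ID == ResolutionInfoId;

    // Method group converted to Predicate<ImageResource>; Find runs, then the cast.
    public static ResolutionInfo WithMethodGroup(List<ImageResource> resources) =>
        (ResolutionInfo)resources.Find(IsResolutionInfo);

    // Inline lambda: the same predicate without a separate method.
    public static ResolutionInfo WithLambda(List<ImageResource> resources) =>
        (ResolutionInfo)resources.Find(res => res.ID == ResolutionInfoId);
}
```

If no element matches, Find returns null, and casting null to ResolutionInfo simply yields null.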
In the How Can I Expose Only a Fragment of IList<> question one of the answers had the following code snippet:
IEnumerable<object> FilteredList()
{
foreach(object item in FullList)
{
if(IsItemInPartialList(item))
yield return item;
}
}
What does the yield keyword do there? I've seen it referenced in a couple places, and one other question, but I haven't quite figured out what it actually does. I'm used to thinking of yield in the sense of one thread yielding to another, but that doesn't seem relevant here.
The yield contextual keyword actually does quite a lot here.
The function returns an object that implements the IEnumerable<object> interface. When a caller starts a foreach over this object, the function body runs until it reaches a yield return; each time the caller asks for the next item, execution resumes right after the previous yield. This is syntactic sugar introduced in C# 2.0. In earlier versions you had to create your own IEnumerable and IEnumerator objects to do this.
The easiest way to understand code like this is to type in an example, set some breakpoints, and see what happens. Try stepping through this example:
public void Consumer()
{
foreach(int i in Integers())
{
Console.WriteLine(i.ToString());
}
}
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
When you step through the example, you'll find that the first time the loop asks Integers() for a value, it gets 1. The second time it gets 2, and the line yield return 1 is not executed again.
Here is a real-life example:
public IEnumerable<T> Read<T>(string sql, Func<IDataReader, T> make, params object[] parms)
{
using (var connection = CreateConnection())
{
using (var command = CreateCommand(CommandType.Text, sql, connection, parms))
{
command.CommandTimeout = dataBaseSettings.ReadCommandTimeout;
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
yield return make(reader);
}
}
}
}
}
Iteration. It creates a state machine "under the covers" that remembers where you were on each additional cycle of the function and picks up from there.
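To make the "state machine under the covers" concrete, here is a hand-written enumerator that behaves like an iterator yielding 1, 2 and 4. This is a simplified sketch of the idea, not the code the C# compiler actually generates (the generated class also handles disposal, finally blocks, and reuse across threads); ThreeValues is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Behaves like:
//   IEnumerable<int> ThreeValues() { yield return 1; yield return 2; yield return 4; }
public class ThreeValues : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator() => new Enumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private class Enumerator : IEnumerator<int>
    {
        private int _state; // remembers which "yield return" to resume after

        public int Current { get; private set; }
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            switch (_state)
            {
                case 0: Current = 1; _state = 1; return true; // first yield return
                case 1: Current = 2; _state = 2; return true; // resume, second yield return
                case 2: Current = 4; _state = 3; return true; // resume, third yield return
                default: return false;                         // end of the method body
            }
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() { }
    }
}
```

Each MoveNext call picks up from the stored state, which is exactly the "remembers where you were" behavior described above.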
Yield has two great uses,
It helps to provide custom iteration without creating temp collections.
It helps to do stateful iteration.
To demonstrate these two points, I have created a simple video you can watch here.
Recently Raymond Chen also ran an interesting series of articles on the yield keyword.
The implementation of iterators in C# and its consequences (part 1)
The implementation of iterators in C# and its consequences (part 2)
The implementation of iterators in C# and its consequences (part 3)
The implementation of iterators in C# and its consequences (part 4)
While yield is nominally used for easily implementing the iterator pattern, it can be generalized into a state machine. There's no point in quoting Raymond; the last part also links to other uses (the example in Entin's blog is especially good, showing how to write async-safe code).
At first sight, yield return is C# syntactic sugar for returning an IEnumerable.
Without yield, all the items of the collection are created at once:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
return new List<SomeData> {
new SomeData(),
new SomeData(),
new SomeData()
};
}
}
The same code using yield returns the items one by one:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
yield return new SomeData();
yield return new SomeData();
yield return new SomeData();
}
}
The advantage of using yield is that if the function consuming your data simply needs the first item of the collection, the rest of the items won't be created.
The yield keyword allows items to be created as they are demanded. That's a good reason to use it.
A list or array implementation loads all of the items immediately whereas the yield implementation provides a deferred execution solution.
In practice, it is often desirable to perform the minimum amount of work as needed in order to reduce the resource consumption of an application.
For example, we may have an application that processes millions of records from a database. The following benefits can be achieved when we use IEnumerable in a deferred-execution, pull-based model:
Scalability, reliability and predictability are likely to improve since the number of records does not significantly affect the application’s resource requirements.
Performance and responsiveness are likely to improve since processing can start immediately instead of waiting for the entire collection to be loaded first.
Recoverability and utilisation are likely to improve since the application can be stopped, started, interrupted or fail. Only the items in progress will be lost, compared with pre-fetching all of the data when only a portion of the results was actually used.
Continuous processing is possible in environments where constant workload streams are added.
Here is a comparison between building a collection first (such as a list) and using yield.
List Example
public class ContactListStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
var contacts = new List<ContactModel>();
Console.WriteLine("ContactListStore: Creating contact 1");
contacts.Add(new ContactModel() { FirstName = "Bob", LastName = "Blue" });
Console.WriteLine("ContactListStore: Creating contact 2");
contacts.Add(new ContactModel() { FirstName = "Jim", LastName = "Green" });
Console.WriteLine("ContactListStore: Creating contact 3");
contacts.Add(new ContactModel() { FirstName = "Susan", LastName = "Orange" });
return contacts;
}
}
static void Main(string[] args)
{
var store = new ContactListStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
ContactListStore: Creating contact 1
ContactListStore: Creating contact 2
ContactListStore: Creating contact 3
Ready to iterate through the collection.
Note: The entire collection was loaded into memory without even asking for a single item in the list
Yield Example
public class ContactYieldStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
Console.WriteLine("ContactYieldStore: Creating contact 1");
yield return new ContactModel() { FirstName = "Bob", LastName = "Blue" };
Console.WriteLine("ContactYieldStore: Creating contact 2");
yield return new ContactModel() { FirstName = "Jim", LastName = "Green" };
Console.WriteLine("ContactYieldStore: Creating contact 3");
yield return new ContactModel() { FirstName = "Susan", LastName = "Orange" };
}
}
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
Ready to iterate through the collection.
Note: The collection wasn't executed at all. This is due to the "deferred execution" nature of IEnumerable. Constructing an item will only occur when it is really required.
Let's call the collection again and observe the behaviour when we fetch the first contact in the collection.
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection");
Console.WriteLine("Hello {0}", contacts.First().FirstName);
Console.ReadLine();
}
Console Output
Ready to iterate through the collection
ContactYieldStore: Creating contact 1
Hello Bob
Nice! Only the first contact was constructed when the client "pulled" the item out of the collection.
yield return is used with enumerators. On each yield return statement, control is returned to the caller, but the callee's state is preserved. Because of this, when the caller enumerates the next element, execution continues in the callee method from the statement immediately after the yield statement.
Let us try to understand this with an example. Next to each line, I have noted the order in which execution flows.
static void Main(string[] args)
{
foreach (int fib in Fibs(6))//1, 5
{
Console.WriteLine(fib + " ");//4, 10
}
}
static IEnumerable<int> Fibs(int fibCount)
{
for (int i = 0, prevFib = 0, currFib = 1; i < fibCount; i++)//2
{
yield return prevFib;//3, 9
int newFib = prevFib + currFib;//6
prevFib = currFib;//7
currFib = newFib;//8
}
}
Also, the state is maintained separately for each enumeration: if I make another call to the Fibs() method, it starts over with fresh state.
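A quick way to check this is to enumerate the same iterator twice and compare the results: each pass gets its own enumerator, so both start from the first Fibonacci number. This sketch copies the Fibs method from the example into a hypothetical FibDemo class.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FibDemo
{
    public static IEnumerable<int> Fibs(int fibCount)
    {
        for (int i = 0, prevFib = 0, currFib = 1; i < fibCount; i++)
        {
            yield return prevFib;
            int newFib = prevFib + currFib;
            prevFib = currFib;
            currFib = newFib;
        }
    }

    // Enumerates the same IEnumerable twice; each ToArray call gets a fresh enumerator.
    public static (int[] First, int[] Second) EnumerateTwice()
    {
        IEnumerable<int> fibs = Fibs(6);
        return (fibs.ToArray(), fibs.ToArray());
    }
}
```

Both arrays contain 0, 1, 1, 2, 3, 5; the second pass does not continue from where the first one stopped.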
Intuitively, the keyword returns a value from the function without leaving it, i.e. in your code example it returns the current item value and then resumes the loop. More formally, it is used by the compiler to generate code for an iterator. Iterators are functions that return IEnumerable objects. The MSDN has several articles about them.
If I understand this correctly, here's how I would phrase this from the perspective of the function implementing IEnumerable with yield.
Here's one.
Call again if you need another.
I'll remember what I already gave you.
I'll only know if I can give you another when you call again.
Here is a simple way to understand the concept:
The basic idea is, if you want a collection that you can use "foreach" on, but gathering the items into the collection is expensive for some reason (like querying them out of a database), AND you will often not need the entire collection, then you create a function that builds the collection one item at a time and yields it back to the consumer (who can then terminate the collection effort early).
Think of it this way: You go to the meat counter and want to buy a pound of sliced ham. The butcher takes a 10-pound ham to the back, puts it on the slicer machine, slices the whole thing, then brings the pile of slices back to you and measures out a pound of it. (OLD way).
With yield, the butcher brings the slicer machine to the counter, and starts slicing and "yielding" each slice onto the scale until it measures 1-pound, then wraps it for you and you're done. The Old Way may be better for the butcher (lets him organize his machinery the way he likes), but the New Way is clearly more efficient in most cases for the consumer.
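The "terminate early" part can be shown with an iterator that would never finish on its own: Take(3) stops asking for slices after the third one, so the endless loop inside the iterator is simply never resumed. Butcher and Slices are hypothetical names chosen to match the analogy.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Butcher
{
    // Would keep producing slices forever if the consumer kept asking.
    public static IEnumerable<int> Slices()
    {
        int n = 0;
        while (true)
            yield return ++n; // one slice per request
    }

    // The consumer decides when to stop; only the requested slices are produced.
    public static List<int> TakeSlices(int count) => Slices().Take(count).ToList();
}
```

With a list-building version of Slices, the same call would never return, because the whole ham would have to be sliced first.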
The yield keyword allows you to create an IEnumerable<T> in the form of an iterator block. This iterator block supports deferred execution, and if you are not familiar with the concept it may appear almost magical. However, at the end of the day it is just code that executes without any weird tricks.
An iterator block can be described as syntactic sugar where the compiler generates a state machine that keeps track of how far the enumeration of the enumerable has progressed. To enumerate an enumerable, you often use a foreach loop. However, a foreach loop is also syntactic sugar. So you are two abstractions removed from the real code which is why it initially might be hard to understand how it all works together.
Assume that you have a very simple iterator block:
IEnumerable<int> IteratorBlock()
{
Console.WriteLine("Begin");
yield return 1;
Console.WriteLine("After 1");
yield return 2;
Console.WriteLine("After 2");
yield return 42;
Console.WriteLine("End");
}
Real iterator blocks often have conditions and loops but when you check the conditions and unroll the loops they still end up as yield statements interleaved with other code.
To enumerate the iterator block a foreach loop is used:
foreach (var i in IteratorBlock())
Console.WriteLine(i);
Here is the output (no surprises here):
Begin
1
After 1
2
After 2
42
End
As stated above foreach is syntactic sugar:
IEnumerator<int> enumerator = null;
try
{
enumerator = IteratorBlock().GetEnumerator();
while (enumerator.MoveNext())
{
var i = enumerator.Current;
Console.WriteLine(i);
}
}
finally
{
enumerator?.Dispose();
}
In an attempt to untangle this, I have created a sequence diagram with the abstractions removed:
The state machine generated by the compiler also implements the enumerator but to make the diagram more clear I have shown them as separate instances. (When the state machine is enumerated from another thread you do actually get separate instances but that detail is not important here.)
Every time you call your iterator block, a new instance of the state machine is created. However, none of your code in the iterator block is executed until enumerator.MoveNext() executes for the first time. This is how deferred execution works. Here is a (rather silly) example:
var evenNumbers = IteratorBlock().Where(i => i%2 == 0);
At this point the iterator has not executed. The Where clause creates a new IEnumerable<T> that wraps the IEnumerable<T> returned by IteratorBlock but this enumerable has yet to be enumerated. This happens when you execute a foreach loop:
foreach (var evenNumber in evenNumbers)
Console.WriteLine(evenNumber);
If you enumerate the enumerable twice then a new instance of the state machine is created each time and your iterator block will execute the same code twice.
Notice that LINQ methods like ToList(), ToArray(), First(), Count() etc. will use a foreach loop to enumerate the enumerable. For instance ToList() will enumerate all elements of the enumerable and store them in a list. You can now access the list to get all elements of the enumerable without the iterator block executing again. There is a trade-off between using CPU to produce the elements of the enumerable multiple times and memory to store the elements of the enumeration to access them multiple times when using methods like ToList().
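This trade-off can be observed with a counter that records how many times the iterator body starts. A minimal sketch; ReenumerationDemo, Numbers and Runs are hypothetical names. The counter is incremented on the first MoveNext of each enumeration, not when Numbers() is called.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReenumerationDemo
{
    public static int Runs; // how many times the iterator body has started

    static IEnumerable<int> Numbers()
    {
        Runs++; // runs on the first MoveNext of each new enumeration
        yield return 1;
        yield return 2;
        yield return 42;
    }

    public static int EnumerateTwiceLazily()
    {
        Runs = 0;
        IEnumerable<int> numbers = Numbers();
        foreach (int n in numbers) { } // first enumeration runs the iterator body
        foreach (int n in numbers) { } // second enumeration runs it again
        return Runs;
    }

    public static int EnumerateTwiceMaterialized()
    {
        Runs = 0;
        List<int> list = Numbers().ToList(); // iterator body runs once here
        foreach (int n in list) { }          // iterating the list does not touch the iterator
        foreach (int n in list) { }
        return Runs;
    }
}
```

The lazy version does the work twice; the ToList version does it once but keeps all elements in memory.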
One major point about the yield keyword is lazy execution, meaning code executes only when it is needed. This is best shown with an example.
Example: Not using Yield i.e. No Lazy Execution.
public static IEnumerable<int> CreateCollectionWithList()
{
var list = new List<int>();
list.Add(10);
list.Add(0);
list.Add(1);
list.Add(2);
list.Add(20);
return list;
}
Example: using Yield i.e. Lazy Execution.
public static IEnumerable<int> CreateCollectionWithYield()
{
yield return 10;
for (int i = 0; i < 3; i++)
{
yield return i;
}
yield return 20;
}
Now when I call both methods.
var listItems = CreateCollectionWithList();
var yieldedItems = CreateCollectionWithYield();
you will notice listItems will have a 5 items inside it (hover your mouse on listItems while debugging).
Whereas yieldItems will just have a reference to the method and not the items.
That means it has not executed the process of getting items inside the method. A very efficient way of getting data only when needed.
Similar lazy behavior can be seen in ORMs like Entity Framework and NHibernate, whose queries don't run until their results are enumerated.
The C# yield keyword, to put it simply, allows many calls to a body of code, referred to as an iterator, that knows how to return before it's done and, when called again, continues where it left off; i.e. it helps an iterator become transparently stateful for each item in a sequence that the iterator returns in successive calls.
In JavaScript, the same concept is called generators.
It is a very simple and easy way to create an enumerable for your object. The compiler creates a class that wraps your method and that implements, in this case, IEnumerable<object>. Without the yield keyword, you'd have to create an object that implements IEnumerable<object>.
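To see how much work yield saves, here is roughly what you would write by hand for a simple iterator that yields 0 up to count - 1. This is only a sketch of the idea: CountToEnumerable is a hypothetical name, and the class the compiler actually generates is a state machine with different names and details (for example, its Reset throws NotSupportedException).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// With yield, all of this collapses to:
//   static IEnumerable<int> CountTo(int count) { for (int i = 0; i < count; i++) yield return i; }
// Hand-written equivalent (a sketch, not the compiler's real output).
public sealed class CountToEnumerable : IEnumerable<int>
{
    private readonly int _count;
    public CountToEnumerable(int count) => _count = count;

    // Each foreach gets a fresh enumerator, like a fresh iterator state machine.
    public IEnumerator<int> GetEnumerator() => new Enumerator(_count);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int _count;
        private int _i = -1; // the "where we left off" state that yield tracks for you

        public Enumerator(int count) => _count = count;

        public int Current => _i;
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (_i + 1 >= _count) return false; // sequence finished
            _i++;
            return true;
        }

        public void Reset() => _i = -1;
        public void Dispose() { }
    }
}

public static class HandWrittenDemo
{
    public static void Main()
    {
        foreach (int i in new CountToEnumerable(3))
        {
            Console.WriteLine(i); // 0, 1, 2
        }
    }
}
```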
It produces an enumerable sequence. Rather than building a local collection and returning it, the method returns an IEnumerable object whose elements are computed as the caller enumerates it.
Here are some simple examples:
public static IEnumerable<int> testYieldb()
{
for(int i=0;i<3;i++) yield return 4;
}
Notice that yield return doesn't end the method: execution pauses there and resumes on the next line when the next element is requested. You can even put a WriteLine after the yield return.
The above produces an IEnumerable<int> of 3 ints: 4, 4, 4.
Here it is with a WriteLine. When enumerated, it yields 4, prints abc, yields 4 again, and then reaches the end of the method, which ends the sequence (just as a method without a return statement simply finishes). No list is being built: the caller gets the IEnumerable<int> back immediately, and the body runs piece by piece as the caller asks for each element.
public static IEnumerable<int> testYieldc()
{
yield return 4;
Console.WriteLine("abc");
yield return 4;
}
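A small sketch that makes the pausing visible: the producer and the consumer append to a shared log, so the order of the entries shows the method being suspended at each yield return and resumed on the next iteration of the foreach. Producer, Run and Log are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class InterleaveDemo
{
    // Shared log so we can see in which order producer and consumer run.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Producer()
    {
        Log.Add("producer: before first yield");
        yield return 4;
        Log.Add("producer: resumed after first yield");
        yield return 4;
        Log.Add("producer: reached end");
    }

    public static void Run()
    {
        foreach (int x in Producer())
        {
            Log.Add($"consumer: got {x}");
        }
    }

    public static void Main()
    {
        Run();
        // Order: producer, consumer, producer, consumer, producer
        Log.ForEach(Console.WriteLine);
    }
}
```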
Notice also that the value after yield return is not of the method's declared return type; it is of the element type of the IEnumerable.
yield can only be used in a method (or get accessor) whose return type is IEnumerable, IEnumerable<T>, IEnumerator, or IEnumerator<T> (or IAsyncEnumerable<T>/IAsyncEnumerator<T> for async iterators). If the return type is int or List<int> and you use yield, it won't compile. The reverse is fine: a method can return IEnumerable<T> without using yield.
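Besides IEnumerable<T>, an iterator method can also return IEnumerator<T>, which is a common way to make your own collection usable in a foreach. A minimal sketch; Countdown is a hypothetical class used only for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: GetEnumerator is itself an iterator,
// so its return type is IEnumerator<int> rather than IEnumerable<int>.
public sealed class Countdown : IEnumerable<int>
{
    private readonly int _start;
    public Countdown(int start) => _start = start;

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = _start; i > 0; i--)
        {
            yield return i;
        }
    }

    // Non-generic version required by IEnumerable; delegates to the iterator above.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CountdownDemo
{
    public static void Main()
    {
        foreach (int i in new Countdown(3))
        {
            Console.WriteLine(i); // 3, 2, 1
        }
    }
}
```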
And to get an iterator's body to execute, you have to enumerate its result (e.g. with foreach); just calling the method runs none of it.
static void Main(string[] args)
{
testA();
Console.Write("try again. the above won't execute any of the function!\n");
foreach (var x in testA()) { }
Console.ReadLine();
}
// static List<int> testA()  // would not compile: yield needs an iterator return type such as IEnumerable<int>
static IEnumerable<int> testA()
{
Console.WriteLine("asdfa");
yield return 1;
Console.WriteLine("asdf");
}
Nowadays you can use the yield keyword for async streams.
C# 8.0 introduces async streams, which model a streaming source of data. Data streams often retrieve or generate elements asynchronously. Async streams rely on new interfaces introduced in .NET Standard 2.1. These interfaces are supported in .NET Core 3.0 and later. They provide a natural programming model for asynchronous streaming data sources.
Source: Microsoft docs
Example below
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
public class Program
{
public static async Task Main()
{
List<int> numbers = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
await foreach(int number in YieldReturnNumbers(numbers))
{
Console.WriteLine(number);
}
}
public static async IAsyncEnumerable<int> YieldReturnNumbers(List<int> numbers)
{
foreach (int number in numbers)
{
await Task.Delay(1000);
yield return number;
}
}
}
Simple demo to understand yield
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp_demo_yield {
class Program
{
static void Main(string[] args)
{
var letters = new List<string>() { "a1", "b1", "c2", "d2" };
// Not yield
var test1 = GetNotYield(letters);
foreach (var t in test1)
{
Console.WriteLine(t);
}
// yield
var test2 = GetWithYield(letters).ToList();
foreach (var t in test2)
{
Console.WriteLine(t);
}
Console.ReadKey();
}
private static IList<string> GetNotYield(IList<string> list)
{
var temp = new List<string>();
foreach(var x in list)
{
if (x.Contains("2")) {
temp.Add(x);
}
}
return temp;
}
private static IEnumerable<string> GetWithYield(IList<string> list)
{
foreach (var x in list)
{
if (x.Contains("2"))
{
yield return x;
}
}
}
}
}
It's trying to bring in some Ruby Goodness :)
Concept: This is some sample Ruby Code that prints out each element of the array
rubyArray = [1,2,3,4,5,6,7,8,9,10]
rubyArray.each{|x|
puts x # do whatever with x
}
The Array's each method implementation yields control over to the caller (the 'puts x') with each element of the array neatly presented as x. The caller can then do whatever it needs to do with x.
However, .NET doesn't go all the way here. C# couples yield with IEnumerable, in a way forcing you to write a foreach loop in the caller, as seen in Mendelt's response. A little less elegant.
//calling code
foreach(int i in obCustomClass.Each())
{
Console.WriteLine(i.ToString());
}
// CustomClass implementation
private int[] data = {1,2,3,4,5,6,7,8,9,10};
public IEnumerable<int> Each()
{
for(int iLooper=0; iLooper<data.Length; ++iLooper)
yield return data[iLooper];
}
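For comparison, C# can get closer to the Ruby style by passing a delegate instead of yielding: the method calls the caller's code once per element, so the call site needs no foreach. A minimal sketch; CustomClassWithCallback is a hypothetical name, and Action<int> plays the role of Ruby's block.

```csharp
using System;

// Hypothetical class mirroring CustomClass above, but callback-based.
public class CustomClassWithCallback
{
    private readonly int[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    // Ruby-style: the caller passes a "block" (an Action<int>) and this
    // method invokes it once per element.
    public void Each(Action<int> block)
    {
        foreach (int x in data)
        {
            block(x);
        }
    }
}

public static class CallbackDemo
{
    public static void Main()
    {
        var ob = new CustomClassWithCallback();
        // Closest C# analogue of rubyArray.each { |x| puts x }
        ob.Each(x => Console.WriteLine(x));
    }
}
```

The trade-off is that a callback-based Each can't be composed with LINQ or consumed lazily the way an IEnumerable<int> can. For arrays and lists, the framework already offers this style through Array.ForEach and List<T>.ForEach.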