Distinction between iterator and enumerator - c#

An interview question for a .NET 3.5 job is "What is the difference between an iterator and an enumerator?"
This is a core distinction to make, what with LINQ, etc.
Anyway, what is the difference? I can't seem to find a solid definition on the net. Make no mistake, I can find the meaning of the two terms but I get slightly different answers. What would be the best answer for an interview?
IMO an iterator "iterates" over a collection, and an enumerator provides the functionality to iterate, but this has to be called.
Also, using the yield keyword is said to save state. What exactly is this state? Is there an example of this benefit occurring?

Iterating means repeating some steps, while enumerating means going through all values in a collection of values. So enumerating usually requires some form of iteration.
In that way, enumerating is a special case of iterating where the step is getting a value from a collection.
Note the "usually" – enumerating may also be performed recursively, but recursion and iteration are so closely related that I would not care about this small difference.
You may also enumerate values you do not explicitly store in a collection. For example, you can enumerate the natural numbers, primes, or whatever, but you would calculate these values during the enumeration and not retrieve them from a physical collection. You can understand this case as enumerating a virtual collection with its values defined by some logic.
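To make the "virtual collection" idea concrete, here is a minimal sketch. The names VirtualSequences, Naturals and FirstN are made up for illustration; the point is only that no value exists until the caller asks for it.

```csharp
using System.Collections.Generic;

public static class VirtualSequences
{
    // Hypothetical helper: yields 1, 2, 3, ... without storing the values anywhere.
    // (Overflow of int is ignored in this sketch.)
    public static IEnumerable<int> Naturals()
    {
        int n = 1;
        while (true)
        {
            yield return n;   // each value is computed only when the caller asks for it
            n++;
        }
    }

    // Takes the first 'count' values, so the caller never walks the endless sequence to its end.
    public static List<int> FirstN(IEnumerable<int> source, int count)
    {
        var result = new List<int>();
        if (count <= 0) return result;
        foreach (int value in source)
        {
            result.Add(value);
            if (result.Count == count) break;
        }
        return result;
    }
}
```

Calling VirtualSequences.FirstN(VirtualSequences.Naturals(), 5) enumerates only five values even though the sequence itself never ends.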
I assume Reed Copsey got the point. In C# there are two major ways to enumerate something.
Implement IEnumerable and a class implementing IEnumerator
Implement an iterator with the yield statement
The first way is harder to implement and uses objects for enumerating. The second way is easier to implement and uses continuations.

In C# 2+, iterators are a way for the compiler to automatically generate implementations of the IEnumerable and/or IEnumerable<T> interfaces (and the matching enumerators) for you.
Without iterators, you would need to create a class implementing IEnumerator, including Current, MoveNext, and Reset. This requires a fair amount of work. Normally, you would create a private class that implemented IEnumerator<T> for your type, then yourClass.GetEnumerator() would construct that private class, and return it.
Iterators are a way for the compiler to automatically generate this for you, using a simple syntax (yield). This lets you implement GetEnumerator() directly in your class, without a second class (The IEnumerator) being specified by you. The construction of that class, with all of its members, is done for you.
Iterators are very developer friendly - things are done in a very efficient way, with much less effort.
When you use foreach, the two will behave identically (provided you write your custom IEnumerator correctly). Iterators just make life much simpler.
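As a rough side-by-side of the two approaches described above, here is a sketch with made-up class names (ManualDays, IteratorDays). The first writes the private enumerator class by hand; the second lets the compiler generate it from yield return.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hand-written version: the collection plus a separate private enumerator class.
public class ManualDays : IEnumerable<string>
{
    private readonly string[] _days = { "Sat", "Sun" };

    public IEnumerator<string> GetEnumerator() => new DayEnumerator(_days);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class DayEnumerator : IEnumerator<string>
    {
        private readonly string[] _items;
        private int _index = -1;              // positioned before the first item

        public DayEnumerator(string[] items) { _items = items; }

        public string Current => _items[_index];
        object IEnumerator.Current => Current;
        public bool MoveNext() => ++_index < _items.Length;
        public void Reset() { _index = -1; }
        public void Dispose() { }
    }
}

// Iterator version: the compiler generates the enumerator class for us.
public class IteratorDays : IEnumerable<string>
{
    private readonly string[] _days = { "Sat", "Sun" };

    public IEnumerator<string> GetEnumerator()
    {
        foreach (string day in _days) yield return day;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

A foreach over either class visits the same items in the same order; the difference is only in how much code you had to write.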

What C# calls an iterator is more commonly (outside of the C# world) called a generator or generator function (e.g. in Python). A generator function is a specialized case of coroutine. A C# iterator (generator) is a special form of an enumerator (a data type implementing the IEnumerator interface).
I dislike this usage of the term iterator for a C# generator because it is just as much an enumerator as it is an iterator. Too late for Microsoft to change its mind though.
For contrast consider that in C++ an iterator is a value which is used primarily to access sequential elements in a collection. It can be advanced, dereferenced to retrieve a value, and tested to see whether the end of the collection has been reached.

"Whereas a foreach statement is the consumer of the enumerator, an iterator is the producer of the enumerator."
The above is how "C# 5.0 In A NutShell" explains it, and has been helpful for me.
In other words, the foreach statement uses MoveNext() and the Current property of the IEnumerator to iterate through a sequence, while the iterator is used to produce the implementation of the IEnumerator that will be used by the foreach statement. In C#, when you write an iterator method containing a yield statement, the compiler will generate a private enumerator for you. And when you iterate through the items in the sequence, it will call the MoveNext() method and the Current property of the private enumerator. These methods/properties are implemented by your code in the iterator method, which will be called repeatedly to yield values until there are no values left to yield.
This is my understanding of how C# defines enumerators and iterators.
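On the "what state does yield save" part of the question: as far as I understand it, the saved state is the position inside the method plus its local variables. A small sketch (YieldStateDemo and its Log list are made up for illustration) that records when each part of the iterator body actually runs:

```csharp
using System.Collections.Generic;

public static class YieldStateDemo
{
    // Records what the iterator body does so the order of events can be inspected.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Counter()
    {
        Log.Add("start");
        int total = 0;               // local variable: part of the saved state
        for (int i = 1; i <= 3; i++)
        {
            total += i;
            Log.Add("yield " + total);
            yield return total;      // execution pauses here until the next MoveNext()
        }
        Log.Add("end");
    }
}
```

Calling Counter().GetEnumerator() runs none of the body; each MoveNext() resumes after the previous yield return with i and total intact, which is why the running total survives between calls.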

To understand iterators we first need to understand enumerators.
Enumerators are specialist objects which provide one with the means to move through an ordered list of items one at a time (the same kind of thing is sometimes called a ‘cursor’). The .NET framework provides two important interfaces relating to enumerators: IEnumerator and IEnumerable. Objects which implement IEnumerator are themselves enumerators; they support the following members:
the property Current, which points to a position on the list
the method MoveNext, which moves the Current item one along the list
the method Reset, which moves the Current item to its initial position (which is before the first item).
On the other hand, Iterators implement the enumerator pattern. .NET 2.0 introduced the iterator, which is a compiler-manifested enumerator. When the enumerable object calls GetEnumerator, either directly or indirectly, the compiler generates and returns an appropriate iterator object. Optionally, the iterator can be a combined enumerable and enumerator object.
The essential ingredient of an iterator block is the yield statement. There is one big difference between iterators and enumerators: Iterators do not implement the Reset method. Calling the Reset method on an iterator causes an exception.
The point of iterators is to allow the easy implementation of enumerators. Where a method needs to return either an enumerator or an enumerable class for an ordered list of items, it is written so as to return each item in its correct order using the ‘yield’ statement.
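The Reset behaviour mentioned above is easy to check. A minimal sketch (ResetDemo is a made-up name): the enumerator that the C# compiler generates for an iterator block throws NotSupportedException from Reset.

```csharp
using System;
using System.Collections.Generic;

public static class ResetDemo
{
    // An iterator block that returns the enumerator directly.
    public static IEnumerator<int> OneTwoThree()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Returns true if calling Reset() on the compiler-generated enumerator throws.
    public static bool ResetThrows()
    {
        IEnumerator<int> e = OneTwoThree();
        e.MoveNext();
        try
        {
            e.Reset();
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

If you need to start over, ask the enumerable for a new enumerator instead of resetting the old one.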

Since no examples were given, here is one that was helpful to me.
An enumerator is an object that you get when you call .GetEnumerator() on a class or type that implements the IEnumerable interface; the object you get back implements IEnumerator. When these interfaces are implemented, you have created all the code necessary for the compiler to enable you to use foreach to "iterate" over your collection.
Don't get that word "iterate" confused with iterator though. Both the enumerator and the iterator allow you to "iterate". Enumerating and iterating are basically the same process, but are implemented differently. Enumerating means you've implemented the IEnumerator interface. Iterating means you've created the iterator construct in your class (demonstrated below), and you are calling foreach on your class, at which time the compiler automatically creates the enumerator functionality for you.
Also note that you don't have to do squat with your enumerator. You can call MyClass.GetEnumerator() all day long, and do nothing with it (example: IEnumerator myEnumeratorThatIWillDoNothingWith = MyClass.GetEnumerator()).
Note too that your iterator construct in your class only really gets used when you are actually using it, i.e. you've called foreach on your class.
Here is an iterator example from msdn:
public class DaysOfTheWeek : System.Collections.IEnumerable
{
    string[] days = { "Sun", "Mon", "Tue", "Wed", "Thr", "Fri", "Sat" };

    // This is the iterator!!!
    public System.Collections.IEnumerator GetEnumerator()
    {
        for (int i = 0; i < days.Length; i++)
        {
            yield return days[i];
        }
    }
}

class TestDaysOfTheWeek
{
    static void Main()
    {
        // Create an instance of the collection class
        DaysOfTheWeek week = new DaysOfTheWeek();

        // Iterate with foreach - this is using the iterator!!! When the compiler
        // detects your iterator, it will automatically generate the Current,
        // MoveNext and Dispose methods of the IEnumerator or IEnumerator<T> interface
        foreach (string day in week)
        {
            System.Console.Write(day + " ");
        }
    }
}
// Output: Sun Mon Tue Wed Thr Fri Sat

"Iterators are a new feature in C# 2.0. An iterator is a method, get accessor or operator that enables you to support foreach iteration in a class or struct without having to implement the entire IEnumerable interface. Instead, you provide just an iterator, which simply traverses the data structures in your class. When the compiler detects your iterator, it will automatically generate the Current, MoveNext and Dispose methods of the IEnumerator or IEnumerator<T> interface." - msdn

Enumeration deals with objects while iteration deals with values only. Enumeration is used when we use Vector, Hashtable, etc., while iteration is used in while loops, for loops, etc. I've never used the yield keyword so I couldn't tell you.

Iteration deals with arrays and strings while enumerating deals with objects
In JavaScript you can iterate an array or a string with :
forEach Loop
for loop
for of Loop
do while Loop
while Loop
And you can enumerate an object with :
for in Loop
Object.keys() Method
Object.values() Method
Object.entries() Method

Related

FOR-EACH over an IEnumerable vs a List

Is there any benefit or difference if my for-each loop is going through the method argument if I pass in that argument as an IEnumerable or if I pass that argument as a List?
If your IEnumerable is implemented by List then no; no difference. There is a big conceptual difference though; the IEnumerable says "I can be enumerated" which means also that the number of items is not known and the enumeration cannot be reversed, or random accessed. The List says "I am a fully formed list, already populated; I can be reversed and randomly accessed".
So you should generally build your function interface to accept the lowest functionality compatible with your operation; if you are only going to enumerate forwards, iteratively, then accept IEnumerable - this allows your function to be used in more scenarios.
If you made your function accept only List<T> then any caller with an array or IEnumerable must convert their input into a List<T> before calling your function - which may well be poorer performance than simply passing through their array or IEnumerable directly. In this sense accepting an IEnumerable invites better performance code.
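A minimal sketch of that advice, with a made-up Totals.Sum helper: because the method only enumerates forwards, it asks for IEnumerable<int>, and callers can hand it an array, a List<int>, or an iterator without converting anything first.

```csharp
using System.Collections.Generic;

public static class Totals
{
    // Only enumerates forwards, so IEnumerable<int> is the least demanding parameter type.
    public static int Sum(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int v in values) sum += v;
        return sum;
    }
}
```

Totals.Sum(new[] { 1, 2, 3 }) and Totals.Sum(new List<int> { 4, 5 }) both compile as-is; with a List<int> parameter the array caller would have to copy its data into a list first.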
In the general case, there can be a difference if the collection has an explicit interface implementation of IEnumerable
List has the explicit implementation, but does not change behavior. There is no difference in your case.
See: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs looking at GetEnumerator and similar
No there isn't. In both cases the for-each is translated to something like this
var enumerator = input.GetEnumerator();
while (enumerator.MoveNext())
{
    // loop body.
    // The current value is accessed through: enumerator.Current
}
Additionally, if the enumerator is disposable, it will be disposed after the loop.
Jon Skeet gives a detailed description here.
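To illustrate the disposal step, here is a rough sketch of the expansion written out by hand, under the assumption that the enumerator is an IEnumerator<T> (which is always disposable). ForeachExpansion and its Log list are made-up names; the finally block inside the iterator stands in for any cleanup the sequence needs.

```csharp
using System.Collections.Generic;

public static class ForeachExpansion
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {
            Log.Add("disposed");   // runs when the enumerator is disposed or runs to the end
        }
    }

    // Roughly what 'foreach (int n in Numbers()) { ...; break; }' turns into.
    public static void ManualLoop()
    {
        IEnumerator<int> enumerator = Numbers().GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                Log.Add("item " + enumerator.Current);
                break;              // leave early; the finally below still disposes
            }
        }
        finally
        {
            enumerator.Dispose();
        }
    }
}
```

Even though the loop exits after the first item, Dispose runs the iterator's finally block, so the log ends with "disposed".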
If you pass the same object, it doesn't matter whether your method accepts IEnumerable or List.
However, if all you're going to do inside the method is enumerate the object, it's best to expect an IEnumerable in the method argument, you don't want to limit the caller of the method by expecting a List.
No, there is no benefit or difference as to how the foreach loop would go through the collection.
As Olivier Jacot-Descombes has pointed out, the foreach loop will simply go through the elements one by one using the enumerator.
However, it can make a difference if your logic goes through the same collection at least twice. In this case if IEnumerable<> is used, you might end up regenerating the elements each time you go over the iterator.
ReSharper even has a special warning for this type of code: PossibleMultipleEnumeration
I am not saying that you should not use IEnumerable<>. Everything has its time and place and it's not always a good idea to use the most generic interface. Be careful with your choice.
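The multiple-enumeration effect can be seen with a counter. This is only a sketch with made-up names (MultipleEnumerationDemo, Expensive, Generated); the iterator stands in for anything costly to produce, such as a query or a file read.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    public static int Generated;   // counts how many elements the iterator has produced

    public static IEnumerable<int> Expensive()
    {
        for (int i = 0; i < 3; i++)
        {
            Generated++;
            yield return i;
        }
    }

    // Goes over the source twice; a lazy source is regenerated on each pass.
    public static int EnumerateTwice(IEnumerable<int> source)
    {
        int count = source.Count();   // first pass
        int sum = source.Sum();       // second pass
        return count + sum;
    }

    // Buffers the source once, so the iterator body runs a single time.
    public static int EnumerateTwiceBuffered(IEnumerable<int> source)
    {
        List<int> buffered = source.ToList();
        return EnumerateTwice(buffered);
    }
}
```

With EnumerateTwice(Expensive()) the iterator body produces every element twice; with EnumerateTwiceBuffered(Expensive()) it produces them once, at the cost of holding the whole list in memory.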

IEnumerator Purpose

I don't quite understand what the use of IEnumerator from the C# Collections is.
What is it used for and why should it be used?
I tried looking online at http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx
but that article doesn't make much sense. The reason I ask is in Unity3d Game Engine, it's used along with the yield function. I am trying to make sense of the reason for the use of IEnumerator.
The IEnumerator interface, together with IEnumerable, is what allows a class to be iterated through using the built-in foreach syntax.
In the class that needs to be iterated, you implement the GetEnumerator method, which returns an IEnumerator that controls looping through the object.
public IEnumerator<OrderLine> GetEnumerator()
{
    for (int i = 0; i < maxItems; i++)
    {
        yield return item[i];
    }
}
As you can see in this example, the yield statement lets you return control to the caller without losing its place in the enumeration. Control will be passed back to the line after the yield statement when the caller hits the next increment of the foreach loop.
You rarely use IEnumerator explicitly, but there are a lot of extension methods that work with it, as well as the foreach. All of the collections implement it, and it's an example of the Iterator pattern.
It's quite simply just an interface which allows objects to perform operations for easily iterating over collections. You could use it to create objects which iterate over a custom collection with a foreach construct or similar syntax.

Why don't the Linq extension methods sit on IEnumerator rather than IEnumerable?

There are lots of Linq algorithms that only need to do one pass through the input e.g. Select.
Yet all the Linq extension methods sit on IEnumerable rather than IEnumerator
var e = new[] { 1, 2, 3, 4, 5 }.GetEnumerator();
e.Select(x => x * x); // Doesn't work
This means you can't use Linq in any situation where you are reading from an "already opened" stream.
This scenario is happening a lot for a project I am currently working on - I want to return an IEnumerator whose Dispose method will close the stream, and have all the downstream Linq code operate on this.
In short, I have an "already opened" stream of results which I can convert into an appropriately disposable IEnumerator - but unfortunately all of the downstream code requires an IEnumerable rather than an IEnumerator, even though it's only going to do one "pass".
i.e. I'm wanting to "implement" this return type on a variety of different sources (CSV files, IDataReaders, etc.):
class TabularStream
{
    Column[] Columns;
    IEnumerator<object[]> RowStream;
}
In order to get the "Columns" I have to have already opened the CSV file, initiated the SQL query, or whatever. I can then return an "IEnumerator" whose Dispose method closes the resource - but all of the Linq operations require an IEnumerable.
The best workaround I know of is to implement an IEnumerable whose GetEnumerator() method returns the one-and-only IEnumerator and throws an error if something tries to do a GetEnumerator() call twice.
Does this all sound OK or is there a much better way for me to implement "TabularStream" in a way that's easy to use from Linq?
Using IEnumerator<T> directly is rarely a good idea, in my view.
For one thing, it encodes the fact that it's destructive - whereas LINQ queries can usually be run multiple times. They're meant to be side-effect-free, whereas the act of iterating over an IEnumerator<T> is naturally side-effecting.
It also makes it virtually impossible to perform some of the optimizations in LINQ to Objects, such as using the Count property if you're actually asking an ICollection<T> for its count.
As for your workaround: yes, a OneShotEnumerable would be a reasonable approach.
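For what it's worth, a OneShotEnumerable could look something like the sketch below. This is my own guess at the shape, not code from the answer: it hands out the wrapped enumerator exactly once and throws on a second GetEnumerator() call. It is not thread-safe.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: exposes a single, already-open enumerator as an IEnumerable<T>.
public sealed class OneShotEnumerable<T> : IEnumerable<T>
{
    private IEnumerator<T> _enumerator;

    public OneShotEnumerable(IEnumerator<T> enumerator)
    {
        _enumerator = enumerator ?? throw new ArgumentNullException(nameof(enumerator));
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_enumerator == null)
            throw new InvalidOperationException("This sequence can only be enumerated once.");

        IEnumerator<T> result = _enumerator;
        _enumerator = null;   // hand out the enumerator exactly once
        return result;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because foreach and the LINQ operators dispose the enumerator they obtain, the underlying stream still gets closed when the single pass finishes, provided the wrapped enumerator's Dispose does that.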
While I generally agree with Jon Skeet's answer, I have also come across a very few cases where working with IEnumerator indeed seemed more appropriate than wrapping them in a once-only-IEnumerable.
I'll start by illustrating one such case and by describing my own solution to the issue.
Case example: Forward-only, non-rewindable database cursors
ESRI's API for accessing geo-databases (ArcObjects) has forward-only database cursors that cannot be reset. They are essentially that API's equivalent of IEnumerator. But there is no equivalent to IEnumerable. So if you want to wrap that API in "the .NET way", you have three options (which I explored in the following order):
Wrap the cursor as an IEnumerator (since that's what it really is) and work directly with that (which is cumbersome).
Wrap the cursor, or the wrapping IEnumerator from (1), as a once-only IEnumerable (to make it LINQ-compatible and generally easier to work with). The mistake here is that it isn't an IEnumerable, because it cannot be enumerated more than once, and this might be overlooked by users or maintainers of your code.
Don't wrap the cursor itself as an IEnumerable, but that which can be used to retrieve a cursor (e.g. the query criteria and the reference to the database object being queried). That way, several iterations are possible simply by re-executing the whole query. This is what I eventually decided on back then.
That last option is the pragmatic solution that I would generally recommend for similar cases (if applicable). If you are looking for other solutions, read on.
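The third option can be sketched generically. ReopenableSequence and its openCursor delegate are made-up names, and nothing here touches the real ArcObjects API; the delegate stands in for "run the query and wrap the resulting cursor as an IEnumerator<T>".

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: stores how to open a cursor, not the cursor itself.
public sealed class ReopenableSequence<T> : IEnumerable<T>
{
    private readonly Func<IEnumerator<T>> _openCursor;

    public ReopenableSequence(Func<IEnumerator<T>> openCursor)
    {
        _openCursor = openCursor ?? throw new ArgumentNullException(nameof(openCursor));
    }

    // Every enumeration re-runs the query and gets a fresh forward-only cursor.
    public IEnumerator<T> GetEnumerator() => _openCursor();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Each foreach (or LINQ query) over the wrapper opens its own cursor, so the object behaves like a real IEnumerable<T>; the trade-off is that every pass pays for the query again.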
Re-implement LINQ query operators for the IEnumerator<T> interface?
It's technically possible to implement some or all of LINQ's query operators for the IEnumerator<T> interface. One approach would be to write a bunch of extension methods, such as:
public static IEnumerator<T> Where<T>(this IEnumerator<T> xs, Func<T, bool> predicate)
{
    while (xs.MoveNext())
    {
        T x = xs.Current;
        if (predicate(x)) yield return x;
    }
    yield break;
}
Let's consider a few key issues:
Operators must never return an IEnumerable<T>, because that would mean that you can break out of your own "LINQ to IEnumerator" world and escape into regular LINQ. There you'd end up with the non-repeatability issue already described above.
You cannot process the results of some query with a foreach loop… unless each of the IEnumerator<T> objects returned by your query operators implements a GetEnumerator method that returns this. Supplying that additional method would mean that you cannot use yield return/break, but have to write IEnumerator<T> classes manually.
This is just plain weird and possibly an abuse of either IEnumerator<T> or the foreach construct.
If returning IEnumerable<T> is forbidden and returning IEnumerator<T> is cumbersome (because foreach doesn't work), why not return plain arrays? Because then queries can no longer be lazy.
IQueryable + IEnumerator = IQueryator
What about delaying the execution of a query until it has been fully composed? In the IEnumerable world, that is what IQueryable does; so we could theoretically build an IEnumerator equivalent, which I shall call IQueryator.
IQueryator could check for logical errors, such as doing anything with the sequence after it has been completely consumed by a preceding operation like Count. I.e. all-consuming operators like Count would always have to be the last in a query operator concatenation.
IQueryator could return an array (like suggested above) or some other read-only collection, but not by the individual operators; only when the query gets executed.
Implementing IQueryator would take quite some time... the question is, would it actually be worth the effort?

Does List<T> create garbage in C# in foreach

Correct me if I'm wrong, but while doing a foreach an IEnumerable<T> creates garbage no matter what T is. But I'm wondering if you have a List<T> where T is Entity. Then say there is a derived class in the list like Entity2D. Will it have to create a new enumerator for each derived class? Therefore creating garbage?
Also does having an interface let's say IEntity as T create garbage?
List<T>'s GetEnumerator method actually is quite efficient.
When you loop through the elements of a List<T>, it calls GetEnumerator. This, in turn, generates an internal struct which holds a reference to the original list, an index, and a version ID to track for changes in the list.
However, since a struct is being used, it's really not creating "garbage" that the GC will ever deal with.
As for "create a new enumerator for each derived class" - .NET generics works differently than C++ templates. In .NET, the List<T> class (and its internal Enumerator<T> struct) is defined one time, and usable for any T. When used, a generic type for that specific type of T is required, but this is only the type information for that newly created type, and quite small in general. This differs from C++ templates, for example, where each type used is created at compile time, and "built in" to the executable.
In .NET, the executable specifies the definition for List<T>, not List<int>, List<Entity2D>, etc...
I think you may be interested in this article which explains why List(T) will not create "garbage", as opposed to Collection(T):
Now, here comes the tricky part. Rumor has it that many of the types in System.Collections.Generic will not allocate an enumerator when using foreach. List's GetEnumerator, for example, returns a struct, which will just sit on the stack. Look for yourself with .NET Reflector, if you don't believe me. To prove to myself that a foreach over a List doesn't cause any heap allocations, I changed entities to be a List, did the exact same foreach loop, and ran the profiler. No enumerator!
[...]
However, there is definitely a caveat to the above. Foreach loops over Lists can still generate garbage. [Casting List to IEnumerable] Even though we're still doing a foreach over a List, when the list is cast to an interface, the value type enumerator must be boxed, and placed on the heap.
An interesting note: as Reed Copsey pointed out, the List<T>.Enumerator type is actually a struct. This is both good and horrible.
It's good in the sense that calling foreach on a List<T> actually doesn't create garbage, as no new reference type objects are allocated for the garbage collector to worry about.
It's horrible in the sense that suddenly the return value of GetEnumerator is a value type, against almost every .NET developer's intuition (it is generally expected that GetEnumerator will return a non-descript IEnumerator<T>, as this is what is guaranteed by the IEnumerable<T> contract; List<T> gets around this by explicitly implementing IEnumerable<T>.GetEnumerator and publicly exposing a more specific implementation of IEnumerator<T> which happens to be a value type).
So any code that, for example, passes a List<T>.Enumerator to a separate method which in turn calls MoveNext on that enumerator object, faces the potential issue of an infinite loop. Like this:
int CountListMembers<T>(List<T> list)
{
    using (var e = list.GetEnumerator())
    {
        int count = 0;
        while (IncrementEnumerator(e, ref count)) { }
        return count;
    }
}

bool IncrementEnumerator<T>(IEnumerator<T> enumerator, ref int count)
{
    if (enumerator.MoveNext())
    {
        ++count;
        return true;
    }
    return false;
}
The above code is very stupid; it's only meant as a trivial example of one scenario in which the return value of List<T>.GetEnumerator can cause highly unintuitive (and potentially disastrous) behavior.
But as I said, it's still kind of good in that it doesn't create garbage ;)
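The copy behaviour behind that infinite loop can be shown without actually looping forever. A small sketch with made-up names (StructEnumeratorDemo, Advance): passing the struct enumerator to a method that takes IEnumerator<int> boxes a fresh copy on every call, so the original variable never advances.

```csharp
using System.Collections.Generic;

public static class StructEnumeratorDemo
{
    // Takes the interface type, so a struct argument is boxed (copied) on every call.
    private static bool Advance(IEnumerator<int> enumerator) => enumerator.MoveNext();

    // The struct is copied on each call, so the local variable never moves.
    public static int AdvanceStructTwice()
    {
        var list = new List<int> { 10, 20, 30 };
        List<int>.Enumerator e = list.GetEnumerator();
        Advance(e);
        Advance(e);
        return e.Current;      // still the default value: the copies advanced, not 'e'
    }

    // Holding the enumerator as an interface boxes it once; every call sees the same object.
    public static int AdvanceBoxedTwice()
    {
        var list = new List<int> { 10, 20, 30 };
        IEnumerator<int> e = list.GetEnumerator();
        Advance(e);
        Advance(e);
        return e.Current;
    }
}
```

Declaring the variable as IEnumerator<int> (or passing the struct by ref to a method that takes the struct type) avoids the surprise, at the cost of the one heap allocation for the box.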
Regardless of whether it's a List<Entity>, List<Entity2D>, or List<IEntity>, GetEnumerator will be called once per foreach. Further, it is irrelevant whether e.g. List<Entity> contains instances of Entity2D. An IEnumerable<T>'s implementation of GetEnumerator may create reference objects which will be collected. As Reed noted, List<T> in MS .NET avoids this by using only value types.
The class List<T> implements IEnumerable<T> explicitly, so that calling GetEnumerator on a variable of type List<T> will cause it to return a List<T>.Enumerator, which has value-type semantics, whereas calling it on a variable of type IEnumerable<T> which holds a reference to a List<T> will cause it to return a value of type IEnumerator<T>, which will have reference semantics.

Would C# benefit from distinctions between kinds of enumerators, like C++ iterators?

I have been thinking about the IEnumerator.Reset() method. I read in the MSDN documentation that it is only there for COM interop. As a C++ programmer it looks to me like an IEnumerator which supports Reset is what I would call a forward iterator, while an IEnumerator which does not support Reset is really an input iterator.
So part one of my question is, is this understanding correct?
The second part of my question is, would it be of any benefit in C# if there was a distinction made between input iterators and forward iterators (or "enumerators" if you prefer)? Would it not help eliminate some confusion among programmers, like the one found in this SO question about cloning iterators?
EDIT: Clarification on forward and input iterators. An input iterator only guarantees that you can enumerate the members of a collection (or from a generator function or an input stream) only once. This is exactly how IEnumerator works in C#. Whether or not you can enumerate a second time, is determined by whether or not Reset is supported. A forward iterator, does not have this restriction. You can enumerate over the members as often as you want.
Some C# programmers don't understand why an IEnumerator cannot be reliably used in a multipass algorithm. Consider the following case:
void PrintContents(IEnumerator<int> iter)
{
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
    iter.Reset();
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
}
If we call PrintContents in this context, no problem:
List<int> ys = new List<int>() { 1, 2, 3 };
PrintContents(ys.GetEnumerator());
However look at the following:
IEnumerable<int> GenerateInts() {
    System.Random rnd = new System.Random();
    for (int i = 0; i < 10; ++i)
        yield return rnd.Next();
}
PrintContents(GenerateInts().GetEnumerator());
If the IEnumerator supported Reset, in other words supported multi-pass algorithms, then each time you iterated over the collection it would be different. This would be undesirable, because it would be surprising behavior. This example is a bit faked, but it does occur in the real world (e.g. reading from file streams).
Reset was a big mistake. I call shenanigans on Reset. In my opinion, the correct way to reflect the distinction you are making between "forward iterators" and "input iterators" in the .NET type system is with the distinction between IEnumerable<T> and IEnumerator<T>.
See also this answer, where Microsoft's Eric Lippert (in an unofficial capacity, no doubt; my point is only that he's someone with more credentials than I have to make the claim that this was a design mistake) makes a similar point in comments. See also his awesome blog.
Interesting question. My take is that of course C# would benefit. However, it wouldn't be easy to add.
The distinction exists in C++ because of its much more flexible type system. In C#, you don't have a robust generic way to clone objects, which is necessary to represent forward iterators (to support multi-pass iteration). And of course, for this to be really useful, you'd also need to support bidirectional and random-access iterators/enumerators. And to get them all working smoothly, you really need some form of duck-typing, like C++ templates have.
Ultimately, the scopes of the two concepts are different.
In C++, iterators are supposed to represent everything you need to know about a range of values. Given a pair of iterators, I don't need the original container. I can sort, I can search, I can manipulate and copy elements as much as I like. The original container is out of the picture.
In C#, enumerators are not meant to do quite as much. Ultimately, they're just designed to let you run through the sequence in a linear manner.
As for Reset(), it is widely accepted that it was a mistake to add it in the first place. If it had worked, and been implemented correctly, then yes, you could say your enumerator was analogous to forward iterators, but in general, it's best to ignore it as a mistake. And then all enumerators are similar only to input iterators.
Unfortunately.
Coming from the C# perspective:
You almost never use IEnumerator directly. Usually you do a foreach statement, which expects an IEnumerable.
IEnumerable _myCollection;
...
foreach (var item in _myCollection) { /* Do something */ }
You don't pass around IEnumerator either. If you want to pass a collection which needs iteration, you pass IEnumerable. Since IEnumerable has a single function, which returns an IEnumerator, it can be used to iterate the collection multiple times (multiple passes).
There's no need for a Reset() function on IEnumerator because if you want to start over, you just throw away the old one (garbage collected) and get a new one.
The .NET framework would benefit immensely if there were a means of asking an IEnumerator<T> about what abilities it could support and what promises it could make. Such features would also be helpful in IEnumerable<T>, but being able to ask the questions of an enumerator would allow code that can receive an enumerator from wrappers like ReadOnlyCollection to use the underlying collection in improved ways without having to involve the wrapper.
Given any enumerator for a collection that is capable of being enumerated in its entirety and isn't too big, one could produce from it an IEnumerable<T> that would always yield the same sequence of items (specifically the set of items remaining in the enumerator) by reading its entire content to an array, disposing and discarding the enumerator, and getting an enumerator from the array (using that in place of the original abandoned enumerator), wrapping the array in a ReadOnlyCollection<T>, and returning that. Although such an approach would work with any kind of enumerable collection meeting the above criteria, it would be horribly inefficient with most of them. Having a means of asking an enumerator to yield its remaining contents in an immutable IEnumerable<T> would allow many kinds of enumerators to perform the indicated action much more efficiently.
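The brute-force approach described above might be sketched as follows. EnumeratorSnapshot.RemainingItems is a made-up helper, and it buffers into a List<T> rather than an array for brevity; it is the inefficient general-purpose fallback, not the "ask the enumerator" feature being wished for.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class EnumeratorSnapshot
{
    // Drains what is left in the enumerator and returns it as a read-only, repeatable sequence.
    public static ReadOnlyCollection<T> RemainingItems<T>(IEnumerator<T> enumerator)
    {
        var buffer = new List<T>();
        using (enumerator)   // the original enumerator is disposed once it has been drained
        {
            while (enumerator.MoveNext())
                buffer.Add(enumerator.Current);
        }
        return new ReadOnlyCollection<T>(buffer);
    }
}
```

The returned collection can be enumerated as often as needed, but every remaining item has been copied, which is exactly the cost the answer would like enumerators to be able to avoid.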
I don't think so. I would call IEnumerable a forward iterator, and an input iterator. It does not allow you to go backwards, or modify the underlying collection. With the addition of the foreach keyword, iterators are almost a non-thought most of the time.
Opinion:
The difference between input iterators (get each one) vs. output iterators (do something to each one) is too trivial to justify an addition to the framework. Also, in order to do an output iterator, you would need to pass a delegate to the iterator. The input iterator seems more natural to C# programmers.
There's also IList<T> if the programmer wants random access.
