I don't quite understand what the use of IEnumerator from the C# Collections is.
What is it used for and why should it be used?
I tried looking online at http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx
but that article doesn't make much sense. The reason I ask is in Unity3d Game Engine, it's used along with the yield function. I am trying to make sense of the reason for the use of IEnumerator.
The IEnumerable interface, when implemented by a class, allows it to be iterated through using the built-in foreach syntax; the IEnumerator it hands out does the actual stepping.
In the class that needs to be iterated, the interface defines the signature of the GetEnumerator method, which returns the enumerator that controls looping through the object.
public IEnumerator<OrderLine> GetEnumerator()
{
    for (int i = 0; i < maxItems; i++)
    {
        yield return item[i];
    }
}
As you can see in this example, the yield statement lets you return control to the caller without losing its place in the enumeration. Control will be passed back to the line after the yield statement when the caller hits the next increment of the foreach loop.
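To make the "without losing its place" point concrete, here is a minimal sketch of roughly what a foreach loop does with an enumerator behind the scenes. The names EnumeratorDemo, Steps and Drain are made up for this illustration and do not come from any library.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Hypothetical iterator method: execution pauses at each yield return.
    public static IEnumerator<string> Steps()
    {
        yield return "first";
        yield return "second";
        yield return "third";
    }

    // Roughly what foreach does: call MoveNext, read Current, repeat, then dispose.
    public static List<string> Drain(IEnumerator<string> e)
    {
        var seen = new List<string>();
        using (e)
        {
            while (e.MoveNext())      // resumes Steps() right after the previous yield
            {
                seen.Add(e.Current);  // the value handed to yield return
            }
        }
        return seen;
    }
}
```

Calling EnumeratorDemo.Drain(EnumeratorDemo.Steps()) collects the three strings in order. Each MoveNext call runs the method body only as far as the next yield return.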
You rarely use IEnumerator explicitly, but foreach relies on it, and there are a lot of extension methods that work with its companion interface, IEnumerable. All of the collections implement IEnumerable, and it's an example of the Iterator pattern.
It's quite simply an interface that lets objects support easy iteration over collections. You could use it to create objects which iterate over a custom collection with a foreach construct or similar syntax.
Related
Every time I've had to return a collection, I've returned a List. I've just read that I should return IEnumerable or a similar interface (IQueryable for instance).
The problem I see is that often I want to work with a List. To do that, I'd have to do a .ToList() on the returned result.
Example
//...
List<Guid> listOfGuids = MyMethod().ToList();
//...
public IEnumerable<Guid> MyMethod()
{
    using (var context = AccesDataRépart.GetNewContextRépart())
    {
        return context.MyTable.ToList();
    }
}
Is executing a .ToList() twice the right practice?
If the caller actually needs a list, return a list (if that's what you have). Returning an IEnumerable when you already have a list, and when you know the caller is going to need a list, is just being wasteful, and for no real benefit.
If you feel that there is a chance that you'll be changing the underlying type of the object you are returning in future versions of the method it can, potentially, make it a bit easier on the library implementer to return an interface instead, but it's easier on the caller of the method when a more derived type is returned (they have the ability to do more with it than if they are just given an interface).
It is the reverse with input parameters. When passing parameters in, the more derived the type, the more "power" the library implementer has to work with the type, especially in future revisions, but using a much less restrictive type makes life easier on the caller of your library, as they don't need to convert what they have to what your method accepts.
This makes these decisions something to think about a fair bit when writing a library's public API. You need to consider how much "power" you need right now, as well as how much you think you might need in the future. Once you know how restrictive/general the types need to be for you to do your job, you can then work to make your methods more convenient to use for callers. There is no one answer that will apply in every case. Saying that you should always return IEnumerable instead of List isn't proper, just the same as saying that you should always return List is also improper. You need to make a judgement call based on the specific situation you are in.
I would recommend just returning a List<T>, or perhaps an IList<T>. The reason that someone might recommend against returning List, is that it locks you in to that implementation. Depending on the usage of the API, that might not be a concern.
My general rule of thumb is to be more permissive in what you accept and more specific in what you return. So, IEnumerable<T> for method parameters, and IList<T>, List<T> or possibly even T[] for method return values.
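As a small sketch of that rule of thumb (the class and method names are hypothetical), a method can accept the most general sequence type and hand back a concrete list:

```csharp
using System.Collections.Generic;

public static class OrderMath
{
    // Accepts any sequence (array, List<T>, LINQ query) and returns a concrete List<int>.
    public static List<int> Doubled(IEnumerable<int> values)
    {
        var result = new List<int>();
        foreach (int v in values)
        {
            result.Add(v * 2);
        }
        return result;
    }
}
```

The caller can pass an array without converting it first, and can index into or add to the returned list without calling ToList.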
You don't have to call ToList on the returned value; it is already a List. You can't return a lazily evaluated IEnumerable here, because the using statement around your DataContext disposes it before the caller gets to enumerate. So change your method's return type to List<T> and then don't call ToList on the returned value.
//...
List<Guid> listOfGuids = MyMethod(); //No ToList here
//...
public List<Guid> MyMethod()
{
    using (var context = AccesDataRépart.GetNewContextRépart())
    {
        return context.MyTable.ToList();
    }
}
I've just read that I should return IEnumerable or similar interface
(IQueryable for instance).
Don't worry about that - return IList<> or List<> if you actually need a list object at the point the collection is consumed. The problem with returning IEnumerable can be that no-one knows what the cost of enumerating it is going to be - which is a down-side to the whole Linq concept that doesn't always get fair mention from the people who are encouraging everyone to return IEnumerable everywhere.
It really depends. Do you want to enumerate the collection before or after returning it?
Enumerate before: Every time you call ToList, ToArray, etc. you are enumerating the IEnumerable. If you are doing this many times after it is returned, this can be redundant and wasteful. Either returning it in an already enumerated form (e.g., IList, Array) or enumerating it once after it is returned and using that result for future processing is probably preferable.
Enumerate after: Returning an IEnumerable allows you to defer the enumeration of the collection until later (e.g., save processing up front). If it turns out that you never end up enumerating the collection, or you only enumerate a subset of it, then the IEnumerable approach can be very advantageous.
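The difference between the two cases can be made visible with a counter. This is only a sketch; DeferredDemo, Squares and the Calls field are invented for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how often the projection lambda actually runs.
    public static int Calls;

    // Returns a lazy query: nothing is computed until someone enumerates it.
    public static IEnumerable<int> Squares(IEnumerable<int> source)
    {
        return source.Select(x => { Calls++; return x * x; });
    }
}
```

Calling Squares alone leaves Calls at zero. Taking only the first two items with Take(2).ToList() runs the projection twice, while every full ToList() on the same query runs it again for each element, which is the redundant work the "enumerate before" case warns about.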
I use the yield return keyword quite a bit, but I find it lacking when I want to add a range to the IEnumerable. Here's a quick example of what I would like to do:
IEnumerable<string> SomeRecursiveMethod()
{
    // some code
    // ...
    yield return SomeRecursiveMethod();
}
Naturally this results in an error, which can be resolved by doing a simple loop. Is there a better way to do this? A loop feels a bit clunky.
No, there isn't I'm afraid. F# does support this with yield!, but there's no equivalent in C# - you have to use the loop, basically. Sorry... I feel your pain. I mentioned it in one of my Edulinq blog posts, where it would have made things simpler.
Note that using yield return recursively can be expensive - see Wes Dyer's post on iterators for more information (and mentioning a "yield foreach" which was under consideration four years ago...)
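For reference, the loop-based workaround looks roughly like this. The Node class is a hypothetical tree type introduced only to give the recursion something to walk.

```csharp
using System.Collections.Generic;

public sealed class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();

    public Node(string name) { Name = name; }

    // Depth-first walk: each child's sequence is re-yielded item by item.
    public IEnumerable<string> Walk()
    {
        yield return Name;
        foreach (Node child in Children)
        {
            // The inner loop stands in for the missing "yield foreach".
            foreach (string name in child.Walk())
            {
                yield return name;
            }
        }
    }
}
```

Every level of nesting adds another iterator that each item has to pass through, which is where the cost mentioned above comes from on deep trees.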
If you already have an IEnumerable to loop over, and the return type is IEnumerable (as is the case for functions that could use yield return), you can simply return that enumeration.
If you have cases where you need to combine results from multiple IEnumerables, you can use the IEnumerable<T>.Concat extension method.
In your recursive example, though, you need to terminate the enumeration/concatenation based on the contents of the enumeration. I don't think my method will support this.
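A minimal sketch of the Concat approach, with hypothetical names, for the case where the pieces to combine are already available as sequences:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    // No yield needed: build one lazy sequence out of a single item and another sequence.
    public static IEnumerable<string> HeadThenRest(string head, IEnumerable<string> rest)
    {
        return new[] { head }.Concat(rest);
    }
}
```

Because the method contains no yield, it is an ordinary method that simply returns the combined sequence; the result is still evaluated lazily by Concat.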
The yield keyword is indeed very nice. But nesting it in a for loop will cause more glue code to be generated and executed.
If you can live with a less functional style of programming, you can pass a List around to which you append:
void GenerateList(List<string> result)
{
    result.Add("something");
    // more code.
    GenerateList(result);
}
There are lots of Linq algorithms that only need to do one pass through the input e.g. Select.
Yet all the Linq extension methods sit on IEnumerable rather than IEnumerator
var e = new[] { 1, 2, 3, 4, 5 }.GetEnumerator();
e.Select(x => x * x); // Doesn't work
This means you can't use Linq in any situation where you are reading from an "already opened" stream.
This scenario is happening a lot for a project I am currently working on - I want to return an IEnumerator whose Dispose method will close the stream, and have all the downstream Linq code operate on this.
In short, I have an "already opened" stream of results which I can convert into an appropriately disposable IEnumerator - but unfortunately all of the downstream code requires an IEnumerable rather than an IEnumerator, even though it's only going to do one "pass".
i.e. I'm wanting to "implement" this return type on a variety of different sources (CSV files, IDataReaders, etc.):
class TabularStream
{
    Column[] Columns;
    IEnumerator<object[]> RowStream;
}
In order to get the "Columns" I have to have already opened the CSV file, initiated the SQL query, or whatever. I can then return an "IEnumerator" whose Dispose method closes the resource - but all of the Linq operations require an IEnumerable.
The best workaround I know of is to implement an IEnumerable whose GetEnumerator() method returns the one-and-only IEnumerator and throws an error if something tries to do a GetEnumerator() call twice.
Does this all sound OK or is there a much better way for me to implement "TabularStream" in a way that's easy to use from Linq?
Using IEnumerator<T> directly is rarely a good idea, in my view.
For one thing, it encodes the fact that it's destructive - whereas LINQ queries can usually be run multiple times. They're meant to be side-effect-free, whereas the act of iterating over an IEnumerator<T> is naturally side-effecting.
It also makes it virtually impossible to perform some of the optimizations in LINQ to Objects, such as using the Count property if you're actually asking an ICollection<T> for its count.
As for your workaround: yes, a OneShotEnumerable would be a reasonable approach.
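One way such a OneShotEnumerable might look is sketched below. This is an assumption about the shape of the wrapper, not an existing framework type, and it is not thread-safe.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: exposes a single IEnumerator<T> as an IEnumerable<T>
// that may be enumerated exactly once.
public sealed class OneShotEnumerable<T> : IEnumerable<T>
{
    private IEnumerator<T> _enumerator;

    public OneShotEnumerable(IEnumerator<T> enumerator)
    {
        _enumerator = enumerator ?? throw new ArgumentNullException(nameof(enumerator));
    }

    public IEnumerator<T> GetEnumerator()
    {
        IEnumerator<T> e = _enumerator;
        if (e == null)
        {
            throw new InvalidOperationException("This sequence can only be enumerated once.");
        }
        _enumerator = null;   // hand the enumerator out only once
        return e;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

LINQ operators can then be applied to the wrapper as usual. Since foreach and most LINQ operators dispose the enumerator they obtain, the underlying resource is released when the single pass finishes.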
While I generally agree with Jon Skeet's answer, I have also come across a very few cases where working with IEnumerator indeed seemed more appropriate than wrapping them in a once-only-IEnumerable.
I'll start by illustrating one such case and by describing my own solution to the issue.
Case example: Forward-only, non-rewindable database cursors
ESRI's API for accessing geo-databases (ArcObjects) has forward-only database cursors that cannot be reset. They are essentially that API's equivalent of IEnumerator. But there is no equivalent to IEnumerable. So if you want to wrap that API in "the .NET way", you have three options (which I explored in the following order):
1. Wrap the cursor as an IEnumerator (since that's what it really is) and work directly with that (which is cumbersome).
2. Wrap the cursor, or the wrapping IEnumerator from (1), as a once-only IEnumerable (to make it LINQ-compatible and generally easier to work with). The mistake here is that it isn't an IEnumerable, because it cannot be enumerated more than once, and this might be overlooked by users or maintainers of your code.
3. Don't wrap the cursor itself as an IEnumerable, but that which can be used to retrieve a cursor (e.g. the query criteria and the reference to the database object being queried). That way, several iterations are possible simply by re-executing the whole query. This is what I eventually decided on back then.
That last option is the pragmatic solution that I would generally recommend for similar cases (if applicable). If you are looking for other solutions, read on.
Re-implement LINQ query operators for the IEnumerator<T> interface?
It's technically possible to implement some or all of LINQ's query operators for the IEnumerator<T> interface. One approach would be to write a bunch of extension methods, such as:
public static IEnumerator<T> Where<T>(this IEnumerator<T> xs, Func<T, bool> predicate)
{
    while (xs.MoveNext())
    {
        T x = xs.Current;
        if (predicate(x)) yield return x;
    }
    yield break;
}
Let's consider a few key issues:
Operators must never return an IEnumerable<T>, because that would mean that you can break out of your own "LINQ to IEnumerator" world and escape into regular LINQ. There you'd end up with the non-repeatability issue already described above.
You cannot process the results of some query with a foreach loop… unless each of the IEnumerator<T> objects returned by your query operators implements a GetEnumerator method that returns this. Supplying that additional method would mean that you cannot use yield return/break, but have to write IEnumerator<T> classes manually.
This is just plain weird and possibly an abuse of either IEnumerator<T> or the foreach construct.
If returning IEnumerable<T> is forbidden and returning IEnumerator<T> is cumbersome (because foreach doesn't work), why not return plain arrays? Because then queries can no longer be lazy.
IQueryable + IEnumerator = IQueryator
What about delaying the execution of a query until it has been fully composed? In the IEnumerable world, that is what IQueryable does; so we could theoretically build an IEnumerator equivalent, which I shall call IQueryator.
IQueryator could check for logical errors, such as doing anything with the sequence after it has been completely consumed by a preceding operation like Count. I.e. all-consuming operators like Count would always have to be the last in a query operator concatenation.
IQueryator could return an array (as suggested above) or some other read-only collection, but not from the individual operators; only when the query gets executed.
Implementing IQueryator would take quite some time... the question is, would it actually be worth the effort?
I have been thinking about the IEnumerator.Reset() method. I read in the MSDN documentation that it is only there for COM interop. As a C++ programmer it looks to me like an IEnumerator which supports Reset is what I would call a forward iterator, while an IEnumerator which does not support Reset is really an input iterator.
So part one of my question is, is this understanding correct?
The second part of my question is, would it be of any benefit in C# if there was a distinction made between input iterators and forward iterators (or "enumerators" if you prefer)? Would it not help eliminate some confusion among programmers, like the one found in this SO question about cloning iterators?
EDIT: Clarification on forward and input iterators. An input iterator only guarantees that you can enumerate the members of a collection (or from a generator function or an input stream) once. This is exactly how IEnumerator works in C#. Whether or not you can enumerate a second time is determined by whether or not Reset is supported. A forward iterator does not have this restriction. You can enumerate over the members as often as you want.
Some C# programmers don't understand why an IEnumerator cannot be reliably used in a multipass algorithm. Consider the following case:
void PrintContents(IEnumerator<int> iter)
{
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
    iter.Reset();
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
}
If we call PrintContents in this context, no problem:
List<int> ys = new List<int>() { 1, 2, 3 };
PrintContents(ys.GetEnumerator());
However look at the following:
IEnumerable<int> GenerateInts() {
    System.Random rnd = new System.Random();
    for (int i = 0; i < 10; ++i)
        yield return rnd.Next();
}
PrintContents(GenerateInts().GetEnumerator());
If the IEnumerator supported Reset, in other words supported multi-pass algorithms, then each time you iterated over the collection it would be different. This would be undesirable, because it would be surprising behavior. This example is a bit faked, but it does occur in the real world (e.g. reading from file streams).
Reset was a big mistake. I call shenanigans on Reset. In my opinion, the correct way to reflect the distinction you are making between "forward iterators" and "input iterators" in the .NET type system is with the distinction between IEnumerable<T> and IEnumerator<T>.
See also this answer, where Microsoft's Eric Lippert (in an unofficial capacity, no doubt; my point is only that he's someone with more credentials than I have to make the claim that this was a design mistake) makes a similar point in comments. Also see his awesome blog.
Interesting question. My take is that of course C# would benefit. However, it wouldn't be easy to add.
The distinction exists in C++ because of its much more flexible type system. In C#, you don't have a robust generic way to clone objects, which is necessary to represent forward iterators (to support multi-pass iteration). And of course, for this to be really useful, you'd also need to support bidirectional and random-access iterators/enumerators. And to get them all working smoothly, you really need some form of duck-typing, like C++ templates have.
Ultimately, the scopes of the two concepts are different.
In C++, iterators are supposed to represent everything you need to know about a range of values. Given a pair of iterators, I don't need the original container. I can sort, I can search, I can manipulate and copy elements as much as I like. The original container is out of the picture.
In C#, enumerators are not meant to do quite as much. Ultimately, they're just designed to let you run through the sequence in a linear manner.
As for Reset(), it is widely accepted that it was a mistake to add it in the first place. If it had worked, and been implemented correctly, then yes, you could say your enumerator was analogous to forward iterators, but in general, it's best to ignore it as a mistake. And then all enumerators are similar only to input iterators.
Unfortunately.
Coming from the C# perspective:
You almost never use IEnumerator directly. Usually you do a foreach statement, which expects an IEnumerable.
IEnumerable _myCollection;
...
foreach (var item in _myCollection) { /* Do something */ }
You don't pass around IEnumerator either. If you want to pass a collection which needs iteration, you pass IEnumerable. Since IEnumerable has a single function, which returns an IEnumerator, it can be used to iterate the collection multiple times (multiple passes).
There's no need for a Reset() function on IEnumerator because if you want to start over, you just throw away the old one (garbage collected) and get a new one.
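A short sketch of that idea, using a made-up helper named SumTwice: a multi-pass algorithm takes an IEnumerable and simply asks for a new enumerator for each pass.

```csharp
using System.Collections.Generic;

public static class MultiPass
{
    // Two passes over the same IEnumerable: each foreach obtains its own enumerator.
    public static int SumTwice(IEnumerable<int> xs)
    {
        int total = 0;
        foreach (int x in xs) total += x;   // first pass
        foreach (int x in xs) total += x;   // second pass, fresh enumerator, no Reset
        return total;
    }
}
```

Whether both passes see the same items still depends on the source: a List<int> will repeat itself, while a generator based on random numbers or a stream may not.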
The .NET framework would benefit immensely if there were a means of asking an IEnumerator<T> about what abilities it could support and what promises it could make. Such features would also be helpful in IEnumerable<T>, but being able to ask the questions of an enumerator would allow code that can receive an enumerator from wrappers like ReadOnlyCollection to use the underlying collection in improved ways without having to involve the wrapper.
Given any enumerator for a collection that is capable of being enumerated in its entirety and isn't too big, one could produce from it an IEnumerable<T> that would always yield the same sequence of items (specifically the set of items remaining in the enumerator) by reading its entire content to an array, disposing and discarding the enumerator, and getting an enumerator from the array (using that in place of the original abandoned enumerator), wrapping the array in a ReadOnlyCollection<T>, and returning that. Although such an approach would work with any kind of enumerable collection meeting the above criteria, it would be horribly inefficient with most of them. Having a means of asking an enumerator to yield its remaining contents in an immutable IEnumerable<T> would allow many kinds of enumerators to perform the indicated action much more efficiently.
I don't think so. I would call IEnumerable a forward iterator, and an input iterator. It does not allow you to go backwards, or modify the underlying collection. With the addition of the foreach keyword, iterators are almost a non-thought most of the time.
Opinion:
The difference between input iterators (get each one) vs. output iterators (do something to each one) is too trivial to justify an addition to the framework. Also, in order to do an output iterator, you would need to pass a delegate to the iterator. The input iterator seems more natural to C# programmers.
There's also IList<T> if the programmer wants random access.
An interview question for a .NET 3.5 job is "What is the difference between an iterator and an enumerator"?
This is a core distinction to make, what with LINQ, etc.
Anyway, what is the difference? I can't seem to find a solid definition on the net. Make no mistake, I can find the meaning of the two terms but I get slightly different answers. What would be the best answer for an interview?
IMO an iterator "iterates" over a collection, and an enumerator provides the functionality to iterate, but this has to be called.
Also, using the yield keyword is said to save state. What exactly is this state? Is there an example of this benefit occurring?
Iterating means repeating some steps, while enumerating means going through all values in a collection of values. So enumerating usually requires some form of iteration.
In that way, enumerating is a special case of iterating where the step is getting a value from a collection.
Note the "usually" – enumerating may also be performed recursively, but recursion and iteration are so closely related that I would not care about this small difference.
You may also enumerate values you do not explicitly store in a collection. For example, you can enumerate the natural numbers, primes, or whatever, but you would calculate these values during the enumeration and not retrieve them from a physical collection. You can understand this case as enumerating a virtual collection with its values defined by some logic.
I assume Reed Copsey got the point. In C# there are two major ways to enumerate something.
Implement IEnumerable and a class implementing IEnumerator
Implement an iterator with the yield statement
The first way is harder to implement and uses objects for enumerating. The second way is easier to implement and uses continuations.
In C# 2+, iterators are a way for the compiler to automatically generate the IEnumerable and/or IEnumerable<T> interfaces for you.
Without iterators, you would need to create a class implementing IEnumerator, including Current, MoveNext, and Reset. This requires a fair amount of work. Normally, you would create a private class that implemented IEnumerator<T> for your type, then yourClass.GetEnumerator() would construct that private class, and return it.
Iterators are a way for the compiler to automatically generate this for you, using a simple syntax (yield). This lets you implement GetEnumerator() directly in your class, without a second class (The IEnumerator) being specified by you. The construction of that class, with all of its members, is done for you.
Iterators are very developer friendly - things are done in a very efficient way, with much less effort.
When you use foreach, the two will behave identically (provided you write your custom IEnumerator correctly). Iterators just make life much simpler.
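To illustrate the difference in effort, here is a sketch of the same countdown sequence written both ways. CountdownEnumerator and Countdown.From are illustrative names; the class the compiler really generates for an iterator is more involved than the hand-written one shown here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written enumerator: the state (_index) is tracked explicitly.
public sealed class CountdownEnumerator : IEnumerator<int>
{
    private readonly int _start;
    private int _index = -1;   // -1 means "before the first item"

    public CountdownEnumerator(int start) { _start = start; }

    public int Current => _start - _index;
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (_index >= _start) return false;   // already yielded 0 (or nothing to yield)
        _index++;
        return true;
    }

    public void Reset() { _index = -1; }
    public void Dispose() { }
}

public static class Countdown
{
    // Iterator version: the compiler generates the state machine from this method.
    public static IEnumerator<int> From(int start)
    {
        for (int i = start; i >= 0; i--)
        {
            yield return i;
        }
    }
}
```

Both new CountdownEnumerator(3) and Countdown.From(3) produce 3, 2, 1, 0 when stepped with MoveNext and Current.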
What C# calls an iterator is more commonly (outside of the C# world) called a generator or generator function (e.g. in Python). A generator function is a specialized case of coroutine. A C# iterator (generator) is a special form of an enumerator (a data type implementing the IEnumerable interface).
I dislike this usage of the term iterator for a C# generator because it is just as much an enumerator as it is an iterator. Too late for Microsoft to change its mind though.
For contrast consider that in C++ an iterator is a value which is used primarily to access sequential elements in a collection. It can be advanced, dereferenced to retrieve a value, and tested to see whether the end of the collection has been reached.
"Whereas a foreach statement is the consumer of the enumerator, an iterator is the producer of the enumerator."
The above is how "C# 5.0 In A NutShell" explains it, and has been helpful for me.
In other words, the foreach statement uses MoveNext() and the Current property of the IEnumerator to iterate through a sequence, while the iterator is used to produce the implementation of the IEnumerator that will be used by the foreach statement. In C#, when you write an iterator method containing a yield statement, the compiler will generate a private enumerator for you. And when you iterate through the items in the sequence, it will call the MoveNext() method and the Current property of the private enumerator. These methods/properties are implemented by your code in the iterator method, which will be called repeatedly to yield values until there are no values left to yield.
This is my understanding of how C# defines enumerators and iterators.
To understand iterators we first need to understand enumerators.
Enumerators are specialist objects which provide one with the means to move through an ordered list of items one at a time (the same kind of thing is sometimes called a ‘cursor’). The .NET framework provides two important interfaces relating to enumerators: IEnumerator and IEnumerable. Objects which implement IEnumerator are themselves enumerators; they support the following members:
the property Current, which points to a position on the list
the method MoveNext, which moves the Current item one along the list
the method Reset, which moves the Current item to its initial position (which is before the first item).
On the other hand, iterators implement the enumerator pattern. .NET 2.0 introduced the iterator, which is a compiler-manifested enumerator. When the enumerable object calls GetEnumerator, either directly or indirectly, the compiler generates and returns an appropriate iterator object. Optionally, the iterator can be a combined enumerable and enumerator object.
The essential ingredient of an iterator block is the yield statement. There is one big difference between iterators and enumerators: Iterators do not implement the Reset method. Calling the Reset method on an iterator causes an exception.
The point of iterators is to allow the easy implementation of enumerators. Where a method needs to return either an enumerator or an enumerable class for an ordered list of items, it is written so as to return each item in its correct order using the ‘yield’ statement.
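The Reset behaviour is easy to check. The following sketch (ResetDemo and its members are hypothetical names) calls Reset on a compiler-generated enumerator and reports whether it is rejected with NotSupportedException, which is what the C# compiler's iterators throw.

```csharp
using System;
using System.Collections.Generic;

public static class ResetDemo
{
    // A small iterator; the compiler generates the enumerator class behind it.
    public static IEnumerator<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if Reset is rejected by the compiler-generated enumerator.
    public static bool ResetThrows()
    {
        using (IEnumerator<int> e = Numbers())
        {
            e.MoveNext();
            try
            {
                e.Reset();
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }
}
```

To start over, call the iterator method again and work with the new enumerator it returns.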
Since no examples were given, here is one that was helpful to me.
An enumerator is an object that you get when you call .GetEnumerator() on a class or type that implements the IEnumerable interface. When this interface is implemented, you have created all the code necessary for the compiler to enable you to use foreach to "iterate" over your collection.
Don't get that word "iterate" confused with iterator though. Both the enumerator and the iterator allow you to "iterate". Enumerating and iterating are basically the same process, but are implemented differently. Enumerating means you've implemented the IEnumerator interface yourself. Iterating means you've created the iterator construct in your class (demonstrated below), and you are calling foreach on your class, at which time the compiler automatically creates the enumerator functionality for you.
Also note that you don't have to do squat with your enumerator. You can call MyClass.GetEnumerator() all day long, and do nothing with it (example:
IEnumerator myEnumeratorThatIWillDoNothingWith = MyClass.GetEnumerator()).
Note too that your iterator construct in your class only really gets used when you are actually using it, i.e. you've called foreach on your class.
Here is an iterator example from msdn:
public class DaysOfTheWeek : System.Collections.IEnumerable
{
    string[] days = { "Sun", "Mon", "Tue", "Wed", "Thr", "Fri", "Sat" };
    // This is the iterator!!!
    public System.Collections.IEnumerator GetEnumerator()
    {
        for (int i = 0; i < days.Length; i++)
        {
            yield return days[i];
        }
    }
}
class TestDaysOfTheWeek
{
    static void Main()
    {
        // Create an instance of the collection class
        DaysOfTheWeek week = new DaysOfTheWeek();
        // Iterate with foreach - this is using the iterator!!! When the compiler
        // detects your iterator, it will automatically generate the Current,
        // MoveNext and Dispose methods of the IEnumerator or IEnumerator<T> interface
        foreach (string day in week)
        {
            System.Console.Write(day + " ");
        }
    }
}
// Output: Sun Mon Tue Wed Thr Fri Sat
"Iterators are a new feature in C# 2.0. An iterator is a method, get accessor or operator that enables you to support foreach iteration in a class or struct without having to implement the entire IEnumerable interface. Instead, you provide just an iterator, which simply traverses the data structures in your class. When the compiler detects your iterator, it will automatically generate the Current, MoveNext and Dispose methods of the IEnumerable or IEnumerable interface." - msdn
Enumeration deals with objects while iteration deals with values only. Enumeration is used when we use Vector, Hashtable, etc., while iteration is used in while loops, for loops, etc. I've never used the yield keyword so I couldn't tell you.
Iteration deals with arrays and strings while enumerating deals with objects
In JavaScript you can iterate an array or a string with :
forEach Loop
for loop
for of Loop
do while Loop
while Loop
And you can enumerate an object with :
for in Loop
Object.keys() Method
Object.values() Method
Object.entries() Method