How does IEnumerable<T> work in background - c#

I am wondering about the more in-depth functionality of the IEnumerable<T> interface.
Basically, it works as an intermediary step in execution. For example, if you write:
IEnumerable<int> temp = new int[]{1,2,3}.Select(x => 2*x);
The result of the Select function will not be calculated (enumerated) until something is done with temp to allow it (such as List<int> list = temp.ToList()).
However, what puzzles me is, since IEnumerable<T> is an interface, it cannot, by definition, be instantiated. So, what is the collection the actual items (in the example 2*x items) reside in?
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
I cannot seem to find a thorough (more in-depth) explanation as to the actual implementation of this interface and its functionality (for example, if there is an underlying collection, how does the yield keyword work).
Basically, what I am asking for is a more elaborate explanation on the functionality of IEnumerable<T>.

Implementation shouldn't matter. All these (LINQ) methods return IEnumerable<T>, interface members are the only members you can access, and that should be enough to use them.
However, if you really have to know, you can find actual implementations on http://sourceof.net.
Enumerable.cs
But for some of the methods you won't find an explicit class declaration, because they use yield return, which means the actual class (a state machine) is generated by the compiler during compilation. For example, Enumerable.Range is implemented that way:
public static IEnumerable<int> Range(int start, int count) {
long max = ((long)start) + count - 1;
if (count < 0 || max > Int32.MaxValue)
throw Error.ArgumentOutOfRange("count");
return RangeIterator(start, count);
}
static IEnumerable<int> RangeIterator(int start, int count) {
for (int i = 0; i < count; i++)
yield return start + i;
}
You can read more about that on MSDN: Iterators (C# and Visual Basic)
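The split between Range and RangeIterator in the listing above has a practical reason: code inside an iterator block does not run until the first MoveNext call, so an argument check placed there would fire late. A minimal sketch of the difference, using hypothetical names (ValidationTiming, LazyCheck, EagerCheck) that are not part of the framework:

```csharp
using System;
using System.Collections.Generic;

public static class ValidationTiming
{
    // Iterator block: the check only runs once enumeration starts.
    public static IEnumerable<int> LazyCheck(int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        for (int i = 0; i < count; i++)
            yield return i;
    }

    // Ordinary method: the check runs at the call site, then the iterator is returned.
    public static IEnumerable<int> EagerCheck(int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        return Iterate(count);
    }

    private static IEnumerable<int> Iterate(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }
}
```

Calling ValidationTiming.LazyCheck(-1) returns normally and throws only inside a later foreach, while ValidationTiming.EagerCheck(-1) throws immediately.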

Not all objects that implement IEnumerable defer execution in some way. The API of the interface makes it possible to defer execution, but it doesn't require it. There are likewise implementations that don't defer execution in any way.
So, what is the collection the actual items (in the example 2*x items) reside in?
There is none. Whenever the next value is requested it computes that one value on demand, gives it to the caller, and then forgets the value. It doesn't store it anywhere else.
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
There wouldn't be one. It would compute each new value immediately when you ask for the next value and it won't remember it afterward. It only stores enough information to be able to compute the next value, which means it only needs to store the element and the number of values left to yield.
While the actual .NET implementations will use much more concise means of creating such a type, creating an enumerable that defers execution is not particularly hard. Doing so even the long way is more tedious than difficult. You simply compute the next value in the MoveNext method of the iterator. In the example you asked about, Repeat, this is easy, as you only need to compute whether there is another value, not what it is:
public class Repeater<T> : IEnumerator<T>
{
private int count;
private T element;
public Repeater(T element, int count)
{
this.element = element;
this.count = count;
}
public T Current { get { return element; } }
object IEnumerator.Current
{
get { return Current; }
}
public void Dispose() { }
public bool MoveNext()
{
if (count > 0)
{
count--;
return true;
}
else
return false;
}
public void Reset()
{
throw new NotSupportedException();
}
}
(I've omitted an IEnumerable type that just returns a new instance of this type, or a static Repeat method that creates a new instance of that enumerable. There isn't anything particularly interesting to see there.)
A slightly more interesting example would be something like Count:
public class Counter : IEnumerator<int>
{
private int remaining;
public Counter(int start, int count)
{
Current = start - 1; // MoveNext increments first, so the first value read is start
this.remaining = count;
}
public int Current { get; private set; }
object IEnumerator.Current
{
get { return Current; }
}
public void Dispose() { }
public bool MoveNext()
{
if (remaining > 0)
{
remaining--;
Current++;
return true;
}
else
return false;
}
public void Reset()
{
throw new NotSupportedException();
}
}
Here we're not only computing if we have another value, but what that next value is, each time a new value is requested of us.

So, what is the collection the actual items (in the example 2*x items) reside in?
It is not residing anywhere. There is code that will produce the individual items "on demand" when you iterate, but the 2*x numbers are not computed upfront. They are also not stored anywhere, unless you call ToList or ToArray.
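One way to see this for yourself is to count how often the selector runs. The following is only an illustrative sketch (the DeferredSelect class and its method are made up for this purpose):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredSelect
{
    public static int[] CountSelectorCalls()
    {
        int calls = 0;
        IEnumerable<int> temp = new[] { 1, 2, 3 }.Select(x => { calls++; return 2 * x; });

        int beforeEnumeration = calls;   // 0: the selector has not run yet
        List<int> list = temp.ToList();  // enumerates once, selector runs per element
        int afterFirstPass = calls;      // 3
        int sum = temp.Sum();            // enumerates again: nothing was cached
        int afterSecondPass = calls;     // 6

        return new[] { beforeEnumeration, afterFirstPass, afterSecondPass, list.Count, sum };
    }
}
```

The second pass re-runs the selector, which shows that the doubled values were never stored by temp itself; only the list produced by ToList holds them.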
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
The same picture is here: the returned implementation of IEnumerable is not public, and it returns its items on demand, without storing them anywhere.
The C# compiler provides a convenient way to implement IEnumerable<T> without defining a class for it. All you need is to declare your method's return type as IEnumerable<T> and use yield return to supply values on an as-needed basis.
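As a sketch of that approach, here is a hand-rolled stand-in for Repeat. MySequences is a hypothetical class, and this is not the framework's actual source:

```csharp
using System.Collections.Generic;

public static class MySequences
{
    // Hypothetical stand-in for Enumerable.Repeat, written as an iterator block.
    // The only state kept is 'element', 'count' and the loop counter.
    public static IEnumerable<T> Repeat<T>(T element, int count)
    {
        for (int i = 0; i < count; i++)
            yield return element;
    }
}
```

The object returned by MySequences.Repeat(1, 10) is an instance of a compiler-generated class; it is neither an array nor a list, and it holds no buffer of ten 1s.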

Related

What is the default concrete type of IEnumerable

(Sorry for the vague title; couldn't think of anything better. Feel free to rephrase.)
So let's say my function or property returns an IEnumerable<T>:
public IEnumerable<Person> Adults
{
get
{
return _Members.Where(i => i.Age >= 18);
}
}
If I run a foreach on this property without actually materializing the returned enumerable:
foreach(var Adult in Adults)
{
//...
}
Is there a rule that governs whether IEnumerable<Person> will be materialized to array or list or something else?
Also is it safe to cast Adults to List<Person> or Array without calling ToList() or ToArray()?
Edit
Many people have spent a lot of effort into answering this question. Thanks to all of them. However, the gist of this question still remains unanswered. Let me put in some more details:
I understand that foreach doesn't require the target object to be an array or list. It doesn't even need to be a collection of any kind. All it needs the target object to do is to implement enumeration. However, if I inspect the value of the target object, it reveals that the actual underlying object is List<T> (just like it shows object (string) when you inspect a boxed string object). This is where the confusion starts. Who performed this materialization? I inspected the underlying layers (Where() function's source) and it doesn't look like those functions are doing this.
So my problem lies at two levels.
The first one is purely theoretical. Unlike many other disciplines like physics and biology, in computer science we always know precisely how something works (answering @zzxyz's last comment); so I was trying to find out which agent created the List<T>, how it decided to choose a List and not an Array, and whether there is a way of influencing that decision from our code.
My second reason was practical. Can I rely on the type of actual underlying object and cast it to List<T>? I need to use some List<T> functionality and I was wondering if for example ((List<Person>)Adults).BinarySearch() is as safe as Adults.ToList().BinarySearch()?
I also understand that it isn't going to create any performance penalty even if I do call ToList() explicitly. I was just trying to understand how it is working. Anyway, thanks again for the time; I guess I have spent just too much time on it.
In general terms, all you need for a foreach to work is an object with an accessible GetEnumerator() method that returns an object with the following members (a Reset() method is conventional, but foreach never calls it):
bool MoveNext()
T Current { get; } // where `T` is some type; only a readable getter is needed.
You don't even need an IEnumerable or IEnumerable<T>.
This code works as the compiler figures out everything it needs:
void Main()
{
foreach (var adult in new Adults())
{
Console.WriteLine(adult.ToString());
}
}
public class Adult
{
public override string ToString() => "Adult!";
}
public class Adults
{
public class Enumerator
{
public Adult Current { get; private set; }
public bool MoveNext()
{
if (this.Current == null)
{
this.Current = new Adult();
return true;
}
this.Current = null;
return false;
}
public void Reset() { this.Current = null; }
}
public Enumerator GetEnumerator() { return new Enumerator(); }
}
Having a proper enumerable makes the process work more easily and more robustly. The more idiomatic version of the above code is:
public class Adults
{
private class Enumerator : IEnumerator<Adult>
{
public Adult Current { get; private set; }
object IEnumerator.Current => this.Current;
public void Dispose() { }
public bool MoveNext()
{
if (this.Current == null)
{
this.Current = new Adult();
return true;
}
this.Current = null;
return false;
}
public void Reset()
{
this.Current = null;
}
}
public IEnumerator<Adult> GetEnumerator()
{
return new Enumerator();
}
}
This enables the Enumerator to be a private class, i.e. private class Enumerator. The interface then does all of the hard work - it's not even possible to get a reference to the Enumerator class outside of Adults.
The point is that you do not know at compile-time what the concrete type of the class is - and if you did you may not even be able to cast to it.
The interface is all you need, and even that isn't strictly true if you consider my first example.
If you want a List<Adult> or an Adult[] you must call .ToList() or .ToArray() respectively.
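To make the cast question concrete, here is a small sketch using plain integers in place of Person objects (CastCheck and its members are illustrative names, not from the question):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CastCheck
{
    public static bool WhereResultIsList()
    {
        var members = new List<int> { 12, 18, 40 };
        IEnumerable<int> adults = members.Where(age => age >= 18);
        // The source is a List<int>, but the object returned by Where is not.
        return adults is List<int>;
    }

    public static List<int> Materialize()
    {
        var members = new List<int> { 12, 18, 40 };
        // ToList always builds a real List<int>, whatever the source was.
        return members.Where(age => age >= 18).ToList();
    }
}
```

WhereResultIsList() returns false, so a direct cast such as (List<int>)adults would throw InvalidCastException at run time, whereas Materialize() is safe.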
There is no such thing as a default concrete type for any interface.
The entire point of an interface is to guarantee properties, methods, events or indexers, without the user needing any knowledge of the concrete type that implements it.
When using an interface, all you can know is the properties, methods, events and indexers this interface declares, and that's all you actually need to know. That's just another aspect of encapsulation - same as when you are using a method of a class you don't need to know the internal implementation of that method.
To answer your question in the comments:
who decides that concrete type in case we don't, just as I did above?
That's the code that created the instance that's implementing the interface.
Since you can't do var Adults = new IEnumerable<Person> - it has to be a concrete type of some sort.
As far as I can see in the source code for LINQ's Enumerable extensions, Where returns either an instance of Iterator<TSource> or an instance of WhereEnumerableIterator<TSource>. I didn't bother checking further what exactly those types are, but I can pretty much guarantee they both implement IEnumerable, or the guys at Microsoft are using a different C# compiler than the rest of us... :-)
The following code hopefully highlights why neither you nor the compiler can assume an underlying collection:
public class OneThroughTen : IEnumerable<int>
{
private static int bar = 0;
public IEnumerator<int> GetEnumerator()
{
while (true)
{
yield return ++bar;
if (bar == 10)
{ yield break; }
}
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
class Program
{
static void Main(string[] args)
{
IEnumerable<int> x = new OneThroughTen();
foreach (int i in x)
{ Console.Write("{0} ", i); }
}
}
Output being, of course:
1 2 3 4 5 6 7 8 9 10
Note, the code above behaves extremely poorly in the debugger. I don't know why. This code behaves just fine:
public IEnumerator<int> GetEnumerator()
{
while (bar < 10)
{
yield return ++bar;
}
bar = 0;
}
(I used static for bar to highlight that not only does the OneThroughTen not have a specific collection, it doesn't have any collection, and in fact has no instance data whatsoever. We could just as easily return 10 random numbers, which would've been a better example, now that I think on it :))
From your edited question and comments it sounds like you understand the general concept of using IEnumerable, and that you cannot assume that "a list object backs all IEnumerable objects". Your real question is about something that has confused you in the debugger, but we've not really been able to understand exactly what it is you are seeing. Perhaps a screenshot would help?
Here I have 5 IEnumerable<int> variables which I assign in various ways, along with how the "Watch" window describes them. Does this show the confusion you are having? If not, can you construct a similarly short program and screenshot that does?
Coming a bit late into the party here :)
Actually Linq's "Where" decides what's going to be the underlying implementation of IEnumerable's GetEnumerator.
Look at the source code:
https://github.com/dotnet/runtime/blob/918e6a9a278bc66fb191c43d4db4a71e63ffad31/src/libraries/System.Linq/src/System/Linq/Where.cs#L59
You'll see that based on the "source" type, the methods return "WhereSelectArrayIterator" or "WhereSelectListIterator" or a more generic "WhereSelectEnumerableSelector".
Each of these objects implements GetEnumerator over an Array or a List, so I'm pretty sure that's why you see the underlying object type being one of these in the VS inspector.
Hope this helps clarify.
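You can observe this dispatch without a debugger by printing the runtime type of each Where result. This is only a sketch (WhereTypes is a made-up helper), and the exact type names are an implementation detail that differs between .NET versions, so no particular output is claimed here:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WhereTypes
{
    public static string[] Describe()
    {
        int[] array = { 1, 2, 3 };
        var list = new List<int> { 1, 2, 3 };
        IEnumerable<int> iteratorSource = Numbers();

        // Same Where call, three different kinds of source.
        return new[]
        {
            array.Where(x => x > 1).GetType().Name,
            list.Where(x => x > 1).GetType().Name,
            iteratorSource.Where(x => x > 1).GetType().Name,
        };
    }

    private static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }
}
```

Printing string.Join(", ", WhereTypes.Describe()) shows three iterator type names; none of them is List<T> or an array type, even when the source is one.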
I have been digging into this myself. I believe the 'underlying type' is an iterator method, not an actual data structure type.
An iterator method defines how to generate the objects in a sequence when requested.
https://learn.microsoft.com/en-us/dotnet/csharp/iterators#enumeration-sources-with-iterator-methods
In my usecase/testing, the iterator is System.Linq.Enumerable.SelectManySingleSelectorIterator. I don't think this is a collection data type. It is a method that can enumerate IEnumerables.
Here is a snippet:
public IEnumerable<Item> ItemsToBuy { get; set; }
...
ItemsToBuy = Enumerable.Range(1, rng.Next(1, 20))
.Select(RandomItem(rng, market))
.SelectMany(e => e);
The property is IEnumerable and .SelectMany returns IEnumerable. So what is the actual collection data structure? I don't think there is one in how I am interpreting 'collection data structure'.
Also is it safe to cast Adults to List<Person> or Array without calling ToList() or ToArray()?
Not for me. When attempting to cast ItemsToBuy collection in a foreach loop I get the following runtime exception:
{"Unable to cast object of type 'SelectManySingleSelectorIterator`2[System.Collections.Generic.IEnumerable`1[CashMart.Models.Item],CashMart.Models.Item]' to type 'CashMart.Models.Item[]'."}
So I could not cast, but I could .ToArray(). I do suspect there is a performance hit as I would think that the IEnumerable would have to 'do things' to make it an array, including memory allocation for the array even if the entities are already in memory.
However, if I inspect the value of the target object, it reveals that the actual underlying object is List<T>
This was not my experience, and I think it may depend on the IEnumerable source as well as the LINQ provider. If I add a Where, the returned iterator is:
System.Linq.Enumerable.WhereEnumerableIterator
I am unsure what your _Members source is, but using LINQ to Objects, I get an iterator. LINQ to Entities must call the database, store the result set in memory somehow, and then enumerate over that result. I doubt that it internally makes it a List, but I don't know much about it. I suspect instead that _Members may be a List somewhere else in your code and thus, even after the .Where, it shows as a List.

How to get a reference to a part of a list without creating a copy in C#

I have a collection class derived from List<T> and need to perform a packaged action on an instance of that collection (i.e. the collection is too large, so I want to split the collection into parts and execute the action on the parts). Please note that action has a typed signature, it is some predefined method which expects a collection of the same type.
I do know
target.AddRange(source.GetRange(start, packageSize));
but that's not what I want, since for this I need a new collection instance and will create copies of the list entries. Since I know that the action will not manipulate the list in any way, I'd prefer to do something like
action(source.Reference(fromIndex, toIndex));
with the intention of not creating copies of the list entries (I know these are 'only' references, but still there is a copy). Is there any way to do that in C#?
Of course I could pass a method to action which knows how to retrieve the range, but action should not need to know about my intention to subdivide the execution into parts.
Instead of making a copy of the references into a new list, you can use LINQ, or just plain C# to get a part from the list.
Using Skip and Take:
var iterator = source.Skip(start).Take(end - start);
Since it is an IEnumerable, you can foreach over that from within the method called (so you might need to change your method signature):
foreach (var x in iterator)
{
...
}
Or create your own state machine. The resulting enumerable can be passed along to the method too:
private static IEnumerable<T> Take<T>(List<T> source, int start, int end)
{
for (int i = start; i < end; i++)
{
yield return source[i];
}
}
Just use Linq:
var notAList = source.Skip(fromIndex).Take(toIndex-fromIndex);
action(notAList);
This way you create an enumerator which will enumerate your items without allocating space for all of them.
You can use LINQ's Skip() and Take() for that:
public void MyAction<T>(IEnumerable<T> range)
{
// ...
}
and call
MyAction(list.Skip(fromIndex).Take(toIndex-fromIndex));
Note that these operators (Skip and Take) use deferred execution and are evaluated lazily. So you can indeed think of this as a kind of reference to the range in the source list.
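A quick way to convince yourself that nothing is copied when the query is built is to change the source afterwards. This is a sketch with made-up names (RangeView, Demo):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RangeView
{
    public static List<int> Demo()
    {
        var source = new List<int> { 10, 20, 30, 40, 50 };
        int fromIndex = 1, toIndex = 4;

        // No elements are copied here; 'view' only remembers how to walk 'source'.
        IEnumerable<int> view = source.Skip(fromIndex).Take(toIndex - fromIndex);

        source[2] = 99; // changed after the query was built, before it is enumerated

        return view.ToList(); // enumeration happens now: 20, 99, 40
    }
}
```

The ToList call at the end exists only to return the observed values; an action that accepts IEnumerable<T> can consume view directly. Changing the list while an enumeration is in progress is a different matter and is not covered by this sketch.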
You can create your own iterator:
using System.Collections;
public class MyPartListIterator : IEnumerator
{
    private readonly IList _list;
    private readonly int _startIdx;
    private readonly int _endIdx;
    private int _current;
    public MyPartListIterator(IList list, int start, int end)
    {
        //do some validations before etc
        this._list = list;
        this._startIdx = start;
        this._endIdx = end;
        // positioned before the first element until MoveNext is called
        this._current = start - 1;
    }
    public bool MoveNext()
    {
        //do some checks against list was changed
        if (this._current + 1 >= this._endIdx) return false;
        this._current += 1;
        return true;
    }
    public object Current => this._list[this._current];
    public void Reset() => this._current = this._startIdx - 1;
}
Then you can pass an instance of the iterator to your action.
Why is it better than Skip and Take? It doesn't enumerate the elements before the starting index.
You can also prepare a generic version, of course.
Written from memory, not tested. Uses C# 6.0 syntactic sugar.
EDIT:
It is not possible to pass a reference of the same IList<T> type as your big IList<T> that exposes only part of the source list's data.
This is where you can use Extensions to get the functionality you're looking for.
public static class ListExtensions
{
public static void PerformActionOnSubset<T>(this IList<T> collection, int fromIndex,
int toIndex, Action<IList<T>> action)
{
action(collection.Skip(fromIndex).Take(toIndex - fromIndex).ToList());
}
}
myCollectionOfSomethings.PerformActionOnSubset(10, 30, myAction);

GetIterator() and the iterator pattern

I'm trying to implement the Iterator pattern.
Basically, from what I understand, it makes a class "foreachble" and makes the code more secure by not revealing the exact collection type to the user.
I have been experimenting a bit and I found out that if I implement
IEnumerator GetEnumerator() in my class, I get the desired result ... seemingly sparing the headache of messing around with realizing interfaces.
Here is a glimpse to what I mean:
public class ListUserLoggedIn
{
/*
stuff
*/
public List<UserLoggedIn> UserList { get; set; }
public IEnumerator<UserLoggedIn> GetEnumerator()
{
foreach (UserLoggedIn user in this.UserList)
{
yield return user;
}
}
public void traverse()
{
foreach (var item in ListUserLoggedIn.Instance)
{
Console.Write(item.Id);
}
}
}
I guess my question is, is this a valid example of Iterator?
If yes, why is this working, and what can I do to make the iterator return only a part or an anonymous object via "var"?
If not, what is the correct way ...
First a smaller and simplified self-contained version:
class Program
{
public IEnumerator<int> GetEnumerator() // IEnumerable<int> works too.
{
for (int i = 0; i < 5; i++)
yield return i;
}
static void Main(string[] args)
{
var p = new Program();
foreach (int x in p)
{
Console.WriteLine(x);
}
}
}
And the 'strange' thing here is that class Program does not implement IEnumerable.
The specs from Ecma-334 say:
§ 8.18 Iterators
The foreach statement is used to iterate over the elements of an enumerable collection. In order to be enumerable, a collection shall have a parameterless GetEnumerator method that returns an enumerator.
So that's why foreach() works on your class. No mention of IEnumerable. But how does the GetEnumerator() produce something that implements Current and MoveNext ? From the same section:
An iterator is a statement block that yields an ordered sequence of values. An iterator is distinguished from a normal statement block by the presence of one or more yield statements
It is important to understand that an iterator is not a kind of member, but is a means of implementing a function member
So the body of your method is an iterator-block, the compiler checks a number of constraints (the method must return an IEnumerable or IEnumerator) and then implements the IEnumerator members for you.
And to the deeper "why", I just learned something too. Based on an annotation by Eric Lippert in "The C# Programming Language 3rd", page 369:
This is called the "pattern-based approach" and it dates from before generics. An interface based approach in C#1 would have been totally based on passing object around and value types would always have had to be boxed. The pattern approach allows
foreach (int x in myIntCollection)
without generics and without boxing. Neat.
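To make the pattern concrete, this sketch shows a foreach loop next to roughly what the compiler turns it into (the ForeachExpansion class is illustrative only, and the real expansion has a few more details, such as how disposal is handled for different enumerator types):

```csharp
using System.Collections.Generic;

public static class ForeachExpansion
{
    public static IEnumerable<int> FirstFive()
    {
        for (int i = 0; i < 5; i++)
            yield return i;
    }

    public static int SumWithForeach()
    {
        int sum = 0;
        foreach (int x in FirstFive())
            sum += x;
        return sum;
    }

    // Roughly what the loop above expands to: GetEnumerator, MoveNext, Current.
    public static int SumByHand()
    {
        int sum = 0;
        using (IEnumerator<int> e = FirstFive().GetEnumerator())
        {
            while (e.MoveNext())
            {
                int x = e.Current;
                sum += x;
            }
        }
        return sum;
    }
}
```

Both methods return 10; the only members the loop ever touches are GetEnumerator, MoveNext, Current and (via using) Dispose.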

The wonders of the yield keyword

Ok, as I was poking around with building a custom enumerator, I noticed this behavior concerning yield.
Say you have something like this:
public class EnumeratorExample
{
public static IEnumerable<int> GetSource(int startPoint)
{
int[] values = new int[]{1,2,3,4,5,6,7};
Contract.Invariant(startPoint < values.Length);
bool keepSearching = true;
int index = startPoint;
while(keepSearching)
{
yield return values[index];
//The mind reels here
index ++;
keepSearching = index < values.Length;
}
}
}
What makes it possible underneath the compiler's hood to execute the index ++ and the rest of the code in the while loop after you technically do a return from the function?
The compiler rewrites the code into a state machine. The single method you wrote is split up into different parts. Each time you call MoveNext (either implicitly or explicitly) the state is advanced and the correct block of code is executed.
Suggested reading if you want to know more details:
The implementation of iterators in C# and its consequences - Raymond Chen
Part 1
Part 2
Part 3
Iterator block implementation details: auto-generated state machines - Jon Skeet
Eric Lippert's blog
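The suspend-and-resume behavior described in this answer can be traced with a small log. This is a sketch with hypothetical names (YieldTrace, Source, Run), driving the enumerator by hand instead of through foreach:

```csharp
using System.Collections.Generic;

public static class YieldTrace
{
    public static IEnumerable<int> Source(List<string> log)
    {
        log.Add("start");
        yield return 1;
        log.Add("after first yield");
        yield return 2;
        log.Add("end");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        using (IEnumerator<int> e = Source(log).GetEnumerator())
        {
            log.Add("created");            // nothing from Source has run yet
            e.MoveNext();                  // runs up to the first yield return
            log.Add("got " + e.Current);
            e.MoveNext();                  // resumes right after the first yield return
            log.Add("got " + e.Current);
            e.MoveNext();                  // runs the tail of the method and returns false
        }
        return log;
    }
}
```

Run() produces the entries created, start, got 1, after first yield, got 2, end, in that order: the code after each yield return executes only when the next value is requested.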
The compiler generates a state-machine on your behalf.
From the language specification:
10.14 Iterators
10.14.4 Enumerator objects
When a function member returning an enumerator interface type is implemented using an iterator block, invoking the function member does not immediately execute the code in the iterator block. Instead, an enumerator object is created and returned. This object encapsulates the code specified in the iterator block, and execution of the code in the iterator block occurs when the enumerator object’s MoveNext method is invoked. An enumerator object has the following characteristics:
• It implements IEnumerator and IEnumerator<T>, where T is the yield type of the iterator.
• It implements System.IDisposable.
• It is initialized with a copy of the argument values (if any) and instance value passed to the function member.
• It has four potential states, before, running, suspended, and after, and is initially in the before state.
An enumerator object is typically an instance of a compiler-generated enumerator class that encapsulates the code in the iterator block and implements the enumerator interfaces, but other methods of implementation are possible. If an enumerator class is generated by the compiler, that class will be nested, directly or indirectly, in the class containing the function member, it will have private accessibility, and it will have a name reserved for compiler use (§2.4.2).
To get an idea of this, here's how Reflector decompiles your class:
public class EnumeratorExample
{
// Methods
public static IEnumerable<int> GetSource(int startPoint)
{
return new <GetSource>d__0(-2) { <>3__startPoint = startPoint };
}
// Nested Types
[CompilerGenerated]
private sealed class <GetSource>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
// Fields
private int <>1__state;
private int <>2__current;
public int <>3__startPoint;
private int <>l__initialThreadId;
public int <index>5__3;
public bool <keepSearching>5__2;
public int[] <values>5__1;
public int startPoint;
// Methods
[DebuggerHidden]
public <GetSource>d__0(int <>1__state)
{
this.<>1__state = <>1__state;
this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
}
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<values>5__1 = new int[] { 1, 2, 3, 4, 5, 6, 7 };
this.<keepSearching>5__2 = true;
this.<index>5__3 = this.startPoint;
while (this.<keepSearching>5__2)
{
this.<>2__current = this.<values>5__1[this.<index>5__3];
this.<>1__state = 1;
return true;
Label_0073:
this.<>1__state = -1;
this.<index>5__3++;
this.<keepSearching>5__2 = this.<index>5__3 < this.<values>5__1.Length;
}
break;
case 1:
goto Label_0073;
}
return false;
}
[DebuggerHidden]
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
EnumeratorExample.<GetSource>d__0 d__;
if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) && (this.<>1__state == -2))
{
this.<>1__state = 0;
d__ = this;
}
else
{
d__ = new EnumeratorExample.<GetSource>d__0(0);
}
d__.startPoint = this.<>3__startPoint;
return d__;
}
[DebuggerHidden]
IEnumerator IEnumerable.GetEnumerator()
{
return this.System.Collections.Generic.IEnumerable<System.Int32>.GetEnumerator();
}
[DebuggerHidden]
void IEnumerator.Reset()
{
throw new NotSupportedException();
}
void IDisposable.Dispose()
{
}
// Properties
int IEnumerator<int>.Current
{
[DebuggerHidden]
get
{
return this.<>2__current;
}
}
object IEnumerator.Current
{
[DebuggerHidden]
get
{
return this.<>2__current;
}
}
}
}
Yield is magic.
Well, not really. The compiler generates a full class to generate the enumeration that you're doing. It's basically sugar to make your life simpler.
Read this for an intro.
EDIT: The first link was wrong. It has been changed, so check again if you already looked.
That's one of the most complex parts of the C# compiler. Best read the free sample chapter of Jon Skeet's C# in Depth (or better, get the book and read it :-)
Implementing iterators the easy way
For further explanations see Marc Gravell's answer here:
Can someone demystify the yield keyword?
Here is an excellent blog series (from Microsoft veteran Raymond Chen) that details how yield works:
https://devblogs.microsoft.com/oldnewthing/20080812-00/?p=21273
https://devblogs.microsoft.com/oldnewthing/20080813-00/?p=21253
https://devblogs.microsoft.com/oldnewthing/20080814-00/?p=21243
https://devblogs.microsoft.com/oldnewthing/20080815-00/?p=21223

Why can't IEnumerator's be cloned?

In implementing a basic Scheme interpreter in C# I discovered, to my horror, the following problem:
IEnumerator doesn't have a clone method! (or more precisely, IEnumerable can't provide me with a "cloneable" enumerator).
What I'd like:
interface IEnumerator<T>
{
bool MoveNext();
T Current { get; }
void Reset();
// NEW!
IEnumerator<T> Clone();
}
I cannot come up with an implementation of IEnumerable that would not be able to supply an efficiently cloneable IEnumerator (vectors, linked lists, etc. all would be able to provide a trivial implementation of IEnumerator's Clone() as specified above... it would be easier than providing a Reset() method anyway!).
The absence of the Clone method means that any functional/recursive idiom of enumerating over a sequence won't work.
It also means I can't "seamlessly" make IEnumerable's behave like Lisp "lists" (for which you use car/cdr to enumerate recursively). i.e. the only implementation of "(cdr some IEnumerable)" would be woefully inefficient.
Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?
Can anyone suggest a workaround?
The logic is inexorable! IEnumerable doesn't support Clone, and you need Clone, so you shouldn't be using IEnumerable.
Or more accurately, you shouldn't be using it as the fundamental basis for work on a Scheme interpreter. Why not make a trivial immutable linked list instead?
public class Link<TValue>
{
private readonly TValue value;
private readonly Link<TValue> next;
public Link(TValue value, Link<TValue> next)
{
this.value = value;
this.next = next;
}
public TValue Value
{
get { return value; }
}
public Link<TValue> Next
{
get { return next; }
}
public IEnumerable<TValue> ToEnumerable()
{
for (Link<TValue> v = this; v != null; v = v.next)
yield return v.value;
}
}
Note that the ToEnumerable method gives you convenient usage in the standard C# way.
To answer your question:
Can anyone suggest a realistic,
useful, example of an IEnumerable
object that wouldn't be able to
provide an efficient "Clone()" method?
Is it that there'd be a problem with
the "yield" construct?
An IEnumerable can go anywhere in the world for its data. Here's an example that reads lines from the console:
IEnumerable<string> GetConsoleLines()
{
for (; ;)
yield return Console.ReadLine();
}
There are two problems with this: firstly, a Clone function would not be particularly straightforward to write (and Reset would be meaningless). Secondly, the sequence is infinite - which is perfectly allowable. Sequences are lazy.
Another example:
IEnumerable<int> GetIntegers()
{
for (int n = 0; ; n++)
yield return n;
}
For both these examples, the "workaround" you've accepted would not be much use, because it would just exhaust the available memory or hang up forever. But these are perfectly valid examples of sequences.
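By contrast, lazy operators compose with an unbounded sequence without trouble, because each element is pulled through the whole chain only when it is needed. A minimal sketch (InfiniteDemo and FirstEvenSquares are made-up names; GetIntegers is the method shown above):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class InfiniteDemo
{
    public static IEnumerable<int> GetIntegers()
    {
        for (int n = 0; ; n++)
            yield return n;
    }

    public static List<int> FirstEvenSquares(int howMany)
    {
        return GetIntegers()
            .Where(n => n % 2 == 0)   // filter is applied lazily, one element at a time
            .Select(n => n * n)
            .Take(howMany)            // stops pulling from the source once satisfied
            .ToList();
    }
}
```

FirstEvenSquares(4) returns 0, 4, 16, 36 and terminates, even though GetIntegers itself never ends.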
To understand C# and F# sequences, you need to look at lists in Haskell, not lists in Scheme.
In case you think the infinite stuff is a red herring, how about reading the bytes from a socket:
IEnumerable<byte> GetSocketBytes(Socket s)
{
byte[] buffer = new byte[100];
for (;;)
{
int r = s.Receive(buffer);
if (r == 0)
yield break;
for (int n = 0; n < r; n++)
yield return buffer[n];
}
}
If there is some number of bytes being sent down the socket, this will not be an infinite sequence. And yet writing Clone for it would be very difficult. How would the compiler generate the IEnumerable implementation to do it automatically?
As soon as there was a Clone created, both instances would now have to work from a buffer system that they shared. It's possible, but in practice it isn't needed - this isn't how these kinds of sequences are designed to be used. You treat them purely "functionally", like values, applying filters to them recursively, rather than "imperatively" remembering a location within the sequence. It's a little cleaner than low-level car/cdr manipulation.
Further question:
I wonder, what's the lowest level
"primitive(s)" I would need such that
anything I might want to do with an
IEnumerable in my Scheme interpreter
could be implemented in scheme rather
than as a builtin.
The short answer I think would be to look in Abelson and Sussman and in particular the part about streams. IEnumerable is a stream, not a list. And they describe how you need special versions of map, filter, accumulate, etc. to work with them. They also get onto the idea of unifying lists and streams in section 4.2.
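As a rough sketch of what those stream-style primitives might look like on top of IEnumerable, assuming nothing beyond foreach and yield return (the class name StreamOps and the method names are hypothetical, chosen to echo the book's terminology):

```csharp
using System;
using System.Collections.Generic;

public static class StreamOps
{
    // Lazy map: an element is transformed only when the consumer asks for it.
    public static IEnumerable<TResult> Map<T, TResult>(IEnumerable<T> source, Func<T, TResult> f)
    {
        foreach (T item in source)
            yield return f(item);
    }

    // Lazy filter: skips items until one satisfies the predicate.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Func<T, bool> keep)
    {
        foreach (T item in source)
            if (keep(item))
                yield return item;
    }

    // Accumulate consumes the whole sequence, so it only makes sense for finite input.
    public static TAcc Accumulate<T, TAcc>(IEnumerable<T> source, TAcc seed, Func<TAcc, T, TAcc> step)
    {
        TAcc acc = seed;
        foreach (T item in source)
            acc = step(acc, item);
        return acc;
    }
}
```

LINQ already ships equivalents of these three as Select, Where and Aggregate; the point of the sketch is only that each one can be written with a plain loop over the sequence, without any Clone.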
As a workaround, you could easily make an extension method for IEnumerator which did your cloning. Just create a list from the enumerator, and use the elements as members.
You'd lose the streaming capabilities of an enumerator, though, since your new "clone" would cause the first enumerator to fully evaluate.
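A minimal sketch of that buffering workaround might look like the following. The name CloneByBuffering is hypothetical, and the sketch assumes the source is finite, since it drains the enumerator completely before returning.

```csharp
using System.Collections.Generic;

public static class BufferingCloneExtensions
{
    // Hypothetical helper: drains the remaining items of 'source' into a list,
    // then hands out 'count' independent enumerators over that list.
    public static IEnumerator<T>[] CloneByBuffering<T>(this IEnumerator<T> source, int count)
    {
        var buffer = new List<T>();
        while (source.MoveNext())      // forces full evaluation of the source
            buffer.Add(source.Current);

        var clones = new IEnumerator<T>[count];
        for (int i = 0; i < count; i++)
            clones[i] = buffer.GetEnumerator();   // each call yields a separate enumerator
        return clones;
    }
}
```

Each returned enumerator can be advanced on its own, but for an infinite source such as GetIntegers the while loop would never finish.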
If you can let the original enumerator go, ie. not use it any more, you can implement a "clone" function that takes the original enumerator, and uses it as the source for one or more enumerators.
In other words, you could build something like this:
IEnumerable<String> original = GetOriginalEnumerable();
// AlmostClone is an extension method; here it produces 2 new enumerators
IEnumerator<String>[] newOnes = original.GetEnumerator().AlmostClone(2);
These could internally share the original enumerator, and a linked list, to keep track of the enumerated values.
This would allow for:
Infinite sequences, as long as both enumerators progress forward (the linked list would be written such that once both enumerators have passed a specific point, the nodes before it can be GC'ed)
Lazy enumeration: whichever enumerator first needs a value that hasn't been retrieved from the original enumerator yet obtains it and stores it in the linked list before yielding it
The problem here is, of course, that it would still require a lot of memory if one of the enumerators moves far ahead of the other one.
Here is the source code. If you use Subversion, you can download the Visual Studio 2008 solution file with a class library with the code below, as well as a separate unit test project.
Repository: http://vkarlsen.serveftp.com:81/svnStackOverflow/SO847655
Username and password are both 'guest', without the quotes.
Note that this code is not thread-safe, at all.
public static class EnumeratorExtensions
{
/// <summary>
/// "Clones" the specified <see cref="IEnumerator{T}"/> by wrapping it inside N new
/// <see cref="IEnumerator{T}"/> instances, each can be advanced separately.
/// See remarks for more information.
/// </summary>
/// <typeparam name="T">
/// The type of elements the <paramref name="enumerator"/> produces.
/// </typeparam>
/// <param name="enumerator">
/// The <see cref="IEnumerator{T}"/> to "clone".
/// </param>
/// <param name="clones">
/// The number of "clones" to produce.
/// </param>
/// <returns>
/// An array of "cloned" <see cref="IEnumerator{T}"/> instances.
/// </returns>
/// <remarks>
/// <para>The cloning process works by producing N new <see cref="IEnumerator{T}"/> instances.</para>
/// <para>Each <see cref="IEnumerator{T}"/> instance can be advanced separately, over the same
/// items.</para>
/// <para>The original <paramref name="enumerator"/> will be lazily evaluated on demand.</para>
/// <para>If one enumerator advances far beyond the others, the items it has produced will be kept
/// in memory until all cloned enumerators advanced past them, or they are disposed of.</para>
/// </remarks>
/// <exception cref="ArgumentNullException">
/// <para><paramref name="enumerator"/> is <c>null</c>.</para>
/// </exception>
/// <exception cref="ArgumentOutOfRangeException">
/// <para><paramref name="clones"/> is less than 2.</para>
/// </exception>
public static IEnumerator<T>[] Clone<T>(this IEnumerator<T> enumerator, Int32 clones)
{
#region Parameter Validation
if (Object.ReferenceEquals(null, enumerator))
throw new ArgumentNullException("enumerator");
if (clones < 2)
throw new ArgumentOutOfRangeException("clones");
#endregion
ClonedEnumerator<T>.EnumeratorWrapper wrapper = new ClonedEnumerator<T>.EnumeratorWrapper
{
Enumerator = enumerator,
Clones = clones
};
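// Sentinel head node: the clones start positioned before the first element.
// Current is undefined before the first MoveNext (some enumerators throw),
// so the sentinel holds default(T) instead of reading enumerator.Current.
ClonedEnumerator<T>.Node node = new ClonedEnumerator<T>.Node
{
Value = default(T),
Next = null
};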
IEnumerator<T>[] result = new IEnumerator<T>[clones];
for (Int32 index = 0; index < clones; index++)
result[index] = new ClonedEnumerator<T>(wrapper, node);
return result;
}
}
internal class ClonedEnumerator<T> : IEnumerator<T>, IDisposable
{
public class EnumeratorWrapper
{
public Int32 Clones { get; set; }
public IEnumerator<T> Enumerator { get; set; }
}
public class Node
{
public T Value { get; set; }
public Node Next { get; set; }
}
private Node _Node;
private EnumeratorWrapper _Enumerator;
public ClonedEnumerator(EnumeratorWrapper enumerator, Node firstNode)
{
_Enumerator = enumerator;
_Node = firstNode;
}
public void Dispose()
{
_Enumerator.Clones--;
if (_Enumerator.Clones == 0)
{
_Enumerator.Enumerator.Dispose();
_Enumerator.Enumerator = null;
}
}
public T Current
{
get
{
return _Node.Value;
}
}
Object System.Collections.IEnumerator.Current
{
get
{
return Current;
}
}
public Boolean MoveNext()
{
if (_Node.Next != null)
{
_Node = _Node.Next;
return true;
}
if (_Enumerator.Enumerator.MoveNext())
{
_Node.Next = new Node
{
Value = _Enumerator.Enumerator.Current,
Next = null
};
_Node = _Node.Next;
return true;
}
return false;
}
public void Reset()
{
throw new NotImplementedException();
}
}
This uses reflection to create a new instance and then sets the values on the new instance. I also found this chapter from C# in Depth to be very useful.
Iterator block implementation details: auto-generated state machines
static void Main()
{
var counter = new CountingClass();
var firstIterator = counter.CountingEnumerator();
Console.WriteLine("First list");
firstIterator.MoveNext();
Console.WriteLine(firstIterator.Current);
Console.WriteLine("First list cloned");
var secondIterator = EnumeratorCloner.Clone(firstIterator);
Console.WriteLine("Second list");
secondIterator.MoveNext();
Console.WriteLine(secondIterator.Current);
secondIterator.MoveNext();
Console.WriteLine(secondIterator.Current);
secondIterator.MoveNext();
Console.WriteLine(secondIterator.Current);
Console.WriteLine("First list");
firstIterator.MoveNext();
Console.WriteLine(firstIterator.Current);
firstIterator.MoveNext();
Console.WriteLine(firstIterator.Current);
}
public class CountingClass
{
public IEnumerator<int> CountingEnumerator()
{
int i = 1;
while (true)
{
yield return i;
i++;
}
}
}
public static class EnumeratorCloner
{
public static T Clone<T>(T source) where T : class, IEnumerator
{
var sourceType = source.GetType().UnderlyingSystemType;
var sourceTypeConstructor = sourceType.GetConstructor(new Type[] { typeof(Int32) });
var newInstance = sourceTypeConstructor.Invoke(new object[] { -2 }) as T;
var nonPublicFields = source.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
var publicFields = source.GetType().GetFields(BindingFlags.Public | BindingFlags.Instance);
foreach (var field in nonPublicFields)
{
var value = field.GetValue(source);
field.SetValue(newInstance, value);
}
foreach (var field in publicFields)
{
var value = field.GetValue(source);
field.SetValue(newInstance, value);
}
return newInstance;
}
}
This answer was also used on the following question: Is it possible to clone an IEnumerable instance, saving a copy of the iteration state?
The purpose of "clonable" enumerators is mainly to be able to save the iteration position and return to it later. That means the iterated container must provide a richer interface than just IEnumerable. It is actually something between IEnumerable and IList. Working with IList, you can just use an integer index as the enumerator, or create a simple immutable wrapper class holding a reference to the list and the current position.
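The immutable wrapper over an IList could be sketched roughly as follows. ListCursor and its members are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable "position" over an IList<T>.
public sealed class ListCursor<T>
{
    private readonly IList<T> _list;
    public int Index { get; private set; }   // set once in the constructor, never changed

    public ListCursor(IList<T> list, int index)
    {
        if (list == null) throw new ArgumentNullException("list");
        _list = list;
        Index = index;
    }

    // True while the cursor points at an existing element.
    public bool IsValid { get { return Index >= 0 && Index < _list.Count; } }

    public T Current { get { return _list[Index]; } }

    // Returns a new cursor; this one is left untouched, so it can be kept as a bookmark.
    public ListCursor<T> Next() { return new ListCursor<T>(_list, Index + 1); }
}
```

Because Next() never mutates the cursor it is called on, saving a position is just a matter of keeping a reference to a cursor, much like keeping an integer index.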
If your container does not support random access and can only be iterated forward (like one-directional linked list), it must at least provide ability to get next element, having a reference to the previous one or to some "iteration state" that you can hold in your iterator. So, the interface can look like this:
interface IIterable<T>
{
IIterator<T> GetIterator(); // returns an iterator positioned at start
IIterator<T> GetNext(IIterator<T> prev); // returns an iterator positioned at the next element from the given one
}
interface IIterator<T>
{
T Current { get; }
IEnumerable<T> AllRest { get; }
}
Note that the iterator is immutable, it can not be "moved forward", we only can ask our iterable container to give us a new iterator pointing to the next position. The benefit of that is that you can store iterators anywhere as long as you need, for example have a stack of iterators and return to previously saved position when you need. You can save current position for later use by assigning to a variable, just as you would do with an integer index.
The AllRest property can be useful if you need to iterate from the given position to the end of the container using standard language iteration features, like foreach or LINQ. It won't change the iterator position (remember, our iterator is immutable). The implementation can repeatedly call GetNext and yield return.
The GetNext method can actually be a part of iterator itself, like that:
interface IIterable<T>
{
IIterator<T> GetIterator(); // returns an iterator positioned at start
}
interface IIterator<T>
{
T Current { get; }
IIterator<T> GetNext { get; } // returns an iterator positioned at the next element from the given one
IEnumerable<T> AllRest { get; }
}
This is pretty much the same. The logic of determining the next state is just moved from the container implementation to the iterator implementation. Note that the iterator is still immutable. You can not "move it forward", you only can get another one, pointing to the next element.
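Here is one possible sketch of the second interface shape for a one-directional linked list. ListNode and NodeIterator are hypothetical types, and returning null from GetNext at the end of the list is an assumption of this sketch, since the interface above does not say how the end is signalled.

```csharp
using System.Collections.Generic;

public interface IIterator<T>
{
    T Current { get; }
    IIterator<T> GetNext { get; }      // assumption: null once the end is reached
    IEnumerable<T> AllRest { get; }
}

// Hypothetical immutable singly linked list node.
public sealed class ListNode<T>
{
    public readonly T Value;
    public readonly ListNode<T> Next;
    public ListNode(T value, ListNode<T> next) { Value = value; Next = next; }
}

public sealed class NodeIterator<T> : IIterator<T>
{
    private readonly ListNode<T> _node;
    public NodeIterator(ListNode<T> node) { _node = node; }

    public T Current { get { return _node.Value; } }

    // Builds a new iterator for the following node; this iterator is not modified.
    public IIterator<T> GetNext
    {
        get { return _node.Next == null ? null : new NodeIterator<T>(_node.Next); }
    }

    // Yields the current element and everything after it, using a local
    // variable for the walk so that this iterator keeps its position.
    public IEnumerable<T> AllRest
    {
        get
        {
            for (IIterator<T> it = this; it != null; it = it.GetNext)
                yield return it.Current;
        }
    }
}
```

A saved iterator stays valid after its AllRest has been enumerated, which is the property that an ordinary IEnumerator does not offer.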
Why not this as an extension method:
public static IEnumerator<T> Clone(this IEnumerator<T> original)
{
foreach(var v in original)
yield return v;
}
This would basically create and return a new enumerator without fully evaluating the original.
Edit: Yep, I misread. Paul is correct, this would only work with IEnumerable.
This might help. It needs some code to call the Dispose() on the IEnumerator:
class Program
{
static void Main(string[] args)
{
//var list = MyClass.DequeueAll().ToList();
//var list2 = MyClass.DequeueAll().ToList();
var clonable = MyClass.DequeueAll().ToClonable();
var list = clonable.Clone().ToList();
var list2 = clonable.Clone().ToList();
var list3 = clonable.Clone().ToList();
}
}
class MyClass
{
static Queue<string> list = new Queue<string>();
static MyClass()
{
list.Enqueue("one");
list.Enqueue("two");
list.Enqueue("three");
list.Enqueue("four");
list.Enqueue("five");
}
public static IEnumerable<string> DequeueAll()
{
while (list.Count > 0)
yield return list.Dequeue();
}
}
static class Extensions
{
public static IClonableEnumerable<T> ToClonable<T>(this IEnumerable<T> e)
{
return new ClonableEnumerable<T>(e);
}
}
class ClonableEnumerable<T> : IClonableEnumerable<T>
{
List<T> items = new List<T>();
IEnumerator<T> underlying;
public ClonableEnumerable(IEnumerable<T> underlying)
{
this.underlying = underlying.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return new ClonableEnumerator<T>(this);
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
private object GetPosition(int position)
{
if (HasPosition(position))
return items[position];
throw new IndexOutOfRangeException();
}
private bool HasPosition(int position)
{
lock (this)
{
while (items.Count <= position)
{
if (underlying.MoveNext())
{
items.Add(underlying.Current);
}
else
{
return false;
}
}
}
return true;
}
public IClonableEnumerable<T> Clone()
{
return this;
}
class ClonableEnumerator<T> : IEnumerator<T>
{
ClonableEnumerable<T> enumerable;
int position = -1;
public ClonableEnumerator(ClonableEnumerable<T> enumerable)
{
this.enumerable = enumerable;
}
public T Current
{
get
{
if (position < 0)
throw new Exception();
return (T)enumerable.GetPosition(position);
}
}
public void Dispose()
{
}
object IEnumerator.Current
{
get { return this.Current; }
}
public bool MoveNext()
{
if(enumerable.HasPosition(position + 1))
{
position++;
return true;
}
return false;
}
public void Reset()
{
position = -1;
}
}
}
interface IClonableEnumerable<T> : IEnumerable<T>
{
IClonableEnumerable<T> Clone();
}
There already is a way to create a new enumerator -- the same way you created the first one: IEnumerable.GetEnumerator. I'm not sure why you need another mechanism to do the same thing.
And in the spirit of the DRY principle, I'm curious as to why you would want the responsibility for creating new IEnumerator instances to be duplicated in both your enumerable and your enumerator classes. You would be forcing the enumerator to maintain additional state beyond what's required.
For example, imagine an enumerator for a linked list. For the basic implementation of IEnumerable, that class would only need to keep a reference to the current node. But to support your Clone, it would also need to keep a reference to the head of the list -- something it otherwise has no use for*. Why would you add that extra state to the enumerator, when you can just go to the source (the IEnumerable) and get another enumerator?
And why would you double the number of code paths you need to test? Every time you make a new way to manufacture an object, you're adding complexity.
* You would also need the head pointer if you implemented Reset, but according to the docs, Reset is only there for COM interop, and you're free to throw a NotSupportedException.
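To illustrate going back to the source for another enumerator, here is a small sketch. FreshEnumeratorDemo is a made-up name, and the sketch assumes the source can be enumerated more than once, as a list can.

```csharp
using System.Collections.Generic;

public static class FreshEnumeratorDemo
{
    // Advances 'first' twice and 'second' once, then reports where each one is.
    public static int[] AdvanceIndependently(IEnumerable<int> source)
    {
        using (IEnumerator<int> first = source.GetEnumerator())
        using (IEnumerator<int> second = source.GetEnumerator())
        {
            first.MoveNext();
            first.MoveNext();
            second.MoveNext();
            return new[] { first.Current, second.Current };
        }
    }
}
```

For a List<int> containing 10, 20 and 30 this returns 20 and 10: the two enumerators keep separate positions without any Clone method. Note that the second enumerator always starts from the beginning, so this does not copy a position reached part-way through.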
