Simple way to implement a Collection? - c#

I am developing a collection class, which should implement IEnumerator and IEnumerable.
In my first approach, I implemented them directly. Now I have discovered the yield keyword, and I have been able to simplify everything a whole lot substituting the IEnumerator/IEnumerable interfaces with a readonly property Values that uses yield to return an IEnumerable in a loop.
My question: is it possible to use yield in such a way that I could iterate over the class itself, without implementing IEnumerable/IEnumerator?
I.e., I want to have a functionality similar to the framework collections:
List<int> myList = new List<int>();
foreach (int i in myList)
{
...
}
Is this possible at all?
Update: It seems that my question was badly worded. I don't mind implementing IEnumerator or IEnumerable; I just thought the only way to do it was with the old Current/MoveNext/Reset methods.

You won't have to implement IEnumerable<T> or IEnumerable to get foreach to work - but it would be a good idea to do so. It's very easy to do:
public class Foo : IEnumerable<Bar>
{
public IEnumerator<Bar> GetEnumerator()
{
// Use yield return here, or
// just return Values.GetEnumerator()
}
// Explicit interface implementation for non-generic
// interface; delegates to generic implementation.
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
The alternative, which doesn't implement IEnumerable<T>, still provides a GetEnumerator() method (which could simply delegate to your Values property):
public class Foo
{
public IEnumerator<Bar> GetEnumerator()
{
// Use yield return here, or
// just return Values.GetEnumerator()
}
}
While this will work, it means you won't be able to pass your collection to anything expecting an IEnumerable<T>, such as LINQ to Objects.
It's a little-known fact that foreach will work with any type supporting a GetEnumerator() method which returns a type with appropriate MoveNext() and Current members. This was really to allow strongly-typed collections before generics, where iterating over the collection wouldn't box value types etc. There's really no call for it now, IMO.
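To make the two options concrete, here is a minimal sketch. The type names (NumberBag, PatternOnlyBag, ForeachDemo) are hypothetical and not from the question: the first class implements IEnumerable<int> with an iterator block, and the second relies only on the GetEnumerator() pattern that foreach looks for.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: implements IEnumerable<int> via an iterator block.
public class NumberBag : IEnumerable<int>
{
    private readonly List<int> items = new List<int>();

    public void Add(int value) { items.Add(value); }

    // The compiler builds the IEnumerator<int> state machine from yield return,
    // so no hand-written Current/MoveNext/Reset is needed.
    public IEnumerator<int> GetEnumerator()
    {
        foreach (int item in items)
            yield return item;
    }

    // Non-generic version delegates to the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

// Hypothetical type with no interfaces at all: foreach still accepts it
// because it exposes a public GetEnumerator() returning a suitable enumerator.
public class PatternOnlyBag
{
    private readonly int[] items = { 1, 2, 3 };

    public IEnumerator<int> GetEnumerator()
    {
        foreach (int item in items)
            yield return item;
    }
}

public static class ForeachDemo
{
    public static int Sum(PatternOnlyBag bag)
    {
        int total = 0;
        foreach (int i in bag) // compiles without IEnumerable
            total += i;
        return total;
    }
}
```

ForeachDemo.Sum(new PatternOnlyBag()) returns 6, but PatternOnlyBag cannot be passed to LINQ operators, whereas NumberBag can.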

You could do something like this, but why? IEnumerator is already simple.
public interface MyEnumerator<T>
{
// Interface members are implicitly public; no modifier is allowed here.
T GetNext();
}
public static class MyEnumeratorExtender
{
public static void MyForeach<T>(this MyEnumerator<T> enumerator,
Action<T> action)
{
T item = enumerator.GetNext();
while (item != null)
{
action.Invoke(item);
item = enumerator.GetNext();
}
}
}
I'd rather keep the in keyword, and I wouldn't want to rewrite LINQ.

Related

Why I can't use OrderBy despite having GetEnumerator and foreach working well?

I've implemented the GetEnumerator method for a simple class and was surprised that I couldn't order the enumerator with linq (a call to this.OrderBy(x => x) is invalid). Can someone please explain what's going on here? Am I doing something wrong or are enumerators only intended to be iterated over?
class Test
{
private Dictionary<int, string> dict
= new Dictionary<int, string>();
public IEnumerator<int> GetEnumerator()
{
return dict.Keys.GetEnumerator();
}
public Test()
{
dict[1] = "test";
dict[2] = "nothing";
}
public IEnumerable<int> SortedKeys
{
get { return this.OrderBy(x => x); } // illegal!
}
public void Print()
{
foreach(var key in this)
Console.WriteLine(dict[key]);
}
}
You have to implement the interface IEnumerable<int> for this.OrderBy to work; how else would it know that this can enumerate ints?
OrderBy requires this to implement IEnumerable<T>. It doesn't know that your GetEnumerator method is an attempt to comply with the interface.
foreach just requires a GetEnumerator() method; no interface implementation is needed.
// put in the interface
class Test : IEnumerable<int>
{
private Dictionary<int, string> dict
= new Dictionary<int, string>();
public IEnumerator<int> GetEnumerator()
{
return dict.Keys.GetEnumerator();
}
public Test()
{
dict[1] = "test";
dict[2] = "nothing";
}
public IEnumerable<int> SortedKeys
{
get { return this.OrderBy(x => x); } // legal now that Test implements IEnumerable<int>
}
public void Print()
{
foreach (var key in this)
Console.WriteLine(dict[key]);
}
// this one is required according to the interface too
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
An enumerator is an iterator. It's just an interface that tells the runtime or custom code how to move to the next element in a sequence, reset the iteration to the first element, or get the current element of the iteration.
That is, an enumerator isn't enumerable. An enumerable can create an enumerator to let other code enumerate the enumeration.
In order to be able to call a LINQ extension method you need the object to be enumerable. Your Test class doesn't implement IEnumerable<T> (LINQ extension method signatures look like this: public static IEnumerable<T> Whatever<T>(this IEnumerable<T> someEnumerable)).
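As an illustration of that signature shape, here is a small sketch. The names EveryOther, MyEnumerableExtensions and Keys are hypothetical, not part of LINQ or of the question. It shows a custom extension method on IEnumerable<T> and a class that becomes usable with both it and OrderBy once it implements the interface.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class MyEnumerableExtensions
{
    // Hypothetical operator with the same shape as the LINQ extension methods:
    // it is only offered on types that implement IEnumerable<T>.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        bool take = true;
        foreach (T item in source)
        {
            if (take) yield return item;
            take = !take;
        }
    }
}

// Hypothetical enumerable: because it implements IEnumerable<int>,
// both OrderBy and EveryOther can bind to it.
public class Keys : IEnumerable<int>
{
    private readonly int[] keys = { 3, 1, 2 };

    public IEnumerator<int> GetEnumerator()
    {
        // Cast needed: int[].GetEnumerator() returns the non-generic IEnumerator.
        return ((IEnumerable<int>)keys).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

With these definitions, new Keys().OrderBy(x => x) yields 1, 2, 3 and new Keys().EveryOther() yields 3, 2. Remove the ": IEnumerable<int>" and neither call compiles, even though foreach would still work.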
Since I want to apply DRY principle on myself (Don't Repeat Yourself), if you want to know how to implement IEnumerable<T> you should look at the following Q&A: How do I implement IEnumerable<T>.
OrderBy() is an extension method on IEnumerable<T>.
Your class does not implement IEnumerable<T>.
foreach still works, because it does not require you to implement IEnumerable<T>; it only requires that there is a method GetEnumerator().
So all you need to do is add:
class Test : IEnumerable<int>
and provide the implementation for the non-generic IEnumerable:
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}

Are there any tools that can help us refactor IEnumerator properties to IList<T> or similar?

We have a very old code base (that actually is not horrible quality). It dates back to when .NET was pre-release, which I suspect is the cause of some of these weird conventions.
Anyway, we just began to drop .NET 1.1 support and are having a field day converting things to generics and using LINQ and all that fun stuff. One of the most annoying patterns in our code base, though, is that we'll have something like
private ArrayList mylist;
public IEnumerator MyList
{
get
{
if(mylist==null)
return new EmptyEnumerator.Enumerator;
return mylist.GetEnumerator();
}
}
This pattern is particularly horrible because it prevents us from simply doing foreach(var item in MyList) because IEnumerator doesn't implement IEnumerable. Instead we must do something like this:
IEnumerator enumerator=MyList;
while(enumerator.MoveNext())
{
object item=enumerator.Current;
}
So, for refactoring, we of course want to use something like ReadOnlyCollection<T> or IList<T> or similar. To do this however, we have to update every single reference to MyList to do:
IEnumerator enumerator=MyList;
to
IEnumerator enumerator=MyList.GetEnumerator();
In some cases, we can have over a hundred references to one property. Are there any tools out there that can make this easier? We recently got Resharper(not for this issue, but just for general use), but it doesn't appear to cover this type of scenario.
What it sounds like you need to do is return a class that implements both IEnumerator and IEnumerable<T>.
It's not actually that hard to just make your own type that does this:
public class MessedUpIterator<T> : IEnumerable<T>, IEnumerator
{
private IEnumerable<T> source;
private IEnumerator enumerator;
private IEnumerator MyEnumerator
{
get
{
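// Cache the enumerator; otherwise every MoveNext/Current call would start a fresh one.
return enumerator ?? (enumerator = source.GetEnumerator());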
}
}
public MessedUpIterator(IEnumerable<T> source)
{
this.source = source;
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
object IEnumerator.Current
{
get { return MyEnumerator.Current; }
}
bool IEnumerator.MoveNext()
{
return MyEnumerator.MoveNext();
}
void IEnumerator.Reset()
{
MyEnumerator.Reset();
}
}
Now instead of returning either an IEnumerator or an IEnumerable<T> you can return something that does both.
Note that IEnumerator is implemented explicitly while IEnumerable<T> is implemented implicitly, which encourages using the object as an IEnumerable<T> while still making it possible to use it as an IEnumerator.
Yes, it's ugly, but it certainly could be worse.

Is it possible to determine if an IEnumerable<T> has deferred execution pending?

I have a function that accepts an IEnumerable<T>. I need to ensure that the enumerable is evaluated, but I'd rather not create a copy of it (e.g. via ToList() or ToArray()) if it is already a List or some other "frozen" collection. By "frozen" I mean collections where the set of items is already established, e.g. List, Array, FSharpSet, Collection, etc., as opposed to LINQ constructs like Select() and Where().
Is it possible to create a function "ForceEvaluation" that can determine whether the enumerable has deferred execution pending, and then evaluate the enumerable?
public void Process(IEnumerable<Foo> foos)
{
    IEnumerable<Foo> evaluatedFoos = ForceEvaluation(foos);
    EnterLockedMode(); // all the deferred processing needs to have been done before this line.
    foreach (Foo foo in evaluatedFoos)
    {
        Bar(foo);
    }
}
public IEnumerable<Foo> ForceEvaluation(IEnumerable<Foo> foos)
{
    if (??????)
    {
        return foos;
    }
    else
    {
        return foos.ToList();
    }
}
After some more research I've realized that this is pretty much impossible in any practical sense, and would require complex code inspection of each iterator.
So I'm going to go with a variant of Mark's answer: create a white-list of known safe types and just call ToList() on anything that is not on the white-list.
Thank you all for your help.
Edit*
After even more reflection, I've realized that this is equivalent to the halting problem. So very impossible.
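A minimal sketch of the white-list idea described above might look like the following. The class name EnumerableForcing and the particular types on the list are assumptions for illustration; the actual list would depend on which collections your code base treats as safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableForcing
{
    // Returns the input unchanged when its runtime type is on a white-list of
    // already-materialized collections; otherwise copies it into a List<T>.
    public static IEnumerable<T> ForceEvaluation<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        // White-list (an assumption; extend as needed). Note that "is" also
        // matches subclasses of List<T> and HashSet<T>.
        if (source is T[] || source is List<T> || source is HashSet<T>)
            return source;

        // Anything else is treated as potentially deferred and is evaluated
        // once here, at the cost of a copy.
        return source.ToList();
    }
}
```

Passing a List<int> returns the same instance, while passing list.Where(...) returns a new List<int>. This only guarantees that deferred work has run; it does not stop a caller from mutating a white-listed collection afterwards.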
Something that worked for me:
IEnumerable<T> deferred = someArray.Where(somecondition);
if (deferred.GetType().UnderlyingSystemType.Namespace.Equals("System.Linq"))
{
    // this is a deferred-execution IEnumerable
}
You could try a hopeful check against IList<T> or ICollection<T>, but note that these can still be implemented lazily - but it is much rarer, and LINQ doesn't do that - it just uses iterators (not lazy collections). So:
var list = foos as IList<Foo>;
if(list != null) return list; // unchanged
return foos.ToList();
Note that this is different to the regular .ToList(), which gives you back a different list each time, to ensure nothing unexpected happens.
Most concrete collection types (including T[] and List<T>) satisfy IList<T>. I'm not familiar with the F# collections - you'd need to check that.
I would avoid it if you want to make sure it is "frozen". Both array elements and List<> can be changed at any time (think of the infamous "collection changed during iteration" exception). If you really need to make sure the IEnumerable is evaluated AND not changing underneath your code, then copy all items into your own List/Array.
There could be other reasons to try it, e.g. some operations inside the runtime check whether a collection is an array in order to optimize, or have special versions for more specific interfaces such as ICollection or IQueryable in addition to the generic IEnumerable.
EDIT: Example of collection changing during iteration:
IEnumerable<T> collectionAsEnumerable = collection;
foreach(var i in collectionAsEnumerable)
{
// something like following can be indirectly called by
// synchronous method on the same thread
collection.Add(i.Clone());
collection[3] = 33;
}
If it is possible to use a wrapper in your case, you could do something like this:
public class ForceableEnumerable<T> : IEnumerable<T>
{
IEnumerable<T> _enumerable;
IEnumerator<T> _enumerator;
public ForceableEnumerable(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
}
public void ForceEvaluation()
{
if (_enumerator != null) {
while (_enumerator.MoveNext()) {
}
}
}
#region IEnumerable<T> Members
public IEnumerator<T> GetEnumerator()
{
_enumerator = _enumerable.GetEnumerator();
return _enumerator;
}
#endregion
#region IEnumerable Members
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}
Or implement the force method like this if you want to evaluate in any case
public void ForceEvaluation()
{
if (_enumerator == null) {
_enumerator = _enumerable.GetEnumerator();
}
while (_enumerator.MoveNext()) {
}
}
EDIT:
If you want to ensure that the enumeration is evaluated only once in any case, you could change GetEnumerator to
public IEnumerator<T> GetEnumerator()
{
if (_enumerator == null) {
_enumerator = _enumerable.GetEnumerator();
}
return _enumerator;
}

C# - Why implement two versions of Current when realizing IEnumerable Interface?

I assume the following sample gives a best practice that we should follow when we implement the IEnumerable interface.
https://learn.microsoft.com/en-us/dotnet/api/system.collections.ienumerator.movenext
Here is the question:
Why should we provide two versions of the Current property?
When is version ONE (object IEnumerator.Current) used?
When is version TWO (public Person Current) used?
How do we use PeopleEnum in a foreach statement? // updated
public class PeopleEnum : IEnumerator
{
public Person[] _people;
// Enumerators are positioned before the first element
// until the first MoveNext() call.
int position = -1;
public PeopleEnum(Person[] list)
{
_people = list;
}
public bool MoveNext()
{
position++;
return (position < _people.Length);
}
public void Reset()
{
position = -1;
}
// explicit interface implementation
object IEnumerator.Current /// **version ONE**
{
get
{
return Current;
}
}
public Person Current /// **version TWO**
{
get
{
try
{
return _people[position];
}
catch (IndexOutOfRangeException)
{
throw new InvalidOperationException();
}
}
}
}
The IEnumerator.Current is an explicit interface implementation.
You can only use it if you cast the iterator to an IEnumerator (which is what the framework does with foreach). In other cases, the second version will be used.
You will see that it returns object and actually uses the other implementation which returns a Person.
The second implementation is not required per se by the interface, but is there as a convenience and in order to return the expected type instead of object.
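The difference between the two members can be seen in a short sketch. It uses a trimmed-down copy of the enumerator from the question, a hypothetical stand-in for Person, and a hypothetical CurrentDemo helper class. Which Current gets called depends only on the compile-time type of the variable.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for the Person class used by the sample.
public class Person
{
    public string Name;
    public Person(string name) { Name = name; }
}

// Trimmed-down enumerator with the same two Current members as the question.
public class PeopleEnum : IEnumerator
{
    private readonly Person[] people;
    private int position = -1;

    public PeopleEnum(Person[] list) { people = list; }

    public bool MoveNext() { position++; return position < people.Length; }
    public void Reset() { position = -1; }

    object IEnumerator.Current { get { return Current; } }     // version ONE
    public Person Current { get { return people[position]; } } // version TWO
}

public static class CurrentDemo
{
    public static string FirstNameTyped(Person[] people)
    {
        PeopleEnum e = new PeopleEnum(people);
        e.MoveNext();
        return e.Current.Name; // version TWO: already typed as Person
    }

    public static string FirstNameViaInterface(Person[] people)
    {
        IEnumerator e = new PeopleEnum(people);
        e.MoveNext();
        return ((Person)e.Current).Name; // version ONE: object, needs a cast
    }
}
```

Both helpers return the same name; the typed version simply avoids the cast at the call site.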
Long-form implementation of IEnumerator is no longer necessary:
public class PeopleEnum : IEnumerable
{
public Person[] _people;
public PeopleEnum(Person[] list)
{
_people = list;
}
public IEnumerator GetEnumerator()
{
foreach (Person person in _people)
yield return person;
}
}
And to further bring it into the 21st century, don't use the non-generic IEnumerable:
public class PeopleEnum : IEnumerable<Person>
{
public Person[] _people;
public PeopleEnum(Person[] list)
{
_people = list;
}
public IEnumerator<Person> GetEnumerator()
{
foreach (Person person in _people)
yield return person;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
I suspect the reason is that this code example was derived from an example class implementing IEnumerator<T> - if the example class PeopleEnum implemented IEnumerator<T> this approach would be required: IEnumerator<T> inherits IEnumerator so you have to implement both interfaces when implementing IEnumerator<T>.
The implementation of the non-generic IEnumerator requires Current to return object - the strongly typed IEnumerator<T> on the other hand requires Current to return an instance of type T - using explicit and direct interface implementation is the only way to fulfill both requirements.
It is there for convenience, e.g. using PeopleEnum.Current in a type-safe way in a while (p.MoveNext()) loop rather than in a foreach enumeration.
But the only thing you need to do is implement the interface. You could do it implicitly if you wish, but is there a reason to? What if I wanted to add a MovePrevious method to the class? Would it be nice to have to cast the object to Person every time?
If you think the class could be extended with more manipulation methods, the Person Current is a nice thing to have.
Version two isn't part of the interface. You have to satisfy the interface requirements.

Does dot net have an interface like IEnumerable with a Count property?

Does dot net have an interface like IEnumerable with a count property? I know about interfaces such as IList and ICollection which do offer a Count property but it seems like these interfaces were designed for mutable data structures first and use as a read only interface seems like an afterthought - the presence of an IsReadOnly field and mutators throwing exceptions when this property is true is IMO ample evidence for this.
For the time being I am using a custom interface called IReadOnlyCollection (see my own answer to this post) but I would be glad to know of other alternative approaches.
The key difference between the ICollection family and the IEnumerable family is the absence of certainty as to the count of items present (quite often the items will be generated/loaded/hydrated as needed) - in some cases, an Enumerable may not ever finish generating results, which is why the Count is missing.
Deriving and adding a Count is possible depending on your requirements, but it goes against this spirit, which is the purpose of ICollection - a collection of stuff that's all there.
Another way might be to use the System.Linq.Enumerable.Count method, i.e.
using System.Linq;
class X
{
void Y(IEnumerable<int> collection)
{
int itemCount = collection.Count();
}
}
or use the (System.Linq.Enumerable) .ToList() to pull all the items from the enumerator into a Collection and work from there.
(Also, to answer your comment before having 50 rep: the ".Count()" bit is a call to an extension method defined in the static class System.Linq.Enumerable. The extension method is available on anything that implements IEnumerable<T> because the code has a "using System.Linq", which brings the extension methods of all classes in that namespace into scope; in this case the method lives in the class Enumerable. If you're in VS, pressing F12 will take you to the definition in System.Linq.Enumerable. BTW, C# in Depth is a fantastic book for learning LINQ properly. It's a page-turner that really helps you get the whole picture, compared to learning the bits of LINQ piece by piece.)
As of .Net 4.5, there are two new interfaces for this: IReadOnlyCollection<T> and IReadOnlyList<T>.
IReadOnlyCollection<T> is IEnumerable<T> with a Count property added, IReadOnlyList<T> also adds indexing.
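As a small sketch of how these interfaces can be consumed (the ReadOnlyDemo class and its method names are hypothetical), a method can ask for exactly "enumerable plus Count" or "enumerable plus Count plus indexer". List<T> and arrays both implement these interfaces from .NET 4.5 onwards, so they can be passed directly.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOnlyDemo
{
    // Needs only enumeration and Count; no Add/Remove is visible to this method.
    public static string Describe(IReadOnlyCollection<int> items)
    {
        int sum = 0;
        foreach (int item in items)
            sum += item;
        return "Count:" + items.Count + " Sum:" + sum;
    }

    // IReadOnlyList<T> additionally exposes a read-only indexer.
    public static int Second(IReadOnlyList<int> items)
    {
        return items[1];
    }
}
```

For example, ReadOnlyDemo.Describe(new List<int> { 1, 2, 3, 4 }) returns "Count:4 Sum:10", and ReadOnlyDemo.Second(new[] { 7, 8, 9 }) returns 8. These interfaces are read-only views, not guarantees of immutability: a callee could still downcast to List<int>, so wrap the list if that matters.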
It sounds like you really just want ReadOnlyCollection<T> - expose it as IList<T>, but by wrapping the original list like this you just get a read-only wrapper with an appropriate count.
Taking into consideration some of the comments I have decided to go with a wrapper class implementing a custom interface...
interface IReadOnlyCollection<T> : IEnumerable<T>
{
int Count { get; }
}
// This can no longer be misused by downcasting to List
// The wrapper can also be used with lists since IList<T> inherits from ICollection<T>
public class CollectionWrapper<T> : IReadOnlyCollection<T>
{
public CollectionWrapper(ICollection<T> collection)
{
_collection = collection;
}
public int Count
{
get
{
return _collection.Count;
}
}
public IEnumerator<T> GetEnumerator()
{
return (IEnumerator<T>)_collection.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return (IEnumerator)((IEnumerable)_collection).GetEnumerator();
}
////////Private data///////
ICollection<T> _collection;
}
class Program
{
static void Main(string[] args)
{
List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);
CollectionWrapper<int> collection = new CollectionWrapper<int>(list);
Console.WriteLine("Count:{0}", collection.Count);
foreach (var x in collection)
{
Console.WriteLine(x);
}
foreach (var x in (IEnumerable)collection)
{
Console.WriteLine(x);
}
}
}
Thanks all for your suggestions.
Edit: Now cannot be misused by downcasting to List (or whatever).
IList can return IsReadOnly as true, which marks the collection as readonly. Other than that I'm afraid I don't know of anything fitting.
Since it's an interface, you would have to implement the Count property yourself. Why not create a new interface that inherits from IEnumerable<T> and adds a Count property?
IList or ICollection would be the way to go, if you want to use the standard interfaces.
Note that you can "hide" methods required by the interface if you don't want them in your class's public interface -- for example, since it's meaningless to add things to a readonly collection you can do this:
void ICollection<DataType>.Add(DataType item)
{
throw new NotSupportedException();
}
public DataType this[int index]
{
get { return InnerList[index]; }
}
DataType IList<DataType>.this[int index]
{
get { return this[index]; }
set { throw new NotSupportedException(); }
}
etc.
An array can be cast to an IList<T>, in which case IsReadOnly returns true (the non-generic IList.IsReadOnly on an array returns false) :)
You can get .Count() on an IEnumerable<T> through an extension method if you import the System.Linq namespace (in .NET 3.5 anyway).
As Jon Skeet mentions, you're much better off using System.Collections.ObjectModel.ReadOnlyCollection instead of creating your own wrapper class.
Then you can implement your sample as follows:
class Program {
static void Main(string[] args) {
List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);
ReadOnlyCollection<int> collection = new ReadOnlyCollection<int>(list);
Console.WriteLine("Count:{0}", collection.Count);
foreach (var x in collection) {
Console.WriteLine(x);
}
foreach (var x in (IEnumerable)collection) {
Console.WriteLine(x);
}
}
}
