What does a HashSet Enumerator do? - c#

I'm not used to writing C# code, only Java and Python.
Now I have found a code example for an algorithm that is only available in C#.
There is one construct I don't understand: the Enumerator.
HashSet<Item>.Enumerator enumerator = hashSet.GetEnumerator();
enumerator.MoveNext();
Item next = enumerator.Current;
So Item is the data type stored in the HashSet hashSet. Is this equivalent to a for-each loop iterating over the HashSet, or how else could it be translated into Python or Java?

GetEnumerator() methods are provided by C# data structures such as List<T>, HashSet<T>, etc. They enable iterating through the elements. In fact, foreach uses them internally.
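The snippet in the question takes the first element the set's enumerator produces, roughly what iterator().next() does in Java or next(iter(s)) in Python. A minimal C# sketch of that, plus a hand-written loop equivalent to foreach, might look like this. HashSetEnumeration, TryGetAny and ToList are illustrative names, not framework APIs, and since HashSet<T> does not guarantee any ordering, "first" only means "some element".

```csharp
using System.Collections.Generic;

public static class HashSetEnumeration
{
    // What the question's snippet does, but checking MoveNext's result:
    // on an empty set there is no element to read.
    public static bool TryGetAny<T>(HashSet<T> set, out T item)
    {
        HashSet<T>.Enumerator enumerator = set.GetEnumerator();
        if (enumerator.MoveNext())       // false when the set is empty
        {
            item = enumerator.Current;   // the element the cursor is now on
            return true;
        }
        item = default(T);
        return false;
    }

    // A hand-written loop equivalent to: foreach (T x in set) result.Add(x);
    public static List<T> ToList<T>(HashSet<T> set)
    {
        var result = new List<T>();
        using (IEnumerator<T> enumerator = set.GetEnumerator())
        {
            while (enumerator.MoveNext())
                result.Add(enumerator.Current);
        }
        return result;
    }
}
```

The while loop in ToList is the shape a foreach over the set expands to, so the question's three lines are just the first step of that loop.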
The foreach statement iterates through the elements of certain data structures. A foreach can be used when all of the following conditions hold:
The data structure implements either IEnumerable (which exists to support legacy code written before generics) or IEnumerable<T> for some type T.
You do not need to know the locations of the individual elements within the data structure.
For example, the string class implements both IEnumerable and IEnumerable<Char>.
Implementing the IEnumerable<T> interface requires the data structure to provide two methods:
public IEnumerator<T> GetEnumerator()
IEnumerator IEnumerable.GetEnumerator()
The latter method is required only because IEnumerable<T> is a subtype of IEnumerable, and that interface requires a GetEnumerator method that returns a non-generic IEnumerator. Both of these methods should return the same object; hence, because IEnumerator<T> is also a subtype of IEnumerator, this method can simply call the first method:
System.Collections.IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
As you can see, the IEnumerable.GetEnumerator() method returns a reference to another interface named System.Collections.IEnumerator. This interface provides the infrastructure to allow the caller to traverse the internal objects contained by the IEnumerable-compatible container:
public interface IEnumerator
{
bool MoveNext (); // Advance the internal position of the cursor.
object Current { get;} // Get the current item (read-only property).
void Reset (); // Reset the cursor to its position before the first element.
}
Here is an example:
public class PowersOfThree : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
return new PowersOfThreeEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
internal class PowersOfThreeEnumerator : IEnumerator<int>
{
private int index = 0;
public int Current
{
get { return (int)System.Math.Pow(3, index); }
}
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
index++;
if (index > 10)
return false;
else
return true;
}
public void Reset()
{
index = 0;
}
public void Dispose()
{
}
}
public class Test
{
public static void Main(string[] str)
{
var p2 = new PowersOfThree();
foreach (int p in p2)
{
System.Console.WriteLine(p);
}
}
}
The Current property returns the same element until MoveNext is called. The index starts at 0; each call to MoveNext increments it, returning true while the index is between 1 and 10 inclusive and false once it passes 10. Once the enumerator is in that position, subsequent calls to MoveNext also return false.
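Do you see what happens to Current once MoveNext has returned false? (Here it keeps computing powers beyond 3^10, because this Current never checks the index.) Can you move Current back to the first element of the collection?
Not without a new enumerator. This example happens to implement Reset, but foreach never calls it, and many enumerators (including those generated from iterator blocks) throw NotSupportedException from Reset, so in practice you call GetEnumerator() again.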
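As a sketch of how the enumerator above could guard its Current property, here is a hypothetical variant (PowersOfThreeCursor and CursorDemo are illustrative names, not part of the original answer). It produces the same ten values, but Current throws InvalidOperationException whenever the cursor is not on an element, which is the usual convention for IEnumerator<T>. Once the cursor is exhausted, the only way to start over is a new enumerator.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical variant of PowersOfThreeEnumerator with a guarded Current.
public sealed class PowersOfThreeCursor : IEnumerator<int>
{
    private const int MaxExponent = 10;
    private int index = 0; // 0 means "before the first element"

    public int Current
    {
        get
        {
            if (index < 1 || index > MaxExponent)
                throw new InvalidOperationException("Enumerator is not positioned on an element.");
            return (int)Math.Pow(3, index);
        }
    }

    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (index <= MaxExponent) index++; // stop advancing once past the end
        return index <= MaxExponent;
    }

    public void Reset() => index = 0;
    public void Dispose() { }
}

public static class CursorDemo
{
    // Drains one cursor and returns every value it produced.
    public static List<int> Drain(IEnumerator<int> e)
    {
        var values = new List<int>();
        while (e.MoveNext())
            values.Add(e.Current);
        return values;
    }
}
```

Draining the same cursor a second time yields nothing, while draining a freshly constructed one yields all ten values again.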

Related

Why not use yield instead of List<T>.Enumerator

I just found that there's a nested struct named Enumerator within the List<T> class.
// Returns an enumerator for this list with the given
// permission for removal of elements. If modifications made to the list
// while an enumeration is in progress, the MoveNext and
// GetObject methods of the enumerator will throw an exception.
//
public Enumerator GetEnumerator() {
return new Enumerator(this);
}
[Serializable]
public struct Enumerator : IEnumerator<T>, System.Collections.IEnumerator
{
private List<T> list;
private int index;
private int version;
private T current;
internal Enumerator(List<T> list) {
this.list = list;
index = 0;
version = list._version;
current = default(T);
}
//Omit other code
}
As far as I know, the compiler translates yield into a nested type similar to the struct above at the IL level, so why doesn't Microsoft just use yield directly?
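The sketch below shows two observable differences between List<T>'s hand-written enumerator and a yield-based one. It illustrates the trade-off rather than documenting Microsoft's stated reasoning; EnumeratorKinds and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorKinds
{
    // An iterator block: the compiler generates a hidden class implementing IEnumerator<int>.
    public static IEnumerable<int> YieldItems(List<int> list)
    {
        foreach (int x in list)
            yield return x;
    }

    // List<T>.Enumerator is a struct, so foreach over a List<T> variable need not allocate.
    public static bool ListEnumeratorIsStruct() => typeof(List<int>.Enumerator).IsValueType;

    // The enumerator produced by an iterator block is an instance of a compiler-generated class.
    public static bool IteratorEnumeratorIsStruct(List<int> list) =>
        YieldItems(list).GetEnumerator().GetType().IsValueType;

    // Modifying the list during enumeration makes MoveNext throw,
    // which is what the version field in the source above is used for.
    public static bool DetectsModification()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int x in list)
                list.Add(x);
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }
}
```

Each call to an iterator method allocates the generated object, whereas the public struct GetEnumerator() lets the compiler's pattern-based foreach avoid that allocation.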

My class that implements IEnumerator and IEnumerable doesn't go to foreach statement

I have a class that stores a list of strings, and I would like to make it usable in a foreach statement, so I found these two interfaces and tried to implement them.
public class GroupCollection : IEnumerable, IEnumerator
{
public List<string> Groups { get; set; }
public int Count { get { return Groups.Count; } }
int position = -1;
public IEnumerator GetEnumerator()
{
return (IEnumerator)this;
}
public object Current
{
get
{
try
{
return new Group(Groups[position]);
}
catch (ArgumentOutOfRangeException)
{
throw new InvalidOperationException();
}
}
}
public bool MoveNext()
{
position++;
return position < Groups.Count;
}
public void Reset()
{
position = -1;
}
}
I'm iterating through a GroupCollection variable twice:
foreach (GroupCollection.Group group in groups) // where groups is a GroupCollection
{
}
foreach (GroupCollection.Group group in groups)
{
}
// where Group is a nested class in GroupCollection.
The first foreach works well (Count is 1 at this point). I don't modify anything, but the second foreach never enters the loop. Stepping through the code line by line in the debugger, I found that Reset is not called after the first foreach. So should I call Reset manually after each foreach? Isn't there a nicer way to do this?
I don't modify anything
Yes, you do: your MoveNext() modifies the state of the class. This is why you shouldn't implement both IEnumerable and IEnumerator in the same class. (The C# compiler does this for iterator blocks, but that's a special case.) You should be able to call GetEnumerator() twice and get two entirely independent iterators. For example:
foreach (var x in collection)
{
foreach (var y in collection)
{
Console.WriteLine("{0}, {1}", x, y);
}
}
... should give you all possible pairs of items in a collection. But that only works when the iterators are independent.
I went through the code line by line in debugging mode and found out that the reset is not called after the first foreach.
Why would you expect it to? I don't believe the specification says anything about foreach calling Reset - and that's a good job, as many implementations don't really implement it (they throw an exception instead).
Basically, you should make your GetEnumerator() method return a new object which keeps the mutable state of the "cursor" over your data. Note that the simplest way of implementing an iterator in C# is usually to use an iterator block (yield return etc).
I'd also strongly encourage you to implement the generic interfaces rather than just the non-generic ones; that way your type can be used much more easily in LINQ code, the iterator variable in a foreach statement can be implicitly typed appropriately, etc.
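A minimal sketch of that advice applied to the question's class: GroupCollection implements only IEnumerable<Group>, and each GetEnumerator() call returns a fresh iterator object, so the cursor state never lives in the collection. The Group class here is a hypothetical stand-in, since the question doesn't show its definition, and GroupDemo is an illustrative helper.

```csharp
using System.Collections;
using System.Collections.Generic;

public class GroupCollection : IEnumerable<GroupCollection.Group>
{
    // Hypothetical stand-in for the question's nested Group class.
    public class Group
    {
        public Group(string name) { Name = name; }
        public string Name { get; }
    }

    public List<string> Groups { get; set; } = new List<string>();
    public int Count => Groups.Count;

    // Iterator block: every call creates a new, independent enumerator object.
    public IEnumerator<Group> GetEnumerator()
    {
        foreach (string name in Groups)
            yield return new Group(name);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class GroupDemo
{
    // Counts (outer, inner) pairs from nested loops over the same collection.
    public static int CountPairs(GroupCollection groups)
    {
        int pairs = 0;
        foreach (var outer in groups)
            foreach (var inner in groups)
                pairs++;
        return pairs;
    }
}
```

Two consecutive foreach loops now both see every group, and the nested loops produce Count * Count pairs, without anyone calling Reset.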
Reset is not called at the end of a foreach loop - you could do that in the GetEnumerator call, or just return the enumerator for the List:
public IEnumerator GetEnumerator()
{
return Groups.GetEnumerator();
}
Note that with the yield keyword there is almost no need to implement IEnumerator or IEnumerable explicitly:
public IEnumerator<string> GetEnumerator()
{
foreach(string s in Groups)
yield return s;
}

How can I implement IEnumerator<T>?

This code doesn't compile; it gives the following error:
The type 'TestesInterfaces.MyCollection' already contains a definition for 'Current'
But when I delete the ambiguous property, I get other errors.
Can anyone help?
public class MyCollection<T> : IEnumerator<T>
{
private T[] vector = new T[1000];
private int actualIndex;
public void Add(T elemento)
{
this.vector[vector.Length] = elemento;
}
public bool MoveNext()
{
actualIndex++;
return (vector.Length > actualIndex);
}
public void Reset()
{
actualIndex = -1;
}
void IDisposable.Dispose() { }
public Object Current
{
get
{
return Current;
}
}
public T Current
{
get
{
try
{
T element = vector[actualIndex];
return element;
}
catch (IndexOutOfRangeException e)
{
throw new InvalidOperationException(e.Message);
}
}
}
}
I think you're misunderstanding IEnumerator<T>. Typically, collections implement IEnumerable<T>, not IEnumerator<T>. You can think of them like this:
When a class implements IEnumerable<T>, it is stating "I am a collection of things that can be enumerated."
When a class implements IEnumerator<T>, it is stating "I am a thing that enumerates over something."
It is rare (and probably wrong) for a collection to implement IEnumerator<T>. By doing so, you're limiting your collection to a single enumeration. If you try to loop through the collection within a segment of code that's already looping through the collection, or if you try to loop through the collection on multiple threads simultaneously, you won't be able to do it because your collection is itself storing the state of the enumeration operation. Typically, collections (implementing IEnumerable<T>) return a separate object (implementing IEnumerator<T>) and that separate object is responsible for storing the state of the enumeration operation. Therefore, you can have any number of concurrent or nested enumerations because each enumeration operation is represented by a distinct object.
Also, for the foreach statement to work, the object after the in keyword must implement IEnumerable or IEnumerable<T> (or at least expose a public GetEnumerator method). It will not work if the object only implements IEnumerator or IEnumerator<T>.
I believe this is the code you're looking for:
using System.Collections;
using System.Collections.Generic;
using System.Linq; // for Take()
public class MyCollection<T> : IEnumerable<T>
{
private T[] vector = new T[1000];
private int count;
public void Add(T elemento)
{
this.vector[count++] = elemento;
}
public IEnumerator<T> GetEnumerator()
{
return vector.Take(count).GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
You need to specify which interface each Current property implements, using an explicit interface implementation for the non-generic one:
Object IEnumerator.Current
{
//
}
public T Current
{
//
}
This way your class has two Current properties, and you can access both of them:
MyCollection<string> col = new MyCollection<string>();
var ienumeratort = col.Current; //Uses IEnumerator<T>
var ienumerator = ((IEnumerator)col).Current; // uses IEnumerator, returns object
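Putting the answers above together, here is a sketch of a hand-written enumerator that keeps the collection and the cursor separate. The collection implements IEnumerable<T>; a separate MyCollectionEnumerator<T> (a hypothetical name) implements IEnumerator<T> with a public T Current plus an explicit object IEnumerator.Current, which is exactly the pair that caused the "already contains a definition" error. The fixed 1000-element array is kept from the question.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class MyCollection<T> : IEnumerable<T>
{
    private readonly T[] vector = new T[1000];
    private int count;

    public void Add(T elemento) => vector[count++] = elemento;

    // Each call hands out a new cursor, so enumerations stay independent.
    public IEnumerator<T> GetEnumerator() => new MyCollectionEnumerator<T>(vector, count);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public sealed class MyCollectionEnumerator<T> : IEnumerator<T>
{
    private readonly T[] items;
    private readonly int count;
    private int index = -1; // before the first element

    public MyCollectionEnumerator(T[] items, int count)
    {
        this.items = items;
        this.count = count;
    }

    // Implicit implementation: satisfies IEnumerator<T>.Current; this is what callers normally see.
    public T Current
    {
        get
        {
            if (index < 0 || index >= count)
                throw new InvalidOperationException("Enumerator is not positioned on an element.");
            return items[index];
        }
    }

    // Explicit implementation: satisfies the non-generic IEnumerator.Current.
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (index < count) index++;
        return index < count;
    }

    public void Reset() => index = -1;
    public void Dispose() { }
}
```

Only elements that were actually added are enumerated, because the cursor stops at count rather than at the array's length.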
From C# 2.0 onwards, there is a very easy way of implementing an iterator, and the compiler does a lot of heavy lifting behind the scenes by generating a state machine. It's worth looking into. With it, your implementation would look something like this:
public class MyCollection<T>
{
private T[] vector = new T[1000];
private int count;
public void Add(T elemento)
{
this.vector[count++] = elemento;
}
public IEnumerable<T> CreateEnumerable()
{
for (int index = 0; index < count; index++)
{
yield return vector[index];
}
}
}
I am not sure about the purpose of actualIndex, so I replaced it with a count of the elements added (writing to vector[vector.Length] always throws), but I hope you get the idea.
Once MyCollection has been populated, this is roughly what the code looks like from the consumer's perspective:
MyCollection<int> mycoll = new MyCollection<int>();
foreach (var num in mycoll.CreateEnumerable())
{
Console.WriteLine(num);
}

C# - Why implement two versions of Current when implementing the IEnumerable interface?

I assume the following sample gives a best practice that we should follow when we implement the IEnumerable interface.
https://learn.microsoft.com/en-us/dotnet/api/system.collections.ienumerator.movenext
Here are my questions:
Why should we provide two versions of the Current property?
When is version ONE (object IEnumerator.Current) used?
When is version TWO (public Person Current) used?
How do I use PeopleEnum in a foreach statement?
public class PeopleEnum : IEnumerator
{
public Person[] _people;
// Enumerators are positioned before the first element
// until the first MoveNext() call.
int position = -1;
public PeopleEnum(Person[] list)
{
_people = list;
}
public bool MoveNext()
{
position++;
return (position < _people.Length);
}
public void Reset()
{
position = -1;
}
// explicit interface implementation
object IEnumerator.Current /// **version ONE**
{
get
{
return Current;
}
}
public Person Current /// **version TWO**
{
get
{
try
{
return _people[position];
}
catch (IndexOutOfRangeException)
{
throw new InvalidOperationException();
}
}
}
}
The IEnumerator.Current is an explicit interface implementation.
You can only use it if you cast the iterator to an IEnumerator (which is what the framework does with foreach). In other cases, the second version will be used.
You will see that it returns object and actually uses the other implementation which returns a Person.
The second implementation is not required per se by the interface, but is there as a convenience and in order to return the expected type instead of object.
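To make the distinction concrete, here is a condensed sketch of the PeopleEnum sample with a bounds check in place of the try/catch. The Person class is a hypothetical minimal stand-in (the documentation sample's Person has first and last names), and CurrentDemo is an illustrative helper. Through a PeopleEnum variable the compiler binds to version TWO; through an IEnumerator variable, as foreach does via GetEnumerator(), it binds to version ONE.

```csharp
using System;
using System.Collections;

// Hypothetical minimal Person.
public class Person
{
    public Person(string name) { Name = name; }
    public string Name { get; }
}

public class PeopleEnum : IEnumerator
{
    private readonly Person[] _people;
    private int position = -1;

    public PeopleEnum(Person[] list) { _people = list; }

    public bool MoveNext() => ++position < _people.Length;
    public void Reset() => position = -1;

    // Version ONE: only reachable through an IEnumerator reference.
    object IEnumerator.Current => Current;

    // Version TWO: strongly typed, used when the static type is PeopleEnum.
    public Person Current
    {
        get
        {
            if (position < 0 || position >= _people.Length)
                throw new InvalidOperationException();
            return _people[position];
        }
    }
}

public static class CurrentDemo
{
    public static string ViaClass(Person[] people)
    {
        var e = new PeopleEnum(people);
        e.MoveNext();
        Person p = e.Current;      // version TWO, no cast needed
        return p.Name;
    }

    public static string ViaInterface(Person[] people)
    {
        IEnumerator e = new PeopleEnum(people);
        e.MoveNext();
        object o = e.Current;      // version ONE, returns object
        return ((Person)o).Name;   // cast required
    }
}
```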
A long-form implementation of IEnumerator is no longer necessary:
public class PeopleEnum : IEnumerable
{
public Person[] _people;
public PeopleEnum(Person[] list)
{
_people = list;
}
public IEnumerator GetEnumerator()
{
foreach (Person person in _people)
yield return person;
}
}
And to further bring it into the 21st century, don't use the non-generic IEnumerable:
public class PeopleEnum : IEnumerable<Person>
{
public Person[] _people;
public PeopleEnum(Person[] list)
{
_people = list;
}
public IEnumerator<Person> GetEnumerator()
{
foreach (Person person in _people)
yield return person;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
I suspect the reason is that this code example was derived from an example class implementing IEnumerator<T> - if the example class PeopleEnum implemented IEnumerator<T> this approach would be required: IEnumerator<T> inherits IEnumerator so you have to implement both interfaces when implementing IEnumerator<T>.
The implementation of the non-generic IEnumerator requires Current to return object; the strongly typed IEnumerator<T>, on the other hand, requires Current to return an instance of type T. Using an explicit interface implementation for at least one of the two Current properties is the only way to fulfill both requirements.
It is there for convenience, e.g. for using PeopleEnum.Current in a type-safe way in a while (p.MoveNext()) loop rather than through a foreach enumeration.
Strictly, all you need to do is implement the interface, and you could do that implicitly if you wish; but is there a reason to? What if you wanted to add a MovePrevious method to the class? Would you want to cast the returned object to Person every time?
If you think the class could be extended with more manipulation methods, the strongly typed Person Current is a nice thing to have.
Version two isn't part of the interface. You have to satisfy the interface requirements.

How to use foreach keyword on custom Objects in C#

Can someone share a simple example of using the foreach keyword with custom objects?
Given the tags, I assume you mean in .NET - and I'll choose to talk about C#, as that's what I know about.
The foreach statement (usually) uses IEnumerable and IEnumerator or their generic cousins. A statement of the form:
foreach (Foo element in source)
{
// Body
}
where source implements IEnumerable<Foo> is roughly equivalent to:
using (IEnumerator<Foo> iterator = source.GetEnumerator())
{
Foo element;
while (iterator.MoveNext())
{
element = iterator.Current;
// Body
}
}
Note that the IEnumerator<Foo> is disposed at the end, however the statement exits. This is important for iterator blocks.
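A small sketch of why disposal matters for iterator blocks: a finally block inside the iterator runs when foreach disposes the enumerator, even if the loop exits early with break. DisposeDemo, Numbers and BreakEarly are illustrative names.

```csharp
using System.Collections.Generic;

public static class DisposeDemo
{
    private static IEnumerable<int> Numbers(List<string> log)
    {
        try
        {
            for (int i = 0; i < 5; i++)
                yield return i;
        }
        finally
        {
            // Runs when the enumerator is disposed, even if the consumer stopped early.
            log.Add("finally");
        }
    }

    // Stops after two items; foreach still calls Dispose() on the way out.
    public static List<string> BreakEarly()
    {
        var log = new List<string>();
        foreach (int n in Numbers(log))
        {
            log.Add("got " + n);
            if (n == 1) break;
        }
        return log;
    }
}
```

The resulting log is "got 0", "got 1", "finally": the cleanup happens as part of leaving the loop, not when the sequence is exhausted.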
To implement IEnumerable<T> or IEnumerator<T> yourself, the easiest way is to use an iterator block. Rather than write all the details here, it's probably best to just refer you to chapter 6 of C# in Depth, which is a free download. The whole of chapter 6 is on iterators. I have another couple of articles on my C# in Depth site, too:
Iterators, iterator blocks and data pipelines
Iterator block implementation details
As a quick example though:
public IEnumerable<int> EvenNumbers0To10()
{
for (int i=0; i <= 10; i += 2)
{
yield return i;
}
}
// Later
foreach (int x in EvenNumbers0To10())
{
Console.WriteLine(x); // 0, 2, 4, 6, 8, 10
}
To implement IEnumerable<T> for a type, you can do something like:
public class Foo : IEnumerable<string>
{
public IEnumerator<string> GetEnumerator()
{
yield return "x";
yield return "y";
}
// Explicit interface implementation for nongeneric interface
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator(); // Just return the generic version
}
}
(I assume C# here)
If you have a list of custom objects, you can just use foreach in the same way as with any other collection:
List<MyObject> myObjects = // something
foreach(MyObject myObject in myObjects)
{
// Do something nifty here
}
If you want to create your own container, you can use the yield keyword (available since C# 2.0 / .NET 2.0) together with the IEnumerable<T> interface.
class MyContainer : IEnumerable<int>
{
private int max = 0;
public MyContainer(int max)
{
this.max = max;
}
public IEnumerator<int> GetEnumerator()
{
for(int i = 0; i < max; ++i)
yield return i;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
And then use it with foreach:
MyContainer myContainer = new MyContainer(10);
foreach(int i in myContainer)
Console.WriteLine(i);
From MSDN Reference:
The foreach statement is not limited to IEnumerable types and can be applied to an instance of any type that satisfies the following conditions:
has the public parameterless GetEnumerator method whose return type is either class, struct, or interface type,
the return type of the GetEnumerator method has the public Current property and the public parameterless MoveNext method whose return type is Boolean.
If you declare those members, you can use the foreach keyword without implementing IEnumerable. To verify this, take this code snippet and see that it produces no compile-time error (it would still fail at run time, since items is never initialized):
class Item
{
public Item Current { get; set; }
public bool MoveNext()
{
return false;
}
}
class Foreachable
{
Item[] items;
int index;
public Item GetEnumerator()
{
return items[index];
}
}
Foreachable foreachable = new Foreachable();
foreach (Item item in foreachable)
{
}
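For a version of the pattern-based approach that also runs correctly, here is a sketch in which neither type implements IEnumerable or IEnumerator; foreach only needs the public GetEnumerator, Current and MoveNext members. Countdown, CountdownEnumerator and CountdownDemo are hypothetical names, and the enumerator is a struct in the same spirit as List<T>.Enumerator.

```csharp
using System.Collections.Generic;

public class Countdown
{
    private readonly int start;
    public Countdown(int start) { this.start = start; }

    // Public parameterless GetEnumerator returning a struct: all foreach needs.
    public CountdownEnumerator GetEnumerator() => new CountdownEnumerator(start);
}

public struct CountdownEnumerator
{
    private int value;

    // Positioned one step before the first element.
    public CountdownEnumerator(int start) { value = start + 1; }

    public int Current => value;

    // Moves down by one; false once the count has gone below zero.
    public bool MoveNext() => --value >= 0;
}

public static class CountdownDemo
{
    public static List<int> Collect(int start)
    {
        var result = new List<int>();
        foreach (int n in new Countdown(start))
            result.Add(n);
        return result;
    }
}
```

CountdownDemo.Collect(3) produces 3, 2, 1, 0. Without any interface, though, such a type cannot be passed to code that expects IEnumerable<T>, such as LINQ.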
