Why do we need two interfaces to enumerate a collection? - c#

I have been trying for quite a while to understand the idea behind IEnumerable and IEnumerator. I read all the questions and answers I could find on the net, and on StackOverflow in particular, but I am not satisfied. I got to the point where I understand how those interfaces should be used, but not why they are used this way.
I think that the essence of my misunderstanding is that we need two interfaces for one operation. I realized that if both are needed, one was probably not enough. So I took the "hard coded" equivalent of foreach (as I found here):
while (enumerator.MoveNext())
{
object item = enumerator.Current;
// logic
}
and tried to get it to work with one interface, thinking something would go wrong which would make me understand why another interface is needed.
So I created a collection class, and implemented IForeachable:
class Collection : IForeachable
{
private int[] array = { 1, 2, 3, 4, 5 };
private int index = -1;
public int Current => array[index];
public bool MoveNext()
{
if (index < array.Length - 1)
{
index++;
return true;
}
index = -1;
return false;
}
}
and used the foreach equivalent to enumerate the collection:
var collection = new Collection();
while (collection.MoveNext())
{
object item = collection.Current;
Console.WriteLine(item);
}
And it works! So what is missing here that makes another interface required?
Thanks.
Edit:
My question is not a duplicate of the questions listed in the comments:
This question is why interfaces are needed for enumerating in the first place.
This question and this question are about what those interfaces are and how they should be used.
My question is why they are designed the way they are, not what they are, how they work, or why we need them in the first place.

What are the two interfaces and what do they do?
The IEnumerable interface is placed on the collection object and defines the GetEnumerator() method, which returns a (normally new) object that implements the IEnumerator interface. The foreach statement in C# and the For Each statement in VB.NET use IEnumerable to access the enumerator in order to loop over the elements in the collection.
The IEnumerator interface is essentially the contract placed on the object that actually does the iteration. It stores the state of the iteration and updates it as the code moves through the collection.
Why not just have the collection be the enumerator too? Why have two separate interfaces?
There is nothing to stop IEnumerator and IEnumerable being implemented on the same class. However, there is a penalty for doing this: it won’t be possible to have two or more loops on the same collection at the same time. If it can be absolutely guaranteed that there won’t ever be a need to loop on the collection twice at the same time then that’s fine. But in the majority of circumstances that isn’t possible.
When would someone iterate over a collection more than once at a time?
Here are two examples.
The first example is when there are two loops nested inside each other on the same collection. If the collection was also the enumerator then it wouldn’t be possible to support nested loops on the same collection. When the code gets to the inner loop it is going to collide with the outer loop.
The second example is when there are two, or more, threads accessing the same collection. Again, if the collection was also the enumerator then it wouldn’t be possible to support safe multithreaded iteration over the same collection. When the second thread attempts to loop over the elements in the collection the state of the two enumerations will collide.
Also, because the iteration model used in .NET does not permit alterations to a collection during enumeration these operations are otherwise completely safe.
-- This was from a blog post I wrote many years ago: https://colinmackay.scot/2007/06/24/iteration-in-net-with-ienumerable-and-ienumerator/

Your IForeachable cannot even be iterated from two different threads (you cannot have multiple active iterations at all, even from the same thread), because the current enumeration state is stored in the IForeachable itself. You also have to reset your current position each time you finish an enumeration, and if you forget to do that, the next caller will think your collection is empty. I can only imagine all kinds of hard-to-track bugs this might lead to.
On the other hand, because IEnumerable returns a new IEnumerator for each caller, you can have multiple enumerations in progress simultaneously, because each caller has its own enumeration state. I think this reason alone is enough to justify two interfaces. Enumeration is essentially a read operation, and it would be very confusing if you could not read the same thing simultaneously in multiple places.
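The collision described above can be reproduced with a short sketch. SelfEnumerating is a hypothetical three-element stand-in for the question's Collection class, and the safety guard is an assumption added only so the demonstration terminates; List<int> stands in for any collection that hands out a fresh enumerator per loop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-interface design: the collection itself holds the cursor.
class SelfEnumerating
{
    private readonly int[] array = { 1, 2, 3 };
    private int index = -1;

    public int Current => array[index];

    public bool MoveNext()
    {
        if (index < array.Length - 1)
        {
            index++;
            return true;
        }
        index = -1; // reset for the next caller
        return false;
    }
}

static class NestedLoopDemo
{
    // Nested loops over the shared cursor: the inner loop exhausts and resets
    // the cursor, so the outer loop restarts from the first element every time.
    public static bool SharedCursorTerminates()
    {
        var c = new SelfEnumerating();
        int outerPasses = 0;
        while (c.MoveNext())
        {
            if (++outerPasses > 100) return false; // safety guard: give up
            while (c.MoveNext()) { /* inner loop body */ }
        }
        return true;
    }

    // Each foreach asks the list for its own enumerator, so the loops do not interfere.
    public static int PairsWithSeparateEnumerators()
    {
        var list = new List<int> { 1, 2, 3 };
        int pairs = 0;
        foreach (var a in list)
            foreach (var b in list)
                pairs++;
        return pairs; // 3 * 3 = 9
    }
}
```

With the shared cursor the outer loop never finishes on its own, whereas the nested foreach over the list visits all nine pairs.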

Related

Is it safe to access multiple indexes of a C# List<T> concurrently?

I've got a list full of structs, which I want to iterate through and alter concurrently.
The code is conceptually as follows:
Parallel.For(0, pointsList.Count(), i=> pointsList[i] = DoThing(pointsList[i]));
I'm neither adding to nor removing from the list, only accessing and mutating its items.
I imagine this is fine, but thought I should check: is this OK as is, or do I need to use Lock somewhere for fear that I mess up the list object??
It is not guaranteed to be safe. Changing an element in the list increments the list's private _version field, which is how an enumerator can tell whether the list has been modified since enumeration started. If any other thread attempts to enumerate the list or uses a method like List<T>.ForEach() then there could potentially be an issue. It sounds like it would be extremely rare, which isn't necessarily a good thing. In fact, it makes for the most maddening type of defect.
If you want to be safe, create a new list instead of changing the existing one. This is a common "functional" approach to threading issues that avoids locks.
var newList = pointsList.AsParallel().AsOrdered().Select( item => DoThing(item) ); // AsOrdered() keeps the results in the original element order
When you're done, you can always replace the old list if you want.
pointsList = newList.ToList();

Effect of Any() on the state of an IEnumerable

Suppose I have working code that goes like this:
....
....
foreach (object item in enumerator.GetItems(arg1, arg2....))
{
}
....
....
where GetItems is a method of an abstract class (abstractEnumerator) and returns an IEnumerable<object>.
The problem arises because I would now like to use the Any() LINQ extension before the foreach to check whether the enumerable is empty, and I want to be sure that it won't change the 'state' of the IEnumerable at all by the time it reaches the foreach, so that the code behaves exactly as before.
However, the tricky part is that I do not have access to the implementations of abstractEnumerator, and therefore I do not have access to the underlying implementation of the IEnumerable interface.
For all I know the Reset method of the interface could be returning null without doing anything. Imagine that the implementations of the abstractEnumerator are created by third-party developers from another company, so I do not have access to their code.
My question is whether I can be sure that the state will remain the same when adding the Any() before the foreach, regardless of the underlying implementation.
According to microsoft reference on Any:
' The enumeration of source is stopped as soon as the result can be determined. '
In this case, where I want to stop on the first element (basically I want to ask whether the IEnumerable is empty or not), does this mean that the enumeration stops before processing the first element (i.e. the state is the same as if Any() wasn't called, regardless of the implementation), or does it stop after processing the first element?
Often enumerables themselves are effectively stateless - the state comes in the enumerator which is returned by GetEnumerator(). However, if you want to avoid calling GetEnumerator() twice (which could easily give different results each time, of course), the simplest thing is just to remember whether or not you saw any elements:
bool any = false;
foreach (var element in GetItems(...))
{
any = true;
}
if (any)
{
// whatever
}
That won't help if you want to take an action before the first iteration, of course. If you want to do that, you could use the iterator yourself:
using (var iterator = GetItems(...).GetEnumerator())
{
    if (iterator.MoveNext())
    {
        // Take your "pre-iteration" action
        do
        {
            var item = iterator.Current;
            // Use the item
        } while (iterator.MoveNext());
    }
}
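To make the concern about calling GetEnumerator() twice concrete, here is a minimal sketch. It assumes a hypothetical iterator method GetItems with a visible side effect (the Starts counter); real third-party implementations may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AnyDemo
{
    // Counts how many times the iterator body starts executing.
    public static int Starts;

    // Hypothetical source with a side effect before the first element.
    static IEnumerable<int> GetItems()
    {
        Starts++;
        yield return 1;
        yield return 2;
    }

    public static List<int> Run()
    {
        Starts = 0;
        var source = GetItems();
        var seen = new List<int>();

        // Any() obtains its own enumerator and calls MoveNext() once on it,
        // so the iterator body runs up to the first yield and is then disposed.
        if (source.Any())
        {
            // foreach obtains a second enumerator and starts again from the beginning.
            foreach (var item in source)
                seen.Add(item);
        }
        return seen;
    }
}
```

After Run(), the list still contains both elements, so the foreach itself is unaffected, but Starts is 2: everything the source does to produce its first element happens twice.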

Why can't I modify the loop variable in a foreach?

Why is a foreach loop a read only loop? What reasons are there for this?
I'm not sure exactly what you mean by a "readonly loop" but I'm guessing that you want to know why this doesn't compile:
int[] ints = { 1, 2, 3 };
foreach (int x in ints)
{
x = 4;
}
The above code will give the following compile error:
Cannot assign to 'x' because it is a 'foreach iteration variable'
Why is this disallowed? Trying to assign to it probably wouldn't do what you want: it wouldn't modify the contents of the original collection. This is because the variable x is not a reference to the elements in the list; it is a copy. To avoid people writing buggy code, the compiler disallows this.
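If the goal really is to overwrite the elements, one option is to write through an index instead of the iteration variable. This is only a sketch; the helper name SetAll is made up for illustration.

```csharp
using System;

static class LoopVariableDemo
{
    // Overwrites every element in place by assigning through the index.
    // A foreach iteration variable would only ever hold a copy of each element.
    public static int[] SetAll(int[] values, int newValue)
    {
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = newValue;
        }
        return values;
    }
}
```

Calling LoopVariableDemo.SetAll(new[] { 1, 2, 3 }, 4) leaves the array holding 4, 4, 4.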
I would assume it's how the iterator travels through the list.
Say you have a sorted list:
Alaska
Nebraska
Ohio
In the middle of
foreach(var s in States)
{
}
You do a States.Add("Missouri")
How do you handle that? Do you then jump to Missouri even if you're already past that index?
If, by this, you mean:
Why shouldn't I modify the collection that's being foreach'd over?
There's no guarantee that the items you're getting come out in a given order, nor that adding or removing an item won't change the order of items in the collection or even invalidate the Enumerator.
Imagine if you ran the following code:
var items = GetListOfTOfSomething(); // Returns 10 items
int i = 0;
foreach(var item in items)
{
i++;
if (i == 5)
{
items.Remove(item);
}
}
As soon as you hit the loop where i is 6 (i.e. after the item is removed) anything could happen. The Enumerator might have been invalidated due to you removing an item, everything might have "shuffled up by one" in the underlying collection causing an item to take the place of the removed one, meaning you "skip" one.
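For List<T> specifically, the outcome is well defined: the enumerator notices the modification on the next step and throws. A minimal sketch, assuming a plain List<int> in place of the unspecified collection above:

```csharp
using System;
using System.Collections.Generic;

static class ModifyDuringForeachDemo
{
    // Returns true if removing an item inside foreach makes the enumerator throw.
    public static bool RemoveInsideForeachThrows()
    {
        var items = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        try
        {
            foreach (var item in items)
            {
                if (item == 4)
                {
                    items.Remove(item); // modifies the list mid-enumeration
                }
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // Thrown by the enumerator's next MoveNext() after the removal.
            return true;
        }
    }
}
```

Other collection types are free to behave differently, which is why the general advice is not to rely on any particular outcome.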
If you meant "why can't I change the value that is provided on each iteration" then, if the collection you're working with contains value types, any changes you make won't be preserved as it's a value you're working with, rather than a reference.
The foreach command uses the IEnumerable interface to loop through the collection. The interface only defines methods for stepping through a collection and getting the current item; there are no methods for updating the collection.
As the interface only defines the minimal methods required to read the collection in one direction, the interface can be implemented by a wide range of collections.
As you only access a single item at a time, the entire collection doesn't have to exist at the same time. This is used, for example, by LINQ expressions, which create the result on the fly as you read it instead of first creating the entire result and then letting you loop through it.
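A small sketch of that on-the-fly behaviour: the source below is an endless sequence, which could never exist in memory as a whole, yet reading a few results from it works. The names Naturals and FirstThreeSquares are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyDemo
{
    // An endless sequence: 0, 1, 2, ... Each value is produced only when requested.
    static IEnumerable<int> Naturals()
    {
        int n = 0;
        while (true)
        {
            yield return n++;
        }
    }

    // Select and Take are lazy as well; only three elements are ever produced.
    public static int[] FirstThreeSquares()
    {
        return Naturals().Select(n => n * n).Take(3).ToArray(); // 0, 1, 4
    }
}
```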
Not sure what you mean with read-only but I'm guessing that understanding what the foreach loop is under the hood will help. It's syntactic sugar and could also be written something like this:
IEnumerator<T> enumerator = list.GetEnumerator();
while(enumerator.MoveNext())
{
T element = enumerator.Current;
//body goes here
}
If you change the collection (list), it becomes hard, if not impossible, to figure out how to process the iteration.
Assigning to element (in the foreach version) could be viewed either as trying to assign to enumerator.Current, which is read-only, or as trying to change the value of the local holding a ref to enumerator.Current, in which case you might as well introduce a local yourself because it no longer has anything to do with the enumerated list.
foreach works with everything implementing the IEnumerable interface. In order to avoid synchronization issues, the enumerable shall never be modified while iterating on it.
The problems arise if you add or remove items in another thread while iterating: depending on where you are, you might miss an item or apply your code to an extra item. In many cases this is detected (the standard collections such as List<T> check for it when the enumerator advances) and an exception is thrown:
System.InvalidOperationException was unhandled
Message="Collection was modified; enumeration operation may not execute."
foreach tries to get next item on each iteration which can cause trouble if you are modifying it from another thread at the same time.

Enumerator problem, Any way to avoid two loops?

I have a third party api, which has a class that returns an enumerator for different items in the class.
I need to remove an item in that enumerator, so I cannot use "for each". Only option I can think of is to get the count by iterating over the enum and then run a normal for loop to remove the items.
Anyone know of a way to avoid the two loops?
Thanks
[update] sorry for the confusion but Andrey below in comments is right.
Here is some pseudo code off the top of my head that won't work, and for which I am looking for a solution that won't involve two loops, but I guess it's not possible:
for each (myProperty in MyProperty)
{
if (checking some criteria here)
MyProperty.Remove(myProperty)
}
MyProperty is the third party class that implements the enumerator and the remove method.
A common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
if (condition for deletion) forDeletion.Add(i);
foreach (Item i in forDeletion)
somelist.Remove(i); //or how do you delete items
Loop through it once and create a second array which contains the items which should not be deleted.
If you know it's a collection, you can go with a reverse for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
items.RemoveAt(i);
}
Otherwise, you'll have to do two loops.
You can create something like this:
public IEnumerable<item> GetMyList()
{
foreach (var x in thirdParty )
{
if (x == ignore)
continue;
yield return x;
}
}
I need to remove an item in that enumerator
As long as this is a single item that's not a problem. The rule is that you cannot continue to iterate after modifying the collection. Thus:
foreach (var item in collection) {
if (item.Equals(toRemove)) {
collection.Remove(toRemove);
break; // <== stop iterating!!
}
}
It is not possible to remove an item from an Enumerator. What you can do is to copy or filter(or both) the content of the whole enumeration sequence.
You can achieve this with LINQ by keeping only the items that do not match your removal criteria, something like this:
YourEnumerationReturningFunction().Where(item => !yourRemovalCriteria(item));
Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator<T> or IEnumerable<T> you cannot remove any item from the sequence behind the enumerator because there is no method to do so. And you should of course not rely on downcasting a received object because the implementation may change. (Actually a well designed API should not expose mutable objects holding internal state at all.)
If you receive IList<T> or something similar you can just use a normal for loop from back to front and remove the items as needed because there is no iterator whose state could be corrupted. (Here the rule about exposing mutable state should apply again: modifying the returned collection should not change any state.)
Enumerable.Count() will decide at run time what it needs to do: if the sequence is a collection it reads its Count property, otherwise it enumerates the sequence to count the items.
I like SJoerd's suggestion but I worry about how many items we may be talking about.
Why not something like ..
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });
A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach(var delItem in ThirdPartyContainer.Items
.Where(item=>ShouldIDeleteThis(item))
//or: .Where(ShouldIDeleteThis)
.ToArray()) {
ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; and it also relies on internals of the collection such as not changing order in between deletions (e.g. better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() will do just fine :-).
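As a runnable sketch of the snapshot approach, with a List<int> standing in for the unspecified third-party container and "is even" as a made-up deletion criterion:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SnapshotRemovalDemo
{
    // Removes all even numbers. ToArray() copies the matches first, so the
    // foreach iterates over the copy while the list itself is being modified.
    public static List<int> RemoveEvens(List<int> items)
    {
        foreach (var item in items.Where(n => n % 2 == 0).ToArray())
        {
            items.Remove(item);
        }
        return items;
    }
}
```

Without the ToArray() call the Where query would be evaluated lazily over the list being modified, and the first Remove would invalidate the enumeration.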
An enumerator typically has a private field pointing to the real collection.
You can get it via reflection and modify it.
Have fun.

Partially thread-safe dictionary

I have a class that maintains a private Dictionary instance that caches some data.
The class writes to the dictionary from multiple threads using a ReaderWriterLockSlim.
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
Right now, I have the following:
public ReadOnlyCollection<MyClass> Values() {
using (sync.ReadLock())
return new ReadOnlyCollection<MyClass>(cache.Values.ToArray());
}
Is there a way to do this without copying the collection many times?
I'm using .Net 3.5 (not 4.0)
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
You have three choices.
1) Make a copy of the data, hand out the copy. Pros: no worries about thread safe access to the data. Cons: Client gets a copy of out-of-date data, not fresh up-to-date data. Also, copying is expensive.
2) Hand out an object that locks the underlying collection when it is read from. You'll have to write your own read-only collection that has a reference to the lock of the "parent" collection. Design both objects carefully so that deadlocks are impossible. Pros: "just works" from the client's perspective; they get up-to-date data without having to worry about locking. Cons: More work for you.
3) Punt the problem to the client. Expose the lock, and make it a requirement that clients lock all views on the data themselves before using it. Pros: No work for you. Cons: Way more work for the client, work they might not be willing or able to do. Risk of deadlocks, etc, now become the client's problem, not your problem.
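Option 1 can be sketched with the ReaderWriterLockSlim API directly. The class and member names below (SnapshotCache, Set, Snapshot) and the string/int types are placeholders, not the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class SnapshotCache
{
    private readonly Dictionary<string, int> cache = new Dictionary<string, int>();
    private readonly ReaderWriterLockSlim sync = new ReaderWriterLockSlim();

    public void Set(string key, int value)
    {
        sync.EnterWriteLock();
        try { cache[key] = value; }
        finally { sync.ExitWriteLock(); }
    }

    // Copies the values while holding the read lock and hands out the copy.
    // The caller gets a stable snapshot that later writes do not affect.
    public int[] Snapshot()
    {
        sync.EnterReadLock();
        try
        {
            var copy = new int[cache.Count];
            cache.Values.CopyTo(copy, 0);
            return copy;
        }
        finally { sync.ExitReadLock(); }
    }
}
```

The lock is held only for the duration of the copy, which is what keeps this option simple; the cost is one array allocation per call.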
If you want a snapshot of the current state of the dictionary, there's really nothing else you can do with this collection type. This is the same technique used by the ConcurrentDictionary<TKey, TValue>.Values property.
If you don't mind throwing an InvalidOperationException if the collection is modified while you are enumerating it, you could just return cache.Values since it's readonly (and thus can't corrupt the dictionary data).
EDIT: I personally believe the below code is technically answering your question correctly (as in, it provides a way to enumerate over the values in a collection without creating a copy). Some developers far more reputable than I strongly advise against this approach, for reasons they have explained in their edits/comments. In short: This is apparently a bad idea. Therefore I'm leaving the answer but suggesting you not use it.
Unless I'm missing something, I believe you could expose your values as an IEnumerable<MyClass> without needing to copy values by using the yield keyword:
public IEnumerable<MyClass> Values {
get {
using (sync.ReadLock()) {
foreach (MyClass value in cache.Values)
yield return value;
}
}
}
Be aware, however (and I'm guessing you already knew this), that this approach provides lazy evaluation, which means that the Values property as implemented above can not be treated as providing a snapshot.
In other words... well, take a look at this code (I am of course guessing as to some of the details of this class of yours):
var d = new ThreadSafeDictionary<string, string>();
// d is empty right now
IEnumerable<string> values = d.Values;
d.Add("someKey", "someValue");
// if values were a snapshot, this would output nothing...
// but in FACT, since it is lazily evaluated, it will now have
// what is CURRENTLY in d.Values ("someValue")
foreach (string s in values) {
Console.WriteLine(s);
}
So if it's a requirement that this Values property be equivalent to a snapshot of what is in cache at the time the property is accessed, then you're going to have to make a copy.
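(begin 280Z28): The following is an example of how someone unfamiliar with the "C# way of doing things" could leave the lock held: the enumerator is never disposed, so the read lock taken inside the iterator is never released.
IEnumerator<MyClass> enumerator = obj.Values.GetEnumerator();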
MyClass first = null;
if (enumerator.MoveNext())
first = enumerator.Current;
(end 280Z28)
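The effect can be observed directly with ReaderWriterLockSlim.IsReadLockHeld. The sketch below uses a hypothetical LockingSource class with explicit EnterReadLock/ExitReadLock calls in place of the sync.ReadLock() helper, whose implementation is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class LockingSource
{
    private readonly ReaderWriterLockSlim sync = new ReaderWriterLockSlim();
    private readonly List<int> data = new List<int> { 1, 2, 3 };

    public bool IsReadLockHeld => sync.IsReadLockHeld;

    // Lazy iterator: the lock is taken on the first MoveNext() and released
    // only when enumeration completes or the enumerator is disposed.
    public IEnumerable<int> Values
    {
        get
        {
            sync.EnterReadLock();
            try
            {
                foreach (var value in data)
                    yield return value;
            }
            finally
            {
                sync.ExitReadLock();
            }
        }
    }
}

static class IteratorLockDemo
{
    // Returns (held after taking only the first element, held after Dispose).
    public static (bool, bool) Run()
    {
        var source = new LockingSource();
        IEnumerator<int> enumerator = source.Values.GetEnumerator();
        enumerator.MoveNext();                 // takes the read lock
        bool heldWhileSuspended = source.IsReadLockHeld;
        enumerator.Dispose();                  // runs the finally block
        bool heldAfterDispose = source.IsReadLockHeld;
        return (heldWhileSuspended, heldAfterDispose);
    }
}
```

A foreach loop calls Dispose() automatically, which is why the problem only shows up when the enumerator is driven by hand and abandoned part-way.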
Consider another possibility: expose only the ICollection interface, so that Values() can return your own implementation. That implementation would hold only a reference to Dictionary.Values and always take the ReadLock when accessing items.
