Effect of Any() on the state of an IEnumerable - c#

Suppose I have working code that looks like this:
....
....
foreach (object item in enumerator.GetItems(arg1, arg2....))
{
}
....
....
where GetItems is a method of an abstract class (abstractEnumerator) that returns an IEnumerable<object>.
The problem is that I would now like to call the Any() LINQ extension before the foreach to check whether the enumerable is empty, and I want to be sure that this does not change the 'state' of the IEnumerable by the time it reaches the foreach, so that the code behaves exactly as before.
The tricky part is that I do not have access to the implementations of abstractEnumerator, and therefore not to the underlying implementation of the IEnumerable interface either.
For all I know, the Reset method of the interface could return without doing anything. Imagine that the implementations of abstractEnumerator are written by third-party developers at another company, so I cannot see their code.
My question is whether I can be sure that the state will remain the same when I add the Any() call before the foreach, regardless of the underlying implementation.
According to the Microsoft reference on Any:
'The enumeration of source is stopped as soon as the result can be determined.'
In my case, where I only want to know whether the IEnumerable is empty, does this mean that the enumeration stops before processing the first element (i.e. the state is the same as if Any() had never been called, regardless of the implementation), or that it stops after processing the first element?

Often enumerables themselves are effectively stateless - the state comes in the enumerator which is returned by GetEnumerator(). However, if you want to avoid calling GetEnumerator() twice (which could easily give different results each time, of course), the simplest thing is just to remember whether or not you saw any elements:
bool any = false;
foreach (var element in GetItems(...))
{
any = true;
}
if (any)
{
// whatever
}
That won't help if you want to take an action before the first iteration, of course. If you want to do that, you could use the iterator yourself:
using (var iterator = GetItems(...).GetEnumerator())
{
    if (iterator.MoveNext())
    {
        // Take your "pre-iteration" action
        do
        {
            var item = iterator.Current;
            // Use the item
        } while (iterator.MoveNext());
    }
}
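To make the effect of Any() concrete, here is a minimal sketch, assuming the sequence comes from an iterator method and is not a collection (a third-party implementation may well behave differently). GetItems and producedCount are hypothetical names used only for illustration. For such a source, Any() obtains an enumerator, calls MoveNext() once and disposes the enumerator, so the source does produce its first element; the foreach then asks for a new enumerator and starts again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AnyDemo
{
    // Hypothetical counter: how many elements the source has produced so far.
    static int producedCount;

    // Hypothetical stand-in for the third-party GetItems method.
    static IEnumerable<int> GetItems()
    {
        for (int i = 1; i <= 3; i++)
        {
            producedCount++; // side effect of producing an element
            yield return i;
        }
    }

    static void Main()
    {
        IEnumerable<int> items = GetItems();

        // Any() runs the iterator up to its first yield, then stops.
        bool any = items.Any();
        Console.WriteLine($"any={any}, produced={producedCount}"); // any=True, produced=1

        // foreach requests a fresh enumerator, so the first element is produced again.
        foreach (int item in items)
        {
        }
        Console.WriteLine($"produced={producedCount}"); // produced=4
    }
}
```

If the enumerable is not repeatable (for example, it reads from a stream or a queue), the element produced for Any() may be gone by the time the foreach runs, which is why remembering a flag inside a single loop is the safer option.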

Related

Using LINQ Where result in foreach: hidden if statement, double foreach?

foreach (Person criminal in people.Where(person => person.isCriminal))
{
// do something
}
I have this piece of code and want to know how does it actually work. Is it equivalent to an if statement nested inside the foreach iteration or does it first loop through the list of people and repeats the loop with selected values? I care to know more about this from the perspective of efficiency.
foreach (Person criminal in people)
{
if (criminal.isCriminal)
{
// do something
}
}
Where uses deferred execution.
This means that the filtering does not occur immediately when you call Where. Instead, each time you call GetEnumerator().MoveNext() on the return value of Where, it checks if the next element in the sequence satisfies the condition. If it does not, it skips over this element and checks the next one. When there is an element that satisfies the condition, it stops advancing and you can get the value using Current.
Basically, it is like having an if statement inside a foreach loop.
To understand what happens, you must know how IEnumerable<T> works (because LINQ to Objects always works on IEnumerable<T>). An IEnumerable<T> returns an IEnumerator<T>, which implements an iterator. This iterator is lazy, i.e. it only ever yields one element of the sequence at a time. No looping is done in advance, unless you have an OrderBy or another operator that requires it.
So if you have ...
foreach (string name in source.Where(x => x.IsChecked).Select(x => x.Name)) {
Console.WriteLine(name);
}
... this will happen: The foreach-statement requires the first item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The first name is printed to the console.
Then the foreach-statement requires the second item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The second name is printed to the console.
and so on.
This means that both of your code snippets are logically equivalent.
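The pull-based chain described above can be observed with a small sketch. The Source method and the Log list are hypothetical helpers added only to record the order of calls; the numbers stand in for the people or names in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    static readonly List<string> Log = new List<string>();

    // Hypothetical source that records each element it hands out.
    static IEnumerable<int> Source()
    {
        foreach (int n in new[] { 1, 2, 3 })
        {
            Log.Add($"source {n}");
            yield return n;
        }
    }

    static void Main()
    {
        IEnumerable<int> query = Source()
            .Where(n => { Log.Add($"where {n}"); return n != 2; })
            .Select(n => { Log.Add($"select {n}"); return n * 10; });

        Log.Add("query built"); // nothing has been enumerated at this point

        foreach (int value in query)
        {
            Log.Add($"body {value}");
        }

        Console.WriteLine(string.Join(Environment.NewLine, Log));
    }
}
```

The first log entry is "query built", which shows that building the query enumerates nothing. After that the entries are interleaved per element: "source 1", "where 1", "select 1", "body 10", then "source 2", "where 2" (filtered out, so no select or body), then "source 3", "where 3", "select 3", "body 30". There is a single pass over the source.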
It depends on what people is.
If people is an IEnumerable object (like a collection, or the result of a method using yield) then the two pieces of code in your question are indeed equivalent.
A naïve Where could be implemented as:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
// Error handling left out for simplicity.
foreach (TSource item in source)
{
if (predicate(item))
{
yield return item;
}
}
}
The actual code in Enumerable is a bit different to make sure that errors from passing a null source or predicate happen immediately rather than on the deferred execution, and to optimise for a few cases (e.g. source.Where(x => x.IsCriminal).Where(x => x.IsOnParole) is turned into the equivalent of source.Where(x => x.IsCriminal && x.IsOnParole) so that there's one fewer step in the chains of iterations), but that's the basic principle.
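The point about errors happening immediately can be sketched as follows. This is not the actual code in Enumerable, only an illustration of the usual pattern: a non-iterator wrapper validates the arguments and then delegates to a private iterator method. NaiveWhere and NaiveWhereIterator are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class NaiveLinq
{
    // Not an iterator method, so the checks run as soon as NaiveWhere is called.
    public static IEnumerable<TSource> NaiveWhere<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));
        return NaiveWhereIterator(source, predicate);
    }

    // Iterator method: its body only runs once the result is enumerated.
    private static IEnumerable<TSource> NaiveWhereIterator<TSource>(
        IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        foreach (TSource item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}

static class Program
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(", ", numbers.NaiveWhere(n => n % 2 == 0))); // 2, 4

        try
        {
            numbers.NaiveWhere(null); // throws here, before any enumeration
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine(ex.ParamName); // predicate
        }
    }
}
```

Had the null checks been placed inside the iterator method itself, the exception would only surface when the caller started enumerating, possibly far from the faulty call.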
If however people is an IQueryable then things are different, and depend on the details of the query provider in question.
The simplest possibility is that the query provider can't do anything special with the Where and so it ends up just doing pretty much the above, because that will still work.
But often the query provider can do something else. Let's say people is a DbSet<Person> in Entity Framework associated with a table in a database called people. If you do:
foreach(var person in people)
{
DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
And then create a Person object for each row returned. We could apply the same filtering as above to implement Where, but we can also do better.
If you do:
foreach (Person criminal in people.Where(person => person.isCriminal))
{
    DoSomething(criminal);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
WHERE isCriminal = 1
This means that the logic of deciding which elements to return is executed in the database before anything comes back to .NET. It allows indices to be used in computing the WHERE, which can be much more efficient. Even in the worst case, where there are no useful indices and the database has to do a full scan, the records we don't care about are never sent back from the database and no object is created for them just to be thrown away again, so the difference in performance can be immense.
I care to know more about this from the perspective of efficiency
You are hopefully satisfied that there's no double pass as you suggested might happen, and happy to learn that, when possible, it's even more efficient than the foreach … if you suggested.
A bare foreach and if will still beat .Where() against an IEnumerable (but not against a database source) as there are a few overheads to Where that foreach and if don't have, but it's to a degree that is only worth caring about in very hot paths. Generally Where can be used with reasonable confidence in its efficiency.

Why do we need two interfaces to enumerate a collection?

It is quite a while that I have been trying to understand the idea behind IEnumerable and IEnumerator. I read all the questions and answers I could find over the net, and on StackOverflow in particular, but I am not satisfied. I got to the point where I understand how those interfaces should be used, but not why they are used this way.
I think that the essence of my misunderstanding is that we need two interfaces for one operation. I realized that if both are needed, one was probably not enough. So I took the "hard coded" equivalent of foreach (as I found here):
while (enumerator.MoveNext())
{
object item = enumerator.Current;
// logic
}
and tried to get it to work with one interface, thinking something would go wrong which would make me understand why another interface is needed.
So I created a collection class, and implemented IForeachable:
class Collection : IForeachable
{
private int[] array = { 1, 2, 3, 4, 5 };
private int index = -1;
public int Current => array[index];
public bool MoveNext()
{
if (index < array.Length - 1)
{
index++;
return true;
}
index = -1;
return false;
}
}
and used the foreach equivalent to enumerate the collection:
var collection = new Collection();
while (collection.MoveNext())
{
object item = collection.Current;
Console.WriteLine(item);
}
And it works! So what is missing here that makes another interface required?
Thanks.
Edit:
My question is not a duplicate of the questions listed in the comments:
This question is why interfaces are needed for enumerating in the first place.
This question and this question are about what are those interfaces and how should they be used.
My question is why they are designed the way they are, not what are they, how they work, and why do we need them in the first place.
What are the two interfaces and what do they do?
The IEnumerable interface is placed on the collection object and defines the GetEnumerator() method, which returns a (normally new) object that implements the IEnumerator interface. The foreach statement in C# and the For Each statement in VB.NET use IEnumerable to access the enumerator in order to loop over the elements in the collection.
The IEnumerator interface is essentially the contract placed on the object that actually does the iteration. It stores the state of the iteration and updates it as the code moves through the collection.
Why not just have the collection be the enumerator too? Why have two separate interfaces?
There is nothing to stop IEnumerator and IEnumerable being implemented on the same class. However, there is a penalty for doing this – It won’t be possible to have two, or more, loops on the same collection at the same time. If it can be absolutely guaranteed that there won’t ever be a need to loop on the collection twice at the same time then that’s fine. But in the majority of circumstances that isn’t possible.
When would someone iterate over a collection more than once at a time?
Here are two examples.
The first example is when there are two loops nested inside each other on the same collection. If the collection was also the enumerator then it wouldn’t be possible to support nested loops on the same collection, when the code gets to the inner loop it is going to collide with the outer loop.
The second example is when there are two, or more, threads accessing the same collection. Again, if the collection was also the enumerator then it wouldn’t be possible to support safe multithreaded iteration over the same collection. When the second thread attempts to loop over the elements in the collection the state of the two enumerations will collide.
Also, because the iteration model used in .NET does not permit alterations to a collection during enumeration these operations are otherwise completely safe.
-- This was from a blog post I wrote many years ago: https://colinmackay.scot/2007/06/24/iteration-in-net-with-ienumerable-and-ienumerator/
Your IForeachable cannot even be iterated from two different threads (you cannot have multiple active iterations at all, even from the same thread), because the current enumeration state is stored in the IForeachable itself. You also have to reset the current position each time you finish an enumeration, and if you forget to do that, the next caller will think your collection is empty. I can only imagine all the kinds of hard-to-track bugs this might lead to.
On the other hand, because IEnumerable returns a new IEnumerator for each caller, you can have multiple enumerations in progress simultaneously, since each caller has its own enumeration state. I think this reason alone is enough to justify two interfaces. Enumeration is essentially a read operation, and it would be very confusing if you could not read the same thing simultaneously in multiple places.
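As a rough sketch of the difference, here is the Collection class from the question reworked so that the iteration state lives in a separate enumerator object. The nested Enumerator class is a hypothetical illustration, not framework code. Because every foreach gets its own enumerator, nested loops over the same collection do not interfere with each other.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class Collection : IEnumerable<int>
{
    private readonly int[] array = { 1, 2, 3, 4, 5 };

    // Every caller gets a new enumerator with its own position.
    public IEnumerator<int> GetEnumerator() => new Enumerator(array);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int[] array;
        private int index = -1; // per-enumerator state, not shared

        public Enumerator(int[] array) { this.array = array; }

        public int Current => array[index];
        object IEnumerator.Current => Current;

        public bool MoveNext() => ++index < array.Length;
        public void Reset() => index = -1;
        public void Dispose() { }
    }
}

static class Program
{
    static void Main()
    {
        var collection = new Collection();
        int pairs = 0;

        // Two simultaneous iterations over the same collection.
        foreach (int outer in collection)
        {
            foreach (int inner in collection)
            {
                pairs++;
            }
        }

        Console.WriteLine(pairs); // 25
    }
}
```

With the single-interface version from the question, the inner loop would advance and then reset the one shared index, so the outer loop could never make progress past the first element.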

does foreach loop handle Changes in list length correctly?

does foreach correctly iterate over flexible list?
for example
//will iterate over all items in list?
foreach (var obj in list)
{
//list length changes here
//ex:
list.Add(...);
list.Remove(...);
list.Concat(...);
// and so on
}
and if it does ...how?
You can't modify a collection while enumerating it inside a foreach statement.
You should use another pattern for what you are trying to do, because foreach does not allow you to change the collection you are looping over.
For example:
Imagine you run a foreach over a sorted list from the beginning: you start processing the item with key="A", then you go to "B", then you change "C" to "B". What is going to happen? Your list is re-sorted and you no longer know what you are looping over or where you are.
In general you "could" do it with a for (int i = dictionary.Count - 1; i >= 0; --i) loop or something like that, but this also depends on your context; I would really try to use another approach.
Internal working: IEnumerator<T> is designed to enable the iterator pattern for iterating over collections of elements, rather than the length-index pattern. IEnumerator<T> includes two members.
The first is bool MoveNext(). Using this method, we can move from one element within the collection to the next while at the same time detecting when we have enumerated through every item using the Boolean return.
The second member, a read-only property called Current, returns the element currently in process. With these two members on the collection class, it is possible to iterate over the collection simply using a while loop.
The MoveNext() method in this listing returns false when it moves past the end of the collection. This replaces the need to count elements while looping. (The last member on IEnumerator<T>, Reset(), will reset the enumeration.)
Per the documentation, if changes are made inside the loop the behavior is undefined. Undefined means that there are no restrictions on what it can do; there is no "incorrect behavior" when the behavior is undefined. Crash, do what you want, send an email to your boss calling him nasty names and quitting: all equally valid. I would hope for a crash in this case, but again, whatever happens, happens and is considered "correct" according to the documentation.
You cannot change the collection inside a foreach loop over that same collection.
If you want, you can use a for loop to change the collection length.
The collection you iterate over in a foreach loop must be treated as read-only for the duration of the loop. As per MSDN:
"The foreach statement is used to iterate through the collection to get the information that you want, but can not be used to add or remove items from the source collection to avoid unpredictable side effects. If you need to add or remove items from the source collection, use a for loop."
But as per this link, it looks like this is now possible from .Net 4.0
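As a concrete illustration for List<T> specifically (other collection types may react differently), the sketch below shows the enumerator rejecting a modification made inside the loop, followed by one common workaround: iterating over a snapshot taken with ToList() while modifying the original list. The values are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ModifyDuringForeachDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        try
        {
            foreach (int value in list)
            {
                if (value == 1)
                {
                    list.Add(4); // the Add succeeds, but the next MoveNext() throws
                }
            }
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("List<T> rejected the modification during foreach");
        }

        // Workaround: enumerate a copy, modify the original.
        foreach (int value in list.ToList())
        {
            if (value % 2 == 0)
            {
                list.Remove(value);
            }
        }

        Console.WriteLine(string.Join(", ", list)); // 1, 3
    }
}
```

The snapshot costs one extra copy of the list, which is usually acceptable unless the list is very large.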

Why can't I modify the loop variable in a foreach?

Why is a foreach loop a read only loop? What reasons are there for this?
I'm not sure exactly what you mean by a "readonly loop" but I'm guessing that you want to know why this doesn't compile:
int[] ints = { 1, 2, 3 };
foreach (int x in ints)
{
x = 4;
}
The above code will give the following compile error:
Cannot assign to 'x' because it is a 'foreach iteration variable'
Why is this disallowed? Trying to assign to it probably wouldn't do what you want: it wouldn't modify the contents of the original collection. This is because the variable x is not a reference to the elements in the list; it is a copy. To avoid people writing buggy code, the compiler disallows this.
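A short sketch of what does work, under the assumption that the goal is to overwrite the array's elements: index into the array with a for loop. For reference types, the iteration variable is a copy of the reference, so mutating the object it points to is allowed. The Box class is a hypothetical helper for illustration.

```csharp
using System;

static class IterationVariableDemo
{
    // Hypothetical reference type used only for this example.
    sealed class Box
    {
        public int Value;
    }

    static void Main()
    {
        // Value types: write through the index instead of the iteration variable.
        int[] ints = { 1, 2, 3 };
        for (int i = 0; i < ints.Length; i++)
        {
            ints[i] = 4;
        }
        Console.WriteLine(string.Join(", ", ints)); // 4, 4, 4

        // Reference types: the variable cannot be reassigned,
        // but the object it refers to can be mutated.
        var boxes = new[] { new Box { Value = 1 }, new Box { Value = 2 } };
        foreach (Box box in boxes)
        {
            box.Value = 4;
        }
        Console.WriteLine(boxes[0].Value + ", " + boxes[1].Value); // 4, 4
    }
}
```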
I would assume it's how the iterator travels through the list.
Say you have a sorted list:
Alaska
Nebraska
Ohio
In the middle of
foreach(var s in States)
{
}
You do a States.Add("Missouri")
How do you handle that? Do you then jump to Missouri even if you're already past that index.
If, by this, you mean:
Why shouldn't I modify the collection that's being foreach'd over?
There's no surety that the items that you're getting come out in a given order, and that adding an item, or removing an item won't cause the order of items in the collection to change, or even the Enumerator to become invalid.
Imagine if you ran the following code:
var items = GetListOfTOfSomething(); // Returns 10 items
int i = 0;
foreach (var item in items)
{
i++;
if (i == 5)
{
items.Remove(item);
}
}
As soon as you hit the loop where i is 6 (i.e. after the item is removed) anything could happen. The Enumerator might have been invalidated due to you removing an item, everything might have "shuffled up by one" in the underlying collection causing an item to take the place of the removed one, meaning you "skip" one.
If you meant "why can't I change the value that is provided on each iteration" then, if the collection you're working with contains value types, any changes you make won't be preserved as it's a value you're working with, rather than a reference.
The foreach statement uses the IEnumerable interface to loop through the collection. The interface only defines methods for stepping through a collection and getting the current item; there are no methods for updating the collection.
As the interface only defines the minimal methods required to read the collection in one direction, it can be implemented by a wide range of collections.
As you only access a single item at a time, the entire collection doesn't have to exist at the same time. This is for example used by LINQ expressions, where it creates the result on the fly as you read it, instead of first creating the entire result and then let you loop through it.
Not sure what you mean with read-only but I'm guessing that understanding what the foreach loop is under the hood will help. It's syntactic sugar and could also be written something like this:
IEnumerator<T> enumerator = list.GetEnumerator();
while(enumerator.MoveNext())
{
T element = enumerator.Current;
//body goes here
}
If you change the collection (list), it becomes hard or even impossible to figure out how to continue the iteration.
Assigning to element (in the foreach version) could be viewed either as trying to assign to enumerator.Current, which is read-only, or as trying to change the value of the local holding a reference to enumerator.Current, in which case you might as well introduce a local yourself, because it no longer has anything to do with the enumerated list.
foreach works with everything implementing the IEnumerable interface. In order to avoid synchronization issues, the enumerable shall never be modified while iterating on it.
The problems arise if you add or remove items in another thread while iterating: depending on where you are, you might miss an item or apply your code to an extra item. This is detected at run time (in some cases, possibly not all) and an exception is thrown:
System.InvalidOperationException was unhandled
Message="Collection was modified; enumeration operation may not execute."
foreach tries to get next item on each iteration which can cause trouble if you are modifying it from another thread at the same time.

Enumerator problem, Any way to avoid two loops?

I have a third party api, which has a class that returns an enumerator for different items in the class.
I need to remove an item in that enumerator, so I cannot use "for each". Only option I can think of is to get the count by iterating over the enum and then run a normal for loop to remove the items.
Anyone know of a way to avoid the two loops?
Thanks
[update] sorry for the confusion but Andrey below in comments is right.
Here is some pseudo code out of my head that won't work and for which I am looking a solution which won't involve two loops but I guess it's not possible:
for each (myProperty in MyProperty)
{
if (checking some criteria here)
MyProperty.Remove(myProperty)
}
MyProperty is the third party class that implements the enumerator and the remove method.
Common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
if (condition for deletion) forDeletion.Add(i);
foreach (Item i in forDeletion)
somelist.Remove(i); //or how do you delete items
Loop through it once and create a second array which contains the items which should not be deleted.
If you know it's a collection, you can use a reverse for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
items.RemoveAt(i);
}
Otherwise, you'll have to do two loops.
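A runnable variant of the reverse loop with a removal condition might look like the sketch below, assuming the data is available as a List<int> (the third-party type in the question may not offer an indexer or RemoveAt). Walking backwards means that removing an element never shifts the indices of the elements still to be visited.

```csharp
using System;
using System.Collections.Generic;

static class ReverseRemoveDemo
{
    static void Main()
    {
        var items = new List<int> { 1, 2, 3, 4, 5, 6 };

        // Iterate from the last index down to 0 so removals do not skip elements.
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (items[i] % 2 == 0) // example criterion: remove even numbers
            {
                items.RemoveAt(i);
            }
        }

        Console.WriteLine(string.Join(", ", items)); // 1, 3, 5
    }
}
```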
You can create something like this:
public IEnumerable<item> GetMyList()
{
foreach (var x in thirdParty )
{
if (x == ignore)
continue;
yield return x;
}
}
I need to remove an item in that enumerator
As long as this is a single item that's not a problem. The rule is that you cannot continue to iterate after modifying the collection. Thus:
foreach (var item in collection) {
if (item.Equals(toRemove)) {
collection.Remove(toRemove);
break; // <== stop iterating!!
}
}
It is not possible to remove an item from an enumerator. What you can do is copy or filter (or both) the contents of the whole enumerated sequence.
You can achieve this with LINQ, keeping only the items that do not match your removal criteria:
YourEnumerationReturningFunction().Where(item => !yourRemovalCriteria);
Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator<T> or IEnumerable<T>, you cannot remove any item from the sequence behind the enumerator because there is no method to do so. And you should of course not rely on downcasting a received object, because the implementation may change. (Actually, a well-designed API should not expose mutable objects holding internal state at all.)
If you receive an IList<T> or something similar, you can just use a normal for loop from back to front and remove the items as needed, because there is no iterator whose state could be corrupted. (Here the rule about exposing mutable state should apply again: modifying the returned collection should not change any internal state.)
Enumerable.Count() decides at run time what it needs to do: if the sequence implements ICollection<T> it reads the Count property directly, otherwise it enumerates the sequence to count the elements.
I like SJoerd's suggestion but I worry about how many items we may be talking about.
Why not something like ..
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });
A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach(var delItem in ThirdPartyContainer.Items
.Where(item=>ShouldIDeleteThis(item))
//or: .Where(ShouldIDeleteThis)
.ToArray()) {
ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; and it also relies on internals of the collection such as not changing order in between deletions (e.g. better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() will do just fine :-).
An enumerator often has a private field pointing to the real collection.
You can get it via reflection and modify it.
Have fun.
