C# Modification of IEnumerable while Enumerating with ForEach

This is something that I was exploring to see if I could take what was
List<MdiChild> openMdiChildren = new List<MdiChild>();
foreach (MdiChild child in MdiManager.Pages)
{
    openMdiChildren.Add(child);
}
foreach (MdiChild child in openMdiChildren)
{
    child.Close();
}
and shorten it to not require 2 foreach loops.
Note that I've renamed the objects to simplify the example (they come from third-party controls). For background:
MdiManager.Pages inherits from CollectionBase, which in turn implements IEnumerable,
and MdiChild.Close() removes the open child from the MdiManager.Pages collection. That alters the collection and causes the enumeration to throw an exception because the collection was modified during enumeration, e.g.:
foreach (MdiChild child in MdiManager.Pages)
{
    child.Close();
}
I was able to shorten the working double foreach to
((IEnumerable) MdiManager.Pages).Cast<MdiChild>().ToList()
    .ForEach(new Action<MdiChild>(c => c.Close()));
Why does this not have the same issues with modifying the collection during enumeration? My best guess is that when enumerating over the List created by the ToList call, it is actually executing the actions on the matching items in the MdiManager.Pages collection and not on the generated List.
Edit
I want to make it clear that my question isn't how I can simplify this; I just wanted to understand why there weren't issues with modifying a collection when I performed it as I have it written currently.

Your call to ToList() is what saves you here, as it's essentially duplicating what you're doing above. ToList() actually creates a List<T> (a List<MdiChild> in this case) that contains all of the elements in MdiManager.Pages, then your subsequent call to ForEach operates on that list, not on MdiManager.Pages.
In the end, it's a matter of style preference. I'm not personally a fan of the ForEach function (I prefer the query composition functions like Where and ToList() for their simplicity and the fact that they aren't engineered to have side effects upon the original source, whereas ForEach is).
You could also do:
foreach (var child in MdiManager.Pages.Cast<MdiChild>().ToList())
{
child.Close();
}
Fundamentally, all three approaches do exactly the same thing: they cache the contents of MdiManager.Pages into a List<MdiChild>, then iterate over that cached list and call Close() on each element.
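To make the difference concrete, here is a minimal sketch. The Child class and the ArrayList of pages are hypothetical stand-ins for MdiChild and MdiManager.Pages (the real third-party types are not shown in the question); ArrayList is used only because it is a non-generic collection that detects modification during enumeration.

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical stand-in for MdiChild: closing it removes it from its owner.
class Child
{
    private readonly ArrayList owner;
    public Child(ArrayList owner) { this.owner = owner; }

    public void Close() { owner.Remove(this); }
}

static class SnapshotDemo
{
    // Builds a hypothetical "Pages" collection holding count children.
    public static ArrayList MakePages(int count)
    {
        var pages = new ArrayList();
        for (int i = 0; i < count; i++) pages.Add(new Child(pages));
        return pages;
    }

    // Enumerates the live collection; Close() modifies it mid-enumeration.
    public static bool CloseWhileEnumeratingThrows(ArrayList pages)
    {
        try
        {
            foreach (Child child in pages) child.Close();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Enumerates a copy; ToList() has already copied every reference
    // before the first Close() runs.
    public static void CloseViaSnapshot(ArrayList pages)
    {
        foreach (Child child in pages.Cast<Child>().ToList()) child.Close();
    }

    static void Main()
    {
        Console.WriteLine(CloseWhileEnumeratingThrows(MakePages(3))); // True

        ArrayList pages = MakePages(3);
        CloseViaSnapshot(pages);
        Console.WriteLine(pages.Count); // 0
    }
}
```

The copy holds the same object references as the source, so Close() still acts on the real children; only the enumeration is decoupled from the collection being modified.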

When you call the ToList() method you're actually enumerating the MdiManager.Pages and creating a List<MdiChild> right there (so that's your foreach loop #1). Then when the ForEach() method executes it will enumerate the List<MdiChild> created previously and execute your action on each item (so that's foreach loop #2).
So essentially it's another way of accomplishing the same thing, just using LINQ.

You could also write it as:
foreach(var page in MdiManager.Pages.Cast<MdiChild>().ToList())
    page.Close();
In any case, when you call the ToList() extension method on an IEnumerable, you are creating a brand new list. Deleting from its source collection (in this case, MdiManager.Pages) will not affect the list returned by ToList().
This same technique can be used to delete elements from a source collection without worrying about invalidating the enumeration.

You're mostly right.
ToList() creates a copy of the enumeration, and therefore you are enumerating the copy.
You could also do this, which is equivalent, and shows what you are doing:
var copy = new List<MdiChild>(MdiManager.Pages.Cast<MdiChild>());
foreach(var child in copy)
{
child.Close();
}
Since you are enumerating the elements of the copy, you don't have to worry about modifying the Pages collection: each object reference that existed in the Pages collection now also exists in copy, and changes to Pages don't affect it.
All the remaining methods on the call, ForEach() and the casts, are superfluous and can be eliminated.

At first glance, the culprit is ToList(), which is a method returning a copy of the items as a List, thus circumventing the problem.

Related

Can I use a normal foreach on a ConcurrentBag?

In a parallel section of my code, I save the results from each thread to a ConcurrentBag. However, when this is complete, I need to iterate through each of these results and run them through my evaluation algorithm. Will a normal foreach actually iterate through all members, or do I need special code? I've also thought about using something like a queue instead of a bag, but I don't know which would be best. The bag will typically contain only 20 or so items at the end of the parallel code.
i.e., will the following actually access and run the loop body for ALL members of the ConcurrentBag?
ConcurrentBag futures = new ConcurrentBag();
foreach (var move in futures)
{
// stuff
}

You don't need special code; C#'s foreach will call GetEnumerator, which gives you a snapshot:
The items enumerated represent a moment-in-time snapshot of the
contents of the bag. It does not reflect any update to the collection
after GetEnumerator was called.
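A small sketch of that snapshot behavior follows. The element type (int) and the method names are assumptions made for illustration, since the question does not show what the bag holds.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class BagDemo
{
    // Fills a bag from parallel workers, then counts it with a plain foreach.
    public static int CountAfterParallelFill(int n)
    {
        var futures = new ConcurrentBag<int>();
        Parallel.For(0, n, i => futures.Add(i * i)); // returns after all adds finish

        int seen = 0;
        foreach (int value in futures) seen++;
        return seen;
    }

    // Adding during enumeration does not throw, and the new items are not
    // visited, because the enumerator works on a snapshot.
    public static int CountWhileAdding()
    {
        var bag = new ConcurrentBag<int> { 1, 2, 3 };
        int seen = 0;
        foreach (int value in bag)
        {
            bag.Add(value + 100);
            seen++;
        }
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(CountAfterParallelFill(20)); // 20
        Console.WriteLine(CountWhileAdding());         // 3
    }
}
```

Note that a bag makes no promise about the order in which items come back, so the evaluation step should not depend on it.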

You can use a normal foreach on a ConcurrentBag, but it won't give you any performance benefit.
This is similar to using foreach on a List.
For better performance, use Parallel.ForEach on the ConcurrentBag.
Parallel.ForEach(futures, move => DoOperation(move));
Use ConcurrentBag only if you want to perform multithreaded operations.

Does a foreach loop handle changes in list length correctly?

Does foreach correctly iterate over a list whose length changes?
For example:
//will iterate over all items in list?
foreach (var obj in list)
{
//list length changes here
//ex:
list.Add(...);
list.Remove(...);
list.Concat(...);
// and so on
}
and if it does ...how?

You can't modify a collection while enumerating it inside a foreach statement.
You should use another pattern to do what you are trying to do, because foreach does not allow you to change the collection you are looping over.
For example:
Imagine you run a foreach on a sorted list from the beginning. You start processing the item with key="A", then you go to "B", then you change "C" to "B". What's going to happen? Your list is re-sorted and you no longer know what you are looping over or where you are.
In general you "could" do it with a for (int i = dictionary.Count - 1; i >= 0; --i) loop or something like that, but this also depends on your context; I would really try to use another approach.
Internal working: IEnumerator<T> is designed to enable the iterator pattern for iterating over collections of elements, rather than relying on a length and an index. IEnumerator<T> has two main members.
The first is bool MoveNext(). Using this method, we can move from one element within the collection to the next while at the same time detecting, via the Boolean return value, when we have enumerated every item.
The second member, a read-only property called Current, returns the element currently being processed. With these two members on the collection class, it is possible to iterate over the collection simply using a while loop.
The MoveNext() method returns false when it moves past the end of the collection. This replaces the need to count elements while looping. (The last member on IEnumerator<T>, Reset(), will reset the enumeration.)
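The while loop described above can be sketched as follows. This is roughly what a foreach statement does for you; the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

static class EnumeratorDemo
{
    // Sums a sequence by driving the enumerator by hand instead of using foreach.
    public static int SumWithEnumerator(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            // MoveNext() advances to the next element and returns false
            // once it has moved past the end of the sequence.
            while (e.MoveNext())
            {
                sum += e.Current; // Current is the element just moved to
            }
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumWithEnumerator(new List<int> { 1, 2, 3 })); // 6
    }
}
```

Because the loop only ever asks "is there a next element?", the enumerator has to decide what a structural change to the collection means for it, and the built-in collections answer by throwing.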

Per the documentation, if changes are made inside the loop the behavior is undefined. Undefined means that there are no restrictions on what it can do; there is no "incorrect behavior" when the behavior is undefined. Crash, do what you want, send an email to your boss calling him nasty names and quitting: all equally valid. I would hope for a crash in this case, but again, whatever happens, happens and is considered "correct" according to the documentation.

You cannot change a collection inside a foreach loop over that same collection.
If you want, you can use a for loop to change the collection's length.

The collection you use in a foreach loop must be treated as read-only while the loop runs. As per MSDN:
The foreach statement is used to iterate through the collection to get
the information that you want, but can not be used to add or remove
items from the source collection to avoid unpredictable side effects.
If you need to add or remove items from the source collection, use a
for loop.
But as per this link, it looks like this is now possible from .Net 4.0

The interaction between yield and LINQ

I was reading a piece of code from the "XStreamingReader" library, which seems like a really cool solution for executing LINQ queries over XML documents without loading the whole document into memory (as an XDocument object would),
and was wondering about the following:
public IEnumerable<XElement> Elements()
{
using (var reader = readerFactory())
{
reader.MoveToContent();
MoveToNextElement(reader);
while (!reader.EOF)
{
yield return XElement.Load(reader.ReadSubtree());
MoveToNextFollowing(reader);
}
}
}
public IEnumerable<XElement> Elements(XName name)
{
return Elements().Where(x => x.Name == name);
}
Regarding the second method, Elements(XName): it first calls Elements() and then uses Where() to filter its results, but I'm intrigued by the order of execution here, since Elements() contains a yield statement.
From what I understand:
- Executing Elements() returns an IEnumerable collection, this collection physically does not contain any items YET.
- Where() is executed on that collection, behind the scene there's a loop which iterates through every item, new items are "Loaded" on the fly, since yield is being used.
- All items which matched the Where statement are returned as an IEnumerable collection, and are PHYSICALLY IN that collection.
First, am I correct with the above assumption?
Second, in case I'm right, what if I wanted to return a "yielded" collection rather than a collection which is physically filled with all the filtered data?
I'm asking this because it loses the entire purpose of NOT reading an entire "matching" block into the memory, but iterating one matching element at a time...

I assume when you say that items are physically in a collection, you mean that there is a structure in memory that contains all the items right now. With Where(), that's not the case, it uses yield too internally (or something that acts the same as yield).
When you try to fetch the first item, Where() iterates the source collection, until it finds the first item that matches. So, the elements are streamed both in Elements() and in Elements(XName) and the whole collection is never in memory, only piece by piece.

Where() is executed on that collection
First, am I correct with the above assumption?
No. Where returns a lazy IEnumerable<XElement>. Later, when that IEnumerable<XElement> is enumerated, then the elements are yielded and filtered.
If the thing which enumerates that lazy IEnumerable happens to collect the elements (such as a call to ToList), then all the elements will be in memory at that point. If the thing which enumerates that lazy IEnumerable happens to process each item one at a time (such as a foreach loop, which does not retain a reference to the XElement), then only one item at a time will be in memory.
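The order of execution can be observed with a small sketch. It uses a plain iterator over integers in place of the XML reader, and the Log list and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyDemo
{
    public static readonly List<string> Log = new List<string>();

    // Stand-in for Elements(): records when each item is actually produced.
    static IEnumerable<int> Elements()
    {
        for (int i = 1; i <= 4; i++)
        {
            Log.Add("produce " + i);
            yield return i;
        }
    }

    public static List<string> Run()
    {
        Log.Clear();

        // Nothing runs here: neither the iterator body nor the filter.
        IEnumerable<int> evens = Elements().Where(x => x % 2 == 0);
        Log.Add("query built");

        // Items are produced and filtered one at a time as foreach pulls them.
        foreach (int x in evens) Log.Add("consume " + x);
        return Log;
    }

    static void Main()
    {
        foreach (string line in Run()) Console.WriteLine(line);
        // query built
        // produce 1
        // produce 2
        // consume 2
        // produce 3
        // produce 4
        // consume 4
    }
}
```

"query built" is logged before any "produce" line, and each matching item is consumed before the next one is produced, so at no point is the whole filtered sequence held in memory.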

All items which matched the Where statement are returned as an IEnumerable collection, and are PHYSICALLY IN that collection. First, am I correct with the above assumption?
No. Where implements an additional enumerator internally, which does what you want it to do. If the IEnumerable is not enumerated, then the reader is never called, and the individual XElement instances never get created, and the filtering code is never run.
See Jon Skeet's article on re-implementing the behavior of the Where clause: http://msmvps.com/blogs/jon_skeet/archive/2010/09/03/reimplementing-linq-to-objects-part-2-quot-where-quot.aspx . He mimics the existing implementation (for explanatory purposes; there is no need to use his re-implementation in real code), and his code uses yield return.
Note that if you call ToList, though, then the entire enumeration will be evaluated and copied to a list, so be careful what you do with the IEnumerable that Where returns.
Also keep in mind that if the reader returned by readerFactory is reading from memory (e.g. StringReader), then the document will exist physically in memory - there just won't be any instance of DOM nodes until you enumerate them. And once you enumerate those elements, your document will exist twice in memory, one for the original document, one in DOM form. You may want to ensure that your streaming is done against a non-memory stream (e.g. directly from a file or network stream).

Enumerator problem, Any way to avoid two loops?

I have a third party api, which has a class that returns an enumerator for different items in the class.
I need to remove an item in that enumerator, so I cannot use "for each". Only option I can think of is to get the count by iterating over the enum and then run a normal for loop to remove the items.
Anyone know of a way to avoid the two loops?
Thanks
[update] sorry for the confusion but Andrey below in comments is right.
Here is some pseudocode off the top of my head that won't work, and for which I am looking for a solution that doesn't involve two loops, though I guess it's not possible:
for each (myProperty in MyProperty)
{
if (checking some criteria here)
MyProperty.Remove(myProperty)
}
MyProperty is the third party class that implements the enumerator and the remove method.

Common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
if (condition for deletion) forDeletion.Add(i);
foreach (Item i in forDeletion)
somelist.Remove(i); //or how do you delete items

Loop through it once and create a second array which contains the items which should not be deleted.

If you know it's a collection, you can go with a reverse for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
items.RemoveAt(i);
}
Otherwise, you'll have to do two loops.
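The loop above removes every item. A variant that removes only matching items might look like the sketch below; it assumes the collection exposes IList<T>, and the RemoveWhere name and the predicate parameter are made up for the example.

```csharp
using System;
using System.Collections.Generic;

static class ReverseRemoveDemo
{
    // Walks the list from back to front so that removing an item never
    // shifts the position of an element that has not been visited yet.
    public static void RemoveWhere<T>(IList<T> items, Func<T, bool> shouldRemove)
    {
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (shouldRemove(items[i]))
            {
                items.RemoveAt(i);
            }
        }
    }

    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        RemoveWhere(numbers, n => n % 2 == 0);
        Console.WriteLine(string.Join(", ", numbers)); // 1, 3, 5
    }
}
```

No enumerator is involved, so nothing is invalidated; the cost is that it only works for collections with an indexer and RemoveAt.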

You can create something like this:
public IEnumerable<Item> GetMyList()
{
foreach (var x in thirdParty )
{
if (x == ignore)
continue;
yield return x;
}
}

I need to remove an item in that enumerator
As long as this is a single item that's not a problem. The rule is that you cannot continue to iterate after modifying the collection. Thus:
foreach (var item in collection) {
if (item.Equals(toRemove)) {
collection.Remove(toRemove);
break; // <== stop iterating!!
}
}

It is not possible to remove an item from an enumerator. What you can do is copy or filter (or both) the contents of the whole enumeration sequence.
You can achieve this by using LINQ and doing something like this, which keeps only the items that do not match your removal criteria:
YourEnumerationReturningFunction().Where(item => !yourRemovalCriteria(item));

Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator<T> or IEnumerable<T>, you cannot remove any item from the sequence behind the enumerator because there is no method to do so. And you should of course not rely on downcasting a received object, because the implementation may change. (Actually, a well-designed API should not expose mutable objects holding internal state at all.)
If you receive an IList<T> or something similar, you can just use a normal for loop from back to front and remove the items as needed, because there is no iterator whose state could be corrupted. (Here the rule about exposing mutable state should apply again: modifying the returned collection should not change any state.)

Enumerable.Count() decides at run time what it needs to do: if the source implements ICollection<T>, it reads the Count property directly; otherwise it enumerates the whole sequence to count the items.
I like SJoerd's suggestion but I worry about how many items we may be talking about.

Why not something like ..
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });

A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach(var delItem in ThirdPartyContainer.Items
.Where(item=>ShouldIDeleteThis(item))
//or: .Where(ShouldIDeleteThis)
.ToArray()) {
ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; and it also relies on internals of the collection such as not changing order in between deletions (e.g. better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() will do just fine :-).

An enumerator usually has a private field pointing to the real collection.
You can get it via reflection and modify it.
Have fun.

Foreach while adding items to the looping collection

This is the situation:
I'm browsing through some code and I wondered if the following statement takes a reference of the selected collection or a copy with which it replaces the original object when the foreach loop finishes. If the first, will it take the new found pages and join them in the loop?
foreach(Page page in Pages)
{
page.AddRange(RetrieveSubPages(page.Id));
}
Edit: I'm sorry, I made a typo.
It should be this:
foreach(Page page in pages)
{
pages.AddRange(RetrieveSubPages(page.Id));
}
What I was trying to ask is: if I add some objects to the collection being enumerated, will the foreach include those objects?

It looks like the code doesn't modify the Pages collection, but rather the contents of the Page objects in the Pages collection, with the Page type having at least one collection-like method (AddRange).
In general each collection implements iteration in a way suitable for itself, and generally becomes unmodifiable while iterating, but one could implement a collection which iterates by taking a snapshot of itself.
There is no mechanism to detect exit from a loop which would allow action to be taken at that point (consider how this would interact with exceptions, break and return in the body of the loop).

In most cases, foreach works against the live collection (no explicit clone), and if you try to change the collection while enumerating it, then the enumerator breaks with an exception. So if you are adding to Pages, expect problems.

I think the safest way is this:
List<Page> newpages = new List<Page>();
foreach(Page page in pages)
{
newpages.AddRange(RetrieveSubPages(page.Id));
}
pages.AddRange(newpages);
You'd have to extend this a bit if you wanted to recurse into the subpages.
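One possible way to extend it, sketched under several assumptions: the Page class, the in-memory Children map, and this RetrieveSubPages implementation are all hypothetical stand-ins for the asker's real code, and the page hierarchy is assumed to contain no cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Page type; the real one is not shown in the question.
class Page
{
    public int Id;
    public Page(int id) { Id = id; }
}

static class SubPageDemo
{
    // Hypothetical hierarchy: page id -> ids of its direct subpages.
    static readonly Dictionary<int, int[]> Children = new Dictionary<int, int[]>
    {
        { 1, new[] { 2, 3 } },
        { 2, new[] { 4 } },
    };

    // Stand-in for the asker's RetrieveSubPages method.
    static List<Page> RetrieveSubPages(int id)
    {
        var result = new List<Page>();
        int[] ids;
        if (Children.TryGetValue(id, out ids))
        {
            foreach (int childId in ids) result.Add(new Page(childId));
        }
        return result;
    }

    // Uses a queue of pending pages instead of growing the list that is
    // being enumerated, so subpages of subpages are picked up as well.
    public static List<Page> CollectAll(List<Page> roots)
    {
        var all = new List<Page>();
        var pending = new Queue<Page>(roots);
        while (pending.Count > 0)
        {
            Page page = pending.Dequeue();
            all.Add(page);
            foreach (Page sub in RetrieveSubPages(page.Id)) pending.Enqueue(sub);
        }
        return all;
    }

    static void Main()
    {
        List<Page> all = CollectAll(new List<Page> { new Page(1) });
        Console.WriteLine(string.Join(", ", all.Select(p => p.Id))); // 1, 2, 3, 4
    }
}
```

The only foreach in CollectAll runs over the freshly returned subpage list, never over the queue or the result list, so no collection is modified while it is being enumerated.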

In response to your question, it does not make a copy.
It creates an enumerator and iterates through the collection. If the collection is changed while this enumeration is happening, in the foreach itself, or asynchronously, you will get an exception:
An unhandled exception of type 'System.InvalidOperationException' occurred in mscorlib.dll
Additional information: Collection was modified; enumeration operation may not execute.
You can use a temporary collection and join the two afterwards, or just not use an enumerator:
for (int i = 0; i < pages.Count; i++)
{
test.AddRange(RetrieveSubPages(pages[i].Id));
}

foreach uses an enumerator.
The collection over which you loop using foreach, has to implement IEnumerable (or IEnumerable<T>).
Then, foreach calls the GetEnumerator method of that collection, and uses the Enumerator to traverse the collection.

You are not modifying the collection you are enumerating, therefore you won't have any problems with this code.
It is also irrelevant whether a clone of the collection is being enumerated, because the objects contained in both the collection and the clone are still the same (reference-equal).

I'm pretty sure you'll get an exception thrown complaining that the underlying collection was modified
