Remove in foreach just before break - c#

I know there are much better ways to do this, but I've seen it in existing code and now I'm wondering whether it could have any negative side effects. Note the break right after Remove: I don't care about the iterator afterwards, but I do care about unexpected behavior (i.e., potential exceptions).
foreach (var item in items)
{
    // do stuff
    if (item.IsSomething)
    {
        items.Remove(item); // is this safe???
        break;
    }
}
Is it also possible that the compiler optimizes something in a way I don't expect?

The compiler generates a call to Dispose() on the enumerator that is executed in a finally block, but that shouldn't be a problem. If you break right after removing the item, nothing bad should happen, since you don't use the enumerator anymore.
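To make this concrete, here is a minimal sketch, assuming a List<T>, of roughly what the compiler turns the remove-then-break loop into. The enumerator is disposed in a finally block, and because the loop breaks immediately after Remove, MoveNext() is never called again, so the list's modification check never runs. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class RemoveThenBreak
{
    // Approximate expansion of:
    //   foreach (var item in items) { if (pred(item)) { items.Remove(item); break; } }
    public static bool RemoveFirst<T>(List<T> items, Predicate<T> pred)
    {
        bool removed = false;
        List<T>.Enumerator e = items.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                T item = e.Current;
                if (pred(item))
                {
                    items.Remove(item); // mutates the list...
                    removed = true;
                    break;              // ...and leaves before the next MoveNext()
                }
            }
        }
        finally
        {
            e.Dispose(); // Dispose() does not check whether the list changed
        }
        return removed;
    }
}
```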
If you want to do it a different way though (for style reasons or whatever), you could do this:
var item = items.FirstOrDefault(i => i.IsSomething); // FirstOrDefault needs using System.Linq;
if (item != null)
{
    items.Remove(item);
}
It's also a bit shorter :) (This assumes the collection holds a reference type or a nullable type, so that null can mean "not found".)

The compiler and everything else your application runs on guarantee SC-DRF (sequential consistency for data-race-free programs), so you can't observe any difference between the program you wrote and the program that is actually executed (even though the two are anything but the same). Assuming items is not shared between multiple threads, this is completely safe to write and behaves no differently than calling Remove outside the loop.

You can't change the list while iterating over it with foreach: the underlying collection cannot be modified while it's being enumerated. A standard approach is to collect the items to remove in a second list and then, after items has been fully enumerated, remove each of them from items.
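For contrast, here is a sketch of what happens when the loop keeps going after a Remove. It assumes a List<int>, and the names are hypothetical. The next MoveNext() notices that the list changed and throws InvalidOperationException, even when the removed item was the last one.

```csharp
using System;
using System.Collections.Generic;

static class ModifyDuringForeach
{
    // Removes matches inside the foreach without breaking out of the loop.
    // Returns true if List<T>'s enumerator threw because the list was modified.
    public static bool ThrowsWithoutBreak(List<int> items, Predicate<int> pred)
    {
        try
        {
            foreach (var item in items)
            {
                if (pred(item))
                    items.Remove(item); // no break: enumeration continues
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // "Collection was modified; enumeration operation may not execute."
            return true;
        }
    }
}
```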

Then you can do this. It can be more efficient when dealing with large lists (assuming Entity Framework):
var reducedList = items.Where(a => a.IsSomething).ToList();
foreach (var item in reducedList)
{
    items.Remove(item); // remove from items; removing from reducedList while enumerating it would throw
}
This reduces the foreach loop iterations to just the matching items.
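If items is an ordinary in-memory List<T> rather than an Entity Framework set, List<T>.RemoveAll does the same job in a single pass with no second list. This is a swapped-in alternative, not part of the answer above. RemoveAll does not exist on an EF DbSet, and Item is a hypothetical stand-in for the element type.

```csharp
using System.Collections.Generic;

static class RemoveAllSketch
{
    // Hypothetical element type carrying the IsSomething flag from the question.
    public sealed class Item
    {
        public string Name { get; set; } = "";
        public bool IsSomething { get; set; }
    }

    // Removes every matching element in one pass and returns how many were removed.
    public static int RemoveMatching(List<Item> items) =>
        items.RemoveAll(a => a.IsSomething);
}
```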

Related

Collection mutation in foreach upon exiting loop: is it an acceptable pattern?

It's well known that mutating a collection inside a foreach loop is not allowed: the runtime throws an exception when, for instance, an item is removed.
However, today I was surprised to notice that there's no exception if the mutating operation is immediately followed by a statement that exits the loop, i.e., when the loop ends right there.
// this won't throw!
var coll = new List<int>(new[] { 1, 2, 3 });
foreach (var item in coll)
{
    coll.RemoveAt(1);
    break;
}
I looked at the framework code, and it's pretty clear that the exception is thrown only when the iterator is moved forward.
My question is: can the above "pattern" be considered an acceptable practice, or is there some sneaky problem with using it?
In the example you gave, and to answer the question "Is it good practice/acceptable to modify a collection during enumeration if you break after the change", it is fine to do this, as long as you are aware of the side-effects.
The primary side effect, and why I would not recommend doing this in the general case, is that the rest of your foreach loop doesn't execute. I would consider that a problem in almost every instance of a foreach that I have used.
In most (if not all) instances where you could get away with this, a simple if check would suffice (as Servy has in his answer), so you may want to look at what other options you have available if you find yourself writing this kind of code a lot.
The most common general solution is to add to a "kill" list, and then remove after your iteration:
List<int> killList = new List<int>();
foreach (int i in coll)
{
    if (i < 0)
        killList.Add(i);
    // ...
}
foreach (int i in killList)
    coll.Remove(i);
There are various ways to make this code shorter, but this is the most explicit way of doing it.
You can also iterate backwards, which won't cause the exception to be thrown. This is a neat workaround, but you may want to add a comment explaining why you are iterating backwards.
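Here is a minimal sketch of the backwards iteration, assuming a List<T>. The helper name is hypothetical. Removing at index i only shifts elements that have already been visited, so nothing is skipped and no enumerator is involved.

```csharp
using System;
using System.Collections.Generic;

static class BackwardRemoval
{
    // Iterating from the end: RemoveAt(i) shifts only already-visited elements.
    public static void RemoveWhere<T>(List<T> coll, Predicate<T> shouldRemove)
    {
        for (int i = coll.Count - 1; i >= 0; i--)
        {
            if (shouldRemove(coll[i]))
                coll.RemoveAt(i);
        }
    }
}
```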
So your example can be relied on to work, for starters. Mutating a collection while iterating fails when you go to ask for the next item. Since this provably never asks for another item after it mutates the list, we know that won't happen. Of course, the fact that it works doesn't mean that it's clear, or that it's a good idea to use it.
What this appears to be trying to do is remove the second item if there is one. It seems designed not to break when the collection is too short, although as written it only skips the call for an empty collection: with exactly one item, RemoveAt(1) still throws. Either way, this is not a well-designed way of doing it; it's confusing to readers and doesn't convey its intent. A much clearer way of accomplishing the same goal is something like the following:
if (coll.Count > 1)
    coll.RemoveAt(1);
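Here is a side-by-side sketch of the two versions. The method names are hypothetical. On a list with at least two elements, both remove the second item. The Count check also covers the one-element case.

```csharp
using System.Collections.Generic;

static class RemoveSecond
{
    // The question's pattern: mutate, then break before the enumerator advances.
    public static List<int> ViaForeach(IEnumerable<int> source)
    {
        var coll = new List<int>(source);
        foreach (var item in coll)
        {
            coll.RemoveAt(1); // throws ArgumentOutOfRangeException if coll has exactly one item
            break;
        }
        return coll;
    }

    // The clearer version: an explicit length check, then a single removal.
    public static List<int> ViaIf(IEnumerable<int> source)
    {
        var coll = new List<int>(source);
        if (coll.Count > 1)
            coll.RemoveAt(1);
        return coll;
    }
}
```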
In the more general case, such a Remove in a foreach can only ever be used to remove one item. In those cases you're better off replacing the foreach with an if that checks that there is an item to remove (if needed, as it is here), followed by a call that removes that single item (which may involve a query to find the item, instead of a hard-coded index).
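Here is a sketch of that single-removal approach for a List<T>, using List<T>.FindIndex as the query. The helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class RemoveSingle
{
    // Replaces "foreach + Remove + break" with one lookup and one removal.
    // FindIndex returns -1 when no element matches.
    public static bool RemoveFirstMatch<T>(List<T> coll, Predicate<T> match)
    {
        int index = coll.FindIndex(match);
        if (index < 0)
            return false;
        coll.RemoveAt(index);
        return true;
    }
}
```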

Call a method in the declaration of a foreach

I have a foreach which calls a method to get its collection.
foreach(var item in GetItemDetails(item))
{
}
Visual Studio doesn't complain about this, and neither does ReSharper. Before I keep using this approach, though, I want to check whether it is recommended.
There's nothing wrong with that. The method will only be evaluated once.
It is basically equivalent to:
using (var iter = GetItemDetails(item).GetEnumerator())
{
    while (iter.MoveNext())
    {
        var item = iter.Current;
        // ...
    }
}
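To see that the method is called only once, here is a small sketch with a hypothetical GetItemDetails that counts its own invocations. The int parameter and returned values are placeholders.

```csharp
using System.Collections.Generic;

static class EvaluatedOnce
{
    public static int CallCount;

    // Hypothetical stand-in for the question's GetItemDetails.
    public static IEnumerable<int> GetItemDetails(int seed)
    {
        CallCount++;
        return new[] { seed, seed + 1, seed + 2 };
    }

    public static int SumDetails(int seed)
    {
        int sum = 0;
        // GetItemDetails runs once, before the first iteration;
        // the loop then walks the enumerator obtained from its result.
        foreach (var detail in GetItemDetails(seed))
            sum += detail;
        return sum;
    }
}
```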
There's nothing wrong with it.
Just two suggestions:
If the body of the loop can be written as a method that handles a single item (which would make it reusable), I would also consider the List<T>.ForEach(...) method.
Info: http://msdn.microsoft.com/en-us/library/bwabdf9z%28v=vs.110%29.aspx
If you really need performance (which can happen even in C#), a for loop is usually the fastest, though it is less concise and less readable:
Info: In .NET, which loop runs faster, 'for' or 'foreach'?
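Here is a sketch of both suggestions for a List<string>. The method names are hypothetical. It uses List<T>.ForEach with a reusable per-item method and an index-based for loop. Whether the for loop is actually faster depends on the collection and runtime, so it is worth measuring before switching.

```csharp
using System;
using System.Collections.Generic;

static class LoopStyles
{
    // A per-item method that can be reused outside the loop.
    static void Print(string name) => Console.WriteLine(name);

    // List<T>.ForEach with a method group.
    public static void PrintAll(List<string> names) => names.ForEach(Print);

    // Index-based alternative using the List<T> indexer.
    public static int TotalLength(List<string> names)
    {
        int total = 0;
        for (int i = 0; i < names.Count; i++)
            total += names[i].Length;
        return total;
    }
}
```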

Why can't I modify the loop variable in a foreach?

Why is a foreach loop a read only loop? What reasons are there for this?
I'm not sure exactly what you mean by a "readonly loop" but I'm guessing that you want to know why this doesn't compile:
int[] ints = { 1, 2, 3 };
foreach (int x in ints)
{
    x = 4;
}
The above code will give the following compile error:
Cannot assign to 'x' because it is a 'foreach iteration variable'
Why is this disallowed? Assigning to it probably wouldn't do what you want: it wouldn't modify the contents of the original collection, because the variable x is not a reference to the elements in the list but a copy. To keep people from writing buggy code, the compiler disallows it.
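Here is a small sketch of the difference. The method names are hypothetical. Copying the loop variable into a local and changing that leaves the array untouched, whereas an index-based for loop writes to the elements themselves.

```csharp
static class LoopVariableCopy
{
    public static int[] TryAssignCopy(int[] ints)
    {
        foreach (int x in ints)
        {
            int copy = x; // x itself is read-only; a local copy can be changed...
            copy = 4;     // ...but that never touches the array
        }
        return ints;
    }

    public static int[] AssignByIndex(int[] ints)
    {
        for (int i = 0; i < ints.Length; i++)
            ints[i] = 4; // writes into the array element itself
        return ints;
    }
}
```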
I would assume it's how the iterator travels through the list.
Say you have a sorted list:
Alaska
Nebraska
Ohio
In the middle of
foreach (var s in States)
{
}
you call States.Add("Missouri").
How do you handle that? Do you then jump to Missouri even though you're already past that index?
If, by this, you mean:
Why shouldn't I modify the collection that's being foreach'd over?
There's no guarantee that the items you're getting come out in a given order, or that adding or removing an item won't change the order of the items in the collection, or even invalidate the enumerator.
Imagine if you ran the following code:
var items = GetListOfTOfSomething(); // returns 10 items
int i = 0;
foreach (var item in items)
{
    i++;
    if (i == 5)
    {
        items.Remove(item);
    }
}
As soon as you reach the iteration where i is 6 (i.e., after the item is removed), anything could happen. The enumerator might have been invalidated by the removal, or everything in the underlying collection might have "shuffled up by one", so that an item takes the place of the removed one and you "skip" it. (For List<T>, the enumerator detects the change and throws InvalidOperationException.)
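For List<T>, the "skip one" effect is easy to reproduce even without an enumerator. Removing inside a forward index loop shifts the next element into the current slot, and i++ then steps over it. Here is a sketch with a hypothetical helper.

```csharp
using System.Collections.Generic;

static class ForwardRemovalSkip
{
    // Buggy on purpose: after RemoveAt(i), the next element moves into slot i,
    // and the loop's i++ skips it without ever testing it.
    public static List<int> RemoveEvensForward(List<int> list)
    {
        for (int i = 0; i < list.Count; i++)
        {
            if (list[i] % 2 == 0)
                list.RemoveAt(i);
        }
        return list;
    }
}
```

With { 1, 2, 4, 5 }, the 2 is removed, the 4 slides into its slot and is never examined, so an even number survives.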
If you meant "why can't I change the value that is provided on each iteration" then, if the collection you're working with contains value types, any changes you make won't be preserved as it's a value you're working with, rather than a reference.
The foreach statement uses the IEnumerable interface to loop through the collection. The interface only defines methods for stepping through a collection and getting the current item; there are no methods for updating the collection.
Because the interface only defines the minimal methods required to read the collection in one direction, it can be implemented by a wide range of collections.
Because you only access a single item at a time, the entire collection doesn't have to exist at once. LINQ expressions, for example, rely on this: they create the result on the fly as you read it, instead of first creating the entire result and then letting you loop through it.
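Here is a minimal sketch of that on-demand behavior using a C# iterator. The names are hypothetical. The sequence is conceptually infinite, yet only as many elements are produced as the loop asks for.

```csharp
using System.Collections.Generic;

static class LazySequence
{
    public static int Produced;

    // Each square is computed only when the consumer calls MoveNext().
    public static IEnumerable<int> Squares()
    {
        for (int n = 1; ; n++)
        {
            Produced++;
            yield return n * n;
        }
    }

    public static List<int> FirstSquares(int count)
    {
        var result = new List<int>();
        if (count <= 0)
            return result;
        foreach (var sq in Squares())
        {
            result.Add(sq);
            if (result.Count == count)
                break; // stop asking; no further squares are computed
        }
        return result;
    }
}
```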
Not sure what you mean by read-only, but I'm guessing that understanding what the foreach loop is under the hood will help. It's syntactic sugar and could also be written something like this:
IEnumerator<T> enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
    T element = enumerator.Current;
    // body goes here
}
If you change the collection (list), it becomes hard or even impossible to figure out how to continue the iteration.
Assigning to element (in the foreach version) could be viewed either as trying to assign to enumerator.Current, which is read-only, or as trying to change the value of the local holding a reference to enumerator.Current. In the latter case you might as well introduce a local yourself, because it no longer has anything to do with the enumerated list.
foreach works with everything that implements the IEnumerable interface. To avoid synchronization issues, the enumerable should never be modified while you iterate over it.
The problems arise if you add or remove items in another thread while iterating: depending on where you are, you might miss an item or apply your code to an extra item. Many collections, such as List<T>, detect the modification on the next step of the iteration and throw an exception:
System.InvalidOperationException was unhandled
Message="Collection was modified; enumeration operation may not execute."
foreach tries to get the next item on each iteration, which can cause trouble if another thread is modifying the collection at the same time.

Enumerator problem, Any way to avoid two loops?

I have a third-party API that has a class which returns an enumerator over the different items in the class.
I need to remove an item from that enumerator, so I cannot use foreach. The only option I can think of is to get the count by iterating over the enumeration and then run a normal for loop to remove the items.
Does anyone know of a way to avoid the two loops?
[Update] Sorry for the confusion, but Andrey is right in the comments below.
Here is some pseudocode off the top of my head that won't work. I'm looking for a solution that doesn't involve two loops, but I guess it's not possible:
foreach (var myProperty in MyProperty)
{
    if (checking some criteria here)
        MyProperty.Remove(myProperty);
}
MyProperty is the third party class that implements the enumerator and the remove method.
A common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
    if (condition for deletion) forDeletion.Add(i);
foreach (Item i in forDeletion)
    somelist.Remove(i); // or however you delete items
Loop through it once and create a second array which contains the items which should not be deleted.
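Here is a sketch of that single-pass filter using LINQ. The helper name is hypothetical. It builds a new list of the items to keep rather than modifying the third-party object, so it only helps if a new collection is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeepFilter
{
    // One pass: copy only the items that should NOT be deleted.
    public static List<T> Keep<T>(IEnumerable<T> source, Func<T, bool> shouldDelete) =>
        source.Where(item => !shouldDelete(item)).ToList();
}
```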
If you know it's a collection, you can use a reversed for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
    if (ShouldRemove(items[i])) // ShouldRemove: your own (hypothetical) removal criteria
        items.RemoveAt(i);
}
Otherwise, you'll have to do two loops.
You can create something like this:
public IEnumerable<Item> GetMyList()
{
    foreach (var x in thirdParty)
    {
        if (x == ignore) // ignore: the item you want to leave out
            continue;
        yield return x;
    }
}
I need to remove an item in that enumerator
As long as it's a single item, that's not a problem. The rule is that you cannot continue to iterate after modifying the collection. Thus:
foreach (var item in collection)
{
    if (item.Equals(toRemove))
    {
        collection.Remove(toRemove);
        break; // <== stop iterating!!
    }
}
It is not possible to remove an item through an enumerator. What you can do is copy or filter (or both) the contents of the whole enumerated sequence.
You can achieve this with LINQ by doing something like this:
YourEnumerationReturningFunction().Where(item => !yourRemovalCriteria(item));
Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator<T> or IEnumerable<T>, you cannot remove any item from the sequence behind the enumerator, because there is no method to do so. And you should of course not rely on downcasting a received object, because the implementation may change. (Actually, a well-designed API should not expose mutable objects holding internal state at all.)
If you receive an IList<T> or something similar, you can just use a normal for loop from back to front and remove the items as needed, because there is no iterator whose state could be corrupted. (Here the rule about exposing mutable state should apply again: modifying the returned collection should not change any state.)
Enumerable.Count() (the LINQ extension method on IEnumerable<T>) decides at run time what it needs to do: if the source implements ICollection<T>, it simply reads its Count property; otherwise it enumerates the whole sequence to count it.
I like SJoerd's suggestion but I worry about how many items we may be talking about.
Why not something like this:
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });
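One caveat worth knowing is that Except is a set operation. Besides dropping 2 and 3, it also removes duplicate values from the source. Here is a sketch comparing it with a plain Where filter. The method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class ExceptVsWhere
{
    // Set difference: unwanted values are dropped AND duplicates collapse.
    public static List<int> WithExcept(IEnumerable<int> source) =>
        source.Except(new[] { 2, 3 }).ToList();

    // Filter: only the unwanted values are dropped; duplicates survive.
    public static List<int> WithWhere(IEnumerable<int> source) =>
        source.Where(x => x != 2 && x != 3).ToList();
}
```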
A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach (var delItem in ThirdPartyContainer.Items
    .Where(item => ShouldIDeleteThis(item))
    // or: .Where(ShouldIDeleteThis)
    .ToArray())
{
    ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; and it also relies on internals of the collection such as not changing order in between deletions (e.g. better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() will do just fine :-).
An enumerator typically has a private field pointing to the real collection.
You can get it via reflection and modify it.
Have fun.

Using foreach (...) syntax while also incrementing an index variable inside the loop

When looking at C# code, I often see patterns like this:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];
int i = 0;
foreach (DataType item in items)
{
    // Do some stuff with item, then finally
    itemProps[i] = item.Prop;
    i++;
}
The foreach loop iterates over the objects in items, but also keeps a counter (i) for indexing into itemProps. I personally don't like this extra i hanging around, and would instead probably do something like:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];
for (int i = 0; i < items.Length; i++)
{
    // Do some stuff with items[i], then finally
    itemProps[i] = items[i].Prop;
}
Is there perhaps some benefit to the first approach that I'm not aware of? Is this a result of everybody trying to use the fancy foreach (...) syntax? I'm interested in your opinions on this.
If you are using C# 3.0 or later, this is better:
OtherDataType[] itemProps = items.Select(i => i.Prop).ToArray();
With i declared outside the loop, it is still available after the loop completes. If you wanted to count the number of items and the collection didn't provide a .Count or .UBound property, this could be useful.
Like you I would normally use the second method, looks much cleaner to me.
In this case, I don't think so. Sometimes, though, the collection doesn't implement this[int index] but it does implement GetEnumerator(). In the latter case, you don't have much choice.
Some data structures are not well suited to random access but can be iterated over very quickly (trees, linked lists, etc.). So if you need to iterate over one of these but also need a count for some reason, you're doomed to go the ugly way...
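Here is a sketch of that situation with LinkedList<T>, which has no int indexer. The method names are hypothetical. It also shows the LINQ Select overload that supplies the index, which avoids the hand-maintained counter.

```csharp
using System.Collections.Generic;
using System.Linq;

static class IndexWithoutIndexer
{
    // LinkedList<T> has no this[int], so foreach plus a manual counter
    // is the direct way to pair each element with its position.
    public static string[] Label(LinkedList<string> items)
    {
        var labels = new string[items.Count];
        int i = 0;
        foreach (var item in items)
        {
            labels[i] = i + ":" + item;
            i++;
        }
        return labels;
    }

    // Select's (item, index) overload does the same without a mutable counter.
    public static string[] LabelWithSelect(IEnumerable<string> items) =>
        items.Select((item, index) => index + ":" + item).ToArray();
}
```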
Semantically they may be equivalent, but in fact using foreach over an enumerator gives the compiler more scope to optimise.
I don't remember all the arguments off the top of my head, but they are well covered in Effective C#, which is recommended reading.
foreach (DataType item in items)
This foreach loop makes it crystal clear that you're iterating over every DataType item in, well, items. It may make the code a little longer, but it's not "bad" code. With the for loop, you need to look inside the parentheses to get an idea of what the loop is for.
The problem with this example is that you're iterating over two different arrays at the same time, which we don't do that often, so we are stuck between two strategies: either we "hack" the fancy foreach a bit, as you put it, or we go back to the old, not-so-loved for (int i = 0; i ...). (There are other options besides these two, of course.)
So I think this is the Vim vs. Emacs debate coming back in your question as for vs. foreach :) People who like for() will say this foreach is useless, might cause performance issues, and is just verbose. People who prefer foreach will say something like: we don't care about two extra lines if we can read and maintain the code easily.
Finally, i is declared outside the loop in the first example and inside it in the second. Is there a reason for that? If you needed i outside your foreach, I would have named it differently. Personally, I prefer the foreach way because you see immediately what is happening. You also don't have to think about whether it's < or <=. You know immediately that you are iterating over the whole list. Sadly, though, people will forget the i++ at the end :D So, I say Vim!
Let's not forget that some collections do not implement a direct-access operator[], so you have to iterate using the IEnumerable interface, which is most easily done with foreach().
