Creating a consuming foreach in C#

Is there a way to create a consuming foreach loop in C#?
Basically, can I loop through a collection's items and consume them at the same time?
In plain English: instead of just looping through the elements, remove each item from the collection, do stuff with it, then go on to the next one.
EDIT: I neglected to mention that I am using a producer/consumer pattern, which is why I wanted a consuming foreach. Most of the answers here are completely valid for the simple case I described before this edit, although what I actually wanted is described in my answer.

Use a Stack (last-in, first-out) or a Queue (first-in, first-out).
A nice example is this "recursive" queue which lists all the files in a directory.
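The linked "recursive" queue isn't reproduced here. As a minimal sketch of the same idea on an in-memory Queue<T>, a small iterator can dequeue each item right before the loop body sees it. `ConsumeAll` is a hypothetical helper, not a framework method, and the string values are placeholders. Because it re-checks `Count` on every step, items enqueued from inside the loop body are consumed too, which is what makes the "recursive" pattern work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a framework method): dequeues each item just
// before the foreach body sees it, turning a Queue<T> into a consuming loop.
static IEnumerable<T> ConsumeAll<T>(Queue<T> source)
{
    // Count is re-checked on every step, so items enqueued by the loop
    // body are consumed as well.
    while (source.Count > 0)
        yield return source.Dequeue();
}

var queue = new Queue<string>(new[] { "a", "b", "c" });
var processed = new List<string>();

foreach (var item in ConsumeAll(queue))
{
    processed.Add(item);                  // "do stuff": item is already out of the queue
    if (item == "a") queue.Enqueue("d");  // safe: Queue's own enumerator is never used
}

Console.WriteLine(string.Join(",", processed)); // a,b,c,d
Console.WriteLine(queue.Count);                 // 0
```

The helper never touches the queue's own enumerator, which is why enqueuing inside the loop does not trigger the "collection was modified" exception.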

As pointed out in the comments on your question, using a Stack or a Queue is the best option for a "process-then-remove" pattern. If for some reason you don't want to use these specialized classes, you can do something like the following with a plain List:
var items = new List<Item>() { item1, item2, item3 };
while (items.Count > 0)
{
    ProcessItem(items[0]);
    items.RemoveAt(0);
}
What you cannot do is modify the collection inside a foreach loop, as explained in StepUp's answer. Note also that RemoveAt(0) shifts every remaining element down by one, so on a long list a Queue is the cheaper choice.

The collection used in a foreach cannot have items added to or removed from it while it is being enumerated. This is by design.
MSDN says:
The foreach statement is used to iterate through the collection to get the information that you want, but can not be used to add or remove items from the source collection to avoid unpredictable side effects. If you need to add or remove items from the source collection, use a for loop.
Update 0.0:
You can use a for loop to edit the collection, including assigning, adding and removing items:
List<string> coll = new List<string>() { "1", "2", "3", "4", "5" };
for (int i = coll.Count - 1; i >= 0; i--)
{
    /* You can assign a value to an item:
    coll[i] = "";
    You can add an item to the collection:
    coll.Add(i.ToString()); */
    coll.RemoveAt(i);
    // or: coll.Remove("your string value to delete");
}
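To see why the backward direction matters, here is a small comparison. The sample values and the "remove even numbers" rule are only for illustration. Both loops try to remove every even number, but only the backward one succeeds.

```csharp
using System;
using System.Collections.Generic;

var forward = new List<int> { 1, 2, 2, 3, 4 };
var backward = new List<int>(forward);

// Forward: after RemoveAt(i) the next element shifts down into slot i,
// but i++ then steps past it, so adjacent matches are missed.
for (int i = 0; i < forward.Count; i++)
{
    if (forward[i] % 2 == 0) forward.RemoveAt(i);
}

// Backward: removing slot i only shifts elements that were already visited.
for (int i = backward.Count - 1; i >= 0; i--)
{
    if (backward[i] % 2 == 0) backward.RemoveAt(i);
}

Console.WriteLine(string.Join(",", forward));  // 1,2,3
Console.WriteLine(string.Join(",", backward)); // 1,3
```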

Not exactly a "consuming foreach", but what you are looking for is probably TPL Dataflow.
See for example: https://leanpub.com/tpldataflowbyexample/read or MSDN: Create a DataFlow Pipeline or MSDN: Implement a Producer / Consumer Scenario
A very basic example can look like this:
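using System;
using System.Threading.Tasks.Dataflow;   // may require the System.Threading.Tasks.Dataflow NuGet package
var actionBlock = new ActionBlock<int>(n => Console.WriteLine(n));
for (int i = 0; i < 10; i++) {
    actionBlock.Post(i);
}
actionBlock.Complete();          // no more items will be posted
actionBlock.Completion.Wait();   // wait until every posted item has been processed
Console.WriteLine("Done");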
You can also link multiple blocks to create a more complex consuming strategy with forks and so on. You can also configure a different scheduler for each block.
For very simple scenarios this might be overkill, but TPL Dataflow scales very well and gives you a lot of control over what the dataflow looks like.

As stated in StepUp's answer, the collection used in a foreach loop cannot be modified while the loop runs. A similar effect is possible with BlockingCollection, using GetConsumingEnumerable, which consumes the collection's items while iterating through them. My collection is part of a concurrent producer/consumer scheme (which I neglected to mention in my question), so this suits me better. In simpler cases, though, the other answers are much more appropriate: use a stack or queue, or use a simple loop that runs from the end of the collection down to the start and removes each item along the way. A start-to-end loop with removal will skip items or end up throwing an ArgumentOutOfRangeException.
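Here is a minimal sketch of that approach. It assumes a single producer (the main thread) and a single consumer task, and the integer items are placeholders. `GetConsumingEnumerable` removes each item as it yields it and blocks while the collection is empty. The loop ends only after `CompleteAdding` has been called and the remaining items are drained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var items = new BlockingCollection<int>();   // backed by a ConcurrentQueue (FIFO) by default
var consumed = new List<int>();              // touched only by the consumer until Wait() returns

// Consumer: each item is removed from the collection as it is handed out.
var consumer = Task.Run(() =>
{
    foreach (var n in items.GetConsumingEnumerable())
        consumed.Add(n);
});

// Producer: add items, then signal that no more will arrive.
for (int i = 0; i < 5; i++)
    items.Add(i);
items.CompleteAdding();

consumer.Wait();
Console.WriteLine(string.Join(",", consumed)); // 0,1,2,3,4
```

With several consumers, you would call `GetConsumingEnumerable` once per consumer task. Each item then goes to exactly one of them, so the order across consumers is no longer predictable.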

Related

Collection mutation in foreach upon exiting loop: is it an acceptable pattern?

It's well known that the mutation of a collection within an iteration loop is not allowed. The runtime will throw an exception when, for instance, an item is removed.
However, today I was surprised to notice that there's no exception if the mutating operation is followed by any exit-loop statement. That is, the loop ends.
// this won't throw!
var coll = new List<int>(new[] { 1, 2, 3 });
foreach (var item in coll)
{
    coll.RemoveAt(1);
    break;
}
I looked at the framework code, and it's pretty clear that the exception is thrown only when the iterator is moved forward.
My question is: could the above "pattern" be considered an acceptable practice, or is there a sneaky problem with using it?
In the example you gave, and to answer the question "Is it good practice/acceptable to modify a collection during enumeration if you break after the change", it is fine to do this, as long as you are aware of the side-effects.
The primary side effect, and why I would not recommend doing this in the general case, is that the rest of your foreach loop doesn't execute. I would consider that a problem in almost every instance of a foreach that I have used.
In most (if not all) instances where you could get away with this, a simple if check would suffice (as Servy has in his answer), so you may want to look at what other options you have available if you find yourself writing this kind of code a lot.
The most common general solution is to add to a "kill" list, and then remove after your iteration:
List<int> killList = new List<int>();
foreach (int i in coll)
{
    if (i < 0)
        killList.Add(i);
    // ... other per-item work
}
foreach (int i in killList)
    coll.Remove(i);
There are various ways to make this code shorter, but this is the most explicit way of doing it.
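When the collection is a List<T>, one of the shorter ways is the built-in `List<T>.RemoveAll`, which takes the removal condition as a predicate. Here is a minimal sketch that reuses the `i < 0` rule from the kill-list code. The sample numbers are made up. This covers only the removal part: any other per-item work from the original loop still needs its own loop.

```csharp
using System;
using System.Collections.Generic;

var coll = new List<int> { 3, -1, 4, -5, 9 };

// RemoveAll removes every element matching the predicate in a single pass
// and returns how many it removed; no separate kill list is required.
int removed = coll.RemoveAll(i => i < 0);

Console.WriteLine(removed);                // 2
Console.WriteLine(string.Join(",", coll)); // 3,4,9
```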
You can also iterate backwards, which won't cause the exception to be thrown. This is a neat workaround, but you may want to add a comment explaining why you are iterating backwards.
So your example can be relied on to work, for starters. Mutating a collection while iterating fails when you go to ask for the next item. Since this provably never asks for another item after it mutates the list, we know that won't happen. Of course, the fact that it works doesn't mean that it's clear, or that it's a good idea to use it.
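A quick way to check this reasoning is to leave out the `break`. The removal itself succeeds, and the exception only appears when the enumerator's next `MoveNext()` notices the change. Here is a minimal sketch with List<T>; other collection types may behave differently.

```csharp
using System;
using System.Collections.Generic;

var coll = new List<int> { 1, 2, 3 };
bool threw = false;

try
{
    foreach (var item in coll)
    {
        coll.RemoveAt(1);  // the removal itself succeeds...
        // ...no break here, so the loop asks the enumerator for the next item
    }
}
catch (InvalidOperationException)
{
    // Thrown by the enumerator's MoveNext() once it sees the list changed.
    threw = true;
}

Console.WriteLine(threw);                  // True
Console.WriteLine(string.Join(",", coll)); // 1,3
```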
What this is trying to do is remove the second item if there is an item to remove. It is designed to not break when trying to remove an item from a collection without two items. This is not a well designed way of doing that though; it's confusing to the readers and doesn't effectively convey its intentions. A much clearer method of accomplishing the same goal is something like the following:
if (coll.Count > 1)
    coll.RemoveAt(1);
In the more general case, such a Remove in a foreach can only ever be used to remove one item. In those cases you're better off turning the foreach into an if that checks whether there is an item to remove (if needed, as it is here), followed by a call that removes that single item. That call may involve a query to find the item, instead of a hard-coded index.

What is the fastest way of changing Dictionary<K,V>?

This is an algorithmic question.
I have got Dictionary<object,Queue<object>>. Each queue contains one or more elements in it. I want to remove all queues with only one element from the dictionary. What is the fastest way to do it?
Pseudo-code: foreach(item in dict) if(item.Length==1) dict.Remove(item);
It is easy to do it in a loop (not foreach, of course), but I'd like to know which approach is the fastest one here.
Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in dictionary is kind of a hash of the object, the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in associated queue.
Update:
It may be important to know that in a typical case there are only a few duplicates in a large set of objects. Let's assume 1% or less. So it could be faster to leave the Dictionary as is, create a new one from scratch containing just the selected elements from the first one, and then delete the first Dictionary completely. I think it depends on the computational complexity of the Dictionary methods used in a particular algorithm.
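Here is a minimal sketch of the "build a new one" option, using LINQ's `Where` and `ToDictionary`. The keys and queued values are placeholders. It makes one pass over the original and inserts only the surviving entries, which is attractive when, as stated, only about 1% of the entries are duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dict = new Dictionary<object, Queue<object>>
{
    ["a"] = new Queue<object>(new object[] { 1 }),
    ["b"] = new Queue<object>(new object[] { 2, 3 }),
    ["c"] = new Queue<object>(new object[] { 4 }),
};

// One pass over dict; only entries whose queue has more than one object
// (the duplicates) are copied into the new dictionary.
var duplicates = dict
    .Where(kv => kv.Value.Count > 1)
    .ToDictionary(kv => kv.Key, kv => kv.Value);

// The original can now be dropped, e.g. dict = duplicates;
Console.WriteLine(duplicates.Count);            // 1
Console.WriteLine(duplicates.ContainsKey("b")); // True
```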
I really want to see this problem on a theoretical level because as a teacher I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do it. The question is which approach is the best, the fastest.
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();   // snapshot the keys before removing anything
foreach (var item in itemsWithOneEntry)
{
    dict.Remove(item);
}
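On .NET Core 3.0 and later there is also a variant without the intermediate key list. On those runtimes, `Dictionary<TKey,TValue>.Remove` no longer invalidates active enumerators. This sketch holds only under that runtime assumption. On .NET Framework the same loop throws InvalidOperationException, so the `ToList()` version is the portable one.

```csharp
using System;
using System.Collections.Generic;

var dict = new Dictionary<object, Queue<object>>
{
    ["a"] = new Queue<object>(new object[] { 1 }),
    ["b"] = new Queue<object>(new object[] { 2, 3 }),
};

// Assumes .NET Core 3.0+: removing entries while enumerating a Dictionary
// is allowed there; on .NET Framework this loop throws instead.
foreach (var pair in dict)
{
    if (pair.Value.Count == 1)
        dict.Remove(pair.Key);
}

Console.WriteLine(dict.Count);            // 1
Console.WriteLine(dict.ContainsKey("b")); // True
```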
Instead of trying to optimize the traversal of the collection, how about optimizing its content so that it only includes the duplicates? This would require changing your collection algorithm to something like the following, where original is the sequence of objects being scanned and the objects themselves serve as keys:
var duplicates = new Dictionary<object, Queue<object>>();
var possibleDuplicates = new Dictionary<object, object>();
foreach (var item in original)
{
    if (possibleDuplicates.TryGetValue(item, out var first))
    {
        // second occurrence: promote the pair to the duplicates dictionary
        duplicates.Add(item, new Queue<object>(new[] { first, item }));
        possibleDuplicates.Remove(item);
    }
    else if (duplicates.TryGetValue(item, out var queue))
    {
        queue.Enqueue(item);   // third or later occurrence
    }
    else
    {
        possibleDuplicates.Add(item, item);   // first occurrence
    }
}
Note that you should probably measure the impact of this on the performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.
But supposing you do find that you could get a speed advantage by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.
As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that only contains the queues of length 1, so when you need them they are already available separately.
To do this, you need to enhance all the operations that modify the length of the queue, so that they have the side-effect of updating the index container.
One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue except it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.
Then when you create a new queue, enlist on its ContentsChanged event a handler that checks to see if the queue only has one item. Based on this you can either insert or remove it from the index container.
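The following is a simplified sketch of the indexing idea. Instead of the ObservableQueue event described above, it routes every insertion through one hypothetical helper, `AddObject`. That helper updates a HashSet of keys whose queue holds exactly one object. The keys and values are placeholders. Removals from a queue would need the same treatment, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;

var groups = new Dictionary<object, Queue<object>>();
// The index: keys whose queue currently holds exactly one object.
var singles = new HashSet<object>();

// Hypothetical helper: every insertion goes through here, so the index is
// updated as a side effect of changing a queue's length.
void AddObject(object key, object value)
{
    if (!groups.TryGetValue(key, out var queue))
    {
        queue = new Queue<object>();
        groups.Add(key, queue);
    }
    queue.Enqueue(value);
    if (queue.Count == 1) singles.Add(key);
    else singles.Remove(key);
}

AddObject("h1", "x");
AddObject("h2", "y");
AddObject("h2", "z");

// Dropping the non-duplicates no longer requires scanning every queue.
foreach (var key in singles)
    groups.Remove(key);
singles.Clear();

Console.WriteLine(groups.Count); // 1
```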

In C# is it safe to expand a List that's being traversed with foreach?

I assume that by expand you mean adding new items to the collection. If so, the answer is no: you will get an exception during the traversal. I don't believe any collection can do this.
You can create a new list and then do an AddRange on the original list.
In C# is it safe to expand a List that's being traversed with foreach? If not, then what about other collections?
There are very few collections that safely let you add to them while being iterated. There are quite a few options here; the most common would be to either build a new collection from the original, or add items into a temporary collection while iterating, then add them all to the original collection at the end.
The only collections in the framework which are designed with iteration and insertion in mind are some of the concurrent collections. For example, you can be iterating a BlockingCollection<T> via GetConsumingEnumerable and Add items to it at the same time. However, this is intended for a different purpose: it's typically used with separate producer and consumer threads, one adding while the other processes items. As such, doing this within its own loop would be a very odd use case.
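Here is a minimal sketch of the temporary-collection option. The numbers and the "odd numbers produce a new item" rule are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };
var toAdd = new List<int>();

// While the foreach is running, new items only go into the temporary list...
foreach (var n in list)
{
    if (n % 2 == 1) toAdd.Add(n * 10);
}

// ...and are appended once the enumerator is no longer in use.
list.AddRange(toAdd);

Console.WriteLine(string.Join(",", list)); // 1,2,3,10,30
```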
No, you will get an exception. While it isn't recommended, you can accomplish what you're looking for by using a simple for loop. The reason you're getting an exception is because of how foreach works. When compiled it is actually using the IEnumerable<T> or IEnumerable that is implemented by the List<T> to get the items. Now you can create your own collection which would allow something like this, but again, not recommended.
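foreach is only for reading the values of a collection. If you change the number of elements in the collection, an exception will be thrown. If you only change the state of the elements themselves (for example, a property of an object stored in the list), nothing will happen. Note, however, that replacing an element through the List<T> indexer also counts as a modification and causes the same exception. Microsoft advises against using foreach for such cases.
If you need to change elements or the number of elements, take a snapshot with list.ToArray() and loop over the array while modifying the list.
Simply create a new list (a copy) and iterate over that. For example, with buttons: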
List<Button> list = new List<Button>();
list.Add(new Button());
list.Add(new Button());
foreach (Button button in new List<Button>(list))
    list.Add(new Button());
Not the best solution, but probably the easiest.
Some possibly good, some complicated solutions here. How about cycling through the list with a while loop based on the list's length and the current position? That should work:
int n = 0;   // int rather than UInt16, which would overflow at 65,535 items
while (n < list.Count)
{
    // ... might add new elements to the end of the list
    n++;
}
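Here is a filled-in version of that loop, as a sketch. The doubling rule is hypothetical and only shows that `list.Count` is re-read on every pass, so elements appended during the loop are visited too. Make sure the growth stops at some point, or the loop never ends.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1 };
int n = 0;

// list.Count is evaluated again before every pass, so elements appended
// inside the loop are visited as well.
while (n < list.Count)
{
    int value = list[n];
    if (value < 8) list.Add(value * 2);  // hypothetical rule that stops growing at 8
    n++;
}

Console.WriteLine(string.Join(",", list)); // 1,2,4,8
```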

Enumerator problem, Any way to avoid two loops?

I have a third party api, which has a class that returns an enumerator for different items in the class.
I need to remove an item in that enumerator, so I cannot use foreach. The only option I can think of is to get the count by iterating over the enumerator and then run a normal for loop to remove the items.
Anyone know of a way to avoid the two loops?
[Update] Sorry for the confusion, but Andrey is right in the comments below.
Here is some pseudocode off the top of my head that won't work. I am looking for a solution that doesn't involve two loops, but I guess it's not possible:
foreach (var myProperty in MyProperty)
{
    if (MeetsCriteria(myProperty))   // hypothetical check of some criteria
        MyProperty.Remove(myProperty);
}
MyProperty is the third party class that implements the enumerator and the remove method.
Common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
    if (ShouldDelete(i)) forDeletion.Add(i);   // ShouldDelete: your deletion condition
foreach (Item i in forDeletion)
    somelist.Remove(i);   // or however you delete items
Loop through it once and create a second array which contains the items which should not be deleted.
If you know it's a collection, you can use a reversed for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
    items.RemoveAt(i);
}
Otherwise, you'll have to do two loops.
You can create something like this:
public IEnumerable<Item> GetMyList()
{
    foreach (var x in thirdParty)
    {
        if (x == ignore)
            continue;
        yield return x;
    }
}
I need to remove an item in that enumerator
As long as this is a single item, that's not a problem. The rule is that you cannot continue iterating after modifying the collection. Thus:
foreach (var item in collection)
{
    if (item.Equals(toRemove))
    {
        collection.Remove(toRemove);
        break; // <== stop iterating!!
    }
}
It is not possible to remove an item from an Enumerator. What you can do is to copy or filter(or both) the content of the whole enumeration sequence.
You can achieve this with LINQ by doing something like this (keeping only the items that should not be removed):
YourEnumerationReturningFunction().Where(item => !yourRemovalCriteria(item));
Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator<T> or IEnumerable<T>, you cannot remove any item from the sequence behind the enumerator, because there is no method to do so. And of course you should not rely on downcasting a received object, because the implementation may change. (Actually, a well-designed API should not expose mutable objects holding internal state at all.)
If you receive IList<T> or something similar, you can just use a normal for loop from back to front and remove the items as needed, because there is no iterator whose state could be corrupted. (The rule about exposing mutable state should apply here again: modifying the returned collection should not change any state.)
Enumerable.Count() decides at run time what it needs to do: either enumerate the sequence to count it, or detect (with a type check) that it is a collection and use its Count property.
I like SJoerd's suggestion but I worry about how many items we may be talking about.
Why not something like this?
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
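IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });
Note that Except uses set semantics, so any duplicate values in fromAPI are also collapsed to a single occurrence.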
A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach (var delItem in ThirdPartyContainer.Items
                             .Where(item => ShouldIDeleteThis(item))
                             //or: .Where(ShouldIDeleteThis)
                             .ToArray())
{
    ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; and it also relies on internals of the collection such as not changing order in between deletions (e.g. better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() will do just fine :-).
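Here is a runnable version of the snapshot approach. It assumes a List<int> stands in for the unspecified third-party container; any type with enumeration and a `Remove(item)` method would work the same way. The `ShouldIDeleteThis` rule, which picks even numbers, is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A List<int> stands in for the hypothetical third-party container.
var container = new List<int> { 1, 2, 3, 4, 5, 6 };
bool ShouldIDeleteThis(int item) => item % 2 == 0;   // hypothetical deletion rule

// ToArray() snapshots the matches first, so Remove() never runs while the
// container's own enumerator is active.
foreach (var delItem in container.Where(ShouldIDeleteThis).ToArray())
{
    container.Remove(delItem);
}

Console.WriteLine(string.Join(",", container)); // 1,3,5
```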
An enumerator typically has a private field pointing to the underlying collection.
You could get at it via reflection and modify it.
Have fun.

Using foreach (...) syntax while also incrementing an index variable inside the loop

When looking at C# code, I often see patterns like this:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];
int i = 0;
foreach (DataType item in items)
{
    // Do some stuff with item, then finally
    itemProps[i] = item.Prop;
    i++;
}
The foreach loop iterates over the objects in items, but also keeps a counter (i) for indexing into itemProps. I personally don't like this extra i hanging around, and would probably do something like this instead:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];
for (int i = 0; i < items.Length; i++)
{
    // Do some stuff with items[i], then finally
    itemProps[i] = items[i].Prop;
}
Is there perhaps some benefit to the first approach I'm not aware of? Is this a result of everybody trying to use that fancy foreach (...) syntax? I'm interested in your opinions on this.
If you are using C# 3.0 or later, this is better:
OtherDataType[] itemProps = items.Select(i=>i.Prop).ToArray();
With i declared outside the loop, it is still available after the loop completes. If you wanted to count the number of items and the collection didn't provide a .Count or .UBound property, this could be useful.
Like you I would normally use the second method, looks much cleaner to me.
In this case, I don't think so. Sometimes, though, the collection doesn't implement this[int index] but it does implement GetEnumerator(). In the latter case, you don't have much choice.
Some data structures are not well suited for random access but can be iterated over very fast (trees, linked lists, etc.). So if you need to iterate over one of these but need a count for some reason, you're doomed to go the ugly way...
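For such collections there is still a way to get an index without a hand-maintained counter: the `Select` overload whose selector receives both the element and its position. Here is a minimal sketch using LinkedList<T>, which has no integer indexer. The names and the length calculation are placeholders for the real itemProps logic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// LinkedList<T> has no int indexer, so a plain for loop can't be used.
var names = new LinkedList<string>(new[] { "ann", "bob", "cy" });
var lengths = new int[names.Count];

// Select((element, index) => ...) supplies the running position, so no
// separate counter variable has to be incremented by hand.
foreach (var (name, i) in names.Select((n, idx) => (n, idx)))
{
    lengths[i] = name.Length;
}

Console.WriteLine(string.Join(",", lengths)); // 3,3,2
```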
Semantically they may be equivalent, but in fact using foreach over an enumerator gives the compiler more scope to optimise.
I don't remember all the arguments off the top of my head,but they are well covered in Effective C#, which is recommended reading.
foreach (DataType item in items)
This foreach loop makes it crystal clear that you're iterating over every DataType item in, well, items. Maybe it makes the code a little longer, but it's not "bad" code. With the for loop, you need to look inside the parentheses to get an idea of what the loop is for.
The problem with this example is that you're iterating over two different arrays at the same time, which we don't do that often, so we are stuck between two strategies: either we "hack" the fancy foreach a bit, as you call it, or we go back to the old, not-so-loved for (int i = 0; i ...). (There are other ways than those two, of course.)
So I think this is the Vim vs. Emacs debate coming back in your question as for vs. foreach :) People who like for() will say this foreach is useless, might cause performance issues and is just big. People who prefer foreach will say something like: we don't care about two extra lines if we can read and maintain the code easily.
Finally, i is outside the loop's scope in the first example and inside it in the second. Any reason for that? If you used i outside of your foreach, I would have named it differently. In my opinion, the foreach way is better because you see immediately what is happening. You also don't have to think about whether it's < or <=. You know immediately that you are iterating over the whole list. Sadly, though, people will forget the i++ at the end :D So, I say Vim!
Let's not forget that some collections do not implement a direct-access operator[], and you have to iterate using the IEnumerable interface, which is most easily accessed with foreach().
