Detecting modifications with an IEnumerable - c#

I have a question that I am surprised hasn't already been asked in exactly this format.
If I have an IEnumerable that is generated based on iterating through a source of data, (and using a yield return statement), how can I detect when there has been a modification to the source after an access via an Enumerator that was generated via a GetEnumerator call?
Here is the strange part: I'm not multi-threading. I think my question has a flaw in it somewhere, because this should be simple. . . I just want to know when the source has changed and the iterator is out of date.
Thank you so much.

You would need to handle creating the enumerator yourself in order to track this information, or, at a minimum, use yield return with your own type of modification tracking in place.
Most of the framework collection classes, for example, keep a "version" number. When they make an enumerator, they keep a snapshot of that version number, and check it during MoveNext(). You could make the same check before calling yield return XXX;
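A minimal sketch of that version check combined with yield return. The names VersionedList and _version are made up for illustration and are not framework types. Note one assumption that follows from how iterator blocks work: the snapshot is only taken on the first MoveNext() call, so a modification made between GetEnumerator() and the first MoveNext() goes unnoticed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: VersionedList and _version are illustrative names only.
public class VersionedList<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();
    private int _version; // bumped on every modification

    public void Add(T item)
    {
        _items.Add(item);
        _version++;
    }

    public bool Remove(T item)
    {
        bool removed = _items.Remove(item);
        if (removed) _version++;
        return removed;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Iterator bodies run lazily: this line executes on the first MoveNext().
        int snapshot = _version;
        for (int i = 0; i < _items.Count; i++)
        {
            if (snapshot != _version)
                throw new InvalidOperationException("Source was modified during enumeration.");
            yield return _items[i];
        }
        // Also catch a modification that made the loop end early or happened after the last element.
        if (snapshot != _version)
            throw new InvalidOperationException("Source was modified during enumeration.");
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With this in place, calling Add or Remove between two MoveNext() calls makes the next MoveNext() throw, which is the signal that the iterator is out of date.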

Most collection classes in the .NET BCL use a version field for change tracking. That is, the enumerator is constructed with the collection's current version number (an integer) and checks on each iteration (whenever MoveNext is called) that the collection's version number is still the same. The collection in turn increments the version field each time a modification is made. This tracking mechanism is simple and effective.
Two other ways I've seen are:
Having the collection hold an internal collection of weak references to outstanding enumerators. Each time a modification is made to the collection, it invalidates every enumerator that is still alive.
Or implementing events in the collection (INotifyCollectionChanged) and simply registering for that event in the enumerator; if the event is raised, mark the enumerator as invalid. This method is relatively easy to implement, generic, and comes without too much overhead, but it requires your collection to support events.
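The event-based option could look roughly like the sketch below. It assumes the source is an ObservableCollection<T> (a framework class that raises CollectionChanged); the helper name Guarded is hypothetical. If an enumerator is abandoned without being disposed, the handler stays subscribed, which is one of the costs of this approach.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Collections.Specialized;

public static class ChangeAwareEnumeration
{
    // Hypothetical helper: enumerates 'source' and fails once CollectionChanged has been raised.
    public static IEnumerable<T> Guarded<T>(ObservableCollection<T> source)
    {
        bool changed = false;
        NotifyCollectionChangedEventHandler handler = (sender, args) => changed = true;
        source.CollectionChanged += handler; // subscribed on the first MoveNext()
        try
        {
            int i = 0;
            while (true)
            {
                if (changed)
                    throw new InvalidOperationException("Collection changed during enumeration.");
                if (i >= source.Count)
                    yield break;
                yield return source[i++];
            }
        }
        finally
        {
            // Runs when enumeration completes, throws, or the enumerator is disposed.
            source.CollectionChanged -= handler;
        }
    }
}
```

A foreach over ChangeAwareEnumeration.Guarded(collection) disposes the enumerator automatically, so the handler is removed even when the loop exits early.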

Microsoft suggests any modification to an IEnumerable collection should void any existing IEnumerator objects, but that policy is seldom particularly helpful and can sometimes be a nuisance. There is no reason why the author of an IEnumerable/IEnumerator should feel a need to throw an exception if a collection is modified in a way that will not prevent the IEnumerator from returning the same data as it would have returned without such modification. I would go further and suggest that it should be considered desirable, when possible, to have an enumerator remain functional if it can obey the following constraints:
Items which are in the collection throughout the duration of enumeration must be returned exactly once.
Each item that is added or deleted during enumeration may be returned zero or one times, but no more than one. If an object is removed from the collection and re-added, it may be regarded as having been originally housed in one item but put into a new one, so enumeration may legitimately return the old one, the new one, both, or neither.
The VisualBasic.Collection class behaves according to the above constraints; such behavior can be very useful, making it possible to enumerate through the class and remove items meeting a particular criterion.
Of course, designing a collection to behave sensibly if it's modified during enumeration may not necessarily be easier than throwing an exception, but for collections of reasonable size such semantics may be obtained by having the enumerator convert the collection to a list and enumerate the contents of the list. If desired, and especially if thread safety is not required, it may be helpful to have the collection keep a strong or weak reference to the list returned by its enumerator, and void such reference any time it is modified. Another option would be to have a "real" reference to the collection be held in a wrapper class, and have the inner class keep a count of how many enumerators exist (enumerators would get a reference to the real collection). If an attempt is made to modify the collection while enumerators exist, replace the collection instance with a copy and then make the modifications on that (the copy would start with a reference count of zero). Such a design would avoid making redundant copies of the list except in the scenario where an IEnumerator is abandoned without being Dispose'd; even in that scenario, unlike scenarios involving WeakReferences or events, no objects would be kept alive any longer than necessary.
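The simplest variant mentioned above, having the enumerator walk a copy of the contents, can be sketched as follows. SnapshotBag is a hypothetical name, and the sketch makes no attempt at thread safety or at avoiding redundant copies.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection whose enumerators walk a private copy of the contents.
public class SnapshotBag<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();

    public int Count => _items.Count;
    public void Add(T item) => _items.Add(item);
    public bool Remove(T item) => _items.Remove(item);

    public IEnumerator<T> GetEnumerator()
    {
        // Copy eagerly, so the snapshot is taken at the moment GetEnumerator() is called.
        return new List<T>(_items).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the enumerator never touches the live list, a loop such as foreach (var x in bag) if (x % 2 == 0) bag.Remove(x); runs to completion, at the cost of one O(n) copy per enumeration.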

I haven't found an answer, but as a workaround I have just been catching the exception like this (WPF example):
while (slideShowOn)
{
    if (this.Model.Images.Count < 1)
    {
        break;
    }
    var _bitmapsEnumerator = this.Model.Images.GetEnumerator();
    try
    {
        while (_bitmapsEnumerator.MoveNext())
        {
            this.Model.SelectedImage = _bitmapsEnumerator.Current;
            Dispatcher.Invoke(new Action(() => { }), DispatcherPriority.ContextIdle, null);
            Thread.Sleep(41);
        }
    }
    catch (System.InvalidOperationException ex)
    {
        // The collection changed mid-enumeration: fall through so the outer loop
        // restarts with a fresh enumerator.
        // Scratch this bit: the error message isn't restricted to English
        // if (ex.Message == "Collection was modified; enumeration operation may not execute.")
        // {
        //
        // }
        // else throw ex;
    }
}

Related

Why is the error handling for IEnumerator.Current different from IEnumerator<T>.Current?

I would have thought that executing the following code for an empty collection that implements IEnumerable<T> would throw an exception:
var enumerator = collection.GetEnumerator();
enumerator.MoveNext();
var type = enumerator.Current.GetType(); // Surely should throw?
Because the collection is empty, then accessing IEnumerator.Current is invalid, and I would have expected an exception. However, no exception is thrown for List<T>.
This is allowed by the documentation for IEnumerator<T>.Current, which states that Current is undefined under any of the following conditions:
The enumerator is positioned before the first element in the collection, immediately after the enumerator is created. MoveNext must be called to advance the enumerator to the first element of the collection before reading the value of Current.
The last call to MoveNext returned false, which indicates the end of the collection.
The enumerator is invalidated due to changes made in the collection, such as adding, modifying, or deleting elements.
(I'm assuming that "fails to throw an exception" can be categorised as "undefined behaviour"...)
However, if you do the same thing but use an IEnumerable instead, you DO get an exception. This behaviour is specified by the documentation for IEnumerator.Current, which states:
Current should throw an InvalidOperationException if the last call to MoveNext returned false, which indicates the end of the collection.
My question is: Why this difference? Is there a good technical reason that I'm unaware of?
It means identical-seeming code can behave very differently depending on whether it's using IEnumerable<T> or IEnumerable, as the following program demonstrates (note how the code inside showElementType1() and showElementType2() is identical):
using System;
using System.Collections;
using System.Collections.Generic;
namespace ConsoleApplication2
{
    class Program
    {
        public static void Main()
        {
            var list = new List<int>();
            showElementType1(list); // Does not throw an exception.
            showElementType2(list); // Throws an exception.
        }
        private static void showElementType1(IEnumerable<int> collection)
        {
            var enumerator = collection.GetEnumerator();
            enumerator.MoveNext();
            var type = enumerator.Current.GetType(); // No exception thrown here.
            Console.WriteLine(type);
        }
        private static void showElementType2(IEnumerable collection)
        {
            var enumerator = collection.GetEnumerator();
            enumerator.MoveNext();
            var type = enumerator.Current.GetType(); // InvalidOperationException thrown here.
            Console.WriteLine(type);
        }
    }
}
The problem with IEnumerable<T> is that Current is of type T. Instead of throwing an exception, default(T) is returned (it is set from MoveNextRare).
When using IEnumerable you don't have the type, and you can't return a default value.
The actual problem is you don't check the return value of MoveNext. If it returns false, you shouldn't call Current. The exception is okay. I think they found it more convenient to return default(T) in the IEnumerable<T> case.
Exception handling brings overhead, returning default(T) doesn't (that much). Maybe they just thought there was nothing useful to return from the Current property in the case of IEnumerable (they don't know the type). That problem is 'solved' in IEnumerable<T> when using default(T).
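To make the point about MoveNext concrete, here is a small sketch that only reads Current after MoveNext has returned true. The class and method names (SafeCurrent, FirstElementType) are made up for this example; returning null for an empty sequence is just one possible convention.

```csharp
using System;
using System.Collections.Generic;

public static class SafeCurrent
{
    // Returns the runtime type of the first element, or null when the sequence is empty
    // (or when the first element itself is null).
    public static Type FirstElementType<T>(IEnumerable<T> collection)
    {
        using (IEnumerator<T> enumerator = collection.GetEnumerator())
        {
            if (!enumerator.MoveNext())
                return null; // empty: Current must not be read

            T first = enumerator.Current;
            return first == null ? null : first.GetType();
        }
    }
}
```

Written this way, the code behaves the same whether the underlying enumerator throws or returns default(T) when it is not positioned on an element.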
According to this bug report (thanks Jesse for commenting):
For performance reasons the Current property of generated Enumerators is kept extremely simple - it simply returns the value of the generated 'current' backing field.
This could point in the direction of the overhead of exception handling. Or the required extra step to validate the value of current.
They effectively just hand the responsibility over to foreach, since that is the main user of the enumerator:
The vast majority of interactions with enumerators are in the form of foreach loops which already guard against accessing current in either of these states so it would be wasteful to burn extra CPU cycles for every iteration to check for these states that almost no one will ever encounter.
The difference is there to better match how people tend to implement it in practice, as is the change of wording from "Current also throws an exception …" in previous versions of the documentation to "Current should throw …" in the current version.
Depending on how the implementation is working, throwing an exception might be quite a bit of work, and yet because of how Current is used in conjunction with MoveNext() then that exceptional state is hardly ever going to come up. This is all the more so when we consider that the vast majority of uses are compiler-generated and don't actually have scope for a bug where Current is called before MoveNext() or after it has returned false to ever happen. With normal use we can expect the case to never come up.
So if you are writing an implementation of IEnumerable or IEnumerable<T> where catching the error condition is tricky, you might well decide not to do so. And if you do make that decision, it probably isn't going to cause you any problems. Yes, you broke the rules, but it probably didn't matter.
And since it won't cause any problems except for someone who uses the interface in a buggy way, documenting it as undefined behaviour moves the burden from the implementer to the caller to not do something the caller shouldn't be doing in the first place.
But all that said, since IEnumerator.Current is still documented as "should throw InvalidOperationException" for backwards compatibility, and since doing so would match the "undefined" behaviour of IEnumerator<T>.Current, probably the best way to perfectly fulfil the documented behaviour of the interface is to have IEnumerator<T>.Current throw an InvalidOperationException in such cases, and have IEnumerator.Current just call into that.
In a way this is the opposite of the fact that IEnumerator<T> also inherits from IDisposable. The compiler-generated uses of IEnumerator will check if the implementation also implements IDisposable and call Dispose() if it does, but aside from the slight performance overhead of that test it meant that both implementers and hand-coded callers would sometimes forget about that, and not implement or call Dispose() when they should. Forcing all implementations to have at least an empty Dispose() made life easier for people in the opposite way to just having Current have undefined behaviour when it isn't valid.
If there were no backwards compatibility issues then we would probably have Current documented as undefined in such cases for both interfaces, and both interfaces inheriting from IDisposable. We probably also wouldn't have Reset() which is nothing but a nuisance.
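The suggestion above, having the generic Current throw and the non-generic Current call into it, might be sketched like this for a hand-written enumerator over an array. StrictArrayEnumerator is a hypothetical name; this is one way to do it, not how any framework class is implemented.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator over an array that throws when Current is read in an invalid state.
public sealed class StrictArrayEnumerator<T> : IEnumerator<T>
{
    private readonly T[] _items;
    private int _index = -1; // -1 means "before the first element"

    public StrictArrayEnumerator(T[] items)
    {
        _items = items ?? throw new ArgumentNullException(nameof(items));
    }

    public T Current
    {
        get
        {
            if (_index < 0 || _index >= _items.Length)
                throw new InvalidOperationException("Enumerator is not positioned on an element.");
            return _items[_index];
        }
    }

    // The non-generic property just calls into the generic one, so both behave identically.
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (_index < _items.Length) _index++;
        return _index < _items.Length;
    }

    public void Reset() => _index = -1;

    public void Dispose() { }
}
```

The extra bounds check on every Current read is exactly the overhead the quoted bug report response wanted to avoid, so whether it is worth paying is a judgement call.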

Prevent collection modifications while reading sequentially

I'm working with large collections of objects and sequential reads of them.
I found most questions along these lines refer to multi-threading, but I am more concerned with errors within the thread itself due to misuse of a distributable library.
A system within the library manages a potentially large collection of objects. At one point it performs a sequential read of this collection, performing an operation on each element.
Depending on the element implementation, which can be extended outside the library, an object may attempt to remove itself from the collection.
I would like that to be an option, but if this happens while the collection is being sequentially read, it can lead to errors. I would like to be able to lock the contents of the collection while it's being read and put any removal request on a schedule to be executed after the sequential read has finished.
The removal request has to go through the system, since objects do not have public access to the collection. I could just go with an isReading flag, but I wonder if there is a more elegant construct.
Does C# or .NET provide a tool to do this, perhaps to lock the list contents so I can intercept removal requests during sequential reads? Or would I have to implement that behavior from scratch for this scenario?
You may want to look into using the SynchronizedCollection<T> class in .NET 2.0+.
Alternatively, have a look at the answer to this question: What is the difference between SynchronizedCollection<T> and the other concurrent collections?
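You can use the following trick: iterate backwards by index.
List<T> collection; // populated elsewhere
for (int index = collection.Count - 1; index >= 0; --index)
{
    var item = collection[index];
    if (MustBeDeleted(item)) // MustBeDeleted is a placeholder for your own removal condition
    {
        collection.RemoveAt(index); // this is faster
        // or: collection.Remove(item);
    }
}
This code will not fail with a "collection was modified" error and will process each item of the collection.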
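For the "schedule removals until the read has finished" idea from the question, a hand-rolled version could look like the sketch below. All names (DeferredRemovalList, ForEach, RequestRemove) are hypothetical, and it assumes single-threaded use, as in the question. It only defers removals; adding an element during a read would still invalidate the enumeration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical manager that defers removals requested while a sequential read is running.
public class DeferredRemovalList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly List<T> _pendingRemovals = new List<T>();
    private int _readDepth; // > 0 while a sequential read is in progress

    public int Count => _items.Count;

    public void Add(T item) => _items.Add(item);

    public void RequestRemove(T item)
    {
        if (_readDepth > 0)
            _pendingRemovals.Add(item); // defer until the read has finished
        else
            _items.Remove(item);
    }

    public void ForEach(Action<T> action)
    {
        _readDepth++;
        try
        {
            foreach (T item in _items)
                action(item);
        }
        finally
        {
            _readDepth--;
            if (_readDepth == 0)
            {
                // The outermost read is done: apply the scheduled removals.
                foreach (T item in _pendingRemovals)
                    _items.Remove(item);
                _pendingRemovals.Clear();
            }
        }
    }
}
```

A depth counter is used instead of a plain isReading flag so that a nested ForEach call started from inside the callback does not flush removals while the outer read is still running.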

Concurrent collection supporting removal of a specified item?

Quite simple: Other than ConcurrentDictionary (which I'll use if I have to but it's not really the correct concept), is there any Concurrent collection (IProducerConsumerCollection<T> implementation) that supports removal of specific items based on simple equality of an item or a predicate defining a condition for removal?
Explanation: I have a multi-threaded, multi-stage workflow algorithm, which pulls objects from the DB and sticks them in a "starting" queue. From there they are grabbed by the next stage, further worked on, and stuffed into other queues. This process continues through a few more stages. Meanwhile, the first stage is invoked again by its supervisor and pulls objects out of the DB, and those can include objects still in process (because they haven't finished being processed and so haven't been re-persisted with the flag set saying they're done).
The solution I am designing is a master "in work" collection; objects go in that queue when they are retrieved for processing by the first stage, and are removed after they have been re-saved to the DB as "processed" by whatever stage of the workflow completed the necessary processing. While the object is in that list, it will be ignored if it is re-retrieved by the first stage.
I had planned to use a ConcurrentBag, but the only removal method (TryTake) removes an arbitrary item from the bag, not a specified one (and ConcurrentBag is slow in .NET 4). ConcurrentQueue and ConcurrentStack also do not allow removal of an item other than the next one it'll give you, leaving ConcurrentDictionary, which would work but is more than I need (all I really need is to store the Id of the records being processed; they don't change during the workflow).
The reason there is no such data structure is that lookup in list-like collections takes O(n) time. Operations such as IndexOf and Remove(element) all enumerate through the elements, checking each one for equality.
Only hash tables have a lookup time of O(1). In a concurrent scenario, an O(n) lookup would hold a lock on the collection for a very long time, and other threads would not be able to add elements during that time.
In a dictionary, only the bucket hit by the hash is locked. Other threads can continue adding while one thread checks the elements in that bucket for equality.
My advice is to go ahead and use ConcurrentDictionary.
By the way, you are right that ConcurrentDictionary is a bit oversized for your solution. What you really need is to check quickly whether an object is in work or not. A HashSet would be perfect for that. It does basically nothing other than Add(element), Contains(element), and Remove(element). There is a ConcurrentHashSet implementation in Java. For C# I found this: How to implement ConcurrentHashSet in .Net. I don't know how good it is.
As a first step I would still write a wrapper with a HashSet interface around ConcurrentDictionary, get it up and running, and then try different implementations and compare their performance.
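Such a wrapper can be quite small. The sketch below is one possible shape, with ConcurrentHashSet being a hypothetical class name (it is not part of the framework); it stores a throwaway byte as the dictionary value and relies only on TryAdd, ContainsKey and TryRemove.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical set-like wrapper around ConcurrentDictionary; not a framework type.
public class ConcurrentHashSet<T>
{
    // The byte value is a placeholder; only the keys matter.
    private readonly ConcurrentDictionary<T, byte> _items;

    public ConcurrentHashSet() : this(EqualityComparer<T>.Default) { }

    public ConcurrentHashSet(IEqualityComparer<T> comparer)
    {
        _items = new ConcurrentDictionary<T, byte>(comparer);
    }

    public int Count => _items.Count;

    // Returns false if the item was already present.
    public bool Add(T item) => _items.TryAdd(item, 0);

    public bool Contains(T item) => _items.ContainsKey(item);

    // Returns false if the item was not present.
    public bool Remove(T item) => _items.TryRemove(item, out _);
}
```

For the workflow in the question, the first stage would call Add(id) and skip the record when it returns false, and the stage that persists the record would call Remove(id) afterwards.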
As already explained in other posts, it's not possible to remove items from a Queue or ConcurrentQueue by default, but the easiest way around this is to extend or wrap the item.
public class QueueItem
{
    public Boolean IsRemoved { get; private set; }
    public void Remove() { IsRemoved = true; }
}
And when dequeuing:
QueueItem item = _Queue.Dequeue(); // Or TryDequeue if you use a ConcurrentQueue
if (!item.IsRemoved)
{
    // Do work here
}
It's really hard to make a collection thread-safe in the generic sense. There are so many factors that go into thread-safety that are outside the responsibility or purview of a library/framework class that affect the ability for it to be truly "thread-safe"... One of the drawbacks as you've pointed out is the performance. It's impossible to write a performant collection that is also thread-safe because it has to assume the worst...
The generally recommended practice is to use whatever collection you want and access it in a thread-safe way. This is basically why there aren't more thread-safe collections in the framework. More on this can be found at http://blogs.msdn.com/b/bclteam/archive/2005/03/15/396399.aspx#9534371

IEnumerable<T> thread safety?

I have a main thread that populates a List<T>. Further I create a chain of objects that will execute on different threads, requiring access to the List. The original list will never be written to after it's generated. My thought was to pass the list as IEnumerable<T> to the objects executing on other threads, mainly for the reason of not allowing those implementing those objects to write to the list by mistake. In other words if the original list is guaranteed not be written to, is it safe for multiple threads to use .Where or foreach on the IEnumerable?
I am not sure if the iterator in itself is thread safe if the original collection is never changed.
IEnumerable<T> can't be modified, so what could be non-thread-safe about it (as long as you don't modify the actual List<T>)?
Thread-safety problems need both writing and reading operations.
The "iterator in itself" is instantiated for each foreach.
Edit: I simplified my answer a bit, but @Eric Lippert added a valuable comment. IEnumerable<T> doesn't define modifying methods, but that doesn't mean its access operations (GetEnumerator, MoveNext, etc.) are thread safe. The simplest example is a GetEnumerator implemented like this:
Every time, it returns the same instance of IEnumerator
It resets its position
A more sophisticated example is caching.
This is an interesting point, but fortunately I don't know of any standard class that has a non-thread-safe implementation of IEnumerable.
Each thread that calls Where or foreach gets its own enumerator - they don't share one enumerator object for the same list. So since the List isn't being modified, and since each thread is working with its own copy of an enumerator, there should be no thread safety issues.
You can see this at work in one thread - Just create a List of 10 objects, and get two enumerators from that List. Use one enumerator to enumerate through 5 items, and use the other to enumerate through 5 items. You will see that both enumerators enumerated through only the first 5 items, and that the second one did not start where the first enumerator left off.
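That single-threaded experiment could be written as the sketch below. The helper names (IndependentEnumerators, Take, BothStartAtBeginning) are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndependentEnumerators
{
    // Advances an enumerator up to 'count' times and collects what it returned.
    public static List<int> Take(IEnumerator<int> enumerator, int count)
    {
        var seen = new List<int>();
        for (int i = 0; i < count && enumerator.MoveNext(); i++)
            seen.Add(enumerator.Current);
        return seen;
    }

    // Gets two enumerators from the same list and reads five items from each.
    public static bool BothStartAtBeginning()
    {
        List<int> list = Enumerable.Range(1, 10).ToList();
        using (IEnumerator<int> first = list.GetEnumerator())
        using (IEnumerator<int> second = list.GetEnumerator())
        {
            List<int> a = Take(first, 5);
            List<int> b = Take(second, 5);
            // Each enumerator keeps its own position, so both saw the same leading items.
            return a.SequenceEqual(b);
        }
    }
}
```

BothStartAtBeginning() returning true shows that the second enumerator did not continue where the first one stopped.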
As long as you are certain that the List will never be modified then it will be safe to read from multiple threads. This includes the use of the IEnumerator instances it provides.
This is going to be true for most collections. In fact, all collections in the BCL should be stable during enumeration. In other words, the enumerator will not modify the data structure. I can think of some obscure cases, like a splay tree, where enumerating it might modify the structure. Again, none of the BCL collections do that.
If you are certain that the list will not be modified after creation, you should guarantee that by converting it to a ReadOnlyCollection<T>. Of course, if you keep the original list that the read-only collection wraps, you can still modify it, but if you toss the original list away you're effectively making it permanently read-only.
From the Thread Safety section of the collection:
A ReadOnlyCollection can support multiple readers concurrently, as long as the collection is not modified.
So if you don't touch the original list again and stop referencing it, you can ensure that multiple threads can read it without worry (so long as you don't do anything wacky with trying to modify it again).
In other words if the original list is guaranteed not be written to, is it safe for multiple threads to use .Where or foreach on the IEnumerable?
Yes it's only a problem if the list gets mutated.
But note that an IEnumerable<T> can be cast back to a list and then modified.
But there is another alternative: wrap your list into a ReadOnlyCollection<T> and pass that around. If you now throw away the original list you basically created a new immutable list.
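A short sketch of the difference between handing out the list as IEnumerable<T> and handing out a ReadOnlyCollection<T> wrapper. The helper names (ReadOnlyExposure, CanCastBackToList, Wrap) are made up for illustration; List<T>.AsReadOnly() is the framework method that creates the wrapper.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyExposure
{
    // A plain upcast hides nothing at runtime: the receiver can cast back and mutate.
    public static bool CanCastBackToList(IEnumerable<int> sequence) => sequence is List<int>;

    // The wrapper exposes no mutating members; attempts via IList<T> throw NotSupportedException.
    public static ReadOnlyCollection<int> Wrap(List<int> list) => list.AsReadOnly();
}
```

For a List<int> named list, CanCastBackToList(list) is true while CanCastBackToList(Wrap(list)) is false. The wrapper is still a live view of the original list, which is why the original reference has to be dropped for the data to be effectively immutable.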
If you are using .NET Framework 4.5 or greater, this could be a great solution:
http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
(Microsoft has already implemented a thread-safe enumerable)

Are there any C# collections where modification does not invalidate iterators?

Are there any data structures in the C# Collections library where modification of the structure does not invalidate iterators?
Consider the following:
List<int> myList = new List<int>();
myList.Add( 1 );
myList.Add( 2 );
List<int>.Enumerator myIter = myList.GetEnumerator();
myIter.MoveNext(); // myIter.Current == 1
myList.Add( 3 );
myIter.MoveNext(); // throws InvalidOperationException
Yes, take a look at the System.Collections.Concurrent namespace in .NET 4.0.
Note that for some of the collections in this namespace (e.g., ConcurrentQueue<T>), this works by only exposing an enumerator on a "snapshot" of the collection in question.
From the MSDN documentation on ConcurrentQueue<T>:
The enumeration represents a moment-in-time snapshot of the contents of the queue. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the queue.
This is not the case for all of the collections, though. ConcurrentDictionary<TKey, TValue>, for instance, gives you an enumerator that may reflect updates made to the underlying collection between calls to MoveNext.
From the MSDN documentation on ConcurrentDictionary<TKey, TValue>:
The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.
If you don't have 4.0, then I think the others are right and there is no such collection provided by .NET. You can always build your own, however, by doing the same thing ConcurrentQueue<T> does (iterate over a snapshot).
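The ConcurrentQueue<T> behaviour quoted above can be tried out with a sketch like this one (SnapshotDemo and EnumerateWhileEnqueueing are made-up names). Going by the documented snapshot semantics, the item enqueued after GetEnumerator() should not show up in the enumeration, and no exception is thrown.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class SnapshotDemo
{
    // Enqueues an item after obtaining the enumerator, then reads the enumerator to the end.
    public static List<int> EnumerateWhileEnqueueing()
    {
        var queue = new ConcurrentQueue<int>();
        queue.Enqueue(1);
        queue.Enqueue(2);

        var seen = new List<int>();
        using (IEnumerator<int> e = queue.GetEnumerator())
        {
            queue.Enqueue(3); // modification after the enumerator was obtained
            while (e.MoveNext())
                seen.Add(e.Current);
        }
        return seen;
    }
}
```

Doing the same with a List<int> throws InvalidOperationException on the first MoveNext() after the Add, as in the question's example.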
According to this MSDN article on IEnumerator the invalidation behaviour you have found is required by all implementations of IEnumerable.
An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and the next call to MoveNext or Reset throws an InvalidOperationException. If the collection is modified between MoveNext and Current, Current returns the element that it is set to, even if the enumerator is already invalidated.
Supporting this behavior requires some pretty complex internal handling, so most of the collections don't support this (I'm not sure about the Concurrent namespace).
However, you can very well simulate this behavior using immutable collections. They don't allow you to modify the collection by design, but you can work with them in a slightly different way and this kind of processing allows you to use enumerator concurrently without complex handling (implemented in Concurrent collections).
You can implement a collection like that easily, or you can use FSharpList<T> from FSharp.Core.dll (not a standard part of .NET 4.0 though):
using Microsoft.FSharp.Collections;
// Create immutable list from other collection
var list = ListModule.OfSeq(anyCollection);
// now we can use `GetEnumerator` (through the IEnumerable<T> interface)
var en = ((IEnumerable<int>)list).GetEnumerator();
// To modify the collection, you create a new collection that adds
// element to the front (without actually copying everything)
var added = new FSharpList<int>(42, list);
The benefit of immutable collections is that you can work with them (by creating copies) without affecting the original one, and so the behavior you wanted is "for free". For more information, there is a great series by Eric Lippert.
The only way to do this is to make a copy of the list before you iterate it:
var myIter = new List<int>(myList).GetEnumerator();
No, they do not exist. All C# standard collections invalidate the enumerator when the structure changes.
Use a for loop instead of a foreach, and then you can modify it. I wouldn't advise it though....
