I have a main thread that populates a List<T>. I then create a chain of objects that will execute on different threads and require access to the list. The original list will never be written to after it's generated. My thought was to pass the list as IEnumerable<T> to the objects executing on other threads, mainly so that those implementing the objects cannot write to the list by mistake. In other words, if the original list is guaranteed not to be written to, is it safe for multiple threads to use .Where or foreach on the IEnumerable?
I am not sure if the iterator in itself is thread safe if the original collection is never changed.
IEnumerable<T> can't be modified. So what can be non thread safe with it? (If you don't modify the actual List<T>).
For non thread safety you need writing and reading operations.
"Iterator in itself" is instantiated for each foreach.
Edit: I simplified my answer a bit, but @Eric Lippert added a valuable comment. IEnumerable<T> doesn't define modifying methods, but that doesn't mean its access operations (GetEnumerator, MoveNext, etc.) are thread safe. The simplest example is a GetEnumerator implemented like this:
Every time, it returns the same instance of IEnumerator
It resets its position
A more sophisticated example is caching.
This is an interesting point, but fortunately I don't know of any standard class that has a non-thread-safe implementation of IEnumerable.
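A minimal sketch of that simplest example, assuming a hypothetical SharedEnumeratorSequence class (it is not from any library). Its GetEnumerator returns the same object every time and rewinds it, so two consumers interfere with each other even though the data is never written to. The interference already shows up on a single thread with a nested foreach:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical class for illustration only: every GetEnumerator() call
// returns the SAME enumerator object and rewinds it.
public sealed class SharedEnumeratorSequence : IEnumerable<int>, IEnumerator<int>
{
    private readonly int[] _items;
    private int _index = -1;

    public SharedEnumeratorSequence(params int[] items) { _items = items; }

    public IEnumerator<int> GetEnumerator()
    {
        _index = -1;   // resets the shared position
        return this;   // same instance for every caller
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Current => _items[_index];
    object IEnumerator.Current => Current;
    public bool MoveNext() => ++_index < _items.Length;
    public void Reset() { _index = -1; }
    public void Dispose() { }
}

public static class SharedEnumeratorDemo
{
    // Counts how many (outer, inner) pairs a nested foreach visits.
    public static int CountPairs(IEnumerable<int> source)
    {
        int pairs = 0;
        foreach (int outer in source)
            foreach (int inner in source)
                pairs++;
        return pairs;
    }

    public static void Main()
    {
        Console.WriteLine(CountPairs(new List<int> { 1, 2, 3 }));             // 9
        Console.WriteLine(CountPairs(new SharedEnumeratorSequence(1, 2, 3))); // 3
    }
}
```

With List<int> each foreach gets its own enumerator, so all 9 pairs are visited. With the shared enumerator the inner loop exhausts the single position, and the outer loop stops after its first item.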
Each thread that calls Where or foreach gets its own enumerator - they don't share one enumerator object for the same list. So since the List isn't being modified, and since each thread is working with its own copy of an enumerator, there should be no thread safety issues.
You can see this at work in one thread - Just create a List of 10 objects, and get two enumerators from that List. Use one enumerator to enumerate through 5 items, and use the other to enumerate through 5 items. You will see that both enumerators enumerated through only the first 5 items, and that the second one did not start where the first enumerator left off.
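The single-threaded experiment described above could look roughly like this. The class and method names (IndependentEnumerators, Take) are made up for the sketch:

```csharp
using System;
using System.Collections.Generic;

public static class IndependentEnumerators
{
    // Advances an enumerator up to 'count' steps and returns the items it saw.
    public static List<int> Take(IEnumerator<int> e, int count)
    {
        var seen = new List<int>();
        while (seen.Count < count && e.MoveNext())
            seen.Add(e.Current);
        return seen;
    }

    public static void Main()
    {
        var list = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        IEnumerable<int> view = list;

        // Two enumerators obtained from the same list.
        using (IEnumerator<int> first = view.GetEnumerator())
        using (IEnumerator<int> second = view.GetEnumerator())
        {
            Console.WriteLine(string.Join(",", Take(first, 5)));  // 0,1,2,3,4
            Console.WriteLine(string.Join(",", Take(second, 5))); // 0,1,2,3,4
        }
    }
}
```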
As long as you are certain that the List will never be modified then it will be safe to read from multiple threads. This includes the use of the IEnumerator instances it provides.
This is going to be true for most collections. In fact, all collections in the BCL should be stable during enumeration. In other words, the enumerator will not modify the data structure. I can think of some obscure cases, like a splay tree, where enumerating it might modify the structure. Again, none of the BCL collections do that.
If you are certain that the list will not be modified after creation, you should guarantee that by converting it to a ReadOnlyCollection<T>. Of course, if you keep the original list that the read-only collection wraps you can still modify it, but if you toss the original list away you're effectively making it permanently read-only.
From the Thread Safety section of the collection:
A ReadOnlyCollection can support multiple readers concurrently, as long as the collection is not modified.
So if you don't touch the original list again and stop referencing it, you can ensure that multiple threads can read it without worry (so long as you don't do anything wacky with trying to modify it again).
In other words if the original list is guaranteed not to be written to, is it safe for multiple threads to use .Where or foreach on the IEnumerable?
Yes, it's only a problem if the list gets mutated.
But note that IEnumerable<T> can be cast back to a list and then modified.
But there is another alternative: wrap your list into a ReadOnlyCollection<T> and pass that around. If you now throw away the original list you basically created a new immutable list.
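A short sketch of both points; the TryMutate helper is hypothetical. Handing out the list as IEnumerable<T> does not stop a downcast to List<T>, whereas the wrapper returned by AsReadOnly() is not a List<T> at all. (It still implements IList<T>, but its mutating members throw NotSupportedException.)

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyWrapping
{
    // Tries to sneak a write through the IEnumerable<T> reference.
    // Returns true if the downcast to List<int> succeeded.
    public static bool TryMutate(IEnumerable<int> source)
    {
        if (source is List<int> list)
        {
            list.Add(-1);
            return true;
        }
        return false;
    }

    public static void Main()
    {
        var original = new List<int> { 1, 2, 3 };

        Console.WriteLine(TryMutate(original));   // True: the list now has 4 items

        ReadOnlyCollection<int> wrapped = original.AsReadOnly();
        Console.WriteLine(TryMutate(wrapped));    // False: not a List<int>
        Console.WriteLine(wrapped.Count);         // 4
    }
}
```

The wrapper is only a view, so it reflects whatever the original list contains. That is why the advice is to drop every reference to the original once the wrapper exists.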
If you are using .NET Framework 4.5 or greater, this could be a great solution:
http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
(Microsoft has already implemented a thread-safe enumerable.)
Related
I have a question about the thread-safety of the List<T> collection.
Here is my test class:
Test t = new Test();
t.a = 100;
t.b = 20;
t.c = 10;
Then let's say 10 instances of the above have been created and added to the List as below.
List<Test> tCollection = new List<Test>();
tCollection.Add(t);
Later I iterate through the Test objects in tCollection.
foreach (Test item in tCollection)
{
// do calculation
}
Are adding objects to List<Test> and iterating through List<Test> thread safe?
The answer is no.
The usual collections are not thread-safe.
You have to use the thread-safe analogues of these collections. Read more about them on MSDN.
The default .net collections are not thread-safe.
This means that they contain no additional code that deals with multi-threaded access, since such code would decrease their performance in single-thread scenarios.
As Alexander Galkin said in the other answer, Microsoft provides some collections that are tailor-made for multi-threaded access. However, you will note that there is no ConcurrentList<T> that works exactly like a List<T>. This is because creating a thread-safe list with all the properties of a list, such as random access, insertion and removal, and good performance, is practically impossible.
The closest equivalent in your case would be a System.Collections.Concurrent.ConcurrentQueue<T> or ConcurrentStack<T>.
However, there may be an easier way in your case:
Read-only access to a List is always thread-safe. You can iterate over your list, or access random elements, from as many threads as you want, provided that you do not change the list.
Adding elements to the list is not thread-safe. This becomes obvious if you iterate over the list while adding elements: you will get an InvalidOperationException ("Collection was modified; enumeration operation may not execute.").
But even adding elements from multiple threads without iterating over the list at the same time is not safe. You will not get such an obvious error, instead it will sometimes work and sometimes not.
If you only add elements once after the program starts, then you can get by with a normal list. Just make sure that no reader threads are started yet while you add elements to the list in a single writer thread. Once you have added all elements to the list, you can iterate over it from as many threads as you want.
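A minimal sketch of this populate-first, read-later pattern, assuming Task.Run is available (.NET 4.5 or later). The names PopulateThenRead and SumFromManyReaders are invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PopulateThenRead
{
    public static long[] SumFromManyReaders(int itemCount, int readerCount)
    {
        // Phase 1: a single writer fills the list before any reader exists.
        var list = new List<int>();
        for (int i = 1; i <= itemCount; i++)
            list.Add(i);

        // Phase 2: readers only enumerate; nobody writes any more.
        IEnumerable<int> view = list;
        var readers = new Task<long>[readerCount];
        for (int r = 0; r < readerCount; r++)
            readers[r] = Task.Run(() => view.Where(x => x % 2 == 0).Sum(x => (long)x));

        Task.WaitAll(readers);
        return readers.Select(t => t.Result).ToArray();
    }

    public static void Main()
    {
        foreach (long sum in SumFromManyReaders(1000, 4))
            Console.WriteLine(sum); // each reader prints 250500
    }
}
```

Each reader task gets its own enumerator from Where, and starting the tasks only after the list is complete ensures the readers see the fully populated list.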
For more complex scenarios it is usually better to switch to ConcurrentQueue or ConcurrentStack, instead of implementing your own thread-safe methods using lock.
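As a rough illustration of the concurrent alternative, the sketch below lets many threads enqueue at once without any explicit lock. The ConcurrentWriters class is hypothetical; ConcurrentQueue<T> and Parallel.For are the standard .NET 4.0 types:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentWriters
{
    public static int EnqueueInParallel(int itemCount)
    {
        var queue = new ConcurrentQueue<int>();

        // Many writers at once; the queue handles the synchronization itself.
        Parallel.For(0, itemCount, i => queue.Enqueue(i));

        return queue.Count;
    }

    public static void Main()
    {
        Console.WriteLine(EnqueueInParallel(10000)); // 10000
    }
}
```

No items are lost, but the order in which they end up in the queue depends on thread scheduling.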
What happens when a thread is adding or removing elements of a ConcurrentBag<T> while another thread is enumerating this bag? Will the new elements also show up in the enumeration and will the removed elements not show up?
One can read the fine manual to discover:
ConcurrentBag<T>.GetEnumerator Method
The enumeration represents a moment-in-time snapshot of the contents of the bag. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the bag.
Emphasis mine.
Justin Etheredge has a blog post explaining the features of the ConcurrentBag class:
In order to implement the enumerable as thread-safe, the GetEnumerator method returns a moment-in-time snapshot of the ConcurrentBag as it was when you started iterating over it. In this way, any items added after the enumeration started won’t be seen while iterating over the data structure.
This means: When starting to enumerate the ConcurrentBag<T>, a snapshot of the current state is created. The enumeration will only show the elements that were present in the bag at the time the enumeration started.
Other threads can still add and remove elements as they like but this will not change the set of elements seen by the enumeration.
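The snapshot behaviour can be observed even on a single thread. In this sketch (the BagSnapshot name is made up), items are added to the bag inside the loop that enumerates it:

```csharp
using System;
using System.Collections.Concurrent;

public static class BagSnapshot
{
    // Adds one new item per enumerated item and returns how many were enumerated.
    public static int EnumerateWhileAdding(ConcurrentBag<int> bag)
    {
        int seen = 0;
        foreach (int item in bag)   // the snapshot is taken when enumeration starts
        {
            bag.Add(item + 100);    // allowed; does not throw
            seen++;
        }
        return seen;
    }

    public static void Main()
    {
        var bag = new ConcurrentBag<int>(new[] { 1, 2, 3 });
        Console.WriteLine(EnumerateWhileAdding(bag)); // 3
        Console.WriteLine(bag.Count);                 // 6
    }
}
```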
BlockingCollection contains only methods to add individual items. What if I want to add a collection? Should I just use a foreach loop?
Why doesn't BlockingCollection contain a method to add a collection? I think such a method could be pretty useful.
The ICollection interfaces and many of the BCL list-type classes don't have an AddRange method for some reason, and it is annoying.
Yes, you'll need to foreach over the collection. You could write your own extension method if you're using it a lot.
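Such an extension method might look like the sketch below. AddRange here is a hypothetical helper, not part of the BCL, and it adds the items one by one, so the range is not added atomically:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class BlockingCollectionExtensions
{
    // Hypothetical helper, not part of the BCL. Items are added one at a time,
    // so consumers may take early items before later ones arrive, and other
    // producers may interleave their own items.
    public static void AddRange<T>(this BlockingCollection<T> target, IEnumerable<T> items)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));
        if (items == null) throw new ArgumentNullException(nameof(items));

        foreach (T item in items)
            target.Add(item); // blocks if the collection is bounded and full
    }
}

public static class AddRangeDemo
{
    public static void Main()
    {
        var queue = new BlockingCollection<int>();
        queue.AddRange(new[] { 1, 2, 3 });
        queue.CompleteAdding();

        // The default backing store is a ConcurrentQueue<T>, so order is FIFO.
        Console.WriteLine(string.Join(",", queue.GetConsumingEnumerable())); // 1,2,3
    }
}
```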
The BlockingCollection<T> is a wrapper of an underlying collection that implements the IProducerConsumerCollection<T> interface, which by default is a ConcurrentQueue<T>. This interface has only the method TryAdd for adding elements. It doesn't have a TryAddRange. The reason is, I guess, that not all native IProducerConsumerCollection<T> implementations are equipped with AddRange functionality. The ConcurrentStack<T> does have the PushRange method, but the ConcurrentQueue<T> doesn't have something equivalent.
My understanding is that if this API existed, it would have atomic semantics. It wouldn't be just a convenience method for reducing the code needed for doing multiple non-atomic insertions. Quoting from the ConcurrentStack<T>.PushRange documentation:
When adding multiple items to the stack, using PushRange is a more efficient mechanism than using Push one item at a time. Additionally, PushRange guarantees that all of the elements will be added atomically, meaning that no other threads will be able to inject elements between the elements being pushed. Items at lower indices in the items array will be pushed before items at higher indices.
Atomicity is especially important when adding in a blocking-stack, because the item that is added can be immediately taken by another thread that is currently blocked. If I add the items A, B and C with the order C-B-A, it's because I want the A to be placed at the top of the stack, and picked by another thread first. But since the additions are not atomic, another thread could win the race and take the C before the current thread manages to add the B and the A. According to my experiments this happens rarely, but there is no way to prevent it. In case it is absolutely necessary to enforce the insertion order, there is no other option than implementing a custom BlockingStack<T> collection from scratch.
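For comparison, this is how the C-B-A example looks on a plain ConcurrentStack<T>, where PushRange does provide the atomic behaviour quoted above. The PushRangeDemo class is just an illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class PushRangeDemo
{
    // Pushes all items in one atomic operation, then pops everything.
    public static List<string> PushRangeThenPopAll(string[] items)
    {
        var stack = new ConcurrentStack<string>();
        stack.PushRange(items); // lower indices are pushed first

        var popped = new List<string>();
        while (stack.TryPop(out string value))
            popped.Add(value);
        return popped;
    }

    public static void Main()
    {
        // C is pushed first and A last, so A ends up on top.
        Console.WriteLine(string.Join(",", PushRangeThenPopAll(new[] { "C", "B", "A" }))); // A,B,C
    }
}
```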
Quite simple: Other than ConcurrentDictionary (which I'll use if I have to but it's not really the correct concept), is there any Concurrent collection (IProducerConsumer implementation) that supports removal of specific items based on simple equality of an item or a predicate defining a condition for removal?
Explanation: I have a multi-threaded, multi-stage workflow algorithm, which pulls objects from the DB and sticks them in a "starting" queue. From there they are grabbed by the next stage, further worked on, and stuffed into other queues. This process continues through a few more stages. Meanwhile, the first stage is invoked again by its supervisor and pulls objects out of the DB, and those can include objects still in process (because they haven't finished being processed and so haven't been re-persisted with the flag set saying they're done).
The solution I am designing is a master "in work" collection; objects go in that queue when they are retrieved for processing by the first stage, and are removed after they have been re-saved to the DB as "processed" by whatever stage of the workflow completed the necessary processing. While the object is in that list, it will be ignored if it is re-retrieved by the first stage.
I had planned to use a ConcurrentBag, but the only removal method (TryTake) removes an arbitrary item from the bag, not a specified one (and ConcurrentBag is slow in .NET 4). ConcurrentQueue and ConcurrentStack also do not allow removal of an item other than the next one it'll give you, leaving ConcurrentDictionary, which would work but is more than I need (all I really need is to store the Id of the records being processed; they don't change during the workflow).
The reason there is no such data structure is that looking up an element by value is O(n) in list-like collections. Operations such as IndexOf and Remove(element) enumerate the elements and check each one for equality.
Only hash tables have lookup time of O(1). In concurrent scenario O(n) lookup time would lead to very long lock of a collection. Other threads will not be able to add elements during this time.
In dictionary only the cell hit by hash will be locked. Other threads can continue adding while one is checking for equality through elements in hash cell.
My advice is go on and use ConcurrentDictionary.
By the way, you are right that ConcurrentDictionary is a bit oversized for your solution. What you really need is to check quickly whether an object is in work or not. A HashSet would be perfect for that. It does basically nothing other than Add(element), Contains(element) and Remove(element). There is a ConcurrentHashSet implementation in Java. For C# I found this: "How to implement ConcurrentHashSet in .Net". I don't know how good it is.
As a first step I would still write a wrapper with a HashSet interface around ConcurrentDictionary, get it up and running, and then try different implementations and compare their performance.
As already explained in other posts, it's not possible to remove specific items from a Queue or ConcurrentQueue by default, but the easiest way around this is to extend or wrap the item so that it can be flagged as removed.
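A minimal sketch of such a wrapper. The ConcurrentHashSet<T> name and its members are assumptions made for the example, and the byte value is only a placeholder because ConcurrentDictionary needs a value type argument:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical wrapper for illustration; the name and API are assumptions.
public sealed class ConcurrentHashSet<T>
{
    // The byte value is a throwaway placeholder; only the keys matter.
    private readonly ConcurrentDictionary<T, byte> _items =
        new ConcurrentDictionary<T, byte>();

    public bool Add(T item) => _items.TryAdd(item, 0);
    public bool Contains(T item) => _items.ContainsKey(item);
    public bool Remove(T item) => _items.TryRemove(item, out _);
    public int Count => _items.Count;
}

public static class InWorkDemo
{
    public static void Main()
    {
        var inWork = new ConcurrentHashSet<int>();
        Console.WriteLine(inWork.Add(42));      // True: record 42 is now in work
        Console.WriteLine(inWork.Add(42));      // False: re-retrieved, so skip it
        Console.WriteLine(inWork.Remove(42));   // True: processing finished
        Console.WriteLine(inWork.Contains(42)); // False
    }
}
```

Because Add returns false for an Id that is already present, the first stage can use a single call both to test and to claim a record.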
public class QueueItem
{
public Boolean IsRemoved { get; private set; }
public void Remove() { IsRemoved = true; }
}
And when dequeuing:
QueueItem item = _Queue.Dequeue(); // Or TryDequeue if you use a ConcurrentQueue
if (!item.IsRemoved)
{
// Do work here
}
It's really hard to make a collection thread-safe in the generic sense. There are so many factors that go into thread-safety that are outside the responsibility or purview of a library/framework class that affect the ability for it to be truly "thread-safe"... One of the drawbacks as you've pointed out is the performance. It's impossible to write a performant collection that is also thread-safe because it has to assume the worst...
The generally recommended practice is to use whatever collection you want and access it in a thread-safe way. This is basically why there aren't more thread-safe collections in the framework. More on this can be found at http://blogs.msdn.com/b/bclteam/archive/2005/03/15/396399.aspx#9534371
Are there any data structures in the C# Collections library where modification of the structure does not invalidate iterators?
Consider the following:
List<int> myList = new List<int>();
myList.Add( 1 );
myList.Add( 2 );
List<int>.Enumerator myIter = myList.GetEnumerator();
myIter.MoveNext(); // myIter.Current == 1
myList.Add( 3 );
myIter.MoveNext(); // throws InvalidOperationException
Yes, take a look at the System.Collections.Concurrent namespace in .NET 4.0.
Note that for some of the collections in this namespace (e.g., ConcurrentQueue<T>), this works by only exposing an enumerator on a "snapshot" of the collection in question.
From the MSDN documentation on ConcurrentQueue<T>:
The enumeration represents a moment-in-time snapshot of the contents of the queue. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the queue.
This is not the case for all of the collections, though. ConcurrentDictionary<TKey, TValue>, for instance, gives you an enumerator that may reflect updates made to the underlying collection between calls to MoveNext.
From the MSDN documentation on ConcurrentDictionary<TKey, TValue>:
The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.
If you don't have 4.0, then I think the others are right and there is no such collection provided by .NET. You can always build your own, however, by doing the same thing ConcurrentQueue<T> does (iterate over a snapshot).
According to this MSDN article on IEnumerator the invalidation behaviour you have found is required by all implementations of IEnumerable.
An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and the next call to MoveNext or Reset throws an InvalidOperationException. If the collection is modified between MoveNext and Current, Current returns the element that it is set to, even if the enumerator is already invalidated.
Supporting this behavior requires some pretty complex internal handling, so most of the collections don't support this (I'm not sure about the Concurrent namespace).
However, you can very well simulate this behavior using immutable collections. They don't allow you to modify the collection by design, but you can work with them in a slightly different way and this kind of processing allows you to use enumerator concurrently without complex handling (implemented in Concurrent collections).
You can implement a collection like that easily, or you can use FSharpList<T> from FSharp.Core.dll (not a standard part of .NET 4.0 though):
using System.Collections.Generic;
using Microsoft.FSharp.Collections;
// Create immutable list from other collection (assumed here to be an IEnumerable<int>)
var list = ListModule.OfSeq(anyCollection);
// now we can use `GetEnumerator` (via the interface, since F# implements it explicitly)
var en = ((IEnumerable<int>)list).GetEnumerator();
// To modify the collection, you create a new collection that adds
// element to the front (without actually copying everything)
var added = FSharpList<int>.Cons(42, list);
The benefit of immutable collections is that you can work with them (by creating copies) without affecting the original one, and so the behavior you wanted is "for free". For more information, there is a great series by Eric Lippert.
The only way to do this is to make a copy of the list before you iterate it:
var myIter = new List<int>(myList).GetEnumerator();
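A slightly fuller sketch of the copy-then-iterate idea, with made-up names. It sidesteps the invalidation on a single thread. With multiple threads, the copy itself would still need to be protected from concurrent writes, which this sketch does not attempt:

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotEnumeration
{
    // Enumerates a copy of the list while modifying the original.
    public static List<int> EnumerateSnapshotWhileAdding(List<int> myList)
    {
        var seen = new List<int>();
        foreach (int value in new List<int>(myList)) // enumerate a copy
        {
            myList.Add(value * 10);                  // the copy is untouched, so no exception
            seen.Add(value);
        }
        return seen;
    }

    public static void Main()
    {
        var myList = new List<int> { 1, 2 };
        Console.WriteLine(string.Join(",", EnumerateSnapshotWhileAdding(myList))); // 1,2
        Console.WriteLine(string.Join(",", myList));                               // 1,2,10,20
    }
}
```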
No, they do not exist. All C# standard collections invalidate the enumerator when the structure changes.
Use a for loop instead of a foreach, and then you can modify it. I wouldn't advise it though....