Parallel.For(0, someStringArray.Count, (i) =>
{
    someStringArray[i] = someStringArray[i].Trim();
});
I'm sure that only reading through a collection with Parallel.For is considered thread-safe.
EDIT: The array is not being accessed through other parts of code.
Is this particular Parallel.For() loop thread-safe?
Depending on how the type of someStringArray is implemented, it is, as long as you are not modifying the collection itself while enumerating.
At no time are two threads modifying the same element in the collection in your example.
Generally speaking, collections are not considered thread-safe because their internal logic may fail. For example, if you read an item from one thread while another thread adds an item, the collection may resize (and copy its items) in the middle of the read operation.
This shouldn't happen with fixed-size arrays though. List<T> uses an array under the hood but that's an implementation detail that you shouldn't really rely on in the general case.
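As a minimal sketch of the point above, assuming someStringArray really is a fixed-size string[] (the class and method names here are hypothetical), each iteration reads and writes only its own index, so no two threads ever touch the same element:

```csharp
using System;
using System.Threading.Tasks;

public static class TrimExample
{
    // Trims every element in place. Iteration i touches only items[i],
    // and the array itself is never resized, so no locking is used.
    public static string[] TrimAll(string[] items)
    {
        Parallel.For(0, items.Length, i =>
        {
            items[i] = items[i].Trim();
        });
        return items;
    }
}
```

Calling TrimExample.TrimAll(new[] { " a ", "b  " }) would leave the array holding "a" and "b". The same reasoning does not automatically carry over to collection types that may resize or keep extra internal state.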
OK, I have read "Thread safe collections in .NET" and "Why lock Thread safe collections?".
The former question, being Java-centered, doesn't answer my question, and the answer to the latter tells me that I don't need to lock the collections because they are supposed to be thread-safe (which is what I thought).
Now coming to my question,
A lot of developers I see (on GitHub and in my organisation) have started using the new thread-safe collections. However, they often don't remove the lock around read and write operations.
I don't understand this. Isn't a thread-safe collection ... well, thread-safe completely ?
What could be the implications involved in not locking a thread-safe collection ?
EDIT: PS: here's my case,
I have a lot of classes, and some of them have an attribute on them. Very often I need to check whether a given type has that attribute (using reflection, of course). This could be expensive, so I decided to create a cache using a ConcurrentDictionary<string,bool>, with the string being the type name and the bool specifying whether the type has the attribute. At first the cache is empty; the plan is to keep adding to it as and when required. I came across the GetOrAdd() method of ConcurrentDictionary, and my question is about that method: should I call it without locking?
The remarks on MSDN say:
If you call GetOrAdd simultaneously on different threads,
addValueFactory may be called multiple times, but its key/value pair
might not be added to the dictionary for every call.
You should not lock a thread-safe collection. It exposes methods to update the collection that already do their own locking, so use them as intended.
The thread safe collection may not match your needs for instance if you want to prevent modification while an enumerator is opened on the collection (the provided thread safe collections allow modifications). If that's the case you'd better use a regular collection and lock it everywhere. The internal locks of the thread safe collections aren't publicly available.
It's hard to answer about implication in not locking a thread-safe collection. You don't need to lock a thread-safe collection but you may have to lock your code that does multiple things. Hard to tell without seeing the code.
Yes the method is thread safe but it might call the AddValueFactory multiple times if you hit an Add for the same key at the same time. In the end only one of the values will be added, the others will be discarded. It might not be an issue... you'll have to check how often you may reach this situation but I think it's not common and you can live with the performance penalty in an edge case that may never occur.
You could also build your dictionary in a static constructor or before you need it. This way, the dictionary is filled once and you never write to it again. The dictionary is then read-only, and you need neither a lock nor a thread-safe collection.
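A minimal sketch of the lock-free cache from the question might look like the following. The names AttributeCache and HasFlags are hypothetical, FlagsAttribute stands in for whatever attribute is actually being checked, and the dictionary is keyed by Type rather than by the type-name string used in the question:

```csharp
using System;
using System.Collections.Concurrent;

public static class AttributeCache
{
    private static readonly ConcurrentDictionary<Type, bool> Cache =
        new ConcurrentDictionary<Type, bool>();

    // Returns whether the type carries [Flags], using reflection only on a cache miss.
    // No external lock: under a race the factory may run more than once for the same
    // key, but it has no side effects and every run produces the same answer.
    public static bool HasFlags(Type type)
    {
        return Cache.GetOrAdd(type, t => Attribute.IsDefined(t, typeof(FlagsAttribute)));
    }
}
```

Because the factory is a pure function of the key, the occasional duplicate call costs only a little redundant reflection. If the factory were expensive or had side effects, that trade-off would need to be reconsidered.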
A method of a class typically changes the object from state A to state B. However, another thread may also change the state of the object during the execution of that method, potentially leaving the object in an unstable state.
For instance, a list may want to check if its underlying data buffer is large enough before adding a new item:
void Add(object item)
{
    int requiredSpace = Count + 1;
    if (buffer.Length < requiredSpace)
    {
        // increase underlying buffer
    }
    buffer[Count] = item;
}
Now if a list has buffer space for only one more item, and two threads attempt to add an item at the same time, they may both decide that no additional buffer space is required, potentially causing an IndexOutOfRangeException on one of these threads.
Thread-safe classes ensure that this does not happen.
This does not mean that using a thread-safe class makes your code thread-safe:
int count = myConcurrentCollection.Count;
myConcurrentCollection.Add(item);
count++;
if (myConcurrentCollection.Count != count)
{
    // some other thread has added or removed an item
}
So although the collection is thread safe, you still need to consider thread-safety for your own code. The enumerator example Guillaume mentioned is a perfect example of where threading issues might occur.
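To illustrate the difference between a thread-safe collection and thread-safe calling code, here is a small sketch with hypothetical names. The first method combines two individually safe calls into a check-then-act sequence that can still race; the second uses a single atomic call instead:

```csharp
using System.Collections.Concurrent;

public static class CompoundOperations
{
    // Racy: another thread may add the key between ContainsKey and the indexer,
    // in which case this call silently overwrites that thread's value.
    public static bool AddIfMissingRacy(ConcurrentDictionary<string, int> map, string key, int value)
    {
        if (map.ContainsKey(key)) return false;
        map[key] = value;
        return true;
    }

    // Atomic: TryAdd performs the check and the insertion as one operation.
    public static bool AddIfMissing(ConcurrentDictionary<string, int> map, string key, int value)
        => map.TryAdd(key, value);
}
```

Where no single atomic method covers the sequence you need, a lock around your own multi-step code is still required, even though the collection itself is thread-safe.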
In regards to your comment, the documentation for ConcurrentDictionary mentions:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
So yes these overloads (that take a delegate) are exceptions.
I have a question about the thread-safety of the List<T> collection.
Here is my test class:
Test t = new Test();
t.a = 100;
t.b = 20;
t.c = 10;
Then let's say 10 instances of the above have been created and added to the List as below.
List<Test> tCollection = new List<Test>();
tCollection.Add(t);
Later I iterate through the objects of test in tCollection.
foreach (Test t in tCollection)
{
    // do calculation
}
Are adding objects to a List<Test> and iterating through a List<Test> thread-safe?
The answer is No.
Usual collections are all non thread-safe.
You have to use the thread-safe analogues of these collections. Read more about them on MSDN.
The default .net collections are not thread-safe.
This means that they contain no additional code that deals with multi-threaded access, since such code would decrease their performance in single-thread scenarios.
As Alexander Galkin said in the other answer, Microsoft provides some collections that are tailor-made for multi-threaded access. However, you will note that there is no ConcurrentList<T> that works exactly like a List. This is because creating a thread-safe list with all the properties of a list, such as random access, insertion and removal, and good performance, is practically impossible.
The closest equivalent in your case would be a System.Collections.Concurrent.ConcurrentQueue<T> or ConcurrentStack<T>.
However, there may be an easier way in your case:
Read-only access to a List is always thread-safe. You can iterate over your list, or access random elements, from as many threads as you want, provided that you do not change the list.
Adding elements to the list is not thread-safe. This becomes obvious if you iterate over the list while adding elements: you will typically get an InvalidOperationException ("Collection was modified; enumeration operation may not execute.").
But even adding elements from multiple threads without iterating over the list at the same time is not safe. You will not get such an obvious error, instead it will sometimes work and sometimes not.
If you only add elements once after the program starts, then you can get by with a normal list. Just make sure that no reader threads are started yet while you add elements to the list in a single writer thread. Once you added all elements to the list, you can iterate over it from as many threads as you want.
For more complex scenarios it is usually better to switch to ConcurrentQueue or ConcurrentStack, instead of implementing your own thread-safe methods using lock.
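The "fill once, then read from many threads" approach described above can be sketched as follows. The class and method names are hypothetical, and the per-reader work is a simple sum chosen only for illustration:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PopulateThenRead
{
    public static long[] SumFromManyReaders(int itemCount, int readerCount)
    {
        // Writer phase: a single thread fills the list before any reader starts.
        var list = new List<int>();
        for (int i = 1; i <= itemCount; i++)
        {
            list.Add(i);
        }

        // Reader phase: the list is no longer modified, so every reader can
        // iterate it concurrently. Each reader writes only to its own slot.
        var sums = new long[readerCount];
        Parallel.For(0, readerCount, r =>
        {
            long sum = 0;
            foreach (int value in list)
            {
                sum += value;
            }
            sums[r] = sum;
        });
        return sums;
    }
}
```

The safety here rests entirely on the ordering: all writes complete before the parallel section begins, and nothing writes to the list afterwards.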
I've already read previous questions here about ConcurrentBag but did not find an actual sample of implementation in multi-threading.
"ConcurrentBag is a thread-safe bag implementation, optimized for scenarios where the same thread will be both producing and consuming data stored in the bag."
This is the current usage in my code (simplified, not the actual code):
private void MyMethod()
{
    List<Product> products = GetAllProducts(); // Get list of products
    ConcurrentBag<Product> myBag = new ConcurrentBag<Product>();
    // products were simply added here in the ConcurrentBag to simplify the code
    // actual code process each product before adding in the bag
    Parallel.ForEach(
        products,
        new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
        product => myBag.Add(product));
    ProcessBag(myBag); // method to process each items in the concurrentbag
}
My questions:
Is this the right usage of ConcurrentBag? Is it ok to use ConcurrentBag in this kind of scenario?
I think a simple List<Product> and a manual lock would do better, because the scenario above already breaks the "same thread will be both producing and consuming data stored in the bag" rule.
I also found out that the ThreadLocal storage created in each thread of the parallel loop will still exist after the operation (even if the thread is reused, is this right?), which may cause an undesired memory leak.
Am I right about this? Or is a simple clear or empty method that removes the items from the ConcurrentBag enough?
This looks like an ok use of ConcurrentBag. The thread local variables are members of the bag, and will become eligible for garbage collection at the same time the bag is (clearing the contents won't release them). You are right that a simple List with a lock would suffice for your case. If the work you are doing in the loop is at all significant, the type of thread synchronization won't matter much to the overall performance. In that case, you might be more comfortable using what you are familiar with.
Another option would be to use ParallelEnumerable.Select, which matches what you are trying to do more closely. Again, any performance difference you are going to see is likely going to be negligible and there's nothing wrong with sticking with what you know.
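A rough sketch of the ParallelEnumerable.Select alternative is shown below. The names are hypothetical, strings stand in for the Product type, and ToUpperInvariant is only a placeholder for the real per-product processing. AsOrdered is an optional addition that keeps the results in source order:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PlinqExample
{
    // Processes every item in parallel and collects the results without any
    // explicit lock or concurrent collection in the calling code.
    public static List<string> ProcessAll(IEnumerable<string> products)
    {
        return products
            .AsParallel()
            .AsOrdered()                          // preserve source order in the output
            .Select(p => p.ToUpperInvariant())    // placeholder for real processing
            .ToList();
    }
}
```

With this shape, gathering the results is handled by PLINQ itself, so the question of which collection to add into from multiple threads does not arise.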
As always, if the performance of this is critical there's no substitute for trying it and measuring.
It seems to me that bmm6o's answer is not correct. The ConcurrentBag instance internally contains mini-bags for each thread that adds items to it, so item insertion does not involve any thread locks, and thus all Environment.ProcessorCount threads may get into full swing without being stuck waiting and without any thread context switches. Thread synchronization may be required when iterating over the collected items, but in the original example the iteration is done by a single thread after all insertions are done. Moreover, if the ConcurrentBag uses Interlocked techniques as the first layer of thread synchronization, then it may not need to involve Monitor operations at all.
On the other hand, using a regular List<T> instance and wrapping each Add() call in a lock statement will hurt the performance a lot. First, because of the constant Monitor.Enter() and Monitor.Exit() calls, each of which can require stepping into kernel mode and working with Windows synchronization primitives. Second, one thread may occasionally be blocked by another thread that has not finished its addition yet.
As for me, the code above is a really good example of the right usage of ConcurrentBag class.
Is this the right usage of ConcurrentBag? Is it ok to use ConcurrentBag in this kind of scenario?
No, for multiple reasons:
This is not the intended usage scenario for this collection. The ConcurrentBag<T> is intended for mixed producer-consumer scenarios, meaning that each thread is expected to add and take items from the bag. Your scenario is nothing like this. You have many threads that add items, and zero threads that take items. The main application for the ConcurrentBag<T> is for making object-pools (pools of reusable objects that are expensive to create or destroy). And given the availability of the ObjectPool<T> class in the Microsoft.Extensions.ObjectPool package, even this niche application for this collection is contested.
It doesn't preserve the insertion order. Even if preserving the insertion order is not important, getting a shuffled output makes the debugging more difficult.
It creates garbage that has to be collected by the GC. It creates one WorkStealingQueue (internal class) per thread, each containing an expandable array, so the more threads you have, the more objects you allocate. Also, each time it is enumerated it copies all the items into an array, and its GetEnumerator() method returns an IEnumerator<T>, so an enumerator object is allocated on each foreach.
There are better options available, offering both better performance and better ordering behavior.
In your scenario you can store the results of the parallel execution in a simple array. Just create an array with length equal to the products.Count, switch from the Parallel.ForEach to the Parallel.For, and assign the result directly to the corresponding slot of the results array without doing any synchronization at all:
List<Product> products = GetAllProducts(); // Get list of products
Product[] results = new Product[products.Count];
Parallel.For(0, products.Count,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    i => results[i] = products[i]);
ProcessResults(results);
This way you'll get the results with perfect ordering, stored in a container that has the most compact size and the fastest enumeration of all .NET collections, doing only a single object allocation.
In case you are concerned about the thread-safety of the above operation, there is nothing to worry about. Each thread writes on different slots in the results array. After the completion of the parallel execution the current thread has full visibility of all the values that are stored in the array, because the TPL includes the appropriate barriers when tasks are queued, and at the beginning/end of task execution (citation).
(I have posted more thoughts about the ConcurrentBag<T> in this answer.)
If a List<T> is used with a lock around the Add() method, threads will have to wait for each other, which reduces the performance gain of using Parallel.ForEach().
BlockingCollection contains only methods to add individual items. What if I want to add a collection? Should I just use foreach loop?
Why BlockingCollection doesn't contain method to add a collection? I think such method can be pretty useful.
ICollection interfaces and many of the BCL list-type classes don't have an AddRange method for some reason and it is annoying.
Yes, you'll need to foreach over the collection. You could write your own extension method if you're doing it a lot.
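Such an extension method could be sketched like this. AddRange is a hypothetical name, not part of the BlockingCollection<T> API, and the method is only a convenience wrapper around repeated Add calls:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class BlockingCollectionExtensions
{
    // Adds the items one at a time. This is NOT atomic: other threads may add
    // or take items in between, and Add blocks if a bounded collection is full.
    public static void AddRange<T>(this BlockingCollection<T> collection, IEnumerable<T> items)
    {
        foreach (T item in items)
        {
            collection.Add(item);
        }
    }
}
```

With this in scope, a call such as queue.AddRange(new[] { 1, 2, 3 }) reads like a built-in method while behaving exactly like the equivalent foreach loop.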
The BlockingCollection<T> is a wrapper of an underlying collection that implements the IProducerConsumerCollection<T> interface, which by default is a ConcurrentQueue<T>. This interface has only the method TryAdd for adding elements. It doesn't have a TryAddRange. The reason is, I guess, that not all native IProducerConsumerCollection<T> implementations are equipped with AddRange functionality. The ConcurrentStack<T> does have the PushRange method, but the ConcurrentQueue<T> doesn't have anything equivalent.
My understanding is that if this API existed, it would have atomic semantics. It wouldn't be just a convenience method for reducing the code needed for doing multiple non-atomic insertions. Quoting from the ConcurrentStack<T>.PushRange documentation:
When adding multiple items to the stack, using PushRange is a more efficient mechanism than using Push one item at a time. Additionally, PushRange guarantees that all of the elements will be added atomically, meaning that no other threads will be able to inject elements between the elements being pushed. Items at lower indices in the items array will be pushed before items at higher indices.
Atomicity is especially important when adding in a blocking-stack, because the item that is added can be immediately taken by another thread that is currently blocked. If I add the items A, B and C with the order C-B-A, it's because I want the A to be placed at the top of the stack, and picked by another thread first. But since the additions are not atomic, another thread could win the race and take the C before the current thread manages to add the B and the A. According to my experiments this happens rarely, but there is no way to prevent it. In case it is absolutely necessary to enforce the insertion order, there is no other option than implementing a custom BlockingStack<T> collection from scratch.
I have a main thread that populates a List<T>. I then create a chain of objects that will execute on different threads and require access to the list. The original list will never be written to after it's generated. My thought was to pass the list as IEnumerable<T> to the objects executing on other threads, mainly so that those implementing these objects cannot write to the list by mistake. In other words, if the original list is guaranteed not to be written to, is it safe for multiple threads to use .Where or foreach on the IEnumerable?
I am not sure if the iterator in itself is thread safe if the original collection is never changed.
IEnumerable<T> can't be modified. So what can be non thread safe with it? (If you don't modify the actual List<T>).
For non thread safety you need writing and reading operations.
"Iterator in itself" is instantiated for each foreach.
Edit: I simplified my answer a bit, but @Eric Lippert added a valuable comment. IEnumerable<T> doesn't define any modifying methods, but that doesn't mean its access operations (GetEnumerator, MoveNext, etc.) are thread-safe. The simplest example is a GetEnumerator implemented so that it:
returns the same IEnumerator instance every time
resets its position
A more sophisticated example is caching.
This is an interesting point, but fortunately I don't know of any standard class that has a non-thread-safe implementation of IEnumerable.
Each thread that calls Where or foreach gets its own enumerator - they don't share one enumerator object for the same list. So since the List isn't being modified, and since each thread is working with its own copy of an enumerator, there should be no thread safety issues.
You can see this at work in one thread - Just create a List of 10 objects, and get two enumerators from that List. Use one enumerator to enumerate through 5 items, and use the other to enumerate through 5 items. You will see that both enumerators enumerated through only the first 5 items, and that the second one did not start where the first enumerator left off.
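The single-threaded experiment described above can be sketched as follows (the class and method names are hypothetical). Two enumerators are obtained from the same list, and each one is advanced five times:

```csharp
using System.Collections.Generic;

public static class IndependentEnumerators
{
    // Returns the five items seen by each of two enumerators over the same list.
    public static List<int>[] TakeFiveFromEach()
    {
        IEnumerable<int> source = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        var fromFirst = new List<int>();
        var fromSecond = new List<int>();

        using (IEnumerator<int> first = source.GetEnumerator())
        using (IEnumerator<int> second = source.GetEnumerator())
        {
            // Advancing one enumerator does not move the other one.
            for (int i = 0; i < 5 && first.MoveNext(); i++) fromFirst.Add(first.Current);
            for (int i = 0; i < 5 && second.MoveNext(); i++) fromSecond.Add(second.Current);
        }
        return new[] { fromFirst, fromSecond };
    }
}
```

Both result lists contain the items 0 through 4, which shows that each enumerator tracks its own position independently of the other.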
As long as you are certain that the List will never be modified then it will be safe to read from multiple threads. This includes the use of the IEnumerator instances it provides.
This is going to be true for most collections. In fact, all collections in the BCL should be stable during enumeration. In other words, the enumerator will not modify the data structure. I can think of some obscure cases, like a splay tree, where enumerating it might modify the structure. Again, none of the BCL collections do that.
If you are certain that the list will not be modified after creation, you should guarantee that by converting it to a ReadOnlyCollection<T>. Of course, if you keep the original list that the read-only collection wraps, you can still modify it, but if you toss the original list away you're effectively making it permanently read-only.
From the Thread Safety section of the collection:
A ReadOnlyCollection can support multiple readers concurrently, as long as the collection is not modified.
So if you don't touch the original list again and stop referencing it, you can ensure that multiple threads can read it without worry (so long as you don't do anything wacky with trying to modify it again).
In other words, if the original list is guaranteed not to be written to, is it safe for multiple threads to use .Where or foreach on the IEnumerable?
Yes it's only a problem if the list gets mutated.
But note that an IEnumerable<T> can be cast back to a list and then modified.
But there is another alternative: wrap your list into a ReadOnlyCollection<T> and pass that around. If you now throw away the original list you basically created a new immutable list.
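A small sketch of that idea, with hypothetical names, is shown below. The list is built inside the method and only the read-only wrapper is returned, so no caller ever holds a reference to the mutable list:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlySnapshot
{
    public static ReadOnlyCollection<int> Build()
    {
        // The mutable list never leaves this method.
        var items = new List<int> { 1, 2, 3 };

        // AsReadOnly wraps the list without copying it. Casting the wrapper
        // back to IList<int> and calling Add throws NotSupportedException.
        return items.AsReadOnly();
    }
}
```

Passing the wrapper around instead of the list means consumers cannot cast their way back to a mutable collection, which a plain IEnumerable<T> reference would not prevent.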
If you are using .NET Framework 4.5 or greater, this could be a great solution:
http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
(Microsoft has already implemented a thread-safe enumerable.)