If I have an array that can/will be accessed by multiple threads at any given point in time, what exactly causes it to be non-thread safe, and what would be the steps taken to ensure that the array would be thread safe in most situations?
I have looked extensively around on the internet and have found little to no information on this subject, everything seems to be specific scenarios (e.g. is this array, that is being accessed like this by these two threads thread-safe, and on, and on). I would really like of someone could either answer the questions I laid out at the top, or if someone could point towards a good document explaining said items.
EDIT:
After looking around on MSDN, I found the ArrayList class. When you use the synchronize method, it returns a thread-safe wrapper for a given list. When setting data in the list (i.e. list1[someNumber] = anotherNumber;) does the wrapper automatically take care of locking the list, or do you still need to lock it?
When two threads are accessing the exact same resource (e.g., not local copies, but actually the same copy of the same resource), a number of things can happen. In the most obvious scenario, if Thread #1 is accessing a resource and Thread #2 changes it mid-read, some unpredictable behavior can happen. Even with something as simple as an integer, you could have logic errors arise, so try to imagine the horrors that can result from improperly using something more complicated, like a database access class that's declared as static.
The classical way of handling this problem is to put a lock on the sensitive resources so only one thread can use it at a time. So in the above example, Thread #1 would request a lock to a resource and be granted it, then go in to read what it needs to read. Thread #2 would come along mid-read and request a lock to the resource, but be denied and told to wait because Thread #1 is using it. When Thread #1 finishes, it releases the lock and it's OK for Thread #2 to proceed.
There are other situations, but this illustrates one of the most basic problems and solutions. In C#, you may:
1) Use specific .NET objects that are managed as lockable by the framework (like Scorpion-Prince's link to SynchronizedCollection)
2) Use [MethodImpl(MethodImplOptions.Synchronized)] to dictate that a specific method that does something dangerous should only be used by one thread at a time
3) Use the lock statement to isolate specific lines of code that are doing something potentially dangerous
What approach is best is really up to your situation.
If I have an array that can/will be accessed by multiple threads at
any given point in time, what exactly causes it to be non-thread safe,
and what would be the steps taken to ensure that the array would be
thread safe in most situations?
In general terms, the fact that the array is not thread-safe is the notion that two or more threads could be modifying the contents of the array if you do not synchronize access to it.
Speaking generally, for example, let's suppose you have thread 1 doing this work:
for (int i = 0; i < array.Length; i++)
{
array[i] = "Hello";
}
And thread 2 doing this work (on the same shared array)
for (int i = 0; i < array.Length; i++)
{
array[i] = "Goodbye";
}
There isn't anything synchronizing the threads so your results will depend on which thread wins the race first. It could be "Hello" or "Goodbye", in some random order, but will always be at least 'Hello' or 'Goodbye'.
The actual write of the string 'Hello' or 'Goodbye' is guaranteed by the CLR to be atomic. That is to say, the writing of the value 'Hello' cannot be interrupted by a thread trying to write 'Goodbye'. One must occur before or after the other, never in between.
So you need to create some kind of synchronization mechanism to prevent the arrays from stepping on each other. You can accomplish this by using a lock statement in C#.
C# 3.0 and above provide a generic collection class called SynchronizedCollection which "provides a thread-safe collection that contains objects of a type specified by the generic parameter as elements."
Array is thread safe if it is named public and static keywords - instant is not guaranteed - as the System.Array implements the ICollection interface which define some synchronize method to support synchronizing mechanism.
However, coding to enumerate through the array's item is not safe, developer should implement lock statement to make sure there is no change to the array during the array enumeration.
EX:
Array arrThreadSafe = new string[] {"We", "are", "safe"};
lock(arrThreadSafe.SyncRoot)
{
foreach (string item in arrThreadSafe)
{
Console.WriteLine(item);
}
}
Related
I would like to know if I need a lock for a situation of 2 threads, one reading and the other one writing to same Variable both.
For example: We have 2 Threads: A & B, Thread A reads variable x at time T and Thread B writes variable T at time T.
Should I may consider here some type of lock?
In my case I have the Main Thread an many other SubThreads. The Main Thread holds a List<myObj> and before starting any SubThread I create instances of myObj assigning it to the List<myObj> and passing myObj to the SubThread.
At determined moment the List has to be sorted depending on a value contained in myObj, and it can perfectly happen that the List element which is read by the Main Thread, simultaneously is written by a SubThread.
Please some suggestions.
YES
Don't even consider any alternative until you have a deep and thorough understanding of how multi-threading works. Obligatory link: http://www.albahari.com/threading/, at the very least. And unless you have very good reasons, not even then - especially with something as complex as a List.
Any time you're accessing any shared state, make sure that all the ways to share the state are synchronized.
It's possible to use lock-less synchronization, but that's a rather advanced subject, and prone to errors. If you're only updating a primitive value, Interlocked might be good enough.
However, don't forget the contracts of the objects you're working with - List's sort is only safe if the items don't change during the sorting. So before starting the sort, you need to ensure noöne is modifying anything that could change the ordering while the sort is underway.
Do you really need those sub-threads? Do they really need to update the list (/items) from a separate thread? Perhaps posting the change to the UI thread would work well enough, while avoiding those multi-threading issues?
Ok, I have read Thread safe collections in .NET and Why lock Thread safe collections?.
The former question being java centered, doesn't answer my question and the answer to later question tells that I don't need to lock the collection because they are supposed to thread-safe. (which is what I thought)
Now coming to my question,
I lot of developers I see, (on github and in my organisation) have started using the new thread-safe collection. However, often they don'tremove the lock around read & write operations.
I don't understand this. Isn't a thread-safe collection ... well, thread-safe completely ?
What could be the implications involved in not locking a thread-safe collection ?
EDIT: PS: here's my case,
I have a lot of classes, and some of them have an attribute on them. Very often I need to check if a given type has that attribute or not (using reflection of course). This could be expensive on performance. So decided to create a cache using a ConcurrentDictionary<string,bool>. string being the typeName and bool specifying if it has the attribute. At First, the cache is empty, the plan was to keep on adding to it as and when required. I came across GetOrAdd() method of ConcurrentDictionary. And my question is about the same, if I should call this method without locking ?
The remarks on MSDN says:
If you call GetOrAdd simultaneously on different threads,
addValueFactory may be called multiple times, but its key/value pair
might not be added to the dictionary for every call.
You should not lock a thread safe collection, it exposes methods to update the collection that are already locked, use them as intended.
The thread safe collection may not match your needs for instance if you want to prevent modification while an enumerator is opened on the collection (the provided thread safe collections allow modifications). If that's the case you'd better use a regular collection and lock it everywhere. The internal locks of the thread safe collections aren't publicly available.
It's hard to answer about implication in not locking a thread-safe collection. You don't need to lock a thread-safe collection but you may have to lock your code that does multiple things. Hard to tell without seeing the code.
Yes the method is thread safe but it might call the AddValueFactory multiple times if you hit an Add for the same key at the same time. In the end only one of the values will be added, the others will be discarded. It might not be an issue... you'll have to check how often you may reach this situation but I think it's not common and you can live with the performance penalty in an edge case that may never occur.
You could also build your dictionnary in a static ctor or before you need it. This way, the dictionnary is filled once and you don't ever write to it. The dictionary is then read only and you don't need any lock neither a thread safe collection.
A method of a class typically changes the object from state A to state B. However, another thread may also change the state of the object during the execution of that method, potentially leaving the object in an instable state.
For instance, a list may want to check if its underlying data buffer is large enough before adding a new item:
void Add(object item)
{
int requiredSpace = Count + 1;
if (buffer.Length < requiredSpace)
{
// increase underlying buffer
}
buffer[Count] = item;
}
Now if a list has buffer space for only one more item, and two threads attempt to add an item at the same time, they may both decide that no additional buffer space is required, potentially causing an IndexOutOfRangeException on one of these threads.
Thread-safe classes ensure that this does not happen.
This does not mean that using a thread-safe class makes your code thread-safe:
int count = myConcurrentCollection.Count;
myCurrentCollection.Add(item);
count++;
if (myConcurrentCollection.Count != count)
{
// some other thread has added or removed an item
}
So although the collection is thread safe, you still need to consider thread-safety for your own code. The enumerator example Guillaume mentioned is a perfect example of where threading issues might occur.
In regards to your comment, the documentation for ConcurrentDictionary mentions:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
So yes these overloads (that take a delegate) are exceptions.
I've already read previous questions here about ConcurrentBag but did not find an actual sample of implementation in multi-threading.
ConcurrentBag is a thread-safe bag implementation, optimized for scenarios where the same thread will be both producing and consuming data stored in the bag."
Currently this is the current usage in my code (this is simplified not actual codes):
private void MyMethod()
{
List<Product> products = GetAllProducts(); // Get list of products
ConcurrentBag<Product> myBag = new ConcurrentBag<Product>();
//products were simply added here in the ConcurrentBag to simplify the code
//actual code process each product before adding in the bag
Parallel.ForEach(
products,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
product => myBag.Add(product));
ProcessBag(myBag); // method to process each items in the concurrentbag
}
My questions:
Is this the right usage of ConcurrentBag? Is it ok to use ConcurrentBag in this kind of scenario?
For me I think a simple List<Product> and a manual lock will do better. The reason for this is that the scenario above already breaks the "same thread will be both producing and consuming data stored in the bag" rule.
Also I also found out that the ThreadLocal storage created in each thread in the parallel will still exist after the operation (even if the thread is reused is this right?) which may cause an undesired memory leak.
Am I right in this one guys? Or a simple clear or empty method to remove the items in the ConcurrentBag is enough?
This looks like an ok use of ConcurrentBag. The thread local variables are members of the bag, and will become eligible for garbage collection at the same time the bag is (clearing the contents won't release them). You are right that a simple List with a lock would suffice for your case. If the work you are doing in the loop is at all significant, the type of thread synchronization won't matter much to the overall performance. In that case, you might be more comfortable using what you are familiar with.
Another option would be to use ParallelEnumerable.Select, which matches what you are trying to do more closely. Again, any performance difference you are going to see is likely going to be negligible and there's nothing wrong with sticking with what you know.
As always, if the performance of this is critical there's no substitute for trying it and measuring.
It seems to me that bmm6o's is not correct. The ConcurrentBag instance internally contains mini-bags for each thread that adds items to it, so item insertion does not involve any thread locks, and thus all Environment.ProcessorCount threads may get into full swing without being stuck waiting and without any thread context switches. A thread sinchronization may require when iterating over the collected items, but again in the original example the iteration is done by a single thread after all insertions are done. Moreover, if the ConcurrentBag uses Interlocked techniques as the first layer of the thread synchronization, then it is possible not to involve Monitor operations at all.
On the other hand, using a usual List<T> instance and wrapping each its Add() method call with a lock keyword will hurt the performance a lot. First, due to the constant Monitor.Enter() and Monitor.Exit() calls that each require to step deep into the kernel mode and to work with Windows synchronization primitives. Secondly, sometimes occasionally one thread may be blocked by the second thread because the second thread has not finished its addition yet.
As for me, the code above is a really good example of the right usage of ConcurrentBag class.
Is this the right usage of ConcurrentBag? Is it ok to use ConcurrentBag in this kind of scenario?
No, for multiple reasons:
This is not the intended usage scenario for this collection. The ConcurrentBag<T> is intended for mixed producer-consumer scenarios, meaning that each thread is expected to add and take items from the bag. Your scenario is nothing like this. You have many threads that add items, and zero threads that take items. The main application for the ConcurrentBag<T> is for making object-pools (pools of reusable objects that are expensive to create or destroy). And given the availability of the ObjectPool<T> class in the Microsoft.Extensions.ObjectPool package, even this niche application for this collection is contested.
It doesn't preserve the insertion order. Even if preserving the insertion order is not important, getting a shuffled output makes the debugging more difficult.
It creates garbage that have to be collected by the GC. It creates one WorkStealingQueue (internal class) per thread, each containing an expandable array, so the more threads you have the more objects you allocate. Also each time it is enumerated it copies all the items in an array, and exposes an IEnumerator<T> GetEnumerator() property that is boxed on each foreach.
There are better options available, offering both better performance and better ordering behavior.
In your scenario you can store the results of the parallel execution in a simple array. Just create an array with length equal to the products.Count, switch from the Parallel.ForEach to the Parallel.For, and assign the result directly to the corresponding slot of the results array without doing any synchronization at all:
List<Product> products = GetAllProducts(); // Get list of products
Product[] results = Product[products.Count];
Parallel.For(0, products.Count,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
i => results[i] = products[i]);
ProcessResults(results);
This way you'll get the results with perfect ordering, stored in a container that has the most compact size and the fastest enumeration of all .NET collections, doing only a single object allocation.
In case you are concerned about the thread-safety of the above operation, there is nothing to worry about. Each thread writes on different slots in the results array. After the completion of the parallel execution the current thread has full visibility of all the values that are stored in the array, because the TPL includes the appropriate barriers when tasks are queued, and at the beginning/end of task execution (citation).
(I have posted more thoughts about the ConcurrentBag<T> in this answer.)
If List<T> is used with a lock around Add() method it will make threads wait and will reduce the performance gain of using Parallel.ForEach()
I have a System.Collections.Generic.List<T> to which I only ever add items in a timer callback. The timer is restarted only after the operation completes.
I have a System.Collections.Concurrent.ConcurrentQueue<T> which stores indices of added items in the list above. This store operation is also always performed in the same timer callback described above.
Is a read operation that iterates the queue and accesses the corresponding items in the list thread safe?
Sample code:
private List<Object> items;
private ConcurrentQueue<int> queue;
private Timer timer;
private void callback(object state)
{
int index = items.Count;
items.Add(new object());
if (true)//some condition here
queue.Enqueue(index);
timer.Change(TimeSpan.FromMilliseconds(500), TimeSpan.FromMilliseconds(-1));
}
//This can be called from any thread
public IEnumerable<object> AccessItems()
{
foreach (var index in queue)
{
yield return items[index];
}
}
My understanding:
Even if the list is resized when it is being indexed, I am only accessing an item that already exists, so it does not matter whether it is read from the old array or the new array. Hence this should be thread-safe.
Is a read operation that iterates the queue and accesses the corresponding items in the list thread safe?
Is it documented as being thread safe?
If no, then it is foolish to treat it as thread safe, even if it is in this implementation by accident. Thread safety should be by design.
Sharing memory across threads is a bad idea in the first place; if you don't do it then you don't have to ask whether the operation is thread safe.
If you have to do it then use a collection designed for shared memory access.
If you can't do that then use a lock. Locks are cheap if uncontended.
If you have a performance problem because your locks are contended all the time then fix that problem by changing your threading architecture rather than trying to do dangerous and foolish things like low-lock code. No one writes low-lock code correctly except for a handful of experts. (I am not one of them; I don't write low-lock code either.)
Even if the list is resized when it is being indexed, I am only accessing an item that already exists, so it does not matter whether it is read from the old array or the new array.
That's the wrong way to think about it. The right way to think about it is:
If the list is resized then the list's internal data structures are being mutated. It is possible that the internal data structure is mutated into an inconsistent form halfway through the mutation, that will be made consistent by the time the mutation is finished. Therefore my reader can see this inconsistent state from another thread, which makes the behaviour of my entire program unpredictable. It could crash, it could go into an infinite loop, it could corrupt other data structures, I don't know, because I'm running code that assumes a consistent state in a world with inconsistent state.
Big edit
The ConcurrentQueue is only safe with regard to the Enqueue(T) and T Dequeue() operations.
You're doing a foreach on it and that doesn't get synchronized at the required level.
The biggest problem in your particular case is the fact the enumerating of the Queue (which is a Collection in it's own right) might throw the wellknown "Collection has been modified" exception. Why is that the biggest problem ? Because you are adding things to the queue after you've added the corresponding objects to the list (there's also a great need for the List to be synchronized but that + the biggest problem get solved with just one "bullet"). While enumerating a collection it is not easy to swallow the fact that another thread is modifying it (even if on a microscopic level the modification is a safe - ConcurrentQueue does just that).
Therefore you absolutely need synchronize the access to the queues (and the central List while you're at it) using another means of synchronization (and by that I mean you can also forget abount ConcurrentQueue and use a simple Queue or even a List since you never Dequeue things).
So just do something like:
public void Writer(object toWrite) {
this.rwLock.EnterWriteLock();
try {
int tailIndex = this.list.Count;
this.list.Add(toWrite);
if (..condition1..)
this.queue1.Enqueue(tailIndex);
if (..condition2..)
this.queue2.Enqueue(tailIndex);
if (..condition3..)
this.queue3.Enqueue(tailIndex);
..etc..
} finally {
this.rwLock.ExitWriteLock();
}
}
and in the AccessItems:
public IEnumerable<object> AccessItems(int queueIndex) {
Queue<object> whichQueue = null;
switch (queueIndex) {
case 1: whichQueue = this.queue1; break;
case 2: whichQueue = this.queue2; break;
case 3: whichQueue = this.queue3; break;
..etc..
default: throw new NotSupportedException("Invalid queue disambiguating params");
}
List<object> results = new List<object>();
this.rwLock.EnterReadLock();
try {
foreach (var index in whichQueue)
results.Add(this.list[index]);
} finally {
this.rwLock.ExitReadLock();
}
return results;
}
And, based on my entire understanding of the cases in which your app accesses the List and the various Queues, it should be 100% safe.
End of big edit
First of all: What is this thing you call Thread-Safe ? by Eric Lippert
In your particular case, I guess the answer is no.
It is not the case that inconsistencies might arrise in the global context (the actual list).
Instead it is possible that the actual readers (who might very well "collide" with the unique writer) end up with inconsistencies in themselves (their very own Stacks meaning: local variables of all methods, parameters and also their logically isolated portion of the heap)).
The possibility of such "per-Thread" inconsistencies (the Nth thread wants to learn the number of elements in the List and finds out that value is 39404999 although in reality you only added 3 values) is enough to declare that, generally speaking that architecture is not thread-safe ( although you don't actually change the globally accessible List, simply by reading it in a flawed manner ).
I suggest you use the ReaderWriterLockSlim class.
I think you will find it fits your needs:
private ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
private List<Object> items;
private ConcurrentQueue<int> queue;
private Timer timer;
private void callback(object state)
{
int index = items.Count;
this.rwLock.EnterWriteLock();
try {
// in this place, right here, there can be only ONE writer
// and while the writer is between EnterWriteLock and ExitWriteLock
// there can exist no readers in the following method (between EnterReadLock
// and ExitReadLock)
// we add the item to the List
// AND do the enqueue "atomically" (as loose a term as thread-safe)
items.Add(new object());
if (true)//some condition here
queue.Enqueue(index);
} finally {
this.rwLock.ExitWriteLock();
}
timer.Change(TimeSpan.FromMilliseconds(500), TimeSpan.FromMilliseconds(-1));
}
//This can be called from any thread
public IEnumerable<object> AccessItems()
{
List<object> results = new List<object>();
this.rwLock.EnterReadLock();
try {
// in this place there can exist a thousand readers
// (doing these actions right here, between EnterReadLock and ExitReadLock)
// all at the same time, but NO writers
foreach (var index in queue)
{
this.results.Add ( this.items[index] );
}
} finally {
this.rwLock.ExitReadLock();
}
return results; // or foreach yield return you like that more :)
}
No because you are reading and writing to/from the same object concurrently. This is not documented to be safe so you can't be sure it is safe. Don't do it.
The fact that it is in fact unsafe as of .NET 4.0 means nothing, btw. Even if it was safe according to Reflector it could change anytime. You can't rely on the current version to predict future versions.
Don't try to get away with tricks like this. Why not just do it in an obviously safe way?
As a side note: Two timer callbacks can execute at the same time, so your code is doubly broken (multiple writers). Don't try to pull off tricks with threads.
It is thread-safish. The foreach statement uses the ConcurrentQueue.GetEnumerator() method. Which promises:
The enumeration represents a moment-in-time snapshot of the contents of the queue. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the queue.
Which is another way of saying that your program isn't going to blow up randomly with an inscrutable exception message like the kind you'll get when you use the Queue class. Beware of the consequences though, implicit in this guarantee is that you may well be looking at a stale version of the queue. Your loop will not be able to see any elements that were added by another thread after your loop started executing. That kind of magic doesn't exist and is impossible to implement in a consistent way. Whether or not that makes your program misbehave is something you will have to think about and can't be guessed from the question. It is pretty rare that you can completely ignore it.
Your usage of the List<> is however utterly unsafe.
I have recently noticed that inside the collection objects contained in System.Collections.Concurrent namespace it is common to see Collection.TrySomeAction() rather then Collection.SomeAction().
What is the cause of this? I assume it has something to do with locking?
So I am wondering under what conditions could an attempt to (for example) Dequeue an item from a stack, queue, bag etc.. fail?
Collections in System.Collections.Concurrent namespace are considered to be thread-safe, so it is possible to use them to write multi-threaded programs that share data between threads.
Before .NET 4, you had to provide your own synchronization mechanisms if multiple threads might be accessing a single shared collection. You had to lock the collection each time you modified its elements. You also might need to lock the collection each time you accessed (or enumerated) it. That’s for the simplest of multi-threaded scenarios. Some applications would create background threads that delivered results to a shared collection over time. Another thread would read and process those results. You needed to implement your own message passing scheme between threads to notify each other when new results were available, and when those new results had been consumed. The classes and interfaces in System.Collections.Concurrent provide a consistent implementation for those and other common multi-threaded programming problems involving shared data across threads in lock-free way.
Try<something> has semantics - try to do that action and return operation result. DoThat semantics usually use exception thrown mechanics to indicate error which can be not efficient. As examples there they can return false,
if you try add new element, you might already have it in ConcurentDictionary;
if you try to get element from collection, it might not exists there;
if you try to update element there are can be more recent element already, so method ensures it updates only element which intended to update.
Try to read:
Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4 - best to start;
Parallel Programming in the .NET Framework;
read articles on Concurrency
Thread-safe Collections in .NET Framework 4 and Their Performance Characteristics
What do you mean with fail?
Take the following example:
var queue = new Queue<string>();
string temp = queue.Dequeue();
// do something with temp
The above code with throw an exception, since we try to dequeue from an empty queue. Now, if you use a ConcurrentQueue<T> instead:
var queue = new ConcurrentQueue<string>();
string temp;
if (queue.TryDequeue(out temp))
{
// do something with temp
}
The above code will not throw an exception. The queue will still fail to dequeue an item, but the code will not fail in the way of throwing an exception. The real use for this becomes apparent in a multithreaded environment. Code for the non-concurrent Queue<T> would typically look something like this:
lock (queueLock)
{
if (queue.Count > 0)
{
string temp = queue.Dequeue();
// do something with temp
}
}
In order to avoid race conditions, we need to use a lock to ensure that nothing happens with the queue in the time that passes from checking Count do calling Dequeue. With ConcurrentQueue<T>, we don't really need to check Count, but can instead call TryDequeue.
If you examine the types found in the Systems.Collections.Concurrent namespace you will find that many of them wrap two operations that are typically called sequentially, and that would traditionally require locking (Count followed by Dequeue in ConcurrentQueue<T>, GetOrAdd in ConcurrentDictionary<TKey, TValue> replaces sequences of calling ContainsKey, adding an item and getting it, and so on).
If there is nothing to be "dequeued", for example... This "Try-Pattern" is used commonly all across FCL and BCL elements. That has nothing to do with locking, concurrent collections are (or at least should be) mostly implemented without locks...