Is it safe to iterate over an unchanging dictionary from multiple threads? - c#

I've got code like this that's executed from many threads simultaneously (over shared a and b objects of type Dictionary<int, double>):
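foreach (var key in a.Keys.Union(b.Keys)) {
    a.TryGetValue(key, out double av); // a key present in only one dictionary reads as 0
    b.TryGetValue(key, out double bv);
    dist += Math.Pow(bv - av, 2);
}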
The dictionaries don't change during the lifetime of the threads. Is this safe? So far, it seems OK, but I wanted to be sure.

From the dictionary documentation:
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
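As long as no thread ever writes to either dictionary, this is safe. Each call to a.Keys.Union(b.Keys) gets its own enumerator, so the threads share no iteration state. The dictionaries must be fully built before the threads start, though.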
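A minimal sketch of that read-only pattern, assuming both dictionaries are completely populated on one thread before any worker starts and are never modified afterwards. The `Distance` helper mirrors the question's loop; the sample data and treating a missing key as 0 are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Build both dictionaries completely *before* any worker starts; never write to them again.
var a = new Dictionary<int, double> { [1] = 1.0, [2] = 2.0, [3] = 3.0 };
var b = new Dictionary<int, double> { [2] = 4.0, [3] = 3.0, [4] = 1.0 };

// Read-only access: every call creates its own enumerator, so threads share no state.
static double Distance(IReadOnlyDictionary<int, double> x, IReadOnlyDictionary<int, double> y)
{
    double dist = 0; // local accumulator, never shared between threads
    foreach (var key in x.Keys.Union(y.Keys))
    {
        x.TryGetValue(key, out double xv); // a missing key reads as 0
        y.TryGetValue(key, out double yv);
        dist += Math.Pow(yv - xv, 2);
    }
    return dist;
}

// Many threads read the same unchanging dictionaries at the same time.
double[] results = new double[100];
Parallel.For(0, results.Length, i => results[i] = Distance(a, b));
```

Typing the parameters as `IReadOnlyDictionary<,>` does not make the data immutable, but it keeps the worker code from writing by accident.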

Only if you guarantee that no writes occur
A Dictionary(Of TKey, TValue) can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For a thread-safe alternative, see ConcurrentDictionary(Of TKey, TValue).
Public static (Shared in Visual Basic) members of this type are thread safe.
Sources
http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
http://msdn.microsoft.com/en-us/library/dd287191.aspx
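One way to make the "no writes" guarantee explicit (a sketch, not something the quoted documentation prescribes) is to hand readers an immutable copy. Note that `ReadOnlyDictionary<TKey,TValue>` is only a read-only *view*: writes made through the original dictionary still show through it. `ToImmutableDictionary()` from System.Collections.Immutable takes a copy that nothing can modify. The sample data is illustrative.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Collections.ObjectModel;

var source = new Dictionary<int, double> { [1] = 1.0 };

// A read-only *view*: readers can't write through it, but it still wraps `source`.
var view = new ReadOnlyDictionary<int, double>(source);

// An immutable *copy*: later changes to `source` can never reach it.
ImmutableDictionary<int, double> frozen = source.ToImmutableDictionary();

// Someone who still holds the original reference modifies it...
source[2] = 2.0;

// ...the view sees the write (so readers of the view are NOT protected from it),
// while the immutable copy is unaffected.
bool viewSeesWrite = view.ContainsKey(2);   // true
bool copySeesWrite = frozen.ContainsKey(2); // false
```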

Related

Migrating from Dictionary to ConcurrentDictionary, what are the common traps that I should be aware of?

I am looking at migrating from Dictionary to ConcurrentDictionary for a multithreaded environment.
In my use case, a key/value pair would typically be <string, List<T>>.
What do I need to look out for?
How do I implement it correctly for thread safety?
How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?
What do I need to look out for?
Depends on what you are trying to achieve :)
How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?
The dictionary itself handles those in a thread-safe manner, with several caveats:
Methods accepting factories, like ConcurrentDictionary<TKey,TValue>.GetOrAdd(TKey, Func<TKey,TValue>), are not thread-safe in terms of factory invocation: the dictionary does not guarantee, for example, that the factory will be invoked only once if multiple threads try to get or add the same item. Quote from the docs:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary<TKey,TValue> class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
In your particular case the value, List<T>, is not thread-safe itself. So while dictionary operations will be thread-safe (with the exception from the previous point), mutating operations on the value itself will not be; consider using something like ConcurrentBag<T> or switching to IReadOnlyDictionary.
Personally, I would be cautious when working with a concurrent dictionary through explicitly implemented interfaces like IDictionary<TKey, TValue> and/or through the indexer, since this can lead to race conditions in read-update-write scenarios.
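A sketch of two widely used workarounds for the first two caveats (common idioms, not something the quoted docs prescribe). Values are wrapped in `Lazy<T>`, so even if `GetOrAdd` creates several wrappers, only the stored one is ever evaluated, and `Lazy<T>`'s default thread-safety mode runs the expensive factory once. The value collection is a `ConcurrentBag<T>` instead of a `List<T>`. The key names, the `Expensive` helper and the loop counts are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

int factoryCalls = 0;

// Caveat 1: GetOrAdd may run its delegate more than once under contention.
// Losing Lazy wrappers are discarded without ever touching .Value.
var cache = new ConcurrentDictionary<string, Lazy<int>>();
int Expensive(string key)
{
    Interlocked.Increment(ref factoryCalls);
    return key.Length; // stand-in for real work
}
Parallel.For(0, 1000, i =>
{
    _ = cache.GetOrAdd("answer", k => new Lazy<int>(() => Expensive(k))).Value;
});

// Caveat 2: the dictionary is thread-safe, but a List<T> value is not.
// A thread-safe value collection (or a lock around every list access) is needed.
var groups = new ConcurrentDictionary<string, ConcurrentBag<int>>();
Parallel.For(0, 1000, i =>
{
    groups.GetOrAdd(i % 2 == 0 ? "even" : "odd", _ => new ConcurrentBag<int>()).Add(i);
});
```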
The ConcurrentDictionary<TKey,TValue> collection is surprisingly difficult to master. The pitfalls that are waiting to trap the unwary are numerous and subtle. Here are some of them:
Giving the impression that the ConcurrentDictionary<TKey,TValue> blesses everything it contains with thread-safety. That's not true. If the TValue is a mutable class and is allowed to be mutated by multiple threads, it can be corrupted just as easily as if it weren't contained in the dictionary.
Using the ConcurrentDictionary<TKey,TValue> with patterns familiar from the Dictionary<TKey,TValue>. Race conditions can trivially emerge. For example, if (dict.ContainsKey(x)) list = dict[x] is wrong. In a multithreaded environment it is entirely possible that the key x will be removed between the dict.ContainsKey(x) and the list = dict[x], resulting in a KeyNotFoundException. The ConcurrentDictionary<TKey,TValue> is equipped with special atomic APIs (such as TryGetValue) that should be used instead of the chatty check-then-act pattern.
Using Count == 0 to check whether the dictionary is empty. The Count property is very cheap for a Dictionary<TKey,TValue>, and very expensive for a ConcurrentDictionary<TKey,TValue>. The correct property to use is IsEmpty.
Assuming that the AddOrUpdate method can be safely used for updating a mutable TValue object. This is not a correct assumption. The "Update" in the name of the method means "update the dictionary, by replacing an existing value with a new value". It doesn't mean "modify an existing value".
Assuming that enumerating a ConcurrentDictionary<TKey,TValue> will yield the entries that were stored in the dictionary at the point in time that the enumeration started. That's not true. The enumerator does not maintain a snapshot of the dictionary. The behavior of the enumerator is not documented precisely. It's not even guaranteed that a single enumeration of a ConcurrentDictionary<TKey,TValue> will yield unique keys. In case you want to do an enumeration with snapshot semantics you must first take a snapshot explicitly with the (expensive) ToArray method, and then enumerate the snapshot. You might even consider switching to an ImmutableDictionary<TKey,TValue>, which is exceptionally good at providing these semantics.
Assuming that calling extension methods on the ConcurrentDictionary<TKey,TValue>'s interfaces is safe. This is not the case. For example, the ToArray method is safe because it's a native method of the class. ToList is not safe, because it is a LINQ extension method on the IEnumerable<KeyValuePair<TKey,TValue>> interface. This method internally first calls the Count property of the ICollection<KeyValuePair<TKey,TValue>> interface, and then the CopyTo of the same interface. In a multithreaded environment the Count obtained by the first operation might not be compatible with the second operation, resulting in either an ArgumentException or a list that contains empty elements at the end.
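A sketch pulling together the safer alternatives named in the list: `TryGetValue` instead of check-then-act, `AddOrUpdate` with an immutable value, `IsEmpty` instead of `Count == 0`, and an explicit `ToArray()` snapshot before enumerating. Using `ImmutableList<T>` (System.Collections.Immutable) as the value type is my choice, not the answer's; the key "x" and the loop count are made up.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading.Tasks;

var dict = new ConcurrentDictionary<string, ImmutableList<int>>();

// Instead of ContainsKey followed by the indexer (a race), one atomic call:
// either we get the value, or we learn that it is absent.
bool found = dict.TryGetValue("x", out var existing);

// "Update" means "replace the stored value". With an immutable value the update
// delegate builds a *new* list and never mutates the old one, so if AddOrUpdate
// has to retry (calling the delegate again), nothing gets corrupted.
Parallel.For(0, 1000, i =>
{
    dict.AddOrUpdate("x",
        _ => ImmutableList.Create(i),
        (_, old) => old.Add(i));
});

// The cheap emptiness check.
bool empty = dict.IsEmpty;

// Snapshot semantics: ToArray() is the class's own method (not LINQ's), so it
// copies atomically; enumerate the copy rather than the live dictionary.
KeyValuePair<string, ImmutableList<int>>[] snapshot = dict.ToArray();
```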
In conclusion, migrating from a Dictionary<TKey,TValue> to a ConcurrentDictionary<TKey,TValue> is not trivial. In many scenarios sticking with the Dictionary<TKey,TValue> and adding synchronization around it might be an easier (and safer) path to thread-safety. IMHO the ConcurrentDictionary<TKey,TValue> should be considered more as a performance-optimization over a synchronized Dictionary<TKey,TValue>, than as the tool of choice when a dictionary is needed in a multithreading scenario.
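A minimal sketch of the "Dictionary plus synchronization" route. The single `gate` lock object and the `Increment`/`Snapshot` helpers are illustrative, not from the answer. One coarse lock is the simplest correct option, at the cost of serializing every access.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// A plain Dictionary guarded by one lock. Every access, reads included, takes the
// lock, so compound operations (read, then write) are atomic.
var counts = new Dictionary<string, int>();
var gate = new object();

void Increment(string key)
{
    lock (gate)
    {
        counts.TryGetValue(key, out int current); // 0 if the key is missing
        counts[key] = current + 1;
    }
}

List<KeyValuePair<string, int>> Snapshot()
{
    lock (gate)
    {
        // Copy while holding the lock so the caller can enumerate without it.
        return new List<KeyValuePair<string, int>>(counts);
    }
}

Parallel.For(0, 10_000, i => Increment(i % 2 == 0 ? "even" : "odd"));
```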

When to lock a thread-safe collection in .NET? (And when not to lock?)

OK, I have read "Thread safe collections in .NET" and "Why lock Thread safe collections?".
The former question is Java-centered and doesn't answer my question, and the answer to the latter tells me that I don't need to lock the collection because these collections are supposed to be thread-safe (which is what I thought).
Now, coming to my question:
A lot of developers I see (on GitHub and in my organisation) have started using the new thread-safe collections. However, they often don't remove the lock around read and write operations.
I don't understand this. Isn't a thread-safe collection... well, completely thread-safe?
What could be the implications of not locking a thread-safe collection?
EDIT: here's my case:
I have a lot of classes, and some of them have an attribute on them. Very often I need to check whether a given type has that attribute or not (using reflection, of course), which could be expensive. So I decided to create a cache using a ConcurrentDictionary<string,bool>, with the string being the type name and the bool specifying whether it has the attribute. At first the cache is empty; the plan was to keep adding to it as and when required. I came across the GetOrAdd() method of ConcurrentDictionary, and my question is about that: should I call this method without locking?
The remarks on MSDN say:
If you call GetOrAdd simultaneously on different threads, addValueFactory may be called multiple times, but its key/value pair might not be added to the dictionary for every call.
You should not lock a thread-safe collection: it exposes methods that update the collection under its own internal locks, so use them as intended.
A thread-safe collection may not match your needs, for instance if you want to prevent modification while an enumerator is open on the collection (the provided thread-safe collections allow modifications during enumeration). If that's the case, you'd better use a regular collection and lock it everywhere. The internal locks of the thread-safe collections aren't publicly available.
It's hard to say what the implications of not locking a thread-safe collection are. You don't need to lock a thread-safe collection, but you may have to lock your own code when it does several things in sequence. Hard to tell without seeing the code.
Yes, the method is thread-safe, but it might call addValueFactory multiple times if two threads add the same key at the same time. In the end only one of the values will be added; the others will be discarded. That might not be an issue: you'll have to check how often you may hit this situation, but I think it's uncommon, and you can live with the performance penalty in an edge case that may never occur.
You could also build your dictionary in a static constructor, or before you need it. This way the dictionary is filled once and you never write to it again. The dictionary is then read-only, and you need neither a lock nor a thread-safe collection.
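A sketch of the asker's attribute cache using `GetOrAdd` with no extra lock. Two assumptions: the cache is keyed by `Type` rather than by a type-name string (so the factory can inspect the type directly), and the framework's `FlagsAttribute` stands in for the asker's own attribute. The factory may run more than once for the same type under contention, which is harmless here because it always returns the same answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Cache: Type -> "does it carry the attribute?"
var hasFlags = new ConcurrentDictionary<Type, bool>();

// No lock needed: GetOrAdd is thread-safe, and only one result per key is stored.
// A duplicate reflection call under contention just wastes a little work.
bool HasFlagsAttribute(Type type) =>
    hasFlags.GetOrAdd(type, t => t.IsDefined(typeof(FlagsAttribute), inherit: false));

// AttributeTargets is declared with [Flags]; DayOfWeek is not.
Type[] types = { typeof(AttributeTargets), typeof(DayOfWeek) };
Parallel.For(0, 1000, i => HasFlagsAttribute(types[i % types.Length]));
```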
A method of a class typically changes the object from state A to state B. However, another thread may also change the state of the object during the execution of that method, potentially leaving the object in an inconsistent state.
For instance, a list may want to check whether its underlying data buffer is large enough before adding a new item:
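void Add(object item)
{
    int requiredSpace = Count + 1;
    if (buffer.Length < requiredSpace)
    {
        // increase underlying buffer (here: at least double it)
        Array.Resize(ref buffer, Math.Max(requiredSpace, buffer.Length * 2));
    }
    buffer[Count] = item;
    Count++; // record the new item
}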
Now if a list has buffer space for only one more item, and two threads attempt to add an item at the same time, they may both decide that no additional buffer space is required, potentially causing an IndexOutOfRangeException on one of these threads.
Thread-safe classes ensure that this does not happen.
This does not mean that using a thread-safe class makes your code thread-safe:
int count = myConcurrentCollection.Count;
myConcurrentCollection.Add(item);
count++;
if (myConcurrentCollection.Count != count)
{
// some other thread has added or removed an item
}
So although the collection is thread safe, you still need to consider thread-safety for your own code. The enumerator example Guillaume mentioned is a perfect example of where threading issues might occur.
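A small sketch of "thread-safe collection, but your own code still needs a lock": a hypothetical bounded queue. `ConcurrentQueue<T>` with its `Count` and `Enqueue` members is real; the `TryEnqueueBounded` helper and the limit of 100 are illustrative.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

const int Limit = 100;
var queue = new ConcurrentQueue<int>();
var gate = new object();

// Count and Enqueue are each thread-safe, but "check Count, then Enqueue" is two
// calls: without the lock, several threads could pass the check together and
// push the queue past Limit.
bool TryEnqueueBounded(int item)
{
    lock (gate)
    {
        if (queue.Count >= Limit) return false;
        queue.Enqueue(item);
        return true;
    }
}

Parallel.For(0, 1000, i => TryEnqueueBounded(i));
```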
Regarding your comment, the documentation for ConcurrentDictionary mentions:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
So yes, these overloads (the ones that take a delegate) are exceptions.

Are C# primitive arrays volatile?

I declare an array like this: private double[] array = new double[length]. Is it safe to update this array's items in one thread and read them in another thread? Will I see up-to-date values?
Note: I do not enumerate the array. I only access its items by index.
Arrays are not thread-safe. From MSDN:
Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.
If you only update single items at a time, I think you will be safe, but I would not trust it unless I found documentation that proves it.
Volatile does not guarantee freshness of a value. It prevents some optimizations, but does not guarantee thread synchronization.
Double is not guaranteed to be updated atomically: the CLI only guarantees atomic reads and writes for values no larger than the native word size, so in a 32-bit process a double can be torn. Updating and reading arrays of doubles without synchronization is therefore not thread-safe at all, with or without volatile, because you may read partially written values.
No, they are not. You should design your locking scheme with semaphores or other mechanisms to ensure thread safety. You can look at the producer/consumer problem.
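Following the "use locking" advice, a minimal sketch assuming one lock object for the whole array is acceptable for your access pattern; the `Write`/`Read` helpers are illustrative. The lock makes each element read and write atomic (no torn doubles) and guarantees the reader sees the latest completed write.

```csharp
using System.Threading.Tasks;

var values = new double[4];
var gate = new object();

// Every access to the shared array goes through the same lock.
void Write(int index, double value) { lock (gate) { values[index] = value; } }
double Read(int index) { lock (gate) { return values[index]; } }

// Writer on one thread...
Task writer = Task.Run(() => { for (int i = 0; i < values.Length; i++) Write(i, i * 1.5); });
writer.Wait();

// ...reader on another.
double last = Task.Run(() => Read(3)).Result;
```

For single elements, `Interlocked.Exchange(ref values[i], v)` and `Interlocked.CompareExchange(ref values[i], 0.0, 0.0)` are a lock-free alternative for atomic writes and reads, though they don't help when several elements must be updated together.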

Is accessing the Dictionary<TKey, TValue> Keys property thread safe?

For example, can I do this:
string[] keys = new string[items.Count];
items.Keys.CopyTo(keys, 0);
Where items is a:
Dictionary<string, MyObject>
instance that could be modified by another thread at the same time? (It's not clear to me whether, when the Keys property is accessed, the inner workings of Dictionary iterate over the collection, in which case I know it would not be thread-safe.)
Update: I realise my example above is not thread-safe because the size of the Dictionary could change between the lines. However, what about:
Dictionary<string, MyObject>.KeyCollection keys = items.Keys;
string[] copyOfKeys = new string[keys.Count];
keys.CopyTo(copyOfKeys, 0);
No, Dictionary is not thread-safe. If you need thread safety and you're using .NET 4 or higher, you should use a ConcurrentDictionary.
From MSDN:
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
If the dictionary is being mutated on another thread it is not safe to do this. The first thing that comes to mind is that the dictionary might resize its internal buffer which surely does not play well with concurrent iteration.
If you aren't changing the keys and are only writing values, this should be safe in practice.
Anyway, I wouldn't put such code into production because the risk is too high. What if some .NET patch changes the internals of Dictionary and suddenly you have a bug. There are other concerns as well.
I recommend you use ConcurrentDictionary or some other safe strategy. Why gamble with the correctness of your code?
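A sketch of the ConcurrentDictionary route, assuming a key snapshot that may already be slightly out of date is acceptable. In ConcurrentDictionary, `Keys` returns a copy of the keys taken while holding the dictionary's internal locks, so `Count` and `CopyTo` on that copy always agree, unlike the two-step sequence in the question. The writer loop and key names are made up for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var items = new ConcurrentDictionary<string, int>();

// A writer keeps adding keys...
Task writer = Task.Run(() =>
{
    for (int i = 0; i < 10_000; i++) items[$"key{i}"] = i;
});

// ...while this thread repeatedly copies them.
while (!writer.IsCompleted)
{
    // `keys` is a snapshot: the writer cannot change it between Count and CopyTo.
    ICollection<string> keys = items.Keys;
    string[] copyOfKeys = new string[keys.Count];
    keys.CopyTo(copyOfKeys, 0);
}
writer.Wait();
```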

Thread safety with Dictionary<int,int> in .Net

I have this function:
static Dictionary<int, int> KeyValueDictionary = new Dictionary<int, int>();
static void IncreaseValue(int keyId, int adjustment)
{
if (!KeyValueDictionary.ContainsKey(keyId))
{
KeyValueDictionary.Add(keyId, 0);
}
KeyValueDictionary[keyId] += adjustment;
}
Which I would have thought would not be thread safe. However, so far in testing it I have not seen any exceptions when calling it from multiple threads at the same time.
My questions: Is it thread safe or have I just been lucky so far? If it is thread safe then why?
However, so far in testing it I have not seen any exceptions when calling it from multiple threads at the same time.
Is it thread safe or have I just been lucky so far? If it is thread safe then why?
You're getting lucky. These kinds of threading bugs are easy to make, because testing can give you a false sense of security that you did things correctly.
It turns out that Dictionary<TKey, TValue> is not thread-safe when you have multiple writers. The documentation explicitly states:
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
Alternatively, use ConcurrentDictionary. However, you still must write correct code (see note below).
In addition to the lack of thread-safety with Dictionary<TKey, TValue> which you've been lucky to avoid, your code is dangerously flawed. Here's how you can get a bug with your code:
static void IncreaseValue(int keyId, int adjustment) {
if (!KeyValueDictionary.ContainsKey(keyId)) {
// A
KeyValueDictionary.Add(keyId, 0);
}
KeyValueDictionary[keyId] += adjustment;
}
Dictionary is empty.
Thread 1 enters the method with keyId = 17. As the Dictionary is empty, the conditional in the if returns true and thread 1 reaches the line of code marked A.
Thread 1 is paused and thread 2 enters the method with keyId = 17. As the Dictionary is empty, the conditional in the if returns true and thread 2 reaches the line of code marked A.
Thread 2 is paused and thread 1 resumes. Now thread 1 adds (17, 0) to the dictionary.
Thread 1 is paused and now thread 2 resumes. Now thread 2 tries to add (17, 0) to the dictionary. An exception is thrown because of a key violation.
There are other ways for this code to go wrong, even without an exception. For example, thread 1 could be paused while loading the value of KeyValueDictionary[keyId] (say it loads keyId = 17 and obtains the value 42); thread 2 could come in and modify the value (say it loads keyId = 17 and adds the adjustment 27); and now thread 1 resumes and adds its adjustment to the value it loaded (in particular, it doesn't see the modification that thread 2 made to the value associated with keyId = 17!), so thread 2's update is lost.
Note that even using a ConcurrentDictionary<TKey, TValue> could lead to the above bugs! Your code is NOT safe for reasons not related to the thread-safety or lack thereof for Dictionary<TKey, TValue>.
To get your code to be thread-safe with a concurrent dictionary, you'll have to say:
KeyValueDictionary.AddOrUpdate(keyId, adjustment, (key, value) => value + adjustment);
Here we are using ConcurrentDictionary.AddOrUpdate.
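A runnable sketch of that fix, reusing the question's `IncreaseValue` name on a ConcurrentDictionary; the key 17 and the 10,000 concurrent calls are illustrative. If two threads race on the same key, AddOrUpdate retries the losing update against the fresh value, so no adjustment is lost.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

var keyValueDictionary = new ConcurrentDictionary<int, int>();

// Atomic "insert adjustment if the key is missing, else add adjustment to the current value".
// The update delegate may run more than once under contention, but only one result
// per attempt is stored, so every call is counted exactly once.
void IncreaseValue(int keyId, int adjustment) =>
    keyValueDictionary.AddOrUpdate(keyId, adjustment, (key, value) => value + adjustment);

// 10,000 concurrent increments of the same key.
Parallel.For(0, 10_000, _ => IncreaseValue(17, 1));
```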
It's not thread-safe, but it doesn't check for concurrent access, so it probably won't notice silent corruption.
It will appear to be thread-safe for a long time, because only when it needs to resize (rehash) its internal storage does it have even a chance of throwing an exception. Otherwise, it just corrupts data.
The .NET library has a thread-safe dictionary, ConcurrentDictionary<TKey, TValue>: http://msdn.microsoft.com/en-us/library/dd287191.aspx
Update: I didn't exactly answer the question, so here is a more direct answer to the exact question posed.
As per MSDN (http://msdn.microsoft.com/en-us/library/xfhwa508.aspx):
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For a thread-safe alternative, see ConcurrentDictionary.
Public static (Shared in Visual Basic) members of this type are thread safe.
You've just been lucky so far. It's not thread-safe.
From the Dictionary<K,V> documentation...
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
