For example, can I do the following:
string[] keys = new string[items.Count];
items.Keys.CopyTo(keys, 0);
Where items is a:
Dictionary<string, MyObject>
instance that could be being modified by another thread? (It's not clear to me whether accessing the Keys property causes the inner workings of Dictionary to iterate the collection - in which case I know it would not be thread safe.)
Update: I realise my above example is not thread safe because the size of the Dictionary could change between the lines. However, what about:
Dictionary<string, MyObject>.KeyCollection keys = items.Keys;
string[] copyOfKeys = new string[keys.Count];
keys.CopyTo(copyOfKeys, 0);
No, Dictionary is not thread-safe. If you need thread safety and you're using .NET 4 or higher, you should use a ConcurrentDictionary.
From MSDN:
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
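For illustration, here is a minimal sketch (not part of the original answer) of copying the keys out of a ConcurrentDictionary; MyObject stands in for the value type from the question. Because the Keys property of ConcurrentDictionary returns a snapshot taken under its internal locks, Count and CopyTo operate on a consistent view even while other threads write:
using System.Collections.Concurrent;
using System.Collections.Generic;

class MyObject { } // stand-in for the value type in the question

class KeySnapshotExample
{
    static readonly ConcurrentDictionary<string, MyObject> items =
        new ConcurrentDictionary<string, MyObject>();

    static string[] CopyKeys()
    {
        // Keys is a snapshot, so its Count cannot drift from its contents.
        ICollection<string> snapshot = items.Keys;
        string[] copyOfKeys = new string[snapshot.Count];
        snapshot.CopyTo(copyOfKeys, 0);
        return copyOfKeys;
    }
}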
If the dictionary is being mutated on another thread it is not safe to do this. The first thing that comes to mind is that the dictionary might resize its internal buffer which surely does not play well with concurrent iteration.
If you aren't changing the keys and are only writing values this should be safe in practice.
Anyway, I wouldn't put such code into production because the risk is too high. What if some .NET patch changes the internals of Dictionary and suddenly you have a bug? There are other concerns as well.
I recommend you use ConcurrentDictionary or some other safe strategy. Why gamble with the correctness of your code?
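If you stay with a plain Dictionary, the "other safe strategy" is to guard every access with the same lock. A minimal sketch (class and member names are mine; MyObject is the value type from the question):
using System.Collections.Generic;

class MyObject { } // stand-in for the value type in the question

class LockedKeyCopyExample
{
    private readonly object gate = new object();
    private readonly Dictionary<string, MyObject> items = new Dictionary<string, MyObject>();

    public void Add(string key, MyObject value)
    {
        lock (gate) { items.Add(key, value); }
    }

    public string[] CopyKeys()
    {
        lock (gate)
        {
            // Count and CopyTo run under the same lock, so the size cannot
            // change between the two calls.
            string[] copy = new string[items.Count];
            items.Keys.CopyTo(copy, 0);
            return copy;
        }
    }
}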
Related
I am looking at migrating from Dictionary to ConcurrentDictionary for a multi thread environment.
Specific to my use case, a kvp would typically be <string, List<T>>
What do I need to look out for?
How do I implement successfully for thread safety?
How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?
What do I need to look out for?
Depends on what you are trying to achieve :)
How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?
Those should be handled by the dictionary itself in a thread-safe manner, with several caveats:
Functions accepting factories like ConcurrentDictionary<TKey,TValue>.GetOrAdd(TKey, Func<TKey,TValue>) are not thread safe in terms of factory invocation (i.e. the dictionary does not guarantee that the factory will be invoked only once if multiple threads try to get or add the item). Quote from the docs:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary<TKey,TValue> class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
In your particular case the value - List<T> - is not itself thread safe, so while the dictionary operations will be thread safe (with the exception from the previous point), mutating operations on the value itself will not be. Consider using something like ConcurrentBag<T> or switching to IReadOnlyDictionary (see the sketch after this list).
Personally I would be cautious about working with a concurrent dictionary via explicitly implemented interfaces like IDictionary<TKey, TValue> and/or the indexer (this can lead to race conditions in read-update-write scenarios).
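As an illustration of those caveats, here is a minimal sketch (names are mine, not from the question) that keeps both the dictionary and the per-key collection safe by using ConcurrentBag<T> as the value type and the atomic GetOrAdd instead of the indexer:
using System.Collections.Concurrent;

class ValueSafetyExample
{
    private readonly ConcurrentDictionary<string, ConcurrentBag<int>> data =
        new ConcurrentDictionary<string, ConcurrentBag<int>>();

    public void Append(string key, int item)
    {
        // GetOrAdd is atomic with respect to the dictionary itself. The factory
        // may run more than once under contention, but only one bag is stored
        // and every caller gets that same instance.
        ConcurrentBag<int> bag = data.GetOrAdd(key, _ => new ConcurrentBag<int>());
        bag.Add(item);
    }
}
If the values really must stay List<T>, the alternative is to lock each list around every read and mutation, which quickly negates much of the benefit of the concurrent dictionary.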
The ConcurrentDictionary<TKey,TValue> collection is surprisingly difficult to master. The pitfalls that are waiting to trap the unwary are numerous and subtle. Here are some of them:
Assuming that the ConcurrentDictionary<TKey,TValue> blesses everything it contains with thread-safety. That's not true. If the TValue is a mutable class and is allowed to be mutated by multiple threads, it can be corrupted just as easily as if it weren't contained in the dictionary.
Using the ConcurrentDictionary<TKey,TValue> with patterns familiar from the Dictionary<TKey,TValue>. Race conditions can trivially emerge. For example if (dict.ContainsKey(x)) list = dict[x] is wrong. In a multithreaded environment it is entirely possible that the key x will be removed between the dict.ContainsKey(x) and the list = dict[x], resulting in a KeyNotFoundException. The ConcurrentDictionary<TKey,TValue> is equipped with special atomic APIs that should be used instead of this chatty check-then-act pattern (a sketch contrasting the two follows this list).
Using Count == 0 to check whether the dictionary is empty. The Count property is very cheap for a Dictionary<TKey,TValue>, and very expensive for a ConcurrentDictionary<TKey,TValue>. The correct property to use is IsEmpty.
Assuming that the AddOrUpdate method can be safely used for updating a mutable TValue object. This is not a correct assumption. The "Update" in the name of the method means "update the dictionary, by replacing an existing value with a new value". It doesn't mean "modify an existing value".
Assuming that enumerating a ConcurrentDictionary<TKey,TValue> will yield the entries that were stored in the dictionary at the point in time that the enumeration started. That's not true. The enumerator does not maintain a snapshot of the dictionary. The behavior of the enumerator is not documented precisely. It's not even guaranteed that a single enumeration of a ConcurrentDictionary<TKey,TValue> will yield unique keys. In case you want to do an enumeration with snapshot semantics you must first take a snapshot explicitly with the (expensive) ToArray method, and then enumerate the snapshot. You might even consider switching to an ImmutableDictionary<TKey,TValue>, which is exceptionally good at providing these semantics.
Assuming that calling extension methods on the interfaces of a ConcurrentDictionary<TKey,TValue> is safe. This is not the case. For example the ToArray method is safe because it's a native method of the class. ToList is not safe because it is a LINQ extension method on the IEnumerable<KeyValuePair<TKey,TValue>> interface. This method internally first calls the Count property of the ICollection<KeyValuePair<TKey,TValue>> interface, and then the CopyTo of the same interface. In a multithreaded environment the Count obtained by the first operation might not be compatible with the second operation, resulting in either an ArgumentException or a list that contains empty elements at the end.
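Here is the check-then-act pitfall from the second point, contrasted with the atomic API, as a minimal sketch (types and names are illustrative):
using System.Collections.Concurrent;
using System.Collections.Generic;

class CheckThenActExample
{
    private readonly ConcurrentDictionary<string, List<int>> dict =
        new ConcurrentDictionary<string, List<int>>();

    public List<int> WrongLookup(string x)
    {
        // Race: another thread can remove x between ContainsKey and the indexer,
        // and the indexer will then throw KeyNotFoundException.
        if (dict.ContainsKey(x))
            return dict[x];
        return null;
    }

    public List<int> SaferLookup(string x)
    {
        // TryGetValue is a single atomic read: it either returns the value that
        // was present at that instant or reports that the key was absent.
        return dict.TryGetValue(x, out List<int> list) ? list : null;
    }
}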
In conclusion, migrating from a Dictionary<TKey,TValue> to a ConcurrentDictionary<TKey,TValue> is not trivial. In many scenarios sticking with the Dictionary<TKey,TValue> and adding synchronization around it might be an easier (and safer) path to thread-safety. IMHO the ConcurrentDictionary<TKey,TValue> should be considered more as a performance-optimization over a synchronized Dictionary<TKey,TValue>, than as the tool of choice when a dictionary is needed in a multithreading scenario.
Ok, I have read Thread safe collections in .NET and Why lock Thread safe collections?.
The former question, being Java-centered, doesn't answer my question, and the answer to the latter question says that I don't need to lock the collection because it is supposed to be thread-safe (which is what I thought).
Now coming to my question,
A lot of developers I see (on GitHub and in my organisation) have started using the new thread-safe collections. However, they often don't remove the lock around read & write operations.
I don't understand this. Isn't a thread-safe collection... well, completely thread-safe?
What could be the implications of not locking a thread-safe collection?
EDIT: PS: here's my case,
I have a lot of classes, and some of them have an attribute on them. Very often I need to check whether a given type has that attribute or not (using reflection, of course). This could be expensive performance-wise, so I decided to create a cache using a ConcurrentDictionary<string, bool>, string being the type name and bool specifying whether it has the attribute. At first the cache is empty; the plan was to keep adding to it as and when required. I came across the GetOrAdd() method of ConcurrentDictionary, and my question is about the same: should I call this method without locking?
The remarks on MSDN says:
If you call GetOrAdd simultaneously on different threads, addValueFactory may be called multiple times, but its key/value pair might not be added to the dictionary for every call.
You should not lock a thread-safe collection; it exposes methods to update the collection that already synchronize internally. Use them as intended.
The thread-safe collection may not match your needs, for instance if you want to prevent modification while an enumerator is open on the collection (the provided thread-safe collections allow modifications during enumeration). If that's the case, you'd be better off using a regular collection and locking it everywhere. The internal locks of the thread-safe collections aren't publicly available.
It's hard to answer about the implications of not locking a thread-safe collection. You don't need to lock a thread-safe collection, but you may have to lock your own code that does multiple things. It's hard to tell without seeing the code.
Yes, the method is thread safe, but it might call addValueFactory multiple times if you hit an add for the same key at the same time. In the end only one of the values will be added; the others will be discarded. It might not be an issue: you'll have to check how often you may reach this situation, but I think it's not common and you can live with the performance penalty in an edge case that may never occur.
You could also build your dictionary in a static constructor or before you need it. That way the dictionary is filled once and you never write to it afterwards. The dictionary is then read-only and you need neither a lock nor a thread-safe collection.
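For the attribute cache described in the question, a minimal sketch of the lock-free GetOrAdd approach (the attribute name is a placeholder, and keying by Type instead of the type-name string is my assumption):
using System;
using System.Collections.Concurrent;
using System.Reflection;

[AttributeUsage(AttributeTargets.Class)]
sealed class MyAttribute : Attribute { } // placeholder for the attribute in the question

static class AttributeCache
{
    private static readonly ConcurrentDictionary<Type, bool> cache =
        new ConcurrentDictionary<Type, bool>();

    public static bool HasMyAttribute(Type type)
    {
        // No external lock is needed. Under contention the reflection lookup may
        // run more than once for the same type, but it always produces the same
        // answer, so the duplicate work is harmless.
        return cache.GetOrAdd(type, t => t.GetCustomAttribute<MyAttribute>() != null);
    }
}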
A method of a class typically changes the object from state A to state B. However, another thread may also change the state of the object during the execution of that method, potentially leaving the object in an inconsistent state.
For instance, a list may want to check if its underlying data buffer is large enough before adding a new item:
void Add(object item)
{
    int requiredSpace = Count + 1;
    if (buffer.Length < requiredSpace)
    {
        // allocate a larger underlying buffer and copy the existing items into it
    }
    buffer[Count] = item;   // another thread may have claimed this slot already
    Count++;
}
Now if a list has buffer space for only one more item, and two threads attempt to add an item at the same time, they may both decide that no additional buffer space is required, potentially causing an IndexOutOfRangeException on one of these threads.
Thread-safe classes ensure that this does not happen.
This does not mean that using a thread-safe class makes your code thread-safe:
int count = myConcurrentCollection.Count;
myConcurrentCollection.Add(item);
count++;
if (myConcurrentCollection.Count != count)
{
// some other thread has added or removed an item
}
So although the collection is thread safe, you still need to consider thread-safety for your own code. The enumerator example Guillaume mentioned is a perfect example of where threading issues might occur.
In regards to your comment, the documentation for ConcurrentDictionary mentions:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
So yes these overloads (that take a delegate) are exceptions.
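If the factory is expensive enough that running it twice matters, a common mitigation (not mentioned in the quote) is to store Lazy<TValue> values, so that only the winning entry's factory ever executes. A minimal sketch, with illustrative names:
using System;
using System.Collections.Concurrent;

class LazyFactoryExample
{
    private readonly ConcurrentDictionary<string, Lazy<byte[]>> cache =
        new ConcurrentDictionary<string, Lazy<byte[]>>();

    public byte[] GetOrCreate(string key, Func<string, byte[]> expensiveFactory)
    {
        // Several Lazy wrappers may be constructed under contention, but only the
        // one that wins the race is ever materialized; Lazy<T>'s default
        // ExecutionAndPublication mode runs expensiveFactory at most once.
        return cache.GetOrAdd(key, k => new Lazy<byte[]>(() => expensiveFactory(k))).Value;
    }
}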
I'm working on making my SortedDictionary thread safe and the thing I'm not sure about is: is it safe to have a call to add to SortedDictionary in one thread, like this:
dictionary.Add(key, value);
and simply get an item from this dictionary in another thread, like this:
variable = dictionary[key];
There is no explicit enumeration in either of those places, so it looks safe, but it would be great to be sure about it.
No, it is not safe to read and write SortedDictionary<K,V> concurrently: adding an element to a sorted dictionary may involve re-balancing of the tree, which may cause the concurrent read operation to take a wrong turn while navigating to the element of interest.
In order to fix this problem you would need to either wrap an instance of SortedDictionary<K,V> in a class that performs explicit locking, or roll your own collection compatible with the interfaces implemented by SortedDictionary<K,V>.
No. Anything that modifies the tree is not thread safe at all. The trick is to fill up the SortedDictionary in one thread, then treat it as immutable and let multiple threads read from it. (You can do this with a SortedDictionary, as stated here. I mention this because there may be a collection/dictionary/map out there somewhere that is changed when it is read, so you should always check.)
If you need to modify it once it's released into the wild, you have a problem. You need to lock it to write to it, and all the readers need to respect that lock, which means they need to lock it too, which means the readers can no longer read it simultaneously. The best way around this is usually to create a whole new SortedDictionary, then, once the new one is immutable, replace the reference to the original with a reference to the new one. (You need a volatile reference to do this right.) The readers will switch dictionaries cleanly without a problem. And the old dictionary won't go away until the last reader has finished reading and released its reference.
(There are n-readers and 1-writer locks, but you want to avoid any locking at all.)
(And keep in mind the reference to the dictionary can change suddenly if you're enumerating. Use a local variable for this rather than referring to the (volatile) reference.)
Java has a ConcurrentSkipListMap, which allows any number of simultaneous reads and writes, but I don't think there's anything like it in .NET yet. And if there is, it's going to be slower for reads than an immutable SortedDictionary anyway.
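A minimal sketch of the build-then-swap pattern described above (names are illustrative, and Volatile.Read/Write stand in for the "volatile reference"):
using System.Collections.Generic;
using System.Threading;

class SwappableLookup
{
    // Readers treat whatever instance they currently see as immutable; the writer
    // never mutates a published dictionary, it builds a new one and swaps the reference.
    private SortedDictionary<int, string> current = new SortedDictionary<int, string>();

    public string TryLookup(int key)
    {
        // Capture the reference once so it cannot change underneath this reader,
        // even if the writer publishes a new dictionary mid-lookup.
        SortedDictionary<int, string> snapshot = Volatile.Read(ref current);
        return snapshot.TryGetValue(key, out string value) ? value : null;
    }

    public void Publish(IEnumerable<KeyValuePair<int, string>> entries)
    {
        var replacement = new SortedDictionary<int, string>();
        foreach (var kvp in entries)
            replacement[kvp.Key] = kvp.Value;

        // Publish atomically; readers already holding the old instance finish on it.
        Volatile.Write(ref current, replacement);
    }
}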
No, because it is not documented to be safe. That is the real reason. Reasoning with implementation details is not as good because they are details that you cannot rely on.
No, it is not safe to do so. If you want to use it from multiple threads then you should do something like this:
private readonly object lockObject = new object();

lock (lockObject)
{
    // your dictionary operation here.
}
I have this function:
static Dictionary<int, int> KeyValueDictionary = new Dictionary<int, int>();
static void IncreaseValue(int keyId, int adjustment)
{
if (!KeyValueDictionary.ContainsKey(keyId))
{
KeyValueDictionary.Add(keyId, 0);
}
KeyValueDictionary[keyId] += adjustment;
}
Which I would have thought would not be thread safe. However, so far in testing it I have not seen any exceptions when calling it from multiple threads at the same time.
My questions: Is it thread safe or have I just been lucky so far? If it is thread safe then why?
However, so far in testing it I have not seen any exceptions when calling it from multiple threads at the same time.
Is it thread safe or have I just been lucky so far? If it is thread safe then why?
You're getting lucky. These types of threading bugs are so easy to make because testing can give you a false sense of security that you did things correctly.
It turns out that Dictionary<TKey, TValue> is not thread-safe when you have multiple writers. The documentation explicitly states:
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
Alternatively, use ConcurrentDictionary. However, you still must write correct code (see note below).
In addition to the lack of thread-safety with Dictionary<TKey, TValue> which you've been lucky to avoid, your code is dangerously flawed. Here's how you can get a bug with your code:
static void IncreaseValue(int keyId, int adjustment) {
if (!KeyValueDictionary.ContainsKey(keyId)) {
// A
KeyValueDictionary.Add(keyId, 0);
}
KeyValueDictionary[keyId] += adjustment;
}
Dictionary is empty.
Thread 1 enters the method with keyId = 17. As the Dictionary is empty, the conditional in the if returns true and thread 1 reaches the line of code marked A.
Thread 1 is paused and thread 2 enters the method with keyId = 17. As the Dictionary is empty, the conditional in the if returns true and thread 2 reaches the line of code marked A.
Thread 2 is paused and thread 1 resumes. Now thread 1 adds (17, 0) to the dictionary.
Thread 1 is paused and now thread 2 resumes. Now thread 2 tries to add (17, 0) to the dictionary. An exception is thrown because of a key violation.
There are other scenarios in which an exception can occur. For example, thread 1 could be paused while loading the value of KeyValueDictionary[keyId] (say it loads keyId = 17, and obtains the value 42), thread 2 could come in and modify the value (say it loads keyId = 17, adds the adjustment 27), and now thread 1 resumes and adds its adjustment to the value it loaded (in particular, it doesn't see the modification that thread 2 made to the value associated with keyId = 17!).
Note that even using a ConcurrentDictionary<TKey, TValue> could lead to the above bugs! Your code is NOT safe for reasons not related to the thread-safety or lack thereof for Dictionary<TKey, TValue>.
To get your code to be thread-safe with a concurrent dictionary, you'll have to say:
KeyValueDictionary.AddOrUpdate(keyId, adjustment, (key, value) => value + adjustment);
Here we are using ConcurrentDictionary.AddOrUpdate.
It's not thread safe, but Dictionary doesn't check for concurrent modification, so the corruption is silent and probably goes unnoticed.
It will appear to be thread safe for a long time, because only when it needs to rehash does it even have a chance of throwing an exception. Otherwise, it just corrupts data.
The .NET library has a thread-safe dictionary, ConcurrentDictionary<TKey, TValue>: http://msdn.microsoft.com/en-us/library/dd287191.aspx
Updated: I didn't exactly answer the question, so here's an update that addresses the exact question posed.
As per MSDN: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For a thread-safe alternative, see ConcurrentDictionary.
Public static (Shared in Visual Basic) members of this type are thread safe.
You've just been lucky so far. It's not thread-safe.
From the Dictionary<K,V> documentation...
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
In the context of this statement,
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
What do "read" and "write" mean here? My understanding is that a read is an operation which looks up a key and provides a reference to its value, and a write is an operation which adds or removes a key/value pair from the dictionary. However, I can't find anything conclusive regarding this.
So the big question is: while implementing a thread-safe dictionary, would an operation that updates the value for an existing key be considered a read or a write? I plan to have multiple threads accessing unique keys in a dictionary and modifying their values, but the threads will not add/remove keys.
The obvious implication, assuming modifying an existing value is not a write operation on the dictionary, is that my implementation of the thread-safe dictionary can be a lot more efficient, as I would not need to take an exclusive lock every time I update the value of an existing key.
Usage of ConcurrentDictionary from .Net 4.0 is not an option.
A major point not yet mentioned is that if TValue is a class type, the things held by a Dictionary<TKey,TValue> will be the identities of TValue objects. If one receives a reference from the dictionary, the dictionary will neither know nor care about anything one may do with the object referred to thereby.
One useful little utility class in cases where all of the keys associated with a dictionary will be known in advance of code that needs to use it is:
class MutableValueHolder<T>
{
public T Value;
}
If one wants to have multi-threaded code count how many times various strings appear in a bunch of files, and one knows in advance all the strings of interest, one may then use something like a Dictionary<string, MutableValueHolder<int>> for that purpose. Once the dictionary is loaded with all the proper strings and a MutableValueHolder<int> instance for each one, then any number of threads may retrieve references to MutableValueHolder<int> objects, and use Threading.Interlocked.Increment or other such methods to modify the Value associated with each one, without having to write to the Dictionary at all.
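A minimal sketch of that counting scenario, reusing the MutableValueHolder<T> class above (the surrounding names are illustrative):
using System.Collections.Generic;
using System.Threading;

class WordCounter
{
    private readonly Dictionary<string, MutableValueHolder<int>> counts =
        new Dictionary<string, MutableValueHolder<int>>();

    public WordCounter(IEnumerable<string> knownWords)
    {
        // All keys are added up front on a single thread; after this point the
        // dictionary itself is only ever read.
        foreach (string word in knownWords)
            counts[word] = new MutableValueHolder<int>();
    }

    public void Record(string word)
    {
        // Safe from any thread: reading the holder is not a structural change to
        // the dictionary, and the increment on the holder's field is atomic.
        if (counts.TryGetValue(word, out MutableValueHolder<int> holder))
            Interlocked.Increment(ref holder.Value);
    }
}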
Overwriting an existing value should be treated as a write operation.
Anything that can affect the results of another read should be considered a write.
Changing a key is most definitely a write, since it will cause the item to move in the internal hash table or index (or however the dictionary organizes its lookups).
What you might want to do is look at ReaderWriterLock
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlock.aspx
Updating a value is conceptually a write operation. When updating a value with concurrent access, if a read is performed before a write has completed, you read out a stale value; when two writes conflict, the wrong value may end up stored.
Adding a new value could trigger a grow of the underlying storage. In that case new memory is allocated, all elements are copied into it, the new element is added, the dictionary object is updated to refer to the new storage, and the old memory is released for garbage collection. During this time additional writes can cause a big problem: two writes at the same time could trigger two instances of this memory copy. If you follow the logic through, you'll see that an element can get lost, since only the last thread to update the reference knows about the items it added, not the items the other thread was trying to add.
ICollection provides a member (SyncRoot) to synchronize access, and that reference remains valid across grow/shrink operations.
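That member is SyncRoot, exposed through the non-generic ICollection interface; a minimal sketch of using it (though a dedicated private lock object is the more common choice):
using System.Collections;
using System.Collections.Generic;

class SyncRootExample
{
    private readonly Dictionary<int, string> dict = new Dictionary<int, string>();

    public void Set(int key, string value)
    {
        // Every reader and writer must lock the same SyncRoot for this to help.
        lock (((ICollection)dict).SyncRoot)
        {
            dict[key] = value;
        }
    }

    public bool TryGet(int key, out string value)
    {
        lock (((ICollection)dict).SyncRoot)
        {
            return dict.TryGetValue(key, out value);
        }
    }
}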
A read operation is anything that gets a key or value from a Dictionary, a write operation is anything that updates or adds a key or a value. So a process updating a key would be considered to be a writer.
A simple way to make a thread safe dictionary is to create your own implementation of IDictionary that simply locks a mutex and then forwards the call to an implementation:
public class MyThreadSafeDictionary<T, J> : IDictionary<T, J>
{
private object mutex = new object();
private IDictionary<T, J> impl;
public MyThreadSafeDictionary(IDictionary<T, J> impl)
{
this.impl = impl;
}
public void Add(T key, J value)
{
lock(mutex) {
impl.Add(key, value);
}
}
// implement the other methods as for Add
}
You could replace the mutex with a reader-writer lock if you are having some threads only read the dictionary.
Also note that Dictionary objects don't support changing keys; the only safe way to achieve what you want is to remove the existing key/value pair and add a new one with the updated key.
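A minimal sketch of the reader-writer variant mentioned above, using ReaderWriterLockSlim (my choice of lock type; names are illustrative):
using System.Collections.Generic;
using System.Threading;

public class ReadMostlyDictionary<TKey, TValue>
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly Dictionary<TKey, TValue> impl = new Dictionary<TKey, TValue>();

    public bool TryGetValue(TKey key, out TValue value)
    {
        rwLock.EnterReadLock();          // many readers may hold the lock at once
        try { return impl.TryGetValue(key, out value); }
        finally { rwLock.ExitReadLock(); }
    }

    public void Add(TKey key, TValue value)
    {
        rwLock.EnterWriteLock();         // writers get exclusive access
        try { impl.Add(key, value); }
        finally { rwLock.ExitWriteLock(); }
    }
}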
Modifying a value is a write and introduces a race condition.
Let's say the original value of mydict[5] = 42.
One thread updates mydict[5] to be 112.
Another thread updates mydict[5] to be 837.
What should the value of mydict[5] be at the end? The order of the threads matters here, which means you either need to make the ordering explicit or ensure the threads don't write concurrently.