Clarification of Read and Write on a C# Dictionary - c#

In the context of this statement,
A Dictionary can support
multiple readers concurrently, as long
as the collection is not modified.
Even so, enumerating through a
collection is intrinsically not a
thread-safe procedure. In the rare
case where an enumeration contends
with write accesses, the collection
must be locked during the entire
enumeration. To allow the collection
to be accessed by multiple threads for
reading and writing, you must
implement your own synchronization.
What do read and write mean? My understanding is that a read is an operation that looks up a key and returns a reference to its value, and a write is an operation that adds or removes a key/value pair from the dictionary. However, I can't find anything conclusive regarding this.
So the big question is: while implementing a thread-safe dictionary, would an operation that updates the value for an existing key be considered a read or a write? I plan to have multiple threads accessing unique keys in a dictionary and modifying their values, but the threads will not add or remove keys.
The obvious implication, assuming modifying an existing value is not a write operation on the dictionary, is that my implementation of the thread-safe dictionary can be a lot more efficient, as I would not need to take an exclusive lock every time I update the value of an existing key.
Usage of ConcurrentDictionary from .Net 4.0 is not an option.

A major point not yet mentioned is that if TValue is a class type, the things held by a Dictionary<TKey,TValue> will be the identities of TValue objects. If one receives a reference from the dictionary, the dictionary will neither know nor care about anything one may do with the object referred to thereby.
One useful little utility class in cases where all of the keys associated with a dictionary will be known in advance of code that needs to use it is:
class MutableValueHolder<T>
{
    public T Value;
}
If one wants to have multi-threaded code count how many times various strings appear in a bunch of files, and one knows in advance all the strings of interest, one may then use something like a Dictionary<string, MutableValueHolder<int>> for that purpose. Once the dictionary is loaded with all the proper strings and a MutableValueHolder<int> instance for each one, then any number of threads may retrieve references to MutableValueHolder<int> objects, and use Threading.Interlocked.Increment or other such methods to modify the Value associated with each one, without having to write to the Dictionary at all.
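A hedged sketch of that pattern (the word list, key set, and thread-per-word shape are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// All keys are loaded up front; from here on the dictionary is only read.
var counts = new Dictionary<string, MutableValueHolder<int>>
{
    { "alpha", new MutableValueHolder<int>() },
    { "beta",  new MutableValueHolder<int>() },
};

var words = new[] { "alpha", "beta", "alpha", "alpha", "beta" };

// Each thread fetches a holder from the dictionary (a pure read) and
// mutates the holder itself with Interlocked -- no dictionary writes occur.
var threads = new List<Thread>();
foreach (var word in words)
{
    var w = word;
    var t = new Thread(() =>
    {
        if (counts.TryGetValue(w, out var holder))
            Interlocked.Increment(ref holder.Value);
    });
    t.Start();
    threads.Add(t);
}
threads.ForEach(t => t.Join());

Console.WriteLine(counts["alpha"].Value); // 3
Console.WriteLine(counts["beta"].Value);  // 2

class MutableValueHolder<T>
{
    public T Value;
}
```

Because no thread ever adds or removes a key, the dictionary itself is only ever read, which is exactly the case the quoted documentation declares safe.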

overwriting an existing value should be treated as a write operation

Anything that can affect the results of another read should be considered a write.
Changing a key is most definitely a write, since it will cause the item to move in the internal hash buckets (or in the internal tree that gives SortedDictionary its O(log n) lookups).
What you might want to do is look at ReaderWriterLock
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlock.aspx
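A minimal sketch of what that looks like around a dictionary, using ReaderWriterLockSlim (the newer, recommended replacement for ReaderWriterLock); the class and member names here are invented:

```csharp
using System.Collections.Generic;
using System.Threading;

public class GuardedCounts
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, int> dict = new Dictionary<string, int>();

    public void Set(string key, int value)
    {
        rwLock.EnterWriteLock();   // exclusive: blocks all readers and writers
        try { dict[key] = value; }
        finally { rwLock.ExitWriteLock(); }
    }

    public bool TryGet(string key, out int value)
    {
        rwLock.EnterReadLock();    // shared: many readers may hold this at once
        try { return dict.TryGetValue(key, out value); }
        finally { rwLock.ExitReadLock(); }
    }
}
```

Per the answers above, any update, including overwriting an existing value, goes through the write lock.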

Updating a value is conceptually a write operation. When updating a value with concurrent access where a read is performed before a write is completed, you read out an old value. When two writes conflict the wrong value may be stored.
Adding a new value could trigger a grow of the underlying storage. In that case new memory is allocated, all elements are copied into the new memory, the new element is added, the dictionary object is updated to refer to the new memory location, and the old memory is released for garbage collection. During this time, concurrent writes can cause a big problem: two writes at the same time could trigger two instances of this memory copying, and if you follow through the logic you'll see that elements can get lost, since only the last thread to update the storage reference wins, and it knows nothing about items the other thread was adding.
ICollection provides a SyncRoot member to synchronize access, and that reference remains valid across grow/shrink operations.

A read operation is anything that gets a key or value from a Dictionary, a write operation is anything that updates or adds a key or a value. So a process updating a key would be considered to be a writer.
A simple way to make a thread safe dictionary is to create your own implementation of IDictionary that simply locks a mutex and then forwards the call to an implementation:
public class MyThreadSafeDictionary<T, J> : IDictionary<T, J>
{
    private readonly object mutex = new object();
    private readonly IDictionary<T, J> impl;

    public MyThreadSafeDictionary(IDictionary<T, J> impl)
    {
        this.impl = impl;
    }

    public void Add(T key, J value)
    {
        lock (mutex)
        {
            impl.Add(key, value);
        }
    }

    // implement the other methods as for Add
}
You could replace the mutex with a reader-writer lock if you are having some threads only read the dictionary.
Also note that Dictionary objects don't support changing keys; the only safe way to achieve what you want is to remove the existing key/value pair and add a new one with the updated key.

Modifying a value is a write and introduces a race condition.
Let's say the original value of mydict[5] = 42.
One thread updates mydict[5] to be 112.
Another thread updates mydict[5] to be 837.
What should the value of mydict[5] be at the end? The order of the threads matters here, meaning you either need to make the ordering explicit or prevent concurrent writes altogether.
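A sketch that makes the ordering explicit with a lock (the thread and iteration counts are arbitrary); without the lock, two threads could both read the same old value and one update would be lost:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var mydict = new Dictionary<int, int> { { 5, 42 } };
var gate = new object();

// Ten threads each perform 100 read-modify-write updates. The lock makes
// each read+write pair atomic, so every update lands exactly once.
var threads = new Thread[10];
for (int i = 0; i < threads.Length; i++)
{
    threads[i] = new Thread(() =>
    {
        for (int n = 0; n < 100; n++)
        {
            lock (gate)
            {
                mydict[5] = mydict[5] + 1;
            }
        }
    });
    threads[i].Start();
}
foreach (var t in threads) t.Join();

Console.WriteLine(mydict[5]); // 1042
```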

Related

C# Is it threadsafe to set dictionary object from one thread, get from other thread

Suppose we have a somewhat strange class which contains:
a public property ReadOnlyDictionary<string, string> Map { get; private set; }
a method Update which replaces the Map dictionary when called.
Is it thread-safe to call Update from one thread and read Map from another thread?
public class Example
{
    public ReadOnlyDictionary<string, string> Map { get; private set; }

    public void Update(IEnumerable<KeyValuePair<string, string>> items)
    {
        Map = items
            .ToLookup(x => x.Key)
            .ToDictionary(x => x.Key, x => x.First().Value)
            .ToReadOnlyDictionary(); // helper method
    }
}
Is it possible that there will be a moment when we call the Map getter and it returns null or an object in a bad state, because the setter's execution is not atomic?
"Thread safe" is a vague term. Wikipedia defines thread-safe code as:
Thread-safe code only manipulates shared data structures in a manner
that ensures that all threads behave properly and fulfill their design
specifications without unintended interaction
So we cannot say that certain piece of code is "thread safe" without knowing that design specification, which includes how this code is actually used.
This code is "thread-safe" in a sense that it's not possible for any thread to observe Example object in a corrupted or partial state.
Reference assignment is atomic, so a thread that reads Map can only see one version of it or another; it cannot observe a partial or intermediate state of Map. Because both the keys and values of Map are immutable (strings), the whole object is immutable and can safely be used by multiple threads after it has been read from the Map property.
However, doing something like this:
var example = GetExample();
if (example.Map.ContainsKey("key")) {
    var key = example.Map["key"];
}
Is of course not thread-safe, because the Map object, as a whole, might be replaced between the check and the read, in a way that the old version contained the "key" key but the new version does not. Doing this, on the other hand:
var example = GetExample();
var map = example.Map;
if (map.ContainsKey("key")) {
    var key = map["key"];
}
is fine. Any code that does not rely on example.Map to stay the same between reads should be "safe".
So you have to ensure that given code is "thread safe" in your particular use case.
Note that using ConcurrentDictionary is absolutely useless here. Since your dictionary is readonly - it's already safe to read it from multiple threads.
Update: valid point raised in a comment is that ReadOnlyDictionary by itself does not guarantee that underlying data is not modified, because ReadOnlyDictionary is just a wrapper around regular mutable Dictionary.
However, in this case you create ReadOnlyDictionary in Update method from non-shared instance of dictionary (received from ToDictionary), which is not used for anything else except wrapping it in ReadOnlyDictionary. This is a safe usage, because there is no code which has access to mutable underlying dictionary except ReadOnlyDictionary itself, and read only dictionary prevents its modification. Reading from even regular Dictionary is thread safe as long as there are no writes.
because of setter execution is not atomic
The safety of this code is not about atomicity (read/write a reference is an atomic operation in .NET, unless you screw up alignment).
So, the only issue in this code is memory reordering. It might happen that another thread will get the reference to partially constructed object.
But it may happen only on hardware with weak memory model ('weak' means that a lot of reorderings are allowed and there is not many restrictions).
To prevent this you just have to publish the reference via Volatile.Write, which is a no-op on x86 and x86_64 and a kind of memory fence on ARM and other weakly ordered architectures.
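A sketch of that publication, assuming .NET 4.5+ for the Volatile class (the backing field and the .Value selector are additions to the question's code):

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Threading;

public class VolatileExample
{
    private ReadOnlyDictionary<string, string> map =
        new ReadOnlyDictionary<string, string>(new Dictionary<string, string>());

    // Volatile.Read pairs with the Volatile.Write below, so a reader can
    // never observe a partially constructed dictionary, even on weak
    // memory models.
    public ReadOnlyDictionary<string, string> Map => Volatile.Read(ref map);

    public void Update(IEnumerable<KeyValuePair<string, string>> items)
    {
        var fresh = new ReadOnlyDictionary<string, string>(
            items.ToLookup(x => x.Key)
                 .ToDictionary(g => g.Key, g => g.First().Value));
        Volatile.Write(ref map, fresh); // publish the fully built object
    }
}
```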
Concurrent Collections = Thread Safe :)
I think you should use a concurrent collection; the concurrent generic collections make sense to me here.
To make your code simpler and less bug-prone you could use ImmutableDictionary:
public class Example
{
    public IReadOnlyDictionary<string, string> Map { get; private set; } =
        ImmutableDictionary.Create<string, string>();

    public void Update(IEnumerable<KeyValuePair<string, string>> items)
    {
        Map = items.ToImmutableDictionary();
    }
}
ImmutableDictionary is guaranteed to be thread-safe, so there is no way to break it. A plain Dictionary is not thread-safe; your implementation happens to avoid thread-safety issues, but only because of this particular usage pattern.
ImmutableDictionary can be downloaded here https://www.nuget.org/packages/System.Collections.Immutable/
Another thread-safe dictionary is ConcurrentDictionary. Using it does not necessarily add cost, because read operations are lock-free. The docs state:
Read operations on the dictionary are performed in a lock-free manner.
So those two dictionaries are thread-safe in any case (at the dictionary API level), and Dictionary is thread-safe only in special usages like yours.

Is it safe to call Add on SortedDictionary in one thread and get Item in another?

I'm working on making my SortedDictionary thread safe and the thing I'm not sure about is: is it safe to have a call to add to SortedDictionary in one thread, like this:
dictionary.Add(key, value);
and simply get an item from this dictionary in another thread, like this:
variable = dictionary[key];
There is no explicit enumeration in either of those places, so it looks safe, but it would be great to be sure about it.
No, it is not safe to read and write SortedDictionary<K,V> concurrently: adding an element to a sorted dictionary may involve re-balancing of the tree, which may cause the concurrent read operation to take a wrong turn while navigating to the element of interest.
In order to fix this problem you would need to either wrap an instance of SortedDictionary<K,V> in a class that performs explicit locking, or roll your own collection compatible with the interfaces implemented by SortedDictionary<K,V>.
No. Anything that modifies the tree is not thread safe at all. The trick is to fill up the SortedDictionary in one thread, then treat it as immutable and let multiple threads read from it. (You can do this with a SortedDictionary, as stated here. I mention this because there may be a collection/dictionary/map out there somewhere that is changed when it is read, so you should always check.)
If you need to modify it once it's released into the wild, you have a problem. You need to lock it to write to it, and all the readers need to respect that lock, which means they need to lock it too, which means the readers can no longer read it simultaneously. The best way around this is usually to create a whole new SortedDictionary, then, once the new one is immutable, replace the reference to the original with a reference to the new one. (You need a volatile reference to do this right.) The readers will switch dictionaries cleanly without a problem. And the old dictionary won't go away until the last reader has finished reading and released its reference.
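A minimal sketch of that copy-and-swap pattern (single writer assumed; the class and member names are invented):

```csharp
using System.Collections.Generic;

public class SwappableSorted
{
    // volatile so readers always see the most recently published dictionary.
    private volatile SortedDictionary<int, string> current =
        new SortedDictionary<int, string>();

    public string TryRead(int key)
    {
        // Grab a local copy of the reference; 'current' may be swapped
        // out from under us, but this snapshot never changes.
        var snapshot = current;
        string value;
        return snapshot.TryGetValue(key, out value) ? value : null;
    }

    public void Publish(int key, string value)
    {
        // Build a whole new dictionary off to the side, then swap it in.
        // Readers never see a tree in the middle of re-balancing.
        var next = new SortedDictionary<int, string>(current);
        next[key] = value;
        current = next;
    }
}
```

With more than one writer, Publish would additionally need a lock around the copy-and-swap; the readers stay lock-free either way.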
(There are n-readers and 1-writer locks, but you want to avoid any locking at all.)
(And keep in mind the reference to the dictionary can change suddenly if you're enumerating. Use a local variable for this rather than referring to the (volatile) reference.)
Java has a ConcurrentSkipListMap, which allows any number of simultaneous reads and writes, but I don't think there's anything like it in .NET yet. And if there is, it's going to be slower for reads than an immutable SortedDictionary anyway.
No, because it is not documented to be safe. That is the real reason. Reasoning with implementation details is not as good because they are details that you cannot rely on.
No, it is not safe to do so. If you want to use it from multiple threads, you should lock around every dictionary operation:
private readonly object lockObject = new object();

lock (lockObject)
{
    // your dictionary operation here
}

Concurrent collection supporting removal of a specified item?

Quite simple: other than ConcurrentDictionary (which I'll use if I have to, but it's not really the correct concept), is there any concurrent collection (IProducerConsumerCollection<T> implementation) that supports removal of specific items, based on simple equality of an item or a predicate defining a condition for removal?
Explanation: I have a multi-threaded, multi-stage workflow algorithm, which pulls objects from the DB and sticks them in a "starting" queue. From there they are grabbed by the next stage, further worked on, and stuffed into other queues. This process continues through a few more stages. Meanwhile, the first stage is invoked again by its supervisor and pulls objects out of the DB, and those can include objects still in process (because they haven't finished being processed and so haven't been re-persisted with the flag set saying they're done).
The solution I am designing is a master "in work" collection; objects go in that queue when they are retrieved for processing by the first stage, and are removed after they have been re-saved to the DB as "processed" by whatever stage of the workflow completed the necessary processing. While the object is in that list, it will be ignored if it is re-retrieved by the first stage.
I had planned to use a ConcurrentBag, but the only removal method (TryTake) removes an arbitrary item from the bag, not a specified one (and ConcurrentBag is slow in .NET 4). ConcurrentQueue and ConcurrentStack also do not allow removal of an item other than the next one it'll give you, leaving ConcurrentDictionary, which would work but is more than I need (all I really need is to store the Id of the records being processed; they don't change during the workflow).
The reason why there is no such data structure is that those collections all have O(n) lookup time for operations like IndexOf and Remove(element): they enumerate all elements, checking each for equality.
Only hash tables have O(1) lookup time. In a concurrent scenario, an O(n) lookup would hold the collection's lock for a long time, during which other threads could not add elements.
In a dictionary, only the bucket hit by the hash is locked; other threads can keep adding while one thread checks the elements in that bucket for equality.
My advice is go on and use ConcurrentDictionary.
By the way, you are right that ConcurrentDictionary is a bit oversized for your solution. What you really need is to check quickly whether an object is in work or not. A HashSet would be perfect for that: it does basically nothing more than Add(element), Contains(element), and Remove(element). Java has a concurrent hash set implementation; for C# I found this: How to implement ConcurrentHashSet in .Net — I don't know how good it is.
As a first step I would still write a wrapper with a HashSet-like interface around ConcurrentDictionary, get it up and running, and then try different implementations and measure the performance differences.
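Such a wrapper can be sketched like this, using the dictionary's value slot as a throwaway dummy (the ConcurrentSet name and member set are illustrative, not a framework type):

```csharp
using System.Collections.Concurrent;

// Set-like facade over ConcurrentDictionary: only the keys matter,
// the byte values are dummies.
public class ConcurrentSet<T>
{
    private readonly ConcurrentDictionary<T, byte> items =
        new ConcurrentDictionary<T, byte>();

    public bool Add(T item) => items.TryAdd(item, 0);
    public bool Contains(T item) => items.ContainsKey(item);

    public bool Remove(T item)
    {
        byte ignored;
        return items.TryRemove(item, out ignored);
    }

    public int Count => items.Count;
}
```

For the workflow described in the question, the first stage would Add each record id when it pulls the record, skip any id for which Contains returns true, and Remove the id once the record is re-persisted as processed.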
As already explained in other posts, it's not possible to remove items from a Queue or ConcurrentQueue by default, but actually the easiest way around this is to extend or wrap the item:
public class QueueItem
{
    public bool IsRemoved { get; private set; }
    public void Remove() { IsRemoved = true; }
}
And when dequeuing:
QueueItem item = _Queue.Dequeue(); // or TryDequeue if you use a ConcurrentQueue
if (!item.IsRemoved)
{
    // Do work here
}
It's really hard to make a collection thread-safe in the generic sense. There are so many factors that go into thread-safety that are outside the responsibility or purview of a library/framework class that affect the ability for it to be truly "thread-safe"... One of the drawbacks as you've pointed out is the performance. It's impossible to write a performant collection that is also thread-safe because it has to assume the worst...
The generally recommended practice is to use whatever collection you want and access it in a thread-safe way. This is basically why there aren't more thread-safe collections in the framework. More on this can be found at http://blogs.msdn.com/b/bclteam/archive/2005/03/15/396399.aspx#9534371

Is accessing the Dictionary<TKey, TValue> Keys property thread safe?

For example can I go:
string[] keys = new string[items.Count];
items.Keys.CopyTo(keys, 0);
Where items is a:
Dictionary<string, MyObject>
instance that could be being modified by another thread? (Its not clear to me if when accessing the Keys property the inner workings of Dictionary iterates the collection - in which case I know it would not be thread safe).
Update: I realise my above example is not thread-safe, because the size of the Dictionary could change between the two lines. However, what about:
var keys = items.Keys;
string[] copyOfKeys = new string[keys.Count];
keys.CopyTo(copyOfKeys, 0);
No, Dictionary is not thread-safe. If you need thread safety and you're using .NET 4 or higher, you should use a ConcurrentDictionary.
From MSDN:
A Dictionary can support multiple readers concurrently,
as long as the collection is not modified. Even so, enumerating
through a collection is intrinsically not a thread-safe procedure. In
the rare case where an enumeration contends with write accesses, the
collection must be locked during the entire enumeration. To allow the
collection to be accessed by multiple threads for reading and writing,
you must implement your own synchronization.
If the dictionary is being mutated on another thread it is not safe to do this. The first thing that comes to mind is that the dictionary might resize its internal buffer which surely does not play well with concurrent iteration.
If you aren't changing the keys and are only writing values this should be safe in practice.
Anyway, I wouldn't put such code into production because the risk is too high. What if some .NET patch changes the internals of Dictionary and suddenly you have a bug. There are other concerns as well.
I recommend you use ConcurrentDictionary or some other safe strategy. Why gamble with the correctness of your code?
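A hedged sketch of the two safe alternatives (the helper names are made up): lock every access to a plain Dictionary, or use ConcurrentDictionary, whose Keys property returns a moment-in-time snapshot:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class KeySnapshots
{
    // Option 1: plain Dictionary. This works only if every other access
    // to 'items' (reads and writes alike) takes the same lock object.
    public static string[] SnapshotLocked(
        Dictionary<string, object> items, object gate)
    {
        lock (gate)
        {
            return items.Keys.ToArray();
        }
    }

    // Option 2: ConcurrentDictionary. Its Keys property returns a
    // snapshot, so no external locking is needed.
    public static string[] Snapshot(
        ConcurrentDictionary<string, object> items)
    {
        return items.Keys.ToArray();
    }
}
```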

Thread Synchronization in .NET

In my app I have a List of objects. I'm going to have a process (thread) running every few minutes that will update the values in this list. I'll have other processes (other threads) that will just read this data, and they may attempt to do so at the same time.
When the list is being updated, I don't want any other process to be able to read the data. However, I don't want the read-only processes to block each other when no updating is occurring. Finally, if a process is reading the data, the process that updates the data must wait until the process reading the data is finished.
What sort of locking should I implement to achieve this?
This is what you are looking for.
ReaderWriterLockSlim is a class that will handle scenario that you have asked for.
You have 2 pair of functions at your disposal:
EnterWriteLock and ExitWriteLock
EnterReadLock and ExitReadLock
The first pair waits until all other locks, both read and write, are released, so it gives you exclusive access, just as lock() would.
The second pair is compatible with itself: any number of read locks can be held at the same time.
Because there's no syntactic sugar like the lock() statement, make sure you never forget to exit the lock, whether because of an exception or anything else. So use it in a form like this:
try
{
    rwLock.EnterWriteLock(); // or EnterReadLock
    // Your code here, which can possibly throw an exception.
}
finally
{
    rwLock.ExitWriteLock(); // or ExitReadLock
}
You don't make it clear whether the updates to the list will involve modification of existing objects, or adding/removing new ones - the answers in each case are different.
To handle modification of existing items in the list, each object should handle its own locking.
To allow modification of the list while others are iterating it, don't allow people direct access to the list - force them to work with a read/only copy of the list, like this:
public class Example
{
    private readonly List<X> MasterList = new List<X>();
    private readonly object padLock = new object();

    public IEnumerable<X> GetReadOnlySnapshot()
    {
        lock (padLock)
        {
            // Copy under the lock so the snapshot is truly fixed.
            return new ReadOnlyCollection<X>(new List<X>(MasterList));
        }
    }
}
Wrapping a copy of the master list in a ReadOnlyCollection<X> ensures that readers can iterate through a list of fixed content without blocking writers (note that ReadOnlyCollection<X> by itself is only a wrapper over the live list, so taking the copy is what fixes the content).
You could use ReaderWriterLockSlim. It would satisfy your requirements precisely. However, it is likely to be slower than just using a plain old lock. The reason is that RWLS is roughly 2x slower than lock, and access to a List is so fast that it cannot overcome the additional overhead of the RWLS. Test both ways, but it is likely ReaderWriterLockSlim will be slower in your case. Reader-writer locks do better in scenarios where the number of readers significantly outnumbers the writers and where the guarded operations are long and drawn out.
However, let me present another options for you. One common pattern for dealing with this type of problem is to use two separate lists. One will serve as the official copy which can accept updates and the other will serve as the read-only copy. After you update the official copy you must clone it and swap out the reference for the read-only copy. This is elegant in that the readers require no blocking whatsoever. The reason why readers do not require any blocking type of synchronization is because we are treating the read-only copy as if it were immutable. Here is how it can be done.
public class Example
{
    private readonly List<object> m_Official;
    private volatile List<object> m_Readonly;

    public Example()
    {
        m_Official = new List<object>();
        m_Readonly = m_Official;
    }

    public void Update()
    {
        lock (m_Official)
        {
            // Modify the official copy here.
            m_Official.Add(...);
            m_Official.Remove(...);

            // Now clone the official copy.
            var clone = new List<object>(m_Official);

            // And finally swap out the read-only copy reference.
            m_Readonly = clone;
        }
    }

    public object Read(int index)
    {
        // It is safe to access the read-only copy here because it is immutable.
        // m_Readonly must be marked as volatile for this to work correctly.
        return m_Readonly[index];
    }
}
The code above would not satisfy your requirements precisely because readers never block...ever. Which means they will still be taking place while writers are updating the official list. But, in a lot of scenarios this winds up being acceptable.
