In a multithreaded application, I have a Dictionary that is accessed by multiple threads to get the value for a specific key. There is also a mechanism using Quartz.Net that updates this dictionary.
I am trying to find the best way to make the updated Dictionary available for reading in a thread-safe manner.
Initially I considered a ReaderWriterLockSlim as a solution, but while searching for the performance penalties it might have, I came upon Interlocked.Exchange and an overload that can be used with objects. So my question is, could it be used in this scenario? Here is a code sample of its usage.
Thanks very much
public class SingletonDictionaryHolder
{
    private Dictionary<string, Person> MyDictionary;

    public Person GetPerson(string key)
    {
        return MyDictionary[key];
    }

    public void UpdateDictionary(Dictionary<string, Person> updated)
    {
        Interlocked.Exchange(ref MyDictionary, updated);
    }
}
Edit:
Since there is a downvote, I am adding some more information:
Another related question is presented here: https://softwareengineering.stackexchange.com/questions/294514/does-readerwriterlockslim-provide-thread-safety-and-speed-efficiency-compared-t
Note the paragraph: "If writes are rare, you can probably get much better performance by treating the collection as immutable and producing an entirely new list when it changes. You can then use Interlocked.CompareExchange to update the reference to the list in a thread-safe way. This should prevent readers needing to make a defensive copy (which should be a huge win if reading is more common than writing) and remove the need for locking."
And concerning the Interlocked.CompareExchange method, an insight is presented here: Using Interlocked.CompareExchange with a class
Kindly note that a correct architectural design would be to use a MemoryCache that is thread safe by default and a pub/sub mechanism to reflect changes on the cached items - however it was not designed by me and I doubt that there is hope of change in the near future.
Answering my own question, guided by the really helpful comments.
Interlocked.Exchange is not necessary for thread safety here, since reference assignment is atomic on all .NET platforms.
So the updated dictionary can be safely assigned to the original field. Threads that read the field after the update will get the new dictionary, which is completely fine for my scenario.
For future readers coming across this question, please have a look at this: reference assignment is atomic so why is Interlocked.Exchange(ref Object, Object) needed?
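A minimal sketch of the plain-assignment approach described in this answer, assuming the writer always builds a complete new dictionary and never mutates it after publishing. The `volatile` modifier, the `_current` field name, and the shape of `Person` are assumptions added for illustration; they are not part of the original code.

```csharp
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
}

public class SingletonDictionaryHolder
{
    // volatile is an assumption added here: it gives reads and writes of the
    // reference acquire/release semantics. The assignment itself is atomic
    // with or without it.
    private volatile Dictionary<string, Person> _current =
        new Dictionary<string, Person>();

    public Person GetPerson(string key)
    {
        // Read the field once so the lookup runs against a single snapshot.
        Dictionary<string, Person> snapshot = _current;
        return snapshot[key];
    }

    public void UpdateDictionary(Dictionary<string, Person> updated)
    {
        // The caller must not modify 'updated' after handing it over.
        _current = updated;
    }
}
```

Readers that already hold the old snapshot simply finish their lookup against it; the old dictionary becomes garbage once no reader references it.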
Related
I'm trying to understand how using immutable data structures in concurrent programming can obviate the need for locking. I've read a few things on the web but haven't seen any concrete examples yet.
For example, let's say we have some code (C#) that uses a lock around a Dictionary<string, object> like this:
class Cache
{
    private readonly Dictionary<string, object> _cache = new Dictionary<string, object>();
    private readonly object _lock = new object();

    object Get(string key, Func<object> expensiveFn)
    {
        // First check happens outside the lock.
        if (!_cache.ContainsKey(key))
        {
            lock (_lock)
            {
                // Re-check under the lock so expensiveFn runs at most once per key.
                if (!_cache.ContainsKey(key))
                    _cache[key] = expensiveFn();
            }
        }
        return _cache[key];
    }
}
How would that look if _cache was immutable? Would it be possible to remove the lock and also ensure expensiveFn isn't called more than once?
Short answer is that it doesn't, at least not completely.
Immutability only guarantees that another thread won't be able to modify the contents of your data structure while you are working with it. Once you have an instance, that instance can never be modified, so you will always be safe reading it. Any edits would require a copy of the instance to be made, but those copies wouldn't interfere directly with any instances already referenced.
There are still plenty of reasons why you would need locking and synchronization constructs in a multi-threaded application, even with immutable objects. They mostly deal with timing related problems, such as race conditions, or controlling thread flow so that activities happen at the right time. Immutable objects won't really do anything to help with these kinds of problems.
Immutability makes multi-threading easier, but it doesn't make it easy.
As for your question about what an immutable dictionary would look like: in most cases, including your example, it doesn't really make much sense to use one, since the cache is an "active" object that inherently changes as items are added and removed. Even in a language designed around immutability, like F#, there are mutable objects for this purpose. See this link for more details. The immutable versions can be found here.
The basic idea behind immutable data structures reducing (notice that I said "reducing," not "eliminating") the need for locking in concurrency is that every thread is working either on a local copy or against the immutable data structure so there's no need for locking (no thread can modify any other threads' data, just their own). Locking is only needed when several threads can modify the same mutable state at once because otherwise you have the possibility of "dirty reads" and other similar issues.
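One way the cache from the question could look if the dictionary is treated as immutable is sketched below. It is only a sketch under stated assumptions: the class name `CopyOnWriteCache` is hypothetical, every add copies the whole dictionary (reasonable only when writes are rare), and `Lazy<T>` still synchronizes internally, which matches the point that locking is reduced rather than eliminated.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class CopyOnWriteCache
{
    // Never mutated after publication; replaced wholesale on every add.
    private Dictionary<string, Lazy<object>> _cache =
        new Dictionary<string, Lazy<object>>();

    public object Get(string key, Func<object> expensiveFn)
    {
        while (true)
        {
            Dictionary<string, Lazy<object>> snapshot = Volatile.Read(ref _cache);

            Lazy<object> existing;
            if (snapshot.TryGetValue(key, out existing))
                return existing.Value;

            // Build a new dictionary that contains the extra entry.
            var copy = new Dictionary<string, Lazy<object>>(snapshot);
            var lazy = new Lazy<object>(
                expensiveFn, LazyThreadSafetyMode.ExecutionAndPublication);
            copy[key] = lazy;

            // Publish only if nobody replaced the dictionary in the meantime.
            if (Interlocked.CompareExchange(ref _cache, copy, snapshot) == snapshot)
                return lazy.Value;

            // Lost the race: this Lazy was never evaluated, so expensiveFn
            // has not run for it. Retry against the newer snapshot.
        }
    }
}
```

Only a `Lazy<object>` that was successfully published ever has its `Value` read, so `expensiveFn` runs at most once per key even when several threads race on the same key.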
One example of why immutable data is important:
Assume that you have a person object that is accessed by two different threads.
Suppose thread1 saves the person into a map (where the person's hash is derived from the person's name), and thread2 then changes the person's name.
Now thread1 will not be able to find this person in the map, even though it is actually there!
If person were immutable, the references held by the two threads would be different, and thread1 would still be able to find the person in the map even after thread2 changes the name (since a new instance of person would be created).
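The lost-entry effect described above can be reproduced on a single thread. The sketch below is illustrative only: `MutablePerson` and `MutableKeyDemo` are hypothetical names, and the hash code is deliberately derived from `Name.Length` so the outcome does not depend on the runtime's string hashing.

```csharp
using System.Collections.Generic;

public sealed class MutablePerson
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutablePerson;
        return other != null && other.Name == Name;
    }

    // Hash derived from mutable state. Name.Length is used only to keep
    // the demonstration deterministic.
    public override int GetHashCode()
    {
        return Name.Length;
    }
}

public static class MutableKeyDemo
{
    public static bool LosesEntryAfterRename()
    {
        var person = new MutablePerson { Name = "Alice" };
        var map = new Dictionary<MutablePerson, string> { { person, "record" } };

        // Stands in for the rename performed by the second thread.
        person.Name = "Bob";

        // The entry is still stored, but under the hash of the old name,
        // so a lookup with the current hash does not find it.
        return !map.ContainsKey(person);
    }
}
```

With an immutable person type, the rename would produce a new instance and the instance used as the key would keep its original hash.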
Specific Answers Only Please! I'm decently familiar with the better(best) practices around collection locking, thread safety etc. Just want some answers / ideas around this specific scenario.
We have some legacy code of the type:
public class GodObject
{
    private readonly Dictionary<string, string> _signals;

    //bunch of methods accessing the dictionary

    private void SampleMethod1()
    {
        lock(_signals)
        {
            //critical code section 1
        }
    }

    public void SampleMethod2()
    {
        lock(_signals)
        {
            //critical code section 2
        }
    }
}
All access to the dictionary is inside such lock statements. We're getting some bugs which could be explained if the locking was not explicitly working - meaning 2 or more threads getting simultaneous access to the dictionary.
So my question is this: is there any scenario where the critical sections could be simultaneously accessed by multiple threads? To me, it should not be possible: the reference is readonly, so the object cannot be swapped out, and most of the issues around lock() concern deadlocks rather than synchronization not happening. But maybe I'm missing some nuance or something glaring?
This is running in a long-running Windows service on .NET Framework 3.5.
There are three problems I can imagine occurring outside the code you posted:
Somebody might access the dictionary without locking on it. Using lock on an object will prevent anyone else from using lock on the same object at the same time, but it won't do anything to prevent other threads from using the object without locking on it. Note that because it would not have been overly difficult to have written Dictionary [and for that matter List] in such a way as to allow safe simultaneous use by multiple readers and one writer that only adds information, some people may assume that read methods don't need locking. Unfortunately, that assumption is false: Microsoft could have added such thread safety fairly cheaply, but didn't.
As Servy suggested, someone might be assuming that the collection won't change between calls to two independent methods.
If some code which acquires a lock assumes a collection isn't going to change while the lock is held, but then calls some outside method while holding the lock, it's possible that the outside method could change the object despite the lock being held.
Unless the object which owns the dictionary keeps all references to itself, so that no outside code ever gets a reference to the dictionary, I think the first of these problems is perhaps the most likely. The other two problems can also occur sometimes, however.
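A rough sketch of how the first two problems can be designed out, assuming the dictionary can be hidden behind a small wrapper. `SignalStore` and its member names are hypothetical and do not come from the legacy code; the sketch sticks to constructs available on .NET Framework 3.5.

```csharp
using System.Collections.Generic;

public class SignalStore
{
    // Dedicated lock object; the dictionary reference never leaves this class,
    // so no outside code can touch it without going through the lock.
    private readonly object _gate = new object();
    private readonly Dictionary<string, string> _signals =
        new Dictionary<string, string>();

    public bool TryGet(string key, out string value)
    {
        lock (_gate)
        {
            return _signals.TryGetValue(key, out value);
        }
    }

    // Check and write happen in one critical section, so callers cannot
    // observe a change between two separate calls.
    public bool AddIfAbsent(string key, string value)
    {
        lock (_gate)
        {
            if (_signals.ContainsKey(key))
                return false;
            _signals.Add(key, value);
            return true;
        }
    }

    // Callers that need to enumerate get a private copy.
    public Dictionary<string, string> Snapshot()
    {
        lock (_gate)
        {
            return new Dictionary<string, string>(_signals);
        }
    }
}
```

The third problem (calling outside code while holding the lock) is not addressed by the wrapper itself; here it is avoided simply because no method invokes external code inside a `lock` block.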
I'm working on making my SortedDictionary thread safe and the thing I'm not sure about is: is it safe to have a call to add to SortedDictionary in one thread, like this:
dictionary.Add(key, value);
and simply get an item from this dictionary in another thread, like this:
variable = dictionary[key];
There is no explicit enumeration in either of those places, so it looks safe, but it would be great to be sure about it.
No, it is not safe to read and write SortedDictionary<K,V> concurrently: adding an element to a sorted dictionary may involve re-balancing of the tree, which may cause the concurrent read operation to take a wrong turn while navigating to the element of interest.
In order to fix this problem you would need to either wrap an instance of SortedDictionary<K,V> in a class that performs explicit locking, or roll your own collection compatible with the interfaces implemented by SortedDictionary<K,V>.
No. Anything that modifies the tree is not thread safe at all. The trick is to fill up the SortedDictionary in one thread, then treat it as immutable and let multiple threads read from it. (You can do this with a SortedDictionary, as stated here. I mention this because there may be a collection/dictionary/map out there somewhere that is changed when it is read, so you should always check.)
If you need to modify it once it's released into the wild, you have a problem. You need to lock it to write to it, and all the readers need to respect that lock, which means they need to lock it too, which means the readers can no longer read it simultaneously. The best way around this is usually to create a whole new SortedDictionary, then, once the new one is immutable, replace the reference to the original with a reference to the new one. (You need a volatile reference to do this right.) The readers will switch dictionaries cleanly without a problem. And the old dictionary won't go away until the last reader has finished reading and released its reference.
(There are n-readers and 1-writer locks, but you want to avoid any locking at all.)
(And keep in mind the reference to the dictionary can change suddenly while you're enumerating. Copy it into a local variable first and enumerate that, rather than referring to the (volatile) field repeatedly.)
Java has a ConcurrentSkipListMap, which allows any number of simultaneous reads and writes, but I don't think there's anything like it in .NET yet. And if there is, it's going to be slower for reads than an immutable SortedDictionary anyway.
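The build-a-new-dictionary-and-swap idea from this answer might be sketched as follows. `SortedSnapshotMap` and its members are hypothetical names; the sketch assumes writes are rare, because every write copies the whole tree, and it serializes writers with a lock that readers never take.

```csharp
using System.Collections.Generic;

public class SortedSnapshotMap
{
    private readonly object _writeLock = new object();

    // Each published instance is treated as immutable from then on.
    private volatile SortedDictionary<string, int> _current =
        new SortedDictionary<string, int>();

    public bool TryGet(string key, out int value)
    {
        // A single read of the volatile field; no lock needed for readers.
        return _current.TryGetValue(key, out value);
    }

    public int Sum()
    {
        // Enumerate a local snapshot so a concurrent swap cannot change
        // the dictionary in the middle of the loop.
        SortedDictionary<string, int> snapshot = _current;
        int total = 0;
        foreach (KeyValuePair<string, int> pair in snapshot)
            total += pair.Value;
        return total;
    }

    public void Set(string key, int value)
    {
        // Writers are serialized among themselves only.
        lock (_writeLock)
        {
            var copy = new SortedDictionary<string, int>(_current);
            copy[key] = value;
            _current = copy; // publish the finished copy
        }
    }
}
```

Readers that picked up the previous instance keep using it until they finish, after which it is eligible for garbage collection.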
No, because it is not documented to be safe. That is the real reason. Reasoning with implementation details is not as good because they are details that you cannot rely on.
No, it is not safe to do so. If you want to use it from multiple threads, then you should do this:
private readonly object lockObject = new object();

lock (lockObject)
{
    //your dictionary operation here.
}
Do I need to declare a static Object and use lock on it like
private static readonly Object padlock = new Object()
public static Test()
{
lock(padlock) {
// Blah Blah Blah
}
}
(Your code wouldn't currently compile, by the way - Readonly should be readonly, and you need to give padlock a type.)
It depends on what you're doing in the method. If the method doesn't use any shared data, or uses it in a way which is already safe, then you're fine.
You generally only need to lock if you're accessing shared data in an otherwise non-thread-safe way. (And all access to that shared data needs to be done in a thread-safe way.)
Having said that, I should point out that "thread safe" is a pretty vague term. Eric Lippert has a great blog post about it... rather than trying to come up with a "one size fits all" approach, you should think about what you're trying to protect against, what scenarios you're anticipating etc.
Jon is right; it is really not clear what you are asking here. The way I would interpret your question is:
If I have some shared state that needs to be made thread-safe by locking it, am I required to declare a private static object as the lock object?
The answer to that question is no, you are not required to do so. However, doing so is a really good idea, so you should do so even if you are not required.
You might think, well, there are lots of objects I could use. If the object I am locking is a reference type, I could use it. Or I could use the Type object associated with the containing class.
The problem with those things as locks is it becomes difficult to track down every possible bit of code that could be using the thing as a lock. Therefore it becomes difficult to analyze the code to ensure that there are no deadlocks due to lock ordering issues. And therefore, you are much more likely to get deadlocks. Having a dedicated lock object makes it much easier; you know that every single usage of that object is for the purposes of locking, and you can then understand what is going on inside those locks.
This is particularly true if you ever have untrusted, hostile code running in your appdomain. Locking a type object requires no particular permissions; what stops hostile code from locking all the types and never unlocking them? Nothing, that's what.
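As a small illustration of the dedicated lock object recommended above, here is a sketch with a hypothetical `Counter` class. The alternative that the answer warns against would be `lock (typeof(Counter))`, which any other code in the process could also lock on.

```csharp
public static class Counter
{
    // Private and used for nothing but locking, so every use of it
    // can be found by reading this one class.
    private static readonly object Padlock = new object();
    private static int _count;

    public static int Increment()
    {
        lock (Padlock)
        {
            _count++;
            return _count;
        }
    }
}
```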
In a multi-threaded program running on a multi-CPU machine, do I need to access shared state (_data in the example code below) using volatile reads/writes to ensure correctness?
In other words, can heap objects be cached on the CPU?
Using the example below, and assuming multiple threads will access the GetValue and Add methods, I need ThreadA to be able to add data (using the Add method) and ThreadB to be able to see/get that added data immediately (using the GetValue method). So do I need to add volatile reads/writes to _data to ensure this? Basically I don't want the added data to be cached on ThreadA's CPU.
I am not locking (enforcing exclusive thread access) as the code needs to be ultra-fast, and I am not removing any data from _data, so I don't need to lock _data.
Thanks.
**** Update ****************************
Obviously you guys think going lock-free in this example is a bad idea. But what side effects or exceptions could I face here?
Could the Dictionary type throw an exception if one thread is iterating the values for read and another thread is iterating the values for update? Or would I just experience "dirty reads" (which would be fine in my case)?
**** End Update ****************************
public sealed class Data
{
    private volatile readonly Dictionary<string, double> _data = new Dictionary<string, double>();

    public double GetVaule(string key)
    {
        double value;
        if (!_data.TryGetValue(key, out value))
        {
            throw new ArgumentException(string.Format("Key {0} does not exist.", key));
        }
        return value;
    }

    public void Add(string key, double value)
    {
        _data.Add(key, value);
    }

    public void Clear()
    {
        _data.Clear();
    }
}
Thanks for the replies. Regarding the locks, the methods are pretty much constantly called by multiple threads, so my problem is with contested locks, not the actual lock operation.
So my question is about CPU caching: can heap objects (the _data instance field) be cached on a CPU? Do I need to access the _data field using volatile reads/writes?
Also, I am stuck with .NET 2.0.
Thanks for your help.
The MSDN docs for Dictionary<TKey, TValue> say that it's safe for multiple readers but they don't give the "one writer, multiple readers" guarantee that some other classes do. In short, I wouldn't do this.
You say you're avoiding locking because you need the code to be "ultra-fast" - have you tried locking to see what the overhead is? Uncontested locks are very cheap, and when the lock is contested that's when you're benefiting from the added safety. I'd certainly profile this extensively before deciding to worry about the concurrency issues of a lock-free solution. ReaderWriterLockSlim may be useful if you've actually got multiple readers, but it sounds like you've got a single reader and a single writer, at least at the moment - simple locking will be easier in this case.
I think you may be misunderstanding the use of the volatile keyword (either that or I am, and someone please feel free to correct me). The volatile keyword guarantees that get and set operations on the value of the variable itself from multiple threads will always deal with the same copy. For instance, if I have a bool that indicates a state then setting it in one thread will make the new value immediately available to the other.
However, you never change the value of your variable (in this case, a reference). All that you do is manipulate the area of memory that the reference points to. Declaring it as volatile readonly (which, if my understanding is sound, defeats the purpose of volatile by never allowing it to be set) won't have any effect on the actual data that's being manipulated (the back-end store for the Dictionary<>).
All that being said, you really need to use a lock in this case. Your danger extends beyond the prospect of "dirty reads" (meaning that what you read would have been, at some point, valid) into truly unknown territory. As Jon said, you really need proof that locking produces unacceptable performance before you try to go down the road of lockless coding. Otherwise that's the epitome of premature optimization.
The problem is that your add method:
public void Add(string key, double value)
{
    _data.Add(key, value);
}
Could cause _data to decide to completely re-organise the data it's holding - at that point a GetVaule request could fail in any possible way.
You need a lock or a different data structure / data structure implementation.
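A minimal sketch of the "just use a lock" option for the class in the question, using only constructs available on .NET 2.0. `LockedData` and `_sync` are hypothetical names introduced here; whether the contention cost is acceptable still has to be measured by profiling, as the other answers point out.

```csharp
using System;
using System.Collections.Generic;

public sealed class LockedData
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, double> _data =
        new Dictionary<string, double>();

    public double GetValue(string key)
    {
        lock (_sync)
        {
            double value;
            if (!_data.TryGetValue(key, out value))
            {
                throw new ArgumentException(
                    string.Format("Key {0} does not exist.", key));
            }
            return value;
        }
    }

    public void Add(string key, double value)
    {
        // Add may resize the internal buckets; the lock keeps readers out
        // while that happens.
        lock (_sync)
        {
            _data.Add(key, value);
        }
    }

    public void Clear()
    {
        lock (_sync)
        {
            _data.Clear();
        }
    }
}
```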
I don't think volatile can be a replacement for locking if you start calling methods on the object. You are guaranteeing that thread A and thread B see the same copy of the dictionary, but they can still access the dictionary simultaneously. You can use multi-mode locks to increase concurrency. See ReaderWriterLockSlim for example:
"Represents a lock that is used to manage access to a resource, allowing multiple threads for reading or exclusive access for writing."
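A sketch of how ReaderWriterLockSlim could wrap the dictionary is shown below. Note the assumptions: `ReadMostlyData` is a hypothetical name, and ReaderWriterLockSlim was introduced in .NET Framework 3.5, so this would not be usable as-is by someone restricted to .NET 2.0.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class ReadMostlyData
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private readonly Dictionary<string, double> _data =
        new Dictionary<string, double>();

    public bool TryGetValue(string key, out double value)
    {
        // Any number of readers may hold the read lock at the same time.
        _rw.EnterReadLock();
        try
        {
            return _data.TryGetValue(key, out value);
        }
        finally
        {
            _rw.ExitReadLock();
        }
    }

    public void Add(string key, double value)
    {
        // The write lock excludes all readers and other writers.
        _rw.EnterWriteLock();
        try
        {
            _data.Add(key, value);
        }
        finally
        {
            _rw.ExitWriteLock();
        }
    }
}
```

Whether this beats a plain `lock` depends on the read/write ratio and on how long each operation holds the lock, so it is worth measuring both.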
The volatile keyword is not about locking. It indicates that the value of the field might be changed or read by a different thread, or by anything else that can run concurrently with your code. This is crucial for the compiler to know, because many optimizations involve caching a variable's value and rearranging instructions. The volatile keyword tells the compiler to be "cautious" when optimizing instructions that reference a volatile variable.
For multi-threaded use of a dictionary, there are many options. The simplest is the lock keyword, which has adequate performance. If you need higher performance, you might need to implement your own dictionary for your specific task.
Volatile is not locking; it has nothing to do with synchronization. It's generally safe to do lock-free reads on read-only data. Note that although you don't remove anything from _data, you do call _data.Add(). That is NOT read-only. So yes, this code will blow up in your face in a variety of exciting and difficult-to-predict ways.
Use locks, it's simple, it's safer. If you're a lock-free guru (you're not!), AND profiling shows a bottleneck related to contention for the lock, AND you cannot solve the contention issues via partitioning or switching to spin-locks THEN AND ONLY THEN can you investigate a solution to get lock-free reads, which WILL involve writing your own Dictionary from scratch and MAY be faster than the locking solution.
Are you starting to see how far off base you are in your thinking here? Just use a damn lock!