Generic dictionary - possible locking issue? - c#

Specific Answers Only Please! I'm decently familiar with the better(best) practices around collection locking, thread safety etc. Just want some answers / ideas around this specific scenario.
We have some legacy code of the type:
public class GodObject
{
private readonly Dictionary<string, string> _signals;
//bunch of methods accessing the dictionary
private void SampleMethod1()
{
lock(_signals)
{
//critical code section 1
}
}
public void SampleMethod2()
{
lock(_signals)
{
//critical code section 2
}
}
}
All access to the dictionary is inside such lock statements. We're getting some bugs which could be explained if the locking was not explicitly working - meaning 2 or more threads getting simultaneous access to the dictionary.
So my question is this - is there any scenario where the critical sections could be simultaneously accessed by multiple threads? To me, it should not be possible: since the reference is readonly, it's not as though the object could be changing, and most of the issues around lock() are about deadlocks rather than synchronization not happening. But maybe I'm missing some nuance or something glaring?
This is running in a long running windows service .NET Framework 3.5.

There are three problems I can imagine occurring outside the code you posted:
Somebody might access the dictionary without locking on it. Using lock on an object will prevent anyone else from using lock on the same object at the same time, but it won't do anything to prevent other threads from using the object without locking on it. Note that because it would not have been overly difficult to have written Dictionary [and for that matter List] in such a way as to allow safe simultaneous use by multiple readers and one writer that only adds information, some people may assume that read methods don't need locking. Unfortunately, that assumption is false: Microsoft could have added such thread safety fairly cheaply, but didn't.
As Servy suggested, someone might be assuming that the collection won't change between calls to two independent methods.
If some code which acquires a lock assumes a collection isn't going to change while the lock is held, but then calls some outside method while holding the lock, it's possible that the outside method could change the object despite the lock being held.
Unless the object which owns the dictionary keeps all references to itself, so that no outside code ever gets a reference to the dictionary, I think the first of these problems is perhaps the most likely. The other two problems can also occur sometimes, however.
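A minimal sketch of how the first two problems can be designed out, assuming it is acceptable to hide the dictionary behind a small wrapper. The class name SignalTable and its methods are hypothetical and do not come from the question's code; the syntax is kept compatible with C# 3.0 / .NET 3.5.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: the dictionary reference never leaves this class,
// so every read and every write is forced through the same lock.
public sealed class SignalTable
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, string> _signals =
        new Dictionary<string, string>();

    public void Set(string key, string value)
    {
        lock (_sync)
        {
            _signals[key] = value;
        }
    }

    public bool TryGet(string key, out string value)
    {
        // Reads take the lock too: Dictionary is not safe for
        // readers running concurrently with a writer.
        lock (_sync)
        {
            return _signals.TryGetValue(key, out value);
        }
    }

    // The check and the update happen under one lock acquisition, so the
    // collection cannot change between the two steps.
    public bool TryReplace(string key, string expected, string replacement)
    {
        lock (_sync)
        {
            string current;
            if (_signals.TryGetValue(key, out current) && current == expected)
            {
                _signals[key] = replacement;
                return true;
            }
            return false;
        }
    }
}
```

The third problem (calling outside code while holding the lock) is not addressed by this sketch; the wrapper simply avoids making such calls inside its lock blocks.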

Related

Interlocked.Exchange instead of ReaderWriterLockSlim

In a multithreaded application, I have a Dictionary that is accessed by multiple threads for getting the value for a specific key. There is also a mechanism using Quartz.Net to update this dictionary.
I am trying to find the best way to make the updated Dictionary available for reading in a thread-safe manner.
Initially I considered a ReaderWriterLockSlim as a solution, but as I searched for the performance penalties it might have, I came upon Interlocked.Exchange and an overload that can be used with objects. So my question is, could it be used in this scenario? I present a code sample of its usage.
Thanks very much
public class SingletonDictionaryHolder
{
private Dictionary<string, Person> MyDictionary;
public Person GetPerson(string key)
{
return MyDictionary[key];
}
public void UpdateDictionary(Dictionary<string, Person> updated)
{
Interlocked.Exchange(ref MyDictionary, updated);
}
}
Edit:
Since there is a downvote, I am adding some more information:
Another related question is presented here: https://softwareengineering.stackexchange.com/questions/294514/does-readerwriterlockslim-provide-thread-safety-and-speed-efficiency-compared-t
Note the paragraph: "If writes are rare, you can probably get much better performance by treating the collection as immutable and producing an entirely new list when it changes. You can then use Interlocked.CompareExchange to update the reference to the list in a thread-safe way. This should prevent readers needing to make a defensive copy (which should be a huge win if reading is more common than writing) and remove the need for locking."
And concerning the Interlocked.CompareExchange method, an insight is presented here: Using Interlocked.CompareExchange with a class
Kindly note that a correct architectural design would be to use a MemoryCache that is thread safe by default and a pub/sub mechanism to reflect changes on the cached items - however it was not designed by me and I doubt that there is hope of change in the near future.
Answering my own question, guided by the really helpful comments.
Interlocked.Exchange is not necessary for thread safety here, since reference assignment is atomic on all .NET platforms.
So the updated object can be safely assigned to the original one. Threads that will access the object in question after the update will get the fresh new one, something that is completely fine for my scenario.
For future readers coming across this question, please have a look at this: reference assignment is atomic so why is Interlocked.Exchange(ref Object, Object) needed?
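The "replace the whole dictionary" approach described above could be sketched as follows. This is an illustration under assumptions, not the original poster's code: the class name SnapshotHolder is hypothetical, the published dictionary is never modified after publication, and the field is marked volatile as one way to make sure readers pick up the newest reference.

```csharp
using System.Collections.Generic;

// Hypothetical holder: readers never lock; the writer swaps in a new,
// fully built dictionary with a single atomic reference assignment.
public sealed class SnapshotHolder<TValue>
{
    // volatile (an assumption of this sketch): readers observe the most
    // recently published reference rather than a stale one.
    private volatile Dictionary<string, TValue> _current =
        new Dictionary<string, TValue>();

    public bool TryGet(string key, out TValue value)
    {
        // Copy the reference once so the lookup runs against one snapshot.
        Dictionary<string, TValue> snapshot = _current;
        return snapshot.TryGetValue(key, out value);
    }

    public void Publish(IDictionary<string, TValue> updated)
    {
        // Defensive copy: the caller cannot mutate the published instance,
        // so concurrent readers only ever see an unchanging dictionary.
        _current = new Dictionary<string, TValue>(updated);
    }
}
```

This only works because a published dictionary is treated as immutable. If any thread modified the published instance in place, the readers would need locking again.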

locking only when modifying vs entire method

When should locks be used? Only when modifying data or when accessing it as well?
public class Test {
static Dictionary<string, object> someList = new Dictionary<string, object>();
static object syncLock = new object();
public static object GetValue(string name) {
if (someList.ContainsKey(name)) {
return someList[name];
} else {
lock(syncLock) {
object someValue = GetValueFromSomeWhere(name);
someList.Add(name, someValue);
return someValue;
}
}
}
}
Should there be a lock around the entire block, or is it OK to just add it to the actual modification? My understanding is that there still could be some race condition where one call might not have found it and started to add it while another call right after might have also run into the same situation - but I'm not sure. Locking is still so confusing. I haven't run into any issues with code similar to the above, but I could just be lucky so far. Any help would be appreciated, as well as any good resources for how/when to lock objects.
You have to lock when reading too, or you can get unreliable data, or even an exception if a concurrent modification physically changes the target data structure.
In the case above, you need to make sure that multiple threads don't try to add the value at the same time, so you need at least a read lock while checking whether it is already present. Otherwise multiple threads could decide to add, find the value is not present (since this check is not locked), and then all try to add in turn (after getting the lock).
You could use a ReaderWriterLockSlim if you have many reads and only a few writes. In the code above you would acquire the read lock to do the check and upgrade to a write lock once you decide you need to add it. In most cases, only a read lock (which allows your reader threads to still run in parallel) would be needed.
There is a summary of the available .Net 4 locking primitives here. Definitely you should understand this before you get too deep into multithreaded code. Picking the correct locking mechanism can make a huge performance difference.
You are correct that you have been lucky so far - that's a frequent feature of concurrency bugs. They are often hard to reproduce without targeted load testing, meaning correct design (and exhaustive testing, of course) is vital to avoid embarrassing and confusing production bugs.
Lock the whole block before you check for the existence of name. Otherwise, in theory, another thread could add it between the check, and your code that adds it.
Actually locking just when you perform the Add really doesn't do anything at all. All that would do is prevent another thread from adding something simultaneously. But since that other thread would have already decided it was going to do the add, it would just try to do it anyway as soon as the lock was released.
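A minimal sketch of the "lock the whole block" advice, reusing the shape of the question's code. The class name ValueCache is hypothetical, and GetValueFromSomeWhere is a stand-in whose body is invented purely so the example compiles.

```csharp
using System;
using System.Collections.Generic;

public static class ValueCache
{
    private static readonly Dictionary<string, object> items =
        new Dictionary<string, object>();
    private static readonly object syncLock = new object();

    // Hypothetical stand-in for the question's GetValueFromSomeWhere.
    private static object GetValueFromSomeWhere(string name)
    {
        return name.ToUpperInvariant();
    }

    public static object GetValue(string name)
    {
        // The existence check and the Add are one critical section, so no
        // other thread can add the same key between the two steps.
        lock (syncLock)
        {
            object value;
            if (!items.TryGetValue(name, out value))
            {
                value = GetValueFromSomeWhere(name);
                items.Add(name, value);
            }
            return value;
        }
    }
}
```

The trade-off is that a slow GetValueFromSomeWhere blocks every other caller while it runs, which is the price of the simple version.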
If a resource is only ever read by multiple threads and never modified, you do not need any locks.
If a resource can be accessed by multiple threads and can be modified, then all accesses/modifications need to be synchronized. In your example, if GetValueFromSomeWhere takes a long time to return, it is possible for a second call to be made with the same value in name, but the value has not been stored in the Dictionary.
Use ReaderWriterLock, or the slim version (ReaderWriterLockSlim) if you are on a version before 4.0.
You will acquire the reader lock for the reads (which allows concurrent reads) and upgrade to the writer lock when something has to be written (which allows only one write at a time and blocks all reads, as well as any other writing threads, until it is done).
Make sure to release your locks with the following pattern to avoid deadlocking:
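void Write(object[] args)
{
// Acquire outside the try block: if acquisition fails, there is nothing to release.
this.ReaderWriterLock.AcquireWriterLock(Timeout.Infinite);
try
{
this.myData.Write(args);
}
finally
{
// Always runs, so the lock is released even if Write throws.
this.ReaderWriterLock.ReleaseWriterLock();
}
}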
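The read-then-upgrade idea mentioned in the answers could look roughly like this with ReaderWriterLockSlim (available from .NET 3.5). The class RwCache and its GetOrAdd method are hypothetical names used only for this sketch.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class RwCache
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, object> _items =
        new Dictionary<string, object>();

    public object GetOrAdd(string name, object candidate)
    {
        // Fast path: a plain read lock, which many readers can hold at once.
        _lock.EnterReadLock();
        try
        {
            object existing;
            if (_items.TryGetValue(name, out existing)) return existing;
        }
        finally
        {
            _lock.ExitReadLock();
        }

        // Slow path: only one thread at a time can hold the upgradeable lock.
        _lock.EnterUpgradeableReadLock();
        try
        {
            object existing;
            // Re-check: another thread may have added the key in between.
            if (_items.TryGetValue(name, out existing)) return existing;

            _lock.EnterWriteLock();
            try
            {
                _items.Add(name, candidate);
            }
            finally
            {
                _lock.ExitWriteLock();
            }
            return candidate;
        }
        finally
        {
            _lock.ExitUpgradeableReadLock();
        }
    }
}
```

Whether this beats a plain lock depends on the read/write ratio and on contention, so it is worth measuring before adopting it.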

Using the C# Volatile keyword in Threaded Application

I have a class that has a few arraylists in it.
My main class creates a new instance of this class. My main class has at least 2 threads adding and removing from my class with the arraylists in it. At the moment everything is running fine, but I was just wondering if it would be safer to declare my class with the arraylists in it as volatile, e.g.:
private volatile MyClass myclass;
myclass = new MyClass();
......
myclass.Add(...)
myclass.Clear(..)
Using the volatile keyword will not make your code thread-safe in this example. The volatile keyword is typically used to ensure that when reading or writing the value of a variable (i.e. class field) that the latest value for that variable is either read from main memory or written straight to main memory, rather than read from cache (e.g. a CPU register) for example. The volatile keyword is a way of saying "do not use caching optimizations with this shared field", and removes the issue where threads may use local copies of a field and so not see each other's updates.
In your case the value of myclass is not actually being updated (i.e. you are not re-assigning myclass) so volatile is not useful for you, and it is not the update of the myclass variable you actually want to make thread-safe in this case anyway.
If you wish to make updating of the actual class thread-safe, then using a "lock" around "Add" and "Clear" is a straightforward alternative. Both operations update the internal state of myclass and so should not run in parallel; the lock ensures that only one thread at a time can perform them.
A lock can be used as follows:
private readonly object syncObj = new object();
private readonly MyClass myclass = new MyClass();
......
lock (syncObj)
{
myclass.Add(...)
}
lock (syncObj)
{
myclass.Clear(..)
}
You also need to add locking around any code that reads the state updated by "Add", if there is any; none appears in your example code.
It may not be obvious when first writing multi-threaded code why you would need a lock when adding to a collection. If we take List or ArrayList as an example, the problem arises because internally these collections use an Array as a backing store, and will dynamically "grow" this Array (i.e. by creating a new larger Array and copying the old contents) as certain capacities are met when Add is called. This all happens internally and requires the maintenance of this Array and of variables such as the current size of the collection (as opposed to the Length of the actual array, which might be larger). So adding to the collection may involve multiple steps if the internal Array needs to grow. When using multiple threads in an unsafe manner, multiple threads may indirectly cause growing to happen when adding, and thus trample all over each other's updates. As well as the issue of multiple threads adding at the same time, there is also the issue that another thread may be trying to read the collection whilst the internal state is being changed. Using locks ensures that operations like these are done without interference from other threads.
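As a hedged illustration of the locking described above, here is a small wrapper in which adds, clears and reads all go through one lock. SafeBag and SafeBagDemo are hypothetical names, and the demo thread counts are arbitrary.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper around List<int>: every operation that touches the
// list's internal array or its size takes the same lock.
public sealed class SafeBag
{
    private readonly object _sync = new object();
    private readonly List<int> _items = new List<int>();

    public void Add(int item) { lock (_sync) { _items.Add(item); } }

    public void Clear() { lock (_sync) { _items.Clear(); } }

    public int Count { get { lock (_sync) { return _items.Count; } } }

    // Readers get a copy, so they never enumerate the live list.
    public int[] Snapshot() { lock (_sync) { return _items.ToArray(); } }
}

public static class SafeBagDemo
{
    public static int Run(int threads, int perThread)
    {
        SafeBag bag = new SafeBag();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int j = 0; j < perThread; j++) bag.Add(j);
            });
            workers[i].Start();
        }
        foreach (Thread t in workers) t.Join();

        // With every Add locked, no update is lost.
        return bag.Count;
    }
}
```

Calling SafeBagDemo.Run(4, 1000) should report 4000 items; the same loop against an unlocked List<int> can lose items or throw, which is the failure mode the paragraph above describes.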
At present, the code is wrong; adding a volatile keyword won't fix it. It's not safe to use the .NET classes across threads without adding synchronisation.
It's hard to give straightforward advice without knowing more about the structure of your code. A first step would be to start using the lock keyword around all accesses to the list object; however, there could still be assumptions in the code that don't work across multiple threads.
It's possible to use a collection class that's already safe for multithreaded access, which would avoid the need for getting the lock keyword in the right place, but it's still possible to make errors.
Can you post some more of your code? That way we can give more specific suggestions about making it thread safe.

Use of volatile (Thread.VolatileRead/ Thread.VolatileWrite) in C#

In a multi-threaded program running on a multi-CPU machine, do I need to access shared state (_data in the example code below) using volatile reads/writes to ensure correctness?
In other words, can heap objects be cached on the cpu?
Using the example below and assuming multiple threads will access the GetValue and Add methods, I need ThreadA to be able to add data (using the Add method) and ThreadB to be able to see/get that added data immediately (using the GetValue method). So do I need to add volatile reads/writes to _data to ensure this? Basically I don’t want the added data to be cached on ThreadA’s CPU.
I am not locking (enforcing exclusive thread access) as the code needs to be ultra-fast, and I am not removing any data from _data so I don’t need to lock _data.
Thanks.
**** Update ****************************
Obviously you guys think going lock-free using this example is bad idea. But what side effects or exceptions could I face here?
Could the Dictionary type throw an exception if 1 thread is iterating the values for read and another thread is iterating the values for update? Or would I just experience “dirty reads” (which would be fine in my case)?
**** End Update ****************************
public sealed class Data
{
private volatile readonly Dictionary<string, double> _data = new Dictionary<string, double>();
public double GetValue(string key)
{
double value;
if (!_data.TryGetValue(key, out value))
{
throw new ArgumentException(string.Format("Key {0} does not exist.", key));
}
return value;
}
public void Add(string key, double value)
{
_data.Add(key, value);
}
public void Clear()
{
_data.Clear();
}
}
Thanks for the replies. Regarding the locks, the methods are pretty much constantly called by multiple threads, so my problem is with contested locks, not the actual lock operation.
So my question is about CPU caching: can heap objects (the _data instance field) be cached on a CPU? Do I need to access the _data field using volatile reads/writes?
Also, I am stuck with .NET 2.0.
Thanks for your help.
The MSDN docs for Dictionary<TKey, TValue> say that it's safe for multiple readers but they don't give the "one writer, multiple readers" guarantee that some other classes do. In short, I wouldn't do this.
You say you're avoiding locking because you need the code to be "ultra-fast" - have you tried locking to see what the overhead is? Uncontested locks are very cheap, and when the lock is contested that's when you're benefiting from the added safety. I'd certainly profile this extensively before deciding to worry about the concurrency issues of a lock-free solution. ReaderWriterLockSlim may be useful if you've actually got multiple readers, but it sounds like you've got a single reader and a single writer, at least at the moment - simple locking will be easier in this case.
I think you may be misunderstanding the use of the volatile keyword (either that or I am, and someone please feel free to correct me). The volatile keyword guarantees that get and set operations on the value of the variable itself from multiple threads will always deal with the same copy. For instance, if I have a bool that indicates a state then setting it in one thread will make the new value immediately available to the other.
However, you never change the value of your variable (in this case, a reference). All that you do is manipulate the area of memory that the reference points to. Declaring it as volatile readonly (which, if my understanding is sound, defeats the purpose of volatile by never allowing it to be set) won't have any effect on the actual data that's being manipulated (the back-end store for the Dictionary<>).
All that being said, you really need to use a lock in this case. Your danger extends beyond the prospect of "dirty reads" (meaning that what you read would have been, at some point, valid) into truly unknown territory. As Jon said, you really need proof that locking produces unacceptable performance before you try to go down the road of lockless coding. Otherwise that's the epitome of premature optimization.
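To illustrate the kind of case where volatile does apply (a flag whose own value is written by one thread and read by another, as in the bool example above), here is a minimal sketch. The Worker class is hypothetical and unrelated to the question's Data class.

```csharp
using System.Threading;

public sealed class Worker
{
    // The field itself is reassigned by one thread and read by another,
    // which is the situation volatile is meant for.
    private volatile bool _stopRequested;
    private int _iterations;

    public int Iterations { get { return _iterations; } }

    public void Run()
    {
        while (!_stopRequested)
        {
            Interlocked.Increment(ref _iterations);
            Thread.Sleep(1);
        }
    }

    public void RequestStop()
    {
        _stopRequested = true;
    }
}
```

A typical use would be to start Run on a new Thread, call RequestStop from another thread, and then Join the worker. Note that this protects only the flag; it says nothing about the safety of any collection the worker might also touch.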
The problem is that your add method:
public void Add(string key, double value)
{
_data.Add(key, value);
}
Could cause _data to decide to completely re-organise the data it's holding - at that point a GetValue request could fail in any possible way.
You need a lock or a different data structure / data structure implementation.
I don't think volatile can be a replacement for locking if you start calling methods on it. You are guaranteeing that thread A and thread B see the same copy of the dictionary, but you can still access the dictionary simultaneously. You can use multi-moded locks to increase concurrency. See ReaderWriterLockSlim for example:
"Represents a lock that is used to manage access to a resource, allowing multiple threads for reading or exclusive access for writing."
The volatile keyword is not about locking; it is used to indicate that the value of the specified field might be changed or read by a different thread, or by something else that can run concurrently with your code. This is crucial for the compiler to know, because many optimization processes involve caching the variable's value and rearranging instructions. The volatile keyword tells the compiler to be "cautious" when optimizing instructions that reference a volatile variable.
For multi-threaded usage of a dictionary, there are many ways to do it. The simplest way is using the lock keyword, which has adequate performance. If you need higher performance, you might need to implement your own dictionary for your specific task.
Volatile is not locking; it has nothing to do with synchronization. It's generally safe to do lock-free reads on read-only data. Note that although you don't remove anything from _data, you do call _data.Add(). That is NOT read-only. So yes, this code will blow up in your face in a variety of exciting and difficult to predict ways.
Use locks, it's simple, it's safer. If you're a lock-free guru (you're not!), AND profiling shows a bottleneck related to contention for the lock, AND you cannot solve the contention issues via partitioning or switching to spin-locks THEN AND ONLY THEN can you investigate a solution to get lock-free reads, which WILL involve writing your own Dictionary from scratch and MAY be faster than the locking solution.
Are you starting to see how far off base you are in your thinking here? Just use a damn lock!

Difference between lock(locker) and lock(variable_which_I_am_using)

I'm using C# & .NET 3.5. What is the difference between OptionA and OptionB?
class MyClass
{
private object m_Locker = new object();
private Dictionary<string, object> m_Hash = new Dictionary<string, object>();
public void OptionA()
{
lock(m_Locker){
// Do something with the dictionary
}
}
public void OptionB()
{
lock(m_Hash){
// Do something with the dictionary
}
}
}
I'm starting to dabble in threading (primarily for creating a cache for a multi-threaded app, NOT using the HttpCache class, since it's not attached to a web site), and I see the OptionA syntax in a lot of the examples online, but I don't understand what reason, if any, there is to do that over OptionB.
Option B uses the object to be protected to create a critical section. In some cases, this more clearly communicates the intent. If used consistently, it guarantees only one critical section for the protected object will be active at a time:
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
Option A is less restrictive. It uses a secondary object to create a critical section for the object to be protected. If multiple secondary objects are used, it's possible to have more than one critical section for the protected object active at a time.
private object m_LockerA = new object();
private object m_LockerB = new object();
lock (m_LockerA)
{
// It's possible this block is active in one thread
// while the block below is active in another
// Do something with the dictionary
}
lock (m_LockerB)
{
// It's possible this block is active in one thread
// while the block above is active in another
// Do something with the dictionary
}
Option A is equivalent to Option B if you use only one secondary object. As far as reading code, Option B's intent is clearer. If you're protecting more than one object, Option B isn't really an option.
It's important to understand that lock(m_Hash) does NOT prevent other code from using the hash. It only prevents other code from running that is also using m_Hash as its locking object.
One reason to use option A is because classes are likely to have private variables that you will use inside the lock statement. It is much easier to just use one object which you use to lock access to all of them instead of trying to use finer grain locks to lock access to just the members you will need. If you try to go with the finer grained method you will probably have to take multiple locks in some situations and then you need to make sure you are always taking them in the same order to avoid deadlocks.
Another reason to use option A is because it is possible that the reference to m_Hash will be accessible outside your class. Perhaps you have a public property which supplies access to it, or maybe you declare it as protected and derived classes can use it. In either case once external code has a reference to it, it is possible that the external code will use it for a lock. This also opens up the possibility of deadlocks since you have no way to control or know what order the lock will be taken in.
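A sketch of the first reason given above: one private lock object guarding several private fields that must change together. The class CountingCache and its members are hypothetical, invented only to show the shape.

```csharp
using System.Collections.Generic;

public sealed class CountingCache
{
    // One private lock protects the dictionary and both counters, so
    // there is never more than one lock to take and no ordering to manage.
    // It is never exposed, so outside code cannot lock on it.
    private readonly object m_Locker = new object();
    private readonly Dictionary<string, object> m_Hash =
        new Dictionary<string, object>();
    private int m_Hits;
    private int m_Misses;

    public void Set(string key, object value)
    {
        lock (m_Locker)
        {
            m_Hash[key] = value;
        }
    }

    public bool TryGet(string key, out object value)
    {
        lock (m_Locker)
        {
            bool found = m_Hash.TryGetValue(key, out value);
            if (found) m_Hits++; else m_Misses++;
            return found;
        }
    }

    public int Hits { get { lock (m_Locker) { return m_Hits; } } }

    public int Misses { get { lock (m_Locker) { return m_Misses; } } }
}
```

Locking on m_Hash instead would behave the same here as long as m_Hash stays private; the separate object mainly removes the risk that a leaked reference is used as a lock by someone else.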
Actually, it is not a good idea to lock on an object if you are using its members.
Jeffrey Richter wrote in his book "CLR via C#" that there is no guarantee that the class of the object you are using for synchronization does not use lock(this) in its own implementation. (Interestingly, lock(this) was a way of synchronizing recommended by Microsoft for some time, until it was recognized as a mistake.) It is therefore always a good idea to use a separate, dedicated object for synchronization. So, as you can see, OptionB gives you no guarantee of deadlock safety.
So, OptionA is much safer than OptionB.
It's not what you're "locking" that matters; it's the code contained between the lock { ... } braces that's important, because that is the code you're preventing from being executed.
If one thread takes out a lock() on any object, it prevents other threads from obtaining a lock on the same object, and hence prevents the second thread from executing the code between the braces.
So that's why most people just create a junk object to lock on, it prevents other threads from obtaining a lock on that same junk object.
I think the scope of the variable you "pass" in will determine the scope of the lock.
i.e. An instance variable will be in respect of the instance of the class whereas a static variable will be for the whole AppDomain.
Looking at the implementation of the collections (using Reflector), the pattern seems to follow that an instance variable called SyncRoot is declared and used for all locking operations in respect of the instance of the collection.
Well, it depends on what you wanted to lock(be made threadsafe).
Normally I would choose OptionB to provide thread-safe access to m_Hash ONLY. I would use OptionA when protecting a value type, which can't be used with lock, or when I had a group of objects that needed locking together but I didn't want to lock the whole instance by using lock(this).
Locking the object that you're using is simply a matter of convenience. An external lock object can make things simpler, and is also needed if the shared resource is private, like with a collection (in which case you use the ICollection.SyncRoot object).
OptionA is the way to go here, as long as all of your code locks on m_Locker whenever it accesses m_Hash.
Now imagine this case: you lock on the object, and one of the functions you call on that object contains a lock(this) code segment. In this case that is a sure, unrecoverable deadlock.
