Using the C# Volatile keyword in Threaded Application - c#

I have a class that has a few arraylists in it.
My main class creates a new instance of this class. My main class has at least 2 threads adding and removing from my class with the arraylists in it. At the moment everything is running fine, but I was just wondering if it would be safer to declare my class with the arraylists in it as volatile, e.g.:
private volatile MyClass myclass;
myclass = new MyClass();
......
myclass.Add(...)
myclass.Clear(..)

Using the volatile keyword will not make your code thread-safe in this example. The volatile keyword is typically used to ensure that every read or write of a variable (i.e. a class field) actually goes to the shared field, rather than being served from a cached copy (e.g. a value kept in a CPU register). The volatile keyword is a way of saying "do not apply caching or reordering optimizations to this shared field", and it removes the issue where threads use local copies of a field and so do not see each other's updates.
In your case the value of myclass is not actually being updated (i.e. you are not re-assigning myclass) so volatile is not useful for you, and it is not the update of the myclass variable you actually want to make thread-safe in this case anyway.
If you wish to make updates to the object itself thread-safe, then using a "lock" around "Add" and "Clear" is a straightforward alternative. This ensures that only one thread at a time can perform these operations, which update the internal state of myclass and so must not run in parallel.
A lock can be used as follows:
private readonly object syncObj = new object();
private readonly MyClass myclass = new MyClass();
......
lock (syncObj)
{
myclass.Add(...)
}
lock (syncObj)
{
myclass.Clear(..)
}
You also need to add locking around any code that reads the state updated by "Add", even though no such code appears in your example.
It may not be obvious when first writing multi-threaded code why you would need a lock when adding to a collection. Take List or ArrayList as an example: internally these collections use an array as a backing store, and they dynamically "grow" this array (i.e. by creating a new, larger array and copying the old contents) when Add is called and the current capacity is exhausted. This all happens internally and requires maintaining both the array and variables such as the current size of the collection (as opposed to the Length of the actual array, which might be larger). So adding to the collection may involve multiple steps if the internal array needs to grow. When the collection is used from multiple threads in an unsafe manner, several threads may indirectly trigger growing at the same time while adding, and thus trample all over each other's updates. As well as the issue of multiple threads adding at the same time, there is also the issue that another thread may be trying to read the collection whilst the internal state is being changed. Using locks ensures that operations like these are done without interference from other threads.
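As a minimal sketch of the locking described above (the SharedList wrapper and SharedListDemo names are hypothetical, not from the question), the following wraps a List<int> so that every Add, Clear and Count goes through the same lock object. Two threads then add 10,000 items each.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: every access to the inner list takes the same lock.
class SharedList
{
    private readonly object syncObj = new object();
    private readonly List<int> items = new List<int>();

    public void Add(int value)
    {
        lock (syncObj) { items.Add(value); }
    }

    public void Clear()
    {
        lock (syncObj) { items.Clear(); }
    }

    // Reads take the lock too, so they never observe a half-finished Add.
    public int Count
    {
        get { lock (syncObj) { return items.Count; } }
    }
}

static class SharedListDemo
{
    // Two threads each add 10,000 items; returns the final count.
    public static int Run()
    {
        var list = new SharedList();
        ThreadStart work = () =>
        {
            for (int i = 0; i < 10000; i++) list.Add(i);
        };
        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return list.Count;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 20000 with the lock in place
    }
}
```

Without the lock statements the same program may throw or report a smaller count, depending on timing.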

At present, the code is wrong; adding a volatile keyword won't fix it. It's not safe to use the .NET classes across threads without adding synchronisation.
It's hard to give straightforward advice without knowing more about the structure of your code. A first step would be to start using the lock keyword around all accesses to the list object; however, there could still be assumptions in the code that don't work across multiple threads.
It's possible to use a collection class that's already safe for multithreaded access, which would avoid the need for getting the lock keyword in the right place, but it's still possible to make errors.
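For illustration only, here is a sketch using ConcurrentQueue<T> from System.Collections.Concurrent. This assumes .NET Framework 4 or later, which the question does not state. The ConcurrentQueueDemo name is hypothetical. The comment marks one kind of error that remains possible even with a thread-safe collection.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class ConcurrentQueueDemo
{
    // Two threads enqueue 10,000 items each without any explicit lock.
    public static int Run()
    {
        var queue = new ConcurrentQueue<int>();
        ThreadStart producer = () =>
        {
            for (int i = 0; i < 10000; i++) queue.Enqueue(i);
        };
        var t1 = new Thread(producer);
        var t2 = new Thread(producer);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // Still easy to get wrong: checking IsEmpty and then taking an item
        // is two separate steps, and another thread could empty the queue in
        // between. TryDequeue performs the check and the removal atomically.
        int first;
        int removed = queue.TryDequeue(out first) ? 1 : 0;

        return queue.Count + removed;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 20000
    }
}
```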
Can you post some more of your code? That way we can give more specific suggestions about making it thread safe.

Related

Generic dictionary - possible locking issue?

Specific Answers Only Please! I'm decently familiar with the better(best) practices around collection locking, thread safety etc. Just want some answers / ideas around this specific scenario.
We have some legacy code of the type:
public class GodObject
{
private readonly Dictionary<string, string> _signals;
//bunch of methods accessing the dictionary
private void SampleMethod1()
{
lock(_signals)
{
//critical code section 1
}
}
public void SampleMethod2()
{
lock(_signals)
{
//critical code section 2
}
}
}
All access to the dictionary is inside such lock statements. We're getting some bugs which could be explained if the locking was not explicitly working - meaning 2 or more threads getting simultaneous access to the dictionary.
So my question is this: is there any scenario where the critical sections could be simultaneously accessed by multiple threads? To me, it should not be possible, since the reference is readonly, it's not as though the object could be changing, and most of the issues around lock() are around deadlocks rather than synchronization not happening. But maybe I'm missing some nuance or something glaring?
This is running in a long running windows service .NET Framework 3.5.
There are three problems I can imagine occurring outside the code you posted:
1. Somebody might access the dictionary without locking on it. Using lock on an object will prevent anyone else from using lock on the same object at the same time, but it won't do anything to prevent other threads from using the object without locking on it. Note that because it would not have been overly difficult to have written Dictionary [and for that matter List] in such a way as to allow safe simultaneous use by multiple readers and one writer that only adds information, some people may assume that read methods don't need locking. Unfortunately, that assumption is false: Microsoft could have added such thread safety fairly cheaply, but didn't.
2. As Servy suggested, someone might be assuming that the collection won't change between calls to two independent methods.
3. If some code which acquires a lock assumes a collection isn't going to change while the lock is held, but then calls some outside method while holding the lock, it's possible that the outside method could change the object despite the lock being held.
Unless the object which owns the dictionary keeps all references to itself, so that no outside code ever gets a reference to the dictionary, I think the first of these problems is perhaps the most likely. The other two problems can also occur sometimes, however.
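The last of these problems can be reproduced deterministically, because the lock statement is re-entrant for the thread that already holds it. The sketch below is an assumption-laden illustration, not the legacy code: SignalTable, Callback and CountAroundCallback are hypothetical names, and the callback stands in for "some outside method".

```csharp
using System;
using System.Collections.Generic;

class SignalTable
{
    private readonly Dictionary<string, string> _signals =
        new Dictionary<string, string>();

    // Hypothetical hook standing in for an outside method called under the lock.
    public Action Callback;

    public void Set(string key, string value)
    {
        lock (_signals) { _signals[key] = value; }
    }

    // Returns the dictionary count before and after invoking the callback,
    // both measured while the lock is held.
    public int[] CountAroundCallback()
    {
        lock (_signals)
        {
            int before = _signals.Count;
            if (Callback != null) Callback(); // runs on this thread, lock still held
            int after = _signals.Count;
            return new[] { before, after };
        }
    }
}

static class ReentrancyDemo
{
    public static int[] Run()
    {
        var table = new SignalTable();
        table.Set("a", "1");
        // The callback re-enters the same lock on the same thread, so it is
        // not blocked and changes the dictionary in the middle of the
        // critical section.
        table.Callback = () => table.Set("b", "2");
        return table.CountAroundCallback();
    }

    static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0] + " -> " + counts[1]); // 1 -> 2
    }
}
```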

Should a lock variable be declared volatile?

I have the following Lock statement:
private readonly object ownerLock_ = new object();
lock (ownerLock_)
{
}
Should I use volatile keyword for my lock variable?
private readonly volatile object ownerLock_ = new object();
On MSDN I saw that it is usually used for a field that is accessed without locking, so if I use lock I don't need to use volatile?
From MSDN:
The volatile modifier is usually used for a field that is accessed by
multiple threads without using the lock statement to serialize access.
If you're only ever accessing the data that the lock "guards" while you own the lock, then yes - making those fields volatile is superfluous. You don't need to make the ownerLock_ variable volatile either. (You haven't currently shown any actual code within the lock statement, which makes it hard to talk about in concrete terms - but I'm assuming you'd actually be reading/modifying some data within the lock statement.)
volatile should be very rarely used in application code. If you want lock-free access to a single variable, Interlocked is almost always simpler to reason about. If you want lock-free access beyond that, I would almost always start locking. (Or try to use immutable data structures to start with.)
I'd only expect to see volatile within code which is trying to build higher level abstractions for threading - so within the TPL codebase, for example. It's really a tool for experts who really understand the .NET memory model thoroughly... of whom there are very few, IMO.
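As a small sketch of the Interlocked suggestion (the InterlockedDemo name is hypothetical), two threads increment a shared counter without a lock. Note that marking the field volatile would not help here, since counter++ is still a separate read and write.

```csharp
using System;
using System.Threading;

static class InterlockedDemo
{
    private static int counter;

    // Two threads perform 100,000 atomic increments each.
    public static int Run()
    {
        counter = 0;
        ThreadStart work = () =>
        {
            for (int i = 0; i < 100000; i++)
            {
                Interlocked.Increment(ref counter); // atomic read-modify-write
            }
        };
        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return counter; // safe to read: both threads have finished
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 200000
    }
}
```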
If something is readonly it's thread-safe, period. (Well, almost. An expert might be able to figure out how to get a NullReferenceException on your lock statement, but it wouldn't be easy.) With readonly you don't need volatile, Interlocked, or locking. It's the ideal keyword for multi-threading, and you should use it wherever you can. It works great for a lock object where its big disadvantage (you can't change the value) doesn't matter.
Also, while the reference is immutable, the object referenced may not be. "new object()" is fine here, but if it were a List or something else mutable (and not thread-safe) you would want to lock on the reference (and all other references to it, if any) to keep the object from being changed by two threads at once.

What is exactly a "thread-safe type"? When do we need to use the "lock" statement?

I read all documentation about thread-safe types and the "lock" statement, but I am still not getting it 100%.
When exactly do I need to use the "lock" statement? How does it relate to (non) thread-safe types? Thank you.
Imagine an instance of a class with a global variable in it. Imagine two threads call a method on that object at exactly the same time, and that method updates the global variable inside.
The likelihood is that value in the variable will get corrupted. Different languages and compilers/interpreters will deal with this in different ways (or not at all...) but the point is that you get "undesired" and "unpredictable" results.
Now imagine that the method obtains a "lock" on the variable before attempting to read from or write to it. The first thread to call the method will get a "lock" on the variable, the second thread to call the method will have to wait until the lock is released by the first thread. While you still have a race condition (i.e. the second thread might overwrite the value from the first) at least you have predictable results because no two threads (that are unaware of each other) can modify the value at the same time.
You use the lock statement to obtain that lock on the variable. Typically you'd define a separate object variable and use that for the lock object:
public class MyThreadSafeClass
{
private readonly object lockObject = new object();
private string mySharedString;
public void ThreadSafeMethod(string newValue)
{
lock (lockObject)
{
// Once one thread has got inside this lock statement, any others will have to wait outside for their turn...
mySharedString = newValue;
}
}
}
A type is deemed "thread-safe" if it applies the principle that no corruption will occur if shared data is accessed by multiple threads at the same time.
Beware the difference between "immutable" and "thread-safe". Thread-safe says that you have coded for the scenario and won't get corruption if two threads access shared state at the same time, whereas immutability is simply saying you return a new object rather than modifying it. Immutable objects are thread-safe, but not all thread-safe objects are immutable.
Thread safe code means code that can be accessed with many threads and still operate correctly.
In C#, this normally requires some sort of synchronization mechanism. A simple one is the lock statement (which behind the scenes is a call to Monitor.Enter, paired with Monitor.Exit in a finally block). A code block that is surrounded by a lock block can only be executed by one thread at a time for a given lock object.
Any use of a type that is not thread safe requires you to manage synchronization yourself.
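To make the Monitor remark concrete, here is a sketch of roughly what the compiler generates for a lock statement in C# 4 and later. The Counter and MonitorDemo names are hypothetical. Both methods take the same lock, so they exclude each other.

```csharp
using System;
using System.Threading;

class Counter
{
    private readonly object lockObject = new object();
    private int count;

    public void IncrementWithLock()
    {
        lock (lockObject) { count++; }
    }

    // Approximately what the lock statement above expands to.
    public void IncrementWithMonitor()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(lockObject, ref lockTaken);
            count++;
        }
        finally
        {
            // Release the lock even if the body throws.
            if (lockTaken) Monitor.Exit(lockObject);
        }
    }

    public int Count
    {
        get { lock (lockObject) { return count; } }
    }
}

static class MonitorDemo
{
    // One thread uses lock, the other uses Monitor directly.
    public static int Run()
    {
        var counter = new Counter();
        var t1 = new Thread(() =>
        {
            for (int i = 0; i < 50000; i++) counter.IncrementWithLock();
        });
        var t2 = new Thread(() =>
        {
            for (int i = 0; i < 50000; i++) counter.IncrementWithMonitor();
        });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return counter.Count;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 100000
    }
}
```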
A good resource to learn about threading in C# is the free eBook by Joe Albahari, found here.
http://en.wikipedia.org/wiki/Thread_safety

How do I use the volatile keyword in this model?

I have a data class with lots of data in it (TV schedule data).
The data is queried from one side and periodically updated from the other side.
There are two threads: the first thread queries the data on request and the second thread updates the data on regular intervals.
To prevent locking, I use two instances (copies) of the data class: the live instance and the backup instance.
Initially, both instances are filled with the same data. The first thread only reads from the live instance.
The second thread periodically updates both instances as follows:
1. Update the backup instance.
2. Swap the backup and live instance (i.e. the backup instance becomes the live instance).
3. Update the backup instance.
Both backup instance and live instance are now up-to-date.
My question is: how should I use the volatile keyword here?
public class data
{
// Lots of fields here.
// Should these fields also be declared volatile?
}
I have already made the references volatile:
public volatile data live;
public volatile data backup;
Fields should be declared volatile if you plan to modify them outside locks, or without Interlocked. Here is the best article that explains volatile in depth: http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
To be honest, I would just lock on it. The correctness is so much easier to check, and the need for the backup is removed.
With your plan here, the fields would also have to be volatile. Consider the case otherwise:
public class Data
{
public int SimpleInt;
}
Here we have just a single public field for simplicity; the same applies to more realistic structures. (Incidentally, capitalised class names are the more common convention in C#.)
Now consider live.SimpleInt as seen by thread A. Because live could be cached, we need to have it as volatile. However, consider that when the object is swapped with backup, and then swapped back to live, then live will have the same memory location as it did before (unless the GC has moved it). Therefore live.SimpleInt will have the same memory location as it did before, and therefore if it was not volatile, thread A may be using a cached version of live.SimpleInt.
However, if you created a new Data object, rather than swapping in and out, then the new value of live.SimpleInt will not be in the thread's cache, and it could be safely non-volatile.
It's also important to consider that the fields of the fields will have to be volatile too.
Indeed, you now need just one stored Data object. The new one is created as an object referenced by only one thread, so it cannot be damaged by, or do damage to, another thread. Its creation is based on values read from live, which is also safe because the other thread is only reading (barring some memoisation techniques that mean that "reads" are really writes behind the scenes, reads can't harm other reads, though they can be harmed by writes). The new object is altered only while it is visible to a single thread, so only the final write requires any concern about synchronisation. That write should indeed be safe with only volatile or a MemoryBarrier used for protection, since assigning a reference is atomic and you no longer care about the old value.
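A minimal sketch of this create-then-publish approach, assuming a single updater thread and a data object that is never modified after it has been published. Schedule, ScheduleStore and SnapshotDemo are hypothetical stand-ins for the TV schedule data. The reader checks that every snapshot it sees is internally consistent.

```csharp
using System;
using System.Threading;

// Hypothetical immutable snapshot: fields are set once in the constructor.
sealed class Schedule
{
    public readonly int Version;
    public readonly string Title;

    public Schedule(int version)
    {
        Version = version;
        Title = "schedule " + version;
    }
}

class ScheduleStore
{
    private volatile Schedule live = new Schedule(0);

    // Reader side: a single volatile read yields a complete snapshot.
    public Schedule Current { get { return live; } }

    // Updater side (assumed to be one thread only): build the new object
    // privately, then publish it with one atomic reference write.
    public void Update()
    {
        Schedule next = new Schedule(live.Version + 1);
        live = next;
    }
}

static class SnapshotDemo
{
    // Returns true if the reader never saw a half-built snapshot.
    public static bool Run()
    {
        var store = new ScheduleStore();
        bool consistent = true;

        var updater = new Thread(() =>
        {
            for (int i = 0; i < 1000; i++) store.Update();
        });
        var reader = new Thread(() =>
        {
            while (true)
            {
                Schedule s = store.Current; // capture the reference once
                if (s.Title != "schedule " + s.Version) consistent = false;
                if (s.Version == 1000) break;
            }
        });

        reader.Start(); updater.Start();
        updater.Join(); reader.Join();
        return consistent;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // True
    }
}
```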
I do not think you are going to get the effect you want by marking things with volatile. Consider this code.
volatile data live;
void Thread1()
{
if (live.Field1)
{
Console.WriteLine(live.Field1);
}
}
In the example above false could be written to the console if the second thread swapped the live and backup references between the time the first thread entered the if and called Console.WriteLine.
If that problem does not concern you then all you really need to do is mark the live variable as volatile. You do not need to mark the individual fields in data as volatile. The reason is because volatile reads create acquire fence memory barriers and volatile writes create release fence memory barriers. What that means is that when thread 2 swaps the references then all writes to the individual fields of data must commit first and when thread 1 wants to read the individual fields of the live instance the live variable must be reacquired from main memory first. You do not need to mark the backup variable as volatile because it is never used by thread 1.
The advanced threading section in Joe Albahari's ebook goes into a great deal of detail on the semantics of volatile and should explain why you only need to mark your live reference as such.
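One way to avoid the check-then-use problem shown in Thread1 is to read the volatile reference once into a local variable and use only that local. This is a sketch with hypothetical names (Reader, Swap, Describe). It assumes the instance is not modified after it has been published, which does not hold in the two-instance scheme from the question, where the old live instance is later updated as the backup.

```csharp
using System;

class Data
{
    public bool Field1;
}

class Reader
{
    private volatile Data live = new Data { Field1 = true };

    // Called by the updating thread to publish a different instance.
    public void Swap(Data replacement)
    {
        live = replacement;
    }

    public string Describe()
    {
        Data snapshot = live; // one volatile read; later swaps do not affect it
        if (snapshot.Field1)
        {
            // Same instance as the one tested above, so this cannot
            // report a value taken from a swapped-in object.
            return "Field1 is " + snapshot.Field1;
        }
        return "Field1 not set";
    }
}

static class LocalCopyDemo
{
    static void Main()
    {
        var reader = new Reader();
        Console.WriteLine(reader.Describe()); // Field1 is True
        reader.Swap(new Data { Field1 = false });
        Console.WriteLine(reader.Describe()); // Field1 not set
    }
}
```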

Difference between lock(locker) and lock(variable_which_I_am_using)

I'm using C# & .NET 3.5. What is the difference between OptionA and OptionB?
class MyClass
{
private object m_Locker = new object();
private Dictionary<string, object> m_Hash = new Dictionary<string, object>();
public void OptionA()
{
lock(m_Locker){
// Do something with the dictionary
}
}
public void OptionB()
{
lock(m_Hash){
// Do something with the dictionary
}
}
}
I'm starting to dabble in threading (primarily for creating a cache for a multi-threaded app, NOT using the HttpCache class, since it's not attached to a web site), and I see the OptionA syntax in a lot of the examples online, but I don't understand what reason, if any, there is for doing that over OptionB.
Option B uses the object to be protected to create a critical section. In some cases, this more clearly communicates the intent. If used consistently, it guarantees only one critical section for the protected object will be active at a time:
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
Option A is less restrictive. It uses a secondary object to create a critical section for the object to be protected. If multiple secondary objects are used, it's possible to have more than one critical section for the protected object active at a time.
private object m_LockerA = new object();
private object m_LockerB = new object();
lock (m_LockerA)
{
// It's possible this block is active in one thread
// while the block below is active in another
// Do something with the dictionary
}
lock (m_LockerB)
{
// It's possible this block is active in one thread
// while the block above is active in another
// Do something with the dictionary
}
Option A is equivalent to Option B if you use only one secondary object. As far as reading code, Option B's intent is clearer. If you're protecting more than one object, Option B isn't really an option.
It's important to understand that lock(m_Hash) does NOT prevent other code from using the hash. It only prevents other code from running that is also using m_Hash as its locking object.
One reason to use option A is because classes are likely to have private variables that you will use inside the lock statement. It is much easier to just use one object which you use to lock access to all of them instead of trying to use finer grain locks to lock access to just the members you will need. If you try to go with the finer grained method you will probably have to take multiple locks in some situations and then you need to make sure you are always taking them in the same order to avoid deadlocks.
Another reason to use option A is because it is possible that the reference to m_Hash will be accessible outside your class. Perhaps you have a public property which supplies access to it, or maybe you declare it as protected and derived classes can use it. In either case once external code has a reference to it, it is possible that the external code will use it for a lock. This also opens up the possibility of deadlocks since you have no way to control or know what order the lock will be taken in.
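A sketch of the first reason, in the shape of the cache the question mentions: one private lock object guards the dictionary together with two counters, so there is only a single lock to reason about. SimpleCache, GetOrAdd and the counters are hypothetical names, not from the question.

```csharp
using System;
using System.Collections.Generic;

class SimpleCache
{
    // Private and never handed out, so no outside code can lock on it.
    private readonly object m_Locker = new object();

    // All three members below are guarded by m_Locker.
    private readonly Dictionary<string, object> m_Hash =
        new Dictionary<string, object>();
    private int m_Hits;
    private int m_Misses;

    // Returns the cached value for key, storing valueIfMissing first
    // when the key is not present yet.
    public object GetOrAdd(string key, object valueIfMissing)
    {
        lock (m_Locker)
        {
            object value;
            if (m_Hash.TryGetValue(key, out value))
            {
                m_Hits++;
                return value;
            }
            m_Misses++;
            m_Hash[key] = valueIfMissing;
            return valueIfMissing;
        }
    }

    public int Hits { get { lock (m_Locker) { return m_Hits; } } }
    public int Misses { get { lock (m_Locker) { return m_Misses; } } }
}

static class SimpleCacheDemo
{
    static void Main()
    {
        var cache = new SimpleCache();
        cache.GetOrAdd("a", 1);                    // miss: stores 1
        Console.WriteLine(cache.GetOrAdd("a", 2)); // 1 (hit: existing value wins)
        Console.WriteLine(cache.Hits + " hit, " + cache.Misses + " miss"); // 1 hit, 1 miss
    }
}
```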
Actually, it is not a good idea to lock on an object if you are using its members.
Jeffrey Richter wrote in his book "CLR via C#" that there is no guarantee that the class of the object you are using for synchronization does not use lock(this) in its own implementation. (Interestingly, this was a way of synchronizing recommended by Microsoft for some time; then they found that it was a mistake.) So it is always a good idea to use a special separate object for synchronization. As you can see, OptionB gives you no guarantee of deadlock safety.
So, OptionA is much safer than OptionB.
It's not what you're "locking" that matters; it's the code contained between the lock { ... } braces that is important and that you're preventing from being executed.
If one thread takes out a lock() on any object, it prevents other threads from obtaining a lock on the same object, and hence prevents the second thread from executing the code between the braces.
So that's why most people just create a junk object to lock on, it prevents other threads from obtaining a lock on that same junk object.
I think the scope of the variable you "pass" in will determine the scope of the lock.
i.e. An instance variable will be in respect of the instance of the class whereas a static variable will be for the whole AppDomain.
Looking at the implementation of the collections (using Reflector), the pattern seems to follow that an instance variable called SyncRoot is declared and used for all locking operations in respect of the instance of the collection.
Well, it depends on what you wanted to lock(be made threadsafe).
Normally I would choose OptionB to provide thread-safe access to m_Hash ONLY. OptionA I would use when locking a value type, which can't be used with lock, or when I have a group of objects that need locking together but I don't want to lock the whole instance by using lock(this).
Locking the object that you're using is simply a matter of convenience. An external lock object can make things simpler, and is also needed if the shared resource is private, like with a collection (in which case you use the ICollection.SyncRoot object).
OptionA is the way to go here as long as, in all your code, you lock on m_Locker whenever you access m_Hash.
Now imagine this case: you lock on the object, and one of that object's methods that you call contains a lock(this) code segment. Your lock and the object's internal lock are now the same lock, so as soon as a second thread and a second lock are involved, this can end in a deadlock that is very hard to diagnose.
