I'm looking at some code that I don't understand the point of.
private object myProperty_lock = new Object();
private SomeType myProperty_backing;
public SomeType MyProperty
{
get { lock(myProperty_lock) { return myProperty_backing; } }
set { lock(myProperty_lock) { myProperty_backing = value; } }
}
This pattern is used many times within the same class.
Each time this pattern is used, there's a new lock object. (It's not a shared lock object for all properties.)
The types used are reference types and primitives. (No non-primitive structs.)
Does this code do anything? References & primitives are assigned atomically, so we don't need to protect against a thread switch in the middle of the assignment. The lock object isn't used anywhere else, so there's no protection there.
Is there something with memory barriers, perhaps? I had assumed that a lock inside a method didn't affect things outside of that method.
The fact that code is inside a method does not imply a memory barrier. So you may be on the right track for suspecting that the locks are for that fresh read memory guarantees.
Of course it also could have been added due to the person adding it was a cargo cult programmer and did not understand why to do it and only did it because he saw a code example that does it.
The problem I see here is that by using lock the developer indicates a concern regarding thread safety. They thought that concurrent threads might be accessing this property.
My first question would be whether that's actually the case - is there concurrent access to this property?
There might be a valid scenario, but is there a reason why any number of threads might be able to set that reference? What sort of logic is happening if one thread sets the property, presumably for some valid reason, only to have it immediately overwritten by another thread? How is the application doing something predictable? Did the reference set by the previous caller just not matter? Then why did it set the property?
And what about the object - SomeType - returned from the property? Now any number of threads can have a reference to the same instance. Can SomeType can be altered, and if so, is it thread safe?
I normally wouldn't wonder, but when I see something that looks odd with multithreading I like to dig a little deeper. Maybe they have it all patched together and it works, but sometimes they don't.
Related
I have been learning about locking on threads and I have not found an explanation for why creating a typical System.Object, locking it and carrying out whatever actions are required during the lock provides the thread safety?
Example
object obj = new object()
lock (obj) {
//code here
}
At first I thought that it was just being used as a place holder in examples and meant to be swapped out with the Type you are dealing with. But I find examples such as Dennis Phillips points out, doesn't appear to be anything different than actually using an instance of Object.
So taking an example of needing to update a private dictionary, what does locking an instance of System.Object do to provide thread safety as opposed to actually locking the dictionary (I know locking the dictionary in this case could case synchronization issues)?
What if the dictionary was public?
//what if this was public?
private Dictionary<string, string> someDict = new Dictionary<string, string>();
var obj = new Object();
lock (obj) {
//do something with the dictionary
}
The lock itself provides no safety whatsoever for the Dictionary<TKey, TValue> type. What a lock does is essentially
For every use of lock(objInstance) only one thread will ever be in the body of the lock statement for a given object (objInstance)
If every use of a given Dictionary<TKey, TValue> instance occurs inside a lock. And every one of those lock uses the same object then you know that only one thread at a time is ever accessing / modifying the dictionary. This is critical to preventing multiple threads from reading and writing to it at the same time and corrupting its internal state.
There is one giant problem with this approach though: You have to make sure every use of the dictionary occurs inside a lock and it uses the same object. If you forget even one then you've created a potential race condition, there will be no compiler warnings and likely the bug will remain undiscovered for some time.
In the second sample you showed you're using a local object instance (var indicates a method local) as a lock parameter for an object field. This is almost certainly the wrong thing to do. The local will live only for the lifetime of the method. Hence 2 calls to the method will use lock on different locals and hence all methods will be able to simultaneously enter the lock.
It used to be common practice to lock on the shared data itself:
private Dictionary<string, string> someDict = new Dictionary<string, string>();
lock (someDict )
{
//do something with the dictionary
}
But the (somewhat theoretical) objection is that other code, outside of your control, could also lock on someDict and then you might have a deadlock.
So it is recommended to use a (very) private object, declared in 1-to-1 correspondence with the data, to use as a stand-in for the lock. As long as all code that accesses the dictionary locks on on obj the tread-safety is guaranteed.
// the following 2 lines belong together!!
private Dictionary<string, string> someDict = new Dictionary<string, string>();
private object obj = new Object();
// multiple code segments like this
lock (obj)
{
//do something with the dictionary
}
So the purpose of obj is to act as a proxy for the dictionary, and since its Type doesn't matter we use the simplest type, System.Object.
What if the dictionary was public?
Then all bets are off, any code could access the Dictionary and code outside the containing class is not even able to lock on the guard object. And before you start looking for fixes, that simply is not an sustainable pattern. Use a ConcurrentDictionary or keep a normal one private.
The object which is used for locking does not stand in relation to the objects that are modified during the lock. It could be anything, but should be private and no string, as public objects could be modified externally and strings could be used by two locks by mistake.
So far as I understand it, the use of a generic object is simply to have something to lock (as an internally lockable object). To better explain this; say you have two methods within a class, both access the Dictionary, but may be running on different threads. To prevent both methods from modifying the Dictionary at the same time (and potentially causing deadlock), you can lock some object to control the flow. This is better illustrated by the following example:
private readonly object mLock = new object();
public void FirstMethod()
{
while (/* Running some operations */)
{
// Get the lock
lock (mLock)
{
// Add to the dictionary
mSomeDictionary.Add("Key", "Value");
}
}
}
public void SecondMethod()
{
while (/* Running some operation */)
{
// Get the lock
lock (mLock)
{
// Remove from dictionary
mSomeDictionary.Remove("Key");
}
}
}
The use of the lock(...) statement in both methods on the same object prevents the two methods from accessing the resource at the same time.
The important rules for the object you lock on are:
It must be an object visible only to the code that needs to lock on it. This avoids other code also locking on it.
This rules out strings that could be interned, and Type objects.
This rules out this in most cases, and the exceptions are too few and offer little in exploiting, so just don't use this.
Note also that some cases internal to the framework lock on Types and this, so while "it's okay as long as nobody else does it" is true, but it's already too late.
It must be static to protect static static operations, it may be instance to protect instance operations (including those internal to a instance that is held in a static).
You don't want to lock on a value-type. If you really wanted too you could lock on a particular boxing of it, but I can't think of anything that this would gain beyond proving that it's technically possible - it's still going to lead to the code being less clear as to just what locks on what.
You don't want to lock on a field that you may change during the lock being held, as you'll no longer have the lock on what you appear to have the lock on (it's just about plausible that there's a practical use for the effect of this, but there's going to be an impedance between what the code appears to do at first read and what it really does, which is never good).
The same object must be used to lock on all operations that may conflict with each other.
While you can have correctness with overly-broad locks, you can get better performance with finer. E.g. if you had a lock that was protecting 6 operations, and realised that 2 of those operations couldn't interfere with the other 4, so you changed to having 2 lock objects, then you can gain by having better coherency (or crash-and-burn if you were wrong in that analysis!)
The first point rules out locking on anything that is either visible or which could be made visible (e.g. a private instance that is returned by a protected or public member should be considered public as far as this analysis goes, anything captured by a delegate could end up elsewhere, and so on).
The last two points can mean that there's no obvious "type you are dealing with" as you put it, because locks don't protect objects, the protect operations done on objects and you may either have more than one object affected, or the same object affected by more than one group of operations that must be locked.
Hence it can be good practice to have an object that exists purely to lock on. Since it's doing nothing else, it can't get mixed up with other semantics or written over when you don't expect. And since it does nothing else it may as well be the lightest reference type that exists in .NET; System.Object.
Personally, I do prefer to lock on an object related to an operation when it does clearly fit the bill of the "type you are dealing with", and none of the other concerns apply, as it seems to me to be quite self-documenting, but to others the risk of doing it wrong out-weighs that benefit.
I was trying to lock a Boolean variable when I encountered the following error :
'bool' is not a reference type as required by the lock statement
It seems that only reference types are allowed in lock statements, but I'm not sure I understand why.
Andreas is stating in his comment:
When [a value type] object is passed from one thread to the other, a copy is made, so the threads end up working on 2 different objects, which is safe.
Is it true? Does that mean that when I do the following, I am in fact modifying two different x in the xToTrue and the xToFalse method?
public static class Program {
public static Boolean x = false;
[STAThread]
static void Main(string[] args) {
var t = new Thread(() => xToTrue());
t.Start();
// ...
xToFalse();
}
private static void xToTrue() {
Program.x = true;
}
private static void xToFalse() {
Program.x = false;
}
}
(this code alone is clearly useless in its state, it is only for the example)
P.S: I know about this question on How to properly lock a value type. My question is not related to the how but to the why.
Just a wild guess here...
but if the compiler let you lock on a value type, you would end up locking nothing at all... because each time you passed the value type to the lock, you would be passing a boxed copy of it; a different boxed copy. So the locks would be as if they were entirely different objects. (since, they actually are)
Remember that when you pass a value type for a parameter of type object, it gets boxed (wrapped) into a reference type. This makes it a brand-new object each time this happens.
You cannot lock a value type because it doesn't have a sync root record.
Locking is performed by CLR and OS internals mechanisms that rely upon an object having a record that can only be accessed by a single thread at a time - sync block root. Any reference type would have:
Pointer to a type
Sync block root
Pointer to the instance data in heap
It expands to:
System.Threading.Monitor.Enter(x);
try {
...
}
finally {
System.Threading.Monitor.Exit(x);
}
Although they would compile, Monitor.Enter/Exit require a reference type because a value type would be boxed to a different object instance each time so each call to Enter and Exit would be operating on different objects.
From the MSDN Enter method page:
Use Monitor to lock objects (that is, reference types), not value types. When you pass a value type variable to Enter, it is boxed as an object. If you pass the same variable to Enter again, it is boxed as a separate object, and the thread does not block. In this case, the code that Monitor is supposedly protecting is not protected. Furthermore, when you pass the variable to Exit, still another separate object is created. Because the object passed to Exit is different from the object passed to Enter, Monitor throws SynchronizationLockException. For more information, see the conceptual topic Monitors.
If you're asking conceptually why this isn't allowed, I would say the answer stems from the fact that a value type's identity is exactly equivalent to its value (that's what makes it a value type).
So anyone anywhere in the universe talking about the int 4 is talking about the same thing - how then can you possibly claim exclusive access to lock on it?
I was wondering why the .Net team decided to limit developers and allow Monitor operate on references only. First, you think it would be good to lock against a System.Int32 instead of defining a dedicated object variable just for locking purpose, these lockers don't do anything else usually.
But then it appears that any feature provided by the language must have strong semantics not just be useful for developers. So semantics with value-types is that whenever a value-type appears in code its expression is evaluated to a value. So, from semantic point of view, if we write `lock (x)' and x is a primitive value type then it's the same as we would say "lock a block of critical code agaist the value of the variable x" which sounds more than strange, for sure :). Meanwhile, when we meet ref variables in code we are used to think "Oh, it's a reference to an object" and imply that the reference can be shared between code blocks, methods, classes and even threads and processes and thus can serve as a guard.
In two words, value type variables appear in code only to be evaluated to their actual value in each and every expression - nothing more.
I guess that's one of the main points.
Because value types don't have the sync block that the lock statement uses to lock on an object. Only reference types carry the overhead of the type info, sync block etc.
If you box your reference type then you now have an object containing the value type and can lock on that object (I expect) since it now has the extra overhead that objects have (a pointer to a sync block that is used for locking, a pointer to the type information etc). As everyone else is stating though - if you box an object you will get a NEW object every time you box it so you will be locking on different objects every time - which completely defeats the purpose of taking a lock.
This would probably work (although it's completely pointless and I haven't tried it)
int x = 7;
object boxed = (object)x;
//thread1:
lock (boxed){
...
}
//thread2:
lock(boxed){
...
}
As long as everyone uses boxed and the object boxed is only set once you would probably get correct locking since you are locking on the boxed object and it's only being created once. DON'T do this though.. it's just a thought exercise (and might not even work - like I said, I haven't tested it ).
As to your second question - No, the value is not copied for each thread. Both threads will be using the same boolean, but the threads are not guaranteed to see the freshest value for it (when one thread sets the value it might not get written back to the memory location immediately, so any other thread reading the value would get an 'old' result).
The following is taken from MSDN:
The lock (C#) and SyncLock (Visual Basic) statements can be used to ensure that a block of code runs to completion without interruption by other threads. This is accomplished by obtaining a mutual-exclusion lock for a given object for the duration of the code block.
and
The argument provided to the lock keyword must be an object based on a reference type, and is used to define the scope of the lock.
I would assume that this is in part because the lock mechanism uses an instance of that object to create the mutual exclusion lock.
According to this MSDN Thread, the changes to a reference variable may not be visible to all the threads and they might end up using stale values, and AFAIK I think value types do make a copy when they are passed between threads.
To quote exactly from MSDN
It's also important to clarify that the fact the assignment is atomic
does not imply that the write is immediately observed by other
threads. If the reference is not volatile, then it's possible for
another thread to read a stale value from the reference some time
after your thread has updated it. However, the update itself is
guaranteed to be atomic (you won't see a part of the underlying
pointer getting updated).
I think this is one of those cases where the answer to why is "because a Microsoft engineer implemented it that way".
The way locking works under the hood is by creating a table of lock structures in memory and then using the objects vtable to remember the position in the table where the required lock is. This gives the appearance that every object has a lock when in fact they don't. Only those that have been locked do. As value types don't have a reference there is no vtable to store the locks position in.
Why Microsoft chose this strange way of doing things is anyone's guess. They could have made Monitor a class you had to instantiate. I'm sure I have seen an article by an MS employee that said that on reflection this design pattern was a mistake, but I can't seem to find it now.
Is it necessary to protect access to a single variable of a reference type in a multi-threaded application? I currently lock that variable like this:
private readonly object _lock = new object();
private MyType _value;
public MyType Value
{
get { lock (_lock) return _value; }
set { lock (_lock) _value = value; }
}
But I'm wondering if this is really necessary? Isn't assignment of a value to a field atomic? Can anything go wrong if I don't lock in this case?
P.S.: MyType is an immutable class: all the fields are set in the constructor and don't change. To change something, a new instance is created and assigned to the variable above.
Being atomic is rarely enough.
I generally want to get the latest value for a variable, rather than potentially see a stale one - so some sort of memory barrier is required, both for reading and writing. A lock is a simple way to get this right, at the cost of potentially losing some performance due to contention.
I used to believe that making the variable volatile would be enough in this situation. I'm no longer convinced this is the case. Basically I now try to avoid writing lock-free code when shared data is involved, unless I'm able to use building blocks written by people who really understand these things (e.g. Joe Duffy).
There is the volatile keyword for this. Whether it's safe without it depends on the scenario. But the compiler can do funny stuff, such as reorganize order of operation. So even read/write to one field may be unsafe.
It can be an issue. It's not just the assignment itself you have to be concerned with. Due to caching, concurrent threads might see an old version of the object if you don't lock. So whether a lock is necessary will depend on precisely how you use it, and you don't show that.
Here's a free, sample chapter of "Concurrent Programming in Windows" which explains this issue in detail.
it all depends on whether the property will be accessed by multiple threads. and some variable is said to be atomic operation, in this atomic operation case, no need to use lock. sorry for poor english.
in you case, immutable, i think lock is not necessary.
I use a factory pattern to create a custom object which is loaded from a cache if possible.
There are no static members or functions on the custom object.
Assuming 2 threads call the factory and are both returned references to the same object from the cache. (i.e. No new operator, in ref to answer below, object returned from a collection)
If I want to modify a private instance member on inside the class:
a) Shoulb I lock it first?
b) Will the change be reflected in both threads?
I assume yes for both questions, but at the same time it feels like the threads have different instances of the class.
Have I must something fundamental here? Why do I feel like I have?
===============
Following the first few answers I have confirmed what I thought, thanks.
I guess what I really want to know is, if the objects are pretty much only read-only, i.e. after being created they only have one instance member that can change, do I need to do any locks when reading properties that are not affected by that one changeable instance member?
Again I assume no, but I have to come to value the second-opinion of the collective StackOverflow brains trust :)
Reading and writing of most primitive types, string and object references is atomic (everything except non-volatile longs and doubles). This means locks are not necessary if you are simply reading or writing a field or other variable in isolation. So, if you ever see code like this (and I guaranteee you will), you can happily strip out the unnecessary and costly locking:
public SomeClass SomeProperty
{
get
{
lock (someLock)
{
return someField;
}
}
set
{
lock (someLock)
{
someField = value;
}
}
}
If, however, you want to change the value, increasing it by one for example, or if you want to read multiple, related variables then you need to lock if you want sensible results.
It is also worth noting that the Hashtable and Dictionary support mulitple concurrent readers without the need for any kind of locking.
You are assuming that the threads have different instances of the object, which is incorrect. The threads are pointing to the same object; therefore, you have to synchronize access to any members that might be accessed by multiple threads, whether it be through property accesses, method calls, etc, etc.
It should be the same object, and the answer to your questions is YES, YES.
How do you check that there are 2 instances?
Yes, you should lock before setting the private member. This will be reflected in both threads (since they're the same object).
However, you may want to take some time to consider how to handle the locks and threading.
If your factory is creating the object and adding it to the collection, it will need a lock of some form around the routine to create the object. Otherwise, it's possible that the two threads could ask for the object at the same time, before it was created, and create 2 instances.
Also, I wouldn't recommend setting the private member directly - it would be a good idea to have a single object inside of the returned class instance used just for locking, and use some form of accessor (method or property) that locks on that synchronization object and then sets the field.
As for not "trusting" that its two instances - you can always just do something to check in the debugger. Add a check for reference equality, or even temporarily add a GUID to your class that's setup at construction - it's easy to verify that they're the same that way.
If the new operator or the constructor is getting called twice on the object, then you have yourself two instances of the same class.
Do you need both threads to see the results of changing the private member? Or is it perfectly OK with you if both threads get the object, but when one thread changes a private member, the other thread doesn't see that change?
Here's why I ask: In your factory, you could take the object from cache and then clone or copy it before returning it. As a result, each thread would have its own copy of the class, and that copy would have the state that was in the cached object at the time they asked for it. No 2 threads would ever share the exact same instance, because you're making a copy for each thread.
If that fits your requirements, then you can avoid having to lock the object (you will need synchronization on the factory method itself though ... which you also need in your current case).
Be careful with the idea of a "clone", however. If you have a graph of objects and you only do a shallow clone, you'll wind up with references that are still shared across threads, forcing you to need synchronization again. This idea of one-copy-per-thread makes best sense if the object is a very simple object without a lot of references to other objects.
I'm using C# & .NEt 3.5. What is the difference between the OptionA and OptionB ?
class MyClass
{
private object m_Locker = new object();
private Dicionary<string, object> m_Hash = new Dictionary<string, object>();
public void OptionA()
{
lock(m_Locker){
// Do something with the dictionary
}
}
public void OptionB()
{
lock(m_Hash){
// Do something with the dictionary
}
}
}
I'm starting to dabble in threading (primarly for creating a cache for a multi-threaded app, NOT using the HttpCache class, since it's not attached to a web site), and I see the OptionA syntax in a lot of the examples I see online, but I don't understand what, if any, reason that is done over OptionB.
Option B uses the object to be protected to create a critical section. In some cases, this more clearly communicates the intent. If used consistently, it guarantees only one critical section for the protected object will be active at a time:
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
Option A is less restrictive. It uses a secondary object to create a critical section for the object to be protected. If multiple secondary objects are used, it's possible to have more than one critical section for the protected object active at a time.
private object m_LockerA = new object();
private object m_LockerB = new object();
lock (m_LockerA)
{
// It's possible this block is active in one thread
// while the block below is active in another
// Do something with the dictionary
}
lock (m_LockerB)
{
// It's possible this block is active in one thread
// while the block above is active in another
// Do something with the dictionary
}
Option A is equivalent to Option B if you use only one secondary object. As far as reading code, Option B's intent is clearer. If you're protecting more than one object, Option B isn't really an option.
It's important to understand that lock(m_Hash) does NOT prevent other code from using the hash. It only prevents other code from running that is also using m_Hash as its locking object.
One reason to use option A is because classes are likely to have private variables that you will use inside the lock statement. It is much easier to just use one object which you use to lock access to all of them instead of trying to use finer grain locks to lock access to just the members you will need. If you try to go with the finer grained method you will probably have to take multiple locks in some situations and then you need to make sure you are always taking them in the same order to avoid deadlocks.
Another reason to use option A is because it is possible that the reference to m_Hash will be accessible outside your class. Perhaps you have a public property which supplies access to it, or maybe you declare it as protected and derived classes can use it. In either case once external code has a reference to it, it is possible that the external code will use it for a lock. This also opens up the possibility of deadlocks since you have no way to control or know what order the lock will be taken in.
Actually, it is not good idea to lock on object if you are using its members.
Jeffrey Richter wrote in his book "CLR via C#" that there is no guarantee that a class of object that you are using for synchronization will not use lock(this) in its implementation (It's interesting, but it was a recommended way for synchronization by Microsoft for some time... Then, they found that it was a mistake), so it is always a good idea to use a special separate object for synchronization. So, as you can see OptionB will not give you a guarantee of deadlock - safety.
So, OptionA is much safer that OptionB.
It's not what you're "Locking", its the code that's contained between the lock { ... } thats important and that you're preventing from being executed.
If one thread takes out a lock() on any object, it prevents other threads from obtaining a lock on the same object, and hence prevents the second thread from executing the code between the braces.
So that's why most people just create a junk object to lock on, it prevents other threads from obtaining a lock on that same junk object.
I think the scope of the variable you "pass" in will determine the scope of the lock.
i.e. An instance variable will be in respect of the instance of the class whereas a static variable will be for the whole AppDomain.
Looking at the implementation of the collections (using Reflector), the pattern seems to follow that an instance variable called SyncRoot is declared and used for all locking operations in respect of the instance of the collection.
Well, it depends on what you wanted to lock(be made threadsafe).
Normally I would choose OptionB to provide threadsafe access to m_Hash ONLY. Where as OptionA, I would used for locking value type, which can't be used with the lock, or I had a group of objects that need locking concurrently, but I don't what to lock the whole instance by using lock(this)
Locking the object that you're using is simply a matter of convenience. An external lock object can make things simpler, and is also needed if the shared resource is private, like with a collection (in which case you use the ICollection.SyncRoot object).
OptionA is the way to go here as long as in all your code, when accessing the m_hash you use the m_Locker to lock on it.
Now Imagine this case. You lock on the object. And that object in one of the functions you call has a lock(this) code segment. In this case that is a sure unrecoverable deadlock