Is locking instance or member based in C#?

I have a question about locking in C#. Does C# lock the instance of an object, or the member?
If I have the following code:
lock (testVar)
{
    testVar = testVar.Where(item => item.Value == 1).ToList();
    // ... do some more stuff
}
Does C# keep the lock, even if I set testVar to a new value?

All C# objects derive from System.Object, whose header always reserves 4 bytes (the sync block index) that come into play when you use the syntactic sugar of lock; the structure it refers to is called a SyncBlock.
When you create a new object using new (in your case ToList(), which produces a new List<T>), you are actually overwriting the old reference, which undermines your lock: any thread that reaches lock(testVar) after the reassignment locks on the new list, so multiple threads could end up inside your critical section at once. The compiler does transform your code into a try/finally block with an extra local variable, so the lock you already hold is still released on the original object and you don't shoot yourself in the foot that way.
That is why the best practice is to define a dedicated private readonly field that acts as a sync root object, instead of locking on a class member you reassign. That way, your intentions are clear to anyone reading your code.
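A minimal sketch of that pattern, assuming testVar is a List of items with an int Value property (the type and member names here are illustrative, not taken from the question):

using System.Collections.Generic;
using System.Linq;

public class TestItem
{
    public int Value { get; set; }
}

public class TestItemStore
{
    private readonly object _syncRoot = new object();     // dedicated lock object, never reassigned
    private List<TestItem> testVar = new List<TestItem>();

    public void KeepOnlyOnes()
    {
        lock (_syncRoot)
        {
            // Reassigning testVar is harmless here: the lock is held on
            // _syncRoot, which no code ever replaces.
            testVar = testVar.Where(item => item.Value == 1).ToList();
        }
    }
}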
Edit:
There is a nice article on MSDN which describes the object structure in memory:
SyncTableEntry also stores a pointer to SyncBlock that contains useful information, but is rarely needed by all instances of an object. This information includes the object's lock, its hash code, any thunking data, and its AppDomain index. For most object instances, there will be no storage allocated for the actual SyncBlock and the syncblk number will be zero. This will change when the execution thread hits statements like lock(obj) or obj.GetHashCode.

It locks on the object that the expression (testVar) resolves to. This means that your code does have a race condition, because once the list is reassigned, other concurrent threads could be locking on the new instance.
A good rule of thumb: only ever lock on a readonly field. testVar clearly isn't... but it could be, especially if you use RemoveAll to change the existing list instead of creating a new one. This of course depends on all access to the list happening inside the lock.
Frankly, though, most code doesn't need to be thread-safe. If code does need to be thread safe, the supported use scenarios must be clearly understood by the implementer.
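Going back to the readonly suggestion, a sketch of that variant (assuming the items expose an int Value property and that every other access to the list also locks on testVar):

using System.Collections.Generic;

public class TestItem
{
    public int Value { get; set; }
}

public class TestItemStore
{
    // readonly: the field always refers to the same List instance, so it is safe to lock on.
    private readonly List<TestItem> testVar = new List<TestItem>();

    public void KeepOnlyOnes()
    {
        lock (testVar)
        {
            // Mutate the existing list in place instead of building a new one;
            // removing Value != 1 keeps exactly what Where(item => item.Value == 1) would keep.
            testVar.RemoveAll(item => item.Value != 1);
        }
    }
}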

The lock statement translates to a try/finally block using Monitor.Enter/Monitor.Exit.
Doing a simple test with some code similar to yours (with VS2015 Preview) you can see what the compiler translates the code to.
The code
var testVar = new List<int>();
lock (testVar)
{
    testVar = new List<int>();
    testVar.Add(1);
}
Is actually translated to this:
List<int> list2;
List<int> list = new List<int>();
bool lockTaken = false;
try
{
    list2 = list;
    Monitor.Enter(list2, ref lockTaken);
    list = new List<int> { 1 };
}
finally
{
    if (lockTaken)
    {
        Monitor.Exit(list2);
    }
}
So you can see that the compiler has completely removed your variable testVar and replaced it with 2 variables, namely list and list2. Then the following happens:
list2 is initialized to list and now both references point to the same instance of List<int>.
The call Monitor.Enter(list2, ref lockTaken) associates the synchronization block in the List<int> object with the current thread.
The list variable is assigned to a new instance of List<int>, but list2 still points to the original instance that we locked against.
The lock is released using list2.
So even though you think that you are changing the lock variable, you are actually not. Doing that however makes your code hard to read and confusing so you should use a dedicated lock variable as suggested by the other posts.

Related

Is iterating over an array with a for loop a thread safe operation in C#? What about iterating an IEnumerable<T> with a foreach loop?

Based on my understanding, given a C# array, the act of iterating over the array concurrently from multiple threads is a thread safe operation.
By iterating over the array I mean reading all the positions inside the array by means of a plain old for loop. Each thread is simply reading the content of a memory location inside the array, no one is writing anything so all the threads read the same thing in a consistent manner.
This is a piece of code doing what I wrote above:
public class UselessService
{
    private static readonly string[] Names = new[] { "bob", "alice" };

    public List<int> DoSomethingUseless()
    {
        var temp = new List<int>();
        for (int i = 0; i < Names.Length; i++)
        {
            temp.Add(Names[i].Length * 2);
        }
        return temp;
    }
}
So, my understanding is that the method DoSomethingUseless is thread safe and that there is no need to replace the string[] with a thread safe type (like ImmutableArray<string> for instance).
Am I correct ?
Now let's suppose that we have an instance of IEnumerable<T>. We don't know what the underlying object is, we just know that we have an object implementing IEnumerable<T>, so we are able to iterate over it by using the foreach loop.
Based on my understanding, in this scenario there is no guarantee that iterating over this object from multiple threads concurrently is a thread safe operation. Put another way, it is entirely possible that iterating over the IEnumerable<T> instance from different threads at the same time breaks the internal state of the object, so that it becomes corrupted.
Am I correct on this point ?
What about the IEnumerable<T> implementation of the Array class? Is it thread safe?
Put another way, is the following code thread safe? (This is exactly the same code as above, but now the array is iterated using a foreach loop instead of a for loop.)
public class UselessService
{
    private static readonly string[] Names = new[] { "bob", "alice" };

    public List<int> DoSomethingUseless()
    {
        var temp = new List<int>();
        foreach (var name in Names)
        {
            temp.Add(name.Length * 2);
        }
        return temp;
    }
}
Is there any reference stating which IEnumerable<T> implementations in the .NET base class library are actually thread safe?
Is iterating over an array with a for loop a thread safe operation in C#?
If you're strictly talking about reading from multiple threads, that will be thread safe for Array and List<T> and just about every collection written by Microsoft, regardless of whether you're using a for or foreach loop. Especially in the example you have:
var temp = new List<int>();
foreach (var name in Names)
{
    temp.Add(name.Length * 2);
}
You can do that across as many threads as you want. They'll all read the same values from Names happily.
If you write to it from another thread (this wasn't your question, but it's worth noting):
Iterating over an Array or List<T> with a for loop, it'll just keep reading, and it'll happily hand you the changed values as it comes across them.
Iterating with a foreach loop, it depends on the implementation. If a value in an Array changes partway through a foreach loop, it will just keep enumerating and give you the changed values.
With List<T>, it depends what you consider "thread safe". If you are more concerned with reading accurate data, then it kind of is "safe" since it will throw an exception mid-enumeration and tell you that the collection changed. But if you consider throwing an exception to be not safe, then it's not safe.
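A quick single-threaded sketch of that behaviour (the same modification check fires when the change comes from another thread, just less deterministically):

using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3 };

try
{
    foreach (var n in numbers)
    {
        numbers.Add(n);   // mutating the list while enumerating it...
    }
}
catch (InvalidOperationException ex)
{
    // ...makes the enumerator throw on its next MoveNext call.
    Console.WriteLine(ex.Message);   // "Collection was modified; enumeration operation may not execute."
}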
But it's worth noting that this is a design decision in List<T>: there is code that explicitly looks for changes and throws an exception. Design decisions bring us to the next point:
Can we assume that every collection that implements IEnumerable is safe to read across multiple threads?
In most cases it will be, but thread-safe reading is not guaranteed. The reason is that every IEnumerable requires an implementation of IEnumerator, which decides how to traverse the items in the collection. And just like any class, it can do anything it wants in there, including non-thread-safe things like:
Using static variables
Using a shared cache for reading values
Not making any effort to handle cases where the collection changes mid-enumeration
etc.
You could even do something weird like make GetEnumerator() return the same instance of your enumerator every time it's called. That could really make for some unpredictable results.
I consider something to not be thread safe if it can result in unpredictable results. Any of those things could cause unpredictable results.
You can see the source code for the Enumerator that List<T> uses, so you can see that it doesn't do any of that weird stuff, which tells you that enumerating List<T> from multiple threads is safe.
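As a contrived illustration of that kind of weirdness, here is a hypothetical collection whose GetEnumerator() hands every caller the same enumerator instance, which makes concurrent (or even repeated) foreach loops interfere with each other:

using System.Collections;
using System.Collections.Generic;

public class SharedEnumeratorCollection : IEnumerable<int>
{
    private readonly int[] _data = { 1, 2, 3 };
    private IEnumerator<int> _shared;   // one cursor shared by every caller: bad idea

    public IEnumerator<int> GetEnumerator()
    {
        // Two threads doing foreach now advance the same cursor, so each
        // sees only some of the items, in an unpredictable interleaving.
        return _shared ?? (_shared = ((IEnumerable<int>)_data).GetEnumerator());
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}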
To assert that your code is thread-safe means that we must take your word for it that there is no code inside the UselessService that will try to concurrently replace the contents of the Names array with something like "tom" and "jerry" or (more sinister) null and null. On the other hand, using an ImmutableArray<string> would guarantee that the code is thread-safe, and everybody could be assured of that just by looking at the type of the static readonly field, without having to inspect the rest of the code carefully.
You may find these comments from the source code of ImmutableArray<T> interesting, regarding some implementation details of this struct:
A readonly array with O(1) indexable lookup time.
This type has a documented contract of being exactly one reference-type field in size. Our own System.Collections.Immutable.ImmutableInterlocked class depends on it, as well as others externally.
IMPORTANT NOTICE FOR MAINTAINERS AND REVIEWERS:
This type should be thread-safe. As a struct, it cannot protect its own fields from being changed from one thread while its members are executing on other threads because structs can change in place simply by reassigning the field containing this struct. Therefore it is extremely important that Every member should only dereference this ONCE. If a member needs to reference the array field, that counts as a dereference of this. Calling other instance members (properties or methods) also counts as dereferencing this. Any member that needs to use this more than once must instead assign this to a local variable and use that for the rest of the code instead. This effectively copies the one field in the struct to a local variable so that it is insulated from other threads.
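For comparison with the string[] version above, a sketch of UselessService with the field switched to ImmutableArray<string> (requires the System.Collections.Immutable package):

using System.Collections.Generic;
using System.Collections.Immutable;

public class UselessService
{
    // The array contents can never change after creation, so concurrent
    // readers are safe by construction, and the field's type documents that.
    private static readonly ImmutableArray<string> Names =
        ImmutableArray.Create("bob", "alice");

    public List<int> DoSomethingUseless()
    {
        var temp = new List<int>();
        foreach (var name in Names)
        {
            temp.Add(name.Length * 2);
        }
        return temp;
    }
}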

What is the preferred method of updating a reference to an immutable object?

Suppose we have an immutable object, like an ImmutableList<T>. What is the preferred method for using this object in a multi-threaded environment?
E.g.
using System.Collections.Immutable;

public class MutableListOfObjects
{
    private volatile ImmutableList<object> objList;

    public MutableListOfObjects()
    {
        objList = ImmutableList<object>.Empty;
    }

    public void Add(object o)
    {
        // Adding a new object to the list will create a new list to ensure immutability of lists.
        // Is declaring the field as volatile enough, or do we want to
        // use other threading concepts?
        objList = objList.Add(o);
    }

    // Will objList always reference the latest version of the list?
    public bool Exists(object o)
    {
        return objList.Contains(o);
    }
}
Is declaring the reference volatile sufficient for achieving the desired behavior? Or is it preferable to use other threading functions?
"Preferred" is contextual. The simplest approach is to use a lock, and in most cases that will do the job very effectively. If you have good reason to think that lock is a problem, then Interlocked is useful:
bool retry;
do
{
    var snapshot = objList;
    var combined = snapshot.Add(o);
    retry = Interlocked.CompareExchange(ref objList, combined, snapshot) != snapshot;
} while (retry);
This basically works on an optimistic but checked path: most times through, it'll only go through once. Occasionally somebody will change the value of objList while we aren't looking - that's fine, we just try again.
There are, however, pre-canned implementations of thread-safe lists etc, by people who really know what they are talking about. Consider using ConcurrentBag<T> etc. Or just a List<T> with a lock.
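For completeness, the "just a List<T> with a lock" option could look roughly like this (a sketch; the wrapper type and member names are made up):

using System.Collections.Generic;

public class LockedList<T>
{
    private readonly object _gate = new object();    // dedicated lock object
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_gate) { _items.Add(item); }
    }

    public bool Contains(T item)
    {
        lock (_gate) { return _items.Contains(item); }
    }
}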
A simple and efficient approach is to use ImmutableInterlocked.Update. You pass it a method to perform the add. It calls your add method and then atomically assigns the new value to objList if the list didn't change during the add. If the list changed, Update calls your add method again to retry. It keeps retrying until it is able to write the change.
ImmutableInterlocked.Update(ref objList, l => l.Add(o));
If you have a lot of write contention, such that you'd spend too much time on retries, then using a lock on some stable object (not objList) is preferable.
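Putting that together, a sketch of the class from the question rewritten around ImmutableInterlocked.Update (the generic ImmutableList<object> and the Contains-based lookup are assumptions on my part):

using System.Collections.Immutable;

public class MutableListOfObjects
{
    private ImmutableList<object> objList = ImmutableList<object>.Empty;

    public void Add(object o)
    {
        // Atomically publishes the new list; the lambda is re-run if another
        // thread replaced objList in the meantime.
        ImmutableInterlocked.Update(ref objList, list => list.Add(o));
    }

    public bool Exists(object o)
    {
        // Reads a snapshot of the current list; the snapshot itself never changes.
        return objList.Contains(o);
    }
}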
volatile will not help you in this case: it will not create a lock spanning the read of objList, the call to Add() and the assignment back to objList. You should use a locking mechanism instead. volatile just protects against reordering of operations and stale reads; it does not make a read-modify-write sequence atomic.
In your case you are creating a new list every time an object is added. Usually a much better alternative is to build the list up in a local variable (so that it is not subject to multi-threading issues) and, once the list is complete, mark it as immutable or create an immutable wrapper for it. That way you get much better performance and memory usage.

What magic does locking an instance of System.Object provide, compared to locking a specific instance type?

I have been learning about locking on threads, and I have not found an explanation of why creating a plain System.Object, locking it, and carrying out whatever actions are required during the lock provides thread safety.
Example
object obj = new object();
lock (obj)
{
    //code here
}
At first I thought it was just being used as a placeholder in examples, meant to be swapped out for the type you are dealing with. But in examples such as the one Dennis Phillips points out, it doesn't appear to be anything other than an actual instance of Object.
So, taking the example of needing to update a private dictionary, what does locking an instance of System.Object do to provide thread safety, as opposed to actually locking the dictionary itself (I know locking the dictionary in this case could cause synchronization issues)?
What if the dictionary was public?
//what if this was public?
private Dictionary<string, string> someDict = new Dictionary<string, string>();

var obj = new Object();
lock (obj)
{
    //do something with the dictionary
}
The lock itself provides no safety whatsoever for the Dictionary<TKey, TValue> type. What a lock does is essentially
For every use of lock(objInstance) only one thread will ever be in the body of the lock statement for a given object (objInstance)
If every use of a given Dictionary<TKey, TValue> instance occurs inside a lock, and every one of those locks uses the same object, then you know that only one thread at a time is ever accessing / modifying the dictionary. This is critical to preventing multiple threads from reading and writing to it at the same time and corrupting its internal state.
There is one giant problem with this approach though: you have to make sure every use of the dictionary occurs inside a lock and uses the same lock object. If you forget even one, then you've created a potential race condition; there will be no compiler warnings, and the bug will likely remain undiscovered for some time.
In the second sample you showed, you're using a local object instance (var indicates a method local) as the lock for an object field. This is almost certainly the wrong thing to do. The local will live only for the lifetime of the method call, so two calls to the method will lock on different locals, and hence all callers will be able to enter the lock simultaneously.
It used to be common practice to lock on the shared data itself:
private Dictionary<string, string> someDict = new Dictionary<string, string>();

lock (someDict)
{
    //do something with the dictionary
}
But the (somewhat theoretical) objection is that other code, outside of your control, could also lock on someDict and then you might have a deadlock.
So it is recommended to use a (very) private object, declared in 1-to-1 correspondence with the data, to act as a stand-in for the lock. As long as all code that accesses the dictionary locks on obj, thread-safety is guaranteed.
// the following 2 lines belong together!!
private Dictionary<string, string> someDict = new Dictionary<string, string>();
private object obj = new Object();

// multiple code segments like this
lock (obj)
{
    //do something with the dictionary
}
So the purpose of obj is to act as a proxy for the dictionary, and since its Type doesn't matter we use the simplest type, System.Object.
What if the dictionary was public?
Then all bets are off: any code could access the Dictionary, and code outside the containing class is not even able to lock on the guard object. And before you start looking for fixes, that simply is not a sustainable pattern. Use a ConcurrentDictionary or keep a normal one private.
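If the dictionary really does have to be shared, a rough sketch of the ConcurrentDictionary route (the wrapper and member names are illustrative):

using System.Collections.Concurrent;

public class SharedLookup
{
    // ConcurrentDictionary synchronizes internally, so callers do not have
    // to agree on a common lock object.
    private readonly ConcurrentDictionary<string, string> someDict =
        new ConcurrentDictionary<string, string>();

    public void Set(string key, string value)
    {
        someDict[key] = value;                        // thread-safe write
    }

    public bool TryGet(string key, out string value)
    {
        return someDict.TryGetValue(key, out value);  // thread-safe read
    }
}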
The object used for locking need not be related to the objects that are modified during the lock. It could be anything, but it should be private and not a string: public objects can be locked on externally, and interned strings can end up shared by two unrelated locks by mistake.
So far as I understand it, the use of a plain object is simply to have something to lock on (an internal, lockable object). To better explain this: say you have two methods within a class that both access the Dictionary, but may be running on different threads. To prevent both methods from modifying the Dictionary at the same time (and potentially corrupting it), you can lock on some object to control the flow. This is better illustrated by the following example:
private readonly object mLock = new object();
private readonly Dictionary<string, string> mSomeDictionary = new Dictionary<string, string>();

public void FirstMethod()
{
    while (/* running some operation */)
    {
        // Get the lock
        lock (mLock)
        {
            // Add to the dictionary
            mSomeDictionary.Add("Key", "Value");
        }
    }
}

public void SecondMethod()
{
    while (/* running some operation */)
    {
        // Get the lock
        lock (mLock)
        {
            // Remove from dictionary
            mSomeDictionary.Remove("Key");
        }
    }
}
The use of the lock(...) statement in both methods on the same object prevents the two methods from accessing the resource at the same time.
The important rules for the object you lock on are:
It must be an object visible only to the code that needs to lock on it. This avoids other code also locking on it.
This rules out strings that could be interned, and Type objects.
This rules out this in most cases, and the exceptions are too few and offer too little to be worth exploiting, so just don't use this.
Note also that some code internal to the framework locks on Types and on this, so while "it's okay as long as nobody else does it" is true, it's already too late.
It must be static to protect static operations; it may be an instance field to protect instance operations (including those internal to an instance that is held in a static).
You don't want to lock on a value type. If you really wanted to, you could lock on a particular boxing of it, but I can't think of anything that would gain beyond proving it's technically possible; it's still going to make the code less clear as to just what locks on what.
You don't want to lock on a field that you may change while the lock is held, as you'll no longer have the lock on what you appear to have the lock on (it's just about plausible that there's a practical use for that effect, but there's going to be an impedance between what the code appears to do at first read and what it really does, which is never good).
The same object must be used to lock on all operations that may conflict with each other.
While you can have correctness with overly broad locks, you can get better performance with finer-grained ones. E.g. if you had a lock that was protecting 6 operations, and realised that 2 of those operations couldn't interfere with the other 4, you could change to having 2 lock objects and gain better concurrency (or crash and burn if you were wrong in that analysis!).
The first point rules out locking on anything that is either visible or which could be made visible (e.g. a private instance that is returned by a protected or public member should be considered public as far as this analysis goes, anything captured by a delegate could end up elsewhere, and so on).
The last two points can mean that there's no obvious "type you are dealing with" as you put it, because locks don't protect objects, the protect operations done on objects and you may either have more than one object affected, or the same object affected by more than one group of operations that must be locked.
Hence it can be good practice to have an object that exists purely to lock on. Since it's doing nothing else, it can't get mixed up with other semantics or written over when you don't expect. And since it does nothing else it may as well be the lightest reference type that exists in .NET; System.Object.
Personally, I do prefer to lock on an object related to an operation when it clearly fits the bill of the "type you are dealing with", and none of the other concerns apply, as it seems to me to be quite self-documenting; but to others the risk of doing it wrong outweighs that benefit.

Lock on an object that might change during code execution

Let's suppose I have a thread that locks on an object reference
Thread #1
lock(myObj) { ... }
later in code I have myObj = new XYZObj();
and then Thread #2 locks on it
lock(myObj) { ... }
Will this code be thread safe if the object reference has changed? When the object reference changes, is the first lock still valid?
Locks work on instances, not variables.
The lock statement holds its own reference to the instance, so that it only exits the monitor on the instance it entered.
The spec says:
where x is an expression of a reference-type, is precisely equivalent to

System.Threading.Monitor.Enter(x);
try
{
    ...
}
finally
{
    System.Threading.Monitor.Exit(x);
}
except that x is only evaluated once.
If you re-assign the variable between the two locks, you will get two valid locks on two different instances.
In general, however, you should never do that; it's a recipe for subtle bugs and race conditions.
You should only lock on dedicated readonly lock objects.
No. They will both be locking on different objects.
According to MSDN
Best practice is to define a private object to lock on, or a private static object variable to protect data common to all instances.
Will this code be thread safe
The statement lock(myObj) { ... } is only safe until a new object reference is assigned to the myObj variable. In addition, it's only safe if all non-atomic access to the data shared between threads happens inside locks taken on that same object.
So, every time you enter a lock for myObj, the actual referenced object is what is being used for the lock, not your variable. If you change the variable to reference a new object, then you're effectively locking different objects in different locks, which obviously isn't what you wanted. But, then again, the next time you come back to the first lock, the first and second lock object might be in sync again, and so it'll be safe again. Maybe!
As you can see, that behavior is completely broken. Is this a hypothetical question, or are you really doing it like that?

Difference between lock(locker) and lock(variable_which_I_am_using)

I'm using C# and .NET 3.5. What is the difference between OptionA and OptionB?
class MyClass
{
    private object m_Locker = new object();
    private Dictionary<string, object> m_Hash = new Dictionary<string, object>();

    public void OptionA()
    {
        lock (m_Locker)
        {
            // Do something with the dictionary
        }
    }

    public void OptionB()
    {
        lock (m_Hash)
        {
            // Do something with the dictionary
        }
    }
}
I'm starting to dabble in threading (primarily for creating a cache for a multi-threaded app, NOT using the HttpCache class, since it's not attached to a web site), and I see the OptionA syntax in a lot of the examples online, but I don't understand what reason, if any, there is to prefer it over OptionB.
Option B uses the object to be protected to create a critical section. In some cases, this more clearly communicates the intent. If used consistently, it guarantees only one critical section for the protected object will be active at a time:
lock (m_Hash)
{
    // Across all threads, I can be in one and only one of these two blocks
    // Do something with the dictionary
}

lock (m_Hash)
{
    // Across all threads, I can be in one and only one of these two blocks
    // Do something with the dictionary
}
Option A is less restrictive. It uses a secondary object to create a critical section for the object to be protected. If multiple secondary objects are used, it's possible to have more than one critical section for the protected object active at a time.
private object m_LockerA = new object();
private object m_LockerB = new object();

lock (m_LockerA)
{
    // It's possible this block is active in one thread
    // while the block below is active in another
    // Do something with the dictionary
}

lock (m_LockerB)
{
    // It's possible this block is active in one thread
    // while the block above is active in another
    // Do something with the dictionary
}
Option A is equivalent to Option B if you use only one secondary object. As far as reading code, Option B's intent is clearer. If you're protecting more than one object, Option B isn't really an option.
It's important to understand that lock(m_Hash) does NOT prevent other code from using the hash. It only prevents other code from running that is also using m_Hash as its locking object.
One reason to use option A is that classes are likely to have several private members that you will use inside the lock statement. It is much easier to use one object to lock access to all of them than to try to use finer-grained locks that protect just the members you need. If you go the finer-grained route, you will probably have to take multiple locks in some situations, and then you need to make sure you always take them in the same order to avoid deadlocks.
Another reason to use option A is because it is possible that the reference to m_Hash will be accessible outside your class. Perhaps you have a public property which supplies access to it, or maybe you declare it as protected and derived classes can use it. In either case once external code has a reference to it, it is possible that the external code will use it for a lock. This also opens up the possibility of deadlocks since you have no way to control or know what order the lock will be taken in.
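To make the lock-ordering hazard mentioned above concrete, a contrived sketch (all names invented) where two methods take the same two locks in opposite order and can deadlock if called from different threads:

public class DeadlockProne
{
    private readonly object m_LockA = new object();
    private readonly object m_LockB = new object();

    public void First()
    {
        lock (m_LockA)        // thread 1 acquires A...
        lock (m_LockB)        // ...then waits for B
        {
            // work with both protected resources
        }
    }

    public void Second()
    {
        lock (m_LockB)        // thread 2 acquires B...
        lock (m_LockA)        // ...then waits for A, so both threads wait forever
        {
            // work with both protected resources
        }
    }
}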
Actually, it is not a good idea to lock on an object if you are using its members.
Jeffrey Richter wrote in his book "CLR via C#" that there is no guarantee that the class of an object you are using for synchronization will not use lock(this) in its implementation (interestingly, that was a synchronization pattern recommended by Microsoft for some time... then they found out it was a mistake), so it is always a good idea to use a special, separate object for synchronization. So, as you can see, OptionB does not give you a guarantee of deadlock safety.
So, OptionA is much safer than OptionB.
It's not what you're "locking", it's the code contained between the lock { ... } braces that's important and that you're preventing from being executed concurrently.
If one thread takes out a lock() on any object, it prevents other threads from obtaining a lock on the same object, and hence prevents the second thread from executing the code between the braces.
So that's why most people just create a junk object to lock on; it prevents other threads from obtaining a lock on that same junk object.
I think the scope of the variable you "pass" in will determine the scope of the lock.
i.e. An instance variable will be in respect of the instance of the class whereas a static variable will be for the whole AppDomain.
Looking at the implementation of the collections (using Reflector), the pattern seems to follow that an instance variable called SyncRoot is declared and used for all locking operations in respect of the instance of the collection.
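Locking on that SyncRoot object from outside the collection looks roughly like this (a sketch; the pattern is largely of historical interest today, and a dedicated private lock object is usually preferred):

using System.Collections;
using System.Collections.Generic;

public class LegacyStyleCache
{
    private readonly List<string> m_Items = new List<string>();

    public void Add(string item)
    {
        // ICollection.SyncRoot is the per-instance object the collection
        // exposes for exactly this kind of external locking.
        lock (((ICollection)m_Items).SyncRoot)
        {
            m_Items.Add(item);
        }
    }
}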
Well, it depends on what you want to lock (i.e. make thread-safe).
Normally I would choose OptionB to provide thread-safe access to m_Hash only. OptionA I would use for locking a value type (which can't be used with lock directly), or when I have a group of objects that need to be locked together, but I don't want to lock the whole instance by using lock(this).
Locking the object that you're using is simply a matter of convenience. An external lock object can make things simpler, and is also needed if the shared resource is private, like with a collection (in which case you use the ICollection.SyncRoot object).
OptionA is the way to go here, as long as everywhere in your code that accesses m_Hash you lock on m_Locker.
Now imagine this case: you lock on the object, and that object, in one of the functions you call, has a lock(this) code segment somewhere. That can easily end in a deadlock.
