Take the following as an example:
public class MyClass
{
    private MyEnum _sharedEnumVal { get; set; }
}
If methods within MyClass ran on different threads and read/updated _sharedEnumVal, am I right in saying that a lock, or other mechanism, would be required to keep the variable thread safe like other primitives or are enums special?
Thanks
Thread-safety is a tricky subject. Reads and writes of the enum are atomic (its default underlying type is a 32-bit int), so even if thousands of threads update the same enum at once, you will never get an invalid, half-updated enum value. The value itself will always be valid. What is not guaranteed is visibility: after one thread updates the enum, other threads may keep reading a stale value, because the compiler, the JIT or the CPU may cache or reorder the access. To make sure all threads observe the latest value you need a memory barrier.
But even that is not a guarantee of thread-safety, because race conditions can still happen. Say you have this logic somewhere in your class:
public void DoSomething()
{
    if (_sharedEnumVal == MyEnum.First) {
        DoPrettyThings();
    } else {
        DoUglyThings();
    }
}

public void UpdateValue(MyEnum newValue)
{
    _sharedEnumVal = newValue;
}
and you have these two different threads:
static MyClass threadSafeClass = new MyClass();

void ThreadOne()
{
    while (true)
    {
        threadSafeClass.UpdateValue(MyEnum.Second);
        threadSafeClass.DoSomething();
    }
}

void ThreadTwo()
{
    while (true)
    {
        threadSafeClass.UpdateValue(MyEnum.First);
        threadSafeClass.DoSomething();
    }
}
Here, although the updates to the enum are atomic, the two threads are "racing" to change the enum value and then use it for their own purposes, so when DoSomething is called there is no guarantee which value the enum holds. You would get completely unexpected results. ThreadTwo might cause ugly things and ThreadOne might cause pretty things to happen, the exact opposite of what's expected.
In that case you would still need locking to ensure thread-safety of the class behavior.
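A minimal sketch of what such locking could look like, assuming the update and the dependent action are meant to happen as one unit. The `_sync` field and the `UpdateAndDoSomething` method are hypothetical names, and the return value stands in for DoPrettyThings/DoUglyThings so the outcome can be observed:

```csharp
using System;
using System.Threading;

public enum MyEnum { First, Second }

public class MyClass
{
    // Hypothetical lock object guarding _sharedEnumVal.
    private readonly object _sync = new object();
    private MyEnum _sharedEnumVal;

    // Sets the value and acts on it under one lock, so no other thread
    // can change the value between the write and the check.
    public string UpdateAndDoSomething(MyEnum newValue)
    {
        lock (_sync)
        {
            _sharedEnumVal = newValue;
            // Stand-ins for DoPrettyThings() / DoUglyThings().
            return _sharedEnumVal == MyEnum.First ? "pretty" : "ugly";
        }
    }
}
```

With this shape, a thread that passes MyEnum.First always gets the "pretty" branch, regardless of what other threads do concurrently.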
I fail to understand why this topic was downvoted :).
There are some good points and some bad ideas here, some of them even upvoted!
So let's sort the bits.
The question here is actually about atomicity.
If an operation is atomic, it is inherently thread-safe without locking. That covers plain reads and writes, plus the other operations that the Interlocked class provides for the given type.
Now, .NET states that reading and writing an int is atomic. The same holds for all types that fit into 32 bits; 64-bit types are not guaranteed to be atomic (they are not in a 32-bit process)! Reading and writing an object reference is atomic too.
Some operations are not atomic, like increment, unless you call Interlocked.Increment.
Now why do I talk about int? Well, by default an enum has the underlying type int (32 bits), unless explicitly specified otherwise.
That means that reading and writing it is atomic => thread-safe.
By the way, it is usually a bad idea to keep a naked auto-property here. I would rather use an explicit field behind the property and work with the field, because the Interlocked methods need a ref to a field.
There are many useful cases where atomicity is a good enough guarantee to work without locking. For example, a background thread's status, or a property that lets background workers keep working until it is changed to some expected value that tells them to stop.
Also, the Interlocked class extends these scenarios to a shared counter variable and many more.
As Chris Hannon noted, plain reads and writes can observe stale data unless they are decorated with a memory barrier or done through Interlocked operations (Interlocked.Add with 0 for reading, Interlocked.CompareExchange for writing), which make the update visible to other threads.
Thanks to Chris for good point I missed!
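A sketch of the "field behind the property" idea, under the assumption that the enum is stored as an int so that the long-standing int overloads of Interlocked and Volatile can be used. The property name `SharedEnumVal` and the `TryTransition` helper are illustrative, not part of the original class:

```csharp
using System;
using System.Threading;

public enum MyEnum { First, Second }

public class MyClass
{
    // Stored as int: the classic Interlocked overloads take int/long, not enums.
    private int _sharedEnumVal = (int)MyEnum.First;

    public MyEnum SharedEnumVal
    {
        // Volatile.Read keeps the read from being cached or hoisted.
        get { return (MyEnum)Volatile.Read(ref _sharedEnumVal); }
        // Interlocked.Exchange is an atomic write with a full fence.
        set { Interlocked.Exchange(ref _sharedEnumVal, (int)value); }
    }

    // Atomically switches from 'expected' to 'next'; returns true if it did.
    public bool TryTransition(MyEnum expected, MyEnum next)
    {
        int previous = Interlocked.CompareExchange(
            ref _sharedEnumVal, (int)next, (int)expected);
        return previous == (int)expected;
    }
}
```

This only makes the individual read, write and compare-and-swap safe; a sequence of several such calls is still not atomic as a whole.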
I have a few general questions when dealing with threads. I have been looking around but haven't really seen any answers to my questions
When dealing with multiple variables in a class you want to be thread safe, are you supposed to have one "lock object" for every variable you want to lock in the class? Like this?
static readonly object lockForVarA = new object();
private float varA;
static readonly object lockForVarB = new object();
private float varB;
Also is this a valid way to handle thread safing a custom type?
public class SomeClass
{
    public SomeClass()
    {
        // Do some kind of work, e.g. load an assembly
    }
}

public class SomeOtherClass : BaseClassFiringFromRandomThread
{
    static readonly object someClassLock = new object();
    SomeClass someClass;

    // This is fired from any available thread; it can be fired multiple times, even at the same time
    public override void Init()
    {
        lock (someClassLock)
        {
            if (someClass == null)
                someClass = new SomeClass();
        }
    }
}
This code is in the constructor of a class that can be called from any thread at any time
When dealing with multiple variables in a class you want to be thread safe, are you supposed to have one "lock object" for every variable you want to lock in the class?
There are two rules:
Be "fine grained". Have as many locks as possible, one for each variable. Access the variable under its lock every time you use it. Lock as little code as possible to ensure scalability. If you forget to lock a variable, you'll cause a race condition, and if you get the lock ordering wrong, you'll cause a deadlock, so make sure you get it perfect.
Be "coarse-grained". Have just one lock, and put all the critical sections under that lock. Having many locks decreases contention but increases the chance of deadlocks and other errors, so have as few locks as possible, with as much code as possible in each. Of course, this also increases the risk of deadlocks since now there is lots more code inside the locks that can have inversions, and it decreases scalability.
As you have no doubt noticed, the standard advice is completely contradictory. That's because locks are terrible.
My advice: if you don't share variables across threads then you don't need to have any locks at all.
Also is this a valid way to handle thread safing a custom type?
The code looks reasonable so far, but if your intention is to lazy-load some logic then do not write your own threading logic. Just use Lazy<T> and make it do the work. It was written by experts.
Always use the highest-level tool designed by experts that is available to you. Rolling your own threading primitives is a recipe for disaster.
Whatever you do, do not take the advice in the other answer that says you must use double-checked locking. There are no circumstances in which you must use double-checked locking. Single-checked locking is safer, easier, and more likely to be correct. Only use double-checked locking when (1) you have overwhelming empirical evidence that contention is the cause of a measurable, user-impacting performance problem that will be fixed by going low-lock, and (2) you can explain what rules in the C# memory model make double-checked locking safe.
If you can't do (1) then you have no reason to do double checked locking, and if you can't do (2), you can't do it with any confidence of safety.
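A minimal sketch of the Lazy<T> approach, assuming SomeClass has a parameterless constructor. The base class from the question is left out so the example is self-contained, and the `InstancesCreated` counter and `Instance` property are added purely for illustration:

```csharp
using System;
using System.Threading;

public class SomeClass
{
    // Illustration only: counts how many times the constructor ran.
    public static int InstancesCreated;

    public SomeClass()
    {
        Interlocked.Increment(ref InstancesCreated);
    }
}

public class SomeOtherClass
{
    // With a factory delegate and no explicit mode, Lazy<T> uses
    // LazyThreadSafetyMode.ExecutionAndPublication: the factory runs
    // at most once even if many threads race on Value.
    private readonly Lazy<SomeClass> _someClass =
        new Lazy<SomeClass>(() => new SomeClass());

    public SomeClass Instance
    {
        get { return _someClass.Value; }
    }

    // May be called from any thread, any number of times.
    public void Init()
    {
        SomeClass ignored = _someClass.Value; // forces creation on first call
    }
}
```

No lock object and no null check are needed in user code; the synchronization lives inside Lazy<T>.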
You need to use a double-checked lock pattern. There is no need to acquire your someClassLock lock once someClass has been initialised, and locking it there will just cause unnecessary contention.
if (someClass == null)
{
    lock (someClassLock)
    {
        if (someClass == null)
            someClass = new SomeClass();
    }
}
You need the inner if block because it is possible a concurrent thread may have created someClass after the first null check but before your lock was acquired.
Of course, you need to also ensure that SomeClass is written in a way that is itself threadsafe, but this will safely ensure that only one instance of someClass is created.
An alternative method is to use Lazy<T> with a suitable LazyThreadSafetyMode.
I'm looking at some code that I don't understand the point of.
private object myProperty_lock = new Object();
private SomeType myProperty_backing;

public SomeType MyProperty
{
    get { lock(myProperty_lock) { return myProperty_backing; } }
    set { lock(myProperty_lock) { myProperty_backing = value; } }
}
This pattern is used many times within the same class.
Each time this pattern is used, there's a new lock object. (It's not a shared lock object for all properties.)
The types used are reference types and primitives. (No non-primitive structs.)
Does this code do anything? References & primitives are assigned atomically, so we don't need to protect against a thread switch in the middle of the assignment. The lock object isn't used anywhere else, so there's no protection there.
Is there something with memory barriers, perhaps? I had assumed that a lock inside a method didn't affect things outside of that method.
The fact that code is inside a method does not imply a memory barrier. So you may be on the right track in suspecting that the locks are there for the memory-barrier guarantee that each read sees a fresh value.
Of course, it could also have been added by a cargo cult programmer who did not understand why to do it and only did it because they saw a code example that does it.
The problem I see here is that by using lock the developer indicates a concern regarding thread safety. They thought that concurrent threads might be accessing this property.
My first question would be whether that's actually the case - is there concurrent access to this property?
There might be a valid scenario, but is there a reason why any number of threads might be able to set that reference? What sort of logic is happening if one thread sets the property, presumably for some valid reason, only to have it immediately overwritten by another thread? How is the application doing something predictable? Did the reference set by the previous caller just not matter? Then why did it set the property?
And what about the object - SomeType - returned from the property? Now any number of threads can have a reference to the same instance. Can SomeType be altered, and if so, is it thread safe?
I normally wouldn't wonder, but when I see something that looks odd with multithreading I like to dig a little deeper. Maybe they have it all patched together and it works, but sometimes they don't.
Joe Albahari has a great series on multithreading that's a must read and should be known by heart for anyone doing C# multithreading.
In part 4 however he mentions the problems with volatile:
Notice that applying volatile doesn’t prevent a write followed by a
read from being swapped, and this can create brainteasers. Joe Duffy
illustrates the problem well with the following example: if Test1 and
Test2 run simultaneously on different threads, it’s possible for a and
b to both end up with a value of 0 (despite the use of volatile on
both x and y)
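The example the quote refers to is not reproduced here. A sketch of its likely shape, inferred from the locked variant that appears in an answer further down; the class name is an assumption, and `a` and `b` are written as fields (`A`, `B`) only so the result can be inspected after both threads finish:

```csharp
using System;
using System.Threading;

class IfYouThinkYouUnderstandVolatile
{
    private volatile int x, y;

    // Locals in the quoted description; fields here for inspection only.
    public int A = -1, B = -1;

    // Executed on one thread: volatile write to x, then volatile read of y.
    public void Test1() { x = 1; A = y; }

    // Executed on another thread: volatile write to y, then volatile read of x.
    public void Test2() { y = 1; B = x; }
}
```

The claim in the quote is that, because a write followed by a read of a different variable may be reordered, a run can end with both values equal to 0.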
Followed by a note that the MSDN documentation is incorrect:
The MSDN documentation states that use of the volatile keyword ensures
that the most up-to-date value is present in the field at all times.
This is incorrect, since as we’ve seen, a write followed by a read can
be reordered.
I've checked the MSDN documentation, which was last changed in 2015 but still lists:
The volatile keyword indicates that a field might be modified by
multiple threads that are executing at the same time. Fields that are
declared volatile are not subject to compiler optimizations that
assume access by a single thread. This ensures that the most
up-to-date value is present in the field at all times.
Right now I still avoid volatile in favor of the more verbose lock-based version, to prevent threads from using stale data:
private int foo;
private object fooLock = new object();

public int Foo {
    get { lock(fooLock) return foo; }
    set { lock(fooLock) foo = value; }
}
As the parts about multithreading were written in 2011, is the argument still valid today? Should volatile still be avoided at all costs in favor of locks or full memory fences, to avoid introducing very hard-to-reproduce bugs that, as mentioned, even depend on the CPU vendor the code is running on?
Volatile in its current implementation is not broken, despite popular blog posts claiming such a thing. It is, however, badly specified, and the idea of using a modifier on a field to specify memory ordering is not that great (compare volatile in Java/C# to C++'s atomic specification, which had enough time to learn from the earlier mistakes). The MSDN article, on the other hand, was clearly written by someone who has no business talking about concurrency and is completely bogus. The only sane option is to ignore it completely.
Volatile guarantees acquire/release semantics when accessing the field and can only be applied to types that allow atomic reads and writes. Not more, not less. This is enough to be useful to implement many lock-free algorithms efficiently such as non-blocking hashmaps.
One very simple sample is using a volatile variable to publish data. Thanks to the volatile on x, the assertion in the following snippet cannot fire:
private int a;
private volatile bool x;

public void Publish()
{
    a = 1;
    x = true;
}

public void Read()
{
    if (x)
    {
        // if we observe x == true, we will always see the preceding write to a
        Debug.Assert(a == 1);
    }
}
Volatile is not easy to use and in most situations you are much better off to go with some higher level concept, but when performance is important or you're implementing some low level data structures, volatile can be exceedingly useful.
As I read the MSDN documentation, I believe it is saying that if you see volatile on a variable, you do not have to worry about compiler optimizations screwing up the value because they reorder the operations. It doesn't say that you are protected from errors caused by your own code executing operations on separate threads in the wrong order. (although admittedly, the comment is not clear as to this.)
volatile is a very limited guarantee. It means that the variable isn't subject to compiler optimizations that assume access from a single thread. This means that if you write into a variable from one thread, then read it from another thread, the other thread will definitely have the latest value. Without volatile, on a multiprocessor machine, the compiler may make assumptions about single-threaded access, for example by keeping the value in a register, which prevents other processors from having access to the latest value.
As the code example you've mentioned shows, it doesn't protect you from having a write and a following read of a different variable reordered. In effect, volatile makes each individual access to a volatile variable atomic. It doesn't make any guarantees as to the atomicity of groups of such accesses.
If you just want to ensure that your property has an up-to-date single value, you should be able to just use volatile.
The problem comes in if you try to perform multiple parallel operations as if they were atomic. If you have to force several operations to be atomic together, you need to lock the whole operation. Consider the example again, but using locks:
class DoLocksReallySaveYouHere
{
    int x, y;
    object xlock = new object(), ylock = new object();

    void Test1() // Executed on one thread
    {
        lock(xlock) {x = 1;}
        lock(ylock) {int a = y;}
        ...
    }

    void Test2() // Executed on another thread
    {
        lock(ylock) {y = 1;}
        lock(xlock) {int b = x;}
        ...
    }
}
The locks may cause some synchronization, which may prevent both a and b from having value 0 (I have not tested this). However, since both x and y are locked independently, either a or b can still non-deterministically end up with a value of 0.
So in the case of wrapping the modification of a single variable, you should be safe using volatile, and would not really be any safer using lock. If you need to atomically perform multiple operations, you need to use a lock around the entire atomic block, otherwise scheduling will still cause non-deterministic behavior.
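A sketch of the "lock around the entire atomic block" variant, assuming the write and the following read in each method are meant to be one unit. The class name, the single `_sync` lock and the `A`/`B` fields are illustrative:

```csharp
using System;
using System.Threading;

class OneLockForBoth
{
    // One lock guards both x and y.
    private readonly object _sync = new object();
    private int x, y;

    // Results kept as fields so they can be inspected after both threads finish.
    public int A = -1, B = -1;

    // Executed on one thread: the write and the read form one critical section.
    public void Test1()
    {
        lock (_sync) { x = 1; A = y; }
    }

    // Executed on another thread.
    public void Test2()
    {
        lock (_sync) { y = 1; B = x; }
    }
}
```

Because the two critical sections cannot overlap, whichever method runs second sees the other's write, so A and B cannot both be 0. Which of the two is 0 still depends on scheduling.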
Here are some useful disassemblies for volatile in C#: https://sharplab.io/#gist:625b1181356b543157780baf860c9173
On x86 it is just about:
using memory instead of registers
preventing compiler optimizations like in the case with the endless loop
I use volatile when I just want to tell compiler that a field might be updated from many different threads and I do not need additional features provided by interlocked operations.
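A sketch of the endless-loop case mentioned above: a worker that spins until a flag is set from another thread. The `Worker` class and its members are hypothetical names chosen for the example:

```csharp
using System;
using System.Threading;

public class Worker
{
    // volatile: every loop iteration re-reads the field from memory,
    // so the JIT may not hoist the read out of the loop.
    private volatile bool _stop;

    // Only written by the worker thread; read after Join.
    public long Iterations;

    public void Run()
    {
        while (!_stop)
        {
            Iterations++;
        }
    }

    // Called from a different thread to end the loop.
    public void Stop()
    {
        _stop = true;
    }
}
```

Typical use is to start Run on a thread, call Stop from elsewhere, and then Join the thread. Without volatile on `_stop`, an optimized build is allowed to read the flag once and loop forever.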
Are there overall rules/guidelines for what makes a method thread-safe? I understand that there are probably a million one-off situations, but what about in general? Is it this simple?
If a method only accesses local variables, it's thread safe.
Is that it? Does that apply for static methods as well?
One answer, provided by @Cybis, was:
Local variables cannot be shared among threads because each thread gets its own stack.
Is that the case for static methods as well?
If a method is passed a reference object, does that break thread safety? I have done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method? If so, what are they?"
EDIT
A lot of good points have been made here. I think the real answer to this question is: "There are no simple rules to ensure thread safety." Cool. Fine. But in general I think the accepted answer provides a good, short summary. There are always exceptions. So be it. I can live with that.
If a method (instance or static) only references variables scoped within that method then it is thread safe, because each thread has its own stack.
In the following example, multiple threads could call ThreadSafeMethod concurrently without issue:
public class Thing
{
    public int ThreadSafeMethod(string parameter1)
    {
        int number; // each thread will have its own variable for number.
        number = parameter1.Length;
        return number;
    }
}
This is also true if the method calls other methods of the class which only reference locally scoped variables:
public class Thing
{
    public int ThreadSafeMethod(string parameter1)
    {
        int number;
        number = this.GetLength(parameter1);
        return number;
    }

    private int GetLength(string value)
    {
        int length = value.Length;
        return length;
    }
}
If a method accesses any (object state) properties or fields (instance or static) then you need to use locks to ensure that the values are not modified by a different thread:
public class Thing
{
    private string someValue; // all threads will read and write to this same field value

    public int NonThreadSafeMethod(string parameter1)
    {
        this.someValue = parameter1;
        int number;
        // Since access to someValue is not synchronised by the class, a separate thread
        // could have changed its value between this thread setting its value at the start
        // of the method and this line reading its value.
        number = this.someValue.Length;
        return number;
    }
}
You should be aware that any parameters passed in to the method which are neither a struct nor immutable could be mutated by another thread outside the scope of the method.
To ensure proper concurrency you need to use locking.
For further information, see the lock statement in the C# reference and ReaderWriterLockSlim.
lock is mostly useful for providing one-at-a-time functionality;
ReaderWriterLockSlim is useful if you need multiple readers and a single writer.
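A minimal sketch of the multiple-readers/single-writer case using System.Threading.ReaderWriterLockSlim (the class's actual name), applied to a field like the `someValue` above. The method names are assumptions made for the example:

```csharp
using System;
using System.Threading;

public class Thing
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private string someValue = "";

    // Any number of threads may hold the read lock at the same time.
    public int GetLength()
    {
        _rw.EnterReadLock();
        try
        {
            return someValue.Length;
        }
        finally
        {
            _rw.ExitReadLock();
        }
    }

    // The write lock is exclusive: it waits for readers and blocks new ones.
    public void SetValue(string value)
    {
        _rw.EnterWriteLock();
        try
        {
            someValue = value;
        }
        finally
        {
            _rw.ExitWriteLock();
        }
    }
}
```

The try/finally blocks make sure the lock is released even if the guarded code throws.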
If a method only accesses local variables, it's thread safe. Is that it?
Absolutely not. You can write a program with only a single local variable accessed from a single thread that is nevertheless not threadsafe:
https://stackoverflow.com/a/8883117/88656
Does that apply for static methods as well?
Absolutely not.
One answer, provided by @Cybis, was: "Local variables cannot be shared among threads because each thread gets its own stack."
Absolutely not. The distinguishing characteristic of a local variable is that it is only visible from within the local scope, not that it is allocated on the temporary pool. It is perfectly legal and possible to access the same local variable from two different threads. You can do so by using anonymous methods, lambdas, iterator blocks or async methods.
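A small sketch of the lambda case, with hypothetical names: `counter` is declared as a local variable, yet both threads increment the same variable because the lambda captures it.

```csharp
using System;
using System.Threading;

public static class LocalSharing
{
    public static int CountFromTwoThreads(int perThread)
    {
        int counter = 0; // a local, but captured by the lambda below

        ThreadStart work = () =>
        {
            for (int i = 0; i < perThread; i++)
            {
                // Both threads update the same captured variable;
                // Interlocked keeps each increment atomic.
                Interlocked.Increment(ref counter);
            }
        };

        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();

        return counter;
    }
}
```

Replacing the Interlocked call with a plain `counter++` would turn this into a data race on a "local" variable.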
Is that the case for static methods as well?
Absolutely not.
If a method is passed a reference object, does that break thread safety?
Maybe.
I've done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
You are going to have to learn to live with disappointment. This is a very difficult subject.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method?"
Nope. As you saw from my example earlier an empty method can be non-thread-safe. You might as well ask "is there a short list of rules that ensures a method is correct". No, there is not. Thread safety is nothing more than an extremely complicated kind of correctness.
Moreover, the fact that you are asking the question indicates your fundamental misunderstanding about thread safety. Thread safety is a global, not a local property of a program. The reason why it is so hard to get right is because you must have a complete knowledge of the threading behaviour of the entire program in order to ensure its safety.
Again, look at my example: every method is trivial. It is the way that the methods interact with each other at a "global" level that makes the program deadlock. You can't look at every method and check it off as "safe" and then expect that the whole program is safe, any more than you can conclude that because your house is made of 100% non-hollow bricks that the house is also non-hollow. The hollowness of a house is a global property of the whole thing, not an aggregate of the properties of its parts.
There is no hard and fast rule.
Here are some rules to make code thread safe in .NET and why these are not good rules:
The function and all functions it calls must be pure (no side effects) and use only local variables. Although this will make your code thread-safe, there are also very few interesting things you can do with this restriction in .NET.
Every function that operates on a common object must lock on a common thing. All locks must be done in same order. This will make the code thread safe, but it will be incredibly slow, and you might as well not use multiple threads.
...
There is no rule that makes the code thread safe. The only thing you can do is make sure that your code works no matter how many threads are executing it at once, given that each thread can be interrupted at any point and each thread is in its own state/location, and do this for every function (static or otherwise) that accesses common objects.
It must be synchronized, using an object lock, stateless, or immutable.
link: http://docs.oracle.com/javase/tutorial/essential/concurrency/immutable.html
Is it necessary to protect access to a single variable of a reference type in a multi-threaded application? I currently lock that variable like this:
private readonly object _lock = new object();
private MyType _value;

public MyType Value
{
    get { lock (_lock) return _value; }
    set { lock (_lock) _value = value; }
}
But I'm wondering if this is really necessary? Isn't assignment of a value to a field atomic? Can anything go wrong if I don't lock in this case?
P.S.: MyType is an immutable class: all the fields are set in the constructor and don't change. To change something, a new instance is created and assigned to the variable above.
Being atomic is rarely enough.
I generally want to get the latest value for a variable, rather than potentially see a stale one - so some sort of memory barrier is required, both for reading and writing. A lock is a simple way to get this right, at the cost of potentially losing some performance due to contention.
I used to believe that making the variable volatile would be enough in this situation. I'm no longer convinced this is the case. Basically I now try to avoid writing lock-free code when shared data is involved, unless I'm able to use building blocks written by people who really understand these things (e.g. Joe Duffy).
There is the volatile keyword for this. Whether it's safe without it depends on the scenario. But the compiler can do funny stuff, such as reordering operations. So even a read/write to one field may be unsafe.
It can be an issue. It's not just the assignment itself you have to be concerned with. Due to caching, concurrent threads might see an old version of the object if you don't lock. So whether a lock is necessary will depend on precisely how you use it, and you don't show that.
Here's a free, sample chapter of "Concurrent Programming in Windows" which explains this issue in detail.
It all depends on whether the property will be accessed by multiple threads. Reads and writes of some variables are atomic operations, and in that case there is no need to use a lock. Sorry for my poor English.
In your case, since the type is immutable, I think a lock is not necessary.