I've been raised to believe that if multiple threads can access a variable, then all reads from and writes to that variable must be protected by synchronization code, such as a "lock" statement, because the processor might switch to another thread halfway through a write.
However, I was looking through System.Web.Security.Membership using Reflector and found code like this:
public static class Membership
{
    private static bool s_Initialized = false;
    private static object s_lock = new object();
    private static MembershipProvider s_Provider;
    public static MembershipProvider Provider
    {
        get
        {
            Initialize();
            return s_Provider;
        }
    }
    private static void Initialize()
    {
        if (s_Initialized)
            return;
        lock(s_lock)
        {
            if (s_Initialized)
                return;
            // Perform initialization...
            s_Initialized = true;
        }
    }
}
Why is the s_Initialized field read outside of the lock? Couldn't another thread be trying to write to it at the same time? Are reads and writes of variables atomic?
For the definitive answer, go to the spec. :)
Partition I, Section 12.6.6 of the CLI spec states: "A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size is atomic when all the write accesses to a location are the same size."
So that confirms that s_Initialized will never be torn (half written), and that reads and writes of primitive types no larger than 32 bits are atomic.
In particular, double and long (Int64 and UInt64) are not guaranteed to be atomic on a 32-bit platform. You can use the methods on the Interlocked class to protect these.
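A minimal sketch of that advice for a 64-bit field, assuming a hypothetical SharedTicks class: Interlocked.Read and Interlocked.Exchange make each individual read and write of the long atomic, even in a 32-bit process.

```csharp
using System.Threading;

// Hypothetical holder for a 64-bit value shared between threads.
public class SharedTicks
{
    private long _ticks;

    // Atomic 64-bit read: a 32-bit process cannot observe half of an old
    // value combined with half of a new one.
    public long Get() => Interlocked.Read(ref _ticks);

    // Atomic 64-bit write; returns the value that was replaced.
    public long Set(long value) => Interlocked.Exchange(ref _ticks, value);
}
```

Each call is atomic on its own; a sequence such as Get() followed by Set() is still not atomic as a whole.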
Additionally, while reads and writes are atomic, there is a race condition with addition, subtraction, incrementing and decrementing of primitive types, since each of these must read the value, operate on it, and write it back. The Interlocked class lets you protect these with methods such as Add, Increment, Decrement and CompareExchange.
Interlocked operations also act as a memory barrier, preventing the processor from reordering reads and writes across them. In this example, the lock provides the only barrier required.
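To make the read-modify-write race concrete, here is a sketch (the CounterDemo name and the counts are arbitrary) that increments two shared fields from several threads: one with a plain ++ and one with Interlocked.Increment. Only the second is guaranteed to reach the full total; the first may or may not lose updates on any given run.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo: '++' on a shared field versus Interlocked.Increment.
public static class CounterDemo
{
    private static int _racy;
    private static int _safe;

    public static (int Racy, int Safe) Run(int tasks, int incrementsPerTask)
    {
        _racy = 0;
        _safe = 0;
        Parallel.For(0, tasks, _ =>
        {
            for (int i = 0; i < incrementsPerTask; i++)
            {
                _racy++;                          // read, add, write: updates can be lost
                Interlocked.Increment(ref _safe); // a single atomic read-modify-write
            }
        });
        return (_racy, _safe);
    }
}
```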
This is a (bad) form of the double-checked locking pattern, which is not thread-safe in C#!
There is one big problem in this code:
s_Initialized is not volatile. That means that writes in the initialization code can move after s_Initialized is set to true, and other threads can see uninitialized data even if s_Initialized is true for them. This doesn't apply to Microsoft's implementation of the Framework, because there every write is a volatile write.
But even in Microsoft's implementation, reads of the data can be reordered (i.e. prefetched by the CPU). So if s_Initialized is true, reading the data that should be initialized can still return old, uninitialized values, because those reads may already have been satisfied earlier from the cache (i.e. the reads are reordered).
For example:
Thread 1 reads s_Provider (which is null)
Thread 2 initializes the data
Thread 2 sets s_Initialized to true
Thread 1 reads s_Initialized (which is true now)
Thread 1 uses the previously read s_Provider and gets a NullReferenceException
Moving the read of s_Provider before the read of s_Initialized is perfectly legal because there is no volatile read anywhere.
If s_Initialized were volatile, the read of s_Provider would not be allowed to move before the read of s_Initialized, and the initialization of the provider would not be allowed to move after s_Initialized is set to true, so everything would be correct.
Joe Duffy also wrote an article about this problem: Broken variants on double-checked locking.
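For comparison, a sketch of the pattern with s_Initialized declared volatile, as suggested above. A string stands in for the provider and the init counter only exists to make the behavior checkable; both are assumptions, not part of the real Membership class.

```csharp
using System.Threading;

// Sketch of double-checked locking with a volatile flag. Names are
// hypothetical and only mirror the Membership example.
public static class LazyProvider
{
    private static volatile bool s_Initialized;
    private static readonly object s_lock = new object();
    private static string s_Provider;   // stand-in for the real provider
    private static int s_InitCount;     // how many times initialization ran

    public static string Provider
    {
        get
        {
            Initialize();
            return s_Provider;
        }
    }

    public static int InitCount => Volatile.Read(ref s_InitCount);

    private static void Initialize()
    {
        // Volatile read (acquire): later reads of s_Provider cannot move before it.
        if (s_Initialized)
            return;
        lock (s_lock)
        {
            // Second check: another thread may have finished while we waited.
            if (s_Initialized)
                return;
            s_Provider = "provider";
            s_InitCount++;
            // Volatile write (release): the writes above cannot move after it.
            s_Initialized = true;
        }
    }
}
```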
Hang about -- the question that is in the title is definitely not the real question that Rory is asking.
The titular question has the simple answer of "No" -- but this is no help at all, when you see the real question -- which I don't think anyone has given a simple answer to.
The real question Rory asks is presented much later and is more pertinent to the example he gives.
Why is the s_Initialized field read outside of the lock?
The answer to this is also simple, though completely unrelated to the atomicity of variable access.
The s_Initialized field is read outside of the lock because locks are expensive.
Since the s_Initialized field is essentially "write once" it will never return a false positive.
It's economical to read it outside the lock.
This is a low cost activity with a high chance of having a benefit.
That's why it's read outside of the lock -- to avoid paying the cost of using a lock unless it's indicated.
If locks were cheap the code would be simpler, and omit that first check.
(edit: nice response from Rory follows. Yeh, boolean reads are very much atomic. If someone built a processor with non-atomic boolean reads, they'd be featured on the DailyWTF.)
The correct answer seems to be, "Yes, mostly."
John's answer referencing the CLI spec indicates that accesses to variables not larger than 32 bits on a 32-bit processor are atomic.
Further confirmation from the C# spec, section 5.5, Atomicity of variable references:
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, are not guaranteed to be atomic.
The code in my example was paraphrased from the Membership class, as written by the ASP.NET team themselves, so it was always safe to assume that the way it accesses the s_Initialized field is correct. Now we know why.
Edit: As Thomas Danecker points out, even though the access of the field is atomic, s_Initialized should really be marked volatile to make sure that the locking isn't broken by the processor reordering the reads and writes.
The Initialize function is faulty. It should look more like this:
private static void Initialize()
{
    if (s_Initialized)
        return;
    lock(s_lock)
    {
        if (s_Initialized)
            return;
        // Perform initialization...
        s_Initialized = true;
    }
}
Without the second check inside the lock it's possible the initialisation code will be executed twice. So the first check is for performance to save you taking a lock unnecessarily, and the second check is for the case where a thread is executing the initialisation code but hasn't yet set the s_Initialized flag and so a second thread would pass the first check and be waiting at the lock.
Reads and writes of variables are not atomic. You need to use Synchronisation APIs to emulate atomic reads/writes.
For an awesome reference on this and many more issues to do with concurrency, make sure you grab a copy of Joe Duffy's latest spectacle. It's a ripper!
"Is accessing a variable in C# an atomic operation?"
Nope. And it's not a C# thing, nor is it even a .net thing, it's a processor thing.
OJ is spot on that Joe Duffy is the guy to go to for this kind of info. And "interlocked" is a great search term to use if you want to know more.
"Torn reads" can occur on any value whose fields add up to more than the size of a pointer.
A check such as if (itisso) { ... } on a boolean is atomic, but even if it were not, there is no need to lock the first check.
If any thread has completed the Initialization then it will be true. It does not matter if several threads are checking at once. They will all get the same answer, and, there will be no conflict.
The second check inside the lock is necessary because another thread may have grabbed the lock first and completed the initialization process already.
You could also decorate s_Initialized with the volatile keyword and forego the use of lock entirely.
That is not correct. You will still encounter the problem of a second thread passing the check before the first thread has had a chance to set the flag, which will result in multiple executions of the initialisation code.
I think you're asking if s_Initialized could be in an unstable state when read outside the lock. The short answer is no. A simple assignment/read will boil down to a single assembly instruction which is atomic on every processor I can think of.
I'm not sure what the case is for assignment to 64-bit variables; it depends on the processor. I would assume that it is not atomic, but it probably is on modern 32-bit processors and certainly on all 64-bit processors. Assignment of complex value types will not be atomic.
I thought they were - I'm not sure of the point of the lock in your example unless you're also doing something to s_Provider at the same time - then the lock would ensure that these calls happened together.
Does that //Perform initialization comment cover creating s_Provider? For instance:
private static void Initialize()
{
    if (s_Initialized)
        return;
    lock(s_lock)
    {
        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
Otherwise that static property-get's just going to return null anyway.
Perhaps Interlocked gives a clue. And otherwise this one is pretty good.
I would have guessed that they're not atomic.
To make your code always work on weakly ordered architectures, you must put a MemoryBarrier before you write s_Initialized.
s_Provider = new MembershipProvider();
// MUST PUT BARRIER HERE to make sure the memory writes from the assignment
// and the constructor have been written to memory
// BEFORE the write to s_Initialized!
Thread.MemoryBarrier();
// Now that we've guaranteed that the writes above
// will be globally first, set the flag
s_Initialized = true;
The memory writes that happen in the MembershipProvider constructor and the write to s_Provider are not guaranteed to happen before you write to s_Initialized on a weakly ordered processor.
A lot of thought in this thread is about whether something is atomic or not. That is not the issue. The issue is the order that your thread's writes are visible to other threads. On weakly ordered architectures, writes to memory do not occur in order and THAT is the real issue, not whether a variable fits within the data bus.
EDIT: Actually, I'm mixing platforms in my statements. In C# the CLR spec requires that writes are globally visible, in-order (by using expensive store instructions for every store if necessary). Therefore, you don't need to actually have that memory barrier there. However, if it were C or C++ where no such guarantee of global visibility order exists, and your target platform may have weakly ordered memory, and it is multithreaded, then you would need to ensure that the constructors writes are globally visible before you update s_Initialized, which is tested outside the lock.
What you're asking is whether accessing a field in a method multiple times is atomic, to which the answer is no.
In the example above, the initialise routine is faulty as it may result in multiple initialization. You would need to check the s_Initialized flag inside the lock as well as outside, to prevent a race condition in which multiple threads read the s_Initialized flag before any of them actually does the initialisation code. E.g.,
private static void Initialize()
{
    if (s_Initialized)
        return;
    lock(s_lock)
    {
        if (s_Initialized)
            return;
        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
Ack, nevermind... as pointed out, this is indeed incorrect. It doesn't prevent a second thread from entering the "initialize" code section. Bah.
Related
So I have been researching the topic for quite some time now, and I think I understand the most important concepts, like release and acquire memory fences.
However, I haven't found a satisfactory explanation for the relation between volatile and the caching of the main memory.
So, I understand that every read and write to/from a volatile field enforces strict ordering of the read as well as the write operations that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of the operations. It doesn't say anything about the time these changes are visible to other threads/processors. In particular, this depends on the time the cache is flushed (if at all). I remember having read a comment from Eric Lippert saying something along the lines of "the presence of volatile fields automatically disables cache optimizations". But I'm not sure what exactly this means. Does it mean caching is completely disabled for the whole program just because we have a single volatile field somewhere? If not, what is the granularity the cache is disabled for?
Also, I read something about strong and weak volatile semantics and that C# follows the strong semantics where every write will always go straight to main memory no matter if it's a volatile field or not. I am very confused about all of this.
I'll address the last question first. Microsoft's .NET implementation has release semantics on writes (see note 1). That is not C# per se, so the same program, no matter the language, can have weak non-volatile writes in a different implementation.
The visibility of side-effects is regarding multiple threads. Forget about CPUs, cores and caches. Imagine, instead, that each thread has a snapshot of what is on the heap that requires some sort of synchronization to communicate side-effects between threads.
So, what does C# say? The C# language specification (newer draft) says fundamentally the same as the Common Language Infrastructure standard (CLI; ECMA-335 and ISO/IEC 23271) with some differences. I'll talk about them later on.
So, what does the CLI say? That only volatile operations are visible side-effects.
Note that it also says that non-volatile operations on the heap are side-effects as well, but not guaranteed to be visible. Just as important (see note 2), it doesn't state they're guaranteed to not be visible either.
What exactly happens on volatile operations? A volatile read has acquire semantics, it precedes any following memory reference. A volatile write has release semantics, it follows any preceding memory reference.
Acquiring a lock performs a volatile read, and releasing a lock performs a volatile write.
Interlocked operations have acquire and release semantics.
There's another important term to learn, which is atomicity.
Reads and writes, volatile or not, are guaranteed to be atomic on primitive values up to 32 bits on 32-bit architectures and up to 64 bits on 64-bit architectures. They're also guaranteed to be atomic for references. For other types, such as long structs, the operations are not atomic, they may require multiple, independent memory accesses.
However, even with volatile semantics, read-modify-write operations, such as v += 1 or the equivalent ++v (or v++, in terms of side-effects), are not atomic.
Interlocked operations guarantee atomicity for certain operations, typically addition, subtraction and compare-and-swap (CAS), i.e. write some value if and only if the current value is still some expected value. .NET also has an atomic Interlocked.Read(ref long) method for 64-bit integers, which works even on 32-bit architectures.
I'll keep referring to acquire semantics as volatile reads and release semantics as volatile writes, and either or both as volatile operations.
What does this all mean in terms of order?
That a volatile read is a point before which no memory references may cross, and a volatile write is a point after which no memory references may cross, both at the language level and at the machine level.
That non-volatile operations may cross to after following volatile reads if there are no volatile writes in between, and cross to before preceding volatile writes if there are no volatile reads in between.
That volatile operations within a thread are sequential and may not be reordered.
That volatile operations in a thread are made visible to all other threads in the same order. However, there is no total order of volatile operations from all threads, i.e. if one thread performs V1 and then V2, and another thread performs V3 and then V4, then any order that has V1 before V2 and V3 before V4 can be observed by any thread. In this case, it can be either of the following:
V1 V2 V3 V4
V1 V3 V2 V4
V1 V3 V4 V2
V3 V1 V2 V4
V3 V1 V4 V2
V3 V4 V1 V2
That is, any possible order of observed side-effects is valid for any thread for a single execution. There is no requirement on total ordering, such that all threads observe only one of the possible orders for a single execution.
How are things synchronized?
Essentially, it boils down to this: a synchronization point is where you have a volatile read that happens after a volatile write.
In practice, you must detect if a volatile read in one thread happened after a volatile write in another thread (see note 3). Here's a basic example:
public class InefficientEvent
{
    private volatile bool signalled = false;
    public void Signal()
    {
        signalled = true;
    }
    public void InefficientWait()
    {
        while (!signalled)
        {
        }
    }
}
Although it is inefficient, you can run two different threads, such that one calls InefficientWait() and another calls Signal(), and the side-effects of the latter when it returns from Signal() become visible to the former when it returns from InefficientWait().
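Since that class is explicitly inefficient, here is a sketch of the same signal/wait pair built on ManualResetEventSlim, one of the event types usually recommended instead. The EfficientEvent name is hypothetical. The waiter blocks instead of burning a CPU core, and Set()/Wait() act as the release/acquire pair, so data written before Signal() is visible once Wait() returns true.

```csharp
using System;
using System.Threading;

// Hypothetical counterpart to InefficientEvent that blocks instead of spinning.
public sealed class EfficientEvent : IDisposable
{
    private readonly ManualResetEventSlim _signalled = new ManualResetEventSlim(false);

    // Sets the event; it stays set, so later waiters return immediately too.
    public void Signal() => _signalled.Set();

    // Spins briefly, then blocks until Signal() has been called.
    public void Wait() => _signalled.Wait();

    // Same, but gives up after the timeout and returns false in that case.
    public bool Wait(int millisecondsTimeout) => _signalled.Wait(millisecondsTimeout);

    public void Dispose() => _signalled.Dispose();
}
```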
Volatile accesses are not as generally useful as interlocked accesses, which are not as generally useful as synchronization primitives. My advice is that you should develop code safely first, using synchronization primitives (locks, semaphores, mutexes, events, etc.) as needed, and if you find reasons to improve performance based on actual data (e.g. profiling), then and only then see if you can improve.
If you ever reach high contention for fast locks (used only for a few reads and writes without blocking), depending on the amount of contention, switching to interlocked operations may either improve or decrease performance. Especially so when you have to resort to compare-and-swap cycles, such as:
var currentValue = Volatile.Read(ref field);
var newValue = GetNewValue(currentValue);
var oldValue = currentValue;
var spinWait = new SpinWait();
while ((currentValue = Interlocked.CompareExchange(ref field, newValue, oldValue)) != oldValue)
{
    spinWait.SpinOnce();
    newValue = GetNewValue(currentValue);
    oldValue = currentValue;
}
Meaning, you have to profile the solution as well and compare with the current state. And be aware of the A-B-A problem.
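As a concrete instance of that compare-and-swap cycle, a sketch of an atomic "raise to maximum" operation. AtomicMax and Update are hypothetical names; the loop retries with the freshly observed value whenever another thread wins the race. Because the stored value only ever increases, this particular operation cannot run into the A-B-A problem.

```csharp
using System.Threading;

// Sketch: an atomic "store the maximum" built from a CAS loop.
public static class AtomicMax
{
    // Raises 'field' to 'candidate' if candidate is larger. Returns the
    // value this call observed as larger, or the candidate it installed.
    public static int Update(ref int field, int candidate)
    {
        var spinWait = new SpinWait();
        int current = Volatile.Read(ref field);
        while (true)
        {
            if (candidate <= current)
                return current;                  // nothing to do
            int observed = Interlocked.CompareExchange(ref field, candidate, current);
            if (observed == current)
                return candidate;                // our write won
            current = observed;                  // another thread wrote first; retry
            spinWait.SpinOnce();
        }
    }
}
```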
There's also SpinLock, which you must really profile against monitor-based locks, because although it may make the current thread yield, it doesn't put the current thread to sleep, akin to the shown usage of SpinWait.
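A sketch of the usual SpinLock usage pattern, assuming a tiny critical section such as incrementing a counter (the SpinLockedCounter class is hypothetical). SpinLock is a mutable struct, so it must live in a non-readonly field and must never be copied; otherwise each copy becomes a separate lock.

```csharp
using System.Threading;

// Hypothetical counter protected by a SpinLock.
public class SpinLockedCounter
{
    // Not readonly: a readonly field would make every call operate on a copy.
    private SpinLock _lock = new SpinLock(enableThreadOwnerTracking: false);
    private long _count;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            _lock.Enter(ref lockTaken);
            _count++;   // keep the protected region tiny and non-blocking
        }
        finally
        {
            if (lockTaken)
                _lock.Exit();
        }
    }

    public long Count
    {
        get
        {
            bool lockTaken = false;
            try
            {
                _lock.Enter(ref lockTaken);
                return _count;
            }
            finally
            {
                if (lockTaken)
                    _lock.Exit();
            }
        }
    }
}
```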
Switching to volatile operations is like playing with fire. You must make sure through analytical proof that your code is correct, otherwise you may get burned when you least expect.
Usually, the best approach for optimization in the case of high contention is to avoid contention. For instance, to perform a transformation on a big list in parallel, it's often better to divide and delegate the problem to multiple work items that generate results which are merged in a final step, rather than having multiple threads locking the list for updates. This has a memory cost, so it depends on the length of the data set.
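A sketch of that divide-and-merge approach for a list transformation. ChunkedTransform.Map and its chunkCount parameter are hypothetical; each parallel work item writes only to its own output list and its own array slot, so no locks are needed, and the results are concatenated in order once Parallel.For has returned. The per-chunk lists are the extra memory cost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Sketch: transform a list in parallel without shared locking.
public static class ChunkedTransform
{
    public static List<TOut> Map<TIn, TOut>(IReadOnlyList<TIn> input, Func<TIn, TOut> transform, int chunkCount)
    {
        int chunkSize = (input.Count + chunkCount - 1) / chunkCount;
        var parts = new List<TOut>[chunkCount];

        Parallel.For(0, chunkCount, chunk =>
        {
            int start = chunk * chunkSize;
            int end = Math.Min(start + chunkSize, input.Count);
            var local = new List<TOut>(Math.Max(0, end - start));
            for (int i = start; i < end; i++)
                local.Add(transform(input[i]));   // no other thread touches 'local'
            parts[chunk] = local;                 // each work item owns one slot
        });

        // Parallel.For returns only after every iteration has completed.
        return parts.SelectMany(p => p).ToList();
    }
}
```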
What are the differences between the C# specification and the CLI specification regarding volatile operations?
C# specifies side-effects, not mentioning their inter-thread visibility, as being a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.
C# specifies critical execution points at which these side-effects are preserved between threads: references to volatile fields, lock statements, and thread creation and termination.
If we take critical execution points as points where side-effects become visible, it adds to the CLI specification that thread creation and termination are visible side-effects, i.e. new Thread(...).Start() has release semantics on the current thread and acquire semantics at the start of the new thread, and exiting a thread has release semantics on the current thread and thread.Join() has acquire semantics on the waiting thread.
C# doesn't mention volatile operations in general, such as performed by classes in System.Threading instead of only through using fields declared as volatile and using the lock statement. I believe this is not intentional.
C# states that captured variables can be simultaneously exposed to multiple threads. The CLI doesn't mention it, because closures are a language construct.
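Both points can be sketched together: a captured local is shared between the calling thread and a new thread, and Start() and Join() are the synchronization points that make the writes visible, with no volatile field or lock involved. StartJoinDemo is a hypothetical name, and the sketch relies on the thread creation and termination guarantees described above.

```csharp
using System.Threading;

// Sketch: thread start and join as synchronization points for captured locals.
public static class StartJoinDemo
{
    public static int Run()
    {
        int input = 20;   // written before Start(): visible to the new thread
        int result = 0;   // captured: compiled into a field of a closure object

        var worker = new Thread(() =>
        {
            result = input + 22;   // written before the thread terminates
        });
        worker.Start();
        worker.Join();             // after Join(), the worker's writes are visible
        return result;
    }
}
```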
1.
There are a few places where Microsoft (ex-)employees and MVPs state that writes have release semantics:
Memory Model, by Chris Brumme
Memory Models, Understand the Impact of Low-Lock Techniques in Multithreaded Apps, by Vance Morrison
CLR 2.0 memory model, by Joe Duffy
Which managed memory model?, by Eric Eilebrecht
C# - The C# Memory Model in Theory and Practice, Part 2, by Igor Ostrovsky
In my code, I ignore this implementation detail. I assume non-volatile writes are not guaranteed to become visible.
2.
There is a common misconception that you're allowed to introduce reads in C# and/or the CLI.
The problem with being second, by Grant Richins
Comments on The CLI memory model, and specific specifications, by Jon Skeet
C# - The C# Memory Model in Theory and Practice, Part 2, by Igor Ostrovsky
However, that is true only for local arguments and variables.
For static and instance fields, or arrays, or anything on the heap, you cannot sanely introduce reads, as such introduction may break the order of execution as seen from the current thread of execution, either from legitimate changes in other threads, or from changes through reflection.
That is, you can't turn this:
object local = field;
if (local != null)
{
    // code that reads local
}
into this:
if (field != null)
{
    // code that replaces reads on local with reads on field
}
if you can ever tell the difference. Specifically, a NullReferenceException being thrown by accessing local's members.
In the case of C#'s captured variables, they're equivalent to instance fields.
It's important to note that the CLI standard:
says that non-volatile accesses are not guaranteed to be visible
doesn't say that non-volatile accesses are guaranteed to not be visible
says that volatile accesses affect the visibility of non-volatile accesses
But you can turn this:
object local2 = local1;
if (local2 != null)
{
    // code that reads local2 on the assumption it's not null
}
into this:
if (local1 != null)
{
    // code that replaces reads on local2 with reads on local1,
    // as long as local1 and local2 have the same value
}
You can turn this:
var local = field;
local?.Method()
into this:
var local = field;
var _temp = local;
(_temp != null) ? _temp.Method() : null
or this:
var local = field;
(local != null) ? local.Method() : null
because you can't ever tell the difference. But again, you cannot turn it into this:
(field != null) ? field.Method() : null
I believe it was prudent in both specifications stating that an optimizing compiler may reorder reads and writes as long as a single thread of execution observes them as written, instead of generally introducing and eliminating them altogether.
Note that read elimination may be performed by either the C# compiler or the JIT compiler, i.e. multiple reads on the same non-volatile field, separated by instructions that don't write to that field and that don't perform volatile operations or equivalent, may be collapsed to a single read. It's as if a thread never synchronizes with other threads, so it keeps observing the same value:
public class Worker
{
    private bool working = false;
    private bool stop = false;
    public void Start()
    {
        if (!working)
        {
            new Thread(Work).Start();
            working = true;
        }
    }
    public void Work()
    {
        while (!stop)
        {
            // TODO: actual work without volatile operations
        }
    }
    public void Stop()
    {
        stop = true;
    }
}
There's no guarantee that Stop() will stop the worker. Microsoft's .NET implementation guarantees that stop = true; is a visible side-effect, but it doesn't guarantee that the read on stop inside Work() is not elided to this:
public void Work()
{
    bool localStop = stop;
    while (!localStop)
    {
        // TODO: actual work without volatile operations
    }
}
That comment says quite a lot. To perform this optimization, the compiler must prove that there are no volatile operations whatsoever, either directly in the block, or indirectly in the whole call tree of methods and properties.
For this specific case, one correct implementation is to declare stop as volatile. But there are more options, such as:
using the equivalent Volatile.Read and Volatile.Write;
using Interlocked.CompareExchange;
using a lock statement around accesses to stop;
using something equivalent to a lock, such as a Mutex, or a Semaphore or SemaphoreSlim if you don't want the lock to have thread affinity (i.e. you can release it on a different thread than the one that acquired it);
using a ManualResetEvent or ManualResetEventSlim instead of stop, in which case you can make Work() sleep with a timeout while waiting for a stop signal before the next iteration;
etc.
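A sketch of the Volatile.Read/Volatile.Write option applied to the existing bool field. StoppableWorker, its Join(int) helper and the SpinWait placeholder are assumptions for illustration, and the unsynchronized working check from the original Start() is dropped for brevity.

```csharp
using System.Threading;

// Hypothetical variant of Worker: the stop flag is read with Volatile.Read,
// so the read cannot be hoisted out of the loop.
public class StoppableWorker
{
    private bool stop;
    private Thread thread;

    // Simplified: assumes Start() is called once, from a single thread.
    public void Start()
    {
        thread = new Thread(Work) { IsBackground = true };
        thread.Start();
    }

    private void Work()
    {
        while (!Volatile.Read(ref stop))   // volatile read (acquire) on every iteration
        {
            Thread.SpinWait(100);          // stand-in for actual work
        }
    }

    // Volatile write (release) of the flag.
    public void Stop() => Volatile.Write(ref stop, true);

    // Returns true if the worker thread ended within the timeout.
    public bool Join(int millisecondsTimeout) => thread.Join(millisecondsTimeout);
}
```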
3.
One significant difference of .NET's volatile synchronization compared to Java's volatile synchronization is that Java requires you to use the same volatile location, whereas .NET only requires that an acquire (volatile read) happens after a release (volatile write). So, in principle you can synchronize in .NET with the following code, but you can't synchronize with the equivalent code in Java:
using System;
using System.Threading;
public class SurrealVolatileSynchronizer
{
    public volatile bool v1 = false;
    public volatile bool v2 = false;
    public int state = 0;
    public void DoWork1(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(100);
        state = 1;
        v1 = true;
    }
    public void DoWork2(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(200);
        bool currentV2 = v2;
        Console.WriteLine("{0}", state);
    }
    public static void Main(string[] args)
    {
        var synchronizer = new SurrealVolatileSynchronizer();
        var thread1 = new Thread(synchronizer.DoWork1);
        var thread2 = new Thread(synchronizer.DoWork2);
        var barrier = new Barrier(3);
        thread1.Start(barrier);
        thread2.Start(barrier);
        barrier.SignalAndWait();
        thread1.Join();
        thread2.Join();
    }
}
This surreal example expects threads and Thread.Sleep(int) to take an exact amount of time. If this is so, it synchronizes correctly, because DoWork2 performs a volatile read (acquire) after DoWork1 performs a volatile write (release).
In Java, even with such surreal expectations fulfilled, this would not guarantee synchronization. In DoWork2, you'd have to read from the same volatile field you wrote to in DoWork1.
I read the specs, and they say nothing about whether or not a volatile write will EVER be observed by another thread (volatile read or not). Is that correct or not?
Let me rephrase the question:
Is it correct that the specification says nothing on this matter?
No. The specification is very clear on this matter.
Is a volatile write guaranteed to be observed on another thread?
Yes, if the other thread has a critical execution point. A special side effect is guaranteed to be observed to be ordered with respect to a critical execution point.
A volatile write is a special side effect, and a number of things are critical execution points, including starting and stopping threads. See the spec for a list of such.
Suppose for example thread Alpha sets volatile int field v to one and starts thread Bravo, which reads v, and then joins Bravo. (That is, blocks on Bravo completing.)
At this point we have a special side effect -- the write -- a critical execution point -- the thread start -- and a second special side effect -- a volatile read. Therefore Bravo is required to read one from v. (Assuming no other thread has written it in the meanwhile of course.)
Bravo now increments v to two and ends. That's a special side effect -- a write -- and a critical execution point -- the end of a thread.
When thread Alpha now resumes and does a volatile read of v it is required that it reads two. (Assuming no other thread has written to it in the meanwhile of course.)
The ordering of the side effect of Bravo's write and Bravo's termination must be preserved; plainly Alpha does not run again until after Bravo's termination, and so it is required to observe the write.
Yes, volatile is about fences, and fences are about ordering.
So "when?" is out of scope; it is actually an implementation detail of all the layers (compiler, JIT, CPU, etc.) combined,
but every implementation should have a decent and practical answer to the question.
Joe Albahari has a great series on multithreading that's a must read and should be known by heart for anyone doing C# multithreading.
In part 4 however he mentions the problems with volatile:
Notice that applying volatile doesn’t prevent a write followed by a read from being swapped, and this can create brainteasers. Joe Duffy illustrates the problem well with the following example: if Test1 and Test2 run simultaneously on different threads, it’s possible for a and b to both end up with a value of 0 (despite the use of volatile on both x and y)
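The Test1/Test2 code is not reproduced in the quote. A sketch assuming its usual shape (each thread writes one volatile field, then reads the other) shows where the problem lies: volatile alone permits a == 0 && b == 0, while a full fence (Thread.MemoryBarrier) between the write and the read forbids it. The StoreLoad harness is hypothetical, and since Parallel.Invoke may run both actions on one thread, it is illustrative rather than a reliable way to reproduce the reordering.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Sketch of the store/load pattern. With only 'volatile', both reads may
// return 0; the full fences below rule that outcome out.
public class StoreLoad
{
    private volatile int x, y;

    // Runs the two actions concurrently 'iterations' times and counts how
    // often both reads returned 0.
    public int CountBothZero(int iterations)
    {
        int bothZero = 0;
        for (int n = 0; n < iterations; n++)
        {
            x = 0;
            y = 0;
            int a = -1, b = -1;
            Parallel.Invoke(
                () => { x = 1; Thread.MemoryBarrier(); a = y; },   // Test1
                () => { y = 1; Thread.MemoryBarrier(); b = x; });  // Test2
            if (a == 0 && b == 0)
                bothZero++;
        }
        return bothZero;
    }
}
```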
Followed by a note that the MSDN documentation is incorrect:
The MSDN documentation states that use of the volatile keyword ensures that the most up-to-date value is present in the field at all times. This is incorrect, since as we’ve seen, a write followed by a read can be reordered.
I've checked the MSDN documentation, which was last changed in 2015 but still lists:
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time. Fields that are declared volatile are not subject to compiler optimizations that assume access by a single thread. This ensures that the most up-to-date value is present in the field at all times.
Right now I still avoid volatile in favor of this more verbose approach, to prevent threads from using stale data:
private int foo;
private object fooLock = new object();
public int Foo {
    get { lock(fooLock) return foo; }
    set { lock(fooLock) foo = value; }
}
As the parts about multithreading were written in 2011, is the argument still valid today? Should volatile still be avoided at all costs in favor of locks or full memory fences, to prevent introducing very hard to reproduce bugs that, as mentioned, even depend on the CPU vendor it's running on?
Volatile in its current implementation is not broken despite popular blog posts claiming such a thing. It is however badly specified, and the idea of using a modifier on a field to specify memory ordering is not that great (compare volatile in Java/C# to C++'s atomic specification, which had enough time to learn from the earlier mistakes). The MSDN article on the other hand was clearly written by someone who has no business talking about concurrency and is completely bogus; the only sane option is to completely ignore it.
Volatile guarantees acquire/release semantics when accessing the field and can only be applied to types that allow atomic reads and writes. Not more, not less. This is enough to be useful to implement many lock-free algorithms efficiently such as non-blocking hashmaps.
One very simple sample is using a volatile variable to publish data. Thanks to the volatile on x, the assertion in the following snippet cannot fire:
private int a;
private volatile bool x;
public void Publish()
{
    a = 1;
    x = true;
}
public void Read()
{
    if (x)
    {
        // if we observe x == true, we will always see the preceding write to a
        Debug.Assert(a == 1);
    }
}
Volatile is not easy to use and in most situations you are much better off to go with some higher level concept, but when performance is important or you're implementing some low level data structures, volatile can be exceedingly useful.
As I read the MSDN documentation, I believe it is saying that if you see volatile on a variable, you do not have to worry about compiler optimizations screwing up the value because they reorder the operations. It doesn't say that you are protected from errors caused by your own code executing operations on separate threads in the wrong order. (although admittedly, the comment is not clear as to this.)
volatile is a very limited guarantee. It means that the variable isn't subject to compiler optimizations that assume access from a single thread. This means that if you write to a variable from one thread and then read it from another thread, the other thread will definitely have the latest value. Without volatile, on a multiprocessor machine the compiler may make assumptions about single-threaded access, for example by keeping the value in a register, which prevents other processors from seeing the latest value.
As the code example you've mentioned shows, it doesn't protect you from having methods in different blocks reordered. In effect volatile makes each individual access to a volatile variable atomic. It doesn't make any guarantees as to the atomicity of groups of such accesses.
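To illustrate that volatile makes each single access atomic but not a read-modify-write sequence, here is a sketch with two counters bumped from several threads: a volatile field using ++, and a plain field updated with Interlocked.Increment. The volatile total can come out low, because two threads can read the same value and both write back value + 1; how often that happens depends on the machine, so only the Interlocked total is predictable. The CounterRace type is hypothetical.

```csharp
using System.Threading;

// Hypothetical demo: volatile ++ versus Interlocked.Increment.
public sealed class CounterRace
{
    private volatile int _volatileCount; // each read and each write is atomic; ++ as a whole is not
    private int _interlockedCount;       // only ever updated through Interlocked

    public int VolatileCount => _volatileCount;
    public int InterlockedCount => Volatile.Read(ref _interlockedCount);

    public void Run(int threads, int incrementsPerThread)
    {
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                {
                    _volatileCount++;                             // read, add, write: updates can be lost
                    Interlocked.Increment(ref _interlockedCount); // one atomic read-modify-write
                }
            });
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();
    }
}
```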
If you just want to ensure that your property has an up-to-date single value, you should be able to just use volatile.
The problem comes in if you try to perform multiple parallel operations as if they were atomic. If you have to force several operations to be atomic together, you need to lock the whole operation. Consider the example again, but using locks:
class DoLocksReallySaveYouHere
{
    int x, y;
    object xlock = new object(), ylock = new object();
    void Test1() // Executed on one thread
    {
        lock (xlock) { x = 1; }
        lock (ylock) { int a = y; }
        // ...
    }
    void Test2() // Executed on another thread
    {
        lock (ylock) { y = 1; }
        lock (xlock) { int b = x; }
        // ...
    }
}
The locks may cause some synchronization, which may prevent both a and b from having value 0 (I have not tested this). However, since x and y are locked independently, either a or b can still non-deterministically end up with a value of 0.
So in the case of wrapping the modification of a single variable, you should be safe using volatile, and would not really be any safer using lock. If you need to atomically perform multiple operations, you need to use a lock around the entire atomic block, otherwise scheduling will still cause non-deterministic behavior.
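A minimal sketch of "a lock around the entire atomic block", assuming a made-up pair of balances whose sum must stay constant. Each transfer writes both fields, so the whole sequence sits inside one lock on one object; volatile fields, or a separate lock per field, would still let a reader observe the debit without the matching credit. The TwoAccounts type and its members are hypothetical.

```csharp
using System.Threading;

// Hypothetical example: two fields that must change together.
public sealed class TwoAccounts
{
    private readonly object _gate = new object(); // one lock guards both fields
    private int _checking = 100;
    private int _savings = 100;

    public void MoveToSavings(int amount)
    {
        lock (_gate)
        {
            _checking -= amount; // both updates happen while the lock is held,
            _savings += amount;  // so no other lock holder sees only one of them
        }
    }

    public int Total()
    {
        lock (_gate)
        {
            return _checking + _savings; // consistent snapshot of both fields
        }
    }

    // Moves money back and forth on one thread while another thread checks the invariant.
    public static bool TotalStaysConstant(int iterations)
    {
        var accounts = new TwoAccounts();
        bool ok = true;
        var mover = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
                accounts.MoveToSavings(i % 2 == 0 ? 1 : -1);
        });
        mover.Start();
        for (int i = 0; i < iterations; i++)
            if (accounts.Total() != 200) ok = false;
        mover.Join();
        return ok && accounts.Total() == 200;
    }
}
```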
Here are some useful disassemblies for volatile in C#: https://sharplab.io/#gist:625b1181356b543157780baf860c9173
On x86 it is just about:
using memory instead of registers
preventing compiler optimizations like in the case with the endless loop
I use volatile when I just want to tell the compiler that a field might be updated from many different threads and I do not need the additional features provided by interlocked operations.
If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions, but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache. We think of lock blocks as protecting the variables read and modified within the block, which means:
Assuming you've properly implemented locking where necessary, those variables can only be read and written to by one thread at a time, and
Reads within the lock block see the latest versions of a variable and writes within the lock block become visible to all threads.
(Right?)
This second point is what interests me. Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads? Pardon my mental fuzziness here about how caches work, but I've read that caches hold several multi-byte "lines" of data. I think what I'm asking is, does a memory barrier force synchronization of all "dirty" cache lines or just some, and if just some, what determines which lines get synchronized?
If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions...
Right. The specification guarantees that.
but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache.
The C# specification says nothing whatsoever about "CPU cache". You've left the realm of what is guaranteed by the specification, and entered the realm of implementation details. There is no requirement that an implementation of C# execute on a CPU that has any particular cache architecture.
Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads?
Rather than try to parse your either-or question, let's say what is actually guaranteed by the language. A special effect is:
Any write to a variable, volatile or not
Any read of a volatile field
Any throw
The order of special effects is preserved at certain special points:
Reads and writes of volatile fields
locks
thread creation and termination
The runtime is required to ensure that special effects are ordered consistently with special points. So, if there is a read of a volatile field before a lock, and a write after, then the read can't be moved after the write.
So, how does the runtime achieve this? Beats the heck out of me. But the runtime is certainly not required to "guarantee that all memory is fresh for all threads". The runtime is required to ensure that certain reads, writes and throws happen in chronological order with respect to special points, and that's all.
In particular, the runtime is not required to ensure that all threads observe the same order.
Finally, I always end these sorts of discussions by pointing you here:
http://blog.coverity.com/2014/03/26/reordering-optimizations/
After reading that, you should have an appreciation for the sorts of horrid things that can happen even on x86 when you act casual about eliding locks.
Reads within the lock block see the latest versions of a variable and writes within the lock block are visible to all threads.
No, that's definitely a harmful oversimplification.
When you enter the lock statement, there's a memory fence, which sort of means that you'll always read "fresh" data. When you exit the lock statement, there's a memory fence, which sort of means that all the data you've written is guaranteed to be written to main memory and available to other threads.
The important point is that if multiple threads only ever read/write memory when they "own" a particular lock, then by definition one of them will have exited the lock before the next one enters it... so all those reads and writes will be simple and correct.
If you have code which reads and writes a variable without taking a lock, then there's no guarantee that it will "see" data written by well-behaved code (i.e. code using the lock), or that well-behaved threads will "see" the data written by that bad code.
For example:
private readonly object padlock = new object();
private int x;
public void A()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x++;
    }
}
public void B()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x--;
    }
}
public void C()
{
    // Might not see changes made in A, B, or C. Changes made here
    // might not be visible in other threads calling A, B or C.
    x = x + 10;
}
Now it's more subtle than that, but that's why using a common lock to protect a set of variables works.
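Following that reasoning, a sketch of how C could be brought in line: it takes the same padlock, so its read-modify-write of x is ordered with respect to A and B. The class name LockedCounter and the Value property are additions for illustration only.

```csharp
// Hypothetical variant of the class above: C takes the same lock as A and B.
public sealed class LockedCounter
{
    private readonly object padlock = new object();
    private int x;

    public void A() { lock (padlock) { x++; } }

    public void B() { lock (padlock) { x--; } }

    // Same update as the original C, but under the shared lock, so it sees
    // changes made in A and B, and A and B see its changes.
    public void C() { lock (padlock) { x = x + 10; } }

    // Read under the lock too, so callers get the latest committed value.
    public int Value { get { lock (padlock) { return x; } } }
}
```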
I have a thread that spins until an int changed by another thread is a certain value.
int cur = this.m_cur;
while (cur > this.Max)
{
    // spin until cur is <= max
    cur = this.m_cur;
}
Does this.m_cur need to be declared volatile for this to work? Is it possible that this will spin forever due to compiler optimization?
Yes, that's a hard requirement. The just-in-time compiler is allowed to store the value of m_cur in a processor register without refreshing it from memory. The x86 jitter in fact does; the x64 jitter doesn't (at least the last time I looked at it).
The volatile keyword is required to suppress this optimization.
Volatile means something entirely different on Itanium cores, a processor with a weak memory model. Unfortunately, that's what made it into the MSDN library and the C# Language Specification. What it is going to mean on an ARM core remains to be seen.
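A sketch of the spinning reader with m_cur declared volatile, assuming a hypothetical Gate class that owns m_cur and Max. Each iteration then performs a real read of the field, so the JIT cannot keep it in a register. SpinWait (a real System.Threading type) is my addition so the loop backs off instead of burning a core; the answer itself only requires the volatile.

```csharp
using System.Threading;

// Hypothetical owner of m_cur and Max from the question.
public sealed class Gate
{
    private volatile int m_cur; // volatile: the JIT must perform every read and cannot cache it

    public Gate(int initial, int max)
    {
        m_cur = initial;
        Max = max;
    }

    public int Max { get; }

    // Plain volatile write from the modifying thread.
    public void Set(int value) => m_cur = value;

    public void WaitUntilAtMostMax()
    {
        var spinner = new SpinWait();
        int cur = m_cur;
        while (cur > Max)
        {
            spinner.SpinOnce(); // spins briefly, then starts yielding the CPU
            cur = m_cur;        // fresh read on every iteration; cannot be hoisted out of the loop
        }
    }
}
```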
The blog below has some fascinating detail on the memory model in C#. In short, it seems safer to use the volatile keyword.
http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
From that blog:
using System.Threading;
class Test
{
    private bool _loop = true;
    public static void Main()
    {
        Test test1 = new Test();
        // Set _loop to false on another thread
        new Thread(() => { test1._loop = false; }).Start();
        // Poll the _loop field until it is set to false
        while (test1._loop == true) ;
        // The loop above may never terminate!
    }
}
There are two possible ways to get the while loop to terminate:
Use a lock to protect all accesses (reads and writes) to the _loop field.
Mark the _loop field as volatile.
There are two reasons why a read of a non-volatile field may observe a stale value: compiler optimizations and processor optimizations.
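A sketch of the first option, a lock around every read and write of _loop, assuming the accesses are funneled through a hypothetical private Loop property. Because each poll enters and exits the lock, the read cannot be hoisted out of the loop and the writer's update is eventually observed. The LockedLoop type and RunOnce helper are made up for illustration.

```csharp
using System.Threading;

// Hypothetical lock-based version of the Test class above.
public sealed class LockedLoop
{
    private readonly object _gate = new object();
    private bool _loop = true;

    private bool Loop
    {
        get { lock (_gate) { return _loop; } }   // every read takes the lock
        set { lock (_gate) { _loop = value; } }  // every write takes the lock
    }

    // Returns how many polls it took; terminates once the other thread clears the flag.
    public static long RunOnce()
    {
        var test1 = new LockedLoop();
        new Thread(() => { test1.Loop = false; }).Start();
        long polls = 0;
        while (test1.Loop) polls++;
        return polls;
    }
}
```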
It depends on how m_cur is being modified. If it's using a normal assignment statement such as m_cur--;, then it does need to be volatile. However, if it's being modified using one of the Interlocked operations, then it doesn't because Interlocked's methods automatically insert a memory barrier to ensure that all threads get the memo.
In general, using Interlocked to modify atomic values that are shared across threads is the preferable option. Not only does it take care of the memory barrier for you, but it also tends to be a bit faster than other synchronization options.
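As an illustration of updating the counter with Interlocked instead of a plain m_cur--, here is a sketch with a hypothetical rule that the value must never drop below zero. Interlocked.Decrement covers the unconditional case; the conditional update uses the usual Interlocked.CompareExchange retry loop. SharedCounter and its methods are invented names.

```csharp
using System.Threading;

// Hypothetical counter shared across threads, updated only through Interlocked.
public sealed class SharedCounter
{
    private int m_cur;

    public SharedCounter(int initial) { m_cur = initial; }

    public int Current => Volatile.Read(ref m_cur);

    // Unconditional atomic decrement; returns the new value.
    public int Decrement() => Interlocked.Decrement(ref m_cur);

    // Decrements only if the value is positive; returns false if it was already <= 0.
    public bool TryDecrementIfPositive()
    {
        while (true)
        {
            int observed = Volatile.Read(ref m_cur);
            if (observed <= 0) return false;
            // Store observed - 1 only if no other thread changed m_cur since we read it.
            if (Interlocked.CompareExchange(ref m_cur, observed - 1, observed) == observed)
                return true;
            // Another thread won the race; retry with the fresh value.
        }
    }
}
```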
That said, as others have said, polling loops are enormously wasteful. It would be better to pause the thread that needs to wait and let whoever is modifying m_cur take charge of waking it up when the time comes. Either Monitor.Wait() and Monitor.Pulse() or an AutoResetEvent might be well suited to the task, depending on your specific needs.
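A minimal sketch of the Monitor.Wait/Monitor.Pulse approach, assuming (this is not in the question) that every change to the value goes through one method that holds the lock. The waiter sleeps inside Monitor.Wait instead of spinning and re-checks its condition in a loop, because a wake-up only means "something changed". It uses Monitor.PulseAll so that every waiter gets to re-check; MonitoredValue and its members are hypothetical.

```csharp
using System.Threading;

// Hypothetical replacement for the polled m_cur field.
public sealed class MonitoredValue
{
    private readonly object _gate = new object();
    private int _cur;

    public MonitoredValue(int initial) { _cur = initial; }

    public void Set(int value)
    {
        lock (_gate)
        {
            _cur = value;
            Monitor.PulseAll(_gate); // wake every waiter so each can re-check its condition
        }
    }

    // Blocks until the value is <= max; returns the condition's state if a single wait times out.
    public bool WaitUntilAtMost(int max, int timeoutMs)
    {
        lock (_gate)
        {
            while (_cur > max)
            {
                // Releases the lock while waiting and reacquires it before returning.
                if (!Monitor.Wait(_gate, timeoutMs))
                    return _cur <= max;
            }
            return true;
        }
    }
}
```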
I have a question related to the C# memory model and threads. I am not sure if the following code is correct without the volatile keyword.
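using System.Threading;
public class A {
    private int variableA = 0;
    public A() {
        variableA = 1;
        // Thread.Start() returns void, so create the thread first, then start it
        Thread B = new Thread(new ThreadStart(() => printA()));
        B.Start();
    }
    private void printA() {
        System.Console.WriteLine(variableA);
    }
}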
My concern is if it is guaranteed that the Thread B will see variableA with value 1 without using volatile? In the main thread I am only assigning 1 to variableA in the constructor. After that I am not touching variableA, it is used only in the Thread B, so locking is probably not necessary.
But, is it guaranteed that the main thread will flush his cache and write the variableA contents to the main memory, so the second thread can read the newly assigned value?
Additionally, is it guaranteed that the second thread will read the variableA contents from the main memory? May some compiler optimizations occur and the Thread B can read the variableA contents from the cache instead of the main memory? It may happen when the order of the instructions is changed.
For sure, adding volatile to the variableA declaration will make the code correct. But is it necessary? I am asking because I wrote some code that initializes some non-volatile variables in the constructor, and the variables are used later by some Timer threads, and I am not sure if it is totally correct.
What about the same code in Java?
Thanks, Michal
There are a lot of places where implicit memory barriers are created. This is one of them: starting a thread creates a full barrier. So the write to variableA will get committed before the thread starts, and the first reads will be acquired from main memory. Of course, in Microsoft's implementation of the CLR that is somewhat of a moot point, because writes already have volatile semantics. But the same guarantee is not made in the ECMA specification, so it is theoretically possible that the Mono implementation could behave differently in this regard.
My concern is if it is guaranteed that the Thread B will see variableA with value 1 without using volatile?
In this case... yes. However, if you continue to use variableA in the second thread, there is no guarantee after the first read that it will see updates.
But, is it guaranteed that the main thread will flush his cache and write the variableA contents to the main memory, so the second thread can read the newly assigned value?
Yes.
Additionally, is it guaranteed that the second thread will read the variableA contents from the main memory?
Yes, but only on the first read.
For sure, adding volatile to the variableA declaration will make the code correct. But, is it necessary?
In this very specific and narrow case...no. But, in general it is advised that you use the volatile keyword in these scenarios. Not only will it make your code thread-safe as the scenario gets more complicated, but it also helps to document the fact that the field is going to be used by more than one thread and that you have considered the implications of using a lock-free strategy.
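A sketch of the question's class with the field marked volatile, as this answer recommends for documentation value. To make the outcome checkable it adds a hypothetical seenByB field and JoinAndGetSeenValue helper; Thread.Join is what makes seenByB visible to the caller afterwards. The class is renamed AWithVolatile to keep it apart from the original A.

```csharp
using System;
using System.Threading;

// Hypothetical variant of the question's class A.
public class AWithVolatile
{
    // volatile documents that another thread reads this field. Per the answers above,
    // Thread.Start already makes the constructor's write visible to the new thread.
    private volatile int variableA = 0;
    private readonly Thread threadB;
    private int seenByB = -1; // written by thread B, read only after Join

    public AWithVolatile()
    {
        variableA = 1;
        threadB = new Thread(PrintA);
        threadB.Start();
    }

    private void PrintA()
    {
        seenByB = variableA;
        Console.WriteLine(seenByB);
    }

    // Waits for thread B and returns the value it read.
    public int JoinAndGetSeenValue()
    {
        threadB.Join();
        return seenByB;
    }
}
```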
The same code in Java is definitely okay - the creation of a new thread acts as a sort of barrier, effectively. (All actions earlier in the program text than the thread creation "happen before" the new thread starts.)
I don't know what's guaranteed in .NET with respect to new thread creation, however. Even more worrying is the possibility of a delayed read when using Control.BeginInvoke and the like... I haven't seen any guarantees around memory barriers for those situations.
To be honest, I suspect it's fine. I suspect that anything which needs to coordinate between threads like this (either creating a new one or marshalling a call onto an existing one) will use a full memory barrier on both of the threads involved. However, you're absolutely right to be concerned, and I'm hoping that you'll get a more definitive answer from someone smarter than me. You might want to email Joe Duffy to get his point of view on this...
But, is it guaranteed that the main thread will flush his cache and write the variableA contents to the main memory,
Yes, this is guaranteed by the MS CLR memory model. Not necessarily so for other implementations of the CLI (i.e., I'm not sure about Mono). The ECMA standard does not require it.
so the second thread can read the newly assigned value?
That requires that the cache has been refreshed. It is probably guaranteed by the creation of the Thread (like Jon Skeet said). It is however not guaranteed by the previous point. The cache is flushed on each write but not on each read.
You could make very sure by using Thread.VolatileRead(ref variableA), but Jeffrey Richter recommends using the Interlocked class instead. Note that Thread.VolatileWrite() is superfluous in MS.NET.
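For comparison, here are two ways to read a plain int field with ordering guarantees. System.Threading.Volatile.Read is the newer API (.NET Framework 4.5 and later) covering the same ground as Thread.VolatileRead, and Interlocked.CompareExchange(ref x, 0, 0) is a common idiom for a full-fence read through Interlocked: it only stores when the value is already 0, and then it stores 0, so the value never changes. The FreshReads type is hypothetical.

```csharp
using System.Threading;

// Hypothetical holder of a plain (non-volatile) field.
public sealed class FreshReads
{
    private int variableA;

    public void Write(int value) => Volatile.Write(ref variableA, value); // release write

    public int ReadWithVolatileRead() => Volatile.Read(ref variableA); // acquire read

    // Compares with 0 and, if equal, stores 0: the value is unchanged, but the
    // call is a full fence and returns the current value atomically.
    public int ReadWithInterlocked() => Interlocked.CompareExchange(ref variableA, 0, 0);
}
```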