Does MemoryBarrier really ensure fresh values? - C#

Albahari, in his marvelous book C# in a Nutshell (a free chapter is available online), talks about how a memory barrier allows us to get a "fresh" value. His example is:
static void Main()
{
    bool complete = false;
    var t = new Thread(() =>
    {
        bool toggle = false;
        while (!complete)
        {
            toggle = !toggle;
        }
    });
    t.Start();
    Thread.Sleep(1000);
    complete = true;
    t.Join(); // Blocks indefinitely
}
This blocks indefinitely, as he suggests, if you build in release mode. He offers a few solutions to fix it: use Thread.MemoryBarrier in the while loop, use lock, or make complete a volatile static field.
I would agree with the volatile-field solution, as volatile forces the JIT to emit a direct memory read rather than a register read. However, I believe this optimization has nothing to do with fences and memory barriers; it's just a matter of whether the JIT prefers reading from memory or from a register. In fact, instead of using MemoryBarrier, any method call "convinces" the JIT not to use the register at all, as in:
class Program
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool Toggle(bool toggle)
    {
        return !toggle;
    }

    static void Main()
    {
        bool complete = false;
        var t = new Thread(() =>
        {
            bool toggle = false;
            while (!complete)
            {
                toggle = Toggle(toggle);
            }
        });
        t.Start();
        Thread.Sleep(1000);
        complete = true;
        t.Join(); // Returns after ~1 second; no longer blocks
    }
}
Here I am making a dummy toggle call. From the generated assembly code I can clearly see that the JIT uses a direct memory access to read the complete local variable. Thus my assumption is that, at least on Intel CPUs and considering compiler optimizations, MemoryBarrier plays no role in terms of "freshness". MemoryBarrier just acquires a full fence to preserve ordering, and that's it. Am I correct to think that way?

I would agree with the volatile field solution as volatile enforces a direct memory read rather than a register read for JIT. However I believe this optimization has nothing to do with fences and memory barriers.
Volatile reads and writes are described in ECMA-335, I.12.6.7. Important parts of this section:
A volatile read has “acquire semantics” meaning that the read is guaranteed to occur prior to any references to memory that occur after the read instruction in the CIL instruction sequence. A volatile write has “release semantics” meaning that the write is guaranteed to happen after any memory references prior to the write instruction in the CIL instruction sequence.
A conforming implementation of the CLI shall guarantee this semantics of volatile operations.
and
An optimizing compiler that converts CIL to native code shall not remove any volatile operation, nor shall it coalesce multiple volatile operations into a single operation.
Acquire and release semantics on the x86 and x86-64 architectures don't require any memory barriers (because the hardware memory model is no weaker than volatile semantics require). But on the ARM architecture the JIT must emit half-fences (one-direction memory barriers).
So, in that example, with volatile everything works because of the optimization restriction. And with MemoryBarrier it works because the compiler can't coalesce the reads of that variable into a single read outside the loop, since a read can't cross the MemoryBarrier.
But the code
while (!complete)
{
    toggle = Toggle(toggle);
}
is allowed to be optimized into something like this:
var tmp = complete;
while (!tmp)
{
    toggle = Toggle(toggle);
}
The reason this doesn't happen in the case of a method call is simply that the optimization was not applied (though it could have been). So this code is fragile and implementation-specific: it relies not on the standard but on implementation details that might change.
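For contrast, a version that relies on the standard rather than on implementation details would use Volatile.Read, which is a volatile operation the JIT may not hoist out of the loop. A minimal sketch (my own code; the field is static here so it can be passed by ref, and the timing is illustrative):

```csharp
using System;
using System.Threading;

class Program
{
    static bool complete = false;

    static void Main()
    {
        var t = new Thread(() =>
        {
            bool toggle = false;
            // Volatile.Read is a volatile operation: the JIT may not
            // coalesce these reads into a single read outside the loop.
            while (!Volatile.Read(ref complete))
            {
                toggle = !toggle;
            }
        });
        t.Start();
        Thread.Sleep(1000);
        Volatile.Write(ref complete, true);
        t.Join(); // returns promptly, even in Release builds
        Console.WriteLine("done");
    }
}
```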

Related

C# volatile variable: Memory fences VS. caching

So I researched the topic for quite some time now, and I think I understand the most important concepts like the release and acquire memory fences.
However, I haven't found a satisfactory explanation for the relation between volatile and the caching of the main memory.
So, I understand that every read and write to/from a volatile field enforces strict ordering of the read as well as the write operations that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of the operations. It doesn't say anything about the time these changes are visible to other threads/processors. In particular, this depends on the time the cache is flushed (if at all). I remember having read a comment from Eric Lippert saying something along the lines of "the presence of volatile fields automatically disables cache optimizations". But I'm not sure what exactly this means. Does it mean caching is completely disabled for the whole program just because we have a single volatile field somewhere? If not, what is the granularity the cache is disabled for?
Also, I read something about strong and weak volatile semantics and that C# follows the strong semantics where every write will always go straight to main memory no matter if it's a volatile field or not. I am very confused about all of this.
I'll address the last question first. Microsoft's .NET implementation has release semantics on writes1. It's not C# per se, so the same program, no matter the language, in a different implementation can have weak non-volatile writes.
The visibility of side-effects is regarding multiple threads. Forget about CPUs, cores and caches. Imagine, instead, that each thread has a snapshot of what is on the heap that requires some sort of synchronization to communicate side-effects between threads.
So, what does C# say? The C# language specification (newer draft) says fundamentally the same as the Common Language Infrastructure standard (CLI; ECMA-335 and ISO/IEC 23271) with some differences. I'll talk about them later on.
So, what does the CLI say? That only volatile operations are visible side-effects.
Note that it also says that non-volatile operations on the heap are side-effects as well, but not guaranteed to be visible. Just as important2, it doesn't state they're guaranteed to not be visible either.
What exactly happens on volatile operations? A volatile read has acquire semantics, it precedes any following memory reference. A volatile write has release semantics, it follows any preceding memory reference.
Acquiring a lock performs a volatile read, and releasing a lock performs a volatile write.
Interlocked operations have acquire and release semantics.
There's another important term to learn, which is atomicity.
Reads and writes, volatile or not, are guaranteed to be atomic on primitive values up to 32 bits on 32-bit architectures and up to 64 bits on 64-bit architectures. They're also guaranteed to be atomic for references. For other types, such as long structs, the operations are not atomic, they may require multiple, independent memory accesses.
However, even with volatile semantics, read-modify-write operations, such as v += 1 or the equivalent ++v (or v++, in terms of side-effects), are not atomic.
Interlocked operations guarantee atomicity for certain operations, typically addition, subtraction and compare-and-swap (CAS), i.e. write some value if and only if the current value is still some expected value. .NET also has an atomic Read(ref long) method for integers of 64 bits which works even in 32-bit architectures.
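As an illustrative sketch of the difference atomicity makes (the task structure and counts here are my own, not from the answer): a plain increment on a shared field is a non-atomic read-modify-write and can lose updates under contention, while Interlocked.Increment cannot:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static int plainCounter = 0;
    static int atomicCounter = 0;

    static void Main()
    {
        // Four tasks each perform 250,000 increments of both counters.
        var tasks = new Task[4];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                for (int j = 0; j < 250_000; j++)
                {
                    plainCounter++;                           // non-atomic read-modify-write
                    Interlocked.Increment(ref atomicCounter); // atomic
                }
            });
        }
        Task.WaitAll(tasks);

        Console.WriteLine($"atomic = {atomicCounter}"); // always 1000000
        Console.WriteLine($"plain  = {plainCounter}");  // often less than 1000000
    }
}
```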
I'll keep referring to acquire semantics as volatile reads and release semantics as volatile writes, and either or both as volatile operations.
What does this all mean in terms of order?
That a volatile read is a point before which no memory references may cross, and a volatile write is a point after which no memory references may cross, both at the language level and at the machine level.
That non-volatile operations may cross to after following volatile reads if there are no volatile writes in between, and cross to before preceding volatile writes if there are no volatile reads in between.
That volatile operations within a thread are sequential and may not be reordered.
That volatile operations in a thread are made visible to all other threads in the same order. However, there is no total order of volatile operations from all threads, i.e. if one threads performs V1 and then V2, and another thread performs V3 and then V4, then any order that has V1 before V2 and V3 before V4 can be observed by any thread. In this case, it can be either of the following:
V1 V2 V3 V4
V1 V3 V2 V4
V1 V3 V4 V2
V3 V1 V2 V4
V3 V1 V4 V2
V3 V4 V1 V2
That is, any possible order of observed side-effects are valid for any thread for a single execution. There is no requirement on total ordering, such that all threads observe only one of the possible orders for a single execution.
How are things synchronized?
Essentially, it boils down to this: a synchronization point is where you have a volatile read that happens after a volatile write.
In practice, you must detect if a volatile read in one thread happened after a volatile write in another thread3. Here's a basic example:
public class InefficientEvent
{
    private volatile bool signalled = false;

    public void Signal()
    {
        signalled = true;
    }

    public void InefficientWait()
    {
        while (!signalled)
        {
        }
    }
}
Although generally inefficient, you can run two different threads, such that one calls InefficientWait() and another calls Signal(), and the side-effects of the latter when it returns from Signal() become visible to the former when it returns from InefficientWait().
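A small driver (my own sketch, not from the original answer) makes the guarantee concrete: the non-volatile write to data is ordered before the volatile write in Signal(), and the waiter's volatile reads order it before the waiter's read of data:

```csharp
using System;
using System.Threading;

public class InefficientEvent
{
    private volatile bool signalled = false;

    public void Signal() { signalled = true; }

    public void InefficientWait() { while (!signalled) { } }
}

class Demo
{
    static int data = 0;

    static void Main()
    {
        var ev = new InefficientEvent();
        var waiter = new Thread(() =>
        {
            ev.InefficientWait();
            // The write to data happened before the volatile write in Signal(),
            // so after InefficientWait() returns it is guaranteed visible here.
            Console.WriteLine(data);
        });
        waiter.Start();

        data = 42;   // non-volatile write, released by the volatile write below
        ev.Signal(); // volatile write
        waiter.Join();
    }
}
```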
Volatile accesses are not as generally useful as interlocked accesses, which are not as generally useful as synchronization primitives. My advice is that you should develop code safely first, using synchronization primitives (locks, semaphores, mutexes, events, etc.) as needed, and if you find reasons to improve performance based on actual data (e.g. profiling), then and only then see if you can improve.
If you ever reach high contention for fast locks (used only for a few reads and writes without blocking), depending on the amount of contention, switching to interlocked operations may either improve or decrease performance. Especially so when you have to resort to compare-and-swap cycles, such as:
var currentValue = Volatile.Read(ref field);
var newValue = GetNewValue(currentValue);
var oldValue = currentValue;
var spinWait = new SpinWait();
while ((currentValue = Interlocked.CompareExchange(ref field, newValue, oldValue)) != oldValue)
{
    spinWait.SpinOnce();
    newValue = GetNewValue(currentValue);
    oldValue = currentValue;
}
Meaning, you have to profile the solution as well and compare with the current state. And be aware of the A-B-A problem.
There's also SpinLock, which you must really profile against monitor-based locks, because although they may make the current thread yield, they don't put the current thread to sleep, akin to the shown usage of SpinWait.
Switching to volatile operations is like playing with fire. You must make sure through analytical proof that your code is correct, otherwise you may get burned when you least expect.
Usually, the best approach for optimization in the case of high contention is to avoid contention. For instance, to perform a transformation on a big list in parallel, it's often better to divide and delegate the problem to multiple work items that generate results which are merged in a final step, rather than having multiple threads locking the list for updates. This has a memory cost, so it depends on the length of the data set.
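As an illustrative sketch of that divide-and-merge approach (the PLINQ aggregate is my choice of example, not the answer's): each partition accumulates into its own local value with no locking, and the partial results are merged once at the end:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1_000_000).ToArray();

        long total = data.AsParallel().Aggregate(
            0L,                        // per-partition seed
            (local, x) => local + x,   // each partition sums locally, no shared state
            (a, b) => a + b,           // merge step: combine the partial sums
            sum => sum);               // final projection

        Console.WriteLine(total); // 500000500000
    }
}
```

No thread ever contends on a shared accumulator; contention is confined to the single merge step.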
What are the differences between the C# specification and the CLI specification regarding volatile operations?
C# specifies side-effects, not mentioning their inter-thread visibility, as being a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.
C# specifies critical execution points at which these side-effects are preserved between threads: references to volatile fields, lock statements, and thread creation and termination.
If we take critical execution points as points where side-effects become visible, it adds to the CLI specification that thread creation and termination are visible side-effects, i.e. new Thread(...).Start() has release semantics on the current thread and acquire semantics at the start of the new thread, and exiting a thread has release semantics on the current thread and thread.Join() has acquire semantics on the waiting thread.
C# doesn't mention volatile operations in general, such as those performed by the classes in System.Threading; it only covers fields declared volatile and the lock statement. I believe this is not intentional.
C# states that captured variables can be simultaneously exposed to multiple threads. The CIL doesn't mention it, because closures are a language construct.
1.
There are a few places where Microsoft (ex-)employees and MVPs state that writes have release semantics:
Memory Model, by Chris Brumme
Memory Models, Understand the Impact of Low-Lock Techniques in Multithreaded Apps, by Vance Morrison
CLR 2.0 memory model, by Joe Duffy
Which managed memory model?, by Eric Eilebrecht
C# - The C# Memory Model in Theory and Practice, Part 2, by Igor Ostrovsky
In my code, I ignore this implementation detail. I assume non-volatile writes are not guaranteed to become visible.
2.
There is a common misconception that you're allowed to introduce reads in C# and/or the CLI.
The problem with being second, by Grant Richins
Comments on The CLI memory model, and specific specifications, by Jon Skeet
C# - The C# Memory Model in Theory and Practice, Part 2, by Igor Ostrovsky
However, that is true only for local arguments and variables.
For static and instance fields, or arrays, or anything on the heap, you cannot sanely introduce reads, as such introduction may break the order of execution as seen from the current thread of execution, either from legitimate changes in other threads, or from changes through reflection.
That is, you can't turn this:
object local = field;
if (local != null)
{
    // code that reads local
}
into this:
if (field != null)
{
    // code that replaces reads on local with reads on field
}
if you can ever tell the difference. Specifically, a NullReferenceException being thrown by accessing local's members.
In the case of C#'s captured variables, they're equivalent to instance fields.
It's important to note that the CLI standard:
says that non-volatile accesses are not guaranteed to be visible
doesn't say that non-volatile accesses are guaranteed to not be visible
says that volatile accesses affect the visibility of non-volatile accesses
But you can turn this:
object local2 = local1;
if (local2 != null)
{
    // code that reads local2 on the assumption it's not null
}
into this:
if (local1 != null)
{
    // code that replaces reads on local2 with reads on local1,
    // as long as local1 and local2 have the same value
}
You can turn this:
var local = field;
local?.Method()
into this:
var local = field;
var _temp = local;
(_temp != null) ? _temp.Method() : null
or this:
var local = field;
(local != null) ? local.Method() : null
because you can't ever tell the difference. But again, you cannot turn it into this:
(field != null) ? field.Method() : null
I believe it was prudent in both specifications stating that an optimizing compiler may reorder reads and writes as long as a single thread of execution observes them as written, instead of generally introducing and eliminating them altogether.
Note that read elimination may be performed by either the C# compiler or the JIT compiler, i.e. multiple reads on the same non-volatile field, separated by instructions that don't write to that field and that don't perform volatile operations or equivalent, may be collapsed to a single read. It's as if a thread never synchronizes with other threads, so it keeps observing the same value:
public class Worker
{
    private bool working = false;
    private bool stop = false;

    public void Start()
    {
        if (!working)
        {
            new Thread(Work).Start();
            working = true;
        }
    }

    public void Work()
    {
        while (!stop)
        {
            // TODO: actual work without volatile operations
        }
    }

    public void Stop()
    {
        stop = true;
    }
}
There's no guarantee that Stop() will stop the worker. Microsoft's .NET implementation guarantees that stop = true; is a visible side-effect, but it doesn't guarantee that the read on stop inside Work() is not elided to this:
public void Work()
{
    bool localStop = stop;
    while (!localStop)
    {
        // TODO: actual work without volatile operations
    }
}
That comment says quite a lot. To perform this optimization, the compiler must prove that there are no volatile operations whatsoever, either directly in the block, or indirectly in the whole call tree of methods and properties.
For this specific case, one correct implementation is to declare stop as volatile. But there are more options: use the equivalent Volatile.Read and Volatile.Write; use Interlocked.CompareExchange; use a lock statement around accesses to stop; use something equivalent to a lock, such as a Mutex, or a Semaphore or SemaphoreSlim if you don't want the lock to have thread affinity (i.e. you can release it on a different thread than the one that acquired it); or use a ManualResetEvent or ManualResetEventSlim instead of stop, in which case you can make Work() sleep with a timeout while waiting for a stop signal before the next iteration; etc.
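For instance, a sketch of the Volatile.Read/Volatile.Write option (my own code, under the assumption that Work() only needs to poll stop):

```csharp
using System;
using System.Threading;

public class Worker
{
    private bool stop = false;

    public void Work()
    {
        // A volatile read may not be coalesced with other reads,
        // so the condition is re-read from memory on every iteration.
        while (!Volatile.Read(ref stop))
        {
            // TODO: actual work
        }
    }

    public void Stop()
    {
        Volatile.Write(ref stop, true);
    }
}

class Program
{
    static void Main()
    {
        var worker = new Worker();
        var t = new Thread(worker.Work);
        t.Start();
        Thread.Sleep(100);
        worker.Stop();
        t.Join(); // terminates, even in Release builds
        Console.WriteLine("stopped");
    }
}
```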
3.
One significant difference of .NET's volatile synchronization compared to Java's volatile synchronization is that Java requires you to use the same volatile location, whereas .NET only requires that an acquire (volatile read) happens after a release (volatile write). So, in principle you can synchronize in .NET with the following code, but you can't synchronize with the equivalent code in Java:
using System;
using System.Threading;

public class SurrealVolatileSynchronizer
{
    public volatile bool v1 = false;
    public volatile bool v2 = false;
    public int state = 0;

    public void DoWork1(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(100);
        state = 1;
        v1 = true;
    }

    public void DoWork2(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(200);
        bool currentV2 = v2;
        Console.WriteLine("{0}", state);
    }

    public static void Main(string[] args)
    {
        var synchronizer = new SurrealVolatileSynchronizer();
        var thread1 = new Thread(synchronizer.DoWork1);
        var thread2 = new Thread(synchronizer.DoWork2);
        var barrier = new Barrier(3);
        thread1.Start(barrier);
        thread2.Start(barrier);
        barrier.SignalAndWait();
        thread1.Join();
        thread2.Join();
    }
}
This surreal example expects threads and Thread.Sleep(int) to take an exact amount of time. If this is so, it synchronizes correctly, because DoWork2 performs a volatile read (acquire) after DoWork1 performs a volatile write (release).
In Java, even with such surreal expectations fulfilled, this would not guarantee synchronization. In DoWork2, you'd have to read from the same volatile field you wrote to in DoWork1.
I read the specs, and they say nothing about whether or not a volatile write will EVER be observed by another thread (volatile read or not). Is that correct or not?
Let me rephrase the question:
Is it correct that the specification says nothing on this matter?
No. The specification is very clear on this matter.
Is a volatile write guaranteed to be observed on another thread?
Yes, if the other thread has a critical execution point. A special side effect is guaranteed to be observed to be ordered with respect to a critical execution point.
A volatile write is a special side effect, and a number of things are critical execution points, including starting and stopping threads. See the spec for a list of such.
Suppose for example thread Alpha sets volatile int field v to one and starts thread Bravo, which reads v, and then joins Bravo. (That is, blocks on Bravo completing.)
At this point we have a special side effect -- the write -- a critical execution point -- the thread start -- and a second special side effect -- a volatile read. Therefore Bravo is required to read one from v. (Assuming no other thread has written it in the meanwhile of course.)
Bravo now increments v to two and ends. That's a special side effect -- a write -- and a critical execution point -- the end of a thread.
When thread Alpha now resumes and does a volatile read of v it is required that it reads two. (Assuming no other thread has written to it in the meanwhile of course.)
The ordering of the side effect of Bravo's write and Bravo's termination must be preserved; plainly Alpha does not run again until after Bravo's termination, and so it is required to observe the write.
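The Alpha/Bravo scenario above can be sketched as follows (my own rendering of the description, with Main playing the role of Alpha):

```csharp
using System;
using System.Threading;

class Program
{
    static volatile int v = 0;

    static void Main()
    {
        v = 1; // Alpha's volatile write

        var bravo = new Thread(() =>
        {
            Console.WriteLine(v); // must read 1: thread start is a critical execution point
            v = 2;                // volatile write before Bravo terminates
        });
        bravo.Start();
        bravo.Join();             // thread termination is a critical execution point

        Console.WriteLine(v);     // must read 2
    }
}
```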
Yes, volatile is about fences, and fences are about ordering.
So "when?" is out of scope; it is actually an implementation detail of all the layers combined (compiler, JIT, CPU, etc.),
but every implementation should have a decent, practical answer to the question.

What I do not understand about volatile and Memory-Barrier is

Loop hoisting a volatile read
I have read in many places that a volatile variable cannot be hoisted out of a loop or an if, but I cannot find this mentioned anywhere in the C# spec. Is this a hidden feature?
All writes are volatile in C#
Does this mean that all writes have the same properties without the volatile keyword as with it? E.g. do ordinary writes in C# have release semantics? And do all writes flush the store buffer of the processor?
Release semantics
Is this a formal way of saying that the store buffer of a processor is emptied when a volatile write is done?
Acquire semantics
Is this a formal way of saying that it should not load a variable into a register, but fetch it from memory every time?
In this article, Igoro speaks of "thread cache". I perfectly understand that this is imaginary, but is he in fact referring to:
Processor store buffer
loading variables into registers instead of fetching from memory every time
Some sort of processor cache (is this L1 and L2 etc)
Or is this just my imagination?
Delayed writing
I have read many places that writes can be delayed. Is this because of the reordering, and the store buffer?
Thread.MemoryBarrier
I understand that a side effect of Thread.MemoryBarrier is a "lock or" instruction when the JIT transforms IL to asm, and this is why Thread.MemoryBarrier can solve the delayed write to main memory (in the while loop) in e.g. this example:
static void Main()
{
    bool complete = false;
    var t = new Thread(() =>
    {
        bool toggle = false;
        while (!complete) toggle = !toggle;
    });
    t.Start();
    Thread.Sleep(1000);
    complete = true;
    t.Join(); // Blocks indefinitely
}
But is this always the case? Will a call to Thread.MemoryBarrier always flush the store buffer and fetch updated values into the processor cache? I understand that the complete variable is not hoisted into a register and is fetched from a processor cache every time, but the processor cache is updated because of the call to Thread.MemoryBarrier.
Am I on thin ice here, or do I have some sort of understanding of volatile and Thread.MemoryBarrier?
That's a mouthful..
I'm gonna start with a few of your questions, and update my answer.
Loop hoisting a volatile
I have read many places that a volatile variable can not be hoisted from a loop or if, but I cannot find this mentioned any places in the C# spec. Is this a hidden feature?
MSDN says "Fields that are declared volatile are not subject to compiler optimizations that assume access by a single thread". This is kind of a broad statement, but it includes hoisting or "lifting" variables out of a loop.
All writes are volatile in C#
Does this mean that all writes have the same properties without, as with the volatile keyword? Eg ordinary writes in C# has release semantics? and all writes flushes the store buffer of the processor?
Regular writes are not volatile. They do have release semantics, but they don't flush the CPU's write-buffer. At least, not according to the spec.
From Joe Duffy's CLR 2.0 Memory Model
Rule 2: All stores have release semantics, i.e. no load or store may move after one.
I've read a few articles stating that all writes are volatile in C# (like the one you linked to), but this is a common misconception. From the horse's mouth (The C# Memory Model in Theory and Practice, Part 2):
Consequently, the author might say something like, “In the .NET 2.0 memory model, all writes are volatile—even those to non-volatile fields.” (...) This behavior isn’t guaranteed by the ECMA C# spec, and, consequently, might not hold in future versions of the .NET Framework and on future architectures (and, in fact, does not hold in the .NET Framework 4.5 on ARM).
Release semantics
Is this a formal way of saying that the store buffer of a processor is emptied when a volatile write is done?
No, those are two different things. If an instruction has "release semantics", then no store/load instruction will ever be moved below said instruction. The definition says nothing regarding flushing the write-buffer. It only concerns instruction re-ordering.
Delayed writing
I have read many places that writes can be delayed. Is this because of the reordering, and the store buffer?
Yes. Write instructions can be delayed/reordered by either the compiler, the jitter or the CPU itself.
So a volatile write has two properties: release semantics, and store buffer flushing.
Sort of. I prefer to think of it this way:
The C# Specification of the volatile keyword guarantees one property: that reads have acquire-semantics and writes have release-semantics. This is done by emitting the necessary release/acquire fences.
The actual Microsoft's C# implementation adds another property: reads will be fresh, and writes will be flushed to memory immediately and be made visible to other processors. To accomplish this, the compiler emits an OpCodes.Volatile, and the jitter picks this up and tells the processor not to store this variable on its registers.
This means that a different C# implementation that doesn't guarantee immediacy will be a perfectly valid implementation.
Memory Barrier
bool complete = false;
var t = new Thread(() =>
{
    bool toggle = false;
    while (!complete) toggle = !toggle;
});
t.Start();
Thread.Sleep(1000);
complete = true;
t.Join(); // blocks
But is this always the case? Will a call to Thread.MemoryBarrier always flush the store buffer and fetch updated values into the processor cache?
Here's a tip: try to abstract yourself away from concepts like flushing the store buffer, or reading straight from memory. The concept of a memory barrier (or a full-fence) is in no way related to the two former concepts.
A memory barrier has one sole purpose: ensure that store/load instructions below the fence are not moved above the fence, and vice-versa. If C#'s Thread.MemoryBarrier just so happens to flush pending writes, you should think about it as a side-effect, not the main intent.
Now, let's get to the point. The code you posted (which blocks when compiled in Release mode and run without a debugger) could be solved by introducing a full fence anywhere inside the while block. Why? Let's first unroll the loop. Here's what the first few iterations would look like:
if(complete) return;
toggle = !toggle;
if(complete) return;
toggle = !toggle;
if(complete) return;
toggle = !toggle;
...
Because complete is not marked as volatile and there are no fences, the compiler and the CPU are allowed to move the read of the complete field.
In fact, the CLR's Memory Model (see rule 6) allows loads to be deleted (!) when coalescing adjacent loads. So, this could happen:
if(complete) return;
toggle = !toggle;
toggle = !toggle;
toggle = !toggle;
...
Notice that this is logically equivalent to hoisting the read out of the loop, and that's exactly what the compiler may do.
By introducing a full-fence either before or after toggle = !toggle, you'd prevent the compiler from moving the reads up and merging them together.
if(complete) return;
toggle = !toggle;
#FENCE
if(complete) return;
toggle = !toggle;
#FENCE
if(complete) return;
toggle = !toggle;
#FENCE
...
In conclusion, the key to solving these issues is ensuring that the instructions will be executed in the correct order. It has nothing to do with how long it takes for other processors to see one processor's writes.
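A sketch of the fix described above, with the full fence placed inside the loop (timings are illustrative):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        bool complete = false;
        var t = new Thread(() =>
        {
            bool toggle = false;
            while (!complete)
            {
                toggle = !toggle;
                // Full fence: reads of complete cannot move across this
                // point, so they cannot be merged into one hoisted read.
                Thread.MemoryBarrier();
            }
        });
        t.Start();
        Thread.Sleep(1000);
        complete = true;
        t.Join(); // no longer blocks
        Console.WriteLine("joined");
    }
}
```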

Do I need this field to be volatile?

I have a thread that spins until an int changed by another thread is a certain value.
int cur = this.m_cur;
while (cur > this.Max)
{
    // spin until cur is <= max
    cur = this.m_cur;
}
Does this.m_cur need to be declared volatile for this to work? Is it possible that this will spin forever due to compiler optimization?
Yes, that's a hard requirement. The just-in-time compiler is allowed to store the value of m_cur in a processor register without refreshing it from memory. The x86 jitter in fact does, the x64 jitter doesn't (at least the last time I looked at it).
The volatile keyword is required to suppress this optimization.
Volatile means something entirely different on Itanium cores, a processor with a weak memory model. Unfortunately that's what made it into the MSDN library and the C# Language Specification. What it is going to mean on an ARM core remains to be seen.
The blog below has some fascinating detail on the memory model in C#. In short, it seems safer to use the volatile keyword.
http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
From the blog below
class Test
{
    private bool _loop = true;

    public static void Main()
    {
        Test test1 = new Test();

        // Set _loop to false on another thread
        new Thread(() => { test1._loop = false; }).Start();

        // Poll the _loop field until it is set to false
        while (test1._loop == true) ;

        // The loop above will never terminate!
    }
}
There are two possible ways to get the while loop to terminate:
Use a lock to protect all accesses (reads and writes) to the _loop field
Mark the _loop field as volatile
There are two reasons why a read of a non-volatile field may observe a stale value: compiler optimizations and processor optimizations.
It depends on how m_cur is being modified. If it's using a normal assignment statement such as m_cur--;, then it does need to be volatile. However, if it's being modified using one of the Interlocked operations, then it doesn't because Interlocked's methods automatically insert a memory barrier to ensure that all threads get the memo.
In general, using Interlocked to modify atomic values that are shared across threads is the preferable option. Not only does it take care of the memory barrier for you, but it also tends to be a bit faster than other synchronization options.
That said, as others have said, polling loops are enormously wasteful. It would be better to pause the thread that needs to wait, and let whoever is modifying m_cur take charge of waking it up when the time comes. Monitor.Wait()/Monitor.Pulse() and AutoResetEvent might each be well-suited to the task, depending on your specific needs.
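A sketch of the Monitor.Wait()/Monitor.Pulse() approach (the field names are borrowed from the question; the rest of the scaffolding, including the timings, is my own): the waiting thread sleeps inside Wait instead of burning CPU, and the writer pulses after changing m_cur:

```csharp
using System;
using System.Threading;

class Program
{
    static readonly object gate = new object();
    static int m_cur = 10;
    static readonly int Max = 5;

    static void Main()
    {
        var waiter = new Thread(() =>
        {
            lock (gate)
            {
                // Wait atomically releases the lock and sleeps until pulsed;
                // the condition is re-checked on every wake-up.
                while (m_cur > Max)
                    Monitor.Wait(gate);
            }
            Console.WriteLine("observed m_cur = " + m_cur);
        });
        waiter.Start();

        Thread.Sleep(100);
        lock (gate)
        {
            m_cur = 3;
            Monitor.Pulse(gate); // wake the waiter to re-check the condition
        }
        waiter.Join();
    }
}
```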

Why this program does not go into infinite loop in absence of volatility of a boolean condition variable?

I wanted to understand when exactly I need to declare a variable as volatile. For that I wrote a small program, expecting it to go into an infinite loop because of the missing volatility of a condition variable. It did not go into an infinite loop and worked fine without the volatile keyword.
Two questions:
What should I change in the below code listing - so that it absolutely requires use of volatile?
Is the C# compiler smart enough to treat a variable as volatile - if it sees that a variable is being accessed from a different thread?
The above triggered more questions to me :)
a. Is volatile just a hint?
b. When should I declare a variable as volatile in context of multithreading?
c. Should all member variables be declared volatile for a thread safe class? Is that overkill?
Code Listing (Volatility and not thread safety is the focus):
class Program
{
    static void Main(string[] args)
    {
        VolatileDemo demo = new VolatileDemo();
        demo.Start();
        Console.WriteLine("Completed");
        Console.Read();
    }
}

public class VolatileDemo
{
    public VolatileDemo()
    {
    }

    public void Start()
    {
        var thread = new Thread(() =>
        {
            Thread.Sleep(5000);
            stop = true;
        });
        thread.Start();

        while (stop == false)
            Console.WriteLine("Waiting For Stop Event");
    }

    private bool stop = false;
}
Thanks.
Firstly, Joe Duffy says "volatile is evil" - that's good enough for me.
If you do want to think about volatile, you must think in terms of memory fences and optimisations - by the compiler, jitter and CPU.
On x86, writes are release fences, which means your background thread will flush the true value to memory.
So, what you are looking for is caching of the false value in your loop predicate. The compiler or the jitter may optimise the predicate and only evaluate it once, but I guess it doesn't do that for a read of a class field. The CPU will not cache the false value because you are calling Console.WriteLine, which includes a fence.
This code, by contrast, does require volatile and will never terminate without a Volatile.Read (or an equivalent barrier):
static void Run()
{
    bool stop = false;
    Task.Factory.StartNew( () => { Thread.Sleep( 1000 ); stop = true; } );
    while ( !stop ) ;
}
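As a sketch of the fix (moving stop to a field so it can be passed by ref; the shortened sleep and class name are my own):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class FixedRun
{
    static bool stop; // plain field; every read goes through Volatile.Read

    public static void Run()
    {
        Task.Factory.StartNew(() => { Thread.Sleep(200); Volatile.Write(ref stop, true); });
        // The acquire barrier stops the JIT from hoisting the read out of
        // the loop and caching it in a register, so the loop can exit.
        while (!Volatile.Read(ref stop)) ;
    }
}
```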
I am not an expert in C# concurrency, but AFAIK your expectation is incorrect. Modifying a non-volatile variable from a different thread does not mean that the change will never become visible to other threads, only that there is no guarantee when (and if) it happens. In your case it did happen (how many times did you run the program, btw?), possibly due to the finishing thread flushing its changes as per @Russell's comment. But in a real-life setup - involving more complex program flow, more variables, more threads - the update may happen later than 5 seconds, or - maybe once in a thousand cases - may not happen at all.
So running your program once - or even a million times - while not observing any problems only provides statistical, not absolute proof. "Absence of evidence is not evidence of absence".
Try to rewrite it like this:
public void Start()
{
    var thread = new Thread(() =>
    {
        Thread.Sleep(5000);
        stop = true;
    });
    thread.Start();

    bool unused = false;
    while (stop == false)
        unused = !unused; // fake work to prevent optimization
}
And make sure you are running in Release mode and not Debug mode. In Release mode optimizations are applied which actually cause the code to fail in the absence of volatile.
Edit: A bit about volatile:
We all know that there are two distinct entities involved in a program lifecycle that can apply optimizations in the form of variable caching and/or instruction reordering: the compiler and the CPU.
This means there may be a large difference between how you wrote your code and how it actually gets executed: instructions may be reordered with respect to each other, and reads may be cached in what the compiler perceives as an "improvement in speed".
Most of the time this is good, but sometimes (especially in a multithreading context) it causes trouble, as seen in this example. To allow the programmer to manually prevent such optimizations, memory fences were introduced: special instructions whose role is to prevent reordering of instructions (just reads, just writes, or both) across the fence itself, and also to force invalidation of values in CPU caches, so that they must be re-read every time (which is what we want in the scenario above).
Although you can specify a full fence affecting all variables through Thread.MemoryBarrier(), that's almost always overkill if only one variable needs to be affected. For a single variable that should always be up to date across threads, you can use volatile to introduce read/write fences for that variable only.
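A minimal sketch of that volatile-only approach (the class and field names are illustrative):

```csharp
using System;
using System.Threading;

class Worker
{
    // volatile gives every read acquire semantics and every write release
    // semantics, so the loop below always observes the latest value.
    private volatile bool _stop;

    public void Run()
    {
        var t = new Thread(() => { Thread.Sleep(200); _stop = true; });
        t.Start();
        while (!_stop) ; // not hoisted into a register: _stop is volatile
        t.Join();
    }
}
```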
The volatile keyword is a message to the compiler not to make single-thread optimizations on this variable.
It means that this variable may be modified by multiple threads.
This keeps the variable's value as 'fresh' as possible when reading.
The piece of code you've pasted here is a good candidate for the volatile keyword.
It's not a surprise that this code works without the volatile keyword. However, it may behave more unpredictably when more threads are running and you perform more sophisticated actions on the flag value.
You declare volatile only on those variables which can be modified by several threads.
I don't know exactly how it is in C#, but I assume you can't rely on volatile alone for variables which are modified by read-modify-write actions (such as incrementing). Volatile doesn't use locks while changing the value.
So setting the flag via volatile (as above) is OK; incrementing the variable is not OK, and you should use a synchronization/locking mechanism then.
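That distinction can be sketched like this (a hypothetical class, not from the question):

```csharp
using System;
using System.Threading;

class Flags
{
    private volatile bool _flag; // OK: a single write is atomic
    private int _count;          // incremented from several threads

    public void SetFlag() => _flag = true; // safe with volatile alone

    // _count++ on a volatile field would still be a three-step
    // read-modify-write, so two threads could lose an update;
    // Interlocked makes the whole increment atomic.
    public void Increment() => Interlocked.Increment(ref _count);

    public int Count => Interlocked.CompareExchange(ref _count, 0, 0);
}
```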
When the background thread assigns true to the member variable there is a release fence and the value is written to memory and the other processor's cache is updated or flushed of that address.
The function call to Console.WriteLine is a full memory fence and its semantics of possibly doing anything (short of compiler optimisations) would require that stop not be cached.
However if you remove the call to Console.WriteLine, I find that the function is still halting.
I believe that, in the absence of optimisations, the compiler does not cache anything calculated from global memory. The volatile keyword is then an instruction to the compiler / JIT not to even think of caching any expression involving that variable.
This code still halts (at least for me, I am using Mono):
public void Start()
{
    stop = false;
    var thread = new Thread(() =>
    {
        while (true)
        {
            Thread.Sleep(50);
            stop = !stop;
        }
    });
    thread.Start();

    while ( !(stop ^ stop) );
}
This shows that it's not the while statement preventing caching, because this shows the variable not being cached even within the same expression statement.
This optimisation looks sensitive to the memory model, which is platform dependent, meaning it would be done in the JIT compiler, which wouldn't have the time (or intelligence) to see the usage of the variable in the other thread and prevent caching for that reason.
Perhaps Microsoft doesn't believe programmers capable of knowing when to use volatile and decided to strip them of the responsibility, and Mono then followed suit.

Does Interlocked provide visibility in all threads?

Suppose I have a variable "counter", and there are several threads accessing and setting the value of "counter" by using Interlocked, i.e.:
int value = Interlocked.Increment(ref counter);
and
int value = Interlocked.Decrement(ref counter);
Can I assume that, the change made by Interlocked will be visible in all threads?
If not, what should I do to make all threads synchronize the variable?
EDIT: someone suggested me to use volatile. But when I set the "counter" as volatile, there is compiler warning "reference to volatile field will not be treated as volatile".
When I read online help, it said, "A volatile field should not normally be passed using a ref or out parameter".
Interlocked.Increment/Decrement on x86 CPUs (x86's lock add/dec) automatically create a memory barrier which gives visibility to all threads (i.e., all threads see the updates in order, as in sequential memory consistency). A memory barrier forces all pending memory loads/stores to complete. volatile is not strictly related to this question, although C# and Java (and some C/C++ compilers) enforce a memory barrier for volatile accesses. The interlocked operation already carries a memory barrier courtesy of the CPU.
Please also take a look at my other answer on Stack Overflow.
Note that I have assumed that C#'s Interlocked.Increment/Decrement map intrinsically to x86's lock add/dec.
Can I assume that, the change made by Interlocked will be visible in all threads?
This depends on how you read the value. If you "just" read it, then no, this won't always be visible in other threads unless you mark it as volatile. That causes an annoying warning though.
As an alternative (and much preferred IMO), read it using another Interlocked instruction. This will always see the updated value on all threads:
int readvalue = Interlocked.CompareExchange(ref counter, 0, 0);
which returns the value read, and if it was 0 swaps it with 0.
Motivation: the warning hints that something isn't right; combining the two techniques (volatile & interlocked) wasn't the intended way to do this.
Update: it seems that another approach to reliable 32-bit reads without using "volatile" is by using Thread.VolatileRead as suggested in this answer. There is also some evidence that I am completely wrong about using Interlocked for 32-bit reads, for example this Connect issue, though I wonder if the distinction is a bit pedantic in nature.
What I really mean is: don't use this answer as your only source; I'm having my doubts about this.
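For comparison, the two read idioms mentioned (Interlocked.CompareExchange and the volatile-read family; Volatile.Read supersedes Thread.VolatileRead in newer frameworks) might be sketched as:

```csharp
using System;
using System.Threading;

class CounterReads
{
    private int counter;

    public void Increment() => Interlocked.Increment(ref counter);

    // Reads via a no-op compare-and-swap: if counter is 0 it is
    // replaced with 0, so the value never changes, but the full
    // barrier guarantees we see the latest value.
    public int ReadViaInterlocked() =>
        Interlocked.CompareExchange(ref counter, 0, 0);

    // Reads with acquire semantics; no compiler warning, unlike
    // passing a volatile field by ref.
    public int ReadViaVolatile() => Volatile.Read(ref counter);
}
```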
Actually, they aren't. If you want to safely modify counter, then you are doing the correct thing. But if you want to read counter directly you need to declare it as volatile. Otherwise, the compiler has no reason to believe that counter will change because the Interlocked operations are in code that it might not see.
Interlocked ensures that only 1 thread at a time can update the value. To ensure that other threads can read the correct value (and not a cached value) mark it as volatile.
public volatile int Counter;
No; an Interlocked operation on the write side alone does not ensure that variable reads elsewhere in the code are actually fresh; a program that does not correctly read from a field as well might not be thread-safe, even under a "strong memory model". This applies to any form of assigning to a field shared between threads.
Here is an example of code that will never terminate due to the JIT. (It was modified from Memory Barriers in .NET to be a runnable LINQPad program updated for the question).
// Run this as a LINQPad program in "Release Mode".
// ~ It will never terminate on .NET 4.5.2 / x64. ~
// The program will terminate in "Debug Mode" and may terminate
// in other CLR runtimes and architecture targets.
class X {
    // Adding {volatile} would 'fix the problem', as it prevents the JIT
    // optimization that results in the non-terminating code.
    public int terminate = 0;
    public int y;

    public void Run() {
        var r = new ManualResetEvent(false);
        var t = new Thread(() => {
            int x = 0;
            r.Set();
            // Using Volatile.Read or otherwise establishing
            // an Acquire Barrier would disable the 'bad' optimization.
            while (terminate == 0) { x = x * 2; }
            y = x;
        });
        t.Start();
        r.WaitOne();
        Interlocked.Increment(ref terminate);
        t.Join();
        Console.WriteLine("Done: " + y);
    }
}

void Main()
{
    new X().Run();
}
The explanation from Memory Barriers in .NET:
This time it is JIT, not the hardware. It’s clear that JIT has cached the value of the variable terminate [in the EAX register and the] program is now stuck in the loop highlighted above ..
Either using a lock or adding a Thread.MemoryBarrier inside the while loop will fix the problem. Or you can even use Volatile.Read [or a volatile field]. The purpose of the memory barrier here is only to suppress JIT optimizations. Now that we have seen how software and hardware can reorder memory operations, it’s time to discuss memory barriers ..
That is, an additional barrier construct is required on the read side to prevent issues with Compilation and JIT re-ordering / optimizations: this is a different issue than memory coherency!
Adding volatile here would prevent the JIT optimization, and thus 'fix the problem', even if such results in a warning. This program can also be corrected through the use of Volatile.Read or one of the various other operations that cause a barrier: these barriers are as much a part of the CLR/JIT program correctness as the underlying hardware memory fences.
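A sketch of the corrected read side, replacing the plain read of terminate with Volatile.Read (only the loop changes relative to the program above; the class name and trimmed body are mine):

```csharp
using System;
using System.Threading;

class XFixed
{
    public int terminate = 0;

    public void Run()
    {
        var t = new Thread(() =>
        {
            // Volatile.Read establishes the acquire barrier that suppresses
            // the register-caching JIT optimization, so the loop can exit.
            while (Volatile.Read(ref terminate) == 0) { }
        });
        t.Start();
        Interlocked.Increment(ref terminate);
        t.Join();
        Console.WriteLine("Done");
    }
}
```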
