I read in the MS documentation that assigning a 64-bit value on a 32-bit Intel computer is not an atomic operation; that is, the operation is not thread safe. This means that if two threads simultaneously assign a value to a static Int64 field, the final value of the field cannot be predicted.
Three-part question:
Is this really true?
Is this something I would worry about in the real world?
If my application is multi-threaded do I really need to surround all my Int64 assignments with locking code?
This is not about every variable you encounter. If a variable is used as shared state (including, but not limited to, some static fields), you should take care of this issue. It's a complete non-issue for local variables that are not hoisted as a consequence of being closed over in a closure or an iterator transformation, and that are used by a single function (and thus a single thread) at a time.
Even if the writes were atomic, chances are you would still need to take out a lock whenever you accessed the variable. If you didn't do that, you'd at least have to make the variable volatile to make sure that all threads saw the new value the next time they read the variable (which is almost always what you want). That lets you do atomic, volatile sets - but as soon as you want to do anything more interesting, such as adding 5 to it, you'd be back to locking.
Lock-free programming is very, very hard to get right. You need to know exactly what you're doing, and keep the complexity to as small a piece of code as possible. Personally, I rarely even attempt it other than for very well-known patterns, such as using a static initializer to initialize a collection and then reading from the collection without locking.
Using the Interlocked class can help in some situations, but it's almost always a lot easier to just take out a lock. Uncontested locks are "pretty cheap" (admittedly they get expensive with more cores, but so does everything) - don't mess around with lock-free code until you've got good evidence that it's actually going to make a significant difference.
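A minimal sketch of the "just take a lock" advice, assuming a single shared Int64 total: every read and every compound update (here, adding 5) goes through one lock object, which rules out both torn values and lost updates. LockedCounter, AddFive and Run are illustrative names, not part of any API.

```csharp
using System.Threading.Tasks;

// Sketch: one lock object guards every access to the shared 64-bit total.
public static class LockedCounter
{
    private static readonly object Gate = new object();
    private static long _total; // only read or written while holding Gate

    public static void AddFive()
    {
        lock (Gate)
        {
            _total += 5; // this read-modify-write cannot interleave with another thread's
        }
    }

    public static long Read()
    {
        lock (Gate)
        {
            return _total; // reads are locked too, so they never observe a torn value
        }
    }

    // Starts `threads` tasks that each call AddFive `iterations` times.
    public static long Run(int threads, int iterations)
    {
        lock (Gate) { _total = 0; }
        var workers = new Task[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = Task.Run(() =>
            {
                for (int i = 0; i < iterations; i++) AddFive();
            });
        }
        Task.WaitAll(workers);
        return Read();
    }
}
```

With 4 threads and 100,000 iterations each, the total is always 4 × 100,000 × 5 = 2,000,000.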
MSDN:
Assigning an instance of this type is not thread safe on all hardware platforms because the binary representation of that instance might be too large to assign in a single atomic operation.
But also:
As with any other type, reading and writing to a shared variable that contains an instance of this type must be protected by a lock to guarantee thread safety.
If you do have a shared variable (say, as a static field of a class, or as a field of a shared object), and that field or object is going to be used cross-thread, then, yes, you need to make sure that access to that variable is protected via an atomic operation. The x86 processor provides atomic instructions for this, and this facility is exposed through the System.Threading.Interlocked class methods.
For example:
using System;
using System.Diagnostics;
using System.Threading;
class Program
{
    public static Int64 UnsafeSharedData;
    public static Int64 SafeSharedData;
    static void Main(string[] args)
    {
        // Plain += and -= are read-modify-write sequences and can lose updates.
        Action<Int32> unsafeAdd = i => { UnsafeSharedData += i; };
        Action<Int32> unsafeSubtract = i => { UnsafeSharedData -= i; };
        // Interlocked.Add performs the whole read-modify-write atomically.
        Action<Int32> safeAdd = i => Interlocked.Add(ref SafeSharedData, i);
        Action<Int32> safeSubtract = i => Interlocked.Add(ref SafeSharedData, -i);
        WaitHandle[] waitHandles = new[] { new ManualResetEvent(false),
                                           new ManualResetEvent(false),
                                           new ManualResetEvent(false),
                                           new ManualResetEvent(false)};
        // Runs one operation a million times, then signals its wait handle.
        Action<Action<Int32>, Object> compute = (a, e) =>
        {
            for (Int32 i = 1; i <= 1000000; i++)
            {
                a(i);
                Thread.Sleep(0);
            }
            ((ManualResetEvent) e).Set();
        };
        ThreadPool.QueueUserWorkItem(o => compute(unsafeAdd, o), waitHandles[0]);
        ThreadPool.QueueUserWorkItem(o => compute(unsafeSubtract, o), waitHandles[1]);
        ThreadPool.QueueUserWorkItem(o => compute(safeAdd, o), waitHandles[2]);
        ThreadPool.QueueUserWorkItem(o => compute(safeSubtract, o), waitHandles[3]);
        WaitHandle.WaitAll(waitHandles);
        // Debug.WriteLine only produces output in a Debug build.
        Debug.WriteLine("Unsafe: " + UnsafeSharedData);
        Debug.WriteLine("Safe: " + SafeSharedData);
    }
}
The results:
Unsafe: -24050275641
Safe: 0
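On an interesting side note, I ran this in x64 mode on Vista 64 and the unsafe total was still wrong, so these 64-bit operations are non-atomic there too. Strictly speaking, though, += and -= are read-modify-write sequences that race even on 32-bit fields, so this test does not by itself show that a single 64-bit store is split in two. Anyone know if this is a CLR issue or an x64 issue?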
On a 32-bit x86 platform, the largest piece of memory that an ordinary read or write handles atomically is 32 bits.
This means that if something writes to or reads from a 64-bit variable, it's possible for that read/write to be pre-empted partway through.
For example, you start to assign a value to a 64-bit variable.
After the first 32 bits are written, the OS decides that another thread is going to get CPU time.
That thread attempts to read the variable you were in the middle of assigning to, and sees half of the old value and half of the new one.
That's just one possible race condition with 64-bit assignment on a 32-bit platform.
However, even with 32-bit variables there can be race conditions when reading and writing; therefore, any shared variable should be synchronized in some way to solve these race conditions.
Is this really true? Yes, as it turns out. If your registers only hold 32 bits and you need to store a 64-bit value to some memory location, it's going to take two load operations and two store operations. If your thread gets interrupted by another thread between these two loads/stores, the other thread might see or corrupt half your data! Strange but true. This is a general problem: if your data type is wider than what the processor can read or write in a single operation, you will have concurrency issues.
Is this something I would worry about in the real world? Yes and no. Since almost every modern process is given its own address space, you will only need to worry about this if you're doing multi-threaded programming.
If my application is multi-threaded do I really need to surround all my Int64 assignments with locking code? Sadly, yes, if you want to get technical. It's usually easier in practice to use a Mutex or Semaphore around larger code blocks than to lock every individual set statement on globally accessible variables.
Related
Given an array of struct:
public struct Instrument
{
public double NoS;
public double Last;
}
var a1 = new Instrument[100];
And a threading task pool that is writing to those elements on the basis that a single element may at most be written to by two threads concurrently, one for each of the double fields (there is upstream queuing by topic effectively).
And the knowledge that doubles can be written atomically on 64-bit. (Edit: this mistakenly said 32-bit originally.)
I need to periodically perform a calculation using all the values in the array and I'd like them to be consistent during the calc.
So I can snapshot the array with:
var snapshot = a1.Clone();
Now the question I have is about the specifics of the synchronisation. If I make the members volatile, I don't think that is going to help the clone at all, as the read/write acquire/release semantics are not at the array level.
Now I could have an array lock, but this will add a lot of contention on the most frequent process of writing data into the array. So not ideal.
Alternatively I could have a per-row lock, but that would be a real pain, as they'd all need to be acquired prior to the clone, and meanwhile I've got the writes all backing up.
Now it doesn't really matter if the snapshot doesn't have the very latest value if it's a matter of microseconds, so I think I could probably get away with just having no lock. My only concern is whether there could be a scenario where there isn't a cache writeback for a sustained period. Is this something I should worry about? The writers are in TPL Dataflow and their sole logic is to set the two fields in the struct. I don't really know how, or if, function scope tends to correlate with cache writebacks though.
Thoughts/advice?
edit: What about if I used an interlocked write to the variables in the struct?
edit2: The volume of writes is MUCH higher than that of reads. There are also two separate and concurrent services writing to the NoS and Last fields, so both fields could be written at the same time. This causes problems with a reference-object approach for atomicity.
edit3: Further detail. Assume the array has 30 to 1000 elements and each element could be updated multiple times a second.
Since Instrument contains two doubles (two 64-bit values), you can't write it atomically (even on 64-bit machines). This means that the Clone method can never make a thread-safe copy without doing some kind of synchronization.
TL;DR: Don't use a struct, use an immutable class.
You would probably have more luck with a small redesign. Try using immutable data structures and concurrent collections from the .NET framework. For instance, make your Instrument an immutable class:
// Important: Note that Instrument is now a CLASS!!
public class Instrument
{
    public Instrument(double nos, double last)
    {
        this.NoS = nos;
        this.Last = last;
    }
    // NOTE: Private setters. Class can't be changed
    // after initialization.
    public double NoS { get; private set; }
    public double Last { get; private set; }
}
This way updating an Instrument means you have to create a new one, which makes it much easier to reason about this. When you are sure that only one thread is working with a single Instrument you are done, since a worker can now safely do this:
Instrument old = a[5];
var newValue = new Instrument(old.NoS + 1, old.Last - 10);
a[5] = newValue;
Since references are 32-bit (or 64-bit on a 64-bit machine), updating the reference is guaranteed to be atomic. The clone will now always result in a correct copy (it might lag behind, but that doesn't seem to be a problem for you).
UPDATE
After re-reading your question, I see that I misread it, since one thread is not writing to an Instrument, but is writing to an instrument value, but the solution is practically the same: use immutable reference types. One simple trick for instance, is to change the backing fields of the NoS and Last properties to objects. This makes updating them atomic:
// Instrument can be a struct again.
public struct Instrument
{
    private object nos;
    private object last;
    public double NoS
    {
        get { return (double)(this.nos ?? 0d); }
        set { this.nos = value; }
    }
    public double Last
    {
        get { return (double)(this.last ?? 0d); }
        set { this.last = value; }
    }
}
When changing one of the properties, the value will be boxed, and boxed values are immutable reference types. This way you can safely update those properties.
And the knowledge that double's can be written atomically on 32 bit.
No, that is not guaranteed:
12.5 Atomicity of variable references
Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list shall also be atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, need not be atomic.
(emphasis mine)
No guarantee is made regarding doubles on 32-bit, or even on 64-bit. A struct composed of two doubles is even more problematic. You should rethink your strategy.
You could (ab)use a ReaderWriterLockSlim.
Take a read lock when writing (since you say there is no contention between writers).
And take a write lock when cloning.
Not sure I'd do this though unless there's really no alternative. Could be confusing for whoever maintains this down the line.
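A rough sketch of that (ab)use, assuming the struct from the question and that each field has exactly one writer: writers share the read lock, and the snapshot takes the write lock so that no writer is mid-update while the array is cloned. InstrumentTable, SetNoS, SetLast and Snapshot are hypothetical names.

```csharp
using System.Threading;

public struct Instrument
{
    public double NoS;
    public double Last;
}

public sealed class InstrumentTable
{
    private readonly Instrument[] _items;
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();

    public InstrumentTable(int size) => _items = new Instrument[size];

    public void SetNoS(int index, double value)
    {
        _rw.EnterReadLock(); // "read" lock is shared, so writers do not block each other
        try { _items[index].NoS = value; }
        finally { _rw.ExitReadLock(); }
    }

    public void SetLast(int index, double value)
    {
        _rw.EnterReadLock();
        try { _items[index].Last = value; }
        finally { _rw.ExitReadLock(); }
    }

    public Instrument[] Snapshot()
    {
        _rw.EnterWriteLock(); // exclusive: waits for in-flight writers, blocks new ones
        try { return (Instrument[])_items.Clone(); }
        finally { _rw.ExitWriteLock(); }
    }
}
```

The naming inversion (read lock for writes, write lock for reads) is exactly what makes this confusing to maintain, so a comment at the call sites would be worth having.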
Reads and writes of individual array elements, or individual struct fields, are generally independent. If while one thread is writing a particular field of a particular struct instance, no other thread will attempt to access that same field, an array of structs will be implicitly threadsafe without any locking required beyond the logic that enforces the above conditions.
If it is possible that one thread might try to read a double while another thread is writing it, but it's not possible that two threads might try to write simultaneously, there are a number of approaches you can take to ensure that a read won't see a partially-written value. One which hasn't been mentioned yet would be to define an int64 field, and use custom methods to read and write double values there (bitwise-converting them, and using Interlocked as needed).
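A minimal sketch of the Int64-backed double, assuming one logical value per instance: the value is stored as its IEEE 754 bit pattern in a long and is only accessed through Interlocked.Read and Interlocked.Exchange, so a reader cannot see half a write, even in a 32-bit process. AtomicDouble is an illustrative name, not a BCL type.

```csharp
using System;
using System.Threading;

public sealed class AtomicDouble
{
    private long _bits; // bit pattern of the current double value

    public AtomicDouble(double initial) => _bits = BitConverter.DoubleToInt64Bits(initial);

    // Atomic 64-bit load, then reinterpret the bits as a double.
    public double Read() => BitConverter.Int64BitsToDouble(Interlocked.Read(ref _bits));

    // Reinterpret the double as bits, then perform an atomic 64-bit store.
    public void Write(double value) =>
        Interlocked.Exchange(ref _bits, BitConverter.DoubleToInt64Bits(value));
}
```

Interlocked also has Exchange and CompareExchange overloads that take a ref double directly, which avoid the bit conversion; Interlocked.Read itself only exists for 64-bit integer types.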
Another approach would be to have a changeCount variable for each array slot. Before the struct is written, and before anything else, changeCount is incremented so its two LSBs are "10"; after the write, it is advanced by 2 with Interlocked.Add (see note below). Before code reads the struct, it should check whether a write is in progress. If not, it should perform the read and ensure a write hasn't started or happened (if a write occurred after the read was started, loop back to the beginning). If a write is in progress when code wants to read, it should acquire a shared lock, check whether the write is still in progress, and if so use an interlocked operation to set the LSB of changeCount and Monitor.Wait on the lock. The code which wrote the struct should notice, from the value returned by its Interlocked.Add, that the LSB got set, and should Pulse the lock. If the memory model ensures that reads by a single thread will be processed in order, and that writes by a single thread will be processed in order, and if only one thread will ever try to write an array slot at a time, this approach should limit the multi-processor overhead to a single Interlocked operation in the non-contention case. Note that one must carefully study the rules about what is or is not implied by the memory model before using this sort of code, since it can be tricky.
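A simplified sketch of the change-counter idea, assuming exactly one writer per slot. It uses an odd/even counter instead of the two-LSB encoding and replaces the Monitor.Wait/Pulse fallback with a spin, so readers never block. SeqSlot is a hypothetical name; the ordering argument relies on Interlocked.Increment and Thread.MemoryBarrier acting as full fences and Volatile.Read being an acquire read.

```csharp
using System.Threading;

// Simplified change counter ("sequence lock") for ONE slot with ONE writer thread.
public sealed class SeqSlot
{
    private int _version; // odd while a write is in progress, even otherwise
    private double _nos;
    private double _last;

    // Must never be called by two threads at once for the same slot.
    public void Write(double nos, double last)
    {
        Interlocked.Increment(ref _version); // becomes odd; full fence keeps the stores below after it
        _nos = nos;
        _last = last;
        Interlocked.Increment(ref _version); // becomes even; full fence keeps the stores above before it
    }

    public (double NoS, double Last) Read()
    {
        var spinner = new SpinWait();
        while (true)
        {
            int before = Volatile.Read(ref _version); // acquire: field reads cannot move above this
            if ((before & 1) == 0)
            {
                double nos = _nos;      // may be torn or mixed if a write races with us;
                double last = _last;    // the version re-check below detects that case
                Thread.MemoryBarrier(); // keeps the field reads before the re-check
                if (Volatile.Read(ref _version) == before)
                    return (nos, last);
            }
            spinner.SpinOnce(); // a write is in progress or just happened: back off and retry
        }
    }
}
```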
BTW, there are two more approaches one could take if one wanted to have each array element be a class type rather than a struct:
Use an immutable class type, and use `Interlocked.CompareExchange` any time you want to update an element. The pattern to use is this:
MyClass oldVal, newVal;
do
{
    oldVal = theArray[subscript];
    newVal = new MyClass(oldVal.This, oldVal.That + 5); // Or whatever change
} while (System.Threading.Interlocked.CompareExchange(ref theArray[subscript], newVal, oldVal) != oldVal);
This approach will always yield a logically-correct atomic update of the array element. If, between the time the array element is read and the time it is updated, something else changes the value, the `CompareExchange` will leave the array element unaffected, and the code will loop back and try again. This approach works reasonably well in the absence of contention, though every update will require generating a new object instance. If many threads are trying to update the same array slot, however, and the constructor for `MyClass` takes any significant amount of time to execute, it's possible for code to thrash, repeatedly creating new objects and then finding out they're obsolete by the time they could be stored. Code will always make forward progress, but not necessarily quickly.
Use a mutable class, and lock on the class objects any time one wishes to read or write them. This approach would avoid having to create new class object instances any time something is changed, but locking would add some overhead of its own. Note that both reads and writes would have to be locked, whereas the immutable-class approach only required `Interlocked` methods to be used on writes.
I tend to think arrays of structs are nicer data holders than arrays of class objects, but both approaches have advantages.
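A runnable sketch of the immutable-class approach, assuming a hypothetical Quote class in place of MyClass: each update builds a new instance and publishes it with the generic Interlocked.CompareExchange, retrying if another thread replaced the element first.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Immutable element type: an update always creates a new instance.
public sealed class Quote
{
    public Quote(double nos, double last) { NoS = nos; Last = last; }
    public double NoS { get; }
    public double Last { get; }
}

public static class QuoteUpdater
{
    // Adds delta to NoS of one slot using the CompareExchange retry loop.
    public static void AddToNoS(Quote[] array, int index, double delta)
    {
        Quote oldVal, newVal;
        do
        {
            oldVal = Volatile.Read(ref array[index]);
            newVal = new Quote(oldVal.NoS + delta, oldVal.Last);
            // Store newVal only if the slot still holds oldVal; otherwise loop and retry.
        } while (Interlocked.CompareExchange(ref array[index], newVal, oldVal) != oldVal);
    }

    // Four tasks each add 1.0 to slot 0 `perTask` times; returns the final NoS.
    public static double Run(int perTask)
    {
        var array = new[] { new Quote(0, 0) };
        var tasks = new Task[4];
        for (int t = 0; t < tasks.Length; t++)
            tasks[t] = Task.Run(() => { for (int i = 0; i < perTask; i++) AddToNoS(array, 0, 1.0); });
        Task.WaitAll(tasks);
        return array[0].NoS;
    }
}
```

No update is lost, so Run(50000) returns 200000.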
Ok, so had a think about this over lunch.
I see two, possibly 3 solutions here.
First important note: the immutable idea does not work in my use case because I have two services running in parallel, writing to NoS and Last independently. This means that I would need an extra layer of sync logic between those two services to ensure that while the new ref is being created by one service, the other one is not doing the same. Classic race condition problem, so definitely not right for this problem (although yes, I could have a ref for each double and do it that way, but it's getting ridiculous at that point).
Solution 1
Whole-cache lock. Maybe use a spinlock and just lock for all updates and the snapshot (with memcpy). This is the simplest and probably totally fine for the volumes I'm talking about.
Solution 2
Make all writes to doubles use interlocked writes. When I want to snapshot, iterate the array and read each value with an interlocked read to populate the copy. This may cause per-struct tearing, but the doubles are intact, which is fine, as this is continuously updating data, so the concept of "latest" is a little abstract.
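A sketch of Solution 2, assuming the Instrument struct from the question: each double is written with Interlocked.Exchange and read with Interlocked.CompareExchange(ref x, 0d, 0d), the usual CompareExchange read trick, which returns the current value atomically and only "replaces" a value that is already zero. Individual doubles cannot tear, but a row's NoS and Last may come from different moments. InstrumentSnapshot is an illustrative name.

```csharp
using System.Threading;

public struct Instrument
{
    public double NoS;
    public double Last;
}

public static class InstrumentSnapshot
{
    // Atomic 64-bit stores into a field of an array element.
    public static void SetNoS(Instrument[] a, int i, double v) => Interlocked.Exchange(ref a[i].NoS, v);
    public static void SetLast(Instrument[] a, int i, double v) => Interlocked.Exchange(ref a[i].Last, v);

    // Copies every double with an atomic read; rows may mix old and new fields.
    public static Instrument[] Take(Instrument[] a)
    {
        var copy = new Instrument[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            copy[i].NoS = Interlocked.CompareExchange(ref a[i].NoS, 0d, 0d);
            copy[i].Last = Interlocked.CompareExchange(ref a[i].Last, 0d, 0d);
        }
        return copy;
    }
}
```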
Solution 3
I don't think this will work, but what about interlocked writes to all doubles, and then just using memcpy? I am not sure whether I would get tearing of the doubles, though (remember I don't care about tearing at struct level).
If solution 3 works then I guess it has the best performance, but otherwise I am more inclined toward solution 1.
I have a thread that spins until an int changed by another thread is a certain value.
int cur = this.m_cur;
while (cur > this.Max)
{
// spin until cur is <= max
cur = this.m_cur;
}
Does this.m_cur need to be declared volatile for this to work? Is it possible that this will spin forever due to compiler optimization?
Yes, that's a hard requirement. The just-in-time compiler is allowed to store the value of m_cur in a processor register without refreshing it from memory. The x86 jitter in fact does, the x64 jitter doesn't (at least the last time I looked at it).
The volatile keyword is required to suppress this optimization.
Volatile means something entirely different on Itanium cores, a processor with a weak memory model. Unfortunately that's what made it into the MSDN library and C# Language Specification. What it is going to mean on an ARM core remains to be seen.
The blog below has some fascinating detail on the memory model in C#. In short, it seems safer to use the volatile keyword.
http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
From that blog:
class Test
{
    private bool _loop = true;
    public static void Main()
    {
        Test test1 = new Test();
        // Set _loop to false on another thread
        new Thread(() => { test1._loop = false;}).Start();
        // Poll the _loop field until it is set to false
        while (test1._loop == true) ;
        // The loop above will never terminate!
    }
}
There are two possible ways to get the while loop to terminate: use a lock to protect all accesses (reads and writes) to the _loop field, or mark the _loop field as volatile. There are two reasons why a read of a non-volatile field may observe a stale value: compiler optimizations and processor optimizations.
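For comparison, here is a sketch of the blog's example with the second fix applied: _loop is declared volatile, so the JIT has to re-read it on every iteration instead of keeping it in a register. The class name VolatileLoopTest and the bool return value are additions for illustration.

```csharp
using System.Threading;

class VolatileLoopTest
{
    private volatile bool _loop = true; // volatile: every read goes back to memory

    public static bool Run()
    {
        var test1 = new VolatileLoopTest();
        // Set _loop to false on another thread
        new Thread(() => { test1._loop = false; }).Start();
        // Poll the _loop field; with volatile this loop terminates
        while (test1._loop) { }
        return test1._loop; // false once the loop has exited
    }
}
```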
It depends on how m_cur is being modified. If it's using a normal assignment statement such as m_cur--;, then it does need to be volatile. However, if it's being modified using one of the Interlocked operations, then it doesn't because Interlocked's methods automatically insert a memory barrier to ensure that all threads get the memo.
In general, using Interlocked to modify atomic values that are shared across threads is the preferable option. Not only does it take care of the memory barrier for you, but it also tends to be a bit faster than other synchronization options.
That said, as others have said, polling loops are enormously wasteful. It would be better to pause the thread that needs to wait, and let whoever is modifying m_cur take charge of waking it up when the time comes. Either Monitor.Wait() and Monitor.Pulse(), or an AutoResetEvent, might be well-suited to the task, depending on your specific needs.
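A minimal sketch of the blocking alternative, assuming the counter is only changed through methods that hold the same lock: the waiter calls Monitor.Wait inside a condition loop, and every change calls Monitor.PulseAll. BoundedCounter, Decrement and WaitUntilAtMostMax are hypothetical names standing in for m_cur and Max.

```csharp
using System.Threading;

public sealed class BoundedCounter
{
    private readonly object _gate = new object();
    private int _cur; // plays the role of m_cur; only touched under _gate

    public BoundedCounter(int start, int max) { _cur = start; Max = max; }

    public int Max { get; }

    public void Decrement()
    {
        lock (_gate)
        {
            _cur--;
            Monitor.PulseAll(_gate); // wake waiters so they re-check the condition
        }
    }

    // Blocks (without spinning) until the counter is <= Max, then returns it.
    public int WaitUntilAtMostMax()
    {
        lock (_gate)
        {
            while (_cur > Max)       // re-check after each wake-up: not every change satisfies it
                Monitor.Wait(_gate); // releases the lock while waiting, re-acquires before returning
            return _cur;
        }
    }
}
```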
I wanted to understand when exactly I need to declare a variable as volatile. For that I wrote a small program and was expecting it to go into an infinite loop because of the missing volatility of a condition variable. It did not go into an infinite loop and worked fine without the volatile keyword.
Two questions:
What should I change in the below code listing - so that it absolutely requires use of volatile?
Is the C# compiler smart enough to treat a variable as volatile if it sees that the variable is being accessed from a different thread?
The above triggered more questions to me :)
a. Is volatile just a hint?
b. When should I declare a variable as volatile in context of multithreading?
c. Should all member variables be declared volatile for a thread safe class? Is that overkill?
Code Listing (Volatility and not thread safety is the focus):
using System;
using System.Threading;
class Program
{
    static void Main(string[] args)
    {
        VolatileDemo demo = new VolatileDemo();
        demo.Start();
        Console.WriteLine("Completed");
        Console.Read();
    }
}
public class VolatileDemo
{
    public VolatileDemo()
    {
    }
    public void Start()
    {
        var thread = new Thread(() =>
        {
            Thread.Sleep(5000);
            stop = true;
        });
        thread.Start();
        while (stop == false)
            Console.WriteLine("Waiting For Stop Event");
    }
    private bool stop = false;
}
Thanks.
Firstly, Joe Duffy says "volatile is evil" - that's good enough for me.
If you do want to think about volatile, you must think in terms of memory fences and optimisations - by the compiler, jitter and CPU.
On x86, writes are release fences, which means your background thread will flush the true value to memory.
So, what you are looking for is a caching of the false value in your loop predicate. The compiler or jitter may optimise the predicate and only evaluate it once, but I guess it doesn't do that for a read of a class field. The CPU will not cache the false value because you are calling Console.WriteLine, which includes a fence.
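This code requires volatile semantics and, in an optimized build, may never terminate unless the loop reads stop with Volatile.Read (a local variable cannot be declared volatile):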
static void Run()
{
bool stop = false;
Task.Factory.StartNew( () => { Thread.Sleep( 1000 ); stop = true; } );
while ( !stop ) ;
}
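A sketch of the fixed version, assuming the same Run shape: because a local variable cannot be declared volatile, the captured flag is written with Volatile.Write and read with Volatile.Read, which prevents the JIT from hoisting the read out of the loop. StopFlagDemo and the bool return value are illustrative additions.

```csharp
using System.Threading;
using System.Threading.Tasks;

static class StopFlagDemo
{
    public static bool Run()
    {
        bool stop = false; // captured by the lambda, so it lives in a compiler-generated field
        Task.Factory.StartNew(() => { Thread.Sleep(1000); Volatile.Write(ref stop, true); });
        while (!Volatile.Read(ref stop)) { } // fresh read on every iteration
        return Volatile.Read(ref stop);
    }
}
```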
I am not an expert in C# concurrency, but AFAIK your expectation is incorrect. Modifying a non-volatile variable from a different thread does not mean that the change will never become visible to other threads, only that there is no guarantee when (and if) it happens. In your case it did happen (how many times did you run the program, btw?), possibly due to the finishing thread flushing its changes as per @Russell's comment. But in a real-life setup, involving more complex program flow, more variables and more threads, the update may happen later than 5 seconds, or (maybe once in a thousand cases) may not happen at all.
So running your program once - or even a million times - while not observing any problems only provides statistical, not absolute proof. "Absence of evidence is not evidence of absence".
Try to rewrite it like this:
public void Start()
{
    var thread = new Thread(() =>
    {
        Thread.Sleep(5000);
        stop = true;
    });
    thread.Start();
    bool unused = false;
    while (stop == false)
        unused = !unused; // fake work to prevent optimization
}
And make sure you are running in Release mode and not Debug mode. In Release mode optimizations are applied which actually cause the code to fail in the absence of volatile.
Edit: A bit about volatile:
We all know that there are two distinct entities involved in a program lifecycle that can apply optimizations in the form of variable caching and/or instruction reordering: the compiler and the CPU.
This means that there may be a large difference between how you wrote your code and how it actually gets executed, as instructions may be reordered with respect to each other, or reads may be cached in what the compiler perceives as an "improvement in speed".
Most of the time this is good, but sometimes (especially in a multithreading context) it can cause trouble, as seen in this example. To allow the programmer to manually prevent such optimizations, memory fences were introduced: special instructions whose role is to prevent reordering of instructions (just reads, just writes, or both) with respect to the fence itself, and to stop a previously loaded value from being reused across the fence, so that it needs to be re-read every time (which is what we want in the scenario above).
Although you can specify a full fence affecting all variables through Thread.MemoryBarrier(), it's almost always overkill if you need only one variable to be affected. Thus, for a single variable to be always up to date across threads, you can use volatile to introduce read/write fences for that variable only.
The volatile keyword is a message to the compiler not to make single-thread optimizations on this variable.
It means that this variable may be modified by multi threads.
This makes the variable value the most 'fresh' while reading.
The piece of code you've pasted here is a good example to use volatile keyword.
It's not a surprise that this code works without the volatile keyword. However, it may behave more unpredictably when more threads are running and you perform more sophisticated actions on the flag value.
You declare volatile only on those variables which can be modified by several threads.
I don't know exactly how it is in C#, but I assume you shouldn't rely on volatile for variables that are modified by read-modify-write actions (such as incrementing). Volatile doesn't use locks while changing the value.
So setting the flag on volatile (like above) is OK, incrementing the variable is not OK - you should use synchronization/locking mechanism then.
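A sketch of that split, with hypothetical names: a volatile bool works as a one-shot stop flag, while the shared counter uses Interlocked.Increment, because _shared++ on a volatile field would still be a non-atomic read-modify-write. Each worker also keeps a private tally so the shared total can be checked against it.

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed class WorkerDemo
{
    private volatile bool _stopRequested; // written once by the controller, polled by workers
    private int _shared;                  // incremented by every worker

    // Runs the workers for roughly runMilliseconds; returns the shared total
    // together with the sum of the workers' private tallies.
    public (int Shared, int Expected) Run(int workers, int runMilliseconds)
    {
        var tasks = new Task<int>[workers];
        for (int w = 0; w < workers; w++)
        {
            tasks[w] = Task.Run(() =>
            {
                int local = 0;
                while (!_stopRequested)                 // volatile read on every iteration
                {
                    Interlocked.Increment(ref _shared); // atomic; _shared++ could lose updates
                    local++;
                }
                return local;
            });
        }
        Thread.Sleep(runMilliseconds);
        _stopRequested = true;                          // a plain volatile write is enough for a flag
        int expected = 0;
        foreach (var t in tasks) expected += t.Result;  // waits for each worker to finish
        return (Volatile.Read(ref _shared), expected);
    }
}
```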
When the background thread assigns true to the member variable there is a release fence and the value is written to memory and the other processor's cache is updated or flushed of that address.
The function call to Console.WriteLine is a full memory fence and its semantics of possibly doing anything (short of compiler optimisations) would require that stop not be cached.
However if you remove the call to Console.WriteLine, I find that the function is still halting.
I believe that, in the absence of optimisations, the compiler does not cache anything calculated from global memory. The volatile keyword is then an instruction to the compiler / JIT not to even think of caching any expression involving the variable.
This code still halts (at least for me, I am using Mono):
public void Start()
{
    stop = false;
    var thread = new Thread(() =>
    {
        while(true)
        {
            Thread.Sleep(50);
            stop = !stop;
        }
    });
    thread.Start();
    while ( !(stop ^ stop) );
}
This shows that it's not the while statement preventing caching, because this shows the variable not being cached even within the same expression statement.
This optimisation looks sensitive to the memory model, which is platform dependent, meaning this would be done in the JIT compiler, which wouldn't have the time (or intelligence) to /see/ the usage of the variable in the other thread and prevent caching for that reason.
Perhaps Microsoft doesn't believe programmers capable of knowing when to use volatile and decided to strip them of the responsibility, and then Mono followed suit.
Suppose I have a variable "counter", and there are several threads accessing and setting the value of "counter" by using Interlocked, i.e.:
int value = Interlocked.Increment(ref counter);
and
int value = Interlocked.Decrement(ref counter);
Can I assume that, the change made by Interlocked will be visible in all threads?
If not, what should I do to make all threads synchronize the variable?
EDIT: someone suggested me to use volatile. But when I set the "counter" as volatile, there is compiler warning "reference to volatile field will not be treated as volatile".
When I read online help, it said, "A volatile field should not normally be passed using a ref or out parameter".
Interlocked.Increment/Decrement on x86 CPUs (x86's lock add/dec) automatically create a memory barrier, which gives visibility to all threads (i.e., all threads can see its updates in order, like sequential memory consistency). A memory barrier forces all pending memory loads/stores to complete. volatile is not related to this question, although C# and Java (and some C/C++ compilers) make volatile accesses emit memory barriers. But an interlocked operation already has a memory barrier from the CPU.
Please also take a look at my other answer on Stack Overflow.
Note that I have assumed that C#'s Interlocked.Increment/Decrement are intrinsics mapping to x86's lock add/dec.
Can I assume that, the change made by Interlocked will be visible in all threads?
This depends on how you read the value. If you "just" read it, then no, this won't always be visible in other threads unless you mark it as volatile. That causes an annoying warning though.
As an alternative (and much preferred IMO), read it using another Interlocked instruction. This will always see the updated value on all threads:
int readvalue = Interlocked.CompareExchange(ref counter, 0, 0);
which returns the value read, and if it was 0 swaps it with 0.
Motivation: the warning hints that something isn't right; combining the two techniques (volatile & interlocked) wasn't the intended way to do this.
Update: it seems that another approach to reliable 32-bit reads without using "volatile" is by using Thread.VolatileRead as suggested in this answer. There is also some evidence that I am completely wrong about using Interlocked for 32-bit reads, for example this Connect issue, though I wonder if the distinction is a bit pedantic in nature.
What I really mean is: don't use this answer as your only source; I'm having my doubts about this.
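A minimal sketch of both read styles, assuming counter is a plain (non-volatile) int field so no warning is raised: Interlocked.CompareExchange(ref counter, 0, 0) as in this answer, and Volatile.Read, which is available from .NET 4.5 onward. CounterReads, Bump and the two Read methods are illustrative names.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CounterReads
{
    private static int counter; // not volatile, so passing it by ref raises no warning

    // Atomic read that also acts as a full fence (replaces 0 with 0, otherwise no-op).
    public static int ReadWithCompareExchange() => Interlocked.CompareExchange(ref counter, 0, 0);

    // Acquire read: cannot be cached in a register or hoisted out of a loop.
    public static int ReadWithVolatile() => Volatile.Read(ref counter);

    // Increments the counter `times` times from parallel workers.
    public static void Bump(int times) => Parallel.For(0, times, _ => Interlocked.Increment(ref counter));
}
```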
Actually, they aren't. If you want to safely modify counter, then you are doing the correct thing. But if you want to read counter directly you need to declare it as volatile. Otherwise, the compiler has no reason to believe that counter will change because the Interlocked operations are in code that it might not see.
Interlocked ensures that only 1 thread at a time can update the value. To ensure that other threads can read the correct value (and not a cached value) mark it as volatile.
public volatile int Counter;
No; using Interlocked on the write side alone does not ensure that variable reads in code are actually fresh; a program that does not also read the field correctly might not be thread-safe, even under a "strong memory model". This applies to any form of assigning to a field shared between threads.
Here is an example of code that will never terminate due to the JIT. (It was modified from Memory Barriers in .NET to be a runnable LINQPad program updated for the question).
// Run this as a LINQPad program in "Release Mode".
// ~ It will never terminate on .NET 4.5.2 / x64. ~
// The program will terminate in "Debug Mode" and may terminate
// in other CLR runtimes and architecture targets.
class X {
    // Adding {volatile} would 'fix the problem', as it prevents the JIT
    // optimization that results in the non-terminating code.
    public int terminate = 0;
    public int y;
    public void Run() {
        var r = new ManualResetEvent(false);
        var t = new Thread(() => {
            int x = 0;
            r.Set();
            // Using Volatile.Read or otherwise establishing
            // an Acquire Barrier would disable the 'bad' optimization.
            while(terminate == 0){x = x * 2;}
            y = x;
        });
        t.Start();
        r.WaitOne();
        Interlocked.Increment(ref terminate);
        t.Join();
        Console.WriteLine("Done: " + y);
    }
}
void Main()
{
    new X().Run();
}
The explanation from Memory Barriers in .NET:
This time it is JIT, not the hardware. It’s clear that JIT has cached the value of the variable terminate [in the EAX register and the] program is now stuck in the loop highlighted above ..
Either using a lock or adding a Thread.MemoryBarrier inside the while loop will fix the problem. Or you can even use Volatile.Read [or a volatile field]. The purpose of the memory barrier here is only to suppress JIT optimizations. Now that we have seen how software and hardware can reorder memory operations, it’s time to discuss memory barriers ..
That is, an additional barrier construct is required on the read side to prevent issues with Compilation and JIT re-ordering / optimizations: this is a different issue than memory coherency!
Adding volatile here would prevent the JIT optimization, and thus 'fix the problem', even if such results in a warning. This program can also be corrected through the use of Volatile.Read or one of the various other operations that cause a barrier: these barriers are as much a part of the CLR/JIT program correctness as the underlying hardware memory fences.
I've been raised to believe that if multiple threads can access a variable, then all reads from and writes to that variable must be protected by synchronization code, such as a "lock" statement, because the processor might switch to another thread halfway through a write.
However, I was looking through System.Web.Security.Membership using Reflector and found code like this:
public static class Membership
{
    private static bool s_Initialized = false;
    private static object s_lock = new object();
    private static MembershipProvider s_Provider;
    public static MembershipProvider Provider
    {
        get
        {
            Initialize();
            return s_Provider;
        }
    }
    private static void Initialize()
    {
        if (s_Initialized)
            return;
        lock(s_lock)
        {
            if (s_Initialized)
                return;
            // Perform initialization...
            s_Initialized = true;
        }
    }
}
Why is the s_Initialized field read outside of the lock? Couldn't another thread be trying to write to it at the same time? Are reads and writes of variables atomic?
For the definitive answer go to the spec. :)
Partition I, Section 12.6.6 of the CLI spec states: "A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size is atomic when all the write accesses to a location are the same size."
So that confirms that s_Initialized will never be torn (read in a half-written state), and that reads and writes of primitive types no larger than 32 bits are atomic.
In particular, double, long and ulong (Double, Int64 and UInt64) are not guaranteed to be atomic on a 32-bit platform. You can use the methods on the Interlocked class to protect these.
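A minimal sketch of protecting 64-bit fields with Interlocked. SharedValue64 and its members are hypothetical names used only for illustration. Interlocked.Read has no double overload, so the double is read with the common idiom of a CompareExchange that can only replace 0.0 with 0.0.

```csharp
using System;
using System.Threading;

// Hypothetical holder for 64-bit values shared between threads.
public static class SharedValue64
{
    private static long s_total;
    private static double s_ratio;

    // Atomic 64-bit read, even in a 32-bit process.
    public static long ReadTotal() => Interlocked.Read(ref s_total);

    // Atomic 64-bit write; Exchange returns the previous value, ignored here.
    public static void WriteTotal(long value) => Interlocked.Exchange(ref s_total, value);

    // Only ever writes 0.0 when the field already equals 0.0, so this
    // effectively just returns the current value atomically.
    public static double ReadRatio() => Interlocked.CompareExchange(ref s_ratio, 0.0, 0.0);

    public static void WriteRatio(double value) => Interlocked.Exchange(ref s_ratio, value);
}
```

On a 64-bit process plain reads and writes of aligned long fields are already atomic in practice, but the Interlocked calls keep the code correct regardless of bitness.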
Additionally, even though individual reads and writes are atomic, adding, subtracting, incrementing and decrementing primitive types is still subject to race conditions, since each operation must read the value, modify it, and write it back. The Interlocked class lets you protect these operations with methods such as Increment and CompareExchange.
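To illustrate the read-modify-write race, here is a sketch that increments a shared counter from several workers with Interlocked.Increment, and performs an arbitrary update (doubling the value, chosen only as an example) with a CompareExchange retry loop. CounterDemo and its members are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    private static int s_count;

    public static int Value => Volatile.Read(ref s_count);

    // Each of 'workers' parallel iterations increments s_count 'perWorker' times.
    public static int IncrementConcurrently(int workers, int perWorker)
    {
        s_count = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
            {
                // A plain s_count++ here could lose updates; Interlocked.Increment cannot.
                Interlocked.Increment(ref s_count);
            }
        });
        return Value;
    }

    // Arbitrary atomic update via a CompareExchange retry loop:
    // retry if another thread changed s_count between our read and our write.
    public static void DoubleAtomically()
    {
        int observed, desired;
        do
        {
            observed = Volatile.Read(ref s_count);
            desired = observed * 2;
        }
        while (Interlocked.CompareExchange(ref s_count, desired, observed) != observed);
    }
}
```

Replacing Interlocked.Increment(ref s_count) with s_count++ can make the total come out lower than expected, because two threads can read the same old value and both write back old + 1.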
Interlocked operations also act as a memory barrier, preventing the processor from reordering reads and writes across them. In this example, the lock provides the only barrier that is required.
This is a (bad) form of the double-checked locking pattern, which is not thread-safe in C#!
There is one big problem in this code:
s_Initialized is not volatile. That means that writes in the initialization code can move after s_Initialized is set to true, so other threads can see uninitialized data even though s_Initialized is true for them. This doesn't apply to Microsoft's implementation of the Framework, because there every write is a volatile write.
But even in Microsoft's implementation, reads of the uninitialized data can be reordered (i.e. prefetched by the CPU), so if s_Initialized is true, reading the data that should be initialized can return old, uninitialized data because of cache hits (i.e. the reads are reordered).
For example:
Thread 1 reads s_Provider (which is null)
Thread 2 initializes the data
Thread 2 sets s_Initialized to true
Thread 1 reads s_Initialized (which is true now)
Thread 1 uses the previously read Provider and gets a NullReferenceException
Moving the read of s_Provider before the read of s_Initialized is perfectly legal because there is no volatile read anywhere.
If s_Initialized were volatile, the read of s_Provider could not move before the read of s_Initialized, and the initialization of the Provider could not move after s_Initialized is set to true, so the pattern would be correct.
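A sketch of the corrected pattern with s_Initialized marked volatile, as described above. DemoProvider is a hypothetical stand-in for MembershipProvider (which is abstract) so that the example compiles on its own, and MembershipSketch is likewise an illustrative name.

```csharp
using System;

// Hypothetical stand-in for MembershipProvider.
public sealed class DemoProvider
{
    public string Name { get; }
    public DemoProvider(string name) { Name = name; }
}

public static class MembershipSketch
{
    private static volatile bool s_Initialized;
    private static readonly object s_lock = new object();
    private static DemoProvider s_Provider;

    public static DemoProvider Provider
    {
        get
        {
            Initialize();
            return s_Provider;
        }
    }

    private static void Initialize()
    {
        // Volatile read (acquire): the later read of s_Provider in the
        // getter cannot be moved before this check.
        if (s_Initialized)
            return;
        lock (s_lock)
        {
            // Second check: another thread may have finished while we waited.
            if (s_Initialized)
                return;
            s_Provider = new DemoProvider("default");
            // Volatile write (release): the write to s_Provider cannot be
            // moved after this store.
            s_Initialized = true;
        }
    }
}
```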
Joe Duffy also wrote an article about this problem: "Broken variants on double-checked locking".
Hang about -- the question that is in the title is definitely not the real question that Rory is asking.
The titular question has the simple answer of "No" -- but this is no help at all when you see the real question, which I don't think anyone has given a simple answer to.
The real question Rory asks is presented much later and is more pertinent to the example he gives.
Why is the s_Initialized field read outside of the lock?
The answer to this is also simple, though completely unrelated to the atomicity of variable access.
The s_Initialized field is read outside of the lock because locks are expensive.
Since the s_Initialized field is essentially "write once" it will never return a false positive.
It's economical to read it outside the lock.
This is a low cost activity with a high chance of having a benefit.
That's why it's read outside of the lock -- to avoid paying the cost of using a lock unless it's indicated.
If locks were cheap the code would be simpler, and omit that first check.
(Edit: nice response from Rory follows. Yeah, boolean reads are very much atomic. If someone built a processor with non-atomic boolean reads, they'd be featured on the DailyWTF.)
The correct answer seems to be, "Yes, mostly."
John's answer referencing the CLI spec indicates that accesses to variables not larger than 32 bits on a 32-bit processor are atomic.
Further confirmation from the C# spec, section 5.5, Atomicity of variable references:
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, are not guaranteed to be atomic.
The code in my example was paraphrased from the Membership class, as written by the ASP.NET team themselves, so it was always safe to assume that the way it accesses the s_Initialized field is correct. Now we know why.
Edit: As Thomas Danecker points out, even though the access of the field is atomic, s_Initialized should really be marked volatile to make sure that the locking isn't broken by the processor reordering the reads and writes.
The Initialize function is faulty. It should look more like this:
private static void Initialize()
{
    if (s_Initialized)
        return;
    lock(s_lock)
    {
        if (s_Initialized)
            return;
        s_Initialized = true;
    }
}
Without the second check inside the lock it's possible the initialisation code will be executed twice. So the first check is for performance to save you taking a lock unnecessarily, and the second check is for the case where a thread is executing the initialisation code but hasn't yet set the s_Initialized flag and so a second thread would pass the first check and be waiting at the lock.
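To see why the second check matters, the following sketch starts many tasks at once, all calling an Initialize method of the shape shown above (with s_Initialized marked volatile), and counts how often the initialization body runs. InitOnce, InitCount and RunFromManyThreads are hypothetical names, and the Thread.Sleep only widens the race window so that other tasks are likely to be queued on the lock.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InitOnce
{
    private static volatile bool s_Initialized;
    private static readonly object s_lock = new object();
    // How many times the (hypothetical) initialization body has run.
    public static int InitCount;

    public static void Initialize()
    {
        if (s_Initialized)
            return;
        lock (s_lock)
        {
            // Without this check, every thread that passed the first check
            // before s_Initialized became true would re-run the body.
            if (s_Initialized)
                return;
            InitCount++;      // safe: only touched while holding s_lock
            Thread.Sleep(10); // widen the window where others wait on the lock
            s_Initialized = true;
        }
    }

    public static int RunFromManyThreads(int workers)
    {
        var start = new ManualResetEventSlim(false);
        var tasks = new Task[workers];
        for (int i = 0; i < workers; i++)
            tasks[i] = Task.Run(() => { start.Wait(); Initialize(); });
        start.Set(); // release all tasks at roughly the same moment
        Task.WaitAll(tasks);
        return InitCount;
    }
}
```

With the inner check the count is always 1; if you delete it, the count will usually come out greater than 1.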
Reads and writes of variables are not atomic. You need to use Synchronisation APIs to emulate atomic reads/writes.
For an awesome reference on this and many more issues to do with concurrency, make sure you grab a copy of Joe Duffy's latest spectacle. It's a ripper!
"Is accessing a variable in C# an atomic operation?"
Nope. And it's not a C# thing, nor is it even a .net thing, it's a processor thing.
OJ is spot on that Joe Duffy is the guy to go to for this kind of info. And "Interlocked" is a great search term to use if you want to know more.
"Torn reads" can occur on any value whose fields add up to more than the size of a pointer.
An if (itisso) check on a boolean is atomic, but even if it were not, there is no need to lock the first check.
If any thread has completed the Initialization then it will be true. It does not matter if several threads are checking at once. They will all get the same answer, and, there will be no conflict.
The second check inside the lock is necessary because another thread may have grabbed the lock first and completed the initialization process already.
You could also decorate s_Initialized with the volatile keyword and forgo the use of lock entirely.
That is not correct. You will still encounter the problem of a second thread passing the check before the first thread has had a chance to set the flag, which will result in multiple executions of the initialisation code.
I think you're asking if s_Initialized could be in an unstable state when read outside the lock. The short answer is no. A simple assignment/read will boil down to a single assembly instruction which is atomic on every processor I can think of.
I'm not sure about assignment to 64-bit variables; it depends on the processor. I would assume it is not atomic, though it probably is on modern 32-bit processors and certainly is on all 64-bit processors. Assignment of complex value types will not be atomic.
I thought they were - I'm not sure of the point of the lock in your example unless you're also doing something to s_Provider at the same time - then the lock would ensure that these calls happened together.
Does that //Perform initialization comment cover creating s_Provider? For instance:
private static void Initialize()
{
    if (s_Initialized)
        return;
    lock(s_lock)
    {
        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
Otherwise that static property getter is just going to return null anyway.
Perhaps Interlocked gives a clue. Otherwise, this one is pretty good.
I would have guessed that they're not atomic.
To make your code always work on weakly ordered architectures, you must put a MemoryBarrier before you write s_Initialized.
s_Provider = new MembershipProvider();
// MUST PUT BARRIER HERE to make sure the memory writes from the assignment
// and the constructor have been written to memory
// BEFORE the write to s_Initialized!
Thread.MemoryBarrier();
// Now that we've guaranteed that the writes above
// will be globally first, set the flag
s_Initialized = true;
The memory writes that happen in the MembershipProvider constructor and the write to s_Provider are not guaranteed to happen before you write to s_Initialized on a weakly ordered processor.
A lot of thought in this thread is about whether something is atomic or not. That is not the issue. The issue is the order that your thread's writes are visible to other threads. On weakly ordered architectures, writes to memory do not occur in order and THAT is the real issue, not whether a variable fits within the data bus.
EDIT: Actually, I'm mixing platforms in my statements. In C# the CLR spec requires that writes are globally visible, in-order (by using expensive store instructions for every store if necessary). Therefore, you don't need to actually have that memory barrier there. However, if it were C or C++ where no such guarantee of global visibility order exists, and your target platform may have weakly ordered memory, and it is multithreaded, then you would need to ensure that the constructors writes are globally visible before you update s_Initialized, which is tested outside the lock.
What you're asking is whether accessing a field in a method multiple times is atomic -- to which the answer is no.
In the example above, the initialise routine is faulty as it may result in multiple initialization. You would need to check the s_Initialized flag inside the lock as well as outside, to prevent a race condition in which multiple threads read the s_Initialized flag before any of them actually does the initialisation code. E.g.,
private static void Initialize()
{
    if (s_Initialized)
        return;
    lock(s_lock)
    {
        if (s_Initialized)
            return;
        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
Ack, never mind... as pointed out, this is indeed incorrect. It doesn't prevent a second thread from entering the "initialize" code section. Bah.