What is Interlocked.CompareExchange used for in the Dapper .NET method?

In the Dapper code, in the Link.TryAdd method, there is the following piece of code:
var snapshot = Interlocked.CompareExchange(ref head, null, null);
Why is this required instead of simply:
var snapshot = head;
Both lines leave head unchanged, and both assign the value of head to snapshot. Why was the first one chosen over the second?
Edit: the code I'm referring to is here: https://github.com/SamSaffron/dapper-dot-net/blob/77227781c562e65c167bf7a933d69291d5bdc6f3/Dapper/SqlMapper.cs

They want to do a volatile read; however, there is no overload of Thread.VolatileRead that takes a generic type parameter. Using Interlocked.CompareExchange this way achieves the same result.
The problem they are trying to solve is that the JIT compiler can optimize away the assignment to a temp if it sees fit. This can cause threading problems if another thread mutates the head reference while the current thread is using it in a sequence of operations.
Edit:
The issue is not that a stale value is read at the beginning of TryAdd. The problem is that on line 105 they need to compare the current head to the previous head (held in snapshot). If the JIT applies that optimization, there is no snapshot variable holding the previous value, and head is read again at that point. It is then quite possible that CompareExchange succeeds even though head changed between lines 103 and 105. The result is that a node in the list is lost if two threads call TryAdd simultaneously.
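A minimal sketch of the Link.TryAdd pattern described above, written from this question and answer rather than copied from Dapper: the node layout, the TryGet helper, and reference-equality key matching are assumptions. The key point is that the comparand passed to the final CompareExchange is the local snapshot, read exactly once, so a concurrent insert makes that CompareExchange fail and the loop retry instead of silently dropping a node.

```csharp
using System.Threading;

// Hypothetical, simplified immutable linked-list cache modeled on the
// Link.TryAdd pattern discussed above (not Dapper's actual source).
public sealed class Link<TKey, TValue> where TKey : class
{
    public readonly TKey Key;
    public readonly TValue Value;
    public readonly Link<TKey, TValue> Tail;

    private Link(TKey key, TValue value, Link<TKey, TValue> tail)
    {
        Key = key; Value = value; Tail = tail;
    }

    // Walks the list; keys are matched by reference (an assumption of this sketch).
    public static bool TryGet(Link<TKey, TValue> link, TKey key, out TValue value)
    {
        for (; link != null; link = link.Tail)
        {
            if (object.ReferenceEquals(link.Key, key)) { value = link.Value; return true; }
        }
        value = default(TValue);
        return false;
    }

    // Returns true if a new node was added; false if the key already existed,
    // in which case 'value' is replaced with the existing value.
    public static bool TryAdd(ref Link<TKey, TValue> head, TKey key, ref TValue value)
    {
        while (true)
        {
            // Read head once into a local. CompareExchange(ref head, null, null)
            // never changes head (it only writes null over an existing null).
            var snapshot = Interlocked.CompareExchange(ref head, null, null);
            if (TryGet(snapshot, key, out TValue found)) { value = found; return false; }

            var newNode = new Link<TKey, TValue>(key, value, snapshot);
            // Publish only if head is still the node newNode was built on.
            if (Interlocked.CompareExchange(ref head, newNode, snapshot) == snapshot) return true;
        }
    }
}
```

If the comparand were a second read of head instead of the snapshot local, a node added by another thread between the two reads would be overwritten, which is the lost-node scenario described above.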

mike z is right: This is preventing a (legal) JIT optimization that would break the code.
They could have used the volatile struct trick, though: Read head and assign it to a volatile field of some struct. Next, read it from that field and it is guaranteed to be a volatile read!
The struct itself doesn't matter at all. All that matters is that a volatile field was used to copy the variable through.
Like that:
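struct VolatileHelper<T> where T : class { public volatile T Value; } // a volatile field of type T requires T to be a reference type
...
var volatileHelper = new VolatileHelper<Field>(); // Field: the (reference) type of head
volatileHelper.Value = head;
var snapshot = volatileHelper.Value;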
Hopefully, it has no runtime cost. In any case, the cost is lower than that of an interlocked operation, which causes CPU memory coherency traffic.
Actually, the fact that every cache access (even a reading one) requires memory coherency traffic makes this a slow cache! Interlocked operations are a system global resource that does not scale with more CPUs. An Interlocked access uses a global hardware lock (per memory address, but there is only one address here).
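On .NET Framework 4.5 and later, System.Threading.Volatile offers a generic Volatile.Read<T>(ref T) constrained to reference types, which covers the case that Thread.VolatileRead could not. The sketch below puts it next to the CompareExchange idiom from the question; the Node type and the SnapshotReads helper names are hypothetical stand-ins for the type of head.

```csharp
using System.Threading;

// Hypothetical node type standing in for the type of 'head'.
public sealed class Node
{
    public readonly int Value;
    public readonly Node Next;
    public Node(int value, Node next) { Value = value; Next = next; }
}

public static class SnapshotReads
{
    // Idiom from the question: an atomic read-modify-write that never changes head.
    public static Node ViaCompareExchange(ref Node head) =>
        Interlocked.CompareExchange(ref head, null, null);

    // .NET 4.5+ generic volatile read: a single read with acquire semantics.
    public static Node ViaVolatileRead(ref Node head) => Volatile.Read(ref head);
}
```

Volatile.Read has acquire semantics only, so it avoids the cost of an interlocked operation discussed above while still guaranteeing a single, real read of head.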

Related

.NET Volatile.Read/Write and Interlocked scope

I have read the threading manual and the relevant MSDN pages and SO questions several times. Still, I do not completely understand whether Volatile.Read/Write and interlocked operations apply only to the relevant variables, or to all reads/writes before/after those operations.
E.g., imagine I have an array and a counter.
long counter = 0;
var values = new double[1000000];
values[42] = 3.1415;
// Is this line needed instead of simple assignment above,
// or the implicit full-fence of Interlocked will guarantee that
// all threads will see the values[42] after interlocked increment?
//Volatile.Write(ref values[42], 3.1415);
Interlocked.Increment(ref counter);
Does an interlocked increment guarantee the same result as if I used Volatile.Write(ref values[42], 3.1415); instead of values[42] = 3.1415;?
What if I have an array of reference types, e.g. some POCO, and set instance fields before the interlocked increment? Does the implicit full fence apply to all reads/writes from that thread before it, or only to the counter?
I am implementing a scalable reader/writer scheme and I found the following statement in the Joe Duffy post:
If the variables protected are references to heap objects, you need to worry about using the read protection each time you touch a field. Just like locks, this technique doesn’t compose. As with anything other than simple locking, use this technique with great care and caution; although the built-in acquire and release fences shield you from memory model reordering issues, there are some easy traps you can fall into.
Is this just a general statement to discourage using low-lock constructs, or somehow applies to the example above?

What you are probably missing is an understanding of fences. This is the best resource to read up on them: http://www.albahari.com/threading/part4.aspx
The short answer is Interlocked.Increment issues a full fence which is independent of the variable it is updating. I believe Volatile.Write issues a half fence. A half fence can be constructed from Thread.MemoryBarrier. When we say Interlocked.Increment issues a full fence it means that Thread.MemoryBarrier is called before and after the operation. Volatile.Write calls Thread.MemoryBarrier before the write and Volatile.Read after. The fences determine when memory access can be reordered (and it's not variable specific as Thread.MemoryBarrier is parameterless).
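A sketch of the question's scenario under the full-fence reading described in this answer: plain writes to the array, an Interlocked.Increment to publish, and a Volatile.Read of the counter on the reader side. The Publisher class, its single-writer assumption, and the int counter (instead of the question's long) are illustrative choices. The reader's acquire read matters as much as the writer's fence: without it, the reader's read of the slot could be reordered before its read of the counter.

```csharp
using System.Threading;

// Hypothetical single-writer publisher: slots are filled in order 0, 1, 2, ...
public sealed class Publisher
{
    private readonly double[] values;
    private int published; // number of slots that readers may safely read

    public Publisher(int capacity) { values = new double[capacity]; }

    // Called by the single writer thread only.
    public void Publish(double value)
    {
        int index = published;                // only this thread writes 'published'
        values[index] = value;                // plain write to the array element
        Interlocked.Increment(ref published); // full fence: the element write cannot move after it
    }

    public bool TryRead(int index, out double value)
    {
        // Acquire read: the element read below cannot be moved before it.
        if (index < Volatile.Read(ref published))
        {
            value = values[index];
            return true;
        }
        value = 0;
        return false;
    }
}
```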

c# concurrency of struct array

Given an array of struct:
public struct Instrument
{
public double NoS;
public double Last;
}
var a1 = new Instrument[100];
And a threading task pool that is writing to those elements on the basis that a single element may at most be written to by two threads concurrently, one for each of the double fields (there is upstream queuing by topic effectively).
And the knowledge that doubles can be written atomically on 64-bit. (Edit: this mistakenly said 32-bit originally.)
I need to periodically perform a calculation using all the values in the array and I'd like them to be consistent during the calc.
So I can snapshot the array with:
var snapshot = (Instrument[])a1.Clone();
Now the question I have is with regard to the specifics of the synchronisation. If I make the members volatile, I don't think that is going to help the clone at all, as the read/write acquires/releases are not at the array level.
Now I could have an array lock, but this will add a lot of contention on the most frequent process of writing data into the array. So not ideal.
Alternatively I could have a per-row lock, but that would be a real pain, as they'd all need to be acquired prior to the clone, while meanwhile the writes all back up.
Now it doesn't really matter if the snapshot doesn't have the very latest value if it's a matter of microseconds etc., so I think I could probably get away with just having no lock. My only concern is whether there could be a scenario where there isn't a cache writeback for a sustained period. Is this something I should worry about? The writers are in TPL Dataflow and the sole logic is to set the two fields in the struct. I don't really know how or if function scope tends to correlate to cache writebacks, though.
Thoughts/advice?
edit: What about if I used an interlocked write to the variables in the struct?
edit2: The volume of writes is MUCH higher than the reads. There are also two separate and concurrent services writing to the NoS & Last fields, so both fields of one element could be written at the same time. This causes problems with a reference object approach for atomicity.
edit3: Further detail. Assume array is from 30-1000 elements and each element could be being updated multiple times a second.

Since Instrument contains two doubles (two 64-bit values), you can't write it atomically (even on 64-bit machines). This means that the Clone method can never make a thread-safe copy without doing some kind of synchronization.
TLDR; Don't use a struct, use an immutable class.
You would probably have more luck with a small redesign. Try using immutable data structures and concurrent collections from the .NET framework. For instance, make your Instrument an immutable class:
// Important: Note that Instrument is now a CLASS!!
public class Instrument
{
public Instrument(double nos, double last)
{
this.NoS = nos;
this.Last = last;
}
// NOTE: Private setters. Class can't be changed
// after initialization.
public double NoS { get; private set; }
public double Last { get; private set; }
}
This way updating an Instrument means you have to create a new one, which makes it much easier to reason about this. When you are sure that only one thread is working with a single Instrument you are done, since a worker can now safely do this:
Instrument old = a1[5];
var newValue = new Instrument(old.NoS + 1, old.Last - 10);
a1[5] = newValue;
Since references are 32-bit (or 64-bit on a 64-bit machine), updating a reference is guaranteed to be atomic. The clone will now always result in a correct copy (it might lag behind, but that doesn't seem to be a problem for you).
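A minimal sketch of the immutable-class approach, assuming (as the answer does) that only one thread updates a given slot at a time. Instrument here uses get-only auto-properties, which behave like the private setters above; InstrumentTable and its Update/Snapshot helpers are hypothetical.

```csharp
using System;

// Immutable Instrument: once constructed, its fields never change.
public sealed class Instrument
{
    public Instrument(double nos, double last) { NoS = nos; Last = last; }
    public double NoS { get; }
    public double Last { get; }
}

public static class InstrumentTable
{
    // Replaces slot i with a new instance. Assumes a single writer per slot;
    // the reference write itself is atomic, so readers see old or new, never a mix.
    public static void Update(Instrument[] table, int i, Func<Instrument, Instrument> change)
    {
        table[i] = change(table[i]);
    }

    // Array.Clone copies the references; each copied instance is internally consistent.
    public static Instrument[] Snapshot(Instrument[] table) => (Instrument[])table.Clone();
}
```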
UPDATE
After re-reading your question, I see that I misread it, since one thread is not writing to an Instrument, but is writing to an instrument value, but the solution is practically the same: use immutable reference types. One simple trick for instance, is to change the backing fields of the NoS and Last properties to objects. This makes updating them atomic:
// Instrument can be a struct again.
public struct Instrument
{
private object nos;
private object last;
public double NoS
{
get { return (double)(this.nos ?? 0d); }
set { this.nos = value; }
}
public double Last
{
get { return (double)(this.last ?? 0d); }
set { this.last = value; }
}
}
When changing one of the properties, the value will be boxed, and boxed values are immutable reference types. This way you can safely update those properties.

And the knowledge that double's can be written atomically on 32 bit.
No, that is not guaranteed:
12.5 Atomicity of variable references
Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list shall also be atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, need not be atomic.
(emphasis mine)
No guarantee is made regarding doubles on 32-bit, or even on 64-bit. A struct composed of two doubles is even more problematic. You should rethink your strategy.

You could (ab)use a ReaderWriterLockSlim.
Take a read lock when writing (since you say there is no contention between writers).
And take a write lock when cloning.
Not sure I'd do this though unless there's really no alternative. Could be confusing for whoever maintains this down the line.

Reads and writes of individual array elements, or individual struct fields, are generally independent. If while one thread is writing a particular field of a particular struct instance, no other thread will attempt to access that same field, an array of structs will be implicitly threadsafe without any locking required beyond the logic that enforces the above conditions.
If it is possible that one thread might try to read a double while another thread is writing it, but it's not possible that two threads might try to write simultaneously, there are a number of approaches you can take to ensure that a read won't see a partially-written value. One which hasn't been mentioned yet would be to define an int64 field, and use custom methods to read and write double values there (bitwise-converting them, and using Interlocked as needed).
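A minimal sketch of the Int64-backed double described above, using BitConverter.DoubleToInt64Bits/Int64BitsToDouble for the bitwise conversion and Interlocked for the 64-bit accesses. AtomicDouble is a hypothetical name; the Add method shows a CompareExchange retry loop for read-modify-write updates.

```csharp
using System;
using System.Threading;

// Hypothetical double cell stored as raw IEEE 754 bits in a long,
// so Interlocked's 64-bit operations keep reads and writes untorn.
public struct AtomicDouble
{
    private long bits; // default 0 bits == 0.0

    public double Read() =>
        BitConverter.Int64BitsToDouble(Interlocked.Read(ref bits));

    public void Write(double value) =>
        Interlocked.Exchange(ref bits, BitConverter.DoubleToInt64Bits(value));

    // Atomically adds delta, retrying if another thread changed the value meanwhile.
    public double Add(double delta)
    {
        while (true)
        {
            long oldBits = Interlocked.Read(ref bits);
            double newValue = BitConverter.Int64BitsToDouble(oldBits) + delta;
            long newBits = BitConverter.DoubleToInt64Bits(newValue);
            if (Interlocked.CompareExchange(ref bits, newBits, oldBits) == oldBits) return newValue;
        }
    }
}
```

Because AtomicDouble is a struct, it can live directly in an array; calling methods on an array element (cells[i].Write(x)) operates on the element in place.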
Another approach would be to have a changeCount variable for each array slot. Before anything else, the writer increments it so that its two least-significant bits are "10", then writes the struct, and afterward adds 2 to it with an Interlocked operation (see note below). Before code reads the struct, it should check whether a write is in progress. If not, it should perform the read and then ensure that a write hasn't started or completed in the meantime (if a write occurred after the read was started, loop back to the beginning). If a write is in progress when code wants to read, it should acquire a shared lock, check whether the write is still in progress, and if so use an interlocked operation to set the LSB of changeCount and Monitor.Wait on the lock. The code which wrote the struct should notice from the result of its final Interlocked operation that the LSB got set, and should Pulse the lock. If the memory model ensures that reads by a single thread will be processed in order, and that writes by a single thread will be processed in order, and if only one thread will ever try to write an array slot at a time, this approach should limit the multi-processor overhead to a single Interlocked operation in the non-contention case. Note that one must carefully study the rules about what is or is not implied by the memory model before using this sort of code, since it can be tricky.
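A simplified sketch of the change-counter idea, often called a sequence lock: an odd counter means a write is in progress, and readers retry if the counter changed while they were reading. It deliberately omits the Monitor.Wait/Pulse fallback and the low-bit signalling described above (readers just spin with SpinWait), counts by 1 instead of using the two-bit encoding, and assumes at most one writer per slot. VersionedInstrument is a hypothetical name.

```csharp
using System.Threading;

// Hypothetical per-slot struct guarded by a version counter.
// Assumes at most one writer per slot at a time; any number of readers.
public struct VersionedInstrument
{
    private int version; // even: stable, odd: write in progress
    private double nos;
    private double last;

    public void Write(double newNoS, double newLast)
    {
        Interlocked.Increment(ref version); // now odd; full fence keeps the field writes after it
        Volatile.Write(ref nos, newNoS);
        Volatile.Write(ref last, newLast);
        Interlocked.Increment(ref version); // even again; full fence keeps the field writes before it
    }

    public (double NoS, double Last) Read()
    {
        var spinner = new SpinWait();
        while (true)
        {
            int before = Volatile.Read(ref version);
            if ((before & 1) == 0)
            {
                double n = Volatile.Read(ref nos);
                double l = Volatile.Read(ref last);
                Interlocked.MemoryBarrier(); // keep the field reads before the re-check
                if (Volatile.Read(ref version) == before) return (n, l);
            }
            spinner.SpinOnce(); // write in progress or raced with one: back off and retry
        }
    }
}
```

Readers never block the writer here; the cost moves to readers, which may retry under heavy write traffic.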
BTW, there are two more approaches one could take if one wanted to have each array element be a class type rather than a struct:
Use an immutable class type, and use `Interlocked.CompareExchange` any time you want to update an element. The pattern to use is this:
MyClass oldVal, newVal;
do
{
oldVal = theArray[subscript];
newVal = new MyClass(oldVal.This, oldVal.That + 5); // Or whatever change; This/That are placeholder members
} while (System.Threading.Interlocked.CompareExchange(ref theArray[subscript], newVal, oldVal) != oldVal);
This approach will always yield a logically-correct atomic update of the array element. If, between the time the array element is read and the time it is updated, something else changes the value, the `CompareExchange` will leave the array element unaffected, and the code will loop back and try again. This approach works reasonably well in the absence of contention, though every update will require generating a new object instance. If many threads are trying to update the same array slot, however, and the constructor for `MyClass` takes any significant amount of time to execute, it's possible for code to thrash, repeatedly creating new objects and then finding out they're obsolete by the time they could be stored. Code will always make forward progress, but not necessarily quickly.
Use a mutable class, and lock on the class objects any time one wishes to read or write them. This approach would avoid having to create new class object instances any time something is changed, but locking would add some overhead of its own. Note that both reads and writes would have to be locked, whereas the immutable-class approach only required `Interlocked` methods to be used on writes.
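A minimal sketch of the mutable-class alternative, with reads and writes taking the same lock so a reader always sees both fields from one consistent state. LockedInstrument and its members are hypothetical.

```csharp
// Hypothetical mutable instrument guarded by a private lock object.
public sealed class LockedInstrument
{
    private readonly object gate = new object();
    private double nos;
    private double last;

    public void SetNoS(double value) { lock (gate) nos = value; }
    public void SetLast(double value) { lock (gate) last = value; }

    // Reads take the same lock, so both fields come from one consistent state.
    public (double NoS, double Last) Get()
    {
        lock (gate) return (nos, last);
    }
}
```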
I tend to think arrays of structs are nicer data holders than arrays of class objects, but both approaches have advantages.

Ok, so had a think about this over lunch.
I see two, possibly 3 solutions here.
First important note: The immutable idea does not work in my use case because I have two services running in parallel writing to NoS and Last independently. This means that I would need an extra layer of sync logic between those two services to ensure that whilst the new ref is being created by one service, the other one is not doing the same. Classic race condition problem, so definitely not right for this problem (although yes, I could have a ref for each double and do it that way, but it's getting ridiculous at that point).
Solution 1
Whole cache level lock. Maybe use a spinlock and just lock for all updates and the snapshot (with memcpy). This is simplest and probably totally fine for volumes I'm talking about.
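Solution 1 might look roughly like this. It is a sketch under assumptions: one SpinLock guards the whole array, Array.Copy stands in for the memcpy, and the InstrumentData/InstrumentCache types and their methods are hypothetical. The SpinLock field must not be readonly, because SpinLock is a mutable struct.

```csharp
using System;
using System.Threading;

public struct InstrumentData { public double NoS; public double Last; }

// Hypothetical cache where one SpinLock protects every update and the snapshot.
public sealed class InstrumentCache
{
    private readonly InstrumentData[] data;
    private SpinLock spinLock = new SpinLock(false); // not readonly: SpinLock is a mutable struct

    public InstrumentCache(int size) { data = new InstrumentData[size]; }

    public void SetNoS(int i, double value)
    {
        bool taken = false;
        try { spinLock.Enter(ref taken); data[i].NoS = value; }
        finally { if (taken) spinLock.Exit(); }
    }

    public void SetLast(int i, double value)
    {
        bool taken = false;
        try { spinLock.Enter(ref taken); data[i].Last = value; }
        finally { if (taken) spinLock.Exit(); }
    }

    // Copies the whole array while holding the lock, so every row is consistent.
    public InstrumentData[] Snapshot()
    {
        var copy = new InstrumentData[data.Length];
        bool taken = false;
        try { spinLock.Enter(ref taken); Array.Copy(data, copy, data.Length); }
        finally { if (taken) spinLock.Exit(); }
        return copy;
    }
}
```

Holding the lock only for a single field write or one Array.Copy keeps the critical sections short, which is the situation where spinning is usually cheaper than blocking.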
Solution 2
Make all writes to doubles use interlocked writes. When I want a snapshot, iterate the array and read each value with an interlocked read to populate the copy. This may cause per-struct tearing, but the doubles are intact, which is fine, as this is continuously updating data, so the concept of "latest" is a little abstract.
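Solution 2 could be sketched as below, using the double overload of Interlocked.Exchange for writes and Interlocked.CompareExchange(ref x, 0d, 0d) as an atomic 64-bit read (it only ever writes 0 over an existing 0). The InterlockedInstruments helper names are hypothetical. Each double is read without tearing, but a row's NoS and Last may come from different moments, which is the per-struct tearing the author says is acceptable.

```csharp
using System.Threading;

public struct Instrument { public double NoS; public double Last; }

// Hypothetical helpers: every access to a double goes through Interlocked.
public static class InterlockedInstruments
{
    public static void SetNoS(Instrument[] a, int i, double v) => Interlocked.Exchange(ref a[i].NoS, v);

    public static void SetLast(Instrument[] a, int i, double v) => Interlocked.Exchange(ref a[i].Last, v);

    // Atomic read of one double: CompareExchange returns the current value and
    // leaves it unchanged (it only writes 0 when the value is already 0).
    private static double ReadAtomic(ref double x) => Interlocked.CompareExchange(ref x, 0d, 0d);

    // Each double is untorn; NoS and Last of one row may come from different moments.
    public static Instrument[] Snapshot(Instrument[] a)
    {
        var copy = new Instrument[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            copy[i].NoS = ReadAtomic(ref a[i].NoS);
            copy[i].Last = ReadAtomic(ref a[i].Last);
        }
        return copy;
    }
}
```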
Solution 3
I don't think this will work, but what about interlocked writes to all doubles, and then just using memcpy? I am not sure whether I would get tearing of the doubles, though (remember I don't care about tearing at struct level).
If solution 3 works then I guess it's the best-performing option, but otherwise I am more inclined toward solution 1.

How does this MSDN CompareExchange sample not need a volatile read?

I was looking for a thread-safe counter implementation using Interlocked that supported incrementing by arbitrary values, and found this sample straight from the Interlocked.CompareExchange documentation (slightly changed for simplicity):
private int totalValue = 0;
public int AddToTotal(int addend)
{
int initialValue, computedValue;
do
{
// How can we get away with not using a volatile read of totalValue here?
// Shouldn't we use CompareExchange(ref TotalValue, 0, 0)
// or Thread.VolatileRead
// or declare totalValue to be volatile?
initialValue = totalValue;
computedValue = initialValue + addend;
} while (initialValue != Interlocked.CompareExchange(
ref totalValue, computedValue, initialValue));
return computedValue;
}
public int Total
{
// This looks *really* dodgy too, but isn't
// the target of my question.
get { return totalValue; }
}
I get what this code is trying to do, but I'm not sure how it can get away with not using a volatile read of the shared variable when assigning to the temporary variable that is added to.
Is there a chance that initialValue will hold a stale value throughout the loop, making the function never return? Or does the memory-barrier (?) in CompareExchange eliminate any such possibility? Any insight would be appreciated.
EDIT: I should clarify that I understand that if CompareExchange caused the subsequent read of totalValue to be up to date as of the last CompareExchange call, then this code would be fine. But is that guaranteed?

If we read a stale value, then the CompareExchange won't perform the exchange - we're basically saying, "Only do the operation if the value really is the one we've based our calculation on." So long as at some point we get the right value, it's fine. It would be a problem if we kept reading the same stale value forever, so CompareExchange never passed the check, but I strongly suspect that the CompareExchange memory barriers mean that at least after one time through the loop, we'll read an up-to-date value. The worst that could happen would be cycling forever though - the important point is that we can't possibly update the variable in an incorrect way.
(And yes, I think you're right that the Total property is dodgy.)
EDIT: To put it another way:
CompareExchange(ref totalValue, computedValue, initialValue)
means: "If the current state really was initialValue, then my calculations are valid and you should set it to computedValue."
The current state could be wrong for at least two reasons:
The initialValue = totalValue; assignment used a stale read with a different old value
Something changed totalValue after that assignment
We don't need to handle those situations differently at all - so it's fine to do a "cheap" read so long as at some point we'll start seeing up-to-date values... and I believe the memory barriers involved in CompareExchange will ensure that as we loop round, the stale value we see is only ever as stale as the previous CompareExchange call.
EDIT: To clarify, I think the sample is correct if and only if CompareExchange constitutes a memory barrier with respect to totalValue. If it doesn't - if we can still read arbitrarily-old values of totalValue when we keep going round the loop - then the code is indeed broken, and may never terminate.
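A sketch of the sample with one change that goes beyond the MSDN code (an assumption, not part of the original sample): both the initial read and the Total getter use Volatile.Read instead of a plain field read. RunningTotal is a hypothetical wrapper. The loop's correctness argument is the one given above: a stale initialValue only makes CompareExchange fail and retry.

```csharp
using System.Threading;

// Hypothetical wrapper around the AddToTotal loop from the question.
public sealed class RunningTotal
{
    private int totalValue;

    public int AddToTotal(int addend)
    {
        int initialValue, computedValue;
        do
        {
            // A stale value here only makes the CompareExchange below fail and retry.
            initialValue = Volatile.Read(ref totalValue);
            computedValue = initialValue + addend;
        } while (initialValue != Interlocked.CompareExchange(ref totalValue, computedValue, initialValue));
        return computedValue;
    }

    public int Total => Volatile.Read(ref totalValue);
}
```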

Edit:
Someone gave me an upvote after all this time so I re-read the question and the answer and noticed a problem.
I either didn't know about introduced reads or it hasn't crossed my mind. Assuming Interlocked.CompareExchange doesn't introduce any barriers (since it's not documented anywhere), the compiler is allowed to transform your AddToTotal method into the following broken version, where the last two arguments to Interlocked.CompareExchange could see different totalValue values!
public int AddToTotal(int addend)
{
int initialValue;
do
{
initialValue = totalValue;
} while (initialValue != Interlocked.CompareExchange(
ref totalValue, totalValue + addend, totalValue));
return initialValue + addend;
}
For this reason, you can use Volatile.Read. On x86, Volatile.Read is just a standard read anyway (it just prevents compiler reorderings) so there's no reason not to do it. Then the worst that the compiler should be able to do is:
public int AddToTotal(int addend)
{
int initialValue;
do
{
initialValue = Volatile.Read (ref totalValue);
} while (initialValue != Interlocked.CompareExchange(
ref totalValue, initialValue + addend, initialValue));
return initialValue + addend;
}
Unfortunately, Eric Lippert once claimed volatile read doesn't guarantee protection against introduced reads. I seriously hope he's wrong because that would mean lots of low-lock code is almost impossible to write correctly in C#. He himself did mention somewhere that he doesn't consider himself an expert on low-level synchronization so I just assume his statement was incorrect and hope for the best.
Original answer:
Contrary to popular misconception, acquire/release semantics don't ensure a new value gets grabbed from the shared memory, they only affect the order of other memory operations around the one with acquire/release semantics. Every memory access must be at least as recent as the last acquire read and at most as stale as the next release write. (Similar for memory barriers.)
In this code, you only have a single shared variable to worry about: totalValue. The fact that CompareExchange is an atomic RMW operation is enough to ensure that the variable it operates on will get updated. This is because atomic RMW operations must ensure all processors agree on what the most recent value of the variable is.
Regarding the other Total property you mentioned, whether it's correct or not depends on what is required of it. Some points:
int is guaranteed to be atomic, so you will always get a valid value (in this sense, the code you've shown could be viewed as "correct", if nothing but some valid, possibly stale value is required)
reading without acquire semantics (Volatile.Read or a read of a volatile int) means that all memory operations written after it may actually happen before it (reads operating on older values, and writes becoming visible to other processors before they should)
if not using an atomic RMW operation to read (like Interlocked.CompareExchange(ref x, 0, 0)), a value received may not be what some other processors see as the most recent value
if both the freshest value and ordering in regards to other memory operations is required, Interlocked.CompareExchange should work (the underlying WinAPI's InterlockedCompareExchange uses a full barrier, not so sure about C# or .Net specifications) but if you wish to be sure, you could add an explicit memory barrier after the read

The managed Interlocked.CompareExchange maps directly to the InterlockedCompareExchange in the Win32 API (there is also a 64 bit version).
As you can see in the function signatures, the native API requires the destination to be volatile and, even though it is not required by the managed API, using volatile is recommended by Joe Duffy in his excellent book Concurrent Programming on Windows.

Thread.VolatileRead Implementation

I'm looking at the implementation of the VolatileRead/VolatileWrite methods (using Reflector), and I'm puzzled by something.
This is the implementation for VolatileRead:
[MethodImpl(MethodImplOptions.NoInlining)]
public static int VolatileRead(ref int address)
{
int num = address;
MemoryBarrier();
return num;
}
How come the memory barrier is placed after reading the value of "address"? Isn't it supposed to be the opposite (placed before reading the value, so that any pending writes to "address" are completed by the time we make the actual read)?
The same goes for VolatileWrite, where the memory barrier is placed before the assignment of the value. Why is that?
Also, why do these methods have the NoInlining attribute? What could happen if they were inlined?

I thought that until recently. Volatile reads aren't what you think they are - they're not about guaranteeing that they get the most recent value; they're about making sure that no read which is later in the program code is moved to before this read. That's what the spec guarantees - and likewise for volatile writes, it guarantees that no earlier write is moved to after the volatile one.
You're not alone in suspecting this code, but Joe Duffy explains it better than I can :)
My answer to this is to give up on lock-free coding other than by using things like PFX which are designed to insulate me from it. The memory model is just too hard for me - I'll leave it to the experts, and stick with things that I know are safe.
One day I'll update my threading article to reflect this, but I think I need to be able to discuss it more sensibly first...
(I don't know about the no-inlining part, btw. I suspect that inlining could introduce some other optimizations which aren't meant to happen around volatile reads/writes, but I could easily be wrong...)

Maybe I am oversimplifying, but I think the explanations about reordering and cache coherency and so on give too much details.
So, why the MemoryBarrier comes after the actual read?
I will try to explain this with an example that uses object instead of int.
One may think the correct is:
Thread 1 creates the object (initializes its inner data).
Thread 1 then puts the object into a variable.
Then it "does a fence" and all threads see the new value.
Then, the read is something like this:
Thread 2 "does a fence".
Thread 2 reads the object instance.
Thread 2 is sure that it has all the inner data of that instance (as it started with a fence).
The biggest problem with this is:
Thread 1 creates the object and initializes it.
Thread 1 then puts the object into a variable.
Before the Thread flushes the cache, the CPU itself flushes part of the cache... it commits only the address of the variable (not the contents of that variable).
At that moment, Thread 2 had already flushed its cache. So it is going to read everything from the main memory.
So, it reads the variable (it is there).
Then it reads the content (it is not there).
Finally, after all this, the CPU 1 executes the Thread 1 that does the fence.
So, what happens with the volatile write and read?
The volatile write makes the contents of the object go to memory immediately (it starts with the fence), and then it sets the variable (which may not go to the real memory immediately).
Then, the volatile read will first clear the cache. Then it reads the field. If it receives a value when reading the field, it is certain that the contents pointed by that reference are really there.
Because of those little details, yes, it is possible that you do a VolatileWrite(1) and another thread still sees the value zero. But as soon as other threads see the value 1 (using a volatile read), everything else that may be referenced is already there. You can't really observe the difference, because when you read the old value (0 or null) you simply don't proceed, knowing that you don't have everything you need yet.
I already saw some discussions that, even if that flushes the caches twice, the right pattern will be:
MemoryBarrier - will flush other variables changed before this call
Write
MemoryBarrier - will guarantee that the write was flushed
The Read will then need the same:
MemoryBarrier
Read - Guarantees that we see the latest info... maybe one that was put AFTER our memory barrier.
As something may have appeared after our MemoryBarrier and was already read, we must put another MemoryBarrier to access the contents.
Those could be two Write-Fences or two Read-Fences if that existed in .Net.
I am not sure on everything I said... that is a "compilation" of many information I got and it really explains why the VolatileRead and VolatileWrite appear to be reversed, but it also guarantees that no invalid values are read when using them.
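The publish pattern described in this answer, expressed with the Volatile class (available since .NET Framework 4.5) rather than explicit MemoryBarrier calls. Config and Publication are hypothetical names. The writer fully initializes the object before the release write; a reader that sees the non-null reference through an acquire read also sees its fields. As noted above, a reader may still see null for a while, and simply does not proceed in that case.

```csharp
using System.Threading;

// Hypothetical object that is built by one thread and read by others.
public sealed class Config
{
    public int Port;
    public string Host;
}

public static class Publication
{
    private static Config current;

    public static void Publish(int port, string host)
    {
        var c = new Config { Port = port, Host = host }; // fully initialize first
        Volatile.Write(ref current, c); // release: the field writes above cannot move after this
    }

    // Acquire: reads of the returned object's fields cannot move before this read.
    // Returns null if nothing has been published (or is visible) yet.
    public static Config TryGet() => Volatile.Read(ref current);
}
```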

Is accessing a variable in C# an atomic operation?

I've been raised to believe that if multiple threads can access a variable, then all reads from and writes to that variable must be protected by synchronization code, such as a "lock" statement, because the processor might switch to another thread halfway through a write.
However, I was looking through System.Web.Security.Membership using Reflector and found code like this:
public static class Membership
{
private static bool s_Initialized = false;
private static object s_lock = new object();
private static MembershipProvider s_Provider;
public static MembershipProvider Provider
{
get
{
Initialize();
return s_Provider;
}
}
private static void Initialize()
{
if (s_Initialized)
return;
lock(s_lock)
{
if (s_Initialized)
return;
// Perform initialization...
s_Initialized = true;
}
}
}
Why is the s_Initialized field read outside of the lock? Couldn't another thread be trying to write to it at the same time? Are reads and writes of variables atomic?

For the definitive answer go to the spec. :)
Partition I, Section 12.6.6 of the CLI spec states: "A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size is atomic when all the write accesses to a location are the same size."
So that confirms that s_Initialized will never be in an unstable state, and that reads and writes to primitive types of 32 bits or smaller are atomic.
In particular, double and long (Int64 and UInt64) are not guaranteed to be atomic on a 32-bit platform. You can use the methods on the Interlocked class to protect these.
Additionally, while reads and writes are atomic, there is a race condition with addition, subtraction, incrementing, and decrementing of primitive types, since they must be read, operated on, and rewritten. The Interlocked class allows you to protect these using the CompareExchange and Increment methods.
Interlocking creates a memory barrier to prevent the processor from reordering reads and writes. The lock creates the only required barrier in this example.
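For the 64-bit case mentioned above, Interlocked.Read, Interlocked.Add, and Interlocked.Increment all have long overloads that stay atomic even in a 32-bit process. A minimal sketch; Int64Counter is a hypothetical name.

```csharp
using System.Threading;

// Hypothetical 64-bit counter that is safe to share across threads.
public sealed class Int64Counter
{
    private long value;

    public long Add(long delta) => Interlocked.Add(ref value, delta); // atomic read-modify-write
    public long Increment() => Interlocked.Increment(ref value);
    public long Read() => Interlocked.Read(ref value); // untorn even in a 32-bit process
}
```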

This is a (bad) form of the double check locking pattern which is not thread safe in C#!
There is one big problem in this code:
s_Initialized is not volatile. That means that writes in the initialization code can move after s_Initialized is set to true, and other threads can see uninitialized data even if s_Initialized is true for them. This doesn't apply to Microsoft's implementation of the Framework, because there every write is a volatile write.
But also in Microsoft's implementation, reads of the uninitialized data can be reordered (i.e. prefetched by the cpu), so if s_Initialized is true, reading the data that should be initialized can result in reading old, uninitialized data because of cache-hits (ie. the reads are reordered).
For example:
Thread 1 reads s_Provider (which is null)
Thread 2 initializes the data
Thread 2 sets s_Initialized to true
Thread 1 reads s_Initialized (which is true now)
Thread 1 uses the previously read Provider and gets a NullReferenceException
Moving the read of s_Provider before the read of s_Initialized is perfectly legal because there is no volatile read anywhere.
If s_Initialized were volatile, the read of s_Provider would not be allowed to move before the read of s_Initialized, and the initialization of the Provider would not be allowed to move after s_Initialized is set to true, so everything would be OK.
Joe Duffy also wrote an Article about this problem: Broken variants on double-checked locking
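A sketch of the fix this answer describes, with s_Initialized declared volatile so that the first check is an acquire read and setting the flag is a release write. The ProviderHolder name, the object-typed provider, and the InitCount diagnostic counter are hypothetical. In practice, Lazy<T> (whose default mode is LazyThreadSafetyMode.ExecutionAndPublication) packages the same guarantee.

```csharp
using System.Threading;

// Hypothetical stand-in for the Membership class from the question.
public static class ProviderHolder
{
    private static volatile bool s_Initialized;
    private static readonly object s_lock = new object();
    private static object s_Provider;
    public static int InitCount; // diagnostic only: how many times initialization ran

    public static object Provider
    {
        get { Initialize(); return s_Provider; }
    }

    private static void Initialize()
    {
        if (s_Initialized) return;       // volatile read (acquire): s_Provider is read after it
        lock (s_lock)
        {
            if (s_Initialized) return;   // another thread finished while we waited
            s_Provider = new object();   // perform initialization...
            Interlocked.Increment(ref InitCount);
            s_Initialized = true;        // volatile write (release): happens after the assignment above
        }
    }
}
```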

Hang about -- the question that is in the title is definitely not the real question that Rory is asking.
The titular question has the simple answer of "No", but this is no help at all when you see the real question, which I don't think anyone has given a simple answer to.
The real question Rory asks is presented much later and is more pertinent to the example he gives.
Why is the s_Initialized field read outside of the lock?
The answer to this is also simple, though completely unrelated to the atomicity of variable access.
The s_Initialized field is read outside of the lock because locks are expensive.
Since the s_Initialized field is essentially "write once" it will never return a false positive.
It's economical to read it outside the lock.
This is a low cost activity with a high chance of having a benefit.
That's why it's read outside of the lock: to avoid paying the cost of a lock unless it's needed.
If locks were cheap the code would be simpler, and omit that first check.
(Edit: nice response from Rory follows. Yes, boolean reads are very much atomic. If someone built a processor with non-atomic boolean reads, it would be featured on The Daily WTF.)

The correct answer seems to be, "Yes, mostly."
John's answer referencing the CLI spec indicates that accesses to variables not larger than 32 bits on a 32-bit processor are atomic.
Further confirmation from the C# spec, section 5.5, Atomicity of variable references:
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, are not guaranteed to be atomic.
The code in my example was paraphrased from the Membership class, as written by the ASP.NET team themselves, so it was always safe to assume that the way it accesses the s_Initialized field is correct. Now we know why.
Edit: As Thomas Danecker points out, even though the access of the field is atomic, s_Initialized should really be marked volatile to make sure that the locking isn't broken by the processor reordering the reads and writes.
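To illustrate the "not guaranteed to be atomic" part of the quoted spec, here is a sketch, assuming a hypothetical SharedLong holder, of how a 64-bit value can be accessed atomically with the Interlocked class. This addresses atomicity only; the ordering problem raised in the edit above is a separate concern.

```csharp
using System.Threading;

// Hypothetical holder for a 64-bit value shared between threads.
// Per the spec text above, plain reads and writes of a long are not
// guaranteed to be atomic (in a 32-bit process they may be split into two
// 32-bit accesses), so a reader could see half of an old value and half of
// a new one. The Interlocked methods make each individual access atomic.
public static class SharedLong
{
    private static long s_value;

    public static void Set(long value) => Interlocked.Exchange(ref s_value, value);

    // Interlocked.Read exists for 64-bit types; int, bool and references
    // are already atomic to read according to the spec quoted above.
    public static long Get() => Interlocked.Read(ref s_value);

    public static long Increment() => Interlocked.Increment(ref s_value);
}
```

Incrementing from uint.MaxValue carries into the upper 32 bits, which is exactly the kind of update a torn read could observe half-done.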

The Initialize function is faulty. It should look more like this:
private static void Initialize()
{
    if (s_Initialized)
        return;

    lock (s_lock)
    {
        if (s_Initialized)
            return;

        // Perform initialization here

        s_Initialized = true;
    }
}
Without the second check inside the lock it's possible the initialisation code will be executed twice. So the first check is for performance to save you taking a lock unnecessarily, and the second check is for the case where a thread is executing the initialisation code but hasn't yet set the s_Initialized flag and so a second thread would pass the first check and be waiting at the lock.
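The race described above can be exercised with a small harness. This is a sketch with hypothetical names (InitOnceDemo, RaceMany, InitCount); it counts how many times the initialization body runs while many threads call Initialize at once, and the Thread.Sleep only widens the window in which other threads pile up at the lock.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical harness: counts how often the initialization body runs
// when many threads race through the double-checked Initialize().
public static class InitOnceDemo
{
    private static readonly object s_lock = new object();
    private static volatile bool s_Initialized;
    private static int s_initCount;

    public static int InitCount => Volatile.Read(ref s_initCount);

    public static void Initialize()
    {
        if (s_Initialized)
            return;

        lock (s_lock)
        {
            // Without this second check, every thread that passed the first
            // check before the flag was set would re-run the body below.
            if (s_Initialized)
                return;

            Interlocked.Increment(ref s_initCount);
            Thread.Sleep(10);        // widen the race window for the demo
            s_Initialized = true;
        }
    }

    public static void RaceMany(int callers) =>
        Parallel.For(0, callers, _ => Initialize());
}
```

Removing the inner `if (s_Initialized) return;` would let InitCount exceed 1 whenever several threads get past the first check together.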

Reads and writes of variables are not atomic. You need to use synchronisation APIs to emulate atomic reads/writes.
For an awesome reference on this and many more issues to do with concurrency, make sure you grab a copy of Joe Duffy's latest spectacle. It's a ripper!

"Is accessing a variable in C# an atomic operation?"
Nope. And it's not a C# thing, nor is it even a .NET thing; it's a processor thing.
OJ is spot on that Joe Duffy is the guy to go to for this kind of info. And "interlocked" is a great search term to use if you want to know more.
"Torn reads" can occur on any value whose fields add up to more than the size of a pointer.

An if (itisso) check on a boolean is atomic, but even if it were not, there is no need to lock the first check.
If any thread has completed the Initialization then it will be true. It does not matter if several threads are checking at once. They will all get the same answer, and, there will be no conflict.
The second check inside the lock is necessary because another thread may have grabbed the lock first and completed the initialization process already.

You could also decorate s_Initialized with the volatile keyword and forego the use of lock entirely.
That is not correct. You will still encounter the problem of a second thread passing the check before the first thread has had a chance to set the flag, which will result in multiple executions of the initialisation code.
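If you really want to drop the lock, a volatile flag is not enough; you need an atomic read-modify-write so that checking the flag and claiming the work happen in one step. The sketch below is an alternative technique, not the pattern from the question, and all names (OnceGate, RaceMany, Runs, the three states) are hypothetical. It uses Interlocked.CompareExchange so exactly one thread wins the transition from "not started" to "running", and the other threads spin until it is done.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical lock-free "run exactly once" gate. A volatile flag alone
// cannot do this: two threads can both see "false" before either writes
// "true". CompareExchange makes the check and the claim one atomic step.
public static class OnceGate
{
    private const int NotStarted = 0, Running = 1, Done = 2;

    private static int s_state;      // NotStarted, Running or Done
    private static int s_runs;       // how many times the body executed
    private static int s_value;      // the "initialized" data

    public static int Runs => Volatile.Read(ref s_runs);

    public static void Initialize()
    {
        if (Volatile.Read(ref s_state) == Done)
            return;

        if (Interlocked.CompareExchange(ref s_state, Running, NotStarted) == NotStarted)
        {
            // Only the winning thread gets here.
            Interlocked.Increment(ref s_runs);
            s_value = 42;
            Volatile.Write(ref s_state, Done);   // release: publishes s_value
            return;
        }

        // Losers wait until the winner has finished. If the body threw,
        // they would spin forever; a lock handles that case more simply.
        var spinner = new SpinWait();
        while (Volatile.Read(ref s_state) != Done)
            spinner.SpinOnce();
    }

    // Returns true if every racing caller saw the initialized value.
    public static bool RaceMany(int callers)
    {
        int wrong = 0;
        Parallel.For(0, callers, _ =>
        {
            Initialize();
            if (s_value != 42)
                Interlocked.Increment(ref wrong);
        });
        return wrong == 0;
    }
}
```

The spin-wait and the unhandled-exception case are the price of avoiding the lock, which is why the lock-based double check is usually the simpler choice.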

I think you're asking if s_Initialized could be in an unstable state when read outside the lock. The short answer is no. A simple assignment/read will boil down to a single assembly instruction which is atomic on every processor I can think of.
I'm not sure about assignment to 64-bit variables; it depends on the processor. I would assume it is not atomic, but it probably is on modern 32-bit processors and certainly on all 64-bit processors. Assignment of complex value types will not be atomic.

I thought they were. I'm not sure of the point of the lock in your example unless you're also doing something to s_Provider at the same time; then the lock would ensure that these calls happened together.
Does that //Perform initialization comment cover creating s_Provider? For instance:
private static void Initialize()
{
    if (s_Initialized)
        return;

    lock (s_lock)
    {
        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
Otherwise the static property getter is just going to return null anyway.

Perhaps Interlocked gives a clue. Otherwise, this one is pretty good.
I would have guessed that they're not atomic.

To make your code always work on weakly ordered architectures, you must put a MemoryBarrier before you write s_Initialized.
s_Provider = new MembershipProvider();
// MUST PUT BARRIER HERE to make sure the memory writes from the assignment
// and the constructor have been written to memory
// BEFORE the write to s_Initialized!
Thread.MemoryBarrier();
// Now that we've guaranteed that the writes above
// will be globally first, set the flag
s_Initialized = true;
The memory writes that happen in the MembershipProvider constructor and the write to s_Provider are not guaranteed to happen before you write to s_Initialized on a weakly ordered processor.
A lot of thought in this thread is about whether something is atomic or not. That is not the issue. The issue is the order that your thread's writes are visible to other threads. On weakly ordered architectures, writes to memory do not occur in order and THAT is the real issue, not whether a variable fits within the data bus.
EDIT: Actually, I'm mixing platforms in my statements. In C# the CLR spec requires that writes are globally visible, in-order (by using expensive store instructions for every store if necessary). Therefore, you don't need to actually have that memory barrier there. However, if it were C or C++ where no such guarantee of global visibility order exists, and your target platform may have weakly ordered memory, and it is multithreaded, then you would need to ensure that the constructors writes are globally visible before you update s_Initialized, which is tested outside the lock.
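If you would rather make the ordering explicit in C# than rely on the runtime's write-ordering guarantees described above, System.Threading.Volatile offers a release store and an acquire load that pair up for exactly this publish-then-flag pattern. A minimal sketch, assuming hypothetical Config and Publisher types in place of the provider and its flag:

```csharp
using System.Threading;

// Hypothetical payload standing in for MembershipProvider.
public sealed class Config
{
    public readonly int Port;
    public Config(int port) { Port = port; }
}

// Hypothetical publisher: one thread publishes, others poll.
public static class Publisher
{
    private static Config s_config;
    private static bool s_ready;

    public static void Publish(int port)
    {
        s_config = new Config(port);
        // Release: the constructor's writes and the s_config write
        // cannot be moved after this store of the flag.
        Volatile.Write(ref s_ready, true);
    }

    public static Config TryGet()
    {
        // Acquire: the s_config read cannot be moved before this load.
        return Volatile.Read(ref s_ready) ? s_config : null;
    }
}
```

A release/acquire pair is weaker (and typically cheaper) than the full fence of Thread.MemoryBarrier, but it is all this pattern needs.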

What you're asking is whether accessing a field multiple times within a method is atomic, to which the answer is no.
In the example above, the initialise routine is faulty as it may result in multiple initialization. You would need to check the s_Initialized flag inside the lock as well as outside, to prevent a race condition in which multiple threads read the s_Initialized flag before any of them actually does the initialisation code. E.g.,
private static void Initialize()
{
    if (s_Initialized)
        return;

    lock (s_lock)
    {
        if (s_Initialized)
            return;

        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
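A different technique from the hand-written pattern, shown only for comparison: Lazy&lt;T&gt; with LazyThreadSafetyMode.ExecutionAndPublication, which is documented to let only one thread run the factory while the others wait and then share the result. The FakeProvider and LazyMembership names are hypothetical, and the counter exists only to show that the factory ran once.

```csharp
using System;
using System.Threading;

// Hypothetical provider type standing in for MembershipProvider.
public sealed class FakeProvider { }

public static class LazyMembership
{
    private static int s_created;   // counts factory invocations

    // ExecutionAndPublication: only one thread runs the factory; the others
    // block until it completes and then share the same instance.
    private static readonly Lazy<FakeProvider> s_provider =
        new Lazy<FakeProvider>(() =>
        {
            Interlocked.Increment(ref s_created);
            return new FakeProvider();
        }, LazyThreadSafetyMode.ExecutionAndPublication);

    public static FakeProvider Provider => s_provider.Value;

    public static int CreatedCount => Volatile.Read(ref s_created);
}
```

One behavioural difference worth knowing: in this mode an exception thrown by the factory is cached and rethrown on later accesses, whereas the hand-written version would simply retry.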

Ack, never mind... as pointed out, this is indeed incorrect. It doesn't prevent a second thread from entering the "initialize" code section. Bah.
You could also decorate s_Initialized with the volatile keyword and forego the use of lock entirely.
