Thread.VolatileRead Implementation - C#

I'm looking at the implementation of the VolatileRead/VolatileWrite methods (using Reflector), and I'm puzzled by something.
This is the implementation for VolatileRead:
[MethodImpl(MethodImplOptions.NoInlining)]
public static int VolatileRead(ref int address)
{
    int num = address;
    MemoryBarrier();
    return num;
}
How come the memory barrier is placed after reading the value of "address"? Isn't it supposed to be the opposite (place it before reading the value, so any pending writes to "address" will be completed by the time we make the actual read)?
The same thing goes for VolatileWrite, where the memory barrier is placed before the assignment of the value. Why is that?
Also, why do these methods have the NoInlining attribute? What could happen if they were inlined?

I thought that until recently. Volatile reads aren't what you think they are - they're not about guaranteeing that they get the most recent value; they're about making sure that no read which is later in the program code is moved to before this read. That's what the spec guarantees - and likewise for volatile writes, it guarantees that no earlier write is moved to after the volatile one.
You're not alone in suspecting this code, but Joe Duffy explains it better than I can :)
My answer to this is to give up on lock-free coding other than by using things like PFX which are designed to insulate me from it. The memory model is just too hard for me - I'll leave it to the experts, and stick with things that I know are safe.
One day I'll update my threading article to reflect this, but I think I need to be able to discuss it more sensibly first...
(I don't know about the no-inlining part, btw. I suspect that inlining could introduce some other optimizations which aren't meant to happen around volatile reads/writes, but I could easily be wrong...)
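To make the barrier placement concrete, here's a sketch of my own (not from the answer above) showing how the write side (barrier, then store) and the read side (load, then barrier) pair up when publishing a value:
using System;
using System.Threading;

class Publisher
{
    private int _data;
    private int _ready;

    public void Produce()
    {
        _data = 42;                           // ordinary write
        // VolatileWrite = barrier THEN store: the write to _data
        // cannot be moved after the store to _ready.
        Thread.VolatileWrite(ref _ready, 1);
    }

    public void Consume()
    {
        // VolatileRead = load THEN barrier: the read of _data below
        // cannot be moved before the load of _ready.
        if (Thread.VolatileRead(ref _ready) == 1)
        {
            Console.WriteLine(_data);         // if we saw 1, we must see 42
        }
    }
}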

Maybe I am oversimplifying, but I think the explanations about reordering and cache coherency and so on give too many details.
So, why does the MemoryBarrier come after the actual read?
I will try to explain this with an example that uses object instead of int.
One may think the correct order is:
Thread 1 creates the object (initializes its inner data).
Thread 1 then puts the object into a variable.
Then it "does a fence" and all threads see the new value.
Then, the read is something like this:
Thread 2 "does a fence".
Thread 2 reads the object instance.
Thread 2 is sure that it has all the inner data of that instance (as it started with a fence).
The biggest problem with this is:
Thread 1 creates the object and initializes it.
Thread 1 then puts the object into a variable.
Before the Thread flushes the cache, the CPU itself flushes part of the cache... it commits only the address of the variable (not the contents of that variable).
At that moment, Thread 2 had already flushed its cache. So it is going to read everything from the main memory.
So, it reads the variable (it is there).
Then it reads the content (it is not there).
Finally, after all this, CPU 1 executes the part of Thread 1 that does the fence.
So, what happens with the volatile write and read?
The volatile write makes the contents of the object go to memory immediately (it starts with the fence), then it sets the variable (which may not go immediately to real memory).
Then, the volatile read will first clear the cache. Then it reads the field. If it receives a value when reading the field, it is certain that the contents pointed to by that reference are really there.
Because of these details, yes, it is possible that you do a VolatileWrite(1) and another thread still sees the value of zero. But as soon as other threads see the value of 1 (using a volatile read), all the other items they may need through that reference are already there. You can't really tell the difference, because when reading the old value (0 or null) you simply don't progress, given that you don't yet have everything you need.
I have already seen discussions arguing that, even if this flushes the caches twice, the right pattern would be:
MemoryBarrier - will flush other variables changed before this call
Write
MemoryBarrier - will guarantee that the write was flushed
The Read will then need the same:
MemoryBarrier
Read - Guarantees that we see the latest info... maybe one that was put AFTER our memory barrier.
As something may have appeared after our MemoryBarrier and was already read, we must put another MemoryBarrier to access the contents.
Those could be two write fences or two read fences, if such things existed in .NET.
I am not sure about everything I said... it is a "compilation" of a lot of information I gathered, and it explains why VolatileRead and VolatileWrite appear to be reversed, and also why no invalid values are read when using them.
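As a sketch (my own, following the discussion above), that "fence on both sides" pattern would look like this in C#:
class TwoFences
{
    private int _value;

    public void Write(int v)
    {
        Thread.MemoryBarrier(); // flush other variables changed before this call
        _value = v;
        Thread.MemoryBarrier(); // guarantee that the write was flushed
    }

    public int Read()
    {
        Thread.MemoryBarrier(); // discard a possibly stale local view
        int v = _value;
        Thread.MemoryBarrier(); // order this read before later reads of the contents
        return v;
    }
}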


Does a MemoryBarrier guarantee memory visibility for all memory?

If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions, but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache. We think of lock blocks as protecting the variables read and modified within the block, which means:
Assuming you've properly implemented locking where necessary, those variables can only be read and written to by one thread at a time, and
Reads within the lock block see the latest versions of a variable and writes within the lock block become visible to all threads.
(Right?)
This second point is what interests me. Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads? Pardon my mental fuzziness here about how caches work, but I've read that caches hold several multi-byte "lines" of data. I think what I'm asking is, does a memory barrier force synchronization of all "dirty" cache lines or just some, and if just some, what determines which lines get synchronized?
If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions...
Right. The specification guarantees that.
but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache.
The C# specification says nothing whatsoever about "CPU cache". You've left the realm of what is guaranteed by the specification, and entered the realm of implementation details. There is no requirement that an implementation of C# execute on a CPU that has any particular cache architecture.
Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads?
Rather than try to parse your either-or question, let's say what is actually guaranteed by the language. A special effect is:
Any write to a variable, volatile or not
Any read of a volatile field
Any throw
The order of special effects is preserved at certain special points:
Reads and writes of volatile fields
locks
thread creation and termination
The runtime is required to ensure that special effects are ordered consistently with special points. So, if there is a read of a volatile field before a lock, and a write after, then the read can't be moved after the write.
So, how does the runtime achieve this? Beats the heck out of me. But the runtime is certainly not required to "guarantee that all memory is fresh for all threads". The runtime is required to ensure that certain reads, writes and throws happen in chronological order with respect to special points, and that's all.
In particular, the runtime is not required to ensure that all threads observe the same order.
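As an illustration (my own sketch, with hypothetical names), here is that rule in code: the volatile read can never be observed to move after the write on the other side of the lock.
class SpecialPoints
{
    private volatile bool _flag;
    private int _data;
    private readonly object _gate = new object();

    public void M()
    {
        bool f = _flag;    // read of a volatile field: a special effect
        lock (_gate)       // entering and exiting the lock: special points
        {
        }
        _data = f ? 1 : 0; // this write cannot be moved before the volatile read
    }
}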
Finally, I always end these sorts of discussions by pointing you here:
http://blog.coverity.com/2014/03/26/reordering-optimizations/
After reading that, you should have an appreciation for the sorts of horrid things that can happen even on x86 when you act casual about eliding locks.
Reads within the lock block see the latest versions of a variable and writes within the lock block are visible to all threads.
No, that's definitely a harmful oversimplification.
When you enter the lock statement, there's a memory fence which sort of means that you'll always read "fresh" data. When you exit the lock statement, there's a memory fence which sort of means that all the data you've written is guaranteed to be written to main memory and available to other threads.
The important point is that if multiple threads only ever read/write memory when they "own" a particular lock, then by definition one of them will have exited the lock before the next one enters it... so all those reads and writes will be simple and correct.
If you have code which reads and writes a variable without taking a lock, then there's no guarantee that it will "see" data written by well-behaved code (i.e. code using the lock), or that well-behaved threads will "see" the data written by that bad code.
For example:
private readonly object padlock = new object();
private int x;

public void A()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x++;
    }
}

public void B()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x--;
    }
}

public void C()
{
    // Might not see changes made in A, B, or C. Changes made here
    // might not be visible in other threads calling A, B or C.
    x = x + 10;
}
Now it's more subtle than that, but that's why using a common lock to protect a set of variables works.

C# memory model and a non-volatile variable initialized before the other thread's creation

I have a question related to the C# memory model and threads. I am not sure if the following code is correct without the volatile keyword.
public class A {
    private int variableA = 0;

    public A() {
        variableA = 1;
        Thread B = new Thread(new ThreadStart(() => printA()));
        B.Start();
    }

    private void printA() {
        System.Console.WriteLine(variableA);
    }
}
My concern is whether it is guaranteed that Thread B will see variableA with the value 1 without using volatile? In the main thread I am only assigning 1 to variableA in the constructor. After that I am not touching variableA; it is used only in Thread B, so locking is probably not necessary.
But, is it guaranteed that the main thread will flush its cache and write the variableA contents to the main memory, so the second thread can read the newly assigned value?
Additionally, is it guaranteed that the second thread will read the variableA contents from the main memory? Could some compiler optimization occur such that Thread B reads the variableA contents from the cache instead of the main memory? It may happen if the order of the instructions is changed.
For sure, adding volatile to the variableA declaration will make the code correct. But, is it necessary? I am asking because I wrote some code with some non-volatile variables initialized in the constructor, and the variables are used later by some Timer threads, and I am not sure if it is totally correct.
What about the same code in Java?
Thanks, Michal
There are a lot of places where implicit memory barriers are created. This is one of them. Starting a thread creates a full barrier. So the write to variableA will get committed before the thread starts, and the first reads will be acquired from main memory. Of course, in Microsoft's implementation of the CLR that is somewhat of a moot point because writes already have volatile semantics. But the same guarantee is not made in the ECMA specification, so it is theoretically possible that the Mono implementation could behave differently in this regard.
My concern is whether it is guaranteed that Thread B will see variableA with the value 1 without using volatile?
In this case...yes. However, if you continue to use variableA in the second thread there is no guarantee after the first read that it will see updates.
But, is it guaranteed that the main thread will flush its cache and write the variableA contents to the main memory, so the second thread can read the newly assigned value?
Yes.
Additionally, is it guaranteed that the second thread will read the variableA contents from the main memory?
Yes, but only on the first read.
For sure, adding volatile to the variableA declaration will make the code correct. But, is it necessary?
In this very specific and narrow case...no. But, in general it is advised that you use the volatile keyword in these scenarios. Not only will it make your code thread-safe as the scenario gets more complicated, but it also helps to document the fact that the field is going to be used by more than one thread and that you have considered the implications of using a lock-free strategy.
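Following that advice, the class from the question would simply declare the field volatile; a minimal sketch:
public class A {
    // volatile documents the cross-thread use and also guarantees
    // visibility on later reads, not just the first one.
    private volatile int variableA = 0;

    public A() {
        variableA = 1;
        Thread B = new Thread(new ThreadStart(() => printA()));
        B.Start();
    }

    private void printA() {
        System.Console.WriteLine(variableA);
    }
}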
The same code in Java is definitely okay - the creation of a new thread acts as a sort of barrier, effectively. (All actions earlier in the program text than the thread creation "happen before" the new thread starts.)
I don't know what's guaranteed in .NET with respect to new thread creation, however. Even more worrying is the possibility of a delayed read when using Control.BeginInvoke and the like... I haven't seen any guarantees around memory barriers for those situations.
To be honest, I suspect it's fine. I suspect that anything which needs to coordinate between threads like this (either creating a new one or marshalling a call onto an existing one) will use a full memory barrier on both of the threads involved. However, you're absolutely right to be concerned, and I'm hoping that you'll get a more definitive answer from someone smarter than me. You might want to email Joe Duffy to get his point of view on this...
But, is it guaranteed that the main thread will flush its cache and write the variableA contents to the main memory,
Yes, this is guaranteed by the MS CLR memory model. Not necessarily so for other implementations of the CLI (i.e., I'm not sure about Mono). The ECMA standard does not require it.
so the second thread can read the newly assigned value?
That requires that the cache has been refreshed. It is probably guaranteed by the creation of the Thread (like Jon Skeet said). It is however not guaranteed by the previous point. The cache is flushed on each write but not on each read.
You could make very sure by using VolatileRead(ref variableA) but it is recommended (Jeffrey Richter) to use the Interlocked class. Note that VolatileWrite() is superfluous in MS.NET.
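A sketch of the Interlocked alternative mentioned above: CompareExchange with an identical value and comparand reads the field with a full fence without ever changing it.
// Returns the current value of variableA with full-fence semantics.
// It writes 0 back only when the value is already 0, so it never
// actually changes the field.
int observed = Interlocked.CompareExchange(ref variableA, 0, 0);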

How do I use the volatile keyword in this model?

I have a data class with lots of data in it (TV schedule data).
The data is queried from one side and periodically updated from the other side.
There are two threads: the first thread queries the data on request and the second thread updates the data on regular intervals.
To prevent locking, I use two instances (copies) of the data class: the live instance and the backup instance.
Initially, both instances are filled with the same data. The first thread only reads from the live instance.
The second thread periodically updates both instances as follows:
Update the backup instance.
Swap the backup and live instance (i.e. the backup instance becomes the live instance).
Update the backup instance.
Both backup instance and live instance are now up-to-date.
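In code, the update cycle just described might look like this (a sketch only; CopyFrom and FetchLatestScheduleData are hypothetical helpers, not part of the actual class):
// Runs on the second (updater) thread at regular intervals.
void UpdateCycle()
{
    backup.CopyFrom(FetchLatestScheduleData()); // 1. update the backup instance

    var temp = live;                            // 2. swap backup and live
    live = backup;
    backup = temp;

    backup.CopyFrom(FetchLatestScheduleData()); // 3. update the (new) backup instance
}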
My question is: how should I use the volatile keyword here?
public class data
{
    // Lots of fields here.
    // Should these fields also be declared volatile?
}
I have already made the references volatile:
public volatile data live;
public volatile data backup;
Fields should be declared volatile if you plan to modify them outside locks, or without Interlocked. Here is the best article that explains volatile in depth: http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
To be honest, I would just lock on it. The correctness is so much easier to check, and the need for the backup is removed.
With your plan here, the fields would also have to be volatile. Consider the case otherwise:
public class Data
{
    public int SimpleInt;
}
Here we have just a single public field for simplicity; the same applies to more realistic structures. (Incidentally, capitals for class names is the more common convention in C#.)
Now consider live.SimpleInt as seen by thread A. Because live could be cached, we need to have it as volatile. However, consider that when the object is swapped with backup, and then swapped back to live, then live will have the same memory location as it did before (unless the GC has moved it). Therefore live.SimpleInt will have the same memory location as it did before, and therefore if it was not volatile, thread A may be using a cached version of live.SimpleInt.
However, if you created a new Data object, rather than swapping in and out, then the new value of live.SimpleInt will not be in the thread's cache, and it could be safely non-volatile.
It's also important to consider that the fields of the fields will have to be volatile too.
Indeed, with this approach you need just one stored Data object. The new one is created as an object referenced by only one thread (hence it cannot be damaged by, or do damage to, another thread), and its creation is based on values read from live, which is also safe because the other thread is only reading (barring some memoisation techniques under which "reads" are really writes behind the scenes, reads can't harm other reads, though they can be harmed by writes). The new object is altered while visible to just a single thread, so only the final write requires any concern about synchronisation, and that should indeed be safe with only volatile or a MemoryBarrier used for protection, since assigning a reference is atomic and you don't care about the old value anymore.
I do not think you are going to get the effect you want by marking things with volatile. Consider this code.
volatile data live;

void Thread1()
{
    if (live.Field1)
    {
        Console.WriteLine(live.Field1);
    }
}
In the example above, false could be written to the console if the second thread swapped the live and backup references between the time the first thread evaluated the if condition and the time it called Console.WriteLine.
If that problem does not concern you then all you really need to do is mark the live variable as volatile. You do not need to mark the individual fields in data as volatile. The reason is that volatile reads create acquire-fence memory barriers and volatile writes create release-fence memory barriers. What that means is that when thread 2 swaps the references, all writes to the individual fields of data must commit first, and when thread 1 wants to read the individual fields of the live instance, the live variable must be reacquired from main memory first. You do not need to mark the backup variable as volatile because it is never used by thread 1.
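If the swap race does concern you, a common fix (my sketch, not part of the answer above) is to read the volatile reference into a local once and use only that snapshot:
void Thread1()
{
    data snapshot = live;  // a single volatile read of the reference
    if (snapshot.Field1)
    {
        // Same instance both times, even if 'live' is swapped meanwhile.
        Console.WriteLine(snapshot.Field1);
    }
}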
The advanced threading section in Joe Albahari's ebook goes into a great deal of detail on the semantics of volatile and should explain why you only need to mark your live reference as such.

How do memory fences affect "freshness" of data?

I have a question about the following code sample (taken from: http://www.albahari.com/threading/part4.aspx#_NonBlockingSynch)
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier(); // Barrier 1
        _complete = true;
        Thread.MemoryBarrier(); // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier(); // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier(); // Barrier 4
            Console.WriteLine (_answer);
        }
    }
}
This is followed with the following explanation:
"Barriers 1 and 4 prevent this example from writing “0”. Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true."
I understand how using the memory barriers affects instruction reordering, but what is this "freshness guarantee" that is mentioned?
Later in the article, the following example is also used:
static void Main()
{
    bool complete = false;
    var t = new Thread (() =>
    {
        bool toggle = false;
        while (!complete)
        {
            toggle = !toggle;
            // adding a call to Thread.MemoryBarrier() here fixes the problem
        }
    });
    t.Start();
    Thread.Sleep (1000);
    complete = true;
    t.Join(); // Blocks indefinitely
}
This example is followed with this explanation:
"This program never terminates because the complete variable is cached in a CPU register. Inserting a call to Thread.MemoryBarrier inside the while-loop (or locking around reading complete) fixes the error."
So again ... what happens here?
In the first case, Barrier 1 ensures _answer is written BEFORE _complete. Regardless of how the code is written, or how the compiler or CLR instructs the CPU, the memory bus read/write queues can reorder the requests. The Barrier basically says "flush the queue before continuing". Similarly, Barrier 4 makes sure _answer is read AFTER _complete. Otherwise CPU2 could reorder things and see an old _answer with a "new" _complete.
Barriers 2 and 3 are, in some sense, useless. Note that the explanation contains the word "after": ie "... if B ran after A, ...". What's it mean for B to run after A? If B and A are on the same CPU, then sure, B can be after. But in that case, same CPU means no memory barrier problems.
So consider B and A running on different CPUs. Now, very much like Einstein's relativity, the concept of comparing times at different locations/CPUs doesn't really make sense.
Another way of thinking about it - can you write code that can tell whether B ran after A? If so, well, you probably used memory barriers to do that. Otherwise, you can't tell, and it doesn't make sense to ask. It's also similar to Heisenberg's principle - if you can observe it, you've modified the experiment.
But leaving physics aside, let's say you could open the hood of your machine and see that the actual memory location of _complete was true (because A had run). Now run B. Without Barrier 3, CPU2 might STILL NOT see _complete as true, i.e. not "fresh".
But you probably can't open your machine and look at _complete. Nor communicate your findings to B on CPU2. Your only communication is what the CPUs themselves are doing. So if they can't determine BEFORE/AFTER without barriers, asking "what happens to B if it runs after A, without barriers" makes no sense.
By the way, I'm not sure what you have available in C#, but what is typically done, and what is really needed for Code sample # 1 is a single release barrier on write, and a single acquire barrier on read:
void A()
{
    _answer = 123;
    WriteWithReleaseBarrier(_complete, true); // "publish" values
}

void B()
{
    if (ReadWithAcquire(_complete)) // subscribe
    {
        Console.WriteLine (_answer);
    }
}
The word "subscribe" isn't often used to describe the situation, but "publish" is. I suggest you read Herb Sutter's articles on threading.
This puts the barriers in exactly the right places.
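For what it's worth, .NET now exposes exactly this release/acquire pair as Volatile.Write and Volatile.Read in System.Threading; a sketch of sample #1 rewritten with them:
using System;
using System.Threading;

class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Volatile.Write(ref _complete, true); // release: the write to _answer stays before it
    }

    void B()
    {
        if (Volatile.Read(ref _complete))    // acquire: the read of _answer stays after it
        {
            Console.WriteLine(_answer);
        }
    }
}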
For Code sample #2, this isn't really a memory barrier problem, it is a compiler optimization issue - it is keeping complete in a register. A memory barrier would force it out, as would volatile, but probably so would calling an external function - if the compiler can't tell whether that external function modified complete or not, it will re-read it from memory. ie maybe pass the address of complete to some function (defined somewhere where the compiler can't examine its details):
while (!complete)
{
    some_external_function(&complete);
}
even if the function doesn't modify complete, if the compiler isn't sure, it will need to reload its registers.
I.e. the difference between code sample 1 and code sample 2 is that sample 1 only has problems when A and B are running on separate threads; sample 2 could have problems even on a single-threaded machine.
Actually, the other question would be - can the compiler completely remove the while loop? If it thinks complete is unreachable by other code, why not? ie if it decided to move complete into a register, it might as well remove the loop completely.
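For reference, the fix the article itself suggests is a barrier inside the loop, which forces complete to be re-read on every iteration instead of being kept in a register:
var t = new Thread(() =>
{
    bool toggle = false;
    while (!complete)
    {
        toggle = !toggle;
        Thread.MemoryBarrier(); // forces a real re-read of 'complete'
    }
});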
EDIT: To answer the comment from opc (my answer is too big for comment block):
Barrier 3 forces the CPU to flush any pending read (and write) requests.
So imagine if there was some other reads before reading _complete:
void B()
{
    int x = a * b + c * d; // read a, b, c, d
    Thread.MemoryBarrier(); // Barrier 3
    if (_complete)
        ...
Without the barrier, the CPU might have all of these 5 read requests 'pending':
a,b,c,d,_complete
Without the barrier, the processor could reorder these requests to optimize memory access (ie if _complete and 'a' were on the same cache line or something).
With the barrier, the CPU gets a,b,c,d back from memory BEFORE _complete is even put in as a request. ENSURING 'b' (for example) is read BEFORE _complete - ie no reordering.
The question is - what difference does it make?
If a,b,c,d are independent from _complete, then it doesn't matter. All the barrier does is SLOW THINGS DOWN. So yeah, _complete is read later. So the data is fresher. Putting a sleep(100) or some busy-wait for-loop in there before the read would make it 'fresher' as well! :-)
So the point is - keep it relative. Does the data need to be read/written BEFORE/AFTER relative to some other data or not? That's the question.
And not to put down the author of the article - he does mention "if B ran after A...". It just isn't exactly clear whether he is imagining that B-after-A is crucial to the code, observable by the code, or just inconsequential.
Code sample #1:
Each processor core contains a cache with a copy of a portion of memory. It may take a bit of time for the cache to be updated. The memory barriers guarantee that the caches are synchronized with main memory. For example, if you didn't have barriers 2 and 3 here, consider this situation:
Processor 1 runs A(). It writes the new value of _complete to its cache (but not necessarily to main memory yet).
Processor 2 runs B(). It reads the value of _complete. If this value was previously in its cache, it may not be fresh (i.e., not synchronized with main memory), so it would not get the updated value.
Code sample #2:
Normally, variables are stored in memory. However, suppose a value is read multiple times in a single function: As an optimization, the compiler may decide to read it into a CPU register once, and then access the register each time it is needed. This is much faster, but prevents the function from detecting changes to the variable from another thread.
The memory barrier here forces the function to re-read the variable value from memory.
Calling Thread.MemoryBarrier() immediately refreshes the register caches with the actual values for variables.
In the first example, the "freshness" for _complete is provided by calling the method right after setting it and right before using it. In the second example, the initial false value for the variable complete will be cached in the thread's own space and needs to be resynchronized in order to immediately see the actual "outside" value from "inside" the running thread.
The "freshness" guarantee simply means that Barriers 2 and 3 force the values of _complete to be visible as soon as possible as opposed to whenever they happen to be written to memory.
It's actually unnecessary from a consistency point of view, since Barriers 1 and 4 ensure that _answer will be read after _complete is read.

Is accessing a variable in C# an atomic operation?

I've been raised to believe that if multiple threads can access a variable, then all reads from and writes to that variable must be protected by synchronization code, such as a "lock" statement, because the processor might switch to another thread halfway through a write.
However, I was looking through System.Web.Security.Membership using Reflector and found code like this:
public static class Membership
{
    private static bool s_Initialized = false;
    private static object s_lock = new object();
    private static MembershipProvider s_Provider;

    public static MembershipProvider Provider
    {
        get
        {
            Initialize();
            return s_Provider;
        }
    }

    private static void Initialize()
    {
        if (s_Initialized)
            return;

        lock(s_lock)
        {
            if (s_Initialized)
                return;

            // Perform initialization...
            s_Initialized = true;
        }
    }
}
Why is the s_Initialized field read outside of the lock? Couldn't another thread be trying to write to it at the same time? Are reads and writes of variables atomic?
For the definitive answer go to the spec. :)
Partition I, Section 12.6.6 of the CLI spec states: "A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size is atomic when all the write accesses to a location are the same size."
So that confirms that s_Initialized will never be unstable, and that reads and writes of primitive types no larger than the native word size are atomic.
In particular, double and long (Int64 and UInt64) are not guaranteed to be atomic on a 32-bit platform. You can use the methods on the Interlocked class to protect these.
Additionally, while reads and writes are atomic, there is a race condition with addition, subtraction, and incrementing and decrementing primitive types, since they must be read, operated on, and rewritten. The Interlocked class allows you to protect these using the CompareExchange and Increment methods.
Interlocking creates a memory barrier to prevent the processor from reordering reads and writes. The lock creates the only required barrier in this example.
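A sketch of the Interlocked usage recommended above (the counter variable is mine):
long counter = 0;

// Atomic read-modify-write: read, add one, and write back as one indivisible step.
Interlocked.Increment(ref counter);

// Atomic 64-bit read, safe even on 32-bit platforms where plain long reads can tear.
long snapshot = Interlocked.Read(ref counter);

// Compare-and-swap: writes 10 only if the current value is still 1.
Interlocked.CompareExchange(ref counter, 10, 1);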
This is a (bad) form of the double-checked locking pattern, which is not thread-safe in C#!
There is one big problem in this code:
s_Initialized is not volatile. That means that writes in the initialization code can move after s_Initialized is set to true, and other threads can see uninitialized data even if s_Initialized is true for them. This doesn't apply to Microsoft's implementation of the Framework because every write is a volatile write.
But also in Microsoft's implementation, reads of the uninitialized data can be reordered (i.e. prefetched by the CPU), so if s_Initialized is true, reading the data that should be initialized can result in reading old, uninitialized data because of cache hits (i.e. the reads are reordered).
For example:
Thread 1 reads s_Provider (which is null)
Thread 2 initializes the data
Thread 2 sets s_Initialized to true
Thread 1 reads s_Initialized (which is true now)
Thread 1 uses the previously read Provider and gets a NullReferenceException
Moving the read of s_Provider before the read of s_Initialized is perfectly legal because there is no volatile read anywhere.
If s_Initialized were volatile, the read of s_Provider would not be allowed to move before the read of s_Initialized, and the initialization of the Provider would not be allowed to move after s_Initialized is set to true, and everything would be OK.
Joe Duffy also wrote an article about this problem: Broken variants on double-checked locking
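A sketch of the volatile fix described above (the provider construction is elided, as in the question):
private static volatile bool s_Initialized; // volatile: orders the reads and writes below
private static object s_lock = new object();
private static MembershipProvider s_Provider;

private static void Initialize()
{
    if (s_Initialized) // volatile read: the later read of s_Provider cannot move before it
        return;

    lock (s_lock)
    {
        if (s_Initialized)
            return;

        // Perform initialization of s_Provider...
        s_Initialized = true; // volatile write: the initialization cannot move after it
    }
}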
Hang about -- the question that is in the title is definitely not the real question that Rory is asking.
The titular question has the simple answer of "No" -- but this is no help at all when you see the real question -- which I don't think anyone has given a simple answer to.
The real question Rory asks is presented much later and is more pertinent to the example he gives.
Why is the s_Initialized field read outside of the lock?
The answer to this is also simple, though completely unrelated to the atomicity of variable access.
The s_Initialized field is read outside of the lock because locks are expensive.
Since the s_Initialized field is essentially "write once" it will never return a false positive.
It's economical to read it outside the lock.
This is a low cost activity with a high chance of having a benefit.
That's why it's read outside of the lock -- to avoid paying the cost of using a lock unless it's indicated.
If locks were cheap the code would be simpler, and omit that first check.
(Edit: nice response from Rory follows. Yeah, boolean reads are very much atomic. If someone built a processor with non-atomic boolean reads, they'd be featured on the DailyWTF.)
The correct answer seems to be, "Yes, mostly."
John's answer referencing the CLI spec indicates that accesses to variables not larger than 32 bits on a 32-bit processor are atomic.
Further confirmation from the C# spec, section 5.5, Atomicity of variable references:
Reads and writes of the following data types are atomic: bool, char,
byte, sbyte, short, ushort, uint, int, float, and reference types. In
addition, reads and writes of enum types with an underlying type in
the previous list are also atomic. Reads and writes of other types,
including long, ulong, double, and decimal, as well as user-defined
types, are not guaranteed to be atomic.
The code in my example was paraphrased from the Membership class, as written by the ASP.NET team themselves, so it was always safe to assume that the way it accesses the s_Initialized field is correct. Now we know why.
Edit: As Thomas Danecker points out, even though the access of the field is atomic, s_Initialized should really be marked volatile to make sure that the locking isn't broken by the processor reordering the reads and writes.
The Initialize function is faulty. It should look more like this:
private static void Initialize()
{
    if (s_Initialized)
        return;

    lock (s_lock)
    {
        if (s_Initialized)
            return;

        s_Initialized = true;
    }
}
Without the second check inside the lock it's possible the initialisation code will be executed twice. So the first check is for performance to save you taking a lock unnecessarily, and the second check is for the case where a thread is executing the initialisation code but hasn't yet set the s_Initialized flag and so a second thread would pass the first check and be waiting at the lock.
Reads and writes of variables are not atomic. You need to use Synchronisation APIs to emulate atomic reads/writes.
For an awesome reference on this and many more issues to do with concurrency, make sure you grab a copy of Joe Duffy's latest spectacle. It's a ripper!
"Is accessing a variable in C# an atomic operation?"
Nope. And it's not a C# thing, nor is it even a .NET thing; it's a processor thing.
OJ is spot on that Joe Duffy is the guy to go to for this kind of info. And "interlocked" is a great search term to use if you're wanting to know more.
"Torn reads" can occur on any value whose fields add up to more than the size of a pointer.
An if (itIsSo) check on a boolean is atomic, but even if it were not, there is no need to lock the first check.
If any thread has completed the initialization then it will be true. It does not matter if several threads are checking at once. They will all get the same answer, and there will be no conflict.
The second check inside the lock is necessary because another thread may have grabbed the lock first and completed the initialization process already.
You could also decorate s_Initialized with the volatile keyword and forego the use of lock entirely.
That is not correct. You will still encounter the problem of a second thread passing the check before the first thread has had a chance to set the flag, which will result in multiple executions of the initialisation code.
I think you're asking if s_Initialized could be in an unstable state when read outside the lock. The short answer is no. A simple assignment/read will boil down to a single assembly instruction which is atomic on every processor I can think of.
I'm not sure what the case is for assignment to 64-bit variables; it depends on the processor. I would assume that it is not atomic, but it probably is on modern 32-bit processors and certainly on all 64-bit processors. Assignment of complex value types will not be atomic.
I thought they were - I'm not sure of the point of the lock in your example unless you're also doing something to s_Provider at the same time - then the lock would ensure that these calls happened together.
Does that //Perform initialization comment cover creating s_Provider? For instance
private static void Initialize()
{
    if (s_Initialized)
        return;

    lock (s_lock)
    {
        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
Otherwise that static property-get's just going to return null anyway.
Perhaps Interlocked gives a clue. And otherwise this one is pretty good.
I would have guessed that they're not atomic.
To make your code always work on weakly ordered architectures, you must put a MemoryBarrier before you write s_Initialized.
s_Provider = new MembershipProvider(...);

// MUST PUT BARRIER HERE to make sure the memory writes from the assignment
// and the constructor have been written to memory
// BEFORE the write to s_Initialized!
Thread.MemoryBarrier();

// Now that we've guaranteed that the writes above
// will be globally first, set the flag
s_Initialized = true;
The memory writes that happen in the MembershipProvider constructor and the write to s_Provider are not guaranteed to happen before you write to s_Initialized on a weakly ordered processor.
A lot of thought in this thread is about whether something is atomic or not. That is not the issue. The issue is the order that your thread's writes are visible to other threads. On weakly ordered architectures, writes to memory do not occur in order and THAT is the real issue, not whether a variable fits within the data bus.
EDIT: Actually, I'm mixing platforms in my statements. In C# the CLR spec requires that writes are globally visible, in-order (by using expensive store instructions for every store if necessary). Therefore, you don't need to actually have that memory barrier there. However, if it were C or C++ where no such guarantee of global visibility order exists, and your target platform may have weakly ordered memory, and it is multithreaded, then you would need to ensure that the constructors writes are globally visible before you update s_Initialized, which is tested outside the lock.
What you're asking is whether accessing a field in a method multiple times is atomic -- to which the answer is no.
In the example above, the initialise routine is faulty as it may result in multiple initializations. You would need to check the s_Initialized flag inside the lock as well as outside, to prevent a race condition in which multiple threads read the s_Initialized flag before any of them actually does the initialisation code. E.g.:
private static void Initialize()
{
    if (s_Initialized)
        return;

    lock (s_lock)
    {
        if (s_Initialized)
            return;

        s_Provider = new MembershipProvider ( ... );
        s_Initialized = true;
    }
}
Ack, nevermind... as pointed out, this is indeed incorrect. It doesn't prevent a second thread from entering the "initialize" code section. Bah.
