Why not volatile on System.Double and System.Int64? (C#)

A question like mine has been asked, but mine is a bit different. The question is, "Why is the volatile keyword not allowed in C# on types System.Double and System.Int64, etc.?"
At first blush, I answered my colleague, "Well, on a 32-bit machine, those types take at least two operations just to move in or out of the processor, and the .NET Framework intends to abstract away processor-specific details like that." To which he responded, "It's not abstracting anything if it's preventing you from using a feature because of a processor-specific problem!"
He's implying that a processor-specific detail should not be visible to a person using a framework that "abstracts" such details away from the programmer. So the framework (or C#) should abstract them away and do whatever it needs to do to offer the same guarantees for System.Double, etc. (whether that's a semaphore, a memory barrier, or whatever). I argued that the framework shouldn't add the overhead of a semaphore on volatile, because the programmer isn't expecting such overhead from that keyword: a semaphore isn't necessary for the 32-bit types, so the greater overhead for the 64-bit types might come as a surprise. Better for the .NET Framework to simply disallow it, and make you use your own semaphore on larger types if the overhead is acceptable.
That led us to investigate what the volatile keyword is all about (see this page). That page states, in the notes:
In C#, using the volatile modifier on a field guarantees that all access to that field uses VolatileRead or VolatileWrite.
Hmm... VolatileRead and VolatileWrite both support our 64-bit types! My question, then, is:
"Why is the volatile keyword not allowed in C# on types System.Double and System.Int64, etc.?"

He's implying that a processor-specific detail should not be visible to a person using a framework that "abstracts" such details away from the programmer.
If you are using low-lock techniques like volatile fields, explicit memory barriers, and the like, then you are entirely in the world of processor-specific details. You need to understand at a deep level precisely what the processor is and is not allowed to do as far as reordering, consistency, and so on, in order to write correct, portable, robust programs that use low-lock techniques.
The point of this feature is to say "I am abandoning the convenient abstractions guaranteed by single-threaded programming and embracing the performance gains made possible by deep, implementation-specific knowledge of my processor." You should expect fewer abstractions at your disposal when you start using low-lock techniques, not more.
You're going "down to the metal" for a reason, presumably; the price you pay is having to deal with the quirks of said metal.

Yes. The reason is that you can't even read a double or a long in one operation. I agree that it is a poor abstraction. My feeling is that reading them atomically requires real effort, and that would have been too much magic from the compiler. So they let you choose the best solution yourself: locking, Interlocked, etc.
The interesting thing is that they actually can be read atomically on 32-bit hardware using MMX registers, which is what the Java JIT compiler does. And they can be read atomically on a 64-bit machine. So I think it is a serious flaw in the design.
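As a concrete example of "choose the best solution yourself", here is a minimal sketch (names invented) of atomic 64-bit access through Interlocked, which works even on 32-bit machines:

class Shared64
{
    long sharedValue; // shared between threads; cannot be declared volatile

    long ReadShared() => Interlocked.Read(ref sharedValue);               // atomic 64-bit read
    void WriteShared(long v) => Interlocked.Exchange(ref sharedValue, v); // atomic 64-bit write
}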

Not really an answer to your question, but...
I'm pretty sure that the MSDN documentation you've referenced is incorrect when it states that "using the volatile modifier on a field guarantees that all access to that field uses VolatileRead or VolatileWrite".
Directly reading or writing to a volatile field only generates a half-fence (an acquire-fence when reading and a release-fence when writing).
The VolatileRead and VolatileWrite methods use MemoryBarrier internally, which generates a full-fence.
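To make the distinction concrete, a sketch (names invented) of the two ways to get volatile semantics on a field:

class Fences
{
    volatile int flagField; // direct access: half-fence only
                            // (acquire on read, release on write)
    int plainField;

    int ReadStrong() => Thread.VolatileRead(ref plainField);            // full fence internally
    void WriteStrong(int v) => Thread.VolatileWrite(ref plainField, v); // full fence internally
}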
Joe Duffy knows a thing or two about concurrent programming; this is what he has to say about volatile:
(As an aside, many people wonder about the difference between loads and stores of variables marked as volatile and calls to Thread.VolatileRead and Thread.VolatileWrite. The difference is that the former APIs are implemented stronger than the jitted code: they achieve acquire/release semantics by emitting full fences on the right side. The APIs are more expensive to call too, but at least allow you to decide on a callsite-by-callsite basis which individual loads and stores need the MM guarantees.)

It's a simple explanation of legacy. If you read this article (http://msdn.microsoft.com/en-au/magazine/cc163715.aspx), you'll find that the only implementation of the .NET Framework 1.x runtime was on x86 machines, so it made sense for Microsoft to implement it against the x86 memory model. x64 and IA64 came later, so the base memory model has always been that of x86.
Could it have been implemented for x86? I'm actually not sure it could be fully implemented: a ref to a double returned from native code could be aligned to 4 bytes instead of 8, in which case all your guarantees of atomic reads/writes would no longer hold.

Starting with .NET Framework 4.5, it is possible to perform a volatile read or write on long or double variables by using the Volatile.Read and Volatile.Write methods. Although it's not documented, these methods perform atomic reads and writes on long/double variables, as is evident from their implementation:
private struct VolatileIntPtr { public volatile IntPtr Value; }

[Intrinsic]
[NonVersionable]
public static long Read(ref long location) =>
#if TARGET_64BIT
    // On 64-bit machines a pointer-sized volatile read is itself atomic.
    (long)Unsafe.As<long, VolatileIntPtr>(ref location).Value;
#else
    // On 32-bit machines, we use Interlocked, since an ordinary volatile read would not be atomic.
    Interlocked.CompareExchange(ref location, 0, 0);
#endif
Using these two methods is not as convenient as the volatile keyword, though. Care is required not to forget to wrap every read and write of the field in Volatile.Read or Volatile.Write respectively.
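A usage sketch under that caveat (names invented); nothing enforces that every access goes through the helpers, so the discipline is on the programmer:

class Publisher
{
    long sharedTicks; // shared between threads; cannot be declared volatile

    void Publish(long ticks) => Volatile.Write(ref sharedTicks, ticks); // atomic write, release semantics
    long Consume() => Volatile.Read(ref sharedTicks);                   // atomic read, acquire semantics
}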

Related

Volatile variables

I recently had an interview with a software company who asked me the following question:
Can you describe to me what adding volatile in front of variables does? Can you explain to me why it's important?
Most of my programming knowledge comes from C, yet the job position is for C# (I thought I might add this bit of info if necessary for the question specifically)
I answered by saying it just lets the compiler know that the variable can be used across processes or threads, and that it should not optimize that variable, as optimizing it can deteriorate behavior. In a nutshell, it's a warning to the compiler.
According to the interviewer, however, it's the other way around, and the volatile keyword warns the OS, not the compiler.
I was a bit befuddled by this, so I did some research and actually found conflicting answers! Some sources say it's for the compiler, and others for the OS.
Which is it? Does it differ by language?
I answered by saying it just lets the compiler know that the variable can be used across processes or threads, and that it should not optimize that variable, as optimizing it can deteriorate behavior. In a nutshell, it's a warning to the compiler.
This is going in the right direction for C# but misses some important aspects.
First off, delete "processes" entirely. Variables are not shared across processes in C#.
Second, don't concentrate on optimizations. Instead concentrate on permissible semantics. A compiler is not required to generate optimal code; a compiler is required to generate specification-compliant code. A re-ordering need not be for performance reasons and need not be faster / smaller / whatever. A volatile declaration adds an additional restriction on the permissible semantics of a multithreaded program.
Third, don't think of it as a warning to the compiler. It's a directive to the compiler: to generate code that is guaranteed to be compliant with the specification for volatile variables. How the compiler does so is up to it.
The actual answer to the question
Can you describe to me what adding volatile in front of variables does?
is: a C# compiler and runtime environment have great latitude to re-order variable reads and writes for any reason they see fit. They are restricted to only those re-orderings which preserve the meaning of programs on a single thread. So in "x = y; a = b;" the read of b could be moved before the read of y; that's legal because the outcome is unchanged. (This is not the only restriction on re-ordering, but it is in some sense the most fundamental one.) However, re-orderings are permitted to be noticeable from multiple threads; it is possible that another thread observes that b is read before y. This can cause problems.
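A sketch of that example in context (all names invented):

class Reordering
{
    int x, y, a, b;

    void M()
    {
        x = y; // reads y, writes x
        a = b; // reads b, writes a; the read of b may legally move before the
               // read of y, since no single-threaded observer can tell the difference
    }
}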
The C# compiler and runtime have additional restrictions on how volatile reads and writes may be re-ordered with respect to each other, and furthermore how they may be ordered with respect to other events such as threads starting and stopping, locks, exceptions being thrown, and so on.
Consult the C# specification for a detailed list of the restrictions on observed orderings of reads, writes and other effects.
Note in particular that even with volatile variables, there is not required to be a consistent total ordering of all variable accesses as seen from all threads. And specifically, the notion that volatile "reads the latest value of the variable" is simply false; that phrasing suggests that there is such a thing as "the latest value", which implies a total consistent ordering.
If that sounds confusing, it is. Don't write multithreaded programs that share data across threads. If you must, use the highest-level abstractions at your disposal. Hardly anyone should be writing code that uses volatile; use the TPL and let it manage your threads.
Now let's consider your answer in the context of C.
The question is ill-posed with respect to C. In C# volatile is a modifier on member variable declarations; in C, it's part of a type. So saying "before a variable" is ambiguous; where before the variable? There's a difference between a volatile int * x and an int * volatile x. (Can you see the difference?)
But more importantly: the C specification does not guarantee that volatile will have any particular behaviour with respect to threads. If your C compiler does, that's an extension of the language by your compiler vendor. Volatile in C is guaranteed to have certain behaviour with respect to memory mapped IO, long jumps, and signals, and that's all; if you rely on it to have certain behaviour with respect to threads then you are writing non-portable code.
According to the interviewer: it's the other way around, and the volatile keyword warns the OS, not the compiler.
That's nonsense from start to finish. Interviewers should not ask questions that they don't understand the answers to.
To be honest, the question posed by the interviewer is somewhat foggy as it is.
It really depends on what they meant by "OS". Are they talking about the OS proper, the pure software side of things, or are they construing the "OS" as the hardware/software relationship, i.e. the RTE and MMM? (I've seen both assumptions in some of my own interviews.) It should be noted that these two are quite distinctly different. If they mean the former, then no, volatile does not "inform" the OS. If they mean the latter, then yes (a loose yes). At this point you are in the realm of the differences between the languages. As Cody Gray mentioned, C# is a managed language, so under the latter definition the "OS" does indeed "get notified" of the variable and the precautions to take.
But in any case, under either definition of "OS", the compiler does specially manage the volatile field, regardless of language. Otherwise, why have the keyword in the first place?
In my personal opinion, for whatever that's worth, I think you answered correctly, although, judging by the comments, the answer can get complicated and hectic by nature.

Can a read instruction after an unrelated lock statement be moved before the lock?

This question is a follow-up to comments in this thread.
Let's assume we have the following code:
// (1)
lock (padlock)
{
    // (2)
}
var value = nonVolatileField; // (3)
Furthermore, let's assume that no instruction in (2) has any effect on the nonVolatileField and vice versa.
Can the reading instruction (3) be reordered in such a way that it ends up before the lock statement (1) or inside it (2)?
As far as I can tell, nothing in the C# Specification (§3.10) and the CLI Specification (§I.12.6.5) prohibits such reordering.
Please note that this is not the same question as this one. Here I am asking specifically about read instructions, because as far as I understand, they are not considered side-effects and have weaker guarantees.
I believe this is partially guaranteed by the CLI spec, although it's not as clear as it might be. From I.12.6.5:
Acquiring a lock (System.Threading.Monitor.Enter or entering a synchronized method) shall implicitly perform a volatile read operation, and releasing a lock
(System.Threading.Monitor.Exit or leaving a synchronized method) shall implicitly perform a volatile write operation. See §I.12.6.7.
Then from I.12.6.7:
A volatile read has “acquire semantics” meaning that the read is guaranteed to occur prior to any references to memory that occur after the read instruction in the CIL instruction sequence. A volatile write has “release semantics” meaning that the write is guaranteed to happen after any memory references prior to the write instruction in the CIL instruction sequence.
So entering the lock should prevent (3) from moving to (1). Reading from nonVolatileField still counts as a "reference to memory", I believe. However, the read could still be performed before the volatile write when the lock exits, so it could still be moved to (2).
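If (3) must also be kept from moving up into the lock block, one option (a sketch, not the only approach) is an explicit full fence after the lock:

lock (padlock)
{
    // (2)
}
Thread.MemoryBarrier();       // full fence: the read below cannot move above this point
var value = nonVolatileField; // (3)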
The C#/CLI memory model leaves a lot to be desired at the moment. I'm hoping that the whole thing can be clarified significantly (and probably tightened up, to make some "theoretically valid but practically awful" optimizations invalid).
As far as .NET is concerned, entering a monitor (the lock statement) has acquire semantics, as it implicitly performs a volatile read, and exiting a monitor (the end of the lock block) has release semantics, as it implicitly performs a volatile write (see §12.6.5 Locks and Threads in Common Language Infrastructure (CLI) Partition I).
volatile bool areWeThereYet = false; // a field: volatile cannot be applied to locals

// In thread 1
// Accesses, usually writes: create objects, initialize them
areWeThereYet = true;

// In thread 2
if (areWeThereYet)
{
    // Accesses, usually reads: use created and initialized objects
}
When you write a value to areWeThereYet, all accesses before it were performed and not reordered to after the volatile write.
When you read from areWeThereYet, subsequent accesses are not reordered to before the volatile read.
In this case, when thread 2 observes that areWeThereYet has changed, it has a guarantee that the following accesses, usually reads, will observe the other thread's accesses, usually writes. Assuming there is no other code messing with the affected variables.
As for other synchronization primitives in .NET, such as SemaphoreSlim, although it's not explicitly documented, they would be rather useless if they didn't have similar semantics. Programs based on them could, in fact, fail to work correctly on platforms or hardware architectures with a weaker memory model.
Many people share the thought that Microsoft ought to enforce a strong memory model on such architectures, similar to x86/amd64, so as to keep the current code base (Microsoft's own and their clients') compatible.
I cannot verify this myself, as I don't have an ARM device with Microsoft Windows, much less one with the .NET Framework for ARM, but at least one MSDN Magazine article by Andrew Pardoe, CLR - .NET Development for ARM Processors, states:
The CLR is allowed to expose a stronger memory model than the ECMA CLI specification requires. On x86, for example, the memory model of the CLR is strong because the processor’s memory model is strong. The .NET team could’ve made the memory model on ARM as strong as the model on x86, but ensuring the perfect ordering whenever possible can have a notable impact on code execution performance. We’ve done targeted work to strengthen the memory model on ARM—specifically, we’ve inserted memory barriers at key points when writing to the managed heap to guarantee type safety—but we’ve made sure to only do this with a minimal impact on performance. The team went through multiple design reviews with experts to make sure that the techniques applied in the ARM CLR were correct. Moreover, performance benchmarks show that .NET code execution performance scales the same as native C++ code when compared across x86, x64 and ARM.

Do I need MemoryBarrier with ReaderWriterLockSlim?

It looks like the Mono implementation has no MemoryBarrier calls inside the ReaderWriterLockSlim methods. So when I make changes inside a write lock, can another thread which uses a read lock receive old cached values?
Is that really possible? Should I insert MemoryBarrier before and after the code inside the read and write locks?
Looking at (what I think is) the mono source, the Mono ReaderWriterLockSlim is implemented using Interlocked calls.
These calls include a memory barrier on x86, so you shouldn't need to add one.
As Peter correctly points out, the implementation does introduce a memory barrier, just not explicitly.
More generally: the C# language specification requires that certain side effects be well ordered with respect to locks. Though that rule only applies to locks entered with the C# lock statement, it would be exceedingly strange for a provider of a custom locking primitive to make a locking object that did not follow the same rules. You are wise to double-check, but in general you can assume that if it's a threading primitive, then it has been designed to ensure that important side effects are well ordered around it.
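A usage sketch under that assumption (all names invented): the primitive's own acquire/release semantics order the accesses, so no explicit MemoryBarrier appears anywhere:

class Cache
{
    static readonly ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
    static readonly Dictionary<string, string> cache = new Dictionary<string, string>();

    static string Get(string key)
    {
        cacheLock.EnterReadLock();
        try { return cache.TryGetValue(key, out var value) ? value : null; }
        finally { cacheLock.ExitReadLock(); }
    }

    static void Set(string key, string value)
    {
        cacheLock.EnterWriteLock();
        try { cache[key] = value; }
        finally { cacheLock.ExitWriteLock(); }
    }
}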

C# - Is "volatile" really needed as a keyword?

As I read deeper and deeper into the meaning of the volatile keyword, I keep saying to myself, "this is way down in the implementation; it should not be part of a high-level programming language".
I mean, the fact that CPUs cache data should be of interest to the JIT compiler, not to the C# programmer.
A conceivable alternative might be an attribute (say, VolatileAttribute).
What do you think?
I think you got side-tracked. All the tech stuff about caching etc. is part of an attempt to explain it in low-level terms. The functional description of volatile would be "I might be shared". Given that by default nothing can be shared between threads, this is not altogether strange. And I think it's fundamental enough to warrant a keyword over an attribute, though I suppose the choice was largely influenced by historic decisions (C++).
One way to replace/optimize it is with VolatileRead() and VolatileWrite() calls. But that's even more 'implementation'.
Well, I certainly agree that it is pretty horrible that such an implementation detail is exposed. It is, however, the exact same kind of detail that's exposed by the lock keyword. We are still very far removed from that bug generator being completely absent from our code.
The hardware guys have a lot of work to do. The volatile keyword matters a lot on CPU cores with a weak memory model. The marketplace hasn't been kind to them; the Alpha and the Itanium haven't done well. I'm not exactly sure why, but I suspect that the difficulty of writing solid threaded code for these cores has a lot to do with it. Getting it wrong is quite a nightmare to debug. The verbiage in the MSDN Library documentation for volatile applies to those kinds of processors; it is otherwise quite inappropriate for x86/x64 cores and makes it sound as though the keyword does far more than it really does. Volatile merely prevents variable values from being stored in CPU registers on those cores.
Unfortunately, volatile still matters on x86 cores in very select circumstances. I haven't yet found any evidence that it matters on x64 cores. As far as I can tell, and backed up by the source code in SSCLI20, the Opcodes.Volatile instruction is a no-op for the x64 jitter, changing neither the compiler state nor emitting any machine code. That's heading in the right direction.
Generic advice is that wherever you're contemplating volatile, using lock or one of the synchronization classes should be your first consideration (see the sketch below). Avoiding them to try to optimize your code is a micro-optimization, defeated by the amount of sleep you'll lose when your program starts exhibiting thread races.
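A minimal sketch of that "reach for lock first" advice (all names invented):

class Counter
{
    private readonly object sync = new object();
    private long total; // 64-bit, so plain access could tear on a 32-bit machine

    public void Add(long amount)
    {
        lock (sync) { total += amount; } // read-modify-write made atomic by the lock
    }

    public long Total
    {
        get { lock (sync) { return total; } } // the lock also keeps the 64-bit read safe
    }
}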
Using an attribute would be acceptable if it were the other way around; that is, if the compiler assumed that all variables are volatile unless explicitly marked with an attribute saying they were safe. That would be incredibly detrimental to performance.
Hence it is assumed that, since having a variable's value change outside the view of the compiler is an aberration, the compiler assumes it is not happening.
However, that can happen in a program, so the language itself must have a way of expressing it.
Also, you seem confused about "implementation details". The term refers to things the compiler does behind your back, which is not the case here: your code is modifying a variable outside the view of the compiler. Since it's in your code, that will always be true, so the language must be able to indicate it.
volatile in C# emits the correct barriers, or fences, which matter to the programmer doing multi-threaded work. Bear in mind that the compiler, the runtime, and the processor can all reorder reads/writes to some degree (each has its own rules). Though CLR 2.0 has a stronger memory model than what the ECMA CLI specifies, the CLR memory model is still not the strictest, so you have a need for volatile in C#.
As for an attribute: I don't think you can apply attributes inside a method body, so the keyword is necessary.
IIRC, in C++ volatile was mainly about memory-mapped I/O rather than caching: if you read the same port twice, you may get different answers. Still, I would agree with your assessment that this would be more cleanly expressed in C# as an attribute.
On the other hand, most actual uses of volatile in C# can better be understood as a thread lock anyway, so the choice of volatile may be a little unfortunate.
Edit: Just to add: two links to show that in C/C++, volatile is explicitly not for multithreading.

Why is volatile not enough?

I'm confused. The answers to my previous question seem to confirm my assumptions. But as stated here, volatile is not enough to assure atomicity in .NET. Either operations like increment and assignment in MSIL are not translated directly into a single native opcode, or many CPUs can simultaneously read and write to the same RAM location.
To clarify:
I want to know whether writes and reads are atomic on multiple CPUs.
I understand what volatile is about. But is it enough? Do I need to use interlocked operations if I want to get the latest value written by another CPU?
Herb Sutter recently wrote an article on volatile and what it really means (how it affects ordering of memory access and atomicity) in the native C++, .NET, and Java environments. It's a pretty good read:
volatile vs. volatile
volatile in .NET does make access to the variable atomic.
The problem is, that's often not enough. What if you need to read the variable, and if it is 0 (indicating that the resource is free), you set it to 1 (indicating that it's locked, and other threads should stay away from it).
Reading the 0 is atomic. Writing the 1 is atomic. But between those two operations, anything might happen. You might read a 0, and then, before you can write the 1, another thread jumps in, reads the 0, and writes a 1.
However, volatile in .NET does guarantee atomicity of accesses to the variable. It just doesn't guarantee thread safety for operations relying on multiple accesses to it. (Disclaimer: volatile in C/C++ does not even guarantee this. Just so you know. It is much weaker, and occasionally a source of bugs because people assume it guarantees atomicity.)
So you need to use locks as well, to group together multiple operations as one thread-safe chunk. (Or, for simple operations, the Interlocked operations in .NET may do the trick)
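A sketch of that read-then-write flag done safely with Interlocked, so the read and the write become one atomic operation (names invented):

class SpinFlag
{
    int lockFlag; // 0 = free, 1 = locked

    bool TryAcquire() =>
        Interlocked.CompareExchange(ref lockFlag, 1, 0) == 0; // store 1 only if it was 0

    void Release() => Volatile.Write(ref lockFlag, 0); // publish the unlock
}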
I might be jumping the gun here, but it sounds to me as though you're confusing two issues.
One is atomicity, which in my mind means that a single operation (that may require multiple steps) should not come in conflict with another such single operation.
The other is volatility, when is this value expected to change, and why.
Take the first. If your two-step operation requires you to read the current value, modify it, and write it back, you're most certainly going to want a lock, unless this whole operation can be translated into a single CPU instruction that can work on a single cache-line of data.
However, the second issue is, even when you're doing the locking thing, what will other threads see.
A volatile field in .NET is a field that the compiler knows can change at arbitrary times. In a single-threaded world, the change of a variable happens at some point in a sequential stream of instructions, so the compiler knows when it has emitted code that changes it, or at least when it has called out to the outside world which may or may not have changed it, so that once the call returns, the value might not be the same as before.
This knowledge allows the compiler to lift the value from the field into a register once, before a loop or similar block of code, and never re-read the value from the field for that particular code.
With multi-threading however, that might give you some problems. One thread might have adjusted the value, and another thread, due to optimization, won't be reading this value for some time, because it knows it hasn't changed.
So when you flag a field as volatile you're basically telling the compiler that it shouldn't assume that it has the current value of this at any point, except for grabbing snapshots every time it needs the value.
Locks solve multiple-step operations, volatility handles how the compiler caches the field value in a register, and together they will solve more problems.
Also note that if a field contains something that cannot be read in a single cpu-instruction, you're most likely going to want to lock read-access to it as well.
For instance, if you're on a 32-bit cpu and writing a 64-bit value, that write-operation will require two steps to complete, and if another thread on another cpu manages to read the 64-bit value before step 2 has completed, it will get half of the previous value and half of the new, nicely mixed together, which can be even worse than getting an outdated one.
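A sketch of guarding both sides of such a field with a lock, as suggested above (names invented):

class Wide
{
    readonly object guard = new object();
    long wideValue; // stored as two 32-bit halves on a 32-bit CPU

    void Update(long newValue) { lock (guard) { wideValue = newValue; } }
    long Sample() { lock (guard) { return wideValue; } } // never observes a torn value
}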
Edit: To answer the comment that volatile guarantees the atomicity of the read/write operation: that's true, in a way, because the volatile keyword cannot be applied to fields larger than 32 bits, which in effect makes the field readable/writable in a single CPU instruction on both 32- and 64-bit CPUs. And yes, it will prevent the value from being kept in a register as much as possible.
So part of the comment is wrong: volatile cannot be applied to 64-bit values.
Note also that volatile has some semantics regarding reordering of reads/writes.
For relevant information, see the MSDN documentation or the C# specification, found here, section 10.5.3.
On a hardware level, multiple CPUs can never write simultaneously to the same atomic RAM location. The size of an atomic read/write operation depends on the CPU architecture, but is typically 1, 2 or 4 bytes on a 32-bit architecture. However, if you try reading the result back, there is always a chance that another CPU has written to the same RAM location in between. On a low level, spin-locks are typically used to synchronize access to shared memory. In a high-level language, such mechanisms may be called e.g. critical regions.
The volatile type just makes sure the variable is written back to memory immediately when it is changed (even if the value is to be used again in the same function). A compiler will usually keep a value in an internal register for as long as possible if the value is to be reused later in the same function, storing it back to RAM when all modifications are finished or when the function returns. Volatile types are mostly useful when writing to hardware registers, or when you want to be sure a value is stored back to RAM in e.g. a multithreaded system.
Your question doesn't entirely make sense, because volatile specifies how the read happens, not the atomicity of multi-step processes. My car doesn't mow my lawn, either, but I try not to hold that against it. :)
The problem comes in with register-based cached copies of your variables' values.
When reading a value, the CPU will first see if it's in a register (fast) before checking main memory (slower).
Volatile tells the compiler to push the value out to main memory as soon as possible, and not to trust the cached register value. It's only useful in certain cases.
If you're looking for single-opcode writes, you'll need to use Interlocked.Increment and its related methods, but they're fairly limited in what they can do in a single safe instruction.
The safest and most reliable bet is to lock() (if you can't do an Interlocked.*).
Edit: Writes and reads are atomic if they're in a lock or an Interlocked.* statement. Volatile alone is not enough under the terms of your question.
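A minimal sketch of the Interlocked route mentioned above (names invented):

class Hits
{
    int hits; // shared counter

    void Record() => Interlocked.Increment(ref hits); // atomic read-modify-write, no lock needed
    int Snapshot() => Volatile.Read(ref hits);        // an ordered read of the counter
}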
Volatile is a compiler keyword that tells the compiler what to do. It does not necessarily translate into the (essentially) bus operations that are required for atomicity; that is usually left up to the operating system.
Edit: to clarify, volatile is never enough if you want to guarantee atomicity. Or rather, it's up to the compiler to make it enough or not.
