I just found out about the Interlocked class and now I have some basic questions.
From my understanding I should use Interlocked when manipulating numeric variables when multi-threading. If that statement is true, what about doing reads or just general using of the variables?
For example:
if ((iCount % 100) == 0)
Do I need to use an Interlocked statement there?
What about when I'm initializing the variable:
Int32 iCount = 0;
I need to make sure I understand this before implmenting it.
There are various factors here, but principally volatility and atomicity. In your statement:
if ((iCount % 100) == 0)
Do I need to use an Interlocked statement there?
we first need to ask "what is iCount?". If it is long / ulong, then it is not guaranteed to be atomic, so you absolutely need some kind of protection (such as via Interlocked) to avoid getting a "torn" value (reading it half-way through being updated, giving a phantom value that it never actually was - for example, changing from 00000000 to FFFFFFFF you could read 0000FFFF or FFFF0000). If it is an int, it will be guaranteed atomic. The next question is: do I need to see updates? The CPU has various levels of caching built in, and code that appears to read from a field can end up actually just reading from a local register or cache - and never touching the actual memory. If that is a risk, then you can mitigate that by using Interlocked, although in many cases using volatile will also guard against this.
The third and tricker question comes when discussing updates: do you want "last edit blindly wins"? if so, just update the value (presumably using volatile to allow reads) - however - there is a risk of lost updates if two threads are editing. As an example, if two threads each increment and decrement at the same time the final value could be 0 or 1 - not necessarily what you wanted. Intelocked offers ways to do updates with change-detection, and helper methods to do common operations like increment / decrement / add / etc.
Re your other question:
What about when I'm initializing the variable:
Int32 iCount = 0;
Field-initializers are only executed on one thread, so do not need additional protection - that is fine.
However! Threading is hard. If you are at all unsure, keep it simple: use lock. For example (assuming you want per-instance synchronisation):
private int iCount = 0;
private readonly object syncLock = new object();
...
lock(syncLock) {
// code that reads or manipulates iCount
}
In many cases, this works fine.
When doing multithreading over shared, mutable state you need to synchronize. You do not need to use Interlocked. Interlocked is for advanced users. I suggest you use the lock C# statement and only use Interlocked for easy cases (increment a shared counter) or performance critical cases.
Interlocked can only be used to access a single variable at a time and only quite primitive operations are supported. You will have a really hard time synchronizing multiple variables with Interlocked.
In your example, nobody can tell whether you need to synchronize or not because thread safety is a property of the whole program, not of a single statement or function. You need to regard all code operating on the shared state as a whole.
I would like to add to the other answers that Microsoft has additionally introduced ImmutableInterlocked class.
This class is designed for handling Immutable Collections. The class has a set of functions for updating immutable collections using Compare-And-Swap pattern.
You can find it in the System.Collections.Immutable namespace.
https://msdn.microsoft.com/en-us/library/system.collections.immutable.immutableinterlocked(v=vs.111).aspx
Related
Joe Albahari has a great series on multithreading that's a must read and should be known by heart for anyone doing C# multithreading.
In part 4 however he mentions the problems with volatile:
Notice that applying volatile doesn’t prevent a write followed by a
read from being swapped, and this can create brainteasers. Joe Duffy
illustrates the problem well with the following example: if Test1 and
Test2 run simultaneously on different threads, it’s possible for a and
b to both end up with a value of 0 (despite the use of volatile on
both x and y)
Followed by a note that the MSDN documentation is incorrect:
The MSDN documentation states that use of the volatile keyword ensures
that the most up-to-date value is present in the field at all times.
This is incorrect, since as we’ve seen, a write followed by a read can
be reordered.
I've checked the MSDN documentation, which was last changed in 2015 but still lists:
The volatile keyword indicates that a field might be modified by
multiple threads that are executing at the same time. Fields that are
declared volatile are not subject to compiler optimizations that
assume access by a single thread. This ensures that the most
up-to-date value is present in the field at all times.
Right now I still avoid volatile in favor of the more verbose to prevent threads using stale data:
private int foo;
private object fooLock = new object();
public int Foo {
get { lock(fooLock) return foo; }
set { lock(fooLock) foo = value; }
}
As the parts about multithreading were written in 2011, is the argument still valid today? Should volatile still be avoided at all costs in favor of locks or full memory fences to prevent introducing very hard to produce bugs that as mentioned are even dependent on the CPU vendor it's running on?
Volatile in its current implementation is not broken despite popular blog posts claiming such a thing. It is however badly specified and the idea of using a modifier on a field to specify memory ordering is not that great (compare volatile in Java/C# to C++'s atomic specification that had enough time to learn from the earlier mistakes). The MSDN article on the other hand was clearly written by someone who has no business talking about concurrency and is completely bogus.. the only sane option is to completely ignore it.
Volatile guarantees acquire/release semantics when accessing the field and can only be applied to types that allow atomic reads and writes. Not more, not less. This is enough to be useful to implement many lock-free algorithms efficiently such as non-blocking hashmaps.
One very simple sample is using a volatile variable to publish data. Thanks to the volatile on x, the assertion in the following snippet cannot fire:
private int a;
private volatile bool x;
public void Publish()
{
a = 1;
x = true;
}
public void Read()
{
if (x)
{
// if we observe x == true, we will always see the preceding write to a
Debug.Assert(a == 1);
}
}
Volatile is not easy to use and in most situations you are much better off to go with some higher level concept, but when performance is important or you're implementing some low level data structures, volatile can be exceedingly useful.
As I read the MSDN documentation, I believe it is saying that if you see volatile on a variable, you do not have to worry about compiler optimizations screwing up the value because they reorder the operations. It doesn't say that you are protected from errors caused by your own code executing operations on separate threads in the wrong order. (although admittedly, the comment is not clear as to this.)
volatile is a very limited guarantee. It means that the variable isn't subject to compiler optimizations that assume access from a single thread. This means that if you write into a variable from one thread, then read it from another thread, the other thread will definitely have the latest value. Without volatile, one a multiprocessor machine without volatile, the compiler may make assumptions about single-threaded access, for example by keeping the value in a register, which prevents other processors from having access to the latest value.
As the code example you've mentioned shows, it doesn't protect you from having methods in different blocks reordered. In effect volatile makes each individual access to a volatile variable atomic. It doesn't make any guarantees as to the atomicity of groups of such accesses.
If you just want to ensure that your property has an up-to-date single value, you should be able to just use volatile.
The problem comes in if you try to perform multiple parallel operations as if they were atomic. If you have to force several operations to be atomic together, you need to lock the whole operation. Consider the example again, but using locks:
class DoLocksReallySaveYouHere
{
int x, y;
object xlock = new object(), ylock = new object();
void Test1() // Executed on one thread
{
lock(xlock) {x = 1;}
lock(ylock) {int a = y;}
...
}
void Test2() // Executed on another thread
{
lock(ylock) {y = 1;}
lock(xlock) {int b = x;}
...
}
}
The locks cause may cause some synchronization, which may prevent both a and b from having value 0 (I have not tested this). However, since both x and y are locked independently, either a or b can still non-deterministically end up with a value of 0.
So in the case of wrapping the modification of a single variable, you should be safe using volatile, and would not really be any safer using lock. If you need to atomically perform multiple operations, you need to use a lock around the entire atomic block, otherwise scheduling will still cause non-deterministic behavior.
Here are some useful disassemblies for volatile in C#: https://sharplab.io/#gist:625b1181356b543157780baf860c9173
On x86 it is just about:
using memory instead of registers
preventing compiler optimizations like in the case with the endless loop
I use volatile when I just want to tell compiler that a field might be updated from many different threads and I do not need additional features provided by interlocked operations.
I have read the threading manual and relevant MSDN pages and SO questions several times. Still, I do not completely understand if Volatile.Read/Write and interlocked operations apply only to the relevant variables, or all read/writes before/after that operations.
E.g., imagine I have an array and a counter.
long counter = 0;
var values = new double[1000000];
values[42] = 3.1415;
// Is this line needed instead of simple assignment above,
// or the implicit full-fence of Interlocked will guarantee that
// all threads will see the values[42] after interlocked increment?
//Volatile.Write(ref values[42], 3.1415);
Interlocked.Increment(ref counter);
Does interlocked increment guarantees the same result as if I used Volatile.Write(ref values[42], 3.1415); instead of values[42] = 3.1415;.
What if I have an array of reference types, e.g. some POCO, and set an instance fields before interlocked increment. Does the implicit full fence apply to all read/writes from that thread before it, or only to the counter?
I am implementing a scalable reader/writer scheme and I found the following statement in the Joe Duffy post:
If the variables protected are references to heap objects, you need to worry about using the read protection each time you touch a field. Just like locks, this technique doesn’t compose. As with anything other than simple locking, use this technique with great care and caution; although the built-in acquire and release fences shield you from memory model reordering issues, there are some easy traps you can fall into.
Is this just a general statement to discourage using low-lock constructs, or somehow applies to the example above?
What you are probably missing is an understanding of fences. This is the best resource to read up on them: http://www.albahari.com/threading/part4.aspx
The short answer is Interlocked.Increment issues a full fence which is independent of the variable it is updating. I believe Volatile.Write issues a half fence. A half fence can be constructed from Thread.MemoryBarrier. When we say Interlocked.Increment issues a full fence it means that Thread.MemoryBarrier is called before and after the operation. Volatile.Write calls Thread.MemoryBarrier before the write and Volatile.Read after. The fences determine when memory access can be reordered (and it's not variable specific as Thread.MemoryBarrier is parameterless).
Given an array of struct:
public struct Instrument
{
public double NoS;
public double Last;
}
var a1 = new Instrument[100];
And a threading task pool that is writing to those elements on the basis that a single element may at most be written to by two threads concurrently, one for each of the double fields (there is upstream queuing by topic effectively).
And the knowledge that double's can be written atomically on 64 bit. (edit this mistakenly said 32 bit originally)
I need to periodically perform a calculation using all the values in the array and I'd like them to be consistent during the calc.
So I can snapshot the array with:
var snapshot = a1.Clone();
Now the question I have is with regards to the specifics of the syncronisation. If I make the members volatile, I don't think that is going to help the clone at all, as the read/write aquire/releases are not at the array level.
Now I could have an array lock, but this will add a lot of contention on the most frequent process of writing data into the array. So not ideal.
Alternatively I could have a per row lock, but that would be a real pain as they'd all need to be aquired prior to clone, meanwhile I've got the writes all backing up.
Now it doesn't really matter if the snapshot doesn't have the very latest value if its a matter of microseconds etc, so I think I could probably get away with just having no lock. My only concern is if there could be a scenario where there isn't a cache writeback for a sustained period. Is this something I should worry about? The writers are in TPL dataflow and the sole logic is to set the two fields in the struct. I don't really know how or if function scope tends to correlate to cache write backs though.
Thoughts/advice?
edit: What about if I used an interlocked write to the variables in the struct?
edit2: The volume of writes is MUCH higher than the reads. There are also two seperate and concurrent services writing to the Nos & Last fields. So they could be being written simultaneously at once. This causes problems with a reference object approach for atomicity.
edit3: Further detail. Assume array is from 30-1000 elements and each element could be being updated multiple times a second.
Since Instrument contains two doubles (two 64-bit values), you can't write it atomically (even on 64-bit machines). This means that the Clone method can never make a thread-safe copy without doing some kind of synchronization.
TLDR; Don't use a struct, use an immutable class.
You would probably have more luck with a small redesign. Try using immutable data structures and concurrent collections from the .NET framework. For instance, make your Instrument an immutable class:
// Important: Note that Instrument is now a CLASS!!
public class Instrument
{
public Instrument(double nos, double last)
{
this.NoS = nos;
this.Last = last;
}
// NOTE: Private setters. Class can't be changed
// after initialization.
public double NoS { get; private set; }
public double Last { get; private set; }
}
This way updating an Instrument means you have to create a new one, which makes it much easier to reason about this. When you are sure that only one thread is working with a single Instrument you are done, since a worker can now safely do this:
Instrument old = a[5];
var newValue = new Instrument(old.NoS + 1, old.Last - 10);
a[5] = newValue;
Since, reference types are 32-bit (or 64-bit on a 64-bit machine) updating the reference is garanteed to be atomic. The clone will now always result in a correct copy (it might lack behind, but that doesn't seem to be a problem for you).
UPDATE
After re-reading your question, I see that I misread it, since one thread is not writing to an Instrument, but is writing to an instrument value, but the solution is practically the same: use immutable reference types. One simple trick for instance, is to change the backing fields of the NoS and Last properties to objects. This makes updating them atomic:
// Instrument can be a struct again.
public struct Instrument
{
private object nos;
private object last;
public double NoS
{
get { return (double)(this.nos ?? 0d); }
set { this.nos = value; }
}
public double Last
{
get { return (double)(this.last ?? 0d); }
set { this.last = value; }
}
}
When changing one of the properties, the value will be boxed, and boxed values are immutable reference types. This way you can safely update those properties.
And the knowledge that double's can be written atomically on 32 bit.
No, that is not guaranteed:
12.5 Atomicity of variable references
Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short,
ushort, uint, int, float, and reference types. In addition, reads and
writes of enum types with an underlying type in the previous list
shall also be atomic. Reads and writes of other types, including long,
ulong, double, and decimal, as well as user-defined types, need not
be atomic.
(emphasis mine)
No guarantee is made regarding doubles on 32-bit, or even on 64-bit. A strcut composed of 2 doubles is even more problematic. You should rethink your strategy.
You could (ab)use a ReaderWriterLockSlim.
Take a read lock when writing (since you say there is no contention between writers).
And take a write lock when cloning.
Not sure I'd do this though unless there's really no alternative. Could be confusing for whoever maintains this down the line.
Reads and writes of individual array elements, or individual struct fields, are generally independent. If while one thread is writing a particular field of a particular struct instance, no other thread will attempt to access that same field, an array of structs will be implicitly threadsafe without any locking required beyond the logic that enforces the above conditions.
If it is possible that one thread might try to read a double while another thread is writing it, but it's not possible that two threads might try to write simultaneously, there are a number of approaches you can take to ensure that a read won't see a partially-written value. One which hasn't been mentioned yet would be to define an int64 field, and use custom methods to read and write double values there (bitwise-converting them, and using Interlocked as needed).
Another approach would be to have a changeCount variable for each array slot, which gets incremented so the two LSB's are "10" before anything else before the struct is written, and Interlocked.Increment it by 2 afterward (see note below). Before code reads the struct, it should check whether a write is in progress. If not, it should perform the read and ensure a write hasn't started or happened (if a write occurred after the read was started, loop back to the beginning). If a write is in progress when code wants to read, it should acquire a shared lock, check whether the write is still in progress, and if so use an interlocked operation to set the LSB of changeCount and Monitor.Wait on the lock. The code which wrote the struct should notice in its Interlocked.Increment that the LSB got set, and should Pulse the lock. If the memory model ensures that reads by a single thread will be processed in order, and that writes by a single thread will be processed in order, and if only one thread will ever try to write an array slot at a time, this approach should limit the multi-processor overhead to a single Interlocked operation in the non-contention case. Note that one must carefully study the rules about what is or is not implied by the memory model before using this sort of code, since it can be tricky.
BTW, there are two more approaches one could take if one wanted to have each array element be a class type rather than a struct:
Use an immutable class type, and use `Interlocked.CompareExchange` any time you want to update an element. The pattern to use is this:
MyClass oldVal,newVal;
do
{
oldVal = theArray[subscript];
newVal = new MyClass(oldVal.this, oldVal.that+5); // Or whatever change
} while (Threading.Interlocked.CompareExchange(theArray[subscript], newVal, oldVal) != oldVal);
This approach will always yield a logically-correct atomic update of the array element. If, between the time the array element is read and the time it is updated, something else changes the value, the `CompareExchange` will leave the array element unaffected, and the code will loop back and try again. This approach works reasonably well in the absence of contention, though every update will require generating a new object instance. If many threads are trying to update the same array slot, however, and the constructor for `MyClass` takes any significant amount of time to execute, it's possible for code to thrash, repeatedly creating new objects and then finding out they're obsolete by the time they could be stored. Code will always make forward progress, but not necessarily quickly.
Use a mutable class, and lock on the class objects any time one wishes to read or write them. This approach would avoid having to create new class object instances any time something is changed, but locking would add some overhead of its own. Note that both reads and writes would have to be locked, whereas the immutable-class approach only required `Interlocked` methods to be used on writes.
I tend to think arrays of structs are nicer data holders than arrays of class objects, but both approaches have advantages.
Ok, so had a think about this over lunch.
I see two, possibly 3 solutions here.
First important note: The immutable idea does not work in my use case because I have two services running in parallel writing to NoS and Last independently. This means that I would need an extra layer of sync logic between those two services to ensure that whilst the new ref is being created by one services, the other one is not doing the same. Classic race condition problem so definitely not right for this problem (although yes I could have a ref for each double and do it that way but its getting ridiculous at that point)
Solution 1
Whole cache level lock. Maybe use a spinlock and just lock for all updates and the snapshot (with memcpy). This is simplest and probably totally fine for volumes I'm talking about.
Solution 2
Make all writes to doubles use interlocked writes. when I want to snapshot, iterate the array and each value using interlocked read to populate the copy. This may cause per struct tearing but the doubles are intact which is fine as this is continuously updating data so the concept of latest is a little abstract.
Solution 3
Don't think this will work, but what about interlocked writes to all doubles, and then just use memcopy. I am not sure if I will get tearing of the doubles though? (remember I don't care about tearing at struct level).
If solution 3 works then I guess its best performance, but otherwise I am more inclined for solution 1.
I have a thread that spins until an int changed by another thread is a certain value.
int cur = this.m_cur;
while (cur > this.Max)
{
// spin until cur is <= max
cur = this.m_cur;
}
Does this.m_cur need to be declared volatile for this to work? Is it possible that this will spin forever due to compiler optimization?
Yes, that's a hard requirement. The just-in-time compiler is allowed to store the value of m_cur in a processor register without refreshing it from memory. The x86 jitter in fact does, the x64 jitter doesn't (at least the last time I looked at it).
The volatile keyword is required to suppress this optimization.
Volatile means something entirely different on Itanium cores, a processor with a weak memory model. Unfortunately that's what made it into the MSDN library and C# Language Specification. What it is going to to mean on an ARM core remains to be seen.
The blog below has some fascinating detail on the memory model in c#. In short, it seems safer to use the volatile keyword.
http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
From the blog below
class Test
{
private bool _loop = true;
public static void Main()
{
Test test1 = new Test();
// Set _loop to false on another thread
new Thread(() => { test1._loop = false;}).Start();
// Poll the _loop field until it is set to false
while (test1._loop == true) ;
// The loop above will never terminate!
}
}
There are two possible ways to get the while loop to terminate: Use a
lock to protect all accesses (reads and writes) to the _loop field
Mark the _loop field as volatile There are two reasons why a read of a
non-volatile field may observe a stale value: compiler optimizations
and processor optimizations.
It depends on how m_cur is being modified. If it's using a normal assignment statement such as m_cur--;, then it does need to be volatile. However, if it's being modified using one of the Interlocked operations, then it doesn't because Interlocked's methods automatically insert a memory barrier to ensure that all threads get the memo.
In general, using Interlocked to modify atomic valued that are shared across threads is the preferable option. Not only does it take care of the memory barrier for you, but it also tends to be a bit faster than other synchronization options.
That said, like others have said polling loops are enormously wasteful. It would be better to pause the thread that needs to wait, and let whoever is modifying m_cur take charge of waking it up when the time comes. Both Monitor.Wait() and Monitor.Pulse() and AutoResetEvent might be well-suited to the task, depending on your specific needs.
Suppose I have a variable "counter", and there are several threads accessing and setting the value of "counter" by using Interlocked, i.e.:
int value = Interlocked.Increment(ref counter);
and
int value = Interlocked.Decrement(ref counter);
Can I assume that, the change made by Interlocked will be visible in all threads?
If not, what should I do to make all threads synchronize the variable?
EDIT: someone suggested me to use volatile. But when I set the "counter" as volatile, there is compiler warning "reference to volatile field will not be treated as volatile".
When I read online help, it said, "A volatile field should not normally be passed using a ref or out parameter".
InterlockedIncrement/Decrement on x86 CPUs (x86's lock add/dec) are automatically creating memory barrier which gives visibility to all threads (i.e., all threads can see its update as in-order, like sequential memory consistency). Memory barrier makes all pending memory loads/stores to be completed. volatile is not related to this question although C# and Java (and some C/C++ compilers) enforce volatile to make memory barrier. But, interlocked operation already has memory barrier by CPU.
Please also take a look my another answer in stackoverflow.
Note that I have assume that C#'s InterlockedIncrement/Decrement are intrinsic mapping to x86's lock add/dec.
Can I assume that, the change made by Interlocked will be visible in all threads?
This depends on how you read the value. If you "just" read it, then no, this won't always be visible in other threads unless you mark it as volatile. That causes an annoying warning though.
As an alternative (and much preferred IMO), read it using another Interlocked instruction. This will always see the updated value on all threads:
int readvalue = Interlocked.CompareExchange(ref counter, 0, 0);
which returns the value read, and if it was 0 swaps it with 0.
Motivation: the warning hints that something isn't right; combining the two techniques (volatile & interlocked) wasn't the intended way to do this.
Update: it seems that another approach to reliable 32-bit reads without using "volatile" is by using Thread.VolatileRead as suggested in this answer. There is also some evidence that I am completely wrong about using Interlocked for 32-bit reads, for example this Connect issue, though I wonder if the distinction is a bit pedantic in nature.
What I really mean is: don't use this answer as your only source; I'm having my doubts about this.
Actually, they aren't. If you want to safely modify counter, then you are doing the correct thing. But if you want to read counter directly you need to declare it as volatile. Otherwise, the compiler has no reason to believe that counter will change because the Interlocked operations are in code that it might not see.
Interlocked ensures that only 1 thread at a time can update the value. To ensure that other threads can read the correct value (and not a cached value) mark it as volatile.
public volatile int Counter;
No; an Interlocked-at-Write-Only alone does not ensure that variable reads in code are actually fresh; a program that does not correctly read from a field as well might not be Thread-Safe, even under a "strong memory model". This applies to any form of assigning to a field shared between threads.
Here is an example of code that will never terminate due to the JIT. (It was modified from Memory Barriers in .NET to be a runnable LINQPad program updated for the question).
// Run this as a LINQPad program in "Release Mode".
// ~ It will never terminate on .NET 4.5.2 / x64. ~
// The program will terminate in "Debug Mode" and may terminate
// in other CLR runtimes and architecture targets.
class X {
// Adding {volatile} would 'fix the problem', as it prevents the JIT
// optimization that results in the non-terminating code.
public int terminate = 0;
public int y;
public void Run() {
var r = new ManualResetEvent(false);
var t = new Thread(() => {
int x = 0;
r.Set();
// Using Volatile.Read or otherwise establishing
// an Acquire Barrier would disable the 'bad' optimization.
while(terminate == 0){x = x * 2;}
y = x;
});
t.Start();
r.WaitOne();
Interlocked.Increment(ref terminate);
t.Join();
Console.WriteLine("Done: " + y);
}
}
void Main()
{
new X().Run();
}
The explanation from Memory Barriers in .NET:
This time it is JIT, not the hardware. It’s clear that JIT has cached the value of the variable terminate [in the EAX register and the] program is now stuck in the loop highlighted above ..
Either using a lock or adding a Thread.MemoryBarrier inside the while loop will fix the problem. Or you can even use Volatile.Read [or a volatile field]. The purpose of the memory barrier here is only to suppress JIT optimizations. Now that we have seen how software and hardware can reorder memory operations, it’s time to discuss memory barriers ..
That is, an additional barrier construct is required on the read side to prevent issues with Compilation and JIT re-ordering / optimizations: this is a different issue than memory coherency!
Adding volatile here would prevent the JIT optimization, and thus 'fix the problem', even if such results in a warning. This program can also be corrected through the use of Volatile.Read or one of the various other operations that cause a barrier: these barriers are as much a part of the CLR/JIT program correctness as the underlying hardware memory fences.