I have a data class with lots of data in it (TV schedule data).
The data is queried from one side and periodically updated from the other side.
There are two threads: the first thread queries the data on request and the second thread updates the data on regular intervals.
To prevent locking, I use two instances (copies) of the data class: the live instance and the backup instance.
Initially, both instances are filled with the same data. The first thread only reads from the live instance.
The second thread periodically updates both instances as follows:
Update the backup instance.
Swap the backup and live instance (i.e. the backup instance becomes the live instance).
Update the backup instance.
Both backup instance and live instance are now up-to-date.
My question is: how should I use the volatile keyword here?
public class data
{
// Lots of fields here.
// Should these fields also be declared volatile?
}
I have already made the references volatile:
public volatile data live;
public volatile data backup;
Fields should be declared volatile if you plan to modify them outside locks, or without Interlocked. Here is the best article that explains volatile in depth: http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
To be honest, I would just lock on it. The correctness is so much easier to check, and the need for the backup is removed.
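For illustration, a minimal sketch of the lock-based version (the names here are mine, not from the question): one shared instance, one lock, and no backup copy.

public class ScheduleHolder
{
    private readonly object padlock = new object();
    private readonly data current = new data();

    // Reader thread: read whatever it needs while holding the lock.
    public bool QuerySomething()
    {
        lock (padlock)
        {
            return current.Field1; // illustrative field
        }
    }

    // Updater thread: rewrite the fields in place while holding the lock.
    public void Update()
    {
        lock (padlock)
        {
            // ... update current's schedule fields here ...
        }
    }
}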
With your plan here, the fields would also have to be volatile. Consider the case otherwise:
public class Data
{
public int SimpleInt;
}
Here we have just a single public field for simplicity; the same applies to more realistic structures. (Incidentally, capitals for class names are the more common convention in C#.)
Now consider live.SimpleInt as seen by thread A. Because live could be cached, we need to have it as volatile. However, consider what happens when the object is swapped with backup and then swapped back to live: live will have the same memory location as it did before (unless the GC has moved it). Therefore live.SimpleInt will have the same memory location as it did before, and if it was not volatile, thread A may be using a cached version of live.SimpleInt.
However, if you created a new Data object, rather than swapping in and out, then the new value of live.SimpleInt will not be in the thread's cache, and it could be safely non-volatile.
It's also important to consider that the fields of the fields will have to be volatile too.
Indeed, now you need just one stored Data object. The new one is created as an object referenced by only one thread (hence it cannot be damaged by, or do damage to, another thread). Its creation is based on values read from live, which is also safe because the other thread only reads it (barring some memoisation techniques that mean "reads" are really writes behind the scenes, reads can't harm other reads, though they can be harmed by writes). The new object is altered while visible to just a single thread, so only the final write requires any concern about synchronisation, and that write should indeed be safe with only volatile or a MemoryBarrier used for protection, since assigning a reference is atomic and you don't care about the old value anymore.
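A sketch of that single-object approach (BuildFrom is an illustrative name, not from the question):

public class ScheduleService
{
    private volatile data live = new data();

    // Updater thread: build a brand-new object based on the current one,
    // then publish it with a single atomic reference assignment.
    public void UpdateTick()
    {
        data fresh = BuildFrom(live); // only reads live, so safe alongside readers
        live = fresh;                 // volatile write publishes the new object
    }

    // Reader thread: capture the reference once and work from that snapshot.
    public void HandleQuery()
    {
        data snapshot = live;         // volatile read
        // ... read snapshot's fields freely; it is never mutated once published
    }

    private data BuildFrom(data source)
    {
        // illustrative: copy source and apply the latest schedule changes
        return new data();
    }
}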
I do not think you are going to get the effect you want by marking things with volatile. Consider this code.
volatile data live;

void Thread1()
{
    if (live.Field1)
    {
        Console.WriteLine(live.Field1);
    }
}
In the example above false could be written to the console if the second thread swapped the live and backup references between the time the first thread entered the if and called Console.WriteLine.
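If the problem does concern you, the usual fix is to read the reference into a local once, so the test and the use both see the same instance (a sketch):

void Thread1()
{
    data snapshot = live;           // a single volatile read of the reference
    if (snapshot.Field1)
    {
        // Same instance the test saw, even if 'live' was swapped meanwhile.
        Console.WriteLine(snapshot.Field1);
    }
}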
If that problem does not concern you then all you really need to do is mark the live variable as volatile. You do not need to mark the individual fields in data as volatile. The reason is that volatile reads create acquire-fence memory barriers and volatile writes create release-fence memory barriers. What that means is that when thread 2 swaps the references, all writes to the individual fields of data must commit first, and when thread 1 wants to read the individual fields of the live instance, the live variable must be reacquired from main memory first. You do not need to mark the backup variable as volatile because it is never used by thread 1.
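For illustration, the updater's side of that scheme might look like this (UpdateFields is an illustrative name; note this sketch still inherits the swap race described above):

void Thread2()
{
    UpdateFields(backup);   // ordinary writes to the backup's fields
    data temp = backup;
    backup = live;
    live = temp;            // volatile write: a release fence, so the field
                            // writes above cannot be reordered past it
    UpdateFields(backup);   // bring the old live copy up to date as well
}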
The advanced threading section in Joe Albahari's ebook goes into a great deal of detail on the semantics of volatile and should explain why you only need to mark your live reference as such.
Related
If I have a local variable like so:
void Increment()
{
    int i = getFromDb(); // get count for a customer from db
}
And this is an instance class which gets incremented (each time a customer - an instance object - makes a purchase), is this variable thread safe? I hear that local variables are thread safe because each thread gets its own stack, etc etc.
Also, am I right in thinking that this variable is shared state? What I'm lacking in the thinking department is this: my assumption is that because this variable will be working with different customer objects (e.g. John, Paul, etc.), it is thread safe, but I suspect that is flawed thinking and a bit of inexperience in concurrent programming. This sounds very naive, but then I don't have a lot of experience in concurrent coding like I do in general, synchronous coding.
EDIT: Also, the function call getFromDb() isn't part of the question and I don't expect anyone to guess on its thread safety as it's just a call to indicate the value is assigned from a function which gets data from the db. :)
EDIT 2: Also, getFromDb's thread safety is guaranteed as it only performs read operations.
i is declared as a local (method) variable, so it only normally exists in the stack-frame of Increment() - so yes, i is thread safe... (although I can't comment on getFromDb).
except if:
Increment is an iterator block (i.e. uses yield return or yield break)
i is used in an anonymous method (delegate { i = i + 1; }) or lambda (foo => { i = i + foo; })
In the above two scenarios, there are some cases when it can be exposed outside the stack. But I doubt you are doing either.
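For example, here is a sketch (mine, purely illustrative) of a local escaping its stack-frame via a lambda:

void Example()
{
    int i = 0;                      // looks like a plain local...
    new Thread(() => i++).Start();  // ...but the lambda captures it, so the
                                    // compiler hoists it to the heap and it is
                                    // now shared between two threads
}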
Note that fields (variables on the class) are not thread-safe, as they are trivially exposed to other threads. This is even more noticeable with static fields, since all threads automatically share the same field (except for thread-static fields).
Your statement has two separate parts - a function call, and an assignment.
The assignment is thread safe, because the variable is local. Every different invocation of this method will get its own version of the local variable, each stored in a different stack frame in a different place in memory.
The call to getFromDb() may or may not be threadsafe - depending on its implementation.
As long as the variable is local to the method it is thread-safe. If it was a static variable, then it wouldn't be by default.
class Example
{
    static int var1; // not thread-safe

    public void Method1()
    {
        int var2; // thread-safe
    }
}
While your int i is thread safe, your whole case is probably not thread safe. Your int i is, as you said, thread safe because each thread has its own stack and therefore each thread has its own i. However, your threads all share the same database, therefore your database accesses are not thread safe. You need to properly synchronize your database accesses to make sure that each thread will see the database only at the correct moment.
As usual with concurrency and multithreading, you do not need to synchronize on your DB if you only read information. You do need to synchronize as soon as two threads will try to read/write the same set of information in your DB.
i will be "thread safe" as each thread will have its own copy of i on the stack, as you suggest. The real question is: are the contents of getFromDb() thread safe?
i is a local variable so it's not shared state.
If your getFromDb() is reading from, say, an Oracle sequence or a SQL Server autoincrement field, then the DB is taking care of synchronization (in most scenarios, excluding replication/distributed DBs), so you can probably safely return the result to any calling thread. That is, the DB is guaranteeing that every getFromDb() call will get a different value.
Thread-safety is usually a little bit of work - changing the type of a variable will rarely get you thread safety since it depends on how your threads will access the data. You can save yourself some headache by reworking your algorithm so that it uses a queue that all consumers synchronize against instead of trying to orchestrate a series of locks/monitors. Or better yet make the algorithm lock-free if possible.
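As a sketch of that queue-based shape (assuming .NET 4's System.Collections.Concurrent is available):

using System.Collections.Concurrent;

var queue = new ConcurrentQueue<int>();

// Any producer thread:
queue.Enqueue(42);

// Any consumer thread:
int value;
if (queue.TryDequeue(out value))
{
    Console.WriteLine(value);
}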
i is thread safe syntactically. But when you assign an instance field's value, or an instance method's return value, to i, the shared data behind it is still being manipulated by multiple threads.
If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions, but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache. We think of lock blocks as protecting the variables read and modified within the block, which means:
Assuming you've properly implemented locking where necessary, those variables can only be read and written to by one thread at a time, and
Reads within the lock block see the latest versions of a variable and writes within the lock block become visible to all threads.
(Right?)
This second point is what interests me. Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads? Pardon my mental fuzziness here about how caches work, but I've read that caches hold several multi-byte "lines" of data. I think what I'm asking is, does a memory barrier force synchronization of all "dirty" cache lines or just some, and if just some, what determines which lines get synchronized?
If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions...
Right. The specification guarantees that.
but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache.
The C# specification says nothing whatsoever about "CPU cache". You've left the realm of what is guaranteed by the specification, and entered the realm of implementation details. There is no requirement that an implementation of C# execute on a CPU that has any particular cache architecture.
Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads?
Rather than try to parse your either-or question, let's say what is actually guaranteed by the language. A special effect is:
Any write to a variable, volatile or not
Any read of a volatile field
Any throw
The order of special effects is preserved at certain special points:
Reads and writes of volatile fields
locks
thread creation and termination
The runtime is required to ensure that special effects are ordered consistently with special points. So, if there is a read of a volatile field before a lock, and a write after, then the read can't be moved after the write.
So, how does the runtime achieve this? Beats the heck out of me. But the runtime is certainly not required to "guarantee that all memory is fresh for all threads". The runtime is required to ensure that certain reads, writes and throws happen in chronological order with respect to special points, and that's all.
In particular, the runtime is not required to ensure that all threads observe the same order.
Finally, I always end these sorts of discussions by pointing you here:
http://blog.coverity.com/2014/03/26/reordering-optimizations/
After reading that, you should have an appreciation for the sorts of horrid things that can happen even on x86 when you act casual about eliding locks.
Reads within the lock block see the latest versions of a variable and writes within the lock block are visible to all threads.
No, that's definitely a harmful oversimplification.
When you enter the lock statement, there's a memory fence which sort of means that you'll always read "fresh" data. When you exit the lock statement, there's a memory fence which sort of means that all the data you've written is guaranteed to be written to main memory and available to other threads.
The important point is that if multiple threads only ever read/write memory when they "own" a particular lock, then by definition one of them will have exited the lock before the next one enters it... so all those reads and writes will be simple and correct.
If you have code which reads and writes a variable without taking a lock, then there's no guarantee that it will "see" data written by well-behaved code (i.e. code using the lock), or that well-behaved threads will "see" the data written by that bad code.
For example:
private readonly object padlock = new object();
private int x;

public void A()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x++;
    }
}

public void B()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x--;
    }
}

public void C()
{
    // Might not see changes made in A, B, or C. Changes made here
    // might not be visible in other threads calling A, B or C.
    x = x + 10;
}
Now it's more subtle than that, but that's why using a common lock to protect a set of variables works.
I have a question related to the C# memory model and threads. I am not sure if the following code is correct without the volatile keyword.
public class A
{
    private int variableA = 0;

    public A()
    {
        variableA = 1;
        new Thread(new ThreadStart(printA)).Start();
    }

    private void printA()
    {
        System.Console.WriteLine(variableA);
    }
}
My concern is if it is guaranteed that the Thread B will see variableA with value 1 without using volatile? In the main thread I am only assigning 1 to variableA in the constructor. After that I am not touching variableA, it is used only in the Thread B, so locking is probably not necessary.
But, is it guaranteed that the main thread will flush its cache and write the variableA contents to the main memory, so the second thread can read the newly assigned value?
Additionally, is it guaranteed that the second thread will read the variableA contents from the main memory? Could some compiler optimization occur so that Thread B reads the variableA contents from the cache instead of the main memory? That may happen when the order of the instructions is changed.
For sure, adding volatile to the variableA declaration will make the code correct. But, is it necessary? I am asking because I wrote some code where some non-volatile variables are initialized in the constructor and are used later by some Timer threads, and I am not sure if it is totally correct.
What about the same code in Java?
There are a lot of places where implicit memory barriers are created. This is one of them. Starting a thread creates a full barrier. So the write to variableA will get committed before the thread starts, and the first reads will be acquired from main memory. Of course, in Microsoft's implementation of the CLR that is somewhat of a moot point because writes already have volatile semantics. But the same guarantee is not made in the ECMA specification, so it is theoretically possible that the Mono implementation could behave differently in this regard.
My concern is if it is guaranteed that the Thread B will see variableA with value 1 without using volatile?
In this case... yes. However, if you continue to use variableA in the second thread, there is no guarantee after the first read that it will see updates.
But, is it guaranteed that the main thread will flush its cache and write the variableA contents to the main memory, so the second thread can read the newly assigned value?
Yes.
Additionally, is it guaranteed that the second thread will read the variableA contents from the main memory?
Yes, but only on the first read.
For sure, adding volatile to the variableA declaration will make the code correct. But, is it necessary?
In this very specific and narrow case...no. But, in general it is advised that you use the volatile keyword in these scenarios. Not only will it make your code thread-safe as the scenario gets more complicated, but it also helps to document the fact that the field is going to be used by more than one thread and that you have considered the implications of using a lock-free strategy.
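For example, the same class from the question with the field marked volatile (a sketch of the advice above):

public class A
{
    private volatile int variableA; // documents cross-thread use and keeps
                                    // later reads and writes safe as the
                                    // code grows more complicated

    public A()
    {
        variableA = 1;
        new Thread(new ThreadStart(printA)).Start();
    }

    private void printA()
    {
        System.Console.WriteLine(variableA);
    }
}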
The same code in Java is definitely okay - the creation of a new thread acts as a sort of barrier, effectively. (All actions earlier in the program text than the thread creation "happen before" the new thread starts.)
I don't know what's guaranteed in .NET with respect to new thread creation, however. Even more worrying is the possibility of a delayed read when using Control.BeginInvoke and the like... I haven't seen any guarantees around memory barriers for those situations.
To be honest, I suspect it's fine. I suspect that anything which needs to coordinate between threads like this (either creating a new one or marshalling a call onto an existing one) will use a full memory barrier on both of the threads involved. However, you're absolutely right to be concerned, and I'm hoping that you'll get a more definitive answer from someone smarter than me. You might want to email Joe Duffy to get his point of view on this...
But, is it guaranteed that the main thread will flush its cache and write the variableA contents to the main memory,
Yes, this is guaranteed by the MS CLR memory model. Not necessarily so for other implementations of the CLI (i.e. I'm not sure about Mono). The ECMA standard does not require it.
so the second thread can read the newly assigned value?
That requires that the cache has been refreshed. It is probably guaranteed by the creation of the Thread (like Jon Skeet said). It is however not guaranteed by the previous point. The cache is flushed on each write but not on each read.
You could make very sure by using VolatileRead(ref variableA) but it is recommended (Jeffrey Richter) to use the Interlocked class. Note that VolatileWrite() is superfluous in MS.NET.
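In code, the two options look roughly like this (a sketch; variableA is the field from the question):

// Option 1: an explicit volatile read.
int value = Thread.VolatileRead(ref variableA);

// Option 2 (Richter's recommendation): go through Interlocked; a
// compare-exchange with identical values amounts to an atomic read.
int value2 = Interlocked.CompareExchange(ref variableA, 0, 0);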
I have a class that has a few arraylists in it.
My main class creates a new instance of this class. My main class has at least 2 threads adding and removing from my class with the arraylists in it. At the moment everything is running fine but I was just wondering if it would be safer to declare my class with the arraylists in it as volatile eg/
private volatile MyClass myclass;
myclass = new MyClass();
......
myclass.Add(...);
myclass.Clear(...);
Using the volatile keyword will not make your code thread-safe in this example. The volatile keyword is typically used to ensure that when reading or writing the value of a variable (i.e. class field) that the latest value for that variable is either read from main memory or written straight to main memory, rather than read from cache (e.g. a CPU register) for example. The volatile keyword is a way of saying "do not use caching optimizations with this shared field", and removes the issue where threads may use local copies of a field and so not see each other's updates.
In your case the value of myclass is not actually being updated (i.e. you are not re-assigning myclass) so volatile is not useful for you, and it is not the update of the myclass variable you actually want to make thread-safe in this case anyway.
If you wish to make updating of the actual class thread-safe, then using a "lock" around "Add" and "Clear" is a straight-forward alternative. This will ensure that only one thread at a time can do these operations (which update the internal state of myclass) and so should not be done in parallel.
A lock can be used as follows:
private readonly object syncObj = new object();
private readonly MyClass myclass = new MyClass();
......
lock (syncObj)
{
    myclass.Add(...);
}

lock (syncObj)
{
    myclass.Clear(...);
}
You also need to add locking around any code that reads the state being updated by "Add", if there is any; it does not appear in your example code.
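For example (a sketch; Count stands in for whatever state your reads actually touch):

int count;
lock (syncObj)
{
    count = myclass.Count; // reads take the same lock as Add and Clear
}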
It may not be obvious when first writing multi-threaded code why you would need a lock when adding to a collection. If we take List or ArrayList as an example, the problem arises because internally these collections use an Array as a backing store and will dynamically "grow" this Array (i.e. by creating a new larger Array and copying the old contents) as certain capacities are met when Add is called. This all happens internally and requires the maintenance of this Array and variables such as the current size of the collection (rather than the Length of the actual array, which might be larger). So Adding to the collection may involve multiple steps if the internal Array needs to grow. When using multiple threads in an unsafe manner, multiple threads may indirectly cause growing to happen when Adding, and thus trample all over each other's updates. As well as the issue of multiple threads Adding at the same time, there is also the issue that another thread may be trying to read the collection whilst the internal state is being changed. Using locks ensures that operations like these are done without interference from other threads.
At present, the code is wrong; adding a volatile keyword won't fix it. It's not safe to use the .NET classes across threads without adding synchronisation.
It's hard to give straightforward advice without knowing more about the structure of your code. A first step would be to start using the lock keyword around all accesses to the list object; however, there could still be assumptions in the code that don't work across multiple threads.
It's possible to use a collection class that's already safe for multithreaded access, which would avoid the need for getting the lock keyword in the right place, but it's still possible to make errors.
Can you post some more of your code? That way we can give more specific suggestions about making it thread safe.
I'm looking at the implementation of the VolatileRead/VolatileWrite methods (using Reflector), and I'm puzzled by something.
This is the implementation for VolatileRead:
[MethodImpl(MethodImplOptions.NoInlining)]
public static int VolatileRead(ref int address)
{
    int num = address;
    MemoryBarrier();
    return num;
}
How come the memory barrier is placed after reading the value of "address"? Isn't it supposed to be the opposite (placed before reading the value, so any pending writes to "address" will be completed by the time we make the actual read)?
The same goes for VolatileWrite, where the memory barrier is placed before the assignment of the value. Why is that?
Also, why do these methods have the NoInlining attribute? What could happen if they were inlined?
I thought that until recently. Volatile reads aren't what you think they are - they're not about guaranteeing that they get the most recent value; they're about making sure that no read which is later in the program code is moved to before this read. That's what the spec guarantees - and likewise for volatile writes, it guarantees that no earlier write is moved to after the volatile one.
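A sketch of what that ordering guarantee buys you (flag and payload are illustrative fields of my own):

private int payload;
private int flag;

void Writer()
{
    payload = 42;
    Thread.VolatileWrite(ref flag, 1); // the earlier write cannot move after this
}

void Reader()
{
    if (Thread.VolatileRead(ref flag) == 1)
    {
        // This read cannot be moved before the volatile read above, so if
        // flag is 1 we are guaranteed to see the payload written before it.
        Console.WriteLine(payload);
    }
}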
You're not alone in suspecting this code, but Joe Duffy explains it better than I can :)
My answer to this is to give up on lock-free coding other than by using things like PFX which are designed to insulate me from it. The memory model is just too hard for me - I'll leave it to the experts, and stick with things that I know are safe.
One day I'll update my threading article to reflect this, but I think I need to be able to discuss it more sensibly first...
(I don't know about the no-inlining part, btw. I suspect that inlining could introduce some other optimizations which aren't meant to happen around volatile reads/writes, but I could easily be wrong...)
Maybe I am oversimplifying, but I think the explanations about reordering and cache coherency and so on give too much detail.
So, why does the MemoryBarrier come after the actual read?
I will try to explain this with an example that uses object instead of int.
One may think the correct order is:
Thread 1 creates the object (initializes its inner data).
Thread 1 then puts the object into a variable.
Then it "does a fence" and all threads see the new value.
Then, the read is something like this:
Thread 2 "does a fence".
Thread 2 reads the object instance.
Thread 2 is sure that it has all the inner data of that instance (as it started with a fence).
The biggest problem with this is:
Thread 1 creates the object and initializes it.
Thread 1 then puts the object into a variable.
Before the Thread flushes the cache, the CPU itself flushes part of the cache... it commits only the address of the variable (not the contents of that variable).
At that moment, Thread 2 had already flushed its cache. So it is going to read everything from the main memory.
So, it reads the variable (it is there).
Then it reads the content (it is not there).
Finally, after all this, the CPU 1 executes the Thread 1 that does the fence.
So, what happens with the volatile write and read?
The volatile write makes the contents of the object go to memory immediately (it starts with the fence), then it sets the variable (which may not go immediately to the real memory).
Then, the volatile read will first clear the cache. Then it reads the field. If it receives a value when reading the field, it is certain that the contents pointed by that reference are really there.
Because of those little details, yes, it is possible that you do a VolatileWrite(1) and another thread still sees the value of zero. But as soon as other threads see the value of 1 (using a volatile read), all the other items that may be referenced are already there. You can't really tell, as when reading the old value (0 or null) you may simply not progress, considering that you don't yet have everything that you need.
I have already seen some discussions saying that, even if it flushes the caches twice, the right pattern would be:
MemoryBarrier - will flush other variables changed before this call
Write
MemoryBarrier - will guarantee that the write was flushed
The Read will then need the same:
MemoryBarrier
Read - Guarantees that we see the latest info... maybe one that was put AFTER our memory barrier.
As something may have appeared after our MemoryBarrier and was already read, we must put another MemoryBarrier to access the contents.
Those could be two write-fences and two read-fences, if such fences existed in .NET.
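A sketch of that double-fence pattern with Thread.MemoryBarrier() (sharedField and newValue are illustrative):

// Writer:
Thread.MemoryBarrier();  // flush other variables changed before this call
sharedField = newValue;
Thread.MemoryBarrier();  // guarantee that the write itself is flushed

// Reader:
Thread.MemoryBarrier();  // see the latest published value
var observed = sharedField;
Thread.MemoryBarrier();  // again before trusting the contents observed points to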
I am not sure about everything I said... it is a "compilation" of a lot of information I got, but it does explain why VolatileRead and VolatileWrite appear to be reversed, and why they still guarantee that no invalid values are read when using them.