In the current implementation of CPython, there is an object known as the "GIL" or "Global Interpreter Lock". It is essentially a mutex that prevents two Python threads from executing Python code at the same time. This prevents two threads from corrupting the state of the Python interpreter, but it also prevents multiple threads from truly executing in parallel. Essentially, if I do this:
# Thread A
some_list.append(3)
# Thread B
some_list.append(4)
I can't corrupt the list, because at any given time only one of those threads is executing, since it must hold the GIL to do so. Now, the items might be added to the list in some indeterminate order, but the point is that the list isn't corrupted, and two things will always get added.
So, now to C#. C# essentially faces the same problem as Python, so how does C# prevent this? I'd also be interested in hearing Java's story, if anyone knows it.
Clarification: I'm interested in what happens without explicit locking statements, especially to the VM. I am aware that locking primitives exist for both Java & C# - they exist in Python as well: the GIL is not there for your multi-threaded code, other than to keep the interpreter sane. I am interested in the direct equivalent of the above, so, in C#, if I can remember enough... :-)
List<String> s = new List<String>();
// Reference to s is shared by two threads, which both execute this:
s.Add("hello");
// State of s?
// State of the VM? (And if sane, how so?)
Here's another example:
class A
{
    public String s;
}
// Thread A & B
some_A.s = some_other_value;
// some_A's state must change: how does it change?
// Is the VM still in good shape afterwards?
I'm not looking to write bad C# code, I understand the lock statements. Even in Python, the GIL doesn't give you magic-multi-threaded code: you must still lock shared resources. But the GIL prevents Python's "VM" from being corrupted - it is this behavior that I'm interested in.
Most other languages that support threading don't have an equivalent of the Python GIL; they require you to use mutexes, either implicitly or explicitly.
Using lock, you would do this:
lock (some_list)
{
    some_list.Add(3);
}
and in thread 2:
lock (some_list)
{
    some_list.Add(4);
}
The lock statement ensures that only one thread at a time can hold the lock on the given object, some_list in this case, so the two Add calls cannot run concurrently. Note that lock does not prevent other code from touching some_list directly; it only excludes other threads that take a lock on the same object. See http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx for more information.
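Under the hood, the compiler expands a lock statement into Monitor calls wrapped in try/finally. A rough sketch of what it emits for the first block above (the exact shape varies by compiler version; this follows the C# 4+ pattern):

object lockObj = some_list;
bool lockTaken = false;
try
{
    Monitor.Enter(lockObj, ref lockTaken); // blocks until the lock is available
    some_list.Add(3);
}
finally
{
    if (lockTaken) Monitor.Exit(lockObj); // released even if Add throws
}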
C# does not have an equivalent of Python's GIL. Though the two face the same issue, their design goals make them different.

With the GIL, CPython ensures that operations such as appending to a list from two threads are simple. It also means that only one thread can run at any time. This makes lists and dictionaries thread-safe. Though this makes the job simpler and more intuitive, it makes it harder to exploit the advantage of multithreading on multicore machines.

With no GIL, C# does the opposite: the burden of integrity is on the developer of the program, but you can take advantage of running multiple threads truly simultaneously.
As per one discussion:

The GIL in CPython is purely a design choice of having one big lock versus a lock per object and the synchronisation to make sure objects are kept in a coherent state. It consists of a trade-off: giving up the full power of multithreading. It has been argued that most problems do not suffer from this disadvantage, and there are libraries which help you solve the issue when it does come up. That means that for a certain class of problems, the burden of utilizing multiple cores is passed to the developer so that the rest can enjoy the simpler, more intuitive approach.
Note: other implementations, such as IronPython, do not have a GIL.
It may be instructive to look at the documentation for the Java equivalent of the class you're discussing:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
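C#'s List<T> enumerator is similarly fail-fast, although it only detects structural modification, not unsynchronized concurrent access. A quick illustrative snippet (hypothetical, not from the question):

var list = new List<int> { 1, 2, 3 };
foreach (var item in list)
{
    // Structural modification invalidates the enumerator; the next
    // MoveNext() throws InvalidOperationException.
    list.Add(item);
}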
Most complex data structures (for example, lists) can be corrupted when used from multiple threads without locking.
Since changes of references are atomic, a reference always stays a valid reference.
But there is a problem when interacting with security-critical code. So any data structure used by critical code must be one of the following:
Inaccessible from untrusted code, and locked/used correctly by trusted code
Immutable (String class)
Copied before use (valuetype parameters)
Written in trusted code, using internal locking to guarantee a safe state
For example, critical code cannot trust a list accessible from untrusted code. If it gets passed a List, it has to create a private copy, do its precondition checks on the copy, and then operate on the copy.
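A minimal sketch of that defensive-copy pattern (the method name and checks are invented for illustration):

// Hypothetical example: validate and use a private snapshot, never the caller's list.
public void ProcessSecurely(List<string> untrusted)
{
    var copy = new List<string>(untrusted); // the caller cannot mutate our copy
    foreach (var item in copy)
    {
        if (item == null)
            throw new ArgumentException("null entry", nameof(untrusted));
    }
    // Operate on 'copy' only; the precondition checks above remain valid.
}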
I'm going to take a wild guess at what the question really means...
In Python, the interpreter's data structures would get corrupted without the GIL, because Python uses a form of reference counting.
Both C# and Java use garbage collection and in fact they do use a global lock when doing a full heap collection.
Data can be marked and moved between "generations" without a lock. But to actually clean it up everything must come to a stop. Hopefully a very short stop, but a full stop.
Here is an interesting link on CLR garbage collection as of 2007:
http://vineetgupta.spaces.live.com/blog/cns!8DE4BDC896BEE1AD!1104.entry
It is my understanding that C# is a safe language and doesn't allow one to access unallocated memory, other than through the unsafe keyword. However, its memory model allows reordering when there is unsynchronized access between threads. This leads to race hazards where references to new instances appear to be available to racing threads before the instances have been fully initialized, and is a widely known problem for double-checked locking. Chris Brumme (from the CLR team) explains this in their Memory Model article:
Consider the standard double-locking protocol:
if (a == null)
{
    lock(obj)
    {
        if (a == null)
            a = new A();
    }
}
This is a common technique for avoiding a lock on the read of ‘a’ in the typical case. It works just fine on X86. But it would be broken by a legal but weak implementation of the ECMA CLI spec. It’s true that, according to the ECMA spec, acquiring a lock has acquire semantics and releasing a lock has release semantics.
However, we have to assume that a series of stores have taken place during construction of ‘a’. Those stores can be arbitrarily reordered, including the possibility of delaying them until after the publishing store which assigns the new object to ‘a’. At that point, there is a small window before the store.release implied by leaving the lock. Inside that window, other CPUs can navigate through the reference ‘a’ and see a partially constructed instance.
I've always been confused by what "partially constructed instance" means. Assuming that the .NET runtime clears out memory on allocation rather than garbage collection (discussion), does this mean that the other thread might read memory that still contains data from garbage-collected objects (like what happens in unsafe languages)?
Consider the following concrete example:
byte[] buffer = new byte[2];
Parallel.Invoke(
    () => buffer = new byte[4],
    () => Console.WriteLine(BitConverter.ToString(buffer)));
The above has a race condition; the output would be either 00-00 or 00-00-00-00. However, is it possible that the second thread reads the new reference to buffer before the array's memory has been initialized to 0, and outputs some other arbitrary string instead?
Let's not bury the lede here: the answer to your question is no, you will never observe the pre-allocated state of memory in the CLR 2.0 memory model.
I'll now address a couple of your non-central points.
It is my understanding that C# is a safe language and doesn't allow one to access unallocated memory, other than through the unsafe keyword.
That is more or less correct. There are some mechanisms by which one can access bogus memory without using unsafe -- via unmanaged code, obviously, or by abusing structure layout. But in general, yes, C# is memory safe.
However, its memory model allows reordering when there is unsynchronized access between threads.
Again, that's more or less correct. A better way to think about it is that C# allows reordering at any point where the reordering would be invisible to a single threaded program, subject to certain constraints. Those constraints include introducing acquire and release semantics in certain cases, and preserving certain side effects at certain critical points.
Chris Brumme (from the CLR team) ...
The late great Chris's articles are gems and give a great deal of insight into the early days of the CLR, but I note that there have been some strengthenings of the memory model since 2003 when that article was written, particularly with respect to the issue you raise.
Chris is right that double-checked locking is super dangerous. There is a correct way to do double-checked locking in C#, and the moment you depart from it even slightly, you are off in the weeds of horrible bugs that only repro on weak memory model hardware.
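For reference, a sketch of the commonly cited correct pattern in C# (note the volatile field; in modern code Lazy<T> is usually the simpler choice):

public sealed class Singleton
{
    // 'volatile' gives the publishing write release semantics and the racing
    // read acquire semantics, which is what makes this pattern legal.
    private static volatile Singleton _instance;
    private static readonly object _sync = new object();

    private Singleton() { }

    public static Singleton Instance
    {
        get
        {
            if (_instance == null)
            {
                lock (_sync)
                {
                    if (_instance == null)
                        _instance = new Singleton();
                }
            }
            return _instance;
        }
    }
}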
does this mean that the other thread might read memory that still contains data from garbage-collected objects
I think your question is not specifically about the old weak ECMA memory model that Chris was describing, but rather about what guarantees are actually made today.
It is not possible for re-orderings to expose the previous state of objects. You are guaranteed that when you read a freshly-allocated object, its fields are all zeros.
This is made possible by the fact that all writes have release semantics in the current memory model; see this for details:
http://joeduffyblog.com/2007/11/10/clr-20-memory-model/
The write that initializes the memory to zero will not be moved forwards in time with respect to a read later.
I've always been confused by "partially constructed objects"
Joe discusses that here: http://joeduffyblog.com/2010/06/27/on-partiallyconstructed-objects/
Here the concern is not that we might see the pre-allocation state of an object. Rather, the concern here is that one thread might see an object while the constructor is still running on another thread.
Indeed, it is possible for the constructor and the finalizer to be running concurrently, which is super weird! Finalizers are hard to write correctly for this reason.
Put another way: the CLR guarantees you that its own invariants will be preserved. An invariant of the CLR is that newly allocated memory is observed to be zeroed out, so that invariant will be preserved.
But the CLR is not in the business of preserving your invariants! If you have a constructor which guarantees that field x is true if and only if y is non-null, then you are responsible for ensuring that this invariant is always observed to be true. If that state is shared between two threads without synchronization, one of those threads might observe the invariant being violated.
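A hypothetical illustration of the kind of user invariant the CLR will not protect (the type and fields are invented for this example):

public class Pair
{
    // Intended invariant: HasValue is true if and only if Value is non-null.
    public bool HasValue;
    public object Value;

    public Pair(object value)
    {
        Value = value;
        HasValue = true;
    }
}

// Thread A publishes without synchronization:  shared = new Pair("data");
// Thread B reads without synchronization:
//     var p = shared;
//     if (p != null && p.HasValue) Use(p.Value);
// On hardware with a weak memory model, B may still observe HasValue == true
// while Value appears null: without synchronization, the reads on B's side
// (and, under the older ECMA model, the constructor's writes) can be reordered.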
I searched for why .NET String is immutable, and got this answer:
Instances of immutable types are inherently thread-safe; since no thread can modify one, the risk of a thread modifying it in a way that interferes with another is removed (the reference itself is a different matter).
So I want to know: how are instances of immutable types inherently thread-safe?
Why are instances of immutable types inherently thread-safe?
Because an instance of a string can't be mutated at all. This effectively means that one thread "changing" a string won't result in that same string being changed in another thread, since a new string is allocated at the point where the mutation takes place.
Generally, everything becomes easier when you create an object once, and then only observe it. Once you need to modify it, a new local copy gets created.
Wikipedia:
Immutable objects can be useful in multi-threaded applications. Multiple threads can act on data represented by immutable objects without concern of the data being changed by other threads. Immutable objects are therefore considered to be more thread-safe than mutable objects.
#xanatos (and Wikipedia) point out that immutable isn't always thread-safe. We like to make that correlation, because we say "any type which has persistent, non-changing state is safe across thread boundaries", but that may not always be the case. Suppose a type is immutable from the "outside" but internally needs to modify its state in a way that is not safe when done in parallel from multiple threads; that may cause undetermined behavior. This means that although it is immutable, it is not thread-safe.
To conclude, immutable != thread-safe. But immutability does take you one step closer, when done right, towards being able to do multi-threaded work correctly.
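As a sketch of what "done right" can look like, here is a hypothetical shallowly immutable type: all state is fixed in the constructor, and "mutation" returns a new instance:

public sealed class Point
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Returns a copy instead of changing this instance, so any thread
    // holding a Point can never see it change.
    public Point WithX(int newX) => new Point(newX, Y);
}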
The short answer:
Because the data is written exactly once, by one thread, and is only read after that write, by however many threads you like. Because no read/write conflict is possible, it's thread-safe.
The long answer:
A string is essentially a pointer to a buffer of memory. Basically what happens is that you create a buffer, fill it with characters and then expose the pointer to the outside world.
Note that you cannot access the contents of the string before the string object itself is constructed, which enforces this ordering of 'write data', then 'expose pointer'. If you did it the other way around (I guess that's theoretically possible), problems might arise.
If another thread (let's say: another CPU) reads the pointer, it is a 'new' pointer for that CPU, which therefore has to go to 'real' memory to read the data. If it could take the pointed-to contents from a stale cache, we would have a problem.

The last piece of the puzzle has to do with memory management: we have to know it's a 'new' pointer. In .NET we know this is the case: memory on the heap is basically never re-used until a GC occurs. The garbage collector then does a mark, sweep, and compact.

Now, you might argue that the 'compact' phase moves objects around, thereby changing the contents of pointers. While this is true, the GC also has to stop the threads and force a full memory fence, which, in simple terms, flushes the CPU cache. After that, all memory access is guaranteed to be consistent, which ensures you always have to go to memory after the GC phase completes.

As you can see, there is no way to read the data other than directly from memory, exactly the way it was written. Since it's immutable, the contents remain the same for all threads until it's eventually collected. As such, it's thread safe.
I've seen some discussion about immutability here that suggests you can change internal state. Of course, the moment you start changing things, you can potentially introduce read/write conflicts. The definition I'm using here is to keep the contents constant after creation. That is: write once, read many, don't change (any) state after exposing the pointer. You get the picture.
One of the biggest problems in multi-threaded code is two threads accessing the same memory cell at the same time with at least one of them modifying it.
If none of the threads can modify a memory cell, the problem does not exist any longer.
Because an immutable variable is not modifiable, it can be used from several threads without any further measures (for example, locks).
Consider that I have a custom class called Terms, and that class contains a number of string properties. Then I create a fairly large (say 50,000-element) List<Terms> object. This List<Terms> only needs to be read from, but it needs to be read by multiple instances of Task.Factory.StartNew (the number of instances could vary from 1 to 100s).
How would I best pass that list into the long running task? Memory isn't too much of a concern as this is a custom application for a specific use on a specific server with plenty of memory. Should I reference it or should I just pass it off as a normal argument into the method doing the work?
Since you're passing a reference, it doesn't really matter how you pass it; the list itself won't be copied. As Ket Smith said, I would pass it as a parameter to the method you are executing.
The issue is List<T> is not entirely thread-safe. Reads by multiple threads are safe but a write can cause some issues:
It is safe to perform multiple read operations on a List, but issues can occur if the collection is modified while it’s being read. To ensure thread safety, lock the collection during a read or write operation.
From List<T>
You say your list is read-only so that may be a non-issue, but a single unpredictable change could lead to unexpected behavior and so it's bug-prone.
I recommend using ImmutableList<T> which is inherently thread-safe since it's immutable.
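For illustration, a sketch of that approach (this assumes the System.Collections.Immutable package; termsList and ProcessTerms are stand-ins for the asker's own names):

using System.Collections.Immutable;

// Build the snapshot once, up front.
ImmutableList<Terms> snapshot = termsList.ToImmutableList();

// Hand the same snapshot to every task; nothing can modify it,
// so the concurrent reads need no locking.
var task = Task.Run(() => ProcessTerms(snapshot));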
So long as you don't try to copy it into each separate task, it shouldn't make much difference: more a matter of coding style than anything else. Each task will still be working with the same list in memory: just a different reference to the same underlying list.
That said, sheerly as a matter of coding style and maintainability, I'd probably try to pass it in as a parameter to whatever method you're executing in your Task.Factory.StartNew() (or better yet, Task.Run() - see here). That way, you've clearly called out your task's dependencies, and if you decide that you need to get the list from some other place, it's more clear what you've got to change. (But you could probably find 20 places in my own code where I haven't followed that rule: sometimes I go with what's easier for me now than with what's likely to be easier for the me six months from now.)
(I know they don't but I'm looking for the underlying reason this actually works without using volatile since there should be nothing preventing the compiler from storing a variable in a register without volatile... or is there...)
This question stems from a discord of thought: without volatile, the compiler can in theory optimize any variable in various ways, including storing it in a CPU register. The docs say that volatile is not needed when you use synchronization like lock around variables. But in some cases it seems there is no way the compiler/JIT could know whether you will use them in your code path. So the suspicion is that something else is really happening here to make the memory model "work".
In this example, what prevents the compiler/JIT from optimizing _count into a register, and thus having the increment done on the register rather than directly to memory (with a later write to memory after the Exit call)? If _count were volatile, it would seem everything should be fine, but a lot of code is written without volatile. It makes sense that the compiler could know not to optimize _count into a register if it saw a lock or synchronization object in the method... but in this case the lock call is in another function.
Most documentation says you don't need to use volatile if you use a synchronization call like lock.
So what prevents the compiler from optimizing _count into a register and potentially updating just the register within the lock? I have a feeling that most member variables won't be optimized into registers for this exact reason, as then every member variable would really need to be volatile unless the compiler could tell it shouldn't optimize (otherwise I suspect tons of code would fail). I saw something similar when looking at C++ years ago: local function variables got stored in registers; class member variables did not.
So the main question is: is the only way this possibly works without volatile that the compiler/JIT won't put class member variables in registers, and volatile is thus unnecessary?
(Please ignore the lack of exception handling and safety in the calls, but you get the gist.)
using System.Threading;

public class MyClass
{
    private readonly object _o = new object();
    private int _count = 0;

    public void Increment()
    {
        Enter();
        // ... many usages of _count here ...
        _count++;
        Exit();
    }

    // Let's pretend these functions are too big to inline and even call other
    // methods that actually make the monitor call (for example a base class
    // that implemented these).
    private void Enter() { Monitor.Enter(_o); }
    private void Exit() { Monitor.Exit(_o); }

    // ...
    // ...
}
Entering and leaving a Monitor causes a full memory fence. Thus the CLR makes sure that all writing operations before the Monitor.Enter / Monitor.Exit become visible to all other threads and that all reading operations after the method call "happen" after it. That also means that statements before the call cannot be moved after the call and vice versa.
See http://www.albahari.com/threading/part4.aspx.
The best-guess answer to this question would appear to be that any variables stored in CPU registers are saved back to memory before any function is called. This makes sense from a single-threaded compiler-design viewpoint: otherwise the object might appear inconsistent to any other functions/methods/objects that use it.
So it may not be, as some people/articles claim, that synchronization objects/classes are detected by the compiler and non-volatile variables are made safe through their calls. (Perhaps they are when a lock or other synchronization object is used in the same method, but probably not once the calls to those synchronization objects live in another method.) Instead, it is likely that the mere fact of calling another method is enough to cause the values stored in CPU registers to be saved to memory, thus not requiring all variables to be volatile.
I also suspect, and others have suspected too, that fields of a class are not optimized as aggressively, due to exactly these threading concerns.
Some notes (my understanding):
Thread.MemoryBarrier() mostly emits a CPU instruction to ensure writes/reads don't bypass the barrier from the CPU's perspective; it is not directly related to values stored in registers. So it is probably not what directly causes variables to be saved from registers to memory, except insofar as it is itself a method call which, per the discussion here, would likely cause that to happen. Any other method call would probably have the same effect of getting the class fields that were held in registers saved to memory.
It is theoretically possible that the JIT/compiler could also take that method into account within the same method, to ensure variables are flushed from CPU registers. But following our simple proposed rule, any call to another method or class would result in saving register-held variables to memory. Besides, if someone wrapped that call in another method (maybe many methods deep), the compiler wouldn't likely analyze that deep to speculate on execution. The JIT could do something, but again it likely wouldn't analyze that deep, and in both cases locks/synchronization must work no matter what, so the simplest rule is the likely answer.
Unless someone who writes the compilers can confirm this, it's all a guess, but it's likely the best guess we have for why volatile is not needed.
If that rule is followed, synchronization objects just need to issue their own memory barrier when they are entered and exited, so the CPU's write caches are flushed and up-to-date values can be read. That is what this site suggests under "implicit memory barriers": http://www.albahari.com/threading/part4.aspx
So what prevents the compiler from optimizing _count into a register and potentially updating just the register within the lock?
There is nothing in the documentation that I am aware of that would preclude that from happening. The point is that the call to Monitor.Exit will effectively guarantee that the final value of _count is committed to memory upon completion.
It makes sense that the compiler could know not to optimize _count into a register if it saw a lock or synchronization object in the method... but in this case the lock call is in another function.
The fact that the lock is acquired and released in other methods is irrelevant from your point of view. The memory model defines a pretty rigid set of rules that must be adhered to regarding memory barrier generators. The only consequence of putting those Monitor calls in another method is that the JIT compiler will have a harder time complying with those rules. But the JIT compiler must comply; period. If the method calls get too complex or are nested too deep, then I suspect the JIT compiler punts on any heuristics it might have in this regard and says, "Forget it, I'm just not going to optimize anything!"
So the main question is: is the only way this possibly works without volatile that the compiler/JIT won't put class member variables in registers, and volatile is thus unnecessary?
It works because the protocol is to acquire the lock prior to reading _count as well. If the readers do not do that then all bets are off.
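In other words, a hypothetical reader for the MyClass example above (sketch only):

// Readers must take the same lock; without it, no ordering rule applies to
// the read, and the JIT is free to cache _count wherever it likes.
public int GetCount()
{
    Enter();
    try
    {
        return _count;
    }
    finally
    {
        Exit();
    }
}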
It looks like the Mono implementation has no MemoryBarrier calls inside the ReaderWriterLockSlim methods. So when I make changes inside a write lock, another thread that uses a read lock might receive old cached values.
Is it really possible? Should I insert MemoryBarrier before and after the code inside Read and Write lock?
Looking at (what I think is) the mono source, the Mono ReaderWriterLockSlim is implemented using Interlocked calls.
These calls include a memory barrier on x86, so you shouldn't need to add one.
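For illustration, a minimal sketch of why that suffices: Interlocked operations on .NET are atomic and act as full fences, so a write published through one is visible to a subsequent Interlocked read on another thread (hypothetical demo, not Mono's actual code):

using System;
using System.Threading;

class InterlockedFlagDemo
{
    private static int _flag;        // 0 = not published, 1 = published
    private static string _payload;

    static void Main()
    {
        var reader = new Thread(() =>
        {
            // The Interlocked read is a full fence: once we observe _flag == 1,
            // the earlier write to _payload is guaranteed to be visible too.
            while (Interlocked.CompareExchange(ref _flag, 0, 0) == 0) { }
            Console.WriteLine(_payload); // prints "hello"
        });
        reader.Start();

        _payload = "hello";                 // ordinary store...
        Interlocked.Exchange(ref _flag, 1); // ...published with a full fence
        reader.Join();
    }
}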
As Peter correctly points out, the implementation does introduce a memory barrier, just not explicitly.
More generally: the C# language specification requires that certain side effects be well-ordered with respect to locks. Though that rule only applies to locks entered with the C# lock statement, it would be exceedingly strange for the provider of a custom locking primitive to ship a locking object that did not follow the same rules. You are wise to double-check, but in general you can assume that if it's a threading primitive, it has been designed to ensure that important side effects are well-ordered around it.