Threads communicating without locking - c#

If I can guarantee myself that only one method in my entire app will ever write to a certain variable, then may I allow other methods in my app to safely read that value?
If so, can I get away with that stunt without locking the variable?
In this context, what I'm doing (or, trying to do, or want to do) is for one method in one thread to put a value into the variable, and then other methods in other threads will read that value and make decisions.
A very nice option would be to lock against writes, while allowing reads.
I looked at the MSDN page on lock and didn't see a way to do that.

As always, it depends a lot on the context.
a variable read in a tight loop may be stored in a register or local cache, so no change will be noticed unless you have a "fence"; volatile will fix this, but as a side-effect rather than by explicit intention; most people (including me) can't properly define what volatile means, so be very careful about using it as a "fix".
an oversize type (large struct) will not be atomic (for either read or write) - and cannot be handled safely without risk of tearing
an object or value might involve multiple sub-values; if they aren't changed atomically, it could cause problems
You might, however, find that Interlocked solves most of your problems without needing a lock. At the same time, an uncontested lock is insanely fast, and even a contested lock is still alarmingly fast. Frankly, I'm not sure it is worth the thought you are giving it: a flat lock is almost certainly fast enough, as long as you do the thinking first outside the lock, and only take the lock when you know the changes you want to make.
There is also ReaderWriterLockSlim, but the number of cases where that actually improves performance is slim - in my experience, the simplest approach possible is usually the fastest, meaning either lock or Interlocked. ReaderWriterLockSlim is a more complex beast, designed for more complex scenarios, and has a little overhead because of it. Not massive amounts, but enough to make it worth looking carefully.
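To make that concrete, here is a minimal sketch of the single-writer/many-readers case from the question, using a flat lock (all type and member names here are mine, purely for illustration):

using System;

class StatusHolder
{
    private readonly object _sync = new object();
    private int _status;                  // only the single writer touches this

    // called by the one writer thread
    public void Publish(int value)
    {
        lock (_sync) { _status = value; }
    }

    // called by any number of reader threads; the lock also acts as the fence
    public int Read()
    {
        lock (_sync) { return _status; }
    }
}

For a simple int or bool flag, Interlocked.Exchange on the writer side and Volatile.Read on the reader side (.NET 4.5+) would work just as well; the lock version is simply the easiest to reason about.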

Related

Does locking with many different objects have a performance impact compared to using only one?

I don't know if the question is stupid or not; locking and the Monitor are kind of a black box to me.
But I'm dealing with a situation where I can either use the same lock object to lock everything all the time, or use an indefinite number of objects to lock at a much finer-grained level.
I know that the second way will reduce lock contention, but I may end up using 10K objects as locks, and I don't know if that has an impact or not.
Bottom line: do too many locks hurt locking, or is there no impact?
Edit
I wrote a lib that maintains a graph of objects; the number could be very high. For now it's not thread-safe, mainly for the reason Eric stated in his comment.
I initially thought that if the user wanted to do some multi-threading then he/she would have to take care of the locking.
But now I'm wondering: if I had to make it thread-safe, what would be the best way to do it? (Note that making it thread-safe wouldn't be a short and easy ride for me, so testing both solutions is something I can't do easily.)
As the purpose is to make each object of the graph thread-safe, I could use the instance of the object as the lock when I want to access/modify its properties. I know it's the best way to reduce contention, but I don't know if it would scale as well as having only one lock for the whole graph.
I know there's a lot to consider: how many threads there are, and especially (I think) the chance of an object being accessed/changed by multiple threads at a time (which I estimate to be pretty low). But I can't find accurate information about locks and their overhead in such a case.
To get a clearer view of what's going on, I looked at the source code of the Monitor class and its C++ counterpart in clr/src/vm/syncblk.cpp in the Shared Source Common Language Infrastructure released by Microsoft.
To answer my own question: no, having a lot of locks doesn't hurt in any harmful way I could think of.
What I learned:
1) A lock that's already held by the same thread is processed almost for free.
2) A lock that's taken for the first time basically costs an InterlockedCompareExchange.
3) Multiple threads waiting for a lock are fairly cheap to track (a linked list is maintained, O(1) complexity).
4) A thread waiting for a lock to be released is by far the most costly case: the implementation first spin-waits to try to get out of it, but if that's not enough, a thread switch occurs, putting the thread to sleep until a mutex signals that it's time to wake up because the lock was released.
I got my answer by digging into 2): whether you always lock on the same object or on 10K different ones, it's basically the same (extra initialization is performed the first time you lock a given object, but it's not too bad). InterlockedCompareExchange doesn't care whether it's called on the same or a different memory location (AFAIK).
Contention is by far the most critical concern. Having many locks would reduce (drastically in my case) the chance of contention, so it can only be a good thing.
1) is also an important lesson learned: if I lock/unlock for each property change/access, I can improve performance by locking the object first, changing many properties, and then releasing the lock. This way there is only one InterlockedCompareExchange, and the lock/unlock inside the implementation of each property change/access will only increment an internal counter.
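A hedged sketch of that lesson (the class and member names are mine; locking on this mirrors the per-object approach described above, though a dedicated lock object is often preferred):

class Node
{
    private int _x, _y;

    public void SetX(int v) { lock (this) { _x = v; } }
    public void SetY(int v) { lock (this) { _y = v; } }

    // Taking the lock once around several changes costs a single
    // InterlockedCompareExchange; Monitor is reentrant, so the nested
    // locks inside SetX/SetY only bump an internal recursion counter.
    public void Move(int x, int y)
    {
        lock (this)
        {
            SetX(x);
            SetY(y);
        }
    }
}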
To dig deeper I would have to find more information about the implementation of InterlockedCompareExchange; I think it relies on a CPU-specific assembly instruction...
Typically, performance concerns around locking are related to contention. Acquiring an uncontested lock is on the order of tens of nanoseconds. Contention is the real performance killer. As you point out, having more locks (finer lock granularity) can improve performance by decreasing contention.
The drawback to having multiple locks is that lock management typically must be more complex. If multiple locks are required to perform an operation, there is an increased possibility of resource starvation issues like deadlock or livelock. Proper lock management, such as enforcing a lock acquisition order, can alleviate these issues.
Absent more details, I would probably go with one lock, since the implementation is simpler, and monitor the performance of my application closely. Specifically, there are .NET performance counters related to lock contention which can help diagnose/detect contention-related perf issues.
As with all performance-related answers, I'd like to refer to this exceptional blog post by Eric Lippert: it depends. Have a look at his six questions; what are the answers in your case? Try what happens under your conditions.
Number of cores, contention, caching, etc. all matter, so see what happens in your particular case; it's really impossible to know beforehand.
For those not clicking on the link: run them horses!
I'm not talking about performance as in speed here, but rather as in what happens after the application has been running for a while. According to Lock (Monitor) internal implementation in .NET, the Monitor implementation is quite smart, so having an internal lock for each object might seem a viable approach, since you said objects in the tens of thousands and not millions.
Bottom line: do too many locks hurt locking, or is there no impact?
Not on its own, but it might be a reason to have a look at the architecture of your program; having a gazillion objects locked at the same time will cause overhead, though.

Thread Safety General Rules

A few questions about thread safety that I think I understand, but would like clarification on, if you could be so kind. The specific languages I program in are C++, C#, and Java; please keep these in mind when describing specific language keywords/features.
1) Cases of 1 writer, n readers. In cases such as n threads reading a variable, such as in a polled loop, and 1 writer updating this variable, is explicit locking required?
Consider:
// thread 1.
volatile bool bWorking = true;
void stopWork() { bWorking = false; }
// thread n
while (bWorking) {...}
Here, should it be enough to just have a memory barrier, accomplished with volatile? As I understand it, in the languages mentioned above, simple reads and writes to primitives will not be interleaved, so explicit locking is not required; however, memory consistency cannot be guaranteed without some explicit lock or volatile. Are my assumptions correct here?
2) Assuming my assumption above is correct, it is only correct for simple reads and writes. That is, bWorking = x and x = bWorking are the ONLY safe operations? I.e. complex assignments such as unary operators (++, --) are unsafe here, as are +=, *=, etc.?
3) I assume that if case 1 is correct, then it is not safe to expand that statement to n writers and n readers, even when only assignment and reading are involved?
For Java:
1) A volatile variable is updated from/to main memory on each read/write, which means that a change by the updater thread will be seen by all reading threads on their next read. Also, updates are atomic (independent of the variable's type).
2) Yes, combined operations like ++ are not thread safe if you have multiple writers. For a single writing thread, there is no problem. (The volatile keyword makes sure that the update is seen by the other threads.)
3) As long as you only assign and read, volatile is enough - but if you have multiple writers, you can't be sure which value is the "final" one, or which will be read by which thread. Even the writing threads themselves can't reliably know that their own value is set. (If you only have a boolean that will only ever be set from true to false, there is no problem here.)
If you want more control, have a look at the classes in the java.util.concurrent.atomic package.
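Since the question also covers C#, the rough C# analogue of those atomic classes is the Interlocked class; a minimal sketch (the class name is mine, and Volatile.Read assumes .NET 4.5+):

using System.Threading;

class SafeCounter
{
    private int _count;

    // Atomic read-modify-write; safe with any number of writers,
    // similar in spirit to Java's AtomicInteger.incrementAndGet().
    public int Increment()
    {
        return Interlocked.Increment(ref _count);
    }

    // Volatile read so callers always see the latest published value.
    public int Value
    {
        get { return Volatile.Read(ref _count); }
    }
}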
Do the locking. You are going to need locking anyway if you are writing multi-threaded code. C# and Java make it fairly simple. C++ is a little more complex, but you should be able to use Boost or make your own RAII classes. Given that you are going to be locking all over the place, don't try to see if there are a few places where you might be able to avoid it. Everything will work fine until you run the code on a 64-way processor using new Intel microcode on a Tuesday in March on some mission-critical customer system. Then bang.
People think that locks are expensive; they really aren't. The kernel devs spend a lot of time optimizing them, and compared to one disk read they are utterly trivial; yet nobody ever seems to expend this much effort analyzing every last disk read.
Add the usual statements about the evils of premature performance tuning, wise sayings from Knuth, Spolsky, etc.
For C++
1) This is tempting to try, and will usually work. However, a few things to keep in mind:
You're doing it with a boolean, so that seems safest. Other POD types might not be so safe. E.g. it may take two instructions to set a 64-bit double on a 32-bit machine, so that would clearly not be thread-safe.
If the boolean is the only thing you care about the threads sharing, this could work. If you're using it as a variant of the Double-Checked Lock Paradigm, you run into all the pitfalls therein. Consider:
std::string failure_message; // shared across threads
// some thread triggers the stop, and also reports why
failure_message = "File not found";
stopWork();
// all the other threads
while (bWorking) {...}
log << "Stopped work: " << failure_message;
This looks OK at first, because failure_message is set before bWorking is set to false. However, that may not be the case in practice: the compiler can rearrange the statements and set bWorking first, resulting in thread-unsafe access to failure_message. Even if the compiler doesn't, the hardware might. Multi-core CPUs have their own caches, so things aren't quite so simple.
If it's just a boolean, it's probably ok. If it's more than that, it might have issues once in a while. How important is the code you're writing, and can you take that risk?
2) Correct: ++/--, +=, and other compound operators take multiple CPU instructions and are not thread-safe. Depending on your platform and compiler, you may be able to write non-portable code to do atomic increments.
3) Correct, this would be unsafe in a general case. You can kinda squeak by when you have one thread, writing a single boolean once. As soon as you introduce multiple writes, you'd better have some real thread synchronization.
Note about cpu instructions
If an operation takes multiple instructions, your thread could be preempted between them -- and the operation would be partially complete. This is clearly bad for thread safety, and this is one reason why ++, +=, etc are not thread safe.
However, even if an operation takes a single instruction, that does not necessarily mean that it's thread safe. With multi-core and multi-cpu you have to worry about the visibility of a change -- when is the cpu cache flushed to main memory.
So while multiple instructions do imply not thread-safe, it is false to assume that a single instruction implies thread-safe.
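A short C# sketch of the visibility half of this (the same idea applies in C++ and Java; the names are mine): the write to the flag is a single instruction, yet without a fence the reader may never observe it:

class Worker
{
    private volatile bool _stop;   // volatile: reads cannot be cached in a register

    public void Run()
    {
        // Without volatile (or an explicit fence), the JIT is free to read
        // _stop once and spin on the cached value forever.
        while (!_stop)
        {
            // ... do work ...
        }
    }

    public void StopWork() { _stop = true; }
}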
With a 1-byte bool you might be able to get away without locking, but since you cannot guarantee the internals of the processor, it would still be a bad idea. Certainly with anything beyond 1 byte, such as an integer, you couldn't: one processor could be updating it while another was reading it on another thread, and you could get inconsistent results. In C# I would use a lock { } statement around the access (read or write) to bWorking. If it were something more complex, for example I/O access to a large memory buffer, I'd use ReaderWriterLock or some variant of it. In C++, volatile won't help much, because it just prevents certain kinds of optimizations, such as register variables, which would totally cause problems in multithreading. You still need to use a locking construct.
So in summary I would never read and write anything in a multithreaded program without locking it somehow.
Updating a bool is going to be atomic on any sensible extant system. However, once your writer has written, there's no telling how long it will be before your reader reads, especially once you take into account multiple cores, caches, scheduler oddities, and so on.
Part of the problem with increments and decrements (++, --) and compound assignments (+=, *=) is that they are misleading: they imply something is happening atomically that is actually happening in several operations. But even simple assignments can be unsafe once you have stepped away from the purity of boolean variables. Whether a write as simple as x = foo is atomic comes down to the details of your platform.
I assume by thread-safe you mean that readers will always see a consistent object no matter what the writers do. In your example this will always be the case, since booleans can only evaluate to two values, both valid, and the value only transitions once, from true to false. Thread safety is going to be more difficult in a more complicated scenario.

Multithreaded access to memory

Good morning,
Say I have some 6 different threads, and I want to share the same data with each of them at the same time. Can I make a class variable with the data I want to share and have each thread access that memory concurrently without degrading performance, or is it preferable to pass a true copy of the data to each thread?
Thank you very much.
It depends entirely on the data;
if the data is immutable (or mutable but you don't actually mutate it), then chuck all the threads at it - great
if you need to mutate it, but no two threads will ever depend on the data mutated by another - great
if you need to mutate it, and there are conflicts but you can sensibly synchronize access to the data such that there is no risk of two threads deadlocking etc - great, but not always trivial
if it is not safe to make any assumptions, then a true clone of the data is the safest approach, but has the most overhead in terms of data duplication; if the data is cheap to copy, this may be fine - and indeed may outperform synchronization
if the threads do co-depend on each other, then you have no option other than to figure out some kind of sensible locking strategy; again, to stress: deadlocks are a problem here - some ideas (see the sketch after this list):
always provide a timeout when obtaining a lock
if you need to lock two items, it may help to try locking both eagerly (rather than locking one at the start, and the other after you've done lots of changes) - then, if you can't get both, you can simply release and re-take the locks, without having to either undo changes or put the data back into a particular state
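A hedged sketch combining those two ideas, using Monitor.TryEnter with a timeout (the helper and its names are mine, not a standard API):

using System;
using System.Threading;

static class LockHelper
{
    // Try to take both locks eagerly, before mutating anything.
    public static bool TryLockBoth(object first, object second, TimeSpan timeout)
    {
        if (!Monitor.TryEnter(first, timeout))
            return false;
        if (!Monitor.TryEnter(second, timeout))
        {
            Monitor.Exit(first);   // back out so other threads can make progress
            return false;          // nothing was changed yet, so nothing to undo
        }
        return true;               // caller must Monitor.Exit(second), then Exit(first)
    }
}

Because both locks are taken before any changes are made, a failed attempt can simply be retried; there is no partially-applied state to put back.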

lock vs AcquireReader & writer locks

I have found a possible slowdown in my app, so I have two questions:
What is the real difference between a simple lock on an object and reader/writer locks?
E.g. I have a collection of clients that changes quickly. For iteration, should I use a reader lock, or is a simple lock enough?
In order to decrease load, I have left the iteration (reading only) of one collection without any locks. This collection changes often and quickly, but items are added and removed under writer locks. Is it safe (I don't mind an occasionally skipped item; this method runs in a loop and isn't critical) to leave this reading unprotected by a lock? I just don't want random exceptions.
No, your current scenario is not safe.
In particular, if a collection changes while you're iterating over it, you'll get an InvalidOperationException in the iterating thread. You should hold a reader lock for the whole duration of the iteration:
Obtain reader lock
Iterate over collection
Release reader lock
Note this is not the same as obtaining a reader lock for each step of the iteration - that won't help.
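A minimal sketch of that pattern with ReaderWriterLockSlim (the Client type and the member names are mine):

using System;
using System.Collections.Generic;
using System.Threading;

class ClientRegistry
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly List<Client> _clients = new List<Client>();

    public void ForEachClient(Action<Client> action)
    {
        _lock.EnterReadLock();        // held for the WHOLE iteration
        try
        {
            foreach (var client in _clients)
                action(client);
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }

    public void Add(Client client)
    {
        _lock.EnterWriteLock();       // writers are exclusive
        try { _clients.Add(client); }
        finally { _lock.ExitWriteLock(); }
    }
}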
As for the difference between reader/writer locks and "normal" locks: the idea of a reader/writer lock is that multiple threads can read at the same time, but only one thread can write (and only when no one is reading). In some cases this can improve performance - but it increases the complexity of the solution too (in terms of getting it right). I'd also advise you to use ReaderWriterLockSlim from .NET 3.5 if you possibly can - it's much more efficient than the original ReaderWriterLock, and there are some inherent problems with ReaderWriterLock IIRC.
Personally I normally use simple locks until I've proved that lock contention is a performance bottleneck. Have you profiled your application yet to find out where the bottleneck is?
OK, first, about the reading-iteration-without-locks thing: it's not safe, and you shouldn't do it. Just to illustrate the point in the simplest way: you're iterating through a collection, but you never know how many items are in that collection and have no way to find out. Where do you stop? Checking the count on every iteration doesn't help, because it can change after you check it but before you get the element.
ReaderWriterLock is designed for a situation where you allow multiple threads concurrent read access, but force synchronous writes. From the sound of your application, you don't have multiple concurrent readers, and writes are just as common as reads, so ReaderWriterLock provides no benefit; you'd be better served by classic locking in this case.
In general, whatever tiny performance benefits you squeeze out of not locking access to shared objects in multithreaded code are dramatically offset by random weirdness and unexplainable behavior. Lock everything that is shared, test the application, and then, when everything works, run a profiler on it, check just how much time the app spends waiting on locks, and implement some dangerous trickery if needed. But chances are the impact is going to be small.
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning; he will be wise to look carefully at the critical code, but only after that code has been identified.” - Donald Knuth

Why volatile is not enough?

I'm confused. The answers to my previous question seem to confirm my assumptions. But as stated here, volatile is not enough to assure atomicity in .NET. Either operations like incrementing and assignment in MSIL are not translated directly to a single native opcode, or many CPUs can simultaneously read and write to the same RAM location.
To clarify:
I want to know whether writes and reads are atomic on multiple CPUs.
I understand what volatile is about. But is it enough? Do I need to use interlocked operations if I want to get the latest value written by another CPU?
Herb Sutter recently wrote an article on volatile and what it really means (how it affects ordering of memory accesses and atomicity) in the native C++, .NET, and Java environments. It's a pretty good read:
volatile vs. volatile
volatile in .NET does make access to the variable atomic.
The problem is, that's often not enough. What if you need to read the variable, and if it is 0 (indicating that the resource is free), set it to 1 (indicating that it's locked, and other threads should stay away from it)?
Reading the 0 is atomic. Writing the 1 is atomic. But between those two operations, anything might happen. You might read a 0, and then, before you can write the 1, another thread jumps in, reads the 0, and writes a 1.
However, volatile in .NET does guarantee atomicity of accesses to the variable. It just doesn't guarantee thread safety for operations relying on multiple accesses to it. (Disclaimer: volatile in C/C++ does not even guarantee this. Just so you know. It is much weaker, and occasionally a source of bugs because people assume it guarantees atomicity. :))
So you need to use locks as well, to group together multiple operations as one thread-safe chunk. (Or, for simple operations, the Interlocked operations in .NET may do the trick.)
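For exactly the read-0-then-write-1 case above, Interlocked.CompareExchange collapses the two accesses into one atomic step; a minimal sketch (the class and member names are mine):

using System.Threading;

class ResourceFlag
{
    private int _state;   // 0 = free, 1 = locked

    public bool TryAcquire()
    {
        // Atomically: if _state is still 0, set it to 1. The return value is
        // whatever was there before, so 0 means this thread won the race.
        return Interlocked.CompareExchange(ref _state, 1, 0) == 0;
    }

    public void Release()
    {
        Interlocked.Exchange(ref _state, 0);   // atomic write with a full fence
    }
}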
I might be jumping the gun here, but it sounds to me as though you're confusing two issues.
One is atomicity, which in my mind means that a single operation (which may require multiple steps) should not come into conflict with another such single operation.
The other is volatility: when is this value expected to change, and why?
Take the first: if your two-step operation requires you to read the current value, modify it, and write it back, you're most certainly going to want a lock, unless this whole operation can be translated into a single CPU instruction that works on a single cache line of data.
However, the second issue is: even when you're doing the locking thing, what will other threads see?
A volatile field in .NET is a field that the compiler knows can change at arbitrary times. In a single-threaded world, the change of a variable is something that happens at some point in a sequential stream of instructions, so the compiler knows when it has added code that changes it, or at least when it has called out to the outside world, which may or may not have changed it, so that once the call returns, the value might not be the same as it was before.
This knowledge allows the compiler to lift the value from the field into a register once, before a loop or similar block of code, and never re-read the value from the field for that particular code.
With multi-threading, however, that might give you problems. One thread might have adjusted the value, and another thread, due to that optimization, won't be reading the value for some time, because it thinks it hasn't changed.
So when you flag a field as volatile, you're basically telling the compiler that it shouldn't assume it has the current value of the field at any point; instead it must grab a fresh snapshot every time it needs the value.
Locks solve multiple-step operations, volatility handles how the compiler caches the field value in a register, and together they will solve more problems.
Also note that if a field contains something that cannot be read in a single CPU instruction, you're most likely going to want to lock read access to it as well.
For instance, if you're on a 32-bit CPU and writing a 64-bit value, that write operation requires two steps to complete, and if another thread on another CPU manages to read the 64-bit value before step 2 has completed, it will get half of the previous value and half of the new one, nicely mixed together, which can be even worse than getting an outdated value.
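In C#, one hedged way around exactly that torn read is Interlocked (the class and member names are mine): Interlocked.Read performs an atomic 64-bit read even on a 32-bit CPU:

using System.Threading;

class WideValue
{
    private long _value;   // 64-bit: plain reads/writes may tear on 32-bit CPUs

    public long Read()
    {
        return Interlocked.Read(ref _value);       // atomic even on 32-bit platforms
    }

    public void Write(long value)
    {
        Interlocked.Exchange(ref _value, value);   // atomic 64-bit write
    }
}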
Edit: To answer the comment that volatile guarantees the atomicity of the read/write operation: that's true, in a way, because the volatile keyword cannot be applied to fields that are larger than 32 bits, in effect making such fields readable/writeable in a single CPU instruction on both 32- and 64-bit CPUs. And yes, it will prevent the value from being kept in a register as much as possible.
So part of the comment is wrong: volatile cannot be applied to 64-bit values.
Note also that volatile has some semantics regarding the reordering of reads/writes.
For relevant information, see the MSDN documentation or the C# specification, section 10.5.3.
On a hardware level, multiple CPUs can never write simultaneously to the same atomic RAM location. The size of an atomic read/write operation depends on the CPU architecture, but is typically 1, 2, or 4 bytes on a 32-bit architecture. However, if you try reading the result back, there is always a chance that another CPU has written to the same RAM location in between. On a low level, spin-locks are typically used to synchronize access to shared memory. In a high-level language, such mechanisms may be called, e.g., critical regions.
The volatile type just makes sure the variable is written back to memory immediately when it is changed (even if the value is to be used in the same function). A compiler will usually keep a value in an internal register for as long as possible if the value is to be reused later in the same function, and it is stored back to RAM when all modifications are finished or when the function returns. Volatile types are mostly useful when writing to hardware registers, or when you want to be sure a value is stored back to RAM in, e.g., a multithreaded system.
Your question doesn't entirely make sense, because volatile specifies how the read happens, not the atomicity of multi-step processes. My car doesn't mow my lawn either, but I try not to hold that against it. :)
The problem comes in with register-based cached copies of your variables' values.
When reading a value, the CPU will first see if it's in a register (fast) before checking main memory (slower).
Volatile tells the compiler to push the value out to main memory ASAP, and not to trust the cached register value. It's only useful in certain cases.
If you're looking for single-opcode writes, you'll need to use Interlocked.Increment and its related methods, but they're fairly limited in what they can do in a single safe instruction.
The safest and most reliable bet is to lock() (if you can't use Interlocked.*).
Edit: Writes and reads are atomic if they're inside a lock or an Interlocked.* statement. Volatile alone is not enough under the terms of your question.
Volatile is a compiler keyword that tells the compiler what to do. It does not necessarily translate into the (essentially) bus operations required for atomicity. That is usually left up to the operating system.
Edit: to clarify, volatile is never enough if you want to guarantee atomicity. Or rather, it's up to the compiler to make it enough or not.
