What is non-thread-safety for? - c#

There are a lot of articles and discussions explaining why it is good to build thread-safe classes. It is said that if multiple threads access, for example, a field at the same time, bad consequences can follow. So, what is the point of keeping non-thread-safe code? I'm focusing mostly on .NET, but I believe the main reasons are not language-dependent.
E.g. .NET static fields are not thread-safe. What would be the result if they were thread-safe by default? (without a need to perform "manual" locking). What are the benefits of using (actually defaulting to) non-thread-safety?
One thing that comes to my mind is performance (more of a guess, though). It's rather intuitive that, when a function or field doesn't need to be thread-safe, it shouldn't be. However, the question is: what for? Is thread-safety just an additional amount of code you always need to implement? In what scenarios can I be 100% sure that e.g. a field won't be used by two threads at once?

Writing thread-safe code:
Requires more skilled developers
Is harder and consumes more coding effort
Is harder to test and debug
Usually has bigger performance cost
But! Thread-safe code is not always needed. If you can be sure that some piece of code will be accessed by only one thread, the list above becomes huge and unnecessary overhead. It is like renting a van for a trip to a neighboring city when there are two of you and not much luggage.

Thread safety comes with costs - you need to lock fields that might cause problems if accessed simultaneously.
In applications that make no use of threads but need high performance, where every CPU cycle counts, there is no reason to have thread-safe classes.
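As a tiny illustration of what that locking looks like in practice (the names are made up):

private readonly object _sync = new object();
private int _counter;

public void Increment()
{
    // Without the lock, _counter++ is a read-modify-write that two
    // threads can interleave, silently losing increments.
    lock (_sync)
    {
        _counter++;
    }
}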

So, what is the point of keeping non thread-safe code?
Cost. Like you assumed, there usually is a penalty in performance.
Also, writing thread-safe code is more difficult and time consuming.

Thread safety is not a "yes" or "no" proposition. The meaning of "thread safety" depends upon context; does it mean "concurrent-read safe, concurrent write unsafe"? Does it mean that the application just might return stale data instead of crashing? There are many things that it can mean.
The main reason not to make a class "thread safe" is the cost. If the type won't be accessed by multiple threads, there's no advantage to putting in the work and increasing the maintenance cost.

Writing thread-safe code is painfully difficult at times. For example, simple lazy loading requires two '== null' checks and a lock. It's really easy to screw up.
[EDIT]
I didn't mean to suggest that threaded lazy loading is particularly difficult; it's the "oh, and I forgot to lock that first!" moments, which come fast and hard once you think you're done with the locking, that are really the challenge.
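For reference, a rough sketch of the lazy-loading pattern being described, with illustrative names (note the two null checks, one outside the lock and one inside; in .NET the field is typically marked volatile, and Lazy<T> now packages this whole pattern up for you):

public class PostCache
{
    private static readonly object _sync = new object();
    private static volatile Posts _posts;       // Posts is an illustrative type

    public static Posts Current
    {
        get
        {
            if (_posts == null)                 // first check, taken without the lock
            {
                lock (_sync)
                {
                    if (_posts == null)         // second check, now under the lock
                        _posts = LoadPosts();   // illustrative loader; runs at most once
                }
            }
            return _posts;
        }
    }
}

Forget the inner check and two threads can both run the load; forget the lock entirely and the pattern silently breaks.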

There are situations where "thread-safe" doesn't make sense. This consideration is in addition to the higher developer skill and increased time (development, testing, and runtime all take hits).
For example, List<T> is a commonly-used non-thread-safe class. If we were to create a thread-safe equivalent, how would we implement GetEnumerator? Hint: there is no good solution.
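One common compromise, sketched here with illustrative code, is to enumerate over a snapshot; it is safe, but every enumeration pays for a full copy, and callers may see stale data, which is exactly the kind of trade-off with no clean answer:

using System.Collections.Generic;

public class SnapshotList<T>
{
    private readonly object _sync = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_sync) { _items.Add(item); }
    }

    public IEnumerator<T> GetEnumerator()
    {
        T[] snapshot;
        lock (_sync) { snapshot = _items.ToArray(); }      // copy under the lock
        return ((IEnumerable<T>)snapshot).GetEnumerator(); // enumerate the copy
    }
}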

Turn this question on its head.
In the early days of programming there was no thread-safe code because there was no concept of threads. A program started, then proceeded step by step to the end. Events? What's that? Threads? Huh?
As hardware became more powerful, concepts of what types of problems could be solved with software became more imaginative and developers more ambitious, the software infrastructure became more sophisticated. It also became much more top-heavy. And here we are today, with a sophisticated, powerful, and in some cases unnecessarily top-heavy software ecosystem which includes threads and "thread-safety".
I realize the question is aimed more at application developers than, say, firmware developers, but looking at the whole forest does offer insights into how that one tree evolved.

So, what is the point of keeping non thread-safe code?
By allowing for code that isn't thread safe you're leaving it up to the programmer to decide what the correct level of isolation is.
As others have mentioned this allows for complexity reduction and improved performance.
Rico Mariani wrote two articles, "Putting your synchronization at the correct level" and "Putting your synchronization at the correct level -- solution", that have a nice example of this in action.
In the article he has a method called DoWork(). In it, he calls Read twice, Write twice, and then LogToSteam on other classes.
Read, Write, and LogToSteam all shared a lock and were thread-safe. This is good, except that because DoWork was also thread-safe, all the synchronizing work inside each Read, Write, and LogToSteam was a complete waste of time.
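The shape of the problem, sketched with made-up bodies and simplified into a single class (the method names follow the article):

private readonly object _sync = new object();

public void DoWork()
{
    lock (_sync)                    // synchronizing at the DoWork level...
    {
        Read(); Read();
        Write(); Write();
        LogToSteam();
    }
}

// ...means the locks below are always acquired re-entrantly while _sync
// is already held, so each of them is pure overhead.
private void Read()       { lock (_sync) { /* read shared state */ } }
private void Write()      { lock (_sync) { /* write shared state */ } }
private void LogToSteam() { lock (_sync) { /* log */ } }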
This is all related to the nature of imperative programming; its side effects create the need for synchronization.
However, if you had a development platform where applications could be expressed as pure functions, with no dependencies or side effects, then it would be possible to create applications where the threading was managed without developer intervention.

So, what is the point of keeping non thread-safe code?
The rule of thumb is to avoid locking as much as possible. The ideal code is re-entrant and thread-safe without any locking, but that would be utopia.
Coming back to reality, a good programmer tries their best to lock small sections rather than the entire context. An example would be locking a few lines of code at a time in various routines rather than locking everything in a function.
One also has to refactor the code into a design that minimizes the locking, if not gets rid of it entirely.
E.g., consider a foobar() function that gets new data on each call and uses a switch on the type of the data to change a node in a tree. The locking can be mostly (if not completely) avoided, as each case statement touches a different node in the tree. This may be a rather specific example, but I think it illustrates my point (see the sketch below).
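Something like this hypothetical sketch, where each node serves as its own lock (Data, Node, and the fields are all made up):

void Foobar(Data data)
{
    Node node;
    switch (data.Type)
    {
        case DataType.Alpha: node = alphaNode; break;
        case DataType.Beta:  node = betaNode;  break;
        default:             node = rootNode;  break;
    }

    lock (node)        // threads contend only when they hit the same node
    {
        node.Apply(data);
    }
}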

Related

Multithreading application and Threadsafe Lists

I've been working on a multithreaded application and I'm still trying to figure out the best/most efficient way to deal with a List that's being used and changed by multiple threads.
I've seen that writing a thread-safe wrapper class for a List is not really the best option, and to be honest I find all the locking somewhat messy.
I've thought about converting it to a ConcurrentDictionary, as I've been using those and they seem to behave really well.
However, I've tried a different approach and would like to hear some opinions on whether or not it's a good option to take:
if (MyList.Count > 0)
{
    MyStruct[] Example = new MyStruct[MyList.Count];
    MyList.CopyTo(Example, 0);
    foreach (MyStruct B in Example)
    {
        // Code here
    }
}
This is just something that I tried and it seemed to work without my having to make changes anywhere else. I'm not sure if I should even be doing this, which is why I'm looking for some opinions on it.
No, this is not thread-safe.
Consider two threads for simplicity.
Thread A creates the list with 10 items
Thread B sees the list has 10 items, and creates an array for 10 items
Thread A adds an item to the list, so it now has 11 items
Thread B crashes on CopyTo, since the array isn't big enough
And that's one of the sanest things that can happen.
Don't mess around with multi-threading at random. It's messy, dangerous and you'll be left with plenty of bugs that are hard to reproduce and fix. Unless something is explicitly said to be thread-safe, don't assume any thread-safety at all. Obligatory starter on multi-threading: http://www.albahari.com/threading/
The usual checklist goes something like this:
Is it really necessary to have multi-threading in the first place?
Are you sure sure?
Are there thread-safe classes that do exactly what you need?
Can you use a simple, consistent locking architecture that is guaranteed not to deadlock?
Are you really sure you need to share that object between multiple threads?
Seriously, multi-threading is hard. Could you perhaps do with immutable data that's explicitly passed between the threads, rather than having shared mutable state?
Find the simplest correct way you can handle the synchronization between the threads.
Is it good enough? Good, stop.
It's not good enough? Consider alternate approaches to data sharing.
If you still have a bottle-neck on a shared resource, consider lock-less programming. This is much, much harder than lock-based synchronization. Make sure you know what you're doing. Even the people who designed C#/.NET are very wary of lock-less programming. Even Raymond Chen, and that's the Chuck Norris of software engineering. There be lions. You need perfect understanding of everything that is and isn't guaranteed, and what is safe on your platform and what's common to all the platforms.
You don't need to create and size an array of structs yourself; just copy the list into another collection or call ToArray.
Additionally, you can use BlockingCollection, which is thread-safe and designed for producer/consumer operations.
https://msdn.microsoft.com/en-us/library/dd267312(v=vs.110).aspx
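A minimal producer/consumer sketch with BlockingCollection (Produce and Consume are illustrative placeholders):

using System.Collections.Concurrent;
using System.Threading.Tasks;

var queue = new BlockingCollection<MyStruct>();

var producer = Task.Run(() =>
{
    foreach (var item in Produce())      // illustrative source of items
        queue.Add(item);
    queue.CompleteAdding();              // signal that no more items will come
});

var consumer = Task.Run(() =>
{
    // Blocks while empty; the loop ends once adding is complete
    // and the queue has drained.
    foreach (var item in queue.GetConsumingEnumerable())
        Consume(item);                   // illustrative per-item work
});

Task.WaitAll(producer, consumer);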

Does locking with many different objects have a performance impact compared to only one?

I don't know if the question is stupid or not; locking and the Monitor are kind of a black box to me.
But I'm dealing with a situation where I can either use the same lock object to lock everything all the time, or use an indefinite number of objects to lock at a more fine-grained level.
I know that the second way will reduce the lock contention, but I may end up using 10K objects as locks and I don't know if it has an impact or not.
Bottom line: do too many locks hurt, or do they have no impact?
Edit
I wrote a lib that maintains a graph of objects; the number could be very high. For now it's not thread-safe, mainly for the reason Eric stated in his comment.
I initially thought that if the user wanted to do some multi-threading then he/she would have to take care of the locking.
But now I'm wondering that if I would have to make it thread-safe, what would be the best way to do it (note that making it thread-safe wouldn't be a short and easy ride for me so testing both solutions is something I can't do easily)?
As the purpose is to make each object of the graph thread-safe, I could use the instance of the object itself as the lock when I want to access/modify its properties. I know it's the best way to reduce contention, but I don't know whether it would scale as well as having only one lock for the whole graph.
I know there's a lot to consider, how many threads and especially (I think) the chance of an object being accessed/changed by multiple threads at a time (which I estimate to be pretty low). But I can't find accurate information about locks and their overhead in such case.
To get a clearer view of what's going on I looked at the source code of the Monitor class and its C++ counterpart in clr/src/vm/syncblk.cpp in the Shared Source Common Language Infrastructure released by Microsoft.
To answer my own question: no, having a lot of locks doesn't hurt in any way I could think of.
What I learned:
1) A lock that's already held by the same thread is processed almost for free.
2) A lock that's taken for the first time is basically the cost of an InterlockedCompareExchange.
3) Multiple threads waiting for a lock is fairly cheap to track (a linked list is maintained, O(1) complexity).
4) A thread waiting for a lock to be released is by far the most costly use case: the implementation first spin-waits to try to acquire the lock, but if that's not enough, a thread switch occurs, putting the thread to sleep until it's signaled that the lock has been released.
I got my answer by digging into 2): whether you always lock on the same object or on 10K different ones, it's basically the same (extra initialization is performed the first time you lock a given object, but it's not too bad). InterlockedCompareExchange doesn't care whether it's called on the same or different memory locations (AFAIK).
Contention is by far the most critical concern. Having many locks would reduce (drastically in my case) the chance of contention, so it can only be a good thing.
1) is also an important learned lesson: if I lock/unlock for each property change/access, I can improve performance by locking the object first, then changing many properties, and then releasing the lock. That way there is only one InterlockedCompareExchange, and the lock/unlock inside the implementation of each property change/access will only increment an internal counter.
To dig deeper I would have to find more information about the implementation of InterlockedCompareExchange; I think it relies on a CPU-specific assembly instruction...
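Lesson 1) in code form (illustrative; this assumes the properties lock on the object instance itself, as suggested above):

// Costly: every setter takes and releases the lock on its own,
// each paying for an InterlockedCompareExchange.
node.Width  = 1;
node.Height = 2;
node.Depth  = 3;

// Cheaper: take the lock once around the batch. The nested acquisitions
// inside each setter are re-entrant and only bump an internal counter.
lock (node)
{
    node.Width  = 1;
    node.Height = 2;
    node.Depth  = 3;
}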
Typically, performance concerns around locking are related to contention. Acquiring an uncontested lock is on the order of 10s of nanoseconds. Contention is the real performance killer. As you point out, having more locks (higher lock granularity) can improve performance by decreasing contention.
The drawback to having multiple locks is typically lock management must be more complex. If multiple locks are required to perform an operation there is the increased possibility of resource starvation issues like deadlock or livelock. Proper lock management, such as enforcing lock acquisition order, can alleviate these issues.
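For example, a common ordering convention looks like this (Account and its members are hypothetical):

void Transfer(Account from, Account to, decimal amount)
{
    // Always acquire locks in a fixed global order (by Id here), so two
    // transfers over the same pair of accounts can never deadlock by
    // grabbing the locks in opposite order.
    Account first  = from.Id < to.Id ? from : to;
    Account second = from.Id < to.Id ? to : from;

    lock (first)
    lock (second)
    {
        from.Balance -= amount;
        to.Balance   += amount;
    }
}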
Absent more details, I would probably go with one lock, since implementation is simpler and monitor performance of my application closely. Specifically there are .NET performance counters related to lock contention which can help diagnose/detect lock contention related perf issues.
As with all performance-related answers, I'd like to refer to this exceptional blog post by Eric Lippert: it depends. Have a look at his six questions; what are the answers in your case? Try what happens under your conditions.
Number of cores, contention, caching etc, all matters, so see what happens for you in your case, it's really impossible to know beforehand.
For those not clicking on the link: run them horses!
I'm not talking about performance as in speed here, but rather as in what happens when the application has been running for a while. According to "Lock (Monitor) internal implementation in .NET", the Monitor implementation is quite smart, so having internal locks for each object might seem a viable approach, since you said the objects number in the tens of thousands and not millions.
Bottom line: do too many locks hurt, or do they have no impact?
Not on their own, but it might be a reason to have a look at the architecture of your program; having a gazillion objects locked at the same time will cause overhead, though.

Generating cacheable data exactly once when needed and blocking otherwise?

I'm making a cool (imo) T4 template which will make caching a lot easier. One of the options I have in making this template is to allow for a "load once" type functionality, though I'm not sure how safe it is.
Basically, I want to make it so you can do something like this:
var post = MyCache.PostsCache.GetOrLockLoad(id, () => LoadPost(id));
and basically make it so that when the cache must be loaded, it will place a blocking lock across PostsCache. This way, other threads would block until the LoadPost() function is done, so LoadPost is only executed once per cache miss. The traditional way of doing this is that LoadPost is executed any time the cache is empty, possibly multiple times if multiple requests come in before the cache is loaded the first time.
Is this a reasonable thing to do, or is blocking other threads for something like this dangerous or wasteful? I'm thinking the thread-locking overhead is greater than most operations, but maybe not?
Has anyone seen this kind of thing done and is it a good idea or just dangerous?
Also, although it's designed to run on any cache and application type, it's initially targeted at ASP.NET's built-in caching mechanism.
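To make the discussion concrete, here is a bare-bones sketch of the idea, with a plain dictionary standing in for the real cache:

using System;
using System.Collections.Generic;

public class SimpleCache<TKey, TValue>
{
    private readonly object _sync = new object();
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();

    public TValue GetOrLockLoad(TKey key, Func<TValue> load)
    {
        lock (_sync)                  // blocks every caller while a load runs
        {
            TValue value;
            if (!_items.TryGetValue(key, out value))
            {
                value = load();       // runs at most once per cache miss
                _items[key] = value;
            }
            return value;
        }
    }
}

Note that this simple version also blocks readers of unrelated keys during a load, which is part of the cost being weighed below.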
This seems ok, since in theory the requests after the first will only wait about as long as it would have taken for them to load the data themselves anyway.
But it still feels a bit iffy: what if the first loader thread gets held up by some intermittent issue that may not affect other threads? It feels like it would be safer to let each thread try the load independently.
It's also adding the complexity and overhead of the locking mechanisms. Keep in mind the more locking you do, the more risk you introduce of getting a deadlock condition (in general). Although in your case, as long as there's no funky locking going on in the LoadPost method it shouldn't be an issue.
Given the risks, I think you would be better off going with a non-locking option.
After all, for any given thread the wait time is pretty much the same - either the time taken to load, or the time spent waiting for the first thread to load.
I'm always a little uncomfortable when a non-concurrent option is used over a concurrent one, especially if the gain seems marginal.

A lock-free priority queue in C#

I have been searching lately for information on how to construct a lock-free priority queue in C#. I have yet to find an implementation in any language, or a decent paper on the matter. I have found several papers which appear to be copies of, or at least references to, one particular paper which is not actually about lock-free priority queues, despite its name; it is in fact about a priority queue that uses fine-grained locks.
The responses I have been receiving from elsewhere include "use a single thread" and "you do not need it to be lock free" and "it is impossible". All three of these responses are incorrect.
If someone has some information on this, I would greatly appreciate it.
Generally, it's a bad idea to write this kind of code yourself.
However, if you really want to write this kind of code, I say take a page from Eric Lippert's book (or blog, as it were) (web archive link): basically, you would implement the queue so that instead of the modifying functions changing the instance you call them on, they return completely new instances of the queue.
This is semantically similar to the pattern that System.String uses to maintain immutability; all operations return a new System.String, the original is not modified.
The result of this is that you are forced to reassign the reference returned on every call. Because the assignments of references are atomic operations, there is no concern about thread-safety; you are guaranteed that the reads/writes will be atomic.
However, this will result in a last-in-wins situation; it's possible that multiple modifications are being made to the queue, but only the last assignment will hold, losing the other insertions into the queue.
This might be acceptable; if not, you have to use synchronization around the assignment and reading of the reference. You will still have a lock-free priority queue, but if you have concerns about thread-safety and maintaining the integrity of the operations, you have done nothing but move the synchronization concern outside of the data structure (which, in almost all cases, is a good thing, as it gives you fine-grained, explicit control).
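Sketched concretely, that outside synchronization can itself be lock-free: a compare-and-swap retry loop around the reference, so that no insertion is lost (this assumes a hypothetical immutable queue type whose Enqueue returns a new queue):

using System.Threading;

private ImmutablePriorityQueue _queue = ImmutablePriorityQueue.Empty; // hypothetical type

public void Enqueue(int priority, string value)
{
    ImmutablePriorityQueue oldQueue, newQueue;
    do
    {
        oldQueue = _queue;                            // snapshot the current queue
        newQueue = oldQueue.Enqueue(priority, value); // build a new queue from it
    }
    // Publish only if no other thread swapped the reference in the meantime;
    // otherwise loop and rebuild against the newer queue.
    while (Interlocked.CompareExchange(ref _queue, newQueue, oldQueue) != oldQueue);
}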
The Art of Multiprocessor Programming: look at Chapter 15, "Priority Queues". The book is in Java, but it can easily be translated to C#, since both have a GC (which is important for most implementations in the book).

Multicore programming: the hard parts

I'm writing a book on multicore programming using .NET 4 and I'm curious to know what parts of multicore programming people have found difficult to grok or anticipate being difficult to grok?
What's a useful unit of work to parallelize, and how do I find/organize one?
All these parallelism primitives aren't helpful if you fork a piece of work that is smaller than the forking overhead; in fact, that buys you a nice slowdown instead of what you are expecting.
So one of the big problems is finding units of work that are obviously more expensive than the parallelism primitives. A key problem here is that nobody knows what anything costs to execute, including the parallelism primitives themselves. Clearly calibrating these costs would be very helpful. (As an aside, we designed, implemented, and daily use a parallel programming language, PARLANSE, whose objective was to minimize the cost of the parallelism primitives by allowing the compiler to generate and optimize them, with the goal of making smaller bits of work "more parallelizable".)
One might also consider discussing big-O notation and its applications. We all hope that the parallelism primitives have cost O(1). If that's the case, then any work whose cost is greater than O(1) is a good candidate for parallelization. If your proposed work is also O(1), then whether parallelizing it is effective depends on the constant factors, and we are back to calibration as above.
There's the problem of collecting work into large enough units, if none of the pieces are large enough. Code motion, algorithm replacement, ... are all useful ideas to achieve this effect.
Lastly, there's the problem of synchronization: when do my parallel units have to interact, what primitives should I use, and how much do those primitives cost? (More than you expect!)
I guess some of it depends on how basic or advanced the book/audience is. When you go from single-threaded to multi-threaded programming for the first time, you typically fall off a huge cliff (and many never recover, see e.g. all the muddled questions about Control.Invoke).
Anyway, to add some thoughts that are less about the programming itself, and more about the other related tasks in the software process:
Measuring: deciding what metric you are aiming to improve, measuring it correctly (it is so easy to accidentally measure the wrong thing), using the right tools, differentiating signal versus noise, interpreting the results and understanding why they are as they are.
Testing: how to write tests that tolerate unimportant non-determinism/interleavings, but still pin down correct program behavior.
Debugging: tools, strategies, when "hard to debug" implies feedback to improve your code/design and better partition mutable state, etc.
Physical versus logical thread affinity: understanding the GUI thread, understanding how e.g. an F# MailboxProcessor/agent can encapsulate mutable state and run on multiple threads but always with only a single logical thread (one program counter).
Patterns (and when they apply): fork-join, map-reduce, producer-consumer, ...
I expect that there will be a large audience for e.g. "help, I've got a single-threaded app with 12% CPU utilization, and I want to learn just enough to make it go 4x faster without much work" and a smaller audience for e.g. "my app is scaling sub-linearly as we add cores because there seems to be contention here, is there a better approach to use?", and so a bit of the challenge may be serving each of those audiences.
Since you're writing a whole book on multi-core programming in .NET, I think you can also go beyond multi-core a little bit.
For example, you could have a chapter on parallel computing in a distributed system in .NET. Unfortunately, there are no mature frameworks in .NET yet; DryadLINQ is the closest. (On the other side, Hadoop and its friends on the Java platform are really good.)
You can also use a chapter demonstrating some GPU computing stuff.
One thing that has tripped me up is which approach to use to solve a particular type of problem. There are agents, tasks, async computations, and MPI for distribution; for many problems you could use several of these, but I have difficulty understanding why I should use one over another.
Hard to understand: low-level memory details, like the difference between acquire and release semantics.
Most of the rest of the concepts and ideas (anything can interleave, race conditions, ...) are not that difficult with a little usage.
Of course the practice is very hard, especially when something fails only sometimes, as you need to work at multiple levels of abstraction to understand what is going on. So keep your design simple, and as far as possible design out the need for locking (e.g. by using immutable data and higher-level abstractions).
It's not so much the theoretical details as the practical implementation details that trip people up.
What's the deal with immutable data structures?
All the time, people try to update a data structure from multiple threads, find it too hard, and someone chimes in "use immutable data structures!", and so our persistent coder writes this:
ImmutableSet set = ImmutableSet.Empty;

void ThreadLoop1()
{
    foreach (Customer c in dataStore1)
        set = set.Add(ProcessCustomer(c));   // read-modify-write of shared 'set'
}

void ThreadLoop2()
{
    foreach (Customer c in dataStore2)
        set = set.Add(ProcessCustomer(c));   // the same race, from the other thread
}
The coder has heard all their life that immutable data structures can be updated without locking, but the new code doesn't work, for the obvious reason: reading 'set' and writing the new set back are separate steps, so one thread's update can overwrite, and thereby lose, the other's.
Even if you're targeting academics and experienced devs, a little primer on the basics of immutable programming idioms can't hurt.
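A primer-style sketch of one correct idiom: don't share the set at all while building it, and merge once at the end (ImmutableHashSet<T> is from System.Collections.Immutable; Customer, ProcessCustomer, and the data stores follow the example above):

using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading.Tasks;

ImmutableHashSet<Customer> ProcessAll(IEnumerable<Customer> store)
{
    var local = ImmutableHashSet<Customer>.Empty;
    foreach (Customer c in store)
        local = local.Add(ProcessCustomer(c));  // no other thread ever sees 'local'
    return local;
}

var t1 = Task.Run(() => ProcessAll(dataStore1));
var t2 = Task.Run(() => ProcessAll(dataStore2));
var set = t1.Result.Union(t2.Result);           // merge once, on a single thread

(The other classic fix is a compare-and-swap retry loop around the shared reference, as sketched in the priority-queue question above.)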
How to partition roughly equal amounts of work between threads?
Getting this step right is hard. Sometimes you break up a single process into 10,000 steps which can be executed in parallel, but not all steps take the same amount of time. If you split the work across 4 threads, and the first 3 threads finish in 1 second while the last one takes 60 seconds, your multithreaded program isn't much better than the single-threaded version, right?
So how do you partition problems with roughly equal amounts of work between all threads? Lots of good heuristics for solving bin-packing problems should be relevant here; see the sketch below.
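One widely used heuristic is dynamic scheduling: rather than pre-assigning fixed chunks, let the threads pull small pieces from a shared queue so that early finishers pick up the slack (WorkItem and allSteps are illustrative):

using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var work = new ConcurrentQueue<WorkItem>(allSteps);   // e.g. the 10,000 steps

void Worker()
{
    WorkItem item;
    // A thread stuck on a 60-second step simply stops pulling more work;
    // the other threads drain the rest of the queue meanwhile.
    while (work.TryDequeue(out item))
        item.Execute();
}

var workers = Enumerable.Range(0, 4)
                        .Select(_ => Task.Run(Worker))
                        .ToArray();
Task.WaitAll(workers);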
How many threads?
If your problem is nicely parallelizable, adding more threads should make it faster, right? Well not really, lots of things to consider here:
Even on a single-core processor, adding more threads can make a program faster, because more threads give the OS more opportunities to schedule your work, so your program gets more execution time than the single-threaded version would. But by the law of diminishing returns, adding more threads increases context switching, so at a certain point, even if your program has the most execution time, performance could still be worse than the single-threaded version.
So how do you spin off just enough threads to minimize execution time?
And if there are lots of other apps spinning up threads and competing for resources, how do you detect performance changes and adjust your program automagically?
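A common starting point, sketched here, is one worker per hardware thread for CPU-bound work, then measure rather than guess (items and Process are illustrative):

using System;
using System.Threading.Tasks;

var options = new ParallelOptions
{
    // For CPU-bound work, more threads than cores usually just adds
    // context switches; start here and measure.
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(items, options, item => Process(item));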
I find synchronized data moving across worker nodes in complex patterns very hard to visualize and program.
Usually I find debugging to be a bear, also.
