I have found a possible slowdown in my app, so I have two questions:
What is the real difference between simple locking on object and reader/writer locks?
E.g. I have a collection of clients that changes quickly. For iterating over it, should I use a reader lock, or is a simple lock enough?
In order to decrease load, I have left the iteration (read-only) of one collection without any locks. This collection changes often and quickly, but items are added and removed under writer locks. Is it safe to leave this read unprotected by a lock? I don't mind an occasionally skipped item (this method runs in a loop and it's not critical); I just don't want random exceptions.
No, your current scenario is not safe.
In particular, if a collection changes while you're iterating over it, you'll get an InvalidOperationException in the iterating thread. You should obtain a reader lock for the whole duration of your iterator:
Obtain reader lock
Iterate over collection
Release reader lock
Note this is not the same as obtaining a reader lock for each step of the iteration - that won't help.
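A minimal sketch of the three steps above, assuming a hypothetical ClientRegistry class that holds the clients as strings (the class and its member names are made up for illustration, not taken from the question). The read lock is entered once and held for the whole foreach, which is the point of the note above:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical registry; ClientRegistry, Add, Remove and CountWithPrefix are illustrative names.
public sealed class ClientRegistry
{
    private readonly List<string> _clients = new List<string>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(string client)
    {
        _lock.EnterWriteLock();
        try { _clients.Add(client); }
        finally { _lock.ExitWriteLock(); }
    }

    public bool Remove(string client)
    {
        _lock.EnterWriteLock();
        try { return _clients.Remove(client); }
        finally { _lock.ExitWriteLock(); }
    }

    // The read lock covers the entire loop, not each individual step of it.
    public int CountWithPrefix(string prefix)
    {
        _lock.EnterReadLock();
        try
        {
            int count = 0;
            foreach (string client in _clients)
            {
                if (client.StartsWith(prefix, StringComparison.Ordinal)) count++;
            }
            return count;
        }
        finally { _lock.ExitReadLock(); }
    }
}
```

Several threads can be inside CountWithPrefix at the same time; Add and Remove wait until all readers have left.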
As for the difference between reader/writer locks and "normal" locks - the idea of a reader/writer lock is that multiple threads can read at the same time, but only one thread can write (and only when no-one is reading). In some cases this can improve performance - but it increases the complexity of the solution too (in terms of getting it right). I'd also advise you to use ReaderWriterLockSlim from .NET 3.5 if you possibly can - it's much more efficient than the original ReaderWriterLock, and there are some inherent problems with ReaderWriterLock IIRC.
Personally I normally use simple locks until I've proved that lock contention is a performance bottleneck. Have you profiled your application yet to find out where the bottleneck is?
Ok first about the reading iteration without locks thing. It's not safe, and you shouldn't do it. Just to illustrate the point in the most simple way - you're iterating through a collection but you never know how many items are in that collection and have no way to find out. Where do you stop? Checking the count every iteration doesn't help because it can change after you check it but before you get the element.
ReaderWriterLock is designed for a situation where you allow multiple threads to have concurrent read access, but force synchronous writes. From the sounds of your application you don't have multiple concurrent readers, and writes are just as common as reads, so the ReaderWriterLock provides no benefit. You'd be better served by classic locking in this case.
In general whatever tiny performance benefits you squeeze out of not locking access to shared objects with multithreading are dramatically offset by random weirdness and unexplainable behavior. Lock everything that is shared, test the application, and then when everything works you can run a profiler on it, check just how much time the app is waiting on locks and then implement some dangerous trickery if needed. But chances are the impact is going to be small.
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified” - Donald Knuth
I have a persistent B+tree; multiple threads are reading different chunks of the tree and performing some operations on the data they read. Interesting part: each thread produces a set of results, and as an end user I want to see all the results in one place. What I do: one ConcurrentDictionary, and all threads write to it.
Everything works smoothly this way. But the application is time critical; one extra second means total dissatisfaction. ConcurrentDictionary, because of the thread-safety overhead, is intrinsically slow compared to Dictionary.
I can use Dictionary, then each thread will write results to distinct dictionaries. But then I'll have the problem of merging different dictionaries.
My Questions:
Are concurrent collections a good decision for my scenario ?
If Not(1), then how would I merge the different dictionaries optimally, given that (a) copying items one by one and (b) LINQ are known solutions and are not as fast as expected :)
If Not(2) ;-) What would you suggest instead ?
A quick info:
#Thread = processorCount. The application can run on a standard laptop (i.e., 4 threads) or high-end server (i.e., <32 threads)
Item Count. The tree usually holds more than 1.0E+12 items.
From your timings it seems that the locking/building of the result dictionary is taking 3700ms per thread with the actual processing logic taking just 300ms.
I suggest that as an experiment you let each thread create its own local dictionary of results. Then you can see how much time is spent building the dictionary compared to how much is the effect of locking across threads.
If building the local dictionary adds more than 300ms then it will not be possible to meet your time limit. Because without any locking or any attempt to merge the results it has already taken too long.
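A rough sketch of that experiment, under the assumption that keys are ints and the real per-item work can be stood in for by a trivial computation (LocalThenMerge, Run and the `key * 2` placeholder are all made up for illustration). Each worker fills its own plain Dictionary with no locking, and the merge happens once at the end on the calling thread:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LocalThenMerge
{
    // Each worker builds a private Dictionary; nothing is shared while the workers run.
    public static Dictionary<int, int> Run(int workers, int itemsPerWorker)
    {
        var locals = new Dictionary<int, int>[workers];

        Parallel.For(0, workers, w =>
        {
            var local = new Dictionary<int, int>(itemsPerWorker);
            int start = w * itemsPerWorker;
            for (int key = start; key < start + itemsPerWorker; key++)
            {
                local[key] = key * 2; // placeholder for the real processing logic
            }
            locals[w] = local; // each worker writes only its own array slot
        });

        // Single-threaded merge after all workers have finished.
        var merged = new Dictionary<int, int>(workers * itemsPerWorker);
        foreach (Dictionary<int, int> local in locals)
        {
            foreach (KeyValuePair<int, int> pair in local)
            {
                merged[pair.Key] = pair.Value;
            }
        }
        return merged;
    }
}
```

Wrapping the Parallel.For call and the merge loop in separate Stopwatch measurements would show how the time splits between building and merging.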
Update
It seems that you can either pay the merge price as you go along, with the locking causing the threads to sit idle for a significant percentage of time, or pay the price in a post-processing merge. But the core problem is that the locking means you are not fully utilising the available CPU.
The only real solution to getting maximum performance from your cores is to use a non-blocking dictionary implementation that is also thread safe. I could not find a .NET implementation but did find a research paper detailing an algorithm that would indicate it is possible.
Implementing such an algorithm correctly is not trivial but would be fun!
Scalable and Lock-Free Concurrent Dictionaries
Had you considered async persistence?
Is it allowed in your scenario?
You can bypass to a queue in a separated thread pool (creating a thread pool would avoid the overhead of creating a (sub)thread for each request), and there you can handle the merging logic without affecting response time.
I don't know if the question is stupid or not; locking and the Monitor are kind of a black box to me.
But I'm dealing with a situation where I can either use the same lock object to lock everything all the time or use an indefinite number of objects to lock at a more fine-grained level.
I know that the second way will reduce lock contention, but I may end up using 10K objects as locks and I don't know if that has an impact or not.
Bottom line: do too many locks hurt locking, or does it have no impact?
Edit
I wrote a lib that maintains a graph of objects; the number could be very high. For now it's not thread safe, mainly for the reason Eric stated in his comment.
I initially thought that if the user wanted to do some multi-threading then he/she would have to take care of the locking.
But now I'm wondering: if I had to make it thread-safe, what would be the best way to do it (note that making it thread-safe wouldn't be a short and easy ride for me, so testing both solutions is something I can't do easily)?
As the purpose is to make each object of the graph thread-safe, then I could use the instance of the object for the lock when I want to access/modify its properties. I know it's the best way to reduce contention, but I don't know if it would scale as much as having only one lock for the whole graph.
I know there's a lot to consider, how many threads and especially (I think) the chance of an object being accessed/changed by multiple threads at a time (which I estimate to be pretty low). But I can't find accurate information about locks and their overhead in such case.
To get a clearer view of what's going on I looked at the source code of the Monitor class and its C++ counterpart in clr/src/vm/syncblk.cpp in the Shared Source Common Language Infrastructure released by Microsoft.
To answer my own question: no, having a lot of locks doesn't hurt in any harmful way I could think of.
What I learned:
1) A lock that is already taken by the same thread is processed "almost free".
2) A lock that's taken for the first time basically costs one InterlockedCompareExchange.
3) Multiple threads waiting for a lock are fairly cheap to track (a linked list is maintained, O(1) complexity).
4) A thread waiting for a lock to be released is by far the most costly case: the implementation first spin-waits in an attempt to acquire the lock, but if that's not enough a thread switch occurs, putting the thread to sleep until a mutex signals that it's time to wake up because the lock has been released.
I got my answer by digging into 2): whether you're always locking on the same object or on 10K different ones, it's basically the same (extra initialization is performed the first time you lock a given object, but it's not too bad). The InterlockedCompareExchange doesn't care about being called on the same or different memory locations (AFAIK).
Contention is by far the most critical concern. Having many locks would reduce (drastically in my case) the chance of contention, so it can only be a good thing.
1) is also an important lesson learned: if I lock/unlock for each property change/access, I can improve performance by locking the object first, then changing many properties, and then releasing the lock. This way there will be only one InterlockedCompareExchange, and the lock/unlock inside the implementation of each property change/access will only increment an internal counter.
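To illustrate that last point, here is a small sketch assuming a hypothetical Node class with two properties (Node, X, Y and MoveTo are illustrative names; I also use a private lock object rather than the instance itself, which is a design choice and not something stated above). Monitor is reentrant, so the lock statements inside the setters succeed immediately when the calling thread already holds the lock:

```csharp
using System;

// Hypothetical graph node; every property access is individually thread-safe.
public sealed class Node
{
    private readonly object _sync = new object();
    private int _x;
    private int _y;

    public int X
    {
        get { lock (_sync) { return _x; } }
        set { lock (_sync) { _x = value; } }
    }

    public int Y
    {
        get { lock (_sync) { return _y; } }
        set { lock (_sync) { _y = value; } }
    }

    // One outer acquisition; the nested locks taken by the setters re-enter
    // the monitor this thread already owns instead of competing for it.
    public void MoveTo(int x, int y)
    {
        lock (_sync)
        {
            X = x;
            Y = y;
        }
    }
}
```

A side benefit of the outer lock is that other threads never observe the new X together with the old Y.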
To dig deeper I would have to find more information about the implementation of the InterlockedCompareExchange, I think it relies on the CPU specific assembly instruction...
Typically, performance concerns around locking are related to contention. Acquiring an uncontested lock is on the order of 10s of nanoseconds. Contention is the real performance killer. As you point out, having more locks (higher lock granularity) can improve performance by decreasing contention.
The drawback to having multiple locks is typically lock management must be more complex. If multiple locks are required to perform an operation there is the increased possibility of resource starvation issues like deadlock or livelock. Proper lock management, such as enforcing lock acquisition order, can alleviate these issues.
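As a sketch of what enforcing a lock acquisition order can look like, assume a hypothetical Account class where a transfer needs the locks of two objects at once (Account, Id and Transfer are made-up names for illustration). Every thread takes the lock of the account with the lower Id first, so two opposite transfers cannot end up each holding one lock and waiting for the other:

```csharp
using System;
using System.Threading;

public sealed class Account
{
    private static int _nextId;

    private readonly object _sync = new object();

    // Unique, increasing Id used only to define a global lock order.
    public int Id { get; } = Interlocked.Increment(ref _nextId);
    public decimal Balance { get; private set; }

    public Account(decimal balance)
    {
        Balance = balance;
    }

    public static void Transfer(Account from, Account to, decimal amount)
    {
        if (ReferenceEquals(from, to)) return;

        // Always acquire the lock of the lower Id first, whatever the transfer direction.
        Account first = from.Id < to.Id ? from : to;
        Account second = from.Id < to.Id ? to : from;

        lock (first._sync)
        {
            lock (second._sync)
            {
                from.Balance -= amount;
                to.Balance += amount;
            }
        }
    }
}
```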
Absent more details, I would probably go with one lock, since the implementation is simpler, and monitor the performance of my application closely. Specifically, there are .NET performance counters related to lock contention which can help diagnose or detect contention-related performance issues.
As with all performance-related answers, I'd like to refer to this exceptional blog post by Eric Lippert: it depends. Have a look at his six questions; what are the answers in your case? Try it and see what happens under your conditions.
Number of cores, contention, caching etc. all matter, so see what happens in your case; it's really impossible to know beforehand.
For those not clicking on the link; run them horses!
I'm not talking about performance as in speed here, but rather as in what happens when the application has been running for a while. According to "Lock (Monitor) internal implementation in .NET", the Monitor implementation is quite smart, so having an internal lock for each object might be a viable approach, since you said the objects number in the tens of thousands and not millions.
Bottom line: do too many locks hurt locking, or does it have no impact?
Not on its own, but it might be a reason to have a look at the architecture of your program; having a gazillion objects locked at the same time will cause overhead, though.
If I can guarantee myself that only one method in my entire app will ever write to a certain variable, then may I allow other methods in my app to safely read that value ?
If so, can I get away with that stunt without locking the variable ?
In this context, what I'm doing (or, trying to do, or want to do) is for one method in one thread to put a value into the variable, and then other methods in other threads will read that value and make decisions.
A very nice option would be to lock against writes, while allowing reads.
I looked at the MSDN page on lock and didn't see a way to do that.
As always, it depends a lot on the context.
a variable read in a tight loop may be stored in a register or local cache, so no change will be noticed unless you have a "fence"; volatile will fix this, but as a side-effect rather than by explicit intention; most people (including me) can't properly define what volatile means - so be very careful of using it as a "fix".
an oversize type (large struct) will not be atomic (for either read or write) - and cannot be handled safely without risk of tearing
an object or value might involve multiple sub-values; if they aren't changed atomically, it could cause problems
You might, however, find that Interlocked solves most of your problems without needing a lock. At the same time, an uncontested lock is insanely fast, and even a contested lock is still alarmingly fast. Frankly, I'm not sure that it is worth the thought you are giving it: a flat lock is almost certainly fast-enough, as long as you do the thinking first outside the lock, and only lock it when you know the changes you want to make.
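For a single numeric value, an Interlocked-based version of the one-writer, many-readers scenario might look like the sketch below (PublishedValue and its member names are hypothetical). Interlocked.Read matters mainly for 64-bit values on 32-bit platforms, where a plain read of a long is not guaranteed to be atomic:

```csharp
using System.Threading;

// Hypothetical holder for one shared 64-bit value, with no lock statement anywhere.
public sealed class PublishedValue
{
    private long _value;

    // Writer side: atomically replace the value (full fence).
    public void Publish(long value)
    {
        Interlocked.Exchange(ref _value, value);
    }

    // Reader side: atomic read, safe even where long reads could otherwise tear.
    public long Read()
    {
        return Interlocked.Read(ref _value);
    }

    // Atomic read-modify-write, returning the new value.
    public long Add(long delta)
    {
        return Interlocked.Add(ref _value, delta);
    }
}
```

This only covers a single value; as the list above points out, several related sub-values that must change together still need a lock.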
There is also ReaderWriterLockSlim, but the number of cases where that actually improves performance is slim - in my experience, the simplest approach possible is usually the fastest, meaning either lock or Interlocked. ReaderWriterLockSlim is a more complex beast, designed for more complex scenarios, and has a little overhead because of it. Not massive amounts, but enough to make it worth looking carefully.
Is it necessary to lock LINQ statements as follows? If the lock is omitted, will any exceptions be encountered when multiple threads execute it concurrently?
lock (syncKey)
{
    return (from keyValue in dictionary
            where keyValue.Key > versionNumber
            select keyValue.Value).ToList();
}
PS: Writer threads do exist to mutate the dictionary.
Most types are thread-safe to read, but not thread-safe during mutation.
If none of the threads is changing the dictionary, then you don't need to do anything - just read away.
If, however, one of the threads is changing it then you have problems and need to synchronize. The simplest approach is a lock, however this prevents concurrent readers even when there is no writer. If there is a good chance you will have more readers than writers, consider using a ReaderWriterLockSlim to synchronize - this will allow any number of readers (with no writer), or: one writer.
In 4.0 you might also consider a ConcurrentDictionary<,>
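Here is roughly what the query from the question could look like on top of a ConcurrentDictionary<,>, assuming int keys and string values (VersionStore, Set and NewerThan are illustrative names, not from the question). Enumerating a ConcurrentDictionary while other threads write to it does not throw, but it is not a point-in-time snapshot either, so the result may or may not include entries added during the query:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public sealed class VersionStore
{
    private readonly ConcurrentDictionary<int, string> _byVersion =
        new ConcurrentDictionary<int, string>();

    // Writers can call this from any thread without an external lock.
    public void Set(int version, string value)
    {
        _byVersion[version] = value;
    }

    // Same query shape as in the question, with no lock around it.
    public List<string> NewerThan(int versionNumber)
    {
        return (from keyValue in _byVersion
                where keyValue.Key > versionNumber
                select keyValue.Value).ToList();
    }
}
```

The order of the returned values is unspecified, so sort them if the caller depends on it.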
So long as the query has no side effects (such as any of the expressions calling code that makes changes), there is no need to lock a LINQ statement.
Basically, if you don't modify the data (and nothing else is modifying the data you are using) then you don't need locks.
If you are using .NET 4.0, there is a ConcurrentDictionary that is thread safe. Here is an example of using a concurrent dictionary (admittedly not in a LINQ statement).
UPDATE
If you are modifying data then you need to use locks. If two or more threads attempt to access a locked section of code there will be a small performance loss as one or more of the threads waits for the lock to be released. NOTE: If you over-lock then you may end up with worse performance than you would if you had just built the code using a sequential algorithm from the start.
If you are only ever reading data then you don't need locks as there is no mutable shared state to protect.
If you do not use locks then you may end up with intermittent bugs where the data is not quite right or exceptions are thrown when collisions occur between readers and writers. In my experience, most of the time you may never get an exception, you just get corrupt data (except you don't necessarily know it is corrupt). Here is another example showing how data can be corrupted if you don't use locks or redesign your algorithm to cope.
You often get the best out of a system if you consider the constraints of developing in a parallel system from the outset. Sometimes you can re-write your code so it uses no shared data. Sometimes you can split the data up into chunks and have each thread/task work on its own chunk, then have some process at the end stitch it all back together again.
If your dictionary is static and the method where you run the query is not (or in any other concurrent access scenario), and the dictionary can be modified from another thread, then yes, a lock is required; otherwise it is not.
Yes, you need to lock your shared resources when using LINQ in multi-threaded scenarios (EDIT: of course, if your source collection is being modified as Marc said, if you are only reading it, you don't need to worry about it). If you are using .Net 4 or the parallel extensions for 3.5 you could look at replacing your Dictionary with a ConcurrentDictionary (or use some other custom implementation anyway).
There are a lot of articles and discussions explaining why it is good to build thread-safe classes. It is said that if multiple threads access e.g. a field at the same time, there can only be some bad consequences. So, what is the point of keeping non thread-safe code? I'm focusing mostly on .NET, but I believe the main reasons are not language-dependent.
E.g. .NET static fields are not thread-safe. What would be the result if they were thread-safe by default? (without a need to perform "manual" locking). What are the benefits of using (actually defaulting to) non-thread-safety?
One thing that comes to my mind is performance (more of a guess, though). It's rather intuitive that, when a function or field doesn't need to be thread-safe, it shouldn't be. However, the question is: what for? Is thread-safety just an additional amount of code you always need to implement? In what scenarios can I be 100% sure that e.g. a field won't be used by two threads at once?
Writing thread-safe code:
Requires more skilled developers
Is harder and consumes more coding efforts
Is harder to test and debug
Usually has bigger performance cost
But! Thread-safe code is not always needed. If you can be sure that some piece of code will be accessed by only one thread, the list above becomes a huge and unnecessary overhead. It is like renting a van to go to a neighboring city when there are two of you and not much luggage.
Thread safety comes with costs - you need to lock fields that might cause problems if accessed simultaneously.
In applications that make no use of threads, but need high performance where every CPU cycle counts, there is no reason to have thread-safe classes.
So, what is the point of keeping non thread-safe code?
Cost. Like you assumed, there usually is a penalty in performance.
Also, writing thread-safe code is more difficult and time consuming.
Thread safety is not a "yes" or "no" proposition. The meaning of "thread safety" depends upon context; does it mean "concurrent-read safe, concurrent write unsafe"? Does it mean that the application just might return stale data instead of crashing? There are many things that it can mean.
The main reason not to make a class "thread safe" is the cost. If the type won't be accessed by multiple threads, there's no advantage to putting in the work and increasing the maintenance cost.
Writing threadsafe code is painfully difficult at times. For example, simple lazy loading requires two checks for '== null' and a lock. It's really easy to screw up.
[EDIT]
I didn't mean to suggest that threaded lazy loading was particularly difficult, it's the "Oh and I didn't remember to lock that first!" moments that come fast and hard once you think you're done with the locking that are really the challenge.
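For reference, the "two checks for null and a lock" pattern mentioned above can be sketched like this, assuming a hypothetical LazyHolder<T> wrapper (the class name and the factory delegate are illustrative). The volatile field is what makes the unlocked first check safe to rely on:

```csharp
using System;

public sealed class LazyHolder<T> where T : class
{
    private readonly object _sync = new object();
    private readonly Func<T> _factory;
    private volatile T _instance;

    public LazyHolder(Func<T> factory)
    {
        _factory = factory;
    }

    public T Value
    {
        get
        {
            T current = _instance;
            if (current == null)          // first check, without the lock
            {
                lock (_sync)
                {
                    current = _instance;
                    if (current == null)  // second check, under the lock
                    {
                        current = _factory();
                        _instance = current;
                    }
                }
            }
            return current;
        }
    }
}
```

From .NET 4.0 onwards the built-in Lazy<T> covers this case, so hand-rolling it is mostly useful for seeing how easy it is to forget one of the two checks.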
There are situations where "thread-safe" doesn't make sense. This consideration is in addition to the higher developer skill and increased time (development, testing, and runtime all take hits).
For example, List<T> is a commonly-used non-thread-safe class. If we were to create a thread-safe equivalent, how would we implement GetEnumerator? Hint: there is no good solution.
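A tiny single-threaded sketch shows why the enumerator is the hard part (EnumeratorDemo and ModifyWhileEnumerating are made-up names). List<T> invalidates its enumerator as soon as the list changes, so a thread-safe variant would have to either hold a lock for the caller's entire foreach or hand out a copy:

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Returns true if changing the list mid-iteration invalidated the enumerator.
    public static bool ModifyWhileEnumerating()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int item in list)
            {
                if (item == 1) list.Add(4); // mutation while an enumerator is live
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // Thrown by the next MoveNext call after the modification.
            return true;
        }
    }
}
```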
Turn this question on its head.
In the early days of programming there was no Thread-Safe code because there was no concept of threads. A program started, then proceeded step by step to the end. Events? What's that? Threads? Huh?
As hardware became more powerful, concepts of what types of problems could be solved with software became more imaginative and developers more ambitious, the software infrastructure became more sophisticated. It also became much more top-heavy. And here we are today, with a sophisticated, powerful, and in some cases unnecessarily top-heavy software ecosystem which includes threads and "thread-safety".
I realize the question is aimed more at application developers than, say, firmware developers, but looking at the whole forest does offer insights into how that one tree evolved.
So, what is the point of keeping non thread-safe code?
By allowing for code that isn't thread safe you're leaving it up to the programmer to decide what the correct level of isolation is.
As others have mentioned this allows for complexity reduction and improved performance.
Rico Mariani wrote two articles, "Putting your synchronization at the correct level" and "Putting your synchronization at the correct level -- solution", that have a nice example of this in action.
In the article he has a method called DoWork(). In it he calls other classes' Read twice, Write twice, and then LogToSteam.
Read, Write, and LogToSteam all shared a lock and were thread safe. This is good except for the fact that because DoWork was also thread safe all the synchronizing work in each Read, Write and LogToSteam was a complete waste of time.
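This is not the code from the articles, just my own minimal sketch of the idea with hypothetical names (Store, Worker, LogCount): the low-level Read and Write carry no locks of their own, and the single lock lives in DoWork, the level at which the operation actually has to be atomic:

```csharp
using System.Collections.Generic;

// Low-level building block: deliberately NOT thread-safe on its own.
public sealed class Store
{
    private int _value;

    public int Read() { return _value; }
    public void Write(int value) { _value = value; }
}

public sealed class Worker
{
    private readonly object _sync = new object();
    private readonly Store _store = new Store();
    private readonly List<string> _log = new List<string>();

    // One lock around the whole unit of work instead of one per inner call.
    public int DoWork(int delta)
    {
        lock (_sync)
        {
            int before = _store.Read();
            _store.Write(before + delta);
            int middle = _store.Read();
            _store.Write(middle + delta);
            _log.Add(before + " -> " + (middle + delta));
            return middle + delta;
        }
    }

    public int LogCount
    {
        get { lock (_sync) { return _log.Count; } }
    }
}
```

The trade-off is that Store is only safe when used through Worker, which is exactly the decision the programmer gets to make when types are not thread-safe by default.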
This is all related to the nature of imperative programming. Its side effects cause the need for this.
However, if you had a development platform where applications could be expressed as pure functions with no dependencies or side effects, then it would be possible to create applications where the threading was managed without developer intervention.
So, what is the point of keeping non thread-safe code?
The rule of thumb is to avoid locking as much as possible. The ideal code is re-entrant and thread safe without any locking. But that would be utopia.
Coming back to reality, a good programmer tries his level best to use sectional locking as opposed to locking the entire context. An example would be to lock a few lines of code at a time in various routines rather than locking everything in a function.
Also, one has to refactor the code to come up with a design that would minimize the locking, if not get rid of it in its entirety.
e.g. consider a foobar() function that gets new data on each call and uses a switch() on the type of data to change a node in a tree. The locking can be mostly avoided (if not completely), as each case statement would touch a different node in the tree. This may be a more specific example, but I think it illustrates my point.