When should locks be used? Only when modifying data or when accessing it as well?
public class Test {
static Dictionary<string, object> someList = new Dictionary<string, object>();
static object syncLock = new object();
public static object GetValue(string name) {
if (someList.ContainsKey(name)) {
return someList[name];
} else {
lock(syncLock) {
object someValue = GetValueFromSomeWhere(name);
someList.Add(name, someValue);
}
}
}
}
Should there be a lock around the the entire block or is it ok to just add it to the actual modification? My understanding is that there still could be some race condition where one call might not have found it and started to add it while another call right after might have also run into the same situation - but I'm not sure. Locking is still so confusing. I haven't run into any issues with the above similar code but I could just be lucky so far. Any help above would be appriciated as well as any good resources for how/when to lock objects.
You have to lock when reading too, or you can get unreliable data, or even an exception if a concurrent modification physically changes the target data structure.
In the case above, you need to make sure that multiple threads don't try to add the value at the same time, so you need at least a read lock while checking whether it is already present. Otherwise multiple threads could decide to add, find the value is not present (since this check is not locked), and then all try to add in turn (after getting the lock)
You could use a ReaderWriterLockSlim if you have many reads and only a few writes. In the code above you would acquire the read lock to do the check and upgrade to a write lock once you decide you need to add it. In most cases, only a read lock (which allows your reader threads to still run in parallel) would be needed.
There is a summary of the available .Net 4 locking primitives here. Definitely you should understand this before you get too deep into multithreaded code. Picking the correct locking mechanism can make a huge performance difference.
You are correct that you have been lucky so far - that's a frequent feature of concurrency bugs. They are often hard to reproduce without targeted load testing, meaning correct design (and exhaustive testing, of course) is vital to avoid embarrassing and confusing production bugs.
Lock the whole block before you check for the existence of name. Otherwise, in theory, another thread could add it between the check, and your code that adds it.
Actually locking just when you perform the Add really doesn't do anything at all. All that would do is prevent another thread from adding something simultaneously. But since that other thread would have already decided it was going to do the add, it would just try to do it anyway as soon as the lock was released.
If a resource can only be accessed by multiple threads, you do not need any locks.
If a resource can be accessed by multiple threads and can be modified, then all accesses/modifications need to be synchronized. In your example, if GetValueFromSomeWhere takes a long time to return, it is possible for a second call to be made with the same value in name, but the value has not been stored in the Dictionary.
ReaderWriterLock or the slim version if you under 4.0.
You will aquire the reader lock for the reads (will allow for concurrent reads) and upgrade the lock to the writer lock when something is to write (will allow only one write at the time and will block all the reads until is done, as well as the concurrent write-threads).
Make sure to release your locks with the pattern to avoid deadlocking:
void Write(object[] args)
{
this.ReaderWriterLock.AquireWriteLock(TimeOut.Infinite);
try
{
this.myData.Write(args);
}
catch(Exception ex)
{
}
finally
{
this.ReaderWriterLock.RelaseWriterLock();
}
}
Related
This is more a conceptual question. I was wondering if I used a lock inside of Parallel.ForEach<> loop if that would take away the benefits of Paralleling a foreachloop.
Here is some sample code where I have seen it done.
Parallel.ForEach<KeyValuePair<string, XElement>>(binReferences.KeyValuePairs, reference =>
{
lock (fileLockObject)
{
if (fileLocks.ContainsKey(reference.Key) == false)
{
fileLocks.Add(reference.Key, new object());
}
}
RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount, ref recursionCount);
lock (fileLocks[reference.Key])
{
reference.Value.Document.Save(reference.Key);
}
});
Where fileLockObject and fileLocks are as follows.
private static object fileLockObject = new object();
private static Dictionary<string, object> fileLocks = new Dictionary<string, object>();
Does this technique completely make the loop not parallel?
I would like to see your thoughts on this.
It means all of the work inside of the lock can't be done in parallel. This greatly harms the performance here, yes. Since the entire body is not all locked (and locked on the same object) there is still some parallelization here though. Whether the parallelization that you do get adds enough benefit to surpass the overhead that comes with managing the threads and synchronizing around the locks is something you really just need to test yourself with your specific data.
That said, it looks like what you're doing (at least in the first locked block, which is the one I'd be more concerned with at every thread is locking on the same object) is locking access to a Dictionary. You can instead use a ConcurrentDictionary, which is specifically designed to be utilized from multiple threads, and will minimize the amount of synchronization that needs to be done.
if I used a lock ... if that would take away the benefits of Paralleling a foreachloop.
Proportionally. When RecursiveBinUpdate() is a big chunk of work (and independent) then it will still pay off. The locking part could be a less than 1%, or 99%. Look up Amdahls law, that applies here.
But worse, your code is not thread-safe. From your 2 operations on fileLocks, only the first is actually inside a lock.
lock (fileLockObject)
{
if (fileLocks.ContainsKey(reference.Key) == false)
{
...
}
}
and
lock (fileLocks[reference.Key]) // this access to fileLocks[] is not protected
change the 2nd part to:
lock (fileLockObject)
{
reference.Value.Document.Save(reference.Key);
}
and the use of ref recursionCount as a parameter looks suspicious too. It might work with Interlocked.Increment though.
The "locked" portion of the loop will end up running serially. If the RecursiveBinUpdate function is the bulk of the work, there may be some gain, but it would be better if you could figure out how to handle the lock generation in advance.
When it comes to locks, there's no difference in the way PLINQ/TPL threads have to wait to gain access. So, in your case, it only makes the loop not parallel in those areas that you're locking and any work outside those locks is still going to execute in parallel (i.e. all the work in RecursiveBinUpdate).
Bottom line, I see nothing substantially wrong with what you're doing here.
Specific Answers Only Please! I'm decently familiar with the better(best) practices around collection locking, thread safety etc. Just want some answers / ideas around this specific scenario.
We have some legacy code of the type:
public class GodObject
{
private readonly Dictionary<string, string> _signals;
//bunch of methods accessing the dictionary
private void SampleMethod1()
{
lock(_signals)
{
//critical code section 1
}
}
public void SampleMethod2()
{
lock(_signals)
{
//critical code section 2
}
}
}
All access to the dictionary is inside such lock statements. We're getting some bugs which could be explained if the locking was not explicitly working - meaning 2 or more threads getting simultaneous access to the dictionary.
So my question is this - is there any scenario where the critical sections could be simultaneously accessed by multiple threads?? To me, it should not be possible, since the reference is readonly, it's not as though the object could be changing, and most of the issues around the lock() are around deadlocks rather than syncronization not happening. But maybe i'm missing some nuance or something glaring?
This is running in a long running windows service .NET Framework 3.5.
There are three problems I can imagine occurring outside the code you posted:
Somebody might access the dictionary without locking on it. Using lock on an object will prevent anyone else from using lock on the same object at the same time, but it won't do anything to prevent other threads from using the object without locking on it. Note that because it would not have been overly difficult to have written Dictionary [and for that matter List] in such a way as to allow safe simultaneous use by multiple readers and one writer that only adds information, some people may assume that read methods don't need locking. Unfortunately, that assumption is false: Microsoft could have added such thread safety fairly cheaply, but didn't.
As Servy suggested, someone might be assuming that the the collection won't change between calls to two independent methods.
If some code which acquires a lock assumes a collection isn't going to change while the lock is held, but then calls some outside method while holding the lock, it's possible that the outside method could change the object despite the lock being held.
Unless the object which owns the dictionary keeps all references to itself, so that no outside code ever gets a reference to the dictionary, I think the first of these problems is perhaps the most likely. The other two problems can also occur sometimes, however.
In my app I have a List of objects. I'm going to have a process (thread) running every few minutes that will update the values in this list. I'll have other processes (other threads) that will just read this data, and they may attempt to do so at the same time.
When the list is being updated, I don't want any other process to be able to read the data. However, I don't want the read-only processes to block each other when no updating is occurring. Finally, if a process is reading the data, the process that updates the data must wait until the process reading the data is finished.
What sort of locking should I implement to achieve this?
This is what you are looking for.
ReaderWriterLockSlim is a class that will handle scenario that you have asked for.
You have 2 pair of functions at your disposal:
EnterWriteLock and ExitWriteLock
EnterReadLock and ExitReadLock
The first one will wait, till all other locks are off, both read and write, so it will give you access like lock() would do.
The second one is compatible with each other, you can have multiple read locks at any given time.
Because there's no syntactic sugar like with lock() statement, make sure you will never forget to Exit lock, because of Exception or anything else. So use it in form like this:
try
{
lock.EnterWriteLock(); //ReadLock
//Your code here, which can possibly throw an exception.
}
finally
{
lock.ExitWriteLock(); //ReadLock
}
You don't make it clear whether the updates to the list will involve modification of existing objects, or adding/removing new ones - the answers in each case are different.
To handling modification of existing items in the list, each object should handle it's own locking.
To allow modification of the list while others are iterating it, don't allow people direct access to the list - force them to work with a read/only copy of the list, like this:
public class Example()
{
public IEnumerable<X> GetReadOnlySnapshot()
{
lock (padLock)
{
return new ReadOnlyCollection<X>( MasterList );
}
}
private object padLock = new object();
}
Using a ReadOnlyCollection<X> to wrap the master list ensures that readers can iterate through a list of fixed content, without blocking modifications made by writers.
You could use ReaderWriterLockSlim. It would satisfy your requirements precisely. However, it is likely to be slower than just using a plain old lock. The reason is because RWLS is ~2x slower than lock and accessing a List would be so fast that it would not be enough to overcome the additional overhead of the RWLS. Test both ways, but it is likely ReaderWriterLockSlim will be slower in your case. Reader writer locks do better in scenarios were the number readers significantly outnumbers the writers and when the guarded operations are long and drawn out.
However, let me present another options for you. One common pattern for dealing with this type of problem is to use two separate lists. One will serve as the official copy which can accept updates and the other will serve as the read-only copy. After you update the official copy you must clone it and swap out the reference for the read-only copy. This is elegant in that the readers require no blocking whatsoever. The reason why readers do not require any blocking type of synchronization is because we are treating the read-only copy as if it were immutable. Here is how it can be done.
public class Example
{
private readonly List<object> m_Official;
private volatile List<object> m_Readonly;
public Example()
{
m_Official = new List<object>();
m_Readonly = m_Official;
}
public void Update()
{
lock (m_Official)
{
// Modify the official copy here.
m_Official.Add(...);
m_Official.Remove(...);
// Now clone the official copy.
var clone = new List<object>(m_Official);
// And finally swap out the read-only copy reference.
m_Readonly = clone;
}
}
public object Read(int index)
{
// It is safe to access the read-only copy here because it is immutable.
// m_Readonly must be marked as volatile for this to work correctly.
return m_Readonly[index];
}
}
The code above would not satisfy your requirements precisely because readers never block...ever. Which means they will still be taking place while writers are updating the official list. But, in a lot of scenarios this winds up being acceptable.
In a multi-threaded program running on a multi-cpu machine do I need to access shared state ( _data in the example code below) using volatile read/writes to ensure correctness.
In other words, can heap objects be cached on the cpu?
Using the example below and assuming multi-threads will access the GetValue and Add methods, I need ThreadA to be able to add data (using the Add Method) and ThreadB to be able to see/get that added data immediately (using the GetValue method). So do I need to add volatile reads/writes to _data to ensure this? Basically I don’t want to added data to be cached on ThreadA’s cpu.
/ I am not Locking (enforcing exclusive thread access) as the code needs to be ultra-fast and I am not removing any data from _data so I don’t need to lock _data.
Thanks.
**** Update ****************************
Obviously you guys think going lock-free using this example is bad idea. But what side effects or exceptions could I face here?
Could the Dictionary type throw an exception if 1 thread is iterating the values for read and another thread is iterating the values for update? Or would I just experience “dirty reads” (which would be fine in my case)?
**** End Update ****************************
public sealed class Data
{
private volatile readonly Dictionary<string, double> _data = new Dictionary<string, double>();
public double GetVaule(string key)
{
double value;
if (!_data.TryGetValue(key, out value))
{
throw new ArgumentException(string.Format("Key {0} does not exist.", key));
}
return value;
}
public void Add(string key, double value)
{
_data.Add(key, value);
}
public void Clear()
{
_data.Clear();
}
}
Thanks for the replies. Regarding the locks, the methods are pretty much constantly called by mulitple threads so my problem is with contested locks not the actual lock operation.
So my question is about cpu caching, can heap objects (the _data instance field) be cached on a cpu? Do i need the access the _data field using volatile reads/writes?
/Also, I am stuck with .Net 2.0.
Thanks for your help.
The MSDN docs for Dictionary<TKey, TValue> say that it's safe for multiple readers but they don't give the "one writer, multiple readers" guarantee that some other classes do. In short, I wouldn't do this.
You say you're avoiding locking because you need the code to be "ultra-fast" - have you tried locking to see what the overhead is? Uncontested locks are very cheap, and when the lock is contested that's when you're benefiting from the added safety. I'd certainly profile this extensively before deciding to worry about the concurrency issues of a lock-free solution. ReaderWriterLockSlim may be useful if you've actually got multiple readers, but it sounds like you've got a single reader and a single writer, at least at the moment - simple locking will be easier in this case.
I think you may be misunderstanding the use of the volatile keyword (either that or I am, and someone please feel free to correct me). The volatile keyword guarantees that get and set operations on the value of the variable itself from multiple threads will always deal with the same copy. For instance, if I have a bool that indicates a state then setting it in one thread will make the new value immediately available to the other.
However, you never change the value of your variable (in this case, a reference). All that you do is manipulate the area of memory that the reference points to. Declaring it as volatile readonly (which, if my understanding is sound, defeats the purpose of volatile by never allowing it to be set) won't have any effect on the actual data that's being manipulated (the back-end store for the Dictionary<>).
All that being said, you really need to use a lock in this case. Your danger extends beyond the prospect of "dirty reads" (meaning that what you read would have been, at some point, valid) into truly unknown territory. As Jon said, you really need proof that locking produces unacceptable performance before you try to go down the road of lockless coding. Otherwise that's the epitome of premature optimization.
The problem is that your add method:
public void Add(string key, double value)
{
_data.Add(key, value);
}
Could cause _data to decide to completely re-organise the data it's holding - at that point a GetVaule request could fail in any possible way.
You need a lock or a different data structure / data structure implementation.
I don't think volatile can be a replacement of locking if you start calling methods on it. You are guaranteeing that the thread A and thread B sees the same copy of the dictionary, but you can still access the dictionary simultaneously. You can use multi-moded locks to increase concurrency. See ReaderWriterLockSlim for example.
Represents a lock that is used to
manage access to a resource, allowing
multiple threads for reading or
exclusive access for writing.
The volatile keyword is not about locking, it is used to indicate that the value of the specified field might be changed or read by different thread or other thing that can run concurrently with your code. This is crucial for the compiler to know, because many optimization processes involve caching the variable value and rearranging the instructions. The volatile keyword will tell the compiler to be "cautious" when optimizing those instructions that reference to volatile variable.
For multi-thread usage of dictionary, there are many ways to do. The simplest way is using lock keyword, which has adequate performance. If you need higher performance, you might need to implement your own dictionary for your specific task.
Volatile is not locking, it has nothing to do with synchronization. It's generally safe to do lock-free reads on read-only data. Note that just because you don't remove anything from _data, you seem to call _data.Add(). That is NOT read-only. So yes, this code will blow up in your face in a variety of exciting and difficult to predict ways.
Use locks, it's simple, it's safer. If you're a lock-free guru (you're not!), AND profiling shows a bottleneck related to contention for the lock, AND you cannot solve the contention issues via partitioning or switching to spin-locks THEN AND ONLY THEN can you investigate a solution to get lock-free reads, which WILL involve writing your own Dictionary from scratch and MAY be faster than the locking solution.
Are you starting to see how far off base you are in your thinking here? Just use a damn lock!
I have a class used to cache access to a database resource. It looks something like this:
//gets registered as a singleton
class DataCacher<T>
{
IDictionary<string, T> items = GetDataFromDb();
//Get is called all the time from zillions of threads
internal T Get(string key)
{
return items[key];
}
IDictionary<string, T> GetDataFromDb() { ...expensive slow SQL access... }
//this gets called every 5 minutes
internal void Reset()
{
items.Clear();
}
}
I've simplified this code somewhat, but the gist of it is that there is a potential concurrency issue, in that while the items are being cleared, if Get is called things may go awry.
Now I can just bung lock blocks into Get and Reset, but I'm worried that the locks on the Get will reduce performance of the site, as Get is called by every request thread in the web app many many times.
I can do something with a doubly checked locks I think, but I suspect there is a cleaner way to do this using something smarter than the lock{} block. What to do?
edit: Sorry all, I didn't make this explicit earlier, but the items.Clear() implementation I am using is not in fact a straight dictionary. Its a wrapper around a ResourceProvider which requires that the dictionary implementation calls .ReleaseAllResources() on each of the items as they get removed. This means that calling code doesn't want to run against an old version that is in the midst of disposal. Given this, is the Interlocked.Exchange method the correct one?
I would start by testing it with just a lock; locks are very cheap when not contested. However - a simpler scheme is to rely on the atomic nature of reference updates:
public void Clear() {
var tmp = GetDataFromDb(); // or new Dictionary<...> for an empty one
items = tmp; // this is atomic; subsequent get/set will use this one
}
You might also want to make items a volatile field, just to be sure it isn't held in the registers anywhere.
This still has the problem that anyone expecting there to be a given key may get disappointed (via an exception), but that is a separate issue.
The more granular option might be a ReaderWriterLockSlim.
One option is to completely replace the IDictionary instance instead of Clearing it. You can do this in a thread-safe way using the Exchange method on the Interlocked class.
See if the database will tell you what data has change. You could use
Trigger to write changes to a history table
Query Notifications (SqlServer and Oracle has these, other must do as well)
Etc
So you don’t have to reload all the data based on a timer.
Failing this.
I would make the Clear method create a new IDictionary by calling GetDataFromDB(), then once the data has been loaded set the “items” field to point to the new Dictionary. (The garbage collector will clean up the old dictionary once no threads are accessing it.)
I don’t think you care if some threads
get “old” results while reloading the
data – (if you do then you will just
have to block all threads on a lock –
painful!)
If you need all thread to swap over to the new dictionary at the same time, then you need to declare the “items” field to be volatile and use the Exchange method on the Interlocked class. However it is unlikely you need this in real life.