I have a function that returns an entry from a dictionary based on its key (name) and, if the entry doesn't exist, returns a newly created one.
My question is about the "double lock": SomeFunction locks _dictionary to check for the existence of the key, then calls a function that also locks the same dictionary. It seems to work, but I am not sure whether there is a potential problem with this approach.
public Machine SomeFunction(string name)
{
lock (_dictionary)
{
if (!_dictionary.ContainsKey(name))
return CreateMachine(name);
return _dictionary[name];
}
}
private Machine CreateMachine(string name)
{
Machine ms = new Machine(name);
lock(_dictionary)
{
_dictionary.Add(name, ms);
}
return ms;
}
That's guaranteed to work - locks are recursive in .NET. Whether it's really a good idea or not is a different matter... how about this instead:
public Machine SomeFunction(string name)
{
lock (_dictionary)
{
Machine result;
if (!_dictionary.TryGetValue(name, out result))
{
result = CreateMachine(name);
_dictionary[name] = result;
}
return result;
}
}
// This is now *just* responsible for creating the machine,
// not for maintaining the dictionary. The dictionary manipulation
// is confined to the above method.
private Machine CreateMachine(string name)
{
return new Machine(name);
}
No problem here: the lock is re-entrant for the same thread. Not all synchronization objects have thread affinity (Semaphore, for example, does not), but Mutex and Monitor (which lock uses) are fine.
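To make the re-entrancy point concrete, here is a minimal sketch (the ReentrancyDemo class and its method names are made up for illustration). The inner lock is taken on the same object by the same thread, so it does not block:

```csharp
using System;
using System.Threading;

public static class ReentrancyDemo
{
    private static readonly object Gate = new object();

    // Takes the lock, then calls a method that takes the same lock again.
    public static int Outer()
    {
        lock (Gate)
        {
            return Inner() + 1;
        }
    }

    private static int Inner()
    {
        // Same thread, same object: Monitor just increments its recursion count.
        lock (Gate)
        {
            return Monitor.IsEntered(Gate) ? 1 : 0;
        }
    }
}
```

The lock is only released for other threads once the outermost lock block exits.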
New since .NET 4.0: check out ConcurrentDictionary, a thread-safe collection of key/value pairs that can be accessed by multiple threads concurrently. More info at https://msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx .
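A rough sketch of how the question's method might look with ConcurrentDictionary (the Machine class here is a stand-in, and MachineRegistry is a name I made up). One thing to be aware of: the value factory passed to GetOrAdd can run more than once when several threads race on the same key, so this sketch wraps the value in Lazy<T> to make sure only one Machine per name is actually constructed:

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for the Machine type from the question.
public sealed class Machine
{
    public string Name { get; }
    public Machine(string name) { Name = name; }
}

public sealed class MachineRegistry
{
    private readonly ConcurrentDictionary<string, Lazy<Machine>> _machines =
        new ConcurrentDictionary<string, Lazy<Machine>>();

    public Machine GetOrCreate(string name)
    {
        // GetOrAdd is atomic with respect to the dictionary contents.
        // Racing threads may each build a Lazy<Machine>, but only the one
        // stored in the dictionary ever has its Value evaluated.
        return _machines
            .GetOrAdd(name, n => new Lazy<Machine>(() => new Machine(n)))
            .Value;
    }
}
```

If creating a duplicate Machine and throwing it away is harmless, the Lazy<T> wrapper can be dropped and the dictionary can hold Machine directly.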
I'm currently implementing a thread-safe dictionary in C# which uses immutable AVL trees as buckets internally. The idea is to provide fast read access without a lock because in my application context, we add entries to this dictionary only at startup and afterwards, values are mostly read (but there is still a small number of writes).
I've structured my TryGetValue and GetOrAdd methods in the following way:
public sealed class FastReadThreadSafeDictionary<TKey, TValue> where TKey : IEquatable<TKey>
{
private readonly object _bucketContainerLock = new object();
private ImmutableBucketContainer<TKey, TValue> _bucketContainer;
public bool TryGetValue(TKey key, out TValue value)
{
var bucketContainer = _bucketContainer;
return bucketContainer.TryFind(key.GetHashCode(), key, out value);
}
public bool GetOrAdd(TKey key, Func<TValue> createValue, out TValue value)
{
createValue.MustNotBeNull(nameof(createValue));
var hashCode = key.GetHashCode();
lock (_bucketContainerLock)
{
ImmutableBucketContainer<TKey, TValue> newBucketContainer;
if (_bucketContainer.GetOrAdd(hashCode, key, createValue, out value, out newBucketContainer) == false)
return false;
_bucketContainer = newBucketContainer;
return true;
}
}
// Other members omitted for sake of brevity
}
As you can see, I don't use a lock in TryGetValue because reference assignment in .NET runtimes is an atomic operation by design. By copying the reference of the field _bucketContainer to a local variable, I'm sure I can safely access the instance because it is immutable. In GetOrAdd, I use a lock to access the private _bucketContainer so I can ensure that a value is not created twice (i.e. if two or more threads are trying to add a value, only one can actually create a new ImmutableBucketContainer with the added value because of the lock).
I use Microsoft Chess for testing concurrency and in one of my tests, MCUT (Microsoft Concurrency Unit Testing) reports a data race in GetOrAdd when I exchange the new bucket container with the old one:
[DataRaceTestMethod]
public void ReadWhileAdd()
{
var testTarget = new FastReadThreadSafeDictionary<int, object>();
var writeThread = new Thread(() =>
{
for (var i = 5; i < 10; i++)
{
testTarget.GetOrAdd(i, () => new object());
Thread.Sleep(0);
}
});
var readThread = new Thread(() =>
{
object value;
testTarget.TryGetValue(5, out value);
Thread.Sleep(0);
testTarget.TryGetValue(7, out value);
Thread.Sleep(10);
testTarget.TryGetValue(9, out value);
});
readThread.Start();
writeThread.Start();
readThread.Join();
writeThread.Join();
}
MCUT reports the following message:
23> Test result: DataRace
23> ReadWhileAdd() (Context=, TestType=MChess): [DataRace]Found data race at GetOrAdd:FastReadThreadSafeDictionary.cs(68)
which is the assignment _bucketContainer = newBucketContainer; in GetOrAdd.
My actual question is: why is the assignment _bucketContainer = newBucketContainer a race condition? Threads currently executing TryGetValue always make a copy of the _bucketContainer field and thus shouldn't be bothered with the update (except that the searched value might be added to the _bucketContainer just after the copy takes place, but this doesn't matter with the data race). And in GetOrAdd, there is an explicit lock to prevent concurrent access. Is this a bug in Chess or am I missing something very obvious?
As mentioned by @CodesInChaos in the comments of the question, I missed a volatile read in TryGetValue. The method now looks like this:
public bool TryGetValue(TypeKey typeKey, out TValue value)
{
var bucketContainer = Volatile.Read(ref _bucketContainer);
return bucketContainer.TryFind(typeKey, out value);
}
This volatile read is necessary because the compiler, the JIT and the CPU may cache values and reorder instructions independently for each thread, which can lead to a data race. The CPU architecture that runs the code also matters: x86 and x64 processors have a strong memory model in which ordinary reads already behave much like volatile reads, while this is not true for weaker architectures like ARM or Itanium. That's why the read has to be synchronized with other threads using a memory barrier, which Volatile.Read provides internally (note that lock statements also use memory barriers internally). Joseph Albahari wrote a comprehensive tutorial on this here: http://www.albahari.com/threading/part4.aspx
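For readers who want to see the whole pattern in one place, here is a simplified sketch. It is not my actual implementation: SnapshotMap is a made-up name, and a copied Dictionary stands in for the immutable AVL bucket container. Readers take a snapshot with Volatile.Read, writers build a new snapshot under a lock and publish it with Volatile.Write:

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class SnapshotMap
{
    private readonly object _writeLock = new object();

    // Treated as immutable once published: it is never modified, only replaced.
    private Dictionary<string, int> _snapshot = new Dictionary<string, int>();

    public bool TryGetValue(string key, out int value)
    {
        // Lock-free read: grab the current snapshot with acquire semantics.
        var snapshot = Volatile.Read(ref _snapshot);
        return snapshot.TryGetValue(key, out value);
    }

    public void Set(string key, int value)
    {
        lock (_writeLock)
        {
            // Copy-on-write: readers keep using the old instance until the
            // new one is published with release semantics.
            var copy = new Dictionary<string, int>(_snapshot) { [key] = value };
            Volatile.Write(ref _snapshot, copy);
        }
    }
}
```

Copying the whole dictionary on every write is only reasonable when writes are rare, which is the same assumption the original design makes.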
I have multiple threads writing data to a common source, and I would like two threads to block each other if and only if they are touching the same piece of data.
It would be nice to have a way to lock specifically on an arbitrary key:
string id = GetNextId();
AcquireLock(id);
try
{
DoDangerousThing();
}
finally
{
ReleaseLock(id);
}
If nobody else is trying to lock the same key, I would expect they would be able to run concurrently.
I could achieve this with a simple dictionary of mutexes, but I would need to worry about evicting old, unused locks and that could become a problem if the set grows too large.
Is there an existing implementation of this type of locking pattern?
You can try using a ConcurrentDictionary<string, object> to create named object instances. When you need a new lock instance (that you haven't used before), you can add it to the dictionary (adding is an atomic operation through GetOrAdd) and then all threads can share the same named object once you pull it from the dictionary, based on your data.
For example:
// Create a global lock map for your lock instances.
public static ConcurrentDictionary<string, object> GlobalLockMap =
new ConcurrentDictionary<string, object> ();
// ...
var oLockInstance = GlobalLockMap.GetOrAdd ( "lock name", x => new object () );
if (oLockInstance == null)
{
// handle error
}
lock (oLockInstance)
{
// do work
}
You can use a ConcurrentDictionary<string, object> to create and reuse different locks. If you want to remove locks from the dictionary and still be able to reopen the same named resource later, you must always check inside the critical region whether the lock you acquired has been removed or replaced by another thread in the meantime. Also take care to remove the lock from the dictionary as the last step before leaving the critical region.
static ConcurrentDictionary<string, object> _lockDict =
new ConcurrentDictionary<string, object>();
// VERSION 1: single-shot method
public void UseAndCloseSpecificResource(string resourceId)
{
bool isSameLock;
object lockObj, lockObjCheck;
do
{
lock (lockObj = _lockDict.GetOrAdd(resourceId, new object()))
{
if (isSameLock = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
try
{
// ... open, use, and close resource identified by resourceId ...
// ...
}
finally
{
// This must be the LAST statement
_lockDict.TryRemove(resourceId, out lockObjCheck);
}
}
}
}
while (!isSameLock);
}
// VERSION 2: separated "use" and "close" methods
// (can coexist with version 1)
public void UseSpecificResource(string resourceId)
{
bool isSameLock;
object lockObj, lockObjCheck;
do
{
lock (lockObj = _lockDict.GetOrAdd(resourceId, new object()))
{
if (isSameLock = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
// ... open and use (or reuse) resource identified by resourceId ...
}
}
}
while (!isSameLock);
}
public bool TryCloseSpecificResource(string resourceId)
{
bool result = false;
object lockObj, lockObjCheck;
if (_lockDict.TryGetValue(resourceId, out lockObj))
{
lock (lockObj)
{
if (result = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
try
{
// ... close resource identified by resourceId ...
// ...
}
finally
{
// This must be the LAST statement
_lockDict.TryRemove(resourceId, out lockObjCheck);
}
}
}
}
return result;
}
The lock keyword (MSDN) already does this.
When you lock, you pass the object to lock on:
lock (myLockObject)
{
}
This uses the Monitor class with the specific object to synchronize any threads using lock on the same object.
Since string literals are "interned" – that is, they are cached for reuse so that every literal with the same value is in fact the same object – you can also do this for strings:
lock ("TestString")
{
}
Since you aren't dealing with string literals you could intern the strings you read as described in: C#: Strings with same contents.
It would even work if the reference used was copied (directly or indirectly) from an interned string (literal or explicitly interned). But I wouldn't recommend it. This is very fragile and can lead to hard-to-debug problems, due to the ease with which new instances of a string having the same value as an interned string can be created.
A lock will only block if something else has entered the locked section on the same object. Thus, no need to keep a dictionary around, just the applicable lock objects.
Realistically though, you'll need to maintain a ConcurrentDictionary or similar to allow your objects to access the appropriate lock object.
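To illustrate why locking on strings is fragile, here is a small sketch (InternDemo and its method names are made up). Two strings built at run time have the same value but are different objects, so they would be different locks, whereas string.Intern maps both to one shared instance:

```csharp
using System;

public static class InternDemo
{
    // Two run-time concatenations with equal content are separate objects.
    public static bool SameObjectWithoutIntern(string prefix, int id)
    {
        string a = prefix + id;
        string b = prefix + id;
        return object.ReferenceEquals(a, b);
    }

    // string.Intern returns the single pooled instance for that content.
    public static bool SameObjectWithIntern(string prefix, int id)
    {
        string a = string.Intern(prefix + id);
        string b = string.Intern(prefix + id);
        return object.ReferenceEquals(a, b);
    }
}
```

With a non-empty prefix, the first method returns false and the second returns true, which is exactly the difference between two threads excluding each other or not.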
There are a great number of articles available regarding thread safe caching, here's an example:
private static object _lock = new object();
public void CacheData()
{
SPListItemCollection oListItems;
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if(oListItems == null)
{
lock (_lock)
{
// Ensure that the data was not loaded by a concurrent thread
// while waiting for lock.
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
}
However, this example depends on the request for the cache also rebuilding the cache.
I'm looking for a solution where the request and rebuild are separate. Here's the scenario.
I have a web service that I want to monitor for certain types of error. If an error occurs, I create a monitor object and cache it. The object is updatable and is locked accordingly during an update. All's well so far.
Elsewhere, I check for the existence of the cached object, and the data it contains. This would work straight out of the box except for one particular scenario.
If the cache object is being updated - say a status change, I would like to wait and get the latest info rather than the current info, which if returned, would be out of date. So for my fetch code, I need to check if the object is currently being created/updating, and if so wait, then retry.
As I pointed out, there are many examples of cache locking patterns, but I can't seem to find one that fits this scenario. Any ideas as to how to go about this would be appreciated.
You can try the following code, which uses two locks. The write lock in the setter is quite simple and protects the cache from being written by more than one thread at a time. The getter uses a simple double-checked lock.
Now, the trick is in the Refresh() method, which uses the same lock as the getter. The method takes the lock and, as its first step, removes the list from the cache. This causes any getter to fail the first null check and wait for the lock. In the meantime the method gets the items, sets the cache again and releases the lock.
When it comes back to the getter, it reads the cache again and now it contains the list.
public class CacheData
{
private static object _readLock = new object();
private static object _writeLock = new object();
public SPListItemCollection ListItem
{
get
{
var oListItems = (SPListItemCollection) Cache["ListItemCacheName"];
if (oListItems == null)
{
lock (_readLock)
{
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
return oListItems;
}
set
{
lock (_writeLock)
{
Cache.Add("ListItemCacheName", value, ..);
}
}
}
public void Refresh()
{
lock (_readLock)
{
Cache.Remove("ListItemCacheName");
var oListItems = DoQueryToReturnItems();
ListItem = oListItems;
}
}
}
You can make the method and property static, if you do not need CacheData instance.
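If the cached object does not have to live in the ASP.NET Cache, a different technique that gives the same "readers wait while an update is in progress" behaviour is ReaderWriterLockSlim. This is only a sketch under that assumption; MonitorStatusCache and its members are hypothetical names, and a plain string stands in for the monitor object:

```csharp
using System;
using System.Threading;

public sealed class MonitorStatusCache
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private string _status;

    // Many readers can hold the read lock at once, but they all block
    // while a writer holds the write lock, so they never see a
    // half-finished update.
    public string Read()
    {
        _lock.EnterReadLock();
        try { return _status; }
        finally { _lock.ExitReadLock(); }
    }

    // The update delegate runs with exclusive access to the value.
    public void Update(Func<string, string> change)
    {
        _lock.EnterWriteLock();
        try { _status = change(_status); }
        finally { _lock.ExitWriteLock(); }
    }
}
```

A reader that arrives during an update simply waits in EnterReadLock and then sees the new value, so no explicit retry loop is needed.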
I'd like to use .NET's Lazy<T> class to implement thread safe caching. Suppose we had the following setup:
class Foo
{
Lazy<string> cachedAttribute;
Foo()
{
invalidateCache();
}
string initCache()
{
string returnVal = "";
//CALCULATE RETURNVAL HERE
return returnVal;
}
public String CachedAttr
{
get
{
return cachedAttribute.Value;
}
}
void invalidateCache()
{
cachedAttribute = new Lazy<string>(initCache, true);
}
}
My questions are:
Would this work at all?
How would the locking have to work?
I feel like I'm missing a lock somewhere near the invalidateCache, but for the life of me I can't figure out what it is.
I'm sure there's a problem with this somewhere, I just haven't figured out where.
[EDIT]
OK, well, it looks like I was right: there were things I hadn't thought about. If a thread sees an outdated cache it'd be a very bad thing, so it looks like "Lazy" is not safe enough. The property is accessed a lot though, so I was engaging in premature optimization in the hope that I could learn something and have a pattern to use in the future for thread-safe caching. I'll keep working on it.
P.S.: I decided to make the object thread-un-safe and have access to the object be carefully controlled instead.
Well, it's not thread-safe in that one thread could still see the old value after another thread sees the new value after invalidation, because the first thread might not have seen the change to cachedAttribute. In theory, that situation could persist forever, although it's pretty unlikely :)
Using Lazy<T> as a cache of unchanging values seems like a better idea to me - more in line with how it was intended - but if you can cope with the possibility of using an old "invalidated" value for an arbitrarily long period in another thread, I think this would be okay.
cachedAttribute is a shared resource that needs to be protected from concurrent modification.
Protect it with a lock:
private readonly object gate = new object();
public string CachedAttr
{
get
{
Lazy<string> lazy;
lock (gate) // 1. Lock
{
lazy = this.cachedAttribute; // 2. Get current Lazy<string>
} // 3. Unlock
return lazy.Value; // 4. Get value of Lazy<string>
// outside lock
}
}
void InvalidateCache()
{
lock (gate) // 1. Lock
{ // 2. Assign new Lazy<string>
cachedAttribute = new Lazy<string>(initCache, true);
} // 3. Unlock
}
or use Interlocked.Exchange:
void InvalidateCache()
{
Interlocked.Exchange(ref cachedAttribute, new Lazy<string>(initCache, true));
}
volatile might work as well in this scenario, but it makes my head hurt.
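For what it's worth, here is a self-contained sketch of the lock-free variant, pairing Interlocked.Exchange on the write side with Volatile.Read on the read side. The CachedAttribute class and the compute delegate are my own illustration rather than the code from the question:

```csharp
using System;
using System.Threading;

public sealed class CachedAttribute
{
    private readonly Func<string> _compute;
    private Lazy<string> _cached;

    public CachedAttribute(Func<string> compute)
    {
        _compute = compute ?? throw new ArgumentNullException(nameof(compute));
        _cached = new Lazy<string>(_compute, true);
    }

    // Read the current Lazy<string> reference with acquire semantics,
    // then let Lazy<T> handle thread-safe one-time initialization.
    public string Value => Volatile.Read(ref _cached).Value;

    // Swap in a fresh, not yet evaluated Lazy<string>.
    public void Invalidate()
    {
        Interlocked.Exchange(ref _cached, new Lazy<string>(_compute, true));
    }
}
```

A thread that already fetched the old Lazy<string> before Invalidate ran will still return the old value for that call; if that is unacceptable, the whole get/invalidate sequence needs a coarser lock.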
While I was looking at some legacy application code, I noticed it is using a string object to do thread synchronization. I'm trying to resolve some thread contention issues in this program and was wondering if this could lead to some strange situations. Any thoughts?
private static string mutex= "ABC";
internal static void Foo(Rpc rpc)
{
lock (mutex)
{
//do something
}
}
Strings like that (from the code) could be "interned". This means all instances of "ABC" point to the same object. Even across AppDomains you can point to the same object (thx Steven for the tip).
If you have a lot of string-mutexes, from different locations, but with the same text, they could all lock on the same object.
The intern pool conserves string storage. If you assign a literal string constant to several variables, each variable is set to reference the same constant in the intern pool instead of referencing several different instances of String that have identical values.
It's better to use:
private static readonly object mutex = new object();
Also, since your string field is neither const nor readonly, it can be reassigned. So (in theory) one thread can lock on mutex and then change mutex to refer to another object; a second thread can then enter the critical section because its lock statement now uses a different object. Example:
private static string mutex = "1";
private static string mutex2 = "1"; // for 'lock' mutex2 and mutex are the same
private static void CriticalButFlawedMethod() {
lock(mutex) {
mutex += "."; // Hey, now mutex points to another reference/object
// You are free to re-enter
...
}
}
To answer your question (as some others already have), there are some potential problems with the code example you provided:
private static string mutex= "ABC";
The variable mutex is not immutable.
The string literal "ABC" will refer to the same interned object reference everywhere in your application.
In general, I would advise against locking on strings. However, there is a case I've run into where it is useful to do this.
There have been occasions where I have maintained a dictionary of lock objects where the key is something unique about some data that I have. Here's a contrived example:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
ConcurrentDictionary<int, object> _locks = new ConcurrentDictionary<int, object>();
void DoSomething(SomeEntity entity)
{
var mutex = _locks.GetOrAdd(entity.Id, id => new object());
lock(mutex)
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The goal of code like this is to serialize concurrent invocations of DoSomething() within the context of the entity's Id. The downside is the dictionary. The more entities there are, the larger it gets. It's also just more code to read and think about.
I think .NET's string interning can simplify things:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
void DoSomething(SomeEntity entity)
{
lock(string.Intern("dee9e550-50b5-41ae-af70-f03797ff2a5d:" + entity.Id))
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The difference here is that I am relying on the string interning to give me the same object reference per entity id. This simplifies my code because I don't have to maintain the dictionary of mutex instances.
Notice the hard-coded UUID string that I'm using as a namespace. This is important if I choose to adopt the same approach of locking on strings in another area of my application.
Locking on strings can be a good idea or a bad idea depending on the circumstances and the attention that the developer gives to the details.
If you need to lock a string, you can create an object that pairs the string with an object that you can lock with.
class LockableString
{
public string _String;
public object MyLock; // Lock object that guards access to _String.
public LockableString()
{
MyLock = new object();
}
}
My 2 cents:
ConcurrentDictionary is 1.5X faster than interned strings. I did a benchmark once.
To solve the "ever-growing dictionary" problem you can use a dictionary of semaphores instead of a dictionary of objects. AKA use ConcurrentDictionary<string, SemaphoreSlim> instead of <string, object>. Unlike the lock statements, Semaphores can track how many threads have locked on them. And once all the locks are released - you can remove it from the dictionary. See this question for solutions like that: Asynchronous locking based on a key
Semaphores are even better because you can even control the concurrency level. Like, instead of "limiting to one concurrent run" - you can "limit to 5 concurrent runs". Awesome free bonus isn't it? I had to code an email-service that needed to limit the number of concurrent connections to a server - this came very very handy.
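Here is a rough sketch of the reference-counted idea. To keep it short I use a plain Dictionary guarded by a lock with an explicit counter instead of a ConcurrentDictionary, and KeyedLock, Acquire and maxConcurrency are all names I made up:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class KeyedLock
{
    private sealed class Entry
    {
        public readonly SemaphoreSlim Semaphore;
        public int RefCount; // holders plus waiters, guarded by lock (_entries)
        public Entry(int max) { Semaphore = new SemaphoreSlim(max, max); }
    }

    private readonly Dictionary<string, Entry> _entries = new Dictionary<string, Entry>();
    private readonly int _maxConcurrency;

    public KeyedLock(int maxConcurrency = 1) { _maxConcurrency = maxConcurrency; }

    // Number of keys currently tracked (for diagnostics and tests).
    public int Count { get { lock (_entries) return _entries.Count; } }

    public IDisposable Acquire(string key)
    {
        Entry entry;
        lock (_entries)
        {
            if (!_entries.TryGetValue(key, out entry))
            {
                entry = new Entry(_maxConcurrency);
                _entries[key] = entry;
            }
            entry.RefCount++; // registered before waiting, so the entry cannot vanish
        }
        entry.Semaphore.Wait();
        return new Releaser(this, key, entry);
    }

    private void Release(string key, Entry entry)
    {
        entry.Semaphore.Release();
        lock (_entries)
        {
            // The last user evicts the entry, so the map only holds active keys.
            if (--entry.RefCount == 0) _entries.Remove(key);
        }
    }

    private sealed class Releaser : IDisposable
    {
        private readonly KeyedLock _owner;
        private readonly string _key;
        private readonly Entry _entry;
        private int _disposed;

        public Releaser(KeyedLock owner, string key, Entry entry)
        {
            _owner = owner; _key = key; _entry = entry;
        }

        public void Dispose()
        {
            // Guard against double release.
            if (Interlocked.Exchange(ref _disposed, 1) == 0) _owner.Release(_key, _entry);
        }
    }
}
```

Usage is `using (locks.Acquire(id)) { ... }`. Passing a larger maxConcurrency gives the "limit to N concurrent runs per key" behaviour mentioned above.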
I imagine that locking on interned strings could lead to memory bloat if the generated strings are numerous and all unique. Another approach that should be more memory efficient and solve the immediate deadlock issue is:
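// Returns an object to lock on for a given string value.
// Caveat: ConditionalWeakTable compares keys by reference, not by content, so two
// calls share a lock only when they pass the same string instance (for example,
// an interned string). ToLower() may allocate a new string instance, which would
// defeat that, so the key is used as-is here.
private static readonly ConditionalWeakTable<string, object> _weakTable = new ConditionalWeakTable<string, object>();
public static object GetLock(string value)
{
if (value == null) throw new ArgumentNullException(nameof(value));
return _weakTable.GetOrCreateValue(value);
}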
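One thing worth checking before relying on this: ConditionalWeakTable identifies keys by object reference, not by string content. The sketch below (WeakLocks is a hypothetical name) shows the consequence. The same string instance always maps to the same lock object, but two distinct instances with equal content do not:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class WeakLocks
{
    private static readonly ConditionalWeakTable<string, object> Table =
        new ConditionalWeakTable<string, object>();

    // Returns the lock object attached to this particular string instance,
    // creating it on first use. The entry goes away when the key is collected.
    public static object GetLock(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return Table.GetOrCreateValue(value);
    }
}
```

So this only serializes callers that share the same string instance. Strings that arrive from different sources with equal content (request parameters, database values and so on) would need to be mapped to one canonical instance first.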