Is locking necessary in this ConcurrentDictionary caching scenario - c#

I have the following code to cache instances of some class in a ConcurrentDictionary, which I use in a multi-threaded application.
Simply put, when I instantiate the class with the id parameter, it first checks whether an instance of PrivateClass with the given id exists in the dictionary; if not, it creates an instance of PrivateClass (which takes a long time, sometimes a couple of seconds) and adds it to the dictionary for future use.
public class SomeClass
{
private static readonly ConcurrentDictionary<int, PrivateClass> SomeClasses =
new ConcurrentDictionary<int, PrivateClass>();
private readonly PrivateClass _privateClass;
public SomeClass(int cachedInstanceId)
{
if (!SomeClasses.TryGetValue(cachedInstanceId, out _privateClass))
{
_privateClass = new PrivateClass(); // This takes a long time
SomeClasses.TryAdd(cachedInstanceId, _privateClass);
}
}
public int SomeCalculationResult()
{
return _privateClass.CalculateSomething();
}
private class PrivateClass
{
internal PrivateClass()
{
// this takes a long time
}
internal int CalculateSomething()
{
// Calculates and returns something
}
}
}
My question is, do I need to add a lock around the generation and assignment part of the outer class's constructor to make this code thread safe, or is it good as it is?
Update:
After SLaks's suggestion, I tried to use the GetOrAdd() method of ConcurrentDictionary in combination with Lazy, but unfortunately the constructor of PrivateClass is still called more than once. See https://gist.github.com/3500955 for the test code.
Update 2:
You can see the final solution here:
https://gist.github.com/3501446

You're misusing ConcurrentDictionary.
In multi-threaded code, you should never check for the presence of an item, then add it if it's not there.
If two threads run that code at once, they will both end up adding it.
In general, there are two solutions to this kind of problem. You can wrap all of that code in a lock, or you can redesign it to do the whole thing in one atomic operation.
ConcurrentDictionary is designed for exactly this kind of scenario.
You should simply call
_privateClass = SomeClasses.GetOrAdd(cachedInstanceId, key => new PrivateClass());

Locking is not necessary, but what you're doing is not thread-safe. Instead of first checking the dictionary for presence of an item and then adding it if necessary, you should use ConcurrentDictionary.GetOrAdd() to do it all in one atomic operation.
Otherwise, you're exposing yourself to the same problem that you'd have with a regular dictionary: another thread might add an entry to SomeClasses after you check for existence but before you insert.

Your sample code at https://gist.github.com/3500955 using ConcurrentDictionary and Lazy<T> is incorrect - you're writing:
private static readonly ConcurrentDictionary<int, PrivateClass> SomeClasses =
new ConcurrentDictionary<int, PrivateClass>();
public SomeClass(int cachedInstanceId)
{
_privateClass = SomeClasses.GetOrAdd(cachedInstanceId, (key) => new Lazy<PrivateClass>(() => new PrivateClass(key)).Value);
}
...which should have been:
private static readonly ConcurrentDictionary<int, Lazy<PrivateClass>> SomeClasses =
new ConcurrentDictionary<int, Lazy<PrivateClass>>();
public SomeClass(int cachedInstanceId)
{
_privateClass = SomeClasses.GetOrAdd(cachedInstanceId, (key) => new Lazy<PrivateClass>(() => new PrivateClass(key))).Value;
}
You need to use ConcurrentDictionary<TKey, Lazy<TVal>>, and not ConcurrentDictionary<TKey, TVal>.
The point is that you only access the Value of the Lazy after the correct Lazy object has been returned from GetOrAdd() - passing the Value of the Lazy object into GetOrAdd defeats the whole purpose of using it.
Edit: Ah - you got it in https://gist.github.com/mennankara/3501446 :)

Related

Is it possible for a managed thread to have a Race Condition with itself

So, in order to have a separate context for each thread that the program is running, I set up a context-to-thread mapping class as follows:
public class ContextMap : IContextMap
{
private static IContextMap _contextMap;
private Dictionary<int, IArbContext2> ContextDict;
private static string DbName;
private ContextMap()
{
if (string.IsNullOrWhiteSpace(DbName))
throw new InvalidOperationException("Setup must be called before accessing ContextMap");
ContextDict = new Dictionary<int, IArbContext2>();
}
protected internal static void Setup(IContextMap map)
{
_contextMap = map;
}
public static void Setup(string dbName)
{
DbName = dbName;
}
public static IContextMap GetInstance()
{
return _contextMap ?? (_contextMap = new ContextMap());
}
public IArbContext2 GetOrCreateContext()
{
var threadId = Thread.CurrentThread.ManagedThreadId;
if(!ContextDict.ContainsKey(threadId))
ContextDict.Add(threadId,new ArbContext(DbName));
return ContextDict[threadId];
}
public void DestroyContext()
{
if (ContextDict.ContainsKey(Thread.CurrentThread.ManagedThreadId))
ContextDict.Remove(Thread.CurrentThread.ManagedThreadId);
}
Somehow the code is (very rarely, but still) throwing a KeyNotFoundException in the GetOrCreateContext method. Is it possible for a thread to be sidetracked into a separate action (e.g. an overseeing thread forces it to perform another action that causes it to call DestroyContext after it checked that the dictionary had the key but before it returned the value) and then resume where it left off? I never explicitly do this, but I can't see any other way this exception could be thrown.
Thank You.
The problem here is that Dictionary is not thread-safe. There can be unexpected behaviour when multiple threads try to access it, even if they are all using unique keys, because creating or removing a key/value pair is not an atomic action.
The easiest fix would be to use a ConcurrentDictionary in place of the plain Dictionary for ContextDict.
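For example, a minimal sketch of that change (System.Collections.Concurrent namespace; the rest of the class stays as it is):
private readonly ConcurrentDictionary<int, IArbContext2> ContextDict =
    new ConcurrentDictionary<int, IArbContext2>();

public IArbContext2 GetOrCreateContext()
{
    var threadId = Thread.CurrentThread.ManagedThreadId;
    // GetOrAdd performs the lookup-or-insert atomically and returns the stored value,
    // so the "check, then index" race that can produce KeyNotFoundException goes away.
    return ContextDict.GetOrAdd(threadId, id => new ArbContext(DbName));
}

public void DestroyContext()
{
    IArbContext2 removed;
    ContextDict.TryRemove(Thread.CurrentThread.ManagedThreadId, out removed);
}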
Answering your literal question, NOT attempting to solve your problem. (#BenAaronsom has already done that.)
No: You have a "race condition" when the result of some computation depends on the order in which two or more threads access the same variable. If there is only one thread running in the race, then no matter how many times you run it, the same thread will always win. If a single-threaded program gives a non-deterministic answer, then whatever the problem is, it's not a race condition.

C# usage of lock() statements and caching of data

I have a number of static Lists in my application, which are used to store data from my database and are used when looking up information:
public static IList<string> Names;
I also have some methods to refresh this data from the database:
public static void GetNames()
{
SQLEngine sql = new SQLEngine(ConnectionString);
lock (Names)
{
Names = sql.GetDataTable("SELECT * FROM Names").ToList<string>();
}
}
I initially didn't have the lock() in place; however, I noticed that very occasionally the requesting thread couldn't find the information in the list. Now, I am assuming that with the lock, if the requesting thread tries to access the Names list, it can't do so until the list has been fully updated.
Is this the correct methodology and usage of the lock() statement?
As a side note, I noticed on MSDN that one shouldn't use lock() on public variables. Could someone please elaborate for my particular scenario?
lock is only useful if all places intended to be synchronized also apply the lock. So every time you access Names you would be required to lock. At the moment, that only stops 2 threads swapping Names at the same time, which frankly isn't a problem here, as reference swaps are atomic anyway.
Another problem: presumably Names starts off null? You can't lock on a null. Equally, you shouldn't lock on something that may change reference. If you want to synchronize, a common approach is something like:
// do not use for your scenario - see below
private static readonly object lockObj = new object();
then lock(lockObj) instead of your data.
With regards to not locking things that are visible externally; yes. That is because some other code could randomly choose to lock on it, which could cause unexpected blocking, and quite possibly deadlocks.
The other big risk is that some of your code obtains the names, and then does a sort/add/remove/clear/etc - anything that mutates the data. Personally, I would be using a read-only list here. In fact, with a read-only list, all you have is a reference swap; since that is atomic, you don't need any locking:
public static IList<string> Names { get; private set; }
public static void UpdateNames() {
List<string> tmp = SomeSqlQuery();
Names = tmp.AsReadOnly();
}
And finally: public fields are very very rarely a good idea. Hence the property above. This will be inlined by the JIT, so it is not a penalty.
No, it's not correct since anyone can use the Names property directly.
public class SomeClass
{
private List<string> _names;
private object _namesLock = new object();
public IEnumerable<string> Names
{
get
{
if (_names == null)
{
lock (_namesLock )
{
if (_names == null)
_names = GetNames();
}
}
return _names;
}
}
public void UpdateNames()
{
lock (_namesLock)
GetNames();
}
private void GetNames()
{
SQLEngine sql = new SQLEngine(ConnectionString);
_names = sql.GetDataTable("SELECT * FROM Names").ToList<string>();
}
}
Try to avoid static methods. At least use a singleton.
The check-lock-check is faster than lock-check, because the write only happens once; after initialization, readers never need to take the lock.
Initializing the value the first time it is used is called lazy loading.
The _namesLock is required since you can't lock on null.
From the code you have shown, the first time GetNames() is called the Names property is null. A lock on a null reference throws an ArgumentNullException, so I would add a variable to lock on.
static object namesLock = new object();
Then in GetNames()
lock (namesLock)
{
if (Names == null)
Names = ...;
}
We do the if test inside of the lock() to stop race conditions. I'm assuming that the caller of GetNames() also does the same test.

Can(should?) Lazy<T> be used as a caching technique?

I'd like to use .NET's Lazy<T> class to implement thread safe caching. Suppose we had the following setup:
class Foo
{
Lazy<string> cachedAttribute;
Foo()
{
invalidateCache();
}
string initCache()
{
string returnVal = "";
//CALCULATE RETURNVAL HERE
return returnVal;
}
public String CachedAttr
{
get
{
return cachedAttribute.Value;
}
}
void invalidateCache()
{
cachedAttribute = new Lazy<string>(initCache, true);
}
}
My questions are:
Would this work at all?
How would the locking have to work?
I feel like I'm missing a lock somewhere near the invalidateCache, but for the life of me I can't figure out what it is.
I'm sure there's a problem with this somewhere, I just haven't figured out where.
[EDIT]
Ok, well it looks like I was right - there were things I hadn't thought about. If a thread sees an outdated cache it'd be a very bad thing, so it looks like "Lazy" is not safe enough. The property is accessed a lot, though, so I was engaging in premature optimization in the hope that I could learn something and have a pattern to use in the future for thread-safe caching. I'll keep working on it.
P.S.: I decided to make the object thread-unsafe and instead carefully control access to it.
Well, it's not thread-safe in the sense that one thread could still see the old value after another thread has seen the new value following invalidation - because the first thread might not yet have seen the change to cachedAttribute. In theory, that situation could perpetuate forever, although it's pretty unlikely :)
Using Lazy<T> as a cache of unchanging values seems like a better idea to me - more in line with how it was intended - but if you can cope with the possibility of using an old "invalidated" value for an arbitrarily long period in another thread, I think this would be okay.
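To illustrate that "cache of unchanging values" use, here is a minimal sketch (the PriceTable class, its members and the sample data are all made up for illustration):
using System;
using System.Collections.Generic;

static class PriceTable
{
    // With isThreadSafe: true, Lazy<T> guarantees the factory runs at most once,
    // even if several threads hit Value at the same time.
    private static readonly Lazy<Dictionary<string, decimal>> Prices =
        new Lazy<Dictionary<string, decimal>>(LoadPrices, true);

    public static decimal GetPrice(string sku)
    {
        // The first caller pays the load cost; later callers get the cached table.
        return Prices.Value[sku];
    }

    private static Dictionary<string, decimal> LoadPrices()
    {
        // Stand-in for the expensive calculation or database query.
        return new Dictionary<string, decimal> { { "sku-1", 9.99m } };
    }
}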
cachedAttribute is a shared resource that needs to be protected from concurrent modification.
Protect it with a lock:
private readonly object gate = new object();
public string CachedAttr
{
get
{
Lazy<string> lazy;
lock (gate) // 1. Lock
{
lazy = this.cachedAttribute; // 2. Get current Lazy<string>
} // 3. Unlock
return lazy.Value; // 4. Get value of Lazy<string>
// outside lock
}
}
void InvalidateCache()
{
lock (gate) // 1. Lock
{ // 2. Assign new Lazy<string>
cachedAttribute = new Lazy<string>(initCache, true);
} // 3. Unlock
}
or use Interlocked.Exchange:
void InvalidateCache()
{
Interlocked.Exchange(ref cachedAttribute, new Lazy<string>(initCache, true));
}
volatile might work as well in this scenario, but it makes my head hurt.
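For completeness, a sketch of the volatile variant of the same class (no claim that it addresses every memory-model subtlety - it simply ensures readers see the most recently assigned Lazy<string>):
using System;

class Foo
{
    // volatile: a read of cachedAttribute always observes the latest published Lazy<string>;
    // each Lazy<string> still guarantees initCache runs at most once.
    private volatile Lazy<string> cachedAttribute;

    public Foo()
    {
        InvalidateCache();
    }

    public string CachedAttr
    {
        get { return cachedAttribute.Value; }
    }

    public void InvalidateCache()
    {
        cachedAttribute = new Lazy<string>(initCache, true);
    }

    private string initCache()
    {
        return "expensive result"; // stand-in for the real calculation
    }
}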

Using string as a lock to do thread synchronization

While I was looking at some legacy application code, I noticed it is using a string object to do thread synchronization. I'm trying to resolve some thread contention issues in this program and was wondering if this could lead to some strange situations. Any thoughts?
private static string mutex = "ABC";
internal static void Foo(Rpc rpc)
{
lock (mutex)
{
//do something
}
}
Strings like that (from the code) could be "interned". This means all instances of "ABC" point to the same object. Even across AppDomains you can point to the same object (thanks, Steven, for the tip).
If you have a lot of string-mutexes, from different locations, but with the same text, they could all lock on the same object.
The intern pool conserves string storage. If you assign a literal string constant to several variables, each variable is set to reference the same constant in the intern pool instead of referencing several different instances of String that have identical values.
It's better to use:
private static readonly object mutex = new object();
Also, since your string is not const or readonly, you can change it. So (in theory) it is possible to lock on your mutex, change mutex to another reference, and then enter the critical section again because the lock is now taken on another object/reference. Example:
private static string mutex = "1";
private static string mutex2 = "1"; // for 'lock' mutex2 and mutex are the same
private static void CriticalButFlawedMethod() {
lock(mutex) {
mutex += "."; // Hey, now mutex points to another reference/object
// You are free to re-enter
...
}
}
To answer your question (as some others already have), there are some potential problems with the code example you provided:
private static string mutex= "ABC";
The variable mutex is not immutable.
The string literal "ABC" will refer to the same interned object reference everywhere in your application.
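You can see that interning behaviour with a small demo (illustrative only):
using System;

class InternDemo
{
    static void Main()
    {
        string a = "ABC";
        string b = "AB" + "C"; // constant folding: the compiler emits the same interned "ABC" literal
        string c = new string(new[] { 'A', 'B', 'C' }); // built at runtime, not interned

        Console.WriteLine(object.ReferenceEquals(a, b));                // True
        Console.WriteLine(object.ReferenceEquals(a, c));                // False
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True
    }
}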
In general, I would advise against locking on strings. However, there is a case I've run into where it is useful to do this.
There have been occasions where I have maintained a dictionary of lock objects where the key is something unique about some data that I have. Here's a contrived example:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
ConcurrentDictionary<int, object> _locks = new ConcurrentDictionary<int, object>();
void DoSomething(SomeEntity entity)
{
var mutex = _locks.GetOrAdd(entity.Id, id => new object());
lock(mutex)
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The goal of code like this is to serialize concurrent invocations of DoSomething() within the context of the entity's Id. The downside is the dictionary. The more entities there are, the larger it gets. It's also just more code to read and think about.
I think .NET's string interning can simplify things:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
void DoSomething(SomeEntity entity)
{
lock(string.Intern("dee9e550-50b5-41ae-af70-f03797ff2a5d:" + entity.Id))
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The difference here is that I am relying on the string interning to give me the same object reference per entity id. This simplifies my code because I don't have to maintain the dictionary of mutex instances.
Notice the hard-coded UUID string that I'm using as a namespace. This is important if I choose to adopt the same approach of locking on strings in another area of my application.
Locking on strings can be a good idea or a bad idea depending on the circumstances and the attention that the developer gives to the details.
If you need to lock on a string, you can create an object that pairs the string with an object that you can lock on.
class LockableString
{
public string _String;
public object MyLock; // Provides a lock object for the paired string.
public LockableString()
{
MyLock = new object();
}
}
My 2 cents:
ConcurrentDictionary is 1.5X faster than interned strings. I did a benchmark once.
To solve the "ever-growing dictionary" problem, you can use a dictionary of semaphores instead of a dictionary of objects - that is, ConcurrentDictionary<string, SemaphoreSlim> instead of ConcurrentDictionary<string, object>. Unlike lock statements, semaphores can track how many threads have acquired them, and once all of them have been released you can remove the entry from the dictionary. See this question for solutions like that: Asynchronous locking based on a key
Semaphores are even better because you can control the concurrency level. Instead of "limiting to one concurrent run" you can "limit to 5 concurrent runs". Awesome free bonus, isn't it? I had to code an email service that needed to limit the number of concurrent connections to a server - this came in very handy.
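A rough sketch of that idea (the KeyedThrottle name is made up, and the "remove the entry once everyone has released it" cleanup from the linked question is left out to keep it short):
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class KeyedThrottle
{
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Semaphores =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    // Runs 'work' while holding the semaphore for 'key'.
    // maxConcurrency = 1 behaves like a keyed lock; 5 allows five concurrent runs per key.
    public static async Task RunAsync(string key, int maxConcurrency, Func<Task> work)
    {
        var semaphore = Semaphores.GetOrAdd(key, _ => new SemaphoreSlim(maxConcurrency, maxConcurrency));
        await semaphore.WaitAsync();
        try
        {
            await work();
        }
        finally
        {
            semaphore.Release();
        }
    }
}
Usage would be along the lines of await KeyedThrottle.RunAsync(serverName, 5, () => SendAsync(message)), where SendAsync stands in for whatever per-key operation needs throttling.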
I imagine that locking on interned strings could lead to memory bloat if the strings generated are many and are all unique. Another approach that should be more memory efficient and solve the immediate deadlock issue is:
// Returns an Object to Lock with based on a string Value
private static readonly ConditionalWeakTable<string, object> _weakTable = new ConditionalWeakTable<string, object>();
public static object GetLock(string value)
{
if (value == null) throw new ArgumentNullException(nameof(value));
return _weakTable.GetOrCreateValue(value.ToLower());
}

Large static arrays are slowing down class load, need a better/faster lookup method

I have a class with a couple static arrays:
an int[] with 17,720 elements
a string[] with 17,720 elements
I noticed when I first access this class it takes almost 2 seconds to initialize, which causes a pause in the GUI that's accessing it.
Specifically, it's a lookup for Unicode character names. The first array is an index into the second array.
static readonly int[] NAME_INDEX = {
0x0000, 0x0001, 0x0005, 0x002C, 0x003B, ...
static readonly string[] NAMES = {
"Exclamation Mark", "Digit Three", "Semicolon", "Question Mark", ...
The following code is how the arrays are used (given a character code). [Note: This code isn't a performance problem]
int nameIndex = Array.BinarySearch<int>(NAME_INDEX, code);
if (nameIndex > 0)
{
return NAMES[nameIndex];
}
I guess I'm looking for other options on how to structure the data so that 1) the class is quickly loaded, and 2) I can quickly get the "name" for a given character code.
Should I not be storing all these thousands of elements in static arrays?
Update
Thanks for all the suggestions. I've tested out a Dictionary approach and the performance of adding all the entries seems to be really poor.
Here is some code with the Unicode data to test out Arrays vs Dictionaries
http://drop.io/fontspace/asset/fontspace-unicodesupport-zip
Solution Update
I tested out my original dual arrays (which are faster than both dictionary options) with a background thread to initialize and that helped performance a bit.
However, the real surprise is how well the binary files in resource streams works. It is the fastest solution discussed in this thread. Thanks everyone for your answers!
So a couple of observations. Binary Search is only going to work if your array is sorted, and from your above code snippet, it doesn't look to be sorted.
Since your primary goal is to find a specific name, your code is begging for a hash table. I would suggest using a Dictionary; it will give you O(1) (on average) lookup without much more overhead than just having the arrays.
As for the load time, I agree with Andrey that the best way is going to be to use a separate thread. You are going to have some initialization overhead when using the amount of data you are using. Normal practice with GUIs is to use a separate thread for these activities so you don't lock up the UI.
First
A Dictionary<int, string> is going to perform far better than your duelling arrays will. Putting aside how this data gets into the arrays/Dictionary (hardcoded vs. read in from another location, like a resource file), this is still a better and more intuitive storage mechanism.
Second
As others have suggested, do your loading in another thread. I'd use a helper function to help you deal with this. You could use an approach like this:
public class YourClass
{
private static Dictionary<int, string> characterLookup;
private static ManualResetEvent lookupCreated;
static YourClass()
{
lookupCreated = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(LoadLookup);
}
static void LoadLookup(object garbage)
{
characterLookup = new Dictionary<int, string>(); // create the dictionary before filling it
// add your pairs by calling characterLookup.Add(...)
lookupCreated.Set();
}
public static string GetDescription(int code)
{
if (lookupCreated != null)
{
lookupCreated.WaitOne();
lookupCreated.Close();
lookupCreated = null;
}
string output;
if(!characterLookup.TryGetValue(code, out output)) output = null;
return output;
}
}
In your code, call GetDescription in order to translate your integer into the corresponding string. If the UI doesn't call this until later, then you should see a marked decrease in startup time. To be safe, though, I've included a ManualResetEvent that will cause any calls to GetDescription to block until the dictionary has been fully loaded.
"Should I not be storing all these thousands of elements in static arrays?"
A much better way would be to store your data as a binary stream in the assembly's resources and then load it from the resources. There will be some more programming overhead, but in return it doesn't need any object initialization.
The basic idea would be (not real code):
// Load data (two streams):
indices = ResourceManager.GetStream ("indexData");
strings = ResourceManager.GetStream ("stringData");
// Retrieving an entry:
stringIndex = indices.GetIndexAtPosition (char);
string = strings.GetStringFromPosition (stringIndex);
If you want a really good solution (for even more work), look into using memory-mapped data files.
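For what it's worth, here is a hedged sketch of the resource-stream idea (the resource name "MyApp.UnicodeNames.bin" and its layout are invented for illustration - you would generate the blob yourself with a BinaryWriter using the matching format):
using System;
using System.IO;
using System.Reflection;

static class UnicodeNames
{
    private static readonly int[] Codes;
    private static readonly string[] Names;

    // Reads a hypothetical embedded resource laid out as:
    // [int32 count][count int32 codes][count length-prefixed strings]
    static UnicodeNames()
    {
        var assembly = Assembly.GetExecutingAssembly();
        using (Stream stream = assembly.GetManifestResourceStream("MyApp.UnicodeNames.bin"))
        using (var reader = new BinaryReader(stream))
        {
            int count = reader.ReadInt32();
            Codes = new int[count];
            Names = new string[count];
            for (int i = 0; i < count; i++) Codes[i] = reader.ReadInt32();
            for (int i = 0; i < count; i++) Names[i] = reader.ReadString();
        }
    }

    public static string GetName(int code)
    {
        int index = Array.BinarySearch(Codes, code);
        return index >= 0 ? Names[index] : null;
    }
}
Reading a compact binary blob like this matches the questioner's final update, where loading binary data from resource streams turned out to be the fastest option tested.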
Initialize your arrays in a separate thread so that they do not lock up the UI:
http://msdn.microsoft.com/en-us/library/hz49h034.aspx
If you store the arrays in a file, you could do a lazy load:
public class Class1
{
const int CountOfEntries = 17700; //or what ever the count is
IEnumerable<KeyValuePair<int, string>> load()
{
using (var reader = File.OpenText("somefile"))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var pair = line.Split(',');
yield return new KeyValuePair<int, string>(int.Parse(pair[0]), pair[1]);
}
}
}
private static Dictionary<int, string> _lookup = new Dictionary<int, string>();
private static IEnumerator<KeyValuePair<int, string>> _loader = null;
private string LookUp(int index)
{
if (_lookup.Count < CountOfEntries && !_lookup.ContainsKey(index))
{
if(_loader == null)
{
_loader = load().GetEnumerator();
}
while(_loader.MoveNext())
{
var pair = _loader.Current;
_lookup.Add(pair.Key,pair.Value);
if (pair.Key == index)
{
return pair.Value;
}
}
}
string name;
if (_lookup.TryGetValue(index,out name))
{
return name;
}
throw new KeyNotFoundException("The given index was not found");
}
}
The code expects the file to have one pair on each line, like so:
index0,name0
index1,name1
If the first index sought is near the end of the file, this will probably perform slower (mainly due to IO), but if the access is random the average case would be reading half of the values the first time. If the access is not random, make sure to keep the most used entries at the top of the file.
There are a few more issues to consider: the above code is not thread-safe for the load operation, and to increase the responsiveness of the rest of the code, keep the loading in a background thread.
Hope this helps.
What about using a dictionary instead of two arrays? You could initialize the dictionary asynchronously using a thread or thread pool. The lookup would be O(1) instead of O(log(n)) as well.
public static class Lookup
{
private static readonly ManualResetEvent m_Initialized = new ManualResetEvent(false);
private static readonly Dictionary<int, string> m_Dictionary = new Dictionary<int, string>();
static Lookup()
{
// Start an asynchronous operation to intialize the dictionary.
// You could use ThreadPool.QueueUserWorkItem instead of creating a new thread.
Thread thread = new Thread(() => { Initialize(); });
thread.Start();
}
public static string GetName(int code)
{
m_Initialized.WaitOne();
lock (m_Dictionary)
{
return m_Dictionary[code];
}
}
private static void Initialize()
{
lock (m_Dictionary)
{
m_Dictionary.Add(0x0000, "Exclamation Point");
// Keep adding items to the dictionary here.
}
m_Initialized.Set();
}
}
