Adding an item to Microsoft.ApplicationServer.Caching.DataCache with pessimistic locking? - c#

I'm working on a server-side caching layer for a web server, using Azure Shared Caching, to reduce the number of requests to the database and thus (hopefully) make things faster. What I'm stuck on is how to make the whole endeavour thread-safe. I can't find a reliable, usable way to lock keys in the DataCache. What I'm missing is a way to preemptively lock a key before anything is stored under it, so that I could add a value without the risk of another thread trying to do the same thing at the same time.
I have been looking exclusively at pessimistic locking so far, since that's how thread safety makes the most sense to me: I want to be sure that the item I'm working on is locked.
I have understood that if I use pessimistic locking, I am responsible for only using the methods related to it. Mixing in the optimistic methods would break the whole locking mechanism (source: http://go4answers.webhost4life.com/Example/datacacheput-unlocking-key-77158.aspx).
So basically I only have access to these methods:
value GetAndLock(key, out DataCacheLockHandle);
void PutAndUnlock(key, value, DataCacheLockHandle);
void Unlock(key, DataCacheLockHandle);
The trouble is, "GetAndLock" throws an exception if I try to get something that isn't already in the cache. At the same time, my only method for adding something to the cache is "PutAndUnlock", and that one can't be used unless I first did a successful "GetAndLock".
In effect, it is impossible to add anything new to the cache; the only thing that can be done is replacing things that are already there (which will be nothing).
So it seems I am forced to use the optimistic "Put" in the case where "GetAndLock" throws the nothing-there exception. According to what I've read, though, the optimistic "Put" destroys any existing lock acquired with "GetAndLock", so that would destroy the whole attempt at thread safety.
Example plan:
1. Try to GetAndLock
2. In case of the nothing-there exception:
- Put a dummy item on the key.
- GetAndLock again.
3. We have a lock, do computations, query database etc
4. PutAndUnlock the computed value
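A minimal sketch of that plan, using the method shapes above (the timeout value, the dummy object, and the ComputeFromDatabase helper are placeholders):

object GetOrCompute(DataCache cache, string key)
{
    DataCacheLockHandle handle;
    TimeSpan timeout = TimeSpan.FromSeconds(30); // arbitrary
    try
    {
        // 1. Try to lock the existing entry.
        cache.GetAndLock(key, timeout, out handle);
    }
    catch (DataCacheException) // the "nothing there" exception
    {
        // 2. Put a dummy item on the key, then lock it.
        //    (These two calls are NOT atomic; see the interleaving below.)
        cache.Put(key, new object());
        cache.GetAndLock(key, timeout, out handle);
    }
    // 3. We (think we) have a lock: do computations, query the database, etc.
    object computed = ComputeFromDatabase(key); // hypothetical helper
    // 4. Store the computed value and release the lock.
    cache.PutAndUnlock(key, computed, handle);
    return computed;
}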
Here is one of probably several ways it would screw up:
Thread1: Tries to GetAndLock, gets the nothing-there exception
Thread2: Tries to GetAndLock, gets the nothing-there exception
Thread1: Puts a dummy item on the key
Thread1: GetAndLock again, lock achieved
Thread2: Puts a dummy item on the key (destroying Thread1's lock)
Thread2: GetAndLock again, lock achieved
Thread1: We think we have a lock; do computations, query database etc.
Thread2: We have a lock; do computations, query database etc.
Thread1: PutAndUnlock the computed value (will this throw an exception?)
Thread2: PutAndUnlock the computed value
Basically the two threads could write different things to the same key at the same time, ignoring locks that they both think they hold.
My only conclusion can be that the pessimistic locking of DataCache is feature-incomplete and unusable. Am I missing something? Is there a way to solve this?
All I'm missing is a way to preemptively lock a key before anything is stored under it.

Jonathan,
Have you considered this logic for adding things to the cache (please pardon my pseudo-code)?
public bool AddToCache(string key, object value)
{
    DataCache dc = _factory.GetDefaultCache();
    object currentVal = dc.Get(key);
    if (currentVal == null)
    {
        dc.Put(key, value);
        DataCacheLockHandle handle;
        // Timeout value is arbitrary.
        currentVal = dc.GetAndLock(key, TimeSpan.FromSeconds(5), out handle);
        if (!Equals(currentVal, value))
        {
            // Rare occurrence: another thread replaced our value between
            // the Put and the GetAndLock. Handle it, then unlock.
            dc.Unlock(key, handle);
            return false;
        }
        dc.Unlock(key, handle);
    }
    else
    {
        DataCacheLockHandle handle;
        dc.GetAndLock(key, TimeSpan.FromSeconds(5), out handle);
        dc.PutAndUnlock(key, value, handle);
    }
    return true;
}

Related

Is there a way to lock a concurrent dictionary from being used

I have this static class
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process
API starts and initializes an empty dictionary
A background job starts and runs once every day to reload the dictionary from the database
Requests come in to read from the dictionary or update a specific city in the dictionary
My problem
If a request comes in to update the city
I update the database
If the update was successful, update the city object in the dictionary
At the same time, the background job started and queried all cities before I updated the specific city
The request finishes and the dictionary city now has the old values because the background job finished last
My solution I thought about first
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way, when the background job starts, it can lock/reserve the dictionary for itself alone, and when it's done it will release it for other requests to use.
Then a request that has been waiting for the dictionary to be released can update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the API wants to access the cache but it's not loaded?
When the API starts, I block requests to this particular "Location" project until the background job sets IsReady to true. The cache I implemented is thread-safe until I add the background job.
How much time does it take to reload the cache?
I would say less than 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background-job problem by keeping track of date-times, similar to an "object version" approach. I won't be taking this path, as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me, so that I can update the DB and the cache at the same time. So I might remove the background job after all, or just run it once a week. Thank you for all the answers. I am OK with possible data inconsistency in the way I am updating the objects, because if one route updates 2 specific values and another route updates 2 different specific values, the possibility of a problem is very minimal.
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration
An update will only happen to objects that the user owns, and the rate at which a user might update those objects is most likely once a minute. That greatly reduces the possibility of a problem for this specific example.
Most of my cache objects relate only to a specific user, which ties in with point 1.
The application owns the data, I don't. So I should never manually update the database unless it's critical.
Memory might be a problem, but 1,000,000 normal-ish objects take somewhere between 80 MB and 150 MB. I can keep a lot of objects in memory to gain performance and reduce the load on the database.
Having a lot of objects in memory will put pressure on garbage collection, and that is not good, but I don't think it will be a big impact in my case; I just have to plan ahead to make sure there is enough memory for day-to-day operations.
All of these considerations just so that I can have an in-memory cache right at my fingertips.
I would suggest adding an UpdatedAt/CreatedAt property to your LocationCityContract, or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check whether the item you're about to add/update is newer than the existing object, like so:
public class CacheItem<T>
{
public T Item { get; }
public DateTime CreatedAt { get; }
// In case of system clock synchronization, consider making CreatedAt
// a long and using Environment.TickCount64. See comment from #Theodor
public CacheItem(T item, DateTime? createdAt = null)
{
Item = item;
CreatedAt = createdAt ?? DateTime.UtcNow;
}
}
// Use it like...
static class LocationMemoryCache
{
public static readonly
ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}
// From some request...
var newItem = new CacheItem<LocationCityContract>(newLocation);
// or the background job...
var newItem = new CacheItem<LocationCityContract>(newLocation, updateStart);

LocationMemoryCache.LocationCities
    .AddOrUpdate(
        newLocation.Id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);
When a request wants to update the cache entry, it does as above with the timestamp of whenever it finished adding the item to the database (see notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache as above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from the DB as the first thing in the background job, but instead reading them one at a time and updating the cache accordingly; in that case, updateStart should be set right before reading each value (we could call it itemReadStart instead).
Since this way of updating the item in the cache is a little more cumbersome, and you might be doing it from a lot of places, you could make a helper method to simplify the call to LocationCities.AddOrUpdate.
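A possible shape for such a helper (hypothetical name; it assumes the CacheItem<T> wrapper above and that LocationCityContract has an Id):

public static class LocationCacheExtensions
{
    // Inserts the item, or replaces the existing one only if ours is newer.
    public static void UpsertIfNewer(
        this ConcurrentDictionary<int, CacheItem<LocationCityContract>> cache,
        LocationCityContract location,
        DateTime? createdAt = null)
    {
        var newItem = new CacheItem<LocationCityContract>(location, createdAt);
        cache.AddOrUpdate(
            location.Id,
            newItem,
            (_, existing) => newItem.CreatedAt > existing.CreatedAt ? newItem : existing);
    }
}

// Usage, from a request:
// LocationMemoryCache.LocationCities.UpsertIfNewer(newLocation);
// or from the background job:
// LocationMemoryCache.LocationCities.UpsertIfNewer(newLocation, updateStart);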
Note:
Since this approach does not synchronize (lock) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests want to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after each update, it might not truly reflect which one was updated last. Since you're OK with a 24-hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you, as the background job will fix it when run.
As #Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means, don't use LocationMemoryCache[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem(newLoc);
LocationMemoryCache.LocationCities
.AddOrUpdate(...); /* <- like above */
By not locking the whole dictionary, you avoid requests blocking each other while they try to update the cache at the same time. If the first point above is not acceptable, you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that the DB and the cache are updated atomically. This avoids blocking requests that are trying to update other locations, so you minimize the risk of requests affecting each other.
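A rough sketch of such per-ID locking, using one SemaphoreSlim per location (UpdateDatabaseAsync is a hypothetical stand-in for your DB call):

private static readonly ConcurrentDictionary<int, SemaphoreSlim> _cityLocks = new();

public static async Task UpdateCityAsync(LocationCityContract city)
{
    // One gate per location ID, created on first use.
    var gate = _cityLocks.GetOrAdd(city.Id, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();
    try
    {
        await UpdateDatabaseAsync(city); // hypothetical DB call
        // ...then update the cache entry, e.g. with the helper shown earlier.
        LocationMemoryCache.LocationCities.UpsertIfNewer(city);
    }
    finally
    {
        gate.Release();
    }
}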
No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will occasionally result in some threads being unnecessarily blocked, but it is also very simple, and it is easy to reason about its correctness. Essentially all access to the dictionary and the database will be serialized:
Every time a thread wants to read an object stored in the dictionary, it will first have to take the lock, and keep the lock until it's done reading the object.
Every time a thread wants to update the database and then the corresponding object, it will first have to take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
Every time the background job wants to replace the current dictionary with a new one, it will first have to take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
In case the performance of this simple approach proves unacceptable, you should look into more sophisticated solutions. But the complexity gap between this solution and the next simplest one (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.
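A minimal sketch of that serialized approach (the two database calls at the bottom are hypothetical stand-ins for the real data access):

public static class LocationCache
{
    private static readonly object _lock = new object();
    private static Dictionary<int, LocationCityContract> _cities = new();

    public static LocationCityContract GetCity(int id)
    {
        lock (_lock)
        {
            return _cities.TryGetValue(id, out var city) ? city : null;
        }
    }

    public static void UpdateCity(LocationCityContract city)
    {
        lock (_lock)
        {
            // The lock is held across the DB update and the cache write.
            UpdateDatabase(city);
            _cities[city.Id] = city;
        }
    }

    public static void ReloadAll()
    {
        lock (_lock)
        {
            // The whole reload holds the lock, including the DB query.
            _cities = QueryAllCities();
        }
    }

    // Hypothetical stand-ins for the real data access:
    private static void UpdateDatabase(LocationCityContract city) { /* ... */ }
    private static Dictionary<int, LocationCityContract> QueryAllCities() => new();
}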

Best way to prevent race conditions in a multi instance web environment?

Say you have an Action in ASP.NET MVC in a multi-instance environment that looks something like this*:
public void AddLolCat(int userId)
{
var user = _Db.Users.ById(userId);
user.LolCats.Add( new LolCat() );
user.LolCatCount = user.LolCats.Count();
_Db.SaveChanges();
}
When a user repeatedly presses a button or refreshes, race conditions can occur, making it possible that LolCatCount does not match the actual number of LolCats.
Question
What is the common way to fix these issues? You could fix it client-side in JavaScript, but that might not always be possible, e.g. when something happens on a page refresh, or because someone is messing around in Fiddler.
I guess you have to make some kind of a network based lock?
Do you really have to suffer the extra latency per call?
Can you tell an Action that it is only allowed to be executed once per User?
Is there any common pattern already in place that you can use? Like a Filter or attribute?
Do you return early, or do you really lock the process?
When you return early, is there an 'established' response / response code I should return?
When you use a lock, how do you prevent thread starvation with (semi) long running processes?
* Just a stupid example shown for brevity; real-world examples are a lot more complicated.
Answer 1: (The general approach)
If the data store supports transactions you could do the following:
using (var trans = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.Serializable }))
{
    var user = _Db.Users.ById(userId);
    user.LolCats.Add(new LolCat());
    user.LolCatCount = user.LolCats.Count();
    _Db.SaveChanges();
    trans.Complete();
}
This will lock the user record in the database, making other requests wait until the transaction has been committed.
Answer 2: (Only possible with a single process)
Enabling sessions and using the session will cause implicit locking between requests from the same user (session).
Session["TRIGGER_LOCKING"] = true;
Answer 3: (Example specific)
Deduce the number of LolCats from the collection instead of keeping track of it in a separate field and thus avoid inconsistency issues.
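For the example given, that could be as simple as deriving the count on demand (a sketch; with an ORM you may prefer counting in the database rather than loading the whole collection):

public class User
{
    public ICollection<LolCat> LolCats { get; set; } = new List<LolCat>();

    // Derived on demand, so it can never disagree with the collection.
    public int LolCatCount => LolCats.Count;
}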
Answers to your specific questions:
I guess you have to make some kind of a network based lock?
Yes, database locks are common.
Do you really have to suffer the extra latency per call?
say what?
Can you tell an Action that it is only allowed to be executed once per user?
You could implement an attribute that uses the implicit session locking, or some custom variant of it, but that won't work between processes.
Is there any common pattern already in place that you can use? Like a Filter or attribute?
Common practice is to use locks in the database to solve the multi instance issue. No filter or attribute that I know of.
Do you return early, or do you really lock the process?
Depends on your use case. Commonly you wait ("lock the process"). However, if your database store supports the async/await pattern, you would do something like:
var user = await _Db.Users.ByIdAsync(userId);
This will free the thread to do other work while waiting for the lock.
When you return early, is there an 'established' response / response code I should return?
I don't think so; pick something that fits your use case.
When you use a lock, how do you prevent thread starvation with (semi) long running processes?
I guess you should consider using queues.
By "multi-instance" you're obviously referring to a web farm or maybe a web garden situation where just using a mutex or monitor isn't going to be sufficient to serialize requests.
So... do you you have just one database on the back end? Why not just use a database transaction?
It sounds like you probably don't want to force serialized access to this one section of code for all user id's, right? You want to serialize requests per user id?
It seems to me that the right thinking about this is to serialize access to the source data, which is the LolCats records in the database.
I do like the idea of disabling the button or link in the browser for the duration of a request, to prevent the user from hammering away on the button over and over again before previous requests finish processing and return. That seems like an easy enough step with a lot of benefit.
But I doubt that is enough to guarantee the serialized access you want to enforce.
You could also implement shared session state and some kind of lock on a session-based object, but it would probably need to be a collection (of user IDs) in order to enforce the serializable-per-user paradigm.
I'd vote for using a database transaction.
I suggest, and personally use, a mutex in this case.
I have written a class that handles the mutex here: Mutex release issues in ASP.NET C# code, but you can make your own.
So, based on the class from that answer, your code will look like:
public void AddLolCat(int userId)
{
// I add some text in front of the number because I see it's an integer,
// so it's better to make the key a little more complex to avoid conflicts
var gl = new MyNamedLock("SiteName." + userId.ToString());
try
{
//Enter lock
if (gl.enterLockWithTimeout())
{
var user = _Db.Users.ById(userId);
user.LolCats.Add( new LolCat() );
user.LolCatCount = user.LolCats.Count();
_Db.SaveChanges();
}
else
{
// log the error
throw new Exception("Failed to enter lock");
}
}
finally
{
//Leave lock
gl.leaveLock();
}
}
Here the lock is based on the user, so different users will not block each other.
About Session Lock
If you use the ASP.NET session in your call, you may get a free lock "ticket" from the session: the session is locked on each call until the page is returned.
Read about that in these Q&As:
Web app blocked while processing another web app on sharing same session
Does ASP.NET Web Forms prevent a double click submission?
jQuery Ajax calls to web service seem to be synchronous
Well, MVC is stateless, meaning you'll have to handle this yourself manually. From a purist perspective I would recommend preventing the multiple presses with a client-side lock, although my preference is to disable the button and apply an appropriate CSS class to indicate its disabled state. My reasoning is that we cannot fully determine the consumer of the action, so while you give the example of Fiddler, there is no way to truly determine whether multiple clicks are legitimate or not.
However, if you want to pursue a server-side locking mechanism, this article provides an example that stores the requester's information in the server-side cache and returns an appropriate response depending on the timeout / actions you want to implement.
HTH
One possible solution is to avoid the redundancy that can lead to inconsistent data.
I.e. if LolCatCount can be determined at runtime, then determine it at runtime instead of persisting this redundant information.

Using locks on objects in a Dictionary gives KeyNotFoundException

I have some code which processes a number of response objects from my database in parallel (using AsParallel()). Each response has many components, and the responses may share the same components. I make some modifications to the component data and save them to the DB, so I need to prevent multiple threads from working on the same component object at the same time.
I use locks to achieve this. I have a ConcurrentDictionary<int, object> to hold all the necessary lock objects. Like this:
private static ConcurrentDictionary<int, object> compLocks = new ConcurrentDictionary<int, object>();
var compIds = db.components.Select(c => c.component_id).ToList();
foreach (var compId in compIds)
{
compLocks[compId] = new object();
}
Then later on I do this:
responses.AsParallel().ForAll(r =>
{
... do some time consuming stuff with web services ...
// this is a *just in case* addition,
// in case a new component was added to
// the db since the dictionary was constructed
// NOTE: it did not have any effect, and I'm no longer
// using it as #Henk pointed out it is not thread-safe.
//if (compLocks[c.component_id] == null)
//{
// compLocks[c.component_id] = new object();
//}
componentList.AsParallel().ForAll(c =>
{
lock (compLocks[c.component_id])
{
... do some processing, save the db records ...
}
});
});
This seems to run perfectly fine, but towards the end of program execution (it runs for several hours, as there is a lot of data) I get the following exception:
Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
at System.Collections.Concurrent.ConcurrentDictionary`2.get_Item(TKey key)
I am sure that the ConcurrentDictionary is being populated with every possible component ID.
I have 3 questions:
How is this exception even possible, and how do I fix it?
Do I need a ConcurrentDictionary for this?
Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Post-Answer Edit
To make clear what the cause of all this was: .AsParallel() doesn't eagerly enumerate the collection of responses. It is lazily evaluated, meaning new responses (and therefore new components) can be added to the collection at run-time (by other processes). Enforcing a snapshot with .ToList() before the .AsParallel() fixed the problem.
The reason my code for adding component IDs to compLocks at run-time didn't remedy the problem is that it is not thread-safe.
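In code, the fix amounts to something like this (a sketch of the snapshot approach described above):

// Materialize the query once, so rows added later can't appear mid-run.
var responseSnapshot = responses.ToList();

responseSnapshot.AsParallel().ForAll(r =>
{
    // ... same processing as before ...
});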
1) How is this exception even possible?
Apparently it is, but not from the posted code alone. It would happen if data is added to the db (would it be an option to capture responses with a ToList() beforehand?)
2) Do I need a ConcurrentDictionary for this?
Not with a fixed list, but when the solution involves add-when-missing then yes, you need a Concurrent collection.
3) Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Not totally sure. The locking looks OK, but you will still process duplicates multiple times, just not at the same time.
Reaction to the edit:
if (compLocks[c.component_id] == null)
{
compLocks[c.component_id] = new object();
}
This is not thread-safe: it is possible for multiple lock objects to be created for a single component_id value. You need to use one of the GetOrAdd() methods.
But I would not expect this to give the exception you're getting, so it's probably not the direct problem.
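For reference, a thread-safe version of that add-when-missing check would look like this, using GetOrAdd:

// GetOrAdd guarantees at most one lock object is ever stored per ID,
// even when several threads race on the same component_id.
object compLock = compLocks.GetOrAdd(c.component_id, _ => new object());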
I would start by replacing:
lock (compLocks[c.component_id])
{
...
}
by:
object compLock;
if (!compLocks.TryGetValue(c.component_id, out compLock)) Debug.Assert(false);
lock(compLock)
{
...
}
Then set it running and go and get a coffee. When the assert fails you'll be able to debug and get a better idea of what's happening.
As for your questions:
1. How is this exception even possible?
Without seeing the rest of your code, it's impossible to say.
2. Do I need a ConcurrentDictionary for this?
If you initialize the dictionary once from a single thread and subsequently only ever read from it, then it doesn't need to be a ConcurrentDictionary.
3. Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Again, difficult to say without seeing more code, but I don't see anything obviously wrong with the small sample you've posted. Threading is hard, though, and it's quite possible there are race conditions elsewhere in your code.

Lock is being skipped

I have the following code which does some database work:
[WebMethod]
public void FastBulkAdd(int addmax)
{
    Users[] uploaders = db.Users.Take(addmax).ToArray();
    Parallel.ForEach(uploaders, item =>
    {
        Account account;
        lock (this)
        {
            account = item.Account;
        }
        // ... further processing ...
    });
}
where every user has 1 account, which is referenced in another table in my DB via a foreign key (I am certain each user has exactly 1 account). I have to lock that bit of code because multi-threaded database connections generate errors. When I run this with addmax set to 1 (allowing one thread to execute), it works just fine, but if addmax is greater than 1 and more than one thread executes, account will always be null, which generates an exception later on. It's almost as if the lock is being skipped.
Update: I wasn't convinced that account would always be null, so I did the following:
int tries = 0;
while (account == null && tries < 100)
{
lock (this)
{
account = item.Account;
}
tries++;
}
And it worked. Not a very neat solution, though. I'd like to know the cause of the problem so that I can avoid this design hazard in the future.
item.Account does a DB lookup, right? Could you replace it with a bulk select for all of the uploaders' accounts at once? That way you only make one hit to the database to select, and one hit to bulk-update later, and you don't have to care about synchronizing database access (which costs a lot of time with every extra hit anyway).
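Something along these lines, for example (a sketch; db.Accounts and the UserId property are assumed names for your schema):

// One query for all accounts in the batch, keyed by user for fast lookup.
var userIds = uploaders.Select(u => u.UserId).ToList();
var accountsByUser = db.Accounts
    .Where(a => userIds.Contains(a.UserId))
    .ToDictionary(a => a.UserId);

// Inside the parallel loop: no DB hit, and no lock needed for the read.
// Account account = accountsByUser[item.UserId];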
Instead of locking this, create a private static object and lock that; you may refer to this thread.
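For example:

private static readonly object _dbLock = new object();

// ...

lock (_dbLock)
{
    account = item.Account;
}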
Also, you need to verify that item.Account is not null. Another problem: even though you lock while reading the Account, you seem to use it later in this code. That does not look right; even with the lock, the account may change later, in the section where you save it to the database, because that part is not locked. See the following sample:
lock (this)
{
account = item.Account;
}
DoSomeDatabaseOperation(account); // the account may change here while another thread is also operating on it
You may also debug parallel operations; refer to this MSDN page.
You can use [MethodImpl(MethodImplOptions.Synchronized)], which serializes all calls to the method.
For example:
[MethodImpl(MethodImplOptions.Synchronized)]
[WebMethod]
public void FastBulkAdd(int addmax)
{
}
Refer to the link below for more details.
http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.methodimploptions.aspx

Should I check whether particular key is present in Dictionary before accessing it?

Should I check whether a particular key is present in the Dictionary if I am sure it will have been added by the time I reach the code that accesses it?
There are two ways I can access a value in the dictionary:
Check with the ContainsKey method; if it returns true, access the value through the dictionary's indexer [key].
or
Use TryGetValue, which returns true or false and returns the value through an out parameter.
(The 2nd will perform better than the 1st if I want to get the value; benchmark.)
However, if I am sure that the function accessing the global dictionary will definitely have the key, should I still check using TryGetValue, or should I use the indexer [] without checking?
Or should I never assume that and always check?
Use the indexer if the key is meant to be present - if it's not present, it will throw an appropriate exception, which is the right behaviour if the absence of the key indicates a bug.
If it's valid for the key not to be present, use TryGetValue instead and react accordingly.
(Also apply Marc's advice about accessing a shared dictionary safely.)
If the dictionary is global (static/shared), you should be synchronizing access to it (this is important; otherwise you can corrupt it).
Even if your thread is only reading data, it needs to respect the locks of other threads that might be editing it.
However, if you are sure that the item is there, the indexer should be fine:
Foo foo;
lock(syncLock) {
foo = data[key];
}
// use foo...
Otherwise, a useful pattern is to check and add in the same lock:
Foo foo;
lock(syncLock) {
if(!data.TryGetValue(key, out foo)) {
foo = new Foo(key);
data.Add(key, foo);
}
}
// use foo...
Here we only add the item if it wasn't there... but inside the same lock.
Always check. Never say never. I assume your application is not so performance-critical that you can't afford the checking time.
TIP: If you decide not to check, at least use Debug.Assert(dict.ContainsKey(key)). This will only be compiled in Debug mode; your release build will not contain it. That way you at least have the check when debugging.
Still: if possible, just check it :-)
EDIT: There have been some misconceptions here. By "always check" I did not mean only using an if somewhere; handling an exception properly was also included. So, to be more precise: never take anything for granted, expect the unexpected. Check with ContainsKey or handle the potential exception, but do SOMETHING in case the element is not present.
Personally, I'd check that the key is there regardless of whether or not you are SURE it is. Some may say this check is superfluous and that the dictionary will throw an exception which you can catch, but IMHO you should not rely on that exception. You should check yourself, and then either throw your own exception that means something, or return a result object with a success flag and a reason inside... the failure mechanism is really implementation-dependent.
Surely the answer is "it all depends on the situation". You need to balance the risk that the key will be missing from the dictionary (low for small systems where there is limited access to the data and you can rely on the order things are done; higher for larger systems with multiple programmers accessing the same data, especially with read/write/delete access, where threads are involved and order cannot be guaranteed, or where data originates externally and reading can fail) against the impact of that risk (safety-critical systems, commercial releases, or systems a business will rely on, compared with something made for fun, for a one-off job and/or for your use only), and against any requirements for speed, size and laziness.
If I were building a system to control railway signalling, I would want to be safe against all possible and impossible errors, and safe from errors in the error-handling, and so on (Murphy's 2nd law: "what can't go wrong will go wrong"). If I'm chucking stuff together for fun, even if size and speed are not an issue, I will be MUCH more relaxed about stuff like this; I will want to get to the fun stuff.
Of course, sometimes this is the fun stuff in itself.
TryGetValue is the same code as indexing by key, except that the former returns a default value (through the out parameter) where the latter throws an exception. Use TryGetValue and you'll get consistent checks with absolutely no performance loss.
Edit: As Jon said, if you know the dictionary will always have the key, you can index into it and let it throw the appropriate exception. However, if you can provide better context by throwing it yourself with a detailed message, that would be preferable.
There are two trains of thought on this from a performance point of view.
1) Avoid exceptions where possible, as exceptions are expensive; i.e. check whether a specific key exists before you try to retrieve it from the dictionary. A better approach, in my opinion, if there's a fair chance it may not exist, as this prevents fairly common exceptions.
2) If you're confident the item will exist 99% of the time, then don't check for its existence before accessing it. The 1% of the time it doesn't exist, an exception will be thrown, but you've saved time the other 99% of the time by not checking.
What I'm saying is: optimise for the majority if there is a clear one. If there is any real degree of uncertainty about an item existing, then check before retrieving.
If you know that the dictionary normally contains the key, you don't have to check for it before accessing it.
If something were wrong and the dictionary didn't contain the items it should, you could let the dictionary throw the exception. The only reason to check for the key first would be if you want to handle this problem situation yourself without getting the exception. Letting the dictionary throw the exception and catching it is, however, a perfectly valid way of handling the situation.
I think Marc and Jon have it (as usual) pretty well sewn up. Since you also mention performance in your question, it might be worth considering how you lock the dictionary.
A straightforward lock serialises all read access, which may not be desirable if reads are massively frequent and writes are relatively few. In that case, a ReaderWriterLockSlim might be better. The downside is that the code is a little more complex and writes are slightly slower.
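For illustration, a minimal sketch of the ReaderWriterLockSlim pattern (reusing the Foo type from Marc's example):

private static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
private static readonly Dictionary<string, Foo> data = new Dictionary<string, Foo>();

public static Foo Read(string key)
{
    rwLock.EnterReadLock();   // many readers may hold this concurrently
    try { return data[key]; }
    finally { rwLock.ExitReadLock(); }
}

public static void Write(string key, Foo value)
{
    rwLock.EnterWriteLock();  // exclusive: blocks both readers and writers
    try { data[key] = value; }
    finally { rwLock.ExitWriteLock(); }
}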
