I've seen many examples of the lock usage, and it's usually something like this:
private static readonly object obj = new object();
lock (obj)
{
// code here
}
Is it possible to lock based on a property of a class? I didn't want to lock globally for any calls to the method with the lock statement, I'd like to lock only if the object passed as argument had the same property value as another object which was being processed prior to that.
Is that possible? Does that make sense at all?
This is what I had in mind:
public class GmailController : Controller
{
private static readonly ConcurrentQueue<PushRequest> queue = new ConcurrentQueue<PushRequest>();
[HttpPost]
public IActionResult ProcessPushNotification(PushRequest push)
{
var existingPush = queue.FirstOrDefault(q => q.Matches(push));
if (existingPush == null)
{
queue.Enqueue(push);
existingPush = push;
}
try
{
// lock if there is an existing push in the
// queue that matches the requested one
lock (existingPush)
{
// process the push notification
}
}
finally
{
queue.TryDequeue(out existingPush);
}
}
}
Background: I have an API where I receive push notifications from Gmail's API when our users send/receive emails. However, if someone sends a message to two users at the same time, I get two push notifications. My first idea was querying the database before inserting (based on subject, sender, etc). In some rare cases, the query of the second call is made before the SaveChanges of the previous call, so I end up having duplicates.
I know that if I ever wanted to scale out, lock would become useless. I also know I could just create a job to check recent entries and eliminate duplicates, but I was trying something different. Any suggestions are welcome.
Let me first make sure I understand the proposal. The problem given is that we have some resource shared to multiple threads, call it database, and it admits two operations: Read(Context) and Write(Context). The proposal is to have lock granularity based on a property of the context. That is:
void MyRead(Context c)
{
lock(c.P) { database.Read(c); }
}
void MyWrite(Context c)
{
lock(c.P) { database.Write(c); }
}
So now if we have a call to MyRead where the context property has value X, and a call to MyWrite where the context property has value Y, and the two calls are racing on two different threads, they are not serialized. However, if we have, say, two calls to MyWrite and a call to MyRead, and in all of them the context property has value Z, those calls are serialized.
Is this possible? Yes. That doesn't make it a good idea. As implemented above, this is a bad idea and you shouldn't do it.
It is instructive to learn why it is a bad idea.
First, this simply fails if the property is a value type, like an integer. You might think, well, my context is an ID number, that's an integer, and I want to serialize all accesses to the database using ID number 123, and serialize all accesses using ID number 345, but not serialize those accesses with respect to each other. Locks only work on reference types, and boxing a value type always gives you a freshly allocated box, so the lock would never be contested even if the ids were the same. It would be completely broken.
Second, it fails badly if the property is a string. Locks are logically "compared" by reference, not by value. With boxed integers, you always get different references. With strings, you sometimes get different references! (Because of interning being applied inconsistently.) You could be in a situation where you are locking on "ABC" and sometimes another lock on "ABC" waits, and sometimes it does not!
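To make the two failure modes above concrete, here is a small sketch of the reference identity that lock actually relies on. The class and method names are made up for illustration; the only point is that boxing always produces a new object, while equal strings may or may not be the same object depending on interning.

```csharp
using System;

public static class LockIdentityDemo
{
    // Boxing the same int twice yields two distinct objects,
    // so a lock taken on one box is never contested by the other.
    public static bool BoxesShareReference(int id)
    {
        object first = id;
        object second = id;
        return ReferenceEquals(first, second);
    }

    // Two equal strings built at run time are distinct objects...
    public static bool RuntimeStringsShareReference(string text)
    {
        string copyA = new string(text.ToCharArray());
        string copyB = new string(text.ToCharArray());
        return ReferenceEquals(copyA, copyB);
    }

    // ...unless both are routed through the intern pool.
    public static bool InternedStringsShareReference(string text)
    {
        string copyA = string.Intern(new string(text.ToCharArray()));
        string copyB = string.Intern(new string(text.ToCharArray()));
        return ReferenceEquals(copyA, copyB);
    }

    public static void Main()
    {
        Console.WriteLine(BoxesShareReference(123));             // False
        Console.WriteLine(RuntimeStringsShareReference("ABC"));  // False
        Console.WriteLine(InternedStringsShareReference("ABC")); // True
    }
}
```

Whether two locks on "ABC" contend therefore depends on where each string came from, which is exactly the inconsistency described above.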
But the fundamental rule that is broken is: you must never lock on an object unless that object has been specifically designed to be a lock object, and the same code which controls access to the locked resource controls access to the lock object.
The problem here is not "local" to the lock but rather global. Suppose your property is a Frob where Frob is a reference type. You don't know if any other code in your process is also locking on that same Frob, and therefore you don't know what lock ordering constraints are necessary to prevent deadlocks. Whether a program deadlocks or not is a global property of a program. Just like you can build a hollow house out of solid bricks, you can build a deadlocking program out of a collection of locks that are individually correct. By ensuring that every lock is only taken out on a private object that you control, you ensure that no one else is ever locking on one of your objects, and therefore the analysis of whether your program contains a deadlock becomes simpler.
Note that I said "simpler" and not "simple". It reduces it to almost impossible to get correct, from literally impossible to get correct.
So if you were hell bent on doing this, what would be the right way to do it?
The right way would be to implement a new service: a lock object provider. LockProvider<T> needs to be able to hash and compare for equality two Ts. The service it provides is: you tell it that you want a lock object for a particular value of T, and it gives you back the canonical lock object for that T. When you're done, you say you're done. The provider keeps a reference count of how many times it has handed out a lock object and how many times it got it back, and deletes it from its dictionary when the count goes to zero, so that we don't have a memory leak.
Obviously the lock provider needs to be threadsafe and needs to be extremely low contention, because it is a mechanism designed to prevent contention, so it had better not cause any! If this is the road you intend to go down, you need to get an expert on C# threading to design and implement this object. It is very easy to get this wrong. As I have noted in comments to your post, you are attempting to use a concurrent queue as a sort of poor lock provider and it is a mass of race condition bugs.
This is some of the hardest code to get correct in all of .NET programming. I have been a .NET programmer for almost 20 years and implemented parts of the compiler and I do not consider myself competent to get this stuff right. Seek the help of an actual expert.
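For illustration only, here is a minimal sketch of the shape such a lock provider could take. Everything in it (the LockProvider<T> members, the WithLock helper, the Entry class) is hypothetical and not from the answer above. It also deliberately ignores the "extremely low contention" requirement: a single private lock guards the dictionary, which is simple to reason about but is itself a point of contention, so treat it as a teaching aid rather than something to ship.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch: one canonical lock object per key, reference-counted
// so that unused entries are removed from the dictionary.
public sealed class LockProvider<T>
{
    private sealed class Entry
    {
        public readonly object Gate = new object();
        public int RefCount;
    }

    private readonly object _sync = new object(); // guards _entries and RefCount only
    private readonly Dictionary<T, Entry> _entries = new Dictionary<T, Entry>();

    // Number of keys that currently have a live lock object.
    public int Count { get { lock (_sync) { return _entries.Count; } } }

    // Runs the action while holding the canonical lock for the key.
    public void WithLock(T key, Action action)
    {
        Entry entry = Acquire(key);
        try
        {
            lock (entry.Gate) { action(); }
        }
        finally
        {
            Release(key, entry);
        }
    }

    private Entry Acquire(T key)
    {
        lock (_sync)
        {
            Entry entry;
            if (!_entries.TryGetValue(key, out entry))
            {
                entry = new Entry();
                _entries.Add(key, entry);
            }
            entry.RefCount++; // counted before anyone waits on Gate
            return entry;
        }
    }

    private void Release(T key, Entry entry)
    {
        lock (_sync)
        {
            if (--entry.RefCount == 0) _entries.Remove(key);
        }
    }
}

public static class LockProviderDemo
{
    // Increments totals[key] under the per-key lock; keys are 0..3.
    public static int[] Run(LockProvider<int> provider, int iterations)
    {
        var totals = new int[4];
        Parallel.For(0, iterations, i =>
        {
            int key = i % 4;
            provider.WithLock(key, () => totals[key]++);
        });
        return totals;
    }

    public static void Main()
    {
        var provider = new LockProvider<int>();
        Console.WriteLine(string.Join(",", Run(provider, 4000))); // 1000,1000,1000,1000
        Console.WriteLine(provider.Count);                        // 0
    }
}
```

The provider's own lock is never held while waiting on a per-key lock, and the per-key lock objects never leave the class, which keeps the deadlock analysis local to this one type.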
Although I find Eric Lippert's answer fantastic and marked it as the correct one (and I won't change that), his thoughts made me think and I wanted to share an alternative solution I found to this problem (and I'd appreciate any feedback), even though I'm not going to use it, as I ended up using Azure Functions with my code (so this wouldn't make sense) and a cron job to detect and eliminate possible duplicates.
public class UserScopeLocker : IDisposable
{
private static readonly object _obj = new object();
private static ICollection<string> UserQueue = new HashSet<string>();
private readonly string _userId;
protected UserScopeLocker(string userId)
{
this._userId = userId;
}
public static UserScopeLocker Acquire(string userId)
{
while (true)
{
lock (_obj)
{
if (UserQueue.Contains(userId))
{
continue;
}
UserQueue.Add(userId);
return new UserScopeLocker(userId);
}
}
}
public void Dispose()
{
lock (_obj)
{
UserQueue.Remove(this._userId);
}
}
}
...then you would use it like this:
[HttpPost]
public IActionResult ProcessPushNotification(PushRequest push)
{
using(var scope = UserScopeLocker.Acquire(push.UserId))
{
// process the push notification
// two threads can't enter here for the same UserId
// the second one will be blocked until the first disposes
}
}
The idea is:
UserScopeLocker has a protected constructor, ensuring you call Acquire.
_obj is private static readonly, only the UserScopeLocker can lock this object.
_userId is a private readonly field, ensuring even its own class can't change its value.
lock is done when checking, adding and removing, so two threads can't compete on these actions.
Possible flaws I detected:
Since UserScopeLocker relies on IDisposable to release some UserId, I can't guarantee the caller will properly use using statement (or manually dispose the scope object).
I can't guarantee the scope won't be used in a recursive function (thus possibly causing a deadlock).
I can't guarantee the code inside the using statement won't call another function which also tries to acquire a scope to the user (this would also cause a deadlock).
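One more observation on the Acquire loop: while a user id is taken, a waiting thread spins, repeatedly taking and releasing the lock. A possible variation, sketched below under a made-up KeyScope name, parks the waiting thread with Monitor.Wait and wakes it with Monitor.PulseAll on dispose. It is still not reentrant, so the deadlock flaws listed above apply to it unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical variation of UserScopeLocker that waits instead of spinning.
public sealed class KeyScope : IDisposable
{
    private static readonly object Gate = new object();
    private static readonly HashSet<string> Busy = new HashSet<string>();
    private readonly string _key;
    private bool _disposed;

    private KeyScope(string key) { _key = key; }

    public static KeyScope Acquire(string key)
    {
        lock (Gate)
        {
            // HashSet.Add returns false while another scope holds the key.
            while (!Busy.Add(key))
            {
                Monitor.Wait(Gate); // releases Gate while this thread sleeps
            }
            return new KeyScope(key);
        }
    }

    public void Dispose()
    {
        lock (Gate)
        {
            if (_disposed) return; // tolerate a double Dispose
            _disposed = true;
            Busy.Remove(_key);
            Monitor.PulseAll(Gate); // wake waiters so they can re-check their key
        }
    }
}

public static class KeyScopeDemo
{
    public static int CountUnderScope(int iterations)
    {
        int total = 0;
        Parallel.For(0, iterations, i =>
        {
            using (KeyScope.Acquire("user-1"))
            {
                total++; // only one thread at a time is inside for "user-1"
            }
        });
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CountUnderScope(1000)); // 1000
    }
}
```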
I'm looking at some code that I don't understand the point of.
private object myProperty_lock = new Object();
private SomeType myProperty_backing;
public SomeType MyProperty
{
get { lock(myProperty_lock) { return myProperty_backing; } }
set { lock(myProperty_lock) { myProperty_backing = value; } }
}
This pattern is used many times within the same class.
Each time this pattern is used, there's a new lock object. (It's not a shared lock object for all properties.)
The types used are reference types and primitives. (No non-primitive structs.)
Does this code do anything? References & primitives are assigned atomically, so we don't need to protect against a thread switch in the middle of the assignment. The lock object isn't used anywhere else, so there's no protection there.
Is there something with memory barriers, perhaps? I had assumed that a lock inside a method didn't affect things outside of that method.
Being inside a method does not by itself imply a memory barrier, but acquiring and releasing a lock does. So you may be on the right track in suspecting that the locks are there to guarantee that each read sees a fresh value.
Of course, it could also have been added by a cargo cult programmer who did not understand why it was needed and only did it because they saw a code example that does it.
The problem I see here is that by using lock the developer indicates a concern regarding thread safety. They thought that concurrent threads might be accessing this property.
My first question would be whether that's actually the case - is there concurrent access to this property?
There might be a valid scenario, but is there a reason why any number of threads might be able to set that reference? What sort of logic is happening if one thread sets the property, presumably for some valid reason, only to have it immediately overwritten by another thread? How is the application doing something predictable? Did the reference set by the previous caller just not matter? Then why did it set the property?
And what about the object - SomeType - returned from the property? Now any number of threads can have a reference to the same instance. Can SomeType be altered, and if so, is it thread safe?
I normally wouldn't wonder, but when I see something that looks odd with multithreading I like to dig a little deeper. Maybe they have it all patched together and it works, but sometimes they don't.
I have been learning about locking on threads and I have not found an explanation for why creating a typical System.Object, locking it and carrying out whatever actions are required during the lock provides the thread safety?
Example
object obj = new object();
lock (obj) {
//code here
}
At first I thought that it was just being used as a placeholder in examples and was meant to be swapped out with the Type you are dealing with. But in examples such as the one Dennis Phillips points out, there doesn't appear to be anything more to it than an actual instance of Object.
So taking an example of needing to update a private dictionary, what does locking an instance of System.Object do to provide thread safety as opposed to actually locking the dictionary (I know locking the dictionary in this case could cause synchronization issues)?
What if the dictionary was public?
//what if this was public?
private Dictionary<string, string> someDict = new Dictionary<string, string>();
var obj = new Object();
lock (obj) {
//do something with the dictionary
}
The lock itself provides no safety whatsoever for the Dictionary<TKey, TValue> type. What a lock does is essentially this:
For every use of lock(objInstance), only one thread at a time will ever be in the body of the lock statement for a given object (objInstance).
If every use of a given Dictionary<TKey, TValue> instance occurs inside a lock, and every one of those locks uses the same object, then you know that only one thread at a time is ever accessing / modifying the dictionary. This is critical to preventing multiple threads from reading and writing to it at the same time and corrupting its internal state.
There is one giant problem with this approach though: You have to make sure every use of the dictionary occurs inside a lock and it uses the same object. If you forget even one then you've created a potential race condition, there will be no compiler warnings and likely the bug will remain undiscovered for some time.
In the second sample you showed, you're using a local object instance (var indicates a method local) as the lock parameter for an object field. This is almost certainly the wrong thing to do. The local lives only for the lifetime of the method call. Hence two calls to the method will lock on different locals, and so all calls will be able to enter the lock simultaneously.
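To make the difference concrete, here is a small sketch; the Counter class and its method names are invented for illustration. With the shared field as the lock object every increment is serialized. With a fresh local per call the lock is never contended, so increments can be lost, and the total may come up short on a multi-core machine.

```csharp
using System;
using System.Threading.Tasks;

public class Counter
{
    private readonly object _gate = new object(); // one lock object shared by every caller
    private int _value;

    public int Value { get { lock (_gate) { return _value; } } }

    public void IncrementWithSharedLock()
    {
        lock (_gate) { _value++; }
    }

    public void IncrementWithLocalLock()
    {
        var local = new object(); // a new object per call: no other thread ever locks it
        lock (local) { _value++; }
    }
}

public static class CounterDemo
{
    public static int RunShared(int iterations)
    {
        var counter = new Counter();
        Parallel.For(0, iterations, i => counter.IncrementWithSharedLock());
        return counter.Value;
    }

    public static int RunLocal(int iterations)
    {
        var counter = new Counter();
        Parallel.For(0, iterations, i => counter.IncrementWithLocalLock());
        return counter.Value;
    }

    public static void Main()
    {
        Console.WriteLine(RunShared(100000)); // 100000
        Console.WriteLine(RunLocal(100000));  // at most 100000; often less
    }
}
```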
It used to be common practice to lock on the shared data itself:
private Dictionary<string, string> someDict = new Dictionary<string, string>();
lock (someDict)
{
//do something with the dictionary
}
But the (somewhat theoretical) objection is that other code, outside of your control, could also lock on someDict and then you might have a deadlock.
So it is recommended to use a (very) private object, declared in 1-to-1 correspondence with the data, as a stand-in for the lock. As long as all code that accesses the dictionary locks on obj, thread safety is guaranteed.
// the following 2 lines belong together!!
private Dictionary<string, string> someDict = new Dictionary<string, string>();
private object obj = new Object();
// multiple code segments like this
lock (obj)
{
//do something with the dictionary
}
So the purpose of obj is to act as a proxy for the dictionary, and since its Type doesn't matter we use the simplest type, System.Object.
What if the dictionary was public?
Then all bets are off: any code could access the Dictionary, and code outside the containing class is not even able to lock on the guard object. And before you start looking for fixes: that simply is not a sustainable pattern. Use a ConcurrentDictionary or keep a normal one private.
The object which is used for locking does not stand in any relation to the objects that are modified during the lock. It could be anything, but it should be private and not a string, as public objects could be modified externally and strings could be used by two locks by mistake.
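As a rough sketch of the ConcurrentDictionary route (the WordCounts class is made up for the example), no guard object is needed because each call such as AddOrUpdate is thread safe on its own:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WordCounts
{
    // Counts occurrences of each word from many threads without an explicit lock.
    public static ConcurrentDictionary<string, int> Count(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
            counts.AddOrUpdate(word, 1, (key, current) => current + 1));
        return counts;
    }

    public static void Main()
    {
        var counts = Count(new[] { "a", "b", "a", "a" });
        Console.WriteLine(counts["a"]); // 3
        Console.WriteLine(counts["b"]); // 1
    }
}
```

Note that this only makes individual operations safe. A sequence of calls that must happen together (check, then update something else) still needs its own synchronization.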
So far as I understand it, the use of a generic object is simply to have something to lock (as an internally lockable object). To better explain this: say you have two methods within a class, both of which access the Dictionary, but which may be running on different threads. To prevent both methods from modifying the Dictionary at the same time (and potentially corrupting it), you can lock some object to control the flow. This is better illustrated by the following example:
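private readonly object mLock = new object();
// The shared resource that mLock protects
private readonly Dictionary<string, string> mSomeDictionary = new Dictionary<string, string>();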
public void FirstMethod()
{
while (/* Running some operations */)
{
// Get the lock
lock (mLock)
{
// Add to the dictionary (the indexer does not throw if the key is already present)
mSomeDictionary["Key"] = "Value";
}
}
}
public void SecondMethod()
{
while (/* Running some operation */)
{
// Get the lock
lock (mLock)
{
// Remove from dictionary
mSomeDictionary.Remove("Key");
}
}
}
The use of the lock(...) statement in both methods on the same object prevents the two methods from accessing the resource at the same time.
The important rules for the object you lock on are:
It must be an object visible only to the code that needs to lock on it. This avoids other code also locking on it.
This rules out strings that could be interned, and Type objects.
This rules out this in most cases; the exceptions are too few and gain too little to be worth exploiting, so just don't use this.
Note also that some code internal to the framework locks on Types and on this, so while "it's okay as long as nobody else does it" is true, it's already too late.
It must be static to protect static operations; it may be an instance field to protect instance operations (including those internal to an instance that is held in a static).
You don't want to lock on a value-type. If you really wanted to you could lock on a particular boxing of it, but I can't think of anything that this would gain beyond proving that it's technically possible - it's still going to lead to the code being less clear as to just what locks on what.
You don't want to lock on a field that you may change during the lock being held, as you'll no longer have the lock on what you appear to have the lock on (it's just about plausible that there's a practical use for the effect of this, but there's going to be an impedance between what the code appears to do at first read and what it really does, which is never good).
The same object must be used to lock on all operations that may conflict with each other.
While you can have correctness with overly-broad locks, you can get better performance with finer ones. E.g. if you had a lock that was protecting 6 operations, and realised that 2 of those operations couldn't interfere with the other 4, so you changed to having 2 lock objects, then you can gain by having better concurrency (or crash-and-burn if you were wrong in that analysis!)
The first point rules out locking on anything that is either visible or which could be made visible (e.g. a private instance that is returned by a protected or public member should be considered public as far as this analysis goes, anything captured by a delegate could end up elsewhere, and so on).
The last two points can mean that there's no obvious "type you are dealing with" as you put it, because locks don't protect objects, they protect operations done on objects, and you may either have more than one object affected, or the same object affected by more than one group of operations that must be locked.
Hence it can be good practice to have an object that exists purely to lock on. Since it's doing nothing else, it can't get mixed up with other semantics or written over when you don't expect. And since it does nothing else it may as well be the lightest reference type that exists in .NET; System.Object.
Personally, I do prefer to lock on an object related to an operation when it does clearly fit the bill of the "type you are dealing with", and none of the other concerns apply, as it seems to me to be quite self-documenting, but to others the risk of doing it wrong out-weighs that benefit.
In my app I have a List of objects. I'm going to have a process (thread) running every few minutes that will update the values in this list. I'll have other processes (other threads) that will just read this data, and they may attempt to do so at the same time.
When the list is being updated, I don't want any other process to be able to read the data. However, I don't want the read-only processes to block each other when no updating is occurring. Finally, if a process is reading the data, the process that updates the data must wait until the process reading the data is finished.
What sort of locking should I implement to achieve this?
This is what you are looking for.
ReaderWriterLockSlim is a class that handles the scenario you have asked about.
You have two pairs of methods at your disposal:
EnterWriteLock and ExitWriteLock
EnterReadLock and ExitReadLock
The first pair waits until all other locks, both read and write, have been released, so it gives you exclusive access like lock() would.
The second pair is compatible with itself: you can have multiple read locks held at any given time.
Because there's no syntactic sugar like the lock() statement, make sure you never forget to exit the lock, whether because of an exception or anything else. So use it in a form like this:
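// rwLock is a ReaderWriterLockSlim field ("lock" is a reserved word, so it cannot be the variable name)
rwLock.EnterWriteLock(); // or EnterReadLock
try
{
    // Your code here, which can possibly throw an exception.
}
finally
{
    rwLock.ExitWriteLock(); // or ExitReadLock
}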
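For a more complete picture, here is a small sketch of a list guarded by a ReaderWriterLockSlim. The GuardedList class is invented for the example; the Enter/Exit calls are the real API. Readers can run Sum concurrently with each other, while Add waits for exclusive access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class GuardedList
{
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private readonly List<int> _items = new List<int>();

    // Writer: exclusive access, blocks readers and other writers.
    public void Add(int value)
    {
        _rwLock.EnterWriteLock();
        try
        {
            _items.Add(value);
        }
        finally
        {
            _rwLock.ExitWriteLock();
        }
    }

    // Reader: may run concurrently with other readers.
    public int Sum()
    {
        _rwLock.EnterReadLock();
        try
        {
            int total = 0;
            foreach (int item in _items) total += item;
            return total;
        }
        finally
        {
            _rwLock.ExitReadLock();
        }
    }
}

public static class GuardedListDemo
{
    public static void Main()
    {
        var list = new GuardedList();
        Parallel.For(1, 101, i => list.Add(i)); // adds 1..100 from several threads
        Console.WriteLine(list.Sum());          // 5050
    }
}
```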
You don't make it clear whether the updates to the list will involve modification of existing objects, or adding/removing new ones - the answers in each case are different.
To handle modification of existing items in the list, each object should handle its own locking.
To allow modification of the list while others are iterating it, don't allow people direct access to the list. Force them to work with a read-only copy of the list, like this:
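public class Example
{
    private readonly object padLock = new object();
    private readonly List<X> MasterList = new List<X>();
    public IEnumerable<X> GetReadOnlySnapshot()
    {
        lock (padLock)
        {
            // Copy first: a ReadOnlyCollection<X> on its own is only a live wrapper.
            return new ReadOnlyCollection<X>(new List<X>(MasterList));
        }
    }
}
Copying the master list and wrapping the copy in a ReadOnlyCollection<X> ensures that readers can iterate through a list of fixed content, without blocking modifications made by writers. (Wrapping the master list directly would not be enough, because ReadOnlyCollection<X> is a live view over the list it wraps.) Writers must take the same padLock whenever they modify MasterList.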
You could use ReaderWriterLockSlim. It would satisfy your requirements precisely. However, it is likely to be slower than just using a plain old lock. The reason is that RWLS is ~2x slower than lock, and accessing a List is so fast that the extra concurrency would not be enough to overcome the additional overhead of the RWLS. Test both ways, but it is likely ReaderWriterLockSlim will be slower in your case. Reader-writer locks do better in scenarios where the number of readers significantly outnumbers the writers and when the guarded operations are long and drawn out.
However, let me present another option for you. One common pattern for dealing with this type of problem is to use two separate lists. One will serve as the official copy which can accept updates and the other will serve as the read-only copy. After you update the official copy you must clone it and swap out the reference for the read-only copy. This is elegant in that the readers require no blocking whatsoever. The reason why readers do not require any blocking type of synchronization is because we are treating the read-only copy as if it were immutable. Here is how it can be done.
public class Example
{
private readonly List<object> m_Official;
private volatile List<object> m_Readonly;
public Example()
{
m_Official = new List<object>();
// Start with a separate (empty) copy so readers never see the list that Update mutates.
m_Readonly = new List<object>();
}
public void Update()
{
lock (m_Official)
{
// Modify the official copy here.
m_Official.Add(...);
m_Official.Remove(...);
// Now clone the official copy.
var clone = new List<object>(m_Official);
// And finally swap out the read-only copy reference.
m_Readonly = clone;
}
}
public object Read(int index)
{
// It is safe to access the read-only copy here because it is immutable.
// m_Readonly must be marked as volatile for this to work correctly.
return m_Readonly[index];
}
}
The code above would not satisfy your requirements precisely because readers never block...ever. Which means they will still be taking place while writers are updating the official list. But, in a lot of scenarios this winds up being acceptable.
I have a class used to cache access to a database resource. It looks something like this:
//gets registered as a singleton
class DataCacher<T>
{
IDictionary<string, T> items = GetDataFromDb();
//Get is called all the time from zillions of threads
internal T Get(string key)
{
return items[key];
}
IDictionary<string, T> GetDataFromDb() { ...expensive slow SQL access... }
//this gets called every 5 minutes
internal void Reset()
{
items.Clear();
}
}
I've simplified this code somewhat, but the gist of it is that there is a potential concurrency issue, in that while the items are being cleared, if Get is called things may go awry.
Now I can just bung lock blocks into Get and Reset, but I'm worried that the locks on the Get will reduce performance of the site, as Get is called by every request thread in the web app many many times.
I can do something with double-checked locking, I think, but I suspect there is a cleaner way to do this using something smarter than the lock{} block. What to do?
edit: Sorry all, I didn't make this explicit earlier, but the items.Clear() implementation I am using is not in fact a straight dictionary. It's a wrapper around a ResourceProvider which requires that the dictionary implementation calls .ReleaseAllResources() on each of the items as they get removed. This means that calling code doesn't want to run against an old version that is in the midst of disposal. Given this, is the Interlocked.Exchange method the correct one?
I would start by testing it with just a lock; locks are very cheap when not contested. However - a simpler scheme is to rely on the atomic nature of reference updates:
public void Clear() {
var tmp = GetDataFromDb(); // or new Dictionary<...> for an empty one
items = tmp; // this is atomic; subsequent get/set will use this one
}
You might also want to make items a volatile field, just to be sure it isn't held in the registers anywhere.
This still has the problem that anyone expecting there to be a given key may get disappointed (via an exception), but that is a separate issue.
The more granular option might be a ReaderWriterLockSlim.
One option is to completely replace the IDictionary instance instead of Clearing it. You can do this in a thread-safe way using the Exchange method on the Interlocked class.
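A minimal sketch of that replace-instead-of-clear idea follows. The SwappableCache class and its members are hypothetical; Interlocked.Exchange and Volatile.Read are the real calls. It assumes a dictionary is never modified once it has been published.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class SwappableCache
{
    // Treated as immutable once published; only the reference ever changes.
    private Dictionary<string, string> _items = new Dictionary<string, string>();

    public bool TryGet(string key, out string value)
    {
        // Read the reference once so the whole lookup uses a single dictionary.
        Dictionary<string, string> snapshot = Volatile.Read(ref _items);
        return snapshot.TryGetValue(key, out value);
    }

    // Atomically publishes a fully built dictionary and returns the previous one.
    public Dictionary<string, string> Replace(Dictionary<string, string> fresh)
    {
        return Interlocked.Exchange(ref _items, fresh);
    }
}

public static class SwappableCacheDemo
{
    public static void Main()
    {
        var cache = new SwappableCache();
        var fresh = new Dictionary<string, string> { { "k", "v" } };
        Dictionary<string, string> old = cache.Replace(fresh);

        string value;
        Console.WriteLine(cache.TryGet("k", out value)); // True
        Console.WriteLine(old.Count);                    // 0
    }
}
```

The swap itself is safe, but a reader that fetched the old reference just before the swap may still be using it. If the old items must be released, as in the edit to the question, something still has to decide when no reader is left, and this sketch does not address that.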
See if the database will tell you what data has changed. You could use
Triggers to write changes to a history table
Query Notifications (SQL Server and Oracle have these; others probably do as well)
Etc
So you don’t have to reload all the data based on a timer.
Failing this:
I would make the Clear method create a new IDictionary by calling GetDataFromDB(), then once the data has been loaded set the “items” field to point to the new Dictionary. (The garbage collector will clean up the old dictionary once no threads are accessing it.)
I don’t think you care if some threads get “old” results while reloading the data (if you do, then you will just have to block all threads on a lock, which is painful!)
If you need all threads to swap over to the new dictionary at the same time, then you need to declare the “items” field to be volatile and use the Exchange method on the Interlocked class. However, it is unlikely you need this in real life.
I'm using C# & .NET 3.5. What is the difference between OptionA and OptionB?
class MyClass
{
private object m_Locker = new object();
private Dictionary<string, object> m_Hash = new Dictionary<string, object>();
public void OptionA()
{
lock(m_Locker){
// Do something with the dictionary
}
}
public void OptionB()
{
lock(m_Hash){
// Do something with the dictionary
}
}
}
I'm starting to dabble in threading (primarily for creating a cache for a multi-threaded app, NOT using the HttpCache class, since it's not attached to a web site), and I see the OptionA syntax in a lot of the examples online, but I don't understand what reason, if any, there is for doing that over OptionB.
Option B uses the object to be protected to create a critical section. In some cases, this more clearly communicates the intent. If used consistently, it guarantees only one critical section for the protected object will be active at a time:
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
lock (m_Hash)
{
// Across all threads, I can be in one and only one of these two blocks
// Do something with the dictionary
}
Option A is less restrictive. It uses a secondary object to create a critical section for the object to be protected. If multiple secondary objects are used, it's possible to have more than one critical section for the protected object active at a time.
private object m_LockerA = new object();
private object m_LockerB = new object();
lock (m_LockerA)
{
// It's possible this block is active in one thread
// while the block below is active in another
// Do something with the dictionary
}
lock (m_LockerB)
{
// It's possible this block is active in one thread
// while the block above is active in another
// Do something with the dictionary
}
Option A is equivalent to Option B if you use only one secondary object. As far as reading code, Option B's intent is clearer. If you're protecting more than one object, Option B isn't really an option.
It's important to understand that lock(m_Hash) does NOT prevent other code from using the hash. It only prevents other code from running that is also using m_Hash as its locking object.
One reason to use option A is because classes are likely to have private variables that you will use inside the lock statement. It is much easier to just use one object which you use to lock access to all of them instead of trying to use finer grain locks to lock access to just the members you will need. If you try to go with the finer grained method you will probably have to take multiple locks in some situations and then you need to make sure you are always taking them in the same order to avoid deadlocks.
Another reason to use option A is because it is possible that the reference to m_Hash will be accessible outside your class. Perhaps you have a public property which supplies access to it, or maybe you declare it as protected and derived classes can use it. In either case once external code has a reference to it, it is possible that the external code will use it for a lock. This also opens up the possibility of deadlocks since you have no way to control or know what order the lock will be taken in.
Actually, it is not a good idea to lock on an object if you are using its members.
Jeffrey Richter wrote in his book "CLR via C#" that there is no guarantee that the class of the object you are using for synchronization will not use lock(this) in its implementation (interestingly, this was a way of synchronizing recommended by Microsoft for some time, until they found that it was a mistake), so it is always a good idea to use a special separate object for synchronization. So, as you can see, OptionB will not give you a guarantee of deadlock safety.
So, OptionA is much safer than OptionB.
It's not what you're "locking"; it's the code contained between the lock { ... } braces that's important and that you're preventing from being executed.
If one thread takes out a lock() on any object, it prevents other threads from obtaining a lock on the same object, and hence prevents the second thread from executing the code between the braces.
So that's why most people just create a junk object to lock on, it prevents other threads from obtaining a lock on that same junk object.
I think the scope of the variable you "pass" in will determine the scope of the lock.
i.e. An instance variable will be in respect of the instance of the class whereas a static variable will be for the whole AppDomain.
Looking at the implementation of the collections (using Reflector), the pattern seems to follow that an instance variable called SyncRoot is declared and used for all locking operations in respect of the instance of the collection.
Well, it depends on what you wanted to lock(be made threadsafe).
Normally I would choose OptionB to provide thread-safe access to m_Hash ONLY. OptionA I would use when guarding a value type, which can't be used with lock, or when I had a group of objects that need to be locked together but I don't want to lock the whole instance by using lock(this).
Locking the object that you're using is simply a matter of convenience. An external lock object can make things simpler, and is also needed if the shared resource is private, like with a collection (in which case you use the ICollection.SyncRoot object).
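As a small illustration of the SyncRoot convention mentioned here (the SyncRootDemo class is made up), the property is reached through the non-generic ICollection interface, which List<T> implements explicitly. It only helps if every piece of code touching the list locks on that same object.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SyncRootDemo
{
    // Adds to the list while holding the collection's own SyncRoot object.
    public static void AddSafely(List<int> list, int value)
    {
        object root = ((ICollection)list).SyncRoot;
        lock (root)
        {
            list.Add(value);
        }
    }

    public static void Main()
    {
        var list = new List<int>();
        Parallel.For(0, 1000, i => AddSafely(list, i));
        Console.WriteLine(list.Count); // 1000
    }
}
```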
OptionA is the way to go here, as long as all of your code that accesses m_Hash locks on m_Locker.
Now imagine this case: you lock on the object, and one of the functions you call on that object contains a lock(this) code segment. Your code and the object's internals now share a lock without either knowing about the other, which is a recipe for a hard-to-diagnose deadlock.