I'm exploring the source code of a class that was mentioned (for educational purposes), but I'm stuck in one place:
Each instance of a lock has a unique _lockID assigned.
There is an internal ReaderWriterCount helper defined to store thread specific data per lock.
There is a pool of records defined for a thread in the thread-static field:
[ThreadStatic]
private static ReaderWriterCount? t_rwc;
The GetThreadRWCount method returns the first empty record from the pool, or appends a new one if needed. The record is updated with the _lockID of the current lock:
empty.lockID = _lockID;
return empty;
And one more utility method to check if the record is referencing the current lock:
private bool IsRwHashEntryChanged(ReaderWriterCount lrwc)
{
    return lrwc.lockID != _lockID;
}
And, finally, the logic I'm struggling to understand:
_spinLock.Enter(EnterSpinLockReason.EnterAnyRead);
lrwc = GetThreadRWCount(dontAllocate: false)!;
// ********* some other code ********** //
_spinLock.Exit();
spinCount++;
SpinWait(spinCount);
_spinLock.Enter(EnterSpinLockReason.EnterAnyRead);
// The per-thread structure may have been recycled as the lock is acquired (due to message pumping), load again.
if (IsRwHashEntryChanged(lrwc))
    lrwc = GetThreadRWCount(dontAllocate: false)!;
continue;
The question is why this lrwc re-check is needed. The fact that it's placed within the lock and re-validated after the lock is re-acquired following the pause signals that the record (or its owner) could have changed in between, but that would mean the same thread must request access to a different lock at the same time. And even if that's the case (I have no idea how it is possible), how can the spin lock, which is local to the current instance, help to solve the issue? There seems to be some flaw in my understanding of multithreading. I would be very grateful if you could help me figure it out.
I'm working on an old and large WPF application. The customer reported a bug, which they were able to reproduce, but I can't. There is a class in the application that looks like this:
public static class PermissionProvider
{
    private static Dictionary<string, bool> Permissions;
    public static void Init()
    {
        Permissions = new Dictionary<string, bool>();
    }
    private static object _lock = new object();
    public static bool HasPermission(string permission)
    {
        if (string.IsNullOrEmpty(permission)) return false;
        lock (_lock)
        {
            if (Permissions.ContainsKey(permission)) return Permissions[permission];
            var hasPermission = true; // Expensive call a third party module to check user permissions.
            Permissions.Add(permission, hasPermission);
            return hasPermission;
        }
    }
}
According to the log files provided by the customer, the line Permissions.Add(permission, hasPermission) threw an ArgumentException (key already exists). This doesn't make sense to me; the code checks for the key inside the same lock.
Based on a test run, all calls to HasPermission seem to be made from the main thread. The program uses Dispatcher.BeginInvoke at places, but my understanding is that locking is not even necessary for that. The dictionary is private and not accessed from anywhere else.
In what situation could this exception happen?
My first thought was that the customer was running an old version of the application, but it turns out that this class was only added in the latest one.
This particular exception should be easy enough to avoid by just changing the Permissions.Add(permission, hasPermission) to Permissions[permission] = hasPermission, but I would prefer to understand why it happened first.
It is possible, but hard to tell without the whole source code.
The expensive call
var hasPermission = true; // Expensive call a third party module to check user permissions.
could do something that calls HasPermission() again. Thus, the same thread would enter
lock (_lock) { ... }
again (which is allowed), possibly adding the permission, leaving the lock, leaving the method and returning into the outer HasPermission() call it came from, which then adds the same key again.
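A minimal sketch of that scenario, to make the re-entrancy concrete. ReentrancyDemo, ExpensiveCheck and the _reenter flag are hypothetical stand-ins (the real third-party call is not shown in the question); the point is only that the nested call on the same thread passes through the lock and adds the key first.

```csharp
using System;
using System.Collections.Generic;

public static class ReentrancyDemo
{
    private static readonly object _lock = new object();
    private static readonly Dictionary<string, bool> Permissions = new Dictionary<string, bool>();

    // Hypothetical: makes the "expensive call" call back into HasPermission once.
    private static bool _reenter = true;

    // Hypothetical stand-in for the expensive third-party check.
    private static bool ExpensiveCheck(string permission)
    {
        if (_reenter)
        {
            _reenter = false;
            // Nested call on the same thread: the lock is re-entrant, so this
            // does not block. It finds no key and adds it.
            HasPermission(permission);
        }
        return true;
    }

    public static bool HasPermission(string permission)
    {
        lock (_lock)
        {
            if (Permissions.ContainsKey(permission)) return Permissions[permission];
            var hasPermission = ExpensiveCheck(permission);
            // In the outer call the key already exists by now, so Add throws
            // ArgumentException even though the check above ran inside the lock.
            Permissions.Add(permission, hasPermission);
            return hasPermission;
        }
    }
}
```

Calling ReentrancyDemo.HasPermission("edit") once, from a single thread, is enough to trigger the exception in this sketch.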
This might require production debugging at your customer's site. If you're not familiar with that, and you can convince your customer to replace the affected DLL for a moment (have them create a backup copy first), you could try the following:
lock (_lock)
{
    var stack = Environment.StackTrace;
    if (stack.Split(new[] { nameof(HasPermission) }, StringSplitOptions.None).Length > 2)
        throw new Exception("I should not be in here twice");
    ...
}
This should crash the application (unless there is a general catch block somewhere) with a call stack that contains the affected method twice, so you can analyze where the second call comes from. Do whatever you would do in such a case: generate a crash dump, analyze your logs, ...
Generating a stack trace is considerably expensive, so this will change timing and thus potentially make the problem disappear. A disappeared problem is not a fixed problem, though.
I agree with Thomas Weller that the most likely reason is that the same thread re-enters the lock for some reason. But I wanted to suggest another approach to these kinds of problems.
Holding a lock while calling arbitrary code can be dangerous; it may lead to deadlocks and various other issues. To limit such risks, it is a good idea to hold locks only for short sections of code, and to call only code that you know is safe and that does not raise events or run arbitrary code in some other way.
One option would be to switch to a 'publication only' model for thread safety that releases the lock while calling the 'expensive method'. This might allow multiple threads to call the expensive method at the same time, and this might or might not be an issue in your particular case. Something like:
lock (_lock)
{
    if (Permissions.ContainsKey(permission)) return Permissions[permission];
}
var hasPermission = true; // Expensive call a third party module to check user permissions.
lock (_lock)
{
    if (Permissions.ContainsKey(permission)) return Permissions[permission];
    Permissions.Add(permission, hasPermission);
    return hasPermission;
}
Or use ConcurrentDictionary.GetOrAdd that does more or less the same thing.
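A rough sketch of the ConcurrentDictionary variant. PermissionCache and CheckPermission are made-up names, and CheckPermission is only a placeholder for the expensive third-party call. As with the two-lock version, the factory may run more than once for the same key under contention, but only one result is stored.

```csharp
using System;
using System.Collections.Concurrent;

public static class PermissionCache
{
    private static readonly ConcurrentDictionary<string, bool> Permissions =
        new ConcurrentDictionary<string, bool>();

    // Hypothetical placeholder for the expensive third-party check.
    private static bool CheckPermission(string permission)
    {
        return true;
    }

    public static bool HasPermission(string permission)
    {
        if (string.IsNullOrEmpty(permission)) return false;
        // The value factory runs outside the dictionary's internal locks, so a
        // re-entrant or concurrent call cannot cause a duplicate-key exception;
        // whichever value is stored first is returned to every caller.
        return Permissions.GetOrAdd(permission, CheckPermission);
    }
}
```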
I would also caution against mutable global state in general since this can also make code hard to read and predict.
As pointed out by JonasH in a comment, the Init method looks highly suspicious. Your program could crash if this method is not called exactly once. If you are not sure how many times it's called, at least protect the code it contains with the same lock.
public static void Init()
{
    lock (_lock)
    {
        Permissions = new Dictionary<string, bool>();
    }
}
I know that there are .NET collections which are thread safe which I could use, but I still want to understand the following situation:
I have a Buffer class (below) which is used to write data from a different thread. At each update loop interval (it's a game), the main thread handles the data by swapping the list instances (to prevent both threads from acting on the same instance at the same time).
So there is only a single additional thread that uses the "writerData" list; everything else is done on the main thread. But I'm not sure if the Swap method is thread "safe", because after searching for a while, everyone seems to have a different opinion about swapping reference fields.
Some say that swapping references doesn't require any locking, others say that Interlocked.Exchange must be used in this case, and others say it's not required. Some say that the field must be volatile, and others say the keyword is "evil".
I know that threading is a difficult topic and maybe the other questions were too broad, but can someone help me to understand if any/which kind of "locking" is required in my specific case in the Swap method?
public class Buffer
{
    List<byte> readerData;
    List<byte> writerData; // This is the only field which is used by the other thread (by calling the Add method), well and by the Swap method, which is called from the main thread
    // This method is only called by the other thread and it's the only method which is called from there
    public void Add(byte data)
    {
        writerData.Add(data);
    }
    // Called on the main thread, before handling the readerData
    public void Swap()
    {
        var tmp = readerData;
        readerData = writerData;
        writerData = tmp;
    }
    // ... some other methods (which are only called from the main thread) to get the data from the (current) readerData field after calling the Swap method
}
I've seen many examples of the lock usage, and it's usually something like this:
private static readonly object obj = new object();
lock (obj)
{
    // code here
}
Is it possible to lock based on a property of a class? I didn't want to lock globally for any calls to the method with the lock statement, I'd like to lock only if the object passed as argument had the same property value as another object which was being processed prior to that.
Is that possible? Does that make sense at all?
This is what I had in mind:
public class GmailController : Controller
{
    private static readonly ConcurrentQueue<PushRequest> queue = new ConcurrentQueue<PushRequest>();
    [HttpPost]
    public IActionResult ProcessPushNotification(PushRequest push)
    {
        var existingPush = queue.FirstOrDefault(q => q.Matches(push));
        if (existingPush == null)
        {
            queue.Enqueue(push);
            existingPush = push;
        }
        try
        {
            // lock if there is an existing push in the
            // queue that matches the requested one
            lock (existingPush)
            {
                // process the push notification
            }
        }
        finally
        {
            queue.TryDequeue(out existingPush);
        }
    }
}
Background: I have an API where I receive push notifications from Gmail's API when our users send/receive emails. However, if someone sends a message to two users at the same time, I get two push notifications. My first idea was querying the database before inserting (based on subject, sender, etc). In some rare cases, the query of the second call is made before the SaveChanges of the previous call, so I end up having duplicates.
I know that if I ever wanted to scale out, lock would become useless. I also know I could just create a job to check recent entries and eliminate duplicates, but I was trying something different. Any suggestions are welcome.
Let me first make sure I understand the proposal. The problem given is that we have some resource shared to multiple threads, call it database, and it admits two operations: Read(Context) and Write(Context). The proposal is to have lock granularity based on a property of the context. That is:
void MyRead(Context c)
{
    lock(c.P) { database.Read(c); }
}
void MyWrite(Context c)
{
    lock(c.P) { database.Write(c); }
}
So now if we have a call to MyRead where the context property has value X, and a call to MyWrite where the context property has value Y, and the two calls are racing on two different threads, they are not serialized. However, if we have, say, two calls to MyWrite and a call to MyRead, and in all of them the context property has value Z, those calls are serialized.
Is this possible? Yes. That doesn't make it a good idea. As implemented above, this is a bad idea and you shouldn't do it.
It is instructive to learn why it is a bad idea.
First, this simply fails if the property is a value type, like an integer. You might think, well, my context is an ID number, that's an integer, and I want to serialize all accesses to the database using ID number 123, and serialize all accesses using ID number 345, but not serialize those accesses with respect to each other. Locks only work on reference types, and boxing a value type always gives you a freshly allocated box, so the lock would never be contested even if the ids were the same. It would be completely broken.
Second, it fails badly if the property is a string. Locks are logically "compared" by reference, not by value. With boxed integers, you always get different references. With strings, you sometimes get different references! (Because of interning being applied inconsistently.) You could be in a situation where you are locking on "ABC" and sometimes another lock on "ABC" waits, and sometimes it does not!
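The two failure modes above come down to reference identity, which a small sketch can show. LockIdentityDemo and its method names are invented for illustration; the behaviour relies only on boxing, the string(char[]) constructor and string.Intern.

```csharp
using System;

public static class LockIdentityDemo
{
    // Boxing the same int twice yields two distinct objects, so a lock taken
    // on one box is never contested by a lock taken on the other.
    public static bool BoxesShareReference(int id)
    {
        object first = id;
        object second = id;
        return object.ReferenceEquals(first, second);
    }

    // Two strings with equal contents need not be the same object.
    public static bool EqualStringsShareReference(string text)
    {
        string built = new string(text.ToCharArray()); // equal value, separately allocated
        return object.ReferenceEquals(text, built);
    }

    // After interning, equal contents map to one canonical instance, which is
    // why string locks "sometimes" collide and sometimes do not.
    public static bool InternedStringsShareReference(string text)
    {
        string built = new string(text.ToCharArray());
        return object.ReferenceEquals(string.Intern(text), string.Intern(built));
    }
}
```

With a non-empty input such as "ABC", the first two methods return false and the third returns true.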
But the fundamental rule that is broken is: you must never lock on an object unless that object has been specifically designed to be a lock object, and the same code which controls access to the locked resource controls access to the lock object.
The problem here is not "local" to the lock but rather global. Suppose your property is a Frob where Frob is a reference type. You don't know if any other code in your process is also locking on that same Frob, and therefore you don't know what lock ordering constraints are necessary to prevent deadlocks. Whether a program deadlocks or not is a global property of a program. Just like you can build a hollow house out of solid bricks, you can build a deadlocking program out of a collection of locks that are individually correct. By ensuring that every lock is only taken out on a private object that you control, you ensure that no one else is ever locking on one of your objects, and therefore the analysis of whether your program contains a deadlock becomes simpler.
Note that I said "simpler" and not "simple". It reduces it to almost impossible to get correct, from literally impossible to get correct.
So if you were hell bent on doing this, what would be the right way to do it?
The right way would be to implement a new service: a lock object provider. LockProvider<T> needs to be able to hash and compare for equality two Ts. The service it provides is: you tell it that you want a lock object for a particular value of T, and it gives you back the canonical lock object for that T. When you're done, you say you're done. The provider keeps a reference count of how many times it has handed out a lock object and how many times it got it back, and deletes it from its dictionary when the count goes to zero, so that we don't have a memory leak.
Obviously the lock provider needs to be threadsafe and needs to be extremely low contention, because it is a mechanism designed to prevent contention, so it had better not cause any! If this is the road you intend to go down, you need to get an expert on C# threading to design and implement this object. It is very easy to get this wrong. As I have noted in comments to your post, you are attempting to use a concurrent queue as a sort of poor lock provider and it is a mass of race condition bugs.
This is some of the hardest code to get correct in all of .NET programming. I have been a .NET programmer for almost 20 years and implemented parts of the compiler and I do not consider myself competent to get this stuff right. Seek the help of an actual expert.
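Purely to make the reference-counting idea concrete, here is a naive sketch of what such a provider could look like. It is not the low-contention design described above: every Acquire and release goes through one global gate, it has not been reviewed by a threading expert, and all names other than LockProvider<T> (Entry, Releaser, ActiveKeys, Acquire) are invented for the example. Treat it as a reading aid, not something to ship.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative only: hands out one canonical lock object per key and
// reference-counts it so unused entries are removed from the dictionary.
public sealed class LockProvider<T>
{
    private sealed class Entry { public int RefCount; }

    private readonly object _gate = new object(); // guards _entries and RefCount
    private readonly Dictionary<T, Entry> _entries = new Dictionary<T, Entry>();

    public int ActiveKeys { get { lock (_gate) { return _entries.Count; } } }

    // Blocks until the lock for this key is held. Dispose the result on the
    // same thread to release it (Monitor locks have thread affinity).
    public IDisposable Acquire(T key)
    {
        Entry entry;
        lock (_gate)
        {
            if (!_entries.TryGetValue(key, out entry))
            {
                entry = new Entry();
                _entries.Add(key, entry);
            }
            entry.RefCount++; // counted before waiting, so the entry cannot be removed under us
        }
        Monitor.Enter(entry); // wait outside _gate so other keys are not blocked
        return new Releaser(this, key, entry);
    }

    private void Release(T key, Entry entry)
    {
        Monitor.Exit(entry);
        lock (_gate)
        {
            if (--entry.RefCount == 0) _entries.Remove(key);
        }
    }

    private sealed class Releaser : IDisposable
    {
        private readonly LockProvider<T> _owner;
        private readonly T _key;
        private readonly Entry _entry;
        private bool _disposed;

        public Releaser(LockProvider<T> owner, T key, Entry entry)
        {
            _owner = owner; _key = key; _entry = entry;
        }

        public void Dispose()
        {
            if (_disposed) return; // guard against double release
            _disposed = true;
            _owner.Release(_key, _entry);
        }
    }
}
```

Usage would be along the lines of using (provider.Acquire(userId)) { ... }, with a single shared provider instance. Because the key is compared by value through the dictionary, it avoids the boxing and interning problems described earlier.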
Although I find Eric Lippert's answer fantastic and marked it as the correct one (and I won't change that), his thoughts made me think, and I wanted to share an alternative solution I found to this problem (I'd appreciate any feedback), even though I'm not going to use it: I ended up using Azure Functions with my code (so this wouldn't make sense), plus a cron job to detect and eliminate possible duplicates.
public class UserScopeLocker : IDisposable
{
    private static readonly object _obj = new object();
    private static ICollection<string> UserQueue = new HashSet<string>();
    private readonly string _userId;
    protected UserScopeLocker(string userId)
    {
        this._userId = userId;
    }
    public static UserScopeLocker Acquire(string userId)
    {
        while (true)
        {
            lock (_obj)
            {
                if (UserQueue.Contains(userId))
                {
                    continue;
                }
                UserQueue.Add(userId);
                return new UserScopeLocker(userId);
            }
        }
    }
    public void Dispose()
    {
        lock (_obj)
        {
            UserQueue.Remove(this._userId);
        }
    }
}
...then you would use it like this:
[HttpPost]
public IActionResult ProcessPushNotification(PushRequest push)
{
    using(var scope = UserScopeLocker.Acquire(push.UserId))
    {
        // process the push notification
        // two threads can't enter here for the same UserId
        // the second one will be blocked until the first disposes
    }
}
The idea is:
UserScopeLocker has a protected constructor, ensuring you call Acquire.
_obj is private static readonly, only the UserScopeLocker can lock this object.
_userId is a private readonly field, ensuring even its own class can't change its value.
lock is done when checking, adding and removing, so two threads can't compete on these actions.
Possible flaws I detected:
Since UserScopeLocker relies on IDisposable to release some UserId, I can't guarantee the caller will properly use using statement (or manually dispose the scope object).
I can't guarantee the scope won't be used in a recursive function (thus possibly causing a deadlock).
I can't guarantee the code inside the using statement won't call another function which also tries to acquire a scope to the user (this would also cause a deadlock).
I saw some code where the data access layer looks like this:
public class CustomerDA{
    private static readonly object _sync = new object();
    private static readonly CustomerDA _mutex = new CustomerDA();
    private CustomerDA(){
    }
    public CustomerDA GetInstance(){
        lock(_sync){
            return _mutex;
        }
    }
    public DataSet GetCustomers(){
        //database SELECT
        //return a DataSet
    }
    public int UpdateCustomer(some parameters){
        //update some user
    }
}
public class CustomerBO{
    public DataSet GetCustomers(){
        //some business logic
        return CustomerDA.GetInstance().GetCustomers();
    }
}
I was using it, but started thinking... "and what if I had to build a Facebook-like application where there are hundreds of thousands of concurrent users? Would I be blocking each user from doing his things until the previous user finishes his database stuff? And for the Update method, is it useful to LOCK THREADS in the app when database engines already manage concurrency at the database server level?"
Then I started to think about moving the lock to the GetCustomers and UpdateCustomer methods, but then I thought again: "is it useful at all?"
Edit on January 03:
You're all right, I missed the "static" keyword in the "GetInstance" method.
Another thing: I was under the impression that no thread could access the _mutex variable if there was another thread working in the same data access class. I mean, I thought that since the _mutex variable is being returned from inside the lock statement, no thread could access the _mutex until the ";" was reached in the following statement:
return CustomerDA.GetInstance().GetCustomers();
After doing some tracing, I realize I was making the wrong assumption. Could you please confirm that I was making the wrong assumption?
So... Can I say for sure that my Data Access layer does not need any lock statement (even on INSERT, UPDATE, DELETE) and that it does not matter if methods in my DataAccess are static or instance methods?
Thanks again... your comments are so useful to me
The lock in that code is completely pointless. It locks around code that returns a value that never changes, so there is no reason to have a lock there. The purpose of the lock in the code is to make the object a singleton, but as it's not using lazy initialisation, the lock is not needed at all.
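To illustrate why the lock adds nothing, here is a sketch of the same eager singleton without it, assuming the "static" that the question's edit says was missing. It only shows that the lock is redundant; whether a singleton is appropriate for a data access layer is a separate question.

```csharp
public sealed class CustomerDA
{
    // A static readonly field initializer runs once, and the runtime
    // guarantees that type initialization is thread-safe. After that the
    // reference never changes, so reading it needs no lock.
    private static readonly CustomerDA _instance = new CustomerDA();

    private CustomerDA() { }

    public static CustomerDA GetInstance()
    {
        return _instance;
    }
}
```

A lock in GetInstance would only matter if the instance were created lazily inside the method, which is not the case here.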
Making the data access layer a singleton is a really bad idea: it means that only one thread at a time can access the database. It also means that the methods in the class have to use locks to make sure that only one thread at a time accesses the database, or the code won't work properly.
Instead, each thread should get its own instance of the data access layer, with its own connection to the database. That way the database takes care of the concurrency issues, and the threads don't have to do any locking at all.
Set your lock where it is needed, that is, where concurrent accesses happen. Put only as much code inside the lock/critical section as really needs to be there.
Shouldn't GetInstance be static?
The following pseudocode explains how GetInstance operates:
LOCK
rval = _mutex
UNLOCK
Return rval
_mutex is readonly and refers to a non-null object, so it can't be changed. Why lock?
If your database provides concurrency management, but your program creates two threads that write the same data at the same time in your own domain while waiting for the data, how could your database help?
I have a class used to cache access to a database resource. It looks something like this:
//gets registered as a singleton
class DataCacher<T>
{
    IDictionary<string, T> items = GetDataFromDb();
    //Get is called all the time from zillions of threads
    internal T Get(string key)
    {
        return items[key];
    }
    IDictionary<string, T> GetDataFromDb() { ...expensive slow SQL access... }
    //this gets called every 5 minutes
    internal void Reset()
    {
        items.Clear();
    }
}
I've simplified this code somewhat, but the gist of it is that there is a potential concurrency issue, in that while the items are being cleared, if Get is called things may go awry.
Now I can just bung lock blocks into Get and Reset, but I'm worried that the locks on the Get will reduce performance of the site, as Get is called by every request thread in the web app many many times.
I can do something with double-checked locking, I think, but I suspect there is a cleaner way to do this using something smarter than the lock{} block. What to do?
edit: Sorry all, I didn't make this explicit earlier, but the items.Clear() implementation I am using is not in fact a straight dictionary. It's a wrapper around a ResourceProvider which requires that the dictionary implementation calls .ReleaseAllResources() on each of the items as they get removed. This means that calling code doesn't want to run against an old version that is in the midst of disposal. Given this, is the Interlocked.Exchange method the correct one?
I would start by testing it with just a lock; locks are very cheap when not contested. However - a simpler scheme is to rely on the atomic nature of reference updates:
public void Clear() {
    var tmp = GetDataFromDb(); // or new Dictionary<...> for an empty one
    items = tmp; // this is atomic; subsequent get/set will use this one
}
You might also want to make items a volatile field, just to be sure it isn't held in the registers anywhere.
This still has the problem that anyone expecting there to be a given key may get disappointed (via an exception), but that is a separate issue.
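Putting those pieces together, a minimal sketch might look like the following. SnapshotCache, TryGet and the Reset(fresh) signature are invented for the example; the caller is assumed to build the new dictionary completely before handing it over and never to mutate it afterwards.

```csharp
using System.Collections.Generic;

public class SnapshotCache<T>
{
    // volatile: each read of the field fetches the currently published
    // reference rather than a value cached earlier.
    private volatile Dictionary<string, T> _items;

    public SnapshotCache(Dictionary<string, T> initial)
    {
        _items = initial;
    }

    public bool TryGet(string key, out T value)
    {
        // Read the field once; this call works against a single snapshot even
        // if Reset publishes a new dictionary concurrently.
        var snapshot = _items;
        return snapshot.TryGetValue(key, out value);
    }

    // Publishes a fully built dictionary. It must not be modified afterwards.
    public void Reset(Dictionary<string, T> fresh)
    {
        _items = fresh; // reference assignment is atomic
    }
}
```

TryGet is used instead of an indexer so that a missing key after a swap shows up as a false return value rather than an exception.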
The more granular option might be a ReaderWriterLockSlim.
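A sketch of that option, assuming the dictionary is cleared and refilled in place. LockedCache and its members are hypothetical names; only the ReaderWriterLockSlim calls are real API. Readers share the read lock, while Reset takes the write lock exclusively, so no reader ever sees a half-cleared dictionary.

```csharp
using System.Collections.Generic;
using System.Threading;

public class LockedCache<T>
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private readonly Dictionary<string, T> _items = new Dictionary<string, T>();

    public bool TryGet(string key, out T value)
    {
        _rw.EnterReadLock(); // many readers may hold the read lock at once
        try { return _items.TryGetValue(key, out value); }
        finally { _rw.ExitReadLock(); }
    }

    public void Reset(IEnumerable<KeyValuePair<string, T>> fresh)
    {
        _rw.EnterWriteLock(); // exclusive: waits until current readers have left
        try
        {
            _items.Clear();
            foreach (var pair in fresh) _items[pair.Key] = pair.Value;
        }
        finally { _rw.ExitWriteLock(); }
    }
}
```

Loading the fresh data before calling Reset keeps the write lock held only for the in-memory copy, not for the slow database access.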
One option is to completely replace the IDictionary instance instead of Clearing it. You can do this in a thread-safe way using the Exchange method on the Interlocked class.
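As a sketch of the Interlocked.Exchange approach (ExchangeCache, TryGet and Replace are invented names): Exchange returns the dictionary that was replaced, which gives the caller a handle for any cleanup. Note the caveat in the comment: a reader that took its snapshot just before the swap may still be using the old instance.

```csharp
using System.Collections.Generic;
using System.Threading;

public class ExchangeCache<T>
{
    private Dictionary<string, T> _items = new Dictionary<string, T>();

    public bool TryGet(string key, out T value)
    {
        // Take one snapshot of the current dictionary for this call.
        var snapshot = Volatile.Read(ref _items);
        return snapshot.TryGetValue(key, out value);
    }

    // Atomically publishes the new dictionary and returns the one it replaced.
    // Readers that already hold a snapshot may still be reading the old
    // instance, so releasing its resources immediately is not safe without
    // some extra coordination.
    public Dictionary<string, T> Replace(Dictionary<string, T> fresh)
    {
        return Interlocked.Exchange(ref _items, fresh);
    }
}
```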
See if the database will tell you what data has changed. You could use:
Triggers to write changes to a history table
Query Notifications (SQL Server and Oracle have these; others probably do as well)
Etc.
So you don’t have to reload all the data based on a timer.
Failing this:
I would make the Clear method create a new IDictionary by calling GetDataFromDb(), then once the data has been loaded set the “items” field to point to the new Dictionary. (The garbage collector will clean up the old dictionary once no threads are accessing it.)
I don’t think you care if some threads get “old” results while reloading the data. (If you do, then you will just have to block all threads on a lock, which is painful!)
If you need all threads to swap over to the new dictionary at the same time, then you need to declare the “items” field to be volatile and use the Exchange method on the Interlocked class. However, it is unlikely you need this in real life.