I would like to use the wisdom of the net to clarify a few points about multi-threading in .NET. There is a lot of material on the internet about it, but I was not able to find a good answer to my question.
Let's say we want to maintain some state in our class in a way that is safe for concurrent threads. The easy case is when the state is an int:
class Class1
{
volatile int state = 0;
public int State
{
get
{
return state;
}
}
public Action<int> StateUpdated;
public void UpdateState(int newState)
{
state = newState;
if (StateUpdated != null)
StateUpdated(newState);
}
}
'volatile' should be enough in this case. Any thread that needs the current state can use the 'State' property, which will never be cached. Any thread that wants to update the state can do so safely using 'UpdateState'.
However, what to do if state is a structure? Is a complete 'lock' the only way? Side question: can a variable still be cached inside the lock?
struct StateData
{
//some fields
}
class Class1
{
StateData state;
public StateData State
{
get
{
return state;
}
}
public Action<StateData> StateUpdated;
public void UpdateState(StateData newState)
{
state = newState;
if (StateUpdated != null)
StateUpdated(newState);
}
}
And eventually the main question: will this code be sufficient for managing a collection of state objects in multi-threading environment? Or there might be some hidden problems.
public struct StateData
{
//some fields
}
public delegate void StateChangedHandler(StateData oldState, StateData newState);
class Class1
{
ConcurrentDictionary<string, StateData> stateCollection = new ConcurrentDictionary<string, StateData>();
public StateData? GetState(string key)
{
StateData o;
if (stateCollection.TryGetValue(key, out o))
return o;
else
return null;
}
public StateChangedHandler StateUpdated;
void UpdateState(string key, StateData o)
{
StateData? prev = null;
stateCollection.AddOrUpdate(key, o,
(id, old) =>
{
prev = old;
return o;
}
);
if (prev != null && StateUpdated != null)
StateUpdated(prev.Value, o);
}
}
Thanks for your answers.
However, what to do if state is a structure? Is a complete 'lock' the only way?
The volatile keyword is applicable only to types that can be updated atomically, such as reference type variables, pointers, and primitive types.
However, note that volatile provides some guarantees besides just access to the marked variable. In particular, all writes to memory that occur before a write to a volatile-marked memory location will be seen by any code that reads from memory after reading from that same volatile-marked memory location.
In theory, this means you can use a volatile field to provide volatile-like behavior for other memory locations.
In practice though, there is still a problem: new writes to the same memory locations may or may not be visible, and of course not all writes can be completed atomically. So this use of volatile is really only good for other memory locations that could be marked volatile, and even then doesn't ensure you won't get newer values than the volatile-marked memory location would otherwise indicate.
Which is a long way of saying, you should just use lock and be done with it. :)
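A minimal sketch of the lock-based approach recommended above, under stated assumptions: the original only says "some fields", so the two fields of StateData here are hypothetical, and StateHolder is an illustrative name. Every read and write of the struct goes through the same private lock object, so a reader can never observe a half-written value. Raising the handler outside the lock is a choice made in this sketch, not something the answer prescribes.

```csharp
using System;

// Hypothetical fields; the original only says "some fields".
public struct StateData
{
    public long Version;
    public double Value;
}

public class StateHolder
{
    // Private, dedicated lock object that no other code can reach.
    private readonly object _sync = new object();
    private StateData _state;

    public event Action<StateData> StateUpdated;

    public StateData State
    {
        get
        {
            lock (_sync)
            {
                return _state; // the whole struct is copied while the lock is held
            }
        }
    }

    public void UpdateState(StateData newState)
    {
        lock (_sync)
        {
            _state = newState;
        }

        // Raised outside the lock so a slow or re-entrant handler cannot block readers.
        StateUpdated?.Invoke(newState);
    }
}
```

Because the handler runs outside the lock, two concurrent updates may notify in a different order than they were stored; if ordering matters, that part needs its own design.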
Side question: can a variable still be cached inside the lock?
I'm not entirely sure what you mean by this question. But in general: the compilers are permitted to make optimizations as long as those optimizations don't affect the observed behavior of the code in a single thread. If the type of caching you are thinking of would not violate this rule, it might be allowed. Otherwise it wouldn't be.
will this code be sufficient for managing a collection of state objects in multi-threading environment?
The code as posted seems fine, at least as far as ensuring that the dictionary is always providing coherent state values.
volatile ensures that reads and writes of a variable are not cached or reordered, so a thread reading it sees a recent value written by another thread. It can be applied only to types that can be read and written atomically: primitive types such as int and short, enums with such an underlying type, and references.
A volatile reference-type field only ensures that you get the latest reference, not the latest contents of the object it points to.
You should only use volatile for immutable data types.
Some thread-safe ways of working with collections are: using lock, Mutex, and the concurrent collections introduced in .NET 4 (System.Collections.Concurrent), which unfortunately proved to be slower than the generic bag-and-lock combo.
If performance and freedom is what you want go with lock
If you are managing a very extensive application go with Mutex
Depending on how long (&how many) operations take place in a collection you could go with ConcurrentCollections
Here's a nice comparison between locks and thread-safe collections.
I've seen many examples of the lock usage, and it's usually something like this:
private static readonly object obj = new object();
lock (obj)
{
// code here
}
Is it possible to lock based on a property of a class? I didn't want to lock globally for any calls to the method with the lock statement, I'd like to lock only if the object passed as argument had the same property value as another object which was being processed prior to that.
Is that possible? Does that make sense at all?
This is what I had in mind:
public class GmailController : Controller
{
private static readonly ConcurrentQueue<PushRequest> queue = new ConcurrentQueue<PushRequest>();
[HttpPost]
public IActionResult ProcessPushNotification(PushRequest push)
{
var existingPush = queue.FirstOrDefault(q => q.Matches(push));
if (existingPush == null)
{
queue.Enqueue(push);
existingPush = push;
}
try
{
// lock if there is an existing push in the
// queue that matches the requested one
lock (existingPush)
{
// process the push notification
}
}
finally
{
queue.TryDequeue(out existingPush);
}
}
}
Background: I have an API where I receive push notifications from Gmail's API when our users send/receive emails. However, if someone sends a message to two users at the same time, I get two push notifications. My first idea was querying the database before inserting (based on subject, sender, etc). In some rare cases, the query of the second call is made before the SaveChanges of the previous call, so I end up having duplicates.
I know that if I ever wanted to scale out, lock would become useless. I also know I could just create a job to check recent entries and eliminate duplicates, but I was trying something different. Any suggestions are welcome.
Let me first make sure I understand the proposal. The problem given is that we have some resource shared to multiple threads, call it database, and it admits two operations: Read(Context) and Write(Context). The proposal is to have lock granularity based on a property of the context. That is:
void MyRead(Context c)
{
lock(c.P) { database.Read(c); }
}
void MyWrite(Context c)
{
lock(c.P) { database.Write(c); }
}
So now if we have a call to MyRead where the context property has value X, and a call to MyWrite where the context property has value Y, and the two calls are racing on two different threads, they are not serialized. However, if we have, say, two calls to MyWrite and a call to MyRead, and in all of them the context property has value Z, those calls are serialized.
Is this possible? Yes. That doesn't make it a good idea. As implemented above, this is a bad idea and you shouldn't do it.
It is instructive to learn why it is a bad idea.
First, this simply fails if the property is a value type, like an integer. You might think, well, my context is an ID number, that's an integer, and I want to serialize all accesses to the database using ID number 123, and serialize all accesses using ID number 345, but not serialize those accesses with respect to each other. Locks only work on reference types, and boxing a value type always gives you a freshly allocated box, so the lock would never be contested even if the ids were the same. It would be completely broken.
Second, it fails badly if the property is a string. Locks are logically "compared" by reference, not by value. With boxed integers, you always get different references. With strings, you sometimes get different references! (Because of interning being applied inconsistently.) You could be in a situation where you are locking on "ABC" and sometimes another lock on "ABC" waits, and sometimes it does not!
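The two failure modes above can be observed directly with ReferenceEquals, which compares identity the same way a lock does. This is only an illustrative sketch; LockIdentityDemo and its method names are made up for the example.

```csharp
using System;

public static class LockIdentityDemo
{
    // Each boxing conversion allocates a new object, so the two boxes never share identity.
    public static bool BoxesShareIdentity(int id)
    {
        object first = id;
        object second = id;
        return ReferenceEquals(first, second); // false
    }

    // A string built at run time is equal by value to the literal but is a different object.
    public static bool BuiltStringSharesIdentityWithLiteral()
    {
        string literal = "ABC";
        string built = new string(new[] { 'A', 'B', 'C' });
        return literal == built && ReferenceEquals(literal, built); // false
    }
}
```

A lock taken on either pair of objects would therefore never be contested, even though the values compare as equal.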
But the fundamental rule that is broken is: you must never lock on an object unless that object has been specifically designed to be a lock object, and the same code which controls access to the locked resource controls access to the lock object.
The problem here is not "local" to the lock but rather global. Suppose your property is a Frob where Frob is a reference type. You don't know if any other code in your process is also locking on that same Frob, and therefore you don't know what lock ordering constraints are necessary to prevent deadlocks. Whether a program deadlocks or not is a global property of a program. Just like you can build a hollow house out of solid bricks, you can build a deadlocking program out of a collection of locks that are individually correct. By ensuring that every lock is only taken out on a private object that you control, you ensure that no one else is ever locking on one of your objects, and therefore the analysis of whether your program contains a deadlock becomes simpler.
Note that I said "simpler" and not "simple". It reduces it to almost impossible to get correct, from literally impossible to get correct.
So if you were hell bent on doing this, what would be the right way to do it?
The right way would be to implement a new service: a lock object provider. LockProvider<T> needs to be able to hash and compare for equality two Ts. The service it provides is: you tell it that you want a lock object for a particular value of T, and it gives you back the canonical lock object for that T. When you're done, you say you're done. The provider keeps a reference count of how many times it has handed out a lock object and how many times it got it back, and deletes it from its dictionary when the count goes to zero, so that we don't have a memory leak.
Obviously the lock provider needs to be threadsafe and needs to be extremely low contention, because it is a mechanism designed to prevent contention, so it had better not cause any! If this is the road you intend to go down, you need to get an expert on C# threading to design and implement this object. It is very easy to get this wrong. As I have noted in comments to your post, you are attempting to use a concurrent queue as a sort of poor lock provider and it is a mass of race condition bugs.
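To make the bookkeeping described above concrete, here is a deliberately simple sketch of a reference-counted lock provider. It is not the low-contention, expert-grade implementation the answer calls for: it guards its table with one ordinary lock, and every name in it (Acquire, Count, Entry, Releaser) is an assumption made for illustration. It is built on Monitor, so a lock must be released on the thread that acquired it, and it must not be held across an await.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class LockProvider<T>
{
    private sealed class Entry
    {
        public readonly object Gate = new object(); // canonical lock object for one key
        public int RefCount;                        // holders plus waiters
    }

    private readonly object _tableLock = new object();
    private readonly Dictionary<T, Entry> _entries;

    public LockProvider(IEqualityComparer<T> comparer = null)
    {
        _entries = new Dictionary<T, Entry>(comparer ?? EqualityComparer<T>.Default);
    }

    // Number of keys that currently have a live lock object.
    public int Count
    {
        get { lock (_tableLock) { return _entries.Count; } }
    }

    // Blocks until the canonical lock for 'key' is held; disposing the result releases it.
    public IDisposable Acquire(T key)
    {
        Entry entry;
        lock (_tableLock)
        {
            if (!_entries.TryGetValue(key, out entry))
            {
                entry = new Entry();
                _entries.Add(key, entry);
            }
            entry.RefCount++; // counted before waiting, so the entry cannot be removed under us
        }

        Monitor.Enter(entry.Gate); // wait outside the table lock
        return new Releaser(this, key, entry);
    }

    private void Release(T key, Entry entry)
    {
        Monitor.Exit(entry.Gate);
        lock (_tableLock)
        {
            if (--entry.RefCount == 0)
            {
                _entries.Remove(key); // nobody holds or waits on it: avoid the memory leak
            }
        }
    }

    private sealed class Releaser : IDisposable
    {
        private readonly LockProvider<T> _owner;
        private readonly T _key;
        private readonly Entry _entry;
        private int _disposed;

        public Releaser(LockProvider<T> owner, T key, Entry entry)
        {
            _owner = owner;
            _key = key;
            _entry = entry;
        }

        public void Dispose()
        {
            // Release only once, even if Dispose is called repeatedly.
            if (Interlocked.Exchange(ref _disposed, 1) == 0)
            {
                _owner.Release(_key, _entry);
            }
        }
    }
}
```

Usage would look like `using (provider.Acquire(push.UserId)) { ... }`, where `provider` is a single shared instance. The global concerns raised above (lock ordering, deadlock analysis) still apply to whatever code runs while the lock is held.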
This is some of the hardest code to get correct in all of .NET programming. I have been a .NET programmer for almost 20 years and implemented parts of the compiler and I do not consider myself competent to get this stuff right. Seek the help of an actual expert.
Although I find Eric Lippert's answer fantastic and marked it as the correct one (and I won't change that), his thoughts made me think, and I wanted to share an alternative solution I found to this problem (I'd appreciate any feedback). I'm not going to use it, though: I ended up using Azure Functions for my code (so this wouldn't make sense there), plus a cron job to detect and eliminate possible duplicates.
public class UserScopeLocker : IDisposable
{
private static readonly object _obj = new object();
private static ICollection<string> UserQueue = new HashSet<string>();
private readonly string _userId;
protected UserScopeLocker(string userId)
{
this._userId = userId;
}
public static UserScopeLocker Acquire(string userId)
{
while (true)
{
lock (_obj)
{
if (UserQueue.Contains(userId))
{
continue;
}
UserQueue.Add(userId);
return new UserScopeLocker(userId);
}
}
}
public void Dispose()
{
lock (_obj)
{
UserQueue.Remove(this._userId);
}
}
}
...then you would use it like this:
[HttpPost]
public IActionResult ProcessPushNotification(PushRequest push)
{
using(var scope = UserScopeLocker.Acquire(push.UserId))
{
// process the push notification
// two threads can't enter here for the same UserId
// the second one will be blocked until the first disposes
}
}
The idea is:
UserScopeLocker has a protected constructor, ensuring you call Acquire.
_obj is private static readonly, only the UserScopeLocker can lock this object.
_userId is a private readonly field, ensuring even its own class can't change its value.
lock is done when checking, adding and removing, so two threads can't compete on these actions.
Possible flaws I detected:
Since UserScopeLocker relies on IDisposable to release some UserId, I can't guarantee the caller will properly use using statement (or manually dispose the scope object).
I can't guarantee the scope won't be used in a recursive function (thus possibly causing a deadlock).
I can't guarantee the code inside the using statement won't call another function which also tries to acquire a scope to the user (this would also cause a deadlock).
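One further point about the class above: while a user id is taken, Acquire keeps re-entering the lock in a tight loop, so a waiting thread burns CPU. A hedged sketch of the same idea using Monitor.Wait and Monitor.PulseAll, so that waiters sleep until a scope is disposed, could look like this. WaitingUserScope is a hypothetical name, and the flaws listed above (forgotten Dispose, nested acquisition of the same user id) still apply.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class WaitingUserScope : IDisposable
{
    private static readonly object Gate = new object();
    private static readonly HashSet<string> ActiveUsers = new HashSet<string>();

    private readonly string _userId;
    private bool _disposed;

    private WaitingUserScope(string userId)
    {
        _userId = userId;
    }

    public static WaitingUserScope Acquire(string userId)
    {
        lock (Gate)
        {
            // HashSet.Add returns false while another scope holds this user id.
            while (!ActiveUsers.Add(userId))
            {
                // Releases Gate while sleeping and reacquires it before returning.
                Monitor.Wait(Gate);
            }
            return new WaitingUserScope(userId);
        }
    }

    public void Dispose()
    {
        lock (Gate)
        {
            if (_disposed) return;
            _disposed = true;
            ActiveUsers.Remove(_userId);
            Monitor.PulseAll(Gate); // wake all waiters so each can re-check its own user id
        }
    }
}
```

PulseAll wakes waiters for every user id, not only the one just released. That is simple and correct, but under heavy load a per-key lock object may scale better.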
Take the following as an example
public class MyClass
{
private MyEnum _sharedEnumVal { get; set; }
}
If methods within MyClass ran on different threads and read/updated _sharedEnumVal, am I right in saying that a lock, or other mechanism, would be required to keep the variable thread safe like other primitives or are enums special?
Thanks
Thread-safety is a tricky subject. The updates to the enum are always atomic. So even if thousands of threads try to update the same enum at once, you will never get an invalid, half-updated enum value. The value itself will always be valid. But even when you update the enum it is never guaranteed that other threads would read the "latest" value due to cache-incoherency between multiple cores. To ensure that all cores are synchronized you would need a memory barrier.
But even that is not the guarantee of thread-safety because data races can still happen. Say you have this logic somewhere in your class:
public void DoSomething()
{
if (_sharedEnumVal == MyEnum.First) {
DoPrettyThings();
} else {
DoUglyThings();
}
}
public void UpdateValue(MyEnum newValue)
{
_sharedEnumVal = newValue;
}
and you have these two different threads:
static MyClass threadSafeClass = new MyClass();
void ThreadOne()
{
while (true)
{
threadSafeClass.UpdateValue(MyEnum.Second);
threadSafeClass.DoSomething();
}
}
void ThreadTwo()
{
while (true)
{
threadSafeClass.UpdateValue(MyEnum.First);
threadSafeClass.DoSomething();
}
}
Here, although the updates to the enum are atomic, two threads will be "racing" to change and use enum value to their own purposes and when DoSomething is called, there is no guarantee what value the enum would have. You would get completely unexpected results. ThreadTwo might cause pretty things and ThreadOne would cause ugly things to happen, the exact opposite of what's expected.
In that case you would still need locking to ensure thread-safety of the class behavior.
I fail to understand why this topic was downvoted. :)
There are some good points here and some bad ideas, and some of the bad ones were even upvoted!
So let's sort things out.
The question here is actually about atomicity.
If an operation is atomic, it is inherently thread-safe without locking. For a given type, that covers plain reads and writes, plus the additional operations the Interlocked class provides for it.
Now, .NET states that reads and writes of an int are atomic. The same holds for all types that fit into 32 bits; 64-bit types are not guaranteed to be atomic (in particular, not on 32-bit platforms). Reads and writes of an object reference are atomic too.
Some operations are atomic and some are not; increment, for example, is not atomic unless you call Interlocked.Increment.
Now why do I talk about int? Well, by default an enum has an underlying type of int (32 bits) unless explicitly specified otherwise.
That means that reading and writing it is atomic, and therefore thread-safe.
By the way, it is usually a bad idea to keep a naked auto-property. I would rather use a backing field behind the property and work with the field, because the Interlocked methods need a field to operate on.
There are many useful scenarios where atomicity is a good enough guarantee to work without locking. For example, a background thread's status, or a property that lets background workers keep working until it is changed to some expected value, which tells them to stop.
The Interlocked class extends these scenarios to shared counters and many more.
As Chris Hannon noted, simple reads and writes can return stale data unless they are decorated with a memory barrier or Interlocked operations are used (for example, Interlocked.Add for reading and Interlocked.CompareExchange for writing), which make sure the caches are updated.
Thanks to Chris for the good point I missed!
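As a rough illustration of the backing-field advice, the sketch below stores the enum as an int field so that the Volatile and Interlocked methods can operate on it. WorkerStatus, Worker and TryTransition are hypothetical names chosen for the example, and Volatile.Read/Write is one possible way to add the barrier mentioned above.

```csharp
using System.Threading;

public enum WorkerStatus { Idle = 0, Running = 1, Stopping = 2 }

public class Worker
{
    // The enum is kept as its underlying int so it can be passed by ref.
    private int _status = (int)WorkerStatus.Idle;

    public WorkerStatus Status
    {
        get { return (WorkerStatus)Volatile.Read(ref _status); }
        set { Volatile.Write(ref _status, (int)value); }
    }

    // Atomically moves from 'expected' to 'next'.
    // Returns false if the status was not 'expected' at that moment.
    public bool TryTransition(WorkerStatus expected, WorkerStatus next)
    {
        int previous = Interlocked.CompareExchange(ref _status, (int)next, (int)expected);
        return previous == (int)expected;
    }
}
```

TryTransition covers the "check the value, then change it" case in a single atomic step, which a separate get followed by a set cannot do.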
I posted an earlier question about returning collections, and the topic of thread safety came up. I was given this link to do some more reading, and I found this particular line:
In general, avoid locking on a public type, or instances beyond your code's control.
First, correct me if I'm wrong, but doesn't the example that Microsoft gives lock on a public type, the balance variable?
Secondly, how would I go about locking my own getter/setter property. Suppose I have the following class:
private int ID;
public Person(int id)
{
this.Identification= id;
}
public int Identification
{
get { return this.ID; }
private set
{
if (value == 0)
{
throw new ArgumentNullException("Must Include ID#");
}
this.ID = value;
}
}
The getter is public, correct? Only the setter is declared private. So, how would I lock, or make my getter/setter properties thread safe?
You should define a field in the Person class like this:
private readonly object _lock_ = new Object();
If you want to synchronize across all instances of Person, make it static.
Then, when you want to lock, you can do it like this:
lock(_lock_) //whose there? it's me, I kill you! oops sorry that was knock knock
{
//do what you want
}
I suggest you read this article.
When you need to lock on a variable, you need to lock around every place where the variable is used. A lock is not for a variable - it's for a region of code where a variable is used.
It doesn't matter if you 'only read' in one place - if you need locking for a variable, you need it everywhere where that variable is used.
An alternative to lock is the Interlocked class, which uses processor-level atomic primitives and is a bit faster. Interlocked, however, cannot protect multiple statements (and having 2 Interlocked statements is not the same as having those 2 statements inside a single lock).
When you lock, you must lock on an instance of a reference type (which, in most cases (but not always), should also be a static instance). This is to ensure that all locks are actually taken out on the same instance, not a copy of it. Obviously, if you're using a copy in different places, you're not locking the same thing so your code won't be correctly serialized.
For example:
private static readonly object m_oLock = new object ();
...
lock ( m_oLock )
{
...
}
Whether it's safe to use a non-static lock requires detailed analysis of the code - in some situations, it leads to more parallelism because the same region of code is locked less but the analysis of it could be very tricky - if you're unsure, just use a static lock object. The cost of taking an open lock is minimal but incorrect analysis may lead to errors that take ages to debug.
Edit:
Here's an example showing how to lock property access:
private int ID; // do NOT lock on value type instances
private static readonly object Lock = new object ();
public Person(int id)
{
this.Identification = id;
}
public int Identification
{
get
{
lock ( Lock )
{
return this.ID;
}
}
private set
{
if (value == 0)
throw new ArgumentNullException("Must Include ID#");
lock ( Lock )
{
this.ID = value;
}
}
}
Since your property only does a trivial get/set operation, you can try using Interlocked.CompareExchange instead of a full lock - it will make things slightly faster. Keep in mind, though, that an interlocked operation is not the same as a lock.
Edit 2:
Just one more thing: a trivial get / set on an int doesn't need a lock - both reading and writing a 32-bit value (in and of itself) is already atomic. So this example is too simple - as long as you're not trying to use ID in multiple operations that should be completed in an atomic fashion, the lock is not needed. However, if your real code is actually more complicated with ID being checked and set, you may need locking and you'll need to lock around all the operations that make up the atomic operation. This means that you may have to pull the lock out of the getter / setter - 2 locks on a get/set pair of a variable is not the same as a single lock around them.
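To illustrate the last point, here is a small sketch in which the check and the assignment form one atomic operation because a single lock spans both. PersonRecord and TryAssignId are hypothetical names; the original class has no such method.

```csharp
using System;

public class PersonRecord
{
    private readonly object _sync = new object();
    private int _id; // 0 means "not assigned yet"

    public int Identification
    {
        get { lock (_sync) { return _id; } }
    }

    // Assigns the id only if none has been assigned so far.
    // The check and the write happen under one lock, so two racing callers cannot both succeed.
    public bool TryAssignId(int id)
    {
        if (id == 0)
            throw new ArgumentOutOfRangeException(nameof(id), "Must include a non-zero ID#.");

        lock (_sync)
        {
            if (_id != 0)
                return false;

            _id = id;
            return true;
        }
    }
}
```

Writing the same logic as a locked getter followed by a locked setter would leave a gap between the two locks in which another thread could assign a different id.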
The answer to your first question about the Microsoft article:
No. The article doesn't lock on the balance variable. It locks on the private thisLock variable. So the example is good.
Secondly, based on the code you have posted, you don't need to add any locking to make your class thread safe, because your data is immutable. Once you create an instance of Person and set the value for the Identification property from within the constructor, your class design doesn't allow for that property to change again. That's immutability, and that in itself provides thread safety. So you don't need to bother with adding locks and such. Again, assuming your code sample is accurate.
EDIT:
This link may be useful to you.
One of my classes has a property of type Guid. This property can be read and written simultaneously by more than one thread. I'm under the impression that reads and writes to a Guid are NOT atomic, therefore I should lock them.
I've chosen to do it like this:
public Guid TestKey
{
get
{
lock (_testKeyLock)
{
return _testKey;
}
}
set
{
lock (_testKeyLock)
{
_testKey = value;
}
}
}
(Inside my class, all access to the Guid is also done through that property rather than accessing _testKey directly.)
I have two questions:
(1) Is it really necessary to lock the Guid like that to prevent torn reads? (I'm pretty sure it is.)
(2) Is that a reasonable way to do the locking? Or do I need to do it like the following:
get
{
Guid result;
lock (_testKeyLock)
{
result = _testKey;
}
return result;
}
[EDIT] This article does confirm that Guids will suffer from torn reads: http://msdn.microsoft.com/en-us/magazine/jj863136.aspx
1: yes; to protect from torn values if you have one thread reading and one writing; Guid is not guaranteed to be atomic
2: "like the following": they are effectively the same; at the IL level you cannot ret from a try/catch block, so the compiler implements your first example by introducing a local variable, just like in your second example.
Another approach might be to box it; a reference is atomic:
object boxedTestKey;
public Guid TestKey
{
get { return (Guid)boxedTestKey; }
set { boxedTestKey = value; }
}
No locking required, but a small overhead from the box.
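A slightly more defensive variant of the boxing idea, offered as a sketch rather than a required form: the field is initialised so the getter never unboxes null, and Volatile.Read/Write are used so a reader does not keep working with a stale reference. KeyHolder is a hypothetical class name, and starting from Guid.Empty is an assumption of this example.

```csharp
using System;
using System.Threading;

public class KeyHolder
{
    // Starts as a boxed Guid.Empty so the cast in the getter cannot hit a null reference.
    private object _boxedTestKey = Guid.Empty;

    public Guid TestKey
    {
        // Reading the reference is atomic; unboxing then copies a complete Guid.
        get { return (Guid)Volatile.Read(ref _boxedTestKey); }

        // Each write publishes a fresh box; existing boxes are never modified.
        set { Volatile.Write(ref _boxedTestKey, (object)value); }
    }
}
```

Each set allocates a new box, which is the "small overhead" mentioned above.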
1) Is it really necessary to lock the Guid like that to prevent torn reads? (I'm pretty sure it is.)
Yes it is.
2) Is that a reasonable way to do the locking?
Again: Yes.
If there had existed an Interlocked method for Guid then that would have been better (faster).
For double (another non-atomic struct) there is support from Interlocked and for references it is not needed.
So as a pattern this is only required for larger structs that are not supported by Interlocked.
In multi-threaded code, when an instance may be read or written by multiple threads, they need to be locked on to perform these operations safely.
To avoid the repetition of creating an object to lock on and writing a bunch of lock statements through code, I've created a generic class to handle the locking.
Am I missing anything, conceptually? This should work, right?
public class Locked<T> where T : new()
{
private readonly object locker = new object();
private T value;
public Locked()
: this(default(T))
{ }
public Locked(T value)
{
this.value = value;
}
public T Get()
{
lock (this.locker)
{
return this.value;
}
}
public void Set(T value)
{
lock (this.locker)
{
this.value = value;
}
}
}
And an example of it being used in a class:
private Locked<bool> stopWorkerThread = new Locked<bool>();
public void WorkerThreadEntryPoint()
{
while (true)
{
if (this.stopWorkerThread.Get())
{
break;
}
// ... do work ...
}
}
Also, how would I test something like this, in an automated way (e.g. create a unit test)?
Lastly, what can I do to implement a ++ and -- operator, to avoid this:
this.runningThreads.Set(this.runningThreads.Get() + 1);
That only locks for the duration of the get/set; of course, in many common cases this will be atomic anyway, simply due to data size.
However, in reality most locks need to span more than this, in the same way that collections locking over just the Add etc don't help much - a caller typically needs a single lock to span the "is it there? if so update, else add" sequence.
For something as simple as a bool, "volatile" might solve the problem a lot more simply - especially if it is just for a loop exit.
You might also want to consider [MethodImpl(MethodImplOptions.Synchronized)] - although personally I prefer a private lock object (like you have used) to prevent issues with external people locking on the object (the above uses "this" as the lock).
For unit testing this, you'd need something to prove it broken first - which would be hard since the operations are so small (and already atomic for most data types). One of the other things it avoids (that volatile also fixes) is caching in a register, but again that is an optimisation and hard to force to prove it is broken.
If you are interested in a lock-wrapper, you might consider existing code like this.
Your code above has quite a few potential and real multi-threading issues, and I wouldn't use something like it in a real-world situation. For example:
this.runningThreads.Set(this.runningThreads.Get() + 1);
There is a pretty obvious race condition here. When the Get() call returns, the object is no longer locked. To do a real post or pre increment, the counter would need to be locked from before the Get to after the Set.
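One way to close that gap, sketched under the assumption that the wrapper may grow an extra method, is to let the caller pass the whole read-modify-write step as a delegate that runs under the lock. Update is a hypothetical addition to the Locked<T> class from the question; the rest mirrors the original.

```csharp
using System;

public class Locked<T>
{
    private readonly object _locker = new object();
    private T _value;

    public Locked(T value = default(T))
    {
        _value = value;
    }

    public T Get()
    {
        lock (_locker) { return _value; }
    }

    public void Set(T value)
    {
        lock (_locker) { _value = value; }
    }

    // Hypothetical addition: reads, transforms and writes back while holding the lock,
    // so no other thread can slip in between the read and the write.
    public T Update(Func<T, T> update)
    {
        lock (_locker)
        {
            _value = update(_value);
            return _value;
        }
    }
}
```

With this, the increment becomes `this.runningThreads.Update(n => n + 1);`. The delegate should stay short and must not call back into the same instance from another thread while waiting. For a plain int counter, Interlocked.Increment on a field achieves the same result without a lock.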
Also you don't always need to do a full lock if all you are doing is synchronous reads.
A better lock interface would (I think) require you to explicitly lock the instance where you need to do it. My experience is mainly with C++ so I can't recommend a full implementation, but my preferred syntax might look something like this:
using (Locked<T> lock = Locked<T>(instance))
{
// write value
instance++;
}
// read value
print instance;