Better solution to multithreading riddle? - c#

Here's the task: I need to lock based on a filename. There can be up to a million different filenames. (This is used for large-scale disk-based caching).
I want low memory usage and low lookup times, which means I need a GC'd lock dictionary. (Only in-use locks can be present in the dict).
The callback action can take minutes to complete, so a global lock is unacceptable. High throughput is critical.
I've posted my current solution below, but I'm unhappy with the complexity.
EDIT: Please do not post solutions that are not 100% correct. For example, a solution that permits a lock to be removed from the dictionary between the 'get lock object' phase and the 'lock' phase is NOT correct, whether or not it is an 'accepted' design pattern.
Is there a more elegant solution than this?
Thanks!
[EDIT: I updated my code to use looping vs. recursion based on RobV's suggestion]
[EDIT: Updated the code again to allow 'timeouts' and a simpler calling pattern. This will probably be the final code I use. Still the same basic algorithm as in the original post.]
[EDIT: Updated code again to deal with exceptions inside callback without orphaning lock objects]
public delegate void LockCallback();
/// <summary>
/// Provides locking based on a string key.
/// Locks are local to the LockProvider instance.
/// The class handles disposing of unused locks. Generally used for
/// coordinating writes to files (of which there can be millions).
/// Only keeps key/lock pairs in memory which are in use.
/// Thread-safe.
/// </summary>
public class LockProvider {
/// <summary>
/// The only objects in this collection should be for open files.
/// </summary>
protected Dictionary<String, Object> locks =
new Dictionary<string, object>(StringComparer.Ordinal);
/// <summary>
/// Synchronization object for modifications to the 'locks' dictionary
/// </summary>
protected object createLock = new object();
/// <summary>
/// Attempts to execute the 'success' callback inside a lock based on 'key'. If successful, returns true.
/// If the lock cannot be acquired within 'timeoutMs', returns false.
/// In a worst-case scenario, it could take up to twice as long as 'timeoutMs' to return false.
/// </summary>
/// <param name="key">The string (e.g. a filename) to lock on.</param>
/// <param name="timeoutMs">How long to wait for the lock, in milliseconds.</param>
/// <param name="success">The callback to run while holding the lock.</param>
public bool TryExecute(string key, int timeoutMs, LockCallback success){
//Record when we started. We don't want an infinite loop.
DateTime startedAt = DateTime.UtcNow;
// Tracks whether the lock acquired is still correct
bool validLock = true;
// The lock corresponding to 'key'
object itemLock = null;
try {
//We have to loop until we get a valid lock and it stays valid until we lock it.
do {
// 1) Creation/acquire phase
lock (createLock) {
// We have to lock on dictionary writes, since otherwise
// two locks for the same file could be created and assigned
// at the same time. (i.e, between TryGetValue and the assignment)
if (!locks.TryGetValue(key, out itemLock))
locks[key] = itemLock = new Object(); //make a new lock!
}
// Loophole (part 1):
// Right here - this is where another thread (executing part 2) could remove 'itemLock'
// from the dictionary, and potentially, yet another thread could
// insert a new value for 'itemLock' into the dictionary... etc, etc..
// 2) Execute phase
if (System.Threading.Monitor.TryEnter(itemLock, timeoutMs)) {
try {
// May take minutes to acquire this lock.
// Trying to detect an occurrence of loophole above
// Check that itemLock still exists and matches the dictionary
lock (createLock) {
object newLock = null;
validLock = locks.TryGetValue(key, out newLock);
validLock = validLock && newLock == itemLock;
}
// Only run the callback if the lock is valid
if (validLock) {
success(); // Extremely long-running callback, perhaps throwing exceptions
return true;
}
} finally {
System.Threading.Monitor.Exit(itemLock);//release lock
}
} else {
validLock = false; //So the finally clause doesn't try to clean up the lock, someone else will do that.
return false; //Someone else had the lock, they can clean it up.
}
//Are we out of time, still having an invalid lock?
if (!validLock && Math.Abs(DateTime.UtcNow.Subtract(startedAt).TotalMilliseconds) > timeoutMs) {
//We failed to get a valid lock in time.
return false;
}
// If we had an invalid lock, we have to try everything over again.
} while (!validLock);
} finally {
if (validLock) {
// Loophole (part 2). When loophole part 1 and 2 cross paths,
// a lock object may be removed before being used, and be orphaned
// 3) Cleanup phase - Attempt cleanup of lock objects so we don't
// have a *very* large and slow dictionary.
lock (createLock) {
// TryEnter() fails instead of waiting.
// A normal lock would cause a deadlock with phase 2.
// Specifying a timeout would add great and pointless overhead.
// Whoever has the lock will clean it up also.
if (System.Threading.Monitor.TryEnter(itemLock)) {
try {
// It succeeds, so no-one else is working on it
// (but may be preparing to, see loophole)
// Only remove the lock object if it
// still exists in the dictionary as-is
object existingLock = null;
if (locks.TryGetValue(key, out existingLock)
&& existingLock == itemLock)
locks.Remove(key);
} finally {
// Release the lock
System.Threading.Monitor.Exit(itemLock);
}
}
}
}
}
// Ideally the only objects in 'locks' will be open operations now.
return true;
}
}
Usage example
LockProvider p = new LockProvider();
bool success = p.TryExecute("filename",1000,delegate(){
//This code executes within the lock
});

Depending on what you are doing with the files (you say disk-based caching, so I assume reads as well as writes), I would suggest trying something based on ReaderWriterLock. If you can upgrade to .NET 3.5, try ReaderWriterLockSlim instead, as it performs much better.
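To make the reader/writer suggestion concrete, here is a minimal sketch of how one ReaderWriterLockSlim could guard a single cached file. The class name CachedFileGate and its methods are hypothetical, and the sketch does not address how to find or discard the lock for a given filename, which is the harder part of the question.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: one ReaderWriterLockSlim guarding one cached file.
public sealed class CachedFileGate : IDisposable
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();

    // Any number of readers may hold the read lock at the same time.
    public T Read<T>(Func<T> read)
    {
        _rw.EnterReadLock();
        try { return read(); }
        finally { _rw.ExitReadLock(); }
    }

    // A writer is exclusive. Returns false if the write lock
    // could not be acquired within timeoutMs.
    public bool TryWrite(int timeoutMs, Action write)
    {
        if (!_rw.TryEnterWriteLock(timeoutMs)) return false;
        try { write(); return true; }
        finally { _rw.ExitWriteLock(); }
    }

    public void Dispose() { _rw.Dispose(); }
}
```

Readers of a cached file then no longer block each other, and only a rewrite of that file takes the exclusive lock.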
As a general step to remove the potential for endless recursion in your example, change the first bit of the code to the following:
do
{
// 1) Creation/acquire phase
lock (createLock){
// We have to lock on dictionary writes, since otherwise
// two locks for the same file could be created and assigned
// at the same time. (i.e, between TryGetValue and the assignment)
if (!locks.TryGetValue(key, out itemLock))
locks[key] = itemLock = new Object(); //make a new lock!
}
// Loophole (part 1):
// Right here - this is where another thread could remove 'itemLock'
// from the dictionary, and potentially, yet another thread could
// insert a new value for 'itemLock' into the dictionary... etc, etc..
// 2) Execute phase
lock(itemLock){
// May take minutes to acquire this lock.
// Real version would specify a timeout and a failure callback.
// Trying to detect an occurrence of loophole above
// Check that itemLock still exists and matches the dictionary
lock(createLock){
object newLock = null;
validLock = locks.TryGetValue(key, out newLock);
validLock = validLock && newLock == itemLock;
}
// Only run the callback if the lock is valid
if (validLock) callback(); // Extremely long-running callback.
}
// If we had an invalid lock, we have to try everything over again.
} while (!validLock);
This replaces your recursion with a loop, which avoids any chance of a StackOverflowException caused by endless recursion.

That solution sure looks brittle and complex. Having public callbacks inside locks is bad practice. Why not let LockProvider return some sort of 'lock' object, so that the consumers do the locking themselves? This separates the locking of the locks dictionary from the execution. It might look like this:
public class LockProvider
{
private readonly object globalLock = new object();
private readonly Dictionary<String, Locker> locks =
new Dictionary<string, Locker>(StringComparer.Ordinal);
public IDisposable Enter(string key)
{
Locker locker;
lock (this.globalLock)
{
if (!this.locks.TryGetValue(key, out locker))
{
this.locks[key] = locker = new Locker(this, key);
}
// Increase wait count inside the global lock
locker.WaitCount++;
}
// Call Enter and decrease wait count outside the
// global lock (to prevent deadlocks).
locker.Enter();
// Only one thread will be here at a time for a given locker.
locker.WaitCount--;
return locker;
}
private sealed class Locker : IDisposable
{
private readonly LockProvider provider;
private readonly string key;
private object keyLock = new object();
public int WaitCount;
public Locker(LockProvider provider, string key)
{
this.provider = provider;
this.key = key;
}
public void Enter()
{
Monitor.Enter(this.keyLock);
}
public void Dispose()
{
if (this.keyLock != null)
{
this.Exit();
this.keyLock = null;
}
}
private void Exit()
{
lock (this.provider.globalLock)
{
try
{
// Remove the key before releasing the lock, but
// only when no threads are waiting (because they
// will have a reference to this locker).
if (this.WaitCount == 0)
{
this.provider.locks.Remove(this.key);
}
}
finally
{
// Release the keyLock inside the globalLock.
Monitor.Exit(this.keyLock);
}
}
}
}
}
And the LockProvider can be used as follows:
public class Consumer
{
private LockProvider provider;
public void DoStufOnFile(string fileName)
{
using (this.provider.Enter(fileName))
{
// Long running operation on file here.
}
}
}
Note that Monitor.Enter is called before we enter the try statement (using), which means that in certain host environments (such as ASP.NET and SQL Server) locks may never be released when an asynchronous exception happens. Hosts like ASP.NET and SQL Server aggressively kill threads when timeouts occur. Rewriting this so that Monitor.Enter is called inside the try is a bit tricky, though.
I hope this helps.
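For comparison, the wait-count idea can also be written so that the counter is only ever read or changed while the global lock is held. The following is a sketch under that assumption, not the code from this answer: RefCountedLockProvider, Entry and Count are hypothetical names, and it keeps the callback-with-timeout calling style from the question rather than returning an IDisposable.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class RefCountedLockProvider
{
    // One entry per key that is currently held or waited on.
    private sealed class Entry { public int RefCount; }

    private readonly object _sync = new object();
    private readonly Dictionary<string, Entry> _entries =
        new Dictionary<string, Entry>(StringComparer.Ordinal);

    // Number of keys currently tracked (for diagnostics).
    public int Count { get { lock (_sync) return _entries.Count; } }

    public bool TryExecute(string key, int timeoutMs, Action callback)
    {
        Entry entry;
        lock (_sync)
        {
            if (!_entries.TryGetValue(key, out entry))
                _entries[key] = entry = new Entry();
            entry.RefCount++; // registered as holder or waiter
        }
        try
        {
            // Wait for the per-key lock outside the global lock.
            if (!Monitor.TryEnter(entry, timeoutMs)) return false;
            try { callback(); return true; }
            finally { Monitor.Exit(entry); }
        }
        finally
        {
            lock (_sync)
            {
                // The last thread to leave removes the entry. Nobody else can
                // hold a reference to it, because references are only handed
                // out (and counted) inside this same lock.
                if (--entry.RefCount == 0) _entries.Remove(key);
            }
        }
    }
}
```

Because an entry is removed only when its count reaches zero under the global lock, a thread cannot end up locking an object that has already left the dictionary. The global lock is held only for the dictionary lookup and the counter update, never while the callback runs.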

Could you not simply use a named Mutex, with the name derived from your filename?
Although not a lightweight synchronization primitive, it's simpler than managing your own synchronized dictionary.
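A rough sketch of the named-Mutex idea follows. FileMutex, NameFor and the "filelock-" prefix are hypothetical; hashing the path is just one way to get a name of fixed length without backslashes, and it treats paths that differ only in case as different files.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading;

public static class FileMutex
{
    // Derives a mutex name from the path by hashing it.
    public static string NameFor(string path)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(path));
            return "filelock-" + BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Runs the callback while holding the named mutex for 'path'.
    // Returns false if the mutex could not be acquired within timeoutMs.
    public static bool TryExecute(string path, int timeoutMs, Action callback)
    {
        using (var mutex = new Mutex(false, NameFor(path)))
        {
            bool acquired;
            try { acquired = mutex.WaitOne(timeoutMs); }
            catch (AbandonedMutexException) { acquired = true; } // previous owner exited without releasing; we now own it
            if (!acquired) return false;
            try { callback(); return true; }
            finally { mutex.ReleaseMutex(); }
        }
    }
}
```

The operating system discards the underlying mutex once the last handle to it is closed, so there is no dictionary to clean up. The price is a kernel object per active file and a system call per acquisition.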
However, if you really do want to do it this way, I'd have thought the following implementation looks simpler. You need a synchronized dictionary: either the .NET 4 ConcurrentDictionary or your own implementation if you're on .NET 3.5 or lower.
object myLock = new object();
lock (myLock)
{
    object otherLock = null;
    while (otherLock != myLock)
    {
        otherLock = lockDictionary.GetOrAdd(key, myLock);
        if (otherLock != myLock)
        {
            // Another thread has a lock in the dictionary; wait for it to finish
            if (!Monitor.TryEnter(otherLock, timeoutMs))
            {
                // Another thread still has a lock after a timeout
                failure();
                return;
            }
            else
            {
                // The other thread has finished; release its lock and
                // try again to add our own lock to the dictionary
                Monitor.Exit(otherLock);
            }
        }
    }
    // We've successfully added myLock to the dictionary
    try
    {
        // Do our stuff
        success();
    }
    finally
    {
        // ConcurrentDictionary exposes TryRemove rather than Remove
        object removed;
        lockDictionary.TryRemove(key, out removed);
    }
}

There doesn't seem to be an elegant way to do this in .NET, although I have improved the algorithm thanks to @RobV's suggestion of a loop. Here is the final solution I settled on.
It is immune to the 'orphaned reference' bug that seems to be typical of the standard pattern followed by @Steven's answer.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
namespace ImageResizer.Plugins.DiskCache {
public delegate void LockCallback();
/// <summary>
/// Provides locking based on a string key.
/// Locks are local to the LockProvider instance.
/// The class handles disposing of unused locks. Generally used for
/// coordinating writes to files (of which there can be millions).
/// Only keeps key/lock pairs in memory which are in use.
/// Thread-safe.
/// </summary>
public class LockProvider {
/// <summary>
/// The only objects in this collection should be for open files.
/// </summary>
protected Dictionary<String, Object> locks =
new Dictionary<string, object>(StringComparer.Ordinal);
/// <summary>
/// Synchronization object for modifications to the 'locks' dictionary
/// </summary>
protected object createLock = new object();
/// <summary>
/// Attempts to execute the 'success' callback inside a lock based on 'key'. If successful, returns true.
/// If the lock cannot be acquired within 'timeoutMs', returns false.
/// In a worst-case scenario, it could take up to twice as long as 'timeoutMs' to return false.
/// </summary>
/// <param name="key">The string (e.g. a filename) to lock on.</param>
/// <param name="timeoutMs">How long to wait for the lock, in milliseconds.</param>
/// <param name="success">The callback to run while holding the lock.</param>
public bool TryExecute(string key, int timeoutMs, LockCallback success){
//Record when we started. We don't want an infinite loop.
DateTime startedAt = DateTime.UtcNow;
// Tracks whether the lock acquired is still correct
bool validLock = true;
// The lock corresponding to 'key'
object itemLock = null;
try {
//We have to loop until we get a valid lock and it stays valid until we lock it.
do {
// 1) Creation/acquire phase
lock (createLock) {
// We have to lock on dictionary writes, since otherwise
// two locks for the same file could be created and assigned
// at the same time. (i.e, between TryGetValue and the assignment)
if (!locks.TryGetValue(key, out itemLock))
locks[key] = itemLock = new Object(); //make a new lock!
}
// Loophole (part 1):
// Right here - this is where another thread (executing part 2) could remove 'itemLock'
// from the dictionary, and potentially, yet another thread could
// insert a new value for 'itemLock' into the dictionary... etc, etc..
// 2) Execute phase
if (System.Threading.Monitor.TryEnter(itemLock, timeoutMs)) {
try {
// May take minutes to acquire this lock.
// Trying to detect an occurrence of loophole above
// Check that itemLock still exists and matches the dictionary
lock (createLock) {
object newLock = null;
validLock = locks.TryGetValue(key, out newLock);
validLock = validLock && newLock == itemLock;
}
// Only run the callback if the lock is valid
if (validLock) {
success(); // Extremely long-running callback, perhaps throwing exceptions
return true;
}
} finally {
System.Threading.Monitor.Exit(itemLock);//release lock
}
} else {
validLock = false; //So the finally clause doesn't try to clean up the lock, someone else will do that.
return false; //Someone else had the lock, they can clean it up.
}
//Are we out of time, still having an invalid lock?
if (!validLock && Math.Abs(DateTime.UtcNow.Subtract(startedAt).TotalMilliseconds) > timeoutMs) {
//We failed to get a valid lock in time.
return false;
}
// If we had an invalid lock, we have to try everything over again.
} while (!validLock);
} finally {
if (validLock) {
// Loophole (part 2). When loophole part 1 and 2 cross paths,
// a lock object may be removed before being used, and be orphaned
// 3) Cleanup phase - Attempt cleanup of lock objects so we don't
// have a *very* large and slow dictionary.
lock (createLock) {
// TryEnter() fails instead of waiting.
// A normal lock would cause a deadlock with phase 2.
// Specifying a timeout would add great and pointless overhead.
// Whoever has the lock will clean it up also.
if (System.Threading.Monitor.TryEnter(itemLock)) {
try {
// It succeeds, so no-one else is working on it
// (but may be preparing to, see loophole)
// Only remove the lock object if it
// still exists in the dictionary as-is
object existingLock = null;
if (locks.TryGetValue(key, out existingLock)
&& existingLock == itemLock)
locks.Remove(key);
} finally {
// Release the lock
System.Threading.Monitor.Exit(itemLock);
}
}
}
}
}
// Ideally the only objects in 'locks' will be open operations now.
return true;
}
}
}
Consuming this code is very simple:
LockProvider p = new LockProvider();
bool success = p.TryExecute("filename",1000,delegate(){
//This code executes within the lock
});

Related

Named Lock Collection in C#?

I have multiple threads writing data to a common source, and I would like two threads to block each other if and only if they are touching the same piece of data.
It would be nice to have a way to lock specifically on an arbitrary key:
string id = GetNextId();
AcquireLock(id);
try
{
DoDangerousThing();
}
finally
{
ReleaseLock(id);
}
If nobody else is trying to lock the same key, I would expect they would be able to run concurrently.
I could achieve this with a simple dictionary of mutexes, but I would need to worry about evicting old, unused locks and that could become a problem if the set grows too large.
Is there an existing implementation of this type of locking pattern?
You can try using a ConcurrentDictionary<string, object> to create named object instances. When you need a new lock instance (that you haven't used before), you can add it to the dictionary (adding is an atomic operation through GetOrAdd) and then all threads can share the same named object once you pull it from the dictionary, based on your data.
For example:
// Create a global lock map for your lock instances.
public static ConcurrentDictionary<string, object> GlobalLockMap =
new ConcurrentDictionary<string, object> ();
// ...
var oLockInstance = GlobalLockMap.GetOrAdd ( "lock name", x => new object () );
if (oLockInstance == null)
{
// handle error
}
lock (oLockInstance)
{
// do work
}
You can use the ConcurrentDictionary<string, object> to create and reuse different locks. If you want to remove locks from the dictionary and still be able to reopen the same named resource in the future, you must always check inside the critical region whether the previously acquired lock has been removed or replaced by another thread. Take care to remove the lock from the dictionary as the last step before leaving the critical region.
static ConcurrentDictionary<string, object> _lockDict =
new ConcurrentDictionary<string, object>();
// VERSION 1: single-shot method
public void UseAndCloseSpecificResource(string resourceId)
{
bool isSameLock;
object lockObj, lockObjCheck;
do
{
lock (lockObj = _lockDict.GetOrAdd(resourceId, new object()))
{
if (isSameLock = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
try
{
// ... open, use, and close resource identified by resourceId ...
// ...
}
finally
{
// This must be the LAST statement
_lockDict.TryRemove(resourceId, out lockObjCheck);
}
}
}
}
while (!isSameLock);
}
// VERSION 2: separated "use" and "close" methods
// (can coexist with version 1)
public void UseSpecificResource(string resourceId)
{
bool isSameLock;
object lockObj, lockObjCheck;
do
{
lock (lockObj = _lockDict.GetOrAdd(resourceId, new object()))
{
if (isSameLock = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
// ... open and use (or reuse) resource identified by resourceId ...
}
}
}
while (!isSameLock);
}
public bool TryCloseSpecificResource(string resourceId)
{
bool result = false;
object lockObj, lockObjCheck;
if (_lockDict.TryGetValue(resourceId, out lockObj))
{
lock (lockObj)
{
if (result = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
try
{
// ... close resource identified by resourceId ...
// ...
}
finally
{
// This must be the LAST statement
_lockDict.TryRemove(resourceId, out lockObjCheck);
}
}
}
}
return result;
}
The lock keyword (MSDN) already does this.
When you lock, you pass the object to lock on:
lock (myLockObject)
{
}
This uses the Monitor class with the specific object to synchronize any threads using lock on the same object.
Since string literals are "interned" – that is, they are cached for reuse so that every literal with the same value is in fact the same object – you can also do this for strings:
lock ("TestString")
{
}
Since you aren't dealing with string literals you could intern the strings you read as described in: C#: Strings with same contents.
It would even work if the reference used was copied (directly or indirectly) from an interned string (literal or explicitly interned). But I wouldn't recommend it. This is very fragile and can lead to hard-to-debug problems, due to the ease with which new instances of a string having the same value as an interned string can be created.
A lock will only block if something else has entered the locked section on the same object. Thus, no need to keep a dictionary around, just the applicable lock objects.
Realistically though, you'll need to maintain a ConcurrentDictionary or similar to allow your objects to access the appropriate lock object.
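The fragility mentioned above comes from lock operating on object identity rather than string value. A small sketch (InternDemo and its methods are hypothetical names) makes the difference visible:

```csharp
using System;

public static class InternDemo
{
    // Builds "TestString" at run time, the way a value read from a file
    // or a database would be. This is always a new string instance.
    public static string RuntimeCopy()
    {
        return new string("TestString".ToCharArray());
    }

    // True only when both variables refer to the same object,
    // which is the only thing lock() cares about.
    public static bool SameLockTarget(string a, string b)
    {
        return object.ReferenceEquals(a, b);
    }
}
```

Two calls to RuntimeCopy() return equal strings that are different objects, so SameLockTarget returns false for them and locking on them would not synchronize anything. Passing both through string.Intern first yields the same instance, and SameLockTarget returns true.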

Asp.Net caching pattern

There are a great number of articles available regarding thread safe caching, here's an example:
private static object _lock = new object();
public void CacheData()
{
SPListItemCollection oListItems;
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if(oListItems == null)
{
lock (_lock)
{
// Ensure that the data was not loaded by a concurrent thread
// while waiting for lock.
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
}
However, this example depends on the request for the cache also rebuilding the cache.
I'm looking for a solution where the request and rebuild are separate. Here's the scenario.
I have a web service that I want to monitor for certain types of error. If an error occurs, I create a monitor object and cache it; it is updatable and is locked accordingly during update. All's well so far.
Elsewhere, I check for the existence of the cached object, and the data it contains. This would work straight out of the box except for one particular scenario.
If the cache object is being updated - say a status change, I would like to wait and get the latest info rather than the current info, which if returned, would be out of date. So for my fetch code, I need to check if the object is currently being created/updating, and if so wait, then retry.
As I pointed out, there are many examples of cache locking patterns, but I can't seem to find one for this scenario. Any ideas on how to go about this would be appreciated.
You can try the following code, which uses two locks. The write lock in the setter is quite simple and protects the cache from being written by more than one thread at a time. The getter uses a simple double-checked lock.
Now, the trick is in the Refresh() method, which uses the same lock as the getter. The method takes the lock and, as its first step, removes the list from the cache. This causes any getter to fail the first null check and wait for the lock. In the meantime the method gets the items, sets the cache again and releases the lock.
When control comes back to the getter, it reads the cache again, and now it contains the list.
public class CacheData
{
private static object _readLock = new object();
private static object _writeLock = new object();
public SPListItemCollection ListItem
{
get
{
var oListItems = (SPListItemCollection) Cache["ListItemCacheName"];
if (oListItems == null)
{
lock (_readLock)
{
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
return oListItems;
}
set
{
lock (_writeLock)
{
Cache.Add("ListItemCacheName", value, ..);
}
}
}
public void Refresh()
{
lock (_readLock)
{
Cache.Remove("ListItemCacheName");
var oListItems = DoQueryToReturnItems();
ListItem = oListItems;
}
}
}
You can make the method and property static if you do not need a CacheData instance.

How to freeze a popsicle in .NET (make a class immutable)

I'm designing a class that I wish to make readonly after a main thread is done configuring it, i.e. "freeze" it. Eric Lippert calls this popsicle immutability. After it is frozen, it can be accessed by multiple threads concurrently for reading.
My question is how to write this in a thread safe way that is realistically efficient, i.e. without trying to be unnecessarily clever.
Attempt 1:
public class Foobar
{
private Boolean _isFrozen;
public void Freeze() { _isFrozen = true; }
// Only intended to be called by main thread, so checks if class is frozen. If it is the operation is invalid.
public void WriteValue(Object val)
{
if (_isFrozen)
throw new InvalidOperationException();
// write ...
}
public Object ReadSomething()
{
return it;
}
}
Eric Lippert seems to suggest this would be OK in this post.
I know writes have release semantics, but as far as I understand this only pertains to ordering, and it doesn't necessarily mean that all threads will see the value immediately after the write. Can anyone confirm this? This would mean this solution is not thread safe (this may not be the only reason of course).
Attempt 2:
The above, but using Interlocked.Exchange to ensure the value is actually published:
public class Foobar
{
private Int32 _isFrozen;
public void Freeze() { Interlocked.Exchange(ref _isFrozen, 1); }
public void WriteValue(Object val)
{
if (_isFrozen == 1)
throw new InvalidOperationException();
// write ...
}
}
Advantage here would be that we ensure the value is published without suffering the overhead on every read. If none of the reads are moved before the write to _isFrozen as the Interlocked method uses a full memory barrier I would guess this is thread safe. However, who knows what the compiler will do (and according to section 3.10 of the C# spec that seems like quite a lot), so I don't know if this is threadsafe.
Attempt 3:
Also do the read using Interlocked.
public class Foobar
{
private Int32 _isFrozen;
public void Freeze() { Interlocked.Exchange(ref _isFrozen, 1); }
public void WriteValue(Object val)
{
if (Interlocked.CompareExchange(ref _isFrozen, 0, 0) == 1)
throw new InvalidOperationException();
// write ...
}
}
Definitely thread safe, but it seems a little wasteful to have to do the compare exchange for every read. I know this overhead is probably minimal, but I'm looking for a reasonably efficient method (although perhaps this is it).
Attempt 4:
Using volatile:
public class Foobar
{
private volatile Boolean _isFrozen;
public void Freeze() { _isFrozen = true; }
public void WriteValue(Object val)
{
if (_isFrozen)
throw new InvalidOperationException();
// write ...
}
}
But Joe Duffy declared "sayonara volatile", so I won't consider this a solution.
Attempt 5:
Lock everything, seems a bit overkill:
public class Foobar
{
private readonly Object _syncRoot = new Object();
private Boolean _isFrozen;
public void Freeze() { lock(_syncRoot) _isFrozen = true; }
public void WriteValue(Object val)
{
lock(_syncRoot) // as above we could include an attempt that reads *without* this lock
{
if (_isFrozen)
throw new InvalidOperationException();
// write ...
}
}
}
Also seems definitely thread safe, but has more overhead than using the Interlocked approach above, so I would favour attempt 3 over this one.
And then I can come up with at least some more (I'm sure there are many more):
Attempt 6: use Thread.VolatileWrite and Thread.VolatileRead, but these are supposedly a little on the heavy side.
Attempt 7: use Thread.MemoryBarrier, seems a little too internal.
Attempt 8: create an immutable copy - don't want to do this
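For reference, attempt 6 might look roughly like the sketch below if written with the System.Threading.Volatile class, which was added in .NET 4.5 and, as far as I know, is lighter than Thread.VolatileRead/VolatileWrite. FreezableValue and its members are hypothetical names, and the sketch assumes the single-writer scenario of the question: it does not protect a WriteValue call that races with Freeze.

```csharp
using System;
using System.Threading;

public class FreezableValue
{
    private bool _isFrozen;
    private object _value;

    // Release write: everything written before Freeze() is visible to
    // any thread that subsequently observes _isFrozen == true.
    public void Freeze() { Volatile.Write(ref _isFrozen, true); }

    public bool IsFrozen { get { return Volatile.Read(ref _isFrozen); } }

    // Intended to be called only by the configuring thread.
    public void WriteValue(object val)
    {
        if (Volatile.Read(ref _isFrozen))
            throw new InvalidOperationException("Object is frozen.");
        _value = val;
    }

    public object ReadSomething() { return _value; }
}
```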
Summarising:
which attempt would you use and why (or how would you do it if entirely different)? (i.e. what is the best way for publishing a value once that is then read concurrently, while being reasonably efficient without being overly "clever"?)
does .NET's memory model "release" semantics of writes imply that all other threads see updates (cache coherency etc.)? I generally don't want to think too much about this, but it's nice to have an understanding.
EDIT:
Perhaps my question wasn't clear, but I am looking in particular for reasons as to why the above attempts are good or bad. Note that I am talking here about a scenario of one single writer that writes then freezes before any concurrent reads. I believe attempt 1 is OK but I'd like to know exactly why (as I wonder if reads could be optimized away somehow, for example).
I care less about whether or not this is good design practice but more about the actual threading aspect of it.
Many thanks for the response the question received, but I have chosen to mark this as an answer myself because I feel that the answers given do not quite answer my question and I do not want to give the impression to anyone visiting the site that the marked answer is correct simply because it was automatically marked as such due to the bounty expiring.
Furthermore I do not think the answer with the highest number of votes was overwhelmingly voted for, not enough to mark it automatically as an answer.
I am still leaning to attempt #1 being correct, however, I would have liked some authoritative answers. I understand x86 has a strong model, but I don't want to (and shouldn't) code for a particular architecture, after all that's one of the nice things about .NET.
If you are in doubt about the answer, go for one of the locking approaches, perhaps with the optimizations shown here to avoid a lot of contention on the lock.
Maybe slightly off topic but just out of curiosity :) Why don't you use "real" immutability? e.g. making Freeze() return an immutable copy (without "write methods" or any other possibility to change the inner state) and using this copy instead of the original object. You could even go without changing the state and return a new copy (with the changed state) on each write operation instead (afaik the string class works this way). "Real immutability" is inherently thread safe.
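A minimal sketch of that "real immutability" variant, assuming the state is simply a list of strings (FoobarBuilder and FrozenFoobar are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Mutable object used by the configuring thread only.
public sealed class FoobarBuilder
{
    private readonly List<string> _items = new List<string>();

    public void Add(string item) { _items.Add(item); }

    // Returns an immutable snapshot; the builder itself stays mutable.
    public FrozenFoobar Freeze() { return new FrozenFoobar(_items); }
}

// Has no write methods, so no frozen flag has to be checked on reads.
public sealed class FrozenFoobar
{
    private readonly ReadOnlyCollection<string> _items;

    public FrozenFoobar(IEnumerable<string> items)
    {
        // Take a private copy so that later changes to the builder do not leak in.
        _items = new List<string>(items).AsReadOnly();
    }

    public ReadOnlyCollection<string> Items { get { return _items; } }
}
```

The frozen instance still has to be handed to the reader threads in a safe way (for example before they are started), but after that no synchronization is needed to read it.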
I vote for Attempt 5, use the lock(this) implementation.
This is the most reliable means of making this work. Reader/writer locks could be employed, but to very little gain. Just go with using a normal lock.
If necessary you could improve the 'frozen' performance by first checking _isFrozen and then locking:
void Freeze() { lock (this) _isFrozen = true; }
object ReadValue()
{
if (_isFrozen)
return Read();
else
lock (this) return Read();
}
void WriteValue(object value)
{
lock (this)
{
if (_isFrozen) throw new InvalidOperationException();
Write(value);
}
}
If you really create, fill and freeze the object before showing it to other threads, then you don't need anything special to deal with thread-safety (the strong memory model of .NET is already your guarantee), so the solution 1 is valid.
But if you give the unfrozen object to another thread (or if you are simply creating your class without knowing how users will use it), then the solution that returns a new, fully immutable instance is probably better. In this case, the mutable instance is like the StringBuilder and the immutable instance is like the string. If you need an extra guarantee, the mutable instance may check its creator thread and throw exceptions if it is used from any other thread (in all methods... to avoid possible partial reads).
Attempt 2 is thread safe on x86 and other processors that have a strong memory model, but how I would do it is to make thread safety the consumer's problem, because there is no way for you to do it efficiently within the consumed code. Consider:
if(!foo.frozen)
{
foo.apropery = "avalue";
}
The thread safety of the frozen property and the guard code in apropery's setter doesn't really matter, because even if they are perfectly thread safe you still have a race condition. Instead I would write it like
lock(foo)
{
if(!foo.frozen)
{
foo.apropery = "avalue";
}
}
and have neither of the properties inherently thread safe.
#1 - reader not threadsafe - I believe problem would be in reader side, not writer (code not shown)
#2 - reader not threadsafe - same as #1
#3 - promising, read check can be optimized out for most cases (when CPU caches are in sync)
Attempt 3:
Also do the read using Interlocked.
public class Foobar {
private object _syncRoot = new object();
private object _val; // the value guarded by _isFrozen and _syncRoot
private int _isFrozen = 0; // perf compiler warning, but training code, so show defaults
// Why Exchange to 1 then throw away result. Best to just increment.
//public void Freeze() { Interlocked.Exchange(ref _isFrozen, 1); }
public void Freeze() { Interlocked.Increment(ref _isFrozen); }
public void WriteValue(Object val) {
// if this core can see _isFrozen then no special lock or sync needed
if (_isFrozen != 0)
throw new InvalidOperationException();
lock(_syncRoot) {
if (_isFrozen != 0)
throw new InvalidOperationException(); // the 'throw' is 100x-1000x more costly than the lock, just eat it
_val = val;
}
}
public object Read() {
// frozen is one-way, if one-way state has been published
// to my local CPU cache then just read _val.
// There are very strange corner cases when _isFrozen and _val fields are in
// different cache lines, but should be nearly impossible to hit unless
// dealing with very large structs (make it more likely to cross
// 4k cache line).
if (_isFrozen != 0)
return _val;
// else
lock(_syncRoot) { // _isFrozen is 0 here
if (_isFrozen != 0) // if _isFrozen is 1 here we just collided with writer using lock on other thread, or our CPU cache was out of sync and lock() forced the dirty cache line to be read from main memory
return _val;
throw new InvalidOperationException(); // throw is 100x-1000x more expensive than lock, eat the cost of lock
}
}
}
Joe Duffy's post about 'volatile is dead' is, I think, in the context of his next-gen CLR/OS architecture and of the CLR on ARM. For those of us on multi-core x64/x86, I think volatile is fine. If perf is the primary concern, I suggest you measure the code above and compare it to volatile.
Unlike other folks posting answers I wouldn't jump straight to lock() if you have lots of readers (3 or more threads likely to read the same object at the same time). But in your sample you mix perf-sensitive question with exceptions when a collision happens, which doesn't make much sense. If you're using exceptions, then you can also use other higher-level constructs.
If you want complete safety but need to optimize for lots of concurrent readers change lock()/Monitor to ReaderWriterLockSlim.
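As a rough sketch of that swap (not a measured or tuned implementation), the class below keeps the semantics of the Foobar example but replaces lock() with ReaderWriterLockSlim. The class name FrozenBox is made up for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical variant of Foobar using ReaderWriterLockSlim instead of lock().
public class FrozenBox
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private object _val;
    private bool _isFrozen;

    public void Freeze()
    {
        _rw.EnterWriteLock();
        try { _isFrozen = true; }
        finally { _rw.ExitWriteLock(); }
    }

    public void WriteValue(object val)
    {
        _rw.EnterWriteLock();          // exclusive: blocks readers and other writers
        try
        {
            if (_isFrozen) throw new InvalidOperationException("Already frozen.");
            _val = val;
        }
        finally { _rw.ExitWriteLock(); }
    }

    public object Read()
    {
        _rw.EnterReadLock();           // shared: many readers may hold this at once
        try
        {
            if (!_isFrozen) throw new InvalidOperationException("Not frozen yet.");
            return _val;
        }
        finally { _rw.ExitReadLock(); }
    }
}
```

Whether this beats a plain lock() depends on the reader/writer mix, so it is worth measuring both.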
.NET has new primitives to handle publishing values. Take a look at Rx. It can be very fast and lockless for some cases (I think they use optimizations similar to above).
If written multiple times but only one value is kept - in Rx that is "new ReplaySubject(bufferSize: 1)". If you try it you might be surprised how fast it is. At the same time I applaud your attempt to learn this level of detail.
If you want to go lockless get over your distaste for Thread.MemoryBarrier(). It is extremely important. But it has the same gotchas as volatile as described by Joe Duffy - it was designed as a hint to the compiler & CPU to prevent reordering of memory reads (which take a long time in CPU terms, so they are aggressively reordered when there are no hints present). When this reordering is combined with CLR constructs like auto-inlining of functions, you can see very surprising behavior at the memory & register level. MemoryBarrier() just disables those single-threaded memory access assumptions that the CPU and CLR use most of the time.
Perhaps my question wasn't clear, but I am looking in particular for reasons as to why the above attempts are good or bad. Note that I am talking here about a scenario of one single writer that writes then freezes before any concurrent reads. I believe attempt 1 is OK but I'd like to know exactly why (as I wonder if reads could be optimized away somehow, for example). I care less about whether or not this is good design practice but more about the actual threading aspect of it.
Ok, now I better understand what you are doing and looking for in a response. Allow me to elaborate on my previous answer promoting the use of locks by first addressing each of your attempts.
Attempt 1:
The approach of using a simple class that has no synchronization primitives of any form is entirely viable in your example. Since the 'authoring' thread is the only thread having access to this class during its mutating state, this should be safe. If and only if another thread has the potential to access it before the class is 'frozen' would you need to provide synchronization. Essentially, it's not possible for a thread to have a cache of something it has never seen.
Aside from a thread having a cached copy of the internal state of this list, there is one other concurrency issue that you should be concerned with. You should consider write reordering by the authoring thread. Your example solution doesn't have enough code for me to address this, but the process of handing this 'frozen' list to another thread is the heart of the issue. Are you using Interlocked.Exchange or writing to a volatile state?
I still advocate that this is not the best approach, simply because there is no guarantee that another thread has not seen the instance while it's mutating.
Attempt 2:
Attempt 2 should not be used. If you are using atomic writes to a member, you should also use atomic reads. I would never recommend one without the other, as without both reads and writes being atomic you haven't gained anything. The correct application of atomic reads and writes is your 'Attempt 3'.
Attempt 3:
This will guarantee an exception is thrown if a thread has attempted to mutate a frozen list. However, it makes no assertion that a read is only acceptable on a frozen instance. This, IMHO, is just as bad as accessing our _isFrozen variable with atomic and non-atomic accessors. If you are going to say that it's important to safeguard writes, then you should always safeguard reads. One without the other is just 'odd'.
Overlooking my own feelings about writing code that guards writes but not reads, this is an acceptable approach given your specific uses. I have one writer, I write, I freeze, then I make it available to readers. Under this scenario your code works correctly. You rely on the atomic operation on the set of _isFrozen to provide the required memory barrier prior to handing the class to another thread.
In a nutshell this approach works, but again if a thread has an instance that is not frozen it's going to break.
Attempt 4:
While at heart this is nearly the same as attempt 3 (given one writer) there is one big difference. In this example, if you check _isFrozen in the reader then every access will require a memory barrier. This is unnecessary overhead once the list is frozen.
Still this has the same issue as Attempt 3 in that no assertions are made about the state of _isFrozen during the read so the performance should be identical in your example usage.
Attempt 5:
As I said this is my preference given the modification to read as appears in my other answer.
Attempt 6:
Is essentially the same as #4.
Attempt 7:
You could solve your specific needs with a Thread.MemoryBarrier. Essentially using the code from Attempt 1, you create the instance, call Freeze(), add your Thread.MemoryBarrier, and then share the instance (or share it within a lock). This should work great, again only under your limited use case.
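The create / Freeze() / barrier / share sequence described for Attempt 7 might look like the sketch below. PopsicleList and Publisher are hypothetical stand-ins for the unsynchronized Attempt 1 class and for whatever code hands the instance to readers. Volatile.Write already gives release semantics on its own, so the explicit barrier is shown only because the answer calls for it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical unsynchronized "Attempt 1"-style class.
public class PopsicleList
{
    private readonly List<int> _items = new List<int>();
    private bool _isFrozen;

    public void Add(int item)
    {
        if (_isFrozen) throw new InvalidOperationException("List is frozen.");
        _items.Add(item);
    }

    public void Freeze() { _isFrozen = true; }
    public int Count => _items.Count;
}

public static class Publisher
{
    // Readers only ever see a list that was frozen before it was published.
    private static PopsicleList _shared;

    public static void BuildAndPublish()
    {
        var list = new PopsicleList();      // visible to this thread only
        list.Add(1);
        list.Add(2);
        list.Freeze();
        Thread.MemoryBarrier();             // full fence: the writes above are not reordered past this point
        Volatile.Write(ref _shared, list);  // publish the reference to other threads
    }

    public static PopsicleList TryGet() => Volatile.Read(ref _shared);
}
```

This only holds under the limited use case above: a single writer, and no reader obtaining the instance before BuildAndPublish has finished.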
Attempt 8:
Without knowing more about this, I can't advise on the cost of the copy.
Summary
Again I prefer using a class that has some threading guarantee or none at all. Creating a class that is only 'partially' thread safe is, IMO, dangerous.
In the words of a famous jedi master:
Do. Or do not. There is no try.
The same goes for thread safety. The class should either be thread safe or not. Taking this approach you are left with either using my augmentation of Attempt 5, or using Attempt 7. Given the choice, I would never recommend #7.
So my recommendation stands firmly behind a completely thread-safe version. The performance difference between the two is so small it's almost non-existent. The reader threads will never contend for the lock, simply because of your usage scenario of having a single writer. Yet, if they do, proper behavior is still a certainty. Thus, as your code changes over time and suddenly your instance is being shared prior to being frozen, you don't wind up with a race condition that crashes your program. Thread safe, or not: don't be half-in, or you wind up with a nasty surprise someday.
My preference is all classes shared by more than one thread are one of two types:
Completely immutable.
Completely Thread-safe.
Since a popsicle list is not immutable by design it does not fit #1. Therefore if you are going to share the object across threads it should fit #2.
Hopefully all this ranting further explains my reasoning :)
_syncRoot
Many people have noticed that I skipped the use of a _syncRoot on my locking implementation. While the reasons to use _syncRoot are valid they are not always necessary. In your example usage where you have a single writer the use of lock(this) should suffice nicely without adding another heap allocation for _syncRoot.
Is the thing constructed and written to, then permanently frozen and read multiple times?
Or do you freeze and unfreeze and refreeze it multiple times?
If it's the former, then perhaps the "is frozen" check should be in the reader method not the writer method (to prevent it reading before it's frozen).
Or, if it's the latter, then the use case you need to beware of is:
Main thread invokes the writer method, finds that it's not frozen, and therefore begins to write
Before the write has finished, someone tries to freeze the object and then reads from it, while the other (main) thread is still writing
In the latter case, Google shows a lot of results for multiple reader single writer which you might find interesting.
In general, each mutable object should have precisely one clearly-defined "owner"; shared objects should be immutable. Popsicles should not be accessible by multiple threads until after they are frozen.
Personally, I don't like forms of popsicle immutability with an exposed "freeze" method. I think a cleaner approach is to have AsMutable and AsImmutable methods (each of which would simply return the object unmodified when appropriate). Such an approach can allow for more robust promises about immutability. For example, if an "unshared mutable object" is being mutated while its AsImmutable member is being called (behavior which would be contrary to the object being "unshared"), the state of the data in the copy may be indeterminate, but whatever was returned would be immutable. By contrast, if one thread froze an object and then assumed it was immutable while another thread was writing to it, the "immutable" object could end up changing after it was frozen and its values were read.
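One way the AsMutable/AsImmutable idea could be laid out is sketched below. The type names and the single int payload are assumptions made for illustration; a real class would copy whatever state it holds.

```csharp
using System;

// Common read-only view; both flavours can convert to either form.
public abstract class Thingie
{
    public abstract int Value { get; }
    public abstract MutableThingie AsMutable();
    public abstract ImmutableThingie AsImmutable();
}

public sealed class MutableThingie : Thingie
{
    private int _value;
    public MutableThingie(int value) { _value = value; }
    public override int Value => _value;
    public void SetValue(int value) { _value = value; }

    public override MutableThingie AsMutable() => this;                              // already mutable
    public override ImmutableThingie AsImmutable() => new ImmutableThingie(_value);  // copies the state
}

public sealed class ImmutableThingie : Thingie
{
    private readonly int _value;
    public ImmutableThingie(int value) { _value = value; }
    public override int Value => _value;

    public override MutableThingie AsMutable() => new MutableThingie(_value);        // copies the state
    public override ImmutableThingie AsImmutable() => this;                          // already immutable
}
```

Because the immutable instance has only readonly state, nothing a misbehaving writer does to the mutable original can change it after the copy is made.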
Edit
Based on further description, I would suggest having code which writes to the object do so within a monitor lock, and having the freeze routine look something like:
public Thingie Freeze() // Returns the object in question
{
if (isFrozen) // Private field
return this;
else
return DoFreeze();
}
Thingie DoFreeze()
{
if (Monitor.TryEnter(whatever))
{
isFrozen = true;
return this;
}
else if (isFrozen)
return this;
else
throw new InvalidOperationException("Object in use by writer");
}
The Freeze method may be called any number of times by any number of threads; it should be short enough to be inlined (though I haven't profiled it), and should thus take almost no time to execute. If the first access of the object in any thread is via the Freeze method, that should guarantee proper visibility under any reasonable memory model (even if the thread didn't see the updates to the object performed by the thread which created and originally froze it, it would perform the TryEnter, which would guarantee a memory barrier, and after that failed it would notice that the object was frozen and return it).
If code which is going to write the object acquires the lock first, an attempt to write to a frozen object could deadlock. If one would rather have such code throw an exception, one can use TryEnter and throw an exception if it can't get the lock.
The object used for locking should be something which is exclusively held by the object to be frozen. If the object to be frozen doesn't hold a purely-private reference to anything, one could either lock on this or create a private object purely for locking purposes. Note that it is safe to abandon 'entered' monitor locks without cleanup; the GC will simply forget about them, since if no references exist to a lock there's no way anybody will ever care (or could even ask) whether the lock was entered at the time it was abandoned.
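A sketch of the writer side under that scheme, using TryEnter so that a write to a frozen object throws instead of blocking forever. FreezableValue and its int payload are hypothetical; the Freeze logic follows the routine shown above, folded into one method.

```csharp
using System;
using System.Threading;

public class FreezableValue
{
    private readonly object _gate = new object(); // private object used purely for locking
    private volatile bool _isFrozen;
    private int _value;

    public int Value => _value;

    public void Write(int value)
    {
        // Fail fast: after Freeze() the lock stays held by the freezing thread.
        if (!Monitor.TryEnter(_gate))
            throw new InvalidOperationException("Object is frozen or in use by another writer.");
        try
        {
            // Monitors are reentrant, so the freezing thread itself gets this far.
            if (_isFrozen) throw new InvalidOperationException("Object is frozen.");
            _value = value;
        }
        finally { Monitor.Exit(_gate); }
    }

    public FreezableValue Freeze()
    {
        if (_isFrozen) return this;
        if (Monitor.TryEnter(_gate))
        {
            _isFrozen = true;   // the lock is deliberately never released
            return this;
        }
        if (_isFrozen) return this;
        throw new InvalidOperationException("Object in use by writer");
    }
}
```

The explicit _isFrozen check inside Write matters because Monitor is reentrant: the thread that froze the object would otherwise still be able to write to it.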
I am not sure in terms of cost how the following approach will do, but it is a bit different. Only initially if there are multiple threads trying to write value simultaneously will they encounter locks. Once it is frozen all later calls will get the exception directly.
Attempt 9:
public class Foobar
{
private readonly Object _syncRoot = new Object();
private object _val;
private Boolean _isFrozen;
private Action<object> WriteValInternal;
public void Freeze() { _isFrozen = true; }
public Foobar()
{
WriteValInternal = BeforeFreeze;
}
private void BeforeFreeze(object val)
{
lock (_syncRoot)
{
if (_isFrozen == false)
{
//Write the values....
_val = val;
//...
//...
//...
//and then modify the write value function
WriteValInternal = AfterFreeze;
Freeze();
}
else
{
throw new InvalidOperationException();
}
}
}
private void AfterFreeze(object val)
{
throw new InvalidOperationException();
}
public void WriteValue(Object val)
{
WriteValInternal(val);
}
public Object ReadSomething()
{
return _val;
}
}
Have you checked out Lazy<T>?
http://msdn.microsoft.com/en-us/library/dd642331.aspx
There is also ThreadLocal<T>:
http://msdn.microsoft.com/en-us/library/dd642243.aspx
And actually, looking further, there is a Freezable class...
http://msdn.microsoft.com/en-us/library/vstudio/ms602734(v=vs.100).aspx
You may achieve this using PostSharp.
First define an interface:
public interface IPseudoImmutable
{
bool IsFrozen { get; }
bool Freeze();
}
Then derive your attribute from InstanceLevelAspect like this:
/// <summary>
/// implement by divyang
/// </summary>
[Serializable]
[IntroduceInterface(typeof(IPseudoImmutable),
AncestorOverrideAction = InterfaceOverrideAction.Ignore, OverrideAction = InterfaceOverrideAction.Fail)]
public class PseudoImmutableAttribute : InstanceLevelAspect, IPseudoImmutable
{
private volatile bool isFrozen;
#region "IPseudoImmutable"
[IntroduceMember]
public bool IsFrozen
{
get
{
return this.isFrozen;
}
}
[IntroduceMember(IsVirtual = true, OverrideAction = MemberOverrideAction.Fail)]
public bool Freeze()
{
if (!this.isFrozen)
{
this.isFrozen = true;
}
return this.IsFrozen;
}
#endregion
[OnLocationSetValueAdvice]
[MulticastPointcut(Targets = MulticastTargets.Property | MulticastTargets.Field)]
public void OnValueChange(LocationInterceptionArgs args)
{
if (!this.IsFrozen)
{
args.ProceedSetValue();
}
}
}
public class ImmutableException : Exception
{
/// <summary>
/// The location name.
/// </summary>
private readonly string locationName;
/// <summary>
/// Initializes a new instance of the <see cref="ImmutableException"/> class.
/// </summary>
/// <param name="message">
/// The message.
/// </param>
public ImmutableException(string message)
: base(message)
{
}
public ImmutableException(string message, string locationName)
: base(message)
{
this.locationName = locationName;
}
public string LocationName
{
get
{
return this.locationName;
}
}
}
Then apply it to your class like this:
[PseudoImmutableAttribute]
public class MyTestClass
{
public string MyString { get; set; }
public int MyInitval { get; set; }
}
Then run it from multiple threads:
/// <summary>
/// The program.
/// </summary>
public class Program
{
/// <summary>
/// The main.
/// </summary>
/// <param name="args">
/// The args.
/// </param>
public static void Main(string[] args)
{
Console.Title = "Divyang Demo ";
var w = new Worker();
w.Run();
Console.ReadLine();
}
}
internal class Worker
{
private object SyncObject = new object();
public Worker()
{
var r = new Random();
this.ObjectOfMyTestClass = new MyTestClass { MyInitval = r.Next(500) };
}
public MyTestClass ObjectOfMyTestClass { get; set; }
public void Run()
{
Task readWork;
readWork = Task.Factory.StartNew(
action: () =>
{
for (;;)
{
System.Threading.Thread.Sleep(1000); // Task.Delay(1000) without await/Wait() would return immediately
try
{
this.DoReadWork();
}
catch (Exception exception)
{
// Console.SetCursorPosition(80,80);
// Console.SetBufferSize(100,100);
Console.WriteLine("Read Exception : {0}", exception.Message);
}
}
// ReSharper disable FunctionNeverReturns
});
Task writeWork;
writeWork = Task.Factory.StartNew(
action: () =>
{
for (int i = 0; i < int.MaxValue; i++)
{
System.Threading.Thread.Sleep(1000); // Task.Delay(1000) without await/Wait() would return immediately
try
{
this.DoWriteWork();
}
catch (Exception exception)
{
Console.SetCursorPosition(80, 80);
Console.SetBufferSize(100, 100);
Console.WriteLine("write Exception : {0}", exception.Message);
}
if (i == 5000)
{
((IPseudoImmutable)this.ObjectOfMyTestClass).Freeze();
}
}
});
Task.WaitAll(readWork, writeWork); // WaitAll() with no tasks returns immediately
}
/// <summary>
/// The do read work.
/// </summary>
public void DoReadWork()
{
// ThreadId where reading is done
var threadId = System.Threading.Thread.CurrentThread.ManagedThreadId;
// printing on screen
lock (this.SyncObject)
{
Console.SetCursorPosition(0, 0);
Console.SetBufferSize(290, 290);
Console.WriteLine("\n");
Console.WriteLine("Read Start");
Console.WriteLine("Read => Thread Id: {0} ", threadId);
Console.WriteLine("Read => this.objectOfMyTestClass.MyInitval: {0} ", this.ObjectOfMyTestClass.MyInitval);
Console.WriteLine("Read => this.objectOfMyTestClass.MyString: {0} ", this.ObjectOfMyTestClass.MyString);
Console.WriteLine("Read End");
Console.WriteLine("\n");
}
}
/// <summary>
/// The do write work.
/// </summary>
public void DoWriteWork()
{
// ThreadId where writing is done
var threadId = System.Threading.Thread.CurrentThread.ManagedThreadId;
// random number generator
var r = new Random();
var count = r.Next(15);
// new value for Int property
var tempInt = r.Next(5000);
this.ObjectOfMyTestClass.MyInitval = tempInt;
// new value for string Property
var tempString = "Randome" + r.Next(500).ToString(CultureInfo.InvariantCulture);
this.ObjectOfMyTestClass.MyString = tempString;
// printing on screen
lock (this.SyncObject)
{
Console.SetBufferSize(290, 290);
Console.SetCursorPosition(125, 25);
Console.WriteLine("\n");
Console.WriteLine("Write Start");
Console.WriteLine("Write => Thread Id: {0} ", threadId);
Console.WriteLine("Write => this.objectOfMyTestClass.MyInitval: {0} and New Value :{1} ", this.ObjectOfMyTestClass.MyInitval, tempInt);
Console.WriteLine("Write => this.objectOfMyTestClass.MyString: {0} and New Value :{1} ", this.ObjectOfMyTestClass.MyString, tempString);
Console.WriteLine("Write End");
Console.WriteLine("\n");
}
}
}
However, this will still allow you to change the contents of properties such as arrays and lists. If you add more logic to the aspect, it may work for all types of properties and fields.
I'd do something like this, inspired by C++ movable types. Just remember not to access the object after Freeze/Thaw.
Of course, you can add a _data != null check/throw if you want to be clear about why the user gets an NRE if accessing after thaw/freeze.
public class Data
{
public string _foo;
public int _bar;
}
public class Mutable
{
private Data _data = new Data();
public Mutable() {}
public Mutable(Data data) => _data = data; // needed by Frozen.Thaw()
public string Foo { get => _data._foo; set => _data._foo = value; }
public int Bar { get => _data._bar; set => _data._bar = value; }
public Frozen Freeze()
{
var f = new Frozen(_data);
_data = null;
return f;
}
}
public class Frozen
{
private Data _data;
public Frozen(Data data) => _data = data;
public string Foo => _data._foo;
public int Bar => _data._bar;
public Mutable Thaw()
{
var m = new Mutable(_data);
_data = null;
return m;
}
}

Pattern for concurrent cache sharing

Ok, I was a little unsure how best to name this problem :) But assume this scenario: you're going out and fetching some webpages (with various URLs) and caching them locally. The cache part is pretty easy to solve even with multiple threads.
However, imagine that one thread starts fetching a URL, and a couple of milliseconds later another thread wants the same URL. Is there any good pattern for making the second thread's method wait for the first one to fetch the page, insert it into the cache and return it, so you don't have to do multiple requests? With little enough overhead that it's worth doing even for requests that take about 300-700 ms? And without locking requests for other URLs?
Basically, when requests for identical URLs come in shortly after each other, I want the second request to "piggyback" on the first request.
I had a loose idea of having a dictionary where you insert an object, keyed by URL, when you start fetching a page, and lock on it. If there is already a matching key, the thread gets that object, locks on it and then tries to fetch the URL from the actual cache.
I'm a little unsure of the particulars needed to make it really thread-safe, however; using ConcurrentDictionary might be one part of it...
Is there any common pattern and solutions for scenarios like this?
Breakdown wrong behavior:
Thread 1: Checks the cache, the page doesn't exist so it starts fetching the URL
Thread 2: Starts fetching the same URL since it still doesn't exist in the cache
Thread 1: Finishes and inserts into the cache, returns the page
Thread 2: Finishes and also inserts into the cache (or discards it), returns the page
Breakdown correct behavior:
Thread 1: Checks the cache, the page doesn't exist so it starts fetching the URL
Thread 2: Wants the same URL, but sees it's currently being fetched so waits on thread 1
Thread 1: Finishes and inserts into the cache, returns the page
Thread 2: Notices that thread 1 is finished and returns the page that thread 1 fetched
EDIT
Most solutions so far seem to misunderstand the problem and only address the caching. As I said, that isn't the problem. The problem is, when doing an external web fetch, how to make a second fetch that starts before the first one has been cached use the result of the first rather than doing a second request.
You could use a ConcurrentDictionary<K,V> and a variant of double-checked locking:
public static string GetUrlContent(string url)
{
object value1 = _cache.GetOrAdd(url, new object());
if (value1 == null) // null check only required if content
return null; // could legitimately be a null string
var urlContent = value1 as string;
if (urlContent != null)
return urlContent; // got the content
// value1 isn't a string which means that it's an object to lock against
lock (value1)
{
object value2 = _cache[url];
// at this point value2 will *either* be the url content
// *or* the object that we already hold a lock against
if (value2 != value1)
return (string)value2; // got the content
urlContent = FetchContentFromTheWeb(url); // todo
_cache[url] = urlContent;
return urlContent;
}
}
private static readonly ConcurrentDictionary<string, object> _cache =
new ConcurrentDictionary<string, object>();
EDIT: My code is quite a bit uglier now, but uses a separate lock per URL. This allows different URLs to be fetched asynchronously, however each URL will only be fetched once.
public class UrlFetcher
{
static Hashtable cache = Hashtable.Synchronized(new Hashtable());
public static String GetCachedUrl(String url)
{
// exactly 1 fetcher is created per URL
InternalFetcher fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
lock( cache.SyncRoot )
{
fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
fetcher = new InternalFetcher(url);
cache[url] = fetcher;
}
}
}
// blocks all threads requesting the same URL
return fetcher.Contents;
}
/// <summary>Each fetcher locks on itself and is initialized with null contents.
/// The first thread to call fetcher.Contents will cause the fetch to occur, and
/// block until completion.</summary>
private class InternalFetcher
{
private String url;
private String contents;
public InternalFetcher(String url)
{
this.url = url;
this.contents = null;
}
public String Contents
{
get
{
if( contents == null )
{
lock( this ) // "this" is an instance of InternalFetcher...
{
if( contents == null )
{
contents = FetchFromWeb(url);
}
}
}
return contents;
}
}
}
}
Will the Semaphore please stand up! stand up! stand up!
Use a Semaphore; you can easily synchronize your threads with it.
There are two cases:
you are trying to load a page that is currently being cached
you are saving the cache to a file while a page is being loaded from it.
In both scenarios you will run into trouble.
This is just like the readers-writers problem, a classic synchronization problem in operating systems. While a thread is rebuilding the cache or starting to cache a page, no thread should read from it. If a thread is reading, the writer should wait until the read finishes before replacing the cache entry, and no two threads should cache the same page into the same file. All readers can read from the cache at any time as long as no writer is writing to it.
You should read some of the semaphore samples on MSDN; it is very easy to use. The thread that wants to do something waits on the semaphore: if the resource can be granted it does its work, otherwise it sleeps until it is woken up when the resource is ready.
Disclaimer: This might be a n00bish answer. Please pardon me, if it is.
I'd recommend using some shared dictionary object with locks to keep track of the URLs that are currently being fetched or have already been fetched.
At every request, check the URL against this object.
If an entry for the URL is present, check the cache. (This means one of the threads has either fetched it or is currently fetching it.)
If it's available in the cache, use it; otherwise put the current thread to sleep for a while and check back again. (If it's not in the cache, some thread is still fetching it, so wait until it's done.)
If the entry is not found in the dictionary object, add the url to it and send the request. Once it obtains a response, add it to cache.
This logic should work, however, you would need to take care of cache expiration and removal of the entry from the dictionary object.
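The steps above could be sketched roughly as follows. This is one possible reading of the description, not a tuned implementation: PollingUrlCache, the fetch delegate and the 10 ms poll interval are all assumptions, and the in-flight marker is removed once a fetch completes so the tracking dictionary does not grow. Cache expiration is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class PollingUrlCache
{
    private readonly ConcurrentDictionary<string, string> _cache = new ConcurrentDictionary<string, string>();
    // URLs that some thread is fetching right now.
    private readonly ConcurrentDictionary<string, bool> _inFlight = new ConcurrentDictionary<string, bool>();
    private readonly Func<string, string> _fetch; // hypothetical fetch delegate supplied by the caller

    public PollingUrlCache(Func<string, string> fetch) { _fetch = fetch; }

    public string Get(string url)
    {
        while (true)
        {
            if (_cache.TryGetValue(url, out var content))
                return content;                       // already fetched

            if (_inFlight.TryAdd(url, true))          // we are the fetching thread
            {
                try
                {
                    // Re-check: another thread may have finished between the two lookups.
                    if (_cache.TryGetValue(url, out content))
                        return content;
                    content = _fetch(url);
                    _cache[url] = content;
                    return content;
                }
                finally
                {
                    _inFlight.TryRemove(url, out _);  // also runs if the fetch throws, so waiters can retry
                }
            }

            Thread.Sleep(10);                         // someone else is fetching: wait and check again
        }
    }
}
```

Polling trades a little latency for simplicity; a waiter notices the result at most one sleep interval after the fetch finishes.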
My solution is to use an AtomicBoolean to control access to the database when the cache entry has timed out or does not exist.
At any moment only one thread (I call it read-th) can access the database; the other threads spin until read-th returns the data and writes it into the cache.
Here is the code, implemented in Java:
public class CacheBreakDownDefender<K, R> {
/**
* false = do not write null to cache when get null value from database;
*/
private final boolean writeNullToCache;
/**
* cache different query key
*/
private final ConcurrentHashMap<K, AtomicBoolean> selectingDBTagMap = new ConcurrentHashMap<>();
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(false));
}
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType, boolean writeNullToCache) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(writeNullToCache));
}
private CacheBreakDownDefender(boolean writeNullToCache) {
this.writeNullToCache = writeNullToCache;
}
public R readFromCache(K key, Function<K, ? extends R> getFromCache, Function<K, ? extends R> getFromDB, BiConsumer<K, R> writeCache) throws InterruptedException {
R result = getFromCache.apply(key);
if (result == null) {
final AtomicBoolean selectingDB = selectingDBTagMap.computeIfAbsent(key, x -> new AtomicBoolean(false));
if (selectingDB.compareAndSet(false, true)) {
try {
result = getFromDB.apply(key);
if (result != null || writeNullToCache) {
writeCache.accept(key, result);
}
} finally {
selectingDB.getAndSet(false);
selectingDBTagMap.remove(key);
}
} else {
while (selectingDB.get()) {
TimeUnit.MILLISECONDS.sleep(0L);
//do nothing...
}
return getFromCache.apply(key);
}
}
return result;
}
public static void main(String[] args) throws InterruptedException {
Map<String, String> map = new ConcurrentHashMap<>();
CacheBreakDownDefender<String, String> instance = CacheBreakDownDefender.getInstance(String.class, String.class, true);
for (int i = 0; i < 9; i++) {
int finalI = i;
new Thread(() -> {
String kele = null;
try {
if (finalI == 6) {
kele = instance.readFromCache("kele2", map::get, key -> "helloword2", map::put);
} else
kele = instance.readFromCache("kele", map::get, key -> "helloword", map::put);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
log.info("resut= {}", kele);
}).start();
}
TimeUnit.SECONDS.sleep(2L);
}
}
This is not exactly for concurrent caches but for all caches:
"A cache with a bad policy is another name for a memory leak" (Raymond Chen)

How to lock on an integer in C#?

Is there any way to lock on an integer in C#? Integers can not be used with lock because they are boxed (and lock only locks on references).
The scenario is as follows: I have a forum based website with a moderation feature. What I want to do is make sure that no more than one moderator can moderate a post at any given time. To achieve this, I want to lock on the ID of the post.
I've had a couple of ideas so far (e.g. using a dictionary<int, object>), but I'm looking for a better and cleaner way.
Any suggestions?
I like doing it like this
public class Synchronizer {
private Dictionary<int, object> locks;
private object myLock;
public Synchronizer() {
locks = new Dictionary<int, object>();
myLock = new object();
}
public object this[int index] {
get {
lock (myLock) {
object result;
if (locks.TryGetValue(index, out result))
return result;
result = new object();
locks[index] = result;
return result;
}
}
}
}
Then, to lock on an int you simply (using the same synchronizer every time)
lock (sync[15]) { ... }
This class returns the same lock object when given the same index twice. When a new index comes in, it creates an object, stores it in the dictionary for next time, and returns it.
It can easily be changed to work generically with any struct or value type, or to be static so that the synchronizer object does not have to be passed around.
If it's a website then using an in-process lock probably isn't the best approach as if you need to scale the site out onto multiple servers, or add another site hosting an API (or anything else that would require another process accessing the same data to exist) then all your locking strategies are immediately ineffective.
I'd be inclined to look into database-based locking for this. The simplest approach is to use optimistic locking with something like a timestamp of when the post was last updated, and to reject updates made to a post unless the timestamps match.
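To illustrate the optimistic check, here is an in-memory stand-in for the database. PostStore is hypothetical, and it uses an integer version counter where the answer suggests a last-updated timestamp; in a real database the same test would live in the WHERE clause of the UPDATE statement (the table and column names in the comment are made up).

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a posts table with a version column.
public class PostStore
{
    private readonly object _gate = new object();
    private readonly Dictionary<int, (string Body, long Version)> _posts =
        new Dictionary<int, (string Body, long Version)>();

    public void Insert(int id, string body)
    {
        lock (_gate) { _posts[id] = (body, 1); }
    }

    public (string Body, long Version) Load(int id)
    {
        lock (_gate) { return _posts[id]; }
    }

    // Mirrors: UPDATE posts SET body = @body, version = version + 1
    //          WHERE id = @id AND version = @expectedVersion
    public bool TryUpdate(int id, string newBody, long expectedVersion)
    {
        lock (_gate)
        {
            var current = _posts[id];
            if (current.Version != expectedVersion)
                return false;                         // someone else moderated the post first
            _posts[id] = (newBody, current.Version + 1);
            return true;
        }
    }
}
```

Each moderator loads the post together with its version and passes that version back when saving; the second of two concurrent saves is rejected and can be retried against the fresh data.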
I've read a lot of comments mentioning that locking isn't safe for web applications, but, other than web farms, I haven't seen any explanations of why. I would be interested in hearing the arguments against it.
I have a similar need, though I'm caching re-sized images on the hard drive (which is obviously a local action so a web farm scenario isn't an issue).
Here is a redone version of what @configurator posted. It includes a couple of features that @configurator didn't include:
Unlocking: Ensures the list doesn't grow unreasonably large (we have millions of photos and we can have many different sizes for each).
Generic: Allows locking based on different data types (such as int or string).
Here's the code...
/// <summary>
/// Provides a way to lock a resource based on a value (such as an ID or path).
/// </summary>
public class Synchronizer<T>
{
private Dictionary<T, SyncLock> mLocks = new Dictionary<T, SyncLock>();
private object mLock = new object();
/// <summary>
/// Returns an object that can be used in a lock statement. Ex: lock(MySync.Lock(MyValue)) { ... }
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
public SyncLock Lock(T value)
{
lock (mLock)
{
SyncLock theLock;
if (mLocks.TryGetValue(value, out theLock))
return theLock;
theLock = new SyncLock(value, this);
mLocks.Add(value, theLock);
return theLock;
}
}
/// <summary>
/// Unlocks the object. Called from Lock.Dispose.
/// </summary>
/// <param name="theLock"></param>
public void Unlock(SyncLock theLock)
{
// Dictionary is not thread-safe: take the same lock that Lock() uses.
lock (mLock) { mLocks.Remove(theLock.Value); }
}
/// <summary>
/// Represents a lock for the Synchronizer class.
/// </summary>
public class SyncLock
: IDisposable
{
/// <summary>
/// This class should only be instantiated from the Synchronizer class.
/// </summary>
/// <param name="value"></param>
/// <param name="sync"></param>
internal SyncLock(T value, Synchronizer<T> sync)
{
Value = value;
Sync = sync;
}
/// <summary>
/// Makes sure the lock is removed.
/// </summary>
public void Dispose()
{
Sync.Unlock(this);
}
/// <summary>
/// Gets the value that this lock is based on.
/// </summary>
public T Value { get; private set; }
/// <summary>
/// Gets the synchronizer this lock was created from.
/// </summary>
private Synchronizer<T> Sync { get; set; }
}
}
Here's how you can use it...
public static readonly Synchronizer<int> sPostSync = new Synchronizer<int>();
....
using(var theLock = sPostSync.Lock(myID))
lock (theLock)
{
...
}
This option builds on the good answer provided by configurator with the following modifications:
Prevents the size of the dictionary from growing uncontrollably. Since new posts get new ids, the dictionary of locks would otherwise grow indefinitely. The solution is to mod the id against a maximum dictionary size. This does mean that some ids will share a lock (and will sometimes wait when they would otherwise not have to), but the extra contention can be made acceptably small by choosing a large enough dictionary size.
Uses ConcurrentDictionary so there is no need for a separate dictionary lock.
The code:
internal class IdLock
{
internal int LockDictionarySize
{
get { return m_lockDictionarySize; }
}
const int m_lockDictionarySize = 1000;
ConcurrentDictionary<int, object> m_locks = new ConcurrentDictionary<int, object>();
internal object this[ int id ]
{
get
{
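// Map the id onto a fixed number of buckets. Math.Abs keeps negative ids in the same key range.
int mapValue = Math.Abs( id % m_lockDictionarySize );
// The factory overload avoids allocating a throwaway object on every lookup.
return m_locks.GetOrAdd( mapValue, _ => new object() );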
}
}
}
Also, just for completeness, there is the alternative of string interning:
Mod the id against the maximum number of interned id strings you will allow.
Convert this modded value to a string.
Concatenate the modded string with a GUID or namespace name for name collision safety.
Intern this string.
lock on the interned string.
See this answer for some information:
The only benefit of the string interning approach is that you don't need to manage a dictionary. I prefer the dictionary of locks approach as the intern approach makes a lot of assumptions about how string interning works and that it will continue to work in this way. It also uses interning for something it was never meant / designed to do.
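For illustration only, the interning steps listed above might look like the minimal sketch below. The class name `InternedIdLock`, the `MaxInternedIds` constant and the GUID prefix are made up for this example; they are not from the answer.

```csharp
using System;

// Hypothetical helper illustrating the string-interning approach.
public static class InternedIdLock
{
    // Assumed upper bound on the number of distinct interned lock strings.
    private const int MaxInternedIds = 1000;

    // Arbitrary GUID prefix so these strings cannot collide with strings
    // that unrelated code might intern and lock on.
    private const string Prefix = "3f2b8c1e-6d0a-4c57-9b1e-7a5d2e4f9c10:";

    // Returns the interned string to lock on for the given id.
    public static string For(int id)
    {
        int bucket = Math.Abs(id % MaxInternedIds);   // mod the id
        string key = Prefix + bucket.ToString();      // convert and concatenate
        return string.Intern(key);                    // one shared instance per bucket
    }
}
```

It would be used as `lock (InternedIdLock.For(postId)) { ... }`. Ids that fall into the same bucket get the same string instance and therefore share a lock.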
I would personally go with either Greg's or Konrad's approach.
If you really do want to lock against the post ID itself (and assuming that your code will only ever be running in a single process) then something like this isn't too dirty:
public class ModeratorUtils
{
private static readonly HashSet<int> _LockedPosts = new HashSet<int>();
public void ModeratePost(int postId)
{
bool lockedByMe = false;
try
{
lock (_LockedPosts)
{
lockedByMe = _LockedPosts.Add(postId);
}
if (lockedByMe)
{
// do your editing
}
else
{
// sorry, can't edit at this time
}
}
finally
{
if (lockedByMe)
{
lock (_LockedPosts)
{
_LockedPosts.Remove(postId);
}
}
}
}
}
Why don't you lock on the whole posting instead of just on its ID?
Coresystem at CodePlex has two classes for thread synchronization based on value types. For details see http://codestand.feedbook.org/2012/06/lock-on-integer-in-c.html
I doubt you should use a database or O/S level feature such as locks for a business level decision. Locks incur significant overheads when held for long times (and in these contexts, anything beyond a couple of hundred milliseconds is an eternity).
Add a status field to the post. If you deal with several threads directly, you can then use O/S level locks, but only briefly, to set the flag.
You need a whole different approach to this.
Remember that with a website, you don't actually have a live running application on the other side that responds to what the user does.
You basically start a mini-app, which returns the web-page, and then the server is done. That the user ends up sending some data back is a by-product, not a guarantee.
So, you need the lock to persist after the application has returned the moderation page to the moderator, and then release it when the moderator is done.
You also need to handle some kind of timeout. What if the moderator closes the browser after getting the moderation page, and thus never tells the server that he/she is done moderating that post?
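A rough sketch of such a persistent lock with a timeout is below. It is in-memory and single-process only, so it is an illustration of the idea rather than something to deploy as-is; on a real site the same data would more likely live in the database. The names `ModerationLeases`, `TryAcquire` and `Release` are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lease table: a "lock" that outlives a single request and
// expires on its own if the moderator never comes back.
public class ModerationLeases
{
    private readonly object _gate = new object();

    // postId -> (moderator name, expiry time in UTC)
    private readonly Dictionary<int, KeyValuePair<string, DateTime>> _leases =
        new Dictionary<int, KeyValuePair<string, DateTime>>();

    private readonly TimeSpan _timeout;

    public ModerationLeases(TimeSpan timeout)
    {
        _timeout = timeout;
    }

    // Grants the lease if the post is free, already held by the same
    // moderator, or the previous lease has expired.
    public bool TryAcquire(int postId, string moderator, DateTime utcNow)
    {
        lock (_gate)
        {
            KeyValuePair<string, DateTime> lease;
            if (_leases.TryGetValue(postId, out lease)
                && lease.Key != moderator
                && lease.Value > utcNow)
            {
                return false; // someone else holds a live lease
            }
            _leases[postId] =
                new KeyValuePair<string, DateTime>(moderator, utcNow + _timeout);
            return true;
        }
    }

    // Releases the lease, but only if the caller is the current holder.
    public bool Release(int postId, string moderator)
    {
        lock (_gate)
        {
            KeyValuePair<string, DateTime> lease;
            if (_leases.TryGetValue(postId, out lease) && lease.Key == moderator)
            {
                return _leases.Remove(postId);
            }
            return false;
        }
    }
}
```

The request that serves the moderation page would call `TryAcquire(postId, user, DateTime.UtcNow)`, and the request that saves the result would call `Release`. The short `lock (_gate)` only protects the dictionary; it is never held while the moderator is working.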
Ideally you can avoid all the complex and brittle C# locking and replace it with database locking. If your transactions are designed correctly, you should be able to get by with DB transactions only.
Two boxed integers that happen to have the same value are completely independent objects.
So if you wanted to do this, your idea of Dictionary would probably be the way to go. You'd need to synchronize access to the dictionary to make sure you are always getting the same instance. And you'd have the problem of the dictionary growing in size.
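To make the boxing point concrete, here is a tiny sketch (the `BoxingDemo` class is just a name made up for this example). Each conversion of an `int` to `object` allocates a new box, so two boxes of the same value are never the same instance:

```csharp
using System;

// Hypothetical demo class, not part of any library.
public static class BoxingDemo
{
    // Boxes the same int twice and reports whether both boxes are one object.
    public static bool BoxedTwiceIsSameInstance(int id)
    {
        object first = id;   // boxes id into a new heap object
        object second = id;  // boxes id again into a different heap object
        return object.ReferenceEquals(first, second);
    }
}
```

`BoxingDemo.BoxedTwiceIsSameInstance(42)` returns `false`, which is why `lock ((object)postId)` never blocks another thread that boxes the same id: each thread locks its own private object.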
C# locking is for thread safety and doesn't work the way you want it to for web applications.
The simplest solution is to add a column to the table you want to lock, and when someone locks a row, write to the db that the row is locked.
Don't let anyone open a post in edit mode if the column says it is locked for editing.
Otherwise maintain a static list of locked entry Ids and compare to that before allowing an edit.
You want to make sure that a delete doesn't happen twice?
CREATE PROCEDURE RemovePost( @postID int )
AS
if exists(select postID from Posts where postID = @postID)
BEGIN
DELETE FROM Posts where postID = @postID
-- Do other stuff
END
This is pretty much SQL Server syntax; I'm not familiar with MyISAM, but it allows stored procedures, so I'm guessing you can mock up a similar procedure.
Anyhow, this will work for the majority of cases. The only time it will fail is if two moderators submit at almost exactly the same time, and the exists() function passes on one request just before the DELETE statement executes on another request. I would happily use this for a small site. You could take it a step further and check that the delete actually deleted a row before continuing with the rest, which would guarantee the atomicity of it all.
Trying to create a lock in code, for this use case, I consider very impractical. You lose nothing by having two moderators attempting to delete a post, with one succeeding, and the other having no effect.
You should use a sync object like this:
public class YourForm
{
private static object syncObject = new object();
public void Moderate()
{
lock(syncObject)
{
// do your business
}
}
}
But this approach shouldn't be used in a web app scenario.
public static class ConexoesDeTeste
{
private static int NumeroDeConexoes = 0;
public static void Incrementar()
{
Interlocked.Increment(ref NumeroDeConexoes);
}
public static void Decrementar()
{
Interlocked.Decrement(ref NumeroDeConexoes);
}
public static int Obter() => NumeroDeConexoes;
}
