Does anyone have a good resource on implementing a shared object pool strategy for a limited resource, in the vein of SQL connection pooling (i.e. one that is fully implemented and thread-safe)?
To follow up on @Aaronaught's request for clarification: the pool would be used for load balancing requests to an external service. To put it in a scenario that is probably easier to understand than my direct situation: I have a session object that functions similarly to the ISession object from NHibernate, in that each unique session manages its connection to the database. Currently I have 1 long-running session object and am encountering issues where my service provider is rate limiting my usage of this individual session.
Because they don't expect a single session to be treated as a long-running service account, they apparently treat it as a client that is hammering their service. Which brings me to my question: instead of having 1 individual session, I would create a pool of different sessions and split the requests to the service across those sessions, instead of creating a single focal point as I was previously doing.
Hopefully that background offers some value but to directly answer some of your questions:
Q: Are the objects expensive to create?
A: No, the objects are a pool of limited resources.
Q: Will they be acquired/released very frequently?
A: Yes, once again they can be thought of as NHibernate ISessions, where 1 is usually acquired and released for the duration of every single page request.
Q: Will a simple first-come-first-serve suffice or do you need something more intelligent, i.e. that would prevent starvation?
A: A simple round-robin type of distribution would suffice. By starvation I assume you mean that callers become blocked waiting for releases when no sessions are available. This isn't really applicable, since the sessions can be shared by different callers. My goal is to distribute the usage across multiple sessions as opposed to 1 single session.
I believe this is probably a divergence from a normal usage of an object pool which is why I originally left this part out and planned just to adapt the pattern to allow sharing of objects as opposed to allowing a starvation situation to ever occur.
Q: What about things like priorities, lazy vs. eager loading, etc.?
A: There is no prioritization involved, for simplicity's sake just assume that I would create the pool of available objects at the creation of the pool itself.
This question is a little trickier than one might expect due to several unknowns: The behaviour of the resource being pooled, the expected/required lifetime of objects, the real reason that the pool is required, etc. Typically pools are special-purpose - thread pools, connection pools, etc. - because it is easier to optimize one when you know exactly what the resource does and more importantly have control over how that resource is implemented.
Since it's not that simple, what I've tried to do is offer up a fairly flexible approach that you can experiment with to see what works best. Apologies in advance for the long post, but there is a lot of ground to cover when it comes to implementing a decent general-purpose resource pool, and I'm really only scratching the surface.
A general-purpose pool would have to have a few main "settings", including:
Resource loading strategy - eager or lazy;
Resource loading mechanism - how to actually construct one;
Access strategy - you mention "round robin" which is not as straightforward as it sounds; this implementation can use a circular buffer which is similar, but not perfect, because the pool has no control over when resources are actually reclaimed. Other options are FIFO and LIFO; FIFO will have more of a random-access pattern, but LIFO makes it significantly easier to implement a Least-Recently-Used freeing strategy (which you said was out of scope, but it's still worth mentioning).
For the resource loading mechanism, .NET already gives us a clean abstraction - delegates.
private Func<Pool<T>, T> factory;
Pass this through the pool's constructor and we're about done with that. Using a generic type with a new() constraint works too, but this is more flexible.
Of the other two parameters, the access strategy is the more complicated beast, so my approach was to use an inheritance (interface) based approach:
public class Pool<T> : IDisposable
{
// Other code - we'll come back to this
interface IItemStore
{
T Fetch();
void Store(T item);
int Count { get; }
}
}
The concept here is simple - we'll let the public Pool class handle the common issues like thread-safety, but use a different "item store" for each access pattern. LIFO is easily represented by a stack, FIFO is a queue, and I've used a not-very-optimized-but-probably-adequate circular buffer implementation using a List<T> and index pointer to approximate a round-robin access pattern.
All of the classes below are inner classes of the Pool<T> - this was a style choice, but since these really aren't meant to be used outside the Pool, it makes the most sense.
class QueueStore : Queue<T>, IItemStore
{
public QueueStore(int capacity) : base(capacity)
{
}
public T Fetch()
{
return Dequeue();
}
public void Store(T item)
{
Enqueue(item);
}
}
class StackStore : Stack<T>, IItemStore
{
public StackStore(int capacity) : base(capacity)
{
}
public T Fetch()
{
return Pop();
}
public void Store(T item)
{
Push(item);
}
}
These are the obvious ones - stack and queue. I don't think they really warrant much explanation. The circular buffer is a little more complicated:
class CircularStore : IItemStore
{
private List<Slot> slots;
private int freeSlotCount;
private int position = -1;
public CircularStore(int capacity)
{
slots = new List<Slot>(capacity);
}
public T Fetch()
{
if (Count == 0)
throw new InvalidOperationException("The buffer is empty.");
int startPosition = position;
do
{
Advance();
Slot slot = slots[position];
if (!slot.IsInUse)
{
slot.IsInUse = true;
--freeSlotCount;
return slot.Item;
}
} while (startPosition != position);
throw new InvalidOperationException("No free slots.");
}
public void Store(T item)
{
Slot slot = slots.Find(s => object.Equals(s.Item, item));
if (slot == null)
{
slot = new Slot(item);
slots.Add(slot);
}
slot.IsInUse = false;
++freeSlotCount;
}
public int Count
{
get { return freeSlotCount; }
}
private void Advance()
{
position = (position + 1) % slots.Count;
}
class Slot
{
public Slot(T item)
{
this.Item = item;
}
public T Item { get; private set; }
public bool IsInUse { get; set; }
}
}
I could have picked a number of different approaches, but the bottom line is that resources should be accessed in the same order that they were created, which means that we have to maintain references to them but mark them as "in use" (or not). In the worst-case scenario, only one slot is ever available, and it takes a full iteration of the buffer for every fetch. This is bad if you have hundreds of resources pooled and are acquiring and releasing them several times per second; not really an issue for a pool of 5-10 items, and in the typical case, where resources are lightly used, it only has to advance one or two slots.
Remember, these classes are private inner classes - that is why they don't need a whole lot of error-checking, the pool itself restricts access to them.
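If, as the question says, the pooled sessions can be shared by several callers at once, the in-use bookkeeping is unnecessary and round-robin reduces to an atomic counter over a fixed array. A minimal sketch under that assumption follows; RoundRobinDistributor<T> and Next are hypothetical names, not part of the Pool<T> class built here, and the items must themselves be safe for concurrent use.

```csharp
using System;
using System.Threading;

// Hypothetical helper: hands out shared items in rotation; nothing is ever "checked out".
public sealed class RoundRobinDistributor<T>
{
    private readonly T[] items;
    private int counter = -1;

    public RoundRobinDistributor(int size, Func<T> factory)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        if (factory == null) throw new ArgumentNullException("factory");
        items = new T[size];
        for (int i = 0; i < size; i++) items[i] = factory();   // eager creation
    }

    public T Next()
    {
        // Interlocked.Increment wraps on overflow; the uint cast keeps the index non-negative.
        uint ticket = unchecked((uint)Interlocked.Increment(ref counter));
        return items[ticket % (uint)items.Length];
    }
}

public static class Program
{
    public static void Main()
    {
        int created = 0;
        var sessions = new RoundRobinDistributor<string>(3, () => "session-" + (++created));
        for (int i = 0; i < 5; i++)
            Console.WriteLine(sessions.Next());
        // prints session-1, session-2, session-3, session-1, session-2 (one per line)
    }
}
```

Because callers never return anything, there is no Release call to forget; the trade-off is that the distributor cannot tell how busy each item is.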
Throw in an enumeration and a factory method and we're done with this part:
// Outside the pool
public enum AccessMode { FIFO, LIFO, Circular };
// Inside the Pool
private IItemStore itemStore;
private IItemStore CreateItemStore(AccessMode mode, int capacity)
{
switch (mode)
{
case AccessMode.FIFO:
return new QueueStore(capacity);
case AccessMode.LIFO:
return new StackStore(capacity);
default:
Debug.Assert(mode == AccessMode.Circular,
"Invalid AccessMode in CreateItemStore");
return new CircularStore(capacity);
}
}
The next problem to solve is loading strategy. I've defined three types:
public enum LoadingMode { Eager, Lazy, LazyExpanding };
The first two should be self-explanatory; the third is sort of a hybrid, it lazy-loads resources but doesn't actually start re-using any resources until the pool is full. This would be a good trade-off if you want the pool to be full (which it sounds like you do) but want to defer the expense of actually creating them until first access (i.e. to improve startup times).
The loading methods really aren't too complicated, now that we have the item-store abstraction:
private int size;
private int count;
private T AcquireEager()
{
lock (itemStore)
{
return itemStore.Fetch();
}
}
private T AcquireLazy()
{
lock (itemStore)
{
if (itemStore.Count > 0)
{
return itemStore.Fetch();
}
}
Interlocked.Increment(ref count);
return factory(this);
}
private T AcquireLazyExpanding()
{
bool shouldExpand = false;
if (count < size)
{
int newCount = Interlocked.Increment(ref count);
if (newCount <= size)
{
shouldExpand = true;
}
else
{
// Another thread took the last spot - use the store instead
Interlocked.Decrement(ref count);
}
}
if (shouldExpand)
{
return factory(this);
}
else
{
lock (itemStore)
{
return itemStore.Fetch();
}
}
}
private void PreloadItems()
{
for (int i = 0; i < size; i++)
{
T item = factory(this);
itemStore.Store(item);
}
count = size;
}
The size and count fields above refer to the maximum size of the pool and the total number of resources owned by the pool (but not necessarily available), respectively. AcquireEager is the simplest, it assumes that an item is already in the store - these items would be preloaded at construction, i.e. in the PreloadItems method shown last.
AcquireLazy checks to see if there are free items in the pool, and if not, it creates a new one. AcquireLazyExpanding will create a new resource as long as the pool hasn't reached its target size yet. I've tried to optimize this to minimize locking, and I hope I haven't made any mistakes (I have tested this under multi-threaded conditions, but obviously not exhaustively).
You might be wondering why none of these methods bother checking to see whether or not the store has reached the maximum size. I'll get to that in a moment.
Now for the pool itself. Here is the full set of private data, some of which has already been shown:
private bool isDisposed;
private Func<Pool<T>, T> factory;
private LoadingMode loadingMode;
private IItemStore itemStore;
private int size;
private int count;
private Semaphore sync;
Answering the question I glossed over in the last paragraph - how to ensure we limit the total number of resources created - it turns out that .NET already has a perfectly good tool for that: it's called Semaphore, and it's designed specifically to allow a fixed number of threads access to a resource (in this case the "resource" is the inner item store). Since we're not implementing a full-on producer/consumer queue, this is perfectly adequate for our needs.
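To see the Semaphore behaviour in isolation, here is a small stand-alone sketch, separate from the pool code. SemaphoreDemo and MeasureMaxConcurrency are made-up names; the method starts a number of worker threads and records how many were ever inside the guarded section at once.

```csharp
using System;
using System.Threading;

public static class SemaphoreDemo
{
    // Returns the highest number of threads observed inside the guarded section.
    public static int MeasureMaxConcurrency(int limit, int workers)
    {
        int inside = 0, maxInside = 0;
        object gate = new object();

        using (var sync = new Semaphore(limit, limit))   // all slots initially free
        {
            var threads = new Thread[workers];
            for (int i = 0; i < threads.Length; i++)
            {
                threads[i] = new Thread(() =>
                {
                    sync.WaitOne();                       // blocks while `limit` threads are inside
                    try
                    {
                        lock (gate)
                        {
                            inside++;
                            maxInside = Math.Max(maxInside, inside);
                        }
                        Thread.Sleep(20);                 // pretend to use a pooled resource
                        lock (gate) { inside--; }
                    }
                    finally
                    {
                        sync.Release();                   // free the slot for the next waiter
                    }
                });
                threads[i].Start();
            }
            foreach (var t in threads) t.Join();
        }
        return maxInside;
    }

    public static void Main()
    {
        Console.WriteLine(MeasureMaxConcurrency(3, 10) <= 3);   // prints True
    }
}
```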
The constructor looks like this:
public Pool(int size, Func<Pool<T>, T> factory,
LoadingMode loadingMode, AccessMode accessMode)
{
if (size <= 0)
throw new ArgumentOutOfRangeException("size", size,
"Argument 'size' must be greater than zero.");
if (factory == null)
throw new ArgumentNullException("factory");
this.size = size;
this.factory = factory;
sync = new Semaphore(size, size);
this.loadingMode = loadingMode;
this.itemStore = CreateItemStore(accessMode, size);
if (loadingMode == LoadingMode.Eager)
{
PreloadItems();
}
}
Should be no surprises here. Only thing to note is the special-casing for eager loading, using the PreloadItems method already shown earlier.
Since almost everything's been cleanly abstracted away by now, the actual Acquire and Release methods are really very straightforward:
public T Acquire()
{
sync.WaitOne();
switch (loadingMode)
{
case LoadingMode.Eager:
return AcquireEager();
case LoadingMode.Lazy:
return AcquireLazy();
default:
Debug.Assert(loadingMode == LoadingMode.LazyExpanding,
"Unknown LoadingMode encountered in Acquire method.");
return AcquireLazyExpanding();
}
}
public void Release(T item)
{
lock (itemStore)
{
itemStore.Store(item);
}
sync.Release();
}
As explained earlier, we're using the Semaphore to control concurrency instead of religiously checking the status of the item store. As long as acquired items are correctly released, there's nothing to worry about.
Last but not least, there's cleanup:
public void Dispose()
{
if (isDisposed)
{
return;
}
isDisposed = true;
if (typeof(IDisposable).IsAssignableFrom(typeof(T)))
{
lock (itemStore)
{
while (itemStore.Count > 0)
{
IDisposable disposable = (IDisposable)itemStore.Fetch();
disposable.Dispose();
}
}
}
sync.Close();
}
public bool IsDisposed
{
get { return isDisposed; }
}
The purpose of that IsDisposed property will become clear in a moment. All the main Dispose method really does is dispose the actual pooled items if they implement IDisposable.
Now you can basically use this as-is, with a try-finally block, but I'm not fond of that syntax, because if you start passing around pooled resources between classes and methods then it's going to get very confusing. It's possible that the main class that uses a resource doesn't even have a reference to the pool. It really becomes quite messy, so a better approach is to create a "smart" pooled object.
Let's say we start with the following simple interface/class:
public interface IFoo : IDisposable
{
void Test();
}
public class Foo : IFoo
{
private static int count = 0;
private int num;
public Foo()
{
num = Interlocked.Increment(ref count);
}
public void Dispose()
{
Console.WriteLine("Goodbye from Foo #{0}", num);
}
public void Test()
{
Console.WriteLine("Hello from Foo #{0}", num);
}
}
Here's our pretend disposable Foo resource which implements IFoo and has some boilerplate code for generating unique identities. What we do is to create another special, pooled object:
public class PooledFoo : IFoo
{
private Foo internalFoo;
private Pool<IFoo> pool;
public PooledFoo(Pool<IFoo> pool)
{
if (pool == null)
throw new ArgumentNullException("pool");
this.pool = pool;
this.internalFoo = new Foo();
}
public void Dispose()
{
if (pool.IsDisposed)
{
internalFoo.Dispose();
}
else
{
pool.Release(this);
}
}
public void Test()
{
internalFoo.Test();
}
}
This just proxies all of the "real" methods to its inner IFoo (we could do this with a Dynamic Proxy library like Castle, but I won't get into that). It also maintains a reference to the Pool that creates it, so that when we Dispose this object, it automatically releases itself back to the pool. Except when the pool has already been disposed - this means we are in "cleanup" mode and in this case it actually cleans up the internal resource instead.
Using the approach above, we get to write code like this:
// Create the pool early
Pool<IFoo> pool = new Pool<IFoo>(PoolSize, p => new PooledFoo(p),
LoadingMode.Lazy, AccessMode.Circular);
// Sometime later on...
using (IFoo foo = pool.Acquire())
{
foo.Test();
}
This is a very good thing to be able to do. It means that the code which uses the IFoo (as opposed to the code which creates it) does not actually need to be aware of the pool. You can even inject IFoo objects using your favourite DI library and the Pool<T> as the provider/factory.
I've put the complete code on PasteBin for your copy-and-pasting enjoyment. There's also a short test program you can use to play around with different loading/access modes and multithreaded conditions, to satisfy yourself that it's thread-safe and not buggy.
Let me know if you have any questions or concerns about any of this.
Object Pooling in .NET Core
.NET Core has an implementation of object pooling in the base class library (BCL). You can read the original GitHub issue here and view the code for System.Buffers. Currently ArrayPool is the only type available, and it is used to pool arrays. There is a nice blog post here.
namespace System.Buffers
{
// Public surface as shipped; the original proposal also listed an Enlarge method, which was not included.
public abstract class ArrayPool<T>
{
public static ArrayPool<T> Shared { get; }
public static ArrayPool<T> Create();
public static ArrayPool<T> Create(int maxArrayLength, int maxArraysPerBucket);
public abstract T[] Rent(int minimumLength);
public abstract void Return(T[] array, bool clearArray = false);
}
}
An example of its usage can be seen in ASP.NET Core. Because it is in the .NET Core BCL, ASP.NET Core can share its object pool with other libraries such as Newtonsoft.Json's JSON serializer. You can read this blog post for more information on how Newtonsoft.Json is doing this.
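A minimal sketch of the rent-and-return pattern, using the Shared, Rent, and Return members of System.Buffers.ArrayPool<T>. The method name SumWithRentedBuffer is made up for illustration. The main thing to watch is that a rented array may be longer than requested, so the code tracks its own count.

```csharp
using System;
using System.Buffers;

public static class Program
{
    // Sums the first `count` natural numbers using a rented scratch buffer.
    public static long SumWithRentedBuffer(int count)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(count);   // may be longer than requested
        try
        {
            for (int i = 0; i < count; i++) buffer[i] = i + 1;
            long sum = 0;
            for (int i = 0; i < count; i++) sum += buffer[i];   // use count, not buffer.Length
            return sum;
        }
        finally
        {
            // Always hand the array back; clearArray wipes it before reuse.
            ArrayPool<int>.Shared.Return(buffer, clearArray: true);
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumWithRentedBuffer(100));   // prints 5050
    }
}
```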
Object Pooling in Microsoft Roslyn C# Compiler
The new Microsoft Roslyn C# compiler contains the ObjectPool type, which is used to pool frequently used objects which would normally get new'ed up and garbage collected very often. This reduces the amount and size of garbage collection operations which have to happen. There are a few different sub-implementations all using ObjectPool (See: Why are there so many implementations of Object Pooling in Roslyn?).
1 - SharedPools - Stores a pool of 20 objects or 100 if the BigDefault is used.
// Example 1 - In a using statement, so the object gets freed at the end.
using (PooledObject<List<Foo>> pooledObject = SharedPools.Default<List<Foo>>().GetPooledObject())
{
// Do something with pooledObject.Object
}
// Example 2 - No using statement, so you need to be sure no exceptions are thrown.
List<Foo> list = SharedPools.Default<List<Foo>>().AllocateAndClear();
// Do something with list
SharedPools.Default<List<Foo>>().Free(list);
// Example 3 - I have also seen this variation of the above pattern, which ends up the same as Example 1, except that Example 1 seems to create a new instance of the IDisposable PooledObject<T> object. This is probably the preferred option if you want fewer GCs.
List<Foo> list = SharedPools.Default<List<Foo>>().AllocateAndClear();
try
{
// Do something with list
}
finally
{
SharedPools.Default<List<Foo>>().Free(list);
}
2 - ListPool and StringBuilderPool - Not strictly separate implementations, but wrappers around the SharedPools implementation shown above, specifically for Lists and StringBuilders. They re-use the pool of objects stored in SharedPools.
// Example 1 - No using statement so you need to be sure no exceptions are thrown.
StringBuilder stringBuilder = StringBuilderPool.Allocate();
// Do something with stringBuilder
StringBuilderPool.Free(stringBuilder);
// Example 2 - Safer version of Example 1.
StringBuilder stringBuilder = StringBuilderPool.Allocate();
try
{
// Do something with stringBuilder
}
finally
{
StringBuilderPool.Free(stringBuilder);
}
3 - PooledDictionary and PooledHashSet - These use ObjectPool directly and have a totally separate pool of objects. Stores a pool of 128 objects.
// Example 1
PooledHashSet<Foo> hashSet = PooledHashSet<Foo>.GetInstance();
// Do something with hashSet.
hashSet.Free();
// Example 2 - Safer version of Example 1.
PooledHashSet<Foo> hashSet = PooledHashSet<Foo>.GetInstance();
try
{
// Do something with hashSet.
}
finally
{
hashSet.Free();
}
Microsoft.IO.RecyclableMemoryStream
This library provides pooling for MemoryStream objects. It's a drop-in replacement for System.IO.MemoryStream. It has exactly the same semantics. It was designed by Bing engineers. Read the blog post here or see the code on GitHub.
var sourceBuffer = new byte[]{0,1,2,3,4,5,6,7};
var manager = new RecyclableMemoryStreamManager();
using (var stream = manager.GetStream())
{
stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}
Note that RecyclableMemoryStreamManager should be declared once and it will live for the entire process–this is the pool. It is perfectly fine to use multiple pools if you desire.
Something like this might suit your needs.
/// <summary>
/// Represents a pool of objects with a size limit.
/// </summary>
/// <typeparam name="T">The type of object in the pool.</typeparam>
public sealed class ObjectPool<T> : IDisposable
where T : new()
{
private readonly int size;
private readonly object locker;
private readonly Queue<T> queue;
private int count;
/// <summary>
/// Initializes a new instance of the ObjectPool class.
/// </summary>
/// <param name="size">The size of the object pool.</param>
public ObjectPool(int size)
{
if (size <= 0)
{
const string message = "The size of the pool must be greater than zero.";
throw new ArgumentOutOfRangeException("size", size, message);
}
this.size = size;
locker = new object();
queue = new Queue<T>();
}
/// <summary>
/// Retrieves an item from the pool.
/// </summary>
/// <returns>The item retrieved from the pool.</returns>
public T Get()
{
lock (locker)
{
if (queue.Count > 0)
{
return queue.Dequeue();
}
count++;
return new T();
}
}
/// <summary>
/// Places an item in the pool.
/// </summary>
/// <param name="item">The item to place to the pool.</param>
public void Put(T item)
{
lock (locker)
{
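// count includes the item being returned, so keep it unless the pool is over its limit.
// (With "count < size" a pool of size 1 would never retain anything.)
if (count <= size)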
{
queue.Enqueue(item);
}
else
{
using (item as IDisposable)
{
count--;
}
}
}
}
/// <summary>
/// Disposes of items in the pool that implement IDisposable.
/// </summary>
public void Dispose()
{
lock (locker)
{
count = 0;
while (queue.Count > 0)
{
using (queue.Dequeue() as IDisposable)
{
}
}
}
}
}
Example Usage
public class ThisObject
{
private readonly ObjectPool<That> pool = new ObjectPool<That>(100);
public void ThisMethod()
{
var that = pool.Get();
try
{
// Use that ....
}
finally
{
pool.Put(that);
}
}
}
Sample from MSDN: How to: Create an Object Pool by Using a ConcurrentBag
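The MSDN sample is only linked here, so the following is a sketch in the same spirit rather than a copy of it. BagObjectPool<T> is a hypothetical name. Note that this kind of pool is unbounded: it never blocks and never limits how many objects exist.

```csharp
using System;
using System.Collections.Concurrent;

public class BagObjectPool<T>
{
    private readonly ConcurrentBag<T> objects = new ConcurrentBag<T>();
    private readonly Func<T> objectGenerator;

    public BagObjectPool(Func<T> objectGenerator)
    {
        if (objectGenerator == null) throw new ArgumentNullException("objectGenerator");
        this.objectGenerator = objectGenerator;
    }

    // Reuse a pooled instance if one is available; otherwise create a new one.
    public T Get()
    {
        T item;
        return objects.TryTake(out item) ? item : objectGenerator();
    }

    // Make the instance available to later Get calls.
    public void Return(T item)
    {
        objects.Add(item);
    }
}

public static class Program
{
    public static void Main()
    {
        var pool = new BagObjectPool<object>(() => new object());
        object first = pool.Get();      // bag is empty, so the generator runs
        pool.Return(first);
        object second = pool.Get();     // the returned instance is handed out again
        Console.WriteLine(ReferenceEquals(first, second));   // prints True
    }
}
```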
Back in the day Microsoft provided a framework through Microsoft Transaction Server (MTS) and later COM+ to do object pooling for COM objects. That functionality was carried forward to System.EnterpriseServices in the .NET Framework and now in Windows Communication Foundation.
Object Pooling in WCF
This article is from .NET 1.1 but should still apply in the current versions of the Framework (even though WCF is the preferred method).
Object Pooling .NET
I really like Aaronaught's implementation, especially since it handles waiting for a resource to become available through the use of a semaphore. There are several additions I would like to make:
Change sync.WaitOne() to sync.WaitOne(timeout) and expose the timeout as a parameter on Acquire(int timeout) method. This would also necessitate handling the condition when the thread times out waiting on an object to become available.
Add Recycle(T item) method to handle situations when an object needs to be recycled when a failure occurs, for example.
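The two additions above could look roughly like the following. This is a simplified, eagerly loaded stand-in rather than a patch to the Pool<T> class discussed earlier; TimedPool<T>, TryAcquire, and the decision to return false on timeout (instead of throwing) are all assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical, simplified pool; it only illustrates the timeout and Recycle ideas.
public sealed class TimedPool<T> where T : class
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly Func<T> factory;
    private readonly Semaphore sync;

    public TimedPool(int size, Func<T> factory)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        if (factory == null) throw new ArgumentNullException("factory");
        this.factory = factory;
        sync = new Semaphore(size, size);
        for (int i = 0; i < size; i++) items.Enqueue(factory());
    }

    // Returns false instead of blocking forever when no item frees up in time.
    public bool TryAcquire(int millisecondsTimeout, out T item)
    {
        item = null;
        if (!sync.WaitOne(millisecondsTimeout)) return false;
        lock (items) { item = items.Dequeue(); }
        return true;
    }

    public void Release(T item)
    {
        lock (items) { items.Enqueue(item); }
        sync.Release();
    }

    // Discards a broken item and puts a freshly created one in its place.
    public void Recycle(T item)
    {
        IDisposable disposable = item as IDisposable;
        if (disposable != null) disposable.Dispose();
        Release(factory());
    }
}

public static class Program
{
    public static void Main()
    {
        var pool = new TimedPool<object>(1, () => new object());
        object first, second;
        Console.WriteLine(pool.TryAcquire(100, out first));    // prints True
        Console.WriteLine(pool.TryAcquire(100, out second));   // prints False (timed out)
        pool.Recycle(first);                                    // replace the "failed" item
        Console.WriteLine(pool.TryAcquire(100, out second));   // prints True
        Console.WriteLine(ReferenceEquals(first, second));     // prints False
    }
}
```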
This is another implementation, with limited number of objects in pool.
public class ObjectPool<T>
where T : class
{
private readonly int maxSize;
private Func<T> constructor;
private int currentSize;
private Queue<T> pool;
private AutoResetEvent poolReleasedEvent;
public ObjectPool(int maxSize, Func<T> constructor)
{
this.maxSize = maxSize;
this.constructor = constructor;
this.currentSize = 0;
this.pool = new Queue<T>();
this.poolReleasedEvent = new AutoResetEvent(false);
}
public T GetFromPool()
{
T item = null;
do
{
lock (this)
{
if (this.pool.Count == 0)
{
if (this.currentSize < this.maxSize)
{
item = this.constructor();
this.currentSize++;
}
}
else
{
item = this.pool.Dequeue();
}
}
if (null == item)
{
this.poolReleasedEvent.WaitOne();
}
}
while (null == item);
return item;
}
public void ReturnToPool(T item)
{
lock (this)
{
this.pool.Enqueue(item);
this.poolReleasedEvent.Set();
}
}
}
This article is Java-oriented, but it explains the connectionImpl pool pattern and the abstracted object pool pattern, and could be a good first approach:
http://www.developer.com/design/article.php/626171/Pattern-Summaries-Object-Pool.htm
You may use the NuGet package Microsoft.Extensions.ObjectPool
Documentation:
https://learn.microsoft.com/en-us/aspnet/core/performance/objectpool?view=aspnetcore-3.1
https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.objectpool
Related
I use a ConcurrentDictionary to collect data in memory in a Web API application. Through the API methods I add and update objects in the ConcurrentDictionary, and a background thread analyzes and cleans up the dictionary based on object properties. I'm considering two approaches:
1. Use a lock on the dictionary item in the updateValueFactory of the AddOrUpdate method. The question is how to read the properties properly, so that I'm sure I have the latest version and am not reading a property in an unstable state.
public class ThreadsafeService2
{
private readonly ConcurrentDictionary<string, ThreadSafeItem2> _storage =
new ConcurrentDictionary<string, ThreadSafeItem2>();
public void AddOrUpdate(string name)
{
var newVal = new ThreadSafeItem2();
_storage.AddOrUpdate(name, newVal, (key, oldVal) =>
{
//use lock
lock (oldVal)
{
oldVal.Increment();
}
return oldVal;
});
}
public void Analyze()
{
foreach (var key in _storage.Keys)
{
if (_storage.TryGetValue(key, out var item))
{
//how to read it properly?
long ticks = item.ModifiedTicks;
}
}
}
}
public class ThreadSafeItem2
{
private long _modifiedTicks;
private int _counter;
public void Increment()
{
//no interlocked here
_modifiedTicks = DateTime.Now.Ticks;
_counter++;
}
//no interlocked here either
public long ModifiedTicks => _modifiedTicks;
public int Counter => _counter;
}
2. Use Interlocked and memory barriers at the property level, without a lock. This looks a bit verbose to me.
public class ThreadsafeService1
{
private readonly ConcurrentDictionary<string, ThreadSafeItem1> _storage =
new ConcurrentDictionary<string, ThreadSafeItem1>();
public void AddOrUpdate(string name)
{
var newVal = new ThreadSafeItem1();
_storage.AddOrUpdate(name, newVal, (key, oldVal) =>
{
//no lock here
oldVal.Increment();
return oldVal;
});
}
public void Analyze()
{
foreach(var key in _storage.Keys)
{
if(_storage.TryGetValue(key, out var item))
{
//reading through Interlocked
long ticks = item.ModifiedTicks;
}
}
}
}
public class ThreadSafeItem1
{
private long _modifiedTicks;
private int _counter;
public void Increment()
{
//make changes in atomic manner
Interlocked.Exchange(ref _modifiedTicks, DateTime.Now.Ticks);
Interlocked.Increment(ref _counter);
}
public long ModifiedTicks => Interlocked.Read(ref _modifiedTicks);
public int Counter => Thread.VolatileRead(ref _counter);
}
What is the best practice here?
So both of your implementations have major problems. The first solution locks when incrementing, but doesn't lock when reading, meaning the other places accessing the data can read invalid state.
A non-technical problem, but a major issue nonetheless, is that you've named your class ThreadSafeItem and yet it's not actually designed to be accessed safely from multiple threads. It's the caller's responsibility, in this implementation, to ensure that the item isn't accessed from multiple threads. If I see a class called ThreadSafeItem I'm going to assume it's safe to access it from multiple threads, and that I don't need to synchronize my access to it so long as each operation I perform is the only thing that needs to be logically atomic.
Your Interlocked solution is problematic in that you have two fields that you're modifying, that are conceptually tied together, but you don't synchronize their changes together, meaning someone can observe a modification to one and not the other, which is a problem for that code.
Next, your use of AddOrUpdate in both solutions isn't really appropriate. The whole point of the method call is to add an item or replace it with another item, not to mutate the provided item (that's why it takes a return value; you're supposed to produce a new item). If you want to go with the approach of getting a mutable item and mutating it, the way to go would be to call GetOrAdd to either get an existing item or create a new one, and then to mutate it in a thread safe manner using the returned value.
The whole solution is radically simplified by making ThreadSafeItem immutable. That lets you use AddOrUpdate on the ConcurrentDictionary for the update, and the only synchronization needed is the updating of the value in the ConcurrentDictionary, which already synchronizes its own state. No synchronization is needed at all when accessing ThreadSafeItem, because immutable data is inherently safe to read from any thread. This means that you never actually need to write any synchronization code, which is exactly what you want to strive for whenever possible.
And finally, we have the actual code:
public class ThreadsafeService3
{
private readonly ConcurrentDictionary<string, ThreadSafeItem3> _storage =
new ConcurrentDictionary<string, ThreadSafeItem3>();
public void AddOrUpdate(string name)
{
_storage.AddOrUpdate(name, _ => new ThreadSafeItem3(), (_, oldValue) => oldValue.Increment());
}
public void Analyze()
{
foreach (var pair in _storage)
{
long ticks = pair.Value.ModifiedTicks;
//Note, the value may have been updated since we checked;
//you've said you don't care and it's okay for a newer item to be removed here if it loses the race.
if (isTooOld(ticks))
_storage.TryRemove(pair.Key, out _);
}
}
}
public class ThreadSafeItem3
{
public ThreadSafeItem3()
{
Counter = 0;
}
private ThreadSafeItem3(int counter)
{
Counter = counter;
}
public ThreadSafeItem3 Increment()
{
return new ThreadSafeItem3(Counter + 1);
}
public long ModifiedTicks { get; } = DateTime.Now.Ticks;
public int Counter { get; }
}
The solution proposed by Servy (using an immutable Item type) is probably the best solution for your scenario. I would also suggest switching from class to readonly struct for reducing the allocations, although the ConcurrentDictionary is probably going to wrap the struct in a reference-type Node internally, so you might not gain anything from this.
For the sake of completeness I will propose an alternative solution, which is to use the GetOrAdd instead of the AddOrUpdate, and lock on the Item whenever you are doing anything with it:
public class Item // Mutable and thread-unsafe
{
public long ModifiedTicks { get; private set; }
public int Counter { get; private set; }
public void Increment()
{
ModifiedTicks = DateTime.Now.Ticks;
Counter++;
}
}
public class Service
{
private readonly ConcurrentDictionary<string, Item> _storage = new();
public void AddOrUpdate(string name)
{
Item item = _storage.GetOrAdd(name, _ => new());
lock (item) item.Increment(); // Don't forget to lock!
}
public void Analyze()
{
foreach (var (key, item) in _storage.ToArray())
{
lock (item) // Don't forget to lock!
{
long ticks = item.ModifiedTicks;
}
}
}
}
This solution probably offers the best performance, but the burden of remembering to lock correctly everywhere should not be underestimated.
I can't comment on the specifics of what exactly you are doing, but Interlocked and ConcurrentDictionary are better than locks you write yourself.
I would question this approach though. Your data is important enough, but not so important to persist it? Depending on the usage of the application this approach will slow it down by some degree. Again, not knowing exactly what you are doing, you could throw each "Add" into an MSMQ, and then have an external exe run at some schedule to process the items. The website will just fire and forget, with no threading requirements.
Say we have a POD type:
private class Messages {
public byte[] last;
public byte[] next;
}
and its instance messages.
When a user (caller) requests the instance, we want to give them a deep copy of the messages object (which may not be the latest). When a user sets their own version, we want to provide it to others as soon as possible without interrupting read requests (the older version should be removed as soon as possible, without interrupting reads).
How to do such object versioning using System.Collections.Concurrent?
What have I tried:
internal class ConcurrentMessagesHelper {
private readonly ConcurrentStack<Messages> _stack = new ConcurrentStack<Messages>();
public ConcurrentMessagesHelper() {
}
public void SetLatest(Messages m) {
var length = _stack.Count;
_stack.Push(m);
var range = new Messages[length];
_stack.TryPopRange(range, 0, length);
}
public bool ReadLatest(out Messages result) {
return _stack.TryPeek(out result);
}
}
Yet this helper approach seems like an ugly hack:
even though we know a result is guaranteed, we use Try and return a bool instead of the object;
TryPopRange makes us create an additional array the size of all previous versions.
This isn't POD. It's a POCO. I suggest you read up on the difference between .NET's value types and reference types, since their semantics are crucial while writing safe concurrent code.
Since reads and writes of references are guaranteed to be atomic in C#, the solution is simple (and doesn't require any special concurrent containers).
Assuming your Messages object is immutable once passed in:
internal class ConcurrentMessagesHelper {
private volatile Messages _current;
public void SetLatest(Messages m) {
_current = m;
}
public Messages ReadLatest() {
return _current;
}
}
Note that it's the reference to the object that's being copied here (atomically), and not the object's byte[] fields. volatile is required since the reference is accessed by multiple threads (it ensures correct behaviour, in particular with regards to memory ordering and limiting the optimizations the JIT can perform to only thread-safe ones).
If the Messages object passed to SetLatest can change while it's the latest, then all you have to do is make a copy first. SetLatest becomes:
public void SetLatest(Messages m) {
_current = DeepClone(m);
}
If readers are allowed to change the Messages object returned, then you have to copy it before letting them have it too. ReadLatest becomes:
public Messages ReadLatest() {
return DeepClone(_current);
}
Note that if the values contained in the byte[] fields of Messages are immutable during each message's lifetime, all you need is a shallow copy, not a deep one.
You can make the interface even nicer by wrapping it in a simple property:
internal class ConcurrentMessagesHelper {
private volatile Messages _current;
public Messages Current {
get { return DeepClone(_current); }
set { _current = DeepClone(value); }
}
private static Messages DeepClone(Messages m)
{
if (m == null)
return null;
return new Messages {
last = m.last == null ? null : (byte[])m.last.Clone(),
next = m.next == null ? null : (byte[])m.next.Clone()
};
}
}
If you actually did have a POD type (e.g. struct Messages), then I suggest the simplest solution would be to wrap it in a class so you can have an atomic reference to a copy of it, which would allow you to use the solution above. StrongBox<T> comes to mind.
The code in such a case becomes even simpler, because no explicit copying is required:
private struct Messages {
public byte[] last;
public byte[] next;
}
internal class ConcurrentMessagesHelper {
private volatile StrongBox<Messages> _current;
public Messages Current {
get { return _current.Value; }
set { _current = new StrongBox<Messages>(value); }
}
}
If the byte arrays in Messages can change during the object's lifetime, then we still need the deep cloning, though:
internal class ConcurrentMessagesHelper {
private volatile StrongBox<Messages> _current;
public Messages Current {
get { return DeepClone(_current.Value); }
set { _current = new StrongBox<Messages>(DeepClone(value)); }
}
private static Messages DeepClone(Messages m)
{
return new Messages {
last = m.last == null ? null : (byte[])m.last.Clone(),
next = m.next == null ? null : (byte[])m.next.Clone()
};
}
}
I have a static class 'Logger' with a public property called 'LogLevels' as in code below.
When the property is used concurrently in a multi-user or multi-threaded environment, could it cause problems?
Do I need to use thread synchronization for the code within the property 'LogLevels'?
public class Logger
{
private static List<LogLevel> _logLevels = null;
public static List<LogLevel> LogLevels
{
get
{
if (_logLevels == null)
{
_logLevels = new List<LogLevel>();
if (!string.IsNullOrWhiteSpace(System.Configuration.ConfigurationManager.AppSettings["LogLevels"]))
{
string[] lls = System.Configuration.ConfigurationManager.AppSettings["LogLevels"].Split(",".ToCharArray());
foreach (string ll in lls)
{
_logLevels.Add((LogLevel)System.Enum.Parse(typeof(LogLevel), ll));
}
}
}
if (_logLevels.Count == 0)
{
_logLevels.Add(LogLevel.Error);
}
return _logLevels;
}
}
}
UPDATE: I ended up using thread synchronization to solve concurrency problem in a static class, as in code below.
public class Logger
{
private static readonly System.Object _object = new System.Object();
private static List<LogLevel> _logLevels = null;
private static List<LogLevel> LogLevels
{
get
{
//Make sure that in a multi-threaded or multi-user scenario, we do not run into concurrency issues with this code.
lock (_object)
{
if (_logLevels == null)
{
_logLevels = new List<LogLevel>();
if (!string.IsNullOrWhiteSpace(System.Configuration.ConfigurationManager.AppSettings["SimpleDBLogLevelsLogger"]))
{
string[] lls = System.Configuration.ConfigurationManager.AppSettings["SimpleDBLogLevelsLogger"].Split(",".ToCharArray());
foreach (string ll in lls)
{
_logLevels.Add((LogLevel)System.Enum.Parse(typeof(LogLevel), ll));
}
}
}
if (_logLevels.Count == 0)
{
_logLevels.Add(LogLevel.Error);
}
}
return _logLevels;
}
}
}
When the property is used concurrently in a multi-user or multi-threaded environment, could it cause problems?
Absolutely. List<T> is not designed for multiple threads, except for the case where there are just multiple readers (no writers).
Do I need to use thread synchronization for the code within the property 'LogLevels'?
Well that's one approach. Or just initialize it on type initialization, and then return a read-only wrapper around it. (You really don't want multiple threads modifying it.)
Note that in general, doing significant amounts of work in a static constructor is a bad idea. Are you happy enough that if this fails, every access to this property will fail, forever?
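A minimal sketch of the "initialize on type initialization and return a read-only wrapper" idea follows. The LogLevel members, ParseLevels and ReadSetting are assumptions made for illustration. An environment variable stands in for ConfigurationManager.AppSettings so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public enum LogLevel { Debug, Info, Warning, Error } // assumed members; the original enum is not shown

public static class Logger
{
    // Built once by the type initializer, which the CLR runs a single time in a thread-safe way.
    private static readonly ReadOnlyCollection<LogLevel> _logLevels =
        ParseLevels(ReadSetting());

    // Callers get a read-only view, so no thread can modify the list after publication.
    public static ReadOnlyCollection<LogLevel> LogLevels => _logLevels;

    // Hypothetical stand-in for ConfigurationManager.AppSettings["LogLevels"].
    private static string ReadSetting() => Environment.GetEnvironmentVariable("LogLevels");

    public static ReadOnlyCollection<LogLevel> ParseLevels(string raw)
    {
        var levels = new List<LogLevel>();
        if (!string.IsNullOrWhiteSpace(raw))
        {
            foreach (string part in raw.Split(','))
            {
                // TryParse skips unknown names instead of throwing inside the type initializer.
                if (Enum.TryParse(part.Trim(), true, out LogLevel level))
                    levels.Add(level);
            }
        }
        if (levels.Count == 0)
            levels.Add(LogLevel.Error); // same fallback as the original property
        return levels.AsReadOnly();
    }
}
```

Skipping unparsable entries is a deliberate change from the original, which throws on a bad value. It keeps one typo in the configuration from making the whole type unusable.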
This code has race conditions and cannot be safely executed from multiple threads. The primary problem is that List<T> is not thread-safe, yet this code freely writes to it. This means the writes can occur in parallel and hence break the implicit contract of List<T>.
The short answer is "yes" and "yes": you do need thread synchronization.
The other question is, why re-invent the wheel? You can use something like log4net or .NET logging framework.
I would like to have a global object similar to a multi-value Dictionary that is shared among different Threads.
I would like the object to be created only once (for example getting the data from a Database) and then used by the different Threads.
The Object should be easily extendable with additional properties (currently have only JobName and URL).
If possible, I would prefer to avoid locking.
I am facing the following issues:
The current version displayed below is not Thread safe;
I cannot use a ConcurrentDictionary since I have extended the Dictionary object to allow multiple values for each key;
This is the object structure that should be modified easily:
public struct JobData
{
public string JobName;
public string URL;
}
I have extended the Dictionary object to allow multiple values for each key:
public class JobsDictionary : Dictionary<string, JobData>
{
public void Add(string key, string jobName, string url)
{
JobData data;
data.JobName = jobName;
data.URL = url;
this.Add(key, data);
}
}
Static class that is shared among Threads.
As you can see it creates a Dictionary entry for the specific Job the first time it is called for that Job.
For instance, the first time it is called for "earnings" it will create the "earnings" dictionary entry. This creates issues with Thread safety:
public static class GlobalVar
{
private static JobsDictionary jobsDictionary = new JobsDictionary();
public static JobData Job(string jobCat)
{
if (jobsDictionary.ContainsKey(jobCat))
return jobsDictionary[jobCat];
else
{
String jobName;
String url = null;
//TODO: get the Data from the Database
switch (jobCat)
{
case "earnings":
jobName="EarningsWhispers";
url = "http://www.earningswhispers.com/stocks.asp?symbol={0}";
break;
case "stock":
jobName="YahooStock";
url = "http://finance.yahoo.com/q?s={0}";
break;
case "functions":
jobName = "Functions";
url = null;
break;
default:
jobName = null;
url = null;
break;
}
jobsDictionary.Add(jobCat, jobName, url);
return jobsDictionary[jobCat];
}
}
}
In each Thread I get the specific Job property in this way:
//Get the Name
string JobName= GlobalVar.Job(jobName).JobName;
//Get the URL
string URL = string.Format((GlobalVar.Job(jobName).URL), sym);
How can I create a custom Dictionary that is "instantiated" once (I know it is not the right term since it is static...) and it is Thread-safe ?
Thanks
UPDATE
Ok, here is the new version.
I have simplified the code by removing the switch statement and loading all dictionary items at once (I need all of them anyway).
The advantage of this solution is that it is locked only once: when the dictionary data is added (the first Thread entering the lock will add data to the dictionary).
When the Threads access the dictionary for reading, it is not locked.
It should be thread-safe and it should not cause deadlocks, since jobsDictionary is private.
public static class GlobalVar
{
private static JobsDictionary jobsDictionary = new JobsDictionary();
public static JobData Job(string jobCat)
{
JobData result;
if (jobsDictionary.TryGetValue(jobCat, out result))
return result;
//if the jobsDictionary is not initialized yet...
lock (jobsDictionary)
{
if (jobsDictionary.Count == 0)
{
//TODO: get the Data from the Database
jobsDictionary.Add("earnings", "EarningsWhispers", "http://www.earningswhispers.com/stocks.asp?symbol={0}");
jobsDictionary.Add("stock", "YahooStock", "http://finance.yahoo.com/q?s={0}");
jobsDictionary.Add("functions", "Functions", null);
}
return jobsDictionary[jobCat];
}
}
}
If you are populating the collection once, you don't need any locking on reads at all, since a Dictionary is thread-safe when it is only read from. If you want to prevent multiple threads from initializing it multiple times, you can use a double-checked lock during initialization, like this:
static readonly object syncRoot = new object();
static Dictionary<string, JobData> cache;
static void Initialize()
{
if (cache == null)
{
lock (syncRoot)
{
if (cache == null)
{
cache = LoadFromDatabase();
}
}
}
}
Instead of allowing every thread to access the dictionary, hide it behind a facade that only exposes the operations you really need. This makes it much easier to reason about thread-safety. For instance:
public class JobDataCache : IJobData
{
readonly object syncRoot = new object();
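// Initialized up front so that AddJob never dereferences a null dictionary.
readonly Dictionary<string, JobData> cache = new Dictionary<string, JobData>();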
public void AddJob(string key, JobData data)
{
lock (this.syncRoot)
{
cache[key] = data;
}
}
}
Trying to avoid locking without having measured that locking actually hurts performance too much is a bad idea. Don't do that. Using a simple lock statement is often much simpler than writing lock-free code. Concurrency bugs have a nasty property compared to normal software bugs: they are very hard to reproduce and very hard to track down. If you can, avoid writing concurrency bugs in the first place. You can do this by writing the simplest code you can, even if it is slower. If it proves to be too slow, you can always optimize.
If you want to write lock-free code anyway, try using immutable data structures, or avoid changing existing data. This is one trick I used when writing Simple Injector (a reusable library). In this framework, I never update the internal dictionary, but always completely replace it with a new one. The dictionary itself is therefore never changed; the reference to that instance is just replaced with a reference to a completely new dictionary. This lets you avoid locks completely. However, you must realize that it is possible to lose updates. In other words, when multiple threads are updating that dictionary, one can lose its changes, simply because each thread creates a new copy of that dictionary and adds its own value to its own copy, before making that reference public to other threads.
In other words, you can only use this method when external callers only read (and you can recover from lost changes, for instance by querying the database again).
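The replace-instead-of-update idea can be sketched roughly as follows. CopyOnWriteCache is a hypothetical name, and this is not Simple Injector's actual code. The sketch also goes one step beyond the description above: it publishes the new dictionary with Interlocked.CompareExchange and retries on conflict, which avoids lost updates at the cost of an occasional extra copy.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical copy-on-write cache: readers never lock, writers replace the whole dictionary.
public sealed class CopyOnWriteCache<TKey, TValue>
{
    // Every dictionary stored here is fully built before publication and never mutated afterwards.
    private Dictionary<TKey, TValue> _snapshot = new Dictionary<TKey, TValue>();

    public bool TryGet(TKey key, out TValue value)
    {
        return Volatile.Read(ref _snapshot).TryGetValue(key, out value);
    }

    public void Set(TKey key, TValue value)
    {
        while (true)
        {
            Dictionary<TKey, TValue> current = Volatile.Read(ref _snapshot);
            var copy = new Dictionary<TKey, TValue>(current);
            copy[key] = value;

            // Publish only if nobody replaced the snapshot in the meantime; otherwise retry
            // against the newer snapshot so the other writer's change is not lost.
            if (ReferenceEquals(Interlocked.CompareExchange(ref _snapshot, copy, current), current))
                return;
        }
    }
}
```

Each write copies the whole dictionary, so this only pays off when reads vastly outnumber writes.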
UPDATE
Your updated version is still not thread-safe, because of the reasons I explained on @ili's answer. The following will do the trick:
public static class GlobalVar
{
private static readonly object syncRoot = new object();
private static JobsDictionary jobsDictionary = null;
public static JobData Job(string jobCat)
{
Initialize();
return jobsDictionary[jobCat];
}
private static void Initialize()
{
// Double-checked lock.
if (jobsDictionary == null)
{
lock (syncRoot)
{
if (jobsDictionary == null)
{
jobsDictionary = CreateJobsDictionary();
}
}
}
}
private static JobsDictionary CreateJobsDictionary()
{
var jobs = new JobsDictionary();
//TODO: get the Data from the Database
jobs.Add("earnings", "EarningsWhispers", "http://...");
jobs.Add("stock", "YahooStock", "http://...");
jobs.Add("functions", "Functions", null);
return jobs;
}
}
You can also use the static constructor, which would save you from having to write the double-checked lock yourself. However, it is dangerous to call the database inside a static constructor, because a static constructor will only run once and when it fails, the complete type will be unusable for as long as the AppDomain lives. In other words, your application must be restarted when this happens.
UPDATE 2:
You can also use .NET 4.0's Lazy<T>, which is safer than a double-checked lock, since it is easier to use correctly and is also thread-safe on processor architectures with weak memory models (weaker than x86, such as ARM):
static Lazy<Dictionary<string, JobData>> cache =
new Lazy<Dictionary<string, JobData>>(() => LoadFromDatabase());
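Put together with the GlobalVar class from the question, a Lazy<T> version might look like the sketch below. LoadJobs is a hypothetical placeholder for the database call, and the hard-coded entries are the ones from the question.

```csharp
using System;
using System.Collections.Generic;

public struct JobData
{
    public string JobName;
    public string URL;
}

public static class GlobalVar
{
    // With this constructor Lazy<T> uses LazyThreadSafetyMode.ExecutionAndPublication:
    // the factory runs at most once and every thread sees the same dictionary.
    private static readonly Lazy<Dictionary<string, JobData>> cache =
        new Lazy<Dictionary<string, JobData>>(LoadJobs);

    public static JobData Job(string jobCat)
    {
        // The dictionary is only read after construction, so no lock is needed here.
        return cache.Value[jobCat];
    }

    // Hypothetical placeholder for loading the data from the database.
    private static Dictionary<string, JobData> LoadJobs()
    {
        return new Dictionary<string, JobData>
        {
            { "earnings", new JobData { JobName = "EarningsWhispers", URL = "http://www.earningswhispers.com/stocks.asp?symbol={0}" } },
            { "stock", new JobData { JobName = "YahooStock", URL = "http://finance.yahoo.com/q?s={0}" } },
            { "functions", new JobData { JobName = "Functions", URL = null } }
        };
    }
}
```

Note that in this default mode an exception thrown by the factory is cached and rethrown on every later access to Value, so a failed database call has much the same effect as a failed static constructor. LazyThreadSafetyMode.PublicationOnly does not cache exceptions, but it may run the factory more than once.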
1) Use the singleton pattern to have one instance (one of the ways is to use a static class, as you have done).
2) To make anything thread-safe you should use lock or its equivalent. If you are afraid of unnecessary locks, do it like this:
public object GetValue(object key)
{
object result;
if(_dictionary.TryGetValue(key, out result))
return result;
lock(_dictionary)
{
if(_dictionary.TryGetValue(key, out result))
return result;
//some get data code
_dictionary[key]=result;
return result;
}
}
Often, when I want a class which is thread-safe, I do something like the following:
public class ThreadSafeClass
{
private readonly object theLock = new object();
private double propertyA;
public double PropertyA
{
get
{
lock (theLock)
{
return propertyA;
}
}
set
{
lock (theLock)
{
propertyA = value;
}
}
}
private double propertyB;
public double PropertyB
{
get
{
lock (theLock)
{
return propertyB;
}
}
set
{
lock (theLock)
{
propertyB = value;
}
}
}
public void SomeMethod()
{
lock (theLock)
{
PropertyA = 2.0 * PropertyB;
}
}
}
It works, but it is very verbose. Sometimes I even create a lock object for each method and property creating more verbosity and complexity.
I know that it is also possible to lock classes using the Synchronization attribute but I'm not sure how well that scales -- as I often expect to have hundreds of thousands, if not millions, of instances of thread-safe objects. This approach would create a synchronization context for every instance of the class, and requires the class to be derived from ContextBoundObject and therefore could not be derived from anything else -- since C# doesn't allow for multiple inheritance -- which is a show stopper in many cases.
Edit: As several of the responders have emphasized, there is no "silver bullet" thread-safe class design. I'm just trying to understand if the pattern I'm using is one of the good solutions. Of course the best solution in any particular situation is problem dependent. Several of the answers below contain alternative designs which should be considered.
Edit: Moreover, there is more than one definition of thread safety. For example, in my implementation above, the following code would NOT be thread-safe:
var myObject = new ThreadSafeClass();
myObject.PropertyA++; // NOT thread-safe
So, does the class definition above represent a good approach? If not, what would you recommend for a design with similar behavior which would be thread-safe for a similar set of uses?
There is no "one-size-fits-all" solution to the multi-threading problem. Do some research on creating immutable classes and learn about the different synchronization primitives.
This is an example of a semi-immutable, or "programmer's-immutable", class.
public class ThreadSafeClass
{
public double A { get; private set; }
public double B { get; private set; }
public double C { get; private set; }
public ThreadSafeClass(double a, double b, double c)
{
A = a;
B = b;
C = c;
}
public ThreadSafeClass RecalculateA()
{
return new ThreadSafeClass(2.0 * B, B, C);
}
}
This example moves your synchronization code into another class and serializes access to an instance. In reality, you don't really want more than one thread operating on an object at any given time.
public class ThreadSafeClass
{
public double PropertyA { get; set; }
public double PropertyB { get; set; }
public double PropertyC { get; set; }
private ThreadSafeClass()
{
}
public void ModifyClass()
{
// do stuff
}
public class Synchronizer
{
private ThreadSafeClass instance = new ThreadSafeClass();
private readonly object locker = new object();
public void Execute(Action<ThreadSafeClass> action)
{
lock (locker)
{
action(instance);
}
}
public T Execute<T>(Func<ThreadSafeClass, T> func)
{
lock (locker)
{
return func(instance);
}
}
}
}
Here is a quick example of how you would use it. It may seem a little clunky but it allows you to execute many actions on the instance in one go.
var syn = new ThreadSafeClass.Synchronizer();
syn.Execute(inst => {
inst.PropertyA = 2.0;
inst.PropertyB = 2.0;
inst.PropertyC = 2.0;
});
var a = syn.Execute<double>(inst => {
return inst.PropertyA + inst.PropertyB;
});
I know this might sound like a smart a** answer but ... the BEST way to develop threadsafe classes is to actually know about multithreading: its implications and its intricacies. There's no silver bullet.
First you need a good reason to use it. Threads are a tool, you don't want to hit everything with your new found hammer.
Secondly, learn about the problems of multithreading... deadlocks, race conditions, starvation and so on.
Third, make sure it is worth it. I'm talking about benefit/cost.
Finally... be prepared for heavy debugging. Debugging multithreaded code is much more difficult than standard old sequential code. Learn some techniques for doing that.
Seriously... don't try to multithread (in production scenarios I mean) until you know what you're getting yourself into... It can be a huge mistake.
Edit: You should of course know the synchronization primitives of both the operating system and your language of choice (C# under Windows in this case, I guess).
I'm sorry I'm not giving you the code to just make a class threadsafe. That's because it does not exist. A completely threadsafe class will probably just be slower than avoiding threads altogether and will probably act as a bottleneck to whatever you're doing... effectively undoing whatever you think you're achieving by using threads.
Bear in mind that the term "thread safe" is not specific; what you're doing here would be more accurately referred to as "synchronization" through the use of a Monitor lock.
That said, the verbosity around synchronized code is pretty much unavoidable. You could cut down on some of the whitespace in your example by turning things like this:
lock (theLock)
{
propertyB = value;
}
into this:
lock (theLock) propertyB = value;
As to whether or not this is the right approach for you we really need more information. Synchronization is just one approach to "thread safety"; immutable objects, semaphores, etc. are all different mechanisms that fit different use-cases. For the simple example you provide (where it looks like you're trying to ensure the atomicity of a get or set operation), then it looks like you've done the right things, but if your code is intended to be more of an illustration than an example then things may not be as simple.
Since no one else seems to be doing it, here is some analysis of your specific design.
Want to read any single property? Threadsafe
Want to update to any single property? Threadsafe
Want to read a single property and then update it based on its original value? Not Threadsafe
Thread 2 could update the value between thread 1's read and update.
Want to update two related properties at the same time? Not Threadsafe
You could end up with Property A having thread 1's value and Property B having thread 2's value.
Thread 1 Update A
Thread 2 Update A
Thread 1 Update B
Thread 2 Update B
Want to read two related properties at the same time? Not Threadsafe
Again, you could be interrupted between the first and second read.
I could continue, but you get the idea. Threadsafety is purely based on how you plan to access the objects and what promises you need to make.
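For the cases marked "Not Threadsafe" above, the usual fix is to expose the compound operation itself as a method that holds the lock for its full duration, rather than composing it from individually locked properties. A minimal sketch, in which PairedValues and its member names are hypothetical:

```csharp
using System;

// Hypothetical class whose public operations are the compound steps themselves.
public class PairedValues
{
    private readonly object theLock = new object();
    private double a;
    private double b;

    // Read-then-update as one step: no other thread can slip in between the read and the write.
    public void AddToA(double delta)
    {
        lock (theLock) { a += delta; }
    }

    // Both related values change together, so readers never see one thread's A with another's B.
    public void SetBoth(double newA, double newB)
    {
        lock (theLock) { a = newA; b = newB; }
    }

    // Both values are read under the same lock and returned as a consistent pair.
    public (double A, double B) ReadBoth()
    {
        lock (theLock) { return (a, b); }
    }
}
```

Which operations need to be atomic depends on the promises the class has to make, so the set of methods will differ from class to class.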
You may find the Interlocked class helpful. It contains several atomic operations.
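As a rough illustration, here is a counter built on Interlocked. HitCounter and its members are made-up names. Interlocked.Add has no overload for double that I know of, so the double total uses a CompareExchange retry loop instead.

```csharp
using System.Threading;

// Hypothetical lock-free counter using only Interlocked and Volatile.
public class HitCounter
{
    private long hits;
    private double total;

    // Atomic read-modify-write on a long; returns the incremented value.
    public long Increment() => Interlocked.Increment(ref hits);

    // Interlocked.Read guarantees an atomic 64-bit read even in a 32-bit process.
    public long Read() => Interlocked.Read(ref hits);

    // Retry until the value we based the sum on is still the stored value.
    // Assumes the total never becomes NaN, since NaN never compares equal to itself.
    public void AddToTotal(double delta)
    {
        double seen, updated;
        do
        {
            seen = Volatile.Read(ref total);
            updated = seen + delta;
        }
        while (Interlocked.CompareExchange(ref total, updated, seen) != seen);
    }

    public double Total => Volatile.Read(ref total);
}
```

This only makes each field atomic on its own. Two fields that must change together still need a lock.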
One thing you could do that could help you avoid the extra code is use something like PostSharp to automatically inject those lock statements into your code, even if you had hundreds of them. All you'd need is one attribute attached to the class, and the attribute's implementation which would add the extra locking variables.
As per my comment above: it gets a little hairier if you want to allow simultaneous readers but only one writer. Note, if you have .NET 3.5 or later, use ReaderWriterLockSlim rather than ReaderWriterLock for this type of pattern.
public class ThreadSafeClass
{
private readonly ReaderWriterLock theLock = new ReaderWriterLock();
private double propertyA;
public double PropertyA
{
get
{
theLock.AcquireReaderLock(Timeout.Infinite);
try
{
return propertyA;
}
finally
{
theLock.ReleaseReaderLock();
}
}
set
{
theLock.AcquireWriterLock(Timeout.Infinite);
try
{
propertyA = value;
}
finally
{
theLock.ReleaseWriterLock();
}
}
}
private double propertyB;
public double PropertyB
{
get
{
theLock.AcquireReaderLock(Timeout.Infinite);
try
{
return propertyB;
}
finally
{
theLock.ReleaseReaderLock();
}
}
set
{
theLock.AcquireWriterLock(Timeout.Infinite);
try
{
propertyB = value;
}
finally
{
theLock.ReleaseWriterLock();
}
}
}
public void SomeMethod()
{
theLock.AcquireWriterLock(Timeout.Infinite);
try
{
theLock.AcquireReaderLock(Timeout.Infinite);
try
{
PropertyA = 2.0 * PropertyB;
}
finally
{
theLock.ReleaseReaderLock();
}
}
finally
{
theLock.ReleaseWriterLock();
}
}
}
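For comparison, the same class with ReaderWriterLockSlim might look roughly like this. SlimThreadSafeClass is a hypothetical name. Because the lock is created with its default no-recursion policy, SomeMethod works on the fields directly instead of going through the locked properties.

```csharp
using System.Threading;

// Hypothetical ReaderWriterLockSlim variant of the class above.
public class SlimThreadSafeClass
{
    // Default policy is LockRecursionPolicy.NoRecursion: re-entering the lock on the same thread throws.
    private readonly ReaderWriterLockSlim theLock = new ReaderWriterLockSlim();
    private double propertyA;
    private double propertyB;

    public double PropertyA
    {
        get
        {
            theLock.EnterReadLock();
            try { return propertyA; }
            finally { theLock.ExitReadLock(); }
        }
        set
        {
            theLock.EnterWriteLock();
            try { propertyA = value; }
            finally { theLock.ExitWriteLock(); }
        }
    }

    public double PropertyB
    {
        get
        {
            theLock.EnterReadLock();
            try { return propertyB; }
            finally { theLock.ExitReadLock(); }
        }
        set
        {
            theLock.EnterWriteLock();
            try { propertyB = value; }
            finally { theLock.ExitWriteLock(); }
        }
    }

    public void SomeMethod()
    {
        // One write lock covers the read of B and the write of A; fields are used
        // directly because the properties would try to take the lock again.
        theLock.EnterWriteLock();
        try { propertyA = 2.0 * propertyB; }
        finally { theLock.ExitWriteLock(); }
    }
}
```

ReaderWriterLockSlim implements IDisposable, which is worth keeping in mind if you really expect millions of instances. For properties this small, a plain lock may well be faster, so measure before choosing.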