I have a class representing a unique real-world object. Its name is personally identifiable information (a vehicle license plate, if you're interested), so as a basic first step I hash the name and use that instead. (I know: a salt is required, etc. This is just a foundation.)
I have a test which instantiates the object with a fixed input (a test name, "A") and asserts that the Id (the Base64 string of the hash output) is as expected. However, it occasionally fails (!!)
I dug a bit deeper, and here's a screenshot of a conditional breakpoint which breaks only when the hash output isn't the norm. The input is still as expected (the 'bytes' variable contains { 65 }), but the output differs from the normal SHA-384 hash (normally it's "rRSq8lAgvvL9Tj617AxQJyzf1mB0sO0DfJoRJUMhqsBymYU3S+6qW4ClBNBIvhhk").
The lines that compute bytes and Id (plus the dummy x counter after them) are split up a bit to facilitate debugging, but otherwise this class is as it should be.
Any clues as to how this is possible would be very welcome. Running Windows 11, using .NET 7 and the latest version of Visual Studio Enterprise.
Here's a pic of a conditional breakpoint being hit where hash output is not the norm:
Here's the code should anyone wish to try to reproduce it (note it's not consistent: it's just occasional)
using System.Security.Cryptography;
using System.Text;
namespace Domain.Models.Object
{
/// <summary>
/// One-way identifier
/// </summary>
public record ObjectIdentifier
{
private static SHA384 hash = SHA384.Create();
public ObjectIdentifier(string name)
{
if (name == null)
{
throw new ArgumentNullException(nameof(name));
}
var bytes = Encoding.UTF8.GetBytes(name);
Id = Convert.ToBase64String(hash.ComputeHash(bytes));
int x = 0;
x++;
}
public string Id { get; init; }
public override string ToString()
{
return Id;
}
}
}
And here's the test:
[Fact]
public void ObjectIdentifier_AsExpected()
{
// arrange
var obj = new ObjectIdentifier("A");
// assert
Assert.Equal("rRSq8lAgvvL9Tj617AxQJyzf1mB0sO0DfJoRJUMhqsBymYU3S+6qW4ClBNBIvhhk", obj.Id);
}
I also note that the new hash value is not consistently the same: here's another failure with a different output from the previous screenshot:
I also note that adding this line right after bytes is computed causes the inconsistency to stop happening altogether:
Debug.Assert(bytes.Length == 1 && bytes[0] == 65);
Unfortunately, this is not a suitable fix :P
Update
The invalid outputs appear to be just the two supplied above; I haven't observed any further variants.
Also, changing it to a class (instead of a record) makes no difference.
I'm also observing this effect in a test console app which has only two identifiers supplied to it, yet more than two distinct hashes are output:
So, as Matt kindly pointed out in the comments, this operation isn't thread-safe: a HashAlgorithm instance must not be used from several threads at once, and the shared worker object (shared because it was static) was to blame. An easy one to miss, and a great example of why unit tests are a must!
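An alternative fix, sketched under the assumption of .NET 5 or later (the question uses .NET 7): the static one-shot SHA384.HashData method keeps no shared mutable state, so there is no instance to share between threads at all. The class keeps the question's ObjectIdentifier shape; only the hashing line changes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

/// <summary>
/// One-way identifier (sketch). SHA384.HashData is a static one-shot API
/// (.NET 5+), so no HashAlgorithm instance is shared between threads.
/// </summary>
public record ObjectIdentifier
{
    public ObjectIdentifier(string name)
    {
        if (name == null)
        {
            throw new ArgumentNullException(nameof(name));
        }
        byte[] bytes = Encoding.UTF8.GetBytes(name);
        Id = Convert.ToBase64String(SHA384.HashData(bytes));
    }

    public string Id { get; init; }

    public override string ToString() => Id;
}
```

With this version the test above should pass consistently, even when other tests construct identifiers concurrently.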
I opted to use an object pool to get a unique worker per instantiation while allowing for some reduction in allocation overhead:
using Domain.Implementations;
using Microsoft.Extensions.ObjectPool;
using System.Text;
namespace Domain.Models.Object
{
/// <summary>
/// One-way identifier
/// </summary>
public class ObjectIdentifier
{
private static ObjectPool<HashWrapper> hashObjects = ObjectPool.Create<HashWrapper>();
public ObjectIdentifier(string name)
{
if (name == null)
{
throw new ArgumentNullException(nameof(name));
}
var hashworker = hashObjects.Get();
try
{
Id = Convert.ToBase64String(hashworker.Value.ComputeHash(Encoding.UTF8.GetBytes(name)));
}
finally
{
hashObjects.Return(hashworker);
}
}
public string Id { get; init; }
public override string ToString()
{
return Id;
}
}
}
The hash wrapper class (wraps the worker and provides a public constructor for the object pool)
using System.Security.Cryptography;
internal class HashWrapper
{
public HashWrapper()
{
Value = SHA384.Create();
}
public HashAlgorithm Value { get; private set; }
}
Update
This implementation with a shared pool provided a (very minor) improvement in speed and memory allocation when constructing larger numbers of objects. For smaller numbers, there is a (very minor) performance penalty. YMMV; see below.
The benchmark ran three variants: Shared = true (a normal lock on a single shared object), Shared = false (a new SHA worker instance for each instantiation) and Shared = null (shown as "?" in the column), which uses the object pool.
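The benchmark code itself isn't shown, so the following is only a hypothetical reconstruction of what the lock-based (Shared = true) and per-call (Shared = false) variants might look like; the class HashVariants and its method names are illustrative, not from the original.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical reconstruction of two benchmarked variants (not the original benchmark code).
public static class HashVariants
{
    private static readonly SHA384 sharedHash = SHA384.Create();
    private static readonly object sharedLock = new object();

    // Shared = true: one instance for the whole process; the lock ensures
    // only one thread at a time calls ComputeHash on it.
    public static string HashShared(string name)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(name);
        lock (sharedLock)
        {
            return Convert.ToBase64String(sharedHash.ComputeHash(bytes));
        }
    }

    // Shared = false: a fresh instance per call, disposed when the method returns.
    public static string HashPerCall(string name)
    {
        using SHA384 hash = SHA384.Create();
        return Convert.ToBase64String(hash.ComputeHash(Encoding.UTF8.GetBytes(name)));
    }
}
```

The lock serializes all hashing through one instance (no per-call allocation, but contention under load), while the per-call variant avoids contention at the cost of creating and disposing a hash object each time; the object pool sits between the two.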
Related
One of our projects uses key-value pairs where certain runtime parameters (which do not change during a single run of the program) determine which value is retrieved. For example:
Program run in test mode with the parameter "Municipal":
Key: "testMunicipalDirectory"
Value: "C:\Foo\Bar\"
Program run with the parameter "State":
Key: "StateDirectory"
Value: "C:\Bar\Baz\"
To make it slightly more complicated, if there is no matching key for, say "testMunicipalImagesDirectory", there is a fallback of "defaultImagesDirectory":
Key: "testMunicipalImagesDirectory" ?? "defaultImagesDirectory"
Value: "C:\Foo\Bar\Images" ?? "C:\Images"
Currently, there's a lot of code duplication, inefficiency and room for error. Every time one of these is referenced, there's string concatenation, null-coalescing and other logic going on.
It seems like this would benefit from a single-instance object that is passed certain parameters on initialization (test or not, "State" or "Municipal", etc), that will return the correct values for each different property the keys represent.
Many answers I found to questions asking how to use the singleton design pattern with parameters basically boil down to "if it uses parameters, you probably do not want a singleton". In my case, it is invalid to attempt to initialize the object with different values, and an exception should be thrown if this happens.
This is how I would accomplish this goal (pseudo-C#) (lazy-loading is not a requirement but is used here):
public sealed class Helper
{
// how can we enforce that Init has been called?
private static readonly Lazy<Helper> lazyLoader = new Lazy<Helper>(() => new Helper(name, test));
public static Helper Instance { get { return lazyLoader.Value; } }
public static void Init(string name, bool test)
{
// if it has already been initialized
throw new InvalidOperationException("This has already been initialized.");
// else initialize it
}
private string Name { get; set; }
private bool Test { get; set; }
private Helper(string name, bool test) { } // assign to properties, any other ctor logic
public string Directory
{ get { return ValueGetter.Get((this.Test ? "test" : "") + this.Name + "Directory", "defaultDirectory"); } }
}
public static class ValueGetter
{
public static string Get(string key, string fallbackKey)
{
if (Keys.Any(k => k == key))
return Keys[key].Value;
else
return Keys[fallbackKey].Value;
}
}
But as you can see, there are questions remaining. How can it enforce calling Init before using the Instance, but not require those parameters to be passed every time Instance is accessed?
Is this the correct direction to go, or is there a better design pattern to use?
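One possible direction, sketched under assumptions: a static Init that may run only once (guarded by a lock) and an Instance property that throws until Init has been called, so callers never pass the parameters again. The Keys dictionary is a hypothetical in-memory stand-in for the real key-value store, and the "defaultDirectory" fallback key and its value are assumed by analogy with "defaultImagesDirectory".

```csharp
using System;
using System.Collections.Generic;

public sealed class Helper
{
    private static readonly object initLock = new object();
    private static volatile Helper instance; // null until Init runs

    // Fails loudly if someone reads Instance before Init.
    public static Helper Instance =>
        instance ?? throw new InvalidOperationException("Helper.Init must be called first.");

    public static void Init(string name, bool test)
    {
        lock (initLock)
        {
            if (instance != null)
                throw new InvalidOperationException("This has already been initialized.");
            instance = new Helper(name, test);
        }
    }

    private string Name { get; }
    private bool Test { get; }

    private Helper(string name, bool test)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
        Test = test;
    }

    // Builds e.g. "testMunicipalDirectory", falling back to the assumed "defaultDirectory".
    public string Directory =>
        ValueGetter.Get((Test ? "test" : "") + Name + "Directory", "defaultDirectory");
}

public static class ValueGetter
{
    // Hypothetical key store; replace with the real configuration source.
    public static readonly Dictionary<string, string> Keys = new Dictionary<string, string>
    {
        ["testMunicipalDirectory"] = @"C:\Foo\Bar\",
        ["StateDirectory"] = @"C:\Bar\Baz\",
        ["defaultDirectory"] = @"C:\Default\", // hypothetical value
    };

    public static string Get(string key, string fallbackKey) =>
        Keys.TryGetValue(key, out string value) ? value : Keys[fallbackKey];
}
```

Lazy<T> isn't needed here because Init itself is the initialization point, and a second Init throws, matching the requirement that re-initializing with different values is invalid.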
I want a unique ID (preferably static, without computation) for each class implementation, but not per instance. The most obvious way to do this is to hardcode a value in each class, but keeping the values unique then becomes a task for a human, which isn't ideal.
abstract class Base
{
public abstract int GetID();
}
class Foo : Base
{
public override int GetID() => 10;
}
class Bar : Base
{
public override int GetID() => 20;
}
Foo foo1 = new Foo();
Foo foo2 = new Foo();
Bar bar = new Bar();
Console.WriteLine(foo1.GetID() == foo2.GetID()); // True: same class, same ID
Console.WriteLine(foo1.GetID() != bar.GetID()); // True: different classes, different IDs
The class name would be an obvious unique identifier, but I need an int (or fixed length bytes). I pack the entire object into bytes, and use the id to know what class it is when I unpack it at the other end.
Hashing the class name every time I call GetID() seems needlessly processing-heavy just to get an ID number.
I could also make an enum as a lookup, but again I need to populate the enum manually.
EDIT: People have been asking important questions, so I'll put the info here.
Needs to be unique per class, not per instance (this is why the identified duplicate question doesn't answer this one).
ID value needs to be persistent between runs.
Value needs to be fixed length bytes or int. Variable length strings such as class name are not acceptable.
Needs to reduce CPU load wherever possible (caching results or using assembly based metadata instead of doing a hash each time).
Ideally, the ID can be retrieved from a static function. This means I can make a static lookup function that matches ID to class.
The number of different classes that need an ID isn't that big (<100), so collisions aren't a major concern.
EDIT2:
Some more colour since people are skeptical that this is really needed. I'm open to a different approach.
I'm writing some networking code for a game, and it's broken down into message objects. Each different message type is a class that inherits from MessageBase and adds its own fields, which will be sent.
The MessageBase class has a method for packing itself into bytes, and it sticks a message identifier (the class ID) on the front. When it comes to unpacking it at the other end, I use the identifier to know how to unpack the bytes. This results in some easy to pack/unpack messages and very little overhead (few bytes for ID, then just class property values).
Currently I hard code an ID number in the classes, but it doesn't seem like the best way of doing things.
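A minimal sketch of the framing described above, assuming a uint ID written little-endian (BinaryWriter always writes little-endian) in front of a string payload. ChatMessage, Pack, WritePayload, Read and the MessageCodec registry are hypothetical names; the hard-coded ID mirrors the current approach, and a generated per-type ID could replace it without changing the framing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public abstract class MessageBase
{
    public abstract uint Id { get; }
    protected abstract void WritePayload(BinaryWriter writer);

    // Prefix the payload with the message ID so the receiver knows how to decode it.
    public byte[] Pack()
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(Id);     // 4 bytes
            WritePayload(writer);
        }
        return stream.ToArray(); // ToArray still works after the stream is closed
    }
}

public sealed class ChatMessage : MessageBase
{
    public const uint MessageId = 10; // hard-coded, as in the question
    public string Text { get; set; } = "";
    public override uint Id => MessageId;
    protected override void WritePayload(BinaryWriter writer) => writer.Write(Text);
    public static ChatMessage Read(BinaryReader reader) => new ChatMessage { Text = reader.ReadString() };
}

public static class MessageCodec
{
    // ID -> decoder for the bytes that follow the ID.
    private static readonly Dictionary<uint, Func<BinaryReader, MessageBase>> decoders =
        new Dictionary<uint, Func<BinaryReader, MessageBase>>
        {
            [ChatMessage.MessageId] = ChatMessage.Read,
        };

    public static MessageBase Unpack(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        uint id = reader.ReadUInt32();
        if (!decoders.TryGetValue(id, out var decode))
            throw new InvalidDataException($"Unknown message ID {id}.");
        return decode(reader);
    }
}
```

Unpack reads the 4-byte ID first and dispatches on it, so each new message type needs one registry entry.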
EDIT3: Here is my code after implementing the accepted answer.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;
public class MessageBase
{
public MessageID id { get { return GetID(); } }
private MessageID cacheId;
private MessageID GetID()
{
// Check if cacheId hasn't been initialised (assumes MessageID is a class)
if (cacheId == null)
{
// Hash the class name
using (MD5 md5 = MD5.Create())
{
byte[] md5Bytes = md5.ComputeHash(Encoding.UTF8.GetBytes(GetType().AssemblyQualifiedName));
// Convert the first four bytes into a uint32, create the MessageID from it and store it in the cache
cacheId = new MessageID(BitConverter.ToUInt32(md5Bytes, 0));
}
}
// Return the cacheId
return cacheId;
}
}
public class Protocol
{
// MessageID must override Equals/GetHashCode to work as a dictionary key
private Dictionary<Type, MessageID> messageTypeToId = new Dictionary<Type, MessageID>();
private Dictionary<MessageID, Type> idToMessageType = new Dictionary<MessageID, Type>();
private Dictionary<MessageID, Action<MessageBase>> handlers = new Dictionary<MessageID, Action<MessageBase>>();
public Protocol()
{
// Create a list of all classes in this namespace that are a subclass of MessageBase
IEnumerable<Type> messageClasses = from t in Assembly.GetExecutingAssembly().GetTypes()
where t.Namespace == GetType().Namespace && t.IsSubclassOf(typeof(MessageBase))
select t;
// Iterate through the list of message classes, and store their type and id in the dicts
foreach (Type messageClass in messageClasses)
{
// id is an instance property, not a static field, so read it from an instance
// (assumes each message class has a public parameterless constructor)
MessageID id = ((MessageBase)Activator.CreateInstance(messageClass)).id;
messageTypeToId[messageClass] = id;
idToMessageType[id] = messageClass;
}
}
}
Given that you can get a Type by calling GetType on the instance, you can easily cache the results. That reduces the problem to working out how to generate an ID for each type. You'd then call something like:
int id = typeIdentifierCache.GetIdentifier(foo1.GetType());
... or make GetIdentifier accept object and it can call GetType(), leaving you with
int id = typeIdentifierCache.GetIdentifier(foo1);
At that point, the detail is all in the type identifier cache.
A simple option would be to take a hash (e.g. SHA-256, for stability and making it very unlikely that you'll encounter collisions) of the fully-qualified type name. To prove that you have no collisions, you could easily write a unit test that runs over all the type names in the assembly and hashes them, then checks there are no duplicates. (Even that might be overkill, given the nature of SHA-256.)
This is all assuming that the types are in a single assembly. If you need to cope with multiple assemblies, you may want to hash the assembly-qualified name instead.
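A sketch of such a cache, assuming .NET 5 or later for the one-shot SHA256.HashData method. The names TypeIdentifierCache and GetIdentifier follow the snippets above; truncating the digest to its first four bytes to get an int is an assumption of this sketch, and reading them as little-endian keeps the value independent of the machine's byte order. Unlike GetHashCode(), a hash of the type name is the same in every run.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

public sealed class TypeIdentifierCache
{
    private readonly ConcurrentDictionary<Type, int> cache = new ConcurrentDictionary<Type, int>();

    public int GetIdentifier(object obj) => GetIdentifier(obj.GetType());

    // Hashes each type name once; later calls are dictionary lookups.
    public int GetIdentifier(Type type) => cache.GetOrAdd(type, ComputeIdentifier);

    private static int ComputeIdentifier(Type type)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(type.FullName ?? type.Name));
        // First 4 bytes of the digest, read little-endian regardless of CPU endianness.
        return BinaryPrimitives.ReadInt32LittleEndian(digest);
    }
}
```

FullName omits the assembly; hashing AssemblyQualifiedName instead covers multiple assemblies, but it includes the assembly version, so IDs would then change whenever the version does.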
Here is one suggestion. I have used a SHA-256 byte array, which is guaranteed to be a fixed size and astronomically unlikely to have a collision. That may well be overkill; you can easily substitute something smaller. You could also use AssemblyQualifiedName rather than FullName if you need to worry about version differences or the same class name in multiple assemblies.
Firstly, here are all my usings
using System;
using System.Collections.Concurrent;
using System.Text;
using System.Security.Cryptography;
Next, a static cached type hasher object to remember the mapping between your types and the resulting byte arrays. You don't need the Console.WriteLines below, they are just there to demonstrate that you are not computing it over and over again.
public static class TypeHasher
{
private static ConcurrentDictionary<Type, byte[]> cache = new ConcurrentDictionary<Type, byte[]>();
public static byte[] GetHash(Type type)
{
byte[] result;
if (!cache.TryGetValue(type, out result))
{
Console.WriteLine("Computing Hash for {0}", type.FullName);
using (SHA256 sha = SHA256.Create()) // SHA256Managed is obsolete as of .NET 6
{
result = sha.ComputeHash(Encoding.UTF8.GetBytes(type.FullName));
}
cache.TryAdd(type, result);
}
else
{
// Not actually required, but shows that hashing only done once per type
Console.WriteLine("Using cached Hash for {0}", type.FullName);
}
return result;
}
}
Next, an extension method on object so that you can ask for anything's id. Of course if you have a more suitable base class, it doesn't need to go on object per se.
public static class IdExtension
{
public static byte[] GetId(this object obj)
{
return TypeHasher.GetHash(obj.GetType());
}
}
Next, here are some random classes
public class A
{
}
public class ChildOfA : A
{
}
public class B
{
}
And finally, here is everything put together.
public class Program
{
public static void Main()
{
A a1 = new A();
A a2 = new A();
B b1 = new B();
ChildOfA coa = new ChildOfA();
Console.WriteLine("a1 hash={0}", Convert.ToBase64String(a1.GetId()));
Console.WriteLine("b1 hash={0}", Convert.ToBase64String(b1.GetId()));
Console.WriteLine("a2 hash={0}", Convert.ToBase64String(a2.GetId()));
Console.WriteLine("coa hash={0}", Convert.ToBase64String(coa.GetId()));
}
}
Here is the console output
Computing Hash for A
a1 hash=VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=
Computing Hash for B
b1 hash=335w5QIVRPSDS77mSp43if68S+gUcN9inK1t2wMyClw=
Using cached Hash for A
a2 hash=VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=
Computing Hash for ChildOfA
coa hash=wSEbCG22Dyp/o/j1/9mIbUZTbZ82dcRkav4olILyZs4=
On the other side, you would use reflection to iterate all of the types in your library and store a reverse dictionary of hash to type.
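A sketch of that reverse dictionary, assuming the message types live in one assembly and the hash is computed as in TypeHasher (SHA-256 of FullName). One detail matters: byte[] compares by reference, so it makes a poor dictionary key, and the sketch keys on the Base64 text instead. TypeLookup and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class TypeLookup
{
    // Maps Base64(SHA-256(FullName)) back to each concrete type assignable to baseType.
    public static Dictionary<string, Type> Build(Assembly assembly, Type baseType)
    {
        var map = new Dictionary<string, Type>();
        using SHA256 sha = SHA256.Create();
        foreach (Type t in assembly.GetTypes())
        {
            // Only concrete types can be instantiated when a message arrives.
            if (t.IsAbstract || !baseType.IsAssignableFrom(t)) continue;
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(t.FullName ?? t.Name));
            // Add throws on a duplicate key, which doubles as a collision check.
            map.Add(Convert.ToBase64String(hash), t);
        }
        return map;
    }
}
```

On receipt, looking up Convert.ToBase64String(id) in the map gives the type to instantiate.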
I haven't seen you answer whether the same value needs to persist between different runs, but if all you need is an ID for a class, use the built-in and simple GetHashCode method:
class BaseClass
{
public int ClassId() => GetType().GetHashCode();
}
If you are worried about the performance of multiple calls to GetHashCode(), first: don't, that is a ridiculous micro-optimization. But if you insist, store the result.
GetHashCode() is fast, that is its entire purpose, as a fast way to compare values in a hash.
EDIT:
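After doing some tests, I found that this method returns the same hash code between different runs. I did not test after altering the classes, though, and I don't know exactly how a Type is hashed. Note that the .NET documentation advises against persisting hash codes, because they can differ between .NET implementations, versions and platforms.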
Edit: all answers below (as of 19 Dec '16) are useful in making a decision. I accepted the most thorough answer to my question, but in the end chose to simply hash the file.
I am caching objects and using the assembly version as part of the key to invalidate the cached objects every time the build changes. This is inefficient because the actual class of the cached objects rarely change and are valid across builds.
How can I instead use a hash of the specific class signature (basically all properties) for the key, such that it only changes when the class itself changes?
I can think of a somewhat complicated way using reflection, but I wonder if there is a simple trick I'm missing or any compile time mechanism.
Thanks!
E.g. Signature of Foo --> #ABCD
public class Foo {
public string Bar {get; set;}
}
New signature of Foo (property type changed) --> #WXYZ
public class Foo {
public char[] Bar {get; set;}
}
As others have pointed out, it is dangerous to do something like this, because a signature doesn't define the logic behind it. That being said:
This is an extensible approach:
The method basically uses reflection to crawl through all properties of your type.
It then gets some specific values of those properties and calls ToString() on them.
Those values are appended to a string and GetHashCode() will be used on that string.
private int GetTypeHash<T>()
{
var propertiesToCheck = typeof(T).GetProperties();
if(propertiesToCheck == null || propertiesToCheck.Length == 0)
return 0;
StringBuilder sb = new StringBuilder();
foreach(var propertyToCheck in propertiesToCheck)
{
//Some simple things that could change:
sb.Append((int)propertyToCheck.Attributes);
sb.Append(propertyToCheck.CanRead);
sb.Append(propertyToCheck.CanWrite);
sb.Append(propertyToCheck.IsSpecialName);
sb.Append(propertyToCheck.Name);
sb.Append(propertyToCheck.PropertyType.AssemblyQualifiedName);
//It might be an index property
var indexParams = propertyToCheck.GetIndexParameters();
if(indexParams != null && indexParams.Length != 0)
{
sb.Append(indexParams.Length);
}
//It might have custom attributes
var customAttributes = propertyToCheck.CustomAttributes;
if(customAttributes != null)
{
foreach(var cusAttr in customAttributes)
{
sb.Append(cusAttr.AttributeType.AssemblyQualifiedName); // AttributeType, not GetType(), which would always be CustomAttributeData
}
}
}
return sb.ToString().GetHashCode();
}
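If the key must survive process restarts, as a cache key across builds does, two details of the method above matter: string.GetHashCode() is randomized per process on .NET Core and .NET 5+, and GetProperties() does not return properties in any guaranteed order. A variant sketched under those assumptions (it needs .NET 5+ for SHA256.HashData and Convert.ToHexString) sorts by name and hashes with SHA-256; the name GetStableTypeHash is hypothetical, and it records fewer details per property than the method above.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class SignatureHash
{
    public static string GetStableTypeHash<T>()
    {
        var sb = new StringBuilder();
        // Sort so the string doesn't depend on reflection's enumeration order.
        foreach (PropertyInfo p in typeof(T).GetProperties().OrderBy(prop => prop.Name, StringComparer.Ordinal))
        {
            sb.Append(p.Name).Append(':')
              .Append(p.PropertyType.FullName).Append(':')
              .Append(p.CanRead).Append(':')
              .Append(p.CanWrite);
            // Attribute types, also sorted for a reproducible result.
            foreach (string attr in p.CustomAttributes
                         .Select(a => a.AttributeType.FullName)
                         .OrderBy(n => n, StringComparer.Ordinal))
            {
                sb.Append('[').Append(attr).Append(']');
            }
            sb.Append(';');
        }
        // SHA-256 yields the same digest in every process, unlike string.GetHashCode().
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return Convert.ToHexString(digest);
    }
}
```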
You can hash the whole class file and use that as a key. When the file changes, the hash will change and that will meet your need
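A minimal sketch of hashing the source file, assuming the .cs file is available at runtime (for example, copied to the output directory); FileKey and ComputeFileHash are hypothetical names. Any edit to the file, including whitespace or comments, changes the key.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileKey
{
    // Hex SHA-256 of the file's bytes; changes whenever the file's contents change.
    public static string ComputeFileHash(string path)
    {
        using FileStream stream = File.OpenRead(path);
        using SHA256 sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(stream));
    }
}
```

For example, FileKey.ComputeFileHash("Models/Foo.cs") (a placeholder path) could replace the assembly version in the cache key.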
You can use the public properties of the class and generate a hash based on the name and type of each property:
int ComputeTypeHash<T>()
{
return typeof(T).GetProperties()
.SelectMany(p => new[] { p.Name.GetHashCode(), p.PropertyType.GetHashCode() })
.Aggregate(17, (h, x) => unchecked(h * 23 + x));
}
ComputeTypeHash<Foo_v1>().Dump(); // 1946663838
ComputeTypeHash<Foo_v2>().Dump(); // 1946663838
ComputeTypeHash<Foo_v3>().Dump(); // 1985957629
public class Foo_v1
{
public string Bar { get; set; }
}
public class Foo_v2
{
public string Bar { get; set; }
}
public class Foo_v3
{
public char[] Bar { get; set; }
}
Doing something like this is dangerous as you (or someone else) could be introducing logic into the properties themselves at some point. It's also possible that the properties make internal calls to other methods that do change (among other things). You won't be detecting changes that go beyond the signature so you are leaving the door open to disaster.
If this group of classes you refer to rarely changes, consider moving them out of the main assembly into their own assembly, or even breaking them down into more than one assembly if it makes sense. That way, their assembly version(s) will not change and there will be no cache refresh.
Given a Queue<MyMessage>, where MyMessage is the base class for several types of messages: all message types have different fields, so they use different numbers of bytes. It would therefore make sense to measure the fill level of this queue in bytes rather than in elements.
In fact, since this queue is associated with a connection, I could better control the message flow, reducing the traffic if the queue is nearly full.
To achieve this, I thought of wrapping a simple Queue in a custom class MyQueue.
public class MyQueue
{
private Queue<MyMessage> _outputQueue;
private Int32 _byteCapacity;
private Int32 _currentSize; // number of used bytes
public MyQueue(int byteCapacity)
{
this._outputQueue = new Queue<MyMessage>();
this._byteCapacity = byteCapacity;
this._currentSize = 0;
}
public void Enqueue(MyMessage msg)
{
this._outputQueue.Enqueue(msg);
this._currentSize += Marshal.SizeOf(msg.GetType());
}
public MyMessage Dequeue()
{
MyMessage result = this._outputQueue.Dequeue();
this._currentSize -= Marshal.SizeOf(result.GetType());
return result;
}
}
The problem is that this doesn't work for classes, because Marshal.SizeOf throws an ArgumentException for them.
Is it possible to calculate in some way the size of an object (instance of a class)?
Are there some alternatives to monitor the fill level of a queue in terms of bytes?
Are there any queues that can be managed in this way?
UPDATE: As an alternative solution, I could add a method int SizeBytes() to each message type. This solution seems a little ugly, although it would perhaps be the most efficient, since you cannot easily measure a reference type.
public interface MyMessage
{
Guid Identifier
{
get;
set;
}
int SizeBytes();
}
The classes that implement this interface must, in addition to implementing the SizeBytes() method, also implement an Identifier property.
public class ExampleMessage : MyMessage
{
public Guid Identifier { get; set; } // so I have a field and its Identifier property
public String Request { get; set; }
public int SizeBytes()
{
return (Marshal.SizeOf(Identifier)); // return 16
}
}
The sizeof operator can't be used with Guid outside an unsafe context, because Guid does not have a predefined size, so I use Marshal.SizeOf(). But at this point perhaps I should use experimentally determined values: for example, since Marshal.SizeOf() returns 16 for a Guid and a string consists of N chars, the SizeBytes() method could be as follows:
public int SizeBytes()
{
return (16 + Request.Length * sizeof(char));
}
If you could edit the MyMessage base class to add a virtual method SizeOf(), then you could have the message classes use the C# sizeof operator on their primitive fields. If you can do that, the rest of your code is gold.
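A sketch of the queue wrapper built on the SizeBytes() approach, assuming each message reports its own size. CurrentSize, ByteCapacity and IsNearlyFull (with an assumed 90% threshold) are hypothetical additions. The size is recorded alongside each message at enqueue time, so Dequeue subtracts exactly what was added even if the message changes while queued.

```csharp
using System;
using System.Collections.Generic;

public interface MyMessage
{
    Guid Identifier { get; set; }
    int SizeBytes();
}

public class ExampleMessage : MyMessage
{
    public Guid Identifier { get; set; }
    public string Request { get; set; } = "";
    // 16 bytes for the Guid plus 2 bytes per UTF-16 char (an approximation of the payload).
    public int SizeBytes() => 16 + Request.Length * sizeof(char);
}

public class MyQueue
{
    private readonly Queue<(MyMessage Message, int Size)> outputQueue = new Queue<(MyMessage, int)>();

    public MyQueue(int byteCapacity) => ByteCapacity = byteCapacity;

    public int ByteCapacity { get; }
    public int CurrentSize { get; private set; }

    // Hypothetical throttling threshold: at least 90% of capacity (integer math, no rounding).
    public bool IsNearlyFull => (long)CurrentSize * 10 >= (long)ByteCapacity * 9;

    public void Enqueue(MyMessage msg)
    {
        int size = msg.SizeBytes();
        outputQueue.Enqueue((msg, size));
        CurrentSize += size;
    }

    public MyMessage Dequeue()
    {
        var (message, size) = outputQueue.Dequeue();
        CurrentSize -= size;
        return message;
    }
}
```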
You can get an indication of the size of your objects by measuring the length of their binary serialization. Note that this figure will typically be higher than you expect, since .NET may also include metadata in the serialized representation. This approach would also require all your classes to be marked with the [Serializable] attribute. Note also that BinaryFormatter is obsolete as of .NET 5 and unsafe to use with untrusted data, so this applies mainly to .NET Framework.
public static long GetSerializedSize(object root)
{
using (var memoryStream = new MemoryStream())
{
var binaryFormatter = new BinaryFormatter();
binaryFormatter.Serialize(memoryStream, root);
return memoryStream.Length;
}
}
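On .NET 5 and later, where BinaryFormatter is obsolete, a similar rough estimate can come from a different serializer. This sketch swaps in System.Text.Json and counts the UTF-8 bytes; it needs no [Serializable], but it only sees public properties, so the figure approximates a wire size rather than the in-memory size. SizeEstimator is a hypothetical name.

```csharp
using System.Text.Json;

public static class SizeEstimator
{
    // Length of the UTF-8 JSON for the object's runtime type (public properties only).
    public static long GetSerializedSize(object root) =>
        JsonSerializer.SerializeToUtf8Bytes(root, root.GetType()).LongLength;
}
```

For a message whose only public property is Request = "ab", the JSON is {"Request":"ab"}, i.e. 16 bytes.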
Does anyone have a good resource on implementing a shared object pool strategy for a limited resource, in the vein of SQL connection pooling (i.e. one that is fully thread-safe)?
To follow up on @Aaronaught's request for clarification: the pool would be used for load balancing requests to an external service. To put it in a scenario that is probably easier to understand than my direct situation: I have a session object that functions similarly to the ISession object from NHibernate, in that each unique session manages its own connection to the database. Currently I have one long-running session object, and I am encountering issues where my service provider is rate limiting my usage of this individual session.
Because they don't expect a single session to be used as a long-running service account, they apparently treat it as a client that is hammering their service. Which brings me to my question: instead of having one individual session, I would create a pool of different sessions and split the requests to the service across those multiple sessions, instead of creating a single focal point as I was previously doing.
Hopefully that background offers some value but to directly answer some of your questions:
Q: Are the objects expensive to create?
A: No, the objects are a pool of limited resources.
Q: Will they be acquired/released very frequently?
A: Yes. Once again, they can be thought of as NHibernate ISessions, where one is usually acquired and released for the duration of every single page request.
Q: Will a simple first-come-first-serve suffice or do you need something more intelligent, i.e. that would prevent starvation?
A: A simple round-robin type distribution would suffice. By starvation I assume you mean callers becoming blocked waiting for releases when no sessions are available. This isn't really applicable, since the sessions can be shared by different callers. My goal is to distribute the usage across multiple sessions as opposed to one single session.
I believe this is probably a divergence from a normal usage of an object pool which is why I originally left this part out and planned just to adapt the pattern to allow sharing of objects as opposed to allowing a starvation situation to ever occur.
Q: What about things like priorities, lazy vs. eager loading, etc.?
A: There is no prioritization involved, for simplicity's sake just assume that I would create the pool of available objects at the creation of the pool itself.
This question is a little trickier than one might expect due to several unknowns: The behaviour of the resource being pooled, the expected/required lifetime of objects, the real reason that the pool is required, etc. Typically pools are special-purpose - thread pools, connection pools, etc. - because it is easier to optimize one when you know exactly what the resource does and more importantly have control over how that resource is implemented.
Since it's not that simple, what I've tried to do is offer up a fairly flexible approach that you can experiment with to see what works best. Apologies in advance for the long post, but there is a lot of ground to cover when it comes to implementing a decent general-purpose resource pool, and I'm really only scratching the surface.
A general-purpose pool would have to have a few main "settings", including:
Resource loading strategy - eager or lazy;
Resource loading mechanism - how to actually construct one;
Access strategy - you mention "round robin" which is not as straightforward as it sounds; this implementation can use a circular buffer which is similar, but not perfect, because the pool has no control over when resources are actually reclaimed. Other options are FIFO and LIFO; FIFO will have more of a random-access pattern, but LIFO makes it significantly easier to implement a Least-Recently-Used freeing strategy (which you said was out of scope, but it's still worth mentioning).
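For the case the question describes, where sessions are shared rather than checked out exclusively, plain round-robin selection may be enough on its own. A minimal sketch, assuming all sessions are created up front and each is safe to use from several callers at once; RoundRobin<T> and Next() are hypothetical names, and unlike the circular store below it never blocks or tracks "in use" state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public sealed class RoundRobin<T>
{
    private readonly T[] items;
    private int counter = -1; // first Next() returns items[0]

    public RoundRobin(IEnumerable<T> items)
    {
        this.items = items.ToArray();
        if (this.items.Length == 0)
            throw new ArgumentException("At least one item is required.", nameof(items));
    }

    // Thread-safe: Interlocked.Increment hands each caller a distinct ticket.
    // Casting to uint keeps the index non-negative after int overflow.
    public T Next()
    {
        uint ticket = (uint)Interlocked.Increment(ref counter);
        return items[ticket % (uint)items.Length];
    }
}
```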
For the resource loading mechanism, .NET already gives us a clean abstraction - delegates.
private Func<Pool<T>, T> factory;
Pass this through the pool's constructor and we're about done with that. Using a generic type with a new() constraint works too, but this is more flexible.
Of the other two parameters, the access strategy is the more complicated beast, so my approach was to use an inheritance (interface) based approach:
public class Pool<T> : IDisposable
{
// Other code - we'll come back to this
interface IItemStore
{
T Fetch();
void Store(T item);
int Count { get; }
}
}
The concept here is simple - we'll let the public Pool class handle the common issues like thread-safety, but use a different "item store" for each access pattern. LIFO is easily represented by a stack, FIFO is a queue, and I've used a not-very-optimized-but-probably-adequate circular buffer implementation using a List<T> and index pointer to approximate a round-robin access pattern.
All of the classes below are inner classes of the Pool<T> - this was a style choice, but since these really aren't meant to be used outside the Pool, it makes the most sense.
class QueueStore : Queue<T>, IItemStore
{
public QueueStore(int capacity) : base(capacity)
{
}
public T Fetch()
{
return Dequeue();
}
public void Store(T item)
{
Enqueue(item);
}
}
class StackStore : Stack<T>, IItemStore
{
public StackStore(int capacity) : base(capacity)
{
}
public T Fetch()
{
return Pop();
}
public void Store(T item)
{
Push(item);
}
}
These are the obvious ones - stack and queue. I don't think they really warrant much explanation. The circular buffer is a little more complicated:
class CircularStore : IItemStore
{
private List<Slot> slots;
private int freeSlotCount;
private int position = -1;
public CircularStore(int capacity)
{
slots = new List<Slot>(capacity);
}
public T Fetch()
{
if (Count == 0)
throw new InvalidOperationException("The buffer is empty.");
int startPosition = position;
do
{
Advance();
Slot slot = slots[position];
if (!slot.IsInUse)
{
slot.IsInUse = true;
--freeSlotCount;
return slot.Item;
}
} while (startPosition != position);
throw new InvalidOperationException("No free slots.");
}
public void Store(T item)
{
Slot slot = slots.Find(s => object.Equals(s.Item, item));
if (slot == null)
{
slot = new Slot(item);
slots.Add(slot);
}
slot.IsInUse = false;
++freeSlotCount;
}
public int Count
{
get { return freeSlotCount; }
}
private void Advance()
{
position = (position + 1) % slots.Count;
}
class Slot
{
public Slot(T item)
{
this.Item = item;
}
public T Item { get; private set; }
public bool IsInUse { get; set; }
}
}
I could have picked a number of different approaches, but the bottom line is that resources should be accessed in the same order that they were created, which means that we have to maintain references to them but mark them as "in use" (or not). In the worst-case scenario, only one slot is ever available, and it takes a full iteration of the buffer for every fetch. This is bad if you have hundreds of resources pooled and are acquiring and releasing them several times per second; not really an issue for a pool of 5-10 items, and in the typical case, where resources are lightly used, it only has to advance one or two slots.
Remember, these classes are private inner classes - that is why they don't need a whole lot of error-checking, the pool itself restricts access to them.
Throw in an enumeration and a factory method and we're done with this part:
// Outside the pool
public enum AccessMode { FIFO, LIFO, Circular };
private IItemStore itemStore;
// Inside the Pool
private IItemStore CreateItemStore(AccessMode mode, int capacity)
{
switch (mode)
{
case AccessMode.FIFO:
return new QueueStore(capacity);
case AccessMode.LIFO:
return new StackStore(capacity);
default:
Debug.Assert(mode == AccessMode.Circular,
"Invalid AccessMode in CreateItemStore");
return new CircularStore(capacity);
}
}
The next problem to solve is loading strategy. I've defined three types:
public enum LoadingMode { Eager, Lazy, LazyExpanding };
The first two should be self-explanatory; the third is sort of a hybrid, it lazy-loads resources but doesn't actually start re-using any resources until the pool is full. This would be a good trade-off if you want the pool to be full (which it sounds like you do) but want to defer the expense of actually creating them until first access (i.e. to improve startup times).
The loading methods really aren't too complicated, now that we have the item-store abstraction:
private int size;
private int count;
private T AcquireEager()
{
lock (itemStore)
{
return itemStore.Fetch();
}
}
private T AcquireLazy()
{
lock (itemStore)
{
if (itemStore.Count > 0)
{
return itemStore.Fetch();
}
}
Interlocked.Increment(ref count);
return factory(this);
}
private T AcquireLazyExpanding()
{
bool shouldExpand = false;
if (count < size)
{
int newCount = Interlocked.Increment(ref count);
if (newCount <= size)
{
shouldExpand = true;
}
else
{
// Another thread took the last spot - use the store instead
Interlocked.Decrement(ref count);
}
}
if (shouldExpand)
{
return factory(this);
}
else
{
lock (itemStore)
{
return itemStore.Fetch();
}
}
}
private void PreloadItems()
{
for (int i = 0; i < size; i++)
{
T item = factory(this);
itemStore.Store(item);
}
count = size;
}
The size and count fields above refer to the maximum size of the pool and the total number of resources owned by the pool (but not necessarily available), respectively. AcquireEager is the simplest, it assumes that an item is already in the store - these items would be preloaded at construction, i.e. in the PreloadItems method shown last.
AcquireLazy checks to see if there are free items in the pool, and if not, it creates a new one. AcquireLazyExpanding will create a new resource as long as the pool hasn't reached its target size yet. I've tried to optimize this to minimize locking, and I hope I haven't made any mistakes (I have tested this under multi-threaded conditions, but obviously not exhaustively).
You might be wondering why none of these methods bother checking to see whether or not the store has reached the maximum size. I'll get to that in a moment.
Now for the pool itself. Here is the full set of private data, some of which has already been shown:
private bool isDisposed;
private Func<Pool<T>, T> factory;
private LoadingMode loadingMode;
private IItemStore itemStore;
private int size;
private int count;
private Semaphore sync;
Answering the question I glossed over in the last paragraph (how to ensure we limit the total number of resources created): .NET already has a perfectly good tool for that. It's called Semaphore, and it's designed specifically to allow a fixed number of threads access to a resource (in this case the "resource" is the inner item store). Since we're not implementing a full-on producer/consumer queue, this is perfectly adequate for our needs.
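To see a semaphore doing that job in isolation, here is a small demo. It is a sketch that swaps in SemaphoreSlim (the lightweight, in-process counterpart of Semaphore) for the Semaphore used in the pool; SemaphoreLimitDemo and MaxConcurrent are hypothetical names.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo: a semaphore created with `limit` permits lets at most `limit`
// callers hold a resource at once, which is the job `sync` does in Pool<T>.
public static class SemaphoreLimitDemo
{
    // Starts `workers` tasks that each take a permit, "use" a resource briefly, then release it.
    // Returns the largest number of tasks seen inside the guarded region at the same time.
    public static int MaxConcurrent(int limit, int workers)
    {
        using var gate = new SemaphoreSlim(limit, limit);
        int inside = 0;
        int peak = 0;
        var tasks = new Task[workers];

        for (int i = 0; i < workers; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                gate.Wait(); // blocks while every permit is taken, like sync.WaitOne()
                int now = Interlocked.Increment(ref inside);
                try
                {
                    // Record the peak with a compare-and-swap loop.
                    int seen;
                    while (now > (seen = Volatile.Read(ref peak)) &&
                           Interlocked.CompareExchange(ref peak, now, seen) != seen)
                    {
                    }
                    Thread.Sleep(5); // pretend to use the pooled resource
                }
                finally
                {
                    Interlocked.Decrement(ref inside);
                    gate.Release(); // like sync.Release() in Pool<T>.Release
                }
            });
        }

        Task.WaitAll(tasks);
        return peak;
    }
}
```

Calling MaxConcurrent(3, 20) should never report more than 3, no matter how many workers you start. If you need the limit to span processes, the named System.Threading.Semaphore is the one to use.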
The constructor looks like this:
public Pool(int size, Func<Pool<T>, T> factory,
LoadingMode loadingMode, AccessMode accessMode)
{
if (size <= 0)
throw new ArgumentOutOfRangeException("size", size,
"Argument 'size' must be greater than zero.");
if (factory == null)
throw new ArgumentNullException("factory");
this.size = size;
this.factory = factory;
sync = new Semaphore(size, size);
this.loadingMode = loadingMode;
this.itemStore = CreateItemStore(accessMode, size);
if (loadingMode == LoadingMode.Eager)
{
PreloadItems();
}
}
There should be no surprises here. The only thing to note is the special-casing for eager loading, using the PreloadItems method shown earlier.
Since almost everything's been cleanly abstracted away by now, the actual Acquire and Release methods are really very straightforward:
public T Acquire()
{
sync.WaitOne();
switch (loadingMode)
{
case LoadingMode.Eager:
return AcquireEager();
case LoadingMode.Lazy:
return AcquireLazy();
default:
Debug.Assert(loadingMode == LoadingMode.LazyExpanding,
"Unknown LoadingMode encountered in Acquire method.");
return AcquireLazyExpanding();
}
}
public void Release(T item)
{
lock (itemStore)
{
itemStore.Store(item);
}
sync.Release();
}
As explained earlier, we're using the Semaphore to control concurrency instead of religiously checking the status of the item store. As long as acquired items are correctly released, there's nothing to worry about.
Last but not least, there's cleanup:
public void Dispose()
{
if (isDisposed)
{
return;
}
isDisposed = true;
if (typeof(IDisposable).IsAssignableFrom(typeof(T)))
{
lock (itemStore)
{
while (itemStore.Count > 0)
{
IDisposable disposable = (IDisposable)itemStore.Fetch();
disposable.Dispose();
}
}
}
sync.Close();
}
public bool IsDisposed
{
get { return isDisposed; }
}
The purpose of that IsDisposed property will become clear in a moment. Apart from closing the semaphore, all the main Dispose method really does is dispose the pooled items currently in the store, if they implement IDisposable.
Now you can basically use this as-is, with a try-finally block, but I'm not fond of that syntax: if you start passing pooled resources around between classes and methods, it's going to get very confusing. It's possible that the main class that uses a resource doesn't even have a reference to the pool. It really becomes quite messy, so a better approach is to create a "smart" pooled object.
Let's say we start with the following simple interface/class:
public interface IFoo : IDisposable
{
void Test();
}
public class Foo : IFoo
{
private static int count = 0;
private int num;
public Foo()
{
num = Interlocked.Increment(ref count);
}
public void Dispose()
{
Console.WriteLine("Goodbye from Foo #{0}", num);
}
public void Test()
{
Console.WriteLine("Hello from Foo #{0}", num);
}
}
Here's our pretend disposable Foo resource, which implements IFoo and has some boilerplate code for generating unique identities. What we do is create another special, pooled object:
public class PooledFoo : IFoo
{
private Foo internalFoo;
private Pool<IFoo> pool;
public PooledFoo(Pool<IFoo> pool)
{
if (pool == null)
throw new ArgumentNullException("pool");
this.pool = pool;
this.internalFoo = new Foo();
}
public void Dispose()
{
if (pool.IsDisposed)
{
internalFoo.Dispose();
}
else
{
pool.Release(this);
}
}
public void Test()
{
internalFoo.Test();
}
}
This just proxies all of the "real" methods to its inner Foo (we could do this with a dynamic proxy library like Castle, but I won't get into that). It also maintains a reference to the Pool that created it, so that when we Dispose this object, it automatically releases itself back to the pool. The exception is when the pool has already been disposed: this means we are in "cleanup" mode, and in that case it actually cleans up the internal resource instead.
Using the approach above, we get to write code like this:
// Create the pool early
Pool<IFoo> pool = new Pool<IFoo>(PoolSize, p => new PooledFoo(p),
LoadingMode.Lazy, AccessMode.Circular);
// Sometime later on...
using (IFoo foo = pool.Acquire())
{
foo.Test();
}
This is a very good thing to be able to do. It means that the code which uses the IFoo (as opposed to the code which creates it) does not actually need to be aware of the pool. You can even inject IFoo objects using your favourite DI library and the Pool<T> as the provider/factory.
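If writing a proxy like PooledFoo for every pooled interface gets tedious, a generic lease wrapper gives similar using-block ergonomics. The trade-off is that callers see the wrapper (lease.Value) rather than a plain IFoo, so the consuming code is no longer pool-agnostic. This is a minimal sketch under those assumptions; LeasingPool and Lease are hypothetical names, and it does no size limiting or state reset.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical pool that hands out a disposable Lease, so callers can write
// `using (var lease = pool.Rent()) { ... }` without a proxy class per interface.
public sealed class LeasingPool<T> where T : class
{
    private readonly ConcurrentQueue<T> idle = new ConcurrentQueue<T>();
    private readonly Func<T> factory;

    public LeasingPool(Func<T> factory)
    {
        this.factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    public int AvailableCount => idle.Count;

    // Reuse an idle object if there is one, otherwise create a new one.
    public Lease Rent() => new Lease(this, idle.TryDequeue(out T item) ? item : factory());

    public sealed class Lease : IDisposable
    {
        private LeasingPool<T>? owner;

        internal Lease(LeasingPool<T> owner, T value)
        {
            this.owner = owner;
            Value = value;
        }

        public T Value { get; }

        // Returns the object to its pool. Disposing twice is a no-op
        // (but not safe if two threads dispose the same lease concurrently).
        public void Dispose()
        {
            LeasingPool<T>? pool = owner;
            owner = null;
            pool?.idle.Enqueue(Value);
        }
    }
}
```

For example, with a LeasingPool<IFoo> created as new LeasingPool<IFoo>(() => new Foo()), the block using (var lease = pool.Rent()) { lease.Value.Test(); } returns the Foo on exit. Reset any mutable state before reuse, since the pool hands back the same instance.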
I've put the complete code on PasteBin for your copy-and-pasting enjoyment. There's also a short test program you can use to play around with different loading/access modes and multithreaded conditions, to satisfy yourself that it's thread-safe and not buggy.
Let me know if you have any questions or concerns about any of this.
Object Pooling in .NET Core
.NET Core has an implementation of object pooling added to the base class library (BCL). You can read the original GitHub issue here and view the code for System.Buffers. Currently ArrayPool is the only type available, and it is used to pool arrays. There is a nice blog post here.
namespace System.Buffers
{
public abstract class ArrayPool<T>
{
public static ArrayPool<T> Shared { get; internal set; }
public static ArrayPool<T> Create(int maxBufferSize = <number>, int numberOfBuffers = <number>);
public T[] Rent(int size);
public T[] Enlarge(T[] buffer, int newSize, bool clearBuffer = false);
public void Return(T[] buffer, bool clearBuffer = false);
}
}
An example of its usage can be seen in ASP.NET Core. Because it is in the .NET Core BCL, ASP.NET Core can share its object pool with other libraries such as Newtonsoft.Json's JSON serializer. You can read this blog post for more information on how Newtonsoft.Json is doing this.
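The listing above is the API from the original proposal. As far as I can tell, the version that shipped in System.Buffers kept Shared, Create, Rent and Return but not Enlarge, and Rent takes a minimum length and may return a larger array than requested. A minimal usage sketch with the shipped API (ArrayPoolExample and SumViaRentedBuffer are hypothetical names):

```csharp
using System;
using System.Buffers;

public static class ArrayPoolExample
{
    // Copies `source` into a rented scratch buffer and sums it.
    public static int SumViaRentedBuffer(ReadOnlySpan<byte> source)
    {
        // Rent may hand back an array longer than requested, so loop over source.Length,
        // not buffer.Length.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(source.Length);
        try
        {
            source.CopyTo(buffer);
            int sum = 0;
            for (int i = 0; i < source.Length; i++)
            {
                sum += buffer[i];
            }
            return sum;
        }
        finally
        {
            // Give the array back so later Rent calls can reuse it; don't touch it afterwards.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The try/finally matters: an array that is never returned isn't a leak as such (the GC still collects it), but the pool loses the chance to reuse it.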
Object Pooling in Microsoft Roslyn C# Compiler
The new Microsoft Roslyn C# compiler contains the ObjectPool type, which is used to pool frequently used objects which would normally get new'ed up and garbage collected very often. This reduces the amount and size of garbage collection operations which have to happen. There are a few different sub-implementations all using ObjectPool (See: Why are there so many implementations of Object Pooling in Roslyn?).
1 - SharedPools - Stores a pool of 20 objects or 100 if the BigDefault is used.
// Example 1 - In a using statement, so the object gets freed at the end.
using (PooledObject<List<Foo>> pooledObject = SharedPools.Default<List<Foo>>().GetPooledObject())
{
// Do something with pooledObject.Object
}
// Example 2 - No using statement, so you need to be sure no exceptions are thrown.
List<Foo> list = SharedPools.Default<List<Foo>>().AllocateAndClear();
// Do something with list
SharedPools.Default<List<Foo>>().Free(list);
// Example 3 - I have also seen this variation of the above pattern, which ends up the same as Example 1, except that Example 1 seems to create a new instance of the IDisposable PooledObject<T> object. This is probably the preferred option if you want fewer GCs.
List<Foo> list = SharedPools.Default<List<Foo>>().AllocateAndClear();
try
{
// Do something with list
}
finally
{
SharedPools.Default<List<Foo>>().Free(list);
}
2 - ListPool and StringBuilderPool - Not strictly separate implementations, but wrappers around the SharedPools implementation shown above, specifically for List and StringBuilder. So these reuse the pool of objects stored in SharedPools.
// Example 1 - No using statement so you need to be sure no exceptions are thrown.
StringBuilder stringBuilder = StringBuilderPool.Allocate();
// Do something with stringBuilder
StringBuilderPool.Free(stringBuilder);
// Example 2 - Safer version of Example 1.
StringBuilder stringBuilder = StringBuilderPool.Allocate();
try
{
// Do something with stringBuilder
}
finally
{
StringBuilderPool.Free(stringBuilder);
}
3 - PooledDictionary and PooledHashSet - These use ObjectPool directly and have a totally separate pool of objects. Stores a pool of 128 objects.
// Example 1
PooledHashSet<Foo> hashSet = PooledHashSet<Foo>.GetInstance();
// Do something with hashSet.
hashSet.Free();
// Example 2 - Safer version of Example 1.
PooledHashSet<Foo> hashSet = PooledHashSet<Foo>.GetInstance();
try
{
// Do something with hashSet.
}
finally
{
hashSet.Free();
}
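Under all of these sits Roslyn's ObjectPool<T>. As I understand it, it keeps a small fixed array of slots and claims them lock-free with Interlocked.CompareExchange rather than taking a lock. The following is a simplified sketch of that idea, not Roslyn's actual code (which adds shortcuts such as a dedicated first-item field); SlotArrayPool is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Threading;

// Simplified sketch in the spirit of Roslyn's ObjectPool (not its actual code):
// a fixed array of slots, claimed lock-free with Interlocked.CompareExchange.
public sealed class SlotArrayPool<T> where T : class
{
    private readonly T?[] slots;
    private readonly Func<T> factory;

    public SlotArrayPool(Func<T> factory, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        this.factory = factory ?? throw new ArgumentNullException(nameof(factory));
        slots = new T?[size];
    }

    public T Allocate()
    {
        for (int i = 0; i < slots.Length; i++)
        {
            T? candidate = Volatile.Read(ref slots[i]);
            // Only one thread can swap a given non-null candidate out of its slot.
            if (candidate != null &&
                Interlocked.CompareExchange(ref slots[i], null, candidate) == candidate)
            {
                return candidate;
            }
        }
        // Every slot was empty (or lost to another thread): make a new object.
        return factory();
    }

    public void Free(T item)
    {
        for (int i = 0; i < slots.Length; i++)
        {
            // Claim the first empty slot; losing a race just moves on to the next one.
            if (Interlocked.CompareExchange(ref slots[i], item, null) == null)
            {
                return;
            }
        }
        // Every slot is full: drop the object and let the GC collect it.
    }
}
```

The pool bounds how much it retains without ever blocking a caller: an empty pool falls back to the factory, and a full pool simply discards what is freed.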
Microsoft.IO.RecyclableMemoryStream
This library provides pooling for MemoryStream objects. It's a drop-in replacement for System.IO.MemoryStream. It has exactly the same semantics. It was designed by Bing engineers. Read the blog post here or see the code on GitHub.
using Microsoft.IO; // from the Microsoft.IO.RecyclableMemoryStream NuGet package
var sourceBuffer = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 };
var manager = new RecyclableMemoryStreamManager();
using (var stream = manager.GetStream())
{
stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}
Note that RecyclableMemoryStreamManager should be declared once and will live for the entire process; this is the pool. It is perfectly fine to use multiple pools if you desire.
Something like this might suit your needs.
/// <summary>
/// Represents a pool of objects with a size limit.
/// </summary>
/// <typeparam name="T">The type of object in the pool.</typeparam>
public sealed class ObjectPool<T> : IDisposable
where T : new()
{
private readonly int size;
private readonly object locker;
private readonly Queue<T> queue;
private int count;
/// <summary>
/// Initializes a new instance of the ObjectPool class.
/// </summary>
/// <param name="size">The size of the object pool.</param>
public ObjectPool(int size)
{
if (size <= 0)
{
const string message = "The size of the pool must be greater than zero.";
throw new ArgumentOutOfRangeException("size", size, message);
}
this.size = size;
locker = new object();
queue = new Queue<T>();
}
/// <summary>
/// Retrieves an item from the pool.
/// </summary>
/// <returns>The item retrieved from the pool.</returns>
public T Get()
{
lock (locker)
{
if (queue.Count > 0)
{
return queue.Dequeue();
}
count++;
return new T();
}
}
/// <summary>
/// Places an item in the pool.
/// </summary>
/// <param name="item">The item to place to the pool.</param>
public void Put(T item)
{
lock (locker)
{
// count includes the item being returned, so keep it as long as count <= size
if (count <= size)
{
queue.Enqueue(item);
}
else
{
using (item as IDisposable)
{
count--;
}
}
}
}
/// <summary>
/// Disposes of items in the pool that implement IDisposable.
/// </summary>
public void Dispose()
{
lock (locker)
{
count = 0;
while (queue.Count > 0)
{
using (queue.Dequeue() as IDisposable)
{
}
}
}
}
}
Example Usage
public class ThisObject
{
private readonly ObjectPool<That> pool = new ObjectPool<That>(100);
public void ThisMethod()
{
var that = pool.Get();
try
{
// Use that ....
}
finally
{
pool.Put(that);
}
}
}
Sample from MSDN: How to: Create an Object Pool by Using a ConcurrentBag
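The idea behind that sample is small enough to sketch: ConcurrentBag<T> is already thread-safe, so a pool can be little more than a bag plus a factory. This is a minimal sketch written from scratch, not the MSDN sample itself; BagPool, Get and Return are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

// Unbounded pool: ConcurrentBag is thread-safe, so no explicit locking is needed.
public sealed class BagPool<T>
{
    private readonly ConcurrentBag<T> items = new ConcurrentBag<T>();
    private readonly Func<T> generator;

    public BagPool(Func<T> generator)
    {
        this.generator = generator ?? throw new ArgumentNullException(nameof(generator));
    }

    // Reuse a pooled item if one is available, otherwise create a new one.
    public T Get() => items.TryTake(out T item) ? item : generator();

    // Hand the item back for reuse. The bag never shrinks, so the pool ends up
    // holding as many items as were ever outstanding at once.
    public void Return(T item) => items.Add(item);

    public int Count => items.Count;
}
```

Unlike the size-limited pools elsewhere in this thread, this one never blocks and never caps the number of objects it keeps.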
Back in the day Microsoft provided a framework through Microsoft Transaction Server (MTS) and later COM+ to do object pooling for COM objects. That functionality was carried forward to System.EnterpriseServices in the .NET Framework, and now to Windows Communication Foundation.
Object Pooling in WCF
This article is from .NET 1.1 but should still apply to current versions of the Framework (even though WCF is the preferred method).
Object Pooling .NET
I really like Aaronaught's implementation, especially since he handles waiting for a resource to become available by using a semaphore. There are several additions I would like to make:
Change sync.WaitOne() to sync.WaitOne(timeout) and expose the timeout as a parameter on an Acquire(int timeout) method. This would also require handling the case where the thread times out waiting for an object to become available.
Add a Recycle(T item) method to handle situations where an object needs to be recycled, for example when a failure occurs.
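Both suggestions are easy to sketch on a semaphore-based pool. The following is a standalone illustration, not a patch to Aaronaught's Pool<T>: TimedPool, TryAcquire and Recycle are hypothetical names, TryAcquire reports a timeout through its return value rather than throwing, and SemaphoreSlim stands in for Semaphore.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Diagnostics.CodeAnalysis;
using System.Threading;

// Hypothetical bounded pool illustrating both suggestions: a timeout on acquire,
// and a Recycle method for items that must not be reused after a failure.
public sealed class TimedPool<T> where T : class
{
    private readonly SemaphoreSlim permits;
    private readonly ConcurrentQueue<T> idle = new ConcurrentQueue<T>();
    private readonly Func<T> factory;

    public TimedPool(int size, Func<T> factory)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        this.factory = factory ?? throw new ArgumentNullException(nameof(factory));
        permits = new SemaphoreSlim(size, size);
    }

    // Returns false instead of blocking forever when nothing frees up within the timeout.
    public bool TryAcquire(int millisecondsTimeout, [NotNullWhen(true)] out T? item)
    {
        item = null;
        if (!permits.Wait(millisecondsTimeout))
        {
            return false;
        }
        try
        {
            item = idle.TryDequeue(out T pooled) ? pooled : factory();
            return true;
        }
        catch
        {
            permits.Release(); // the factory threw, so hand the permit back
            throw;
        }
    }

    public void Release(T item)
    {
        idle.Enqueue(item);
        permits.Release();
    }

    // Discard a broken item instead of pooling it; freeing its permit lets
    // the next acquire create a fresh replacement.
    public void Recycle(T item)
    {
        (item as IDisposable)?.Dispose();
        permits.Release();
    }
}
```

Recycle frees the permit without putting the item back in the idle queue, so the next TryAcquire that finds the queue empty builds a fresh object.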
This is another implementation, with a limited number of objects in the pool.
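using System;
using System.Collections.Generic;
using System.Threading;
public class ObjectPool<T>
where T : class
{
private readonly int maxSize;
private readonly Func<T> constructor;
private readonly Queue<T> pool = new Queue<T>();
// Private lock object: lock(this) lets outside code contend for (or deadlock on) the same lock
private readonly object syncRoot = new object();
private int currentSize;
public ObjectPool(int maxSize, Func<T> constructor)
{
if (maxSize <= 0)
throw new ArgumentOutOfRangeException(nameof(maxSize));
this.maxSize = maxSize;
this.constructor = constructor ?? throw new ArgumentNullException(nameof(constructor));
}
public T GetFromPool()
{
lock (syncRoot)
{
while (true)
{
if (pool.Count > 0)
{
return pool.Dequeue();
}
if (currentSize < maxSize)
{
T item = constructor();
currentSize++;
return item;
}
// Pool exhausted: Monitor.Wait releases the lock and blocks until ReturnToPool pulses.
// (An AutoResetEvent can lose wake-ups here: two Set() calls before any waiter runs
// release only one waiter, leaving another blocked while an item sits in the queue.)
Monitor.Wait(syncRoot);
}
}
}
public void ReturnToPool(T item)
{
lock (syncRoot)
{
pool.Enqueue(item);
// One returned item can satisfy exactly one waiter.
Monitor.Pulse(syncRoot);
}
}
}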
This Java-oriented article explains the connectionImpl pool pattern and the abstracted object pool pattern, and could be a good first approach:
http://www.developer.com/design/article.php/626171/Pattern-Summaries-Object-Pool.htm
Object pool Pattern:
You may use the NuGet package Microsoft.Extensions.ObjectPool.
Documentation:
https://learn.microsoft.com/en-us/aspnet/core/performance/objectpool?view=aspnetcore-3.1
https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.objectpool
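A minimal usage sketch, assuming the Microsoft.Extensions.ObjectPool package is referenced. StringBuilderPooledObjectPolicy ships in that package and clears each builder when it is returned (discarding oversized ones); ExtensionsObjectPoolExample and Join are hypothetical names.

```csharp
using System.Text;
using Microsoft.Extensions.ObjectPool; // NuGet: Microsoft.Extensions.ObjectPool

public static class ExtensionsObjectPoolExample
{
    // One shared pool for the whole process; the provider builds it from a policy
    // that decides how builders are created and reset.
    private static readonly ObjectPool<StringBuilder> Pool =
        new DefaultObjectPoolProvider().Create(new StringBuilderPooledObjectPolicy());

    public static string Join(string[] parts)
    {
        StringBuilder sb = Pool.Get();
        try
        {
            foreach (string part in parts)
            {
                sb.Append(part);
            }
            return sb.ToString();
        }
        finally
        {
            // The policy clears the builder on return, so the next Get starts empty.
            Pool.Return(sb);
        }
    }
}
```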