I've created an application that reads file properties using the Windows API Code Pack (from this package). I'm having an issue when retrieving properties:
var width = fileInfo.Properties.GetProperty(SystemProperties.System.Video.FrameWidth).ValueAsObject;
The code breaks here, throwing:
System.ArgumentException: An item with the same key has already been added.
at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at Microsoft.WindowsAPICodePack.Shell.PropertySystem.ShellPropertyFactory.GenericCreateShellProperty[T](PropertyKey propKey, T thirdArg)
at Microsoft.WindowsAPICodePack.Shell.PropertySystem.ShellProperties.GetProperty(PropertyKey key)
This happens mostly when this portion of the code is called inside a PLINQ query
.AsParallel().WithDegreeOfParallelism(_maxConcurrentThreads).ForAll(...)
even if the degree is set to 1. How can I solve it?
To extend your existing answer, switching the Dictionary to a ConcurrentDictionary would also solve the problem and remove the need for locks.
private static readonly ConcurrentDictionary<int, Func<PropertyKey, ShellPropertyDescription, object, IShellProperty>> _storeCache
    = new ConcurrentDictionary<int, Func<PropertyKey, ShellPropertyDescription, object, IShellProperty>>();
...
private static IShellProperty GenericCreateShellProperty<T>(PropertyKey propKey, T thirdArg)
{
    ...
    // GetOrAdd with a factory argument avoids capturing locals (available on .NET Core 2.0 and later);
    // on .NET Framework use the closure overload instead: _storeCache.GetOrAdd(hash, key => ...)
    Func<PropertyKey, ShellPropertyDescription, object, IShellProperty> ctor =
        _storeCache.GetOrAdd(hash, (key, args) =>
        {
            Type[] argTypes = { typeof(PropertyKey), typeof(ShellPropertyDescription), args.thirdType };
            return ExpressConstructor(args.type, argTypes);
        }, (thirdType, type));
    return ctor(propKey, propDesc, thirdArg);
}
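To illustrate the ConcurrentDictionary approach outside the Code Pack, here is a minimal sketch. It assumes a hypothetical ConstructorCache class that compiles constructor delegates with System.Linq.Expressions (a stand-in for the package's own ExpressConstructor helper, whose internals are not shown here) and uses the closure overload of GetOrAdd, which also exists on .NET Framework 4.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical stand-in for the Code Pack's constructor cache (not its real code):
// maps a Type to a compiled delegate that calls that type's (string) constructor.
public static class ConstructorCache
{
    private static readonly ConcurrentDictionary<Type, Func<string, object>> Cache =
        new ConcurrentDictionary<Type, Func<string, object>>();

    public static object Create(Type type, string arg)
    {
        // GetOrAdd is safe to call from many threads at once. Under contention the
        // factory may run more than once, but only one delegate ends up stored and
        // "An item with the same key has already been added" cannot be thrown.
        Func<string, object> ctor = Cache.GetOrAdd(type, CompileConstructor);
        return ctor(arg);
    }

    // Builds the delegate: (string arg) => (object)new TType(arg)
    private static Func<string, object> CompileConstructor(Type type)
    {
        var ctorInfo = type.GetConstructor(new[] { typeof(string) });
        if (ctorInfo == null)
            throw new ArgumentException(type + " has no (string) constructor.", nameof(type));
        var arg = Expression.Parameter(typeof(string), "arg");
        var body = Expression.Convert(Expression.New(ctorInfo, arg), typeof(object));
        return Expression.Lambda<Func<string, object>>(body, arg).Compile();
    }
}
```

Calling ConstructorCache.Create from a PLINQ ForAll or Parallel.For is the scenario from the question; the dictionary never sees two competing Add calls.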
Following stuartd's suggestion, I was able to solve this issue by modifying the package's source code and adding locks at lines 57 and 62 of this code, like this:
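lock (_storeCache)
{
    if (!_storeCache.TryGetValue(hash, out ctor))
    {
        Type[] argTypes = { typeof(PropertyKey), typeof(ShellPropertyDescription), thirdType };
        ctor = ExpressConstructor(type, argTypes);
        // This inner lock is redundant (Monitor locks are reentrant and the outer
        // lock already covers the Add), but it is harmless.
        lock (_storeCache)
            _storeCache.Add(hash, ctor);
    }
}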
Related
Consider the following C# code, which uses MemoryCache to generate a new value for a given key if it is not already present in the cache:
private static MemoryCache _memoryCache = new MemoryCache("ApplyCache"); // the MemoryCache constructor requires a name
public T Apply<T>(string key, Func<T> factory)
{
    var expiration ...
    var newValue = new Lazy<T>(factory);
    var value = (Lazy<T>)_memoryCache.AddOrGetExisting(key, newValue, expiration);
    return (value ?? newValue).Value;
}
Consider now this:
var hugeObject = new HugeObject();
return cache.Apply("SomeKey", () =>
{
return hugeObject.GetValue();
});
The factory will be invoked "immediately" after AddOrGetExisting or never, so the question is:
Does the Lazy class clear the reference to the factory delegate after having generated the value (so all the resources used by the factory like, in this case, hugeObject, can be released)?
Looking at the reference source, I believe it does release the factory, and the source comments explain why:
// We successfully created and stored the value. At this point, the value factory delegate is
// no longer needed, and we don't want to hold onto its resources.
m_valueFactory = ALREADY_INVOKED_SENTINEL;
There is quite a lot of threading code in there, so I'm not sure it does so every time, but you'd hope that if they realised they needed to, they have done so properly.
I'm currently implementing a thread-safe dictionary in C# which uses immutable AVL trees as buckets internally. The idea is to provide fast read access without a lock, because in my application context we add entries to this dictionary only at startup, and afterwards values are mostly read (although there are still a few writes).
I've structured my TryGetValue and GetOrAdd methods in the following way:
public sealed class FastReadThreadSafeDictionary<TKey, TValue> where TKey : IEquatable<TKey>
{
private readonly object _bucketContainerLock = new object();
private ImmutableBucketContainer<TKey, TValue> _bucketContainer;
public bool TryGetValue(TKey key, out TValue value)
{
var bucketContainer = _bucketContainer;
return bucketContainer.TryFind(key.GetHashCode(), key, out value);
}
public bool GetOrAdd(TKey key, Func<TValue> createValue, out TValue value)
{
createValue.MustNotBeNull(nameof(createValue));
var hashCode = key.GetHashCode();
lock (_bucketContainerLock)
{
ImmutableBucketContainer<TKey, TValue> newBucketContainer;
if (_bucketContainer.GetOrAdd(hashCode, key, createValue, out value, out newBucketContainer) == false)
return false;
_bucketContainer = newBucketContainer;
return true;
}
}
// Other members omitted for sake of brevity
}
As you can see, I don't use a lock in TryGetValue because reference assignment in .NET runtimes is an atomic operation by design. By copying the reference of the field _bucketContainer to a local variable, I'm sure I can safely access the instance because it is immutable. In GetOrAdd, I use a lock to access the private _bucketContainer so I can ensure that a value is not created twice (i.e. if two or more threads are trying to add a value, only one can actually create a new ImmutableBucketContainer with the added value because of the lock).
I use Microsoft Chess for testing concurrency and in one of my tests, MCUT (Microsoft Concurrency Unit Testing) reports a data race in GetOrAdd when I exchange the new bucket container with the old one:
[DataRaceTestMethod]
public void ReadWhileAdd()
{
var testTarget = new FastReadThreadSafeDictionary<int, object>();
var writeThread = new Thread(() =>
{
for (var i = 5; i < 10; i++)
{
object added;
testTarget.GetOrAdd(i, () => new object(), out added);
Thread.Sleep(0);
}
});
var readThread = new Thread(() =>
{
object value;
testTarget.TryGetValue(5, out value);
Thread.Sleep(0);
testTarget.TryGetValue(7, out value);
Thread.Sleep(10);
testTarget.TryGetValue(9, out value);
});
readThread.Start();
writeThread.Start();
readThread.Join();
writeThread.Join();
}
MCUT reports the following message:
23> Test result: DataRace
23> ReadWhileAdd() (Context=, TestType=MChess): [DataRace]Found data race at GetOrAdd:FastReadThreadSafeDictionary.cs(68)
which is the assignment _bucketContainer = newBucketContainer; in GetOrAdd.
My actual question is: why is the assignment _bucketContainer = newBucketContainer a race condition? Threads currently executing TryGetValue always make a copy of the _bucketContainer field and thus shouldn't be affected by the update (except that the searched value might be added to the _bucketContainer just after the copy takes place, but that is unrelated to the data race). And in GetOrAdd, there is an explicit lock to prevent concurrent access. Is this a bug in Chess, or am I missing something very obvious?
As mentioned by @CodesInChaos in the comments on the question, I missed a volatile read in TryGetValue. The method now looks like this:
public bool TryGetValue(TypeKey typeKey, out TValue value)
{
var bucketContainer = Volatile.Read(ref _bucketContainer);
return bucketContainer.TryFind(typeKey, out value);
}
This volatile read is necessary because, without it, the compiler, the JIT and the CPU may cache or reorder memory accesses independently on each thread accessing this dictionary, which can lead to a data race. The CPU architecture running the code also matters: on x86 and x64 processors ordinary reads already have the acquire semantics of a volatile read, while this is not true for other architectures such as ARM or Itanium. That's why the read access has to be synchronized with other threads using a memory barrier, which Volatile.Read performs internally (note that lock statements also use memory barriers internally). Joseph Albahari wrote a comprehensive tutorial on this here: http://www.albahari.com/threading/part4.aspx
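As a sketch of the complete publish/read pattern, here is a minimal copy-on-write dictionary. It assumes a plain Dictionary<TKey, TValue> that is copied on every write and never mutated after publication, in place of the author's ImmutableBucketContainer (which the question does not show). Readers use Volatile.Read, and the writer publishes with Volatile.Write so that the reader's acquire is paired with a release. The class name is hypothetical, and each insert costs O(n), so it only suits the read-mostly workload described above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical copy-on-write dictionary: readers never lock, writers serialize on a
// lock and publish a brand-new snapshot. A snapshot is never mutated once published.
public sealed class CopyOnWriteDictionary<TKey, TValue>
{
    private readonly object _writeLock = new object();
    private Dictionary<TKey, TValue> _snapshot = new Dictionary<TKey, TValue>();

    public bool TryGetValue(TKey key, out TValue value)
    {
        // Acquire: we observe a fully built snapshot, never a half-initialized one.
        var snapshot = Volatile.Read(ref _snapshot);
        return snapshot.TryGetValue(key, out value);
    }

    public TValue GetOrAdd(TKey key, Func<TValue> createValue)
    {
        if (createValue == null) throw new ArgumentNullException(nameof(createValue));
        TValue value;
        if (TryGetValue(key, out value)) return value; // lock-free fast path

        lock (_writeLock)
        {
            // Re-check under the lock: another writer may have added the key meanwhile.
            if (_snapshot.TryGetValue(key, out value)) return value;
            value = createValue();
            var copy = new Dictionary<TKey, TValue>(_snapshot) { [key] = value };
            // Release: all writes into 'copy' become visible before the reference does.
            Volatile.Write(ref _snapshot, copy);
            return value;
        }
    }
}
```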
I have the following code, and yesterday evening it somehow threw a lot of exceptions:
Exception of type 'System.Web.HttpUnhandledException' was thrown. ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
I just don't see how this is possible: I check for null and whether the key already exists. This is the only method where lastTimeoutCheck is used.
private static Dictionary<string, DateTime> lastTimeoutCheck;
private static readonly object CacheLock = new object();
private static void CheckTimeout(string groupName)
{
if (lastTimeoutCheck == null)
{
lastTimeoutCheck = new Dictionary<string, DateTime>();
return;
}
if (!lastTimeoutCheck.ContainsKey(groupName))
{
lastTimeoutCheck.Add(groupName, DateTime.UtcNow);
return;
}
if (lastTimeoutCheck[groupName] <
DateTime.UtcNow.AddMinutes(-GroupConfigSection.TimeOutCheckMinutes))
{
lock (CacheLock)
{
if (lastTimeoutCheck[groupName] <
DateTime.UtcNow.AddMinutes(-GroupConfigSection.TimeOutCheckMinutes))
{
GroupHolder groupHolder =
(GroupHolder) System.Web.HttpContext.Current.Cache.Get(groupName);
if (groupHolder != null)
{
groupHolder.UpdateTime();
}
lastTimeoutCheck[groupName] = DateTime.UtcNow;
}
}
}
}
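Since your variable is static and the error indicates it runs on a web server, you are most likely facing two request threads accessing the dictionary at the same time, resulting in two concurrent adds. Dictionary<TKey, TValue> is not safe for concurrent writes, and its corrupted internal state can surface as exactly this IndexOutOfRangeException inside Insert.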
The solution depends on your situation:
Don't make the dictionary static if you don't intend it to be shared across sessions. This doesn't really fix the problem; it only makes it less likely to occur.
Use a thread-safe dictionary type: ConcurrentDictionary.
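For the second option, here is a minimal sketch of CheckTimeout's bookkeeping rebuilt on ConcurrentDictionary. The names TimeoutChecker and TryClaimTimeoutCheck are hypothetical, and the ASP.NET cache lookup and the GroupConfigSection setting are left to the caller (the interval is passed in as a TimeSpan). TryUpdate acts as a compare-and-swap, so at most one thread wins each refresh without a lock.

```csharp
using System;
using System.Collections.Concurrent;

public static class TimeoutChecker
{
    // Initialized eagerly, which also removes the race on 'lastTimeoutCheck == null'.
    private static readonly ConcurrentDictionary<string, DateTime> LastTimeoutCheck =
        new ConcurrentDictionary<string, DateTime>();

    // Returns true for exactly one caller per group once 'interval' has elapsed;
    // that caller should then refresh the group (e.g. groupHolder.UpdateTime()).
    public static bool TryClaimTimeoutCheck(string groupName, TimeSpan interval)
    {
        DateTime now = DateTime.UtcNow;

        // First time this group is seen: record it and skip the check,
        // mirroring the early return in the original code.
        if (LastTimeoutCheck.TryAdd(groupName, now)) return false;

        DateTime last;
        if (!LastTimeoutCheck.TryGetValue(groupName, out last)) return false;
        if (now - last < interval) return false;

        // Compare-and-swap: succeeds only if no other thread updated the entry
        // since we read 'last', so two concurrent callers cannot both win.
        return LastTimeoutCheck.TryUpdate(groupName, now, last);
    }
}
```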
On an ASP.NET MVC project we have several instances of data that require a good amount of resources and time to build. We want to cache them.
MemoryCache provides a certain level of thread safety, but not enough to avoid running multiple instances of the building code in parallel. Here is an example:
var data = cache["key"];
if(data == null)
{
data = buildDataUsingGoodAmountOfResources();
cache["key"] = data;
}
As you can see, on a busy website hundreds of threads could enter the if statement simultaneously before the data is built, making the build even slower and needlessly consuming server resources.
There is an atomic AddOrGetExisting implementation in MemoryCache but it incorrectly requires "value to set" instead of "code to retrieve the value to set" which I think renders the given method almost completely useless.
We have been using our own ad-hoc scaffolding around MemoryCache to get it right, but it requires explicit locks. Per-entry lock objects are cumbersome, so we usually get away with sharing lock objects, which is far from ideal. That made me wonder whether there are deliberate reasons for avoiding such a convention.
So I have two questions:
Is it better practice not to lock the building code? (I wonder whether that could prove more responsive, for one.)
What's the right way to achieve per-entry locking for MemoryCache for such a lock? The strong urge to use key string as the lock object is dismissed at ".NET locking 101".
We solved this issue by combining Lazy<T> with AddOrGetExisting to avoid a need for a lock object completely. Here is a sample code (which uses infinite expiration):
public T GetFromCache<T>(string key, Func<T> valueFactory)
{
    var newValue = new Lazy<T>(valueFactory);
    // the line below returns the existing item, or adds the new value if it doesn't exist
    var value = (Lazy<T>)cache.AddOrGetExisting(key, newValue, ObjectCache.InfiniteAbsoluteExpiration);
    return (value ?? newValue).Value; // Lazy<T> handles the locking itself
}
That's not the whole story. There are gotchas such as exception caching (by default, Lazy<T> caches an exception thrown by valueFactory and rethrows it on every later access), so you have to decide what you want to do if your valueFactory throws. One advantage, though, is the ability to cache null values too.
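One way to handle the exception-caching gotcha is to evict a failed entry so the next caller retries. The sketch below swaps MemoryCache for ConcurrentDictionary<TKey, Lazy<TValue>>, because ConcurrentDictionary can remove an entry only if it still holds the same failed Lazy (through its ICollection<KeyValuePair<TKey, TValue>>.Remove implementation), whereas MemoryCache.Remove removes whatever is currently stored under the key. The LazyCache name is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical Lazy<T>-based cache: each value is built once per successful attempt,
// and a Lazy whose factory threw is evicted so the next caller retries the factory.
public sealed class LazyCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _entries =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
    {
        // Creating a spare Lazy under contention is cheap; only the stored one is evaluated.
        Lazy<TValue> lazy = _entries.GetOrAdd(key, k =>
            new Lazy<TValue>(() => valueFactory(k), LazyThreadSafetyMode.ExecutionAndPublication));
        try
        {
            return lazy.Value;
        }
        catch
        {
            // Remove only if the entry is still this failed Lazy, leaving alone any
            // newer entry that another thread may have added in the meantime.
            ((ICollection<KeyValuePair<TKey, Lazy<TValue>>>)_entries)
                .Remove(new KeyValuePair<TKey, Lazy<TValue>>(key, lazy));
            throw;
        }
    }
}
```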
For the conditional add requirement, I always use ConcurrentDictionary, which has an overloaded GetOrAdd method which accepts a delegate to fire if the object needs to be built.
ConcurrentDictionary<string, object> _cache = new
    ConcurrentDictionary<string, object>();
public object GetOrAdd(string key)
{
    return _cache.GetOrAdd(key, (k) => {
        //here 'k' is actually the same as 'key'
        //note: under contention this delegate may run more than once; only one result is stored
        return buildDataUsingGoodAmountOfResources();
    });
}
In reality I almost always use static concurrent dictionaries. I used to have 'normal' dictionaries protected by a ReaderWriterLockSlim instance, but as soon as I switched to .NET 4 (ConcurrentDictionary is only available from that version onwards) I started converting any of those that I came across.
ConcurrentDictionary's performance is admirable to say the least :)
Update: naive implementation with expiration semantics based on age only. It should also ensure that individual items are only created once, as per @usr's suggestion. Update again: as @usr has suggested, simply using a Lazy<T> would be a lot simpler; you can just forward the creation delegate to it when adding it to the concurrent dictionary. I've changed the code, as my dictionary of locks wouldn't actually have worked anyway. But I really should have thought of that myself (it's past midnight here in the UK, though, and I'm beat. Any sympathy? No, of course not. Being a developer, I have enough caffeine coursing through my veins to wake the dead).
I do recommend implementing the IRegisteredObject interface with this, though, and then registering it with the HostingEnvironment.RegisterObject method; doing so would provide a cleaner way to shut down the poller thread when the application pool shuts down or recycles.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
public class ConcurrentCache : IDisposable
{
    private readonly ConcurrentDictionary<string, Tuple<DateTime?, Lazy<object>>> _cache =
        new ConcurrentDictionary<string, Tuple<DateTime?, Lazy<object>>>();
    private readonly CancellationTokenSource _stop = new CancellationTokenSource();
    private readonly Thread _expireThread;
    public ConcurrentCache()
    {
        //a field initializer can't reference an instance method, so create the thread here
        _expireThread = new Thread(ExpireMonitor) { IsBackground = true };
        _expireThread.Start();
    }
    public void Dispose()
    {
        //still naive, but Thread.Abort isn't supported on .NET Core, so signal the thread instead
        _stop.Cancel();
        _expireThread.Join();
    }
    private void ExpireMonitor()
    {
        //wake up once a second until Dispose is called
        while (!_stop.Token.WaitHandle.WaitOne(1000))
        {
            DateTime expireTime = DateTime.Now;
            var toExpire = _cache.Where(kvp => kvp.Value.Item1 != null &&
                kvp.Value.Item1.Value < expireTime).Select(kvp => kvp.Key).ToArray();
            Tuple<DateTime?, Lazy<object>> removed;
            foreach (var key in toExpire)
            {
                _cache.TryRemove(key, out removed);
            }
        }
    }
    public object CacheOrAdd(string key, Func<string, object> factory,
        TimeSpan? expiry)
    {
        return _cache.GetOrAdd(key, (k) => {
            //store the expiry time next to a Lazy<object>, so the user's
            //factory runs at most once per cached entry
            //here 'k' is actually the same as 'key'
            return Tuple.Create(
                expiry.HasValue ? DateTime.Now + expiry.Value : (DateTime?)null,
                new Lazy<object>(() => factory(k)));
        }).Item2.Value;
    }
}
Adapting the top answer to C# 7, here's my implementation, which allows caching from any source type T to any return type TResult.
/// <summary>
/// Creates a GetOrRefreshCache function with encapsulated MemoryCache.
/// </summary>
/// <typeparam name="T">The type of inbound objects to cache.</typeparam>
/// <typeparam name="TResult">How the objects will be serialized to cache and returned.</typeparam>
/// <param name="cacheName">The name of the cache.</param>
/// <param name="valueFactory">The factory for storing values.</param>
/// <param name="keyFactory">An optional factory to choose cache keys.</param>
/// <returns>A function to get or refresh from cache.</returns>
public static Func<T, TResult> GetOrRefreshCacheFactory<T, TResult>(string cacheName, Func<T, TResult> valueFactory, Func<T, string> keyFactory = null) {
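// Caution: GetHashCode() is not unique, so two different inputs with colliding hash codes
// would share one cache entry; pass a keyFactory when that matters.
var getKey = keyFactory ?? (obj => obj.GetHashCode().ToString());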
var cache = new MemoryCache(cacheName);
// Thread-safe lazy cache
TResult getOrRefreshCache(T obj) {
var key = getKey(obj);
var newValue = new Lazy<TResult>(() => valueFactory(obj));
var value = (Lazy<TResult>) cache.AddOrGetExisting(key, newValue, ObjectCache.InfiniteAbsoluteExpiration);
return (value ?? newValue).Value;
}
return getOrRefreshCache;
}
Usage
/// <summary>
/// Get a JSON object from cache or serialize it if it doesn't exist yet.
/// </summary>
private static readonly Func<object, string> GetJson =
GetOrRefreshCacheFactory<object, string>("json-cache", JsonConvert.SerializeObject);
var json = GetJson(new { foo = "bar", yes = true });
Here is a simple solution as a MemoryCache extension method.
public static class MemoryCacheExtensions
{
public static T LazyAddOrGetExitingItem<T>(this MemoryCache memoryCache, string key, Func<T> getItemFunc, DateTimeOffset absoluteExpiration)
{
var item = new Lazy<T>(
() => getItemFunc(),
LazyThreadSafetyMode.PublicationOnly // does not cache exceptions, but concurrent callers may each run getItemFunc; the first result to complete is kept
);
var cachedValue = memoryCache.AddOrGetExisting(key, item, absoluteExpiration) as Lazy<T>;
return (cachedValue != null) ? cachedValue.Value : item.Value;
}
}
And here is a test for it, which also serves as a usage example.
[TestMethod]
[TestCategory("MemoryCacheExtensionsTests"), TestCategory("UnitTests")]
public void MemoryCacheExtensions_LazyAddOrGetExitingItem_Test()
{
const int expectedValue = 42;
const int cacheRecordLifetimeInSeconds = 42;
var key = "lazyMemoryCacheKey";
var absoluteExpiration = DateTimeOffset.Now.AddSeconds(cacheRecordLifetimeInSeconds);
var lazyMemoryCache = MemoryCache.Default;
#region Cache warm up
var actualValue = lazyMemoryCache.LazyAddOrGetExitingItem(key, () => expectedValue, absoluteExpiration);
Assert.AreEqual(expectedValue, actualValue);
#endregion
#region Get value from cache
actualValue = lazyMemoryCache.LazyAddOrGetExitingItem(key, () => expectedValue, absoluteExpiration);
Assert.AreEqual(expectedValue, actualValue);
#endregion
}
Sedat's solution of combining Lazy with AddOrGetExisting is inspiring. However, I must point out that this solution has a performance issue, which seems very important for a caching solution.
If you look at the code of AddOrGetExisting(), you will find that it is not a lock-free method. Compared to the lock-free Get() method, it gives up one of the advantages of MemoryCache.
I would recommend the following solution: call Get() first, and only fall back to AddOrGetExisting() to avoid creating the object multiple times.
public T GetFromCache<T>(string key, Func<T> valueFactory)
{
    // the cache stores Lazy<T> wrappers, so look for one of those (a direct cast to T would throw)
    var cached = cache.Get(key) as Lazy<T>;
    if (cached != null)
    {
        return cached.Value;
    }
    var newValue = new Lazy<T>(valueFactory);
    // the line below returns the existing item, or adds the new value if it doesn't exist
    var oldValue = (Lazy<T>)cache.AddOrGetExisting(key, newValue, ObjectCache.InfiniteAbsoluteExpiration);
    return (oldValue ?? newValue).Value; // Lazy<T> handles the locking itself
}
Here is a design that follows what you seem to have in mind. The lock is held only briefly. The final call to data.Value also locks (underneath), but clients will only block if two of them are requesting the same item at the same time.
public DataType GetData()
{
    Lazy<DataType> data;
    lock (_privateLockingField)
    {
        data = cache["key"] as Lazy<DataType>;
        if (data == null)
        {
            data = new Lazy<DataType>(() => buildDataUsingGoodAmountOfResources());
            cache["key"] = data;
        }
    }
    // Lazy<T> (default ExecutionAndPublication mode) lets only one thread run the build
    return data.Value;
}
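The single _privateLockingField above serializes the lookup for every key. If you want the per-entry locking asked about in the question instead, one common approach is a ConcurrentDictionary of lock objects keyed by the cache key, which avoids locking on the key string itself. A minimal sketch, assuming a ConcurrentDictionary stands in for the MVC cache; the PerKeyLockCache name is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical per-entry locking cache: each key gets its own lock object, so building
// "a" never blocks callers that want "b", and each key's builder runs at most once.
public sealed class PerKeyLockCache<TValue>
{
    private readonly ConcurrentDictionary<string, TValue> _values =
        new ConcurrentDictionary<string, TValue>();
    // Lock objects are never removed, so this grows with the number of distinct keys.
    private readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    public TValue GetOrBuild(string key, Func<string, TValue> build)
    {
        TValue value;
        if (_values.TryGetValue(key, out value)) return value; // lock-free fast path

        // All callers for the same key receive the same stored lock object.
        object gate = _locks.GetOrAdd(key, _ => new object());
        lock (gate)
        {
            // Double-check: another thread may have built the value while we waited.
            if (_values.TryGetValue(key, out value)) return value;
            value = build(key);
            _values[key] = value;
            return value;
        }
    }
}
```

Compared with the Lazy<T> answers above, this keeps the "build once" guarantee without storing Lazy wrappers, at the cost of an ever-growing lock table.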
First of all, sorry if this has been asked before. I've done a pretty comprehensive search and found nothing quite like it, but I may have missed something.
And now to the question: I'm trying to invoke a constructor through reflection, with no luck. Basically, I have an object that I want to clone, so I look up the copy constructor for its type and then want to invoke it. Here's what I have:
public Object clone(Object toClone) {
    Type type = toClone.GetType();
    MethodBase method = type.GetConstructor(
        new Type[] { type });
    return method.Invoke(toClone, new object[] { toClone }); //<-- doesn't work
}
I call the above method like so:
List<int> list = new List<int>(new int[] { 0, 1, 2 });
List<int> copy = (List<int>) clone(list);
Now, notice that the Invoke method I'm using is MethodBase's Invoke. ConstructorInfo provides an Invoke method that does work if called like this:
return ((ConstructorInfo) method).Invoke(new object[] { toClone });
However, I want to use MethodBase's method, because in reality instead of looking up the copy constructor every time I will store it in a dictionary, and the dictionary contains both methods and constructors, so it's a Dictionary<MethodBase>, not Dictionary<ConstructorInfo>.
I could of course cast to ConstructorInfo as I do above, but I'd rather avoid the casting and use the MethodBase method directly. I just can't figure out the right parameters.
Any help? Thanks so much.
EDIT
Benjamin,
Thanks so much for your suggestions. I was actually doing exactly what you suggest in your second edit, except (and that's a big "except") my dictionary's values were of this type:
class ClonerMethod {
public MethodBase method;
public bool isConstructor;
...
public Object invoke(Object toClone) {
return isConstructor ?
((ConstructorInfo) method).Invoke(new object[] { toClone }) : //<-- I wanted to avoid this cast
method.Invoke(toClone, null);
}
}
And then I called ClonerMethod's invoke on what I found in the dictionary. I didn't add the code that deals with all that, because the answer I was looking for was just how to call Invoke on a ConstructorInfo using MethodBase's Invoke method, and I didn't want to add unnecessary info and too much code for you to read through. However, I like your use of Func<,> much MUCH better, so I'm switching to that. Making the Clone method generic is also a nice addition, but in my case the caller doesn't know the type of the object, so I'll keep it non-generic.
I didn't know about Func<,>, and if I knew about the lambda operator I had forgotten (I hadn't really needed something like this before), so I've actually learnt a lot from your answer. I always love to learn new things, and this will come in very handy in the future, so thanks a lot! :)
If you know that the object has a constructor like that, did you think about using the Activator.CreateInstance(Type, params object[]) overload instead?
Update: so you already have a cascading search for MethodInfo/MethodBase and store the results, which means you don't want to (or cannot) use Activator.
In that case I don't see a way to do what you want without a cast. But - maybe you could change the architecture to store a Dictionary<Type, Func<object, object>> and add those Func<> instances instead. Makes the calling code nicer (I assume) and would allow you to do this cast once:
// Constructor
dictionary.Add(type,
source => ((ConstructorInfo) method).Invoke(new object[] {source})
);
// Clone
dictionary.Add(type,
source => method.Invoke(source, new object[]{})
);
In fact, since you only care about the difference between constructor and normal method at the very site where you grab them, you wouldn't need a cast at all, would you?
// Constructor 2
dictionary.Add(type,
source => yourConstructorInfo.Invoke(new object[] {source})
);
Unless I'm missing something (quite possible, of course), this could resolve the problem: the work is done once on the defining side of the fence, and the caller wouldn't need to care whether it's a constructor or not.
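As background on why the original MethodBase call did not produce a clone: it does not fail loudly, it does something else. As documented for MethodBase.Invoke(object, object[]), invoking an instance constructor with a non-null obj re-initializes that existing object and returns null, whereas ConstructorInfo.Invoke(object[]) allocates a new instance. A minimal demonstration with a hypothetical Point class:

```csharp
using System;
using System.Reflection;

// Hypothetical class used only for this demonstration.
public class Point
{
    public int X;
    public Point(int x) { X = x; }
}

public static class ConstructorInvokeDemo
{
    // Returns (result of MethodBase.Invoke(existing, args), existing, new instance).
    public static Tuple<object, Point, object> Run()
    {
        MethodBase ctor = typeof(Point).GetConstructor(new[] { typeof(int) });
        var existing = new Point(1);

        // MethodBase.Invoke(obj, args) on a constructor re-runs it on 'obj'
        // and returns null; it does not allocate a new object.
        object viaMethodBase = ctor.Invoke(existing, new object[] { 5 });

        // ConstructorInfo.Invoke(args) allocates and returns a new instance.
        object viaConstructorInfo = ((ConstructorInfo)ctor).Invoke(new object[] { 7 });

        return Tuple.Create(viaMethodBase, existing, viaConstructorInfo);
    }
}
```

This is why either the cast to ConstructorInfo or a delegate captured at lookup time (as in the Func<object, object> approach) is needed.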
One last time, then I'm going to stop the edit spam. I was bored and came up with the following code. Is that what you are trying to accomplish?
public class Cloner {
private readonly IDictionary<Type, Func<object, object>> _cloneMap =
new Dictionary<Type, Func<object, object>>();
public T Clone<T>(T source) {
Type sourceType = source.GetType();
Func<object, object> cloneFunc;
if (_cloneMap.TryGetValue(sourceType, out cloneFunc)) {
return (T)cloneFunc(source);
}
if (TryGetCopyConstructorCloneFunc(sourceType, out cloneFunc)) {
_cloneMap.Add(sourceType, cloneFunc);
return (T)cloneFunc(source);
}
if (TryGetICloneableCloneFunc(sourceType, out cloneFunc)) {
_cloneMap.Add(sourceType, cloneFunc);
return (T)cloneFunc(source);
}
return default(T);
}
private bool TryGetCopyConstructorCloneFunc(Type type,
out Func<object, object> cloneFunc) {
var constructor = type.GetConstructor(new[] { type });
if (constructor == null) {
cloneFunc = source => null;
return false;
}
cloneFunc = source => constructor.Invoke(new[] { source });
return true;
}
private bool TryGetICloneableCloneFunc(Type type,
out Func<object, object> cloneFunc) {
bool isICloneable = typeof(ICloneable).IsAssignableFrom(type);
var cloneMethod = type.GetMethod("Clone", new Type[] { });
if (!isICloneable || (cloneMethod == null)) {
cloneFunc = source => null;
return false;
}
cloneFunc = source => cloneMethod.Invoke(source, new object[] {});
return true;
}
}