GetOrAdd new vs factory performance - c#

Which of the following two pieces of code will perform better in different cases and why?
1.
private readonly ConcurrentDictionary<int, List<T>> _coll;
_coll.GetOrAdd(1, new List<T>());
This creates a new List<T> on every call, even when it is not needed. (How much does this still matter if we pass the capacity as 0?)
2.
private readonly ConcurrentDictionary<int, List<T>> _coll;
_coll.GetOrAdd(1, (val) => new List<T>());
This creates the List<T> only on demand, but incurs a delegate invocation.

In terms of memory, the first way is going to cause an allocation every time, while the second will use a cached delegate object since it does not capture any variables. The compiler handles the generation of the cached delegate. There is no difference in the first case for capacity set to zero since the default constructor for List<T> uses an empty array on initialization, the same as an explicit capacity of 0.
In terms of execution instructions, they are the same when the key is found since the second argument is not used. If the key is not found, the first way simply has to read a local variable while the second way will have a layer of indirection to invoke a delegate. Also, looking into the source code, it appears that GetOrAdd with the factory will do an additional lookup (via TryGetValue) to avoid invoking the factory. The delegate could also potentially be executed multiple times. GetOrAdd simply guarantees that you see one entry in the dictionary, not that the factory is invoked only once.
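If you want to avoid both the unconditional allocation of the first form and any doubt about whether the compiler caches the delegate of the second, you can cache the factory explicitly. A minimal sketch (the field and method names here are made up, and string stands in for T for concreteness):

private static readonly Func<int, List<string>> s_factory = _ => new List<string>();
private readonly ConcurrentDictionary<int, List<string>> _coll =
    new ConcurrentDictionary<int, List<string>>();

public List<string> GetBucket(int key)
{
    // Nothing is allocated on a hit; on a miss, only the List<string>
    // created by the factory is allocated.
    return _coll.GetOrAdd(key, s_factory);
}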
In summary, the first way might be more performant if the key is typically not found, since the allocation needs to happen anyway and there is no indirection through a delegate. If the key is typically found, however, the second way is more performant because there are fewer allocations. For a cache implementation you typically expect lots of hits, so if that is the use case here, I would recommend the second way. In practice, the difference between the two depends on how sensitive the overall application is to allocations in this code path.
Also, whatever code ends up using this will likely need to implement locking around the List<T> that is returned, since List<T> itself is not thread-safe.
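For example, a minimal sketch of that locking, assuming every code path that touches a given list takes the same lock (here, the list instance itself):

public void AddItem(int key, T item)
{
    var list = _coll.GetOrAdd(key, _ => new List<T>());
    // Every reader and writer of the list must lock on the same object.
    lock (list)
    {
        list.Add(item);
    }
}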

I can't imagine you'd see much of a difference in performance unless you were working with an extremely large dataset. It would also depend on how often each of your items is hit. Generics are extremely well optimised at the runtime level, and using a delegate can result in an allocation either way.
My suggestion would be to use Enumerable.Empty<T>() where an empty sequence is all you need, as it returns a cached instance and saves you an allocation on each call.

Related

Migrating from Dictionary to ConcurrentDictionary, what are the common traps that I should be aware of?

I am looking at migrating from Dictionary to ConcurrentDictionary for a multithreaded environment.
Specific to my use case, a kvp would typically be <string, List<T>>
What do I need to look out for?
How do I implement successfully for thread safety?
How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?
What do I need to look out for?
Depends on what you are trying to achieve :)
How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?
Those should be handled by the dictionary itself in a thread-safe manner, with several caveats:
Functions accepting factories, like ConcurrentDictionary<TKey,TValue>.GetOrAdd(TKey, Func<TKey,TValue>), are not thread safe in terms of factory invocation (i.e. the dictionary does not guarantee that the factory is invoked only once if multiple threads try to get or add the item). Quoting the docs:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary<TKey,TValue> class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
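A common mitigation when the factory is expensive (a sketch, not part of the quoted docs) is to store Lazy<TValue> values: several threads may race on GetOrAdd and create several Lazy wrappers, but only the stored winner ever has its Value read, so the expensive work runs at most once. LoadExpensively below is a hypothetical placeholder for that work:

var cache = new ConcurrentDictionary<string, Lazy<List<int>>>();

List<int> value = cache.GetOrAdd(
    "key",
    k => new Lazy<List<int>>(() => LoadExpensively(k))).Value;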
In your particular case the value, List<T>, is not itself thread safe, so while the dictionary operations will be thread safe (with the exception from the previous point), mutating operations on the value itself will not be. Consider using something like ConcurrentBag<T>, or switching to an IReadOnlyDictionary.
Personally, I would be cautious about working with the concurrent dictionary via explicitly implemented interfaces like IDictionary<TKey, TValue> and/or the indexer (these can lead to race conditions in read-update-write scenarios).
The ConcurrentDictionary<TKey,TValue> collection is surprisingly difficult to master. The pitfalls that are waiting to trap the unwary are numerous and subtle. Here are some of them:
Assuming that the ConcurrentDictionary<TKey,TValue> blesses everything it contains with thread-safety. That's not true. If the TValue is a mutable class, and is allowed to be mutated by multiple threads, it can be corrupted just as easily as if it wasn't contained in the dictionary.
Using the ConcurrentDictionary<TKey,TValue> with patterns familiar from the Dictionary<TKey,TValue>. Race conditions can trivially emerge. For example if (dict.ContainsKey(x)) list = dict[x] is wrong. In a multithreaded environment it is entirely possible that the key x will be removed between the dict.ContainsKey(x) and the list = dict[x], resulting in a KeyNotFoundException. The ConcurrentDictionary<TKey,TValue> is equipped with special atomic APIs that should be used instead of the chatty check-then-act pattern (see the sketch after this list).
Using Count == 0 to check whether the dictionary is empty. The Count property is very cheap for a Dictionary<TKey,TValue>, and very expensive for a ConcurrentDictionary<TKey,TValue>. The correct property to use is IsEmpty.
Assuming that the AddOrUpdate method can be safely used for updating a mutable TValue object. This is not a correct assumption. The "Update" in the name of the method means "update the dictionary, by replacing an existing value with a new value". It doesn't mean "modify an existing value".
Assuming that enumerating a ConcurrentDictionary<TKey,TValue> will yield the entries that were stored in the dictionary at the point in time that the enumeration started. That's not true. The enumerator does not maintain a snapshot of the dictionary. The behavior of the enumerator is not documented precisely. It's not even guaranteed that a single enumeration of a ConcurrentDictionary<TKey,TValue> will yield unique keys. In case you want to do an enumeration with snapshot semantics you must first take a snapshot explicitly with the (expensive) ToArray method, and then enumerate the snapshot. You might even consider switching to an ImmutableDictionary<TKey,TValue>, which is exceptionally good at providing these semantics.
Assuming that calling extension methods on a ConcurrentDictionary<TKey,TValue>'s interfaces is safe. This is not the case. For example the ToArray method is safe because it's a native method of the class. The ToList is not safe because it is a LINQ extension method on the IEnumerable<KeyValuePair<TKey,TValue>> interface. This method internally first calls the Count property of the ICollection<KeyValuePair<TKey,TValue>> interface, and then the CopyTo of the same interface. In a multithreaded environment the Count obtained by the first operation might not be compatible with the second operation, resulting in either an ArgumentException or a list that contains empty elements at the end.
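To illustrate the check-then-act pitfall above, here is a sketch of the racy pattern and its atomic replacement (dict and x are the same hypothetical names used in the list):

// Racy: the key can be removed by another thread between the two calls.
if (dict.ContainsKey(x))
{
    var list = dict[x]; // may throw KeyNotFoundException
}

// Atomic: a single call both checks for the key and reads the value.
if (dict.TryGetValue(x, out var list2))
{
    // list2 was present at the moment of the call; use it here
}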
In conclusion, migrating from a Dictionary<TKey,TValue> to a ConcurrentDictionary<TKey,TValue> is not trivial. In many scenarios sticking with the Dictionary<TKey,TValue> and adding synchronization around it might be an easier (and safer) path to thread-safety. IMHO the ConcurrentDictionary<TKey,TValue> should be considered more as a performance-optimization over a synchronized Dictionary<TKey,TValue>, than as the tool of choice when a dictionary is needed in a multithreading scenario.

Why does this lambda closure generate garbage although it is not executed at runtime?

I've noticed that the following code generates heap allocations which trigger the garbage collector at some point and I would like to know why this is the case and how to avoid it:
private Dictionary<Type, Action> actionTable = new Dictionary<Type, Action>();

private void Update(int num)
{
    Action action = null; // initialized so the code compiles with if (false)
    // if (!actionTable.TryGetValue(typeof(int), out action))
    if (false)
    {
        action = () => Debug.Log(num);
        actionTable.Add(typeof(int), action);
    }
    action?.Invoke();
}
I understand that using a lambda such as () => Debug.Log(num) will generate a small helper class (e.g. <>c__DisplayClass7_0) to hold the local variable. This is why I wanted to test whether I could cache this allocation in a dictionary. However, I noticed that the call to Update leads to allocations even when the lambda code is never reached due to the if-statement. When I comment out the lambda, the allocation disappears from the profiler. I am using the Unity Profiler (a performance reporting tool within the Unity game engine), which shows such allocations in bytes per frame while in development/debug mode.
I surmise that the compiler or JIT compiler generates the helper class for the lambda for the scope of the method even though I don't understand why this would be desirable.
Finally, is there any way of caching delegates in this manner without allocating and without forcing the calling code to cache the action in advance? (I do know that I could allocate the action once in the client code, but in this example I would strictly like to implement some kind of automatic caching, because I do not have complete control over the client.)
Disclaimer: This is mostly a theoretical question out of interest. I do realize that most applications will not benefit from micro-optimizations like this.
Servy's answer is correct and gives a good workaround. I thought I might add a few more details.
First off: implementation choices of the C# compiler are subject to change at any time and for any reason; nothing I say here is a requirement of the language and you should not depend on it.
If you have a closed-over outer variable of a lambda then all closed-over variables are made into fields of a closure class, and that closure class is allocated from the long-term pool ("the heap") as soon as the function is activated. This happens regardless of whether the closure class is ever read from.
The compiler team could have chosen to defer creation of the closure class until the first point where it was used: where a local was read or written or a delegate was created. However, that would then add additional complexity to the method! That makes the method larger, it makes it slower, it makes it more likely that you'll have a cache miss, it makes the jitter work harder, it makes more basic blocks so the jitter might skip an optimization, and so on. This optimization likely does not pay for itself.
However, the compiler team does make similar optimizations in cases where it is more likely to pay off. Two examples:
The 99.99% likely scenario for an iterator block (a method with a yield return in it) is that the IEnumerable will have GetEnumerator called exactly once. The generated enumerable therefore has logic that implements both IEnumerable and IEnumerator; the first time GetEnumerator is called, the object is cast to IEnumerator and returned. The second time, we allocate a second enumerator. This saves one object in the highly likely scenario, and the extra code generated is pretty simple and rarely called.
It is common for async methods to have a "fast path" that returns without ever awaiting -- for example, you might have an expensive asynchronous call the first time, and then the result is cached and returned the second time. The C# compiler generates code that avoids creating the "state machine" closure until the first await is encountered, and therefore prevents an allocation on the fast path, if there is one.
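A quick way to observe the iterator optimization described above (this relies on current compiler implementation details, so treat it as an illustration rather than a guarantee):

using System;
using System.Collections.Generic;

class IteratorDemo
{
    static IEnumerable<int> Numbers()
    {
        yield return 1;
    }

    static void Main()
    {
        IEnumerable<int> seq = Numbers();
        // The first GetEnumerator call on the creating thread typically
        // returns the enumerable object itself (no extra allocation)...
        Console.WriteLine(ReferenceEquals(seq, seq.GetEnumerator())); // True
        // ...while subsequent calls must allocate a second enumerator.
        Console.WriteLine(ReferenceEquals(seq, seq.GetEnumerator())); // False
    }
}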
These optimizations tend to pay off, but 99% of the time when you have a method that makes a closure, it actually makes the closure. It's not really worth deferring it.
I surmise that the compiler or JIT compiler generates the helper class for the lambda for the scope of the method even though I don't understand why this would be desirable.
Consider the case where there's more than one anonymous method with a closure in the same method (a common enough occurrence). Do you want to create a new instance for every single one, or just have them all share a single instance? They went with the latter. There are advantages and disadvantages to either approach.
Finally, is there any way of caching delegates in this manner without allocating and without forcing the calling code to cache the action in advance?
Simply move that anonymous method into its own method, so that when that method is called the anonymous method is created unconditionally.
private void Update(int num)
{
    Action action = null;
    // if (!actionTable.TryGetValue(typeof(int), out action))
    if (false)
    {
        Action CreateAction()
        {
            return () => Debug.Log(num);
        }

        action = CreateAction();
        actionTable.Add(typeof(int), action);
    }
    action?.Invoke();
}
(I didn't check if the allocation happened for a nested method. If it does, make it a non-nested method and pass in the int.)

Using the ConcurrentBag type as a thread-safe substitute for a List

In general, would using the ConcurrentBag type be an acceptable thread-safe substitute for a List? I have read some answers on here that suggested the use of ConcurrentBag when one was having a problem with thread safety with generic Lists in C#.
After reading a bit about ConcurrentBag, however, it seems that performing a lot of searches and looping through the collection does not match its intended usage. It seems to be mostly intended to solve producer/consumer problems, where jobs are being (somewhat randomly) added and removed from the collection.
This is an example of the type of (IEnumerable) operations I want to use with the ConcurrentBag:
...
private readonly ConcurrentBag<Person> people = new ConcurrentBag<Person>();

public void AddPerson(Person person)
{
    people.Add(person);
}

public Person GetPersonWithName(string name)
{
    return people.Where(x => name.Equals(x.Name)).FirstOrDefault();
}
...
Would this cause performance concerns, and is it even a correct way to use a ConcurrentBag collection?
.NET's in-built concurrent data structures are mostly designed for patterns like producer/consumer, where there is a constant flow of work through the container.
In your case, the list seems to be long-term (relative to the lifetime of the class) storage, rather than just a resting place for some data before a consumer comes along to take it away and do something with it. In this case I'd suggest using a normal List<T> (or whichever non-concurrent collection is most appropriate for the operations you're intending), and simply using locks to control access to it.
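A minimal sketch of that suggestion, reusing the Person type from the question (the wrapper class and its member names are made up):

public class SynchronizedPeople
{
    private readonly List<Person> people = new List<Person>();
    private readonly object gate = new object();

    public void AddPerson(Person person)
    {
        lock (gate) { people.Add(person); }
    }

    public Person GetPersonWithName(string name)
    {
        // The whole search runs under the lock, so no writer can mutate
        // the list mid-enumeration.
        lock (gate) { return people.Find(p => name.Equals(p.Name)); }
    }
}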
A Bag is just the most general form of collection, allowing multiple identical entries, and without even the ordering of a List. It does happen to be useful in producer/consumer contexts where fairness is not an issue, but it is not specifically designed for that.
Because a Bag does not have any structure with respect to its contents, it's not very suitable for performing searches. In particular, the use case you mention will require time that scales with the size of the bag. A HashSet might be better if you don't need to be able to store multiple copies of an item and if manual synchronization is acceptable for your use case.
As far as I understand it, the ConcurrentBag makes use of multiple lists: it creates a list for each thread using the ConcurrentBag. Thus when reading or accessing the ConcurrentBag within the same thread again, the performance should be roughly the same as when just using a normal List, but if the ConcurrentBag is accessed from a different thread there will be a performance overhead, as it has to search for the value in the "internal" lists created for each thread.
The MSDN page says the following regarding the ConcurrentBag:
Bags are useful for storing objects when ordering doesn't matter, and unlike sets, bags support duplicates. ConcurrentBag is a thread-safe bag implementation, optimized for scenarios where the same thread will be both producing and consuming data stored in the bag.
http://msdn.microsoft.com/en-us/library/dd381779%28v=VS.100%29.aspx
In general, would using the ConcurrentBag type be an acceptable thread-safe substitute for a List?
No, not in general, because, concurring with Warren Dew, a List is ordered while a Bag is not (surely mine isn't ;)
But in cases where (potentially concurrent) reads greatly outnumber writes, you could just wrap your List with copy-on-write semantics.
That is a general solution: you keep working with original List instances, except (as explained in more detail in the link above) you have to make sure that everyone modifying the List uses the appropriate copy-on-write utility method, which you could enforce by using List.AsReadOnly().
In highly concurrent programs, copy-on-write has many desirable performance properties in mostly-read scenarios, compared to locking.
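A sketch of copy-on-write over the question's example, assuming reads greatly outnumber writes (the class name is made up): writers clone the list and atomically publish the new reference; readers enumerate whichever snapshot they observe, without locking.

using System.Collections.Generic;
using System.Linq;

public class CopyOnWritePeople
{
    // volatile so readers promptly observe a newly published list
    private volatile List<Person> people = new List<Person>();
    private readonly object writeLock = new object();

    public void AddPerson(Person person)
    {
        lock (writeLock)
        {
            var copy = new List<Person>(people) { person };
            people = copy; // atomic reference publish
        }
    }

    public Person GetPersonWithName(string name)
    {
        // Lock-free read: the list this enumerates is never mutated.
        return people.FirstOrDefault(x => name.Equals(x.Name));
    }
}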

Performance and static method versus public method

I have a helper method that takes a begin date and an end date and through certain business logic yields an integer result. This helper method is sometimes called in excess of 10,000 times for a given set of data (though this doesn't occur often).
Question:
Considering performance only, is it more efficient to make this helper a static method on some helper class, or would it be better to have it as a public instance method on a class?
Static method example:
// an iterative loop
foreach (var result in results) {
    int daysInQueue = HelperClass.CalcDaysInQueue(dtBegin, dtEnd);
}
Public member method example:
// an iterative loop
HelperClass hc = new HelperClass();
foreach (var result in results) {
    int daysInQueue = hc.CalcDaysInQueue(dtBegin, dtEnd);
}
Thanks in advance for the help!
When you call an instance method, the compiler always invisibly passes one extra parameter, available inside that method under the name this. static methods are not called on behalf of any object, so they don't have a this reference.
I see a few benefits of marking utility methods as static:
a small performance improvement: you don't pay for a reference to this that you never actually use. However, I doubt you will ever see the difference.
convenience: you can call a static method wherever and whenever you want; the compiler does not force you to provide an instance of an object that is not really needed for the method.
readability: an instance method should operate on the instance's state, not merely on its parameters. An instance method that doesn't need an instance to work is confusing.
The difference in performance here is effectively nothing. You will have a hard time actually measuring the difference in time (and getting over the "noise" of other stuff going on with your CPU), that's how small it will be.
Unless you happen to perform a whole bunch of database queries or read several gigabytes of data from files in the constructor of the object (I'm assuming here that it's just empty), it will have a fairly small cost, and since it's outside the loop it doesn't scale at all.
You should be making this decision based on what logically makes sense, not based on performance, until you have a strong reason to believe that there is a significant, and necessary performance gain to be had by violating standard practices/readability/etc.
In this particular case your operation is logically 'static'. There is no state that is used, so there is no need to have an instance of the object, as such the method should be made static. Others have said that it might perform better, which is very possibly true, but that shouldn't be why you make it static. If the operation logically made sense as an instance method you shouldn't try to force it into a static method just to try to get it to run faster; that's learning the wrong lesson here.
Just benchmark it :) In theory a static method should be faster since it leaves out the virtual call overhead, but this overhead might not be significant in your case (I'm not even sure what language the example is in). Time both loops with a large enough number of iterations to take a minute or so and see for yourself. Just make sure you use non-trivial data so your compiler doesn't optimize the calls away.
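For example, a rough Stopwatch sketch (it reuses the question's HelperClass.CalcDaysInQueue; for trustworthy numbers a proper harness such as BenchmarkDotNet is preferable):

using System;
using System.Diagnostics;

var dtBegin = DateTime.Today.AddDays(-7);
var dtEnd = DateTime.Today;

var sw = Stopwatch.StartNew();
for (int i = 0; i < 100_000_000; i++)
{
    int daysInQueue = HelperClass.CalcDaysInQueue(dtBegin, dtEnd);
}
sw.Stop();
Console.WriteLine($"static calls:   {sw.ElapsedMilliseconds} ms");

// Repeat with an instance for comparison.
var hc = new HelperClass();
sw.Restart();
for (int i = 0; i < 100_000_000; i++)
{
    int daysInQueue = hc.CalcDaysInQueue(dtBegin, dtEnd);
}
sw.Stop();
Console.WriteLine($"instance calls: {sw.ElapsedMilliseconds} ms");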
Based on my understanding, making it a static method would be slightly better for performance, since no instance of the object has to be created, although the difference would be negligible, I think. That holds as long as there isn't data that has to be recreated every time you call the static function that could instead be stored in the class object.
You say 'considering performance only'. In that case you should focus entirely on what's inside
HelperClass.CalcDaysInQueue(dtBegin, dtEnd);
And not on the 0.0001% of runtime spent in calling that routine. If it's a short routine the JIT compiler will inline it anyway and in that case there will be NO performance difference between the static and instance method.

How should I make my hashtable cache implementation thread safe when it gets cleared?

I have a class used to cache access to a database resource. It looks something like this:
//gets registered as a singleton
class DataCacher<T>
{
    IDictionary<string, T> items;

    public DataCacher()
    {
        // a field initializer can't call an instance method,
        // so the load runs in the constructor instead
        items = GetDataFromDb();
    }

    //Get is called all the time from zillions of threads
    internal T Get(string key)
    {
        return items[key];
    }

    IDictionary<string, T> GetDataFromDb() { ...expensive slow SQL access... }

    //this gets called every 5 minutes
    internal void Reset()
    {
        items.Clear();
    }
}
I've simplified this code somewhat, but the gist of it is that there is a potential concurrency issue, in that while the items are being cleared, if Get is called things may go awry.
Now I can just bung lock blocks into Get and Reset, but I'm worried that the locks on the Get will reduce performance of the site, as Get is called by every request thread in the web app many many times.
I can do something with double-checked locking, I think, but I suspect there is a cleaner way to do this using something smarter than the lock{} block. What to do?
edit: Sorry all, I didn't make this explicit earlier, but the items.Clear() implementation I am using is not in fact a straight dictionary. It's a wrapper around a ResourceProvider, which requires that the dictionary implementation call .ReleaseAllResources() on each of the items as they get removed. This means that calling code doesn't want to run against an old version that is in the midst of disposal. Given this, is the Interlocked.Exchange method the correct one?
I would start by testing it with just a lock; locks are very cheap when not contested. However - a simpler scheme is to rely on the atomic nature of reference updates:
public void Clear() {
    var tmp = GetDataFromDb(); // or new Dictionary<...> for an empty one
    items = tmp; // this is atomic; subsequent get/set will use this one
}
You might also want to make items a volatile field, just to be sure it isn't held in the registers anywhere.
This still has the problem that anyone expecting there to be a given key may get disappointed (via an exception), but that is a separate issue.
The more granular option might be a ReaderWriterLockSlim.
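A sketch of that option (ReaderWriterLockSlim lives in System.Threading; items and GetDataFromDb are the question's members, and the field name is made up): many threads can read concurrently, while Reset takes the exclusive write lock.

private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

internal T Get(string key)
{
    rwLock.EnterReadLock();
    try { return items[key]; }
    finally { rwLock.ExitReadLock(); }
}

internal void Reset()
{
    rwLock.EnterWriteLock();
    try { items = GetDataFromDb(); }
    finally { rwLock.ExitWriteLock(); }
}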
One option is to completely replace the IDictionary instance instead of Clearing it. You can do this in a thread-safe way using the Exchange method on the Interlocked class.
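In the question's class that might look like this sketch: build the replacement dictionary first (no lock held during the slow load), then publish the reference atomically, so readers always see either the old complete dictionary or the new one.

private IDictionary<string, T> items; // initialized as before

internal void Reset()
{
    IDictionary<string, T> fresh = GetDataFromDb(); // slow, but runs unlocked
    Interlocked.Exchange(ref items, fresh);         // atomic publish
}

internal T Get(string key)
{
    // Volatile.Read ensures this thread observes the latest published reference.
    return Volatile.Read(ref items)[key];
}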
See if the database will tell you what data has changed. You could use:
Triggers that write changes to a history table
Query Notifications (SQL Server and Oracle have these; others must as well)
Etc.
That way you don't have to reload all the data based on a timer.
Failing this:
I would make the Clear method create a new IDictionary by calling GetDataFromDb(), then once the data has been loaded set the "items" field to point to the new dictionary. (The garbage collector will clean up the old dictionary once no threads are accessing it.)
I don't think you care if some threads get "old" results while reloading the data (if you do, then you will just have to block all threads on a lock – painful!).
If you need all threads to swap over to the new dictionary at the same time, then you need to declare the "items" field volatile and use the Exchange method on the Interlocked class. However, it is unlikely you need this in real life.
